COS 402: Artificial Intelligence

Homework #3

Satisfaction

Fall 2005

Due: Monday, October 24


Special late policy:  For this homework only, there will be a special late policy that hopefully will reduce the pressure of midterms followed by break.  During midterms week, one late "day" will be equivalent to two consecutive calendar days.  For instance, Tuesday and Wednesday of midterms week will together count as a single late "day".  All other rules of the standard late policy will remain in effect (including weekends counting as a single late "day", and the limit of not turning in the assignment more than five "days" late).  To be clear, here is a table summarizing how many "days" late your assignment will be counted if turned in on the following dates:

Calendar date Number of late days charged
Monday, October 24 0
Wednesday, October 26 1
Friday, October 28 2
Sunday, October 30 3
Monday, October 31 4
Tuesday, November 1 5
Wednesday, November 2 not accepted on or after this date

For this homework only, if you are out of the Princeton area over break, you may mail or email the written part of the assignment to Berk.  If mailed, your homework is considered submitted on the post mark date, and should be sent to this address:  Berk Kapicioglu, Princeton University, Department of Computer Science, 35 Olden Street, Princeton, NJ 08544.  (It would be wise to send him email at the same time you mail your assignment so that he can look out for it; also, save a photocopy of your work.)


Part I:  Written Exercises

See instructions on the assignments page on how and when to turn these in.  (Numbers in brackets indicate approximate point values.)

1.  [15]  Exercise 7.4a,b,c in R&N.  Be sure to prove these assertions; don't just wave your hands.

2.  [15]  Exercise 7.9 in R&N.  Also show how the resolution algorithm would handle each of these questions.

3.  [10]  Exercise 7.17 in R&N.

4.  [15]  Exercise 13.11 in R&N.  Be sure to justify your answers.


Part II:  Programming

In this assignment, you will have a chance to develop and experiment with your own ideas for solving CNF satisfiability problems.  The assignment has three parts, one of which is optional and for extra credit:

  1. Write the fastest program you can for finding satisfying assignments (models) for a given CNF sentence.
  2. Write a program for generating many "interesting" CNF sentences.
  3. (Optional) Run a systematic experiment using your solver and generator programs.

Each of these are described in detail below.  After the submission deadline, we will also conduct our own "implementation challenge" to see whose program is fastest and most effective.

A satisfiability solver

The first part of the assignment is to implement a satisfiability solver, or "sat-solver" for short.  Given a CNF sentence, your sat-solver should attempt to find a satisfying assignment as quickly as possible.  Your sat-solver does not need to be complete in the sense of returning "failure" when the CNF is unsatisfiable.  Rather, if no satisfying assignment can be found, it should return the "best" assignment it can find, i.e., one with as many clauses satisfied as possible.  Most likely, you will want to use a variant of WalkSat, but this is not required.  Be creative.  There are many ways in which even WalkSat can be tinkered with to improve performance.

Your sat-solver will be given a CNF in a representation described below.  It also will be given a time limit in the form of a "timer" object.  Your sat-solver must return with some answer (i.e., some assignment) by the time the timer reaches zero, or as soon thereafter as possible.

A generator

The second part of the assignment is to devise and implement a method of generating many CNF sentences corresponding to a natural class of problems.  For instance, in class, we discussed how graph-coloring problems can be converted into CNF satisfiability problems.  Thus, one way of generating interesting CNF sentences is to first generate random graph-coloring problems, and then convert them into CNF sentences whose satisfiability must be solved to find a coloring of the randomly generated graph.  Similarly, we saw in class how a blocks-world planning problem can be converted into a satisfiability problem.  A generator in this case could be designed by first generating random blocks-world problems (i.e., start and goal configurations) and then converting these into satisfiability problems.

For this part of the assignment, you should (1) choose a natural problem, (2) figure out how to reduce this problem to a satisfiability problem, (3) implement a generator for natural instances of the original problem, and (4) implement your reduction.  The end result will be a program that generates CNF sentences, but of a very particular form.  Since we discussed graph coloring and blocks-world planning problems in class, these would not be appropriate choices for this assignment.  However, here are some other examples:

Some of these are deliberately vague.  You can fill in the details as you wish.  You should not feel constrained to choose any of these.  On the contrary, you are strongly encouraged to be creative and come up with your own problem.

You need to pick a problem for which it is easy and natural to generate many CNF sentences.  For instance, it would not be appropriate to choose a problem such as n-queens since, for each n, there is only a single problem instance.

Once you have chosen a problem, you need to figure out how to generate random instances, and how to reduce these instances to satisfiability problems.  Be very careful to check that an assignment that satisfies the CNF also is a solution to the original problem instance, and vice versa.

For some problems, it is possible to design your generator to produce only satisfiable CNF sentences.  For instance, for the graph-coloring problem, you can start by 3-coloring a set of vertices and then add random edges, never adding an edge that would connect two like-colored vertices.  The resulting graphs (without the colors, of course) will clearly be 3-colorable by construction, so the generated CNF's will also be satisfiable.  If possible, you are encouraged (but not required) to produce a generator of this form since it should make the implementation challenge more interesting (obviously, no one will be able to solve an unsatisfiable CNF).

The final step is to implement all of the above.  The end product will be a method for generating many CNF sentences.  Your generator (and those of your classmates) will be used to generate the CNF sentences used for the class implementation challenge.

In case the process of creating a generator is still unclear, here is a clarifying note written by Matt Hibbs when he was a TA for this course in 2003.

A systematic experiment

The final part of this assignment, which is optional and for extra credit, is to conduct a systematic experiment of some sort exploring some aspect of your sat-solver.  An example of such an experiment is the one shown in Figure 7.18b of R&N.  In this experiment, the performance of WalkSat was tested on randomly generated 3-CNF sentences with the number of symbols fixed, but with varying clause-to-symbol ratios.  Most likely, your generator will take some parameters, such as the number of edges and number of vertices in the graph that is to be 3-colored in the coloring problem.  You might try measuring run time as a function of one of the parameters (with the others held fixed).  For instance, in this example, you could measure run time as a function of the number of edges with the number of vertices held fixed.  Here is how you might conduct such an experiment:

  1. Fix the number of vertices to be, say, 50.  Let the number of edges vary taking values, say, of 20, 40, 60, ..., 200.
  2. For each of these settings, generate, say, 50 random graphs and convert them to CNF sentences.  If possible, generate only CNF sentences that are known to be satisfiable (see note above).
  3. Run your sat-solver on all of the generated CNF sentences, and record all the results, i.e., running time and the number of unsatisfied clauses.  You may need to limit the running time of each run to keep the experiment from running for too long.
  4. Make a plot with the number of edges on the x-axis, and the average (or median) running time on the y-axis.  (There are many programs available for making plots including excel, gnuplot and matlab.)

The numbers chosen here for number of vertices and number of edges were arbitrary.  In your own experiment, if possible, you should try to identify an "interesting" region for the parameter settings, as in R&N Figure 7.18b where the running time increases but then decreases as the clause-to-symbol ratio is increased.

This is just one possible experiment that could be conducted.  Myriad possibilities exist.  For instance, you might want to compare performance of two substantially different algorithms, or variants of the same algorithm, over a range of problems.  Or, if your algorithm has a parameter, you can see how the algorithm's performance varies as a function of this parameter.  Most likely, if you play around with your algorithm and your generator, you will encounter some aspect of one or the other that seems worth exploring systematically.  Moreover, such an experiment may help you to design a more effective sat-solver.  As with the other parts of this assignment, you are encouraged to think creatively.  It is sufficient to conduct a single experiment for this part of the assignment, although you are welcome to do more.

Be careful to plan your experiment carefully, possibly after running some preliminary test.  Because we are dealing with randomized algorithms and random instances, you will almost certainly need to run your algorithm many times, taking the average (or median) of the results.  This also means that the experiments can take quite a while to run.  This is why it is so important that you start early if you are planning to do this part of the assignment.  Ideally, you would run your algorithm hundreds of times to get reliable results.  Unfortunately, you probably will not have time to do this, and may be forced to run only ten or twenty times.  Plan carefully, and be sure your experiments will terminate in a reasonable period of time before beginning them.  For instance, the experiment outlined above will require generating 50 * 10 = 500 CNF sentences.  If we allow our sat-solver 30 seconds for each one, this gives a total (worst-case) running time of 500 * 30 = 15000 seconds, or about 4 hours, 10 minutes.

Obviously, running times may vary depending on who else is using the computer you are on.  If you are using a shared computer, this might lead to unreliable results.  In this case, you may instead wish to measure some quantity other than running time, such as the number of variable flips made by the algorithm.

The implementation challenge

Once you turn in your sat-solver and generator, we will run everyone's sat-solver on everyone's generator to determine whose sat-solver runs the fastest.  In particular, each generator will be used to generate two CNF sentences.  We will then run every sat-solver on the resulting collection of CNF sentences.  Each sat-solver will be given 15 seconds on a 2.4GHz PC to solve each CNF sentence.  If your sat-solver solves a CNF, you will be charged the actual number of seconds it took to solve it.  If you do not solve it, you will be charged the full 15 seconds, or your actual time, whichever is larger (this is to penalize going over the time limit).  We will then take the average (over all CNF's) of these running times.

We also will run other statistics, such as the average number of unsatisfied clauses achieved by each sat-solver.

To make the implementation challenge interesting, please try to set up your generator to produce CNF's that are challenging but that at least your own sat-solver would have a reasonable chance of solving in 15 seconds (if run on a 2.4GHz machine).  Obviously, it won't be much fun if no one's sat-solver can solve anyone's generated CNF's, nor if all of the CNF's are trivially solvable almost instantly.

Your grade will not depend on how well you do in this implementation challenge relative to the rest of the class.  Although you are encouraged to do the best you can, this should not be regarded as anything more than a fun (I hope) communal experiment exploring methods for solving satisfiability problems.

Outside sources

It is okay to use outside sources (such as articles or websites) to come up with ideas for sat-solvers or generators.  However, this is certainly not required, nor even encouraged (it is even slightly discouraged since the emphasis of this assignment is in coming up with your own ideas).  To the extent that it is done in a manner consistent with the course collaboration policy, it is also okay to exchange ideas with your classmates.  In any case, all outside sources must be acknowledged in the report that you turn in.

The code we are providing

We are providing the following classes and interfaces:

In addition to what is described above, each Generator and SatSolver must provide methods called author() and description() returning as Strings the author of the program, and a brief description of the generator or sat-solver, and how it works.  We will use these methods when publicly reporting the results of the implementation challenge.  The "author" that your program reports can be your real name or, if you want to remain anonymous, you can choose a pseudonym (and you do not need to use the same name for your generator and sat-solver).

A CNF is represented as an array of arrays of Literals.  Each component array corresponds to a clause (disjunction of literals) in the natural way.  Thus, cnf[4][1] corresponds to literal #1 in clause #4.  Note that, in java, these component arrays do not all need to be of the same length.  (See, for instance, the section on multidimensional arrays in Java in a Nutshell.)  Likewise, a clause is represented as an array of  Literals with the obvious interpretation, so a CNF can be viewed equivalently as an array of clauses.

A model or assignment is represented as a boolean array in the natural way.  Thus, model[3] is the true/false assignment to symbol #3.

We also are providing Generators called RandomCnfGenerator and RandomSatisfiedCnfGenerator which generate random CNF formulas of a fixed size with a fixed number of symbols and clauses.  RandomCnfGenerator generates CNF sentences that may or may not be satisfiable, while RandomSatisfiedCnfGenerator is guaranteed to only generate CNF's that are satisfiable.  You can use these in your experiments, or as examples for writing your own generator.

A large number of CNF sentences are available from the website of the DIMACS implementation challenge that was conducted several years ago.  You may download and try out your program on these, or use these in your experiments.  The contents of the files on this website are (very tersely) described here.  Further description, and even more benchmark problems can be found on the satlib site. We have provided a Generator called DimacsFileGenerator that reads in files of this sort (after they have been unpacked and uncompressed, if necessary).

Each of the classes RandomCnfGenerator, RandomSatisfiedCnfGenerator  and DimacsFileGenerator has its own main method which can be used for testing.  Moreover, you may wish to use these as examples of how to use SatSolvers and Generators.

We are also providing a sample SatSolver called RandomSearchSatSolver that tries out random models until it finds one satisfying the given CNF, or runs out of time.

All code can be downloaded from this directory, or all at once from this tar file.  Documentation is available here.

What to turn in

Using whiteboard, as in HW#2, you should turn in the following:

The code that you turn in should not write anything to standard output (this will really mess up the tournament).

In addition to the program itself, you should write a brief description of:

This written work should be handed in with the written exercises.

What you will be graded on

You will be graded on completing all the different parts of this assignment.  You have a great deal of freedom in choosing how much work you want to put into this assignment, and your grade will in part reflect how great a challenge you decide to take on.  Creativity will certainly be an important component of your grade.

Although your standing in the implementation challenge will not affect your grade, your sat-solver should be at least as fast and effective as vanilla WalkSat.  However, you should not let this deter you from trying a risky idea for an algorithm quite different from WalkSat.  If you have such an idea, you might go ahead and implement it, and compare its performance to WalkSat, thus satisfying the third (optional) part of the assignment.  You then can enter the faster of the two algorithms in the implementation challenge.

Needless to say, your code should not halt prematurely with an exception (unless given bad data).

As usual, you should follow good programming practices, including documenting your code.  Finally, the written part of this assignment should be clear and concise.

This programming assignment will be worth roughly 75 points, which will be divided between the write-up, the generator, the sat-solver, and overall creativity.  The extra credit "systematic experiment" portion will be worth 5-30 points (depending on what you do for this part).