COS 402: Artificial Intelligence

Homework #3

Satisfaction

Fall 2004

Due: Friday, October 22


Special late policy:  Because this homework is due right before vacation, there will be a special late policy that hopefully will reduce the pressure over the break on people who are unable to complete the assignment on time (especially those who might be traveling).  In particular, the first five calendar days of break will count as a single late "day".  So, if you turn in this homework after the due date (Oct. 22) but before 11:59pm on Oct. 27, it will count as being only one day late.  After that, calendar days will count the same as late days.  For instance, a homework turned in on Oct. 28 will count as being two days late.  As usual, no homeworks will be accepted more than five "days" late, which means they will not be accepted after Sunday, Oct. 31.

For this homework only, if you are out of the Princeton area over break, you may mail or email the written part of the assignment to Zafer.  If mailed, your homework is considered submitted on the post mark date, and should be sent to this address:  Zafer Barutcuoglu, Princeton University, Department of Computer Science, 35 Olden Street, Princeton, NJ  08544.  (It would be wise to send him email at the same time you mail your assignment so that he can look out for it; also, save a photocopy of your work.)


Part I:  Written Exercises

See instructions on the assignments page (as well as the special instructions above) on how and when to turn these in.  (Each of these will be worth around 15 points.)

1.  Exercise 7.4a,b,c in R&N.  Be sure to prove these assertions; don't just wave your hands.

2.  Exercise 7.9 in R&N.  Also show how the resolution algorithm would handle each of these questions.

3.  Suppose two cards are drawn at random from a full deck of cards.

  1. What is the probability that the first card is either a King or a Queen?
  2. What is the probability that the first card is either a King or a Heart?
  3. Is the suit (heart, spade, diamond, club) of the first card independent of the suit of the second card?  Justify your answer (this means justify mathematically using the definition of independence).
  4. Is the suit of the first card independent of the rank (A,2,3,...,10,J,Q,K) of  the second card?  Justify your answer.

4.  Exercise 13.11 in R&N.


Part II:  Programming

In this assignment, you will have a chance to develop and experiment with your own ideas for solving CNF satisfiability problems.  The assignment has three parts, one of which is optional and for extra credit:

  1. Write the fastest program you can for finding satisfying assignments (models) for a given CNF sentence.
  2. Write a program for generating many "interesting" CNF sentences.
  3. (Optional) Run a systematic experiment using your solver and generator programs.

Each of these are described in detail below.  After the submission deadline, we will also conduct our own "implementation challenge" to see whose program is fastest and most effective.

A satisfiability solver

The first part of the assignment is to implement a satisfiability solver, or "sat-solver" for short.  Given a CNF sentence, your sat-solver should attempt to find a satisfying assignment as quickly as possible.  Your sat-solver does not need to be complete in the sense of returning "failure" when the CNF is unsatisfiable.  Rather, if no satisfying assignment can be found, it should return the "best" assignment it can find, i.e., one with as many clauses satisfied as possible.  Most likely, you will want to use a variant of WalkSat, but this is not required.  Be creative.  There are many ways in which even WalkSat can be tinkered with to improve performance.

Your sat-solver will be given a CNF in a representation described below.  It also will be given a time limit in the form of a "timer" object.  Your sat-solver must return with some answer (i.e., some assignment) by the time the timer reaches zero, or as soon thereafter as possible.

A generator

The second part of the assignment is to devise and implement a method of generating many CNF sentences corresponding to a natural class of problems.  For instance, in class, we discussed how graph-coloring problems can be converted into CNF satisfiability problems.  Thus, one way of generating interesting CNF sentences is to first generate random graph-coloring problems, and then convert them into CNF sentences whose satisfiability must be solved to find a coloring of the randomly generated graph.  Similarly, we saw in class how a blocks-world planning problem can be converted into a satisfiability problem.  A generator in this case could be designed by first generating random blocks-world problems (i.e., start and goal configurations) and then converting these into satisfiability problems.

For this part of the assignment, you should (1) choose a natural problem, (2) figure out how to reduce this problem to a satisfiability problem, (3) implement a generator for natural instances of the original problem, and (4) implement your reduction.  The end result will be a program that generates CNF sentences, but of a very particular form.  Since we discussed graph coloring and blocks-world planning problems in class, these would not be appropriate choices for this assignment.  However, here are some other examples:

Some of these are deliberately vague.  You can fill in the details as you wish.  You should not feel constrained to choose any of these.  On the contrary, you are strongly encouraged to be creative and come up with your own problem.

You need to pick a problem for which it is easy and natural to generate many CNF sentences.  For instance, it would not be appropriate to choose a problem such as n-queens since, for each n, there is only a single problem instance.

Once you have chosen a problem, you need to figure out how to generate random instances, and how to reduce these instances to satisfiability problems.  Be very careful to check that an assignment that satisfies the CNF also is a solution to the original problem instance, and vice versa.

For some problems, it is possible to design your generator to produce only satisfiable CNF sentences.  For instance, for the graph-coloring problem, you can start by 3-coloring a set of vertices and then add random edges, never adding an edge that would connect two like-colored vertices.  The resulting graphs (without the colors, of course) will clearly be 3-colorable by construction, so the generated CNF's will also be satisfiable.  If possible, you are encouraged (but not required) to produce a generator of this form since it should make the implementation challenge more interesting (obviously, no one will be able to solve an unsatisfiable CNF).

The final step is to implement all of the above.  The end product will be a method for generating many CNF sentences.  Your generator (and those of your classmates) will be used to generate the CNF sentences used for the class implementation challenge.

In case the process of creating a generator is still unclear, here is a clarifying note written last year by Matt Hibbs, one of that year's TA's.

A systematic experiment

The final part of this assignment, which is optional and for extra credit, is to conduct a systematic experiment of some sort exploring some aspect of your sat-solver.  An example of such an experiment is the one shown in Figure 7.18b of R&N.  In this experiment, the performance of WalkSat was tested on randomly generated 3-CNF sentences with the number of symbols fixed, but with varying clause-to-symbol ratios.  Most likely, your generator will take some parameters, such as the number of edges and number of vertices in the graph that is to be 3-colored in the coloring problem.  You might try measuring run time as a function of one of the parameters (with the others held fixed).  For instance, in this example, you could measure run time as a function of the number of edges with the number of vertices held fixed.  Here is how you might conduct such an experiment:

  1. Fix the number of vertices to be, say, 50.  Let the number of edges vary taking values, say, of 20, 40, 60, ..., 200.
  2. For each of these settings, generate, say, 50 random graphs and convert them to CNF sentences.  If possible, generate only CNF sentences that are known to be satisfiable (see note above).
  3. Run your sat-solver on all of the generated CNF sentences, and record all the results, i.e., running time and the number of unsatisfied clauses.  You may need to limit the running time of each run to keep the experiment from running for too long.
  4. Make a plot with the number of edges on the x-axis, and the average (or median) running time on the y-axis.  (There are many programs available for making plots including excel, gnuplot and matlab.)

The numbers chosen here for number of vertices and number of edges were arbitrary.  In your own experiment, if possible, you should try to identify an "interesting" region for the parameter settings, as in R&N Figure 7.18b where the running time increases but then decreases as the clause-to-symbol ratio is increased.

This is just one possible experiment that could be conducted.  Myriad possibilities exist.  For instance, you might want to compare performance of two substantially different algorithms, or variants of the same algorithm, over a range of problems.  Or, if your algorithm has a parameter, you can see how the algorithm's performance varies as a function of this parameter.  Most likely, if you play around with your algorithm and your generator, you will encounter some aspect of one or the other that seems worth exploring systematically.  Moreover, such an experiment may help you to design a more effective sat-solver.  As with the other parts of this assignment, you are encouraged to think creatively.  It is sufficient to conduct a single experiment for this part of the assignment, although you are welcome to do more.

Be careful to plan your experiment carefully, possibly after running some preliminary test.  Because we are dealing with randomized algorithms and random instances, you will almost certainly need to run your algorithm many times, taking the average (or median) of the results.  This also means that the experiments can take quite a while to run.  This is why it is so important that you start early if you are planning to do this part of the assignment.  Ideally, you would run your algorithm hundreds of times to get reliable results.  Unfortunately, you probably will not have time to do this, and may be forced to run only ten or twenty times.  Plan carefully, and be sure your experiments will terminate in a reasonable period of time before beginning them.  For instance, the experiment outlined above will require generating 50 * 10 = 500 CNF sentences.  If we allow our sat-solver 30 seconds for each one, this gives a total (worst-case) running time of 500 * 30 = 15000 seconds, or about 4 hours, 10 minutes.

Obviously, running times may vary depending on who else is using the computer you are on.  If you are using a shared computer, this might lead to unreliable results.  In this case, you may instead wish to measure some quantity other than running time, such as the number of variable flips made by the algorithm.

The implementation challenge

Once you turn in your sat-solver and generator, we will run everyone's sat-solver on everyone's generator to determine whose sat-solver runs the fastest.  In particular, each generator will be used to generate three CNF sentences.  We will then run every sat-solver on the resulting collection of CNF sentences.  Each sat-solver will be given 20 seconds on a 2.4GHz PC to solve each CNF sentence.  If your sat-solver solves a CNF, you will be charged the actual number of seconds it took to solve it.  If you do not solve it, you will be charged the full 20 seconds, or your actual time, whichever is larger (this is to penalize going over the time limit).  We will then take the average (over all CNF's) of these running times.

We also will run other statistics, such as the average number of unsatisfied clauses achieved by each sat-solver.

To make the implementation challenge interesting, please try to set up your generator to produce CNF's that are challenging but that at least your own sat-solver would have a reasonable chance of solving in 20 seconds (if run on a 2.4GHz machine).  Obviously, it won't be much fun if no one's sat-solver can solve anyone's generated CNF's, nor if all of the CNF's are trivially solvable almost instantly.

Your grade will not depend on how well you do in this implementation challenge relative to the rest of the class.  Although you are encouraged to do the best you can, this should not be regarded as anything more than a fun (I hope) communal experiment exploring methods for solving satisfiability problems.

The code we are providing

We are providing the following classes and interfaces:

A CNF is represented as an array of arrays of Literals.  Each component array corresponds to a clause (disjunction of literals) in the natural way.  Thus, cnf[4][1] corresponds to literal #1 in clause #4.  Note that, in java, these component arrays do not all need to be of the same length.  (See the section on multidimensional arrays in Java in a Nutshell.)  Likewise, a clause is represented as an array of  Literals with the obvious interpretation, so a CNF can be viewed equivalently as an array of clauses.

A model or assignment is represented as a boolean array in the natural way.  Thus, model[3] is the true/false assignment to symbol #3.

We also are providing a Generator called RandomCnfGenerator which generates random CNF formulas of a fixed size with a fixed number of symbols and clauses.  You can use it in your experiments, or as an example for writing your own generator.

A large number of CNF sentences are available from the website of the DIMACS implementation challenge that was conducted several years ago.  You may download and try out your program on these, or use these in your experiments.  The contents of the files on this website are (very tersely) described here (unfortunately, I really don't know much more about these beyond what appears in this document). We have provided a Generator called DimacsFileGenerator that reads in files of this sort (after they have been uncompressed).

Both classes RandomCnfGenerator and DimacsFileGenerator have their own main methods which can be used for testing.  Moreover, you may wish to use these as examples of how to use SatSolvers and Generators.

All code can be downloaded from this directory, or all at once from this tar file.  Documentation is available here.

What to turn in

Using whiteboard, as in HW#2, you should turn in the following:

The code that you turn in should not write anything to standard output or standard error.

In addition to the program itself, you should write a brief description of:

This written work should be handed in with the written exercises.

What you will be graded on

You will be graded on completing all the different parts of this assignment.  You have a great deal of freedom in choosing how much work you want to put into this assignment, and your grade will in part reflect how great a challenge you decide to take on.  Creativity will certainly be an important component of your grade.

Although your standing in the implementation challenge will not affect your grade, your sat-solver should be at least as fast and effective as vanilla WalkSat.  However, you should not let this deter you from trying a risky idea for an algorithm quite different from WalkSat.  If you have such an idea, you might go ahead and implement it, and compare its performance to WalkSat, thus satisfying the third (optional) part of the assignment.  You then can enter the faster of the two algorithms in the implementation challenge.

Needless to say, your code should not halt prematurely with an exception (unless given bad data).

As usual, you should follow good programming practices, including documenting your code.  Finally, the written part of this assignment should be clear and concise.

This programming assignment will be worth roughly 75 points, which will be divided between the write-up, the generator, the sat-solver, and overall creativity.  The extra credit "systematic experiment" portion will be worth 5-30 points (depending on what you do for this part).