COS 126Closest Pairs Due: Wed. 4/9, 11:59pm

Suppose that N random points have been thrown onto a chessboard. In this assignment, you'll write a program to find and plot all pairs of points that are closer to one another than to any other on a two-dimensional space, like a chessboard. You'll be using arrays of linked lists to make it possible to handle large sets of points. This problem is representative of "clustering" problems that arise in database and graphics problems. The following 3-part sequence will lead to the final solution in manageable steps.

### Part 1: Warm Up

Write `gridcount.c`, a program that prints how many points fell into each square. The input is (x,y) coordinates in the unit square (that is, x and y are both between 0 and 1), one pair per line. You can use `/u/cs126/bin/randpoints` to check the input format and to generate test data for small cases. (The source code for `randpoints` is in `/u/cs126/examples/randpoints.c`). With no command-line arguments, `randpoints` generates 10 random points:

```% /u/cs126/bin/randpoints
0.242578 0.013470
0.383139 0.414653
0.067769 0.993127
0.484308 0.765338
0.031834 0.030935
0.932640 0.887880
0.591330 0.478779
0.833354 0.186335
0.735653 0.115053
0.698659 0.355604```

You can specify the number of points with a command-line argument, e.g., `/u/cs126/bin/randpoints 100` emits 100 points. You can pipe the output of randpoint directly into your program, e.g.,

```% /u/cs126/bin/randpoints 100 | a.out
2   1   0   1   0   2   2   3
2   0   2   3   1   1   3   1
1   1   2   2   0   0   0   1
4   2   2   4   1   0   3   2
1   0   0   2   1   1   1   3
6   0   2   0   1   1   1   2
2   4   5   2   1   1   2   2
2   1   1   1   1   2   0   2```

Your program should represent a G×G chessboard with a two-dimensional array, `int grid[G][G]`, where `grid[i][j]` is the number of points that land in the square at row `i`, column `j`. For example, with G = 10, the input line ```0.123 0.456``` causes `grid[1][4]` to be incremented. The output above shows the result for G = 8. Here's a outline of `gridcount.c`:

```#include <stdio.h>

#define G 8

int grid[G][G];

void gridinit(void) { /* initialize grid */ }

void gridinsert(float x, float y) { /* increment grid[i][j] */ }

void gridcount(void) { /* print the grid */ }

int main(void) {
float x, y;

gridinit();
while (scanf("%f %f\n", &x, &y) != EOF) {
gridinsert(x, y);
}
gridcount();
return 0;
}```

Copy this code as a starting point for your `gridcount.c`. Implement the bodies of the functions `gridinit`, `gridinsert`, and `gridcount`; each body is only a few lines of code. Your goal is to be able to handle the data file `/u/cs126/data/stars`, which is a map of about 5000 stars in the sky projected onto the unit square. Here's the output with G = 8:

```% a.out </u/cs126/data/stars
1  12  50  86 122 166  23   0
4  45  76  80 130 143 100   9
29  64  74  92 117 114  72  43
65  65  87  94 137 122  73  46
52  79  85 111 149 110  65  55
45  72  82 149 166  96  54  39
16  62 107 162 111  67  67  10
0  14  54 102  79  50   7   0```

Do not submit `gridcount.c`. The purpose of this warm up is to familiarize yourelf with the problem and with the `grid` data structure to prepare for solving the more difficult problem described below. Run your program for 100 random points, 1000 random points and the star data, and try changing G. Note that `G` is built into the program; to change it, you have to change the `#define` line and recompile your program.

### Part 2: Counting with Linked Lists

The first refinement to the program above is to replace the counts by linked lists of points. That is, in `gridlistcount.c`, `grid[i][j]` points to a linked list of the points that land in the `i`,`j` square. The number of points in each square can be computed by counting the number of nodes in each linked list. Here's an outline of `gridlistcount.c`; bold type identifies those lines that are different from the corresponding lines in `gridcount.c`.

```#include <stdio.h>
#include "misc.h"

#define G 8

struct node {
float x, y;
struct node *next;
} *grid[G][G];

void gridinit(void) { /* initialize grid */ }

void gridinsert(float x, float y) { /* add a node for x,y to grid[i][j] */ }

void gridcount(void) { /* print the grid */ }

int main(void) {
float x, y;

gridinit();
while (scanf("%f %f\n", &x, &y) != EOF) {
gridinsert(x, y);
}
gridcount();
return 0;
}```

The linked lists are built from `node` structures; the `next` field points to the next node on a list. As each point is read, `gridinsert` creates a node for the point and adds it to appropriate linked list. `gridcount` traverses `grid` to count the number of points in each square, printing out the results in a G ×G array as shown above. The output of `gridlistcount` should be identical to the output of `gridcount`.

You'll need to allocate `struct node`s for each point in `gridinsert`. Use `emalloc` described in lectures 14 and 15. Include `misc.h` as shown in the outline above, and compile your program with the command

`% lcc -n -I/u/cs126/include gridlistcount.c /u/cs126/lib/libmisc.a`

`/u/cs126/lib/libmisc.a` is the library that holds `emalloc`. See the lcc man page for information about the `-n` option; you can appreciate the value of this option only by attempting to debug without it.

### Part 3: Closest Pairs

The third and last refinement is the program `gridclose.c`, which finds the closest pairs and plots the points. That is, for each poin P, find the point that is closest to P and draw a line between the two points. As in `gridlistcount.c`, build a linked list of all the points that fall into each square in a G×G grid. Then, for each point, look in the lists associated with the neighboring squares for the closed point. For large point sets, this approach makes the difference between being able to solve the problem and not being able to afford to wait for the answer. Here's a outline of `gridclose.c`; as above, bold type identifies those lines that are different from the corresponding lines in `gridlistcount.c`.

```#include <stdio.h>
#include "misc.h"
#include "print.h"

#define G 20

struct point { float x, y; };
struct node {
struct point pt;
struct node *closest;
struct node *next;
} *grid[G][G];

void gridinit(void) { /* initialize grid */ }

void gridinsert(float x, float y) { /* add a node for x,y to grid[i][j] */ }

void gridclosest(struct node *p) { /* find the node closest to p */ }

int main(void) {
float x, y;

printprolog(G);
gridinit();
while (scanf("%f %f\n", &x, &y) != EOF) {
printpoint(x, y);
gridinsert(x, y);
}
/* for every node t,
set t->closest by calling gridclosest(t) */
/* for every node t,
call printpair(&t->pt, &u->pt) if t->closest = u and u->closest = t */
printepilog();
return 0;
}```

You may copy this code as a starting point for your `gridclose.c`. Notice that the x,y coordinates are now packaged in `point` structures, `node` structures have a point field named pt, and node structures have new field, `closest`, which holds a pointer to another node.You'll be able to use your `gridinit` and `gridinsert` functions from `gridlistcount.c` with only a few modications.

The new function, `gridclosest`, sets the `closest` field in a node `p` to point to the node that holds the point closest to `p`. As suggested in the pseudocode comments above, you need to set the `closest` fields in `main`. Once all of the closest fields are set, make a pass through the nodes and call ```printpair(t, t->closest)``` for every node `t` for which `t == t->closest->closest`. You may ignore the possibility that a point is so isolated that there are no points in any nearby grid squares (for real data, one would presumably find that out by using `gridcount`).

The various `print` functions emit PostScript to draw points (printpoint) and lines between points (printpair). The source code for these functions is in `/u/cs126/examples/print.h` and `/u/cs126/examples/print.c`. You'll need to compile your program with the command line

`% lcc -n -I/u/cs126/examples -I/u/cs126/include gridclose.c /u/cs126/examples/print.c /u/cs126/lib/libmisc.a`

You can make this a bit less painful by using `/u/cs126/bin/lcc` instead of the default `lcc`; `/u/cs126/bin/lcc` automatically looks in `/u/cs126/examples`, `/u/cs126/include` and `/u/cs126/lib` for include files, source files, object files, and libraries, and it automatically searches `/u/cs126/lib/libmisc.a` when it builds an `a.out`. Thus, you can compile your program and `/u/cs126/examples/print.c` with the command

`% /u/cs126/bin/lcc -n gridclose.c print.c`

If you add `/u/cs126/bin` to your path, you can omit `/u/cs126/bin/` in the command above.

Try your program on 100 random points and on the star data. Redirect the output of your program in a file and use `gs`, the Ghostscript previewer, to view the generated Postscript. Use commands like

```% /u/cs126/bin/randpoints 100 | a.out >100.ps
% a.out </u/cs126/data/stars >stars.ps```

Follow the links above to see the expected output. `100.ps` should be 174 lines and `stars.ps` should be 6062 lines! Of course, you should test your program first on very small point sets.

`/u/cs126/bin/submit 8 readme gridlistcount.c gridclose.c`
Do not turn in `gridcount.c`.