COS 226 Programming Assignment Checklist: Kd-Trees

Frequently Asked Questions

Can I add extra public methods to the Point2D and RectHV APIs? You are permitted (and encouraged) to add a main() test client, but you may not add any other public methods.

Is a point on the boundary of a rectangle considered inside it? Do two rectangle intersect if they have just one point in common? Yes, and yes.

What should I do if a point has the same x-coordinate as the point in a node when inserting / searching in a 2d-tree? Go the right subtree as specified.

What should be the format of the toString() methods in RectHV? These methods are primarily for debuginng, so any reasonable format is fine. We recommend [xmin, xmax] x [ymin, ymax].

Can I assume that the arguments to methods in RectHV are not null? You can assume that the arguments to contains(), intersects(), distanceTo(), and distanceSquaredTo() are not null. However, if the argument to equals() is null, you must return false to obey the contract. You can also assume that none of the arguments to the constructor are Double.NaN.

My implementation of distanceTo() in RectHV has numerous cases. Am I on the right track? This is acceptable provided you handle every possible case. However, there is a direct approach that requires about 6 lines of code.

What is the purpose of the distanceSquaredTo() methods in Point2D and RectHV? If you need to compare two distances, it is often more efficient to compare the squares of the two distances to avoid the expensive operation of taking square roots.

The suggested Node type in the checklist doesn't have an integer for storing size like BST Nodes. How should I keep track of tree size? The reason BST Nodes have size is to support the rank operation. For KdTree, there is no need to store subtree size. Instead, make size an instance variable of KdTree instead of Node.

When do I need private methods? Private methods are useful for keeping your code organized. One common organizational scheme is to recursively call private methods. See page 399 of the book or the code for BST.java. For example, the public insert() method for KdTree is best implemented using a supporting private insert() method which is called recursively. As with BST, this private method will have more arguments than the public method. Part of the assignment is deciding what arguments are needed.

Can I assume that all x- or y-coordinates of points inserted into the KdTree will be between 0 and 1? Yes. You may also assume that the insert(), contains(), and nearest() methods in KdTree are passed points with x- and y-coordinates between 0 and 1. You may also assume that the range() method in KdTree is passed a rectangle that lies in the unit box. However, the RectHV data type should not make such assumptions.

What should I do if a point is inserted twice in the data structure? The data structure represents a set of points, so you should keep only one copy.

How should I scale the coordinate system when drawing? Don't, please keep the default range of 0 to 1.

How should I set the size and color of the points and rectangles when drawing? Your RectHV data types should draw lines using StdDraw.line() or StdDraw.rectangle(), without calling StdDraw.setPenColor() or StdDraw.setPenRadius(). In client code (such as the draw() method in KdTree), use StdDraw.setPenColor() before drawing the points and splitting lines. Also use StdDraw.setPenRadius(.01) before drawing the points and StdDraw.setPenRadius() before drawing the lines.

Am I required to explicitly store a RectHV in each node in my 2d-tree? No, this is not a requirement, but it is probably wise in your first version.

What should range() return if there are no points in the range? It should return an Iterable<Point2D> object with zero points.

How much memory does a Point2D object use? For simplicity, assume that each Point2D object uses 32 bytes—in reality, it uses a bit more because of the Comparator instance variables.

How much memory does a RectHV object use? It depends on your implementation. It is your job to analyze it.

I run out of memory when running some of the large sample files. What should I do? Be sure to ask Java for additional memory, e.g., java -Xmx1600m RangeSearchVisualizer < input1M.txt.

Testing and Submitting

Testing.   We highly recommend testing each method in each data type. Catching mistakes in RectHV early will save you significant time when debugging later.

Sample input files.   We provide some input files for testing

The result of running the KdTreeVisualizer on the points in circle10.txt should look like the following:

If nearest is then called with p = (.81, .30) the number of nodes visited in order to find that F is nearest is 5.

Starting with circle10k.txt if nearest is called with p = (.81, .30) the number of nodes visited in order to find that K is nearest is 6.

Possible Progress Steps

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Getting started. Download the directory kdtree to your system. It contains some sample input files and test clients that you will use.

  2. Node data type. There are several reasonable ways to represent a node in a 2d-tree. One approach is to include the point, a link to the left/bottom subtree, a link to the right/top subtree, and an axis-aligned rectangle corresponding to the node.
    private static class Node {
       private Point2D p;      // the point
       private RectHV rect;    // the axis-aligned rectangle corresponding to this node
       private Node lb;        // the left/bottom subtree
       private Node rt;        // the right/top subtree
    }
    
    Unlike the Node class for BST, this Node class can be static because it does not refer to a generic Key or Value type that depends on the object associated with the outer class. This saves the 8-byte inner class object overhead.

  3. Writing KdTree. Start by writing isEmpty() and size(). These should be very easy. From there, write a simplified version of insert() which does everything except set up the RectHV for each node. Write the contains() method, and use this to test that insert() was implemented properly. Note that insert() and contains() are best implemented by using private helper methods analogous to those found on page 399 of the book or by looking at BST.java. We recommend using orientation as an argument to these helper methods.

    Now add the code to insert() which sets up the RectHV for each Node. Next, write draw(), and use this to test these rectangles. Finish up KdTree with the nearest and range methods. Finally, test your implementation using our interactive client programs as well as any other tests you'd like to conduct.

Optimizations

These are some ways to speed up your 2d-tree.