Programming Assignment 5: Kd-Trees


Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support efficient range search (find all of the points contained in a query rectangle) and nearest neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval.

Range search and k-nearest neighbor


Geometric primitives. To get started, use the following geometric primitives for points and axis-aligned rectangles in the plane.

Geometric primitives

Use the immutable data type Point2D.java (part of algs4.jar) for points in the plane. Here is the subset of its API that you may use:

public class Point2D implements Comparable<Point2D> {
   public Point2D(double x, double y)              // construct the point (x, y)
   public  double x()                              // x-coordinate
   public  double y()                              // y-coordinate
   public  double distanceSquaredTo(Point2D that)  // square of Euclidean distance between two points
   public     int compareTo(Point2D that)          // for use in an ordered symbol table
   public boolean equals(Object that)              // does this point equal that?
   public    void draw()                           // draw to standard draw
   public  String toString()                       // string representation
}
Use the immutable data type RectHV.java (not part of algs4.jar) for axis-aligned rectangles. Here is the subset of its API that you may use:
public class RectHV {
   public    RectHV(double xmin, double ymin,      // construct the rectangle [xmin, xmax] x [ymin, ymax]
                    double xmax, double ymax)      // throw a java.lang.IllegalArgumentException if (xmin > xmax) or (ymin > ymax)
   public  double xmin()                           // minimum x-coordinate of rectangle
   public  double ymin()                           // minimum y-coordinate of rectangle
   public  double xmax()                           // maximum x-coordinate of rectangle
   public  double ymax()                           // maximum y-coordinate of rectangle
   public boolean contains(Point2D p)              // does this rectangle contain the point p (either inside or on boundary)?
   public boolean intersects(RectHV that)          // does this rectangle intersect that rectangle (at one or more points)?
   public  double distanceSquaredTo(Point2D p)     // square of Euclidean distance from point p to closest point in rectangle
   public boolean equals(Object that)              // does this rectangle equal that?
   public    void draw()                           // draw to standard draw
   public  String toString()                       // string representation
}
Do not modify these data types.

Brute-force implementation. Write a mutable data type PointSET.java that represents a set of points in the unit square. Implement the following API by using a red-black BST (use either SET from algs4.jar or java.util.TreeSet; do not try to implement your own red-black BST).

public class PointSET {
   public    PointSET()                            // construct an empty set of points
   public           boolean isEmpty()              // is the set empty?
   public               int size()                 // number of points in the set
   public              void insert(Point2D p)      // add the point p to the set (if it is not already in the set)
   public           boolean contains(Point2D p)    // does the set contain the point p?
   public              void draw()                 // draw all of the points to standard draw
   public Iterable<Point2D> range(RectHV rect)     // all points in the set that are inside the rectangle
   public           Point2D nearest(Point2D p)     // a nearest neighbor in the set to p; null if set is empty
   public static void main(String[] args)          // unit testing of the methods
}
Your implementation should support insert() and contains() in time proportional to the logarithm of the number of points in the set in the worst case; it should support nearest() and range() in time proportional to the number of points in the set.
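The brute-force approach can be illustrated with a minimal sketch, assuming a hypothetical stand-in class Pt in place of Point2D and a hypothetical class name BruteForceSketch, so that it compiles without algs4.jar. It is not the required PointSET API: range() takes four doubles here instead of a RectHV, and draw() is omitted. The point is only that java.util.TreeSet gives logarithmic insert() and contains(), while range() and nearest() scan every point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for Point2D so the sketch compiles without algs4.jar.
final class Pt implements Comparable<Pt> {
    final double x, y;
    Pt(double x, double y) { this.x = x; this.y = y; }

    double distanceSquaredTo(Pt that) {
        double dx = x - that.x, dy = y - that.y;
        return dx * dx + dy * dy;
    }

    // Order by y-coordinate, breaking ties by x-coordinate (an assumption of this sketch).
    @Override public int compareTo(Pt that) {
        int c = Double.compare(y, that.y);
        return c != 0 ? c : Double.compare(x, that.x);
    }

    @Override public String toString() { return "(" + x + ", " + y + ")"; }
}

final class BruteForceSketch {
    private final TreeSet<Pt> points = new TreeSet<>();   // red-black BST from java.util

    void insert(Pt p) { points.add(p); }                  // duplicates are ignored by the set
    boolean contains(Pt p) { return points.contains(p); }
    boolean isEmpty() { return points.isEmpty(); }
    int size() { return points.size(); }

    // Linear scan: keep every point inside or on the boundary of the query rectangle.
    List<Pt> range(double xmin, double ymin, double xmax, double ymax) {
        List<Pt> hits = new ArrayList<>();
        for (Pt p : points) {
            if (p.x >= xmin && p.x <= xmax && p.y >= ymin && p.y <= ymax) hits.add(p);
        }
        return hits;
    }

    // Linear scan: remember the point with the smallest squared distance seen so far.
    Pt nearest(Pt query) {
        Pt best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Pt p : points) {
            double d = p.distanceSquaredTo(query);
            if (d < bestDist) { bestDist = d; best = p; }
        }
        return best;   // null if the set is empty
    }

    public static void main(String[] args) {
        BruteForceSketch set = new BruteForceSketch();
        double[][] coords = { {0.7, 0.2}, {0.5, 0.4}, {0.2, 0.3}, {0.4, 0.7}, {0.9, 0.6} };
        for (double[] c : coords) set.insert(new Pt(c[0], c[1]));
        System.out.println("in range: " + set.range(0.1, 0.1, 0.6, 0.5));
        System.out.println("nearest:  " + set.nearest(new Pt(0.8, 0.5)));
    }
}
```

Comparing squared distances avoids a square root and gives the same ordering as comparing distances. In an actual submission, Point2D and RectHV.contains() would take the place of Pt and the four-double test.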

2d-tree implementation. Write a mutable data type KdTree.java that uses a 2d-tree to implement the same API (but replace PointSET with KdTree). A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates.
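The alternating-key rule can be sketched as follows, using the five example points (0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), and (0.9, 0.6). This is a minimal illustration under stated assumptions, not the required KdTree API: the class name TwoDTreeInsertSketch, the shape() helper, and the use of raw doubles instead of Point2D are all hypothetical, and sending ties to the right subtree is one possible convention.

```java
// A minimal sketch of 2d-tree insertion and search; all names here are hypothetical.
final class TwoDTreeInsertSketch {
    private static final class Node {
        final double x, y;
        Node left, right;   // left/bottom subtree and right/top subtree
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    private Node root;
    private int size;

    void insert(double x, double y) { root = insert(root, x, y, true); }

    // useX is true at even levels (compare x-coordinates) and false at odd levels (compare y).
    private Node insert(Node h, double x, double y, boolean useX) {
        if (h == null) { size++; return new Node(x, y); }
        if (h.x == x && h.y == y) return h;            // point already present
        boolean goLeft = useX ? x < h.x : y < h.y;     // ties go right (an assumption)
        if (goLeft) h.left  = insert(h.left,  x, y, !useX);
        else        h.right = insert(h.right, x, y, !useX);
        return h;
    }

    // Search follows the same path that insert would take.
    boolean contains(double x, double y) {
        Node h = root;
        boolean useX = true;
        while (h != null) {
            if (h.x == x && h.y == y) return true;
            boolean goLeft = useX ? x < h.x : y < h.y;
            h = goLeft ? h.left : h.right;
            useX = !useX;
        }
        return false;
    }

    int size() { return size; }

    // Preorder listing with L/R path labels, to make the tree shape visible.
    String shape() {
        StringBuilder sb = new StringBuilder();
        shape(root, "root", sb);
        return sb.toString().trim();
    }

    private void shape(Node h, String path, StringBuilder sb) {
        if (h == null) return;
        sb.append(path).append("=(").append(h.x).append(", ").append(h.y).append(") ");
        shape(h.left, path + ".L", sb);
        shape(h.right, path + ".R", sb);
    }

    public static void main(String[] args) {
        TwoDTreeInsertSketch tree = new TwoDTreeInsertSketch();
        double[][] coords = { {0.7, 0.2}, {0.5, 0.4}, {0.2, 0.3}, {0.4, 0.7}, {0.9, 0.6} };
        for (double[] c : coords) tree.insert(c[0], c[1]);
        System.out.println(tree.shape());
    }
}
```

With this insertion order, (0.5, 0.4) goes left of the root because 0.5 < 0.7; (0.2, 0.3) then goes left of (0.5, 0.4) because the second level compares y-coordinates and 0.3 < 0.4.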

Figure: the 2d-tree after inserting (0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), and (0.9, 0.6), in that order.

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest neighbor search. Each node corresponds to an axis-aligned rectangle in the unit square, which encloses all of the points in its subtree. The root corresponds to the unit square; the left and right children of the root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.
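One common way to exploit these rectangles is to skip any subtree whose rectangle does not intersect the query rectangle. The sketch below illustrates that idea under stated assumptions: the class name NodeRectSketch, the visited counter, and storing each rectangle as four doubles (rather than a RectHV) are hypothetical choices made so the example is self-contained, and the pruning rule shown is the standard technique rather than something this handout prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of per-node rectangles and range-search pruning; names are hypothetical.
final class NodeRectSketch {
    private static final class Node {
        final double x, y;
        final double xmin, ymin, xmax, ymax;   // rectangle enclosing every point in this subtree
        Node left, right;
        Node(double x, double y, double xmin, double ymin, double xmax, double ymax) {
            this.x = x; this.y = y;
            this.xmin = xmin; this.ymin = ymin; this.xmax = xmax; this.ymax = ymax;
        }
    }

    private Node root;
    int visited;   // nodes examined by the most recent range() call

    // The root's rectangle is the unit square.
    void insert(double x, double y) { root = insert(root, x, y, true, 0.0, 0.0, 1.0, 1.0); }

    // Each child inherits its parent's rectangle, cut by the parent's splitting coordinate.
    private Node insert(Node h, double x, double y, boolean useX,
                        double xmin, double ymin, double xmax, double ymax) {
        if (h == null) return new Node(x, y, xmin, ymin, xmax, ymax);
        if (h.x == x && h.y == y) return h;
        if (useX) {
            if (x < h.x) h.left  = insert(h.left,  x, y, false, xmin, ymin, h.x, ymax);
            else         h.right = insert(h.right, x, y, false, h.x, ymin, xmax, ymax);
        } else {
            if (y < h.y) h.left  = insert(h.left,  x, y, true, xmin, ymin, xmax, h.y);
            else         h.right = insert(h.right, x, y, true, xmin, h.y, xmax, ymax);
        }
        return h;
    }

    List<double[]> range(double qxmin, double qymin, double qxmax, double qymax) {
        visited = 0;
        List<double[]> hits = new ArrayList<>();
        range(root, qxmin, qymin, qxmax, qymax, hits);
        return hits;
    }

    private void range(Node h, double qxmin, double qymin, double qxmax, double qymax,
                       List<double[]> hits) {
        if (h == null) return;
        // Prune: no point in this subtree can lie in the query if the rectangles are disjoint.
        if (h.xmax < qxmin || h.xmin > qxmax || h.ymax < qymin || h.ymin > qymax) return;
        visited++;
        if (h.x >= qxmin && h.x <= qxmax && h.y >= qymin && h.y <= qymax) {
            hits.add(new double[] { h.x, h.y });
        }
        range(h.left,  qxmin, qymin, qxmax, qymax, hits);
        range(h.right, qxmin, qymin, qxmax, qymax, hits);
    }

    public static void main(String[] args) {
        NodeRectSketch tree = new NodeRectSketch();
        double[][] coords = { {0.7, 0.2}, {0.5, 0.4}, {0.2, 0.3}, {0.4, 0.7}, {0.9, 0.6} };
        for (double[] c : coords) tree.insert(c[0], c[1]);
        List<double[]> hits = tree.range(0.1, 0.1, 0.6, 0.5);
        System.out.println(hits.size() + " hits, " + tree.visited + " nodes visited");
    }
}
```

For the query rectangle [0.1, 0.6] x [0.1, 0.5], the right child of the root has rectangle [0.7, 1] x [0, 1], which lies entirely to the right of the query, so that subtree is never examined. Storing the rectangle in each node trades memory for simplicity; computing it during the traversal is an alternative design.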

Clients.  You may use the following interactive client programs to test and debug your code.

Analysis of running time and memory usage. Analyze the effectiveness of your approach to this problem by giving estimates of its time and space requirements.

Submission.  Submit only PointSET.java and KdTree.java. Each of the two data types should include its own main() that thoroughly tests the associated operations. We will supply Point2D.java, RectHV.java, stdlib.jar, and algs4.jar. You may not call any library functions other than those in java.lang, java.util, stdlib.jar, and algs4.jar. Finally, submit a readme.txt file and answer the questions.

Challenge for the bored.  Create a convincing boid simulator using a KdTree to track boids. You'll need to add a new method called kNearest that returns each boid's k nearest neighbors. Feel free to use the (uncommented and hacked together) starter code provided on the ftp site. It's not too hard to get my starter code running (and it looks cool), but the flocking really isn't quite there (e.g., boids seem to like to stay precisely evenly spaced), and the hawk maneuvers like a brick (largely due to my physics model, where the hawk gets a fixed amount of directional thrust; if you make the delta thrust large enough to allow maneuverability, then he can outrun any boid, which is no fun). Josh is the only one who will provide support for this optional part of the assignment. Working simulations should be emailed directly to Josh. Very good submissions that substantially improve the overall flocking behavior will earn a single point of extra credit. Code that simply fills in the blanks in BoidKdTree will not get extra credit (but still feel free to email me with such submissions along with any thoughts you may have on this challenge). Due by Dean's date.
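When developing a tree-based kNearest, a brute-force version can serve as a reference to check answers against. The sketch below is one such baseline, not the starter code and not a kd-tree method: the class name KNearestSketch, the signature, and the representation of points as double[] pairs are all hypothetical. It keeps the k closest points seen so far in a bounded max-heap keyed on squared distance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// A brute-force k-nearest-neighbor baseline; names and signature are hypothetical.
final class KNearestSketch {
    static double distSq(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // Returns up to k points closest to query, ordered from nearest to farthest.
    static List<double[]> kNearest(List<double[]> points, double[] query, int k) {
        if (k <= 0) return new ArrayList<>();
        Comparator<double[]> byDist = Comparator.comparingDouble(p -> distSq(p, query));
        // Max-heap on distance: the head is the farthest of the current candidates.
        PriorityQueue<double[]> heap = new PriorityQueue<>(k, byDist.reversed());
        for (double[] p : points) {
            heap.add(p);
            if (heap.size() > k) heap.poll();   // discard the farthest candidate
        }
        List<double[]> result = new ArrayList<>(heap);
        result.sort(byDist);
        return result;
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        double[][] coords = { {0.7, 0.2}, {0.5, 0.4}, {0.2, 0.3}, {0.4, 0.7}, {0.9, 0.6} };
        for (double[] c : coords) pts.add(c);
        for (double[] p : kNearest(pts, new double[] { 0.45, 0.5 }, 2)) {
            System.out.println("(" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

A tree-based version would typically keep the same bounded heap and use the distance to the current k-th candidate to decide which subtrees to skip.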

This assignment was developed by Kevin Wayne.