COS 226 Programming Assignment

Kd-Trees


Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support efficient range search (find all of the points contained in a query rectangle) and nearest neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval.

Range search and k-nearest neighbor


Geometric primitives. To get started, implement a geometric primitive for axis-aligned rectangles in the plane.

Use the immutable data type Point2D.java for points in the plane, which implements a superset of the following API:

public class Point2D implements Comparable<Point2D> {
   public Point2D(double x, double y)              // construct the point (x, y)
   public double x()                               // x-coordinate
   public double y()                               // y-coordinate
   public double distanceTo(Point2D that)          // Euclidean distance between two points
   public double distanceSquaredTo(Point2D that)   // square of Euclidean distance between two points
   public int compareTo(Point2D that)              // for use in an ordered symbol table
   public boolean equals(Object that)              // does this point equal that?
   public void draw()                              // draw to standard draw
   public String toString()                        // string representation
}

Write an immutable data type RectHV.java that implements the following API:

public class RectHV {
   public RectHV(double xmin, double ymin,         // construct the rectangle [xmin, xmax] x [ymin, ymax]
                 double xmax, double ymax)         // throw a java.lang.IllegalArgumentException if (xmin > xmax) or (ymin > ymax)
   public double xmin()                            // minimum x-coordinate of rectangle
   public double ymin()                            // minimum y-coordinate of rectangle
   public double xmax()                            // maximum x-coordinate of rectangle
   public double ymax()                            // maximum y-coordinate of rectangle
   public boolean contains(Point2D p)              // does this rectangle contain the point p (either inside or on boundary)?
   public boolean intersects(RectHV that)          // does this rectangle intersect that rectangle (at one or more points)?
   public double distanceTo(Point2D p)             // Euclidean distance from point p to the closest point in rectangle
   public double distanceSquaredTo(Point2D p)      // square of Euclidean distance from point p to closest point in rectangle
   public boolean equals(Object that)              // does this rectangle equal that?
   public void draw()                              // draw to standard draw
   public String toString()                        // string representation
}
Thoroughly test your data type before proceeding.
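As a rough illustration of the geometry behind contains(), intersects(), and distanceSquaredTo(), here is a minimal sketch. It is not the required implementation: RectSketch is a hypothetical stand-in for RectHV, and points are passed as raw (x, y) doubles instead of Point2D so that the example has no library dependencies.

```java
// Illustrative sketch only. RectSketch is a hypothetical name, and points are
// passed as raw coordinates (an assumption) rather than as Point2D objects.
final class RectSketch {
    private final double xmin, ymin, xmax, ymax;

    RectSketch(double xmin, double ymin, double xmax, double ymax) {
        if (xmin > xmax || ymin > ymax)
            throw new IllegalArgumentException("invalid rectangle");
        this.xmin = xmin;
        this.ymin = ymin;
        this.xmax = xmax;
        this.ymax = ymax;
    }

    // A point on the boundary counts as contained.
    boolean contains(double x, double y) {
        return x >= xmin && x <= xmax && y >= ymin && y <= ymax;
    }

    // Two closed rectangles intersect when their x-intervals overlap
    // and their y-intervals overlap (touching at a single point counts).
    boolean intersects(RectSketch that) {
        return this.xmax >= that.xmin && that.xmax >= this.xmin
            && this.ymax >= that.ymin && that.ymax >= this.ymin;
    }

    // Squared distance from (x, y) to the closest point of the rectangle;
    // zero when the point lies inside or on the boundary.
    double distanceSquaredTo(double x, double y) {
        double dx = 0.0;
        double dy = 0.0;
        if (x < xmin) dx = xmin - x;
        else if (x > xmax) dx = x - xmax;
        if (y < ymin) dy = ymin - y;
        else if (y > ymax) dy = y - ymax;
        return dx * dx + dy * dy;
    }

    double distanceTo(double x, double y) {
        return Math.sqrt(distanceSquaredTo(x, y));
    }
}
```

Working with the squared distance avoids a square root when only comparisons between distances are needed.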

Brute-force implementation. Write a mutable data type PointSET.java that represents a set of points in the unit square. Implement the following API by using a red-black BST (using either SET or java.util.TreeSet).

public class PointSET {
   public PointSET()                               // construct an empty set of points
   public boolean isEmpty()                        // is the set empty?
   public int size()                               // number of points in the set
   public void insert(Point2D p)                   // add the point p to the set (if it is not already in the set)
   public boolean contains(Point2D p)              // does the set contain the point p?
   public void draw()                              // draw all of the points to standard draw
   public Iterable<Point2D> range(RectHV rect)     // all points in the set that are inside the rectangle
   public Point2D nearest(Point2D p)               // a nearest neighbor in the set to p; null if set is empty
}
Your implementation should support insert() and contains() in time proportional to the logarithm of the number of points in the set; it should support nearest() and range() in time proportional to the number of points in the set.
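The brute-force approach might look roughly like the following sketch, which is offered as an illustration rather than a reference solution. The names Pt and BruteForceSketch are hypothetical; Pt is a minimal stand-in for Point2D, and the query rectangle is passed as four doubles instead of a RectHV. The ordering used in compareTo (by y, then by x) is an assumption; any total order on points would serve.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical minimal point type standing in for Point2D.
final class Pt implements Comparable<Pt> {
    final double x, y;

    Pt(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double distanceSquaredTo(Pt that) {
        double dx = x - that.x;
        double dy = y - that.y;
        return dx * dx + dy * dy;
    }

    // Assumed ordering: by y-coordinate, breaking ties by x-coordinate.
    @Override
    public int compareTo(Pt that) {
        int c = Double.compare(y, that.y);
        return c != 0 ? c : Double.compare(x, that.x);
    }
}

// Hypothetical brute-force set backed by java.util.TreeSet (a red-black BST).
final class BruteForceSketch {
    private final TreeSet<Pt> points = new TreeSet<>();

    boolean isEmpty() { return points.isEmpty(); }
    int size() { return points.size(); }
    void insert(Pt p) { points.add(p); }                  // logarithmic; duplicates ignored
    boolean contains(Pt p) { return points.contains(p); } // logarithmic

    // Linear scan: test every point against [xmin, xmax] x [ymin, ymax].
    List<Pt> range(double xmin, double ymin, double xmax, double ymax) {
        List<Pt> hits = new ArrayList<>();
        for (Pt p : points)
            if (p.x >= xmin && p.x <= xmax && p.y >= ymin && p.y <= ymax)
                hits.add(p);
        return hits;
    }

    // Linear scan: keep the point with the smallest squared distance so far.
    Pt nearest(Pt query) {
        Pt best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Pt p : points) {
            double d = p.distanceSquaredTo(query);
            if (d < bestDist) {
                bestDist = d;
                best = p;
            }
        }
        return best; // null if the set is empty
    }
}
```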

2d-tree implementation. Write a mutable data type KdTree.java that uses a 2d-tree to implement the same API as PointSET. A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence.

insert (0.7, 0.2)
insert (0.5, 0.4)
insert (0.2, 0.3)
insert (0.4, 0.7)
insert (0.9, 0.6)
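To make the alternating-key idea concrete, the following is a minimal sketch of insertion and lookup, using the insertion sequence above as a test case. It rests on several assumptions: TwoDTreeSketch and Node are hypothetical names, points are stored as raw (x, y) doubles rather than Point2D, and a point whose key ties with the node's key is sent to the right subtree.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class TwoDTreeSketch {
    private static final class Node {
        final double x, y;
        Node left, right;

        Node(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    private Node root;
    private int size;

    int size() { return size; }

    void insert(double x, double y) {
        root = insert(root, x, y, true);
    }

    // useX is true at even depths (compare x-coordinates)
    // and false at odd depths (compare y-coordinates).
    private Node insert(Node node, double x, double y, boolean useX) {
        if (node == null) {
            size++;
            return new Node(x, y);
        }
        if (node.x == x && node.y == y) return node; // already in the set
        double key = useX ? x : y;
        double nodeKey = useX ? node.x : node.y;
        if (key < nodeKey) node.left = insert(node.left, x, y, !useX);
        else node.right = insert(node.right, x, y, !useX); // ties go right (assumption)
        return node;
    }

    // Follows the same path that insert() would take for (x, y).
    boolean contains(double x, double y) {
        Node node = root;
        boolean useX = true;
        while (node != null) {
            if (node.x == x && node.y == y) return true;
            double key = useX ? x : y;
            double nodeKey = useX ? node.x : node.y;
            node = key < nodeKey ? node.left : node.right;
            useX = !useX;
        }
        return false;
    }
}
```

With the five points above, (0.7, 0.2) becomes the root; (0.5, 0.4) goes left because 0.5 < 0.7; (0.2, 0.3) goes left of the root and then left again because 0.3 < 0.4; (0.4, 0.7) goes left of the root and then right because 0.7 is not less than 0.4; and (0.9, 0.6) goes right of the root.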

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest neighbor search. Each node corresponds to an axis-aligned rectangle in the unit square, which encloses all of the points in its subtree. The root corresponds to the unit square; the left and right children of the root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.
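One way to exploit the node rectangles is sketched below for range search: a subtree is skipped whenever its rectangle does not intersect the query rectangle. This is a common pruning strategy, shown here as an assumption rather than as the prescribed method. All names (RangeSearchSketch, Node, visited) are hypothetical, rectangles are passed as four doubles instead of RectHV, and each node's rectangle is computed during the recursion rather than stored in the node.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and representation are assumptions.
final class RangeSearchSketch {
    private static final class Node {
        final double x, y;
        Node left, right;

        Node(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    private Node root;
    int visited; // nodes examined by the most recent range() call (for illustration)

    void insert(double x, double y) {
        root = insert(root, x, y, true);
    }

    private Node insert(Node node, double x, double y, boolean useX) {
        if (node == null) return new Node(x, y);
        if (node.x == x && node.y == y) return node;
        double key = useX ? x : y;
        double nodeKey = useX ? node.x : node.y;
        if (key < nodeKey) node.left = insert(node.left, x, y, !useX);
        else node.right = insert(node.right, x, y, !useX);
        return node;
    }

    // Returns every stored point inside [qxmin, qxmax] x [qymin, qymax] as {x, y}.
    List<double[]> range(double qxmin, double qymin, double qxmax, double qymax) {
        List<double[]> hits = new ArrayList<>();
        visited = 0;
        double[] q = {qxmin, qymin, qxmax, qymax};
        range(root, true, 0.0, 0.0, 1.0, 1.0, q, hits); // the root's rectangle is the unit square
        return hits;
    }

    // [xmin, xmax] x [ymin, ymax] is the rectangle corresponding to node.
    private void range(Node node, boolean useX,
                       double xmin, double ymin, double xmax, double ymax,
                       double[] q, List<double[]> hits) {
        if (node == null) return;
        // Prune: no point in this subtree can lie in the query rectangle.
        if (xmax < q[0] || q[2] < xmin || ymax < q[1] || q[3] < ymin) return;
        visited++;
        if (node.x >= q[0] && node.x <= q[2] && node.y >= q[1] && node.y <= q[3])
            hits.add(new double[] {node.x, node.y});
        if (useX) { // vertical split at node.x
            range(node.left, false, xmin, ymin, node.x, ymax, q, hits);
            range(node.right, false, node.x, ymin, xmax, ymax, q, hits);
        } else {    // horizontal split at node.y
            range(node.left, true, xmin, ymin, xmax, node.y, q, hits);
            range(node.right, true, xmin, node.y, xmax, ymax, q, hits);
        }
    }
}
```

Computing rectangles on the fly saves memory per node at the cost of passing extra arguments; storing a rectangle in each node is an equally reasonable design.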

Clients.  You may use the following interactive client programs to test and debug your code.

Analysis.  Analyze your approach to this problem, giving estimates of its time and space requirements, by answering the relevant questions in the readme.txt file. In particular:

Submission.  Submit RectHV.java, PointSET.java, and KdTree.java, along with any other files needed by your program (excluding those in stdlib.jar and algs4.jar). Finally, submit a readme.txt file with your answers to the questions.

This assignment was developed by Kevin Wayne.