Binary search trees
Consider a binary search tree with numerical keys (say doubles). Geometrically, we can view the tree as recursively subdividing the interval (-∞,∞):

kd trees are a data structure for representing a collection of points in k-dimensional space. The geometric description of binary search trees above corresponds to the special case of 1d trees (also called 1-dimensional kd trees). kd trees recursively subdivide k-dimensional space as follows:

See here for a visual depiction of bounding boxes of kd trees. Insertion and search run in worst-case Θ(log n) time in a balanced kd tree, and worst-case Θ(n) time in an unbalanced kd tree.
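
The insertion and search behavior described above can be sketched as follows. This is a minimal illustration rather than a reference implementation. It assumes that points are tuples of floats, that the splitting dimension cycles with depth (depth mod k), and that ties on the splitting coordinate go to the right subtree. The names `Node`, `insert`, `contains`, and `height` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Insert point into the subtree rooted at node and return the subtree root."""
    if node is None:
        return Node(point)
    if point == node.point:
        return node  # assumption: duplicates are ignored
    axis = depth % len(point)  # assumption: splitting dimension cycles with depth
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)  # ties go right
    return node


def contains(node: Optional[Node], point: Point) -> bool:
    """Follow a single root-to-leaf path, comparing one coordinate per level."""
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        axis = depth % len(point)
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return False


def height(node: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# Inserting points that increase in every coordinate yields a chain,
# which is the unbalanced case where insertion and search take linear time.
chain = None
for i in range(8):
    chain = insert(chain, (float(i), float(i)))
print(height(chain))  # 8 for this insertion order
```

Both operations visit one node per level, so their cost is proportional to the height of the tree, which is why balance matters.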

Nearest neighbor search (see here) finds the point in the set that is geometrically closest to a given target point. Nearest neighbor search traverses the tree recursively starting from the root, (potentially) exploring both the left and right child of every node. Efficient nearest neighbor search is enabled by the following heuristics:

Typical running time of nearest neighbor search is Θ(log n); worst-case is Θ(n).
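
A hedged sketch of nearest neighbor search with two commonly used heuristics: explore the child on the same side of the splitting line as the target first, then visit the other child only if it could still hold a closer point. For brevity, this sketch prunes using the distance to the splitting line rather than the distance to the full bounding box, which is a simplification and prunes somewhat less. The names `Node`, `insert`, `sq_dist`, and `nearest` are hypothetical, and the cycling split dimension is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Standard kd tree insertion; assumes the split dimension is depth mod k."""
    if node is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (avoids a square root when only comparing)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], target: Point, depth: int = 0,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to target, or None for an empty tree."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    axis = depth % len(target)
    diff = target[axis] - node.point[axis]
    # Heuristic 1: search the side that contains the target first.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, depth + 1, best)
    # Heuristic 2: every point on the far side is at least |diff| away,
    # so skip that side unless the splitting line is closer than the champion.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


root = None
for p in [(5.0, 4.0), (2.0, 7.0), (8.0, 1.0), (1.0, 3.0), (9.0, 9.0), (4.0, 6.0)]:
    root = insert(root, p)
print(nearest(root, (4.0, 5.0)))
```

Searching the near side first tends to find a good champion early, which makes the pruning test succeed more often on the way back up.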

Range search (see here) finds all the points in a kd tree that are contained within a given (k-dimensional) bounding box. Efficient range search is enabled by the following pruning heuristic: if the target range does not intersect the bounding box of the node, then return -- none of the points in the tree below the current node may belong to the range. Typical running time of range search is Θ(log n + m) where n is the number of points in the tree and m is the number of matches; worst-case running time (assuming a balanced tree) is Θ(n^((d-1)/d) + m) (e.g., Θ(√n + m) for 2d trees) where d is the number of dimensions.
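
The pruning heuristic for range search can be sketched as below. This is an illustrative sketch under stated assumptions: the query range is given by its lower and upper corners `lo` and `hi` (inclusive), each node's bounding box is derived on the way down instead of being stored in the node, and the split dimension cycles with depth. The names `Node`, `insert`, and `range_search` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Standard kd tree insertion; assumes the split dimension is depth mod k."""
    if node is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def range_search(node: Optional[Node], lo: Sequence[float], hi: Sequence[float],
                 depth: int = 0, box=None, found: Optional[List[Point]] = None) -> List[Point]:
    """Collect every point p with lo[i] <= p[i] <= hi[i] in all dimensions."""
    k = len(lo)
    if found is None:
        found = []
    if box is None:
        # The root's bounding box is all of k-dimensional space.
        box = ([-math.inf] * k, [math.inf] * k)
    if node is None:
        return found
    box_lo, box_hi = box
    # Pruning: if the query range misses this node's bounding box,
    # nothing in this subtree can match.
    if any(box_lo[i] > hi[i] or box_hi[i] < lo[i] for i in range(k)):
        return found
    if all(lo[i] <= node.point[i] <= hi[i] for i in range(k)):
        found.append(node.point)
    axis = depth % k
    split = node.point[axis]
    # Each child's box is the parent's box cut at the splitting coordinate.
    left_hi = list(box_hi)
    left_hi[axis] = split
    right_lo = list(box_lo)
    right_lo[axis] = split
    range_search(node.left, lo, hi, depth + 1, (box_lo, left_hi), found)
    range_search(node.right, lo, hi, depth + 1, (right_lo, box_hi), found)
    return found


root = None
for p in [(5.0, 4.0), (2.0, 7.0), (8.0, 1.0), (1.0, 3.0), (9.0, 9.0), (4.0, 6.0)]:
    root = insert(root, p)
print(sorted(range_search(root, (1.0, 3.0), (5.0, 7.0))))
```

Deriving the boxes during the descent keeps the nodes small; storing a box in each node is an equally reasonable design that trades memory for simpler recursion.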

Recommended Problems

C level

  1. Suppose that a 3-d tree contains N nodes. What is the height of the tree in the worst (largest) and best (smallest) case?

B level

  1. Suppose that a set of points is organized into a kd tree. Design an efficient algorithm for finding the nearest neighbor of a target point that lies within a given bounding box.
  2. Consider the set of points (0.1, 9.1), (2.0, 1.4), (6.7, 5.7), (0.2, 0.2), (4.3, 2.7), (1.1, 5.7), (5.1, 8.7). In what order should they be inserted into a 2d tree in order to minimize its height?
  3. Answers
  4. Given a sequence of N intervals (a1, b1), ..., (aN, bN), design an O(N log N) sweep-line algorithm to find a value x that is contained within the maximum number of intervals. You may assume that no two endpoints have the same value.
    1. What are the events?
    2. How do you implement the sweep line?
    3. What data structure stores the set of intervals that intersect the sweep line?
    4. How does your sweep-line algorithm work, i.e., how do you process each event?

A level

  1. Suppose that you are given a set S of N 2-dimensional points. Design an algorithm to construct a 2d tree of height Θ(log N) that contains S, in Θ(N log N) time. Is it possible to construct a 2d tree of height ~log N?
  2. Answers