GEOMETRIC APPLICATIONS OF BINARY SEARCH TREES

Binary search trees
Consider a binary search tree with numerical keys (say doubles). Geometrically, we can view the tree as recursively subdividing the interval (-∞,∞):

• Each node is associated with an interval (a,b) that contains all points in the subtree below it
• The root is associated with the whole interval (-∞,∞)
• Each internal node divides its associated interval into two sub-intervals at the node's key. For example, if a node n is associated with the interval (a,b), then n.left is associated with the interval (a,n.key) and n.right is associated with the interval (n.key,b).
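
As an illustration of the interval view, here is a minimal Python sketch that computes the interval associated with every node of an ordinary (unbalanced) BST. The names Node, insert, and intervals are hypothetical helpers chosen for this example, and the sample keys are made up.

```python
import math


class Node:
    """A BST node holding a numerical key (hypothetical helper class)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Standard unbalanced BST insertion; returns the root of the subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node  # duplicate keys are ignored


def intervals(node, lo=-math.inf, hi=math.inf):
    """Return (key, (lo, hi)) pairs in preorder.

    The root gets (-inf, inf); each child inherits its parent's interval,
    cut at the parent's key.
    """
    if node is None:
        return []
    return ([(node.key, (lo, hi))]
            + intervals(node.left, lo, node.key)
            + intervals(node.right, node.key, hi))


root = None
for key in [5.0, 2.0, 8.0, 1.0, 3.0]:
    root = insert(root, key)

for key, interval in intervals(root):
    print(key, interval)
```

In this sample tree, the node with key 3.0 is reached by going left at 5.0 and right at 2.0, so its interval is (2.0, 5.0).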

kd-Trees
kd trees are a data structure for representing a collection of points in k-dimensional space. The geometric description of binary search trees above corresponds to the special case of 1d trees (also called 1-dimensional kd trees). kd trees recursively subdivide k-dimensional space as follows:

• Each node is associated with a bounding box that contains all points in the subtree below it
• The root is associated with the entire space
• Each internal node divides its associated bounding box in two at the hyperplane that passes through the point associated with that node and is perpendicular to the ith axis, where i is the level of the node mod k. For example, in a 2d tree, along any path from root to leaf, the nodes alternate between splitting the bounding box into left and right parts at node.key.x (even levels) and into bottom and top parts at node.key.y (odd levels).
See here for a visual depiction of bounding boxes of kd trees. Insertion and search run in worst-case Θ(log n) time in a balanced kd tree, and worst-case Θ(n) time in an unbalanced kd tree.
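
The subdivision rule can be sketched in Python as follows. This is an illustrative, unbalanced kd tree, not a reference implementation: the names KdNode, insert, contains, and boxes are hypothetical, points are assumed to be tuples of floats, and sending ties on the splitting coordinate to the right subtree is an arbitrary choice made for this example.

```python
import math


class KdNode:
    """A kd-tree node holding one point (a tuple of k floats)."""

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(node, point, depth=0):
    """Insert a point, comparing on axis (depth mod k) at each level."""
    if node is None:
        return KdNode(point)
    if point == node.point:
        return node  # ignore duplicates
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:  # ties on the splitting coordinate go right (an assumption)
        node.right = insert(node.right, point, depth + 1)
    return node


def contains(node, point):
    """Search follows the same axis-alternating path as insertion."""
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        axis = depth % len(point)
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return False


def boxes(node, lo, hi, depth=0):
    """Return (point, lo_corner, hi_corner) triples in preorder."""
    if node is None:
        return []
    axis = depth % len(node.point)
    split = node.point[axis]
    left_hi = hi[:axis] + (split,) + hi[axis + 1:]   # cut the box from above
    right_lo = lo[:axis] + (split,) + lo[axis + 1:]  # cut the box from below
    return ([(node.point, lo, hi)]
            + boxes(node.left, lo, left_hi, depth + 1)
            + boxes(node.right, right_lo, hi, depth + 1))


points = [(0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), (0.9, 0.6)]
root = None
for p in points:
    root = insert(root, p)

whole_space = ((-math.inf, -math.inf), (math.inf, math.inf))
for point, lo, hi in boxes(root, *whole_space):
    print(point, lo, hi)
```

Each call to boxes narrows exactly one coordinate of the parent's box, which is the k-dimensional analogue of cutting an interval at a BST node's key.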

Nearest neighbor search (see here) finds the point in the set that is geometrically closest to a given target point. Nearest neighbor search traverses the tree recursively starting from the root, (potentially) exploring both the left and right child of every node. Efficient nearest neighbor search is enabled by the following heuristics:

• pruning rule: if the current champion is closer to the target point than the bounding box of the current node is, then return; the nearest neighbor is not in the subtree rooted at the current node.
• optimistic ordering: after visiting a node, first visit the child that lies on the same side of the node's splitting hyperplane as the query point, then visit the other child.
Typical running time of nearest neighbor search is Θ(log n); worst-case is Θ(n).
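
A minimal Python sketch of nearest neighbor search using both heuristics is given below. It assumes an unbalanced kd tree of tuples and squared Euclidean distance; the names KdNode, insert, dist_sq, box_dist_sq, nearest, and nearest_neighbor are hypothetical and introduced only for this example.

```python
import math


class KdNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(node, point, depth=0):
    """Unbalanced kd-tree insertion; ties on the split axis go right."""
    if node is None:
        return KdNode(point)
    if point == node.point:
        return node
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def dist_sq(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def box_dist_sq(lo, hi, q):
    """Squared distance from q to the box [lo, hi]; 0 if q is inside."""
    return sum(max(l - x, 0.0, x - h) ** 2 for l, h, x in zip(lo, hi, q))


def nearest(node, target, lo, hi, depth=0, best=None):
    """Return the closest point found so far (the 'champion')."""
    if node is None:
        return best
    # Pruning rule: the champion is at least as close as this node's box.
    if best is not None and dist_sq(best, target) <= box_dist_sq(lo, hi, target):
        return best
    if best is None or dist_sq(node.point, target) < dist_sq(best, target):
        best = node.point
    axis = depth % len(target)
    split = node.point[axis]
    left = (node.left, lo, hi[:axis] + (split,) + hi[axis + 1:])
    right = (node.right, lo[:axis] + (split,) + lo[axis + 1:], hi)
    # Optimistic ordering: search the target's side of the split first.
    order = (left, right) if target[axis] < split else (right, left)
    for child, child_lo, child_hi in order:
        best = nearest(child, target, child_lo, child_hi, depth + 1, best)
    return best


def nearest_neighbor(root, target):
    """Start at the root, whose bounding box is the entire space."""
    k = len(target)
    return nearest(root, target, (-math.inf,) * k, (math.inf,) * k)


points = [(0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), (0.9, 0.6)]
root = None
for p in points:
    root = insert(root, p)

print(nearest_neighbor(root, (0.6, 0.65)))
```

Visiting the target's side first tends to find a good champion early, which in turn lets the pruning rule discard more of the opposite subtrees.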

Range search (see here) finds all the points in a kd tree that are contained within a given (k-dimensional) bounding box. Efficient range search is enabled by the following pruning heuristic: if the target range does not intersect the bounding box of the node, then return; none of the points in the subtree rooted at the current node can belong to the range. Typical running time of range search is Θ(log n + m), where n is the number of points in the tree and m is the number of matches; worst-case running time (assuming a balanced tree) is Θ(n^((d-1)/d) + m), where d is the number of dimensions (e.g., Θ(√n + m) for 2d trees).
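
The pruning heuristic for range search can be sketched in Python as follows. The sketch assumes an unbalanced kd tree of tuples and a closed query box given by its low and high corners; the names KdNode, insert, range_search, and points_in_range are hypothetical and introduced only for this example.

```python
import math


class KdNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(node, point, depth=0):
    """Unbalanced kd-tree insertion; ties on the split axis go right."""
    if node is None:
        return KdNode(point)
    if point == node.point:
        return node
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def range_search(node, q_lo, q_hi, lo, hi, depth, out):
    """Append to out every point in the subtree inside [q_lo, q_hi]."""
    if node is None:
        return
    # Pruning: the query box and the node's box are disjoint on some axis.
    if any(qh < l or ql > h for ql, qh, l, h in zip(q_lo, q_hi, lo, hi)):
        return
    if all(ql <= x <= qh for ql, x, qh in zip(q_lo, node.point, q_hi)):
        out.append(node.point)
    axis = depth % len(node.point)
    split = node.point[axis]
    left_hi = hi[:axis] + (split,) + hi[axis + 1:]
    right_lo = lo[:axis] + (split,) + lo[axis + 1:]
    range_search(node.left, q_lo, q_hi, lo, left_hi, depth + 1, out)
    range_search(node.right, q_lo, q_hi, right_lo, hi, depth + 1, out)


def points_in_range(root, q_lo, q_hi):
    """Start at the root, whose bounding box is the entire space."""
    k = len(q_lo)
    out = []
    range_search(root, q_lo, q_hi, (-math.inf,) * k, (math.inf,) * k, 0, out)
    return out


points = [(0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), (0.9, 0.6)]
root = None
for p in points:
    root = insert(root, p)

print(sorted(points_in_range(root, (0.3, 0.3), (0.6, 0.8))))
```

Treating both boxes as closed makes the intersection test slightly conservative at the splitting coordinate, so no point on a boundary is missed.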

Recommended Problems

C level

1. Suppose that a 3-d tree contains N nodes. What is the height of the tree in the worst (largest) and best (smallest) case?

B level

1. Suppose that a set of points is organized into a kd tree. Design an efficient algorithm for finding the nearest neighbor of a target point that lies within a given bounding box.