Algorithm Development. Developing a good algorithm is an iterative process. We create a model of the problem, develop an algorithm, and revise the algorithm until its performance meets our needs.

Dynamic connectivity problem. The problem is defined on an undirected graph with N vertices. There are two operations: add an edge and determine whether two vertices are connected by a path. Connectedness is an equivalence relation. This implies that we can partition the vertices into sets such that every vertex is in exactly one set and two vertices are connected if and only if they are in the same set. This problem motivates the union-find data type.

Union-Find. The goal is to develop a data type that supports the following two core operations on disjoint sets over the elements { 0, 1, 2, ..., N − 1 }:

The call union(p, q) merges the sets containing p and q; the call find(p) returns an identifier for the set containing element p.

Quick find. This is the most natural solution, where each element is given an explicit identifier that indicates which set it belongs to. We use an array id[] of length N, where id[i] is the identifier of element i (which is returned by find(i)). To union two objects p and q, we set every element with p's identifier to have q's identifier.
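The description above can be sketched in Java roughly as follows. This is a minimal illustration, not the booksite code: the class name QuickFindSketch is made up for this guide, and argument validation is omitted.

```java
// Illustrative quick-find sketch; the class name is hypothetical.
class QuickFindSketch {
    private final int[] id;   // id[i] = identifier of the set containing i

    QuickFindSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;   // every element starts in its own set
    }

    // Constant time: a single array access.
    int find(int p) {
        return id[p];
    }

    // Linear time: scans the whole array.
    void union(int p, int q) {
        int pid = id[p];   // saved up front, because id[p] is overwritten in the loop
        int qid = id[q];
        if (pid == qid) return;
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }

    public static void main(String[] args) {
        QuickFindSketch uf = new QuickFindSketch(5);
        uf.union(0, 1);
        uf.union(1, 2);
        System.out.println(uf.find(0) == uf.find(2));   // true
        System.out.println(uf.find(0) == uf.find(3));   // false
    }
}
```

Note that union() reads id[p] into a local variable before the loop. Comparing against id[p] inside the loop is a classic bug, since id[p] changes partway through the scan.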

Performing M union-find operations takes time proportional to MN in the worst case. If M is proportional to N, this results in time proportional to N^2.

Quadratic algorithms don't scale. Given an N times larger problem on an N times faster computer, the problem takes N times as long to run.

Quick union. We store the elements in a forest of trees, with the elements in each tree corresponding to a different set. We store the parent pointers in an array, where parent[i] is the parent in the tree of element i. We use the root of the tree as the set identifier. By convention, we set the parent pointer of a root to itself. The find() method climbs the ladder of parents until it reaches the root (an object whose parent is itself). To union p and q, we set the root of p to point to the root of q.
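A minimal Java sketch of the parent-pointer idea is given here. It is an illustration under the conventions stated above, not the booksite code: the class name QuickUnionSketch is made up, and argument validation is omitted.

```java
// Illustrative quick-union sketch; the class name is hypothetical.
class QuickUnionSketch {
    private final int[] parent;   // parent[i] = parent of i; a root is its own parent

    QuickUnionSketch(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // n single-node trees
    }

    // Follow parent pointers until reaching a root.
    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    // Make the root of p's tree point to the root of q's tree.
    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        parent[rootP] = rootQ;
    }

    public static void main(String[] args) {
        QuickUnionSketch uf = new QuickUnionSketch(5);
        uf.union(0, 1);
        uf.union(1, 2);
        System.out.println(uf.find(0));                 // 2
        System.out.println(uf.find(0) == uf.find(4));   // false
    }
}
```

Nothing here keeps the trees short. A sequence such as union(0, 1), union(1, 2), union(2, 3), ... builds one long chain, which is where the worst-case cost comes from.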

Performing M union-find operations takes time proportional to MN in the worst case, because a tree can degenerate into a chain of N nodes. Again, this results in quadratic behavior.

Weighted quick union (union-by-size). Rather than union(p, q) making the root of p point to the root of q, we instead make the root of the smaller tree point to the root of the larger one. The size of a tree is the number of nodes. Using union-by-size, the height of each tree is at most lg N (you should understand this proof). (An alternative strategy, known as union-by-height, uses the height of the tree instead of the size.)

Warning: if the two trees have the same size, the code has the opposite convention from quick union and sets the root of the second tree to point to the root of the first tree.
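Union-by-size, including the tie-breaking convention from the warning above, might look like the following sketch. The class name WeightedQuickUnionSketch and the size[] array name are choices made for this illustration, so check the booksite for the reference implementation.

```java
// Illustrative weighted quick-union (union-by-size) sketch; names are hypothetical.
class WeightedQuickUnionSketch {
    private final int[] parent;   // parent[i] = parent of i; a root is its own parent
    private final int[] size;     // size[r] = number of nodes in the tree rooted at r

    WeightedQuickUnionSketch(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;          // p's tree is strictly smaller
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;          // on a tie, q's root points to p's root
            size[rootP] += size[rootQ];
        }
    }

    public static void main(String[] args) {
        WeightedQuickUnionSketch uf = new WeightedQuickUnionSketch(4);
        uf.union(0, 1);                     // tie: 1 points to 0
        uf.union(2, 0);                     // tree of 2 is smaller: 2 points to 0
        System.out.println(uf.find(1));     // 0
        System.out.println(uf.find(2));     // 0
    }
}
```

Only the size entry of a root is meaningful. Entries for non-root nodes go stale after a union and are never read again.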

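Weighted quick union with path compression. When find is called, the nodes examined on the path to the root are relinked to point directly at the root. This results in nearly flat trees. Starting from N single-element sets, making M calls to union and find takes a number of array accesses at most proportional to N + M log*(N), where log*(N) is the number of times lg must be applied to N before the result is at most 1. For any conceivable value of N in this universe, log*(N) is at most 5.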
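One way to add path compression is a two-pass find(): first locate the root, then relink every node on the path to it. The sketch below assumes that variant (other variants, such as path halving, also exist). The class name and the parentOf() helper are hypothetical and exist only so that the effect of compression can be observed.

```java
// Illustrative weighted quick-union with two-pass path compression; names are hypothetical.
class PathCompressionSketch {
    private final int[] parent;   // parent[i] = parent of i; a root is its own parent
    private final int[] size;     // size[r] = number of nodes in the tree rooted at r

    PathCompressionSketch(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        int root = p;
        while (root != parent[root]) root = parent[root];   // pass 1: locate the root
        while (p != root) {                                  // pass 2: relink the path
            int next = parent[p];
            parent[p] = root;
            p = next;
        }
        return root;
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;          // on a tie, q's root points to p's root
            size[rootP] += size[rootQ];
        }
    }

    // Hypothetical helper for inspection only; not part of the union-find API.
    int parentOf(int p) {
        return parent[p];
    }

    public static void main(String[] args) {
        PathCompressionSketch uf = new PathCompressionSketch(4);
        uf.union(0, 1);
        uf.union(2, 3);
        uf.union(0, 2);                        // 3 -> 2 -> 0
        System.out.println(uf.parentOf(3));    // 2
        uf.find(3);                            // compresses the path from 3
        System.out.println(uf.parentOf(3));    // 0
    }
}
```

Because union() calls find(), compression happens during unions as well as during explicit find() calls.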

Recommended Problems

C level

  1. What are the best-case and worst-case tree heights for weighted quick-union and weighted quick-union with path compression? Give your answers in terms of order of growth.
  2. Textbook: 1.5.1, 1.5.2, 1.5.3

B level

  1. Fall 11 Midterm, #1
  2. Fall 12 Midterm, #1
  3. Textbook: 1.5.8
  4. Textbook: 1.5.9

A level

  1. Textbook: 1.5.10
  2. If we're concerned about tree height, why don't we use height instead of weight (size) to decide which root becomes the child? What is the worst-case tree height for weighted-quick-union vs. heighted-quick-union? What is the average tree height?
  3. Try writing weighted-quick-union-with-path-compression without looking at the code on the booksite. You may look at the API. Compare your resulting code with the booksite code.