Algorithm Development. Developing a good algorithm is an iterative process. We create a model of the problem, develop an algorithm, and revise it until its performance meets our needs.

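Union-Find. The ultimate goal is to develop a data type that supports the following operations on a fixed number N of objects: union(p, q), which connects objects p and q, and connected(p, q), which reports whether p and q are connected.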

We do not care about finding the actual path between p and q. We care only about their connectedness. A third operation we can support, find(p), is very closely related to connected().

The find() method is defined so that find(p) == find(q) iff connected(p, q).
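One way to picture the API described so far is as a Java interface. This is only a sketch: the name UnionFindApi and the choice to define connected() in terms of find() are assumptions made for illustration, not the textbook's code.

```java
// Hypothetical interface name; the three operations are the ones described in the text.
interface UnionFindApi {
    // Merge the bucket containing p with the bucket containing q.
    void union(int p, int q);

    // Return an identifier for the bucket containing p.
    int find(int p);

    // Two objects are connected exactly when find() gives them the same identifier.
    default boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```

Any implementation that keeps find() consistent gets connected() for free, which is why the rest of these notes focus on find() and union().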

Key observation: connectedness is an equivalence relation. Saying that two objects are connected is the same as saying they are in the same equivalence class. This is just fancy math talk for saying "every object is in exactly one bucket, and we want to know if two objects are in the same bucket". When you union two objects, you're basically just pouring everything from one bucket into another.

Quick find. This is the most natural solution, where each object is given an explicit bucket number. It uses an array id[] of length N, where id[i] is the bucket number of object i (the value returned by find(i)). To union two objects p and q, we set every object in p's bucket to have q's bucket number.
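The description above can be sketched in Java roughly as follows. The class name QuickFindSketch is an assumption for illustration; this is a minimal version, not the booksite implementation.

```java
// Sketch of quick find; the class name is hypothetical.
class QuickFindSketch {
    private final int[] id; // id[i] = bucket number of object i

    QuickFindSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // every object starts in its own bucket
    }

    // Constant time: the bucket number is stored directly.
    int find(int p) {
        return id[p];
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Linear time: scans the whole array to relabel p's bucket with q's number.
    void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        if (pid == qid) return; // already in the same bucket
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) id[i] = qid;
        }
    }

    public static void main(String[] args) {
        QuickFindSketch uf = new QuickFindSketch(5);
        uf.union(0, 1);
        uf.union(1, 2);
        System.out.println(uf.connected(0, 2)); // true
        System.out.println(uf.connected(0, 3)); // false
    }
}
```

Note that pid is saved before the loop; reading id[p] inside the loop would go wrong once id[p] itself has been overwritten.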

Performing M union-find operations takes time proportional to MN in the worst case. If M is proportional to N, this results in N^2 time.

Quadratic algorithms don't scale. Given a problem N times larger on a computer N times faster, a quadratic algorithm takes N times as long to run.
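A short worked version of that claim, writing k for the growth factor (the text uses N for it) and c for an assumed constant of proportionality:

```latex
\begin{align*}
T(N) &= c N^2 && \text{quadratic running time on the original machine} \\
T'(kN) &= \frac{c\,(kN)^2}{k} = k \cdot c N^2 = k\,T(N) && \text{problem and machine both scaled by } k
\end{align*}
```

The faster machine cancels only one of the two factors of k, so the run still takes k times as long.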

Quick union. id[i] is the parent object of object i. An object can be its own parent. The find() method climbs the ladder of parents until it reaches the root (an object whose parent is itself). To union p and q, we set the root of p to point to the root of q.
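A minimal Java sketch of this parent-link idea is below. The class name QuickUnionSketch is an assumption for illustration, not the booksite code.

```java
// Sketch of quick union; the class name is hypothetical.
class QuickUnionSketch {
    private final int[] id; // id[i] = parent of i; a root is its own parent

    QuickUnionSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // every object starts as its own root
    }

    // Climb parent links until reaching an object that is its own parent.
    int find(int p) {
        while (p != id[p]) p = id[p];
        return p;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    // Make the root of p point to the root of q.
    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP != rootQ) id[rootP] = rootQ;
    }

    public static void main(String[] args) {
        QuickUnionSketch uf = new QuickUnionSketch(5);
        uf.union(0, 1);
        uf.union(1, 2);
        System.out.println(uf.find(0));         // 2
        System.out.println(uf.connected(0, 3)); // false
    }
}
```

Here union() changes only one array entry, but find() may have to climb a tall tree, which is where the worst-case cost comes from.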

Performing M union-find operations takes NM time in the worst case. Again, this results in quadratic behavior.

Weighted quick union. Rather than union(p, q) making the root of p point to the root of q, we instead make the root of the smaller tree point to the root of the larger one. A tree's size is its number of nodes, not its height. This results in tree heights of at most lg N (you should understand this proof).

Warning: if the two trees have the same size, the code uses the opposite convention from quick union and sets the root of the second tree to point to the root of the first tree.
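The weighting rule and the tie-breaking convention can be sketched in Java as follows. The class name WeightedQuickUnionSketch and the size[] array name are assumptions for illustration.

```java
// Sketch of weighted quick union; names are hypothetical.
class WeightedQuickUnionSketch {
    private final int[] id;   // id[i] = parent of i; a root is its own parent
    private final int[] size; // size[r] = number of nodes in the tree rooted at r

    WeightedQuickUnionSketch(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        while (p != id[p]) p = id[p];
        return p;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            // p's tree is strictly smaller: hang it under q's root.
            id[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            // q's tree is smaller, or the sizes are equal: hang q's root under p's root.
            id[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    public static void main(String[] args) {
        WeightedQuickUnionSketch uf = new WeightedQuickUnionSketch(5);
        uf.union(0, 1); // equal sizes: 1 now points to 0
        uf.union(2, 3); // equal sizes: 3 now points to 2
        uf.union(0, 2); // equal sizes: 2 now points to 0
        uf.union(4, 0); // 4's tree is smaller: 4 now points to 0
        System.out.println(uf.find(3)); // 0
        System.out.println(uf.find(4)); // 0
    }
}
```

In this sketch the tie case falls into the else branch, which is what produces the "second root points to the first root" behavior described in the warning.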

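Weighted quick union with path compression. When find() is called, the path it follows is compressed: each node examined on the way up is relinked to point at (or closer to) the root. This results in nearly flat trees. Making M calls to union and find with N objects results in a number of array accesses proportional to at most (N + M) log*(N), where log*(N) is the number of times lg must be applied to N before the result is at most 1. For any conceivable value of N in this universe, log*(N) is at most 5.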
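One possible way to add compression to the weighted version is a two-pass find(): first locate the root, then relink every node on the path directly to it. The sketch below assumes that variant; the class name and the parentOf() inspection helper are hypothetical, and other variants (such as pointing each node at its grandparent) are also in common use.

```java
// Sketch of weighted quick union with two-pass path compression; names are hypothetical.
class CompressedWeightedUnionSketch {
    private final int[] id;   // id[i] = parent of i; a root is its own parent
    private final int[] size; // size[r] = number of nodes in the tree rooted at r

    CompressedWeightedUnionSketch(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        int root = p;
        while (root != id[root]) root = id[root]; // first pass: locate the root
        while (p != root) {                       // second pass: relink the path
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            id[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            id[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    // Hypothetical helper for inspecting the tree; not part of the API in the text.
    int parentOf(int p) {
        return id[p];
    }

    public static void main(String[] args) {
        CompressedWeightedUnionSketch uf = new CompressedWeightedUnionSketch(4);
        uf.union(0, 1);
        uf.union(2, 3);
        uf.union(0, 2);
        System.out.println(uf.parentOf(3)); // 2 (two links away from the root)
        uf.find(3);
        System.out.println(uf.parentOf(3)); // 0 (now linked directly to the root)
    }
}
```

The size[] bookkeeping is unchanged by compression, because relinking moves nodes within a tree without changing how many nodes the root owns.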

Recommended Problems

C level

  1. What are the best-case and worst-case tree heights for weighted quick-union and weighted quick-union with path compression? Give your answers in terms of order of growth.
  2. Textbook: 1.5.1, 1.5.2, 1.5.3

B level

  1. Fall 11 Midterm, #1
  2. Fall 12 Midterm, #1
  3. Textbook: 1.5.8
  4. Textbook: 1.5.9

A level

  1. Textbook: 1.5.10
  2. If we're concerned about tree height, why don't we use height for deciding tree size instead of weight? What is the worst-case tree height for weighted-quick-union vs. heighted-quick-union? What is the average tree height?
  3. Try writing weighted-quick-union-with-path-compression without looking at the code on the booksite. You may look at the API. Compare your resulting code with the booksite code.