### UNION–FIND STUDY GUIDE

Algorithm Development. Developing a good algorithm is an iterative process. We create a model of the problem, develop an algorithm, and revise the algorithm until its performance meets our needs.

Dynamic connectivity problem. The problem is defined on an undirected graph with n vertices. There are two operations: add an edge and determine whether two vertices are connected by a path. Connectedness is an equivalence relation. This implies that we can partition the vertices into sets such that every vertex is in exactly one set and two vertices are connected if and only if they are in the same set. This problem motivates the union–find data type.

Union–Find. The goal is to develop a data type that supports the following two core operations on disjoint sets over the elements { 0, 1, 2, ..., n − 1 }:

• union(int p, int q)
• find(int p)
The call union(p, q) merges the sets containing p and q; the call find(p) returns an identifier for the set containing element p.
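One way to pin down this API in Java is as an interface. This is a minimal sketch: the interface name `UnionFind` and the `connected()` convenience method are illustrative assumptions, not the booksite API. `connected()` just encodes the fact that two elements are in the same set exactly when `find()` returns the same identifier for both.

```java
/**
 * Hypothetical interface for the two core union-find operations over the
 * elements 0, 1, ..., n - 1. The name UnionFind and the connected() helper
 * are illustrative, not taken from the booksite.
 */
interface UnionFind {
    // Merge the set containing p with the set containing q.
    void union(int p, int q);

    // Return an identifier for the set containing p.
    int find(int p);

    // Two elements are connected iff find() gives them the same identifier.
    default boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```

Any of the implementations below could be adapted to implement this interface; the identifier returned by `find()` only matters up to equality.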

Quick find. This is the most natural solution, where each element is given an explicit identifier that indicates which set it belongs to. We use an array id[] of length n, where id[i] is the identifier of element i (which is returned by find(i)). To union two elements p and q, we change every element that has p's identifier to have q's identifier.

• Union: Must scan all of id[] and may change many of its entries, so it takes time proportional to n.

• Find: Takes constant time.

Performing m union–find operations takes time proportional to mn in the worst case. If m is proportional to n, this results in n² time.
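A minimal Java sketch of quick find as described above. The class name `QuickFind` is hypothetical, and the code is written for illustration rather than copied from the booksite.

```java
/** Quick-find sketch: id[i] is the set identifier of element i. */
class QuickFind {
    private final int[] id;

    QuickFind(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every element starts in its own set
        }
    }

    // Constant time: the identifier is stored explicitly.
    int find(int p) {
        return id[p];
    }

    // Linear time: relabel every element carrying p's identifier with q's.
    void union(int p, int q) {
        int pid = id[p]; // save before the loop, since id[p] itself gets overwritten
        int qid = id[q];
        if (pid == qid) return; // already in the same set
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) id[i] = qid;
        }
    }

    public static void main(String[] args) {
        QuickFind uf = new QuickFind(5);
        uf.union(0, 1);
        uf.union(1, 2);
        System.out.println(uf.find(0) == uf.find(2)); // true
        System.out.println(uf.find(0) == uf.find(3)); // false
    }
}
```

Note that `union()` loops over the entire array on every call, which is where the n cost per union comes from. Reading `id[p]` inside the loop instead of saving it first is a classic bug: once `id[p]` is relabeled, later elements with p's old identifier are missed.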

Quadratic algorithms don't scale. Given a k-times larger problem on a k-times faster computer, a quadratic algorithm takes k times as long to run.
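The arithmetic behind this claim, assuming an idealized running time of T(n) = c n² for some constant c, is:

```latex
T_{\text{new}} = \frac{c\,(kn)^2}{k} = k \cdot c\,n^2 = k \cdot T(n)
```

The problem grows by a factor of k, the work grows by k², and the faster machine only buys back a factor of k.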

Quick union. We store the elements in a forest of trees, where each tree corresponds to one set. We store the parent pointers in an array, where parent[i] is the parent of element i in its tree. We use the root of the tree as the set identifier. By convention, the parent pointer of a root points to itself. The find() method follows parent pointers until it reaches the root (an element whose parent is itself). To union p and q, we set the root of p to point to the root of q.

• Union: Requires changing only one entry in parent[], but also requires root finding (worst case n time).

• Find: Requires root finding (worst case n time).

Performing m union–find operations takes time proportional to mn in the worst case. Again, if m is proportional to n, this results in quadratic behavior.
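A minimal Java sketch of quick union, assuming the conventions above (roots point to themselves, and `union(p, q)` links p's root under q's root). The class name `QuickUnion` and the `depth()` helper are hypothetical, added only to make the tree shape observable.

```java
/** Quick-union sketch: parent[i] is the parent of element i in its tree. */
class QuickUnion {
    private final int[] parent;

    QuickUnion(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each element starts as the root of its own tree
        }
    }

    // Follow parent pointers until reaching a root (an element that is its own parent).
    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    // Only one entry of parent[] changes, but two root searches are needed.
    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        parent[rootP] = rootQ; // root of p points to root of q
    }

    // Hypothetical helper: number of links from p up to its root.
    int depth(int p) {
        int d = 0;
        while (p != parent[p]) {
            p = parent[p];
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        int n = 8;
        QuickUnion uf = new QuickUnion(n);
        for (int i = 0; i + 1 < n; i++) uf.union(i, i + 1); // builds the path 0 -> 1 -> ... -> 7
        System.out.println(uf.find(0));  // 7
        System.out.println(uf.depth(0)); // 7
    }
}
```

The sequence `union(0, 1), union(1, 2), ...` produces a single path, so `find(0)` climbs n − 1 links: this is the worst case that makes quick union quadratic.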

Weighted quick union (union-by-size). Rather than having union(p, q) always make the root of p point to the root of q, we make the root of the smaller tree point to the root of the larger one, where the size of a tree is its number of nodes. With union-by-size, the height of every tree is at most lg n (you should understand this proof). (An alternative strategy, known as union-by-height, uses the height of the tree instead of its size.)
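A sketch of the standard argument, phrased for a single element x. The depth of x increases only when the tree T₁ containing x is linked below the root of another tree T₂, and union-by-size does that only when |T₁| ≤ |T₂|. So every depth increase at least doubles the size of x's tree, which can never exceed n:

```latex
|T_1| \le |T_2| \;\Longrightarrow\; |T_1 \cup T_2| = |T_1| + |T_2| \ge 2\,|T_1|
\\[4pt]
\text{after } d \text{ depth increases: } 2^{d} \le |\mathrm{tree}(x)| \le n
\;\Longrightarrow\; \mathrm{depth}(x) \le \lg n
```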

• Union: Requires only one change to parent[], but also requires root finding (worst case log n time).

• Find: Requires root finding (worst case log n time).

Warning: if the two trees have the same size, the booksite code uses the opposite convention from quick union: it sets the root of the second tree (q's) to point to the root of the first tree (p's).
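A minimal Java sketch of weighted quick union. The class name `WeightedQuickUnion` and the `depth()` helper are hypothetical; on ties it follows the convention described in the warning (q's root is linked under p's root).

```java
/** Weighted quick-union (union-by-size) sketch. */
class WeightedQuickUnion {
    private final int[] parent;
    private final int[] size; // size[r] = number of nodes in the tree rooted at r (meaningful only for roots)

    WeightedQuickUnion(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each element starts as a one-node tree
            size[i] = 1;
        }
    }

    // Same root search as quick union; trees now have height at most lg n.
    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        // Link the smaller tree's root under the larger tree's root.
        // On a tie, q's root goes under p's root (opposite of quick union).
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    // Hypothetical helper: number of links from p up to its root.
    int depth(int p) {
        int d = 0;
        while (p != parent[p]) {
            p = parent[p];
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        WeightedQuickUnion uf = new WeightedQuickUnion(8);
        // Repeatedly merging equal-size trees reaches the lg n bound.
        int[][] ops = {{0, 1}, {2, 3}, {4, 5}, {6, 7}, {0, 2}, {4, 6}, {0, 4}};
        for (int[] op : ops) uf.union(op[0], op[1]);
        System.out.println(uf.find(7));  // 0
        System.out.println(uf.depth(7)); // 3, which equals lg 8
    }
}
```

By contrast, the path-building sequence `union(0, 1), union(1, 2), ...` that degrades quick union leaves every element within one link of the root here, because each single-node tree is attached under the larger tree.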

### Recommended Problems

#### C level

1. What are the best-case and worst-case tree heights for weighted quick-union and weighted quick-union with path compression? Give your answers in terms of order of growth.
2. Textbook: 1.5.1, 1.5.2, 1.5.3

#### B level

1. Fall 11 Midterm, #1
2. Fall 12 Midterm, #1
3. Textbook: 1.5.8
4. Textbook: 1.5.9

#### A level

1. Textbook: 1.5.10
2. If we're concerned about tree height, why don't we use union-by-height instead of union-by-size? What is the worst-case tree height for union-by-height vs. union-by-size? What is the average tree height?
3. Try writing weighted quick-union with path compression without looking at the code on the booksite. You may look at the API. Compare your resulting code with the booksite code.