Computer Science 226
Algorithms and Data Structures
Fall 2013


Course Information | Assignments | Exercises | Lectures | Exams | Precepts | Study Guide | Booksite

Union Find and Intro
Analysis of Algorithms
Stacks and Queues
Elementary Sorts
Mergesort
Quicksort
Collections, PQs, and Priority Queues
General Sorting
BSTs
Balanced BSTs
Hashing


Geometric Applications of BSTs
Undirected Graphs
Directed Graphs
MSTs
Shortest Paths
Max flow
String Sorts
Tries
Substring Matching
Regular Expressions
Compression
Reductions and Intractability

Study Guide

This page provides a list of what I consider the most important topics and ideas from each lecture. The optional exercises listed on this page are not for a grade. Instead, they are to provide you with a better idea of what we're expecting you to know before the final.

Optional exercises are either C problems, B problems, or A problems. C problems are at the same level as the exercises, and generally require that you only be able to mechanically carry out the algorithms. B problems are harder problems that require you to really know what's going on (the exams will have plenty of these). A problems are problems that either cover tangential material (like how Kosaraju-Shamir works), require deep comprehension of the material, involve very clever tricks, or perhaps even all 3.

Don't be intimidated by its length. I've tried to be as comprehensive as possible.

Union Find and General Intro

Key Concepts

Algorthm Development. Developing a good algorithm is an iterative process. We create a model of the problem, develop an algorithm, and revise the performance of the algorithm until it meets our needs.

Union-Find
. The ultimate goal is to develop a data type that support the following operations:

We do NOT care about finding the actual path from p to q. We only care about their connectedness.

Very closely related to connected is a 3rd operation we can support:

Find is defined such that find(p) = find(q) iff connected(p, q).

We assume that the number of objects N is fixed.

Key Observation: Connectedness is an Equivalence Relation. Saying that two objects are connected is the same as saying they are in an equivalence class.

This is just fancy math-talk for saying "every object is in exactly one bucket, and we want to know if two objects are in the same bucket". When you union two objects, you're basically just pouring everything from one bucket into another.

Quick Find. This is the most natural solution, where each object is given an explicit number. Uses an array id of length N. id[i] is the bucket number of object i (which is returned by find). To union two objects p and q, we set every object in q's bucket to have p's number.

Performing M union-find operations takes NM time. If M were proportional to N, this results in N2 time;

Quadratic Algorithms Don't Scale. Given an N times larger problems on an N times faster computer, the problem takes N times as long to run.

Quick Union. id[i] is the parent object of object i. An object can be its own parent. The find method climbs the ladder of parents until it reaches the root (an object whose parent is itself). To union p and q, we set p's parent to q.

Performing M union-find operations takes NM time in the worst case. Again, this results in quadratic behavior.

Weighted Quick Union
. Rather than union(p, q) pointing p's parent at q, we instead point the smaller tree at the larger one. The tree's size is given by the NUMBER of nodes, not the height of the tree. Results in tree heights of lg N (you should understand this proof).

Weighted Quick Union with Path Compression. When find is called, the tree is compressed. Results in nearly flat trees. Making M calls to union and find with N objects results in no more than M log*(N) array accesses. Log*(N) is at most 5 in our universe.

Recommended Problems

C level

  1. What are the best and worst case tree heights for weighted quick-union and weighted quick-union with path compression? Give your answers in terms of order of growth.
  2. Textbook: 1.5.1, 1.5.2, 1.5.3

B level

  1. Fall 11 Midterm, #1
  2. Fall 12 Midterm, #1
  3. Textbook: 1.5.8
  4. Textbook: 1.5.9

A level

  1. Textbook: 1.5.10
  2. If we're concerned about tree height, why don't we use height for deciding tree size instead of weight? What is the worst-case tree height for weighted-quick-union vs. heighted-quick-union? What is the average tree height?
  3. Try writing weighted-quick-union-with-path-compression without looking at the code on the booksite. You may look at the API. Compare your resulting code with the booksite code.

Analysis of Algorithms

Key Concepts

Tilde notation. We say that f(N) ~ g(N) if in the large N limit, f(N)/g(N) converges to 1. This is a general concept that can be applied to any functions, and is not restricted to runtime, memory, or any other specific domain.

Power-law assumption. For empirical analyses in COS226 we will assume that runtimes obey a power law (i.e. T(N) ~ aNb). Empirical derivation of accurate hypotheses for non-power law runtimes is beyond the scope of our course.

Cost model. For theoretical analyses of runtime in COS226, we will assume a 'cost model', namely that some particular operation (operations) dominates the runtime of a function. We can then express the runtime in terms of the total number of that operation. Often, we give this count in Tilde notation.

Order of growth. If we have two functions f(N) and g(N), and f(n) ~ a G(N), we say the order of growth of f(N) is g(N).

Performance depends on inputs. We can characterize an algorithm's performance by its best case, worst case, and average case.

Difficulty of a problem. To understand the difficulty of a particular problem in COS226, we consider only the worst-case order-of-growth of its solutions. We can naturally upper bound the difficulty of a problem by the performance of the best known solution. We can naturally lower bound the difficulty of a problem with general considerations of the problem. This is actually as trivial as it sounds.

Big Oh, Big Omega, Big Theta. Big Theta is precisely the same thing as "order of growth". If we say "f(N) is Θ(g(N))" or "f(N)=Θ(g(N))", we are simply saying f(N) is the same order of growth as g(N). If f(N)=O(g(N)), then f(N) has an order of growth that is less than or equal to g(N). More precisely, we could say that the limit of f(N)/g(N) as N goes to infinity is NOT infinity. If f(N)=Ω(g(N)), then f(N) has an order of growth that is greater than or equal to g(N). We could also say that the limit of f(N)/G(N) as N goes to infinity is greater than 0.

As with Tilde notation, these are all general concepts that can be applied to any functions, and are not limited to discussions of specific topics in computer science.

Worst case order of growth isn't everything. Just because an algorithm has a better order of growth does not mean it is better. We will see examples of this later (Quicksort vs. Mergesort. Quick select vs. Median of medians).

Memory analysis. Know how to calculate the memory utilization of a class with the 64 bit memory model.

Theoretical and empirical analysis. Hypotheses generated through theoretical analysis (or guesswork like our power law assumption) should be validated with data before being fully trusted.

Recommended Problems

C level

  1. Textbook 1.4.4
  2. Fall 2010 Midterm, #1d
  3. Fall 2011 Midterm, #2

B level

  1. Textbook 1.4.5
  2. Fall 2008 Midterm, #3
  3. Fall 2009 Midterm, #1
  4. Fall 2010 Midterm, #1a, #1c
  5. Spring 2012 Midterm, #1
  6. Spring 2013 Final, #4

A level

  1. Spring 2013 Midterm, #1
  2. Suppose the optimal (and possibly unknown) solution to problem P has order of growth F(N). Suppose that the best known solution has runtime that is Θ(B(N)). Finally, suppose that there is a clever proof that no solution can possibly have order of growth that is less than L(N). Which of the following can you surmise?
    1. F(N) = O(B(N))
    2. B(N) = O(F(N))
    3. The limit of F(N)/B(N) as N goes to infinity cannot be infinity.
    4. F(N) = Ω(L(N))
    5. L(N) = Ω(F(N))
    6. B(N) > F(N) for sufficiently large N.

Stacks and Queues

Key Concepts

Stacks and Queues. Understand the semantics of each.

Linked Lists. Understand how they can be used to implement a stack or a queue, how to calculate the memory used by a linked list (using the 64 bit memory model) for N items in a queue, and how to show that queue and stack operations complete in constant time.

Resizing Arrays. Understand how they can be used to implement a stack or a queue. Understand why the worst case memory usage and runtimes are different than the best case.

Amortized Analysis. The idea is simpler that it might seem at first. If we say that X-operations take amortized f(M) Y-operations, this means that if we start with an empty data structure, the average cost for any sequence of M X-operations is f(M) Y-operations.

For example, for a resizing array stack, we say that push and pop operations take amortized constant time. This means that if we start with an empty stack, any sequence of pushes or pops take on average a constant number of array accesses per push or pop. In this example X is {push, pop}, Y is array accesses, and f(M) is just 1.

Generics. Understand that these provide the ability to parameterize data types. Understand why this approach is superior to other approaches (writing separate classes, using the Object type and doing runtime casting).

Compile time vs. run time errors. Understand why the former are preferred.

Iterators. Understand what it means for a class to implement Iterable. Understand what a class must do to implement Iterator. Understand how the foreach operator works (a.k.a. the colon operator). Understand how to convert code using : into equivalent longhand code.

Loitering. Understand how to avoid loitering in Java.

Recommended Problems

C level

  1. Textbook 1.3.6
  2. 1.3.3
  3. 1.3.22, then 1.3.23
  4. Understand why the we prevent loitering by adding the line A[N] = null in Algorithm 1.1 on page 141 of the book. Why do we not have a special purpose line like this for our linked list implementation (page 149, algorithm 1.2).

B level

  1. Fall 2008 Midterm, #3b (you don't need to know what a symbol table is to do this problem) -- this is a great problem!
  2. Textbook 1.3.5
  3. Textbook 1.4.35 - a pushdown stack is just a stack.
  4. Textbook 1.4.36 [important note: they recommend using a static inner class to reduce Node overhead, hence the disagreement with the lecture slides!]
  5. Consider Algorithm 1.1 on page 141 of the book. We showed in lecture that the memory usage of this class is ~8N and ~32N bytes in the best and worst cases respectively. If we get rid of the line A[N] = null, which prevents loitering, how does this affect the best and worst case memory usage of our resizing array based stack? Highlight for answer: It doesn't! Follow up question with no provided answer: Then why do we care?

A level

  1. Textbook 1.4.32
  2. Suppose we have a resizing array that increases in size by K entries when the array is full, and decreases in size by K entries when the array has K empty entries. Show that the push and pop take amortized M time for some worst case sequence. Give an example of a worst case sequence. Observe that this results in M2 time for M operations.

Elementary sorts

Key Concepts

Inversions. The number of pairs of elements in a sequence that are out of order. An array with no inversions is ordered.

Selection sort. Be able to selection sort an array. How many compares does it use? How many exchanges?

Insertion sort. Be able to insertion sort an array. How many compares does it use in the best and worst case? How many exchanges in the best and worst case?

Partially ordered arrays. Why is insertion sort linear for partially sorted arrays?

Shellsort. Not covered Fall 2013.

Shuffling. Can use sorting to shuffle in N lg N by assigning each value an arbitrary unique value between 0 and 1. Can sort in N time using Knuth shuffle. You don't need to know the Knuth shuffle details.

Convex hull. Understand how sorting leads to a solution to the convex hull problem. You should be able to carry out the algorithm. You do not need to understand the code for determining ccw turns.

Recommended Problems

C level

  1. Final 2008 midterm, #9
  2. Give a worst case and best case input for insertion sort and selection sort.

B level

  1. Textbook 2.1.2
  2. Textbook 2.1.6, 2.1.7
  3. Fall 2012 midterm, #8

A level

  1. Textbook 2.1.28
For more problems, see the general sorting section.

Mergesort

Key Concepts

Mergesort. Merge left, merge right, merge.

Merge. Understand how to carry out the merge operation. How many compares does it use when comparing two arrays of size N in the best case? In the worst case?

Mergesort order of growth. Understand how to show that the order of growth of the number of compares is N lg N. Understand why the entire algorithm is also order N lg N.

Mergesort compare bounding. Know why the best case is ~ 1/2 N lg N and the worst case is ~ N lg N compares.

Mergesort properties. Mergesort is stable (why?). Mergesort uses N memory (why?). Does mergesort do particularly well on already sorted arrays? Partially ordered arrays?

Recommended Problems

C level

  1. Give a worst case and a best case input for mergesort.

B level

  1. Textbook 2.2.22
  2. Textbook 2.2.8

A level

  1. Using the decision tree idea, rederive the proof that all comparison based sorting algorithms must use at least ~N lg N compares. You may assume Stirling's formula.
For more problems, see the general sorting section.

Quicksort

Key Concepts

Quicksort. Partition on some pivot. Quicksort to the left of the pivot. Quicksort to the right.

Partitioning. Understand exactly how to carry out 2 way partitioning as discussed in class. Be able to recognize 3 way partitioning as discussed in the book / blackboard exercise.

Quicksort order of growth. Understand how to show that in the best case, quicksort is N lg N, and in the worse case is N^2. Shuffling is needed to probabilistically guarantee N lg N behavior.

Quicksort compare counting. Know why the best case is ~N lg N compares and worst case is ~1/2 N^2. Despite the greater number of compares, quicksort is usually faster than mergesort. Be familiar with the fact that shuffling yields 2N ln N compares on average (but you don't need to fully digest this proof -- especially solution of the difficult recurrence relation, as that involves discrete math that is beyond the scope of the course).

Pivot choice. Understand how the pivot affects the size of the subproblems created after partitoning.

Quicksort properties. Quicksort is not stable and is in-place (uses no more than log N memory).

Practical improvements. Cutoff to insertion sort. Using 3-way partitioning to attain faster performance for arrays with a constant number of keys.

Recommended Problems

C level

  1. Give a worst case input for non-random quicksort that chooses the leftmost element as a pivot. Why is a best case input hard to think of?
  2. Textbook 2.3.8
  3. Textbook 2.3.11
  4. Spring 2013 mitderm, #4a

B level

  1. Textbook 2.3.3
  2. Textbook 2.3.13 (don't do 2.3.20)
  3. Spring 2013 midterm, #4b

A level

  1. One strategy to speeding up quicksort is to insertion sort subarrays of size approximately 10 or smaller. An alternate approach is to simply NOT sort arrays of size less than 10, then use insertion sort on the entire array. Prove that this alternate approach results in a linear time sort, depite insertion sort's worst case N^2 performance.
For more problems, see the general sorting section.

Collections, PQs and Heaps

Key Concepts

Terminology. Abstract data types are independent of implementation. Collections are abstract data types.

Common collections. Priority queue, set, symbol table. Know what each of them does. You do not need to memorize the names of the the operations shown in the slides.

Heaps. A max (min heap is a binary tree such that every node is larger (smaller) than all of its children. This definition naturally applies recursively, i.e. a heap of height 5 is composed of two heaps of height 4 plus a parent.

Array-heaps. The heap data structure must be implemented in terms of even more elementary data structures in most programming languages, because there is not a heap primitive. In 226, we utilize a clever trick where we represent the heap as an array. You should understand how this works, e.g. why do we leave the 0th position empty in the array?

Runtimes of various PQ implementations. Know the runtimes of the 3 primary PQ operations for an unordered array, ordered array, and heap implementation.

Heapsort. A max PQ provides an easy algorithm for putting items in ascending order. We simply insert everything into the PQ and then delete the max until the PQ is empty. If we use an array representation of a heap, we can do everything in place (this is not obvious). This is somewhat subtle.

Heapification. Understand how the bottom up heapification process works. Know that it is linear time.

Recommended Problems

Note: The reason I've given lots of problems here is not because this is a more important topic, but because there are just so many interesting problems.

C level

  1. Textbook 2.4.2 (assume we'd also like to support delete operations)
  2. Textbook 2.4.4
  3. Textbook 2.4.11, 2.4.12
  4. Textbook 2.4.14
  5. Spring 2008 #5, Fall 2008 #4, Fall 2009 #3, etc.

B level

  1. Textbook 2.4.7
  2. Textbook 2.4.10
  3. Textbook 2.4.17, how could you use this technique to find the kth largest item? What would be the runtime of this algorithm in terms of N and k?
  4. Textbook 2.4.21
  5. Textbook 2.4.27
  6. Textbook 2.4.32 (easier than it looks)
  7. Spring 2008 Midterm #4
  8. Fall 2012 Midterm #4
  9. Spring 2013 Midterm #5a and #5b

A level

  1. Prove that bottom up heapification runs in linear time.
  2. Textbook 2.4.29
  3. Textbook 2.4.30 (great problem!)
  4. Fall 2010 Midterm #7
  5. Spring 2012 Miderm #8
  6. Use a resizing array based queue and a symbol table to create an algorithm that solves FastCollinear in N^2. Don't write code. Make sure you can output the segments in sorted order. You may assume the symbol table supports all operations in constant time.

General sorting

On the midterms, sorting questions tend to broadly cover the entire sorting chapter of the book.

Recommended Problems

C level

  1. What sort does Java use for sorting object arrays? For sorting primitive arrays? Why?
  2. Spring 2012 Midterm #1

B level

  1. This classic problem: Spring 2008 Midterm #1, Fall 2008 Midterm #1, Fall 2012 Midterm #2, Spring 2013 Midterm #6
  2. Spring 2008 Midterm, #2b
  3. Fall 2008 Midterm, #2
  4. Fall 2010 Miderm, #2b

A level

  1. Spring 2008 Midterm, #2a
  2. Spring 2012 Midterm, #2b
  3. Fall 2012 Midterm, #2

Symbol Tables and BSTs

Key Concepts

Elementary implementations. Ordered array and linked list. Understand the performance of each.

Comparable vs. equals. Keys must either be comparable or have an equals method which takes into account the semantics of the object. The default equals method only checks reference equality.

API for Symbol Table and Ordered Symbol Table. Know the operations that go with symbol tables and ordered symbol tables. You don't have to be able to list them, but you should understand what they mean.

BST basics. Know the definition of a BST and how to search and insert.

Recursive BST code. You should be able to read recursive BST code. You will not be required to write recursive BST code, since you won't have practice until KdTree.

In-order traversal. Understand how to perform an in order traversal (in optional slides, also at beginning of the BST lecture)

Treesort. Understand how to use a BST to sort items. Insert all the items, then perform an inorder traversal. This algorithm is also known as Quicksort.

Deletion and Hibbard deletion. Understand how to delete an item with 1 or 0 children. Understand how to delete an item with 2 children, a.k.a. Hibbard deletion.

Symbol table performance. Understand the performance of our elementary implementations. Understand the performance of a BST based symbol table in the best, worst, and average (shuffled) case, with an emphasis on the fact that the height of a tree is the key factor. Understand what Hibbard deletions do to tree height. You do not need to know why Hibbard deletes result in sqrt(n) height.

Recommended Problems

C level

  1. What is the best case BST height? Worst case?
  2. If shuffling guarantees log N tree height, why don't we simply shuffle our input data before building our BST based symbol table to avoid worst case behavior?
  3. Textbook 3.2.3, but give only two orderings.
  4. Textbook 3.2.4

B level

  1. In our elementary implementations, we considered ordered arrays and linked lists. We did not consider ordered linked lists. Why not?
  2. Give an example of something that you might like to store as a key in a symbol table for which there is no natural compare method.
  3. Do there exist any objects for which it is impossible to define a total order? In other words, can we always write a compare method if we're willing to do the work, or are some data types fundamentally impossible to compare other than for equality?
  4. Stable 2-way partitioning is a partitioning procedure in which items that are smaller than the pivot are kept in the same relative order after partitioning, and likewise with larger items. For example, if we stably partition GADFLY on G, we'd get ADFGLY. Perform an entire quicksort using stable partitioning on the array [5, 3, 2, 1, 9, 7, 6], and record the compares you make along the way. Build the BST for this array. Observe that the same exact compares were made.
  5. Textbook 3.2.22 (pretty easy, but good to do)
  6. Textbook 3.2.23
  7. Textbook 3.2.24 (so easy it might seem hard)

A level

  1. Spring 2008 Midterm, #8
  2. BSTs allow us to perform a stable Quicksort. What are the disadvantages of this stable Quicksort compared to the version we discussed in the Quicksort lecture?

Balanced Search Trees

Key Concepts

2-3 trees. Understand how to search and insert. Understand why insertion technique leads to perfect balance and symmetric order.

Terminology. Symmetric order. Perfect balance.

Performance of 2-3 trees. Tree height is between log3(N) and log2(N). All paths are of the same height.

Mapping between LLRBs and 2-3 Trees. Understand how to map a 2-3 tree to an LLRB and vice versa.

LLRBs. BST such that no node has two red links touching it. Perfect black balance. Red links lean left.

LLRB get. Exactly the same as regular BST get.

Storing color. A node's color is the same as the edge connecting that node to its parent.

LLRB insert. Insert as usual, then apply rotations recursively as follows, working upwards from the new node.

  1. Case 3: A parent node has a black left child and a red right child, so rotate the parent left. The former right child is now the boss. Reminder: null links are considered black for the purposes of deciding cases.
  2. Case 2: A grandparent node has a red left child whose left child is also red. Rotate this grandparent right (so that one in the middle is now the boss).
  3. Case 1: A parent node has two red children. Flip colors.
Conveniently, we can always apply case 3, case 2, then case 1 to every node in a tree, and we can guarantee that the entire tree will be an LLRB.

LLRB performance. Perfect black balance ensures worst case performance for get and insert is ~2 log(N).

Recommended Problems

C level

  1. Given an LLRB, is there exactly one corresponding 2-3 tree? Given a 2-3 tree, is the exactly one corresponding LLRB?
  2. Draw a 2-3 tree with all 3 nodes. Why is the height log3(N)?
  3. How many compares does it take in the worst case to decide whether to take the left, middle, or right link from a 3 node?
  4. Fall 2010 Midterm, #4. Fall 2011 Midterm, #6. Spring 2012 Midterm, #5. Spring 2013 Midterm, #2. Fall 2008 Midterm, #6. Fall 2009 Midterm, #4.

B level

  1. For a 2-3 tree of height h, what is the best case number of compares to find a key? The worst case?

A level

  1. Show that in the worst case, we need to perform order log N LLRB operations (rotation and color flipping) after an insert (not as hard as it might seem). Give an example of a tree which requires log N LLRB operations after a single insert (also not so hard).

Hash Tables

Key Concepts

Brute force approach. All data is just a sequence of bits. Can treat key as a gigantic number and use it as an array index. Requires exponentially large amounts of memory.

Hashing. Instead of using the entire key, represent entire key by a smaller value. In Java, we hash objects with a hashCode() method that returns an integer (32 bit) representation of the object.

hashCode() to index conversion. To use hashCode() results as an index, we must convert the hashCode() to a valid index. Modulus does not work since hashCode may be negative. Taking the absolute value then the modulus also doesn't work since Math.abs(Integer.MIN_VALUE) is negative. We use hashCode & 0x7FFFFFFF instead before taking the modulus.

Hash function. Converts a key to a value between 0 and M - 1. In Java, this means calling hashCode(), setting the top bit to 0, then taking the modulus.

Collision resolution. Two philosophies for resolving collisions discussed in class: Separate chaining and 'open addressing'. We didn't use the term open addressing, but it's where you use empty array entries to handle collisions, e.g. linear probing. We will use this term at the beginning of lecture 11.

Separate chaining with a linked list. Key/value pairs are stored in a linked list of nodes of length M. Hash function tells us which of these linked lists to use. Get and insert both require potentially scanning through entire list.

Resizing separate chaining hash tables. Understand how resizing may lead to objects moving from one linked list to another. Primary goal is so that M is always proportional to N.

Performance of separate chaining hash tables. Cost of a given get, insert, or delete is given by number of entries in the linked list that must be examined.

Linear probing hash tables. If the space that should be occupied by a key is already occupied by something else, try the space to the right. If that's occupied, go right more. Repeat. This philosophy works for get and insert.

Performance of linear probing hash tables. As before, perfmrance determined by number of entries in the key array that must be examined.

Hash functions and the uniform hashing assumption. For our analyses above, we assumed that our hash function distributes all input data evenly across bins. Design of such function is beyond the scope of our course. Most important guideline is to use all the bits in the key.

Dangers of hashing. If hashCode() is known and easy to reverse, adversary can design a sequence of inputs that result in everything being placed in one bin. Or if hashCode() is just plain bad, same thing can happen.

Recommended Problems

C level

  1. Textbook 3.4.5
  2. Fall 2010 Midterm, #5, Fall 2012 Midterm, #6a

B level

  1. Textbook 3.4.13, 3.4.14
  2. Textbook 3.4.15
  3. Fall 2012 Midterm, #6b, Fall 2009 Midterm, #5
  4. Spring 2013 Midterm, #3b-#3d

A level

  1. Textbook 3.4.23, here R is the multiplicative factor (we used 31 in class).

Geometric Applications of BSTs

Key Concepts

1d range search

Kd-Trees

Recommended Problems

None.

Undirected Graphs

Key Concepts

Graph Representation
There are many possible internal representations of a graph. We discussed three. You should be familiar with the tradeoffs for each.

Important graph problems

You should know how to solve the first three -- not necessarily because these problems are so important, but because you should have at least a few examples of how to use DFS or BFS to solve problems. For the last four, it's good scholarship to be know whether the problem is tractable (polynomial time) and whether there exists an easy to understand tractable algorithm. We may even ask you how these things on an exam. You do not need to be familiar with the details of the best known algorithm for their solution.

One important meta-point is that it is non-trivial to determine the difficulty level of a graph problem. For example, finding an Euler tour is polynomial time and the algorithm is non-trivial but not hideously complex. By contrast, finding a Hamilton tour is an intractable problem. Planarity can be solved in linear time, but it's a rather complex algorithm to follow. Graph isomorphism is particularly notable since its tractability has remained elusive despite decades of research.

Important graph tools

Graph client design pattern (used for 4.1, 4.2, 4.3, and 4.4). Instead of creating highly complex classes that support basic graph operations AND solution of important graph problems, we have a basic class (e.g. Graph) that gives us basic operations, and special purpose clients that solve subproblems on a graph (e.g. ConnectedComponents(Graph g)).

Recommended Problems

C level

  1. Fall 09 Final, #2a
  2. Textbook: 4.1.12

B level

  1. Fall 09 Final, #2b
  2. Fall 10 Final, #2c
  3. Textbook: 4.1.14, 4.1.10
  4. What is the run time for DFS or BFS if we use an adjacency matrix insted of adjacency lists?

A level

  1. Fall 12 Final, #15
  2. Textbook: 4.1.33

Directed Graphs

Key Concepts

Using directed graphs vs. undirected graphs. Know when to use each. This should be fairly natural. For example, you'd use an undirected graph to represent friend connections between Facebook users, and a directed graph to represent following connections on Twitter.

BFS and DFS are directed graph algorithms. Many canonical unweighted digraph problems (e.g. shortest paths and reachability) can be solved using code that is effectively identical to code for solving problems on unweighted graphs.

Important digraph problems.

Important digraph tools.

Recommended Problems

C level

  1. Spring 08 Final, #1a, #1b
  2. Fall 08 Final, #1a, #1b
  3. Fall 10 Final, #3a
  4. Spring 12 final, #3a, #3ab

B level

  1. Spring 08 Final, #1c, #1d
  2. Fall 08 Final, #1c
  3. Fall 10 Final, #3b
  4. Fall 10 Final, #2d, #2e
  5. Textbook: 4.2.10, 4.2.20

A level

  1. Fall 2008, #11
  2. 4.2.19, 4.2.22 (exam will not require you to know why Kosaraju-Sharir works)
  3. 4.2.27
  4. 4.2.40

Just for fun

  1. Write code similar to (or copy) the webcrawler shown in lecture.

Minimum Spanning Trees

Definition of a weighted graph and an MST.

Cuts.

Manually performing Kruskal's and Prim's algorithms. These are easy and you should be able to do them even under the influence of powerful sedatives.

Why Kruskal's and Prim's algorithms work.

Implementing Kruskal's and Prim's algorithms.

Recommended Problems

C level

  1. Spring 08 Final, #2a, #2b
  2. Fall 08 Final, #2a, #2b
  3. Fall 09 Final, #1b
  4. Fall 09 Final, #3a, #3c
  5. Fall 10 Final, #4a, #4b
  6. Fall 10 Final, #3a, #3b
  7. Spring 12 Final, #4a, #4b
  8. Would Kruskal's or Prim's algorithm work with weighted directed graphs?

B level

  1. Fall 09 Final, #3b
  2. Fall 12 Final, #4a, #4b, #4c, #4d
  3. Textbook 4.3.8 - this is a particularly handy concept!
  4. Textbook 4.3.12
  5. Textbook 4.3.13
  6. Textbook 4.3.20
  7. True or False: Given any two components that are generated as Kruskal's algorithm is running (but before it has completed), the smallest edge connecting those two components is part of the MST.
  8. Textbook 4.3.19 - to be clear, they mean that the PQ is implemented with a sorted list.
  9. Note the solution on the booksite is for an unsorted list. That's also a fine problem.

A level

  1. Spring 08 Final, #3
  2. Fall 09 Final, #3d
  3. Textbook 4.3.26

Just for fun

  1. Figure out if Kruskal's, Prim's Lazy, or Prim's Eager algorithm are faster on "typical" graphs. Use the code from the booksite.

Shortest Paths

Key Concepts

Graph Representation

Shortest path representation

Edge relaxation

Manually performing Dijkstra's, Acyclic Shortest Paths, and Bellman Ford - These should be very easy to manully execute.

Bellman Ford algorithm

Dijkstra's algorithm

Acyclic shortest paths

Recommended Problems

C level

  1. Fall 2009 Final, #4
  2. Fall 2010 Final, #5
  3. Textbook 4.3.1 and 4.4.1

B level

  1. Spring 2008 Final, #11 (this should be very easy if you've done Seam Carving!)
  2. Fall 2011 Final, #4
  3. Spring 2012 Final, #5
  4. Fall 2012 Final, #5
  5. Think about how to solve the undirected version of the shortest paths problem. Why is it easy for graphs with positive edge weights? Why is it hard if we include negative edge weights?
  6. Textbook 4.4.25
  7. Since Prim's and Dijkstra's algorithms are so similar, why do we need to check for cycle in Prim's, but not in Dijkstra's?
  8. Textbook 4.4.42 and 4.4.44

A level

  1. Fall 2008 Final, #11 (I remember getting asked this at a career fair as a freshman!)
  2. Textbook 4.4.34
  3. Textbook 4.4.37

Max-Flow

Key Concepts

Graph Representation

Max-flow, min-cut, and Ford-Fulkerson basics

Ford-Fulkerson details

Recommended Problems

C level

  1. Fall 2011, #11
  2. Spring 2012, #11
  3. Fall 2012, #6

B level

  1. Textbook 6.36
  2. Textbook 6.38

A level

  1. Textbook 6.37

String sorts

Key Concepts

Terminology Key indexed counting. Allows you to sort N keys in linear time (better than linearithmic). Beats linearithmic bound by avoiding any comparisons. This is a completely different philosophy for how things should be sorted. This is the most important concept for this lecture.

Manually performing LSD and MSD. Should be doable in your sleep.

LSD.

MSD.

Implementation issues.

Recommended Problems

C level

  1. Spring 2012 Final, #6

B level

  1. Fall 2008 Final, #6
  2. Fall 2012 Final, #7
  3. Spring 2008 Final, #12
  4. Textbook 5.1.8, 5.1.10

A level

  1. How could we avoid the performance hit from our special charAt function?
  2. What makes MSD cache unfriendly?
  3. Textbook 5.1.22. The results may surprise you.

Tries

Key Concepts

Terminology and Basics Hash Tables and LLRBs with String keys Tries TSTS

Recommended Problems

C level

  1. Spring 2008 Final, #5
  2. Spring 2008 Final, #4
  3. Fall 2011 Final, #8
  4. Fall 2012 Final, #8
  5. Textbook 5.2.3, 5.2.4

B level

  1. Fall 2009 Final, #5
  2. Spring 2012 Final, #9
  3. Textbook 5.2.21 (just design the API)
  4. When would we want to use an R-way trie instead of a TST?
  5. Give a worst case input that causes a TST to be unbalanced (and thus slow).
  6. Is the number of character compares in an R-way trie strictly better than for an LLRB? For a hash table? Is a trie guaranteed to be faster in overall run time than an LLRB or a hash table?

A level

  1. When might we want to use a hash table instead of a TST?
  2. What would happen if we had an R-way trie where we keep our next nodes in a linked list? How would this affect memory performance? Timing? What if we used an LLRB? A hash table?
  3. Repeat Spring 2013 midterm problem #8, but assume your keys are Strings instead of Comparables.

Substring Matching

Key Concepts

Terminology and Basics KMP Boyer-Moore Rabin Karp

Recommended Problems

C level

  1. Spring 2008 Final, #6
  2. Fall 2009 Final, #6
  3. Fall 2012 Final, #10

B level

  1. Fall 2009 Final, #5
  2. Fall 2009 Final, #10 [great problem!]
  3. Fall 2010 Final, #10
  4. Fall 2011 Final, #6
  5. Spring 2012 Final, #7
  6. Fall 2012 Final, #9
  7. Give an example of when you might want to use KMP? Boyer Moore? Rabin Karp?

A level

  1. Fall 2010 Final, #10
  2. Fall 2012 Final, #14
  3. Give an example of when you might prefer to use the Monte Carlo version of Rabin Karp over the Las Vegas version.
  4. Textbook: 5.3.22
  5. Textbook: 5.3.26

Regular Expressions

Key Concepts

Terminology and Basics NFA Simulation NFA Construction

Recommended Problems

C level

  1. Spring 2008 Final, #8
  2. Fall 2008 Final, #8
  3. Fall 2009 Final, #7
  4. Fall 2010 Final, #9
  5. Spring 2012 Final, #10
  6. Spring 2012 Final, #11
  7. 5.4.1, 5.4.2

B level

  1. Fall 2011 Final, #7
  2. Textbook 5.4.16, 5.4.17, 5.4.18

A level

None!

Just for fun

  1. 5.4.13

Data Compression

Key Concepts

Terminology and Basics Run length coding Huffman coding LZW Kolmogorov Complexity

Recommended Problems

C level

  1. Spring 2008 Final, #4
  2. Fall 2008 Final, #7
  3. Fall 2011 Final, #10b
  4. Textbook 5.5.3
  5. Why is the Kolmogorov complexity of "a" (relatively) easy to find?

B level

  1. Fall 2011 Final, #10a
  2. Textbook 5.5.13
  3. Textbook 5.5.17
  4. Is the Kolmogorov complexity of π small or large?

A level

  1. Fall 2012 Final, #13
  2. Textbook 5.5.23 (don't know the answer myself, I'm quite curious)

Reductions and Intractability

Key Concepts

Recommended Problems

Note: The Fall 2013 final will not feature one of those really sneaky reduction problems (typically part a of the B level and A level problems below). However, we will probably ask you about how reductions work.

C level

  1. Textbook 6.52

B level

  1. Textbook 6.57
  2. Textbook 6.61, but replace "search" with "decision" to match our terminology from class.
  3. Textbook 6.62
  4. Spring 2008 Final, #8
  5. Fall 2008 Final, #12
  6. Fall 2009 Final, #11
  7. What does the Venn diagram look like for NP-Complete and NP?
  8. If P=NP, then why is the set of NP-Complete problems the same as P?

A level

  1. Fall 2010 Final, #10
  2. Fall 2011 Final, #13
  3. Spring 2012 Final, #13
  4. Fall 2012 Final, #15