ANALYSIS OF ALGORITHMS STUDY GUIDE


Empirical analysis. If the running time of our program (approximately) obeys a power law T(n) ~ a n^b, we can use a doubling hypothesis to estimate the constant a and the exponent b.
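
Under the power-law hypothesis, T(2n)/T(n) is approximately 2^b, so b is approximately log2 of the doubling ratio, and a can then be solved from T(n) = a n^b. The sketch below shows only that arithmetic. The class name DoublingEstimate and the timings in main are hypothetical, chosen for illustration rather than measured.

```java
// Sketch: estimate a and b in T(n) ~ a n^b from two timings (doubling hypothesis).
public class DoublingEstimate {

    // If T(n) ~ a n^b, then T(2n)/T(n) approaches 2^b, so b = log2(ratio).
    static double estimateExponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    // With b known, solve T(n) = a n^b for a.
    static double estimateCoefficient(double n, double timeN, double b) {
        return timeN / Math.pow(n, b);
    }

    public static void main(String[] args) {
        // Hypothetical timings for illustration only (not measured data).
        double n = 4000, timeN = 1.0, time2N = 8.0;
        double b = estimateExponent(timeN, time2N);
        // Use the larger input to estimate a, since lower-order terms matter less there.
        double a = estimateCoefficient(2 * n, time2N, b);
        System.out.printf("b = %.2f, a = %.3e%n", b, a);
    }
}
```

In practice one would time several successive doublings and check that the estimate of b settles down before trusting it.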

Tilde notation. We say that f(n) ~ g(n) if f(n)/g(n) converges to 1 as n gets large. This is a general concept about mathematical functions and is not restricted to running time, memory, or any other specific domain.
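
For example, f(n) = n(n-1)/2 ~ n^2/2, because the ratio f(n)/g(n) is exactly 1 - 1/n, which converges to 1, even though the difference f(n) - g(n) = -n/2 grows without bound. A small sketch that prints the ratio for growing n (the class name TildeDemo is hypothetical):

```java
// Sketch: watch f(n)/g(n) approach 1 for f(n) = n(n-1)/2 and g(n) = n^2/2.
public class TildeDemo {

    static double f(double n) { return n * (n - 1) / 2; }  // exact function
    static double g(double n) { return n * n / 2; }        // leading term only

    public static void main(String[] args) {
        // n = 10, 100, ..., 1,000,000; the ratio equals 1 - 1/n.
        for (double n = 10; n <= 1e6; n *= 10)
            System.out.printf("n = %8.0f   f(n)/g(n) = %.6f%n", n, f(n) / g(n));
    }
}
```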

Cost model. For theoretical analyses of running time in COS 226, we will assume a cost model, namely that some particular operation (or operations) dominates the running time of a program. We then express the running time as the number of times that operation is executed, as a function of the input size. To simplify things, we usually give this frequency count in tilde notation.
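
As an illustration, take a brute-force 2-sum (count the pairs i < j with a[i] + a[j] == 0) and choose the inner-loop test as the cost model. The test runs once per pair, so the frequency count is n(n-1)/2 ~ n^2/2 regardless of the array contents. The sketch below instruments that count; the class name CostModelDemo and the counter field are hypothetical.

```java
// Sketch: cost model = number of times the inner-loop test a[i] + a[j] == 0 runs.
public class CostModelDemo {

    static long compares = 0;  // frequency count of the chosen operation

    // Counts pairs (i, j) with i < j and a[i] + a[j] == 0 (brute-force 2-sum).
    static int twoSum(int[] a) {
        int n = a.length, count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                compares++;  // one execution of the operation we are counting
                if (a[i] + a[j] == 0) count++;
            }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            compares = 0;
            twoSum(new int[n]);  // contents do not affect the number of tests
            double tilde = (double) n * n / 2;
            System.out.printf("n = %5d  compares = %9d  n^2/2 = %11.0f%n", n, compares, tilde);
        }
    }
}
```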

Order of growth. If we have two functions f(n) and g(n), and f(n) ~ c g(n) for some constant c > 0, we say the order of growth of f(n) is g(n). Typically g(n) is one of the following functions: 1, log n, n, n log n, n^2, n^3, or 2^n.
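
One practical way to tell these orders of growth apart is by their doubling ratio g(2n)/g(n): about 1 for constant and logarithmic functions, 2 for n, slightly more than 2 for n log n, 4 for n^2, 8 for n^3, and 2^n for the exponential. The sketch below computes these ratios at n = 64; the class name GrowthRatios is hypothetical, and the base of the logarithm cancels out of each ratio.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Sketch: the doubling ratio g(2n)/g(n) for the common orders of growth.
public class GrowthRatios {

    static Map<String, DoubleUnaryOperator> functions() {
        Map<String, DoubleUnaryOperator> m = new LinkedHashMap<>();
        m.put("1",       n -> 1.0);
        m.put("log n",   n -> Math.log(n));
        m.put("n",       n -> n);
        m.put("n log n", n -> n * Math.log(n));
        m.put("n^2",     n -> n * n);
        m.put("n^3",     n -> n * n * n);
        m.put("2^n",     n -> Math.pow(2, n));
        return m;
    }

    static double ratio(DoubleUnaryOperator g, double n) {
        return g.applyAsDouble(2 * n) / g.applyAsDouble(n);
    }

    public static void main(String[] args) {
        double n = 64;  // small enough that 2^(2n) still fits in a double
        for (Map.Entry<String, DoubleUnaryOperator> e : functions().entrySet())
            System.out.printf("%-8s g(2n)/g(n) = %.4g%n", e.getKey(), ratio(e.getValue(), n));
    }
}
```

The log n and n log n ratios approach 1 and 2 only slowly as n grows, which is one reason logarithmic factors are hard to detect from timing data alone.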

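Worst-case order of growth isn't everything. Just because one algorithm has a better order of growth than another does not mean that it is faster in practice. We will encounter some notable counterexamples, including quicksort vs. mergesort: quicksort's worst-case order of growth (n^2) is worse than mergesort's (n log n), yet quicksort is typically faster in practice.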

Memory analysis. Know how to calculate the memory usage of an object of a given class using the 64-bit memory model from the textbook.
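
A minimal calculator for this kind of question, assuming the textbook's 64-bit model as commonly stated: 16 bytes of object overhead, 8-byte references, 4-byte ints, 8-byte doubles, a 24-byte array header, and padding of every object to a multiple of 8 bytes. A non-static inner class also stores an 8-byte reference to its enclosing instance. Check these constants against the book; the class name MemoryModel and the example classes are hypothetical.

```java
// Sketch of the textbook's 64-bit memory model (constants are assumptions to verify).
public class MemoryModel {

    static final int OBJECT_OVERHEAD = 16;  // bytes per object header
    static final int ARRAY_OVERHEAD  = 24;  // header + length field + padding
    static final int REFERENCE       = 8;
    static final int INT             = 4;
    static final int DOUBLE          = 8;

    // Round up to a multiple of 8 bytes (padding).
    static long pad(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Memory for one object whose instance variables total fieldBytes.
    static long object(long fieldBytes) {
        return pad(OBJECT_OVERHEAD + fieldBytes);
    }

    // Memory for the array object itself (not for objects its elements refer to).
    static long array(long length, int elementBytes) {
        return pad(ARRAY_OVERHEAD + length * elementBytes);
    }

    public static void main(String[] args) {
        // Non-static inner Node { Item item; Node next; } plus the enclosing-instance
        // reference: 16 + 8 + 8 + 8 = 40 bytes.
        System.out.println("Node:       " + object(3 * REFERENCE));
        // Hypothetical class with one int and one double: 16 + 4 + 8 = 28, padded to 32.
        System.out.println("int+double: " + object(INT + DOUBLE));
        // int[] of length 5: 24 + 20 = 44, padded to 48.
        System.out.println("int[5]:     " + array(5, INT));
    }
}
```

For a linked structure, remember to add the memory of each referenced object (for example, every Node in a linked list) to the memory of the object that holds the references.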

Theoretical and empirical analysis. Hypotheses generated through theoretical analysis (or guesswork like our power law assumption) should be validated with data before being fully trusted.

Recommended Problems

C level

  1. Textbook 1.4.4
  2. Fall 2011 Midterm, #2

B level

  1. Textbook 1.4.5
  2. Spring 2012 Midterm, #1
  3. For each of the functions below, give the tightest order of growth of the running time as a function of n.
     public static int f1(int n) {
        int x = 0;
        for (int i = 0; i < n; i++)
            x++;
        return x;
     }
    
     public static int f2(int n) {
        int x = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < i*i; j++)
                x++;
        return x;
     }
    
     public static int f3(int n) {
        if (n <= 1) return 1;
        return f3(n-1) + f3(n-1);
     }
    
     public static int f4(int n) {
        if (n <= 1) return 1;
        return f4(n/2) + f4(n/2);
     }
    
     public static int f5(int n) {
        if (n <= 1) return 1;
        return f1(n) + f5(n/2) + f5(n/2);
     }
    
     public static void f6(int n) {
        // 1 << i equals 2 raised to the power i.
        // Ignore integer overflow.
        // 1 << i takes constant time.
        for (int i = 0; i < n; i = 1 << i) { }  // empty loop body
     }
    
  4. Consider the following three algorithms:
    1. Algorithm 1 solves problems of size N by recursively dividing them into 2 sub-problems of size N/2 and combining the results in time c (where c is some constant).
    2. Algorithm 2 solves problems of size N by solving one sub-problem of size N/2 and performing additional processing that takes constant time c.
    3. Algorithm 3 solves problems of size N by solving two sub-problems of size N/2 and performing a linear amount (i.e., cN where c is some constant) of extra work.
    (a) For each algorithm, write down a recurrence relation showing how T(N), the running time on an instance of size N, depends on the running time of a smaller instance.
    (b) Solve each recurrence relation: what is T(N) in tilde notation?

  5. Suppose we wanted to simulate percolation in a cube with N sites on a side, with each site connected to its neighbors up, down, left, right, forward, and back. If we used WeightedQuickUnionUF, what would be the order of growth of the expected running time, as a function of N?

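    To check your answers to problem 4, you can evaluate each recurrence exactly for powers of two and compare the values with your tilde expressions. A minimal sketch, assuming c = 1 and a base case T(1) = 1 (the problem fixes neither); the class and method names are hypothetical.

```java
// Sketch: evaluate the three problem-4 recurrences exactly for N = 2^k.
// Assumptions (not given in the problem): c = 1 and T(1) = 1.
public class Recurrences {

    static final long C = 1;

    // Algorithm 1: two half-size sub-problems plus constant combining time.
    static long t1(long n) { return n <= 1 ? 1 : 2 * t1(n / 2) + C; }

    // Algorithm 2: one half-size sub-problem plus constant extra time.
    static long t2(long n) { return n <= 1 ? 1 : t2(n / 2) + C; }

    // Algorithm 3: two half-size sub-problems plus linear extra work.
    static long t3(long n) { return n <= 1 ? 1 : 2 * t3(n / 2) + C * n; }

    public static void main(String[] args) {
        for (long n = 2; n <= (1L << 20); n *= 4)
            System.out.printf("N = %8d  T1 = %9d  T2 = %3d  T3 = %10d%n",
                              n, t1(n), t2(n), t3(n));
    }
}
```

    Dividing each column by a candidate answer (for example, by N, lg N, or N lg N) should give ratios that approach a constant as N grows.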

A level

  1. The code below operates on bacterial genomes of approximately 1 megabyte in size.
        // In is the textbook's input library class; exch(a, i, j) swaps a[i] and a[j].
        int N = Integer.parseInt(args[0]);
        String[] genomes = new String[N];
        for (int i = 0; i < N; i++) {
            In gfile = new In("genomeFile" + i + ".txt");
            genomes[i] = gfile.readString();
        }
        for (int i = 1; i < N; i++) {
            for (int j = i; j > 0; j--) {
                if (genomes[j-1].length() > genomes[j].length())
                    exch(genomes, j-1, j);
                else break;
            }
        }
    
    1. What is the theoretical order of growth of the worst-case running time as a function of N?
    2. A table of running times for the program above is given below. Approximate the empirical running time in tilde notation as a function of N. Do not leave your answer in terms of logarithms.
            N    Time (s)
            1    0.15
            2    0.14
            4    0.19
            8    0.41
           16    0.85
           32    1.66
           64    3.38
            
    3. Explain any discrepancy between your answers to parts 1 and 2. Be as specific and detailed as possible.
