COS 429 - Computer Vision

Fall 2019



Assignment 3: Tracking

Due Thursday, November 21



In this assignment you will build a face tracker based on the Lucas-Kanade algorithm. For background, read the lecture slides and the Lucas-Kanade paper, and look at the notes for Lecture 13.


Part 1. Preliminaries (25 points)


Do this:


1.1 Motion (10 points)

We will be working with two motion models:

  1. Translation by $(u,v)$, where the coordinate $(x,y)$ is updated to a new location $(x',y')$ according to $$ x'=x+u $$ $$ y'=y+v $$
  2. Translation by $(u,v)$ and scale by $s$, where the scale $s$ is defined by the change of the object width from $w$ to $w'$ as: $$ s = \frac{w'-w}{w} $$ Thus $s=0$ means no scale change, $s=1$ means the object doubles in size, $s=-0.5$ means it shrinks to half its size, and $s=-1$ means it shrinks to a single point.

    To define the motion of a point $(x,y)$ due to scale, one needs to define the "center" of motion. We will refer to this arbitrary point as $(x_0,y_0)$. Thus scaling the point $(x,y)$ means $$ x' = x+(x-x_0)*s $$ $$ y' = y+(y-y_0)*s $$

    After scaling, we then apply the translation, producing the full motion equations: $$ x' = x+u+(x-x_0)*s $$ $$ y' = y+v+(y-y_0)*s $$

In the code, the file uvs.py contains functions that manipulate $(u,v,s)$ motion models. The data structure they use is simply a vector of 5 elements: [u, v, s, x0, y0].
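For concreteness, here is a minimal sketch (not from the starter code) of applying such a motion vector to a point, following the equations above:

    def apply_uvs(uvs, x, y):
        # uvs = [u, v, s, x0, y0]
        u, v, s, x0, y0 = uvs
        return x + u + (x - x0) * s, y + v + (y - y0) * s

    # Example: pure doubling (s=1) about the origin maps (3, 4) to (6, 8).
    print(apply_uvs([0, 0, 1, 0, 0], 3, 4))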


Do this and turn in:


1.2 Rectangles (10 points)

In frame-to-frame alignment, our goal is to estimate these parameters, $(u,v)$ or $(u,v,s)$, from a local part of a given pair of images. For simplicity, we will limit ourselves to rectangular areas of the image. Our rectangles are defined as rect = [xmin, xmax, ymin, ymax], and the file rect.py contains functions to manipulate rectangles. (Technical note: the sides of a rect can take on non-integer values, so a rectangle may include only part of some pixels. Pixels are centered on the integer grid; thus a rectangle that contains exactly the pixel at [1,3] would be rect = [0.5, 1.5, 2.5, 3.5].)

Given two rectangles (of the same aspect ratio), we can compute the $(u,v,s)$ motion between them. This can be useful for defining an initial motion for tracking if you have a guess of two rectangles that bound the object (e.g. by running your face detector from Assignment 2).
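As a sketch of how such a motion could be computed (the starter code may already provide an equivalent helper under a different name), take the center of the first rectangle as the center of motion; scaling about the center leaves it fixed, so $(u,v)$ is just the displacement between the two centers:

    def rects_to_uvs(rect1, rect2):
        # rect = [xmin, xmax, ymin, ymax]; assumes equal aspect ratios
        cx1, cy1 = (rect1[0] + rect1[1]) / 2, (rect1[2] + rect1[3]) / 2
        cx2, cy2 = (rect2[0] + rect2[1]) / 2, (rect2[2] + rect2[3]) / 2
        s = (rect2[1] - rect2[0]) / (rect1[1] - rect1[0]) - 1  # (w'-w)/w
        return [cx2 - cx1, cy2 - cy1, s, cx1, cy1]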


Do this and turn in:


1.3 Image sequence (5 points)

Included with the starter code (in the data directory) are the 'woman' and 'man' video sequences of faces from http://vision.ucsd.edu/~bbabenko/project_miltrack.html .

Each frame of the video is just an image (here stored as a set of .png files). We could represent each image as a plain numpy array, as you have done in the previous assignments, but we will be cutting and warping images, and it is useful for an image to carry a coordinate system that is not necessarily rooted at the top-left corner. Here we will use a "coordinate image" structure (often called coimage or coi in the code). It is just a wrapper around the image with some extra fields, for example:

    coi =
        im:     240x320 numpy array of pixel values
        origin: [30, 20]
        label:  'img00005'
        level:  1

The level field will be useful later to keep track of which pyramid-level this image represents.

The file coi.py contains functions to create and manipulate these images. See the constructor function coimage in coi.py for more information.

The class FaceSequence (in coi.py) simplifies access to the tracking sequences above. See part 1.3 in test.py for some example usage. To initialize:

    fs = FaceSequence('path/to/data/woman')

Then, to access, e.g., the 5th image and read it into a coimage structure, you can do:

    coi = fs.readImage(4)

Note that FaceSequence is 0-indexed. If you want to access a sequence of images, say every 3rd image starting from the 2nd, do:

    fs.next = 1
    fs.step = 3
    coi1 = fs.readNextImage()  # the 2nd
    coi2 = fs.readNextImage()  # the 5th

Additionally, fs contains the "ground truth" rectangles stored with the clip in

     rect = fs.gt_rect[0, :]  # rectangle for 1st image

Beware: only some frames have valid ground truth rectangles; for the remaining frames, rect = [-0.5, -0.5, -0.5, -0.5].
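A short sketch of skipping the frames without valid ground truth (assuming gt_rect is an N x 4 numpy array, as the indexing above suggests):

    for i in range(fs.gt_rect.shape[0]):
        rect = fs.gt_rect[i, :]
        if rect[0] < 0:  # [-0.5, -0.5, -0.5, -0.5] marks missing ground truth
            continue
        # ... use rect ...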

Do this and turn in:


Part 2: LK at a Single Level (35 points)

2.1 (u,v) motion (20 points)

Review the tracking lectures. The Lucas-Kanade algorithm repeatedly warps the "current" image backward according to the current motion estimate so that it becomes similar to the "previous" image inside the area defined by the "previous" rectangle.

Let $N$ be the number of pixels inside the rectangle. Recall that we need to compute a motion update $x = (u,v)$ (for translation-only motion) that satisfies $$ Ax=b $$ where $A$ is an $N \times 2$ matrix, each of whose rows is the image gradient $(dx, dy)$ at some pixel of the previous image, and $b$ is an $N \times 1$ column vector of errors (image intensity differences) between the previous and current image. To solve this with least squares, we compute $$ x = (A^T A)^{-1} A^T b. $$

The new combined motion (input motion + update) should be

    new_mot_u = mot_u + x_u
    new_mot_v = mot_v + x_v

Notice that $A$, and hence $(A^T A)^{-1}$, does not change between iterations: we only need to re-warp the current image according to the updated motion.
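Putting the pieces together, a single translation-only update might look like the sketch below. The variable names are assumptions: prev and warped_cur are the previous image and the backward-warped current image cropped to the rectangle, and dx, dy are the gradients of the previous image over the same pixels.

    import numpy as np

    A = np.stack([dx.ravel(), dy.ravel()], axis=1)  # N x 2 gradient matrix
    b = (prev - warped_cur).ravel()                 # sign must match your warp direction
    x_u, x_v = np.linalg.solve(A.T @ A, A.T @ b)    # least-squares update
    new_mot_u = mot_u + x_u
    new_mot_v = mot_v + x_v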

The function LKonCoImage implements the LK algorithm on a single level.

A default set of params can be generated with LKinitParams.

Do this and turn in:


2.2 (u,v,s) motion (10 points)

Now we will implement motion with scale. The formulas are very similar: $x = (u, v, s)$ is the mot_update we want to solve for, and so each row of $A$ has another column: $$ A_i = (dx_i \; dy_i \; ww_i) $$ where $ww = dx \cdot (x-x_0) + dy \cdot (y-y_0)$ for each pixel inside the rectangle. (Thus, $A^T A$ will be a $3 \times 3$ matrix.)

Hint: coiPixCoords might be useful here.

Finally, combining the update with the existing motion is a bit more involved:

    new_mot_u = mot_u + x_u + x_u * mot_s
    new_mot_v = mot_v + x_v + x_v * mot_s
    new_mot_s = mot_s + x_s + x_s * mot_s
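Under the same assumptions as the translation sketch above, and with xs, ys the pixel coordinates inside the rectangle (e.g. from coiPixCoords), the $(u,v,s)$ update could look like:

    import numpy as np

    ww = dx * (xs - x0) + dy * (ys - y0)            # extra column of A
    A = np.stack([dx.ravel(), dy.ravel(), ww.ravel()], axis=1)  # N x 3
    b = (prev - warped_cur).ravel()
    x_u, x_v, x_s = np.linalg.solve(A.T @ A, A.T @ b)
    new_mot_u = mot_u + x_u + x_u * mot_s
    new_mot_v = mot_v + x_v + x_v * mot_s
    new_mot_s = mot_s + x_s + x_s * mot_s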

Do this and turn in:

2.3 Analysis (5 points)

Experiment with the scale version of LKonCoImage and answer the following questions in your write-up:
  1. How far from the correct motion (in u, v, and s) can the initial motion be while a single level of LK still typically recovers the correct motion? You can give numerical ranges for your answer. Keep the default max_iter for this question. Hint: play with the input motion.
  2. How consistent is the result (when it can converge) given different initial motions? Convergence here means that the algorithm was able to find the correct motion (or was very close). If it found a motion that is nowhere close to correct, then it did not converge.
  3. Can you make it more consistent by changing the parameters max_iter, uvs_min_significant, and err_change_thr? Explain what each of these does and whether/how changing it affects the result.


Part 3: Multi-level LK (25 points)

The function LKonPyramid implements multi-resolution LK. It should call LKonCoImage for each level of a Gaussian image pyramid. The image pyramid is built using

    pyr = coPyramid(coi, levs)

Each time we go "up" a level, the image is subsampled by a factor of 2. The origin of the coordinate system is kept at the same pixel, but the width and height of any object are halved. A feature at coordinate x_L2 on level 2 would have coordinate x_L1 = x_L2 * 2 on level 1. The function rectChangeLevel implements level conversions for rectangles.
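A simplified sketch of the conversion rectChangeLevel performs (the real function in rect.py may additionally handle the half-pixel conventions from part 1.2):

    def rect_change_level_sketch(rect, from_level, to_level):
        # Moving up a level halves every coordinate; moving down doubles it.
        f = 2.0 ** (from_level - to_level)
        return [c * f for c in rect]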

3.1 Active Levels (10 points)

The first task is to figure out which levels of the pyramid we want to work on. The function defineActiveLevels in LK.py should return a vector [L_start, ..., L_end], inclusive, of the levels we can work on. L_start for now should be the lowest level of the given pyramid (since we are willing to work on very big images), but the top level needs to be computed so that the number of pixels in the area of the rectangle is not smaller than specified in the parameter min_pix[0].

You may assume that in a pyramid, levels always start at 1, levels are in order, and levels are not skipped.

Hint: Be sure you are using all of the given arguments (prect, mep1, mep2), as all of them impose constraints on the active levels.
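As a back-of-the-envelope sketch of the min_pix constraint only (variable names here are illustrative, and mep1/mep2 impose further caps on the top level):

    import math

    # The pixel count inside the rect shrinks by 4x per level, so the number
    # of levels we can go up follows from the level-1 area.
    w, h = prect[1] - prect[0], prect[3] - prect[2]
    levels_up = max(math.floor(math.log(w * h / min_pix, 4)), 0)
    L_end = L_start + levels_up  # then cap by the top level of the pyramids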

Do this and turn in:

3.2 Changing Levels (10 points)

We also need to update the motion as we move from level to level. This is done by calling the function uvsChangeLevel in uvs.py.
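For intuition: halving the image halves $u$, $v$, and the center $(x_0,y_0)$, while $s$ is scale-invariant. A minimal sketch of that conversion (uvsChangeLevel in uvs.py is the authoritative version):

    def uvs_change_level_sketch(uvs, from_level, to_level):
        u, v, s, x0, y0 = uvs
        f = 2.0 ** (from_level - to_level)
        return [u * f, v * f, s, x0 * f, y0 * f]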

Do this and turn in:

3.3 Analysis (5 points)

Answer the following questions in your write-up:
  1. Run LKonPyramid on a pair of images from each example sequence ('man' and 'woman'). Note that the further apart the images are in the sequence the worse your results are likely to be. Then, submit in your write-up the figures generated by LKonPyramid in part 3.3 of test.py. There should be a figure generated for each level.
  2. How far can init_motion be from the correct motion? How does this compare with your results from part 2.2 on a single level? Note that part 3 uses test frames that are further away, so you should rerun part 2.2 with the new pair of frames.


Part 4: Incremental Tracking for a Sequence (15 points)

4.1 LK on a Sequence (10 points)

The function LKonSequence runs LKonPyramid on a sequence in an incremental way: 1→2, 2→3, 3→4, ... In other words, it should go through the sequence and call LKonPyramid on pairs of consecutive frames. mots and rects correspond to mot and rect from prior parts of this assignment, and each has length seq_length + 1: for each pair of frames we find a new motion and rectangle, and the extra element stores the initial motion and rectangle.
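The loop could look like the sketch below; the exact LKonPyramid signature and the rect-update helper are assumptions, so adapt them to your own code.

    mots[0], rects[0] = init_mot, init_rect
    prev = fs.readImage(0)
    for i in range(seq_length):
        cur = fs.readImage(i + 1)
        mots[i + 1] = LKonPyramid(prev, cur, rects[i], mots[i], params)
        # Hypothetical helper: apply the motion just found to the rect corners.
        rects[i + 1] = rectMoveUVS(rects[i], mots[i + 1])
        prev = cur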


Do this and turn in:

4.2 Analysis (5 points)


Do this and turn in:


Submitting

This assignment is due Thursday, November 21, 2019 at 11:59 PM. Please see the general notes on submitting your assignments, as well as the late policy and the collaboration policy.

Submissions will be done through Gradescope. This assignment has 2 submissions:

  1. Assignment 3 Written: Submit one single PDF containing your write-up, including code snippets of all the code you added, answers to questions, and all requested figures.
  2. Assignment 3 Code: Please submit all of your code files. This should include coi.py, LK.py, rect.py, uvs.py, and test.py.

Please note that as was the case for Assignment 2, Assignment 3 Code is worth 0 points on Gradescope. We will grade the write-up and code together, but we will put scores into Assignment 3 Written only.


Last update 13-Dec-2019 03:17:23