COS 429 - Computer Vision

Fall 2005

Assignment 2

Due Thursday, Oct. 27

Submission Guidelines

Please submit your write-up as an HTML file, with links to your code and images (or, even better, IMG tags). Nothing fancy is required. If you didn't receive any comments on the formatting of your Assignment 1 submission, it was probably just fine.

1. Questions (30%)

When we were discussing the Hough transform for lines, we saw that parameterizing lines by slope and intercept led to a less uniform parameterization than angle and distance from center. To analyse this nonuniformity, assume that you are given 1000 lines (of random orientations) and you must assign them to 5 buckets based on orientation.
1. How many lines fall in each bucket if the buckets are assigned uniformly based on angle?
2. How many lines fall in each bucket if the buckets are assigned uniformly based on slope? For concreteness, assume that the slopes of the buckets are -2, -1, 0, 1, 2 and that each line is assigned to the bucket that is nearest to its slope.
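One way to sanity-check your answers is to compute the expected counts numerically. The sketch below is in Python rather than Matlab, and it assumes that "random orientations" means the line angle is uniformly distributed over (-90, 90) degrees and that the slope-bucket boundaries lie midway between the bucket centers. The function name `expected_bucket_counts` is made up for illustration.

```python
import math

def expected_bucket_counts(n_lines=1000):
    """Expected number of lines per bucket, assuming a uniformly distributed angle."""
    # Five equal angular ranges each receive the same expected share.
    by_angle = [n_lines / 5.0] * 5
    # Nearest-slope assignment to centers -2..2 puts boundaries at the midpoints.
    slope_edges = [-1.5, -0.5, 0.5, 1.5]
    # Convert slope boundaries to angles; the outer buckets extend to +/- 90 degrees.
    angle_edges = [-math.pi / 2] + [math.atan(s) for s in slope_edges] + [math.pi / 2]
    # Each bucket's expected count is proportional to the angular range it covers.
    by_slope = [n_lines * (hi - lo) / math.pi
                for lo, hi in zip(angle_edges, angle_edges[1:])]
    return by_angle, by_slope

by_angle, by_slope = expected_bucket_counts()
print(by_angle)
print([round(c, 1) for c in by_slope])
```

The unequal counts in the second list come entirely from the arctangent mapping between slope and angle.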
The finite size of an image implies that, on average, the length in pixels of the visible portions of lines close to the image center C is greater than that of lines distant from C. How does this bias the Hough transform? How could you counter this bias?
Imagine you are given two vectors, a "signal" S and a "template" T. Assume T is shorter than S. Now, you want to find the position within S at which T is the best match according to the sum of squared differences (SSD) criterion. That is, given an offset k at which you are looking for T within S, you want to find the k that minimizes

$$\mathrm{SSD}(k) = \sum_{i} \left( S_{i+k} - T_i \right)^2$$

where the sum runs over all elements i of T.

Show how you can do this without an explicit loop over k by computing two convolutions:
1. S convolved with some variant of T
  Hint: this won't necessarily be T itself, but some simple transformation. Look at the equation above, and compare with the definition of discrete convolution.
2. The vector consisting of the square of each element of S, convolved with a vector the same length as T but consisting of all ones.
You will need this result for the face detection portion of the assignment below.
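Once you have worked out the derivation, a small numerical check can confirm it. The sketch below is in Python rather than Matlab and uses a deliberately naive convolution so that it stays self-contained. It expands the squared difference into a windowed sum of S squared, a cross term, and a constant sum of T squared, then compares the result against a brute-force loop. All function names are illustrative, and reversing T is the transformation this sketch assumes for the cross term.

```python
def convolve_full(a, b):
    """Full discrete convolution; the output has len(a) + len(b) - 1 entries."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def ssd_by_convolution(S, T):
    """SSD of T against every fully overlapping window of S (offsets 0..len(S)-len(T))."""
    m = len(T)
    # Convolving with the reversed template computes the cross term sum_i S[i+k]*T[i].
    cross = convolve_full(S, T[::-1])
    # Convolving the squared signal with ones gives the windowed sum of S^2.
    energy = convolve_full([s * s for s in S], [1.0] * m)
    # The template energy does not depend on the offset k.
    t_energy = sum(t * t for t in T)
    # Entry k + m - 1 of a full convolution corresponds to the window starting at k.
    return [energy[k + m - 1] - 2.0 * cross[k + m - 1] + t_energy
            for k in range(len(S) - m + 1)]

def ssd_brute_force(S, T):
    """Reference implementation with an explicit loop over k."""
    return [sum((S[k + i] - T[i]) ** 2 for i in range(len(T)))
            for k in range(len(S) - len(T) + 1)]

S = [0, 1, 3, 2, 5, 4, 1, 0]
T = [2, 5, 4]
scores = ssd_by_convolution(S, T)
best_k = min(range(len(scores)), key=scores.__getitem__)
print(best_k, scores)
```

Because the template energy is constant, it can be dropped when only the location of the minimum matters.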

2. Aligned database of faces (20%)

The remainder of this assignment will be to implement a face detection system trained on a database of examples. The first step is to align and normalize the examples. Download the following images:

cos429_f05_faces_scaled.zip (2 MB)
Note: this includes pictures taken Oct 11 - download this again if you have the earlier version

and make sure you can load them into Matlab. These images are rescaled and slightly cropped from the originals, to make them more manageable. If you would like to work with the originals instead (including pictures with glasses, etc.), those are here:

cos429_f05_faces_orig.zip (72 MB)

Then, implement code that:

Presents each face image in turn
Lets the user click on the centers of the eyes, and stores those coordinates (using the getpts function)
Converts the image to grayscale
Warps the image so that the eye points are mapped to fixed locations, 100 pixels apart horizontally (look up the imtransform and cp2tform functions)
Crops out an appropriate section (e.g., 300 pixels tall by 200 wide) of the image (the easiest way is to use the 'XData' and 'YData' options to imtransform)
Saves the results, so you don't have to do the clicking more than once
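The geometry behind the warping step can be sketched independently of the image toolbox. Two point pairs fully determine a similarity transform (rotation, uniform scale, and translation), which is compact to write with complex numbers. The sketch below is in Python rather than Matlab; the function name `similarity_from_eyes` and the target left-eye position (50, 100) are assumptions chosen for illustration, not values prescribed by the assignment.

```python
def similarity_from_eyes(left_eye, right_eye,
                         target_left=(50.0, 100.0), eye_distance=100.0):
    """Return a function mapping image points so the eyes land on fixed positions.

    Points are (x, y) tuples. The target positions are hypothetical defaults.
    """
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1 = complex(*target_left)
    q2 = q1 + eye_distance          # right eye sits eye_distance pixels to the right
    # A similarity transform is z -> a*z + b for complex a (rotation and scale) and b.
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1

    def warp(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return warp

# Example with made-up eye clicks on a slightly tilted face.
warp = similarity_from_eyes((120.0, 210.0), (180.0, 190.0))
print(warp((120.0, 210.0)), warp((180.0, 190.0)))
```

In Matlab, the same kind of transform would be estimated from the clicked points and then applied to the whole image with the functions named in the steps above.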

The result should be a collection of well-aligned equal-sized images. To verify, look at (and submit) the average of the images, and confirm that it looks like a face. If you're feeling ambitious, use more features than just the eyes to compute alignment. You could also try to localize the eyes and align faces automatically.

3. Face detection (40%)

We're just going to use the average face we computed above as a template. To find faces in a test image, we'll find where subsets of the test image match the template best, using the SSD metric. As you showed above, this can be done using a couple of convolutions.

There are two wrinkles to deal with:

Because the filter is now rather larger than the ones we were using in the edge detector assignment, you'll get much better performance by using the FFT to do the convolution. Implement an FFT-based convolution (using the fft and ifft functions), and compare its performance to conv2 on an image and filter of your choosing.
Since the face can appear at any size, you'll need to do your convolution at multiple scales. This can be done by scaling either the face template or the input image. Pick one of these options, and justify why you chose the one you did.
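The key detail in FFT-based convolution is zero-padding: multiplying transforms gives a circular convolution, so both inputs must be padded to at least the full output length before transforming. The following is a minimal one-dimensional sketch in Python rather than Matlab, with a small radix-2 FFT written out so the example is self-contained. It illustrates the idea and is not the two-dimensional implementation the assignment asks for; the function names are illustrative.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two; the inverse is unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft_convolve(a, b):
    """Full linear convolution of two real sequences via zero-padded FFTs."""
    n_out = len(a) + len(b) - 1
    # Pad to a power of two that is at least the full output length,
    # so the circular convolution does not wrap around.
    n = 1
    while n < n_out:
        n *= 2
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    result = fft(product, inverse=True)
    # Divide by n to normalize the inverse transform and keep the real part.
    return [v.real / n for v in result[:n_out]]

print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
```

The same padding rule applies per dimension in 2D, where the output of the FFT approach corresponds to a full convolution and may need cropping to match other output conventions.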

Run your face detector on some test images of crowds of people. You can use your own, or here are some taken in class:

cos429_f05_faces_target.zip

Show the SSD score image for each test. Devise a method based on thresholding and/or nonmaximum suppression for narrowing the output down to a set of discrete locations in each image where you think there's a face.
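As one possible starting point, thresholding and nonmaximum suppression can be combined in a single pass. Since a low SSD means a good match, the sketch below looks for local minima. It is in Python rather than Matlab and operates on a list of lists; the function name `detect_minima`, the `radius` parameter, and the example threshold are all assumptions chosen for illustration, and you will need to choose values suited to your own score images.

```python
def detect_minima(score, threshold, radius=1):
    """Return (row, col) positions that pass the threshold and are local minima.

    A position is kept if its score is below `threshold` and no score within
    `radius` pixels (square neighborhood) is lower. Tied minima are all kept.
    """
    rows, cols = len(score), len(score[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            v = score[r][c]
            if v >= threshold:
                continue  # thresholding: not a good enough match
            neighborhood = [score[rr][cc]
                            for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                            for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            if v <= min(neighborhood):
                hits.append((r, c))  # suppression: keep only the local best
    return hits

# Tiny made-up score image: two good matches and one mediocre local minimum.
score = [[9, 9, 9, 9, 9],
         [9, 2, 9, 9, 9],
         [9, 9, 9, 7, 9],
         [9, 9, 9, 9, 1]]
hits = detect_minima(score, threshold=5)
print(hits)
```

When searching over multiple scales, the same idea extends to suppressing across neighboring scales as well as neighboring positions.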

4. Eigenfaces (10%)

Run PCA with whitening on your database of faces (read up on the Matlab svd function, especially the svd(X,0) syntax). Show us the top 5 principal components. If you're feeling ambitious, try some recognition experiments with the faces you detected in part (3) - can you recognize faces based on their projection onto the top few principal components and a nearest-neighbor classifier?
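For reference, one common way to relate PCA to the SVD is sketched below. The notation is an assumption of this sketch and does not come from the assignment: $X$ is a $d \times n$ matrix whose columns are the $n$ vectorized face images with the mean face $\mu$ subtracted, and $d > n$.

```latex
% Economy-size SVD of the centered data matrix (d x n, with d > n):
X = U \Sigma V^{T}, \qquad U \in \mathbb{R}^{d \times n}, \quad
\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n), \quad \sigma_1 \ge \dots \ge \sigma_n \ge 0

% The columns u_i of U are the principal components; the variance along u_i is
\lambda_i = \frac{\sigma_i^{2}}{n - 1}

% Projection of a face x onto component i, and its whitened (unit-variance) version:
c_i = u_i^{T} (x - \mu), \qquad
\tilde{c}_i = \frac{c_i}{\sqrt{\lambda_i}} = \frac{\sqrt{n - 1}\; c_i}{\sigma_i}
```

Because the data are centered, $X$ has rank at most $n - 1$, so the smallest singular value is numerically zero and whitening only makes sense for the leading components.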

Last update 29-Dec-2010 12:00:22

smr at princeton edu