COS 429 - Computer Vision

Fall 2019



Assignment 2: Face Detection and Model Fitting

Due Thursday, Oct. 17


Part II. Training a Face Classifier

Dalal-Triggs: The 2005 paper by Dalal and Triggs proposes to perform pedestrian detection by training a classifier on Histograms of Oriented Gradients (HoG), then applying that classifier throughout an image. Your next task is to train that classifier and test how well it works.

Training: At training time, you will need two sets of images: ones containing faces and ones containing nonfaces. The set of faces provided to you comes from the Caltech 10,000 Web Faces dataset. The dataset has been trimmed to 6,000 faces, by eliminating images that are not large or frontal enough. Then, each image is cropped to just the face, and is resized to a uniform 36x36 pixels. All of these images are grayscale.

The non-face images come from Wu et al. and the SUN scene database. You'll need to do a bit more work to use these, though, since they come in a variety of sizes and include entire scenes. So, you will need to randomly sample patches from these images, and resize each patch to the same 36x36 pixels as the face dataset.
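The patch-sampling step above could be sketched as follows. This is only an illustration, not the required implementation: the function name `sample_patch` and the nearest-neighbor resize are assumptions, and your `get_training_data.py` may structure this differently (e.g., using a library resize).

```python
import numpy as np

def sample_patch(img, min_size=36, rng=None):
    """Sample one random square patch (side >= min_size) from a grayscale
    image and shrink it to min_size x min_size via nearest-neighbor
    indexing. Assumes the image is at least min_size on each side.
    Illustrative sketch only; not the assignment's actual code."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    size = rng.integers(min_size, min(h, w) + 1)   # random side length
    top = rng.integers(0, h - size + 1)            # random position
    left = rng.integers(0, w - size + 1)
    patch = img[top:top + size, left:left + size]
    # nearest-neighbor resize: pick min_size evenly spaced rows/columns
    idx = (np.arange(min_size) * size) // min_size
    return patch[np.ix_(idx, idx)]
```

The key constraint from the assignment is that both the position and the size are random, with the size never smaller than the 36x36 target.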

Once you have your positive and negative examples (faces and nonfaces, respectively), you'll compute the HoG descriptor (a partial implementation of which is provided for you) on each one. Finally, you'll train the logistic regression classifier you wrote in Part I to run on the feature vectors.
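The pipeline above (descriptors from positives and negatives, then labels for the classifier) could be sketched like this. The helper name `build_dataset` is hypothetical, and `descriptor` stands in for the provided `hog36` function; your own code need not be organized this way.

```python
import numpy as np

def build_dataset(face_imgs, nonface_imgs, descriptor):
    """Turn lists of positive and negative 36x36 patches into a feature
    matrix X and label vector y for the Part I logistic-regression
    trainer. 'descriptor' stands in for the provided hog36 function.
    Names here are illustrative, not from the starter code."""
    X = np.stack([descriptor(im) for im in face_imgs + nonface_imgs])
    y = np.concatenate([np.ones(len(face_imgs)),       # faces -> label 1
                        np.zeros(len(nonface_imgs))])  # nonfaces -> label 0
    return X, y
```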

As mentioned in the Dalal and Triggs paper and in class, the HoG descriptor has a number of parameters that affect its performance. Two of these are exposed as inputs to the hog36 function, but the code to use them must be implemented. These parameters are the number of orientation bins, and whether those bins cover the full 360 degrees or orientations 180 degrees apart are collapsed together. Implement the code needed to use the parameters orientations and wrap180, and then experiment with these parameters to see whether the conclusions reached in the paper (i.e., performance does not improve beyond about 9 orientations, and it doesn't matter whether or not you wrap at 180 degrees) hold equally well for face detection as for pedestrian detection.
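To make the two parameters concrete, the following sketch maps a gradient direction to a bin index. This is a simplified illustration of the idea only: the real hog36 may use different angle conventions and soft (interpolated) binning, and the function name here is hypothetical.

```python
import numpy as np

def orientation_bin(gx, gy, orientations=9, wrap180=True):
    """Map gradient components (gx, gy) to an orientation-bin index.
    wrap180=True collapses directions 180 degrees apart into one bin;
    wrap180=False keeps the full 360-degree range. Hard binning only --
    hog36's actual binning/interpolation may differ."""
    theta = np.arctan2(gy, gx)            # angle in (-pi, pi]
    if wrap180:
        theta = np.mod(theta, np.pi)      # collapse opposite directions
        span = np.pi
    else:
        theta = np.mod(theta, 2 * np.pi)  # keep all 360 degrees
        span = 2 * np.pi
    return int(np.floor(theta / span * orientations)) % orientations
```

Note how a leftward gradient and a rightward gradient land in the same bin when wrap180 is on, but in different bins when it is off; that is exactly the distinction you are asked to experiment with.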

Predicting: One of the nice things about logistic regression is that its output ranges from 0 to 1, and is naturally interpreted as a probability. (In fact, using logistic regression is equivalent to assuming that the two classes are distributed according to Gaussian models with different means but the same variance.) In lecture, we thresholded the output of the learned model to get a 0/1 prediction for the class, which effectively thresholded at a probability of 0.5. But for face detection you may wish to bias the detector to give fewer false positives (i.e., be less likely to mark non-faces as faces) or fewer false negatives (i.e., be less likely to mark faces as non-faces). Therefore, you will look at graphs that plot the false-negative rate vs. false-positive rate as the threshold of probability is changed. Curves that lie closer to the bottom-left corner indicate better performance. Of course, you will look at performance on both the training set (for which you expect great performance) and a separate test set (which may or may not perform as well).
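The false-negative vs. false-positive curve described above can be computed by sweeping the probability threshold, along these lines. The helper name `fn_fp_curve` is hypothetical and not part of the provided starter code.

```python
import numpy as np

def fn_fp_curve(probs, labels, thresholds=None):
    """For each threshold, compute the false-positive rate (nonfaces
    marked as faces) and false-negative rate (faces marked as nonfaces).
    probs: predicted face probabilities; labels: 1 = face, 0 = nonface.
    Illustrative helper, not from the assignment's starter code."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    probs, labels = np.asarray(probs), np.asarray(labels)
    fp, fn = [], []
    for t in thresholds:
        pred = probs >= t                       # 1 = predicted face
        fn.append(np.mean(~pred[labels == 1]))  # faces we missed
        fp.append(np.mean(pred[labels == 0]))   # nonfaces we accepted
    return np.array(fp), np.array(fn)
```

Plotting fn against fp traces out the curve; moving the threshold toward 1 trades false positives for false negatives, and vice versa.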



Do this and turn in:

  1. Implement logistic_prob.py, get_training_data.py and get_testing_data.py. Look for sections marked "Fill in here". The trickiest part is likely to be selecting random squares (i.e., random positions and random sizes no smaller than 36x36) from the nonface images. Turn in these three files.
  2. Run

    from test_face_classifier import test_face_classifier
    test_face_classifier(250, 100, 4, True)

    in the Python console. This trains the classifier on 250 faces and 250 nonfaces, then tests it on 100 faces and 100 nonfaces, using 4 orientations wrapped at 180 degrees. If all goes well, training should complete in a few seconds, and you should see two plots (again, one at a time) for training and testing performance. The training plot should be boring: the curve runs along the bottom and left edges of the graph, indicating perfect performance. The testing plot, though, should indicate imperfect performance. Turn in the training and testing plots.
  3. Train the classifier on 6,000 faces and 6,000 nonfaces, and test it on 500 faces and 500 nonfaces. Training will take longer, but should still finish within a few minutes, depending on your CPU. Note that, with the same test set, testing accuracy should increase significantly with the additional training data. Turn in the training and testing plots.
  4. Train the classifier while increasing the number of orientations from 4 to 6, 9, and 12. Use 12,000 training images (6,000 faces and 6,000 nonfaces) and 1,000 test images (500 faces and 500 nonfaces) as before. Briefly describe what happens to test accuracy.
  5. Modify hog36.py to disable wrapping of orientations at 180 degrees. Turn in hog36.py. Train the classifier with 12,000 training images (6,000 faces and 6,000 nonfaces), 1,000 test images (500 faces and 500 nonfaces), 9 orientations, and no wrapping. Do you see the same behavior as Dalal and Triggs, in that turning off the wrapping of orientations at 180 degrees makes little difference to accuracy? Briefly explain why (or why not) that is the case.
  6. In parts III and IV of this assignment, you will run this detector at many locations throughout an image that may or may not contain some faces. Would you prefer to run the detector with a threshold that favors fewer false positives, fewer false negatives, or some balance? Briefly explain why.


Acknowledgment: idea and datasets courtesy James Hays.






Last update 24-Oct-2019 16:41:21