COS 429: Computer Vision, Fall 2013
Assignment 2: Image Mosaics

Part 1: Thought Exercise

In one paragraph, please describe why you think SIFT features are so effective for correspondence-finding applications in computer vision, as compared to other methods. Is it mostly that the SIFT feature locations detected with DoG (differences of Gaussians) are especially repeatable across different images (compared to corners, for example)? Is it that extracting extrema in both scale and space is critical to finding salient feature locations? Is it that estimating scale and orientation is critical to effective matching? Is it that histograms of gradient directions are especially effective at distinguishing correct feature matches from incorrect ones (compared to L2 distances between pixels in KxK windows, for example)? Or is it a particularly well-chosen combination of all these factors, and if so, why and how are the methods synergistic? Please support your answer with detailed descriptions of the SIFT algorithm where possible.

 

Part 2: Programming Exercise

Your goal for this part of the assignment is to write a MATLAB program to create an image mosaic out of two overlapping input images. For example, the two images shown on the left below have been "stitched" into the panorama shown on the right.

[Figure: Input 1 and Input 2 (left), Output Panorama (right)]

Creating a panoramic image requires mapping one image plane to the other. Since in general we do not know how to relate the position and orientation of the two camera views, we will use the image feature techniques discussed in class to recover the underlying mapping. First, we will identify salient feature points in both images. Then, we will find correspondences between those feature points. Next, we will compute a transformation that maps corresponding feature points onto one another. Finally, once we have the transformation, we can warp one image onto the other and composite the two images to generate the final result.

These steps can be coded in MATLAB with the following functions (detailed descriptions of input and output variables appear in the code skeleton provided in the cos429_assignment2.zip file). Please implement at least the ones marked in bold:

Step A: Feature detection
features = detectfeatures(input_image, max_features, algorithm)
produces a 4xF matrix representing the 2D locations, scales, and orientations of salient feature points in the input image, where F is the number of features detected (F <= max_features) and algorithm can be any of the following:

Step B: Feature description
descriptors = computedescriptors(input_image, features, algorithm)
produces a KxF matrix holding one K-dimensional descriptor per feature, where F = size(features, 2) and K is the size of the descriptor, which differs between algorithms (e.g., K=128 for SIFT).
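
As a rough illustration of the KxF layout, the sketch below builds a window-style descriptor. It assumes that 'window' means the raw intensities in a w-by-w patch around each feature (so K = w^2), normalized to zero mean and unit length, and that the first two rows of the feature matrix hold x and y. It ignores scale and orientation, and every variable name is illustrative rather than taken from the code skeleton.

```matlab
% Sketch only: assumes 'window' means a flattened, normalized w-by-w patch.
% All variable names here are illustrative, not taken from the skeleton.
img = magic(9) / 81;              % stand-in for a grayscale image
features = [5 1; 5 9; 1 1; 0 0];  % 4xF: rows assumed to be x, y, scale, orientation
w = 3;                            % window width (odd), so K = w^2
r = (w - 1) / 2;                  % window half-width

% Replicate border pixels so windows near the image edge stay in bounds.
[h, wd] = size(img);
rows = min(max((1:h + 2*r) - r, 1), h);
cols = min(max((1:wd + 2*r) - r, 1), wd);
padded = img(rows, cols);

F = size(features, 2);
descriptors = zeros(w^2, F);
for f = 1:F
    x = round(features(1, f));    % column index in the original image
    y = round(features(2, f));    % row index in the original image
    win = padded(y:y + 2*r, x:x + 2*r);         % window centered on (x, y)
    d = win(:) - mean(win(:));                  % remove brightness offset
    descriptors(:, f) = d / max(norm(d), eps);  % remove contrast scaling
end
```

Subtracting the mean and dividing by the norm makes the descriptor insensitive to affine brightness changes, but not to rotation or scale, which is one of the contrasts with SIFT raised in Part 1.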

Step C: Feature matching
matches = findmatches(features1, descriptors1, features2, descriptors2, max_matches, algorithm)
produces a 2xM matrix of integers representing the indices of the features in features1 and features2 that provide the best pairwise matches, where M is the number of detected matches (M <= max_matches), and algorithm is one of the following:

Step D: Feature correspondence
correspondences = findcorrespondences(features1, descriptors1, features2, descriptors2, matches, algorithm)
produces a 2xC matrix of integers representing the indices of the features from features1 and features2 that provide the best set of correspondences consistent with a homography transformation among the provided matches, where C is the number of correspondences found, and algorithm is one of the following:

Step E: Homography estimation
transform = computetransform(features1, features2, correspondences, algorithm, groundtruth_filename)
produces a 3x3 matrix representing the homography that best aligns the provided correspondences.
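
For intuition about what a 3x3 homography fit involves, here is a minimal sketch of the direct linear transform (DLT) on four synthetic point pairs. It is not necessarily what the provided implementation of this step does; the point sets and the matrix H_true are made up for the example, and coordinate normalization (which improves conditioning on real data) is omitted for brevity.

```matlab
% Sketch only: direct linear transform (DLT) for a homography from N >= 4
% point pairs. The data below is synthetic.
p1 = [0 1 1 0; 0 0 1 1];            % 2xN points in image 1 (unit square)
H_true = [2 0 3; 0 2 1; 0 0 1];     % known mapping used to synthesize p2
q = H_true * [p1; ones(1, 4)];
p2 = q(1:2, :) ./ q([3 3], :);      % 2xN corresponding points in image 2

N = size(p1, 2);
A = zeros(2 * N, 9);
for i = 1:N
    x = p1(1, i);  y = p1(2, i);
    u = p2(1, i);  v = p2(2, i);
    % Each correspondence gives two linear constraints on the 9 entries of H.
    A(2*i - 1, :) = [-x, -y, -1, 0, 0, 0, u*x, u*y, u];
    A(2*i,     :) = [0, 0, 0, -x, -y, -1, v*x, v*y, v];
end

% The solution is the right singular vector with the smallest singular value.
[~, ~, V] = svd(A);
H = reshape(V(:, end), 3, 3)';      % the vector stores H row by row
H = H / H(3, 3);                    % fix the arbitrary scale (assumes H(3,3) ~= 0)
```

With exact correspondences, H recovers H_true up to rounding error; with more than four noisy pairs the same code returns a least-squares fit in the algebraic sense.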

Step F: Image composition
output_image = compositeimage(input_image1, input_image2, transformation, algorithm)
warps input_image1 by the given transformation and then composites (merges) it with input_image2 using one of these algorithms:

Implementations for many of these steps are provided for you (see notes in parentheses). Your main tasks are to implement the 'harris' algorithm for Step A, the 'window' algorithm for Step B, the 'ratio' algorithm for Step C, and the 'RANSAC' algorithm for Step D. Implementing the 'mutual' algorithm for Step C is optional.
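
To make the Step C output format concrete, the following sketch assumes that 'ratio' refers to the nearest-neighbor distance ratio test: a match is kept only when the closest descriptor in the second image is clearly closer than the second closest. The toy descriptors and the threshold of 0.8 are assumptions for illustration, and the sketch does not enforce max_matches.

```matlab
% Sketch only: assumes 'ratio' means the nearest-neighbor distance ratio test.
descriptors1 = [0 1 5; 0 1 5];                  % KxF1 toy descriptors (K = 2)
descriptors2 = [0.1 5.2 1.0 1.1; 0 5 1.1 1.0];  % KxF2 toy descriptors
ratio_threshold = 0.8;                          % assumed value; tune on your own data

F1 = size(descriptors1, 2);
F2 = size(descriptors2, 2);                     % must be at least 2
matches = zeros(2, 0);
for i = 1:F1
    % Euclidean distance from descriptor i to every descriptor in image 2.
    diffs = descriptors2 - repmat(descriptors1(:, i), 1, F2);
    dists = sqrt(sum(diffs .^ 2, 1));
    [sorted, order] = sort(dists);
    % Keep the match only if the best candidate clearly beats the runner-up.
    if sorted(1) < ratio_threshold * sorted(2)
        matches(:, end + 1) = [i; order(1)];
    end
end
```

In this toy data the second descriptor of image 1 has two equally close candidates in image 2, so it is rejected as ambiguous, while the first and third descriptors each have a single clear nearest neighbor.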

 

Part 3: Experimentation

The previous part of the assignment describes a pipeline for image mosaicing. There are multiple possible implementations for each of the steps. For this part of the assignment, we would like you to experiment with different design choices and evaluate how well different algorithms work.

Specifically, please implement an algorithm of your own choice to improve at least ONE of the steps -- i.e., create an 'awesome' algorithm for any step (except E). The "Experimentation" section of your writeup should include a description of your modification, an explanation of why you chose it, and an analysis of how and why your 'awesome' algorithm improves or hurts the results.

Please think carefully about how to design the experiment to test whether your modification improves the results. At the very least, you should show images and quantitative evaluations comparing results on a small set of images with your 'awesome' algorithm against the other options for the same step, using the same combination of options for the other steps.

 

Part 4: Results and Analysis

You should execute your program using runme.m on a variety of input test image pairs to investigate how well it works with different combinations of algorithms and under different input conditions.

First, please show outputs of your program for all of the images in the "input" subdirectory of cos429_assignment2.zip and at least one pair of images taken with your own camera using the 'sift', 'sift', 'ratio', 'RANSAC', 'cp2tform', and 'simple' options (for Steps A through F, respectively). Note that some of these inputs are HARD, and so you should not expect to get perfect results for all of them.

Second, please compute and compare results for a small set of images (of your choosing) with the four possible combinations of using the 'sift' and 'corner' options for feature detection and the 'sift' and 'window' options for feature description (along with 'ratio', 'RANSAC', 'cp2tform', and 'simple'). For each of the four combinations, please provide overlay images and quantitative evaluations to compare the results in your writeup -- these comparisons can inform your answer to the thought exercise in Part 1.

Please include these results in a third section of your writeup titled "Results and Analysis." In addition to showing images, please provide a short discussion of how well your program works. Overall, the goal of this section is to answer questions like: When does your program succeed? When does it fail? Which step(s) are failing when the program fails? What are the key attributes of input images that affect its success? What parameter settings affect its success? You do not have to answer all of these questions, but you should discuss at least one characteristic of the input image pairs and/or parameters that affects the quality of your results, demonstrated by results for a set of input pairs spanning cases where the algorithm does and doesn't work well.

To facilitate your evaluations and comparisons, we provide 'ground truth' results for each of the test images and an evaluation metric to measure how well the homography computed by your program matches the ground truth. The evaluation metric is the average distance between the position to which pixels in input_image1 are mapped by your program versus the position to which they are mapped by the ground truth homography (see computeerror.m). You can create ground truth data for your own examples using runme_generategroundtruth.m.
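
A minimal sketch of a metric of this kind is shown below, assuming every pixel of input_image1 is weighted equally and distance is Euclidean. The provided computeerror.m may sample or weight pixels differently, and the two homographies here are hypothetical translations chosen so the result is easy to check by hand.

```matlab
% Sketch only: the provided computeerror.m may differ in its details.
H_est = [1 0 10; 0 1 5; 0 0 1];     % hypothetical estimate: translation by (10, 5)
H_gt  = [1 0 13; 0 1 9; 0 0 1];     % hypothetical ground truth: translation by (13, 9)
height = 4;  width = 6;             % size of input_image1 (tiny, for illustration)

% Homogeneous coordinates of every pixel, one column per pixel.
[X, Y] = meshgrid(1:width, 1:height);
P = [X(:)'; Y(:)'; ones(1, width * height)];

% Map through each homography and divide by the third coordinate.
mapped_est = H_est * P;
mapped_est = mapped_est(1:2, :) ./ mapped_est([3 3], :);
mapped_gt = H_gt * P;
mapped_gt = mapped_gt(1:2, :) ./ mapped_gt([3 3], :);

% Mean Euclidean distance between the two mapped positions, in pixels.
err = mean(sqrt(sum((mapped_est - mapped_gt) .^ 2, 1)));
```

Here the two mappings differ by a constant offset of (3, 4), so the error is 5 pixels at every location and 5 on average.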

 

Getting Started

You should start from cos429_assignment2.zip, which provides the directory structure to follow in your submission, a template for your writeup (writeup/writeup.html), a set of test input images (input/*.jpg), corresponding ground truth mosaics (groundtruth/*.mat), simple templates for the MATLAB functions you must implement, a script to ease creating results for your writeup (code/runme.m), and a script to ease creating ground truth for your own examples (code/runme_generategroundtruth.m).

You should edit the MATLAB files (and possibly create new ones) to implement the algorithms, download new test images into the input subdirectory, execute runme.m to produce your results, and complete your writeup.

 

Submitting Your Solution

Please submit your solution via the dropbox link here.

Your submission should include a single file named "assignment2.zip" with the following structure:

Please follow the general policies for submitting assignments, including the late policy and collaboration policy.