COS 429: Computer Vision, Fall 2013
Assignment 2: Image Mosaics

Part 1: Thought Exercise

In one paragraph, please describe why you think SIFT features are so effective for correspondence-finding applications in computer vision, as compared to other methods. Is it mostly that the SIFT feature locations detected with DoG (differences of Gaussians) are especially repeatable across different images (compared to corners, for example)? Is it that extracting extrema in both scale and space is critical to finding salient feature locations? Is it that estimating scale and orientation is critical to effective matching? Is it that histograms of gradient directions are especially effective at distinguishing correct feature matches from incorrect ones (e.g., compared to L2 distances between pixels in KxK windows)? Or is it a particularly well-chosen combination of all these factors? If so, why and how are the methods synergistic? Please support your answer with detailed descriptions of the SIFT algorithm where possible.

Part 2: Programming Exercise

Your goal for this part of the assignment is to write a MATLAB program to create an image mosaic out of two overlapping input images. For example, the two images shown on the left below have been "stitched" into the panorama shown on the right.

[Figure: Input1 and Input2 (left) and the output panorama (right)]

Creating a panoramic image requires mapping one image plane onto the other. Since in general we do not know the relative position and orientation of the two camera views, we will use the image feature techniques discussed in class to recover the underlying mapping. First, we will identify salient feature points in both images. Then, we will find correspondences between those feature points. Next, we will compute a transformation that maps corresponding feature points onto one another. Finally, once we have the transformation, we can warp one image onto the other and composite the two images to generate the final result.

These steps can be coded in MATLAB with the following functions (detailed descriptions of the input and output variables appear in the code skeleton provided in the cos429_assignment2.zip file). Please implement at least the required algorithms, which are summarized after Step F:

Step A: Feature detection
features = detectfeatures(input_image, max_features, algorithm)
produces a 4xF matrix representing the 2D locations, scales, and orientations of salient feature points in the input image, where F is the number of features detected (F <= max_features) and algorithm can be any of the following:

'random': return features at random positions, scales, and orientations (provided).

'sift': return features detected by SIFT (provided).
'harris': return the strongest features computed with the Harris corner detector. For each returned corner feature, the scale should be set based on the window size (and possibly the downsampling factor of the input image), and the orientation should align with the eigenvector associated with the largest eigenvalue of the covariance matrix of image gradients computed for the feature.

'awesome': return features computed with your own algorithm.
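For the 'harris' option, both the corner score and the requested orientation come from the 2x2 second-moment (gradient covariance) matrix of a window. The sketch below is in Python rather than MATLAB so that it stays self-contained; the function name harris_response_and_orientation and the constant k=0.04 are illustrative assumptions, not part of the provided skeleton. It sums unweighted gradients over the window, whereas a Gaussian weighting is common in practice.

```python
import math

def harris_response_and_orientation(ix, iy, k=0.04):
    """Harris score and dominant-gradient orientation for one window.

    ix, iy: x- and y-gradient values sampled in the window (unweighted here).
    k: hypothetical Harris constant (values around 0.04-0.06 are typical).
    """
    # Entries of the symmetric 2x2 second-moment matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(gx * gx for gx in ix)
    syy = sum(gy * gy for gy in iy)
    sxy = sum(gx * gy for gx, gy in zip(ix, iy))
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    # Large positive for corners, negative for edges, small for flat regions.
    response = det - k * trace * trace
    # Angle of the eigenvector belonging to the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return response, theta
```

Because an eigenvector's sign is arbitrary, this orientation is only defined up to 180 degrees; resolving it (e.g., with the sign of the mean gradient) is a design choice left to you.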

Step B: Feature description
descriptors = computedescriptors(input_image, features, algorithm) produces a KxF matrix containing one K-dimensional descriptor for each feature, where F = size(features, 2) is the number of features and K is the descriptor length, which differs between algorithms (e.g., K=128 for SIFT).

'random': return a KxF matrix of random values (provided).
'sift': return a 128xF matrix containing the SIFT descriptor for each feature as computed by SIFT (provided).
'window': return a (k*k)xF matrix containing the luminance sampled on a kxk rectangular grid of locations centered at the feature position after the neighborhood has been scaled and rotated according to the scale and orientation of the feature (e.g., k=7).
'awesome': return descriptors computed with your own algorithm.
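The 'window' descriptor amounts to sampling a kxk grid that has been scaled and rotated into the feature's frame. Here is a minimal Python sketch, assuming a grayscale image stored as a list of rows, 0-based (x, y) = (column, row) coordinates, bilinear interpolation, and a grid spacing equal to the feature scale; these choices and the function names are hypothetical, not taken from the skeleton.

```python
import math

def bilinear(img, x, y):
    """Sample img (list of rows) at real-valued (x, y); pixels outside count as 0."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for dy, wy in ((0, 1 - fy), (1, fy)):
        for dx, wx in ((0, 1 - fx), (1, fx)):
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < w and 0 <= yi < h:
                total += wx * wy * img[yi][xi]
    return total

def window_descriptor(img, x, y, scale, theta, k=7):
    """k*k luminance samples on a grid rotated by theta and spaced by scale."""
    c, s = math.cos(theta), math.sin(theta)
    half = (k - 1) / 2.0
    desc = []
    for j in range(k):          # grid rows
        for i in range(k):      # grid columns
            u, v = (i - half) * scale, (j - half) * scale
            # Rotate the grid offset into image coordinates, then translate.
            desc.append(bilinear(img, x + c * u - s * v, y + s * u + c * v))
    return desc
```

Subtracting the mean and dividing by the standard deviation of the k*k samples is a common addition that makes L2 comparisons less sensitive to brightness and contrast changes between the two images.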

Step C: Feature matching
matches = findmatches(features1, descriptors1, features2, descriptors2, max_matches, algorithm) produces a 2xM matrix of integers representing the indices of the features in features1 and features2 that provide the best pairwise matches, where M is the number of detected matches (M <= max_matches), and algorithm is one of the following:

'random': return a 2xM matrix of random values (provided).
'mutual': return all matches between features i1 and i2 where the L2 distance between descriptors i1 and i2 is less than the L2 distance between descriptor i1 and any other in descriptors2 AND also less than the L2 distance between descriptor i2 and any other in descriptors1 (note: this is not required).
'ratio': return all matches between features i1 and i2 where the L2 distance between descriptors i1 and i2 is less than a constant match_ratio times the L2 distance between descriptor i1 and its second-best match in descriptors2 (e.g., match_ratio=0.6).
'awesome': return matches computed with your own algorithm.
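The 'ratio' and 'mutual' rules above can be sketched in a few lines of Python. This is an illustration under assumptions: descriptors are lists of equal-length vectors, indices are 0-based (MATLAB's are 1-based), results are lists of (i1, i2) pairs rather than a 2xM matrix, and the max_matches cap is not applied. All function names are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def ratio_matches(desc1, desc2, match_ratio=0.6):
    """(i1, i2) pairs passing the ratio test; desc2 needs at least 2 entries."""
    matches = []
    for i1, d1 in enumerate(desc1):
        dists = sorted((l2(d1, d2), i2) for i2, d2 in enumerate(desc2))
        (best, i2), (second, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < match_ratio * second:
            matches.append((i1, i2))
    return matches

def mutual_matches(desc1, desc2):
    """(i1, i2) pairs that are each other's nearest neighbours."""
    nn12 = [min(range(len(desc2)), key=lambda j: l2(d1, desc2[j])) for d1 in desc1]
    nn21 = [min(range(len(desc1)), key=lambda i: l2(d2, desc1[i])) for d2 in desc2]
    return [(i1, i2) for i1, i2 in enumerate(nn12) if nn21[i2] == i1]
```

To respect max_matches, one option is to keep the M pairs with the smallest best/second-best distance ratio. The brute-force distance loops cost O(F1*F2) and are fine for a few thousand features.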

Step D: Feature correspondence
correspondences = findcorrespondences(features1, descriptors1, features2, descriptors2, matches, algorithm) produces a 2xC matrix of integers representing the indices of the features from features1 and features2 that provide the best set of correspondences consistent with a homography transformation among the provided matches, where C is the number of correspondences found, and algorithm is one of the following:

'random': return a 2xC matrix of random values (provided).
'RANSAC': return the best set of inlier feature correspondences found with the RANSAC algorithm. Choose the number of RANSAC iterations carefully to ensure that the best homography is likely to be found.
'awesome': return correspondences computed with your own algorithm.
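A standard way to choose the number of RANSAC iterations: if a fraction w of the matches are inliers and each sample draws s = 4 correspondences (the minimum that determines a homography), then N = log(1 - p) / log(1 - w^s) samples give probability p of drawing at least one all-inlier sample. The Python sketch below evaluates this formula; the function name and the default p = 0.99 are assumptions for illustration.

```python
import math

def ransac_iterations(inlier_ratio, confidence=0.99, sample_size=4):
    """Samples needed so that, with probability `confidence`, at least one
    sample of `sample_size` matches contains only inliers."""
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1
    if p_good_sample <= 0.0:
        raise ValueError("inlier_ratio must be positive")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))
```

For example, w = 0.5 and p = 0.99 give 72 iterations, and N grows quickly as w drops. Since w is usually unknown in advance, one option is to start from a pessimistic guess and lower N whenever a larger inlier set is found.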

Step E: Homography estimation
transform = computetransform(features1, features2, correspondences, algorithm, groundtruth_filename) produces a 3x3 matrix representing the homography that best aligns the provided correspondences.

'groundtruth': return the best homography transformation computed from the groundtruth data (provided)
'cp2tform': return the homography transformation computed from the given feature correspondences with 'cp2tform' (provided)
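Inside RANSAC you need a homography from each minimal sample of four correspondences. One standard approach is the direct linear transform with H(3,3) fixed to 1: each correspondence (x, y) -> (u, v) gives two linear equations, so four correspondences give an 8x8 system. The Python sketch below assumes points in general position (no three collinear) and 0-based coordinates; the function names are hypothetical and it is not the provided 'cp2tform' code path.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography_from_4(src, dst):
    """3x3 H (with H[2][2] = 1) mapping four src points (x, y) onto dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_homography(hmat, x, y):
    """Map (x, y) through H in homogeneous coordinates."""
    u, v, w = (hmat[r][0] * x + hmat[r][1] * y + hmat[r][2] for r in range(3))
    return u / w, v / w
```

Fixing H(3,3) = 1 fails only when the true H(3,3) is zero, which does not occur for ordinary panoramas. With more than four inliers, a least-squares fit (e.g., via SVD) is the usual final refinement.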

Step F: Image composition
output_image = compositeimage(input_image1, input_image2, transformation, algorithm) warps input_image1 by the given transform and then composites (merges) it with input_image2 using one of these algorithms:

'simple': Returns a composite image where input_image1 is warped by the transformation and then copied over input_image2 (provided).
'overlay': Returns a composite image useful for visualizing errors in the mosaic. Gray indicates areas where the two images are similar after warping and compositing, and green/magenta indicates areas where they are different (provided).
'awesome': return the image composited with your own algorithm.
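Warping is usually done by inverse mapping: for every output pixel, apply the inverse homography to find where it came from in input_image1. Unlike forward mapping, this leaves no holes. The Python sketch below assumes grayscale images as lists of rows, nearest-neighbour sampling, and an output canvas the same size as input_image2; that last point is a simplification, since a real mosaic normally enlarges the canvas to hold both images. The function names are hypothetical and this is not the provided compositeimage.m.

```python
def invert3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular transform")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[val / det for val in row] for row in adj]

def simple_composite(img1, img2, hmat):
    """Warp img1 by hmat into img2's frame and paste it over img2."""
    hinv = invert3(hmat)
    h1, w1 = len(img1), len(img1[0])
    out = [row[:] for row in img2]
    for y in range(len(img2)):
        for x in range(len(img2[0])):
            # Where does output pixel (x, y) come from in img1?
            u, v, w = (hinv[r][0] * x + hinv[r][1] * y + hinv[r][2] for r in range(3))
            sx, sy = round(u / w), round(v / w)   # nearest-neighbour sampling
            if 0 <= sx < w1 and 0 <= sy < h1:
                out[y][x] = img1[sy][sx]
    return out
```

Replacing the hard paste with a weighted blend in the overlap region (for example, feathering by distance to each image's border) is one direction an 'awesome' compositing option could take.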

Implementations for many of these steps are provided for you (see notes in parentheses). Your main tasks are to implement the 'harris' algorithm for Step A, the 'window' algorithm for Step B, the 'ratio' algorithm for Step C, and the 'RANSAC' algorithm for Step D. Implementing the 'mutual' algorithm for Step C is optional.

Part 3: Experimentation

The previous part of the assignment describes a pipeline for image mosaicing. There are multiple possible implementations for each of the steps. For this part of the assignment, we would like you to experiment with different design choices and evaluate how well different algorithms work.

Specifically, please implement an algorithm of your own choice to improve at least ONE of the steps -- i.e., create an 'awesome' algorithm for any step (except E). The "Experimentation" section of your writeup should include a description of your modification, an explanation of why you chose it, and an analysis of how and why your 'awesome' algorithm improves or hurts the results.

Please think carefully about how to design the experiment to test whether your modification improves the results. At the very least, you should show images and quantitative evaluations comparing results on a small set of images with your 'awesome' algorithm against the other options for the same step, using the same combination of options for the other steps.

Part 4: Results and Analysis

You should execute your program using runme.m on a variety of input test image pairs to investigate how well it works with different combinations of algorithms and under different input conditions.

First, please show outputs of your program for all of the images in the "input" subdirectory of cos429_assignment2.zip and at least one pair of images taken with your own camera using the 'sift', 'sift', 'ratio', 'RANSAC', 'cp2tform', and 'simple' options. Note that some of these inputs are HARD, and so you should not expect to get perfect results for all of them.

Second, please compute and compare results for a small set of images (of your choosing) with the four possible combinations of using the 'sift' and 'harris' options for feature detection and the 'sift' and 'window' options for feature description (along with 'ratio', 'RANSAC', 'cp2tform', and 'simple'). For each of the four combinations, please provide overlay images and quantitative evaluations to compare the results in your writeup; these comparisons can inform your answer to the thought exercise in Part 1.

Please include these results in the section of your writeup titled "Results and Analysis." In addition to showing images, please provide a short discussion of how well your program works. Overall, the goal of this section is to answer questions like: When does your program succeed? When does it fail? Which step(s) are failing when the program fails? What are the key attributes of input images that affect its success? What parameter settings affect its success? You do not have to answer all of these questions, but you should discuss at least one characteristic of the input image pairs and/or parameters that affects the quality of your results, demonstrated by results for a set of input pairs spanning cases where the algorithm does and doesn't work well.

To facilitate your evaluations and comparisons, we provide 'ground truth' results for each of the test images and an evaluation metric to measure how well the homography computed by your program matches the ground truth. The evaluation metric is the average distance between the position to which pixels in input_image1 are mapped by your program versus the position to which they are mapped by the ground truth homography (see computeerror.m). You can create ground truth data for your own examples using runme_generategroundtruth.m.
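The evaluation metric described above can be sketched as follows, assuming the average is taken over every integer pixel position (x, y) of input_image1. computeerror.m may sample positions differently, so treat this Python version (with hypothetical function names) as an illustration of the formula rather than a copy of the provided code.

```python
import math

def apply_homography(hmat, x, y):
    """Map (x, y) through a 3x3 homography in homogeneous coordinates."""
    u, v, w = (hmat[r][0] * x + hmat[r][1] * y + hmat[r][2] for r in range(3))
    return u / w, v / w

def mean_transfer_error(h_est, h_true, width, height):
    """Average distance between where h_est and h_true send each pixel."""
    total = 0.0
    for y in range(height):
        for x in range(width):
            ex, ey = apply_homography(h_est, x, y)
            tx, ty = apply_homography(h_true, x, y)
            total += math.hypot(ex - tx, ey - ty)
    return total / (width * height)
```

For example, comparing the identity with a pure translation by (3, 4) gives an error of exactly 5 pixels everywhere, so the average is 5.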

Getting Started

You should start from cos429_assignment2.zip, which provides the directory structure to follow in your submission, a template for your writeup (writeup/writeup.html), a set of test input images (input/*.jpg), corresponding ground truth mosaics (groundtruth/*.mat), simple templates for the MATLAB functions you must implement, a script to ease creating results for your writeup (code/runme.m), and a script to ease creating ground truth for your own examples (code/runme_generategroundtruth.m).

You should edit the MATLAB files (and possibly create new ones) to implement the algorithms, download new test images into the input subdirectory, execute runme.m to produce your results, and complete your writeup.

Submitting Your Solution

Please submit your solution via the dropbox link here.

Your submission should include a single file named "assignment2.zip" with the following structure:

A subdirectory named "code" containing your source code. You should not change the API of the MATLAB functions provided, but you may add new MATLAB files as you see fit.
A subdirectory named "groundtruth" containing all groundtruth files in ".mat" format (this is provided).
A subdirectory named "input" containing all test input files in ".jpg" format (include both your new examples and the test images provided).
A subdirectory named "features" containing images overlaid by marks for features created with detectfeatures.m (should be produced automatically by runme.m).
A subdirectory named "matches" containing images overlaid by lines between matches found with findmatches.m (should be produced automatically by runme.m).
A subdirectory named "correspondences" containing images overlaid by lines between correspondences found with findcorrespondences.m (should be produced automatically by runme.m).
A subdirectory named "output" containing all images produced by compositeimage.m (should be produced automatically by runme.m).
A subdirectory named "writeup" containing a file named "writeup.html" with the following separate sections (you can start from the template provided in the .zip file):
1. Thought exercise: your answer to the question in Part 1.
2. Programming exercise: a brief statement of what works and does not work in your implementation of Part 2.
3. Experimentation: description of what 'awesome' algorithmic option you chose to implement and why.
4. Results and Analysis: display and discussion of qualitative results (images) and quantitative results (errors from computeerror.m) produced by your program for different inputs, parameter settings, and algorithmic modifications.
5. Conclusion: summary of your findings.
6. Notes: any other comments, description of any assistance received and/or any discussions had with other students.

Please follow the general policies for submitting assignments, including the late policy and collaboration policy.
