Andy Zeng
Block Matching

Block Matching (aka Brute Force Feature Correspondence)

In stereo, the problem of correspondence is often framed in terms of disparity. Suppose we are given a reference image I and another image I* of the same scene taken from a different viewpoint. We assume, as discussed before, that the images have been rectified. This means that the epipolar lines are horizontal and parallel to each other; in other words, they coincide with the scan lines of the image. Equivalently, the two viewpoints are shifted versions of each other, and the shift is purely horizontal.

For a given pixel (i, j) in the reference image I, the corresponding pixel in I* will be of the form (i, j*). The disparity of the pixel (i, j) is then (j - j*). We will be working with three different datasets: cones, teddy, and tsukuba. Each directory contains a left and a right image; for this project, I chose to use the left one as the reference image.

In this problem, we will look at ways of computing the disparity. The method we consider treats each pixel independently. For each pixel (i, j) in the left image, we find the best matching pixel (i, j*) in the right image. More precisely, we compute a cost for each candidate j* and pick the j* that minimizes the cost. Henceforth we will use C(i, j, j*) to denote this cost.

Consider a 3×3 window centered at (i, j) in the left image. String the pixel values in this window into a vector and call it x. Similarly, string the pixel values in a 3×3 window centered at (i, j*) in the right image into a vector and call it y. We consider two definitions for the cost C(i, j, j*): the sum of squared differences (SSD), ||x - y||^2, and the normalized cross-correlation (NCC), (x · y) / (||x|| ||y||), where a higher correlation indicates a better match (so as a cost it is negated).
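To make the procedure concrete, here is a minimal sketch of this brute-force search in Python/NumPy. It assumes the inputs are already rectified grayscale arrays and that the match lies at or to the left of (i, j) in the second image (non-negative disparity); the function name, default window size, and disparity range are illustrative choices, not taken from this write-up.

import numpy as np

def block_matching_disparity(left, right, win=3, max_disp=64, use_ncc=False):
    # left, right: rectified grayscale images as 2-D float arrays (left is the reference).
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L = np.pad(left, r, mode="edge")    # pad so border windows stay in bounds
    R = np.pad(right, r, mode="edge")
    for i in range(h):
        for j in range(w):
            x = L[i:i + win, j:j + win].ravel()        # window around (i, j)
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, j) + 1):      # candidate j* = j - d
                y = R[i:i + win, j - d:j - d + win].ravel()
                if use_ncc:
                    # NCC is maximized, so negate it to treat it as a cost.
                    cost = -np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8)
                else:
                    cost = np.sum((x - y) ** 2)        # SSD
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[i, j] = best_d                        # disparity = j - j*
    return disp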

Gallery

With the 3×3 window, SSD seems to perform better than NCC (although for the first dataset the error rates are very close). Visualizations of the disparity maps (computed over the mean) are shown below, along with error rates obtained by comparing against the ground truth.

Cones (original images, left and right respectively)

Cones Disparity Map (left image: SSD, error = 55.31%) (right image: NCC, error = 55.87%)

Teddy (original images, left and right respectively)

Teddy Disparity Map (left image: SSD, error = 48.23%) (right image: NCC, error = 62.57%)

Tsukuba (original images, left and right respectively)

Tsukuba Disparity Map (left image: SSD, error = 35.14%) (right image: NCC, error = 66.29%)

Aside from the obvious difference in performance between SSD and NCC (SSD consistently performs better here), the errors are most prominent in areas where the shading does not change much. In other words, the visualizations capture features that mark where the edges in the images are (hence the distinguishable silhouettes of the objects above), but the disparity in the regions between edges is not estimated as accurately. This makes sense, because the pixel-correspondence algorithm works by matching the appearance of local patches between images, and patches in flat, low-texture regions all look alike, so the match is ambiguous.
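The error rates quoted above and below come from comparing the computed disparity maps against the ground truth. As a rough sketch of one way such a rate can be computed (the tolerance, the handling of unknown pixels, and the function name are assumptions, not necessarily the exact metric used here):

import numpy as np

def disparity_error_rate(disp, gt, tol=1.0):
    # Percentage of pixels whose disparity differs from ground truth by more than `tol`.
    # Assumes unknown ground-truth pixels are marked with 0 and should be ignored.
    valid = gt > 0
    wrong = np.abs(disp[valid].astype(float) - gt[valid].astype(float)) > tol
    return 100.0 * wrong.mean()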

Error rates (7×7 and 11×11 windows, approximate):

            SSD 7×7    NCC 7×7    SSD 11×11    NCC 11×11
Cones       33.19%     27.87%     33.25%       29.71%
Teddy       33.24%     26.21%     33.80%       28.63%
Tsukuba     18.26%     24.43%     16.18%       17.42%
Observations: with larger patches, NCC can occasionally outperform SSD. Additionally, in images with fewer or smaller regions of nearly flat shading, accuracy improves as the patch size increases. This makes sense, since larger patches are more likely to be unique, so correspondences are less ambiguous. However, for images with more or larger flat-shaded regions, larger patches can mean less accuracy, suggesting there is a sweet spot in window size (a Goldilocks effect).
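For reference, the window-size comparison above can be reproduced by sweeping the window size with the sketches from earlier; left, right, and gt stand for the rectified images and the ground-truth disparity (loading omitted), and all function names are the hypothetical ones introduced above.

for win in (3, 7, 11):
    for use_ncc in (False, True):
        disp = block_matching_disparity(left, right, win=win, use_ncc=use_ncc)
        err = disparity_error_rate(disp, gt)
        print(f"{'NCC' if use_ncc else 'SSD'} {win}x{win}: e ≈ {err:.2f}%")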