Part 0: Warmup
Our objective is to sharpen an image of our choice by accentuating its high frequencies. To obtain the high frequencies, I took the difference between the original image and a copy blurred with an 11x11 Gaussian filter (sigma = 3). This difference is multiplied by a factor alpha, which controls the amount of sharpening, and added back to the original image. Here are some results using different values of alpha.
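The sharpening step can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the code that produced the results: the helper names (`gaussian_kernel`, `blur`, `sharpen`), the edge-replicated borders, and the final clipping to [0, 1] are all choices made for this sketch.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, size, sigma):
    """Separable Gaussian blur of a 2-D float array; borders are edge-replicated."""
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def sharpen(img, alpha, size=11, sigma=3.0):
    """Add alpha times the high frequencies (image minus blur) back onto the image."""
    high = img - blur(img, size, sigma)
    # Clipping assumes intensities in [0, 1]; it is an addition made for display.
    return np.clip(img + alpha * high, 0.0, 1.0)

# Usage: a vertical step edge gains contrast on both sides of the boundary.
step = np.full((32, 32), 0.25)
step[:, 16:] = 0.75
sharpened = sharpen(step, alpha=1.0)
```

With alpha = 0 the image comes back unchanged, and larger values of alpha exaggerate the overshoot around edges.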
Part 1: Hybrid Images
In this part, our objective is to combine two images (aligned with the provided align_images function) into a hybrid image by adding the low-frequency portion of one image to the high-frequency portion of another. To obtain the low-frequency portion of the first image, we filter it with a 45x45 Gaussian filter. To obtain the high-frequency portion of the second image, we subtract from it a copy filtered with a 45x45 Gaussian filter. We then add the low-frequency and high-frequency portions to generate the hybrid image. The sigma values of the Gaussian filters were determined by the cutoff frequencies, which were set at the half-power point of the Gaussian and derived from the standard deviations in the frequency domain (see the equations found in this article). Here are a few sample results, where the cutoffs were chosen empirically to be 5 for the low frequencies and 100 for the high frequencies.
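A minimal sketch of the hybrid construction in Python with NumPy follows, again as an illustration and not the original code. A Gaussian with spatial standard deviation sigma has frequency response exp(-2 pi^2 sigma^2 f^2), so its half-power point (amplitude 1/sqrt(2)) sits at f_c = sqrt(ln 2) / (2 pi sigma). The helper `sigma_from_cutoff` inverts that relation and assumes the cutoff is given in cycles per pixel; the cutoffs of 5 and 100 quoted above may use a different unit, so they are not plugged in here. All function names are hypothetical.

```python
import math
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, size, sigma):
    """Separable Gaussian blur of a 2-D float array; borders are edge-replicated."""
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def sigma_from_cutoff(cutoff):
    """Spatial sigma (pixels) whose half-power point is `cutoff` (assumed cycles/pixel)."""
    return math.sqrt(math.log(2.0)) / (2.0 * math.pi * cutoff)

def hybrid(im_low, im_high, sigma_low, sigma_high, size=45):
    """Low frequencies of im_low plus high frequencies of im_high."""
    low = blur(im_low, size, sigma_low)
    high = im_high - blur(im_high, size, sigma_high)
    return low + high

# Usage with random stand-ins for two aligned grayscale images.
rng = np.random.default_rng(0)
far_image = rng.random((24, 24))   # seen from a distance (low frequencies)
near_image = rng.random((24, 24))  # seen up close (high frequencies)
result = hybrid(far_image, near_image,
                sigma_low=sigma_from_cutoff(0.02),
                sigma_high=sigma_from_cutoff(0.1))
```

Note that a 45x45 kernel truncates the Gaussian noticeably once sigma grows beyond roughly 7 pixels, so the effective cutoff is only approximate for large sigma.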
Note that the low pass filter effectively blocks the higher frequencies while the high pass filter increases the amplitude of the higher frequencies relative to the lower frequencies. The hybrid image gets the best of both worlds!
Check out some more hybrid images that were generated.
After testing quite a few images (in addition to the ones displayed above), I found that to get a good hybrid image, the second image (used to obtain the high-frequency portion) needs enough edges near the area of interest to balance the visual features of the first image in that same area. Too many edges make it difficult to see the first image, while too few make it difficult to see the second image. Additionally, the first image (used to obtain the low-frequency portion) should avoid regions that are extremely dark or extremely bright compared to the second image (after alignment), to minimize blatant visual artifacts. The last image is an example: the nose of the puppy is too overwhelming, and the baby cheetah's face does not have enough edges for the high frequencies to stand out. That is also why, when generating a hybrid image from two face images, it is generally better to align the facial features, particularly the eyes.
Bells and Whistles (Color Hybrid Images)
To add color to the effect, we perform the same frequency operations as on the grayscale images above, but over an extra dimension to reflect the RGB image structure. Equivalently, the operations can be performed on each color channel separately and the results stacked together for the final color image, which is what I did. When combining a grayscale image with a color image, the grayscale image is simply replicated 3 times and stacked before being operated upon. After some testing, I've come to the conclusion that using color for both components gives the best results. If we use color channels for one image but grayscale channels for the other, it is generally very difficult to keep the delicate balance of visual features that determines a good hybrid image (described in the section above under "Empirical Observations"). In the sample below, using color for only the high frequencies did reasonably well, but this was not the case for some other test images.
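The per-channel approach can be sketched as follows in Python with NumPy. The helper names (`to_rgb`, `per_channel`, `hybrid_color`) are hypothetical, and the blur is a plain separable Gaussian with edge-replicated borders, which is an assumption of this sketch.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, size, sigma):
    """Separable Gaussian blur of a 2-D float array; borders are edge-replicated."""
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def to_rgb(img):
    """Replicate a grayscale HxW image into HxWx3; pass color images through."""
    return np.repeat(img[..., None], 3, axis=2) if img.ndim == 2 else img

def per_channel(fn, img):
    """Apply a 2-D operation to each color channel and restack the results."""
    return np.stack([fn(img[..., c]) for c in range(img.shape[2])], axis=2)

def hybrid_color(im_low, im_high, sigma_low, sigma_high, size=45):
    """Hybrid image computed channel by channel; grayscale inputs are replicated."""
    im_low, im_high = to_rgb(im_low), to_rgb(im_high)
    low = per_channel(lambda ch: blur(ch, size, sigma_low), im_low)
    high = im_high - per_channel(lambda ch: blur(ch, size, sigma_high), im_high)
    return low + high

# Usage: grayscale low-frequency image combined with a color high-frequency image.
rng = np.random.default_rng(1)
gray_low = rng.random((16, 16))
color_high = rng.random((16, 16, 3))
mixed = hybrid_color(gray_low, color_high, sigma_low=6.0, sigma_high=2.0)
```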
Part 2: Gaussian and Laplacian Stacks
In this part, our objective is to implement Gaussian and Laplacian stacks, which are quite similar to the pyramids implemented in Project 1. Instead of downsampling, the same image is convolved with a Gaussian filter of increasing sigma at each level (in this sample, we double sigma, the standard deviation, at each level). Consequently, the image becomes coarser at every level of the Gaussian stack, but the size of the image is preserved. Each level of the Laplacian stack was generated by taking the difference between the image from the Gaussian stack at the same level and the image from the Gaussian stack at the next level. The Laplacian stack essentially represents the band of frequencies between consecutive levels of the Gaussian stack. Here are some sample images, which were generated with a stack height of 5, a Gaussian filter of size 45x45, and sigma values of 2, 4, 8, 16, and 32 from the first level to the last. Note: I used the 'imadjust' function in Matlab to make the Laplacian images more visible.
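Here is a minimal sketch of both stacks in Python with NumPy, offered as an illustration and not the Matlab code used for the results. One assumption to flag: the unblurred image is stored as level 0 of the Gaussian stack, so that five sigmas yield a Laplacian stack of height 5 and the Laplacian levels plus the coarsest Gaussian level sum back to the original image. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, size, sigma):
    """Separable Gaussian blur of a 2-D float array; borders are edge-replicated."""
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def gaussian_stack(img, sigmas, size=45):
    """Level 0 is the image itself (assumption); level k is the image blurred with sigmas[k-1]."""
    return [img] + [blur(img, size, s) for s in sigmas]

def laplacian_stack(gstack):
    """Each level is the difference between consecutive Gaussian levels."""
    return [gstack[i] - gstack[i + 1] for i in range(len(gstack) - 1)]

# Usage with a random stand-in image and the sigmas quoted above.
rng = np.random.default_rng(2)
image = rng.random((24, 24))
g_stack = gaussian_stack(image, sigmas=(2, 4, 8, 16, 32))
l_stack = laplacian_stack(g_stack)
reconstructed = sum(l_stack) + g_stack[-1]
```

Every level keeps the size of the input, and `reconstructed` matches `image` up to floating-point error because the differences telescope.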
It is quite interesting to see that as sigma increases down the Gaussian stack, we see a stronger image of Lincoln in the first sample and a stronger image of Nicolas Cage in the second sample. At the same time, in the first image of the Laplacian stacks, we can clearly see Gala in the first sample and the tiger in the second sample (along with many other details), reflecting the high frequencies of the original image. But further down the stack, the images in the Laplacian stack fade as the frequency bands shift toward the lower frequency ranges.
Part 3: Multiresolution Blending
For this part, our objective is to blend two images using an image spline, a smooth seam that joins the images through slight distortion, across multiple bands of image frequencies. More specifically, given images A and B, we construct the Laplacian stacks of both images (LA, LB) with height n (n = 5 for my samples). We also compute the Gaussian stack (GR) of the mask image with height n. We then compute the Laplacian stack (LS) of the resulting image using the following equation (from the paper provided with the assignment).
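In the notation above, the per-level blending rule from Burt and Adelson's multiresolution spline paper (assumed here to be the paper the assignment refers to) has the following form, where l indexes the stack level and (i, j) the pixel:

```latex
LS_l(i, j) = GR_l(i, j)\, LA_l(i, j) + \bigl(1 - GR_l(i, j)\bigr)\, LB_l(i, j),
\qquad l = 1, \dots, n
```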
Conceptually, each level of the computed Laplacian stack represents a band of frequencies - the equation above performs blending separately at each level. We then compute the (n+1)th image of the Gaussian stack for image A, the (n+1)th image of the Gaussian stack for image B, and sum up these two images with the total sum of the Laplacian stack (LS). This works because each level of a Laplacian stack represents the difference between the image of the same level in the Gaussian stack, and the image of the next level in the Gaussian stack. For any given image, adding its entire Laplacian stack of height n to the (n+1)th level of its Gaussian stack should return the original image. So by blending each level of LA with LB using the mask from the corresponding level of GR, we effectively blend the two images A and B at multiple bands of frequencies.
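The whole procedure can be sketched in Python with NumPy as below. This is an illustration under stated assumptions, not the original code. The unblurred image is kept as level 0 of each Gaussian stack so that the Laplacian levels telescope back to the input. Each band is weighted by the mask blurred with that band's sigma. The two coarsest Gaussian levels are combined with the coarsest mask level, which keeps the overall brightness unchanged; that weighting is an assumption of this sketch. All names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, size, sigma):
    """Separable Gaussian blur of a 2-D float array; borders are edge-replicated."""
    k = gaussian_kernel(size, sigma)
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def gaussian_stack(img, sigmas, size=45):
    """Level 0 is the image itself (assumption); level k uses sigmas[k-1]."""
    return [img] + [blur(img, size, s) for s in sigmas]

def laplacian_stack(gstack):
    """Differences between consecutive Gaussian levels."""
    return [gstack[i] - gstack[i + 1] for i in range(len(gstack) - 1)]

def blend(a, b, mask, sigmas=(1, 2, 4, 8, 16), size=45):
    """Blend a (where mask is 1) and b (where mask is 0) band by band."""
    ga, gb, gr = (gaussian_stack(x, sigmas, size) for x in (a, b, mask))
    la, lb = laplacian_stack(ga), laplacian_stack(gb)
    # Band i is weighted by the mask blurred with the same sigma (gr[i + 1]).
    ls = [gr[i + 1] * la[i] + (1.0 - gr[i + 1]) * lb[i] for i in range(len(la))]
    # Assumption: the coarsest levels are also mixed with the coarsest mask level.
    base = gr[-1] * ga[-1] + (1.0 - gr[-1]) * gb[-1]
    return sum(ls) + base

# Usage: left half taken from image_a, right half from image_b.
rng = np.random.default_rng(3)
image_a = rng.random((20, 20))
image_b = rng.random((20, 20))
half_mask = np.zeros((20, 20))
half_mask[:, :10] = 1.0
blended = blend(image_a, image_b, half_mask)
```

A mask of all ones returns image A and a mask of all zeros returns image B, which is a quick sanity check that the stacks reconstruct their inputs.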
Here are some sample images. I used a Gaussian filter of size 45x45 and a sigma value of 1,2,4,8,16 respectively down the Gaussian and Laplacian stack of height 5. For the first image (my favorite result), I've displayed the Gaussian and Laplacian stacks for each individual image, as well as the Laplacian stacks weighted by the Gaussian stack of the mask.
Check out some more images! The parameters used are the same as above.
After testing a variety of images (in addition to the ones above), it seems that there are two main reasons why some blending results look worse than others (like the two failure cases above). The first is poor alignment. In the Terminator example, the facial features do not match up correctly, which makes the blending more noticeable because the differences in image content between the left and right images are accentuated. The second is differing image textures or backgrounds. In the football example, the color contrast between the two images along the seam is so drastic (aside from the grass) that the difference becomes very difficult to smooth out. As long as I avoided these problems when selecting which images to blend, the result usually turned out pretty good.
To blend color images (similar to Bells and Whistles in Part 1), we simply need to deal with an extra dimension to reflect the different color channels. I performed the frequency operations over the stacks for each individual color channel and stacked the results together for the final color image.
What I Learned
The coolest thing I learned was the power of Gaussian and Laplacian filters and stacks! It was also super fun to visualize the frequency domains and to understand what was actually going on.