# Image Analogies

## Advanced Computer Graphics' 1st Assignment by Thiago Pereira

## Problem Statement

Given three input images A, Ap and B, create an output image Bp such that Bp : B as Ap : B. In other words, A is related to Ap in the same way that B is related to Bp. They are analogous images. In this assignment I implemented the "Image Analogies" paper [Hertzmann 01].## Features

### Basic Algorithm

I started by implementing the fundamental image analogies framework using brute force search for finding similarities in a single resolution level. For each synthesized pixel, a query neighborhood vector is built by appending KxK values from B with a partial neighborhood from Bp. This vector is compared against all neighborhood vectors built from A and Ap in the exact same way. Partial neighborhoods must be used for two reasons. First, during synthesis Ap is an incomplete image, only some of its pixels can be used. Second, when we are working on a pixel near the border, we do not use pixels outside the image. All comparisons are performed using pixel luminance values and at the end of the process chroma values are transferred directly from B to Bp. Our first example shows this clearly. While analogy was used to blur the input image, it only blurs the luminance and the colors stay the same.: | :: | : | ||||

A |
Ap |
B |
Bp |

: | :: | : | ||||

A |
Ap |
B |
Bp |

: | :: | : | ||||

A |
Ap |
B |
Bp |

: | :: | : | ||||

A |
Ap |
B |
Bp |

### Coherence

The L2 norm of neighborhoods is far from being a perfect similarity measure. For this reason, the best neighborhood found using this criteria may not actually be the most desirable one. One simple but important modification to the algorithm is using coherence. We search in a 3x3 neighborhood of the current pixel and see if any of its neighbors after an adequate shift provides a low error in the L2 norm. Low here means comparable to best neighborhood up to a tolerance as described in the original paper. Coherence has the nice property that it brings a patch-based behavior in the point-based method. This is clearly shown in the images below where we vary the coherence parameter. The green and red images encode respectively x and y coordinates of the source pixel. They clearly show bigger patches for higher coherency values. For all our other results in this assignment, we use kappa = 2.Kappa = 0 | Kappa = 2 | Kappa = 10 |

### Multiresolution

While some results work with small neighborhood sizes, other cases require bigger neighborhoods. The rule of thumb is that the neighborhood size needs to big bigger than the texel size. However, the bigger the neighborhood the slower is the algorithm. In fact, it has quadratic complexity in the width of the neighborhood. The best solution for this problem is using multiresolution. It works by synthesizing from coarser levels to finer levels. The algorithm was modified to use bigger feature vectors which include not only the current level, but also the previous one. The benefit here is keeping a constant neighborhood size at each level. However, the results are much better since a small size in the coarse level is in fact a bigger one in the finer level. We show texture synthesis cases each using 1, 2 and 3 levels of resolution. These results show how the quality improves with more levels as explained above. It is also noteworthy that there is a very small change in running time. All these results were generated with brute force search and without coherence. See the next section for more multiresolution results.Examplar |

One level | Two levels | Three levels |

348s | 500s | 548s |

### Approximate Nearest Neighbor Search

One the major optimizations in the image analogies algorithm is using a search structure to accelerate per pixel queries. We used the ANN library which uses a kd-tree to answer approximate search for high dimensional points.However, to really enjoy the benefits of approximate search we must be careful with the image borders. As opposed to brute force search, we have to use neighborhoods of fixed size when using a search structure. My first implementation used brute force on the border and only used ANN in the interior as described in the analogies paper, but this proved to slow. In fact, it even affects the complexity of the algorithm. The solution adopted was to mirror the values from the inside of the image whenever an inexistent value was required. When we are working with A, Ap or B at any resolution level we always have a value to mirror. However, when we are working with Bp, the required value may fall outside the already synthesized part. In this case, we use an upsampled value from is lower resolution estimate. This solution fails at the coarsest level of the pyramid when we must use brute force search on the border pixels.

We show the same result of multiresolution including the three levels but now using approximate search. In particular, the 3 level result went from 548s to 4s. In addition, we present a higher resolution synthesized output. Notice, how the time decreases as more levels are added. The reason is with a single level there are too many pixels using brute force search. Using multiple levels, brute force search is only used in the coarsest level.

One level | Two levels | Three levels |

24s | 6s | 4s |

One level | Two levels | Three levels |

50s | 19s | 13s |

### Hole filling

We also adapted the algorithm for hole filling. There are two main changes here. First, as described above, search structures only handle fixed size neighborhoods. In hole filling, we have always have to use brute force search because the neighborhoods are constantly changing. Second, since synthesized values are uncertain, we use an order which favors using visible pixels. We synthesize from outside to inside using a Breadth-First Search in a graph built with four-connected neighborhoods.## Applications

### Image Filters

Using the higher resolution version input, the system was capable of blurring and embossing. These results used 1 resolution level and a neighborhood size of 5. Blurring took 7 minutes and embossing 12 minutes.### Texture Synthesis

I synthesized the two texture exemplars below in two scenarios both multiplying the size by 2 and by 4. The small ropes took 1.5 minutes and the big one 6 minutes. The flowers took longer because the of the larger exemplar: 10 minutes for the small one and 45 minutes for the big one. These exemplars used a 5x5 neighborhood and 3 resolution levels.**Examplar**

**Synthesis**

### Object Removal

The examples below show one application of hole filling: removing unwanted objects from photographs. While texture synthesis works well if the texture to be filled is uniform, the method has problems when trying to propagate structure, grass was synthesized on the roof. More advanced hole filling techniques would prioritize propagation of edges first. Since only brute force was used these two examples took a long time processing. The first one took 1 hour and 15 minutes with a neighborhood size of 9 and the second one took 40 minutes with a neighborhood size of 5.### Texture by numbers

The following texture by numbers examples had long running times because of their resolutions. The river took 2.5 hours and the waterfall 45 minutes. These examples were generated only using the luminance value, losing some of the information present in the originals. For example, the system interpreted an average value of gray on the smoothed red border as blue. Following this interpretation, it synthesized a very thin river between the red and black regions.: | ||

A |
Ap |

: | ||

B |
Bp |

: | ||

A |
Ap |

: | ||

B |
Bp |

### Texture transfer

A final example for the art contest.: | ||

A |
Ap |

: | ||

B |
Bp |

## Acknowledgments

I am grateful to Yiming Liu, Tianqiang Liu and Jingwan Lu for interesting discussions and suggestions.## References

- Aaron Hertzmann, "Image Analogies", SIGGRAPH, 2001.
- Michael Ashikhmin, "Synthesizing Natural Textures," I3D, 2001.
- Li-Yi Wei and Marc Levoy, "Fast Texture Synthesis Using Tree-structured Vector Quantization", SIGGRAPH 2000.
- Alyosha Efros, "Texture Synthesis by Non-parametric Sampling", ICCV, 1999.