Andy Zeng


Texture is an important cue in computer vision, and it can be quantified with units known as textons. In this project, I constructed textons across a variety of textures and then used them to recognize textured patches. The same approach can be extended to object recognition.

Computing Textons

Textons are simple units that make up more complicated texture patterns. The simplest way to define them is as follows:
1. From the training images, extract 5x5 sized image patches
2. Center each image patch, i.e. subtract the mean RGB value of the pixels in that patch. Consider also normalizing each patch to unit L1 norm.
3. Cluster the image patches using K-means (e.g. for K = 25, 50, ...). The centers of the K clusters are the textons.
The clusters are computed only once over the dataset, in an unsupervised manner. From then on, the textons are fixed.
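
The three steps above can be sketched in plain Python. This is a minimal illustration, not the code used for the results reported here: the function names (`extract_patches`, `center_and_normalize`, `kmeans`) are hypothetical, images are assumed to be lists of rows of (r, g, b) tuples, "mean RGB value" is interpreted as a per-channel mean, and the K-means loop is a basic Lloyd iteration with a fixed number of passes.

```python
import random

PATCH = 5  # patch side length, as in step 1


def extract_patches(image, size=PATCH):
    """Return every size x size patch as a flat list of floats.

    Assumption: `image` is a list of rows, each pixel an (r, g, b) tuple.
    """
    h, w = len(image), len(image[0])
    patches = []
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            patches.append([float(c)
                            for row in image[y:y + size]
                            for pixel in row[x:x + size]
                            for c in pixel])
    return patches


def center_and_normalize(patch, channels=3):
    """Subtract the per-channel mean, then scale to unit L1 norm."""
    n = len(patch) // channels
    means = [sum(patch[c::channels]) / n for c in range(channels)]
    centered = [v - means[i % channels] for i, v in enumerate(patch)]
    l1 = sum(abs(v) for v in centered)
    # A perfectly flat patch has zero norm; leave it as all zeros.
    return [v / l1 for v in centered] if l1 > 0 else centered


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's K-means; the returned centers are the textons."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers
```

The textons would then be obtained with something like `kmeans([center_and_normalize(p) for p in all_training_patches], k=25)`, where `all_training_patches` is the pooled output of `extract_patches` over every training image. The result depends on the random initialization, controlled here by `seed`.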

Representing Images with Texton Histograms

We can describe a given image with a histogram of the textons that occur in it. For each texture image, we extract all 5x5 patches, center them, and find the cluster (texton) each one is closest to. This gives a K-dimensional histogram of textons for that texture: the i-th entry is the number of image patches assigned to the i-th texton. The final histogram is normalized so that its entries sum to 1.
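
As a rough sketch of the histogram step, assuming patches and textons are flat lists of floats of equal length and that "closest" means smallest squared Euclidean distance (the names `nearest_texton` and `texton_histogram` are illustrative only):

```python
def nearest_texton(patch, textons):
    """Index of the texton closest to `patch` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(textons)), key=lambda j: sq_dist(patch, textons[j]))


def texton_histogram(patches, textons):
    """K-bin histogram of texton assignments, normalized to sum to 1."""
    counts = [0] * len(textons)
    for patch in patches:
        counts[nearest_texton(patch, textons)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(textons)
```

Because the histogram is normalized, images of different sizes (and therefore different patch counts) can be compared directly.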

Classifying New Images

Given a new input image, we want to figure out which of the N given texture examples it most closely resembles. For the new image, we extract all the 5x5 patches, then center and normalize them in the same way as for the training images. For each extracted patch we find the associated texton, and from the counts we construct a texton histogram. Finally, to classify, we measure the similarity of this histogram to the histograms of the textures in the training set (e.g. using L1 distance, χ² distance, or the intersection kernel). The training texture whose histogram is most similar is the label assigned to the test image. This is the nearest neighbor algorithm. For our project, we use 1 training image per texture for 17 different texture patterns. Then, using a 1-NN classifier over the computed histograms, we predict a label for each test image containing a similar texture pattern.
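
The nearest neighbor step itself is small. A minimal sketch, assuming the training histograms are kept in a dictionary from texture label to normalized histogram (the names `l1_distance` and `classify_1nn` and the example labels and values are made up for illustration):

```python
def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def classify_1nn(test_hist, train_hists, distance=l1_distance):
    """Return the label whose training histogram is closest to test_hist.

    Assumption: `train_hists` maps a texture label to its histogram.
    """
    return min(train_hists,
               key=lambda label: distance(test_hist, train_hists[label]))


# Toy example with K = 3 textons and two training textures.
train = {"bark": [0.7, 0.2, 0.1], "grass": [0.1, 0.3, 0.6]}
print(classify_1nn([0.6, 0.3, 0.1], train))  # prints "bark"
```

Passing a different function as `distance` swaps the histogram comparison without touching the classifier.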


Training images for different textures (left to right, top to bottom: bark, beach sand, brick wall, grass, gravel, herringbone weave, leather, metal grates, pigskin, plastic bubbles, raffia, straw, water, wood fence, wood grain, wood roof, woolen cloth)

The last image is a visualization of the computed centroids/textons.

Test images, classify which texture is which (accuracy: ~70.56%).

For each test image, we compute the patches and use knnsearch to assign each patch to its nearest texton. Histograms are then constructed and compared with a naïve measure: the distance between two histograms is the sum of the absolute values of the differences between their normalized entries (i.e. the L1 distance). Because this measure is so simple, classification accuracy can likely be improved by comparing histograms with another method, but the results reported here use the naïve measure. K-means produces some variation between runs, so the reported accuracy is from the most accurate run.
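
For reference, here is how two alternative histogram comparisons could look. This is only a sketch of standard definitions, not something that was evaluated for this project; the function names are hypothetical, and the χ² distance is written in one common form with a factor of 1/2.

```python
def chi2_distance(h1, h2):
    """Chi-squared distance: 0.5 * sum((a - b)^2 / (a + b)).

    Bins that are empty in both histograms are skipped to avoid 0/0.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)


def intersection_similarity(h1, h2):
    """Histogram intersection: larger means more similar (max 1 if normalized)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def intersection_distance(h1, h2):
    """Turn the intersection similarity into a distance for 1-NN."""
    return 1.0 - intersection_similarity(h1, h2)
```

One caveat: for histograms that each sum to 1, the intersection equals 1 minus half the L1 distance, so it ranks neighbors exactly as the naïve measure does. Of the two, only the χ² distance can change which training texture wins.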

Note: My implementation seems to fail on all cases that involve the wood fence texture. This is understandable, since the top half of the wood fence training image contains features that look very similar to features in the other training images. As a result, classification of wood fence is inaccurate unless the test image shows both the top and the side portions of the fence (as in the training image). My implementation also seems to fail on the grass and gravel cases, returning bark and woolen cloth respectively. Both misclassifications are likely a result of using small patch sizes and comparing histograms naïvely, the grass case more so. The gravel case could also be a special case, since the lack of interspersed dark spots in the test image may be throwing off the classification.


The neat thing about this texture recognition trick is that it can be used to solve a variety of computer vision problems. Not only can objects be identified by their texture, but texture can even be generated. Textons are a great quantifying feature.

We can even identify a scene as a forest if it contains a considerable amount of bark texture, or as a living room if it contains a considerable amount of leather seating texture, and so on.

Scenery training images:

Test images (accuracy: ~94.44%). Classification failed on the indoor Christmas tree image.