Self-Organizing Feature Maps

One can interpret clustering in terms of neural networks -- simplified models of brain structures for recognizing patterns. In particular, one can think of each cluster center as describing the neural "memory" of a typical pattern. If an input pattern x stimulates a "unit" having a memory m_i, one can have the "firing rate" of the unit vary inversely with the distance from x to m_i. Thus, the closer an input pattern x is to the memory m_i, the greater the firing rate of the corresponding unit.
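As a minimal sketch of this idea in Python (the Gaussian response exp(-||x - m_i||^2) used here is one common choice, and is the one adopted below; the memories are assumed to be stored as rows of a NumPy array):

    import numpy as np

    def firing_rates(x, memories):
        # Squared distance ||x - m_i||^2 from the input to each unit's memory.
        d2 = np.sum((memories - x) ** 2, axis=1)
        # Gaussian response: the closer the memory, the higher the firing rate.
        return np.exp(-d2)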

Now for biological sensory neurons, it often happens that neurons that respond to similar patterns are physically close to one another. Thus, in vision one encounters so-called retinotopic maps, where the topology of the image is preserved. Similarly, in hearing one encounters so-called tonotopic maps, where the spectral topology is preserved.

There is no reason to expect that this kind of regularity will result from the k-means procedures. For example, there is no reason to think that cluster center m_2 in any way represents a pattern that is "between" m_1 and m_3. However, Kohonen showed how one can find clusters so that units that are spatially close correspond to patterns that are similar. He called the resulting networks self-organizing feature maps. We illustrate Kohonen's approach for units arranged in one dimension, although the approach is easily extended to two or more dimensions.

Let us arrange the units along a line. When an example x is presented, Unit i responds according to some inverse function of ||x - m_i||, such as exp(-||x - m_i||^2). The unit that responds most strongly has a "memory" m_i that is closest to x -- even though x and m_i might not be all that similar. We make m_i become more like x through the updating rule

m_i <-- m_i + a (x - m_i) .
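A minimal sketch of this winner-take-all step, assuming the memories are rows of a NumPy array and a is a small learning rate:

    import numpy as np

    def winner_update(x, memories, a=0.1):
        # Find the best-matching unit: the memory closest to x.
        i = np.argmin(np.sum((memories - x) ** 2, axis=1))
        # Move that memory a fraction a of the way toward x:
        # m_i <-- m_i + a (x - m_i).
        memories[i] += a * (x - memories[i])
        return i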

So far, this is the same as the sequential k-means procedure. However, Kohonen makes two modifications. First, he says that we should update not only m_i but also the other cluster centers that are in the neighborhood of m_i along the one-dimensional layout, i.e., the memories of units that are spatially near the unit having memory m_i. Second, he says that the size of the neighborhood should be programmed to shrink as time goes on.
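A sketch of one such step, with two illustrative (not canonical) choices: a Gaussian weighting over spatial distance along the line, and a neighborhood width sigma that decays exponentially with the step count t. The parameters sigma0 and tau are assumptions made for the example:

    import numpy as np

    def kohonen_step(x, memories, t, a=0.1, sigma0=3.0, tau=300.0):
        n = len(memories)
        # Best-matching unit, as before.
        winner = np.argmin(np.sum((memories - x) ** 2, axis=1))
        # Neighborhood width shrinks as time goes on (assumed schedule).
        sigma = sigma0 * np.exp(-t / tau)
        # Spatial distance of every unit from the winner along the line.
        dist = np.arange(n) - winner
        # Gaussian neighborhood weighting: nearby units are updated
        # strongly, distant units hardly at all.
        h = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
        # Every memory moves toward x, weighted by its spatial closeness.
        memories += a * h[:, None] * (x - memories)
        return winner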

The result of Kohonen's modifications is that units that are spatially close tend to develop similar memory vectors. Of course, the rate at which the neighborhood shrinks is critical. If the neighborhood is large and it shrinks very slowly, the cluster centers will tend to stick close to the overall mean of all of the examples. At the other extreme, if the neighborhood shrinks very rapidly, the results will be the same as one gets with the sequential k-means procedure.
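In the sketch above, the constant tau sets the shrinking schedule: a large tau keeps the neighborhood wide for many steps (pulling all of the memories toward the overall mean), while a small tau collapses it quickly to the winner alone (recovering sequential k-means). A toy training loop, reusing kohonen_step and with made-up data, might look like:

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 2))    # toy 2-D examples
    memories = rng.normal(size=(10, 2))  # ten units arranged along a line

    for t, x in enumerate(data):
        kohonen_step(x, memories, t, a=0.05, sigma0=3.0, tau=300.0)

    # After training, neighboring rows of `memories` should hold similar
    # vectors, i.e., spatially close units respond to similar patterns.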

There is little theory to guide the choice of the shrinking schedule. However, extensive experiments have shown that, with judicious trial and error, one can often obtain useful clusterings that, in some sense, preserve topology. See (Kohonen 1995) and (Ritter et al. 1992) for many interesting examples.
