- Make initial guesses for the means **m**_{1}, **m**_{2}, ..., **m**_{k}
- Until there are no changes in any mean
  - Use the estimated means to classify the examples into clusters
  - For i from 1 to k
    - Replace **m**_{i} with the mean of all of the examples for Cluster i
  - end_for
- end_until
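The loop above might be sketched in Python as follows. This is a minimal illustration, not code from the original: it assumes the examples are equal-length numeric tuples, uses squared Euclidean distance, and initializes by randomly choosing k of the examples (one of the popular choices discussed below).

```python
import random

def kmeans(examples, k, seed=0):
    """Minimal k-means sketch: examples is a list of numeric tuples."""
    rng = random.Random(seed)
    # Initial guesses: randomly choose k of the examples as the means.
    means = rng.sample(examples, k)
    while True:
        # Use the estimated means to classify the examples into clusters
        # (each example goes to its nearest mean, by squared distance).
        clusters = [[] for _ in range(k)]
        for x in examples:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, means[j])))
            clusters[i].append(x)
        # Replace each mean with the mean of the examples in its cluster.
        new_means = []
        for i in range(k):
            if clusters[i]:
                n = len(clusters[i])
                new_means.append(tuple(sum(c) / n for c in zip(*clusters[i])))
            else:
                # Empty cluster: keep the old mean (the annoyance noted below).
                new_means.append(means[i])
        # Stop when there are no changes in any mean.
        if new_means == means:
            return means, clusters
        means = new_means
```

On two well-separated groups of points this converges in a few iterations; with less separated data the result depends on the initialization, as discussed next.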

This is a simple version of the **k-means procedure**. It
can be viewed as a greedy algorithm for partitioning the n examples into
k clusters so as to minimize the sum of the squared distances to the cluster
centers. It does have some weaknesses.

- The way to initialize the means was not specified. One popular way to start is to randomly choose k of the examples.
- The results produced depend on the initial values for the means, and it frequently happens that suboptimal partitions are found. The standard solution is to try a number of different starting points.
- It can happen that the set of examples closest to **m**_{i} is empty, so that **m**_{i} cannot be updated. This is an annoyance that must be handled in an implementation, but that we shall ignore.
- The results depend on the metric used to measure ||**x** - **m**_{i}||. A popular solution is to normalize each variable by its standard deviation, though this is not always desirable.
- The results depend on the value of k.*
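The normalization mentioned above can be sketched as dividing each variable by its standard deviation. This is a minimal illustration; the function name `standardize` and the guard against zero-spread variables are my own additions, not part of the original.

```python
import statistics

def standardize(examples):
    """Scale each variable (column) to unit standard deviation."""
    cols = list(zip(*examples))
    # Population standard deviation per column; a constant column
    # has zero spread, so leave it unscaled rather than divide by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple(v / s for v, s in zip(x, sds)) for x in examples]
```

After this rescaling, each variable contributes comparably to ||**x** - **m**_{i}||, which is why it helps when the variables have very different ranges (and why it can hurt when those differences are meaningful).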

__________

* If it turns out to be better, why stop with k = 3 or k = 4? If we go
to the extreme of k = n, we wind up with what is called a **nearest-neighbor
classifier**. If the number of examples is large enough, a nearest-neighbor
classifier can perform excellently. However, it is computationally very
expensive. This brings up the point that, like all engineering problems,
classifier design is a compromise between price and performance.
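A nearest-neighbor classifier of the kind just described can be sketched as a linear scan over the stored examples. This is a minimal 1-NN illustration under my own conventions (labeled examples stored as `(point, label)` pairs, squared Euclidean distance); the O(n) scan per query is exactly the computational expense noted above.

```python
def nearest_neighbor_classify(x, examples):
    """Classify x by the label of its closest stored example (1-NN).

    examples: list of (point, label) pairs, where point is a numeric tuple.
    """
    def d2(a, b):
        # Squared Euclidean distance; the square root is monotone,
        # so it is not needed for finding the minimum.
        return sum((u - v) ** 2 for u, v in zip(a, b))
    # Scan all n stored examples on every query -- expensive for large n.
    return min(examples, key=lambda e: d2(x, e[0]))[1]
```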
