A Parallel Out-of-Core K-Means Clusterer

This is a re-implementation of the K-means clustering algorithm. I hate to reinvent the wheel, but every freely available K-means clusterer I could find online lacks one feature or another that I think might be essential for clustering my 50GB dataset into 1 million clusters within a few days. After failing to figure out a clean way to run Mahout on my dataset, I gave up searching the web and wrote my own clusterer.

Following are a few features that might make you interested in giving my clusterer a try (although I don't think they really justify a re-implementation, because parallelization and out-of-core processing are needed at the same time only when the dataset is too large to be clustered):

You can download the source code here; there are some instructions on compiling and running in the same file. Feel free to drop me an email if you encounter a problem, or simply to say that the program works.