A Time-Space Efficient Locality Sensitive Hashing Method for Similarity Search in High Dimensions
Locality Sensitive Hashing (LSH) by Indyk and Motwani is a popular technique for designing indexing data structures for approximate nearest neighbor search on high dimensional data. However, its drawback is that a very large number of hash tables are needed in order to achieve good approximation accuracy. As a result, the high space requirement makes the approach impractical for large datasets. A recent (theoretical) result showed that a point perturbation technique has the potential to reduce the space requirement of the LSH approach at the cost of an increase in query time. This paper proposes a new LSH method called hash-perturbation LSH based on perturbing the hash values. Our evaluation with an image dataset and an audio dataset shows that the proposed approach is both time and space efficient. To achieve similar search quality, the hash-perturbation LSH approach has a similar time efficiency as the basic LSH approach while reducing the space requirement by a factor of five. For image dataset, its time efficiency is twice of the the point perturbation approach.