Cheaper by the Dozen: Batched Algorithms
Abstract:
In this paper we develop the idea of batching, processing several queries
at a time, for more efficient algorithms. The prominence of massive data
sets in computation has necessitated new ideas in algorithms. Consider a
webpage that answers queries on a large data set. Instead of answering
these queries one at a time, which can result in a substantial bottleneck,
we wait for several queries to accumulate, and then apply a batched
algorithm that can answer them significantly faster. We give an example
of the power batched algorithms by showing an O(nbd^{.3}) time batched
algorithm which answers b d-dimensional Hamming nearest neighbor queries
on a dataset of size n, given that b \ge d.To better understand the power of batching we give batched algorithms for
the classical problems of: string matching, range searching, selection, 1D
and 2D nearest neighbor, convex hull membership, and halfplane
intersection. Our batched algorithms answer b queries on a dataset of
size n >> b in 0(nlogb) time, using o(n) storage and at most 2n/B I/O's
(where B is the block size). We use two techniques, query data structures
and sampling, in the design of our batched algorithms. The advantages of
these batched algorithms algorithms, over putting the massive dataset
into a classical data structure, are threefold: improved asymptotic
performance, significantly smaller data structures, and a number of I/O's
which is linear in the size of the massive dataset.