Sublinear Distributed Reconstruction (thesis)
Online property reconstruction: Large data sets are ubiquitous today with the
advent of high speed computing and the explosion of data storage capabilities. Using this data as an input is quite a challenge, since we do not even have the luxury of reading the whole data. Linear time algorithms are too slow, and we are forced to move into the realm of sublinear time algorithms.
Usually, data sets are useful because they satisfy some structural property. An ordered list of numbers may be in sorted order, and this property may be central to the usefulness of this list. Such a property is very sensitive to noise. Given a large data set f that does not satisfy some such property P, can we modify f minimally to make it have P? Furthermore, this data set is large and we may not have access to it all at once. We would ideally like to make this change online.
These considerations led to the formulation of the online property reconstruction model, which we investigate in this thesis. We are given oracle access to a function f (defined over an appropriate discrete domain) which does not satisfy a property P. We would like to output a function g which satisfies f and differs from f at very few values. The function g is output by a sublinear time online procedure, called a filter. Such a procedure takes as input a domain point x, and outputs g(x) in time sublinear to the domain size. We design filters for the following scenarios:
1. Monotonicity : The functions are real-valued on the d-dimensional domain
[1, n]d, and the property to be maintained is monotonicity. This is a natural
starting point for discussing the online reconstruction model, and involves the
development of many sublinear techniques to study monotonicity. We show the surprising result that once a random seed of size (d log n)O(1) is fixed, the value of g(x) (for any input domain point x) can be computed in (log n)O(d)
time. These results are provided in Chapter 2 and are joint work with Michael
2. Convexity : We take property reconstruction to the geometric world and study
convexity in two and three dimensions. Given a polygonal chain or a 3D
terrain, we design filters that minimally modify the input and make it convex.
Especially for 3D, we require a large set of geometric tools. We also prove
lower bounds showing a complexity gap between 2D and 3D. This is joint
work with Bernard Chazelle and given in Chapter 3.
3. Expansion : The input is a bounded degree graph which we want to make
into an expander. The main algorithmic technique used here is random walks.
We initiate a discussion into the behavior of random walks in graph that are
almost expanders - graphs formed by arbitrarily connecting a small unknown
graph to a large expander. We show interesting mixing properties of walks in
such graphs, and use this to construct a filter for expansion. These results are
given in Chapter 4 and are joint work with Satyen Kale and Yuval Peres.
Self-improving algorithms: A real world algorithm performs the same task,
say sorting, repeatedly on inputs coming from some source. It is natural to assume that this source, although unknown, may possess some structure that allows for faster running time. We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an arbitrary, unknown input distribution. Assume that the inputs to the algorithm are generated indepedently from a fixed distribution D. A self-improving algorithm learns about D and then makes optimizations specific to D. This allows it to beat the worst-case running time for the given problem. We give such self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of this work: (i) an algorithm to sort a list of numbers with optimal expected limiting complexity; and (ii) an algorithm to compute the Delaunay triangulation of a set of points with optimal expected limiting complexity. These results are described in Chapter 5 and are joint work with Nir Ailon, Bernard Chazelle, Ken Clarkson, Ding Liu, and Wolfgang
Mulzer. In both cases, the algorithm begins with a training phase during which it adjusts itself to the input distribution, followed by a stationary regime in which the algorithm converges to its optimized incarnation.