Chazelle's liner notes

Natural Algorithms

For much of my professional life, designing algorithms had been my thing. Then, one day, I watched a majestic flock of geese fly over Carnegie Lake and it dawned upon me that it had been their thing, too. Having been at it for 100 million years, even longer than I had, naturally their algorithmic genius surpassed mine. Undaunted, I resolved to catch up. The premise of my current research is that interpreting biological or social self-organized systems as "natural algorithms" brings upon them a fresh, new perspective ripe for inquiry. I believe that only the algorithm has the expressive power to model complex self-adaptive systems at the right levels of abstraction. Algorithms are the differential equations of the 21st century. Beyond its trite catchiness, this line serves to remind us that mankind's grasp of PDEs vastly exceeds its mastery of algorithms. The first order of business, therefore, is to build new analytical tools for natural algorithms. Please join me in this endeavor and watch this space for timely dispatches from the trenches.

Triangulating a Simple Polygon in Linear Time   (FOCS 1990, DCG 1991)

To triangulate a simple polygon is to parse its vertex sequence in the grammar of Jordan curves. A triangulation algorithm is a compiler. Neighbor relations inside the polygon are all encoded on its boundary. The question is how to decode them; or, in an echo of Cauchy's integral formula for analytic functions, how to infer (geometric) locality in 2D from (combinatorial) locality in 1D. The solution is to run two distinct divide-and-conquer schemes simultaneously. One of them, based on tree centroids, seeks to exploit the parenthesis structure of a triangulation: its parse tree. The idea is to break up the problem into a hierarchy and maintain it dynamically via a 2D analogue of tree rotations. The other divide-and-conquer device is a navigation tool for ray shooting based on Lipton and Tarjan's beautiful separator theorem.

Linear-time polygon triangulation has intriguing consequences. For example, one cannot check in linear time whether a list of segments ab, cd, ef, gh, etc., is free of intersections, but if the list is of the form ab, bc, cd, de, etc., then miraculously one can. Segueing into my favorite open problem in plane geometry, can the self-intersections of a polygonal curve be computed in linear time? I know the answer (it's yes) but not the proof.

An Optimal Convex Hull Algorithm in Any Fixed Dimension   (FOCS 1991, DCG 1993)

Welcome to the glorious world of derandomization! In 1989, Clarkson and Shor published one of the most influential papers in computational geometry. Its highlight was an optimal probabilistic resolution of the convex hull problem, a lovely algorithm crying out to be derandomized. Lovely might not be the first word that comes to mind when reading my paper, as you watch in horror a punishing barrage of sampling artillery fired at enemy positions. The VC-dimension squadron charges with all it's got: semicuttings, ε-approximations, dual shatter functions, k-sets, product spaces, sensitive sampling, higher-moment averaging, etc. But in the fog of battle, doubts begin to creep in. Was the intrepid conqueror (that would be me) trying to cross the Pacific in a gold-plated bathtub? And how was all that gold going to help? And the fancy-schmancy rubber duckies? But help they did. And the world was made safer for determinism.

The work is technically demanding. Hervé Brönnimann, Jirka Matoušek, and I later joined forces to simplify the algorithm: it's lighter fare but still a full course meal. Is there a zero-calorie solution? I doubt it. That said, to underestimate the power of algorithms sums up the history of the field, so perhaps I should wipe that skeptical smirk off my face.

A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity   (FOCS 1997, JACM 2000)

As it nears its 100th anniversary, MST is the oldest open problem in computer science. Well done, Otakar Borůvka, the problem's progenitor. For more than two years, I lived, breathed, drank, and ate MST. How special is this paper to me? Not to overstate it, let's just say it's eternally emblazoned upon my heart and forever enshrined in the deepest recesses of my soul. Or something like that. Besides sheer stubbornness, one explanation for my perseverance is that I chronically underestimated the difficulty of the problem. (Never make light of the salutary power of delusion.) Triangulating a polygon or derandomizing Clarkson-Shor, now these were hard nuts to crack for real men and real women! But wussy MST? Come on, be serious. Wasn't computing the thing just a matter of bopping around a graph with a priority queue in tow? How hard could it be to beat that sort of kindergarten health class exercise? If all you can do is the hokey-pokey, well, you watch.
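
For readers who have never met the problem's progenitor in code, here is a minimal Python sketch of Borůvka's 1926 procedure — each round, every component grabs its cheapest outgoing edge. This is strictly an illustration, nothing like the algorithm in the paper; the function names are mine, and distinct edge weights are assumed so ties cause no trouble.

```python
def boruvka_mst(n, edges):
    """n vertices 0..n-1, edges = [(weight, u, v), ...] with distinct
    weights; returns the total weight of a minimum spanning tree."""
    parent = list(range(n))

    def find(x):  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        cheapest = {}  # component root -> its best outgoing edge
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, ru, rv)
        if not cheapest:  # disconnected graph: stop with a forest
            break
        for w, ru, rv in set(cheapest.values()):
            if find(ru) != find(rv):  # re-check after earlier unions
                parent[find(ru)] = find(rv)
                total += w
                components -= 1
    return total
```

Each round at least halves the number of components, so there are O(log n) rounds and O(m log n) work overall — the "kindergarten" baseline the paper's inverse-Ackermann bound so dramatically undercuts.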

The cockiness didn't last. I used to joke that I discovered this solution by a process of elimination, having tried every other conceivable algorithm. The big plan was to use the discrepancy method to derandomize Karger-Klein-Tarjan (KKT), on the view that random sampling was overkill. Easier said than done. After years of hard labor, I finally went, "Oh, Mama, can this really be the end, to be stuck inside of Borůvkaville with the Ackermann blues again?" Though unlikely to be optimal, my 15-year-old bound has yet to be improved. Progress will require a new idea. Perhaps a dynamic extension of the Komlós verifier? Your turn.

My working mental picture had little to do with graphs—I rarely visualize a problem by its original description, anyway. Rather, it featured a crater-pocked landscape with water flowing from one crater to the next under a complex flood control mechanism. I realized early on that existing priority queues, the workhorses of most MST algorithms, were entropically unfit for the task. What I needed was an approximate priority queue that could compute medians optimally and operate at any entropy rate. The answer was the soft heap, which I designed with an eye toward integrating Yao-Borůvka within a Prim-type framework—Kruskal's method is of no use here because it operates in the wrong entropy regime. Designing the soft heap was not the hard part: fitting it to Ackermann's crater-filling schedule was. (The clearest exposition of the algorithm is in my book on the discrepancy method, which is freely available here.)

MST is the textbook example of a matroid optimization problem for which, as prevailing wisdom goes, "greed is good." Nonsense. This misguided view held back progress for years; that is, until KKT came along and showed greed for what it was: rubbish. My algorithm reaffirmed that finding. Unlike Shortest-Paths, MST is divide-and-conquer heaven: it gives the algorithm designer a huge canvas to try out wild ideas. And Ackermann is robust enough to accommodate the wildest among them. Trouble is, as you build canals in the crater-pocked landscape, dirty water accumulates in the wrong places. He who masters waste water management, ie, edge deletion, wins. Therein lies the key to future progress.

Why derandomize? I'll spare you the mindless pablum about certainty vs. near-certainty. Rather, consider traveling from San Francisco to New York City by plane or by horse. The first option is more convenient but you'll see nothing: KKT gives you the view of MST from 30,000 feet, with cloud cover and a side order of peanuts and diet coke. My algorithm, on the other hand, will take you across the redwood forests, the craggy canyons, the whitewater rapids, the tallgrass prairie, the lush meadows, the rolling grasslands, with time to spare for a duck terrine with black truffles at Chez Panisse and miso black cod at Nobu. Different lifestyle.

Fractional Cascading: I. A Data Structuring Technique  •  With Leo Guibas   (ICALP 1985, Algorithmica 1986)

Imagine a computer network that connects cities together, with at each node, tucked into a corner, the city's phone book. As you surf around, naturally you look up your name in every phone book that comes your way. Can you beat binary search? (No hashing, no superlinear storage.) The answer is yes. The trick is to let fractional samples of the phone books cascade across the network. Success depends on the favorable outcome of a "battle of exponentials" pitting the swelling data volume caused by the cascading against the exponential decay from the fractional sampling. The technique has found many uses, including packet routing on the Internet—or so I was told over beer and pretzels by a theorist at a conference, so it must be true. Fractional cascading in action is an awesome sight to behold, with unpredictable avalanches reminiscent of Bak's sandpiles. Self-organized criticality beckons. To pin down the critical exponents would be cool; it might even be doable. Can phone books be replaced by planar maps or higher-dimensional structures? No.
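
A toy Python rendering of the cascade on a path of catalogs, assuming each catalog is sorted. Every augmented list absorbs every other element of the next one, and bridge pointers are precomputed at build time; a query then pays one binary search at the top and O(1) corrective steps per subsequent catalog. All names are mine, and this ignores the dynamic and graph-structured machinery of the actual paper.

```python
import bisect

def build(catalogs):
    """aug[i] = catalogs[i] merged with every other element of aug[i+1]
    (the fractional sample that cascades up the path)."""
    k = len(catalogs)
    aug = [None] * k
    aug[-1] = list(catalogs[-1])
    for i in range(k - 2, -1, -1):
        aug[i] = sorted(catalogs[i] + aug[i + 1][::2])
    # Bridges: down[i][j] = insertion point of aug[i][j] in aug[i+1];
    # own[i][j] = insertion point of aug[i][j] in catalogs[i].
    down = [[bisect.bisect_left(aug[i + 1], x) for x in aug[i]]
            for i in range(k - 1)]
    own = [[bisect.bisect_left(catalogs[i], x) for x in aug[i]]
           for i in range(k)]
    return aug, down, own

def search_all(catalogs, aug, down, own, x):
    """Insertion point of x in every catalog: one binary search up top,
    then bridge hops with at most one corrective step per level."""
    res = []
    j = bisect.bisect_left(aug[0], x)
    for i in range(len(catalogs)):
        res.append(own[i][j] if j < len(aug[i]) else len(catalogs[i]))
        if i + 1 < len(catalogs):
            nxt = aug[i + 1]
            j = down[i][j] if j < len(aug[i]) else len(nxt)
            while j > 0 and nxt[j - 1] >= x:  # cascade keeps this O(1)
                j -= 1
    return res
```

The "battle of exponentials" shows up in the space bound: since each list forwards only half of itself upstream, the augmented lists total at most twice the raw catalog size.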

A Spectral Approach to Lower Bounds with Applications to Geometric Searching   (FOCS 1994, SICOMP 1998)

People have it backwards. Designing algorithms is all about lower bounds. Just as Christian martyrs in the Colosseum focused on the lions' lunch plans more than on their own, so the algorithm designer asks not "what can I do?" but, rather, "what will the adversary do?" Every problem has a weakness, a bottleneck, a choke point: that's where the algorithmist wields her scalpel. To search for a faster algorithm is to look for the weak link. An upper bound is the name we give to a limitation on lower bounds, ie, a constraint on the adversary. I am not being facetious: in algorithms research, the key to discovery is spotting the chink in the problem's armor. The algorithmist's motto: if it's soft, stab it! Conversely, proving lower bounds is to design algorithms for the adversary.

This work began with my observation that designing data structures for range searching is more often than not an eigenvalue problem in disguise. To show the nonexistence of good algorithms is then a matter of bounding the spectrum of certain linear maps. Which, as luck would have it, is where discrepancy theory shines. To cement the connection, I devised an entropy-based approach to arithmetic circuits that went like this. Feed the circuit a random input from a well-chosen distribution and observe the computation with blurry eyes. If the dominant eigenvalue is high enough then the circuit will tend to amplify the signals, in turn causing the blurring to increase the entropy—something that wouldn't happen with clear vision (ie, in a deterministic computation). Now argue that, in a bounded-coefficient model, no (blurry) gate can raise the entropy by too much, hence the need for many gates. Voilà. Where lower bounds for range searching differ from upper bounds is in their disproportionate reliance on classical math. My lower bound work has borrowed liberally from convex geometry, linear algebra, algebraic topology, Fourier analysis, graph theory, stochastic geometry, number theory, information theory, coding theory, tensors, and even wavelets [1, 2, 3, 4, 5, 6, 7, 8, 9].

An Optimal Algorithm for Intersecting Line Segments in the Plane  •  With Herbert Edelsbrunner   (FOCS 1988, JACM 1992)

How to compute optimally all pairwise intersections among a collection of line segments. The challenge is to avoid sorting the intersections, which rules out standard Bentley-Ottmann sweeplining. I had shown earlier how to do that, but at the price of a pesky additive overhead. Our way around the peskiness was to use a hierarchical sweep and exploit the rich combinatorics of line arrangements, which is what locally a set of segments with many intersections looks like. The data structuring is all pre-VC dimension. In retrospect, Herbert and I were lucky to pull it off with the limited divide-and-conquer technology of the day.

The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors  •  With Nir Ailon   (STOC 2006, SICOMP 2009)

The FJLT is a simple method for low-distortion embeddings by random projections using sparse matrices. I hear it's now used in practice (surely my first step towards world domination). The idea behind the FJLT is to use Heisenberg's inequality to fill up sparse vectors via a randomized Fourier transform. Randomization is necessary because not all vectors can be densified. Fortunately, the dense vectors outnumber the sparse ones, so randomizing the FFT moves sparsity away from the playing field. This work is a rare use of the uncertainty principle in theoretical computer science.
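A pure-Python toy of the three-stage pipeline, closer to the subsampled randomized Hadamard transform than to the paper's exact sparse-Gaussian projection: random signs (D) to dodge adversarial inputs, a fast Walsh-Hadamard transform (H) to densify, then a sparse sample of coordinates (P). Function names and parameter choices are mine, purely for illustration.

```python
import math, random

def fwht(a):
    """In-place fast Walsh-Hadamard transform (unnormalized);
    len(a) must be a power of two. Applying it twice gives n * identity."""
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2

def fjlt(x, k, rng):
    """Toy FJLT: map x (power-of-two length d) to k coordinates.
    Pipeline P.H.D — signs, Hadamard mixing, sparse sampling."""
    d = len(x)
    y = [xi * rng.choice((-1.0, 1.0)) for xi in x]  # D: random signs
    fwht(y)                                          # H: spread the mass
    y = [yi / math.sqrt(d) for yi in y]              # normalize H
    idx = [rng.randrange(d) for _ in range(k)]       # P: sampled coords
    return [y[i] * math.sqrt(d / k) for i in idx]    # rescale for norms
```

The D-then-H step is exactly the densification the uncertainty principle licenses: a vector and its (Hadamard) transform cannot both be sparse, and the random signs make the bad cases vanishingly unlikely.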

An Optimal Algorithm for Intersecting Three-Dimensional Convex Polyhedra   (FOCS 1989, SICOMP 1992)

There is a genre of geometric problems whose solutions are conceptually trivial yet fiendishly tricky to implement: the worst of all worlds, if you will. Intersecting two tetrahedra is one of them. Oddly enough, intersecting two n-vertex convex polyhedra is not. In fact, I know few geometric problems whose exploration offers such breathtaking vistas. Picture stitching together the two polyhedra's boundaries as the act of hiking in a valley surrounded by tall mountains. The valley, as it happens, consists of disjoint parts that can be reached only by climbing the neighboring peaks. Bad idea! Ah, but from a dual perspective, you see, the structure of the mountain peaks very much resembles that of the valley. So say hello to primal-dual trekking. Soon, though, we need tunnels across the mountains to use as shortcuts, a job we subcontract to the Dobkin-Kirkpatrick hierarchy. Alas, we can't dig too deeply into the mountains lest we get hit with an extra log factor, so the hiking metaphor stops there.

Salvation comes in the shape of a "cloning" recursion. What's that? In the world of algorithms, it is useful to distinguish among three types of recursion: binary search gives you the nonbranching kind; quicksort gives you the no-clone branching sort (no data replication); linear selection (median finding) gives you the cloning branching variety (with data replication). This last flavor is the tastiest of them all because it features two competing exponential growths (as in fractional cascading). Confession time: I've always had a weakness for linear selection and its sky-high coolness-to-simplicity ratio. I'll admit that RSA and FFT have enviable ratios, too, but both owe their sparkle to algebra whereas median finding is pure algorithmic sizzle. Just as Hume is said to have interrupted Kant's "dogmatic slumber," so linear selection interrupted the sorting snoozefest I once endured in Algorithms 101.
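
The cloning recursion in its best-known incarnation is textbook median-of-medians selection (BFPRT) — a standard sketch, not code from the paper. The medians of the groups of five are the "clones": replicated data spun off into a second recursive call, with the two branches' shrinkage rates (n/5 and roughly 7n/10) battling it out to keep the total linear.

```python
def select(a, k):
    """Return the k-th smallest element of a (0-indexed), worst-case
    O(n) via the median-of-medians cloning recursion."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Clone step: the group medians form a second, smaller instance.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]   # branch 1 of the recursion
    hi = [x for x in a if x > pivot]   # branch 2
    eq = len(a) - len(lo) - len(hi)    # copies of the pivot itself
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + eq:
        return pivot
    return select(hi, k - len(lo) - eq)
```

The pivot is guaranteed to beat at least ~3n/10 elements on each side, so T(n) ≤ T(n/5) + T(7n/10) + O(n), and since 1/5 + 7/10 < 1 the competing growths resolve in the algorithm's favor.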

Lower Bounds for Linear Degeneracy Testing  •  With Nir Ailon   (STOC 2004, JACM 2005)

Given n numbers, do any r of them add up to 0 or, more generally, form a zero of a given r-variate polynomial? From 3-SUM to bin packing to testing general position, this problem is ubiquitous. It is NP-complete in general and polynomial for constant r. This paper revisits a remarkable argument of Jeff Erickson to derive the first exponential lower bound. Boasting aside (or maybe not), our proof is a feast for the mathematical eye, bringing in polyhedral geometry, Reed-Solomon codes, and tensor products, all the while pointing to an intriguing generalization of coding theory as the way forward. Move over, Damien Hirst, and watch, from the kiddie chair, what real art is about!

Cutting Hyperplanes for Divide-and-Conquer   (FOCS 1991, DCG 1993)

Cuttings are the most powerful divide-and-conquer tools in computational geometry. They're a product of the sampling revolution that shook the field in the late eighties. This paper presents the first optimal construction with respect to both time and size. It builds on my work with Joel Friedman (see below) and on seminal results by Jirka Matoušek. The algorithm starts out predictably, as we dig our way through a hierarchy of hyperplane arrangements. But then we hit a snag! The recursion's tiny errors get amplified at each level and sampling alone can't rescue us from the unfolding catastrophe. What to do? As so often before, geometry steps in to save the day.

Self-Improving Algorithms  •  With N. Ailon, K.L. Clarkson, D. Liu, W. Mulzer, C. Seshadhri   (SODA 2006, SICOMP 2011)

Google's PageRank is fast but its appeal lies elsewhere: it's smart. We suggest that, in order to raise their IQs, algorithms should learn about their users and improve themselves along the way. Prescriptive learning is less taxing than the more traditional, descriptive kind: it's the difference between learning geography to pass the test and doing it for its own sake. This paper provides a proof of concept by giving the first optimal algorithms for sorting and Delaunay triangulations, where optimality is defined with reference to arbitrary, unknown distributions on the inputs. Further work should look into optimization problems. This paper opens up a new area of investigation which is sure to trigger a mad rush to the entrance gates. Or thus spoke I.
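
A toy of the flavor only: the real self-improving sorter learns near-optimal search structures per input coordinate and provably matches the distribution's entropy, while this sketch merely learns bucket landmarks from training samples and sorts within buckets. All names are mine.

```python
import bisect

def train_landmarks(samples, n_buckets):
    """Training phase: from inputs drawn from the unknown distribution,
    learn landmarks that will split future inputs into small buckets."""
    pool = sorted(x for s in samples for x in s)
    step = max(1, len(pool) // n_buckets)
    return pool[step::step]

def learned_sort(a, landmarks):
    """Operation phase: bucket by the learned landmarks, then sort each
    (hopefully tiny) bucket and concatenate."""
    buckets = [[] for _ in range(len(landmarks) + 1)]
    for x in a:
        buckets[bisect.bisect_left(landmarks, x)].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out
```

If the training samples are representative, the buckets stay balanced and the expected running time tracks the distribution rather than the worst case — the prescriptive-learning bargain in miniature.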

Quasi-Optimal Range Searching in Spaces of Finite VC-Dimension  •  With Emo Welzl   (DCG 1989)

Any set of n points in the plane can be made into the vertices of a simple polygon that no line can cut in more than roughly √n places. This fundamental geometric fact should be taught in every high school in the land. The proof is elementary yet surprising, and a joy to teach in the classroom. It is easy to generalize to higher dimension and other settings, and the result provides a powerful new tool for range searching and discrepancy theory. The paper tightens up a previous bound of Emo, whose proof featured one of the earlier uses of multiplicative rescaling, an important technique now ubiquitous in machine learning, game theory, and theoretical computer science. The improvement comes from a packing argument, interesting in its own right, which Haussler generalized a few years later. Roughly, the idea is to define a discrete "metric" for a hyperplane arrangement (or a set system of bounded VC-dimension) and argue that it behaves in much the same way as the Euclidean metric of the appropriate dimension.

The Discrepancy Method: Randomness and Complexity   (Cambridge University Press, 2000)

It's been an endless source of fascination for me that a deeper understanding of deterministic computation should require the mastery of randomization. I wrote this book to illustrate this powerful connection. From minimum spanning trees to linear programming to Delaunay triangulations, the most efficient algorithms are often derandomizations of probabilistic solutions. The discrepancy method puts the spotlight on one of the most fruitful questions in all of computer science: if you think you need random bits, please tell us why.

The book is freely available at the link above. If you'd rather buy a print version, be aware that I get a few pennies off each copy in royalties. So if, thanks to you, I ever become so rich I can enslave all of humanity, you'll be proud to know you did your part.

A Deterministic View of Random Sampling and Its Use in Geometry  •  With Joel Friedman   (FOCS 1988, Combinatorica 1990)

This article introduced derandomization in computational geometry as an algorithm design methodology. The idea, now mainstream, that the road to better deterministic algorithms might pass through probabilistic solutions was then considered radical among geometers. We give a deterministic construction of optimal-size cuttings using a sampling device that lies somewhere between an ε-net and an ε-approximation. It would take a few more years to tighten up the running time.

Filtering Search: A New Approach to Query-Answering   (FOCS 1983, SICOMP 1986)

This work explores an idea so trivial it's a wonder it leads anywhere. If the output to a search is not a single item but a whole list of them, then the algorithm can be slowed down in proportion to the size of the output. This in turn brings about savings in storage. The output size is usually unknown ahead of time, so the idea is to let off the gas pedal gradually as we learn that number in unary. This banal observation failed to live up to its triteness. In fact, it led to dozens of improvements in geometric searching. The paper also introduced, in the hive graph, the idea of fractional sampling which later spawned fractional cascading.
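
The principle in its simplest form — pay one binary search to get started, then only O(1) per reported item, so total cost O(log n + k) where k is the output size. A minimal sketch (names mine), not the paper's hive graph:

```python
import bisect

def report_range(points, lo, hi):
    """Report all values in [lo, hi] from a sorted list: one binary
    search locates the start, then each reported item costs O(1) —
    the search 'slows down' exactly in proportion to the output."""
    i = bisect.bisect_left(points, lo)
    out = []
    while i < len(points) and points[i] <= hi:
        out.append(points[i])
        i += 1
    return out
```

The walk is where the gas pedal eases off: every extra step of "slow" scanning is paid for by an item of output, which is what licenses the storage savings in the structures the paper builds.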

The Power of Geometric Duality  •  With Leo Guibas and D.T. Lee   (FOCS 1983, BIT 25)

A concept now as commonplace as duality still had buzz in the early years. (Duality in computational geometry has the same meaning it does in linear programming, but its modalities of use are quite different.) For all of you, Jeopardy contestants, this paper features the first zone theorem (discovered independently by Edelsbrunner, O'Rourke, and Seidel). The notion would then blossom into a whole "school" led by Micha Sharir.

Lines in Space: Combinatorics and Algorithms  •  With Herbert Edelsbrunner, Leo Guibas, Micha Sharir, Jorge Stolfi   (STOC 1989, Algorithmica 1996)

Every summer for many years, Leo Guibas arranged for this group (plus or minus a few) to meet for three weeks at DEC SRC near the Stanford campus. The earlier gatherings took place at PARC but, when Xerox rediscovered its soul in the copying business, we had to move our circus show down the road. Our stated goal was to advance the state of the art in computational geometry. What was in it for DEC was never entirely clear. The company went belly up in the late nineties so now, at least, we know the answer to that question. This paper sets the algorithmic foundations for the study of lines in 3D, ie, spaghetti before you cook them. The pasta may look straight but the best way to represent uncooked spaghetti is by means of some sort of lasagna, the Plücker hypersurface, baked in projective five-dimensional space. When, soon, the Flying Spaghetti Monster is confirmed as the universe's sole divinity, I expect our paper to become a Holy Text.
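
The lasagna can be tasted in a few lines: the Plücker coordinates of the line through two 3D points always satisfy the Grassmann-Plücker quadratic relation, which is precisely the hypersurface in projective 5-space. This is the standard construction, not code from the paper; names are mine.

```python
def plucker(a, b):
    """Plücker coordinates of the line through 3D points a and b,
    via homogeneous coordinates (a0, a1, a2, 1): the six 2x2 minors
    p_ij = a_i*b_j - a_j*b_i for i < j."""
    a = (*a, 1.0)
    b = (*b, 1.0)
    return [a[i] * b[j] - a[j] * b[i]
            for i in range(4) for j in range(i + 1, 4)]

def on_hypersurface(p, eps=1e-9):
    """A 6-tuple represents a real line iff it lies on the Plücker
    hypersurface: p01*p23 - p02*p13 + p03*p12 = 0."""
    p01, p02, p03, p12, p13, p23 = p
    return abs(p01 * p23 - p02 * p13 + p03 * p12) < eps
```

Lifting lines to points on this quadric is what lets 3D line problems be attacked with the machinery of arrangements in five dimensions.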

Computational Geometry and Convexity   (PhD Thesis, Yale University, 1980)

Michael I. Shamos founded computational geometry and wrote the first PhD thesis in the field. I wrote the second. Mike and I shared the best advisor in the world, David Dobkin. I loved my grad school days. I met my wife, I worked in this great new field, and I treasured my access to Yale's proudest academic asset: pizza.

Click here for my other publications.