Data-Powered Computing

Bernard Chazelle

Property Testing           Sublinear Algorithms           Uncertainty           Streaming           Tools           Clustering           Reconstruction

Property Testing

Alon, N.   Testing subgraphs in large graphs, Random Structures and Algorithms 21 (2002), 359-370.

Alon, N., Dar, S., Parnas, M., Ron, D.   Testing of clustering, Proc. 41st FOCS (2000), 240-250.

Alon, N., Fischer, E., Krivelevich, M., Szegedy, M.   Efficient testing of large graphs, Combinatorica 20 (2000), 451-476.

Alon, N., Kaufman, T., Krivelevich, M., Litsyn, S., Ron, D.   Testing low-degree polynomials over GF(2), Proc. RANDOM-APPROX (2003), 188-199.

Alon, N., Krivelevich, M.   Testing k-colorability, SIAM J. Discrete Math. 15 (2002), 211-227.

Alon, N., Krivelevich, M., Newman, I., Szegedy, M.   Regular languages are testable with a constant number of queries, SIAM J. Comput. 30 (2001), 1842-1862.

Alon, N., Shapira, A.   Testing satisfiability, J. Algorithms 47 (2003), 87-103.

Alon, N., Shapira, A.   Testing subgraphs in directed graphs, Proc. 35th STOC (2003), 700-709.

Alon, N., Shapira, A.   A characterization of easily testable induced subgraphs, Proc. 15th SODA (2004), 935-944.

Batu, T., Ergun, F., Kilian, J. Magen, A., Raskhodnikova, S., Rubinfeld, R., Sami, R.   A sublinear algorithm for weakly approximating edit distance, Proc. 36th STOC (2004).

Batu, T., Fischer, E., Fortnow, L., Kumar, R., Rubinfeld, R., White, P.   Testing random variables for independence and identity, Proc. 42nd FOCS (2001), 442-451.

Batu, T., Fortnow, L., Rubinfeld, R., Smith, W.D., White, P.   Testing that distributions are close, Proc. 41st FOCS (2000), 259-269.

Batu, T., Kumar, R., Rubinfeld, R.   Sublinear algorithms for testing monotone and unimodal distributions, Proc. 36th STOC (2004).

Batu, T., Rubinfeld, R., White, P.   Fast approximate PCPs for multidimensional bin-packing problems, Proc. 3rd APPROX-RANDOM (1999), 245-256.

Bender, M., Ron, D.   Testing properties of directed graphs: acyclicity and connectivity, Random Structures and Algorithms 20 (202), 184-205.

Blum, M., Luby, M., Rubinfeld, R.   Self-testing/correcting with applications to numerical problems, J. Comput. Sys. Sci. 47 (1993), 549-595.

Bogdanov, A., Obata, K., Trevisan, L.   A linear lower bound on the query complexity of property testing algorithms for 3-coloring in bounded-degree graphs, Proc. 43rd FOCS (2002), 93-102.

Bogdanov, A., Trevisan, L.   Lower bounds for testing bipartiteness in dense graphs, ECCC 64 (2002).

Buhrman, H., Fortnow, L., Newman, I., Roehrig, H.   Quantum property testing, Proc. 14th SODA (2003), 480-488.

Czumaj, A., Sohler, C.   Property testing with geometric queries, Proc. 9th ESA (2001), 266-277.

Czumaj, A., Sohler, C.   Testing hypergraph coloring, Proc. 28th ICALP (2001), 493-505.

Czumaj, A., Sohler, C.   Abstract combinatorial programs and efficient property testers, Proc. 43rd FOCS (2002), 83-92.

Czumaj, A., Sohler, C., Ziegler, M.   Property testing in computational geometry, Proc. 8th ESA (2000), 155-166.

Dodis, Y., Goldreich, O., Lehman, E., Raskhodnikova, S. Ron, D., Samorodnitsky, A.   Improved testing algorithms for monotonicity, Proc. RANDOM-APPROX (1999), 97-108.

Ergun, F., Kannan, S., Kumar, S.R., Rubinfeld, R., Vishwanthan, M.   Spot-checkers, JCSS 60 (2000), 717-751.

Ergun, F., Kumar, S. Ravi, Rubinfeld, R.   Fast approximate PCPs, Proc. 31st STOC (1999), 41-50.

Fischer, E.   On the strength of comparisons in property testing, Electronic Colloquium on Computational Complexity (ECCC) 8 (2001).

Fischer, E.   The art of uninformed decisions: A primer to property testing, The Computational Complexity Column of The Bulletin of the European Association for Theoretical Computer Science 75 (2001), 97-126.

Fischer, E.   Testing graphs for colorability properties, Proc. 12th SODA (2001), 873-882.

Fischer, E., Kindler, G., Ron, D., Safra, S., Samorodnitsky, A.   Testing juntas, Proc. 43rd FOCS (2002), 103-112.

Fischer, E., Lehman, E., Newman, I., Raskhodnikova, S., Rubinfeld, R., Samorodnitsky, A.   Monotonicity testing over general poset domains, Proc. 34th STOC (2002), 474-483.

Fischer, E., Newman, I.   Testing of matrix properties, Proc. 33rd STOC (2001), 286-295.

Fischer, E., Newman, I., Sgall, J.   Functions that have read-twice constant width branching programs are not necessarily testable, Proc. 17th CCC (2002), 73-79.

Goldreich, O.   Combinatorial property testing - a survey, 1997.

Goldreich, O.   Property testing in massive graphs, Handbook of Massive Data Sets (2002), 123-147.

Goldreich, O., Goldwasser, S., Lehman, E., Ron, D., Samorodnitsky, A.   Testing monotonicity, Combinatorica 20 (2000), 301-337.

Goldreich, O., Goldwasser, S., Ron, D.   Property testing and its connection to learning and approximation, J. ACM 45 (1998), 653-750.

Goldreich, O., Ron, D.   Property testing in bounded degree graphs, Proc. 29th STOC (1997), 406-415.

Goldreich, O., Ron, D.   A sublinear bipartite tester for bounded degree graphs, Combinatorica 19 (1999), 335-373.

Goldreich, O., Ron, D.   On testing expansion in bounded-degree graphs, Electronic Colloquium on Computational Complexity (ECCC) 7 (2000).

Goldreich, O., Ron, D.   On estimating the average degree of a graph, ECCC 11, 13 (2004).

Goldreich, O., Trevisan, L.   Three theorems regarding testing graph properties, Proc. 42nd FOCS (2001), 460-469; comments

Halevy, S., Kushilevitz, E.   Distribution-free property testing, Proc. RANDOM-APPROX (2003), 302-317.

Kaufman, T., Krivelevich, M., Ron, D.   Tight bounds for testing bipartiteness in general graphs, Proc. RANDOM-APPROX (2003), 341-353.

Kearns, M., Ron, D.   Testing problems with sub-learning sample complexity, J. Comput. Syst. Sci. 61 (2000), 428-456.

Kiwi, M., Magniez, F., Santha, M.   Approximate testing with error relative to input size, JCSS 66, 2 (2003), 371-392.

Kumar, R., Rubinfeld, R.   Sigact News 34 (2003), 57-67.

Newman, I.   Testing of functions that have small width branching programs, Proc. 41st FOCS (2000), 251-258.

Parnas, M., Ron, D.   Testing metric properties, Proc. 33rd STOC (2001), 276-285.

Parnas, M., Ron, D.   Testing the diameter of graphs, Random Structures and Algorithms 20 (2002), 165-183.

Parnas, M., Ron, D., Rubinfeld, R.   Testing parenthesis languages, Proc. RANDOM-APPROX (2001), 261-272.

Parnas, M., Ron, D., Rubinfeld, R   On testing convexity and submodularity, Proc. RANDOM-APPROX (2002), 11-25.

Parnas, M., Ron, D., Samorodnitsky, A.   Testing basic Boolean formulae, SIAM J. Discrete Math. 16 (2002), 20-46.

Ron, D.   Property testing, Handbook of Randomized Computing, Vol. II, eds., Pardalos, P.M., Rajasekaran, S., Reif, J., Rolim, J.D.P., Kluwer Academic Publishers (2001), 597-649.

Rubinfeld, R., Sudan, M.   Robust characterizations of polynomials with applications to program testing, SIAM J. Comput. 25 (1996), 252-271.

Sublinear Algorithms

Abello, J., Pardalos, P.M., Resende, M.G.C.   Handbook of Massive Data Sets, The ACM Digital Library, 2002.

Achlioptas, D., McSherry, F.   Fast computation of low rank matrix, Proc. 33rd STOC (2001), 611-618.

Azar, Y., Fiat, A., Karlin, A., McSherry, F., Saia, J.   Spectral analysis of data, Proc. 33rd STOC (2001), 619-626.

Batu, T., Dasgupta, S., Kumar, R., Rubinfeld, R.   The complexity of approximating the entropy, Proc. 34th STOC (2002), 678-687.

Batu, T., Ergun, F., Kilian, J., Magen, A., Raskhodnikova, S., Rubinfeld, R., Sami, R.   A sublinear algorithm for weakly approximating edit distance, Proc. 35th STOC (2003), 316-324.

Bradley, P.S., Mangasarian, O.L., Musicant D.R.   Optimization methods in massive data sets, Handbook of Massive Data Sets (2002), Kluwer, 439-471.

Brunner, R.J., Djorgovski, S.G., Prince, T.A., Szalay, A.S.   Massive datasets in astronomy, Handbook of Massive Data Sets (2002), Kluwer, 931-979.

Cahill, M.H., Lambert, D., Pinheiro, J.C., Sun, D.X.   Detecting fraud in the real world, Handbook of Massive Data Sets (2002), Kluwer, 911-929.

Chazelle, B., Liu, D., Magen, A.   Sublinear geometric algorithms, Proc. 35th STOC (2003), 531-540.

Chazelle, B., Rubinfeld, R., Trevisan, L.   Approximating the minimum spanning tree weight in sublinear time, Proc. 28th ICALP (2001), 190-200.

Czumaj, A., Sohler, C.   Sublinear-time approximation for clustering via random sampling, Proc. 31st ICALP, 2004.

Czumaj, A., Sohler, C.   Estimating the weight of metric minimum spanning trees in sublinear-time Proc. 36th STOC (2004).

Czumaj, A., Ergun, F., Fortnow, L., Magen, A., Newman, I., Rubinfeld, R., Sohler, C.   Sublinear-time approximation of Euclidean minimum spanning tree, Proc. 14th SODA (2003), 813-822.

Drineas, P., Kannan, R.   Pass efficient algorithm for approximating large matrices, Proc. 14th SODA (2003), 223-232.

Drineas, P., Kannan, R.   Fast Monte Carlo algorithms for approximate matrix multiplication, Proc. 42nd FOCS (2001), 452-459.

DuMouchel, W.   Data squashing: constructing summary data sets, Handbook of Massive Data Sets (2002), Kluwer, 579-591.

Feige, U.   On sums of independent random variables with unbounded variance, and estimating the average degree in a graph, Proc. 36th STOC (2004).

Frieze, A., Kannan, R.   Quick approximation to matrices and applications, Combinatorica 19 (1999), 175-220.

Frieze, A., Kannan, R., Vempala, S.   Fast Monte-Carlo algorithms for finding low-rank approximations, Proc. 39th FOCS (1998), 378-390.

Gibbons, P.B., Matias, Y.   Synopsis data structures for massive data sets, Abstract in Proc. 10th SODA (1999), 909-910.

Harrower, M.   Representing uncertainty: does it help people make better decisions? Manuscript, 2002.

Indyk, P.   Sublinear-time algorithms for metric space problems, Proc. 31st STOC (1999), 428-434.

Indyk, P.   A sublinear-time approximation scheme for clustering in metric spaces, Proc. 40th FOCS (1999), 154-159.

Ma, Q., Wang, J.T.L., Gattiker, J.R.   Mining biomolecular data using background knowledge and artificial neural networks, Handbook of Massive Data Sets (2002), Kluwer, 1141-1168.

Mishra, N., Oblinger, D., Pitt, L.   Sublinear time approximate clustering, Proc. 12th SODA (2001), 439-447.

Murtagh, F.   Clustering in massive data sets, Handbook of Massive Data Sets (2002), Kluwer, 501-543.


Beier, R., Czumaj, A., Krysta, P., Voecking, B.   Computing equilibria for congestion games with (im)perfect information, Proc. 15th SODA (2004), 739-748.

Beier, R., Voecking, B.   Typical properties of winners and losers in discrete optimization, Proc. 36th STOC (2004).

Charikar, M., Fagin, R., Guruswami, V., Kleinberg, J.M., Raghavan, P., Sahai, A.   Query strategies for priced information, Proc. 32nd STOC (2000), 582-591.

Feder, T., Motwani, R., Panigrahy, R., Olston, C., Widom, J.   Computing the median with uncertainty, SIAM J. Comput. 32 (2003), 538-547.

Feder, T., Motwani, R., O'Callaghan, L., Olston, C., Panigrahy, R.   Computing shortest paths with uncertainty, Proc. 20th STACS (2003), 367-378.

Khanna, S., Tan, W.C.   On computing functions with uncertainty, Proc. 20th PODS (2001), 171-182.

Gupta, A., Pal, M., Ravi, R., Sinha, A.   Boosted sampling: Approximation algorithms for stochastic optimization, Proc. 36th STOC (2004).

Immorlica, N., Karger, D.R., Minkoff, M., Mirrokni, V.S.   On the costs and benefits of procrastination: approximation algorithms for stochastic combinatorial optimization problems, Proc. 15th SODA (2004), 684-693.

Ng, S.H., Chick, S.E.   Reducing input parameter uncertainty for simulations, Proc. 33rd Conf. Winter Simulation (2001), 364-371.

Sellen, J., Choi, J., Yap, C.K.   Precision-sensitive Euclidean shortest path in 3-Space, SIAM J. Comput. 29 (2000), 1577-1595.

Spielman, D.A., Teng, S.H.   Smoothed analysis: why the simplex algorithm usually takes polynomial time, Extended abstract in Proc. 33rd STOC (2001), 296-305.


Alon, N., Matias, Y., Szegedy, M.   The space complexity of approximating the frequency moments, J. Comput. Syst. Sci. 58 (1999), 137-147.

Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.   Models and issues in data stream systems, Proc. 21st PODS (2002), 1-16.

Babcock, B., Datar, M., Motwani, R.   Sampling from a moving window over streaming data, Proc. 13th SODA (2002), 633-634.

Charikar, M., Chen, K., Farach-Colton, M.   Finding frequent items in data streams, Theor. Comput. Sci. 312 (2004), 3-15.

Charikar, M., L. O'Callaghan and R. Panigrahy,   Better streaming algorithms for clustering problems, Proc. 35th STOC (2003), 30-39.

Cormode, G.   Stable distributions for stream Computations: it's as easy as 0,1,2. Proc. Workshop on Management and Processing of Data Streams (MPDS) 2003.

Cormode, G., Muthukrishnan, S.   What's hot and what's not: tracking frequent items dynamically, Proc. PODS (2003), 296-306.

Datar, M., Gionis, A., Indyk, P., Motwani, R.   Maintaining stream statistics over sliding windows, SIAM J. Comput. 31 (2002), 1794-1813.

Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.   Fast, small-space algorithms for approximate histogram maintenance, Proc. 34th STOC (2002), 389-398.

Gilbert, A., Kotidis, Y., Muthukrishnan, S., Strauss, M.   How to summarize the universe: dynamic maintenance of quantiles, Proc. 28th VLDB (2002), 454-465.

Indyk, P.   Stable distributions, pseudorandom generators, embeddings and data stream computation, Proc. 41st FOCS (2000), 189-197.

Muthukrishnan, S.   Data streams: algorithms and applications,


Broder, A., Charikar, M., Frieze, A., Mitzenmacher, M.   Min-wise independent permutations, J. Comput. Sys. Sci. 60 (2000), 630-659.

Charikar, M.   Similarity estimation techniques from rounding algorithms, Proc. 34th STOC (2002), 380-388.

Indyk, P.   Nearest neighbors in high-dimensional spaces, Handbook of Discrete and Computational Geometry, eds., J.E. Goodman and J. O'Rourke, CRC Press, to appear.

Indyk, P.   Algorithmic applications of low-distortion geometric embeddings, Proc. 42nd FOCS (2001), 10-33.

Indyk, P., Matousek, J.   Low-distortion embeddings of finite metric spaces, Handbook of Discrete and Computational Geometry, eds., J.E. Goodman and J. O'Rourke, CRC Press, to appear.

Indyk, P., Motwani, R.   Approximate nearest neighbors, Towards removing the curse of dimensionality, Proc. 30th STOC (1998), 604-613.

Indyk, P., Motwani, R., Raghavan, P., Vempala, S.   Locality-preserving hashing in multidimensional spaces, Proc. 29th STOC (1997), 618-625.

Kleinberg, J.M.   Two Algorithms for nearest-neighbor search in high dimensions, Proc. 29th STOC (1997), 599-608.

Kushilevitz, E., Ostrovsky, R., Rabani, Y.   Efficient search for approximate nearest neighbor in high dimensional spaces, SIAM J. Comput. 30 (2000), 457-474.