dxiao
|
email: @lri.fr |
| snail mail:
David Xiao |
English | Français | 中文 | עברית
Welcome to my home page. I am a postdoc in the Algorithms and Complexity Group at LRI, focusing on complexity theory and cryptography. I obtained my Ph. D from Princeton University in 2009 under the co-supervision of Boaz Barak and Avi Wigderson (IAS).
Bienvenue à mon site web. Je suis post-doctorant dans l'équipe algorithmes et complexité au LRI, et je m'intéresse surtout à la théorie de la complexité et à la cryptographie. J'ai obtenu mon doctorat à l'Université de Princeton en 2009 sous la direction de Boaz Barak et Avi Wigderson (IAS).
欢迎光临本网页。 我在法国计算机研究所(LRI)的复杂性与算法系作博士后,研究专业是复杂性理论与密码学。 我2009年在美国普林斯顿大学取得博士,导师为Boaz Barak 与 Avi Wigderson (IAS).
Complexity
We present a constant-round public-coin protocol between an efficient verifier and an all-powerful (but possibly cheating) prover that achieves the following. Both parties take as input an efficiently verifiable set $S$, an approximation $s \approx |S|$, and an efficiently computable partition $f : S \rightarrow \zo^*$ where $f(x) = f(x')$ if and only if $x, x'$ belong to the same set of the partition. The verifier is guaranteed to output with high probability a uniformly random $x \getsr S$ along with a number $s_x$ that approximates the size of the set $|f^{-1}(f(x))|$.
Our main application is to rule out certain black-box reductions from the worst-case hardness of $\SAT$ to the task of breaking the binding of a constant-round statistically hiding commitment scheme. We show that the existence of a $k$-adaptive randomized reduction for the above task implies that $\coNP$ has a $O(k)$-round public-coin interactive protocol, namely $\coNP \subseteq \AM[k]$. As corollaries, we derive the following:
Our impossibility result holds even if the hiding property of the commitment scheme is only guaranteed to hold against honest receivers. Naturally, our result extends to any primitive that implies constant-round statistically hiding commitment via a black-box proof of security. Most notably, this includes collision resistant hash functions, as well as constant-round protocols for single-server private information retrieval and oblivious transfer with statistical security for one of the parties.
Learning theory, and in particular PAC learning, was introduced by Valiant [CACM 1984] and has since become a major area of research in theoretical and applied computer science. One natural question that was posed at the very inception of the field is whether there are classes of functions that are hard to learn.
PAC learning is hard under widely held conjectures such as the existence of one-way functions, and on the other hand it is known that if PAC learning is hard then $\P \neq \NP$. We further study sufficient and necessary conditions for PAC learning to be hard, and we prove that:
Here, ``standard techniques'' refers to various classes of efficient reductions. Together, these results imply that the hardness of PAC learning lies between the non-triviality of $\ZK$ on the one hand and the hardness of $\NP$ on the other hand. Furthermore, the hardness of PAC learning lies ``strictly'' between the two, in the sense that most standard techniques cannot prove equivalence with either $\ZK \neq \BPP$ or $\NP \neq \P$.
xIn proving these results, we show new connections between PAC learning and auxiliary-input one-way functions, which were defined by Ostrovsky and Wigderson [ISTCS 1993] to better understand $\ZK$. We also define new problems related to PAC learning that we believe of are independent interest, and may be useful in future studies of the complexity of PAC learning.
A secure failure-localization (FL) protocol allows a sender to localize faulty links on a single path through a network to a receiver, even when intermediate nodes on the path behave adversarially. Such protocols were proposed as tools that enable Internet service providers to select high-performance paths through the Internet, or to enforce contractual obligations. We give the first formal definitions of security for FL protocols and prove that for such protocols, security requires each intermediate node on the path to have some shared secret information (\eg keys), and that every black-box construction of a secure FL protocol from a random oracle requires each intermediate node to invoke the random oracle. This suggests that achieving this kind of security is unrealistic as intermediate nodes have little incentive to participate in the real world.
Ahlswede and Winter [IEEE Trans. Inf. Th. 2002] introduced a Chernoff bound for matrix-valued random variables, which is a non-trivial generalization of the usual Chernoff bound for real-valued random variables. We present an efficient derandomization of their bound using the method of pessimistic estimators (see Raghavan [JCSS 1988]). As a consequence, we derandomize a construction of Alon and Roichman [RSA 1994] to efficiently construct an expanding Cayley graph of logarithmic degree on any (possibly non-abelian) group. This gives an optimal solution to the homomorphism testing problem of Shpilka and Wigderson [STOC 2004]. We also apply these pessimistic estimators to the problem of solving semi-definite covering problems, thus giving a deterministic algorithm for the quantum hypergraph cover problem of Ahslwede and Winter.
Learning is a central task in computer science, and there are various formalisms for capturing the notion. One important model studied in computational learning theory is the PAC model of Valiant (CACM 1984). On the other hand, in cryptography the notion of "learning nothing" is often modelled by the simulation paradigm: in an interactive protocol, a party learns nothing if it can produce a transcript of the protocol by itself that is indistinguishable from what it gets by interacting with other parties. The most famous example of this paradigm is zero knowledge proofs, introduced by Goldwasser, Micali, and Rackoff (SICOMP 1989).
Applebaum et al. (FOCS 2008) observed that a theorem of Ostrovsky and Wigderson (ISTCS 1993) combined with the transformation of one-way functions to pseudo-random functions (Håstad et al. SICOMP 1999, Goldreich et al. J. ACM 1986) implies that if there exist non-trivial languages with zero-knowledge arguments, then no efficient algorithm can PAC learn polynomial-size circuits. They also prove a weak reverse implication, that if a certain non-standard learning task is hard, then zero knowledge is non-trivial. This motivates the question we explore here: can one prove that hardness of PAC learning is equivalent to non-triviality of zero-knowledge? We show that this statement cannot be proven via the following techniques:
Under the standard conjecture that NP is not contained in SZK, our results imply that most standard techniques do not suffice to prove the equivalence between the non-triviality of zero knowledge and the hardness of PAC learning. Our results hold even when considering non-uniform hardness of PAC learning with membership queries. In addition, our technique relies on a new kind of separating oracle that may be of independent interest.
We consider the question of whether $\P\neq\NP$ implies that there exists some concept class that is efficiently representable but is still hard to learn in the PAC model of Valiant (CACM '84). We show that unless the Polynomial Hierarchy collapses, such a statement cannot be proven via a large class of reductions including Karp reductions, truth-table reductions, and a restricted form of non-adaptive Turing reductions. Also, a proof that uses a Turing reduction of constant levels of adaptivity would imply an important consequence in cryptography as it yields a transformation from any average-case hard problem in $\NP$ to a one-way function. Our results hold even in the stronger model of agnostic learning.
These results are obtained by showing that lower bounds for improper learning are intimately related to the complexity of zero-knowledge arguments and to the existence of weak cryptographic primitives. In particular, we prove that if a language $L$ reduces to the task of improper learning circuits, then, depending on the type of the reduction in use, either (1) $L$ has a statistical zero-knowledge argument system, or (2) the worst-case hardness of $L$ implies the existence of a weak variant of one-way functions defined by Ostrovsky-Wigderson (ISTCS '93). Interestingly, we observe that the converse implication also holds. Namely, if (1) or (2) hold then the intractability of L implies that improper learning is hard.
Ahlswede and Winter [IEEE Trans. Inf. Th. 2002] introduced a Chernoff bound for matrix-valued random variables, which is a non-trivial generalization of the usual Chernoff bound for real-valued random variables. We present an efficient derandomization of their bound using the method of pessimistic estimators (see Raghavan [JCSS 1988]). As a consequence, we derandomize a construction of Alon and Roichman [RSA 1994] to efficiently construct an expanding Cayley graph of logarithmic degree on any (possibly non-abelian) group. This gives an optimal solution to the homomorphism testing problem of Shpilka and Wigderson [STOC 2004]. We also apply these pessimistic estimators to the problem of solving semi-definite covering problems, thus giving a deterministic algorithm for the quantum hypergraph cover problem of Ahslwede and Winter.
The results above appear as theorems in our paper ``A randomness-efficient sampler for matrix-valued functions and applications'' [FOCS 2005, ECCC 2005], as consequences of the main claim of that paper: a randomness efficient sampler for matrix valued functions via expander walks. However, we discovered an error in the proof of that main theorem (which we briefly describe in the appendix). That claim stating that the expander walk sampler is good for matrix-valued functions thus remains open. One purpose of the current paper is to show that the applications in that paper hold true despite our inability to prove the expander walk sampler theorem for matrix-valued functions.
N.B.: refer to [WX08] for details about errors in this paper.
In this paper we give a randomness-efficient sampler for matrix-valued functions. Specifically, we show that a random walk on an expander approximates the recent Chernoff-like bound for matrix-valued functions of Ahlswede and Winter [IEEE Trans. Inf. Th. '02], in a manner which depends optimally on the spectral gap. The proof uses perturbation theory, and is a generalization of Gillman's [FOCS '93] and Lezaud's [Ann. App. Prob. '98] analyses of the Ajtai-Komlos-Szemeredi sampler for real-valued functions [STOC '87].
Derandomizing our sampler gives a few applications, yielding deterministic polynomial time algorithms for problems in which derandomizing independent sampling gives only quasi-polynomial time deterministic algorithms. The first (which was our original motivation) is to a polynomial-time derandomization of the Alon-Roichman theorem [Rand. St. and Alg. '94]: given a group of size $n$, find $O(\log n)$ elements which generate it as an expander. This implies a second application - efficiently constructing a randomness-optimal homomorphism tester, significantly improving the previous result of Shpilka and Wigderson [FOCS '04]. The third is to a ``non-commutative'' hypergraph covering problem - a natural extension of the set-cover problem which arises in quantum information theory, in which we efficiently attain the integrality gap when the fractional semi-definite relaxation cost is constant.
Cryptography and Security
Selective opening attacks against commitment schemes occur when the commitment scheme is repeated in parallel (or concurrently) and an adversary can choose depending on the commit-phase transcript to see the values and openings to some subset of the committed bits. Commitments are secure under such attacks if one can prove that the remaining, unopened commitments stay secret.
We prove the following black-box constructions and black-box lower bounds for commitments secure against selective opening attacks:
Our lower bounds improve upon the parameters obtained by the impossibility results of Bellare et al. (EUROCRYPT '09), and are proved in a fundamentally different way, by observing that essentially all known impossibility results for black-box zero-knowledge can also be applied to the case of commitments secure against selective opening attacks.
In addition to the impossibility results mentioned above, we also rule out the existence of commitments with zero statistical binding error and receiver public-coin commitments for parallel composition.
We consider the following problem: can we construct constant-round zero-knowledge proofs (with negligible soundness) for $\NP$ assuming only the existence of one-way permutations? We answer the question in the negative for fully black-box constructions (using only black-box access to both the underlying primitive and the cheating verifier) that satisfy a natural restriction on the ``adaptivity'' of the simulator's queries. Specifically, we show that only languages in $\coAM$ have constant-round zero-knowledge proofs of this kind.
Edge networks connected to the Internet need effective monitoring techniques to drive routing decisions and detect violations of Service Level Agreements (SLAs). However, existing measurement tools, like ping, traceroute, and trajectory sampling, are vulnerable to attacks that can make a path look better than it really is. In this paper, we design and analyze path-quality monitoring protocols that reliably raise an alarm when the packet-loss rate and delay exceed a threshold, even when an adversary tries to bias monitoring results by selectively delaying, dropping, modifying, injecting, or preferentially treating packets.
Despite the strong threat model we consider in this paper, our protocols are efficient enough to run at line rate on high-speed routers. We present a secure sketching protocol for identifying when packet loss and delay degrade beyond a threshold. This protocol is extremely lightweight, requiring only 250–600 bytes of storage and periodic transmission of a comparably sized IP packet to monitor billions of packets. We also present secure sampling protocols that provide faster feedback and accurate round-trip delay estimates, at the expense of somewhat higher storage and communication costs. We prove that all our protocols satisfy a precise definition of secure path-quality monitoring and derive analytic expressions for the trade-off between statistical accuracy and system overhead. We also compare how our protocols perform in the client-server setting, when paths are asymmetric, and when packet marking is not permitted.
.A secure failure-localization path-quality-monitoring (FL-PQM) protocol allows a sender to localize faulty links on a single path through a network to a receiver, even when intermediate nodes on the path behave adversarially. Such protocols were proposed as tools that enable Internet service providers to select high-performance paths through the Internet, or to enforce contractual obligations. We give the first formal definitions of security for FL-PQM protocols and construct:
We also prove lower bounds for such protocols:
Graph theory
This thesis has two goals. First, we present an introduction to the line of work that began with the study of expander graphs in the non-constructive setting, which then led to the algebraic con- struction of expanders, and finally has recently produced combinatorial constructions. Second, we extend the work on combinatorial constructions by new analyses and new constructions.
Miscellaneous
Previous work on estimating the entropy of written natural language has focused primarily on English. We expand this work by considering other natural languages, including Arabic, Chinese, French, Greek, Japanese, Korean, Russian, and Spanish. We present the results of PPM compression on machine-generated and human-generated translations of texts in various languages.
Under the assumption that languages are equally expressive, and that PPM compression does well across languages, one would expect that translated documents would compress to approximately the same size. We verify this empirically on a novel corpus of translated documents. We suggest as an application of this finding using the size of compressed natural language texts as a mean of automatically testing translation quality.
