DNA Hash Pooling and its Applications
Dennis Shasha
Computer Science, New York University
In this paper we describe a new technique for the characterisation of populations of DNA strands. Such tools are vital to the study of ecological systems, at both the micro (e.g., individual humans) and macro (e.g., lakes) scales. Existing methods make extensive use of DNA sequencing and cloning, which can prove costly and time consuming. The overall objective is to address questions such as: (i) (Genome detection) Is a known genome sequence present at least in part in an environmental sample? (ii) (Sequence query) Is a specific fragment sequence present in a sample? (iii) (Similarity discovery) How similar in terms of sequence content are two unsequenced samples?
We propose a method involving multiple filtering criteria that result in "pools" of DNA of high or very high purity.
Because our method is similar in spirit to hashing in computer science, we call the method "DNA hash pooling".
To illustrate this method, we describe examples using pairs of restriction enzymes.
The "in silico" empirical results we present indicate that the method is robust to experimental error.
The method requires minimal DNA sequencing and, when sequencing is required, little or no cloning.
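As a rough illustration of the hashing analogy, the following Python sketch simulates one possible pooling scheme in silico. It is a sketch under stated assumptions, not the procedure of the paper: it assumes a single linear strand, digestion with the enzyme pair EcoRI and BamHI, retention of only those fragments whose two ends were cut by different enzymes, and a pool key given by the fragment length divided into bins. The function names `cut_positions` and `hash_pools` and the parameter `bin_width` are hypothetical.

```python
from collections import defaultdict

# Recognition sites and cut offsets: the cut falls after `offset` bases
# of the site (EcoRI cuts G^AATTC, BamHI cuts G^GATCC).
ENZYMES = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1)}


def cut_positions(seq, site, offset):
    """Return every index at which the enzyme cuts `seq` (top strand only)."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def hash_pools(seq, enzyme_a, enzyme_b, bin_width=10):
    """Group double-digest fragments into pools keyed by length bin.

    Assumption for illustration: a fragment is kept only when its two
    ends were produced by different enzymes, and its "hash" is its
    length divided by `bin_width`.
    """
    cuts = []
    for name in (enzyme_a, enzyme_b):
        site, offset = ENZYMES[name]
        cuts.extend((pos, name) for pos in cut_positions(seq, site, offset))
    cuts.sort()

    pools = defaultdict(list)
    # Adjacent cuts delimit one fragment of the complete digest.
    for (left, left_name), (right, right_name) in zip(cuts, cuts[1:]):
        if left_name != right_name:
            fragment = seq[left:right]
            pools[len(fragment) // bin_width].append(fragment)
    return dict(pools)


if __name__ == "__main__":
    sample = "AAAGAATTC" + "T" * 20 + "GGATCC" + "AAA"
    for key, fragments in hash_pools(sample, "EcoRI", "BamHI").items():
        print(key, fragments)
```

In this toy model, two strands land in the same pool only if they share both end types and a similar length, which is the sense in which the filtering criteria act like a hash function. A physical experiment would involve both strands, sticky ends and measurement error in length, none of which is modelled here.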