Structural variation, in the broadest sense, is defined as the genomic changes among individuals that are not single nucleotide variants. Rapid computational methods are needed to comprehensively detect and characterize specific classes of structural variation using next-generation sequencing technology. We have developed a suite of tools using a new aligner, mrFAST, and algorithms focused on the characterization of structural variants that have been more difficult to assay: (i) deletions, small insertions, inversions and mobile element insertions using read-pair signatures (VariationHunter), (ii) novel sequence insertions by coupling read-pair data with local sequence assembly (NovelSeq), and (iii) absolute copy number of duplicated genes using read-depth analysis coupled with singly unique nucleotide (SUN) identifiers. I will present a summary of our results from nine high-coverage human genomes for these classes of structural variation and compare them with other datasets. I will also summarize our read-depth analysis of 159 low-coverage human genomes for copy number variation of duplicated genes. We use these data to correct CNV genotypes for copy number and location and to discover previously hidden patterns of complex polymorphism. Our results demonstrate, for the first time, the ability to assay both copy and content of complex regions of the human genome, opening these regions to disease association studies and to further population and evolutionary analyses. The algorithms we have developed provide a much-needed step towards a highly reliable and comprehensive structural variation discovery framework, which, in turn, will enable genomics researchers to better understand the variation in the genomes of newly sequenced human individuals, including patient genomes.
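As a rough illustration of the read-depth idea behind (iii), and not the method presented in the talk, the sketch below scales the mean read depth over a region by the mean depth over control regions assumed to be diploid. The function name, its parameters and the depth values are hypothetical; a real analysis would also need steps such as GC-bias correction and careful handling of multiply mapping reads, which are omitted here.

```python
from statistics import mean


def estimate_copy_number(region_depths, control_depths, control_copy_number=2):
    """Estimate absolute copy number of a region from read depth.

    region_depths: per-window read depths over the region of interest.
    control_depths: per-window read depths over control regions whose
        copy number is assumed known (diploid by default).
    """
    if not region_depths or not control_depths:
        raise ValueError("both depth lists must be non-empty")

    baseline = mean(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")

    # Depth is assumed proportional to copy number, so the ratio to the
    # control depth, scaled by the control copy number, gives the estimate.
    return control_copy_number * mean(region_depths) / baseline


if __name__ == "__main__":
    # Hypothetical depths: the region has about twice the control depth.
    estimate = estimate_copy_number([60, 62, 58], [30, 29, 31])
    print(round(estimate))
```

With these made-up numbers the region has twice the depth of the diploid controls, so the estimate corresponds to four copies. Rounding to an integer is a simplification; it does not by itself say which paralog the copies belong to, which is the role the abstract assigns to SUN identifiers.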
Can Alkan is currently a Senior Fellow in Professor Evan E. Eichler's group in the Department of Genome Sciences at the University of Washington. His current work includes computational prediction of human structural variation and characterization of segmental duplications and copy-number polymorphisms using next-generation sequencing data. He graduated from the Department of Computer Engineering at Bilkent University in 2000 and received his Ph.D. in Computer Science from Case Western Reserve University in 2005. During his Ph.D., he worked on the evolution of centromeric DNA, RNA-RNA interaction prediction, and RNA folding problems.