This seminar course explores algorithmic challenges that arise in the analysis and interpretation of genome sequencing data, with a particular focus on applications in cancer genomics and immunogenomics. Areas of focus include:
The mutational process of cancer evolution. The underlying algorithmic problem is to construct trees that represent the relationships between cells from mutational data. We will explore tree reconstruction algorithms using phylogenetic techniques (perfect phylogeny and Dollo parsimony) and population genetic techniques (branching processes and the coalescent).
The identification of combinations of cancer-causing mutations. Such combinations typically result from biological interactions between genes, which are represented as graphs, or networks. We will examine algorithms for analyzing data on graphs, including random walks (e.g., PageRank), diffusion processes, community detection, and spectral methods for graph partitioning.
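As a small illustration of the random-walk methods mentioned above, the following is a minimal sketch of PageRank computed by power iteration. It is not course material: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the four-node toy graph with abstract labels are all assumptions chosen for illustration, and the graph does not represent any real gene interaction network.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank scores by power iteration.

    adjacency: dict mapping each node to a list of its out-neighbors.
    damping, tol, and max_iter are illustrative defaults (assumptions).
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(adjacency) | {v for nbrs in adjacency.values() for v in nbrs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        # Each node passes an equal share of its damped rank to its neighbors.
        for u in nodes:
            nbrs = adjacency.get(u, [])
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    new_rank[v] += share

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy undirected star graph (hypothetical): hub "A" linked to "B", "C", "D".
# An undirected edge is stored as two directed edges.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
}
ranks = pagerank(graph)
for node in sorted(ranks):
    print(node, round(ranks[node], 4))
```

In this toy graph the hub receives the largest score and the three leaves receive equal scores, which is the kind of signal a network-based analysis might use to prioritize highly connected nodes.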
Course Organization
The course will be run as a seminar in which students read and present articles and recent research papers on the topics listed above. Each topic will first be introduced in lectures. Students will also undertake a project to study one of the topics in greater depth. To the extent possible, projects will be tailored to each student's background and interests, and could range from the theoretical (e.g., designing a new algorithm and proving its correctness) to the practical (e.g., a software implementation). The project will include a written proposal, a midterm report, and a final presentation.
Prerequisites
Undergraduate-level knowledge of algorithms
Undergraduate-level knowledge of probability and statistics: conditional probability, Bayes’ rule, random variables, distributions.
Linear algebra
No biology background is assumed. Necessary background will be introduced in lectures and readings.