Princeton University | Computer Science 557
The goal of this course is to introduce students to the computational issues involved in the analysis and display of large-scale biological data sets. Algorithms covered will include clustering and machine learning techniques for gene expression and proteomics data analysis, biological networks, joint learning from multiple data sources, and visualization issues for large-scale data sets. We will focus on the computational approaches and include some examples from other disciplines. No prior knowledge of biology or bioinformatics is required; an introduction to the field of bioinformatics and the nature of biological data will be provided. In-depth knowledge of computer science is not required, but some understanding of programming and computation will be helpful. The course will be taught in a mixed lecture and seminar format and will involve completing a project and a final exam.
The course is open to graduate and advanced undergraduate students from all departments.
Professor: Olga Troyanskaya - 204 CS Building - 258-1749 - ogt@cs.princeton.edu (e-mail is the best way to contact)
Graduate Coordinator: Melissa Lawson - 310 CS Building - 258-5387 - mml@cs.princeton.edu
Course Format & Grading
This course will cover the following topics: microarray analysis, data integration, biological networks, and visualization of large-scale biological data. The course will focus on the computational methods and will also incorporate non-biological examples when possible. The class will consist of a mixture of lectures, student presentations of papers from the current literature, and discussions of these papers.
Students will also complete a team or individual project and take a final exam. The project must have significant content related to the course, but it can contribute to the student's current research and reflect the student's computational background. For example, you could implement and evaluate a machine learning method for microarray data (if you have a computational background).
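To give a rough sense of what "implement and evaluate" could mean at the smallest scale, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the nearest-centroid classifier, the leave-one-out evaluation, the function names, and the made-up toy expression values are not taken from the course materials, and a real project would use actual microarray data and a more careful evaluation.

```python
import math

def centroid(rows):
    """Mean expression profile of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(train_x, train_y, sample):
    """Assign the sample to the class whose centroid is nearest."""
    by_class = {}
    for x, y in zip(train_x, train_y):
        by_class.setdefault(y, []).append(x)
    return min(by_class,
               key=lambda label: euclidean(centroid(by_class[label]), sample))

def leave_one_out_accuracy(xs, ys):
    """Hold out each sample in turn, train on the rest, score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical toy data: 6 samples x 3 genes, log-ratio-like values.
samples = [
    [2.1, 0.3, -1.0],
    [1.8, 0.1, -0.7],
    [2.4, 0.5, -1.2],
    [-0.9, 0.2, 1.5],
    [-1.1, 0.4, 1.9],
    [-0.6, 0.0, 1.3],
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

accuracy = leave_one_out_accuracy(samples, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

The evaluation step is the part that matters for a project: the held-out sample is never used to compute the centroids it is scored against.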
Books
There is no required book for this class. Material will be presented in lectures, and readings will be based on current literature. However, here are a few recommendations for the curious.
If you need to catch up on molecular biology and genetics:
R. Brent. Genomic Biology. Cell 100:169-183, 2000.
L. Hunter. Molecular Biology for Computer Scientists. In Artificial Intelligence and Molecular Biology, L. Hunter editor, 1993, AAAI Press.
Introduction to bioinformatics:
P.L. Elkin. Primer on Medical Genomics Part V: Bioinformatics. In Mayo Clinic Proceedings.
NCBI primer on microarray analysis
Example project topics from last year
"Plumbing the breadth of S. cerevisiae interaction databases"
"3D clustering"
"Analysis of cluster quality using visualization"
Readings
Each student presentation will be UNDER 25 minutes (under 30 minutes with questions), and there will be 2 student presentations
per class, followed by a 20-minute discussion. It is perfectly fine to have a presentation that takes 20 minutes; with questions
you will probably take around 25-30 minutes anyway. Aim at a mixed audience, but explain methods in detail.
CLASS | PAPERS | PRESENTERS |
9/13 | Intro to biology | lecture |
9/15 | Intro to microarray analysis - some suggested readings below: Lockhart et al "Genomics, gene expression, and DNA microarrays" (general microarray); Kaminski N et al "Practical approaches to analyzing results of microarray experiments" (review) | lecture |
9/20 | More about clustering in microarrays; Troyanskaya et al "Missing value estimation for DNA microarrays" (low-level processing) | lecture, David Karig |
9/22 | Yang et al "Normalization for cDNA Microarray Data" (normalization/statistics - don't need to go into very fine detail on statistical methods); Tanay et al "Discovering statistically significant biclusters in gene expression data" (Clustering/algorithms) | Gunter S., Melissa C. |
9/27 | Tanay et al continued; Datta et al "Comparison and validation of statistical clustering techniques for microarray gene expression data" (Cluster Evaluation and Validation) | Melissa, Sergey K. |
9/29 | McShane LM et al "Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data" (Cluster Evaluation); Raychaudhuri et al "The computational analysis of scientific literature to define and recognize gene expression clusters" (Data organization/some biology) | Berk K., Tony C. |
10/4 | Machine learning (download slides) | guest lecture by Rob Schapire |
10/6 | Machine learning | guest lecture by Rob Schapire |
10/11 | Middendorf et al "Predicting genetic regulatory response using classification"; Brown, MPS et al "Knowledge-based analysis of microarray gene expression data by using support vector machines" (Data organization) | Ian P., Umar S. |
10/13 | Bergmann et al. "Similarities and differences in genome-wide expression data of six organisms." (Biology, combining gene expression datasets); Grad YH et al. "Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D. melanogaster and D. pseudoobscura." (motifs, some biology, HMMs) | Zafer, Curtis H. |
10/18 | Functional genomics data sources, intro to Bayes nets, some intro information about class projects | lecture |
10/20 | Lanckriet et al "A statistical framework for genomic data fusion." (data fusion, SVM, kernels); discussion of independent projects | Stasy J. |
11/1 | Troyanskaya et al "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in S. cerevisiae)" (data fusion, Bayes); Segal, E. et al "Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data" (data fusion, regulatory modules discovery, Bayes) | Tony C., Berk |
11/3 | Bar-Joseph et al "Computational discovery of gene modules and regulatory networks"; Han et al. "Evidence for dynamically organized modularity in the yeast protein–protein interaction network" (biology, biological networks) | Melissa, Sergey K. |
11/8 | Project proposal presentations | project groups |
11/10 | Stuart et al "A gene-coexpression network for global discovery of conserved genetic modules."; McCarroll et al "Comparing genomic expression patterns across species identifies shared transcriptional profile in aging." | Umar, Curtis |
11/15 | Yeger-Lotem et al "Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction" (biological networks); Haverty et al "Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification" (biological networks) | David, Sergey |
11/17 | Ideker et al "Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network" (networks, some experiments); Wong et al "Combining biological networks to predict genetic interactions." (biological networks) | David, Stacy |
11/24 | Sharan et al "Click and Expander: a system for clustering and visualizing gene expression data"; Demir et al "Patika: an integrated visual environment for collaborative construction and analysis of cellular pathways" | Tony, Ian |
11/29 | Papatheodorou I et al. "Visualization of microarray results to assist interpretation."; Baehrecke EH et al. "Visualization and analysis of microarray and gene ontology data with treemaps." | Melissa, Berk |
12/1 | Davidson et al. "Cluster stability and the use of noise in interpretation of clustering." (Interesting clustering algorithm + visualization); Breitkreutz "Osprey: a network visualization system" - this is a short paper, so the presenter should also download the Osprey software and show us visualizations it is capable of, as well as outline the limitations of the software (based on its use, not just the paper) | Curtis, Gunter |
12/6 | Project presentations - 15 mins; make sure to talk about your current results and where exactly you are going with the project | Curtis, Melissa, Stacy, Berk, Sergey |
12/8 | Project presentations & discussion of exam etc. | Ian, Tony, David, Umar |
Finals will be posted Friday 12/3 and will be due Monday 12/13 (finals are open book and open computer, but under the honor code they must be done on your own). Project writeups will be 5 pages of text (no smaller than 10pt font), plus as many figures and references as you want.
Alter et al "Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms" (data fusion, SVD)