Princeton University
Computer Science Department

Computer Science 597F

AdTopCS: Visualization & Analysis of large-scale genomic data sets

Olga Troyanskaya



Course Summary

The goal of this course is to introduce students to the computational issues involved in the analysis and display of large-scale biological data sets. Algorithms covered will include clustering and machine learning techniques for gene expression and proteomics data analysis, biological networks, joint learning from multiple data sources, and visualization of large-scale biological data sets. No prior knowledge of biology or bioinformatics is required; an introduction to the field of bioinformatics and the nature of biological data will be provided. In-depth knowledge of computer science is not required, but some understanding of programming and computation will be helpful. The course will be taught in a mixed lecture and seminar format and will involve completing a project.

The course is open to graduate and advanced undergraduate students from all departments.


Administrative Information

SIGNING UP:  You should be able to sign up for this course through SCORE using 25078 as the class number (the course is COS 597F).  Let Melissa Lawson know if this doesn't work.

Lectures: MW 1330-1450, Room: 301

Professor: Olga Troyanskaya - 204 CS Building - 258-1749 ogt@cs.princeton.edu (e-mail is the best way to contact)

Graduate Coordinator: Melissa Lawson - 310 CS Building - 258-5387 mml@cs.princeton.edu


Course Format & Grading

This course will cover the following topics: microarray analysis, data integration, biological networks, and visualization of large-scale biological data.  The class will consist of a mixture of lectures, student presentations of papers from the current literature, and discussions of these papers.

Students will also complete a team or individual project.  The project must have significant content related to the course, but it can contribute to the student's current research and reflect the student's computational background. For example, you could implement and evaluate a machine learning method for microarray data (if you have a computational background).

Grades will depend on class participation in discussions of assigned reading (20%), presentations (35%), and project (45%).
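As a rough illustration of how the three weights combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale, which this page does not specify; the function name and the example scores are hypothetical.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"participation": 0.20, "presentations": 0.35, "project": 0.45}


def course_score(scores):
    """Combine per-component scores into one weighted score.

    Assumes each component is on a 0-100 scale (an assumption,
    not something stated in the course description).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"participation": 90, "presentations": 80, "project": 85}
print(round(course_score(example), 2))  # prints 84.25
```

With these weights the project counts for slightly less than half of the total, so it influences the result more than either of the other two components.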

Books

There is no required book for this class.  Readings will be based on current literature.  However, here are a few book recommendations for the curious.

If you need to catch up on molecular biology and genetics: 

DOE primer on human genetics

R. Brent. Genomic Biology. Cell 100:169-183, 2000.

L. Hunter. Molecular Biology for Computer Scientists. In Artificial Intelligence and Molecular Biology, L. Hunter editor, 1993, AAAI Press.

 

Introduction to bioinformatics:

P.L. Elkin.  Primer on Medical Genomics Part V: Bioinformatics.  In Mayo Clinic Proceedings.

NCBI bioinformatics primer

NCBI primer on microarray analysis

Approximate Schedule

Note: This schedule is approximate and may change.  

	 S  M Tu  W Th  F  S
Sep	14 15 16 17 18 19 20	introduction to biology, bioinformatics, data; first class
	21 22 23 24 25 26 27	microarray analysis, types of experiments, databases
	28 29 30
Oct	          1  2  3  4	microarray analysis
	 5  6  7  8  9 10 11	microarray analysis
	12 13 14 15 16 17 18	proteomics
	19 20 21 22 23 24 25	data integration
	26 27 28 29 30 31	fall break
Nov	                   1
	 2  3  4  5  6  7  8	data integration
	 9 10 11 12 13 14 15	biological networks
	16 17 18 19 20 21 22	biological networks
	23 24 25 26 27 28 29	Thanksgiving
	30
Dec	    1  2  3  4  5  6	visualization
	 7  8  9 10 11 12 13	visualization; last class
	14 15 16 17 18 19 20	winter break
	21 22 23 24 25 26 27
	28 29 30 31
Jan	             1  2  3
	 4  5  6  7  8  9 10	
	11 12 13 14 15 16 17
	18 19 20 21 22 23 24	
	25 26 27 28 29 30 31
Slides
9/15 - Course details, molecular biology 101, challenges in functional genomics, intro to microarrays
9/17 - A (very) brief overview of database issues, data filtering, normalization, and clustering
Kai Li's guest lecture about visualization
Readings
NOTE: Readings are listed under the date on which they are DUE (not the date on which they are assigned).
Each student presentation will be UNDER 30 minutes (INCLUDING questions), and there will be 2 student presentations per class, followed by a 20-minute discussion.  It is perfectly fine to prepare a presentation that takes 20 minutes; with questions, you will probably take around 25-30 minutes anyway.  Aim at a mixed audience, but explain methods in detail.
 

Class: 9/15
Papers:
    DOE "Genomics and its impact on Science and Society"
    R. Brent. "Genomic Biology"
Presenters: lecture

Class: 9/17
Papers:
    Lockhart et al "Genomics, gene expression, and DNA microarrays" (general microarray)
    Kaminski N et al "Practical approaches to analyzing results of microarray experiments" (review)
Presenters: lecture

Class: 9/22 - CLASS CANCELLED

Class: 9/24
Papers:
    Troyanskaya et al "Missing value estimation for DNA microarrays" (low-level processing)
    Yang et al "Normalization for cDNA Microarray Data" (normalization/statistics; no need to go into very fine detail on statistical methods)
Presenters: Elena Nabieva, Tony Wirth

Class: 9/29
Papers:
    Eisen et al "Cluster analysis and display of genome-wide expression patterns" (Clustering/biology)
    Cheng, Y et al "Biclustering of expression data" (Clustering)
Presenters: Jessica Fong, Jie Chen

Class: 10/1
Papers:
    Brown, MPS et al "Knowledge-based analysis of microarray gene expression data by using support vector machines" (Data organization)
    Raychaudhuri et al "The computational analysis of scientific literature to define and recognize gene expression clusters" (Data organization/some biology)
Presenters: Joseph Berillari, Kristina Rogale

Class: 10/6
Papers:
    McShane LM et al "Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data" (Evaluation)
    Alter et al "Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms" (combining expression data sets)
Presenters: Nathaniel Dirksen, Matthew Hibbs

Class: 10/8
Papers:
    Dettling et al "Boosting for tumor classification with gene expression data" (Classification)
    Troyanskaya et al "Nonparametric methods for identifying differentially expressed genes in microarray data" (marker selection/evaluation) (optional)
    Discussion of class projects.
Presenters: Andre Cavalcanti, Olga Troyanskaya

Class: 10/15
Papers:
    Liu, X. et al "Bioprospector: Discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes" (regulatory regions discovery)
    Segal, E. et al "Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data" (gene expression networks)
Presenters: Jordan Vance, Robert Osada

Class: 10/17 @3pm
Papers:
    Phizicky et al "Protein analysis on a proteomic scale"
    Eisenberg et al "Protein function in the post-genomic era"
Presenters: Nathaniel Dirksen, Joseph Berillari

Class: 10/20
Papers:
    Edwards et al "Bridging structural biology and genomics: assessing protein interaction data with known complexes"
    Tong et al "A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules"
Presenters: K. Rogale, J. Fong

Class: 10/22
Papers:
    Greenbaum et al "Interrelating different types of Genomic Data, from proteome to secretome: Oming in on the function"
    Troyanskaya et al "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in S. cerevisiae)"
Presenters: J. Vance, M. Hibbs

Class: 11/3
Papers:
    Class project proposal presentations
Presenters: project groups

Class: 11/5
Papers:
    Stuart et al "A gene coexpression network for global discovery of conserved genetic modules"
    Letovsky et al "Predicting protein function from protein/protein interaction data: a probabilistic approach"
Presenters: Elena Nabieva, Andre Cavalcanti

Class: 11/10
Papers:
    Smith et al "Evaluating functional network inference using simulations of complex biological systems"
    Lanckriet et al "Kernel-based data fusion and its application to protein function prediction in yeast"
Presenters: Robert Osada, Kristina Rogale

Class: 11/12
Papers:
    Ihmels et al "Revealing modular organization in the yeast transcriptional network"
    Bar-Joseph et al "Computational discovery of gene modules and regulatory networks"
Presenters: Jessica Fong, Elena Zaslavsky

Class: 11/17
Papers:
    outside lecture - JP Singh
Presenters: JP Singh

Class: 11/19
Papers:
    Project progress reports
Presenters: Matt & Nathaniel, Jordan, Joe, Kristina, Andre & Jie

Class: 11/24
Papers:
    Sharan et al "Click and Expander: a system for clustering and visualizing gene expression data"
    Breitkreutz "Osprey: a network visualization system" - this is a short paper, so the presenter should also download the Osprey software and show us the visualizations it is capable of, as well as outline the limitations of the software (based on its use, not just the paper)
Presenters: Jie Chen, Matt Hibbs

Class: 11/26
Papers:
    Demir et al "Patika: an integrated visual environment for collaborative construction and analysis of cellular pathways"
    Werner-Washburne et al "Comparative Analysis of multiple genome-scale data sets"
Presenters: Nathaniel Dirksen, Joe Berillari

Class: 12/1
Papers:
    outside lecture - Kai Li
Presenters: Kai Li

Class: 12/3
Papers:
    Q&A about course and projects

Class: 12/8
Papers:
    Final project presentations
Presenters: Kristina, Andre & Jie

Class: 12/10
Papers:
    Final project presentations
Presenters: Matt & Nathaniel, Jordan, Joe