COS557/MOL557:

Analysis & Visualization of Large Scale Genomic Data Sets  

Fridays 1:30-4:00pm

Rm. 302 in CS building

(35 Olden Street)

 

Expression of genes predictive of outcome in lung cancer.

Course Info

The goal of this course is to introduce students to computational issues involved in the analysis and display of large-scale biological data sets.  Topics covered will include clustering and machine learning techniques for gene expression microarray and proteomics data analysis, modeling of biological networks and pathways, data integration in genomics, and visualization issues for large-scale data sets.

A short introduction to the field of bioinformatics and the nature of biological data will be provided; no prior knowledge of biology is required. In-depth knowledge of computer science is not required, but students must have some understanding of computation.

The course will be taught in a mixed lecture and seminar format, and will involve completing a project and a final exam.  The course is open to graduate and advanced undergraduate students from all departments.

Level: Graduate and upper level undergraduate

 

Background: Some understanding of computation (programming background not required)

 

Format: Mixed lectures and seminar-style

 

Instructor: Prof. Olga Troyanskaya

 

Grading: 40% class presentations, 15% class participation (including attendance), 45% final project (15% project proposal, 30% final project report)

 

Auditors: Auditors are welcome. Every auditor must participate in presentations and discussions but does not need to do the final project.

Administrative:

There is no required book for this class.  Material will be presented in lectures, and readings will be based on current literature.  However, here are a few recommendations for the curious.

If you need to catch up on molecular biology and genetics: 

DOE primer on human genetics

R. Brent. Genomic Biology. Cell 100:169-183, 2000.

L. Hunter. Molecular Biology for Computer Scientists. In Artificial Intelligence and Molecular Biology, L. Hunter editor, 1993, AAAI Press.

 

Introduction to bioinformatics:

P.L. Elkin.  Primer on Medical Genomics Part V: Bioinformatics.  In Mayo Clinic Proceedings.

NCBI bioinformatics primer

NCBI primer on microarray analysis

Presentations:

Each presentation should be 30 minutes, with 10-15 minutes for discussion afterwards.  Presentations should be in PowerPoint (or another slide format), and you must e-mail me the slides after your presentation before I can grade it.

 

A good presentation would include:

-a brief overview of the paper

-outline of major methods and findings

-analysis of what the paper did well

-analysis of problems/issues with the approach

-future directions (don’t just retype the “future work” section; we’re looking for your analysis here)

 

Course Announcements (check here often):

 

 

 

 

Wk 1
Topic: Introduction to the course and bio
Papers:
-Intro to the course and introduction to biology and bioinformatics
-Reading: “Systems biology 101-what you need to know”
Presenters:
-Lecture

Wk 2
Topic: Microarrays
Papers:
-Microarray analysis introduction and overview
Required reading:
-Comparison and validation of statistical clustering techniques for microarray gene expression data
Suggested readings:
-Lockhart et al. "Genomics, gene expression, and DNA microarrays" (general microarray)
-Kaminski N et al. "Practical approaches to analyzing results of microarray experiments" (review)
-**Hand and Heard “Finding groups in gene expression data” (a very nice and fairly complete review of clustering microarray data)
Presenters:
-Lecture
-David Braun

Wk 3
Topic: Microarrays
Papers:
-Normalization and analysis of cDNA microarrays using within-array replications applied to neuroblastoma cell response to a cytokine.
-Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms
-Combined static and dynamic analysis for determining the quality of time-series expression profiles.
Presenters:
-Yingying Fan
-Ronny Luss
-Qing Wang

Wk 4
CLASS CANCELLED

Wk 5 (3/10)
Topic: Regulation (modules and pathways)
Papers:
-Inferring quantitative models of regulatory networks from expression data
-Combining phylogenetic motif discovery and motif clustering to predict co-regulated genes.
-Genome-wide discovery of transcriptional modules from DNA sequence and gene expression.
Presenters:
-Sonya Nikolova
-Wei Dong
-Clifford Lam

Wk 6 (3/17)
Topic: Data integration
Papers:
-Introduction and overview of data integration and networks prediction
Presenters:
-Lecture

Wk 7 (3/31)
Topic: Interactions and Networks
Papers:
-Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data.
-Emergent behavior of growing knowledge about molecular interactions
Presenters:
-Tim O’Connor
-Wei Dong

Wk 8 (4/7)
Topic: Bayesian methods in biology and medicine
Papers:
-Guest Lecture
Required reading:
-Inference in Bayesian Networks
-The synthetic genetic interaction spectrum of essential genes (bio)
Presenters:
-Joseph Kahn, Novartis Pharmaceuticals
-Adam Litterman

Wk 9 (4/14)
Topic: Interactions and Networks
Papers:
-Probabilistic model of the human protein-protein interaction network
-Creation and implications of a phenome-genome network
-Systematic interpretation of genetic interactions using protein networks
Presenters:
-Jeffrey Traer Bernstein
-Jim Tonn
-David Braun

Wk 10 (4/21)
Topic: Comparative Genomics et al.
Papers:
-Functional genomic hypothesis generation and experimentation by a robot scientist
-A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules
-Detection of parallel functional modules by comparative analysis of genome sequences
Presenters:
-Lisa Chung
-Tim O’Connor
-Adam Litterman

Wk 11 (4/28)
Topic: Project presentations and visualization
Papers:
-Final project proposals (5 min presentations, only students taking the course for credit).
-Using process diagrams for the graphical representation of biological networks
-Visualization methods for statistical analysis of microarray clusters.
Presenters:
-Lisa Chung
-Jeffrey Traer Bernstein

Wk 12 (5/5)
Topic: Visualization
Papers:
-Click and Expander: a system for clustering and visualizing gene expression data
-Cluster stability and the use of noise in interpretation of clustering. (Interesting clustering algorithm + visualization)
-Visualization and analysis of microarray and gene ontology data with treemaps
Presenters:
-Qing Wang
-Sonya Nikolova
-Jim Tonn

 

Additional articles of interest

Inferring pathways and networks with a Bayesian framework