COS557/MOL557:
Analysis & Visualization of Large Scale
Genomic Data Sets
Mondays 1:00-3:30pm
Rm. 200 in CIL (genomics building)
Course Info
The goal of this course is to introduce students to computational issues involved in analysis and display of large-scale biological data sets. Techniques covered will include clustering and machine learning techniques for gene expression microarrays and proteomics data analysis, biological networks and pathways modeling, data integration in genomics, and visualization issues for large-scale data sets.
A short introduction to the field of bioinformatics and the nature of biological data will be provided; no prior knowledge of biology is required. In-depth knowledge of computer science is not required either, but students must have some understanding of computation (programming experience is not needed).
The course will be taught in a mixed lecture and seminar format, and will involve completing a project and a final exam. The course is open to graduate and advanced undergraduate students from all departments.
Administrative info:
Level: Graduate and upper level undergraduate
Background: Some understanding of computation
Format: Mixed lectures and seminar-style
Grading: 30% presentations
15% quizzes
20% participation (including attendance and participation in discussions)
35% final project (10% project proposal, 25% final project report)
Auditors: Auditors are welcome, but must participate in presentations and discussions (they do not need to do the final project).
(For more admin info, see the syllabus.)
There is no required book for this class. Material will be presented in lectures, and readings will be based on current literature. However, here are a few recommendations for the curious.
If you need to catch up on molecular biology and genetics:
R. Brent. Genomic Biology. Cell 100:169-183, 2000.
L. Hunter. Molecular Biology for Computer Scientists. In Artificial Intelligence and Molecular Biology, L. Hunter editor, 1993, AAAI Press.
Introduction to bioinformatics:
NCBI primer on microarray analysis
Presentations:
Each presentation should be 20 minutes long, with discussion afterwards. Presentations should be in PowerPoint (or another slide format), and you must e-mail me the slides after your presentation before I can grade it.
A good presentation would include:
-a brief overview of the paper
-outline of major methods and findings, with background of important concepts (e.g. if the paper uses Dynamic Bayesian Networks, give an intro of what they are)
-suggest discussion points for the class: what the paper did well, what are problems/issues with the approach, what puzzled you
-what should be the future of this method (don’t just retype the “future work” section, we’re looking for your analysis here)
Course Announcements (check here often):
PLEASE sign up for the course on Blackboard, or you won’t get any of the course-related emails, which are important. If you are auditing, sign up for audit. If you are a postdoc and can’t officially sign up, let me know, and I’ll make sure to copy you on e-mails.
CHECK THE ASSIGNMENTS BELOW – if you don’t have an assignment and you are AUDITING OR taking the class for credit, let me know ASAP.
Course schedule:
Wk | Topic | Papers | Presenters |
1 (2/5) | Introduction to the course and bio | Intro to the course and introduction to biology and bioinformatics | Lecture |
2 (2/12) | Microarrays | Microarray analysis introduction and overview. Required reading: Hand and Heard “Finding groups in gene expression data” (a very nice review of clustering microarray data; present general concepts and choose 2-3 methods (not hierarchical or k-means or SOM) to describe in a bit more detail). Suggested readings: Lockhart et al "Genomics, gene expression, and DNA microarrays" (general microarray); Kaminski N et al "Practical approaches to analyzing results of microarray experiments" (review); Ehrenreich A. “DNA microarray technology for the microbiologist: an overview.” (a nice intro to types of microarrays and how microarray experiments work) | Lecture; Patrick B. |
3 (2/19) | Microarrays | Normalization of Microarray Data: Single-Labeled and Dual-Labeled Arrays | Tony Ambrosini; Andrew F; Chris Bristow |
4 (2/26) | Regulation (modules and pathways) | Inferring transcriptional modules from ChIP-chip, motif and microarray data; A Systems Approach to Mapping DNA Damage Response Pathways (bio+method) | Ari S.; Sandhya; Julie Wu |
5 (3/5) | Bayesian methods in biology and medicine | Guest Lecture. Required reading: Inference in Bayesian Networks | Lecture; Lecture |
6 (3/12) | Data integration | Introduction and overview of data integration and networks prediction; Inferring gene networks from time series microarray data using dynamic Bayesian networks. | Lecture; Siddhartha B. |
7 (3/26) | CLASS CANCELLED | NSF workshop trip | |
8 (4/2) | Interactions and Networks | The synthetic genetic interaction spectrum of essential genes (bio); Emergent behavior of growing knowledge about molecular interactions; Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets (bio + analysis). | Yuanfang; Yulia M.; Jeffrey Breunig |
9 (4/9) | | Probabilistic model of the human protein-protein interaction network; Cluster stability and the use of noise in interpretation of clustering; Discovery of biological networks from diverse functional genomic data | Yue Niu; Namita Bisaria; Alex O. |
10 (4/16) | Interactions, networks, and pathways | Global mapping of pharmacological space; Refinement and expansion of signaling pathways: The osmotic response network in yeast; Herpesviral protein networks and their interaction with the human proteome (bio + analysis) | Maria C.; Adam Stoler; Emily Capra |
11 (4/23) | Comparative Genomics & Visualization | Modeling cellular machinery through biological network comparison; Detection of parallel functional modules by comparative analysis of genome sequences; Click and Expander: a system for clustering and visualizing gene expression data | Daniel Barrett; Brendan Collins; Bill Zeller |
12 (4/30) | Project proposal presentations – Part II | Final project proposals | All students taking the course for credit |
 | Additional articles of interest | Inferring pathways and networks with a Bayesian framework; Click and Expander: a system for clustering and visualizing gene expression data; Cluster stability and the use of noise in interpretation of clustering (interesting clustering algorithm + visualization); Factorgrams: A tool for visualizing multi-way associations in biological data, V Cheung, I Givoni, D Dueck, BJ Frey | |