Princeton University
Computer Science Department

Computer Science 597C
AdTopCS: Scalable Systems and Applications in Data Analysis

Jaswinder Pal Singh

Fall 2005


General Information

Course Summary

Parallel and distributed computer systems have become mainstream vehicles for building applications, ranging from scientific and commercial computing to novel information services on the Internet. With the dramatic increase in the amounts and rates of data available in many fields -- generated by experimental observation, computer simulation or human productivity and creativity -- these systems are increasingly used to analyze the data, either to build compelling new information services for end users in society or to uncover insights and reverse-engineer underlying processes from the data. Given the complexity of the systems and applications, it is important that the designers of computing infrastructure understand the needs of applications and that application/service builders understand how best to exploit the systems. In many areas -- e.g., Web search and scientific computing -- it is the combination of these skills that leads to the design of the most effective applications and systems.

We will take an integrative approach to investigating scalable (parallel and distributed) systems and applications in data analysis in a variety of areas. Interestingly, several key data analysis and mining approaches are fundamentally common across disciplines, but their variants and applications have been developed separately, so the disciplines have a lot to learn from one another. The main focus will be on some key, broadly applicable data analysis methods and approaches, on how to exploit scalable systems to construct effective applications, and on how best to make design choices in system software and architecture given the needs of the applications. Examples of data analysis approaches we will study include the use of principal components (broadly defined) and clustering in fields ranging from biology to new Internet data services, methods that learn from data, and applications that process, analyze and route streaming data. The relative emphasis on systems and methods or applications in the course will be determined primarily by the interests of attending students.

The course is intended to be of interest to students in Computer Science, as well as students in other departments across the University who are interested either in using parallel and distributed systems effectively (for any kind of application, including simulation) or in scalable data analysis.


Administrative Information

Lectures: F 1330-1620, Room: 302, CS Building

Professor: Jaswinder Pal Singh - 423 CS Building - 258-5329 jps@cs.princeton.edu

Graduate Coordinator: Melissa Lawson - 310 CS Building - 258-5387 mml@cs.princeton.edu

Teaching Assistants: TBA