COS 598A: Parallel Architecture and Programming  

Spring 2007, Princeton University




Course Summary

Parallel computing is a mainstay of modern computation and of information analysis and management, ranging from scientific computing to information and data services.  The computer industry's inevitable and rapidly growing adoption of multi-core parallel architectures within a single processor chip pushes explicit parallelism to the forefront of computing for all applications and scales, and makes the challenges of parallel programming and system understanding all the more crucial.  Programming parallel systems has been highlighted as one of the three greatest challenges for the computer industry by leaders of even the largest desktop companies.

This course caters to students from all departments who are interested in using parallel computers of various scales to speed up the solution of problems. It also caters to computer science and engineering students who want to understand and grapple with the key issues in the design of parallel architectures and software systems.  A significant theme of the treatment of systems issues is their implications for application software, drawing the connection between the two and thereby making the systems issues relevant to users of such systems as well.  In addition to general programming and systems, there will be a significant focus on the modern trend toward increasingly more parallel multi-core processors within a single chip.

The first two thirds of the course will focus on the key issues in parallel programming and architecture.  In the last third, we will examine some advanced topics ranging from methods to tolerate latency to programming models for clustered commodity systems to new classes of information applications and services that strongly leverage large-scale parallel systems. Students will do a parallel programming project, either with an application they propose from their area of interest or with one that is suggested by the instructors. Students will have dedicated access to two kinds of multi-core processor systems in addition to large-scale multiprocessors for their projects.

Prerequisites: COS 318 and 475 or instructor's permission.

Textbook: Parallel Computer Architecture: A Hardware/Software Approach by David Culler and Jaswinder Pal Singh, with Anoop Gupta.  Morgan Kaufmann Publishers, 1998.


Announcements


Administrative Information

Lectures

Professors

Teaching Assistants


Syllabus (Tentative)

Week 1 (2/5): Overview of Parallel Architecture (Lecture notes)
Motivation for parallel systems; history of programming models, architectures and convergence to modern system design; fundamental design issues; trends in modern processor and communication architecture and in the usage of parallel computers.
 
Week 2 (2/12): Parallel Programs (Lecture notes)
A structured process for parallelizing applications, illustrated through some representative case studies.  What parallel programs look like in three programming models: a shared address space, explicit message passing, and a model proposed for multicore processors.  Synchronization and coordination methods for parallel programs, including multithreading and event-based pipeline models.
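
To give a concrete flavor of the first of these models, the following sketch (illustrative only, not course material; all names are made up) sums an array in a shared address space using POSIX threads: each thread privately sums a contiguous block, then adds its partial result to a shared total under a lock.

    /* Illustrative sketch: parallel array sum in the shared address
       space model, using POSIX threads. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4

    static double a[N];
    static double total = 0.0;
    static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        long id = (long)arg;
        long lo = id * N / NTHREADS, hi = (id + 1) * N / NTHREADS;
        double partial = 0.0;
        for (long i = lo; i < hi; i++)
            partial += a[i];            /* local work: no sharing, no locks */
        pthread_mutex_lock(&total_lock);
        total += partial;               /* brief critical section */
        pthread_mutex_unlock(&total_lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++) a[i] = 1.0;
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("total = %f\n", total);  /* expect 1000000.0 */
        return 0;
    }

A message passing version of the same program would instead give each process a private slice of the array and exchange partial sums through explicit sends and receives.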
 
Week 3 (2/19): Shared Memory Multiprocessors (Lecture notes, warm-up parallel programming assignment)
Overview of small-scale cache-coherent shared address space multiprocessors that have a uniformly accessible memory system.  Overview of cache coherence and memory consistency, and an introduction to the design space of protocols and their tradeoffs.  How synchronization is implemented in such systems, and the implications for parallel software.
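
As a taste of how such synchronization bottoms out in hardware primitives, here is an illustrative sketch (not course code) of a test-and-test-and-set spinlock built on GCC's __sync atomic builtins: the outer loop spins on an ordinary cached read to reduce coherence traffic, attempting the atomic exchange only when the lock appears free.

    /* Illustrative sketch: test-and-test-and-set spinlock.  The plain
       read spins in the local cache; the atomic exchange is attempted
       only when the lock looks free. */
    #include <stdio.h>

    static volatile int lock = 0;

    static void acquire(volatile int *l) {
        for (;;) {
            while (*l) ;                        /* spin on cached copy */
            if (__sync_lock_test_and_set(l, 1) == 0)
                return;                         /* exchange returned 0: we own it */
        }
    }

    static void release(volatile int *l) {
        __sync_lock_release(l);                 /* store 0 with release semantics */
    }

    int main(void) {
        acquire(&lock);
        printf("in critical section\n");
        release(&lock);
        return 0;
    }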
 
Week 4 (2/26): Project Proposal Presentations
Students discuss projects they plan to do for the course, chosen from their own ideas or ones that the instructors provide.
 
Week 4 (2/27): Invited Lecture on Multicore Processors by Prof. Kunle Olukotun (Lecture notes)
An in-depth look at the history, motivation and trends in on-chip parallel design at processor scale, namely the inevitable trend toward modern multi-core processors.  Design issues for these systems, industrial case studies, and future directions.
 
Week 5 (3/5): Programming for Performance (Lecture notes, Christian Bienia's Tools Slides)
An exploration of the key issues in writing high-performance parallel programs, following the stages of the structured process above.  Focus on the shared address space model.  Load balancing, data locality, communication cost, etc.  An in-depth look at some case studies, and the implications for programming models.
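
One recurring load-balancing technique is dynamic self-scheduling from a shared iteration counter; in the sketch below (illustrative only, names invented), threads claim fixed-size chunks of loop iterations with an atomic fetch-and-add, so faster threads naturally take on more work than a static block or cyclic partitioning would give them.

    /* Illustrative sketch: dynamic load balancing via a shared
       work counter and atomic fetch-and-add. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define CHUNK    1024
    #define NTHREADS 4

    static double result[N];
    static int next_iter = 0;                  /* next unclaimed iteration */

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            /* atomically claim the next chunk of iterations */
            int lo = __sync_fetch_and_add(&next_iter, CHUNK);
            if (lo >= N) break;
            int hi = lo + CHUNK < N ? lo + CHUNK : N;
            for (int i = lo; i < hi; i++)
                result[i] = (double)i * 0.5;   /* stand-in for real per-iteration work */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (long i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
        printf("result[N-1] = %f\n", result[N - 1]);
        return 0;
    }

The chunk size trades off scheduling overhead against load imbalance, one of the tuning decisions this week's case studies explore.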
 
Week 6 (3/12): Workload-driven Evaluation and Project Status
Issues in evaluating parallel systems and design tradeoffs using application workloads.  Methods for scaling workloads and machines, for evaluating real systems, and for evaluating ideas and tradeoffs through simulation. Characterizing workloads for use in system evaluation.
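
A back-of-the-envelope companion to such evaluation is Amdahl's law; the short sketch below (illustrative, not course material) computes the speedup bound 1 / (s + (1 - s)/p) for a program with serial fraction s on p processors.

    /* Illustrative sketch: Amdahl's-law speedup bound, a basic sanity
       check when scaling workloads and machine sizes. */
    #include <stdio.h>

    static double amdahl(double s, int p) {
        return 1.0 / (s + (1.0 - s) / p);
    }

    int main(void) {
        /* even a 5% serial fraction caps speedup at 20x */
        for (int p = 1; p <= 64; p *= 2)
            printf("p = %2d  speedup <= %.2f\n", p, amdahl(0.05, p));
        return 0;
    }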
 
Week 7 (3/26): Scalable Computers (Lecture notes)
The design of scalable systems, which physically distribute memory among the processing nodes.  Methods for realizing programming models in distributed-memory systems, and the relationship between support in the communication architecture and the efficiency of realizing programming models.  The implications of communication architecture support for the design of application software.  Scalable synchronization methods.
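
For a concrete example of the kind of synchronization structure whose scalability is at issue, here is an illustrative sketch (not course code) of a sense-reversing centralized barrier; tree barriers and queue-based locks scale further by having processors spin on distinct memory locations.

    /* Illustrative sketch: sense-reversing centralized barrier.
       The last arriver resets the count, then flips the global sense
       to release everyone; each thread tracks its own local sense. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static volatile int count = NTHREADS;   /* threads yet to arrive */
    static volatile int sense = 0;          /* flips each barrier episode */

    static void barrier(int *local_sense) {
        *local_sense = !*local_sense;       /* this episode's release value */
        if (__sync_sub_and_fetch(&count, 1) == 0) {
            count = NTHREADS;               /* last arriver resets... */
            sense = *local_sense;           /* ...and releases the others */
        } else {
            while (sense != *local_sense) ; /* spin until released */
        }
    }

    static void *worker(void *arg) {
        long id = (long)arg;
        int local_sense = 0;
        for (int phase = 0; phase < 3; phase++) {
            /* ... per-phase work ... */
            barrier(&local_sense);
            if (id == 0) printf("all threads reached barrier %d\n", phase);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, worker, (void *)i);
        for (long i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
        return 0;
    }

With many processors, all that spinning targets one cache line; that contention is precisely what scalable synchronization methods are designed to avoid.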
 
Week 8 (4/2): Directory-based Cache Coherence (Lecture notes)
Supporting a cache-coherent shared address space on scalable systems with physically distributed memory. An overview of directory-based approaches, assessment of key tradeoffs and design challenges, and implications for the design of application software. Synchronization in such systems, and case studies of commercial realizations.
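
The sketch below (purely hypothetical data structures, not any real machine's protocol) shows the shape of a full-bit-vector directory entry and the bookkeeping a home node might do on a read miss: if some node holds the block dirty, it is recalled before the new sharer is recorded.

    /* Illustrative sketch: hypothetical full-bit-vector directory
       entry and its read-miss transition. */
    #include <stdint.h>
    #include <stdio.h>

    enum dir_state { UNCACHED, SHARED, EXCLUSIVE };

    struct dir_entry {
        enum dir_state state;
        uint64_t presence;      /* bit i set => node i has a copy */
        int owner;              /* valid when state == EXCLUSIVE */
    };

    /* Handle a read miss from `node`: recall the block if dirty,
       then record the new sharer. */
    static void read_miss(struct dir_entry *e, int node) {
        if (e->state == EXCLUSIVE) {
            printf("recall dirty block from node %d\n", e->owner);
            e->presence = 1ULL << e->owner;   /* owner keeps a clean copy */
        }
        e->presence |= 1ULL << node;
        e->state = SHARED;
        printf("node %d added as sharer (presence = %#llx)\n",
               node, (unsigned long long)e->presence);
    }

    int main(void) {
        struct dir_entry e = { EXCLUSIVE, 1ULL << 3, 3 };
        read_miss(&e, 5);       /* node 5 reads a block dirty at node 3 */
        return 0;
    }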
 
Week 9 (4/9): Latency Tolerance (Lecture notes)
Methods for tolerating the high latency of memory access and inter-processor communication, which, unlike bandwidth and processing limitations, is not solved by throwing more money at the problem.  Trading bandwidth for latency, using techniques like precommunication, block data transfer, relaxed memory consistency models, and multithreading within a processing core.
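
Software prefetching is perhaps the simplest of these techniques to illustrate; the sketch below (illustrative only) uses GCC's __builtin_prefetch to request a[i + DIST] ahead of its use, overlapping the memory access with computation on the current element.

    /* Illustrative sketch: software prefetching to hide memory latency. */
    #include <stdio.h>

    #define N    (1 << 20)
    #define DIST 16                      /* prefetch distance, tuned per machine */

    static double a[N], sum;

    int main(void) {
        for (int i = 0; i < N; i++) a[i] = 1.0;
        for (int i = 0; i < N; i++) {
            if (i + DIST < N)
                __builtin_prefetch(&a[i + DIST], 0 /* read */, 1 /* low locality */);
            sum += a[i] * a[i];          /* work overlapped with the prefetch */
        }
        printf("sum = %f\n", sum);
        return 0;
    }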
 
Week 10 (4/16): Clusters and their Applications (cancelled)
Commodity-based systems that do not lend themselves well to supporting a cache-coherent shared address space, but that are increasingly important in practice and at large scale, for both scientific computing and scalable information services.  Programming models for such systems, including a symmetric but non-coherent shared address space (using the Unified Parallel C example) and explicit message passing.  Implications for application software.
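
As a flavor of the explicit message passing model on such systems, the sketch below (illustrative only, not course material) computes a global sum with MPI: each process accumulates its own partial sum over a strided slice of the iteration space, and MPI_Reduce combines the partials at rank 0.

    /* Illustrative sketch: explicit message passing with MPI on a
       cluster.  Build with mpicc and launch with mpirun (details
       vary by installation). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* each process owns a disjoint, strided slice of the work */
        double local = 0.0;
        for (int i = rank; i < 1000000; i += size)
            local += 1.0;

        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("total = %f\n", total);   /* expect 1000000.0 */

        MPI_Finalize();
        return 0;
    }

In Unified Parallel C, by contrast, the same data could live in a shared array partitioned across nodes and be read or written directly, with no explicit communication calls.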
 
Week 11 (4/23): Invited Lecture by Dr. Andrew Birrell from Microsoft Research (Lecture notes)
Mutual Exclusion: Some History, Some Problems, and a Glimmer of Hope
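
As historical background for the talk, here is an illustrative sketch (not the lecture's content) of Peterson's classic two-thread mutual exclusion algorithm; the memory fence it needs on modern out-of-order processors is itself an instance of the "problems" the title alludes to.

    /* Illustrative sketch: Peterson's two-thread mutual exclusion.
       The full fence after setting `turn` is required in practice;
       without it, store-load reordering on modern processors breaks
       the algorithm. */
    #include <pthread.h>
    #include <stdio.h>

    static volatile int flag[2], turn;
    static int counter;

    static void lock(int self) {
        int other = 1 - self;
        flag[self] = 1;                     /* announce intent */
        turn = other;                       /* yield priority */
        __sync_synchronize();               /* full memory fence */
        while (flag[other] && turn == other) ;
    }

    static void unlock(int self) {
        flag[self] = 0;
    }

    static void *worker(void *arg) {
        int self = (int)(long)arg;
        for (int i = 0; i < 100000; i++) {
            lock(self);
            counter++;                      /* protected increment */
            unlock(self);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, worker, (void *)0);
        pthread_create(&t1, NULL, worker, (void *)1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("counter = %d (expect 200000)\n", counter);
        return 0;
    }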
 
Week 12 (4/30): Invited Lecture by Prof. Kathy Yelick from UC Berkeley

5/15: Final project due
5/16: Final project presentations, starting at 1:30pm


Precepts

Due to the individual nature of the course projects, no conventional precept is offered.  Instead, students are encouraged to meet with the teaching assistant individually to discuss their projects.

Students are required to complete project milestones during the semester.  The exact requirements and deadlines will be announced in advance.  All information relevant to submissions and deadlines is available on the precept website.

Resources

Parallel Machines

There are three types of parallel machines available for you to use.

Tutorial and Documentation