Parallel
computing is a mainstay of modern computation and information analysis and
management, ranging from scientific computing to information and data services.
The inevitable and rapidly growing adoption of multi-core parallel
architectures within a single processor chip across the computer industry
pushes explicit parallelism to the forefront of computing for all
applications and scales, and makes the challenge of parallel programming and
system understanding all the more crucial. Leaders of even the largest
desktop companies have highlighted parallel programming as one of the three
greatest challenges facing the computer industry.
This course caters to students from all departments who are interested in using
parallel computers of various scales to speed up the solution of problems. It
also caters to computer science and engineering students who want to understand
and grapple with the key issues in the design of parallel architectures and
software systems. A significant theme in the treatment of systems issues is
their implications for application software; drawing this connection makes
the systems issues relevant to users of such systems as well. In addition to
general programming and systems, there will be a significant focus on the
modern trend toward increasingly parallel multi-core processors within a
single chip.
The first two thirds of the course will focus on the key issues in parallel
programming and architecture. In the last third, we will examine some advanced
topics ranging from methods to tolerate latency to programming models for
clustered commodity systems to new classes of information applications and
services that strongly leverage large-scale parallel systems. Students will do a
parallel programming project, either with an application they propose from their
area of interest or with one that is suggested by the instructors. Students will
have dedicated access to two kinds of multi-core processor systems in addition
to large-scale multiprocessors for their projects.
Prerequisites: COS 318 and 475 or instructor's permission.
Text Book: Parallel Computer Architecture: A Hardware/Software Approach by David Culler and Jaswinder Pal Singh, with Anoop Gupta. Morgan Kaufmann Publishers, 1998.
There are three types of parallel machines available for you to use.
IMPORTANT NOTE: You must change your password on the niagara machine as soon
as possible; the current default password is your login name. To change it,
log into the machine following the instructions below and type "passwd". The
program will guide you through the process and ask for your new password. You will need an account on the CS network in order to access the machines; more information about that is available in the CS guide under the section "Accounts".
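For example, once you have an account, changing the password might look as follows (assuming niagara's full name follows the usual department naming; see the login instructions below for the exact hostnames):
ssh niagara.cs.princeton.edu
passwd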
Three different types of shared-memory computing resources are offered to
allow you to work on your projects. All machines use a Unix operating system
and can be accessed using SSH. If you are working from a workstation which
also uses a Unix operating system, you can log into a machine called
"hostname" as follows:
ssh hostname
Replace "hostname" with the correct name of the machine (a full description
of all machines is given below). The computers do not share a common
filesystem, so you have to copy every file you need to each machine manually.
If you're working from a Unix workstation, you can use scp to copy
a file "my_project.tgz" as follows:
scp my_project.tgz hostname: (Note the colon at the end of "hostname")
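A typical round trip, assuming your project lives in a local directory "my_project" (the directory name is illustrative), might look as follows:
tar czf my_project.tgz my_project (pack the directory on your workstation)
scp my_project.tgz hostname:
ssh hostname
tar xzf my_project.tgz (unpack on the server)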
SSH programs are also available for Windows, but you will have to install and
set up the programs yourself. We do not offer any support for that.
All machines have a pre-installed version of gcc which you can use to
compile your programs. While it is possible to write your programs directly
on the servers, we recommend that you work offline on your local workstation
using an editor or integrated development environment (IDE) which you are
familiar with. This will allow you to write and test your program in a more
convenient way. Only use the shared-memory computers for performance
experiments.
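For example, a C program using POSIX threads (the file and program names are illustrative) could be compiled as follows:
gcc -O2 -pthread -o my_program my_program.c
The -pthread flag enables thread support in the compiler and linker, and -O2 enables optimizations, which you will want for performance experiments.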
Of the shared-memory resources which we offer, only "hecate" has a job
submission system which guarantees that no more than one program runs on a
set of CPUs at any time. On all other machines, it is possible that multiple
computationally intensive programs run at the same time. This is a problem
if you are running performance experiments and need accurate timings. To get
reliable numbers, monitor the execution of your program from another shell
with "top" and re-run your program as needed.
"top" is an interactive tool which lists the resource requirements of all
running programs ordered by CPU time used. The column labeled "%CPU" shows
the share of CPU time that each program has received during the last
monitoring interval. Your program should have close to 100% CPU time. If
other CPU-intensive programs are running at the same time, your program will
get less CPU time and the other programs will show up at the beginning of
the list generated by "top". On Solaris, you can use "prstat".
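For example, you might start your experiment in one shell and monitor it from a second one (the program name is illustrative):
./my_program --threads=4 (first shell: start the run)
top (second shell: check for competing CPU-intensive programs)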
To use the job submission system on hecate you have to write a job
submission script and use a small set of tools to manage your program runs.
You can use the following example script "run.cmd" as a template (adjust the
number of CPUs and other values as needed):
#PBS -l ncpus=4,walltime=1:00:00
#PBS -m abe
#PBS -M your_puid@princeton.edu
#
# The directives above request 4 CPUs and one hour of wall-clock time, and
# send mail to the given address when the job aborts, begins, and ends.
#
# You can list all commands as you would use them in a shell, for example:
cd my_projects
# As the last command, simply execute your program as you would normally do,
# for example:
./my_program --threads=4
Submit your job as follows:
qsub run.cmd
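Once submitted, you can monitor and, if necessary, cancel your job with the standard PBS tools, for example:
qstat (list your queued and running jobs)
qdel job_id (cancel a job; replace "job_id" with the id reported by qsub or qstat)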
More explanations are available in the hecate
tutorial.
The same job control system is also used on hbar, but a slightly different job submission script should be used. Hbar is a two-node cluster. The frontend, hbar.cs.princeton.edu, is the node which is publicly accessible; it is intended for development work, test runs, and job control. The second node is hidden and used as a dedicated compute node for time-sensitive performance measurements. The job control system guarantees that submitted jobs have exclusive access to the resources specified in the job submission script, without any interference from other users.
You can use a job submission script such as the following:
#PBS -l nodes=1:ppn=8
#PBS -m abe
#PBS -M your_puid@princeton.edu
#
# The first directive requests one node with 8 processors per node (ppn).
cd my_projects
./my_program --threads=8
Job submission and control work the same way as on hecate (see above).
A more detailed tutorial on the PBS job submission system can be found here.