Parallel
computing is a mainstay of modern computation and information analysis and
management, ranging from scientific computing to information and data services.
The inevitable and rapidly growing adoption of multi-core parallel
architectures within a single processor chip across the computer industry
pushes explicit parallelism to the forefront of computing for all
applications and scales, and makes the challenge of parallel programming and
system understanding all the more crucial. Leaders of even the largest
desktop companies have highlighted parallel programming as one of the three
greatest challenges facing the computer industry.
This course caters to students from all departments who are interested in using
parallel computers of various scales to speed up the solution of problems. It
also caters to computer science and engineering students who want to understand
and grapple with the key issues in the design of parallel architectures and
software systems. A significant theme in the treatment of systems issues is
their implications for application software; drawing this connection makes
the systems issues relevant to users of such systems as well. In addition to
general programming and systems, there will be a significant focus on the
modern trend toward increasingly parallel multi-core processors within a
single chip.
The first two thirds of the course will focus on the key issues in parallel
programming and architecture. In the last third, we will examine some advanced
topics ranging from methods to tolerate latency to programming models for
clustered commodity systems to new classes of information applications and
services that strongly leverage large-scale parallel systems. Students will do a
parallel programming project, either with an application they propose from their
area of interest or with one that is suggested by the instructors. Students will
have dedicated access to two kinds of multi-core processor systems in addition
to large-scale multiprocessors for their projects.
Prerequisites: COS 318 and 475 or instructor's permission.
Text Book: Parallel Computer Architecture: A Hardware/Software Approach by David Culler and Jaswinder Pal Singh, with Anoop Gupta. Morgan Kaufmann Publishers, 1998.
There are three types of parallel machines available for you to use.
IMPORTANT NOTE: You must change your password on the niagara machine as soon
as possible; the current default password is your login name. To change it,
log into the machine following the instructions below and type "passwd". The
program will guide you through the process and ask for your new password. You will need an account on the CS network in order to access the machines; more information about that is available in the CS guide under the section "Accounts".
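For example, once you have an account, changing the password might look as follows (assuming niagara's full name follows the usual department naming; see the login instructions below for the exact hostnames):
ssh niagara.cs.princeton.edu
passwd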
Three different types of shared-memory computing resources are offered to
allow you to work on your projects. All machines use a Unix operating system
and can be accessed using SSH. If you are working from a workstation which
also uses a Unix operating system, you can log into a machine called
"hostname" as follows:
ssh hostname
Replace "hostname" with the correct name of the machine (a full description
of all machines is given below). The computers do not share a common
filesystem, so you have to copy every file you need to each machine manually.
If you're working from a Unix workstation, you can use scp to copy
a file "my_project.tgz" as follows:
scp my_project.tgz hostname: (Note the colon at the end of "hostname")
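A typical round trip, assuming your project lives in a local directory "my_project" (the directory name is illustrative), might look as follows:
tar czf my_project.tgz my_project (pack the directory on your workstation)
scp my_project.tgz hostname:
ssh hostname
tar xzf my_project.tgz (unpack on the server)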
SSH programs are also available for Windows, but you will have to install and
set up the programs yourself. We do not offer any support for that.
All machines have a pre-installed version of gcc which you can use to
compile your programs. While it is possible to write your programs directly
on the servers, we recommend that you work offline on your local workstation
using an editor or integrated development environment (IDE) which you are
familiar with. This will allow you to write and test your program in a more
convenient way. Only use the shared-memory computers for performance
experiments.
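For example, a C program using POSIX threads (the file and program names are illustrative) could be compiled as follows:
gcc -O2 -pthread -o my_program my_program.c
The -pthread flag enables thread support in the compiler and linker, and -O2 enables optimizations, which you will want for performance experiments.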
Of the shared-memory resources which we offer, only "hecate" has a job
submission system which guarantees that no more than one program runs on a
set of CPUs at any time. On all other machines, it is possible that multiple
computationally intensive programs run at the same time. This is a problem
if you are running performance experiments and need accurate timings. To get
reliable numbers, monitor the execution of your program from another shell
with "top" and re-run your program as needed.
"top" is an interactive tool which lists the resource requirements of all
running programs ordered by CPU time used. The column labeled "%CPU" shows
the share of CPU time that each program has received during the last
monitoring interval. Your program should have close to 100% CPU time. If
other CPU-intensive programs are running at the same time, your program will
get less CPU time and the other programs will show up at the beginning of
the list generated by "top". On Solaris, you can use "prstat".
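For example, you might start your experiment in one shell and monitor it from a second one (the program name is illustrative):
./my_program --threads=4 (first shell: start the run)
top (second shell: check for competing CPU-intensive programs)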
To use the job submission system on hecate you have to write a job
submission script and use a small set of tools to manage your program runs.
You can use the following example script "run.cmd" as a template (adjust the
number of CPUs and other values as needed):
#PBS -l ncpus=4,walltime=1:00:00
#PBS -m abe
#PBS -M your_puid@princeton.edu
#
# The directives above request 4 CPUs and one hour of wall-clock time, and
# send mail to the given address when the job aborts, begins, and ends.
#
# You can list all commands as you would use them in a shell, for example:
cd my_projects
# As the last command, simply execute your program as you would normally do,
# for example:
./my_program --threads=4
Submit your job as follows:
qsub run.cmd
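Once submitted, you can monitor and, if necessary, cancel your job with the standard PBS tools, for example:
qstat (list your queued and running jobs)
qdel job_id (cancel a job; replace "job_id" with the id reported by qsub or qstat)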
More explanations are available in the hecate
tutorial.
The same job control system is also used on hbar, but a slightly different job submission script should be used. Hbar is a two-node cluster. The frontend, hbar.cs.princeton.edu, is the node which is publicly accessible; it is intended for development work, test runs, and job control. The second node is hidden and used as a dedicated compute node for time-sensitive performance measurements. The job control system guarantees that submitted jobs have exclusive access to the resources specified in the job submission script, without any interference from other users.
You can use a job submission script such as the following:
#PBS -l nodes=1:ppn=8
#PBS -m abe
#PBS -M your_puid@princeton.edu
#
# The first directive requests one node with 8 processors per node (ppn).
cd my_projects
./my_program --threads=8
Job submission and control work the same way as on hecate (see above).
A more detailed tutorial on the PBS job submission system can be found here.