Princeton University
Scalable I/O Research




Overview

The Scalable I/O Initiative (SIO) is a multi-institutional effort to attack the problems of moving data into, out of, and around a scalable parallel machine. Princeton University is a member of the Operating System working group (group leader, Professor Kai Li). Other members of the working group include Carnegie Mellon University, the University of Arizona, and the University of Washington. Industry sites include the Intel Scalable Systems Division and the IBM TJ Watson Research Center.

Our SIO research focuses on the memory hierarchy and file systems. We plan to implement a memory server and checkpointing tools, and to improve the performance of the Intel Parallel File System (PFS) on Paragon multi-computers.

We have been an active participant in drafting the SIO Low-level API (LLAPI). The API was formally announced to the parallel computing community at Supercomputing '96. At the same conference, Princeton also announced the first reference implementation of the SIO LLAPI for the Intel Paragon.

Closely related research efforts at Princeton include the Distributed Frame Buffer project (a collaboration with Pat Hanrahan at Stanford University) and the SHRIMP (Scalable High-performance Really Inexpensive MultiProcessor) project.

Other institutions involved in SIO activities include Argonne National Laboratory (integration and applications), Rice University (compiler and runtime), Syracuse University (compiler and runtime), the University of Illinois at Urbana-Champaign (characterization), and the University of Maryland (compiler and runtime).




People

The SIO group at Princeton currently includes





Our SIO Projects

Reference Implementation of SIO Low-level API

Mr. NinHui Sun is a visiting computer scientist from the Institute of Computer Technology, a research institution of the Chinese Academy of Sciences. He has just completed a reference implementation of the SIO Low-level API for the Intel Paragon. Test results on our 10-node Paragon indicate that the SIO LLAPI can be more efficient than Intel PFS in all cases. The source code for this implementation is available, as a compressed tar file, to anyone interested in trying out the SIO LLAPI.

User-level Checkpointing Facility

Yuqun Chen has implemented and released a user-level checkpointing facility for Paragon OS. We have successfully ported James Plank's UNIX checkpointing library, libckpt, to Paragon OS. Based on libckpt, two checkpointing libraries, libNXckpt and libMPIckpt, have been developed for Paragon OS R1.4. They are completely user-level and support most NX and MPI message-passing calls, respectively.

The SIO Integration Group at Caltech has formally released the checkpointing libraries. Interested users may either contact Dr. Heidi Lorenz-Wirzba or visit our web page for further information.
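
To make the idea of user-level checkpointing concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not the libckpt, libNXckpt, or libMPIckpt interface: those libraries save the process image transparently, whereas this sketch uses the simpler application-level variant in which the program saves its own state explicitly. The names `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # The rename is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, steps, interval, crash_at=None):
    """Sum 0..steps-1, saving a checkpoint every `interval` steps.

    crash_at (hypothetical test hook) raises before executing that step,
    simulating a failure so that a later call can resume from the checkpoint.
    """
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < steps:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same path after a failure resumes from the last saved step instead of starting over; only the work done since the last checkpoint is repeated.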

Memory Server

Yuanyuan Zhou is implementing a memory server for the Intel Paragon. The memory server model extends the memory hierarchy of multi-computers by introducing a remote memory layer whose latency lies between that of local memory and that of disk. This mechanism should improve the Paragon's virtual memory performance severalfold. Yuanyuan Zhou has completed a prototype using the Mach 3.0 external pager and the Intel NX message-passing library.
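
The effect of a remote memory layer can be illustrated with a toy simulation. The sketch below is not the Princeton prototype and does not use the Mach external pager or NX; it is a hypothetical LRU pager (`TieredPager`) in which pages evicted from local memory go to a remote tier before falling back to disk. The latency constants are assumed round numbers chosen only to preserve the ordering local < remote < disk, not Paragon measurements.

```python
from collections import OrderedDict

# Assumed illustrative costs in microseconds; not measured values.
LOCAL_US, REMOTE_US, DISK_US = 1, 100, 10_000


class TieredPager:
    """Toy LRU pager: local frames, then a remote memory tier, then disk."""

    def __init__(self, local_frames, remote_frames):
        self.local = OrderedDict()   # page -> None, least recently used first
        self.remote = OrderedDict()
        self.local_frames = local_frames
        self.remote_frames = remote_frames

    def access(self, page):
        """Touch a page and return the simulated access cost."""
        if page in self.local:
            self.local.move_to_end(page)
            return LOCAL_US
        if page in self.remote:
            del self.remote[page]    # page moves back into local memory
            cost = REMOTE_US
        else:
            cost = DISK_US
        self._install(page)
        return cost

    def _install(self, page):
        self.local[page] = None
        if len(self.local) > self.local_frames:
            victim, _ = self.local.popitem(last=False)
            if self.remote_frames > 0:
                # Evict to the remote tier instead of going straight to disk.
                self.remote[victim] = None
                if len(self.remote) > self.remote_frames:
                    self.remote.popitem(last=False)  # oldest page falls to disk


def total_cost(pager, trace):
    """Sum the simulated cost of every access in the trace."""
    return sum(pager.access(page) for page in trace)
```

For a working set that cycles through more pages than fit locally, for example `[0, 1, 2, 3] * 3` with two local frames, `total_cost` is lower when the pager has remote frames than when it has none, because repeat misses are served from the remote tier rather than from disk.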

Improving Paragon Parallel File System

Sanjeev Kumar has examined the structure of Intel PFS and tried to find ways to improve its performance.


Related Papers

A Study of Integrated Prefetching and Caching Strategies.
Pei Cao, Edward W. Felten, Anna R. Karlin, and Kai Li. In Proceedings of the 1995 ACM SIGMETRICS, 1995.

Evaluating Multi-Port Frame Buffer Designs for a Mesh-Connected Multicomputer.
Gordon Stoll, Bin Wei, Douglas W. Clark, Edward W. Felten, Kai Li, and Patrick Hanrahan. Int'l Symposium on Computer Architecture, June 1995.

Synchronization for a Multi-Port Frame Buffer on a Mesh-Connected Multicomputer.
Bin Wei, Gordon Stoll, Douglas W. Clark, Edward W. Felten, and Kai Li. Parallel Rendering Symposium, Oct. 1995.

Libckpt: Transparent Checkpointing under Unix.
James S. Plank, Micah Beck, Gerry Kingsley, Kai Li.
Proceedings of the 1995 Winter USENIX Technical Conference. 1995.

Storage Alternatives for Mobile Computers.
Fred Douglis, Ramon Caceres, Frans Kaashoek, Kai Li, Brian Marsh and Joshua A. Tauber.
Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI). November 1994. Pages 25--38.

Implementation and Performance of Application-Controlled File Caching.
Pei Cao, Edward W. Felten, and Kai Li.
Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI). November 1994. Pages 165--178.

Faster Checkpointing with N+1 Parity.
James S. Plank and Kai Li.
IEEE 24th International Symposium on Fault-Tolerant Computing. Austin, TX, Pages 288--297, June 1994.

Application-Controlled File Caching Policies.
Pei Cao, Edward Felten and Kai Li.
Proceedings of the 1994 Summer USENIX Technical Conference. Pages 171--182. June 1994.

Low-Latency, Concurrent Checkpointing for Parallel Programs.
Kai Li, Jeffrey Naughton and James Plank.
IEEE Transactions on Parallel and Distributed Systems, 5(8):874--879. 1994.

Performance Results of ICKP - A Consistent Checkpointer on the iPSC/860.
James Plank and Kai Li.
IEEE Parallel and Distributed Technology. 2(2):62--67. 1994.


Computer Science Department at Princeton University

Principal Contact:
li@cs.princeton.edu