Asynchronous Checkpointing for PVM Requires Message-Logging

Kevin Skadron

7 February 1996


Abstract

Distributed computing using networked workstations offers cost-efficient parallel computing, but its higher failure rate makes effective fault tolerance essential. Asynchronous consistent checkpointing offers a low-overhead solution.

Parallel Virtual Machine (PVM) lets a heterogeneous network of UNIX workstations serve immediately as a distributed computer by providing message-passing services implemented on top of UNIX inter-process communication.

We briefly show that correct user-level support for an aggressive, asynchronous two-phase-commit checkpointing protocol for PVM's virtual circuit mode requires message logging.
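The claim about message logging rests on the idea of in-transit messages. The following minimal Python sketch is an illustration under stated assumptions, not the protocol from the paper. It assumes each process numbers its local events and that `checkpoint[p]` is the number of process p's events captured by its checkpoint. A message sent before its sender's checkpoint but received after its receiver's checkpoint is in transit. If every process rolls back to its checkpoint, no process will resend that message and no process has recorded receiving it, so it must be recovered from a log. A message received before its receiver's checkpoint but sent after its sender's checkpoint is an orphan, which makes the set of checkpoints inconsistent. The names `Message` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record of one message between two processes.
    sender: str
    receiver: str
    send_event: int  # index of the send in the sender's local event sequence
    recv_event: int  # index of the receive in the receiver's local event sequence


def classify(messages, checkpoint):
    """Split messages into (in_transit, orphans) relative to per-process checkpoints.

    Assumption: checkpoint[p] is the number of local events of process p
    captured by p's checkpoint, so event i is saved iff i < checkpoint[p].
    """
    in_transit, orphans = [], []
    for m in messages:
        sent_before = m.send_event < checkpoint[m.sender]
        recv_before = m.recv_event < checkpoint[m.receiver]
        if sent_before and not recv_before:
            # Send is saved but receive is not: after rollback the message is
            # never resent, so it must be replayed from a message log.
            in_transit.append(m)
        elif not sent_before and recv_before:
            # Receive is saved but send is not: the checkpoints are inconsistent.
            orphans.append(m)
    return in_transit, orphans
```

For example, with checkpoints `{"A": 3, "B": 2}`, a message that A sends at event 2 and B receives at event 4 is classified as in transit. A message that B sends at event 3 and A receives at event 1 is an orphan.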



Updated Feb. 8, 1996
Copyright © 1996, Kevin Skadron