Distributed computing on networked workstations offers cost-efficient parallel processing, but the higher failure rate of such networks requires effective fault tolerance. Asynchronous consistent checkpointing offers a low-overhead solution.
Parallel Virtual Machine (PVM) allows a heterogeneous network of UNIX workstations to serve immediately as a distributed computer by providing message-passing services implemented on top of UNIX inter-process communication.
We briefly show that correct user-level support for an aggressive, asynchronous two-phase-commit checkpointing protocol for PVM's virtual circuit mode requires message logging.
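The need for message logging can be illustrated with a toy model. The sketch below is not the protocol described here and does not use the PVM API. It assumes two processes, a single in-memory channel, and hypothetical names (`Process`, `checkpoint`, `recover`, `run`). A message sent before the checkpoint but not yet received is lost on rollback unless the checkpoint also logs the channel contents.

```python
from collections import deque


class Process:
    """Toy process: its state is the running sum of the values it has received."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.inbox = deque()      # messages in transit to this process
        self.saved_state = None   # state captured at the last checkpoint
        self.log = []             # in-transit messages saved with the checkpoint

    def deliver_one(self):
        """Receive the next in-transit message, if any."""
        if self.inbox:
            self.state += self.inbox.popleft()


def checkpoint(procs, log_in_transit):
    """Save every process state; optionally log messages still in the channel."""
    for p in procs:
        p.saved_state = p.state
        p.log = list(p.inbox) if log_in_transit else []


def recover(procs):
    """Roll back after a failure: channel contents are gone, only the log survives."""
    for p in procs:
        p.state = p.saved_state
        p.inbox = deque(p.log)


def run(log_in_transit):
    a, b = Process("a"), Process("b")
    b.inbox.append(5)                  # a sent 5 before the checkpoint; b has not received it
    checkpoint([a, b], log_in_transit)
    b.deliver_one()                    # b receives the message after the checkpoint
    recover([a, b])                    # failure: both roll back to the checkpoint
    b.deliver_one()                    # replay whatever the log preserved
    return b.state


if __name__ == "__main__":
    print("without logging:", run(False))
    print("with logging:   ", run(True))
```

In this model the sender's checkpoint already records the message as sent, so the sender will not resend it after rollback. Without the log, the receiver never sees the value again and the restored global state is inconsistent.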