Checkpointing Multicomputer Applications

Report ID: TR-316-91
Date: March 1991
Pages: 25

Abstract:

Efficient checkpointing and resumption of multicomputer applications are
essential if multicomputers are to support time-sharing and the
automatic resumption of jobs after a system failure. We present a
checkpointing scheme that is transparent, imposes overhead only during
checkpoints, requires minimal message logging, and allows for quick
resumption of execution from a checkpointed image. Since checkpointing
multicomputer applications poses requirements different from those
posed by checkpointing general distributed systems, existing
distributed checkpointing schemes are inadequate for multicomputer
checkpointing. Our checkpointing scheme makes use of special
properties of multicomputer interconnection networks to satisfy this
new set of requirements.
