Fast Cluster Failover Using Virtual Memory-Mapped Communication

Report ID:
TR-591-99
Date:
December 1998
Pages:
10

Abstract:

This paper proposes a novel way to use virtual memory-mapped
communication (VMMC) to reduce failover time on clusters. With the
VMMC model, an application's virtual address space can be efficiently
mirrored in remote memory, either automatically or via explicit
messages. When a machine fails, its applications can restart on the
failover node from their most recent checkpoints with minimal memory
copying and disk I/O overhead. This method requires little change to
the applications' source code. We developed two fast failover
protocols: the deliberate update failover protocol (DU) and the
automatic update failover protocol (AU). The first can run on any
system that supports VMMC, whereas the second requires special network
interface support.
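
The checkpoint-and-restart idea behind the deliberate update protocol
can be illustrated with a minimal sketch. It is not the implementation
from the report: the class MirroredMemory, its methods, and the page
size are all hypothetical names chosen for illustration, and the
"remote memory" is simulated by a second in-process list rather than a
real VMMC transfer. The sketch assumes that pages written since the
last checkpoint are tracked and explicitly copied to the mirror, and
that failover restarts from the mirrored state.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not from the report)


class MirroredMemory:
    """Toy model: a primary address space and its mirror on a failover node."""

    def __init__(self, num_pages):
        self.primary = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.mirror = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()  # pages written since the last checkpoint

    def write(self, page, offset, data):
        """Application write: update primary memory and mark the page dirty."""
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write crosses the page boundary")
        buf = bytearray(self.primary[page])
        buf[offset:offset + len(data)] = data
        self.primary[page] = bytes(buf)
        self.dirty.add(page)

    def checkpoint(self):
        """Explicitly copy each dirty page to the mirror; return pages sent."""
        sent = len(self.dirty)
        for page in self.dirty:
            self.mirror[page] = self.primary[page]
        self.dirty.clear()
        return sent

    def failover(self):
        """Simulate a primary failure: restart from the mirrored checkpoint."""
        self.primary = list(self.mirror)
        self.dirty.clear()


mem = MirroredMemory(num_pages=4)
mem.write(0, 0, b"committed")
pages_sent = mem.checkpoint()      # only the one dirty page is copied
mem.write(1, 0, b"uncommitted")    # written after the checkpoint
mem.failover()                     # state rolls back to the checkpoint
```

In this toy model, only dirty pages are copied at each checkpoint, so
the cost of a checkpoint grows with the amount of memory modified
since the previous one, and any write made after the last checkpoint
is lost on failover.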

We implemented these two protocols on two different clusters that
support VMMC. Our results with three transaction-based applications
show that both protocols work well. The deliberate update protocol
imposes 4-21% overhead when taking checkpoints every 2 seconds. If an
application can tolerate 20% overhead, this protocol can fail over to
another machine within 4 milliseconds in the best case and within 0.1
to 3 seconds in the worst case. Failover performance can be further
improved by using special network interface hardware. The automatic
update protocol can take checkpoints every 0.1 seconds with only 3-12%
overhead. If 10% overhead is allowed, it can fail over applications
within 0.01 to 0.4 seconds in the worst case.
