Recovery in a Triple Modular Redundant Database System
In a Triple Modular Redundant (TMR) database system, the database is fully replicated at three computers. All transactions are executed at all nodes in the same relative order. The system can tolerate the arbitrary failure of a single computer, since the correct data can be obtained from the two operating copies. After a failure, it is important to repair the failed computer so that the system can tolerate further failures. Repair in this case involves getting a correct and up-to-date copy of the database without halting the two operational nodes. In this paper we analyze this database recovery problem.
We describe a solution that has been implemented on an experimental TMR system running on SUN-2/120 workstations. We also present performance results that illustrate the cost of recovery.
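
As an illustration of why two correct copies are enough to mask one arbitrary failure, the following is a minimal sketch of majority voting over replica replies. It is not the implementation described in this paper; the function name majority_vote and the string-valued replies are assumptions made only for the example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(replies: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas.

    Returns None when no value is reported by more than half of the
    replicas (hypothetical helper, not taken from the paper).
    """
    if not replies:
        return None
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None


# Assumed scenario: one of three replicas returns an arbitrary value,
# while the two operating copies agree.
replies = ["balance=100", "balance=100", "balance=999"]
result = majority_vote(replies)
```

With three replicas, any single faulty reply is outvoted by the two that agree. Once a node has failed, the remaining two copies can no longer outvote a second fault, which is why the failed node must be repaired.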