Data Caching in an Information Retrieval System | Computer Science Department at Princeton University

Report ID:

TR-065-86

Authors:

Garcia-Molina, Hector / Alonso, Rafael / Barbara, Daniel / Abad, Soraya

Date:

November 1986

Abstract:

Existing computer communication networks give users access to an ever-growing number of information retrieval systems. Some of these services are provided by commercial enterprises (examples are Dow Jones and The Source), while others are research efforts (such as the Boston Community Information System). In many cases these systems are accessed from personal or medium-sized computers, which usually have sizable amounts of local storage available. To improve the response time of user queries, it becomes desirable to cache data at the user's site. However, to reduce the overhead of maintaining multiple copies, it may be appropriate to allow copies to diverge in a controlled fashion. This makes it possible to propagate updates to the copies efficiently, e.g., when the system is lightly loaded, when communication tariffs are lower, or by batching updates together. It also makes it possible to access the copies even when the communication lines or the central site are down. In this paper we present the notion of quasi-copies, which embodies the ideas sketched above. We also define the types of deviations that seem useful, and discuss the available implementation strategies.
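
The idea of a cached copy that may diverge from the central site in a controlled way can be illustrated with a minimal sketch. This is not the report's design: the abstract does not specify the deviation types, so the sketch assumes a single age-based bound (a copy may lag the central site by at most `max_age` seconds). The names `QuasiCopyCache`, `CachedEntry`, `fetch`, and `clock` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedEntry:
    value: str
    fetched_at: float  # time (seconds) at which this copy was taken


class QuasiCopyCache:
    """Local cache whose copies may lag the central site by a bounded age.

    Assumption: divergence is bounded by age only. Other kinds of bounds
    are possible and are not modeled here.
    """

    def __init__(self, fetch: Callable[[str], str], max_age: float,
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch      # hypothetical call to the central site
        self._max_age = max_age  # allowed lag, in seconds
        self._clock = clock      # injected so the example is testable
        self._entries: Dict[str, CachedEntry] = {}

    def read(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)

        # The local copy is within its allowed divergence: answer locally.
        if entry is not None and now - entry.fetched_at <= self._max_age:
            return entry.value

        try:
            value = self._fetch(key)
        except ConnectionError:
            # Central site or line is down. Fall back to the local copy if
            # one exists, even though it is now older than max_age.
            if entry is not None:
                return entry.value
            raise

        self._entries[key] = CachedEntry(value, now)
        return value
```

With `max_age=60.0`, a read at time 0 contacts the central site, a read at time 30 is answered from the local copy even if the central value has changed, and a read at time 61 refreshes the copy. Serving an over-age copy during an outage is a design choice of this sketch; a stricter variant could raise an error instead.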