Long-Term Caching Strategies for Very Large Distributed File Systems
This paper examines the feasibility of using long-term (disk-based)
caches in very large distributed file systems (DFSs). We begin with an
analysis of file access patterns in a distributed Unix workstation
environment, and identify properties of use to the DFS designer. We
then introduce long-term caching strategies that maintain consistency
while dramatically reducing the load on file servers. We describe a
number of algorithms for maintaining client caches, and present the
results of a trace-driven simulation that shows how relatively small
disk-based caches can be used to reduce server traffic by 60% to 90%.
Finally, we outline possible mechanisms for dynamically organizing
these caches into adaptive hierarchies to allow arbitrary scaling of
the number of clients and the use of low-bandwidth communication
networks. A small (two- or three-level) hierarchy, coupled with smart caching
techniques, has the potential to reduce traffic by an order of
magnitude or more relative to a flat scheme.
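
To make the idea of a trace-driven cache simulation concrete, the following is a minimal sketch, not the simulator used in this paper. It assumes a whole-file cache with least-recently-used (LRU) replacement, a read-only trace of (file_id, size_bytes) pairs, and a made-up six-access trace; the function name simulate_lru and its parameters are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay read accesses through a whole-file LRU cache.

    trace is an iterable of (file_id, size_bytes) pairs (an assumed format).
    Returns the fraction of requested bytes that the server had to supply.
    """
    cache = OrderedDict()  # file_id -> size_bytes, least recently used first
    used = 0               # bytes currently held in the cache
    requested = 0          # bytes the client asked for in total
    fetched = 0            # bytes that missed and went to the server

    for file_id, size in trace:
        requested += size
        if file_id in cache:
            cache.move_to_end(file_id)  # hit: mark as most recently used
            continue
        fetched += size                 # miss: the server supplies the file
        if size > capacity_bytes:
            continue                    # too large to cache at all
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU entry
            used -= evicted_size
        cache[file_id] = size
        used += size

    return fetched / requested if requested else 0.0


# Illustrative trace only; real traces would come from logged file accesses.
trace = [("a", 4), ("b", 4), ("a", 4), ("c", 4), ("a", 4), ("b", 4)]
miss_fraction = simulate_lru(trace, capacity_bytes=8)
print(f"server traffic: {miss_fraction:.0%} of the uncached load")
```

Running the same trace with different values of capacity_bytes shows how server traffic falls as the cache grows. A real study would also have to model writes and consistency traffic, which this sketch omits.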