A Class of Randomized Strategies for Low-Cost Comparison of File Copies
In this paper we present a class of algorithms for comparison of remotely located file copies that use randomized signatures. We are able to show a simple technique that sends on the order of 4f log(n) bits, where f is the number of pages in the file. We later show how to improve the bound in the number of bits sent, making them grow with f as f log(f) and with n as log(n) log(log(n)). A third class of algorithms is presented, in which the number of signatures grows with f as frf , where r can be made to approach 1. This class of techniques exhibit a worse asymptotic behavior, but they perform very well
in practice. Previously published algorithms were aimed to diagnose 1 and 2 differing pages by sending O(log(n) log(log(n))) and O log2(n) log(log(n))) bits respectively. Moreover, our techniques prove to be very competitive in practice sending less bits for the cases f = 1 and f = 2 respectively.