SiteRank: Link-Based Relevance Computation for Persistent Search
Abstract:
Existing search services rely heavily on citation-based authority (e.g., PageRank) to assess the quality of publications. The quality and relevance of results are particularly important in persistent search, but current rank computations are strongly biased against new pages. We propose SiteRank, a new ranking mechanism that handles new publications well and also dramatically reduces communication and computation costs.
This performance improvement is especially valuable when authority is computed in a persistent search service. Current systems, whether small-scale notifiers (e.g., CNN Alerts) or persistent queries on traditional search engines (e.g., Google Alerts), suffer from limited coverage, low refresh rates, or both. We propose Distributed Persistent Search, a new architecture based on a publish-subscribe framework that achieves a linear improvement in publication processing and notification routing as a function of the number of servers used.