
Availability, Scalability and Cost-Effectiveness of Cluster-Based Internet Infrastructures (Thesis)

January 2001


Clusters of commodity computers are a cost-effective hardware platform for
large-scale Internet services. Availability and scalability are major
concerns in the design of infrastructures for such services. My
dissertation examines opportunities in data storage systems for
improving the availability and scalability of cluster-based Internet
infrastructures at low cost. The goal of availability is to maximize the
percentage of client requests that succeed despite the failure of one or
more servers in the cluster. The goal of scalability is to efficiently
scale the server throughput with the cluster size. My basic approach is to
investigate data and request distribution strategies across the nodes in
the cluster, i.e., how to partition and replicate data on disk or in
memory, and how to direct requests to the right partitions, in order to
achieve high availability and scalability.
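
As a rough illustration of what a data and request distribution strategy involves, the following Python sketch hashes each key to a partition, places every partition on a fixed number of consecutive nodes, and routes a request to the first live replica. The function names, the one-partition-per-node layout, and the consecutive-node placement are assumptions made for this example only; the dissertation's actual strategies are not described here.

```python
import hashlib
from typing import List, Optional, Set


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_of(partition: int, num_nodes: int, copies: int) -> List[int]:
    """Assumed placement: a partition lives on `copies` consecutive nodes."""
    return [(partition + i) % num_nodes for i in range(copies)]


def route(key: str, num_nodes: int, copies: int, alive: Set[int]) -> Optional[int]:
    """Return the first live node that holds the key's partition, or None."""
    partition = partition_of(key, num_nodes)  # one partition per node here
    for node in replicas_of(partition, num_nodes, copies):
        if node in alive:
            return node
    return None  # every replica of this partition has failed
```

In this sketch a request fails only when all nodes holding its partition are down; requests for other partitions are unaffected.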

Maintaining availability in the face of failures is a critical requirement
for Internet services. Existing approaches in cluster-based data storage
rely on redundancy to survive a small number of failures, but the system
becomes largely unavailable if more failures occur. I study a failure
isolation approach in which each server in the cluster can deliver data to
clients independently of the failures of other servers. This approach is
complementary to existing redundancy-based methods: redundancy can mask
the first few failures, and failure isolation can take over and maintain
availability for the majority of clients if more failures occur.
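
The interplay between redundancy and failure isolation can be sketched with a toy availability model. It assumes requests are spread uniformly over partitions, one partition per node, and each partition replicated on consecutive nodes; all of these are simplifying assumptions for illustration, not the model used in the dissertation.

```python
from typing import Set


def availability(num_nodes: int, copies: int, failed: Set[int]) -> float:
    """Fraction of partitions (and, under uniform load, of requests) that
    can still be served when the nodes in `failed` are down."""
    available = 0
    for partition in range(num_nodes):
        # Assumed placement: `copies` consecutive nodes hold this partition.
        holders = {(partition + i) % num_nodes for i in range(copies)}
        if holders - failed:  # at least one holder is still alive
            available += 1
    return available / num_nodes
```

Under these assumptions, with ten nodes and two copies, a single failure is fully masked by redundancy (`availability(10, 2, {0})` is 1.0), while two adjacent failures cost only the one partition they share (`availability(10, 2, {0, 1})` is 0.9) because the surviving servers keep serving their own data.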

The ability to achieve high quality of service with minimal committed
resources yields savings in many areas, including equipment cost, power
consumption, and administration effort for Internet services. I study how
to improve the price-performance ratio of Internet application servers by
efficiently managing a cluster of in-memory databases as the cache for
dynamic content. I observe that, despite the challenges of dynamic
content, a good management strategy can be found for at least certain
applications. Such a strategy strives to maximize effective cache capacity
and minimize synchronization cost. It is lightweight and adapts
dynamically to changes in load and access patterns.
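
The tension between effective cache capacity and synchronization cost can be made concrete with a back-of-the-envelope model. The sketch below assumes every cached item is kept on the same number of nodes and that each update must be pushed to every other copy; both functions and this cost model are illustrative assumptions, not the management strategy studied in the dissertation.

```python
def effective_capacity(num_nodes: int, node_capacity: int, copies: int) -> int:
    """Distinct items the cluster can cache if each item is stored on
    `copies` nodes (assumed uniform replication)."""
    return (num_nodes * node_capacity) // copies


def sync_messages_per_update(copies: int) -> int:
    """Messages needed to propagate one update to the other copies
    (assumed one message per additional copy)."""
    return copies - 1
```

In this model, keeping a single copy of each item gives the largest effective capacity and needs no synchronization, while replicating every item on every node shrinks the effective capacity to that of one node and makes each update touch all the others. A strategy that adapts to load would sit between these extremes.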
