Shared Virtual Memory Across SMP Nodes Using Automatic Update: Protocols and Performance
Authors: Bilas, Angelos; Iftode, Liviu; Martin, David; Singh, Jaswinder Pal
As the workstation market moves from single-processor machines to small-scale shared memory multiprocessors, it is very attractive to construct larger-scale multiprocessors by connecting widely available symmetric multiprocessors (SMPs) in a less tightly coupled way. Using a shared virtual memory (SVM) layer for this purpose preserves the shared memory programming abstraction across nodes. We explore the feasibility and performance implications of one such approach by extending the AURC (Automatic Update Release Consistency) protocol, used in the SHRIMP multicomputer, to connect hardware-coherent SMPs rather than uniprocessors. We describe the extended AURC protocol and compare its performance with both the uniprocessor-node AURC case and an all-software Lazy Release Consistency (LRC) protocol extended for SMPs. We present results based on detailed simulations of two protocols (AURC and LRC) and two architectural configurations of a system with 16 processors: one with one processor per node (16 nodes) and one with four processors per node (4 nodes). We find that, unless the bandwidth of the network interface is increased, the network interface becomes the bottleneck in a clustered architecture, especially for AURC. While an LRC protocol can benefit from the reduction in per-processor communication in a clustered architecture, the write-through traffic in AURC significantly increases the communication demands per network interface. This causes more contention and either prevents the performance of AURC from improving with SMP nodes or hurts it severely for applications with significant communication requirements. Thus, while AURC performs better than LRC for applications with high communication needs, the reverse may be true in clustered architectures. Among possible solutions, two are investigated in the paper: protocol changes and bandwidth increases.
Further work is clearly needed on the systems and application sides to evaluate whether AURC can be extended for multiprocessor node systems.
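The bottleneck argument in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: the function name, the traffic units, and every numeric value are hypothetical, and it assumes (as a simplification) that clustering absorbs a fixed fraction of protocol traffic inside a node while write-through traffic still goes out through the node's single network interface.

```python
def per_interface_load(procs_per_node, write_through_per_proc,
                       protocol_per_proc, intra_node_fraction):
    """Traffic offered to one network interface, in arbitrary units.

    Hypothetical model: protocol traffic (page fetches, diffs, and so on)
    between processors in the same node is assumed to be handled by
    hardware coherence, so only the remote share reaches the interface.
    Write-through traffic is assumed not to shrink with clustering.
    """
    remote_protocol = protocol_per_proc * (1.0 - intra_node_fraction)
    return procs_per_node * (write_through_per_proc + remote_protocol)


# Illustrative values only; none of these numbers come from the paper.
configs = {
    "16 nodes x 1 proc": dict(procs_per_node=1, intra_node_fraction=0.0),
    "4 nodes x 4 procs": dict(procs_per_node=4, intra_node_fraction=0.5),
}
protocols = {
    "LRC": dict(write_through_per_proc=0.0, protocol_per_proc=1.0),
    "AURC": dict(write_through_per_proc=1.0, protocol_per_proc=0.6),
}

loads = {
    (proto, cfg): per_interface_load(**protocols[proto], **configs[cfg])
    for proto in protocols
    for cfg in configs
}

for (proto, cfg), load in loads.items():
    print(f"{proto:5s} {cfg:18s} load per interface = {load:.2f}")
```

Under these assumed values, the load per interface grows faster for AURC than for LRC when moving to four processors per node, which is the qualitative effect the abstract describes. The real magnitudes depend on the application and on the simulated system.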