High-performance Graph Processing on GPUs
Back in 2013, implementing high-performance graph algorithms for GPUs required manual coding in CUDA, a slow and difficult process. To significantly lower the level of effort required, we created IrGL, a language and compiler specifically for generating high-performance graph algorithm implementations for GPUs. Powered by three throughput optimizations, the IrGL-generated code outperformed nearly all handwritten graph algorithm implementations, achieving speedups of up to 6x.
By freeing us from the drudgery of writing low-level code, IrGL has allowed us to look at a number of problems revolving around graphs. We've used the high-performance implementations to identify key memory system bottlenecks that limit performance on current GPU architectures.
We've also translated graph database queries to IrGL and executed them on GPUs. In the course of extending this to the general problem of subgraph isomorphism (a key primitive in graph databases), we were named GraphChallenge 2017 champions for our implementations of triangle counting and k-truss.
Along the way, we also built Groute, a runtime for asynchronous multi-GPU graph analytics that has achieved order-of-magnitude improvements over existing synchronous implementations. We sped up exhaustive testing of software by traversing graphs that were too big to materialize in memory. Recently, we have also used IrGL's ability to generate hundreds of variants of the same graph algorithm to explore correctness and performance portability issues on GPUs.
Many interesting questions remain unexplored, however, and I will summarize our current efforts in this area.
[Joint work with Keshav Pingali, Tal Ben-Nun, Michael Sutton, M. Amber Hassaan, Chad Voegele, Yi-Shan Lu, Ahmet Celik, Milos Gligoric, Sarfraz Khurshid, Tyler Sorensen and Alastair Donaldson]
Sreepathi Pai is an Assistant Professor of Computer Science at the University of Rochester. His research interests are in compilers, programming languages and implementation, performance models and computer architecture. His most recent research has revolved around the IrGL compiler that produces high-performance GPU code for graph algorithms.
He earned his PhD at the Indian Institute of Science and his B.E. in Computer Engineering at the University of Mumbai. Prior to joining the Department of Computer Science at Rochester, he was a Postdoctoral Fellow at the University of Texas at Austin.