Organizing Computation for High-Performance Visual Computing
Future visual computing applications—from photorealistic real-time rendering, to 4D light field cameras, to pervasive sensing and computer vision—demand orders of magnitude more computation than we have today. From data centers to mobile devices, performance and energy scaling is limited by locality (the distance over which data has to move, e.g., from nearby caches, distant main memory, or across
networks) and parallelism. Because of this, I argue that we should think of the performance and efficiency of an application as determined not just by the algorithm and the hardware on which it runs, but critically also by the organization of its computations and data. For algorithms with the same complexity—even the exact same set of arithmetic operations and data—executing on the same hardware, the order and granularity of execution and the placement of data can easily change performance by an order of magnitude because of locality and parallelism. To extract the full potential of our machines, we must treat the organization of computation as a first-class concern, working across all levels, from algorithms and data structures, to compilers, to hardware.
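The claim that the same operations in a different order can perform very differently comes down to memory locality. The following minimal Python sketch (all names are my own, purely illustrative) performs the exact same additions in two traversal orders over the same row-major buffer; the row-major loop touches memory contiguously, while the column-major loop strides across rows, which on large arrays in a compiled language can cost a large constant factor from cache misses alone:

```python
N = 512
# A flat integer buffer standing in for a row-major N x N image.
image = [i % 255 for i in range(N * N)]

def sum_row_major(buf, n):
    total = 0
    for y in range(n):        # walk each row...
        for x in range(n):    # ...touching adjacent elements in memory
            total += buf[y * n + x]
    return total

def sum_col_major(buf, n):
    total = 0
    for x in range(n):        # walk each column...
        for y in range(n):    # ...striding n elements between accesses
            total += buf[y * n + x]
    return total

# Same algorithm, same arithmetic, same result -- only the order differs.
assert sum_row_major(image, N) == sum_col_major(image, N)
```

Integer values are used so the two summation orders are bit-identical; with floating point, reordering can also change rounding, which is another reason the organization of computation is not a detail.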
This talk will present facets of this philosophy in systems I have built for visual computing applications, from image processing and vision, to 3D rendering, simulation, optimization, and 3D printing. I will show that, for the data-parallel pipelines common in graphics, imaging, and other data-intensive applications, the organization of computations and data for a given algorithm is constrained by a fundamental tension between parallelism, locality, and redundant computation of shared values. I will focus particularly on the Halide language and compiler for image processing, which explicitly separates the computations that define an algorithm from the choices of organization that determine parallelism, locality, memory footprint, and synchronization. I will show how this approach enables much simpler programs to deliver performance often many times faster than the best prior hand-tuned C, assembly, and CUDA implementations, while scaling across radically different architectures, from ARM cores, to massively parallel GPUs, to FPGAs and custom ASICs.
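The tension between locality and redundant computation can be made concrete with a toy pipeline. Halide itself is a language embedded in C++; the sketch below is not Halide syntax but a hypothetical Python illustration of two "schedules" for the same two-stage blur algorithm. Schedule A computes the whole intermediate stage first (no redundant work, but the full intermediate must round-trip through memory); Schedule B computes the intermediate tile by tile, recomputing a small "halo" of values shared between adjacent tiles (better producer-consumer locality, at the cost of redundant work):

```python
def blur1(inp, i):
    # Stage 1: 3-point average, clamped at the boundaries.
    n = len(inp)
    return (inp[max(i - 1, 0)] + inp[i] + inp[min(i + 1, n - 1)]) // 3

def blur2_breadth_first(inp):
    # Schedule A: breadth-first. All of stage 1, then all of stage 2.
    n = len(inp)
    tmp = [blur1(inp, i) for i in range(n)]   # full intermediate buffer
    return [(tmp[max(i - 1, 0)] + tmp[i] + tmp[min(i + 1, n - 1)]) // 3
            for i in range(n)]

def blur2_tiled(inp, tile=8):
    # Schedule B: tiled with recomputation. For each output tile,
    # recompute only the stage-1 values that tile needs, including a
    # 1-element halo on each side. The intermediate now stays small
    # (cache-resident), but halo values are computed twice.
    n = len(inp)
    out = []
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        tmp = {j: blur1(inp, j) for j in range(lo, hi)}  # tile + halo
        for i in range(start, stop):
            out.append((tmp[max(i - 1, 0)] + tmp[i]
                        + tmp[min(i + 1, n - 1)]) // 3)
    return out

signal = [(7 * i) % 100 for i in range(100)]
# Different organizations, identical results -- by construction.
assert blur2_breadth_first(signal) == blur2_tiled(signal)
```

In real Halide, the algorithm would be written once as pure functions and these organizational choices would be expressed separately, as scheduling directives, rather than by rewriting the loop nests by hand.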
Jonathan Ragan-Kelley is a postdoc in computer science at Stanford. He works on high-efficiency visual computing, including systems, compilers, and architectures for image processing, vision, 3D rendering, 3D printing, physical simulation, and scientific computing.
He earned his PhD in Computer Science at MIT in 2014, where he built the Halide language for high-performance image processing. Halide is now used throughout industry to deploy code to hundreds of millions of smartphones and to process tens of billions of images per day. Jonathan previously built the Lightspeed preview system, which was used on over a dozen films at Industrial Light & Magic and was a finalist for an Academy Technical Achievement Award. He has also worked on GPU architecture, compilers, and research at NVIDIA, Intel, and ATI.