Interactive Systems for Code and Data Demography
Programming—the means by which we tell computers what to do—has changed a lot over time. Programming today means programming alongside hundreds of fellow students, thousands of fellow professional software engineers at a particular company, or millions of fellow developers in the open-source community sharing their code online. In this talk, I will describe several interactive systems I have built that exploit the structure within large volumes of peer-produced code to help individual programmers learn how to write more correct, readable code.
These systems are made possible by code demography, which I define as statistics, algorithms, and visualizations that help people comprehend and interact with population-level structure and trends in large code corpora. The key to my approach is designing or inferring abstractions that capture critical features and abstract away variation that is irrelevant to the user. Code demography can reveal strategically diverse sets of aligned code examples that, according to theories of human concept learning, help people learn, that is, construct mental abstractions that generalize well.
I will focus this talk on two families of systems that use program analysis, program synthesis, and visualization to power either active, data-driven teaching in large programming classrooms or passive knowledge sharing within developer communities. Some of these systems have been integrated into UC Berkeley’s largest introductory programming class, which regularly enrolls over 1500 students. I will conclude with my vision for how the techniques of code demography can be generalized to more types of messy, structured, complex data corpora in order to help data scientists and enable new data-driven programming paradigms.
Elena Glassman is an EECS postdoctoral researcher at UC Berkeley, in the Berkeley Institute of Design, funded by the NSF ExCAPE Expeditions in Computer Augmented Program Engineering grant and the Moore/Sloan Data Science Fellowship from the UC Berkeley Institute for Data Science (BIDS). She earned her PhD in EECS at the MIT CS & AI Lab in August 2016, where she created scalable systems that analyze, visualize, and provide insight into the code of thousands of programming students. She has been a summer research intern at both Google and Microsoft Research, working on systems that help people teach and learn. She recently joined the program committees of ACM CHI, ACM Learning at Scale, and two SPLASH workshops on programming usability. She was awarded the 2003 Intel Foundation Young Scientist Award, both the NSF and NDSEG graduate fellowships, the MIT EECS Oral Master’s Thesis Presentation Award, a Best of CHI Honorable Mention, and the MIT Amar Bose Teaching Fellowship for innovation in teaching methods.
Prior to entering the field of human-computer interaction (HCI), she earned her MEng in the MIT CSAIL Robot Locomotion Group and was a visiting researcher at Stanford in the Biomimetics and Dextrous Manipulation Lab.