Selected Research

On Linear Identifiability of Learned Representations

Identifiability is a desirable property of a statistical model: it implies that the true model parameters may be estimated to any desired precision, given sufficient computational resources and data. We study identifiability in the context of representation learning: discovering nonlinear data representations that are optimal with respect to some downstream task. When parameterized as deep neural networks, such representation functions typically lack identifiability in parameter space, because they are overparameterized by design. In this paper, building on recent advances in nonlinear ICA, we aim to rehabilitate identifiability by showing that a large family of discriminative models are in fact identifiable in function space, up to a linear indeterminacy. Many models for representation learning in a wide variety of domains have been identifiable in this sense, including text, images and audio, state-of-the-art at time of publication. We derive sufficient conditions for linear identifiability and provide empirical support for the result on both simulated and real-world data.

Geoffrey Roeder, Luke Metz, Diederik P. Kingma

In submission. arXiv link.
I gave a talk at Yale on this work in March 2020.
I presented an early version of this work at the Conference on the Mathematical Theory of Deep Neural Networks (a.k.a. DeepMath) 2019.

A Data-Driven Computational Scheme for the Nonlinear Mechanical Properties of Cellular Mechanical Meta-Materials under Large Deformation

Cellular mechanical metamaterials are a special class of materials, whose mechanical properties are primarily determined by their geometry. But, capturing the nonlinear mechanical behavior of these materials, especially with complex geometries and under large deformation, can be challenging due to inherent computational complexity. In this work, we propose a data-driven multiscale computational scheme to as a possible route to resolve this challenge. We use a neural network to approximate the effective strain energy density as a function of cellular geometry and overall deformation. The network is constructed by “learning” from the data generated by finite element calculation of a set of representative volume elements at cellular scales. This effective strain energy density is then used to predict the mechanical responses of cellular materials at larger scales. Compared with direct finite element simulation, the proposed scheme can reduce the computational time up to two orders of magnitude. Potentially, this scheme can facilitate new optimization algorithms for designing cellular materials of highly specific mechanical properties.

Tianju Xue, Alex Beatson, Maurizio Chiaramonte, Geoffrey Roeder, Jordan T. Ash, Yigit Menguc, Sigrid Adriaenssens, Ryan P. Adams and Sheng Mao

Accepted for publication in the journal Soft Matter (Royal Society of Chemisty). Paper link.

Learning Composable Energy Surrogates for PDE Order Reduction

Meta-materials are an important emerging class of engineered materials in which complex macroscopic behaviour--whether electromagnetic, thermal, or mechanical--arises from modular substructure. Simulation and optimization of these materials are computationally challenging, as rich substructures necessitate high-fidelity finite element meshes to solve the governing PDEs. To address this, we leverage parametric modular structure to learn component-level surrogates, enabling cheaper high-fidelity simulation. We use a neural network to model the stored potential energy in a component given boundary conditions. This yields a structured prediction task: macroscopic behavior is determined by the minimizer of the system's total potential energy, which can be approximated by composing these surrogate models. Composable energy surrogates thus permit simulation in the reduced basis of component boundaries. Costly ground-truth simulation of the full structure is avoided, as training data are generated by performing finite element analysis with individual components. Using dataset aggregation to choose training boundary conditions allows us to learn energy surrogates which produce accurate macroscopic behavior when composed, accelerating simulation of parametric meta-materials.

Alex Beatson, Jordan T. Ash, Geoffrey Roeder, Tianju Xue, Ryan P. Adams

Accepted at NeurIPS 2020 (Oral presentationo). arXiv link.

Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems

We introduce a flexible, scalable Bayesian inference framework for nonlinear dynamical systems characterised by distinct and hierarchical variability at the individual, group, and population levels. Our model class is a generalisation of nonlinear mixed-effects (NLME) dynamical systems, the statistical workhorse for many experimental sciences. We cast parameter inference as stochastic optimisation of an end-to-end differentiable, block-conditional variational autoencoder. We specify the dynamics of the data-generating process as an ordinary differential equation (ODE) such that both the ODE and its solver are fully differentiable. This model class is highly flexible: the ODE right-hand sides can be a mixture of user-prescribed or "white-box" sub-components and neural network or "black-box" sub-components. Using stochastic optimisation, our amortised inference algorithm could seamlessly scale up to massive data collection pipelines (common in labs with robotic automation). Finally, our framework supports interpretability with respect to the underlying dynamics, as well as predictive generalization to unseen combinations of group components (also called "zero-shot" learning). We empirically validate our method by predicting the dynamic behaviour of bacteria that were genetically engineered to function as biosensors.

Geoffrey Roeder, Paul K. Grant, Andrew Phillips, Neil Dalchau, Edward Meeds

Accepted for publication at ICML 2019: arXiv link, poster link

Blog post: Efficient Inference for Dynamical Models using Variational Autoencoders

Design Motifs for Probabilistic Generative Design

Generative models can be used to produce designs that obey hard-to-specify constraints while still producing plausible examples. Recent examples of this include drug design, text with desired sentiment, or images with desired captions. However, most previous applications of generative models to design are based on bespoke, ad-hoc procedures. We give a unifying treatment of generative design based on probabilistic generative models. Some of these models can be trained end-to-end, can take advantage of both labelled and unlabelled examples, and automatically trade off between different design goals.

Geoffrey Roeder, Nathan Killoran (co-first author), Will Grathwohl, David Duvenaud

Submitted to ICLR 2018 workshop track

Backpropagation through the Void: Optimizing Control Variates for Black-Box Gradient Estimation

Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.

Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud

Published at ICLR 2018
Accepted as a contributed talk at the Deep Reinforcement Learning Symposium, NeurIPS 2017.
I gave a talk on the paper at the University of Cambridge in November, 2017

Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with respect to the variational parameters that corresponds to the score function. Removing this term produces an unbiased gradient estimator whose variance approaches zero as the approximate posterior approaches the exact posterior. We analyze the behavior of this gradient estimator theoretically and empirically, and generalize it to more complex variational distributions such as mixtures and importance-weighted posteriors.

Geoffrey Roeder, Yuhai Wu, David Duvenaud

Published at NeurIPS 2017
A short version of the paper was published at NeurIPS 2016's Advances in Approximate Bayesian Inference workshop
Andrew Miller wrote a great blog post exploring the key ideas of the paper.

MatLearn: Machine Learning Algorithm Implementations in Matlab

Link to website

I merged multiple code bases from many graduate student contributors into a publishable software package, and added a variety of unsupervised learning algorithms including sparse autoencoders, Hidden Markov Models, Linear-Gaussian State Space Models, t-Distributed Stochastic Neighbour Embedding, and Convolutional Neural Networks for image classification.

Download package