David Duvenaud
  • Associate Professor, Department of Statistical Sciences and Department of Computer Science, Faculty of Arts & Science, University of Toronto

    Faculty Member, Vector Institute

    Canada CIFAR Artificial Intelligence Chair

    Co-founder, Invenia

    Canada Research Chair in Generative Models

    Website | Google Scholar

Research Interests

  • Approximate inference
  • Automatic model-building
  • Model-based optimization


David Duvenaud is an associate professor in computer science and statistics at the University of Toronto, and holds a Canada Research Chair in generative models. He is also a founding member of the Vector Institute. He completed his postdoc at Harvard University, where he worked on hyperparameter optimization, variational inference, deep learning and automatic chemical design. He did his Ph.D. at the University of Cambridge, studying Bayesian nonparametrics with Zoubin Ghahramani and Carl Rasmussen. David spent two summers on the machine vision team at Google Research, and also co-founded Invenia, an energy forecasting and trading company.

Research Activity and News


Stochastic Hyperparameter Optimization Through Hypernetworks. Models are usually tuned by nesting optimization of model weights inside the optimization of hyperparameters. We collapse this nested optimization into joint stochastic optimization of weights and hyperparameters. Our method trains a neural net to output approximately optimal weights as a function of hyperparameters. This method converges to locally optimal weights and hyperparameters for sufficiently large hypernetworks. We compare this method to standard hyperparameter optimization strategies and demonstrate its effectiveness for tuning thousands of hyperparameters. Jonathan Lorraine, David Duvenaud, arxiv | bibtex | code


Isolating Sources of Disentanglement in Variational Autoencoders. We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our β-TCVAE (Total Correlation Variational Autoencoder), a refinement of the state-of-the-art β-VAE objective for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement when the latent variable model is trained using our framework. Tian Qi Chen, Xuechen Li, Roger Grosse, David Duvenaud, arxiv
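
As a sketch of the decomposition described above, assume a factorized prior p(z) = ∏_j p(z_j) and let n denote a uniformly drawn training-example index (both the notation and the term labels are chosen here for illustration). The KL term of the dataset-averaged evidence lower bound then splits into three parts, the middle one being the total correlation:

```latex
\mathbb{E}_{p(n)}\Big[\mathrm{KL}\big(q(z \mid n)\,\|\,p(z)\big)\Big]
= \underbrace{\mathrm{KL}\big(q(z, n)\,\|\,q(z)\,p(n)\big)}_{\text{mutual information between } z \text{ and } n}
+ \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\textstyle\prod_j q(z_j)\Big)}_{\text{total correlation}}
+ \underbrace{\textstyle\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Here q(z, n) = q(z | n) p(n) and q(z) = Σ_n q(z | n) p(n). Placing a β weight on the total-correlation term alone, rather than on the whole KL as in β-VAE, is consistent with the objective the abstract names.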


Noisy Natural Gradient as Variational Inference. Bayesian neural nets combine the flexibility of deep learning with uncertainty estimation, but are usually approximated using a fully-factorized Gaussian. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational Gaussian posterior. This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, allowing us to scale to modern-size convnets. Our noisy K-FAC algorithm makes better predictions and has better-calibrated uncertainty than existing methods. This leads to more efficient exploration in active learning and reinforcement learning. Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse, arxiv


Backpropagation through the Void: Optimizing control variates for black-box gradient estimation. We learn low-variance, unbiased gradient estimators for any function of random variables. We backprop through a neural net surrogate of the original function, which is optimized to minimize gradient variance during the optimization of the original objective. We train discrete latent-variable models, and do continuous and discrete reinforcement learning with an adaptive, action-conditional baseline. Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud International Conference on Learning Representations, 2018 arxiv | code | slides | bibtex
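
The idea of tuning a control variate to reduce gradient variance can be illustrated with a much simpler stand-in than the neural-net surrogate described above: a single scalar baseline for the score-function estimator of a Bernoulli parameter. Everything in this sketch (the objective `f`, the threshold 0.45, the names `score`, `fit_baseline` and `estimate`, and the step sizes) is an assumption made for illustration, not the paper's setup.

```python
import random

def f(b, t=0.45):
    """Toy objective evaluated on a binary sample b (an illustrative assumption)."""
    return (b - t) ** 2

def score(b, theta):
    """d/dtheta of log Bernoulli(b; theta)."""
    return 1.0 / theta if b == 1 else -1.0 / (1.0 - theta)

def fit_baseline(theta, steps=5000, lr=0.01, seed=0):
    """Fit a scalar baseline c by SGD on the squared gradient estimate.

    The estimator g = (f(b) - c) * score has the same mean for every c,
    so minimizing E[g^2] over c minimizes its variance.
    """
    rng = random.Random(seed)
    c = 0.0
    for _ in range(steps):
        b = 1 if rng.random() < theta else 0
        s = score(b, theta)
        g = (f(b) - c) * s
        c -= lr * (-2.0 * g * s)  # d(g^2)/dc = 2 * g * dg/dc = -2 * g * s
    return c

def estimate(theta, c, n=20000, seed=1):
    """Empirical mean and variance of the estimator with a frozen baseline c."""
    rng = random.Random(seed)
    gs = []
    for _ in range(n):
        b = 1 if rng.random() < theta else 0
        gs.append((f(b) - c) * score(b, theta))
    mean = sum(gs) / n
    var = sum((g - mean) ** 2 for g in gs) / n
    return mean, var

if __name__ == "__main__":
    theta = 0.3
    c = fit_baseline(theta)
    print("no baseline     :", estimate(theta, 0.0))
    print("learned baseline:", estimate(theta, c))
```

Both estimators target the same gradient, f(1) - f(0); only the variance should differ. A scalar baseline cannot depend on the sample, which is the limitation a learned surrogate of the function is meant to address.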


Automatic chemical design using a data-driven continuous representation of molecules. We develop a molecular autoencoder, which converts discrete representations of molecules to and from a continuous representation. This allows gradient-based optimization through the space of chemical compounds. Continuous representations also let us generate novel chemicals by interpolating between molecules. Rafa Gómez-Bombarelli, Jennifer Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik American Chemical Society Central Science, 2018 arxiv | bibtex | slides | code


Sticking the landing: Simple, lower-variance gradient estimators for variational inference. We give a simple recipe for reducing the variance of the gradient of the variational evidence lower bound. The entire trick is just removing one term from the gradient. Removing this term leaves an unbiased gradient estimator whose variance approaches zero as the approximate posterior approaches the exact posterior. We also generalize this trick to mixtures and importance-weighted posteriors. Geoff Roeder, Yuhuai Wu, David Duvenaud Neural Information Processing Systems, 2017 arxiv | bibtex | code
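
The "remove one term" recipe can be checked by hand on a one-dimensional example. The sketch below assumes a toy model, p(z) = N(0, 1) and p(x | z) = N(z, 1), with a Gaussian approximate posterior q(z) = N(mu, sigma^2) sampled as z = mu + sigma * eps, and looks only at the gradient with respect to mu. The function names are hypothetical and the derivatives are written out manually rather than taken from any library.

```python
import random
import statistics

def grad_mu_estimators(eps, mu, sigma, x=1.0):
    """Return (total, path) single-sample ELBO gradient estimates w.r.t. mu.

    Assumed toy model: p(z) = N(0, 1), p(x|z) = N(z, 1), q(z) = N(mu, sigma^2).
    """
    z = mu + sigma * eps
    dlogp_dz = -z + (x - z)              # d/dz [log p(z) + log p(x|z)]
    dlogq_dz = -(z - mu) / sigma ** 2    # d/dz log q(z), parameters held fixed
    path = dlogp_dz - dlogq_dz           # gradient flowing through the sample z
    score = (z - mu) / sigma ** 2        # d/dmu log q(z), sample z held fixed
    total = path - score                 # standard estimator; dropping `score` gives `path`
    return total, path

def gradient_variances(mu, sigma, n=5000, seed=0):
    """Empirical variance of each estimator over n noise draws."""
    rng = random.Random(seed)
    pairs = [grad_mu_estimators(rng.gauss(0.0, 1.0), mu, sigma) for _ in range(n)]
    total_var = statistics.pvariance([t for t, _ in pairs])
    path_var = statistics.pvariance([p for _, p in pairs])
    return total_var, path_var

if __name__ == "__main__":
    # For x = 1 the exact posterior of this toy model is N(0.5, 0.5).
    print(gradient_variances(mu=0.5, sigma=0.5 ** 0.5))
```

The score term has zero expectation, so dropping it does not bias the estimator. At the exact posterior the path-only estimate is zero for every draw up to floating-point rounding, while the standard estimate still varies with eps, matching the behaviour the abstract describes.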


Reinterpreting importance-weighted autoencoders. The standard interpretation of importance-weighted autoencoders is that they maximize a tighter, multi-sample lower bound than the standard evidence lower bound. We give an alternate interpretation: they optimize the standard lower bound, but using a more complex distribution, which we show how to visualize. Chris Cremer, Quaid Morris, David Duvenaud ICLR Workshop track, 2017, arxiv | bibtex
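
The "tighter, multi-sample lower bound" of the standard interpretation can be seen numerically on a toy model. This sketch assumes p(z) = N(0, 1), p(x | z) = N(z, 1), and a deliberately poor proposal q(z) equal to the prior; the helper names and sample counts are hypothetical choices for illustration.

```python
import math
import random

def log_normal(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

def multi_sample_bound(x, k, rng, q_mean=0.0, q_var=1.0):
    """One Monte Carlo draw of log((1/k) * sum_i w_i), with w_i = p(x, z_i) / q(z_i)."""
    log_w = []
    for _ in range(k):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_w.append(
            log_normal(z, 0.0, 1.0)        # log p(z)
            + log_normal(x, z, 1.0)        # log p(x | z)
            - log_normal(z, q_mean, q_var) # log q(z)
        )
    m = max(log_w)  # log-sum-exp shift for numerical stability
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / k)

def average_bound(x, k, n_rep=2000, seed=0):
    """Average the k-sample bound over n_rep independent draws."""
    rng = random.Random(seed)
    return sum(multi_sample_bound(x, k, rng) for _ in range(n_rep)) / n_rep

if __name__ == "__main__":
    x = 1.0
    print("k = 1 (standard ELBO):", average_bound(x, 1))
    print("k = 50               :", average_bound(x, 50))
    print("exact log p(x)       :", log_normal(x, 0.0, 2.0))
```

With k = 1 the quantity is the ordinary evidence lower bound. In this toy model the marginal is available in closed form, p(x) = N(x; 0, 2), so the gap closing as k grows can be compared against the exact value.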



Composing graphical models with neural networks for structured representations and fast inference. We propose a general modeling and inference framework that combines the complementary strengths of probabilistic graphical models and deep learning methods. Our model family composes latent graphical models with neural network observation likelihoods. All components are trained simultaneously. We use this framework to automatically segment and categorize mouse behavior from raw depth video. Matthew Johnson, David Duvenaud, Alex Wiltschko, Bob Datta, Ryan P. Adams, Neural Information Processing Systems, 2016 preprint | video | code | slides | bibtex | animation
