Photo: Left to Right – Will Grathwohl, Jesse Bettencourt, Yulia Rubanova and Ricky Chen (Absent – David Duvenaud)
Vector Faculty Members and students won two of the four Best Paper awards and a Best Student Paper Award at NeurIPS 2018, one of the world's leading machine learning conferences.
Vector co-founder and Faculty Member David Duvenaud and Vector students Jesse Bettencourt, Yulia Rubanova and Ricky Chen, all from the University of Toronto, received a Best Paper award at NeurIPS 2018 for their paper “Neural Ordinary Differential Equations”.
Additionally, Will Grathwohl, Ricky Chen and Jesse Bettencourt, in collaboration with David Duvenaud and Ilya Sutskever, won Best Student Paper at the Symposium on Advances in Approximate Bayesian Inference.
Also claiming top honours with a Best Paper award are Shai Ben-David, one of Vector’s newest Faculty Members from the University of Waterloo, Vector Postgraduate Affiliate Hassan Ashtiani, and their collaborators, for their paper “Nearly tight sample complexity bounds for learning mixtures of Gaussians via sample compression schemes”.
The thirty-second annual conference on Neural Information Processing Systems (NeurIPS) kicked off this Sunday in Montreal. NeurIPS is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers.
Nearly tight sample complexity bounds for learning mixtures of Gaussians via sample compression schemes
By Hassan Ashtiani, Shai Ben-David, Nicholas Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
We prove that Θ̃(k d^2 / ε^2) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(k d / ε^2) samples suffice, matching a known lower bound.
The upper bound is based on a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a sample compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in R^d has an efficient sample compression.
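To give a feel for how the two bounds above scale, the following is a rough sketch in Python. It evaluates only the leading terms k d^2 / ε^2 and k d / ε^2; constants and any logarithmic factors are dropped, so the outputs are not actual sample counts. The function names and the example values of k, d and ε are assumptions made purely for illustration.

```python
def general_mixture_rate(k, d, eps):
    """Leading term k * d^2 / eps^2 for mixtures of general Gaussians.

    Constants and logarithmic factors are omitted (illustrative only).
    """
    return k * d ** 2 / eps ** 2


def axis_aligned_mixture_rate(k, d, eps):
    """Leading term k * d / eps^2 for mixtures of axis-aligned Gaussians.

    Constants and logarithmic factors are omitted (illustrative only).
    """
    return k * d / eps ** 2


# Hypothetical example values: 5 components, 10 dimensions, error 0.1.
k, d, eps = 5, 10, 0.1
general = general_mixture_rate(k, d, eps)
axis_aligned = axis_aligned_mixture_rate(k, d, eps)

# The general case costs a factor of d more than the axis-aligned case.
ratio = general / axis_aligned

# Halving the target error multiplies the leading term by four.
growth = general_mixture_rate(k, d, eps / 2) / general
```

In this sketch the gap between the two settings is exactly a factor of d, and tightening ε is the more expensive knob, since the leading term grows with 1 / ε^2.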
Neural Ordinary Differential Equations
By Ricky Chen*, Yulia Rubanova*, Jesse Bettencourt*, David Duvenaud (*equal contribution)
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
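The forward pass described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy `dynamics` function, its weights, and the fixed-step Euler integrator `odeint_euler` are all assumptions chosen for illustration, and the training procedure (backpropagating through the solver) is not shown. The sketch only demonstrates two points from the abstract: a single Euler step of size 1 looks like a residual block, and using more solver steps trades speed for numerical precision.

```python
import math


def dynamics(h, t, params):
    """Hypothetical one-layer network giving dh/dt = tanh(W h + b)."""
    W, b = params
    return [
        math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
        for row, b_i in zip(W, b)
    ]


def odeint_euler(f, h0, t0, t1, params, steps):
    """Fixed-step Euler integrator, standing in for a black-box ODE solver."""
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        dh = f(h, t, params)
        h = [h_i + dt * dh_i for h_i, dh_i in zip(h, dh)]
        t += dt
    return h


# Illustrative (made-up) weights and initial hidden state.
params = ([[0.0, -1.0], [1.0, 0.0]], [0.0, 0.0])
h0 = [0.5, -0.2]

# One Euler step of size 1 is the residual update h + f(h).
residual_block = odeint_euler(dynamics, h0, 0.0, 1.0, params, steps=1)

# More steps cost more function evaluations but track the ODE more closely.
coarse = odeint_euler(dynamics, h0, 0.0, 1.0, params, steps=10)
fine = odeint_euler(dynamics, h0, 0.0, 1.0, params, steps=1000)
```

A practical model would swap the Euler loop for an adaptive solver, which is what lets the evaluation strategy adapt to each input.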
Best Student Paper
Symposium on Advances in Approximate Bayesian Inference 2018. Oral Presentation
FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
By Will Grathwohl*, Ricky T. Q. Chen*, Jesse Bettencourt, Ilya Sutskever, David Duvenaud. (*equal contribution)
A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson’s trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
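The trace estimator mentioned above rests on a simple identity: if the entries of a random vector ε are independent with zero mean and unit variance, then the expected value of εᵀAε equals the trace of A. The following is a minimal sketch of that idea in plain Python, not the FFJORD implementation. The helper names, the example matrix and the number of probes are assumptions for illustration; in the paper's setting A would be the Jacobian of the dynamics, accessed only through matrix-vector products rather than stored explicitly.

```python
import random


def matvec(A, v):
    """Dense matrix-vector product; stands in for a Jacobian-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def hutchinson_trace(matvec_fn, dim, num_probes, rng):
    """Estimate Tr(A) as the average of eps^T A eps over Rademacher probes."""
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        eps = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        A_eps = matvec_fn(eps)
        total += sum(e * y for e, y in zip(eps, A_eps))
    return total / num_probes


rng = random.Random(0)

# Hypothetical 3x3 matrix used only to check the estimator.
A = [
    [2.0, 0.5, -1.0],
    [0.3, -1.0, 0.4],
    [0.7, 0.2, 3.0],
]
exact = sum(A[i][i] for i in range(3))
estimate = hutchinson_trace(lambda v: matvec(A, v), 3, 5000, rng)
```

The estimator never needs the matrix itself, only products with probe vectors, which is why it scales to high-dimensional models where forming the full Jacobian would be too expensive.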