ICLR 2021: Researchers adopt teaching tricks to train neural networks like they’re students

April 23, 2021

By Ian Gormely

Members of Vector’s research community are readying themselves for the 2021 edition of the International Conference on Learning Representations (ICLR), one of the premier deep learning conferences in the world. This year’s conference will be held virtually from May 3 to May 7.

Vector Faculty Members had a number of papers accepted to the conference. Almost 3,000 papers were submitted to ICLR for consideration this year, and only a quarter of them were accepted.

Among the accepted papers from Vector Faculty is “Teaching With Commentaries,” which imagines neural networks as students. It was co-authored by Vector co-founder and Chief Scientific Advisor Geoffrey Hinton, Faculty Member David Duvenaud, and researchers Aniruddh Raghu, Maithra Raghu, and Simon Kornblith.

“Real teachers have to learn how to teach,” says Duvenaud, “and you don’t just show students a bunch of data without offering help.” Just as students need context to understand what they’re being taught, “neural networks can benefit from commentaries tailored to the way they learn.” To this end, he and his co-authors tried to build a system that automatically helps the students – the neural networks – learn better.

With this help, neural networks can learn faster and make more accurate predictions on data they have not seen (that is, they overfit less), and researchers gain some insight into how neural networks work, helping them better understand the “black box.” “We think that seeing which types of commentaries were helpful should shed some light on what the neural network was taking away from the data under different circumstances.”

When training a classifier, Duvenaud and his co-authors found that it helped to emphasize different training examples depending on the task. When there is little overlap between the classes, say pictures of cars and planes, where one has wheels and the other has wings, it was best to focus on the most stereotypical examples of each class. But when there is no hard distinction, like an arbitrary cut-off between tall and short people, the classifiers did better when the borderline examples were emphasized.

Below are abstracts and simplified summaries for many of the accepted papers co-authored by Vector Faculty Members.

An Inequality Benchmark for Evaluating Generalization in Theorem Proving

Yuhuai Wu, Albert Jiang, Jimmy Ba, and Roger Grosse.

Interactive theorem provers give a powerful way to formally verify mathematical theorems and computer software, but it is notoriously difficult and time-consuming for humans to write formal proofs. If we could train a deep learning agent to complete all or part of the proof, that would dramatically expand the scope of what can be formally verified. We developed a dataset consisting of synthetically generated mathematical inequality problems and used this dataset to assess the ability of deep learning systems to generalize to theorem statements unlike what they’ve seen before.

A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks

Renjie Liao, Raquel Urtasun, Richard Zemel

Graph neural networks (GNNs) have recently become popular in handling graph-structured data like molecule generation for drug discovery and link prediction in social networks. In this paper, we aim to understand why GNNs can generalize from training graphs to unseen testing graphs. First, we show generalization bounds for the popular variants of GNNs via a PAC-Bayesian approach. Our result reveals that the maximum node degree of graphs and spectral norm of weights govern the generalization bounds of GNNs. Moreover, our PAC-Bayes bound improves over the previous Rademacher complexity-based bound, showing a tighter value empirically on both synthetic and real-world graph datasets.

Bayesian Few-Shot Classification with One-vs-Each Polya-Gamma Augmented Gaussian Processes

Jake Snell, Richard Zemel

Training a deep image classifier is an expensive and time-consuming process. We would like to train a model once and deploy it, but in real life a classifier will encounter hard-to-classify images, including ones from classes it has never seen before. We developed a novel approach based on Gaussian processes that averages over an infinite number of models, weighted by their fit to the new data. Our algorithm is better calibrated (improved estimates of its confidence in its predictions) than previous baselines while exhibiting strong accuracy in this challenging setting.

CaPC Learning: Confidential and Private Collaborative Learning

Christopher A. Choquette-Choo, Natalie Dullerud, Adam Dziedzic, Yunxiang Zhang, Somesh Jha, Nicolas Papernot, Xiao Wang

CaPC is a protocol for collaborative machine learning with strong guarantees of privacy and confidentiality. Organizations that trained models locally can now collaborate and jointly make predictions without revealing to one another the inputs they are predicting on, their models, or their training data. CaPC is model agnostic – each of the collaborating parties can use different types of architecture. Our framework improves the fairness of models even when dealing with non-uniform data distribution, especially when combined with active learning. CaPC provides a new way to realize the requirements imposed by privacy legislation while requiring few changes to existing ML pipelines.

C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg

C-learning is a novel reinforcement learning algorithm with applications in robotics and path planning. The main objective of a C-learning agent is to learn efficient paths to prespecified goals. By training the value function with an additional parameter, the horizon, this method addresses three shortcomings of previous work: 1) it learns shorter and more efficient paths to a goal; 2) it learns tasks with less experience and training data; and 3) it finds multiple ways to reach a goal. Users can therefore choose between different paths based on their preferences for speed and reliability.
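One way to picture a value function that takes the horizon as an extra input is as the best achievable probability of reaching a goal within a given number of steps. The sketch below computes that quantity exactly by dynamic programming on a tiny chain of states. It is only an illustration: the chain world, the 0.9 success probability, and the names `step` and `accessibility` are assumptions made here, and the paper learns such a function from experience rather than computing it from a known model.

```python
# Illustrative tabular sketch; the chain world, success probability, and
# function names are assumptions, not taken from the paper.
from functools import lru_cache

N_STATES = 5          # states 0..4 on a line
SUCCESS_PROB = 0.9    # a move succeeds with this probability, otherwise the agent stays put
ACTIONS = (-1, +1)    # step left or step right


def step(state: int, action: int) -> int:
    """Intended next state, clamped to the ends of the chain."""
    return min(max(state + action, 0), N_STATES - 1)


@lru_cache(maxsize=None)
def accessibility(state: int, goal: int, horizon: int) -> float:
    """Best achievable probability of reaching `goal` from `state` within `horizon` steps."""
    if state == goal:
        return 1.0
    if horizon == 0:
        return 0.0
    # Pick the action with the highest chance of success in the remaining steps.
    return max(
        SUCCESS_PROB * accessibility(step(state, a), goal, horizon - 1)
        + (1.0 - SUCCESS_PROB) * accessibility(state, goal, horizon - 1)
        for a in ACTIONS
    )


if __name__ == "__main__":
    for h in (3, 4, 6, 10):
        print(h, round(accessibility(0, 4, h), 4))
```

Querying the same start and goal with different horizons shows the trade-off the summary mentions: a tight horizon demands a fast path but succeeds less often, while a looser one is more reliable.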

Conservative Safety Critics for Exploration

Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, Animesh Garg

This paper presents a new approach for safety in the context of reinforcement learning (RL) for robotics. RL is a learning paradigm based on trial and error, in which an agent interacts with the environment and is rewarded positively for desirable behavior and negatively for undesirable behavior, so that its desirable behavior is reinforced over time. While training robots to solve a particular task with RL, it is important to avoid undesirable behaviors that could lead to catastrophic failures, for example damaging the robot. This paper presents an algorithm for training RL agents that provably constrains the probability of catastrophic failures, thereby enabling robots to be trained safely.

Dataset Inference: Ownership Resolution in Machine Learning

Pratyush Maini, Mohammad Yaghini, Nicolas Papernot

Are you concerned that an ML model may be a stolen copy of your proprietary model? We make the pessimistic but realistic observation that one cannot prevent model stealing. Instead, in Dataset Inference, we seek to detect if an adversary has stolen the model after the fact. Our key insight is that the model owner’s most valuable intellectual property is the dataset they trained on. Therefore, no matter how an adversary attempts to steal, their model will contain information private to the victim dataset. Dataset inference uses these signals to distinguish suspect model behavior on samples from training and unseen data and determine whether an adversary has used private knowledge.

Emergent Road Rules In Multi-Agent Driving Environments

Avik Pal, Jonah Philion, Yuan-Hong Liao, Sanja Fidler

To safely share the road with human drivers, self-driving cars must abide by “road rules” that human drivers follow. “Road rules” include law-enforced rules, like the requirement that vehicles stop at red lights, as well as social rules like the implicit designation of fast lanes. We show that in simulated driving environments in which agents seek to reach their destinations quickly, the agents develop road rules that mimic road rules humans have developed. Our results suggest the feasibility of a new paradigm for self-driving in which agents trained fully in simulation can be deployed in the real world.

Latent Skill Planning for Exploration and Transfer

Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

This paper describes an approach for learning re-usable skills to solve tasks with robots efficiently. We let the robot interact with the environment and build up a world model of how the environment changes in response to the actions of the robot. Using this world model, the robot plans a sequence of high-level skills that are required to solve a particular task, for example walking from point A to B. The key idea of our approach is to learn these skills such that they can be reused for different tasks and slightly different environments. This is important for minimizing the number of interactions the robot has with the environment, which are often expensive and time-consuming.

No MCMC for me: Amortized Sampling for Fast and Stable Training of Energy-Based Models

Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Standard neural network classifiers are simply functions that take in an image and output the probability that the image belongs to each of several classes. There’s a promising alternative way to train these models, called generative modeling, that also learns how to produce realistic images at the same time. One of the advantages of this approach is that it can learn from mostly unlabeled data. However, training these models typically involves an expensive search over images. We show how a machine can learn to perform that search faster during training, letting us fit large models faster, both on large images and on the sorts of unstructured tables of data often seen in business or health settings.

Planning from Pixels using Inverse Dynamics Models

Keiran Paster, Sheila A. McIlraith, Jimmy Ba

Automatically learning to model the parts of an environment that are relevant to decision making is key to enabling deep reinforcement learning (DRL) agents to solve complicated, real-world tasks. We show that models learned by predicting actions (inverse dynamics) rather than predicting future states (forward dynamics) enable accurate modeling of environment dynamics even in complicated visual environments and can be used to help the agent plan more efficiently. Our new DRL algorithm (GLAMOR) is an exciting step towards enabling agents to model and plan in more complicated environments.

Teaching with Commentaries

Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

How do teachers learn to teach? One way that teachers help students is by providing commentaries on examples that are being shown to students. We implemented a similar idea for training neural nets. The way our teachers came up with useful commentaries was by simulating a “student” neural network learning from their examples and commentaries, and adjusting the commentaries using backprop to improve the student’s performance. Looking at these commentaries shed light on how the students learned, and which parts of the data were important for different tasks, such as classifying medical images.
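The loop described above (simulate a student, then adjust the commentary to improve the student's performance) can be sketched on a toy problem. Here the commentary is assumed to be one weight per training example, the student is a one-parameter model trained for a single gradient step, and the gradients are derived by hand rather than with an autodiff library. The data, learning rates, and function names are all hypothetical and chosen for illustration; this is not the paper's implementation.

```python
# Toy sketch with hand-derived gradients. The data, learning rates, and
# function names are assumptions chosen for illustration, not the paper's setup.
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def learn_example_weights(train, val, inner_lr=0.2, outer_lr=1.0, outer_steps=50):
    """Learn one weight per training example for a scalar student y = w * x."""
    logits = [0.0] * len(train)  # teacher parameters; sigmoid(logit) is the example weight
    for _ in range(outer_steps):
        weights = [sigmoid(p) for p in logits]
        # Simulate the student: one weighted gradient step starting from w = 0.
        w0 = 0.0
        grads = [(w0 * x - y) * x for x, y in train]  # d/dw of 0.5 * (w*x - y)^2
        w1 = w0 - inner_lr * sum(a * g for a, g in zip(weights, grads))
        # Gradient of the mean validation loss with respect to the trained student.
        dval_dw1 = sum((w1 * x - y) * x for x, y in val) / len(val)
        # Chain rule back through the student's update to each teacher logit.
        for i, (a, g) in enumerate(zip(weights, grads)):
            dw1_da = -inner_lr * g
            dval_dlogit = dval_dw1 * dw1_da * a * (1.0 - a)
            logits[i] -= outer_lr * dval_dlogit
    return [sigmoid(p) for p in logits]


# Two clean examples from y = 2x and one mislabeled example.
train_set = [(1.0, 2.0), (2.0, 4.0), (1.0, -2.0)]
val_set = [(1.0, 2.0), (3.0, 6.0)]
example_weights = learn_example_weights(train_set, val_set)
```

In this toy setting the learned weights rise for the two clean examples and fall for the mislabeled one, which is the kind of signal a person could inspect to see which data the student found useful.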

Theoretical Bounds On Estimation Error For Meta-learning

James Lucas, Mengye Ren, Irene Raissa Kameni Kameni, Toniann Pitassi, Richard Zemel

Traditionally, we assume that machine learning models are taught using the same distribution of data that we expect to see in the wild. But this isn’t very realistic. For example, we might have healthcare data from 5 hospitals and want to deploy our model in a new hospital where patient demographics and medical training differ substantially. We investigate the fundamental difficulty of this problem and prove lower-bounds on the best possible performance of any machine learning algorithm in this setting.

Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding

Sana Tonekaboni, Danny Eytan, Anna Goldenberg

Time series data are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal’s generative process to define neighborhoods in time with stationary properties and learns to distinguish neighboring samples using a debiased contrastive objective. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients’ states in settings where labeling data is practically impossible.

Wandering Within A World: Online Contextualized Few-shot Learning

Mengye Ren, Michael L. Iuzzolino, Michael C. Mozer, Richard S. Zemel

We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting, which mimics the visual experience of an agent wandering within a world. We introduce a new dataset based on large-scale indoor imagery, and propose a new model that can make use of spatiotemporal contextual information through a combination of short-term and long-term memory.

When does preconditioning help or hurt generalization?

Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

One of the most perplexing phenomena in neural net training is that the choice of optimization algorithm affects not only the speed of convergence, but also the generalization ability of the converged solution. One important optimization choice is the preconditioner, which determines how fast parameters move in different directions. We analyze the generalization properties of various preconditioners in the context of linear regression. We find that, in contrast to the popular belief that second-order optimizers generalize worse than first-order ones, the true effect is much more nuanced; we analyze various situations in which preconditioning with second-order information can help or hurt generalization.
