Talking to Machines: new ML model allows for more “expressive” communication between researchers and AI systems

July 15, 2021

By Ian Gormely

The 2021 edition of the International Conference on Machine Learning (ICML), held virtually from July 18 to 24, will once again bring together the machine learning community to share and learn about cutting-edge ML research.

Among the papers co-authored by Vector researchers at this year’s conference is “LTL2Action: Generalizing LTL Instructions for Multi-Task RL,” by Pashootan Vaezipoor, Andrew Li, Rodrigo Toro Icarte, and Canada CIFAR AI Chair and Vector Faculty Member Sheila McIlraith. It makes strides toward building a machine learning (ML) system that can perform a diverse range of tasks and follow open-ended instructions. “We want a human to be able to tell an AI system, such as a robot or a phone, what they want the AI to do in a manner that’s simple and natural for the human,” says co-author Andrew Li. But the ambiguity and open-ended nature of the way humans naturally speak and write can be confusing to machines.

The group turned to linear temporal logic (LTL), an expressive formal language that lacks the ambiguity of natural language, but that can still communicate the kinds of instructions required by ML systems. “You have this very rich language that is quite useful when working in realms such as robotics,” says Pashootan Vaezipoor. “The possibilities are endless.” 
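
To make that concrete, here is the kind of instruction LTL can express. The task and its atomic propositions (key, door, lava) are hypothetical, chosen purely for illustration:

```latex
% "Eventually get the key and then reach the door, while never touching lava."
\varphi \;=\; \mathsf{F}\big(\mathit{key} \,\wedge\, \mathsf{F}\,\mathit{door}\big) \;\wedge\; \mathsf{G}\,\neg\mathit{lava}
```

Here F (“finally”) means a condition must eventually hold and G (“globally”) means it must always hold — exactly the kind of unambiguous temporal structure that natural language leaves implicit.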

The work builds on previous research by Toro Icarte and others in McIlraith’s research group that used LTL and other formal languages as a means of communicating what a human wants an ML system to do. “Usually you need huge amounts of labeled data or interactions from a human being to train a model like this,” says McIlraith. LTL2Action is different: it generalizes to never-before-seen instructions spanning more than 10³⁹ possible tasks without requiring any human feedback. “That’s something that’s really powerful about this work.”

Below are abstracts and simplified summaries for many of the accepted papers co-authored by Vector Faculty Members. 

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar

In real-world multi-agent systems, agents with different capabilities may join or leave without altering the team’s overarching goals. Coordinating teams with such dynamic composition is challenging: the optimal team strategy varies with the composition. We propose COPA, a coach-player framework to tackle this problem. We assume the coach has a global view of the environment and coordinates the players, who only have partial views, by distributing individual strategies. Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players. We validate our methods on a resource collection task, a rescue game, and the StarCraft micromanagement tasks. We demonstrate zero-shot generalization to new team compositions. Our method achieves comparable or better performance than the setting where all players have a full view of the environment. Moreover, we see that the performance remains high even when the coach communicates as little as 13% of the time using the adaptive communication strategy.

Efficient Statistical Tests: A Neural Tangent Kernel Approach
Sheng Jia, Ehsan Nezhadarya, Yuhuai Wu, Jimmy Ba 

Are you sure your ML model can reliably make predictions on the test data?  What if the accuracy is low simply because your test data is inherently different from the training data?  

In our latest work, “Efficient Statistical Tests: A Neural Tangent Kernel Approach,” we show an efficient way of detecting discrepancy between two sets of samples using a Neural Tangent Kernel (NTK) based two-sample test. Using our approach, ML practitioners can quickly identify whether their test samples come from the same distribution as the training samples. The main advantage of our method is that the kernel requires no training, while still retaining the compositionality that neural networks provide for high-dimensional data. Our testing process will quickly identify whether your model is ready to be deployed on new tasks.
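
As a rough illustration of the underlying machinery, the sketch below implements a generic kernel two-sample (MMD) test with a permutation-based p-value. The Gaussian kernel, bandwidth, and sample sizes are stand-in assumptions; the paper’s contribution is using an NTK in place of such a hand-picked kernel.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Stand-in kernel; the paper uses a Neural Tangent Kernel instead."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(x, y, kernel=gaussian_kernel):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Compare the observed statistic to a permutation-based null distribution.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 5))   # "training" samples
y = rng.normal(0.5, 1.0, size=(200, 5))   # shifted "test" samples
observed = mmd2(x, y)
pooled = np.vstack([x, y])
null = []
for _ in range(200):
    rng.shuffle(pooled)                    # re-split the pooled samples
    null.append(mmd2(pooled[:200], pooled[200:]))
p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"MMD^2 = {observed:.4f}, permutation p-value = {p_value:.3f}")
```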

Environment Inference for Invariant Learning
Elliot Creager, Jörn-Henrik Jacobsen, Richard Zemel

While ML systems tend to perform well in contexts similar to the training data, they may fail when deployed in new settings that differ subtly from those seen before. Invariant learning seeks to address this type of brittleness by learning features that are “invariant” to context changes during training. Unfortunately, this requires that the training data be manually partitioned into “environments” that encode the relevant contexts. To tackle the more realistic setting where this information is unavailable, we propose Environment Inference for Invariant Learning (EIIL), a method that infers worst-case environment labels directly from the training data, which can improve downstream Invariant Learning methods in some settings.
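
A minimal sketch of how the environment-inference step might look, loosely following the IRMv1 penalty: given a fixed reference model’s per-example losses, learn soft environment assignments that maximize the penalty. The two-environment assumption, binary task, and optimizer settings are simplifications, not the paper’s exact procedure.

```python
import torch
import torch.nn.functional as F

def infer_environments(logits, labels, steps=500, lr=0.01):
    """Sketch: infer soft environment assignments adversarially (2 envs)."""
    scale = torch.ones(1, requires_grad=True)  # dummy scale for IRMv1
    losses = F.binary_cross_entropy_with_logits(
        logits * scale, labels.float(), reduction="none")
    w = torch.randn(len(logits), requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        q = torch.sigmoid(w)  # soft membership in environment 1
        risks = [(q * losses).sum() / q.sum(),
                 ((1 - q) * losses).sum() / (1 - q).sum()]
        # IRMv1 penalty: squared gradient of each env's risk wrt the scale.
        penalty = sum(torch.autograd.grad(r, scale, create_graph=True)[0] ** 2
                      for r in risks)
        opt.zero_grad()
        (-penalty).backward(retain_graph=True)  # ascend the penalty
        opt.step()
    return torch.sigmoid(w).detach()  # inferred environment assignments
```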

f-Domain-Adversarial Learning: Theory and Algorithms
David Acuna, Guojun Zhang, Marc T. Law, Sanja Fidler

Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain, and a related labeled dataset. In this paper, we introduce a novel and general domain-adversarial framework. Specifically, we derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences. It recovers the theoretical results from Ben-David et al. (2010a) as a special case and supports divergences used in practice. Based on this bound, we derive a new algorithmic framework that introduces a key correction in the original adversarial training method of Ganin et al. (2016). We show that many regularizers and ad-hoc objectives introduced in recent years in this framework are then not required to achieve performance comparable to (if not better than) state-of-the-art domain-adversarial methods. Experimental analysis conducted on real-world natural language and computer vision datasets shows that our framework outperforms existing baselines and obtains the best results for f-divergences not previously considered in domain-adversarial learning.
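
The discrepancy measure rests on the standard variational characterization of f-divergences (stated here from the general literature, where f* is the convex conjugate of f and the supremum ranges over measurable functions T):

```latex
D_f(P \,\|\, Q) \;=\; \sup_{T}\; \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q}\big[f^{*}(T(x))\big]
```

Restricting T to a neural network class turns the supremum into an adversarially trained discrepancy estimate, which is what makes the bound usable inside a domain-adversarial training loop.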

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection
Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez

Training on datasets with long-tailed distributions has been challenging for major recognition tasks such as classification and detection. To deal with this challenge, image resampling is typically introduced as a simple but effective approach. However, we observe that long-tailed detection differs from classification since multiple classes may be present in one image. As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level. We address object-level resampling by introducing an object-centric memory replay strategy based on dynamic, episodic memory banks. Our proposed strategy has two benefits: 1) convenient object-level resampling without significant extra computation, and 2) implicit feature-level augmentation from model updates. We show that image-level and object-level resamplings are both important, and thus unify them with a joint resampling strategy (RIO). Our method outperforms state-of-the-art long-tailed detection and segmentation methods on LVIS v0.5 across various backbones.
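
As an illustration of what object-level resampling with episodic memory banks could look like in code, here is a generic sketch, not the paper’s RIO implementation; the per-class capacity and sampling rule are assumptions:

```python
import random
from collections import defaultdict, deque

class ObjectMemoryBank:
    """Minimal sketch of an episodic, per-class object memory for replay."""

    def __init__(self, capacity_per_class=64):
        self.banks = defaultdict(lambda: deque(maxlen=capacity_per_class))

    def store(self, class_id, object_feature):
        # Dynamic memory: old entries are evicted as training progresses,
        # so replayed features implicitly reflect recent model updates.
        self.banks[class_id].append(object_feature)

    def sample_balanced(self, per_class=4):
        # Draw an equal number of object features per class, rebalancing
        # rare (tail) classes at the object level rather than the image level.
        batch = []
        for class_id, bank in self.banks.items():
            k = min(per_class, len(bank))
            batch.extend((class_id, f) for f in random.sample(list(bank), k))
        return batch
```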

Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison

Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back coding algorithms from tighter variational bounds. The key idea is to exploit extended space representations of Monte Carlo estimators of the marginal likelihood. Naively applied, our schemes would require more initial bits than the standard bits-back coder, but we show how to drastically reduce this additional cost with couplings in the latent space. When parallel architectures can be exploited, our coders can achieve better rates than bits-back with little additional cost. We demonstrate improved lossless compression rates in a variety of settings, especially in out-of-distribution or sequential data compression.
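
In equations, the overhead being removed is the usual bits-back gap: the expected code length equals the negative evidence lower bound, which exceeds the ideal rate −log p(x) by exactly a KL term:

```latex
\mathbb{E}\,[\text{bits}] \;=\; \mathbb{E}_{q(z \mid x)}\!\left[-\log \frac{p(x,z)}{q(z \mid x)}\right] \;=\; -\log p(x) \;+\; \mathrm{KL}\!\big(q(z \mid x)\,\big\|\,p(z \mid x)\big)
```

Tighter variational bounds shrink the KL term, which is why coding schemes derived from them approach the ideal rate.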

Label-Only Membership Inference Attacks
Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, Nicolas Papernot

Are you tempted to avoid the utility cost of a theoretical privacy guarantee by instead defeating specific attacks like membership inference? We identify a group of defenses, which we call “confidence-masking,” and show that they are not viable against membership inference attacks. We do this by creating the first label-only membership inference attacks and showing that these adaptive attacks can successfully infer membership despite defenses that mask confidence scores. Using our attacks, we provide a rigorous evaluation of the efficacy of many defenses and show that differentially private training with transfer learning attains the best trade-off between privacy leakage and model performance.
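
A sketch of the intuition behind a label-only attack: training points tend to lie farther from the decision boundary, so their predicted label is more robust to perturbation. The perturbation scale, query budget, and `predict_label` oracle below are illustrative assumptions, not the paper’s exact attack.

```python
import numpy as np

def label_only_membership_score(predict_label, x, n_queries=50, sigma=0.05,
                                rng=None):
    """Membership signal from label robustness under random perturbations.

    `predict_label` is a hypothetical black box returning only hard labels,
    which is all a confidence-masking defense exposes.
    """
    rng = rng or np.random.default_rng()
    base = predict_label(x)
    agreements = 0
    for _ in range(n_queries):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        agreements += int(predict_label(noisy) == base)
    # Higher robustness -> more likely the point was a training member.
    return agreements / n_queries
```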

Learning a Universal Template for Few-shot Dataset Generalization
Eleni Triantafillou, Hugo Larochelle, Richard Zemel, Vincent Dumoulin

Few-shot learning is the problem of learning new concepts from only a handful of labeled examples. This poses a significant challenge for traditional machine learning algorithms that are data hungry and rely on the availability of large labeled datasets, in stark contrast with humans’ flexible learning capabilities. Our work attacks a particularly challenging few-shot learning scenario, where we’re given labeled examples from a diverse set of datasets (including images of flowers, mushrooms, textures, sketches, handwritten characters, etc.), towards the goal of building a model that can then few-shot learn classes originating from previously-unseen datasets. Compared to the well-studied few-shot classification problem, this formulation presents the additional challenge that the new classes are thematically and visually distinct from those that were available for training. For this, we propose to learn a model ‘template’ (a subset of layers of a neural network) that, when ‘filled in’ appropriately (by choosing values for the remaining layers), defines models that work well for different data distributions, allowing it to quickly learn diverse sets of classes. We propose a scalable and efficient instantiation of this idea which achieves strong results on few-shot learning of diverse sets of classes.
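
One plausible way to realize such a template in code is FiLM-style conditioning, where shared layers are modulated by a small number of per-dataset parameters. The sketch below illustrates that general pattern rather than the paper’s exact architecture:

```python
import torch
import torch.nn as nn

class FiLMConv(nn.Module):
    """A 'template' conv layer specialized by FiLM parameters (gamma, beta)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # shared template
        self.norm = nn.BatchNorm2d(out_ch, affine=False)

    def forward(self, x, gamma, beta):
        # gamma, beta: per-dataset vectors of shape (out_ch,) that "fill in"
        # the template for a particular data distribution.
        h = self.norm(self.conv(x))
        return torch.relu(gamma[None, :, None, None] * h
                          + beta[None, :, None, None])
```

Training only the small (gamma, beta) vectors per dataset keeps the vast majority of parameters shared, which is what lets a single template cover many data distributions.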

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

“Inductive bias” refers to the ways that a model generalizes to situations it hasn’t seen before. While inductive biases are typically specified through a model architecture or prior distribution, we give an alternative approach for specifying useful inductive biases: defining a set of synthetic auxiliary tasks for which that inductive bias is helpful. We successfully apply this approach to several benchmarks of learning mathematical reasoning.

LTL2Action: Generalizing LTL Instructions for Multi-Task RL
Pashootan Vaezipoor, Andrew Li, Rodrigo Toro Icarte, Sheila McIlraith

Imagine a multi-purpose AI that can perform diverse tasks and follow open-ended language instructions. Typically, training such an AI to understand and adhere to language commands is a labor-intensive process, requiring substantial human feedback — either interactively or as a large corpus of human-labelled instructions. We instead propose a deep reinforcement learning (RL) framework that utilizes unambiguous, compositional formal language instructions, enabling automatic generation of feedback for training. Instructions are specified in Linear Temporal Logic (LTL), which can express complex temporal patterns in a human-interpretable syntax. We demonstrate in complex robotics domains that our RL agent learns how to interpret the language, enabling it to generalize to never-before-seen instructions from a diverse space of more than 10³⁹ possible tasks.

Markpainting: Adversarial Machine Learning meets Inpainting
David Khachaturov, Ilia Shumailov, Yiren Zhao, Nicolas Papernot, Ross Anderson

Inpainting is a learned interpolation technique that is based on generative modeling and used to populate masked or missing pieces in an image; it has wide applications in picture editing and retouching. Recently, inpainting has started being used for watermark removal, raising concerns. In this paper we study how to manipulate it using our markpainting technique. First, we show how an image owner with access to an inpainting model can augment their image in such a way that any attempt to edit it using that model will add arbitrary visible information. We find that we can target multiple different models simultaneously with our technique. This can be designed to reconstitute a watermark if the editor had been trying to remove it. Second, we show that our markpainting technique is transferable to models that have different architectures or were trained on different datasets, so watermarks created using it are difficult for adversaries to remove. Markpainting is novel and can be used as a manipulation alarm that becomes visible in the event of inpainting.

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
Will Grathwohl, Milad Hashemi, Kevin Swersky, David Duvenaud, Chris Maddison

Often scientists want to automatically explore all possible hypotheses that could explain some data. But usually most hypotheses fit the data very poorly, and it’s difficult to find the ones that are compatible with the data. This is especially true in cases where the hypotheses have many degrees of freedom, but in the last few decades, methods that search based on the gradient of the fit of the hypothesis have scaled to tens of thousands or millions of degrees of freedom. We worked out a simple method to apply this gradient-based search to hypotheses that are described by discrete choices. We demonstrate this approach on modeling protein-folding data.
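
In more technical terms, the paper samples from discrete distributions using gradient-informed proposals. The sketch below illustrates the core idea on binary variables: use the gradient of the log-probability to cheaply estimate how much each single-bit flip would change the fit, propose flips of promising bits, and correct with a Metropolis-Hastings step. The toy Ising-like target and loop counts are made up for illustration.

```python
import torch

def gradient_informed_flip_step(f, x):
    """One gradient-informed Metropolis-Hastings step on a {0,1} vector.

    `f` maps a binary tensor to a scalar (unnormalized) log-probability.
    """
    x = x.detach().requires_grad_(True)
    fx = f(x)
    (grad,) = torch.autograd.grad(fx, x)
    # First-order estimate of f(flip_i(x)) - f(x) for every bit i.
    delta = -(2 * x - 1) * grad
    q_fwd = torch.distributions.Categorical(logits=delta.detach() / 2)
    i = q_fwd.sample()                      # pick a promising bit to flip
    x_new = x.detach().clone()
    x_new[i] = 1 - x_new[i]
    # Reverse-proposal term for the Metropolis-Hastings correction.
    x_new = x_new.requires_grad_(True)
    fx_new = f(x_new)
    (grad_new,) = torch.autograd.grad(fx_new, x_new)
    delta_new = -(2 * x_new - 1) * grad_new
    q_rev = torch.distributions.Categorical(logits=delta_new.detach() / 2)
    log_accept = (fx_new - fx).detach() + q_rev.log_prob(i) - q_fwd.log_prob(i)
    return x_new.detach() if torch.rand(()) < log_accept.exp() else x.detach()

# Toy target: an Ising-like quadratic log-probability (weights are made up).
D = 16
W = torch.randn(D, D) * 0.1
f = lambda x: x @ W @ x
x = (torch.rand(D) < 0.5).float()
for _ in range(100):
    x = gradient_informed_flip_step(f, x)
```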

PID Accelerated Value Iteration Algorithm
Amir-massoud Farahmand, Mohammad Ghavamzadeh

How can we accelerate the computation of the optimal policy for reinforcement learning (RL) agents? Many RL algorithms are based on a fundamental algorithm called Value Iteration (VI). Value Iteration, however, is quite slow for problems with long planning horizons, in which the agent must look far into the future. This work proposes modifications to VI to accelerate its convergence. The key insight is that VI can be interpreted as a dynamical system, which can then be modified using tools from control theory, such as the Proportional-Integral-Derivative (PID) controller, to design faster variants of the VI algorithm.
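
A minimal sketch of that insight on a tabular MDP, adding proportional and derivative feedback terms to the standard Bellman backup. The gains below are illustrative guesses rather than the paper’s tuned or adaptive values; plain VI is recovered with kp = 1, kd = 0.

```python
import numpy as np

def pd_accelerated_value_iteration(P, R, gamma, kp=1.0, kd=0.2, iters=500):
    """Value iteration with P and D feedback terms (sketch).

    P has shape (A, S, S): P[a, s, s'] is the transition probability.
    R has shape (S, A): immediate rewards.
    """
    A, S, _ = P.shape
    V, V_prev = np.zeros(S), np.zeros(S)
    for _ in range(iters):
        # Bellman optimality backup: (T V)(s) = max_a [R(s,a) + gamma * E[V(s')]]
        TV = (R + gamma * np.einsum("ask,k->sa", P, V)).max(axis=1)
        # Treat VI as a dynamical system and add PD feedback on the update.
        V_next = V + kp * (TV - V) + kd * (V - V_prev)
        V_prev, V = V, V_next
    return V

# Toy usage on a random MDP (transition probabilities normalized per row).
rng = np.random.default_rng(0)
P = rng.random((3, 10, 10)); P /= P.sum(-1, keepdims=True)
R = rng.random((10, 3))
print(pd_accelerated_value_iteration(P, R, gamma=0.99)[:5])
```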

Principled Exploration via Optimistic Bootstrapping and Backward Induction
Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

One principled approach for provably efficient exploration is incorporating the upper confidence bound (UCB) into the value function as a bonus. However, UCB is designed for linear and tabular settings and is incompatible with Deep Reinforcement Learning (DRL). In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I). OB2I constructs a general-purpose UCB-bonus through non-parametric bootstrap in DRL. The UCB-bonus estimates the epistemic uncertainty of state-action pairs for optimistic exploration. We build theoretical connections between the proposed UCB-bonus and the LSVI-UCB in a linear setting. We propagate future uncertainty in a time-consistent manner through episodic backward update, which exploits the theoretical advantage and empirically improves sample efficiency. Our experiments in the MNIST maze and Atari suite suggest that OB2I outperforms several state-of-the-art exploration approaches.

S2SD: Simultaneous Similarity-based Self-Distillation for Deep Metric Learning
Karsten Roth, Timo Milbich, Björn Ommer, Joseph Paul Cohen, Marzyeh Ghassemi

Deep Metric Learning (DML) provides a crucial tool for visual similarity and zero-shot applications by learning generalizing embedding spaces, although recent work in DML has shown strong performance saturation across training objectives. However, generalization capacity is known to scale with the embedding space dimensionality. Unfortunately, high-dimensional embeddings also create higher retrieval cost for downstream applications. To remedy this, we propose Simultaneous Similarity-based Self-distillation (S2SD). S2SD extends DML with knowledge distillation from auxiliary, high-dimensional embedding and feature spaces to leverage complementary context during training, while retaining the original test-time cost and adding negligible training-time overhead. Experiments and ablations across different objectives and standard benchmarks show that S2SD offers notable improvements of up to 7% in Recall@1, while also setting a new state of the art.
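
One plausible reading of the distillation objective in code: match the batch self-similarity distribution of the low-dimensional embedding to that of a detached, higher-dimensional auxiliary embedding. The cosine similarity and temperature below are assumptions, not the paper’s exact formulation.

```python
import torch
import torch.nn.functional as F

def similarity_self_distillation_loss(z_low, z_high, temperature=1.0):
    """Sketch: distill high-dim batch similarities into a low-dim embedding."""
    def sim_logprobs(z):
        z = F.normalize(z, dim=1)
        sims = z @ z.t() / temperature       # (B, B) cosine similarities
        sims.fill_diagonal_(-1e9)            # exclude self-similarity
        return F.log_softmax(sims, dim=1)
    # KL between similarity distributions; the high-dim side is the teacher.
    return F.kl_div(sim_logprobs(z_low), sim_logprobs(z_high).detach(),
                    log_target=True, reduction="batchmean")
```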

Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

Gaussian processes define distributions over functions. We introduce the Harmonic Kernel Decomposition, which decomposes a GP which obeys certain symmetries into a sum of orthogonal GPs. Using this decomposition, we give an inference algorithm for GPs which is more scalable than previous approaches.

Segmenting Hybrid Trajectories with Latent ODEs
Ruian Shi, Quaid Morris

Hybrid trajectories are a type of time series data which contain sudden changes in how the data is generated. For example, measurements of a patient’s medical status can abruptly shift due to the patient acquiring a new disease. Many real-world time series datasets contain hybrid trajectories, but it is challenging for traditional methods to accurately model them, especially when the positions of change are unknown. Thus, we develop the Latent Segmented ODE (LatSegODE), which provides accurate interpolation and extrapolation of hybrid trajectories, and precisely detects the positions of abrupt change. The LatSegODE uses an optimized search algorithm to find the best reconstruction while considering all possible positions of change in a hybrid trajectory, allowing it to operate without prior knowledge of where and how many positions of change exist.

SketchEmbedNet: Learning Novel Concepts by Imitating Drawings
Alexander Wang, Mengye Ren, Richard S. Zemel

Sketch drawings capture the salient information of visual concepts. Previous work has shown that neural networks are capable of producing sketches of natural objects drawn from a small number of classes. While earlier approaches focus on generation quality or retrieval, we explore properties of image representations learned by training a model to produce sketches of images. We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting. Additionally, we find that these learned representations exhibit interesting structure and compositionality.

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Reinforcement Learning in large action spaces is a challenging problem. Cooperative multi-agent reinforcement learning (MARL) exacerbates matters by imposing various constraints on communication and observability. In this work, we consider the fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. For value-based methods, it poses challenges in accurately representing the optimal value function. For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic. We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires accurately modelling the agent interactions in a sample efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of different agents. Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task. We provide PAC analysis for Tesseract-based algorithms and highlight their relevance to the class of rich observation MDPs. Empirical results in different domains confirm Tesseract’s gains in sample efficiency predicted by the theory.
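
As a worked equation, the low-rank view can be written as a rank-R, CP-style factorization of the joint action-value tensor (schematic; the paper’s exact factorization may differ in details):

```latex
Q\big(s, a^{1}, \ldots, a^{n}\big) \;\approx\; \sum_{r=1}^{R} \;\prod_{i=1}^{n} u^{i}_{r}(s)\big[a^{i}\big]
```

Each factor scores one agent’s actions, so the representation grows linearly rather than exponentially in the number of agents while still capturing task-relevant interactions through the shared rank terms.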

Unsupervised Part Representation by Flow Capsules
Sara Sabour, Andrea Tagliasacchi, Soroosh Yazdani, Geoffrey E. Hinton, David J. Fleet

Capsule networks aim to parse images into a hierarchy of objects, parts and relations. While promising, they remain limited by an inability to learn effective low level part descriptions. To address this issue we propose a way to learn primary capsule encoders that detect atomic parts from a single image. During training we exploit motion as a powerful perceptual cue for part definition, with an expressive decoder for part generation within a layered image model with occlusion. Experiments demonstrate robust part discovery in the presence of multiple objects, cluttered backgrounds, and occlusion. The part decoder infers the underlying shape masks, effectively filling in occluded regions of the detected shapes. We evaluate FlowCapsules on unsupervised part segmentation and unsupervised image classification.

Value Iteration in Continuous Actions, States and Time
Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

Classical value iteration approaches are not applicable to environments with continuous states and actions. For such environments, the states and actions are usually discretized, which leads to an exponential increase in computational complexity. In this paper, we propose continuous fitted value iteration (cFVI). This algorithm enables dynamic programming for continuous states and actions with a known dynamics model. Leveraging the continuous-time formulation, the optimal policy can be derived for non-linear control-affine dynamics. This closed-form solution enables the efficient extension of value iteration to continuous environments. We show in non-linear control experiments that the dynamic programming solution obtains the same quantitative performance as deep reinforcement learning methods in simulation but excels when transferred to the physical system. The policy obtained by cFVI is more robust to changes in the dynamics despite using only a deterministic model and without explicitly incorporating robustness in the optimization. Videos of the physical system are available online.
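
The closed-form solution referred to above comes from the continuous-time Hamilton-Jacobi-Bellman equation: for control-affine dynamics and a strictly convex action cost g, the optimal action can be written directly in terms of the value gradient (stated schematically from the general theory, where g̃ is derived from the convex conjugate of g):

```latex
\dot{x} = a(x) + B(x)\,u, \qquad u^{*}(x) \;=\; \nabla \tilde{g}\big(B(x)^{\top} \nabla_{x} V(x)\big)
```

Because the action maximization has this closed form, each value-iteration sweep avoids searching over a discretized action space, which is what makes the extension to continuous environments efficient.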
