Vector researchers presented more than 50 papers at ICML 2024

July 23, 2024

Vector researchers presented more than 50 papers at the 2024 International Conference on Machine Learning (ICML). Thirty-five papers co-authored by Vector Faculty Members were accepted to the conference, with a further 15 from Vector Faculty Affiliates. This year’s conference was held in Vienna, Austria from July 21 to July 27.

Among these papers, four were recognized with Best Paper awards.

Below are simplified summaries for the accepted papers and poster sessions from Vector Faculty Members. 

Paper descriptions written by AI and edited by paper co-authors.

Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg
Poster Session 4

This paper introduces a new approach to reinforcement learning called Adaptive Horizon Actor-Critic (AHAC). The researchers aimed to improve how robots learn complex movement tasks, like walking or running.

Traditional methods often struggle with these tasks due to the complexity of physical interactions, especially when objects come into contact. AHAC addresses this by adapting how far into the future it looks when making decisions, focusing on smoother movements and avoiding problematic collisions. The team tested AHAC on various robot simulations, including simple hopping robots and complex humanoid figures. They found that AHAC outperformed existing methods, achieving 40% better results across different tasks. Notably, AHAC was particularly effective for more complex robots with many moving parts. One key innovation is that AHAC can adjust its planning horizon during the learning process, allowing it to avoid difficulties that arise from long-term predictions in complex physical interactions.

This research represents a significant step forward in teaching robots to perform complex physical tasks more efficiently and effectively. It could lead to more capable and adaptable robots in various real-world applications.

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
Poster Session 1

This paper introduces “Align Your Steps” (AYS), a new method for improving the sampling process in diffusion models, which are a type of AI used for generating images and videos. Diffusion models work by gradually removing noise from random data, but this process can be slow and produce lower quality results when done quickly. The researchers developed a mathematical framework to optimize the “sampling schedule” – the steps the model takes when generating images. They found that by carefully adjusting these steps, they could significantly improve the quality of the generated content, especially when using fewer steps. The team tested their method on various tasks, including generating 2D shapes, images, and videos. In almost all cases, AYS outperformed existing methods, producing higher quality outputs with the same computational resources. For example, in image generation tasks, AYS achieved up to 40% better results than previous methods.
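
To make the notion of a “sampling schedule” concrete, here is a minimal sketch (not the authors’ code): a diffusion sampler steps through a decreasing list of noise levels, and an AYS-style optimized schedule can be swapped in without touching the rest of the sampler. The `toy_denoiser` and the “optimized” schedule below are placeholders; in practice the denoiser is a trained model and the schedule comes from the paper’s optimization procedure.

```python
import numpy as np

def toy_denoiser(x, sigma):
    # Placeholder for a trained diffusion model's denoiser D(x, sigma).
    return x / (1.0 + sigma ** 2)

def sample(schedule, x_init, denoiser=toy_denoiser):
    """Euler sampler for the probability-flow ODE, driven entirely by a
    user-supplied schedule of noise levels sigma_0 > ... > sigma_N = 0."""
    x = x_init
    for sigma, sigma_next in zip(schedule[:-1], schedule[1:]):
        d = (x - denoiser(x, sigma)) / sigma   # ODE derivative dx/dsigma
        x = x + (sigma_next - sigma) * d       # step to the next noise level
    return x

n_steps = 10
default_schedule = np.geomspace(80.0, 0.002, n_steps + 1)
default_schedule[-1] = 0.0
# Hypothetical stand-in for an optimized ("aligned") schedule with the same endpoints.
optimized_schedule = 80.0 ** 0.3 * default_schedule ** 0.7
optimized_schedule[-1] = 0.0

x0 = np.random.randn(16)
print(sample(default_schedule, x0)[:4])
print(sample(optimized_schedule, x0)[:4])
```

The only thing that changes between the two calls is the list of noise levels, which is exactly the quantity AYS optimizes.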

This research is significant because it makes diffusion models more efficient and effective, potentially leading to faster and higher quality AI-generated content across various applications, from art creation to video synthesis.

Asymmetry in Low-Rank Adapters of Foundation Models

Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon
Poster Session 6

This paper investigates the asymmetry in low-rank adaptation (LoRA), a popular method for fine-tuning large language models. LoRA adapts models by adding a product of two matrices, A and B, to the original model weights. The researchers discovered that these matrices play different roles: A extracts features from the input, while B uses these features to create the desired output.

The study shows, both theoretically and empirically, that fine-tuning only the B matrix is more effective than fine-tuning only A. Surprisingly, using a random, untrained A matrix performs nearly as well as a fine-tuned one. This finding suggests that optimizing B alone can achieve similar performance to full LoRA while using fewer parameters.
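
A minimal PyTorch sketch of the asymmetry being described (illustrative only, not the authors’ implementation): a LoRA layer adds the low-rank update B·A·x to a frozen linear layer, and here A is kept as a frozen random projection while only B receives gradients.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a low-rank update: W x + scale * B (A x)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pretrained weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5,
                              requires_grad=False)        # random, frozen "feature extractor"
        self.B = nn.Parameter(torch.zeros(d_out, rank))   # only B is trained
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(128, 64), rank=8)
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # ['B']
```

Training only B roughly halves the adapter’s trainable parameters relative to standard LoRA while, per the paper, giving up little performance.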

The researchers tested their approach on various tasks and models, including RoBERTa, BART, LLaMA-2, and Vision Transformers. In most cases, their method outperformed standard LoRA and other baselines, especially when using fewer training steps.

This work is significant because it offers a more efficient way to fine-tune large language models, potentially reducing computational costs and improving generalization. It also provides insights into how these models adapt to new tasks.

Auditing Private Prediction

Karan Chadha, Matthew Jagielski, Nicolas Papernot, Christopher A. Choquette-Choo, Milad Nasresfahani
Poster Session 3

This paper introduces the first framework for auditing private prediction algorithms in machine learning. While differential privacy provides theoretical upper bounds on privacy leakage, this work establishes practical lower bounds through empirical auditing. The researchers focus on four algorithms: PATE, CaPC, PromptPATE, and Private-kNN.

The framework uses adversaries with varying poisoning and query capabilities to assess privacy leakage. New techniques are developed to evaluate leakage in terms of Rényi DP.

Key findings include:

  1. Current privacy analyses of private prediction can be improved.
  2. Algorithms more susceptible to poisoning show higher privacy leakage.
  3. Adversaries without query control cause less privacy leakage than those with full control.

This work is significant as it provides a comprehensive auditing framework for private prediction algorithms, complementing theoretical guarantees with practical lower bounds. It helps researchers and practitioners better understand and improve real-world privacy guarantees of machine learning models during inference.

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E Turner, Alireza Makhzani
Poster Session 6

This paper investigates whether the square root operation can be removed from adaptive gradient methods in machine learning, particularly for training large language models. Adaptive methods like Adam are popular for training transformers but often underperform compared to stochastic gradient descent (SGD) on convolutional neural networks (CNNs).

The researchers propose square-root-free versions of these adaptive methods, analyze them from a second-order optimization perspective, and demonstrate their effectiveness across various models and datasets.

Key findings include:

  1. Removing the square root closes the generalization gap between adaptive methods and SGD on CNNs.
  2. The square-root-free methods maintain performance on transformer models compared to square-root-based methods.
  3. Removing the square root enables low-precision training for matrix adaptive methods, improving efficiency.

The study provides new insights into adaptive optimization methods, challenging the necessity of the square root operation. It suggests that adaptivity, rather than just sign-based updates, plays a crucial role in the success of these methods. This work opens up new avenues for developing more efficient and effective optimization algorithms for deep learning.
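
As a deliberately simplified, diagonal-only illustration of where the square root sits in an Adam-style update (the paper’s actual root-free updates are derived from the second-order view, and its main results concern matrix, Kronecker-factored preconditioners, so this is not the proposed algorithm):

```python
import numpy as np

def adam_like_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Standard adaptive update: divide by the SQUARE ROOT of the second moment.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return w - lr * m / (np.sqrt(v) + eps), m, v

def root_free_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Square-root-free variant (simplified): the second moment is used directly,
    # which the paper interprets through a second-order / Fisher-style lens.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return w - lr * m / (v + eps), m, v

w = np.array([1.0, -2.0]); g = 2 * w
m0 = np.zeros(2); v0 = np.zeros(2)
print(adam_like_step(w, g, m0, v0)[0])   # update divided by sqrt(v)
print(root_free_step(w, g, m0, v0)[0])   # update divided by v itself
```

Removing the square root changes the scaling of the update, which is why the paper develops its root-free methods from a second-order perspective rather than by simply deleting the root; the sketch only shows where the operation appears.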

Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals

Ziyi Liu, Daniel Roy, Idan Attias
Poster Session 1

This paper explores the challenge of adapting to causal structures in multi-armed bandit problems, a type of decision-making scenario. The researchers investigate how to design algorithms that can perform well both when there’s useful causal information available and when there isn’t.

The study introduces the concept of a “Pareto regret frontier,” which represents the best possible trade-offs between performance in different types of environments. They prove that it’s impossible to achieve optimal performance in all scenarios simultaneously, but they develop an algorithm that comes close to the best possible trade-offs.

The researchers also show how to reduce causal bandit problems to linear bandit problems, allowing for more efficient solutions in some cases. They provide the first instance-dependent regret bounds for causal bandits, which can lead to better performance in specific scenarios.

Finally, the paper examines the common assumption that algorithms have perfect knowledge of certain probability distributions. They show that some knowledge of these distributions is necessary for achieving better performance, though it need not be perfect to still be useful.

This research advances our understanding of causal inference in decision-making problems and provides new tools for designing adaptive algorithms.

A Computational Framework for Solving Wasserstein Lagrangian Flows

Kirill Neklyudov, Rob Brekelmans, Alexander Tong, Lazar Atanackovic, Qiang Liu, Alireza Makhzani
Poster Session 4

This paper introduces a unified computational framework for solving “Wasserstein Lagrangian Flows,” which are optimization problems over the space of probability distributions that minimize a given Lagrangian action, or “cost.” Through the choice of Lagrangian, Wasserstein Lagrangian flows encompass optimal transport problems and their variants, including Schrödinger bridges, physically constrained transport, and unbalanced optimal transport.

The authors focus on applications in single-cell biology, which seek to understand the evolution of populations of cells. The choice of Lagrangian can be used to incorporate prior knowledge of the true dynamics, so that the optimal cost-minimizing solution better matches the given data. After learning, a neural network model of the dynamics can be used to simulate trajectories of the underlying process, which may correspond to predicting treatment effects or understanding cell differentiation or developmental processes.
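
In rough mathematical terms (our simplified notation, not the paper’s exact formulation), a Wasserstein Lagrangian flow looks for a curve of distributions ρ_t and a velocity field v_t that connect two endpoint distributions while minimizing an accumulated Lagrangian cost:

```latex
\min_{\rho_t,\, v_t} \int_0^1 \!\! \int L\big(x, v_t(x), \rho_t\big)\, \rho_t(x)\, dx\, dt
\quad \text{subject to} \quad
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \qquad \rho_0 = \mu_0,\ \ \rho_1 = \mu_1 .
```

With the kinetic-energy Lagrangian L = ½‖v‖², this recovers the Benamou–Brenier formulation of optimal transport; other choices yield the Schrödinger bridge and physically constrained variants mentioned above, while the unbalanced case additionally relaxes the continuity equation with a growth term.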

Confidence Aware Inverse Constrained Reinforcement Learning

Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi, Kasra Rezaee, Pascal Poupart
Poster Session 4

This paper introduces Confidence Aware Inverse Constrained Reinforcement Learning (CA-ICRL), a novel approach in the field of reinforcement learning. The method addresses a crucial problem in real-world applications: learning constraints from expert demonstrations when these constraints are too numerous or complex to be fully specified.

CA-ICRL improves upon existing Inverse Constrained Reinforcement Learning methods by incorporating a measure of confidence in the learned constraints. This allows users to specify a desired confidence level, and the algorithm learns constraints that are at least as restrictive as the true underlying constraints with that level of confidence.

A key innovation of CA-ICRL is its ability to determine whether the available expert demonstrations are sufficient to learn constraints with the desired confidence and performance levels. This feature can guide users in collecting additional expert data if needed.

The authors demonstrate CA-ICRL’s effectiveness through experiments in various simulated environments and a realistic autonomous driving scenario. The method consistently outperforms existing approaches in terms of constraint violation rates and rewards obtained.

Overall, CA-ICRL provides a more flexible and informative approach to learning constraints from demonstrations, potentially improving the safety and effectiveness of reinforcement learning in complex real-world applications.

Differentially Private Post-Processing for Fair Regression

Ruicheng Xian, Qiaobo Li, Gautam Kamath, Han Zhao
Poster Session 5

This paper presents a differentially private post-processing algorithm for learning fair regressors that satisfy statistical parity. The method addresses both privacy concerns in handling sensitive data and fairness issues in machine learning models.

The algorithm consists of three main steps:

  1. Privately estimating output distributions using histogram density estimation and the Laplace mechanism
  2. Computing the Wasserstein barycenter of these distributions
  3. Using optimal transports to the barycenter for post-processing to achieve fairness
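
A rough one-dimensional sketch of these three steps under simplifying assumptions (two groups, regressor outputs rescaled to [0, 1], Laplace noise with unit sensitivity per histogram count); it is not the authors’ implementation, but it shows how privatized histograms, a barycenter, and quantile remapping fit together.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_output_hist(scores, bins, eps):
    """Step 1: histogram density estimate of a group's predicted outputs,
    privatized with the Laplace mechanism."""
    counts, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    noisy = np.clip(counts + rng.laplace(scale=1.0 / eps, size=counts.shape), 0, None)
    return noisy / noisy.sum(), edges

def barycenter_quantiles(dists, edges, qs):
    """Step 2: in 1-D, the Wasserstein barycenter is the average of the
    groups' quantile functions."""
    centers = (edges[:-1] + edges[1:]) / 2
    return np.mean([np.interp(qs, np.cumsum(p), centers) for p in dists], axis=0)

def remap(score, dist, edges, qs, bary_q):
    """Step 3: optimal transport to the barycenter = push a prediction through
    its group's CDF, then through the barycenter's quantile function."""
    centers = (edges[:-1] + edges[1:]) / 2
    u = np.interp(score, centers, np.cumsum(dist))
    return np.interp(u, qs, bary_q)

# Toy usage: two demographic groups with shifted regressor outputs.
scores_a = np.clip(rng.normal(0.4, 0.1, 2000), 0, 1)
scores_b = np.clip(rng.normal(0.6, 0.1, 2000), 0, 1)
qs = np.linspace(0.01, 0.99, 99)
dist_a, edges = private_output_hist(scores_a, bins=20, eps=1.0)
dist_b, _ = private_output_hist(scores_b, bins=20, eps=1.0)
bary_q = barycenter_quantiles([dist_a, dist_b], edges, qs)
print(remap(0.4, dist_a, edges, qs, bary_q), remap(0.6, dist_b, edges, qs, bary_q))
```

After remapping, predictions from the two groups are pushed toward a common barycenter distribution, which is what statistical parity asks for.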

The authors provide a theoretical analysis of the algorithm’s sample complexity and fairness guarantees. They reveal a trade-off between statistical bias and variance induced by the choice of the number of bins in the histogram. Using fewer bins always improves fairness at the expense of higher error.

The method can be applied to post-process any given regressor to improve fairness by remapping its outputs. Experiments on Law School and Communities & Crime datasets demonstrate the algorithm’s effectiveness in balancing privacy, fairness, and accuracy.

This work contributes to the growing field of privacy-preserving fair machine learning, offering a flexible approach that allows practitioners to adjust the privacy-fairness-accuracy trade-off based on their specific requirements.

Disguised Copyright Infringement of Latent Diffusion Models

Yiwei Lu, Matthew Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu
Poster Session 5

This paper introduces the concept of “disguised” copyright infringement in latent diffusion models (LDMs), challenging the current understanding of what constitutes access to copyrighted material. The authors demonstrate that it’s possible to create “disguises” – images that look visually different from copyrighted content but share similar latent information when processed by LDMs.

The paper presents an algorithm for generating these disguises and shows how they can be used to train LDM-based models (like textual inversion and DreamBooth) to reproduce copyrighted content without directly including the original images in the training set. This raises concerns about the current methods of detecting copyright infringement in AI training data.

To address this, the authors propose a broader notion of “acknowledgment” and introduce detection methods including feature similarity search and encoder-decoder examination. These tools could augment existing auditing practices for AI training data.
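
A hypothetical sketch of the “feature similarity search” idea (the encoder, thresholds, and brute-force loop below are placeholder choices, not the paper’s tooling): flag training images that are visually dissimilar to a protected image yet map to a very similar latent representation.

```python
import torch
import torch.nn.functional as F

def cosine(a, b):
    return F.cosine_similarity(a.flatten(1), b.flatten(1))

def flag_disguises(train_imgs, protected_imgs, encoder, pix_thresh=0.8, lat_thresh=0.9):
    """Flag training images that look unlike a protected image in pixel space
    but encode to a very similar latent -- the signature of a 'disguise'."""
    z_train = encoder(train_imgs)
    z_prot = encoder(protected_imgs)
    flags = []
    for i in range(len(train_imgs)):
        for j in range(len(protected_imgs)):
            pix_sim = cosine(train_imgs[i:i + 1], protected_imgs[j:j + 1]).item()
            lat_sim = cosine(z_train[i:i + 1], z_prot[j:j + 1]).item()
            if lat_sim > lat_thresh and pix_sim < pix_thresh:
                flags.append((i, j, pix_sim, lat_sim))
    return flags

# Toy usage with a random projection standing in for the LDM's latent encoder.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
train_imgs = torch.rand(4, 3, 32, 32)
protected_imgs = torch.rand(2, 3, 32, 32)
print(flag_disguises(train_imgs, protected_imgs, encoder))  # likely [] for random data
```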

The study has significant implications for copyright law, AI governance, and the ongoing debate about the use of copyrighted material in training generative AI models. It calls for a more nuanced understanding of “access” in the context of copyright infringement for AI systems.

Experts Don’t Cheat: Learning What You Don’t Know By Predicting Pairs

Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris Maddison
Poster Session 4

This paper presents a novel approach to quantifying uncertainty in generative models, addressing the challenge of distinguishing between aleatoric uncertainty (inherent randomness) and epistemic uncertainty (lack of knowledge) in probabilistic predictions. The authors propose training models to predict pairs of independent responses drawn from the true distribution, allowing the model to “cheat” by observing one response while predicting the other.

The key insight is that the degree of “cheating” reveals the model’s epistemic uncertainty. The paper proves that this strategy incentivizes models to become second-order calibrated, enabling accurate estimation of the gaps between the model’s predictions and the true distribution. The authors introduce a “cheat-corrected epistemic confidence” metric that can be used to filter out potentially hallucinated samples.
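
A toy numerical illustration of that insight (ours, not the paper’s actual cheat-corrected estimator): given a model’s joint prediction over a pair of responses, the amount by which conditioning on the first response changes its belief about the second, i.e. the model’s mutual information between the two, is zero for a confident, factorized prediction and grows when the model is hedging between hypotheses.

```python
import numpy as np

def epistemic_gap(pair_probs):
    """Given a model's joint prediction p(y1, y2 | x) over a PAIR of responses,
    measure how much conditioning on y1 ("cheating") changes its belief about y2.
    This is the model's mutual information between the two responses."""
    p_joint = np.asarray(pair_probs, dtype=float)   # shape (K, K): rows y1, cols y2
    p_y1 = p_joint.sum(axis=1)
    p_y2 = p_joint.sum(axis=0)
    gap = 0.0
    for i in range(len(p_y1)):
        if p_y1[i] == 0:
            continue
        p_y2_given_y1 = p_joint[i] / p_y1[i]
        ratio = np.where(p_y2_given_y1 > 0, p_y2_given_y1 / p_y2, 1.0)
        gap += p_y1[i] * np.sum(p_y2_given_y1 * np.log(ratio))
    return gap

# Confident model: the pair factorizes, so seeing y1 does not help -- gap is 0.
print(epistemic_gap(np.outer([0.9, 0.1], [0.9, 0.1])))
# Unsure model: it hedges 50/50 between two hypotheses, so seeing y1 pins down y2.
print(epistemic_gap(0.5 * np.outer([1.0, 0.0], [1.0, 0.0]) + 0.5 * np.outer([0.0, 1.0], [0.0, 1.0])))
```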

Theoretical guarantees are provided for detecting statistical hallucinations, and the approach is demonstrated on synthetic tasks, including describing digits of π and a partially observable reinforcement learning task. The method outperforms existing filtering techniques in these scenarios.

This work contributes to the field of uncertainty quantification in machine learning, offering a new perspective on how to identify what a model doesn’t know, with potential applications in improving the safety and reliability of generative AI systems.

FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler

Hongyi Peng, Han Yu, Xiaoli Tang, Xiaoxiao Li
Poster Session 6

This paper introduces FedCal, a novel approach to address model calibration in federated learning (FL) settings. The authors identify that data heterogeneity in FL poses significant challenges for model calibration, affecting both local and global performance. FedCal aims to achieve both local and global calibration without relying on a global validation dataset, which is often impractical in FL scenarios.

The proposed method utilizes client-specific scalers for local calibration, which are then aggregated to form a global scaler. This approach effectively corrects output misalignment without sacrificing prediction accuracy. The authors provide a theoretical analysis showing that even when the variance of clients’ label distributions is constrained, the global calibration error remains asymptotically lower bounded.
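
A schematic sketch of this local-then-aggregate pattern, with simple temperature scaling standing in for the paper’s parameterized scaler and a dataset-size-weighted average standing in for its aggregation rule (both substitutions are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_temperature(logits, labels):
    """Each client fits a calibration scaler locally -- here a single temperature
    chosen by grid search to minimize the negative log-likelihood."""
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels].mean()
    grid = np.linspace(0.1, 10.0, 200)
    return grid[int(np.argmin([nll(T) for T in grid]))]

def aggregate(client_temps, client_sizes):
    """The server combines client scalers into a global scaler
    (here: a size-weighted average of the temperatures)."""
    w = np.asarray(client_sizes) / np.sum(client_sizes)
    return float(np.sum(w * np.asarray(client_temps)))

# Toy usage: three clients whose models are overconfident to different degrees.
temps, sizes = [], []
for overconfidence in (2.0, 3.0, 4.0):
    labels = rng.integers(0, 5, size=500)
    logits = rng.normal(0.0, 1.0, size=(500, 5))
    logits[np.arange(500), labels] += 1.0        # true class gets a higher logit
    temps.append(fit_temperature(logits * overconfidence, labels))
    sizes.append(500)
print(temps, aggregate(temps, sizes))
```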

Extensive experiments across four benchmark datasets demonstrate that FedCal significantly outperforms existing baselines, reducing global calibration error by an average of 47.66%. The method is shown to be robust to increasing levels of data heterogeneity and can be integrated with existing FL frameworks.

The Fundamental Limits of Least-Privilege Learning

Theresa Stadler, Bogdan Kulynych, Michael Gastpar, Nicolas Papernot, Carmela Troncoso
Poster Session 4

This paper examines the fundamental limits of least-privilege learning in machine learning, particularly in contexts where data representations are shared instead of raw data to prevent misuse. The authors provide the first formal definition of the least-privilege principle for machine learning, framing it as a bound on the inference gain about data beyond what is already revealed through a task’s fundamental leakage.

The research proves a crucial trade-off: under realistic assumptions about data distribution, any representation that has utility for a given task must inevitably leak information beyond what’s required for that task. This finding challenges the notion that it’s possible to create representations that are useful for a specific task while revealing nothing else about the underlying data.

Through theoretical analysis and extensive experiments across various datasets, model architectures, and learning techniques, the authors demonstrate that this trade-off is fundamental and cannot be circumvented by existing methods like attribute censoring or differential privacy.

The paper’s findings have significant implications for privacy-preserving machine learning, suggesting that current approaches to limit data access through feature representations may not provide the level of privacy protection often assumed. It calls for a re-evaluation of privacy expectations in collaborative learning and model partitioning scenarios.

Genie: Generative Interactive Environments

Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel
Oral 1x Video

Genie is a novel generative AI model that creates interactive, action-controllable virtual environments from unlabeled Internet videos. Trained on over 200,000 hours of publicly available gaming footage, this 11B-parameter model can generate diverse, playable worlds from text prompts, synthetic images, photographs, and even hand-drawn sketches.

The model comprises three key components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model. This architecture enables frame-by-frame control without requiring ground-truth action labels during training. Genie demonstrates scalability benefits with increasing model size and batch size, suggesting potential for further improvements with additional computational resources.

Genie’s capabilities extend beyond gaming environments. When trained on robotics datasets, it successfully learns distinct and consistent actions, suggesting potential applications in robotic simulation and control.

Importantly, Genie shows promise for training generalist agents. Its learned latent actions can be used to infer policies from unseen action-free videos, potentially unlocking vast amounts of data for future AI training.

While limitations exist, such as occasional hallucinations and limited memory span, Genie represents a significant step towards creating diverse, interactive virtual environments and training more capable AI agents.

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel Roy
Oral 5x Optimization 2

This paper explores the relationship between memorization and learning in stochastic convex optimization (SCO). The authors quantify memorization using conditional mutual information (CMI), which measures the information a learning algorithm reveals about its training data. They establish a fundamental tradeoff between a learning algorithm’s accuracy and its CMI.

For Lipschitz-bounded SCO, the authors prove that every ε-learner has CMI bounded below by Ω(1/ε²). For strongly convex SCO, this bound is Ω(1/ε). These results hold despite optimal sample complexity, indicating that accurate learning necessitates substantial memorization.

The paper demonstrates the necessity of memorization by designing an adversary capable of identifying a significant fraction of training samples in specific SCO problems. This finding challenges the intuition that ideal learning algorithms should avoid memorizing irrelevant information.

The authors discuss several implications of their results, including limitations of CMI-based generalization bounds for SCO and the impossibility of constant-sized sample compression schemes. These findings contribute to our understanding of the role of memorization in learning and have implications for privacy and generalization in machine learning.

Intersectional Unfairness Discovery

Gezheng Xu, Qi Chen, Charles X. Ling, Boyu Wang, Changjian Shui
Poster Session 4

This paper introduces the Bias-Guided Generative Network (BGGN), a novel approach for discovering intersectional unfairness in AI systems. Unlike traditional methods that focus on single sensitive attributes or rely on enumeration and search techniques, BGGN formulates the discovery process as a generative task. This allows for an efficient and diverse generation of high-bias intersectional sensitive attributes.

The researchers demonstrate BGGN’s effectiveness on real-world text (Toxic) and image (CelebA) datasets. The model not only discovers known biases but also generates unseen yet potentially high-bias intersectional attributes. To validate these discoveries, the authors use modern generative AI models like LLaMA and Midjourney to produce new texts and images based on the discovered attributes.

BGGN outperforms conventional search algorithms and generative models in identifying diverse and high-bias subgroups. It also provides insights into potential unfairness in popular generative AI systems, as the generated content often exhibits bias.

This work contributes to the understanding of intersectional fairness in AI and offers a scalable method for proactively discovering unfairness that may be present but unnoticed in complex systems with multiple sensitive attributes.

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

Jinsoo Yoo, Yunpeng Liu, Frank Wood, Geoff Pleiss
Poster Session 5

This paper introduces Layerwise Proximal Replay (LPR), a novel approach to online continual learning that combines experience replay with a proximal point method. The authors identify a limitation in current replay-based methods: unstable optimization trajectories that impede overall accuracy. LPR addresses this by modifying the optimization geometry to balance learning from new and replay data while allowing only gradual changes in hidden activations of past data.

The method is extensively evaluated across multiple problem settings and datasets, consistently demonstrating improved performance over existing replay-based methods. Notably, LPR shows benefits even with unlimited memory, suggesting its improvements go beyond just preventing catastrophic forgetting.

LPR’s framework involves a layerwise preconditioner applied to loss gradients, designed to promote continued learning while limiting sudden performance degradation on past data. The authors provide a detailed mathematical formulation and analysis of the method’s effects on internal representations and optimization stability.

Comparisons with state-of-the-art methods show LPR’s superiority across various metrics and problem settings. The paper also discusses the relationship between LPR and existing gradient projection methods, highlighting key differences that make LPR more suitable for online continual learning with replay buffers.

Learning High-Order Relationships of Brain Regions

Weikang Qiu, Huangrui Chu, Selena Wang, Haolan Zuo, Xiaoxiao Li, Yize Zhao, Zhitao Ying
Poster Session 5

This paper introduces HyBRiD, a novel method for identifying high-order relationships among brain regions from fMRI data. The authors propose that these relationships should be maximally informative and minimally redundant (MIMR) regarding phenotypic outcomes. HyBRiD represents brain regions as nodes in a hypergraph, with hyperedges representing high-order relationships.

The method employs a Constructor to identify hyperedge structures and a Weighter to compute hyperedge weights. A multi-head drop-bottleneck framework is introduced to achieve the MIMR objective, with theoretical guarantees. HyBRiD avoids searching in exponential space by learning masks to identify hyperedges, ensuring efficiency and consistency across subjects.

Experiments on ABIDE and ABCD datasets demonstrate that HyBRiD outperforms state-of-the-art predictive models by an average of 11.2%. The results show that higher-degree hyperedges are more significant in predicting cognitive outcomes, highlighting the importance of high-order relationships in brain function.

The authors provide qualitative analysis of the most significant hyperedges, revealing coordinated interactions of multiple brain regions in cognitive tasks. This work contributes to understanding complex brain functions and may benefit clinical studies and diagnostic tools in neurology.

Learning Latent Structures in Network Games via Data-Dependent Gated-Prior Graph Variational Autoencoders

Xue Yu, Muchen Li, Yan Leng, Renjie Liao
Poster Session 5

This paper introduces GPGVAE, an unsupervised learning model for inferring latent interaction types and network structures in network games. The model addresses the challenge of revealing hidden relationships among individuals based on their observed actions, without prior knowledge of utility functions or partial network connections.

GPGVAE employs a spectral GNN-based encoder to predict interaction types (strategic complement vs. substitute) and a data-dependent gated prior to model network structures. It also features a Transformer-based mixture of Bernoulli encoder for network structures and a GNN-based decoder for game actions.

The authors propose a stage-wise training strategy and investigate various Monte Carlo gradient estimation methods. They demonstrate that GPGVAE outperforms state-of-the-art models on both synthetic and real-world datasets, showing an average improvement of 11.2% in inferring network structures.

The model effectively captures both strategic complement and substitute interactions, providing insights into the importance of high-order relationships in network structures. The authors also conduct extensive ablation studies and analyze the model’s performance under different game settings.

This work contributes to understanding complex network dynamics and may have applications in fields such as social network analysis, economics, and policy-making.

Learning to Route Among Specialized Experts for Zero-Shot Generalization

Mohammed Muqeeth, Haokun Liu, Yufan Liu, Colin Raffel
Poster Session 2

This paper introduces PHATGOOSE, a novel method for improving zero-shot generalization by routing among specialized expert models. The method addresses the challenge of recycling a large collection of specialized models to enhance a base language model’s zero-shot capabilities without requiring simultaneous access to the datasets used to create these models.

PHATGOOSE learns to route among specialized modules produced through parameter-efficient fine-tuning. It enables per-token and per-module routing, which the authors hypothesize improves zero-shot generalization by allowing different expert capabilities to be used at different stages and for different tokens.
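
A schematic sketch of per-token routing among low-rank expert modules (our simplification, not the PHATGOOSE training recipe): each expert owns a gate vector, each token’s hidden state is compared against the gates, and only the top-scoring experts’ low-rank updates are applied to that token.

```python
import torch
import torch.nn.functional as F

class PerTokenRouter(torch.nn.Module):
    """Schematic per-token, per-module routing among LoRA-style experts."""
    def __init__(self, d_model=64, rank=4, n_experts=3, top_k=2):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(n_experts, rank, d_model) * 0.02)
        self.B = torch.nn.Parameter(torch.randn(n_experts, d_model, rank) * 0.02)
        self.gates = torch.nn.Parameter(torch.randn(n_experts, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, h):                      # h: (tokens, d_model)
        # Cosine similarity between each token and each expert's gate vector.
        scores = F.normalize(h, dim=-1) @ F.normalize(self.gates, dim=-1).T
        top = scores.topk(self.top_k, dim=-1)  # choose experts per token
        weights = torch.softmax(top.values, dim=-1)
        deltas = torch.einsum("erd,td->ter", self.A, h)        # (tokens, experts, rank)
        deltas = torch.einsum("edr,ter->ted", self.B, deltas)  # (tokens, experts, d_model)
        chosen = torch.gather(
            deltas, 1, top.indices.unsqueeze(-1).expand(-1, -1, h.shape[-1]))
        return h + (weights.unsqueeze(-1) * chosen).sum(dim=1)

router = PerTokenRouter()
print(router(torch.randn(5, 64)).shape)   # torch.Size([5, 64])
```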

The method is post-hoc, requiring only a modest amount of additional computing after each expert model is trained. In experiments covering various specialized model collections and zero-shot generalization benchmarks, PHATGOOSE outperforms past methods for post-hoc routing and, in some cases, surpasses explicit multitask training.

Qualitative analysis validates that PHATGOOSE’s performance stems from its ability to perform per-token and per-module routing. The authors provide insights into the routing strategies learned by the model and discuss potential avenues for future work in this area.

This work sets the groundwork for a promising new framework for the decentralized development of generalist AI systems.

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang
Oral 4x Retrieval

MagicLens is a novel approach to self-supervised image retrieval that supports open-ended instructions. The key innovation lies in its data construction pipeline, which leverages naturally occurring image pairs from web pages and uses large language models to generate diverse, open-ended instructions describing the relationships between these images.

The MagicLens model itself is a simple dual-encoder architecture that jointly embeds a query image and its accompanying text instruction, and is trained contrastively on the constructed (query image, instruction, target image) triplets.

Extensive experiments demonstrate that MagicLens outperforms state-of-the-art methods on multiple image retrieval benchmarks, including CIRCO, DTIN, and GeneCIS. Notably, it achieves this performance with significantly fewer parameters than previous methods, showing high parameter efficiency.

A key strength of MagicLens is its ability to handle complex and beyond-visual search intents, as demonstrated through human evaluations on a large-scale retrieval pool of 1.4 million images. The model shows remarkable versatility in understanding and satisfying diverse search instructions, even those requiring abstract reasoning or contextual understanding.

This work sets a new benchmark for image retrieval with open-ended instructions and opens up possibilities for more flexible and powerful image search systems.

Measuring Stochastic Data Complexity with Boltzmann Influence Functions

Nathan Ng, Roger Grosse, Marzyeh Ghassemi
Poster Session 3

This paper introduces IF-COMP, a novel method for estimating stochastic data complexity in deep neural networks using temperature-scaled Boltzmann influence functions (BIFs). The approach aims to approximate the predictive normalized maximum likelihood (pNML) distribution, addressing the challenge of estimating uncertainty in model predictions, particularly for out-of-distribution data.

IF-COMP introduces a temperature-scaled proximal Bregman objective to soften local curvature, allowing for a more accurate approximation of hindsight-optimal outputs. By linearizing the model, IF-COMP efficiently estimates the pNML distribution without explicit optimization steps, resulting in a 7-15 times speedup compared to existing methods like ACNML.

The method demonstrates strong performance on three key tasks: uncertainty calibration, mislabel detection, and out-of-distribution detection. Notably, IF-COMP outperforms various baselines, including Bayesian and optimization-tracing approaches, often with less information available.

Extensive experiments validate IF-COMP’s ability to accurately estimate ground truth pNML complexity and its effectiveness across different neural network architectures and datasets. The results highlight the potential of Minimum Description Length (MDL) based approaches for improving uncertainty estimates in deep neural networks, offering a promising direction for enhancing model reliability and calibration under distribution shifts.

Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning

Saber Malekmohammadi, Yaoliang Yu, Yang Cao
Poster Session 5

This paper introduces ROBUST-HDP, a novel algorithm for heterogeneous differentially private federated learning (DPFL) systems. The method addresses the challenge of heterogeneity in clients’ privacy requirements, batch sizes, and dataset sizes, which can lead to varying noise levels in clients’ model updates.

ROBUST-HDP employs Robust PCA to efficiently estimate the true noise level in clients’ updates, allowing for more effective aggregation of model updates. This approach improves upon existing methods that rely on potentially suboptimal or vulnerable aggregation strategies based on clients’ reported privacy parameters.

The authors provide theoretical analysis and convergence guarantees for ROBUST-HDP, demonstrating its effectiveness in various heterogeneity scenarios. Extensive experiments on multiple datasets show that ROBUST-HDP outperforms state-of-the-art methods in terms of utility and convergence speed.

ROBUST-HDP also demonstrates robustness against potential falsification of privacy parameters by clients, making it suitable for untrusted server settings. The paper’s findings suggest that ROBUST-HDP offers a promising approach for improving the performance and reliability of heterogeneous DPFL systems while maintaining strong privacy guarantees.

Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift

Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, Richard Zemel
Poster Session 1

This paper introduces Spectral Adapted Regressor (SpAR), a novel method for improving out-of-distribution (OOD) performance in regression tasks. The authors begin by analyzing how Ordinary Least Squares (OLS) regression is sensitive to covariate shift, characterizing OOD risk in terms of eigenspectrum decomposition of source and target data.

The key insight is the concept of “Spectral Inflation,” where subspaces with small variations during training see increased variation during evaluation. This motivates SpAR, a lightweight method that adapts the weights of a pre-trained neural regression model’s last layer using unlabeled test data to estimate subspaces with spectral inflation and project them away.

SpAR compares the eigenspectra of the source and target features to identify the subspaces where train/test variance differs the most. The method is theoretically grounded and empirically validated on synthetic and real-world datasets, demonstrating improved OOD performance compared to existing approaches.
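
A rough sketch of the projection idea under simplifying assumptions (ours, not the paper’s exact statistical test or weight adjustment): estimate the target feature eigenvectors, compare source and target variance along each, and drop the regression weights’ components along directions that are strongly inflated at test time.

```python
import numpy as np

def spectral_adapt(w, Z_src, Z_tgt, inflation_thresh=5.0):
    """Find directions whose variance is much larger on unlabeled target features
    than on source features, and remove the regression weights' components
    along those 'inflated' directions."""
    Zs = Z_src - Z_src.mean(0)
    Zt = Z_tgt - Z_tgt.mean(0)
    cov_t = Zt.T @ Zt / len(Zt)
    eigvals_t, eigvecs_t = np.linalg.eigh(cov_t)          # target eigenvectors
    var_s = ((Zs @ eigvecs_t) ** 2).mean(0)               # source variance along them
    inflated = eigvals_t > inflation_thresh * (var_s + 1e-8)
    keep = eigvecs_t[:, ~inflated]
    return keep @ (keep.T @ w)                            # project w onto non-inflated subspace

# Toy usage: one feature direction only varies at test time.
rng = np.random.default_rng(0)
Z_src = rng.normal(size=(500, 5)) * np.array([1, 1, 1, 1, 0.01])
Z_tgt = rng.normal(size=(500, 5)) * np.array([1, 1, 1, 1, 3.0])
w = np.ones(5)
print(spectral_adapt(w, Z_src, Z_tgt).round(2))   # weight on the inflated direction shrinks
```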

The authors provide a comprehensive analysis, including proofs of theorems, ablation studies, and comparisons with state-of-the-art methods. SpAR shows promise in addressing the challenge of OOD generalization in regression tasks, offering a computationally efficient post-processing approach that can be applied to various pre-trained models.

Overcoming Data and Model Heterogeneities in Decentralized Federated Learning via Synthetic Anchors

Chun-Yin Huang, Kartik Srinivas, Xin Zhang, Xiaoxiao Li
Poster Session 1

This paper introduces DeSA, a novel approach to decentralized federated learning that addresses both data and model heterogeneity without requiring a central server. The key innovation lies in the use of synthetic anchor data, generated through distribution matching, to facilitate mutual knowledge transfer among clients.

DeSA incorporates two main components: a REG loss that regularizes the distribution of client latent embeddings with the anchors, and a KD loss that enables clients to learn from each other. The authors provide theoretical analysis showing how these components contribute to improved generalization bounds.

Extensive experiments on diverse datasets demonstrate that DeSA outperforms existing decentralized federated learning algorithms in both inter- and intra-client performance. The method shows robustness across various tasks and data distributions, even in scenarios with significant domain shifts.

One of DeSA’s key strengths is its ability to handle both data and model heterogeneity simultaneously, a challenge that previous methods have struggled to address in a serverless setting. By synthesizing global anchors based on raw data distribution, DeSA provides a flexible and effective solution for collaborative learning in decentralized environments.

Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

Florian Tramer, Gautam Kamath, Nicholas Carlini
Oral 1x Positions on How We Do Machine Learning Research

This paper critically examines the practice of using large-scale public data for pretraining models that are then fine-tuned with differential privacy on sensitive data. The authors raise three main concerns:

  1. Privacy of public data: Web-scraped data used for pretraining may contain sensitive information, potentially compromising individuals’ privacy even when models are labeled as “privacy-preserving.”
  2. Benchmark limitations: Current benchmarks for private learning may overestimate the value of public pretraining by using tasks that closely resemble the pretraining data, potentially not reflecting real-world privacy-sensitive applications.
  3. Computational requirements: Large pretrained models often require sensitive data to be uploaded to powerful third-party servers for fine-tuning and inference, potentially introducing new privacy risks.

The authors argue that these issues may lead to a false sense of privacy protection and call for more careful consideration of what constitutes “public” data, the development of more appropriate benchmarks for private learning, and the exploration of privacy-preserving techniques that don’t require outsourcing computation. The paper concludes by encouraging researchers to address these challenges and develop more robust approaches to privacy-preserving machine learning.

Position Paper: Rethinking LLM Censorship as a Security Problem

David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
Poster Session 1

This position paper argues that controlling what large language models (LLMs) will and will not output, often framed as censorship or content filtering, should be treated as a security problem rather than purely a machine learning problem. The authors examine current approaches that filter prompts and responses based on their semantic content and identify fundamental limitations of these methods.

In particular, because LLMs follow instructions, users can transform or decompose requests in ways that evade semantic filters, for example by asking for encoded outputs or by splitting an impermissible request into a series of individually permissible ones (“mosaic prompts”). The authors argue that these capabilities place theoretical limits on what semantic censorship can guarantee.

They therefore call for reframing output control as a security problem and for drawing on established computer-security practices when assessing and mitigating the risks of deploying LLMs. This reframing has implications for how model providers evaluate safeguards and for the guarantees they can credibly offer users and regulators.

Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse
Oral 3x Probabilistic Inference

This paper introduces a new approach called “twisted Sequential Monte Carlo” (SMC) for improving language model outputs. The goal is to make language models generate text that meets specific criteria, such as having a certain sentiment or avoiding harmful content. The researchers propose using SMC, a statistical sampling method, combined with “twist functions” that guide the text generation process. They develop a new way to learn these twist functions called “contrastive twist learning.”

The paper demonstrates that this approach can effectively steer language model outputs towards desired characteristics while maintaining text quality. It also introduces new methods for evaluating how well different techniques work for controlling language model outputs. The researchers test their approach on tasks like generating positive or negative reviews, filling in missing text, and creating non-toxic stories. They show that their method often performs better than existing techniques.

This work provides a flexible framework for controlled text generation, which could be useful for various applications, including making AI language models safer and more reliable.
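
A toy illustration of plain twisted SMC (ours): a stand-in categorical “language model” proposes tokens, a hand-designed twist function upweights partial sequences with a desired property, and particles are resampled at each step. The paper’s contribution includes learning the twist functions (contrastive twist learning) and applying this machinery to real language models, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, N_PARTICLES = 4, 8, 64

def base_lm_probs(prefix):
    # Stand-in for p(next token | prefix) from a pretrained LM.
    return np.array([0.4, 0.3, 0.2, 0.1])

def twist(prefix):
    # Twist function psi(prefix): a soft preference for sequences containing token 3.
    # In the paper these functions are *learned* rather than hand-designed.
    return 5.0 if 3 in prefix else 1.0

particles = [[] for _ in range(N_PARTICLES)]
log_weights = np.zeros(N_PARTICLES)

for t in range(LENGTH):
    for i, seq in enumerate(particles):
        tok = rng.choice(VOCAB, p=base_lm_probs(seq))
        old_twist = twist(seq)
        seq.append(int(tok))
        # Incremental importance weight: ratio of twists (the base LM is the proposal).
        log_weights[i] += np.log(twist(seq)) - np.log(old_twist)
    # Resample particles in proportion to their weights (the SMC step).
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)
    particles = [list(particles[j]) for j in idx]
    log_weights[:] = 0.0

print(np.mean([3 in s for s in particles]))   # most surviving sequences contain the favoured token
```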

Quality Diversity through Human Feedback: an Open-Ended Backend for Diversity-Driven Optimization

Li Ding, Jenny Zhang, Jeff Clune, Lee Spector, Joel Lehman
Poster Session 1

The paper presents a novel method called Quality Diversity through Human Feedback (QDHF) that aims to enhance artificial intelligence systems by making them more capable of generating diverse and high-quality solutions. Traditional optimization methods focus on finding the single best solution; however, many complex tasks benefit from having a variety of solutions.

The key innovation of this research is integrating human feedback directly into Quality Diversity (QD) algorithms. QD algorithms excel at producing diverse solutions but often rely on manually defined metrics to measure diversity. QDHF improves this by learning what diversity means to humans through their feedback, making it more adaptable and effective for tasks requiring creativity and exploration.

Empirical studies show that QDHF achieves superior performance in generating diverse and high-quality solutions compared to existing methods. It proves particularly effective in tasks like text-to-image generation, where it significantly enhances the variety and quality of the generated images.

By combining human insight with advanced algorithms, QDHF offers a robust approach to solving open-ended and complex problems.

Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making

Parand Alizadeh Alamdari, Toryn Q Klassen, Elliot Creager, Sheila McIlraith
Poster Session 4

The research explores fairness in sequential decision-making that affects multiple stakeholders over time. Traditional fairness studies focus on single, isolated decisions, but this work highlights that fairness in a sequence of decisions depends on the entire decision history, making it inherently non-Markovian (not solely dependent on the present state). The study emphasizes the need to assess fairness throughout the process, not just at its conclusion.

Key contributions include:

  1. Introduction of non-Markovian fairness, accounting for historical context in sequential decisions.
  2. Identification of various fairness properties such as long-term, anytime, periodic, and bounded fairness, which offer different ways to measure fairness over time.
  3. Examination of how memory supports the construction of fair policies in decision-making.
  4. Development of FairQCM, an algorithm that enhances reinforcement learning by augmenting training data to improve the creation of fair policies.

This investigation broadens the understanding of fairness in decision-making processes, emphasizing the importance of the historical context and ongoing fairness assessment.

A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta, Pascal Poupart, Alan Aspuru-Guzik, Geoff Pleiss
Poster Session 4

The research paper examines the use of large language models (LLMs) like ChatGPT for Bayesian optimization (BO) in material discovery, particularly focusing on molecules. Bayesian optimization is a technique used to optimize complex functions with limited data by leveraging prior knowledge. The study evaluates whether LLMs, which have shown promise in tasks involving natural language processing, are effective in aiding this optimization process within molecular chemistry.

Key findings include:

  1. LLMs can be useful for Bayesian optimization if pre-trained or refined with domain-specific data.
  2. Directly using general-purpose LLMs without domain-specific adjustments often yields suboptimal results.
  3. Techniques such as parameter-efficient fine-tuning (PEFT) and Bayesian neural networks can enhance the performance of LLMs in this context.
  4. The research provides insights and software tools for leveraging LLMs in scientific discovery, promoting efficiency in material discovery workflows while acknowledging limitations and future directions.

This study emphasizes a balanced, evidence-based approach to integrating LLMs in specialized scientific applications.

Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E Turner, Alireza Makhzani
Poster Session 5

This paper introduces Structured Inverse-Free Natural Gradient Descent (SINGD), a new optimization method for training neural networks. The authors aim to address two main issues with existing second-order methods like KFAC: high memory consumption and numerical instability in low-precision settings.

SINGD builds on the Inverse-Free Natural Gradient Descent (INGD) method, extending it to be more memory-efficient and numerically stable. The key innovations are:

  1. Formulating an inverse-free KFAC update
  2. Imposing structures on Kronecker factors to reduce memory usage

The authors demonstrate that SINGD can outperform first-order methods like AdamW on various neural network architectures (CNNs, Transformers, GNNs) while using similar or less memory. Importantly, SINGD remains stable in low-precision (half-precision) settings where KFAC becomes unstable.

This work bridges the gap between first- and second-order optimization methods in modern low-precision neural network training, potentially enabling more efficient training of large-scale models.

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue
Oral 2x Music and audio

This paper introduces a novel method called Stochastic Control Guidance (SCG) for generating symbolic music (like piano rolls) using diffusion models while adhering to non-differentiable musical rules. The key challenge is that many musical rules, such as note density or chord progression, are not differentiable, making traditional guidance methods ineffective.

The researchers approach this problem by framing it as a stochastic control problem. They develop SCG, which can work with pre-trained diffusion models in a plug-and-play manner, allowing for training-free guidance even with non-differentiable rules. SCG works by sampling multiple possible next steps at each iteration and selecting the one that best follows the target rules.
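
A minimal sketch of that sample-and-select loop (ours, with a placeholder denoiser and a toy “note density” rule rather than the paper’s models and musical rules): at every reverse-diffusion step, several stochastic candidates are drawn and the one whose predicted clean sample best satisfies the non-differentiable rule is kept.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, sigma):
    # Placeholder for a pretrained diffusion model's estimate of the clean
    # piano roll (here: hard-threshold into "note on" / "note off").
    return (x > 0.5).astype(float)

def rule_score(piano_roll, target_density=0.3):
    # A non-differentiable musical rule: note density should be near the target.
    return -abs(piano_roll.mean() - target_density)

def guided_sample(shape, sigmas, n_candidates=8):
    x = rng.normal(size=shape) * sigmas[0]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        candidates, scores = [], []
        for _ in range(n_candidates):
            x0_hat = toy_denoiser(x, sigma)
            cand = x0_hat + sigma_next * rng.normal(size=shape)  # one stochastic reverse step
            candidates.append(cand)
            scores.append(rule_score(toy_denoiser(cand, sigma_next)))
        x = candidates[int(np.argmax(scores))]   # keep the step that best satisfies the rule
    return toy_denoiser(x, 0.0)

sigmas = list(np.geomspace(10.0, 0.01, 30)) + [0.0]
roll = guided_sample((16, 16), sigmas)
print(roll.mean())   # steered toward the 0.3 density target; try n_candidates=1 for comparison
```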

Additionally, the paper introduces a latent diffusion architecture for high-resolution symbolic music generation. When combined with SCG, this framework outperforms current state-of-the-art generators in various settings, demonstrating improved music quality and rule-based controllability.

The significance of this work lies in its ability to generate high-quality, rule-compliant symbolic music without the need for retraining models for each new rule, potentially making it a valuable tool for composers and music producers.

Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Wang
Poster Session 3

This paper investigates how language models (LMs) develop reasoning abilities through pre-training. The authors propose that LMs can aggregate indirect reasoning paths seen during pre-training, enabling them to draw new conclusions. They test this hypothesis in two scenarios: logical reasoning with knowledge graphs and chain-of-thought reasoning for math problems.

For knowledge graphs, they show that LMs pre-trained on random walk paths can deduce missing relations. For math problems, they demonstrate that training on unlabeled random walk reasoning paths improves performance on multiple datasets.
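
A toy illustration of the kind of random-walk training data described for the knowledge-graph setting (the graph, relations, and serialization format below are made up for illustration):

```python
import random

random.seed(0)

# Toy knowledge graph as (head, relation, tail) triples.
triples = [
    ("alice", "mother_of", "bob"),
    ("bob", "father_of", "carol"),
    ("carol", "sister_of", "dan"),
    ("alice", "lives_in", "toronto"),
]
adjacency = {}
for h, r, t in triples:
    adjacency.setdefault(h, []).append((r, t))

def random_walk_path(start, length):
    """Serialize a random walk over the graph as a pretraining sequence.
    Training an LM on many such paths exposes it to indirect reasoning chains
    (e.g. alice -> bob -> carol) that it can later aggregate to infer new relations."""
    path, node = [start], start
    for _ in range(length):
        if node not in adjacency:
            break
        r, t = random.choice(adjacency[node])
        path += [r, t]
        node = t
    return " ".join(path)

for _ in range(3):
    print(random_walk_path("alice", length=3))
```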

The study reveals that LMs can efficiently utilize unlabeled reasoning paths, and there’s usually an optimal path length for training. These findings support the authors’ hypothesis and suggest ways to improve LM pre-training for enhanced reasoning capabilities.

This work provides insights into how LMs acquire reasoning skills and offers potential strategies for improving their performance on complex reasoning tasks.

Related:


Unlocking the Potential of Prompt-Tuning in Federated Learning


Navigating the AI Talent Landscape: How Vector Institute Partnerships Address the Skills Gap


Canadian AI job market shifting, favouring specialized, in-demand skills