Vector researchers tackle real-world AI challenges at ICML 2025

July 10, 2025


Leading researchers from Vector are presenting cutting-edge work at this year’s International Conference on Machine Learning (ICML), taking place July 13-19, 2025 in Vancouver, Canada and through virtual platforms. With a variety of accepted papers, Vector researchers are tackling some of the most pressing challenges in artificial intelligence – from making AI systems safer and more trustworthy to developing new tools for health care and environmental monitoring.

Below you will find 45 accepted papers and poster sessions from Vector Faculty Members, Vector Faculty Affiliates, and Vector Distinguished Postdoctoral Fellows. Papers marked with an asterisk (*) denote a spotlight paper.

Adaptive Elicitation of Latent Information Using Natural Language

Jimmy Wang, Tom Zollo, Richard Zemel (Vector Faculty Member), Hongseok Namkoong

Abstract

Eliciting information to reduce uncertainty on a latent entity is a critical skill in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. We propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity by simulating counterfactual responses. Since probabilistic modeling of an abstract latent entity is difficult, we validate and finetune LLM-based uncertainty quantification methods using perplexity over masked future observations produced by the latent entity. Our framework enables the development of sophisticated information-gathering strategies, and we demonstrate its versatility through experiments on dynamic opinion polling and adaptive student assessment.

Summary

We propose a framework for using LLMs to ask informative questions about variables and entities that cannot be directly observed. Potentially impactful applications include constructing a dynamic diagnostic questionnaire that maximizes the information gained about a patient’s health or generating a personalized set of test questions that yield the most insight into a student’s learning needs.

AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling

Alexander Capstick, Rahul G. Krishnan (Vector Faculty Member), Payam Barnaghi

Abstract

Large language models (LLMs) acquire a breadth of information across various domains. However, their computational complexity, cost, and lack of transparency often hinder their direct application for predictive tasks where privacy and interpretability are paramount. In fields such as healthcare, biology, and finance, specialised and interpretable linear models still hold considerable value. In such domains, labelled data may be scarce or expensive to obtain. Well-specified prior distributions over model parameters can reduce the sample complexity of learning through Bayesian inference; however, eliciting expert priors can be time-consuming. We therefore introduce AutoElicit to extract knowledge from LLMs and construct priors for predictive models. We show these priors are informative and can be refined using natural language. We perform a careful study contrasting AutoElicit with in-context learning and demonstrate how to perform model selection between the two methods. We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning. We show that AutoElicit saves over 6 months of labelling effort when building a new predictive model for urinary tract infections from sensor recordings of people living with dementia.

Summary

In this work, we propose AutoElicit, a method for using LLMs to aid predictive modelling tasks, with a focus on healthcare. Specifically, we present a method for using LLMs to elicit expert prior distributions for linear predictive models and demonstrate how human experts can aid the process. We then compare the posterior predictions with those made through in-context learning, where language models make predictions directly. Using data from our study on dementia, we show that AutoElicit saves over 6 months of labelling effort when building a new predictive model for urinary tract infections from sensor recordings of participants.
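
As a rough illustration of how an elicited prior can reduce the amount of labelled data needed, the sketch below fits a Bayesian linear model with a Gaussian prior whose means and standard deviations are assumed to have already been elicited (for example, from an LLM's text response). The feature count, prior values, and data here are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical prior "elicited" for a 3-feature linear model
# (values are illustrative, not from the paper).
prior_mean = np.array([0.8, -0.5, 0.0])   # expected effect of each feature
prior_std = np.array([0.3, 0.3, 1.0])     # elicited uncertainty per feature

# Synthetic labelled data (scarce, as in the settings AutoElicit targets).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -0.4, 0.1]) + rng.normal(scale=0.5, size=20)

# Conjugate Bayesian linear regression with the elicited Gaussian prior.
noise_var = 0.25
prior_prec = np.diag(1.0 / prior_std**2)
post_prec = prior_prec + X.T @ X / noise_var
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)

print("posterior mean of weights:", post_mean)
```

With an informative prior, the posterior concentrates near sensible weights even with only 20 labelled examples, which is the effect the paper exploits to save labelling effort.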

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions

Gül Sena Altıntaş, Devin Kwok, Colin Raffel (Vector Faculty Member), David Rolnick

Abstract

Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks in terms of either the models’ weights or the underlying functions that were learned. In this work, we show that during an initial “chaotic” phase, an extremely small perturbation reliably causes otherwise identical training trajectories to diverge – an effect that diminishes rapidly over training time. We quantify this divergence through (i) $L^2$ distance between parameters, (ii) the similarity of parameter vectors as measured by permutation alignment, and, most importantly, (iii) the loss barrier when interpolating between networks; revealing how perturbations across different hyperparameter or fine-tuning settings drive training trajectories toward distinct loss minima. Our findings provide insights into neural network training stability, with practical implications for fine-tuning and model merging techniques.

Summary

Due to noise, two neural networks trained from the same random starting point can learn one of many different solutions to the same problem, whereas pre-trained networks tend to learn the same solution. What we don’t know is when and how networks switch from learning different solutions to the same solution. To answer this question, we train twin copies of neural networks in exactly the same way, but add a tiny change (perturbation) to one of the copies during training. We find that for networks at random starting points, even the tiniest change (far smaller than typical random effects) causes training to learn different solutions, whereas pre-trained networks only learn different solutions when changes much larger than random effects are applied. Our findings are significant because we often need to retrain and combine knowledge from several huge networks (such as large language models). As some methods work better with similar solutions versus different solutions, we can tailor our retraining or model combining methods to best target each case.
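
To make the divergence measures described in the abstract concrete, here is a minimal illustrative sketch (not the authors' code) of two of them: the L2 distance between flattened parameter vectors and the loss barrier along a linear interpolation between two networks, evaluated on a toy MLP with random data.

```python
import copy
import torch
import torch.nn as nn

def flat_params(model):
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def l2_distance(model_a, model_b):
    # (i) L2 distance between the flattened parameter vectors of two models
    return torch.norm(flat_params(model_a) - flat_params(model_b)).item()

@torch.no_grad()
def interpolate(model_a, model_b, alpha):
    # Build a model whose weights lie a fraction `alpha` of the way from a to b
    model = copy.deepcopy(model_a)
    for p, pa, pb in zip(model.parameters(), model_a.parameters(), model_b.parameters()):
        p.copy_((1 - alpha) * pa + alpha * pb)
    return model

@torch.no_grad()
def loss_barrier(model_a, model_b, loss_fn, x, y, steps=11):
    # (iii) excess loss along the linear path between the two solutions
    end_loss = 0.5 * (loss_fn(model_a(x), y) + loss_fn(model_b(x), y))
    path_losses = [loss_fn(interpolate(model_a, model_b, a)(x), y)
                   for a in torch.linspace(0, 1, steps)]
    return (max(path_losses) - end_loss).item()

# Toy usage: two independently initialized copies of a small MLP.
torch.manual_seed(0)
make_mlp = lambda: nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model_a, model_b = make_mlp(), make_mlp()
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
loss_fn = nn.CrossEntropyLoss()
print("L2 distance:", l2_distance(model_a, model_b))
print("loss barrier:", loss_barrier(model_a, model_b, loss_fn, x, y))
```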

Calibrated Value-Aware Model Learning with Probabilistic Environment Models

Claas Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski (Vector Faculty Affiliate), Amir-massoud Farahmand (Vector Faculty Affiliate)

Abstract

The idea of value-aware model learning, that models should produce accurate value estimates, has gained prominence in model-based reinforcement learning. The MuZero loss, which penalizes a model’s value function prediction compared to the ground-truth value function, has been utilized in several prominent empirical works in the literature. However, theoretical investigation into its strengths and weaknesses is limited. In this paper, we analyze the family of value-aware model learning losses, which includes the popular MuZero loss. We show that these losses, as normally used, are uncalibrated surrogate losses, which means that they do not always recover the correct model and value function. Building on this insight, we propose corrections to solve this issue. Furthermore, we investigate the interplay between the loss calibration, latent model architectures, and auxiliary losses that are commonly employed when training MuZero-style agents. We show that while deterministic models can be sufficient to predict accurate values, learning calibrated stochastic models is still advantageous.

Summary

This paper analyzes Value-Aware Model Learning (VAML), including the MuZero loss, in model-based reinforcement learning. VAML-based losses train a model to predict accurate estimates of the value of taking an action in each state, instead of training it to predict states themselves accurately.
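
As a hedged sketch of what such a value-aware objective can look like in code (in the spirit of the MuZero loss, not the paper's implementation; the dynamics model, value network, and batch below are placeholder toys):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def value_aware_loss(dynamics_model, value_net, states, actions, next_states):
    # The model is penalized only through the value function: its predicted
    # next state should have the same value as the true next state.
    pred_next = dynamics_model(states, actions)
    pred_value = value_net(pred_next)
    with torch.no_grad():
        target_value = value_net(next_states)
    return F.mse_loss(pred_value, target_value)

# Toy modules and batch (placeholders, not from the paper).
state_dim, action_dim = 4, 2
net = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, state_dim))
dynamics_model = lambda s, a: net(torch.cat([s, a], dim=-1))
value_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))
s, a, s_next = torch.randn(8, state_dim), torch.randn(8, action_dim), torch.randn(8, state_dim)
print(value_aware_loss(dynamics_model, value_net, s, a, s_next).item())
```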

Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics

Tyler Kastner, Mark Rowland, Yunhao Tang, Murat Erdogdu (Vector Faculty Member), Amir-massoud Farahmand (Vector Faculty Affiliate)

Abstract

We study the problem of distributional reinforcement learning using categorical parameterisations and a KL divergence loss. Previous work analyzing categorical distributional RL has done so using a Cramér distance-based loss, simplifying the analysis but creating a theory-practice gap. We introduce a preconditioned version of the algorithm, and prove that it is guaranteed to converge. We further derive the asymptotic variance of the categorical estimates under different learning rate regimes, compare to that of classical reinforcement learning, and analyze how these updates are affected in the linear function approximation setting. We finally empirically validate our results and perform an empirical investigation into the relative strengths of using KL losses, and derive a number of actionable insights for practitioners.

Summary

A popular approach to deep reinforcement learning is to use classification losses to learn the range of possible future outcomes. Previous theoretical works studying this algorithm change the loss used in order to simplify the analysis, but this creates a theory-practice gap. In this work, we directly study these learning algorithms with the classification loss used in practice, the KL divergence. We show that with some modifications to the dynamics (the use of a preconditioner matrix), the updates provably converge. We also study the efficiency of these methods compared to standard reinforcement learning, and we prove results on the exact variance of these algorithms as they approach convergence. Throughout our analysis, we obtain a number of insights that are valuable to anyone using these methods in practice, such as how to modify the learning rate used as one changes the number of atoms (a separate hyperparameter), and how the number and locations of these atoms affect the error incurred.

Commute Graph Neural Networks

Wei Zhuo, Han Yu, Guang Tan, Xiaoxiao Li (Vector Faculty Member)

Abstract

Graph Neural Networks (GNNs) have shown remarkable success in learning from graph-structured data. However, their application to directed graphs (digraphs) presents unique challenges, primarily due to the inherent asymmetry in node relationships. Traditional GNNs are adept at capturing unidirectional relations but fall short in encoding the mutual path dependencies between nodes, such as asymmetrical shortest paths typically found in digraphs. Recognizing this gap, we introduce **C**ommute **G**raph **N**eural **N**etworks (CGNN), an approach that seamlessly integrates node-wise commute time into the message passing scheme. The cornerstone of CGNN is an efficient method for computing commute time using a newly formulated digraph Laplacian. Commute time is then integrated into the neighborhood aggregation process, with neighbor contributions weighted according to their respective commute time to the central node in each layer. It enables CGNN to directly capture the mutual, asymmetric relationships in digraphs. Extensive experiments confirm the superior performance of CGNN. Source code of CGNN is anonymously available here.

Summary

Many GNNs treat directed graphs (digraphs) as collections of one-way edges, so they fail to capture the asymmetric round-trip connectivity that actually determines how strongly two nodes interact. This limitation is evident in social media, where a fan can instantly reach a celebrity, yet the return interaction rarely occurs. We introduce Commute Graph Neural Networks (CGNN) to explicitly model this asymmetry. CGNN leverages a novel digraph Laplacian (DiLap) coupled with lightweight, feature-based graph rewiring. This ensures sparsity and irreducibility, facilitating efficient computation of deterministic commute times, defined as the expected number of steps for a random walk from one node to another and back again. These commute times serve as weights for neighbor messages, allowing mutually reachable nodes to exert greater influence during aggregation. Commute time naturally captures realistic mutual interactions, such as follower-celebrity dynamics in social media or bidirectional web traffic. CGNN therefore provides a more accurate, interpretable, and broadly applicable framework for learning from directed networks.
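
The sketch below illustrates only the commute-time-weighted aggregation step, assuming commute times have already been computed; the paper's DiLap construction and graph rewiring are not shown, and the inverse-commute-time weighting is an illustrative choice rather than the exact formula.

```python
import numpy as np

def commute_weighted_aggregate(features, adjacency, commute_time, eps=1e-6):
    """Aggregate neighbor features, down-weighting neighbors with long
    round trips (large commute time) to the central node. `commute_time`
    is assumed to be precomputed elsewhere."""
    n = features.shape[0]
    out = np.zeros_like(features)
    for i in range(n):
        neighbors = np.nonzero(adjacency[i])[0]
        if len(neighbors) == 0:
            out[i] = features[i]
            continue
        w = 1.0 / (commute_time[i, neighbors] + eps)   # shorter round trips weigh more
        w = w / w.sum()
        out[i] = w @ features[neighbors]
    return out

# Toy directed graph with 4 nodes and placeholder commute times.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
X = rng.normal(size=(4, 3))
C = rng.uniform(1.0, 5.0, size=(4, 4))
print(commute_weighted_aggregate(X, A, C))
```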

A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD

Ruinan Jin, Xiao Li, Yaoliang Yu (Vector Faculty Member), Baoxiang Wang

Abstract

Adaptive moment estimation (Adam) is a cornerstone optimization algorithm in deep learning, widely recognized for its flexibility with adaptive learning rates and efficiency in handling large-scale data. However, despite its practical success, the theoretical understanding of Adam’s convergence has been constrained by stringent assumptions, such as almost surely bounded stochastic gradients or uniformly bounded gradients, which are more restrictive than those typically required for analyzing stochastic gradient descent (SGD). In this paper, we introduce a novel and comprehensive framework for analyzing the convergence properties of Adam. This framework offers a versatile approach to establishing Adam’s convergence. Specifically, we prove that Adam achieves asymptotic (last iterate sense) convergence in both the almost sure sense and the $L_1$ sense under the relaxed assumptions typically used for SGD, namely $L$-smoothness and the ABC inequality. Meanwhile, under the same assumptions, we show that Adam attains non-asymptotic sample complexity bounds similar to those of SGD.

Summary

Adam is one of the most popular optimization methods used to train deep learning models. It works well in practice because it can automatically adjust how fast it learns during training. However, until now, understanding exactly when and why Adam works has required very strong and often unrealistic mathematical assumptions. In this paper, we present a new theoretical framework that shows Adam can succeed under much more relaxed and practical conditions—similar to those needed to analyze the more basic algorithm SGD (stochastic gradient descent). Our results show that Adam not only performs well in practice but also has strong theoretical guarantees, helping bridge the gap between its empirical success and formal understanding. This work may also help researchers analyze other similar optimization methods more easily.

Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention

Stephan Rabanser, Ali Shahin Shamsabadi, Olive Franzese, Xiao Wang, Adrian Weller, Nicolas Papernot (Vector Faculty Member)

Abstract

Cautious predictions—where a machine learning model abstains when uncertain—are crucial for limiting harmful errors in safety-critical applications. In this work, we identify a novel threat: a dishonest institution can exploit these mechanisms to discriminate or unjustly deny services under the guise of uncertainty. We demonstrate the practicality of this threat by introducing an uncertainty-inducing attack called Mirage, which deliberately reduces confidence in targeted input regions, thereby covertly disadvantaging specific individuals. At the same time, Mirage maintains high predictive performance across all data points. To counter this threat, we propose Confidential Guardian, a framework that analyzes calibration metrics on a reference dataset to detect artificially suppressed confidence. Additionally, it employs zero-knowledge proofs of verified inference to ensure that reported confidence scores genuinely originate from the deployed model. This prevents the provider from fabricating arbitrary model confidence values while protecting the model’s proprietary details. Our results confirm that Confidential Guardian effectively prevents the misuse of cautious predictions, providing verifiable assurances that abstention reflects genuine model uncertainty rather than malicious intent.

Summary

When artificial intelligence (AI) systems are unsure, they often choose to “abstain” from making a prediction. This cautious behavior helps avoid harmful mistakes in high-stakes settings like medicine, finance, or criminal justice. But what if that very mechanism—meant to promote safety—could be twisted into a tool for harm? In our work, we reveal a troubling possibility: an organization could deliberately make its AI system appear uncertain for certain people—not because the task is genuinely hard, but to quietly deny them services like loans or benefits. We call this deceptive strategy Mirage, an attack that reduces the AI’s confidence in specific cases while still performing well overall. This makes it hard for outside observers to notice anything suspicious. To stop such misuse, we introduce Confidential Guardian, a new system that allows independent auditors to check whether an AI’s cautious behavior is real or artificially manufactured. It does this by analyzing how the AI behaves on trusted test cases, and verifying its behavior using a technique that ensures honesty—without revealing the model’s inner workings. Our findings highlight a hidden danger in today’s AI systems, and offer a path toward greater transparency and fairness—ensuring that caution is used for safety, not for discrimination.

Direct Motion Models for Assessing Generated Videos

Kelsey Allen (Vector Faculty Member), Carl Doersch, Guangyao Zhou, Mohammed Suhail, Danny Driess, Ignacio Rocco, Yulia Rubanova, Thomas Kipf, Mehdi S. M. Sajjadi, Kevin Murphy, Joao Carreira, Sjoerd van Steenkiste

Abstract

A current limitation of generative video models is that they generate plausible-looking frames, but poor motion — an issue that is not well captured by FVD and other popular methods for evaluating generated videos. Here we go beyond FVD by developing a metric which better measures plausible object interactions and motion. Our novel approach is based on auto-encoding point tracks and yields motion features that can be used to compare distributions of videos (as few as one generated and one ground truth, or as many as two datasets), and reconstruction errors for evaluating motion of single videos. We show that using point tracks instead of pixel reconstruction or action recognition features results in a metric which is markedly more sensitive to temporal distortions in synthetic data, and can predict human evaluations of temporal consistency and realism in generated videos obtained from open-source models better than a wide range of alternatives.

Summary

Current artificial intelligence models that create videos often make believable-looking individual frames, but the way things move in the videos isn’t very realistic. Existing ways of checking video quality don’t do a good job of spotting these poor movements, and usually require access to a whole set of videos rather than being applicable to just one. We created a new way to measure video quality that focuses specifically on how well objects move and interact. Our method works by tracking points on objects throughout the video and using this information to understand the motion. This allows us to see how realistic the movement is, even for individual videos. We found that our new approach, which uses these tracked points, is much better at detecting weird or unnatural movements in computer-generated videos compared to other methods. It also does a better job of matching what people think looks realistic and consistent in videos made by AI. Additionally, our method can help pinpoint exactly where in a video the movement looks wrong, which makes it easier to understand the errors being made.

Disparate Conditional Prediction in Multiclass Classifiers

Sivan Sabato (Vector Faculty Member), Eran Treister, Elad Yom-Tov

Abstract

We propose methods for auditing multiclass classifiers for fairness under multiclass equalized odds, by estimating the deviation from equalized odds when the classifier is not completely fair. We generalize to multiclass classifiers the measure of Disparate Conditional Prediction (DCP), originally suggested by Sabato & Yom-Tov (2020) for binary classifiers. DCP is defined as the fraction of the population for which the classifier predicts with conditional prediction probabilities that differ from the closest common baseline. We provide new local-optimization methods for estimating the multiclass DCP under two different regimes, one in which the conditional confusion matrices for each protected sub-population are known, and one in which these cannot be estimated, for instance, because the classifier is inaccessible or because good-quality individual-level data is not available. These methods can be used to detect classifiers that likely treat a significant fraction of the population unfairly. Experiments demonstrate the accuracy of the methods. The code for the experiments is provided as supplementary material.

Summary

Many machine learning systems make decisions that affect people’s lives, like approving loans or recommending medical treatments. When access to the underlying system is difficult, it becomes harder to check whether such systems are treating all groups of people fairly. Moreover, existing fairness checks often do not address cases where these systems handle more than two possible outcomes. We introduce new methods to audit these multiclass decision systems for fairness. We build on a fairness measure called Disparate Conditional Prediction (DCP), which looks at how many people receive predictions that differ from a fair baseline. We extend this measure to work with systems that support more than two outcomes, and provide two ways to estimate the DCP, one for cases in which we have detailed data about how the system behaves for different groups, and the other for cases when we do not have access to the system or high-quality individual data. These tools make it easier to detect when a decision-making system is likely treating a significant portion of the population unfairly, helping organizations and regulators identify and address bias and leading to fairer outcomes for everyone.

Fast Exact Unlearning of Fine-Tuning Data for LLMs

Andrei Muresanu, Anvith Thudi, Michael Zhang, Nicolas Papernot (Vector Faculty Member)

Abstract

Modern machine learning models are expensive to train, and there is a growing concern about the challenge of retroactively removing specific training data. Achieving exact unlearning in deep learning pipelines—producing models as if certain data had never been included in training—remains an open problem. In this paper, we revisit exact unlearning in deep learning and show that for large language models (LLMs) we can efficiently exactly unlearn “fine-tuning data” (the data used to adapt a pre-trained model). This follows from two observations. First, we can use in-context learning to adapt the LLM to the fine-tuning dataset instead of SGD based algorithms. Second, we show that accurate in-context learning can be done with quantized k-means, which allows for effectively constant time unlearning operations. Our empirical evaluation shows that this unlearning recipe has similar performance to fine-tuning alternatives, but vastly reduces the unlearning costs. Our study also highlights the need for new measures of unlearning cost when adapting the learning algorithm to have faster unlearn operations.

Summary

After deploying a model, it may become necessary to “unlearn” some of the original training data. Exactly unlearning training data has been expensive for deep learning, and in this paper we showed that it can be efficient when adapting a pre-trained LLM to a task. This followed from observing that a sometimes effective learning algorithm is prepending training examples to the prompt given to an LLM. We studied ways of unlearning this selection of examples, and found we could do so with costs independent of the model and dataset size. We also observed that past efforts to make unlearning faster increased inference cost, and proposed new metrics to capture this trade-off.
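
A rough sketch of the general recipe, with details simplified and names invented for illustration: examples are embedded, quantized with k-means into buckets, retrieved as in-context demonstrations at query time, and unlearned by deleting them from their bucket, leaving the LLM's weights untouched.

```python
import numpy as np

class QuantizedExampleStore:
    """Sketch (not the paper's exact recipe): fine-tuning examples are embedded
    and assigned to k-means centroids. Prompting retrieves examples from the
    query's centroid as in-context demonstrations, so "unlearning" an example
    is just deleting it from its bucket; no model update is needed."""

    def __init__(self, embeddings, texts, k=8, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        self.centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
        for _ in range(iters):                       # plain k-means over embeddings
            assign = self._assign(embeddings)
            for c in range(k):
                if np.any(assign == c):
                    self.centroids[c] = embeddings[assign == c].mean(axis=0)
        assign = self._assign(embeddings)
        self.buckets = {c: [(texts[i], embeddings[i]) for i in np.nonzero(assign == c)[0]]
                        for c in range(k)}

    def _assign(self, emb):
        d = ((emb[:, None, :] - self.centroids[None, :, :]) ** 2).sum(-1)
        return d.argmin(axis=1)

    def retrieve(self, query_emb, m=4):
        # Examples to prepend to the prompt (in-context adaptation, no SGD).
        c = self._assign(query_emb[None, :])[0]
        return [t for t, _ in self.buckets[c][:m]]

    def unlearn(self, text, emb):
        # Unlearning touches only one bucket; the LLM itself is never updated.
        c = self._assign(emb[None, :])[0]
        self.buckets[c] = [(t, e) for t, e in self.buckets[c] if t != text]

# Toy usage with random embeddings standing in for an LLM embedding model.
rng = np.random.default_rng(1)
emb = rng.normal(size=(100, 16))
texts = [f"example {i}" for i in range(100)]
store = QuantizedExampleStore(emb, texts)
print(store.retrieve(rng.normal(size=16)))
store.unlearn("example 3", emb[3])
```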

FedOne: Query-Efficient Federated Learning for Black-box Discrete Prompt Learning

Ganyu Wang, Jinjie Fang, Maxwell (Juncheng) Yin, Bin Gu, Xi Chen, Boyu Wang (Vector Faculty Affiliate), Yi Chang, Charles X. Ling

Abstract

Black-Box Discrete Prompt Learning (BDPL) is a prompt-tuning method that optimizes discrete prompts without accessing model parameters or gradients, making the prompt tuning on a cloud-based Large Language Model (LLM) feasible. Adapting Federated Learning (FL) to BDPL could further enhance prompt tuning performance by leveraging data from diverse sources. However, all previous research on federated black-box prompt tuning had neglected the substantial query cost associated with the cloud-based LLM service. To address this gap, we conducted a theoretical analysis of query efficiency within the context of federated black-box prompt tuning. Our findings revealed that degrading FedAvg to activate only one client per round, a strategy we called *FedOne*, enabled optimal query efficiency in federated black-box prompt learning. Building on this insight, we proposed the FedOne framework, a federated black-box discrete prompt learning method designed to maximize query efficiency when interacting with cloud-based LLMs. We conducted numerical experiments on various aspects of our framework, demonstrating a significant improvement in query efficiency, which aligns with our theoretical results.

Summary

Large language models like ChatGPT are often accessed through paid services that don’t let users see or change the model’s internal components. To customize these models for specific tasks, users must repeatedly “query” them, which is both costly and slow. This paper explores how many users can work together to fine-tune these models without sharing their data, using a method called federated learning. But in this setup, the cost multiplies: each participating user has to make many queries to the LLM, making the approach impractical. We introduce FedOne, a new approach that trains the model by activating only one user at a time. Our analysis shows that this setup is not only far more efficient in reducing expensive queries but also retains strong performance. We tested this idea on real-world tasks using models like GPT-3.5 and showed that FedOne is both effective and cost-efficient. FedOne makes it easier for people and organizations to adapt powerful AI tools to their needs at a lower cost.
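
The core idea is easy to sketch: run a FedAvg-style loop but activate a single client per round, so the number of black-box queries per round stays fixed. The sketch below is illustrative only; the clients here are toy objectives, and the black-box gradient estimation used for discrete prompts is omitted.

```python
import numpy as np

def fed_one(clients, params, rounds=100, lr=0.1, seed=0):
    """FedAvg degraded to one active client per round. `clients` is a list of
    callables returning a locally estimated update for the shared parameters
    (the query-based estimation details are omitted in this sketch)."""
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        client = clients[rng.integers(len(clients))]   # activate a single client
        update = client(params)                        # local, query-based estimate
        params = params - lr * update                  # server step
    return params

# Toy demo: each "client" holds a different quadratic objective.
rng = np.random.default_rng(1)
targets = [rng.normal(size=5) for _ in range(10)]
clients = [(lambda p, t=t: p - t) for t in targets]    # gradient of ||p - t||^2 / 2
final = fed_one(clients, params=np.zeros(5))
print(final)   # drifts toward the average of the client targets
```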

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts *

Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alan Aspuru-Guzik (Vector Faculty Member), Arnaud Doucet, Rob Brekelmans (Vector Distinguished Postdoctoral Fellow), Alexander Tong, Kirill Neklyudov

Abstract

While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling inference-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional ‘corrector’ steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation.

Summary

Diffusion models are powerful tools for generating data like images, molecules, or text, but it is generally difficult to control their generation process. This paper introduces a method called Feynman-Kac Correctors (FKC), which allows for precise control over what a diffusion model generates without retraining it. FKC works by adjusting the way samples are drawn from the model, based on the Sequential Monte Carlo framework and, in particular, the Feynman-Kac formula. This enables a principled approach to sampling from combined target distributions, like mixtures or products of multiple pretrained models, or temperature-annealed target distributions. We show that FKC improves sampling in three settings: 1. classifier-free guidance, which is widely used in text-to-image generation, 2. generating molecules that satisfy multiple objectives (binding to two proteins simultaneously) and 3. sampling from physical systems at different temperatures using a model trained at a single temperature. Unlike traditional methods, FKC allows for flexible and efficient sampling with little added computation. This opens up new possibilities for applications in AI, drug discovery, and scientific simulations.

Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator *

Yu Li, Felix Dangel (Vector Distinguished Postdoctoral Fellow), Derek Tam, Colin Raffel (Vector Faculty Member)

Abstract

The diagonal of a model’s Fisher Information Matrix (the “Fisher”) has frequently been used as a way to measure parameter sensitivity. Typically, the Fisher is estimated by computing the squared gradient of the model’s outputs with respect to its parameters, averaged over a few hundred or thousand examples — a process which incurs nontrivial computational costs. At the same time, adaptive gradient methods like the ubiquitous Adam optimizer compute a moving average of the squared gradient over the course of training. This paper therefore explores whether an approximation of the Fisher can be obtained “for free” by recycling the squared gradient accumulator that has already been computed over the course of training. Through a comprehensive set of experiments covering five applications of the Fisher, we demonstrate that the “Squisher” (**Squ**ared gradient accumulator as an approximation of the F**isher**) consistently performs similarly to the Fisher while outperforming baseline methods. Additionally, we clarify the exact differences between the Squisher and the Fisher and provide empirical quantification of their respective impact.

Summary

Understanding which parts of a neural network are most important (i.e., which parameters matter most) can help with tasks like model merging, pruning, transfer learning, and continual learning. A popular tool for this is the diagonal of the Fisher Information Matrix, which we refer to as the Fisher. But calculating it can be expensive — it requires extra computation on hundreds or thousands of examples. In this paper, we ask if we can get a good-enough version of the Fisher without paying the full price. Surprisingly, the answer is yes. During training, widely used optimizers like Adam already keep track of a similar quantity: the squared gradients of the model’s parameters. This approximation, which we call Squisher (**Squ**ared gradient accumulator as an approximation of the F**isher**), requires no extra computation or memory and is readily available “for free.” Across five common applications of the Fisher, we show that Squisher produces results comparable to the original Fisher method, but with significantly lower computational cost. It saves time and resources, making it easier to apply these techniques at scale.
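
In PyTorch, Adam already stores the squared-gradient accumulator in its optimizer state, so the recycling idea can be sketched in a few lines (toy model and data for illustration; this is not the paper's experimental code):

```python
import torch
import torch.nn as nn

# Ordinary training with Adam on a toy model.
model = nn.Linear(10, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# "Fisher for free": recycle the squared-gradient accumulator (`exp_avg_sq`)
# that Adam already maintains, instead of re-estimating the Fisher diagonal
# with extra forward/backward passes.
squisher = {name: opt.state[p]["exp_avg_sq"].clone()
            for name, p in model.named_parameters()}
print({k: v.shape for k, v in squisher.items()})
```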

Galileo: Learning Global & Local Features of Many Remote Sensing Modalities

Gabriel Tseng, Anthony Fuller, Marlena Reil, Henry Herzog, Patrick Beukema, Favyen Bastani, James Green, Evan Shelhamer (Vector Faculty Member), Hannah Kerner, David Rolnick

Abstract

We introduce a highly multimodal transformer to represent many remote sensing modalities – multispectral optical, synthetic aperture radar, elevation, weather, pseudo-labels, and more – across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the diversity of relevant data modalities, and because objects of interest vary massively in scale, from small boats (1-2 pixels and fast) to glaciers (thousands of pixels and slow). We present a novel self-supervised learning algorithm that extracts multi-scale features across a flexible set of input modalities through masked modeling. Our dual global and local contrastive losses differ in their targets (deep representations vs. shallow input projections) and masking strategies (structured vs. not). Our Galileo is a single generalist model that outperforms SoTA specialist models for satellite images and pixel time series across eleven benchmarks and multiple tasks.

Summary

 We capture a lot of information about our planet from “remote sensing data” (satellite observations, topographic maps, and more) but we know less than you might think. Analyzing remote sensing data with machine learning can help us better understand our changing planet. We present a machine learning model — which we call Galileo — that can help summarize remote sensing data. This means that with minimal further processing, its summaries can help make predictions and maps, like of floods or agricultural fields. We achieve this by giving Galileo an incomplete set of data for a time and place, and having it reconstruct what we removed. By being careful about exactly what we ask Galileo to reconstruct, we can make sure Galileo’s summaries take into account big and slow things (like glaciers) as well as small and fast things (like fishing boats). Galileo is uniquely relevant to remote sensing in practice by its modeling of data across space, time, and a variety of data types (e.g. optical data from satellites, topographic maps, weather data and more). We test Galileo on 15 diverse tasks against 11 other methods: Galileo performs best with a single general model. This makes it immediately useful in many existing applications.

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It *

Marvin F. da Silva, Felix Dangel (Vector Distinguished Postdoctoral Fellow), Sageev Oore (Vector Faculty Affiliate)

Abstract

The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We argue that existing sharpness measures fail for transformers, because they have much richer symmetries in their attention mechanism that induce directions in parameter space along which the network or its loss remain identical. We posit that sharpness must account fully for these symmetries, and thus we redefine it on a quotient manifold that results from quotienting out the transformer symmetries, thereby removing their ambiguities. Leveraging tools from Riemannian geometry, we propose a fully general notion of sharpness, in terms of a geodesic ball on the symmetry-corrected quotient manifold. In practice, we need to resort to approximating the geodesics. Doing so up to first order yields existing adaptive sharpness measures, and we demonstrate that including higher-order terms is crucial to recover correlation with generalization. We present results on diagonal networks with synthetic data, and show that our geodesic sharpness reveals strong correlation for real-world transformers on both text and image classification tasks.

Summary

In deep learning, understanding why some neural networks make better predictions than others is an important problem. One popular idea to explain this is called sharpness. Sharpness looks at the shape of the network’s loss landscape, a kind of landscape showing how good or bad the network is doing depending on small changes in its internal parameters. Generally, if this landscape is “flat,” it means small changes don’t hurt performance much, and the model is more likely to generalize well to data it has not seen before. This idea works well for older types of neural networks like MLPs (multilayer perceptrons) and CNNs (convolutional neural networks). But for transformers this relationship breaks down. Researchers have found that sharpness, as it’s usually measured, doesn’t reliably predict whether a transformer will generalize well. We argue that the problem isn’t with the idea of sharpness itself, but with how it’s measured in transformers. Transformers have a lot of ways you can change their internal parameters without actually changing how the model behaves (symmetries). These symmetries confuse traditional sharpness measurements. Using tools from differential geometry, we introduce a more accurate definition of sharpness that takes these symmetries into account, finding that once we correct for these symmetries, sharpness is still a useful concept.

Homophily Enhanced Graph Domain Adaptation

Ruiyi Fang, Bingheng Li, Jingyu Zhao, Ruizhi Pu, QIUHAO Zeng, Gezheng Xu, Charles X. Ling, Boyu Wang (Vector Faculty Affiliate)

Abstract

Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. In this paper, we highlight the significance of graph homophily, a pivotal factor for graph domain alignment, which, however, has long been overlooked in existing approaches. Specifically, our analysis first reveals that homophily discrepancies exist in benchmarks. Moreover, we also show that homophily discrepancies degrade GDA performance from both empirical and theoretical aspects, which further underscores the importance of homophily alignment in GDA. Inspired by this finding, we propose a novel homophily alignment algorithm that employs mixed filters to smooth graph signals, thereby effectively capturing and mitigating homophily discrepancies between graphs. Experimental results on a variety of benchmarks verify the effectiveness of our method.

Summary

Graphs are powerful ways to represent complex relationships, like how people interact on social networks or how information flows across the internet. In many real-world situations, useful information (like labels or categories) exists for one graph but not for another. Graph Domain Adaptation (GDA) helps transfer this knowledge from one graph to another, saving time and resources. In our research, we discovered that a key factor called homophily (the tendency for connected nodes to be similar) is often different between graphs, and this mismatch can hurt GDA’s performance. Surprisingly, this issue has largely been ignored until now. We studied how these differences affect results and found that aligning this similarity across graphs can make a big difference. We developed a new method to smooth out these differences and improve how well knowledge transfers between graphs. Our approach works well across various datasets, showing promise for improving learning from graph data in many applications, from recommendation systems to social networks.

Improving Robustness to Subpopulation Shifts by Heuristic Subspace Exploration with Enhanced Diversification

Nguyen Nhat Minh To, Paul Wilson, Viet Nguyen, Mohamed Harmanani, Michael Cooper, Fahimeh Fooladgar, Purang Abolmaesumi, Parvin Mousavi (Vector Faculty Member), Rahul G. Krishnan (Vector Faculty Member)

Abstract

Subpopulation shifts, characterized by disparities in subpopulation distributions between training and target datasets, can significantly degrade the performance of the machine learning model. Current solutions to subpopulation shifts often involve modifying empirical risk minimization with re-weighting strategies to improve generalization across subpopulations. This strategy often relies on assumptions about the number and nature of subpopulations and annotations of subpopulation membership, which are unavailable for many real-world datasets. We propose a new solution to heuristically explore the feature subspace: we train many classifiers while enforcing diversification to promote the discovery and correct classification of new subpopulations without requiring prior knowledge of the subpopulations. Given a feature extractor network, we replace its standard linear layer with a mixture of prototypical classifiers, where each member is trained to classify the data while focusing on different features and samples compared to the other members. We demonstrate that our solution outperforms the prior state-of-the-art in worst-group accuracy on most benchmarks using empirical evaluations across nine real-world datasets covering diverse domains and subpopulation shift types (code is available at https://anonymous.4open.science/r/prototypical_ensembles-BCB3).

Summary

Machine learning models often struggle when they encounter situations that differ slightly from what they were trained on. This is a major issue when data includes hidden subgroups, such as different types of people, environments, or medical conditions, that are not equally represented. For example, a model trained mostly on healthy patients might not work well on those with rare diseases. Our research introduces a new technique called the Diversified Prototypical Ensemble (DPE) to tackle this problem. Instead of using just one model, we create a group of simple classifiers called prototypes. Each one learns to focus on different patterns or features in the data. We encourage these classifiers to be as different as possible, so together they can cover a broader variety of hidden subgroups. The key benefit of DPE is that it does not require prior knowledge of the subgroups. It can automatically discover and adapt to them using only the data itself. This makes it especially useful in real-world situations where such subgroup labels are missing or hard to define. Across nine challenging datasets, our method consistently outperforms existing solutions and helps make machine learning models more fair and reliable when used in diverse populations.
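
A minimal sketch of the flavour of this design, with hyperparameters and the exact diversification term chosen for illustration rather than taken from the paper: a frozen feature extractor's linear head is replaced by several prototype-based classifiers, and a penalty on prototype similarity pushes the members apart.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypicalEnsembleHead(nn.Module):
    """Several prototype-based classifiers over shared features. A diversity
    penalty discourages members from relying on the same prototypes, so
    different members can latch onto different (unlabelled) subpopulations."""

    def __init__(self, feat_dim, num_classes, num_members=4):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_members, num_classes, feat_dim))

    def forward(self, feats):
        # cosine similarity between features and each member's class prototypes
        f = F.normalize(feats, dim=-1)                       # (B, D)
        p = F.normalize(self.prototypes, dim=-1)             # (M, C, D)
        return torch.einsum("bd,mcd->mbc", f, p)             # (M, B, C) logits

    def diversity_penalty(self):
        p = F.normalize(self.prototypes.flatten(1), dim=-1)  # (M, C*D)
        sim = p @ p.T                                        # member-vs-member similarity
        off_diag = sim - torch.diag(torch.diag(sim))
        return off_diag.abs().mean()

# Toy training step on frozen features.
head = PrototypicalEnsembleHead(feat_dim=16, num_classes=3)
feats, labels = torch.randn(32, 16), torch.randint(0, 3, (32,))
logits = head(feats)
ce = sum(F.cross_entropy(member_logits, labels) for member_logits in logits) / logits.shape[0]
loss = ce + 0.1 * head.diversity_penalty()
loss.backward()
```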

Interaction-Aware Gaussian Weighting for Clustered Federated Learning

Alessandro Licciardi, Davide Leo, Eros Fanì, Barbara Caputo, Marco Ciccone (Vector Distinguished Postdoctoral Fellow)

Abstract

Federated Learning (FL) emerged as a decentralized paradigm to train models while preserving privacy. However, conventional FL struggles with data heterogeneity and class imbalance, which degrade model performance. Clustered FL balances personalization and decentralized training by grouping clients with analogous data distributions, enabling improved accuracy while adhering to privacy constraints. This approach effectively mitigates the adverse impact of heterogeneity in FL. In this work, we propose a novel clustering method for FL, **FedGWC** (Federated Gaussian Weighting Clustering), which groups clients based on their data distribution, allowing training of a more robust and personalized model on the identified clusters. **FedGWC** identifies homogeneous clusters by transforming individual empirical losses to model client interactions with a Gaussian reward mechanism. Additionally, we introduce the *Wasserstein Adjusted Score*, a new clustering metric for FL to evaluate cluster cohesion with respect to the individual class distribution. Our experiments on benchmark datasets show that **FedGWC** outperforms existing FL algorithms in cluster quality and classification accuracy, validating the efficacy of our approach.

Summary

Training AI models usually requires centralizing vast amounts of data, which raises privacy concerns. Federated Learning (FL) offers a solution by allowing edge devices or institutions – such as smartphones and hospitals – to train a shared model collaboratively without sending their private data to a central server. However, real-world data is often messy: different devices might have very diverse types of data, or some data categories might be rare on some devices whilst common on others. This *data heterogeneity* makes it hard for FL models to perform well across all devices. Our work introduces **FedGWC**, a new method to make FL training more effective. Instead of forcing all devices to train one model, FedGWC groups devices with similar data characteristics into clusters, allowing each cluster to train its specialized model, which is much better suited to the data within that group. Think of it like organizing a study group: instead of everyone studying the same broad topic, smaller groups form to focus on specific subjects they all need help with. FedGWC does this by analyzing how well each device’s model learns from its own data without actually looking at the data itself. We also developed a new way to measure how good these clusters are, especially when some data categories are much rarer than others. Our experiments show that FedGWC significantly improves the accuracy of models in FL setups, especially when data is diverse and unevenly distributed. This means we can build more powerful and personalized AI applications while preserving sensitive private information.

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On *

Ken Ziyu Liu, Christopher A. Choquette Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Nicolas Papernot (Vector Faculty Affiliate), Percy Liang

Abstract

An important question today is whether a given text was used to train a large language model (LLM). A *completion* test is often employed: check if the LLM completes a sufficiently complex text. But we require a ground-truth definition of membership; most commonly, a text is defined as a member based on the $n$-gram overlap between the target text and any text in the dataset. In this work, we demonstrate that this $n$-gram based membership definition can be effectively gamed. We study scenarios where sequences are *non-members* for a given $n$ and we find that completion tests still succeed. We find many natural cases of this by retraining LLMs after removing all training samples that were completed: these cases include exact duplicates, near-duplicates, and even short overlaps; they showcase that it is difficult to find a single viable choice of $n$. Using these insights, we design adversarial datasets that can cause any target sequence to be completed without containing it, for any reasonable choice of $n$. Our findings highlight the inadequacy of $n$-gram membership, suggesting membership definitions fail to account for auxiliary information available to the training algorithm.

Summary

What exactly do we mean by “training set inclusion” under language models? A vast body of work—across research, policy, and even lawsuits—has implicitly converged on definitions based on $n$-gram (substring) overlap. That is, a piece of text is considered a “member” of the training set if some span of that text ($n$-gram) can be found in the training set. This paper is a tale of two experiments that demonstrates the fundamental limitations of all $n$-gram based membership definitions. We ask two questions from the lens of (verbatim) text completion with a language model: 1. **Deletion:** can we *prevent* the verbatim generation of a text by deleting all of its n-grams and retraining the model from scratch? The answer is no! Many deleted texts can still be generated verbatim by the retrained LLM. 2. **Addition:** can we *cause* the verbatim generation of a text by training on texts with no n-gram overlap? The answer is yes! And it only takes a few gradient steps of fine-tuning. The key message of this work is that data membership in LLMs extends beyond set membership of text in the raw dataset; it also encompasses data neighborhoods (“soft membership”) due to LLM generalization, data provenance, preprocessing, and other auxiliary information that the training algorithm gets access to throughout the ML pipeline. Many subfields, such as copyright, unlearning, membership inference, and data transparency, require a membership definition, and our work shows overly simplistic notions of membership hinder progress in these areas.
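
For concreteness, here is a minimal sketch of the $n$-gram membership definition being critiqued (whitespace tokenization and the toy corpus are our simplifications, not the paper's setup). Note how the verdict flips as $n$ changes, echoing the difficulty of picking a single viable $n$.

```python
def ngram_member(target_tokens, corpus_token_lists, n=8):
    """A target text counts as a "member" if any of its n-token spans appears
    verbatim somewhere in the training corpus (tokenization simplified)."""
    corpus_ngrams = set()
    for doc in corpus_token_lists:
        corpus_ngrams.update(tuple(doc[i:i + n]) for i in range(len(doc) - n + 1))
    return any(tuple(target_tokens[i:i + n]) in corpus_ngrams
               for i in range(len(target_tokens) - n + 1))

corpus = [text.split() for text in [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "language models may verbatim complete text they were not trained on",
]]
target = "models may verbatim complete text they were not explicitly trained on".split()
print(ngram_member(target, corpus, n=8))   # True: an 8-token span is shared
print(ngram_member(target, corpus, n=9))   # False: no 9-token span overlaps
```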

LAST SToP for Modeling Asynchronous Time Series

Shubham Gupta, Thibaut Durand, Graham Taylor (Vector Faculty Member), Lilian Bialokozowicz

Abstract

We present a novel prompt design for Large Language Models (LLMs) tailored to **Asynchronous Time Series**. Unlike regular time series, which assume values at evenly spaced time points, asynchronous time series consist of events occurring at irregular intervals, each described in natural language. Our approach effectively utilizes the rich natural language of event descriptions, allowing LLMs to benefit from their broad world knowledge for reasoning across different domains and tasks. This allows us to extend the scope of asynchronous time series analysis beyond forecasting to include tasks like anomaly detection and data imputation. We further introduce **Stochastic Soft Prompting**, a novel prompt-tuning mechanism that significantly improves model performance, outperforming existing finetuning methods such as QLORA. Through extensive experiments on real-world datasets, we demonstrate that our approach achieves state-of-the-art performance across different tasks and datasets.

Summary

Most AI systems analyze data that arrives at regular intervals, like daily stock prices or hourly temperature readings. But many real-world events happen unpredictably — like medical emergencies, social media posts, or equipment failures — and are described in natural language rather than just numbers. Traditional methods struggle with this “asynchronous time series” data because they can’t handle irregular timing and rich text descriptions together. We developed LASTS, a new approach that uses Large Language Models to analyze these irregular event sequences. Instead of forcing events into rigid categories, our method preserves their natural language descriptions, allowing the AI to use its understanding of language and world knowledge. We also created “Stochastic Soft Prompting,” a finetuning technique that helps the LLMs understand our specific domain data much better than other famous finetuning techniques. Our approach significantly outperforms existing methods across multiple real-world datasets. This makes sophisticated time series analysis more accessible and could improve applications in healthcare monitoring, financial analysis, and social media understanding, helping organizations better predict and respond to irregular but important events.

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

Seyed Mohammad Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal (Vector Faculty Member), Renjie Liao (Vector Faculty Member)

Abstract

Advances in Large Language Models (LLMs) have sparked interest in their ability to solve Olympiad-level math problems. However, the training and evaluation of these models are constrained by the limited size and quality of available datasets, as creating large-scale data for such advanced problems requires extensive effort from human experts. In addition, current benchmarks are prone to contamination, leading to unreliable evaluations. In this paper, we present an automated pipeline that leverages the rich resources of the Art of Problem Solving (AoPS) forum, which predominantly features Olympiad-level problems and community-driven solutions. Using open-source LLMs, we develop a method to extract question-answer pairs from the forum, resulting in **AoPS-Instruct**, a dataset of more than 600,000 high-quality QA pairs. Our experiments demonstrate that fine-tuning LLMs on AoPS-Instruct improves their reasoning abilities across various benchmarks. Moreover, we build an automatic pipeline that introduces **LiveAoPSBench**, an evolving evaluation set with timestamps, derived from the latest forum data, providing a contamination-resistant benchmark for assessing LLM performance. Notably, we observe a significant decline in LLM performance over time, suggesting their success on older examples may stem from pre-training exposure rather than true reasoning ability. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning, offering valuable insights into the capabilities and limitations of LLMs in this domain.

Summary

Most existing LLMs struggle with advanced math problems because there is very little high‑quality training data for Olympiad‑level questions, and existing benchmarks often include problems the models have already seen during pre‑training, making evaluations unreliable. To address this, we built an automated pipeline that mines the Art of Problem Solving forum for genuine competition‑level problems and community‑provided solutions, then uses open‑source LLMs to extract and clean more than 600,000 question–answer pairs, creating the AoPS‑Instruct dataset. We also developed LiveAoPSBench, an evolving evaluation set drawn from the latest forum posts, which filters out any overlap with earlier data to avoid contamination. By fine‑tuning various LLMs on AoPS‑Instruct, we observed marked improvements in their ability to solve challenging math problems. Furthermore, tracking performance over time on LiveAoPSBench revealed that many models perform worse on newer questions, indicating that past successes often stemmed from having seen similar problems during pre‑training rather than genuine reasoning skills. This work offers a scalable way to generate and maintain large, reliable datasets for advanced mathematical reasoning, helping researchers better understand and push the true capabilities of LLMs in this domain.

Leveraging Per-Instance Privacy for Machine Unlearning

Naz Sepahvand, Anvith Thudi, Berivan Isik, Ashmita Bhattacharyya, Nicolas Papernot (Vector Faculty Member), Eleni Triantafillou, Daniel Roy (Vector Faculty Member), Gintare Karolina Dziugaite

Abstract

We present a principled, per-instance approach to quantifying the difficulty of unlearning via fine-tuning. We begin by sharpening an analysis of noisy gradient descent for unlearning (Chien et al., 2024), obtaining a better utility–unlearning tradeoff by replacing worst-case privacy loss bounds with per-instance privacy losses (Thudi et al., 2024), each of which bounds the (Renyi) divergence to retraining without an individual data point. To demonstrate the practical applicability of our theory, we present empirical results showing that our theoretical predictions are borne out both for Stochastic Gradient Langevin Dynamics (SGLD) as well as for standard fine-tuning without explicit noise. We further demonstrate that per-instance privacy losses correlate well with several existing data difficulty metrics, while also identifying harder groups of data points, and introduce novel evaluation methods based on loss barriers. Altogether, our findings provide a foundation for more efficient and adaptive unlearning strategies tailored to the unique properties of individual data points.

Summary

In scenarios such as complying with legislation or handling corrupted training data, a model trainer may be required to “forget” some part of their training dataset. We make the connection that a metric derived from statistics collected during training can be predictive of how hard it will be to forget a datapoint. Theoretically, we prove that this metric provides an upper bound on how many steps of gradient descent are required to forget a datapoint. Empirically, we find that across training setups, this metric accurately ranks datapoints by how many gradient descent steps they require to be forgotten. Moreover, we find our proposed metric discovers harder-to-forget datapoints compared to past approaches to identifying difficult data points.

MedRAX: Medical Reasoning Agent for Chest X-ray

Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang (Vector Faculty Member)

Abstract

As a cornerstone of diagnostic imaging, chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. Data and code will be publicly available at https://medrax25.github.io.

Summary

Chest X-ray interpretation is a critical but labor-intensive task in medicine. Existing artificial intelligence (AI) tools often function as standalone applications, which restricts their integration into comprehensive clinical workflows. Moreover, current general-purpose AI models, despite their advancements, may not consistently provide the multi-step analytical capabilities or the transparent decision-making processes required in medical diagnostics. We have developed MedRAX, an AI framework designed to overcome these limitations in chest X-ray analysis. MedRAX operates by coordinating a suite of specialized AI tools, each proficient in specific tasks such as disease detection, identifying and outlining anatomical structures, or answering detailed image-based questions. The system dynamically selects and sequences these tools, integrating their outputs to address complex medical queries without requiring retraining of the core framework when tools are added or modified. This approach enables MedRAX to offer more accurate, detailed, and interpretable analyses of chest X-rays compared to existing methods, representing a significant advancement towards the practical application of AI in radiology. The system aims to improve diagnostic efficiency, reduce potential for error, and increase the clarity of AI-driven insights, thereby supporting medical professionals and potentially enhancing patient care through more robust AI assistance.

MixMin: Finding Data Mixtures via Convex Minimization

Anvith Thudi, Evianne Rovers, Yangjun Ruan, Tristan Thrush, Chris Maddison (Vector Faculty Member)

Abstract

Modern machine learning pipelines are increasingly combining and mixing data from diverse and disparate sources, e.g., pre-training large language models. Yet, finding the optimal data mixture is a challenging and open problem. We formalize this data mixing problem as a bi-level objective: the best mixture is the one that would lead to the best model for a downstream objective. Unfortunately, this objective is generally intractable. In this paper, we make the observation that the bi-level data mixing objective becomes convex as our model class becomes larger. We develop and study a gradient-based approach for optimizing this convex objective, which we call MixMin, and test it on language modeling and chemistry tasks. MixMin was the only method that uniformly improved the data mixture in all our experiments. With MixMin, we improved the data mixture using less than 0.2% additional compute for a Pythia-410M model trained on 8.2B tokens, resulting in a 1–5% relative improvement in negative log likelihood on PIQA, ARC Easy, SciQ, and OpenWebMath. Crucially, we found that MixMin mixtures for smaller models improved training of larger models, suggesting that MixMin mixtures may be scale-invariant. When mixing bioassay data to train an XGBoost model, we saw improvements to average precision scores of 0.03–0.15.

Summary

Performant machine learning requires having a relevant dataset for the task you want to learn. When given many sources of data, the problem of knowing how to make a good dataset from these sources poses a typically hard optimization problem. In this paper we showed this optimization can be simplified if we first train a (cheap) model on each of our sources of data. With this we provided a method to make better datasets, leading to improvements on language modeling and chemistry tasks. Our work paves a way for finding useful datasets for typically data-scarce tasks.
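
To give a flavour of the "train a cheap model per source, then mix" step, here is a hedged sketch that chooses simplex weights by minimizing the negative log-likelihood of a mixture of proxy-model predictions on downstream validation data. This is one way the convex mixing step could look, not necessarily the exact MixMin objective or algorithm.

```python
# Hedged sketch of the "cheap proxy per source, then convex mixing" idea (not the exact
# MixMin procedure). proxy_probs[k, i] is the probability that the proxy model trained
# on source k assigns to the correct label of downstream validation example i. We pick
# simplex weights w minimizing the mixture's NLL via exponentiated-gradient updates.
import numpy as np

def mix_weights(proxy_probs, steps=500, eta=0.5):
    K, N = proxy_probs.shape
    w = np.full(K, 1.0 / K)
    for _ in range(steps):
        mix = w @ proxy_probs                      # (N,): mixture probability of the true label
        grad = -(proxy_probs / mix).mean(axis=1)   # gradient of the mean NLL w.r.t. w
        w = w * np.exp(-eta * grad)                # multiplicative-weights update
        w /= w.sum()                               # renormalize onto the simplex
    return w

# Toy usage: three synthetic sources, one clearly more aligned with the target task.
rng = np.random.default_rng(0)
alignment = np.array([2.0, 6.0, 3.0])[:, None]     # hypothetical per-source alignment
proxy_probs = np.clip(rng.beta(alignment, 2.0, size=(3, 1000)), 1e-6, 1.0)
w = mix_weights(proxy_probs)                       # the best-aligned source gets the largest weight
```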

Multi-Session Budget Optimization for Forward Auction-based Federated Learning

Xiaoli Tang, Han Yu, Zengxiang Li, Xiaoxiao Li (Vector Faculty Member)

Abstract

Auction-based Federated Learning (AFL) has emerged as an important research field in recent years. The prevailing strategies for FL data consumers (DCs) assume that the entire team of the required data owners (DOs) for an FL task must be assembled before training can commence. In practice, a DC can trigger the FL training process multiple times. DOs can thus be gradually recruited over multiple FL model training sessions. Existing bidding strategies for AFL DCs are not designed to handle such scenarios. Therefore, the problem of multi-session AFL remains open. To address this problem, we propose the Multi-session Budget Optimization Strategy for forward Auction-based Federated Learning (MBOS-AFL). Based on hierarchical reinforcement learning, MBOS-AFL jointly optimizes inter-session budget pacing and intra-session bidding for AFL DCs, with the objective of maximizing the total utility. Extensive experiments on six benchmark datasets show that it significantly outperforms seven state-of-the-art approaches. On average, MBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy achieved by the resulting FL model compared to the best baseline. To the best of our knowledge, it is the first budget optimization decision support method with budget pacing capability designed for DCs in multi-session forward auction-based FL.

Summary

Auction-based Federated Learning (AFL) has emerged as an important research field in recent years. The prevailing strategies for FL data consumers (DCs) assume that the entire team of the required data owners (DOs) for an FL task must be assembled before training can commence. In practice, a DC can trigger the FL training process multiple times. DOs can thus be gradually recruited over multiple FL model training sessions. Existing bidding strategies for AFL DCs are not designed to handle such scenarios. Therefore, the problem of multi-session AFL remains open. To address this problem, we propose the Multi-session Budget Optimization Strategy for forward Auction-based Federated Learning (MBOS-AFL). Based on hierarchical reinforcement learning, MBOS-AFL jointly optimizes intersession budget pacing and intra-session bidding for AFL DCs, with the objective of maximizing the total utility. Extensive experiments on six benchmark datasets show that it significantly outperforms seven state-of-the-art approaches. On average, MBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy achieved by the resulting FL model compared to the best baseline. To the best of our knowledge, it is the first budget optimization decision support method with budget pacing capability designed for DCs in multi-session forward AFL.

On the Importance of Gaussianizing Representations

Daniel Eftekhari, Vardan Papyan (Vector Faculty Affiliate)

Abstract

The normal distribution plays a central role in information theory – it is at the same time the best-case signal and worst-case noise distribution, has the greatest representational capacity of any distribution, and offers an equivalence between uncorrelatedness and independence for joint distributions. Accounting for the mean and variance of activations throughout the layers of deep neural networks has had a significant effect on facilitating their effective training, but seldom has a prescription for precisely what distribution these activations should take, and how this might be achieved, been offered. Motivated by the information-theoretic properties of the normal distribution, we address this question and concurrently present normality normalization: a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform and employs additive Gaussian noise during training. Our experiments comprehensively demonstrate the effectiveness of normality normalization with regard to its generalization performance on an array of widely used model and dataset combinations, its strong performance across various common factors of variation such as model width, depth, and training minibatch size, its suitability for usage wherever existing normalization layers are conventionally used, and as a means of improving model robustness to random perturbations.

Summary

Successfully training deep neural networks depends greatly on how data is represented, as it is processed through the layers of a network. Up until now, controlling the average and spread of these representations was the main approach used to help neural networks train effectively. In this work, we furthermore motivated a specific distribution that neural network representations should follow, and materialized this choice of distribution using a new layer we developed. Our experiments and analysis comprehensively demonstrated the effectiveness of this new layer.
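
The two ingredients named in the abstract (a power transform toward normality, plus additive Gaussian noise during training) can be illustrated offline on a batch of activations. The sketch below is not the authors' layer; it simply shows the effect of those ingredients, assuming a Yeo-Johnson power transform.

```python
# Offline illustration (not the authors' normalization layer): apply a per-feature
# power transform toward normality, standardize, and add Gaussian noise during training.
import numpy as np
from scipy.stats import yeojohnson

def gaussianize_features(acts, noise_std=0.1, training=True, rng=None):
    # acts: (batch, features) activations; returns roughly Gaussian features.
    rng = rng or np.random.default_rng(0)
    out = np.empty_like(acts, dtype=float)
    for j in range(acts.shape[1]):
        transformed, _lmbda = yeojohnson(acts[:, j])   # per-feature power transform
        out[:, j] = (transformed - transformed.mean()) / (transformed.std() + 1e-8)
    if training:
        out = out + rng.normal(scale=noise_std, size=out.shape)
    return out

acts = np.random.default_rng(1).lognormal(size=(256, 8))  # heavily skewed activations
gauss_acts = gaussianize_features(acts)                   # now approximately normal per feature
```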

On the Learnability of Distribution Classes with Adaptive Adversaries

Tosca Lechner (Vector Distinguished Postdoctoral Fellow), Alex Bie, Gautam Kamath (Vector Faculty Member)

Abstract

We consider the question of learnability of distribution classes in the presence of adaptive adversaries — that is, adversaries capable of intercepting the samples requested by a learner and applying manipulations with full knowledge of the samples before passing them on to the learner. This stands in contrast to oblivious adversaries, who can only modify the underlying distribution the samples come from but not their i.i.d. nature. We formulate a general notion of learnability with respect to adaptive adversaries, taking into account the budget of the adversary. We show that learnability with respect to additive adaptive adversaries is a strictly stronger condition than learnability with respect to additive oblivious adversaries.

Summary

Generalizing from training data underlies most machine learning processes. Often this training data is assumed to be generated directly from the phenomenon one wants to learn. In our work we study the situation where an adversary gets to manipulate the training data before the learner sees it. We study adaptive adversaries, who have access to the whole training sample and can therefore manipulate it with this full knowledge. We contrast them with oblivious adversaries, who are only aware of the data-generating process but not of the training data itself. We show that adaptive adversaries can be strictly stronger than oblivious adversaries. In particular, we study additive adversaries, who can add data points, and subtractive adversaries, who can delete data points. We show a separation between additive adaptive and additive oblivious adversaries. Thus, in some situations, adding data points with full knowledge of the sample can gravely hurt the learning process, while similar additive manipulations of the data-generating process hurt it much less.

Optimizing Noise Distributions for Differential Privacy

Atefeh Gilani, Felipe Gomez, Shahab Asoodeh (Vector Faculty Affiliate), Flavio Calmon, Oliver Kosut, Lalitha Sankar

Abstract

We propose a unified optimization framework for designing continuous and discrete noise distributions that ensure differential privacy (DP) by minimizing Rényi DP, a variant of DP, under a cost constraint. Rényi DP has the advantage that by considering different values of the Rényi parameter α, we can tailor our optimization for any number of compositions. To solve the optimization problem, we reduce it to a finite-dimensional convex formulation and perform preconditioned gradient descent. The resulting noise distributions are then compared to their Gaussian and Laplace counterparts. Numerical results demonstrate that our optimized distributions are consistently better, with significant improvements in (ε, δ)-DP guarantees in the moderate composition regimes, compared to Gaussian and Laplace distributions with the same variance.

Summary

Protecting sensitive information is a major concern in the age of big data. Differential Privacy (DP) is a popular method for ensuring privacy by adding random noise to data, making it difficult to identify individuals. However, choosing the right type of noise is critical—too much noise can ruin data accuracy, and too little can fail to protect privacy. In this work, we introduce a new way to find the best noise distribution for a given privacy guarantee. Our method improves the accuracy of results while still meeting strong privacy standards. We show that our optimized noise works better than commonly used noise types, such as Gaussian or Laplace, across different datasets and privacy settings. This approach can help make privacy-preserving machine learning more reliable and effective in real-world applications.
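
For readers unfamiliar with the baselines being improved upon, here is a minimal sketch of the classical Gaussian mechanism that the optimized distributions are compared against. The noise calibration shown is the standard textbook bound (valid for epsilon < 1); it is not the paper's optimized noise.

```python
# Baseline for comparison (not the paper's optimized distribution): the classical
# Gaussian mechanism adds N(0, sigma^2) noise calibrated to the query's L2 sensitivity.
# The calibration below is the standard (epsilon, delta)-DP bound, valid for epsilon < 1.
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(scale=sigma, size=np.shape(value))

# Example: privatize a mean over 1,000 records bounded in [0, 1] (sensitivity 1/1000).
data = np.random.default_rng(0).random(1000)
private_mean = gaussian_mechanism(data.mean(), l2_sensitivity=1 / 1000, epsilon=0.5, delta=1e-5)
```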

PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling *

Avery Ma, Yangchen Pan, Amir-massoud Farahmand (Vector Faculty Affiliate)

Abstract

Many-shot jailbreaking circumvents the safety alignment of large language models by exploiting their ability to process long input sequences. To achieve this, the malicious target prompt is prefixed with hundreds of fabricated conversational turns between the user and the model. These fabricated exchanges are randomly sampled from a pool of malicious questions and responses, making it appear as though the model has already complied with harmful instructions. In this paper, we present PANDAS: a hybrid technique that improves many-shot jailbreaking by modifying these fabricated dialogues with Positive Affirmations, Negative Demonstrations, and an optimized Adaptive Sampling method tailored to the target prompt’s topic. Extensive experiments on AdvBench and HarmBench, using state-of-the-art LLMs, demonstrate that PANDAS significantly outperforms baseline methods in long-context scenarios. Through an attention analysis, we provide insights on how long-context vulnerabilities are exploited and show how PANDAS further improves upon many-shot jailbreaking.

Summary

Large language models can be tricked into generating harmful output by overloading them with long, fake conversations. These conversations are designed to make it seem like the model has already followed dangerous instructions many times. In this paper, we introduce PANDAS, a technique that improves this type of attack by modifying the fake conversations with positive affirmation phrases, negative demonstrations, and a more targeted selection of content. Results on state-of-the-art open-source models show that PANDAS is more effective at eliciting harmful outputs than previous methods. We also analyze the models’ intermediate outputs to understand the effect of PANDAS.

Position: Beyond Assistance – Reimagining LLMs as Ethical and Adaptive Co-Creators in Mental Health Care

Abeer Badawi, Md Tahmid Rahman Laskar, Jimmy Huang, Shaina Raza (Vector Applied Machine Learning Scientist), Elham Dolatabadi (Vector Faculty Affiliate)

Abstract

This position paper argues for a fundamental shift in how Large Language Models (LLMs) are integrated into the mental health care domain. We advocate for their role as co-creators rather than mere assistive tools. While LLMs have the potential to enhance accessibility, personalization, and crisis intervention, their adoption remains limited due to concerns about bias, evaluation, over-reliance, dehumanization, and regulatory uncertainties. To address these challenges, we propose two structured pathways: SAFE-I Implementation Guidelines for ethical and responsible deployment, and HAAS-E Evaluation Framework for multidimensional, human-centered assessment. SAFE-I provides a blueprint for data governance, adaptive model engineering, and real-world integration, ensuring LLMs align with clinical and ethical standards. HAAS-E introduces evaluation metrics that go beyond technical accuracy to measure trustworthiness, empathy, cultural sensitivity, and actionability. We call for the adoption of these structured approaches to establish a responsible and scalable model for LLM-driven mental health support, ensuring that AI complements—rather than replaces—human expertise.

Summary

What if AI could be your teammate, not your replacement, in delivering compassionate mental health care? As the digital-native generation turns to tools like ChatGPT for everything from schoolwork to career advice, it won’t be long before they rely on AI for emotional and mental health support. The question is no longer if LLMs belong in mental health, but how they can contribute safely, ethically, and meaningfully. This paper argues that LLMs are ready to do more than automate tasks when designed with ethical and safety considerations. These tools can help ease the burden on overstretched teams, provide personalized guidance, and offer timely support. But the stakes are high: without proper safeguards, LLMs can cause serious harm, spreading bias and misinformation, or leading users to place unwarranted trust in their responses. Implementing strong safeguards is essential to ensure these tools are safe, reliable, and aligned with ethical standards. To translate this vision into action, our position proposes two frameworks. SAFE-i supports responsible design and deployment through three pillars: Ethical Data Foundations, Model Engineering, and Real-World Integration. HAAS-e proposes a human-centered evaluation framework built around four essential dimensions (trustworthiness, fairness, empathy, and actionability) and introduces metrics like the Contextual Empathy Score (CES), Cultural Sensitivity Index (CSI), Personalization Appropriateness Score (PAS), and Actionability and Safety Assessment (ASA). Together, these tools offer a practical roadmap for aligning AI systems with human values, clinical goals, and diverse cultural contexts—empowering mental health professionals with adaptive, ethical, and empathetic AI collaborators.

Position: Humanity faces existential risk from gradual disempowerment

Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, David Duvenaud (Vector Faculty Member)

Abstract

This paper examines the systemic risks posed by incremental advancements in artificial intelligence, developing the concept of ‘gradual disempowerment’, in contrast to the abrupt takeover scenarios commonly discussed in AI safety. We analyze how even incremental improvements in AI capabilities can undermine human influence over large-scale systems that society depends on, including the economy, culture, and nation-states. As AI increasingly replaces human labor and cognition in these domains, it can weaken both explicit human control mechanisms (like voting and consumer choice) and the implicit alignments with human preferences that often arise from societal systems’ reliance on human participation to function. Furthermore, AI systems may amplify existing misalignments with human preferences by optimizing these systems more powerfully. These distortions across domains may be mutually reinforcing: economic power shapes cultural narratives and political decisions, while cultural shifts alter economic and political behavior. We argue that this dynamic could lead to an effectively irreversible loss of human influence over crucial societal systems, precipitating an existential catastrophe through the permanent disempowerment of humanity. This analysis suggests the need for both technical research and governance approaches that specifically address the risk of incremental erosion of human influence across interconnected societal systems.

Summary

AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by having more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship. A gradual loss of control of our own civilization might sound implausible. Hasn’t technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and cultures. Once this human participation gets displaced by more competitive machine alternatives, our institutions’ incentives for growth will be untethered from a need to ensure human flourishing. Decision-makers at all levels will soon face pressures to reduce human involvement across labor markets, governance structures, cultural production, and even social interactions. Those who resist these pressures will eventually be displaced by those who do not. Still, wouldn’t humans notice what’s happening and coordinate to stop it? Not necessarily. What makes this transition particularly hard to resist is that pressures on each societal system bleed into the others. For example, we might attempt to use state power and cultural attitudes to preserve human economic power. However, the economic incentives for companies to replace humans with AI will also push them to influence states and culture to support this change, using their growing economic power to shape both policy and public opinion, which will in turn allow those companies to accrue even greater economic power. Once AI has begun to displace humans, existing feedback mechanisms that encourage human influence and flourishing will begin to break down. For example, states funded mainly by taxes on AI profits instead of their citizens’ labor will have little incentive to ensure citizens’ representation. This could occur at the same time as AI provides states with unprecedented influence over human culture and behavior, which might make coordination amongst humans more difficult, thereby further reducing humans’ ability to resist such pressures. We describe these and other mechanisms and feedback loops in more detail in this work. Though we provide some proposals for slowing or averting this process, and survey related discussions, we emphasize that no one has a concrete plausible plan for stopping gradual human disempowerment and methods of aligning individual AI systems with their designers’ intentions are not sufficient. Because this disempowerment would be global and permanent, and because human flourishing requires substantial resources in global terms, it could plausibly lead to human extinction or similar outcomes.

Position: Sustaining human-generated data for ML requires shifting focus toward intrinsic human motivations

Sebastin Santy, Prasanta Bhattacharya, Manoel Ribeiro, Kelsey Allen (Vector Faculty Member), Sewoong Oh

Abstract

Progress in AI has relied on human-generated data, from annotator marketplaces to the wider Internet. However, the widespread use of large language models now threatens the quality and integrity of human-generated data on these very platforms. We argue that this issue goes beyond the immediate challenge of filtering AI-generated content — it reveals deeper flaws in how data collection systems are designed. Existing systems often prioritize speed, scale, and efficiency at the cost of intrinsic human motivation, leading to declining engagement and data quality. We propose that rethinking data collection systems to align with contributors’ intrinsic motivations–rather than relying solely on external incentives–can help sustain high-quality data sourcing at scale while maintaining contributor trust and long-term participation.

Summary

Discussions around data quality in machine learning often focus on technical indicators and definitions, overlooking the human sources that generate this data. Much of today’s data comes from user participation on online platforms. This led us to ask: can we learn something about sustaining data quality by examining how humans participate on these platforms? We examine the quantity-quality tradeoff in data generation through the lens of human motivation. Drawing from social science, we show how excessive reliance on external incentives can undermine intrinsic motivation. We propose a shift: design engaging, suitably-incentivized environments (e.g., online games) that encourage meaningful participation while producing high-quality data. Our paper highlights the motivational forces behind online data generation for AI/ML and illustrates cases of past systems that have successfully navigated the quantity-quality tradeoff to generate meaningful human data. We also emphasize key design considerations for building trustworthy data collection environments of the future that will not only generate high quality data, but also respect and support the people contributing it.

Position: The Most Expensive Part of an LLM *should* be its Training Data

Nikhil Kandpal, Colin Raffel (Vector Faculty Member)

Abstract

Large Language Model (LLM) training is an increasingly expensive endeavor due to growing computational requirements, hardware demands, energy costs, and engineering labor. Apart from training costs, an oft-overlooked (and seldom paid) cost is the human labor required to write the trillions of words of text used to train state-of-the-art LLMs. In this position paper, we aim to assign a monetary value to this labor and make the case that the most expensive part of producing an LLM *should* be the compensation provided to training data producers for their work. To support this position we study 64 LLMs released between 2016 and 2024, breaking down both the cost of training the models as well as the hypothetical cost of creating the training data. Our analysis indicates that even with an extremely conservative estimate of how much compensation should be provided for the human labor that went into creating the training data, the costs of these models’ training datasets are 1-3 orders of magnitude larger than the costs to train the models themselves. In the face of the massive gap between the value of training data and the current lack of compensation for its creation, we highlight and discuss research directions that could enable fairer practices in the future.

Summary

Training a modern Large Language Model (LLM) is an incredibly expensive endeavor due to the cost of specialized hardware, energy required to run that hardware, and the enormous engineering labor needed to architect large-scale training systems. However, an often overlooked (and seldom paid) expense is the human labor behind these models’ training data. Every LLM is built on an unfathomable amount of human effort: trillions of carefully written words sourced from books, academic papers, codebases, social media, and more. This position paper aims to assign a monetary value to this labor and argues that the most expensive part of producing an LLM *should* be the compensation provided to training data producers for their work. To support this position, we study 64 LLMs released between 2016 and 2024, estimating what it would cost to pay people to produce their training datasets from scratch. Even under highly conservative estimates of wage rates, the costs of these models’ training datasets are 10-1000 times larger than the costs to train the models themselves, representing a significant financial liability for LLM providers. In the face of the massive gap between the value of training data and the lack of compensation for its creation, we highlight and discuss research directions that could enable fairer practices in the future.

QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions

Zhun Deng, Thomas Zollo, Benjamin Eyre, Amogh Inamdar, David Madras, Richard Zemel (Vector Faculty Member)

Abstract

As machine learning models grow increasingly competent, their predictions are being used to supplement scarce or expensive data for estimating important quantities. By combining a small set of high-fidelity observed data (i.e., genuine measurements) with a larger set of imputed data (i.e., model predictions), practitioners can enhance estimation quality beyond what either source provides in isolation. Though this paradigm shows promise, existing frameworks focus narrowly on estimating means or single quantiles, limiting their applicability for many critical domains and use cases. To address this challenge, we introduce **QuEst**, a framework to incorporate both observed and imputed data to estimate and provide rigorous confidence intervals for quantile-based distributional measures. Such quantile-based measures include tail measures such as CVaR, population segments like quartiles, and other key quantities of interest in fields including economics, sociology, education, and medicine. As part of QuEst, we also introduce an algorithm to estimate these statistics for multidimensional measurements and metrics. Further, we offer a novel spline-function-based method for optimizing QuEst (as well as other existing methods for such hybrid estimation). We demonstrate the utility of our framework through experiments in economic modeling, opinion polling, and language model auto-evaluation.

Summary

We introduce QuEst, a method for combining real, observed data with machine learning model predictions to produce better estimates of important quantities. Our framework is especially useful for enhancing experimental findings in fields such as economics, sociology, education, and medicine, as well as for evaluating language models.
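
To show the general flavour of combining observed and imputed data for a quantile, here is a hedged sketch in the spirit of hybrid (prediction-powered) estimation. It is a simple point estimate, not the QuEst estimator, and it omits the confidence intervals that are central to the paper.

```python
# Hedged sketch, in the spirit of hybrid estimation (not the exact QuEst procedure, and
# without its confidence intervals): correct the CDF implied by model predictions on a
# large unlabeled set using a small labeled set, then read off the desired quantile.
import numpy as np

def hybrid_quantile(y_labeled, pred_labeled, pred_unlabeled, q):
    grid = np.sort(np.concatenate([y_labeled, pred_labeled, pred_unlabeled]))
    cdf_pred_unlab = np.array([(pred_unlabeled <= t).mean() for t in grid])
    rectifier = np.array([(y_labeled <= t).mean() - (pred_labeled <= t).mean() for t in grid])
    cdf_hat = np.clip(cdf_pred_unlab + rectifier, 0.0, 1.0)
    return grid[np.argmax(cdf_hat >= q)]           # first grid point with estimated CDF >= q

# Toy usage: predictions are noisy, slightly biased versions of the truth.
rng = np.random.default_rng(0)
y_lab, y_unlab = rng.normal(size=200), rng.normal(size=5_000)
impute = lambda y: y + 0.3 + 0.2 * rng.normal(size=y.shape)   # hypothetical imputation model
q90 = hybrid_quantile(y_lab, impute(y_lab), impute(y_unlab), q=0.9)
```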

Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner (Vector Faculty Affiliate), Pascal Poupart (Vector Faculty Member)

Abstract

Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel _doubly Bayesian_ offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.

Summary

Imagine teaching an AI to perform a task, like navigating a building, using only a fixed set of recorded examples. When faced with a new situation it hasn’t seen before, the AI can become confused and make poor decisions because its knowledge is incomplete. Many existing approaches make the AI overly cautious to avoid mistakes, but this prevents it from adapting effectively. We introduce a new method called Reflect-then-Plan (RefPlan) that helps an AI reason intelligently about what it doesn’t know. Our method works in two steps:

* Reflect: As the AI operates, it continuously “reflects” on its recent experiences—the actions it took and what happened as a result—to update its understanding of the specific environment it’s currently in.
* Plan: When “planning” its next move, it doesn’t rely on a single, rigid prediction of the future. Instead, it considers a range of possible scenarios based on its uncertainty, making its strategy more robust to the unexpected.

Our results show that this approach significantly improves the AI’s performance, making it more flexible and resilient, especially when faced with unfamiliar situations, limited data, or changing conditions.

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal (Vector Faculty Member)

Abstract

Text watermarking aims to subtly embed statistical signals into text by controlling the Large Language Model (LLM)’s sampling process, enabling watermark detectors to verify that the output was generated by the specified model. The robustness of these watermarking algorithms has become a key factor in evaluating their effectiveness. Current text watermarking algorithms embed watermarks in high-entropy tokens to ensure text quality. In this paper, we reveal that this seemingly benign design can be exploited by attackers, posing a significant risk to the robustness of the watermark. We introduce a generic, efficient paraphrasing attack, the Self-Information Rewrite Attack (SIRA), which leverages this vulnerability by calculating the self-information of each token to identify potential pattern tokens and perform a targeted attack. Our work exposes a widely prevalent vulnerability in current watermarking algorithms. The experimental results show SIRA achieves nearly 100% attack success rates on seven recent watermarking methods at a cost of only $0.88 per million tokens. Our approach does not require any access to the watermark algorithms or the watermarked LLM and can seamlessly transfer to any LLM as the attack model, even mobile-level models. Our findings highlight the urgent need for more robust watermarking.

Summary

The rapid advancement of Large Language Models (LLMs) has brought concerns about their potential misuse, such as spreading misinformation and threatening academic integrity. To address this, text watermarking has emerged as a promising solution, subtly embedding undetectable patterns into LLM-generated text to verify its origin. However, the effectiveness of these watermarks depends on their robustness against attacks that try to remove them. Existing attack methods are often inefficient, untargeted, resource-intensive, and not easily transferable across different LLMs. Our research introduces the Self-Information Rewrite Attack (SIRA), a novel and efficient paraphrasing attack that reveals a fundamental vulnerability in current text watermarking algorithms. We discovered that watermarking techniques embed patterns in “high-entropy” tokens—tokens with high self-information due to their unpredictability and low probability. SIRA exploits this by calculating the self-information of each token to identify and mask these potential watermark-carrying tokens. We then use an LLM to perform a targeted “fill-in-the-blank” task, rewriting the masked text while preserving its semantic integrity. SIRA represents a significant step forward in understanding and evaluating the robustness of LLM watermarking. Our experiments show that SIRA achieves nearly 100% attack success rates across seven recent watermarking methods, at a very low cost of $0.88 per million tokens. This attack doesn’t require any prior knowledge of the watermark algorithm or the LLM used, and it’s highly transferable, even working with smaller, mobile-level models. By exposing this widespread vulnerability, our work highlights the urgent need for developing more robust and adaptive watermarking approaches to ensure transparency and integrity in AI-generated content.

Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction

Yudong W Xu, Wenhao Li, Scott Sanner (Vector Faculty Affiliate), Elias Khalil (Vector Faculty Affiliate)

Abstract

Constraint satisfaction problems (CSPs) find use in many applications, and thus accelerating their solution with machine learning is of wide interest. Most existing approaches rely on supervised learning from feasible solutions or reinforcement learning, paradigms that require either feasible solutions to these NP-complete CSPs or large training budgets and a complex expert-designed reward signal. To address these challenges, we propose ConsFormer, a self-supervised framework that leverages a Transformer as a solution refiner. ConsFormer constructs a solution to a CSP iteratively in a process that mimics local search. Instead of using feasible solutions as labeled data, we devise differentiable approximations to the discrete constraints of a CSP to guide model training. Our model is trained to improve random assignments for a single step but is deployed iteratively at test time, circumventing the bottlenecks of supervised and reinforcement learning. Our method can tackle out-of-distribution CSPs simply through additional iterations.

Summary

Solving problems under specific rules and restrictions is part of many real-life tasks, from completing puzzles like Sudoku to scheduling employee shifts. These problems are often hard to solve, and even the best traditional methods can struggle as the problems grow larger and more complex. Artificial intelligence has been used to help tackle these problems more efficiently. However, many existing methods rely on having examples of good solutions or require extensive trial and error, which can be slow or impractical. We introduce ConsFormer which takes a different approach. It trains an AI model to make small improvements to a solution in a single step, without needing correct answers during training. When deployed, ConsFormer is repeatedly used to make steady improvements, starting from a random guess and refining it step by step. ConsFormer works across different problems and can handle more challenging instances simply by running more improvement steps. This makes it a promising tool for solving complex real-world constraint reasoning problems efficiently.
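
To illustrate what a "differentiable approximation to a discrete constraint" can look like, the sketch below relaxes a Sudoku-style "each value appears exactly once per row" rule into a smooth penalty over soft cell assignments. This is one possible relaxation in the spirit of the paper, not ConsFormer's exact training loss.

```python
# One possible differentiable relaxation of a discrete constraint (not ConsFormer's
# exact loss): relax cell assignments to probabilities and penalize deviations of each
# value's expected count per row from 1, a soft surrogate for the hard all-different rule.
import torch

def all_different_row_penalty(logits):
    # logits: (rows, cells_per_row, num_values) unnormalized assignment scores.
    probs = torch.softmax(logits, dim=-1)    # soft assignment per cell
    value_counts = probs.sum(dim=1)          # (rows, num_values) expected counts per row
    return ((value_counts - 1.0) ** 2).sum()

logits = torch.randn(9, 9, 9, requires_grad=True)  # toy 9x9 board with 9 candidate values
penalty = all_different_row_penalty(logits)
penalty.backward()                                  # gradients can guide a learned solution refiner
```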

Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Mohammed Adnan, Rohan Jain, Ekansh Sharma, Rahul G. Krishnan (Vector Faculty Member), Yani Ioannou

Abstract

The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH’s sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations, and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random initialization. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask as compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10/100 and ImageNet) and models (VGG11 and ResNet20/50).

Summary

Modern artificial intelligence (AI) systems are incredibly powerful but often require massive amounts of computing power and data to train. This makes them expensive and out of reach for many researchers and developers. To address this, scientists have been exploring “sparser” AI models—systems that use only a small fraction of their potential connections—making them much more efficient to train and run. However, a major hurdle is that a sparse model setup that works well with one starting point for training often fails when training begins from a different starting point. Our research identifies the root cause: misalignment. Think of it like using a key (the sparse setup) on a lock that has been rotated slightly—it just doesn’t fit. To solve this, we developed a method to “re-align” the sparse structure so it matches the patterns of a new starting point. This adjustment dramatically improves the performance of sparse models trained from different starting points, making them nearly as effective as their original versions. Our findings make it easier and more practical to develop leaner, more efficient AI systems, paving the way for broader accessibility and innovation in AI research.

Stochastic Forward–Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets

Haoye Lu, Qifan Wu, Yaoliang Yu (Vector Faculty Member)

Abstract

Recent diffusion-based generative models achieve remarkable results by training on massive datasets, yet this practice raises concerns about memorization and copyright infringement. A proposed remedy is to train exclusively on noisy data with potential copyright issues, ensuring the model never observes original content. However, through the lens of deconvolution theory, we show that although it is theoretically feasible to learn the data distribution from noisy samples, the practical challenge of collecting sufficient samples makes successful learning nearly unattainable. To overcome this limitation, we propose to pretrain the model with a small fraction of clean data to guide the deconvolution process. Combined with our Stochastic Forward–Backward Deconvolution (SFBD) method, we attain an FID of 6.31 on CIFAR-10 with just 4% clean images (and 3.58 with 10%). Theoretically, we prove that SFBD guides the model to learn the true data distribution. The result also highlights the importance of pretraining on limited but clean data or the alternative from similar datasets. Empirical studies further support these findings and offer additional insights.

Summary

Modern image generation models – such as those behind AI art tools – are typically trained on massive collections of images. However, this practice raises important concerns: some of the training data may be copyrighted, and models risk memorizing and reproducing such content too closely. One proposed solution is to train models only on noisy (blurred or altered) versions of the images, ensuring the originals are never directly seen. Yet in practice, we show that learning from noisy data alone is extremely difficult – it requires an impractically large number of samples to be effective. In this work, we focus on diffusion models and demonstrate that introducing even a small fraction of clean (original) data, just 4% or 10%, can make a substantial difference. We propose a method called Stochastic Forward–Backward Deconvolution (SFBD), which alternates between denoising noisy samples using the current model and then retraining the model with those denoised results. This process helps the model gradually learn to generate realistic images, even when most of the training data is noisy. Our experiments show that SFBD achieves image quality close to models trained on fully clean datasets, while greatly reducing legal and ethical risks. This work offers a promising path toward training generative models more responsibly and efficiently.

Suitability Filter: A Statistical Framework for Model Evaluation in Real-World Deployment Settings *

Angéline Pouget, Mohammad Yaghini, Stephan Rabanser, Nicolas Papernot (Vector Faculty Member)

Abstract

Deploying machine learning models in safety-critical domains poses a key challenge: ensuring reliable model performance on downstream user data without access to ground truth labels for direct validation. We propose the suitability filter, a novel framework designed to detect performance deterioration by utilizing suitability signals – model output features that are sensitive to covariate shifts and indicative of potential prediction errors. The suitability filter evaluates whether classifier accuracy on unlabeled user data shows significant degradation compared to the accuracy measured on the labeled test dataset. Specifically, it ensures that this degradation does not exceed a pre-specified margin, which represents the maximum acceptable drop in accuracy. To achieve reliable performance evaluation, we aggregate suitability signals for both test and user data and compare these empirical distributions using statistical hypothesis testing, thus providing insights into decision uncertainty. Our modular method adapts to various models and domains. Empirical evaluations across different classification tasks demonstrate that the suitability filter reliably detects performance deviations due to covariate shift. This enables proactive mitigation of potential failures in high-stakes applications.

Summary

Machine learning models learn from data to make decisions, but it can be tricky to ensure they remain dependable when they encounter new, real-world situations. This research introduces a new way to check if these models are starting to make more mistakes with new data, particularly when we can’t easily verify whether their decisions are correct. The method works by examining subtle clues in how the model behaves with both familiar and new data to detect if its decision-making quality has declined. Experiments showed this approach can successfully flag when a model is struggling because the new information is different from what it was prepared for. This helps build confidence that these machine learning models are working correctly and can be trusted, especially in important everyday applications.
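
As a rough picture of the "compare model behaviour on familiar and new data with a statistical test" idea, here is a heavily simplified sketch that uses max-softmax confidence as a single example signal. The actual framework aggregates several suitability signals and tests against an explicit accuracy-drop margin; the signal choice and test below are assumptions for illustration.

```python
# Minimal, assumption-laden sketch (not the full suitability filter): use max-softmax
# confidence as one example suitability signal and test whether user-data confidences
# are systematically lower than labeled test-set confidences.
import numpy as np
from scipy.stats import mannwhitneyu

def flag_possible_degradation(conf_test, conf_user, alpha=0.05):
    result = mannwhitneyu(conf_user, conf_test, alternative="less")  # one-sided shift test
    return result.pvalue < alpha, result.pvalue

# Toy usage with hypothetical confidence scores in [0, 1].
rng = np.random.default_rng(0)
conf_test = rng.beta(8, 2, size=2000)   # labeled test set: generally confident
conf_user = rng.beta(6, 3, size=2000)   # unlabeled user data: shifted downward
flagged, p_value = flag_possible_degradation(conf_test, conf_user)
```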

Theoretical Limitations of Ensembles in the Age of Overparameterization *

Niclas Dern, John Cunningham, Geoff Pleiss (Vector Faculty Member)

Abstract

Classic ensembles generalize better than any single component model. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single but larger neural networks. This paper clarifies how modern overparameterized ensembles differ from their classic underparameterized counterparts, using ensembles of random feature (RF) regressors as a basis for developing theory. In contrast to the underparameterized regime, where ensembling typically induces regularization and increases generalization, we prove with minimal assumptions that infinite ensembles of overparameterized RF regressors become pointwise equivalent to (single) infinite-width RF regressors, and finite width ensembles rapidly converge to single models with the same parameter budget. These results, which are exact for ridgeless models and approximate for small ridge penalties, imply that overparameterized ensembles and single large models exhibit nearly identical generalization. We further characterize the predictive variance amongst ensemble members, demonstrating that it quantifies the expected effects of increasing capacity rather than capturing any conventional notion of uncertainty. Our results challenge common assumptions about the advantages of ensembles in overparameterized settings, prompting a reconsideration of how well intuitions from underparameterized ensembles transfer to deep ensembles and the overparameterized regime.

Summary

In safety-critical applications like medical diagnosis or self-driving cars, researchers often combine multiple AI models into so-called “ensembles” to improve predictions – similar to consulting a committee rather than a single expert. This approach has worked well for simple models, but with today’s powerful neural networks that can memorize entire datasets, ensembles often fail to deliver the expected benefits. We analyzed this mathematically using simplified neural networks. We discovered that when models are complex enough to memorize their training data, ensembles of them closely behave like a single, larger model. This means ensembling large models offers little gain over simply training a single, bigger model. Furthermore, we found that a common method for estimating the uncertainty of ensemble predictions – measuring disagreement among ensemble members – lacks theoretical grounding in such cases. Our results don’t deny that ensembles can still be useful in practice since larger models might, for example, be hard to train. However, they caution against viewing ensembles as a simple and reliable strategy for boosting performance over what a single larger model could achieve or for assessing uncertainty.
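
The core claim can be probed numerically with the random-feature regressors the theory is built on. Below is a toy, finite-width illustration (so the match is only approximate) comparing an ensemble of ridgeless random-feature models to a single model with the same total parameter budget; it is a sketch, not the paper's experiments.

```python
# Toy numeric illustration (finite widths, so only approximate) of the claim that an
# ensemble of overparameterized ridgeless random-feature regressors behaves like a
# single model with the same total parameter budget.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(200, d))

def rf_predict(width, seed):
    W = np.random.default_rng(seed).normal(size=(d, width))
    phi, phi_test = np.maximum(X @ W, 0), np.maximum(X_test @ W, 0)  # ReLU random features
    coef = np.linalg.pinv(phi) @ y                                   # min-norm (ridgeless) fit
    return phi_test @ coef

K, m = 20, 500                                         # 20 ensemble members of width 500
ensemble_pred = np.mean([rf_predict(m, s) for s in range(K)], axis=0)
single_pred = rf_predict(K * m, seed=123)              # one model with the same budget
print(np.corrcoef(ensemble_pred, single_pred)[0, 1])   # typically close to 1
```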

Towards Cost-Effective Reward Guided Text Generation

Ahmad Rashid, Ruotian Wu, Rongqi Fan, Hongliang Li, Agustinus Kristiadi (Vector Distinguished Postdoctoral Fellow), Pascal Poupart (Vector Faculty Member)

Abstract

Reward guided text generation (RGTG) has emerged as a viable alternative to offline reinforcement learning from human feedback (RLHF). RGTG methods can align baseline language models to human preferences without further training such as in RLHF methods (PPO and DPO). However, they rely on a reward model to score each candidate token generated by the language model at inference and incur significant overhead. Additionally, the reward model is trained to score full sequences only, which can lead to sub-optimal choices for partial sequences. In this work, we present a novel reward model architecture which is trained, using a Bradley-Terry loss, to prefer the optimal expansion of a sequence with just a single call to the reward model.  That is, a score for all possible candidate tokens is generated simultaneously, leading to efficient inference. We theoretically analyze RGTG reward models and demonstrate that baseline reward models prefer sub-optimal sequences compared to our method during inference. Empirically, our reward model leads to significantly faster inference, compared to other RGTG methods, with fewer calls to the reward model and competitive performance compared to both RGTG and RLHF.

Summary

Can language models improve with the help of human feedback without re-training? Re-training is expensive: it requires computational resources, consumes electricity, and contributes to carbon emissions. Prior work has shown that it is indeed possible to do so, but at the cost of longer response times from the language model when answering a query. We present a method, FaRMA, that can significantly reduce this response time while still avoiding re-training. Moreover, we demonstrate scenarios where prior methods fail to provide good responses and show that FaRMA is not vulnerable to these failures.
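
For context, reward-guided text generation in general reweights the language model's next-token distribution by reward scores at decoding time. The sketch below shows that generic step, not FaRMA itself; per the abstract, FaRMA's contribution is producing scores for all candidate tokens with a single reward-model call, so the reward scores here are a stand-in for that output.

```python
# Generic reward-guided decoding step (not FaRMA specifically): reweight the language
# model's next-token distribution by per-candidate reward scores for the top-k tokens.
# `reward_scores` is a hypothetical stand-in for a reward model's one-call output.
import torch

def guided_next_token(lm_logits, reward_scores, beta=1.0, top_k=20):
    # lm_logits, reward_scores: (vocab_size,) tensors for the current prefix.
    topk = torch.topk(lm_logits, top_k)
    guided = lm_logits[topk.indices] + beta * reward_scores[topk.indices]
    probs = torch.softmax(guided, dim=-1)
    return topk.indices[torch.multinomial(probs, 1)].item()

# Toy usage with random stand-ins for model outputs.
vocab_size = 1000
lm_logits = torch.randn(vocab_size)
reward_scores = torch.randn(vocab_size)
next_token_id = guided_next_token(lm_logits, reward_scores)
```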

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories

Honghua Dong, Jiacheng Yang, Xun Deng, Yuhe Jiang, Gennady Pekhimenko (Vector Faculty Member), Fan Long, Xujie Si (Vector Faculty Affiliate)

Abstract

Type inference for dynamic languages like Python is a persistent challenge in software engineering. While large language models (LLMs) have shown promise in code understanding, their type inference capabilities remain underexplored. We introduce `TypyBench`, a benchmark designed to evaluate LLMs’ type inference across entire Python repositories. `TypyBench` features two novel metrics: `TypeSim`, which captures nuanced semantic relationships between predicted and ground truth types, and `TypeCheck`, which assesses type consistency across codebases. Our evaluation of various LLMs on a curated dataset of 50 high-quality Python repositories reveals that, although LLMs achieve decent `TypeSim` scores, they struggle with complex nested types and exhibit significant type consistency errors. These findings suggest that future research should shift focus from improving type similarity to addressing repository-level consistency. `TypyBench` provides a foundation for this new direction, offering insights into model performance across different type complexities and usage contexts.

Summary

Figuring out the specific data types used in flexible programming languages like Python can be a real headache for software developers. While the powerful AI models known as LLMs are good at understanding code, we didn’t know how well they could handle this specific task on a large scale. To find out, we created TypyBench, a new test to see how accurately these AIs can predict data types across entire software projects. We developed two new ways to measure their performance: one that checks if the predicted type is close in meaning to the correct one, and another that verifies if the AI’s predictions are consistent throughout the code. Our tests on 50 high-quality Python projects revealed that while the AIs are pretty good at guessing the general meaning of types, they often make mistakes with more complicated ones and create inconsistencies within the same project. This shows that future efforts should focus on making AI predictions more consistent, and TypyBench provides the perfect tool to guide this research.

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel, Matthew Kowal, Konstantinos Derpanis (Vector Faculty Affiliate)

Abstract

We present Universal Sparse Autoencoders (USAEs), a framework for uncovering and aligning interpretable concepts spanning multiple pretrained deep neural networks. Unlike existing concept-based interpretability methods, which focus on a single model, USAEs jointly learn a universal concept space that can reconstruct and interpret the internal activations of multiple models at once. Our core insight is to train a single, overcomplete sparse autoencoder (SAE) that ingests activations from any model and decodes them to approximate the activations of any other model under consideration. By optimizing a shared objective, the learned dictionary captures common factors of variation—concepts—across different tasks, architectures, and datasets. We show that USAEs discover semantically coherent and important universal concepts across vision models, ranging from low-level features (e.g., colors and textures) to higher-level structures (e.g., parts and objects). Overall, USAEs provide a powerful new method for interpretable cross-model analysis and offer novel applications—such as coordinated activation maximization—that open avenues for deeper insights in multi-model AI systems.

Summary

Modern computer vision models are increasingly diverse, trained using various datasets and architectures to accomplish specific visual tasks such as depth estimation or object recognition. These design choices shape what visual “concepts” or features each model learns—from recognizing edges and textures to understanding objects and scenes. This raises a core scientific question: do these models, despite their differences, converge on learning the same fundamental visual concepts? Answering this question is challenging because the internal representations these models learn are encoded in ways that humans cannot directly interpret. Our work introduces Universal Sparse Autoencoders (USAEs) to create a universal, interpretable concept space that reveals what multiple vision models learn in common about the visual world. Our approach enables us to identify the most important universal concepts shared across models, while also discovering features that are unique to specific models. This analysis provides insight into which architectural and training choices lead to better visual representations, and which concepts appear to be fundamental building blocks for visual understanding. This work advances our ability to understand and compare how different AI systems perceive and process visual information.
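
To make the "shared concept space across models" idea more concrete, here is a hedged sketch under simplifying assumptions: per-model linear maps into and out of one shared, overcomplete code with an L1 sparsity penalty, trained so that any model's activations can be decoded toward any other model's activations for the same inputs. The model names and dimensions are hypothetical, and this is not the authors' implementation.

```python
# Hedged sketch of the core idea (not the authors' implementation): per-model linear
# maps into and out of one shared, overcomplete concept space, trained with an L1
# sparsity penalty so activations from any model can be decoded toward the activations
# of any other model for the same inputs.
import torch
from torch import nn

class UniversalSAE(nn.Module):
    def __init__(self, model_dims, n_concepts):
        super().__init__()
        self.encoders = nn.ModuleDict({m: nn.Linear(d, n_concepts) for m, d in model_dims.items()})
        self.decoders = nn.ModuleDict({m: nn.Linear(n_concepts, d) for m, d in model_dims.items()})

    def forward(self, acts, src, dst):
        z = torch.relu(self.encoders[src](acts))   # sparse codes in the shared concept space
        return self.decoders[dst](z), z

model_dims = {"vision_a": 768, "vision_b": 1024}   # hypothetical activation widths
usae = UniversalSAE(model_dims, n_concepts=4096)
opt = torch.optim.Adam(usae.parameters(), lr=1e-3)

# One toy training step on random stand-ins for paired activations of the same images.
acts = {"vision_a": torch.randn(32, 768), "vision_b": torch.randn(32, 1024)}
loss = 0.0
for src in model_dims:
    for dst in model_dims:
        recon, z = usae(acts[src], src, dst)
        loss = loss + torch.nn.functional.mse_loss(recon, acts[dst]) + 1e-3 * z.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```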
