Vector researchers presenting more than 65 papers at NeurIPS 2023
December 4, 2023
Vector researchers are presenting more than 65 papers at this year’s Conference on Neural Information Processing Systems (NeurIPS), which runs December 10 through 16 in New Orleans and online. Vector Faculty, Faculty Affiliates, and Postdoctoral Fellows are showcasing new work that pushes the boundaries of several fields of AI research, with the potential to impact many facets of daily life, including health, chemical materials discovery, data privacy, music, and our understanding of the natural world.
Below are simplified summaries of some of the accepted papers and workshops from Vector researchers.
Paper descriptions written by paper co-authors and/or generative AI.
A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset
Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T.A. McKeown, Chris C.Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth
This study creates a catalogue of insect biodiversity: the BIOSCAN-1M Insect Dataset. The dataset contains labelled images of various insects, taxonomically classified by domain experts, together with associated genetic data in the form of raw nucleotide “DNA barcode” sequences. The dataset has over a million images to train computer vision models for taxonomic assessment. It may also be of interest to the wider machine learning community due to the intrinsic challenges it presents, such as a skewed distribution of images across insect types and the fine-grained complexity of taxonomic labelling. Beyond insect identification from images, this work also contributes to efforts to use imagery and genomic data in complementary ways to survey biodiversity. The paper introduces the dataset and explores the classification task through modern convolutional- and transformer-based methods.
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi
Like any tool, AI models can get outdated or start behaving unexpectedly. In this paper, we discuss a new tool called GRACE, a Lifelong Model Editing method that can fine-tune these AI models whenever they misbehave, without disturbing their overall functioning. It’s just like fine-tuning a musical instrument without changing its character. GRACE does this by creating an internal list of modifications, rather than tweaking the model’s structure. It can do this thousands of times using only examples of errors, which is a new achievement. We tested GRACE on various popular AI models and found that it not only corrected errors effectively but also adapted well to new, unseen situations.
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori Hashimoto
AlpacaFarm is a novel simulator designed to tackle three primary obstacles in the development of large language models like ChatGPT: expensive data collection, unreliable evaluations, and the lack of standard methods. It creates LLM prompts for simulated human feedback, reducing costs by 45x compared to using actual crowd workers while maintaining high consistency with human responses. AlpacaFarm introduces an automatic evaluation mechanism, confirmed through real-world interactions, and provides standard implementations for methods such as PPO and expert iteration, utilizing pairwise feedback learning. We find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a 10% improvement in win rate against Davinci003.
An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient
Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
In the study of artificial intelligence, specifically Reinforcement Learning (RL), teaching machines to make cautious decisions is common. Traditionally, this is done by controlling the unpredictability in the machine’s performance results. However, this can be a sensitive process that may impede learning. The paper suggests an alternative approach with a new risk measure called Gini deviation. The authors offer a fresh strategy for machines to learn while managing this risk. Testing showed that their method outperforms older strategies by maintaining effective performance with less risk, succeeding in areas where previous methods were inadequate in guiding machine behavior effectively.
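To make the contrast with variance concrete, here is a minimal sketch that assumes one common definition of Gini deviation: half the expected absolute difference between two independent draws of the return. The function name and the sample returns are hypothetical, and the code only estimates the measure from a list of numbers; it is not the paper's policy gradient algorithm.

```python
from statistics import pvariance


def gini_deviation(returns):
    """Half the mean absolute difference over all ordered pairs of samples.

    Assumed definition for illustration; pairs (i, i) are included, so this
    is a simple plug-in estimate rather than an unbiased one.
    """
    n = len(returns)
    total = sum(abs(a - b) for a in returns for b in returns)
    return total / (2 * n * n)


# Hypothetical episode returns for two policies with the same mean.
steady = [4.0, 5.0, 6.0]
volatile = [0.0, 5.0, 10.0]

for name, returns in (("steady", steady), ("volatile", volatile)):
    print(name, gini_deviation(returns), pvariance(returns))
```

Stretching the spread of returns by a factor of five multiplies this Gini deviation estimate by five but multiplies the variance by twenty-five, which illustrates why a variance penalty can be more sensitive to the scale of returns.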
Batchnorm Allows Unsupervised Radial Attacks
Amur Ghose, Apurv Gupta, Yaoliang Yu, Pascal Poupart
Computer vision researchers often need to test the robustness of image recognition systems by trying to fool them with altered images, known as adversarial examples. Typically, to create these deceptive images, one needs access to the model’s outputs, such as classification labels and the associated confidence levels. However, this study found that when it comes to deep learning models for image recognition that use a certain technique called batch normalization, adversarial examples can be crafted just by examining the model’s mid-process calculations. They can do this by focusing on how much those calculations deviate from a standard type of geometric distribution, all without any knowledge of the actual labels or final output. These mid-process calculations naturally form patterns, resembling well-understood mathematical shapes and distributions. They also discovered that this tactic can expose a security flaw in these models, including when they are adapted for other tasks. Specifically, the vulnerability is linked to the use of batch normalization, and removing it can reduce the risk. Moreover, the finding is significant not only for image recognition models but also for the latest transformer-based models, especially those designed for processing visual information.
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel
This study introduces Petals, a new system that lets researchers join forces and combine their computing power to use large models more effectively. Recent large models need powerful computers to run, which many individual researchers don’t have. There are some workarounds like storing parts of the model elsewhere (RAM offloading) or using online services (hosted APIs), but these have downsides: Offloading makes the models too slow for real-time use, and APIs don’t let researchers change the models as needed for in-depth studies. With Petals, you can use even the biggest models on normal computer setups without the problems of other methods. Plus, Petals provides a transparent look into the models’ inner workings, which is essential for researchers who want to make specific customizations and improvements to them.
Distribution Learnability and Robustness
Shai Ben-David, Alex Bie, Gautam Kamath, Tosca Lechner
This study looks into how well AI can learn from data and remain robust when data is tainted. Typically, we’d hope that if an AI can learn from clean data, it could also handle data that’s been corrupted to some extent by a malicious source. The researchers focused on estimating an unknown probability distribution, and discovered that this holds true when the disruption involves only adding misleading data points. For example, under the Huber contamination model, in which an adversary adds false information to the dataset, the AI can still learn effectively. But the situation changes if the adversary starts removing data points from the dataset, a process known as subtractive contamination. In this case, the fact that the AI can learn from perfect data doesn’t necessarily mean it will do well with the compromised dataset. This challenges the assumption that the ability to learn in ideal situations implies the ability to learn under less-than-ideal conditions. The research further discusses the consequences of these findings for data compression methods and learning with privacy guarantees, like differential privacy.
Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning
Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand
The world we live in is inherently stochastic, and every decision we take requires us to consider the risks associated with it. Risk-sensitive reinforcement learning centres on designing agents that can similarly make decisions with risk in mind, which an agent learns through interactions with the environment. It is often beneficial to learn a model of the environment that the agent can then interact with, rather than using the environment itself. This approach allows an agent to perform fewer interactions with the true environment; this is particularly important when interactions with the true environment are costly, or in safety-critical applications, when mistakes in the true environment must be avoided. In this work, we study the best way to learn such models for risk-sensitive learning. This question has been studied many times in the context of risk-neutral learning; however, we show that such approaches are far from optimal for the risk-sensitive setting. We introduce a general framework for learning these models, and demonstrate that one can choose what type of risk the model should be most aware of. We show that our framework can be combined with a wide range of existing model-free algorithms, and empirically show the benefits of our approach.
Distribution-Free Statistical Dispersion Control for Societal Applications
Zhun Deng, Thomas P. Zollo, Jake C. Snell, Toniann Pitassi, Richard Zemel
For AI systems that take on tasks with serious consequences, it is essential to understand the reliability of the system. Traditionally, the aim is to predict the system’s overall accuracy or its error margins. Yet, in areas where decisions have significant societal impacts, there’s a need to ensure that its mistakes don’t unfairly affect different groups. To address this, this paper presents a novel framework that goes beyond average performance, assessing how equitable a system’s decisions are across a population. It’s a broader approach that accounts for a variety of possible outcomes and their societal effects and can handle more complex statistical analyses than prior techniques. The effectiveness of this framework has been proven in diverse applications, such as detecting harmful language, assisting with medical diagnoses from images, and making film recommendations. Their work is a step towards responsible AI that is fair and reliable for high-stakes scenarios. This research underlines the importance of not just AI performance but also the equality of its impact on society.
Doubly Robust Peer-To-Peer Learning Protocol
Nicholas Franzese, Adam Dziedzic, Christopher A. Choquette-Choo, Mark R. Thomas, Muhammad Ahmad Kaleem, Stephan Rabanser, Congyu Fang, Somesh Jha, Nicolas Papernot, Xiao Wang
This study focuses on collaborative machine learning, where different organizations work together and combine their data to build better models. Even though working together like this might seem to protect the privacy of everyone’s data, there’s still a risk. Either the central server that collects the updates from all the clients, or the clients (the various organizations) themselves, might not follow the agreed rules. A dishonest server could try to dig into the clients’ data, or the clients might send harmful data to mess with the learning process. Ideally, each party, whether client or server, wants to be sure the other side will play fair. The research proposes a new way of working together, where the learners are equal peers and there’s no central server. This method is meant to stop a server from taking advantage and also keep clients from sending bad data. This paper presents a flexible framework that can take any good algorithm for combining model updates and make it work securely in a world where servers and clients might misbehave. The researchers also show that their approach can handle large models with many parameters and lots of peers, proving that it’s practical for real use.
DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets
Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford
This study explores how cells control gene activity and function. The researchers focus on understanding the complex web of interactions between genes, addressing two key issues. First, gene networks are circular, not one-way pathways. Second, observations are often noisy, making it hard to pinpoint exact patterns. Traditional approaches tackle either the circular nature or the noise problem, but not both. Here, the team uses RNA velocity—how fast genes create products—to create a method that deals with both challenges. They introduce a new technique using Generative Flow Networks, which helps to map potential gene interactions by considering their dynamic and circular nature. This method offers a clearer understanding of gene networks than previous attempts.
Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
Haonan Duan, Adam Dziedzic, Nicolas Papernot, Franziska Boenisch
This research tackles privacy risks in large language models (LLMs), which learn from data prompts. These prompts can inadvertently leak sensitive information. The researchers validate that privacy threats are real by demonstrating a straightforward but effective attack that reveals whether specific data was used to train an LLM. The conventional alternative, fine-tuning models with private algorithms for gradient descent, sacrifices the simplicity and speed that prompts offer. Addressing this, the researchers introduce an innovative method called “private learning to prompt.” They find a way to create ‘soft prompts’—modifiable inputs for LLMs—in a private manner. However, this isn’t possible for ‘discrete prompts’, which are set and specific. As a solution, they suggest collecting and merging the output from multiple LLMs using a technique they liken to a ‘flock of stochastic parrots.’ This collective output becomes a single, privacy-friendly prompt. The findings reveal that the performance of LLMs using this privacy-focused method approaches that of standard, non-private methods, indicating its viability for practical use with existing online AI services.
Functional Renyi Differential Privacy for Generative Modeling
Dihong Jiang, Sun Sun, Yaoliang Yu
The study explores Rényi differential privacy (RDP), a concept gaining traction as an alternative to traditional differential privacy (DP) due to its better composability and flexibility. Existing privacy methods using RDP are limited to randomizing outputs that are fixed-length vectors. In this work, the researchers build upon previous studies to adapt RDP for scenarios where the outcome could be a function—potentially with infinite dimensions. They develop a set of tools, including a version of the Gaussian mechanism adapted for sampled data, and rules for composition and post-processing. These tools are designed to help integrate RDP more easily into practical applications. To demonstrate its usefulness, they apply this extended version of RDP, named functional RDP (f-RDP), to functions within the mathematical space known as reproducing kernel Hilbert space (RKHS). In this context, they create a differentially private generative model (DPGM), where the machine learning model’s training process consists of safely releasing loss functions with RDP protection. Empirical results suggest that this new training approach offers a better balance between privacy and performance compared to current methods.
GAUCHE: A Library for Gaussian Processes in Chemistry
Ryan-Rhys Griffiths, Leo Klarner, Henry B. Moss, Aditya Ravuri, Sang Truong, Samuel Stanton, Gary Tom, Bojana Rankovic, Yuanqi Du, Arian Jamasb, Aryan Deshwal, Julius Schwartz, Austin Tripp, Gregory Kell, Simon Frieder, Anthony Bourached, Alex Chan, Jacob Moss, Chengzhi Guo, Johannes Durholt, Saudamini Chaurasia, Felix Strieth-Kalthoff, Alpha A. Lee, Bingqing Cheng, Alán Aspuru-Guzik, Philippe Schwaller, Jian Tang
GAUCHE is a library of mathematical tools designed to learn from chemical data. It’s built for handling Gaussian processes, a technique in machine learning known for being excellent at measuring uncertainty and improving decision-making based on predictions. Gaussian processes are really good at guessing the unknown based on what is known, especially in complex situations where uncertainty is a big deal. However, using them for chemistry is a bit like trying to fit a square peg into a round hole. Chemical data can be very complex, looking like intricate graphs, strings of information, or even a series of on-off signals (bit vectors). GAUCHE is designed to work with these complicated formats, transforming Gaussian processes into a powerful tool for chemists. The creators of GAUCHE aim to make it easier for those in the chemistry field to embrace advanced uncertainty measures and Bayesian optimisation—a method that balances exploration of new possibilities with the development of existing ones. They demonstrate GAUCHE’s potential in two important areas: discovering new molecules and figuring out the best conditions for chemical reactions. In essence, GAUCHE is meant to be a bridge that connects advanced machine learning techniques to the real-world puzzles of chemistry.
Gradient-Based Feature Learning under Structured Data
Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat Erdogdu
Recent findings suggest that for gradient-based learning of single index models (which depend on a one-dimensional projection of the input), the number of necessary learning samples, or ‘sample complexity’, is influenced by what’s called the information exponent. Past research focused on isotropic data, where input is uniformly distributed without any distinct orientation. However, real-world data often exhibits a ‘spiked covariance structure’, where data is unevenly distributed, influencing the learning process. This paper examines the impact of data with such a structure on model training. The researchers discover that standard spherical gradient methods might fail to detect the correct data orientation, even if it aligns with the desired outcome. They suggest that techniques similar to batch normalization in neural networks can mitigate this issue. Moreover, by exploiting the particular data distribution and its alignment with the targeted results, they demonstrate improved sample complexity over isotropic scenarios. Notably, with a large enough spike in the data’s structure, the study shows that gradient-based learning can require fewer samples and outperform certain established methods, despite the complexity suggested by the information exponent.
Have it your way: Individualized Privacy Assignment for DP-SGD
Franziska Boenisch, Christopher Mühl, Adam Dziedzic, Roy Rinberg, Nicolas Papernot
This paper tweaks a popular privacy-focused method used in training machines, known as Differentially Private Stochastic Gradient Descent (DP-SGD). To protect the privacy of people’s information in machine learning, researchers often use a “privacy budget.” This is like a limit on how much privacy can be risked when someone’s data is used to help train a computer to make decisions. However, everyone values their privacy differently. Some might not mind sharing more, while others want to keep their data as private as possible. To address this, the study proposes a new idea: why not let each person set their own privacy limit? This paper presents a new method called Individualized DP-SGD (IDP-SGD). By changing how the machine picks and uses the data and adjusting the ‘noise’ that is added to keep the data anonymous, IDP-SGD allows for privacy that matches each person’s preferences. The result is a more balanced system where privacy and the usefulness of the data are better aligned to serve individual needs.
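As a rough illustration of the two ingredients mentioned above (limiting each person's influence and adding noise), here is a minimal sketch of one noisy gradient step in the style of DP-SGD. Giving each person their own clipping bound is an assumption made here for illustration, as are all function and parameter names; the paper's actual IDP-SGD mechanisms and their privacy accounting are not reproduced.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale a gradient so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def noisy_mean_gradient(per_example_grads, clip_norms, noise_multiplier, rng):
    """Clip each person's gradient, sum, add Gaussian noise, then average.

    clip_norms holds one bound per person (hypothetical: a larger bound
    stands in for a looser privacy preference). The noise scale is tied to
    the largest bound, which is an illustrative choice only.
    """
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad, bound in zip(per_example_grads, clip_norms):
        for j, g in enumerate(clip(grad, bound)):
            summed[j] += g
    sigma = noise_multiplier * max(clip_norms)
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]   # made-up per-person gradients
bounds = [1.0, 2.0]                # made-up per-person clipping bounds
print(noisy_mean_gradient(grads, bounds, noise_multiplier=1.0, rng=rng))
```

With a fixed noise level, a person with a tighter bound contributes a smaller gradient and so is hidden more thoroughly by the noise, which is the intuition behind letting the bound vary by preference in this sketch.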
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
Jimmy Z. Di, Jack Douglas, Jayadev Acharya, Gautam Kamath, Ayush Sekhari
The research introduces a subtle yet potent cybersecurity threat: camouflaged data poisoning attacks. These attacks are particularly relevant in situations where machine learning models are frequently updated or “unlearn” specific data—a process that might occur when requested to forget or delete certain information. Here’s how the attack works: The attacker stealthily slips a few modified data points into the training set. These points are designed to lie dormant, initially having little or no effect on the model’s behavior. Later, the attacker triggers the removal of some of these data points. It’s at this moment—the retraining of the model—that the attack takes effect, and the model’s predictions start to go wrong. Specifically, the attack focuses on causing the model to incorrectly label a particular piece of data—this could be misidentifying an image or misclassifying text. To demonstrate the concept, experiments were run on image datasets such as CIFAR-10, Imagenette, and Imagewoof. The cunning part lies in how these poisoned points are created; they’re camouflaged to blend in with normal data, making the harmful effect only apparent after one of the poison pills is removed during model retraining. This method of attack opens up new concerns about the robustness of models in dynamic environments where data is frequently added or deleted.
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Alon Albalak, Colin Raffel, William Yang Wang
Teaching a model to learn from a small set of examples, known as few-shot learning, often leads to models that can’t generalize well—they become too tailored to their limited data (a problem called overfitting). This study presents an improved technique for few-shot learning that incorporates additional data to improve performance on the target task. Previous methods mixed in extra data but became unwieldy with large amounts of information.
The innovation here uses strategies from the multi-armed bandit problem—balancing between trying new options and using what’s known—to efficiently manage much larger datasets. Two new algorithms, EXP3-FLAD and UCB1-FLAD, are introduced, which are not overwhelmed by the quantity of auxiliary data and effectively blend exploration and exploitation. The results show a 4% increase in performance over previous methods. They also enabled training language models with fewer parameters to surpass the capabilities of the larger GPT-3 model, indicating a promising avenue for creating AI models that generalize better from limited examples.
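The exploration-exploitation trade-off can be sketched with the textbook UCB1 rule that one of the algorithm names refers to. This is a generic illustration, not the paper's UCB1-FLAD: each "arm" stands in for an auxiliary dataset, and the fixed reward values are a made-up placeholder for whatever signal indicates that a dataset is helping the target task.

```python
import math


def ucb1(reward_fn, n_arms, rounds):
    """Textbook UCB1: try each arm once, then pick the arm with the highest
    average reward plus an exploration bonus that shrinks with use."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initial pass: play every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        totals[arm] += reward_fn(arm)
        counts[arm] += 1
    return counts


# Hypothetical usefulness of three auxiliary datasets (deterministic here).
usefulness = [0.2, 0.8, 0.5]
counts = ucb1(lambda arm: usefulness[arm], n_arms=3, rounds=300)
print(counts)
```

The most useful arm ends up sampled most often, while the others are still revisited occasionally because their exploration bonus grows as they go unused.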
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila McIlraith
AI is moving beyond chatbots and into the massive open world of Minecraft. This paper introduces a powerful generative model trained on years of Minecraft gameplay from YouTube videos that can play the game and follow both natural language text and visual instructions. The model, which is called STEVE-1 (Steve is the main character in Minecraft), plays by looking at the pixels on the screen and choosing how to move the keyboard and mouse. The paper introduces a novel methodology, inspired by previous text-to-image models like DALL·E 2, that lets us build on existing foundation models with relatively little additional cost to create this powerful, instructable agent that can find resources, craft items, explore, and more. STEVE-1 bridges text and visual input with low-level behavioral control in the form of keyboard strokes and mouse clicks. Importantly, using a novel variant of hindsight relabelling, STEVE-1 learns to follow instructions without training on a specific set of tasks. Research materials, including model weights and training scripts, have been shared for further exploration in the field.
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba, Murat A Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu
In this research, we investigate how machine learning models learn a single-index target function under spiked covariance data. We ask the following question: how large should the spike magnitude be, in order for kernel methods and neural networks trained with gradient descent, to learn the underlying target function? Our result demonstrates that both kernel methods and neural networks benefit from low-dimensional structures in the data; moreover, under our setting, neural networks can adapt to such structures more effectively.
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su
Text-guided image editing is useful for both personal and professional purposes, such as work done in Photoshop, but it often depends heavily on manual adjustments due to the limitations of current methods, which are either zero-shot or trained on noisy, synthesized datasets. To improve this, we’ve developed MagicBrush, a first-of-its-kind, carefully curated dataset aimed at instruction-based image editing tasks. It contains over 10,000 sets of images paired with text instructions and their edited outcomes, covering a variety of editing contexts, including both single and multiple edit sequences, with or without provided masks. We’ve fine-tuned a model called InstructPix2Pix using MagicBrush and achieved notably better results based on human evaluations. Beyond this, we’ve rigorously tested current image editing models against MagicBrush through various assessments, uncovering the challenges posed by our dataset and highlighting the disconnect between existing technologies and the demands of real-world image editing.
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu
Artificial Intelligence’s role in music, particularly in understanding it, lags behind other artistic AI ventures like visual art creation and writing assistance. To combat the challenges of scarce in-depth learning resources and standardized benchmarks in music AI, we introduce MARBLE. This benchmarking platform caters to Music Information Retrieval (MIR) tasks, offering a detailed taxonomy that spans acoustic features to abstract descriptions. MARBLE implements a standard evaluation protocol using 14 tasks over 8 public datasets to consistently assess the capabilities of various music AI models. Designed to be accessible, scalable, and aligned with copyright norms, MARBLE sets the stage for replicable research while encouraging improvement and innovation in music AI. Preliminary results highlight the promise of recent large-scale music models, with opportunities for refinement. Access to MARBLE’s leaderboard and resources is publicly available to inspire future developments in music AI.
MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy
Honghua Dong, Jiawei Xu, Yu Yang, Rui Zhao, Shiwen Wu, Chun Yuan, Xiu Li, Chris J. Maddison, Lei Han
Graph neural networks (GNNs) typically excel at local data processing but fall short when it comes to recognizing long-range interactions (LRIs) within graphs. Our MeGraph model innovatively merges local graph structures with an overarching graph hierarchy into one unified framework to address this. This layered approach alternates between local message-passing at various scales and integrating insights across the full graph hierarchy. By continuously blending local and global information this way, MeGraph achieves an enhanced balance in data analysis. Validated by a newly developed benchmark specifically designed to test LRI detection, MeGraph showcases superior performance. It holds its ground against or outperforms leading models in established benchmarks and demonstrates its capability across diverse real-world datasets, highlighting its versatility and effectiveness in graph data analysis.
Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations
Guanren Qiao, Guiliang Liu, Pascal Poupart, Zhiqiang Xu
Inverse Constrained Reinforcement Learning (ICRL) learns the hidden rules that experts follow in their demonstrations without being explicitly told what those rules are. Traditional methods work on the assumption that all the expert behaviors come from a single type of expert, which oversimplifies real situations with diverse experts. Our new technique, Multi-Modal Inverse Constrained Reinforcement Learning (MMICRL), can distinguish and learn from multiple experts’ rules at once. It identifies various experts in the data and adapts to each one’s specific constraints. MMICRL refines its learning process through an objective that ensures it can replicate the nuanced behaviors of different experts while preserving behavioral diversity. Integrated with contrastive learning to improve its robustness, MMICRL has been shown in tests to excel at identifying constraints and performing tasks, surpassing other methods.
Neural Lighting Simulation for Urban Scenes
Ava Pun, Gary Sun, Jingkang Wang, Yun Chen, Ze Yang, Sivabalan Manivasagam, Wei-Chiu Ma, Raquel Urtasun
Outdoor lighting changes can undermine the effectiveness of robots that rely on visual data, particularly if they haven’t been trained under varying light conditions. LightSim is our solution—a camera simulation tool designed to create a diverse and realistic set of images under different lighting scenarios. This system uses sensor data to generate detailed 3D models of urban environments, which can have their elements altered, removed, or viewed from new perspectives, all while maintaining accurate lighting. LightSim employs a combination of true-to-life rendering techniques and learning-based adjustments to change light conditions, like the position and intensity of sunlight. The result is a consistent set of virtual videos that mimic real light variations. Tests demonstrate LightSim’s superior ability to replicate realistic lighting compared to previous systems. More crucially, when robots are trained with videos from LightSim, their ability to perceive and understand visual data in different lighting improves markedly.
Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression
Ayoub El Hanchi, Murat Erdogdu
In this study, we examine a statistical method known as empirical risk minimization used for predicting relationships between variables in linear regression with a focus on the ‘p-norm’ for ‘p’ values from just above 1 to infinity. We discover that, when our model perfectly predicts the outcome without any prior assumptions, only a number of samples equal to the number of predicting variables is needed to pinpoint the accurate relationship. When ‘p’ is 2 or more, given minimal assumptions, we confirm a reliable estimate for how much our prediction’s risk may exceed the actual risk. This applies to ‘p’ values between 1 and 2 as well, assuming the method’s applicability is confirmed through certain mathematical conditions.
Private Distribution Learning with Public Data: The View from Sample Compression
Shai Ben-David, Alex Bie, Clément L. Canonne, Gautam Kamath, Vikrant Singhal
This research examines how to learn about a data distribution in a way that keeps certain data private, when also given some public data alongside it, termed public-private learning. In this scenario, the learner uses both publicly available data and privately held data taken from an unknown distribution to estimate that distribution. The key is that the learner must protect the privacy of the private data according to strict privacy rules, known as pure differential privacy. The findings suggest that the ability to learn from both public and private data sources in this manner is tied to two concepts. The first is whether the data can be represented by a smaller, simpler set, often called a sample compression scheme. The second is a new idea called list learning. By exploiting these relationships, the study was able to confirm previous findings on Gaussian distributions and also to provide new insights. These include estimates of how much data is needed for learning with mixtures of Gaussian distributions, outcomes for learners that can handle inaccuracies and shifts in data distribution, and how learnability is maintained when mixing and matching different distributions. An additional discovery is that when learning Gaussian distributions in a multi-dimensional space, at least the number of dimensions’ worth of public samples is necessary to ensure private learnability. This number is nearly as high as the current known limit, which is just one more than the number of dimensions.
Probabilistic Invariant Learning with Randomized Linear Classifiers
Leonardo Cotta, Gal Yehuda, Assaf Schuster, Chris Maddison
Building models that are complex yet respect task-specific consistencies is challenging and often demands significant computational resources. Our innovation lies in applying randomness to create models that are both intricate and consistent but use fewer resources. This approach hinges on embracing a probabilistic version of universality and invariance, leading to more resource-efficient models. We present Randomized Linear Classifiers (RLCs), a new type of binary classification model that can probabilistically approximate smooth functions and retain invariance with high likelihood under certain parameter and data size constraints. These RLCs are specially designed for classification tasks with invariance on sets, graphs, and spheres, achieving this with less resource usage compared to conventional neural networks. Our experiments confirm that RLCs perform effectively in tasks where deterministic models with invariance often underperform, demonstrating the merit and resource efficiency of our probabilistic approach.
Resolving Interference When Merging Models
Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
Transfer learning involves refining a model that’s already been trained with a focus on a new, specific task. It offers benefits like better performance, quicker learning, and the need for fewer examples to learn effectively. However, these improved models are usually restricted to just one task and don’t share what they’ve learned with similar models. To address this, the field has seen the development of methods that aim to combine these single-task models into one model that can handle multiple tasks simultaneously, without needing further training. But these merging techniques often failed because they didn’t consider how different parts of the models might interfere with each other, leading to worse overall performance. The paper presents a new method called TIES-Merging, which better fuses models by: (1) resetting aspects that changed very little in training, (2) fixing conflicts where models disagree on whether a feature should be more or less important, and (3) combining features only when there is agreement on their importance. This method has proven to be more effective across various test scenarios, including different types of tasks, model complexities, and architectures. The study also looks into how different kinds of interference affect the merged model, emphasizing the need to address conflicts in feature importance.
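The three steps listed above can be sketched on flat parameter vectors. This is a minimal illustration, not the authors' implementation: the names `trim` and `ties_merge`, the `keep_fraction` parameter, and the use of plain Python lists are assumptions made for this example. Each input is assumed to be a "task vector" (fine-tuned weights minus the shared pretrained weights), and the sign is elected from the summed values of the trimmed vectors.

```python
def trim(task_vector, keep_fraction):
    """Step 1: zero out all but the largest-magnitude entries."""
    k = max(1, round(len(task_vector) * keep_fraction))
    # Indices of the k entries that changed the most during fine-tuning.
    order = sorted(range(len(task_vector)),
                   key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def ties_merge(task_vectors, keep_fraction=0.5):
    """Merge several task vectors into one (hypothetical helper)."""
    trimmed = [trim(tv, keep_fraction) for tv in task_vectors]
    merged = []
    for values in zip(*trimmed):
        # Step 2: elect one sign per parameter from the total signed mass.
        total = sum(values)
        if total == 0:
            merged.append(0.0)
            continue
        sign = 1.0 if total > 0 else -1.0
        # Step 3: average only the entries that agree with the elected sign.
        agreeing = [v for v in values if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing))
    return merged


# Two toy task vectors that disagree in sign on the third parameter.
task_a = [1.0, -0.1, 2.0, 0.0]
task_b = [0.5, 0.2, -3.0, 0.1]
print(ties_merge([task_a, task_b]))  # [0.75, 0.0, -3.0, 0.0]
```

In the toy example, the small changes are reset to zero, and the third parameter keeps only the value whose sign wins the election instead of averaging two conflicting updates toward zero.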
Robust Data Valuation with Weighted Banzhaf Values
Weida Li, Yaoliang Yu
A recent study by Wang and Jia tackled the challenge of determining the importance of individual pieces of data used to train artificial intelligence. Common methods, like the Shapley value, struggle because of unpredictable factors in the calculations that lead to inconsistent rankings of data importance. Instead, Wang and Jia suggest using the Banzhaf value, which they believe is less affected by this unpredictability. However, when looking at a wider set of Banzhaf values that have been adjusted with weights, this paper finds that the regular Banzhaf value isn’t always the most stable. The researchers use a new approach called Kronecker noise, which helps them to measure the unpredictability and find a way to adjust the Banzhaf values to make them more consistent. They develop a new method that estimates these adjusted Banzhaf values more efficiently and quickly, performing well when tested with both theoretical noise and real-world, unpredictable data. This could make it a valuable tool for figuring out how important each piece of data is when teaching AI systems. Their findings suggest that these weighted Banzhaf values hold potential for dealing with the uncertainties in assigning value to training data.
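To make the idea of a weighted Banzhaf value concrete, here is a small exact computation on a toy problem. It is a sketch under stated assumptions, not the paper's estimator: it assumes the common formulation in which each other data point joins the subset independently with probability `w` (with `w = 0.5` giving the regular Banzhaf value), and `toy_utility` is a made-up utility function. The exhaustive loop is exponential in the number of points, which is why efficient estimators matter in practice.

```python
from itertools import combinations


def weighted_banzhaf(utility, n, w=0.5):
    """Exact weighted Banzhaf values for n data points (exponential cost)."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Probability of one particular subset of this size when each
            # other point is included independently with probability w.
            prob = (w ** size) * ((1 - w) ** (len(others) - size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of point i to this subset.
                total += prob * (utility(s | {i}) - utility(s))
        values.append(total)
    return values


def toy_utility(subset):
    # Hypothetical utility: points 0 and 1 only help when used together,
    # while point 2 always adds 1.
    return (2.0 if {0, 1} <= subset else 0.0) + (1.0 if 2 in subset else 0.0)


print(weighted_banzhaf(toy_utility, 3, w=0.5))   # [1.0, 1.0, 1.0]
print(weighted_banzhaf(toy_utility, 3, w=0.25))  # [0.5, 0.5, 1.0]
```

Changing the weight changes the ranking: at `w = 0.5` all three points tie, while at `w = 0.25` point 2 ranks first because small subsets, where points 0 and 1 rarely appear together, carry more probability.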
Scaling Data-Constrained Language Models
Niklas Muennighoff, Alexander Rush, Boaz Barak, Teven Le Scao, Nouamane Tazi, Aleksandra Piktus, Thomas Wolf, Colin Raffel, Sampo Pyysalo
Current scaling of language models often involves increasing the number of parameters and the amount of training data (which is typically sourced from the internet). This strategy may soon reach a data ceiling due to finite text available online. Addressing this challenge, our research explores model scaling with limited data. We experimented with various levels of data reuse and computational limits, observing the effects on models with up to 9 billion parameters. We learned that reusing data multiple times, up to four epochs, doesn’t harm the model if the compute resources remain fixed. Beyond this point, however, the benefit of additional computing power plateaus, providing no further gains in model performance. We propose a new formula to guide when to invest in computational resources, considering the diminishing returns on data reiteration and surplus parameters. Our research also tests alternative ways to enhance limited datasets for training, to maintain model improvement without relying on vast, unique texts.
Shaped Attention Mechanism in the Infinite Depth-and-Width Limit at Initialization
Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy
In the realm of deep learning, Transformers are a type of network architecture that has become popular due to its effectiveness in managing sequences, like language. This paper explores a way to predict how easily such networks can be trained by analyzing the covariance matrix of the outputs—a snapshot of the network’s learning dynamics—especially when these networks are scaled up significantly in size. To do this, we modified the attention mechanism, a fundamental component of how Transformers weigh the importance of different parts of data. We introduced adjustments for networks in the proportional limit, where depth and width are infinitely large. The study found that at the start of training, the expected learning behavior of these massive networks is captured by a stochastic differential equation (SDE) defined by how depth scales with width. For ensuring stability in this scaled-up scenario, it’s essential to tweak the softmax function within the Transformer’s attention mechanism. This involves a delicate balance introduced by centering and scaling adjustments relative to the network’s size. The resulting network, called a “shaped Transformer,” demonstrates stability and predictability in learning, even when the network is vast. Simulations confirm that the SDE model is surprisingly accurate in reflecting the actual behavior of substantial networks, paving the way for future large-scale, trainable deep learning models.
Sharp Calibrated Gaussian Processes
Alexandre Capone, Sandra Hirche, Geoff Pleiss
Gaussian processes are widely used in engineering and science to predict outcomes and estimate uncertainties. However, these estimates don’t always align with what’s observed in the real world, a problem known as miscalibration. Current methods to fix this issue typically involve making the range of uncertainty larger, but this can lead to overly broad and impractical confidence intervals. To address this, the paper describes a new method that generates frequentist confidence intervals for Gaussian processes using a mathematical mechanism similar to the posterior predictive variance computation. These confidence intervals are free to use different kernel hyperparameters than the posterior mean prediction, which yields intervals that are tight while still carrying frequentist coverage guarantees. The results show that this new calibration method outperforms existing ones, promising better reliability in practical applications.
Similarity-based cooperative equilibrium
Caspar Oesterheld, Johannes Treutlein, Roger Grosse, Vincent Conitzer, Jakob Foerster
In the quickly evolving field of machine learning, systems are becoming more independent, often having to make decisions while interacting with other similar systems. One classic problem where cooperation is key is the Prisoner’s Dilemma—a situation where two parties must decide to cooperate or betray each other without knowing the other’s decision. According to traditional game theory, machine learning (ML) agents are expected to choose betrayal because it seems safer. Previous research has suggested that if these agents could fully see into each other’s ‘thinking’—like seeing each other’s source code or, for ML agents, their weights—they might choose to cooperate. But complete openness isn’t always practical, whereas a partial glimpse into one another’s mechanisms is more common. Addressing this middle ground, the paper presents a scenario where agents only know a single piece of information about each other: a number showing how similar one agent is to the other. The authors prove that even this slim insight is enough to reach cooperative decisions, just as if they had full transparency. Furthermore, they show that ML agents can actually learn to cooperate in this setting through straightforward learning techniques. These findings could be vital for designing ML systems that need to interact and make decisions in social settings.
Spatially Resolved Gene Expression Prediction from Histology Images via Bi-modal Contrastive Learning
Ronald Xie, Kuan Pang, Sai W. Chung, Catia T. Perciani, Sonya A. MacParland, Bo Wang, Gary D. Bader
This paper showcases a new method called BLEEP that helps doctors and researchers examine tissues more closely and quickly understand the genes at work. By looking at tissue slides stained with special dyes, BLEEP uses a sophisticated technique to map genes associated with different diseases. It learns from a large number of examples to predict gene activity in any part of a tissue slide. This is faster and cheaper than traditional ways of studying gene expressions. Tested on human liver samples, BLEEP outperformed current methods, promising to speed up disease research and diagnosis while saving costs. This breakthrough suggests a future where analyzing tissues at the genetic level could become routine for medical professionals, improving our understanding and treatment of various illnesses.
Structured Neural Networks for Density Estimation and Causal Inference
Asic Q. Chen, Ruian Shi, Xiang Gao, Ricardo Baptista, Rahul G. Krishnan
Adding specific patterns or structures to neural networks can help them perform certain tasks more effectively. For example, in creating models that generate data, it’s helpful if the model can understand and respect the relationships and independencies between different pieces of data, much like a Bayesian network—a statistical model that represents a set of variables and their conditional dependencies. The study proposes a novel approach called Structured Neural Network (StrNN), which incorporates such patterns by selectively blocking certain connections in the network. The key to StrNN’s design is a fresh look at how neural networks can be linked to the concept of binary matrix factorization—a mathematical method for breaking down complex problems into simpler parts. While the problem of designing these structures is typically very complex (NP-hard)—meaning it’s computationally intense—the research offers new algorithms that manage this complexity by tailoring the network architecture, ensuring that the model behaves as desired. StrNN’s potential is showcased in three scenarios: estimating probabilities for binary and continuous data, and analyzing cause-and-effect relationships—which is crucial for understanding the influence of one variable over another. This work paves the way for more data-efficient neural networks, serving as a stepping stone for using generative models to estimate causal effects.
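The idea of blocking connections so that a network respects a given dependency structure can be sketched with binary masks. The example below is an illustration only: the helper names, the all-ones weights, the omission of nonlinearities, and the hand-picked masks are assumptions, and it does not reproduce the paper's factorization algorithms. It checks that two layer masks multiply to the sparsity pattern of a target adjacency matrix, which is the binary matrix factorization view described above.

```python
def masked_linear(weights, mask, x):
    """Linear layer where mask[i][j] == 0 blocks the connection j -> i."""
    return [sum(m * w * xj for m, w, xj in zip(mrow, wrow, x))
            for mrow, wrow in zip(mask, weights)]


def support(m2, m1):
    """Sparsity pattern of m2 @ m1: 1 where some path input -> output exists."""
    return [[1 if any(m2[i][k] * m1[k][j] for k in range(len(m1))) else 0
             for j in range(len(m1[0]))]
            for i in range(len(m2))]


# Target structure: output i may depend only on inputs with a smaller index.
adjacency = [[0, 0, 0],
             [1, 0, 0],
             [1, 1, 0]]

# Hand-picked masks for a network with 2 hidden units (an assumption).
mask1 = [[1, 0, 0],
         [1, 1, 0]]          # hidden <- input
mask2 = [[0, 0],
         [1, 0],
         [1, 1]]             # output <- hidden

assert support(mask2, mask1) == adjacency

ones1 = [[1.0] * 3 for _ in range(2)]
ones2 = [[1.0] * 2 for _ in range(3)]


def forward(x):
    hidden = masked_linear(ones1, mask1, x)
    return masked_linear(ones2, mask2, hidden)


# Changing the last input leaves every output unchanged, as the structure requires.
print(forward([1.0, 2.0, 3.0]))    # [0.0, 1.0, 4.0]
print(forward([1.0, 2.0, 100.0]))  # [0.0, 1.0, 4.0]
```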
Subject-driven Text-to-Image Generation via Apprenticeship Learning
Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William Cohen
The creation of images from text descriptions has advanced greatly with models like DreamBooth, which can produce highly personalized images of a specific subject using a handful of examples. Although effective, this approach is costly since it requires training a separate model for every subject. This paper introduces SuTI, a new model that creates images of a new subject in various scenes immediately after seeing a few examples, avoiding the need for costly individual model training. SuTI employs apprenticeship learning, where one ‘apprentice’ model learns from the output of many ‘expert’ models, each trained on a different subject using vast numbers of image clusters collected from the internet. As a result, SuTI mimics the experts’ capabilities to generate custom images very rapidly. Compared to existing methods that rely on fine-tuning for every subject, SuTI works much faster—20 times faster than the current state-of-the-art methods. When tested against other models on DreamBench and its updated version, DreamBench-v2, SuTI excelled, particularly in its ability to capture the essence of the subject and align with the text descriptions, according to evaluations by humans.
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
Recently, a type of AI model known as denoising diffusion probabilistic models has been making waves in image creation, known for its high-quality and varied outputs. This research reveals that they’re also remarkably good at estimating optical flow (the pattern of apparent motion of objects in a visual scene) and monocular depth (the distance of objects from the viewpoint, using one camera). What’s surprising is that they achieve this without needing specialized structures or custom-made error measures, typically essential for these tasks. Unlike traditional methods that give a single best-guess answer, these diffusion models can use Monte Carlo methods—a statistical technique—to represent uncertainties and multiple possible answers for things like object movement and depth. By cleverly mixing self-supervised learning (where the system teaches itself using available data), a combination of both simulated and real data, and new technical methods that deal with imperfect training data, researchers trained top-notch models for depth and flow estimation. Through thorough testing and adjustments, and with special enhancements, these models—referred to as DDVM (Denoising Diffusion Vision Models)—set new records for accuracy in predicting how far away things are in images from indoor scenes and how things are moving in driving scenarios, surpassing previous methods by about 25%.
Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design
AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, John Willes, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik
One of chemistry’s big challenges is designing molecules with intended properties quickly, which is pivotal for advancing drug discovery, material science, and catalysis. Despite strides in computing power and AI, there has been less progress in benchmarks: realistic tests of whether these methods can handle molecule design as it is actually practiced. This study introduces a series of practical benchmarks, using physical simulations to replicate the complex nature of designing molecules for use in materials, pharmaceuticals, and chemical reactions. The researchers used these benchmarks to test several established algorithms and found that an algorithm’s success depends greatly on the specific type of molecule design challenge it faces. These new benchmarks aim to steer the development of molecule design techniques towards more realistic scenarios, bridging the gap between theoretical promise and practical application in industry and academia.
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu, Jeff Clune
Language is often considered a key aspect of human thinking, providing us with exceptional abilities to generalize, explore, plan, replan, and adapt to new situations. However, AI agents rarely think in natural language. We introduce a novel method, Thought Cloning, that enables AI agents to imitate humans thinking out loud while acting, thereby teaching them to think and act like humans. Human children receive feedback from teachers not only on the actions they take but also on the reasoning behind their actions. Thought Cloning is similar in that AI agents are taught to have clear thinking behind their actions. We compare Thought Cloning to the standard practice of having AI agents imitate only the actions that humans take while solving tasks, which is called Behavioral Cloning. Experiments reveal that Thought Cloning not only learns faster and outperforms Behavioral Cloning, but also performs better and learns faster in novel situations. Thought Cloning also provides important benefits for AI Safety and Interpretability. Because we can observe the AI’s thoughts, we can better understand why the agent does things, which also makes it easier to fix agent training if it is not working for a task. If an agent plans to do something unsafe, we can also prevent it from doing so. Overall, by training agents how to think as well as behave, Thought Cloning creates safer, more powerful agents.
Tools for Verifying Proofs-of-Training-Data
Dami Choi, Yonadav Shavit, David Duvenaud
What could a “nuclear inspector” for large neural models check if they had access to the training checkpoints? We propose a simple protocol for verifying claims about very large SGD training runs. We show how, based on weight checkpoints, one can detect spoofed claims about:
Our scheme is simple: Model trainers set their random seed to a hash of the data and code, and save regular checkpoints. The verifier looks for anomalies in the training stats, and re-runs any suspicious-looking segments. The anomaly-search is cheap, e.g. an extra 1.3% on GPT2.
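The first step of the scheme, setting the random seed to a hash of the data and code, might look like the sketch below. The specifics are assumptions for illustration: the function name `seed_from_commitment`, the choice of SHA-256, the length-prefix encoding, and the truncation to 64 bits are not taken from the paper.

```python
import hashlib
import random


def seed_from_commitment(data_bytes, code_bytes):
    """Derive a training seed from the exact data and code (hypothetical helper)."""
    digest = hashlib.sha256()
    # Length-prefix each part so the boundary between data and code is unambiguous.
    for part in (data_bytes, code_bytes):
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    # Truncate to 64 bits so the value fits typical seed arguments.
    return int.from_bytes(digest.digest()[:8], "big")


seed = seed_from_commitment(b"training examples...", b"train.py contents...")
rng = random.Random(seed)
print(rng.random() == random.Random(seed).random())  # True
```

Because the seed depends on every byte of the data and code, a trainer cannot quietly swap in different data after the fact and still reproduce the same randomness when a verifier re-runs a segment between two checkpoints.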
Training Private Models That Know What They Don’t Know
Stephan Rabanser, Anvith Thudi, Abhradeep Thakurta, Krishnamurthy Dvijotham, Nicolas Papernot
Creating deep learning models that make cautious rather than overconfident errors is difficult, and it’s even more challenging when the models must protect data privacy. Privacy protection, known as differential privacy (DP), can introduce extra randomness that complicates training. This study examines selective classifiers, which have the option of not making a prediction when unsure, in the context of DP. The researchers find that common selective prediction methods might fail under DP as they could leak private information. However, they note one recent method, which employs checkpoints from standard private learning algorithms, works well with DP. The study also reveals that while DP safeguards privacy, it adversely affects the performance of selective classifiers. To assess the impact of DP on selective classifiers across different levels of privacy, the authors introduce a new evaluation approach. Their experiments show that while it’s possible to reach the performance of non-private models, doing so requires sacrificing the model’s coverage, or the range of data it can confidently predict, as the privacy safeguards become more stringent.
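The trade-off between coverage and accuracy in a selective classifier can be illustrated with a simple confidence threshold. This sketch is generic and makes several assumptions: the function name, the toy confidence scores, and the thresholding rule are invented for illustration, and it is not the checkpoint-based method or the evaluation approach from the paper.

```python
def selective_metrics(confidences, correct, threshold):
    """Return (coverage, accuracy on accepted inputs) for a confidence threshold."""
    # The classifier abstains whenever its confidence is below the threshold.
    accepted = [c for conf, c in zip(confidences, correct) if conf >= threshold]
    coverage = len(accepted) / len(confidences)
    accuracy = sum(accepted) / len(accepted) if accepted else None
    return coverage, accuracy


# Toy predictions: confidence score and whether each prediction was right.
confidences = [0.9, 0.6, 0.8, 0.55]
correct = [True, False, True, True]

print(selective_metrics(confidences, correct, threshold=0.5))  # (1.0, 0.75)
print(selective_metrics(confidences, correct, threshold=0.7))  # (0.5, 1.0)
```

Raising the threshold improves accuracy on the inputs the model chooses to answer, but the model answers fewer of them, which is the coverage cost the summary describes.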
Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
Yiwei Lu, Yaoliang Yu, Xinlin Li, Vahid Partovi Nia
BinaryConnect (BC) and its variants are common methods for neural network binarization, which simplifies networks to binary values for efficiency. But binarization encounters a snag with training due to the sign function’s gradient being zero, halting progress as weights can’t update. To bypass this, “training tricks” like approximate gradients are used for continued training, despite lacking solid theoretical underpinnings. This paper seeks to rationalize these practices through an optimization lens. It does so by advancing ProxConnect (PC) to ProxConnect++ (PC++), which encapsulates various binarization methods. The authors introduce a systematic approach to crafting quantizers, tools that convert continuous signals to binary, ensuring theoretical performance guarantees. They showcase this advancement with the new BNN++ algorithm. Through image classification tests on complex networks, BNN++ has shown promising results, suggesting that it could enhance binary network training while reinforcing the theoretical framework behind these optimization techniques.
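The zero-gradient problem and the "approximate gradient" workaround can be seen on a one-weight toy model. The sketch below follows the general BinaryConnect-style recipe (binarize in the forward pass, update a latent real-valued weight in the backward pass); the function names, learning rate, and toy data are assumptions, and it does not implement ProxConnect++ or BNN++.

```python
def sign(x):
    return 1.0 if x >= 0 else -1.0


def train_binary_weight(xs, ys, lr=0.1, steps=50, w=-0.3):
    """Fit y ~ sign(w) * x while keeping a latent real-valued weight w."""
    for _ in range(steps):
        b = sign(w)  # the forward pass uses the binarized weight
        # Gradient of the mean squared error with respect to b.
        grad_b = sum(2 * (b * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Approximate gradient: pretend d sign(w)/dw = 1, although the true
        # derivative is 0 almost everywhere and would stop all updates.
        w -= lr * grad_b
        w = max(-1.0, min(1.0, w))  # keep the latent weight in [-1, 1]
    return w, sign(w)


# Data generated by the binary weight +1; training starts from the wrong sign.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
latent, binary = train_binary_weight(xs, ys)
print(binary)  # 1.0
```

With the exact gradient of the sign function the latent weight would never move from its starting value of -0.3; with the approximation it crosses zero and the binarized weight flips to the correct sign.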
VisAlign: Dataset for Measuring the Degree of Alignment between AI and Humans in Visual Perception
Jiyoung Lee, Seungho Kim, Seunghyun Won, Joonseok Lee, Marzyeh Ghassemi, James Thorne, Jaeseok Choi, O-Kil Kwon, Edward Choi
AI alignment ensures that machine learning models pursue outcomes that align with human intentions, preferences, or ethics. However, due to the opaque nature of large-scale deep learning models, it’s difficult to manually direct their actions. To ensure AI safety, measuring how closely AI’s visual perception matches human perception can be crucial. This paper introduces a novel dataset specifically designed to assess AI-human visual alignment based on image classification, a key aspect of visual understanding. To be effective, such a dataset must cover a broad range of real-world scenarios and include definitive human judgment as the standard. The proposed dataset comprises three types of image samples, categorized as Must-Act (or Must-Classify), Must-Abstain, and Uncertain images. These categories reflect the amount and clarity of visual information present. For example, Uncertain images are highly blurry, and labeling for these was crowd-sourced to capture human perception accurately. The dataset’s structure follows established sampling theory, statistical principles for survey design, and expert input. Utilizing this dataset, the paper evaluates how well five leading visual perception models and seven methods for deciding when to abstain from making a prediction align with human visual judgment, contributing to the field of AI safety.
Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation
Kirill Neklyudov, Jannes Nys, Luca Thiede, Juan Carrasquilla, Qiang Liu, Max Welling, Alireza Makhzani
We propose “Wasserstein Quantum Monte Carlo”, a novel approach for solving the quantum many-body Schrödinger equation, which is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. We approach the energy functional minimization in QM from a purely probabilistic perspective, rather than the conventional wave-function formulation. This new framework allows us to transform the energy minimization problem to a probabilistic inference problem, where the target density is the ground-state density. We then borrow ideas from the probabilistic inference literature and propose to use (projected) Wasserstein gradient flows to minimize the energy functional directly in the space of distributions. We show that our method, called Wasserstein Quantum Monte Carlo, converges faster than the conventional Quantum Variational Monte Carlo (which we interpret as a projected Fisher–Rao gradient flow) for different molecular systems.
Medical Imaging meets NeurIPS
DOU QI, Konstantinos Kamnitsas, Yuankai Huo, Xiaoxiao Li, Daniel Moyer, Danielle Pace, Jonas Teuwen, Islem Rekik
‘Medical Imaging meets NeurIPS’ is a satellite workshop established in 2017. The workshop aims to bring researchers together from the medical image computing and machine learning communities. The objective is to discuss the major challenges in the field and opportunities for joining forces. This year the workshop will feature online oral and poster sessions with an emphasis on audience interactions. In addition, there will be a series of high-profile invited speakers from industry, academia, engineering and medical sciences giving an overview of recent advances, challenges, latest technology and efforts for sharing clinical data.