Unlearning

SISA training equations and diagram depicting the flow of training set data to aggregation and output

July 30, 2020

By Ian Gormely

“The internet never forgets” was once thought to be a foundational truism of life online. But a spate of legal rulings has challenged that notion, and the ability to erase a person’s online footprint is emerging as a bedrock principle of digital privacy rights.

But while legislation like the European Union’s “Right to be Forgotten” regulation lays out important digital privacy principles, it fails to offer technical solutions for how this might be achieved in a hyper-connected online world where a single post can be aggregated across multiple channels.

Likewise, deleting someone’s data from an AI model is a time-consuming process. It can cost companies valuable resources and delay action on someone’s reasonable request to have their information scrubbed clean.

To address this problem, Nicolas Papernot, a Vector Faculty Member, Canada CIFAR AI Chair, and Assistant Professor in the Department of Electrical & Computer Engineering at the University of Toronto, and his team looked at how models could be trained differently, so that these requests are easier to process and a model can be updated without being fundamentally altered. “That’s the guarantee that we want to provide users,” says Papernot.

AI models and algorithms are created by using millions of data points from thousands of people. “You have to assume that these models are direct by-products of the data,” says Papernot. During the training process, where algorithms learn by combing through examples or data points, every data point is used to update all of the model parameters. Every future update will depend on that specific data point. “So if you delete that data point, you should also delete the models.”

Of course, scrapping a model wholesale is generally not an option for researchers or businesses. So Papernot and his co-authors looked at different ways data could be presented to a model so that small tweaks might be made.

Their paper, “Machine Unlearning,” which was recently accepted to the IEEE Symposium on Security and Privacy, a leading conference for computer security and electronic privacy, offers a two-pronged approach. First, they “shard” the data, training many smaller models on separate portions of it as opposed to one big model on all of it, thereby restricting the influence of any one data point to a single small model. “We then ask the different models to vote on the label they predict,” says Papernot. “We count how many votes each class received and output the class that received the most number of votes.”
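The sharding and voting steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the round-robin shard assignment, the toy nearest-centroid learner, and every function name here are hypothetical stand-ins, and any learner could take the place of `train_centroid_model`.

```python
from collections import Counter


def make_shards(data, num_shards):
    """Assign each (x, label) example to exactly one shard (round-robin, an assumption)."""
    shards = [[] for _ in range(num_shards)]
    for i, example in enumerate(data):
        shards[i % num_shards].append(example)
    return shards


def train_centroid_model(shard):
    """Toy stand-in for any learner: store the mean x value of each label."""
    sums, counts = {}, Counter()
    for x, label in shard:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_one(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def predict_by_vote(models, x):
    """Each shard model votes; return the label with the most votes."""
    votes = Counter(predict_one(m, x) for m in models if m)
    return votes.most_common(1)[0][0]


def unlearn(shards, models, example):
    """Remove one example and retrain only the shard that contained it."""
    for k, shard in enumerate(shards):
        if example in shard:
            shard.remove(example)
            models[k] = train_centroid_model(shard)
            return k
    raise ValueError("example not found in any shard")


data = [(0.0, "a"), (1.0, "a"), (0.5, "a"), (10.0, "b"), (11.0, "b"), (10.5, "b")]
shards = make_shards(data, num_shards=3)
models = [train_centroid_model(s) for s in shards]
label = predict_by_vote(models, 2.0)
retrained = unlearn(shards, models, (1.0, "a"))
```

Because each example lives in exactly one shard, the unlearning request above retrains one small model and leaves the other two untouched.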

Then they “slice” the shards and present the data to the model in small increments, increasing the amount of data each time while creating checkpoints along the way. “So when someone asks us to unlearn their data, we can revert to the checkpoint that was saved before we started analyzing their data,” saving time and resources.
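The checkpoint-and-revert idea can be sketched as follows. This is a simplified illustration rather than the paper's procedure: it assumes a toy, order-dependent update rule on a single number, it trains on each new slice in turn instead of on a growing union of slices, and all names (`train_on_slice`, `train_with_checkpoints`, `unlearn`) are hypothetical.

```python
def train_on_slice(w, data_slice, lr=0.5):
    """Toy order-dependent update: nudge w toward each value in turn."""
    for x in data_slice:
        w = w + lr * (x - w)
    return w


def train_with_checkpoints(slices, w0=0.0):
    """Train slice by slice, saving the model state before each slice.

    checkpoints[i] is the state before slice i was seen;
    checkpoints[-1] is the final model.
    """
    checkpoints = [w0]
    for s in slices:
        checkpoints.append(train_on_slice(checkpoints[-1], s))
    return checkpoints


def unlearn(slices, checkpoints, value):
    """Drop `value`, revert to the checkpoint saved before its slice, and replay."""
    for i, s in enumerate(slices):
        if value in s:
            s.remove(value)
            del checkpoints[i + 1:]  # discard every state that saw the value
            for later in slices[i:]:
                checkpoints.append(train_on_slice(checkpoints[-1], later))
            return i
    raise ValueError("value not found in any slice")


slices = [[1.0, 2.0], [3.0], [4.0, 5.0]]
checkpoints = train_with_checkpoints(slices)
reverted_to = unlearn(slices, checkpoints, 3.0)
final_model = checkpoints[-1]
```

Only the slices from the affected one onward are replayed. Work done on earlier slices is reused from the saved checkpoint, which is where the time saving comes from.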

Papernot, whose research focuses on privacy and security in machine learning, is not alone in looking at ways to tackle this problem. As AI is integrated into so many facets of society, unlearning data is becoming a growing issue for companies.

But he and his team were among the first to look at it, and their approach is more general than those of many of their peers. “We wanted to be completely agnostic as to the kind of algorithm people are using so that you can just throw it in with whatever pipeline you have.”

The goal, he explains, is to make it practical for organizations to receive and process these requests quickly. “If the model takes a week to re-train, that slows down the speed at which an organization will handle these requests. We’re saying that you can do that more regularly and at a smaller cost.”
