

DSML 2021
Data Shift in Machine Learning:
What is it and what are potential remedies

September 13, 14:30 to 19:00, Bilbao, Spain (CEST) / 8:30am to 1:00pm (EDT)


Machine learning models are conventionally trained on the premise that the training data and the real-world data (i.e., the source and target data) are sampled from the same distribution. This assumption can lead to poor predictions in dynamic environments, where the distribution of the data changes over time. Such a change in distribution between the source and target data is known as dataset shift.
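Stated formally (the notation here is ours, chosen for illustration), let $p_S$ and $p_T$ denote the source (training) and target (deployment) joint distributions over inputs $x$ and labels $y$. Dataset shift means the two joint distributions differ, and factorizing the joint distribution shows where the change can occur. The three cases below correspond to the covariate, label, and concept shift settings named in the tutorial outline.

```latex
\begin{aligned}
&\text{Dataset shift:} && p_S(x, y) \neq p_T(x, y),
  \quad p(x, y) = p(y \mid x)\,p(x) = p(x \mid y)\,p(y) \\
&\text{Covariate shift:} && p_S(x) \neq p_T(x), \quad p_S(y \mid x) = p_T(y \mid x) \\
&\text{Label shift:} && p_S(y) \neq p_T(y), \quad p_S(x \mid y) = p_T(x \mid y) \\
&\text{Concept shift:} && p_S(y \mid x) \neq p_T(y \mid x)
\end{aligned}
```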

In most real-world situations, deployed machine learning models have to cope with dataset shift. The shift in the distribution can be dramatic when it is caused by unexpected events, e.g., the outbreak of the COVID-19 pandemic or a cyber attack.

This tutorial will present:

  1. The principles behind data shift.
  2. Strategies for detecting dataset shift.
  3. Adaptation techniques.
  4. Advanced topics for enhancing machine learning models in situations where dataset shift is inevitable.
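As a rough illustration of shift detection (item 2), one simple approach compares each input feature between source and target samples with a two-sample Kolmogorov–Smirnov (KS) test and combines the per-feature results with a Bonferroni correction. The sketch below is a minimal, dependency-free version under stated assumptions: the function names are hypothetical, the significance level defaults to 0.05, and p-values come from the large-sample Kolmogorov approximation. It is a per-feature stand-in, not necessarily the multivariate test covered in the tutorial; in practice a library routine such as `scipy.stats.ks_2samp` would normally replace the hand-written test.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_pvalue(d, n, m, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the Kolmogorov tail probability is within 1e-12 of 1 here
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(max(total, 0.0), 1.0)


def detect_shift(source, target, alpha=0.05):
    """Per-feature KS tests combined with a Bonferroni correction.

    source, target: lists of samples, each sample a list of feature values.
    Returns (shift_detected, list of per-feature p-values).
    """
    n_features = len(source[0])
    pvals = []
    for j in range(n_features):
        a = [row[j] for row in source]
        b = [row[j] for row in target]
        pvals.append(ks_pvalue(ks_statistic(a, b), len(a), len(b)))
    # Bonferroni: flag a shift if any p-value falls below alpha / n_features.
    return min(pvals) < alpha / n_features, pvals


if __name__ == "__main__":
    rng = random.Random(0)
    source = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    # Target: the mean of the second feature moves from 0 to 1 (covariate shift).
    target = [[rng.gauss(0, 1), rng.gauss(1, 1)] for _ in range(500)]
    shifted, pvals = detect_shift(source, target)
    print(shifted, pvals)
```

Dividing the significance level by the number of features keeps the overall false-alarm rate at or below `alpha`, but a per-feature test can miss shifts that only appear in the joint distribution of several features.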


Description

This tutorial aims to provide a comprehensive understanding of dataset shift and to explore potential remedies for it. The learning outcomes for participants are:

  • Understand the characteristics of dataset shift through real-world examples, and the relationship between changes in the input features and changes in the outputs;
  • Learn the different types of dataset shift and the associated terminology;
  • Learn approaches to detecting dataset shift, including domain classification, the multivariate Kolmogorov–Smirnov test, and black box shift estimation (BBSE);
  • Gain knowledge of different adaptation techniques for dataset shift, such as sample re-weighting and mapping to a common feature space;
  • Learn how to adapt an existing ML model to shifts using transfer learning and active learning rather than retraining on the target task from scratch;
  • Get acquainted with computational techniques and libraries through hands-on practice;
  • Get familiar with recent topics and open problems in the field.
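Domain classification and sample re-weighting, both listed above, can be combined: a classifier trained to tell source inputs from target inputs yields an estimate of the density ratio $p_T(x)/p_S(x)$, which can then serve as an importance weight for each source training example. The following is a minimal sketch assuming a plain logistic-regression domain classifier fitted by full-batch gradient descent; the function names, learning rate, and epoch count are illustrative choices, not the tutorial's code.

```python
import math
import random


def fit_domain_classifier(source, target, lr=0.5, epochs=300):
    """Fit a logistic regression that predicts P(domain = target | x).

    source, target: lists of feature vectors (lists of floats).
    Returns the coefficients and intercept of the linear logit.
    """
    data = [(x, 0.0) for x in source] + [(x, 1.0) for x in target]
    d = len(source[0])
    coef, intercept = [0.0] * d, 0.0
    for _ in range(epochs):  # full-batch gradient descent on the log loss
        grad, grad_b = [0.0] * d, 0.0
        for x, label in data:
            z = intercept + sum(c * xi for c, xi in zip(coef, x))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # sigmoid(z) - label
            for j in range(d):
                grad[j] += err * x[j]
            grad_b += err
        coef = [c - lr * g / len(data) for c, g in zip(coef, grad)]
        intercept -= lr * grad_b / len(data)
    return coef, intercept


def importance_weight(x, coef, intercept, n_source, n_target):
    """Estimate p_target(x) / p_source(x) from the domain classifier's odds.

    The odds P(target | x) / P(source | x) equal exp(logit); multiplying by
    n_source / n_target removes the effect of unequal sample sizes.
    """
    logit = intercept + sum(c * xi for c, xi in zip(coef, x))
    return math.exp(logit) * n_source / n_target


if __name__ == "__main__":
    rng = random.Random(0)
    source = [[rng.gauss(0.0, 1.0)] for _ in range(500)]
    target = [[rng.gauss(1.0, 1.0)] for _ in range(500)]  # inputs shifted by +1
    coef, intercept = fit_domain_classifier(source, target)
    weights = [importance_weight(x, coef, intercept, len(source), len(target))
               for x in source]
    # Source examples that look like target data get larger weights; a
    # re-weighted training loss would be sum(w_i * loss_i) / len(source).
    print(coef, intercept, sum(weights) / len(weights))
```

For these two Gaussians the true log density ratio is $x - 0.5$, so the fitted coefficient and intercept should land roughly near 1 and -0.5. If the domain classifier performs close to chance, the two input distributions are hard to tell apart, which is also the idea behind domain-classification shift detection. Very large weights inflate the variance of the re-weighted loss, one reason reliable importance weight estimation needs care.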

Machine learning researchers, practitioners, novices, and graduate students will all benefit from this tutorial, as it covers topics ranging from basic to advanced. Experienced machine learning practitioners will benefit in particular from the advanced topics session, where we will discuss state-of-the-art solutions. We will share our slides and hands-on materials (as Jupyter notebooks or Google Colab notebooks) with participants before the sessions.


Outline

Developing adaptive methods to deal with dataset shift is an open problem in machine learning. The problem matters because dataset shift can introduce challenges to the deployment of ML tools. This tutorial covers solutions to this problem as follows:

Duration and Title   Topics Covered
15 min Kick-off
  • Introduction
  • Industrial applications initiated by the Vector Institute
  • Data shift challenge
40 min Characterizing Dataset Shift in Machine Learning
  • Terminology and taxonomy
  • Overview of different types and causes of dataset shift
  • Theory of learning from different domains
15 min Q&A
40 min Dataset Shift Detection and Potential Remedies
  • Covariate shift adaptation using sample re-weighting
  • Important considerations for reliable importance weight estimation
  • Estimating the target label distribution
  • Label shift adaptation using black-box predictors
  • Concept shift adaptation
15 min Q&A
45 min Advanced Topics: Transfer Learning and Active Learning
  • Introduction to transfer learning
  • Transfer learning under different settings
  • Introduction to active learning
  • Pool-based and stream-based active learning
  • Uncertainty sampling in active learning
40 min Hands-on Practice
  • Techniques for dataset shift detection and adaptation
  • Practical use of available packages for dataset shift
15 min Q&A and Concluding Remarks
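The label shift topics in the outline above (estimating the target label distribution and adapting with black-box predictors) can be illustrated with a small sketch of black box shift estimation. Under label shift, $p(x \mid y)$ is unchanged, so a fixed classifier's joint confusion matrix $C_{ij} = P_S(\hat{y}=i, y=j)$, estimated on held-out labeled source data, relates the classifier's prediction frequencies $\mu_i = P_T(\hat{y}=i)$ on unlabeled target data to the unknown class weights $w_j = p_T(y=j)/p_S(y=j)$ through the linear system $C w = \mu$. The code is a dependency-free sketch; the function names are hypothetical, and clipping negative weight estimates to zero is one simple heuristic choice.

```python
def confusion_joint(y_true, y_pred, k):
    """C[i][j] = estimated P_source(prediction = i, true label = j)."""
    n = len(y_true)
    c = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        c[p][t] += 1.0 / n
    return c


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("confusion matrix is (nearly) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def bbse_weights(src_true, src_pred, tgt_pred, k):
    """Estimate w[j] = p_target(y = j) / p_source(y = j) from black-box predictions."""
    c = confusion_joint(src_true, src_pred, k)
    mu = [sum(1 for p in tgt_pred if p == i) / len(tgt_pred) for i in range(k)]
    return [max(v, 0.0) for v in solve(c, mu)]  # clip noisy negative estimates


def target_label_distribution(src_true, w, k):
    """Normalized estimate of p_target(y = j), proportional to w[j] * p_source(y = j)."""
    p_src = [sum(1 for t in src_true if t == j) / len(src_true) for j in range(k)]
    q = [w[j] * p_src[j] for j in range(k)]
    total = sum(q)
    return [v / total for v in q]


if __name__ == "__main__":
    # Toy example: the classifier is right 80% of the time on each source class.
    src_true = [0] * 50 + [1] * 50
    src_pred = [0] * 40 + [1] * 10 + [0] * 10 + [1] * 40
    # Unlabeled target data: only the model's predictions are observed.
    tgt_pred = [0] * 32 + [1] * 68
    w = bbse_weights(src_true, src_pred, tgt_pred, k=2)
    print(w)  # close to [0.4, 1.6]
    print(target_label_distribution(src_true, w, k=2))  # close to [0.2, 0.8]
```

With the estimated weights, a source model can be adapted either by re-weighting each source example's training loss by `w[y]` or by multiplying its predicted class probabilities by `w` and re-normalizing.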


Organizers / Tutorial Leaders

Ali Pesaranghader

AI Research Scientist
LG Toronto AI Lab

Ali Pesaranghader is an AI Research Scientist at the LG Toronto AI Lab and a former Sr. Research Scientist at the Canadian Imperial Bank of Commerce (CIBC). His primary research interests are adaptive learning, data stream mining, natural language processing, and transfer learning. Ali obtained his Ph.D. in Computer Science, with a focus on Adaptive Machine Learning, from the University of Ottawa in 2018.

Mehdi Ataei

Research Affiliate
Vector Institute

Mehdi Ataei is a research affiliate at the Vector Institute with a Ph.D. in Computational Physics from the University of Toronto. He is currently a researcher in Autodesk’s Simulation, Optimization, and Systems group. His research focuses on computational physics, applied mathematics, topology optimization, and machine learning.

Sedef Akinli Kocak

Project Manager
Vector Institute

Sedef Akinli Kocak is an academic-industry R&D partnership and project manager in AI/ML, and an accomplished researcher in ICT for Sustainability and Advanced Analytics. She has a Ph.D. in Environmental Applied Science and Management from the Data Science Lab at Ryerson University. She is currently an AI Project Manager at the Vector Institute, and also a part-time lecturer and supervisor in the Data Science and Analytics Program at Ryerson University.


Contact Us

Please use this contact form ONLY for communicating with the DSML tutorial organizers.

For registration and registration-related inquiries, please use this button link:

Register