Data Shift in Machine Learning:
What is it and what are potential remedies
September 13, 14:30 to 19:00 Bilbao, Spain (CET) / 8:30am to 1:00pm (EDT)
Machine learning models are conventionally trained under the assumption that the training (source) data and the real-world (target) data are sampled from the same distribution. In dynamic environments, where the distribution of the data changes over time, this assumption breaks down and predictive performance can degrade. This phenomenon is known as dataset shift.
In most real-world situations, machine learning models have to cope with dataset shift after deployment. The shift in the distribution can be dramatic and can arise from unexpected events, e.g., the outbreak of the COVID-19 pandemic or cyber attacks.
This tutorial will present:
- The principles behind data shift.
- Strategies for detecting dataset shift.
- Adaptation techniques.
- Advanced topics in dataset shift for improving machine learning models in situations where the shift is inevitable.
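As a rough illustration of what a detection strategy can look like, the sketch below compares each feature of a source sample and a target sample with a two-sample Kolmogorov–Smirnov test and combines the per-feature results with a Bonferroni correction. This is only one simple way to approximate a multivariate test and is not necessarily the method presented in the tutorial. The function names (`ks_statistic`, `ks_pvalue`, `detect_shift`), the synthetic data, and the significance level are assumptions made for illustration, and the p-value uses the large-sample approximation.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (large samples)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the p-value is essentially 1 for such a small statistic
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def detect_shift(source_rows, target_rows, alpha=0.05):
    """Flag a shift if any feature is significant after Bonferroni correction."""
    n_features = len(source_rows[0])
    threshold = alpha / n_features
    p_values = []
    for k in range(n_features):
        src = [row[k] for row in source_rows]
        tgt = [row[k] for row in target_rows]
        d = ks_statistic(src, tgt)
        p_values.append(ks_pvalue(d, len(src), len(tgt)))
    return min(p_values) < threshold, p_values


# Synthetic example (assumed data): only the second feature shifts its mean.
random.seed(0)
source = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(500)]
target = [[random.gauss(0.0, 1.0), random.gauss(1.0, 1.0)] for _ in range(500)]

shifted, p_values = detect_shift(source, target)
print("shift detected:", shifted)
print("per-feature p-values:", p_values)
```

Per-feature tests only see changes in the marginal distributions, so a shift that affects only the dependence between features would go unnoticed by this sketch.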
This tutorial aims to provide a comprehensive understanding of dataset shift and explore potential remedies in the face of distribution shift. The learning outcomes for the participants are:
- Understand the characteristics of dataset shift through real-world examples, and the relationship between input features and changes in the output;
- Learn the different types of dataset shift and the associated terminology;
- Learn approaches for detecting dataset shift, including domain classification, the multivariate Kolmogorov–Smirnov test, and black box shift estimation;
- Gain knowledge of different adaptation techniques for dataset shift, including sample re-weighting, mapping to a common feature space, etc.;
- Learn how to adapt an existing ML model to shifts using transfer learning and active learning rather than retraining on the target task from scratch;
- Get acquainted with computational techniques and libraries through hands-on practice;
- Get familiar with recent topics and open problems in the field.
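To make two of the ideas in the list above more concrete, here is a minimal sketch of domain classification used both for detection and for sample re-weighting. A logistic regression is trained to tell source rows from target rows: held-out accuracy close to 0.5 suggests no detectable shift, while clearly higher accuracy suggests the two samples differ. The classifier's odds then give an importance weight for each source sample. The helper names, the one-dimensional synthetic data, and the training settings are assumptions chosen for illustration, not the tutorial's own material.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    return b + sum(wi * xi for wi, xi in zip(w, x))


def train_domain_classifier(source, target, epochs=300, lr=0.5):
    """Logistic regression by batch gradient descent: 0 = source, 1 = target."""
    data = [(x, 0.0) for x in source] + [(x, 1.0) for x in target]
    dim = len(source[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(logit(w, b, x)) - y
            grad_b += err
            for k in range(dim):
                grad_w[k] += err * x[k]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def domain_accuracy(w, b, source, target):
    """Fraction of held-out rows assigned to the correct domain."""
    correct = sum(logit(w, b, x) <= 0 for x in source)
    correct += sum(logit(w, b, x) > 0 for x in target)
    return correct / (len(source) + len(target))


def importance_weight(w, b, x, n_source, n_target):
    # p(target | x) / p(source | x) equals exp(logit); rescale by sample sizes.
    return (n_source / n_target) * math.exp(logit(w, b, x))


# Synthetic example (assumed data): the target mean is shifted by 2.
random.seed(0)
source = [[random.gauss(0.0, 1.0)] for _ in range(400)]
target = [[random.gauss(2.0, 1.0)] for _ in range(400)]

# Train on the first 300 rows of each domain, evaluate on the remaining 100.
w, b = train_domain_classifier(source[:300], target[:300])
accuracy = domain_accuracy(w, b, source[300:], target[300:])
print("held-out domain accuracy:", accuracy)

# Source samples that look like target data receive larger weights.
near = importance_weight(w, b, [2.0], 300, 300)
far = importance_weight(w, b, [-1.0], 300, 300)
print("weight near target mean:", near, "weight far from it:", far)
```

The weights can be passed as per-sample weights when retraining a model on the source data. In practice they are often clipped, since a few very large weights can dominate the training loss.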
All machine learning researchers, practitioners, novices and graduate students will benefit from this tutorial as we cover topics ranging from basic to advanced. Experienced machine learning experts will benefit from the advanced topics sessions as we will discuss state-of-the-art solutions. We will share our slides and hands-on materials with participants before the sessions via Jupyter Notebook or Google Colab.
Developing adaptive methods to deal with dataset shift is an open problem in machine learning, because dataset shift introduces challenges for the deployment of ML tools. This tutorial covers solutions that address this problem, organized as follows:
| Duration | Session |
|---|---|
| 40 min | Characterizing Dataset Shift in Machine Learning |
| 40 min | Dataset Shift Detection and Potential Remedies |
| 45 min | Advanced Topics: Transfer Learning and Active Learning |
| 40 min | Hands-on Practice |
| 15 min | Q&A and Concluding Remarks |
Organizers / Tutorial Leaders
Ali Pesaranghader
AI Research Scientist
LG Toronto AI Lab
Ali Pesaranghader is an AI Research Scientist at LG Toronto AI Lab, and a former Sr. Research Scientist at the Canadian Imperial Bank of Commerce (CIBC) with primary research interests in adaptive learning, data stream mining, natural language processing, and transfer learning. Ali obtained his Ph.D. in Computer Science with a focus on Adaptive Machine Learning at the University of Ottawa in 2018.
Mehdi Ataei
Mehdi Ataei is a research affiliate at the Vector Institute with a Ph.D. in Computational Physics from the University of Toronto. He is currently a researcher in Autodesk’s Simulation, Optimization, and Systems group. His current research focuses on computational physics, applied mathematics, topology optimization, and machine learning.
Sedef Akinli Kocak
Sedef Akinli Kocak is an academic-industry R&D partnership and project manager in the area of AI/ML and an accomplished researcher in the areas of ICT for Sustainability and Advanced Analytics. She has a Ph.D. in Environmental Applied Science and Management from the Data Science Lab at Ryerson University. She is currently with the Vector Institute as an AI Project Manager. She is also a part-time lecturer and supervisor in the Data Science and Analytics Program at Ryerson University.
Please use this contact form ONLY for communicating with the DSML tutorial organizers.
For registration and registration-related inquiries, please use this button link: Register