Machine learning theory takes centre stage at Vector Institute workshop

January 9, 2024

By Arber Kacollja

The Vector Institute’s recent Machine Learning Theory Workshop brought together machine learning theory researchers to present their latest research, discuss cutting-edge topics, and share insights into the theoretical underpinnings of machine learning. The event, organized by Vector Faculty Member Shai Ben-David and Vector Faculty Affiliate Ruth Urner, was held at the University of Waterloo in November. 

Machine learning theorists play a pivotal role in addressing challenges and posing novel fundamental questions in the field. By delving into the foundations of machine learning, they develop new concepts that can transform problem-solving methodologies and, ultimately, shift how complex AI issues are approached.

Participants gathered at the University of Waterloo for Vector’s Machine Learning Theory Workshop in November.

Over the course of the day-long workshop, Vector Institute Faculty Members, Faculty Affiliates, Postdoctoral Fellows, and researchers from the broader Vector community heard research talks from some of the top machine learning theorists, covering a wide range of topics in the mathematical foundations of machine learning. Participants also took part in interactive group discussions, as well as a poster session where graduate students presented their research.

Vector Faculty Member Shai Ben-David welcoming participants to the workshop.

Ben-David discussed possible notions of learnability characterizations and dimensions. The Fundamental Theorem of Statistical Learning states that the Vapnik–Chervonenkis (VC) dimension characterizes the learnability of classes for the binary label prediction task. Can similar characterizations be provided for other learning tasks, such as multi-class prediction, unsupervised learning of probability distributions, and more? Ben-David, who is a Canada CIFAR AI Chair and a professor at the David R. Cheriton School of Computer Science, University of Waterloo, also presented recent results from his lab showing that no such dimensions exist for general statistical learning or for learning classes of probability distributions.
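
For reference, the theorem for binary classification can be stated as a tight characterization plus a sample-complexity bound (a standard textbook form, quoted here up to constant factors):

```latex
% Fundamental Theorem of Statistical Learning (binary classification,
% agnostic case; standard textbook form, up to constant factors):
% a hypothesis class H is PAC learnable iff d := \mathrm{VCdim}(H) < \infty,
% in which case the sample complexity is
m_{H}(\epsilon, \delta) \;=\; \Theta\!\left( \frac{d + \log(1/\delta)}{\epsilon^{2}} \right).
```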

Beyond classical statistical and computer science paradigms

Vector Faculty Member and Canada CIFAR AI Chair Murat Erdogdu and the researchers in his lab study the effect of gradient-based optimization on feature learning in two-layer neural networks. In his presentation, Erdogdu, who is also an assistant professor in the Department of Computer Science and the Department of Statistical Sciences at the University of Toronto, considered a setting where the number of samples is of the same order as the input dimension. He showed that when the input data is isotropic, gradient descent always improves upon the initial random features model in terms of prediction risk for a certain class of targets. Leveraging the practical observation that data often contains additional structure (i.e., the input covariance has non-trivial alignment with the target), the group’s work proves that the class of learnable targets can be significantly extended, demonstrating a clear separation between kernel methods and two-layer neural networks in this regime.
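
To make the setting concrete, here is a minimal numpy sketch (our illustration, not the paper’s experiment): inputs are isotropic with the sample size proportional to the dimension, the target is a hypothetical single-index function, and ridge regression on random ReLU features is compared against the same features after one large gradient step on the first layer. The target, step size, and all constants are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 1000, 400        # samples on the order of the input dimension

beta = rng.standard_normal(d)    # hidden index direction (assumption)

def sample(n_pts):
    """Isotropic inputs with a single-index target (illustrative choice)."""
    X = rng.standard_normal((n_pts, d)) / np.sqrt(d)
    return X, np.tanh(X @ beta)

Xtr, ytr = sample(n)
Xte, yte = sample(n)

W = rng.standard_normal((d, m)) / np.sqrt(d)   # first-layer weights
a = rng.standard_normal(m) / np.sqrt(m)        # second-layer weights

def ridge_test_mse(W_mat, lam=1e-2):
    """Fit only the second layer on ReLU features; report test MSE."""
    Htr = np.maximum(Xtr @ W_mat, 0.0)
    Hte = np.maximum(Xte @ W_mat, 0.0)
    a_hat = np.linalg.solve(Htr.T @ Htr + lam * np.eye(m), Htr.T @ ytr)
    return float(np.mean((Hte @ a_hat - yte) ** 2))

# One gradient step on the first layer (second layer frozen), with a large
# step size; the sqrt(m) scaling is an assumption made for illustration.
H = np.maximum(Xtr @ W, 0.0)
resid = H @ a - ytr
grad_W = Xtr.T @ ((resid[:, None] * a[None, :]) * (Xtr @ W > 0)) / n
W_one_step = W - np.sqrt(m) * grad_W

print("random features      :", ridge_test_mse(W))
print("after one large step :", ridge_test_mse(W_one_step))
```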

Neural networks have become so large that their behavior can be well approximated by “infinite neural networks”, which are obtained by considering the limit as the number of neurons goes to infinity. However, there are many possible infinite limits one can take. For example, one well-known limit is the “neural tangent kernel” (NTK) limit, where the depth is fixed and the layer width goes to infinity.
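
For context, the NTK is the kernel induced by the network’s gradients with respect to its parameters; as a short sketch:

```latex
% Neural tangent kernel of a network f(x; \theta):
\Theta_{\mathrm{NTK}}(x, x') \;=\; \big\langle \nabla_{\theta} f(x; \theta),\, \nabla_{\theta} f(x'; \theta) \big\rangle.
% In the fixed-depth, infinite-width limit this kernel becomes deterministic
% at initialization and stays (nearly) constant during training, so the
% network behaves like a kernel method.
```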

Vector Faculty Affiliate and Assistant Professor at the University of Guelph Mihai Nica introduced an alternative infinite limit, the infinite depth-and-width limit, where both the depth and the width are scaled to infinity simultaneously. This leads to exotic non-Gaussian distributions that are very different from NTK-type behaviour but match the output of finite neural networks more accurately.
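
One way to summarize the distinction, as a sketch of the scalings (the precise statements live in the underlying papers):

```latex
% Sketch of the two scalings (width n, depth L):
%   NTK limit:              L fixed,  n \to \infty
%   depth-and-width limit:  n, L \to \infty  with  L/n \to \lambda > 0
% The ratio \lambda acts as the control parameter: \lambda \to 0 recovers
% Gaussian, NTK-type behaviour, while \lambda > 0 yields heavier-tailed,
% non-Gaussian output distributions.
```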

Over the last decade, a rich body of predictions has been made about the spectra of empirical Hessian and information matrices over the course of training (via stochastic gradient descent) in overparameterized networks. Aukosh Jagannath, an Assistant Professor at the University of Waterloo, presented recent work done in collaboration with Gérard Ben Arous, Reza Gheissari, and Jiaoyang Huang, in which they rigorously establish some of these predictions. The presentation focused on their results for a high-dimensional analog of the XOR problem, showing that the stochastic gradient descent (SGD) trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices. This alignment occurs per layer, with the final layer’s outlier eigenspace evolving over the course of training and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers.
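
As a sketch of what a high-dimensional analog of XOR looks like in this context (our paraphrase; the paper’s exact normalizations may differ):

```latex
% Four-component Gaussian mixture with XOR-style labels (orthogonal means
% \mu, \nu \in \mathbb{R}^d):
y = +1: \;\; x \sim \tfrac{1}{2}\mathcal{N}(\mu, I_d) + \tfrac{1}{2}\mathcal{N}(-\mu, I_d),
\qquad
y = -1: \;\; x \sim \tfrac{1}{2}\mathcal{N}(\nu, I_d) + \tfrac{1}{2}\mathcal{N}(-\nu, I_d).
% No single linear direction separates the classes, so SGD must learn the
% rank-two subspace spanned by \mu and \nu, which is where the low-rank
% outlier eigenspaces emerge.
```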

Towards Robust and Trustworthy Machine Learning Systems

Vector Faculty Member and Canada CIFAR AI Chair Sivan Sabato, who is also an Associate Professor at McMaster University, discussed the potential and the challenges of using explanations as input to a learning algorithm. The goal is for machine learning systems to learn much faster from experience when they also receive explanations from a knowledgeable teacher, just as people do.

For instance, systems that learn to automatically diagnose patients are usually trained on historical data about previous patients and their diagnoses. The process could be much more effective if the learning system could also ask physicians to explain some of the diagnoses. As another example, software and websites could personalize better to specific users if they allowed users to explain their preferences.

Incorporating explanations into the learning process requires understanding how they can be used and how they should be interpreted by the algorithm. Perhaps the most challenging aspect is having the learning system account for the fact that explanations may be useful but may also sometimes be mistaken. A robust learning system needs to use explanations cautiously, so that it can benefit from good explanations without being too sensitive to poor ones. Sabato’s talk proposed methods for doing just that.

Vector Faculty Affiliate Ruth Urner delivers her talk “Models of adversaries.”

Urner, who is also an Associate Professor at York University, focused her talk on how adversarial robustness requirements can be adequately modeled. She surveyed how different modeling assumptions can lead to drastically different conclusions. Urner argued that we should aim for minimal assumptions about how an adversary might act, and presented recent results on a variety of relaxations of the standard framework for learning with adversarial (or strategic) robustness.

On this note, statistical learning traditionally relies on training and test data being generated by the same process, but instances might respond (strategically or adversarially) to a published predictor, aiming for a specific outcome. Such manipulations of the data at test time can lead to unexpected failures of a learned model. A large body of both practical and theoretical research aims at mitigating the resulting safety risks by developing methods that are robust to adversarial perturbations.
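
For reference, the standard formalization here is the adversarially robust risk, in which each test point may be perturbed within an allowed set before the predictor is applied; the relaxations Urner discussed can be read against this baseline:

```latex
% Adversarially robust risk: each test point may be perturbed within an
% allowed set U(x) (e.g. an l_p-ball of radius r) before h is applied:
R_{\mathrm{rob}}(h) \;=\; \mathbb{E}_{(x, y)} \Big[ \sup_{x' \in U(x)} \ell\big( h(x'), y \big) \Big].
% The perturbation set U encodes the model of the adversary; different
% choices of U can lead to drastically different conclusions.
```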

Statistical Estimation under Differential Privacy Constraints

Vector Faculty Member and Canada CIFAR AI Chair Gautam Kamath focused on his group’s research on statistical estimation under privacy constraints. Statistical estimation addresses fundamental tasks like understanding patterns that underlie a dataset, or computing certain aggregate statistics like the mean. The pertinent question is how to do this without leaking sensitive information about individual data points in the dataset. Kamath, who is also an assistant professor at the David R. Cheriton School of Computer Science, University of Waterloo, talked about a host of new issues that arise in the private setting and how to address them, including trade-offs involving statistical bias, data with heavy tails, and priors on the dataset.
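
As a minimal illustration of the kind of trade-off involved (a sketch of the classic clip-and-noise recipe, not Kamath’s algorithms): clipping bounds each individual’s influence on the mean, which makes calibrated noise sufficient for privacy, but it biases the estimate when the data is heavy-tailed. The radius and parameters below are arbitrary choices for illustration.

```python
import numpy as np

def dp_mean(x, radius, epsilon, rng):
    """Epsilon-DP mean via clip-and-noise (Laplace mechanism).

    Clipping each point to [-radius, radius] bounds the sensitivity of the
    empirical mean by 2 * radius / n, so Laplace noise of that scale over
    epsilon suffices; clipping, however, biases the estimate when the data
    has heavy tails (one of the trade-offs discussed in the talk).
    """
    n = len(x)
    clipped = np.clip(x, -radius, radius)
    noise = rng.laplace(scale=(2.0 * radius / n) / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
data = rng.standard_t(df=3, size=10_000)   # heavy-tailed sample (illustrative)
print(dp_mean(data, radius=5.0, epsilon=1.0, rng=rng))
```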

Furthermore, Vector Faculty Affiliate Hassan Ashtiani, who is also an Assistant Professor at McMaster University, talked about private learning of Gaussian Mixture Models (GMMs). GMMs represent a rich class of distributions that has been used to model various scientific phenomena, going back to Karl Pearson’s work in the 1890s on the characteristics of shore crabs. In recent years, there has been significant interest in designing sample-optimal and computationally efficient algorithms for estimating GMMs.
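
For reference, a GMM with k components has the density below; the block also notes the two senses of “learning” that come up next:

```latex
% Density of a mixture of k Gaussians in R^d:
f(x) \;=\; \sum_{i=1}^{k} w_i \, \mathcal{N}\!\big( x;\, \mu_i, \Sigma_i \big),
\qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1.
% "Parameter estimation" asks to recover the weights, means, and covariances;
% "density estimation" asks only for some distribution close to f, e.g. in
% total variation distance.
```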

However, designing differentially private methods for learning GMMs has proved difficult. Interestingly, some of these difficulties reflect fundamental gaps in our understanding of private statistical estimation. In his talk, Ashtiani outlined some of these challenges, along with some generic approaches for addressing them. One shared theme was to use non-private estimators as a black box, “stabilize” their outputs, and then aggregate the outcomes in a differentially private manner. The results included the first computationally efficient reduction from private to non-private parameter estimation for GMMs, as well as the first learnability result for GMMs in the density estimation setting.
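
The general shape of such black-box reductions can be sketched as follows (a toy version of the subsample-and-aggregate pattern, not Ashtiani’s actual algorithm; the clipping radius stands in for the real stabilization step, and a plain mean estimator is a placeholder for a GMM parameter-estimation routine):

```python
import numpy as np

def private_blackbox_estimate(data, estimator, k, radius, epsilon, delta, rng):
    """Toy subsample-and-aggregate reduction.

    Split the data into k disjoint chunks, run a NON-private estimator on
    each, "stabilize" the outputs (here: clip to a ball of the given radius),
    and release their average via the Gaussian mechanism. Changing one data
    point affects one chunk only, so the L2 sensitivity of the average is at
    most 2 * radius / k.
    """
    chunks = np.array_split(data, k)
    est = np.stack([estimator(c) for c in chunks])             # (k, dim)
    norms = np.maximum(np.linalg.norm(est, axis=1, keepdims=True), 1e-12)
    est = est * np.minimum(1.0, radius / norms)                # stabilize
    sigma = (2.0 * radius / k) * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return est.mean(axis=0) + rng.normal(scale=sigma, size=est.shape[1])

# Hypothetical usage on synthetic two-dimensional data.
rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, size=(20_000, 2))
print(private_blackbox_estimate(data, lambda c: c.mean(axis=0),
                                k=100, radius=10.0, epsilon=1.0,
                                delta=1e-6, rng=rng))
```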

Additional highlights from the event included Vector Faculty Member Yaoliang Yu’s discussion on data poisoning, model compression and (Shapley-related) value estimation, and Vector Faculty Member and Co-Research Director Daniel Roy’s presentation on applications of infinitesimals to open problems in statistical decision theory.

The work of learning theorists extends beyond mere problem-solving; it serves as a catalyst for a deeper understanding of emerging phenomena. By exploring the fundamental principles that underlie machine learning, learning theorists provide insights that not only enhance the efficiency of existing systems but also pave the way for the development of cutting-edge approaches. In essence, their contributions form the bedrock of the ongoing evolution in machine learning and artificial intelligence, shaping the landscape of these dynamic fields.

Want to learn more about the Vector Institute’s current research initiatives in machine learning theory? Click here for the full playlist of talks.
