NATURAL LANGUAGE PROCESSING (NLP) SYMPOSIUM
September 15, 2020 @ 10:00 am - 1:45 pm
Held Virtually – event open to Vector researchers and industry sponsors only
The Vector Institute is hosting a Natural Language Processing (NLP) Symposium showcasing its NLP project with academic and industry collaborators, designed to facilitate interaction among our industry sponsors, researchers, students, and faculty members.
In June 2019, the Vector Institute launched a multi-phase industry-academic collaborative project focusing on recent advances in NLP. Participants replicated a state-of-the-art NLP model called BERT and used a transfer learning approach to fine-tune it for domain-specific tasks in areas such as health, law, and finance.
To follow up on the outcomes of the project, a two-day symposium will be held featuring presentations and hands-on workshops delivered by the project participants and Vector researchers.
The symposium will support knowledge transfer and provide an exclusive opportunity for Vector’s industry sponsors to engage with talent in the NLP domain.
Workshop Requirements:
Required skill set:
- Fundamentals of machine learning and deep learning
- Knowledge of language modelling and/or transformers
- Experience programming in Python and in any of the deep learning frameworks (TensorFlow, PyTorch)
- Experience using GPUs for accelerated deep learning training
- Experience using Jupyter notebooks and/or Google Colab

Participants must be individuals actively involved in NLP research and/or development.
Who should attend:
- Individuals who are interested in learning more about natural language processing
- Vector sponsors involved in the NLP project
- Technical experts from Vector Sponsor companies
- Vector Postgraduate Affiliates (PGA), alumni, and scholarship recipients interested in NLP
- Vector Researchers
September 15: AGENDA
Opening Remarks
10:00 am – 10:10 am

MC: Sedef Akinli Kocak
Project Manager, Industry Innovation, Vector Institute

Garth Gibson
President and CEO, Vector Institute
Keynote Presentation: Unlearn dataset bias for robust language understanding
10:10 am – 10:40 am
While we have made great progress in natural language understanding, transferring the success from benchmark datasets to real applications has not always been smooth. Notably, models sometimes make mistakes that are confusing and unexpected to humans. In this talk, I will discuss spurious associations in NLP benchmarks and present our recent work on correcting known biases during learning.

He He
Assistant Professor, Computer Science and Data Science, New York University
Keynote Presentation: Infinite Scaling of Language Modelling
10:40 am – 11:00 am
The first half of the presentation will provide a brief overview of current trends and tricks in large-scale learning systems in NLP. The second half will discuss the limitations of current approaches and some potential future directions.

Jimmy Ba
Assistant Professor, Department of Computer Science and Machine Learning Group, University of Toronto, Faculty Member, Vector Institute, Canada CIFAR Artificial Intelligence Chair
Keynote Presentation: Efficient DNN Training at Scale: from Algorithms to Hardware
11:00 am – 11:20 am
The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus of systems research is usually quite narrow and limited to (i) inference, i.e., how to efficiently execute already-trained models, and (ii) image classification networks as the primary benchmark for evaluation. In this talk, we will demonstrate a holistic approach to DNN training acceleration and scalability, starting from the algorithm and moving to software and hardware optimizations and to specialized development and optimization tools.
In the first part of the talk, I will present our radically new approach to efficiently scaling the backpropagation algorithm used in DNN training (the BPPSA algorithm, MLSys’20). I will then demonstrate several approaches to dealing with one of the major limiting factors in DNN training: limited GPU/accelerator memory capacity (Echo, ISCA’20, and Gist, ISCA’18). Finally, I will show the performance and visualization tools built in my group to understand, visualize, and optimize DNN models, and even predict their performance on different hardware.

Gennady Pekhimenko
Assistant Professor, Department of Computer Science, University of Toronto, Faculty Member, Vector Institute, Canada CIFAR Artificial Intelligence Chair
Project/Research Presentations
11:20 am – 12 noon
Project Presentation: Multi-node BERT-Pretraining: Cost-efficient Approach
Recently, large-scale Transformer-based language models such as BERT, GPT-2, and XLNet have brought about exciting leaps in state-of-the-art results for many Natural Language Processing (NLP) tasks. One common trend in these recent models is a significant increase in model complexity. As a result, training these models within a reasonable time requires advanced hardware setups, such as premium GPU-enabled NVIDIA DGX workstations or specialized accelerators like TPU Pods. Our work addresses this limitation and demonstrates that the BERT model can be pre-trained within 2 weeks on an academic-size cluster of widely available GPUs through careful algorithmic and software optimizations. We present optimizations that improve single-device training throughput, distribute the training workload over multiple nodes and GPUs, and overcome the communication bottleneck introduced by the large data exchanges over the network.
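As background on the kind of single-device throughput optimizations referenced above, below is a minimal PyTorch sketch combining automatic mixed precision with gradient accumulation, two common techniques for this purpose; the model, data loading, and hyperparameters are illustrative placeholders, not the project’s actual configuration.

# Hypothetical sketch: mixed-precision training with gradient accumulation,
# two common ways to raise single-device throughput and emulate large batches.
# Model, data, and hyperparameters are placeholders, not the project's setup.
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 2)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler(enabled=(device == "cuda"))

accum_steps = 8  # gradients from 8 micro-batches form one effective batch

def train_epoch(loader):
    model.train()
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        with autocast(enabled=(device == "cuda")):  # FP16/FP32 mixed precision
            loss = loss_fn(model(x), y) / accum_steps
        scaler.scale(loss).backward()               # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)                  # unscale gradients, then update
            scaler.update()
            optimizer.zero_grad()

The same loop structure carries over to multi-GPU and multi-node setups, which are the focus of Workshop 2 below.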

Jiahuang (Jacob) Lin
Master’s Student, Computer Science, University of Toronto
Project Presentation: Language Modelling in Finance Domain
Domain-specific pre-training has recently received considerable attention because of powerful general-purpose language models like BERT, XLNet, and OpenAI GPT. Although these models generally transfer well to domains with little or no labeled data, their performance in specialized domains still leaves room for improvement; the finance domain is one such case. In this work, we propose FinanceBERT-CLS and FinanceBERT-SUM, where we perform domain-specific pre-training of BERT using examples portraying financial aspects. We use LDA to extract stories covering finance topics and create a finance version of the CNN/DailyMail dataset, and we also scrape large amounts of financial text from the web. Our extensive experiments show improvements over two financial sentiment analysis datasets: FiQA and Financial Phrase Bank. We also achieve strong 5-fold cross-validation performance on an extractive summarization task over the subset of the CNN/DailyMail dataset. We find that domain-specific pre-training is effective even with a smaller training set and fine-tuning.
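As a rough illustration of the LDA-based filtering step mentioned in the abstract above, the sketch below uses the gensim library to fit a topic model and keep documents whose dominant topic contains finance-related terms; the toy corpus, topic count, and keyword list are assumptions for illustration only, not the presenter’s actual pipeline.

# Hypothetical sketch of LDA-based filtering: fit a topic model on a general
# news corpus and keep documents whose dominant topic looks finance-related.
# The corpus, topic count, and finance keyword list are assumptions.
from gensim import corpora
from gensim.models import LdaModel
from gensim.utils import simple_preprocess

docs = ["The central bank raised interest rates amid inflation concerns.",
        "The team won the championship after a dramatic overtime goal."]
tokens = [simple_preprocess(d) for d in docs]

dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]
lda = LdaModel(corpus, num_topics=10, id2word=dictionary, passes=5, random_state=0)

FINANCE_TERMS = {"bank", "interest", "rates", "inflation", "stock", "earnings"}

def looks_financial(bow, top_words=10):
    # Find the document's dominant topic and check its top words for finance terms.
    topic_id, _ = max(lda.get_document_topics(bow), key=lambda p: p[1])
    words = {w for w, _ in lda.show_topic(topic_id, topn=top_words)}
    return len(words & FINANCE_TERMS) >= 2

finance_docs = [d for d, bow in zip(docs, corpus) if looks_financial(bow)]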

Stella Wu
Applied Machine Learning Researcher, BMO AI
Student Presentation: Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French express the same underlying meaning, but through two separate channels corresponding to their languages. In this talk, I will present our work on the Multichannel Generative Language Model (MGLM). MGLM is a generative joint distribution model over channels, marginalizing over all possible factorizations within and across all channels. We demonstrate experiments with unconditional, conditional, and partially conditional generation for the Multi30K dataset containing English, French, Czech, and German as the channels. We provide qualitative samples sampled unconditionally from the generative joint distribution, and quantitatively analyze the quality-diversity trade-offs, finding that MGLM outperforms traditional bilingual discriminative models.

Harris Chan
PhD student in the Machine Learning Group, University of Toronto, Vector Institute
Student Presentation: Improving Transformer Optimization Through Better Initialization
As Transformer models are becoming larger and more expensive to train, recent research has focused on understanding and improving optimization in these architectures. In this work, our contributions are two-fold: we first investigate and empirically validate the source of optimization problems in the encoder-decoder Transformer architecture; we then propose a new weight initialization scheme, with theoretical justification, that enables training without warmup or layer normalization. Empirical results on public machine translation benchmarks show that our approach achieves leading accuracy and allows training deep Transformer models with 200 layers in both the encoder and decoder (over 1,000 attention/MLP blocks) without difficulty.

Xiao Shi (Gary) Huang
Machine Learning Research Scientist, Layer 6
Networking and Poster Sessions
12 noon – 12:30 pm
Poster #1: Application of NLP in Emergency Medical Services – Amrit Sehdev, Queen’s University
Poster #2: Modelling Sentence Pairs via Reinforcement Learning: An Actor-Critic Approach to Learn the Irrelevant Words – Mahtab Ahmed, University of Western Ontario
Poster #3: SentenceMIM: A Latent Variable Language Model – Micha Livne, University of Toronto
Poster #4: Training without training data: Improving the generalizability of automated medical abbreviation disambiguation – Marta Skreta, University of Toronto
Poster #5: Explainability for deep learning text classifiers – Diana Lucaci, University of Ottawa
Poster #6: Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning – Aryan Arbabi, University of Toronto
Poster #7: Sharing is Caring: Exploring machine learning methods to facilitate medical imaging exchange using metadata only – Joanna Pineda, University of Toronto
Poster #8: GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference – Ali Hadi Zadeh, University of Toronto
Poster #9: Improved knowledge distillation by utilizing backward pass knowledge in neural networks – Aref Jafari, University of Waterloo
Poster #10: How Nouns Surface as Verbs: Inference and Generation in Word Class Conversion – Lei Yu, University of Toronto
Poster #11: Informal Natural Language Processing: The Case of Slang – Zhewei Sun, University of Toronto
Poster #12: Applications of the Chinese Remainder Theorem in Word Embedding Compression and Arithmetic – Patricia Thaine, University of Toronto
Poster #13: Predicting change in Major Depressive Disorder symptoms based on topic modelling features from psychiatric notes: An exploratory analysis – Marta Maslej, Centre for Addiction and Mental Health
Poster #14: Non-Pharmaceutical Intervention Discovery with Topic Modeling – Jonathan Smith, Layer 6
Poster #15: Hurtful words: quantifying biases in clinical contextual word embeddings – Haoran Zhang, University of Toronto
Poster #16: Domain Specific Fine-tuning of Denoising Sequence-to-Sequence Models for Natural Language Summarization – Matt Kalabic, PwC/Deloitte
Poster #17: Sentiment Classification and Extractive Summarization on Financial Text Using BERT – Stella Wu, BMO AI
Poster #18: Multi-node BERT-Pretraining: Cost-efficient Approach – Jiahuang (Jacob) Lin, Master’s Student, Computer Science, University of Toronto
Poster #19: Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels – Harris Chan, PhD Student, Machine Learning Group, University of Toronto, Vector Institute
Poster #20: Improving Transformer Optimization Through Better Initialization – Xiao Shi (Gary) Huang, Machine Learning Research Scientist, Layer 6
Networking and Break
12:30 pm – 12:45 pm
Concurrent Workshops
12:45 pm – 1:45 pm
WS1: Performing down-stream NLP tasks with transformers
Training NLP models from scratch requires large amounts of computational resources that may not be financially feasible for most organizations. By leveraging pre-trained models and transfer learning, we can fine-tune NLP models for a specific task with a fraction of the time and resources. In this workshop, we will explore how to use HuggingFace to fine-tune Transformer models for specific downstream tasks; a minimal example of this style of fine-tuning is sketched below. The purpose of this workshop is to provide learning through demonstration and hands-on experience.
Level of workshop: Beginner/Intermediate
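As a preview of the workshop topic, here is a minimal, hedged sketch of fine-tuning a pre-trained Transformer for binary text classification with the Hugging Face Transformers and Datasets libraries; the model checkpoint, dataset, and hyperparameters are placeholder choices, not necessarily those used in the workshop.

# Hedged sketch: fine-tune a pre-trained Transformer for sentiment classification.
# Checkpoint, dataset, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")  # assumed example dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16, learning_rate=2e-5)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"].shuffle(seed=0).select(range(2000)),
                  eval_dataset=encoded["test"].select(range(500)))
trainer.train()
print(trainer.evaluate())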

Nidhi Arora
Data Scientist II, Intact Financial Corp.

Faiza Khan Khattak
Data Scientist, Manulife

Max Tian
Machine Learning Engineer, Adeptmind
WS2: Distributed multi-node pre-training with unsupervised machine translation application
To significantly reduce training time on large datasets, we will demonstrate multi-node distributed training, which allows us to efficiently parallelize the training updates of deep neural networks across multiple nodes; a minimal sketch of this setup appears below.
Level of workshop: Intermediate/Advanced
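As a preview of the workshop topic, here is a minimal sketch of multi-node data-parallel training with PyTorch DistributedDataParallel, launched with torchrun; the toy model, dataset, and launch parameters are illustrative assumptions, not the workshop’s actual code.

# Hedged sketch of multi-node data-parallel training with PyTorch DDP.
# Launch one copy per node with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=4 --rdzv_backend=c10d \
#            --rdzv_endpoint=<host>:29500 train_ddp.py
# The model and dataset below are toy placeholders, not the workshop code.
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(128, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])    # gradients all-reduced across ranks

    data = TensorDataset(torch.randn(4096, 128), torch.randint(0, 2, (4096,)))
    sampler = DistributedSampler(data)             # each rank sees a distinct shard
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                   # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()        # backward triggers gradient sync
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Each process owns one GPU, and the gradient all-reduce performed during the backward pass is what keeps the model replicas consistent across nodes.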

Jacob Lin
Vector Institute, University of Toronto

Gennady Pekhimenko
Assistant Professor, Department of Computer Science, University of Toronto, Faculty Member, Vector Institute, Canada CIFAR Artificial Intelligence Chair

Filippo Pompili
Senior Research Scientist – NLP, Thomson Reuters

Xin Li
Member of the AI Technical Staff, Vector Institute
Event to conclude
1:45 pm