NATURAL LANGUAGE PROCESSING (NLP) SYMPOSIUM

September 15, 2020 @ 10:00 am - 1:45 pm

 

Held Virtually – event open to Vector researchers and industry sponsors only

 


 

The Vector Institute is hosting a Natural Language Processing (NLP) Symposium to showcase its NLP project with academic-industry collaborators and to facilitate interaction between our industry sponsors, researchers, students, and faculty members.

In June 2019, the Vector Institute launched a multi-phase industry-academic collaborative project focused on recent advances in NLP. Participants replicated a state-of-the-art NLP model called BERT and used transfer learning to fine-tune it for domain-specific tasks in areas such as health, law, and finance.

To follow up on the outcomes of the project, a two-day symposium will be held featuring presentations and hands-on workshops delivered by the project participants and Vector researchers.

The symposium will support knowledge transfer and provide an exclusive opportunity for Vector’s industry sponsors to engage with talent in the NLP domain.

Workshop Requirements:

Required skill set:
  • Fundamentals of machine learning and deep learning
  • Knowledge of language modelling and/or transformers
  • Experience programming in Python and at least one deep learning framework (TensorFlow, PyTorch)
  • Experience using GPUs for accelerated deep learning training
  • Experience using Jupyter notebooks and/or Google Colab

Participants must be actively involved in NLP research and/or development.

Who should attend: 
  • Individuals who are interested in learning more about natural language processing
  • Vector sponsors involved in the NLP project
  • Technical experts from Vector Sponsor companies
  • Vector PGAs, alumni, and scholarship recipient students interested in NLP
  • Vector Researchers

 

September 15: AGENDA

Opening Remarks

10:00 am – 10:10 am

 

MC: Sedef Akinli Kocak

Project Manager, Industry Innovation, Vector Institute

Garth Gibson

President and CEO, Vector Institute

 

Keynote Presentation: Unlearn dataset bias for robust language understanding

10:10 am – 10:40 am

While we have made great progress in natural language understanding, transferring the success from benchmark datasets to real applications has not always been smooth. Notably, models sometimes make mistakes that are confusing and unexpected to humans. In this talk, I will discuss spurious associations in NLP benchmarks and present our recent work on correcting known biases during learning.

 

He He

Assistant Professor, Computer Science and Data Science, New York University

 

Keynote Presentation: Infinite Scaling of Language Modelling

10:40 am – 11:00 am

This talk will provide a brief overview of current trends and tricks in large-scale learning systems for NLP. The second half of the presentation will discuss the limitations of current approaches and some potential future directions.

 

Jimmy Ba

Assistant Professor, Department of Computer Science and Machine Learning Group, University of Toronto; Faculty Member, Vector Institute; Canada CIFAR Artificial Intelligence Chair

 

Keynote Presentation: Efficient DNN Training at Scale: from Algorithms to Hardware

11:00 am – 11:20 am

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus of systems research is usually quite narrow and limited to (i) inference — i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. In this talk, we will demonstrate a holistic approach to DNN training acceleration and scalability starting from the algorithm, to software and hardware optimizations, to special development and optimization tools.

In the first part of the talk, I will show our radically new approach to efficiently scaling the backpropagation algorithm used in DNN training (the BPPSA algorithm, MLSys’20). I will then demonstrate several approaches to dealing with one of the major limiting factors in DNN training: limited GPU/accelerator memory capacity (Echo, ISCA’20 and Gist, ISCA’18). Finally, I will show the performance and visualization tools we built in my group to understand, visualize, and optimize DNN models, and even predict their performance on different hardware.

 

Gennady Pekhimenko

Assistant Professor, Department of Computer Science, University of Toronto; Faculty Member, Vector Institute; Canada CIFAR Artificial Intelligence Chair

 

Project/Research Presentations

11:20 am – 12 noon

 

Project Presentation: Multi-node BERT-Pretraining: Cost-efficient Approach

Recently, large-scale Transformer-based language models such as BERT, GPT-2, and XLNet have brought about exciting leaps in state-of-the-art results for many Natural Language Processing (NLP) tasks. A common trend in these recent models is a significant increase in model complexity. As a result, training these models within reasonable time requires advanced hardware setups, such as premium GPU-enabled NVIDIA DGX workstations or specialized accelerators like TPU Pods. Our work addresses this limitation and demonstrates that the BERT pre-trained model can be trained within two weeks on an academic-scale cluster of widely available GPUs through careful algorithmic and software optimizations. We present optimizations that improve single-device training throughput, distribute the training workload over multiple nodes and GPUs, and overcome the communication bottleneck introduced by large data exchanges over the network.
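
For readers unfamiliar with the single-device techniques alluded to above, the sketch below (in Python/PyTorch) illustrates two standard throughput optimizations of this kind, mixed-precision training and gradient accumulation; the tiny model, synthetic batches, and hyperparameters are placeholders for illustration only, not the project's actual code.

    import torch

    # Placeholder stand-ins for the BERT encoder and masked-LM batches; the real
    # project trains a much larger model on tokenized text.
    model = torch.nn.Linear(768, 30522).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    scaler = torch.cuda.amp.GradScaler()     # keeps FP16 gradients numerically stable
    accum_steps = 4                          # emulate a 4x larger batch on one GPU

    for step in range(100):
        x = torch.randn(8, 768, device="cuda")
        y = torch.randint(0, 30522, (8,), device="cuda")
        with torch.cuda.amp.autocast():      # run forward/backward mostly in FP16
            loss = loss_fn(model(x), y) / accum_steps
        scaler.scale(loss).backward()        # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)           # unscale gradients, then update weights
            scaler.update()
            optimizer.zero_grad()

Gradient accumulation trades extra forward/backward passes for a larger effective batch, while autocast keeps most arithmetic in FP16 to raise throughput on modern GPUs.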

 

Jiahuang (Jacob) Lin

Master’s Student, Computer Science, University of Toronto

 

Project Presentation: Language Modelling in Finance Domain

Domain-specific pre-training has recently received a good amount of attention because of powerful general-purpose language models such as BERT, XLNet, and OpenAI GPT. Even though these models generally transfer well to domains with little or no labeled data, their performance still falls short in specialized domains; one such domain is finance. In this paper, we propose FinanceBERT-CLS and FinanceBERT-SUM, in which we perform domain-specific pre-training of BERT using examples with financial content. We use LDA to extract stories covering finance topics and create a finance version of the CNN/DailyMail dataset, and we also scrape large amounts of financial text from the web. Our extensive experiments show improvements on two financial sentiment analysis datasets, FiQA and Financial PhraseBank. We also achieve strong 5-fold cross-validation performance on an extractive summarization task over a subset of the CNN/DailyMail dataset. We find that domain-specific pre-training is highly effective even with a smaller training set and fine-tuning.
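
As a rough illustration of what domain-specific pre-training involves in practice, the following Python sketch continues BERT's masked-language-modelling objective on a domain corpus using the Hugging Face transformers and datasets libraries; the two-sentence corpus, checkpoint name, and training settings are placeholder assumptions, not the authors' setup.

    from datasets import Dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # Tiny placeholder corpus; the project used scraped financial text and an
    # LDA-filtered slice of CNN/DailyMail instead.
    corpus = Dataset.from_dict({"text": [
        "Quarterly revenue rose 8% year over year.",
        "The central bank held interest rates steady.",
    ]})

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    tokenized = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    # Randomly masks 15% of tokens in each batch, matching BERT's MLM objective.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finance-bert-demo",
                               num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()   # continued MLM pre-training on the domain corpus

After this continued pre-training step, the resulting checkpoint can be fine-tuned on labelled financial data exactly like the stock BERT model.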

 

Stella Wu

Applied Machine Learning Researcher, BMO AI

 

Student Presentation: Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels

A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French expresses the same underlying meaning, but through two separate channels corresponding to their languages. In this talk, I will present our work on the Multichannel Generative Language Model (MGLM). MGLM is a generative joint distribution model over channels, marginalizing over all possible factorizations within and across all channels. We demonstrate experiments with unconditional, conditional, and partially conditional generation on the Multi30K dataset, which contains English, French, Czech, and German as channels. We provide qualitative samples drawn unconditionally from the generative joint distribution and quantitatively analyze the quality-diversity trade-offs, finding that MGLM outperforms traditional bilingual discriminative models.

 

Harris Chan

PhD student in the Machine Learning Group, University of Toronto, Vector Institute

 

Student Presentation: Improving Transformer Optimization Through Better Initialization

As Transformer models become larger and more expensive to train, recent research has focused on understanding and improving optimization in these architectures. In this work, our contributions are two-fold: we first investigate and empirically validate the source of optimization problems in the encoder-decoder Transformer architecture; we then propose a new weight initialization scheme, with theoretical justification, that enables training without warmup or layer normalization. Empirical results on public machine translation benchmarks show that our approach achieves leading accuracy, allowing us to train deep Transformer models with 200 layers in both the encoder and decoder (over 1000 attention/MLP blocks) without difficulty.

 

Xiao Shi (Gary) Huang

Machine Learning Research Scientist, Layer 6

 

Networking and Poster Sessions

12 noon – 12:30 pm

 

Poster #1: Application of NLP in Emergency Medical Services

Amrit Sehdev

Queen’s University

Poster #2: Modelling Sentence Pairs via Reinforcement Learning: An Actor-Critic Approach to Learn the Irrelevant Words

Mahtab Ahmed

University of Western Ontario

Poster #3: SentenceMIM: A Latent Variable Language Model

Micha Livne

University of Toronto

Poster #4: Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

Marta Skreta

University of Toronto

Poster #5: Explainability for deep learning text classifiers

Diana Lucaci

University of Ottawa

Poster #6: Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning

Aryan Arbabi

University of Toronto

Poster #7: Sharing is Caring: Exploring machine learning methods to facilitate medical imaging exchange using metadata only

Joanna Pineda

University of Toronto

Poster #8: GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference

Ali Hadi Zadeh

University of Toronto

Poster #9: Improved knowledge distillation by utilizing backward pass knowledge in neural networks

Aref Jafari

University of Waterloo

Poster #10: How Nouns Surface as Verbs: Inference and Generation in Word Class Conversion

Lei Yu

University of Toronto

Poster #11: Informal Natural Language Processing: The Case of Slang

Zhewei Sun

University of Toronto

Poster #12: Applications of the Chinese Remainder Theorem in Word Embedding Compression and Arithmetic

Patricia Thaine

University of Toronto

Poster #13: Predicting change in Major Depressive Disorder symptoms based on topic modelling features from psychiatric notes: An exploratory analysis

Marta Maslej

Centre for Addiction and Mental Health

Poster #14: Non-Pharmaceutical Intervention Discovery with Topic Modeling

Jonathan Smith

Layer 6

Poster #15: Hurtful words: quantifying biases in clinical contextual word embeddings

Haoran Zhang

University of Toronto

Poster #16: Domain Specific Fine-tuning of Denoising Sequence-to-Sequence Models for Natural Language Summarization

Matt Kalabic

PwC/Deloitte

Poster #17: Sentiment Classification and Extractive Summarization on Financial Text Using BERT

Stella Wu

BMO AI

Poster #18: Multi-node BERT-Pretraining: Cost-efficient Approach

Jiahuang (Jacob) Lin

Master’s Student, Computer Science, University of Toronto

Poster #19: Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels

Harris Chan

PhD student in the Machine Learning Group, University of Toronto, Vector Institute

Poster #20: Improving Transformer Optimization Through Better Initialization

Xiao Shi (Gary) Huang

Machine Learning Research Scientist, Layer 6

 

Networking and Break

12:30 pm – 12:45 pm

 

Concurrent Workshops

12:45 pm – 1:45 pm

 

WS1: Performing down-stream NLP tasks with transformers

Training NLP models from scratch requires large amounts of computational resources that may not be financially feasible for most organizations. By leveraging pre-trained models and transfer learning, we can fine-tune NLP models for a specific task with a fraction of the time and resources. In this workshop, we will explore how to use HuggingFace to fine-tune Transformer models for specific downstream tasks. The purpose of this workshop is to provide learning through demonstration and hands-on experience.
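
To give a flavour of what the hands-on portion looks like, here is a minimal Python sketch of fine-tuning a pre-trained Transformer for a downstream classification task with the Hugging Face transformers and datasets libraries; the checkpoint, toy dataset, and hyperparameters are illustrative assumptions rather than the workshop's actual materials.

    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Toy labelled examples standing in for a real downstream dataset.
    data = Dataset.from_dict({"text": ["great product, would buy again",
                                       "terrible service and slow delivery"],
                              "label": [1, 0]})

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    encoded = data.map(
        lambda batch: tokenizer(batch["text"], truncation=True,
                                padding="max_length", max_length=64),
        batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="sentiment-demo",
                               num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=encoded,
    )
    trainer.train()   # updates the pre-trained encoder plus a new classification head

The same pattern extends to other downstream tasks (question answering, token classification, summarization) by swapping the model class and dataset.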

Level of workshop: Beginner/Intermediate

Nidhi Arora

Data Scientist II, Intact Financial Corp.

Faiza Khan Khattak

Data Scientist, Manulife

Max Tian

Machine Learning Engineer, Adeptmind

 

WS2: Distributed multi-node pre-training with unsupervised machine translation application

In order to significantly reduce training time on large datasets, we will demonstrate multi-node distributed training, which allows us to efficiently parallelize the training updates of deep neural networks across multiple nodes.
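
For orientation, the sketch below shows one common way to set up distributed data-parallel training in Python with PyTorch's DistributedDataParallel, launched with torchrun on each node; the toy model, synthetic data, and hyperparameters are placeholders and are not the workshop's actual code.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Toy model and synthetic data; a real run would use a Transformer and
        # a tokenized corpus.
        model = DDP(torch.nn.Linear(128, 2).cuda(local_rank),
                    device_ids=[local_rank])
        dataset = TensorDataset(torch.randn(1024, 128),
                                torch.randint(0, 2, (1024,)))
        sampler = DistributedSampler(dataset)          # each rank gets its own shard
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)

        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        loss_fn = torch.nn.CrossEntropyLoss()

        for epoch in range(2):
            sampler.set_epoch(epoch)                   # reshuffle shards each epoch
            for x, y in loader:
                x, y = x.cuda(local_rank), y.cuda(local_rank)
                loss = loss_fn(model(x), y)
                optimizer.zero_grad()
                loss.backward()                        # DDP all-reduces gradients here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched with, for example, torchrun --nnodes=2 --nproc_per_node=4 train.py, each process trains on its own shard of the data while DDP keeps the model replicas synchronized.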

Level of workshop: Intermediate/Advanced

Jacob Lin

Vector Institute, University of Toronto

Gennady Pekhimenko

Assistant Professor, Department of Computer Science, University of Toronto; Faculty Member, Vector Institute; Canada CIFAR Artificial Intelligence Chair

Filippo Pompili

Senior Research Scientist – NLP, Thomson Reuters

Xin Li

Member of the AI Technical Staff, Vector Institute

 

Event to conclude

1:45 pm

 
