Endless Summer School: Health Roundup: COVID-19 Updates and Health Talent Spotlight
March 24, 2021 @ 10:00 am - 12:00 pm
- This event has passed.
Available to Vector Institute sponsors only.
This session showcases Vector researchers who have been working on health- and COVID-19-related AI initiatives. It also features our second PhD spotlight, including a series of health-related lightning talks by Vector PhD and postdoc researchers who will soon be entering the workforce and seeking opportunities in industry.
3D Variability Analysis: Resolving atomic resolution protein dynamics from cryo-EM
Abstract: Cryo-EM is a method for imaging frozen biomolecules at near-atomic resolution. 3D Variability Analysis (3DVA) is a new technique for inferring and visualizing the space of conformational variation of macromolecules at high resolution. One of the earliest uses of the algorithm was to reveal the open and closed states of the SARS-CoV-2 spike protein. In this talk I’ll briefly outline the nature of the algorithm and show some of our results on several proteins.
Bio: David is a Professor of Computer Science in the Department of Computer and Mathematical Sciences at the University of Toronto Scarborough. He joined the University of Toronto in October 2003. Prior to that, he worked in the Digital Video Analysis and Perceptual Document Analysis Groups at the Palo Alto Research Center (PARC) for 4 years. Before that, he was on faculty at Queen’s University in Kingston.
David’s research interests include aspects of computer vision, visual neuroscience and machine learning. His research awards include the 2010 Koenderink Prize, the Sloan Research Fellowship, and best paper awards at ICCV, CVPR, UIST and BMVC.
How to Generate and Validate Question Answer Pairs for A Rapidly Evolving COVID-19 Literature
Abstract: It is no surprise to anyone now that the global COVID-19 pandemic has changed our lives in ways we could never have imagined. In the wake of the pandemic, an unprecedented volume of biomedical articles was being published every day, which made it virtually impossible for the medical community to promptly extract information useful for developing a potential vaccine or cure. As researchers in Natural Language Processing (NLP), we can leverage our expertise to automatically generate question-answer (QA) pairs from the coronavirus-related literature to assist the medical community in this endeavor. High-quality QA pairs would allow us to build models that answer scientific queries whose answers are not readily available, in support of the ongoing fight against the pandemic. QA pair generation is, however, a very tedious and time-consuming task requiring domain expertise for annotation and evaluation. In this talk we address some of the challenges of building a QA system without gold data and present a method to create QA pairs from a large semi-structured dataset using transformer and rule-based models. Additionally, we propose a means of engaging subject matter experts (SMEs) to annotate the QA pairs through a web application. Finally, we present experiments showing that active learning can yield a high-performing model with substantially lower annotation effort from the domain experts.
This work, previously presented at the SDP workshop of the EMNLP 2020 conference, resulted from a collaboration between the Center for A.I. & Cognitive Computing (C3) team at Thomson Reuters and the Vector Institute.
Bio: Dawn is a research scientist in Natural Language Processing (NLP) at Thomson Reuters. She obtained her MSc and PhD in computer science from the University of Tennessee with a focus on machine learning and more specifically graphical models. In her current role, Dawn specializes in developing novel and innovative solutions for practical NLP problems in the legal domain as well as contributing to the state-of-the-art approaches in NLP.
Seung Eun Yi and Jonathan Smith
Characterizing Canadian Non-pharmaceutical Interventions in response to COVID-19
Abstract: Non-pharmaceutical interventions (NPIs) in Canada have helped mitigate the spread of COVID-19 since its outbreak in early 2020. However, these policies have evolved variably at the federal, provincial, and municipal levels. We will present how we built a comprehensive dataset to centralize this information. We will also show how this work was extended to analyze and monitor Ontario’s response to COVID-19, as well as its integration within a worldwide database developed by Oxford.
This talk will include work previously presented at the ML4GH workshop of the ICML 2020 conference, and work published in CMAJ Open. These resulted from open collaboration with the Vector Institute, Trillium Health Partners, Dalla Lana School of Public Health (UofT), Blavatnik School of Government (Oxford) and many others.
Seung Eun Yi Bio: I am a Machine Learning Scientist at Layer 6 AI. I have a Master’s degree in General Engineering (Ecole CentraleSupelec, France) and in Applied Computing (University of Toronto, Canada). I am primarily interested in causal inference and in solving healthcare challenges using machine learning.
Jonathan Smith Bio: Jonathan is a Machine Learning Scientist at Layer 6. He obtained his Bachelor of Business Administration from Wilfrid Laurier University and Bachelor of Computer Science from the University of Waterloo. Previously he ran a film production company. His research interests are in explainable machine learning and applications in healthcare.
The PhD Talent Spotlight showcases the research of Vector PhD students and postdoctoral fellows who will be entering the workforce over the next 12 months. These research presentations are an opportunity to hear some of the latest AI research and are designed to forge connections with leading up-and-coming talent, for potential future hiring or collaboration.
After the event, we will circulate a form for sponsors to express interest in, and schedule informal interviews with, presenters from this iteration of the Talent Spotlight.
Preventing Negative Feedback Loops in Machine Learning Applied to Healthcare
Abstract: Machine learning models applied to healthcare hold great promise when it comes to improving clinical outcomes, reducing costs, and increasing patient throughput. However, patient populations change, as does the definition of disease, so a model must be updated regularly to account for these changes and maintain optimal performance. The cycle of making predictions and then updating a model on data influenced by those predictions can lead to a negative feedback loop in which performance biases in the model are reinforced over time, leading to degradation. Updating a model in this setting can be viewed as a noisy label learning problem of the most difficult kind, where the noise is asymmetric and instance-dependent. We demonstrate the reality of such feedback loops via simulations on a real-world ICU dataset, and show the factors that contribute most to the rate of model degradation. We then provide ways of mitigating such feedback loops by leveraging approaches for deep learning with noisy labels, and detail the assumptions necessary for these techniques to work.
Bio: Alex is a 2nd year PhD student interested in deep learning model robustness. He has worked in domains including adversarial examples, neural architecture search, and ensembles. Alex did his undergraduate studies in Computer Science at the University of Toronto, and a Master’s, also at U of T, focused on machine learning techniques for integrating networks in the context of drug repurposing.
Multisource domain adaptation for robust classification in ultrasound
Abstract: Ultrasound imaging offers the possibility of non-invasive, mobile medical screening for myriad disorders, along with image-guided care. However, ultrasonography is a relatively rare, specialized skill with high variability of expertise among users. Ultrasound devices produce images of differing quality, and images from the same device may still vary dramatically due to operator choices and patient characteristics. Therefore, building tools that are robust to these variations, particularly when labeled data is difficult to acquire, is paramount to improving clinical utilization of this otherwise accessible imaging technology. Here we show how multi-source domain adaptation can align very different, and limited, source data, enabling real-world computation on highly variable input data.
Bio: Lauren Erdman is a third year PhD student in Computer Science at the University of Toronto, Vector Institute, and SickKids Hospital, under the supervision of Anna Goldenberg. Her research focuses on developing machine learning methods for robust and generalizable clinical decision support. Lauren also manages the Center for Computational Medicine Machine Learning Core at SickKids Hospital where she facilitates the use of ML in clinical applications across a diverse set of clinical specialties and research areas in the Toronto Discovery District.
CiberATAC accurately deconvolves chromatin accessibility data by learning from the transcriptome
Abstract: Gene expression drives phenotype, cis-regulatory element activity drives gene expression, and transcription factor binding drives the activity of cis-regulatory elements. Widely used single cell transcriptomic assays identify dysregulated genes, but fail to identify the cis-regulatory elements regulating those genes. Our method, Cis-regulatory element Identification By Expressed RNA and ATAC-seq, CiberATAC, identifies active cell-type–specific cis-regulatory elements.
CiberATAC introduces a novel feature representation approach to model epigenomic and transcriptomic data for applying state-of-the-art deep learning and contrastive learning algorithms. It uses a siamese residual convolutional neural network to model bulk chromatin accessibility and cell-type–specific transcription within 20 kbp (±10 kbp) of each cis-regulatory element to predict its cell-type–specific activity. CiberATAC also uses cell identity as encoded by a customized variational auto-encoder as one of its inputs. CiberATAC accurately identifies active cis-regulatory elements when a heterogeneous chromatin accessibility signal and a homogeneous transcription signal are available, making it suitable for integrating single-cell ATAC-seq (or bulk ATAC-seq) data with single-cell RNA-seq datasets. CiberATAC achieved an R² of 0.8 on chromosome-wide deconvolution when the biological baseline R² was 0.1. CiberATAC was able to distinguish the active enhancers in each of the 7 closely related primary blood mononuclear cells with an R² of 0.3 when the biological baseline R² was -0.1.
Bio: Mehran is a Post-doctoral Fellow co-supervised by Dr. Hani Goodarzi at the University of California, San Francisco and Dr. Bo Wang at the Vector Institute. Mehran received his Ph.D. from the Department of Medical Biophysics at the University of Toronto under the supervision of Dr. Michael Hoffman. Currently, Mehran is interested in applying deep learning models to overcome the limitations of genomic assays.
Detecting phenotypic consequences of perturbation screens using automated image analysis
Abstract: Understanding the genotype-to-phenotype relationship requires detailed observation and detection of phenotypes at the single-cell level. Advances in high-throughput microscopy and gene editing at the whole-exome scale have generated massive image datasets for quantifying cell morphology and identifying biologically important cell phenotypes. Phenotypic analyses of these datasets require automated computational solutions to accurately detect and quantify the subtle phenotypic variations in perturbation screens. However, the mutant phenotypic space is often unknown, which creates the need for robust automated phenotype detection. In my thesis, I explored methods for performing image analysis and outlier detection to automatically detect cells with abnormal morphologies. This detection allowed me to accurately quantify the percentage of cells with abnormal phenotypes for a perturbed gene population. I then combined outlier detection, unsupervised clustering and neural network-based phenotype classification to quantify the phenotypic variability within a population of cells and identified functionally distinct abnormal phenotypes associated with many genes. While the central players of many important biological processes have been discovered, there remain numerous gaps in our understanding of the regulation of cellular morphology. By developing computational pipelines to systematically and quantitatively assess abnormal phenotypes at the genome-wide level, I aimed to address these knowledge gaps and implemented methods that can be applied to any kind of perturbation screen on any image dataset. Recently, image analysis and machine learning approaches have proven to be useful in medical settings to support physicians with their diagnoses and choice of treatments.
It will be an honour for me to serve healthcare by improving on my previous implementations for applications like real-time detection of anomalies and patient categorization for tailored therapies.
Bio: I am a senior PhD student in the Department of Molecular Genetics at the University of Toronto under the supervision of Drs. Brenda Andrews and Quaid Morris. My thesis focuses on applied machine learning and image analysis, spanning both supervised and unsupervised learning approaches to detect the phenotypic consequences of genetic perturbation screens. After defending my thesis in summer 2021, I would like to seek opportunities to help advance healthcare as a data scientist.