How Thomson Reuters uses NLP to enable knowledge workers to make faster and more accurate business decisions

April 2, 2020

Hidden Injustice, a recent investigative report by Reuters, revealed how federal civil court rulings obscured the role that pharmaceutical companies played in the rising opioid epidemic. It was ground-breaking for what reporters uncovered, but also for how they uncovered it. They used machine learning and natural language processing (NLP) to review 3.2 million federal civil suits and over 90 million court actions to identify material filed under seal. They could then narrow their focus to search for instances in which “public health and safety information was kept secret without explanation.” [1]

The series won first prize at the 2019 Philip Meyer Journalism Awards, which recognize works of “precision journalism, computer-assisted reporting, and social science research.” [2]

For Thomson Reuters — the parent company of the news organization — AI techniques like machine learning and NLP are at the heart of what it does best: enabling legal, media, and tax & accounting professionals to find information, understand it, and use it to make decisions.

Khalid Al-Kofahi, who previously headed Thomson Reuters’ Center for AI and Cognitive Computing, says, “Generally speaking, knowledge workers such as attorneys and accountants essentially do three things. They have information needs so they engage in a journey of research and discovery. As they do that, they start analyzing it to understand it. Then at some stage they move to some sort of action or decision. We use AI technology to support all these activities.”

Thomson Reuters’ own journey of research and discovery included sponsoring the Vector Institute, which it did for three reasons: to stay on the front line of fundamental research, to support Canada’s AI ecosystem, and to develop approaches to common AI challenges through collaboration with other industry players.

One collaboration highlight is the Vector consortium project on NLP, a technique used to pursue “the holy grail” of AI: the fluent understanding of language. This project involves 25 industry participants that work with Vector researchers in workstreams focused on various NLP-related experiments. Thomson Reuters participated in a workstream to cost-effectively replicate BERT — bidirectional encoder representations from transformers — an advanced language representation model. Creating BERT requires a deep neural network to be pre-trained on a large body of unlabeled text — like that found on Wikipedia, Twitter, or a news site — to create a general model of how language works. This pre-trained BERT can then be fine-tuned for tasks like machine translation, sentiment analysis, and question answering in specific domains like law, health, and finance.
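As a rough illustration of how unlabeled text can supply its own training signal, the sketch below hides some words in a sentence and keeps the originals as the answers a model would be asked to predict. Token masking of this kind is part of BERT's published pre-training recipe, but everything here is a simplified assumption for illustration: the function name, the whitespace tokenization, the default mask rate, and the sample sentence are hypothetical, and this is not the code used by Thomson Reuters or Vector.

```python
import random

MASK = "[MASK]"  # placeholder token that hides a word from the model


def make_masked_example(tokens, mask_rate=0.15, seed=0):
    """Turn one unlabeled token sequence into an (inputs, labels) pair.

    labels[i] holds the original token wherever inputs[i] was masked,
    and None everywhere else. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    # Mask at least one token, but never more tokens than exist.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return inputs, labels


if __name__ == "__main__":
    sentence = "the court sealed the filing without explanation".split()
    inputs, labels = make_masked_example(sentence)
    print(inputs)  # the sentence with one word replaced by [MASK]
    print(labels)  # the hidden word at the masked position, None elsewhere
```

No human labeling is needed to build such pairs, which is why large bodies of raw text are usable for pre-training. A real pipeline would work on subword tokens and use a more involved replacement scheme than this sketch.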

This utility often comes at a considerable cost, though. Pre-training a BERT model typically requires days of processing on hardware that may be prohibitively expensive for most organizations. Fine-tuning a vendor’s pre-trained BERT on specialized cloud-based processors like graphics processing units (GPUs) or tensor processing units (TPUs) is much less demanding, but still often takes significant time and money.

“When you look at some of these language models, they require a huge amount of resources to build,” Al-Kofahi says. “Part of the challenge for us was: Can we train these models using more distributed architectures and figure out algorithms that can reduce the demand for many GPUs?” The first phase of experiments, run on Vector’s own GPU cluster, was promising.

According to Al-Kofahi, this consortium project is “a tide that lifts all boats,” since participants gain benefits without having to risk competitive edges. He explains, “This is an area where it makes a lot of sense for industry to collaborate because we are establishing solutions for horizontal problems: how to scale deep learning models. Then each one of us, once we figure out a solution to that problem, can take that and adapt these models.”

Al-Kofahi continues, “We took these learnings and adapted them to different domains. We have BERT for legal, BERT for tax, BERT for other domains as well, and we are now exploring how to incorporate some of these models for some of our products. This is a win-win situation.”

One product for which the results show potential is Westlaw, Thomson Reuters’ legal research service suite and the technology that enabled Reuters journalists to analyze millions of legal documents for Hidden Injustice. It will soon also play a key role in a much broader judicial arena: Thomson Reuters was recently chosen by the Administrative Office of the U.S. Courts to provide legal research tools to the Federal Judiciary, including the Supreme Court and federal public defenders.

These awards illustrate one of the potential benefits of pursuing new AI insights and staying close to the leading edge of AI research: the development of technology that increases justice and enhances access to it.

In 2017, the Legal Services Corporation, a non-profit established by Congress, released a report declaring that “A lack of available resources accounts for the vast majority of eligible civil legal problems that go unserved or underserved,” and that “insufficient resources account for between 85% and 97% of all unserved or underserved eligible problems.” [3]

“How can we improve access?” Al-Kofahi says. “There are significant opportunities to use AI and machine learning to improve matter intake, provide resolutions that are aided by an arbitrator downstream, and so on. I think AI in that sense will transform the legal industry and how legal services are provided.”

He adds, “We are already part of the transformation.”

[1] Reuters Investigates. Hidden Injustice. How we did the data analysis. www.reuters.com/investigates/special-report/usa-courts-secrecy-how/

[2] Investigative Reporters & Editors. The Philip Meyer Awards. https://www.ire.org/awards/philip-meyer-awards/

[3] Legal Services Corporation. The Justice Gap: Measuring the Unmet Civil Legal Needs of Low-income Americans. 2017. p. 44.
