Breaking Ground: Natural language processing headlines Vector Institute’s latest workshop
March 13, 2024
By Arber Kacollja
The Vector Institute’s recent Natural Language Processing (NLP) workshop brought together researchers to discuss shared interests and to showcase work being done by the Vector NLP community. NLP, which empowers machines to comprehend the nuances of human language, is a critical facet of AI and a nexus between linguistics and computing.
Recent strides in NLP, particularly the rise of pre-trained large language models (LLMs), have integrated NLP systems into numerous facets of our daily lives. The swift evolution of NLP research has also spawned applications spanning diverse domains: beyond conventional ones like personal voice assistants and recommender systems, recent breakthroughs include ChatGPT for content generation and DALL-E for text-to-image generation.
Throughout the day-long event, Vector Faculty Members, Faculty Affiliates, Postdoctoral Fellows, and researchers from the wider Vector community discussed cutting-edge work, examined opportunities and challenges of NLP in the age of LLMs, and exchanged insights on various NLP-related topics.
Participants also took part in interactive breakout sessions as well as a poster session where graduate students presented their research.
Rapid advancements in the pre-trained language model literature have resulted in remarkable progress. Vector Faculty Member and Canada CIFAR AI Chair Frank Rudzicz, who is also an Associate Professor at Dalhousie University, presented on the scientific method in modern NLP research, exploring issues around metrics and benchmarking in deep speech technology.
The talk also explored the concept of scientific debt in NLP, focusing on the challenges of making research on pre-trained language models more honest. Technical debt, he explained, emerges when technical teams opt for expedient yet suboptimal solutions to problems, or neglect to invest time in constructing sustainable methods. This might involve adopting an approach that doesn’t scale well or conflating multiple components at once without understanding their interactions. Although these shortcuts may initially appear effective, they often lead to significant challenges over time.
In a similar vein to how technical debt commonly accrues during the rapid development of new software, scientific debt describes similar issues arising from the development and integration of pre-trained language models. “Instead of hastily accepting trends in NLP, we need to carefully disentangle interconnected components and evaluate their individual contributions to reported outcomes,” says Rudzicz.
He also outlined several key recommendations for the way forward and lessons learned aimed at fostering progress in the field.
As generative AI technologies are deployed in our daily lives, communication between AI and human users becomes more important than ever. Zining Zhu, a recent PhD graduate of the University of Toronto and a new faculty member at the Stevens Institute of Technology, presented a communication channel framework in which AI assistants help humans process complex data and understand automatic prediction results. The framework uses information theory to reveal the mechanisms of AI-generated explanations, and incorporates the audience to tailor their content and format. Explanations are a widely used tool in this kind of communication.
“Explanations can be generated by LLMs, but they should be informative, contextual, and helpful”
Zining Zhu
Faculty Member, Stevens Institute of Technology
His presentation is part of continuing efforts to establish thorough evaluation standards for explainable AI. Beyond the computational hurdles of producing these explanations, evaluating them requires human-centered perspectives and metrics.
Shifting to building an explainable metric for all text generation tasks, Dongfu Jiang, a PhD student at the University of Waterloo, presented TIGERScore, a trained metric that follows instruction guidance to perform explainable, reference-free evaluation across a wide spectrum of text generation tasks. Unlike traditional automatic metrics, TIGERScore provides detailed error analysis that pinpoints mistakes in the generated text, accompanied by an explanation of each mistake. Jiang discussed how he and his team, led by Vector Faculty Member and Canada CIFAR AI Chair Wenhu Chen, who is also an Assistant Professor at the University of Waterloo, curated the synthetic MetricInstruct dataset by prompting GPT-4, then fine-tuned LLaMA-13B on it to obtain TIGERScore. He outlined the prompting strategies and intuitions used to improve the quality of the dataset, presented application scenarios for TIGERScore, and sketched future research directions toward better LLM-based metrics.
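To make the idea concrete, here is a minimal sketch of instruction-guided, reference-free evaluation. The checkpoint name and prompt template are illustrative assumptions, not the released TIGERScore interface.

```python
# Minimal sketch of reference-free, instruction-guided evaluation in the
# style of TIGERScore. The checkpoint name and prompt template are
# hypothetical; substitute the actual released model and format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-org/instruction-tuned-evaluator"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def evaluate(instruction: str, source: str, hypothesis: str) -> str:
    """Ask the evaluator to list errors in `hypothesis`, each with an
    explanation and a penalty, without needing a gold reference."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Source: {source}\n"
        f"Model output: {hypothesis}\n"
        "List each error in the output, explain it, and assign a penalty."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(evaluate(
    instruction="Summarize the article in one sentence.",
    source="The city council approved the new transit plan on Monday...",
    hypothesis="The council rejected the transit plan.",
))
```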
While LLMs have demonstrated an impressive ability to comprehend and generate human-like text, evaluating natural language generation remains a long-standing open problem: methods for guaranteeing relevance and coherence, and for addressing errors, remain unclear.
Model editing in LLMs is a task that uses a single sample to directly modify factual knowledge in a pre-trained language model. The success of model editing methods is currently evaluated using only the next few tokens, so we do not yet understand the impact of model editing on long-form generation such as paragraph-length text. To address this, Domenic Rosati, a PhD candidate at Dalhousie University, presented a novel set of evaluation protocols. Long-form evaluation of model editing (LEME) measures the efficacy and impact of model editing in long-form generative settings. After introducing the protocol, he discussed findings such as “fact drift” issues in direct optimization-based methods like MEMIT and ROME, overestimation of in-context methods like IKE when the analysis is not properly controlled, and various quality problems introduced by model editing, such as lack of lexical cohesion and topic drift, that only these evaluation techniques can identify. This new work has been submitted for publication and is currently available online as a pre-print. Finally, he emphasized the importance of model editing interventions that yield generations characterized by high consistency.
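As a rough illustration of the long-form probing idea (the edit itself is simulated below, since the editing method is out of scope here, and GPT-2 and the string checks are simplifications):

```python
# Sketch of the long-form probing idea behind protocols like LEME: after
# a factual edit, generate paragraph-length text and check whether it
# stays consistent with the new fact or drifts back to the old one.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A hypothetical edit: (subject, old fact, new fact).
edit = {"subject": "Eiffel Tower", "old": "Paris", "new": "Rome"}
prompt = f"Write a short paragraph about the {edit['subject']}."

paragraph = generator(prompt, max_new_tokens=150)[0]["generated_text"]

# Next-token evaluations would only inspect the first few tokens;
# a long-form protocol examines the entire passage instead.
print("mentions new fact:", edit["new"] in paragraph)
print("drifts to old fact:", edit["old"] in paragraph)
```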
While LLMs pre-trained on large-scale corpora have shown great performance on various NLP tasks, these systems have been criticized for their lack of factual knowledge. Vector Faculty Member and Canada CIFAR AI Chair Pascal Poupart, who is also a Professor at the University of Waterloo, discussed advances in knowledge representation for NLP. He provided an overview of recent advances by his group, including inferring belief knowledge graphs and latent object representations for text-based games. Poupart also presented new techniques for efficient prompt tuning and for distilling large models into tractable ones. Finally, he discussed challenges in leveraging large language models for grammar error correction and accelerated material discovery.
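To give a general flavour of prompt tuning (a minimal sketch using GPT-2, not the specific techniques presented in the talk), only a small block of learnable prompt embeddings is trained while the pre-trained model stays frozen:

```python
# Minimal soft prompt tuning sketch: the base model is frozen and only
# a handful of learnable prompt embeddings receive gradients.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.requires_grad_(False)  # freeze all pre-trained weights

n_prompt = 10
prompt_embeds = torch.nn.Parameter(
    0.02 * torch.randn(n_prompt, model.config.n_embd)
)
optimizer = torch.optim.Adam([prompt_embeds], lr=1e-3)

ids = tokenizer("Translate to French: hello", return_tensors="pt").input_ids
tok_embeds = model.transformer.wte(ids)               # (1, seq, hidden)
inputs = torch.cat([prompt_embeds.unsqueeze(0), tok_embeds], dim=1)

# Ignore the prompt positions in the loss; predict only the real tokens.
labels = torch.cat([torch.full((1, n_prompt), -100), ids], dim=1)

loss = model(inputs_embeds=inputs, labels=labels).loss
loss.backward()   # gradients flow only into prompt_embeds
optimizer.step()
print(f"training loss: {loss.item():.3f}")
```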
With AI technologies proliferating across society, it is essential to develop systems that cater to users from diverse backgrounds and transcend the Western cultural lens by incorporating cultural awareness. Failing to consider cultural nuances can result in models that reinforce societal inequities and stereotypes, hindering their efficacy for users from non-Western regions.
Despite their amazing success, LLMs and image generation models suffer from several limitations. Vector Faculty Member and Canada CIFAR AI Chair Vered Shwartz, who is also an Assistant Professor at the University of British Columbia, focused her talk on the models’ narrow Western, North American, or even US-centric lens, a result of training on web text and images from primarily US-based users. As a result, users from diverse cultures who interact with these tools can feel misunderstood and experience them as less useful. Worse still, when such models are used in applications that make decisions about people’s lives, a lack of cultural awareness can lead to models that perpetuate stereotypes and reinforce societal inequities.
“In my lab we’re currently assessing the cultural sensitivity of LLMs, and working on augmenting them with cultural proficiency.”
Vered Shwartz
Vector Faculty Member, Canada CIFAR AI Chair
She also presented a line of work from her lab aimed at quantifying and mitigating this bias.
Analogous to multicultural NLP, multilingual NLP must account for such differences to serve users effectively. Language lies at the core of human interaction and communication, and with over 7,000 languages spoken globally, an ideal language-universal NLP system should demonstrate proficiency in processing and comprehending diverse languages.
Freda Shi, a PhD candidate at the Toyota Technological Institute at Chicago, focused on the multilingual abilities of large language models. In light of the remarkable accomplishments of recent LLMs, a natural question arises about their multilingual ability, especially for low-resource and underrepresented languages. Although languages worldwide exhibit numerous similarities, they also show significant typological diversity. She reviewed the multilingualism exhibited by current LLMs, taking bilingual lexicon induction and multilingual reasoning as two representative tasks, and concluded by discussing the challenges and opportunities of multilingualism in the LLM era.
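For readers unfamiliar with bilingual lexicon induction, this toy sketch shows the core idea: given word embeddings from two languages mapped into a shared space, translations are induced by nearest-neighbour search (the random vectors below stand in for real aligned embeddings):

```python
# Toy bilingual lexicon induction: induce translations by cosine
# nearest-neighbour search in a shared embedding space. The random
# vectors below are stand-ins for real aligned multilingual embeddings.
import numpy as np

rng = np.random.default_rng(0)
src_words = ["dog", "cat", "house"]
tgt_words = ["chien", "chat", "maison"]

src_vecs = rng.normal(size=(3, 8))
tgt_vecs = src_vecs + 0.01 * rng.normal(size=(3, 8))  # near-copies for the toy

def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

sims = normalize(src_vecs) @ normalize(tgt_vecs).T  # cosine similarity matrix
for i, word in enumerate(src_words):
    j = int(sims[i].argmax())
    print(f"{word} -> {tgt_words[j]} (cos={sims[i, j]:.2f})")
```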
Pushing NLP research and systems forward requires confronting the distinct hurdles associated with collecting and analyzing data from low-resource languages. As NLP techniques evolve, it is imperative to guarantee that the cultural identity and language of native speakers are neither neglected nor exploited. Instead, they should be actively engaged in the process and empowered to influence how technology represents them, their language, and their culture.
John Willes, Technical Team Lead on Vector’s AI Engineering team, introduced VectorLM, a lightweight package developed by the team to optimize common LLM fine-tuning workloads and help Vector researchers train moderately sized models more efficiently on the Vector cluster. Willes provided an overview of practical lessons learned while training LLMs on Vector’s HPC cluster, examining important hardware and software trade-offs through the lens of the compute and network constraints specific to the cluster.
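To illustrate the kinds of trade-offs involved, the snippet below uses the Hugging Face Trainer configuration for concreteness; it is not VectorLM’s interface, and the values are placeholders:

```python
# Common fine-tuning knobs that trade memory for compute or throughput
# under cluster constraints. Placeholder values; not VectorLM's API.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=1,   # small micro-batch fits in GPU memory
    gradient_accumulation_steps=16,  # recovers a larger effective batch size
    bf16=True,                       # cheaper arithmetic on supported GPUs
    gradient_checkpointing=True,     # recompute activations to save memory
)
```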
NLP’s rapid advancement has revolutionized communication, and increased global interconnectedness has fuelled applications across diverse domains. With NLP now pervasive and increasingly reliant on user-generated data, it is more important than ever that research in the field embraces safe and trustworthy methodologies.
Want to learn more about the Vector Institute’s current research initiatives in natural language processing? Watch the full playlist of talks on YouTube.