AI for Chemistry and Materials: blending old and new ways of thinking

November 7, 2023

By Anatole von Lilienfeld

The introduction of AI into the mix of physics-based computer simulations has led to fascinating changes in the world of chemistry and materials science. Instead of spending ages poking around in the dark studying one compound at a time, we’ve now got machine learning gadgets sorting through mountains of virtual compounds, providing reliable answers about new chemical compounds and suggesting new ways to tweak existing materials for special jobs.

This change is akin to calculators replacing longhand calculations. In this case, however, the calculators give us a tremendous shortcut to understanding and control. It could soon become routine to craft the very stuff our world is made of on demand and in real time, allowing us to tackle many of the world’s challenges through improved materials and molecules. Imagine how this will allow us to accelerate drug discovery and personalized medicine, or reduce our carbon footprint by creating novel battery technologies.

I find it remarkably beautiful to blend these old and new ways of thinking, an approach exemplified by the following three publications from my lab.

Density Functional Theory in the AI Age

Quantum mechanics, especially an approach called density functional theory (DFT), is a powerful tool for understanding chemicals and materials and predicting their properties and behavior. This method is prized for its accuracy, versatility, and universality in computer simulations and is broadly applicable across Mendeleev’s periodic table. We have recently made considerable advances in using machine learning (ML) models trained on synthetic data obtained from computationally demanding DFT simulations. As early as 2011, we showed that, once trained, these ML models could deliver instant predictions for novel compounds with DFT quality and versatility. More recent advances such as generative AI and large language models have had further major impact and are setting the stage for software that holds great promise for planning successful experimental validations in labs that could essentially run themselves.

In the paper “The central role of density functional theory in the AI age” published in Science, we discuss how this enables a future where robot-led experiments become as fundamental to science as machine learning, computer simulations, traditional theories, and human-led experiments.

Quantum Machine Learning

Within the realm of our quantum mechanics-based ML models, the way we represent or “map” chemical systems is crucial, since it directly affects how much training data is required to reach the desired predictive power. However, the most predictive representations, which get away with minimal training set sizes, tend to be computationally heavy, making the training step slow and taxing in terms of hardware needs and carbon footprint.

Since I moved my lab to Toronto last summer, Vector postdoctoral fellow Stefan Heinen, Vector grad student Danish Khan, and I have been working on a novel physics-based featurization of chemical compounds that is so compact that the training cost of ML models can be reduced by multiple orders of magnitude. In our paper “Kernel based quantum machine learning at record rate: Many-body distribution functionals as compact representations,” which was published in The Journal of Chemical Physics, we propose an improved way to represent chemical systems that is not only ultra-compact but also invariant with respect to the system’s size. It’s based on something dubbed atomic Gaussian many-body distribution functionals (MBDF).
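To make the phrase “kernel based” a little more concrete, here is a minimal sketch of kernel ridge regression, a standard kernel method, written with NumPy. It is not the MBDF code from the paper: the feature vectors are random toy numbers standing in for a compact molecular representation, the target is an arbitrary smooth function, and the function names and the values of sigma and lam are assumptions chosen purely for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel between every row of A and every row of B."""
    # Squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def train_krr(X_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_krr(X_train, alpha, X_new, sigma):
    """Predict as a kernel-weighted sum over the training examples."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy data (hypothetical): 5 features per "molecule", smooth toy "property"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X).sum(axis=1)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

alpha = train_krr(X_train, y_train, sigma=2.0, lam=1e-3)
pred = predict_krr(X_train, alpha, X_test, sigma=2.0)
mae = np.mean(np.abs(pred - y_test))
print(f"toy test MAE: {mae:.3f}")
```

In this sketch, every kernel entry requires a distance between two feature vectors, so shorter vectors make both training and prediction cheaper. That is the general reason a compact representation can pay off for kernel models.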

When we tested MBDF on benchmark data sets consisting of organic molecules, its performance was on par with the best current methods for various quantum properties. Our findings suggest that the MBDF-based approach can efficiently balance the cost of sampling and training while maintaining high accuracy, making it the preferred choice for certain combinations of available training data and compute hardware. While generating chemically accurate predictions for quantum properties of unseen, out-of-sample compounds, it sifts through chemical compound space at a rate of roughly 48 molecules/second using just a single compute core. While humans are typically incapable of providing quantitative estimates of quantum properties, numerically solving the corresponding equations for just a single molecule would easily consume thousands of seconds when done the conventional way, without ML.
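A quick back-of-the-envelope calculation shows what that rate means in practice. The 48 molecules/second figure comes from the text above. The 1,000 seconds per conventional calculation is an assumption taken as the low end of “thousands of seconds,” and the library size of 100,000 molecules is a hypothetical number picked for illustration.

```python
ML_RATE = 48.0                     # molecules per second on one core (from the text)
CONVENTIONAL_SECONDS = 1_000.0     # assumption: low end of "thousands of seconds"
LIBRARY_SIZE = 100_000             # hypothetical number of molecules to screen

ml_seconds = LIBRARY_SIZE / ML_RATE
conventional_seconds = LIBRARY_SIZE * CONVENTIONAL_SECONDS
speedup = conventional_seconds / ml_seconds

SECONDS_PER_YEAR = 365 * 24 * 3600
print(f"ML screening:           {ml_seconds / 60:.0f} minutes")
print(f"Conventional screening: {conventional_seconds / SECONDS_PER_YEAR:.1f} years")
print(f"Speedup on one core:    {speedup:,.0f}x")
```

Under these assumptions, a screening campaign that would occupy a single core for years shrinks to roughly half an hour, and the speedup only grows if the conventional calculations take longer than assumed.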

Accelerating Scientific Research Through Automation

This summer, Siwoo Lee, an undergraduate in the Department of Chemistry at the University of Toronto, Heinen, Khan, and I developed an autonomous workflow that combines a convolutional neural network with a large language model to pull specific tables of data from scientific papers. We tested this approach on 592 organic molecules that were studied in 74 different papers published between 1957 and 2014. These papers reported experimental measurements of a property crucial for electrochemistry research, the oxidation potential, with values ranging from -0.75 to 3.58 V and the sign indicating whether the molecule would rather attract or release an electron.

After curating the data for validation and to account for differing experimental conditions, we used it to train additional machine learning models, which were able to predict oxidation potentials with an error close to what you’d expect from regular experimental errors (about ±0.2 V). If multiple studies reported results for the same molecule, our AI model could decide which value was most likely the correct one. We then used our models to predict the oxidation potential for over 100,000 organic molecules, finding values between 0.21 and 3.46 V. Our analysis showed that certain molecular features, like being aliphatic, could raise the oxidation potential from an average of 1.5 to 2.0 V, while having more atoms generally lowered it. Importantly, our workflow demonstrates how daisy-chaining multiple AI models enables an automatic workflow that will significantly cut down the manual work scientists would normally have to do to obtain computational property estimates of novel compounds based on AI models trained on literature data.
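One simple way to picture how a trained model could arbitrate between conflicting literature values is to keep the reported value that lies closest to the model’s own prediction. The sketch below illustrates that idea only. It is an assumption about how such a selection could work, not a description of the procedure used in our workflow; the function name is made up, and all voltages are hypothetical toy numbers, with only the 0.2 V tolerance taken from the text.

```python
EXPERIMENTAL_TOLERANCE_V = 0.2  # typical experimental error quoted in the text

def pick_most_plausible(reported_values, predicted_value):
    """Return the reported value closest to the model prediction (hypothetical rule)."""
    return min(reported_values, key=lambda v: abs(v - predicted_value))

# Hypothetical example: three papers disagree on one molecule's oxidation potential
reported = [1.12, 1.45, 2.30]   # volts, toy values
predicted = 1.40                # volts, toy model prediction

chosen = pick_most_plausible(reported, predicted)
consistent = abs(chosen - predicted) <= EXPERIMENTAL_TOLERANCE_V

print(f"chosen literature value: {chosen} V")
print(f"within experimental tolerance of the prediction: {consistent}")
```

If even the closest reported value falls outside the tolerance, that would be a natural cue to send the molecule back for manual inspection rather than trust either number.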

This exciting new work has been submitted for publication and is currently available online as a preprint. Future applications of this line of research might well help revolutionize experimental materials and chemistry research through autonomous AI agents that investigate novel questions and problems using self-driving laboratories.

We’re on the brink of some really neat stuff; we could soon be stumbling upon new medicines, crafting better batteries, producing improved organic electronics, concocting cleaner ways to drive chemical reactions with tailored catalysts, and maybe, just maybe, chancing on those elusive room-temperature superconductors.
