And may have just innovated its way to major new energy savings

February 17, 2021

By Jonathan Woods

Everything in this small, nondescript datacentre comes in singles. There’s one server, one cooling unit, and one cardinal rule: stay within thermal guidelines. It’s into this setting that TELUS released an AI agent tasked with cooling the room as efficiently as possible, and gave it virtual carte blanche to figure out how.

This was the first real-world test in TELUS’ Energy Optimization System Project (EOS), a pilot in which a reinforcement learning agent took control of a real physical system in order to teach itself how to best operate it. Excitement ran high. Two months prior, that same agent had shown it could increase energy efficiency by 2%-15% in a simulator, thanks in large part to a series of its own ingenious innovations.

EOS was born in the Vector Institute’s Model-Based Reinforcement Learning (MBRL) Project ― the AI-for-good initiative operated by Vector’s Industry Innovation team ― and was developed by TELUS to align with its sustainability goal of reducing its energy intensity by 50% between 2020 and 2030.

“Reinforcement learning is one of three main AI categories. The others are supervised and unsupervised learning,” explains Dr. Vincent Zha, Senior Data Scientist in TELUS’ Advanced Analytics & AI Team. “But reinforcement learning is a bit different. It doesn’t need as much input data, and you don’t really learn anything until you get to the end of its process.”

Unlike supervised and unsupervised learning, a reinforcement learning model isn’t initially trained on data to learn how to make accurate predictions. Instead, it begins with a goal and then interacts with an environment, learning from the consequences of its actions and iterating its way toward optimization. “Reinforcement learning is about taking actions, as opposed to being just about prediction,” says Amir-massoud Farahmand, a Vector Faculty Member who, along with Vector researcher Romina Abachi, provided guidance to MBRL Project participants. “For this reason, reinforcement learning is going to be very useful for applications in industry,” Farahmand says.

Another reason reinforcement learning is particularly well-suited to real-world systems is that it accounts for the long-term effects of decisions. When considering what action to take in a given situation, a reinforcement learning agent makes predictions about the immediate impacts of potential actions as well as the effects each will have further down the line. This enables the agent to address the current state of the system in which it’s operating, while staying on track to achieve its ultimate goal in an optimal way. These features (being goal-oriented and able to consider the future) are why the team opted to use reinforcement learning for this pilot.
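To make the idea of weighing immediate effects against later ones concrete, here is a minimal Python sketch. The cost figures, the plan names, and the discount factor `gamma` are invented for illustration and do not come from the TELUS pilot. The function simply adds up per-step energy costs, giving later steps slightly less weight, which is a common way for reinforcement learning methods to score a sequence of decisions.

```python
def discounted_cost(costs, gamma=1.0):
    """Sum per-step costs, weighting the cost at step k by gamma**k."""
    return sum((gamma ** k) * c for k, c in enumerate(costs))

# Hypothetical per-step energy costs for two plans over five steps.
# "greedy" spends nothing now but pays heavily later;
# "farsighted" pays a little at every step and avoids the later spike.
greedy = [0.0, 0.0, 0.0, 5.0, 5.0]
farsighted = [1.0, 1.0, 1.0, 1.0, 1.0]

greedy_total = discounted_cost(greedy, gamma=0.99)
farsighted_total = discounted_cost(farsighted, gamma=0.99)

# The plan with the lower total is preferred, even though it costs more at first.
best = "farsighted" if farsighted_total < greedy_total else "greedy"
```

Under these made-up numbers the plan that looks worse in the first three steps ends up cheaper overall, which is the kind of comparison an agent that considers the future is able to make.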

Before testing the reinforcement learning agent on real-world systems, the team ran it on a simulator. In the test, the agent was given control of the heating, ventilation, and air conditioning unit (HVAC), which has two methods for cooling the room: a relatively energy-intensive air compressor and a less-expensive ‘free cooling’ function ― essentially outside air that’s pushed into the room. The agent was also provided with the day’s weather forecast and a minute-by-minute update of the room’s temperature. After each update, the agent would decide whether to employ the air compressor, employ free cooling, or do neither. It would then recalculate its action plan for the rest of the day, updated with its new learning.
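The decision loop described above can be sketched in a few lines of Python. This is not the TELUS agent and it is not a learning algorithm: it is a brute-force planner that looks a few minutes ahead, run on a toy thermal model. Every name and constant here (`Action`, `ENERGY`, `step`, `plan`, `run_day`, the heat and cooling rates, the three-minute horizon) is an assumption made for illustration. What it shares with the description is the shape of the loop: read the temperature each minute, consult the forecast, choose one of three actions, then plan again.

```python
from enum import Enum
from itertools import product

class Action(Enum):
    OFF = "off"
    FREE_COOLING = "free_cooling"
    COMPRESSOR = "compressor"

# Hypothetical energy cost per minute of each action (arbitrary units).
ENERGY = {Action.OFF: 0.0, Action.FREE_COOLING: 0.2, Action.COMPRESSOR: 1.0}

def step(room, outdoor, action):
    """Toy thermal model: room temperature one minute later."""
    room_next = room + 0.1  # constant heat from the server
    if action is Action.FREE_COOLING:
        # Outside air only cools when it is colder than the room.
        room_next -= 0.05 * max(room - outdoor, 0.0)
    elif action is Action.COMPRESSOR:
        room_next -= 0.5
    return room_next

def plan(room, outdoor_window, limit):
    """Cheapest action sequence over the window that never exceeds the limit."""
    best_seq, best_cost = None, float("inf")
    for seq in product(Action, repeat=len(outdoor_window)):
        temp, cost, feasible = room, 0.0, True
        for action, outdoor in zip(seq, outdoor_window):
            temp = step(temp, outdoor, action)
            cost += ENERGY[action]
            if temp > limit:
                feasible = False
                break
        if feasible and cost < best_cost:
            best_seq, best_cost = seq, cost
    # Fall back to the compressor if no sequence in the window is feasible.
    return best_seq if best_seq is not None else (Action.COMPRESSOR,)

def run_day(forecast, start_temp, limit, horizon=3):
    """Each minute: re-plan over the next `horizon` minutes, apply the first action."""
    room, temps, actions = start_temp, [], []
    for minute in range(len(forecast)):
        window = forecast[minute:minute + horizon]
        action = plan(room, window, limit)[0]
        room = step(room, forecast[minute], action)
        temps.append(room)
        actions.append(action)
    return temps, actions

# One simulated hour with cool outside air, and one with hot outside air.
cool_temps, cool_actions = run_day([15.0] * 60, start_temp=24.0, limit=27.0)
hot_temps, hot_actions = run_day([35.0] * 60, start_temp=26.9, limit=27.0)
```

In this toy setup the planner never needs the compressor while the outside air is cool, and has to fall back on it once the outside air is hotter than the room. A real agent would learn its model and its policy from experience rather than enumerate short action sequences.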

The results astonished the team, not only for the 2%-15% reduction in energy use the agent achieved, but also for the numerous innovations it had devised to get there. Two of these innovations stood out.

First, the agent immediately took a hands-off approach, allowing the room’s temperature to rise as close to the limit as possible before stabilizing it there. Typically, the HVAC unit’s program would push the temperature down by four degrees any time it closed in on the thermal range’s upper bound. This would provide a sufficient buffer to accommodate the full range of outdoor temperatures that could affect conditions within the datacentre. The agent learned that it was more efficient to let the temperature rise throughout the morning toward the threshold and then dynamically manage it there using quick bursts of air compressor and free cooling as required. The logic is clear: Why waste energy lowering the temperature more than required? As long as the temperature remains under the upper bound, whether by 4° or 1°, the agent stays onside.
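A rough way to see why holding the room near the limit can use less energy than cooling it four degrees below is to compare two hand-written rules on a toy model. The Python sketch below is an illustration under stated assumptions, not the HVAC unit’s actual program or the agent’s learned policy. The model assumes that heat leaks in from hot outside air in proportion to the indoor-outdoor temperature difference, so a cooler room gains heat faster. The constants, the half-degree trigger margin, and the function names are all hypothetical.

```python
def next_temp(room, outdoor, compressor_on):
    """Toy model: server heat plus leakage from outside, minus compressor cooling."""
    gain = 0.1 + 0.01 * (outdoor - room)  # assumed heat load and leakage rate
    return room + gain - (0.6 if compressor_on else 0.0)

def buffer_policy(room, cooling, outdoor, limit):
    """Conventional-style rule: near the limit, cool until 4 degrees below the trigger."""
    trigger = limit - 0.5  # assumed safety margin
    if room >= trigger:
        return True
    # Once cooling has started, keep going until the 4-degree buffer is reached.
    return cooling and room > trigger - 4.0

def hold_policy(room, cooling, outdoor, limit):
    """Run the compressor only when the next minute would otherwise exceed the limit."""
    return next_temp(room, outdoor, False) > limit

def simulate(policy, minutes=1440, start=23.0, outdoor=35.0, limit=27.0):
    """Run a compressor on/off policy for a day; return (compressor_minutes, peak_temp)."""
    room, cooling, on_minutes, peak = start, False, 0, start
    for _ in range(minutes):
        cooling = policy(room, cooling, outdoor, limit)
        room = next_temp(room, outdoor, cooling)
        on_minutes += int(cooling)
        peak = max(peak, room)
    return on_minutes, peak

buffer_minutes, buffer_peak = simulate(buffer_policy)
hold_minutes, hold_peak = simulate(hold_policy)
```

Both rules keep the room under the limit in this model, but the one that hovers just below it runs the compressor for fewer minutes, because the warmer room takes in less heat from outside.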

A second approach was “truly an innovation” by the agent, according to Zha. Around five in the morning, near the time the outdoor temperature hit its low for the day, the agent began blasting the room with free cooling, which it ran continuously until mid-afternoon. It seemed baffling at first. Free cooling isn’t expensive, but it’s not cost-free. Why expend energy when the room was already at its coolest?

The answer was in the weather forecast. Zha explains: “The agent sees in the weather forecast that the afternoon temperature will be very hot. It determines that now is the best time to run more free cooling to delay the use of the compressor, because the compressor is very expensive.”

The agent is making a trade-off. “It’s like a good chess player,” says Zha. “A good chess player sometimes sacrifices in the immediate step, but it can gain more 20 steps later. Sacrifice a bit now to avoid a big cost in the future.” It hadn’t occurred to the team that the agent would discover this approach. Standard HVAC programming didn’t take the day’s unique forecast into account. It couldn’t. But the agent had figured it out.

“This is the hallmark of reinforcement learning,” says Zha.

Fast forward two months, and the TELUS team finds itself working in that small nondescript room to test the agent on a real-world HVAC ― and sees the simulated results validated. The team’s next steps: to expand testing to larger and more complex sites before eventual implementation throughout TELUS data rooms.

The pilot has resulted in new capabilities and possibilities at TELUS. Organizing the execution of a reinforcement learning pilot on real physical systems showed that a cross-section of technical and non-technical professionals within TELUS could come together to implement a sophisticated AI project.

“Once we had to touch physical systems, then not only did we have to understand reinforcement learning theory and develop the algorithm, we also had to navigate all the different stakeholders and owners of the network to even get access to that room,” says Dr. Ivey Chiu, Senior Data Scientist in Advanced Analytics & AI. The project involved close collaboration with TELUS’ Mission Critical Environments’ Manager Technology Strategy Dominic Dupuis, Energy Management Engineer Jonah Braverman, and Alexandre Guilbault, Director of Analytics & AI, as well as stakeholders from Network Operations, Building Operations, and the Data & Trust Office, all of whom had to understand, lend expertise, and sign off on the project.

“It was a process of building trust with us and being very transparent about what we were trying to do,” Chiu says. Chiu also points out that this cross-functional approach and the emphasis on trust and transparency reflect TELUS’ commitment to responsible AI, which guides its efforts to augment technological capabilities in a way that brings benefits to society.

Additionally, the effort resulted in TELUS’ Advanced Analytics & AI team producing original research. “Normally we use existing algorithms to improve our business. Here, we invented a new algorithm to solve the problem,” says Zha. The agent was able to learn on a minute-by-minute basis because of hyperspace neighbour penetration, an improvement the TELUS team made on textbook algorithms. The innovation enabled the agent to account for slowly changing variables, like the gradual rise of the room’s temperature. Zha’s research paper on the algorithm is currently being reviewed by Vector faculty.

The promising early results have also sparked interest in the agent’s application in other real-world TELUS systems, from cell tower base stations to agriculture, where temperature is key to performance and energy efficiency improvements can translate into sustainability and savings at scale. Jaime Tatis, Vice-president of Data Strategy & Enablement, says, “We are excited to see what comes next and to see what other problems we can solve with reinforcement learning now that we were able to deliver on this proof of concept.”

Finally, there’s the deep satisfaction that comes with achieving a challenging technical goal. “In the end, we do have a sense of accomplishment that we did something difficult, both on the research side and on the industry side,” Chiu says. “It really showed that TELUS is open to responsible innovation and open to teamwork.”

A TELUS AI agent approached sustainability like a chess game
