UNIVERSITY PARK, Pa. — Data centers consume millions of homes’ worth of electricity each year, with much of that electricity simply powering the cooling systems that keep the facilities operational. Researchers at Penn State are addressing this inefficiency by using artificial intelligence (AI) to dynamically adjust data centers’ power usage to peak when the weather is favorable and electricity is affordable.
The team developed software, powered by a new physics-based AI learning model, that analyzes real-time climate and economic data to optimize data center cooling. The software works by simulating a virtual data center that serves as a training space for an AI agent — a system that can make highly complex decisions and learn over time. A trained agent can provide cooling recommendations personalized for the climate and economic market of a data center, specifically optimized for a facility’s location.
The new approach is detailed in a paper accepted for presentation at the IEEE ITherm Conference in May.
The art of chilling out
Cooling the systems in data centers costs taxpayers and companies millions of dollars in electricity annually, according to Wangda Zuo, professor of architectural engineering at Penn State and corresponding author on the paper. He explained that the intricate cooling systems inside a data center are a main reason these facilities are so power-hungry.
“Cooling currently accounts for about 40% of a data center's total electricity use — it just goes to keeping the data center operational,” Zuo said. “On top of that, operators must navigate extreme environmental conditions like high ambient temperatures that raise cooling costs, as well as economic factors like volatile electricity and Bitcoin prices when mining for the cryptocurrency. These factors can sharply narrow profitability windows for some facilities.”
Although solutions like liquid-based cooling or heat-resistant hardware could help, Zuo said using AI to dynamically adjust cooling rates and power usage to capitalize on these volatile variables could offer substantial improvements for a fraction of the price. However, many cooling solutions do not allow for dynamic shifts.
“Traditionally, data centers are cooled to static thermal targets, which can lead to substantial financial losses when electricity prices are high,” Zuo said. “The cooling options that do offer AI-informed shifting require extensive training data and cannot effectively react to unfamiliar situations. We wanted to design software that can account for external conditions and better guide these shifts.”
Digital twinning is winning
The team’s software can account for dynamic shifts in the external temperature, humidity and economic conditions a data center faces when determining cooling parameters, Zuo said. The framework is based on what the team calls a physics-informed reinforcement learning model. The researchers use industry-accepted hardware standards to set targets for data center temperature, humidity and more. These baselines inform the creation of a digital twin, or simulation, of a data center, with different variables that can be manipulated to observe how cooling recommendations may change depending on the climate where the data center is located.
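The idea of training an agent against a digital twin can be illustrated with a deliberately tiny sketch. The toy model below is not the team's actual software: the one-zone "physics," the setpoint choices, the temperature and price values, and the safety limit are all illustrative assumptions. It only shows the general pattern of physics-informed reinforcement learning, where a thermal-limit penalty is built into the reward so the agent learns cheap cooling settings that never violate the limit.

```python
import random

# Toy "digital twin": a one-zone data center model. All names and numbers
# here are illustrative assumptions, not the researchers' actual model.
SAFE_INLET_MAX = 27.0           # assumed server-inlet limit, deg C
SETPOINTS = [18.0, 21.0, 24.0]  # discrete cooling setpoints the agent may pick

def step(outdoor_temp, price, setpoint):
    """Simplified physics: inlet temperature is a mix of the setpoint and
    outdoor air; cooling energy grows with the outdoor-setpoint gap."""
    inlet = 0.7 * setpoint + 0.3 * outdoor_temp       # steady-state mix
    energy = max(0.0, outdoor_temp - setpoint) * 1.5  # kWh, toy model
    cost = energy * price
    # Physics-informed term: violating the thermal limit dominates the reward
    penalty = 100.0 if inlet > SAFE_INLET_MAX else 0.0
    return inlet, -(cost + penalty)

def train(episodes=2000, seed=0):
    """Tabular Q-learning over a few (weather, electricity price) states."""
    rng = random.Random(seed)
    q = {}  # ((outdoor, price), setpoint) -> value estimate
    for _ in range(episodes):
        outdoor = rng.choice([25.0, 38.0])  # mild day vs. Houston-hot day
        price = rng.choice([0.05, 0.30])    # $/kWh, off-peak vs. peak
        state = (outdoor, price)
        # Epsilon-greedy action selection
        if rng.random() < 0.1:
            action = rng.choice(SETPOINTS)
        else:
            action = max(SETPOINTS, key=lambda s: q.get((state, s), 0.0))
        _, reward = step(outdoor, price, action)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + 0.1 * (reward - old)  # one-step update
    return q

def recommend(q, outdoor, price):
    """Return the learned best setpoint for the given conditions."""
    return max(SETPOINTS, key=lambda s: q.get(((outdoor, price), s), 0.0))
```

In this sketch the agent learns to relax the setpoint on mild days, when little cooling is needed, and to pick the cheapest setpoint on hot days that still keeps the simulated inlet temperature under the safety limit.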
For this specific study, the researchers simulated a data center set in Houston, Texas — testing their cooling approach in an extremely hot and humid environment. This simulation was then used to train an AI agent that provides data center operators with cooling recommendations, optimizing profitability, efficiency and safety across the facility in real time.
Viswanathan Ganesh, an architectural engineering doctoral candidate and first author on the paper, explained how this approach optimizes agents for both safety and effectiveness. While static cooling targets have kept data centers protected from hardware failures, they are also a primary source of inefficiency. Integrating the static safety requirements of data centers into the framework allows the team to dynamically adjust power usage without sacrificing stability.
“Each hardware component used to cool a data center has its own operational ranges that cannot be violated, so we integrated them into our modeling,” Ganesh said. “We can massively increase efficiency, while ensuring that our agent instructs the centers to adhere to recommended temperatures.”
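One simple way to picture "operational ranges that cannot be violated" is as hard bounds applied to whatever the agent recommends. The component names and limits below are hypothetical stand-ins, not values from the paper; the sketch only shows the pattern of clamping each recommended setting into its component's allowed range before it reaches the equipment.

```python
# Hypothetical per-component operating ranges (lower, upper). These names
# and limits are illustrative assumptions, not the paper's actual values.
OPERATING_RANGES = {
    "chilled_water_supply_c": (7.0, 15.0),
    "server_inlet_c": (18.0, 27.0),
    "relative_humidity_pct": (20.0, 80.0),
}

def enforce_ranges(recommendation):
    """Clamp each recommended setpoint into its component's allowed range,
    so a dynamic adjustment can never push hardware outside its limits."""
    safe = {}
    for key, value in recommendation.items():
        lo, hi = OPERATING_RANGES[key]
        safe[key] = min(max(value, lo), hi)
    return safe
```

A guardrail like this sits between the learned policy and the plant: even an aggressive cost-saving recommendation is forced back inside the manufacturer's envelope before it is acted on.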