Liquid Cooled Data Center – When it’s right for HPC
It is earth-shattering news to no one in the tech industry that the liquid cooled data center is an emerging trend. With the growing adoption of cloud services, exponentially increasing data, quantum computing, and the training of GenAI models, the need for liquid cooled data centers, particularly for HPC, is on the rise and showing no signs of slowing down. According to Fortune Business Insights, the global HPC market is projected to double between now and 2032, with North America dominating.
HPC systems are being used for complex simulations, mathematical computations, modeling, analytics, and AI. These systems require more complex processing architectures and specialized hardware; that's where a liquid cooled data center comes in. They're also power hungry. The more complex and beefed-up the system, the more power it needs. And of course, all that power means heat. It's common knowledge that processors operate better when they're cooler. A hot environment is harsh on all hardware components, shortening their lifespan and limiting performance. We don't want our brand-new data centers melting, now do we?
The need for cooling has always been an essential consideration in data center planning. These days, it is also one of the biggest factors affecting size and operational costs: cooling requirements typically account for 40% of data center costs. In terms of scale, because CPU and GPU components must be spaced adequately to be air-cooled, any increase in computing power prompts a larger footprint. And of course, the bigger the facility, the more expensive it is to run and to keep at a consistent temperature.
But what if we could flip those numbers and that conventional wisdom on their head? And perhaps, at this point, we are beyond the 'what if' stage. We NEED to flip these numbers around. Energy costs are perpetually increasing. Carbon emissions must be curbed. Physical space is at a premium. And HPC systems add entirely new wrinkles to the equation in terms of space, density, and temperature requirements. For all these reasons, people are looking for an alternative to traditional air-cooling methods. For HPC-dominated data centers, liquid cooling may be the answer to all of these challenges. Let's dig into why that is.
LIQUID COOLED DATA CENTER: HIGH PERFORMANCE = HIGH NEEDS
On every level, an HPC system brings new cooling and energy challenges. First off, the chips. The newer, faster GPU and CPU chips pack more components into a tiny package, which means two things: they need more energy, and they're significantly less heat tolerant. Between 2007 and 2024, the power consumption of a single server increased 33 times over, from 0.3 kW to 10 kW. Meanwhile, older chips that could run smoothly at a silicon case temperature (Tcase) of 90 to 100°C are being phased out for new chips with a maximum Tcase as low as 60°C. So, you need more power AND lower temperatures.
At the next level up, server racks in HPC systems are denser. AI and other HPC processes NEED high-density racks; it's the only way to achieve that computing muscle. But this also means the systems need more power. According to a 2024 report by the International Energy Agency (IEA), organizations that introduce HPC and AI tools could see a tenfold increase in their electricity demand. As an example, compare the electricity used in a typical Google search (0.3 watt-hours) to that used for a ChatGPT request (2.9 watt-hours). Multiply that difference by billions of queries per year and it adds up to…A LOT. The bottom line: HPC data centers need more power and cooler temperatures, a combination of requirements that air cooling cannot meet.
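To put "A LOT" in perspective, here is a quick back-of-the-envelope calculation. The per-query figures come from the IEA report cited above; the daily query volume is an illustrative assumption, not a reported statistic.

```python
# Back-of-the-envelope comparison of search vs. GenAI query energy.
# Per-query figures are from the IEA report cited above; the daily
# query volume (8.5 billion) is an illustrative assumption.

GOOGLE_WH_PER_QUERY = 0.3    # watt-hours per traditional search
CHATGPT_WH_PER_QUERY = 2.9   # watt-hours per ChatGPT request
QUERIES_PER_DAY = 8.5e9      # assumed daily query volume

extra_wh_per_day = (CHATGPT_WH_PER_QUERY - GOOGLE_WH_PER_QUERY) * QUERIES_PER_DAY
extra_gwh_per_day = extra_wh_per_day / 1e9          # ~22 GWh per day
extra_twh_per_year = extra_wh_per_day * 365 / 1e12  # ~8 TWh per year

print(f"Extra energy per day:  {extra_gwh_per_day:,.1f} GWh")
print(f"Extra energy per year: {extra_twh_per_year:,.1f} TWh")
```

Even under these rough assumptions, the gap works out to around 8 terawatt-hours per year, on the order of the annual output of a one-gigawatt power plant running around the clock.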
PERFORMANCE, EFFICIENCY, AND DENSITY: THREE KEY BENEFITS OF LIQUID COOLED DATA CENTER SOLUTIONS
Liquid cooling has long been used in supercomputing and gaming circles as an alternative to air cooling, and for good reason. Water and other fluids have higher thermal transfer properties than air and can be up to 3,000 times more effective. Liquid cooled systems are also quieter, since they reduce or eliminate the need for fans and motors. But the three main reasons liquid cooling is ideally suited to HPC systems are performance, efficiency, and density.
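Where does a figure like 3,000 come from? One common framing compares the volumetric heat capacity of water and air, that is, how much heat a given volume can absorb per degree of temperature rise. The snippet below is a minimal sketch using standard textbook values; it is one way to frame the comparison, not a full thermal model.

```python
# Rough comparison of how much heat a given volume of water vs. air
# can absorb per degree of temperature rise (volumetric heat capacity).
# Values are standard textbook figures at roughly room temperature.

water_density = 1000.0  # kg/m^3
water_cp = 4186.0       # J/(kg*K), specific heat of water
air_density = 1.2       # kg/m^3
air_cp = 1005.0         # J/(kg*K), specific heat of air

water_volumetric = water_density * water_cp  # ~4.19e6 J/(m^3*K)
air_volumetric = air_density * air_cp        # ~1.21e3 J/(m^3*K)

print(f"Water absorbs ~{water_volumetric / air_volumetric:,.0f}x "
      f"more heat per unit volume than air")  # ~3,500x
```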
First, performance. Because liquid cooled systems are so effective and consistent, with no hot spots in the racks, they actually boost performance. HPE ran a side-by-side comparison of air cooling versus liquid cooling on the same server (HPE's XD2000, a higher-density HPC product). Over 5 years, the liquid cooled system delivered a performance boost of 2.7%. That may seem small, but considered alongside the fact that the system also used about 15% less power, the gain is substantial.
Which brings us to the second factor: efficiency. In HPE's XD2000 comparison, the liquid cooled system offered huge savings in energy costs: the air-cooled system came in around $2 million per year, while the liquid cooled system was around $300,000. When 40% of your data center costs come from cooling your space, chopping that number by 85% reduces your overhead significantly. Today's average air-cooled data center uses the power of 2,000 homes; for the same size facility, liquid cooling reduces that to about 280 homes. Liquid cooling promises reduced energy usage along with significant reductions in CO2 emissions.
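Combining the two figures quoted above, cooling at roughly 40% of total data center costs and an 85% cut in cooling costs, gives a rough sense of the overall impact. The snippet below simply restates the numbers from the HPE example; it introduces no new data.

```python
# Rough overall-cost impact of cutting cooling costs by ~85%,
# using the figures quoted above (cooling is ~40% of data center
# costs; HPE's example: ~$2.0M air-cooled vs ~$0.3M liquid cooled).

air_cooling_cost = 2_000_000    # $/year, air-cooled (HPE example)
liquid_cooling_cost = 300_000   # $/year, liquid cooled (HPE example)
cooling_share_of_total = 0.40   # cooling as a fraction of total costs

cooling_reduction = 1 - liquid_cooling_cost / air_cooling_cost
total_cost_reduction = cooling_share_of_total * cooling_reduction

print(f"Cooling cost reduction:       {cooling_reduction:.0%}")      # 85%
print(f"Approx. total cost reduction: {total_cost_reduction:.0%}")   # ~34%
```

In other words, an 85% cut in cooling costs translates to roughly a one-third reduction in total operating costs.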
Density factors into those massive energy savings as well. Compared to a direct liquid cooled system, an air-cooled system needs about five times as many server racks for the same computing power. When you're talking about the high-density requirements of HPC and AI systems, the overall footprint matters, and being able to pack the racks tightly becomes a key advantage.
LIQUID COOLING OPTIONS
There are a variety of liquid cooling technologies on the market today, offering the flexibility to accommodate different workloads and requirements. Each one has pros and cons in balancing energy efficiency with feasibility, particularly when considering existing data centers and retrofit situations.
LIQUID TO AIR COOLING
In these setups, an air-cooling unit close to the servers and racks is kept cold by chilled water supplied from the facility. This hybrid approach is useful when it is simply not practical to use one of the more direct methods of liquid cooling. You get efficiency and performance benefits over traditional air cooling, but not as much as with 100% liquid cooled systems.
REAR-DOOR HEAT EXCHANGERS
Mounted on the back of server racks, this method uses liquid coolant to capture heat from the exhaust air as it passes through the exchanger. This is another hybrid method that works well in retrofits where major hardware changes are not possible.
IMMERSION COOLING
This method fully submerges hardware components in a non-conductive liquid. Since the hardware is in direct contact with the liquid, it maximizes the liquid's thermal transfer properties and eliminates noise (no fans needed). While highly efficient, this method also tends to be more expensive and more complex, and maintenance is complicated because components must be removed from the fluid and cleaned.
DIRECT LIQUID COOLING
This method uses cold plates attached directly to processors, with liquid coolant circulating inside the plates. The coolant absorbs the heat and carries it away via a closed-loop system of pipes. Offering very targeted placement of the plates, this method seems to be gaining traction as the most feasible option for HPC data center applications. It has all the performance and efficiency benefits of 100% liquid cooling, along with precision, customization, and minimal maintenance.
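As a sketch of the thermodynamics behind a cold plate loop, the required coolant flow follows from the steady-state heat balance Q = ṁ × c × ΔT. The 10 kW server load and 10°C coolant temperature rise below are illustrative assumptions, not vendor specifications.

```python
# Minimal sizing sketch for a direct liquid cooling loop, based on
# the steady-state heat balance Q = m_dot * c_p * delta_T.
# The 10 kW heat load and 10 degC coolant temperature rise are
# illustrative assumptions, not vendor specifications.

heat_load_w = 10_000.0  # W, heat to remove (one dense server)
cp_water = 4186.0       # J/(kg*K), specific heat of water
delta_t = 10.0          # K, allowed coolant temperature rise

mass_flow = heat_load_w / (cp_water * delta_t)  # kg/s
liters_per_minute = mass_flow * 60              # 1 kg of water ~ 1 liter

print(f"Required flow: {mass_flow:.2f} kg/s (~{liters_per_minute:.1f} L/min)")
```

Under these assumptions, a modest 14 liters per minute of water carries away the full heat output of a 10 kW server, which is exactly why cold plates can cool dense racks that air simply cannot.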
At this point, HPC systems have outgrown air-cooling capabilities. With today's servers using over 30 times the power of those from less than two decades ago, organizations of all sizes will be seeking ways to economize. The growing energy demands of HPC data centers stand at odds with global efforts to decrease energy consumption and CO2 emissions. In a full-circle moment, organizations are beginning to use machine learning tools to find ways to reduce electricity demand. Google reported that its DeepMind AI reduced the energy demand of its data center cooling systems by 40%. No doubt, AI will have a role in optimizing efficiencies in the years to come.
A liquid cooled data center is a win-win (and almost a no-brainer) in terms of sustainability, efficiency, and costs. Of all the methods listed above, direct liquid cooling seems to be gaining the most traction, thanks to its relative ease and precision of installation and its minimal maintenance requirements. Offering significant savings and an eco-friendlier approach to data center operations, liquid cooling deserves a closer look from anyone who is already invested in HPC systems or is considering a new HPC data center design.