Future Tech

More than a third of enterprise datacenters expect to deploy liquid cooling by 2026

Tan KW
Publish date: Mon, 22 Apr 2024, 11:10 PM

Survey: As CPUs and GPUs grow ever denser and more power-hungry, many, including Register readers, expect liquid cooling to play a larger role in enterprise datacenters over the next few years.

More than a third of enterprises (38.3 percent) expect to employ some form of liquid cooling infrastructure in their datacenters by 2026, up from just 20.1 percent as of early 2024, according to a survey of 812 IT professionals conducted by The Register this spring.

Liquid cooling isn't just for HPC and AI

Today, liquid cooling remains a niche, with the majority seeing the tech as most beneficial for high-performance computing (64.4 percent), followed closely by dense server configurations (60.6 percent), and to a lesser extent artificial intelligence workloads (46.2 percent).

This makes sense as liquid cooling has traditionally been employed in densely packed supercomputing cabinets from the likes of Eviden, HPE Cray, and Lenovo. These systems are complex and rely on large coolant distribution units, chillers, and facility water systems. By comparison, most AI systems up until recently have been air-cooled.

As we saw at GTC, this trend could soon change. While Nvidia's HGX B100 and HGX B200 systems will still be available in air-cooled form factors, its most powerful accelerators like the 2,700-watt Grace-Blackwell Superchip will require liquid cooling.

Despite the hype, adding AI capabilities was far from enterprises' highest priority. Roughly 58 percent of respondents said improving facility security was their biggest pain point. Reducing energy consumption, increasing utilization of existing hardware, and acquiring higher-performance systems all ranked above AI capabilities, which came in at 27 percent.

Liquid cooling isn't limited to AI and HPC systems. It also happens to be much more efficient at removing waste heat than air. As we've previously discussed, 15-20 percent of a system's power consumption can be directly attributed to the fans used to move air through it. Transitioning to liquid cooling, depending on the technology involved, largely eliminates the need for high-RPM fans, reducing power consumption considerably.
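To put that fan overhead in concrete terms, here is a back-of-the-envelope sketch. The 15-20 percent share comes from the figure above; the 10 kW system power and the function name are hypothetical, chosen purely for illustration.

```python
# Rough estimate of fan power in an air-cooled system, using the
# 15-20 percent share quoted in the article.
# ASSUMPTION: the 10 kW system power below is a made-up example input.

def fan_power_range_kw(system_power_kw, low_share=0.15, high_share=0.20):
    """Return the (low, high) fan power in kW for a given total system power."""
    return system_power_kw * low_share, system_power_kw * high_share


low, high = fan_power_range_kw(10.0)
print(f"Fans draw roughly {low:.1f}-{high:.1f} kW of a 10 kW system")
```

How much of that is actually recovered depends on the cooling technology, since some designs still keep low-speed fans for components the liquid loop doesn't reach.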

Combined with the opportunistic boost algorithms found on most modern processors, liquid-cooled systems should in theory be capable of achieving higher clock speeds than their air-cooled siblings.

Enterprises still undecided on direct-to-chip vs immersion cooling

Despite the advantages liquid cooling offers, readers remain split as to which version of the technology they will ultimately deploy.

By 2026, 16.3 percent said they were going all in on direct-to-chip (DTC) liquid cooling, which replaces heat sinks with cold plates through which warm or chilled water or other coolants are pumped. By comparison, 6.5 percent said they planned to go 100 percent immersion cooling. This technology involves submerging the entire system in either single-phase fluids like synthetic oils, or two-phase fluids engineered to boil at or around the chips' operating temperature.

About a sixth of respondents said they planned to use a mix of DTC and immersion cooling in their datacenters, while 61.7 percent said they had no plan to utilize either technology in the next two years.
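The survey shares quoted here can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the four answers are mutually exclusive and cover every respondent, and that all 812 people surveyed answered this question, neither of which the survey write-up states outright. The variable names are our own.

```python
# Cross-check of the 2026 deployment plans quoted in the article.
# ASSUMPTION: all 812 respondents answered this question, and the four
# categories are exhaustive and non-overlapping.
RESPONDENTS = 812

shares = {
    "direct-to-chip only": 16.3,
    "immersion only": 6.5,
    "no liquid cooling planned": 61.7,
}

# The mixed share is only given as "about a sixth"; derive it as the remainder.
mixed = round(100.0 - sum(shares.values()), 1)
shares["mix of DTC and immersion"] = mixed

# Everyone who is not in the "no plan" group expects some liquid cooling.
liquid_total = round(100.0 - shares["no liquid cooling planned"], 1)

for label, pct in shares.items():
    count = round(RESPONDENTS * pct / 100)
    print(f"{label}: {pct} percent, about {count} respondents")
print(f"any liquid cooling by 2026: {liquid_total} percent")
```

Under those assumptions the remainder works out to 15.5 percent, in line with "about a sixth," and the liquid cooling total matches the 38.3 percent headline figure.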

Unsurprisingly, the largest enterprises expect to adopt liquid cooling the fastest. This could be down to a couple of factors, ranging from larger budgets for AI deployments to limited datacenter space necessitating denser rack configurations.

Most enterprises probably don't need liquid cooling just yet

Speaking of current rack power trends, it's not hard to see why so many enterprises are sticking with air cooling in the near term. About 87 percent of respondents reported rack densities of 50 kW or lower. That's the upper end of what Digital Realty CTO Chris Sharp told our sibling site The Next Platform its facilities could support without resorting to rear-door heat exchangers (RDHx) or DTC cooling.

Practically speaking, we tend to see RDHx - essentially rack-sized radiators used to chill hot air exiting servers down to acceptable levels - used in racks exceeding around 40 kW. For reference, that's roughly the load expected for a stack of four DGX H100 systems.

Just 6.7 percent of respondents said their average rack power was between 51 kW and 100 kW, while 6.3 percent said it exceeded 100 kW. We've seen some larger RDHx systems that can handle air-cooled systems up to around 90 kW of thermal dissipation.
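The rough thresholds discussed above can be turned into a simple rule of thumb. The sketch below is illustrative only: the 40 kW and 90 kW cut-offs are the approximate figures quoted in the text, not engineering limits, and the 10.2 kW per-system figure for a DGX H100 is our assumption and should be checked against the vendor's spec sheet. The function and constant names are hypothetical.

```python
# Rule-of-thumb mapping from rack power to cooling approach.
# ASSUMPTION: 10.2 kW is used as the maximum draw of one DGX H100 system.
DGX_H100_MAX_KW = 10.2


def cooling_tier(rack_kw, air_limit_kw=40.0, rdhx_limit_kw=90.0):
    """Map a rack's power draw in kW to a rough cooling approach.

    The default thresholds are the approximate figures quoted in the
    article and will vary by facility and vendor.
    """
    if rack_kw <= air_limit_kw:
        return "air cooling"
    if rack_kw <= rdhx_limit_kw:
        return "air cooling with RDHx"
    return "direct-to-chip or immersion"


four_dgx_kw = 4 * DGX_H100_MAX_KW
print(f"{four_dgx_kw:.1f} kW -> {cooling_tier(four_dgx_kw)}")
```

A facility like the one Digital Realty describes, which can support up to 50 kW without RDHx, would simply pass a higher air_limit_kw.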

Here, again, there aren't many surprises with enterprises trending toward higher rack densities the larger they get.

Cost and reliability remain key concerns

But while readers expect to deploy liquid cooling more broadly across their infrastructure over the next few years, there remain challenges and concerns regarding adoption.

Among their top concerns were maintenance, complexity, and the initial cost of implementation. Liquid-cooled systems require additional infrastructure: facility water, coolant distribution units (CDUs), and, in the case of DTC, rack manifolds to distribute the coolant to the individual systems.

Existing datacenters can be retrofitted to support liquid cooling using in-aisle coolant reservoirs and liquid-to-air CDUs. However, the cooling capacity is generally lower with these kinds of approaches.

Following cost and complexity, 48.6 percent of respondents cited a lack of experience with the technology, and 41 percent expressed fears over leaks and spills. Finally, 21.4 percent cited the cost of buying and replacing coolant as a potential challenge.

While liquid cooling does introduce additional complexity and points of failure, the technology is by no means new, having been used for decades in supercomputers, HPC-centric clusters and render farms, and more recently large-scale GPU and accelerator farms for training generative AI.

Increased interest in liquid cooling has given rise to preventative measures like negative-pressure coolant loops designed to minimize spills in the case of a leak, and in-rack CDUs with redundant or modular pumps. ®


