It’s running production workloads. But the rack of servers submerged in engineered fluid inside a Microsoft data center in Quincy, Washington, is still something of a science project, similar in purpose to Project Natick, the hermetically sealed, computer-filled capsule the company’s R&D scientists had running on the ocean floor off the shores of the Orkney Islands, in Scotland.
As with Natick, running real production software on a few dozen servers inside a tub of low-boiling-point fluid in Quincy is a way to answer an initial set of basic questions before deploying at a larger scale to test the design’s impact on reliability.
This phase is meant to test for basic functionality and operability, Christian Belady, VP of Microsoft’s data center advanced development group, told DCK in an interview. Does immersion cooling affect server performance in any way? How easily can a data center technician adjust to working with servers submerged in liquid? Those are the types of questions his team is looking to answer at this stage.
It’s just a single rack, a much smaller deployment than the latest Natick experiment, but what’s at stake here is nothing less than the future trajectory of computing at scale. Chipmakers can no longer double a processor’s speed every couple of years by cramming more, tinier transistors onto a same-size silicon die without also increasing its power consumption. Belady and his colleagues are trying to see if they can hold onto the benefits of Moore’s Law by cramming more processors inside a single data center.
“Moore’s Law for infrastructure,” is how he put it. “How do we continue to follow the scaling of Moore’s Law by looking at the full [data center] footprint?”
If you’ve been following this space, you may be tempted to think of this development and Google’s deployment of liquid-cooled AI hardware a few years ago as part of the same trend. That’s only true to a small extent. The difference in purpose dwarfs any similarity. Microsoft isn’t looking at liquid cooling for a subset of the most powerful computers running the most demanding workloads. It’s looking at it as a way to keep increasing its data centers’ capacity to process any workload at the same rate as they did when Moore’s Law was in full effect.
“We no longer have the luxury to count on the chip for performance [improvements] year over year,” Belady said.
The technology used in Microsoft’s deployment is one of several types of liquid cooling available to designers of computers. In a two-phase immersion cooling system, a synthetic liquid is engineered to boil at a low temperature, in this case 122F, or 90F lower than the boiling point of water. The liquid turns into vapor upon contact with a warm processor, carrying the heat away as bubbles of gas that travel up to the surface. There, upon contact with a cooled condenser in the tank’s lid, the gas converts back to liquid that rains back down to repeat the cycle.
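For readers who think in Celsius, the boiling-point figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function and constant names are made up for this example, and the only inputs are the 122F figure quoted above and the standard 212F boiling point of water at sea level.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Boiling point of water at sea-level pressure (standard value).
WATER_BOILING_F = 212.0
# Boiling point of the engineered fluid, as quoted in the article.
FLUID_BOILING_F = 122.0

# Gap between the two boiling points, in Fahrenheit degrees.
gap_f = WATER_BOILING_F - FLUID_BOILING_F
# The fluid's boiling point expressed in Celsius.
fluid_boiling_c = fahrenheit_to_celsius(FLUID_BOILING_F)

print(f"Fluid boils {gap_f:.0f}F below water, at {fluid_boiling_c:.0f}C")
```

In other words, the fluid boils at 50C, half the Celsius boiling point of water.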
Belady was careful to emphasize that Microsoft remained “agnostic” on the type of liquid cooling technology it would choose for a scaled deployment. He and his colleagues, including Husam Alissa, a principal engineer, and Brandon Rubenstein, senior director of server and infrastructure development management and engineering, started looking at liquid cooling years ago. Observing the trends in processor design, they wanted to be familiar enough with the available alternatives for cooling servers by the time an individual chip’s power consumption hit the limit of what air-based cooling technology could handle. “We’re not hitting limits yet,” Belady said, “but we see it coming soon.”
If not in five years then in 10, entirely liquid-cooled data centers will become mainstream rather than a niche phenomenon seen only in the worlds of supercomputers and bitcoin mining, he estimates. Even if all servers are available liquid-cooled in five years, you’d still have to wait a few years for the old, air-cooled ones to age out.
Alissa and Rubenstein presented results of their experiments with multiple liquid cooling technologies at the 2019 OCP Summit, the Open Compute Project’s annual hyperscale data center hardware and infrastructure design conference in San Jose. Their presentation covered two-phase immersion cooling, single-phase immersion cooling (where hydrocarbon fluid cycles between hardware and a heat exchanger), and cold plates (where a traditional heat sink on the motherboard is replaced with a flat rectangular piece of heat-conducting metal that contains tiny pipes carrying liquid coolant in and out to a coolant distribution unit shared by all servers in the rack).
They found a lot to like in both immersion and cold-plate designs, Belady said. Both allow you to run servers much hotter than air cooling does, and both allow you to get rid of server fans. One area where immersion really wins is the extent of compute-muscle densification it makes possible. “It allows us to really densify,” he said. “More circuits per volume.”
But, “we’re kind of agnostic still on the direction, and we see a future where both will exist together.” The data center-level infrastructure supporting it all would be the same. What’s important here is that instead of fighting to squeeze every last drop of efficiency from air-based cooling – a fight that’s now crossed the threshold of diminishing returns, Belady admitted – computer designers are just at the start of taking advantage of the cooling capacity of liquids.
What’s the tank’s PUE? we asked. “Oh, it’s close to 1,” Belady replied.
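PUE, or power usage effectiveness, is the ratio of the total power a facility draws to the power that reaches its IT equipment, so a value close to 1 means almost nothing is spent on cooling and other overhead. The sketch below shows the calculation. The load and overhead figures in it are hypothetical, chosen only to illustrate what “close to 1” looks like; they are not Microsoft’s measurements, and the function name is ours.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("Total facility power cannot be less than IT power")
    return total_facility_kw / it_equipment_kw


# Hypothetical numbers for illustration only (not from the article).
it_load_kw = 100.0   # power drawn by the servers themselves
overhead_kw = 3.0    # cooling, power distribution losses, and so on

tank_pue = pue(it_load_kw + overhead_kw, it_load_kw)
print(f"PUE = {tank_pue:.2f}")
```

With these assumed figures the result is 1.03: for every 100 kW of computing, only 3 kW goes to everything else.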