Google is making entire interconnected racks full of its custom chips for machine learning available as a cloud service.
Previously available from Google Cloud only as individual devices, its TPU v2 and TPU v3 hardware now comes as powerful interconnected systems called Cloud TPU Pods. In other words, you can pay the Alphabet subsidiary to temporarily use a full rack (or multiple racks) of TPUs, all linked by a network mesh in a way that makes them behave as a single enormously powerful supercomputer.
According to the company, which made the announcement Tuesday at Google IO, a single TPU v3 Pod has “more than 100 petaFLOPs of computing power.” If true, the performance would make it comparable to the world’s fourth-fastest supercomputer, Tianhe-2A in Guangzhou, China, whose peak theoretical performance is just north of 100 petaFLOPS.
Google was careful to mention that although the Pod’s performance is comparable to that of some of the world’s fastest supercomputers, it operates at lower numerical precision than supercomputers do. Makers of hardware for training deep neural networks (currently the dominant approach to AI) have been pushing down the precision of each calculation their machines perform in exchange for lower energy consumption and more efficient use of memory. According to IBM Research, the most common deep-learning workloads, such as speech recognition or image classification, rarely require the high levels of precision needed for traditional supercomputer workloads like calculating space-shuttle trajectories or simulating the human heart.
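The precision trade-off is easy to see in a few lines of NumPy. As a sketch only: TPUs actually use bfloat16, a different 16-bit format, but plain NumPy ships IEEE float16, which serves here as a stand-in for "reduced precision."

```python
import numpy as np

# At 32-bit precision, a tiny update to a weight survives;
# at 16-bit precision, the same update rounds away entirely,
# because the gap between adjacent float16 values near 1.0
# (~0.001) is larger than the update itself.
x = np.float32(1.0) + np.float32(1e-4)  # update preserved
y = np.float16(1.0) + np.float16(1e-4)  # update lost

print(x)  # 1.0001
print(y)  # 1.0
```

For gradient updates of this size, training loops on low-precision hardware typically keep a higher-precision copy of the weights; for the forward passes that dominate inference, the lost digits rarely matter, which is the point IBM Research makes above.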
Google first revealed that it had designed custom hardware accelerators for machine learning in 2016. At that point, the first-generation TPUs had already been running in its data centers for about one year, powering Google’s own services.
It announced TPU v2 the following year and TPU v3 in 2018. It followed the announcement of each generation of TPUs by launching a corresponding product that offered the hardware to customers as a cloud service.
TPU v3 was so powerful, and consumed so much energy, that the company was forced to use liquid cooling in its data centers for the first time: introducing the third-generation accelerator meant retrofitting Google data centers worldwide with liquid-cooling systems to support the new hardware.
A closeup of Google's Cloud TPU v3 (Credit: Google/Alphabet)
According to Google, Cloud TPUs may be useful if you have the following needs:
- Shorter time to insights—iterate faster while training large ML models
- Higher accuracy—train more accurate models using larger datasets (millions of labeled examples; terabytes or petabytes of data)
- Frequent model updates—retrain a model daily or weekly as new data comes in
- Rapid prototyping—start quickly with Google’s optimized, open-source reference models in image segmentation, object detection, language processing, and other major application domains
The new Cloud TPU v2 Pod and Cloud TPU v3 Pod services, now in beta, can be consumed either as entire pods or as “slices” of pods.
If you just want to evaluate a 32-core v3 Pod slice, the price is $32 per hour. You can also commit to using the slice for a year for $176,601 (a 37 percent discount) or for three years for $378,432 (a 55 percent discount).
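Those commitment prices line up with the quoted discounts, assuming the on-demand slice is billed around the clock at the $32/hour rate. A quick sanity check:

```python
# Figures from the article: $32/hour for a 32-core v3 Pod slice,
# $176,601 for a one-year commitment, $378,432 for three years.
hourly = 32
on_demand_year = hourly * 24 * 365   # $280,320 on demand per year

one_year = 176_601
three_years = 378_432

# Implied discounts off the on-demand rate.
discount_1y = 1 - one_year / on_demand_year
discount_3y = 1 - three_years / (3 * on_demand_year)

print(f"{discount_1y:.0%}")  # 37%
print(f"{discount_3y:.0%}")  # 55%
```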
The company says you’ll have to contact one of its sales reps if you’re wondering how much a larger slice or an entire Cloud TPU Pod will cost.