Google unveils two new TPUs designed for the "agentic era"

April 22, 2026 Development Source: Ars Technica

So the new chips allow for faster training, but Google also says you get more useful computation for every watt you pump into a TPU 8t. The company claims a “goodput” rate of 97 percent, which means less waiting and wasted effort. With better handling of irregular memory access, automatic handling of hardware faults, and real-time telemetry across all connected chips, TPU 8t spends more time actively advancing model training.

When training is done, AI models run in inference mode to generate tokens. That’s the process happening behind the scenes when you tell a model to do something. This doesn’t require as much horsepower, so using the same hardware for both parts of the AI lifecycle is inefficient. That’s why inference is the purview of TPU 8i, which is designed to be more efficient when running multiple specialized agents, with less waiting time. TPU 8i chips also run in larger pods of 1,152 chips versus just 256 for the last-gen Ironwood inference clusters. That works out to 11.6 EFlops per pod, much lower than TPU 8t pods.

Generative AI systems consume a lot of power, which is often cited as one of the primary reasons not to use them. The eighth-gen TPUs don’t exactly sip power, but Google claims the chips offer twice the performance per watt compared to Ironwood. Google also touts improvements in its data centers, which are apparently “co-designed” with TPUs. Features like integrating networking with compute on a single chip and more efficient pod layouts have reportedly increased computing power per unit of electricity by six times. Of course, that doesn’t mean data centers will use less power, just that they get more compute for all the power they use.

Water usage for cooling data centers is also a big efficiency concern. The heat generated by the dense computing requirements of AI servers cannot be dissipated with air, so liquid cooling is the only way. Google has adapted its fourth-gen liquid-cooling setup to the new chips, using actively controlled valves to adjust water flow based on workload. Again, this is supposed to be more efficient.
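For readers who want to sanity-check the pod and efficiency figures quoted above, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers Google has claimed (11.6 EFlops and 1,152 chips per TPU 8i pod, 256 chips for Ironwood inference clusters, 97 percent goodput for TPU 8t, and twice the performance per watt). The function names are made up for illustration, the numeric precision behind the EFlops figure is not stated here, and reading goodput as the share of wall-clock time spent on useful work is an assumption.

```python
# Vendor-claimed figures as quoted in the article, not independent measurements.
TPU_8I_POD_EFLOPS = 11.6
TPU_8I_CHIPS_PER_POD = 1152
IRONWOOD_CHIPS_PER_POD = 256
TPU_8T_GOODPUT = 0.97
PERF_PER_WATT_GAIN = 2.0


def per_chip_pflops(pod_eflops: float, chips: int) -> float:
    """Average throughput per chip in PFlops (1 EFlops = 1,000 PFlops)."""
    return pod_eflops * 1000 / chips


def productive_hours(wall_clock_hours: float, goodput: float) -> float:
    """Hours spent on useful work, assuming goodput is a time fraction."""
    return wall_clock_hours * goodput


def relative_power_for_same_work(perf_per_watt_gain: float) -> float:
    """Power needed for the same throughput, relative to the older chip."""
    return 1 / perf_per_watt_gain


if __name__ == "__main__":
    chip = per_chip_pflops(TPU_8I_POD_EFLOPS, TPU_8I_CHIPS_PER_POD)
    print(f"TPU 8i per-chip average: {chip:.2f} PFlops")
    print(f"Pod size vs. Ironwood: {TPU_8I_CHIPS_PER_POD / IRONWOOD_CHIPS_PER_POD:.1f}x")
    print(f"Useful hours in a 100-hour run: {productive_hours(100, TPU_8T_GOODPUT):.0f}")
    print(f"Power for the same work: {relative_power_for_same_work(PERF_PER_WATT_GAIN):.0%}")
```

By this arithmetic, a TPU 8i pod averages roughly 10 PFlops per chip and holds 4.5 times as many chips as the Ironwood clusters it replaces. The last line also illustrates the caveat about power: doubling performance per watt halves the power for a fixed workload, but says nothing about total consumption if the workload grows.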