Google Cloud Unveils TPU 8t and TPU 8i to Challenge Nvidia
Image: Chosun Ilbo

22 April 2026 · Technology and Science · 11 sources

Key Takeaways

  • Google Cloud introduces TPU 8t for training and TPU 8i for inference.
  • Eighth-generation TPUs split into training-focused 8t and inference-focused 8i.
  • Both chips will be available later this year.

TPU 8t and TPU 8i

Google Cloud on Wednesday announced that its eighth generation of custom-built AI chips, or tensor processing units (TPUs), will be split into two dedicated versions: one for model training and another for inference.

“Google Cloud Debuts New AI Chips, Tools for Building Agents,” by Ian King of Bloomberg News, 4/22/26

Advisor Perspectives

TechCrunch reports that one chip, named the TPU 8t, is geared for model training, while the TPU 8i is aimed at inference, which TechCrunch defines as “the ongoing usage of models, aka what happens after users submit prompts.”
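To make that training/inference distinction concrete, here is a minimal JAX sketch (illustrative only, not Google's code; the model, shapes, and learning rate are hypothetical): a training step computes gradients and updates weights, while an inference step is a forward pass alone, which is why the two workloads can favor different silicon.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Forward pass: one dense layer with tanh, purely illustrative.
    return jnp.tanh(x @ params["w"] + params["b"])

def loss_fn(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

@jax.jit
def train_step(params, x, y, lr=1e-2):
    # Training: gradients plus a weight update (the compute- and
    # bandwidth-heavy workload a chip like TPU 8t is said to target).
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

@jax.jit
def inference_step(params, x):
    # Inference: forward pass only, no gradients or updates (the
    # latency-sensitive serving workload TPU 8i is said to target).
    return predict(params, x)

key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (8, 4)), "b": jnp.zeros(4)}
x = jax.random.normal(jax.random.PRNGKey(1), (32, 8))
y = jnp.zeros((32, 4))

params = train_step(params, x, y)   # one optimization step (training)
preds = inference_step(params, x)   # forward pass only (serving)
```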

Image from Barron's

The same TechCrunch account says Google touts performance improvements versus previous generations, including “up to 3x faster AI model training,” “80% better performance per dollar,” and the ability to get “1 million+ TPUs to work together in a single cluster.”

The Motley Fool adds that in a blog post released on Wednesday, Amin Vahdat, senior VP and chief technologist for AI and infrastructure, unveiled the eighth generation of Google’s TPU and “announced two distinct architectures -- one for AI training and the other for AI inference.”

CNBC similarly frames the move as separating training and inference into distinct processors for the eighth generation of the TPU, with both chips set to “become available later this year.”

Advisor Perspectives, drawing on Google’s event, describes the lineup as two versions where “The TPU 8t is tailored for creating artificial intelligence software, while the TPU 8i is designed to run AI services after they’ve been created — a stage known as inference.”

Across the coverage, the central theme is that Google is treating training and serving as different engineering problems and building its silicon accordingly, rather than keeping a single chip line that does both.

Why Google is splitting

Multiple outlets tie Google’s chip split to a shift in what AI workloads demand, especially as inference becomes more central.

Business Insider describes the “AI battleground” as shifting from training to inference, calling it “the process of actually running the models once they're deployed,” and says Google’s new TPU 8t and TPU 8i are designed to match that change.

Image from Business Insider

In the same reporting, Google Cloud CEO Thomas Kurian is quoted saying the decision to create two new chips is a “natural evolution,” and he links the design to power constraints, saying the chips were designed to be efficient in how much power they use “because we felt that power efficiency would become a constraint as people continue to scale both training and inference.”

The News International similarly reports that Google Cloud CEO Thomas Kurian called the split a natural evolution and cited “power efficiency as a core design constraint,” while also stating that “Both chips are expected to be available later this year.”

Los Angeles Times adds a broader rationale from Google Chief Scientist Jeff Dean, who said in an interview, “it now becomes sensible to specialize chips more for training or more for inference workloads,” and he added, “We are looking at a whole bunch of different things,” including “the speed of AI results it wants to enable.”

The same Los Angeles Times piece notes that Nvidia’s GPUs remain the “gold standard for AI, particularly for training more advanced models,” but it frames the competitive push around inference uses like “chips meant to cut down response times for chatbots and AI agents.”

Taken together, the reporting portrays Google’s split as a response to latency, power, and scaling pressures that become more acute when models are served continuously rather than trained in bulk.

Performance claims and architecture

Google’s chip split is accompanied by detailed performance and design claims that different outlets highlight in different ways.

Google is for the first time splitting its AI chips into two lines

Business Insider

TechCrunch says Google’s TPU 8t and TPU 8i deliver “up to 3x faster AI model training,” “80% better performance per dollar,” and the ability to coordinate “1 million+ TPUs to work together in a single cluster.”
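As a back-of-envelope reading of the “80% better performance per dollar” figure, the claim implies roughly 1.8x the work for the same spend, or the same work at about 56% of the cost. A quick sketch (the baseline of 1.0 is a hypothetical normalization, not a reported number):

```python
# Hypothetical normalization of the "80% better performance per dollar"
# claim; only the 1.8x multiplier comes from the coverage.
prev_perf_per_dollar = 1.0
new_perf_per_dollar = 1.8 * prev_perf_per_dollar  # "80% better"

# Same budget -> 1.8x throughput; same throughput -> ~55.6% of the cost.
cost_ratio = prev_perf_per_dollar / new_perf_per_dollar
print(f"{cost_ratio:.1%}")  # 55.6%
```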

The Motley Fool adds more granular comparisons, stating that TPU 8t is designed to reduce the time it takes to develop frontier models from “months to weeks,” and it attributes that to “3 times the compute performance, 10 times faster storage access, and double the chip data transfer rate than its predecessor.”
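If a frontier training run is compute-bound, the quoted 3x compute figure shortens it roughly proportionally, which is consistent with the “months to weeks” framing. A sketch with a hypothetical baseline:

```python
# The 12-week (~3 month) baseline is hypothetical; only the 3x
# multiplier comes from The Motley Fool's reporting.
baseline_weeks = 12
compute_speedup = 3.0  # "3 times the compute performance"
print(baseline_weeks / compute_speedup)  # 4.0 weeks
```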

For inference, The Motley Fool says TPU 8i “comes equipped with more memory to reduce the inherent latency (lag) in interactions between AI agents,” and it adds that the processor combines high-bandwidth memory (HBM) with “3 times the amount of static random-access memory (SRAM).”

CNBC reports that Google’s inference chip, dubbed TPU 8i, relies on SRAM, and it provides a specific hardware detail: “Each chip contains 384 megabytes of SRAM, triple the amount in Ironwood.”
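Taken at face value, the tripling implies the prior Ironwood generation carried about 128 MB of on-chip SRAM; the latency argument is that data served from on-chip SRAM avoids round trips to off-chip HBM, which is generally far slower to reach. A sketch of that arithmetic (the HBM/SRAM latency ratio below is a generic rule of thumb, not a reported figure):

```python
# 384 MB is CNBC's reported figure; the Ironwood number follows from
# "triple the amount in Ironwood".
tpu8i_sram_mb = 384
ironwood_sram_mb = tpu8i_sram_mb / 3
print(ironwood_sram_mb)  # 128.0

# Generic rule of thumb (assumption, not from the coverage): reaching
# off-chip HBM costs roughly 10x the latency of on-chip SRAM, so the
# more of a model's working set fits in SRAM, the lower the serving lag.
hbm_to_sram_latency_ratio = 10  # illustrative only
```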

Advisor Perspectives describes how the training chip can be combined into groups of “9,600 semiconductors,” and it quantifies efficiency gains, saying “TPU 8t delivers 124% more performance per watt than the preceding generation, with TPU 8i providing a gain of 117%.”
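Read literally, those percentages mean TPU 8t delivers 2.24x the prior generation’s performance per watt and TPU 8i 2.17x, so energy per unit of work falls to roughly 45-46% of the previous generation. A quick check of that arithmetic:

```python
# "124% more" and "117% more" performance per watt, per Advisor
# Perspectives, relative to a normalized prior-generation baseline.
tpu8t = 1.0 + 1.24  # 2.24x perf/watt
tpu8i = 1.0 + 1.17  # 2.17x perf/watt
print(f"TPU 8t energy per op: {1 / tpu8t:.1%} of prior gen")  # 44.6%
print(f"TPU 8i energy per op: {1 / tpu8i:.1%} of prior gen")  # 46.1%
```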

Even where outlets differ in which metrics they foreground, the common thread is that Google is positioning TPU 8t for compute-intensive training throughput and TPU 8i for latency-sensitive inference, with explicit claims about memory bandwidth, SRAM, and scaling.

Nvidia relationship and collaboration

While Google is positioning the TPU 8 generation as a challenge to Nvidia, the reporting repeatedly emphasizes that Google is not cutting Nvidia out of its stack.

TechCrunch says Google’s chips are “not a full frontal assault on Nvidia’s future,” and it adds that Google is using its TPUs “to supplement the Nvidia-based systems it offers in its infrastructure,” not replace them.

Image from CNBC

TechCrunch also reports that Google promises its cloud will have Nvidia’s latest chip, Vera Rubin, “available later this year,” and it says Google and Nvidia have agreed to work together to engineer computer networking so Nvidia-based systems can perform more efficiently in Google’s cloud.

In particular, TechCrunch says the two are working to beef up the software-based networking tech called Falcon, which Google created and open sourced in 2023 under the Open Compute Project.

The Motley Fool similarly notes that Alphabet remains heavily reliant on Nvidia’s GPUs and calls Google “among the chipmaker's largest customers,” while also describing how Google introduced the first version of its TPU in 2016 and still depends on Nvidia for heavy lifting.

Los Angeles Times reinforces the coexistence, saying Google “relies on a mix of TPUs and GPUs for its own work,” and it quotes Demis Hassabis of Google DeepMind telling Bloomberg, “A lot of people would like to run on both.”

Across these accounts, the competitive narrative is paired with a partnership narrative: Google is building more of its own silicon while still integrating Nvidia hardware and networking to keep customers on a broader, mixed platform.

Adoption, tools, and next steps

Google’s TPU 8 announcement is also presented as part of a wider push to make AI services cheaper, faster, and easier to deploy—especially for agentic systems that require low latency and ongoing execution.

After years of producing chips that can both train artificial intelligence models and handle inference work, Google is separating those tasks into distinct processors, its latest effort to take on Nvidia in AI hardware.

CNBC

Advisor Perspectives says Google’s cloud division unveiled the latest TPU generation “at its Google Cloud Next event,” where it also announced “a $750 million fund to help boost corporate AI adoption” and showed off “tools for building AI agents.”

Image from Los Angeles Times

It adds that the new TPUs “store more information on the chip, helping provide the rapid responses that users crave.” It also quotes Mark Lohmeyer saying, “It’s about how you deliver the lowest possible latency of the response at the lowest possible cost per transaction,” and that “The number of transactions is going way up, and the cost per transaction needs to go way down for it to scale.”
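Lohmeyer’s scaling point has a simple arithmetic core: if transaction volume rises N-fold, cost per transaction must fall roughly N-fold for total serving spend to stay flat. A sketch with hypothetical numbers (none of the figures below are from the coverage):

```python
# All values are hypothetical illustrations of the scaling argument.
txns_before = 1_000_000
cost_per_txn_before = 0.010           # dollars per transaction
txns_after = 10 * txns_before         # 10x transaction volume

flat_budget = txns_before * cost_per_txn_before
required_cost_per_txn = flat_budget / txns_after
print(f"${required_cost_per_txn:.4f} per transaction")  # $0.0010
```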

TechCrunch similarly frames the chips as enabling “a lot more compute for a lot less energy — and cost to customers — than previous versions,” while also stating that Google is not flat-out replacing Nvidia and is promising access to Vera Rubin later this year.

Business Insider and Los Angeles Times both describe how demand is already building among major AI developers, with Los Angeles Times saying leading artificial intelligence developers, including “some of the firm’s biggest rivals,” are “stocking up on them,” and it cites Anthropic’s expanded agreement to access as many as “1 million TPUs.”

The Los Angeles Times account also says Meta signed a multibillion-dollar deal to use TPUs through Google Cloud over several years and that Meta’s head of infrastructure, Santosh Janardhan, said, “It does look like there might be inference advantages,” while noting “no new platform is without hurdles and a learning curve.”

Looking ahead, Advisor Perspectives says AI services built on the chips will be “generally available later this year,” and it also reports that Google will continue to offer services based on Nvidia chips for customers who want systems that “currently dominate AI computing.”
