Future Tech

Nvidia reportedly delays Blackwell GPUs until 2025 over packaging issues

Tan KW
Publish date: Mon, 05 Aug 2024, 09:55 PM

Nvidia is understood to be delaying shipments of its Blackwell GPUs until the first quarter of 2025, and it appears the problems may be due to the complexity of the chip-on-wafer-on-substrate (CoWoS) packaging tech that TSMC is using to manufacture the next-gen hardware.

The GPU giant recently informed Microsoft about delays affecting the most advanced models in the Blackwell family, according to The Information. We have asked Nvidia for confirmation.

The issue could mean that volume shipments of chips such as the Blackwell B200 will be delayed by three months or more, disrupting the plans of customers such as Microsoft and Meta, which have reportedly placed orders worth billions of dollars for the new GPUs to drive their AI services.

It also means that Nvidia may have to cancel or postpone some products, in order to focus available supply of silicon on those it considers the highest priority.

The main issue behind these delays to GPU shipments is related to Nvidia's design of the Blackwell architecture, according to a report from semiconductor research firm SemiAnalysis. Specifically, Blackwell is the first high volume design to use the CoWoS-L packaging technology from TSMC, Nvidia's chip manufacturer.

CoWoS is a way of enabling more complex and advanced products to be engineered using chiplets that are interconnected, typically a system-on-chip (SoC) and one or more high bandwidth memory (HBM) chiplets.

However, CoWoS-L is considerably more complex than CoWoS-S, in which the chiplets are mounted on a relatively simple silicon interposer.

CoWoS-L instead uses an organic interposer that acts as a redistribution layer (RDL) to route signals between the chiplets on top, making use of local silicon interconnects (LSIs) and bridge dies that are embedded in the interposer.

An organic interposer is required in order to scale CoWoS packages larger than AMD's MI300 GPU, SemiAnalysis says, as silicon is brittle and handling very thin silicon interposers gets harder as the interposer gets larger. The LSIs and bridge dies help to compensate for the poorer electrical performance of the organic interposer.

Yet a number of issues have emerged with the technology, the analyst says. One is that embedding multiple silicon bridges in the interposer can cause a thermal expansion mismatch between the silicon dies, bridges, organic interposer, and substrate, leading to a warping of the substrate that can break connections.
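The scale of that mismatch can be estimated with the standard linear expansion relation, in which the difference in free expansion equals the difference in expansion coefficients times the length times the temperature change. The sketch below is illustrative only: the function name, the 50 mm span, the 100 °C temperature swing, and the 15 ppm/°C figure for the organic material are assumptions chosen for the example and do not come from the SemiAnalysis report. The roughly 2.6 ppm/°C value for silicon is an approximate textbook figure.

```python
def expansion_mismatch_um(length_mm, cte_a_ppm, cte_b_ppm, delta_t_c):
    """Difference in free thermal expansion, in micrometres, between two
    materials of the same length subjected to the same temperature change."""
    # ppm/°C * mm * °C gives 1e-6 mm, which is 1e-3 micrometres
    return abs(cte_a_ppm - cte_b_ppm) * length_mm * delta_t_c * 1e-3


SILICON_CTE_PPM = 2.6    # approximate textbook value for silicon
ORGANIC_CTE_PPM = 15.0   # assumed, illustrative value for an organic laminate

# Hypothetical 50 mm span heated by 100 °C
mismatch = expansion_mismatch_um(50.0, SILICON_CTE_PPM, ORGANIC_CTE_PPM, 100.0)
print(f"Free-expansion mismatch: {mismatch:.1f} micrometres")
```

Under these assumed figures the two materials would differ by tens of micrometres if free to expand. Because they are bonded together, that difference appears as stress and warpage instead, which is the failure mode the report describes.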

However, the main causes of the delay are the bridge dies, which are thought to need a redesign, according to the SemiAnalysis report, along with a redesign of the top few global routing metal layers and bump out of the Blackwell die itself.

There is also the issue of TSMC not having enough CoWoS packaging capacity to meet demand, as has been reported numerous times. The problem here is that TSMC built up CoWoS-S capacity over the last couple of years, largely to service Nvidia, but now the GPU maker is switching its products to CoWoS-L, SemiAnalysis claims.

While TSMC is building a new fab for CoWoS-L production, the semiconductor contract manufacturer urgently needs to convert its old CoWoS-S capacity in order to be able to keep up with demand.

In the meantime, Nvidia has to make choices about how to use the supply available to it from TSMC. Consequently, SemiAnalysis says it believes the company is focusing almost entirely on the GB200 NVL36/72 rack scale systems, and that HGX form-factors with the B100 and B200 are "effectively now being cancelled outside of some initial lower volumes."

To satisfy demand, Nvidia will also bring to market a Blackwell GPU called the B200A, based on the B102 die that is additionally earmarked for Nvidia's "China special" B20 GPU. The B102 is a single monolithic die with four stacks of HBM, which allows the chip to be packaged on CoWoS-S instead of CoWoS-L, according to SemiAnalysis.

All of this is unlikely to hurt Nvidia too much. Financial news site Barron's says that the GPU maestro may see a few billion dollars in revenue arrive in early 2025 instead of late 2024, but that customers still can't get all the Hopper chips they want, so the company could just crank out more of those.

However, Nvidia may face further problems with the B20. According to a report by the South China Morning Post, Washington is considering a further tightening of export restrictions that would prevent the new GPU from being sold in China, its intended market.

Late last year, Secretary of Commerce Gina Raimondo warned that the US would have to keep tightening restrictions to prevent its export controls on AI-capable chips from being circumvented.

"If you redesign a chip around a particular cut line that enables them to do AI, I'm going to control it the very next day," she said at the time. ®

 

https://www.theregister.com//2024/08/05/nvidia_delays_blackwell_gpus_until/
