Discussion Nvidia Blackwell in Q4-2024 ?

Jul 27, 2020
16,478
10,500
106
Question: is consumer Blackwell also going to be a multi-chip solution? If so, that would make it very expensive since the wafer yield will essentially get halved.
 

Aapje

Golden Member
Mar 21, 2022
1,395
1,885
106
That's not how yield works. Having two smaller chips instead of one bigger one gives better yields per mm2. The only issue is that the packaging may itself have yield issues.

But it's almost certainly going to be a monolith again.
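To put rough numbers on the yield point, here is a minimal sketch using the simple Poisson yield model, Y = exp(-D0 * A). The defect density of 0.1 defects/cm², the 800mm² and 400mm² die sizes, and the 95% packaging yield are illustrative assumptions, not figures for any real process or product, and wafer-edge losses are ignored.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)  # 100 mm² per cm²

def silicon_per_good_product(die_area_mm2, dies_per_product,
                             defects_per_cm2, packaging_yield=1.0):
    """Average wafer area (mm²) consumed per working product.

    Assumes dies are tested before packaging (known-good dies only),
    and that a packaging failure scraps every die in the package.
    """
    die_yield = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / die_yield / packaging_yield

D0 = 0.1  # hypothetical defect density, defects per cm²

mono = silicon_per_good_product(800.0, 1, D0)
chiplet = silicon_per_good_product(400.0, 2, D0, packaging_yield=0.95)

print(f"monolithic 800mm²: {mono:.0f} mm² of wafer per good GPU")
print(f"2 x 400mm² dies:   {chiplet:.0f} mm² of wafer per good GPU")
```

Under these assumptions the two-die part needs noticeably less wafer area per working GPU even after paying a packaging-yield penalty, which is the opposite of yield being halved.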
 

MrTeal

Diamond Member
Dec 7, 2003
3,572
1,710
136
Predictions?

GB202 4NP (N4P) 800mm2~ mono
192SM
128MB L2
512bit 28Gb/s 1792GB/s
3GHz real in-game boost.
To get those specs it would have to be 800mm²; that's effectively +33% over AD102 on basically the same node even if the Blackwell SMs don't use more transistors.

If your predictions are accurate, it'll be interesting to see how cut down consumer RTX 5090 is.
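A quick sketch of where the roughly +33% comes from. It assumes AD102 is about 608.5mm² with 144 SMs (public figures, not stated in this thread) and that the whole die scales linearly with SM count, which is a crude simplification. The `density_gain` parameter is a hypothetical knob for a process or design density improvement.

```python
AD102_AREA_MM2 = 608.5  # assumption: commonly quoted AD102 die size
AD102_SMS = 144         # assumption: full AD102 SM count

def scaled_die_area(target_sms, density_gain=0.0,
                    base_area_mm2=AD102_AREA_MM2, base_sms=AD102_SMS):
    """Estimate die area if everything scales with SM count.

    density_gain is the fractional transistor-density improvement
    (0.06 means 6% denser); 0.0 means the same density as AD102.
    """
    return base_area_mm2 * (target_sms / base_sms) / (1.0 + density_gain)

same_density = scaled_die_area(192)
with_6_percent = scaled_die_area(192, density_gain=0.06)

print(f"192 SM at AD102 density: {same_density:.0f} mm²")
print(f"192 SM with 6% denser:   {with_6_percent:.0f} mm²")
```

192 SMs is 4/3 of 144, so at unchanged density the estimate lands a little above 800mm², and a single-digit density gain only pulls it down to the mid-700s.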
 
Jul 27, 2020
16,478
10,500
106
Are you confusing the just announced server Blackwell with consumer Blackwell? Because he was asking about the latter.
If the consumer Blackwell is a huge monolith, that's bad for their yields and it will be very expensive and hard to cool as a result.
 

MoogleW

Member
May 1, 2022
56
28
61
Question: is consumer Blackwell also going to be a multi-chip solution? If so, that would make it very expensive since the wafer yield will essentially get halved.
Chiplets are cheaper than a monolithic die at a given total die size. Unfortunately for AMD, price is determined by performance, and pricing to match or beat the equivalently performing Nvidia GPU cost them as much as or more than they saved on chiplets. However, a chip like B100 that presents as one GPU may, in theory, scale well enough to avoid the RDNA3 pricing problem.
 

MoogleW

Member
May 1, 2022
56
28
61
To get those specs it would have to be 800mm²; that's effectively +33% over AD102 on basically the same node even if the Blackwell SMs don't use more transistors.

If your predictions are accurate, it'll be interesting to see how cut down consumer RTX 5090 is.
4NP is about 30% denser than 4N when comparing B100 to H100.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
4NP is about 30% denser than 4N when comparing B100 to H100.

Nope, N4P only has 6% higher density vs N5

If they use N4P for consumer Blackwell, they will have to increase the die size to 800mm2 to get any meaningful performance increase over ADA.
 

MoogleW

Member
May 1, 2022
56
28
61
Nope, N4P only has 6% higher density vs N5

If they use N4P for consumer Blackwell, they will have to increase the die size to 800mm2 to get any meaningful performance increase over ADA.
I know about N4P, but I am talking about 4NP. The custom process Nvidia is using and comparing H100 vs B100.

Even if the gains are not the process but the design, it would still translate to GB202
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
I know about N4P, but I am talking about 4NP. The custom process Nvidia is using and comparing H100 vs B100.

Even if the gains are not the process but the design, it would still translate to GB202

I have the feeling that the 30% higher density is only about the bigger die size since both ADA and Blackwell are using the same process.

B100 seems to be close to 850mm2 in die size (per die).
 

MrTeal

Diamond Member
Dec 7, 2003
3,572
1,710
136
It's hard to tell from the keynote, but when Jensen was holding up the two it certainly didn't look like B100 was much larger, if at all, than H100. 30% higher average density than Hopper does seem in the realm of possibility if it's 104B transistors. I'm no Photoshop expert, but just measuring the clearest die shot I could find from the keynote had the two-die B100 complex as 51mm x 33mm using the 11x11 HBM3 as a guide. If my crappy measurements were accurate, that'd be 123.7M/mm².
Hopper's not very dense though; it's 98.3M/mm² while AD102 is 125.4M/mm². B100 being 104B at the same die size as H100 would make it very slightly more dense than AD102. It might be more than a little hopeful to think 4NP GB202 will be +30% over 4N AD102.

It'd be interesting to know how they're counting transistors and the average density of the different areas, especially the new interchip interface. B100 only has 4 memory controllers vs 6 in H100 which might help there.
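For anyone who wants to check the arithmetic, a small sketch of the density comparison. The 51mm x 33mm B100 measurement and the 104B per die are from this post; the H100 (80B, 814mm²) and AD102 (76.3B, 608.5mm²) figures are commonly quoted public numbers that I'm assuming here, not something stated in the thread.

```python
def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Average transistor density in millions of transistors per mm²."""
    return transistors_billion * 1000.0 / area_mm2

# (transistors in billions, area in mm²)
chips = {
    "H100": (80.0, 814.0),                        # assumed public figures
    "AD102": (76.3, 608.5),                       # assumed public figures
    "B100, two dies, measured": (2 * 104.0, 51.0 * 33.0),
    "B100, one die at H100 size": (104.0, 814.0),  # the what-if case
}

densities = {name: density_mtr_per_mm2(t, a) for name, (t, a) in chips.items()}

for name, d in densities.items():
    print(f"{name:28s} {d:6.1f} M/mm²")

# Ratio of the measured B100 estimate to H100
ratio = densities["B100, two dies, measured"] / densities["H100"]
print(f"B100 (measured) vs H100: {ratio:.2f}x")
```

With these inputs the measured B100 estimate comes out around 1.26x H100 but slightly below AD102, so the gap looks more like Hopper being sparse than 4NP being much denser than 4N.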
 

Hitman928

Diamond Member
Apr 15, 2012
5,339
8,108
136
It's hard to tell from the keynote, but when Jensen was holding up the two it certainly didn't look like B100 was much larger, if at all, than H100. 30% higher average density than Hopper does seem in the realm of possibility if it's 104B transistors. I'm no Photoshop expert, but just measuring the clearest die shot I could find from the keynote had the two-die B100 complex as 51mm x 33mm using the 11x11 HBM3 as a guide. If my crappy measurements were accurate, that'd be 123.7M/mm².
Hopper's not very dense though; it's 98.3M/mm² while AD102 is 125.4M/mm². B100 being 104B at the same die size as H100 would make it very slightly more dense than AD102. It might be more than a little hopeful to think 4NP GB202 will be +30% over 4N AD102.

It'd be interesting to know how they're counting transistors and the average density of the different areas, especially the new interchip interface. B100 only has 4 memory controllers vs 6 in H100 which might help there.

We need to pump the brakes hard on the whole +30% density line. This doesn't come from NV or TSMC; it's from a Twitter leaker who claims it without any source or explanation of how he gets to that number. Most likely, he just did what you did here and compared a rough B100 density calculation to H100 but, as you mention, AD102 is just as dense despite being on the "old" 4N node. What is on the chip (how much cache, IO, memory controllers, libraries used, etc.) and how you count the transistors make a huge difference in density calculations. NV is not getting a 30% density improvement from the process tweak. Even if they went from an N5-based node to an N4P-based node with additional tweaks, you're looking at like 7% max.
 
Reactions: Executor_

MrTeal

Diamond Member
Dec 7, 2003
3,572
1,710
136
We need to pump the brakes hard on the whole +30% density line. This doesn't come from NV or TSMC; it's from a Twitter leaker who claims it without any source or explanation of how he gets to that number. Most likely, he just did what you did here and compared a rough B100 density calculation to H100 but, as you mention, AD102 is just as dense despite being on the "old" 4N node. What is on the chip (how much cache, IO, memory controllers, libraries used, etc.) and how you count the transistors make a huge difference in density calculations. NV is not getting a 30% density improvement from the process tweak. Even if they went from an N5-based node to an N4P-based node with additional tweaks, you're looking at like 7% max.
Yeah, the two tweets really seem like kopite7kimi is just saying B100 is 30% denser than H100, which seems to be likely the case. Even his followup tweet indicates you wouldn't think 4NP is +30% over 4N and he doesn't know 4NP.

 

KompuKare

Golden Member
Jul 28, 2009
1,028
972
136
We need to pump the brakes hard on the whole +30% density line. This doesn't come from NV or TSMC; it's from a Twitter leaker who claims it without any source or explanation of how he gets to that number. Most likely, he just did what you did here and compared a rough B100 density calculation to H100 but, as you mention, AD102 is just as dense despite being on the "old" 4N node. What is on the chip (how much cache, IO, memory controllers, libraries used, etc.) and how you count the transistors make a huge difference in density calculations. NV is not getting a 30% density improvement from the process tweak. Even if they went from an N5-based node to an N4P-based node with additional tweaks, you're looking at like 7% max.

I guess one thing about density is this:

So despite N4 being "better" than N5 (although it is not as custom as all that), Navi 31's GCD is far denser.

And tellingly the MCDs are far less dense. So never mind yields: aside from power, having the memory controllers and cache on older nodes is a very big thing. It's just a pity that AMD were so conservative with their implementation, as a 400mm²-plus GCD would have made far more sense to get the halo effect. That certainly seems a far saner strategy to me than insisting that Navi 32 had to be chiplet too.
 

MoogleW

Member
May 1, 2022
56
28
61
I guess one thing about density is this:

So despite N4 being "better" than N5 (although it is not as custom as all that), Navi 31's GCD is far denser.

And tellingly the MCDs are far less dense. So never mind yields: aside from power, having the memory controllers and cache on older nodes is a very big thing. It's just a pity that AMD were so conservative with their implementation, as a 400mm²-plus GCD would have made far more sense to get the halo effect. That certainly seems a far saner strategy to me than insisting that Navi 32 had to be chiplet too.
The parts AMD moved off the GCD notoriously don't scale on smaller process nodes, so you are right imo: they would have reduced the density figures if it were monolithic.

What is interesting is that Nvidia doubled the cache of B100 per die and added 20% more cores at a similar size, so there is still an appreciable improvement, whatever it is.
 

SteinFG

Senior member
Dec 29, 2021
425
480
106
What is interesting is that Nvidia doubled the cache of B100 per die and added 20% more cores at a similar size, so there is still an appreciable improvement, whatever it is.
Any source for 2X cache per die? Can't find it
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Just created an Excel sheet with the specs of B200, B100, H200 and MI300X. If you find any mistakes, please point them out so I can correct them.

 
Reactions: Elfear

biostud

Lifer
Feb 27, 2003
18,253
4,771
136
I guess one thing about density is this:

So despite N4 being "better" than N5 (although it is not as custom as all that), Navi 31's GCD is far denser.

And tellingly the MCDs are far less dense. So never mind yields: aside from power, having the memory controllers and cache on older nodes is a very big thing. It's just a pity that AMD were so conservative with their implementation, as a 400mm²-plus GCD would have made far more sense to get the halo effect. That certainly seems a far saner strategy to me than insisting that Navi 32 had to be chiplet too.
But was the design of the chip too dense to reach the desired clocks?
 
Reactions: Tlh97 and KompuKare