Discussion Nvidia Blackwell in Q4-2024 ?

jpiniero · Nov 15, 2023

tamz_msc said:
16 GB for 5070/Ti (256 bit bus)

For instance, say the 5070/Ti was 160-bit. With 3 GB chips that would be 15 GB. An odd number would be unusual, sure... but that would be 5 memory chips instead of 8.
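
The chip-count arithmetic above can be sketched in a few lines. This is only an illustration: it assumes one GDDR chip per 32-bit channel (no clamshell layout), and the function name `vram_config` is made up for this example.

```python
def vram_config(bus_width_bits: int, chip_capacity_gb: int, channel_bits: int = 32):
    """Return (chip_count, total_gb) for a non-clamshell GDDR layout.

    Assumption: every memory chip occupies exactly one 32-bit channel.
    """
    if bus_width_bits % channel_bits != 0:
        raise ValueError("bus width must be a multiple of the channel width")
    chips = bus_width_bits // channel_bits
    return chips, chips * chip_capacity_gb


print(vram_config(160, 3))  # 160-bit bus with 3 GB chips
print(vram_config(256, 2))  # 256-bit bus with 2 GB chips
```

The first call gives 5 chips and 15 GB, the second 8 chips and 16 GB, which are the two configurations being compared.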

tamz_msc · Nov 15, 2023

For the GB207 that will go into the 5050 series, I guess they'll use a 128 bit bus, so 8GB for those cards. If they price it right (around $250-300), it could be a viable choice as it'll be in between 3070 and 3080 level of performance or slightly higher than the current 4060 Ti.

tamz_msc · Nov 15, 2023

jpiniero said:
For instance, say the 5070/Ti was 160-bit. With 3 GB chips that would be 15 GB. An odd number would be unusual, sure... but that would be 5 memory chips instead of 8.

Nah I don't think they'll go for those weird configurations. I wonder though if you could mod 3GB modules on cards that come with 2GB chips, like it has been done before.

jpiniero · Nov 15, 2023

tamz_msc said:
or slightly higher than the current 4060 Ti.

I would not expect much, esp at the low end. If the 5060 is more than 10% faster than the 4060 I would be surprised.

tamz_msc · Nov 15, 2023

jpiniero said:
I would not expect much, esp at the low end. If the 5060 is more than 10% faster than the 4060 I would be surprised.

It is always the case that the performance of a tier in any given generation is matched or exceeded by the corresponding card one tier lower of the subsequent generation.

jpiniero · Nov 15, 2023

tamz_msc said:
It is always the case that the performance of a tier in any given generation is matched or exceeded by the corresponding card one tier lower of the subsequent generation.

Clearly you haven't been paying attention to nVidia lately.

Edit: Keep in mind I am projecting GB205/6/7 to be substantially smaller than their Ada counterparts... and there's no Cache/IO scaling with N3E.

tamz_msc · Nov 15, 2023

jpiniero said:
Clearly you haven't been paying attention to nVidia lately.

You don't say?

Let's focus on the xx60 tier for the moment:

4060 Ti matching 3070, coming close to 3070 Ti

3060 matching 2070

2060 6GB matching 1070 Ti

All of these are launch reviews.

TESKATLIPOKA · Nov 15, 2023

tamz_msc said:
You don't say?

Let's focus on the xx60 tier for the moment:

4060 Ti matching 3070, coming close to 3070 Ti

View attachment 88901

3060 matching 2070

View attachment 88902

2060 6GB matching 1070 Ti

View attachment 88903

All of these are launch reviews.

Couldn't you make it any larger? It's barely readable. (Sarcasm intended.)
Next time use the spoiler button.

tamz_msc · Nov 16, 2023

Here's Micron's memory roadmap:

MoogleW · Nov 16, 2023

jpiniero said:
Clearly you haven't been paying attention to nVidia lately.

Edit: Keep in mind I am projecting GB205/6/7 to be substantially smaller than their Ada counterparts... and there's no Cache/IO scaling with N3E.

Using the targeted resolutions, let's compare the chips, shall we:

AD107 is faster than GA106 (4060 vs 3060) at 1080p, faster in RT and AI

(MSI GeForce RTX 4060 Gaming X Review, www.techpowerup.com)

AD106 7% slower vs GA104 (4060ti vs 3070ti) at 1080p, faster in RT and AI (https://www.techpowerup.com/review/nvidia-geforce-rtx-4060-ti-founders-edition/32.html)

AD104 matches GA102 (4070ti vs 3090ti) at 1440p, faster in RT and AI (https://www.techpowerup.com/review/asus-geforce-rtx-4070-ti-tuf/32.html)

AD103 is faster than GA102 (4080 vs 3090ti) at 1440p and 4K, faster in RT and AI
(https://www.techpowerup.com/review/nvidia-geforce-rtx-4080-founders-edition/32.html)

GA104 is faster than or equal to TU102 (3070ti vs 2080ti; the RTX Titan was not reviewed, but linear performance scaling inferred from its specs still has it losing to the 3070ti) at 1440p, even faster in RT and AI
(https://www.techpowerup.com/review/nvidia-geforce-rtx-3070-ti-founders-edition/28.html)

GA106 and GA107 are the odd ones out vs TU104 and TU106. TU104 and TU106 are larger dies with better specs in every way, so the rtx 3060 loses to the rtx 2080 super (TU104) and rtx 2070 (TU106); they only match in RT and AI

For the rtx 5060 (assuming GB207) to fail to match or beat the 4060ti (AD106), either:

1) GB207 stays at 24 SMs, which is unlikely, since Blackwell is changing the TPCs per GPC from 6 to 8. AD107 has 2 GPCs, so 24 SMs. A 2-GPC GB207 means 32 SMs, not to mention architectural improvements and maybe clocks.
2) Nvidia moves the 5060ti to GB207, the 5070 to GB206, the 5080 to GB205 and the 5080ti to GB203. I doubt it, but SKUs have moved up and down between chips before.
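
The SM counts in point 1 follow from a simple product. A rough sketch, assuming 2 SMs per TPC (true of Ada, and only assumed here to carry over to Blackwell); the 8 TPCs per GPC figure is the rumor from the post, not a confirmed spec.

```python
SMS_PER_TPC = 2  # Ada value; assumed unchanged for Blackwell


def sm_count(gpcs: int, tpcs_per_gpc: int, sms_per_tpc: int = SMS_PER_TPC) -> int:
    """Total SMs on a die built from identical GPCs."""
    return gpcs * tpcs_per_gpc * sms_per_tpc


ad107 = sm_count(gpcs=2, tpcs_per_gpc=6)          # current chip
gb207_rumored = sm_count(gpcs=2, tpcs_per_gpc=8)  # rumored layout, hypothetical

print(ad107, gb207_rumored)
print(f"SM increase: {gb207_rumored / ad107 - 1:.0%}")
```

With the same two GPCs, going from 6 to 8 TPCs per GPC takes the die from 24 to 32 SMs, roughly a third more.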

MoogleW · Nov 16, 2023

None of the other chips give a good whole number if you apply the ratio of cache to memory bus width that 128MB gives for GB202. So GB202 will have either less or more cache relative to its bus than the other chips.

If GB202 has more relative cache, then the XX203 chips and below would have the same size L2 cache as the rtx 40 series. I think that makes sense due to poor memory scaling: the rtx 40 series basically has 4 bits of bus per 1MB of cache, while 128MB gives 3 bits per 1MB of cache. The lower chips would still enjoy the raw memory bandwidth upgrade from GDDR7.
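
To make the ratio concrete, here is a small sketch using full-die L2 sizes for the Ada chips. The GB202 row is hypothetical: the 128MB figure is the rumor discussed above, and the 384-bit bus is an assumption chosen because it is what yields 3 bits per MB.

```python
# (memory bus width in bits, full-die L2 cache in MB)
chips = {
    "AD102": (384, 96),
    "AD103": (256, 64),
    "AD104": (192, 48),
    "AD107": (128, 32),
    "GB202 (rumored)": (384, 128),  # assumption: 384-bit bus, 128MB L2
}


def bus_bits_per_mb(bus_bits: int, l2_mb: int) -> float:
    """Bits of memory bus per MB of L2 cache."""
    return bus_bits / l2_mb


for name, (bus, l2) in chips.items():
    print(f"{name}: {bus_bits_per_mb(bus, l2):.1f} bits of bus per MB of L2")
```

Every Ada die lands on 4 bits per MB, while the rumored GB202 figures give 3.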

jpiniero · Nov 16, 2023

MoogleW said:
1) GB207 stays at 24 SMs, which is unlikely, since Blackwell is changing the TPCs per GPC from 6 to 8. AD107 has 2 GPCs, so 24 SMs. A 2-GPC GB207 means 32 SMs, not to mention architectural improvements and maybe clocks.

I believe GB205/6/7 will have fewer SMs compared to the Ada counterparts.

MrTeal · Nov 16, 2023

MoogleW said:
Using the targeted resolutions, let's compare the chips, shall we:

AD107 is faster than GA106 (4060 vs 3060) at 1080p, faster in RT and AI

AD106 7% slower vs GA104 (4060ti vs 3070ti) at 1080p, faster in RT and AI (https://www.techpowerup.com/review/nvidia-geforce-rtx-4060-ti-founders-edition/32.html)

AD104 matches GA102 (4070ti vs 3090ti) at 1440p, faster in RT and AI (https://www.techpowerup.com/review/asus-geforce-rtx-4070-ti-tuf/32.html)

AD103 is faster than GA102 (4080 vs 3090ti) at 1440p and 4K, faster in RT and AI

For chips, that is true. Chip numbering doesn't really matter when the cards and price points don't match up gen to gen though. We now get 94% of AD106 in the $399 4060 Ti 8G as a replacement for the 79% of GA104 in the $399 3060 Ti 8G, and performance uplift is... uninspired.

It basically splits the difference between the 3060 Ti and 3070.

We'll have to wait and see if Blackwell brings an Ampere level of performance uplift at similar cards and price points, or whether we get another Turing or Ada.
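
The 94% and 79% figures are the share of the full die's SMs that each card ships with. A quick check, assuming the usual SM counts (34 of 36 for the 4060 Ti on AD106, 38 of 48 for the 3060 Ti on GA104):

```python
def enabled_fraction(enabled_sms: int, full_die_sms: int) -> float:
    """Share of the full die's SMs that are enabled on a given card."""
    return enabled_sms / full_die_sms


cards = {
    "RTX 4060 Ti 8G (AD106)": (34, 36),
    "RTX 3060 Ti 8G (GA104)": (38, 48),
}

for name, (enabled, full) in cards.items():
    print(f"{name}: {enabled_fraction(enabled, full):.0%} of the die enabled")
```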

jpiniero · Nov 18, 2023

Will say that the full GB203 should be comparable or maybe even a tad faster than the 4090. What they will call it (and how close the MSRP is to the 4090's) remains to be seen.

MoogleW · Nov 19, 2023

jpiniero said:
I believe GB205/6/7 will have fewer SMs compared to the Ada counterparts.

How?

MoogleW · Nov 19, 2023

MrTeal said:
For chips, that is true. Chip numbering doesn't really matter when the cards and price points don't match up gen to gen though. We now get 94% of AD106 in the $399 4060 Ti 8G as a replacement for the 79% of GA104 in the $399 3060 Ti 8G, and performance uplift is... uninspired.

View attachment 88919

It basically splits the difference between the 3060 Ti and 3070.

We'll have to wait and see if Blackwell brings an Ampere level of performance uplift at similar cards and price points, or whether we get another Turing or Ada.

It may be more expensive, but the uplift will be there. That is, unless you assume the 5070 will drop to GB206 from AD104.

jpiniero · Nov 19, 2023

MoogleW said:
How?

Because I think the dies will be much smaller comparatively speaking... and with the Cache/IO having no scaling, that doesn't leave a lot of room left. And I expect that room to be taken by AI and possibly RT. And of course by cutting the number of SMs.

I don't think it will be slower in raster but I wouldn't expect much.

MoogleW · Nov 20, 2023

jpiniero said:
Because I think the dies will be much smaller comparatively speaking... and with the Cache/IO having no scaling, that doesn't leave a lot of room left. And I expect that room to be taken by AI and possibly RT. And of course by cutting the number of SMs.

I don't think it will be slower in raster but I wouldn't expect much.

While I do believe GB207 will be small and focused entirely on cost, even a 32SM chip will be smaller than the current 24SM chip, especially if it retains the same cache and memory bus width. N3E offers up to 60% logic scaling. GB207 with 32SM, a 128 bit bus and 32MB L2 cache, plus GDDR6 and architectural improvements, would likely still be 10% smaller than AD107.
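
A back-of-the-envelope version of that estimate, as a sketch only. It reads "60% logic scaling" as 1.6x logic density, treats cache and I/O as not shrinking at all, and assumes all logic area grows in proportion to SM count. The logic share of the AD107 die is not public, so `logic_fraction` is a free parameter here, not a known figure.

```python
def relative_die_area(logic_fraction: float, sm_ratio: float,
                      logic_density_gain: float) -> float:
    """New die area relative to the old die (1.0 means same size).

    logic_fraction: share of the old die that is logic (assumed, not measured)
    sm_ratio: new SM count / old SM count
    logic_density_gain: e.g. 1.6 for "up to 60%" better logic density
    """
    scaled_logic = logic_fraction * sm_ratio / logic_density_gain
    unscaled_cache_io = 1.0 - logic_fraction  # cache and I/O do not shrink
    return scaled_logic + unscaled_cache_io


for logic_fraction in (0.5, 0.6, 0.7):
    area = relative_die_area(logic_fraction, sm_ratio=32 / 24, logic_density_gain=1.6)
    print(f"logic share {logic_fraction:.0%}: {area:.2f}x the old die area")
```

Under these assumptions a 32SM die comes out about 10% smaller when logic makes up roughly 60% of the current die. A smaller logic share shrinks the saving, and any growth in per-SM area would eat into it further.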

The more Nvidia focuses on RT, the more they need CUDA cores (the 'raster performance'), because after calculating the ray paths and hits, the GPU still needs to handle things like color and textures, denoising (outside of DLSS3.5) and resource management. Scaling that hardware down would hinder their efforts, unless they clock 30% better again. RT hardware improvements are not mutually exclusive with gaming performance; they intersect with gaming hardware improvements.

https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

Page 65 (or slide 71) has an in-depth breakdown of an RT frame and the frametimes of its stages.
TLDR: Doubling down on improving RT performance requires more TFLOPs (which means more CUDA cores or very high clocks), and more TFLOPs also helps in games.
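
For reference, peak FP32 throughput is just cores times clock times two (one fused multiply-add per lane per clock). A minimal sketch, assuming 128 FP32 lanes per SM as on Ada and using the RTX 4060's 2.46 GHz boost clock for both cases; the 32SM part is hypothetical.

```python
FP32_LANES_PER_SM = 128  # Ada figure; assumed unchanged for illustration


def fp32_tflops(sm_count: int, clock_ghz: float,
                lanes_per_sm: int = FP32_LANES_PER_SM) -> float:
    """Peak FP32 TFLOPs, counting one FMA (2 FLOPs) per lane per clock."""
    return 2 * sm_count * lanes_per_sm * clock_ghz / 1000


print(f"24 SMs @ 2.46 GHz: {fp32_tflops(24, 2.46):.1f} TFLOPs")
print(f"32 SMs @ 2.46 GHz: {fp32_tflops(32, 2.46):.1f} TFLOPs")  # hypothetical part
```

The only levers in that formula are SM count and clock, which is the point of the TLDR above.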

jpiniero · Nov 25, 2023

MoogleW said:
While I do believe GB207 will be small and focused entirely on cost, even a 32SM chip will be smaller than the current 24SM chip, especially if it retains the same cache and memory bus width. N3E offers up to 60% logic scaling. GB207 with 32SM, a 128 bit bus and 32MB L2 cache, plus GDDR6 and architectural improvements, would likely still be 10% smaller than AD107.

I think it's going to be more than 10% smaller plus you have to account for any changes they make to the SM structure.

SteinFG · Nov 26, 2023

jpiniero said:
I think it's going to be more than 10% smaller plus you have to account for any changes they make to the SM structure.

It's probably N4 or something. Cost per transistor is higher on the newer nodes, so it doesn't make sense for a cost-effective part to be 3nm. It's feasible that 192-bit chips and below will use N4.

blackangus · Jan 30, 2024

So true that Blackwell is delayed?
If so anyone know why? (Lack of competition?)

adroc_thurston · Jan 30, 2024

blackangus said:
So true that Blackwell is delayed?

No.

blackangus said:
If so anyone know why? (Lack of competition?)

Comp is fiercer than ever?
B100 is on track for H2'24 """launch""".

CakeMonster · Jan 30, 2024

October announcement as usual? I'm assuming the """" means you think they might delay the availability somewhat. And I don't really care about the AI chips, I'm thinking of the **90 desktop part as most people here probably.

GDDR7 speculation

blackangus · Jan 30, 2024

adroc_thurston said:
Comp is fiercer than ever?
B100 is on track for H2'24 """launch""".

I am talking consumer, not DC. Should have clarified. =)

adroc_thurston · Jan 30, 2024

blackangus said:
I am talking consumer, not DC. Should have clarified. =)

Neither is delayed.
