Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,774
6,757
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
That said, I believe Hopper's problem was the lack of a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
8,067
3,010
146
The 5060 Ti is basically half a 5070 Ti and the 9060XT is half a 9070XT, so slotting into that same performance class as the 7700XT and 5060 Ti is kind of expected.
Fair point, I guess it is just surprising how much the xx60 parts are cut down then compared to the xx70.
 

coercitiv

Diamond Member
Jan 24, 2014
7,163
16,688
136
If I look at 9070XT vs. 4080S it seems that RDNA4 has a very similar bandwidth efficiency like Lovelace and Blackwell.
Why aren't you comparing with 5070Ti / 5080? Yeah, weird numbers don't fit.

The way Nvidia handles its bandwidth needs is irrelevant to the point of this argument. We're in this situation because @reb0rn is likely unable to process performance numbers without an Nvidia reference. Imagine how awkward it would be to make performance estimates in the Blackwell thread using RDNA proxies. The 9060 XT needs to be faster than the 7600 XT by a certain amount. Navi 33 is a much closer arch and therefore easier to use in estimates, the same way AD107 is a proper proxy for GB206 for anyone who is seriously entertaining napkin math on this forum.

So take the bandwidth scaling of RDNA3, especially the monolithic Navi 33, and look at whether RDNA4 improved on it with the 9070 XT. That's the most reliable way to check whether there's going to be a bandwidth issue.
 

basix

Member
Oct 4, 2024
116
239
76
Why compare against SKUs with excessive bandwidth? Blackwell is unlikely to have regressed from Lovelace in bandwidth efficiency, and Blackwell's bandwidth increased by more than its performance would require.

And comparing much smaller GPUs (N33 vs. N48) is also not viable, because smaller GPUs tend to be more bandwidth efficient.

So I would compare against GPUs with as close specs as possible:
- 9070XT vs. 4080S (same LLC, similar bandwidth and performance)
- 9070XT vs. 7800XT
- 9060XT vs. 7600XT
- 9060XT vs. 4060Ti 16GB

The 5080 or 5060 Ti 16GB might suggest that Nvidia has worse bandwidth efficiency than RDNA4. But that is very likely not true if you look at its predecessor. Nor is it true that the 9060XT is much slower than the 5060Ti. Blackwell just has excessive bandwidth available. So I would not use Blackwell and its excessive bandwidth as a proxy for performance estimates, because the estimate will be far off the real result. It is better to use Lovelace, which used GDDR6(X) at the same or only slightly faster memory speeds.

The result:
- RDNA4 has bandwidth efficiency similar to Nvidia's since Lovelace
- RDNA4 improved by about 1.35x compared to RDNA3
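
For anyone who wants to redo this kind of napkin math, here is a minimal Python sketch of the calculation, assuming "bandwidth efficiency" simply means relative performance divided by peak memory bandwidth. The `Gpu` class and `efficiency_gain` helper are made-up names for illustration. The bus widths and data rates are spec-sheet values as I recall them, and the `relative_perf` numbers are placeholders rather than benchmark results, so plug in your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    bus_width_bits: int    # memory bus width (assumed spec-sheet value)
    data_rate_gbps: float  # per-pin memory data rate (assumed spec-sheet value)
    relative_perf: float   # PLACEHOLDER performance index, not a measurement

    @property
    def bandwidth_gb_s(self) -> float:
        # Peak bandwidth in GB/s = bus width in bytes * per-pin data rate
        return self.bus_width_bits / 8 * self.data_rate_gbps

    @property
    def perf_per_gb_s(self) -> float:
        # "Bandwidth efficiency" as used here: performance per GB/s
        return self.relative_perf / self.bandwidth_gb_s


def efficiency_gain(new: Gpu, old: Gpu) -> float:
    """Ratio of performance-per-bandwidth between two GPUs."""
    return new.perf_per_gb_s / old.perf_per_gb_s


# Illustrative inputs only: relative_perf values are hypothetical.
old = Gpu("7800XT", bus_width_bits=256, data_rate_gbps=19.5, relative_perf=1.00)
new = Gpu("9070XT", bus_width_bits=256, data_rate_gbps=20.0, relative_perf=1.38)

print(f"{old.name}: {old.bandwidth_gb_s:.0f} GB/s")
print(f"{new.name}: {new.bandwidth_gb_s:.0f} GB/s")
print(f"efficiency gain: {efficiency_gain(new, old):.2f}x")
```

With these placeholder inputs the bandwidth barely changes between the two cards, so almost all of the assumed performance uplift shows up as an efficiency gain. The ratio says nothing about whether either card is actually bandwidth-limited.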

Disclaimer:
- These numbers do not indicate that these GPUs are operating at their bandwidth limit
- Blackwell very likely has excessive bandwidth available, but this is unverified

The way Nvidia handles the bandwidth need is irrelevant for the point of this argument, [...] Imagine how awkward it would be to make performance estimates on the Blackwell thread using RDNA proxies.
Just as a note:
He directly referenced Nvidia and the fact that their cards lack an Infinity Cache, and implicitly concluded that RDNA4 has better bandwidth efficiency than Nvidia. So what you are saying is simply wrong. It is very relevant to his argument, because he compared Nvidia vs. AMD GPUs. And I did the same.

If you want to point out improvements in RDNA4, then yes, RDNA3 is the reference. But comparing bandwidth efficiency against Nvidia GPUs, well, requires Nvidia GPUs as the reference.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,732
7,963
96
RDNA4 improved by about 1.35x compared to RDNA3
Not really.
RDNA3 just had faaaaar higher clock targets and never hit them.
There is no advantage for AMD in that regard (Last Level Cache).
no, AMD has a more elaborate caching setup, even after L1 became WCB.
- Blackwell has very likely excessive bandwidth available, but this is unverified
Yeah for GB202, not really anything else.
 

marees

Golden Member
Apr 28, 2024
1,103
1,552
96
the next AMD RDNA5/CDNA4, which for now we are calling UDNA (we don't know whether AMD will rename it or keep the current naming)
Just to avoid confusion

MI350x = CDNA 4 (Releasing now)
MI400x = CDNA 5 (releasing next year)

So AMD uncoupled CDNA & RDNA

Because of this CDNA numbers will increase every year
RDNA numbers will increase once in 2 years

(Basically our threads also need to reflect this decoupling )
 

soresu

Diamond Member
Dec 19, 2014
3,783
3,082
136
Just to avoid confusion

MI350x = CDNA 4 (Releasing now)
MI400x = CDNA 5 (releasing next year)

So AMD uncoupled CDNA & RDNA

Because of this CDNA numbers will increase every year
RDNA numbers will increase once in 2 years

(Basically our threads also need to reflect this decoupling )

Given the 'mid cycle' CDNA3 SKU and lack of any MI350A or MI400A rumours I think that their CDNA roadmap is too reactive to market pressures overall to say with any certainty whether they are moving to any fixed cycle.

My question now, with CDNA5/MI4xx getting a big upgrade to its CUs with GFX12.5: is RDNA5 and beyond going to keep forging ahead with new-hotness upgrades across all compute elements with every subsequent µArch, as has been the case since RDNA1?

Or is AMD going to largely freeze the general compute µArch structure of GFX13 for ROCm/HIP kernel portability with GFX12.5, and focus mostly on the domain specific RT, raster and matrix/AI/ML elements going forward until whatever time the DC µArch gets its next significant base CU upgrade a la CDNA9?

If the former then fragmentation between consumer and DC is going to persist, making their claimed "AMD needs to be a software company" pivot twice as hard to pursue even if it makes it easier to chase the bleeding edge of performance in consumer land.

AMD needs to make ROCm/HIP as accessible as possible to compete with CUDA in the long term, and significant fragmentation in basic compute capabilities makes that largely impossible unless they make ROCm hardware abstraction a lot more robust.
 

Kepler_L2

Senior member
Sep 6, 2020
814
3,294
136
lack of any MI350A or MI400A
MI400A exists
AMD needs to make ROCm/HIP as accessible as possible to compete with CUDA in the long term, and significant fragmentation in basic compute capabilities makes that largely impossible unless they make ROCm hardware abstraction a lot more robust.
They just need to adopt an IR.
 