Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,101
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

blckgrffn

Diamond Member
May 1, 2003
9,179
3,144
136
www.teamjuchems.com
RDNA2 is XSX actually. PS5 is somewhere between RDNA1 and 2. Sony seems to be more eager to have its own input in the silicon (think Kraken) whereas Microsoft's console chips are closer to what AMD puts on the market as well.
Based on what I have read, it's definitely closer to RDNA2 than RDNA1. There is what, one feature discrepancy where Sony did it "their way": the standard RDNA2 "Mesh Shaders" as used in the PC parts and the XSX versus the Sony implementation of "Primitive Shaders" used only in the PS5. And even then, it appears that Mesh Shaders are an abstraction built on Primitive Shaders. Mesh Shaders appear to be part of DX12U, and that's likely why we see this advertised on PC and XSX, whereas the PS5 presumably isn't using DX12U (ha).

The PS5 GPU is essentially a 6700 non-XT. Heck, it probably even has the ability to use mesh shaders, but it's not exposed via an API; you'd have to put it together on top of the Primitive Shaders, which the AMD drivers apparently do. That's likely what Alan Wake 2 did.

What other major differences are there?

Thread for reference:

"Primitive shader is not the older version of Mesh Shaders. Primitive shader was proposed as the standard by AMD in 2017 while 2018 Nvidia proposed their implementation which Microsoft adopted in 2019 into DX12U in the form of Mesh Shaders. Primitive shaders still exist in AMD GPUs starting from Vega to RDNA 3.

On AMD GPUs, Primitive Shaders are what enable Mesh Shaders. How they function depends on which API you are using. In DX12 they function as Mesh Shaders, but they are the same Primitive Shaders in all AMD GPUs.

Mr. Wang
Certainly, Mesh Shader was adopted as standard in DirectX 12. However, the new geometry pipeline concept originally started with the concept of tidying up the complicated geometry pipeline, making it easier for game developers to use, and to make it easier to extract performance. In other words, it can be said that both AMD and NVIDIA had the same goal as the starting point of the idea. To put it bluntly, Primitive Shader and Mesh Shader have many similarities in terms of functionality, although there are differences in implementation.
So did AMD abandon the Primitive Shader? As for hardware, Primitive Shader still exists, and Mesh Shader support is realized with Primitive Shader; that is how it maps onto Mesh Shader.

Mr. Wang
Primitive Shader as hardware exists in everything from Radeon RX Vega to the latest RDNA 3-based GPU. When viewed from DirectX 12, Radeon GPU's Primitive Shader is designed to work as a Mesh Shader."



As for next gen implications, it will be interesting to see what's truly in the PS5 Pro. It's looking like a more custom hybrid of RDNA3 and RDNA4, pulling in some RT specific RDNA4 hardware on what otherwise appears to be a downclocked 7800XT in the dev kits. That will be significantly more custom than the GPU in the PS5, IMO. Since it might be coming out about the same time as RDNA4 is launching, it will be nice to have those bits be as advanced as possible.

I am also interested in whether there will be any IC. Based on the memory bandwidth numbers on the dev kits it doesn't seem like it, which is a bit of a mystery since it's such a great performance per watt uplift and these consoles seem pretty optimized on that front in other ways.

6700 --> 7800XT+ might not set anyone's hair on fire here, but it should really bring native 4K/30FPS/RT gaming to some sort of reality.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,101
136
GDS finally removed in GFX12 (not in GFX11, as per rumors)
Significant changes in ISA incoming due to this only. Now Shaders can only export to RB+ or to Primitive Units. ROOE should finally work this time. It's mostly deactivated on GFX11 due to texture corruption.
New Cache policy dependent load store is the other interesting change thus far.
 

branch_suggestion

Senior member
Aug 4, 2023
278
599
96
GDS finally removed in GFX12 (not in GFX11, as per rumors)
Significant changes in ISA incoming due to this only. Now Shaders can only export to RB+ or to Primitive Units. ROOE should finally work this time. It's mostly deactivated on GFX11 due to texture corruption.
New Cache policy dependent load store is the other interesting change thus far.
This is one area where AMD/NV have different ideas, with regard to both client GPUs and DC accelerator GPUs. Some convergence and some divergence. AMD has gone for a very simple solution, with MALL being the point of coherence.
 

soresu

Platinum Member
Dec 19, 2014
2,883
2,092
136
Mesh Shaders appear to be part of DX12U
An equivalent function now exists in Vulkan too, albeit not part of any fixed Vulkan version increment as yet.

This coming January will be 2 years since the release of VK 1.3 so we may see some action on that front vis a vis DX12U equivalence in a fixed standard.
 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
Based on what I have read, it's definitely closer to RDNA2 than RDNA1.
It certainly is. I guess a better way to put the difference between Sony's and Microsoft's approaches is that Sony appears to get much more involved in the development, sometimes making slightly different choices (like sticking with primitive shaders, as you mentioned), whereas Microsoft gladly takes whatever the end result is (which "naturally" will be built around a DX implementation anyway).

What's telling are the GFX IDs, which are chronological. PS5 should be GFX1013, so it originally started as an RDNA1 implementation that then went through the development of all the newer, higher IDs. XSX has GFX1020 (based on RDNA2), so Microsoft likely picked a ready and done implementation instead.
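To make the ID comparison concrete, here is a minimal sketch of how such names can be split and ordered. It assumes the LLVM-style convention where the last two characters are the minor version and stepping (one hex digit each) and everything before them is the major version. The helper name `parse_gfx_id` is made up for illustration, and the console-to-ID mapping is just the claim from this post, not something confirmed.

```python
def parse_gfx_id(name):
    """Split a target name like 'gfx1013' into (major, minor, stepping).

    Assumption: the last two characters are minor and stepping, one hex
    digit each, and the digits before them are the decimal major version.
    """
    name = name.lower()
    if not name.startswith("gfx") or len(name) < 6:
        raise ValueError(f"unexpected GFX ID: {name!r}")
    body = name[3:]
    major = int(body[:-2])
    minor = int(body[-2], 16)
    stepping = int(body[-1], 16)
    return (major, minor, stepping)


# IDs mentioned in the thread; the PS5/XSX mapping is the poster's claim.
ids = ["gfx1020", "gfx940", "gfx1013"]
ordered = sorted(ids, key=parse_gfx_id)
print(ordered)
```

Sorting by the parsed tuple rather than by the raw string matters: a plain string sort would put "gfx940" after "gfx1020".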
 

DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,101
136
This is one area where AMD/NV have different ideas, with regard to both client GPUs and DC accelerator GPUs. Some convergence and some divergence. AMD has gone for a very simple solution, with MALL being the point of coherence.
Device level coherence surely based on MALL (at least for MI300A) and system level coherence using IF/MALL.
But the regular Graphics pipeline relies on L1 and L2 crossbars mainly at SE level.
Removal of GDS is interesting because there is simply a lot of opcodes involving GDS.
There are new instructions for sync/barrier/fence in the patches though.

The dynamic VGPR allocation would be interesting if true.
VOPD hardly changed, same restrictions like GFX11. Was hoping for additional register banks for true dual issue but seems not the case.
 

Ajay

Lifer
Jan 8, 2001
15,959
8,068
136
The dynamic VGPR allocation would be interesting if true.
VOPD hardly changed, same restrictions like GFX11. Was hoping for additional register banks for true dual issue but seems not the case.
Hmm, seems like AMD is leaving some 'free' performance on the table for some reason. I wonder what is restricting the dual issue so much - NV doesn't seem to have this problem.
 

Saylick

Diamond Member
Sep 10, 2012
3,361
7,044
136
Device level coherence surely based on MALL (at least for MI300A) and system level coherence using IF/MALL.
But the regular Graphics pipeline relies on L1 and L2 crossbars mainly at SE level.
Removal of GDS is interesting because there is simply a lot of opcodes involving GDS.
There are new instructions for sync/barrier/fence in the patches though.

The dynamic VGPR allocation would be interesting if true.
VOPD hardly changed, same restrictions like GFX11. Was hoping for additional register banks for true dual issue but seems not the case.
This is a bummer. I was reading through C&C's RDNA 3 microbenchmarking article and they state that VOPD instructions are not used as often as they could be due to an unoptimized compiler. In the example code they presented, there were some obvious situations where a human would easily see the dual-issue opportunity that the compiler missed. Knowing that AMD's software team is generally not comparable to the competition, I am not hopeful for future VOPD optimizations.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,415
1,733
136
The dynamic VGPR allocation would be interesting if true.
VOPD hardly changed, same restrictions like GFX11. Was hoping for additional register banks for true dual issue but seems not the case.

I don't think we can derive info from what's unchanged yet. It seems to me that they started by copying the GFX11 stuff, and are now gradually making changes to it. Any part that is as yet unchanged might just be something they haven't gotten to, while anything that has been changed probably reflects an actual real change.
 

PJVol

Senior member
May 25, 2020
600
533
136
Meanwhile...


So, assuming this isn't just someone's hopeful fantasy, could it be that the ASIC known as "Navi4C" (or whatever it's called now) was originally planned for the RDNA5 launch schedule?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
Is it even worth it to post it here? It's pure speculation based on N4C and not even a good one.

Just looking at the second one, it's clear he is talking nonsense. There is no reason for AMD to have two different SEDs, or rather for that second one to be using 9 smaller SEDs instead of 7 bigger ones, in my opinion.

Regardless, if AMD releases such a product, I have to wonder what the other models would look like. 3-6-9 SEDs? Other combinations? Based on performance or?
BTW, even 3 SEDs would have 25% more WGPs compared to N31 if it's 20 WGPs per SED.
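The 25% figure is easy to check. A quick sketch, assuming the speculated 20 WGPs per SED from this post and N31's 48 WGPs (96 CUs) as the baseline; the function name is just for illustration:

```python
N31_WGPS = 48  # Navi 31 has 96 CUs, i.e. 48 WGPs


def wgp_uplift_pct(sed_count, wgps_per_sed=20, baseline_wgps=N31_WGPS):
    """Percent change in total WGPs versus the baseline GPU.

    wgps_per_sed=20 is the speculated figure from the post, not a known spec.
    """
    total = sed_count * wgps_per_sed
    return (total / baseline_wgps - 1) * 100


print(wgp_uplift_pct(3))  # 25.0
```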
 

PJVol

Senior member
May 25, 2020
600
533
136
Is it even worth it to post it here? It's pure speculation based on N4C and not even a good one.
Why not? Isn't that what this thread is for?
Just looking at the second one, it's clear he is talking nonsense. There is no reason for AMD to have two different SEDs, or rather for that second one to be using 9 smaller SEDs instead of 7 bigger ones, in my opinion.
Are you sure that your opinion is backed up enough tech-wise so as not to look plain stupid later?
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,415
1,733
136
Are you sure that your opinion is backed up enough tech-wise so as not to look plain stupid later?

I'd back him up on this. The big win of having a separate SED that you tile to make products is that on the leading edge nodes, chip design is freaking expensive. By only having a single such design and then duplicating it in products, you save tons of money. There's no way they are making two fairly similar ones.

If they are unsure about how high to scale the high-end product at this point, what they are uncertain about is how many of the SEDs they are going to pack in, not about how much resources each one should have.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
Why not? Isn't that what this thread is for?
Is RTG even worth our time? How many times was he correct? BTW, he is already backpedaling in what you posted in case he is wrong.
Are you sure that your opinion is backed up enough tech-wise so as not to look just stupid later?
Are you saying it makes more sense to use 9 smaller SEDs than 7 bigger SEDs for basically the same number of WGPs (135 vs 140)? And AMD would also need to design another SED for that, that's not cheap and doesn't look like it's really needed.
So is what I wrote really so stupid? To me it looks more realistic than what he wrote.
 

PJVol

Senior member
May 25, 2020
600
533
136
There's no way they are making two fairly similar ones.
And AMD would also need to design another SED for that, that's not cheap and doesn't look like it's really needed.
If you both bothered to watch the video, it says that the info comes from different sources, or rather, one of them is more up-to-date.
Is RTG even worth our time? How many times was he correct?
Our? Anyway, it's your own business. Personally, I don't see how his info is less relevant than most people's guesswork here.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
If you both bothered to watch the video, it says that the info comes from different sources, or rather, one of them is more up-to-date.
RTG has sources?

I am very skeptical. I wouldn't be surprised if his sources are actually tech forums like this one.

Our? Anyway, it's your own business
Didn't mean exactly you but others. I don't think I am the only one who doesn't believe in him having any real sources.

You can believe him if you want, it's your own business. I said what I think about him and this info of his.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
There is plenty of possibilities from engineering perspective.

9 SEDs gives you 3x3 array. What gives you 7 SEDs? How would you arrange them?
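With 2 different SEDs (20 WGPs in the big one, 15 in the small one) it would look like this, with each step shown relative to the next config down: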
9 big SEDs in a 3x3 array -> 180 WGPs (+33.3%)
9 small SEDs in a 3x3 array -> 135 WGPs (+12.5%)
6 big SEDs in a 3x2 or 2x3 array -> 120 WGPs (+33.3%)
6 small SEDs in a 3x2 or 2x3 array -> 90 WGPs (+12.5%)
4 big SEDs in a 2x2 array -> 80 WGPs (+33.3%)
4 small SEDs in a 2x2 array -> 60 WGPs (duplicate)
3 big SEDs in a 3x1 or 1x3 array -> 60 WGPs (+33.3%)
3 small SEDs in a 3x1 or 1x3 array -> 45 WGPs (+12.5%)
2 big SEDs in a 2x1 or 1x2 array -> 40 WGPs (+33.3%)
2 small SEDs in a 2x1 or 1x2 array -> 30 WGPs (100%)
What SKUs would you make out of this? The performance jumps would be uneven, either too big or pretty small, and there is even a duplicate in there. I personally wouldn't bother designing another SED for this.
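For anyone who wants to play with other combinations, the list above can be reproduced with a few lines. This is only a sketch under the thread's assumptions: 20 WGPs per big SED, 15 per small SED (135 / 9), and SED counts limited to those that tile as 2x1, 3x1, 2x2, 3x2 and 3x3. None of these are confirmed specs.

```python
# Assumed per-SED sizes: 20 WGPs ("big", the figure floated in the thread)
# and 15 WGPs ("small", i.e. 135 WGPs / 9 SEDs). Speculation, not specs.
WGPS_PER_SED = {"big": 20, "small": 15}
SED_COUNTS = [2, 3, 4, 6, 9]  # counts that tile as 2x1, 3x1, 2x2, 3x2, 3x3


def sku_ladder(wgps_per_sed=WGPS_PER_SED, sed_counts=SED_COUNTS):
    """Return (wgps, label, step_pct) rows sorted by WGP count, ascending.

    step_pct is the percent increase over the previous row; None marks the
    smallest config and 0.0 marks a duplicate WGP count.
    """
    configs = sorted(
        (count * size, f"{count} {name}")
        for name, size in wgps_per_sed.items()
        for count in sed_counts
    )
    rows = []
    prev = None
    for wgps, label in configs:
        step = None if prev is None else round((wgps / prev - 1) * 100, 1)
        rows.append((wgps, label, step))
        prev = wgps
    return rows


for wgps, label, step in sku_ladder():
    note = "baseline" if step is None else f"+{step}%"
    print(f"{label:>8} SEDs -> {wgps:3d} WGPs ({note})")
```

Passing `sed_counts=range(1, 10)` with only the big SED in `wgps_per_sed` gives the single-design alternative, where 7 SEDs lands at 140 WGPs, right next to the 135 of the 9-small config.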

BTW, why should we be limited to 3x3, 2x3, 1x3, 2x2, 2x1 arrays?
Nowhere is it written that you can't have 1, 2, 3, 4, 5, 6, 7, 8 or 9 SEDs.
Why can't I use 7 SEDs, for example? Because it doesn't look pretty? 7900XT and 7700XT also have an uneven number of MCDs and no one cares.
It's not like you can't have a single SED in the last row (or column) and have to force yourself to design another SED instead.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
BTW, why should we be limited to 3x3, 2x3, 1x3, 2x2, 2x1 arrays?
Nowhere is it written that you can't have 1, 2, 3, 4, 5, 6, 7, 8 or 9 SEDs.
Why can't I use 7 SEDs, for example? Because it doesn't look pretty? 7900XT and 7700XT also have an uneven number of MCDs and no one cares.
It's not like you can't have a single SED in the last row (or column) and have to force yourself to design another SED instead.
Most likely - interconnects, and how they are placed on the dies.

If I understand this correctly, it's possible to reuse the SEDs on different types of products, so AMD would want to aim for the best possible interconnect for the dies.

A single die could then be used in an APU, the same way it could be used for dGPUs. So AMD has to design ONE die to scale from APUs to dGPUs. But it will require a stupidly complex interconnect.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
Most likely - interconnects, and how they are placed on the dies.
A 7 SED GPU would have the same placement as a 9 SED one, but 2 chips will be missing in one row or column.
Basically the same as we already have with RDNA3 MCDs.
If the problem is with the AID under those SEDs, then that would mean you are basically limited to 3-6-9 SEDs, then why not do just SEDs with 3x more WGPs for 1-2-3 SEDs in total?

For any different combination you would need a different AID (active interposer die), and if you add a different SED, you would need one for that as well.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
A 7 SED GPU would have the same placement as a 9 SED one, but 2 chips will be missing in one row or column.
Basically the same as we already have with RDNA3 MCDs.
If the problem is with the AID under those SEDs, then that would mean you are basically limited to 3-6-9 SEDs, then why not do just SEDs with 3x more WGPs for 1-2-3 SEDs in total?
For any different combination you would need a different AID (active interposer die), and if you add a different SED, you would need one for that as well.
I presume you need coherency of data, so the layout has to be symmetric.

Potentially this may be a requirement for scalable geometry, if you think about it.
 