Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,894
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Jul 27, 2020
16,809
10,747
106
It just shows that most humans are risk-averse and change-averse. People pretending to be wise by going with the popular or trendy thing rather than experimenting and hoping to discover something new. Blindly accepting common misconceptions and falsehoods. No wonder humanity has screwed planet Earth so bad.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,072
2,585
106
Best way to move more units of big expensive datacenter GPUs is to make smaller consumer GPUs that don't use as many wafers.

What are NVidia customers going to do in that scenario, switch to Intel?
The tradeoff argument would be valid only if TSMC has a shortage of capacity, if TSMC is at full utilization.

I have not listened to their latest investor CC, but we can safely assume that TSMC is not at full capacity on the nodes in question (N5, N4).

As for Intel, Intel is also using TSMC. Their old Arc cards are on N6, on which TSMC has endless capacity...
 

gdansk

Platinum Member
Feb 8, 2011
2,212
2,836
136
It just shows that most humans are risk-averse and change-averse. People pretending to be wise by going with the popular or trendy thing rather than experimenting and hoping to discover something new. Blindly accepting common misconceptions and falsehoods. No wonder humanity has screwed planet Earth so bad.
It's not like Radeon is LSD; it's just a slightly cheaper GPU.
 

gaav87

Junior Member
Apr 27, 2024
14
1
11
This is not really true, as shaders are compiled exactly so that the final code can be optimized for the architecture of the card. And Nvidia and AMD can handcode parts of the shader for certain games to optimize it further.

An issue is that the compiler for dual issue was really poor and probably only made modest gains. See the compiler section in: https://chipsandcheese.com/2023/01/07/microbenchmarking-amds-rdna-3-graphics-architecture/
How is my statement "Games are often optimized for wave32 execution." false ?

Gains from dual issue were next to none. Check old gameplay videos on YT of the RX 7900 GRE vs the RX 6950 XT at the same core and memory speed. Only UE5 games got above a 15% performance increase. The rest were around ±0, with even some losses. Dual issue on RDNA3 was still 128b: it doesn't matter that two neighbouring SIMDs could do the same calculation, it is still 128b if the result needs to be equal. This would only increase performance in dot products or ML/AI, as seen in SD, where the 7900 GRE is about 2x faster than the 6950 XT.

I think amd separated the RT calculations from the alu's. And they either went 6x 32wide simd's per wgp for 192b of data as two FMAs needs 1536B so each piece of data needs to be reused 8 times instead of 12x as in rdna3. So they can market +50% performance/watt "slides". And the boost clock is around 2750mhz.
Or they went with some crazy 8x simd 16wide combo and the boost clock is 3050mhz.
I do not see the N7->N4 clock increase at iso power being over 25-30% [N7->N4P 26% and N7->N4X 30%].
Would smaller SIMDs allow them to increase the clock speed? (Not sure about that.)
The reference RDNA2 6800 XT or 6800 boosts to around 2200 MHz at 250 W, so 3050 MHz would be a clock increase of roughly 40%, which is crazy.
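
As a quick sanity check on the clock figures in this post, here is a minimal Python sketch of the percentage arithmetic. The 2200 MHz baseline and the 2750 MHz and 3050 MHz targets are the numbers quoted above; `pct_increase` is just a hypothetical helper name, and nothing here says anything about what RDNA4 actually clocks at.

```python
def pct_increase(base_mhz: float, target_mhz: float) -> float:
    """Relative clock increase from base to target, in percent."""
    return (target_mhz - base_mhz) / base_mhz * 100.0


BASE_MHZ = 2200.0  # approximate RDNA2 reference boost quoted in the post

# The two speculative boost clocks mentioned in the post.
for target in (2750.0, 3050.0):
    gain = pct_increase(BASE_MHZ, target)
    print(f"{BASE_MHZ:.0f} -> {target:.0f} MHz: +{gain:.1f}%")
```

So 2750 MHz is a 25% bump over 2200 MHz, and 3050 MHz is a bit under 39%.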
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,501
3,650
96
How is my statement "Games are often optimized for wave32 execution." false ?
They aren't.
Shader compiler does the 'optimization'.
Gains from dual issue were next to none.
You're not supposed to have much if any. It's an opportunistic throughput hack.
I think amd separated the RT calculations from the alu's. And they either went 6x 32wide simd's per wgp for 192b of data as two FMAs needs 1536B so each piece of data needs to be reused 8 times instead of 12x as in rdna3. So they can market +50% performance/watt "slides". And the boost clock is around 2750mhz.
Or they went with some crazy 8x simd 16wide combo and the boost clock is 3050mhz.
that's really-really not what happened.
They're not doing ALU spam.
I get it, you understand next to nothing about GPUs.
But no need to write essays about it.
3050mhz would be around 40-45% clock increase that is crazy.
baby clocks
 

Aapje

Golden Member
Mar 21, 2022
1,434
1,954
106
How is my statement "Games are often optimized for wave32 execution." false ?

You don't seem to have read my reasons, so to repeat myself:
- Games aren't specifically optimized for certain cards, the shaders are compiled for the specific card on the PC of the user
- The shader compiler for RDNA3 is/was bad and often doesn't take advantage of dual issue opportunities (see the link from my previous message that you didn't read)

And what adroc says is correct as well, dual issue is a bit of a hack that works much better for compute and not so well for games. It can only work for very specific things, since both operations have to work on the same data.
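
To make the wave32/wave64 distinction concrete, here is a toy Python sketch of how many wavefronts a dispatch needs at each wave size. It only assumes that wave32 and wave64 mean 32 and 64 work-items per wavefront; `wavefronts_needed`, `idle_lanes` and the 1000-thread dispatch are made up for illustration, and this models nothing about what the real shader compiler or hardware scheduler does.

```python
def wavefronts_needed(threads: int, wave_size: int) -> int:
    """Number of wavefronts required to cover `threads` work-items."""
    if threads < 0 or wave_size <= 0:
        raise ValueError("threads must be >= 0 and wave_size must be > 0")
    # Integer ceiling division: a partially filled wavefront still counts.
    return (threads + wave_size - 1) // wave_size


def idle_lanes(threads: int, wave_size: int) -> int:
    """Lanes left without a work-item in the last, partially filled wavefront."""
    return wavefronts_needed(threads, wave_size) * wave_size - threads


THREADS = 1000  # hypothetical dispatch size

for wave_size in (32, 64):
    waves = wavefronts_needed(THREADS, wave_size)
    idle = idle_lanes(THREADS, wave_size)
    print(f"wave{wave_size}: {waves} wavefronts, {idle} idle lanes")
```

The same shader source covers both cases; only the wave size picked at compile time changes how the work-items are grouped.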
 

gaav87

Junior Member
Apr 27, 2024
14
1
11
They aren't.
Shader compiler does the 'optimization'.

You're not supposed to have much if any. It's an opportunistic throughput hack.

that's really-really not what happened.
They're not doing ALU spam.
I get it, you understand next to nothing about GPUs.
But no need to write essays about it.

baby clocks
1. Known shaders are w32; they can optimize shaders by hand for w64...
2. That's what I said, so why are you quoting me?
3. Maybe not next to nothing, but I'm just a civil structural engineer who reads the white papers for fun as a hobby, Mr. Know-It-All.
You don't need to be rude af.

So you think they changed nothing from rdna3 and magically managed to get a 500-600mhz clock increase ? xD
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,501
3,650
96
The shader compiler for RDNA3 is/was bad and often doesn't take advantage of dual issue opportunities (see the link from my previous message that you didn't read)
You don't need dual issue when you can just emit w64.
1. Known shaders are w32 they can optimize shaders by hand for w64...
They aren't.
Shader compiler compiles them to either w32 or w64.
Maybe not next to nothing but im just, a civil structural engineer and read the white papers for fun as a hobby mr know it all.
Well yeah then you gotta read up on GPU programming basics and shader toolchains.
So you think they changed nothing from rdna3 and magically managed to get a 500-600mhz clock increase ? xD
RDNA3.5 gets a 500-600Mhz clock increase.
RDNA4 is an unrelated microarchitecture.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,501
3,650
96
Also im sad that instead of saying why my reasoning was wrong you insulted me.
I'm telling you to read up on basics.
You need to understand how GPUs work before making some WGP design assumptions.
Why would 6x 32wide simd's be a bad idea ?
You need to upsize every other part of the WGP to make that work.
Hurts fmax, major gamble too.
So PS5 Pro's GPU clock ceiling should be 2.7-2.8GHz?
Not even the slightest of idea since consoles live in a discrete la-la-land wrt binning targets.
Does RDNA4 clock below or above RDNA3.5?
Bout the same.
 

gaav87

Junior Member
Apr 27, 2024
14
1
11
You need to upsize every other part of the WGP to make that work.
Hurts fmax, major gamble too.
I know that it hurts clocks and that you need to upsize everything; that's why I said a 2750 MHz boost clock. It's still possible, so why is it a gamble?
And what about 2x 4 16-wide SIMDs with shared 2x 64 wave slots, so 128 wavefronts?
A Hint-Assisted Wavefront Scheduler would allow selective out-of-order execution.
Idk why you assumed I do not know the basics when my WGP designs are possible.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,501
3,650
96
I know that it hurts clocks
Then it shouldn't exist!
Still possible why a gamble ?
Because you can sim it only so well and so far.
And what about 2x 4 16wide simds with shared 2x 64 wavefronts ?
You don't need more SIMDs.
More isn't better.
Hint-Assisted Wavefront Scheduler would allow selective out-of-order execution.
the what? GPUs are strictly in-order.
Idk why you assumed i do not know the basics when my wgp design are possible.
You don't since your ideas contradict the way modern shader cores converged on each other.
The biggest big boy shader core to this day is still SMX from Kepler, and guess what? It sucked.
 

SolidQ

Senior member
Jul 13, 2023
342
355
96
It just shows that most humans are risk-averse and change-averse. People pretending to be wise by going with the popular or trendy thing rather than experimenting and hoping to discover something new. Blindly accepting common misconceptions and falsehoods. No wonder humanity has screwed planet Earth so bad.
Gonna show a few examples from another forum.

And there are a lot of other examples. As you can see, for people only NV exists.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
28,654
21,159
146
ATPN is where they house the crazies so they don't infect the rest of the board.
The rules are different for the social forums. In the tech forums everyone is expected to maintain a much higher level of decorum. That includes no profanity, personal attacks, and trolling.

B3D™ virus is spreading.
Anyone that will follow the forum guidelines is welcome here. Brand preference is also perfectly fine. As long as they keep the pom pom shaking and trash talk of other teams to being of the mild variety, and in their favorite vendor's threads, no rules are broken. Going to other vendors threads to antagonize, troll, or trash talk the "other teams" is when the guilty will be punished. If anyone with a brand preference does not like what the "other teams" are saying in their threads? Don't read those threads. Jumping in to defend your team is not permitted. As the Offspring rocked so hard - gotta keep em separated!
- *Looks over at ATPN* uh yeah keep telling yourself that my guy
I know you and your sense of humor. I am a fan of it. But here's a heads up. This technically falls under the "no moderator callouts" rule. The only place to question and complain about moderation is the moderation discussions forum. Which ironically perhaps, our corporate overlords have not fixed for all members to access yet. Hence, I will not enforce the rule about PMing mods directly given the situation, and you can PM me any time. Even if you have beef with me, I will get you in group chat with the Administration so your grievances can be addressed.

I hope this post helps out and informs the newer members here. Welcome aboard and happy posting.

Mod DAPUNISHER
 