Discussion RDNA 5 / UDNA (CDNA Next) speculation

soresu · May 15, 2025

basix said:
Sure it is already supported. But if the GPU has enhanced dynamic execution abilities (reordering, out of order, dynamic allocation, dynamic wavefront sizes) it should get more powerful.

True, but that is also true of all (I hope) types of programming models if the µArch is properly designed to be future proof.

SolidQ · May 17, 2025

About Mi450

https://twitter.com/x/status/1923142994963669245

adroc_thurston · May 17, 2025

SolidQ said:
About Mi450

none of that is news and yes it looks like Cray Shasta (EX235a) for obvious reasons.

basix · May 18, 2025

I find it interesting that MI450X is supposed to get its own LPDDR memory pool. I have speculated for a while now that this could be introduced with CDNA4 or CDNA5 and would bring a big benefit. 8x LPDDR packages would also fit on an OAM PCB; there is enough space for that.

Joe NYC · May 19, 2025

basix said:
I find it interesting that MI450X is supposed to get its own LPDDR memory pool. I have speculated for a while now that this could be introduced with CDNA4 or CDNA5 and would bring a big benefit. 8x LPDDR packages would also fit on an OAM PCB; there is enough space for that.

Is that in addition to or as a replacement for HBM?

adroc_thurston · May 19, 2025

Joe NYC said:
Is that in addition to or as a replacement for HBM?

the former.
You can't replace HBM anywhere in HPC.

marees · May 22, 2025

CDNA 5 ?

9.0.0 -> 9.0.6 -> 9.0.8 -> 9.0.10 -> 9.4.2 -> 9.5.0 -> 12.5?

https://twitter.com/x/status/1925505769278992872
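To make the version jump in the list above easier to read, here is a minimal sketch that splits each entry into major/minor/stepping and looks up the architecture family by major number. The `GFX_FAMILY` table and the helper names `parse_gfx` and `family` are illustrative assumptions based on the commonly used public naming, not anything taken from the linked post.

```python
# Major gfx version -> architecture family.
# Assumption: commonly used public naming, listed here only for illustration.
GFX_FAMILY = {
    9: "GCN5 (Vega) / CDNA",
    10: "RDNA 1/2",
    11: "RDNA 3",
    12: "RDNA 4",
}


def parse_gfx(version: str) -> tuple:
    """Split '9.0.10' or '12.5?' into (major, minor, stepping).

    A trailing '?' is dropped and missing parts default to 0.
    """
    parts = [int(p) for p in version.strip().rstrip("?").split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def family(version: str) -> str:
    """Return the family name for a version string, or 'unknown'."""
    return GFX_FAMILY.get(parse_gfx(version)[0], "unknown")


sequence = "9.0.0 -> 9.0.6 -> 9.0.8 -> 9.0.10 -> 9.4.2 -> 9.5.0 -> 12.5?"
for v in (s.strip() for s in sequence.split("->")):
    print(f"{v:>7}  {family(v)}")
```

Every entry up to 9.5.0 shares major version 9, while the last one lands on major version 12, which is what prompts the question of whether it sits on an RDNA 4 era ISA.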

Kronos1996 · May 22, 2025

marees said:
CDNA 5 ?

9.0.0 -> 9.0.6 -> 9.0.8 -> 9.0.10 -> 9.4.2 -> 9.5.0 -> 12.5?

https://twitter.com/x/status/1925505769278992872

This implies it’s built on the RDNA 4 ISA? That would be sooner than expected, I figured RDNA 5 at the earliest. They’ve been adding architecture features useful for AI and datacenter since RDNA 3. I figured that was their game plan, slowly expand the RDNA ISA until it’s ready. If they think it’s ready, who am I to argue?

With the AI race heating up, maybe they decided to speed-run things. A modern ISA should bring very nice PPA improvements and RDNA's cache design is world-leading.

adroc_thurston · May 22, 2025

Kronos1996 said:
That would be sooner than expected, I figured RDNA 5 at the earliest. They’ve been adding architecture features useful for AI and datacenter since RDNA 3. I figured that was their game plan, slowly expand the RDNA ISA until it’s ready. If they think it’s ready, who am I to argue?

It's the opposite, they un-ghetto'd DC CUs.

Kronos1996 · May 22, 2025

adroc_thurston said:
It's the opposite, they un-ghetto'd DC CUs.

You’ll have to elaborate on that for me.

My understanding is that they’re driving to a unified modular CU and ISA. Then just insert additional IP as appropriate for the target market. With GFX 9 having so much legacy baggage, it would seem prudent to use RDNA as the basis. I can’t see AMD throwing out everything for a clean-sheet design again.

Kepler_L2 · May 22, 2025

Kronos1996 said:
You’ll have to elaborate on that for me.

My understanding is that they’re driving to a unified modular CU and ISA. Then just insert additional IP as appropriate for the target market. With GFX 9 having so much legacy baggage, it would seem prudent to use RDNA as the basis. I can’t see AMD throwing out everything for a clean-sheet design again.

Seems to be a "fatter" version of RDNA4 with more compute/matrix throughput. RDNA5/gfx13 might not have that much in common.

adroc_thurston · May 22, 2025

Kronos1996 said:
My understanding is that they’re driving to a unified modular CU and ISA.

They aren't, but ISAs will converge to a point.

Kronos1996 said:
With GFX 9 having so much legacy baggage, it would seem prudent to use RDNA as the basis

What is even legacy baggage here.

Kronos1996 · May 22, 2025

adroc_thurston said:
They aren't, but ISAs will converge to a point.

What is even legacy baggage here.

GCN had terrible PPA in later iterations, which had knock-on effects for efficiency of course. IIRC the memory subsystem was also pretty atrocious and caused a lot of issues getting full theoretical performance. I was under the impression CDNA still had to work around these problems despite improvements. Thanks to chiplets they more or less brute-forced the PPA problem.

RDNA is the exact opposite. Navi 10 was 25% smaller than Vega 20 while delivering similar gaming performance (HBM still gave the older card an advantage.) That’s a pretty impressive increase in PPA and efficiency due to the new architecture. Then of course RDNA 2 introduced the full realization of the new memory subsystem. Between Infinity Cache and 3D cache, AMD's cache design teams are probably the best in the world.

adroc_thurston · May 22, 2025

Kronos1996 said:
GCN had terrible PPA in later iterations which had knock-on effects for efficiency of course

No it didn't, baby vegas in Renoir/Cezanne was really dang good.

Kronos1996 said:
The memory subsystem was also pretty atrocious and caused a lot of issues getting full theoretical performance

No it was alright, just tricky to scale.

Kronos1996 said:
I was under the impression CDNA still had to work around these problems despite improvements

No.

Kronos1996 said:
Thanks to chiplets they more or less brute-forced the PPA problem.

You do understand that MI100 and MI200 are monodie, don't you.
They were super basic products and really competent at their job.

Kronos1996 said:
Navi 10 was 25% smaller than Vega 20 while delivering similar gaming performance (HBM still gave the older card an advantage.)

Yeah but Vega20 wasn't a good config for gaming, a 48CU with higher clocks would be less area and would do the same there.
It was an HPC part, the first one since Hawaii.

Kronos1996 said:
Then of course RDNA 2 introduced the full realization of the new memory subsystem

It just added MALL.
RDNA1 was the one that introduced the new memory subsystem.

reaperrr3 · May 23, 2025

Kronos1996 said:
RDNA is the exact opposite. Navi 10 was 25% smaller than Vega 20 while delivering similar gaming performance (HBM still gave the older card an advantage.) That’s a pretty impressive increase in PPA and efficiency due to the new architecture.

The PPA improvement of RDNA1 was actually quite a disappointment.

VII was only 330mm², even though it had an overkill (for gaming) 4096-bit HBM2 interface, half-rate FP64 (which made CUs bigger than they needed to be for a gaming card) and 64 CUs, even though the Vega 56 and the Fury cards before that had clearly shown that GCN scales poorly from 56 to 64 CUs (and only so-so from 48 to 56; there were some tests for that, too).

Basically, if you took Vega20, removed half-rate FP64 support, cut the HBM interface in half (but kept L2 the same size and went with the fastest available HBM), reduced the CUs to 56 or maybe even 48 like adroc suggested and clocked the thing just ~150-200 MHz higher, you'd end up with a chip of similar size and similar gaming perf as N10, at least in the games back then.

N10 should've had 48 CUs and twice as much L2, then it would've been better (the 40 CUs only made up 81mm² of the chip, so that would've only increased N10's size by like 10%).
But the way they configured N10, its PPA was so-so for a new uArch using N7, not much better than a gaming-focused Vega2 config would've been.
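The area estimate above can be checked with a quick back-of-the-envelope sketch. The 81mm² for 40 CUs and the 330mm² for VII come from the post; the roughly 251mm² Navi 10 die size is an assumption that is not stated in the thread, and the helper name `extra_cu_area` is made up for illustration.

```python
# Figures from the post: 40 CUs occupy about 81 mm^2 of Navi 10.
CU_AREA_TOTAL_MM2 = 81.0
CU_COUNT = 40
# Vega 20 (Radeon VII) die size as quoted in the post.
VEGA20_DIE_MM2 = 330.0
# Assumption (not stated in the thread): Navi 10 die size of roughly 251 mm^2.
N10_DIE_MM2 = 251.0


def extra_cu_area(target_cus: int) -> float:
    """Area added by growing from CU_COUNT to target_cus, assuming linear scaling per CU."""
    per_cu = CU_AREA_TOTAL_MM2 / CU_COUNT
    return per_cu * (target_cus - CU_COUNT)


added = extra_cu_area(48)                                  # 8 extra CUs
growth_pct = 100.0 * added / N10_DIE_MM2                   # die growth from CUs alone
shrink_pct = 100.0 * (1 - N10_DIE_MM2 / VEGA20_DIE_MM2)    # N10 vs Vega 20

print(f"extra CU area: {added:.1f} mm^2")
print(f"N10 growth from 8 more CUs: {growth_pct:.1f}%")
print(f"N10 smaller than Vega 20 by: {shrink_pct:.1f}%")
```

Under these assumptions the eight extra CUs add about 16mm², or between 6% and 7% of the die, which leaves a few percent for the doubled L2 before reaching the "like 10%" figure. The same numbers put N10 at a little under 25% smaller than Vega 20.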

soresu · May 23, 2025

adroc_thurston said:
You do understand that MI100 and MI200 are monodie, don't you.

Eh?

I must be hallucinating, because this doesn't look like a single die package...

adroc_thurston · May 23, 2025

soresu said:
I must be hallucinating, because this doesn't look like a single die package...

It's two GPUs on the same substrate.
No different from K80 et al.

soresu · May 23, 2025

adroc_thurston said:
It's two GPUs on the same substrate.
No different from K80 et al.

We're not talking about stacking multiple dies here, MCM is MCM.

adroc_thurston · May 23, 2025

soresu said:
We're not talking about stacking multiple dies here, MCM is MCM.

It's not MCM, these are two discrete GPUs on the same substrate.
Is Tesla K80 MCM?

marees · May 24, 2025

AMD path tracing

Performant Path Tracing: Two patent filings about next-level adaptive decoupled shading (texture space shading) that could be very important for making realtime path tracing mainstream; one spatiotemporal (how things in the scene change over time) and another spatial (focusing on the current scene). Both work together to prioritize shading resources on the most important parts of the scene by reusing previous shading results and lowering the shading rate when possible. IDK how much this differs from ReSTIR PTGI but it sounds more comprehensive and generalized in terms of boosting FPS.

https://www.reddit.com/r/hardware/comments/1kd14is/amds_postrdna_4_ray_tracing_patents_look_very

soresu · May 25, 2025

Kepler seems to be claiming UDNA has 256 ALUs per CU, which is 2x from RDNA4 based off this diagram:

https://twitter.com/x/status/1925525698422112514

marees · May 25, 2025

soresu said:
Kepler seems to be claiming UDNA has 256 ALUs per CU, which is 2x from RDNA4 based off this diagram:

https://twitter.com/x/status/1925525698422112514

Could be applicable only for CDNA 5 & not RDNA 5, I think

AMD has not shared anything about UDNA so far (nor have there been any leaks). This almost feels like a scam now !!

soresu · May 25, 2025

adroc_thurston said:
No it didn't, baby vegas in Renoir/Cezanne was really dang good.

I was under the impression that, beyond using the significantly less bugged Vega v2, those iGPUs also used some components from RDNA1?

adroc_thurston · May 25, 2025

soresu said:
I was under the impression that, beyond using the significantly less bugged Vega v2, those iGPUs also used some components from RDNA1?

It didn't, it's basically a chopped-off Vega20 with 1/16 DPFP.
Also Vega had no "bugs" besides the new internal shader stages. The IP just sucked. Until it didn't!

soresu said:
Kepler seems to be claiming UDNA has 256 ALUs per CU

no such thing as UDNA.

marees said:
Could be applicable only for CDNA 5 & not RDNA 5, I think

yea.
ALU spam doesn't help you in client, and it helps even less with RTRT.

marees said:
AMD has not shared anything about UDNA so far (nor have there been any leaks).

Because it does not exist.
Client and DC shader cores live on completely separate tracks.
The only thing that's happening is un-ghetto-ing of DC parts into a modern ISA with all the party tricks RDNA gained so far.

soresu · May 26, 2025

adroc_thurston said:
Client and DC shader cores live on completely separate tracks.

Is it even a shader core anymore when the µArch doesn't have any fixed function gfx silicon?
