Discussion RDNA4 + CDNA3 Architectures Thread

Page 30

DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,894
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of activity in LLVM, there are a lot of commits. Maybe it's because the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before it gets merged on GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper has a problem: there is no host CPU capable of PCIe 5 in the very near future, so it might get pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Tuna-Fish

Golden Member
Mar 4, 2011
1,366
1,595
136
There are supposedly 2 chips. I would expect the smaller one to have 12GB and the bigger to have more.

The smaller one probably has a 64-bit bus, and like 6-8GB of memory.

If release is mid next year, there should be 24Gb memory chips. So: 6GB with a 64-bit bus, 12GB with 128-bit, 18GB with 192-bit.
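The capacity math above assumes one memory chip per 32-bit channel. A minimal sketch, assuming 32-bit GDDR devices and no clamshell mode (clamshell would double these figures):

```python
# VRAM capacity from bus width and per-chip density.
# Assumption: one 32-bit GDDR chip per 32-bit channel, no clamshell mode.

def vram_gb(bus_width_bits: int, chip_density_gbit: int, bits_per_chip: int = 32) -> float:
    """Return total VRAM in gigabytes for a given bus width and chip density (in gigabits)."""
    chips = bus_width_bits // bits_per_chip
    return chips * chip_density_gbit / 8  # 8 gigabits per gigabyte


for bus in (64, 128, 192):
    print(f"{bus}-bit bus with 24Gb chips: {vram_gb(bus, 24):.0f}GB")
```

The same function with 16Gb chips gives today's 8GB on a 128-bit bus, which is where N33 sits.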

I would be very surprised if either of the low-end dies had a 256-bit bus, but I've been surprised before. If the rumor that there were supposed to be higher-end dies too is true, I would find it reasonable if N43 had a 128-bit bus, 12GB of VRAM, and performance near the 7700XT (either side), at a much lower cost.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,373
2,868
136
The smaller one probably has a 64-bit bus, and like 6-8GB of memory.

If release is mid next year, there should be 24Gb memory chips. So: 6GB with a 64-bit bus, 12GB with 128-bit, 18GB with 192-bit.

I would be very surprised if either of the low-end dies had a 256-bit bus, but I've been surprised before. If the rumor that there were supposed to be higher-end dies too is true, I would find it reasonable if N43 had a 128-bit bus, 12GB of VRAM, and performance near the 7700XT (either side), at a much lower cost.
I don't think they would release a GPU weaker than N33.
N33 is the weakest RDNA3 GPU and has 128-bit GDDR6 + 8GB VRAM.
Even with GDDR7, a 64-bit bus would give you lower BW than N33; unless you add more IC to compensate, you won't be able to feed a faster GPU.
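To put rough numbers on the bandwidth point: peak bandwidth is bus width times per-pin data rate, divided by 8. The 18 Gbps figure is the RX 7600's (N33) GDDR6 speed; the 32 Gbps GDDR7 rate is an assumption for first-generation parts, not a confirmed spec for any RDNA4 card.

```python
# Peak memory bandwidth comparison.
# Assumption: 32 Gbps for early GDDR7; 18 Gbps is the RX 7600 (N33) GDDR6 speed.

def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


n33 = bandwidth_gbs(128, 18)          # 128-bit GDDR6 at 18 Gbps
gddr7_64bit = bandwidth_gbs(64, 32)   # hypothetical 64-bit GDDR7 at an assumed 32 Gbps
print(f"N33 128-bit GDDR6: {n33:.0f} GB/s")
print(f"64-bit GDDR7:      {gddr7_64bit:.0f} GB/s")
```

Under these assumptions, 64-bit GDDR7 lands at 256 GB/s against N33's 288 GB/s, so a faster chip on that bus would lean heavily on Infinity Cache.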

If 24Gbit GDDR7 is out next year, then I expect 128-bit + 12GB VRAM for the weaker one and 192-bit + 18GB VRAM for the stronger one.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,366
1,595
136
N33 is the weakest RDNA3 GPU, but that's because N24 is still being made and sold.

GPUs are not just sold to gamers, there is always going to be a market for very low cost low performance GPUs that you can plug into a machine to get more display outputs, and with support for modern video encode/decode.

These products are not refreshed every generation, but N24 is not great because it lacks AV1; I think they will want to replace it sooner rather than later.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,373
2,868
136
N33 is the weakest RDNA3 GPU, but that's because N24 is still being made and sold.

GPUs are not just sold to gamers, there is always going to be a market for very low cost low performance GPUs that you can plug into a machine to get more display outputs, and with support for modern video encode/decode.

These products are not refreshed every generation, but N24 is not great because it lacks AV1; I think they will want to replace it sooner rather than later.
And why would you need an RDNA4 GPU just to make a card aimed at video support or more display outputs?
 

PJVol

Senior member
May 25, 2020
561
472
106
-IMO the best time to buy an AMD GPU is at the end of a generational cycle at deep discount.
Yeah, that's what I planned to do when I bought it in summer '22, although I missed the discounts (a region-specific thing) and, judging by its availability on shelves, the generational cycle lasted another year. But I'm fine with it.
 

Aapje

Golden Member
Mar 21, 2022
1,434
1,954
106
GPUs are not just sold to gamers, there is always going to be a market for very low cost low performance GPUs that you can plug into a machine to get more display outputs, and with support for modern video encode/decode.
It's not a big market, though. Keep in mind that lower-tier products like a 3050 go into laptops in large numbers. Stronger APUs (with GPU chiplets) will probably replace that lowest tier.

The people who want a low cost card probably need to go up a tier, go second hand, or whatever. Note that since the 4060 is a x050-size chip, it can easily be sold as a low power, low profile card.
 

rtxtwt

Senior member
Jul 2, 2018
319
505
136
A dude named BrockSuire75 had 2 comments under a youtube video:


I want to put out something about AMD GPU s. AMD is not leaving the high end market. As some have seen all over the internet about AMD to stop making high end GPU s. This is 1000% false. AMD has next Gen cards being validating as we speak. I have a few engineering samples I'm evaluating. Keep dreaming, never let anyone stop you!

I work for AMD. Love your video on ROCm!

Here's his linkedin page:

https://www.linkedin.com/in/brock-suire-1576a149
 

Ajay

Lifer
Jan 8, 2001
15,628
7,954
136
Hmm, that's interesting. His LinkedIn profile looks legit. His comment:

@BrockSuire75

1 month ago (edited)
I want to put out something about AMD GPU s. AMD is not leaving the high end market. As some have seen all over the internet about AMD to stop making high end GPU s. This is 1000% false. AMD has next Gen cards being validating as we speak. I have a few engineering samples I'm evaluating. Keep dreaming, never let anyone stop you!
 

Ajay

Lifer
Jan 8, 2001
15,628
7,954
136
An N31 refresh with GDDR7 would make some sense if RDNA4 (monolithic) isn't going to be that fast.
I don't think so. Either the scuttlebutt going about on High End RDNA4 is bogus, or (less likely) there is an RDNA3+ N31b (no point in complicating the design with GDDR7).
It could be that the next-gen tiled RDNA4 got canned early enough that AMD was able to pivot to an RDNA4 design closer to what RDNA3 is. Oftentimes the 'leaks' we get come out well after the fact, so AMD may have had time to change direction. As always, interesting times.

Oh, and some engineer apparently got really p*ssed off over the rumors - wonder what his boss will have to say about his comments.
 

jpiniero

Lifer
Oct 1, 2010
14,688
5,317
136
I don't think so. Either the scuttlebutt going about on High End RDNA4 is bogus, or (less likely) there is an RDNA3+ N31b (no point in complicating the design with GDDR7).

Swapping out the MCD for one that works with GDDR7 would be relatively easy, I would think. I suppose they could do an RDNA4 GCD as well... that would be a lot more work.

If they were to, say, completely pivot back to something like the RDNA3 chiplet design... the question becomes whether they can get it out in the timeframe needed.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Huh, not sure you're supposed to say that, bud. But at this point more and more companies are getting actively hostile towards the ever-grinding rumor hype machine, so maybe he won't get into too much trouble. E.g., Rockstar just officially denied Joe Rogan was going to be in GTA6.

It'd be interesting to know what a high-end RDNA4 GPU looks like. A 256-bit bus, V-Cache stacked up to 128MB, and 20% faster than a 4090 across the board could be really compelling for the holiday season, and then the price could get cut to < $1k once Nvidia's next gen drops.
 

Ajay

Lifer
Jan 8, 2001
15,628
7,954
136
I suppose they could do a RDNA4 GCD as well... that would be a lot more work.
Well, if whatever appeared too complex turned out to be more easily solvable than expected, that would be the best route. Of course, this whole rumor is a puzzle wrapped in an enigma.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Great writeup of how Starfield runs across GPUs: https://chipsandcheese.com/2023/09/...erformance-on-nvidias-4090-and-amds-7900-xtx/

And it reinforces a thought: skip the normal L2 cache on RDNA4 entirely and collapse L1/L2 into a single level. By skipping the L2, you go straight from L1 to the giant chiplet cache, saving expensive die space and reducing latency at the same time. I'm not sure how much the trace length out to the chiplets matters, but if it's a lot, one can imagine an RDNA4 with the 6nm chiplet cache packaged differently, perhaps directly onto the main die instead of onto the PHY dies.
 

rtxtwt

Senior member
Jul 2, 2018
319
505
136
I don't think so. Either the scuttlebutt going about on High End RDNA4 is bogus, or (less likely) there is an RDNA3+ N31b (no point in complicating the design with GDDR7).
This.
The gossip just suggests one of the models (possibly the "flagship") is just a bit smaller than N31 (529 mm2): between N32 (346 mm2) and N31, but closer to N31.

I reasonably suspect that, on TSMC N4, the so-called next-gen RDNA is just a process shrink of RDNA3 with some fixes. Calling it RDNA3+/3.5 is more realistic unless there are major architectural changes. Of course, there could be another option, like going monolithic on N4, but that's less likely. IMO.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,894
136
Great writeup of how Starfield runs across GPUs: https://chipsandcheese.com/2023/09/...erformance-on-nvidias-4090-and-amds-7900-xtx/

And it reinforces a thought: skip the normal L2 cache on RDNA4 entirely and collapse L1/L2 into a single level. By skipping the L2, you go straight from L1 to the giant chiplet cache, saving expensive die space and reducing latency at the same time. I'm not sure how much the trace length out to the chiplets matters, but if it's a lot, one can imagine an RDNA4 with the 6nm chiplet cache packaged differently, perhaps directly onto the main die instead of onto the PHY dies.
AMD's cache hierarchy has been the same since GCN. The L1 is used for shader export and is one client (of many) of the L2. Folding them together would require a major overhaul of the interconnect/architecture, on the scale of the TeraScale --> GCN migration.
The cache hierarchy in the diagram (image not preserved) started with GCN and remained the same in RDNA3.


This.
The gossip just suggests one of the models (possibly the "flagship") is just a bit smaller than N31 (529 mm2): between N32 (346 mm2) and N31, but closer to N31.

I reasonably suspect that, on TSMC N4, the so-called next-gen RDNA is just a process shrink of RDNA3 with some fixes. Calling it RDNA3+/3.5 is more realistic unless there are major architectural changes. Of course, there could be another option, like going monolithic on N4, but that's less likely. IMO.
Any links?

A hypothetical RDNA4 GCD on N3E of similar size to the N32 GCD is going to pack a lot of transistors, closing in on the N31 GCD even with a mediocre 1.3x density improvement (vs. the advertised 1.6x).
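A back-of-the-envelope version of this transistor-budget argument, using the ~148 MTr/mm2 RDNA3 GCD density and the 304 mm2 N31 GCD size quoted in this thread. The ~200 mm2 N32 GCD area is an assumption (the commonly cited figure), and uniform density scaling across the whole die is a simplification.

```python
# Transistor budget for a hypothetical N32-sized GCD on N3E.
# Assumptions: density and N31 size from this thread; N32 GCD ~200 mm2
# (commonly cited); the whole die scales uniformly with the density gain.
N5_DENSITY_MTR_MM2 = 148  # RDNA3 GCD density quoted in this thread
N31_GCD_MM2 = 304         # N31 GCD area quoted in this thread
N32_GCD_MM2 = 200         # assumption: commonly cited N32 GCD area


def transistors_btr(area_mm2: float, density_mtr_mm2: float) -> float:
    """Transistor count in billions for a die of the given area and density."""
    return area_mm2 * density_mtr_mm2 / 1000  # MTr -> billions


n31_btr = transistors_btr(N31_GCD_MM2, N5_DENSITY_MTR_MM2)
for scaling in (1.3, 1.6):  # 'mediocre' vs advertised N3E density gain
    hypo_btr = transistors_btr(N32_GCD_MM2, N5_DENSITY_MTR_MM2 * scaling)
    print(f"N32-size GCD on N3E at {scaling}x: {hypo_btr:.1f}B "
          f"vs N31 GCD {n31_btr:.1f}B ({hypo_btr / n31_btr:.0%})")
```

At 1.3x this works out to roughly 38.5B transistors, about 86% of the N31 GCD's ~45B; at the advertised 1.6x it would exceed it.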
If the LDS can be reused as VGPR/L0, as seen in some patents, then they don't need as much Si real estate to increase GPR/L0 a bit. Couple that with a clock increase and it would not be a mid-range part at all.
On the other hand, going by the rumors of no high-end parts for RDNA4, a hypothetical 32-40 CU GCD would be in the <140mm2 range on N3E. At that point there is an argument to be made: why make an RDNA4 chip at all? Just sell a 7800XT refresh on N4 at a lower price and go straight to RDNA5. A 60CU RDNA3 part would be more than capable of matching a 40CU RDNA4 part.
 

rtxtwt

Senior member
Jul 2, 2018
319
505
136

beginner99

Diamond Member
Jun 2, 2009
5,211
1,582
136
I thought they were talking about the performance of N44 (or maybe N43), which was claimed to be higher than N32 and actually closer to N31. Maybe I read it wrong.
Which then confirms no high end, right? Because if the "next-gen" doesn't beat the previous gen, that by definition means no high end. Releasing it still makes sense, of course, in terms of power usage and features.
 

rtxtwt

Senior member
Jul 2, 2018
319
505
136
I thought they were talking about the performance of N44 (or maybe N43), which was claimed to be higher than N32 and actually closer to N31. Maybe I read it wrong.
Which then confirms no high end, right? Because if the "next-gen" doesn't beat the previous gen, that by definition means no high end. Releasing it still makes sense, of course, in terms of power usage and features.

Most of the comments under that topic were speculation. And the most important information is in the title: "next gen", "larger than 32 and is close to 31". No codename like N43/42/41 has been confirmed.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,373
2,868
136
On the other hand, going by the rumors of no high-end parts for RDNA4, a hypothetical 32-40 CU GCD would be in the <140mm2 range on N3E. At that point there is an argument to be made: why make an RDNA4 chip at all? Just sell a 7800XT refresh on N4 at a lower price and go straight to RDNA5. A 60CU RDNA3 part would be more than capable of matching a 40CU RDNA4 part.
32-40CU is too little for a next-gen chip; even if they scrapped the high end, that could only be the slower chip.
Considering they have Strix Halo with a 40CU IGP, I would expect 48CU full and 40CU as a cutdown for the weaker chip.
The stronger one could be 72CU full and 60CU cutdown.

I got ~171mm2 for a 48CU (3SE) RDNA3 5nm GCD, based on what Locuza posted some time ago.
Then I applied the N3E density improvement to it.

At worst, it could be 132mm2 (+30% density) and at best 114mm2 (+50%).
Of course, we don't know how many more transistors RDNA4 uses vs. RDNA3 for comparable specs, so this estimate is only rough.
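The 132/114 mm2 figures are just the 171 mm2 estimate divided by the density gain. A sketch of that arithmetic, assuming the transistor count stays the same and the whole die scales uniformly:

```python
# Die area after a node shrink.
# Assumption: same transistor count, uniform density gain across the whole die.

def shrink_area(area_mm2: float, density_gain: float) -> float:
    """Area after a shrink with the given density multiplier."""
    return area_mm2 / density_gain


N5_48CU_GCD_MM2 = 171  # estimate from this post (based on Locuza's numbers)
for gain in (1.3, 1.5):
    print(f"+{gain - 1:.0%} density: ~{shrink_area(N5_48CU_GCD_MM2, gain):.0f} mm2")
```

In practice SRAM and analog/I/O blocks scale much less than logic on N3E, so a real shrink would likely land somewhat above these numbers.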

As for your argument, whether it would be better to just port RDNA3 N32 to N4P and save some money in the process: that's very hard to tell when we know absolutely nothing about RDNA4.
Considering that they plan to release it, it should still have some advantage compared to RDNA3.
I personally expect significantly higher clocks. With 48CU, it would need ~25% higher clocks to be on par with the 7800XT.
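The ~25% figure (and the 50% figure for a 40 CU part mentioned later in the thread) comes from matching the 60 CU RX 7800 XT on CU count times clock. A sketch, assuming performance scales linearly with CUs, clock, and per-CU IPC, and ignoring memory bandwidth:

```python
# Clock uplift needed for a smaller GPU to match a reference GPU.
# Assumption: throughput ~ CUs x clock x per-CU IPC; bandwidth limits ignored.

def clock_uplift_for_parity(ref_cus: int, new_cus: int, ipc_ratio: float = 1.0) -> float:
    """Clock multiplier the new GPU needs to match the reference GPU's throughput."""
    return ref_cus / (new_cus * ipc_ratio)


REF_7800XT_CUS = 60  # RX 7800 XT (full N32)
for cus in (48, 40):
    uplift = clock_uplift_for_parity(REF_7800XT_CUS, cus)
    print(f"{cus} CUs: needs {uplift - 1:.0%} higher clocks at equal IPC")
```

The `ipc_ratio` parameter shows how any per-CU IPC gain in RDNA4 would shrink the required clock increase.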

The problem I see is not this GCD per se, but the MCDs and VRAM. A 48 CU part at higher clocks would need bandwidth comparable to N32.
4 MCDs would be comparable in size to that GCD, even if they cost less to produce.
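A rough check on the MCD point. The ~37.5 mm2 per-MCD area is an assumption (the commonly cited RDNA3 MCD size), compared against the 114-132 mm2 N3E GCD estimates given earlier in this post:

```python
# Total MCD area vs. estimated GCD area.
# Assumption: ~37.5 mm2 per RDNA3 MCD (commonly cited figure).
MCD_MM2 = 37.5
GCD_ESTIMATES_MM2 = (114, 132)  # best/worst-case N3E estimates for a 48 CU GCD

mcd_total = 4 * MCD_MM2
for gcd in GCD_ESTIMATES_MM2:
    print(f"GCD {gcd} mm2 + 4 MCDs {mcd_total:.0f} mm2: "
          f"MCDs are {mcd_total / gcd:.2f}x the GCD area")
```

Under that assumption, the four MCDs total about 150 mm2, slightly more silicon than the estimated GCD itself, albeit on the cheaper N6 node.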
 

DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,894
136
I would expect 48CU full and 40CU as a cutdown for the weaker chip.
The stronger one could be 72CU full and 60CU cutdown.
72 CU is not going to be 5700XT/6700XT tier. The RDNA4 parts are allegedly 6700XT/5700XT (40 CUs) tier, according to Twitter chatter.

At worst, it could be 132mm2 (+30% density) and at best 114mm2 (+50%).
Of course, we don't know how many more transistors RDNA4 uses vs. RDNA3 for comparable specs, so this estimate is only rough.
I was being generous and assumed the RDNA4 parts would come with a big uptick in MTr/CU.

As for your question, whether it would be better to just port RDNA3 to N4P and save some money in the process: that's very hard to tell when we know absolutely nothing about RDNA4.
Considering that they plan to release it, it should still have some advantage compared to RDNA3.
I personally expect significantly higher clocks. With 48CU, it would need ~25% higher clocks to be on par with the 7800XT.

The problem I see is not this GCD per se, but the MCDs and VRAM. A 48 CU part at higher clocks would need bandwidth comparable to N32.
4 MCDs would be comparable in size to that GCD, even if they cost less to produce.
Indeed, there is much less RDNA4 chatter than there was for RDNA3 at a similar point in time.

But the RDNA3 GCD/MCD concept is quite sound, at least in theory. They moved the IC, with 96M of SRAM, to the MCDs.
The N31 GCD is not very cache-heavy for its 304 mm2 size [e.g. ~22M --> 6M (L2) + 3M (L1) + 3M (L0) + 2.25M (I+K) + 6M (LDS)], in contrast to a Z4 CCD, which has 40M (L2+L3) in a die size of 66mm2.
The density of the RDNA3 GCD is on the high side among known N5 dies, around 148 MTr/mm2.
It would be a shame to drop the GCD/MCD concept in RDNA4.
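To quantify "not very cache-heavy": cache capacity per mm2, using the ~22M / 304 mm2 (N31 GCD) and 40M / 66 mm2 (Zen 4 CCD) figures from this post, and assuming "M" means MB:

```python
# On-die cache capacity per unit area.
# Assumption: the thread's "M" figures are megabytes of SRAM.

def cache_density(sram_mb: float, die_mm2: float) -> float:
    """Cache capacity per die area, in MB/mm2."""
    return sram_mb / die_mm2


n31_gcd = cache_density(22, 304)   # ~22MB on the 304 mm2 N31 GCD
zen4_ccd = cache_density(40, 66)   # 40MB L2+L3 on a ~66 mm2 Zen 4 CCD
print(f"N31 GCD: {n31_gcd:.3f} MB/mm2, Zen 4 CCD: {zen4_ccd:.3f} MB/mm2 "
      f"({zen4_ccd / n31_gcd:.1f}x)")
```

That gives the Zen 4 CCD roughly 8x the cache per mm2, consistent with the GCD being logic-dominated and therefore likely to scale better on N3E than a cache-heavy die would.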

Going with this, an N3E GCD should be able to achieve decent scaling, since it is going to remain logic-heavy. I would imagine it hitting close to 200 MTr/mm2, if not more.
40 CUs on N3E really would be very tiny, and I'm not sure going with chiplets would be meaningful. They might as well stick to N5 or even N6 for 40 CUs.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,373
2,868
136
72 CU is not going to be 5700XT/6700XT tier. The RDNA4 parts are allegedly 6700XT/5700XT (40 CUs) tier, according to Twitter chatter.
I find 40CU alone unlikely. Just to be on par with N32, it would need 50% higher clocks.
I was being generous and assumed the RDNA4 parts would come with a big uptick in MTr/CU.
This could happen, but then I would expect some improvement in the architecture: higher IPC, higher clocks, etc.
Indeed, there is much less RDNA4 chatter than there was for RDNA3 at a similar point in time.

But the RDNA3 GCD/MCD concept is quite sound, at least in theory. They moved the IC, with 96M of SRAM, to the MCDs.
The N31 GCD is not very cache-heavy for its 304 mm2 size [e.g. ~22M --> 6M (L2) + 3M (L1) + 3M (L0) + 2.25M (I+K) + 6M (LDS)], in contrast to a Z4 CCD, which has 40M (L2+L3) in a die size of 66mm2.
The density of the RDNA3 GCD is on the high side among known N5 dies, around 148 MTr/mm2.
It would be a shame to drop the GCD/MCD concept in RDNA4.

Going with this, an N3E GCD should be able to achieve decent scaling, since it is going to remain logic-heavy. I would imagine it hitting close to 200 MTr/mm2, if not more.
40 CUs on N3E really would be very tiny, and I'm not sure going with chiplets would be meaningful. They might as well stick to N5 or even N6 for 40 CUs.
Supposedly, only 2 monoliths survived.
If it ends up as a chiplet design with the GCD on N3E, then I would expect a chip with more CUs.
 

Tigerick

Senior member
Apr 1, 2022
679
559
106
72 CU is not going to be 5700XT/6700XT tier. The RDNA4 parts are allegedly 6700XT/5700XT (40 CUs) tier, according to Twitter chatter.


I was being generous and assumed the RDNA4 parts would come with a big uptick in MTr/CU.


Indeed, there is much less RDNA4 chatter than there was for RDNA3 at a similar point in time.

But the RDNA3 GCD/MCD concept is quite sound, at least in theory. They moved the IC, with 96M of SRAM, to the MCDs.
The N31 GCD is not very cache-heavy for its 304 mm2 size [e.g. ~22M --> 6M (L2) + 3M (L1) + 3M (L0) + 2.25M (I+K) + 6M (LDS)], in contrast to a Z4 CCD, which has 40M (L2+L3) in a die size of 66mm2.
The density of the RDNA3 GCD is on the high side among known N5 dies, around 148 MTr/mm2.
It would be a shame to drop the GCD/MCD concept in RDNA4.

Going with this, an N3E GCD should be able to achieve decent scaling, since it is going to remain logic-heavy. I would imagine it hitting close to 200 MTr/mm2, if not more.
40 CUs on N3E really would be very tiny, and I'm not sure going with chiplets would be meaningful. They might as well stick to N5 or even N6 for 40 CUs.
If N43 is indeed a 40CU GPU, then it makes sense based on the current product lineup. N43 is not designed to replace the current N32; it is going to replace the 6700XT/6750XT 12GB, and I believe it can only support a 192-bit memory bus with 12GB GDDR6 as well. I think it will be made on TSMC's N4P process with a monolithic design, not N3E as people hope.

As for pricing, @adroc_thurston has hinted it would cost $399 at best, which makes sense given the smaller die size and monolithic design. I am hoping AMD goes lower, like $349, with OC versions costing up to $399. There is going to be some overlap between N43 (likely to be marketed as the Radeon RX7700 non-XT series) and the 7700XT, especially with OC versions of the 7700; but I believe AMD would focus more on the RX7800XT based on current pricing.
 