Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,893
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5.0 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Last edited:

Mahboi

Senior member
Apr 4, 2024
522
833
91
They've had a unified compute stack for 15 years, haven't they? V100, A100, H100, that's recent stuff. Using GeForce cards and CUDA for compute is from what, 2008?
AMD was always quite behind there (I mean the software, mostly). I take it you mean they wanna have ROCm fully primed and supported across the entire RDNA 5 stack for Windows and Linux? Cause that's still a huge annoyance for people.
 
Reactions: Tlh97 and MoogleW

Abwx

Lifer
Apr 2, 2011
11,055
3,709
136
I'm not worried about gearing up. I'm worried that the absolute lack of interest from AMD means that in 1.5 years, we get Zen 6, we get MI450X, and we get a very polite "btw that's where all the wafers are going, so we got some smol & cute GPUs for a high price for you".
They have learned the hard way that selling at low prices doesn't work and is even counterproductive; with the former low prices one can't even fund increasing R&D costs.

At the end of the day they are much better off sticking to healthy prices. That may change only if they get enough market share, at which point Nvidia could lower their prices, which would lead AMD to follow suit.
 

Aapje

Golden Member
Mar 21, 2022
1,433
1,951
106
My expectation is that this gen will be a bit of a letdown, with AMD having had to rush this gen after redesigning their chips due to the RDNA3 problems and Nvidia sticking to too little VRAM. Also, Nvidia is definitely going to keep a lot in reserve for the mid-gen refresh, because they will then be up against RDNA5.
 

Ghostsonplanets

Senior member
Mar 1, 2024
387
659
96

"RDNA 4 specs and performance prediction
A similarly clocked RDNA 4 CU is 12% faster in raster and 25% faster in ray tracing compared to RDNA 3.
That would put:
- a full Navi 48 SKU close to a 7900XT
- a full Navi 44 SKU between a 7600XT and 7700XT"

The link they shared:


"The traditional performance IPC of RDNA4 is expected to increase by about 12% compared to RDNA3, while the improvement in light pursuit will be huge (hardware BVH traversal), and the IPC is expected to increase by about 25%.
RDNA4 should be a single-chip design, using TSMC N4P process, with a smaller area, so the cost is very low. The video memory should be GDDR6. The graphics card will be very cost-effective."
 
Aug 4, 2023
199
425
96
N32 is 36.3B* xtors at 103.7M xtors/mm^2 overall. For N48 being monolithic, adding 1 SE and 2 WGPs but taking out the USRs, plus estimating other uArch changes, puts it at ~34B xtors or so, about 140M xtors/mm^2 at 240mm^2.
N31 is 109M xtors/mm^2 overall, 150M xtors/mm^2 GCD
AD102 is 125M xtors/mm^2
AD104 less at 121M xtors/mm^2
GH100 98M xtors/mm^2
B100 looks to be 125M xtors/mm^2, uArch changes make up most of that gain
But finally, looking at other N4 parts, PHX1 is 141M xtors/mm^2.
So I think Greymon 2.0's leak is plausible.
*Yeah, TPU needs to do a far better job at keeping their database up to date; N48 needs to have slightly fewer xtors than N32 to fit in <=250mm^2.
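For anyone who wants to sanity-check the arithmetic, here's a minimal Python sketch of the density-to-area conversion; the ~34B transistor count and ~140M xtors/mm^2 density are the estimates from this post, not confirmed figures.

Code:
# Rough die-area check: transistor count (billions) over density (millions per mm^2).
def die_area_mm2(xtors_billions: float, density_millions_per_mm2: float) -> float:
    return xtors_billions * 1000 / density_millions_per_mm2

print(round(die_area_mm2(34, 140)))      # ~243 mm^2, in line with the ~240 mm^2 N48 estimate
print(round(die_area_mm2(36.3, 103.7)))  # ~350 mm^2, the overall N32 figure quoted above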
 
Last edited:

Timorous

Golden Member
Oct 27, 2008
1,671
2,946
136

"RDNA 4 specs and performance prediction
A similarly clocked RDNA 4 CU is 12% faster in raster and 25% faster in ray tracing compared to RDNA 3.
That would put:
- a full Navi 48 SKU close to a 7900XT
- a full Navi 44 SKU between a 7600XT and 7700XT"

The link they shared:


"The traditional performance IPC of RDNA4 is expected to increase by about 12% compared to RDNA3, while the improvement in light pursuit will be huge (hardware BVH traversal), and the IPC is expected to increase by about 25%.
RDNA4 should be a single-chip design, using TSMC N4P process, with a smaller area, so the cost is very low. The video memory should be GDDR6. The graphics card will be very cost-effective."

A 7800XT OC'd to 2.9GHz with 2640 RAM speed gets around 19% more performance than the stock model in Time Spy GT1 Extreme. Add another 12% to that and you are at the same performance as an OC'd 7900 GRE and not far at all from a 7900XT. 3GHz+ would get you basically in the same ballpark, and N48 has a few extra CUs as well, so that all seems to line up.

A 3GHz 7600XT with 2500 RAM is 12% ahead of stock, so with 12% more performance per CU that would put it about 25% ahead of stock, which is in 3070/4060 Ti ballpark and fits between the 6700XT and 7700XT. This also seems to line up.
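If anyone wants to play with those numbers, here's a quick sketch of the scaling assumption (OC gain times the claimed per-CU IPC gain, times the CU-count ratio). The 64-CU full N48 configuration is the rumoured figure, not confirmed, and all of this is back-of-envelope rather than measured.

Code:
# Projected performance vs. the stock RDNA3 card, multiplying the gains.
def projected_gain(oc_gain: float, ipc_gain: float, cu_scale: float = 1.0) -> float:
    return (1 + oc_gain) * (1 + ipc_gain) * cu_scale

# N48 vs. stock 7800 XT: ~19% from the 2.9 GHz OC, ~12% claimed IPC, rumoured 64 CUs vs. 60.
print(projected_gain(0.19, 0.12, 64 / 60))   # ~1.42x
# N44 vs. stock 7600 XT: ~12% from the 3 GHz OC, ~12% claimed IPC.
print(projected_gain(0.12, 0.12))            # ~1.25x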

N32 is 28.1B xtors at 81.2M xtors/mm^2 overall. For N48 being monolithic, adding 1 SE, the relative L2 and 2 WGPs, and taking out the USRs, plus estimating other uArch changes, puts it at ~34B xtors or so, about 140M xtors/mm^2 at 240mm^2.
N31 is 109M xtors/mm^2 overall, 150M xtors/mm^2 GCD
AD102 is 125M xtors/mm^2
AD104 less at 121M xtors/mm^2
GH100 98M xtors/mm^2
B100 looks to be 125M xtors/mm^2, uArch changes make up most of that gain
But finally, looking at other N4 parts, PHX1 is 141M xtors/mm^2.
So I think Greymon 2.0's leak is plausible.

AMD's xtor numbers for N31 and N32 are bloody strange, because we know from die shots that the density of the GCDs should be roughly the same, which means one is including the MCDs and the other is excluding them.

If N44 is 19.5B transistors then die size should be between 140 and 150mm². I presume N48 is nearly exactly double (it probably won't quite be, because you don't need 2x the display engines; that will be the same in both), so something like 35B xtors for N48 would put it between 250 and 270mm².
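A minimal sketch of that doubling estimate. The 4B allowance for blocks that don't get duplicated (display engine, media, etc.) is a placeholder; only the 19.5B N44 figure and the ~130-140M xtors/mm^2 density implied by the 140-150mm² N44 size come from the reasoning above.

Code:
# "N48 is roughly a doubled N44": double the transistors, minus non-duplicated blocks.
n44_xtors_b = 19.5
shared_xtors_b = 4.0                              # placeholder for shared/uncore blocks
n48_xtors_b = 2 * n44_xtors_b - shared_xtors_b    # ~35B

for density in (130, 140):                        # M xtors/mm^2, implied N44 density range
    print(f"{n48_xtors_b:.0f}B at {density}M/mm^2 -> {n48_xtors_b * 1000 / density:.0f} mm^2")
# -> roughly 250-270 mm^2, matching the estimate above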
 

Mahboi

Senior member
Apr 4, 2024
522
833
91
Profanity is not permitted in the tech forums
HBM is far too expensive. They need cheaper GDDR AI HW for certain customers.
That is the plan, not many people can afford the HBM tax.

Ok, fair.

My last worry is about AMD's propensity to actually get ROCm there. Software isn't like hardware: you can have a REDACTED gen and it'll be replaced in 2 years. Hurts, but there are no consequences long term. Software is an infinite black hole of work and money; no matter how much you pour into it, there's more to pile on top of it. NV always won, long before CUDA even, because they understood that software is the real weight and the real seller.

AMD's been dancing around software and focusing on better hardware for pretty much all of their history, so I wonder if we're not going to end up with yet another gen where our good Wendell will say "ROCm is pretty good now, it's getting there!" and most of the enterprise customers will still go for CUDA because "getting there" isn't as trustworthy as "we've emptied the bank account into it".
 

Mahboi

Senior member
Apr 4, 2024
522
833
91
They have learned the hard way that selling at low prices doesn't work and is even counterproductive; with the former low prices one can't even fund increasing R&D costs.

At the end of the day they are much better off sticking to healthy prices. That may change only if they get enough market share, at which point Nvidia could lower their prices, which would lead AMD to follow suit.
Yes, I've heard the story of Fermi being the first-gen compute card and being sold as a "gaming card/space heater" hybrid by Jensen. AMD thought it was the perfect occasion to run Nvidia out of the market and sold their better competing card at a lower price, to completely take the market and ruin Nvidia. After 8 months, Jensen had sold his compute card to enterprises, still sold a sizeable number of crappy cards to customers, and despite the general anger, had made all his money and reinvested it into a new, fixed Fermi, and into the next gen soon after (isn't that right Kepler_L2?). AMD meanwhile was left with little money to invest into the next gen, because their terrible margins gave them no elbow room.

In the end, the man who read the market well made away with a perfect road down the next 10 years, and AMD would soon shoot themselves in the other foot with Bulldozer, leading to their dark years.

I know AMD is paranoid about going for market and will never not go for margins now. And that no matter how much cheaper than NV they are, people still pay the Jensen tax, so there's no point in lowering prices if you sell an XTX for $700 and people still pay for the 4080 at $1000. Frankly I think the price strategy is bollocks but makes sense. My real worry is that NV has, for lack of a better term, a total mental grasp on the market. People just eat the NV slop like it's gold. It's also a fact that AMD has little response in terms of features, it's always "we have RT at home" and "DLSS at home", done cheaper and less impressive. But AMD has a whole story to tell too, from how they design things to how they choose to invest. If there's one job I'd love to do for them it's some kind of marketing consulting, because they are freaking terrible at telling that story, even though it is very interesting to see how and why they do things.

As long as NV mind-grabs the market and is the only storyteller, they'll set whatever prices they want and AMD will always play second fiddle. AMD needs a voice and a story to tell. You need to wrap those features with a nice, fun story to get people hooked. Jensen doesn't just sell AI cores, he wraps the AI cores into a "RT revolution that'll completely change how we game" and pays for the entire RT development in Cyberpunk 2077. AMD meanwhile has a 2-minute showcase of RT at 35 fps on an XTX and is like "best RT mah boiiiiis".
 

Mahboi

Senior member
Apr 4, 2024
522
833
91
My expectation is that this gen will be a bit of a letdown, with AMD having had to rush this gen after redesigning their chips due to the RDNA3 problems and Nvidia sticking to too little VRAM. Also, Nvidia is definitely going to keep a lot in reserve for the mid-gen refresh, because they will then be up against RDNA5.
I disagree, AMD didn't redesign anything. RDNA 3 is fine, it's the power draw that is somehow some jerry rigged crapchute that demanded clocks be turned down 20%. Fix that (and apparently they did) and you instantly get 15% general perf improvement. On what's still a sort of modern day Vega, a cheap design that you can bulk build. RDNA 3 may yet be found in cheaper APUs in 2026/27 like Vega was shoved into Zen2/Zen3+ all the way to 2023...
 
Aug 4, 2023
199
425
96
If N44 is 19.5B transistors
N33 is 13.3B transistors; no way that N44, which is probably the exact same spec, has 6B more xtors. The uncore will be a touch beefier maybe, but the area it uses will remain similar, and remember that RDNA4 continues to gut out hardware for software where possible, offsetting xtor gains for PPC.
I think N44 will be 18B* xtors at 130-140mm^2
N48 34B* at 240-250mm^2
Basically think 10-15M more xtors/mm^2 than previous GPUs, really pushing the limits of N4P.
I trust the Twitter leak over the Baidu speculation.
*Assuming larger vGPR than N33.
 
Last edited:
Reactions: Tlh97

Saylick

Diamond Member
Sep 10, 2012
3,216
6,579
136
Yes, I've heard the story of Fermi being the first-gen compute card and being sold as a "gaming card/space heater" hybrid by Jensen. AMD thought it was the perfect occasion to run Nvidia out of the market and sold their better competing card at a lower price, to completely take the market and ruin Nvidia. After 8 months, Jensen had sold his compute card to enterprises, still sold a sizeable number of crappy cards to customers, and despite the general anger, had made all his money and reinvested it into a new, fixed Fermi, and into the next gen soon after (isn't that right Kepler_L2?). AMD meanwhile was left with little money to invest into the next gen, because their terrible margins gave them no elbow room.

In the end, the man who read the market well made away with a perfect road down the next 10 years, and AMD would soon shoot themselves in the other foot with Bulldozer, leading to their dark years.

I know AMD is paranoid about going for market and will never not go for margins now. And that no matter how much cheaper than NV they are, people still pay the Jensen tax, so there's no point in lowering prices if you sell an XTX for $700 and people still pay for the 4080 at $1000. Frankly I think the price strategy is bollocks but makes sense. My real worry is that NV has, for lack of a better term, a total mental grasp on the market. People just eat the NV slop like it's gold. It's also a fact that AMD has little response in terms of features, it's always "we have RT at home" and "DLSS at home", done cheaper and less impressive. But AMD has a whole story to tell too, from how they design things to how they choose to invest. If there's one job I'd love to do for them it's some kind of marketing consulting, because they are freaking terrible at telling that story, even though it is very interesting to see how and why they do things.

As long as NV mind-grabs the market and is the only storyteller, they'll set whatever prices they want and AMD will always play second fiddle. AMD needs a voice and a story to tell. You need to wrap those features with a nice, fun story to get people hooked. Jensen doesn't just sell AI cores, he wraps the AI cores into a "RT revolution that'll completely change how we game" and pays for the entire RT development in Cyberpunk 2077. AMD meanwhile has a 2-minute showcase of RT at 35 fps on an XTX and is like "best RT mah boiiiiis".
In such a highly capital-intensive market as semiconductors, winner-takes-all is paramount to recouping the initial investment and allowing for investment into future success. Said another way, success begets success. This is why it's so hard to dethrone Nvidia: no one gets fired for using Nvidia, a la how no one gets fired for buying Intel CPUs. It's also why there are fewer and fewer pure-play foundries developing cutting-edge nodes: if you are the first to HVM with respectable yields on a leading-edge node, you get the lion's share of the orders and thus recoup your investment earlier. By the time your competitors have caught up, you've paid off your initial investment and are already working on the next thing, while they are still trying to break even. Rinse and repeat, and monopolies develop.
 

Mahboi

Senior member
Apr 4, 2024
522
833
91
In such a highly capital-intensive market as semiconductors, winner-takes-all is paramount to recouping the initial investment and allowing for investment into future success. Said another way, success begets success. This is why it's so hard to dethrone Nvidia: no one gets fired for using Nvidia, a la how no one gets fired for buying Intel CPUs. It's also why there are fewer and fewer pure-play foundries developing cutting-edge nodes: if you are the first to HVM with respectable yields on a leading-edge node, you get the lion's share of the orders and thus recoup your investment earlier. By the time your competitors have caught up, you've paid off your initial investment and are already working on the next thing, while they are still trying to break even. Rinse and repeat, and monopolies develop.
Law of Capitalism. Whoever has the money can spend the money in a way that makes them more money.
Just...law of Silicapitalism, I guess?
 

Saylick

Diamond Member
Sep 10, 2012
3,216
6,579
136
Law of Capitalism. Whoever has the money can spend the money in a way that makes them more money.
Just...law of Silicapitalism, I guess?
This effect applies even more so in industries that have huge barriers to entry. Only those with the financial resources to more or less guarantee success can thrive. New entrants, or competitors seeking a bigger slice of the pie, must invest a disproportionate amount of resources compared to the incumbent if they want to succeed. See Intel vs. TSMC, for example. If you're lagging behind TSMC, you must invest more and be more aggressive or else you will never catch up.
 

Mahboi

Senior member
Apr 4, 2024
522
833
91
This effect applies even more so in industries that have huge barriers to entry. Only those with the financial resources to more or less guarantee success can thrive. New entrants, or competitors seeking a bigger slice of the pie, must invest a disproportionate amount of resources compared to the incumbent if they want to succeed. See Intel vs. TSMC, for example. If you're lagging behind TSMC, you must invest more and be more aggressive or else you will never catch up.
And HW particularly suffers in this regard compared to software/services. Failbook or Twitter can afford to waste money for 10 years because it's assumed that the power of being N°1/having no real competition is going to drive income in some far away future. The one with the biggest database/client list wins. Once installed, Discord, Twitter, Facebook etc, never get replaced unless they severely go down in terms of feature parity.

But HW is a veeeeery different affair. HW can be great one gen and bad the next. There is no 10Y open window to "grow the business". The business sells now, or doesn't; may sell later, or may not; but there is no "it doesn't run a profit, and may run a profit in 10 years". No permanent customer/user retention. And HW is, well, produced. If Twitter has no success for 5 years then suddenly explodes, they need infrastructure to handle 1M users for 5 years, then need a massively larger infra for 100M users starting when those users roll in. Their costs, in machines at least, depend on their success.
If Intel sells crappy ARC and nobody wants it, they're still produced. You don't sell a promise of ARC. Sure you can mitigate losses by cutting production when the product is terrible, but this is very limited, you don't stop or start mass prod without costs, and the biggest money sink was still in the R&D. Same as with software one might say, but I'd argue that good software is incredibly easy to make for an internet service today, and requires a few months of work from a competent dev team. Nowhere near what it costs to create a bleeding edge chip.

Well, I don't have much sympathy for Intel. They had the money, they had the clout, they had the history, the engineers...their competition was basically bleeding out in the ditch between 2012-2016. And look at them now.
It's Lisa time now.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,372
2,865
136
He started in his first post by stating 2770 and then in a later post he said 3GHz.

At first he aligned the data, stating 32 WGP / 64 CUs / 256-bit bus / 693GB/s / 2770MHz / 240mm².
His first post explicitly mentioned die sizes, the rest were only numbers.

2770 for N48
515 for N44
You came up with a theory that it was frequency, not caring that the numbers were hugely different.
It can't be clearer than this: 2770GB/s, as you stated, would imply chips with roughly 100Gb/s speeds, while a 7800XT, for instance, uses 20Gb/s chips; you can see the huge discrepancy.
I was talking about IC BW -> Infinity Cache bandwidth.
And yes, there is a huge difference in BW depending on how many MB of cache there are.
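For anyone following along, here's a quick sketch of why a figure like 2770 GB/s can't be GDDR bandwidth on the leaked 256-bit bus, and so only makes sense as Infinity Cache bandwidth; the bus width is taken from the leaked specs above.

Code:
# GDDR bandwidth (GB/s) = bus width (bits) x per-pin speed (Gbps) / 8.
def gddr_bw_gb_s(bus_width_bits: int, pin_speed_gbps: float) -> float:
    return bus_width_bits * pin_speed_gbps / 8

print(gddr_bw_gb_s(256, 20))   # 640 GB/s with 20Gb/s chips, 7800 XT-class memory
print(2770 * 8 / 256)          # ~86.6 Gbps per pin implied if 2770 GB/s were GDDR BW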
 
Reactions: Tlh97

Aapje

Golden Member
Mar 21, 2022
1,433
1,951
106
I disagree, AMD didn't redesign anything. RDNA 3 is fine, it's the power draw that is somehow some jerry rigged crapchute that demanded clocks be turned down 20%. Fix that (and apparently they did) and you instantly get 15% general perf improvement.

If they didn't redesign anything, we'd have Navi 41, 42 and 43. AMD is not that creative in their naming.

So Navi 48 is definitely a redesign. Whether Navi 44 is one is unclear, because we don't know whether they intended to make 3 or 4 chips.

And if RDNA4 is only a fix of RDNA3 with 15% improvement, that would be very disappointing.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,493
3,623
96
So Navi 48 is definitely a redesign.
No, it's just a new die slotted in where N43 (I think 43) could not meet the cost target. Next.
because we don't know whether they intended to make 3 or 4 chips.
5.
Each gen is 5 parts.
And if RDNA4 is only a fix of RDNA3 with 15% improvement, that would be very disappointing.
I don't think gfx12 has any relation to gfx11.
 

MrTeal

Diamond Member
Dec 7, 2003
3,578
1,725
136
His first post explicitly mentioned die sizes, the rest were only numbers.
2770 for N48
515 for N44
You came up with a theory that it was frequency, not caring that the numbers were hugely different.

I was talking about IC BW -> Infinity Cache bandwidth.
And yes, there is a huge difference in BW depending on how many MB of cache there are.
What does drive the effective BW of the Infinity Cache? There is a massive difference between the monolithic N33, which has 32MB and 477GB/s, and the 7700XT, which with 3 MCDs has 48MB and 1995GB/s.
Even for the MCM parts, though, I can't seem to figure out what actually makes up the "effective bandwidth" number. The closest I can come up with is the table below, where Cache BW is Effective BW minus Memory Bandwidth.
Card       MCDs  Cache (MB)  Eff BW (GB/s)  Mem Clock (Gbps)  Mem BW (GB/s)  Eff BW/MCD (GB/s)  Cache BW (GB/s)  Cache BW/MCD (GB/s)
7700 XT       3          48           1995              18.0            432                665             1563                  521
7800 XT       4          64           2708              19.5            624                677             2084                  521
7900 GRE      4          64           2250              18.0            576                563             1674                  419
7900 XT       5          80           2900              20.0            800                580             2100                  420
7900 XTX      6          96           3500              20.0            960                583             2540                  423

That at least gives a reasonably consistent number for Infinity Cache BW per MCD. There's still a huge disconnect between the N31 cards and the N32 cards. I assume IC bandwidth is tied to the infinity fabric speed, and N32 just runs the clock 24% faster than N31. If anyone knows the exact calculation for effective BW in RDNA3 I'd love to see it.
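In case anyone wants to poke at the numbers, here's a small sketch reproducing the table: cache BW taken as effective BW minus raw GDDR6 BW, then divided by the MCD count. The per-MCD split is just my reading of AMD's marketing figures, not a documented formula.

Code:
# (card: MCDs, cache MB, effective BW GB/s, mem clock Gbps, mem BW GB/s)
cards = {
    "7700 XT":  (3, 48, 1995, 18.0, 432),
    "7800 XT":  (4, 64, 2708, 19.5, 624),
    "7900 GRE": (4, 64, 2250, 18.0, 576),
    "7900 XT":  (5, 80, 2900, 20.0, 800),
    "7900 XTX": (6, 96, 3500, 20.0, 960),
}

for name, (mcds, cache_mb, eff_bw, _clk, mem_bw) in cards.items():
    cache_bw = eff_bw - mem_bw                     # effective BW minus GDDR6 BW
    print(f"{name}: cache BW {cache_bw} GB/s, {cache_bw / mcds:.0f} GB/s per MCD")
# N32 cards land around ~521 GB/s per MCD, N31 cards around ~420 GB/s,
# which is exactly the disconnect noted above.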
 