Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38

Platinum Member
Oct 16, 2019
2,669
6,202
146

majord

Senior member
Jul 26, 2015
435
523
136
When AMD prices too close to Nvidia based on their raster performance, everyone screams they need to lower prices because RT and feature set can't compete... 'they'll never gain market share', 'disrupt the market', etc etc.

AMD come in undercutting Nvidia significantly:

"Something's wrong"
"it must be even slower than the 4080 in rasterization"
" should be renamed and be dropped to $949"

You guys are funny.

As for the chip/architecture itself, the only thing that's "wrong" is the RT performance. Yet everyone's fixated on the clock speeds not being through the roof, on it not beating the s**t out of the 4090 (even at a mere 355W), and therefore it must have been botched. It's a Fermi, it's an R520...

Hello? Since when is a 50% increase in perf/watt and a 60% increase in performance vs a predecessor "botched"?

It's still a huge uplift over RDNA2 at the end of the day. It's also the first-gen chiplet architecture, which no doubt has presented a host of challenges and wouldn't come without some compromise.

Comparing to Nvidia's gen-on-gen: they've gone from an inferior Samsung 8nm process to a superior custom '4nm' process, so you can't even draw any parallels there either. It was always going to be a challenge to maintain the status quo with Nvidia this gen because of this fact.

Bit of a reality check, people. Raster perf and perf/watt are looking fine. Not amazing, not matching the random rumors started by morons, sure, but all things in the real world considered... fine.

RT... Yeah, it'd be interesting to discuss the "why's" around this, because regardless of what personal importance you put on it, it's becoming more and more heavily weighted in reviews. I can't tell if AMD consciously gave it a low priority with the nature of the changes made, or if its performance is unusually low for the resources on tap. I'm struggling with this a bit as I don't fully understand the bottlenecks.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,948
7,362
136
This just strays into the realm of too good to be true, or it only makes sense if it's one of those "up to" measures of performance where one tiny aspect sees a major uplift that has an almost negligible impact on the overall average.

- This right here.

I mean, is there even 2.5x of untapped performance headroom left for CPUs to feed a GPU?

The RTX 3000 series is already slamming into all kinds of card-side and processor-side bottlenecks trying to feed its huge shader array, and in practically no workload is it 2.5 times faster than RDNA2.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,894
136
Putting the interconnects on top seems like a really nice way of doing that. The chips would obviously be designed in a way that the toasty parts of the GPU chiplets (CUs) and the interconnects (L3) are not under each other.
If we examine the patent from AMD and cross-reference it with what kind of packaging TSMC is offering (and we could even investigate the timeline), it might be possible to make a reasonable explanation of why the patent is the way it is and why the previous patent could have been superseded.
One main problem is indeed heat transfer. The main drawback of CoWoS and InFO (and all other BE-based stacking) is that there is a metallic layer as well, and the chips are stacked together using microbumps. The issue is that heat transfer from one die to another across the microbumps is poor, in addition to having to take care of the different thermal expansion coefficients of the different materials.

Looking at the new patent for chiplet fabrication, there are no bumps/metallic layers between the two dies depicted, which makes me believe this is probably FE-based, like SoIC.

FE-based stacking uses the same material for the stacked dies, so they have the same thermal expansion coefficient, and the bonding layer allows for better thermal transfer.



The only issue with this is that there are constraints on how the dies can be mixed and matched, but of course you reap all the benefits of chiplets when they are co-designed properly.

In short, it seems to me the previous patent below uses CoWoS

Whereas the new patent uses SoIC

Read about advantages of SoIC vs CoWoS here
TL;DR: SoIC offers better thermal characteristics than CoWoS.

Also, I think FE actually covers only the layers before any metal layers are done (i.e. front-end-of-line), not as described in the AT article.
 

Mopetar

Diamond Member
Jan 31, 2011
7,936
6,239
136
so native resolution could be dead as a metric as everyone will be forced to render 1080/1440 and upsample to 4k and use fsr/dlss to get the gains associated with "next gen" performance when it comes to framerates.

Gross. Why not just render at 720p and upsample that for even bigger numbers at that point?

Even worse is that there's no longer any consistent image quality afterwards since both technologies will result in a different upsampled image. If that result is the accepted benchmark then both companies will be pushed to trade losses in quality for additional performance gain because bigger numbers!

Benchmarks should be kept to native resolution renders only.
 

jrdls

Junior Member
Aug 19, 2020
12
12
51

Saylick

Diamond Member
Sep 10, 2012
3,217
6,585
136
Looks like the rumors for N31 are coalescing around the new theory that it's a single N5 GCD with all the Infinity Cache and memory controllers on N6 MCDs (6 total). Still an MCM architecture, probably using fan-out bridges, but not 3D stacked. Still, with all of the logic on N5 and all of the IO and cache on N6, it appears to be very cost optimized. The N5 GCD is likely only around 400mm2, knowing that N21 was 520mm2 and around half of that was shaders: 520mm2 / 2 * 2.4x shader count / 2x node shrink gives you around 315mm2. Add on the PHY for the fan-out bridges and extra die space for better RT and architectural improvements, and 400mm2 seems to be within the ballpark.
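For what it's worth, here's that back-of-the-envelope estimate written out as a quick script. The 520mm2 figure, the ~50% shader-area share, the 2.4x shader count and the ~2x shrink are the rough assumptions from above, and the extra allowance for PHYs/RT is purely illustrative:

```python
# Back-of-the-envelope N31 GCD area estimate, using the assumptions above.
n21_die_mm2 = 520.0
shader_fraction = 0.5        # assumed: roughly half of N21 is shader array
shader_count_scaling = 2.4   # assumed: rumored increase in shader count
node_shrink = 2.0            # assumed: ~2x logic density going N7 -> N5

shader_area_n5 = n21_die_mm2 * shader_fraction * shader_count_scaling / node_shrink
print(f"Scaled shader area on N5: ~{shader_area_n5:.0f} mm^2")          # ~312 mm^2

# Rough allowance for fan-out bridge PHYs, beefed-up RT and other
# architectural growth (illustrative number, not a known figure).
overhead_mm2 = 85.0
print(f"Ballpark GCD size: ~{shader_area_n5 + overhead_mm2:.0f} mm^2")  # ~400 mm^2
```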

Edit: Just adding more flavor from various leakers:

- RedGamingTech:


- KittyYYuko:

Translation:
Many people see Navi 31's performance goal as three times Navi 21's, but that's a mistake; some say it's four times Navi 21's.
I would like to reiterate that Navi 31 has always been designed to achieve 6 times the performance of Navi 10.



- Kopite7Kimi:
 
Last edited:

GodisanAtheist

Diamond Member
Nov 16, 2006
6,948
7,362
136
Imagine this scenario:

- AMD launch just 2 GPUs, for now
- AMD sells their flagship GPU that trades blows @ least in raster with 4090 (if not higher performance) for ... say ... $1000
- AMD lowers the prices of ALL their current line accordingly, so that every tier has their own pricing

How do you think will nVidia respond to that?

It's not like there hasn't been a precedent for this: remember when Zen launched? An 8c/16t CPU at around half the price of what Intel was charging for their premium CPU.

Re: Zen1 AMD was still looking to redeem itself after the Bulldozer arch debacle in a big big way. As AMD's Zen rep increased, so did the prices, until they were at parity with Intel.

AMD is going to price lower than NV, it's just not going to price that much lower.

Let's take a walk down memory lane: 2080ti was $1000 (AIB $1200), Radeon VII was $700 (5700xt was $400). 3090 was $1500, 6900xt was $1000. 4090 is $1600 ($2000+ AIB), 7900xt will be...

More than $1000 and the more competitive it is the more it will cost.
 
Reactions: Tlh97 and scineram

Timorous

Golden Member
Oct 27, 2008
1,673
2,955
136
That's probably rare enough that you could just make a small SKU for mobile if that ends up being big enough.



If people are going to get hung up over memory capacity then maybe by the time it shows up on desktop 3 or 4 GB chips will be available. If they (revert?) back to a schedule like RDNA 2 then it will be August or September of next year before you will see N33 desktop. If the laptop version sells poorly then maybe you will see it sooner.

The 4 lanes is maybe not an issue with laptops since they will be using PCIe 5 but more ram would be helpful on desktop 4.0 or 3.0 systems I think.

N33 is drop in compatible with N23 so will be 8 lanes.

N33 is going to be a cheap cheap part and with the spec it is perfect for 1080p and will be okay at 1440p. For that 8GB is all that is needed but it will be a 7600 tier part not 7700 tier.

N32 will fill both the x800 and x700 tiers simply because that makes sense: 200mm² of N5 is not a lot, and there's the option of using 4 or 3 MCDs, some of which will be forced if one of the IO links to the MCDs is defective. BOM-wise it will be far cheaper than the N21 BOMs and comparable to the N22 BOMs.

Another way to look at it: you can get about 107 N21 dies from a full wafer and 177 N22 dies from a full wafer, some of which will be defective (the calculator I used did not include defect rate), for a total of 284 dies across 2 wafers.

With N32 you can get 295 dies from 1 wafer and 1719 MCDs from 1 wafer. That means from 2 wafers you can build 107 N32-based 7800XTs + 177 N32-based 7700XTs and have 11 N32 dies and 760 MCDs left over.

Cost wise I believe 2 N7 wafers cost a bit more than 1 N5 wafer; not sure if that is more than 1 N5 + 1 N6 wafer, but you are only really using a bit more than half the MCDs off the N6 wafer with that ratio, so really 2 N5 wafers + 1 N6 wafer can probably build all the N32 cards that are required if they skew more towards 7700XTs. That is 3 wafers building nearly 600 GPUs at just under 200 GPUs per wafer, which is better than N22 in terms of wafer efficiency and far, far better than N21 for the x800 series.
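A quick sketch of that wafer split, using the per-wafer die counts assumed above (defect rate not included, purely illustrative):

```python
# Per-wafer die counts assumed in the post (defect rate not included).
n21_per_wafer = 107    # monolithic N21 dies per wafer
n22_per_wafer = 177    # monolithic N22 dies per wafer
n32_per_wafer = 295    # N32 GCDs per N5 wafer
mcd_per_wafer = 1719   # MCDs per N6 wafer

# Monolithic approach: one N21 wafer + one N22 wafer.
print("Monolithic, 2 wafers:", n21_per_wafer + n22_per_wafer, "GPUs")  # 284

# Chiplet approach: same mix built from N32 GCDs, with 4 MCDs per 7800XT
# and 3 MCDs per 7700XT (assumed configuration).
n_7800xt, n_7700xt = 107, 177
mcds_used = n_7800xt * 4 + n_7700xt * 3
print("N32 GCDs left over:", n32_per_wafer - (n_7800xt + n_7700xt))    # 11
print("MCDs used:", mcds_used, "of", mcd_per_wafer,
      "-> left over:", mcd_per_wafer - mcds_used)                      # 760
```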

Also, because 2 of the more popular segments are built using the same GCD, it allows AMD much more flexibility in meeting needs depending on which product has more demand vs their forecasts.
 

uzzi38

Platinum Member
Oct 16, 2019
2,669
6,202
146
My memory must be fading. Navi 10 was the 5600XT/5700/5700XT. I don't recall a respin for any of these. And those cards were certainly not sub-par.
It came to market around the same time as the 5600XT launched. No differences from the original silicon aside from it seemingly fixing issues with the display engine, afaik. Those cards crashed less than the original silicon, if nothing else.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,238
5,244
136
Not a single question that goes beyond what is already known.
Pretty useless one.

Sorry it bored you. For me it was very cool seeing the engineer who basically convinced AMD to do chiplets in the first place talk about it.

Have we seen those slides before? I haven't seen them. Link if we have?

This is the first time I have seen any solid indication of how much chiplets save AMD. Basically, a monolithic 16-core Ryzen would cost them 2x to build vs the chiplet one. These savings are MUCH higher than I would have expected, and he said it was based on their internal yield models, which he said were very accurate. So this isn't some vague marketing slide. The same slide also showed a more recent cost increase per area for smaller processes.
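To show why splitting a big die into chiplets helps so much, here's a minimal sketch using the textbook Poisson defect-yield model. The defect density and die areas are illustrative assumptions, not AMD's internal figures, and this toy model ignores packaging cost, binning and die reuse, which is part of why it lands short of the ~2x figure on the slide:

```python
import math

def die_yield(area_mm2, defects_per_mm2):
    # Poisson yield model: fraction of dies with zero killer defects.
    return math.exp(-area_mm2 * defects_per_mm2)

D0 = 0.001  # assumed defect density (defects/mm^2), illustrative only

# Hypothetical monolithic 16-core die vs. two 8-core CCDs plus an IOD.
# All areas are illustrative assumptions, not real die sizes.
ccd_area, iod_area = 80.0, 125.0
mono_area = 2 * ccd_area + iod_area

# Cost per good die ~ area / yield; assume the IOD's older node costs
# ~0.6x per mm^2 (also an assumption).
mono_cost = mono_area / die_yield(mono_area, D0)
chiplet_cost = 2 * (ccd_area / die_yield(ccd_area, D0)) \
             + 0.6 * (iod_area / die_yield(iod_area, D0))

print(f"Monolithic vs chiplet relative cost: {mono_cost / chiplet_cost:.2f}x")
```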

Plus the slide on the different scaling of Memory, Analog, Logic, was interesting and explains how the Memory controller chips work well.

Also very interesting, is what a massive work effort it is to port memory controllers/cache to a new node. I think a lot of the time, people assume porting the same stuff to a new node is trivial. This makes it clear that it's the total opposite of that, and just putting this stuff in a chiplet on the previous node, saves them a massive amount of work, and the kind of work no one really likes. Engineers want to work on new architecture logic, not porting memory controllers.

I also wondered if they were going to use an expensive silicon interposer for the chiplets, but he indicated they are using some much less expensive plastic tech.

In short I got a lot out of it. I hope some others did as well, even if you got nothing.
 
Mar 11, 2004
23,099
5,578
146
It's really a shame.
Why is AMD like this, at least with GPUs? Every odd generation is a tragedy. Again it seemed that they had a chance to compete at the top, but again they're out of the game, and Nvidia keeps increasing their lead at the top spot.

And we're back to the sad "Sonic Cycle of GPUs".

Same thing happens to Nvidia and people don't bat an eye. People still demonize AMD for basically every issue large or small, but compare the Vega/Volta situation. Vega had issues but it was still a lot more competitive than people seem to recall, meanwhile they forget that Volta literally never had a consumer card. There was one prosumer Titan card and that was it. And it was priced so insane that I was baffled people were surprised about Nvidia's pricing when they'd already gone so bonkers. On top of that, for all the constant griping about the AMD hype train, there were Nvidia fanboys claiming Volta was going to offer over 100% performance improvement (legit, some claims were more like 150-250%), which ended up being I think ~50%. Not a single peep about any of that. Or how Nvidia has been doing pretty bad in perf/W (which suddenly doesn't matter the moment Nvidia is bad at it, but if they aren't, then suddenly it's the thing that matters the most).

And Ampere was hyped to hell, especially after the SM count stuff came out. While it certainly wasn't a failure, if AMD had a similar outcome it'd have been considered a complete failure and we'd never hear the end of it. Heck we still see people complaining about the first 7970 era and how that shows AMD has always been inferior, and that was a decade ago. (Although it is odd seeing people pining for the fixed mega clocking version of the 7000 series as some awesome similarity.)

Nvidia is simply a strong competitor in comparison to Intel.

In addition, AMD being on 7nm when Nvidia was on Samsung's 8nm garbage created an illusion of AMD catching up, but since Maxwell, AMD has generally been behind.

Add in the high development cost of FinFET chips and AMD's generally CPU-first R&D spending, and we get the typical situation of AMD being behind in the GPU space.

What kind of magnifies this disappointment, however, is the hype train that AMD's guerilla marketing team creates for every launch. Fake leakers that mysteriously disappear after launches (the most recent is Greymon). The use and then subsequent destruction of youtubers/influencers who use leaks for their platform, e.g. SemiAccurate, AdoredTV, MLID, RedGamingTech. After each of these hype trains crashes, these influencers typically fade into obscurity. The next person awaiting that fate is likely MLID. AMD used to do it more transparently with people on forums and AMD representatives, but after similar hype crashes, the AMD reps and subsequently AMD's own reputation took a hit.

Wait, you're blaming AMD for people being morons/clowns and propping up "leakers"? Seriously, the "leakers" have become second only to cryptocurrency evangelists as far as being FOS that everyone knows about but still keeps propping up for some reason. But you people are like addicts: you admit it's stupid to put any stock into it, but you keep reporting and discussing every single rumor, often while griping about how much of it is obviously just nonsense and seeking to profit off of people doing the behavior you're complaining about.

Further, why does the Nvidia hype train get a free pass? It's been just as bad (go back and look at the Nvidia fanboys on here with regards to Volta; we saw similar hype over the SM count of Ampere and pretty sure there was some). Especially considering the past history of behavior, I'd take a look at Nvidia before I'd blame AMD for that stuff. Notice how we get rumors suggesting doom for Nvidia (with regards to power use, for instance), and then the product comes out and it's not that bad. That's a well known "softening the blow" tactic. Meanwhile there's a mountain of absolute nonsense about AMD and then perpetual "well they didn't meet 4GHz so it's all broke and they're awful, I'll never forgive them for this!!!" behavior.

How do you explain Lovelace if Ampere was simply due to the garbage Samsung process? This time Nvidia has the superior process and yet, even a supposedly broken AMD chip is very likely still going to be better in perf/W. Where things end up remains to be seen, but that's very likely to still hold true.

It's not that. It's because Nvidia literally spent billions for this mindshare. Actually, Intel has been as competitive in CPUs as Nvidia in GPUs, so that argument is straight nonsense. It is straight up how Nvidia intentionally worked to create the mindset that is prevalent. There's a reason, despite it being junk (and their history of failing in GPUs and giving up quickly), that people are hating on AMD but pining for Intel in the GPU space.

The one thing I'll give you is that the complexity of modern chips is really tough. AMD is doing something extra complex as well. But that doesn't matter. Also, despite Nvidia having the superior process and simpler chip design, well AMD is a failure. Who cares if we don't know where things are because the AMD stuff isn't out yet, why let that stop the nonsense?
 

Timorous

Golden Member
Oct 27, 2008
1,673
2,955
136
That's why I also included Civ IV. We're really nitpicking at this point, when the leakers were so far off that in reality we didn't get any increase in perf/$ this generation at all.

I don't get why TPU measures Civ 6 frame rate. 60fps is fine because it is turn-based and more does not make any real difference. It is one of the oddest benchmarks in their suite.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,373
2,868
136
I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case where it's 32MB of dedicated IC, that's still a 50-60% hit rate at 1080p, and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that's only 16 CU.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.
It wouldn't look bad as a refresh for the Xbox Series S, but this would cost more to make than what's currently inside the Xbox, so I am sceptical. If Microsoft or Sony wants to make a refresh, then I don't see it happening without a price increase.

About that SLC cache, 32MB doesn't look like much if it's shared. That's why I calculated with 64MB SLC.
If someone has an RX 6600XT, they could test it by downclocking the 16Gbps VRAM to see what happens at 1080p in some benchmark.
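As a rough way to reason about it, here is the usual miss-traffic estimate: if the cache absorbs a given fraction of accesses, the external bus only has to serve the misses. The 90GB/s figure comes from the quote above, while the hit rates are assumptions for illustration:

```python
# If the cache absorbs a fraction `hit_rate` of memory traffic, the external
# bus only has to serve the misses, so DRAM bandwidth is effectively
# amplified by 1 / (1 - hit_rate).
def effective_dram_bw(mem_bw_gbs, hit_rate):
    return mem_bw_gbs / (1.0 - hit_rate)

mem_bw = 90.0  # GB/s, shared 128-bit DDR5-5600 bus (figure from the quote)

# Assumed, illustrative 1080p hit rates for 32MB vs 64MB of SLC/IC.
for size_mb, hit in [(32, 0.55), (64, 0.70)]:
    print(f"{size_mb}MB cache at ~{hit:.0%} hit rate -> "
          f"DRAM behaves like ~{effective_dram_bw(mem_bw, hit):.0f} GB/s")
```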
 
Reactions: Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,936
6,239
136
When is AMD going to start a price war? They have the silicon and everybody hates the greed of Nvidia.

Unless they've got a massive amount of cheap wafers that they don't have anything else to do with I wouldn't hold my breath waiting.

AMD will just use the wafers for something more profitable. Chasing market share isn't terribly valuable if they have to slice their own throats to do it.
 

leoneazzurro

Senior member
Jul 26, 2016
952
1,515
136
About N31, if the claim of "50+% more perf/W compared to RDNA2" refers to FPS (and it did in the case of the RDNA to RDNA2 comparison), then the performance of N31 can be (very roughly) estimated. If N31 is in the 400W range, for instance, then the baseline increase already results in double the performance of a 6900XT. If AMD is sandbagging, it may be even more (2.1-2.2x).

Something I wonder is how it will reach this performance. N33 is supposed to be in the same ballpark as the 6900XT, at least in full HD, yet it has less bandwidth and less IC. The discussion above seems to conclude that VLIW2, while doubling FP32 resources per CU/WGP, increases the throughput per CU/WGP by 1.3x, 1.4x at best. That means a 4096-ALU N33 would get the throughput of a 2870-ALU N21, per clock. So it would require clocks way above 3GHz (almost 4GHz) to achieve similar performance, not even considering the inferior bandwidth and IC amount.

So what's the "secret sauce" there, if these performance claims are true? Is VLIW2 truly so relatively inefficient, or is the real throughput per CU higher than 1.3-1.4x per clock? Are there some other secrets (caching, compression, a return to Wave64 programming...) we don't know about?
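Putting that rough estimate into numbers (a minimal sketch; the 1.3-1.4x factor, the ALU counts and the ~2.25GHz N21 game clock are assumptions from the discussion, not confirmed figures):

```python
# N21 (6900XT) reference point.
n21_alus = 5120
n21_clock_ghz = 2.25       # assumed typical game clock

# N33 under the VLIW2 assumption: FP32 lanes per CU/WGP doubled, but only
# ~1.3-1.4x more real throughput per CU/WGP per clock.
n33_alus = 4096
vliw2_factor = 1.4         # assumed best case from the discussion above

# Equivalent "RDNA2-style" ALU count per clock.
n33_equiv_alus = (n33_alus / 2) * vliw2_factor
print(f"N33 behaves like ~{n33_equiv_alus:.0f} RDNA2 ALUs per clock")  # ~2870

# Clock N33 would need just to match N21 throughput, ignoring the
# bandwidth and Infinity Cache deficits.
required_clock = n21_alus * n21_clock_ghz / n33_equiv_alus
print(f"Clock needed to match a 6900XT: ~{required_clock:.2f} GHz")    # ~4.0
```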
 

Saylick

Diamond Member
Sep 10, 2012
3,217
6,585
136
Angstronomics has an article on RDNA3:
Not sure I agree with their die size and Infinity Cache estimates...

Navi 31
  • gfx1100 (Plum Bonito)
  • Chiplet - 1x GCD + 6x MCD (0-hi or 1-hi)
  • 48 WGP (96 legacy CUs, 12288 ALUs)
  • 6 Shader Engines / 12 Shader Arrays
  • Infinity Cache 96MB (0-hi), 192MB (1-hi)
  • 384-bit GDDR6
  • GCD on TSMC N5, ~308 mm²
  • MCD on TSMC N6, ~37.5 mm²
Navi32
  • gfx1101 (Wheat Nas)
  • Chiplet - 1x GCD + 4x MCD (0-hi)
  • 30 WGP (60 legacy CUs, 7680 ALUs)
  • 3 Shader Engines / 6 Shader Arrays
  • Infinity Cache 64MB (0-hi)
  • 256-bit GDDR6
  • GCD on TSMC N5, ~200 mm²
  • MCD on TSMC N6, ~37.5 mm²
Navi33
  • gfx1102 (Hotpink Bonefish)
  • Monolithic
  • 16 WGP (32 legacy CUs, 4096 ALUs)
  • 2 Shader Engines / 4 Shader Arrays
  • Infinity Cache 32MB
  • 128-bit GDDR6
  • TSMC N6, ~203 mm²
 

Timorous

Golden Member
Oct 27, 2008
1,673
2,955
136
That's a guess. None of us know the wafer supply & demand numbers. None of us know the Zen4 adoption rate. Enthusiasts like us are irrelevant to a large degree.

We do know that a recession is here and all these sales/production plans are about to be trashed. What's different, is this appears to be worldwide.

True.

It is also maths to a degree. AMD have X wafers, and if they are manufacturing another GPU SKU it needs to come from somewhere. Sure, it might mean some people who would have gone with N31 get upsold, but that won't cover all the stock, so some needs to come from other N5 lines. If AMD want to keep their N32 and N31 supply numbers about the same, the only place left is Zen 4.

I think the recession is another good point. I don't think the appetite for an expensive 450mm^2, 512-bit, 32GB monster card will be there like it was last gen, where some people just got what they could even if it meant stepping up a tier or two into 3090 range because it was somewhat available. This does make the PPA play pretty perfect because BOM cost is going to matter.

I don't expect AMD to price cheaply at all because they won't want to be seen as the budget option and I don't think AMD will be able to influence NVs pricing much anyway.

Besides, 2x 6900XT perf is pretty close to the rumoured Ada performance @ 450W, and if N31 hits 2.5x then I expect it will still be the fastest part you can buy, and it will consume less power if 2x 8-pin is correct.
 
Reactions: Tlh97 and Saylick

Mopetar

Diamond Member
Jan 31, 2011
7,936
6,239
136
Very interesting. My guess is that RDNA 4 will have multiple compute chiplets on the high end and that MI250 doubles as a testbed for RDNA 4. Latency is less of an issue for CDNA, so sub-optimal latency is acceptable. Then they can improve on this for RDNA 4.

Other than that I see a lot of overlap between the RDNA 3 rumors and these slides.

Explains a lot of the rumors and confusion around big Navi having multiple chiplets with shaders. Those parts existed, but were actually CDNA.
 