Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 143

uzzi38

Platinum Member
Oct 16, 2019

Saylick

Diamond Member
Sep 10, 2012
Yikes. If those 3DMark scores are even remotely true, and if we are to assume they reflect gaming performance, then we're only looking at a 25-29% uplift between the 6950XT and the 7900XTX? That seems ludicrously low for a new generation.

I think we need to wait for reviews to make final judgements. Something doesn't add up.
 

Tup3x

Golden Member
Dec 31, 2016
I will add the graphs from your link.

30% higher Time Spy score than the RX 6950XT with the Extreme (4K) preset. With the Performance (1440p) preset it is only 17% better. Both N31 cards are too close to each other in Time Spy.
[Graph: Time Spy results]

23% higher Fire Strike score than the RX 6950XT with the Ultra (4K) preset. With the Extreme (1440p) preset it is 24.5% better.
[Graph: Fire Strike results]
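For anyone checking these percentages against the graphs: an uplift like "30% higher" is presumably just the ratio of the two scores minus one, expressed in percent. A minimal Python sketch; the function name is made up for illustration, and no actual leaked scores are reproduced here:

```python
def uplift_pct(new_score: float, old_score: float) -> float:
    """Percent by which new_score exceeds old_score (e.g. an N31 card vs the RX 6950XT)."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score / old_score - 1.0) * 100.0
```

A card scoring 1.3x the 6950XT gives `uplift_pct(1.3, 1.0)`, roughly 30.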
Fire Strike has been obsolete for quite some time, but Time Spy... I really hope that is not true. If it is, the pricing suddenly starts to make sense. It would be quite disappointing considering what kind of chip is inside the RTX 4080.

Maybe the reference model is hitting its power limit?
 

exquisitechar

Senior member
Apr 18, 2017
Looks like N31 might be worse than AMD’s benchmarks implied. If the 7900 XTX doesn’t open a sizable gap over the 4080 in raster performance, that will be a massive failure for AMD, and not something they need when their market share is reaching single digits. Well, we’ll see exactly how it performs in a few days.

If the bug rumors are true, they can salvage the situation a bit later on, but the generation is still a massive missed opportunity for AMD.
 

adamge

Member
Aug 15, 2022
exquisitechar said:
Looks like N31 might be worse than AMD’s benchmarks implied. If the 7900 XTX doesn’t open a sizable gap over the 4080 in raster performance, that will be a massive failure for AMD, and not something they need when their market share is reaching single digits. Well, we’ll see exactly how it performs in a few days.

If the bug rumors are true, they can salvage the situation a bit later on, but the generation is still a massive missed opportunity for AMD.

This is all becoming a bit concerning on the Radeon front. I am wondering how MLID is going to react. He's been pretty quiet about RDNA3 for a few months now.
 

gdansk

Diamond Member
Feb 8, 2011
If it doesn't end up roughly between the 4090 and the 4080 in rasterization, then Navi 31 is probably the biggest failure since Vega.
Of course, a few early benchmarks don't mean much. And even if they are right, maybe Navi 32 can redeem RDNA3.
 

Kaluan

Senior member
Jan 4, 2022
Guessing a reviewer.
Guessing the same, but the lack of transparency is off-putting.
They're not in the business of being tech-world sources/'content creators' themselves, but of reporting on them.

Anyway, we're less than half a week away. Don't know if I wanna join the herd and lose my mind just yet 😂
 

DisEnchantment

Golden Member
Mar 3, 2017
Going through Mesa, it seems GFX11 is really bug-ridden, worse than RDNA1.

  • ALU stalls require a lot of additional instructions to be inserted to signal a switch to another wave.

  • Export conflict, likely coming from the OREO. Currently all GFX11 parts have it.

  • GFX11 has excessive hazards that need a lot of wait states and idling/NOPs to let data dependencies resolve.

  • Yikes, so many dependency issues that the HW cannot manage on its own.

## RDNA3 / GFX11 hazards

### VcmpxPermlaneHazard

Same as GFX10.

### LdsDirectVALUHazard

Triggered by:
LDSDIR instruction writing a VGPR soon after it's used by a VALU instruction.

Mitigated by:
A vdst wait, preferably using the LDSDIR's field.

### LdsDirectVMEMHazard

Triggered by:
LDSDIR instruction writing a VGPR after it's used by a VMEM/DS instruction.

Mitigated by:
Waiting for the VMEM/DS instruction to finish, a VALU or export instruction, or
`s_waitcnt_depctr 0xffe3`.
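The hex constants in these mitigations look like magic numbers, but they appear to be `s_waitcnt_depctr` immediates where every counter field is left at all-ones ("don't wait") except the one being waited on. A small Python sketch of that encoding; the bit positions in `FIELDS` are my assumption (va_vdst in bits 15:12, vm_vsrc in bits 4:2, sa_sdst in bit 0), picked because they reproduce all three constants quoted in this list, and the other depctr fields are left out:

```python
# Assumed (shift, width) for the three depctr fields used in the hazard notes.
# Not an authoritative encoding table; it only reproduces 0xffe3, 0x0fff, 0xfffe.
FIELDS = {
    "va_vdst": (12, 4),
    "vm_vsrc": (2, 3),
    "sa_sdst": (0, 1),
}

def depctr_imm(**waits: int) -> int:
    """Build a 16-bit s_waitcnt_depctr immediate.

    Every field starts at all-ones (no wait); each keyword argument
    lowers one counter to the given threshold, e.g. va_vdst=0.
    """
    imm = 0xFFFF
    for name, value in waits.items():
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        mask = ((1 << width) - 1) << shift
        imm = (imm & ~mask) | (value << shift)   # clear the field, then set it
    return imm & 0xFFFF
```

With this layout, `depctr_imm(vm_vsrc=0)` is 0xffe3 (the LdsDirectVMEMHazard wait), `depctr_imm(va_vdst=0)` is 0x0fff and `depctr_imm(sa_sdst=0)` is 0xfffe.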

### VALUTransUseHazard

Triggered by:
A VALU instruction reading a VGPR written by a transcendental VALU instruction without 6+ VALU or 2+ transcendental instructions in between.

Mitigated by:
A va_vdst=0 wait: `s_waitcnt_depctr 0x0fff`
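To make the counting rule concrete, here is a rough Python model of the VALUTransUseHazard check. The `Instr` record and its `kind` strings are hypothetical (this is not ACO's code), and I'm assuming a transcendental op also counts toward the 6+ VALU tally, since it is a VALU instruction too:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    kind: str                      # hypothetical: "valu", "trans" (transcendental VALU) or "salu"
    dst: frozenset = frozenset()   # VGPRs written
    src: frozenset = frozenset()   # VGPRs read

def is_valu(ins: Instr) -> bool:
    return ins.kind in ("valu", "trans")

def trans_use_hazard(stream, i) -> bool:
    """True if stream[i] is a VALU reading a VGPR produced by a transcendental
    op with fewer than 6 VALU and fewer than 2 transcendental ops in between."""
    cur = stream[i]
    if not is_valu(cur):
        return False
    for reg in cur.src:
        num_valu = num_trans = 0
        for prev in reversed(stream[:i]):
            if reg in prev.dst:
                if prev.kind == "trans" and num_valu < 6 and num_trans < 2:
                    return True
                break                  # nearest producer is safe; stop looking
            if is_valu(prev):
                num_valu += 1
            if prev.kind == "trans":
                num_trans += 1
            if num_valu >= 6 or num_trans >= 2:
                break                  # already far enough from any older producer
    return False
```

A trans op writing v0 followed directly by a VALU reading v0 is flagged; six independent VALUs, or two independent trans ops, in between clear it.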

### VALUPartialForwardingHazard

Triggered by:
A VALU instruction reading two VGPRs: one written before an SALU write to exec and one written after it. To trigger, there must be fewer than 3 VALU instructions between the first and second VGPR writes and fewer than 5 VALU instructions between the second VGPR write and the current instruction.

Mitigated by:
A va_vdst=0 wait: `s_waitcnt_depctr 0x0fff`
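Same idea for VALUPartialForwardingHazard, as a hypothetical Python model (again not ACO's implementation). It only looks at the nearest VALU writer of each source VGPR and treats any SALU writing `exec` between the two writes as the trigger:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Instr:
    kind: str                      # hypothetical: "valu" or "salu"
    dst: frozenset = frozenset()   # registers written ("v0", "exec", ...)
    src: frozenset = frozenset()   # registers read

def last_writer(stream, i, reg):
    """Index of the nearest instruction before i that writes reg, or None."""
    for j in range(i - 1, -1, -1):
        if reg in stream[j].dst:
            return j
    return None

def valu_between(stream, lo, hi):
    """Number of VALU instructions strictly between indices lo and hi."""
    return sum(1 for ins in stream[lo + 1:hi] if ins.kind == "valu")

def partial_forwarding_hazard(stream, i) -> bool:
    cur = stream[i]
    if cur.kind != "valu":
        return False
    vgprs = sorted(r for r in cur.src if r.startswith("v") and r[1:].isdigit())
    for a, b in combinations(vgprs, 2):
        pa, pb = last_writer(stream, i, a), last_writer(stream, i, b)
        if pa is None or pb is None or pa == pb:
            continue
        if stream[pa].kind != "valu" or stream[pb].kind != "valu":
            continue
        first, second = min(pa, pb), max(pa, pb)
        exec_written = any(ins.kind == "salu" and "exec" in ins.dst
                           for ins in stream[first + 1:second])
        if (exec_written
                and valu_between(stream, first, second) < 3
                and valu_between(stream, second, i) < 5):
            return True
    return False
```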

### VALUMaskWriteHazard

Triggered by:
SALU writing then reading an SGPR that was previously used as a lane mask by a VALU.

Mitigated by:
A VALU instruction reading an SGPR or using a literal, or an sa_sdst=0 wait: `s_waitcnt_depctr 0xfffe`
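And VALUMaskWriteHazard as a forward scan, again a hypothetical Python model. I'm assuming any of the listed mitigations clears the tracked state no matter where it lands after the VALU's lane-mask read, and that the lane-mask operand itself does not count as the "VALU reading an SGPR" mitigation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Instr:
    kind: str                       # hypothetical: "valu", "salu" or "wait_sa_sdst0"
    sdst: frozenset = frozenset()   # SGPRs written
    ssrc: frozenset = frozenset()   # SGPRs read as ordinary operands
    mask: Optional[str] = None      # SGPR a VALU reads as its lane mask
    literal: bool = False           # VALU encodes a literal constant

def mask_write_hazards(stream):
    """Indices of SALU reads that would hit VALUMaskWriteHazard in this model."""
    lane_masks = set()   # SGPRs read by a VALU as a lane mask
    rewritten = set()    # ...that an SALU has since written
    hazards = []
    for i, ins in enumerate(stream):
        mitigates = ins.kind == "wait_sa_sdst0" or (
            ins.kind == "valu" and (bool(ins.ssrc) or ins.literal))
        if mitigates:
            lane_masks.clear()
            rewritten.clear()
        if ins.kind == "salu":
            if ins.ssrc & rewritten:          # SALU reads a mask SGPR it rewrote
                hazards.append(i)
            rewritten |= ins.sdst & lane_masks
        if ins.kind == "valu" and ins.mask is not None:
            lane_masks.add(ins.mask)
    return hazards
```

In this model, inserting `s_waitcnt_depctr 0xfffe` (or a VALU with a literal) anywhere between the lane-mask use and the SALU read makes the hazard list empty.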
 

Leeea

Diamond Member
Apr 3, 2020
Heh, some of you people have really volatile emotions. It's like watching sports fans as their team goes through ups and downs throughout a season.
Yeah, the whiplash on this one is pretty brutal.


I think the 3DMark scores are legit. It is not a problem with the benchmark either. Or, more accurately, it is AMD's problem either way.


It's wait and see now. It will be rather ironic if the 4080 ends up being "good value" in comparison.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
As I've said again and again, RDNA 2 was only able to compete with Ampere because AMD had a clock speed advantage, since Samsung's dies sucked.

Now that advantage is gone, and you can see what Ampere really could have done on the same 5nm-class process as AMD.

RDNA2 had a frequency advantage, but Ampere had an advantage in the number of FP32 units, which provided ~25-30% higher gaming performance. Because of this, they were pretty comparable in raster performance, CU for SM.

Just because Ada clocks at ~2.7GHz doesn't mean Ampere would reach the same frequency on the same process. Expecting ~50% higher frequency from the node alone, without optimizing the architecture, is asking a lot. RDNA2, for example, clocked a lot higher than RDNA1 on the same process, and that was achieved purely through architectural optimization.
 

Bigos

Member
Jun 2, 2019
DisEnchantment said:
Going through Mesa, it seems GFX11 is really bug-ridden, worse than RDNA1.

  • ALU stalls require a lot of additional instructions to be inserted to signal a switch to another wave.

  • Export conflict, likely coming from the OREO. Currently all GFX11 parts have it.

  • GFX11 has excessive hazards that need a lot of wait states and idling/NOPs to let data dependencies resolve.

  • Yikes, so many dependency issues that the HW cannot manage on its own.

Other than the export bug (which I cannot quantify), all of these look like deliberate hardware simplifications that shift complexity onto the compiler. Given how shaders are compiled (mostly everything inlined into a single function), the compiler actually has enough information to schedule the instructions without much penalty. However, the added helper instructions could put more pressure on the instruction cache. This is a trade-off, and we will see how it fares in practice.
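A toy illustration of the scheduling point, using only the VALUTransUseHazard rule quoted earlier in the thread: count how many `s_waitcnt_depctr 0x0fff` waits an in-order pass would insert for two hand-picked orderings of the same ops. The `Op` record is hypothetical, real hazard tracking in ACO/LLVM covers far more cases, and I'm assuming a va_vdst=0 wait drains every outstanding transcendental result:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str                      # hypothetical: "valu" or "trans" (both are VALU ops)
    dst: frozenset = frozenset()   # VGPRs written
    src: frozenset = frozenset()   # VGPRs read

def count_waits(stream):
    """How many va_vdst=0 waits a naive in-order pass would insert."""
    pending = {}   # VGPR -> [VALU ops since its trans write, trans ops since]
    waits = 0
    for op in stream:
        if any(r in pending for r in op.src):
            waits += 1         # s_waitcnt_depctr 0x0fff goes before this op
            pending.clear()    # assumed: the wait drains every outstanding result
        for counts in pending.values():
            counts[0] += 1                            # every op here is a VALU
            counts[1] += 1 if op.kind == "trans" else 0
        # Forget results that are now far enough away or were overwritten.
        pending = {r: c for r, c in pending.items()
                   if c[0] < 6 and c[1] < 2 and r not in op.dst}
        if op.kind == "trans":
            for r in op.dst:
                pending[r] = [0, 0]
    return waits

trans = Op("trans", frozenset({"v0"}))
use = Op("valu", frozenset({"v1"}), frozenset({"v0"}))
independent = [Op("valu", frozenset({f"v{10 + k}"})) for k in range(6)]

naive = [trans, use] + independent          # consumer right behind its producer
scheduled = [trans] + independent + [use]   # independent work fills the gap
```

Here `count_waits(naive)` is 1 and `count_waits(scheduled)` is 0: filling the gap with independent work removes the helper instruction entirely, which is the freedom an inlined shader gives the compiler. When there isn't enough independent work, the waits stay and add to code size.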

Still, don't look to Mesa's ACO for GFX11 shader programming guidance just yet. It is clearly not optimized for the new architecture (the lack of VOPD support is a clear sign). LLVM is also more geared towards ROCm, but it will probably fare better than ACO in games on GFX11 at launch.
 