Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 153

adroc_thurston

Diamond Member
Jul 2, 2023
3,134
4,508
96
SMT uplift exceeds ST uplift.
No, lol.
Turin-D SMT on is just a 10% socket-level perf bump over SMT off.
The core is wider, so enabling SMT should provide a bigger performance gain than it did on Zen4, for example.
Zen3 had more EUs without a corresponding bump in OoO resources, but SMT yield went down.
Zen4 has little in the way of a raw EU count bump while having more OoO resources, yet SMT yield is up.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
There is a discrepancy, true, but look at what TDP that RTX 3050 has. Only 35W!

The 35W RTX 3050 has only a 713MHz base and 1058MHz turbo clock, which is very, very low.
The Strix Point IGP needs 20-30% higher clocks to match it, which is only 1270-1375MHz.

For comparison: RX 6550M (RDNA2) has 16 CUs, a 2560MHz boost clock, 16MB IC and 144GB/s.
At basically ~1/2 the clock speed, 102-136.5GB/s (6.4-8.53Gbps) should be enough to feed both the IGP and the CPU in Strix Point.
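
The clock and bandwidth figures above can be reproduced with a minimal sketch. The 128-bit memory bus is an assumption inferred from the quoted 102GB/s at 6.4Gbps, not something the post states, and the function names are only illustrative.

```python
# Figures quoted in the post. The 128-bit bus width is an assumption
# inferred from "102 GB/s at 6.4 Gbps"; the post does not state it.
RTX3050_35W_TURBO_MHZ = 1058
BUS_WIDTH_BITS = 128


def required_clock_mhz(base_mhz: float, uplift: float) -> float:
    """Clock needed when it must be `uplift` (e.g. 0.20) above base_mhz."""
    return base_mhz * (1.0 + uplift)


def bandwidth_gb_s(data_rate_gbps: float, bus_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s for a per-pin data rate given in Gbps."""
    return data_rate_gbps * bus_bits / 8.0


clock_range = [required_clock_mhz(RTX3050_35W_TURBO_MHZ, u) for u in (0.20, 0.30)]
bw_range = [bandwidth_gb_s(rate) for rate in (6.4, 8.53)]

print([round(c) for c in clock_range])   # MHz, low and high end
print([round(b, 1) for b in bw_range])   # GB/s, low and high end
```

This only checks the arithmetic; whether that bandwidth is actually sufficient depends on cache behaviour and CPU traffic that a peak-bandwidth figure does not capture.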

P.S. If Strix Point had enough BW, then at 2.6GHz it could even go up against the 35W RTX 4050.
That 35W 3050M scores around 4500 pts in 3DMark Time Spy graphics.

780M scores around 2800-2900 pts.

16 CU version will have at best 3-3.2 GHz core clock, and 25% more CUs.

There is no world in which 16 CU Strix Point can achieve 35W 3050M performance.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
That 35W 3050M scores around 4500 pts in 3DMark Time Spy graphics.

780M scores around 2800-2900 pts.

16 CU version will have at best 3-3.2 GHz core clock, and 25% more CUs.

There is no world in which 16 CU Strix Point can achieve 35W 3050M performance.
That's not true.

Top score for RTX 3050M is 5295 pts in 3DMark Time Spy graphics. Notebookcheck.net
Your 4500 pts is only 15% less, yet boost frequency at 35W is 40% lower than at >=80W.
That doesn't make sense.

If I calculated points based on boost frequency difference, then I would end up with 5295*0.6=3177pts.
Btw, the weakest score they have for this GPU is 3281 pts.
4500 pts is clearly for higher TGP.
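
A quick sketch of the arithmetic above, assuming (as the post does) that the Time Spy graphics score scales linearly with boost clock. Linear scaling is a simplification, and the variable names are only illustrative.

```python
TOP_SCORE = 5295          # best RTX 3050M Time Spy graphics score cited
CLAIMED_35W_SCORE = 4500  # score attributed to the 35W part
CLOCK_RATIO_35W = 0.60    # 35W boost clock is ~40% below the >=80W part

# How far the claimed 35W score sits below the top score.
deficit = 1.0 - CLAIMED_35W_SCORE / TOP_SCORE

# Score expected if performance tracked boost clock one-to-one (assumption).
clock_scaled_score = TOP_SCORE * CLOCK_RATIO_35W

print(f"claimed score is {deficit:.0%} below the top score")
print(f"clock-scaled estimate: {clock_scaled_score:.0f} pts")
```

The gap between a 15% score deficit and a 40% clock deficit is what suggests the 4500 pts result comes from a higher-TGP configuration.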

P.S. 16CU is 33% more than what Phoenix has.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
That's not true.

Top score for RTX 3050M is 5295 pts in 3DMark Time Spy graphics. Notebookcheck.net
Your 4500 pts is only 15% less, yet boost frequency at 35W is 40% lower than at >=80W.
That doesn't make sense.

If I calculated points based on boost frequency difference, then I would end up with 5295*0.6=3177pts.
Btw, the weakest score they have for this GPU is 3281 pts.
4500 pts is clearly for higher TGP.

P.S. 16CU is 33% more than what Phoenix has.
Yep, true, it's 33%. I was thinking backwards about its CU count.

I hope you are right with your calculations.
 
Reactions: TESKATLIPOKA

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
Yep, true, it's 33%. I was thinking backwards about its CU count.

I hope you are right with your calculations.
I don't guarantee anything, I just compared the specs and made some simple calculations, but maybe I underestimate how much that 16MB IC actually helps.

edit:
Another interesting thing in that screenshot you posted is that he claims the 16C Zen5 in Strix Halo is ~25% faster in CB R23 than Dragon Range, but Strix Point with 4+8 is only 35% faster than Phoenix. If this were really true, it would mean that the average clock speed for Strix Point is 28% lower than Phoenix's.
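
The 28% figure appears to follow from a simple model, sketched here under stated assumptions: MT score scales linearly with core count, all 12 Strix Point cores count equally, and the ~25% Strix Halo vs Dragon Range gap (16C vs 16C) is taken as the per-core, per-clock uplift. None of these assumptions comes from the screenshot itself.

```python
def implied_clock_ratio(mt_uplift: float, core_ratio: float, per_core_uplift: float) -> float:
    """Average clock ratio (new/old) implied by an MT uplift.

    Model (an assumption): MT score ~ cores * per-core throughput * clock.
    """
    return (1.0 + mt_uplift) / (core_ratio * (1.0 + per_core_uplift))


# Strix Point (4+8 = 12 cores) vs Phoenix (8 cores), claimed +35% in CB R23.
ratio = implied_clock_ratio(mt_uplift=0.35, core_ratio=12 / 8, per_core_uplift=0.25)
print(f"implied average clock: {ratio:.0%} of Phoenix ({1 - ratio:.0%} lower)")
```

If the compact cores deliver less per clock than the full cores, the implied clock deficit would be smaller than this estimate.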

@Glo. I know that, I meant in RX 6550M.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,725
1,342
136
No, lol.
Turin-D SMT on is just a 10% socket-level perf bump over SMT off.

That's what, half (or even less) of what Bergamo gets with the same core count? Granted SMT yield is fairly dependent on which enterprise benchmarks you use (customers will do their own benchmarks and disable for the edge cases where there's a regression), but that still sounds like a very, let's say, audacious claim.

If someone like STH eventually corroborates this, I think that's all the proof we'll need that you're legit. But if it turns out otherwise, I think that's all the proof we'll need that you aren't.

Zen3 had more EUs without a corresponding bump in OoO resources, but SMT yield went down.
Zen4 has little in the way of a raw EU count bump while having more OoO resources, yet SMT yield is up.

Which tells you that EU vs OoO is an oversimplification. Zen3 was able to get great utilization out of its additional EUs, had good branch prediction improvements, etc. (although iirc Zen 2 and Zen 3 SMT is very similar, no?). Meanwhile more L2 and looser timings in many places in Zen4 help SMT yield. And there's certainly hundreds of other changes in both cores that have some effect.

But in general, more resources that are less utilized helps SMT. And that would seem like Occam's razor for Zen 5 which has substantially more resources, to the extent that previous Zen iterations look incremental in comparison. If somehow utilization has improved despite this, everything else being equal, that would manifest as obscene 1T gains while reducing SMT yield. That would take a small miracle to pull off though. So the third option is that there are some flaws or trade-offs that are less tangible but are enormously negatively affecting SMT? Something feels a bit off about all of this if I'm being honest.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,890
347
126
Zen 5 standard would be N4 with 8 cores per CCD
Zen 5c N3 with 16 cores per CCD
Zen 6 standard N3 with 16 cores per CCD
Zen 6c N2 with 32 cores per CCD
Correct. And there will be Zen5 AM5 desktop SKUs with:

2xZen5 8C = 16C
1xZen5 8C + 1xZen5C 16C = 24C
2xZen5C 16C = 32C

Just called Lisa Su and double checked. Apparently it hasn't rippled down through the hierarchy to adroc_thurston yet.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
Correct. And there will be Zen5 AM5 desktop SKUs with:

2xZen5 8C = 16C
1xZen5 8C + 1xZen5C 16C = 24C
2xZen5C 16C = 32C

Just called Lisa Su and double checked. Apparently it hasn't rippled down through the hierarchy to adroc_thurston yet.
There is no good reason to release a 32C64T Zen5C when it's not capable of high clocks at low thread count.
It would be a very niche product.
 
Reactions: Tlh97 and yuri69

R81Z3N1

Member
Jul 15, 2017
77
24
81
I like the idea of 2xZen5C; I would love to have 32 cores on desktop, or a real 24-core desktop. I know, E-cores and stuff, Threadripper and stuff, but we need something for us peasants.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,134
4,508
96
That's what, half (or even less) of what Bergamo gets with the same core count?
Yeah.
Granted SMT yield is fairly dependent on which enterprise benchmarks you use
Well this is SIR, about as SMT-friendly as it gets really.
Genoa SMT benefit is ~30%-ish there.
Zen3 was able to get great utilization out of its additional EUs, had good branch prediction improvements, etc.
That's also true for Zen5 and every other "new core" by AMD.
If somehow utilization has improved despite this, everything else being equal, that would manifest as obscene 1T gains while reducing SMT yield
Kinda the point and also the reason why you're not really supposed to run Turin-D SMT on.
 

Kepler_L2

Senior member
Sep 6, 2020
445
1,824
106
Assuming that both the "30%+ IPC" and "no way it's going to be 30% IPC" camps are valid, there's an easy explanation:

SMT uplift exceeds ST uplift.

Building a wide core and feeding a wide core are two different matters. But to the extent that Zen5 has difficulty utilizing its wider structures in ST, it should have an easier time utilizing those structures via SMT.
Other way around.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,890
347
126
There is no good reason to release a 32C64T Zen5C when it's not capable of high clocks at low thread count.
It would be a very niche product.

It's on the optimal point on the efficiency curve. Max MT performance at lowest power consumption. Best perf/watt.

It's why they set a 170W TDP on AM5. The 16C Zen4 was just temporary. The last few extra hundred MHz on 16C when using the full 170W are pointless and just consume a lot of power for very little perf gain.

The long-term intention with the 170W on AM5 was of course preparation for more cores on Zen5 and later, with 24/32C. Quite obvious.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,725
1,342
136
Kinda the point and also the reason why you're not really supposed to run Turin-D SMT on.
Other way around.

Okay, so you two and MLID seem to be corroborating a lot of details.

But I can't quite seem to square this circle in my head with the 2x64c Turin Cinebench leak, showing only a ~15% uplift. Even if you fudge that number up to 20% or even 25% to account for early silicon/platform stuff, that number sounds extremely low if:

  • Zen5 is so much wider than Zen 4, and
  • AMD engineers pulled a minor miracle because Zen5 utilizes its resources so much better than Zen4 does that SMT uplift is somewhere between halved and thirded.
In the scenario where the above two points are accurate, you cannot say that nT performance only looks low because of low SMT yield, because the only reason you have low SMT yield is that you're already extracting that performance, and there just isn't that much left to extract. It's not a trade-off, it's a win 100% of the time.

So, what's up here? Either we're looking at the mother of all edge cases, or that specific benchmark leak is FUBARed, or SMT is FUBARed (or some combination). How do we square this circle?
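
One rough way to put numbers on this, as a sketch rather than anything taken from the leak: treat nT throughput as SMT-off throughput times (1 + SMT yield) at equal core count and clocks. The 30% and 10% yields are the SIR figures quoted earlier in the thread for Genoa and Turin-D; applying them to a Cinebench result on a different Turin part is an assumption made purely for illustration.

```python
def implied_smt_off_uplift(nt_uplift: float, old_smt_yield: float, new_smt_yield: float) -> float:
    """SMT-off uplift implied by an nT uplift and a change in SMT yield.

    Model (an assumption): nT = smt_off * (1 + smt_yield),
    with the same core count and clocks on both sides.
    """
    return (1.0 + nt_uplift) * (1.0 + old_smt_yield) / (1.0 + new_smt_yield) - 1.0


# The leaked ~15% uplift, plus the "fudged" 20% and 25% cases.
for nt in (0.15, 0.20, 0.25):
    st = implied_smt_off_uplift(nt, old_smt_yield=0.30, new_smt_yield=0.10)
    print(f"nT +{nt:.0%} -> implied SMT-off uplift +{st:.1%}")
```

Under this toy model, a 15-25% nT uplift combined with SMT yield dropping from 30% to 10% would correspond to roughly a 36-48% SMT-off uplift, so the conclusion hinges on whether those yields and that leak describe comparable workloads.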
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,134
4,508
96
But I can't quite seem to square this circle in my head with the 2x64c Turin Cinebench leak, showing only a ~15% uplift
Cinememe is a horrible benchmark for server parts. Full stop.
  • Zen5 is so much wider than Zen 4, and
  • AMD engineers pulled a small miracle and Zen5 utilizes its resources so much better than Zen4 does that SMT uplift is somewhere between halved and thirded.
It's not a miracle, you just cover it in generous amounts of OoO and BP and scheduling resources.
How do we square this circle?
It's a single-threaded core with SMT tacked on, or you should at least treat it as such.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
It's on the optimal point on the efficiency curve. Max MT performance at lowest power consumption. Best perf/watt.

It's why they set a 170W TDP on AM5. The 16C Zen4 was just temporary. The last few extra hundred MHz on 16C when using the full 170W are pointless and just consume a lot of power for very little perf gain.

The long-term intention with the 170W on AM5 was of course preparation for more cores on Zen5 and later, with 24/32C. Quite obvious.
The point is which core you choose for that 32-core CPU.
Zen5C is not good for desktop.
If it were a combination of 8 Zen5 + 16 Zen5C, then it could be interesting for MT, but not pure Zen5C.

Zen5C most likely will clock a lot lower than standard Zen5.
16 Zen5 vs 32 Zen5C
It would lose horribly in programs using only 16 threads.
This CPU would be good only for those who can use all of it.
So a very niche product.
 
Reactions: Tlh97

HurleyBird

Platinum Member
Apr 22, 2003
2,725
1,342
136
Cinememe is a horrible benchmark for server parts. Full stop.

Have we seen different perf/clock uplifts in this benchmark between server and consumer parts in the past for new architectures? To the level of >=~2x the difference?

It's a single-threaded core with SMT tacked on, or you should at least treat it as such.

That doesn't square the circle though. Lowering SMT yield won't lower MT performance when the reason for less SMT yield is that the architecture is so super-awesome that SMT only has scraps left to work with. Kinda the opposite.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,134
4,508
96
Have we seen different perf/clock uplifts in this benchmark between server and consumer parts in the past for new architectures?
It just doesn't scale to 64c*2p systems.
Lowering SMT yield won't lower MT performance
Yeah, it would, but it really depends on the workload.
We need leakz
Go to China, buy a candy van, trap a random Lenovo validation eng in said van, and maybe you'll get something out of it.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,721
14,749
136
This is more of a comment, but also a little bit of a question. And yes, I know about OEMs and idiot IT managers, BUT...

With AMD as king in performance, perf/watt and perf/$$ for at least 4 years, and with Genoa, Genoa-X and Bergamo crushing anything Intel has or is going to release soon, how is it that they STILL can't get more market share? How long can Intel's name keep them selling their crap server parts this far down the line? This many years with crap?

Edit: not to mention Zen 5 and Turin.....
 

Geddagod

Golden Member
Dec 28, 2021
1,184
1,144
106
Zen5C most likely will clock a lot lower than standard Zen5.
16 Zen5 vs 32 Zen5C
It would lose horribly in programs using only 16 threads.
This CPU would be good only for those who can use all of it.
So a very niche product.
Oh yeah, forgot to comment, and it's slightly tangential, but from Zen4C testing, it appears as if it doesn't win perf/clock against Zen 4 anywhere on the frequency curve, despite what AMD said. Implementing the -C cores on a better node might help differentiate them.
and with Genoa, Genoa-X and Bergamo crushing anything Intel has or is going to release soon,
GNR?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,409
2,904
136
Oh yeah, forgot to comment, and it's slightly tangential, but from Zen4C testing, it appears as if it doesn't win perf/clock against Zen 4 anywhere on the frequency curve, despite what AMD said. Implementing the -C cores on a better node might help differentiate them.
Who tested it? Link, please.
 