Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

adroc_thurston · Sep 29, 2023

HurleyBird said:
SMT uplift exceeds ST uplift.

No, lol.
Turin-D SMT on is just a 10% socket-level perf bump over SMT off.

TESKATLIPOKA said:
The core is wider, so enabling SMT should provide a bigger performance gain than it did for Zen4, for example.

Zen3 had more EUs without a corresponding bump in OoO resources, but SMT yield went down.
Zen4 has little in the way of a raw EU count bump while having more OoO resources, yet SMT yield is up.

Glo. · Sep 29, 2023

TESKATLIPOKA said:
There is discrepancy, true, but look what TDP that RTX 3050 has. Only 35W!

The 35W RTX 3050 has only a 713MHz base and 1058MHz turbo, that's very, very low.
The Strix Point IGP needs 20-30% higher clocks to match it, that's only 1270-1375MHz.

For comparison: RX 6550M(RDNA2) has 16CU and 2560MHz boost, 16MB IC and 144GB/s.
At basically ~1/2 the clock speed, 102-136.5GB/s (6.4-8.53Gbps) should be enough to feed both the IGP and CPU in Strix Point.

P.S. If Strix Point had enough BW, then at 2.6GHz it could even go up against the 35W RTX 4050.

That 35W 3050M scores around 4500 pts in 3DMark Time Spy graphics.

780M scores around 2800-2900 pts.

16 CU version will have at best 3-3.2 GHz core clock, and 25% more CUs.

There is no world in which 16 CU Strix Point can achieve 35W 3050M performance.

randomhero · Sep 29, 2023

adroc_thurston said:
It kinda does, 25% socket power bump on the same platform is pretty major.
But the perf signs.

Huh.... Well, 25% more power for 33% more cores and 25-ish% (on average) more "IPC" is a good tradeoff in my book.
Guess we will have to wait for Zen6 for a substantial ppw improvement with a complete SoC/package overhaul.
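
For anyone who wants to see that tradeoff as numbers, here is a rough back-of-envelope sketch. It assumes throughput scales linearly with both core count and "IPC" at unchanged clocks, which is an idealization and not something the post claims, so treat the result as an upper bound.

```python
# Back-of-envelope perf/W estimate from the figures quoted in the post.
power_ratio = 1.25   # 25% more socket power
core_ratio = 4 / 3   # 33% more cores
ipc_ratio = 1.25     # ~25% more "IPC" on average

# Assumption: throughput scales linearly with cores and IPC at unchanged clocks.
perf_ratio = core_ratio * ipc_ratio
ppw_ratio = perf_ratio / power_ratio

print(f"throughput: {perf_ratio:.2f}x, perf/W: {ppw_ratio:.2f}x")
# prints "throughput: 1.67x, perf/W: 1.33x"
```

Real socket-level scaling will land below this once clocks, memory bandwidth, and uncore power are factored in.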

adroc_thurston · Sep 29, 2023

randomhero said:
Well, 25% more power for 33% more cores and 25-ish% (on average) more "IPC" is a good tradeoff in my book.

Wow you've almost nailed Turin perf.
Congrats.

randomhero said:
Guess we will have to wait for Zen6 for a substantial ppw improvement with a complete SoC/package overhaul.

Venice is silly expensive.
Will be funny but also miserable.

TESKATLIPOKA · Sep 29, 2023

Glo. said:
That 35W 3050M scores around 4500 pts in 3DMark Time Spy graphics.

780M scores around 2800-2900 pts.

16 CU version will have at best 3-3.2 GHz core clock, and 25% more CUs.

There is no world in which 16 CU Strix Point can achieve 35W 3050M performance.

That's not true.

Top score for RTX 3050M is 5295 pts in 3DMark Time Spy graphics. Notebookcheck.net
Your 4500 pts is only 15% less, yet boost frequency at 35W is 40% lower than at >=80W.
That doesn't make sense.

If I calculated points based on boost frequency difference, then I would end up with 5295*0.6=3177pts.
Btw, the weakest score they have for this GPU is 3281 pts.
4500 pts is clearly for higher TGP.

P.S. 16CU is 33% more than what Phoenix has.
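
The arithmetic above can be reproduced with a short sketch. The linear score-versus-boost-clock scaling is the same naive assumption used in the post, and the helper name `scale_by_clock` is made up for illustration.

```python
def scale_by_clock(score, clock_ratio):
    """Naive estimate: assume the score scales linearly with boost clock."""
    return score * clock_ratio

top_score = 5295      # best RTX 3050M Time Spy graphics score cited above
claimed_35w = 4500    # score claimed for the 35W variant
clock_ratio = 0.6     # 35W boost clock is ~40% lower than at >=80W

estimate = scale_by_clock(top_score, clock_ratio)
deficit = 1 - claimed_35w / top_score

print(f"clock-scaled estimate: {estimate:.0f} pts")
print(f"4500 pts is {deficit:.0%} below the top score")
```

A 15% score deficit against a 40% clock deficit is the mismatch being pointed out; real GPUs rarely scale perfectly with clock, so the 3177 pts figure is only a rough floor.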

Glo. · Sep 29, 2023

TESKATLIPOKA said:
That's not true.

Top score for RTX 3050M is 5295 pts in 3DMark Time Spy graphics. Notebookcheck.net
Your 4500 pts is only 15% less, yet boost frequency at 35W is 40% lower than at >=80W.
That doesn't make sense.

If I calculated points based on boost frequency difference, then I would end up with 5295*0.6=3177pts.
Btw, the weakest score they have for this GPU is 3281 pts.
4500 pts is clearly for higher TGP.

P.S. 16CU is 33% more than what Phoenix has.

Yep, true, it's 33%. I was thinking backwards about its CU count.

I hope you are right with your calculations.

TESKATLIPOKA · Sep 29, 2023

Glo. said:
Yep, true, it's 33%. I was thinking backwards about its CU count.

I hope you are right with your calculations.

I don't guarantee anything, I just compared the specs and made some simple calculations, but maybe I underestimate how much that 16MB IC actually helps.

edit:
Another interesting thing in that screenshot you posted is that he claims 16C Zen5 in Strix Halo is ~25% faster in CB R23 than Dragon Range, but Strix Point with 4+8 is only 35% faster than Phoenix. If this was really true, then it would mean that the average clock speed for Strix Point is 28% lower compared to Phoenix.
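
Here is a sketch of how the 28% figure falls out. It assumes the ~25% Strix Halo gain over Dragon Range (both 16C) is a per-core uplift at similar clocks, that Phoenix has 8 cores, that the 4+8 cores all count equally, and that CB R23 scales linearly with core count. These are simplifications for illustration only.

```python
phoenix_cores = 8
strix_point_cores = 4 + 8     # 12 cores total
per_core_uplift = 1.25        # assumption: 16C Strix Halo vs 16C Dragon Range gain is per-core
claimed_mt_uplift = 1.35      # Strix Point vs Phoenix in CB R23

# Expected multi-thread gain if Strix Point ran at Phoenix clocks.
expected_at_equal_clock = (strix_point_cores / phoenix_cores) * per_core_uplift

# Clock ratio needed to explain the smaller claimed gain.
implied_clock_ratio = claimed_mt_uplift / expected_at_equal_clock

print(f"expected at equal clocks: {expected_at_equal_clock:.3f}x")
print(f"implied average clock: {implied_clock_ratio:.0%} of Phoenix")
```

At equal clocks the expected gain would be 1.875x, so a 1.35x result implies clocks at about 72% of Phoenix, i.e. 28% lower.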

@Glo. I know that, I meant in RX 6550M.

Glo. · Sep 29, 2023

TESKATLIPOKA said:
I don't guarantee anything, I just compared the specs and made some simple calculations, but maybe I underestimate how much that 16MB IC actually helps.

Uh...

Standard Strix Point, non-Halo will NOT have Infinity Cache.

HurleyBird · Sep 29, 2023

adroc_thurston said:
No, lol.
Turin-D SMT on is just a 10% socket-level perf bump over SMT off.

That's what, half (or even less) of what Bergamo gets with the same core count? Granted SMT yield is fairly dependent on which enterprise benchmarks you use (customers will do their own benchmarks and disable for the edge cases where there's a regression), but that still sounds like a very, let's say, audacious claim.

If someone like STH eventually corroborates this, I think that's all the proof we'll need that you're legit. But if it turns out otherwise, I think that's all the proof we'll need that you aren't.

adroc_thurston said:
Zen3 had more EUs without a corresponding bump in OoO resources, but SMT yield went down.
Zen4 has little in the way of a raw EU count bump while having more OoO resources, yet SMT yield is up.

Which tells you that EU vs OoO is an oversimplification. Zen3 was able to get great utilization out of its additional EUs, had good branch prediction improvements, etc. (although iirc Zen 2 and Zen 3 SMT is very similar, no?). Meanwhile more L2 and looser timings in many places in Zen4 help SMT yield. And there's certainly hundreds of other changes in both cores that have some effect.

But in general, more resources that are less utilized helps SMT. And that would seem like Occam's razor for Zen 5 which has substantially more resources, to the extent that previous Zen iterations look incremental in comparison. If somehow utilization has improved despite this, everything else being equal, that would manifest as obscene 1T gains while reducing SMT yield. That would take a small miracle to pull off though. So the third option is that there are some flaws or trade-offs that are less tangible but are enormously negatively affecting SMT? Something feels a bit off about all of this if I'm being honest.

Fjodor2001 · Sep 29, 2023

Joe NYC said:
Zen 5 standard would be N4 with 8 cores per CCD
Zen 5c N3 with 16 cores per CCD
Zen 6 standard N3 with 16 cores per CCD
Zen 6c N2 with 32 cores per CCD

Correct. And there will be Zen5 AM5 desktop SKUs with:

2xZen5 8C = 16C
1xZen5 8C + 1xZen5C 16C = 24C
2xZen5C 16C = 32C

Just called Lisa Su and double checked. Apparently it hasn't rippled down through the hierarchy to adroc_thurston yet.

TESKATLIPOKA · Sep 29, 2023

Fjodor2001 said:
Correct. And there will be Zen5 AM5 desktop SKUs with:

2xZen5 8C = 16C
1xZen5 8C + 1xZen5C 16C = 24C
2xZen5C 16C = 32C

Just called Lisa Su and double checked. Apparently it hasn't rippled down through the hierarchy to adroc_thurston yet.

There is no good reason to release a 32C64T Zen5C, when it's not capable of high clocks at low thread count.
It would be a very niche product.

R81Z3N1 · Sep 29, 2023

I like the idea of 2x Zen5C; I would love to have 32 cores on desktop. Or a real 24-core desktop. I know, e-cores and stuff, Threadripper and stuff, but we need something for us peasants.

adroc_thurston · Sep 29, 2023

HurleyBird said:
That's what, half (or even less) of what Bergamo gets with the same core count?

Yeah.

HurleyBird said:
Granted SMT yield is fairly dependent on which enterprise benchmarks you use

Well this is SIR, about as SMT-friendly as it gets really.
Genoa SMT benefit is ~30%-ish there.

HurleyBird said:
Zen3 was able to get great utilization out of its additional EUs, had good branch prediction improvements, etc.

That's also true for Zen5 and every other "new core" by AMD.

HurleyBird said:
If somehow utilization has improved despite this, everything else being equal, that would manifest as obscene 1T gains while reducing SMT yield

Kinda the point and also the reason why you're not really supposed to run Turin-D SMT on.

Kepler_L2 · Sep 29, 2023

HurleyBird said:
Assuming that both the "30%+ IPC" and "no way it's going to be 30% IPC" camps are valid, there's an easy explanation:

SMT uplift exceeds ST uplift.

Building a wide core and feeding a wide core are two different matters. But to the extent that Zen5 has difficulty utilizing its wider structures in ST, it should have an easier time utilizing those structures via SMT.

Other way around.

Fjodor2001 · Sep 29, 2023

TESKATLIPOKA said:
There is no good reason to release a 32C64T Zen5C, when it's not capable of high clocks at low thread count.
It would be a very niche product.

It's on the optimal point on the efficiency curve. Max MT performance at lowest power consumption. Best perf/watt.

It's why they set a 170W TDP on AM5. The 16C Zen4 was just temporary. The last few extra 100 MHz on 16C when using the full 170W are pointless and just consume a lot of power for very little perf gain.

The long-term intention with the 170W on AM5 was of course preparation for more cores on Zen5 and later, with 24/32C. Quite obvious.

HurleyBird · Sep 29, 2023

adroc_thurston said:
Kinda the point and also the reason why you're not really supposed to run Turin-D SMT on.

Kepler_L2 said:
Other way around.

Okay, so you two and MLID seem to be corroborating a lot of details.

But I can't quite seem to square this circle in my head with the 2x64c Turin Cinebench leak, showing only a ~15% uplift. Even if you fudge that number up to 20% or even 25% to account for early silicon/platform stuff, that number sounds extremely low if:

Zen5 is so much wider than Zen 4, and
AMD engineers pulled a minor miracle because Zen5 utilizes its resources so much better than Zen4 does that SMT uplift is somewhere between halved and thirded.

In the scenario where the above two points are accurate, you cannot say that nT performance only looks low because of low SMT yield, because the only reason you have low SMT yield is that you're already extracting that performance, and there just isn't that much left to extract. It's not a trade-off, it's a win 100% of the time.

So, what's up here? Either we're looking at the mother of all edge cases, or that specific benchmark leak is FUBARed, or SMT is FUBARed (or some combination). How do we square this circle?

adroc_thurston · Sep 29, 2023

HurleyBird said:
But I can't quite seem to square this circle in my head with the 2x64c Turin Cinebench leak, showing only a ~15% uplift

Cinememe is a horrible benchmark for server parts. Full stop.

HurleyBird said:
Zen5 is so much wider than Zen 4, and

AMD engineers pulled a small miracle and Zen5 utilizes its resources so much better than Zen4 does that SMT uplift is somewhere between halved and thirded.

It's not a miracle, you just cover it in generous amounts of OoO and BP and scheduling resources.

HurleyBird said:
How do we square this circle?

It's a single-threaded core with SMT tacked on, or you should at least treat it as such.

TESKATLIPOKA · Sep 29, 2023

Fjodor2001 said:
It's on the optimal point on the efficiency curve. Max MT performance at lowest power consumption. Best perf/watt.

It's why they set a 170W TDP on AM5. The 16C Zen4 was just temporary. The last few extra 100 MHz on 16C when using the full 170W are pointless and just consume a lot of power for very little perf gain.

The long-term intention with the 170W on AM5 was of course preparation for more cores on Zen5 and later, with 24/32C. Quite obvious.

The point is which core you choose for that 32-core CPU.
Zen5C is not good for desktop.
If it was a combination of 8 Zen5 + 16 Zen5C, then it could be interesting for MT, but not pure Zen5C.

Zen5C most likely will clock a lot lower than standard Zen5.
16 Zen5 vs 32 Zen5C
It would lose horribly in programs using only 16 threads.
This CPU would be good only for those who can use all of it.
So a very niche product.
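
A toy model of that comparison, as a sketch only. The two clock values are hypothetical placeholders picked for illustration (nothing leaked or measured), and the model ignores SMT, IPC differences, and power limits.

```python
def throughput(cores, clock_ghz, threads):
    """Relative throughput: busy cores times clock, ignoring SMT, IPC
    differences, and power limits (all simplifying assumptions)."""
    return min(cores, threads) * clock_ghz

# Hypothetical all-core clocks, chosen only for illustration.
ZEN5_CLOCK = 5.0    # 16-core standard Zen5 part
ZEN5C_CLOCK = 3.5   # 32-core Zen5C part

for threads in (16, 32):
    classic = throughput(16, ZEN5_CLOCK, threads)
    dense = throughput(32, ZEN5C_CLOCK, threads)
    print(f"{threads} threads: 32C Zen5C / 16C Zen5 = {dense / classic:.2f}")
```

With these placeholder clocks the 32C part trails at 16 threads and only pulls ahead once all 32 cores are busy, which is the niche being described. In this model it wins at full load only if its clock is more than half the 16C part's clock.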

HurleyBird · Sep 29, 2023

adroc_thurston said:
Cinememe is a horrible benchmark for server parts. Full stop.

Have we seen different perf/clock uplifts in this benchmark between server and consumer parts in the past for new architectures? To the level of >=~2x the difference?

adroc_thurston said:
It's a single-threaded core with SMT tacked on, or you should at least treat it as such.

That doesn't square the circle though. Lowering SMT yield won't lower MT performance when the reason for less SMT yield is that the architecture is so super-awesome that SMT only has scraps left to work with. Kinda the opposite.

yuri69 · Sep 29, 2023

Details are out; reportedly the samples have been out for some time. We need leakz. Preferably GB.

adroc_thurston · Sep 29, 2023

HurleyBird said:
Have we seen different perf/clock uplifts in this benchmark between server and consumer parts in the past for new architectures?

It just doesn't scale to 64c*2p systems.

HurleyBird said:
Lowering SMT yield won't lower MT performance

Yeah it would but really depends on the workload.

yuri69 said:
We need leakz

Go to China, buy a candy van, trap a random Lenovo validation eng in said van, and maybe you'll get something out of it.

Markfw · Sep 29, 2023

This is more of a comment, but also a little bit of a question. And yes, I know about OEMs and idiot IT managers, BUT...

With AMD as king in performance, perf/watt, and perf/$$ for at least 4 years, and with Genoa, Genoa-X, and Bergamo crushing anything Intel has or is going to release soon, how is it that they STILL can't get more market share? How long can Intel's name keep them selling their crap server parts this far down the line? This many years with crap?

Edit: not to mention Zen 5 and Turin.....

Geddagod · Sep 29, 2023

TESKATLIPOKA said:
Zen5C most likely will clock a lot lower than standard Zen5.
16 Zen5 vs 32 Zen5C
It would lose horribly in programs using only 16 threads.
This CPU would be good only for those who can use all of it.
So a very niche product.

Oh ye, forgot to comment, and slightly tangential, but from Zen4C testing, it appears as if it doesn't win perf/clock against Zen 4 anywhere on the frequency curve despite what AMD said. Implementing the -C cores on a better node might help differentiate them.

Markfw said:
and with Genoa, Genoa-X, and Bergamo crushing anything Intel has or is going to release soon,

GNR?

adroc_thurston · Sep 29, 2023

Markfw said:
how is it that they STILL can't get more market share

Channel sales hard.

Geddagod said:
it appears as if it doesn't win perf/clock against Zen 4 anywhere on the frequency curve

Not made to, it's a power/area play.

TESKATLIPOKA · Sep 29, 2023

Geddagod said:
Oh ye, forgot to comment, and slightly tangential, but from Zen4C testing, it appears as if it doesn't win perf/clock against Zen 4 anywhere on the frequency curve despite what AMD said. Implementing the -C cores on a better node might help differentiate them.

Who tested it? Link, please.
