Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Philste · Jan 5, 2024

uzzi38 said:
I'll preface this by saying I don't know Strix Point perf numbers, but I still wouldn't expect a 50% bump over PHX personally.

I'm expecting more along the lines of ~25-30% perf at best. Memory bandwidth is the concern really, PHX seems massively mem bw bound by 28w.

Don't hate me for this, but right now I wonder if Strix' iGPU will be any faster than Phoenix. If Strix really doesn't have a bigger GPU Cache (which is likely, because Microsoft wants big AIE) I think it will be completely memory bound.

I mean look at Phoenix:
7600 is ~8-10% faster than 6650XT at mostly same clocks in most reviews, so that's what RDNA3 delivers over RDNA2. 780M clocks ~17% (2.8GHz vs 2.4GHz) higher than 680M. So expected performance uplift of 780M would be in the 25-30% region. However, with same RAM it's barely 10% faster, more like 5%. If Strix just dumps 2 more WGPs in there I don't see how it is any faster than 780M at ISO RAM.
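
The estimate above can be reproduced as a quick back-of-envelope calculation. This is only a sketch using the rounded figures from the post (8-10% architectural gain, 2.8GHz vs 2.4GHz); the variable names are made up for illustration, and it assumes the two gains simply multiply, which real workloads will not do exactly.

```python
# Rounded figures quoted in the post; treat them as rough inputs, not measurements.
arch_gain_low, arch_gain_high = 1.08, 1.10  # 7600 vs 6650XT at similar clocks
clock_680m_ghz = 2.4
clock_780m_ghz = 2.8

# Clock ratio of 780M over 680M (about 1.17, i.e. roughly a 17% increase).
clock_gain = clock_780m_ghz / clock_680m_ghz

# Assumption: architectural and clock gains compound multiplicatively.
expected_low = arch_gain_low * clock_gain
expected_high = arch_gain_high * clock_gain

print(f"clock gain: {(clock_gain - 1) * 100:.1f}%")
print(f"expected uplift: {(expected_low - 1) * 100:.1f}% to {(expected_high - 1) * 100:.1f}%")
```

Both ends of the range land inside the 25-30% region mentioned above, so a measured 5-10% gain at the same RAM speed falls well short of what compute alone would predict.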

FlameTail · Jan 5, 2024

Philste said:
Don't hate me for this, but right now I wonder if Strix' iGPU will be any faster than Phoenix. If Strix really doesn't have a bigger GPU Cache (which is likely, because Microsoft wants big AIE) I think it will be completely memory bound.

I mean look at Phoenix:
7600 is ~8-10% faster than 6650XT at mostly same clocks in most reviews, so that's what RDNA3 delivers over RDNA2. 780M clocks ~17% (2.8GHz vs 2.4GHz) higher than 680M. So expected performance uplift of 780M would be in the 25-30% region. However, with same RAM it's barely 10% faster, more like 5%. If Strix just dumps 2 more WGPs in there I don't see how it is any faster than 780M at ISO RAM.

Strix Point vs Phoenix
225 mm² vs 178 mm²
LPDDR5X-8533 vs LPDDR5X-7500

There's a minor uplift in RAM bandwidth. Also they are adding 47 mm² of silicon. Surely, that's enough to squeeze in a bigger GPU cache?
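
To put numbers on "minor uplift" and "47 mm²", here is a small sketch using only the figures listed above. It assumes both chips use the same memory bus width, so the bandwidth uplift reduces to the ratio of transfer rates; the variable names are illustrative.

```python
# Figures from the comparison above.
stx_area_mm2, phx_area_mm2 = 225, 178
stx_rate_mts, phx_rate_mts = 8533, 7500  # LPDDR5X transfer rates in MT/s

# Extra silicon, absolute and relative.
area_delta_mm2 = stx_area_mm2 - phx_area_mm2
area_uplift = area_delta_mm2 / phx_area_mm2

# Assumption: same bus width on both parts, so peak bandwidth scales with transfer rate.
rate_uplift = stx_rate_mts / phx_rate_mts - 1

print(f"extra area: {area_delta_mm2} mm² (+{area_uplift * 100:.1f}%)")
print(f"peak bandwidth uplift: +{rate_uplift * 100:.1f}%")
```

So the die grows by roughly a quarter while peak memory bandwidth grows by roughly an eighth, and only if the OEM actually ships the fastest supported RAM.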

Philste · Jan 5, 2024

FlameTail said:
LPDDR5X-8533 vs LPDDR5X-7500

That's what it supports, but the OEM decides what's in there in the end.

FlameTail said:
There's a minor uplift in RAM bandwidth. Also they are adding 47 mm² of silicon. Surely, that's enough to squeeze in a bigger GPU cache?

ZEN5 is a lot wider than ZEN4; there's a reason it uses a 4+8 design, which probably takes about the same space as an 8-core ZEN5 design. 50% more L2, 50% more L3, 2 more WGPs that may also be bigger because of RDNA3.5. Probably more/more modern IO (PCIe 5.0?). Bigger AIE. That's a lot to fit in those 47mm², don't you think? It's N4 vs N4(P?) after all.

Hitman928 · Jan 5, 2024

ikjadoon said:
Defining real-world benches might be helpful for this discussion?

//

TweakTown also measured a +9% IPC advantage for Raptor Lake versus Zen4:

Tweakers.net found about the same:

Overclockers tested a few more:

Not to jump too much into this as I actually put more value in spec_int, but your real world application comparison actually favors Zen4. You have Cinebench r20 three times, Cinebench r23 once, then 2 other tests. So basically it shows that RPL leads Zen4 in Cinebench by about 10% but then Zen4 leads in 7Zip by about 10% and the "tie breaker" would be Wprime where Zen4 leads by about 20%. This leaves you with Zen4 being 6% faster per clock on average across the 3 different workloads. Even if you count CB R20 and R23 separately, Zen4 would still be about 2% faster on average in the shown tests.
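
The averaging described above can be sketched with a geometric mean of per-clock ratios. The inputs below are the rounded figures from the post (10%, 10%, 20%), not the underlying review data, so the exact results will differ from anything computed on the real scores.

```python
from math import prod


def geomean(ratios):
    """Geometric mean, the usual way to average performance ratios."""
    return prod(ratios) ** (1 / len(ratios))


# Zen4 / RPL per-clock ratios; values above 1.0 favour Zen4.
cinebench = 1 / 1.10  # RPL leads by about 10%
seven_zip = 1.10      # Zen4 leads by about 10%
wprime = 1.20         # Zen4 leads by about 20%

# Cinebench counted once, as one of three workloads.
three_way = geomean([cinebench, seven_zip, wprime])

# Cinebench R20 and R23 counted as two separate workloads.
four_way = geomean([cinebench, cinebench, seven_zip, wprime])

print(f"3 workloads: Zen4 {(three_way - 1) * 100:+.1f}%")
print(f"4 workloads: Zen4 {(four_way - 1) * 100:+.1f}%")
```

With these rounded inputs the result comes out at roughly 6% and 2% in Zen4's favour, in line with the figures quoted in the post, and it shows how much the outcome depends on how often Cinebench is counted.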

StefanR5R · Jan 5, 2024

Strix Point:

FlameTail said:
https://x.com/All_The_Watts/status/1708791849652273180?s=20

STX
TSMC N4P 225mm²
4c Zen 5 L3: 16 MB L2: 4 MB
8c Zen 5C L3: 16 MB L2: 8 MB
8 WGP RDNA3+
64 AIE tile
DDR5-5600 / LPDDR5X-8533
28-35+ W

Philste said:
Got corrected immediately to 8MB L3 for the ZEN5c CCX. So the 4 ZEN5 Cores get 16MB L3, the 8 ZEN5c Cores get 8MB L3.

Tigerick said:
Hmm, if that is correction, then total L3 cache of STX is 24MB....50% larger than PHX

Describing two separate caches by the sum of their sizes is alright when you discuss just area, but functionally that's a rather idealized quantification and overstates the usefulness of these caches in many relevant scenarios. Personally I would expect the designers of a mobile CPU to go for a unified last level cache. But who knows, CPUs with stranger properties have been released before.

misuspita · Jan 5, 2024

FlameTail said:
256 bit + LPDDR5X 8533.

= 273 GB/s

And will this be enough? The 6800M, which has the same 40 CUs, gets up to 384GB/s as per AMD.

That's why I asked in the first place. A chip like this that isn't properly fed is a bit of a waste. Unless I'm missing something.
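
For reference, the two bandwidth figures can be derived from bus width and transfer rate. This is a sketch: the helper name is made up, and the 6800M configuration (192-bit GDDR6 at 16000 MT/s) is an assumption chosen because it matches the 384GB/s figure cited above.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000


# 256-bit LPDDR5X-8533, as quoted above.
halo_gbs = peak_bandwidth_gbs(256, 8533)

# Assumed 6800M memory setup: 192-bit GDDR6 at 16000 MT/s.
rx6800m_gbs = peak_bandwidth_gbs(192, 16000)

print(f"256-bit LPDDR5X-8533: {halo_gbs:.0f} GB/s")
print(f"6800M (assumed config): {rx6800m_gbs:.0f} GB/s")
print(f"ratio: {halo_gbs / rx6800m_gbs:.2f}")
```

On these numbers the APU would have about 70% of the dGPU's peak bandwidth, and it has to share that with the CPU cores.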

jpiniero · Jan 5, 2024

misuspita said:
And will this be enough? The 6800M, which has the same 40 CUs, gets up to 384GB/s as per AMD.

That's why I asked in the first place. A chip like this that isn't properly fed is a bit of a waste. Unless I'm missing something.

Clocks may not be that high on Strix Halo. Depends on how high the power draw is going to go.

rtxtwt · Jan 5, 2024

We got the Zen6 DT codename before the Zen5 release

https://twitter.com/x/status/1742741127697166714

igor_kavinski said:
Why smallish amount? Some production related hiccups?

I don't know, there's been no new information about production recently. But some gossip suggests Zen5 would use the XDNA2 architecture.

misuspita · Jan 5, 2024

jpiniero said:
Clocks may not be that high on Strix Halo. Depends on how high the power draw is going to go.

How high can they go? CPU wise, 40-65W is about enough... For GPU, sky's the limit. I wish they would do 100W at least, but I admit I have no idea if it is a mobile only or mixed desktop mobile part

I am really curious about this one, because if it really is that powerful GPU-wise, it's an instabuy for me. My current 5700G can chug along until 2025 just fine

qmech · Jan 5, 2024

cortexa99 said:
We got the Zen6 DT codename before the Zen5 release

https://twitter.com/x/status/1742741127697166714

I don't know, there's been no new information about production recently. But some gossip suggests Zen5 would use the XDNA2 architecture.

I am not sure I would call the leaked roadmap "gossip". It has been accurate as far as Hawk Point is concerned.

A leaked roadmap is not as good a source as an official roadmap, but still somewhat above "gossip".

Joe NYC · Jan 5, 2024

Kolifloro said:
My first message here ... hi everybody !!!

I've been reading you silently for several months ... and ... I admit it : I am an enthusiast ignorant ... BUT , I would like to know ...

My question is :

I would like to know whether AMD were able to 'roll-out' Strix Point at ... let's say May ... why would they do it on October instead ??

I can think of 'commercial reasons' ... to give Hawk Point some time to sell ...

On the other hand ... from my lack of knowledge, I think the sooner Strix Point (Zen 5) hits retailers' shelves ... the more market share AMD will take away from Intel ... at least as far as the laptop market is concerned.

Please ... shed some 'light' ...

Regards.

I think the answer to this (Strix Point) is different from, say, Zen 5 desktop. Zen 5 desktop can be put in the retail box and shipped right away when it is ready. A notebook chip is sent to OEMs, and then the OEMs have to be ready with their notebooks before anything can ship.

But the broader question is a valid one. Tom of MLID raised it in a recent podcast: with Zen 4 being so strong, what is the reason to release Zen 5?

I think delaying a new (stronger) product is a mistake in a competitive environment. Releasing it puts the product into a higher competitive slot, which should lead to higher sales and higher revenue. Postponing the release would forgo this advantage for a period of time, which is equivalent to leaving money on the table.

Joe NYC · Jan 5, 2024

misuspita said:
Soo... how's AMD gonna feed so many hungry Strix Halo CPU, GPU and AI cores?

I am guessing Strix Point will be able to utilize the highest speed LPDDR5x, namely 8533.

Edit: seems like 10 people replied to this ahead of me.

jpiniero · Jan 5, 2024

misuspita said:
How high can they go? CPU wise, 40-65W is about enough... For GPU, sky's the limit. I wish they would do 100W at least, but I admit I have no idea if it is a mobile only or mixed desktop mobile part

Mobile only but there might be some (pricey) NUCs.

misuspita · Jan 5, 2024

I don't mind pricey if it's also silent(y). A 13700 + 4070 is around 2k now, so if they manage to sell a miniPC at 1000-1500, I'd buy it. Also, if they can cool that combo, they could, potentially, cool a 180W NUC

TESKATLIPOKA · Jan 5, 2024

StefanR5R said:
Describing two separate caches by the sum of their sizes is alright when you discuss just area, but functionally that's a rather idealized quantification and overstates the usefulness of these caches in many relevant scenarios. Personally I would expect the designers of a mobile CPU to go for a unified last level cache. But who knows, CPUs with stranger properties have been released before.

L3 cache is not separate, I don't know why you think it's separate.

ikjadoon · Jan 5, 2024

StefanR5R said:
@ikjadoon, why are you first asking @Markfw to define real-world benchmarks, then leave the real world and provide Cinebench 1T figures? The Cinebench benchmark measures performance of a rendering engine which, in the real world, is used for widely parallel problems and, importantly, in its much more effective GPGPU implementation, not in the plain CPU implementation anymore these days.

I understand that @Markfw was primarily referring to fully parallel scientific computing. E.g. molecular dynamics, telescope data processing, number-theoretical transforms... That's often n×1-copies×threads, sometimes n×m-copies×threads. I am running such stuff myself and occasionally implemented actual reproducible benchmarks based on selected workunits of such science tasks.

...

Anyway; this side discussion was started with a statement that Intel had "better IPC" (without further qualification) and the implication that this is one of the aspects why the poster thought that AMD should offer Zen 5 based products as soon as they can. Meanwhile we have seen that there are indeed situations in which higher clock-normalized performance is observed on Intel CPUs, while on the other hand everybody can have a different opinion on how much this fact can or should influence AMD's time-to-market efforts. :-)

That hits the nail on the head, Stefan. "Real-world" is too vague to be a useful term. It's not far from asking for the "best" benchmark: best for whom, and under what circumstances? Everyone draws their line somewhere else.

Surely you'd agree that defining it makes sense.

It's ideally also why "IPC" without qualifiers as a catch-all term should be retired (both the practical reason that it's workload-specific and the pedantic reason that we don't measure instructions) in favour of "performance in XYZ workload at identical clocks".

Discussing IPC without a workload is like discussing frames per second without a game. And since we commonly take an average, it matters what gets averaged. The industry relies on SPEC, but if SPEC doesn't fit one's use-case, then one should define the use-case.

Agreed: I don't think any vendor is reading 1) these forums and 2) these discussions to magically adjust their launch timing, "The AnandTech forum posters nailed it: we must do more now."

Hitman928 said:
Not to jump too much into this as I actually put more value in spec_int, but your real world application comparison actually favors Zen4. You have Cinebench r20 three times, Cinebench r23 once, then 2 other tests. So basically it shows that RPL leads Zen4 in Cinebench by about 10% but then Zen4 leads in 7Zip by about 10% and the "tie breaker" would be Wprime where Zen4 leads by about 20%. This leaves you with Zen4 being 6% faster per clock on average across the 3 different workloads. Even if you count CB R20 and R23 separately, Zen4 would still be about 2% faster on average in the shown tests.

I agree on spec_int.

Yes; I meant to share data, not necessarily conclude "Zen4 is lower IPC" → the IPC will depend on the benchmark. Thank you for checking the math; after inverting wPrime (Zen4 = 121.4%), the geometric mean is 2.75% in favour of Zen4.

That's a great point on why it matters what goes into the average used to define "IPC".

Philste · Jan 5, 2024

TESKATLIPOKA said:
L3 cache is not separate, I don't know why you think it's separate.

If Strix doesn't change AMD's use of CCXs then it kinda is separate. The 4 ZEN5 cores will have fast access to 16MB L3 and the 8 ZEN5c cores will have fast access to 8MB L3. Cross-CCX latencies are usually much worse.

TESKATLIPOKA · Jan 5, 2024

Philste said:
If Strix doesn't change AMD's use of CCXs then it kinda is separate. The 4 ZEN5 cores will have fast access to 16MB L3 and the 8 ZEN5c cores will have fast access to 8MB L3. Cross-CCX latencies are usually much worse.

Aren't you comparing it to desktop CPUs, which have separate CCDs?
Does PHX2's cache look separate to you? But it's true that it's only 6 cores in total.
Strix Point will most likely have a single CCX.

Philste · Jan 5, 2024

TESKATLIPOKA said:
Strix Point will most likely have a single CCX.

Everything I heard so far was 2 CCX, but idk. Renoir also had 2 CCX btw, each with only 4MB L3.

StefanR5R · Jan 5, 2024

TESKATLIPOKA said:
L3 cache is not separate, I don't know why you think it's separate.

Only that twitter message, or how it was quoted here, made it look like that to me.

uzzi38 · Jan 5, 2024

TESKATLIPOKA said:
Aren't you comparing it to desktop CPUs, which have separate CCDs?
Does PHX2's cache look separate to you? But it's true that it's only 6 cores in total.
Strix Point will most likely have a single CCX.

It's dual CCXes.

PHX2 is single CCX, but STX is dual.

TESKATLIPOKA · Jan 5, 2024

uzzi38 said:
It's dual CCXes.

PHX2 is single CCX, but STX is dual.

For sure? Can't say I am happy with this.
What would it look like?
4xZen5 -> CCX 1
8xZen5c -> CCX 2

FlameTail · Jan 5, 2024

TESKATLIPOKA said:
For sure? Can't say I am happy with this.
What would it look like?
4xZen5 -> CCX 1
8xZen5c -> CCX 2

What's wrong with that setup?

This kind of division into clusters is common in ARM chips like Apple's M3 Max and Qualcomm's X Elite.

Tigerick · Jan 5, 2024

uzzi38 said:
It's dual CCXes.

PHX2 is single CCX, but STX is dual.

Yep, that's what I suspected, because All_The_Watts stated that the L3 caches of Zen5 and Zen5c are separate... so we are getting:

4 x Zen5 CCX1 (16MB L3)
8 x Zen5c CCX2 (8MB L3)

Total 24MB L3 cache
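
A small sketch of the rumoured layout, to show the difference between summed L3 and what each cluster can reach locally. The class and field names are hypothetical, the numbers are the leaked ones from this thread, and the 16MB Phoenix figure is inferred from the "50% larger than PHX" remark. It assumes each CCX's L3 serves only its own cores at full speed, as in AMD's existing multi-CCX parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    name: str
    core_type: str
    cores: int
    l3_mb: int

    @property
    def l3_per_core_mb(self) -> float:
        # Local L3 divided evenly across the cores that share it.
        return self.l3_mb / self.cores


# Rumoured Strix Point layout (unconfirmed leak).
strix_point = [
    CCX("CCX1", "Zen5", cores=4, l3_mb=16),
    CCX("CCX2", "Zen5c", cores=8, l3_mb=8),
]

PHX_L3_MB = 16  # inferred from the thread, not an official figure here

total_l3_mb = sum(ccx.l3_mb for ccx in strix_point)
print(f"total L3: {total_l3_mb} MB ({total_l3_mb / PHX_L3_MB:.2f}x PHX)")

for ccx in strix_point:
    print(f"{ccx.name} ({ccx.cores}x {ccx.core_type}): "
          f"{ccx.l3_mb} MB local L3, {ccx.l3_per_core_mb:.0f} MB per core")
```

The 24MB total is accurate for die area, but under this assumption a thread on a Zen5c core works with 8MB of local L3, not 24MB.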

TESKATLIPOKA · Jan 5, 2024

FlameTail said:
What's wrong with that setup?

Performance penalty.
