Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

igor_kavinski · Dec 31, 2023

My impression is that AMD engineers have designed the c cores so that, even though each core has lower IPC than a fat core, they work better together in tandem and deliver an almost identical MT uplift at lower power consumption, which is just amazing engineering. The only downside is that some ST workloads could end up on a c core, which would bring their performance down, so the Windows scheduler needs to be careful to prevent that.

igor_kavinski · Dec 31, 2023

maddie said:
How can a core with halved L3 have the same IPC?

I think the ST IPC is not the same. But the inter core communication is somehow better between the c cores.

Maybe @adroc_thurston can shed more light?

StefanR5R · Dec 31, 2023

maddie said:
How can a core with halved L3 have the same IPC? Has L3 suddenly become irrelevant?

Level 3 cache is not part of the core. It's part of the core complex.
In mobile parts, AMD has provided less level 3 cache compared to desktop and server parts before "dense" a.k.a. "cloud-native" cores were introduced.

In post #5,522, it was not stated what amount of cache the 8 core comparison part should have.

igor_kavinski said:
even though each core has lower IPC than a fat core,

But they have the same IPC in the Zen 4 generation (if the amount of cache is the same, if memory interface is the same, etc.). *Maybe* they won't have the same floating point IPC in the Zen 5 generation anymore, but I guess integer IPC is still the same.

igor_kavinski said:
But the inter core communication is somehow better between the c cores.

That'd be a property of the core complex again, not of the cores.

igor_kavinski · Dec 31, 2023

StefanR5R said:
But they have the same IPC in the Zen 4 generation

So the only difference is that the c cores can't be clocked higher and they are more power optimized? That raises an interesting question. Does the fat core need more circuitry to sustain higher frequency? Or are the higher frequency transistors taking up more space (maybe spread further apart from each other to keep heat down)?

StefanR5R · Dec 31, 2023

igor_kavinski said:
So the only difference is that the c cores can't be clocked higher and they are more power optimized?

The Zen 4 "dense" cores are only area optimized (and the major repercussion of it is their lower f_max), not power optimized.

(If you recall the spectacular perf/W of Bergamo, even better than Genoa in fully scalable compute-bound workloads, then that's because Bergamo operates even nearer the most power-efficient f-V spot than Genoa and has got even more cores on top of the same IOD and RAM foundation as Genoa. Bergamo must work nearer this sweet spot than Genoa simply because Bergamo's power budget per core is lower than Genoa's.)
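To put rough numbers on the per-core power budget argument, here is a minimal sketch. It assumes the 96-core Genoa and 128-core Bergamo flagships run at the same 360 W socket power and it ignores the share taken by the IOD, so treat the results as an upper bound per core. `per_core_budget_w` is a hypothetical helper made up for this illustration.

```python
def per_core_budget_w(socket_power_w: float, cores: int) -> float:
    """Naive per-core budget: socket power split evenly across all cores.

    Assumption: IOD and memory power are ignored, so real budgets are lower.
    """
    return socket_power_w / cores


# Assumed flagship configurations at the same socket power.
genoa = per_core_budget_w(360, 96)     # 96 Zen 4 cores
bergamo = per_core_budget_w(360, 128)  # 128 Zen 4c cores

print(f"Genoa:   {genoa:.2f} W per core")
print(f"Bergamo: {bergamo:.2f} W per core")
print(f"Bergamo gets {bergamo / genoa:.0%} of Genoa's per-core budget")
```

With a quarter less power per core, Bergamo is forced further down the f-V curve than Genoa under this simple model, which is the point being made above.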

I don't know whether or not Zen 5 "dense" cores will have more optimization work applied. It was claimed here in this thread that they will continue to be a density play first and foremost.

HurleyBird · Dec 31, 2023

StefanR5R said:
(If you recall the spectacular perf/W of Bergamo, even better than Genoa in fully scalable compute-bound workloads, then that's because Bergamo operates even nearer the most power-efficient f-V spot than Genoa and has got even more cores on top of the same IOD and RAM foundation as Genoa. Bergamo must work nearer this sweet spot than Genoa simply because Bergamo's power budget per core is lower than Genoa's.)

I believe that frequency and voltage being equal, the dense cores do consume less power, but I think that's all based on transistor count, distances, and maybe process, rather than anything architectural.
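The "frequency and voltage being equal" case can be sketched with the usual first-order CMOS dynamic power model, P = C_ac * V^2 * f, where C_ac is the switched capacitance. The capacitance, voltage and frequency values below are hypothetical and chosen only for illustration; they are not measurements of any Zen core.

```python
def dynamic_power_w(c_ac_farads: float, voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = C_ac * V^2 * f (leakage ignored)."""
    return c_ac_farads * voltage_v ** 2 * freq_hz


# Hypothetical values: same voltage and frequency, dense core with 20% lower C_ac.
classic = dynamic_power_w(1.0e-9, 0.9, 3.0e9)
dense = dynamic_power_w(0.8e-9, 0.9, 3.0e9)

print(f"classic: {classic:.2f} W, dense: {dense:.2f} W")
print(f"dense/classic at equal V and f: {dense / classic:.2f}")
```

In this model, any reduction in switched capacitance (fewer transistors, shorter wires) shows up one-to-one as lower dynamic power at the same operating point, without any architectural change.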

maddie · Dec 31, 2023

StefanR5R said:
Level 3 cache is not part of the core. It's part of the core complex.
In mobile parts, AMD has provided less level 3 cache compared to desktop and server parts before "dense" a.k.a. "cloud-native" cores were introduced.

In post #5,522, it was not stated what amount of cache the 8 core comparison part should have.

A bit pedantic, but whatever.

StefanR5R said:
But they have the same IPC in the Zen 4 generation (if the amount of cache is the same, if memory interface is the same, etc.). *Maybe* they won't have the same floating point IPC in the Zen 5 generation anymore, but I guess integer IPC is still the same.

In the Zen4 gen, the Zen4c cores have 1/2 the L3 of full Zen4, and AMD still claims the same IPC, with no disclaimers like the ones you have inserted, which makes no sense. How can you claim "if the cache amount is the same" when it can't be, as they're two unique designs, with many, but not all, elements shared?

adroc_thurston · Dec 31, 2023

StefanR5R said:
not power optimized.

Yes they are, literally lower Cac.

StefanR5R said:
even better than Genoa in fully scalable compute-bound workloads, then that's because Bergamo operates even nearer the most power-efficient f-V spot than Genoa and has got even more cores on top of the same IOD and RAM foundation as Genoa.

No, they both operate at about their optimum V/f spots.

Saylick · Jan 1, 2024

maddie said:
This is something I've struggled to understand. How can a core with halved L3 have the same IPC? Has L3 suddenly become irrelevant?

It’s only going to be a few % lower. Not enough to undo the IPC gains. But in mobile applications the standard Z5 core gets half the normal L3 anyways so I bet Z5c is actually comparable.

Gideon · Jan 1, 2024

Saylick said:
It’s only going to be a few % lower. Not enough to undo the IPC gains. But in mobile applications the standard Z5 core gets half the normal L3 anyways so I bet Z5c is actually comparable.

What is the rumored total L3, still 16MB?

Zen5c should support a 16-core CCX, so is it shared between 12 cores or partitioned into a 4x "big" core CCX and an 8x "small" core one?

If the latter, it better have at least 16 + 8 = 24MB total L3.
Otherwise it's a de facto L3 decrease per CCX to Zen2 levels (which will hurt in games)

FlameTail · Jan 1, 2024

Gideon said:
What is the rumored total L3, still 16MB?

Zen5c should support a 16-core CCX, so is it shared between 12 cores or partitioned into a 4x "big" core CCX and an 8x "small" core one?

https://x.com/All_The_Watts/status/1708791849652273180?s=20

STX
TSMC N4P 225mm²
4c Zen 5 L3: 16 MB L2: 4 MB
8c Zen 5C L3: 16 MB L2: 8 MB
8 WGP RDNA3+
64 AIE tile
DDR5-5600 / LPDDR5X-8533
28-35+ W

TESKATLIPOKA · Jan 1, 2024

maddie said:
This is something I've struggled to understand. How can a core with halved L3 have the same IPC? Has L3 suddenly become irrelevant?

maddie said:
In the Zen4 gen, the Zen4c cores have 1/2 the L3 of full Zen4, and AMD still claims the same IPC, with no disclaimers like the ones you have inserted, which makes no sense. How can you claim "if the cache amount is the same" when it can't be, as they're two unique designs, with many, but not all, elements shared?

Because that IPC is most likely measured at 1T, so it doesn't matter if you use a Zen4(5) or a Zen4(5)c core; you still have the whole L3 for that single core.

BTW, I question whether L3 is really partitioned between standard and dense cores.

This is PHX2, and the L3 doesn't look like it's separated physically.
I think it's not separated at all. The amount is just 16MB for 12 cores.

maddie · Jan 1, 2024

TESKATLIPOKA said:
Because that IPC is most likely measured at 1T, so it doesn't matter if you use a Zen4(5) or a Zen4(5)c core; you still have the whole L3 for that single core.

BTW, I question whether L3 is really partitioned between standard and dense cores.

This is PHX2, and the L3 doesn't look like it's separated physically.
I think it's not separated at all. The amount is just 16MB for 12 cores.

Ok, that makes a lot of sense.

Philste · Jan 1, 2024

FlameTail said:
https://x.com/All_The_Watts/status/1708791849652273180?s=20

Got corrected immediately to 8MB L3 for the ZEN5c CCX. So the 4 ZEN5 Cores get 16MB L3, the 8 ZEN5c Cores get 8MB L3.

Tigerick · Jan 1, 2024

Philste said:
Got corrected immediately to 8MB L3 for the ZEN5c CCX. So the 4 ZEN5 Cores get 16MB L3, the 8 ZEN5c Cores get 8MB L3.

Hmm, if that correction is right, then the total L3 cache of STX is 24MB... 50% larger than PHX.
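The arithmetic behind that, as a minimal sketch. It takes the corrected leak at face value (16 MB for the 4-core Zen 5 CCX, 8 MB for the 8-core Zen 5c CCX, both unconfirmed) and assumes 16 MB of L3 for PHX; the variable names are made up for this example.

```python
PHX_L3_MB = 16  # assumed: Phoenix, one CCX with 16 MB L3

# Rumored Strix Point split per the corrected leak: name -> (cores, L3 in MB).
stx_ccx = {
    "Zen 5 CCX": (4, 16),
    "Zen 5c CCX": (8, 8),
}

total_l3_mb = sum(l3 for _, l3 in stx_ccx.values())
increase = total_l3_mb / PHX_L3_MB - 1

print(f"STX total L3: {total_l3_mb} MB ({increase:.0%} more than PHX)")
for name, (cores, l3) in stx_ccx.items():
    print(f"{name}: {l3 / cores:.1f} MB L3 per core")
```

If the two CCXs really are partitioned, the total grows by half but the split is very uneven per core, which is what the earlier per-CCX concern is about.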

BorisTheBlade82 · Jan 1, 2024

IMHO, as long as the power budget permits having higher all-core frequencies than Zen5c is able to achieve, then the MT performance of 4 Zen5 + 8 Zen5c should be identical to 12 Zen5. I would assume that to be the case for 35 W package power and below.
I still hope that they might improve the efficiency of the small cores in the next gen, which could lead to the big.LITTLE package being faster at low wattages than a full-blown 12 Zen5 setup.

Regarding a unified L3, I would say that there is a strong incentive for AMD: they could save big on area if they could make both clusters use the same 16/24 MB of L3 instead of having to duplicate it, especially as cache gets more and more costly with each node.

StefanR5R · Jan 1, 2024

maddie said:
In the Zen4 gen, the Zen4c cores have 1/2 the L3 of full Zen4, and AMD still claims the same IPC, with no disclaimers like the ones you have inserted, which makes no sense. How can you claim "if the cache amount is the same" when it can't be, as they're two unique designs, with many, but not all, elements shared?

While the physical design differs, the microarchitecture — i.e. frontend, execution units and so forth, and the resulting execution width and instruction latencies — is exactly the same (from what is publicly known, and there are no indications to the contrary). And I should rather have said: "But they have the same IPC in the Zen 4 generation (if the amount of cache is the same, if memory interface is the same, or if the present amount of cache is sufficient for the considered workload)."

Timorous · Jan 2, 2024

maddie said:
In the Zen4 gen, the Zen4c have 1/2 L3 as full Zen4

No, it does not.

A Bergamo Zen4c CCD has half the L3 per core of Genoa, but PHX2 has slightly more L3 per core than PHX, although if you compare 6c vs 6c then the L3 per core is the same.
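A quick per-core comparison, as a sketch. The core counts and L3 sizes below are the commonly reported figures (32 MB per CCD for both Genoa and Bergamo, 16 MB for PHX and PHX2) and should be read as assumptions; the dictionary and helper names are invented for this example.

```python
# name -> (cores sharing the listed L3, L3 in MB); commonly reported figures.
configs = {
    "Genoa CCD (8x Zen 4)": (8, 32),
    "Bergamo CCD (16x Zen 4c)": (16, 32),
    "PHX 8c": (8, 16),
    "PHX 6c": (6, 16),
    "PHX2 6c (2x Zen 4 + 4x Zen 4c)": (6, 16),
}


def l3_per_core_mb(cores: int, l3_mb: int) -> float:
    """L3 capacity divided evenly across the cores that share it."""
    return l3_mb / cores


per_core = {name: l3_per_core_mb(*cfg) for name, cfg in configs.items()}

for name, mb in per_core.items():
    print(f"{name:32s} {mb:.2f} MB L3 per core")
```

Under these figures, "half the L3" holds for Bergamo against Genoa but not for PHX2 against PHX.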

FlameTail · Jan 2, 2024

So will Strix Boind have an LLC ?

Edit: Typo. Correction: "Strix Point"

TESKATLIPOKA · Jan 2, 2024

FlameTail said:
So will Strix Boind have an LLC ?

I don't think so. First, such a product would need to exist.

StefanR5R · Jan 2, 2024

StefanR5R said:
The Zen 4 "dense" cores are only area optimized (and the major repercussion of it is their lower f_max), not power optimized.

adroc_thurston said:
Yes they are, literally lower Cac.

OK, what I claimed was incorrect; not sure why I said so. (Zen 4c's different physical design does shave off power consumption additionally, beyond what just a lower average operating frequency brings.) Kai Troester, Zen 4 lead architect, phrased it this way: "What we did to create Zen 4c is take the Zen 4 functional design, which keeps IPC and features identical [...] We targeted a design frequency optimized for cloud servers, and with that lower frequency we were able to significantly reduce the area of the core. And that smaller area led to tremendous increase in power efficiencies, allowing us to provide 33 % more cores in the same power envelope at high frequency and IPC." (Source: Hot Chips 2023 CPU 1 session video) — Is there something more detailed published anywhere?

moinmoin · Jan 2, 2024

FlameTail said:
So will Strix Boind have an LLC ?

Edit: Typo. Correction: "Strix Point"

LLC stands for Last Level Cache. So if a chip has no L4$, L3$ is the LLC. If no L3$, L2$ is it, and so on. So since Strix Point definitely has cache, the answer is yes? I'm not sure what you actually wanted to know. Maybe whether the LLC is shared between CPU and iGPU? (That'd be a no.)

FlameTail · Jan 2, 2024

moinmoin said:
Maybe whether the LLC is shared between CPU and iGPU? (That'd be a no.)

Yes. That. LLC in the form of an SLC.

qmech · Jan 2, 2024

FlameTail said:
Yes. That. LLC in the form of an SLC.

The hit rate from the iGPU is terrible, so having it pollute the same cache that the processor uses is not usually a good tradeoff.

AMD had a presentation about this at Hot Chips in '21.

https://images.anandtech.com/doci/16912/514240562.jpg

Notice that even with HD gaming, you need a 20 MB dedicated cache to reach 40% hit rate.
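To see what a 40% hit rate buys, here is a simplified sketch. It assumes every cache hit removes one DRAM access and every miss costs one, ignoring writebacks, latency and cache bandwidth limits; both functions are hypothetical helpers written for this example.

```python
def dram_traffic_fraction(hit_rate: float) -> float:
    """Fraction of requests that still go to DRAM at a given cache hit rate."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 - hit_rate


def bandwidth_amplification(hit_rate: float) -> float:
    """Effective bandwidth multiplier seen by the GPU under this simple model."""
    return 1.0 / dram_traffic_fraction(hit_rate)


for h in (0.2, 0.4, 0.6):
    print(f"hit rate {h:.0%}: DRAM traffic {dram_traffic_fraction(h):.0%}, "
          f"effective bandwidth x{bandwidth_amplification(h):.2f}")
```

Under this model a 40% hit rate still sends 60% of the traffic to DRAM, so a cache of that size gives the iGPU well under a 2x effective bandwidth gain.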

FlameTail · Jan 2, 2024

qmech said:
The hit rate from the iGPU is terrible, so having it pollute the same cache that the processor uses is not usually a good tradeoff.

Is that specifically a characteristic of RDNA?
