I got it from this slide:

Hmm, have I misunderstood it? I always thought it's two 256-bit loads and one 256-bit store:
AMD’s Zen 4 Part 1: Frontend and Execution Engine
AMD’s Zen 4 architecture has been hotly anticipated by many in the tech sphere; as a result, many rumors were floating around about its performance gains prior to its release. In February 2021… (chipsandcheese.com)
Moving on, the load/store units within each CPU core have also been given a buffer enlargement. The load queue is 22% deeper, now storing 88 loads. And according to AMD, they’ve made some unspecified changes to reduce port conflicts with their L1 data cache. Otherwise the load/store throughput remains unchanged at 3 loads and 2 stores per cycle.
> Is Zen similarly fat?
Yea.
> 10-15% IPC goal?
That's nT, and they're way over that already best case.
It has unified schedulers too, a big difference from Z4. Curious why they are not submitting compiler patches yet for such a big change.

The biggest unknown for me is how they plan to feed the beast. There are no mentions of any decoder changes; surely it would be an absurd bottleneck if not changed?
- The same 12-way 48KB L1 cache as Golden Cove (hopefully without the latency penalty)
- 8-wide dispatch (+2 vs Alder Lake and Zen 4)
- 6 ALUs (+1 vs Alder Lake, +2 vs Zen 4)
- 4 loads / 2 stores per cycle (vs 3/2 for Golden Cove, 2/1 for Zen 4)
- If I'm reading this right, these are 512-bit (64-byte)? That's a massive uplift from Zen 4 if true (4x the throughput in ideal AVX-512 scenarios)
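That "4x" figure is just width arithmetic; a quick sketch using the thread's own numbers (2×256-bit loads for Zen 4 per the list above, 4×512-bit for the leak — neither is a confirmed spec):

```python
# Peak L1D load bandwidth per cycle implied by the leaked figures.
# Port counts and widths are the thread's claims, not confirmed specs.

def peak_load_bytes_per_cycle(loads_per_cycle: int, width_bits: int) -> int:
    """Bytes of load data a core can pull from L1D per cycle."""
    return loads_per_cycle * width_bits // 8

zen4 = peak_load_bytes_per_cycle(2, 256)  # 2 x 256-bit loads
leak = peak_load_bytes_per_cycle(4, 512)  # 4 x 512-bit loads, if true

print(zen4, leak, leak // zen4)  # 64 256 4 -> the claimed 4x uplift
```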
> Curious why they are not submitting compiler patches yet for such a big change.
Opsec.
> Seems like over 30% IPC was a fantasy based on very weak evidence.
Not when you know how Turin runs.
> What is interesting though is the native 16C CCX.
It's for cloud.
> The new low power core option
That's a castrate from Strix, with 1MB LLC.
> Nothing crazy like a dual decoder?
Not really necessary here.
> Storing the op cache to L3?
L1 is already inclusive of op cache contents.
> What is your take on Zen 6 architecture?
Good fun if morbidly expensive in server.
> CCDs stacked on IO die?
No.
> There would not be much of a reason to have a silicon bridge between them.
You chain AIDs just like you do it in Navi4c.
> Unless, there could be multiple IO dies for additional memory channels.
Yes, Venice looks like Granite Rapids on crack cocaine.
Just no.
They're stacked on the AID in Venice.
IOD is a different thingy.
> It seems that AMD is really going all in on the MALL in all future products.
They have to do that to keep that perf gravy train rolling, especially in server.
> Good fun if morbidly expensive in server.
I guess I'll come out and say what standard Venice probably looks like now.
> Venice gets rid of EPYC's last weaknesses, for a huge fee, of course.
Yeah, but the costs are just above and beyond.
> Many will probably stick with Turin initially.
Yeah, known platform, semi-reasonable price.
HBM is no solution so MALL it is.
So what do you think of the Zen 5 leaks?

I guess I'll come out and say what standard Venice probably looks like now.
6 AIDs, each with four 2.5D-stacked 8-core CCDs, each AID connected to the others with silicon bridges.
And finally 6 IO dies, connected to the outside of each AID with fanouts, each IO die with 2 memory channels and a bunch of PCIe/CXL lanes.
192 cores, huge fully unified MALL, along with so much room for all sorts of uncore accel and other server stuff.
Venice gets rid of EPYC's last weaknesses, for a huge fee, of course. Many will probably stick with Turin initially but Venice will become a huge stick of doom over time.
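For what it's worth, the topology above multiplies out to the stated core count; a quick sanity check (every figure here is the poster's speculation, not a confirmed spec):

```python
# Multiply out the speculated Venice topology from the thread.
aids = 6                  # active interposer dies
ccds_per_aid = 4          # 2.5D-stacked CCDs per AID
cores_per_ccd = 8
io_dies = 6
channels_per_io_die = 2

total_cores = aids * ccds_per_aid * cores_per_ccd
total_channels = io_dies * channels_per_io_die

print(total_cores, total_channels)  # 192 cores, 12 memory channels
```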
> 6 AIDs, each with four 2.5D-stacked 8-core CCDs, each AID connected to the others with silicon bridges.
What are all the 16-32 core CCXs all about then?
> I wonder how much performance increase there could be from relatively modest-sized caches in the MALL. Or is it a scenario where there is a small chunk of memory being contested by a number of CCDs, and that small chunk (served from the MALL) then relieves a big bottleneck?
Yeah, it's basically a bandwidth ramp to keep 192/256 chungus cores functional.
> Probably AMD made this presentation to some CSP/OEM/SI etc.
Yeah, some time ago.
> Probably AMD made this presentation to some CSP/OEM/SI etc.
Actually, I would say this is a presentation to an SI/OEM, not even a CSP. The codenames are client/commercial parts.
> The codenames are client/commercial parts.
No, they're core/cache codenames.
> CSPs do not leak stuff like this.
Yes they do, which is why AMD keeps them on a tight leash wrt sampling cycles.
> Within low clocks?
No.