Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

StefanR5R · Apr 13, 2025

inquiss said:
A dual vcache CCD still has the inherent cross CCD penalty. It doesn't really help.

That's what people claim. But the actual impact on particular games (or other workloads) has never been measured.

inquiss said:
Instead what you should be asking for, is an overall larger CCD

Small core complexes are an enabler of low latency of the L3 caches.

reaperrr3 · Apr 13, 2025

StefanR5R said:
That's what people claim. But the actual impact on particular games (or other workloads) has never been measured.

I don't think that's true.
It's just hard to test, because most games are designed around 6-8 cores being mainstream and don't heavily utilise more cores to begin with.
AMD making games run only on the VCache-CCD of the x9x0X3D models doesn't help the difficulty of measuring it, but it's a strong indication that AMD's own testing suggested that cross-CCD latency hurts more than limiting games to 8C max, which is kind of telling.

There have been other indications in the past, like Matisse vs. Vermeer and Rembrandt.

A 5700X is up to ~60% faster than a 3700X in some games, even though the IPC in most non-gaming tasks is only 11% higher and clocks aren't that much higher, either.

Meanwhile, the 5700G with only half as much L3 (but same amount of maximum L3 available to 1 core as Matisse, if not all cores are fully utilised) craters down to being only about 20% faster in those games than a 3700X, but still beating the 11% IPC uplift we see in many other workloads.

That means up to 2/3 of the Zen3 performance uplift in games is coming from the doubled L3 size per core vs. Zen2, but a few more percentage points may be coming from avoiding cross-CCD latency.
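A minimal sketch of the arithmetic behind that "up to 2/3" estimate, taking the figures in this post at face value (~60% for the 5700X and ~20% for the 5700G, both vs. the 3700X). The function name `l3_share_of_uplift` is made up for illustration, and treating the 5700G as "Vermeer with half the L3" is a simplifying assumption. It computes the share two ways: additively in percentage points, and multiplicatively via log ratios.

```python
import math

def l3_share_of_uplift(full_uplift, half_l3_uplift):
    """Estimate how much of a gaming uplift is attributable to the larger L3.

    full_uplift:    uplift of the part with the full L3 (0.60 means +60%).
    half_l3_uplift: uplift of the part with half the L3 (0.20 means +20%).
    Returns (additive_share, log_share).
    """
    # Additive view: percentage points gained from L3 over total points gained.
    additive = (full_uplift - half_l3_uplift) / full_uplift
    # Multiplicative view: speedups compose as factors, so compare their logs.
    l3_factor = (1 + full_uplift) / (1 + half_l3_uplift)
    log_share = math.log(l3_factor) / math.log(1 + full_uplift)
    return additive, log_share

# Figures quoted in the post (hypothetical best case, "some games").
additive, log_share = l3_share_of_uplift(0.60, 0.20)
print(f"additive share of uplift from L3: {additive:.2f}")
print(f"multiplicative (log) share:       {log_share:.2f}")
```

Either way the larger L3 accounts for roughly 60-67% of the uplift in those best-case games, so the "up to 2/3" figure holds under these assumptions.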

StefanR5R said:
Small core complexes are an enabler of low latency of the L3 caches.

For low-threaded workloads that always fit into L3, maybe.
But for a lot of workloads, a few cycles less L3 latency pale in comparison to having twice as much L3 to work with before having to go off-CCD/CCX, both in terms of power and latency.

AMD wouldn't be increasing core count + L3 per CCX/CCD for no good reason.

LightningZ71 · Apr 13, 2025

inquiss said:
Ryzen is the consumer line, people use it for gaming and other compute tasks. The point is they are all consumer tasks. Really you get the single core CCD with vcache if you game, and if you want it for dual use as a workhorse you get the single CCD vcache.

If you really want to use the applications that benefit from vcache and large amounts of cores the last part of your paragraph is the important one. The product already exists and it's not for consumers. To provide it to consumers would mean lopping off lots of commercial income. The clamour for expensive hardware for cheap is pointless. It's not a segment AMD will provide for you. It would be mad for them to do so...

Precisely why AMD has the Epyc 4XXX line of glorified desktop processors for low end servers and workstations. A 2-CCD X3D part would fit perfectly in that line and be priced accordingly.

LightningZ71 · Apr 13, 2025

Abwx said:
KRK 350 based laptop review :

https://www.notebookcheck.net/A-1-100-nit-OLED-and-AMD-Zen-5-in-a-creator-s-laptop-The-Lenovo-IdeaPad-Pro-5-14-G10-review.997910.0.html

Not bad. The iGPU performs a bit better than I expected, but digging into the benches, a lot of the performance gain on KRK is influenced by the higher 1t performance of Zen5 more than held back by the fewer CUs. In stuff that is less 1t dependent, it doesn't beat HPT. But the two solutions are very close.

HPT still offers lots of PCIe lanes for a dGPU though, and anything with a 3050 or better will be miles better in gaming.

Joe NYC · Apr 13, 2025

reaperrr3 said:
For low-threaded workloads that always fit into L3, maybe.
But for a lot of workloads, a few cycles less L3 latency pale in comparison to having twice as much L3 to work with before having to go off-CCD/CCX, both in terms of power and latency.

AMD wouldn't be increasing core count + L3 per CCX/CCD for no good reason.

AMD is going to turn an ugly stepchild (9900x3d) into a beautiful princess. Like Cinderella.

inquiss · Apr 13, 2025

Win2012R2 said:
Not true - there is no 3D version of Turin and official message is that they won't do it this gen, so no - you can't get Zen 5 with both chiplets having 3D cache at all, for any money.

AMD said that they won't come this gen.

Why does the gen make a difference? AMD sells threadripper and EPYC vcache models. For some reason you've added that they need to be zen 5?

igor_kavinski · Apr 13, 2025

Win2012R2 said:
so no - you can't get Zen 5 with both chiplets having 3D cache at all, for any money.

Well, I have the 9950X3D now so a dual V-cache CCD chip is landing any day now.

YOU ARE WELCOME.

Because I never win. Whatever I do, it gets trampled over by life and fate.

Don't believe me?

I have won a lottery just once.

Usually that lottery was shared by at most two people.

You know how many people I had to share MY lottery prize with?

FORTY.

So I got only like $6000 which I promptly blew away buying stupid old hardware because I'm no investment guru.

See? Even when I win, it's not actually a win!

Oh wait. You want more?

I got Z790 because the hope was to get 14900KS and make it last longer.

Then the degradation fiasco happened and even if it hadn't, 14900KS is a dud without exotic cooling.

I could go on and on...

Abwx · Apr 13, 2025

LightningZ71 said:
Not bad. The iGPU performs a bit better than I expected, but digging into the benches, a lot of the performance gain on KRK is influenced by the higher 1t performance of Zen5 more than held back by the fewer CUs. In stuff that is less 1t dependent, it doesn't beat HPT. But the two solutions are very close.

HPT still offers lots of PCIe lanes for a dGPU though, and anything with a 3050 or better will be miles better in gaming.

Gaming is a secondary thought on those devices and the real point is to crush the competition in this segment, which it does very well. For instance, at 20-27W it performs 25% better in MT than Intel's LNL at 32-35W, and that's considerable.

Besides, this laptop costs 990€/1100$, which is way lower than most equivalently equipped LNLs, which are often sold at an insane 1800$/2000$ price tag.

LightningZ71 · Apr 13, 2025

This was more musings of mine following a previous conversation with Adroc. I still contend that the 780m should be able to perform a bit better than KRK's iGPU when the game can't take advantage of the higher ST performance of Zen5P. However, with RDNA3.5 being more efficient than 3.0, it may not appear until you hit the upper limit of power draw for the platform and have good cooling.

Abwx · Apr 13, 2025

It can sometimes be faster due to LPDDR5 8000 instead of the 6400 that can be found with Hpoint, like the Lenovo 8845HS on this graph. Both have 32GB.

QuickyDuck · Apr 13, 2025

Server zen5 has 2 links to the IOD while consumer zen5 only uses 1. Just wondering if there could be a CPU with links between CCD1, CCD2 and the IOD, creating full mesh connectivity and addressing the cross CCD latency issue.

biostud · Apr 14, 2025

reaperrr3 said:
I don't think that's true.
It's just hard to test, because most games are designed around 6-8 cores being mainstream and don't heavily utilise more cores to begin with.
AMD making games run only on the VCache-CCD of the x9x0X3D models doesn't help the difficulty of measuring it, but it's a strong indication that AMD's own testing suggested that cross-CCD latency hurts more than limiting games to 8C max, which is kind of telling.

There have been other indications in the past, like Matisse vs. Vermeer and Rembrandt.

A 5700X is up to ~60% faster than a 3700X in some games, even though the IPC in most non-gaming tasks is only 11% higher and clocks aren't that much higher, either.

Meanwhile, the 5700G with only half as much L3 (but same amount of maximum L3 available to 1 core as Matisse, if not all cores are fully utilised) craters down to being only about 20% faster in those games than a 3700X, but still beating the 11% IPC uplift we see in many other workloads.

That means up to 2/3 of the Zen3 performance uplift in games is coming from the doubled L3 size per core vs. Zen2, but a few more percentage points may be coming from avoiding cross-CCD latency.

For low-threaded workloads that always fit into L3, maybe.
But for a lot of workloads, a few cycles less L3 latency pale in comparison to having twice as much L3 to work with before having to go off-CCD/CCX, both in terms of power and latency.

AMD wouldn't be increasing core count + L3 per CCX/CCD for no good reason.

The cases where a 9700X is faster than a regular 9950X (running same clocks) are a few outlier games, so I really don't think dual CCD creates many problems in itself. With heterogeneous cores the differences will be larger, but again it seems like it is fixed in most games.

Win2012R2 · Apr 14, 2025

inquiss said:
Why does the gen make a difference? AMD sells threadripper and EPYC vcache models. For some reason you've added that they need to be zen 5?

Higher clocks, full AVX512, better general IPC, 2nd gen of DDR5 support, 12 mem channels, current product that will sell for a while?

inquiss · Apr 14, 2025

Win2012R2 said:
Higher clocks, full AVX512, better general IPC, 2nd gen of DDR5 support, 12 mem channels, current product that will sell for a while?

Yeah sure, zen 5 would make those current products faster, but if you want massive multi thread or vcache models of massive multi thread, those products exist today. AMD has a product to sell and they win those segments already. Would they be faster with zen 5? Sure. But right now AMD has products in those segments. They are just too expensive for the minute number of hobbyists who want them at cheap prices, prices that would make no sense for AMD to charge considering what customers with real workloads would buy them for.

Win2012R2 · Apr 14, 2025

inquiss said:
if you want massive multi thread or vcache models of massive multi thread, those products exist today

They existed 3 years ago too in form of Milan-X, but why should I be buying that in 2025?

-X model frequencies were pretty low, whereas Zen 5 EPYC pushed them up very aggressively, and since 3D cache is now at the bottom, an -X model could retain a lot of that frequency too, making it FAR superior to the old stuff.

And why should AMD push people to old product when they need to sell new stuff? What's there to argue about - doing dual chiplet desktop is trivial, call it EPYC 4004 3D and sell for $999

StefanR5R · Apr 14, 2025

reaperrr3 said:
There have been other indications in the past,

Indications... As I said, we have no measurements yet. What's been measured so far doesn't isolate cache segmentation effects from cache segment size effects.

(BTW, not directly related to X3D: I sometimes run number-theoretic z-transforms in which program threads share hot data, with data size on the order of magnitude of L3$ size. For this edge case I do have my own performance measurements for different thread scheduling schemes, with the obvious outcome that shared hot data better sits in a shared cache. Some run-of-the-mill classic FP code with OpenMP autoparallelization appears to benefit too, but I haven't quantified this yet for the particular applications which I run. None of that relates to video games though.)
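For readers who want to try a comparison like that themselves, here is a rough sketch of two thread placement schemes, not the actual setup used for the measurements above. The names `placement` and `pin_current_thread` are made up for illustration, and the CPU numbering (ids 0 to 7 on CCD0, the next 8 on CCD1, SMT siblings ignored) is an assumption that should be checked with `lscpu` on the real machine. `os.sched_setaffinity` is a real Python call but is only available on Linux.

```python
import os

def placement(n_threads, cores_per_ccd=8, scheme="packed"):
    """Map thread index -> logical CPU id on a hypothetical 2-CCD part.

    packed: fill CCD0 first, so threads sharing hot data share one L3.
    spread: alternate CCD0/CCD1, so shared data straddles two L3 slices.
    Assumes CPU ids 0..cores_per_ccd-1 sit on CCD0 and the next block on
    CCD1; that layout is an assumption, not a guarantee.
    """
    if n_threads > 2 * cores_per_ccd:
        raise ValueError("more threads than cores in this sketch")
    if scheme == "packed":
        return list(range(n_threads))
    if scheme == "spread":
        return [(i % 2) * cores_per_ccd + i // 2 for i in range(n_threads)]
    raise ValueError(f"unknown scheme: {scheme}")

def pin_current_thread(cpu):
    """Pin the caller to one logical CPU; silently does nothing off Linux."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})

# Show where four worker threads would land under each scheme.
for scheme in ("packed", "spread"):
    print(scheme, placement(4, scheme=scheme))
```

Each worker would call `pin_current_thread` with its own entry from the list before touching the shared data. Running the same workload under both schemes varies cache segmentation at a fixed thread count, although the total L3 reachable by the threads changes as well.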

reaperrr3 said:
AMD wouldn't be increasing core count + L3 per CCX/CCD for no good reason.

Conversely, AMD started the Zen line with 4-core CCXs for good reasons. (And slowly worked their way up from there.)
Would be interesting to see Turin-dense cache latency related microbenchmarks.

Win2012R2 said:
there is no 3D version of Turin and official message is that they won't do it this gen,

The (according to adroc) fork into separate desktop and server physical CCD variants surely doesn't help, as the production volume wouldn't be shared. There is also Azure's adoption of MI300C ( = HBM'ed EPYC) which I have seen painted as the reason against Turin-X. Zen 5 Threadripper alias Shimada Peak is still a wildcard though.

inquiss · Apr 14, 2025

Win2012R2 said:
They existed 3 years ago too in form of Milan-X, but why should I be buying that in 2025?

-X model frequencies were pretty low, whereas Zen 5 EPYC pushed them up very aggressively, and since 3D cache is now at the bottom, an -X model could retain a lot of that frequency too, making it FAR superior to the old stuff.

And why should AMD push people to old product when they need to sell new stuff? What's there to argue about - doing dual chiplet desktop is trivial, call it EPYC 4004 3D and sell for $999

Why should you be buying it? Because if you want high core count and high cache, it's got tons of performance.

Arguing with me that zen 5 is better than zen 4 is pointless. I agree.

The topic here is why AMD should sell this for peanuts. They won't, precisely because these parts make a lot of money for the purchasers, and AMD makes a lot of money selling them too. What's to argue about is forum dwellers thinking they should have expensive hardware for hobbyist prices. Which is what you and Igor argue for. You will always be disappointed, as you are now.

LightningZ71 · Apr 14, 2025

I would be concerned that a full EPYC 3D cache version of Zen5 Turin would run into power and thermal problems. What we're seeing in this generation is that the 3D cache versions of Zen 5 desktop do not have as profound of a power/thermal advantage over the non-3Dcache versions. Yes, it's partly due to the cores running at higher clocks, but it's partly because there are more transistors and higher IPC on the same node as before.

adroc_thurston · Apr 14, 2025

LightningZ71 said:
I would be concerned that a full EPYC 3D cache version of Zen5 Turin would run into power and thermal problems.

no.

LightningZ71 said:
Yes, it's partly due to the cores running at higher clocks, but it's partly because there are more transistors and higher IPC on the same node as before.

It's just clocks.
Same clocks as non-V$ parts means workload power is also the same.
d'oh.

igor_kavinski · Apr 14, 2025

LightningZ71 said:
Yes, it's partly due to the cores running at higher clocks, but it's partly because there are more transistors and higher IPC on the same node as before.

Zen 5 consumer dies are a lot leakier I think than Zen 4 dies. That could explain the higher power usage.

Joe NYC · Apr 14, 2025

Win2012R2 said:
They existed 3 years ago too in form of Milan-X, but why should I be buying that in 2025?

There is also Genoa-X. No word on Turin-X. Or more precisely, AMD may skip the -X on Turin...

igor_kavinski · Apr 14, 2025

Joe NYC said:
AMD may skip the -X on Turin...

Maybe saving the V-cache dies for Venice-X, in case they think they could release them earlier than their own expectations. Also, since the Genoa-X CPUs are so expensive and there can't be that many customers with the specific need of really high cache who have no issue absorbing the additional cost as cost of doing business, it makes sense that AMD is not interested in flooding the market with more X server parts. The V-cache dies could also be getting more use in their Instinct accelerators that are in high demand.

Joe NYC · Apr 14, 2025

igor_kavinski said:
Maybe saving the V-cache dies for Venice-X, in case they think they could release them earlier than their own expectations. Also, since the Genoa-X CPUs are so expensive and there can't be that many customers with the specific need of really high cache who have no issue absorbing the additional cost as cost of doing business, it makes sense that AMD is not interested in flooding the market with more X server parts. The V-cache dies could also be getting more use in their Instinct accelerators that are in high demand.

There was in fact a 2nd tier cloud provider (just below the handful of tier-1) who standardized on all their servers being Genoa-X. This took place later in life of Genoa and just before Turin announcement.

From that, you have to conclude (as far as this cloud provider is concerned) that
- they know AMD (internal) roadmap
- from the (internal) roadmap, they know about AMD's commitment to V-Cache on servers
- and that commitment to V-Cache is not ending with Genoa

Most likely, growing with Venice, when this company would be refreshing their servers.

LightningZ71 · Apr 15, 2025

adroc_thurston said:
no.

It's just clocks.
Same clocks as non-V$ parts means workload power is also the same.
d'oh.

Yeah, no kidding, but that's not exactly where I was going with that post. Server CPUs, especially for providers with dense racks, are quite sensitive to power draw and thermal dissipation. I was speculating specifically that a TurinX3d part would have to take a significant clock hit to stay within the socket and chassis power and thermal limits, to the point that the performance improvement over regular Turin parts wouldn't be sufficient, in enough cases, to justify the price AMD would have to ask and so make the business case for its existence.

adroc_thurston · Apr 15, 2025

LightningZ71 said:
Server CPUs, especially for providers with dense racks, are quite sensitive to power draw and thermal dissipation

dawg the power is defined at platform level.

LightningZ71 said:
I was speculating specifically that a TurinX3d part would have to take a significant clock hit to stay within the socket and chassis power and thermal limits, to the point that the performance improvement over regular Turin parts wouldn't be sufficient, in enough cases, to justify the price AMD would have to ask and so make the business case for its existence.

no it would clock like Turin.
Turin-X doesn't exist because MS opted for a MI300C-based refresh for their HPC instances.
yes, that simple.
