Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Doug S · May 21, 2024

Joe NYC said:
iPad does not have the wealth of software that benefits from single core. Arm-based Macs lost compatibility with gaming software, and the Arm-based Mac has been abandoned as a gaming platform.

And if you only play games where FPS is a major thing, then you have a poor taste in games.

Here's where I once again point out to people that the mobile gaming market is larger than the PC and console gaming markets COMBINED. Apple's mobile gaming market alone is much bigger than the entire PC (or console) gaming market.

So the Mac's "abandonment" as a gaming platform (how can it be abandoned when it never had it in the first place) is irrelevant. The iPad, and the slightly lower clocked version of M4's CPU that will be shipping in millions of iPhones four months from now, will have plenty of games for their users to choose from. Even if you turn your nose up at them and don't consider mobile gaming to be "real" gaming. Revenue don't lie, that's what most people consider to be gaming when voting with their wallets.

Timorous · May 21, 2024

H433x0n said:
Well, the hype has been +40% 1T perf and a score of 4200 is what it’d take to reach that.

32% IPC in SPECint 2017 1T. That is a very specific claim, and it annoys me when people take it and try to generalise it to all workloads; it screams dishonest and disingenuous.

Kepler Said

Kepler_L2 said:
Core for core Zen5 is >40% faster than Zen4 in SPEC.

But the caveat is that Turing has 500W TDP Vs Genoa with 400W so with high core counts in an all core workload Turing probably clocks higher. On desktop I don't think there will be much of a clock speed difference going from Zen 4 to 5 so the mt uplift there will be lower.

IronLynx · May 21, 2024

Zen5 IPC is 42.

StefanR5R · May 21, 2024

Timorous said:
the caveat is that Turing has 500W TDP Vs Genoa with 400W

We don't know the lineup of Turin (and Turin dense) as of yet. That is, which SKU with what core count and thread count will get which cTDP_low/default TDP/cTDP_high is not obvious so far. There has been a rumored table of SKUs recently but this seemed to include some mistakes.

What's somewhat certain at this time is that the top SKU or top SKUs will have 500 W TDP = PPT limit indeed, yet at the same time come with increased core count and, presumably, slightly increased IMC clock compared to the top of the Genoa and Bergamo lines.

Timorous said:
so with high core counts in an all core workload Turing probably clocks higher

Ignoring the increased core count per socket for a second, or assuming that same-core-count SKUs might get their cTDP_high upped to 500 W, the clock speed in fully parallel workloads is going to depend on how much "IPC" the given workload will be able to extract from the considerably widened cores.

uzzi38 · May 21, 2024

Timorous said:
32% IPC in Spec Int 2017 1T. A very specific claim and it annoys me when people take that specific claim and try and generalise to all workloads, it screams dishonest and disingenuous.

Kepler Said

But the caveat is that Turing has 500W TDP Vs Genoa with 400W so with high core counts in an all core workload Turing probably clocks higher. On desktop I don't think there will be much of a clock speed difference going from Zen 4 to 5 so the mt uplift there will be lower.

please stop calling it Turing

CouncilorIrissa · May 21, 2024

uzzi38 said:
please stop calling it Turing

Turin no DLSS????? DOA

adroc_thurston · May 21, 2024

Timorous said:
But the caveat is that Turing has 500W TDP Vs Genoa with 400W so with high core counts in an all core workload Turing probably clocks higher.

Remember that despite industry leading IPC:Cac ratios of AMD cores, moar IPC is also moar power in the age of very dead Dennard scaling.

Kryohi · May 21, 2024

StefanR5R said:
So how well does "any kind of serious work" done on desktops and workstations, in IRL, scale past ≈eight threads?

Pretty much everything. If it doesn't scale that much, or MT isn't implemented for a specific algo/library and I don't want to bother rewriting it myself, I will simply run it in parallel on different datasets, or even run it while I do other kinds of analyses. I work in biophysics/structural bioinformatics (academia), there's always different stuff to do/try. And the GPU isn't an option, since it's busy doing some other stuff most of the time, and it's far too time consuming to port stuff to it anyway (when it's not straight-up impossible).

StefanR5R said:
If your application scales easily to high thread counts but workstation and server are too expensive to you, then buy a dozen of 2nd hand PCs and a cheap Ethernet switch and run the CPU intensive part of your well scaling application on the Ethernet cluster. This instantly gives this type of applications greatly more nT performance than any generational CPU update will ever do, and for little money to boot. (Or if you do streaming vector arithmetic, get your application ported to GPU, as @H433x0n pointed out.)

I already have access to a powerful cluster, but desktop work is desktop work. For stuff that runs in let's say 10 or 20 minutes having to deal with remote resources isn't that convenient. A workstation running it locally in 5-10 minutes would be much better.

StefanR5R said:
In other other words, even though you, as end user with nT performance needs but without budget do not matter to AMD and to OEMs, you still get to reap much of the benefits of those CPU architecture updates which are driven by requirements of customer groups which do matter to AMD and OEMs. Nice!

Of course more ST is good! What I'm saying is that in the end, +30% MT and +20% ST would be better (for me and many other people) than the reverse, especially since the ST baseline is already fairly good and already allows fast prototyping.
Now, if the "problem" with Zen 5 is SMT, that might be reasonable. 16 really fast cores might still be a good deal even if SMT does not add that much in some workloads.

Joe NYC · May 21, 2024

Doug S said:
Here's where I once again point out to people that the mobile gaming market is larger than the PC and console gaming markets COMBINED. Apple's mobile gaming market alone is much bigger than the entire PC (or console) gaming market.

So the Mac's "abandonment" as a gaming platform (how can it be abandoned when it never had it in the first place) is irrelevant. The iPad, and the slightly lower clocked version of M4's CPU that will be shipping in millions of iPhones four months from now, will have plenty of games for their users to choose from. Even if you turn your nose up at them and don't consider mobile gaming to be "real" gaming. Revenue don't lie, that's what most people consider to be gaming when voting with their wallets.

Mobile and especially ad supported mobile gaming is pathetic...

Good thing it is on decline:

What really takes the cake is "pay to win" gaming. Those people should be arrested.

Joe NYC · May 21, 2024

uzzi38 said:
please stop calling it Turing

Not as bad as RGT calling it "Cheering"

Timorous · May 21, 2024

uzzi38 said:
please stop calling it Turing

I was on mobile so I am going to blame auto correct (which may or may not be a total lie).

Timorous · May 21, 2024

adroc_thurston said:
Remember that despite industry leading IPC:Cac ratios of AMD cores, moar IPC is also moar power in the age of very dead Dennard scaling.

Then it just means the inverse is probably true. Turin roughly matches Genoa in clock speeds but uses up the extra TDP budget to do that due to the IPC uplift. The 9950X having the same TDP as 7950X means it clocks lower in all core workloads. In ST workloads you don't max out the TDP budget so clock speeds are roughly comparable. Something along those lines.

StefanR5R · May 21, 2024

500 W / 400 W = 1.25
128? cores / 96 cores = 1.33?
196? dense cores / 128 dense cores = 1.53?

(These fractions don't reflect though that the socket power budget is spread between CCDs and IOD and fabric. Edit: However, the new sIOD will apparently have more GMI links and faster IMCs.)
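A rough sketch of what these ratios imply per core, taking the rumored numbers in this post at face value. It naively attributes the whole socket budget to the cores, which is exactly the simplification the caveat above warns about, so treat the result as an upper-level sanity check only. The function name is made up for illustration.

```python
# Back-of-envelope: socket power budget per core, new generation vs. old.
# Assumptions: core counts are the rumored values from the post, and the
# entire TDP is (unrealistically) assigned to the cores, ignoring IOD/fabric.

def per_core_power_ratio(tdp_new, tdp_old, cores_new, cores_old):
    """Return (watts per core, new) divided by (watts per core, old)."""
    return (tdp_new / cores_new) / (tdp_old / cores_old)

# Classic cores: 500 W over 128 cores vs. 400 W over 96 cores.
classic = per_core_power_ratio(500, 400, 128, 96)

# Dense cores: 500 W over 196 cores vs. 400 W over 128 cores.
dense = per_core_power_ratio(500, 400, 196, 128)

print(f"classic W/core ratio: {classic:.4f}")
print(f"dense   W/core ratio: {dense:.4f}")
```

Under these assumptions the top SKUs get roughly 0.94x (classic) and 0.82x (dense) the per-core budget of their predecessors, so any clock increase at full core count would have to come from efficiency gains rather than from the larger TDP.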

itsmydamnation · May 21, 2024

StefanR5R said:
500 W / 400 W = 1.25
128? cores / 96 cores = 1.33?
196? dense cores / 128 dense cores = 1.53?

(These fractions don't reflect though that the socket power budget is divided between CCDs and IOD and fabric.)

Lots of server power is spent on I/O / uncore / memory, so if they have been able to continue to improve power in those areas, it can mean more power per core than just a straight extrapolation.

Timorous · May 21, 2024

StefanR5R said:
500 W / 400 W = 1.25
128? cores / 96 cores = 1.33?
196? dense cores / 128 dense cores = 1.53?

(These fractions don't reflect though that the socket power budget is spread between CCDs and IOD and fabric.)

I may be utterly wrong, but I presume Kepler's core-for-core comment was referring to a 96c Turin vs a 96c Genoa; it would not really be core for core if core counts were not equalised.

Kepler_L2 · May 21, 2024

Timorous said:
I may be utterly wrong but I presume Keplers core for core comment was referring to a 96c Turin vs a 96c Genoa, would not really be core for core if core counts were not equalised.

Correct

DrMrLordX · May 21, 2024

CouncilorIrissa said:
Turin no DLSS????? DOA

Oh noes first Starfield and now Turin? ahahahahsdouifhasdof sorry

Abwx · May 21, 2024

Timorous said:
Then it just means the inverse is probably true. Turin roughly matches Genoa in clock speeds but uses up the extra TDP budget to do that due to the IPC uplift. The 9950X having the same TDP as 7950X means it clocks lower in all core workloads. In ST workloads you don't max out the TDP budget so clock speeds are roughly comparable. Something along those lines.

N4P should bring the necessary perf/watt uplift to keep the same MT frequency as the N5 based 7950X.

StefanR5R · May 21, 2024

Re #11,227 -> #11,229,

Timorous said:
I may be utterly wrong but I presume Keplers core for core comment was referring to a 96c Turin vs a 96c Genoa, would not really be core for core if core counts were not equalised.

I got the core-for-core bit, but I had forgotten about the other mention which indicates that it is going to be possible to operate 96c Turin at 500 W socket power.

However, whether or not increased core clock speeds are involved in the presumed >140 % socket performance at 125 % socket power is going to hinge on what AMD's engineers were able to extract out of the minor CCD manufacturing node update, and whether or not the new sIOD saves power vs. Genoa's.

Timmah! · May 21, 2024

Kaffeekenan said:
Oh yeah? Which month?

April :-D

rainy · May 21, 2024

StefanR5R said:
500 W / 400 W = 1.25
128? cores / 96 cores = 1.33?
196? dense cores / 128 dense cores = 1.53?

(These fractions don't reflect though that the socket power budget is spread between CCDs and IOD and fabric. Edit: However, the new sIOD will apparently have more GMI links and faster IMCs.)

Successor of Bergamo (Turin Dense?) will have 192 cores not 196.

StefanR5R · May 21, 2024

Thanks, right, 196 would be awkward to divide into core complexes. ;-)

gdansk · May 21, 2024

If you made a prediction consider adding it to this spreadsheet so we can easily see who to give kudos without searching through the thread

ARL-S and Zen 5 Desktop Speculation

RULE: This spreadsheet is unmoderated. No spamming and please respect what others have written. I will stop making these public speculation spreadsheets once someone breaks the rule. That's all, have fun. (By the way, the tabs are at the bottom.)

docs.google.com

Mopetar · May 21, 2024

If they get a 40% improvement in performance for a 25% increase in power it's certainly worth it. If most of that is coming from IPC improvements then dialing back the clocks will reduce a good chunk of the additional power draw for a ~30% performance gain.

Going wider may use slightly more power in the moment, but if it gets through the workload faster and allows the cores to get to a rest state faster, it will still consume less overall.
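The energy argument can be put in numbers with a minimal sketch. It assumes a fixed amount of work, that run time scales as 1/performance, and that the power ratio holds as an average over the whole run; idle power after the job finishes is ignored. The 40% and 25% figures are the hypothetical ones from this post, not measurements, and the function name is illustrative.

```python
# Energy for a fixed job = average power * run time.
# If performance rises by perf_ratio, run time shrinks to 1 / perf_ratio.

def relative_energy(perf_ratio, power_ratio):
    """Energy to finish the same job, relative to the baseline part."""
    relative_time = 1.0 / perf_ratio
    return power_ratio * relative_time

# Hypothetical case from the post: 1.40x performance at 1.25x power.
energy = relative_energy(perf_ratio=1.40, power_ratio=1.25)
print(f"relative energy per job: {energy:.3f}")
```

With those inputs the job costs about 0.89x the baseline energy, i.e. roughly 11% less, even though instantaneous power is higher.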

CouncilorIrissa · May 21, 2024

Kryohi said:
I already have access to a powerful cluster, but desktop work is desktop work. For stuff that runs in let's say 10 or 20 minutes having to deal with remote resources isn't that convenient. A workstation running it locally in 5-10 minutes would be much better.

Of course more ST is good! What I'm saying is that in the end, +30% MT and +20% ST would be better (for me and many other people) than the reverse, especially since the ST baseline is already fairly good and already allows fast prototyping.
Now, if the "problem" with zen 5 is SMT, that might be reasonable. 16 really fast cores might still be a good deal even if SMT does not add that much in some workloads.

Does +30% nT gain actually translate into your workloads taking 1-(1/1.3) ~=23% lower time to complete?
Because I'd be very surprised if it did.
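For reference, the arithmetic behind that question, plus an Amdahl-style variant showing why the full 23% is unlikely in practice. The `parallel_fraction` parameter is a hypothetical knob (the share of wall time that actually benefits from the nT gain); nothing in the thread gives a real value for it.

```python
# Ideal case: the whole run speeds up by the nT throughput gain.
def runtime_reduction(nt_gain):
    """Fractional drop in run time for a throughput gain, e.g. 0.30 for +30%."""
    return 1.0 - 1.0 / (1.0 + nt_gain)

# Amdahl-style case: only part of the wall time benefits from the nT gain.
def partial_runtime_reduction(parallel_fraction, nt_gain):
    """Run-time drop when only parallel_fraction of the time speeds up."""
    new_time = (1.0 - parallel_fraction) + parallel_fraction / (1.0 + nt_gain)
    return 1.0 - new_time

ideal = runtime_reduction(0.30)
# Assumed example: 80% of the wall time scales with nT performance.
partial = partial_runtime_reduction(0.80, 0.30)

print(f"ideal:   {ideal:.1%} less time")
print(f"partial: {partial:.1%} less time")
```

The ideal case reproduces the ~23% figure; with the assumed 80% parallel share the reduction falls to about 18%, and it keeps shrinking as the serial, I/O-bound, or memory-bound share grows.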
