Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Hitman928 · May 14, 2024

Thibsie said:
At a cost of a bit of latency ?

If by latency you mean that you are introducing an additional gate delay, then yes.

Abwx · May 14, 2024

Hitman928 said:
Wouldn't be the first time GB reported the wrong cache size. On the other hand, it could be correct and a Zen 5 update to their embedded platform.

All other cache sizes seems to be right, so it s unlikely that GB made a miss just for the L1D, and the ST score, if at about 1.4GHz, is aligned with the late april GB submissions, this would translate as about 23% better ST score than a 7950X.

Goop_reformed · May 14, 2024

Abwx said:
All other cache sizes seems to be right, so it s unlikely that GB made a miss just for the L1D, and the ST score, if at about 1.4GHz, is aligned with the late april GB submissions, this would translate as about 23% better ST score than a 7950X.

23% is bad

RnR_au · May 14, 2024

HurleyBird said:
Memory capacity is king. Strix Halo + 64GB RAM will beat a 4090 in any AI task if you scale it up sufficiently. And for local LLMs where capacity is by far the biggest bottleneck, it won't even be close.

Strix Halo directly competes for the semi-pro market that is buying 4090s instead of RTX 6000s. Significantly slower for tasks that fit inside 24GB, but can be used for tasks a 4090 can't touch.

Just on this, I noticed this thread this morning;

https://www.reddit.com/r/LocalLLaMA/comments/1crwkia/why_people_buying_macs_instead_of_cuda_machines

From my understanding, Strix Point will also feature a souped up NPU, so Windows folks might get some inferencing love if any of the partners will field Strix Point laptops with 128GB of ram.

gdansk · May 14, 2024

Goop_reformed said:
23% is bad

Only for train passengers.

Abwx · May 14, 2024

Goop_reformed said:
23% is bad

There s forcibly some big variability, in a comparision with a linux driven 7800X3D, wich is sure to be at 5Ghz, that could be 27-29% or so, at this point the GB submission give no clue about the test plateform, given that it display 6.74GB RAM it could be single channel, or not.

Gigabyte Technology Co., Ltd. B650 AORUS ELITE AX vs AMD celadon Android-x86_64 - Geekbench

poke01 · May 14, 2024

RnR_au said:
Just on this, I noticed this thread this morning;

https://www.reddit.com/r/LocalLLaMA/comments/1crwkia/why_people_buying_macs_instead_of_cuda_machines

From my understanding, Strix Point will also feature a souped up NPU, so Windows folks might get some inferencing love if any of the partners will field Strix Point laptops with 128GB of ram.

Great points. I would also say the huge bandwidth helps as well. That’s before Apple is using LP5X.

MLX is also advancing as well. Honestly, the more companies try to break down Nvidias iron wall the better.

I would like to say AMD should make a larger APU with more bandwidth. Maybe, Zen6?
Looking at OpenAI’s event yesterday, the hype around AI is not dying. Lots of memory is needed.

adroc_thurston · May 14, 2024

RnR_au said:
From my understanding, Strix Point will also feature a souped up NPU

It's ok but GPU is the meat there.

poke01 said:
I would like to say AMD should make a larger APU with more bandwidth.

Goop_reformed · May 14, 2024

gdansk said:
Only for train passengers.

Nope, brand new uarch with major changes for only 23%, that's definitely not good.

adroc_thurston · May 14, 2024

Goop_reformed said:
Nope, brand new uarch with major changes for only 23%, that's definitely not good.

You're trying to model perf off fuzzy data.
Don't.

poke01 · May 14, 2024

adroc_thurston said:
It's ok but GPU is the meat there.

View attachment 98972

lol.. Maybe if I win the lottery.

adroc_thurston · May 14, 2024

poke01 said:
lol.. Maybe if I win the lottery.

You've asked for bigger and meaner and APUs (so far) don't get bigger and meaner than that!

branch_suggestion · May 14, 2024

Eng samples having unoptimised firmware or straight up firmware locks makes any performance assumptions dubious.
That one Weibo post that sent people mad is a very obvious example of limited testing. If they are a Lenovo product manager then it makes it far more likely that AMD would put in firmware locks to make such lapses of judgement moot.

H433x0n · May 14, 2024

Abwx said:
There s forcibly some big variability, in a comparision with a linux driven 7800X3D, wich is sure to be at 5Ghz, that could be 27-29% or so, at this point the GB submission give no clue about the test plateform, given that it display 6.74GB RAM it could be single channel, or not.

Gigabyte Technology Co., Ltd. B650 AORUS ELITE AX vs AMD celadon Android-x86_64 - Geekbench

For the April GB scores when compared against a 14900K w/ DDR5-4800 JEDEC it ended up being 20-22%. If that were the score I think that'd actually be quite good since a desktop chip with larger L3$ would score even better than this. If you extrapolate the small IPC bump from a desktop chip having fatter L3$ and the slight fmax increase over Zen 4 you would see ~30% 1T perf increase.

I personally don't think it's worth drawing conclusions from it though since there's so much variability in GB scores. When searching through scores, you can see the same processors get results that vary by 10% from user submitted data. About the only thing that seems constant is that the Zen 5 core doesn't do great with the AES subtests.

moinmoin · May 15, 2024

BorisTheBlade82 said:
Will be highly interesting to see how this works out and what strategy for switching the threads over they might employ.
Will they switch on a thread level, on a core level (thread pairs), what thresholds, etc. ...

If the previously discussed patent 10698472 is any indication it would work on an opcode level, only implementing opcodes that can be churned through without any real computation (like JMP or anything keeping I/O going etc.) otherwise throwing an illegal opcode exception to move the computation onto a fat core.

adroc_thurston said:
The real question is just how castrated Z5LP really is.

Going by above patent it could well be so castrated it couldn't be used as a full core of its own. It would be little more than a fast shortcut for the simplest of all opcodes.

Mahboi · May 15, 2024

moinmoin said:
If the previously discussed patent 10698472 is any indication it would work on an opcode level, only implementing opcodes that can be churned through without any real computation (like JMP or anything keeping I/O going etc.) otherwise throwing an illegal opcode exception to move the computation onto a fat core.
Going by above patent it could well be so castrated it couldn't be used as a full core of its own. It would be little more than a fast shortcut for the simplest of all opcodes.

I don't know, how would that work for a LP island?
Sounds like something that would fire up the fat cores for literally any workload.

DrMrLordX · May 15, 2024

adroc_thurston said:
It gets day1 ROCm too!

Did I uh, miss something? ROCm under Windows? Google search results seem inconclusive.

edit: huh, seems like ROCm actually works under Win11 now. That's news to me.

cherullo · May 15, 2024

Mahboi said:
I don't know, how would that work for a LP island?
Sounds like something that would fire up the fat cores for literally any workload.

Yeah, the LP cores do need to perform proper computations to actually prevent the HP cores from powering up.
Since it's transparent to the OS, it can support any mix of instructions, regardless of extension. For example, it would be nice to have memory and string instructions from AVX512, even at reduced throughput, but no FMA at all. You can even support certain zeroing idioms without supporting the instruction itself.
You can also strip out all the legacy stuff like 16- and 32-bit modes, x87 FPU, etc. It can be really, really lean.
What moinmoin described is more like a DMA engine, which is also useful.

SteinFG · May 15, 2024

How possible it is for AMD to just ditch numbered tiers? Instead of ryzen 5,7,9 it's just all "Ryzen AI" for Kraken chips, and maybe "Ryzen AI HX" for Strix chips?

LightningZ71 · May 15, 2024

Could the LP cores just be stripped of all the "go fast" bits and anything else that can't be emulated in microcode and essentially be maximally efficient microcode engines instead? They could retain full isa compatibility with the performance cores.

Mahboi · May 15, 2024

Weird hypothesis, but frankly Intel and AMD always used the numbers very poorly.

bad idea

SteinFG · May 15, 2024

Mahboi said:
Weird hypothesis, but frankly Intel and AMD always used the numbers very poorly.
Out of 9 numbers, 4 are used: 3/5/7/9. It's a generally poor system because you don't want to have many tiers(muddies the lineup), and you don't want to have lower numbers than 5. Anyone identifies i3/R3 as "bad", to the point that some noobs actually think a 2023 i3 is worse than a 2014 i7.

Frankly I think clothes did it pretty well: S/M/L/XL?
Make 4 tiers:
- Light
- Medium
- Heavy
- Ultra

But it's kind of problematic too since it's a two sided coin, you get Bigger Number Better just like you get Lower Number Worse.

This just replaces a number with a word, same meaning. What actually should've happened is, each gen should shift the window of numbers. Example: Zen 1 would have Ryzen 1,2,3 tiers, Zen 2 will have Ryzen 2,3,4,5 tiers. Zen 3 would have Ryzen 3,4,5,6 tiers. Zen 4 would have Ryzen 5,6,7 tiers. This way, you avoid the problem you are talking about. Someone will say "I have a Ryzen 4", and you instantly know it's a chip made inbetween 2019 and 2022.

Mahboi · May 15, 2024

SteinFG said:
This just replaces a number with a word, same meaning.

Yes, that's the point.

SteinFG said:
What actually should've happened is, each gen should shift the window of numbers. Example: Zen 1 would have Ryzen 1,2,3 tiers, Zen 2 will have Ryzen 2,3,4,5 tiers. Zen 3 would have Ryzen 3,4,5,6 tiers. Zen 4 would have Ryzen 5,6,7 tiers. This way, you avoid the problem you are talking about.

You just create a huge new slew of problems.

SteinFG said:
Someone will say "I have a Ryzen 4", and you instantly know it's a chip made inbetween 2019 and 2022.

So just the difference between a 3950x and a 7600.

AMD Ryzen 9 3950X vs AMD Ryzen 5 7600 [cpubenchmark.net] by PassMark Software

Just a smidge of 31% better MT perf on one, vs 30% better ST perf on the other.

No, I don't think that's better.

SteinFG · May 15, 2024

Mahboi said:
Just a smidge of 31% better MT perf on one, vs 30% better ST perf on the other.

Is it a lot better when Ryzen 7 has 200% ST and MT better performance than Ryzen 7? (1700X vs 7700X), I don't think that. With intel it's even bigger

Mahboi · May 15, 2024

Mmmh, good point. Maybe it'd just be better to remove it entirely and just leave the Gen/Tier/Rev. And you can bin the "AI" bit too.

Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Golden Member

Diamond Member

Lifer

Member

Golden Member

Platinum Member

Lifer

Senior member

Platinum Member

Member

Platinum Member

Senior member

Platinum Member

Senior member

Senior member

Diamond Member

Senior member

Lifer

Member

Senior member

Golden Member

Senior member

Senior member

Senior member

Senior member

Senior member