Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

FlameTail · Apr 22, 2024

adroc_thurston said:
you forgot to mention the funniest part, A10 had LITTLEs grow off the bigs like a cancerous tumor.

But that changed with Bionics, right?
Long ago, I wondered what 'Fusion' and 'Bionic' meant. Now I know.

FlameTail · Apr 22, 2024

How much memory bandwidth (GB/s) is needed to feed Strix's 50 TOPS NPU?

dhruvdh · Apr 22, 2024

FlameTail said:
How much memory bandwidth (GB/s) is needed to feed Strix's 50 TOPS NPU?

AMD's NPU has memory tiles, and the XDNA 1 implementation could do 30 GBps read and 30 GBps write per memory tile as per Ryzen AI column architecture and tiles — AMD Riallto 1.0 documentation. There is an interface tile that connects this memory tile to system memory.

Going by the 2x8 layout from the recent Versal XDNA 2 launch, and given that there is one fewer interface tile, it should have a limit of 210 GBps. But the interface tiles seem to support compression, and I can't say how that changes things. Don't know how XDNA 2 changes things.

But yeah should ideally be fed by ~200GBps bandwidth.
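
The arithmetic behind that 210 GBps figure can be sketched roughly as below. This is a back-of-the-envelope estimate only: it assumes 8 columns with one fewer interface tile than columns, and that each interface tile sustains the 30 GBps per direction quoted for XDNA 1 memory tiles. The names are made up for illustration, and XDNA 2 may well behave differently.

```python
# Rough peak NPU <-> system memory bandwidth, under the assumptions stated above.
PER_TILE_GBPS = 30             # per direction, XDNA 1 memory tile figure
COLUMNS = 8                    # assumed from the 2x8 layout
INTERFACE_TILES = COLUMNS - 1  # assumed: one fewer interface tile than columns


def peak_interface_bandwidth_gbps(tiles=INTERFACE_TILES, per_tile=PER_TILE_GBPS):
    """Peak one-direction bandwidth if every interface tile runs flat out."""
    return tiles * per_tile


print(peak_interface_bandwidth_gbps())  # 210
```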

carancho · Apr 22, 2024

dhruvdh said:
AMD's NPU has memory tiles, and the XDNA 1 implementation could do 30 GBps read and 30 GBps write per memory tile as per Ryzen AI column architecture and tiles — AMD Riallto 1.0 documentation. There is an interface tile that connects this memory tile to system memory.

Going by the 2x8 layout from the recent Versal XDNA 2 launch, and given that there is one fewer interface tile, it should have a limit of 210 GBps. But the interface tiles seem to support compression, and I can't say how that changes things. Don't know how XDNA 2 changes things.

But yeah should ideally be fed by ~200GBps bandwidth.

Will Strix Point be able to feed it at that speed?

FlameTail · Apr 22, 2024

FYI, Strix Point will have 136 GB/s of main memory bandwidth.

And it has no SLC.

Uh, uh, uh.

So the GPU isn't going to be the only thing that's bandwidth starved huh?

Glo. · Apr 22, 2024

FlameTail said:
FYI, Strix Point will have 136 GB/s of main memory bandwidth.

And it has no SLC.

Uh, uh, uh.

So the GPU isn't going to be the only thing that's bandwidth starved huh?

You are assuming that we will get 8533 LPDDR5X on Strix Point APUs, not 6400 SO-DIMM DDR5.
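
For reference, the two memory options work out roughly as follows. This is a quick sketch of theoretical peak rates only, and it assumes a 128-bit memory bus, which is not stated in this thread; the helper name is made up.

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits=128):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer.

    bus_width_bits=128 is an assumption, not a confirmed Strix Point spec.
    """
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9


lpddr5x = peak_bandwidth_gbs(8533)  # LPDDR5X-8533
ddr5 = peak_bandwidth_gbs(6400)     # DDR5-6400 SO-DIMM

print(f"LPDDR5X-8533: {lpddr5x:.1f} GB/s")
print(f"DDR5-6400:    {ddr5:.1f} GB/s")
```

Under that assumption the LPDDR5X case lands at about 136.5 GB/s and the SO-DIMM case at about 102.4 GB/s.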

FlameTail · Apr 22, 2024

I don't think you can overclock LPDDR...

Glo. · Apr 22, 2024

Redacted79 said:
I wonder how well Strix Halo will age, given that 2026 will probably bring an LPDDR6 release of it, which would bump bandwidth to somewhere in the 300 to 400-ish area. Considering AMD products age fine, I bet we can squeeze a 25% improvement over base Strix Halo, probably by overclocking the LPDDR5X modules to 9600 or even 10000. I heard the RAM configurations were 24 and 48 GB for some ASUS laptops, which will probably be the first ones to release with Strix Halo, but I do hope we get a 32 or 64 GB model. Also, undervolting this thing will be a massive task.

We are not limited by compute capabilities anymore. We are limited by memory.

Its capacity.

Tuna-Fish · Apr 22, 2024

dhruvdh said:
AMD's NPU has memory tiles, and the XDNA 1 implementation could do 30 GBps read and 30 GBps write per memory tile as per Ryzen AI column architecture and tiles — AMD Riallto 1.0 documentation. There is an interface tile that connects this memory tile to system memory.

Going by the 2x8 layout from the recent Versal XDNA 2 launch, and given that there is one fewer interface tile, it should have a limit of 210 GBps. But the interface tiles seem to support compression, and I can't say how that changes things. Don't know how XDNA 2 changes things.

But yeah should ideally be fed by ~200GBps bandwidth.

Cache bandwidth is mostly irrelevant for client inference. Caching helps when you can batch requests, but for a client inference setup you mostly just can't usefully do that. A single inference job needs to touch gigabytes of parameters, so the actual usable bandwidth is just how much is left of system RAM bandwidth after all other users.
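
As a rough illustration of that last point: if every generated token has to read all the weights once from system RAM, the token rate is capped by the leftover bandwidth divided by the model size. The numbers below are hypothetical, picked only to show the shape of the estimate, and the function name is made up.

```python
def max_tokens_per_second(model_size_gb, total_bandwidth_gbs, other_users_gbs=0.0):
    """Upper bound on batch-1 token rate when each token streams all weights once."""
    available = max(total_bandwidth_gbs - other_users_gbs, 0.0)
    return available / model_size_gb


# Hypothetical: 4 GB of weights, 136 GB/s total, 36 GB/s taken by CPU/GPU/display.
print(max_tokens_per_second(4.0, 136.0, 36.0))  # 25.0
```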

Joe NYC · Apr 22, 2024

adroc_thurston said:
You'll have to wait for a year.

If you look past the current sales of gaming PCs, including prebuilt systems and DIY, AMD is still barely > 50%.

Then, there is the installed base. Prior to the 5800X, AMD was not that compelling for gaming. Installed base is 65-70% Intel (on Steam Surveys).

So there is a good sized market out there for AMD to sell to, if Intel is not competitive. And this market may be people who never in their lives bought a non-Intel CPU. It seems like an opportunity that does not come around that often.

Here is a dumb way to think: "AMD does not need to release the V-Cache processor until Intel has a more competitive CPU". Smart way to think is for AMD to maximize their sales, while their advantage is the greatest.

Because if they wait until Intel does release a more competitive CPU, it will only give the Intel-only buyers an excuse to stay with Intel.

Joe NYC · Apr 22, 2024

carancho said:
Do we have an idea of how RAM will work in Strix Halo? Fully unified, as in a Mac? Or with a part of it reserved for the GPU, as in integrated graphics solutions?

I'm interested in the part for RAM-heavy AI workloads, but that would only be a differentiator under the Mac fully unified model. If a portion of RAM is to be reserved, then I could go for a laptop with an Nvidia GPU - since Strix Halo wouldn't have the advantage of a huge pool of RAM vs a GeForce card.

Edit: like the only way to run Llama 3 70B is with an M2/3 Max with at least 64 GB, and from what I see on Twitter even that's rather short on RAM.

It seems like you are contradicting yourself. If the only way to run this model is a Mac with 64 GB of memory, then how is a dGPU an advantage if a dGPU can only have 16-24 GB?

Only Strix Halo with 64 GB will equal the Mac with 64 GB of memory.
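
A quick sketch of why capacity is the sticking point here. It counts the weights only (KV cache and runtime overhead are ignored) and assumes 70 billion parameters; the quantization levels are illustrative, not a claim about any particular build of the model.

```python
def weights_gb(params_billions, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


for bits in (16, 8, 4):
    size = weights_gb(70, bits)
    fits_24 = size <= 24  # largest dGPU capacity mentioned above
    fits_64 = size <= 64  # the 64 GB unified-memory case
    print(f"{bits}-bit: {size:.0f} GB, fits in 24 GB: {fits_24}, fits in 64 GB: {fits_64}")
```

Under these assumptions only the 4-bit case fits in 64 GB, and none of them fit in a 24 GB dGPU.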

Ghostsonplanets · Apr 22, 2024

Joe NYC said:
If you look past the current sales of gaming PCs, including prebuilt systems and DIY, AMD is still barely > 50%.

Then, there is the installed base. Prior to the 5800X, AMD was not that compelling for gaming. Installed base is 65-70% Intel (on Steam Surveys).

So there is a good sized market out there for AMD to sell to, if Intel is not competitive. And this market may be people who never in their lives bought a non-Intel CPU. It seems like an opportunity that does not come around that often.

Here is a dumb way to think: "AMD does not need to release the V-Cache processor until Intel has a more competitive CPU". Smart way to think is for AMD to maximize their sales, while their advantage is the greatest.

Because if they wait until Intel does release a more competitive CPU, it will only give the Intel-only buyers an excuse to stay with Intel.

Intel's more competitive CPU arch won't overlap with Zen 5's time on the market though? By the time they have something more competitive, AMD will also be releasing Zen 6.

Joe NYC · Apr 22, 2024

Ghostsonplanets said:
Intel's more competitive CPU arch won't overlap with Zen 5's time on the market though? By the time they have something more competitive, AMD will also be releasing Zen 6.

Think in terms of "surrender", when the fans of one brand flip to the opposite side. It is also a function of the gap in performance (not just the length of time with trivial lead). No need to "surrender" for 1% or 5% delta.

Zen 5 V-Cache can increase this delta, and with it the rate of "surrender".

Saylick · Apr 22, 2024

uzzi38 said:
This patent reminds me a lot of A10 Fusion, where the little cores ended up being almost transparent to software because if a workload was heavy enough it would transition from the little cores to the big cores, and primarily use those instead (not sure how that was determined by the OS/hardware, but I wouldn't be surprised if it relied upon the types of instructions run - like in that patent - or the workload duration).

A very hardware solution to the whole big.LITTLE scheduling problem. But Apple still ended up dropping it pretty quickly afterwards. It's why it was such a surprise to see AMD patent almost the same idea back then too.

Yeah, I suspect that Apple ditched (if you want to call it that) the concept because they realized their E cores were pretty buff on their own so it made more sense to expose them to the OS so that apps could leverage them in MT situations. With the A10 Fusion, it was either the E core or the P core, but never both.

I still could see AMD implement a power efficiency core on the IO die, a la MTL, just so that the compute die can be shut down, and considering that a Zen 4 or Zen 5 efficiency core is way more robust than a LP-E core in MTL, I suspect there's more cases where AMD could keep the compute die shut down.

StefanR5R · Apr 22, 2024

3D V-Cache client parts:

Joe NYC said:
Because if they wait until Intel does release a more competitive CPU, it will only give the Intel-only buyers an excuse to stay with Intel.

I suspect they will wait until supply backlog for Turin-X tapers off.

FlameTail · Apr 22, 2024

When will APUs get 3D V-cache?

Ghostsonplanets · Apr 22, 2024

FlameTail said:
When will APUs get 3D V-cache?

Not in the foreseeable future. Advanced packaging will start to be mainstream once chipmakers are forced due to ever increasing complexity x performance. But early adoption is off the table as your costs increase too much compared to monolithic. See Meteor Lake.

Mopetar · Apr 22, 2024

Goop_reformed said:
My daughter's friend's uncle's nephew said Zen 5 > Arrowlake by a good chunk

I'm pretty sure I just read an article by MLID saying much the same that attributed their source to this nephew, so that must mean he's legit!

StefanR5R · Apr 22, 2024

Ghostsonplanets said:
Now that AMD has separated Zen 5 Client into Classic and Dense, they can customize Classic to use eLVT for higher clocks while Dense use uLVT for higher efficiency at lower clocks.

branch_suggestion said:
Zen6 takes this even further, if you can figure out the one area AMD is not entirely leading in.

Fjodor2001 said:
In what way does Zen6 take it further? And are such details already known about Zen6?

Well, advanced packaging, which Zen 6 is supposed to bring, should help with power consumption in various load scenarios including idle state, at least. Consumption in Connected Standby alias Modern Standby may still be an issue though.
Relevant to the latter, see #3,895 and #3,919 — but I have no idea whether or not this alluded to Zen 6 already, and these claims were dismissed as mere guesswork by somebody anyway. :-)

adroc_thurston · Apr 22, 2024

FlameTail said:
So the GPU isn't going to be the only thing that's bandwidth starved huh?

no one cares about the dark Si blobs.
They're for marketing.

StefanR5R said:
Well, advanced packaging, which Zen 6 is supposed to bring, should help with power consumption in various load scenarios including idle state, at least. Consumption in Connected Standby alias Modern Standby may still be an issue though.

really now what he implied.

FlameTail said:
When will APUs get 3D V-cache?

never.

FlameTail said:
But that changed with Bionics, right?

well yeah A10 was an utterly schizophrenic design.

dhruvdh · Apr 22, 2024

Tuna-Fish said:
Cache bandwidth is mostly irrelevant for client inference. Caching helps when you can batch requests, but for a client inference setup you mostly just can't usefully do that. A single inference job needs to touch gigabytes of parameters, so the actual usable bandwidth is just how much is left of system RAM bandwidth after all other users.

I was referring to the theoretical peak of weights being streamed in and out of the NPU given the 7 interface-memory tile pairs. I am confused about what you are trying to say here.

StefanR5R · Apr 22, 2024

adroc_thurston said:
really now what he implied.

now ./. not?
Were further physical design options for f_max implied then? ¹

________
¹) I admit that I have been completely ignoring for years now what the f_max state of play is. Task energy and other practical aspects matter way too much to me to be fascinated with f_max.

poke01 · Apr 22, 2024

https://www.notebookcheck.net/Minisforum-V3-3-in-1-review-the-first-ever-Windows-tablet-with-AMD-s-Hawk-Point-APU-aka-the-AMD-Ryzen-7-8840U.829081.0.html#toc-5

sigh, AMD Hawk Point sucks for tablets, this tablet also has a fan and has heating problems despite having enough heatpipes. I heard this song and dance before but Zen 5 really needs to excel in tablets/thin n lights.

The idle usage is 10W! and battery life is pathetic.

adroc_thurston · Apr 22, 2024

poke01 said:
https://www.notebookcheck.net/Minisforum-V3-3-in-1-review-the-first-ever-Windows-tablet-with-AMD-s-Hawk-Point-APU-aka-the-AMD-Ryzen-7-8840U.829081.0.html#toc-5

sigh, AMD Hawk Point sucks for tablets, this tablet also has a fan and has heating problems despite having enough heatpipes. I heard this song and dance before but Zen 5 really needs to excel in tablets/thin n lights.

That idle avg and battery life is pathetic.

That's an OEM issue (and AMD no longer makes tablet parts anyway).

Ghostsonplanets · Apr 22, 2024

Phawx showed OneXplayer using 8840 and idling at ~4W - 5W. Still high, but dramatically lower than the result NBC got.

(Still think a 8C + 12 CU SoC is too big for handheld power targets. Something like KRK with 444 or even bin KRK with 243 should do much better at 15 - 20W range.)
