Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

AnandTech community, page 382

dhruvdh

Junior Member
Apr 2, 2024
10
13
36
How much memory bandwidth (GB/s) is needed to feed Strix's 50 TOPS NPU?
AMD's NPU has memory tiles, and in the XDNA 1 implementation each memory tile could do 30 GB/s of reads and 30 GB/s of writes, per AMD's Riallto 1.0 documentation ("Ryzen AI column architecture and tiles"). An interface tile connects this memory tile to system memory.


Going by the 2x8 layout from the recent Versal XDNA 2 launch, and given that there is one fewer interface tile (seven in total), it should have a limit of 210 GB/s. But the interface tiles seem to support compression, and I can't say how that changes things. I also don't know what else XDNA 2 changes.

But yeah, it should ideally be fed with ~200 GB/s of bandwidth.
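The 210 GB/s figure is just tiles × per-tile rate. A minimal sketch of that arithmetic, assuming the XDNA 1 figure of 30 GB/s per direction carries over unchanged to XDNA 2 and that there are seven interface tiles (both are assumptions, not confirmed specs; compression is ignored). The helper name is hypothetical.

```python
# Back-of-envelope NPU streaming ceiling.
# ASSUMPTIONS: 30 GB/s per direction per tile (the XDNA 1 Riallto figure)
# and 7 interface tiles for XDNA 2; neither is a confirmed spec.
# Interface-tile compression is not modeled.

def npu_stream_ceiling_gbps(interface_tiles: int, gbps_per_tile: float) -> float:
    """Aggregate bandwidth if every interface tile streams at its full rate."""
    return interface_tiles * gbps_per_tile

read_ceiling = npu_stream_ceiling_gbps(interface_tiles=7, gbps_per_tile=30.0)
write_ceiling = npu_stream_ceiling_gbps(interface_tiles=7, gbps_per_tile=30.0)

print(f"read:  {read_ceiling:.0f} GB/s")
print(f"write: {write_ceiling:.0f} GB/s")
```

This is a theoretical peak per direction; real sustained throughput depends on how much of system memory bandwidth is actually available to the NPU.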
 

carancho

Junior Member
Feb 24, 2013
21
7
81
AMD's NPU has memory tiles, and in the XDNA 1 implementation each memory tile could do 30 GB/s of reads and 30 GB/s of writes, per AMD's Riallto 1.0 documentation ("Ryzen AI column architecture and tiles"). An interface tile connects this memory tile to system memory.


Going by the 2x8 layout from the recent Versal XDNA 2 launch, and given that there is one fewer interface tile (seven in total), it should have a limit of 210 GB/s. But the interface tiles seem to support compression, and I can't say how that changes things. I also don't know what else XDNA 2 changes.

But yeah, it should ideally be fed with ~200 GB/s of bandwidth.
Will Strix Point be able to feed it at that speed?
 

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,275
106
FYI, Strix Point will have 136 GB/s of main memory bandwidth.

And it has no SLC.

Uh, uh, uh.

So the GPU isn't going to be the only thing that's bandwidth-starved, huh?
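Peak DRAM bandwidth is transfer rate × bus width. A minimal sketch, assuming the 136 GB/s figure corresponds to a 128-bit LPDDR5X-8533 interface (the configuration that yields that number), compared against the ~210 GB/s NPU streaming estimate from earlier in the thread. Note that the CPU, GPU and NPU all share this one pool.

```python
# Peak theoretical DRAM bandwidth = transfer rate x bus width.
# ASSUMPTION: 128-bit LPDDR5X-8533, which is what ~136 GB/s implies.

def peak_dram_bandwidth_gbps(mt_per_s: int, bus_width_bits: int) -> float:
    """MT/s times bytes per transfer, expressed in GB/s (10^9 bytes/s)."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9

strix_point = peak_dram_bandwidth_gbps(8533, 128)
npu_ceiling = 210.0  # GB/s, the NPU streaming estimate quoted earlier in the thread

print(f"Strix Point peak: {strix_point:.1f} GB/s")
print(f"Share of NPU ceiling: {strix_point / npu_ceiling:.0%}")
```

Even if nothing else touched memory, the whole system's peak would cover only about two-thirds of the NPU's estimated streaming ceiling.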
 

Glo.

Diamond Member
Apr 25, 2015
5,723
4,594
136
FYI, Strix Point will have 136 GB/s of main memory bandwidth.

And it has no SLC.

Uh, uh, uh.

So the GPU isn't going to be the only thing that's bandwidth-starved, huh?
You are assuming we will get LPDDR5X-8533 on Strix Point APUs, not DDR5-6400 SO-DIMMs.
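To put numbers on the difference, here is a rough comparison, assuming both options use a 128-bit total bus (dual-channel DDR5 SO-DIMMs, or 128-bit LPDDR5X). The bus width for real Strix Point designs is an assumption here and may vary by OEM.

```python
# Compare peak bandwidth of the two memory options mentioned above.
# ASSUMPTION: both use a 128-bit total bus
# (dual-channel DDR5 SO-DIMMs, or 128-bit LPDDR5X).

def peak_gbps(mt_per_s: int, bus_width_bits: int = 128) -> float:
    """MT/s * bytes per transfer / 1000 -> GB/s."""
    return mt_per_s * (bus_width_bits // 8) / 1000

configs = {"LPDDR5X-8533": 8533, "DDR5-6400 SO-DIMM": 6400}
results = {name: peak_gbps(rate) for name, rate in configs.items()}

for name, bw in results.items():
    print(f"{name}: {bw:.1f} GB/s")

ratio = results["DDR5-6400 SO-DIMM"] / results["LPDDR5X-8533"]
print(f"SO-DIMM config delivers {ratio:.0%} of the LPDDR5X peak")
```

On SO-DIMMs, the shared pool drops to roughly 102 GB/s, about three-quarters of the LPDDR5X figure.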
 

Glo.

Diamond Member
Apr 25, 2015
5,723
4,594
136
I wonder how well Strix Halo will age, given that 2026 will probably bring an LPDDR6 version of it, which would bump bandwidth to somewhere around 300 to 400 GB/s. Considering that AMD products age well, I bet we can squeeze a 25% improvement out of base Strix Halo, probably by overclocking the LPDDR5X modules to 9600 or, hell, 10000 MT/s. I heard the RAM configurations were 24 GB and 48 GB for some ASUS laptops, which will probably be the first ones to release with Strix Halo, but I do hope we get a 32 or 64 GB model. Also, undervolting this thing will be a massive task.
We are not limited by compute capabilities anymore. We are limited by memory.

Its capacity.
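Peak bandwidth scales linearly with memory speed, so the overclocking gain is just the ratio of transfer rates. A minimal sketch, assuming (hypothetically, since the thread doesn't state them) a 256-bit LPDDR5X bus and an 8000 MT/s base speed for Strix Halo. LPDDR6 is not modeled.

```python
# How memory overclocking would scale Strix Halo's peak bandwidth.
# ASSUMPTIONS (hypothetical, for illustration): 256-bit LPDDR5X bus,
# base speed 8000 MT/s. Peak bandwidth scales linearly with MT/s.

BUS_BYTES = 256 // 8   # 32 bytes per transfer
BASE_MTS = 8000

def peak_gbps(mts: int) -> float:
    """Peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mts * BUS_BYTES / 1000

base = peak_gbps(BASE_MTS)
for mts in (9600, 10000):
    bw = peak_gbps(mts)
    print(f"{mts} MT/s: {bw:.1f} GB/s (+{bw / base - 1:.0%} over {base:.0f} GB/s)")
```

Under these assumptions, a 25% gain needs the full jump to 10000 MT/s; 9600 MT/s gives about 20%. That only holds if the memory controller actually runs at those rates.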
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,357
1,565
136
AMD's NPU has memory tiles, and in the XDNA 1 implementation each memory tile could do 30 GB/s of reads and 30 GB/s of writes, per AMD's Riallto 1.0 documentation ("Ryzen AI column architecture and tiles"). An interface tile connects this memory tile to system memory.


Going by the 2x8 layout from the recent Versal XDNA 2 launch, and given that there is one fewer interface tile (seven in total), it should have a limit of 210 GB/s. But the interface tiles seem to support compression, and I can't say how that changes things. I also don't know what else XDNA 2 changes.

But yeah, it should ideally be fed with ~200 GB/s of bandwidth.

Cache bandwidth is mostly irrelevant for client inference. Caching helps when you can batch requests, but in a client inference setup you mostly just can't usefully do that. A single inference job needs to touch gigabytes of parameters, so the actual usable bandwidth is simply whatever system RAM bandwidth is left over after all other users.
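At batch size 1, each generated token has to stream essentially all of the weights from DRAM once. That makes memory bandwidth, not cache, the ceiling. A rough sketch of that bound, assuming a dense model where weight traffic dominates (KV cache, activations and compute limits are ignored). The model size, precision and "other traffic" figures are hypothetical.

```python
# Upper bound on single-stream (batch size 1) token generation when
# memory-bound: every generated token streams all weights once.
# ASSUMPTIONS: dense model, weight reads dominate traffic; KV cache,
# activations and compute limits are ignored. Example inputs are hypothetical.

def max_tokens_per_s(total_bw_gbps: float, other_traffic_gbps: float,
                     params_billion: float, bits_per_weight: float) -> float:
    usable_bw = total_bw_gbps - other_traffic_gbps      # what's left for inference
    weights_gb = params_billion * bits_per_weight / 8   # GB read per token
    return usable_bw / weights_gb

# Hypothetical 7B model at 4 bits on a ~136.5 GB/s system:
# first with memory to itself, then with 30 GB/s used by other clients.
print(f"{max_tokens_per_s(136.5, 0.0, 7, 4):.1f} tok/s")
print(f"{max_tokens_per_s(136.5, 30.0, 7, 4):.1f} tok/s")
```

Caches on the order of megabytes can't hold a multi-gigabyte weight set, so they barely move this bound. Batching amortizes each weight read over several requests, which a single client usually can't exploit.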
 

Joe NYC

Platinum Member
Jun 26, 2021
2,011
2,428
106
You'll have to wait for a year.
If you look at the current sales of gaming PCs, including prebuilt systems and DIY, AMD is still barely above 50%.

Then there is the installed base. Prior to the 5800X, AMD was not that compelling for gaming. The installed base is 65-70% Intel (per the Steam surveys).

So there is a good-sized market out there for AMD to sell to if Intel is not competitive. And this market may be people who have never in their lives bought a non-Intel CPU. It seems like an opportunity that does not come around very often.

Here is a dumb way to think: "AMD does not need to release the V-Cache processor until Intel has a more competitive CPU." The smart way to think is for AMD to maximize their sales while their advantage is greatest.

Because if they wait until Intel does release a more competitive CPU, it will only give the Intel-only buyers an excuse to stay with Intel.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,011
2,428
106
Do we have an idea of how RAM will work in Strix Halo? Fully unified, as in a Mac? Or with a part of it reserved for the GPU, as in integrated graphics solutions?

I'm interested in it for RAM-heavy AI workloads, but that would only be a differentiator under the Mac's fully unified model. If a portion of RAM is to be reserved, then I could go for a laptop with an Nvidia GPU, since Strix Halo wouldn't have the advantage of a huge pool of RAM vs. a GeForce card.

Edit: for example, the only way to run Llama 3 70B is with an M2/M3 Max with at least 64 GB, and from what I see on Twitter, even that's rather short on RAM.
It seems like you are contradicting yourself. If the only way to run this model is a Mac with 64 GB of memory, then how is a dGPU an advantage if it can only have 16-24 GB?

Only Strix Halo with 64 GB will equal the Mac with 64 GB of memory.
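A quick way to see why 16-24 GB isn't enough is to compute the weight footprint of a 70B-parameter model at common precisions. A minimal sketch, counting weights only (KV cache, runtime overhead and OS memory are ignored, so real requirements are higher). The 24 GB and 64 GB thresholds come from the posts above.

```python
# Weight memory for a 70B-parameter model at common precisions.
# ASSUMPTION: weights only; KV cache, runtime overhead and OS memory
# are ignored, so real requirements are higher.

PARAMS_BILLION = 70

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Parameters (billions) * bits per weight / 8 -> gigabytes."""
    return params_billion * bits_per_weight / 8

for bits in (16, 8, 4):
    gb = weights_gb(PARAMS_BILLION, bits)
    print(f"{bits:>2}-bit: {gb:.0f} GB  fits 24 GB dGPU: {gb <= 24}  fits 64 GB: {gb <= 64}")
```

Even at 4 bits, the weights alone (35 GB) exceed any 16-24 GB card but fit in 64 GB with some headroom. At 8 bits (70 GB), 64 GB is already too small, which fits the "rather short on RAM" remark.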
 

Ghostsonplanets

Senior member
Mar 1, 2024
352
554
96
If you look at the current sales of gaming PCs, including prebuilt systems and DIY, AMD is still barely above 50%.

Then there is the installed base. Prior to the 5800X, AMD was not that compelling for gaming. The installed base is 65-70% Intel (per the Steam surveys).

So there is a good-sized market out there for AMD to sell to if Intel is not competitive. And this market may be people who have never in their lives bought a non-Intel CPU. It seems like an opportunity that does not come around very often.

Here is a dumb way to think: "AMD does not need to release the V-Cache processor until Intel has a more competitive CPU." The smart way to think is for AMD to maximize their sales while their advantage is greatest.

Because if they wait until Intel does release a more competitive CPU, it will only give the Intel-only buyers an excuse to stay with Intel.
Intel's more competitive CPU architecture won't overlap with Zen 5's time on the market, though, will it? By the time they have something more competitive, AMD will also be releasing Zen 6.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,011
2,428
106
Intel's more competitive CPU architecture won't overlap with Zen 5's time on the market, though, will it? By the time they have something more competitive, AMD will also be releasing Zen 6.
Think in terms of "surrender," when the fans of one brand flip to the opposite side. It is also a function of the gap in performance (not just the length of time with a trivial lead). There's no need to "surrender" over a 1% or 5% delta.

Zen 5 V-Cache can increase this delta, and with it the rate of "surrender".
 

Saylick

Diamond Member
Sep 10, 2012
3,194
6,492
136
This patent reminds me a lot of the A10 Fusion, where the little cores ended up being almost transparent to software: if a workload was heavy enough, it would transition from the little cores to the big cores and primarily use those instead (I'm not sure how that was determined by the OS/hardware, but I wouldn't be surprised if it relied on the types of instructions run, like in that patent, or on the workload duration).

A very hardware-centric solution to the whole big.LITTLE scheduling problem. But Apple still ended up dropping it pretty quickly afterwards. That's why it was such a surprise to see AMD patent almost the same idea back then too.
Yeah, I suspect that Apple ditched (if you want to call it that) the concept because they realized their E cores were pretty buff on their own, so it made more sense to expose them to the OS so that apps could leverage them in MT situations. With the A10 Fusion, it was either the E core or the P core, but never both.

I could still see AMD implementing a power-efficiency core on the IO die, à la MTL, just so that the compute die can be shut down. And considering that a Zen 4 or Zen 5 efficiency core is far more capable than an LP E-core in MTL, I suspect there would be more cases where AMD could keep the compute die shut down.
 

StefanR5R

Elite Member
Dec 10, 2016
5,534
7,872
136
Now that AMD has separated Zen 5 Client into Classic and Dense, they can customize Classic to use eLVT for higher clocks while Dense uses uLVT for higher efficiency at lower clocks.
Zen 6 takes this even further, if you can figure out the one area where AMD is not entirely leading.
In what way does Zen 6 take it further? And are such details about Zen 6 already known?
Well, advanced packaging, which Zen 6 is supposed to bring, should at least help with power consumption in various load scenarios, including the idle state. Consumption in Connected Standby (aka Modern Standby) may still be an issue, though.
Relevant to the latter, see #3,895 and #3,919, though I have no idea whether or not this already alluded to Zen 6, and somebody dismissed these claims as mere guesswork anyway. :-)
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,328
3,146
96
So the GPU isn't going to be the only thing that's bandwidth-starved, huh?
no one cares about the dark Si blobs.
They're for marketing.
Well, advanced packaging, which Zen 6 is supposed to bring, should at least help with power consumption in various load scenarios, including the idle state. Consumption in Connected Standby (aka Modern Standby) may still be an issue, though.
really now what he implied.
When will APUs get 3D V-Cache?
never.
But that changed with Bionics, right?
well yeah A10 was an utterly schizophrenic design.
 

dhruvdh

Junior Member
Apr 2, 2024
10
13
36
Cache bandwidth is mostly irrelevant for client inference. Caching helps when you can batch requests, but in a client inference setup you mostly just can't usefully do that. A single inference job needs to touch gigabytes of parameters, so the actual usable bandwidth is simply whatever system RAM bandwidth is left over after all other users.
I was referring to the theoretical peak of weights being streamed in and out of the NPU, given the 7 interface-memory tile pairs. I am confused about what you are trying to say here.
 

StefanR5R

Elite Member
Dec 10, 2016
5,534
7,872
136
really now what he implied.
Did you mean "not" rather than "now"?
Were further physical design options for f_max implied, then? ¹

________
¹) I admit that for years now I have completely ignored the state of play regarding f_max. Task energy and other practical aspects matter far too much to me for f_max to fascinate me.
 

poke01

Senior member
Mar 8, 2022
750
744
106

adroc_thurston

Platinum Member
Jul 2, 2023
2,328
3,146
96

Sigh, AMD Hawk Point sucks for tablets. This tablet also has a fan and has heating problems despite having enough heatpipes. I've heard this song and dance before, but Zen 5 really needs to excel in tablets/thin-and-lights.

That idle average and battery life are pathetic.
That's an OEM issue (and AMD no longer makes tablet parts anyway).
 

Ghostsonplanets

Senior member
Mar 1, 2024
352
554
96
Phawx showed a OneXplayer with an 8840 idling at ~4-5 W. That's still high, but dramatically lower than the result NBC got.

(I still think an 8C + 12 CU SoC is too big for handheld power targets. Something like KRK with 444, or even a binned KRK with 243, should do much better in the 15-20 W range.)
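Idle runtime is simply battery capacity divided by average idle draw, so a watt or two of idle power makes a big difference. A minimal sketch using the ~4-5 W figure above, with a hypothetical 60 Wh battery (the thread doesn't give the device's actual capacity).

```python
# Idle runtime = battery capacity / average idle draw.
# ASSUMPTION: a hypothetical 60 Wh battery; the real capacity
# of the device isn't given in the thread.

BATTERY_WH = 60.0  # hypothetical capacity

def idle_hours(battery_wh: float, idle_watts: float) -> float:
    """Hours of idle runtime at a constant average draw."""
    return battery_wh / idle_watts

for watts in (4.0, 5.0):
    print(f"{watts:.0f} W idle -> {idle_hours(BATTERY_WH, watts):.1f} h")
```

Going from 5 W to 4 W idle stretches runtime by 25% on the same battery, which is why idle draw dominates battery-life comparisons for handhelds and tablets.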
 