Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Timorous

Golden Member
Oct 27, 2008
1,668
2,935
136
I remember it for Pentium D, but not Core 2. Link?

It was both. Pentium D was glued together, and for the cores to talk to each other they had to go through the FSB to the northbridge and back.

The Q6600 was the same: a pair of dual-core dies that had to communicate via the FSB and northbridge.

Now AMD does it, except the northbridge is on the package so the FSB can be much faster, but fundamentally it is the same basic building blocks.
 

ToTTenTranz

Member
Feb 4, 2021
61
102
76
Honestly Kraken feels more like an heir to the 5600/5700g than a PHX heir.
It's not about gaming on an APU, it's about sort of kind of being able to game on an APU. The lowest cost still gaming capable ish system, if you're ok with regularly dropping to low settings, 720p and/or 30 fps.

From the looks of it, Sonoma is the actual low-cost successor of Phoenix2 and older Mendocino.
Depending on its GPU's clock/power curves, Kraken might actually be the better alternative for 15-25W handheld gaming PCs.

Strix Point v Kraken having only 36% higher TS score despite 2x bigger IGP?
Hardly true. It's just a speculation.
Same memory bandwidth on both sides, with Strix Point getting 50% more CPU cores that consume memory bandwidth.

I was hoping for Strix Point to either have a bit of Infinity Cache or at least LPDDR5T. Unless something's changed, AMD's designs don't let the iGPU access the CPU L3 either, so it looks like Strix Point is going to have a massive memory bandwidth bottleneck.
 

Mahboi

Senior member
Apr 4, 2024
458
770
91
So this is the SF4P chip. 4 core cheapo part sounds great on a cheap node.
Hopefully we see great prices for it. (knock on wood)

Edit: I have to say I don't get the point of a 16-core Strix Halo.
I get that it's a Halo part, but in no console, portable or not, is 16 cores even close to useful. It's all locked at 8 and everyone that tried the 8/12/16 cores Zen 4 X3Ds came and said the 8 core was just the best deal overall. AMD even dropped trying to sell 2 CCDs with the X3D cause it was just a waste of cache.
I really don't get it, this could've been an easy 8 core part and still be fully a Halo product.
 
Last edited:
Reactions: Joe NYC

leoneazzurro

Senior member
Jul 26, 2016
945
1,503
136
That is exactly because it is a "halo" part in both CPU and GPU departments, probably for a light workstation with enough power to face almost all tasks - I admit it is a little unbalanced towards the CPU side, but it was probably easier than beefing up the GPU more, and 8 cores were deemed not enough for a high premium part.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,369
2,863
136
Same memory bandwidth on both sides, with Strix Point getting 50% more CPU cores that consume memory bandwidth.
Same memory bandwidth? Only controller width is the same, but that doesn't mean they will use the same memory.
BTW what does It have to do with TimeSpy and IGP?

I was hoping for Strix Point to either have a bit of Infinity Cache or at least LPDDR5T. Unless something's changed, AMD's designs don't let the iGPU access the CPU L3 either, so it looks like Strix Point is going to have a massive memory bandwidth bottleneck.
Do you really think AMD would put a 16CU IGP inside just to be massively bandwidth starved?
If It was as much starved as you make It out to be, then they should have kept only 12CU.
 
Reactions: Tlh97

ToTTenTranz

Member
Feb 4, 2021
61
102
76
Phoenix2 and Mendo aren't the same price bracket, Mendocino is a cheaper part. Sonoma Valley is a direct replacement for the same socket as Mendocino, so it should be considerably cheaper than PHX2.
I'm aware that Mendocino and Phoenix 2 are in different price brackets, but Sonoma is probably in the middle between the two and Kraken is well above Phoenix 2. Depending on Zen 5's IPC upgrades, Sonoma might actually be close to Phoenix 2 in CPU performance.



Edit: I have to say I dont get the point of a 16 cores Strix Halo.
I get that it's a Halo part, but in no console, portable or not, is 16 cores even close to useful. It's all locked at 8 and everyone that tried the 8/12/16 cores Zen 4 X3Ds came and said the 8 core was just the best deal overall. AMD even dropped trying to sell 2 CCDs with the X3D cause it was just a waste of cache.
I really don't get it, this could've been an easy 8 core part and still be fully a Halo product.
That's certainly valid for gaming tasks, but it looks like Strix Halo is a direct competitor to Apple's M3 Max.
That's why it has 16 CPU cores like the M3 Max and a probably comparable iGPU doing 10-15 TFLOPs (20-30 theoretical from double pumped ALUs). The only thing AMD isn't keeping up with is the massive 512bit LPDDR5 vs Halo's 256bit, but that difference is probably watered down by AMD going with 8533Mbps LPDDR5X (Apple's using 6400Mbps), twice the L3 cache and on the GPU side there's also the 32MB Infinity Cache.
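
For anyone wanting to check where a "10-15 TFLOPs (20-30 theoretical)" range could come from, here is a minimal sketch of the usual peak-FP32 arithmetic for an RDNA3-style GPU. The 40 CU count is the rumored figure mentioned elsewhere in this thread, and the clock speeds are purely hypothetical placeholders, not leaked specs.

```python
# Rough peak FP32 throughput for an RDNA3-style iGPU.
# Assumptions (not confirmed specs): 40 CUs, clocks between 2.0 and 3.0 GHz.
LANES_PER_CU = 64   # FP32 lanes per compute unit (2x SIMD32)
FLOPS_PER_FMA = 2   # one fused multiply-add counts as two FLOPs


def peak_tflops(cus: int, clock_ghz: float, dual_issue: bool = False) -> float:
    """Peak FP32 TFLOPs; dual_issue doubles the theoretical figure."""
    issue_width = 2 if dual_issue else 1
    gflops = cus * LANES_PER_CU * FLOPS_PER_FMA * issue_width * clock_ghz
    return gflops / 1000.0


if __name__ == "__main__":
    for clock in (2.0, 2.5, 3.0):  # hypothetical clocks in GHz
        base = peak_tflops(40, clock)
        doubled = peak_tflops(40, clock, dual_issue=True)
        print(f"{clock:.1f} GHz: {base:.1f} TFLOPs ({doubled:.1f} with dual issue)")
```

The dual-issue number is only reachable when the compiler can pair instructions, which is why the lower figure is the more realistic one to compare against other GPUs.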

So AMD is making a big SoC for medium to large-sized productivity laptops. That means it needs to do well in all kinds of number crunching like gaming, video editing, art design, engineering / simulation, etc.


That is exactly because it is a "halo" part in both CPU and GPU departments, probably for a light workstation with enough power to face almost all tasks - I admit it is a little unbalanced towards the CPU side but probably it was easier than beef up the GPU more, and 8 cores were deemed not enough for a high premium part.
Which also makes me wonder what AMD's expectations are for this chip's ASP and margins. They could put this in competitors to ~$1200 gaming laptops with Core i7 / Ryzen 7 and RTX4060-class dGPUs and it would probably be cheaper to make. However, the Halo part of the name tells me they're only putting this into $2000 products or more, for brand value.


Same memory bandwidth? Only controller width is the same, but that doesn't mean they will use the same memory.
BTW what does It have to do with TimeSpy and IGP?
They're all probably using the same LPDDR5X PHYs and controllers from Synopsys, and most likely they're using the same (fastest as possible) memory for these benchmarks (assuming they're legit).
And of course memory bandwidth plays a very important role in GPU performance.

Do you really think AMD would put a 16CU IGP inside just to be massively bandwidth starved?
If It was as much starved as you make It out to be, then they should have kept only 12CU.
Phoenix with 12 CUs is massively bandwidth starved with 6400MT/s LP5 (ROG Ally, for example), so yes.
16CUs in Strix Point will be +33% compute units compared to Phoenix. If Strix Point uses 8533MT/s LP5X then it has +33% bandwidth over 6400MT/s LP5, so bandwidth per CU stays the same.
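
The ratio argument above can be sketched in a few lines. This assumes both chips use the same memory bus width (so it cancels out), and the `per_cu_uplift` parameter is a hypothetical knob for any extra per-CU throughput from clocks or architecture, not a known figure.

```python
def bandwidth_per_compute_ratio(cus_old: int, cus_new: int,
                                mts_old: int, mts_new: int,
                                per_cu_uplift: float = 1.0) -> float:
    """New-vs-old bandwidth available per unit of GPU compute.

    Assumes an identical bus width on both chips. per_cu_uplift is a
    hypothetical multiplier for higher clocks or a faster architecture.
    """
    bandwidth_ratio = mts_new / mts_old
    compute_ratio = (cus_new / cus_old) * per_cu_uplift
    return bandwidth_ratio / compute_ratio


if __name__ == "__main__":
    # 12 CUs at 6400 MT/s versus 16 CUs at 8533 MT/s, no per-CU uplift.
    same = bandwidth_per_compute_ratio(12, 16, 6400, 8533)
    # Same comparison with an illustrative 15% per-CU uplift.
    faster = bandwidth_per_compute_ratio(12, 16, 6400, 8533, per_cu_uplift=1.15)
    print(f"no per-CU uplift:  {same:.3f}")
    print(f"15% per-CU uplift: {faster:.3f}")
```

A result near 1.0 means the new chip is about as bandwidth-constrained per CU as the old one; anything below 1.0 means it is more constrained.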
 
Last edited:

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,276
106
16CUs in Strix Point will be +33% compute units compared to Phoenix. If Strix Point uses 8533MT/s LP5X then it has +33% more bandwidth than 6400MT/s LP5
But Strix Point's theoretical GPU performance uplift will be higher than 33%, considering that it's also being upgraded from RDNA3 to RDNA3.5, and probably higher clocks for the CUs as well.

That and the fact that Strix Point has 12 cores means the CPU MT performance is >50% greater than Phoenix.

The RAM bandwidth has to feed all of that. I don't think it can. Memory bandwidth starvation...
That's certainly valid for gaming tasks, but it looks like Strix Halo is a direct competitor to Apple's M3 Max.
That's why it has 16 CPU cores like the M3 Max and a probably comparable iGPU doing 10-15 TFLOPs (20-30 theoretical from double pumped ALUs).
I don't think Strix Halo iGPU performance is going to rival M3 Max. If you look at the benchmarks...

It would certainly beat M3 Pro's iGPU though...
The only thing AMD isn't keeping up with is the massive 512bit LPDDR5 vs Halo's 256bit, but that difference is probably watered down by AMD going with 8533Mbps LPDDR5X (Apple's using 6400Mbps), twice the L3 cache and on the GPU side there's also the 32MB Infinity Cache.
Apple still has a memory bandwidth advantage. Look at this monstrosity:

Look at the massive LPDDR memory controllers and SLC slices, flanking the GPU like rockets.

M3 Max vs Strix Halo
512 bit vs 256 bit
LPDDR5-6400 vs LPDDR5X-8533
400 GB/s vs 273 GB/s
48 MB SLC vs 32 MB Infinity Cache
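
As a quick sanity check on the GB/s figures, peak DRAM bandwidth is just bus width in bytes times transfer rate. A minimal sketch, taking the 512-bit LPDDR5-6400 (M3 Max) and 256-bit LPDDR5X-8533 (Strix Halo) configurations quoted in this thread as inputs; the Strix Halo numbers are rumors, not confirmed specs.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000.0


if __name__ == "__main__":
    m3_max = peak_bandwidth_gbs(512, 6400)      # full-die M3 Max
    strix_halo = peak_bandwidth_gbs(256, 8533)  # rumored configuration
    print(f"M3 Max:     {m3_max:.1f} GB/s")
    print(f"Strix Halo: {strix_halo:.1f} GB/s")
    print(f"Ratio:      {m3_max / strix_halo:.2f}x")
```

Apple rounds its figure down to 400 GB/s in marketing material; the formula gives the theoretical ceiling, and sustained bandwidth is lower on both.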

Btw if Gurman is correct, M4 Max is coming at the end of this year, so Strix Halo will have to compete with that.
 
Reactions: Tlh97

poke01

Senior member
Mar 8, 2022
822
828
106
But Strix Point's theoretical GPU performance uplift will be higher than 33%, considering that it's also being upgraded from RDNA3 to RDNA3.5, and probably higher clocks for the CUs as well.

That and the fact that Strix Point has 12 cores means the CPU MT performance is >50% greater than Phoenix.

The RAM bandwidth has to feed all of that. I don't think it can. Memory bandwidth starvation...

I don't think Strix Halo iGPU performance is going to rival M3 Max. If you look at the benchmarks...
View attachment 96995
It would certainly beat M3 Pro's iGPU though...

Apple still has a memory bandwidth advantage. Look at this monstrosity:
View attachment 96996
Look at the massive LPDDR memory controllers and SLC slices, flanking the GPU like rockets.

M3 Max vs Strix Halo
512 bit vs 256 bit
LPDDR5-6400 vs LPDDR5X-8533
400 GB/s vs 273 GB/s
48 MB SLC vs 32 MB Infinity Cache

Btw if Gurman is correct, M4 Max is coming at the end of this year, so Strix Halo will have to compete with that.
Strix Halo isn't an M3 Max. It's more like an M3 Pro. M3 Max is very expensive to fab. That's why it's in $3500 MacBooks for the full die for now. AMD would be using Strix Halo for gaming, not for content creation or Blender etc. Different use cases imo.
 

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,276
106
Strix Halo isn't a M3 Max. Its more like a M3 Pro. M3 Max is very expensive to fab. Thats why its in $3500 Macbooks for the full die for now. AMD would be using strix halo for gaming, not for content creation or Blender etc. Different use cases imo.
Indeed. I forgot to write the conclusion that Strix Halo is an M Pro-class part, and not an M Max-class part.
 

Glo.

Diamond Member
Apr 25, 2015
5,726
4,606
136
Edit: I have to say I dont get the point of a 16 cores Strix Halo.
I get that it's a Halo part, but in no console, portable or not, is 16 cores even close to useful. It's all locked at 8 and everyone that tried the 8/12/16 cores Zen 4 X3Ds came and said the 8 core was just the best deal overall. AMD even dropped trying to sell 2 CCDs with the X3D cause it was just a waste of cache.
I really don't get it, this could've been an easy 8 core part and still be fully a Halo product.
256 bit bus, 40 CU GPU, large NPU, large RAM pool available for the GPU, and 16 powerful CPU cores.

What else can it be but an M2 Max competitor for AI and LLMs?
 

ToTTenTranz

Member
Feb 4, 2021
61
102
76
But Strix Point's theoretical GPU performance uplift will be higher than 33%, considering that it's also being upgraded from RDNA3 to RDNA3.5, and probably higher clocks for the CUs as well.

That and the fact that Strix Point has 12 cores means the CPU MT performance is >50% greater than Phoenix.

The RAM bandwidth has to feed all of that. I don't think it can. Memory bandwidth starvation...

I agree. If we look at the bandwidth-per-compute of an actual gaming-oriented SoC like Van Gogh, Series S, PS5, etc. then Phoenix is clearly bandwidth-starved. And by extension, so will be Strix Point.



I don't think Strix Halo iGPU performance is going to rival M3 Max. If you look at the benchmarks...

It would certainly beat M3 Pro's iGPU though...

Apple SoCs score abnormally high on multiplatform synthetic benchmarks, which then doesn't translate into actual gaming performance. Just look at the comparison charts in notebookcheck:

In Wild Life Extreme and GFXBench the M3 Max is indeed as fast as an RTX 4080 Laptop (AD104, similar to desktop 4070 Ti), but in gaming benchmarks it fails to surpass the RTX 4060, even in games compiled for macOS like Shadow of the Tomb Raider. In many games it's even struggling against the RTX 4050.

I think the M3 GPU architecture (which I assume is still pretty much a fork of PowerVR Rogue) suffers from similar issues of low-ish effective bandwidth and high-ish cache latency that Intel Alchemist, AMD GCN and pre-Maxwell Nvidia also have. It's still a GPU made mostly for high compute throughput when there's very high occupancy, i.e. it can only tap into most of its bandwidth when many execution units are being used in parallel.
For compute tasks that's ok (AMD's CDNA is still much closer to GCN than RDNA) and for synthetic benchmarks this can be hand-tuned, but for unpredictable loads like videogames this isn't so good.


Regardless, the TLDR here is that despite using a smaller iGPU, Strix Halo should have much better gaming performance than the M3 Max. Especially if its rumored promises of RTX4070 Ti laptop performance are true. But for "pure" GPGPU tasks the M3 Max may have an edge.



Look at the massive LPDDR memory controllers and SLC slices, flanking the GPU like rockets.

M3 Max vs Strix Halo
512 bit vs 256 bit
LPDDR5-6400 vs LPDDR5X-8533
400 GB/s vs 273 GB/s
48 MB SLC vs 32 MB Infinity Cache
The big factor you're missing here is that the M3 uses a "System Level Cache" that serves the CPU and GPU, but has no L3 for the CPU.

For "last level cache" it's actually:
CPU: 48MB SLC (shared with GPU) vs. 64MB L3 exclusive
GPU: 48MB SLC (shared with CPU) vs. 40MB edit: 32MB Infinity Cache exclusive

Though this doesn't tell the full story either, because there's e.g. more L2 cache in Apple's CPU cores than in Zen 5.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,369
2,863
136
They're all probably using the same LPDDR5X PHYs and controllers from Synopsys, and most likely they're using the same (fastest as possible) memory for these benchmarks (assuming they're legit).
And of course memory bandwidth plays a very important role in GPU performance.
Who said It's legit and not just a speculation?
You brought up Strix Point having 50% more cores as some proof of why the difference in TS is so small.
I can tell you that CPU utilization in TimeSpy during the graphic test is less than 30% even in the worst case. I just tested It today using my 7840HS.
Btw, I really have to wonder why you think Kraken will use 8533MT/s memory. Not like It's impossible, but why It is needed for 8CU IGP?
Phoenix with 12 CUs is massively bandwidth starved with 6400MT/s LP5 (ROG Ally, for example), so yes.
Give me graphs or numbers.
Maybe you should also question why 8CU RDNA3.5 has basically the same performance as 12CU RDNA3.
 
Last edited:

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,276
106
The big factor you're missing here is that the M3 uses a "System Level Cache" that serves the CPU and GPU, but has no L3 for the CPU.

For "last level cache" it's actually:
CPU: 48MB SLC (shared with GPU) vs. 64MB L3 exclusive
GPU: 48MB SLC (shared with CPU) vs. 40MB Infinity Cache exclusive

Though this doesn't tell the full story either, because there's e.g. more L2 cache in apple's CPU cores than in Zen5.
Ah, but you are misunderstanding Apple's CPU Design philosophy. They have no need for an L3, by virtue of the fact that their L1/L2 is so huge.

Also, Strix Halo has 32 MB Infinity Cache, not 40 MB?

I think the M3 GPU architecture (which I assume is still pretty much a fork of PowerVR Rogue)
Apple's GPU architecture traces its lineage back to Imagination GPUs, not PowerVR.


While we don’t have much insight into Apple’s latest GPU designs, it’s understood that these are custom microarchitecture designs are based upon Imagination’s GPU architecture IP, which makes it unique in the GPU world as we don’t see any other such GPU architecture license in the market. Features such as tile-based deferred rendering and PVRTC are Imagination patented technologies which Apple currently publicly exposes as features of its GPUs, so it’s evident that the current designs still very much use the British company’s IP. The GPU’s block structure is also very similar to that of Imaginations, further pointing out to a close relationship between the designs.
 
Reactions: Mopetar

TESKATLIPOKA

Platinum Member
May 1, 2020
2,369
2,863
136
I think you misunderstood what I meant by "Huge L2"

Radeon 780M has 2 MB L2.

What if Strix Point has something like 6 or 8 MB? That's a substantial 3x to 4x increase, while not blowing the die size by much (?).
I didn't misunderstand anything.
If L2 is less dense, then even with only 6-8MB It wouldn't be much smaller than 16MB of Infinity cache.
Not like 16MB IC is big. It's 10mm2 or so I think.
 

Timmah!

Golden Member
Jul 24, 2010
1,430
660
136
If Strix Halo presumably makes 45k in CB23 MT, as per this latest leak, can we expect the desktop 9950X to do more than that, because of higher TDP and whatnot? If yes, how much do you reckon?
What does 55W-125W mean, which one is it then? Or is the higher figure including the GPU part?
 

ToTTenTranz

Member
Feb 4, 2021
61
102
76
You brought up Strix Point having 50% more cores as some proof why the difference in TS is so small.

More and/or higher performing CPU cores = a more demanding client for the memory controller that shares its resources with the GPU, in a Unified Memory Architecture (UMA).

In UMA systems, memory requests from the CPU actually reduce the GPU's effective bandwidth disproportionately. Probably because serving memory requests from one adds latency to serving memory requests from the other. In simpler terms, if the memory controller is too busy with the CPU, the GPU will stall waiting for it.
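
To make "disproportionately" concrete, here is a toy model of the claim, not a measurement: every GB/s the CPU pulls costs the GPU more than one GB/s. The function name and the `contention_factor` value are hypothetical placeholders chosen only to illustrate the shape of the effect.

```python
def gpu_effective_bandwidth(peak_gbs: float, cpu_gbs: float,
                            contention_factor: float = 2.0) -> float:
    """Toy linear model of shared-memory contention in a UMA system.

    contention_factor > 1 means each GB/s of CPU traffic removes more
    than 1 GB/s from the GPU. The default of 2.0 is a made-up value.
    """
    lost = cpu_gbs * contention_factor
    return max(peak_gbs - lost, 0.0)


if __name__ == "__main__":
    peak = 100.0  # illustrative peak bandwidth in GB/s
    for cpu_traffic in (0.0, 5.0, 10.0, 20.0):
        gpu = gpu_effective_bandwidth(peak, cpu_traffic)
        print(f"CPU uses {cpu_traffic:4.1f} GB/s -> GPU gets about {gpu:5.1f} GB/s")
```

A real memory controller is not linear like this, so treat the model as a way to reason about the direction of the effect rather than its size.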

Sony even had a slide about this in their internal guidelines for PS4 developers:





This is also one of the most probable reasons why the PS5 Pro isn't doing any substantial upgrade to the CPU over the PS5. Sony probably wants all the extra bandwidth from the ~28.5% faster memory to go to the new GPU alone.


I am not interested in your words, give me graphs or numbers.
Graphs or numbers of what? 16 CUs being 33.(3)% more CUs than 12 CUs? 😐


Maybe you should question why 8CU RDNA3.5 have basically the same performance as 12CU RDNA3.
For the same reason the cut-down 760M with 8CU RDNA3 gets similar performance to the full 780M in so many games.
Because the full GPU in Phoenix is bottlenecked by memory bandwidth.


Also, Strix Halo has 32 MB Infinity Cache, not 40 MB?
Yup, my bad.


Apple's GPU architecture traces its lineage back to Imagination GPUs, not PowerVR.
PowerVR is the graphics division of Imagination
 

StefanR5R

Elite Member
Dec 10, 2016
5,580
7,972
136
[...] fundamentally it is the same basic building blocks.
They may be made up of ~same building blocks. But the cores, caches and memory controllers of a) Pentium D, b) Core 2 Quad, c) Zen 1 Ryzen, d) Zen 1 Epyc, e) Zen 2/3/4/5 Ryzen and Epyc are arranged in five different topologies. What the respective competitors' marketing departments had to say about these various different solutions at the time may have been entertaining at best, but lacked analytical depth. ;-)

I have to say I dont get the point of a 16 cores Strix Halo.
I get that it's a Halo part, but in no console, portable or not, is 16 cores even close to useful. It's all locked at 8 and everyone that tried the 8/12/16 cores Zen 4 X3Ds came and said the 8 core was just the best deal overall. AMD even dropped trying to sell 2 CCDs with the X3D cause it was just a waste of cache.
I really don't get it, this could've been an easy 8 core part and still be fully a Halo product.
Actually the dual-CCX parts are even at a certain (and sometimes grave) disadvantage compared to single-CCX parts in certain applications, which includes many games but also some computational tasks — due to their lack of a unified last level cache. The additional Infinity Cache of Strix Halo (if there is one) does not look like it would have any bearing on this particular disadvantage, or does it?
 