Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 163

StefanR5R

Elite Member
Dec 10, 2016
5,633
8,107
136
Re #4,028, #4,044:

Ansys Fluent throughput very much depends on memory bandwidth and level 3 cache (https://www.amd.com/en/server-docs/ansys-fluent-performance-amd-epyc-7003-series-processors). Hence, SMT isn't useful. However, on Linux, you get almost the same performance if you
a) leave SMT on but configure the application to limit its computing thread count to half of the number of logical CPUs,
b) switch SMT off in the BIOS and let the application use all of the logical CPUs,
c) do it like in a) but additionally bind the computing threads to dedicated logical CPUs such that none of them share a physical core.
I suspect that a) might be way behind b) and c) on Windows, but it definitely is not on Linux. — In other words, SMT=off is a bit of a primitive hack for applications like this.
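For anyone who wants to try variant c) on Linux, here is a minimal sketch of how the "one logical CPU per physical core" set can be found. It assumes the usual sysfs topology files are present; the helper names (`parse_cpu_list`, `one_cpu_per_core`, `read_siblings`) are made up for this post and have nothing to do with Fluent itself. It restricts the whole process to that CPU set; real per-thread pinning is normally left to the application or the MPI launcher.

```python
import os
from collections import defaultdict
from typing import Dict, List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a sysfs CPU list such as '0,64' or '0-1' into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings: Dict[int, str]) -> List[int]:
    """Group logical CPUs by their SMT sibling set and keep the lowest one of each group."""
    groups = defaultdict(list)
    for cpu, text in siblings.items():
        groups[tuple(parse_cpu_list(text))].append(cpu)
    return sorted(min(members) for members in groups.values())


def read_siblings() -> Dict[int, str]:
    """Read thread_siblings_list for every logical CPU this process may run on (Linux only)."""
    result = {}
    for cpu in sorted(os.sched_getaffinity(0)):
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as f:
            result[cpu] = f.read()
    return result


if __name__ == "__main__":
    cpus = one_cpu_per_core(read_siblings())
    # Restrict this process (and threads it starts afterwards) to one logical CPU per core.
    os.sched_setaffinity(0, cpus)
    print(f"{len(cpus)} physical cores, bound to logical CPUs {cpus}")
```

With SMT on and the thread count set to the number of physical cores, this gives the scheduler no chance to put two computing threads on the same core.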

Bergamo isn't quite intended for CFD, obviously. The F (cache and frequency optimized) and X (3D cache) SKUs of Milan, Genoa, and (to get back on topic) Turin are or will be better suited, much more so if you pay per-CPU license fees. Even if you have a license for 256 CPUs, I suspect that really large simulations compute faster (at the expense of correspondingly more energy per task) on a small InfiniBand cluster of nodes with said F or X SKUs, with higher MPI latency but also with much more aggregate cache and more memory channels. Actually, I wonder if a single 2P Genoa-X wouldn't already handily outperform the 2P Bergamo, given the same memory throughput, same power budget, and same MPI latency, but more processor cache (though lower compute density) on Genoa-X.

And getting back to Zen 5 and Turin: Seems as if it might get some interesting updates for streaming FP math, but what about classic FP math? I'm rather out of touch with CFD by now; last time I dealt with it, vector arithmetic wasn't a thing for CFD (pressure solvers).
(edited for clarity)
 

Tigerick

Senior member
Apr 1, 2022
683
565
106
I wonder if he would want to wake up if he realizes he will get single-digit percentage gains. [15% IPC gain + supposed 5% clock regression = paltry 9% perf gain.]
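The bracketed figure is just the product of the two factors. A quick sketch to check it, with the 15% and 5% taken from the quote as rumored inputs and `net_gain` being a name made up here:

```python
def net_gain(ipc_gain: float, clock_change: float) -> float:
    """Combine a relative IPC change and a relative clock change into one performance change."""
    return (1 + ipc_gain) * (1 + clock_change) - 1


if __name__ == "__main__":
    # 15% more IPC at 5% lower clocks, both numbers as rumored in the quote above.
    print(f"net performance change: {net_gain(0.15, -0.05):.1%}")
```

The gains multiply rather than add, which is why the result lands a bit under 10%.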

Besides that, core uarch update seems really interesting. Biggest update since Zen 1 for sure.
I would have liked to see more updates on the SoC architecture, but it looks like another long wait.

16M SLC on 8C Strix Zen 5c with 8WGP RDNA3+ would have been great for a lot of these Windows handhelds otherwise.

Strix seems to be getting a huge bump on AIE tiles and +33% CUs.

View attachment 86599
Curious whether they will put SLC/MALL all around for Zen 6. I have seen several MALL prefetch patents.
Hmm, now we are dealing with 2 different dies of Strix Point; instead of cutting core counts, AMD decided to halve the L3 cache of the Zen 5c cores. Interesting. At least we have a clearer picture of the mobile APU lineup in 2024:

2024 | TDP | Node | Die Size | P-core | E-core | Total L3 Cache | RDNA3+ | ALU | AIE | Memory BW
Ryzen 5 U-series | ? | N4P | ? | ? x Zen5 | 8 x Zen5c, 8 MB | ? MB | ? | ? | ? | 128-bit 8533
Ryzen 7 U-series | 28-35W+ | N4P | 225 mm² | 4 x Zen5, 16 MB | 8 x Zen5c, 16 MB | 32 MB | 8 WGP | 1024 | 64 | 128-bit 8533
Ryzen 7 HS-series | 45W+ | N4P | 225 mm² | 4 x Zen5, 16 MB | 8 x Zen5c, 16 MB | 32 MB | 8 WGP | 1024 | 64 | 128-bit 8533
Ryzen 9 HS-series | ? | N4P x2 + N3E | ? | 16 x Zen5, 64 MB | NA | 64 MB | 20 WGP | 2560 | ? | 256-bit 8533
Ryzen 9 HX-series | 55W+ | N4P x2 + N6 ? | ? | 16 x Zen5, 64 MB | NA | 64 MB | ? | ? | NA | 128-bit
 

eek2121

Platinum Member
Aug 2, 2005
2,989
4,135
136
I may have missed this, but were there any assumptions on whether Zen 6 will require a new socket?
Just curious of what actual AM5 lifespan is.
Zen 6 will likely still use AM5.
I'm afraid there will be no Zen5 in January. Only a teaser.

OTOH, a few months ago Zen5 DT completion was already planned for Oct-Nov, mass production could start in that timeframe, and since Zen2 there has been a gap of about 4 months between completion and release, so you can expect the actual release in 1H2024, or even as early as Mar-Apr.

Mass production could even be happening right now as I type this message.
Oh I didn’t mean to imply as much. “Announce” was what I was referring to, though you are right we could just get a teaser. Usually parts follow the announcement after a period of weeks or months, so March - May are probably good bets. I have not actually seen any solid leaks for that timeline, however. The only leak I have seen that was reliable indicated late 3rd quarter.
Also doesn't disprove that there are instances where a change can both increase IPC and reduce power consumption, which is very obviously true. Practically anything that reduces the need to move up the memory hierarchy may do that, or anything that reduces communication distances/hops, and of course not all work is created equal and it's possible to do more or less work to achieve a result. A more general approach to finding a result can both perform worse and consume more energy than a more specific approach. It's of course possible to do more work with fewer active transistors and vice versa. And then there are pipelines, branch prediction (which is huge, mispredictions are extremely expensive), OoO etc.

It's not a claim any engineer would make.
Very rarely will that ever be the case. When it is, the issue is usually either a failure to optimize the first iteration or the introduction of new power management features (or both)

Absent those two things, increasing IPC means increasing transistors, which means increasing power consumption.

Oh, and regardless of what you think of @adroc_thurston , note that I AM an engineer. I used to build some hardware products for a living, but these days it is all software, (though I did build a 6502 system on a breadboard recently, and I do have a product I am working on outside of work that is hardware, but not chip level stuff). I am also a tech veteran, having been building PCs since the late 80s. I also have an A+ cert and many other certs, used to work in IT (I do mostly web development these days since it pays very well) and have quite a few industry contacts, just very few that could (or are willing to) give any inside info.

What AMD did with Zen 3 is actually pretty unheard of in the industry. However…
Zen 3 vs. Zen 2
  • 19% IPC increase
  • same node class, but improved
  • slightly higher clocks
  • bigger die
  • ISO power
  • right in our face

The biggest problem in this thread isn't this discussion point though, but rather whether folks around here are going to accept the rude verdicts of a poster as gospel or demand the minimum of proof and decorum.
Zen 3 is a perf/watt champ, but those charts show exactly what we are referring to. IPC is up, but so is power. That is why the big upgrades often happen after node shrinks. Shrinks drop power/die area, increasing the budget for more transistors, which allow for IPC upgrades.
To be honest, I'm sort of baffled by how people here treat mlid like he's the antichrist. He definitely has good info, and people who are into tech news watch him religiously. Right after a new video is posted, a member links it here almost instantly. I used to think people hate-watch him, but my opinion has been swayed.
  1. He makes stuff up all the time.
  2. He has deleted videos where he made stuff up and got it wrong.
  3. He also spins his failures to make it look like products were cancelled, etc.
  4. He is not a tech person, so often he doesn’t understand what he does see.
  5. He will do/say anything to get views, because he makes money, and that is the only reason he still exists.
Most valid leaks are found in other locations. I’ve yet to hear/see a single one that came originally from one of these YouTube “leakers”.

There are some folks on this forum that know A LOT about Intel/AMD’s release plans, and I am sure they die a bit inside every time he gets quoted.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,388
1,661
136
From a platform perspective are we expecting new 770/750 chipsets?

The motherboard vendors probably want a new model name. If they do, I can't see AMD not supporting their partners with one.

The reason is that Zen4 can currently support much higher memory clocks than it could at launch. Memory clock speed is a selling point printed in large type on MB packaging, and most current models only have 6400+, which was what it was possible to test when the motherboards were released. So MB vendors want to refresh their lineups, and when they do that, they usually want to have a new chipset name.

It's entirely possible that the chipset in question is literally PROM21, just rebranded.
 
Reactions: Tlh97 and Ajay

HurleyBird

Platinum Member
Apr 22, 2003
2,697
1,293
136
Very rarely will that ever be the case. When it is, the issue is usually either a failure to optimize the first iteration or the introduction of new power management features (or both)

Absent those two things, increasing IPC means increasing transistors, which means increasing power consumption.

Rarely isn't the same as never. Especially when a company focuses on agility, many things can be left on the table. Zen4c is a poster child for that. And as an engineer, you know there is such a thing as a free lunch--they just happen to be extremely rare.

But I don't think your categorization is correct.

Anything to do with cache and memory can increase IPC while reducing power consumption. That's a category you can pretty much always do something with.

And anything to do with layout can increase IPC (timings) while reducing power consumption (distances, voltages). That's a category that is never 100% optimal in this day of billions of transistors.

For the former, the most significant example I can think of is Maxwell memory compression: Huge increase to IPC with a massive decrease in power.

For the latter, as an extreme example, take an Epyc processor (or even desktop Zen) and make it monolithic (or stacked).

Zen 3 is a perf/watt champ, but those charts show exactly what we are referring to. IPC is up, but so is power.

It shows a mix, mostly because Zen3 runs at a higher frequency but has better voltage scaling. Taken as a whole, running clock-for-clock, volt-for-volt, Zen3 probably runs a bit hotter, but that's taking into account the aggregate changes. Unified L3 likely does both increase IPC and reduce power consumption, and there are most likely smaller optimizations that have some positive effect on both, documented or not. If you count changes that less directly improve power consumption by facilitating lower voltage, there are probably others of that kind also.
 

Ajay

Lifer
Jan 8, 2001
15,783
7,995
136
The motherboard vendors probably want a new model name. If they do, I can't see AMD not supporting their partners with one.

The reason is that Zen4 can currently support much higher memory clocks than it could at launch. Memory clock speed is a selling point printed in large type on MB packaging, and most current models only have 6400+, which was what it was possible to test when the motherboards were released. So MB vendors want to refresh their lineups, and when they do that, they usually want to have a new chipset name.

It's entirely possible that the chipset in question is literally PROM21, just rebranded.
I agree. It would almost be idiotic for AMD not to have a new chipset for Zen5. Maybe they tweak it, maybe they don't, but how are motherboard manufacturers going to name their new mobos without a new chipset?
ROG CROSSHAIR X670 E++ MAXIMUM OVDRIVE EDITION???
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,388
1,661
136
Zen5 doesn't have any new I/O what's the point of a new chipset?

If they actually had a new chipset, it could be significantly better. The links from the CPU are all PCIE 5.0, so a better chipset could provide twice the throughput.

But what Ajay and I are proposing is that the reason is just literally marketing. New x770 motherboards will seem fancier than last-gen x670 ones. Even if it's the same chip. There are reasons why vendors do rebrands.
 
Reactions: Tlh97 and Ajay

Abwx

Lifer
Apr 2, 2011
11,103
3,780
136
It shows a mix, mostly because Zen3 runs at a higher frequency but has better voltage scaling. Taken as a whole, running clock-for-clock, volt-for-volt, Zen3 probably runs a bit hotter, but that's taking into account the aggregate changes.

3% higher frequency in MT than Zen 2, 15% more throughput, power lower by 9W, and temperature 20°C lower at 64°C; the numbers are for the 5950X vs the 3950X. As I pointed out, it is likely that Zen 3 uses an enhanced N7.

 

inf64

Diamond Member
Mar 11, 2011
3,713
4,088
136
I was reading this AT article on SMT scaling for Zen 3, and this quote piqued my interest:

"But, if a core design benefits from SMT, then perhaps the core hasn’t been designed optimally for a single thread of performance in the first place. If enabling SMT gives a user exact double performance and perfect scaling across the board, as if there were two cores, then perhaps there is a direct issue with how the core is designed, from execution units to buffers to cache hierarchy. It has been known for users to complain that they only get a 5-10% gain in performance with SMT enabled, stating it doesn't work properly - this could just be because the core is designed better for ST. Similarly, stating that a +70% performance gain means that SMT is working well could be more of a signal to an unbalanced core design that wastes power.

This is the dichotomy of Simultaneous Multi-Threading. If it works well, then a user gets extra performance. But if it works too well, perhaps this is indicative of a core not suited to a particular workload. The answer to the question ‘Is SMT a good thing?’ is more complicated than it appears at first glance."


Now looking forward to Zen 5, if the slide from mlid is for nT server workload and we assume AMD is sandbagging a bit (let's assume it's +20% higher MT performance at ISO clocks, or in other words 20% higher MT IPC), then below could be Zen 5's performance versus Zen 3 and Zen 4:
     | ST    | SMT multiplier | MT
Zen3 | 1.000 | 1.250          | 1.250
Zen4 | 1.110 | 1.285          | 1.426
Zen5 | 1.476 | 1.150          | 1.697

So 33% higher ST IPC vs Zen 4 , core that is designed for ST supremacy, but as a consequence of that, scales much less with SMT (almost a half of the % that Zen 4 gets from SMT : ~28.5% vs 15%). I guess that the total MT throughput performance (at ISO core count and clocks) will greatly depend on what all core turbo speeds the Turin parts can achieve at the given TDP. If TDP is +25%, I think it's possible that we might see similar all core Turbo clocks as we had in Genoa case. That would translate to :

Turin classic vs Genoa classic: 128/96 x 1.2 = 1.6 or 60% more MT performance ; 0.95 x 1.33 = 1.26 or 26% higher ST performance (assuming 5% ST clock deficit vs Genoa)
Turin dense vs Bergamo (both with SMT) : 192 / 128 x 1.2 = 1.8 or 80% more MT performance ; 1 x 1.33 = 1.33 or 33% higher ST performance (assuming no ST clock deficit vs Bergamo)
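The arithmetic behind the table and the two Turin lines can be reproduced with a few multiplications. This is only a sketch of the speculation above: every input (the +33% ST uplift, the SMT multipliers, the +20% per-core MT gain, the core counts and the 5% clock deficit) is an assumption from this post, and the function names are made up for illustration.

```python
def mt_score(st: float, smt_multiplier: float) -> float:
    """Per-core MT score: single-thread score times the SMT scaling factor."""
    return st * smt_multiplier


def socket_mt_gain(new_cores: int, old_cores: int, per_core_gain: float) -> float:
    """Socket-level MT ratio at equal clocks: core-count ratio times per-core MT ratio."""
    return new_cores / old_cores * per_core_gain


def st_gain(clock_ratio: float, ipc_ratio: float) -> float:
    """Single-thread ratio: clock ratio times IPC ratio."""
    return clock_ratio * ipc_ratio


if __name__ == "__main__":
    zen4_st = 1.110
    zen5_st = zen4_st * 1.33  # assumed +33% ST IPC over Zen 4
    print(f"Zen4 MT: {mt_score(zen4_st, 1.285):.3f}")
    print(f"Zen5 ST: {zen5_st:.3f}  MT: {mt_score(zen5_st, 1.150):.3f}")
    print(f"Turin classic vs Genoa MT: {socket_mt_gain(128, 96, 1.2):.2f}x")
    print(f"Turin classic vs Genoa ST: {st_gain(0.95, 1.33):.2f}x")
    print(f"Turin dense vs Bergamo MT: {socket_mt_gain(192, 128, 1.2):.2f}x")
```

Note that the Zen 5 MT entry in the table follows from ST times the 1.15 multiplier, which works out to roughly +19% over Zen 4 rather than exactly +20%.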
 

Geddagod

Golden Member
Dec 28, 2021
1,165
1,049
106
I was reading this AT article on SMT scaling for Zen 3, and this quote piqued my interest:

"But, if a core design benefits from SMT, then perhaps the core hasn’t been designed optimally for a single thread of performance in the first place. If enabling SMT gives a user exact double performance and perfect scaling across the board, as if there were two cores, then perhaps there is a direct issue with how the core is designed, from execution units to buffers to cache hierarchy. It has been known for users to complain that they only get a 5-10% gain in performance with SMT enabled, stating it doesn't work properly - this could just be because the core is designed better for ST. Similarly, stating that a +70% performance gain means that SMT is working well could be more of a signal to an unbalanced core design that wastes power.

This is the dichotomy of Simultaneous Multi-Threading. If it works well, then a user gets extra performance. But if it works too well, perhaps this is indicative of a core not suited to a particular workload. The answer to the question ‘Is SMT a good thing?’ is more complicated than it appears at first glance."


Now looking forward to Zen 5, if the slide from mlid is for nT server workload and we assume AMD is sandbagging a bit (let's assume it's +20% higher MT performance at ISO clocks, or in other words 20% higher MT IPC), then below could be Zen 5's performance versus Zen 3 and Zen 4:
     | ST    | SMT multiplier | MT
Zen3 | 1.000 | 1.250          | 1.250
Zen4 | 1.110 | 1.285          | 1.426
Zen5 | 1.476 | 1.150          | 1.697

So 33% higher ST IPC vs Zen 4 , core that is designed for ST supremacy, but as a consequence of that, scales much less with SMT (almost a half of the % that Zen 4 gets from SMT : ~28.5% vs 15%). I guess that the total MT throughput performance (at ISO core count and clocks) will greatly depend on what all core turbo speeds the Turin parts can achieve at the given TDP. If TDP is +25%, I think it's possible that we might see similar all core Turbo clocks as we had in Genoa case. That would translate to :

Turin classic vs Genoa classic: 128/96 x 1.2 = 1.6 or 60% more MT performance ; 0.95 x 1.33 = 1.26 or 26% higher ST performance (assuming 5% ST clock deficit vs Genoa)
Turin dense vs Bergamo (both with SMT) : 192 / 128 x 1.2 = 1.8 or 80% more MT performance ; 1 x 1.33 = 1.33 or 33% higher ST performance (assuming no ST clock deficit vs Bergamo)
Essentially the same math as I did here yesterday lol
 
Reactions: Tlh97 and inf64

Joe NYC

Platinum Member
Jun 26, 2021
2,157
2,738
106
If they actually had a new chipset, it could be significantly better. The links from the CPU are all PCIE 5.0, so a better chipset could provide twice the throughput.

But what Ajay and I are proposing is that the reason is just literally marketing. New x770 motherboards will seem fancier than last-gen x670 ones. Even if it's the same chip. There are reasons why vendors do rebrands.

PCIe links to chiplet are Gen 5. I think 4.

All the other links coming directly from the CPU can be routed by Mobo makers.

I think the only problem is that there does not appear to be an easy way to split 8x PCIe Gen 5 into 16x Gen 4. Because if that's all the GPU needs, the mobo makers could gain the other 8x Gen 5 lanes.
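To put numbers on why x8 Gen 5 and x16 Gen 4 are interchangeable in bandwidth terms, here is a small sketch. It uses the per-lane signaling rates from the PCIe specs (16 GT/s for Gen 4, 32 GT/s for Gen 5, both with 128b/130b encoding) and gives raw one-direction bandwidth before packet overhead; `link_gbytes_per_s` is just a name picked for this example.

```python
# Per-lane signaling rate in GT/s for each PCIe generation that uses 128b/130b encoding.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130


def link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes / 8


if __name__ == "__main__":
    print(f"Gen 5 x8 : {link_gbytes_per_s(5, 8):.1f} GB/s")
    print(f"Gen 4 x16: {link_gbytes_per_s(4, 16):.1f} GB/s")
    print(f"Gen 5 vs Gen 4, same lane count: {link_gbytes_per_s(5, 4) / link_gbytes_per_s(4, 4):.0f}x")
```

The same doubling per lane is what the "twice the throughput" remark about a Gen 5 chipset link comes down to.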
 

Ajay

Lifer
Jan 8, 2001
15,783
7,995
136
Zen5 doesn't have any new I/O what's the point of a new chipset?
As @Tuna-Fish pointed out, this must be done, at a minimum, for marketing purposes. At least in the DIY space anyway. As he also pointed out, there are still technical reasons to do so, even if the I/O configuration on the CPU itself hasn't changed. AMD may opt to not spend another dime on the chipset - their choice obviously, but I'd be shocked if they didn't rename it.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,157
2,738
106
As @Tuna-Fish pointed out, this must be done, at a minimum, for marketing purposes. At least in the DIY space anyway. As he also pointed out, there are still technical reasons to do so, even if the I/O configuration on the CPU itself hasn't changed. AMD may opt to not spend another dime on the chipset - their choice obviously, but I'd be shocked if they didn't rename it.
There are always tradeoffs, and time to market for Zen 5 is one of those tradeoffs.

If Zen 5 indeed launches in ~Q1 2024 into well-established eco system vs. 2-3 quarter delays, overpriced mobos with buggy BIOSes, I would definitely take faster time to market.
 

Abwx

Lifer
Apr 2, 2011
11,103
3,780
136

Same 7nm technology doesn't imply that it's the same 7nm process; N7 and N7P are both based on the same 7nm process.

If we look at the 5950X vs 3950X the former has 20% better perf/watt at isoclocks, this is a hint that these are not the same iteration of 7nm.

TSMC’s N7P uses the same design rules as the company’s N7, but features front-end-of-line (FEOL) and middle-end-of-line (MOL) optimizations that make it possible to either boost performance by 7% at the same power or lower power consumption by 10% at the same clocks.
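For scale, the quoted N7P figures can be turned into perf/watt ratios and set against the 20% number above. This is only a sketch: whether Zen 3 uses N7P at all is the open question here, the 20% is my figure from the 5950X vs 3950X comparison, and `perf_per_watt_ratio` is a name made up for the example.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Perf/watt of the new part relative to the old one."""
    return perf_ratio / power_ratio


# The two N7P operating points from the quote above.
n7p_same_power = perf_per_watt_ratio(1.07, 1.00)   # +7% performance at the same power
n7p_same_clocks = perf_per_watt_ratio(1.00, 0.90)  # -10% power at the same clocks

# Perf/watt gain claimed for the 5950X vs the 3950X at iso clocks.
claimed = 1.20

# What would be left for architecture and everything else if N7P were in use.
residual = claimed / n7p_same_clocks

if __name__ == "__main__":
    print(f"N7P at same power : {n7p_same_power:.3f}x perf/watt")
    print(f"N7P at same clocks: {n7p_same_clocks:.3f}x perf/watt")
    print(f"left over from {claimed:.2f}x: {residual:.3f}x")
```

So even under the N7P assumption, the process step alone would cover only part of the 20%.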

 
Reactions: Executor_

Det0x

Golden Member
Sep 11, 2014
1,043
3,034
136
Zen5 doesn't have any new I/O what's the point of a new chipset?
It's only a few AM5 motherboards that can run 8000MT/s stable in 2:1 mode, pretty much only two 1DPC boards from Asus and Gigabyte atm...

A new generation of motherboards could improve the memory layout/traces for the 2DPC boards.
At the moment they cap out at ~7400-7600MT/s if you want stability on every reboot. (They kinda behave like tuning memory on Raptor Lake: stability changes with each reboot.)

Below i did on my 1DPC GENE
 

Geddagod

Golden Member
Dec 28, 2021
1,165
1,049
106
Same 7nm technology doesn't imply that it's the same 7nm process,
it does
N7 and N7P are both based on the same 7nm process.
Then AMD would say "different TSMC 7nm finfet technology as Zen 2"
If we look at the 5950X vs 3950X the former has 20% better perf/watt at isoclocks, this is a hint that these are not the same iteration of 7nm.
Better arch
 

H433x0n

Senior member
Mar 15, 2023
933
1,032
96
I was reading this AT article on SMT scaling for Zen 3, and this quote piqued my interest:

"But, if a core design benefits from SMT, then perhaps the core hasn’t been designed optimally for a single thread of performance in the first place. If enabling SMT gives a user exact double performance and perfect scaling across the board, as if there were two cores, then perhaps there is a direct issue with how the core is designed, from execution units to buffers to cache hierarchy. It has been known for users to complain that they only get a 5-10% gain in performance with SMT enabled, stating it doesn't work properly - this could just be because the core is designed better for ST. Similarly, stating that a +70% performance gain means that SMT is working well could be more of a signal to an unbalanced core design that wastes power.

This is the dichotomy of Simultaneous Multi-Threading. If it works well, then a user gets extra performance. But if it works too well, perhaps this is indicative of a core not suited to a particular workload. The answer to the question ‘Is SMT a good thing?’ is more complicated than it appears at first glance."


Now looking forward to Zen 5, if the slide from mlid is for nT server workload and we assume AMD is sandbagging a bit (let's assume it's +20% higher MT performance at ISO clocks, or in other words 20% higher MT IPC), then below could be Zen 5's performance versus Zen 3 and Zen 4:
     | ST    | SMT multiplier | MT
Zen3 | 1.000 | 1.250          | 1.250
Zen4 | 1.110 | 1.285          | 1.426
Zen5 | 1.476 | 1.150          | 1.697

So 33% higher ST IPC vs Zen 4 , core that is designed for ST supremacy, but as a consequence of that, scales much less with SMT (almost a half of the % that Zen 4 gets from SMT : ~28.5% vs 15%). I guess that the total MT throughput performance (at ISO core count and clocks) will greatly depend on what all core turbo speeds the Turin parts can achieve at the given TDP. If TDP is +25%, I think it's possible that we might see similar all core Turbo clocks as we had in Genoa case. That would translate to :

Turin classic vs Genoa classic: 128/96 x 1.2 = 1.6 or 60% more MT performance ; 0.95 x 1.33 = 1.26 or 26% higher ST performance (assuming 5% ST clock deficit vs Genoa)
Turin dense vs Bergamo (both with SMT) : 192 / 128 x 1.2 = 1.8 or 80% more MT performance ; 1 x 1.33 = 1.33 or 33% higher ST performance (assuming no ST clock deficit vs Bergamo)
This all makes sense but makes me wonder.. why keep SMT then? If there’s a meager 15% uplift from SMT, is it really worth the trade offs at that point? Getting rid of it reduces a lot of security and validation hurdles.
 
Reactions: Tlh97 and Saylick

Glo.

Diamond Member
Apr 25, 2015
5,743
4,633
136
This all makes sense but makes me wonder.. why keep SMT then? If there’s a meager 15% uplift from SMT, is it really worth the trade offs at that point? Getting rid of it reduces a lot of security and validation hurdles.
Expect that both Intel and AMD will ditch SMT from their mainstream CPUs.
 