Question Zen 6 Speculation Thread

OneEng2 · 2025-06-30T20:11:53-0400

poke01 said:
His words are more coherent than your essays 😝

Glad you get so much out of his one-word answers. Where I come from it is considered either disrespectful, or ignorant. I am giving a lot of leeway to throw in lazy as a possibility.

CouncilorIrissa · 2025-06-30T20:42:34-0400

OneEng2 said:
I think that if Zen 6 comes out before Nova Lake, it will be 6.0GHz... maybe 6.2GHz. They may have the headroom, but they will not want to use it. Better to yield higher (and cheaper) than to use headroom you didn't need to best the competition.

No, it's better to clock higher because you can charge extra $ for that last 3-4% of performance. Whether it comes out before or after is entirely irrelevant. 8 cores selling for $480 kinda tells you people are willing to pay for a bigger e-peen.

We're talking about the company that sold Zen 2 with unattainable boost clocks outside of noop-maxxing scenarios for Christ's sake. They *will* extract every last MHz.

edit: don't even get me started about Raphael.

Doug S · 2025-06-30T20:50:55-0400

The problem is too many people hear about "x% performance gain" with a new process and they simply multiply the current clock rate with that number in their head and that forms their expectation.

If AMD was porting Zen 5 to a smaller process without changing the number of cores or anything else about the product OK that might come close to being true. Problem is, that's what those percentages from TSMC & Intel are based on - they assume the SAME design, with the same number of transistors. You can use the shiny new process to grant that same design increased performance (and making the transistors x% faster doesn't raise your clock rate by x% either, but that's a whole other thing) or you can use the shiny new process to grant that same design the y% of reduced power usage. Key word being SAME design.

AMD isn't doing the same design when they move to a smaller process. Intel isn't, Apple isn't, Qualcomm isn't. No one is. They are making cores wider and more complex, adding cache, changes from major to minor, changes which make them bigger (in terms of number of transistors even if not in actual physical area) and they are providing more of them per chip. All those extra transistors consume added power which takes from that y% power benefit, leaving less for the x% performance benefit.
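
A back-of-envelope version of that argument, as a toy model only. It assumes (my assumption, not anything TSMC or Intel publish) that at a fixed clock chip power scales linearly with transistor count, and that the new node cuts per-transistor power by a fraction `node_power_saving` (the "y%" above). The 30% node figure and the growth values are hypothetical.

```python
# Toy model (an assumption, not a foundry formula): at a fixed clock, chip power
# scales linearly with transistor count, and the new node cuts per-transistor
# power by `node_power_saving` (the "y%" figure from the post).


def leftover_power_saving(node_power_saving: float, transistor_growth: float) -> float:
    """Fraction of power saved vs. the old chip at the same clock.

    A negative result means the bigger design draws MORE power than the old
    chip, even on the new node.
    """
    new_relative_power = (1.0 + transistor_growth) * (1.0 - node_power_saving)
    return 1.0 - new_relative_power


if __name__ == "__main__":
    y = 0.30  # hypothetical iso-performance power saving of the new node
    for growth in (0.0, 0.1, 0.2, 0.4, 0.5):
        saving = leftover_power_saving(y, growth)
        print(f"{growth:.0%} more transistors -> {saving:+.1%} power saving vs old chip")
```

Whatever is left over is the only budget available for clocks, which is why the marketing "x%" rarely survives a bigger core.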

What's more, when you have more cores you have to share your overall power budget among more cores. Sure in theory if you had a CPU that could draw 250W and gave 5W per core among its 50 cores it should be able to devote 250W to a single core in a single core load. But no one designs that way, the percentage of overall power a single core is able to draw has declined since the single core days when obviously it could draw 100% of the power, and will continue to decline as we add more and more cores.
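
To put rough numbers on the shrinking single-core share, here's a sketch that assumes a fixed per-core ceiling (a hypothetical 30W, standing in for whatever voltage/current-density limit a real design has) and a package budget of 5W per core as in the example above. Both numbers are illustrative, not from any real part.

```python
def single_core_share(cores: int, watts_per_core_all_core: float,
                      single_core_cap_watts: float) -> float:
    """Fraction of the package power budget one core can use in a 1T load.

    Assumes the package budget is cores * watts_per_core_all_core and that a
    single core can never exceed single_core_cap_watts (hypothetical cap).
    """
    package_watts = cores * watts_per_core_all_core
    return min(single_core_cap_watts, package_watts) / package_watts


if __name__ == "__main__":
    cap = 30.0  # hypothetical per-core ceiling in watts
    for n in (1, 8, 16, 50):
        share = single_core_share(n, 5.0, cap)
        print(f"{n:>2} cores: package {n * 5.0:.0f} W, one core can use {share:.0%}")
```

With a single core the cap never binds, so that core gets 100% of the budget; at 50 cores the same cap is only 12% of the package.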

CouncilorIrissa · 2025-06-30T20:59:23-0400

Doug S said:
The problem is too many people hear about "x% performance gain" with a new process and they simply multiply the current clock rate with that number in their head and that forms their expectation.

This is true; anyone who claims to know exactly how high it'll clock is lying. And those who actually have an idea could never say so without getting sued.

Doug S said:
If AMD was porting Zen 5 to a smaller process without changing the number of cores or anything else about the product OK that might come close to being true. Problem is, that's what those percentages from TSMC & Intel are based on - they assume the SAME design, with the same number of transistors. You can use the shiny new process to grant that same design increased performance (and making the transistors x% faster doesn't raise your clock rate by x% either, but that's a whole other thing) or you can use the shiny new process to grant that same design the y% of reduced power usage. Key word being SAME design.

Yep, case in point: Zen 4 -> Zen 5 went from N5 to N4P. On paper N4P has an 11% perf gain over N5; in reality there's no clock difference because those are different designs.

edit: to clarify, my previous post wasn't arguing against specific clock figures, more against the general idea that order of NVL/Z6 release will in any way meaningfully affect clock speeds. It won't.

adroc_thurston · 2025-06-30T21:14:33-0400

Saylick said:
Pretty complicated decode stage if it alone takes up 6 or 7 clocks.

It's the opposite case of opcache being really simple.
Zen1 was 14-19 clocks iirc.

OneEng2 said:
rather on reply with information supporting your incredibly useless one-word-replies.

You don't need more than one word for simple stuff.

OneEng2 said:
Care to explain why competition at the time of launch is irrelevant?

The irrelevant part is downbinning.
It's a trick to hit volume targets, and DIY DT is not a volume-limited market.

Doug S said:
The problem is too many people hear about "x% performance gain" with a new process and they simply multiply the current clock rate with that number in their head and that forms their expectation.

That would be true, but FinFlex/NanoFlex give chip designers a bazillion knobs to squeeze every last ounce of perf.
I'd say that's the biggest change of going to N2.

CouncilorIrissa said:
edit: don't even get me started about Raphael.

eyyy that's Mysticial being kinda a dick.

Doug S · 2025-06-30T21:19:38-0400

adroc_thurston said:
That would be true but fin/nanoflex give chip designers a bazillion knobs to squeeze every last ounce of perf.
I'd say that's the biggest change of going to N2.

Yes but the question I keep asking is how easy is that to access? How good is the tool support? How much of that is automated, and how much requires a bunch of manual tweaking - which would imply more verification time/potential issues.

I'm thinking maybe it is more of a gift that gives a little bit a year over five years as tools are improved / designers become more experienced with it rather than one big gift when you go to N3E/N2.

Thunder 57 · 2025-06-30T21:19:56-0400

Doug S said:
The problem is too many people hear about "x% performance gain" with a new process and they simply multiply the current clock rate with that number in their head and that forms their expectation.

If AMD was porting Zen 5 to a smaller process without changing the number of cores or anything else about the product OK that might come close to being true. Problem is, that's what those percentages from TSMC & Intel are based on - they assume the SAME design, with the same number of transistors. You can use the shiny new process to grant that same design increased performance (and making the transistors x% faster doesn't raise your clock rate by x% either, but that's a whole other thing) or you can use the shiny new process to grant that same design the y% of reduced power usage. Key word being SAME design.

AMD isn't doing the same design when they move to a smaller process. Intel isn't, Apple isn't, Qualcomm isn't. No one is. They are making cores wider and more complex, adding cache, changes from major to minor, changes which make them bigger (in terms of number of transistors even if not in actual physical area) and they are providing more of them per chip. All those extra transistors consume added power which takes from that y% power benefit, leaving less for the x% performance benefit.

What's more, when you have more cores you have to share your overall power budget among more cores. Sure in theory if you had a CPU that could draw 250W and gave 5W per core among its 50 cores it should be able to devote 250W to a single core in a single core load. But no one designs that way, the percentage of overall power a single core is able to draw has declined since the single core days when obviously it could draw 100% of the power, and will continue to decline as we add more and more cores.

Don't let @igor_kavinski know, he still wants his Tejas clocked to the moon! I sometimes wonder if he has some secret use case for it.

adroc_thurston · 2025-06-30T21:23:04-0400

Doug S said:
Yes but the question I keep asking is how easy is that to access

Looking at how everyone is shipping 3-2 zoomies on N3E, pretty easy.

Doug S said:
How much of that is automated, and how much requires a bunch of manual tweaking - which would imply more verification time/potential issues.

TSM very openly advertised FinFlex with standard EDA stuff in mind.

Doug S said:
I'm thinking maybe it is more of a gift that gives a little bit a year over five years as tools are improved / designers become more experienced with it rather than one big gift when you go to N3E/N2

Oh no, it's not some magic juice, just added flexibility.

igor_kavinski · 2025-06-30T23:12:17-0400

Thunder 57 said:
Don't let @igor_kavinski know, he still wants his Tejas clocked to the moon! I sometimes wonder if he has some secret use case for it.

It could be the resonant frequency that harmonizes with my neurons, giving me the clarity I desperately need on what the heck I'm really supposed to be doing on this planet. Never ignore your inner voice. If it sounds crazy, it's because it's beckoning you towards things more fantastic than you could possibly imagine.

branch_suggestion · 2025-06-30T23:27:56-0400

OneEng2 said:
Anyone taking bets on Zen 6 desktop max clocks?

6.3GHz, anything above 6.2 is a job well done.
I think more important would be 6GHz for X3D; pull that off and it is a complete bloodbath.

poke01 · 2025-07-01T00:11:37-0400

OneEng2 said:
Anyone taking bets on Zen 6 desktop max clocks?

6.5GHz for classic. I believe in NanoFlex

Edit: 6.2GHz for X3D.

adroc_thurston · 2025-07-01T00:43:21-0400

poke01 said:
Edit: 6.2GHz for X3D.

It doesn't have to clock any lower since it's gonna be inherently lower wattage due to lower Vmax on N3 and N2 both.

DavidC1 · 2025-07-01T01:42:00-0400

Saylick said:
Pretty complicated decode stage if it alone takes up 6 or 7 clocks.

Remember, this is with uop caches.

For example:
Nehalem had a 16-stage pipeline. Sandy Bridge then introduced a uop cache, which added complexity and a few stages, but only on a miss, when the data isn't found in the cache; that path is 18 cycles. When the data is in the cache (a hit), it drops to 14 cycles, 2 cycles fewer than Nehalem.
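
The trade-off only pays off if the uop cache hits often enough. A quick sketch using the cycle counts quoted above (14 on a hit, 18 on a miss, 16 for Nehalem, taken from the post and not independently checked); the hit rates are hypothetical.

```python
def expected_cycles(hit_rate: float, hit_cycles: float = 14,
                    miss_cycles: float = 18) -> float:
    """Average pipeline depth given a uop-cache hit rate (weighted mean)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def break_even_hit_rate(hit_cycles: float, miss_cycles: float,
                        baseline_cycles: float) -> float:
    """Hit rate at which the cached design matches a baseline depth.

    Solves h * hit + (1 - h) * miss = baseline for h.
    """
    return (miss_cycles - baseline_cycles) / (miss_cycles - hit_cycles)


if __name__ == "__main__":
    print("break-even vs 16 cycles:", break_even_hit_rate(14, 18, 16))
    for h in (0.5, 0.8, 0.95):  # hypothetical hit rates
        print(f"hit rate {h:.0%}: {expected_cycles(h):.2f} cycles on average")
```

With these numbers the uop-cache design comes out ahead of a flat 16-cycle pipeline as soon as more than half of fetches hit.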

branch_suggestion · 2025-07-01T02:18:43-0400

adroc_thurston said:
It doesn't have to clock any lower since it's gonna be inherently lower wattage due to lower Vmax on N3 and N2 both.

I did forget to account for the fact that Z5X3D is limited by TSV Vmax.
Hopefully the bottleneck is indeed clock stability for all Z6 parts, no more thermally or voltage limited stuff.

DrMrLordX · 2025-07-01T02:59:31-0400

igor_kavinski said:
Remember once upon a time when everyone talked about pipeline stalls or pipelines stages?

There was a lot of talk surrounding that and Dr. Ian Cutress' original version of 3DPM. It had some punishing performance optimization problems.

MS_AT · 2025-07-01T03:14:50-0400

adroc_thurston said:
It's 11-18 and 12-18.

For Zen 5 it's 12-18, commonly 15. Quoting the publicly available Software Optimization Guide for the AMD Zen5 Microarchitecture:

The branch misprediction penalty is in the range from 12 to 18 cycles, depending on the type of mispredicted branch and whether the instructions are being fed from the Op Cache. The common case penalty is 15 cycles.
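
For a feel of what 12 vs 15 vs 18 cycles means in practice, here's a first-order estimate. It assumes every mispredict stalls for the full penalty with no overlap, and the base IPC and mispredicts-per-kilo-instruction (MPKI) values are hypothetical, not measured Zen 5 numbers.

```python
def mispredict_cpi(mpki: float, penalty_cycles: float) -> float:
    """Cycles per instruction added by mispredicts (no-overlap assumption)."""
    return mpki / 1000.0 * penalty_cycles


def ipc_with_mispredicts(base_ipc: float, mpki: float, penalty_cycles: float) -> float:
    """IPC after adding mispredict stall cycles to an ideal base CPI."""
    base_cpi = 1.0 / base_ipc
    return 1.0 / (base_cpi + mispredict_cpi(mpki, penalty_cycles))


if __name__ == "__main__":
    base_ipc = 4.0  # hypothetical ideal IPC
    mpki = 5.0      # hypothetical mispredicts per 1000 instructions
    for penalty in (12, 15, 18):  # range and common case from the guide
        ipc = ipc_with_mispredicts(base_ipc, mpki, penalty)
        print(f"penalty {penalty} cycles -> IPC {ipc:.2f}")
```

Real code overlaps some of that stall with other work, so treat this as an upper bound on the cost.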
