Question Zen 6 Speculation Thread


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,074
16,002
136
Most of them don't WANT to understand

If they had the desire to understand, they wouldn't have kept buying Raptor Lake.
And if you look at the PrimeGrid task logs, you will see why AVX-512 is so important to that type of calculation, and it is a server function. I am not sure whether Intel took it out of their server CPUs, but AMD is ahead in the Phoronix server benchmarks, one reason they are outselling Intel. AMD kept it in their desktop parts too; it's too bad companies don't make wider use of it, given the 40% or more performance advantage.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,150
4,596
136
It's more like 2.5x the price of N7

I have seen a lot of quotes of $30,000 per N2 wafer (which surely will come down, but perhaps not right away). And N7, especially for a very simple SRAM die with fewer metal layers (fewer than a complex logic die like Zen 6), could even be under $7,500.
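For a rough sense of scale, here is that price gap per unit of silicon area (both wafer prices are the speculative figures above, not confirmed numbers):

```python
import math

# Wafer prices quoted in the discussion above (speculative, not confirmed)
N2_WAFER_USD = 30_000
N7_WAFER_USD = 7_500

# Gross area of a 300 mm wafer, ignoring edge exclusion and scribe lines
wafer_area_mm2 = math.pi * (300 / 2) ** 2   # ~70,700 mm^2

for node, price in (("N2", N2_WAFER_USD), ("N7", N7_WAFER_USD)):
    print(f"{node}: ~${price / wafer_area_mm2:.3f} per mm^2 of silicon")
```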
 

Timorous

Golden Member
Oct 27, 2008
1,957
3,821
136
~20% higher logic density than N3, which in turn is ~60% higher logic density than N5 (~50% over N4).

So a 50% logic density increase would mean a Zen 5 core shrunk to N2 is in the region of 3.3 mm² per core.

A 75 mm² CCD with 30 mm² of space used for IO and L3 cache leaves 3.75 mm² per core. Quite an increase in area for Zen 6.

If the L3 cache increased 50% to 48 MB, it would be in the region of 20 mm² at 15% increased density. That would mean that, with 15 mm² of IO, the remaining area across 12 cores would make each Zen 6 core the same size as an N2 Zen 5 core.

So I could see either 48 MB or 36 MB of L3, depending on what they do with the core and how it actually shrinks vs. these back-of-an-envelope calcs.
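Here is the same arithmetic as a quick sketch (the 75 mm² CCD size, the area splits, and the density factors are all the assumptions from the post above, not known specs):

```python
# Reproducing the back-of-envelope math above. All inputs are the post's
# assumptions (75 mm^2 CCD, area splits, density factors), not known specs.
CCD_AREA = 75.0   # mm^2, assumed N2 CCD size
CORES = 12

# Case 1: 30 mm^2 reserved for IO + L3 cache
per_core_1 = (CCD_AREA - 30.0) / CORES            # -> 3.75 mm^2 per core

# Case 2: L3 grows 50% to 48 MB (~20 mm^2 at +15% density) plus 15 mm^2 of IO
per_core_2 = (CCD_AREA - 20.0 - 15.0) / CORES     # -> ~3.33 mm^2 per core

# A Zen 5 core at ~50% higher logic density; the ~5 mm^2 starting point is an
# assumption implied by the post's ~3.3 mm^2 result, not a measured figure.
zen5_core_n2 = 5.0 / 1.5                          # -> ~3.3 mm^2

print(round(per_core_1, 2), round(per_core_2, 2), round(zen5_core_n2, 2))
```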
 

OneEng2

Senior member
Sep 19, 2022
661
900
106
I think with a 12-core CCD it makes sense to have 3 different bin targets. Right now they only have full (8-core) and 6-core bins. With a 12-core CCD they could have full (12-core), 10-core, and 8-core.

So the lineup could be

10950X - 24 cores
10900X - 20 cores
10700X - 12 cores
10600X - 8 cores

or

10950X - 24 cores
10900X - 16 cores
10700X - 12 cores
10600X - 10 cores
I think the former is the more likely. Since each CCD is produced separately, what you are talking about is the yield of a single CCD x 2, not of a full 24-core single CCD.

So you can yield a 12, 10, 8, 6, 4, or 2 core CCD (I think). It will come down to marketing in each segment, since they can easily create any core combo they want.
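A toy Monte Carlo of that per-CCD binning idea, using a simple random-defect model (the defect density, area figures, and bin rules are purely illustrative assumptions, not foundry data):

```python
import math
import random

# Purely illustrative assumptions: defect density, area split, bin rules
D0 = 0.1            # defects per cm^2
CORE_AREA = 3.3     # mm^2 per core (see the area estimates earlier in the thread)
SHARED_AREA = 35.0  # mm^2 of L3 + other shared logic per CCD
CORES = 12

p_core_bad = 1 - math.exp(-D0 * CORE_AREA / 100)     # chance a given core is defective
p_shared_bad = 1 - math.exp(-D0 * SHARED_AREA / 100)

random.seed(0)
bins = {"12-core": 0, "10-core": 0, "8-core": 0, "scrap": 0}
N = 100_000
for _ in range(N):
    if random.random() < p_shared_bad:   # defect in shared L3/IO region -> scrap
        bins["scrap"] += 1
        continue
    good = sum(random.random() >= p_core_bad for _ in range(CORES))
    if good == 12:
        bins["12-core"] += 1
    elif good >= 10:
        bins["10-core"] += 1
    elif good >= 8:
        bins["8-core"] += 1
    else:
        bins["scrap"] += 1

for name, count in bins.items():
    print(f"{name}: {count / N:.1%}")
```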

As for the discussion on N2, I suspect only the server parts (just like today) will spring for the more expensive node... but maybe not. N2 is GAA, which requires a completely different library from N3. It is a lot more work to do the Zen 6 design on BOTH libraries. It appears that Intel's 18A parts will be sufficient to drive AMD to N2, but I could be mistaken. It is possible that Zen 6 on N3P outperforms Intel's 18A lineup. AMD will use the least expensive process they need in order to maintain competitiveness... so much depends on how well 18A does, IMO.
 
Reactions: Win2012R2

StefanR5R

Elite Member
Dec 10, 2016
6,535
10,244
136
Interesting comment here: https://www.overclock.net/posts/29425812/
A more direct example would be that Zen 5's front-end is statically partitioned, and that makes its SMT implementation a lot like CMT in some ways. The actual execution isn't split, and, when required, one front-end can take advantage of a core that is much wider than it would usually be able to serve, because of giant modern op-caches.
Could this mean that Zen 6's SMT will be even more like CMT since Mike Clark hinted that they laid the foundation for the future in Zen 5?
The terms "statically partitioned" and "dynamically partitioned" have recently been used when a resource is shared between threads in SMT mode. ("Statically partitioned" is when there is a single predetermined place for the partition wall, "dynamic" is when the partition wall slides here or there based on demand. Either way, the partition wall is taken down when there is only one thread.) The CMT setup in contrast, had resources which were _not_ shared between threads. (The partition wall was not torn down in singe-thread mode.)
It's the opposite of CMT. Two front-ends sharing one back-end, rather than two back-ends sharing one front-end.
Fortunately for Zen 5, it is not the entire frontend which is not a shared resource, it is only the decoders which [edit: seemingly] are exclusive per thread (and thus, left ≥half unused in single thread mode). And it can be guessed from what Mike Clark said mistakenly¹ about the decoder halves, that the initial goal was to share them rather than make them thread-exclusive. Let's see if they make both halves available to single-thread mode in Zen 6.

________
¹) Or perhaps not in fact mistakenly, but imprecisely to a misleading extent. His "the answer is yes" was to the question "Can a single thread take advantage of all of the front-end resources and can it take advantage of both decode clusters and the entirety of the dual ported OP cache?". We have been focusing on the decoders part of the question, for which the answer turned out to be "no, or practically no" according to measurements and the software optimization guide. To the entire question, the correct answer would have been "yes and no".
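To make the distinction above concrete (statically partitioned, dynamically partitioned, and CMT-style thread-exclusive resources), here is a toy model of how many decode slots a single thread can use; the slot counts are arbitrary, not Zen 5's actual decoder widths:

```python
def max_slots_per_thread(policy: str, total_slots: int, threads: int) -> int:
    """Toy upper bound on decode slots a single thread can use per cycle."""
    if policy == "static":       # wall at a fixed spot, torn down in 1T mode
        return total_slots if threads == 1 else total_slots // 2
    if policy == "dynamic":      # wall slides with demand, torn down in 1T mode
        return total_slots       # a busy thread may claim (nearly) everything
    if policy == "exclusive":    # CMT-like: each thread owns its half, always,
        return total_slots // 2  # so >= half sits unused in 1T mode
    raise ValueError(policy)

for policy in ("static", "dynamic", "exclusive"):
    one_t = max_slots_per_thread(policy, 8, 1)
    two_t = max_slots_per_thread(policy, 8, 2)
    print(f"{policy:9s}  1 thread: {one_t} slots   2 threads: {two_t} slots each")
```

The "exclusive" case corresponds to what the measurements and the software optimization guide suggest for Zen 5's decoders.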
 
Last edited:
Reactions: Win2012R2 and Tlh97

StefanR5R

Elite Member
Dec 10, 2016
6,535
10,244
136
What I don't get is why Intel people do not understand that an AMD C core is the same as a regular core except that it runs slower (and maybe has less cache?).
Even better, the C cores don't generally run slower; they merely have a lower peak¹ clock frequency.
________
¹) And the important bit which some keep forgetting or ignoring is that (a) actual clock speed in well parallelized workloads is less than this limit, due to power and thermals, hence this limit does not matter in these scenarios, (b) in less well parallelized workloads, the OS should favor² the classic cores with their higher f_peak, hence this limit does not matter in these scenarios either. Unless you don't have any classic cores, as in Bergamo and Turin-dense.
²) The notion of favored cores was introduced by Intel's Turbo Boost Max 3.0 with Broadwell-E, and adopted by popular OS kernels soon after.

Edit:
Back to Zen 6 speculation, a pure classic-cores CCD as we currently have it appears more suited to desktop with its looser constraints WRT power and cooling, while a mixed classic and dense CCD seems like a good idea for mobile client. But if they try to stick with as few designs as possible, I have no idea which way they will end up...
 
Last edited:
Reactions: Tlh97

LightningZ71

Platinum Member
Mar 10, 2017
2,274
2,824
136
If they want to keep the core count bloat going that's pervasive in the industry, but also want to keep the die sizes reasonable, then they may have to do a mixed CCX. 4+8 certainly gives them a good balance, and with the addition of FinFlex, which they haven't had available in any product (save Turin-dense, which likely doesn't particularly need it), they can better differentiate the cores by using the HP fin arrangement on the P cores and the more dense, lower-leakage arrangement on the C cores.
 
Reactions: Tlh97 and GTracing

GTracing

Senior member
Aug 6, 2021
478
1,112
106
If they want to keep the core count bloat going that's pervasive in the industry, but also want to keep the die sizes reasonable, then they may have to do a mixed CCX. 4+8 certainly gives them a good balance, and with the addition of FinFlex, which they haven't had available in any product (save Turin-dense, which likely doesn't particularly need it), they can better differentiate the cores by using the HP fin arrangement on the P cores and the more dense, lower-leakage arrangement on the C cores.
That would make sense for laptops, where they're power limited anyway. On desktop I think it would improve perf/area (and ergo perf/dollar). But it might cause issues for games, so I really hope they don't do it.
 
Reactions: Tlh97 and Joe NYC

Gideon

Platinum Member
Nov 27, 2007
2,012
4,989
136
That would make sense for laptops, where they're power limited anyway. On desktop I think it would improve perf/area (and ergo perf/dollar). But it might cause issues for games, so I really hope they don't do it.
Yeah MT performance uplift (vs 8 full cores) would not be all that great in that case.

The Ryzen 8500G has 2x Zen 4 and 4x Zen 4c cores, and the max clocks are 5 GHz and 3.7 GHz respectively. That's a 25% drop. The 7950X and 9950X have all-core turbos north of 5.2 GHz even in AVX workloads.

That 25% drop means the 4+8 core CCD would perform more like a 10-core CCD in MT workloads than a 12-core one. Given Intel plans to offer 8+16 x2, I don't think it's good enough for the highest-end part. But they might still go for it, as 4+8 should also take roughly the same area as 8 full cores.
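The arithmetic behind that "more like a 10-core CCD" estimate, as a quick sketch (this assumes MT throughput scales linearly with clock and ignores IPC and cache differences):

```python
# Full-core-equivalent MT throughput of a hypothetical 4+8 CCD, assuming
# throughput scales linearly with clock (a simplification).
full_clock = 5.2                    # GHz, all-core turbo cited for the 7950X/9950X
c_clock = full_clock * (1 - 0.25)   # ~25% lower, per the 8500G Zen 4 vs Zen 4c gap

equivalents = 4 + 8 * (c_clock / full_clock)
print(f"4+8 CCD ~= {equivalents:.0f} full-core equivalents")   # -> ~10
```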
 
Reactions: Tlh97 and GTracing

StefanR5R

Elite Member
Dec 10, 2016
6,535
10,244
136
Remember that actual computing throughput is less than proportional to core clock though. (If the workload is very well cached, it is nearer to being proportional, if not, it may be way less than proportional.)

The higher the core clock, the lower the portion of cycles in which actual computation is going on.
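A minimal way to picture that: memory latency is fixed in nanoseconds, so stall time does not shrink as the clock rises, and the speedup from extra frequency is sub-linear. The cycle counts and stall times below are made-up illustrative numbers:

```python
# Toy model: runtime = compute cycles / frequency + fixed memory stall time (ns).
def clock_speedup(f_base_ghz, f_new_ghz, compute_cycles, mem_stall_ns):
    t_base = compute_cycles / f_base_ghz + mem_stall_ns
    t_new = compute_cycles / f_new_ghz + mem_stall_ns
    return t_base / t_new

# A 20% clock bump, well-cached vs. memory-bound (illustrative numbers only)
print("cache-friendly:", round(clock_speedup(4.0, 4.8, 1000, mem_stall_ns=10), 3))   # ~1.19x
print("memory-bound:  ", round(clock_speedup(4.0, 4.8, 1000, mem_stall_ns=200), 3))  # ~1.10x
```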
 
Reactions: Tlh97 and Joe NYC

GTracing

Senior member
Aug 6, 2021
478
1,112
106
Remember that actual computing throughput is less than proportional to core clock though. (If the workload is very well cached, it is nearer to being proportional, if not, it may be way less than proportional.)

The higher the core clock, the lower the portion of cycles in which actual computation is going on.
There are a lot of points to be made for and against dense cores. I don't think that one is particularly big.

For example, the dense cores in Strix Point have one quarter as much L3 cache. That would make a bigger difference to performance imo.
 

basix

Member
Oct 4, 2024
144
297
96
Latest data from TSMC has N2 SRAM density at +50% over N5

Edit: Sorry it's actually +50% over N7, +18% over N5

Actual SRAM array implementations should scale more than that. SRAM requires some control logic in between, so you should get some additional gains from logic scaling. Not much, but also not nothing.

A second lever is FinFlex. You could use optimized 1-2 / 2-2 / 2-3 cell widths, which might be more compact than their N4/N5 counterparts (regular 2-2 / 3-3 cells), but still reach the performance goals.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,274
2,824
136
Comparing the 12-core CCD to Strix Point seems kind of rough in a lot of ways: no FinFlex, separate CCXs, odd and tiny cache sizing, and a highly power-limited scenario.

I dare say that the C cores on the conjectural Zen 6 CCX should be capable of hitting 4 GHz or more on N3P or N2. They should be easier on power as well.

As for comparing 2x 4+8 to Intel's 2x 8+16 without HT, remember that once you get beyond a few cores pushing max turbo, you start to get power/thermal limited. Having 4 P cores per CCX should give them solid ST performance while not draining the power and thermal budget. What matters more for good MT performance is having a lot of cores. My bigger worry is that in MT, AMD just runs out of cores. While HT is netting them about 25% extra throughput per core, are their C cores better than Intel's E cores by enough in MT to overcome a 2:1 core count deficit?
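A rough break-even sketch for that last question (it treats every core as identical within each vendor's lineup and ignores power limits and clocks entirely, so it is only a placeholder):

```python
# How much faster per core (in MT) would AMD's cores need to be, with SMT's
# ~25% uplift, to match twice as many Intel cores without HT? Placeholder model.
amd_cores = 2 * (4 + 8)       # 2x 4+8 CCD = 24 cores, SMT on
intel_cores = 2 * (8 + 16)    # 2x 8+16   = 48 cores, no HT
smt_uplift = 1.25             # ~25% extra MT throughput per core, as cited above

break_even = intel_cores / (amd_cores * smt_uplift)
print(f"AMD would need ~{break_even:.2f}x per-core MT throughput to break even")  # ~1.6x
```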
 

Joe NYC

Diamond Member
Jun 26, 2021
3,150
4,596
136
MLID has some tidbits and renders of Medusa Point and Medusa Ridge.

Says IOD still TSMC (there was some speculation on Twitter today it may be Samsung), reiterates 12 core CCDs, all eligible for stacking, but stacked die not a requirement. Says AMD "likes" N2 for CCDs. Also said that IOD may be more advanced node.

The way he described the connection between chiplets is confusing, but since he mentioned "wafer", it is likely RDL.

 
Reactions: lightmanek

Kepler_L2

Senior member
Sep 6, 2020
852
3,481
136
MLID has some tidbits and renders of Medusa Point and Medusa Ridge.

Says IOD still TSMC (there was some speculation on Twitter today it may be Samsung), reiterates 12 core CCDs, all eligible for stacking, but stacked die not a requirement. Says AMD "likes" N2 for CCDs. Also said that IOD may be more advanced node.

The way he described the connection between chiplets is confusing, but since he mentioned "wafer", it is likely RDL.

The L3 cache width is almost identical to the current CCD's, so this is definitely 48 MB.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,150
4,596
136
The L3 cache width is almost identical to the current CCD's, so this is definitely 48 MB.

48 MB would improve the competitiveness of the base models quite a bit.

But I don't know how realistic it is to expect +50% cores, +50% L3, and to fit it all in less than the size of the Zen 5 CCD. Removal of the SerDes will make some difference, but I am not sure if it is enough...
 
Reactions: Tlh97 and OneEng2

Kepler_L2

Senior member
Sep 6, 2020
852
3,481
136
48 MB would improve the competitiveness of the base models quite a bit.

But I don't know how realistic it is to expect +50% cores, +50% L3, and to fit it all in less than the size of the Zen 5 CCD. Removal of the SerDes will make some difference, but I am not sure if it is enough...
It's N2 and the CCD is almost 20% bigger.
 