Question Zen 6 Speculation Thread


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,074
16,002
136
Most of them don't WANT to understand

If they had the desire to understand, they wouldn't have kept buying Raptor Lake.
And if you look at the PrimeGrid task logs, you will see why AVX-512 is so important to that type of calculation, and it is a server function. I am not sure whether Intel took it out of their server CPUs, but AMD is ahead in the Phoronix server benchmarks, one reason they are outselling Intel. AMD kept it in their desktop parts too; it's too bad companies don't make wider use of it, given the 40% or more performance advantage.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,150
4,596
136
It's more like 2.5x the price of N7

I have seen a lot of quotes of $30,000 per N2 wafer (which surely will come down, but perhaps not right away). And N7, especially for a very simple SRAM die with fewer metal layers (fewer than a complex logic die like Zen 6), could even be under $7,500.
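For a rough sense of scale, here is that price gap per unit of silicon area (both wafer prices are the speculative figures above, not confirmed numbers):

```python
import math

# Wafer prices quoted in the discussion above (speculative, not confirmed)
N2_WAFER_USD = 30_000
N7_WAFER_USD = 7_500

# Gross area of a 300 mm wafer, ignoring edge exclusion and scribe lines
wafer_area_mm2 = math.pi * (300 / 2) ** 2   # ~70,700 mm^2

for node, price in (("N2", N2_WAFER_USD), ("N7", N7_WAFER_USD)):
    print(f"{node}: ~${price / wafer_area_mm2:.3f} per mm^2 of silicon")
```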
 

Timorous

Golden Member
Oct 27, 2008
1,957
3,821
136
~20% higher logic density than N3, which in turn is ~60% higher logic density than N5 (~50% over N4).

So a 50% logic density increase would mean a Zen 5 core shrunk to N2 is in the region of 3.3 mm² per core.

A 75 mm² CCD with 30 mm² of space used for IO and L3 cache leaves 3.75 mm² per core. Quite an increase in area for Zen 6.

If the L3 cache increased 50% to 48 MB, it would be in the region of 20 mm² at 15% increased density. That would mean that, with 15 mm² of IO, the remaining area across 12 cores would make each Zen 6 core the same size as an N2 Zen 5 core.

So I could see either 48 MB or 36 MB of L3, depending on what they do with the core and how it actually shrinks vs. these back-of-an-envelope calcs.
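Here is the same arithmetic as a quick sketch (the 75 mm² CCD size, the area splits, and the density factors are all the assumptions from the post above, not known specs):

```python
# Reproducing the back-of-envelope math above. All inputs are the post's
# assumptions (75 mm^2 CCD, area splits, density factors), not known specs.
CCD_AREA = 75.0   # mm^2, assumed N2 CCD size
CORES = 12

# Case 1: 30 mm^2 reserved for IO + L3 cache
per_core_1 = (CCD_AREA - 30.0) / CORES            # -> 3.75 mm^2 per core

# Case 2: L3 grows 50% to 48 MB (~20 mm^2 at +15% density) plus 15 mm^2 of IO
per_core_2 = (CCD_AREA - 20.0 - 15.0) / CORES     # -> ~3.33 mm^2 per core

# A Zen 5 core at ~50% higher logic density; the ~5 mm^2 starting point is an
# assumption implied by the post's ~3.3 mm^2 result, not a measured figure.
zen5_core_n2 = 5.0 / 1.5                          # -> ~3.3 mm^2

print(round(per_core_1, 2), round(per_core_2, 2), round(zen5_core_n2, 2))
```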
 

OneEng2

Senior member
Sep 19, 2022
661
900
106
I think with a 12-core CCD it makes sense to have 3 different bin targets. Right now they only have full (8-core) and 6-core bins. With a 12-core CCD they could have full (12-core), 10-core, and 8-core.

So the lineup could be

10950X - 24 cores
10900X - 20 cores
10700X - 12 cores
10600X - 8 cores

or

10950X - 24 cores
10900X - 16 cores
10700X - 12 cores
10600X - 10 cores
I think the former is the more likely. Since each CCD is produced separately, what you are talking about is the yield of a single CCD x 2, not of a full 24-core single CCD.

So you can yield a 12, 10, 8, 6, 4, or 2 core CCD (I think). It will come down to marketing in each segment, since they can easily create any core combo they want.
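A toy Monte Carlo of that per-CCD binning idea, using a simple random-defect model (the defect density, area figures, and bin rules are purely illustrative assumptions, not foundry data):

```python
import math
import random

# Purely illustrative assumptions: defect density, area split, bin rules
D0 = 0.1            # defects per cm^2
CORE_AREA = 3.3     # mm^2 per core (see the area estimates earlier in the thread)
SHARED_AREA = 35.0  # mm^2 of L3 + other shared logic per CCD
CORES = 12

p_core_bad = 1 - math.exp(-D0 * CORE_AREA / 100)     # chance a given core is defective
p_shared_bad = 1 - math.exp(-D0 * SHARED_AREA / 100)

random.seed(0)
bins = {"12-core": 0, "10-core": 0, "8-core": 0, "scrap": 0}
N = 100_000
for _ in range(N):
    if random.random() < p_shared_bad:   # defect in shared L3/IO region -> scrap
        bins["scrap"] += 1
        continue
    good = sum(random.random() >= p_core_bad for _ in range(CORES))
    if good == 12:
        bins["12-core"] += 1
    elif good >= 10:
        bins["10-core"] += 1
    elif good >= 8:
        bins["8-core"] += 1
    else:
        bins["scrap"] += 1

for name, count in bins.items():
    print(f"{name}: {count / N:.1%}")
```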

As for the discussion on N2, I suspect only the server parts (just like today) will spring for the more expensive node... but maybe not. N2 is GAA, which requires a completely different library from N3. It is a lot more work to do the Zen 6 design on BOTH libraries. It appears that Intel's 18A parts will be sufficient to drive AMD to N2, but I could be mistaken. It is possible that Zen 6 on N3P outperforms Intel's 18A lineup. AMD will use the least expensive process they need in order to maintain competitiveness... so much depends on how well 18A does, IMO.
 
Reactions: Win2012R2

StefanR5R

Elite Member
Dec 10, 2016
6,535
10,244
136
Interesting comment here: https://www.overclock.net/posts/29425812/
A more direct example would be that Zen 5's front-end is statically partitioned, and that makes its SMT implementation a lot like CMT in some ways. The actual execution isn't split, and, when required, one front-end can take advantage of a core that is much wider than it would usually be able to serve, because of giant modern op-caches.
Could this mean that Zen 6's SMT will be even more like CMT since Mike Clark hinted that they laid the foundation for the future in Zen 5?
The terms "statically partitioned" and "dynamically partitioned" have recently been used when a resource is shared between threads in SMT mode. ("Statically partitioned" is when there is a single predetermined place for the partition wall, "dynamic" is when the partition wall slides here or there based on demand. Either way, the partition wall is taken down when there is only one thread.) The CMT setup in contrast, had resources which were _not_ shared between threads. (The partition wall was not torn down in singe-thread mode.)
It's the opposite of CMT. Two front-ends sharing one back-end, rather than two back-ends sharing one front-end.
Fortunately for Zen 5, it is not the entire frontend which is not a shared resource, it is only the decoders which [edit: seemingly] are exclusive per thread (and thus, left ≥half unused in single thread mode). And it can be guessed from what Mike Clark said mistakenly¹ about the decoder halves, that the initial goal was to share them rather than make them thread-exclusive. Let's see if they make both halves available to single-thread mode in Zen 6.

________
¹) Or perhaps not in fact mistakenly, but imprecisely to a misleading extent. His "the answer is yes" was to the question "Can a single thread take advantage of all of the front-end resources and can it take advantage of both decode clusters and the entirety of the dual ported OP cache?". We have been focusing on the decoders part of the question, for which the answer turned out to be "no, or practically no" according to measurements and the software optimization guide. To the entire question, the correct answer would have been "yes and no".
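To make the distinction above concrete (statically partitioned, dynamically partitioned, and CMT-style thread-exclusive resources), here is a toy model of how many decode slots a single thread can use; the slot counts are arbitrary, not Zen 5's actual decoder widths:

```python
def max_slots_per_thread(policy: str, total_slots: int, threads: int) -> int:
    """Toy upper bound on decode slots a single thread can use per cycle."""
    if policy == "static":       # wall at a fixed spot, torn down in 1T mode
        return total_slots if threads == 1 else total_slots // 2
    if policy == "dynamic":      # wall slides with demand, torn down in 1T mode
        return total_slots       # a busy thread may claim (nearly) everything
    if policy == "exclusive":    # CMT-like: each thread owns its half, always,
        return total_slots // 2  # so >= half sits unused in 1T mode
    raise ValueError(policy)

for policy in ("static", "dynamic", "exclusive"):
    one_t = max_slots_per_thread(policy, 8, 1)
    two_t = max_slots_per_thread(policy, 8, 2)
    print(f"{policy:9s}  1 thread: {one_t} slots   2 threads: {two_t} slots each")
```

The "exclusive" case corresponds to what the measurements and the software optimization guide suggest for Zen 5's decoders.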
 
Last edited:
Reactions: Win2012R2 and Tlh97

StefanR5R

Elite Member
Dec 10, 2016
6,535
10,244
136
What I don't get is why Intel people do not understand that an AMD C core is the same as a regular core except that it runs slower (and maybe has less cache?).
Even better, the C cores don't generally run slower; they merely have a lower peak¹ clock frequency.
________
¹) And the important bit which some keep forgetting or ignoring is that (a) actual clock speed in well parallelized workloads is less than this limit, due to power and thermals, hence this limit does not matter in these scenarios, (b) in less well parallelized workloads, the OS should favor² the classic cores with their higher f_peak, hence this limit does not matter in these scenarios either. Unless you don't have any classic cores, as in Bergamo and Turin-dense.
²) The notion of favored cores was introduced by Intel's Turbo Boost Max 3.0 with Broadwell-E, and adopted by popular OS kernels soon after.

Edit:
Back to Zen 6 speculation, a pure classic-cores CCD as we currently have it appears more suited to desktop with its looser constraints WRT power and cooling, while a mixed classic and dense CCD seems like a good idea for mobile client. But if they try to stick with as few designs as possible, I have no idea which way they will end up...
 
Last edited:
Reactions: Tlh97

LightningZ71

Platinum Member
Mar 10, 2017
2,274
2,824
136
If they want to keep the core count bloat going that's pervasive in the industry, but also want to keep the die sizes reasonable, then they may have to do a mixed CCX. 4+8 certainly gives them a good balance, and with the addition of FinFlex, which they haven't had available in any product (save Turin-dense, which likely doesn't particularly need it), they can better differentiate the cores by using the HP fin arrangement on the P cores and the more dense, lower-leakage arrangement on the C cores.
 
Reactions: Tlh97 and GTracing

GTracing

Senior member
Aug 6, 2021
478
1,112
106
If they want to keep the core count bloat going that's pervasive in the industry, but also want to keep the die sizes reasonable, then they may have to do a mixed CCX. 4+8 certainly gives them a good balance, and with the addition of FinFlex, which they haven't had available in any product (save Turin-dense, which likely doesn't particularly need it), they can better differentiate the cores by using the HP fin arrangement on the P cores and the more dense, lower-leakage arrangement on the C cores.
That would make sense for laptops, where they're power limited anyway. On desktop I think it would improve perf/area (and ergo perf/dollar). But it might cause issues for games, so I really hope they don't do it.
 
Reactions: Tlh97 and Joe NYC

Gideon

Platinum Member
Nov 27, 2007
2,012
4,989
136
That would make sense for laptops, where they're power limited anyway. On desktop I think it would improve perf/area (and ergo perf/dollar). But it might cause issues for games, so I really hope they don't do it.
Yeah MT performance uplift (vs 8 full cores) would not be all that great in that case.

The Ryzen 8500G has 2x Zen 4 and 4x Zen 4c cores, and the max clocks are 5 GHz and 3.7 GHz respectively. That's a 25% drop. The 7950X and 9950X have all-core turbos north of 5.2 GHz even in AVX workloads.

That 25% drop means the 4+8 core CCD would perform more like a 10-core CCD in MT workloads than a 12-core one. Given Intel plans to offer 8+16 x2, I don't think it's good enough for the highest-end part. But they might still go for it, as 4+8 should also take roughly the same area as 8 full cores.
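The arithmetic behind that "more like a 10-core CCD" estimate, as a quick sketch (this assumes MT throughput scales linearly with clock and ignores IPC and cache differences):

```python
# Full-core-equivalent MT throughput of a hypothetical 4+8 CCD, assuming
# throughput scales linearly with clock (a simplification).
full_clock = 5.2                    # GHz, all-core turbo cited for the 7950X/9950X
c_clock = full_clock * (1 - 0.25)   # ~25% lower, per the 8500G Zen 4 vs Zen 4c gap

equivalents = 4 + 8 * (c_clock / full_clock)
print(f"4+8 CCD ~= {equivalents:.0f} full-core equivalents")   # -> ~10
```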
 
Reactions: Tlh97 and GTracing

StefanR5R

Elite Member
Dec 10, 2016
6,535
10,244
136
Remember that actual computing throughput is less than proportional to core clock though. (If the workload is very well cached, it is nearer to being proportional, if not, it may be way less than proportional.)

The higher the core clock, the lower the portion of cycles in which actual computation is going on.
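A minimal way to picture that: memory latency is fixed in nanoseconds, so stall time does not shrink as the clock rises, and the speedup from extra frequency is sub-linear. The cycle counts and stall times below are made-up illustrative numbers:

```python
# Toy model: runtime = compute cycles / frequency + fixed memory stall time (ns).
def clock_speedup(f_base_ghz, f_new_ghz, compute_cycles, mem_stall_ns):
    t_base = compute_cycles / f_base_ghz + mem_stall_ns
    t_new = compute_cycles / f_new_ghz + mem_stall_ns
    return t_base / t_new

# A 20% clock bump, well-cached vs. memory-bound (illustrative numbers only)
print("cache-friendly:", round(clock_speedup(4.0, 4.8, 1000, mem_stall_ns=10), 3))   # ~1.19x
print("memory-bound:  ", round(clock_speedup(4.0, 4.8, 1000, mem_stall_ns=200), 3))  # ~1.10x
```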
 
Reactions: Tlh97 and Joe NYC

GTracing

Senior member
Aug 6, 2021
478
1,112
106
Remember that actual computing throughput is less than proportional to core clock though. (If the workload is very well cached, it is nearer to being proportional, if not, it may be way less than proportional.)

The higher the core clock, the lower the portion of cycles in which actual computation is going on.
There are a lot of points to be made for and against dense cores. I don't think that one is particularly big.

For example, the dense cores in Strix Point have one quarter as much L3 cache. That would make a bigger difference to performance imo.
 

basix

Member
Oct 4, 2024
144
297
96
Latest data from TSMC has N2 SRAM density at +50% over N5

Edit: Sorry it's actually +50% over N7, +18% over N5

Actual SRAM array implementations should scale more than that. SRAM requires some control logic in between, so you should get some additional gains from logic scaling. Not much, but also not nothing.

A second lever is FinFlex. You could use optimized 1-2 / 2-2 / 2-3 cell widths, which might be more compact than their N4/N5 counterparts (regular 2-2 / 3-3 cells), but still reach the performance goals.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,274
2,824
136
Comparing the 12-core CCD to Strix Point seems kind of rough in a lot of ways: no FinFlex, separate CCXs, odd and tiny cache sizing, and a highly power-limited scenario.

I dare say that the C cores on the conjectural Zen 6 CCX should be capable of hitting 4 GHz or more on N3P or N2. They should be easier on power as well.

As for comparing 2x 4+8 to Intel's 2x 8+16 without HT, remember that once you get beyond a few cores pushing max turbo, you start to get power/thermal limited. Having 4 P cores per CCX should give them solid ST performance while not draining the power and thermal budget. What matters more for good MT performance is having a lot of cores. My bigger worry is that in MT, AMD just runs out of cores. While HT is netting them about 25% extra throughput per core, are their C cores better than Intel's E cores by enough in MT to overcome a 2:1 core count deficit?
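A rough break-even sketch for that last question (it treats every core as identical within each vendor's lineup and ignores power limits and clocks entirely, so it is only a placeholder):

```python
# How much faster per core (in MT) would AMD's cores need to be, with SMT's
# ~25% uplift, to match twice as many Intel cores without HT? Placeholder model.
amd_cores = 2 * (4 + 8)       # 2x 4+8 CCD = 24 cores, SMT on
intel_cores = 2 * (8 + 16)    # 2x 8+16   = 48 cores, no HT
smt_uplift = 1.25             # ~25% extra MT throughput per core, as cited above

break_even = intel_cores / (amd_cores * smt_uplift)
print(f"AMD would need ~{break_even:.2f}x per-core MT throughput to break even")  # ~1.6x
```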
 

Joe NYC

Diamond Member
Jun 26, 2021
3,150
4,596
136
MLID has some tidbits and renders of Medusa Point and Medusa Ridge.

Says IOD still TSMC (there was some speculation on Twitter today it may be Samsung), reiterates 12 core CCDs, all eligible for stacking, but stacked die not a requirement. Says AMD "likes" N2 for CCDs. Also said that IOD may be more advanced node.

The way he described the connection between chiplets is confusing, but since he mentioned "wafer", it is likely RDL.

 
Reactions: lightmanek

Kepler_L2

Senior member
Sep 6, 2020
852
3,481
136
MLID has some tidbits and renders of Medusa Point and Medusa Ridge.

Says IOD still TSMC (there was some speculation on Twitter today it may be Samsung), reiterates 12 core CCDs, all eligible for stacking, but stacked die not a requirement. Says AMD "likes" N2 for CCDs. Also said that IOD may be more advanced node.

The way he described the connection between chiplets is confusing, but since he mentioned "wafer", it is likely RDL.

The L3 cache width is almost identical to the current CCD's, so this is definitely 48 MB.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,150
4,596
136
The L3 cache width is almost identical to the current CCD's, so this is definitely 48 MB.

48 MB would improve the competitiveness of the base models quite a bit.

But I don't know how realistic it is to expect +50% cores, +50% L3, and to fit it all in less than the size of the Zen 5 CCD. Removal of the SerDes will make some difference, but I am not sure if it is enough...
 
Reactions: Tlh97 and OneEng2

Kepler_L2

Senior member
Sep 6, 2020
852
3,481
136
48 MB would improve the competitiveness of the base models quite a bit.

But I don't know how realistic it is to expect +50% cores, +50% L3, and to fit it all in less than the size of the Zen 5 CCD. Removal of the SerDes will make some difference, but I am not sure if it is enough...
It's N2 and the CCD is almost 20% bigger.
 