Zen 6 Speculation Thread

Page 104

LightningZ71

Platinum Member
Mar 10, 2017
2,134
2,586
136
Things we do know for certain:
AMD has been producing a separate die for datacenter and desktop for a while, if you count the high-core-count C-core dies as separate, since they don't appear in consumer products. Recent die shots show that AMD's Strix Halo die also differs from the desktop die. That's at least three different CCDs for Zen 5 so far, which leads us to believe there will be additional segmentation going forward.

L3 cache scaling slowed drastically from N5 to N3. The most notable density increase came from a cell rearrangement. BSPDN will allow another density increase, but that's not until A16.

Wafer costs are still increasing each generation, and now at a faster rate than density is improving. This means that leading-edge products will have to be viciously area-efficient, putting pressure on non-scaling blocks to get evicted from the CCD.
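To put a shape on that squeeze, here's a toy calculation. All the growth rates are hypothetical, picked only to illustrate the point: when wafer cost grows faster than density, cost per transistor goes up, not down.

```python
# Toy model of the cost squeeze: if wafer cost rises faster than transistor
# density, each transistor gets more expensive every generation.
# Both growth rates below are assumptions for illustration, not real figures.
wafer_cost_growth = 1.25   # assumed +25% wafer cost per generation
density_growth = 1.15      # assumed +15% transistor density per generation

# Cost per transistor scales as wafer cost divided by density.
cost_per_transistor_ratio = wafer_cost_growth / density_growth
print(f"cost per transistor ratio: {cost_per_transistor_ratio:.3f}")  # > 1.0
```

Anything that doesn't shrink with the node (like SRAM) makes that ratio worse, which is the pressure to move it off the leading-edge die.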

The industry is gaining experience with die stacking, and the techniques and processes are improving. It's going to become more common.

My prediction is that cache levels that can tolerate higher latencies will get evicted from the CCD in order. L3 is already moving, as the X3D products show. This will get more thorough in the future, to the point where, eventually, no core but the most compact and low-end will have L3 on the CCD.

TSMC and AMD are already experienced in making cache- and cost-optimized chiplets. With the introduction of N3C, this continues. I expect there will be an A16C, since BSPDN will bring a cache density boost. I realize the C isn't specifically about cache, but cost-optimized nodes will be the foundation for it.

At some point soon, AMD will have to produce a CCD with no L3 at all: just buffers, control logic, and vias to a stacked L3 cache chiplet. It'll probably need changes to the L2, and to what passes for the L1 caches, to hide the extra few cycles of latency that this will cost.

Intel is in much the same boat and will be chasing similar paths themselves.
 
Reactions: BorisTheBlade82

Josh128

Senior member
Oct 14, 2022
766
1,267
106
The new 12-core CCD, assuming it keeps the same L3 per core and a 12-core CCX, should make for by far the most potent X3D chips to date. The CCD will come standard with 48 MB of on-die L3, and assuming AMD again sticks to 64 MB of V-Cache, that gives a total of 112 MB of L3, unified across 12 cores! That, plus 10% IPC gains, potentially some memory speed and latency improvements, plus likely another 200-300 MHz clock speed increase, should make Zen 6 X3D a stupendous gaming chip.
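A quick back-of-envelope check of those totals. The per-core L3 and V-Cache capacities here are this thread's assumptions, not confirmed specifications:

```python
# Back-of-envelope check of the claimed L3 totals for a 12-core Zen 6 CCD.
# Per-core L3 and V-Cache sizes are assumptions carried over from Zen 5 X3D.
cores_per_ccd = 12
l3_per_core_mb = 4      # Zen 5 has 4 MB of L3 per core; assumed unchanged
vcache_mb = 64          # same V-Cache die capacity as current X3D parts

on_die_l3_mb = cores_per_ccd * l3_per_core_mb
total_l3_mb = on_die_l3_mb + vcache_mb
print(f"on-die L3: {on_die_l3_mb} MB, total with V-Cache: {total_l3_mb} MB")
# on-die L3: 48 MB, total with V-Cache: 112 MB
```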

If they also happen to increase the L3 on the V-Cache die, it will be even more ridiculous. The stars seem to be aligning for Zen 6 to be quite impressive. Intel has their work cut out for them even if 18A is everything they claim it will be.
 

Hitman928

Diamond Member
Apr 15, 2012
6,615
12,131
136
BSPDN doesn't bring an SRAM boost; if anything it reduces density.

Intel showed increased SRAM density with their BSPDN, you just have to not use it in the bit cell. TSMC’s BSPDN is more advanced and has a good chance of allowing bit cell density improvements as well.
 

511

Golden Member
Jul 12, 2024
1,898
1,705
106
Intel showed increased SRAM density with their BSPDN, you just have to not use it in the bit cell. TSMC’s BSPDN is more advanced and has a good chance of allowing bit cell density improvements as well.
We will have to see TSMC's implementation, but yeah, it's a more advanced method than PowerVia.
 

511

Golden Member
Jul 12, 2024
1,898
1,705
106
The new 12-core CCD, assuming it keeps the same L3 per core and a 12-core CCX, should make for by far the most potent X3D chips to date. The CCD will come standard with 48 MB of on-die L3, and assuming AMD again sticks to 64 MB of V-Cache, that gives a total of 112 MB of L3, unified across 12 cores! That, plus 10% IPC gains, potentially some memory speed and latency improvements, plus likely another 200-300 MHz clock speed increase, should make Zen 6 X3D a stupendous gaming chip.

If they also happen to increase the L3 on the V-Cache die, it will be even more ridiculous. The stars seem to be aligning for Zen 6 to be quite impressive. Intel has their work cut out for them even if 18A is everything they claim it will be.
Cough cough 144MB L3 Cache variant from Intel
 

jpiniero

Lifer
Oct 1, 2010
16,121
6,578
136
You want fewer NUCA domains, not more.

Good answer... but I guess that explains why there would be a server/client split. You'd have to think they would also increase the core count on the regular die, and that would start to get really expensive, especially if the regular server die is also on N2.
 

reaperrr3

Member
May 31, 2024
89
288
86
The new 12-core CCD, assuming it keeps the same L3 per core and a 12-core CCX, should make for by far the most potent X3D chips to date. The CCD will come standard with 48 MB of on-die L3, and assuming AMD again sticks to 64 MB of V-Cache, that gives a total of 112 MB of L3, unified across 12 cores! (...)

If they also happen to increase the L3 on the V-Cache die, it will be even more ridiculous.
I assume that, due to how the L3 and V-Cache are connected, the V-Cache will have to be a multiple of 48 MB as well.

Combining 48 MB of L3 with 64 MB of V-Cache would likely only work if the V-Cache were an L4.
But since it's technically a capacity upgrade of the L3, aspects like associativity and bandwidth need to match, so when the CCD has 48 MB of L3, the V-Cache has to be 48, 96, 144, or 192 MB (if that many layers are supported).
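The constraint above can be sketched in a couple of lines. Assuming each stacked layer must mirror the CCD's 48 MB slice layout (the layer count here is hypothetical), the valid totals fall out directly:

```python
# Sketch of the V-Cache sizing constraint: if V-Cache is a capacity extension
# of the on-die L3, each stacked layer must match the CCD's 48 MB arrangement,
# so valid totals are multiples of 48 MB. Max layer count is a guess.
ccd_l3_mb = 48
max_layers = 4
valid_vcache_sizes = [ccd_l3_mb * n for n in range(1, max_layers + 1)]
print(valid_vcache_sizes)  # [48, 96, 144, 192]
```

Note that today's 64 MB V-Cache die is not in that list, which is the poster's point: it wouldn't pair with a 48 MB on-die L3 as a simple capacity extension.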
 

511

Golden Member
Jul 12, 2024
1,898
1,705
106
If the die sizes don't match, they'll have to use copper pillars to support the overhang if the cache stays on the bottom.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,985
4,367
106
SRAM bit cell scaling has been dead for a few years; only N3B had scaling, but it was so bad they reverted it.
The cost increase to put L3 on anything between N5 and N2 is big, so it makes sense to use hybrid bonding with an N4P base die under the N2 die, but the question remains when we are going to see it.

Why N4 (or N4P) for the base die as opposed to N6? Cost per bit and wafer availability should favor N6.
 

OneEng2

Senior member
Sep 19, 2022
512
742
106
BSPDN doesn't bring SRAM Boost if anything it reduces density
That is possible. SRAM only accounts for 5-7 W of power on the 9950X3D, while the cores on the two CCDs munch through over 100 W.

As a result, SRAM is less likely to shrink with BSPDN than the cores are. The cores may well shrink, though, as logic gate routing is much more complex... and it draws far more power.
 
Reactions: Win2012R2 and 511

Hitman928

Diamond Member
Apr 15, 2012
6,615
12,131
136
Why though?

I don't understand how shifting power delivery metal layers to the backside of the die would reduce density.

With Intel’s method, you still need room for the contact in the horizontal direction, so density goes down. You get an electrical benefit, but Intel found it not to be worth the density impact. You can still use it in the SRAM array for the control logic and such, just not in the bit cells.


 

soresu

Diamond Member
Dec 19, 2014
3,708
3,037
136
From what I'm seeing, it looks like Intel described this backside technology (the dirty kind) backwards, because I still can't see how moving it from one side of the die to the other should affect density at all.

Perhaps explaining it in the context of backside vs. frontside illumination of image sensors wasn't the right analogy?
 

Hitman928

Diamond Member
Apr 15, 2012
6,615
12,131
136
From what I'm seeing, it looks like Intel described this backside technology (the dirty kind) backwards, because I still can't see how moving it from one side of the die to the other should affect density at all.

Perhaps explaining it in the context of backside vs. frontside illumination of image sensors wasn't the right analogy?

For density, moving the power routing to the backside helps because when you have front side power delivery, you have both power and logic being routed to basically the same place with roughly the same path to get there so there ends up being a lot of congestion and you have to spread things out so they don’t run into each other.

With backside power, the power delivery and logic follow completely different paths up until the very end (or for TSMC’s method they never come together) so that relieves a significant amount of the congestion and lets you improve density.
 

soresu

Diamond Member
Dec 19, 2014
3,708
3,037
136
For density, moving the power routing to the backside helps because when you have front side power delivery, you have both power and logic being routed to basically the same place with roughly the same path to get there so there ends up being a lot of congestion and you have to spread things out so they don’t run into each other.

With backside power, the power delivery and logic follow completely different paths up until the very end (or for TSMC’s method they never come together) so that relieves a significant amount of the congestion and lets you improve density.
Ah, thank you, that makes a lot more sense.

Does it improve signal integrity at all for IO by moving the power further away?
 

Hitman928

Diamond Member
Apr 15, 2012
6,615
12,131
136
Ah, thank you, that makes a lot more sense.

Does it improve signal integrity at all for IO by moving the power further away?

Maybe not as much for the IO itself (this typically refers to a specific block of the chip) but if you mean the logic signals, then yes. The main effect comes from the reduced parasitics and cleaner power delivery (as well as more efficient power delivery) which allows the transistors to more cleanly pass the signals through the chip.
 

soresu

Diamond Member
Dec 19, 2014
3,708
3,037
136
Maybe not as much for the IO itself (this typically refers to a specific block of the chip) but if you mean the logic signals, then yes. The main effect comes from the reduced parasitics and cleaner power delivery (as well as more efficient power delivery) which allows the transistors to more cleanly pass the signals through the chip.
Nice. In the unlikely event that Vaire's reversible computing revolution ever happens in CPUs and GPUs, that should reduce the remaining power losses some.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,985
4,367
106
Because you need to run SRAM macro at Real Actual Speeds.

If AMD were to completely remove the L3 from the main die (and possibly increase the L2), then the whole L3 could be reengineered to run at half speed in the V-Cache die with minimal loss of latency.

L3 latency is already > 40 clock cycles, and going to V-Cache already adds (apparently) about 2 clock cycles. Going to a half-speed cache with practically unlimited width should result in only a tiny extra delay, maybe 1 cycle. That's assuming AMD isn't already doing it.

Running at half the clock speed could also allow lower voltage and with it, lower power consumption.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,493
7,681
96
then the whole L3 can be reengineered to work at half speed in the V-Cache die, with minimum loss of latency
Uh no brother that's not how that works.
L3 latency is already > 40 clock cycles, and going to V-Cache already adds (apparently) about 2 clock cycles, to half speed cache, with practically unlimited width should result in very tiny delays - 1 cycle. That's assuming AMD is already not doing it.
It adds 4 cycles, and half-speed caches are just not happening; that's terribly slow under all circumstances.
Running at half the clock speed could also allow lower voltage and with it, lower power consumption.
Active cache power just isn't all that high.
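To put rough numbers on the objection: a cache clocked at half the core frequency takes two core-clock cycles for each of its own cycles, so its access latency measured in core cycles roughly doubles. The cycle counts below are purely illustrative, not measured values:

```python
# Why a half-speed L3 costs far more than a cycle or two: each cache-domain
# cycle spans two core-clock cycles, so latency in core cycles roughly doubles.
# The access time used here is illustrative, not a measured figure.
l3_latency_cache_cycles = 20   # hypothetical access time in cache-clock cycles

full_speed_core_cycles = l3_latency_cache_cycles       # cache at core clock
half_speed_core_cycles = l3_latency_cache_cycles * 2   # cache at half core clock

extra = half_speed_core_cycles - full_speed_core_cycles
print(f"extra latency at half speed: {extra} core cycles")
```

That scaling is the gap between "adds 1 cycle" and "terribly slow": the penalty grows with the cache's entire access time, not by a fixed small constant.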
 