Question Zen 6 Speculation Thread


OneEng2

Senior member
Sep 19, 2022
619
861
106
How does this work with AM5? Already we see Zen5 as having some main memory limitations.

If we have a 50+ % increase in cores and another 10+ % throughput increase per core, where does that leave us? DDR5 10000+ needed?

As a comparison, SP5 has 12 channels serving 128 Zen5 cores vs AM5 having 2 channels with 16 cores maximum.
Good point. I wonder if the RAM bandwidth is the limit though?
Yes, Strix Halo will answer many questions.

True, but the SP5 socket has max 10.67 cores/memory channel, also slower cores, still 50% more bandwidth/core than AM5 (max). Tells me, that at the limit, the cores with AM5 are starved. Strix Halo will show what's true.
Rumor suggests that Strix Halo's 256bit wide DDR 8000 interface is essentially equal to a quad channel memory setup. If true, then such a setup (quad channel moving from DDR6000 to DDR8000) would represent more than enough bandwidth for double the number of Zen 5 cores for Zen 6.
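To put rough numbers on the bandwidth argument above, here is a back-of-the-envelope sketch. It only computes peak theoretical figures (transfer rate times bus width). The DDR5-6000 baseline for AM5 and the 64-bit channel width are my assumptions, and the helper name `peak_bandwidth_gbs` is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mt_per_s: int) -> float:
    """Peak theoretical bandwidth in GB/s (decimal) for a given bus width and rate."""
    return bus_width_bits / 8 * mt_per_s / 1000

# Cores per memory channel, using the figures quoted in the thread.
sp5_cores_per_channel = 128 / 12   # about 10.67
am5_cores_per_channel = 16 / 2     # 8.0

# Assumption: DDR5-6000 is the AM5 baseline today.
baseline_mts = 6000
# 50% more cores, each with 10% more throughput: demand grows by about 1.65x.
demand_scale = 1.5 * 1.1
required_mts = baseline_mts * demand_scale   # roughly 9900 MT/s on the same two channels

dual_channel_6000 = peak_bandwidth_gbs(2 * 64, 6000)   # 96.0 GB/s
quad_channel_8000 = peak_bandwidth_gbs(4 * 64, 8000)   # 256.0 GB/s
strix_halo_256bit = peak_bandwidth_gbs(256, 8000)      # 256.0 GB/s

print(f"SP5 cores/channel: {sp5_cores_per_channel:.2f}, AM5: {am5_cores_per_channel:.1f}")
print(f"Data rate needed to keep pace on two channels: ~{required_mts:.0f} MT/s")
print(f"256-bit @ 8000 vs dual-channel DDR5-6000: {strix_halo_256bit / dual_channel_6000:.2f}x")
```

On these assumptions a 256-bit interface at 8000 MT/s matches a quad-channel DDR5-8000 setup on paper, and gives roughly 2.67x the peak bandwidth of dual-channel DDR5-6000. Real sustained bandwidth will be lower and depends on the memory controller and fabric.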
 

maddie

Diamond Member
Jul 18, 2010
5,127
5,476
136
Rumor suggests that Strix Halo's 256bit wide DDR 8000 interface is essentially equal to a quad channel memory setup. If true, then such a setup (quad channel moving from DDR6000 to DDR8000) would represent more than enough bandwidth for double the number of Zen 5 cores for Zen 6.
For mainly CPU only tasks, definitely yes, but questionable if both CPU & GPU are stressed simultaneously. I eagerly await detailed testing.
 

maddie

Diamond Member
Jul 18, 2010
5,127
5,476
136
Chipset and socket are not linked in that way.

As long as the pin configuration is right it's the IMC on the processor package that needs to change to support a variant of DDR5.
Let's take this a bit further. Would there be frequency limits to the motherboard and its memory trace specifications, even if the raw CPU was capable? Might a 24C Zen 6 be throttled? Lower-core-count models should be OK. Not a bad position for those upgrading, if true.
 

soresu

Diamond Member
Dec 19, 2014
3,851
3,233
136
Let's take this a bit further. Would there be frequency limits to the motherboard and its memory trace specifications, even if the raw CPU was capable? Might a 24C Zen 6 be throttled? Lower-core-count models should be OK. Not a bad position for those upgrading, if true.
Interesting point.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,103
4,514
106
Interesting tidbit on TSMC N2 wafer starts. It includes both Zen 6 and MI400. But it shows (one of the two) already taped out, and (one of the two) starting production in early 2026.

Based on "conventional wisdom", Zen 6 is in late 2026, which would imply MI400 in early 2026, likely taking the majority of the 100k wafers.

As for Zen 6, it is safe to assume that none of this volume is for client parts, and all of the Zen 6 N2 volume is server.

 
Reactions: Tlh97 and RnR_au

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,778
136
Not quite. First of all, INT scheduler has nothing to do with SSE and AVX2 integer operation, FP/SIMD scheduler is responsible for those. The throughput is not halved for those operations, but if you max the schedulers out, you will get one extra cycle of delay on one cycle instructions. Since SIMD integer adds are natively 1 cycle, they get the latency hit. Throughput stays the same, 4 int adds at whatever SIMD width you want. Speaking of desktop Zen5.
Yes they are handled by FP unit, I guess it was a weekend when I commented, and I tend to have higher levels of alcohol in my blood on weekends.
I am mostly referring to this teardown by Alexander Yee, where there is an increase in latencies for SIMD instructions

All SIMD instructions have minimum 2 cycle latency:

As awesome as Zen5's AVX512 is, not everything is perfect. So let's start with the biggest regression I found:
  • All formerly 1 cycle SIMD instructions have regressed to 2 cycles.
  • Applies to all widths, even 128-bit.
  • Everything that was already >= 2 does not further regress.
  • Throughput remains unchanged. The regression is only for latency.
  • Instructions that can be rename-eliminated (i.e. XOR zeroing) are unaffected and remain zero latency.
This caught me by surprise since it wasn't revealed in AMD's GCC patch. Initially I suspected that this regression was a trade-off to achieve the full 256 -> 512-bit widening. So I asked AMD about this and they gave a completely different explanation. While I won't disclose their response (which I assume remains under NDA), I'll describe it as a CPU hazard that "almost always" turns 1-cycle SIMD instructions into 2-cycle latency.
So while the 1-cycle instructions technically remain 1-cycle, for all practical purposes they are now 2 cycles. So developers and optimizing compilers should assume 2 cycles instead of 1 cycle. I believe it is possible to construct a benchmark that demonstrates the 1-cycle latency, but I have not attempted to do this.

Also, if you check the SOG for Z5, there are quite a few regressions in latencies for SIMD ops, and (very) few regressions in throughput compared to Z4. But of course there are lots of improvements as well. (From Z3 to Z4 there were no regressions.)
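To illustrate why a latency-only regression matters for some code and not for other code, here is a toy cycle-count model. It is a deliberately simplified sketch and not a simulation of Zen 5: the function names are made up, and it assumes a fully pipelined unit that can start 4 SIMD integer adds per cycle, as mentioned earlier in the thread.

```python
import math

def chain_cycles(n_ops: int, latency: int) -> int:
    """Cycles for n_ops where every op needs the previous op's result."""
    return n_ops * latency

def independent_cycles(n_ops: int, latency: int, per_cycle: int) -> int:
    """Cycles for n_ops with no dependencies, issuing per_cycle ops each cycle."""
    return math.ceil(n_ops / per_cycle) + latency - 1

n = 1000
# Latency-bound: a dependent chain doubles in cost when latency goes 1 -> 2.
print(chain_cycles(n, 1), chain_cycles(n, 2))
# Throughput-bound: independent ops barely notice the extra cycle.
print(independent_cycles(n, 1, 4), independent_cycles(n, 2, 4))
```

In this model the dependent chain goes from 1000 to 2000 cycles, while the independent stream goes from 250 to 251. That matches the point in the quoted write-up that throughput is unchanged and only latency regresses.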
 

MS_AT

Senior member
Jul 15, 2024
683
1,380
96
I am mostly referring to this teardown by Alexander Yee, where there is an increase in latencies for SIMD instructions
Ok, so those are related to the scheduler issues I have mentioned. You can find the confirmation in the SOG itself. Alex later confirmed this is the same phenomenon. But int adds still enjoy full throughput afaik.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,103
4,514
106
Some potential configurations of Medusa (Zen 6). It may share CCD across 3 families:
- Medusa Ridge (desktop)
- Medusa Point (higher end notebook)
- Medusa Halo (high end notebook)

What would differ between these is IO die. So likely configurations would be:
- Medusa Ridge - 2 CCDs, smaller IOD
- Medusa Point - 1 CCD, medium sized IOD (maybe adding LLC, large NPU)
- Medusa Halo - 2 CCDs, large IOD

The interposer used is not known, but presumably the inexpensive wafer with RDL. And it seems AMD feels confident in its competitive cost and power efficiency to offer it in mainstream Medusa Point.

The CCD being 12 (big) cores is interesting. Aiming mostly for the high end. There will still likely be a low-end Kraken successor. I wonder if it will be monolithic or chiplet.

Since this is geared to client and notebooks, the AVX-512 implementation will likely be similar to Zen 4, saving some die area, and possibly allowing 12 big cores on a single CCD (using N3) with die size still in AMD sweet spot for CCDs - 70 to 80 mm2.
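For a feel of what the 70 to 80 mm2 range means in candidate dies per wafer, here is a rough sketch using a common textbook approximation. It assumes a 300 mm wafer and ignores scribe lines, edge exclusion and yield, so treat the output as ballpark only. The function name is made up for illustration.

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross dies per wafer.

    Wafer area divided by die area, minus a correction term for
    partial dies lost around the wafer edge.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

for area in (70, 75, 80):
    print(f"{area} mm2 -> about {dies_per_wafer(area)} candidate dies per 300 mm wafer")
```

The useful part is the relative change: going from 70 to 80 mm2 costs on the order of a hundred candidate dies per wafer before yield is even considered.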

 

yuri69

Senior member
Jul 16, 2013
657
1,173
136
Does it even make sense to feed a set of 2 * 12 hungrier-than-Zen-5 cores with dual channel DDR5?

Also scaling a 100+W 6GHz 12c CCD to ~35W mobile seems a bit weird.
 
Reactions: Tlh97 and maddie

eek2121

Diamond Member
Aug 2, 2005
3,370
4,987
136
Granted that Intel "got rid of it" because E cores couldn't do it, but I believe that this was a die size issue for BOTH E and P cores.

Good point on the non-Intel compilers; however, my point is that if Intel creates new instructions in their processors, they don't release this fact early enough for AMD to include the same instructions in the same time window. This has given Intel a generation of added performance every time Intel did this before AMD could spin up a new design that included the support.

The question for AVX512 becomes "Is the juice worth the squeeze?". I believe it is due to the growing number of applications that support it in the desktop/laptop, and mostly the huge gains found in many applications in DC.

Intel's recent design decisions seem to leave DC concerns on the back burner. Seems like a strategic mistake to me. We will see.
If Windows itself began making heavy use of AVX-512 along with browsers, you would see a decent speed up. AVX-512 can accelerate many different workloads, the issue is that code needs to be written, you don’t get much benefit from autogen.
I'm not overly optimistic about more cores per CCD because of the skyrocketing node cost and the slow advancement lately (except for the c core parts).
Costs go down over time, FYI. N3E will cost significantly less in 2027 when N2 is expected to be out. In addition, TSMC is not going to be the only leader anymore since Intel has 18A, etc. More competition means lower prices for all.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,778
136
The interposer used is not known, but presumably the inexpensive wafer with RDL. And it seems AMD feels confident in its competitive cost and power efficiency to offer it in mainstream Medusa Point.
What is this gigantic interposer for? I thought they would use Fan Out with RDL.
Also, wouldn't it be odd for mobile chips with razor-tight margins to have SLC but not the DT chips?

What's the consensus on the IOD process node? I would suppose a GPU would really benefit from a new node if it is to also be part of the IOD. Is the GPU rumored to be in IOD or separate chiplet?

A 75 mm2 CCD is surprisingly large, even with 12 cores.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,241
2,750
136
If Windows itself began making heavy use of AVX-512 along with browsers, you would see a decent speed up. AVX-512 can accelerate many different workloads, the issue is that code needs to be written, you don’t get much benefit from autogen.

Costs go down over time, FYI. N3E will cost significantly less in 2027 when N2 is expected to be out. In addition, TSMC is not going to be the only leader anymore since Intel has 18A, etc. More competition means lower prices for all.
It doesn't matter if TSMC has competition on its leading nodes if demand still surpasses aggregate capacity. Leading-edge nodes will continue to be very costly going forward, getting more expensive as the actual time to complete a wafer grows due to multipatterning, etc., and will be more resistant to cost reductions over time.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,241
2,750
136
What is this gigantic interposer for? I thought they would use Fan Out with RDL.
Also, wouldn't it be odd for mobile chips with razor-tight margins to have SLC but not the DT chips?

What's the consensus on the IOD process node? I would suppose a GPU would really benefit from a new node if it is to also be part of the IOD. Is the GPU rumored to be in IOD or separate chiplet?

A 75 mm2 CCD is surprisingly large, even with 12 cores.
TSMC discussed N4C as a more cost-effective node for non-leading-edge chips. I suspect that we'll see a generation of IODs use it.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,103
4,514
106
What is this gigantic interposer for? I thought they would use Fan Out with RDL.

I think what AMD may be planning is what is called Fan Out wafer level packaging, combining dies from multiple wafers on a reconstituted wafer.

The start of the process is getting Known Good Die for both IOD (SoC) and CCD.

Then, these are placed (in the desired arrangement, spacing, etc.) on the reconstituted wafer (which is what MLID may incorrectly be calling an interposer).

Then, on this reconstituted wafer, additional wiring layers (RDL) are applied, to make the necessary connections.

This first video shows that multiple chips can be placed in the mold:



The 2nd video shows how the reconstituted wafer is constructed (just to get the concept), but their example only shows one chip being used.

 
Reactions: lightmanek

OneEng2

Senior member
Sep 19, 2022
619
861
106
From a desktop perspective where die size cost is very important, I find it difficult to believe there will not be variants of the CCD used that have mixed core content (or 6p+6c) for value segments.

A 2CCD solution with one CCD all P and the other all C, and possibly a high end with 2 CCD both all P.

I guess an argument could be made against the mixed CCD based on volume?

Thoughts?
 

marees

Golden Member
Apr 28, 2024
1,175
1,692
96
From a desktop perspective where die size cost is very important, I find it difficult to believe there will not be variants of the CCD used that have mixed core content (or 6p+6c) for value segments.

A 2CCD solution with one CCD all P and the other all C, and possibly a high end with 2 CCD both all P.

I guess an argument could be made against the mixed CCD based on volume?

Thoughts?
Probably just Strix Point (4P + 8c) ported to Zen 6 mobile, which in turn is ported to Zen 6 desktop.

I don't expect AMD to experiment with multiple designs. One and done is more like AMD
 
Reactions: Tlh97 and OneEng2

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,758
106
Some potential configurations of Medusa (Zen 6). It may share CCD across 3 families:
- Medusa Ridge (desktop)
- Medusa Point (higher end notebook)
- Medusa Halo (high end notebook)

What would differ between these is IO die. So likely configurations would be:
- Medusa Ridge - 2 CCDs, smaller IOD
- Medusa Point - 1 CCD, medium sized IOD (maybe adding LLC, large NPU)
- Medusa Halo - 2 CCDs, large IOD

The interposer used is not known, but presumably the inexpensive wafer with RDL. And it seems AMD feels confident in its competitive cost and power efficiency to offer it in mainstream Medusa Point.

The CCD being 12 (big) cores is interesting. Aiming mostly for the high end. There will still likely be a low-end Kraken successor. I wonder if it will be monolithic or chiplet.

Since this is geared to client and notebooks, the AVX-512 implementation will likely be similar to Zen 4, saving some die area, and possibly allowing 12 big cores on a single CCD (using N3) with die size still in AMD sweet spot for CCDs - 70 to 80 mm2.

Well, it's funny that some of this lines up with the drunken speculation I made several months ago:
ZEN 6 Client (with RDNA5)
Ryzen AI 400 series

All Medusa parts use 12-core CCDs. The difference is the IOD, of which 3 unique ones exist for each Ridge/Halo/Point.

MEDUSA RIDGE (Desktop)
24C/4CU
20C/4CU
16C/4CU
12C/4CU
8C/4CU

192bit/LPDDR6-10667 LPCAMM
100 TOPS NPU

MEDUSA POINT
12C/24CU
10C/20CU
8C/16CU
6C/12CU

192bit/LPDDR6-10667 LPCAMM/Soldered
100 TOPS NPU
4 LP cores

MEDUSA HALO
24C/72CU
20C/60CU
16C/48CU

384bit/LPDDR6-10667 On-package
200 TOPS NPU
4 LP cores
___

*above is speculation
Maybe this comment of mine was secretly one of MLID's sources for that video!
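For what it's worth, here is what the speculated memory configurations above would work out to in raw peak bandwidth. This is only arithmetic on the speculated numbers (192-bit and 384-bit at 10667 MT/s, top core count per family). It ignores LPDDR6 protocol overhead and the fact that the iGPU and NPU would share the same bandwidth, and the names in the code are made up for illustration.

```python
def raw_bandwidth_gbs(bus_width_bits: int, mt_per_s: int) -> float:
    """Raw peak bandwidth in GB/s (decimal), ignoring protocol overhead."""
    return bus_width_bits / 8 * mt_per_s / 1000

# (bus width in bits, top core count) taken from the speculation above.
speculated = {
    "Medusa Ridge": (192, 24),
    "Medusa Point": (192, 12),
    "Medusa Halo": (384, 24),
}

per_core = {}
for name, (bus_bits, cores) in speculated.items():
    total = raw_bandwidth_gbs(bus_bits, 10667)
    per_core[name] = total / cores
    print(f"{name}: ~{total:.0f} GB/s total, ~{per_core[name]:.1f} GB/s per core")
```

For comparison, dual-channel DDR5-6000 shared by 16 cores is 96 GB/s, or 6 GB/s per core, so even the speculated 24C Ridge config would have more raw bandwidth per core than current desktop parts, if any of it turned out to be real.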
 