It was just developed; it's only in sample production for customers.
It will take time until it's available in the first products, maybe in Q4 2023?
When Strix Point arrives the price could be lower, true, but it will still be higher than the lower-clocked ones.
Strix with this memory would once more not be for cheaper laptops but premium ones.
I always thought the APU was meant as a cheaper alternative to a CPU+dGPU combo, yet it's not.
I am talking about laptops here. In the upcoming years, it will be impossible to build a DIY PC with entry-level to mainstream performance for the same or less money than you would pay for a Mini-PC or APU based system.
It's not a problem with the dies and their prices. It's a problem with the prices of everything around them: DRAM, GDDR memory, PCBs, controllers, power delivery, and the manufacturing costs of separate components.
"APU based system".I am talking about laptops here.
Didn't you notice how much they ask for Rembrandt laptops? Phoenix will end up the same and Strix too. Too expensive for that performance.
I should also add to that: SOC based system.
Which one of Strix Point dies?
What are the options? Rare, medium or well done?
All of them.
I haven't kept up on the mobile stuff, but I would expect PC laptop makers will eventually need something to compete with the Apple M2 systems, so high-end APUs may become a thing. A regular processor with a discrete GPU will not be able to compete on power consumption with a high-end APU.
Not sure if this is widely known, Zen 5 is family 26/1Ah. Kernel patches are landing now.
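For anyone cross-checking: 26 decimal and 1Ah are the same family value, and the kernel exposes it in /proc/cpuinfo. A minimal sketch of checking the reported family on a Linux box (current Zen 3/Zen 4 parts report family 25, i.e. 0x19):

```python
# Minimal check of the CPU family the kernel reports (Linux only).
# Zen 3/Zen 4 show "cpu family : 25" (0x19); the Zen 5 patches add family 26 (0x1A).
def cpu_family(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("cpu family"):
                return int(line.split(":", 1)[1])
    return None

fam = cpu_family()
if fam is not None:
    print(f"cpu family {fam} (0x{fam:X})")
```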
Hmmm, if we know when the Zen 4 kernel patches came out, can we then get a feel for the rough potential launch window of Zen 5?
Zen 4 patches started appearing at the end of 2021.
They more or less need to, if they want to keep the Mobile cadence with Strix Point.
Zen 1 and Zen 2 developed by first team
Zen 3 and Zen 4 developed by second team
AMD's newish cadence should be around 18 months, and Zen 4 should have launched earlier, but Norrod postponed it by two quarters to add CXL.
I would say early 2Q24 should be the launch window considering the team developing Zen 5 is working in parallel.
Which makes it 18 months after Zen 4 (if you add the additional two quarters of Zen 4 push-back, that makes it two years from the originally planned Zen 4 launch, per Forrest Norrod's statement).
So, I would not be surprised if they launch Zen 5 at CES24
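A rough sanity check on that arithmetic, assuming the late-September 2022 Zen 4 desktop launch as the anchor and the 18-month cadence mentioned above:

```python
from datetime import date

# Back-of-the-envelope cadence check; the 18-month figure is the assumption above.
zen4_launch = date(2022, 9, 27)   # Ryzen 7000 (Zen 4) desktop launch
cadence_months = 18

months = zen4_launch.month + cadence_months
year = zen4_launch.year + (months - 1) // 12
month = (months - 1) % 12 + 1
print(date(year, month, zen4_launch.day))   # 2024-03-27, right around late Q1 / early Q2 2024
```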
I suspect Zen5 will use some of the stacking and connectivity tech used for RDNA3 and MI300, so it is kind of relevant. The things you have labeled as Zen4 cores look more like infinity cache, or maybe L2 cache, or something like that. I have seen the small chips between the HBM3 referred to as structural silicon (semiaccurate, I think). The chiplet you have labeled as "adaptive chiplet" looks exactly like a Zen 4 chiplet with 8 cores. The thing you have labeled "AI chiplet" may be partially FPGA. FPGAs have large arrays, so it could look like cache. It could also just be all AI hardware. That would have large, regular arrays of things in addition to possible caches.

It would be easier to tell if I knew the die size of HBM3. I didn't find it in a quick search and I don't have time to search more today. I thought HBM2 was around 100 mm2.

The rendering may be completely inaccurate, but if the "AI chiplets" are actually cpu cores, then where do the 24 cores come from? There are essentially 3 GPUs (2 chiplets each), so having 3x8-cores would make sense. I don't know where the other 8 cores would be hiding unless there is something weird like 2 low power cores in each base die.
Replying to myself...
I am wondering if the layout pictured just isn't the 24 core device. Perhaps it is a 16 core with an FPGA or other accelerator. They apparently can put more than one type of chip on top of the base die. The base die looks like it might be able to fit 4 cpu chiplets, so I am wondering if the 24 core variant is really the top end. This seems like a small number of cores compared to what Nvidia will have with each Grace Hopper package (144?), although that may have a more powerful gpu.
It looks like my speculation that the MCD could have v-cache stacked on top might be correct; don't know anything about this person though:
It makes it a lot simpler to manufacture if the base die (MCD) is the same size as the v-cache die. They can just do wafer on wafer without dicing and making a reconstituted wafer with cache die and small pieces of filler silicon. If something goes wrong, it is only an MCD die, not a more expensive cpu or gpu die/wafer. That should make it a lot cheaper, so I would expect MCDs to be used in other places.

MI300 doesn't seem to use them unless they are embedded under the compute die in some manner. I haven't seen anything about off-package memory channels on MI300/SH5, but it seems like they would have extra memory somehow, unless they are depending on CXL with essentially HBM cache. HBM still has rather high latency though, since it is still DRAM, so I would expect cache to be under there somewhere. A single silicon interposer under the entire thing still seems too expensive and unnecessary. It seems like it would be smaller embedded die and/or EFB bridge chips.
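To put a rough number on the "only an MCD die" point: with a simple Poisson yield model, a defect in a wafer-on-wafer stack costs far less when the scrapped die is MCD-sized rather than compute-die-sized. The defect density below is a made-up illustrative figure, and the die areas are only approximate RDNA 3 sizes:

```python
import math

# Simple Poisson yield model: Y = exp(-D0 * A). All numbers are illustrative assumptions.
defect_density = 0.1          # defects per cm^2 (hypothetical)

def poisson_yield(area_mm2, d0=defect_density):
    return math.exp(-d0 * area_mm2 / 100.0)   # area converted from mm^2 to cm^2

mcd_area_mm2 = 37    # roughly a Navi 31 MCD
gcd_area_mm2 = 300   # roughly a Navi 31 GCD

print(f"MCD yield ~{poisson_yield(mcd_area_mm2):.1%}")   # ~96%
print(f"GCD yield ~{poisson_yield(gcd_area_mm2):.1%}")   # ~74%
# A bad die or bond in the stacked pair scraps only the small, cheap MCD,
# not the large compute die, which is the cost argument above.
```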
Navi 31 and 32 use the MCD chiplet to save on N5 die area and as a means to make memory bandwidth a building block, to add more or less bandwidth as needed per GPU model.
On MI300, there is already a base die on less expensive N6, and each base die will always interface with 2 HBM stacks, so the memory controller will be a fixed implementation that is part of the N6 base die. And the N6 die will likely have a lot of L3 SRAM for cache.
So not really any commonality / similarity between the RDNA 3 and CDNA 3 packaging of components, IMO.
We would prefer that AMD take its time and not do a hurried launch. It's the pesky investors that AMD has more trouble trying to satisfy. Here's hoping Zen 5 isn't a half-baked ham.
It isn't necessarily implemented as a single interposer under each group of 2 gpu chiplets. That would be quite large and expensive.
They have shown images kind of indicating infinity cache die embedded under the compute die:
This could just be illustrative; I can't rule out a giant interposer, but it seems like it would be something on the order of 200 to 300 mm2 per pair of GPU chiplets? Does it need that much for an infinity fabric switch, caches, and whatever IO it has? Even at 200 mm2, that is more than 2x the size of an Epyc Genoa IO die if all 4 of them are considered. I think they are bigger than 200 mm2; the HBM stacks are over 100 mm2. The actual fabric switches are very small. They need wider connections for GPUs, but the PHY is minimal for anything stacked.
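Spelling out that area comparison (the ~397 mm2 Genoa IO die figure is just the commonly reported estimate, so treat all of this as approximate):

```python
# Back-of-the-envelope for the base-die area argument above; all figures approximate.
base_die_mm2 = 200     # low end of the guessed 200-300 mm^2 per pair of GPU chiplets
num_base_dies = 4
genoa_iod_mm2 = 397    # commonly reported Genoa IO die size

total = base_die_mm2 * num_base_dies
print(total, round(total / genoa_iod_mm2, 2))   # 800 mm^2, about 2x the Genoa IO die
```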
@jamescox
With the little knowledge we have, GF is entirely possible. Intel uses some 14/22nm process for their MTL interposer. Some 16-10nm TSMC legacy process is a possibility as well. It all depends on what is integrated into the base-dies.
I am definitely putting my eggs into the "base dies are connected with something EFBish" basket.