HBM memory dies are DRAM also, I assume you meant DDRx DRAM. Or, if you mean as main system RAM, as pointed out, it is just too expensive for that; DRAM is produced in massive quantities.
> HBM memory dies are DRAM also, I assume you meant DDRx DRAM.

Yeah, sorry, high MT/s DDR.
I think that the main difference between the two (apart from the obvious packaging) is the control/IO die at the bottom of the stack, and the TSVs connecting the stack together.
Though I'm not sure server DDR isn't already using something like TSVs for higher-end modules that use stacking to exceed consumer capacities; the DDR5 spec certainly increased the number of dies per stack over DDR4, along with the maximum potential Gbit-per-die capacity.
> On HBM3, the base configuration seems to be a stack that is 8-high, with a capacity of 16 GB.

Ah interesting, I guess I'd just figured that in the 6 years since Vega it would already have been scaled up enough for that; put a pin in that for now then 😅
Hopefully 3D DRAM comes along in the interim and scales up per-die capacity too; as it stands, it's going to need at least 4 stacks just to reach the DDR4 maximum on consumer platforms (TR excluded).
> The memory attached to the Grace CPU appears to be unified with the HBM memory. It may not be exactly the same as AMD's implementation, with the HBM possibly set up more like a cache, but it seems to be cache-coherent between the LPDDR5X and the HBM. It isn't as tightly coupled as what AMD has with their base die and massive bandwidth between base dies, sharing the local HBM between CPU and GPU. The connectivity between sockets may be similar between MI300 and the Grace Hopper superchip.

The theoretical max for HBM3 is 32 GB per stack, which would be 512 GB for the MI300.
The way to get from 128 GB to 512 GB is to double the memory die capacity and then double the stack height from 8-high to 16-high.
AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.
Samsung is planning production of 12-high stacks for Q4, which coincides with the MI300X ramp and is what enables the 192 GB capacity.
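The capacity figures in the last few posts can be sanity-checked with a quick script. Note the per-die sizes here (2 GB and 4 GB) are inferred from the stated stack capacities, not official figures:

```python
def hbm_capacity(stacks, dies_per_stack, gb_per_die):
    """Total HBM capacity in GB for a given stack configuration."""
    return stacks * dies_per_stack * gb_per_die

# Base HBM3 config: 8 stacks of 8-high, 2 GB dies -> 16 GB/stack, 128 GB total
print(hbm_capacity(8, 8, 2))    # 128
# 12-high stacks of 2 GB dies -> 24 GB/stack, the 192 GB MI300X config
print(hbm_capacity(8, 12, 2))   # 192
# Doubling die capacity (4 GB) and stack height (16-high) -> 512 GB
print(hbm_capacity(8, 16, 4))   # 512
```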
The way AMD is planning on selling the MI300X, with 2 Genoa sockets and 8 MI300X, it competes well with Nvidia's system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa) and also on GPU memory, where the MI300X would be clearly ahead in capacity.
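For the CPU side of that comparison, peak per-socket DDR5 bandwidth follows directly from the channel counts. This assumes DDR5-4800 on both platforms, which is an assumption; actual supported speeds vary by configuration:

```python
def ddr_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Peak bandwidth in GB/s: channels * transfers/s * bytes per transfer."""
    return channels * mt_per_s * bus_bytes / 1000

genoa = ddr_bandwidth_gbs(12, 4800)           # 12-channel Genoa
sapphire_rapids = ddr_bandwidth_gbs(8, 4800)  # 8-channel Sapphire Rapids
print(genoa, sapphire_rapids)  # 460.8 307.2
```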
On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since the memory is not unified, there would be duplication and copying back and forth to Hopper. Still, Nvidia will have more memory when it is released. We will see how customers perceive the tradeoffs.
AMD has not mentioned the MI300C, the CPU-only version, so it may come after the MI300A and MI300X. If it does, it will likely have the option of using 12-high stacks with 192 GB of memory.
When it comes to Zen 5 (Turin), from some leaks it seems to be moving forward on schedule, and we may see it out in H1 2024.
And then the next-gen MI400, likely in H2 2024. By then, the memory makers may be able to offer 48-64 GB HBM stacks, so we could have 512 GB on the MI400.
The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
MI400 is H2 2025 at best.
> If the demand stays high, I can easily see AMD shrink it down to 3nm, throw in some improvements, and add Zen 5 in the 2024 timeframe.

I don't imagine that they will launch a CPU-only MI400 SKU without its CDNA variants, which may not be ready by 2024.
> They'll need SRAM (or something equivalently low-latency) for anything up through L3 cache, but if they add an additional memory-side cache, they could afford to focus more on capacity. That could be an opportunity for a different technology. It would probably be volatile memory, however. Doubt MRAM has any appeal.

They can get very dense SRAM cache on a process optimized for it. Since the Infinity Cache is connected to the compute die in MI300 with SoIC, it can be as fast as on-die caches. I don't know if there is a need in the hierarchy for an explicit memory-side cache.
> I don't imagine that they will launch a CPU-only MI400 SKU without its CDNA variants, which may not be ready by 2024.

I'm not saying that it will be CPU-only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
> I'm not saying that it will be CPU-only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.

I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
> I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.

I think going forward AMD has the benefit of a compact, high-density CPU + accelerator platform, which may offer advantages beyond just equal or better accelerator performance for some customers.
> I think going forward AMD has the benefit of a compact, high-density CPU + accelerator platform, which may offer advantages beyond just equal or better accelerator performance for some customers.

AMD already has a bunch of advantages here: much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 GPUs paired with dual-socket SP5 CPUs with close to 500 GB/s of memory bandwidth per CPU socket. AMD also probably has a lower cost to produce, since it uses small chiplets rather than a giant, monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
This is something nVidia will never have in the x86 space at least, hence their move to cover lotsa ARM cores with Grace.
> AMD already has a bunch of advantages here with a much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. [...]

I seem to recall some research a while back pertaining to ARM-based chiplets from somewhere.
> I seem to recall some research a while back pertaining to ARM-based chiplets from somewhere.

Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.
It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long given how ridiculously expensive chip design is becoming even with ML based placement assistance and other computational design tools.
> Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.

Grace-Hopper 2: Grass-Hopper 😂🤣😆
> Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.

I wonder if we'll see a Jensen-Huang in our lifetime. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
> I wonder if we'll see a Jensen-Huang in our lifetime. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.

That seems more like a Musk move.
It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105/142W loses very little in terms of performance vs. stock, and by raising power limits, messaging about the chip became a lot more negative.
Zen 5 hasn't powered on yet?
> It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105/142W loses very little in terms of performance vs. stock, and by raising power limits, messaging about the chip became a lot more negative.

I do not see a world in which peak per-core power for Zen 5 is lower than for Zen 4.
AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
> Maybe if it had a good node shrink to prop it up, but N4P vs. N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.

From TSMC's official numbers (best-case scenarios):
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain.
N5 --> N3E is ~35% efficiency gain.

> From TSMC's official numbers (best-case scenarios)

It's likely to be similar to Zen 3 vs. Zen 2, where the perf gap is small at low power (15-25W) but grows significantly at 45W+.
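Compounding the per-step gains quoted above, treating each "x% efficiency gain" as a power reduction at iso-performance, gives roughly 30% for N5 --> N3E via N4P, in the same ballpark as the ~35% TSMC quotes for the direct comparison (these are best-case marketing figures, so they don't compose exactly):

```python
def compound(*gains):
    """Cumulative efficiency gain from a chain of per-step gains,
    where each gain is a fractional power reduction at iso-performance."""
    remaining = 1.0
    for g in gains:
        remaining *= (1.0 - g)
    return 1.0 - remaining

total = compound(0.22, 0.10)  # N5 -> N4P -> N3E
print(round(total, 3))  # 0.298, vs. the quoted ~35% for N5 -> N3E directly
```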
Even a 15% efficiency gain from the process is significant; if AMD cannot extract at least that much, then it is poor execution.
From the architecture, they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.
Driving the cores to even higher frequencies is the only reason that would make that acceptable.
> Even a 15% efficiency gain from the process is significant; if AMD cannot extract at least that much, then it is poor execution.

It's numbers like these that remind me how good N4P is relative to its release. N3-family nodes will take a while to become significantly better in any metric outside of maybe density.