Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


soresu

Platinum Member
Dec 19, 2014
2,863
2,077
136
Or, if you mean as main system RAM, as pointed out, it is just too expensive for that - DRAM is produced in massive quantities
HBM memory dies are DRAM also, I assume you meant DDRx DRAM.

I think that the main difference between the 2 (apart from the obvious packaging) is the control/IO die at the bottom of the stack, and the TSVs connecting the stack together.

Though I'm not sure that server DDR isn't already using something like TSVs for higher-end modules that exceed consumer capacity through stacking. Certainly the DDR5 spec increased the number of dies per stack over DDR4, along with the maximum potential Gbit capacity per die.
 
Last edited:

Ajay

Lifer
Jan 8, 2001
15,910
8,052
136
HBM memory dies are DRAM also, I assume you meant DDRx DRAM.

I think that the main difference between the 2 (apart from the obvious packaging) is the control/IO die at the bottom of the stack, and the TSVs connecting the stack together.

Though I'm not sure that server DDR isn't already using something like TSVs for higher-end modules that exceed consumer capacity through stacking. Certainly the DDR5 spec increased the number of dies per stack over DDR4, along with the maximum potential Gbit capacity per die.
Yeah, sorry, high MT/s DDR.
 
Reactions: Tlh97 and soresu

Joe NYC

Platinum Member
Jun 26, 2021
2,276
2,877
106
Ah interesting, I guess I'd just figured in the 6 years since Vega it would already have been scaled up enough for that, put a pin in that for now then 😅

Hopefully 3D DRAM comes along in the interim and scales up the per die capacity too - as it is now it's going to need at least 4 stacks just to reach the DDR4 maximum on consumer platforms (TR excluded).
On HBM3, the base configuration seems to be a stack that is 8 high, with capacity of 16 GB.
Other options would be 1 stack 12 high with 24 GB and 2 stacks with 32-48 GB. That would cover nearly the entire consumer space.
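Those capacities are easy to sanity-check if you assume 16 Gbit (2 GB) DRAM dies per layer, which is the density the figures above imply (my assumption, not a quoted spec):

```python
# Quick HBM3 capacity check, assuming 2 GB (16 Gbit) dies per layer.
DIE_GB = 2

def stack_capacity(height, stacks=1, die_gb=DIE_GB):
    """Total capacity in GB for `stacks` stacks of `height` dies each."""
    return height * die_gb * stacks

print(stack_capacity(8))             # 1x 8-high  -> 16 GB
print(stack_capacity(12))            # 1x 12-high -> 24 GB
print(stack_capacity(8, stacks=2))   # 2x 8-high  -> 32 GB
print(stack_capacity(12, stacks=2))  # 2x 12-high -> 48 GB
```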

With these new technologies, cost is a function of volume. The volume is rising (which is a good thing from the cost POV) but there is a shortage (which is a bad news from price POV).

But the memory makers have a glut of capacity in regular memory, and are hurting due to prices being low. They need to re-direct more of their capacity to make more HBM memories - which is what they are doing.

With Intel launching SPR + HBM and Mi300c (all CPU) on the horizon, it seems that the trend to HBM is accelerating.
 
Reactions: Tlh97 and soresu

jamescox

Senior member
Nov 11, 2009
640
1,104
136
Theoretical max for HBM3 memory is 64 GB per stack (16-high with 32 Gbit dies), which would be 512 GB for the Mi300.

The way to get from 128 GB to 512 GB is to double the memory chip size and then to double the stack height from 8-high to 16-high.
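A back-of-the-envelope sketch of that path, assuming 8 HBM sites on the package (per the MI300 leaks) and 2 GB (16 Gbit) dies today:

```python
# 128 GB -> 512 GB path: double the per-die capacity (2 GB -> 4 GB,
# i.e. 16 Gbit -> 32 Gbit) and double the stack height (8 -> 16).
SITES = 8  # HBM stacks on the package (assumption based on MI300 leaks)

def package_capacity(die_gb, height, sites=SITES):
    """Total package HBM capacity in GB."""
    return die_gb * height * sites

print(package_capacity(die_gb=2, height=8))    # today: 128 GB
print(package_capacity(die_gb=4, height=16))   # maxed out: 512 GB
```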

AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.

Samsung is planning production of 12-high stacks for Q4, which coincides with the Mi300x ramp and is what allows the 192 GB capacity.

The way AMD is planning on selling Mi300x, with 2 Genoa sockets and 8 Mi300x, it competes ok with NVidia's system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa), and also on GPU memory, where Mi300x would be clearly ahead in capacity.

On the APU side Grace Hopper would have some extra memory attached to Grace CPU, but since the memory is not unified, there would be duplication of copying back and forth to Hopper. But still, NVidia will have more memory when it is released. We will see how customers perceive the tradeoffs.

AMD has not mentioned Mi300c, the CPU only version. So it may come after Mi300a and Mi300x. If it does, then it will likely have an option of using 12-high stack with 192 GB of memory.

When it comes to Zen 5 (Turin), from some leaks it seems to be moving forward on schedule, and we may see it out in H1 2024.

And then, the next gen Mi400 likely in H2 2024. By then, the memory makers may be able to have the 48-64 GB HBM stacks, so we could have 512 GB on Mi400.

The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
The memory attached to the grace cpu appears to be unified with the HBM memory. It may not be exactly the same as AMD's implementation with HBM possibly set up more like cache, but it seems to be cache coherent between the LPDDR5x and the HBM. It isn't as tightly coupled as AMD has with their base die and massive bandwidth between base die sharing the local HBM between cpu and gpu. The connectivity between sockets may be similar between MI300 and the grace hopper superchip.

 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory side cache, they could afford to focus more on capacity. That could be an opportunity for different technology. Probably would be a volatile memory, however. Doubt MRAM has any appeal.
They can get very dense SRAM cache on a process optimized for it. Since the infinity cache is connected to the compute die in MI300 with SoIC, it can be as fast as on-die caches. I don’t know if there is need in the hierarchy for an explicit memory side cache.

I initially thought that MI300 would have separate cache, IO, and compute die. I guess IO may be fine in a cache optimized process, so perhaps there is less need to split IO and cache since it seems to be cheap enough to make on the 6 nm node. If significantly larger caches make sense, then it may also make sense to split out the cache onto a separate die such that multiple cache die can be stacked to increase capacity. Infinity cache doesn't seem to need to be that large to be effective, so I don't know if that will happen.
 

turtile

Senior member
Aug 19, 2014
618
296
136
I don't imagine that they will launch the CPU only MI400 SKU without its CDNA variants which may not be ready by 2024.
I'm not saying that it will be CPU only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
 

Ajay

Lifer
Jan 8, 2001
15,910
8,052
136
I'm not saying that it will be CPU only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
 

soresu

Platinum Member
Dec 19, 2014
2,863
2,077
136
I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
I think going forward AMD have the benefit of having a compact high density CPU + accelerator platform which may offer advantages beyond just equal or better accelerator performance for some customers.

This is something nVidia will never have in the x86 space at least, hence their move to cover lotsa ARM cores with Grace.
 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
I think the point is that AMD continues to have plenty flexibility open to exploit.

Bergamo is the first product where within the same gen the same platform and IOD is offered with two different types of CCDs.

MI300 has a pretty insane flexibility, with the CPU, GPU and APU variants as well as being offered for SH5 and OAM platforms. Upgrading only specific chiplets without changing the overall package can and probably should happen. Previously this already happened with Zen 2 and Zen 3 both using the same IOD and package. Whether the result is marketed as MI400 or as a more gradual upgrade is a different matter.
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
I think going forward AMD have the benefit of having a compact high density CPU + accelerator platform which may offer advantages beyond just equal or better accelerator performance for some customers.

This is something nVidia will never have in the x86 space at least, hence their move to cover lotsa ARM cores with Grace.
AMD already has a bunch of advantages here with much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 gpus paired with dual socket SP5 cpus with close to 500 GB/s memory bandwidth to each cpu socket. AMD also probably has lower cost to produce since it uses small chiplets rather than giant, monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
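The "close to 500 GB/s per CPU socket" figure roughly checks out if you assume a Genoa-style 12-channel DDR5-4800 configuration (channel count and speed are my assumptions here):

```python
# Rough per-socket DDR5 bandwidth, assuming 12 channels of DDR5-4800.
CHANNELS = 12
BUS_BYTES = 8        # one 64-bit DDR5 channel transfers 8 bytes per beat
MT_PER_S = 4800e6    # DDR5-4800: 4800 mega-transfers/second

bandwidth_gbs = CHANNELS * BUS_BYTES * MT_PER_S / 1e9
print(f"{bandwidth_gbs:.1f} GB/s")  # ~460.8 GB/s, i.e. "close to 500 GB/s"
```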
 

soresu

Platinum Member
Dec 19, 2014
2,863
2,077
136
AMD already has a bunch of advantages here with much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 gpus paired with dual socket SP5 cpus with close to 500 GB/s memory bandwidth to each cpu socket. AMD also probably has lower cost to produce since it uses small chiplets rather than giant, monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
I seem to recall some research a while back pertaining to ARM based chiplets from somewhere.

It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long given how ridiculously expensive chip design is becoming even with ML based placement assistance and other computational design tools.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
I seem to recall some research a while back pertaining to ARM based chiplets from somewhere.

It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long given how ridiculously expensive chip design is becoming even with ML based placement assistance and other computational design tools.
Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.
 

Saylick

Diamond Member
Sep 10, 2012
3,352
6,994
136
Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.
I wonder if we'll see a "Jensen Huang" architecture in our lifetime. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
 

soresu

Platinum Member
Dec 19, 2014
2,863
2,077
136
I wonder if we'll see in our lifetime a Jensen-Huang. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
That seems more like a Musk move.

I think for all his ego JHH is a bit too savvy about his company's PR to court that kind of comparison, especially at a time when public opinion on AI is still very much in flux and influenced by decades of negative portrayals in Hollywood media and fiction.

There have been plenty of advances in more recent times (like Li-ion battery co-inventor Goodenough) worthy of named product generations, so we may eventually see some of them highlighted by nVidia.

Or we can just get moar 19th century stuff ala Curie and Faraday 😅
 

eek2121

Diamond Member
Aug 2, 2005
3,022
4,205
136
It will be interesting to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105 W TDP/142 W PPT loses very little performance vs. stock, and raising the power limits made the messaging around the chip a lot more negative.

AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
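For reference, those number pairs follow AMD's rough PPT ≈ 1.35 × TDP rule of thumb for AM5 desktop parts (a widely reported convention, not an official formula), which is where 105/142 W comes from:

```python
# AMD desktop power limits: PPT (package power tracking) is roughly
# 1.35x the rated TDP. Rule of thumb only, not an official AMD formula.
def ppt_from_tdp(tdp_w):
    return round(tdp_w * 1.35)

for tdp in (65, 105, 125, 170):
    print(tdp, "W TDP ->", ppt_from_tdp(tdp), "W PPT")
```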
 

uzzi38

Platinum Member
Oct 16, 2019
2,688
6,339
146
It will be interesting to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105 W TDP/142 W PPT loses very little performance vs. stock, and raising the power limits made the messaging around the chip a lot more negative.

AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
I do not see a world in which peak power for Zen 5 per core is lower than that for Zen 4.

Maybe if it had a good node shrink to prop it up, but N4P vs N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.
 
Reactions: Tlh97 and Thibsie

DisEnchantment

Golden Member
Mar 3, 2017
1,658
6,098
136
Maybe if it had a good node shrink to prop it up, but N4P vs N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.
From TSMC official numbers (best case scenarios)
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain
N5 --> N3E is ~35% efficiency gain
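As a side note, if you read "X% efficiency gain" as X% less power at iso-performance (my reading), the per-step numbers don't quite compound to the quoted direct N5 --> N3E figure, which shows how "best case" these marketing numbers are:

```python
# Compound the per-step power reductions and compare to the direct figure.
n5_to_n4p = 0.22   # ~22% less power, N5 -> N4P
n4p_to_n3e = 0.10  # ~10% less power, N4P -> N3E

compound = 1 - (1 - n5_to_n4p) * (1 - n4p_to_n3e)
print(f"{compound:.0%}")  # ~30%, vs the ~35% TSMC quotes for N5 -> N3E direct
```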

Even a 15% efficiency gain from process is significant; if AMD cannot extract at least that much, then it is poor execution.
From architecture they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.

The only reason that would be acceptable is if they are driving the core to even higher frequencies.
 

Kepler_L2

Senior member
Sep 6, 2020
440
1,805
106
From TSMC official numbers (best case scenarios)
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain
N5 --> N3E is ~35% efficiency gain

Even a 15% efficiency gain from process is significant; if AMD cannot extract at least that much, then it is poor execution.
From architecture they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.

The only reason that would be acceptable is if they are driving the core to even higher frequencies.
It's likely to be similar to Zen3 vs Zen2, where the perf gap is small at low power (15-25W) but grows significantly at 45W+.
 

DrMrLordX

Lifer
Apr 27, 2000
21,762
11,084
136
Even 15% efficiency gain from process is significant, if AMD cannot extract that much at the very least then it is poor execution.
Numbers like that remind me how good N4P is relative to its release timing. N3-family nodes will take a while to become significantly better in any metric outside of maybe density.
 