HBM memory dies are DRAM also, I assume you meant DDRx DRAM. Or, if you mean as main system RAM, as pointed out, it is just too expensive for that; DRAM is produced in massive quantities.
> HBM memory dies are DRAM also, I assume you meant DDRx DRAM.

Yeah, sorry, high MT/s DDR.
I think that the main difference between the two (apart from the obvious packaging) is the control/IO die at the bottom of the stack, and the TSVs connecting the stack together.
Though I'm not sure server DDR isn't already using something like TSVs for higher-end modules that use stacking to exceed consumer capacities; the DDR5 spec certainly increased the number of dies per stack over DDR4, along with the maximum potential Gbit-per-die capacity.
> On HBM3, the base configuration seems to be a stack that is 8-high, with a capacity of 16 GB.

Ah interesting, I guess I'd just figured that in the 6 years since Vega it would already have been scaled up enough for that; put a pin in that for now then 😅
Hopefully 3D DRAM comes along in the interim and scales up per-die capacity too; as it stands, it's going to need at least 4 stacks just to reach the DDR4 maximum on consumer platforms (TR excluded).
> The memory attached to the Grace CPU appears to be unified with the HBM memory. It may not be exactly the same as AMD's implementation, with the HBM possibly set up more like a cache, but it seems to be cache-coherent between the LPDDR5X and the HBM. It isn't as tightly coupled as what AMD has with their base die and massive bandwidth between base dies, sharing the local HBM between CPU and GPU. The connectivity between sockets may be similar between MI300 and the Grace Hopper superchip.

The theoretical max for HBM3 is 32 GB per stack, which would be 512 GB for the MI300.
The way to get from 128 GB to 512 GB is to double the memory die capacity and then double the stack height from 8-high to 16-high.
AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.
Samsung is planning production of 12-high stacks for Q4, which coincides with the MI300X ramp and is what enables the 192 GB capacity.
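The capacity figures in the last few posts can be sanity-checked with a quick script. Note the per-die sizes here (2 GB and 4 GB) are inferred from the stated stack capacities, not official figures:

```python
def hbm_capacity(stacks, dies_per_stack, gb_per_die):
    """Total HBM capacity in GB for a given stack configuration."""
    return stacks * dies_per_stack * gb_per_die

# Base HBM3 config: 8 stacks of 8-high, 2 GB dies -> 16 GB/stack, 128 GB total
print(hbm_capacity(8, 8, 2))    # 128
# 12-high stacks of 2 GB dies -> 24 GB/stack, the 192 GB MI300X config
print(hbm_capacity(8, 12, 2))   # 192
# Doubling die capacity (4 GB) and stack height (16-high) -> 512 GB
print(hbm_capacity(8, 16, 4))   # 512
```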
The way AMD is planning on selling the MI300X, with 2 Genoa sockets and 8 MI300X, it competes well with Nvidia's system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa) and also on GPU memory, where the MI300X would be clearly ahead in capacity.
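For the CPU side of that comparison, peak per-socket DDR5 bandwidth follows directly from the channel counts. This assumes DDR5-4800 on both platforms, which is an assumption; actual supported speeds vary by configuration:

```python
def ddr_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Peak bandwidth in GB/s: channels * transfers/s * bytes per transfer."""
    return channels * mt_per_s * bus_bytes / 1000

genoa = ddr_bandwidth_gbs(12, 4800)           # 12-channel Genoa
sapphire_rapids = ddr_bandwidth_gbs(8, 4800)  # 8-channel Sapphire Rapids
print(genoa, sapphire_rapids)  # 460.8 307.2
```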
On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since the memory is not unified, there would be duplication and copying back and forth to Hopper. Still, Nvidia will have more memory when it is released. We will see how customers perceive the tradeoffs.
AMD has not mentioned the MI300C, the CPU-only version, so it may come after the MI300A and MI300X. If it does, it will likely have the option of using 12-high stacks with 192 GB of memory.
When it comes to Zen 5 (Turin), from some leaks it seems to be moving forward on schedule, and we may see it out in H1 2024.
And then the next-gen MI400, likely in H2 2024. By then, the memory makers may be able to offer 48-64 GB HBM stacks, so we could have 512 GB on the MI400.
The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
MI400 is H2 2025 at best.
> If the demand stays high, I can easily see AMD shrink it down to 3nm, throw in some improvements, and add Zen 5 in the 2024 timeframe.

I don't imagine that they will launch a CPU-only MI400 SKU without its CDNA variants, which may not be ready by 2024.
> They'll need SRAM (or something equivalently low-latency) for anything up through L3 cache, but if they add an additional memory-side cache, they could afford to focus more on capacity. That could be an opportunity for a different technology. It would probably be volatile memory, however. Doubt MRAM has any appeal.

They can get very dense SRAM cache on a process optimized for it. Since the Infinity Cache is connected to the compute die in MI300 with SoIC, it can be as fast as on-die caches. I don't know if there is a need in the hierarchy for an explicit memory-side cache.
> I don't imagine that they will launch a CPU-only MI400 SKU without its CDNA variants, which may not be ready by 2024.

I'm not saying that it will be CPU-only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
> I'm not saying that it will be CPU-only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.

I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
> I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.

I think going forward AMD has the benefit of a compact, high-density CPU + accelerator platform, which may offer advantages beyond just equal or better accelerator performance for some customers.
> I think going forward AMD has the benefit of a compact, high-density CPU + accelerator platform, which may offer advantages beyond just equal or better accelerator performance for some customers.

AMD already has a bunch of advantages here: much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 GPUs paired with dual-socket SP5 CPUs with close to 500 GB/s of memory bandwidth per CPU socket. AMD also probably has a lower cost to produce, since it uses small chiplets rather than a giant, monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
This is something nVidia will never have in the x86 space at least, hence their move to cover lotsa ARM cores with Grace.
> AMD already has a bunch of advantages here with a much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. [...]

I seem to recall some research a while back pertaining to ARM-based chiplets from somewhere.
> I seem to recall some research a while back pertaining to ARM-based chiplets from somewhere.

Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.
It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long given how ridiculously expensive chip design is becoming even with ML based placement assistance and other computational design tools.
> Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.

Grace-Hopper 2: Grass-Hopper 😂🤣😆
> Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.

I wonder if we'll see a Jensen-Huang in our lifetime. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
> I wonder if we'll see a Jensen-Huang in our lifetime. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.

That seems more like a Musk move.
It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105/142W loses very little in terms of performance vs. stock, and by raising power limits, messaging about the chip became a lot more negative.
Zen 5 hasn't powered on yet?
> It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105/142W loses very little in terms of performance vs. stock, and by raising power limits, messaging about the chip became a lot more negative.

I do not see a world in which peak per-core power for Zen 5 is lower than for Zen 4.
AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
> Maybe if it had a good node shrink to prop it up, but N4P vs. N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.

From TSMC's official numbers (best-case scenarios):
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain.
N5 --> N3E is ~35% efficiency gain.

> From TSMC's official numbers (best-case scenarios)

It's likely to be similar to Zen 3 vs. Zen 2, where the perf gap is small at low power (15-25W) but grows significantly at 45W+.
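Compounding the per-step gains quoted above, treating each "x% efficiency gain" as a power reduction at iso-performance, gives roughly 30% for N5 --> N3E via N4P, in the same ballpark as the ~35% TSMC quotes for the direct comparison (these are best-case marketing figures, so they don't compose exactly):

```python
def compound(*gains):
    """Cumulative efficiency gain from a chain of per-step gains,
    where each gain is a fractional power reduction at iso-performance."""
    remaining = 1.0
    for g in gains:
        remaining *= (1.0 - g)
    return 1.0 - remaining

total = compound(0.22, 0.10)  # N5 -> N4P -> N3E
print(round(total, 3))  # 0.298, vs. the quoted ~35% for N5 -> N3E directly
```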
Even a 15% efficiency gain from the process is significant; if AMD cannot extract at least that much, then it is poor execution.
From the architecture, they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.
Driving the cores to even higher frequencies is the only reason that would make that acceptable.
> Even a 15% efficiency gain from the process is significant; if AMD cannot extract at least that much, then it is poor execution.

It's numbers like these that remind me how good N4P is relative to its release. N3-family nodes will take a while to become significantly better in any metric outside of maybe density.