I was thinking about doing a similar poll but this one covers most of the options. There are still a few options that I would like to add to that list. The basic idea behind them is that there would be another 7nm chiplet that would have two DDR4 memory controllers on-die and all the other IO related stuff (PCIe, USB, SATA, etc.) would be located in a smaller IO die. There might even be a small iGPU there (but I'm leaving it out for now).
- 16-core modular CPU+IMC (one 16-core CPU chiplet with 2xDDR4 + a smaller IO chiplet)
- 12-core modular CPU+IMC (one 12-core CPU chiplet with 2xDDR4 + a smaller IO chiplet)
- 8-core modular CPU+IMC (one 8-core CPU chiplet with 2xDDR4 + a smaller IO chiplet)
There might be a better way to describe these options but you all get the idea. I have studied both Summit Ridge and Raven Ridge die shots for a while now and the two DDR4 memory controllers take about 15 to 16.5 mm² of die space (on 14nm). The part of the SDF (Scalable Data Fabric) that AMD labeled 'IF clock domain' in their IEEE paper "Zeppelin: An SoC for Multichip Architectures" is about 12.5% of the total die space of the Zeppelin SoC (calculated using the histogram view of the die shot, with areas marked in different colors). That's about 26.7 mm² (rounded up generously) of the 212.97 mm² Zeppelin die. That part very likely does scale with 7nm, while the actual DDR4 PHYs might hardly scale at all.
So all in all, let's say that two DDR4 PHYs will take about 17 mm² and the IF fabric stuff related to the memory controllers will scale by a factor of 2x, which gives us 13.35 mm² and a total of 30.35 mm², but let's just be fair and give it 33 mm². Now taking the 72 mm² 8C chiplet and adding those memory controllers to it would give us a 105 mm² chiplet with 8 cores and two integrated memory controllers. Adding more CCXs would make the IF more complex but each new 4C CCX would add about 25 to 35 mm². Let's just go with 35 mm² and call it a day. Please note that while these calculations are based on real die shots of Summit Ridge / Raven Ridge, there's still a lot of speculation here. So the final die sizes would be, give or take:
- 16-core modular CPU with IMC - 175 mm² chiplet (7nm)
- 12-core modular CPU with IMC - 140 mm² chiplet (7nm)
- 8-core modular CPU with IMC - 105 mm² chiplet (7nm)
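For anyone who wants to play with the numbers, here is a rough Python sketch of the estimate above. All inputs are the figures from this post (the 12.5% fabric share, the 2x scaling guess, the 33 mm² padded IMC budget, the 35 mm² per extra CCX); the function and constant names are made up for illustration, and the result is only as good as those assumptions.

```python
# Back-of-the-envelope die size estimate using the figures from this post.
# Every constant is an estimate or assumption, not an AMD specification.

ZEPPELIN_DIE_MM2 = 212.97    # Zeppelin die size (14nm)
IF_DOMAIN_FRACTION = 0.125   # share of the die in the 'IF clock domain' (measured from die shot)
DDR4_PHY_MM2 = 17.0          # two DDR4 PHYs, assumed not to shrink on 7nm
IF_SCALING_FACTOR = 2.0      # assumed area scaling of fabric logic from 14nm to 7nm
PADDED_IMC_MM2 = 33.0        # IMC budget after padding the raw estimate
BASE_CHIPLET_MM2 = 72.0      # assumed 8C chiplet without memory controllers
EXTRA_CCX_MM2 = 35.0         # pessimistic area for each additional 4C CCX
CORES_PER_CCX = 4
BASE_CORES = 8


def raw_imc_area_mm2() -> float:
    """Unpadded 7nm area for two DDR4 PHYs plus the scaled fabric part."""
    fabric_14nm = ZEPPELIN_DIE_MM2 * IF_DOMAIN_FRACTION
    fabric_7nm = fabric_14nm / IF_SCALING_FACTOR
    return DDR4_PHY_MM2 + fabric_7nm


def chiplet_area_mm2(cores: int) -> float:
    """Estimated 7nm chiplet area with integrated memory controllers."""
    if cores < BASE_CORES or cores % CORES_PER_CCX != 0:
        raise ValueError("cores must be a multiple of 4 and at least 8")
    extra_ccx = (cores - BASE_CORES) // CORES_PER_CCX
    return BASE_CHIPLET_MM2 + PADDED_IMC_MM2 + extra_ccx * EXTRA_CCX_MM2


if __name__ == "__main__":
    print(f"raw IMC estimate: {raw_imc_area_mm2():.2f} mm²")
    for cores in (8, 12, 16):
        print(f"{cores}-core chiplet: {chiplet_area_mm2(cores):.0f} mm²")
```

Using the unrounded 12.5% share the raw IMC figure comes out slightly under the 30.35 mm² quoted above, which is one more reason the 33 mm² budget is a padded guess rather than a measurement. Changing `EXTRA_CCX_MM2` to 25 shows the optimistic end of the range.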
So with this approach AMD could save a few tens of mm² of valuable 7nm die space compared to a full monolithic die. Here's a list of pros and cons of this design that I came up with (please speculate and add your own views):
- (+) Memory access latencies would be lower than with "pure" chiplet design
- (+) Smaller 7nm die size than a full monolithic chip
- (+) A little cheaper to develop (compared to monolithic) as only memory controllers need to be added to the Rome chiplet design and all other IO stuff could remain on a small 14nm IO die
- (-/+) Manufacturing costs might be about the same or only a little lower than a full monolithic design (because of MCM) - would still save 7nm die space
- (-) Design costs would be higher than just going with existing Rome chiplets
- (-) Only one chiplet could be used for Ryzen 3000 (AM4) because of the IMCs
- (-/~) Threadripper could have an option to use two of these chiplets but there would be an IO chiplet between them and therefore memory access would be bifurcated (maybe even more than currently)
- (-/~) PCIe latencies would be higher than monolithic design but it might not matter all that much
- (-/~) Next gen memory (DDR5) would need a new chiplet design but that might also not matter that much since Zen3 will in any case need a new chiplet - it would still lose some flexibility while gaining better memory access latencies
So that's the idea I've been contemplating for a couple of weeks now, but it's another story entirely what AMD has actually done, at least for the first few Ryzen 3000 models.
If you have any ideas or suggestions about this approach, please share them with us. Thanks.
Edit: As you can see, I didn't vote for the design presented in this message; this is just an alternative to a monolithic design that saves a few tens of mm² of die space. Whether this would be worth it over a similar monolithic design, I don't know for sure. The pure chiplet design has many (manufacturing and flexibility) benefits over this one, though, if we can get over some latency penalties which may not matter that much anyway.