This should hopefully kill off those nonsense SMT4 rumors for Zen 3. I like what they are showing with the L3.
Does that mean they would be using one 8 core CCX?
I doubt AMD would do that. A 4c CCX's mesh is far less complicated than an 8c one, and they seem to have straightened out all the cross-CCX communication, so I don't see why they would add complexity there. What I think it suggests is that they remove the L3 from the module design (simplifying that) and probably pool it between CCXs, so both CCXs have direct access. It would slightly increase a single core's latency to the cache it used to be directly attached to, but it would give every core much quicker access to the entire stash. Honestly it might help the IF connections a bunch. Centralize the IF connections out to the IO die at the cache pool, so that each CCX only has to connect to the cache, and the cache connects out, instead of each CCX connecting directly to the IO die. That would probably save on power as well.
Removing the L3 from the CCX doesn't simplify the design, it makes it way, way, way harder. How are you going to handle coherency? If I were to guess, it's an 8 core CCX using rings.
First they rock the x86 world by releasing the Athlon 64 with an IMC, now they've decoupled it to rule the server/workstation world.
Decoupled yes, but still part of the socketed package. Wasn't it part of the MB chipset(s) before Athlon 64?
It was on the north bridge for the longest time. Problem was, back then AMD was largely reliant on 3rd party chipsets.
Still are. X570 is their first in-house chipset in a while, I think, with ASMedia being their main supplier up until recently.
I meant for the CCX. One of the big issues with the APUs and some of their other designs is that they still basically have to redesign the CCX for other implementations. So the framework is there but they have to redesign that. What that means for the rest of the design I can't be sure. But once you have that, you have an even easier to adapt CCX design for other dies. Still going to be tons better than trying to cross connect 8 cores.
28 point to point core links is high, but not absurdly so. Anyway, if AMD went with an 8 core CCX they'd use a mesh or ring. Or they can just put 4 CCXs on one CCD. Seems like they'll have the xtor budget to do it at 5nm.
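For anyone wondering where the 28 comes from: it is just the number of core pairs if all 8 cores are directly cross connected, n(n-1)/2. A quick sketch, with a simple ring added for comparison. The function names are made up for illustration, and this only counts links; it says nothing about what AMD actually builds.

```python
def full_mesh_links(cores: int) -> int:
    """Links needed if every core has a direct link to every other core."""
    return cores * (cores - 1) // 2


def ring_links(cores: int) -> int:
    """Links in a simple ring: one per adjacent pair (assumes 3+ cores)."""
    return cores


for n in (4, 8):
    print(f"{n} cores: fully connected = {full_mesh_links(n)} links, "
          f"ring = {ring_links(n)} links")
```

So a fully connected 4 core CCX needs 6 links and an 8 core one needs 28, while a ring grows linearly at the cost of extra hops between distant cores.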
IIRC FMA4 was supposed to be the standard but then Intel pulled a fast one on AMD and switched to FMA3.
I mean, they could move the L3 to the I/O die (doubtful).
From 20:41 in the video:
What will come in Milan is that we will get rid of that dual level three where compute complex (right here) have 4 cores sharing level 3. With Milan we will do one L3 for all of the cores in a single chiplet.
Video might get taken down
Zen 3 Milan highlights [AMD, Martin Hilgeman]
- Unified L3 32+ MB per CCD
- Sampling already
- 7nm
- Same core count as Rome
- 2x SMT
- Planned for Q3 2020
- DDR4/SP3
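To put the unified L3 point in numbers, here is a rough sketch comparing how much L3 a single core can reach directly. The Rome figures (2 CCXs per CCD, 16 MB L3 each) are the known Zen 2 layout; the Milan figures take the slide's 32 MB as a floor and assume 8 cores per CCD, which the slide does not state outright. `CcdLayout` is a made-up helper for illustration.

```python
from dataclasses import dataclass


@dataclass
class CcdLayout:
    name: str
    cores_per_ccd: int
    l3_pools: int         # separate L3 pools on one CCD
    l3_per_pool_mb: int   # size of each pool in MB

    def cores_sharing_l3(self) -> int:
        """Cores that share one L3 pool."""
        return self.cores_per_ccd // self.l3_pools

    def total_l3_mb(self) -> int:
        """Total L3 on the CCD across all pools."""
        return self.l3_pools * self.l3_per_pool_mb


rome = CcdLayout("Rome (Zen 2)", cores_per_ccd=8, l3_pools=2, l3_per_pool_mb=16)
# Assumption: 8 cores per CCD and exactly 32 MB (the slide says "32+ MB").
milan = CcdLayout("Milan (per slide)", cores_per_ccd=8, l3_pools=1, l3_per_pool_mb=32)

for ccd in (rome, milan):
    print(f"{ccd.name}: {ccd.cores_sharing_l3()} cores share "
          f"{ccd.l3_per_pool_mb} MB, {ccd.total_l3_mb()} MB per CCD")
```

Under those assumptions the total per CCD is the same, but each core sees twice as much L3 locally.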
What is Zen3's special sauce gonna be?
- Bigger cache most likely (32MB+)
- Improved IF
- ...
Speeding up or widening the IF would be more costly power wise, as would using more cache (which looks set at 32MB from the screen grab).
New uarch Zen 3, new Family 19h number... and the main improvement is just a unified L3 cache? It looks fake.
I can't believe a new uarch with a new Family 19h number will bring smaller improvements than Zen 2.
I doubt anyone actually believes that Milan will stop at 64 cores. Just saying.
Seems like the unified L3 in Zen3 effectively makes a CCD a single CCX, rather than 2.
I wonder how this works out for APUs under Zen3...
8 core APUs!
Zen+ on 12nm had the Ryzen 7 2700E, an 8 core at 45W, so it stands to reason that at 7nm+ you would be able to get an 8 core and a decent GPU for 35W.
I'm not sure it would (how do they manage DRAM stacking in mobile?). I'm not talking about a whole stack, I'm talking about a single-high stack, which should remove the need for TSVs as you wouldn't be routing through the HBM (which is what the TSVs are there for). Plus there's the possibility that you could implement the HBM in the die itself, and they could segment easily based on the viable amount. I don't believe that the I/O die gains a lot from being shrunk, and to me HBM3 using the same process provides an opportunity that I think would be very beneficial to take advantage of.
I'm talking about in an APU itself. There's quite a few companies that don't want to bother with an extra chip (GPU), but they're fairly constrained by memory bandwidth with regards to GPU performance in current APUs. Which as they move to chiplets the distinction there becomes a bit semantic, but for the OEM it would be a single chip solution, and its something they could do without needing to overhaul the work they did on the substrate for Zen 2 - which they talked up how much work they did there.
By Keller's own words he's there to develop a next gen interconnect. I believe he talked about one that could scale up from intrachip to interchip, then to the system (i.e. unified memory/storage that leverages different tiers while trying to make that transparent to the system), and on to network/datacenter. To me that sounds a lot like the talk about moving to fiber optics, so he's likely looking at whether it's time to start that transition or whether they can push the limits of metal first. The way he talked, he doesn't seem to have anything to do with the core designs (architecture, etc). Seems that he's there to get the various chips communicating in an efficient and fast manner (which will be needed with the move to chiplet designs, co-processing and other things).
I think that's also what he was working on at Tesla: figuring out how to get all the various components (sensors, processing) communicating while cutting down the wiring (for weight, complexity, and cost reasons) and pushing latency down and throughput up.
And I think there was talk that this was actually kinda his focus with Zen (basically Infinity Fabric and designing chips to utilize it). I might be very wrong though, but I do know he himself said he's at Intel for developing interconnect.