I got it from this slide:

Hmm, have I misunderstood it? I always thought it's two 256-bit loads and one 256-bit store:
AMD’s Zen 4 Part 1: Frontend and Execution Engine
AMD’s Zen 4 architecture has been hotly anticipated by many in the tech sphere; as a result, many rumors were floating around about its performance gains prior to its release. In February 2021… (chipsandcheese.com)
Moving on, the load/store units within each CPU core have also been given a buffer enlargement. The load queue is 22% deeper, now storing 88 loads. And according to AMD, they’ve made some unspecified changes to reduce port conflicts with their L1 data cache. Otherwise the load/store throughput remains unchanged at 3 loads and 2 stores per cycle.
> Is Zen similarly fat?
Yea.
> 10-15% IPC goal?
That's nT, and they're way over that already best case.
It has unified schedulers too, a big difference from Z4. Curious why they are not submitting compiler patches yet for such a big change.

The biggest unknown for me is how they plan to feed the beast. There are no mentions of any decoder changes; surely it would be an absurd bottleneck if not changed?
- The same 12-way 48KB L1 cache as Golden Cove (hopefully without the latency penalty)
- 8-wide dispatch (+2 vs Alder Lake and Zen 4)
- 6 ALUs (+1 vs Alder Lake, +2 vs Zen 4)
- 4 loads / 2 stores per cycle (vs 3/2 for Golden Cove, 2/1 for Zen 4)
- If I'm reading this right, these are 512-bit (64-byte)? That's a massive uplift from Zen 4 if true (4x the throughput in ideal AVX-512 scenarios)
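That "4x" figure is just width arithmetic; a quick sketch using the thread's own numbers (2×256-bit loads for Zen 4 per the list above, 4×512-bit for the leak — neither is a confirmed spec):

```python
# Peak L1D load bandwidth per cycle implied by the leaked figures.
# Port counts and widths are the thread's claims, not confirmed specs.

def peak_load_bytes_per_cycle(loads_per_cycle: int, width_bits: int) -> int:
    """Bytes of load data a core can pull from L1D per cycle."""
    return loads_per_cycle * width_bits // 8

zen4 = peak_load_bytes_per_cycle(2, 256)  # 2 x 256-bit loads
leak = peak_load_bytes_per_cycle(4, 512)  # 4 x 512-bit loads, if true

print(zen4, leak, leak // zen4)  # 64 256 4 -> the claimed 4x uplift
```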
> Curious why they are not submitting compiler patches yet for such a big change.
Opsec.
> Seems like over 30% IPC was a fantasy based on very weak evidence.
Not when you know how Turin runs.
> What is interesting though is the native 16C CCX.
It's for cloud.
> The new low power core option
That's a castrate from Strix, with 1MB LLC.
> Nothing crazy like a dual decoder?
Not really necessary here.
> Storing the op cache to L3?
L1 is already inclusive of op cache contents.
> What is your take on Zen 6 architecture?
Good fun if morbidly expensive in server.
> CCDs stacked on IO die?
No.
> There would not be much of a reason to have a silicon bridge between them.
You chain AIDs just like you do it in Navi4c.
> Unless, there could be multiple IO dies for additional memory channels.
Yes, Venice looks like Granite Rapids on crack cocaine.
Just no.
They're stacked on the AID in Venice.
IOD is a different thingy.
> It seems that AMD is really going all in on the MALL in all future products.
They have to do that to keep that perf gravy train rolling, especially in server.
> Good fun if morbidly expensive in server.
I guess I'll come out and say what standard Venice probably looks like now.
> Venice gets rid of EPYC's last weaknesses, for a huge fee, of course.
Yeah, but the costs are just above and beyond.
> Many will probably stick with Turin initially.
Yeah, known platform, semi-reasonable price.
HBM is no solution so MALL it is.
So what do you think of the Zen 5 leaks?

I guess I'll come out and say what standard Venice probably looks like now.
6 AIDs, each with four 2.5D-stacked 8-core CCDs, each AID connected to the others with silicon bridges.
And finally 6 IO dies, connected to the outside of each AID with fanouts, each IO die with 2 memory channels and a bunch of PCIe/CXL lanes.
192 cores, huge fully unified MALL, along with so much room for all sorts of uncore accel and other server stuff.
Venice gets rid of EPYC's last weaknesses, for a huge fee, of course. Many will probably stick with Turin initially but Venice will become a huge stick of doom over time.
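For what it's worth, the topology above multiplies out to the stated core count; a quick sanity check (every figure here is the poster's speculation, not a confirmed spec):

```python
# Multiply out the speculated Venice topology from the thread.
aids = 6                  # active interposer dies
ccds_per_aid = 4          # 2.5D-stacked CCDs per AID
cores_per_ccd = 8
io_dies = 6
channels_per_io_die = 2

total_cores = aids * ccds_per_aid * cores_per_ccd
total_channels = io_dies * channels_per_io_die

print(total_cores, total_channels)  # 192 cores, 12 memory channels
```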
> 6 AIDs, each with four 2.5D-stacked 8-core CCDs, each AID connected to the others with silicon bridges.
What are all the 16-32 core CCXs all about then?
> I wonder how much performance increase there could be from relatively modest-sized caches in the MALL. Or is it a scenario where there is a small chunk of memory being contested by a number of CCDs, and that small chunk (served from the MALL) then relieves a big bottleneck?
Yeah, it's basically a bandwidth ramp to keep 192/256 chungus cores functional.
> Probably AMD made this presentation to some CSP/OEM/SI etc.
Yeah, some time ago.
> Probably AMD made this presentation to some CSP/OEM/SI etc.
Actually, I would say this is a presentation to an SI/OEM, not even a CSP. The codenames are client/commercial parts.
> The codenames are client/commercial parts.
No, they're core/cache codenames.
> CSPs do not leak stuff like this.
Yes they do, which is why AMD keeps them on a tight leash wrt sampling cycles.
> Within low clocks?
No.