igor_kavinski
Quote: Intel's not gonna be asleep at the wheel forever. Unified Core is coming, after all.
2027/2028? Wake me up when that happens, please.
Quote: Intel's not gonna be asleep at the wheel forever. Unified Core is coming, after all.
sure in 2028....
Quote: But at least ARM self-immolated Neoverse since any X5 derivative will have beyond prohibitive costs for server applications due to gigabloat.
stock ARM designs can't even get their phone/mobile cores to clock past 3.8GHz on N3E...
Quote: sure in 2028....
Maybe 2029, even.
Quote: stock ARM designs can't even get their phone/mobile cores to clock past 3.8GHz on N3E...
The 1t isn't even far behind, but man the area just stinks.
Quote: Great post. I am going to be the contrarian here. Over 6GHz is the equivalent of the sub 2 hour marathon. It can be done but only with cheating. Either wind aided for the runners or with resulting catastrophic degradation for the silicon. As always, I'd love to see it. But I'm not thinking it will happen in this decade with ambient cooling.
You make it sound like 6 GHz is a hard process node wall and factors like pipeline length etc. don't make much of a difference.
Maybe 2029, even.
But point still stands.
Core improvements:
- Int scheduler entries for ALU/AGU upgraded from 88/56 to 96/64 or thereabouts
- Int PRF upgraded from 240 to at least 288, perhaps even 336 entries (336 would mean 56 per ALU, like Zen4 had)
- ROB upgraded from 448 to at least 512 entries
- smaller other upgrades throughout the core, including in the FPU area
- return of some optimizations that substantially accelerate some ops (or no-ops, via NOP fusion)
- at least 300, but more likely 500-600 and maybe even 700-800 MHz turbo clock uplift (not all-core, but at least for some of them), thanks to N3P + smart usage of 2-2 and 3-2 fin transistors where it's worth it
Uncore improvements:
- although no increase in L3 per core, cache/bandwidth-sensitive heterogeneous workloads (aka not all threads equally heavy) will benefit from the 50% larger L3 per CCD
- fewer cross-CCD context switch penalties due to bigger CCDs + faster chiplet interconnect
- bandwidth improvements from faster chiplet connection + faster memory support
Wouldn't surprise me if INT-heavy workloads - and therefore many client and semi-professional workloads - would see a bigger effective IPC uplift on Zen6 than what Zen5 gave us.
And then the turbo clock bumps and more cores on top of that.
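The ratios implied by the figures in the list above are easy to check. This is only a sketch of the arithmetic: the entry counts are the speculative numbers from the post, and the six-ALU count is an assumption inferred from "336 would mean 56 per ALU". The names `CURRENT`, `RUMORED`, `pct_growth` and `prf_per_alu` are made up for illustration.

```python
# Figures quoted in the post: current values and the speculated upgrades.
CURRENT = {"int_prf": 240, "rob": 448, "alu_sched": 88, "agu_sched": 56}
RUMORED = {"int_prf": [288, 336], "rob": [512], "alu_sched": [96], "agu_sched": [64]}

ALUS = 6  # assumption: inferred from "336 would mean 56 per ALU"


def pct_growth(old, new):
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


def prf_per_alu(entries, alus=ALUS):
    """Integer PRF entries available per ALU."""
    return entries / alus


if __name__ == "__main__":
    for key, old in CURRENT.items():
        for new in RUMORED[key]:
            print(f"{key}: {old} -> {new} (+{pct_growth(old, new):.1f}%)")
    for entries in (240, 288, 336):
        print(f"int PRF {entries}: {prf_per_alu(entries):.0f} entries per ALU")
```

On these numbers, the speculated structure sizes grow by roughly 9-40%, which says nothing by itself about the resulting IPC.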
Quote: Static core partitioning = "Little Cores"?
Oooooof, tell me you don't remember Bulldozer without telling me 🤣
Yeah they should start with not regressing on freq gen on gen in mobile.
If the Zen 6 improvements were an increase to 24 cores, a 5% ST IPC gain, and some efficiency gains from a node shrink, I'd be quite pleased.
Quote: Register renaming allows the compiler to just keep a live set, and offload finding ILP to the CPU. Modern compilers are absolutely assuming renaming and hundreds of registers. Before renaming was common, compilers were designed to try to extract ILP in ways that increased register pressure, through aggressive unrolling and interleaving and the like.
But it does not let you keep more values in registers than there are architectural registers. The compiler will spill regardless of the size of the register file if your working set is too big. And this is where APX is useful. I am not sure if I am using the right words to convey the message.
Quote: And aggressive unrolling is still used to this day
Been optimising some stuff last month and using this also worked nicely on my Zen 4. It was a pretty tight loop too, where everything will be cached very nicely.
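The spilling argument can be put into a toy model. This is a rough sketch, not how any real compiler allocates registers. x86-64 exposes 16 general-purpose registers and APX extends that to 32. The per-iteration live-value counts, the single reserved register and both function names are assumptions made for illustration. The physical register file size does not appear anywhere in the model, which is the point being made: renaming does not change how many values the compiler can name.

```python
X86_64_GPRS = 16  # architectural general-purpose registers in x86-64
APX_GPRS = 32     # APX extends the architectural GPR count to 32
RESERVED = 1      # assumption: only the stack pointer is unavailable


def live_values_for_unroll(unroll, per_iter_live=3, shared_live=4):
    """Live values in a hypothetical interleaved loop body.

    Each unrolled iteration keeps `per_iter_live` values alive at once,
    plus `shared_live` pointers/counters common to all iterations.
    """
    return unroll * per_iter_live + shared_live


def spilled_values(live_values, arch_regs, reserved=RESERVED):
    """Values that do not fit in named registers and go to the stack."""
    usable = arch_regs - reserved
    return max(0, live_values - usable)


if __name__ == "__main__":
    for unroll in (1, 4, 8):
        live = live_values_for_unroll(unroll)
        print(
            f"unroll x{unroll}: {live} live values, "
            f"spills with 16 GPRs: {spilled_values(live, X86_64_GPRS)}, "
            f"with 32 GPRs: {spilled_values(live, APX_GPRS)}"
        )
```

Under these made-up numbers, an 8x interleaved loop spills with 16 architectural registers but not with 32, however large the physical register file is.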
Quote: Oooooof, tell me you don't remember Bulldozer without telling me 🤣
Bulldozer had many other flaws, e.g. poor caches. Zen is solid from the ground up.
Quote: Oooooof, tell me you don't remember Bulldozer without telling me 🤣
You can statically partition Zen1 (maybe later too). Oracle had Naples with cores statically sawed in half per SMT thread.
Quote: Correct, security should get an improvement from that as well (if implemented correctly). In the end it depends on what customers request. I see static 2T mode also as a product to counter ARM parts and Intel's Sierra Forest etc.
They'd probably need to dupe some parts of ld/st setup for proper isolation but yeah.
Quote: No new core design, no new physical design.
Compared to the existing partially static, partially dynamic resource sharing SMT design, it would be...
Quote: In the end it depends on what customers request.
...a product in which performance per socket and performance per Watt are sacrificed in favor of performance determinism. Hard to tell if the sacrifices would turn out small enough to be still competitive with the likes of Sierra Forest.
Quote: Compared to the existing partially static, partially dynamic resource sharing SMT design, it would be...
Zen 6 is a new core, so it is not a different design because it will be new anyway. Yes, some updates might be required if starting from Zen 5. But if I can do dynamic resource sharing, I should be able to do static "no sharing" with rather little effort. Dynamic sharing is more complex than just giving each thread half of the resources.
Quote: ...a product in which performance per socket and performance per Watt are sacrificed in favor of performance determinism. Hard to tell if the sacrifices would turn out small enough to be still competitive with the likes of Sierra Forest.
I do not know that either. But let's imagine a hypothetical 256C/512T Zen 6 CPU which can be turned into a 512C CPU. Each core has roughly Zen 4 performance, including AVX512. Its main competitor will be Clearwater Forest (with unknown performance and core counts).
Quote: Zen 6 is a new core, so it is not a different design because it will be new anyway. Yes, some updates might be required if starting from Zen 5. But if I can do dynamic resource sharing, I should be able to do static "no sharing" with rather little effort. Dynamic sharing is more complex than just giving each thread half of the resources.
I think the main points in deciding whether this is a viable idea are how much an optional static partitioning feature would cost in silicon, and whether the additional validation would be cheaper than simply designing and validating a purpose-built core.
I do not know that either. But let's imagine a hypothetical 256C/512T Zen 6 CPU which can be turned into a 512C CPU. Each core has roughly Zen 4 performance, including AVX512. Its main competitor will be Clearwater Forest (with unknown performance and core counts).
I believe that such a 512C CPU would look very decent in its market environment. If customers are happy with 256C/512T parts, so be it; then AMD will leave SMT as is.
Quote: ..a product in which performance per socket and performance per Watt are sacrificed in favor of performance determinism
You're not sacrificing anything but a wee bit of area.
Quote: Competitive sharing should tend to yield higher utilization.
How did you arrive at that conclusion? And when you say "compare", did you test some workload on an Epyc CPU using some BIOS option to alter the partitioning of core resources?
Quote: Maybe easier said than done but I trust they have clever people working there so I hope they will be able to improve this going forward
I don't know. The clever people left and formed their own company?
Competitive sharing should tend to yield higher utilization.
Quote: How did you arrive at that conclusion and when you say compare, you tested some workload on an Epyc CPU using some BIOS option to alter the partitioning of core resources?
I am speaking hypothetically. For now, only AMD has the means to test this (in simulators at least, if not in actual silicon)... unless there is already a custom product out there like the one mentioned in #1,313.
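Since this is hypothetical anyway, the utilization argument can at least be stated as a toy model. This is not a measurement and not a model of any real core. It assumes a single queue of `entries` slots shared by two threads, a hard 50/50 split for the static case, and a proportional split when the competitively shared queue is oversubscribed. All function names and the 96-entry size are made up for illustration.

```python
def static_partition(demand_a, demand_b, entries):
    """Each thread may use at most half of the entries, whatever the other does."""
    half = entries // 2
    return min(demand_a, half), min(demand_b, half)


def competitive_sharing(demand_a, demand_b, entries):
    """Threads take what they need; when oversubscribed, split by demand.

    The proportional split is a hypothetical arbitration policy.
    """
    total = demand_a + demand_b
    if total <= entries:
        return demand_a, demand_b
    share_a = entries * demand_a // total
    return share_a, entries - share_a


def utilization(allocation, entries):
    """Fraction of the entries that are actually occupied."""
    return sum(allocation) / entries


if __name__ == "__main__":
    entries = 96
    for demands in ((80, 10), (80, 80), (30, 30)):
        s = static_partition(*demands, entries)
        c = competitive_sharing(*demands, entries)
        print(
            f"demand {demands}: static {s} ({utilization(s, entries):.0%}), "
            f"competitive {c} ({utilization(c, entries):.0%})"
        )
```

In this sketch the two schemes only differ when the demands are uneven. A heavy thread next to a light one fills more of the queue under competitive sharing, while under static partitioning a thread's share never depends on its neighbour, which is the determinism side of the trade-off.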
Quote: [...] clever people left [Intel] and formed their own company?
https://www.heise.de/en/news/Leading-Intel-engineers-found-RISC-V-company-9847956.html
The "rentable units" which are mentioned in this article are a different example of dynamic resource sharing, with the goal of optimum resource utilization. At first glance, it looks to me like some sort of hardware replacement of (or hardware assistance to) the operating system's thread scheduler. It's otherwise a bit off-topic here as it is an Intel patent.
Quote: Wouldn't it be ironic if actual competition to Zen 6 came from a RISC-V design...
Ignoring the low overlap of x86 and RISC-V ecosystems for a moment: while these engineers might want to follow up on ideas which they came up with while working for Intel but which Intel put on the back burner, they can do so only to the extent that they can work around the now Intel-owned patents (which they themselves might have helped to conceive...).
I don't know. The clever people left and formed their own company?
Leading Intel engineers found RISC-V company
An Intel team was supposed to develop a completely new CPU architecture. The project now appears to have been scrapped. (www.heise.de)
The lead is a woman.
"Hell hath no fury like a woman scorned"
Wouldn't it be ironic if actual competition to Zen 6 came from a RISC-V design...