Question Discussion of Intel's P-core and E-core approach vs AMD's regular vs C-core approach

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,639
14,629
136
First my take. Intel needed a way to be competitive with AMD in multicore benchmarks and performance scenarios but were limited by space and power due to foundry limitations, so they created E-cores.

AMD wanted more cores with less power, so they reduced the speed to allow for more cores in a small space, and a few other changes, but essentially kept all the same capabilities, like avx-512.

My opinion is that AMD's solution is much easier to implement and much more sane and capable in today's world. Please give your thoughts; this discussion is more about methods than companies.
 

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
C-cores are more like regular cores. Both companies used to have low-power variants (Pentium M, Atom, Geode, Bobcat, Jaguar, etc.). While Intel has continued two lineages, AMD has unified and then diversified its cores.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,639
14,629
136
C-cores are more like regular cores. Both companies used to have low-power variants (Pentium M, Atom, Geode, Bobcat, Jaguar, etc.). While Intel has continued two lineages, AMD has unified and then diversified its cores.
Yes, what I am getting at is: which is the way of the future, the E-core variant or the C-core variant? Which is the better design?
 

SteinFG

Senior member
Dec 29, 2021
458
520
106
I think both solutions are on par with each other: E-cores are really dense, while C-cores are easier to implement. I prefer Intel's solution, except where Intel uses only 2 P-cores for their chips (U-series). There AMD will fare much better with their solution.

The advantage of E-cores is that they are much, much smaller: 4 cores plus their L3 slice are only slightly bigger than what 1 P-core with its L3 would take. Perf/area is roughly 1.5x over a standard big core. Meanwhile, AMD's solution reduces core+L3 area by just ~20% while reducing perf by around the same number, meaning perf/area is actually the same.

One big factor to also consider is perf/W, but I don't have enough data to say anything about it.
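A quick arithmetic sketch of the perf/area claims above. The per-E-core performance figure is an assumption for illustration, not from the post; only the area ratios come from the thread:

```python
# Rough perf/area comparison using the figures quoted in this thread.
# All numbers are normalized to a standard big core (+ L3 slice) = 1.0.

# Intel: 4 E-cores + their L3 slice are "slightly bigger" than 1 P-core + L3.
p_core_area = 1.0           # P-core + L3 slice (normalized)
e_cluster_area = 1.1        # 4 E-cores + L3 slice, slightly bigger
e_core_perf = 0.4           # assume one E-core ~40% of a P-core (assumption)
e_cluster_perf = 4 * e_core_perf

ratio = (e_cluster_perf / e_cluster_area) / (1.0 / p_core_area)
print(f"E-cluster perf/area vs P-core: {ratio:.2f}x")  # ~1.45x, near the ~1.5x claim

# AMD: C-core + L3 slice is ~20% smaller, perf ~20% lower.
c_area, c_perf = 0.8, 0.8
print(f"C-core perf/area vs standard core: {c_perf / c_area:.2f}x")  # 1.00x
```

With these assumed numbers the E-core cluster lands near the claimed ~1.5x perf/area while the C-core comes out even, which is exactly the trade-off the post describes.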
 

Khato

Golden Member
Jul 15, 2001
1,206
251
136
Certainly an interesting question.

AMD's approach definitely has the advantage of unified instruction set... but such can also be a disadvantage when those 'expensive' extensions are never used. Beyond that, it was definitely a smart, low-resource-cost move to defend against the greater core counts of ARM competition in the server space. And going forward it's an efficient option for reducing die size in the client space where there's little need for more than 2-4 'highest performance' cores.

I'll agree that Intel adopted the E-core out of competitive necessity. It's just somewhat interesting that it ended up going into client so much earlier than server. Because in the 1-2W per core range a single e-core is higher performance than a p-core. That's the same threat that ARM competition poses for many server workloads that scale nicely with core count.

As for which approach is better? AMD's is lower resource cost, Intel's offers greater flexibility. Which unsurprisingly matches their respective market positions.
 
Reactions: Tlh97

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,639
14,629
136
As for which approach is better? AMD's is lower resource cost, Intel's offers greater flexibility. Which unsurprisingly matches their respective market positions.
Not sure why you think E-core/P-core is more flexible. When you buy the chip, it goes to a user/server that has a specific purpose for the most part, but if the user then needs the full capability of a P-core, they are out of luck if the cores are mixed (no AVX-512 in the user space). The same could happen to servers, but they are more dedicated to a specific purpose.
 

Saylick

Diamond Member
Sep 10, 2012
3,217
6,585
136
I think both solutions are on par with each other: E-cores are really dense, while C-cores are easier to implement. I prefer Intel's solution, except where Intel uses only 2 P-cores for their chips (U-series). There AMD will fare much better with their solution.

The advantage of E-cores is that they are much, much smaller: 4 cores plus their L3 slice are only slightly bigger than what 1 P-core with its L3 would take. Perf/area is roughly 1.5x over a standard big core. Meanwhile, AMD's solution reduces core+L3 area by just ~20% while reducing perf by around the same number, meaning perf/area is actually the same.

One big factor to also consider is perf/W, but I don't have enough data to say anything about it.
Excellent points, but some comments:

- AMD's Zen 4c is 2/3rds the size of the standard Zen 4 core, but also clocks 1/3rd lower. Perf/area is pretty similar I bet, but perf/W is higher since power grows superlinearly with clocks.



- Intel's E-cores offer much higher perf/area over their P-cores not because E-cores are extremely small, but because their P-cores are just stinkin' huge. Per SemiAnalysis, Redwood Cove in MTL is 5.33 mm² (core + L2). That's almost 40% larger than Zen 4 standard, and it doesn't even offer 20% more perf/core. Crestmont is ~1.5 mm² (core + shared portion of the L2 cache).
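The Zen 4c perf/W point above can be checked with a back-of-the-envelope model: assume perf tracks clocks (same IPC, same microarchitecture) and dynamic power follows P ≈ C·V²·f with voltage scaling roughly linearly with frequency near the top of the v/f curve. This is an idealized sketch, not measured data:

```python
# Why a ~1/3 clock reduction buys a large perf/W gain under the simple
# dynamic-power model P ~ C * V^2 * f, with V scaling roughly with f.

f_big, f_c = 1.0, 2 / 3        # Zen 4 vs Zen 4c clocks (normalized)
area_big, area_c = 1.0, 2 / 3  # Zen 4c is ~2/3 the core area

# Same IPC, so perf tracks frequency.
perf_big, perf_c = f_big, f_c

# P ~ V^2 * f with V ~ f  =>  P ~ f^3  (idealized)
power_big, power_c = f_big**3, f_c**3

print(f"perf/area: {perf_c / area_c:.2f} vs {perf_big / area_big:.2f}")  # identical
print(f"perf/W:    {perf_c / power_c:.2f} vs {perf_big / power_big:.2f}")
# perf/W improves by (f_big/f_c)^2 = 2.25x under this idealized model.
```

In practice voltage has a floor and leakage does not follow f³, so the real gain is smaller than 2.25x, but the direction matches Saylick's point: perf/area ties, perf/W clearly favors the lower-clocked core.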

 

Khato

Golden Member
Jul 15, 2001
1,206
251
136
Not sure why you think E-core/P-core is more flexible. When you buy the chip, it goes to a user/server that has a specific purpose for the most part, but if the user then needs the full capability of a P-core, they are out of luck if the cores are mixed (no AVX-512 in the user space). The same could happen to servers, but they are more dedicated to a specific purpose.
Apologies, I should have been more clear that the comparison was with respect to chip design.

The AMD approach is 'stuck' with a single design - they can only decrease size via whatever parameterization knobs they put in place and fully validate, and by setting lower frequency/higher efficiency targets in layout. Now they can continue to evolve this approach by adding further parameterization to the core design, but that could well end up being a more complicated approach compared to having entirely separate designs.

By comparison, Intel is free to pursue maximum performance with their p-core while focusing on area and power efficiency on their e-core. Technically they could even make 'c' core variants of each, but such would be of limited value. If Intel's design teams were up to the challenge, this could easily result in a much larger p-core that is far faster than what AMD's single-core approach could reasonably offer. (Say 3x the area, 1.5x the performance.) Sure, it'd be horrible for area efficiency, but great for ST performance and power efficiency. And the horrible area efficiency is of less importance when there are only 2 of them and the remainder of MT processing is taken care of by 8-16 e-cores. Those e-cores may only offer 0.75x the performance of the competition, but at 0.5x the size. Basically, the flexibility of as much or as little spread between the p-core and e-core targets as desired. It just isn't quite so great a strategy at the moment because the Intel design teams aren't exactly delivering.
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,153
1,099
136
Intel is going from 10nm to 5nm with Arrow Lake. That means Intel has much more silicon real estate on their chips. They can be more flexible. Dumping hyperthreading indicates they have some fancy new technology moving forward.
 

SteinFG

Senior member
Dec 29, 2021
458
520
106
Excellent points, but some comments:

- AMD's Zen 4c is 2/3rds the size of the standard Zen 4 core, but also clocks 1/3rd lower. Perf/area is pretty similar I bet, but perf/W is higher since power grows superlinearly with clocks.
Yea, but I'm also including the L3 slices necessary to connect everything; that's why C-core + L3 slice is just ~20% smaller than regular core + L3 slice.

The end result of adding E-cores is increasing MT without increasing the cost of the chip (die size), and I think Intel does a decent job at that. AMD doesn't seem to use their C-cores for this in the consumer market. Right now they're maybe just cutting $10 off the price of their Phoenix2 chip, as at low power even regular cores can't sustain all-core 3.5 GHz.
 

Saylick

Diamond Member
Sep 10, 2012
3,217
6,585
136
Yea, but I'm also including the L3 slices necessary to connect everything; that's why C-core + L3 slice is just ~20% smaller than regular core + L3 slice.
Does this account for Zen 4c using half the L3 of standard Zen 4?
 

SteinFG

Senior member
Dec 29, 2021
458
520
106
Does this account for Zen 4c using half the L3 of standard Zen 4?
No, because L3 size is not connected in any way to the type of core being used. AMD just decided to give less cache to Bergamo because microservices don't benefit from a fat shared L3. In fact, Phoenix1 and Phoenix2 have the same L3, which proves that point.
 

moinmoin

Diamond Member
Jun 1, 2017
4,975
7,736
136
Now they can continue to evolve this approach by adding further parameterization to the core design, but that could well end up being a more complicated approach compared to having entirely separate designs.

By comparison, Intel is free to pursue maximum performance with their p-core while focusing on area and power efficiency on their e-core.
What an odd distinction: E-cores are their own thing, so that's easier than turning C-cores into their own thing? Why?

The big advantage C-cores enjoy over E-cores is that they have the same feature set as the full cores (so no disabling of AVX-512 necessary for which precious area is still wasted in P-cores).

No, because L3 size is not connected in any way to the type of core being used. AMD just decided to give less cache to Bergamo because microservices don't benefit from a fat shared L3. In fact, Phoenix1 and Phoenix2 have the same L3, which proves that point.
So you used PHX2 for your "~20%" figure? You have only proven that L3$ size is independent of the core type.
 
Reactions: Tlh97 and KompuKare

Khato

Golden Member
Jul 15, 2001
1,206
251
136
What an odd distinction: E- cores are their own thing so easier than turning C-cores into their own thing? Why?
This is based on my understanding of the C-core design - same RTL code base, different synthesis targets. This results in a smaller core that provides lower max frequency and slightly higher efficiency at no additional design and validation cost. As soon as you add more complex parameterization, that adds corresponding design and validation cost and moves toward a p-core/e-core distinction. Note that parameterization can quickly become the more complicated approach: if you have three different parameters you can change in the design with no dependency upon one another, that's 8 different configurations that need to be validated.
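The configuration count here is just 2^n for n independent binary knobs; the cross product makes it concrete. A toy illustration with hypothetical parameter names (these knobs are invented for the example, not actual AMD design parameters):

```python
from itertools import product

# Three independent binary design knobs => 2**3 = 8 configurations,
# each of which would need validation. Knob names are hypothetical.
knobs = {
    "fpu_width":    ("full", "half"),
    "l2_size":      ("large", "small"),
    "decode_width": ("wide", "narrow"),
}

configs = list(product(*knobs.values()))
print(len(configs))  # 8
for cfg in configs:
    print(dict(zip(knobs, cfg)))
```

Add a fourth independent knob and the count doubles to 16, which is why validation cost, not RTL effort, is usually the argument against heavy parameterization.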
The big advantage C-cores enjoy over E-cores is that they have the same feature set as the full cores (so no disabling of AVX-512 necessary for which precious area is still wasted in P-cores).
That's only an advantage in the current implementation. There's nothing inherent to the p-core/e-core design methodology which precludes instruction set parity.

I'd agree that the current AMD c-core implementation is better than the current Intel p-core/e-core implementation in client processors. I expect that the advantages of the p-core/e-core approach will be demonstrated with Sierra Forest in the next few months.
 

moinmoin

Diamond Member
Jun 1, 2017
4,975
7,736
136
This is based on my understanding of the C-core design - same RTL code base, different synthesis targets. This results in a smaller core that provides lower max frequency and slightly higher efficiency at no additional design and validation cost. As soon as you add more complex parameterization, that adds corresponding design and validation cost and moves toward a p-core/e-core distinction. Note that parameterization can quickly become the more complicated approach: if you have three different parameters you can change in the design with no dependency upon one another, that's 8 different configurations that need to be validated.
This is assuming that the complete core is one IP block. I'd think it's made up of different IP blocks that can be independently changed/validated, as already happened for e.g. Mendocino, which has a Zen 2 core with a half-rate FPU similar to the PS5's.

There's nothing inherent to the p-core/e-code design methodology which precludes instruction set parity.
There isn't. Still, since introducing e-cores Intel hasn't managed either to make e-cores reach parity with p-cores or to actually free up the unused AVX-512 area.
 
Reactions: Tlh97 and KompuKare

Khato

Golden Member
Jul 15, 2001
1,206
251
136
This is assuming that the complete core is one IP block. I'd think it's made up of different IP blocks that can be independently changed/validated, as already happened for e.g. Mendocino, which has a Zen 2 core with a half-rate FPU similar to the PS5's.
While it's certainly made up of many separate blocks, full-chip validation of all valid variations of those blocks is typically required. One case where it wouldn't be required is if the different variants of a functional block pass formal equivalence checks (for any given input sequence, the output is exactly the same). But as soon as the output behavior of a fub changes, e.g. a half-rate FPU providing output every other clock instead of every clock, it's necessary to ensure that this doesn't impact the fub receiving that output in unexpected ways. Note that I'm exaggerating the potential issue here for illustrative effect - most likely there would just be a 'full' and a single c-core configuration, with all other potential combinations being unsupported.
 

DavidC1

Senior member
Dec 29, 2023
202
281
96
There isn't. Still, since introducing e-cores Intel hasn't managed either to make e-cores reach parity with p-cores or to actually free up the unused AVX-512 area.
AVX 10.1/10.2 come to mind; 10.1 will come with Granite Rapids.

It's not coming until post-Darkmont (the 18A shrink of Skymont) at the earliest. Maybe Arctic Wolf comes with 128-bit FPUs executing AVX 10.x 256-bit instructions.
 

Hulk

Diamond Member
Oct 9, 1999
4,269
2,089
136
First my take. Intel needed a way to be competitive with AMD in multicore benchmarks and performance scenarios but were limited by space and power due to foundry limitations, so they created E-cores.

AMD wanted more cores with less power, so they reduced the speed to allow for more cores in a small space, and a few other changes, but essentially kept all the same capabilities, like avx-512.

My opinion is that AMDs solution is much easier to implement and much more sane and capable in todays world. Please give your thoughts, and this discussion is more about methods than companies.

You got half of it correct. AMD and Intel are using c and E cores for area efficiency, but they are not using them for power efficiency. P cores and normal-size Zen 4 cores are more power efficient at iso-performance than their smaller counterparts. Well, to be totally honest, this is definitely the case for Intel; for AMD, efficiency is probably about the same since the architecture is the same, but the c cores are smaller, so the only advantage of using them for AMD is area efficiency.

Due to the nonlinear nature of the v/f curve, a big (in terms of die area), wide (in terms of architecture) CPU running at a low frequency will generally be more power efficient than a smaller physical core running at a higher frequency to achieve the same performance. But of course die area is expensive, so we have c and E cores.
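The iso-performance argument above can be sketched numerically under the same idealized P ~ f³ model (dynamic power with voltage tracking frequency). The IPC and capacitance figures here are made up for illustration:

```python
# Iso-performance: a wide core with higher IPC matches a narrow core's
# throughput at a lower clock, and saves power superlinearly (P ~ f^3).
# IPC and capacitance numbers are invented for illustration.

ipc_wide, ipc_narrow = 1.3, 1.0
f_narrow = 1.0
f_wide = f_narrow * ipc_narrow / ipc_wide   # clock needed to match perf

power_narrow = f_narrow**3
power_wide = f_wide**3

print(f"wide core clock: {f_wide:.2f}x, power: {power_wide / power_narrow:.2f}x")
# ~0.77x clock, ~0.46x power for the same performance.

# The wide core switches more capacitance per cycle; even a 40% penalty
# still leaves it ahead on power at iso-performance:
cap_penalty = 1.4
print(f"with capacitance penalty: {cap_penalty * power_wide / power_narrow:.2f}x")
```

That gap, and the fact that die area costs real money, is the whole trade: the small high-clocked core wins perf/area, the big low-clocked core wins perf/W.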

AMD added the c cores for the exact same reason Intel did. Most software doesn't utilize more than 8 or so threads so there are decreasing gains to adding more full size cores. Instead of designing an entirely new architecture AMD reduced the physical size of Zen 4, which required a decrease in clocks. This way they got quite a bit of additional multicore performance for less die area than would have been required with "full speed" Zen 4 (larger) cores.

It's all a delicately engineered balance of price/performance/efficiency that both companies create. By creating the c cores AMD has validated Intel's hybrid approach.
 
Jul 27, 2020
16,816
10,755
106
Even though C-cores are better architecturally, I'm still not happy with either Intel's or AMD's approach to heterogeneous cores. I would prefer them to explore a true heterogeneous core cluster or CCD, with both P and C/E cores in close proximity for the lowest core-to-core latency: use the P-core in the cluster for bursty workloads and the C/E-cores for sustained workloads, and keep optimizing the frequency of both core types dynamically based on the user's demands.
 