Discussion Intel current and future Lakes & Rapids thread

Page 884

Fjodor2001

Diamond Member
Feb 6, 2010
3,842
305
126
No 32E was seen in the initial batch of ARL ES datasheets.
Ok, so "no 32E" was from an ARL ES datasheet. Nothing is known about amount of E cores in the final SKUs then, and it could be 24/32E?
and E cores don't seem to have boosted sales, so IMO they're not worth it
How do we know how sales relate to the presence/amount of E cores? There are many factors affecting sales, e.g. power consumption, price, ST/MT performance, etc. Correlation does not imply causation.
 

SiliconFly

Golden Member
Mar 10, 2023
1,062
548
96
Ok, so "no 32E" was from an ARL ES datasheet. Nothing is known about amount of E cores in the final SKUs then, and it could be 24/32E?

How do we know how sales relate to the presence/amount of E cores? There are many factors affecting sales, e.g. power consumption, price, ST/MT performance, etc. Correlation does not imply causation.
Having E-cores is definitely a bonus. But having way too many E-cores doesn't sound very appealing. Most of them will be idle for the life of the product itself considering there isn't much software to support them.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,842
305
126
Having E-cores is definitely a bonus. But having way too many E-cores doesn't sound very appealing. Most of them will be idle for the life of the product itself considering there isn't much software to support them.
Many MT heavy workloads can handle more or less as many cores as you have available. If they already make use of 16C then they'll also make use of 48C.

E.g. video encoding, running VMs, image editing, compilers, etc. And sometimes running a mix of those mentioned in parallel.
 
Reactions: SiliconFly

gdansk

Platinum Member
Feb 8, 2011
2,212
2,836
136
It depends on the workload itself. Especially if there is a lot of lock contention, more cores don't really help. That's why GB6 changed its MT test methodology (for the better, in my opinion). It still includes many of those mentioned use cases but shows relatively worse scaling for Intel and AMD relative to Apple.
 

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Having E-cores is definitely a bonus. But having way too many E-cores doesn't sound very appealing. Most of them will be idle for the life of the product itself considering there isn't much software to support them.
You are correct that not a lot of software can effectively use 40 threads. Eventually Amdahl's law kicks in and you can't do one thing much (if any) faster with more threads. Some software doesn't try to do one thing though (especially a lot of the benchmarking software which arbitrarily create work). Think about almost anything in the distributed computing realm. They don't try to do one thing fast, they try to do many things fast. That type of software is perfect for core spamming. That is not applicable to everyone though as many people don't run those types of software.
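
To put rough numbers on the Amdahl's law point, here is a minimal sketch. It assumes a workload where 95% of the runtime parallelizes perfectly; the 0.95 figure and the function name are illustrative assumptions, not measurements from anyone in this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.95 is an assumed parallel fraction, chosen only for illustration.
    for n in (8, 16, 24, 40, 48):
        print(f"{n:>2} cores: {amdahl_speedup(0.95, n):.2f}x")
```

Under that assumed fraction, going from 16 to 48 cores raises the ideal speedup from roughly 9.1x to roughly 14.3x, and no core count gets past 20x. Software that runs many independent jobs behaves more like a parallel fraction close to 1, which is why it scales so much better.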

But I have a more important point that I'd like to make. Right now as I eat lunch, my computer is almost completely idle. Yet ~18 threads are using over 0.3% of the CPU time. Context switching (switching a core from one thread to another) to share all 18 threads over a few cores is quite computationally and energetically expensive. https://en.wikipedia.org/wiki/Context_switch#Cost Imagine if context switching wasn't really a thing any more. Imagine if every important thread has its own core. Think about all that software you are using that doesn't need a fast core, but needs to do something regularly (virus scanning, Windows crap, background encryption of everything, heck even Microsoft Teams has 3 to 4 threads each using a significant amount of CPU time all the time).

Imagine they all have their own tiny core. No more context switching for threads that matter to you would make the operating system a piece of cake and far more efficient than it is. Heck, Windows could even be made to be fast. Gasp! Imagine how much simpler and more secure virus scanning could be if the programmers didn't have to consider sharing resources. Scan everything as it is being used. Encrypt/decrypt everything without any performance impact to the user's software. AI always listening to what you are doing/saying/typing without impacting the user (if enabled), etc. All without the CPU cost and energy use of context switching. And your own programs have dedicated powerful cores without any of that background tasks impacting them (no interruptions for Windows stuff, no cache sharing, etc).
 
Last edited:

rtxtwt

Senior member
Jul 2, 2018
319
505
136
Ok, so "no 32E" was from an ARL ES datasheet. Nothing is known about amount of E cores in the final SKUs then, and it could be 24/32E?
Oh sorry, I don't know anything about that. The ARL data was mentioned in a tweet that was later deleted, and the author even changed his name to avoid trouble. He seemed to post a lot of details like core configs, but I forgot them.

How do we know how sales relate to the presence/amount of E cores? There are many factors affecting sales, e.g. power consumption, price, ST/MT performance, etc. Correlation does not imply causation.
Compare it to the older generations before Alder Lake. The E core implementation doesn't change the picture much when compared to the competition.
 

gdansk

Platinum Member
Feb 8, 2011
2,212
2,836
136
You are correct that not a lot of software can effectively use 40 threads. Eventually Amdahl's law kicks in and you can't do one thing much (if any) faster with more threads. Some software doesn't try to do one thing though (especially a lot of the benchmarking software which arbitrarily create work). Think about almost anything in the distributed computing realm. They don't try to do one thing fast, they try to do many things fast. That type of software is perfect for core spamming. That is not applicable to everyone though as many people don't run those types of software.

But I have a more important point that I'd like to make. Right now as I eat lunch, my computer is almost completely idle. Yet ~18 threads are using over 0.3% of the CPU time. Context switching (switching a core from one thread to another) to share all 18 threads over a few cores is quite computationally and energetically expensive. https://en.wikipedia.org/wiki/Context_switch#Cost Imagine if context switching wasn't really a thing any more. Imagine if every important thread has its own core. Think about all that software you are using that doesn't need a fast core, but needs to do something regularly (virus scanning, Windows crap, background encryption of everything, heck even Microsoft Teams has 3 to 4 threads each using a significant amount of CPU time all the time).

Imagine they all have their own tiny core. No more context switching would make the operating system a piece of cake and far more efficient than it is. Heck, Windows could even be made to be fast. Gasp! Imagine how much simpler and more secure virus scanning could be if the programmers didn't have to consider sharing resources. Scan everything as it is being used. Encrypt/decrypt everything without any performance impact to the user's software. AI always listening to what you are doing/saying/typing without impacting the user (if enabled), etc. All without the CPU cost and energy use of context switching. And your own programs have dedicated powerful cores without any of that background tasks impacting them (no interruptions for Windows stuff, no cache sharing, etc).
Check your thread count. I'm at 4450. Context switching is in fact inevitable for the remainder of our life times. Operating systems cannot avoid it even if they have 1024 CPU cores.
Reducing context switches is beneficial but I'm not sure adding 16 more e cores will help that much. How many of your cores are actually being scheduled on simultaneously? Entire core clusters are gated in these low use, idle scenarios. It is more power efficient to waste a few microseconds context switching 4-8 times than to wake up another core cluster.
 

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Check your thread count. I'm at 4450. Context switching is in fact inevitable for the remainder of our life times. Operating systems cannot avoid it even if they have 1024 CPU cores.
Reducing context switches is beneficial but I'm not sure adding 16 more e cores will help that much. How many of your cores are actually being scheduled on simultaneously? Entire core clusters are gated in these low use, idle scenarios. It is more power efficient to waste a few microseconds context switching 4-8 times than to wake up another core cluster.
I have ~3200 threads right now. But ~3182 of them are idle. They can sit on a single core unused along with other background tasks that do not need to be timely (or rent a core as needed). Who cares if Microsoft Service Host: Themes gets delayed by 6 microseconds? Those aren't the issue.

I already answered how many are being used simultaneously: 18 for me. 18 actively doing work basically all the time and interfering with/taking resources from/slowing down the software that I do care about.

But my post wasn't about what is currently being done, but what could be done in the future. "If I had asked people what they wanted, they would have said faster horses." - possibly Henry Ford. I'm looking into what could be done, more features, less interruptions for the main programs, snappier, less energy, etc. True 24/7 security without noticeable impact on the performance, etc.
 
Last edited:
Reactions: Executor_

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Compare it to the older generations before Alder Lake. The E core implementation doesn't change the picture much when compared to the competition.
E cores aren't really useful until you have 8, even better 16+ of them. The fact that many chips with E cores don't have enough makes people think the E core idea is bad (see A/// where he turns them off). Having just 4 E cores on a lot of chips was a terrible design choice. But, look at where they are heading and you'll see a different possibility.
 
Reactions: Fjodor2001

gdansk

Platinum Member
Feb 8, 2011
2,212
2,836
136
I have ~3200 threads right now. But ~3182 of them are idle. They can sit on a single core unused along with other background tasks that do not need to be timely (or rent a core as needed). Who cares if Microsoft Service Host: Themes gets delayed by 6 microseconds?

I already answered how many are being used simultaneously: 18 for me. 18 actively doing work basically all the time and interfering with/taking resources from/slowing down the software that I do care about.

But my post wasn't about what is currently being done, but what could be done in the future. "If I had asked people what they wanted, they would have said faster horses." - possibly Henry Ford. I'm looking into what could be done, more features, less interruptions for the main programs, snappier, less energy, etc. True 24/7 security without noticeable impact on the performance, etc.
Only 18 active threads? I find that hard to believe. Run ProcessExplorer and tell me what it says the context switch delta is. Every tick is on the order of 20,000 context switches per second even on my idle 32 thread system. Windows isn't context switching because it's fun, but because it has a lot of threads to schedule. So unless you're proposing a 20P + 20000 E core CPU it'll never get to the point of no context switching. Even in an 18 important thread case the interactive threads should be on the P cores, so you still have 18 threads that should be on 8 P cores, which means context switching.
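
For anyone who wants to check a comparable number outside Windows, here is a rough sketch of a Linux analogue. It assumes a Linux system exposing the cumulative `ctxt` counter in `/proc/stat`, and it measures the system-wide rate, not the per-process delta that ProcessExplorer shows. The function names are made up for this example.

```python
import time


def parse_ctxt(proc_stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")


def context_switches_per_second(interval=1.0, path="/proc/stat"):
    """Sample the system-wide counter twice and return the rate (Linux only)."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return (after - before) / interval


if __name__ == "__main__":
    # Assumes /proc/stat exists; this will raise on non-Linux systems.
    print(f"{context_switches_per_second():.0f} context switches per second")
```

Running it on an otherwise idle machine gives a figure that can be set against the roughly 20,000 per second quoted above, keeping in mind that a different OS and scheduler are being measured.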
 

Khato

Golden Member
Jul 15, 2001
1,206
251
136
Agreed, the low E-core configurations on ADL were a pretty clear indicator of risk management. Intel wanted to devote the minimum amount of die size to hybrid compute on the first generation in order to get the ball rolling. Then scale up to 16 E cores on RPL and eventually 32 E cores on ARL - all the cores that desktop use could need currently. Seems to have been an adequately successful strategy on the whole. Especially since it now enables them to expand to the low-power E cores on MTL. And maybe in the future we'll see super high performance cores on the other end of the spectrum.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,842
305
126
Long-term, wouldn't the best strategy be:

~8P cores
E-cores for the rest of the die area

Reason: You rarely need more than 8C at max ST performance. For the rest of the workloads you need max MT performance, and E-cores handle that better with regards to perf/watt and die area / cost.
 
Last edited:

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Only 18 active threads? I find that hard to believe. Run ProcessExplorer and tell me what it says the context switch delta is. Every tick is on the order of 20,000 context switches per second even on my idle 32 thread system. Windows isn't context switching because it's fun, but because it has a lot of threads to schedule. So unless you're proposing a 20P + 20000 E core CPU it'll never get to the point of no context switching. Even in an 18 important thread case the interactive threads should be on the P cores, so you still have 18 threads that should be on 8 P cores, which means context switching.
I don't think I conveyed my message to you. Never did I say each thread gets its own core. I said each active thread that uses any significant amount of CPU time and needs to be run routinely (If you go back and reread, I arbitrarily put the cutoff at regularly >0.3% of my CPU, but that could be a different cutoff). That number is far, far lower than 20000. Yes, there will always be context switching but it could be limited to non-time sensitive tasks on a single core when the penalty doesn't matter and isn't felt (and then you have the rentable cores when a background thread suddenly needs a burst of resources). The threads that matter are usually orders of magnitude fewer. Those threads can have enough dedicated cores to avoid context switching issues.
 

gdansk

Platinum Member
Feb 8, 2011
2,212
2,836
136
I don't think I conveyed my message to you. Never did I say each thread gets its own core. I said each active thread that uses any significant amount of CPU time and needs to be run routinely (If you go back and reread, I arbitrarily put the cutoff at regularly >0.3% of my CPU, but that could be a different cutoff). That number is far, far lower than 20000. Yes, there will always be context switching but it could be limited to non-time sensitive tasks on a single core when the penalty doesn't matter and isn't felt. The threads that matter are usually orders of magnitude fewer. Those threads can have enough dedicated cores to avoid context switching issues.
Yeah I guess the part about
No more context switching would make the operating system a piece of cake and far more efficient than it is.
really confused me. Even when divided into P core pool and E core pool both will still be context switching all the time. Windows already tries to keep important threads resident longer & throwing more e cores at it will only decrease the portion of CPU time spent context switching if it also keeps clusters awake.

But even idle it's 20,000+ context switches. More awake e-cores can help in theory but we're talking marginal improvements because many threads really want to race to sleep waiting for the next interrupt which makes them good candidates for the P cores.


I am convinced:
  1. When concerned with latency of interactive tasks instead of total throughput then 12P+16E would be superior to 8P+32E.
  2. The 8P+32E configuration, if it is real, is pretty much designed for benchmark wanking.
 
Last edited:

SiliconFly

Golden Member
Mar 10, 2023
1,062
548
96
You are correct that not a lot of software can effectively use 40 threads. Eventually Amdahl's law kicks in and you can't do one thing much (if any) faster with more threads. Some software doesn't try to do one thing though (especially a lot of the benchmarking software which arbitrarily create work). Think about almost anything in the distributed computing realm. They don't try to do one thing fast, they try to do many things fast. That type of software is perfect for core spamming. That is not applicable to everyone though as many people don't run those types of software.

But I have a more important point that I'd like to make. Right now as I eat lunch, my computer is almost completely idle. Yet ~18 threads are using over 0.3% of the CPU time. Context switching (switching a core from one thread to another) to share all 18 threads over a few cores is quite computationally and energetically expensive. https://en.wikipedia.org/wiki/Context_switch#Cost Imagine if context switching wasn't really a thing any more. Imagine if every important thread has its own core. Think about all that software you are using that doesn't need a fast core, but needs to do something regularly (virus scanning, Windows crap, background encryption of everything, heck even Microsoft Teams has 3 to 4 threads each using a significant amount of CPU time all the time).

Imagine they all have their own tiny core. No more context switching would make the operating system a piece of cake and far more efficient than it is. Heck, Windows could even be made to be fast. Gasp! Imagine how much simpler and more secure virus scanning could be if the programmers didn't have to consider sharing resources. Scan everything as it is being used. Encrypt/decrypt everything without any performance impact to the user's software. AI always listening to what you are doing/saying/typing without impacting the user (if enabled), etc. All without the CPU cost and energy use of context switching. And your own programs have dedicated powerful cores without any of that background tasks impacting them (no interruptions for Windows stuff, no cache sharing, etc).
Actually, threads work a bit differently. Threads are event-driven. They become active on an event, run for a short burst & idle most of the time. Even threads that iterate tend to go idle waiting for a system event or a state change. If any thread runs continuously, it drives up CPU utilization to 100% for that core.

So, the OS actually keeps scheduling multiple threads to fewer cores as long as the core utilization is low. Keeping fewer cores active (and the rest parked) keeps the power usage in check. Only when there are many active threads fully utilizing the cores (like games, benchmarks, transcoding, compression, scanning, etc), the OS tries to power up more cores to sustain performance at the cost of efficiency.

When core utilization is low, context switching delay doesn't make much of a difference. Context switching starts hurting performance only when multiple cores are running multiple threads at full steam and context-switching delay becomes a bottleneck.
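
The event-driven pattern described above can be sketched in a few lines. This is only an illustration, assuming a single worker thread fed through a queue; the function name and the "double the value" work are placeholders, not how any particular OS service is written.

```python
import queue
import threading


def run_event_driven_worker(events):
    """Handle events on a worker thread that sleeps while its inbox is empty."""
    inbox = queue.Queue()
    handled = []

    def worker():
        while True:
            item = inbox.get()  # blocks here; the thread is idle until an event arrives
            if item is None:    # sentinel value: shut the worker down
                break
            handled.append(item * 2)  # short burst of placeholder work

    thread = threading.Thread(target=worker)
    thread.start()
    for event in events:
        inbox.put(event)
    inbox.put(None)
    thread.join()
    return handled


if __name__ == "__main__":
    print(run_event_driven_worker([1, 2, 3]))
```

While the worker is blocked in `inbox.get()` it is not runnable, so the scheduler has nothing to place on a core for it. That is the state most of the thousands of threads on a desktop are in at any given moment.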
 
Last edited:

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Yeah I guess the part about

really confused me. Even when divided into P core pool and E core pool both will still be context switching all the time. Windows already tries to keep important threads resident longer & throwing more e cores at it will only decrease the portion of CPU time spent context switching if it also keeps clusters awake.

But even idle it's 20,000+ context switches. More awake e-cores can help in theory but we're talking marginal improvements because many threads really want to race to sleep waiting for the next interrupt which makes them good candidates for the P cores.
View attachment 85918

I am convinced:
  1. When concerned with latency of interactive tasks instead of total throughput then 12P+16E would be superior to 8P+32E.
  2. The 8P+32E configuration, if it is real, is pretty much designed for benchmark wanking.
I edited it to "no more context switching on threads that matter to you". Is that more clear? I thought from that context it would have been clear, but I don't mind editing it more.

To repeat, there is and always will be context switching. Is that clear? But your priority threads in software that you are actively using will have dedicated cores that Windows isn't buggering up with context switching to do its background tasks.
 
Last edited:

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Actually, threads work a bit differently. Threads are event-driven. They become active on an event, run for a short burst & idle most of the time. Even threads that iterate tend to go idle waiting for a system event or a state change. If any thread runs continuously, it drives up CPU utilization to 100% for that core.

So, the OS actually keeps scheduling multiple threads to fewer cores as long as the core utilization is low. Keeping fewer cores active (and the rest parked) keeps the power usage in check. Only when there are many active threads fully utilizing the cores (like games, benchmarks, transcoding, compression, scanning, etc), the OS tries to power up more cores to sustain performance at the cost of efficiency.

When core utilization is low, context switching delay doesn't make much of a difference. Context switching starts hurting performance only when multiple cores are running multiple threads at full steam and context-switching delay becomes a bottleneck.
None of that addresses what I was saying. Yes, when idle, it doesn't matter. If your computer utilization is low, who cares? Throw everything at a dedicated power efficient tiny core (or tiny cluster of cores) and go with it since your computer is idle. Avoid powering up a power hungry performance core.

It is only when you are utilizing your computer to its fullest when events come along and interrupt your process to do their task. That is exactly what could be avoided with cores dedicated to those events leaving your cores uninterrupted.

Try the opposite thought process. Don't think of idle, since we probably both agree that having E cores handle the idle stuff is best. Suppose you have an 8* core computer, and your software is fully using all 8 cores. What do you want to happen when an event occurs? Me, I'd rather have a tiny core(s) dedicated to those random things, leaving my 8 cores doing their work uninterrupted. No hiccups, no dropped frames, no spinning mouse cursor as you wait in frustration, etc, because there was no stoppage for those other tasks.

* Replace "8" with any number you want.
 
Last edited:

Khato

Golden Member
Jul 15, 2001
1,206
251
136
The 8P+32E configuration, if it is real, is pretty much designed for benchmark wanking.
No question that it's more MT throughput than needed. Sure rendering, transcoding, and some other tasks will always be able to make use of more cores. But that's getting beyond the 'mainstream' realm which likely wouldn't notice any difference between a 4P+8E configuration and 8P.

Another way to put it. Intel learned their lesson with respect to core counts - it's better to have the 'benchmark wanking' product rather than let the competition run with it.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,842
305
126
No question that it's more MT throughput than needed. Sure rendering, transcoding, and some other tasks will always be able to make use of more cores. But that's getting beyond the 'mainstream' realm which likely wouldn't notice any difference between a 4P+8E configuration and 8P.

Another way to put it. Intel learned their lesson with respect to core counts - it's better to have the 'benchmark wanking' product rather than let the competition run with it.

With regards to the 'mainstream' realm, why only consider MT performance and not also ST performance? The "average consumer" currently does not need more performance than what 13900K or 7950X provides, regardless if it's ST or MT performance.

Whoever needs more performance is a power user, and power users are the ones in scope for the top-of-the-line desktop CPUs. Power users will need more MT performance, and more MT performance is best provided using E-cores: better perf/watt, lower cost, and smaller die area.
 
Reactions: SiliconFly

SiliconFly

Golden Member
Mar 10, 2023
1,062
548
96
It is only when you are utilizing your computer to its fullest...

Try the opposite thought process. Don't think of idle, since we probably both agree that having E cores handle the idle stuff is best. Suppose you have an 8* core computer, and your software is fully using all 8 cores. What do you want to happen when an event occurs? Me, I'd rather have a tiny core(s) dedicated to those random things, leaving my 8 cores doing their work uninterrupted. No hiccups, no dropped frames, no spinning mouse cursor as you wait in frustration, etc, because there was no stoppage for those other tasks.
Sure. Just thinking out of the box. Let us assume that we pin some preferred threads to some preferred cores. Even then, when the other cores are running full and there are other threads waiting in the queue, the OS will still try to schedule (preempt) some of those other threads to the preferred cores.

Even if we try to pin the threads manually by programming them directly or use apps for core-pinning, the performance gains won't be that significant as the OS scheduler usually takes care of all this by itself very efficiently (Windows, Linux, etc).
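
For reference, manual pinning from code can look like the sketch below. It assumes Linux, where Python exposes `os.sched_getaffinity` and `os.sched_setaffinity`; the helper name and the choice of CPUs 0 and 1 are arbitrary examples, and on platforms without those calls the helper simply returns `None`.

```python
import os


def pin_current_process(preferred_cpus):
    """Restrict this process to the preferred CPUs that are actually available.

    Returns the resulting affinity set, or None where the OS calls are missing.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. Windows or macOS builds of Python
    allowed = os.sched_getaffinity(0)       # CPUs this process may currently use
    target = allowed & set(preferred_cpus)  # never request a CPU we are not allowed
    if not target:
        target = allowed                    # fall back to the existing affinity
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # CPUs 0 and 1 are arbitrary example choices.
    print(pin_current_process({0, 1}))
```

Note that this only limits where the calling process may run. It does not reserve those cores, so the scheduler is still free to put other threads on them, which is the preemption being argued about here.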
 
Last edited:

dullard

Elite Member
May 21, 2001
25,126
3,515
126
Sure. Just thinking out of the box. Let us assume that we pin some preferred threads to some preferred cores. Even then, when the other cores are running full and there are other threads waiting in the queue, the OS will still try to schedule (preempt) some of those other threads to the preferred cores.
I think that is where we differ. When there are plentiful E cores, there would be no need to preempt the preferred cores. We could finally have dedicated cores to our important programs with no* context switching, no preemption, etc. of those important tasks.


* Yes, gdansk, there would still be context switching of the unimportant tasks on unimportant cores.

Even if we try to pin the threads manually by programming them directly or use apps for core-pinning, the performance gains won't be that significant as the OS scheduler usually takes care of all this by itself very efficiently (Windows, Linux, etc).
"Usually" is not sufficient in my experience. I see all kinds of stutters and delays when background tasks kick in. Maybe it is fine with your tasks. But certainly not for me.
 

gdansk

Platinum Member
Feb 8, 2011
2,212
2,836
136
I guess we'll have to agree to disagree on that. 16e cores is plenty of cores for a low priority thread pool. Context switching on the P cores is inevitable and more common the fewer P cores you have regardless of 16 or 32e cores. The Apple approach is better for anyone who cares about a responsive, low latency system than the rumored 32e Intel approach.

The reason a company would go beyond that is throughput. Which is an odd choice for consumer parts. I wouldn't have a problem with the option if they also made a 12P+16E version but I doubt Intel will do both.
 
Last edited: