Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
679
559
106






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which is Intel's first to use EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024 - that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Arrow Lake Refresh (N3B) | Lunar Lake | Panther Lake |
| --- | --- | --- | --- | --- | --- | --- |
| Platform | Mobile H/U only | Desktop only | Desktop & Mobile H&HX | Desktop only | Mobile U only | Mobile H |
| Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Q1 2025 ? | Desktop: Q4 2024, H&HX: Q1 2025 | Q4 2025 ? | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 8P + 32E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 24 MB ? | 36 MB ? | ? | 8 MB | ? |
| tCPU (mm²) | 66.48 | | | | | |
| tGPU (mm²) | 44.45 | | | | | |
| SoC (mm²) | 96.77 | | | | | |
| IOE (mm²) | 44.45 | | | | | |
| Total (mm²) | 252.15 | | | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU tile and the Foveros base tile. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

 

Attachments

  • PantherLake.png
  • LNL.png

dullard

Elite Member
May 21, 2001
25,126
3,516
126
I was thinking it's either because there's a massive security hole in HT that Intel doesn't want to admit to right now or they are simply cutting costs by not doing the validation.
What is the latest on the performance impact of the Spectre/Meltdown mitigations? I don't follow that thread regularly. At least for a while at the start, there was a pretty big performance loss from the mitigations when hyperthreading was on.

Hyperthreading is realistically more of a 10% performance boost (ranging roughly from +30% in a few benchmarks to -10% in others, with the typical average close to +10%). And that was before the mitigation performance losses. So how does the current Spectre/Meltdown mitigation performance loss compare to the potential hyperthreading gain? Potentially a wash?
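For a rough sense of that arithmetic, here's a minimal sketch; the ~10% HT uplift is the figure above, while the mitigation penalties are hypothetical placeholders rather than measured numbers:

```python
# Back-of-the-envelope: does the HT gain survive the mitigation cost?
# The HT uplift is the ~10% average cited above; the mitigation penalties
# are illustrative placeholders, not benchmark results.

ht_gain = 0.10  # ~10% average MT uplift from hyperthreading

for mitigation_loss in (0.03, 0.08, 0.12):  # hypothetical penalties with HT enabled
    net = (1 + ht_gain) * (1 - mitigation_loss) - 1
    print(f"mitigation loss {mitigation_loss:.0%} -> net HT benefit {net:+.1%}")

# As the mitigation penalty approaches the HT uplift, the net benefit
# approaches zero -- i.e. "potentially a wash".
```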
 
Mar 8, 2024
37
110
66
Chalking up the loss of HT to Apple alone is pretty silly; I think a more reasonable explanation has something to do with Intel's utter lack of success at reining in power budgets in a way that scales with performance. If you're a company with a history of utterly catastrophic duds, you're on the back foot against AMD, and you NEED a successful generational launch to stop the coming tide of OEM mutiny - so you axe the thing that makes it harder to hit performance goals. I'm not sure how it'll play out in marketing terms though (people generally like seeing big numbers, because that's more bigger and better and gooder).
 

naukkis

Senior member
Jun 5, 2002
722
610
136
Big cores in hybrid designs are there to offer better thread performance. Using HT nullifies that, as HT splits a core's per-thread performance roughly in half. The only beneficial case for HT in those hybrid designs is massively parallelized loads where single-thread performance doesn't matter - and if power is limited, it's more beneficial to assign that power to efficiency cores anyway for better total performance. Intel was actually slow to drop HT; they should have done it as soon as they went to hybrid designs.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,506
3,651
96
Big cores in hybrid designs are there to offer better thread performance.
Those cores are also used in DC, where SMT loss hurts.
Nope. LNL actually competes with the likes of Snapdragon X series to fill the gap left out by ARL.
no? lol.
X Elite is higher power than LNL in like all cases.
For a lot higher nT perf but I digress.
 

Hulk

Diamond Member
Oct 9, 1999
4,269
2,089
136
Anyone have a technical understanding of how the cycles that were unused for the primary thread and diverted to the secondary logical thread are going to be utilized solely for the primary thread? Is the removal of HT just to reduce die area or is something being changed to minimize lost cycles during thread stalls?

I mean other than Apple doesn't do it.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,506
3,651
96
Is the removal of HT just to reduce die area or is something being changed to minimize lost cycles during thread stalls?
Less validation work, and you duplicate a few fewer structures.
SMT's area/power impact was overall negligible; you mostly pay in validation costs/time.
 

SiliconFly

Golden Member
Mar 10, 2023
1,062
548
96
...they should have done it as soon as they went to hybrid designs.
I totally agree with this assessment. They should have removed it in Alder Lake itself. Anyway, better late than never.

Anyone have a technical understanding of how the cycles that were unused for the primary thread and diverted to the secondary logical thread are going to be utilized solely for the primary thread? Is the removal of HT just to reduce die area or is something being changed to minimize lost cycles during thread stalls?

I mean other than Apple doesn't do it.
Reduced die space, cleaner design, fewer vulnerabilities, faster validation, higher ST, etc. HT is basically hardware-based thread context switching that's employed when the h/w scheduler (like Thread Director) can't find a free core to assign the thread to. When HT isn't available in the CPU, the OS scheduler itself executes a s/w-based context switch, which is slower but minimizes lost cycles. But since the new CPUs have tons of real cores, the s/w context-switching overhead is very minimal, which makes HT totally redundant in clients.
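To put a rough number on that s/w context-switch cost, here is a minimal lmbench-style sketch: two processes ping-pong a byte over a pair of pipes, so every round trip forces the scheduler to switch between them (Unix-only, and the result is an upper bound that includes pipe syscall overhead; absolute numbers vary a lot by machine and kernel):

```python
# Rough measurement of software context-switch cost: two processes ping-pong a
# byte over two pipes, forcing the scheduler to alternate between them.
import os, time

ROUNDS = 50_000

# Pin both processes to one CPU so they genuinely have to switch with each
# other (Linux-only; harmless to skip elsewhere).
try:
    os.sched_setaffinity(0, {0})
except (AttributeError, OSError):
    pass

r1, w1 = os.pipe()   # parent -> child
r2, w2 = os.pipe()   # child  -> parent

pid = os.fork()
if pid == 0:                        # child: echo every byte back
    for _ in range(ROUNDS):
        os.read(r1, 1)
        os.write(w2, b"x")
    os._exit(0)

start = time.perf_counter()
for _ in range(ROUNDS):
    os.write(w1, b"x")
    os.read(r2, 1)
elapsed = time.perf_counter() - start
os.waitpid(pid, 0)

# Each round trip contains at least two switches plus two pipe round trips.
print(f"~{elapsed / ROUNDS / 2 * 1e6:.1f} us per switch (upper bound)")
```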
 

naukkis

Senior member
Jun 5, 2002
722
610
136
Anyone have a technical understanding of how the cycles that were unused for the primary thread and diverted to the secondary logical thread are going to be utilized solely for the primary thread? Is the removal of HT just to reduce die area or is something being changed to minimize lost cycles during thread stalls?

I mean other than Apple doesn't do it.

Intel HT is symmetric multithreading: both threads are equal, and instruction fetch alternates between the two threads every clock cycle. There is no primary/secondary thread - both threads execute at a speed that is a bit more than half of what a single thread would get running alone on that core.
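A toy model of that alternating-fetch picture (purely illustrative; real SMT front-ends are far more dynamic than a strict round-robin, and the stall probability below is made up):

```python
# Toy model of symmetric SMT fetch: each cycle the core offers its one fetch
# slot to the threads in alternating priority order; a stalled thread passes
# the slot to the other. Purely illustrative, not a model of any real core.
import random

STALL_PROB = 0.2        # made-up chance that a thread can't use the slot this cycle
CYCLES = 1_000_000
random.seed(0)

def run(n_threads):
    retired = [0] * n_threads
    for cycle in range(CYCLES):
        # round-robin priority: the "first pick" rotates between threads
        for i in range(n_threads):
            t = (cycle + i) % n_threads
            if random.random() > STALL_PROB:    # thread has work ready
                retired[t] += 1
                break                           # only one fetch slot per cycle
    return retired

solo = run(1)
smt = run(2)
print("solo thread IPC:    %.2f" % (solo[0] / CYCLES))
print("SMT per-thread IPC: %s" % [round(r / CYCLES, 2) for r in smt])
print("SMT combined IPC:   %.2f" % (sum(smt) / CYCLES))
# Each SMT thread ends up a bit above half the solo rate, while the combined
# throughput is somewhat above the solo rate -- the usual SMT trade-off.
```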
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,506
3,651
96
Those cases where single-thread speed doesn't matter they should drop big cores and use just more e-cores. Actually Intel is doing it right now.
Atoms have a rather castrated feature set and middling perf in just a ton of workloads, and they won't replace mainline Xeon that way.
You still need big cores in many, many places. The loss of SMT hurts there.
 

dullard

Elite Member
May 21, 2001
25,126
3,516
126
Less validation work, and you duplicate a few fewer structures.
SMT's area/power impact was overall negligible; you mostly pay in validation costs/time.
Don't forget the non-negligible part: the cache. When two threads share a core's cache, each thread effectively gets half as much cache, and cache thrashing is much more likely. That means either significantly less performance per thread, or you need significantly more cache than you would otherwise (more area, more expense, and more cache latency). Hyperthreading can be a nice performance boost in some cases, but it comes with some significant drawbacks.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,506
3,651
96
Don't forget the non-negligible part: the cache. When two threads share a core's cache, each thread effectively gets half as much cache, and cache thrashing is much more likely. That means either significantly less performance per thread, or you need significantly more cache than you would otherwise (more area, more expense, and more cache latency). Hyperthreading can be a nice performance boost in some cases, but it comes with some significant drawbacks.
SMT-friendly workloads nuke your caches anyway (stuff like server-side Java and other JITs, etc.).
The real SMT drawbacks are security (the cloud guys are anal about that) and validation time.
 

SiliconFly

Golden Member
Mar 10, 2023
1,062
548
96
Don't forget the non-negligible part: the cache. When two threads share a core's cache, each thread effectively gets half as much cache, and cache thrashing is much more likely. That means either significantly less performance per thread, or you need significantly more cache than you would otherwise (more area, more expense, and more cache latency). Hyperthreading can be a nice performance boost in some cases, but it comes with some significant drawbacks.
Cache & TLB thrashing happens even on non-HT processors (and not just on x86). When an OS executes a s/w-based context switch, it has to invalidate the cache contents and TLB entries for the incoming thread. Even in cases where the cache doesn't have to be flushed (e.g. physically tagged caches), it still has to be repopulated with the incoming thread's working set. The penalty still exists in one form or another.
 
Jul 27, 2020
16,817
10,764
106
I have been thinking about the rumored removal of HT from ARL
While HT may seem like an unnecessary headache on desktop for Intel (and maybe AMD), in mobile CPUs Intel may keep HT alive for a few more years simply because it's the cheapest way to advertise more cores to consumers without incurring the significant area penalty of replacing the HT virtual cores with physical efficiency cores. I recently used a Core i5-1235U Dell Inspiron laptop, and its BIOS had no setting to turn off HT, which I found pretty weird. It's like Intel doesn't want the majority of its users working without HT.
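(For what it's worth, on Linux you can at least inspect and toggle SMT without any BIOS option, via the kernel's standard sysfs interface; a minimal sketch - reading works as any user, writing needs root:)

```python
# Inspect (and optionally toggle) SMT through the Linux sysfs interface.
from pathlib import Path

SMT = Path("/sys/devices/system/cpu/smt")

print("SMT active: ", (SMT / "active").read_text().strip())    # "1" or "0"
print("SMT control:", (SMT / "control").read_text().strip())   # on/off/forceoff/notsupported

# Which logical CPUs are HT siblings of cpu0:
siblings = Path("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list")
print("cpu0 siblings:", siblings.read_text().strip())

# To disable SMT at runtime (as root):
#   (SMT / "control").write_text("off")
```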

Another factor is core-occupancy determination by Intel Thread Director. Suppose Windows is using a P-core for something, so that core is "awake", and a lightweight thread needs to do something at the same time. Does the ITD wake up a sleeping efficiency core, or does it allocate the virtual HT core of the active P-core to that thread? I'm thinking the latter would be a more efficient use of the available resources, and it could even save time if the lightweight thread finishes its task in less time than it takes to context switch and wake up an efficiency core.
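(That trade-off, written out as a toy decision rule - entirely hypothetical: the names, costs and threshold here are made up for illustration and are not documented Thread Director behaviour:)

```python
# Toy model of the "use the HT sibling vs. wake an E-core" trade-off above.
# All numbers and the rule itself are made up for illustration only.

E_CORE_WAKE_US = 100.0     # assumed cost to wake a sleeping E-core
SIBLING_SLOWDOWN = 0.6     # assumed per-thread speed on a shared HT sibling (1.0 = full core)

def place_light_thread(est_runtime_us: float) -> str:
    """Pick a target for a short task when a P-core is busy but its HT sibling is idle."""
    time_on_sibling = est_runtime_us / SIBLING_SLOWDOWN
    time_on_e_core = E_CORE_WAKE_US + est_runtime_us
    return "HT sibling" if time_on_sibling < time_on_e_core else "wake E-core"

for est in (20, 100, 500, 5000):
    print(f"{est:>5} us task -> {place_light_thread(est)}")
# Very short tasks finish before an E-core is even awake, so the sibling wins;
# longer tasks amortize the wake-up cost and prefer the real core.
```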

Then there's the rumor about rentable units. Let's suppose they really help increase performance similar to or even better than HT. But because adjacent idle core resources are getting rented out, this will wake those cores up more often and thus power efficiency will take some hit. What if ARL desktop has rentable units and no HT while ARL mobile has HT and no rentable units? If the silicon area dedicated to enabling the rentable unit functionality is similar to HT's silicon area requirements, Intel could put both on the compute die and enable one or the other depending on their use case and targeted market.
 

SiliconFly

Golden Member
Mar 10, 2023
1,062
548
96
Then there's the rumor about rentable units. Let's suppose they really help increase performance similar to or even better than HT...
Far, far better than HT. But I don't think Rentable Units are ready yet - at least not for the next couple of years, for sure. This technology is so radical, disruptive and complex that Intel would have to completely rethink some of its foundational design elements. There were rumors that Nova Lake might feature RU, but after reading about the tech, I don't think Nova Lake is going to get it. And most importantly, RU flies in the face of logic and promises too much. At this point, I have to say RU is just pure smoke, fellas. Don't take it too seriously until Intel says otherwise in no uncertain terms.

I mean, instead of running x86 instructions on physical cores, RU runs virtual instructions on virtual RU cores (mapped 1-to-1) after translating the x86 instructions into virtual instructions using a translation layer. Sounds tedious and inefficient, but not really - most of it is actually doable today. The problem arises when the x86 core has to emulate a virtual RU core. A CPU with physical RU cores could do all this, and they even have a proof of concept to show for it. But an x86 core emulating an RU core at the hardware level is just absurd - an architectural nightmare. It sounds insane; it's like an x86 core casually emulating an ARM core at the same time! Yikes!

But hypothetically speaking - assuming Intel can get it working, that it works at its theoretical maximum, that the threads are one hundred percent RU-friendly and perfectly sliceable, and that all conditions are optimal - here is how it goes:

(1) When the CPU executes a single thread on a single core, the thread executes at the usual speed of 1 ST. This is normal. Nothing unusual about it.

(2) But when RU kicks in, it slices that one single thread into multiple pieces and executes those pieces simultaneously on multiple RU cores!!!!! Yikes!

That is, if a single thread is cut into 4 smaller pieces and executed on 4 different RU cores simultaneously, it runs at the speed of 4 ST (i.e. in 1/4 of the time), or a "400%" boost - pick whichever number you prefer. But this is the concept of RU.

(3) And there are limitations. For example, how do we "cut" or slice a single thread? All these years I didn't even know that was possible, because it's actually not that simple. Even if it were true, I still don't think it can be done in a generic way. Maybe on specific workloads, but definitely not on everything.
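(Even granting the premise, the usual Amdahl's-law arithmetic caps the gain at whatever fraction of the thread is actually sliceable; a quick illustration with made-up fractions:)

```python
# Amdahl's-law ceiling on "slice one thread across N cores".
# The sliceable fractions are made up; the point is the shape of the curve.

def speedup(sliceable: float, n_cores: int) -> float:
    """Serial part runs as-is, the sliceable part is split across n_cores."""
    return 1.0 / ((1.0 - sliceable) + sliceable / n_cores)

for frac in (0.50, 0.90, 0.99):
    row = ", ".join(f"{n} cores -> {speedup(frac, n):.2f}x" for n in (2, 4, 8))
    print(f"sliceable {frac:.0%}: {row}")

# Only a 100% sliceable thread reaches the 4x-on-4-cores figure above;
# at 90% sliceable, 4 cores top out around 3.1x.
```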

RU promises too much, people. Hence, I don't think it's even real. Too many things just don't add up. Probably just pure smoke; a rumor that's somehow gone mainstream. I'm definitely not going to believe in it until I see something real.
 

Doug S

Platinum Member
Feb 8, 2020
2,320
3,678
136
The reason Intel did HT was to increase MT throughput, they were pretty clear about that when they introduced it. They don't need that anymore with their E cores.

Look at it this way. MT throughput is always going to be power limited - you can't run every core at its max frequency in a CPU with a lot of cores. So you (or rather Intel's chip designers) have to ask yourself, where do I get the best increase in performance for each additional watt of power I can pump into the chip?

I'll bet Intel's designers did the math/simulations/benchmarks and determined that if they disabled HT and used the power saved by that to spin up a few more E cores, they got better throughput. What's more, it wouldn't suffer from the vagaries of HT performance where on average it helps but in the wide world of MT workloads there are some where it helps more and some where it actually HURTS. One nice advantage of an extra E core is that it is almost impossible to come up with a benchmark where that will hurt. Maybe it won't help (i.e. you're maxing out memory bandwidth) but you won't see the benchmarks where it makes things worse like you do with HT.
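A back-of-the-envelope version of that trade-off, with purely hypothetical numbers chosen only to show the shape of the argument (none of these figures come from Intel):

```python
# Hypothetical comparison: keep HT on the P-cores vs. spend the same power
# budget on extra E-cores. Every number below is made up for illustration.

P_CORES = 8
P_CORE_PERF = 1.00        # MT throughput of one P-core at its MT operating point
HT_UPLIFT = 0.10          # assumed ~10% extra MT throughput per P-core with HT on
HT_POWER_COST_W = 1.0     # assumed extra watts per P-core when its second thread is busy

E_CORE_PERF = 0.55        # assumed throughput of one E-core at this power level
E_CORE_POWER_W = 2.0      # assumed watts per additional E-core

budget_w = P_CORES * HT_POWER_COST_W          # power freed by turning HT off

ht_gain     = P_CORES * P_CORE_PERF * HT_UPLIFT
e_core_gain = (budget_w / E_CORE_POWER_W) * E_CORE_PERF

print(f"power budget in play:             {budget_w:.0f} W")
print(f"MT throughput gained from HT:     {ht_gain:.2f}")
print(f"MT throughput from extra E-cores: {e_core_gain:.2f}")
# With these made-up numbers the extra E-cores win, and (unlike HT) they never
# make a workload slower; flip the assumptions and HT can win instead. That is
# the kind of math the designers would have run.
```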
 

Saylick

Diamond Member
Sep 10, 2012
3,217
6,585
136

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,276
106

Probably something similar to this. We called it "reverse Hyperthreading".

That inverse hyperthreading is wild stuff.

If Intel can get that working, they'll become the undisputed king of Single Thread performance.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,804
3,266
136
Those cases where single-thread speed doesn't matter they should drop big cores and use just more e-cores. Actually Intel is doing it right now.
Except for all those workloads that have high latency but also need lots of brawn, like relational DBs or generally anything in the server space that is dealing with I/O.

I can't wait for 1000s of terribly performing Kubernetes containers running on 1000s of average-performing cores. But I'm cloud scale!!!!! 2024 IT is lit.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,828
3,659
136
Except for all those workloads that have high latency but also need lots of brawn, like relational DBs or generally anything in the server space that is dealing with I/O.

I can't wait for 1000s of terribly performing Kubernetes containers running on 1000s of average-performing cores. But I'm cloud scale!!!!! 2024 IT is lit.
Isn't Skymont targeting Golden Cove levels of performance?

Hardly average-performing if that is indeed the case.
 