Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Page 296

Tigerick

Senior member
Apr 1, 2022






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, Intel's first node to use EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will follow MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

Spec | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Arrow Lake Refresh (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U Only | Desktop Only | Desktop & Mobile H&HX | Desktop Only | Mobile U Only | Mobile H
Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Q1 2025 ? | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2025 ? | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 8P + 32E | 4P + 4E | 4P + 8E
LLC | 24 MB | 24 MB ? | 36 MB ? | ? | 8 MB | ?
tCPU (mm²) | 66.48 | | | | |
tGPU (mm²) | 44.45 | | | | |
SoC (mm²) | 96.77 | | | | |
IOE (mm²) | 44.45 | | | | |
Total (mm²) | 252.15 | | | | |
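As a quick consistency check, the Meteor Lake tile sizes listed above add up to the stated total. A minimal Python sketch, assuming the values are die areas in mm² (the unit is an assumption); the helper names are illustrative only:

```python
# Meteor Lake tile sizes from the table above.
# Assumption: values are die areas in mm^2.
MTL_TILES_MM2 = {
    "tCPU": 66.48,
    "tGPU": 44.45,
    "SoC": 96.77,
    "IOE": 44.45,
}


def total_area(tiles: dict[str, float]) -> float:
    """Sum of all tile areas, rounded to two decimals like the table."""
    return round(sum(tiles.values()), 2)


def area_share(tiles: dict[str, float]) -> dict[str, float]:
    """Percentage of the summed area taken by each tile (one decimal)."""
    total = sum(tiles.values())
    return {name: round(100 * area / total, 1) for name, area in tiles.items()}


if __name__ == "__main__":
    print(total_area(MTL_TILES_MM2))  # 252.15
    print(area_share(MTL_TILES_MM2))
```

By this breakdown the SoC tile, not the compute tile, is the largest piece of the package.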



Intel Core Ultra 100 - Meteor Lake



As reported by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles, which means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

 


AMDK11

Senior member
Jul 15, 2019
You took my words about the scheduler too literally. Never mind.

Look carefully again. You can clearly see at the beginning, where the FPU part is, 4 execution ports and then 6 execution ports, which gives a total of 10. Right? The remaining 8 are from SD and AGU.
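To make the port count in the post above explicit: the reading is 4 + 6 = 10 execution ports for math plus 8 for AGU and store data (SD), i.e. 18 in total. A small Python tally; the group labels are hypothetical, following the speculation elsewhere in this thread (4 vector, 6 integer), and are not Intel documentation:

```python
# Hypothetical Lion Cove port groups, per one reading of the Lunar Lake diagram.
# Labels and counts are assumptions from this forum thread, not Intel data.
MEMORY_GROUP = "AGU + store data"

LION_COVE_READING = {
    "vector/FP": 4,      # first group of execution ports in the diagram
    "integer": 6,        # second group of execution ports
    MEMORY_GROUP: 8,     # "the remaining 8 are from SD and AGU"
}


def math_ports(layout: dict[str, int]) -> int:
    """Execution ports excluding address-generation and store-data ports."""
    return sum(n for group, n in layout.items() if group != MEMORY_GROUP)


def total_ports(layout: dict[str, int]) -> int:
    """All ports in the layout."""
    return sum(layout.values())


print(math_ports(LION_COVE_READING))   # 10
print(total_ports(LION_COVE_READING))  # 18
```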
 

Cheesecake16

Junior Member
Aug 5, 2020
You took my words about the scheduler too literally. Never mind.

Look carefully again. You can clearly see at the beginning, where the FPU part is, 4 execution ports and then 6 execution ports, which gives a total of 10. Right? The remaining 8 are from SD and AGU.
Except that's not how Intel sets up their math scheduler...
Here is what Golden Cove's Math Scheduler looks like.....
Notice how the FP ALUs are on the same ports as the Integer ALUs..... That's what I expect of Lion Cove.....

 

Cheesecake16

Junior Member
Aug 5, 2020
C&C is continuing their MTL investigation, this time with an article on the NPU:

In summary, it looks like the NPU is of limited use because of the data types it supports (or rather, doesn't support). And in the cases it does support, the NPU offers lower power but not necessarily higher performance. The author seems to think the iGPU is the better approach because it's more powerful and more flexible, even at the cost of higher power, since there are many situations these days where you can plug in a laptop. That is, until Intel develops an NPU that covers more use cases with higher performance while still using less power.
Yeah..... Getting the NPU to work was a pain and in the end it ended up just being faster to run stuff on the iGPU....
Maybe with LNL, STX, and SDXE that will change and we will have to test it when those CPUs are available, but for right now we just didn't see the point of the NPUs in MTL or in PHX/HWK..... they just aren't fast or efficient enough to justify the headaches of programming them......
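The lower-power-but-not-faster point comes down to energy per job: energy is average power times run time, so a slower, lower-power accelerator can still use more energy. A minimal Python sketch; the power and time figures are made up for illustration and are not C&C's measurements:

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    avg_power_w: float      # average power while running the job (watts)
    seconds_per_job: float  # wall-clock time for one job (seconds)

    def energy_j(self) -> float:
        """Energy for one job in joules: power x time (idle power ignored)."""
        return self.avg_power_w * self.seconds_per_job


def more_efficient(a: Accelerator, b: Accelerator) -> str:
    """Name of the device that spends less energy on the same job."""
    return a.name if a.energy_j() <= b.energy_j() else b.name


# Hypothetical numbers only.
npu = Accelerator("NPU", avg_power_w=5.0, seconds_per_job=4.0)     # 20 J
igpu = Accelerator("iGPU", avg_power_w=15.0, seconds_per_job=1.0)  # 15 J
print(more_efficient(npu, igpu))  # iGPU
```

With these numbers the NPU draws a third of the power but takes four times as long, so the iGPU wins on energy; halve the NPU's run time and the result flips.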
 

Hulk

Diamond Member
Oct 9, 1999
Adding another set of cores to the mix does not change the outcome; the work done by the P cores can still be optimized. Suppose you run a highly parallelized workload on an 8P+16E / 24T CPU, i.e. with SMT disabled. The task is split between the P and E cores based on predetermined ratios, and the work done by the 8P/8T cores is finite. That same work can be done more efficiently by 8P/16T at lower clocks and lower voltage, as long as there's good scaling from 8T to 16T. The work done by the E cores is already accounted for.

I think SMT is the victim of a misunderstanding rooted in the power race of recent years. Yes, SMT will increase power and temps when allowed to push package power higher. However, when a sane power limit is enforced, SMT increases efficiency instead.
Theoretically yes, you are correct.

But in reality there are cases where this isn't the best option.

For example, with my 14900K I have HT turned off. I can hold 5.5/4.3 GHz (P/E) more stably and with lower temps than with HT on. While very few applications will slam all cores with HT on, when it happens, BIOS settings that are stable with HT off will cause a restart with HT on.

The tiny bit of performance lost by having HT off in those few applications is more than made up for by having a cooler, lower voltage, more stable rig with less chance of CPU degradation.

If I were simply tuning MT performance and efficiency, I would leave HT on and limit clocks as you noted. But tuning for that scenario in daily use reduces overall performance in the many apps that still rely heavily on ST performance.
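The efficiency argument quoted above can be made concrete with the first-order CMOS dynamic-power model, P = αCV²f, where α is switching activity, C capacitance, V voltage and f frequency. Since throughput is IPC × f, energy per unit of work is αCV²/IPC: frequency cancels, and lower clocks help only through the lower voltage they allow. A Python sketch; the SMT uplift, activity and voltage ratios are hypothetical inputs, and leakage is ignored:

```python
def relative_energy_per_work(throughput_gain: float,
                             activity_ratio: float,
                             voltage_ratio: float) -> float:
    """Energy per unit of work with SMT on, relative to SMT off.

    First-order model (leakage ignored):
        P = alpha * C * V**2 * f,   throughput = IPC * f
        energy / work = alpha * C * V**2 / IPC
    throughput_gain: per-clock throughput with SMT on / off (e.g. 1.25)
    activity_ratio:  switching activity with SMT on / off (e.g. 1.2)
    voltage_ratio:   core voltage with SMT on / off (e.g. 0.9)
    """
    return activity_ratio * voltage_ratio ** 2 / throughput_gain


def clock_ratio_for_same_work(throughput_gain: float) -> float:
    """Clock needed with SMT on to match the SMT-off throughput."""
    return 1.0 / throughput_gain


# Hypothetical: SMT adds 25% per-clock throughput and 20% switching activity.
same_voltage = relative_energy_per_work(1.25, 1.2, 1.0)   # ~0.96
lower_voltage = relative_energy_per_work(1.25, 1.2, 0.9)  # ~0.78
print(clock_ratio_for_same_work(1.25))  # 0.8
print(round(same_voltage, 3), round(lower_voltage, 3))
```

In this toy case, running 16 threads at 80% of the clock does the same work for about 4% less energy at unchanged voltage, and about 22% less if the lower clock permits 10% less voltage. Whether a real chip gets there depends on its V/f curve and SMT scaling, which is the practical caveat raised in the reply.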
 

Hulk

Diamond Member
Oct 9, 1999
Raptor Cove is 40-45% ahead overall; the integer gap is smaller and the FP gap is large. It's something like 25% Int and 50-60% FP.

"Raptormont" gets a 1-3% gain while Crestmont gets a 4-6% gain. So a 30% gain with Skymont, as with its Atom-based predecessors, gets us to the "aiming for ADL" claim on Twitter. I think SKT will be able to reach Golden Cove, similar to GMT reaching Skylake.

That means 10-15% faster than Golden Cove in Int while being 10-15% slower in FP. Consequently, Sierra Glen (which is Crestmont without the 6-wide retire/allocate, IOTW Gracemont) is on par with Skylake to 10-15% faster per core, and is an excellent cloud core.

If we extrapolate that to Darkmont-based Clearwater Forest, you essentially have an 18A chip with 144-288 cores, each better than Golden Cove.

On a side note, I speculate that they aren't backing down on clocks with Skymont, hence the greater-than-expected core size, while they are for Lion Cove.

Good analysis there. I know from reading your posts that you have a deep understanding of this topic so I appreciate the reply.

That type of "mont" IPC increase is mouth-watering...
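The extrapolation in the quoted post is a chain of compounded percentages. A Python sketch to check it, under stated assumptions: the baseline is a Gracemont-class E-core = 1.0, Golden Cove/Raptor Cove leads it by 25% in Int and 55% in FP (midpoint of 50-60%), and Crestmont (+5%, midpoint of 4-6%) is followed by Skymont (+30%). These inputs are the poster's estimates plus midpoint choices, not measurements:

```python
# Compounding per-generation IPC gains. All inputs are assumptions taken from
# the forum post (with midpoints chosen for ranges), not measured data.


def compound(*gains: float) -> float:
    """Multiply successive relative gains, e.g. 0.05 and 0.30 -> 1.365."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total


def relative_to(core_ipc: float, reference_ipc: float) -> float:
    """Percent difference of core vs reference (positive = faster)."""
    return 100.0 * (core_ipc / reference_ipc - 1.0)


# Baseline: Gracemont-class E-core = 1.0.
GOLDEN_COVE_INT = 1.25   # ~25% ahead in Int
GOLDEN_COVE_FP = 1.55    # ~50-60% ahead in FP, midpoint

skymont = compound(0.05, 0.30)  # Crestmont +5%, then Skymont +30%
print(round(skymont, 3))                                # 1.365
print(round(relative_to(skymont, GOLDEN_COVE_INT), 1))  # 9.2
print(round(relative_to(skymont, GOLDEN_COVE_FP), 1))   # -11.9
```

With midpoint inputs, Skymont lands about 9% ahead of Golden Cove in Int and about 12% behind in FP, close to the ranges given in the post; varying the inputs across their stated ranges shows how sensitive the conclusion is.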
 

mikk

Diamond Member
May 15, 2012
Edit:
It seems that Skymont has a 3x 3-Way decoder (Gracemont and Crestmont 2x 3-Way).

Not too surprising, given that Raichu said so 6 months ago. And yes, it does look like there are 3 decoders in the LNL picture; I can see it.

It is based on three 3-way decoder clusters, and the prediction bandwidth appears to have improved substantially (more than 2X).
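Why three 3-wide clusters matter, and why prediction bandwidth has to grow with them: in a clustered front end each cluster decodes a different fetch block in parallel, so the predictor must supply one block per cluster. A simplified Python model, an assumption rather than Intel's documented behavior: blocks end at predicted-taken branches, are handed to clusters round-robin, and each cluster decodes up to 3 instructions per cycle:

```python
import math


def decode_cycles(block_lengths: list[int], clusters: int, width: int = 3) -> int:
    """Cycles to decode a sequence of fetch blocks with clustered decoders.

    Simplified model (assumption):
    - each fetch block is handed to the next cluster in round-robin order;
    - a cluster decodes up to `width` instructions per cycle from one block;
    - clusters run in parallel and the predictor always keeps them fed.
    """
    busy = [0] * clusters
    for i, length in enumerate(block_lengths):
        busy[i % clusters] += math.ceil(length / width)
    return max(busy)


blocks = [6, 6, 6, 6, 6, 6]
print(decode_cycles(blocks, clusters=2))  # 6  (2x 3-way, Gracemont/Crestmont)
print(decode_cycles(blocks, clusters=3))  # 4  (3x 3-way, Skymont)
```

For 36 instructions in six equal blocks, that is 6 versus 9 instructions per cycle, the peak widths of the 2x and 3x 3-way layouts. Short or uneven blocks lower the benefit in this model.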
 

AMDK11

Senior member
Jul 15, 2019
Except that's not how Intel sets up their math scheduler...
Here is what Golden Cove's Math Scheduler looks like.....
Notice how the FP ALUs are on the same ports as the Integer ALUs..... That's what I expect of Lion Cove.....

I know how Intel, up to Redwood Cove, uses a shared scheduler for ALU and FP.

In the Lunar Lake diagram, the scheduler in the Lion Cove core appears to be separate, which may be a big change from what is currently used: FP units and ALU units have separate ports. Alternatively, the FP part also has ALUs.

Either way, the ALU ports and FP ports give a total of 10 ports, as you can see in the diagram. Unless you think it's fake.
 

Cheesecake16

Junior Member
Aug 5, 2020
I know how Intel, up to Redwood Cove, uses a shared scheduler for ALU and FP.

In the Lunar Lake diagram, the scheduler in the Lion Cove core appears to be separate, which may be a big change from what is currently used: FP units and ALU units have separate ports. Alternatively, the FP part also has ALUs.

Either way, the ALU ports and FP ports give a total of 10 ports, as you can see in the diagram. Unless you think it's fake.
Like I said, I think you are reading too much into it.....

At this point we don't know enough about what Lion Cove (or Skymont for that matter) actually looks like to tell whether the image is showing anything notable.... Sure, there are folks like Raichu on Twitter who claim certain things, but there is no "hard" evidence like GCC patches, LLVM patches, perf patches, or MSR documentation out yet, other than the ISA manual that Intel puts out, which isn't that helpful here (other than to imply that Skymont doesn't have AVX10, but that's a different discussion)........

But if I were to speculate based on the assumption that there are in fact 10 math ports split between 6 integer ports and 4 vector ports, then I would assume that the unified math scheduler is no more and that they have split it into a 6-port scheduler for integer operations and a 4-port scheduler for vector operations, while keeping the individual load and store schedulers........

Which is starting to look a lot like that fake Zen 5 slide MLID claimed was real ironically enough....
Except here it would be a single vector scheduler instead of 2 vector schedulers in the slide and a split load and store scheduler instead of a unified AGU scheduler in the slide........
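The difference between a unified math scheduler and the split integer/vector schedulers speculated above is which ports an op is allowed to use. A toy Python issue model to illustrate it; both port layouts are hypothetical simplifications (the shared one puts vector units on 3 of 5 integer ports, loosely after the Golden Cove arrangement described earlier in the thread; the split one is the speculated 6 integer + 4 vector). There are no dependencies or latencies, and the split layout is also wider, so its cycle counts reflect width as well as the split:

```python
def issue_cycles(ops: list[str], ports: list[set[str]]) -> int:
    """Cycles needed to issue `ops` when each port accepts one op per cycle.

    Toy model: no dependencies or latencies; each cycle, every port takes
    the oldest waiting op whose kind it can execute.
    """
    pending = list(ops)
    cycles = 0
    while pending:
        issued = 0
        for accepted in ports:
            for i, kind in enumerate(pending):
                if kind in accepted:
                    del pending[i]  # safe: we stop iterating right after
                    issued += 1
                    break
        if issued == 0:
            raise ValueError("some ops cannot issue on any port")
        cycles += 1
    return cycles


# Hypothetical layouts.
SHARED = [{"int", "vec"}] * 3 + [{"int"}] * 2  # vector units share int ports
SPLIT = [{"int"}] * 6 + [{"vec"}] * 4          # separate int and vector ports

mixed = ["vec", "int"] * 6
print(issue_cycles(["int"] * 12, SHARED), issue_cycles(["int"] * 12, SPLIT))  # 3 2
print(issue_cycles(mixed, SHARED), issue_cycles(mixed, SPLIT))                # 3 2
```

In the split layout the integer and vector streams no longer compete for the same ports, but the vector ports sit idle in integer-only code, so any benefit depends on the instruction mix.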
 

AMDK11

Senior member
Jul 15, 2019
The diagram most likely comes from Intel. You can clearly see 10 ports (4+6).

You can also see three blocks under AGU+SD. It looks to me like the L1-D and an L2 divided into 512KB + 2.5MB.

Where the uop cache and decoders send uops, I see 24 entries. Golden Cove has 12 items, including 6 decoders and 8 uop caches.

Skymont also appears to have fewer execution ports than Gracemont and Crestmont.
 

coercitiv

Diamond Member
Jan 24, 2014
For example, with my 14900K I have HT turned off. I can hold 5.5/4.3 GHz (P/E) more stably and with lower temps than with HT on. While very few applications will slam all cores with HT on, when it happens, BIOS settings that are stable with HT off will cause a restart with HT on.

The tiny bit of performance lost by having HT off in those few applications is more than made up for by having a cooler, lower voltage, more stable rig with less chance of CPU degradation.
Enforce temperature limits, enforce power limits, enforce current limits. My argument is that you can have your CPU run at lower clocks and get both more performance and better stability. If you want a more stable and longer-lasting system, lower your max temp from 100C to something like 85-95C. The same applies to power: find the power target that suits your config. This way clocks will push to the max under light loads, pull back under MT loads, and pull back even further when SMT is delivering gains.

It's very weird to me to see a system pushed to the limit of stability and then have SMT blamed for tipping it over the edge. What are you doing up there in the first place?
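The behavior described above, with clocks at max under light loads and pulling back under MT loads, falls out of a power limit. A toy Python governor, assuming per-core power scales as k·f³ (voltage roughly proportional to frequency, so V²f ~ f³) and ignoring uncore, leakage and temperature; the 5.5 GHz cap and k = 0.2 W/GHz³ are hypothetical values:

```python
def governed_clock_ghz(active_cores: int, power_limit_w: float,
                       f_max_ghz: float = 5.5,
                       k_w_per_ghz3: float = 0.2) -> float:
    """All-core clock that keeps modelled core power within the limit.

    Toy model: per-core power = k * f**3. The clock is the lower of the
    frequency cap and the frequency at which the power limit is reached.
    """
    if active_cores < 1:
        raise ValueError("need at least one active core")
    f_at_limit = (power_limit_w / (active_cores * k_w_per_ghz3)) ** (1 / 3)
    return min(f_max_ghz, f_at_limit)


for cores in (1, 2, 4, 8):
    print(cores, round(governed_clock_ghz(cores, power_limit_w=125.0), 2))
```

In this model a temperature or current limit would simply be another cap, with the governor taking the lowest clock any limit allows.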
 

Hulk

Diamond Member
Oct 9, 1999
Enforce temperature limits, enforce power limits, enforce current limits. My argument is that you can have your CPU run at lower clocks and get both more performance and better stability. If you want a more stable and longer-lasting system, lower your max temp from 100C to something like 85-95C. The same applies to power: find the power target that suits your config. This way clocks will push to the max under light loads, pull back under MT loads, and pull back even further when SMT is delivering gains.

It's very weird to me to see a system pushed to the limit of stability and then have SMT blamed for tipping it over the edge. What are you doing up there in the first place?
My temps currently never go over 75C. I have tweaked quite a bit and this is where I ended up. I'm at a very safe voltage right now. Enabling HT would require more voltage even with the other limits you specified in place; otherwise the momentary voltage demand would reset or freeze the system. I learned about this over at Overclocking.net, where almost everybody turns off HT for the best daily performance.

It comes down to the fact that it's simply easier to tune the P's without HT on and then rely on the 16E's for MT. Trying to do both gets messy quickly. Handbrake is one app that will "break" the tune quickly with HT on when you are undervolted and tuned for efficiency.
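One way to picture why an undervolted tune that is stable with HT off can crash with HT on is the DC load line: die voltage sags as current rises, V_die = V_set - I * R_LL. A Python sketch with hypothetical numbers (the set voltage, load-line resistance, currents and minimum stable voltage are all assumptions; transient droop is ignored):

```python
def die_voltage(v_set: float, current_a: float, loadline_mohm: float) -> float:
    """DC load line: V_die = V_set - I * R_LL (R_LL given in milliohms)."""
    return v_set - current_a * loadline_mohm / 1000.0


def required_v_set(v_min_stable: float, current_a: float,
                   loadline_mohm: float) -> float:
    """Set voltage needed so the die stays at or above v_min_stable."""
    return v_min_stable + current_a * loadline_mohm / 1000.0


# Hypothetical tune: 1.30 V set, 1.1 mOhm load line, 1.10 V needed at the die.
V_SET, R_LL, V_MIN = 1.30, 1.1, 1.10
for label, amps in (("HT off", 160.0), ("HT on", 200.0)):
    v = die_voltage(V_SET, amps, R_LL)
    print(label, round(v, 3), "stable" if v >= V_MIN else "unstable")
print(round(required_v_set(V_MIN, 200.0, R_LL), 3))  # 1.32
```

Here the extra 40 A with HT on drops the die to 1.08 V, below the assumed 1.10 V floor, so the same tune would need about 1.32 V set to stay stable, which is the kind of extra voltage the post says HT would require.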
 

Henry swagger

Senior member
Feb 9, 2022
Raptor Cove is 40-45% ahead overall; the integer gap is smaller and the FP gap is large. It's something like 25% Int and 50-60% FP.

"Raptormont" gets a 1-3% gain while Crestmont gets a 4-6% gain. So a 30% gain with Skymont, as with its Atom-based predecessors, gets us to the "aiming for ADL" claim on Twitter. I think SKT will be able to reach Golden Cove, similar to GMT reaching Skylake.

That means 10-15% faster than Golden Cove in Int while being 10-15% slower in FP. Consequently, Sierra Glen (which is Crestmont without the 6-wide retire/allocate, IOTW Gracemont) is on par with Skylake to 10-15% faster per core, and is an excellent cloud core.

If we extrapolate that to Darkmont-based Clearwater Forest, you essentially have an 18A chip with 144-288 cores, each better than Golden Cove.

On a side note, I speculate that they aren't backing down on clocks with Skymont, hence the greater-than-expected core size, while they are for Lion Cove.
Clock speed will be key
 