Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022






As Hot Chips 34 starts this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|
| Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H |
| Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Desktop: Q4 2024, H&HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | |
| tGPU (mm²) | 44.45 | | | |
| SoC (mm²) | 96.77 | | | |
| IOE (mm²) | 44.45 | | | |
| Total (mm²) | 252.15 | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)

ondma

Diamond Member
Mar 18, 2018
My comment was directed specifically at the gen-on-gen performance difference between vanilla non-X3D Zen5 and vanilla non-X3D Zen6. There are only three cases where I would expect an X3D part to be slower in ST performance than its predecessor or its non-X3D sibling:
1. A notable peak clock speed deficit, largely gone with Zen5.
2. Thermal throttling due to heavy MT loads running concurrently, or poor cooling leading to heat soak. The vanilla part should generate slightly less thermal load and should maintain slightly higher clocks.
3. A weird corner case that exposes the minor latency hit the 3D cache causes.

My argument for Zen6 is that, if the rumors are true, the 12-core CCX will have 48MB of L3 cache at a latency comparable to the 8-core 32MB L3 CCX in Zen5. The 50% larger L3 would theoretically be available in a pure ST scenario, helping any apps that depend on it. It should also be less affected by cache pollution, as the larger cache has more room to tolerate it. Add in the expected 10% IPC improvement from the rumored slide and it should be able to best Arrow Lake too.
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
 

Kepler_L2

Senior member
Sep 6, 2020
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
You can look at Zen2 vs Zen3: both had 32MB of L3, but on Zen2 only 16MB was available to each core due to the split CCX design.
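A toy calculation of that point (my own illustration; only the 32MB/16MB figures come from the post above):

```python
# Toy model: how much L3 a single core can actually allocate into,
# for a unified CCX vs a split-CCX design. Illustration only.

def l3_per_thread(total_l3_mb: float, ccx_count: int) -> float:
    """A core can only fill its own CCX's slice of the total L3."""
    return total_l3_mb / ccx_count

zen2_like = l3_per_thread(32, 2)  # split CCX: one core sees 16 MB
zen3_like = l3_per_thread(32, 1)  # unified CCX: one core sees 32 MB
print(zen2_like, zen3_like)
```

Same total cache on the die, but the unified layout doubles what any one thread can use.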
 

adroc_thurston

Diamond Member
Jul 2, 2023
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
You want more cache in general for 1T or gaming, and more cache per core for anything nT.
Venice-D goes to 4MB of L3 per core despite a generational membw bump for a reason.
 

DavidC1

Golden Member
Dec 29, 2023
Why is it not? That's a 1MB increase for 1 cycle; Skymont is 19 cycles for 4MB L2.
Latency is also affected by design choices, so you can't compare 1:1 with Skymont, which is lower power and has a cache shared by 4 cores.

A 1 cycle increase for a mere 33% capacity increase is nothing good. Even if latency stayed the same, I wouldn't call it impressive, and even against Skymont it's just a 1 cycle reduction. You'd think a "performance"-focused core in 2027 would be better than an E-core in 2025.

The last Intel core with an impressive cache structure was Sandy Bridge. It could overclock to 4.5GHz, the cache ran at the same clock as the core, and at 8MB capacity it had 25-cycle latency, despite being an L3 cache. I wonder how it would fare on 18A?
If only they weren't a bunch of idiots in the Intel DC GPU space, cancelling everything.
That's because they weren't selling. A lot of vendors were on board with mobile Arc GPUs until they found the perf/W was bad and the drivers were atrocious. The last famous Intel DC GPU was Ponte Vecchio, whose enormously complicated packaging made Lunar Lake's MoP complaints look like it added a penny to the BoM, and it was maybe 20% faster in corner-case scenarios.

The last JPR dGPU market share report showed Intel isn't even a blip on the radar now; they are at 0% according to it. They probably sold a few thousand to low tens of thousands of units. The best case is 0.49%, since the numbers are rounded down.
 

511

Platinum Member
Jul 12, 2024
Latency is also affected by design choices, so you can't compare 1:1 with Skymont, which is lower power and has a cache shared by 4 cores.

A 1 cycle increase for a mere 33% capacity increase is nothing good. Even if latency stayed the same, I wouldn't call it impressive, and even against Skymont it's just a 1 cycle reduction. You'd think a "performance"-focused core in 2027 would be better than an E-core in 2025.
It's good tbh; it's also shared between 2 cores. As for P-core vs E-core IPC, I would expect P and E cores to have similar IPC by H2 '26 when Nova Lake launches.
The last Intel core with an impressive cache structure was Sandy Bridge. It could overclock to 4.5GHz, the cache ran at the same clock as the core, and at 8MB capacity it had 25-cycle latency, despite being an L3 cache. I wonder how it would fare on 18A?
8MB at 25 cycles is pretty good. I wonder what the cycle count will be for NVL's L3; anything under 50 would be good imo.
That's because they weren't selling. A lot of vendors were on board with mobile Arc GPUs until they found the perf/W was bad and the drivers were atrocious. The last famous Intel DC GPU was Ponte Vecchio, whose enormously complicated packaging made Lunar Lake's MoP complaints look like it added a penny to the BoM, and it was maybe 20% faster in corner-case scenarios.
Not to mention Arc has been delayed so much.
The last JPR dGPU market share report showed Intel isn't even a blip on the radar now; they are at 0% according to it. They probably sold a few thousand to low tens of thousands of units. The best case is 0.49%, since the numbers are rounded down.
Well, maybe they had already shipped in Q4 '25 when they were at 1%, and shipments were low after that.
 

DavidC1

Golden Member
Dec 29, 2023
It's good tbh; it's also shared between 2 cores. As for P-core vs E-core IPC, I would expect P and E cores to have similar IPC by H2 '26 when Nova Lake launches.
In Sandy Bridge, it went from 41 cycles to 25 cycles, nearly a 40% reduction, while also consistently clocking much higher in the new Turbo mode.

They aren't losing money on Arc because of high BoM; that is nonsense. They are losing money on Arc because there's basically no volume. They could have a $50 BoM and it would still lose them money.
 

AcrosTinus

Member
Jun 23, 2024
Yeah, but not anymore going forward; the private alley is going away and 2 P-cores have to share 😂.


If only they weren't a bunch of idiots in the Intel DC GPU space, cancelling everything.


Why is it not? That's a 1MB increase for 1 cycle; Skymont is 19 cycles for 4MB L2.

Their heyday died with the 10nm delays lol.
I have a feeling this is the secret to how they were able to increase the P-core count. Instead of having a stop per P-core and per E-core cluster, 2 P-cores share a stop, and maybe the E-core cluster is now 8 cores big. This sounds more realistic to me than two compute dies with two separate ring buses, each having 12 stops.

It could also be a way to reduce the stops per ring to 8, essentially having 16 stops if two dies are really employed in Nova.
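For scale, the average hop count between two stops on a bidirectional ring grows with the stop count, which is why shaving stops matters; a back-of-envelope sketch (my own assumption of uniform traffic, not Intel data):

```python
# Average hop count from a stop to every other stop on a bidirectional
# ring, assuming uniform traffic. Back-of-envelope only; real fabrics
# add queuing and protocol overhead on top of raw hop distance.

def avg_hops(stops: int) -> float:
    # Distance to each other stop is the shorter way around the ring.
    return sum(min(d, stops - d) for d in range(1, stops)) / (stops - 1)

print(avg_hops(12))  # 12-stop ring: ~3.27 hops on average
print(avg_hops(8))   # 8-stop ring:  ~2.29 hops on average
```

Going from 12 stops to 8 cuts the average hop distance by roughly 30% in this simple model.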
 

511

Platinum Member
Jul 12, 2024
In Sandy Bridge, it went from 41 cycles to 25 cycles, nearly a 40% reduction, while also consistently clocking much higher in the new Turbo mode.
I didn't know that; that's an insane improvement lol.
They aren't losing money on Arc because of high BoM; that is nonsense. They are losing money on Arc because there's basically no volume. They could have a $50 BoM and it would still lose them money.
Yes, but I think the volume they are moving now is due to the prepayment they made for Arc.
I have a feeling this is the secret to how they were able to increase the P-core count. Instead of having a stop per P-core and per E-core cluster, 2 P-cores share a stop, and maybe the E-core cluster is now 8 cores big. This sounds more realistic to me than two compute dies with two separate ring buses, each having 12 stops.
Yes, though I doubt the 8E-core cluster; 12 -> 8 is a good amount of reduction for stops on the ring.
It could also be a way to reduce the stops per ring to 8, essentially having 16 stops if two dies are really employed in Nova.
Each die has a separate ring, and they are connected using some shared fabric.
 

DavidC1

Golden Member
Dec 29, 2023
It could also be a way to reduce the stops per ring to 8, essentially having 16 stops if two dies are really employed in Nova.
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or a rethought mesh; do something new. The 2011 Sandy Bridge design is showing its age.
I didn't know that; that's an insane improvement lol.
Yes, that is due to the ring, which was a well thought out and novel design. They've regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
 

Io Magnesso

Member
Jun 12, 2025
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or a rethought mesh; do something new. The 2011 Sandy Bridge design is showing its age.

Yes, that is due to the ring, which was a well thought out and novel design. They've regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
There are rumors that the NEX division will be given up, but I don't think they can let go of the network/WiFi business.
I think the dismantling of the NEX division will merely be a reshuffling of personnel within Intel.
 

AcrosTinus

Member
Jun 23, 2024
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or a rethought mesh; do something new. The 2011 Sandy Bridge design is showing its age.

Yes, that is due to the ring, which was a well thought out and novel design. They've regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
That is true. Intel introduced the mesh in HEDT, and benchmarks show that, if clocked high enough, the penalty compared to the ring is minimal while the scaling is vastly superior. Had they invested some time in a mainstream variant, the mesh could have been vastly more performant, but who knows....

AMD being on a mesh is news to me; that would explain the sub-20ns core-to-core latency within a CCD.
 

Doug S

Diamond Member
Feb 8, 2020
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.

It is the same cache per core only if you use all the cores.

In the world most of us occupy, our CPUs typically load only a few cores at a time, so you get more cache per core in those circumstances. But even if you're the outlier who often runs all cores at 100%, you aren't any worse off than before, and now you have 50% more cores for your outlier tasks.
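The "cache per active core" point can be sketched numerically (the rumored 48MB/12-core Zen6 and 32MB/8-core Zen5 figures come from the thread; the rest is my own illustration):

```python
# Shared L3 available per *active* core: at light load the bigger cache
# wins outright; only at full load does cache-per-core even out.
# Illustrative numbers based on the thread's Zen5/Zen6 rumors.

def l3_per_active_core(total_l3_mb: float, active_cores: int) -> float:
    return total_l3_mb / active_cores

for active in (1, 2, 4, 8):
    zen5 = l3_per_active_core(32, active)  # 8-core, 32 MB CCX
    zen6 = l3_per_active_core(48, active)  # rumored 12-core, 48 MB CCX
    print(f"{active} active core(s): {zen5:.0f} MB vs {zen6:.0f} MB")
```

At full load, 48/12 and 32/8 both come out to 4 MB per core, which is exactly the equality the question above noticed.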
 

Thibsie

Golden Member
Apr 25, 2017
It is the same cache per core only if you use all the cores.

In the world most of us occupy, our CPUs typically load only a few cores at a time, so you get more cache per core in those circumstances. But even if you're the outlier who often runs all cores at 100%, you aren't any worse off than before, and now you have 50% more cores for your outlier tasks.

Yeah, but might a thread 'eat' the second core's cache? I mean, both cores will compete for the cache then, no?
Also, wouldn't more read/write ports slow cache access (speed/latency) or add complexity?
This might be completely wrong, I don't know much about cache internals.
 