Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022






As Hot Chips 34 starts this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|
| Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H |
| Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Desktop: Q4 2024, H&HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | |
| tGPU (mm²) | 44.45 | | | |
| SoC (mm²) | 96.77 | | | |
| IOE (mm²) | 44.45 | | | |
| Total (mm²) | 252.15 | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)

ondma

Diamond Member
Mar 18, 2018
My comment was directed specifically at the gen-on-gen performance difference between vanilla non-X3D Zen5 and vanilla non-X3D Zen6. There are only three cases where I would expect an X3D part to be slower in ST performance than its predecessor or its non-X3D sibling:
1. A notable peak clock speed deficit, largely gone with Zen5.
2. Thermal throttling due to heavy MT loads running concurrently, or poor cooling leading to heat soak. The vanilla part should generate slightly less thermal load and should maintain slightly higher clocks.
3. A weird corner case that exposes the minor latency hit the 3D cache causes.

My argument for Zen6 is that, if the rumors are true, the 12-core CCX will have 48MB of L3 cache at a latency comparable to the 8-core 32MB L3 CCX in Zen5. The 50% larger L3 would theoretically be available in a pure ST scenario, helping any apps that depend on it. It should also be less affected by cache pollution, as the larger cache has more room to tolerate it. Add in the expected 10% IPC improvement from the rumored slide and it should be able to best Arrow Lake too.
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
 

Kepler_L2

Senior member
Sep 6, 2020
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
You can look at Zen2 vs Zen3: both had 32MB of L3, but on Zen2 only 16MB was available to each core due to the split CCX design.
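A toy calculation of that point (my own illustration; only the 32MB/16MB figures come from the post above):

```python
# Toy model: how much L3 a single core can actually allocate into,
# for a unified CCX vs a split-CCX design. Illustration only.

def l3_per_thread(total_l3_mb: float, ccx_count: int) -> float:
    """A core can only fill its own CCX's slice of the total L3."""
    return total_l3_mb / ccx_count

zen2_like = l3_per_thread(32, 2)  # split CCX: one core sees 16 MB
zen3_like = l3_per_thread(32, 1)  # unified CCX: one core sees 32 MB
print(zen2_like, zen3_like)
```

Same total cache on the die, but the unified layout doubles what any one thread can use.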
 

adroc_thurston

Diamond Member
Jul 2, 2023
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
You want more cache in general for 1T or gaming, and more cache per core for anything nT.
Venice-D goes to 4MB of L3 per core despite a generational membw bump for a reason.
 

DavidC1

Golden Member
Dec 29, 2023
Why is it not? That's a 1MB increase for 1 cycle; Skymont is 19 cycles for 4MB L2.
Latency is also affected by design choices, so you can't compare 1:1 with Skymont, which is lower power and has a cache shared by 4 cores.

A 1 cycle increase for a mere 33% capacity increase is nothing good. Even if latency stayed the same, I wouldn't call it impressive, and even against Skymont it's just a 1 cycle reduction. You'd think a "performance"-focused core in 2027 would be better than an E-core in 2025.

The last Intel core with an impressive cache structure was Sandy Bridge. It could overclock to 4.5GHz, the cache ran at the same clock as the core, and at 8MB capacity it had 25-cycle latency, despite being an L3 cache. I wonder how it would fare on 18A?
If only they weren't a bunch of idiots in the Intel DC GPU space, cancelling everything.
That's because they weren't selling. A lot of vendors were on board with mobile Arc GPUs until they found the perf/W was bad and the drivers were atrocious. The last famous Intel DC GPU was Ponte Vecchio, whose enormously complicated packaging made Lunar Lake's MoP complaints look like it added a penny to the BoM, and it was maybe 20% faster in corner-case scenarios.

The last JPR dGPU market share report showed Intel isn't even a blip on the radar now; they are at 0% according to it. They probably sold a few thousand to low tens of thousands of units. The best case is 0.49%, since the numbers are rounded down.
 

511

Platinum Member
Jul 12, 2024
Latency is also affected by design choices, so you can't compare 1:1 with Skymont, which is lower power and has a cache shared by 4 cores.

A 1 cycle increase for a mere 33% capacity increase is nothing good. Even if latency stayed the same, I wouldn't call it impressive, and even against Skymont it's just a 1 cycle reduction. You'd think a "performance"-focused core in 2027 would be better than an E-core in 2025.
It's good tbh; it's also shared between 2 cores. As for P-core vs E-core IPC, I would expect P and E cores to have similar IPC by H2 '26 when Nova Lake launches.
The last Intel core with an impressive cache structure was Sandy Bridge. It could overclock to 4.5GHz, the cache ran at the same clock as the core, and at 8MB capacity it had 25-cycle latency, despite being an L3 cache. I wonder how it would fare on 18A?
8MB at 25 cycles is pretty good. I wonder what the cycle count will be for NVL's L3; anything under 50 would be good imo.
That's because they weren't selling. A lot of vendors were on board with mobile Arc GPUs until they found the perf/W was bad and the drivers were atrocious. The last famous Intel DC GPU was Ponte Vecchio, whose enormously complicated packaging made Lunar Lake's MoP complaints look like it added a penny to the BoM, and it was maybe 20% faster in corner-case scenarios.
Not to mention Arc has been delayed so much.
The last JPR dGPU market share report showed Intel isn't even a blip on the radar now; they are at 0% according to it. They probably sold a few thousand to low tens of thousands of units. The best case is 0.49%, since the numbers are rounded down.
Well, maybe they had already shipped in Q4 '25 when they were at 1%, and shipments were low after that.
 

DavidC1

Golden Member
Dec 29, 2023
It's good tbh; it's also shared between 2 cores. As for P-core vs E-core IPC, I would expect P and E cores to have similar IPC by H2 '26 when Nova Lake launches.
In Sandy Bridge, it went from 41 cycles to 25 cycles, nearly a 40% reduction, while also consistently clocking much higher in the new Turbo mode.

They aren't losing money on Arc because of high BoM; that is nonsense. They are losing money on Arc because there's basically no volume. They could have a $50 BoM and it would still lose them money.
 

AcrosTinus

Member
Jun 23, 2024
Yeah, but not anymore going forward; the private alley is going away and 2 P-cores have to share 😂.


If only they weren't a bunch of idiots in the Intel DC GPU space, cancelling everything.


Why is it not? That's a 1MB increase for 1 cycle; Skymont is 19 cycles for 4MB L2.

Their heyday died with the 10nm delays lol.
I have a feeling this is the secret to how they were able to increase the P-core count. Instead of having a stop per P-core and per E-core cluster, 2 P-cores share a stop, and maybe the E-core cluster is now 8 cores big. This sounds more realistic to me than two compute dies with two separate ring buses, each having 12 stops.

It could also be a way to reduce the stops per ring to 8, essentially having 16 stops if two dies are really employed in Nova.
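For scale, the average hop count between two stops on a bidirectional ring grows with the stop count, which is why shaving stops matters; a back-of-envelope sketch (my own assumption of uniform traffic, not Intel data):

```python
# Average hop count from a stop to every other stop on a bidirectional
# ring, assuming uniform traffic. Back-of-envelope only; real fabrics
# add queuing and protocol overhead on top of raw hop distance.

def avg_hops(stops: int) -> float:
    # Distance to each other stop is the shorter way around the ring.
    return sum(min(d, stops - d) for d in range(1, stops)) / (stops - 1)

print(avg_hops(12))  # 12-stop ring: ~3.27 hops on average
print(avg_hops(8))   # 8-stop ring:  ~2.29 hops on average
```

Going from 12 stops to 8 cuts the average hop distance by roughly 30% in this simple model.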
 

511

Platinum Member
Jul 12, 2024
In Sandy Bridge, it went from 41 cycles to 25 cycles, nearly a 40% reduction, while also consistently clocking much higher in the new Turbo mode.
I didn't know that; that's an insane improvement lol.
They aren't losing money on Arc because of high BoM; that is nonsense. They are losing money on Arc because there's basically no volume. They could have a $50 BoM and it would still lose them money.
Yes, but I think the volume they are moving now is due to the prepayment they made for Arc.
I have a feeling this is the secret to how they were able to increase the P-core count. Instead of having a stop per P-core and per E-core cluster, 2 P-cores share a stop, and maybe the E-core cluster is now 8 cores big. This sounds more realistic to me than two compute dies with two separate ring buses, each having 12 stops.
Yes, though I doubt the 8E-core cluster; 12 -> 8 is a good amount of reduction for stops on the ring.
It could also be a way to reduce the stops per ring to 8, essentially having 16 stops if two dies are really employed in Nova.
Each die has a separate ring, and they are connected using some shared fabric.
 

DavidC1

Golden Member
Dec 29, 2023
It could also be a way to reduce the stops per ring to 8, essentially having 16 stops if two dies are really employed in Nova.
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or a rethought mesh; do something new. The 2011 Sandy Bridge design is showing its age.
I didn't know that; that's an insane improvement lol.
Yes, that is due to the ring, which was a well thought out and novel design. They've regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
 

Io Magnesso

Member
Jun 12, 2025
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or a rethought mesh; do something new. The 2011 Sandy Bridge design is showing its age.

Yes, that is due to the ring, which was a well thought out and novel design. They've regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
There are rumors that the NEX division will be given up, but I don't think they can let go of the network/WiFi business.
I think the dismantling of the NEX division will merely be a reshuffling of personnel within Intel.
 

AcrosTinus

Member
Jun 23, 2024
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or a rethought mesh; do something new. The 2011 Sandy Bridge design is showing its age.

Yes, that is due to the ring, which was a well thought out and novel design. They've regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
That is true. Intel introduced the mesh in HEDT, and benchmarks show that, if clocked high enough, the penalty compared to the ring is minimal while the scaling is vastly superior. Had they invested some time in a mainstream variant, the mesh could have been vastly more performant, but who knows....

AMD being on a mesh is news to me; that would explain the sub-20ns core-to-core latency within a CCD.
 

Doug S

Diamond Member
Feb 8, 2020
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.

It is the same cache per core only if you use all the cores.

In the world most of us occupy, our CPUs typically load only a few cores at a time, so you get more cache per core in those circumstances. But even if you're the outlier who often runs all cores at 100%, you aren't any worse off than before, and now you have 50% more cores for your outlier tasks.
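The "cache per active core" point can be sketched numerically (the rumored 48MB/12-core Zen6 and 32MB/8-core Zen5 figures come from the thread; the rest is my own illustration):

```python
# Shared L3 available per *active* core: at light load the bigger cache
# wins outright; only at full load does cache-per-core even out.
# Illustrative numbers based on the thread's Zen5/Zen6 rumors.

def l3_per_active_core(total_l3_mb: float, active_cores: int) -> float:
    return total_l3_mb / active_cores

for active in (1, 2, 4, 8):
    zen5 = l3_per_active_core(32, active)  # 8-core, 32 MB CCX
    zen6 = l3_per_active_core(48, active)  # rumored 12-core, 48 MB CCX
    print(f"{active} active core(s): {zen5:.0f} MB vs {zen6:.0f} MB")
```

At full load, 48/12 and 32/8 both come out to 4 MB per core, which is exactly the equality the question above noticed.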
 

Thibsie

Golden Member
Apr 25, 2017
It is the same cache per core only if you use all the cores.

In the world most of us occupy, our CPUs typically load only a few cores at a time, so you get more cache per core in those circumstances. But even if you're the outlier who often runs all cores at 100%, you aren't any worse off than before, and now you have 50% more cores for your outlier tasks.

Yeah, but might a thread 'eat' the second core's cache? I mean, both cores will compete for the cache then, no?
Also, wouldn't more read/write ports slow cache access (speed/latency) or add complexity?
This might be completely wrong, I don't know much about cache internals.
 