Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	8 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of the die size of each tile in Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (20A)	Arrow Lake (N3B)	Arrow Lake Refresh (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop Only	Desktop & Mobile H&HX	Desktop Only	Mobile U Only	Mobile H
Process Node	Intel 4	Intel 20A	TSMC N3B	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Q1 2025 ?	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2025 ?	Q4 2024	Q1 2026 ?
Full Die	6P + 8E	6P + 8E ?	8P + 16E	8P + 32E	4P + 4E	4P + 8E
LLC	24 MB	24 MB ?	36 MB ?	?	8 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15
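
As a quick sanity check on the Meteor Lake column above, the per-tile figures can be summed and turned into area shares. This is only a sketch: the values are copied from the table, the unit is assumed to be mm² (the table does not state it), and the names `tile_area`, `total` and `shares` are made up for illustration.

```python
# Per-tile die sizes for Meteor Lake, copied from the table above.
# Assumption: values are in mm^2 (the table gives no unit).
tile_area = {"tCPU": 66.48, "tGPU": 44.45, "SoC": 96.77, "IOE": 44.45}

# Sum of the four listed tiles, rounded to the table's precision.
total = round(sum(tile_area.values()), 2)

# Share of the summed area taken by each tile, in percent.
shares = {name: round(area / total * 100, 1) for name, area in tile_area.items()}

print(f"Total: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The sum reproduces the 252.15 in the Total row, so that row appears to be the simple sum of the four listed tiles.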

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

igor_kavinski · May 6, 2024

Joe NYC said:
When it comes to better hardware utilization, for better "AI", just using more CPU cores (that have already been there for years) more efficiently would be one area.

Then there is AVX, AVX512. There are more gaming PCs / CPUs that have AVX512 vs NPU.

This remains to be benchmarked. Can 24 CPU threads in ARL outsmart the embedded NPU with more TOPS?

AVX-512 in Ryzens, yes. But Intel will have to wait till AVX10 for their 2nd chance at AVX-512 in consumer desktops. Until then, they gotta depend on the NPU.

Hulk · May 6, 2024

AMDK11 said:
Skylake - SunnyCove
micro-ops(decode + uop cache) from 11 to 11 +0%
Dispatch/Rename from 4 to 5 +25%
execution ports from 8 to 10 +25%
With 2xFP/ALU + 2xALU, 1xS/D + 3xAGU
for 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
IPC average +18%

SunnyCove - GoldenCove
micro-ops(decode + uop cache) from 11 to 14 +27%
Dispatch/Rename from 5 to 6 +20%
execution ports from 10 to 12 +20%
With 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
for 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
FPU+ALU from 4 to 5 +25%
IPC average +19%

GoldenCove - LionCove
micro-ops(decode + uop cache) from 14 to 24 +71.4%
Dispatch/Rename from 6 to 8 +33.3%
execution ports from 12 to 18 +50%
With 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
up to 4xFPU, 6xALU, 2xS/D + 6xAGU
FPU+ALU from 5 to 10 +100%
IPC increase +??%

This is great information. Thank you for taking the time to organize and post it.

What is included in "decode + uop cache" and "dispatch/rename?"

How many decoders? How many uop entries?
Reorder buffers? In-Flight Loads/Stores?

I'm not fully following.

Saylick · May 6, 2024

AMDK11 said:
Skylake - SunnyCove
micro-ops(decode + uop cache) from 11 to 11 +0%
Dispatch/Rename from 4 to 5 +25%
execution ports from 8 to 10 +25%
With 2xFP/ALU + 2xALU, 1xS/D + 3xAGU
for 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
IPC average +18%

SunnyCove - GoldenCove
micro-ops(decode + uop cache) from 11 to 14 +27%
Dispatch/Rename from 5 to 6 +20%
execution ports from 10 to 12 +20%
With 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
for 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
FPU+ALU from 4 to 5 +25%
IPC average +19%

GoldenCove - LionCove
micro-ops(decode + uop cache) from 14 to 24 +71.4%
Dispatch/Rename from 6 to 8 +33.3%
execution ports from 12 to 18 +50%
With 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
up to 4xFPU, 6xALU, 2xS/D + 6xAGU
FPU+ALU from 5 to 10 +100%
IPC increase +??%

WCCFTech be like:

poke01 · May 6, 2024

Saylick said:
WCCFTech be like:

Bottom-of-the-trash publication, and the comment section is the worst of ANY site

Joe NYC · May 6, 2024

igor_kavinski said:
This remains to be benchmarked. Can 24 CPU threads in ARL outsmart the embedded NPU with more TOPS?

AVX-512 in Ryzens, yes. But Intel will have to wait till AVX10 for their 2nd chance at AVX-512 in consumer desktops. Until then, they gotta depend on the NPU.

Software support for hardware features lags too far behind for the feature to be a selling point for the hardware.

Intel now talks a lot about a "Centrino moment", but that hardware feature, Wi-Fi, was built into the OS and was available to all applications as part of already widely used networking capabilities.

Software support (for a questionable feature) from hundreds of ISVs and game development teams is not happening in the short run.

AMDK11 · May 6, 2024

Fix for Skylake-SunnyCove:

Skylake - SunnyCove
micro-ops (decode + uop cache) from 11 to 11 +0%
Dispatch/Rename from 4 to 5 +25%
execution ports from 8 to 10 +25%
From 2xFP/ALU + 2xALU, 1xS/D + 3xAGU
to 3xFP/ALU + 1xALU, 2xS/D + 4xAGU
IPC average +18%

SunnyCove - GoldenCove
micro-ops (decode + uop cache) from 11 to 14 +27%
Dispatch/Rename from 5 to 6 +20%
execution ports from 10 to 12 +20%
From 3xFP/ALU + 1xALU, 2xS/D + 4xAGU
to 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
FPU+ALU from 4 to 5 +25%
IPC average +19%

GoldenCove - LionCove
micro-ops (decode + uop cache) from 14 to 24 +71.4%
Dispatch/Rename from 6 to 8 +33.3%
execution ports from 12 to 18 +50%
From 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
to up to 4xFPU, 6xALU, 2xS/D + 6xAGU
FPU+ALU from 5 to 10 +100%
IPC increase +??%
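
The percentages in the lists above are plain relative increases, (new - old) / old. A minimal sketch to reproduce them, using only the counts quoted in this post; the helper name `pct_increase` and the dictionary layout are illustrative, not anything from Intel.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0

# (old, new) pairs copied from the lists above.
generations = {
    "Skylake -> SunnyCove": {
        "micro-ops": (11, 11),
        "dispatch/rename": (4, 5),
        "execution ports": (8, 10),
    },
    "SunnyCove -> GoldenCove": {
        "micro-ops": (11, 14),
        "dispatch/rename": (5, 6),
        "execution ports": (10, 12),
        "FPU+ALU": (4, 5),
    },
    "GoldenCove -> LionCove": {
        "micro-ops": (14, 24),
        "dispatch/rename": (6, 8),
        "execution ports": (12, 18),
        "FPU+ALU": (5, 10),
    },
}

for gen, metrics in generations.items():
    print(gen)
    for name, (old, new) in metrics.items():
        print(f"  {name}: {old} -> {new} ({pct_increase(old, new):+.1f}%)")
```

These are width ratios only. They say nothing about IPC by themselves, which is why the LionCove IPC line stays at +??%.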

Two LionCove diagrams from one LunarLake graphic:

Thibsie · May 6, 2024

I don't get it, how is AVX512 any substitute for an NPU ?

Mahboi · May 6, 2024

Some ops do get greatly accelerated by AVX-512. Some AI workloads double in perf, some gain even more.

AMD Ryzen 7 7700X vs. Core i9 11900K AVX-512 Performance Analysis Review - Phoronix


(no, it's not a substitute, but it helps)
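
For a rough feel for why wider vectors help, here is a minimal sketch counting how many elements fit in one vector register. The 256-bit (AVX2) and 512-bit (AVX-512) register widths are architectural; the helper `lanes` is just illustrative, and real speedups depend on clocks, port counts, memory bandwidth and the specific instructions used, none of which this models.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits

for isa, width in (("AVX2", 256), ("AVX-512", 512)):
    for elem_name, elem_bits in (("fp32", 32), ("int8", 8)):
        print(f"{isa}: {lanes(width, elem_bits)} x {elem_name} per register")
```

Doubling the register width doubles the elements handled per instruction, which lines up with the "double in perf" cases, though it is a per-instruction ceiling and not a benchmark result.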

Ghostsonplanets · May 6, 2024

https://twitter.com/x/status/1787572832219861422

https://twitter.com/x/status/1787571666945810485

"From what I hear LNL has good number of design wins in last quater...."

"Not going to put numbers because, earlier design wins itself are high... Having something that big on top of them .... I am not confident enough"

Lunar Lake is getting a lot of design wins. It tracks with previous leaks that LNL would have 3x as many design wins as Meteor Lake.

Mahboi · May 6, 2024

poke01 said:
Bottom-of-the-trash publication, and the comment section is the worst of ANY site

Oh no, their automated AT bot has confused you with Kepler and released the next article!

Mahboi · May 6, 2024

Holy sheet I had never read their comment section, it truly is below Reddit.

Best quote: "You are the noob as there is nothing wrong with userbenchmark."

AMDK11 · May 6, 2024

Hulk said:
This is great information. Thank you for taking the time to organize and post it.

What is included in "decode + uop cache" and "dispatch/rename?"

How many decoders? How many uop entries?
Reorder buffers? In-Flight Loads/Stores?

I'm not fully following.

I gave a combined figure for decode + uop cache delivery because the LionCove diagram is too low quality and does not specify how much comes from the decoder.

GoldenCove totals 14 uops: 6 from the decoder and 8 from the uop cache.

LionCove totals 24 uops, but it is not 100% certain whether that is 8 from the decoder and 16 from the uop cache, or maybe 10 from the decoder and 14 from the uop cache.

I provided the data that can be read from the LionCove diagram. Much is still unknown.

I won't be surprised at all if there is a 10-Way decoder because Skymont has 3x 3-Way, i.e. a total of 9-Way. But it can also be 8-Way.

I think Intel will present the LionCove microarchitecture in June, and as you can see, they already have core diagrams ready for presentation.
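
To lay the two LionCove possibilities side by side, a minimal sketch. The totals and candidate decoder widths are the ones from this post; nothing here is confirmed by Intel, and the variable names are illustrative.

```python
# Combined front-end width (decoder + uop cache), as read from the diagrams.
GOLDENCOVE = {"decoder": 6, "uop_cache": 8}  # known split, totals 14
LIONCOVE_TOTAL = 24                          # split not yet known

# Candidate decoder widths discussed in the post (speculative).
candidate_decoders = [8, 10]

# Whatever the decoder does not supply must come from the uop cache.
lioncove_splits = [
    {"decoder": d, "uop_cache": LIONCOVE_TOTAL - d} for d in candidate_decoders
]

# Skymont for comparison: three clusters of 3-wide decode.
skymont_decode = 3 * 3

print("GoldenCove total:", sum(GOLDENCOVE.values()))
for split in lioncove_splits:
    print("LionCove candidate:", split)
print("Skymont decode width:", skymont_decode)
```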

Ghostsonplanets · May 6, 2024

Not sure if this is reliable. Take with a huge grain of salt.

But, supposedly, ARL-U also gets the new N4P iGP tile from ARL/S (64 EU): N4(P?), XMX units added back and higher clocks.

Basically MTL-U/M with smaller Compute and GPU tile (nodelet shrink to Intel 3 and N4(P?) respectively) for higher yields and cheaper costs.

XMX units will probably provide the TOPS throughput for ARL-U to meet AI PC requirements. Cheaper alternative to Lunar Lake for mainstream U-series designs.

adroc_thurston · May 6, 2024

Ghostsonplanets said:
Cheaper alternative to Lunar Lake for mainstream U-series designs.

Mainstream is still raptor.
ARL-U is in a very weird position overall, the only boon is platform compatibility with MTL-U.

Saylick · May 6, 2024

Mahboi said:
Holy sheet I had never read their comment section, it truly is below Reddit.

Best quote: "You are the noob as there is nothing wrong with userbenchmark."

If you thought reading that Prakhar guy's tweets made you lose brain cells, WTFTech's comments section devolves you to a Neanderthal.

I refuse to take anyone seriously who regularly posts in their comments section, even if I see the same people making reasonable, logical statements on Xitter outside of it. I simply will not respect their opinion because they consciously made a decision to be a willing participant in that cesspool to begin with. It's like if I knew someone was actively participating in a Nazi get-together but was a normal behaving human being outside of it. It doesn't matter how that person behaves in a regular setting; they are a Nazi, regardless.

But what can you do. That comment section is probably where they get more than half their ad impressions, and WCCFTech knows it so they will never moderate it.

igor_kavinski · May 6, 2024

Thibsie said:
I don't get it, how is AVX512 any substitute for an NPU ?

Probably this: https://www.tencentcloud.com/document/product/213/41062

Joe NYC · May 6, 2024

Thibsie said:
I don't get it, how is AVX512 any substitute for an NPU ?

Not exactly a substitute, but as a hardware feature for ISV / game developers to use.

It is a very slow process for a hardware feature to become widely used. It's completely unrealistic to expect to have a game released precisely at the time of ARL release supporting a unique feature of ARL.

Example of a far more consequential hardware feature: the x86-64 instruction set. First introduced in 2003, it took 2 years for the first Windows version to support it, in 2005, and then it took another decade for games to start switching to 64-bit.

Ghostsonplanets · May 6, 2024

adroc_thurston said:
Mainstream is still raptor.
ARL-U is in a very weird position overall, the only boon is platform compatibility with MTL-U.

Right. ARL-U is an odd duck because it still won't be cheap for mainstream, even with the nodelet shrink. But it also can't be priced as premium because the product doesn't justify it.

Intel lineup will be in a weird position next year

LNL - >$999
ARL-U -> $700 - $900
RPL-U -> $600 and below

adroc_thurston · May 6, 2024

Ghostsonplanets said:
Intel lineup will be in a weird position next year

LNL - >$999
ARL-U -> $700 - $900
RPL-U -> $600 and below

It's not weird, it's bad.
Only LNL has merits because of battery life.

Hulk · May 6, 2024

AMDK11 said:
I gave a combined figure for decode + uop cache delivery because the LionCove diagram is too low quality and does not specify how much comes from the decoder.

GoldenCove totals 14 uops: 6 from the decoder and 8 from the uop cache.

LionCove totals 24 uops, but it is not 100% certain whether that is 8 from the decoder and 16 from the uop cache, or maybe 10 from the decoder and 14 from the uop cache.

I provided the data that can be read from the LionCove diagram. Much is still unknown.

I won't be surprised at all if there is a 10-Way decoder because Skymont has 3x 3-Way, i.e. a total of 9-Way. But it can also be 8-Way.

I think Intel will present the LionCove microarchitecture in June, and as you can see, they already have core diagrams ready for presentation.

Thanks, that makes perfect sense. They are really opening up both the front end and back end of Lion Cove. I still think the IPC increase will be 20% as that seems to always be Intel's target, maybe 25% if things really work out well for them. That 24 uops is a huge increase, which makes me think the front end is currently the bottleneck.

Hulk · May 6, 2024

Is it possible that Skymont might get a micro-op cache?

Henry swagger · May 6, 2024

Ghostsonplanets said:
https://twitter.com/x/status/1787572832219861422

https://twitter.com/x/status/1787571666945810485

"From what I hear LNL has good number of design wins in last quater...."

"Not going to put numbers because, earlier design wins itself are high... Having something that big on top of them .... I am not confident enough"

Lunar Lake is getting a lot of design wins. It tracks with previous leaks that LNL would have 3x as many design wins as Meteor Lake.

Of course LNL will have design wins. Intel has 88% of market share; even Apple with superior battery life can't eat into it. LNL will be Intel's M1 for x86.

DrMrLordX · May 7, 2024

reggie_fils_aime said:
All of that seems fine and good, and would certainly negate the need for trained human supervision. I'm perhaps a little jaded on that whole aspect because this is all work I've been trained to do throughout my adult life: finding trends in data, editing photos, concise and clear communications... That's a broadcasting and journalism degree, in so many words. I guess I'm not the target audience, lol.

The goal here is not to necessarily replace you (the operator) performing those functions, but to instead make it easier for you to do the things you already know how to do. Ideally so that someone trained as you are can do the work of 2-3 people, meaning that your boss then gets to lay off some of your colleagues without raising your pay by much, if at all. Isn't that great?

And that is why there's so much hype around NPUs!

DavidC1 · May 7, 2024

Hulk said:
Is it possible that Skymont might get a micro-op cache?

No. If anything uop caches will be sharing the fate of Hyperthreading and be axed in the future.

Uop caches are a remnant of the extreme clock focus of Netburst. Rather than thinking of them as bringing more performance, think of them as maintaining performance while allowing clocks to be raised.

AMDK11 said:
I won't be surprised at all if there is a 10-Way decoder because Skymont has 3x 3-Way, i.e. a total of 9-Way. But it can also be 8-Way.

I think Intel will present the LionCove microarchitecture in June, and as you can see, they already have core diagrams ready for presentation.

Raichu and others said 8-way decode for Lion Cove.

Also, for Skymont to be 8-way, it would have to be 2x4. The point of the clustered approach is to reduce the power/area impact of the decoders, and Intel said 3-way clusters minimize that impact, so 2x4 makes no sense. Raichu might have thought it was 8-way in the beginning, but later clarified that it's 3x3-way.

AMDK11 · May 7, 2024

Maybe Raichu found out about the width-8 Dispatch/Rename and assumed that the decoder is also 8-Way?
