Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
677
559
106






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of upcoming Intel's U-series CPU: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

 | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Arrow Lake Refresh (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U only | Desktop only | Desktop & Mobile H/HX | Desktop only | Mobile U only | Mobile H
Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Q1 2025 ? | Desktop Q4 2024, H/HX Q1 2025 | Q4 2025 ? | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 8P + 32E | 4P + 4E | 4P + 8E
LLC | 24 MB | 24 MB ? | 36 MB ? | ? | 8 MB | ?
tCPU (mm²) | 66.48 | | | | |
tGPU (mm²) | 44.45 | | | | |
SoC (mm²) | 96.77 | | | | |
IOE (mm²) | 44.45 | | | | |
Total (mm²) | 252.15 | | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

 

Attachments: PantherLake.png, LNL.png

coercitiv

Diamond Member
Jan 24, 2014
6,247
12,147
136
Yeah XMX has to come back, because MS's TOPs requirements are only getting larger as time goes on. It's a lot more area efficient (AKA cheaper to produce and for the end consumer later) to use XMX than it is to slap on an even _bigger_ NPU, even if the NPU would be more power efficient.
Never really bothered to get more in-depth with the subject, but my basic understanding is that ML-based tasks in personal computing will be split into two categories:
  • "Low" compute tasks where efficiency is important, such as video call background blur, noise reduction, recognition, translation, dictation, grammar & auto correct etc. The NPU should handle them, so it needs to be scaled to their scope and made as efficient as possible.
  • Heavy compute tasks using generative models (language, multimedia, science & engineering) where performance is important. These will leverage the GPU mostly, because this way the compute area can be used for both AI and graphics, which is a good compromise for a consumer chip.
 

moinmoin

Diamond Member
Jun 1, 2017
4,967
7,715
136
If the former is true, then having AVX-512 support in the die but fused off is a complete waste of expensive silicon and adds significantly to cost (25% is a ton of money). Having two separate designs makes a lot of sense considering LNC is new.
It would be especially silly considering the whole reason for the existence of E-cores is area efficiency. But with AVX-512 present but disabled, the P-cores are essentially artificially bloated without reason. And I have a hard time imagining that combining P- and E-cores, with all the hardware and software changes that necessitates, is cheaper than just optimizing P-cores.
 

Geddagod

Golden Member
Dec 28, 2021
1,159
1,033
106
I remember reading some articles that had mixed views about the AVX-512 die area during the Linus Torvalds AVX controversy. Many claimed that AVX-512 instructions take up significant die space (as much as 25% per core) due to its complex logic, while a few others claimed that AVX-512 support doesn't take up significant space in the total die area.

If the former is true, then having AVX-512 support in the die but fused off is a complete waste of expensive silicon and adds significantly to cost (25% is a ton of money). Having two separate designs makes a lot of sense considering LNC is new.
Where did you hear AVX-512 adds 25% to the core area?
Also, even if it is true (I doubt it is lmao), that's just the core. It's not adding 25% to the whole die, the impact there is gonna be way, waaaay smaller
 

SiliconFly

Golden Member
Mar 10, 2023
1,056
541
96
Where did you hear AVX-512 adds 25% to the core area?
Also, even if it is true (I doubt it is lmao)
You should. Like I said, AVX-512 % die area projections differ from die to die. Some are from trustworthy sources while others are plain speak, possibly just rumors or guesses (may or may not be accurate). You be your own judge, cos Intel doesn't publish exact figures.

Link, link & link.

...It's not adding 25% to the whole die, the impact there is gonna be way, waaaay smaller
Your accuracy amazes me! Assuming you can read correctly, I clearly mentioned their claims of up to 25% per core. I never said 25% of the whole die area. Even so, I'm sure it's definitely not waaaay smaller. Definitely not a single digit % number.
 
Jul 27, 2020
16,659
10,665
106
If we assume each P-core in Intel CPUs contains two AVX-512 units and the same core is used for server and consumer, with the latter having one unit disabled, that's a lot of die area being wasted in the name of segmentation.
 

Geddagod

Golden Member
Dec 28, 2021
1,159
1,033
106
You should. Like I said, AVX-512 % die area projections differ from die to die. Some are from trustworthy sources while others are plain speak, possibly just rumors or guesses (may or may not be accurate). You be your own judge, cos Intel doesn't publish exact figures.
You can literally just look at skylake client and then skylake server (which has AVX-512).
Assuming you can read correctly, I clearly mentioned their claims of up to 25% per core. I never said 25% of the whole die area.
Shouldn't have said this then
If the former is true, then having AVX-512 support in the die but fused off is a complete waste of expensive silicon and adds significantly to cost (25% is a ton of money)
You are talking about the die in one sentence, and then in parenthesis just mention 25% is a ton of money? It would be generous of me to assume you are talking about the core area tbh, though if it was some other people who typed that I would have just made that assumption lol
Then also, I'm sure it's definitely not waaaay smaller. Definitely not a single digit % number.
Using very optimistic calculations, at best it looks like AVX-512 is ~15% of a Skylake server core. A Skylake core looks to be ~2/3 of a Skylake "block". There's a lot of stuff on the CPU that's not just Skylake "blocks", but let's just ignore that. It very conceivably can be a single digit % number, even if the number is 25% as you think it is.
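The back-of-envelope above is easy to run through. All three fractions below are the rough guesses from this thread (plus one illustrative assumption for how much of the die the core blocks occupy), not measured figures:

```python
# Rough estimate: per-core AVX-512 area share -> whole-die area share.
# Every fraction here is a forum guess or illustrative assumption, not an Intel figure.

avx512_share_of_core = 0.15   # ~15% of a Skylake server core (optimistic reading)
core_share_of_block  = 2 / 3  # a core is ~2/3 of a Skylake "block" (core + L2 + fabric)
blocks_share_of_die  = 0.50   # assume core blocks are ~half the die (illustrative)

low = avx512_share_of_core * core_share_of_block * blocks_share_of_die
print(f"{low:.1%}")   # ~5% of the whole die

# Even taking the 25%-per-core claim at face value, the whole-die share
# still lands in the single digits under these assumptions:
high = 0.25 * core_share_of_block * blocks_share_of_die
print(f"{high:.1%}")  # ~8%
```

So the dispute largely dissolves once "per core" and "per die" are kept apart: a big per-core share shrinks twice on the way to the full die.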
 

Geddagod

Golden Member
Dec 28, 2021
1,159
1,033
106
Intel might have a marginal lead in client with their upcoming products.
Lol
But, AMD's upcoming cores appear to be better suited for data center than Intel's upcoming cores. Diamond Rapids may not match the Zen 5 series in overall server performance and/or efficiency.
DMR not matching Zen 5 would be kinda pathetic since DMR would be launching pretty much near Zen 6
don't think the transistors are in it to enable it, right ? While googling on the subject, I found this from tomshardware.com
That doesn't mean the transistors won't/can't be there....
AMD drops performance in exchange for area/cost for the Zen4c cores. This leaves Intel in a situation where they need a faster Atom or a more efficient Cove core, but they have neither.
Why?
Ya. Agree. Nothing special. But still gonna be light years ahead of Zen5 I presume.
"I presume"
LionCove+ will be something like RaptorCove compared to GoldenCove or RedwoodCove. Nothing more.
Uhh I think Bionc said something about changes to the L0 and L1, but I could be misremembering
 

dullard

Elite Member
May 21, 2001
25,111
3,480
126
You are talking about the die in one sentence, and then in parenthesis just mention 25% is a ton of money?
So many of the arguments here are simple misunderstandings like that. People here tend to change subject midsentence and not tell anyone about the change of subject. Or, my pet peeve, use a pronoun that doesn't refer to anything remotely close to the sentence the pronoun is in. The way I read his post, that paragraph was talking about dies, and thus a 25% cost would naturally be assumed to refer to the entire die cost.
 

Geddagod

Golden Member
Dec 28, 2021
1,159
1,033
106
So many of the arguments here are simple misunderstandings like that. People here tend to change subject midsentence and not tell anyone about the change of subject. Or, my pet peeve, use a pronoun that doesn't refer to anything remotely close to the sentence the pronoun is in. The way I read his post was that paragraph was talking about dies, and thus a 25% cost would naturally be assumed to be referring to the entire die cost.
Yup. Regardless, idk why he was so mad, other than the "I doubt it is lmao", nothing in my reply was thaaaat annoying. And that was referring to the core, not the total die.
But whatever, I don't use this site much anymore due to how many times it loads super slowly, or just doesn't load at all.
 

SiliconFly

Golden Member
Mar 10, 2023
1,056
541
96
Once done benchmarking, put the CPUs in water and measure their "density" using the Archimedes Principle
I know it sounds like a stretch, but I think it is kinda possible in theory to calculate the transistor density of a CPU using the Archimedes principle, if we know the volume of a transistor (and of other materials like packaging, etc). All we need is a very large container filled with water and a lot of CPUs to calculate the displacement. Then subtract and divide. Voila!
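For fun, the "subtract and divide" can be written out. Every number below is a made-up placeholder (in reality, per-transistor volume differences are far below anything water displacement could resolve):

```python
# Tongue-in-cheek Archimedes estimate: displaced volume of many CPUs,
# minus assumed packaging volume, divided by an assumed per-transistor volume.
# All values are invented placeholders purely to show the arithmetic.

n_cpus = 10_000
displaced_ml = 12_500.0          # total water displaced by all CPUs (mL)
packaging_ml_per_cpu = 1.24      # assumed non-die volume per CPU (mL)
transistor_volume_ml = 5e-16     # assumed volume of one transistor (mL)

die_ml_per_cpu = displaced_ml / n_cpus - packaging_ml_per_cpu  # the "subtract"
transistors_per_cpu = die_ml_per_cpu / transistor_volume_ml    # the "divide"
print(f"{transistors_per_cpu:.2e} transistors per CPU")
```

The catch, of course, is that the "subtract" step needs the packaging volume to far better precision than any bathtub experiment allows.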
 
Reactions: igor_kavinski

SiliconFly

Golden Member
Mar 10, 2023
1,056
541
96
Lmao Xino said ARL will be the same perf as RPL or maybe a single digit level improvement (prob referring to ST)
Well, generally speaking, there are 3 possibilities for ARL...

(1) There is a very high probability that ARL might have a clock regression of up to 10% to 15%. Or maybe not. Hard to say at this point. If a clock regression is there AND if LNC's IPC gains are only in the order of 20% to 25% (which sounds reasonable), then we may end up with ARL having only single digit level performance gains. Imho, that's not exactly a bad assessment.

(2) Next is, similar to MILD's claims, if LNC ends up having massive IPC gains in the order of 30% or 40%, and there isn't much clock regression, then ARL is gonna be awesome. The likelihood of this happening isn't that high if you ask me. But quite a possibility.

(3) Then there is a third but remote possibility that ARL might have a slight performance regression over RPL, cos RPL screams at a mind-numbing clock of 6.2 GHz. And if LNC's IPC gains aren't large enough, we may end up with something very similar to MTL, a slight performance regression. But the probability of something like this happening is pretty low. But still a possibility.
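The three scenarios all reduce to the same arithmetic, performance ≈ IPC × clock. A quick sketch with the percentages floated above (all of which are speculation, not leaks I'm endorsing):

```python
# Net ST performance change from an IPC gain combined with a clock change.
# perf_ratio = (1 + ipc_gain) * (1 + clock_change); all inputs are speculative.

def net_gain(ipc_gain: float, clock_change: float) -> float:
    return (1 + ipc_gain) * (1 + clock_change) - 1

# (1) 20-25% IPC gain eaten by a 10-15% clock regression -> single digit net gain
print(f"{net_gain(0.20, -0.10):.1%}")  # ~8%

# (2) massive 35% IPC gain with little clock regression -> a big net win
print(f"{net_gain(0.35, -0.02):.1%}")  # ~32%

# (3) modest IPC gain vs RPL's 6.2 GHz (say ARL tops out near 5.6 GHz)
print(f"{net_gain(0.08, 5.6 / 6.2 - 1):.1%}")  # slightly negative
```

Which is why a headline IPC number alone says little about generational performance until the clocks are known.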
 
Reactions: Tlh97 and hemedans

Ghostsonplanets

Senior member
Mar 1, 2024
384
658
96
Lmao Xino said ARL will be the same perf as RPL or maybe a single digit level improvement (prob referring to ST)
That would be really unfortunate for a brand new generation. Especially coming after the impressive Sunny and Golden Cove generations, which both had ~20% IPC increases over the previous uArch gen.

But, quite frankly, I'm more interested in how Lunar Lake will shape up than ARL. The return of low power x86 in the vein of Core M is much needed to fight against Apple M and X Elite. Intel ST performance is already very competitive, so a single digit uplift would still keep them in the fight. But figuring out high performance at low power with good efficiency is key, and hence why LNL is such an interesting prospect to me.
 
Reactions: Tlh97 and Thibsie

Geddagod

Golden Member
Dec 28, 2021
1,159
1,033
106
Oh wait their accounts have all been suspended when was this lmao
No it hasn't. It was today.
Then there is a third but remote possibility that ARL might have a slight performance regression over RPL, cos RPL screams at a mind-numbing clock of 6.2 GHz.
Xino claims ARL might get up to 5.6 GHz, but I doubt he is talking about the 14900KS; most people don't include the KS parts when comparing generations.
That would be really unfortunate for a brand new generation. Especially coming after the impressive Sunny and Golden Cove generations, which both had ~20% IPC increases over the previous uArch gen.
Might be closer to 10% than 20%, unfortunately. I agree though, after all the LNC hype...
But figuring out high performance at low power with good efficiency is key, and hence why LNL is such an interesting prospect to me.
Intel's low power optimization is just so cooked it's wild. Crossing my fingers for LNL (though I prob won't end up getting it anyway).
 

eek2121

Platinum Member
Aug 2, 2005
2,931
4,027
136
Never really bothered to get more in-depth with the subject, but my basic understanding is that ML-based tasks in personal computing will be split into two categories:
  • "Low" compute tasks where efficiency is important, such as video call background blur, noise reduction, recognition, translation, dictation, grammar & auto correct etc. The NPU should handle them, so it needs to be scaled to their scope and made as efficient as possible.
  • Heavy compute tasks using generative models (language, multimedia, science & engineering) where performance is important. These will leverage the GPU mostly, because this way the compute area can be used for both AI and graphics, which is a good compromise for a consumer chip.

Meh, both can be used for point #2. I am actually more curious if this becomes a move back to CMT. Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃
 

SiliconFly

Golden Member
Mar 10, 2023
1,056
541
96
Wouldn't that break compatibility with existing software? If they somehow divert those instructions from CPU decoder to NPU, there would be a latency hit involved.
I think it's possible. Not very sure though. If the FP instructions are removed, the CPU will throw an exception when it doesn't recognize the instruction, and the OS has to catch it and divert it to the NPU. There's gonna be a lot of latency involved.

And if I'm right, too many programs use FP these days I think. Possibly even browsers, apps like MS Office, and lots of games too. Again, not sure though. If that's the case, then we're stuck with FP forever!
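The control flow being described is basically trap-and-emulate (how FPU-less systems historically ran FP code). A toy sketch, with made-up opcode names purely to show the dispatch and where the latency cost sits:

```python
# Toy model of trap-and-emulate dispatch for unsupported instructions.
# Opcode names are hypothetical; this shows the control flow, not real ISA behavior.

def cpu_execute(opcode: str, a, b):
    """Hardware path: only integer ops are implemented."""
    native = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}
    if opcode not in native:
        raise NotImplementedError(opcode)  # stands in for a #UD invalid-opcode fault
    return native[opcode](a, b)

def npu_emulate(opcode: str, a, b):
    """OS handler forwards the trapped FP op to an emulation path."""
    emulated = {"FADD": lambda x, y: x + y, "FMUL": lambda x, y: x * y}
    return emulated[opcode](a, b)

def run(opcode: str, a, b):
    try:
        return cpu_execute(opcode, a, b)
    except NotImplementedError:
        # Every trap means a fault, a kernel entry, and a redispatch --
        # this round trip is exactly where the latency concern comes from.
        return npu_emulate(opcode, a, b)

print(run("ADD", 2, 3))       # 5, runs "natively"
print(run("FMUL", 2.0, 3.0))  # 6.0, via the trap path
```

In the real world this per-instruction trap overhead is why trap-and-emulate FP was always a last resort rather than a design goal.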
 

naukkis

Senior member
Jun 5, 2002
718
597
136
Meh, both can be used for point #2. I am actually more curious if this becomes a move back to CMT. Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃

The NPU is the exact opposite of an FPU. Floating point numbers have a floating radix point, so the representable range can be huge, like from 2^-64 to 2^64, and calculations can be done between opposite extremes. The NPU instead relies on extremely short integers, like 4 and 8 bits - only 16 or 256 values. If we think of a normal integer (fixed point math) CPU as the middle point, the NPU sits at one extreme and the FPU at the other; they are absolutely not alternatives.
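The range gap is easy to put numbers on, comparing the FP64 a CPU's FPU handles against the INT8/INT4 formats NPUs favor:

```python
import sys

# FP64's floating radix point gives it an enormous dynamic range...
print(f"float64 max:        {sys.float_info.max:.3e}")  # ~1.798e+308
print(f"float64 min normal: {sys.float_info.min:.3e}")  # ~2.225e-308

# ...while NPU-style narrow integers express only a handful of distinct values.
int8_values = 2 ** 8   # 256 values, e.g. -128..127
int4_values = 2 ** 4   # 16 values, e.g. -8..7
print(f"INT8: {int8_values} values, INT4: {int4_values} values")
```

Hundreds of orders of magnitude versus a few hundred (or sixteen) discrete values: the two units really are built for opposite ends of the numeric spectrum.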
 

H433x0n

Senior member
Mar 15, 2023
915
993
96
An fmax of 5.6 GHz would be fine.

At this point they should be receiving ES2 for a launch in October. If the practical IPC gain is ~15%, I guess that makes sense once taking into account the 2-4% penalty from tile overhead.

Unfortunately, in typical leaker fashion, Xino wasn't very specific. Was this test at JEDEC-4800? Was it with the most recent stepping of the SoC tile? Are the IPC figures from mobile or desktop ARL? Which version of RPL is he talking about: 13900K or 14900KS 1T performance?

Expectations are pretty low, but if the IPC bump is <15% then they deserve to get clobbered.
 
Reactions: Tlh97