Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick · Senior member · Joined Apr 1, 2022

As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4? | ? | Arc | 12 |



Comparison of die sizes of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Arrow Lake Refresh (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop only | Desktop & Mobile H/HX | Desktop only | Mobile U only | Mobile H |
| Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Q1 2025 ? | Desktop Q4 2024, H/HX Q1 2025 | Q4 2025 ? | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 8P + 32E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 24 MB ? | 36 MB ? | ? | 8 MB | ? |
| tCPU (mm²) | 66.48 | | | | | |
| tGPU (mm²) | 44.45 | | | | | |
| SoC (mm²) | 96.77 | | | | | |
| IOE (mm²) | 44.45 | | | | | |
| Total (mm²) | 252.15 | | | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

 

Attachments: PantherLake.png, LNL.png

naukkis · Senior member · Joined Jun 5, 2002

The ability to use different data types is theoretically possible with NPUs, but the main issues are memory, bandwidth, and power. If you only need 4 bits (and AI often needs only 4 or 8 bits), then using hardware set up for 512 bits is quite a waste. Using 512 bits when your application needs 4 bits will require 128x more memory, will have to move 128x more data around, and will have to process 128x more of that data, using much more power, all while only being able to run much smaller AI models due to those limits. So it isn't really efficient to use something set up for 512 bits with 4-bit data.

The reverse is true too. If you have an NPU optimized and designed for, say, 4-bit math, and need 16-bit data, then you need to transfer that data around in 4 chunks, which takes more time. Then you have memory to store only 1/4 of the data. It can work, but it just won't be as performant as you want.

The whole point of an NPU is optimized hardware for very short data types with simplified instructions. The FPU does very complex instructions with long data types, exactly the opposite optimization point from an NPU. And the FPU isn't actually a co-processor anymore; it's part of the ISA and cannot be removed without losing compatibility. And because it's part of the ISA, pretty much every program uses it by default whenever float or double variables are used, since they are faster on the FPU than on the integer units.
 

Hitman928 · Diamond Member · Joined Apr 15, 2012

Not sure if this is news, but Microsoft just announced "Surface AI PCs". They utilize Intel MTL and come in two models: the Surface Pro 10 for Business and the Surface Laptop 6 for Business.

 

eek2121 · Platinum Member · Joined Aug 2, 2005

> Whole NPU meaning is to make optimized hardware for very short datatypes with simplified instructions. FPU does very complex instructions with long datatypes being exactly opposite optimizing point to NPU. And FPU isn't actually co-processor anymore, it's a part of a ISA and it cannot removed without losing compatibility. And as it's a part of a ISA pretty much every program used it by default if float or double type variables are used as it's faster use them with FPU than with integer units.
While I do agree, technology does evolve over time and this is what I am getting at. I was thinking in terms of literal decades when I said that.

The NPU will likely take on more responsibility as time goes on. I will be shocked if x86 looks the same 20 years from now as it does today. New wafer costs keep going up, and having duplicate functionality on multiple parts of the package wastes die space.

I am terrible at predicting, but if I had to, I would say the lines between the NPU, CPU cores, and GPU cores are going to blur.

IIRC someone mentioned AMD has a patent for catching exceptions on missing instructions and redirecting the workload off-chip. It got brought up because of Intel's missing AVX-512. I could absolutely see something similar happening here.

Someone mentioned latency, but once the compilers are changed, there would be no performance penalty and performance may actually increase.

Me personally? I want socketed NPUs so competitors can play too, but of course we won’t get that. We are getting PCIe accelerators soon, however.

We are still in the very early days of AI. If you compare AI to the invention of the internet, we are roughly where the internet was in the 1970s.
 

adroc_thurston · Platinum Member · Joined Jul 2, 2023

> Speculation was about replacing FPU with NPU

did you just invent GPUs.
like dawg we already invented silly parallel SIMD crunchers. in 2005. In Xenos, from Xbox 360.

> The NPU will likely take on more responsibility as time goes on.

It does dumb matrix math.
are you daft

> I will be shocked if x86 looks the same 20 years from now as it does today. New wafer costs keep going up and having duplicate functionality on multiple parts of the package wastes die space.

THE FUTURE IS FUSION™

> but if I had to I would say the lines between the NPU, CPU cores, and GPU cores are going to blur.

They'll be more clear-cut than ever.
 

DavidC1 · Member · Joined Dec 29, 2023

> Technology wise the Atom team's work indeed has been more interesting to follow for quite some time now, like for a decade by now?
After falling out of the spotlight with Silvermont/Airmont, they've been working quietly behind the scenes.

They've been doing a consistent cadence of using new ideas in one generation and optimization/expansion the next.

Bonnell - 2 way in order
New Ideas: Silvermont - 2 way out of order, lowered pipeline stages
New Ideas: Goldmont - 3 way out of order(added OoOE FP) + 16KB predecode
Exp/Opt: Goldmont Plus - 3 way + wider backend, quadrupled(64KB) predecode

New Ideas: Tremont - Clustered, 2x3 way, 128KB predecode, greatly improved branch prediction
Exp/Opt: Gracemont - Improved clustered 2x3 way, so the throughput is effective 6-wide. Predecode cache replaced with OD-ILD that predecodes on the fly for better performance under more demanding workloads and higher area/power efficiency

New Ideas?: Skymont - Clustered 3x3 way + ??
Darkmont - 18A shrink
Exp/Opt: Arctic Wolf - Clustered 3x3 way with backend optimizations?

Gracemont is already superior to Golden Cove in the fetch department: it can fetch 2x32B (2x16B from the OD-ILD) to feed its two clustered decoders, while Golden Cove can only work with 1x32B for its 6-wide decoder.

Not only is SMT going to go away; eventually I suspect uop caches will too, as the primary purpose of the uop cache is to increase clocks while attempting to minimize the branch-misprediction penalties that come with increased pipeline stages.
 

Henry swagger · Senior member · Joined Feb 9, 2022

> After falling out of the spotlight with Silvermont/Airmont, they've been working quietly behind the scenes.
>
> They've been doing a consistent cadence of using new ideas in one generation and optimization/expansion the next.
>
> Bonnell - 2 way in order
> New Ideas: Silvermont - 2 way out of order, lowered pipeline stages
> New Ideas: Goldmont - 3 way out of order (added OoOE FP) + 16KB predecode
> Exp/Opt: Goldmont Plus - 3 way + wider backend, quadrupled (64KB) predecode
>
> New Ideas: Tremont - Clustered, 2x3 way, 128KB predecode, greatly improved branch prediction
> Exp/Opt: Gracemont - Improved clustered 2x3 way, so the throughput is effectively 6-wide. Predecode cache replaced with OD-ILD that predecodes on the fly for better performance under more demanding workloads and higher area/power efficiency
>
> New Ideas?: Skymont - Clustered 3x3 way + ??
> Darkmont - 18A shrink
> Exp/Opt: Arctic Wolf - Clustered 3x3 way with backend optimizations?
>
> Gracemont is already superior to Golden Cove in the fetch department, where it can fetch 2x32B (2x16B from the OD-ILD) to feed the two clustered decoders, while Golden Cove can only work with 1x32B for its 6-wide decoders.
>
> Not only SMT is going to go away, eventually I suspect uop caches will too, as the primary purpose of the uop cache is to increase clocks while attempting to minimize branch misprediction penalties that comes with increased pipeline stages.
Raichu said Skymont will be 8-wide, so 4x4 way, and is targeting Rocket Lake to Golden Cove IPC.
 