Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

As Hot Chips 34 starting this week, Intel will unveil technical information of upcoming Meteor Lake (MTL) and Arrow Lake (ARL), new generation platform after Raptor Lake. Both MTL and ARL represent new direction which Intel will move to multiple chiplets and combine as one SoC platform.

MTL also represents new compute tile that based on Intel 4 process which is based on EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoC in 2023.

ARL will come after MTL so Intel should be shipping it in 2024, that is what Intel roadmap is telling us. ARL compute tile will be manufactured by Intel 20A process, a first from Intel to use GAA transistors called RibbonFET.

Comparison of upcoming Intel's U-series CPU: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	8 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (20A)	Arrow Lake (N3B)	Arrow Lake Refresh (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop Only	Desktop & Mobile H&HX	Desktop Only	Mobile U Only	Mobile H
Process Node	Intel 4	Intel 20A	TSMC N3B	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Q1 2025 ?	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2025 ?	Q4 2024	Q1 2026 ?
Full Die	6P + 8P	6P + 8E ?	8P + 16E	8P + 32E	4P + 4E	4P + 8E
LLC	24 MB	24 MB ?	36 MB ?	?	8 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

igor_kavinski · Apr 22, 2024

coercitiv said:
This was already done by someone in the forum. It's not the first time we're discussing this topic.

Oh no! Why did he stop at 150W???

@Khato , did you do any further testing at higher PL1 values?

Saylick · Apr 22, 2024

C&C is continuing their MTL investigation, this time with an article on the NPU:

Intel Meteor Lake’s NPU

AI is a hot topic and Intel doesn’t want to be left out, so their Meteor Lake mobile processor integrates a Neural Processing Unit (NPU). Intel internally refers to the NPU as “NPU 3720…

chipsandcheese.com

In summary, it looks like the NPU is of limited use because of the data types it supports (or rather, doesn't support). And of the cases it does support, the NPU offers lower power but not necessarily higher performance. The author seems to think that the iGPU is the better approach here because it's more powerful and more flexible, even at the cost of higher power because there's many situations where you can plug in a laptop these days. That is, until Intel develops a NPU which does cover more use cases with higher performance while continuing to use lower power.

Meteor Lake’s NPU is a fascinating accelerator. But its has narrow use cases and benefits. If I used AI day to day, I would run off-the-shelf models on the iGPU and enjoy better performance while spending less time getting the damn thing running. It probably makes sense when trying to stretch battery life, but I find myself almost never running off battery power. Even economy class plane seats have power outlets these days. Hopefully Intel will iterate on both hardware and software to expand NPU use cases going forward. GPU compute evolved over the past 15 years to reach a reasonably usable state today. There’s something magical about seeing work offloaded to a low power block, and I hope the same evolution happens with NPUs.

Khato · Apr 22, 2024

igor_kavinski said:
Oh no! Why did he stop at 150W???

@Khato , did you do any further testing at higher PL1 values?

Negative, didn't see a need to as it'd just continue the same trend. 18W per core is already well into the realm of diminishing efficiency for P cores, and well beyond the maximum the E cores will take.

With respect to SMT, in the 'perfect' workload for it which CB23 is, it's of greatest benefit at the highest power levels. Maybe I'll do a fixed frequency test to confirm actual performance and power increase, but lets assume 1.5x perf for 1.5x power to keep it simple for now. With a fixed 18W per core power limit that means non-SMT runs at the 18W frequency (say 5GHz) while SMT runs at the 12W frequency (say 4.4GHz.) Such results in SMT having a 6.6GHz effective performance, right around 30% above non-SMT. But as power per core goes down to something more like the 4W seen in mobile and servers the V/F curve is nowhere near so punishing, so the benefit of SMT drops.

Basically, my guess is that Intel dropping SMT won't have much of an impact. Improvements in the E-cores are going to negate the losses on the high power desktop side where removal of SMT would otherwise have the greatest impact. Mobile is probably going to come out ahead since the E-cores should be doing most of the multithreading work there anyway. And in server it just draws a clearer line in the segmentation between P-core and E-core based designs.

Oh, and for those concerned about 2 P-cores with ample E-core backup being inadequate for gaming? Don't be. While it's definitely true that many modern games benefit from having 8+ cores they need not be equal in their capabilities to extract that benefit. Most that I've seen still only have 2 'heavy' threads while the remainder are 'light'.

Panino Manino · Apr 22, 2024

Saylick said:
C&C is continuing their MTL investigation, this time with an article on the NPU:

Intel Meteor Lake’s NPU

AI is a hot topic and Intel doesn’t want to be left out, so their Meteor Lake mobile processor integrates a Neural Processing Unit (NPU). Intel internally refers to the NPU as “NPU 3720…

chipsandcheese.com

In summary, it looks like the NPU is of limited use because of the data types it supports (or rather, doesn't support). And of the cases it does support, the NPU offers lower power but not necessarily higher performance. The author seems to think that the iGPU is the better approach here because it's more powerful and more flexible, even at the cost of higher power because there's many situations where you can plug in a laptop these days. That is, until Intel develops a NPU which does cover more use cases with higher performance while continuing to use lower power.

Isn't it enough for Windows tasks, CoPilot, etc?
In theory the NPU may be disappointing, but in practice may give the average user some relevant extra battery time.

Ghostsonplanets · Apr 22, 2024

Panino Manino said:
Isn't it enough for Windows tasks, CoPilot, etc?
In theory the NPU may be disappointing, but in practice may give the average user some relevant extra battery time.

Considering it doesn’t support future AI PC Windows features, it doesn’t seem so for MS.

Intel is growing up both NPU and GPU MatMul in next generations. In special GPU with the debut of of XMX on iGPUs starting with Lunar Lake and Arrow Lake.

DrMrLordX · Apr 23, 2024

Khato said:
Basically, my guess is that Intel dropping SMT won't have much of an impact.

It will in workloads like 3DPM v1 which inflicts frequent cache flushes/pipeline stalls.

nahiyoahz865910900100@out · Apr 23, 2024

And RPL is P core 7.04 / E core cluster 8.78 I believe

DavidC1 · Apr 23, 2024

Hulk said:
Raptor Cove has about 45% or so better IPC than Gracemont when it comes to single core. So while the increased size of Skymont is enticing I find it hard to believe Skymont will approach Raptor Cove IPC, which would be needed for Arrow Lake to compete with Raptor Lake in MT.

Raptor Cove is 40-45% in overall, where the Integer gap is closer and FP gap is large. It's something like 25% Int and 50-60% FP.

"Raptormont" gets 1-3% gain while Crestmont gets 4-6% gain. So a 30% gain with Skymont as did with Atom-based predecessors gets us to the "aiming for ADL" claim on Twitter. I think SKT will be able to reach Golden Cove similar to GMT reaching Skylake.

It means 10-15% faster than Golden Cove for Int while being 10-15% slower in FP. Consequently it means Sierra Glen(which is Crestmont without the 6-wide retire/allocate, or IOTW Gracemont) is similar to 10-15% faster per core than Skylake and is an excellent Cloud core.

If we extrapolate that to Darkmont-based Clearwater Forest, you essentially have an 18A 144-288 core better-than Golden Cove core chip.

On a side note, I speculate the possibility that they aren't backing down on clocks with Skymont hence the greater than expected core size while they are for Lion Cove.

nahiyoahz865910900100@out · Apr 23, 2024

P core: 4.55mm²
E core cluster: 8,1mm²
E core (without L2): 1.52mm²
_________________________________
If that is true, the die size of ARL compute tile would be much smaller than the die size of RPL CPU part, is that right?

dullard · Apr 23, 2024

Panino Manino said:
Isn't it enough for Windows tasks, CoPilot, etc?
In theory the NPU may be disappointing, but in practice may give the average user some relevant extra battery time.

Windows wants 40 TOPS of AI performance as a minimum. Meteor Lake's NPU is 10 TOPS. It is just too slow for what Microsoft wants as a minimum. That is something that won't be available (from any CPU supplier) until the next generation of CPUs later this year.

I appreciate Chips and Cheese going into these tests. But, insisting on 64-bit data for AI seems like totally the wrong approach. AI doesn't usually require that level of precision. AI is all about lots of variables with lots of math at low precision. Even 4-bit and 8-bit math is plenty for many AI applications. 64-bit data would allow for 8x fewer variables in memory and 8x fewer calculations than 8-bit variables. That said, the new drivers for the NPU do include FP64 as of last week, but the drivers now have a significant performance issue. AI, software, drivers, and hardware are still a work in progress. See this post and the one below it that would address Chip and Cheese's FP64 problem: https://github.com/openvinotoolkit/openvino/issues/22846#issuecomment-2056100285

igor_kavinski · Apr 23, 2024

dullard said:
Windows wants 40 TOPS of AI performance as a minimum. Meteor Lake's NPU is 10 TOPS.

The real question is, what do users of existing systems do? Are dedicated NPU PCIe cards coming out soon or do users have to buy expensive AI-accelerated cards from AMD/nVdia with no video outputs?

What about millions of laptops with no easy way to upgrade the GPU? Will they use USB 3.0 AI acceleration devices? Or maybe tethering a mobile device with an NPU to the laptop and offloading AI calculations to that?

Ghostsonplanets · Apr 23, 2024

igor_kavinski said:
The real question is, what do users of existing systems do? Are dedicated NPU PCIe cards coming out soon or do users have to buy expensive AI-accelerated cards from AMD/nVdia with no video outputs?

Desktop users buy an GPU or buy Arrow Lake.

igor_kavinski said:
What about millions of laptops with no easy way to upgrade the GPU? Will they use USB 3.0 AI acceleration devices? Or maybe tethering a mobile device with an NPU to the laptop and offloading AI calculations to that?

They won't get the features, at all. This also includes the latest and greatest from x86 duo: MTL and HWK.

AI PC is partly a push towards AI as MS doesn’t want to miss the next big thing and partly a push to phase out COVID and before laptops sales and push consumers towards new machines. Just think that AI PC push is coming on the heels of W10 EoL. Millions of corportate and consumers will want to replace their old machines.

Gideon · Apr 23, 2024

Finally a Notebookcheck review with a decent gen-on-gen battery life improvement (though the last gen had terrible scores):

https://www.notebookcheck.net/Core-Ultra-7-165U-performance-debut-Dell-Latitude-9450-2-in-1-review.825254.0.html#toc-6

Even a slight win over average 7840U laptops with similar battery, but only just. The full load battery life is impressive (40%) better, but only because the performance is similarily throttled. The sustained Cinebench score is about the same as the aforementioned Ryzen Lenovo in "balanced" mode between 19W and 15W CPU load (compared to 30-25W in high-performance mode). So the actual perf / watt is probably quite similar.

Still a decent generational uplift.

Ghostsonplanets · Apr 23, 2024

Gideon said:
Finally a Notebookcheck review with a decent gen-on-gen battery life improvement (though the last gen had terrible scores):

https://www.notebookcheck.net/Core-Ultra-7-165U-performance-debut-Dell-Latitude-9450-2-in-1-review.825254.0.html#toc-6

Even a slight win over average 7840U laptops with similar battery, but only just. The full load battery life is impressive (40%) better, but only because the performance is similarily throttled. The sustained Cinebench score is about the same as the aforementioned Ryzen Lenovo in "balanced" mode between 19W and 15W CPU load (compared to 30-25W in high-performance mode). So the actual perf / watt is probably quite similar.

Still a decent generational uplift.

One sore point is that idle, once again, regressed GoG. Tiger Lake, my beloved, had much better idle power.

But, yes, MTL-U is fairly impressive jump in efficiency. In this specific model, it basically doubled battery life at load and WLAN testing also had a healthy jump.

Another good point for Intel is that Arc Graphics 64 EUs are basically matching Iris Xe 96EU while also drawing much less power. Good PPA improvement over Iris Xe (Granted, N5 is basically a node ahead of Intel 7).

Ghostsonplanets · Apr 23, 2024

The biggest question point about Meteor U is availability and pricing. So far, there's few models to choose and they're priced above similar AMD options. Also, I saw some models where the difference between the U SKU and the H SKU was only $50 - $100.

IMO Intel needs to drop prices on the U SKUs and higher availability, specially on the low-end side of things (Core Ultra 5 125U and Core Ultra 5 115U (The "i3 1215U of this gen")).

Next year, ARL-U should bring some effiency gains due to Intel 3 CPU and N4 iGPU and also some small performance gains on the P core. But I don't think that's enough to stave AMD/QCOM offerings. Lower price and higher availability will be key for Intel.

But there's not much to worry as Intel are king of volume. Their market reach worldwide far outpaces the others two.

dullard · Apr 23, 2024

igor_kavinski said:
The real question is, what do users of existing systems do? Are dedicated NPU PCIe cards coming out soon or do users have to buy expensive AI-accelerated cards from AMD/nVdia with no video outputs?

What about millions of laptops with no easy way to upgrade the GPU? Will they use USB 3.0 AI acceleration devices? Or maybe tethering a mobile device with an NPU to the laptop and offloading AI calculations to that?

I believe that right now a lot of it (such as Microsoft Copilot) is being run through online servers. Of course with business data, that could be a security risk. So, it would be best to get off of the cloud. And who knows how long these companies will be willing to run their servers for free, so you might want off the cloud eventually anyways.

Gideon · Apr 23, 2024

dullard said:
I believe that right now a lot of it (such as Microsoft Copilot) is being run through online servers. Of course with business data, that could be a security risk. So, it would be best to get off of the cloud. And who knows how long these companies will be willing to run their servers for free, so you might want off the cloud eventually anyways.

Business also run AI on local servers.

With ollama it's dead easy and now with llama 3 released, it's very-very competitive to commercial competitors. The biggest downside of it is, that the 70B parameter model requires at least 40GB of VRAM so you can't run it on any gaming GPUs (the 8B model is fine though).

Anyway this also goes to show why client NPUs are total meme for chatbot tasks. Meteor Lake NPU has 11 TOPS. Windows AI requirement will be 40 TOPS.

Well, RTX 4060 has 242 TOPS, RTX 4090 has 1321 TOPS (and is decent but far from fast enough) and most top end models require 64+ GB of memory (with loads of bandwidth to boot).

Those NPUs have some uses, but "running chatbots locally, thus replacing cloud" aint it

dullard · Apr 23, 2024

Gideon said:
Business also run AI on local servers.
...
Those NPUs have some uses, but "running chatbots locally, thus replacing cloud" aint it

NPUs locally certainly aren't going to be running large language model chatbots any time soon. I however so no reason that I'd ever want to run my own chatbot either. Cloud based chatbots are just fine with me.

Tasks like these would run just fine locally with a bit more NPU power (no one in my work has a GPU to run them on any of our computers, all iGPUs here over hundreds of computers) and I would like to use them regularly :

What did I miss in that meeting that I had to miss because I was double booked?
What are my assignments from the last month?
Summarize this document.
Put all my emails from project ABC into a new folder.
Is there any correlation between these two data sets?
Highlight this data set in the chart better.
Does my data have an unusual/suspicious pattern indicating fraud?
Is there a pattern of issues on the production line indicating part EFG is out of alignment?
Are any of the bottles on the conveyor belt cracked?
Etc.

AI is more than chat.

Gideon · Apr 23, 2024

dullard said:
I however so no reason that I'd ever want to run my own chatbot either. Cloud based chatbots are just fine with me.

Yeah, they are fine for many, but plenty of companies has strict-enough rules/contracts that disallow them tot share any company's (or client's) data with these platforms.

dullard said:
AI is more than chat.

Agreed, that's why I mentioned they have some uses.

I'll end the OT talk now.

dullard · Apr 23, 2024

Gideon said:
Yeah, they are fine for many, but plenty of companies has strict-enough rules/contracts that disallow them tot share any company's (or client's) data with these platforms.

I guess my point didn't go through. I think AI as a chatbot has very limited business usage. AI in general has lots of business use cases. Your posts made it sound like just some usage was other things and that chatbot were where it is at. Maybe it was your 4 paragraphs talking about chatbots and then one line of "have some uses" made me think you decided the other AI uses were not very important.

Any way you look at it though, Meteor Lake's NPU isn't sufficient for most uses.

igor_kavinski · Apr 24, 2024

dullard said:
Any way you look at it though, Meteor Lake's NPU isn't sufficient for most uses.

Oh no. Intel's been selling MTL "beta AI" CPUs all this time! Shock! Surprise! Shock!

dullard · Apr 24, 2024

Gideon said:
With ollama it's dead easy and now with llama 3 released, it's very-very competitive to commercial competitors. The biggest downside of it is, that the 70B parameter model requires at least 40GB of VRAM so you can't run it on any gaming GPUs (the 8B model is fine though).

Good timing for this discussion. Microsoft just announced Phi-3. https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/

Phi-3-mini uses 3.8 billion parameters, half as much as Llama-3-8B, yet slightly higher quality.
Phi-3-small uses 7 billion parameters
Phi-3-medium uses 14 billion parameters--nothing like that 70 billion parameter model you discussed.

This is why I mentioned limited use for large language models on individual CPUs. Who really needs a local chatbot that can discuss everything? If you are a medical clinic, just train on medical texts and skip training on the fire codes of Zimbabwe. If you are selling widgets at a hardware store, just train on plumbing and electrical and construction text and skip texts on the history of 11th century Asia. These mini to medium models can be run locally on many of the new CPUs coming out later this year.

Gideon · Apr 24, 2024

dullard said:
Good timing for this discussion. Microsoft just announced Phi-3.

Yeah I actually wanted to discuss it, as it was released yesterday but i said i'll end the OT.

Phi3's smaller versions are probably something they will embed to windows, but tune for os assistance rather than chat.

The 14b medium model truly looks excellent, but i'd wait for more benchmarks before claiming it's across the board better than llama3 though, MS probably cherry picked the benchmark to show it in the strongest light

AMDK11 · Apr 25, 2024

I made a discovery about the LionCove core!

https://pbs.twimg.com/media/GK3IUxmbkAAzu8N?format=jpg&name=4096x4096

Apparently, the LionCove core graphics do not represent anything specific. Only apparently!

Enlarge the LionCove graphic and compare it to the Redwood Cove diagram!

https://images.anandtech.com/doci/20046/Architecting%20Our%20Next%20Gen%20Power%20Efficient%20Processor_FINAL%20CLEAN-12.png

Looks familiar? Yes!

LunarLake art includes an unlabeled diagram of the LionCove core!!!

From what I've read so far from LionCove's daigram:
8-Way Dispatch/Rename (GoldenCove 6-Way)
6x AGU + 2x Store/Data (GoldenCove 5x AGU + 2x SD)
6x ALU + 4x ALU-FP or 6x ALU + 4x FPU!!! (GoldenCove 3x ALU-FP + 2x ALU)

If these 10 execution ports are 4x ALU-FP + 6x ALU, this gives a gigantic amount of 10x ALU! Which, together with AGU and SD, gives 18 execution ports compared to 12 ports for GoldenCove.

Edit:
It seems that Skymont has a 3x 3-Way decoder (Gracemont and Crestmont 2x 3-Way).

https://elchapuzasinformatico.com/wp-content/uploads/2024/04/Intel-Lunar-Lake-Dark-Silicon-Tile-Dummy.jpg

Cheesecake16 · Apr 25, 2024

AMDK11 said:
I made a discovery about the LionCove core!

https://pbs.twimg.com/media/GK3IUxmbkAAzu8N?format=jpg&name=4096x4096
Apparently, the LionCove core graphics do not represent anything specific. Only apparently!

Enlarge the LionCove graphic and compare it to the Redwood Cove diagram!

https://images.anandtech.com/doci/20046/Architecting%20Our%20Next%20Gen%20Power%20Efficient%20Processor_FINAL%20CLEAN-12.png
Looks familiar? Yes!

LunarLake art includes an unlabeled diagram of the LionCove core!!!

From what I've read so far from LionCove's daigram:
8-Way Dispatch/Rename (GoldenCove 6-Way)
6x AGU + 2x Store/Data (GoldenCove 5x AGU + 2x SD)
6x ALU + 4x ALU-FP or 6x ALU + 4x FPU!!! (GoldenCove 3x ALU-FP + 2x ALU)

If these 10 execution ports are 4x ALU-FP + 6x ALU, this gives a gigantic amount of 10x ALU! Which, together with AGU and SD, gives 18 execution ports compared to 12 ports for GoldenCove.

Edit:
It seems that Skymont has a 3x 3-Way decoder (Gracemont and Crestmont 2x 3-Way).

https://elchapuzasinformatico.com/wp-content/uploads/2024/04/Intel-Lunar-Lake-Dark-Silicon-Tile-Dummy.jpg

I think you are reading way too much into this diagram.....
8 way Dispatch and Rename seems reasonable as does the 6 AGUs and 2 Store Data ports likely split evenly between a separate 4 port load schedule and a 4 port store scheduler like Golden Cove.

But the assumption that you are going to have a single 10 port scheduler for the Unified Math Scheduler is insane when you consider the amount of routing that a 10 port scheduler would require.....
It's much more likely that what it will be is a 6 port scheduler where all 6 ports have integer ALUs and 4 of the ports also have floating point ALUs, again just like Golden Cove, which gives a total of 14 ports (6 for the unified math scheduler, 4 for the load scheduler, and 4 for the store scheduler).

Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Senior member

Attachments

Lifer

Diamond Member

Golden Member

Senior member

Senior member

Lifer

Junior Member

Member

Junior Member

Elite Member

Lifer

Senior member

Golden Member

Senior member

Senior member

Elite Member

Golden Member

Elite Member

Golden Member

Elite Member

Lifer

Elite Member

Golden Member

Senior member

Junior Member