Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	8 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (20A)	Arrow Lake (N3B)	Arrow Lake Refresh (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop Only	Desktop & Mobile H&HX	Desktop Only	Mobile U Only	Mobile H
Process Node	Intel 4	Intel 20A	TSMC N3B	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Q1 2025 ?	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2025 ?	Q4 2024	Q1 2026 ?
Full Die	6P + 8E	6P + 8E ?	8P + 16E	8P + 32E	4P + 4E	4P + 8E
LLC	24 MB	24 MB ?	36 MB ?	?	8 MB	?
tCPU (mm²)	66.48
tGPU (mm²)	44.45
SoC (mm²)	96.77
IOE (mm²)	44.45
Total (mm²)	252.15

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

eek2121 · Dec 30, 2023

SiliconFly said:
x86-64 apps on ARM require virtualization. Not exactly an ideal situation, as the user experience suffers. We can get most apps working to a certain level, but it requires patience. Not just MS; the top software companies should start coming out with native ARM apps too. Native is still the best.

Emulation, not virtualization. Important difference. The entirety of both the x86 and x86-64 instruction sets has to be emulated. Virtualization runs most non-privileged instructions natively and thus can execute code at near-native speed.

You are looking at a good 30-50% performance hit for emulated applications.

TESKATLIPOKA said:
MTL is a massive battery life win? It clearly depends on the laptop; the Acer Swift Go 14 SFG14-72 ended up much worse in comparison despite having the same CPU.
On the other hand, MSI did a great job of selecting power-efficient components and putting the biggest battery possible inside.

BTW, this was in what you posted.
View attachment 91196
Raptor Lake U managed 954 minutes with only a 68 Wh battery. If I normalize it to 99.9 Wh, it should manage 1401 minutes.
Not really an apples-to-apples comparison for either the CPU or the laptop, but I wanted to show that battery life doesn't really depend on the CPU, because these chips are very efficient at light loads, but more on the rest of the components (SSD, RAM, display).

A few here have mentioned a microcode update which improves performance (I haven’t looked into it). Not every vendor apparently has implemented this. I would check back sometime in January. At least some benchmarks at various reviewers should be updated.
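The battery normalization in the quoted post above can be checked with a few lines of Python. This is only a sketch: it assumes runtime scales linearly with battery capacity, it treats the quoted 68 and 99.9 figures as capacities in Wh, and the function name `normalize_runtime` is made up for illustration.

```python
def normalize_runtime(minutes: float, battery_wh: float, target_wh: float) -> float:
    """Scale a measured runtime linearly to a different battery capacity.

    Assumes average power draw stays the same, so runtime is proportional
    to capacity. Real laptops differ in display, SSD, RAM and so on.
    """
    return minutes * target_wh / battery_wh


# Figures taken from the quoted post: 954 minutes on a 68 Wh battery.
raptor_lake_u_minutes = normalize_runtime(954, 68.0, 99.9)

# Average power draw implied by the same measurement (Wh divided by hours).
implied_avg_watts = 68.0 / (954 / 60)

print(f"Normalized runtime: {raptor_lake_u_minutes:.1f} minutes")
print(f"Implied average draw: {implied_avg_watts:.2f} W")
```

The normalized figure lands at roughly 1401 minutes, matching the quoted number, and the implied average draw of a little over 4 W supports the point that the whole platform, not just the CPU, sets the result at light load.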

eek2121 · Dec 30, 2023

coercitiv said:
A single thread that can be perfectly sliced by RU can be sliced perfectly by the programmer or the compiler as well. You asked for the ideal situation, that's what happens when things are pushed to the limit.

RU has been around in various forms for many years and nobody has been able to make it work. I will be surprised if Intel does.

SiliconFly · Dec 31, 2023

eek2121 said:
RU has been around in various forms for many years and nobody has been able to make it work. I will be surprised if Intel does.

Totally agree. That's what I was thinking too. Getting RU to work is very far-fetched. And the only person who's spoken about RU is MLID (similar articles were based on his leak). So it makes me think: is the leak even legit? Looks like a dud.

SiliconFly · Dec 31, 2023

coercitiv said:
A single thread that can be perfectly sliced by RU can be sliced perfectly by the programmer or the compiler as well. You asked for the ideal situation, that's what happens when things are pushed to the limit.

Programmers? Compilers? I thought we were talking about runtime code execution by the CPU, not writing and compiling the code.

TESKATLIPOKA · Dec 31, 2023

mikk said:
Cinebench scores from a Thinkpad Carbon: https://www.bilibili.com/video/BV1PQ4y1J7K9/?spm_id_from=333.337.search-card.all.click

Sustained power around 30W and PL2 55W (6:45).

Those scores don't look like they are for a sustained 30W.
Both the CB R15 and R23 scores are practically the same as on the MSI Prestige, which uses the same CPU and has a 110W PL2 and 44W PL1.

TESKATLIPOKA · Dec 31, 2023

eek2121 said:
A few here have mentioned a microcode update which improves performance (I haven’t looked into it). Not every vendor apparently has implemented this. I would check back sometime in January. At least some benchmarks at various reviewers should be updated.

You do realize I am talking about Wi-Fi battery life and not performance, right?
What does it have in common with that microcode? You expect it will also increase Wi-Fi battery life?

SiliconFly · Dec 31, 2023

TESKATLIPOKA said:
You do realize I am talking about Wi-Fi battery life and not performance, right?
What does it have in common with that microcode? You expect it will also increase Wi-Fi battery life?

I remember reading that the new pcode update also increases battery life.

naukkis · Dec 31, 2023

coercitiv said:
A single thread that can be perfectly sliced by RU can be sliced perfectly by the programmer or the compiler as well. You asked for the ideal situation, that's what happens when things are pushed to the limit.

No, it can't. The rentable units approach can slice a workload within a thread; if the implementation shares a register file, slicing can happen at the register level. But fully utilizing such an approach needs either a really complex front-end or an instruction set that supports an execution partitioning scheme.

maddie · Dec 31, 2023

It seems that "Rentable units" is simply "Intelese" for the old concept of "reverse hyperthreading". Soft Machines and its technology were discussed here many years ago and AnandTech even had an article. AMD, Intel and several other big names were investors until Intel bought the company outright. I always wondered when we would see real-world results and I guess it will be soon.

For those interested and not wanting to make ridiculous ignorant claims, here it is. https://www.anandtech.com/print/10025/examining-soft-machines-architecture-visc-ipc

mikk · Dec 31, 2023

TESKATLIPOKA said:
Those scores don't look like they are for a sustained 30W.
Both the CB R15 and R23 scores are practically the same as on the MSI Prestige, which uses the same CPU and has a 110W PL2 and 44W PL1.

The MSI BIOS is almost 3 months old; does it even run with the pcode update? HWiNFO shows 30W current and 55W max in their video. One CB R23 run at this score takes maybe 50 seconds, so keep in mind that PL2 has a big effect if it's a first-run score. The 155H at 55W can do slightly over 16K with the updated pcode. If it ran for 30 seconds at 55W, it could reach a high-14K score.
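A rough way to see how a PL2 burst lifts a first-run score is a time-weighted average of two throughput levels. The sketch below is an assumption-laden illustration, not a model of Cinebench: the 16K burst level, 30-second burst and 50-second run come from the post above, while the 13,000 sustained-30W level is a purely hypothetical placeholder, and `blended_score` is a made-up helper name.

```python
def blended_score(burst_score: float, sustained_score: float,
                  burst_seconds: float, run_seconds: float) -> float:
    """Time-weighted average of two throughput levels over a single run.

    Treats a benchmark score as a rate: the run spends `burst_seconds`
    at the PL2 level and the remainder at the sustained level.
    """
    burst = min(burst_seconds, run_seconds)  # burst cannot outlast the run
    sustained = run_seconds - burst
    return (burst * burst_score + sustained * sustained_score) / run_seconds


# 16000 at 55W, 30 s burst, 50 s run: from the post.
# 13000 at 30W: hypothetical value chosen only to illustrate the effect.
first_run = blended_score(16000, 13000, burst_seconds=30, run_seconds=50)
print(first_run)
```

With those placeholder inputs the blend comes out at 14,800, which is why a single short run can look much better than the sustained power limit alone would suggest.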

SiliconFly · Dec 31, 2023

maddie said:
It seems that "Rentable units" is simply "Intelese" for the old concept of "reverse hyperthreading". Soft Machines and its technology were discussed here many years ago and AnandTech even had an article. AMD, Intel and several other big names were investors until Intel bought the company outright. I always wondered when we would see real-world results and I guess it will be soon.

For those interested and not wanting to make ridiculous ignorant claims, here it is. https://www.anandtech.com/print/10025/examining-soft-machines-architecture-visc-ipc

Nothing ridiculous or ignorant about any of the claims. They're perfectly in line with the article you posted. A very interesting piece. It makes one thing very clear: RU may actually be real and might debut with a future Intel product. That's all.

It says exactly what was discussed before: slicing a single thread into smaller pieces and executing the pieces simultaneously across multiple cores. Soft Machines' implementation uses a rather complex and novel approach that we hadn't come across earlier.

It uses a translation layer to translate the existing machine code into its own proprietary instructions and then feeds that translated code to a Global Front End (more like splitter+scheduler combined) which then slices the thread into smaller pieces and feeds the pieces to multiple virtual cores (probably emulated).

As discussed earlier, this has the potential to increase ST performance a lot. We do not know the performance penalty of this approach, but the article says it's not much (???). So, if Intel manages to successfully combine 2 cores into a single cluster with RU, it can increase a thread's ST performance beyond a single core's ST performance. Maybe by up to 2X under "ideal" conditions. Just guessing.

But the implementation looks rather messy and might over-complicate the architecture and turn it into a sh%t show. Just my opinion.

Also, a hypothetical i9-13900K with RU can have a single core Geekbench 6 score of 5000 or above!

naukkis · Dec 31, 2023

maddie said:
It seems that "Rentable units" is simply "Intelese" for the old concept of "reverse hyperthreading". Soft Machines and its technology were discussed here many years ago and AnandTech even had an article. AMD, Intel and several other big names were investors until Intel bought the company outright. I always wondered when we would see real-world results and I guess it will be soon.

For those interested and not wanting to make ridiculous ignorant claims, here it is. https://www.anandtech.com/print/10025/examining-soft-machines-architecture-visc-ipc

VISC is that instruction set supporting execution partitioning. It ain't coming to x86. But as CPU front-ends are now extremely complex, such a front-end could theoretically extract two independent instruction streams within loops and execute them in independent execution units. Such an approach would probably need a shared register file, or at least direct input-output ports between register files, to be able to extract performance benefits. Relying on normal data loads/stores would leave such an approach only competing for resources. But it's certainly doable for x86 too.
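As a source-level analogy for "two independent instruction streams within loops", the toy Python below has one loop carrying two dependency chains that never read each other's results, so they can be peeled into separate streams without changing the answer. This is only an illustration of the idea; it says nothing about how Intel, Soft Machines, or any real front-end would do this on machine instructions, and both function names are invented.

```python
def fused_loop(values: list[int]) -> tuple[int, int]:
    """One loop, two dependency chains that never touch each other."""
    total = 0
    largest = values[0]
    for v in values:
        total += v                  # chain A: needs only the previous total
        largest = max(largest, v)   # chain B: needs only the previous largest
    return total, largest


def split_streams(values: list[int]) -> tuple[int, int]:
    """The same work as two separate streams, which could run on two units."""
    total = 0
    for v in values:                # stream A
        total += v
    largest = values[0]
    for v in values:                # stream B
        largest = max(largest, v)
    return total, largest


data = [3, 1, 4, 1, 5]
print(fused_loop(data), split_streams(data))
```

Because neither chain consumes the other's output, no value has to cross between the two streams. Once a loop does pass values between chains, that traffic is what the shared register file or direct ports mentioned above would have to carry.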

maddie · Dec 31, 2023

SiliconFly said:
Nothing ridiculous or ignorant about any of the claims. They're perfectly in line with the article you posted. A very interesting piece. It makes one thing very clear: RU may actually be real and might debut with a future Intel product. That's all.

It says exactly what was discussed before: slicing a single thread into smaller pieces and executing the pieces simultaneously across multiple cores. Soft Machines' implementation uses a rather complex and novel approach that we hadn't come across earlier.

It uses a translation layer to translate the existing machine code into its own proprietary instructions and then feeds that translated code to a Global Front End (more like splitter+scheduler combined) which then slices the thread into smaller pieces and feeds the pieces to multiple virtual cores (probably emulated).

As discussed earlier, this has the potential to increase ST performance a lot. We do not know the performance penalty of this approach, but the article says it's not much (???). So, if Intel manages to successfully combine 2 cores into a single cluster with RU, it can increase a thread's ST performance beyond a single core's ST performance. Maybe by up to 2X under "ideal" conditions. Just guessing.

But the implementation looks rather messy and might over-complicate the architecture and turn it into a sh%t show. Just my opinion.

Seems I read this recently. 8X 1T performance increase for 8 core CPU?

LightningZ71 · Dec 31, 2023

It does look like something that could be used with a cluster of whatever the current e-cores evolve into. They already share an L2, so there's rather tight integration between them already, though I realize that RU goes FAR beyond that...

SiliconFly · Dec 31, 2023

maddie said:
Seems I read this recently. 8X 1T performance increase for 8 core CPU?

Nope. Hypothetically speaking, if Intel had an 8-core RU implementation, then yes. In the real world, no. I don't think they even have an 8-core RU concept on the drawing board as of now. Maybe a 2-core RU implementation like what one leak suggested.
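To put the "8X in theory, not in the real world" point into numbers, here is a minimal sketch that borrows Amdahl's law as a stand-in model. Amdahl's law is not something the thread or the article uses, the sliceable fractions and the overhead term are hypothetical, and `ru_speedup` is an invented name.

```python
def ru_speedup(cores: int, sliceable_fraction: float, overhead: float = 0.0) -> float:
    """Amdahl-style single-thread speedup for a hypothetical RU cluster.

    sliceable_fraction: share of the thread's work that can be split
                        across cores (1.0 is the "ideal" case).
    overhead:           extra time, as a fraction of the original run,
                        spent on slicing and moving data between cores.
    """
    serial = 1.0 - sliceable_fraction
    return 1.0 / (serial + sliceable_fraction / cores + overhead)


print(ru_speedup(8, 1.0))   # ideal 8-core cluster
print(ru_speedup(2, 1.0))   # ideal 2-core cluster
print(round(ru_speedup(8, 0.5), 2))   # only half the work can be sliced
```

Under this model the ideal 8-core and 2-core cases give exactly 8x and 2x, while an 8-core cluster that can only slice half of the thread lands below 2x, before any overhead is counted.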

maddie · Dec 31, 2023

naukkis said:
VISC is that instruction set supporting execution partitioning. It ain't coming to x86. But as CPU front-ends are now extremely complex, such a front-end could theoretically extract two independent instruction streams within loops and execute them in independent execution units. Such an approach would probably need a shared register file, or at least direct input-output ports between register files, to be able to extract performance benefits. Relying on normal data loads/stores would leave such an approach only competing for resources. But it's certainly doable for x86 too.

That's why it says VISC-like. A new concept is often described in terms of known things, as the new unique words don't exist yet.

SiliconFly · Dec 31, 2023

maddie said:
That's why it says VISC-like. A new concept is often described in terms of known things, as the new unique words don't exist yet.

naukkis said:
VISC is that instruction set supporting execution partitioning. It ain't coming to x86. But as CPU front-ends are now extremely complex, such a front-end could theoretically extract two independent instruction streams within loops and execute them in independent execution units. Such an approach would probably need a shared register file, or at least direct input-output ports between register files, to be able to extract performance benefits. Relying on normal data loads/stores would leave such an approach only competing for resources. But it's certainly doable for x86 too.

...It uses a translation layer to translate the existing machine code into its own proprietary instructions and then feeds that translated code to a Global Front End...

This tech can work on any processor. ARM or Intel.

maddie · Dec 31, 2023

SiliconFly said:
Nope. Hypothetically speaking, if Intel had an 8-core RU implementation, then yes. In the real world, no. I don't think they even have an 8-core RU concept on the drawing board as of now. Maybe a 2-core RU implementation like what one leak suggested.

Do you think I'm claiming the 800% speedup?

SiliconFly said:
...It uses a translation layer to translate the existing machine code into its own proprietary instructions and then feeds that translated code to a Global Front End...

This tech can work on any processor. ARM or Intel.

Yes, as nearly all, if not all, computing technologies.

naukkis · Dec 31, 2023

SiliconFly said:
...It uses a translation layer to translate the existing machine code into its own proprietary instructions and then feeds that translated code to a Global Front End...

This tech can work on any processor. ARM or Intel.

Sure, they could have a Transmeta-style execution layer which then also distributes that code across multiple cores. But we haven't yet had a Transmeta-style CPU whose single-core performance rivals hardware-based CPUs, so that route surely won't go anywhere. VISC needs data partitioning in the instruction set to work.

TESKATLIPOKA · Dec 31, 2023

mikk said:
The MSI BIOS is almost 3 months old; does it even run with the pcode update? HWiNFO shows 30W current and 55W max in their video. One CB R23 run at this score takes maybe 50 seconds, so keep in mind that PL2 has a big effect if it's a first-run score. The 155H at 55W can do slightly over 16K with the updated pcode. If it ran for 30 seconds at 55W, it could reach a high-14K score.

From that image you originally posted, the viewer would think that it was at 30W for the whole run, but it wasn't, because PL2 was affecting it quite a bit.
That was my point.

rtxtwt · Dec 31, 2023

SPECint ST efficiency comparison

https://twitter.com/x/status/1741390546562883694

coercitiv · Dec 31, 2023

SiliconFly said:
Nothing ridiculous or ignorant about any of the claims.

SiliconFly said:
Also, a hypothetical i9-13900K with RU can have a single core Geekbench 6 score of 5000 or above!

Grabs popcorn.

SiliconFly · Dec 31, 2023

naukkis said:
Sure, they could have a Transmeta-style execution layer which then also distributes that code across multiple cores. But we haven't yet had a Transmeta-style CPU whose single-core performance rivals hardware-based CPUs, so that route surely won't go anywhere. VISC needs data partitioning in the instruction set to work.

Actually, the article specifically states that they've overcome the Transmeta-style translation bottleneck. The more worrying part is not the translation layer but the virtual cores themselves. God only knows their overhead, but the article again claims even that issue has been solved. That doesn't sound very feasible to me, and I believe the overheads will be very significant. Not sure.

SiliconFly · Dec 31, 2023

coercitiv said:
Grabs popcorn.

That's the whole point of this discussion. Rentable Units. Sounds like magic, 'cos it may not be real at all. Intel may never get it to work due to the complexities involved. But if it is real, then these are the expected numbers.

The whole discussion that's happening is based on "If it is real". Maybe. Maybe not. Who knows!

Either way, time to give RU a break. Even if it's real, we aren't gonna see it for many years.

SiliconFly · Dec 31, 2023

maddie said:
Do you think I'm claiming the 800% speedup?

What I meant was: in a hypothetical scenario where RU exists in the real world, and Intel manages to get it working and to overcome all the issues related to its complexity, that's the theoretical maximum we can expect from an 8-core cluster with RU. Just hypothesizing. It's to explain the significance of RU. Nothing more. Not a real-world scenario.
