Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	12 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop & Mobile H&HX	Mobile U Only	Mobile H
Process Node	Intel 4	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2024	Q1 2026 ?
Full Die	6P + 8E	8P + 16E	4P + 4E	4P + 8E
LLC	24 MB	36 MB ?	12 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

Win2012R2 · Apr 24, 2025

Abwx said:
That's why there are 4x 32b registers, which you confuse as being 128b.

OK, how big are the registers in AVX-512, then?

OneEng2 · Apr 24, 2025

Fjodor2001 said:
What P/E/LPE core count mix has been leaked so far at 32C/32T for Nova Lake?

Also, note that there has been a 52 core variant mentioned in news media lately.

What DDR5 bandwidth are you assuming, and why would the cores necessarily be bandwidth limited? I think that varies a lot per use case. In many use cases you don’t need that much bandwidth, as the cores are doing heavy CPU crunching on a limited data set.

I see the same rumors as you; however, I can find no reason to imagine that ONLY a 52 core variant will be created (this seems like a financial nightmare for the company).

The utility of a 52 core variant also seems very questionable to the end customer. It is a solution to a problem that very few users have, meaning that very few users would be willing to pay for it.

The use cases where a high core count product IS useful seem to require lots of bandwidth. It therefore makes no sense to build a product with so many cores only to starve it in the only situations where it would be useful.

This is just my speculation.

Abwx · Apr 24, 2025

Win2012R2 said:
OK, how big are the registers in AVX-512, then?

With integers, either 8 numbers of 64b magnitude or 16 numbers of 32b magnitude can be processed; do the maths.

dullard · Apr 24, 2025

OneEng2 said:
When you multiply that 40$ by 100 million units, suddenly it isn't a laughing matter at all.

I personally cannot fathom a single scenario where Intel sells 100 million of the 52-core variant of Nova Lake. AMD + Intel combined often sell 70 million to 80 million desktop / workstation CPUs per year. Intel's portion would be smaller, and the portion of just one rumored top-end desktop CPU would be even smaller than that. Divide your number by 10 at least (probably more) to get a much more realistic value. Plus, again, Intel isn't the one buying the memory AND the new memory premium fades quickly after a few months.
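To put rough numbers on that scaling, here is a back-of-the-envelope sketch. The $40 per-unit premium and the 100 million figure come from the quoted post; the 10 million figure is just "divide by 10", and `total_premium` is a hypothetical helper, not anyone's actual cost model.

```python
# Per-unit memory premium in USD, taken from the quoted post (not verified).
PREMIUM_PER_UNIT_USD = 40


def total_premium(units: int) -> int:
    """Total premium in USD across `units` CPUs at the quoted per-unit figure."""
    return PREMIUM_PER_UNIT_USD * units


quoted_volume = 100_000_000        # volume assumed in the quoted post
scaled_volume = quoted_volume // 10  # "divide your number by 10 at least"

print(f"quoted volume: ${total_premium(quoted_volume):,}")
print(f"scaled volume: ${total_premium(scaled_volume):,}")
```

Even after dividing by 10 the total is still a large number, but it is spread across memory buyers rather than landing on Intel.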

OneEng2 said:
I see the same rumors as you; however, I can find no reason to imagine that ONLY a 52 core variant will be created (this seems like a financial nightmare for the company).

The utility of a 52 core variant also seems very questionable to the end customer. It is a solution to a problem that very few users have, meaning that very few users would be willing to pay for it.

The use cases where a high core count product IS useful seem to require lots of bandwidth. It therefore makes no sense to build a product with so many cores only to starve it in the only situations where it would be useful.

This is just my speculation.

There never was a rumor for only a 52-core variant. Here are rumors for 16 core and 28 core versions.

https://videocardz.com/newz/intel-nova-lake-preliminary-desktop-specs-list-52-cores-16p32e4lp-configuration

Here is another rumor from our own source that adds 24 core, 12 core, and 4 core variants.

https://videocardz.com/newz/intel-nova-lake-s-for-desktops-rumored-to-feature-2x8p16e-configuration

Plus, I can't think of a recent time where Intel sold only one desktop variant without lower end models with fewer cores. Take Arrow Lake for example. There is Ultra 9 with 8P + 16E, Ultra 7 with 8P + 12E, and Ultra 5 that is either 6P + 8E or 6P + 4E. The rumored 52-core, if it exists, is only for the very top SK processor (or maybe X if Intel brings that back).

Win2012R2 · Apr 24, 2025

Abwx said:
With integers, either 8 numbers of 64b magnitude or 16 numbers of 32b magnitude can be processed; do the maths.

I asked how big the register size is, here is what Intel says about it:

"Intel AVX-512 features include 32 vector registers each 512 bits wide, eight dedicated mask registers, 512-bit operations on packed floating point data or packed integer data"

Source: Intel® AVX-512 Instructions (www.intel.com)

It does not matter how you logically divide a SIMD register - the size stays the same. That is why SSE is said to have 128-bit registers: they physically hold 128 bits. Here is a quote for you:

"The SSE registers are 128 bits, and can be used to perform operations on a variety of data sizes and types."

Source: x86 Assembly/SSE - Wikibooks (en.wikibooks.org)

Class dismissed.

LightningZ71 · Apr 24, 2025

I wouldn't be shocked if there is a "recovery" product that has 12+24+4 (or some other combination of e-cores).

Schmide · Apr 24, 2025

In a way now you don't really have registers, you have ports. Everything is out of order and pipelined. That doesn't mean there aren't issues and workarounds that give insight into what is happening inside the pipeline. For example, early Intel implementations would slow down when you mixed SSE, AVX or AVX512 in any combination. This was because Intel would execute on the whole register file regardless of the size of the operand. So if you mixed a 256b operation with a 128b operation, the 128b operation would garble the upper bits of the 256b operation. This required extra flushes, which stalled the pipeline. Intel seems to have solved this in their latest few generations. AMD never seemed to have this issue.

Another caveat: AMD was able to implement an AVX512 system with 256b ports (7000 series). This was not a perfect solution but worked surprisingly well. Some instructions have to operate across lanes (128b boundaries), and if a lane from one port has to exchange data with another port, you can see the issue.

class in session

Win2012R2 · Apr 24, 2025

Schmide said:
In a way now you don't really have registers, you have ports

Intel calls them 512 bit registers. What's there to argue about? It's a settled, indisputable fact from the horse's mouth.

Schmide said:
AMD was able to implement an AVX512 system with 256b ports

They used full 512 bit registers.

Schmide · Apr 24, 2025

Myopic! Register has evolved to a special type of operand. Registers are renamed on the fly to different memory/port areas. There isn't a special area where EAX or any other register lives, like there used to be. So all those MMX registers you were espousing about sharing: shared.

Win2012R2 · Apr 24, 2025

Schmide said:
Register has evolved to a special type of operand

I am talking here about maximum physical register size expressed in bits - which for AVX-512 is 512 bits, for SSE is 128 bits and for MMX is 64 bits max.

Schmide · Apr 24, 2025

Win2012R2 said:
I am talking here about maximum physical register size expressed in bits - which for AVX-512 is 512 bits, for SSE is 128 bits and for MMX is 64 bits max.

Actually it would be a 128b lane. In AVX256/512 you can't operate across a lane. You can transfer data across them, but that's a special operation.

Win2012R2 · Apr 24, 2025

Schmide said:
Actually it would be a 128b lane.

No mate, it's a register because Intel says so -

"10.2.2 XMM Registers

Eight 128-bit XMM data registers were introduced into the IA-32 architecture with Intel SSE (see Figure 10-2). These registers can be accessed directly using the names XMM0 to XMM7; and they can be accessed independently from the x87 FPU and MMX registers and the general-purpose registers (that is, they are not aliased to any other of the processor’s registers)."

Source: https://cdrdv2.intel.com/v1/dl/getContent/671200

Are we clear?

Schmide · Apr 24, 2025

Are we in a modern thread talking about Lakes or back in y2k?

Abwx · Apr 24, 2025

Win2012R2 said:
No mate, it's a register because Intel says so -

"10.2.2 XMM Registers

Eight 128-bit XMM data registers were introduced into the IA-32 architecture with Intel SSE (see Figure 10-2). These registers can be accessed directly using the names XMM0 to XMM7; and they can be accessed independently from the x87 FPU and MMX registers and the general-purpose registers (that is, they are not aliased to any other of the processor’s registers)."

Source: https://cdrdv2.intel.com/v1/dl/getContent/671200

Are we clear?

That changes nothing; the end result is the same. Like I said, four 32b operands are available at any point, and that's not 128b ops, yet another confusion of yours.

SSE used only a single data type for XMM registers:

four 32-bit single-precision floating-point numbers

Source: Streaming SIMD Extensions - Wikipedia (en.wikipedia.org)

Win2012R2 · Apr 24, 2025

Schmide said:
Are we in a modern thread talking about Lakes or back in y2k?

I'll go with whatever Intel's reference guide says.

Abwx said:
That changes nothing; the end result is the same. Like I said, four 32b operands are available at any point, and that's not 128b ops, yet another confusion of yours.

What you said does not contradict what I said - it's a 128 bit register that can be 4x32, so what? AVX-512 can do 64x8 - does it make operations 8 bit or it's still operations over 512 bits of data, whatever the logical partition is?

Obviously in SIMD you got multiple data so of course register size is EXPECTED to be much bigger than data size and the key parameter is how big the register size is, thus indicating maximum throughput.
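As a quick arithmetic check of the register-width versus element-width point, here is a minimal sketch. The widths are the architectural register sizes cited in this thread (AVX's 256 bits is the standard YMM width); `lanes` is a hypothetical helper for illustration, not part of any Intel API.

```python
# Architectural SIMD register widths in bits, as cited in the thread.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of packed elements of `element_bits` that fit in one register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide register width")
    return register_bits // element_bits


for name, bits in REGISTER_BITS.items():
    # Same register, different logical partitions: the width never changes.
    counts = {f"{e}b": lanes(bits, e) for e in (8, 16, 32, 64)}
    print(f"{name:8s} {bits:3d}-bit register -> {counts}")
```

The same 512-bit register holds 8 x 64b, 16 x 32b or 64 x 8b elements; only the partitioning changes, which is the crux of both sides of the argument.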

Schmide · Apr 24, 2025

Win2012R2 said:
I'll go with whatever Intel's reference guide says.

What you said does not contradict what I said - it's a 128 bit register that can be 4x32, so what? AVX-512 can do 64x8 - does it make operations 8 bit or it's still operations over 512 bits of data, whatever the logical partition is?

Obviously in SIMD you got multiple data so of course register size is EXPECTED to be much bigger than data size and the key parameter is how big the register size is, thus indicating maximum throughput.

PS the pdf you linked to only goes to Westmere EX.

This is like saying that back in the day Intel declared the registers will be so and never evolve to meet the demands of modern processing. Register renaming will never happen. There will never be creative workarounds to get more out of the instruction set. The line must be drawn here.

Win2012R2 · Apr 24, 2025

Schmide said:
PS the pdf you linked to only goes to Westmere EX.

Wot, they changed SSE register size in later CPU models?

Schmide said:
Register renaming will never happen.

That was not under discussion at all. Do you have any evidence that renamed registers are any different in size than those available explicitly? I doubt it very much; they'll have to be exactly the same size (width) in number of bits, which is, once again, 128 bits for SSE.

Schmide · Apr 24, 2025

Win2012R2 said:
Wot, they changed SSE register size in later CPU models?

That was not under discussion at all. Do you have any evidence that renamed registers are any different in size than those available explicitly? I doubt it very much; they'll have to be exactly the same size (width) in number of bits, which is, once again, 128 bits for SSE.

No, they mapped AVX over SSE, which was the reason I gave above for the slowdown and extra flush issues. They later solved this, as stated above.

In the last decade they virtualized registers such that they do not exist in one place, or more specifically can exist in more than one place at once as long as their dependencies are met.

If you replace "register" with "operand" everywhere you say it, much more of what you say will be true.

Fjodor2001 · Apr 24, 2025

OneEng2 said:
The use cases where a high core count product IS useful seem to require lots of bandwidth. It therefore makes no sense to build a product with so many cores only to starve it in the only situations where it would be useful.

Not according to this a few pages back in this thread.

Win2012R2 · Apr 24, 2025

Schmide said:
No they mapped AVX over SSE

So what, does this change the fact that SSE registers are 128 bit, yes or no?

You are getting deep into implementation details and history lessons, none of which change the spec published by Intel, which is pretty clear on what a register is and how wide it is.

MMX, SSE, AVX - all SIMD: Single Instruction (or op) working on Multiple Data - apart from different instructions that key differentiating factor is how much of the Multiple Data an op can work on - 64 bit, 128 bit, 256 bits, since this is the main thing it's totally legit to say "128-bit op in SSE" which clearly refers to Single bloody Instruction (or an op) working over 128 bits of data, even if it does same 32 bit numbers, the key part is that there are MORE of them.

Anyway, sod this for a game of soldiers, I am off to pub!

Abwx · Apr 24, 2025

Win2012R2 said:
I'll go with whatever Intel's reference guide says.

What you said does not contradict what I said - it's a 128 bit register that can be 4x32, so what? AVX-512 can do 64x8 - does it make operations 8 bit or it's still operations over 512 bits of data, whatever the logical partition is?

Obviously in SIMD you got multiple data so of course register size is EXPECTED to be much bigger than data size and the key parameter is how big the register size is, thus indicating maximum throughput.

That was all theoretical because the data path was still 64b, so only half of the SSE throughput was actually possible, and even then only if FMUL was distributed evenly with FPADD. I guess the main objective was to counter 3DNow! at any cost, even by duping the uninformed, and apparently it has worked to this very day.

To compensate partially for implementing only half of SSE's architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together bringing the peak throughput back to four floating point operations per cycle - at least for code with an even distribution of multiplies and adds. The issue was that Katmai's hardware-implementation contradicted the parallelism model implied by the SSE instruction-set. Programmers faced a code-scheduling dilemma: "Should the SSE-code be tuned for Katmai's limited execution resources, or should it be tuned for a future processor with more resources?"

Source: Pentium III - Wikipedia (en.wikipedia.org)
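The throughput arithmetic in that quote can be sketched as follows. This is a simplified peak-rate model under stated assumptions (one issue per unit per cycle, no latency or scheduling effects); `peak_flops_per_cycle` and the constants are illustrative names, not taken from any Intel document.

```python
FLOAT_BITS = 32             # single-precision element width
SSE_REGISTER_BITS = 128     # architectural XMM width
KATMAI_DATAPATH_BITS = 64   # "half of SSE's architectural width", per the quote


def peak_flops_per_cycle(datapath_bits: int, units_issuing: int) -> int:
    """Peak single-precision ops per cycle: lanes per unit times units kept busy."""
    return (datapath_bits // FLOAT_BITS) * units_issuing


# Even mix of multiplies and adds: the multiplier and the separate adder both issue.
mixed = peak_flops_per_cycle(KATMAI_DATAPATH_BITS, units_issuing=2)
# Multiply-only (or add-only) code: a single 64-bit unit is busy.
single_kind = peak_flops_per_cycle(KATMAI_DATAPATH_BITS, units_issuing=1)
# Hypothetical full-width unit, for comparison.
full_width = peak_flops_per_cycle(SSE_REGISTER_BITS, units_issuing=1)

print(mixed, single_kind, full_width)
```

Under this model the half-width design only reaches the four operations per cycle mentioned in the quote when multiplies and adds are evenly mixed.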

Schmide · Apr 24, 2025

Win2012R2 said:
So what, does this change the fact that SSE registers are 128 bit, yes or no?

You are getting deep into implementation details and history lessons, none of which change the spec published by Intel, which is pretty clear on what a register is and how wide it is.

MMX, SSE, AVX - all SIMD: Single Instruction (or op) working on Multiple Data - apart from different instructions that key differentiating factor is how much of the Multiple Data an op can work on - 64 bit, 128 bit, 256 bits, since this is the main thing it's totally legit to say "128-bit op in SSE" which clearly refers to Single bloody Instruction (or an op) working over 128 bits of data, even if it does same 32 bit numbers, the key part is that there are MORE of them.

Anyway, sod this for a game of soldiers, I am off to pub!

Before the Pentium, registers were exact memory locations of fixed size. Smaller operands within the registers were mapped to the lower bits. From the Pentium to Nehalem, registers were remapped to a register file using mostly fixed-size micro-ops. From Sandy Bridge on, a Physical Register File mapped instruction data to micro-ops to retirement registers. Pointers were used to map physical memory to registers, so data no longer moved as much. From then onward buffers grew, execution pipelines widened and grew in number, allowed dependencies increased, and so on. Nowadays a register in code is just a name like any other variable. If the peg fits in the hole, it comes out the other side.

Nothingness · Apr 24, 2025

I'm really surprised people fight over SSE vs 3dnow, some depending on their preference for Intel or AMD. What I will always remember is 64-bit x86, which was designed by AMD; this had a much higher impact on x86 than any SIMD extension.

Also register banks exist. They can even be spotted on floor plans. They are obviously much larger than what the ISA requires due to renaming.

MS_AT · Apr 24, 2025

Schmide said:
In a way now you don't really have registers

That is not exactly true, is it? I mean register file still has entries and limited number of these, of course much more than architectural registers, but the number of entries is fixed. I mean on Zen5 you do not have twice as many registers available when you switch from 512b to 256b operands, or between instruction sets. And the registers from different sets are aliased on top of each other (zmm on ymm on xmm). The same is true for scalar registers and scalar register file.
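The aliasing of xmm on ymm on zmm can be illustrated with plain integers. This is only a sketch of the architectural view (XMMn is the low 128 bits of YMMn, which is the low 256 bits of ZMMn); `low_bits` is a hypothetical helper and says nothing about how any physical register file is laid out.

```python
def low_bits(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits of `value`."""
    return value & ((1 << bits) - 1)


# Model a 512-bit zmm0 whose byte i holds the value i (little-endian).
zmm0 = int.from_bytes(bytes(range(64)), "little")

# The narrower names are views of the low part of the same architectural register.
ymm0 = low_bits(zmm0, 256)
xmm0 = low_bits(zmm0, 128)

print(xmm0.to_bytes(16, "little").hex())
print(ymm0.to_bytes(32, "little").hex())
```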

I think people have started to confuse too many things here: some were arguing about register sizes according to the ISA, others about physical entries in the register file, some added renaming to the mix, others were talking about architectural register count, etc. At least I ended up confused, reading the last few pages, about what people are really arguing about.

Schmide · Apr 24, 2025

MS_AT said:
That is not exactly true, is it? I mean register file still has entries and limited number of these, of course much more than architectural registers, but the number of entries is fixed. I mean on Zen5 you do not have twice as many registers available when you switch from 512b to 256b operands, or between instruction sets. And the registers from different sets are aliased on top of each other (zmm on ymm on xmm). The same is true for scalar registers and scalar register file.

I think people have started to confuse too many things here: some were arguing about register sizes according to the ISA, others about physical entries in the register file, some added renaming to the mix, others were talking about architectural register count, etc. At least I ended up confused, reading the last few pages, about what people are really arguing about.

They are aliased by name. If there is no dependence between them, they can coexist at the same time with the same name, and only the last one used will represent the reported value if queried. Do you have to copy the final value to a specific location, or can it be determined by reference? That is left to the implementation. On Zen 5 desktop you have 384 512b vector entries. On Zen 5 mobile you have 240 512b + ~144 256b. So you don't have twice as many, but on mobile you have more if you use fewer bits.
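For scale, the entry counts above work out as follows. This is a sketch that takes the post's figures at face value (they are not verified here, and the mobile 256b count is approximate); `file_bits` and the two lists are illustrative names.

```python
def file_bits(entries: list[tuple[int, int]]) -> int:
    """Total storage in bits for a list of (entry_count, entry_width_bits)."""
    return sum(count * width for count, width in entries)


# Figures as given in the post above; treat them as assumptions.
ZEN5_DESKTOP = [(384, 512)]
ZEN5_MOBILE = [(240, 512), (144, 256)]

for name, layout in (("desktop", ZEN5_DESKTOP), ("mobile", ZEN5_MOBILE)):
    total_entries = sum(count for count, _ in layout)
    kib = file_bits(layout) / 8 / 1024
    print(f"Zen 5 {name}: {total_entries} entries, {kib:.1f} KiB")
```

On these numbers both layouts expose the same total entry count, but the mobile one spends less storage because part of the file is only 256b wide.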

The same is true for immediate and referenced values. They have scope and mapping and can be reused/mapped over. Is a referenced value any different if used and discarded?

If you look at the opcodes spit out by compilers they continuously reuse the same register references over and over rarely stacking more than a couple depths. Does this reuse block execution? No. They just allocate another virtual register from the physical file making sure writes to memory are made in order.
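A toy model of that renaming step, to make the idea concrete. This is a deliberately simplified sketch: the `Renamer` class, its free list and its method names are hypothetical, and it omits retirement (which is where real hardware returns physical registers to the free list).

```python
class Renamer:
    """Toy register renamer: maps architectural names to physical entries."""

    def __init__(self, num_physical: int) -> None:
        self.free = list(range(num_physical))  # unallocated physical entries
        self.mapping: dict[str, int] = {}      # architectural name -> newest entry

    def write(self, arch: str) -> int:
        """Allocate a fresh physical entry for a new value of `arch`."""
        if not self.free:
            raise RuntimeError("stall: no free physical registers")
        phys = self.free.pop(0)
        self.mapping[arch] = phys
        return phys

    def read(self, arch: str) -> int:
        """Return the physical entry holding the newest value of `arch`."""
        return self.mapping[arch]


renamer = Renamer(num_physical=8)
first = renamer.write("xmm0")   # first value written to xmm0
second = renamer.write("xmm0")  # reuse of the same name gets a new entry
print(first, second, renamer.read("xmm0"))
```

Both writes to `xmm0` live in different physical entries, so the second does not have to wait for readers of the first; later reads of `xmm0` see only the newest mapping.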

This whole argument started with the 3dnow and MMX registers. Those were fixed sized without masks and you can say how big they are.

Now we have very wide registers with masks. How big is an SSE register? On Zen 5 it can be 512b with a mask set to 128b. On Zen 5 mobile it can be both that and a 256b.

Yes, this is very nitpicky. Am I right? Probably not exactly. But asking how big a register is isn't exactly cut and dried these days anyway.
