Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Page 775

Tigerick

Senior member
Apr 1, 2022
720
677
106






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

 | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H
Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E
LLC | 24 MB | 36 MB ? | 12 MB | ?
tCPU | 66.48 | | |
tGPU | 44.45 | | |
SoC | 96.77 | | |
IOE | 44.45 | | |
Total | 252.15 | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png
    283.5 KB · Views: 24,021
  • LNL.png
    881.8 KB · Views: 25,510

OneEng2

Senior member
Sep 19, 2022
512
742
106
What P/E/LPE core count mix has been leaked so far at 32C/32T for Nova Lake?

Also, note that there has been a 52 core variant mentioned in news media lately.


What DDR5 bandwidth are you assuming, and why would the cores necessarily be bandwidth limited? I think that varies a lot per use case. In many use cases you don’t need that much bandwidth, as the cores are doing heavy CPU crunching on a limited data set.
I see the same rumors as you; however, I can find no reason to imagine that ONLY a 52 core variant will be created (this seems like a financial nightmare for the company).

The utility of a 52 core variant also seems very questionable to the end customer. It is a solution to a problem that very few users have meaning that very few users would be willing to pay for it.

The use cases where a high core count product IS useful seem to require lots of bandwidth. It therefore makes no sense to build a product with so many cores only to starve it in the only situations where it would be useful.

This is just my speculation.
 

dullard

Elite Member
May 21, 2001
25,763
4,289
126
When you multiply that $40 by 100 million units, suddenly it isn't a laughing matter at all.
I personally cannot fathom a single scenario where Intel sells 100 million of the 52-core variant of Nova Lake. AMD + Intel combined often sell 70 million to 80 million desktop/workstation CPUs per year. Intel's portion would be smaller, and the portion of just one rumored top-end desktop CPU would be even smaller than that. Divide your number by 10 at least (probably more) to get a much more realistic value. Plus, again, Intel isn't the one buying the memory AND the new memory premium fades quickly after a few months.
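To put rough numbers on the scaling argument above, here is a minimal Python sketch. It treats the $40 from the quoted post as a flat per-unit figure and applies only the "divide by 10" adjustment; the names `PREMIUM_USD` and `total_premium` are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope arithmetic for the quoted "$40 x 100 million units" claim.
# Assumption: the $40 is a flat per-unit premium; the real figure is not known.
PREMIUM_USD = 40


def total_premium(units: int, premium_usd: int = PREMIUM_USD) -> int:
    """Total premium in USD for a given number of units."""
    return units * premium_usd


claimed_units = 100_000_000          # volume used in the quoted post
scaled_units = claimed_units // 10   # "divide your number by 10 at least"

if __name__ == "__main__":
    print(f"claimed volume: ${total_premium(claimed_units):,}")
    print(f"scaled volume:  ${total_premium(scaled_units):,}")
```

At the claimed volume the total is $4 billion; at one tenth of that volume it is $400 million, and smaller still if the real volume is lower.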
I see the same rumors as you; however, I can find no reason to imagine that ONLY a 52 core variant will be created (this seems like a financial nightmare for the company).

The utility of a 52 core variant also seems very questionable to the end customer. It is a solution to a problem that very few users have meaning that very few users would be willing to pay for it.

The use cases where a high core count product IS useful seem to require lots of bandwidth. It therefore makes no sense to build a product with so many cores only to starve it in the only situations where it would be useful.

This is just my speculation.
There never was a rumor for only a 52-core variant. Here are rumors for 16 core and 28 core versions.

Here is another rumor from our own source that adds 24 core, 12 core, and 4 core variants.

Plus, I can't think of a recent time where Intel sold only one desktop variant without lower end models with fewer cores. Take Arrow Lake for example. There is Ultra 9 with 8P + 16E, Ultra 7 with 8P + 12E, and Ultra 5 that is either 6P + 8E or 6P + 4E. The rumored 52-core, if it exists, is only for the very top SK processor (or maybe X if Intel brings that back).
 
Reactions: Thunder 57

Win2012R2

Senior member
Dec 5, 2024
892
852
96
With Integer numbers either 8 numbers of 64b magnitude can be processed or 16 numbers of 32b magnitude, do the maths.
I asked how big the register size is, here is what Intel says about it:

"Intel AVX-512 features include 32 vector registers each 512 bits wide, eight dedicated mask registers, 512-bit operations on packed floating point data or packed integer data"


It does not matter how you logically divide a SIMD register: the size stays the same. SSE has 128-bit registers because they physically hold 128 bits. Here is a quote for you:

"The SSE registers are 128 bits, and can be used to perform operations on a variety of data sizes and types."


Class dismissed.
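The relationship between register width and element count being argued here can be sketched in a few lines of Python. This is only an illustration of the arithmetic (register bits divided by element bits); the helper name `lanes` is made up for the example, and the 256-bit AVX width is the YMM register width rather than a number taken from the thread.

```python
# Architectural register widths in bits. MMX, SSE and AVX-512 are the widths
# quoted in the thread; 256 for AVX is the YMM register width.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of packed elements of element_bits that fit in one register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


if __name__ == "__main__":
    for name, bits in REGISTER_BITS.items():
        per_width = ", ".join(f"{lanes(bits, w)} x {w}b" for w in (8, 16, 32, 64))
        print(f"{name:8s} {bits:3d}-bit register: {per_width}")
```

Whatever the logical partition, the product of lane count and element width is always the register width, which is the point both sides keep restating.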
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
In a way, you no longer really have registers; you have ports. Everything is out of order and pipelined. This doesn't mean there aren't issues and workarounds that give insight into what is happening inside the pipeline. For example, early Intel implementations would slow down when you mixed SSE, AVX or AVX512 in any combination. This was because Intel would execute the whole register file regardless of the size of the operand. So if you mixed a 256b operation with a 128b operation, the 128b operation would garble the upper bits of the 256b operation. This required extra flushes, which stalled the pipeline. Intel seems to have solved this in their latest few generations. AMD never seemed to have this issue.

Another caveat: AMD was able to implement an AVX512 system with 256b ports (7000 series). This was not a perfect solution but worked surprisingly well. Some instructions have to operate across lanes (128b boundaries), and if a lane from one port had to exchange data with another port, you can see the issue.

class in session
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
Myopic! A register has evolved into a special type of operand. Registers are renamed on the fly to different memory/port areas. There isn't a special area where EAX or any other register lives like there used to be. So all those MMX registers you were espousing about sharing, shared.
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
I am talking here about maximum physical register size expressed in bits, which for AVX-512 is 512 bits, for SSE is 128 bits and for MMX is 64 bits max.

Actually it would be an 128b lane. In AVX256/512 you can't operate across a lane. You can transfer data across them but that's a special operation.
 

Win2012R2

Senior member
Dec 5, 2024
892
852
96
Actually it would be an 128b lane.

No mate, it's a register because Intel says so -

"10.2.2 XMM Registers

Eight 128-bit XMM data registers were introduced into the IA-32 architecture with Intel SSE (see Figure 10-2). These registers can be accessed directly using the names XMM0 to XMM7; and they can be accessed independently from the x87 FPU and MMX registers and the general-purpose registers (that is, they are not aliased to any other of the processor’s registers).
"

Source: https://cdrdv2.intel.com/v1/dl/getContent/671200

Are we clear?
 

Abwx

Lifer
Apr 2, 2011
11,783
4,691
136
No mate, it's a register because Intel says so -

"10.2.2 XMM Registers

Eight 128-bit XMM data registers were introduced into the IA-32 architecture with Intel SSE (see Figure 10-2). These registers can be accessed directly using the names XMM0 to XMM7; and they can be accessed independently from the x87 FPU and MMX registers and the general-purpose registers (that is, they are not aliased to any other of the processor’s registers).
"

Source: https://cdrdv2.intel.com/v1/dl/getContent/671200

Are we clear?
That changes nothing; the end result is the same. Like I said, four 32b operands are available at any point, and that's not 128b ops, yet another confusion of yours.
SSE used only a single data type for XMM registers:



 

Win2012R2

Senior member
Dec 5, 2024
892
852
96
Are we in a modern thread talking about Lakes or back in y2k?
I'll go with whatever Intel's reference guide says.
That changes nothing; the end result is the same. Like I said, four 32b operands are available at any point, and that's not 128b ops, yet another confusion of yours.
What you said does not contradict what I said - it's a 128 bit register that can be 4x32, so what? AVX-512 can do 64x8 - does it make operations 8 bit or it's still operations over 512 bits of data, whatever the logical partition is?

Obviously in SIMD you got multiple data so of course register size is EXPECTED to be much bigger than data size and the key parameter is how big the register size is, thus indicating maximum throughput.
 
Reactions: Nothingness

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
I'll go with whatever Intel's reference guide says.

What you said does not contradict what I said - it's a 128 bit register that can be 4x32, so what? AVX-512 can do 64x8 - does it make operations 8 bit or it's still operations over 512 bits of data, whatever the logical partition is?

Obviously in SIMD you got multiple data so of course register size is EXPECTED to be much bigger than data size and the key parameter is how big the register size is, thus indicating maximum throughput.
PS the pdf you linked to only goes to Westmere EX.

This is like saying back in the day intel declared that the registers will be so and never evolve to meet the demands of modern processing. Register renaming will never happen. There will never be creative workarounds to get more out of the instruction set. The line must be drawn here.
 

Win2012R2

Senior member
Dec 5, 2024
892
852
96
PS the pdf you linked to only goes to Westmere EX.
Wot, they changed SSE register size in later CPU models?
Register renaming will never happen.
That was not under discussion at all. Do you have any evidence that renamed registers are any different in size than those available explicitly? I doubt it very much; they'll have to be exactly the same size (width), as in number of bits, which is, once again, 128 bits for SSE.
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
Wot, they changed SSE register size in later CPU models?

That was not under discussion at all. Do you have any evidence that renamed registers are any different in size than those available explicitly? I doubt it very much; they'll have to be exactly the same size (width), as in number of bits, which is, once again, 128 bits for SSE.
No, they mapped AVX over SSE, which was the reason I expounded above for the slowdown and extra flush issues. They later solved this, as stated above.

In the last decade they virtualized registers such that they do not exist in one place, or more specifically can exist in more than one place at once as long as their dependencies are met.

If you replace "register" with "operand" everywhere you say it, much more of what you say will be true.
 

Win2012R2

Senior member
Dec 5, 2024
892
852
96
No they mapped AVX over SSE
So what, does this change the fact that SSE registers are 128 bit, yes or no?

You are getting deep into implementation details and history lessons, none of which change the spec published by Intel, which is pretty clear on what a register is and how wide it is.

MMX, SSE, AVX - all SIMD: Single Instruction (or op) working on Multiple Data - apart from different instructions that key differentiating factor is how much of the Multiple Data an op can work on - 64 bit, 128 bit, 256 bits, since this is the main thing it's totally legit to say "128-bit op in SSE" which clearly refers to Single bloody Instruction (or an op) working over 128 bits of data, even if it does same 32 bit numbers, the key part is that there are MORE of them.

Anyway, sod this for a game of soldiers, I am off to pub!
 

Abwx

Lifer
Apr 2, 2011
11,783
4,691
136
I'll go with whatever Intel's reference guide says.

What you said does not contradict what I said - it's a 128 bit register that can be 4x32, so what? AVX-512 can do 64x8 - does it make operations 8 bit or it's still operations over 512 bits of data, whatever the logical partition is?

Obviously in SIMD you got multiple data so of course register size is EXPECTED to be much bigger than data size and the key parameter is how big the register size is, thus indicating maximum throughput.
That was all theoretical because the data path was still 64b, so only half of the SSE throughput was actually possible, and even then only if FMUL was distributed evenly with FPADD. I guess the main objective was to counter 3DNow at any cost, even by duping the uninformed, and apparently it has worked to this very day.



To compensate partially for implementing only half of SSE's architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together bringing the peak throughput back to four floating point operations per cycle, at least for code with an even distribution of multiplies and adds.

The issue was that Katmai's hardware-implementation contradicted the parallelism model implied by the SSE instruction-set. Programmers faced a code-scheduling dilemma: "Should the SSE-code be tuned for Katmai's limited execution resources, or should it be tuned for a future processor with more resources?"
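The peak-throughput claim in the quoted passage can be checked with a deliberately simplified model. The Python sketch below assumes a half-width multiplier and a half-width adder on separate ports, fully independent instructions, and perfect overlap between the two units; the function name and the model itself are illustrative, not a description of Katmai's actual scheduler.

```python
# Simplified model of a half-width SSE implementation as described in the quote.
# Assumptions (hypothetical, for illustration only): every packed single-precision
# instruction covers 4 lanes, each unit handles 2 lanes per cycle, the multiplier
# and adder sit on separate ports, and all instructions are independent.
LANES_PER_INSTRUCTION = 4
LANES_PER_UNIT_PER_CYCLE = 2
CYCLES_PER_INSTRUCTION = LANES_PER_INSTRUCTION // LANES_PER_UNIT_PER_CYCLE


def flops_per_cycle(n_mul: int, n_add: int) -> float:
    """Sustained FP operations per cycle for a mix of packed multiplies and adds."""
    # The two units run in parallel, so the busier unit sets the cycle count.
    cycles = max(n_mul, n_add) * CYCLES_PER_INSTRUCTION
    if cycles == 0:
        return 0.0
    return LANES_PER_INSTRUCTION * (n_mul + n_add) / cycles


if __name__ == "__main__":
    print("even mix:      ", flops_per_cycle(100, 100))
    print("multiplies only:", flops_per_cycle(100, 0))
    print("3:1 mul:add:   ", flops_per_cycle(75, 25))
```

Under these assumptions an even mix reaches four operations per cycle, while a stream of only multiplies (or only adds) is capped at two, which matches the "at least for code with an even distribution" caveat.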

 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
So what, does this change the fact that SSE registers are 128 bit, yes or no?

You are getting deep into implementation details and history lessons, none of which change the spec published by Intel, which is pretty clear on what a register is and how wide it is.

MMX, SSE, AVX - all SIMD: Single Instruction (or op) working on Multiple Data - apart from different instructions that key differentiating factor is how much of the Multiple Data an op can work on - 64 bit, 128 bit, 256 bits, since this is the main thing it's totally legit to say "128-bit op in SSE" which clearly refers to Single bloody Instruction (or an op) working over 128 bits of data, even if it does same 32 bit numbers, the key part is that there are MORE of them.

Anyway, sod this for a game of soldiers, I am off to pub!

Before Pentium, registers were exact memory locations of fixed size. Smaller operands within the registers were mapped to the lower bits. From Pentium to Nehalem, registers were remapped to a register file using mostly fixed-size micro-ops. From Sandy Bridge on, a Physical Register File mapped instruction data to micro-ops to retirement registers. Pointers were used to map physical memory to registers, so data no longer moved as much. From then onward buffers grew, execution pipelines widened and grew in number, allowed dependencies increased, and so on. Nowadays a register in code is just a name like any other variable. If the peg fits in the hole, it comes out the other side.
 
Reactions: moinmoin

Nothingness

Diamond Member
Jul 3, 2013
3,292
2,357
136
I'm really surprised people fight over SSE vs 3dnow, some depending on their preference for Intel or AMD. What I will always remember is 64-bit x86, which was designed by AMD; this had a much higher impact on x86 than any SIMD extension.

Also register banks exist. They can even be spotted on floor plans. They are obviously much larger than what the ISA requires due to renaming.
 

MS_AT

Senior member
Jul 15, 2024
599
1,253
96
In a way now you don't really have registers
That is not exactly true, is it? I mean register file still has entries and limited number of these, of course much more than architectural registers, but the number of entries is fixed. I mean on Zen5 you do not have twice as many registers available when you switch from 512b to 256b operands, or between instruction sets. And the registers from different sets are aliased on top of each other (zmm on ymm on xmm). The same is true for scalar registers and scalar register file.

I think people started to confuse too many things here: some were arguing about register sizes according to the ISA, others about physical entries in the register file, some added renaming to the mix, others were talking about architectural register count, etc. At least I ended up confused, reading the last few pages, about what people are really arguing about.
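The architectural aliasing mentioned above (zmm on ymm on xmm) can be modeled with a single 512-bit integer per register name. The sketch below is a toy model of the ISA-visible behavior only: a legacy SSE write leaves the upper bits of the wider register untouched, while a VEX-encoded 128-bit write zeroes them. The class and method names are invented for this example, and nothing here describes how a physical register file stores the data.

```python
# Toy model of architectural aliasing: xmmN is the low 128 bits of ymmN,
# which is the low 256 bits of zmmN. Names are hypothetical.
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1
MASK512 = (1 << 512) - 1


class VectorReg:
    def __init__(self) -> None:
        self.bits = 0  # full 512-bit architectural state

    @property
    def xmm(self) -> int:
        return self.bits & MASK128

    @property
    def ymm(self) -> int:
        return self.bits & MASK256

    @property
    def zmm(self) -> int:
        return self.bits

    def write_zmm(self, value: int) -> None:
        self.bits = value & MASK512

    def write_xmm_legacy_sse(self, value: int) -> None:
        # Legacy SSE encoding: bits 128 and above are preserved.
        self.bits = (self.bits & (MASK512 ^ MASK128)) | (value & MASK128)

    def write_xmm_vex(self, value: int) -> None:
        # VEX-encoded 128-bit write: bits 128 and above are zeroed.
        self.bits = value & MASK128


if __name__ == "__main__":
    r = VectorReg()
    r.write_zmm(MASK512)
    r.write_xmm_legacy_sse(0)
    print("upper bits kept after legacy SSE write:", r.zmm == MASK512 ^ MASK128)
    r.write_xmm_vex(5)
    print("upper bits cleared after VEX write:   ", r.zmm == 5)
```

The preserved upper bits in the legacy case are one reason mixing legacy SSE and AVX code has historically needed care, since the new value depends on the old contents of the wider register.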
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
That is not exactly true, is it? I mean register file still has entries and limited number of these, of course much more than architectural registers, but the number of entries is fixed. I mean on Zen5 you do not have twice as many registers available when you switch from 512b to 256b operands, or between instruction sets. And the registers from different sets are aliased on top of each other (zmm on ymm on xmm). The same is true for scalar registers and scalar register file.

I think people started to confuse here too many things, some were arguing about register sizes according to ISA, other about physical entries in register file, some added renaming to the mix, other were talking about architectural register count etc. At least I ended up confused reading last few pages what people are really arguing about

They are aliased by name. If there is no dependence between them, they can coexist at the same time with the same name, and only the last one used will represent the reported value if queried. Do you have to copy the final value to a specific location, or can it be determined by reference? That is left to the implementation. On Zen 5 desktop you have 384 512b vector entries. On Zen 5 mobile you have 240 512b + ~144 256b. So you don't have twice as many, but on mobile you have more if you use fewer bits.
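For a sense of scale, the entry counts quoted in this post can be turned into total storage with a short Python sketch. The figures are taken as stated in the post (including the approximate 144) and are not independently verified; the names `ZEN5_DESKTOP`, `ZEN5_MOBILE`, `total_entries` and `total_bits` are made up for the illustration.

```python
# Vector register file sizes as quoted in the post: (entry count, width in bits).
# The mobile split is approximate ("~144") and unverified.
ZEN5_DESKTOP = [(384, 512)]
ZEN5_MOBILE = [(240, 512), (144, 256)]


def total_entries(layout) -> int:
    return sum(count for count, _ in layout)


def total_bits(layout) -> int:
    return sum(count * width for count, width in layout)


if __name__ == "__main__":
    for name, layout in (("desktop", ZEN5_DESKTOP), ("mobile", ZEN5_MOBILE)):
        bits = total_bits(layout)
        print(f"{name}: {total_entries(layout)} entries, "
              f"{bits} bits = {bits / 8 / 1024:.1f} KiB")
```

With these numbers both layouts have 384 entries, but the mobile layout holds about 19.5 KiB against 24 KiB on desktop, so the entry count stays the same while the wide-entry capacity shrinks.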

The same is true for immediate and referenced values. They have scope and mapping and can be reused/mapped over. Is a referenced value any different if used and discarded?

If you look at the opcodes spit out by compilers they continuously reuse the same register references over and over rarely stacking more than a couple depths. Does this reuse block execution? No. They just allocate another virtual register from the physical file making sure writes to memory are made in order.
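The renaming idea described above, where each new write to the same architectural name gets a fresh physical entry, can be sketched as follows. This is a minimal toy under stated assumptions: physical entries are never freed at retirement, there is no memory ordering, and the `Renamer` class and its methods are hypothetical names, not any vendor's design.

```python
# Toy register renamer: every write to an architectural register allocates a
# new physical entry, so reusing a name does not overwrite an older value that
# in-flight instructions may still read. Simplification: entries are never freed.
class Renamer:
    def __init__(self, arch_regs, n_physical: int) -> None:
        if n_physical < len(arch_regs):
            raise ValueError("need at least one physical entry per architectural register")
        self.free = list(range(n_physical))
        # Initial mapping: each architectural name points at its own entry.
        self.table = {name: self.free.pop(0) for name in arch_regs}

    def rename(self, dst: str, srcs):
        """Return (new physical entry for dst, physical entries read for srcs)."""
        src_phys = [self.table[s] for s in srcs]  # read mappings before updating dst
        if not self.free:
            raise RuntimeError("no free physical entry (real hardware would stall)")
        phys = self.free.pop(0)
        self.table[dst] = phys
        return phys, src_phys


if __name__ == "__main__":
    r = Renamer(["eax", "ebx"], n_physical=8)
    print(r.rename("eax", ["ebx"]))  # first write to eax
    print(r.rename("eax", ["ebx"]))  # second write gets a different entry
    print(r.rename("ebx", ["eax"]))  # reads the most recent eax mapping
```

The two writes to `eax` land in different physical entries, so neither has to wait for the other; only a true read-after-write dependency, as in the third instruction, ties instructions together.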

This whole argument started with the 3dnow and MMX registers. Those were fixed sized without masks and you can say how big they are.

Now we have very wide registers with masks. How big is an SSE register? On Zen 5 it can be 512b with a mask set to 128b. On Zen 5 mobile it can be both that and a 256b.

Yes this is very nit picky. Am I right? Probably not exactly. But asking how big a register is, isn't exactly cut and dry these days anyways.
 