Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Page 776

Tigerick

Senior member
Apr 1, 2022
720
677
106






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction for Intel: multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, Intel's first process to use EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

Spec | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H
Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E
LLC | 24 MB | 36 MB ? | 12 MB | ?
tCPU (mm²) | 66.48 | | |
tGPU (mm²) | 44.45 | | |
SoC (mm²) | 96.77 | | |
IOE (mm²) | 44.45 | | |
Total (mm²) | 252.15 | | |



Intel Core Ultra 100 - Meteor Lake



As reported by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 


MS_AT

Senior member
Jul 15, 2024
599
1,253
96
But asking how big a register is isn't exactly cut and dried these days anyway.
This, I think, is where we disagree. While the physical entry size in the register file is up to the implementation, the SSE architectural register size is defined to be 128b (the xmm registers). AVX-512 with the VL extension supports xmm, ymm (256b) and zmm (512b) registers, each with a well-defined bit width. How this is implemented in HW is another matter.

They just allocate another virtual register from the physical file, making sure writes to memory are made in order.
The compiler cannot allocate more registers than there are architectural registers. It will spill to the stack as soon as it thinks it has used all of them. Renaming is opaque to the compiler and is used by the OoO engine to solve other problems, such as write-after-write or write-after-read hazards. And it is the OoO engine, not the compiler, that makes sure the results are observable in program order.
 

Win2012R2

Senior member
Dec 5, 2024
893
852
96
How big is an SSE register?
128 bits, as per Intel, who invented it and documented it aeons ago.

What's your number?

Am I right?

Wrong, because you are still confusing the hardware implementation with the ISA spec. Sure, there are lots of registers now, and obviously they have to be contiguous, so a 512-bit register can be used for 256-bit purposes. So what? Does that mean SSE is suddenly 512-bit rather than 128-bit because a modern CPU's hardware decided to use one of its 512-bit registers?

No, it's still 128-bit, as per the bloody spec.

On Zen 5 it can be 512b with a mask set to 128b. On Zen 5 mobile it can be both that and 256b.

No, AVX-512 registers are 512 bits wide on Zen 4, Zen 5 and Zen 5c, and on Zen 6 they will also be 512 bits wide. In fact, even on Zen 1000 they will have to be 512 bits wide, because that's how they are specified!
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
If you look at the opcodes spit out by compilers, they reuse the same register references over and over, rarely stacking more than a couple deep. Does this reuse block execution? No. The CPU just allocates another virtual register from the physical file, making sure writes to memory are made in order.

The compiler cannot allocate more registers than there are architectural registers. It will spill to the stack as soon as it thinks it has used all of them. Renaming is opaque to the compiler and is used by the OoO engine to solve other problems, such as write-after-write or write-after-read hazards. And it is the OoO engine, not the compiler, that makes sure the results are observable in program order.

You took this out of context. I said the compiler is reusing the same registers over and over.

Example code:

Code:
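#include <stdio.h>

int main(void) {
    int xvals[8] = {0,1,2,3,4,5,6,7};
    int end = sizeof(xvals) / sizeof(int);
    int i = 0;
    do {
        xvals[i]++;   /* load the element, add 1, store it back */
    } while (++i < end);
    printf("%d\n", xvals[end - 1]);
    return 0;
}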

The loop, minus the array setup:

Code:
.L2:
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     eax, DWORD PTR [rbp-48+rax*4]
        lea     edx, [rax+1]
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     DWORD PTR [rbp-48+rax*4], edx
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        setl    al
        test    al, al
        jne     .L2

The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out-of-order engine. My point was that compilers generally don't use more than a few named registers.
 

MS_AT

Senior member
Jul 15, 2024
599
1,253
96
The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out of order engine
Nope, they are not spilled to the stack; the compiler is storing results on the stack because the array is kept there. In other words, the compiler loads a value from the stack, increments it by one and stores it back to the stack. There is no reason for it to use more registers than EAX/RAX, since x64 allows memory operands. The OoO engine will rename whatever false dependencies it spots in this code, but the reason you are seeing only EAX/RAX is that your code does not need more live registers. In other words, the compiler does not have to keep the value "alive" in registers for the whole duration of your program.

The out-of-order engine is a hardware property. The compiler does not know what hardware your code will run on [most of the time], so it will not assume a register file of any particular size when judging how well register renaming will work. The whole magic of OoO is that it is opaque to the compiler and the programmer.
 

Schmide

Diamond Member
Mar 7, 2002
5,682
912
126
Nope, they are not spilled to the stack; the compiler is storing results on the stack because the array is kept there. In other words, the compiler loads a value from the stack, increments it by one and stores it back to the stack. There is no reason for it to use more registers than EAX/RAX, since x64 allows memory operands. The OoO engine will rename whatever false dependencies it spots in this code, but the reason you are seeing only EAX/RAX is that your code does not need more live registers. In other words, the compiler does not have to keep the value "alive" in registers for the whole duration of your program.

The out-of-order engine is a hardware property. The compiler does not know what hardware your code will run on [most of the time], so it will not assume a register file of any particular size when judging how well register renaming will work. The whole magic of OoO is that it is opaque to the compiler and the programmer.

The OoO engine may be opaque to the compiler, but compilers still produce code in ways that help it take advantage of register renaming.

I've looked at a lot of code spit out by the compilers. Rarely do you see more than 3 or 4 registers. I don't think I've ever seen a numbered register.

As for referenced values, that's the whole x86 thing: RISC requires you to load everything and then operate on it. Part of what I was saying previously was that labeled memory is more like named registers.
 

eek2121

Diamond Member
Aug 2, 2005
3,318
4,880
136
I personally cannot fathom a single scenario where Intel sells 100 million of the 52-core variant of Nova Lake. AMD and Intel combined often sell 70 to 80 million desktop/workstation CPUs per year. Intel's portion would be smaller, and the portion of just one rumored top-end desktop CPU would be even smaller than that. Divide your number by 10 at least (probably more) to get a much more realistic value. Plus, again, Intel isn't the one buying the memory, AND the new-memory premium fades quickly after a few months.

There never was a rumor for only a 52-core variant. Here are rumors for 16 core and 28 core versions.

Here is another rumor from our own source that adds 24 core, 12 core, and 4 core variants.

Plus, I can't think of a recent time when Intel sold only one desktop variant without lower-end models with fewer cores. Take Arrow Lake, for example: there is the Ultra 9 with 8P + 16E, the Ultra 7 with 8P + 12E, and the Ultra 5 that is either 6P + 8E or 6P + 4E. The rumored 52-core, if it exists, is only for the very top K processor (or maybe X if Intel brings that back).
Maybe I missed it; however, I don’t think anyone was ever claiming there would only be a 52-core variant.

FWIW, I have a hard time believing such a part exists at all. N2 and 18A cost significantly more than previous processes, so mastering PPA is a must for both Intel and AMD. The 285K is on N3, and Intel’s core sizes will eat up any improvements that N2/18A provide.

If anything, they will probably have a 12/24 + 4 part at the top end, with two 6 + 12-core CCDs. I have seen Intel do crazier things, however.

Just recently, they announced another round of layoffs, so I definitely wouldn’t hold my breath.
 
Reactions: OneEng2

Win2012R2

Senior member
Dec 5, 2024
893
852
96
That was all theoretical because the data path was still 64b, so only half of the SSE throughput was actually possible.
The size of SSE registers, however, was, is, and always will be 128 bits, even if no single data type is 128 bits, which is obviously the case because it's called SIMD for a reason.

Register renaming, compilers, stack spilling, narrower data paths, dinosaurs roaming the Earth: those are all entirely different matters.
 
Reactions: Nothingness

Abwx

Lifer
Apr 2, 2011
11,783
4,691
136
The size of SSE registers, however, was, is, and always will be 128 bits, even if no single data type is 128 bits, which is obviously the case because it's called SIMD for a reason.

Register renaming, compilers, stack spilling, narrower data paths, dinosaurs roaming the Earth: those are all entirely different matters.
That was 128b on paper and 64b in the real world, the same way as a tank with twice the volume but an unchanged outlet pipe diameter and flow speed.
 

Abwx

Lifer
Apr 2, 2011
11,783
4,691
136
So then, are AVX-512 registers in Zen 4 also 512-bit only on paper because execution is double-pumped and the data path can't load the whole register in one go? Yes or no?
Not the same thing, as the Pentium III also lacked the necessary execution resources, which is not the case with Zen 4.

I'll post it again, since you have trouble grasping all the info:

To compensate partially for implementing only half of SSE's architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together bringing the peak throughput back to four floating point operations per cycle — at least for code with an even distribution of multiplies and adds.

The issue was that Katmai's hardware-implementation contradicted the parallelism model implied by the SSE instruction-set. Programmers faced a code-scheduling dilemma: "Should the SSE-code be tuned for Katmai's limited execution resources, or should it be tuned for a future processor with more resources?"

 

Win2012R2

Senior member
Dec 5, 2024
893
852
96
Not the same thing, as the Pentium III also lacked the necessary execution resources, which is not the case with Zen 4.
Zen 4 totally lacks the necessary resources for AVX-512, which is why it's "double-pumped". You really have double standards here.

Even Zen 5 can't load 64 bytes from L3 in one go (only from L1 and L2). Does that make Zen 5's registers half-size? No!

Anyway, SSE is 128-bit, as per Intel's spec and as per the physical implementation. How fast or slow actual execution is, or even whether it's microcoded, does not matter: the spec for size is the spec for size, end of.
 

Abwx

Lifer
Apr 2, 2011
11,783
4,691
136
Zen 4 totally lacks the necessary resources for AVX-512, which is why it's "double-pumped". You really have double standards here.

Anyway, SSE is 128-bit, as per Intel's spec and as per the physical implementation. How fast or slow actual execution is, or even whether it's microcoded, does not matter: the spec for size is the spec for size, end of.

A baby needs to be fed with a spoon.
Zen 4 has no such limitation; read more carefully:

This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together bringing the peak throughput back to four floating point operations per cycle — at least for code with an even distribution of multiplies and adds.
 

Win2012R2

Senior member
Dec 5, 2024
893
852
96
You are deflecting the subject.
I am not deflecting anything; it is you who is bringing up all sorts of historic limitations in execution that have nothing to do with register size. All CPUs have limitations of different sorts, many of which get relaxed later, and that does not affect register size as it is defined in the uarch.

Anyway, I am done here; you are clearly operating on a different plane from me on this one.

P.S. Why won't you bring up 64-bit memory addressing, which in reality is not 64-bit because CPUs use fewer real bits for memory addressing, yet the registers used for pointers are 64-bit... or are they now? Rhetorical question.
 

OneEng2

Senior member
Sep 19, 2022
512
742
106
Not according to this a few pages back in this thread.
Current CB24 scores appear to be quite bandwidth limited. Certainly even from that post, it is clear that bandwidth has a significant effect on performance across a wide range of benchmarks.

Assuming that Intel will shell out the dough for state-of-the-art new memory for high volume products is a fallacy IMO.

Assuming that average desktop and laptop users utilize higher core counts than today's processors provide is another fallacy IMO.

Finally, it is equally hard to imagine that HPC and DC workloads, where higher core counts are justified by the applications, will not be bandwidth-starved with only 2 channels.

A 52-core Nova Lake (let's just call it a 48-core, since I doubt those LP cores are worth the die space anyway) with higher-IPC P and E cores, like everyone is expecting, will crave even more bandwidth per core than the current Arrow Lake.

I see neither the market for a 52 core desktop/laptop processor nor the technical merit of pairing such a beast with only 2 channels of DDR5.

As this is only my opinion, I suspect time will tell.
 

dullard

Elite Member
May 21, 2001
25,763
4,289
126
Maybe I missed it; however, I don’t think anyone was ever claiming there would only be a 52-core variant.
FWIW, I have a hard time believing such a part exists at all. N2 and 18A cost significantly more than previous processes, so mastering PPA is a must for both Intel and AMD. The 285K is on N3, and Intel’s core sizes will eat up any improvements that N2/18A provide.

If anything, they will probably have a 12/24 + 4 part at the top end, with two 6 + 12-core CCDs.
That was implied by the "multiply by 100 million" statement which was in reference to the rumored 52 core variant. The need for DDR6 is only for the 52 core variant. There is absolutely no need for the rest of the Nova Lake chips to have DDR6. They may have it, but don't need it. I can certainly see a desktop CPU with one tile of 26 or fewer cores on DDR5 and a dual tile workstation CPU with 52 cores and DDR6 (remember rumors state that Nova Lake separates the memory tile from the CPU tile). Will it happen? I have no idea. But it could.

Take the expense and time to design just one tile with 26 cores for the desktop crowd. Mass-produce it for yield and cost savings. Put two of those tiles together for the workstation crowd with a different memory controller. Apple put two M1 Max dies together for the M1 Ultra. AMD did it with Threadripper. Heck, Intel did it (poorly) back with the Pentium D, which started the whole "glue it together" meme. Plus, this is exactly what the rumors state: 2x26 cores.
 

OneEng2

Senior member
Sep 19, 2022
512
742
106
That was implied by the "multiply by 100 million" statement which was in reference to the rumored 52 core variant. The need for DDR6 is only for the 52 core variant. There is absolutely no need for the rest of the Nova Lake chips to have DDR6. They may have it, but don't need it. I can certainly see a desktop CPU with one tile of 26 or fewer cores on DDR5 and a dual tile workstation CPU with 52 cores and DDR6 (remember rumors state that Nova Lake separates the memory tile from the CPU tile). Will it happen? I have no idea. But it could.

Take the expense and time to design just one tile with 26 cores for the desktop crowd. Mass-produce it for yield and cost savings. Put two of those tiles together for the workstation crowd with a different memory controller. Apple put two M1 Max dies together for the M1 Ultra. AMD did it with Threadripper. Heck, Intel did it (poorly) back with the Pentium D, which started the whole "glue it together" meme. Plus, this is exactly what the rumors state: 2x26 cores.
Seeing how they put 24 cores on N3B at 114 mm², it seems logical that 26 cores would be doable on 18A (or N2). I would think they should be able to keep the die down to around 100 mm² or somewhere near that as well. Doubling this and expecting the RAM to keep up with just 2 channels? That seems a bit optimistic IMO.
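The area reasoning above as rough arithmetic. This averages P and E cores together and charges the whole 114 mm² tile (caches, fabric) to the cores, so treat it as an order-of-magnitude sketch only:

```latex
\frac{114\ \text{mm}^2}{24\ \text{cores}} = 4.75\ \text{mm}^2/\text{core},\qquad
26 \times 4.75\ \text{mm}^2 = 123.5\ \text{mm}^2,\qquad
\frac{100\ \text{mm}^2}{123.5\ \text{mm}^2} \approx 0.81
```

So fitting 26 cores in about 100 mm² would need roughly 19% less area per core than on N3B.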

I believe I saw a rumor that AMD's 12c CCD would be around 75 mm² on N2. This next round is going to be interesting for sure.

Currently the 16c/32t Zen 5 generally bests Arrow Lake 24c/24t by 5% in multi-threaded loads (pretty close though). So essentially, 1 Zen 5 core = 1.5 Arrow Lake cores overall in MT. So if everything stays in the same proportion next generation, 24 Zen 6 cores would be equivalent to 36 Nova Lake cores in MT.
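The same ratio spelled out, assuming MT throughput scales linearly with core count (a simplification). Folding in the 5% lead gives a slightly higher ratio than 1.5:

```latex
\frac{24\ \text{ARL cores}}{16\ \text{Zen 5 cores}} = 1.5,\qquad
1.5 \times 1.05 = 1.575,\qquad
24 \times 1.5 = 36,\qquad
24 \times 1.575 = 37.8
```

So under these assumptions, 24 Zen 6 cores would line up with roughly 36 to 38 Nova Lake cores.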



I believe we are entering an interesting time with the next generation of processors though. What percentage of the consumer market needs more than a 26 core Nova Lake (or 12c/24t Zen 6)? Sure, you can make one with a dual CCD and a decent memory controller (and potentially faster memory on dual channel), but how many consumers need it?

For the HPC/Workstation work, wouldn't something like Threadripper be better? Much more memory bandwidth and way more threads?

Perhaps this is what the 52 core Nova Lake will be targeting?
 

511

Golden Member
Jul 12, 2024
1,899
1,705
106
I'm really surprised people fight over SSE versus 3DNow! (some depending on their preference for Intel or AMD). What I will always remember is 64-bit x86, which was designed by AMD; it had a much bigger impact on x86 than any SIMD extension.

Also register banks exist. They can even be spotted on floor plans. They are obviously much larger than what the ISA requires due to renaming.
Intel had it designed as well, but they were super focused on Itanium; they didn't want AMD to have their money-making x86 license.


 
Reactions: Nothingness

511

Golden Member
Jul 12, 2024
1,899
1,705
106
You took this out of context. I said the compiler is reusing the same registers over and over.

The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out of order engine. My point was compilers generally don't use more than a few named registers.
There are only 16 GPRs available in x86-64 to begin with.

Seeing how they put 24 cores on N3B at 114 mm², it seems logical that 26 cores would be doable on 18A (or N2). I would think they should be able to keep the die down to around 100 mm² or somewhere near that as well. Doubling this and expecting the RAM to keep up with just 2 channels? That seems a bit optimistic IMO.

I believe I saw a rumor that AMD's 12c CCD would be around 75 mm² on N2. This next round is going to be interesting for sure.
Where did it leak? I had guessed around 80 mm², so I was close :rofl:
Currently the 16c/32t Zen 5 generally bests Arrow Lake 24c/24t by 5% in multi-threaded loads (pretty close though). So essentially, 1 Zen 5 core = 1.5 Arrow Lake cores overall in MT. So if everything stays in the same proportion next generation, 24 Zen 6 cores would be equivalent to 36 Nova Lake cores in MT.

It has y-cruncher in the mix, which uses AVX-512; when NVL gets AVX-512, that comparison should change. Also, if HandBrake is using SVT-AV1 there, it supports AVX-512 too.
I believe we are entering an interesting time with the next generation of processors though. What percentage of the consumer market needs more than a 26 core Nova Lake (or 12c/24t Zen 6)? Sure, you can make one with a dual CCD and a decent memory controller (and potentially faster memory on dual channel), but how many consumers need it?
For this, the leaked tiles were 8+16, 4+8 and 4+0, and the SoC tile contains 4 LPE cores regardless of the config. I think a 4+4 (4 cores disabled) + 4 i3 would be ridiculous.
The SKUs could include, but are not limited to:
  • 8+16+4
  • 2*(8+16)+4
  • 2*(4+8)+4
  • 4+8+4
  • 4+0+4
The 8+16 tile is on N2, the 4+8+4/4+0 tile is on 18A-P, and the common SoC tile is shared across all the SKUs.

For the HPC/Workstation work, wouldn't something like Threadripper be better? Much more memory bandwidth and way more threads?

Perhaps this is what the 52 core Nova Lake will be targeting?
Probably around $1000. This should be a great buy for people looking for ST/MT performance who don't need a ton of PCIe lanes. For RAM, I think 64 GB DIMMs should be more available by then, so 256 GB should be enough for most enthusiasts.
 

Thunder 57

Diamond Member
Aug 19, 2007
3,489
5,783
136
Intel had it designed as well, but they were super focused on Itanium; they didn't want AMD to have their money-making x86 license.


100% fake. There was absolutely no 64 bit in the P4 until it needed it. I cannot believe I read such stupidity. His 64 bit just happened to be the same as AMD's? What a fool. Totally worthless fake chump. Have I made my opinion clear?
 
Reactions: Thibsie

511

Golden Member
Jul 12, 2024
1,899
1,705
106
100% fake. There was absolutely no 64-bit in the P4 until it needed it. I cannot believe I read such stupidity. His 64-bit just happened to be the same as AMD's? What a fool. Totally worthless fake chump. Have I made my opinion clear?
OK, so you are saying one of the chief x86 architects of his time is lying? That's pretty bold of you.
 
Reactions: Nothingness