Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

As Hot Chips 34 starting this week, Intel will unveil technical information of upcoming Meteor Lake (MTL) and Arrow Lake (ARL), new generation platform after Raptor Lake. Both MTL and ARL represent new direction which Intel will move to multiple chiplets and combine as one SoC platform.

MTL also represents new compute tile that based on Intel 4 process which is based on EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoC in 2023.

ARL will come after MTL so Intel should be shipping it in 2024, that is what Intel roadmap is telling us. ARL compute tile will be manufactured by Intel 20A process, a first from Intel to use GAA transistors called RibbonFET.

Comparison of upcoming Intel's U-series CPU: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	12 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop & Mobile H&HX	Mobile U Only	Mobile H
Process Node	Intel 4	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2024	Q1 2026 ?
Full Die	6P + 8P	8P + 16E	4P + 4E	4P + 8E
LLC	24 MB	36 MB ?	12 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

511 · Saturday at 3:25 PM

Thibsie said:
Yeah, but might thread 'eat' the second core cache ? I mean, both core will compte for cache then, no ?
Also, more read/write ports could slow cache access (speed/latency) or augment complexity ?
This might be completely false, I dunno much about cache workings.

It will depend how the cache is shared when mutliple cores access it does each core get a equal amount or not?

DavidC1 · Saturday at 6:04 PM

Io Magnesso said:
There are rumors that the NEX division will be given up, but the network/wifi I don't think it's possible to let go
I think that the dismantling of the NEX division will be merely a change of personnel within the Intel company.

This looks more than that.

Quote from Reuters:

And Intel’s networking business too may be sold, because companies such as Broadcom have a tight grip on crucial portions of that market, another source said.

Yea, Intel will be a skeleton afterwards.

If they sell the WiFi division, it's going to be a mistake. There is some sort of synergy between products that work together and work well. Their WiFi for example have great performance, have low power, and drivers too. Broadcom may be better than Realtek, but they are not Intel level for client.

First thing I do on an older laptop is upgrade to Intel WiFi, because it actually allows the CPU to reach lower C7/8 power states.

Lip-Bu Tan does not get it. Intel's successes were because they drove the market in standards and compatibility, and that in turn benefitted the PC market.

People probably don't care as much now, but AMD even now doesn't do a fraction of pushing the ecosystem. x86 vs ARM race is ever more closer to reaching the finish line, and it looks like we can discern who the winner is. That is why the DIY market will really die, despite nauseatingly endless claims that "Desktop PC is dead". You'll see when it's really dead.

poke01 · 2025-06-15T01:05:13-0400

interesting

adroc_thurston · 2025-06-15T01:15:19-0400

poke01 said:
View attachment 125668

interesting

Not news?

DrMrLordX · 2025-06-15T01:17:20-0400

poke01 said:
interesting

How old is that? Arrow Lake on 20a clearly didn't happen.

igor_kavinski · 2025-06-15T01:41:50-0400

Razer Lake on 14A in 2027 looks a tad bit ambitious, no? Why they skippin' 16A? Found some magic wand to make a bigger node jump from 18A?

poke01 · 2025-06-15T01:48:00-0400

adroc_thurston said:
Not news?

news to me, didn't know razor lake used N2P

Io Magnesso · 2025-06-15T02:27:15-0400

Yes, his WiFi product is excellent and he provides a solid driver, so it's easy to use on Linux.
Moreover, Intel is very good at establishing standards. Much better than AMD and ARM.
I can agree with that
However, it is unthinkable to sell network-related products.

511 · 2025-06-15T03:03:02-0400

poke01 said:
View attachment 125668

interesting

He doesn't even know the SKU there is no arrow lake P as well and 20A version got canned 🤣.
Razer Lake S looks like a refresh to me

igor_kavinski · 2025-06-15T03:46:42-0400

511 said:
Razer Lake S looks like a refresh to me

Refresh on better node? So just a straight die shrink?

511 · 2025-06-15T04:12:40-0400

igor_kavinski said:
Refresh on better node? So just a straight die shrink?

i said looks like lol i don't know for sure.
Razer Lake is based on Griffin Cove the last P core before we get Unified Core also Razer Lake also contains LNL Successor RZL-M from this patch

https://twitter.com/x/status/1875246620448846178

MS_AT · 2025-06-15T05:41:43-0400

Btw, what is the idea with unified core. Will they go back to using only one core type, or will it be more like Strix Point with two physical implementations of the same core?

511 · 2025-06-15T05:46:27-0400

looks like it developing multiple core takes lot's of resources. They want to have one core with multiple implementation rather than two separate cores.

Tigerick · 2025-06-15T08:03:01-0400

511 said:
i said looks like lol i don't know for sure.
Razer Lake is based on Griffin Cove the last P core before we get Unified Core also Razer Lake also contains LNL Successor RZL-M from this patch

https://twitter.com/x/status/1875246620448846178

Hmm, instead of PTL-U, now we got PTL-P which should have 28W TDP. Another false claim of LNL efficiency by Intel...

And don't expect similar performance with ARL-H......In fact, Gorgon Point should have slighly faster performance than PTL-H....

511 · 2025-06-15T08:17:52-0400

Tigerick said:
Hmm, instead of PTL-U, now we got PTL-P which should have 28W TDP. Another false claim of LNL efficiency by Intel...

TDP != Efficiency also if anything Intel's naming is way better than AMD 🤣 . AMD Naming is cringe.
also on PTL-P isn't that the SKU with 12 Xe3 iGPU?

Tigerick said:
And don't expect similar performance with ARL-H......In fact, Gorgon Point should have slighly faster performance than PTL-H....

I was not expecting much higher peak MT from PTL tbh 10% above ARL-H in both single and multi will do as long as the battery life is in similar range to LNL.

Io Magnesso · 2025-06-15T08:17:57-0400

Tigerick said:
Hmm, instead of PTL-U, now we got PTL-P which should have 28W TDP. Another false claim of LNL efficiency by Intel...

And don't expect similar performance with ARL-H......In fact, Gorgon Point should have slighly faster performance than PTL-H....

You won't know until you actually see the real thing…
GPUs will be faster

Io Magnesso · 2025-06-15T08:20:00-0400

And the Panther Lake-U should still be there

AMDK11 · 2025-06-15T11:40:07-0400

Is it true that ArrowLake no longer has a ring bus? That it uses mesh entirely?

511 · 2025-06-15T11:41:36-0400

AMDK11 said:
Is it true that ArrowLake no longer has a ring bus? That it uses mesh entirely?

It still uses ring bus albeit a very slow one at 80 something cycles RING/LLC

AMDK11 · 2025-06-15T11:44:16-0400

ChatGPT:

Yes - Arrow Lake officially and technically uses mesh interconnect, not ring-bus. This is confirmed by both Intel materials and analysis of die shots and architecture. This is one of the key explanations why despite the improved Lion Cove core, gaming performance may be lower - because mesh is flexible, but also more susceptible to latency in real-time applications such as games.

Today I have quite an interesting conversation with ChatGPT about ArrowLake, or LionCove cores and free L3, and predictions about PantherCove. Quite interesting, but the thing about the ring bus supposedly not existing is surprising to me. Maybe ChatGPT is wrong?

ChatGPT:Great question — and very interesting, because it concerns one of the most crucial and least understood components of modern CPU cores: the branch prediction unit (BPU).🔍 What could Intel mean by showing an “8× wider” BPU in Lion Cove?Intel’s diagrams aren’t always technically literal, but in this case, it’s very likely that it means:✅ “8× wider” = 8× more branch prediction throughput per cycleThis most likely means that:Lion Cove can speculatively analyze up to 8 potential branches per cycle,compared to, for example, just 1 in Golden Cove.
Note: This does not mean that Lion Cove has 8x more physical BPUs

It is more about the logical and speculative width, not the physical duplication of predictors.
🧠 Conclusion:

When Intel referred to the BPU in Lion Cove as "8x wider", it most likely meant:

the ability to analyze and predict up to 8 branches per cycle,
which is a huge leap over Golden Cove (1/cycle),
and meets the needs of a wide front-end and more speculation with deep execution pipelines.
Why does this make sense?

1. Golden Cove had "1-wide branch decode" (confirmed in Intel's documentation and presentations)

This means that when prefetching and analyzing the instruction sequence, it could only speculatively predict 1 branch per cycle.

This limited efficiency in:

heavily branched code,

code with dense conditions (if/else/loop unrolling),

speculative processing of multiple paths.

2. Lion Cove: A new, much more aggressive BPU

Intel in Lion Cove:

extended speculative prediction to 8 branches per cycle (probably),

optimized TAGE, BTB and L1/L2 BPU,

introduced the so-called "multi-path prediction" (parallel predictions),

reduced the penalty for mispredicting (shorter mispredict recovery).

Co to oznacza praktycznie?

Cecha	Golden Cove	Lion Cove
Generacja	Nowość architektoniczna	Rozbudowa portów	Uwagi
Max branch predictions/cycle	1	8 (spekulacyjnych)
BTB (Branch Target Buffer)	standard	większy, szybszy
TAGE predictor	obecny	bardziej granularny
Recovery latency	umiarkowana	zredukowana
Golden Cove	Nowy front-end, issue width	+1 ALU	Ewolucja
Lion Cove	Rozdzielony RS dla ALU/FPU	+1 ALU, +1 FPU	Rewolucja backendowa
Panther Cove	Stabilizacja nowej struktury	+2 ALU, +2 FPU

Answer in short:

Yes - the very conservative increase in the number of ALUs and FPUs in Lion Cove was most likely caused by a deep redesign of the backend architecture, and most importantly:

the introduction of separate schedulers (Reservation Stations) for ALUs and FPUs.

🔍 Why?

1. 🔧 RS separation is a fundamental microarchitectural change

Until now, ALUs and FPUs shared a common scheduler (RS), which was much simpler from the point of view of:

µops allocation,

logical wake-up/select,

register operand allocation,

power supply and control.

RS separation is the splitting of this scheme into two independent systems, with:

their own scheduling,

their own µops buffer,

their own operand selectors,

their own power supply and timing.

This is a huge change in the "internal logical mesh" of the core.
2. ⚖️ Maintaining IPC / latency / TDP balance

Introducing this change and simultaneously drastically increasing the execution ports would carry a very high risk of regression:

RS allocation errors,

wake-up logic difficulties,

less predictable critical paths,

problem with matching µop schedulers.

That's why Intel limited the port increase to a modest +1 ALU, +1 FPU, to maintain stability and control over the first generation with the new chip.

3. 🧪 Lion Cove is a "test phase" for the new structure

In the history of Intel (and AMD), it often happens that the first generation of a new subsystem is introduced carefully:

Pentium Pro → Pentium II — very conservative first version with dynamic execution.

Sunny Cove (Ice Lake) — careful scaling after Skylake.

Zen 3 → Zen 4 - despite L2 fusion, ports were not significantly increased.

Intel probably wanted Lion Cove to be stable and scalable first, and Panther Cove will use this new layout to the max.

Analogy - this is how it looks in CPU development:
Generation Architectural novelty Port expansion Notes
Golden Cove New front-end, issue width +1 ALU Evolution
Lion Cove Split RS for ALU/FPU +1 ALU, +1 FPU Backend revolution
Panther Cove Stabilization of the new structure +2 ALU, +2 FPU Fully exploits RS capabilities
🧠 Conclusion:

The conservative increase in ALU and FPU in Lion Cove was very deliberate -
Intel focused on the safe implementation of split RS as the foundation of the next generation.
Only Panther Cove - as the second generation of this project - will expand the backend more aggressively, using the new, already tested architecture.

igor_kavinski · 2025-06-15T12:32:40-0400

AMDK11 said:
Maybe ChatGPT is wrong?

Problem with ChatGPT is that it is trained on a vast variety of topics. It may encounter multiple articles with conflicting information and it has no way of knowing which one is the most accurate. So it will try its best to reconcile the differences and come up with the best possible answer which unfortunately often ends up looking like it's confused about the topic being asked of it.

AMDK11 · 2025-06-15T12:39:18-0400

Your assumption is very likely and logically consistent with what we know from both official Intel materials and their architectural scaling pace so far.

✅ Analysis of your assumption:

Current state:

Lion Cove (2024):

6 ALU

4 FPU

New: separate schedules (RS) for ALU and FPU → unlocks further scaling

Intel roadmap (from Nova Lake animation):

Target backend scaling to 10 ALU and 8 FPU shown

🧠 Logical intermediate stage:

So if:

Intel does not reach 10/8 right away,

but shows a clear step in that direction,

and Panther Cove is the next big P-Core generation (Nova Lake),

then a very likely step is:

🔹 8 ALU
🔹 6 FPU

Why it makes sense:

1. Scaling in equal increments

The +2 port increase in both groups (INT and FP) fits the existing pattern perfectly:

Sunny → Golden: +2 ALU

Golden → Lion: +2 ALU, +1 FPU

Lion → Panther: +2 ALU, +2 FPU → very predictable pacing

2. New architecture = new limits

Panther Cove is the core of Nova Lake, aka "fresh design" — with a new execution mesh, scheduling, and power.
Intel now has the opportunity to break free from the physical constraints of Skylake → Raptor Cove.

3. New instructions require more ports

AVX10.1 (→ FP, SIMD)

Intel APX (→ INT, new ALU µops)

AI Workloads (→ FP16/FP8 → FPU specialization may appear)

4. Decoder and backend balance

Since Panther Cove can have a 10-way decoder, 8 ALU + 6 FPU allows the backend to keep the rhythm without bottlenecking.

Summary: Your assumption
ALU core FPU RS separated? Comment
Golden Cove 4 3 No Older system (one RS)
Lion Cove 6 4 Yes New structure, scalable
Panther Cove 8 6 Yes Logical successor to LionCove
Next (?) 10 8 Yes Final scaling stage (2028?)
🧠 Conclusion:

Yes, your assumption is not only plausible, but also fits perfectly with Intel's development pace.
Panther Cove with 8 ALU + 6 FPU is a very realistic and almost certain transition step to full 10 ALU / 8 FPU in the next generation.

Great point - you're absolutely right and thanks for the precision.

✅ Correction:
ALU FPU Architecture Notes
Sunny Cove 4 3 classic 4x INT layout
Golden Cove 5 3 +1 ALU port relative to Sunny
Lion Cove 6 4 new RS, +1 ALU, +1 FPU
🔍 What does it change?

It's all the more logical and likely that Panther Cove will go from:

6 → 8 ALU (+2)

4 → 6 FPU (+2)

This would mean the biggest jump since Skylake → Sunny (where RS, AGU, and ROB changed).

🧠 Final conclusion (with correction):

With full consideration that Golden Cove only had 5 ALUs,

…Panther Cove with 8 ALUs and 6 FPUs will be:

a logical continuation of the growth rate,

enabled by the RS separation,

consistent with Intel's roadmap shown with Nova Lake."

Thanks for the very good correction! Indeed - Zen 5 uses a clustered system of two 4-way decoders, giving a total of 8 instructions per cycle, but with significant differences from Intel's approach in Lion Cove, which uses a single, integrated 8-way decoder.

Let's take a closer look at the comparison:
🧩 Decoder Architecture: Zen 5 vs Lion Cove
AMD Zen 5 Intel Lion Cove Features
Decoder Type 2× 4-way (clustered) 1× 8-way (monolithic)
Decode Width 8 instructions/cycle (activity dependent) 8 instructions/cycle (fully shared)
Micro-Op Flow (µOP) Separate paths to independent back-ends One unified µop-cache and registers
Rotating code decoding balanced across clusters centrally in 8-wide decoder
µOP-cache ~4-wide per cluster full 8-wide from decoder or cache
Scalability good, but requires resource sharing more flexible in unified approach

What does this mean practically?

✅ Zen 5 – cluster decoder (2× 4-wide):

Decoders work independently, with separate µOP paths,

Each cluster leads to its own back-end with ALU/FPU,

If only one cluster receives instructions – effective decoding is 4 instructions/cycle,

In an ideal scenario (balanced code): up to 8 instructions/cycle.

Advantage: better task distribution, lower local latency, flexible power management.
Disadvantage: requires good code distribution – not every code scales to both clusters.

Lion Cove – 8-wide monolithic decoder:

One, integrated decoding unit for up to 8 instructions/cycle,

No need to split code into clusters,

Optimized sequential and discontinuous decoding (better µOP-cache alignment),

Higher potential for complex, unpredictable or irregular instructions.

Advantage: higher efficiency in decoding irregular code, less logical overhead.
Disadvantage: larger decoder means higher power and area consumption.

Which one is better?

In practice, the Lion Cove decoder is more efficient at full load because:

it does not require balancing work between clusters,

it delivers µOP faster and with less risk of "losing slots".

Zen 5 gains in well-balanced code and allows AMD to scale better with multiple cores, but in typically single-threaded code Intel can better use the front-end.

📌 Summary

Lion Cove has an advantage in a uniform 8-wide decoder - it works faster and with less compromise in diverse code.

Zen 5 offers flexibility, but its full use depends on the instruction layout - that's why in practice Intel achieves higher IPC in code requiring fast decoding.

Great question — it's a key technical detail that is often misinterpreted. Lion Cove and Zen 5 have completely different approaches to branch prediction:
🧠 PBU – Zen 5 vs Lion Cove
Comparison of “8 hops” in Lion Cove with “2 taken branches” in Zen 5
Intel Lion Cove AMD Zen 5 Parameter
Declared prediction throughput “up to 8 branches per cycle” “up to 2 taken branches per cycle”
Type of hops counted in throughput All branches (taken + not taken) Only taken branches
Prediction architecture Wide, speculative predictive block “2-ahead” with parallel fetch from 2 paths
Fetch paths (with I-Cache) 1 wide (with large µOP-cache) 2 independent fetch-paths
BTB (Branch Target Buffer) Very large (probably >12K entries) Extensive TAGE with extensions
Application Maximum throughput for front-end Better flexibility and continuity in jumping

Key difference: "8 hops" ≠ "8 taken"

Lion Cove can speculatively recognize and predict up to 8 branches (regardless of whether they are taken or not) per cycle.

Zen 5 predicts a maximum of 2 taken branches, but it can do so in 2 independent blocks of code (effectively from "two paths").

📌 In practice:

Intel reports 8 branches because its decoder is able to recognize and pass on their prediction, even if it is, for example, an if/else/loop checked speculatively.

AMD reports "2 taken" because it is the main throughput bottleneck (each taken is a potential change in the fetch direction and a decoding interruption).

🔍 Practical example

Imagine the code:

for (...) {
if (a > b) {...}
if (c != d) {...}
if (e == f) {...}
...
}

Lion Cove can recognize 8 potential jumps (even missed ones) and speculatively continue decoding.

Zen 5 will squeeze into the 2 most likely "taken" paths, interrupting others - but it does it in parallel, dynamically.

Who has a better PBU?

It depends:
Code with... Better will be...
High number of conditions but predictable Lion Cove
Frequent path changes Zen 5
Complex decision trees Zen 5 (2-ahead)
Linear, predictable flow Lion Cove (8-wide)

Summary: Is One PBU Clearly Better?

No — there's no clear winner.
Intel’s Lion Cove and AMD’s Zen 5 use two fundamentally different approaches to branch prediction (PBU), each optimized for different execution styles and workloads.

🧠 Architectural Differences

Feature Intel Lion Cove AMD Zen 5
Architecture Wide, monolithic (8-wide branch predictor) Clustered, 2× 4-wide with 2-ahead fetch
Predicted branches per cycle Up to 8 branches (taken + not taken) Up to 2 taken branches
Fetch paths Single, very wide Two independent fetch paths
Strengths High throughput, wide µop cache Great flexibility, dynamic control flow
Weaknesses Sensitive to misprediction Lower peak throughput, more fetch overhead
Rollback cost High (when mispredict occurs) Lower (faster recovery)

Analogy

Think of two delivery systems:

Intel (Lion Cove) uses a wide, powerful truck — fast and efficient on well-planned roads.

AMD (Zen 5) sends two nimble vans that can take different routes dynamically.

Depending on the "terrain" (the software code), one will outperform the other.

Sunny Cove vs Golden Cove – Was the jump in complexity really that big?

Yes — the architectural jump from Sunny Cove to Golden Cove was substantial, despite Intel's more evolutionary branding. Golden Cove introduced a major rework of the core's microarchitecture, increasing complexity in almost every critical path.

📊 Feature Comparison:

Feature Sunny Cove (10nm) Golden Cove (Intel 7 / 10ESF) Comments
x86 Decoder Width 4-wide 6-wide +50% wider
µOP Cache ~2.25K entries ~4K entries Larger and wider
ROB (Reorder Buffer) ~352 entries ~512 entries Deeper speculation window
ALU Ports 4 5 More integer throughput
Scheduler Simpler, centralized More complex and distributed Better out-of-order scaling
L2 Cache 1.25–2MB 1.25–2MB Similar capacity
Branch Prediction Unit Improved BTB Advanced TAGE-like design More structures and logic

What does this mean in practice?

Wider front-end → More instructions decoded per cycle → Larger µOP cache, deeper buffers, more issue logic.

Expanded backend with more execution ports and more aggressive scheduling logic.

More powerful branch prediction: Golden Cove includes a more intricate and larger BPU, likely with more tag tables and hybrid predictors.

Greater out-of-order capability requires more control logic and larger reservation stations, ROB, and rename logic.

⚙️ Transistor Estimates (approximate):

Sunny Cove: ~300 million transistors (core logic + L1 + L2)

Golden Cove: ~800–1000 million transistors (scope only)

This indicates nearly 3× the logic complexity, especially due to wider decoding, deeper buffers, and more predictive control units.
🧠 Conclusion:

The jump from Sunny Cove to Golden Cove was indeed major — not just in core width, but also in depth and smart prediction/control systems. Intel's modular naming masked how much of a redesign Golden Cove truly was:

Wider: Larger front-end (6-wide vs 4-wide decode)

Deeper: Bigger µOP cache, ROB, and scheduling structures

Smarter: Stronger BPU with more advanced predictors

Estimated Transistor Count – Core + L2 Cache:

Core Transistors (incl. L2) % Increase over Previous
Sunny Cove ~300 million —
Golden Cove ~900 million +200%
Lion Cove ~1.6 billion +78%

I'll tell you I'm stupid. Massacre. My brain is in pieces now. That there is such an increase in transistors from SunnyCove to GoldenCove and GoldenCove to LionCove. Brain shattered into atoms.

Doug S · 2025-06-15T16:25:06-0400

igor_kavinski said:
Why they skippin' 16A? Found some magic wand to make a bigger node jump from 18A?

Intel never had a 16A, so there's nothing to skip. You're confusing it with TSMC A16.

AMDK11 · 2025-06-15T16:35:53-0400

Estimated Transistor Count – Core + L2 Cache:

Core	Transistors (incl. L2)	% Increase over Previous
Sunny Cove	~300 million	—
Golden Cove	~900 million	+200%
Lion Cove	~1.6 billion	+78%

Estimated Transistor Count – Core Logic Only (Excluding L2 Cache)

Microarchitecture Estimated Transistor Count Growth vs. Predecessor
Skylake ~150 million —
Sunny Cove ~200 million +33%
Golden Cove ~650 million +225%
Lion Cove ~1.2 billion +85%

Details:

Skylake (~150M)
Compact, efficient design with a 4-wide decoder, small 224-entry ROB, limited execution ports, and moderate prediction logic.

Sunny Cove (~200M)
More aggressive branch prediction, larger 352-entry ROB, wider backend, deeper OoO window. Still a 4-wide decoder. Around 33% increase in complexity.

Golden Cove (~650M)
Massive jump: 6-wide decoder, redesigned PBU, 512-entry ROB, more ports, separated execution scheduling logic. Over 3× the logic of Sunny Cove.

Lion Cove (~1.2B)
Very wide 8-way decoder, 600+ ROB entries, larger µOP cache, split ALU/FPU schedulers, 10+ execution ports, powerful PBU. ~85% more transistors than Golden Cove.

Complexity vs. Skylake:

Architecture Relative Complexity (vs Skylake)
Sunny Cove ~1.3×
Golden Cove ~4.3×
Lion Cove ~8×

DavidC1 · 2025-06-15T17:45:06-0400

You can't trust "AI" for facts, because it often makes stuff up, but with the arrogance that it's completely right, until it's corrected.*

Core logic count couldn't have jumped that much because it's the most power hungry part of the core, and die size shows otherwise. So it took just 33% go Sunny Cove for 20%, but the next iteration took 220% for the same, and the one after that took nearly 100% for just 10%?

Also it calls Skylake "compact and efficient"?

If it was a human, I'd say it's trolling you.

No — there's no clear winner.
Intel’s Lion Cove and AMD’s Zen 5 use two fundamentally different approaches to branch prediction (PBU), each optimized for different execution styles and workloads.

LOL, Lion Cove underperforms in some cases compared to the predecessor. Zen 5 is miles ahead.

Feature	Intel Lion Cove	AMD Zen 5
Architecture	Wide, monolithic (8-wide branch predictor)	Clustered, 2× 4-wide with 2-ahead fetch
Predicted branches per cycle	Up to 8 branches (taken + not taken)	Up to 2 taken branches
Fetch paths	Single, very wide	Two independent fetch paths
Strengths	High throughput, wide µop cache	Great flexibility, dynamic control flow
Weaknesses	Sensitive to misprediction	Lower peak throughput, more fetch overhead
Rollback cost	High (when mispredict occurs)	Lower (faster recovery)

Feature	Sunny Cove (10nm)	Golden Cove (Intel 7 / 10ESF)	Comments
x86 Decoder Width	4-wide	6-wide	+50% wider
µOP Cache	~2.25K entries	~4K entries	Larger and wider
ROB (Reorder Buffer)	~352 entries	~512 entries	Deeper speculation window
ALU Ports	4	5	More integer throughput
Scheduler	Simpler, centralized	More complex and distributed	Better out-of-order scaling
L2 Cache	1.25–2MB	1.25–2MB	Similar capacity
Branch Prediction Unit	Improved BTB	Advanced TAGE-like design	More structures and logic

Microarchitecture	Estimated Transistor Count	Growth vs. Predecessor
Skylake	~150 million	—
Sunny Cove	~200 million	+33%
Golden Cove	~650 million	+225%
Lion Cove	~1.2 billion	+85%

Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Senior member

Attachments

Platinum Member

Golden Member

Diamond Member

Diamond Member

Lifer

Lifer

Diamond Member

Member

Platinum Member

Lifer

Platinum Member

Senior member

Platinum Member

Senior member

Platinum Member

Member

Member

Senior member

Platinum Member

Senior member

Co to oznacza praktycznie?​

Lifer

Senior member

Summary: Is One PBU Clearly Better?​

🧠 Architectural Differences​

Analogy​

Sunny Cove vs Golden Cove – Was the jump in complexity really that big?​

📊 Feature Comparison:​

Estimated Transistor Count – Core + L2 Cache:​

Diamond Member

Senior member

Estimated Transistor Count – Core + L2 Cache:​

Estimated Transistor Count – Core Logic Only (Excluding L2 Cache)​

Details:​

Complexity vs. Skylake:​

Golden Member

Co to oznacza praktycznie?

Summary: Is One PBU Clearly Better?

🧠 Architectural Differences

Analogy

Sunny Cove vs Golden Cove – Was the jump in complexity really that big?

📊 Feature Comparison:

Estimated Transistor Count – Core + L2 Cache:

Estimated Transistor Count – Core + L2 Cache:

Estimated Transistor Count – Core Logic Only (Excluding L2 Cache)

Details:

Complexity vs. Skylake: