Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
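
For a sense of scale, the GPU figures listed above can be broken down per core and per execution unit. This is only a rough sketch using the numbers in this post; the 7-core estimate assumes the same clock and linear scaling with core count, which is my assumption and not something Apple has stated.

```python
# Figures quoted above for the 8-core M1 GPU.
gpu_cores = 8
execution_units = 128
max_threads = 24576
tflops = 2.6
gtexels_per_s = 82
gpixels_per_s = 41

eus_per_core = execution_units // gpu_cores       # execution units in each GPU core
threads_per_eu = max_threads // execution_units   # concurrent threads per execution unit
threads_per_core = max_threads // gpu_cores       # concurrent threads per GPU core
tflops_per_core = tflops / gpu_cores              # peak TFLOPS per GPU core
texel_to_pixel_ratio = gtexels_per_s / gpixels_per_s

# Assumption (not from the post): the 7-core variant runs at the same clock
# and its throughput scales linearly with core count.
tflops_7_core = tflops_per_core * 7

print(f"EUs per core:        {eus_per_core}")
print(f"Threads per EU:      {threads_per_eu}")
print(f"Threads per core:    {threads_per_core}")
print(f"TFLOPS per core:     {tflops_per_core:.3f}")
print(f"Texel:pixel ratio:   {texel_to_pixel_ratio:.1f}")
print(f"7-core estimate:     {tflops_7_core:.2f} TFLOPS (assumed linear scaling)")
```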

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread


M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread


M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes
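
To put the M2 list next to the M1 list at the top of this post, here is a small sketch that computes the generational change for the figures that appear in both. It only uses numbers from this post; the per-core GPU comparison assumes the TFLOPS figures refer to the full 8-core and 10-core configurations.

```python
# Figures taken from the M1 and M2 lists in this post.
m1 = {"transistors_billion": 16, "p_core_l2_mb": 12, "gpu_cores": 8, "gpu_tflops": 2.6}
m2 = {"transistors_billion": 20, "p_core_l2_mb": 16, "gpu_cores": 10, "gpu_tflops": 3.6}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {key: pct_change(m1[key], m2[key]) for key in m1}

# Assumption: the TFLOPS figures are for the full-core-count GPUs,
# so dividing by core count gives a per-core comparison.
per_core_change = pct_change(m1["gpu_tflops"] / m1["gpu_cores"],
                             m2["gpu_tflops"] / m2["gpu_cores"])

for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
print(f"gpu_tflops_per_core: {per_core_change:+.1f}%")
```

On these numbers, most of the GPU gain comes from the two extra cores, with roughly a tenth more throughput per core.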

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread


M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread


poke01 · May 21, 2024

junjie1475 said:
[Apple M4 performance analysis: They did their best, but human technology is nearing its limit!] https://www.bilibili.com/video/BV1N...eb&vd_source=a38447d92dbac66e202c0251155453d6

Great work.

Eug · May 21, 2024

junjie1475 said:
[Apple M4 performance analysis: They did their best, but human technology is nearing its limit!] https://www.bilibili.com/video/BV1N...eb&vd_source=a38447d92dbac66e202c0251155453d6

Nice.

It says we need to log in for subtitles? Do you know if this will be posted on YouTube too, with English subtitles?

junjie1475 · May 21, 2024

Eug said:
Nice.

It says we need to log in for subtitles? Do you know if this will be posted on YouTube too, with English subtitles?

Not yet, but soon. Most of the graphs are labeled.

poke01 · May 22, 2024

Summary from the video.

Here’s the M4 vs M3 architecture diagram.

The M4 P core grows from an already big 9-wide decode to a 10-wide decode.
The integer physical register file has grown by 21%, while the floating-point physical register file has shrunk.
The dispatch buffer for the M4 has seen a significant boost for both Int and FP units, with structures 50-100% wider. (This seems to resolve a major issue for M3: it increased the number of ALUs, but the IPC increase was minimal (3%) since they couldn't be kept fed.)
Integer and load/store schedulers have also grown by around 11-15%.
There seem to be some changes to the individual capabilities of the execution units as well, but I do not have a clear picture of what they mean.
Load Store Queue and STQ entries have increased by around 14%.
The ROB has grown by around 12%, while PRRT has increased by around 14%.
Memory/cache latency has dropped from 96 ns to 88 ns.
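
Two of the items above are given as absolute figures instead of percentages (decode width and memory/cache latency). A quick sketch to express them as relative changes like the other items; it uses only the numbers from this summary, and the latency unit cancels out in the ratio.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


decode_width_gain = pct_change(9, 10)   # P-core decode width, M3 -> M4
latency_change = pct_change(96, 88)     # memory/cache latency, same unit for both values

print(f"Decode width:         {decode_width_gain:+.1f}%")
print(f"Memory/cache latency: {latency_change:+.1f}%")
```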

All these changes result in the largest gen on gen IPC gain for Apple silicon in 4 years.

In SPECint 2017, M4 increases performance by around 19%.

In SPECfp 2017, M4 increases performance by around 25%.

Clock for clock, M4 increases IPC by 8% for SPECint and 9% for SPECfp.
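
The total gains and the clock-for-clock gains above can be combined to estimate how much of the uplift comes from frequency. This is a rough sketch assuming the simple model performance = IPC x clock; that model is my assumption, and it ignores memory effects and rounding in the quoted percentages.

```python
def implied_clock_gain(total_gain_pct, ipc_gain_pct):
    """Clock gain in percent implied by total and IPC gains, assuming perf = IPC * clock."""
    total_ratio = 1.0 + total_gain_pct / 100.0
    ipc_ratio = 1.0 + ipc_gain_pct / 100.0
    return (total_ratio / ipc_ratio - 1.0) * 100.0


# Percentages quoted in the summary above.
clock_gain_int = implied_clock_gain(19, 8)   # SPECint 2017
clock_gain_fp = implied_clock_gain(25, 9)    # SPECfp 2017

print(f"Implied clock gain from SPECint: {clock_gain_int:.1f}%")
print(f"Implied clock gain from SPECfp:  {clock_gain_fp:.1f}%")
```

The two estimates do not agree exactly, which suggests the simple model is only approximate (SPECfp is probably also helped by things other than core clock).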

from:

https://www.reddit.com/r/hardware/comments/1cxq7em/apple_m4_geekerwan_review_with_microarchitecture

FlameTail · May 22, 2024

What's the GPU performance uplift of M4 vs M3 (if there is any)?

poke01 · May 22, 2024

FlameTail said:
What's the GPU performance uplift of M4 vs M3 (if there is any)?

10%, same cores but probably clocked higher.

FlameTail · May 22, 2024

poke01 said:
10%, same cores but probably clocked higher.

Aha?

GPU Performance
M1 : 100%
M2 : 125%
M3 : 150%
M4 : 165%
M5 : ???

I am very curious how GPU uplifts will go in future generations. Moore's Law is dead, so huge increases from adding more ALUs are not possible without blowing up the die size.

But Nvidia, the leader of the GPU industry, still manages to bring about a 2x jump in performance every generation. 3090 -> 4090 was nearly a 2x jump, and 4090 -> 5090 is also rumoured to be a nearly 2x jump. This means that iGPUs are actually falling further behind top-end dGPUs in relative performance with each passing year.
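
To make the comparison concrete, here is a rough sketch that turns the index in the list above into generation-on-generation steps and an average per-generation factor. The index values are the rough figures from this post, not measurements, and the 2x factor is the claim from the paragraph above.

```python
# Relative GPU performance index from the list above (M1 = 100).
gpu_index = {"M1": 100, "M2": 125, "M3": 150, "M4": 165}

names = list(gpu_index)
steps = {}
for prev, curr in zip(names, names[1:]):
    # Gain of each generation over the previous one, in percent.
    steps[curr] = (gpu_index[curr] / gpu_index[prev] - 1.0) * 100.0

generations = len(names) - 1
# Geometric mean growth factor per generation from M1 to M4.
avg_factor = (gpu_index["M4"] / gpu_index["M1"]) ** (1.0 / generations)
avg_gain_pct = (avg_factor - 1.0) * 100.0

# What the index would be after the same number of 2x generations.
doubling_index = 100 * 2 ** generations

for name, gain in steps.items():
    print(f"{name}: {gain:+.1f}% over previous generation")
print(f"Average per generation: {avg_gain_pct:+.1f}%")
print(f"Index after {generations} generations of 2x: {doubling_index}")
```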

Eug · May 22, 2024

Eug said:
It says we need to log in for subtitles? Do you know if this will be posted on YouTube too, with English subtitles?

junjie1475 said:
Not yet, but soon. Most of the graphs are labeled.

On YouTube now, with subtitles. 🤓

FlameTail · May 22, 2024

Apple has reached Intel territory in terms of clock speeds. This is shocking.

10% clock boost on GPU.

He mentions that the M4 does not use LPDDR5X, but a special overclocked LPDDR5.

Shoutout to @junjie1475

CPU architecture diagrams. The P-core is new. The E-core is the same as M3/A17.

Gigachad pouring liquid nitrogen on the iPad. (From a Snapdragon branded bottle- the irony).

Does SPEC use SME?

Power consumption exploded.

FlameTail · May 22, 2024

After showing this slide, he proceeds to say this:

Rofl.

Nothingness · May 22, 2024

FlameTail said:
Does SPEC use SME?

@SarahKerrigan do you happen to have a Mac with Xcode? I wonder if the latest LLVM compiler is able to use SME on SPEC (I doubt it). And even if it does, that wouldn't prove Geekerwan enabled it, to keep the comparison fair.

junjie1475 · May 22, 2024

Nothingness said:
@SarahKerrigan do you happen to have a Mac with Xcode? I wonder if the latest LLVM compiler is able to use SME on SPEC (I doubt it). And even if it does, that wouldn't prove Geekerwan enabled it, to keep the comparison fair.

No, we didn't use SME in the compilation of SPEC. The compilers are Apple clang 14 from Xcode 14.2 and LLVM Flang 17, with -Ofast optimization.

FlameTail · May 22, 2024

The CPU clock speed strategy is very interesting.

CPU Power

GPU Power

poke01 · May 22, 2024

So in the end M4 does earn its new nomenclature.

The battery life test surprised me. I didn’t know mini-LED was that power hungry.

SarahKerrigan · May 22, 2024

Nothingness said:
@SarahKerrigan do you happen to have a Mac with Xcode? I wonder if the latest LLVM compiler is able to use SME on SPEC (I doubt it). And even if it does, that wouldn't prove Geekerwan enabled it, to keep the comparison fair.

Unfortunately not - but I'd be happy to do a run if someone has a system I can use remotely.

In my experience, even autovectorization on SPEC is very limited. I agree with your doubts about it being able to usefully emit SME.

poke01 · May 22, 2024

What’s the benefit of using nitrogen cooling?

So it doesn’t throttle?

FlameTail · May 22, 2024

poke01 said:
The battery life test surprised me. I didn’t know mini-LED was that power hungry.

More like Tandem OLED is a huge game changer.

It has begun.

In the 2010s, OLEDs took over smartphones. Now, with the innovation of Tandem OLED, the 2020s will be the decade where OLED takes over laptops.

https://www.reddit.com/r/hardware/comments/1brzvlc/oleds_are_going_to_take_over_laptops

Eug · May 22, 2024

With the efficiency cores running at almost 3 GHz, I think I'd be really happy with an ultralight MacBook with 2 performance cores and 4 efficiency cores. Oh wait, is that coming to the iPhone this fall? Even 2+2 would be fine.

FlameTail said:
Power consumption exploded.

Yes, but at higher clocks. At similar clocks to M3, the power utilization is in the same ballpark. Power utilization is 5% higher at similar clocks, but performance is 8% higher in SPEC (no SME).
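
As a back-of-the-envelope check on what that means for efficiency at similar clocks, here is a small sketch using just the two percentages above. It treats performance per watt as the ratio of the two gains, which is a simplification on my part.

```python
perf_gain_pct = 8.0    # SPEC performance gain at similar clocks (from the post above)
power_gain_pct = 5.0   # power increase at similar clocks (from the post above)

perf_ratio = 1.0 + perf_gain_pct / 100.0
power_ratio = 1.0 + power_gain_pct / 100.0

# Performance per watt relative to M3 at similar clocks.
perf_per_watt_gain_pct = (perf_ratio / power_ratio - 1.0) * 100.0

print(f"Perf/W change at similar clocks: {perf_per_watt_gain_pct:+.1f}%")
```

So the efficiency gain at iso-clock works out to only a few percent on these numbers.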

FlameTail said:
He mentions that the M4 does not use LPDDR5X, but a special overclocked LPDDR5.

I wonder how that relates to the 6 GB DRAM chip thing, if at all.

poke01 said:
What’s the benefit of using nitrogen cooling?

So it doesn’t throttle?

To overclock it 0.01 GHz according to Tom's Hardware.

But yeah, so it doesn't throttle.

FlameTail said:
Gigachad pouring liquid nitrogen on the iPad. (From a Snapdragon branded bottle- the irony).

Haha. I missed that.

FlameTail said:
Shoutout to @junjie1475

Great work @junjie1475!

uzzi38 · May 22, 2024

FlameTail said:
After showing this slide, he proceeds to say this:
Rofl.

Uh wait a minute, quick question (I didn't watch the video): are things like the SPEC results using LN2? And if so, were they able to keep operating temperatures in a reasonable range (AKA not sub-zero)?

Just asking because it can mess with power consumption figures, you'll often find LN2 overclocked desktop chips pulling less power than if they were running the same clocks as on air at normal operating temperatures, even on the stock boost algorithms. Thermals can have an impact on leakage, and while it doesn't usually matter with air cooling, it can have a big impact once temperatures get into the sub-zero region.

I don't really expect the numbers to be hugely impacted from this, just wanted to know for accuracy's sake.

Eug · May 22, 2024

According to Geekerwan's tests, the M4 iPad Pro 11" 2024 can continuously dissipate approx. 13 Watts, compared to the 10 Watts of the M2 iPad Pro 11". The M2 iPad Pro 12.9" gets close to the M4 iPad Pro 13" though.

With mixed usage, the new iPad Pros have much better battery life.

This is likely due to the lower power requirements of tandem OLED. As you can see, the power consumption for video playback on the 2024 M4 iPad Pros is much lower in the following graph.

poke01 · May 22, 2024

uzzi38 said:
Uh wait a minute, quick question (I didn't watch the video): are things like the SPEC results using LN2? And if so, were they able to keep operating temperatures in a reasonable range (AKA not sub-zero)?

Just asking because it can mess with power consumption figures, you'll often find LN2 overclocked desktop chips pulling less power than if they were running the same clocks as on air at normal operating temperatures, even on the stock boost algorithms. Thermals can have an impact on leakage, and while it doesn't usually matter with air cooling, it can have a big

It looked like LN2 was used for SPEC ‘to measure peak performance’.

But it doesn’t seem to matter: from the screenshots below, it looks like the M4 pulled more power under LN2.

Normal, SPEC 2017 int: 6.98 W

From: https://m.bilibili.com/video/BV1ir421j7vR?spm_id_from=333.999.0.0

LN2?, SPEC int: 7.21 W

From: Geekerwan

Doug S · May 22, 2024

There isn't a whole lot of difference between 6.98 and 7.21, nor between the SPECints of 11.37 and 11.72.

The higher figure is probably what we'd expect from an M4 installed in say a Mac Mini, where it has traditional heatsink and fan, rather than trying to pass all its heat through a copper Apple logo.
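
To put numbers on "not a whole lot of difference", here is a quick sketch of the relative gaps. It assumes the 11.37 score goes with the 6.98 W run and the 11.72 score with the 7.21 W run, which is how I'm reading the screenshots; that pairing is an assumption.

```python
# Power figures from the screenshots quoted above, scores from this post.
# Assumed pairing: (6.98 W, 11.37) for the normal run, (7.21 W, 11.72) for the LN2 run.
normal = {"power_w": 6.98, "specint": 11.37}
ln2 = {"power_w": 7.21, "specint": 11.72}

power_delta_pct = (ln2["power_w"] / normal["power_w"] - 1.0) * 100.0
score_delta_pct = (ln2["specint"] / normal["specint"] - 1.0) * 100.0

# Score per watt for each run, and the relative change between them.
normal_eff = normal["specint"] / normal["power_w"]
ln2_eff = ln2["specint"] / ln2["power_w"]
eff_delta_pct = (ln2_eff / normal_eff - 1.0) * 100.0

print(f"Power:      {power_delta_pct:+.1f}%")
print(f"SPECint:    {score_delta_pct:+.1f}%")
print(f"Score/watt: {eff_delta_pct:+.1f}%")
```

Both gaps come out at around 3%, so score per watt is essentially unchanged between the two runs.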

SpudLobby · May 22, 2024

Fwiw, Geekerwan seems to be measuring with internal Apple APIs now. There are other figures putting total M4 power at 11 W, which is what I actually expect it’s drawing at the platform level (idle normalized).

Even A14 iPhones when geekerwan measured from the VRMs/wall were running 4.11W on Spec, and the M1 P core here is substantially below that, and substantially below even Andrei’s measurements, so I suspect this is using Apple internal power modeling that Geekerwan recently mentioned.

M1 ain’t doing less than an A14 with a higher clock at the peak of the curve and twice the bus width, twice the ram. He changed the methodology here, or he massively changed what he’s actually testing, yet it’s still Spec.

Pretty important thing to note, much more important than the cooling.

name99 · May 22, 2024

junjie1475 said:
No, we didn't use SME in the compilation of SPEC. The compilers are Apple clang 14 from Xcode 14.2 and LLVM Flang 17, with -Ofast optimization.

What happens if you allow SME?
In PRINCIPLE LLVM should
- detect loops that look like matrix multiplies or similar (and also appropriate long vector loops)
- map them to linalg operations
- which should then be lowered to SME or SSVE if the compiler has been given permission to do so

The various steps in this process are newish, in the sense that they've been written over the past two or three years, and haven't had much real world testing. But in THEORY they should work.

You could also try the multiversioning support, as described here,

What is new in LLVM 16

Contributions from Arm to the new release include the usual architecture and CPU additions and new features such as, function multi-versioning and strict floating point support.

community.arm.com

for a single function that looks like it should use SSVE2 or SME2, and see what happens.

name99 · May 22, 2024

SarahKerrigan said:
Unfortunately not - but I'd be happy to do a run if someone has a system I can use remotely.

In my experience, even autovectorization on SPEC is very limited. I agree with your doubts about it being able to usefully emit SME.

The ARM blog doesn't think so.
If you go through their annual changes to LLVM and GCC, every year they call out some big change in one of the SPEC benchmarks enabled by some new vectorization, though each year it tends to be a different function.

eg

Part 1: What Is New In LLVM 18?

This post summarizes LLVM 18 improvements contributed by Arm: new Arm architecture and CPU support, performance improvements.

community.arm.com

It seems like the linalg stuff might be lagging (in the compilers being used) as opposed to leading edge of Flang 18 and LLVM 18, but that just means we should see big boosts after WWDC? (Look at eg the Flang numbers at the above link)

The real question is how aggressively (and sensibly) the compiler routes to SME and SSVE, and the answers may be "not at all" and "terribly" until Xcode 16 (which, while surely far from perfect, will presumably at least make some sort of intelligent effort when given an M4 target).
