Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,725
1,261
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
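
As a rough sanity check on the GPU figures above, here is a minimal sketch of where a number like 2.6 Teraflops could come from. The ALUs per execution unit, the FLOPs per ALU per clock, and the GPU clock are assumptions for illustration and are not stated in this post; only the 128 execution units and 24576 threads are.

```python
# Figures quoted in the post
EXECUTION_UNITS = 128
MAX_THREADS = 24576

# Assumptions (NOT from the post): ALU count per EU, FMA throughput, GPU clock
ALUS_PER_EU = 8
FLOPS_PER_ALU_PER_CLOCK = 2  # one fused multiply-add counts as 2 FLOPs
GPU_CLOCK_GHZ = 1.278


def peak_tflops(eus, alus_per_eu, flops_per_clock, clock_ghz):
    """Theoretical peak FP32 throughput in TFLOPS."""
    gflops = eus * alus_per_eu * flops_per_clock * clock_ghz
    return gflops / 1000.0


tflops = peak_tflops(EXECUTION_UNITS, ALUS_PER_EU,
                     FLOPS_PER_ALU_PER_CLOCK, GPU_CLOCK_GHZ)
threads_per_eu = MAX_THREADS // EXECUTION_UNITS

print(f"{tflops:.2f} TFLOPS peak, {threads_per_eu} threads per EU")
```

Under those assumptions the result lands close to the quoted 2.6 Teraflops, but treat it as a back-of-the-envelope estimate, not a confirmed breakdown.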

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors
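
For what it's worth, the 100 GB/s figure is roughly what you'd get from bus width times transfer rate. A minimal sketch, assuming a 128-bit memory bus and LPDDR5 at 6400 MT/s (neither number is stated in this post):

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_mt_s):
    """Theoretical peak memory bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s / 1000.0  # MB/s -> GB/s


# Assumed values, not from the post: 128-bit bus, 6400 MT/s
m2_bw = peak_bandwidth_gb_s(128, 6400)
print(f"{m2_bw:.1f} GB/s")
```

The same function can be reused with other bus widths or transfer rates to compare against quoted bandwidth numbers for other chips.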

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

poke01

Golden Member
Mar 8, 2022
1,202
1,390
106
Summary from the video.


Here’s the M4 vs M3 architecture diagram.

  • The M4 P core grows from an already big 9 wide decode to a 10 wide decode.
  • Integer Physical Register File has grown by 21% while Floating Point Physical Register File has shrunk.
  • The dispatch buffers for the M4 have seen a significant boost for both Int and FP units, with structures 50-100% wider. (This seems to resolve a major issue with M3, which increased the number of ALUs but saw minimal IPC gains (3%) because they couldn’t be kept fed.)
  • Integer and Load store schedulers have also seen increases by around 11-15%.
  • Seems to be some changes to the individual capabilities of the execution units as well but I do not have a clear picture on what they mean.
  • Load Store Queue and STQ entries have seen increases by around 14%.
  • The ROB has grown by around 12% while PRRT has increased by around 14%.
  • Memory/Cache latency has reduced from 96 ns to 88 ns.
All these changes result in the largest gen on gen IPC gain for Apple silicon in 4 years.

In SPECint 2017, M4 increases performance by around 19%.

In SPECfp 2017, M4 increases performance by around 25%.

Clock for clock, M4 increases IPC by 8% for SPECint and 9% for SPECfp.
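
If you treat performance as IPC times clock speed (a simplification, and an assumption on my part), the numbers above imply how much of the gain comes from frequency. A minimal sketch:

```python
def implied_clock_gain(total_gain, ipc_gain):
    """Clock uplift implied by total and IPC gains, assuming perf = IPC * clock."""
    return (1 + total_gain) / (1 + ipc_gain) - 1


# Figures quoted above: +19% / +8% for SPECint, +25% / +9% for SPECfp
int_clock = implied_clock_gain(0.19, 0.08)
fp_clock = implied_clock_gain(0.25, 0.09)

print(f"SPECint implies {int_clock:.1%} from clocks")
print(f"SPECfp implies {fp_clock:.1%} from clocks")
```

The two estimates don't agree with each other, which suggests the quoted percentages are rounded or that something other than core clock (memory, for instance) is contributing on the FP side.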

from:
 

FlameTail

Platinum Member
Dec 15, 2021
2,905
1,636
106
10%, same cores but probably clocked higher.
Aha?

GPU Performance
M1 : 100%
M2 : 125%
M3 : 150%
M4 : 165%
M5 : ???

I am very curious how GPU uplifts will go in future generations. Moore's Law is dead, so huge increases from adding more ALUs are not possible without blowing up the die size.

But Nvidia, the leader of the GPU industry still manages to bring about a 2x jump in performance every generation. 3090 -> 4090 was nearly a 2x jump, and 4090 -> 5090 is also rumoured to be a nearly 2x jump. This means that iGPUs are actually falling behind top end dGPUs % performance wise, with each passing year.
 

FlameTail

Platinum Member
Dec 15, 2021
2,905
1,636
106

Apple has reached Intel territory in terms of clock speeds. This is shocking.

10% clock boost on GPU.

He mentions that the M4 does not use LPDDR5X, but a special overclocked LPDDR5.

Shoutout to @junjie1475

CPU architecture diagrams. P-core is new. E-core is the same as M3/A17.


Gigachad pouring liquid nitrogen on the iPad. (From a Snapdragon branded bottle- the irony).

Does SPEC use SME?

Power consumption exploded.
 

junjie1475

Junior Member
Apr 9, 2024
17
51
51
@SarahKerrigan do you happen to have a Mac with Xcode? I wonder if the latest LLVM compiler is able to use SME on SPEC (I doubt it). And even if it does, that wouldn't prove Geekerwan enabled it, to keep the comparison fair.
No, we didn't use SME in the compilation of SPEC, the compiler is Apple clang14 from Xcode14.2 and LLVM Flang17 with Optimization -Ofast.
 

SarahKerrigan

Senior member
Oct 12, 2014
539
1,162
136
@SarahKerrigan do you happen to have a Mac with Xcode? I wonder if the latest LLVM compiler is able to use SME on SPEC (I doubt it). And even if it does, that wouldn't prove Geekerwan enabled it, to keep the comparison fair.

Unfortunately not - but I'd be happy to do a run if someone has a system I can use remotely.

In my experience, even autovectorization on SPEC is very limited. I agree with your doubts about it being able to usefully emit SME.
 

FlameTail

Platinum Member
Dec 15, 2021
2,905
1,636
106

Eug

Lifer
Mar 11, 2000
23,725
1,261
126


With the efficiency cores running at almost 3 GHz, I think I'd be really happy with an ultralight MacBook with 2 performance cores and 4 efficiency cores. Oh wait, is that coming to the iPhone this fall? Even 2+2 would be fine.

Power consumption exploded.
Yes, but at higher clocks. At similar clocks to M3, the power utilization is in the same ballpark. Power utilization is 5% higher at similar clocks, but performance is 8% higher in SPEC (no SME).
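
Put as perf per watt at similar clocks, that works out to a small net gain. A minimal sketch using just the two percentages above:

```python
def efficiency_gain(perf_gain, power_gain):
    """Relative change in performance per watt."""
    return (1 + perf_gain) / (1 + power_gain) - 1


# +8% performance for +5% power at similar clocks (figures quoted above)
iso_clock_gain = efficiency_gain(0.08, 0.05)
print(f"perf/W change at similar clocks: {iso_clock_gain:.1%}")
```

That is a bit under 3% better perf/W at iso-clock, so not a regression, just not a big step either.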



He mentions that the M4 does not use LPDDR5X, but a special overclocked LPDDR5.
I wonder how that relates to the 6 GB DRAM chip thing, if at all.

What’s the benefit of using nitrogen cooling?

So it doesn’t throttle?
To overclock it 0.01 GHz according to Tom's Hardware.

But yeah, so it doesn't throttle.

Gigachad pouring liquid nitrogen on the iPad. (From a Snapdragon branded bottle- the irony).
Haha. I missed that.

Great work @junjie1475!

 

uzzi38

Platinum Member
Oct 16, 2019
2,690
6,345
146
Uh wait a minute, quick question (I didn't watch the video): is stuff like the SPEC results using LN2? And if so, were they able to keep operating temperatures in reasonable ranges (AKA not sub-zero)?

Just asking because it can mess with power consumption figures. You'll often find LN2 overclocked desktop chips pulling less power than if they were running the same clocks on air at normal operating temperatures, even on the stock boost algorithms. Thermals can have an impact on leakage, and while it doesn't usually matter with air cooling, it can have a big impact once temperatures get into the sub-zero region.

I don't really expect the numbers to be hugely impacted from this, just wanted to know for accuracy's sake.
 

Eug

Lifer
Mar 11, 2000
23,725
1,261
126
According to Geekerwan's tests, the M4 iPad Pro 11" 2024 can continuously dissipate approx. 13 Watts, compared to the 10 Watts of the M2 iPad Pro 11". The M2 iPad Pro 12.9" gets close to the M4 iPad Pro 13" though.



With mixed usage, the new iPad Pros have much better battery life.



This is likely due to the lower power requirements of tandem OLED. As you can see, the power consumption for video playback on the 2024 M4 iPad Pros is much lower in the following graph.

 

poke01

Golden Member
Mar 8, 2022
1,202
1,390
106
Uh wait a minute, quick question (I didn't watch the video): are stuff like the SPEC results using LN2? And if so, were they able to keep operating temperatures in reasonable ranges (AKA not sub-zero?)

Just asking because it can mess with power consumption figures, you'll often find LN2 overclocked desktop chips pulling less power than if they were running the same clocks as on air at normal operating temperatures, even on the stock boost algorithms. Thermals can have an impact on leakage, and while it doesn't usually matter with air cooling, it can have a big
It looked like LN2 was used for SPEC ‘to measure peak performance’.

But it doesn’t seem to matter: from the screenshots below, it looks like the M4 pulled more power under LN2.

Normal, SPEC 2017 int: 6.98 W

From: https://m.bilibili.com/video/BV1ir421j7vR?spm_id_from=333.999.0.0

LN2?, SPEC 2017 int: 7.21 W

From: Geekerwan
 

Doug S

Platinum Member
Feb 8, 2020
2,420
3,914
136
There isn't a whole lot of difference between 6.98 and 7.21, nor between the SPECints of 11.37 and 11.72.
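
To put numbers on "not a whole lot", here is a quick sketch of the relative differences, using only the four figures quoted in this thread:

```python
def pct_diff(low, high):
    """Relative increase going from low to high."""
    return high / low - 1


power_diff = pct_diff(6.98, 7.21)    # watts, SPECint 2017
score_diff = pct_diff(11.37, 11.72)  # SPECint scores

print(f"power: +{power_diff:.1%}, score: +{score_diff:.1%}")
```

Both gaps come out at roughly 3%, so the higher run draws about as much extra power as it gains in score.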

The higher figure is probably what we'd expect from an M4 installed in say a Mac Mini, where it has traditional heatsink and fan, rather than trying to pass all its heat through a copper Apple logo.
 

SpudLobby

Senior member
May 18, 2022
912
611
106
Fwiw Geekerwan now seems to be measuring with internal Apple APIs. There are other figures putting total M4 power at 11W, which is what I'd actually expect it to be running at platform level (idle normalized).

Even A14 iPhones, when Geekerwan measured from the VRMs/wall, were running 4.11W on SPEC, and the M1 P core here is substantially below that, and substantially below even Andrei’s measurements, so I suspect this is using the Apple internal power modeling that Geekerwan recently mentioned.

M1 ain’t doing less than an A14 with a higher clock at the peak of the curve and twice the bus width, twice the RAM. He changed the methodology here, or he massively changed what he’s actually testing, yet it’s still SPEC.



Pretty important thing to note, much more important than the cooling.
 

name99

Senior member
Sep 11, 2010
427
324
136
No, we didn't use SME in the compilation of SPEC, the compiler is Apple clang14 from Xcode14.2 and LLVM Flang17 with Optimization -Ofast.
What happens if you allow SME?
In PRINCIPLE LLVM should
- detect loops that look like matrix multiplies or similar (and also appropriate long vector loops)
- map them to linalg operations
- which should then be lowered to SME or SSVE if the compiler has been given permission to do so

The various steps in this process are newish, in the sense that they've been written over the past two or three years, and haven't had much real world testing. But in THEORY they should work.

You could also try the multiversioning support, as described here,
for a single function that looks like it should use SSVE2 or SME2, and see what happens.
 

name99

Senior member
Sep 11, 2010
427
324
136
Unfortunately not - but I'd be happy to do a run if someone has a system I can use remotely.

In my experience, even autovectorization on SPEC is very limited. I agree with your doubts about it being able to usefully emit SME.

The ARM blog doesn't think so.
If you go through their annual changes to LLVM and GCC, every year they call out some big change in one of the SPEC benchmarks enabled by some new vectorization, though each year it tends to be a different function.

eg

It seems like the linalg stuff might be lagging (in the compilers being used) as opposed to leading edge of Flang 18 and LLVM 18, but that just means we should see big boosts after WWDC? (Look at eg the Flang numbers at the above link)

The real question is how aggressively (and sensibly) the compiler routes to SME and SSVE, and the answer to both may be "not at all", and "terribly", until Xcode 16 (which, while surely far from perfect, will presumably also at least make some sort of intelligent effort, when given an M4 target).
 