Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,725
1,263
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 Gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the GPU core count). Basically, Apple is taking the same approach with these chips as they do with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

name99

Senior member
Sep 11, 2010
427
324
136
M3, M3 Pro, M3 Max

All three M3-generation parts, from the lowest end to the highest end, have a 17 TOPS NPU. This is interesting. The NPU does not scale up in size/performance for the higher-end parts the way the CPU and GPU do. Why not?

Will it remain this way for future generations too?
This is a business decision, not a technical decision.
It may stay the same indefinitely, sized as "good enough" for human-factors needs (in the same way that no-one expects a Mac Pro to come bundled with three keyboards), with Apple shipping an ANE that matches what they expect inference to need each year.
OR it may turn out that neural networks are something people are willing to pay more for to get better performance (neural networks that improve the quality of pro video, or whatever).

I think no-one really knows and Apple is playing it by ear. There's certainly no tech reason the ANE can't be scaled up.
 

SpudLobby

Senior member
May 18, 2022
918
618
106
Like I said, now we are getting into semantics about what counts as "pretty shoddy".

I'm frustrated that people (frequently the same people) get excited about some chip being able to boost by 100MHz, but still insist that a free boost of their code by 5% or so from the compiler is not interesting.
Yeah I certainly didn’t say that, or at least I am not one of those guys. Others here maybe, I did chuckle.
That's to multiple devices.
I believe the Blackwell chip-to-chip link is 1.8TB/s so still slightly behind Apple.

(Of course to be fair we know nvLink scales, in a way that we believe is true for UltraFusion but have not actually seen; AND nvLink can cover longer distances.)
“Slightly behind Apple”, yeah, but it's for something different, at totally different cost structures, and, like you say, over longer distances. Apple doesn't really have anything special with UltraFusion and M Ultra packaging. It's literally just born of some InFO_L stuff from TSMC, yes?
 

name99

Senior member
Sep 11, 2010
427
324
136
Yeah I think they didn't really have a whole lot for the NPU to do, particularly in Macs, so it wasn't worth scaling in Pro/Max. It was a solution looking for a problem. While they're still not sure what the problem is, judging from stock market price surges and Microsoft "AI PC" hype the solution is clearly "more TOPS!"

I still think over time we'll see the GPU and NPU merge. When the NPU was this tiny little corner it wasn't worth the bother, but if the NPU grows significantly while the GPU will of course continue to be very important, there is a lot to be gained from combining the two. Yes it means some work since there isn't a 100% overlap in their function, and there will need to be a way of dynamically partitioning so it can tilt from almost entirely GPU to almost entirely NPU depending on the load, but the gains from such a merger are too great to ignore.

We might see it as soon as next year, but probably 2026 unless they've already been planning it for a while.

I used to think this, until I investigated closely the exact hardware present in both. Now I think "unifying the GPU and NPU" is something nv will push (for obvious reasons) ... right up until they release their separate NPU...

A substantial part of the reason for a GPU, and then an NPU, is the win available from smaller area and less power. But that win only exists because of specialization. Some specialization in the case of GPU (relative to CPU), even more specialization in the case of NPU relative to GPU.

One version of this HAS, in a sense, happened, but not what you want...
What is now the ANE began life as part of the Apple ISP. It made sense to split off the part of the ISP that was handling convolution and beginning to perform more and more tasks built on that convolution (face recognition and suchlike) so that it was generally available to other code. But that's, in a sense, going backwards from what you want – MORE specialized hardware (an ISP specialized for the camera plus an ANE specialized for neural nets) rather than an undifferentiated sea of throughput compute.

Even AMX is kinda a version of this - split off what HPC wants (long runs of FMACs) from what general purpose computing wants and what is provided by NEON (flexible data rearrangement in short vector registers).
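A rough way to picture that split (a conceptual NumPy sketch only, standing in for hardware; this is not Apple's actual AMX or NEON semantics, and the shapes are just illustrative):

```python
import numpy as np

# Conceptual contrast only -- NumPy standing in for hardware, not real AMX/NEON code.

# "AMX-style": each instruction-equivalent accumulates a whole tile of FMACs,
# e.g. an outer product of an X vector and a Y vector added into a Z tile.
x = np.arange(16, dtype=np.float32)
y = np.arange(16, dtype=np.float32)
z_tile = np.zeros((16, 16), dtype=np.float32)
z_tile += np.outer(x, y)              # long runs of fused multiply-accumulates

# "NEON-style": short 128-bit vectors (4 x float32) with cheap lane rearrangement.
a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
b = np.array([5.0, 6.0, 7.0, 8.0], dtype=np.float32)
acc = np.zeros(4, dtype=np.float32)
acc += a * b                          # per-lane FMA on a short vector
shuffled = b[[1, 0, 3, 2]]            # the flexible data rearrangement NEON is good at

print(z_tile[0, :4], acc, shuffled)
```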
 

Doug S

Platinum Member
Feb 8, 2020
2,427
3,923
136
I welcome it when a CPU uses up the entire available thermal range, but this has to stay within reasonable limits. I do not think that 50+ watts for single-threaded operation is reasonable. A desktop might get away with it (even though it's a massive waste), but it is simply unacceptable for laptops. I do not want my power to shoot up beyond the CPU TDP when opening a new browser tab.

I do not see any excuses for contemporary mobile CPUs drawing more power than the enthusiast-class desktop ten years ago. That is not good engineering, and that is not honest advertising. I like Apple's hardware because their thermal design targets make sense to me. And they can still hit performance records despite using much less power than the competition. This is the path the industry should follow, not the massive power inflation we have witnessed in the last decade. And frankly, TDP should become recognized as a fraudulent advertising practice. The spec sheet should show CPU power consumption across the frequency range, not some detached from reality number that makes the CPU maker look good.

I believe in my example I said "if I had a CPU with a 100 watt TDP" that I'd be in favor of it being able to draw 100 watts in a single core if that were possible and still contributing to faster speeds.

That exact same CPU when in a laptop with a 25 watt TDP would be limited to drawing 25 watts.
 

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
I believe in my example I said "if I had a CPU with a 100 watt TDP" that I'd be in favor of it being able to draw 100 watts in a single core if that were possible and still contributing to faster speeds.

That exact same CPU when in a laptop with a 25 watt TDP would be limited to drawing 25 watts.
I don't like that. It would have to be done by jacking up the frequency of the core. As frequency rises, power consumption climbs much faster than linearly (the voltage has to come up with it) and performance-per-watt drops like a rock.

Wasteful.
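(Rough intuition for why that happens: CMOS dynamic power goes roughly as C·V²·f, and past some point you have to raise voltage to sustain higher clocks, so power grows much faster than performance. A toy sketch with made-up numbers, not measured data for any real core:)

```python
# Toy model: dynamic power ~ C * V^2 * f, performance ~ f.
# The voltage/frequency pairs below are invented for illustration only.

def dynamic_power(c, volts, freq_ghz):
    """Classic CMOS dynamic-power approximation: P ≈ C * V^2 * f."""
    return c * volts ** 2 * freq_ghz

operating_points = [  # (GHz, volts) -- voltage must rise to hold higher clocks
    (2.0, 0.70),
    (3.0, 0.85),
    (3.5, 0.95),
    (4.0, 1.10),
]

C = 1.0  # arbitrary normalisation constant
for f, v in operating_points:
    p = dynamic_power(C, v, f)
    print(f"{f:.1f} GHz @ {v:.2f} V -> power {p:.2f} (a.u.), perf/W {f / p:.2f}")
# perf/W falls steadily as frequency (and the voltage needed to reach it) climbs,
# even though absolute performance keeps going up.
```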
 

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
I used to think this, until I investigated closely the exact hardware present in both. Now I think "unifying the GPU and NPU" is something nv will push (for obvious reasons) ... right up until they release their separate NPU...

A substantial part of the reason for a GPU, and then an NPU, is the win available from smaller area and less power. But that win only exists because of specialization. Some specialization in the case of GPU (relative to CPU), even more specialization in the case of NPU relative to GPU.

One version of this HAS, in a sense, happened, but not what you want...
What is now the ANE began life as part of the Apple ISP. It made sense to split off the part of the ISP that was handling convolution and beginning to perform more and more tasks built on that convolution (face recognition and suchlike) so that it was generally available to other code. But that's, in a sense, going backwards from what you want – MORE specialized hardware (an ISP specialized for the camera plus an ANE specialized for neural nets) rather than an undifferentiated sea of throughput compute.

Even AMX is kinda a version of this - split off what HPC wants (long runs of FMACs) from what general purpose computing wants and what is provided by NEON (flexible data rearrangement in short vector registers).
Qualcomm's NPU also had a similar origin story. In their case, it was born from their Hexagon DSP (digital signal processor).
 

roger_k

Member
Sep 23, 2021
102
215
86
I believe in my example I said "if I had a CPU with a 100 watt TDP" that I'd be in favor of it being able to draw 100 watts in a single core if that were possible and still contributing to faster speeds.

That exact same CPU when in a laptop with a 25 watt TDP would be limited to drawing 25 watts.

As I said, I don’t find it reasonable that a CPU draws the same amount of power reading a spreadsheet as it does running a demanding multicore compute job. This is not good user experience.
 
Reactions: Eug

Doug S

Platinum Member
Feb 8, 2020
2,427
3,923
136
I don't like that. It would have to be done by jacking up the frequency of the core. As frequency rises, power consumption climbs much faster than linearly (the voltage has to come up with it) and performance-per-watt drops like a rock.

Wasteful.


Why? No one would force you to do it. Do you disable turbo mode on your Intel or AMD PCs? That's taking CPUs into less efficient territory too!

Where's the line? If you're serious about saving power, disable all your P cores and run on E cores alone. On a Mac/iPhone the P core uses ~10x the power for ~3x the performance. Sounds like a bad deal to me! In fact you should probably want to run your E cores at less than max frequency, because they have a power/frequency curve of their own, and are even more efficient running at half their max frequency!
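To put toy numbers on that trade-off (using the ~10x power / ~3x performance ratios above; these are illustrative ratios, not measurements of any specific core):

```python
# Energy per task = power / throughput, so the ratios above translate directly.
# The wattages are hypothetical placeholders chosen to match the ~10x / ~3x ratios.

p_core = {"power_w": 5.0, "perf": 3.0}   # ~10x the power, ~3x the throughput
e_core = {"power_w": 0.5, "perf": 1.0}   # baseline efficiency core

for name, core in (("P-core", p_core), ("E-core", e_core)):
    energy_per_task = core["power_w"] / core["perf"]   # lower is better
    print(f"{name}: {energy_per_task:.2f} J per unit of work")

# The P core finishes each task ~3x sooner but burns ~3.3x the energy doing it --
# yet nobody actually disables their P cores to chase that efficiency.
```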
 

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
Running LLMs locally on a MacBook Air:


One thing I found interesting is that it seems Apple Silicon primarily uses the GPU, and not the Neural Engine for this stuff.
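For what it's worth, that matches how the software stacks are wired up: llama.cpp/MLX-style runners go straight to the GPU via Metal, while the Neural Engine is only reachable through Core ML, where you opt in per model. A minimal coremltools sketch (the model file name is a hypothetical placeholder, and whether layers actually land on the ANE still depends on the ops in the model):

```python
import coremltools as ct

# Load an already-converted Core ML model and ask for the Neural Engine.
# "my_llm.mlpackage" is a hypothetical placeholder for any converted model.
model_ane = ct.models.MLModel(
    "my_llm.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,   # CPU + Apple Neural Engine
)

# The same model pinned to CPU + GPU, which is roughly what Metal-based
# LLM runners use (they bypass Core ML and the ANE entirely).
model_gpu = ct.models.MLModel(
    "my_llm.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
)

# Unsupported ops fall back to GPU/CPU silently, so requesting the ANE
# is a hint, not a guarantee.
```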
 

roger_k

Member
Sep 23, 2021
102
215
86
According to Wikipedia, Apple M4 is ARMv9.4



Well, whoever added it to Wikipedia is wrong. Quite a lot of the technical detail on Apple Silicon there is wrong. And the funny thing is that it is impossible to correct this information because no definitive authority exists. Basically, whoever edits the article first gets to invent whatever BS they want and there is no way to get it fixed.
 
Reactions: FlameTail
Jul 27, 2020
17,491
11,280
106
Basically, whoever edits the article first gets to invent whatever BS they want and there is no way to get it fixed.
You think Apple can't do anything about it if they really really wanted to?

This just lets them brush off Wikipedia as a source of "crowd-sourced" information and point everyone to their developer portal, which requires a login and, funnily enough, still has no architectural or ISA details about the M4.

Apple is so paranoid and secretive it would make a nun blush.
 

Eug

Lifer
Mar 11, 2000
23,725
1,263
126
Notebookcheck’s review:


Geekbench 6.2 (no SME)

Power consumption

3DMark Wild Life Stress Test (frame rate during repeated benchmarks)
 

poke01

Golden Member
Mar 8, 2022
1,212
1,394
106
Notebookcheck’s review:


Geekbench 6.2 (no SME)

Power consumption

3DMark Wild Life Stress Test (frame rate during repeated benchmarks) - Max limited to 60 fps on iPad Pro
This shows that Apple needs a big IPC jump with the next M chip. They can't raise frequency forever if they want to keep these chips in fanless devices.
 

Mahboi

Senior member
Apr 4, 2024
741
1,313
96
This shows that Apple needs a big IPC jump with the next M chip. They can't raise frequency forever if they want to keep these chips in fanless devices.
Which is going to be a very interesting squeeze. They can't get any more IPC since all the IPC makers apparently left for Nuvia.
We're going to see a very wild switcheroo where QC will provide M1-like chips, for cheaper, and go to M2/3/4 pretty quickly without raising area or frequency too much.
While Apple will have to eat through their margins for area or admit that fans are necessary again because freq has gone up too much.


Mmmmmh.
 
Reactions: igor_kavinski

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
Which is going to be a very interesting squeeze. They can't get any more IPC since all the IPC makers apparently left for Nuvia.
We're going to see a very wild switcheroo where QC will provide M1-like chips, for cheaper, and go to M2/3/4 pretty quickly without raising area or frequency too much.
While Apple will have to eat through their margins for area or admit that fans are necessary again because freq has gone up too much.


Mmmmmh.
That Kepler tweet is garbage though. ARM isn't the problem.

Edit: The reason is that switching to ARM wasn't the problem; it's that they lost their engineers. Even if they had switched to RISC-V instead of ARM, they would still have the same issue.

Apple making their own processors is great for them. They have always wanted to do this. Tight vertical integration is their signature. I don't think Apple is going to have any regrets for the foreseeable future (~5 years), even if Intel/AMD surpass them and make better processors.
 
Jul 27, 2020
17,491
11,280
106
That Kepler tweet is garbage though. ARM isn't the problem.
I'm almost tempted to tag Kepler, but if you want to be educated, I suggest you tag him before calling his analysis garbage. Not doing that makes you sound like you don't want a confrontation with him. Or, if you're going to say stuff like that, at least post your own justification for why you think it's garbage.
 

Eug

Lifer
Mar 11, 2000
23,725
1,263
126
I think Notebookcheck tested power consumption only on the 10-core variant. I would have liked to see the power consumption numbers on the 9-core variant, since that would make for an interesting comparison, and that is the one I will buy.

Also, they claimed it charges at only 20 Watts max. Maybe that’s true for the 11” but if so, I’d be surprised since the 13” can charge at up to 39-40 Watts.

EDIT:

I see elsewhere that the 11” can charge at 36+ Watts. However, both of them will charge at 20 Watts when nearing full charge.
 

Nothingness

Platinum Member
Jul 3, 2013
2,662
1,213
136
Well, whoever added it to Wikipedia is wrong. Quite a lot of the technical detail on Apple Silicon there is wrong. And the funny thing is that it is impossible to correct this information because no definitive authority exists. Basically, whoever edits the article first gets to invent whatever BS they want and there is no way to get it fixed.
The "sources" are:



You can count one point less in their credibility score
 