Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,725
1,263
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 Gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the GPU core count). Basically, Apple is taking the same approach with these chips as they do with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

name99

Senior member
Sep 11, 2010
427
324
136
M3, M3 Pro, M3 Max

All three M3-generation parts, from the lowest end to the highest end, have a 17 TOPS NPU. This is interesting. The NPU does not scale up in size/performance for the higher-end parts the way the CPU and GPU do. Why not?

Will it remain this way for future generations too?
This is a business decision, not a technical decision.
It may stay the same indefinitely, sized as "good enough" for human-factors needs (in the same way that no-one expects a Mac Pro to come bundled with three keyboards), with Apple shipping an ANE that matches what they expect inference to need each year.
OR it may turn out that neural networks are something people are willing to pay more for to get better performance (neural networks that improve the quality of pro video, or whatever).

I think no-one really knows and Apple is playing it by ear. There's certainly no tech reason the ANE can't be scaled up.
 

SpudLobby

Senior member
May 18, 2022
918
618
106
Like I said, now we are getting into semantics about what counts as "pretty shoddy".

I'm frustrated that people (frequently the same people) get excited about some chip being able to boost by 100MHz, but still insist that a free boost of their code by 5% or so from the compiler is not interesting.
Yeah I certainly didn’t say that, or at least I am not one of those guys. Others here maybe, I did chuckle.
That's to multiple devices.
I believe the Blackwell chip-to-chip link is 1.8TB/s so still slightly behind Apple.

(Of course to be fair we know nvLink scales, in a way that we believe is true for UltraFusion but have not actually seen; AND nvLink can cover longer distances.)
“Slightly behind Apple”, yeah, but it's for something different, at totally different cost structures, and, like you say, over longer distances. Apple doesn't really have anything special with UltraFusion and M Ultra packaging. It's literally just born of some InFO_L stuff from TSMC, yes?
 

name99

Senior member
Sep 11, 2010
427
324
136
Yeah I think they didn't really have a whole lot for the NPU to do, particularly in Macs, so it wasn't worth scaling in Pro/Max. It was a solution looking for a problem. While they're still not sure what the problem is, judging from stock market price surges and Microsoft "AI PC" hype the solution is clearly "more TOPS!"

I still think over time we'll see the GPU and NPU merge. When the NPU was this tiny little corner it wasn't worth the bother, but if the NPU grows significantly while the GPU will of course continue to be very important, there is a lot to be gained from combining the two. Yes it means some work since there isn't a 100% overlap in their function, and there will need to be a way of dynamically partitioning so it can tilt from almost entirely GPU to almost entirely NPU depending on the load, but the gains from such a merger are too great to ignore.

We might see it as soon as next year, but probably 2026 unless they've already been planning it for a while.

I used to think this, until I investigated closely the exact hardware present in both. Now I think "unifying the GPU and NPU" is something nv will push (for obvious reasons) ... right up until they release their separate NPU...

A substantial part of the reason for a GPU, and then an NPU, is the win available from smaller area and less power. But that win only exists because of specialization. Some specialization in the case of GPU (relative to CPU), even more specialization in the case of NPU relative to GPU.

One version of this HAS, in a sense, happened, but not what you want...
What is now the ANE began life as part of the Apple ISP. It made sense to split off the part of the ISP that was handling convolution and beginning to perform more and more tasks built on that convolution (face recognition and suchlike) so that it was generally available to other code. But that's, in a sense, going backwards from what you want – MORE specialized hardware (an ISP specialized for the camera plus an ANE specialized for neural nets) rather than an undifferentiated sea of throughput compute.

Even AMX is kinda a version of this - split off what HPC wants (long runs of FMACs) from what general purpose computing wants and what is provided by NEON (flexible data rearrangement in short vector registers).
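A rough way to picture that split (a conceptual NumPy sketch only, standing in for hardware; this is not Apple's actual AMX or NEON semantics, and the shapes are just illustrative):

```python
import numpy as np

# Conceptual contrast only -- NumPy standing in for hardware, not real AMX/NEON code.

# "AMX-style": each instruction-equivalent accumulates a whole tile of FMACs,
# e.g. an outer product of an X vector and a Y vector added into a Z tile.
x = np.arange(16, dtype=np.float32)
y = np.arange(16, dtype=np.float32)
z_tile = np.zeros((16, 16), dtype=np.float32)
z_tile += np.outer(x, y)              # long runs of fused multiply-accumulates

# "NEON-style": short 128-bit vectors (4 x float32) with cheap lane rearrangement.
a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
b = np.array([5.0, 6.0, 7.0, 8.0], dtype=np.float32)
acc = np.zeros(4, dtype=np.float32)
acc += a * b                          # per-lane FMA on a short vector
shuffled = b[[1, 0, 3, 2]]            # the flexible data rearrangement NEON is good at

print(z_tile[0, :4], acc, shuffled)
```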
 

Doug S

Platinum Member
Feb 8, 2020
2,427
3,923
136
I welcome it when a CPU uses up the entire available thermal range, but this has to stay within reasonable limits. I do not think that 50+ watts for single-threaded operation is reasonable. A desktop might get away with it (even though it's a massive waste), but it is simply unacceptable for laptops. I do not want my power to shoot up beyond the CPU TDP when opening a new browser tab.

I do not see any excuses for contemporary mobile CPUs drawing more power than the enthusiast-class desktop ten years ago. That is not good engineering, and that is not honest advertising. I like Apple's hardware because their thermal design targets make sense to me. And they can still hit performance records despite using much less power than the competition. This is the path the industry should follow, not the massive power inflation we have witnessed in the last decade. And frankly, TDP should become recognized as a fraudulent advertising practice. The spec sheet should show CPU power consumption across the frequency range, not some detached from reality number that makes the CPU maker look good.

I believe in my example I said "if I had a CPU with a 100 watt TDP" that I'd be in favor of it being able to draw 100 watts in a single core if that were possible and still contributing to faster speeds.

That exact same CPU when in a laptop with a 25 watt TDP would be limited to drawing 25 watts.
 

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
I believe in my example I said "if I had a CPU with a 100 watt TDP" that I'd be in favor of it being able to draw 100 watts in a single core if that were possible and still contributing to faster speeds.

That exact same CPU when in a laptop with a 25 watt TDP would be limited to drawing 25 watts.
I don't like that. It would have to be done by jacking up the frequency of the core. As frequency rises, power consumption climbs much faster than linearly (the voltage has to come up with it) and performance-per-watt drops like a rock.

Wasteful.
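(Rough intuition for why that happens: CMOS dynamic power goes roughly as C·V²·f, and past some point you have to raise voltage to sustain higher clocks, so power grows much faster than performance. A toy sketch with made-up numbers, not measured data for any real core:)

```python
# Toy model: dynamic power ~ C * V^2 * f, performance ~ f.
# The voltage/frequency pairs below are invented for illustration only.

def dynamic_power(c, volts, freq_ghz):
    """Classic CMOS dynamic-power approximation: P ≈ C * V^2 * f."""
    return c * volts ** 2 * freq_ghz

operating_points = [  # (GHz, volts) -- voltage must rise to hold higher clocks
    (2.0, 0.70),
    (3.0, 0.85),
    (3.5, 0.95),
    (4.0, 1.10),
]

C = 1.0  # arbitrary normalisation constant
for f, v in operating_points:
    p = dynamic_power(C, v, f)
    print(f"{f:.1f} GHz @ {v:.2f} V -> power {p:.2f} (a.u.), perf/W {f / p:.2f}")
# perf/W falls steadily as frequency (and the voltage needed to reach it) climbs,
# even though absolute performance keeps going up.
```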
 

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
I used to think this, until I investigated closely the exact hardware present in both. Now I think "unifying the GPU and NPU" is something nv will push (for obvious reasons) ... right up until they release their separate NPU...

A substantial part of the reason for a GPU, and then an NPU, is the win available from smaller area and less power. But that win only exists because of specialization. Some specialization in the case of GPU (relative to CPU), even more specialization in the case of NPU relative to GPU.

One version of this HAS, in a sense, happened, but not what you want...
What is now the ANE began life as part of the Apple ISP. It made sense to split off the part of the ISP that was handling convolution and beginning to perform more and more tasks built on that convolution (face recognition and suchlike) so that it was generally available to other code. But that's, in a sense, going backwards from what you want – MORE specialized hardware (an ISP specialized for the camera plus an ANE specialized for neural nets) rather than an undifferentiated sea of throughput compute.

Even AMX is kinda a version of this - split off what HPC wants (long runs of FMACs) from what general purpose computing wants and what is provided by NEON (flexible data rearrangement in short vector registers).
Qualcomm's NPU also had a similar origin story. In their case, it was born from their Hexagon DSP (digital signal processor).
 

roger_k

Member
Sep 23, 2021
102
215
86
I believe in my example I said "if I had a CPU with a 100 watt TDP" that I'd be in favor of it being able to draw 100 watts in a single core if that were possible and still contributing to faster speeds.

That exact same CPU when in a laptop with a 25 watt TDP would be limited to drawing 25 watts.

As I said, I don’t find it reasonable that a CPU draws the same amount of power reading a spreadsheet as it does running a demanding multicore compute job. This is not good user experience.
 
Reactions: Eug

Doug S

Platinum Member
Feb 8, 2020
2,427
3,923
136
I don't like that. It would have to be done by jacking up the frequency of the core. As frequency rises, power consumption climbs much faster than linearly (the voltage has to come up with it) and performance-per-watt drops like a rock.

Wasteful.


Why? No one would force you to do it. Do you disable turbo mode on your Intel or AMD PCs? That's taking CPUs into less efficient territory too!

Where's the line? If you're serious about saving power, disable all your P cores and run on E cores alone. On a Mac/iPhone the P core uses ~10x the power for ~3x the performance. Sounds like a bad deal to me! In fact you should probably want to run your E cores at less than max frequency, because they have a power/frequency curve of their own, and are even more efficient running at half their max frequency!
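To put toy numbers on that trade-off (using the ~10x power / ~3x performance ratios above; these are illustrative ratios, not measurements of any specific core):

```python
# Energy per task = power / throughput, so the ratios above translate directly.
# The wattages are hypothetical placeholders chosen to match the ~10x / ~3x ratios.

p_core = {"power_w": 5.0, "perf": 3.0}   # ~10x the power, ~3x the throughput
e_core = {"power_w": 0.5, "perf": 1.0}   # baseline efficiency core

for name, core in (("P-core", p_core), ("E-core", e_core)):
    energy_per_task = core["power_w"] / core["perf"]   # lower is better
    print(f"{name}: {energy_per_task:.2f} J per unit of work")

# The P core finishes each task ~3x sooner but burns ~3.3x the energy doing it --
# yet nobody actually disables their P cores to chase that efficiency.
```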
 

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
Running LLMs locally on a MacBook Air:


One thing I found interesting is that it seems Apple Silicon primarily uses the GPU, and not the Neural Engine for this stuff.
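For what it's worth, that matches how the software stacks are wired up: llama.cpp/MLX-style runners go straight to the GPU via Metal, while the Neural Engine is only reachable through Core ML, where you opt in per model. A minimal coremltools sketch (the model file name is a hypothetical placeholder, and whether layers actually land on the ANE still depends on the ops in the model):

```python
import coremltools as ct

# Load an already-converted Core ML model and ask for the Neural Engine.
# "my_llm.mlpackage" is a hypothetical placeholder for any converted model.
model_ane = ct.models.MLModel(
    "my_llm.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,   # CPU + Apple Neural Engine
)

# The same model pinned to CPU + GPU, which is roughly what Metal-based
# LLM runners use (they bypass Core ML and the ANE entirely).
model_gpu = ct.models.MLModel(
    "my_llm.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
)

# Unsupported ops fall back to GPU/CPU silently, so requesting the ANE
# is a hint, not a guarantee.
```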
 

roger_k

Member
Sep 23, 2021
102
215
86
According to Wikipedia, Apple M4 is ARMv9.4



Well, whoever added it to Wikipedia is wrong. Quite a lot of the technical detail on Apple Silicon there is wrong. And the funny thing is that it is impossible to correct this information because no definitive authority exists. Basically, whoever edits the article first gets to invent whatever BS they want and there is no way to get it fixed.
 
Reactions: FlameTail
Jul 27, 2020
17,491
11,280
106
Basically, whoever edits the article first gets to invent whatever BS they want and there is no way to get it fixed.
You think Apple can't do anything about it if they really really wanted to?

This just lets them brush off Wikipedia as a source of "crowd-sourced" information and point everyone to their developer portal, which requires a login and, funnily enough, still has no architectural or ISA details about the M4.

Apple is so paranoid and secretive it would make a nun blush.
 

Eug

Lifer
Mar 11, 2000
23,725
1,263
126
Notebookcheck’s review:


Geekbench 6.2 (no SME)

Power consumption

3DMark Wild Life Stress Test (frame rate during repeated benchmarks)
 

poke01

Golden Member
Mar 8, 2022
1,212
1,394
106
Notebookcheck’s review:


Geekbench 6.2 (no SME)

Power consumption

3DMark Wild Life Stress Test (frame rate during repeated benchmarks) - Max limited to 60 fps on iPad Pro
This shows that Apple needs a big IPC jump with the next M chip. They can't raise frequency forever if they want to keep these chips in fanless devices.
 

Mahboi

Senior member
Apr 4, 2024
741
1,313
96
This shows that Apple needs a big IPC jump with the next M chip. They can't raise frequency forever if they want to keep these chips in fanless devices.
Which is going to be a very interesting squeeze. They can't get any more IPC since all the IPC makers apparently left for Nuvia.
We're going to see a very wild switcheroo where QC will provide M1-like chips, for cheaper, and go to M2/3/4 pretty quickly without raising area or frequency too much.
While Apple will have to eat through their margins for area or admit that fans are necessary again because freq has gone up too much.


Mmmmmh.
 
Reactions: igor_kavinski

FlameTail

Platinum Member
Dec 15, 2021
2,922
1,655
106
Which is going to be a very interesting squeeze. They can't get any more IPC since all the IPC makers apparently left for Nuvia.
We're going to see a very wild switcheroo where QC will provide M1-like chips, for cheaper, and go to M2/3/4 pretty quickly without raising area or frequency too much.
While Apple will have to eat through their margins for area or admit that fans are necessary again because freq has gone up too much.


Mmmmmh.
That Kepler tweet is garbage though. ARM isn't the problem.

Edit: The reason is that switching to ARM wasn't the problem; it's that they lost their engineers. Even if they had switched to RISC-V instead of ARM, they would still have the same issue.

Apple making their own processors is great for them. They have always wanted to do this. Tight vertical integration is their signature. I don't think Apple is going to have any regrets for the foreseeable future (~5 years), even if Intel/AMD surpass them and make better processors.
 
Jul 27, 2020
17,491
11,280
106
That Kepler tweet is garbage though. ARM isn't the problem.
I'm almost tempted to tag Kepler, but if you want to be educated, I suggest you tag him before calling his analysis garbage. Not doing that makes you sound like you don't want a confrontation with him. Or, if you're going to say stuff like that, at least post your own justification for why you think it's garbage.
 

Eug

Lifer
Mar 11, 2000
23,725
1,263
126
I think Notebookcheck tested power consumption only on the 10-core variant. I would have liked to see the power consumption numbers on the 9-core variant, since that would make for an interesting comparison, and that is the one I will buy.

Also, they claimed it charges at only 20 Watts max. Maybe that’s true for the 11” but if so, I’d be surprised since the 13” can charge at up to 39-40 Watts.

EDIT:

I see elsewhere that the 11” can charge at 36+ Watts. However, both of them will charge at 20 Watts when nearing full charge.
 

Nothingness

Platinum Member
Jul 3, 2013
2,662
1,213
136
Well, whoever added it to Wikipedia is wrong. Quite a lot of the technical detail on Apple Silicon there is wrong. And the funny thing is that it is impossible to correct this information because no definitive authority exists. Basically, whoever edits the article first gets to invent whatever BS they want and there is no way to get it fixed.
The "sources" are:



You can count one point less in their credibility score
 