Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).
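
As a rough sanity check on the M1 GPU figures listed above, the ratios can be worked out in a few lines of Python. The 8 FP32 lanes per execution unit and the 2 FLOPs per lane per clock (one fused multiply-add) are assumptions that are not in the spec list, so the implied clock is only a back-of-envelope estimate.

```python
# Figures taken from the M1 GPU spec list above (8-core variant).
gpu_cores = 8
execution_units = 128
max_threads = 24576
teraflops = 2.6
gigatexels_per_s = 82
gigapixels_per_s = 41

# Assumptions (not from the spec list): 8 FP32 lanes per EU,
# and one fused multiply-add per lane per clock counted as 2 FLOPs.
LANES_PER_EU = 8
FLOPS_PER_LANE_PER_CLOCK = 2

eus_per_core = execution_units // gpu_cores
threads_per_eu = max_threads // execution_units
texels_per_pixel = gigatexels_per_s / gigapixels_per_s

flops_per_clock = execution_units * LANES_PER_EU * FLOPS_PER_LANE_PER_CLOCK
implied_clock_ghz = teraflops * 1e12 / flops_per_clock / 1e9

print(f"EUs per GPU core:      {eus_per_core}")
print(f"Threads per EU:        {threads_per_eu}")
print(f"Texel:pixel rate:      {texels_per_pixel:.1f}")
print(f"Implied GPU clock:     {implied_clock_ghz:.2f} GHz (under the assumptions above)")
```

If the lane count or FLOP accounting is different from what is assumed here, the implied clock changes proportionally; the per-core and per-EU ratios do not depend on those assumptions.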

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, ProRes
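
The 100 GB/s figure in the M2 list is consistent with a simple peak-bandwidth calculation. This sketch assumes LPDDR5 at 6400 MT/s on a 128-bit bus, and LPDDR4X at 4266 MT/s on the same bus width for the M1 comparison; none of those numbers appear in the spec lists above, so treat them as assumptions.

```python
def peak_bandwidth_gb_per_s(transfer_rate_mt_per_s, bus_width_bits):
    """Theoretical peak: transfers per second times bytes moved per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_per_s * 1e6 * bytes_per_transfer / 1e9

# Assumed memory configurations (not stated in the post).
m2_bandwidth = peak_bandwidth_gb_per_s(6400, 128)   # LPDDR5-6400, 128-bit
m1_bandwidth = peak_bandwidth_gb_per_s(4266, 128)   # LPDDR4X-4266, 128-bit

print(f"M2 peak: {m2_bandwidth:.1f} GB/s")
print(f"M1 peak: {m1_bandwidth:.1f} GB/s")
print(f"Uplift:  {(m2_bandwidth / m1_bandwidth - 1) * 100:.0f}%")
```

These are theoretical ceilings; sustained bandwidth seen by the CPU or GPU will be lower.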

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

Eug · Apr 14, 2022

So basically, ported software that leverages the GPU often isn't very well optimized for Apple Silicon.

*Shakes fist at Kyro*

I appreciate the technical explanation, but the general idea that software optimization won't be ideal for gen 1 Apple Silicon, esp. its GPU, shouldn't come as a surprise to anyone, even non-coders like me.

On another note:

I know that Apple had signed a new licensing agreement with PowerVR/Imagination, after that very public split. However, I still don't understand what that was all about. Did they have a stall in their negotiations and Apple just took the nuclear option as a negotiation ploy? Or did PowerVR better educate Apple about their inability to move forward without PowerVR IP, forcing Apple to sign? IOW, do you guys and gals know if Apple's custom GPU uses Imagination IP?

repoman27 · Apr 14, 2022

It looks like I was wrong (and Hector Martin was right) about the transport between the NAND modules and SoC on the M1 Macs.

Each NAND package contains 4 to 16 NAND dies plus one S5E MSP die. The NAND dies are connected to the S5E via two 8-bit Toggle DDR 3.0 busses operating in the 533 to 800 MT/s range. The S5E in each NAND package is in turn connected to the ANS3 NVMe controller in the M1 SoC via a PCIe Gen4 x1 link. The ANS3 is hooked directly into the system fabric with access to unified memory.

The SSDs in the M1 Macs all use two NAND packages. The 1TB AP1024Q and 2TB AP2048Q models each have 32 dual-plane NAND dies connected via 4 Toggle DDR 3.0 channels to 2 memory signal processors with 2 PCIe Gen4 x1 links connecting them to the NVMe storage controller integrated into the SoC.

For the SSDs in the M1 Pro/Max/Ultra Macs, 4 to 8 NAND packages are used. The 4TB AP4096R and 8TB AP8192R have 8 NAND packages containing 128 dual-plane NAND dies connected via 16 Toggle DDR 3.0 channels to 8 memory signal processors with 8 PCIe Gen4 x1 links connecting them to the NVMe storage controller integrated into the SoC.
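
To put rough numbers on the topology described above, here is a minimal sketch that adds up the raw link rates on each side of the memory signal processors. It assumes the standard PCIe Gen4 signalling rate of 16 GT/s per lane with 128b/130b encoding, and it ignores all protocol overhead, so these are ceilings rather than predictions of real SSD throughput. The function and dictionary key names are made up for illustration.

```python
PCIE_GEN4_GT_PER_S = 16.0        # per-lane signalling rate (assumed standard Gen4)
PCIE_ENCODING = 128 / 130        # 128b/130b line encoding
TOGGLE_BUS_BYTES = 1             # 8-bit Toggle DDR bus, as described in the post
TOGGLE_MT_PER_S = (533, 800)     # operating range quoted in the post

def pcie_gen4_x1_gb_per_s():
    """Raw payload rate of one Gen4 lane in GB/s."""
    return PCIE_GEN4_GT_PER_S * PCIE_ENCODING / 8

def toggle_channel_gb_per_s(mt_per_s):
    """Raw rate of one 8-bit Toggle DDR channel in GB/s."""
    return mt_per_s * TOGGLE_BUS_BYTES / 1000

def summarize(name, packages, dies, channels, links):
    low, high = TOGGLE_MT_PER_S
    return {
        "name": name,
        "dies_per_package": dies // packages,
        "nand_low_gb_per_s": channels * toggle_channel_gb_per_s(low),
        "nand_high_gb_per_s": channels * toggle_channel_gb_per_s(high),
        "pcie_gb_per_s": links * pcie_gen4_x1_gb_per_s(),
    }

# Configurations as described in the post.
m1 = summarize("M1 1TB/2TB", packages=2, dies=32, channels=4, links=2)
m1_pro_max = summarize("M1 Pro/Max/Ultra 4TB/8TB", packages=8, dies=128, channels=16, links=8)

for row in (m1, m1_pro_max):
    print(f"{row['name']}: {row['dies_per_package']} dies/package, "
          f"NAND side {row['nand_low_gb_per_s']:.1f}-{row['nand_high_gb_per_s']:.1f} GB/s, "
          f"PCIe side {row['pcie_gb_per_s']:.1f} GB/s")
```

Under these assumptions the Toggle DDR side, not the PCIe side, appears to set the raw ceiling in both configurations, even at 800 MT/s.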

When I originally looked at the M1 die layouts, I assumed that the PCIe interfaces were Gen4 2x2 + 1x1 and only supported up to 3 ports. I similarly assumed the M1 Pro and Max interfaces were Gen4 3x4 and were also limited to 3 ports. Although I got the lane counts right, it would appear that the M1 actually has 5 independent ports, and the M1 Pro and Max have 12. Two ports are dedicated to the ANS3 NVMe controller in the M1, and 8 ports for the M1 Pro/Max/Ultra. These lanes don't show up at all using the ioreg command I normally issue to view the PCIe device tree because they are exclusive to the integrated NVMe controller. This also explains why Apple only used x1 links for the PCIe devices on the logic board. I thought perhaps this was done just to facilitate signal routing, but wondered why they included so many PCIe lanes in the design if they weren't planning on utilizing them. Turns out they were actually using 50-100% of them across the line-up.

Doug S · Apr 14, 2022

Has anyone seen any benchmarks of Apple's "SSD"? Preferably something testing raw performance, not performance through a filesystem. I checked Storage Review since they used to do the occasional Mac-related benchmark, but they haven't done one for years.

I'm curious to see how well that design translates into actual performance.

repoman27 · Apr 14, 2022

Saylick said:
What do you guys make of this?

Max Tech could only get their M1 Ultra to hit 58 °C during their testing, and their conclusion is that, "Something has gone terribly wrong in terms of chip perf."

On the other hand, Max Tech had no trouble finding workloads that would cause Intel Macs to hit 100 °C and throttle, apparently concluding that this was normal, expected, and probably the best way to engineer a system in order to achieve maximum performance. 🤔

Also, thinking that Apple would choose to implement a hardware change that might cost them money in order to give developers an easy-out, rather than just leaving third parties to do a better job of optimizing their code... Doesn't sound like the Apple I know.

Eug · Apr 14, 2022

repoman27 said:
Also, thinking that Apple would choose to implement a hardware change that might cost them money in order to give developers an easy-out, rather than just leaving third parties to do a better job of optimizing their code... Doesn't sound like the Apple I know.

Didn’t they do just that to make Rosetta 2 translation faster?

EDIT:

Yes they did:

https://twitter.com/x/status/1331736203402547201

repoman27 · Apr 14, 2022

Eug said:
Didn’t they do just that to make Rosetta 2 translation faster?

EDIT:

Yes they did:

https://twitter.com/x/status/1331736203402547201

Rosetta was an absolutely essential technology to enable the Apple Silicon transition, and Apple will still drop it as soon as they possibly can. This TLB issue is entirely different.

edit: From an end user perspective, this pretty much only affects M1 Ultra Mac Studio owners who want to run crappy ports of PC games—a tiny audience that Apple has shown zero interest in catering to in the past.

Eug · Apr 14, 2022

repoman27 said:
From an end user perspective, this pretty much only affects M1 Ultra Mac Studio owners who want to run crappy ports of PC games—a tiny audience that Apple has shown zero interest in catering to in the past.

Isn't this a significant issue for stuff like Blender?

Ajay · Apr 14, 2022

Eug said:
Isn't this a significant issue for stuff like Blender?

? Blender has M1 binaries - now with Metal support (though still needing performance tuning for Apple iGPUs).

Eug · Apr 14, 2022

Ajay said:
? Blender has M1 binaries - now with Metal support (though still needing performance tuning for Apple iGPUs).

That's what I mean. It's been ported to be compatible with Metal, but not optimized.

IIRC, performance is not good.

repoman27 · Apr 14, 2022

Ajay said:
? Blender has M1 binaries - now with Metal support (though still needing performance tuning for Apple iGPUs).

And Apple contributed the Metal GPU backend for Cycles. They also claim to have significant performance improvements in the works.

Ajay · Apr 14, 2022

Eug said:
That's what I mean. It's been ported to be compatible with Metal, but not optimized.

IIRC, performance is not good.

I think it will take more time for a lot of Apple software ISVs to get their software fully tuned for M1 Macs. There is now a considerable divergence from the former Intel Macs with Intel iGPUs and AMD graphics. That, and per @Doug S's comments on Apple being on v1.0 of M series silicon, it'll be a bit before all of this gets worked out for optimal performance. In the PPC->Intel transition, Intel's CPUs and x86 support structure were already well established.

Eug · Apr 14, 2022

Ajay said:
I think it will take more time for a lot of Apple software ISVs to get their software fully tuned for M1 Macs. There is now a considerable divergence from the former Intel Macs with Intel iGPUs and AMD graphics. That, and per @Doug S's comments on Apple being on v1.0 of M series silicon, it'll be a bit before all of this gets worked out for optimal performance. In the PPC->Intel transition, Intel's CPUs and x86 support structure were already well established.

My point was just that in the case of Rosetta 2, Apple actually made a change in its hardware design specifically to speed Rosetta 2 up, instead of relying on brute force performance or else AS ports and software optimization later on. I was expecting Rosetta 2 performance of about 30-50% of native, but we got 50-70% of native, because of Apple's hardware/OS design change that AFAIK people here did not predict. Correct me if I'm wrong, but I gather this design change was implemented strictly for this purpose.

I do acknowledge @repoman27's point, though, that Rosetta 2 will be deprecated at some point in macOS.

Doug S · Apr 14, 2022

Eug said:
My point was just that in the case of Rosetta 2, Apple actually made a change in its hardware design specifically to speed Rosetta 2 up, instead of relying on brute force performance or else AS ports and software optimization later on. I was expecting Rosetta 2 performance of about 30-50% of native, but we got 50-70% of native, because of Apple's hardware/OS design change that AFAIK people here did not predict. Correct me if I'm wrong, but I gather this design change was implemented strictly for this purpose.

I do acknowledge @repoman27's point, though, that Rosetta 2 will be deprecated at some point in macOS.

The main reason Rosetta 2 is much faster than other translation schemes is that it does static translation on the binary, rather than JIT translation. The memory ordering thing was to help correctness, not performance. Rosetta 2 could have worked without it, but that would have made its job harder, and little corner cases of data corruption would likely have occurred.

If Intel's memory ordering made a real difference for performance, either Intel would have added a mode using looser memory ordering long ago (for example, with different load instructions), or AMD would have made the switch for x86-64 mode. The main impact of Intel's memory ordering is that it complicates the design of the load/store unit.
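
For anyone wondering what kind of corner case is meant here, the classic illustration is the message-passing litmus test: one thread writes data and then sets a flag, the other reads the flag and then the data. The toy enumerator below is a simplified sketch, not a model of any real CPU. It treats "x86-style" ordering as plain program order (which is enough for this particular test, since it has no store followed by a load in the same thread) and "weak" ordering as free reordering of the two independent accesses within each thread.

```python
from itertools import permutations

# Thread 0 publishes data, then sets a flag.
T0 = [("store", "data"), ("store", "flag")]
# Thread 1 reads the flag, then the data.
T1 = [("load", "flag"), ("load", "data")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def outcomes(allow_reordering):
    """Return the set of (flag_seen, data_seen) pairs thread 1 can observe."""
    orders0 = list(permutations(T0)) if allow_reordering else [tuple(T0)]
    orders1 = list(permutations(T1)) if allow_reordering else [tuple(T1)]
    seen = set()
    for o0 in orders0:
        for o1 in orders1:
            for run in interleavings(list(o0), list(o1)):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, var in run:
                    if kind == "store":
                        mem[var] = 1
                    else:
                        regs[var] = mem[var]
                seen.add((regs["flag"], regs["data"]))
    return seen

tso_like = outcomes(allow_reordering=False)
weak = outcomes(allow_reordering=True)

print("program order kept:", sorted(tso_like))
print("reordering allowed:", sorted(weak))
print("only possible with reordering:", sorted(weak - tso_like))
```

The extra outcome under reordering is thread 1 seeing the flag set but reading stale data. x86 code can be written assuming that never happens, so a translator targeting a weakly ordered core either inserts barriers or relies on the hardware to provide the stronger ordering.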

Eug · Apr 14, 2022

Doug S said:
The main reason Rosetta 2 is much faster than other translation schemes is that it does static translation on the binary, rather than JIT translation. The memory ordering thing was to help correctness, not performance. Rosetta 2 could have worked without it, but that would have made its job harder, and little corner cases of data corruption would likely have occurred.

Yes, static translation on the binary is a very important aspect of course.

However, regarding the memory ordering change, IIRC much of the previous discussion in this thread 1.5 years ago was about how it would significantly improve performance.

Eug · Apr 14, 2022

M2 Max 12-core / 38-core mentioned:

https://www.bloomberg.com/news/arti...everal-new-macs-with-next-generation-m2-chips

Apple Inc. has started widespread internal testing of several new Mac models with next-generation M2 chips, according to developer logs, part of its push to make more powerful computers using homegrown processors.

The new machines being tested include:

A MacBook Air with an M2 chip, codenamed J413. This Mac will have eight CPU cores, the components that handle the main processing, and 10 cores for graphics. That’s up from eight graphics cores in the current MacBook Air.

A Mac mini with an M2 chip, codenamed J473. This machine will have the same specifications as the MacBook Air. There’s also an “M2 Pro” variation, codenamed J474, in testing.

An entry-level MacBook Pro with an M2 chip, codenamed J493. This too will have the same specifications as the MacBook Air.

A 14-inch MacBook Pro with M2 Pro and “M2 Max” chips, codenamed J414. The M2 Max chip has 12 CPU cores and 38 graphics cores, up from 10 CPU cores and 32 graphics cores in the current model, according to the logs. It will also have 64 gigabytes of memory.

A 16-inch MacBook Pro with M2 Pro and M2 Max chips, codenamed J416. The 16-inch MacBook Pro’s M2 Max will have the same specifications as the 14-inch MacBook Pro version.

A Mac Pro, codenamed J180. This machine will include a successor to the M1 Ultra chip used in the Mac Studio computer.

Apple is also testing a Mac mini with an M1 Pro chip, the same processor used in the entry-level 14-inch and 16-inch MacBook Pros today. That machine is codenamed J374. The company has tested an M1 Max version of the Mac mini as well, but the new Mac Studio may make these machines redundant.

Roland00Address · Apr 14, 2022

Bring it! Not sure if I want it, or if I want M1 Generation products to go cheaper. Of course the answer is both =D

Eug · Apr 14, 2022

Roland00Address said:
Bring it! Not sure if I want it, or if I want M1 Generation products to go cheaper. Of course the answer is both =D

M1 generation products are already cheaper. There have been several sales on MacBook Airs, MacBook Pros, iMacs, and Mac minis. And they're on the refurb store, too.

M2 is coming...

Roland00Address · Apr 14, 2022

Eug said:
M1 generation products are already cheaper. There have been several sales on MacBook Airs, MacBook Pros, iMacs, and Mac minis. And they're on the refurb store, too.

M2 is coming...

Agreed, but also you ...

underestimate the power of the ~~dark side~~ how cheap I am =P

Doug S · Apr 15, 2022

If this stuff has only just started to appear in developer logs, it looks good for M2 being made with A16 cores, though that doesn't guarantee it. Presumably it would be made on N4, which entered risk production late last year but, as a tweak of N5, should have a shorter cycle from risk production to mass production than a node jump would.

Eug · Apr 16, 2022

The Chips That Rebooted the Mac

Apple’s risky, yearslong effort to design its own silicon paid off when supply-chain disruptions left competitors scrambling.

www.wsj.com

The Wall Street Journal article above is paywalled but in there is an analyst's estimate that by revenue, Apple's chip business makes it the 12th largest chip company in the world.

EDIT:

This article says 11th at $15 billion, just behind AMD and ahead of Infineon.

Counterpoint

www.counterpointresearch.com

oak8292 · Apr 16, 2022

Eug said:
This article says 11th at $15 billion, just behind AMD and ahead of Infineon.

Counterpoint

www.counterpointresearch.com

What is meant by revenue? The revenue for Apple is based on wafer and packaging purchases by Apple from TSMC at about 25% of the $60 billion in TSMC revenue. Apple buys a lot more in both wafers and packaging than AMD or Qualcomm. If you give Apple an internal transfer margin of 45-50% similar to other fabless companies as if Apple Semi sold to Apple Consumer Products then semi revenue would be similar to Qualcomm or Micron.

What accounting does Samsung do for revenue on their internal transfers for Exynos processors in Samsung phones? Is semi revenue based on wafer sales or is there a margin similar to what fabless companies like AMD, Nvidia or Qualcomm would add for design and IP?

Intel had a 60+% margin because they were getting both the foundry margin and the design/IP margin. The fall closer to 50% is a bad sign which indicates either the design/IP has lost value or they are not getting a foundry margin to pay for Capex.

Doug S · Apr 17, 2022

oak8292 said:
What is meant by revenue? The revenue for Apple is based on wafer and packaging purchases by Apple from TSMC at about 25% of the $60 billion in TSMC revenue. Apple buys a lot more in both wafers and packaging than AMD or Qualcomm. If you give Apple an internal transfer margin of 45-50% similar to other fabless companies as if Apple Semi sold to Apple Consumer Products then semi revenue would be similar to Qualcomm or Micron.

What accounting does Samsung do for revenue on their internal transfers for Exynos processors in Samsung phones? Is semi revenue based on wafer sales or is there a margin similar to what fabless companies like AMD, Nvidia or Qualcomm would add for design and IP?

Intel had a 60+% margin because they were getting both the foundry margin and the design/IP margin. The fall closer to 50% is a bad sign which indicates either the design/IP has lost value or they are not getting a foundry margin to pay for Capex.

Yep I was going to say the same thing. Assigning a "revenue" figure based on TSMC's revenue would be like assigning Intel's chip "revenue" based on how much it cost them to fab the chips, ignoring costs like its teams of architects and R&D shared between the chip side and fab side, then assigning a gross margin above that, with the lion's share of the profits going to their sales side.

Also curious how they calculated Qualcomm's revenue. They reported $33.5 billion in total revenue for 2021, the list credits them with $29.2 billion in "chip" revenue. I googled for a couple minutes looking for a breakdown and the only one I could find is $26.7 billion in "equipment and services" and $6.8 billion in "licensing". So obviously they are including patent licensing revenue as part of Qualcomm's "chip" revenue (and since Qualcomm double dips there, it is twice as nice)

If you based every fabless company on what they paid to have their chips made, like Apple's estimate, they would obviously all be below Apple because no one pays a foundry more to have their chips made than Apple does. So they should be no lower than 5th on that list - and probably would be 4th since even a fairly conservative (by Intel standards) 50% gross margin would put them above Micron's $30 billion.
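
The arithmetic in this exchange can be laid out in a short sketch. All inputs are the figures quoted in the posts above (25% of roughly $60 billion in TSMC revenue, a 45-50% transfer margin, and the Qualcomm 2021 breakdown); the helper name is made up for illustration, and the result is only as good as those estimates.

```python
def implied_revenue(cost_billion, gross_margin):
    """Revenue that would yield the given gross margin on the given cost."""
    return cost_billion / (1 - gross_margin)

# Apple: foundry spend estimated as a share of TSMC revenue.
tsmc_revenue = 60.0
apple_share = 0.25
apple_foundry_spend = tsmc_revenue * apple_share

apple_at_45 = implied_revenue(apple_foundry_spend, 0.45)
apple_at_50 = implied_revenue(apple_foundry_spend, 0.50)

# Qualcomm: reported 2021 breakdown versus the figure on the list.
qcom_equipment_services = 26.7
qcom_licensing = 6.8
qcom_total = qcom_equipment_services + qcom_licensing
qcom_on_list = 29.2
qcom_list_excess = qcom_on_list - qcom_equipment_services

print(f"Apple foundry spend:          ${apple_foundry_spend:.1f}B")
print(f"Implied revenue at 45% margin: ${apple_at_45:.1f}B")
print(f"Implied revenue at 50% margin: ${apple_at_50:.1f}B")
print(f"Qualcomm total:               ${qcom_total:.1f}B")
print(f"List figure above equipment/services: ${qcom_list_excess:.1f}B")
```

So the $15 billion figure is a cost-style number, and applying a fabless-style margin roughly doubles it, which is what moves Apple up the list.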

Thala · Apr 19, 2022

Eug said:
My point was just that in the case of Rosetta 2, Apple actually made a change in its hardware design specifically to speed Rosetta 2 up, instead of relying on brute force performance or else AS ports and software optimization later on. I was expecting Rosetta 2 performance of about 30-50% of native, but we got 50-70% of native, because of Apple's hardware/OS design change that AFAIK people here did not predict. Correct me if I'm wrong, but I gather this design change was implemented strictly for this purpose.

I do acknowledge @repoman27's point, though, that Rosetta 2 will be deprecated at some point in macOS.

Excuse my ignorance, but we are getting 50-70% of native speed without any hardware/OS design changes, for instance with standard ARM cores in WoA using JIT. Of course, chances are that Microsoft's emulation technology is inherently more advanced.

Heartbreaker · Apr 28, 2022

Interesting testing/benchmark for Computational Fluid Dynamics. Apparently this workload is very hard on the memory subsystem, which leads to falloff as cores are added, but the more robust memory of the Mac Studio holds up much better:

2022 Mac Studio (20-core M1 Ultra) Review
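
A toy model of why a bandwidth-hungry workload stops scaling as cores are added: each core wants a fixed amount of memory bandwidth, and once the total demand exceeds what the memory system delivers, extra cores add nothing. The numbers below are invented for illustration (20 GB/s per core, a 100 GB/s system and an 800 GB/s system, the latter picked to resemble the headline figure Apple quotes for the M1 Ultra); they are not taken from the linked review.

```python
def relative_throughput(cores, per_core_gb_per_s, bandwidth_gb_per_s):
    """Throughput in units of one unconstrained core, capped by memory bandwidth."""
    demand = cores * per_core_gb_per_s
    return min(demand, bandwidth_gb_per_s) / per_core_gb_per_s

# Hypothetical parameters, chosen only to show the shape of the curve.
PER_CORE_DEMAND = 20.0
NARROW_SYSTEM = 100.0
WIDE_SYSTEM = 800.0

for cores in (1, 2, 4, 8, 16):
    narrow = relative_throughput(cores, PER_CORE_DEMAND, NARROW_SYSTEM)
    wide = relative_throughput(cores, PER_CORE_DEMAND, WIDE_SYSTEM)
    print(f"{cores:2d} cores: narrow {narrow:4.1f}x, wide {wide:4.1f}x")
```

Real scaling curves bend more gradually than this hard cap because of caches, latency, and interconnect effects, but the plateau is the same basic mechanism.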

Roland00Address · Apr 28, 2022

I lost track of the multiple replies where people were speculating about whether Mac sales are going to go up as a share of the overall computer market. This post does not answer that, but Apple reported its Q2 numbers just a few hours ago, and this happened:

Mac revenue: $10.44 billion vs. $9.25 billion estimated, up 14.73% year-over-year

Aka in a non-holiday quarter we are now talking $10 billion in sales (not profits) for the Macs.

Aka look at the long-term trend of Mac revenue (the graph below does not include the most recent numbers; it is 3 months old).

Aka prior to the pandemic and prior to the M1 switch, total sales were $5 billion to $7.5 billion per quarter, with Q1 (the holiday quarter) in the low $7 billion range. Now, after the pandemic and after the M1 switchover, we are seeing higher sales: $10.85 billion in revenue 3 months ago and now $10.44 billion, when last year we were in the $8 billion to low $9 billion range all year, and even lower than that, in the $7 billion to $8 billion range, in 2020.

Something changed with sales even if I can't prove it is the M1.

Edit: we have M1 products in the last 6 quarters of the below image, and the most recent numbers make 7, but not every type of product in the line has had an M1 equivalent until now (still missing a cheese grater, though the Mac Studio is close for many of the use cases).
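
For reference, the figures quoted above can be turned around to get the implied year-ago quarter and the size of the beat. This is just arithmetic on the numbers in this post, using the rounded $10.44 billion and 14.73% values as given.

```python
reported = 10.44        # Mac revenue this quarter, $ billion
estimate = 9.25         # analyst estimate, $ billion
yoy_growth = 0.1473     # year-over-year growth as quoted
previous_quarter = 10.85

implied_year_ago = reported / (1 + yoy_growth)
beat_percent = (reported / estimate - 1) * 100
sequential_percent = (reported / previous_quarter - 1) * 100

print(f"Implied year-ago quarter: ${implied_year_ago:.2f}B")
print(f"Beat versus estimate:     {beat_percent:.1f}%")
print(f"Change from last quarter: {sequential_percent:.1f}%")
```

The implied year-ago figure lands in the low $9 billion range, which lines up with the "8s to low 9s" description of last year.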
