Discussion Apple Silicon SoC thread

Page 396

Eug

Lifer
Mar 11, 2000
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

name99

Senior member
Sep 11, 2010
Intel and Apple are not exactly the same type of company. Intel is a processor company that tries to ship as many processors as they can at the highest margin possible. They also happen to make GPUs and a shrinking number of other things. Apple is a technology lifestyle company that makes a broad range of technical devices and provides a platform for software distribution as well as their own software products. As part of what they deliver, they also happen to make their own in-house silicon.

Intel is under much greater pressure to maximize the ASP of each piece of silicon that they sell. For that goal, they have to aggressively bin each item, which results in the massive number of SKUs that they have historically sold.
Aggressive binning and lots of SKUs sounds like a great plan IF there isn't a cost to these behaviors. Problem is, there is. The most obvious (though not the only) such cost is how many Intel initiatives have been sunk over the past few years because they were not supported by developers, because they were not common enough across Intel chips.

AVX-512 is the obvious case. The various off-CPU accelerators that are limited to (some) Xeons remain up-in-the-air with it being unclear how valuable they actually are given how much they are (or are not) supported. And of course the whole Optane as DRAM extension thing died because Intel insisted on locking it to a few high-end Xeons.

I think the sort of behavior you describe provides a local optimum (local in time and space) for the product manager of the product in question.
But it usually doesn't provide a global optimum for *the company as a whole*...
 

LightningZ71

Platinum Member
Mar 10, 2017
Oh, I don't disagree that the practice has its issues. And Intel certainly took things a bit too far with Optane and the Xeon accelerators, though I do think there is likely an IP licensing issue at play there that would make the accelerators too costly to enable on every Xeon in that family. I'm just pointing out that, for Intel, at that time, it was important. They do seem to have fewer SKUs for each processor family on the consumer side lately, and those SKUs TEND to be less similar to one another as well.
 

Doug S

Diamond Member
Feb 8, 2020
Aggressive binning and lots of SKUs sounds like a great plan IF there isn't a cost to these behaviors. Problem is, there is. The most obvious (though not the only) such cost is how many Intel initiatives have been sunk over the past few years because they were not supported by developers, because they were not common enough across Intel chips.

AVX-512 is the obvious case. The various off-CPU accelerators that are limited to (some) Xeons remain up-in-the-air with it being unclear how valuable they actually are given how much they are (or are not) supported. And of course the whole Optane as DRAM extension thing died because Intel insisted on locking it to a few high-end Xeons.

Intel was fine when they mainly had SKUs based on speed & power, and later on number of cores. Oh, there were times like the 486SX/486DX split where an important feature like the FPU was cut out of a cheaper SKU, but they'd always rectify it in the next iteration (i.e. all Pentiums having an FPU, and after the Pentium/Pentium MMX split, all PPro and derived cores having MMX).

When they started binning on feature-flag-level stuff (not only AVX-512 vs. no AVX-512, but different levels of AVX-512 instruction support, different levels of virtualization support, and so forth), that's where it all went off track. They'd announce these new things, but at the time of announcement no one knew what the product lines would look like, the price points they would sell at, the markets they would sell into, etc., so how were developers supposed to guess the size of their potential addressable market? Most importantly, the old expectation that "everyone will get this feature in the next generation" was out the window. Sometimes the next generation would REMOVE the feature (e.g. SGX). So it was utterly predictable that developer support for newly introduced features would trend lower and lower outside the enterprise server market, where the old "next generation gets the stuff that's optional this generation" has remained mostly intact.

I suppose Apple has it easier since they don't have an enterprise market and don't really take the needs of business in general much into account. It is in the enterprise/business market where Intel has been playing all these games, hoping to extract the maximum price for SKUs going into non-consumer markets from HPC to POS, and the consumer market is the worse for it. With Apple you get the same ISA feature set at the low end in the MacBook Air and Mac mini that you do at the high end in the Mac Pro. You just get "more" of everything at the higher end.
 
Jul 27, 2020
My USB flash drive is 1 TB and it has a lot of stuff on it. It was taking quite some time to mount in macOS, because (I assume) the whole drive is scanned before mounting? That was pretty annoying. I found out by accident that plugging in the drive before logging in skips the scan, and the drive is already mounted when I log in.

Locked the Mac screen for the first time today (I usually don't use the lock much, so the need never arose). The moving lock screen is kind of mesmerizing. I suppose it would take quite some work to replicate on Windows, which is perhaps why Microsoft hasn't done so.
 

mvprod123

Senior member
Jun 22, 2024
The new Metal 4 looks promising: improved MetalFX Upscaling, MetalFX Frame Interpolation, the newly announced MetalFX Denoising, tensor support for ML, and more. Apple is making the most of its latest GPU architecture.
 

Eug

Lifer
Mar 11, 2000
Hmmm. iPadOS got a huge upgrade with version 26.

Looking much more macOS-like now.


New windowing system
Window controls
Menu bar
File folders in dock
Active background applications
Etc.

Given all this, it would have been nice if my M4 had come with 12 GB RAM instead of 8. The really annoying part of this is that at least some of the 8 GB models are 12 GB with 4 GB disabled, as shown in teardowns.
 

Eug

Lifer
Mar 11, 2000
Someone I know just installed the iPadOS beta. Looks fundamentally different.



This is a screen grab of our iMessage conversation, showing some of the new features. The interface has bugs and is stuttery though.
 

mvprod123

Senior member
Jun 22, 2024
The new Metal 4 looks promising: improved MetalFX Upscaling, MetalFX Frame Interpolation, the newly announced MetalFX Denoising, tensor support for ML, and more. Apple is making the most of its latest GPU architecture.

Metal 4 is designed exclusively for Apple silicon, and sets the stage for the next generation of games on Apple platforms with support for advanced graphics and machine learning technologies.
Developers can now run inference networks directly in their shaders to compute lighting, materials, and geometry, enabling highly realistic visual effects for their games. MetalFX Frame Interpolation generates an intermediate frame for every two input frames to achieve higher and more stable frame rates, and MetalFX Denoising makes real-time ray tracing and path tracing possible in the most advanced games.
 

okoroezenwa

Member
Dec 22, 2020
Given all this, it would have been nice if my M4 had come with 12 GB RAM instead of 8. The really annoying part of this is that at least some of the 8 GB models are 12 GB with 4 GB disabled, as shown in teardowns.
Or better yet, updated M4 iPads starting at 16 GB like the Airs. Oh well.
 

Eug

Lifer
Mar 11, 2000
It also looks like a DeviantArt glass icon pack. Can't wait for their next unnecessary visual change, maybe it'll be inspired by fanmade Mac OS 9 themes and be readable again.
This is the glass icon setting, which is optional, not the default. I just wanted to give it a try to see whether I like it. So far I've got mixed feelings, but I suspect I'll eventually revert to the default.
 

moinmoin

Diamond Member
Jun 1, 2017
This will certainly take some getting used to. It's kind of a weird mish-mash of macOS and the old iPadOS.
So how does that behave, exactly? Is displaying apps in windows optional, or does it always happen with apps that support it? Is that support automatic, or limited to updated apps? And what about that new macOS-like menu bar: how does it behave with a lack of space, e.g. when moving from landscape to portrait? Thanks.
 

mvprod123

Senior member
Jun 22, 2024
The new Metal 4 looks promising: improved MetalFX Upscaling, MetalFX Frame Interpolation, the newly announced MetalFX Denoising, tensor support for ML, and more. Apple is making the most of its latest GPU architecture.
Does Metal 4 support for tensors hint at the presence of tensor cores in M5, or is there another explanation? What do you think, @name99?
 

Eug

Lifer
Mar 11, 2000
So how does that behave, exactly? Is displaying apps in windows optional, or does it always happen with apps that support it? Is that support automatic, or limited to updated apps? And what about that new macOS-like menu bar: how does it behave with a lack of space, e.g. when moving from landscape to portrait? Thanks.
Displaying apps in windows is optional. However, once you've put one in a window, by default it will launch in that window in the same location. Or you can re-expand it to full screen with the "traffic light" green icon like in macOS.

Third-party apps already support this, although some have limits on the minimum window size.

When you rotate from landscape to portrait, some apps get moved to fit the screen (or almost fit the screen), while others are just left partially offscreen.

Here are some screenshots of Safari + Affinity Photo + Zoom + PowerPoint.

Landscape windowed. I have a window of Safari that is wide but short. The Affinity Photo window is narrow but long, with the bottom part of the window offscreen. Safari is positioned near the top of the screen.



Portrait windowed. Safari goes partially offscreen, but Affinity gets moved up to fit, moving it higher than Safari. I’m not sure why Safari didn’t get moved to the right to fit more of it onscreen.



PowerPoint fullscreen landscape.



Note that this is just on the iPad’s 11” screen. I’ll try testing on an external screen eventually, but I don’t have one that rotates.
 

moinmoin

Diamond Member
Jun 1, 2017
Much appreciated @Eug! I looked into it; it seems my old 5th-gen iPad mini will still get iPadOS 26, and unlike Stage Manager this appears to be available on all devices, so I'll be able to try out for myself how the new approach behaves in a small space. I've also got a currently underused Apple Pencil, so I guess I'll get some more use out of that as well. Honestly, I'm still not sure what to think about the changes, but I'm looking forward to trying them out myself.
 

Eug

Lifer
Mar 11, 2000
iPadOS dev beta 1 is extremely crashy when trying to use an external monitor. Basically unusable. In addition, if I'm playing Netflix, window resizing is extremely jittery. Also, it appears Affinity Photo 2 supports only specific window sizes; when you go beyond those, instead of increasing the usable window area, it appears to scale the window so fonts and such increase in size. Very weird.

Here is my iPad Pro M4 using my 2010 iMac Core i7 as an external 1440p monitor. Netflix plays audio out of the iMac.



 

name99

Senior member
Sep 11, 2010
Does Metal 4 support for tensors hint at the presence of tensor cores in M5, or is there another explanation? What do you think, @name99?
1. "Tensor" simply means that rank-n objects (i.e. objects that are a rectangular plane of data, or a cube of data, or a 4-D volume of data) are "first-class objects". They can be defined easily and used as arguments easily. It's a LANGUAGE feature; it makes no claims about the hardware.

2. There are various natural operations on these arrays, the most important of which is contraction (essentially the building block of what you think of as matrix multiplication).

3. So what do we ACTUALLY get in Metal 4?
If you look at ANE, that provides deep tensor support. For example you can set up DMA engines describing the layout of a tensor, and the DMA engine will autonomously stream the array of data into L2 or a Neural Core. You can also do things like interpolate values at "fake indices" within the tensor.

In contrast, Metal 4 suggests much less HW support (for now?). You can describe the tensor layout and simplify your Metal 4 code (which, recall, is basically C++; if you're working at the level of Python you already have all this stuff...). There's no reason to believe that current GPU hardware provides special tensor-aware load/store mechanisms. HOWEVER, simply being able to describe the data access pattern within Metal at a higher level has value in optimizing the code.

4. In terms of operations provided, there are only two so far: matrix multiply and convolution. You can be more explicit in specifying how you want these performed (e.g. you can indicate the equivalent of each lane calculating a distinct small matrix multiply vs. a threadgroup co-operatively calculating a single large matrix multiply), and that again helps the optimization.

5. Nothing that I see indicates a deviation from the current Apple paradigms:
- ANE is first choice for inference, whenever feasible
- Matrix Multiplication is done using existing FMAs (with various tweaks to make them more energy efficient)

If we compare with nVidia:
- There appears to be no interest in chasing tiny types (FP8, UINT4 and so on). Those are inference types, and inference is for the ANE.
- Sparsity (i.e. weight compression) is apparently being handled by ASTC texture compression (!!!). On the one hand, this is probably more flexible than nV's structured sparsity. On the other hand, it expands to 0 before the matrix multiply, so you save bandwidth but not computation. On the third hand, when training you're mostly limited by bandwidth more than computation, so who cares?

The overall pattern, it seems to me, is:
- nV understands that its customers are willing to pay anything for training (with inference as a secondary task that has to be performed to make it all work out) and designs accordingly, by adding more (and ever more) dedicated HW ("tensor cores").
This is the fastest way to give the paying customers what they want. That doesn't mean it's the best solution if you had time to design a best solution...

- Apple understands that they are not competing in the training olympics; they want training to work, but they're also shipping generic GPUs, and they're also shipping ANE for inference. So they're much more into reuse of hardware.
Reuse existing FMAs for matrix multiply, don't add a separate tensor core. This means you don't get weirdo support like FP8 or INT4, but they don't need that, they have ANE.
Reuse existing texture compression HW to achieve the equivalent of structured sparsity.
Probably at some point they'll use the texture unit to do the equivalent of tensor interpolation if that becomes necessary. etc etc

Ultimately this looks to me more like a SW play than a HW play. By surfacing these primitives in Metal 4, they make it easier for the tools people actually use (Python, but maybe also Julia or Mathematica) to route ML calculations to the GPU without having to spend man-years figuring out all the details. It changes things for devs, not so much inside Apple.
 