Question Incredible Apple M4 benchmarks...

igor_kavinski · May 15, 2024

Eug said:
Besides breaking the 4000 score barrier in single-core, it’s also the fastest M4 9-core multi-core score to date.

The slowest M4 Macbook now has to best this score otherwise it's a fail!

branch_suggestion · May 15, 2024

SarahKerrigan said:
I can't help it. Hysteria, you know.

senttoschool is quite the expert in hysterics.

okoroezenwa · May 15, 2024

igor_kavinski said:
The Geekbench Browser also needs an advanced search with better filtering options.

One can only hope they do this. We really need to be able to filter by subtest.

okoroezenwa · May 15, 2024

branch_suggestion said:
senttoschool is quite the expert in hysterics.

Wait, mikeegg is senttoschool? Or am I misunderstanding?

Mopetar · May 15, 2024

okoroezenwa said:
Wait, mikeegg is senttoschool? Or am I misunderstanding?

Yeah it's the same person, apparently he changed his account name. There's a few people who have done it for whatever reason. Unless there's some other way to tell that I'm not aware of you'd have to go back and look at an old post where someone replied to them as name in any quoted text from a post doesn't update with the name change on the account.

StinkyPinky · May 15, 2024

Why do people waste their energy with these synthetic benchmarks when there are plenty of real world ones that can be used. I just don't get it, Geekbench is just a fun little tool and should never be taken that seriously. Yet people seem to take it as the holy grail of CPU performance.

SarahKerrigan · May 15, 2024

StinkyPinky said:
Why do people waste their energy with these synthetic benchmarks when there are plenty of real world ones that can be used. I just don't get it, Geekbench is just a fun little tool and should never be taken that seriously. Yet people seem to take it as the holy grail of CPU performance.

Geekbench isn't bad, in a pinch. SPEC is better, and now we have SPEC numbers for M4, demonstrating small but real clock-normalized gains and a decent generational clock bump.

Nothingness · May 15, 2024

It might be interesting to put GB vs SPEC test by test comparison iPad vs A17 Pro. Some people might be surprised.

Eug · May 15, 2024

Re: Geekbench vs. spec. Did you ever see this 2020 article on Geekbench 5?

Performance Delivered a New Way Part 2 — Geekbench versus SPEC

Author: Ram Srinivasan, Performance Architect, NUVIA

SarahKerrigan · May 15, 2024

Eug said:
Re: Geekbench vs. spec. Did you ever see this 2020 article on Geekbench 5?

Performance Delivered a New Way Part 2 — Geekbench versus SPEC

Author: Ram Srinivasan, Performance Architect, NUVIA

I've seen it, and I agree with some of their points, though with reservations. The tl;dr version of my opinions on Geekbench is that it's pretty good, better than other free benchmarks available, but is not a replacement for SPEC even if they tend to correlate decently well with each other. It is also notable that their prediction for the A14 correlated much less well with Anandtech's eventual SPEC estimate for that processor.

Any benchmark primarily available in binary form is going to be limiting in terms of what you can learn from it IMO. This is doubly true in that Primate Labs openly states that it uses platform-specific intrinsics when available. I'm very wary of the latter because it has a tendency to turn benchmarks from a measurement of a core's performance to a measurement of how good a machine-specific implementation of an algorithm is.

These issues could probably be addressed by buying the source license and doing a custom build, ideally excluding the platform-specific code, but Primate doesn't advertise a list price and I've never seen its source code in use anywhere myself. SPEC is cheap, ubiquitous, contains no platform-specific optimization whatsoever, and is very well-analyzed in its behavior.

Edited to add: Furthermore, I just don't think that "two integer CPU benchmark suites correlate with each other" is really meaningful. They're both integer benchmark suites. Some CPUs have faster integer perf than others. If they don't have a reasonably strong correlation, it means that one or both is not being a very good benchmark. This is extra true if you're measuring your R2 across only seven points. For the heck of it, I tried correlating SpecFP to Specint, which measure completely and utterly different things, using seven random SPEC06 results from Anandtech. I got an R2 of approximately 0.92.

Hitman928 · May 15, 2024

SarahKerrigan said:
I've seen it, and I agree with some of their points, though with reservations. The tl;dr version of my opinions on Geekbench is that it's pretty good, better than other free benchmarks available, but is not a replacement for SPEC even if they tend to correlate decently well with each other. It is also notable that their prediction for the A14 correlated much less well with Anandtech's eventual SPEC estimate for that processor.

Any benchmark primarily available in binary form is going to be limiting in terms of what you can learn from it IMO. This is doubly true in that Primate Labs openly states that it uses platform-specific intrinsics when available. I'm very wary of the latter because it has a tendency to turn benchmarks from a measurement of a core's performance to a measurement of how good a machine-specific implementation of an algorithm is.

These issues could probably be addressed by buying the source license and doing a custom build, ideally excluding the platform-specific code, but Primate doesn't advertise a list price and I've never seen its source code in use anywhere myself. SPEC is cheap, ubiquitous, contains no platform-specific optimization whatsoever, and is very well-analyzed in its behavior.

Edited to add: Furthermore, I just don't think that "two integer CPU benchmark suites correlate with each other" is really meaningful. They're both integer benchmark suites. Some CPUs have faster integer perf than others. If they don't have a reasonably strong correlation, it means that one or both is not being a very good benchmark. This is extra true if you're measuring your R2 across only seven points. For the heck of it, I tried correlating SpecFP to Specint, which measure completely and utterly different things, using seven random SPEC06 results from Anandtech. I got an R2 of approximately 0.92.

Agreed. You can use pretty much any CPU benchmark and it will show very strong correlation. I tried it with Cinebench r20 from the 2700x to the 7950x and Anandtech's SPECint numbers and got 0.998 for the correlation coefficient. You know what that tells me? Faster CPUs perform better on CPU benchmarks. Not a very surprising result, but that doesn't mean that they track closely enough that you can substitute one for the other, especially as you get to larger data sets.

SarahKerrigan · May 15, 2024

Hitman928 said:
Agreed. You can use pretty much any CPU benchmark and it will show very strong correlation. I tried it with Cinebench r20 from the 2700x to the 7950x and Anandtech's SPECint numbers and got 0.998 for the correlation coefficient. You know what that tells me? Faster CPUs perform better on CPU benchmarks. Not a very surprising result, but that doesn't mean that they track closely enough that you can substitute one for the other, especially as you get to larger data sets.

Amen.

Hitman928 · May 15, 2024

SarahKerrigan said:
Amen.

I just realized that we can easily check their predictive accuracy with newer CPUs. If I use their prediction model with the 5950x, I get:

5950x

GB5 Integer = 1405

Predicted SPECint = 6.83

Actual SPECint (Anandtech) = 7.65

Error = -10.72%

If I repeat it for the 7950x, I get a -6.02% error. For the 13900k, I get a +4.78% error.

So the prediction based on the GB5 score would overestimate the 13900k while underestimating the Zen 3 and Zen 4 CPUs. It's not huge amounts of error, but it is significant and much higher than they show in their write-up. It is made worse when you have one line of CPUs swinging one way and the other line swinging in the opposite error direction.

mikegg · May 16, 2024

branch_suggestion said:
senttoschool is quite the expert in hysterics.

Thanks man! I don't even know you though. Glad I'm famous.

Mopetar said:
Yeah it's the same person, apparently he changed his account name. There's a few people who have done it for whatever reason. Unless there's some other way to tell that I'm not aware of you'd have to go back and look at an old post where someone replied to them as name in any quoted text from a post doesn't update with the name change on the account.

I'm. Never hiding it. I can be found in many hardware forums and reddit. But I'm not sure why your post is relevant to this thread. Let's focus on the topic instead.

mikegg · May 16, 2024

Eug said:
Re: Geekbench vs. spec. Did you ever see this 2020 article on Geekbench 5?

Performance Delivered a New Way Part 2 — Geekbench versus SPEC

Author: Ram Srinivasan, Performance Architect, NUVIA

See this post 2 years ago by me on Reddit:

https://www.reddit.com/r/hardware/comments/pitid6/eli5_why_does_it_seem_like_cinebench_is_now_the

Andrei F of Anandtech chimed in.

Since that post, the internet has changed its tune on Geekbench and Cinebench. Finally, people are getting out of the AMD marketing fueled era of using Cinebench R23 as a proxy for general CPU performance. It has its niche but needs to remain a niche.

AMD and Intel need to stop optimizing their DIY CPUs for Cinebench. Usually, their first marketing slide is Cinebench. This is very misleading since Cinebench has very poor correlation to actual CPU performance in most applications.

mikegg · May 16, 2024

Hitman928 said:
Agreed. You can use pretty much any CPU benchmark and it will show very strong correlation. I tried it with Cinebench r20 from the 2700x to the 7950x and Anandtech's SPECint numbers and got 0.998 for the correlation coefficient. You know what that tells me? Faster CPUs perform better on CPU benchmarks. Not a very surprising result, but that doesn't mean that they track closely enough that you can substitute one for the other, especially as you get to larger data sets.

I'd like to see your data on Cinebench vs SPECInt.

Cinebench R20 results are quite terrible at predicting gaming performance, for example.

branch_suggestion · May 16, 2024

mikegg said:
Since that post, the internet has changed its tune on Geekbench and Cinebench. Finally, people are getting out of the AMD marketing fueled era of using Cinebench R23 as a proxy for general CPU performance. It has its niche but needs to remain a niche.

As is well known, Cinebench does have bottlenecks that do not represent modern CPU trends or real world software developments. But it is very accessible and consistent and it is a sustained perf bench, not a burst perf bench like Geekbench.
Neither in their current forms represent good PC performance correlation, unlike SPEC. Geekbench can be cheesed in too many ways, with OS's and the like showing rather different results.
AMD used Cinebench when it suited them (Z2/3), Intel did when it suited them (ADL/RPL). ARM companies have always leaned on Geekbench because it suits them.
SPEC is the gold standard for subtest heavy sustained perf, and for rendering Blender does utilise modern CPU instructions and other capabilities very well.

mikegg said:
AMD and Intel need to stop optimizing their DIY CPUs for Cinebench.

They really don't, it is just easy for the GP to understand. SPEC is numero uno and then specific stuff like Javascript or gaming.

mikegg said:
Usually, their first marketing slide is Cinebench. This is very misleading since Cinebench has very poor correlation to actual CPU performance in most applications.

Well things have changed since 2020, AMD is strong in SPEC and many real world apps but specifically weak in both Cinebench and Geekbench.
Z5 is a very different beast compared to previous Zen cores, so how it does relative to Z4 and the current comp in strong/weak apps is complete guesswork.

mikegg said:
I'd like to see your data on Cinebench vs SPECInt.

Cinebench R20 results are quite terrible at predicting gaming performance, for example.

View attachment 99053

That is false equivalence and you know it. Comparing Cinebench results to Blender/V-Ray etc sure. But not to gaming, they are very different workloads.
Memory performance largely kept weaker Intel cores competitive in gaming vs stronger AMD cores, which is why V-Cache became a thing for client. It was originally only for technical DC but someone played around with it for client workloads and it worked out really well.

Nothingness · May 16, 2024

branch_suggestion said:
Neither in their current forms represent good PC performance correlation, unlike SPEC. Geekbench can be cheesed in too many ways, with OS's and the like showing rather different results.

Are you sure that Geekbench still varies a lot depending on the OS? Picking some random results of Linux vs Windows, I see small differences; that was not the case for GB5 IIRC where some tests depended too much on libraries (libm mostly).

BTW SPEC is not only a CPU benchmark, but also a compiler benchmark, and there you can have huge differences (even when compilers don't cheat). SPEC also depends on some extra libraries (many use jemalloc) and OS feaures (Linux THP, rebooting the machine to have the OS in an as clean as possible state). But I agree it's a better benchmark than Geekbench (especially when one uses the same compiler to compare different CPUs).

mikegg · May 16, 2024

branch_suggestion said:
As is well known, Cinebench does have bottlenecks that do not represent modern CPU trends or real world software developments. But it is very accessible and consistent and it is a sustained perf bench, not a burst perf bench like Geekbench.
Neither in their current forms represent good PC performance correlation, unlike SPEC. Geekbench can be cheesed in too many ways, with OS's and the like showing rather different results.
AMD used Cinebench when it suited them (Z2/3), Intel did when it suited them (ADL/RPL). ARM companies have always leaned on Geekbench because it suits them.
SPEC is the gold standard for subtest heavy sustained perf, and for rendering Blender does utilise modern CPU instructions and other capabilities very well.

You should read my original Reddit post. Everything you wrote here has either been argued or agreed upon in the Reddit post already.

branch_suggestion said:
They really don't, it is just easy for the GP to understand. SPEC is numero uno and then specific stuff like Javascript or gaming.

I'm certain that they do for their consumer CPUs. It's the #1 benchmark for AMD and Intel and unfortunately, Cinebench dominates the minshare space for CPU benchmarking. Things have turned a bit more since I made the Reddit post a few years ago. Quite often, you'll see people on the internet actually cite Nuvia's medium post (for example, @Eug cited it just yesterday) as well as bring up the fact that Cinebench R23 is optimized for x86 and not ARM when comparing between the two architectures.

branch_suggestion said:
That is false equivalence and you know it. Comparing Cinebench results to Blender/V-Ray etc sure. But not to gaming, they are very different workloads.
Memory performance largely kept weaker Intel cores competitive in gaming vs stronger AMD cores, which is why V-Cache became a thing for client. It was originally only for technical DC but someone played around with it for client workloads and it worked out really well.

The person I quoted claims that Cinebench R20 scores are a close predictor of SPEC scores. I'd like to see the data.

You can barely compare Cinebench results to Blender. They don't correlate as much as you think.

I'm not sure how a rendering benchmark became THE mainstream CPU benchmark over the last 6 or 7 years. Most people who use Cinema4D would use a GPU renderer. Same for Blender.

I guess people need a CPU benchmark to justify buying 16 or more consumer CPUs.

igor_kavinski · May 16, 2024

mikegg said:
I guess people need a CPU benchmark to justify buying 16 or more consumer CPUs.

I think the percentage of people who really need more than 8C/16T cores is positively dwarfed by the number of people buying more than that due to other reasons, not limited to "I bet I might need this many cores at some point so better make sure I have them available!".

SarahKerrigan · May 16, 2024

Nothingness said:
Are you sure that Geekbench still varies a lot depending on the OS? Picking some random results of Linux vs Windows, I see small differences; that was not the case for GB5 IIRC where some tests depended too much on libraries (libm mostly).

BTW SPEC is not only a CPU benchmark, but also a compiler benchmark, and there you can have huge differences (even when compilers don't cheat). SPEC also depends on some extra libraries (many use jemalloc) and OS feaures (Linux THP, rebooting the machine to have the OS in an as clean as possible state). But I agree it's a better benchmark than Geekbench (especially when one uses the same compiler to compare different CPUs).

Sure, SPEC is a compiler benchmark to a degree, but that's not a bad thing. It means you can build it in whatever configuration approximates what the software you'll be shipping or running will use - rather than having to just roll with whatever Primate Labs gives you. That's valuable! Vendor submissions are free to use trick compilers in whatever bizarre configuration they want, but then you can just run it yourself at O3 with LTO and no autopar, and see something realistic.

I've never seen significant improvements from jemalloc on SPECint in my own use (though if I recall, a couple of SPECfp tests benefit from it.) I have notes saying that omnet and leela break with jemalloc, though I haven't tested that in a few years and it's possible it's no longer the case.

Jan Olšan · May 16, 2024

Hitman928 said:
Where is the outrage now?

As far as Zen 4, GB6 was largely ignored/minimized when evaluating Zen 4.

That should be due to AVX-512 I suppose? That's just general SIMD though, you can compile pretty much anything with AVX-512 support and at least in theory, almost all codebases could benefit in small ways at least (standard functions like MEMCPY for example). It might not be significant in practice, or it could slow CPU down if it causes downclock like on Intel, but it's just plain SIMD - a standard feature of CPU core. As far as I understand, SME has narrower applicability. Not sure how much narrower exactly, but narrower.

(Also there was no update of the benchmark with new code specifically to make use of the feature for Zen 4, but I guess let's not go there)

Hitman928 · May 16, 2024

mikegg said:
I'd like to see your data on Cinebench vs SPECInt.

Cinebench R20 results are quite terrible at predicting gaming performance, for example.

View attachment 99053

Here you go. I've made it in the same format as in the link. Note that I quoted the correlation coefficient in my previous post but am showing the coefficient of determination (R^2) in the graph to match the link.

	CB r20 1t	SPECint 2017 1t
2700x	433	4.64
3950x	527	5.78
5950x	644	7.65
7950x	796	9.39

Hitman928 · May 16, 2024

Another graph with some Intel CPUs thrown in. I got rid of the connecting lines to better see the trend line. Still near perfect correlation. Just don't try to use this to predict ADL/RPL SPEC scores, it won't work out very well, just like with GB.

	CB r20 1t	SPECint 2017 1t
2700x	433	4.64
9900ks	517	6.1
3950x	527	5.78
11900k	626	7.346
5950x	644	7.65
7950x	796	9.39

Doug S · May 16, 2024

SarahKerrigan said:
Sure, SPEC is a compiler benchmark to a degree, but that's not a bad thing. It means you can build it in whatever configuration approximates what the software you'll be shipping or running will use - rather than having to just roll with whatever Primate Labs gives you. That's valuable! Vendor submissions are free to use trick compilers in whatever bizarre configuration they want, but then you can just run it yourself at O3 with LTO and no autopar, and see something realistic.

I've never seen significant improvements from jemalloc on SPECint in my own use (though if I recall, a couple of SPECfp tests benefit from it.) I have notes saying that omnet and leela break with jemalloc, though I haven't tested that in a few years and it's possible it's no longer the case.

The thing I like about SPEC involving the compiler is that everyone starts from the same source code. If you have fancy SIMD instructions, fancy SME type instructions, whatever, the compiler has to be able to on its own translate that source code into those instructions. Yes that has been abused in the past with compilers written to essentially "recognize" SPEC like code from a specific test. But that's again why I go back to looking specifically at gcc/llvm/clang compiler benchmarks because those can't be gamed. There is no dominant inner loop in a compiler that can be gamed to result in a massive speedup. Even my pet peeve with SPEC around autopar is not an issue in the gcc benchmark.

All the gaming of SPEC involved vendors reporting results using their own compiler. Maybe that still goes on but I haven't looked at results on SPEC's site for a while now. I'm more interested in stuff that doesn't qualify for posting under their run rules, like when Andrei was running SPEC in reviews here, or geekerwan's work. They are trying to be fair so they're using the same compiler everywhere (or using the "standard" compiler for each platform, i.e. Xcode/llvm on Mac, MSVC/C++ on Windows, gcc or llvm on Linux) and not using goofy non standard heap libraries.

Question Incredible Apple M4 benchmarks...

Lifer

Senior member

Member

Member

Diamond Member

Diamond Member

Senior member

Platinum Member

Lifer

Senior member

Diamond Member

Senior member

Diamond Member

Golden Member

Golden Member

Golden Member

Senior member

Platinum Member

Golden Member

Lifer

Senior member

Senior member

Diamond Member

Diamond Member

Platinum Member