For AM4 we had X370, X470 and X570, so they could launch one with even more connectors, but I doubt it. Maybe an X870 with PCIe 6.0 alongside Zen 6.

Yeah, after all, X570 didn't have any follow-up (except a silent revision) when Zen 3 launched. I think at that point motherboard manufacturers probably realised that devaluing perfectly capable motherboards by releasing the same ones under a different name was a bad idea. And now that AMD requires every AM5 motherboard to have USB BIOS flashback, this is a complete non-issue.
I don't get why people think GB6 MT not scaling perfectly is a problem. I don't know what it is doing internally, and maybe it is being stupid, but in the real world most problems aren't like SPEC's or CB's MT, where the way the work is divided allows perfect scaling until you hit system limits like memory bandwidth or cache-coherence traffic.
GB5 tracks closely with SPEC under optimal conditions (read: you know how to properly run benchmarks).

I thought it was GB5 that tracks pretty well with SPEC? Maybe GB6 ST does too, but I doubt GB6 MT does. IIRC there was a paper on it, but idk if that was just a fever dream or what lol
I feel like the pattern is much more complicated than just "AMD does worse on CB" and has more to do with the specific arch implementation. Because Zen 5 is such a large change, I have no reason to believe that pattern would hold.
I agree, I thought about this after they launched it.

They can easily fix the GB6 score problem by introducing a third MT Max score where everything is embarrassingly parallel.
Depends on the workload. I routinely run workloads on my 7950X that run 2x faster than they would on an 8-core part. My GPU also executes embarrassingly parallel tasks on a day-to-day basis.

The same can be said in reverse, we've been fooling people for years into believing the high core count CPUs are twice or three times as fast in consumer MT workloads.
Was that not the same kind of lie, only upside down?
I do not believe there will be a new chipset or new boards, maybe only some 600-series refresh boards. There's no new IOD with new I/O to justify a new board design at all. I believe this will follow the same pattern B550/X570 did with Zen 3, which shared the same IOD as Zen 2: only a few refresh boards.

Does anybody know what upgrades or differences the B750/X770 motherboards will bring with Zen 5?
From a marketing perspective, I'm sure consumer board makers would like AMD to rebrand the chipset even though it is unchanged. We'll see if they get their wish.

There is no new chipset; if B750/X770 exist, they will just be refreshes of current mobos.
Correct me if I'm wrong, but wasn't Geekbench initially a smartphone benchmark?

With regards to multi-core, Geekbench have explicitly stated that their design goal was to mimic multi-core scaling in an undisclosed set of client applications, and that those apps exhibited limited to negative scaling past 4 cores (GB is a little annoying in that they insist on using the word "core" interchangeably with "thread" much too often). The way in which they achieved this was two-fold:
#1. They switched from "discrete tasks" to "shared tasks". This means all threads now work on the same task until it is complete before proceeding to the next one. Compared to the previous method, this increases the benefit of caches closer to the core, eliminates most of the downsides of shared caches, and also absolutely trashes high core-count scaling, especially if multiple groups of cores (e.g. AMD's core complexes) are used. It should be noted that this switch appears to have been applied to all subtasks, even those where a "real world" application would never use a "shared task" approach.
#2. They changed the weights of each subtask to achieve the scaling that was observed in their aforementioned (undisclosed) set of client applications. There are no public weights, since this would imply considering one subtest more important than another, so instead they modified the workload for each test as a proxy. Obviously, this means they cannot have the same workload for the single-core and multi-core version of the same subtest (and indeed they don't).
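The structural difference between the two models can be sketched in a few lines of Python. This is a hypothetical illustration of the two threading shapes, not Geekbench's actual code; the stand-in workload (summing squares) and all names here are made up for the example.

```python
# Hypothetical sketch of "discrete tasks" (GB5-style) vs. "shared task"
# (GB6-style) multi-threading. Not Geekbench's actual code.
from concurrent.futures import ThreadPoolExecutor

def work(lo, hi):
    # Stand-in workload: sum of squares over a range.
    return sum(i * i for i in range(lo, hi))

def discrete_tasks(n_threads, task_size):
    # GB5-style: every thread gets its own full, independent copy of the
    # task. Total work grows with thread count, there is almost no
    # inter-thread communication, and scaling is near-linear.
    with ThreadPoolExecutor(n_threads) as pool:
        return list(pool.map(lambda _: work(0, task_size), range(n_threads)))

def shared_task(n_threads, task_size):
    # GB6-style: one fixed-size task is split into chunks and the partial
    # results must be merged. Per-thread chunks shrink as thread count
    # grows, and the coordination/merge step caps scaling.
    bounds = [(task_size * t // n_threads, task_size * (t + 1) // n_threads)
              for t in range(n_threads)]
    with ThreadPoolExecutor(n_threads) as pool:
        parts = pool.map(lambda b: work(*b), bounds)
    return sum(parts)  # synchronization point: merge partial results
```

In the shared model the answer is identical for 1 thread and 64 threads; only the division of labour changes, which is why coordination overhead rather than total work dominates at high core counts.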
As a side note, this is a fairly fragile way of setting up a benchmark if your goal is to have it keep tracking a fixed set of client applications and it, unsurprisingly, promptly broke, forcing them to issue updates.
In any event, if you want to know about the internals of GB6, they have published some details. If you do look at those, it is very easy to criticize the subtests, but since these are all chosen simply as a proxy for an entirely different set of applications, it does not really matter much what the individual subtests actually are.
For those that are more interested in what GB6 actually *measures*, it is quite easy to take a look in their database and isolate a single line of similar processors and watch how they scale. I have not performed a rigorous analysis of Geekbench 6 and the results in their database are very noisy, so this approach obviously comes with some caveats. It would be relatively easy to perform some consistent tests across a variety of simple platforms, varying as few parameters as possible at a time, but I have not seen such an analysis.
If you look at AMD's Dragon Range of mobile processors, you will notice that for the single-core score, the only variable with a significant correlation is boost frequency. Base frequency (range [2.5; 4.0 GHz]) and L3 cache (range [32; 128 MB]) do not appear to affect the score much at all. For multi-core it is a bit harder to separate the variables, but core count does appear to be the biggest factor, along with base frequency, although the latter is likely a proxy for max power. Again, L3 cache does not appear to be a major contributor.
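That kind of eyeballing can be made slightly more rigorous with a plain Pearson correlation over spec/score pairs pulled from the database. The figures below are made-up placeholders, not real Dragon Range entries; only the method is the point.

```python
# Pearson correlation between one spec column and the benchmark score.
# All numbers below are hypothetical placeholders, not real GB6 entries.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (boost GHz, L3 MB, ST score) -- hypothetical parts on one product "line"
parts = [(4.7, 32, 2300), (5.0, 32, 2450), (5.2, 64, 2550), (5.4, 32, 2650)]
boost = [p[0] for p in parts]
l3    = [p[1] for p in parts]
score = [p[2] for p in parts]

print(pearson(boost, score))  # strong correlation with boost clock
print(pearson(l3, score))     # weak correlation with L3 size
```

Isolating one "line" of similar parts and varying one column at a time is exactly the "as few parameters as possible" approach described above.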
Overall, judging by the actual scores, GB6 seems to care very little about L3 cache. It is somewhat counter-intuitive that a big 3D V-Cache has almost no effect on either single-core or multi-core, despite the rather huge effect it has on many workloads (including games). The relatively small effect of L3 cache, but large effect of memory bandwidth, appears to reflect a mix of fairly small datasets and very large datasets, with very little in between. This, particularly when coupled with the "shared task" approach that mitigates the downsides of shared caches, really skews the results quite heavily in favor of Apple-style caches, as opposed to AMD's (or even Intel's P-cores).
Inserting my opinion on this, specifically as it pertains to multi-core performance, I have trouble finding anything in my daily usage that reflects GB6's vision of what multi-core is. When I have dozens of tabs open all chugging away with their BS javascript and moronic h264 video ads, that type of workload is not reflected by GB6 scores. When I apply a filter to a photo or encode a small video to send to my family, that also is not reflected by GB6 scores. If I played a game (I rarely find time to), or if I analyze a chess position (blitz chess I do have time for), or if I do something that stresses the computer for more than a few *seconds* (minutes at best), or if I compile something in the background while continuing to actually use my computer, then those things *too* are not reflected by GB6 scores.
So, yes, the next time I open a single Word document or a single web page in a browser with no other active tabs, then I will be sure to think just how well GB6 reflects Real World multi-core scaling.
If I remember my history correctly, GB was originally written by a guy who thought his PowerPC Mac was slow and wrote his own benchmark to prove it, with Mac and Windows the first two OSes.

Correct me if I'm wrong, but wasn't Geekbench initially a smartphone benchmark?
I seem to remember it from the Galaxy Nexus days. Which would be Nexus 5.
If I remember my history correctly, GB was originally written by a guy who thought his PowerPC Mac was slow and wrote his own benchmark to prove it, with Mac and Windows the first two OSes.
It certainly does seem like a smart phone focused benchmark these days, though.
ArsTechnica said: "I just switched over to the Mac back in about 2002," Poole told Ars. "So I was getting used to that ecosystem. And then the [Power Mac] G5 came out and I thought, oh, this looks really cool. I went out, bought one of the new G5s, and it felt slower than my previous Mac. And I thought, well, this is really strange; what's going on. ... So, you know, I grabbed what [benchmarks] I could download and ran them and got really confused, because what the benchmarks were saying wasn't jiving with my experience.
"So I actually went and I reverse-engineered one of the popular benchmarks and found that the tests were, for lack of a better word, terrible," said Poole. "They weren't really testing anything substantial, you know, doing really simple arithmetic operations on really small amounts of data, not really testing anything. And so I thought, how hard can it be to write a benchmark? Maybe I should write my own."
Only if you consider the use case of PCs to be running a single application at a time, mobile-OS style. But even with ST-heavy or Geekbench 6-style "MT" applications you can easily fill plenty more cores as soon as you run multiple instances of them at once. Something a modern browser may already be doing by running every tab in a separate process.

The same can be said in reverse, we've been fooling people for years into believing the high core count CPUs are twice or three times as fast in consumer MT workloads.
Was that not the same kind of lie, only upside down?
Looks like the same SP5 socket as Genoa.
YukkiAnns, ES2 samples of Turin
If these diagrams are true, then there is a new IO die for Turin as well, since there is support for CXL 2.0 and 6000 MT/s DDR5. CCDs in standard Turin are losing some bandwidth. Turin dense CCDs use some dual-ring setup, so L3 latency goes up, but for cloud it should not matter much.
Yes, this must be a new die, as the current one has only 12 GMI links while they now need 16. But so far nothing points to each of them being narrower.

If these diagrams are true, then there is a new IO die for Turin as well, since there is support for CXL 2.0 and 6000 MT/s DDR5. CCDs in standard Turin are losing some bandwidth. Turin dense CCDs use some dual-ring setup, so L3 latency goes up, but for cloud it should not matter much.
Edge platform (Siena replacement) will have beastly PPW or PPC(ost) when it comes out.
But only if diagrams are true.
But you can, it's just terrible for performance, for obvious reasons...

Same as in 1995, it seems like AMD still needs to figure out how to run memory asynchronously to the bus. Intel's patent lock on the breakthrough should be expired by now. Surely there is a solution to take better advantage of these non-synchronized memory speeds.
It actually already does for mobile parts, to save on power consumption, depending on whether the CPU (preferring low latency) or the iGPU (preferring high bandwidth) requests data.

Same as in 1995, it seems like AMD still needs to figure out how to run memory asynchronously to the bus.
It can since Zen 2, although going further upstream from the UMC, in-sync or 1:2 modes are required.

It actually already does for mobile parts, to save on power consumption, depending on whether the CPU (preferring low latency) or the iGPU (preferring high bandwidth) requests data.
Can we please stop with the accusations that Geekbench 6 is useless and non-transparent because it changed how it was approaching multi-threading compared to v5?
You only need to read page 2, which contains the summary, and then click on Multithreading, for example. It's a well-structured document.

I'm not going to read 49 pages, why don't you point out the relevant context?
Geekbench 6 uses a “shared task” model for multi-threading, rather than the “separate task” model used in earlier versions of Geekbench. The “shared task” approach better models how most applications use multiple cores.
The "separate task" approach used in Geekbench 5 parallelizes workloads by treating each thread as separate. Each thread processes a separate independent task. This approach scales well as there is very little thread-to-thread communication, and the available work scales with the number of threads. For example, a four-core system will have four copies, while a 64-core system will have 64 copies.
The "shared task" approach parallelizes workloads by having each thread processes part of a larger shared task. Given the increased inter-thread communication required to coordinate the work between threads, this approach may not scale as well as the "separate task" approach.
Geekbench 6 uses a “shared task” model for multi-threading, where each thread works on a part of a bigger task. This mimics how most applications use multiple cores.
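The "may not scale as well" caveat is essentially Amdahl's law: if some fraction of the shared task is serial coordination and merging, speedup flattens quickly as cores are added. A quick illustration, where the 5% serial fraction is an arbitrary assumption for the example, not a figure from Geekbench's documentation:

```python
# Amdahl's law: speedup of a task where fraction p parallelizes perfectly
# and the remainder (coordination, merging) stays serial.
def amdahl_speedup(p, n_threads):
    return 1.0 / ((1.0 - p) + p / n_threads)

# With 95% parallel / 5% serial work (arbitrary illustrative numbers):
for n in (2, 4, 8, 16, 64):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Even with 95% of the work parallel, 64 threads reach only about 15x, while a "separate task" benchmark with essentially no serial fraction would show close to 64x on the same machine.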