I think it was working fine in Cinebench. Anywhere else ran the risk of hitting crazy concurrency problems:
63 Cores Blocked by Seven Instructions (randomascii.wordpress.com)
But I guess it was AMD's way of preparing us all for a very parallel future: release as broken a product as possible to expose such weaknesses before launching 64C chips.
On the topic of this new chip: I don't have any problems with it, and it will perform great beyond the typical scheduling woes. Zen 5c, even with reduced clocks, is a lot of performance.
Still, let's not underestimate AMD's stupidity like @naukkis did; they surely can release lazy and stupid products.
The irony. So you are being intellectually dishonest here, going from "worked fine" to "certainly knew what they were getting into".
Did you read the article you posted? It was basically system restore and the linker fighting over a bitmap.
And we can very reasonably infer that said fighting was made worse by said Threadripper having bad cache latencies, scaling worse than it otherwise would have.
So in an era where typical L2-to-L2 latency was 10-20 ns, an abomination with 100+ ns arrived, making all lock contention worse and making things like false sharing between L2s bite much harder.
There were plenty of benchmarks where this chip fell over, lagging behind the 16C Threadripper, but I guess this is the wrong thread to point that out in.
Anyway, I'll stay with my opinion that AMD was completely stupid to unleash such a chip on the workstation market, and sorry for overestimating the signal-to-noise ratio of this forum. It has become intolerable to me.
Well, they said that Strix Point was launching "later this year". No performance preview, but that's also a mobile part.

So there goes the presentation and no word of Zen 5. Too bad, I was looking forward to that. MEH!
It worked fine for what it was. Everyone knew where it was good and where it wasn't. Hardly anything to complain about.
| Driver Name | OS | Version |
| --- | --- | --- |
| AMD Processor Power Management Support - AMD Ryzen Power Plan | Windows 10/11 (64-bit) | 8.0.0.13 |
| AMD PCI Device Driver | Windows 10/11 (64-bit) | 1.0.0.90 |
| AMD I2C Driver | Windows 10/11 (64-bit) | 1.2.0.124 |
| AMD UART Driver | Windows 10/11 (64-bit) | 1.2.0.116 |
| AMD GPIO2 Driver | Windows 10/11 (64-bit) | 2.2.0.130 |
| PT GPIO Driver | Windows 10/11 (64-bit) | 3.0.0.0 |
| AMD PSP Driver | Windows 10/11 (64-bit) | 5.25.0.0 |
| AMD IOV Driver | Windows 10/11 (64-bit) | 1.2.0.52 |
| AMD SMBUS Driver | Windows 10/11 (64-bit) | 5.12.0.38 |
| AMD AS4 ACPI Driver | Windows 11 (64-bit) | 1.2.0.46 |
| AMD SFH I2C Driver | Windows 10/11 (64-bit) | 1.0.0.86 |
| AMD SFH Driver | Windows 10/11 (64-bit) | 1.0.0.336 |
| AMD MicroPEP Driver | Windows 10/11 (64-bit) | 1.0.41.0 |
| AMD Wireless Button Driver | Windows 10/11 (64-bit) | 1.0.0.2 |
| AMD PMF-6000Series Driver | Windows 10/11 (64-bit) | 22.0.3.0 |
| AMD PPM Provisioning File Driver | Windows 10/11 (64-bit) | 8.0.0.26 |
| AMD 3D V-Cache Performance Optimizer Driver | Windows 10/11 (64-bit) | 1.0.0.7 |
| AMD AMS Mailbox Driver | Windows 10/11 (64-bit) | 3.0.0.635 |
| AMD S0i3 Filter Driver | Windows 10/11 (64-bit) | 1.0.0.17 |
| AMD CIR Driver | Windows 10 (64-bit) | 3.2.4.135 |
| AMD USB Filter Driver | Windows 11 (64-bit) | 2.1.11.304 |
| AMD USB4 CM Driver | Windows 10 (64-bit) | 1.0.0.37 |
| AMD SFH1.1 Driver | Windows 10/11 (64-bit) | 1.1.0.12 |
| AMD PMF-7040Series Driver | Windows 10/11 (64-bit) | 23.2.3.0 |
| AMD PMF-8000Series Driver | Windows 10/11 (64-bit) | 23.5.9.0 |
| AMD PMF-7736Series Driver | Windows 10 (64-bit) | N/A |
| AMD PMF-7736Series Driver | Windows 11 (64-bit) | 23.1.17.0 |
| AMD Interface Driver | Windows 10/11 (64-bit) | 2.0.0.14 |
| AMD DRTM Driver | Windows 11 (64-bit) | 1.0.16.4 |
IMHO, I am hoping for a PHX2 CCX * 2 configuration. That way it scales up from PHX2, rather than being a new paradigm.
CCX0: 2x Zen5 + 4x Zen5c
CCX1: 2x Zen5 + 4x Zen5c
Didn't think of that possibility. It's a more sensible configuration than pairing normal and c-cores in different CCXs. I wonder how today's MT applications scale - will such a configuration hurt applications with 3-4 equally loaded threads (do they even exist?), or does it do fine in most applications?
That configuration only makes sense if the 4 core frequency is within the 'c' core reach and in the 'c' core part of the curve that is more efficient than the 'p' core. However, if that is the case, then it doesn't make sense to have more than 2 'p' cores to begin with. They would take up additional die space for no added value.
"It's a more sensible configuration than pairing normal and c-cores in different CCXs."

While pairing p-cores and c-cores in different CCXs would surely introduce penalties of sorts, I fail to see how splitting the p-cores across two CCXs would be better. From my understanding, the whole point of using smaller/denser cores is to cater to MT workloads, which care more about throughput than latency, while most consumer workloads are still comparatively lightly threaded and much more latency-sensitive.
Actually, that's not how today's CPUs boost. Even with only p-cores, the cores differ in their boost frequencies: the prime core is fastest, and so on. There's probably at least a 1 GHz speed difference between P- and C-cores under a 4-thread load. The question I raised was: does it matter - or are just two P-cores sufficient if there aren't many workloads that rely on 3-4 evenly fast threads?
"With that in mind, why split P cores and wreak havoc in the speed sensitive workloads? Think about browsers or games, which can definitely make use of 4 P cores."

Because with such a split, what they'd get is a 4-core CPU, which would probably suck in games. With a 6-core hybrid CCX they basically get a 12-core CPU divided into dual CCXs - not so far from the 7900X.
P-core boost differences are typically very small, especially if all of the cores are in close proximity on the same piece of silicon. That's just max boost, though, which doesn't come into play here because you're not hitting max boost clocks past 1-2 loaded cores. The real question is: what is the 3-4 core boost frequency of the P-cores?

If the C-cores can't hit that frequency, a 2p4c+2p4c split makes no sense because you'd see a significant drop-off in performance past 2 loaded cores. If the C-cores can hit that frequency but are at the end of their frequency range, and thus less efficient there than the P-cores, the split still makes no sense because you'd be using more power for no performance improvement.

Additionally, once you move to the 2nd CCX, you are bringing in 2 P-cores that will never boost above a 5-6-core loaded frequency, which the C-cores could easily achieve, so why make them P-cores at all? You'd be using more space for no performance or efficiency gain. The proposed configuration makes no sense.
"...asymmetric CCX configuration is something AMD hasn't done and should never do - non-symmetrical configurations should be avoided for being really hard for the scheduler to optimize workloads."

It's very easy to schedule everything there, since anything with a QoS priority gets scheduled onto the bigs and everything else lives in the dense ghetto.
"Because with such a split what they got is 4-core cpu, which probably would suck [...]"

Maybe there are 8 cores in one CCX, of which half are normal and half are dense (leaving 4 more dense cores for another CCX). Still, a single CCX would be preferable for multithreaded applications with notable inter-thread communication.
"You still can split your 4-core load to 4 p-cores even if they are in different CCXs."

And get RAM-level latency (and the corresponding energy cost) for inter-thread synchronization. :-(

"Actually you got twice the L3 with split CCX."

In some ideal workloads, split caches perform almost the same as a unified cache with a size equal to the sum of the sizes of the split caches.
"That scheduling downgrades that '12-core' cpu pretty much to a 4-core cpu. That's the main point against doing such braindead splitting of cores."

No, it works, and works really well. Everything not at upper QoS priority lives in the ghetto, and the moment stuff becomes relevant it gets promoted to the premium quad.
You can still split your 4-core load across 4 p-cores even if they are in different CCXs. Actually, you get twice the L3 with a split CCX. The thing is that multithreaded shared-job workloads will almost never scale across equally strong threads - there's usually a need for one prime core to run the main thread that spawns the child threads. By splitting the p-cores across both CCXs, each CCX has equal capability to handle such jobs - and an asymmetric CCX configuration is something AMD hasn't done and should never do: non-symmetrical configurations should be avoided for being really hard for the scheduler to optimize workloads.
That scheduling downgrades that "12-core" CPU pretty much to a 4-core CPU. That's the main point against such braindead splitting of cores.
"C-cores should give something like 80% of P-cores' speed."

No, they're specifically clocked low.
"Both 2+4 CCXs should be faster than a 4 p-core CCX on multithreaded jobs."

No one cares about that stuff in a premium laptop chip. What are you on?
"Why would AMD use additional money and resources to design a 4-core CCX just for Strix Point instead of using CCXs they have already designed?"

That may sound weird, but the Zen 5 cluster interconnect is designed to be explicitly modular; it scales from 4 to 16 cores.
"That's the main point against doing such braindead splitting of cores."

Language aside, the solution you prefer would result in a permanent performance penalty (read: "more often than not"), albeit a relatively small one. The asymmetrical solution would result in no penalty for a number of workloads, and rather nasty ones when software and the scheduler have no idea what they're doing. I happen to think the second option is more likely because AMD believes the cases with optimal resource allocation will outweigh the others.