That one is fake, just an edit of an already existing shot.
> How blurry the shot was should have been a dead giveaway; it doesn't have the noisy fine grain of a slipped shot taken from a partner testing an ES. What's with the yellow animal?
It's stretched like the CPU shot.
what's with the yellow animal?
this is the crap kids watch these days?
It's good.
> this is the crap kids watch these days?
No, kids started watching ~25 years ago.
> this is the crap kids watch these days?
Where's your inner child? Grow down.
> What's with the yellow animal?
My reaction to someone not knowing what the "yellow animal" is:
> Where's your inner child? Grow down.
It died about 50 years ago.
> Look at that picture in your own link. https://i0.wp.com/chipsandcheese.co...23/04/zen4_ring_vs_broadwell_drawio.png?ssl=1
I see where you are coming from and, as stated above, I am in no way saying that this is not a likely option. Just to point this out: CnC is making educated guesses in that pic as well.
And you just calculated total bandwidth; what matters is the individual bandwidth between ring stops. IFoP bandwidth is in the same league as any other ring traffic: ring link speed is 32 B * ring clock, which at 4 GHz equals 128 GB/s. Zen 4 could also use a double-link IFoP from the CCD to provide that bandwidth from a server IOD.
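To make the arithmetic explicit, here's a quick back-of-the-envelope check; the 32 B/clock link width and 4 GHz ring clock are the assumptions from the post above, not measured or official figures:

```python
# Sketch of the ring-link numbers above (assumed values, not a spec):
# 32 bytes per ring clock at an assumed 4 GHz ring clock.
BYTES_PER_CLOCK = 32
RING_CLOCK_HZ = 4e9

link_bw = BYTES_PER_CLOCK * RING_CLOCK_HZ    # bytes/s for one ring link
print(f"single ring link: {link_bw / 1e9:.0f} GB/s")     # 128 GB/s

# A hypothetical double-link IFoP from the CCD would simply double that.
print(f"dual-link IFoP:   {2 * link_bw / 1e9:.0f} GB/s") # 256 GB/s
```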
> Several notes:
> - The interesting part about AMD's ring (if it is one) is that, unlike Intel's ring bus, it's not visible on die shots but appears to be an integral part of the L3$ itself.
> - Being a victim cache, and all cores having their own slices, likely means that for writes the cores write their miss data to the L3$ more directly (without taking the ring bus? Though it is unclear how the slices work when cores are disabled, as that increases the size of the slices each remaining core has write access to).
Both Intel and AMD slice their L3 by lower address bits. So only the 1/8 of L3 accesses whose address bits match will hit the local L3 slice; the other 7/8 have to go through the interconnect to whichever slice the address maps to. As the local slice is a bit faster, Intel has put some optimization effort into it, but today they more likely want to slow that local L3 access down to match the remote slices, to prevent possible yet-undiscovered side channels.
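As a rough illustration of that slicing argument, here's a toy model; the 8-way split on the low address bits just above a 64-byte line offset is an assumption for illustration, not either vendor's actual hash:

```python
# Toy model of address-sliced L3 (illustrative, not AMD's or Intel's real
# hash): with 8 slices selected by the low address bits just above the
# 64-byte line offset, a given core's "local" slice serves only ~1/8 of
# uniformly distributed accesses.
import random

NUM_SLICES = 8
LINE_BITS = 6                       # 64-byte cache lines

def slice_of(addr: int) -> int:
    # Pick a slice from the low address bits above the line offset.
    return (addr >> LINE_BITS) & (NUM_SLICES - 1)

core_id = 3                         # pretend this core owns slice 3
accesses = [random.randrange(1 << 32) for _ in range(100_000)]
local = sum(slice_of(a) == core_id for a in accesses)
print(f"local-slice share: {local / len(accesses):.1%}")  # ~12.5% == 1/8
```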
> Just to point this out: CnC is making educated guesses in that pic as well.
There's only need for one IFoP stop on the ring, because one stop can also saturate a dual-link IFoP. And with a one-link connection it's possible to feed the whole IFoP bandwidth to one core, which surely has been a design point (anyone benchmarked that yet?).
Making the IFoP part of the ring surely makes the topology and therefore the whole layout simpler.
But it also has some disadvantages:
- One more stop on the ring (or even two because of the wide mode), increasing the average hop count (see the sketch after this list).
- Congestion of the ring. From a topology PoV, the IOD traffic sits below the L3 traffic, because data is loaded from memory only on a cache miss. So it does not seem beneficial to have both kinds of traffic on the same level. But as can be seen, that has been done before.
- Non-uniform latency to memory (almost negligible).
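To put a number on the "more stops means more hops" point, here's a toy model (my own, not a vendor figure) of the average hop count on a bidirectional ring, comparing 8 core stops against 8 cores plus one or two extra IFoP stops:

```python
# Toy model: average hop count between distinct stops on a bidirectional
# ring, where traffic always takes the shorter direction around.
def avg_hops(n_stops: int) -> float:
    total, pairs = 0, 0
    for src in range(n_stops):
        for dst in range(n_stops):
            if src != dst:
                d = abs(src - dst)
                total += min(d, n_stops - d)   # shorter way around
                pairs += 1
    return total / pairs

for n in (8, 9, 10):   # 8 cores vs. 8 cores + one or two IFoP stops
    print(f"{n} stops: average {avg_hops(n):.2f} hops")
# 8 stops: 2.29 hops, 9 stops: 2.50 hops, 10 stops: 2.78 hops
```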
> And with a one-link connection it's possible to feed the whole IFoP bandwidth to one core, which surely has been a design point (anyone benchmarked that yet?).
Yep, one core can almost saturate the whole IFoP link; that was tested by C'n'C IIRC. So yep, good point 😃
And there's absolutely no reason to have two simultaneous interconnects to the cores and L3 slices, because those aren't dual-ported. So if a slice/core port isn't being used for an L3 access, it's free for IFoP access. The ring can be saturated by other cores' traffic, but from a single core's perspective one interconnect is all that's needed. As the main traffic is L3, separating IOD accesses from L3 traffic won't speed things up at all. And on memory latency uniformity: a memory request is made only after it has missed L3, so there's nothing to gain there; the ring has to be accessed anyway.
Aside from the slice owned by the core, the whole of the L3$ is only available to read accesses. Accesses go through the L3$ tags, which also cover the L2$ of all cores. Only if that's a miss is RAM accessed. So it would make sense to split the paths there: if it's a hit, use the ring bus to get the data; if it's a miss, forward the request to the IOD/IMC. Those L3$ (and L2$ shadow) tags themselves are not centralized, though, but interwoven with the L3$.
> Aside from the slice owned by the core, the whole of the L3$ is only available to read accesses.
That's wrong. L3 is sliced by lower address bits, so every core writes its L2-evicted lines to the slice that holds those addresses. That way the whole L3 is available to every core. A read-only cache memory would be pretty much useless.
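Putting the two posts together, here's a minimal sketch of the request flow they seem to agree on: an L2 miss is looked up in the address-selected L3 slice, and only on a miss there does the request go out to the IOD/IMC. All names and the slice-selection hash are illustrative assumptions, not AMD's implementation:

```python
# Minimal sketch of L2-miss handling in an address-sliced victim L3
# (illustrative structure, not AMD's actual design).
NUM_SLICES = 8
LINE_BITS = 6

def handle_l2_miss(addr: int, l3_slices: list[set[int]]) -> str:
    line = addr >> LINE_BITS
    target = line & (NUM_SLICES - 1)        # slice chosen by address bits
    if line in l3_slices[target]:
        return f"hit in L3 slice {target}: data returned over the ring"
    return f"miss in L3 slice {target}: request forwarded to IOD/IMC"

def evict_from_l2(addr: int, l3_slices: list[set[int]]) -> None:
    # Victim-cache fill: the evicted line goes to the slice its address
    # maps to, regardless of which core evicted it.
    line = addr >> LINE_BITS
    l3_slices[line & (NUM_SLICES - 1)].add(line)

slices = [set() for _ in range(NUM_SLICES)]
evict_from_l2(0x1240, slices)               # a core writes back a victim line
print(handle_l2_miss(0x1240, slices))       # hit in L3 slice 1
print(handle_l2_miss(0x9900, slices))       # miss: goes to the IOD/IMC
```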
> Mark Papermaster finally confirmed Hybrid for Zen. I guess PHX2 is immediately incoming.
Didn't read like a confirmation to me. (And PHX2 was not mentioned?)
AMD to Make Hybrid CPUs, Also Using AI for Chip Design: CTO Papermaster at ITF World
"More cores, with a new twist." (www.tomshardware.com)
Because it's really now where one size doesn't fit all; we're not even remotely close to that. You're going to have a set of applications that actually are just fine with today's core count configurations because certain software and applications are not rapidly changing. But what you're going to see is that you might need, in some cases, static CPU core counts, but additional acceleration.
But what you'll also see is more variations of the cores themselves, you'll see high-performance cores mixed with power-efficient cores mixed with acceleration. So where, Paul, we're moving to now is not just variations in core density, but variations in the type of core, and how you configure the cores. It's not only how you've optimized for either performance or energy efficiency, but stacked cache for applications that can take advantage of it, and accelerators that you put around it.
> Any ideas what Ryzen AI will bring to the table other than what was outlined today in a press release? In the future it seems AMD will bring it to more clients than their initial planning allows for. I await another "what about Excel" post from igor.
All I'm reading seems to be specialized cases that were probably planned several years ago, not really addressing the big AI boom of the last few months. I suspect Intel and AMD are scrambling to add AI features to their CPU lines right now. Well, by "right now" I mean they probably spotted the trend long before us, but it still takes many years until it's in a mass-market CPU. I keep thinking back to that interview Ian did with Mike Clark, where he hinted at them working on Z8 in 2021, which is kind of depressing with regard to adding new features.
> I await another "what about Excel" post from igor.
Wake me up when Excel AI sees the user doing something repeatedly, asks whether they would like a macro created for the repetitive task, and then creates the macro if the user agrees.