AMD "Next Horizon Event" Thread


NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
You speak as if this is a given and has already been detailed. Where does this operate in this capacity today outside of CPU socket linking?
Gen-Z is the closest to IF.
The diagram below shows a mixed PCIe Root Complex and Gen-Z Requester / Responder protocol stack inside of a SoC. This “dual-boot” application example shows how new CPU / SoC designs can be made to allow for both PCIe root complex and Gen-Z protocols sharing the same gigabit-transceivers. The SoC firmware can configure the IO interface to be a mixed PCIe Root Complex / Gen-Z complex, a dual port PCIe Root Complex, or a dual port Gen-Z complex. This flexibility allows SoC designers to leverage the Gen-Z fabric for specific applications while allowing for support of legacy PCIe components within the system.
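A rough sketch of what that firmware-side choice boils down to (the mode names and configure function below are made up for illustration; the actual mechanism would be SoC-specific straps or firmware settings):

```python
from enum import Enum

class PortMode(Enum):
    """Hypothetical per-port protocol selection for the shared gigabit-transceivers."""
    PCIE_ROOT_COMPLEX = 0
    GENZ_REQUESTER_RESPONDER = 1

def configure_io_complex(port0: PortMode, port1: PortMode) -> dict:
    """Sketch of the three SoC configurations described above: mixed PCIe/Gen-Z,
    dual-port PCIe, or dual-port Gen-Z. A real SoC would write straps/fuses or
    firmware settings here instead of returning a dict."""
    return {"port0": port0.name, "port1": port1.name}

# The three configurations from the Gen-Z "dual-boot" example:
mixed     = configure_io_complex(PortMode.PCIE_ROOT_COMPLEX, PortMode.GENZ_REQUESTER_RESPONDER)
dual_pcie = configure_io_complex(PortMode.PCIE_ROOT_COMPLEX, PortMode.PCIE_ROOT_COMPLEX)
dual_genz = configure_io_complex(PortMode.GENZ_REQUESTER_RESPONDER, PortMode.GENZ_REQUESTER_RESPONDER)
```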


Then, you need to look at CCIX:
As noted earlier, one of the biggest attractions of CCIX is its compatibility with PCI Express, and in fact CCIX’s cache coherency protocol can be carried over any PCI Express link running 8GT/s or faster. The highest data rate specified by PCI Express 4.0 is 16GT/s, which works out to around 64GB/s of total bidirectional bandwidth on a 16-lane link, but some members of the CCIX Consortium needed even more bandwidth. They determined that by raising the transfer rate to 25GT/s, a CCIX link could approach 100GB/s under the same conditions. This led to a CCIX feature known as Extended Speed Mode (ESM). Since PCI Express is owned by a different standards body, the CCIX Consortium chose a clever mechanism to allow compatibility between ESM-capable components and PCI Express components. Two CCIX components wishing to communicate with each other proceed through a normal PCI Express link initialization process (generally a hardware autonomous process) to the highest mutually supported PCI Express speed. From that point, software running on the host system can interrogate CCIX-specific configuration registers and determine if both components are ESM-capable, and if so, identify their highest supported speeds. That software then programs other CCIX-specific registers on both components to map PCI Express link speed(s) to CCIX ESM link speed(s). From that point forward, link negotiation would be for CCIX ESM speed(s), so by forcing a link retraining, the two components could now communicate as quickly as 25GT/s.
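Rough sketch of that ESM bring-up sequence as described (plain Python with dicts standing in for devices; the real thing would be config-space reads/writes against CCIX-defined registers, and none of these field names come from the spec):

```python
def bring_up_ccix_link(dev_a: dict, dev_b: dict) -> str:
    """Sketch of the CCIX Extended Speed Mode (ESM) negotiation described above."""
    # 1. Normal, hardware-autonomous PCIe link training to the highest
    #    mutually supported PCIe speed (8 GT/s or faster for CCIX).
    pcie_speed = min(dev_a["max_pcie_gts"], dev_b["max_pcie_gts"])

    # 2. Host software interrogates CCIX-specific capability registers on both ends.
    if not (dev_a["esm_capable"] and dev_b["esm_capable"]):
        return f"PCIe {pcie_speed} GT/s"       # stay at the plain PCIe rate

    # 3. Determine the highest mutually supported ESM rate (up to 25 GT/s)...
    esm_speed = min(dev_a["max_esm_gts"], dev_b["max_esm_gts"])

    # 4. ...program both components to map PCIe speed encodings to ESM speeds,
    #    then force a retrain; negotiation now lands on the ESM rate.
    for dev in (dev_a, dev_b):
        dev["speed_map"] = {pcie_speed: esm_speed}
    return f"CCIX ESM {esm_speed} GT/s"

# Example: two ESM-capable endpoints end up at 25 GT/s after retraining.
a = {"max_pcie_gts": 16, "esm_capable": True, "max_esm_gts": 25}
b = {"max_pcie_gts": 16, "esm_capable": True, "max_esm_gts": 25}
print(bring_up_ccix_link(a, b))   # -> "CCIX ESM 25 GT/s"
```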

The "no bridges/switches" part comes from ringing, like HTX (HyperTransport's equivalent of PCIe).
 
Last edited:

beginner99

Diamond Member
Jun 2, 2009
5,219
1,591
136


If only we could zoom and enhance...

Well, all the suggested layouts were wrong. In fact, 4 of the chiplets are further away. Got to wonder if that has a tiny performance impact, or a power consumption impact due to the longer traces?

To be fair, most of those experimenting with layouts did say that fitting 8 chiplets right next to the IO die would be hard or impossible due to size, and they were right.

And there's definitely no interposer, assuming this isn't a mock-up.
 

exquisitechar

Senior member
Apr 18, 2017
664
883
136
AMD showed a C-Ray comparison of a single 64-core Rome against 2x 28-core Intel Xeons. AMD finished the render 7% faster with 14% more cores. They said Rome was a prototype system and not the highest performance (clocks) they'll achieve with the part.
C-Ray is very favorable for Zen.
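Quick back-of-the-envelope on what those demo numbers imply per core, assuming the render scales roughly linearly with core count:

```python
# Back-of-the-envelope from AMD's C-Ray demo numbers (assumes near-linear scaling)
rome_cores = 64           # 1x 64-core Rome prototype
xeon_cores = 2 * 28       # 2x 28-core Xeon = 56 cores
core_ratio = rome_cores / xeon_cores    # ~1.14, i.e. 14% more cores
speedup = 1.07                          # Rome finished the render ~7% faster

per_core_ratio = speedup / core_ratio   # implied per-core throughput, Rome vs. Xeon
print(f"{(core_ratio - 1):.0%} more cores, {(speedup - 1):.0%} faster overall, "
      f"~{per_core_ratio:.2f}x per-core throughput")
# -> 14% more cores, 7% faster overall, ~0.94x per-core throughput
#    (on a pre-production part at lower-than-final clocks)
```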
 
Mar 11, 2004
23,143
5,610
146
Seems they're leaving a lot on the table if that's the case, given that Zen 2 is PCIe 4.0 and all of the other goodies: Infinity Fabric CPU/GPU? Is this the point at which AMD will start to focus more on profits and segmentation, where the pro CPUs/GPUs start to have features clearly distinguished from the consumer line? What does this event spell for the consumer line? We have 8 cores per CCX now... Will the consumer side see a doubling too, or just a shrink? Any potential for exotic add-ons to the chiplet like HBM? A GPU complex? What will AMD do with all of the newly available space on consumer Zen 2? What does consumer Zen 2 look like?

In a way yes, but while the clock speed and memory bandwidth increases mean that Vega 20 should be able to substantially outperform Vega 64 (while consuming less power), it comes at a huge extra cost, so it'd be pointless to market it to gamers. Just look at the backlash over Nvidia's pricing. Now imagine AMD tries selling a card that offers maybe 2080 Ti performance but costs even more (possibly a LOT more, like $2K+) and doesn't have the ray-tracing capability (possibly AMD could tailor some of its other hardware for that, but it would be half-baked, even more so than Nvidia's RTX features are). It'd be a dud and pointless for the gamer market. On top of that, I think Navi should offer similar gaming performance to Vega 20 and be much less expensive to produce (especially when factoring in the HBM). I expect Navi to offer close to the CU count of Vega 64/Vega 20, but with improvements to the pipeline, and clock speeds similar to if not higher than Vega 20. Navi should make Vega 20 pointless for gamers. Because of its memory bandwidth (probably 2-3x that of Navi), Vega 20 might still be able to offer more performance, but it'd probably cost almost 10x what Navi-based cards do.

Yeah, the GPU IF link is pretty much a pro feature right now. Games just don't need that bandwidth yet, and they need it even less now that multi-GPU setups like Crossfire/SLI are practically deprecated in modern games. Although if it could target latency instead, it would probably help games. And if they do start the move to multi-GPU presented as a single monolithic GPU, or even just a GPU per eye for VR/AR, that might change, but PCIe 3.0 should be enough for that for now, and they might have PCIe 4 support. And then PCIe 5 is not far off (2020, maybe 2021 for AMD to have it in consumer stuff).

I definitely see AMD doing GPU chiplets, especially for APUs (they'll still have a small singular die, but it'll lag behind and be aimed at low-cost systems). We'll see on stuff like HBM since it's still pretty expensive; it's a possibility in the future. But I think the chiplet design is more about higher-end markets where their customers want more specialized hardware and the ability to have things tailored to their needs. It has benefits lower down, but this is more about offering the big players like Facebook, Google, Microsoft, Amazon and others the ability to add their own custom stuff (like Google's Tensor Processing Unit), or to mix and match based on the use case.
 
Reactions: ub4ty

ub4ty

Senior member
Jun 21, 2017
749
898
96
Gen-Z is the closest to IF.


Then, you need to look at CCIX:


The "no bridges/switches" part comes from ringing, like HTX (HyperTransport's equivalent of PCIe).

My dude dropping pure knowledge. Thank you so very much for the detailing.
Appreciate it very much. Things are spicy in the world of communication architectures. I wasn't aware there was this much flexibility down under!
 
Reactions: darkswordsman17

DrMrLordX

Lifer
Apr 27, 2000
21,768
11,088
136
A wide(r) core, finally

Totally unexpected. I didn't think AMD would bother given their current portfolio and (apparent) performance targets. Instead they're taking a serious swing at 256-bit SIMD. I like it. Not sure if it'll help their bottom line much, but I like it!

The next big question I have is how much cache that IO die has. To host all that IO, it needs a large perimeter, so a lot of cache would fit inside, even at 14nm.

Best guess I've seen is 512MB L4 cache. That sucker is huge.

Wow, AMD un-integrated the northbridge.

Interesting, but not altogether bad. If you boot up Sandra and look at some of the old memory latency scores for pre-QPI Intel systems that still ran on a quad-pumped FSB, you'll see that (in terms of absolute latency) the numbers aren't all that bad. AMD has learned that heat and other factors are making it impossible to cram everything on one die. So faster interconnects and chiplet designs are the way to go - for now.

I think it will still be 16 MB L3 cache within each chiplet with a larger L4 cache within the IO die. Someone mentioned that the IO die will need to have a large perimeter to interface with each of the chiplets, even more so if there are 8 of them. The interior space within this IO die can be used for a large, shared L4 cache.

Why only 16MB L3? Why not 32?

Well, all the suggested layouts were wrong. In fact, 4 of the chiplets are further away. Got to wonder if that has a tiny performance impact, or a power consumption impact due to the longer traces?

To be fair, most of those experimenting with layouts did say that fitting 8 chiplets right next to the IO die would be hard or impossible due to size, and they were right.

And there's definitely no interposer, assuming this isn't a mock-up.

It may be that each pair of chiplets has its own IF controller, and that the pair is connected by a local crossbar. Sort of creating super-CCXs of 16 cores each.
 
Reactions: lightmanek

inf64

Diamond Member
Mar 11, 2011
3,753
4,191
136
Well sure it is possible I guess (4 CCX instead of 8 CCX). It still looks like 8 cores per chiplet to me.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
In a way yes, but while the clock speed and memory bandwidth increases mean that Vega 20 should be able to substantially outperform Vega 64 (while consuming less power), it comes at a huge extra cost, so it'd be pointless to market it to gamers. Just look at the backlash over Nvidia's pricing. Now imagine AMD tries selling a card that offers maybe 2080 Ti performance but costs even more (possibly a LOT more, like $2K+) and doesn't have the ray-tracing capability (possibly AMD could tailor some of its other hardware for that, but it would be half-baked, even more so than Nvidia's RTX features are). It'd be a dud and pointless for the gamer market. On top of that, I think Navi should offer similar gaming performance to Vega 20 and be much less expensive to produce (especially when factoring in the HBM). I expect Navi to offer close to the CU count of Vega 64/Vega 20, but with improvements to the pipeline, and clock speeds similar to if not higher than Vega 20. Navi should make Vega 20 pointless for gamers. Because of its memory bandwidth (probably 2-3x that of Navi), Vega 20 might still be able to offer more performance, but it'd probably cost almost 10x what Navi-based cards do.

Yeah, the GPU IF link is pretty much a pro feature right now. Games just don't need that bandwidth yet, and they need it even less now that multi-GPU setups like Crossfire/SLI are practically deprecated in modern games. Although if it could target latency instead, it would probably help games. And if they do start the move to multi-GPU presented as a single monolithic GPU, or even just a GPU per eye for VR/AR, that might change, but PCIe 3.0 should be enough for that for now, and they might have PCIe 4 support. And then PCIe 5 is not far off (2020, maybe 2021 for AMD to have it in consumer stuff).

I definitely see AMD doing GPU chiplets, especially for APUs (they'll still have a small singular die, but it'll lag behind and be aimed at low-cost systems). We'll see on stuff like HBM since it's still pretty expensive; it's a possibility in the future. But I think the chiplet design is more about higher-end markets where their customers want more specialized hardware and the ability to have things tailored to their needs. It has benefits lower down, but this is more about offering the big players like Facebook, Google, Microsoft, Amazon and others the ability to add their own custom stuff (like Google's Tensor Processing Unit), or to mix and match based on the use case.
Yeah, something tells me there will be a lot of exotic packages in the near future with this new product launch. I actually predict that the 14nm central I/O die + chiplet approach will be present far more than people think, as it opens the way for interesting new APU configurations/etc... and it's cheap, being 14nm... Further, it allows for far more reuse. It will be cut down from the server version, no doubt, but there's no reason not to see this even on an 8-core CCX rollout.
 
Mar 11, 2004
23,143
5,610
146
That looks like 4 chiplets to me. Two 8 core CCX?

Actually, it looks like the 4 are split in the middle, so they might be put very close together in 4 pairs, for 8 chiplets. That layout seems weird, though; you'd think they'd have all the chiplets edged against the I/O piece.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,660
136
Guys, think. Each chiplet is a separate CCX. If it is 8 cores, then from this moment on, a CCX has 8 cores.
 
Reactions: inf64

DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,101
136
Reactions: IEC and prtskg

DrMrLordX

Lifer
Apr 27, 2000
21,768
11,088
136
I actually predict that the 14nm central I/O die + chiplet approach will be present far more than people think, as it opens the way for interesting new APU configurations/etc... and it's cheap, being 14nm... Further, it allows for far more reuse. It will be cut down, no doubt, but there's no reason not to see this even on an 8-core CCX rollout.

Good point on the APU thing. AMD can now stick the iGPU on the system controller/ I/O die. If AMD is confident that they can reduce memory latency going from the old EPYC design to this one, I have every reason to believe they can do the same for desktop Ryzen CPUs and APUs. The system controller design will probably be present throughout the entire Zen2 lineup.
 
Reactions: lightmanek

HurleyBird

Platinum Member
Apr 22, 2003
2,725
1,342
136
Given the larger than expected die size of the chiplets, some possible explanations:

-The cores themselves should be a bit larger thanks to increased execution width.

-Perhaps the core transistors are a bit spaced out for frequency?

-Double L3 cache is likely. Perhaps more L1/L2 as well?

-These may be reused as consumer dies, and therefore waste die size on (in this case) unnecessary IO.

-Might still be 2x CCX, which is likely to be less space efficient than, say, a single block of cores connected via ring bus a la the 8800K.
Could be some combination of these factors.
 
Reactions: Saylick
Mar 11, 2004
23,143
5,610
146
Yeah, something tells me there will be a lot of exotic packages going into the near future with this new product launch. I actually predict that the 14nm central I/O die + chiplet will be present far more than people think as it opens up the way for interesting new APU configurations/etc... and its cheap being 14nm... Further, it allows for far more reuse. It will be cut down no doubt from the server version but there's no reason not to see this even on a 8core CCX rollout.

I have a hunch the one we're seeing is for EPYC only. We'll either see a different die for consumer (and/or Threadripper), or a substantially different I/O chip. Seems like people are expecting a very large L4 cache in that I/O chip, and I don't see them putting that in consumer stuff.

It'll definitely be interesting to see how things go.
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
Guys, think. Each chiplet is a separate CCX. If it is 8 cores, then from this moment on, a CCX has 8 cores.

So you are suggesting desktop Zen 2 will be 16 cores (2x 8-core CCX)?

I wouldn't be surprised; an 8-core 7nm Zen die would be tiny (less than half the 14/12nm Zen die size).
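Very rough napkin math on that (all of these inputs are my assumptions, not AMD figures): take Zeppelin at roughly 213mm² on 14nm, assume cores + L3 are about half of it, and assume ~2x logic density at 7nm:

```python
# Very rough chiplet-size estimate; all inputs are assumptions, not AMD figures
zeppelin_mm2 = 213.0        # 14nm Zeppelin die, approximate
core_l3_fraction = 0.5      # assume cores + L3 are ~half the die (rest is IO/uncore)
density_gain_7nm = 2.0      # assume ~2x area scaling from 14nm to 7nm for logic

chiplet_mm2 = zeppelin_mm2 * core_l3_fraction / density_gain_7nm
print(f"~{chiplet_mm2:.0f} mm^2 per 8-core chiplet")   # -> ~53 mm^2
# Even with doubled L3 and wider FP units on top of that, it stays
# well under half of Zeppelin's die size.
```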
 