AMD “Next Horizon Event" Thread

Abwx · Nov 6, 2018

JDG1980 said:
This reflects a 20% clock speed increase at the same TDP (300W for both cards).

JDG1980 said:
The original Zen on GloFo 14nm tops out at about 4.0 to 4.1 GHz (the "12nm" process which is really a tweaked and refined 14nm can do a few hundred MHz above that). Interpreting the process gains based on what we've seen announced with Vega so far, this would indicate to me that we're probably looking at peak boost clocks of 4.8 to 5.0 GHz for consumer-focused Ryzen 3000 products. .

This cant be transposed, at 4GHz+ it wont be the same, we can expect 250-300MHz higher fequencies but 10% seems a stretch.

It s likely that AMD has anticipated that they would be lacking process wise for years, and that the only mean they have to compensate is higher density to allow more transistors and improve the IPC and throughput, hence the big pushes they planned in this matter.

We ll see how the things materialize in the DT/NBook environment, but if anything the bench they displayed point eventualy to very healthy gains in FP.

Markfw · Nov 6, 2018

moonbogg said:
Is 3 seconds a big deal? I honestly don't know.

Thats 10%, I call that significant.

piesquared · Nov 7, 2018

Not only that, but this is early silicon vs production CPUs. Expect the gap to widen even more.

DownTheSky · Nov 7, 2018

JDG1980 said:
We don't know what the shipping clocks for Zen 2 products will be, but we do have some figures released for Vega. The existing Radeon MI25 (Vega 10 @ GloFo 14nm) has a peak boost clock of 1500 MHz. The upcoming Radeon MI60 (Vega 20 @ TSMC 7nm), announced today, has a peak boost clock of 1800 MHz. This reflects a 20% clock speed increase at the same TDP (300W for both cards).

That's not counting the memory. Old card has 16GB, new one 32GB higher clocked.

TheGiant · Nov 7, 2018

so the predictions on memory latency (and thus the gaming performance) ? how come that AMD claims better latency with de-integrated memory controller ? the L4 (/wave Broadwell) ?

itsmydamnation · Nov 7, 2018

TheGiant said:
so the predictions on memory latency (and thus the gaming performance) ? how come that AMD claims better latency with de-integrated memory controller ? the L4 (/wave Broadwell) ?

They could have much improved cache/home agent/directory controller. That is one of the weaker areas in Zepplin.

Atari2600 · Nov 7, 2018

TheGiant said:
so the predictions on memory latency (and thus the gaming performance) ? how come that AMD claims better latency with de-integrated memory controller ? the L4 (/wave Broadwell) ?

As I pointed out elsewhere, the time (on Zen1+) to get data from foreign CCX to local CCX via IF is still an order of magnitude less than the time taken to get data from DRAM.

In Zen2, it is very likely that the links from CCX to IO Controller are not running at MEMCLK = lower latency, so that difference (between going direct vs. going indirect) decreases further.

They have also stated they intend to support higher memory speeds. Given this presentation was all about Rome, I can only assume their statement is in the context of EPYC - i.e. server EEC memory - so an increase from 2933 to > 3000 should see latency from IO Controller to DRAM drop. 10% improvement there would dwarf any loss due to travel from CCX to IO Controller.

arandomguy · Nov 7, 2018

Official memory for current gen Epyc is 2400/2666.

https://www.amd.com/system/files/2017-06/AMD-EPYC-Data-Sheet.pdf

Were they specific with regards to better memory latency? Better memory latency with respect to Rome vs Naples does not necessarily mean the statement will apply to other platforms.

A general thing I've noticed with respect to Rome predictions/theorizing is that it seems like many people were applying bias towards the desktop with respect to interpreting information with regards to Rome.

Now I'm not predicting this either way but I don't see why the idea that Zen 2 (at least in this format) might have been primarily targeted for a different work load then desktop/gaming should be dismissed as the possibility. Intel's Skylake and Skylake-E differentiation for example is a clear example of optimization for different work loads. Likewise don't be dismissive that desktop Zen 2 might have more divergence, APU Zen itself had some divergence for example.

Gideon · Nov 7, 2018

IMO AMD has plenty of room to improve the memory latency. Sure, chiplets will degrade it somewhat, but Ryzens current latency is already 20+ ns worse than Intel's, despite being monolithic. Whatever latency they lose from going MCM, they could certainly gain back just by improving the memory controller and decoupling the Fabric clocks from the memory-clock (or even just allow faster memory, say 4000 Mhz+). L4 might help somewhat in addition to that.

DrMrLordX · Nov 7, 2018

JDG1980 said:
this would indicate to me that we're probably looking at peak boost clocks of 4.8 to 5.0 GHz for consumer-focused Ryzen 3000 products.

I remain skeptical of that claim. Yes, we can expect higher clocks than Pinnacle Ridge. But 500-700 MHz more at the top end? Nahhh. It just doesn't seem likely. I'm sticking with max 4.5 - 4.6 GHz for Matisse.

Gideon · Nov 7, 2018

Hmm, servethehome has a few interesting dibits (and they are usually very accurate):

1. The I/O chip will handle all I/O, including PCIe. (in fact no NVLink required for any GPUs connected to a single socket)
2. Infininty-Fabric speeds greatly improved (probably uses PCIe Gen4 under the hood)
3. Rumors and Hints from AMD about increased clock speeds

From highlighted parts below (but i suggest reading the entire article):

AMD EPYC 2 Rome Details
Here is the quick summary of what we learned today about the AMD EPYC 2 “Rome” generation:

Up to eight 7nm x86 compute chiplets per socket.

Each x86 chiplet up to 8 cores

64 cores confirmed AMD EPYC Rome Details Trickle Out 64 Cores 128 Threads Per Socket

There is a 14nm I/O chip in the middle of each package

This I/O chip will handle DDR4, Infinity Fabric, PCIe and other I/O

PCIe Gen4 support providing twice the bandwidth of PCIe Gen3

Greatly improved Infinity Fabric speeds to be able to handle the new I/O chip infrastructure including memory access over Infinity Fabric

Ability to connect GPUs and do inter-GPU communication over the I/O chip and Infinity Fabric protocol so that one does not need PCIe switches or NVLink switches for chips on the same CPU. We covered the current challenges in: How Intel Xeon Changes Impacted Single Root Deep Learning Servers. This can be a game changer for GPU and FPGA accelerator systems.

Socket compatible with current-generation AMD EPYC “Naples” platforms.

Although not confirmed by AMD, we will state that most if not all systems will need a PCB re-spin to handle PCIe Gen4 signaling. So existing systems can get Rome with PCIe Gen3 but will require higher-quality PCB for PCIe Gen4.

Claimed significant IPC improvements and twice the floating point performance per core.

Incrementally improved security per core including new Spectre mitigations

This is a long list. We now have a fairly good idea about what the next-generation will offer. Cache sizes, fabric latencies, clock speeds, I/O chip performance, DDR4 speeds and other aspects have not been disclosed, so there is still a long way to go until we have a full picture. We have heard rumors of, and AMD hinted at the notion that with 7nm they would be able to get increased clock speeds as well.

exquisitechar · Nov 7, 2018

IIRC Charlie Demerjian has said on Twitter that the turbo on a few cores will be significantly higher than before for Ryzen 3xxx CPUs. I wonder about final 64c Epyc 2 clocks, the one that they benchmarked probably wasn't clocked all that high.

krumme · Nov 7, 2018

Why not just use same 8c chiplets for everything?
I mean you have current 14nm apu to cover mobile low-end and current Zen plus for initial desktop low end.
They just different 14nm io die eg. with and without gpu. If gpu is needed at all and they don't glue it on also.

TheGiant · Nov 7, 2018

exquisitechar said:
IIRC Charlie Demerjian has said on Twitter that the turbo on a few cores will be significantly higher than before for Ryzen 3xxx CPUs. I wonder about final 64c Epyc 2 clocks, the one that they benchmarked probably wasn't clocked all that high.

this is what I am looking for and missing in current zen implementations

piesquared · Nov 7, 2018

exquisitechar said:
IIRC Charlie Demerjian has said on Twitter that the turbo on a few cores will be significantly higher than before for Ryzen 3xxx CPUs. I wonder about final 64c Epyc 2 clocks, the one that they benchmarked probably wasn't clocked all that high.

Having a couple or a few highly binned chiplets for Xtreme boost is a nice option that this design gives them.

The clocks were almost certainly lower than final release. I think it was Papermaster who might have even said that this was a prototype package? Is it even possible to run benchmarks on a prototype?

Atari2600 · Nov 7, 2018

krumme said:
Why not just use same 8c chiplets for everything?
I mean you have current 14nm apu to cover mobile low-end and current Zen plus for initial desktop low end.
They just different 14nm io die eg. with and without gpu. If gpu is needed at all and they don't glue it on also.

This is exactly what I expect them to do.

(i) Mask cost for 7nm is extremely arduous.
(ii) Limited 7nm wafer capacity (given expected demand).
(iii) AMD have indicated IO does not scale so well on the smaller process and performance is not so sensitive to the smaller process.
(iv) Reduced resources (all 3) to qualify new parts.
(v) More flexibility in how they use chiplets to meet demand (depending on where market yield is greatest they can adjust package ratios).

Despoiler · Nov 7, 2018

krumme said:
Why not just use same 8c chiplets for everything?
I mean you have current 14nm apu to cover mobile low-end and current Zen plus for initial desktop low end.
They just different 14nm io die eg. with and without gpu. If gpu is needed at all and they don't glue it on also.

I was just thinking about this in combination with the 1x vs 2x CCX. If AMD keeps the same overall strategy for consumer and enterprise we would have 8 core 1x CCX as the base chip. They can bin or fuse to get less cores sure. Depending on how cheap the chiplet strategy is 8 core could possibly be the lowest core count offered. I think it gives AMD an advantage to continue ramping up core counts because basically starves Intel. Intel is stuck on process. The bigger the chips they have to pump out the less chips they can make and the less profits they can reap. The other cool thing is if 8 core is the new norm that means something bigger like 16 core has to take the place on the high end desktop. Can you imagine what devs could do with an extra 8 core chip? A new best AI mode. Run actual simulations. It would be completely new territory. Probably dreaming. Hahaha

coercitiv · Nov 7, 2018

Despoiler said:
I think it gives AMD an advantage to continue ramping up core counts because basically starves Intel. Intel is stuck on process.

Let's not forget this mainstream consumer product needs to keep cost down and also work efficiently with dual channel memory. The more chiplets they use, the bigger the size of the IO chip, and the higher the cost of the entire package.

Atari2600 · Nov 7, 2018

coercitiv said:
Let's not forget this mainstream consumer product needs to keep cost down and also work efficiently with dual channel memory. The more chiplets they use, the bigger the size of the IO chip, and the higher the cost of the entire package.

Yep, but a base 8C product does not need anything beyond dual channel memory.

HurleyBird · Nov 7, 2018

edit: nvm

AtenRa · Nov 7, 2018

moonbogg said:
Is 3 seconds a big deal? I honestly don't know.

Since those CPUs are for servers, the first thing we are interested to know is how much power each system used to finish the C-Ray benchmark and secondly, how much space each system will use because rack space is gold in server rooms.

So if EPYC 2 is 10% faster than dual XEONs but using 30-40% less power and can fit the same amount of cores at half the rack space then we are talking about a major advantage for the AMD product vs the competition.

ub4ty · Nov 7, 2018

Gideon said:
Hmm, servethehome has a few interesting dibits (and they are usually very accurate):

1. The I/O chip will handle all I/O, including PCIe. (in fact no NVLink required for any GPUs connected to a single socket)

From highlighted parts below (but i suggest reading the entire article):

Ability to connect GPUs and do inter-GPU communication over the I/O chip and Infinity Fabric protocol so that one does not need PCIe switches or NVLink switches for chips on the same CPU. We covered the current challenges in: How Intel Xeon Changes Impacted Single Root Deep Learning Servers. This can be a game changer for GPU and FPGA accelerator systems

Single root PCIE complex on such a massive core count CPU is a YUGE game changer indeed.
Currently with threadripper, you have a PCIE complex per CPU die which causes issues with various applications involving GPU<->GPU.
This problem is further impacted by added latency.

With a single I/O chip handling all of the I/O, PCI-E .. The scalability and performance is massive.
Looking forward to Zen2 Threadripper !
Glad I got the first gen and road it out to gen2.

Looks like a sell and upgrade in 2019/2020 with some retiring to servers.

Also of note is that Intel broke the single PCIE root complex paradigm with their new line.
Also of note is the encoded jab at Nvidia w.r.t to the new Radeon GPUs being able to run infinity fabric protocol over the PCIE 4.0 lanes. AMD truly went for an open and scalable approach and its beginning to payoff bigly !

Atari2600 · Nov 7, 2018

arandomguy said:
Official memory for current gen Epyc is 2400/2666.

https://www.amd.com/system/files/2017-06/AMD-EPYC-Data-Sheet.pdf

My bad - I was thinking of the Zen1+ supported speed - which of course as you say doesn't apply to Zen1 (EPYC).

coercitiv · Nov 7, 2018

Atari2600 said:
Yep, but a base 8C product does not need anything beyond dual channel memory.

If base is 8c, what's at the $300 price point?

Atari2600 · Nov 7, 2018

coercitiv said:
If base is 8c, what's at the $300 price point?

Same 8C chiplet. Right down to the basement. They might not even offer a 4 core Zen2 product (unless harvesting duds I guess).

7nm design cost is ~300m USD and cost per unit is pretty much flat going from 12nm to 7nm - so its as cheap to make the IO on 12nm as it is to incorporate it onto 7nm.

AMD had a revenue of around 1.8B USD in Q2 2018 - how much of that would have been from Ryzen3? Judging from mindfactory proportions, very little.

Basically, I don't believe the design cost will justify its existence by lowering manufacturing cost enough.

https://www.extremetech.com/computing/272096-3nm-process-node
https://www.icknowledge.com/news/Technology and Cost Trends at Advanced Nodes - Revised.pdf
https://www.pcgamesn.com/wp-content/uploads/2018/09/Mindfactory-AMD-vs-Intel-580x326.jpg

AMD “Next Horizon Event" Thread

Lifer

Moderator Emeritus, Elite Member

Golden Member

Senior member

Senior member

Platinum Member

Golden Member

Senior member

Golden Member

Lifer

Golden Member

Senior member

Diamond Member

Senior member

Golden Member

Golden Member

Golden Member

Diamond Member

Golden Member

Platinum Member

Lifer

Senior member

Golden Member

Diamond Member

Golden Member