Speculation: Ryzen 4000 series/Zen 3

mikk · Jul 16, 2019

Abwx said:
Using Cinebench as a metric, Zen 2 has 9%/14% higher IPC than CFL in ST/MT, but they would still need a 15% uplift to match a CPU that is stated (by Intel...) to be 18% faster per cycle than CFL (with RAM bandwidth and AVX512 accounted for in this weird IPC definition)?

If I understand correctly, AMD must be at least 10% ahead to be on par with Intel...

Zen 2 and Skylake are on par with IPC in applications: https://www.overclock.net/forum/10-amd-cpus/1728758-strictly-technical-matisse-not-really.html

As for gaming, Skylake might have the upper hand if both use the same RAM.

tamz_msc · Jul 16, 2019

Abwx said:
It's not just in CB but also in any other renderer, except of course Corona, since it uses an Intel-designed and -compiled renderer (Embree), as well as in 7-Zip, which is representative of integer performance.

As for GBench, what exactly is its relevance?

Better than professional software?

But even then, did you notice that Zen 2 is 5% faster (clock for clock) in ST and 16% in MT, precisely in Geekbench 4?

https://browser.geekbench.com/v4/cpu/compare/13910644?baseline=13911572

Geekbench is relevant because Ice Lake results are only available on Geekbench. You need to look at int and fp results separately in Geekbench. That gives roughly the same IPC for Zen 2 and Coffee Lake.

Abwx · Jul 16, 2019

mikk said:
Zen 2 and Skylake are on par with IPC in applications: https://www.overclock.net/forum/10-amd-cpus/1728758-strictly-technical-matisse-not-really.html

As for gaming, Skylake might have the upper hand if both use the same RAM.

Lol, The Stilt is not credible; if he were, he wouldn't have deserted the AT forum. I guess he preemptively escaped some justified criticism before Zen 2 was released...

Embree, an Intel-designed and -compiled renderer; Caselab's Euler 3D, which explicitly uses a CPU dispatcher favouring Intel, as stated by Caselab themselves...

Actually, he selected the benches to get the results he wanted, like Variable Density fluid, which shows Intel 16% ahead, or worse, Perlin Noise, where the difference is roughly 40% in ST. With such skewed benches one could surely make an Atom faster than Zen 1 or 2...

tamz_msc said:
Geekbench is relevant because Ice Lake results are only available on Geekbench. You need to look at int and fp results separately in Geekbench. That gives roughly the same IPC for Zen 2 and Coffee Lake.

So 5% better in ST and 16% better in MT (in GB) is the same thing? I was wrong: a 10% advantage is not enough for AMD to be ahead, it must be something like 20-25%...

Edit :

Why would the total score favour AMD while the subscores are still even?

How is this possible? Surely not due to RAM bandwidth, since Zen's much higher latency more than compensates for any bandwidth advantage.

tamz_msc · Jul 16, 2019

Abwx said:
So 5% better in ST and 16% better in MT (in GB) is the same thing? I was wrong: a 10% advantage is not enough for AMD to be ahead, it must be something like 20-25%...

Edit :

Why would the total score favour AMD while the subscores are still even?

How is this possible? Surely not due to RAM bandwidth, since Zen's much higher latency more than compensates for any bandwidth advantage.

Geekbench has four components: crypto, int, fp, memory. The composite score is a weighted average of these four. Go back to your link, look at the overall int scores, divide them by the actual frequency and you have int scores per GHz. Coffee Lake and Zen 2 are equal in this regard.
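To see how a composite can favour one chip while the int and fp sections are even, here is a minimal sketch of a weighted-average composite. The section weights and all scores are hypothetical values chosen for illustration, not Geekbench's published weights or results from the linked comparison, and the arithmetic weighted mean is an assumption based on the description above.

```python
# Hypothetical section scores, already divided by clock (points per GHz).
# ASSUMED weights for illustration; not taken from the thread.
WEIGHTS = {"crypto": 0.05, "int": 0.45, "fp": 0.30, "memory": 0.20}


def composite(sections, weights=WEIGHTS):
    """Weighted arithmetic mean of the section scores."""
    total_weight = sum(weights.values())
    return sum(weights[name] * sections[name] for name in weights) / total_weight


# Equal int and fp per GHz; only the crypto section differs.
chip_a = {"crypto": 1000.0, "int": 1000.0, "fp": 1000.0, "memory": 1000.0}
chip_b = {"crypto": 1400.0, "int": 1000.0, "fp": 1000.0, "memory": 1000.0}

print(f"A: {composite(chip_a):.1f}  B: {composite(chip_b):.1f}")
```

The operands being summed are not all equal (crypto differs), so the totals differ even though int and fp match.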

tamz_msc · Jul 16, 2019

Abwx said:
Lol, The Stilt is not credible; if he were, he wouldn't have deserted the AT forum. I guess he preemptively escaped some justified criticism before Zen 2 was released...

Embree, an Intel-designed and -compiled renderer; Caselab's Euler 3D, which explicitly uses a CPU dispatcher favouring Intel, as stated by Caselab themselves...

Actually, he selected the benches to get the results he wanted, like Variable Density fluid, which shows Intel 16% ahead, or worse, Perlin Noise, where the difference is roughly 40% in ST. With such skewed benches one could surely make an Atom faster than Zen 1 or 2...

Blah blah, Intel-biased benchmarks are still credible benchmarks. Or do you mean to say that people don't use Embree or variable density fluid sim or Linpack?

maddie · Jul 16, 2019

tamz_msc said:
Chiplet-to-chiplet communication is through the IO die, so it's going to stay the same regardless of whether inter-core communication is through a crossbar (which it is at present) or a ring bus.

The SMT yield is higher, which means that the front end is still not efficiently feeding the execution units. Thus there is room for improvement as far as the front-end is concerned.

So what is feeding the 2nd thread in SMT? Isn't it the same front end? IPC has increased, and on top of that SMT yield has also increased. This equates to an improved front end compared with the previous generation.

I think your reasoning is flawed, but yes there will always be need for further improvement.

Abwx · Jul 16, 2019

tamz_msc said:
Blah blah, Intel-biased benchmarks are still credible benchmarks. Or do you mean to say that people don't use Embree or variable density fluid sim or Linpack?

Saying "blah blah" is a way of saying that you have no answer and no valid argument...

Compare Corona's results to other renderers: it's the only one giving an advantage to Intel. As for Euler 3D, they state that they know the ICC compiler gives Intel an advantage, but they chose not to change it.

Anyway, if you consider the following result unbiased, then I don't know what to add to this masquerade of a technical debate:

Wonderful ST perf for CFL in Perlin Noise, courtesy of The Stilt...

tamz_msc · Jul 16, 2019

maddie said:
So what is feeding the 2nd thread in SMT? Isn't it the same front end? IPC has increased, and on top of that SMT yield has also increased. This equates to an improved front end compared with the previous generation.

I think your reasoning is flawed, but yes there will always be need for further improvement.

IPC has increased due to a better front-end. True. But at the same time SMT yield has also increased. If all else were equal except the front-end, SMT yield should have gone down, because the front-end now better feeds the execution units. Under such circumstances, a single thread would already occupy more of the execution resources, leaving fewer idle resources for the second thread, so SMT yield would be lower.

But in reality, SMT yield has gone up, which means that the front-end is still a limiting factor when it comes to extracting ILP.
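The argument can be illustrated with a deliberately crude toy model (an assumption, not a description of Zen 2): each thread on its own sustains some IPC limited by its front end, and two threads together are capped by a shared execution width. The width of 6 ops/cycle is a hypothetical figure.

```python
def smt_yield(per_thread_ipc, width):
    """Toy model: fractional throughput gain from running two threads instead of one.

    One thread alone sustains `per_thread_ipc` (its front-end/ILP limit);
    two threads together cannot exceed the shared execution `width`.
    """
    one_thread = min(per_thread_ipc, width)
    two_threads = min(2 * per_thread_ipc, width)
    return two_threads / one_thread - 1.0


WIDTH = 6.0  # hypothetical sustainable ops/cycle of the shared back end
for ipc in (2.0, 3.0, 4.0, 5.0):
    print(f"per-thread IPC {ipc}: SMT yield {smt_yield(ipc, WIDTH):.0%}")
```

In this toy, raising per-thread IPC with a fixed width can only lower SMT yield; IPC and SMT yield rising together requires the shared width to grow as well. Real cores also lose throughput to cache and branch-predictor contention, which the model ignores.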

Abwx · Jul 16, 2019

tamz_msc said:
Geekbench has four components: crypto, int, fp, memory. The composite score is a weighted average of these four. Go back to your link, look at the overall int scores, divide them by the actual frequency and you have int scores per GHz. Coffee Lake and Zen 2 are equal in this regard.

The 5% and 16% I stated are differences in IPC; you know what that means?
Perhaps I took the frequency into account, otherwise I wouldn't have stated "IPC". But while we are at it: you are stating that the subscores indicate equal IPC even though the average is not even IPC-wise. Could you explain to me the math by which two sums of strictly equal operands yield two different totals?

tamz_msc · Jul 16, 2019

Abwx said:
Saying "blah blah" is a way of saying that you have no answer and no valid argument...

Compare Corona's results to other renderers: it's the only one giving an advantage to Intel. As for Euler 3D, they state that they know the ICC compiler gives Intel an advantage, but they chose not to change it.

Anyway, if you consider the following result unbiased, then I don't know what to add to this masquerade of a technical debate:

Wonderful ST perf for CFL in Perlin Noise, courtesy of The Stilt...

So two benchmarks which favour Intel and suddenly it must be biased?

Abwx said:
The 5% and 16% I stated are differences in IPC; you know what that means?
Perhaps I took the frequency into account, otherwise I wouldn't have stated "IPC". But while we are at it: you are stating that the subscores indicate equal IPC even though the average is not even IPC-wise. Could you explain to me the math by which two sums of strictly equal operands yield two different totals?

Why would I look at crypto and memory scores in order to understand core IPC changes? Did you

Look at the overall int and fp scores separately?
Append .gb4 to the URL to get the actual frequency?
Divide the scores by this number to get the score per GHz?

If you did, you would come to the conclusion that Coffee Lake and Zen 2 have roughly the same IPC.
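The three steps above amount to a simple clock normalisation. A minimal sketch, with made-up scores and clocks standing in for the values you would read from the result pages:

```python
def per_ghz(score, freq_ghz):
    """Score per GHz: the benchmark score divided by the clock it actually ran at."""
    return score / freq_ghz


def per_clock_ratio(score_a, freq_a, score_b, freq_b):
    """Per-clock throughput of chip A relative to chip B (1.0 means equal IPC)."""
    return per_ghz(score_a, freq_a) / per_ghz(score_b, freq_b)


# Hypothetical integer-section scores and clocks (GHz), for illustration only.
ratio = per_clock_ratio(4400.0, 4.4, 5000.0, 5.0)
print(f"per-clock ratio: {ratio:.3f}")
```

Here chip B scores about 14% higher in absolute terms, yet the two are even per clock because B also runs about 14% faster.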

Atari2600 · Jul 16, 2019

ApTeM said:
The usual questions spring to mind

1. 400 & 500 yes, 300 maybe - but the mobo manufacturers would probably like to not support 300.
2. Probably <10%. Would be very surprised if they were able to do another Zen2 jump.
3. Probably a slight increase over what is available now, but maybe not over what is available from refined Zen2 CPUs.
4. 7nm+.
5. Same envelope as now to fit with AM4.
6. Nope. AVX512 is too niche to justify its existence.
7. Hoping they begin to bring the building blocks together. [cornerstones being the HBCC on Vega & IO chiplet on Zen2]

But folks should be under no illusion that Ryzen is almost an afterthought to EPYC. If they have a compromise to make, it will be made in favour of the server variants.

thigobr · Jul 16, 2019

I think IPC uplift will be small, akin to what happened to Zen+. I am expecting better clock scaling though.

Flayed · Jul 16, 2019

I'm guessing 3 - 5% IPC and +200 MHz frequency

turtile · Jul 16, 2019

It will support previous chipsets but not sure if it will be all of them.
5-12% (this won't be like Zen+ because it's more than just a shrink)
7nm+
Same
Probably 2 x 256 like Zen 1
January-March

The bigger question might be whether AMD should just use an improved 7nm from TSMC for desktop. 7nm+ will cost more and will likely have little to no clock speed boost.

Abwx · Jul 16, 2019

tamz_msc said:
So two benchmarks which favour Intel and suddenly it must be biased?

Why would I look at crypto and memory scores in order to understand core IPC changes? Did you

Look at the overall int and fp scores separately?

Append .gb4 to the URL to get the actual frequency?

Divide the scores by this number to get the score per GHz?

If you did, you would come to the conclusion that Coffee Lake and Zen 2 have roughly the same IPC.

At least 4: the two I quoted and the two I linked. Dunno about the rest, but apart from 7-Zip and CB they are obscure benches. I see no SPEC, no WebXPRT (an Intel bench, but not "good enough" for the purpose); anything where AMD does well is removed and Blender/CB are used as a cover...

Now if you do the maths, a "bench" like Perlin Noise with a 40% difference will produce a 4% advantage if used in a total of 10 benches where 9 yield an even score. But of course that doesn't matter when it's biased in favour of Intel; we should believe there can be 40% in this one and nowhere else...
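The 4% figure checks out for a plain arithmetic mean. Many reviews instead use a geometric mean, which damps a single outlier slightly. A quick check, assuming the ratios are expressed as Intel/AMD relative performance:

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp of the mean log, i.e. the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Relative performance (Intel / AMD) in ten benches: nine even, one 40% outlier.
ratios = [1.0] * 9 + [1.40]

print(f"arithmetic mean: {arithmetic_mean(ratios) - 1:.1%}")  # 4.0%
print(f"geometric mean:  {geometric_mean(ratios) - 1:.1%}")   # 3.4%
```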

As for the AES results, granted, they are not indicative of general performance, but the difference between the two uarchs is not that big, not enough to skew the results in favour of AMD. I guess they eventually lose more (on average) to latency than they gain from AES.

Besides, when Intel states that Sunny Cove is that much better than CFL (by 18%, same as Zen 1 to Zen 2, lol), that still includes AES, AVX512, RAM bandwidth and new instructions targeting some apps, all numbers with no more relevance than AES. I guess that although it will perform better than the previous gen, it won't save some people from a rude awakening if AMD manages to grab about 8% with Zen 3 (more or less what they stated as their target).
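Because per-clock gains compound, the uplift Zen 3 would need to match an 18% Sunny Cove gain over Coffee Lake is a ratio, not a subtraction. A small sketch, taking Coffee Lake as the common baseline and plugging in a Zen 2 lead of 0% plus the 6.2% (ST) and 13.5% (MT) figures Abwx derives from ComputerBase elsewhere in the thread. Whether Intel's 18% is even comparable to these numbers is exactly what is in dispute, so treat the output as arithmetic only.

```python
def required_uplift(target_gain, current_lead):
    """Per-clock uplift needed to match a rival that is `target_gain` faster than a
    common baseline, when we are already `current_lead` faster than that baseline.
    Gains compound multiplicatively, so this is a ratio, not a difference."""
    return (1 + target_gain) / (1 + current_lead) - 1


# Baseline: Coffee Lake. Sunny Cove figure quoted in the thread: +18% per clock.
for lead in (0.0, 0.062, 0.135):
    print(f"Zen 2 lead {lead:.1%}: Zen 3 needs {required_uplift(0.18, lead):.1%}")
```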

Quote from the AT article:

"AMD stated that they wanted Zen+ and future products to go above and beyond the ‘industry standard’ of a 7-8% performance gain each year. "

https://www.anandtech.com/show/12625/amd-second-generation-ryzen-7-2700x-2700-ryzen-5-2600x-2600/4

mikk · Jul 16, 2019

Abwx said:
Lol, The Stilt is not credible; if he were, he wouldn't have deserted the AT forum. I guess he preemptively escaped some justified criticism before Zen 2 was released...

He is more credible than other reviewers. No other page did such a comprehensive IPC comparison with so many different applications and workloads at the same clock; the one from The Stilt is by far the best available. You are making IPC statements based just on Cinebench and Geekbench, which is noobish in comparison.

moinmoin · Jul 16, 2019

Looks like the topic is off to a good start.

Personally I'm most interested in what low hanging fruits there are to pick for Zen 3.

Considering how wide Zen 2 already is, I'd expect SMT4 for Zen 3. Maybe an SVE-compatible extension to enable combining 2x 256 bit to 512 bit, expanding on that approach instead of going down the AVX512 rabbit hole. Maybe Zen 3 will be a return of heterogeneous computing, making APUs and other customizations possible through a more flexible IOD.

Thunder 57 said:
I think AMD is too invested in the CCX design to scrap it and I don't think they could in a year's time anyway.

AMD has several teams working on different Zen gens concurrently. Zen 4 and Zen 5 have already been announced to be in work, Zen 3 isn't being done in a year's time.

Atari2600 said:
But folks should be under no illusion that Ryzen is almost an afterthought to EPYC. If they have a compromise to make, it will be made in favour of the server variants.

This definitely.

turtile said:
The bigger question might be whether AMD should just use an improved 7nm from TSMC for desktop. 7nm+ will cost more and will likely have little to no clock speed boost.

7nm+ will actually cost less per wafer, and it's arguably better to transition to EUV step by step than to stick with the more costly multi-patterning.

Abwx · Jul 16, 2019

mikk said:
He is more credible than other reviewers. No other page did such a comprehensive IPC comparison with so many different applications and workloads at the same clock; the one from The Stilt is by far the best available. You are making IPC statements based just on Cinebench and Geekbench, which is noobish in comparison.

There's the SPEC comparison at AT, which shows Zen 2 being faster than CFL. He tested at their rated frequencies and normalised for frequency, though I dunno if he accounted for the fact that his Ryzen setup wasn't boosting to the proper frequency in his first tests.

As for The Stilt, what credibility can be given to a guy who uses an obscure bench that shows 40% better IPC for CFL? Seriously?
He manages to put the uarchs at even scores thanks to such tricks. I find it curious that ComputerBase's results are in about the same range as the ones I stated, despite their reviewer, a known Intel fan, adding Corona for the first time as a means to somewhat help his beloved brand. At some point, using more of those flawed benches will be counterproductive...

https://www.computerbase.de/2019-07...performancerating-fuer-anwendungen-multi-core

In the multi-core test he has the 9900K 1% above the 3700X. Granted, Cinebench is used twice, but CB R20 is more favourable to Intel than CB R15 was, as acknowledged by their numbers...

In the single-core score the 9900K is 7% ahead. All in all, once clocks are accounted for, this indicates 13.5% better IPC for Zen 2 in MT and 6.2% in ST, curiously the same numbers as Geekbench. Isn't it surprising that everyone agrees on these numbers except The Stilt? I wonder if he's on someone's payroll...
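The normalisation behind figures like these divides the clock ratio by the performance ratio. The post does not say which clocks were used; the values below (4.7/4.1 GHz all-core and 5.0/4.4 GHz single-core for the 9900K/3700X) are assumptions that happen to reproduce the quoted 13.5% and 6.2%, so treat them as illustrative rather than measured.

```python
def amd_ipc_advantage(intel_perf_lead, intel_clock, amd_clock):
    """Per-clock advantage of the AMD part, given Intel's overall performance lead
    and the clocks both chips are assumed to run at."""
    clock_ratio = intel_clock / amd_clock
    return clock_ratio / (1 + intel_perf_lead) - 1


# ASSUMED clocks in GHz (not stated in the post), chosen for illustration.
mt = amd_ipc_advantage(0.01, intel_clock=4.7, amd_clock=4.1)  # all-core
st = amd_ipc_advantage(0.07, intel_clock=5.0, amd_clock=4.4)  # single-core
print(f"MT: {mt:.1%}, ST: {st:.1%}")  # MT: 13.5%, ST: 6.2%
```

Small changes to the assumed clocks move the result noticeably, which is one reason reviewers who lock clocks (as The Stilt did) can land on different figures.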

extide · Jul 16, 2019

I am going to say yes because I don't think DDR5 will come until Zen 4.
Another 10-15%
Hopefully another 300+ MHz across the board
7nm+ (I am pretty sure I have seen this confirmed in AMD slides, in fact)
Similar to current gen
I am going to say it's definitely possible -- at least perhaps if it uses 2 clock cycles and runs it 256 bits at a time (kinda like Zen1 with AVX256). Mainly I think that this will be the case because they are pushing so hard for server market share.
Probably soonish -- but Ryzen 4000 APUs will be Zen2, not Zen3.

Madcap_Magician · Jul 16, 2019

7nm+. Slight clock increases.

On-package HBM cache chip

HurleyBird · Jul 16, 2019

Zen 2 is already possibly competitive with Sunny Cove in terms of IPC (excluding AVX 512 workloads) if everything else is equal and you're just comparing the cores themselves. ~50% higher memory latency surely has a huge impact on numerous workloads.
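A first-order way to see why memory latency matters is the textbook stall model, CPI = core CPI + (misses per instruction × miss latency). Every number below is hypothetical, and the model ignores memory-level parallelism and prefetching, so it overstates the penalty; it only shows the shape of the effect.

```python
def cpi(core_cpi, misses_per_instr, miss_latency_cycles):
    """First-order stall model: cycles per instruction = core CPI + memory stall cycles.

    Assumes every miss stalls the core for its full latency (no overlap between
    misses, no prefetching), so it overstates the real penalty.
    """
    return core_cpi + misses_per_instr * miss_latency_cycles


CORE_CPI = 0.5             # hypothetical CPI if every access hit in cache
MISSES = 0.001             # hypothetical DRAM misses per instruction
LAT_FAST = 300.0           # hypothetical DRAM latency, in core cycles
LAT_SLOW = LAT_FAST * 1.5  # roughly 50% higher, as in the post above

ipc_fast = 1.0 / cpi(CORE_CPI, MISSES, LAT_FAST)
ipc_slow = 1.0 / cpi(CORE_CPI, MISSES, LAT_SLOW)
print(f"IPC lost to +50% latency: {1.0 - ipc_slow / ipc_fast:.1%}")
```

The loss scales with the miss rate, so cache-friendly workloads barely notice the latency while memory-heavy ones can lose a lot.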

The issues with Ryzen 3K are latency and halved effective L3 per CCD.

Even without any major changes to the core architecture, if you change from 2x CCX per CCD to 1x ringbus (either doubling effective L3 size or halving the number of transistors spent on L3), add an interposer, and throw on some on-package memory for good measure, I doubt the initial desktop Sunny Cove implementations will be able to do much even if Intel fixes the clock-speed issues... assuming of course that Intel doesn't also significantly evolve the non-core stuff.

Of course, it's possible that AMD doesn't do any of those things with Zen 3 even though they seem like obvious remaining low hanging fruit. It's also possible that, even though Intel seems like they're a bit behind with the entire chiplet thing, that they do adopt such strategies before AMD does. My gut says that Intel will be lagging AMD in that area by at least a year.

For Zen 3, I'm expecting AMD does at least two of the three listed non-core improvements, and also increases the core IPC and clocks a bit. So my estimates are anywhere between 10% (few non-core improvements) and 35% (all discussed non-core improvements) single threaded performance uplift over Zen 2.

Thunder 57 · Jul 16, 2019

moinmoin said:
Looks like the topic is off to a good start.

Personally I'm most interested in what low hanging fruits there are to pick for Zen 3.

Considering how wide Zen 2 already is, I'd expect SMT4 for Zen 3. Maybe an SVE-compatible extension to enable combining 2x 256 bit to 512 bit, expanding on that approach instead of going down the AVX512 rabbit hole. Maybe Zen 3 will be a return of heterogeneous computing, making APUs and other customizations possible through a more flexible IOD.

As am I. One thing that comes to mind is decode. Right now Zen can decode 4 instructions and take 8 from the uop cache but only dispatch 6. I think that number will increase. They already added an AGU which I thought was a weakness and did not expect to see that done in Zen 2. Otherwise I would have to look at the slides again.

moinmoin said:

AMD has several teams working on different Zen gens concurrently. Zen 4 and Zen 5 have already been announced to be in work, Zen 3 isn't being done in a year's time.

Oh yes I know. I didn't word that well. What I meant was that the time already spent building on Zen 2, in addition to another 12-18 months, is not enough time to just nuke the CCX concept and do something else. Besides, it's working well for them, why mess with success? (Not directed at you, but whoever my original quote was responding to).

moinmoin said:

7nm+ will actually cost less per wafer, and it's arguably better to transition to EUV step by step than to stick with the more costly multi-patterning.

This was in response to:

"The bigger question might be whether AMD should just use an improved 7nm from TSMC for desktop. 7nm+ will cost more and will likely have little to no clock speed boost."

I'm not sure where that came from, but that quote is not attributable to me. I even clicked my handle and reread my original post to make sure I wasn't losing my mind.

7nm+ will certainly help as EUV is the future, but I hope people aren't expecting much. Even TSMC isn't expecting much over 7nm. Looks like AMD will get a little bit more area to play with though. Wonder where they'll spend it. Did I mention decode?

Space Tyrant · Jul 16, 2019

Abwx said:
The 5% and 16% I stated are differences in IPC; you know what that means?
Perhaps I took the frequency into account, otherwise I wouldn't have stated "IPC". But while we are at it: you are stating that the subscores indicate equal IPC even though the average is not even IPC-wise. Could you explain to me the math by which two sums of strictly equal operands yield two different totals?

If the overall score consists of crypto, int, fp, memory, it would be double-counting memory. Crypto, int, and fp all benefit from improved memory speeds/latencies. Memory performance is inherently counted *exactly* as much as it is relevant in the other subscores. To then separately include the memory score in the overall ST and MT scores biases it toward the system with better memory scores.
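A minimal sketch of the double-counting effect, with hypothetical scores and assumed weights: system A's int and fp sections already include a small benefit from its faster memory, and adding a separate memory section widens the composite gap far beyond what the core sections alone show.

```python
# ASSUMED section weights and hypothetical per-GHz scores, for illustration only.
WEIGHTS = {"crypto": 0.05, "int": 0.45, "fp": 0.30, "memory": 0.20}

# System A's int/fp already reflect a small gain from its faster memory.
sys_a = {"crypto": 1000.0, "int": 1020.0, "fp": 1010.0, "memory": 1200.0}
sys_b = {"crypto": 1000.0, "int": 1000.0, "fp": 1000.0, "memory": 900.0}


def composite(scores, weights, exclude=()):
    """Weighted mean of the sections not listed in `exclude` (weights renormalised)."""
    kept = {name: w for name, w in weights.items() if name not in exclude}
    total = sum(kept.values())
    return sum(w * scores[name] for name, w in kept.items()) / total


with_mem = composite(sys_a, WEIGHTS) / composite(sys_b, WEIGHTS)
without_mem = (composite(sys_a, WEIGHTS, exclude=("memory",))
               / composite(sys_b, WEIGHTS, exclude=("memory",)))
print(f"A/B with memory section: {with_mem:.3f}, without: {without_mem:.3f}")
```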

IntelUser2000 · Jul 16, 2019

tamz_msc said:
But in reality, SMT yield has gone up, which means that the front-end is still a limiting factor when it comes to extracting ILP.

This will always be the case. Extracting ILP means increasing decode bandwidth and execution resources. But that just means more will be idle, because it's inherently limited by the code.

That's why SMT is a win-win. And SMT isn't just about taking advantage of idle resources. SMT also helps in cases where MLP (memory-level parallelism) exists or where the workload is bound by the memory subsystem.

amd6502 · Jul 16, 2019

NostaSeronx said:
Too bad about 7nm+ at GloFo => power-performance-area (PPA) target of 40% power reduction, 10% performance boost and 10% area compaction through standard cell library richness focusing on parasitic reduction, physical design incorporate with EUV element and cell drive strength granularity

What do you mean by this? GF cancelled 7nm finfet.

maddie said:
So what is feeding the 2nd thread in SMT? Isn't it the same front end? IPC has increased, and on top of that SMT yield has also increased. This equates to an improved front end compared with the previous generation.

SMT is symmetric. The front end and L1 have changed in Zen2, and it's much more capable despite the same old 4 decodes/cycle. They almost doubled the op cache.

Thunder 57 said:
I think Zen 3 goes wider. Right now it can do what, decode 4 as well as dispatch 2 uops for a total of 6? I could see them adding a decoder. Beyond that I'd have to look at the slides to see where some of the bottlenecks might be.

https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/8

They feed instruction code from the L2 effectively using a new unit. They have then shrunk the L1i (halved?) but doubled the op-cache size.

The op cache can also now feed up to 8 ops per cycle versus the old 6-op maximum; the link above covers the new Zen2 and this link covers Zen1: https://www.anandtech.com/show/1057...lers-micro-op-cache-memory-hierarchy-revealed

The decoder still handles only 4. It may or may not bottleneck performance; it just depends on how the op cache is performing. They did a great job, so I think it's likely rare that the 4-wide decode is an issue. Because Zen3 is likely mobile-focused, I think doubling up the decoder might not happen unless there are energy-efficiency tricks. (I remember Kaveri getting doubled-up decode, and I think it has a lot to do with why these little APUs made such good space heaters and why 8c Steamroller was cancelled.) Maybe they are designing a 6-wide decode.
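Whether the 4-wide decoder is the limit depends on how often ops come from the op cache. A toy model using the widths quoted in this thread (4-wide decode, 8 ops/cycle from the op cache, 6-wide dispatch) blends the two supply paths by op-cache hit rate; the blending is a simplifying assumption that ignores switch penalties, mispredictions and back-end stalls.

```python
def sustained_ops_per_cycle(op_cache_hit_rate,
                            decode_width=4.0,
                            op_cache_width=8.0,
                            dispatch_width=6.0):
    """Toy front-end model: average ops supplied per cycle, capped by dispatch.

    Widths are the figures quoted in the thread; weighting the two supply paths
    by hit rate is a simplification, not how the real hardware is scheduled.
    """
    supply = (op_cache_hit_rate * op_cache_width
              + (1.0 - op_cache_hit_rate) * decode_width)
    return min(supply, dispatch_width)


for hit in (0.0, 0.25, 0.5, 0.9):
    print(f"op-cache hit rate {hit:.0%}: {sustained_ops_per_cycle(hit):.1f} ops/cycle")
```

In this toy, once about half of the ops come from the op cache, dispatch rather than decode becomes the cap.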

This is total speculation and likely totally wrong; but it's my best guess.

I think they will widen the core a little more and do 4-way multithreading, so with four threads and a wider core they may need to widen the decoder. I don't think it will be SMT4, though; rather, I think they will add a "Threadrip" mode that allows a pair of opportunistic threads to run on top of SMT2 and help keep the execution units busy. This would be similar to big.LITTLE in the Acorn world. These small threads would run completely without speculation (taking turns pausing on branches), and out-of-order execution would be very limited.

For consumers, enabling Threadrip would be a benefit: a quadcore APU would have 8 strong threads and 8 "small" threads. Building a kernel with -j16 would be a big speedup over a kernel build with only SMT2 enabled. Little threads would also not be vulnerable to speculative-execution attacks. The OS can use them for itself and system processes. Browsers can be made to use them (e.g. offloading incessant and useless JavaScript threads associated with background tabs). Little threads would be most useful for high-latency and FPU-heavy code and could be useful in parallel compute datacentres.

Zen3 would be primarily mobile focused, secondarily server focused; and hopefully the first to see this core would be a quadcore sub 10W APU, followed by 2 CCX chiplets for the server and consumer markets.

As far as product lines, I think unlike 3000 gen, consumer 4000/5000 MCM's would be strictly single CPU chiplet APUs, with 8 CU Vega built into the IOX, and available for both Zen2 and Zen3 chiplets--Zen2 arriving late this year and Zen2 5000 in late H2 or early 2021. Monolithic mobile quadcore Zen3 4000 APU arriving mid 2020 and mid H2 for AM4.
