Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
800
1,364
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Last edited:
Reactions: richardllewis_01

lightmanek

Senior member
Feb 19, 2017
390
763
136
As per the notes posted by the tester, both ES platforms had unfinished BIOSes, experienced issues and produced incomplete results. This is normal when working with early ES processors, or when the manufacturer doesn't want to show full performance and disables or downclocks certain features.

For example, an all-core clock under load of only 1.65GHz with 128 cores loaded, a wrong IF divider that slows things down, disabled prefetchers, etc...
 
Reactions: Tlh97 and Kaluan

deasd

Senior member
Dec 31, 2013
526
800
136
well...... at least the frequency is more realistic than just 2.xGHz: an ES 96C192T with 3.5GHz ST, which is the same as the 7773X and 7763. OTOH it also raises doubts about how AMD predicted ES Raphael performance using immature equipment (BIOS, etc.)......
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
well...... at least the frequency is more realistic than just 2.xGHz: an ES 96C192T with 3.5GHz ST, which is the same as the 7773X and 7763. OTOH it also raises doubts about how AMD predicted ES Raphael performance using immature equipment (BIOS, etc.)......
AMD knows exactly how Raphael and Genoa perform. What they give out to testers is another story.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Well, there is no way to tell. There is no basis for comparison, since this will be AMD's first AVX-512 implementation.
AMD should be quite good at floating point units and keeping them fed. That is a large chunk of a GPU, after all, and they have been saying significant increases in power efficiency for their gpus. I have wondered if the basic floating point unit will actually be the same for GPUs and CPUs. Do the AVX512 execution units actually need anything that a gpu unit would not?
 

eek2121

Platinum Member
Aug 2, 2005
2,934
4,033
136
AMD should be quite good at floating point units and keeping them fed. That is a large chunk of a GPU, after all, and they have been saying significant increases in power efficiency for their gpus. I have wondered if the basic floating point unit will actually be the same for GPUs and CPUs. Do the AVX512 execution units actually need anything that a gpu unit would not?


Among what I have already stated: this 100% (happy to see your input btw).

Bottom line: we know nothing about AMD’s implementation of AVX-512 specific instructions. You can’t look at an Intel implementation and compare it to AMD, especially when AMD has a rather large node advantage.
 
Reactions: Tlh97

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
AMD should be quite good at floating point units and keeping them fed. That is a large chunk of a GPU, after all, and they have been saying significant increases in power efficiency for their gpus. I have wondered if the basic floating point unit will actually be the same for GPUs and CPUs. Do the AVX512 execution units actually need anything that a gpu unit would not?
AVX is sufficiently distinct from GPU SIMD implementations that I really doubt there's any meaningful leveraging between the two. And if anything, AMD's CPU team has helped the GPU team.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
Looking at the 'leaked' benchmark results (if true), the ratio between non-AVX-512 and AVX-512 on a single thread looks very similar to Intel's.

I assume the main target was to remove this server advantage for Intel, and it looks like they hit their target.
Even if the 'leaks' aren't true, chances are AMD successfully did what was needed for AVX-512.
 

deasd

Senior member
Dec 31, 2013
526
800
136



finally some useful data.



If we compare it to a 5G 5900X, which has 1/8 the core count, with Genoa's L1 and L2 bandwidth divided by 8, then we can see Genoa's high-level cache bandwidth is still a bit better than 5G Vermeer.

I'm not an expert but I GUESS it's a good indication.


ouch, looks like the Genoa he tested in AIDA is a 400W TDP one (9664), core/thread count is still unknown.
 
Last edited:

Det0x

Golden Member
Sep 11, 2014
1,032
2,981
136

View attachment 64748

finally some useful data. Since AIDA bandwidth test is multithreaded, the 96C192T result is a monster that never existed on this planet.

View attachment 64747

If we compare it to a 5G 5900X, which has 1/8 the core count, with Genoa's L1 and L2 bandwidth divided by 8, then we can see Genoa's high-level cache bandwidth is still a bit better than 5G Vermeer.

I'm not an expert but I GUESS it's a good indication.
Instead of cores you should count CCDs.
The 96-core Zen 4 EPYC has 12 CCDs while your 5900X has 2 CCDs.

Compared to my max-tuned 5950X, which is probably running much higher clock speeds and much more optimized memory timings than this 96-core Zen 4, we see the largest increase in the L2 cache numbers..

L1 read  = 29924.5 / 6 = 4987 GB/s (+ ~200mb/sec)
L1 write = 17524.5 / 6 = 2920 GB/s
L1 copy  = 30061.4 / 6 = 5010 GB/s

L2 read  = 19985.6 / 6 = 3330 GB/s (+ ~1000mb/sec)
L2 write = 19100.7 / 6 = 3183 GB/s
L2 copy  = 18840.3 / 6 = 3140 GB/s

L3 read  = 10257.7 / 6 = 1709 GB/s (+ ~200mb/sec)
L3 write = 9693.1 / 6 = 1615 GB/s
L3 copy  = 9107.5 / 6 = 1517 GB/s
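
For anyone who wants to redo the per-CCD normalisation, here is a minimal Python sketch. It assumes, as in the post, that aggregate cache bandwidth scales linearly with CCD count, that the Genoa sample has 12 CCDs, and that the desktop part has 2; the function name `normalise_to_ccds` is made up for illustration.

```python
# Aggregate AIDA64 cache bandwidth figures quoted in the post for the
# 96-core Genoa sample, in GB/s as labelled there.
genoa_total = {
    "L1 read": 29924.5, "L1 write": 17524.5, "L1 copy": 30061.4,
    "L2 read": 19985.6, "L2 write": 19100.7, "L2 copy": 18840.3,
    "L3 read": 10257.7, "L3 write": 9693.1, "L3 copy": 9107.5,
}

GENOA_CCDS = 12    # 96 cores at 8 cores per CCD (from the post)
DESKTOP_CCDS = 2   # 5900X / 5950X use two CCDs


def normalise_to_ccds(total_gbps, source_ccds, target_ccds):
    """Scale an aggregate bandwidth linearly from source_ccds to target_ccds.

    Linear scaling is an assumption; it ignores clock and power differences.
    """
    return total_gbps * target_ccds / source_ccds


per_two_ccds = {
    name: normalise_to_ccds(value, GENOA_CCDS, DESKTOP_CCDS)
    for name, value in genoa_total.items()
}

for name, value in per_two_ccds.items():
    print(f"{name:8s} {value:7.1f} GB/s")
```

Dividing by 6 here is just 12 CCDs / 2 CCDs, which is why counting CCDs gives a different picture than dividing by the 8x core-count ratio.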
 

Last edited:

naad

Member
May 31, 2022
63
176
66

View attachment 64748

finally some useful data.



If we compare it to a 5G 5900X, which has 1/8 the core count, with Genoa's L1 and L2 bandwidth divided by 8, then we can see Genoa's high-level cache bandwidth is still a bit better than 5G Vermeer.

I'm not an expert but I GUESS it's a good indication.


ouch, looks like the Genoa he tested in AIDA is a 400W TDP one (9664), core/thread count is still unknown.




Those memory and L3 latency figures are quite amazing compared to Milan, which gets around 120ns on memory and 17ns on L3. This is with 4 more channels and DDR5.
L2 latency seems to have increased by only 2 cycles even though the capacity has doubled, which is pretty decent.
I kinda expected the memory improvement due to the new IOD and GMI3, but the L3 was already really, really fast on AMD, and they made it even faster somehow.

All of this should bode quite well for gaming perf
 

lightmanek

Senior member
Feb 19, 2017
390
763
136
Those memory and L3 latency figures are quite amazing compared to Milan, which gets around 120ns on memory and 17ns on L3. This is with 4 more channels and DDR5.
L2 latency seems to have increased by only 2 cycles even though the capacity has doubled, which is pretty decent.
I kinda expected the memory improvement due to the new IOD and GMI3, but the L3 was already really, really fast on AMD, and they made it even faster somehow.

All of this should bode quite well for gaming perf
Yes, indeed!
And this is with DDR5 server memory of unknown speed. Very good improvements all around.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,366
1,594
136
Yeah that memory latency is shockingly good. I honestly expected DRAM latency to increase because of DDR5.

Also, you can infer a range for the clocks from that image. L1 latency is 4 cycles and reported at .1ns granularity, so 1.1 means clocks between 1/(1.15ns/4) and 1/(1.05ns/4) = 3.48GHz to 3.81GHz.

L2 would then be 15 or 16 cycles, which is a reasonable increase from the 12 of Zen 3 for being twice the size.

(edit) actually if you take the lowest clock and assume maximum rounding, L2 could just barely be 14 cycles. I don't think that's likely, though. If the cpu is running at 3.7GHz, the L2 latency is almost certainly 16 cycles.
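
The clock range inferred from the L1 latency can be reproduced with a small Python sketch. It assumes what the post assumes: a 4-cycle L1, a reading of 1.1 ns, and rounding to the nearest 0.1 ns. The function name `clock_range_ghz` is only for illustration.

```python
def clock_range_ghz(latency_ns, cycles, granularity_ns=0.1):
    """Return (slowest, fastest) clock in GHz consistent with a rounded latency.

    A reading of latency_ns rounded to granularity_ns could be anywhere in
    [latency_ns - g/2, latency_ns + g/2]. Cycles per nanosecond equals GHz.
    """
    half_step = granularity_ns / 2
    slowest = cycles / (latency_ns + half_step)
    fastest = cycles / (latency_ns - half_step)
    return slowest, fastest


low, high = clock_range_ghz(latency_ns=1.1, cycles=4)
print(f"{low:.2f} GHz to {high:.2f} GHz")  # 3.48 GHz to 3.81 GHz
```

The same function works for any cache level once you plug in its reported latency and a guessed cycle count, which is how the 14/15/16-cycle candidates for L2 can be checked against the clock range.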
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,623
5,894
136
(edit) actually if you take the lowest clock and assume maximum rounding, L2 could just barely be 14 cycles. I don't think that's likely, though. If the cpu is running at 3.7GHz, the L2 latency is almost certainly 16 cycles.
I think it is around 16 cycles, Zen3 is around 12. Zen's L3 is stellar, even for such a big size

The strange thing is: how is memory BW that high? Read is 683GB/s, i.e. 57GB/s per channel, but DDR5-4800 cannot provide that much BW.
DDR5-4800 * 64 bit * 12 ch = ~461GB/s, i.e. 38.4GB/s per channel

On my 5950X I get around 56GB/s with DDR4-3600.
DDR4-3600 * 64 bit * 2 ch = 57.6GB/s, i.e. 28.8GB/s per channel
Memory BW recorded by AIDA is in line with theoretical DDR4 BW.

It seems the second SDP is doing something to distort the values, I think.
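
The theoretical peak figures can be checked with a short Python sketch. It assumes a 64-bit data bus per channel, as in the post, and a single socket with 12 channels; `peak_bandwidth_gbps` is an illustrative name.

```python
def peak_bandwidth_gbps(mega_transfers_per_s, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


ddr5_peak = peak_bandwidth_gbps(4800, channels=12)  # DDR5-4800, 12 channels
ddr4_peak = peak_bandwidth_gbps(3600, channels=2)   # DDR4-3600, 2 channels

measured_read = 683.0  # GB/s, the AIDA read figure quoted in the post

print(f"DDR5-4800 x12: {ddr5_peak:.1f} GB/s ({ddr5_peak / 12:.1f} per channel)")
print(f"DDR4-3600 x2:  {ddr4_peak:.1f} GB/s ({ddr4_peak / 2:.1f} per channel)")
print(f"Measured / DDR5 peak: {measured_read / ddr5_peak:.2f}x")
```

A measured-to-peak ratio above 1.0 is the oddity being pointed out: the reported read figure exceeds what 12 channels of DDR5-4800 could deliver under these assumptions.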
 

Henry swagger

Senior member
Feb 9, 2022
388
245
86
Poor Sapphire Rapids, even at QS E3 levels the performance is very lackluster. Genoa is a monster about to be let loose on poor Xeons; it's going to be the beatdown of the century.
Stop it fanboy amd aint giving you stock. Lol

You were already told not to use the term 'fanboy' and yet here you are using it again.
Maybe you can't comprehend what you're told?

Iron Woode

Super Moderator
 
Last edited by a moderator:
Reactions: Exist50

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Stop it fanboy amd aint giving you stock. Lol
Why are you triggered? I have mentioned this before: let's wait for actual QS samples to get a better picture of Sapphire Rapids performance (never mind release samples, because that could take until 2023), but there is not much performance difference between the earlier samples and the latest QS E3 samples. YuuKi_AnS did the best he could with both samples (QS SPR vs. early-sample Genoa), and the 2S Genoa is clobbering a pair of QS SPR. Mind you, the Genoa was gimped very hard by a beta OS (Windows Server 2025 beta version) and by having 68 fewer cores in use (Cinebench only supports 256 threads).
 

yuri69

Senior member
Jul 16, 2013
396
641
136
YuuKi_AnS did the best he could with both samples (QS SPR vs. early-sample Genoa), and the 2S Genoa is clobbering a pair of QS SPR. Mind you, the Genoa was gimped very hard by a beta OS (Windows Server 2025 beta version) and by having 68 fewer cores in use (Cinebench only supports 256 threads).
TBH spending such a precious opportunity on a beta Windows test setup is sad. Both Cinebench and Windows have problems with high thread count. A quick Linux-based GeekBench run would be much less broken and thus reveal much more.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
TBH spending such a precious opportunity on a beta Windows test setup is sad. Both Cinebench and Windows have problems with high thread count. A quick Linux-based GeekBench run would be much less broken and thus reveal much more.
It's Cinebench or Bust... Get with the program..

I am requesting YuuKi_AnS to test Genoa in CBR23 but with SMT OFF, so that it will use the full 192 cores. That should yield record-breaking numbers of more than 120,000 points (an OC 5995WX just posted 105,000 points at 4.8GHz on water).





Edit: he is also going to test the 8490H QS sample, which is a 60-core processor. Let's hope he remembers to turn SMT OFF on Genoa... Crossing my fingers for a world record.
 
Last edited:
Reactions: lightmanek

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,637
14,628
136
Stop it redacted amd aint giving you stock. Lol
Why can't you just learn that people are reacting to plain FACTS. The fact that you apparently support Intel does not make their CPUs go any faster. The Sapphire Rapids QS is getting beaten very badly by a Genoa ES. FACT. Production of both chips could be different, but a QS E3 sample is very close to production. Most likely the difference will be even more when Genoa gets to production, or even QS.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Why can't you just learn that people are reacting to plain FACTS. The fact that you apparently support Intel does not make their CPUs go any faster. The Sapphire Rapids QS is getting beaten very badly by a Genoa ES. FACT. Production of both chips could be different, but a QS E3 sample is very close to production. Most likely the difference will be even more when Genoa gets to production, or even QS.
So let's get this straight. Despite all your lip service to "waiting for benchmarks", you unconditionally believe a leak that directly contradicts known performance and power characteristics of Golden Cove, but do not believe Genoa numbers from the same source showing it failing vs its own predecessor?
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
So let's get this straight. Despite all your lip service to "waiting for benchmarks", you unconditionally believe a leak that directly contradicts known performance and power characteristics of Golden Cove, but do not believe Genoa numbers from the same source showing it failing vs its own predecessor?
There are many unknowns in that implementation of Golden Cove.

Here is the list:

Larger L2
Mesh of rings
Quad compute tiles per CPU
Buggy BIOS

So far the QS samples are just not performing as they should. We will need to wait for full release products, and even then for a mature BIOS.
 

Abwx

Lifer
Apr 2, 2011
11,056
3,712
136
It's Cinebench or Bust... Get with the program..

I am requesting YuuKi_AnS to test Genoa in CBR23 but with SMT OFF, so that it will use the full 192 cores. That should yield record-breaking numbers of more than 120,000 points (an OC 5995WX just posted 105,000 points at 4.8GHz on water).


View attachment 64762


Edit: he is also going to test the 8490H QS sample, which is a 60-core processor. Let's hope he remembers to turn SMT OFF on Genoa... Crossing my fingers for a world record.

Cinebench R15 and R20 ST scores show lower IPC than Zen 3, by 8% and 13% respectively, while CB R23 shows an improvement in the vicinity of 8%, all assuming that the frequency is 3.46GHz.

If it tops out at the displayed 3.7GHz max, then IPC is the same in CB R23 and much lower in the previous CB versions......
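
How much the IPC estimate moves with the assumed clock can be sketched in Python. This assumes the ST score is proportional to IPC times frequency and takes the percentages and the two candidate clocks from the post; `rescale_ipc_delta` is an illustrative name.

```python
def rescale_ipc_delta(delta_at_assumed, assumed_ghz, actual_ghz):
    """Re-express a relative IPC delta if the real clock differs from the assumed one.

    With score ~ IPC * frequency, the same score at a higher clock implies
    proportionally lower IPC.
    """
    relative_ipc = (1 + delta_at_assumed) * assumed_ghz / actual_ghz
    return relative_ipc - 1


# IPC deltas vs Zen 3 as stated in the post, assuming 3.46 GHz.
deltas_at_346 = {"CB R15": -0.08, "CB R20": -0.13, "CB R23": 0.08}

for name, delta in deltas_at_346.items():
    at_37 = rescale_ipc_delta(delta, assumed_ghz=3.46, actual_ghz=3.7)
    print(f"{name}: {delta:+.1%} at 3.46 GHz -> {at_37:+.1%} at 3.7 GHz")
```

Under these assumptions the CB R23 gain shrinks to roughly parity at 3.7GHz, which matches the "same IPC" remark.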
 
Reactions: lightmanek

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,637
14,628
136
So let's get this straight. Despite all your lip service to "waiting for benchmarks", you unconditionally believe a leak that directly contradicts known performance and power characteristics of Golden Cove, but do not believe Genoa numbers from the same source showing it failing vs its own predecessor?
A number of leaks have all pointed to the same thing. The fact that you think they are wrong does not make my belief that they are correct wrong.

Think about this: if the 8 P-cores of the 12900f use about 230 watts (maximum), then the 56 cores of SR, if clocked the same, would use about 1610 watts. Even if we use 125 watts, SR would be at 875. You know there is no way that's going to happen. So you need to ignore what you know about Golden Cove and focus on SR. And what is said above also makes sense.

And yes, ultimately we need to look at production benchmarks, but that will not happen now. The leaks do suggest that Genoa is faster, the question is how much faster.
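
The core-count power extrapolation can be written out as a small Python sketch. It uses the post's own simplification, scaling package power linearly with P-core count and ignoring E-cores, uncore, and voltage/frequency effects; `scaled_power_watts` is an illustrative name.

```python
def scaled_power_watts(reference_watts, reference_cores, target_cores):
    """Naive linear extrapolation of power from one core count to another."""
    return reference_watts / reference_cores * target_cores


SPR_CORES = 56  # Sapphire Rapids core count used in the post

at_max = scaled_power_watts(230, reference_cores=8, target_cores=SPR_CORES)
at_base = scaled_power_watts(125, reference_cores=8, target_cores=SPR_CORES)

print(f"From 230 W on 8 P-cores: {at_max:.0f} W")   # 1610 W
print(f"From 125 W on 8 P-cores: {at_base:.0f} W")  # 875 W
```

Both results are far above any plausible server TDP, which is the point being made: the server part cannot run its cores at desktop clocks.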
 
Reactions: Drazick