Zen 6 Speculation Thread


OneEng2

Senior member
Sep 19, 2022
612
850
106
@igor_kavinski ,

I still think that using 2 different kinds of memory invalidates the test. When using the same memory type and just changing the timings, what I have generally seen in the past is that gaming performance is minimally affected.

It seems like AMD's X3D lineup conclusively showed that avoiding main memory hits (latency) is much more important than bandwidth.

I wouldn't call it a drastic increase in memory bandwidth per core
Zen 5 Turin D: 192 cores, 252GB/s bandwidth (DDR5-6000 x 12) = 1.31GB/s/core

Zen 6 Venice D: 256 cores, 1638GB/s bandwidth (MRDIMM-12800) = 6.4GB/s/core.

600% increase seems "drastic" to me.
 
Reactions: Io Magnesso

OneEng2

Senior member
Sep 19, 2022
612
850
106

A database server application. Note the 2 Xeon systems where the only difference is the memory used. The much greater memory bandwidth didn't help the 128 core Xeon gain that much.


This one actually shows a decrease in performance when gaining more bandwidth.

Then we have this one:


Xeon trounces Turin here. This seems to be a very memory-sensitive benchmark.

Anyway, this page is here: https://www.phoronix.com/review/amd-epyc-9655/3

I can only assume that AMD knows what they are doing with the massive uplift in bandwidth for the next-gen EPYCs. I find it difficult to believe that there isn't a really good reason to improve the bandwidth in DC.

I do question the need on the desktop, though. It seems like there are precious few apps that need more than dual-channel DDR5-8000, and the lion's share of apps don't need even a fraction of that.
 

Io Magnesso

Member
Jun 12, 2025
30
16
36

A database server application. Note the 2 Xeon systems where the only difference is the memory used. The much greater memory bandwidth didn't help the 128 core Xeon gain that much.


This one actually shows a decrease in performance when gaining more bandwidth.

Then we have this one:


Xeon trounces Turin here. This seems to be a very memory-sensitive benchmark.

Anyway, this page is here: https://www.phoronix.com/review/amd-epyc-9655/3

I can only assume that AMD knows what they are doing with the massive uplift in bandwidth for the next-gen EPYCs. I find it difficult to believe that there isn't a really good reason to improve the bandwidth in DC.

I do question the need on the desktop, though. It seems like there are precious few apps that need more than dual-channel DDR5-8000, and the lion's share of apps don't need even a fraction of that.
Dual-socket Xeon 6 is still immature.
There should be room for improvement, but...
Is it a problem of immature control on the software side?
 
Jul 27, 2020
25,046
17,418
146
It seems like AMD's X3D lineup conclusively showed that avoiding main memory hits (latency) is much more important than bandwidth.
It's not just latency though.


Once the cache is accessed (latency), it needs to be read from/written to really quickly and that's where the increased bandwidth helps.

I think the missing piece of the puzzle is that no one has really investigated the impact of RAM speed on an X3D CPU, going from 3600 to 6400 MT/s in 1:1 mode.
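For scale, a quick back-of-the-envelope on what that sweep covers in raw bandwidth terms (a minimal sketch assuming the usual 8 bytes per channel per transfer; real sustained numbers are lower):

```python
# Theoretical peak for dual-channel DDR5 at the endpoints of the proposed sweep.
# Illustrative only: ignores timings, fabric limits and real-world efficiency.
for mts in (3600, 6400):
    gb_per_s = 2 * mts * 8 / 1000  # 2 channels x MT/s x 8 bytes per transfer
    print(f"DDR5-{mts} dual channel: ~{gb_per_s:.1f} GB/s peak")
# ~57.6 GB/s vs ~102.4 GB/s, i.e. roughly 1.8x on paper before any latency effects
```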
 

OneEng2

Senior member
Sep 19, 2022
612
850
106
As much as the memory bandwidth helps certain HPC applications, I have to feel that the mega memory bandwidth is primarily an AI play.
I wonder if that isn't the case.

In the last generation of DC parts from Intel and AMD, the Xeon had a very large bandwidth advantage yet got beaten pretty badly (on average by about 40%), which would lead one to believe that EITHER most server loads aren't that bandwidth-limited OR Intel did a VERY bad job with their DC processors.

As seen in the server and workstation benchmarks, there are certainly high points for the Xeon, but not that many.

As AMD had a substantial lead last generation, and did so with a pretty severe bandwidth deficit, it would seem that AMD sees a future where bandwidth in DC is much more important than it is today. Perhaps AI and LLMs are that reason.
 

OneEng2

Senior member
Sep 19, 2022
612
850
106
Dual-socket Xeon 6 is still immature.
There should be room for improvement, but...
Is it a problem of immature control on the software side?
Yeah, they seem pretty bad. When the original benchmarks came out, everyone just assumed it was some bug that would be fixed in a month or two... yet here we are today with no update that would lead us to believe that a dual-socket Xeon is a good idea (note: there were some benchmarks it worked really well for, which is quite a dilemma).
 

OneEng2

Senior member
Sep 19, 2022
612
850
106
Bandwidth impact on EPYC Zen 5, DDR5-4800 vs. DDR5-6000: https://www.phoronix.com/review/amd-epyc-9755-ddr5/9
Scaling the memory bandwidth up by 25% resulted in various degrees of performance improvement. The top 5 certainly gained greatly (with #1 scaling nearly linearly with the bandwidth improvement).

I wonder how much a new IOD and a 300% increase in per-core bandwidth will affect these benchmarks?
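As a toy model of why only the most bandwidth-bound entries track the 25% bump (purely illustrative, not anything Phoronix measures): if a fraction f of runtime is spent stalled on memory bandwidth, the best-case speedup from the faster DIMMs looks roughly like this:

```python
# Toy Amdahl-style estimate: only the bandwidth-bound fraction f of runtime
# benefits from a 25% bandwidth increase (DDR5-4800 -> DDR5-6000).
def speedup(f, bw_gain=1.25):
    return 1.0 / ((1.0 - f) + f / bw_gain)

for f in (0.1, 0.5, 0.9, 1.0):
    print(f"{f:.0%} bandwidth-bound -> {speedup(f):.2f}x")
# ~1.02x, ~1.11x, ~1.22x, 1.25x -- only near-fully-bound workloads scale with the 25%
```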
 

Thibsie

Golden Member
Apr 25, 2017
1,068
1,236
136
I wonder if that isn't the case.

In the last generation of DC parts from Intel and AMD, the Xeon had a very large bandwidth advantage yet got beaten pretty badly (on average by about 40%), which would lead one to believe that EITHER most server loads aren't that bandwidth-limited OR Intel did a VERY bad job with their DC processors.
Just means bottleneck in Xeon is somewhere else.
 
Reactions: Io Magnesso

Tigerick

Senior member
Apr 1, 2022
762
717
106
@igor_kavinski ,

I still think that using 2 different kinds of memory invalidates the test. When using the same memory type and just changing the timings, what I have generally seen in the past is that gaming performance is minimally affected.

It seems like AMD's X3D lineup conclusively showed that avoiding main memory hits (latency) is much more important than bandwidth.


Zen 5 Turin D: 192 cores, 252GB/s bandwidth (DDR5-6000 x 12) = 1.31GB/s/core

Zen 6 Venice D: 256 cores, 1638GB/s bandwidth (MRDIMM-12800) = 6.4GB/s/core.

600% increase seems "drastic" to me.
I think you miscalculated Turin Dense memory bandwidth. It should be:
  • Turin Dense 12 x DDR5-6000: 614GB/s / 192 = 3.2 GB/s per core
  • Venice SP8 8 x DDR5-8000: 512GB/s / 128 = 4 GB/s per core
  • Venice 16 x DDR5-8000: 1024GB/s / 256 = 4 GB/s per core
  • Venice 16 x MRDIMM-12800: 1638GB/s / 256 = 6.4 GB/s per core
MRDIMM is essentially DDR5-6400 double-pumped to 12800 (with a latency trade-off). Unless there is a bump in memory capacity, DDR5-8000 should be sufficient for the majority of customers.
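If anyone wants to sanity-check the per-core figures, here is a small sketch (assuming the usual 8 bytes per channel per transfer; the speeds and core counts below are the ones assumed in the list above, and note that 12 channels of DDR5-6000 works out to ~576 GB/s while the 614 GB/s figure corresponds to DDR5-6400):

```python
# Theoretical peak: channels x MT/s x 8 bytes per transfer, divided by core count.
def per_core(channels, mts, cores):
    total = channels * mts * 8 / 1000  # GB/s
    return total, total / cores

configs = [
    ("Turin Dense, 12ch DDR5-6400",    12,  6400, 192),
    ("Venice SP8,   8ch DDR5-8000",     8,  8000, 128),
    ("Venice,      16ch DDR5-8000",    16,  8000, 256),
    ("Venice,      16ch MRDIMM-12800", 16, 12800, 256),
]
for name, ch, mts, cores in configs:
    total, pc = per_core(ch, mts, cores)
    print(f"{name}: {total:.0f} GB/s total, {pc:.1f} GB/s per core")
# ~614/3.2, ~512/4.0, ~1024/4.0, ~1638/6.4 -- matching the list above
```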
 
Last edited:
Reactions: Joe NYC and OneEng2

rainy

Senior member
Jul 17, 2013
522
452
136
Zen 5 Turin D: 192 cores, 252GB/s bandwidth (DDR5-6000 x 12) = 1.31GB/s/core

Zen 6 Venice D: 256 cores, 1638GB/s bandwidth (MRDIMM-12800) = 6.4GB/s/core.

600% increase seems "drastic" to me.

It seems drastic because you've made an obvious mistake: 12 channels of DDR5-6000 translate to 576GB/s.
 

StefanR5R

Elite Member
Dec 10, 2016
6,514
10,149
136
  • Turin Dense 12 x DDR5-6000: 614GB/s / 192 = 3.2 GB/s per core
  • Venice SP8 8 x DDR5-8000: 512GB/s / 128 = 4 GB/s per core
  • Venice 16 x DDR5-8000: 1024GB/s / 256 = 4 GB/s per core
  • Venice 16 x MRDIMM-12800: 1638GB/s / 256 = 6.4 GB/s per core
Apropos.
  • Rome 8 x DDR4-3200: 204.8 GB/s / 64 = 3.2 GB/s per core
  • Naples 8 x DDR4-2666: 170.6 GB/s / 32 = 5.3 GB/s per core :-D
It's apples to oranges of course. Naples was AMD's return to servers, mainly via the HPC segment as their entry point. Rome was dipping into general purpose and hyperscalers. Turin Dense is mainly cloud. Venice coincides with AMD's first rack-level AI solution.
 
Last edited:
Reactions: Io Magnesso

LightningZ71

Platinum Member
Mar 10, 2017
2,238
2,742
136
Keep in mind, when X3D chips are brought up, it's not just an average latency improvement that's present; it's also an apparent average bandwidth improvement. Each of those memory hits that is served by the 3D cache doesn't just get the first word to the core quicker: the following words are streamed at far higher effective bandwidth than they would be if they were coming from main memory.
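A back-of-the-envelope way to see both effects at once (the hit rates and latencies below are made-up illustrative values, not measurements of any actual X3D part):

```python
# AMAT-style sketch: a larger L3 (3D V-Cache) raises the hit rate, so fewer
# requests ever see DRAM latency -- or compete for DRAM bandwidth at all.
def amat_ns(l3_hit_rate, l3_ns=12.0, dram_ns=80.0):
    return l3_hit_rate * l3_ns + (1.0 - l3_hit_rate) * dram_ns

for hit in (0.70, 0.90):  # hypothetical: without vs. with the stacked cache
    print(f"L3 hit rate {hit:.0%}: ~{amat_ns(hit):.0f} ns average access, "
          f"{1.0 - hit:.0%} of accesses hitting DRAM")
```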
 

OneEng2

Senior member
Sep 19, 2022
612
850
106
It seems drastic because you've made an obvious mistake: 12 channels of DDR5-6000 translate to 576GB/s.
Not sure how I got my previous number. You are correct.

Turin D: 3GB/sec/core
Venice D: 6.4GB/sec/core

Still more than double the bandwidth per core vs Turin.... and therefore still drastic IMO.
 

OneEng2

Senior member
Sep 19, 2022
612
850
106
Keep in mind, when X3D chips are brought up, it's not just an average latency improvement that's present; it's also an apparent average bandwidth improvement. Each of those memory hits that is served by the 3D cache doesn't just get the first word to the core quicker: the following words are streamed at far higher effective bandwidth than they would be if they were coming from main memory.
Absolutely. Of course, for many of the benchmarks being quoted, the transfers are artificially large in order to explicitly ensure that they come from main memory.

In real-world applications, I think this is rarely the case; quite a bit of memory access stays local within one of the levels of cache, avoiding a trip to main memory. Additionally, the specific memory elements in cache are frequently accessed repeatedly.

All in all, I think that the new IOD and faster memory speeds are going to be the biggest improvements we see in Zen 6 platforms next year.

I am actually not expecting much in the way of IPC gains for Zen 6 over Zen 5 (10-15%) and little or no increase in clock speed. In more bandwidth-limited operations (likely DC applications / HPC / AI), much larger performance increases will be seen than the IPC bump. At a minimum, we can expect the larger core counts to result in higher performance; the IPC and bandwidth increases would then be on top of that.

Zen 6 Venice should be a very powerful platform indeed.
 

OneEng2

Senior member
Sep 19, 2022
612
850
106
Just means bottleneck in Xeon is somewhere else.
Possibly; however, bandwidth-limited benchmarks indicate that performance scales with memory bandwidth. It just doesn't scale enough in all cases to be competitive with EPYC Turin.
 

StefanR5R

Elite Member
Dec 10, 2016
6,514
10,149
136
Turin D: 3GB/sec/core
Venice D: 6.4GB/sec/core

Still more than double the bandwidth per core vs Turin.... and therefore still drastic IMO.
...vs. Turin-D!
Also much more level 3 cache per core, and much bigger level 3 cache per core complex.

This merely shows a very different focus of the highest-core-count Venice[-D] compared to the highest-core-count Turin[-D]. (A different focus enabled by manufacturing process progress, by JEDEC's work, and more.) Turin-Dense is not an AI-AI-AI powerhouse, Turin-Classic is.

As for 6.4GB/s/core: good job. I have had exactly this at home for more than five years now. ;-)
 