Question Zen 6 Speculation Thread


MS_AT

Senior member
Jul 15, 2024
667
1,352
96
So the new definition of "well fed" is that main memory has L1 bandwidth?
Feel free to ridicule me, but the core is capable of processing at a rate much higher than what the connection between CCD and IOD allows. Will increasing the avg MemBW by 30% make it suddenly "just fine even with AVX512"?
I would say no. I mean, it's better to have that increase than not have it. But the more useful thing would be a wider read/write link to memory, as right now a single-CCD SKU won't even be able to consume this 8000MT/s sensibly. And we don't know what the new IOD will bring. If we go by HALO as the example, it's still not the equivalent of what low-core-count Epycs can achieve when it comes to BW available to a CCD.
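
To put rough numbers on that, here is a minimal sketch comparing theoretical peak DRAM bandwidth with what a single CCD link could carry. The DRAM side is plain arithmetic (transfers per second times 8 bytes per 64-bit channel times the number of channels). The link figures used below (32 bytes/clock read, 16 bytes/clock write, FCLK at 2000 MHz) are assumptions based on commonly reported numbers for current desktop parts, not something stated in this thread, and the function names are made up for illustration.

```python
def dram_peak_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes classic 64-bit (8-byte) channels; dual channel is the desktop default.
    """
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def link_peak_gbs(fclk_mhz: int, bytes_per_clock: int) -> float:
    """Theoretical peak of a CCD<->IOD link in GB/s for a given width per clock."""
    return fclk_mhz * 1e6 * bytes_per_clock / 1e9


# ASSUMPTIONS (not from the thread): 32 B/clk read, 16 B/clk write, FCLK 2000 MHz.
ASSUMED_FCLK_MHZ = 2000
ASSUMED_READ_BYTES_PER_CLK = 32
ASSUMED_WRITE_BYTES_PER_CLK = 16

ddr5_8000 = dram_peak_gbs(8000)
ccd_read = link_peak_gbs(ASSUMED_FCLK_MHZ, ASSUMED_READ_BYTES_PER_CLK)
ccd_write = link_peak_gbs(ASSUMED_FCLK_MHZ, ASSUMED_WRITE_BYTES_PER_CLK)

print(f"DDR5-8000 dual channel peak: {ddr5_8000:.1f} GB/s")
print(f"Assumed single-CCD read peak: {ccd_read:.1f} GB/s")
print(f"Assumed single-CCD write peak: {ccd_write:.1f} GB/s")
```

Under those assumptions, one CCD's read path tops out at about half of what dual-channel DDR5-8000 could deliver in theory, which is the gap being described. Different FCLK or link widths shift the numbers but not the method.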

At the same time, I really like the Zen5 AVX512 implementation, and I hope they won't gimp it for Zen6, as not all algos are memory bound.
 
Reactions: Tlh97

DrMrLordX

Lifer
Apr 27, 2000
22,590
12,476
136
24 Zen 6 cores is enough to compete. It doesn't need to match it in CB nT, only beat it in 1T. Much like the 7700K didn't need to match the 1800X in nT for it to mop the floor with it in sales. Making a product for the one or two lunatics using a consumer CPU to transcode is an interesting business decision if real.

Also let's keep in mind that jumping from 4c/8t to 8c/16t was a much bigger deal back then than it would be to jump from 24c/48t to 48c/48t (when the majority of those cores are monts). Many power users are struggling to utilize even 16c/32t fully. Also people are beginning to notice on Zen5 that DDR5-8000 2:1 can (sometimes, depending on the application) be competitive with DDR5-6400 1:1, indicating that bandwidth starvation is already happening. Zen6 24c is going to need even more bandwidth.

People don't really need the 4090/5090

Whether they need it is irrelevant. The question is: can they use it? And in most cases, the answer is "yes". When it comes to increasing CPU core counts, the answer is not always so clear.

I wonder if 144 MB L3 on TSMC N2 passes the 50% Gross Margin test...

Can Intel even afford many N2 wafers? They didn't secure any N2 supply when they dumped cash on TSMC for N3B. TSMC will make them pay upfront for wafers, too.
 

DavidC1

Golden Member
Dec 29, 2023
1,475
2,420
96
You should refer to latest (Q1) AMD earnings conference call. Higher ASPs especially in desktop were referred to more than once as a contributor to the strong Q1 results.
Exactly. As I keep pointing out, and as even Intel has noticed, enthusiasts affect the purchasing decisions of people around them.

The companies that ignore the enthusiast segment eventually end up dying. It's no wonder there are "gaming" versions of every computer part and even furniture.

Speaking of margins, the enthusiast parts have the highest margins. Do people really think the office workers care even a bit about the boring boxes under them?
 
Reactions: Joe NYC

LightningZ71

Platinum Member
Mar 10, 2017
2,222
2,721
136
Pretty straightforward: having the highest performing parts in a given generation is about more than just their per-unit ASP and margin. It is ALSO a marketing tool. Its existence pays for more marketing reach than almost any other marginal increase in marketing expenditure. This is in the managerial accounting textbooks: having a premium offering in a given market segment generates its own market momentum where the moat is performance/suitability to task and the products are effectively interchangeable.

X3D has paid for itself more through its marketing effects than it ever did in per-unit margins (data center excluded, as I don't know the ins and outs of that game well enough).
 

DavidC1

Golden Member
Dec 29, 2023
1,475
2,420
96
X3D has paid for itself more through its marketing effects than it ever did in per-unit margins (data center excluded, as I don't know the ins and outs of that game well enough).
History shows you can't ignore the enthusiast even though datacenter will get you more profits.

If you read about the history of 3dfx, they got an immense boost because they managed to bring 3D graphics quality not too far from SGI's immensely expensive workstations to the affordable consumer PC.

Let's compare an enthusiast-focused company with a datacenter-only one.
1. Datacenter chips are potentially tens of thousands of dollars. So they don't need to learn about cost efficiency.
2. Datacenter chips are extremely power hungry. So they don't need the latest power management techniques.
3. Datacenter volumes are far lower, meaning they need to cater far less in terms of support. Client volumes are massive, meaning you need to focus on every detail, complaint, and whim of the consumer.

Over time, eventually the enthusiast only company looks at datacenter and says "Hey look at the massive revenue potential". And they take their learnings regarding power efficiency, cost efficiency, and great customer support and absolutely dominate the datacenter.
 

OneEng2

Senior member
Sep 19, 2022
591
844
106
Wow. I guess I have to consider the source here.

Datacenter is higher profit and is growing faster than any other segment. AMD is also growing faster in DC than any other segment.

I believe that this is the most important contributor to AMD's recent success.

Desktop in general and DIY in particular are shrinking markets. While it's great that AMD has raised their ASP in desktop, they are focusing on DC:

It just happens that the people who care about performance of PCs coincide with people who care about gaming on PC.
I doubt that workstation people, in general, care about gaming on PC's (or at all).
OK, this thread has had enough, this is clearly trolling at this point.
Why is it trolling to point out a fact?
Feel free to ridicule me, but the core is capable of processing at a rate much higher than what the connection between CCD and IOD allows. Will increasing the avg MemBW by 30% make it suddenly

I would say no. I mean, it's better to have that increase than not have it. But the more useful thing would be a wider read/write link to memory, as right now a single-CCD SKU won't even be able to consume this 8000MT/s sensibly. And we don't know what the new IOD will bring. If we go by HALO as the example, it's still not the equivalent of what low-core-count Epycs can achieve when it comes to BW available to a CCD.

At the same time, I really like the Zen5 AVX512 implementation, and I hope they won't gimp it for Zen6, as not all algos are memory bound.
Going from 5600 to 8000 I am figuring at 43%. Did I get that right?
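
For anyone who wants to check that figure, a quick sketch of the arithmetic (the helper name is just for illustration):

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# DDR5-5600 -> DDR5-8000, using the transfer rates mentioned above.
print(f"{percent_increase(5600, 8000):.1f}%")  # prints 42.9%
```

So "43%" is right when rounded to a whole number. This is the increase in theoretical transfer rate only; how much of it shows up as usable bandwidth is a separate question.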

And yes, I am criticizing you for comparing L1 bandwidth to main memory bandwidth. I am also saying that increasing bandwidth to main memory and lowering latency through an improved IOD can bring surprising improvements in MT performance, ESPECIALLY in bandwidth-hungry applications.

Do you disagree?
Exactly. As I keep pointing out, and as even Intel has noticed, enthusiasts affect the purchasing decisions of people around them.

The companies that ignore the enthusiast segment eventually end up dying. It's no wonder there are "gaming" versions of every computer part and even furniture.

Speaking of margins, the enthusiast parts have the highest margins. Do people really think the office workers care even a bit about the boring boxes under them?
DC has the highest margins, not enthusiast parts.

I'll give you this though. Gamers are one vocal minority!

I mean, seriously, Arrow Lake really got lambasted because of its substandard gaming. If you look over the processor though, I would say it SHOULD have been lambasted for its poor showing in MT overall. It actually does a decent job of non-latency-constrained ST.
Pretty straightforward: having the highest performing parts in a given generation is about more than just their per-unit ASP and margin. It is ALSO a marketing tool. Its existence pays for more marketing reach than almost any other marginal increase in marketing expenditure. This is in the managerial accounting textbooks: having a premium offering in a given market segment generates its own market momentum where the moat is performance/suitability to task and the products are effectively interchangeable.

X3D has paid for itself more through its marketing effects than it ever did in per-unit margins (data center excluded, as I don't know the ins and outs of that game well enough).
Granted. It is good to be king.

I think AMD will continue to hold the gaming crown with their X3D parts, but if I were Intel, I would be figuring out how to recapture the DC market they are bleeding. AMD made more revenue than Intel did last year on DC parts. Ouch. That hurts if you're Intel.

You don't want to be losing market share in the fastest growing, highest margin market segment.
History shows you can't ignore the enthusiast even though datacenter will get you more profits.

If you read about the history of 3dfx, they got an immense boost because they managed to bring 3D graphics quality not too far from SGI's immensely expensive workstations to the affordable consumer PC.

Let's compare an enthusiast-focused company with a datacenter-only one.
1. Datacenter chips are potentially tens of thousands of dollars. So they don't need to learn about cost efficiency.
2. Datacenter chips are extremely power hungry. So they don't need the latest power management techniques.
3. Datacenter volumes are far lower, meaning they need to cater far less in terms of support. Client volumes are massive, meaning you need to focus on every detail, complaint, and whim of the consumer.

Over time, eventually the enthusiast only company looks at datacenter and says "Hey look at the massive revenue potential". And they take their learnings regarding power efficiency, cost efficiency, and great customer support and absolutely dominate the datacenter.
Yea, about that ....

Considering the exponentially rising cost of die shrinks, any company that is relying on high volume, lower margin parts (which desktop and laptop are) is going to go under. All that wafer space costs a butt ton of money.

There is a reason AMD has stated that they are data center FIRST focused.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,061
465
126
In other words it becomes a balancing act as you have to decide when to hand over, you complicate scheduling, and your user who bought a $600 CPU will get a choppy experience when trying to run Speedometer because you got the scheduling thresholds wrong.
It works perfectly fine on other big/little core CPUs, and it’ll work fine on AMD Zen6 too, unless someone screws up the OS scheduler code.
 

Darkmont

Member
Jul 7, 2023
28
56
61
I’m not about to leak the source (anyone who’s anyone knows who posted it), but it’s incorrect that bLLC is only for the dual compute tile SKUs.
 
Reactions: 511

Darkmont

Member
Jul 7, 2023
28
56
61
Actually, I can’t find ANY mention of bLLC even being on the dual CT version. Yet another reason to not take the professional Chiphell surfer at face value as a primary source
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,061
465
126
Because you want serious performance if you go TR. It's not more complicated than that.
Again, you won’t be getting more performance with TR than 52C NVL unless you go for 64C+ TR. So 16-32C TR will not be sensible to get, unless AMD drops the price significantly on those.
I'm not invalidating your use case or your desire for it, but it's not a product with a big market.
Neither is TR. But we’re talking about what makes most sense to get for that small market segment.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,061
465
126
Nobody needs 52 cores in a desktop PC.
So how many do they need? And which of the AMD TR 12-96C SKUs should AMD kill off?

And note that the user does not care if you label your CPU as DT or HEDT. What they care about is what performance you get for what price. They’ll buy whatever provides best perf per $, just like everyone else.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,061
465
126
They can't get the same performance. If, for instance, your finite element model has so many degrees of freedom that you have good use for fifty-two computing threads, then you also tend to have use for eight DDR5 DRAM channels. (And for memory error detection and correction too.)
That’s one single specific use case. There are many others where you don’t need it, especially if we’re talking about ”only” 52C without SMT.
If the use case is not data intensive, then a four-core CPU is most often a perfect fit for the use case.
Not sure if you’re serious. You’re saying anything above 4C is memory bandwidth bottlenecked on Zen5/Zen6 and NVL-S?
 

Timorous

Golden Member
Oct 27, 2008
1,947
3,792
136
So how many do they need? And which of the AMD TR 12-96C SKUs should AMD kill off?

And note that the user does not care if you label your CPU as DT or HEDT. What they care about is what performance you get for what price. They’ll buy whatever provides best perf per $, just like everyone else.

Unless the P cores are strong, a lot of DIY desktop buyers will buy the 11800X3D or whatever AMD names the 12c single-CCD X3D part.

Partly because they already have an AM5 motherboard, so the cost is just the chip rather than chip + motherboard, and partly because it will be the best performing gaming part.

5800X3D was popular because it performed between 12900K DDR4 and 12900K DDR5 at a price point that was a lot cheaper for existing AM4 owners than swapping to the Intel platforms.

The top single-CCD Zen 6 part will be similar for AM5, with a lot of Zen 4 and non-X3D Zen 5 owners potentially upgrading to it for their gaming rig.

Also for productivity users the 24C Zen 6 will not be as behind as you think. It will probably lose in CB but in any workloads that don't fully load 52c, which is a lot, you will see better performance with Zen 6 and that will probably be the real money making workflows to boot. I would not be surprised if Zen 6 was ahead in Puget Bench for example.

If those users really do have need for more than 24c then it will be for a profit motive and TR will be lurking with more PCIe lanes and more mem bandwidth and other features those profit driven users will likely benefit from even with a higher upfront cost.

Very very few people will DIY build a TR system just because they want to play with it, it is a market so tiny no sane company would cater to it, so maybe Intel actually will.
 

StefanR5R

Elite Member
Dec 10, 2016
6,485
10,080
136
Again, you won’t be getting more performance with TR than 52C NVL unless you go for 64C+ TR.
You are talking Cinebench only, and that's just boring.
[PS; I think you meant 48t+ TR. And compared with a 350 W NVL.]
But we’re talking about what makes most sense to get for that small market segment.
Cinebenching 24/7/365 is not a small market, it is no market at all.
There are many others where you don’t need [high core counts and decent memory bandwidth combined], especially if we’re talking about ”only” 52C without SMT.
In all of the real use cases in which you have small datasets, you also don't need "52" (48) threads. (Even in some cases with comparably big datasets, 48 threads get you nowhere where e.g. 16 threads wouldn't get you also.)
 
Reactions: Joe NYC

511

Platinum Member
Jul 12, 2024
2,300
1,969
106
I’m not about to leak the source (anyone who’s anyone knows who posted it), but it’s incorrect that bLLC is only for the dual compute tile SKUs.
That's news to me lol

Actually, I can’t find ANY mention of bLLC even being on the dual CT version. Yet another reason to not take the professional Chiphell surfer at face value as a primary source
Raichu is a fairly reliable leaker tbh and he is usually right.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,061
465
126
Performance gains in client, for each additional thread beyond ~32 threads approaches zero.
And this is any different between TR and NVL-S or any other CPU how exactly?

Also, it’s actually any additional thread beyond 1, assuming you’re talking about Amdahl’s law. But I don’t think you recommend we should only use 1C CPUs.
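
For reference, a minimal sketch of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction of the work and n the number of threads. The value p = 0.95 below is purely a hypothetical example, not a measurement of any workload discussed here, and the function names are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def marginal_gain(parallel_fraction: float, threads: int) -> float:
    """Extra speedup gained by going from `threads` to `threads + 1`."""
    return (amdahl_speedup(parallel_fraction, threads + 1)
            - amdahl_speedup(parallel_fraction, threads))


P = 0.95  # hypothetical parallel fraction, for illustration only
for n in (1, 2, 8, 16, 32, 48):
    print(f"{n:>2} threads: speedup {amdahl_speedup(P, n):.2f}, "
          f"next thread adds {marginal_gain(P, n):.3f}")
```

The gain per added thread shrinks from the second thread onward and keeps shrinking, but it never reaches exactly zero while any part of the work is parallel. Where it stops being worth paying for depends on p, which differs by workload.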
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,061
465
126
Unless the P cores are strong, a lot of DIY desktop buyers will buy the 11800X3D or whatever AMD names the 12c single-CCD X3D part.
Agreed, if the 12C Zen6 X3D part wins the gaming crown, a lot of gamers will get that CPU.
Also for productivity users the 24C Zen 6 will not be as behind as you think. It will probably lose in CB but in any workloads that don't fully load 52c, which is a lot, you will see better performance with Zen 6
Zen6 24C will be SMT so it’s 48T, which is roughly the same number of threads as 52C/T for NVL-S.

Also, it’ll be 26-28C if you include the LPE cores on Zen6. So potentially 50-56T.

So the problem of ”loading threads fully” applies to Zen6 as well.

In any case, I think the discussion here with regards to TR vs NVL-S 52C vs Zen6 24C assumes that you’ll actually have use cases where you need and use many threads. E.g. no point in getting a 96C/192T TR if you only have workloads that use 64T.
If those users really do have need for more than 24c then it will be for a profit motive and TR will be lurking with more PCIe lanes and more mem bandwidth
For those that actually need more PCIe lanes or whatever, sure. But for those that just need max MT perf per $, there are huge cost savings in going with regular DT instead of HEDT. For 64C+ there will be no alternative but to go with HEDT, but for 16-32C HEDT will not make sense in most cases unless you have special needs such as more PCIe lanes.
 

inquiss

Senior member
Oct 13, 2010
447
665
136
Never said that. But how many moms and pops will be getting 24C Zen6 or TR for that matter, or run workloads that make full use of them?
None. Any workload that you use to increase productivity and that requires a 52c Nova Lake or TR, i.e. one that is very parallel, will always take more threads. So TR will win, or Epyc. The market for people who need more than 16 threads but fewer than what they can afford is tiny. Either you need threads or you don't.

The market for clients that need more threads but won't pay for them is hobbyists. A totally fine hobby, just not a big market. I honestly don't expect high-core-count Nova to exist. Feels like that's pretty early on the chopping block.
 

StefanR5R

Elite Member
Dec 10, 2016
6,485
10,080
136
You’re saying
No. That's not what I said.

SW compilation
Mr. Amdahl has a hint for you.

[Edit, I for one started using a 56-threaded computer for software compilation in 2016 (as a secondary use case for this computer; it was built for engineering simulations primarily). Meanwhile, I suppose others know how software compilation works from looking at benchmark bar graphs on the internet.]

Keep in mind that the cores in the IOD will not share caches with the CCDs. Every time you put a CCD to sleep you have to flush its caches (you want to cut power to the SRAM too, as it always drains power). So when you wake the cores up you have to populate the caches again, etc. This costs both time and power.

In other words it becomes a balancing act as you have to decide when to hand over, you complicate scheduling, and your user who bought a $600 CPU will get a choppy experience when trying to run Speedometer because you got the scheduling thresholds wrong. Only to save a few watts that are drowned out by the power inefficiencies of the power supply. I mean sure, they might be able to run the workload well enough, but for good enough you could have bought an N100.

This is why I would dedicate LP cores only to predictable, low-interactivity tasks, ideally background tasks. And ideally give the OS an API that would let software devs mark the needed QoS level of the threads their program is using. That is also not to say they will be useless, as they can handle background OS stuff, giving big cores more time to serve the user.

This could work quite fine with this https://www.guru3d.com/story/windows-11-25h2-introduces-user-interactionaware-cpu-power-management/

But anyway this is more or less my reasoning why
It works perfectly fine on other big/little core CPUs, and it’ll work fine on AMD Zen6 too, unless someone screws up the OS scheduler code.
Actually read, not just glance over, what you are responding to. Microsoft is still working on it.¹ They add detection of non-interactive low-energy workload periods. And they seem to add _offlining_ of logical CPUs to power management. (Server operating systems have been supporting logical CPUs going offline and online for some time now. Client operating systems and applications haven't dealt with this yet, and it will be interesting to watch the fallout when it gets introduced to client.)

¹) Energy-aware scheduling is a work-in-progress in Linux too.
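
As an illustration of what "dealing with" offlined logical CPUs can look like on the application side, here is a small sketch for Linux, where sysfs reports the currently online CPUs in `/sys/devices/system/cpu/online` as a list such as `0-3,8-11`. The helper names are made up for this example, and nothing here reflects how the Windows feature will expose the same information, since that isn't described above.

```python
from pathlib import Path
from typing import Optional


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel-style CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def online_cpus(path: str = "/sys/devices/system/cpu/online") -> Optional[set[int]]:
    """Return the set of online CPUs, or None if the sysfs file is unavailable."""
    sysfs_file = Path(path)
    if not sysfs_file.exists():
        return None  # not Linux, or sysfs not mounted
    return parse_cpu_list(sysfs_file.read_text())


if __name__ == "__main__":
    # Re-query when sizing a thread pool instead of caching a count at startup.
    print(online_cpus())
```

The point of the sketch is only that a program which reads the CPU count once at startup and sizes its worker pool from that will be wrong as soon as CPUs go offline or come back.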
 
Reactions: Joe NYC