Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games

Page 49 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Toggle sidebar Toggle sidebar

K

Kedas

Senior member

Jun 1, 2021

#1

Well we know now how they will bridge the long wait to Zen4 on AM5 Q4 2022.
Production start for V-cache is end this year so too early for Zen4 so this is certainly coming to AM4.
+15% Lisa said is "like an entire architectural generation"

Last edited: Jun 1, 2021

Reactions: Tlh97 and Gideon

Sort by date Sort by votes

DrMrLordX

Lifer

Dec 30, 2021

#1,201

Only things I really miss about Win11 at this point are the two-click to get to the program list thing that you can't fix and the way it groups up task bar items instead of letting them have their own tabs.

So um

About Zen3D

Anyone else find it funny that we're already seeing pre-orders for Milan-X EPYC but nothing for Vermeer-X?

Reactions: Joe NYC

Upvote 0 Downvote

esquared

Forum Director & Omnipotent Overlord

Forum Director

Dec 30, 2021

#1,202

Ok, guys stop with the Off Topic posts. This is the
64MB V-Cache on 59XX Zen3 Average +15% in Games thread.

Want to talk about Windows 11? Start your own thread.

esquared
Anandtech Forum Director

Reactions: Tlh97, french toast, TESKATLIPOKA and 7 others

Upvote 5 Downvote

J

Joe NYC

Diamond Member

Dec 31, 2021

#1,203

DrMrLordX said:
Anyone else find it funny that we're already seeing pre-orders for Milan-X EPYC but nothing for Vermeer-X?

Just a speculation, but I think AMD may be taking a hard turn to Milan-X. There was some cost analysis article I came across recently, showing the SRAM to be super cheap to produce (as I have been saying).

Also, I am not sure if AMD is going to be switching to B2 stepping with regular Milan or make it a Milan-X only.

As I said, speculation only, but it seems that Google and Microsoft want to jump on the Confidential Computing, but it has still been only in pre-release state. Maybe it needs the B2 stepping for full release. And maybe AMD will say B2 comes only with Milan-X.

Milan-X is going to have a bit of a price increase, so maybe really wants to make hay while the sun shines...

As far as Vermeer-X, I guess we will hear something in 4 days, but there are no leaks at all, as you said. Nothing. No hints coming from any reviewers, so I don't think they have it. January 4 release looks to be very much in doubt. It is probably going to continue to be treated like a stepchild...

Upvote 0 Downvote

DrMrLordX

Lifer

Dec 31, 2021

#1,204

Joe NYC said:
Just a speculation, but I think AMD may be taking a hard turn to Milan-X.

That is possible. It sort of depends on how many of the B2-stepping dice are flexible enough to bin for either Vermeer-X or Milan-X. Vermeer-X isn't going to make them anywhere near as much money as Milan-X. If Raphael is indeed going to be announced/launched at CES, AMD doesn't have long to wait before burying Alder Lake and Raptor Lake in one shot (like they even care; Vermeer is surprisingly competitive with Alder Lake as it is).

Reactions: Tlh97 and Joe NYC

Upvote 0 Downvote

J

Joe NYC

Diamond Member

Jan 1, 2022

#1,205

DrMrLordX said:
That is possible. It sort of depends on how many of the B2-stepping dice are flexible enough to bin for either Vermeer-X or Milan-X. Vermeer-X isn't going to make them anywhere near as much money as Milan-X.

I think the B2 stepping will improve the binning a little bit, say just 100 MHz, then the binning at clock speed AMD has been offering becomes almost a non issue.

Also, with half or more cores disabled, in 16 and 32 core SKUs, finding 2 or 4 really good cores out of 8 should not be a problem at all.

I think AMD has an easy trajectory to reach 25% server market share by mid 2022, so that is the priority, or to even exceed that.

If, as some people (namely MLID) say that it is the substrate that is the bottleneck, not silicon from TSMC than shifting capacity to Milan from Vermeer would result in being able to sell more silicon and more expensive silicon per substrate used.

DrMrLordX said:
If Raphael is indeed going to be announced/launched at CES, AMD doesn't have long to wait before burying Alder Lake and Raptor Lake in one shot (like they even care; Vermeer is surprisingly competitive with Alder Lake as it is).

I think Raphael may at best be teased, officially put on road map (after it disappeared from it).

Absolutely zero leaks form anywhere, including Mobo makers means no chance of launch. for Raphael anywhere close to CES.

Vermeer-X does not need any platform changes, so even with zero leaks, it is still possible Vermeer could get launched, maybe with another date - shipping date shortly after CES... That would be within the realm of possibilities.

Upvote 0 Downvote

biostud

Lifer

Jan 1, 2022

#1,206

I think they’ll roll out B2 to all lines, for EPYC and threadripper it will only be b2 as they will not need to validate different steppings for servers/professional use. The b2 will will probably silently take over ryzen, as the older stepping goes out of production/stock.

Reactions: Tlh97 and Joe NYC

Upvote 0 Downvote

T

tomatosummit

Member

Jan 1, 2022

#1,207

Joe NYC said:
I think the B2 stepping will improve the binning a little bit, say just 100 MHz, then the binning at clock speed AMD has been offering becomes almost a non issue.
*redacted*

I think Raphael may at best be teased, officially put on road map (after it disappeared from it).

Regarding b2 clocks, attributed to either stepping or just another extra year of process maturity, it'll depend more on what clocks the top stacked cache part will have. If the 6950x3d can hit 5 or 4.9ghz still then that'll be the limit and you can probably drop the rest of the line in increments from there. It's a tragedy if the halo part has to drop clocks.
Another question is will the non stacked parts have the same or slightly reduced peak turbos.
Not that I think it matters too much, the all core clock speeds will be more telling, could non-stacked parts could have a higher sustained all core for example.

As for the raph, there was the gigabyte leak, which revealed the existence of the 6?series motherboards so the engineering information is out there.
But it's been said too many times at this point that amd has gotten very good at not leaking much and that's especially true for performance metrics.

Regardless raphael is in a troublesome spot. It's biggest competition is going to be vermeer-x so revealing it too early would be damaging to their product cycle, no matter how much people like us are craving for any informtion.
Also will raphael have a stacked cache option on or very close to launch? Without 3d cache it might not fully outperform vermeer-x. There are only questions for raph and keeping it all under wraps is the best option for now.

Upvote 0 Downvote

biostud

Lifer

Jan 1, 2022

#1,208

The last official about B2 says nothing about higher max clocks, but might be able to sustain all core turbo boost @ max frequency for longer periods of time.
I really don't think B2 is anything groundbreaking.

Reactions: Tlh97 and Joe NYC

Upvote 0 Downvote

J

jpiniero

Lifer

Jan 1, 2022

#1,209

tomatosummit said:
Also will raphael have a stacked cache option on or very close to launch?

Doesn't sound like it. Depending on when Zen 5 launches it could be a mid cycle refresh.

Upvote 0 Downvote

J

Joe NYC

Diamond Member

Jan 1, 2022

#1,210

tomatosummit said:
Regarding b2 clocks, attributed to either stepping or just another extra year of process maturity, it'll depend more on what clocks the top stacked cache part will have. If the 6950x3d can hit 5 or 4.9ghz still then that'll be the limit and you can probably drop the rest of the line in increments from there. It's a tragedy if the halo part has to drop clocks.
Another question is will the non stacked parts have the same or slightly reduced peak turbos.
Not that I think it matters too much, the all core clock speeds will be more telling, could non-stacked parts could have a higher sustained all core for example.

Well, there is a good reason for that. Without V-Cache, on a cache miss, the core is idle at high clock speed, doing nothing, keeping cool, but with V-Cache, there will be fewer cache misses, so core doing more work, generating more heat.

So, the work done is more important metric than all core clock speed. Because comparing clock speed of core without and with V-Cache is comparing apples with oranges.

tomatosummit said:
Regardless raphael is in a troublesome spot. It's biggest competition is going to be vermeer-x so revealing it too early would be damaging to their product cycle, no matter how much people like us are craving for any informtion.
Also will raphael have a stacked cache option on or very close to launch? Without 3d cache it might not fully outperform vermeer-x. There are only questions for raph and keeping it all under wraps is the best option for now.

Also, Vermeer vs. Vermeer-X, as far as the Osborne Effect. Because they are actually swappable.

As far as Raphael without V-Cache vs. Vermeer X, my guess would be that Raphael would on average, outperform Vermeer X. But Vermeer-X will win some ...

I am not sure how much of an Osborne effect there would be between the 2. Raphael is a whole new platform, with higher cost than a Vermeer-X upgrades for people who may already have the mobo and RAM.

The V-Cache option for Raphael, I think, will depend on when Raphael launches. If it launches late in the year, V-Cache may be available on launch. If it comes out early, say mid-year, then probably not...

Reactions: Tlh97 and Vattila

Upvote 0 Downvote

J

Joe NYC

Diamond Member

Jan 1, 2022

#1,211

jpiniero said:
Doesn't sound like it. Depending on when Zen 5 launches it could be a mid cycle refresh.

I think it depends more on whether the technical challenges have been solved, and it is manufacturable.

First, there is TSMC timeline which says N5 will become available for stacking, as the bottom die, only in H2 2022.

Then, I think there is another challenge to get more than 1 layer of cache to work. I am guessing that AMD will want to have that feature / option for Zen 4 processors...

As a way to fix the broken Moore's law in a different way...

Upvote 0 Downvote

Schmide

Diamond Member

Jan 1, 2022

#1,212

Joe NYC said:
Well, there is a good reason for that. Without V-Cache, on a cache miss, the core is idle at high clock speed, doing nothing, keeping cool, but with V-Cache, there will be fewer cache misses, so core doing more work, generating more heat.

On an OOP processor, the core is never idle. It may push latency into some sequences, but you would have to work hard, like custom purposefully bad code hard to make a stall that long with such large register file (zen 168).

Upvote 0 Downvote

K

Kedas

Senior member

Jan 2, 2022

#1,213

I'm pretty sure B2 is mainly a needed optimization for V-Cache support.
And they probably also took the opportunity to fix a few things that needed fixing in the microcode before, anything more than that I would be surprised.

I'm even a bit surprised that the B2 stepping isn't out there yet, maybe soon or people didn't notice, Zen3 production must have switched to B2 some time before they started making the V-Cache versions last year.

edit: nevermind here it is: https://min.news/en/tech/b43fdca64913b7776ea173b95ec971fa.html

edit2: why not 6nm, because TSMC 6nm doesn't support V-cache...

Last edited: Jan 2, 2022

Reactions: Schmide and Joe NYC

Upvote 0 Downvote

DrMrLordX

Lifer

Jan 2, 2022

#1,214

Kedas said:
edit2: why not 6nm, because TSMC 6nm doesn't support V-cache...

Kind of a head-scratcher since TSMC is mostly pushing their N7 customers on to N6 where possible. But Vermeer/Milan will stay N7, even with stacked cache.

Upvote 0 Downvote

K

Kedas

Senior member

Jan 2, 2022

#1,215

DrMrLordX said:
Kind of a head-scratcher since TSMC is mostly pushing their N7 customers on to N6 where possible. But Vermeer/Milan will stay N7, even with stacked cache.

Well it increases the wafer production capacity with 20% for TSMC (hence the push) and on top of that customers have about 15% more dies per wafer, so APUs on 6nm is an easy choice.
Zen4 release will mostly be determined by TSMC 5nm V-cache stacking support.

Reactions: Joe NYC

Upvote 0 Downvote

DrMrLordX

Lifer

Jan 2, 2022

#1,216

Kedas said:
Well it increases the wafer production capacity with 20% for TSMC (hence the push) and on top of that customers have about 15% more dies per wafer, so APUs on 6nm is an easy choice.

Makes sense, hence my confusion about all Vermeer products staying on N7. And where did you hear that N6 didn't support stacked cache?

Upvote 0 Downvote

M

Mopetar

Diamond Member

Jan 2, 2022

#1,217

Joe NYC said:
Well, there is a good reason for that. Without V-Cache, on a cache miss, the core is idle at high clock speed, doing nothing. . . .

If that occurs and the pipeline is completely stalled due to dependencies it will just start utilizing SMT. The core isn't really idle either. It's still running and just inserting NOPs because it doesn't know at the time how long it will need to stall because it can't know how bad the cache miss is until it's worked it's way through to that point. Either it runs the hyper thread while that's being worked out or it tries to execute other instructions that don't have dependencies.

It may be generating less heat just because control lines are being forced to 0 and as a result fewer transistors are switching, but most of that logic is still operating, it's just that the results are being discarded. The only way for it to idle is for the core to communicate the stall to the OS scheduler and for that to down-clock the core. Of course it could just load another thread and start that if it knows the other thread will be stalled for a while.

Reactions: Joe NYC, Insert_Nickname, Tlh97 and 1 other person

Upvote 0 Downvote

K

Kedas

Senior member

Jan 2, 2022

#1,218

DrMrLordX said:
Makes sense, hence my confusion about all Vermeer products staying on N7. And where did you hear that N6 didn't support stacked cache?

I haven't seen it as 'Not supported' but the half nodes are missing if they list it 3, 5, 7.
Could be carelessness or correct that half nodes don't support stacking.

Upvote 0 Downvote

D

Doug S

Diamond Member

Jan 2, 2022

#1,219

Kedas said:
I haven't seen it as 'Not supported' but the half nodes are missing if they list it 3, 5, 7.
Could be carelessness or correct that half nodes don't support stacking.

TSMC considers nodes like N6 and N4 (which are NOT true half nodes) to be optimizations of N7 and N5, respectively. So it is quite possible that support for stacking on N7 means it will also support N6.

Reactions: Joe NYC, Tlh97 and Lodix

Upvote 0 Downvote

K

Kedas

Senior member

Jan 2, 2022

#1,220

Doug S said:
TSMC considers nodes like N6 and N4 (which are NOT true half nodes) to be optimizations of N7 and N5, respectively. So it is quite possible that support for stacking on N7 means it will also support N6.

Yes but the thing is that it is also shrinking and for stacking size is important hence extra work that needs to be done again so they may have decided to skip that part. (certainly since the amount of stacking request is limited for now)

Upvote 0 Downvote

Schmide

Diamond Member

Jan 2, 2022

#1,221

Mopetar said:
It's still running and just inserting NOPs because it doesn't know at the time how long it will need to stall because it can't know how bad the cache miss is until it's worked it's way through to that point.

NOPs are actual instructions and although they were designed to do no operation, they do take up space and used to have the effect of incrementing the instruction pointer. Although superfluous today because they are removed by the decoder, they are not inserted into the pipeline. Moreover, by the time the pipeline gets to the point where there is any possibility of complete stall, everything is in the μop cache anyways.

Regardless, with deep pipelines where instructions have latencies of 14+ clocks and register files with 150+ entries, there is plenty of things to do even with a 60+ clock to memory.

Reactions: Joe NYC and igor_kavinski

Upvote 0 Downvote

M

Mopetar

Diamond Member

Jan 2, 2022

#1,222

Schmide said:
NOPs are actual instructions and although they were designed to do no operation, they do take up space and used to have the effect of incrementing the instruction pointer.

Maybe it's architecture dependent, but I don't think you'd want to advance the program counter on a NOP, at least not in every case. Even in a simple pipeline there are plenty of cases where it can't advance just due to physical limitations. Anything with OOE an an actual program running probably has something else in the instruction queue that can be executed, but if you've got a simple program that's specifically designed to benchmark certain kinds of performance or better understand those characteristics of the chip it might not have that.

Suppose we've got some program written and specifically designed to test out cache performance in the CPU. Even the fastest caches are still usually at least 3 clock cycles which means that the next operation needs to be delayed until that data becomes available (wether that means being written to the register file or forwarded in some manner) so the processor needs some mechanism to stall on a specific instruction until it can actually execute it with the correct data.

But it's been well over a decade since I took a course in CPU architectures so it's entirely possible that the state of the art has advanced beyond that and I'm operating on some outdated assumptions or ideas. I mean it would be ideal if someone could figure out how to keep a CPU pipeline completely fed with potentially useful calculations, but I'm not sure if that's possible in reality for anything as complex as x86 and if that were the case it just likely means performance is being left on the table somewhere else to accommodate such a design.

Upvote 0 Downvote

D

deasd

Senior member

Jan 2, 2022

#1,223

Some new B2/A0 stepping SKUs incoming?

https://twitter.com/x/status/1477645529844940800

USB-IF

www.usb.org

USB-IF

www.usb.org

Reactions: Joe NYC, Tlh97 and lightmanek

Upvote 0 Downvote

Schmide

Diamond Member

Jan 2, 2022

#1,224

Mopetar said:
Maybe it's architecture dependent, but I don't think you'd want to advance the program counter on a NOP, at least not in every case. Even in a simple pipeline there are plenty of cases where it can't advance just due to physical limitations. Anything with OOE an an actual program running probably has something else in the instruction queue that can be executed, but if you've got a simple program that's specifically designed to benchmark certain kinds of performance or better understand those characteristics of the chip it might not have that.

Typically on x86 you say instruction pointer because instructions are variable size and you can't just increment. Risc and arm the the term program counter is more apt as you have fixed instruction size and use with relative offset data.

It the past nops were used for timing, alignment, self modifying code, and a few other edge cases. With out of order execution and decoupled decoders they really serve zero purpose.

Mopetar said:
Suppose we've got some program written and specifically designed to test out cache performance in the CPU. Even the fastest caches are still usually at least 3 clock cycles which means that the next operation needs to be delayed until that data becomes available (wether that means being written to the register file or forwarded in some manner) so the processor needs some mechanism to stall on a specific instruction until it can actually execute it with the correct data.

That's the beauty of out of order execution. It's just a bunch of queues and mapped registers. The load store unit need only move data in and out of the register file. When the data makes it there, it is sent down a pipeline to be worked on. As dependencies are met more data can be fed into the pipelines. Instructions don't execute in a cycle but multiple instructions can move through a pipeline with latency producing a cycle average throughput. The memory hierarchy (register file, cache, main memory) need only evict when more space is needed or an explicit barrier or flush is called.

Any program written to measure the memory hierarchy is not executing and measuring one instruction at a time. Typically it's iterating over a sequence of memory, measuring throughput, then increasing the span. (repeat)

So lets put this in context. We're talking about the difference in stalling a huge 32 MiB vs 96 MiB L3 multi way victim cache. So a span operation greater than 32 MiB 16 way. I think that falls into the "like custom purposefully bad code hard" category to stall it.

Reactions: Mopetar

Upvote 0 Downvote

N

NostaSeronx

Diamond Member

Jan 2, 2022

#1,225

Kedas said:
Yes but the thing is that it is also shrinking and for stacking size is important hence extra work that needs to be done again so they may have decided to skip that part. (certainly since the amount of stacking request is limited for now)

The shrink only occurs on the New Tapeout side. The Re-Tapeout side, has no shrink and gets the benefit of SAQP 193i to Single-Patterned EUV.

Die A w/ Mask Set A-193i-SAQP is the same exact Die A w/ Set-B-EUV-SP.

Upvote 0 Downvote

You must log in or register to reply here.

Share:

Facebook X (Twitter) Reddit Tumblr WhatsApp Email Link

TRENDING THREADS

Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)
- Started by DisEnchantment
- Sep 29, 2022
- Replies: 24K
CPUs and Overclocking
T
Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads
- Started by Tigerick
- Aug 22, 2022
- Replies: 20K
CPUs and Overclocking
Discussion Intel current and future Lakes & Rapids thread
- Started by TheF34RChannel
- Jun 18, 2017
- Replies: 23K
CPUs and Overclocking
Discussion Apple Silicon SoC thread
- Started by Eug
- Nov 10, 2020
- Replies: 10K
CPUs and Overclocking
Discussion Qualcomm Snapdragon Thread
- Started by FlameTail
- Nov 20, 2023
- Replies: 4K
CPUs and Overclocking

Top Bottom

This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
By continuing to use this site, you are consenting to our use of cookies.

Accept Learn more…

sale-70-410-exam | Exam-200-125-pdf | we-sale-70-410-exam | hot-sale-70-410-exam | Latest-exam-700-603-Dumps | Dumps-98-363-exams-date | Certs-200-125-date | Dumps-300-075-exams-date | hot-sale-book-C8010-726-book | Hot-Sale-200-310-Exam | Exam-Description-200-310-dumps? | hot-sale-book-200-125-book | Latest-Updated-300-209-Exam | Dumps-210-260-exams-date | Download-200-125-Exam-PDF | Exam-Description-300-101-dumps | Certs-300-101-date | Hot-Sale-300-075-Exam | Latest-exam-200-125-Dumps | Exam-Description-200-125-dumps | Latest-Updated-300-075-Exam | hot-sale-book-210-260-book | Dumps-200-901-exams-date | Certs-200-901-date | Latest-exam-1Z0-062-Dumps | Hot-Sale-1Z0-062-Exam | Certs-CSSLP-date | 100%-Pass-70-383-Exams | Latest-JN0-360-real-exam-questions | 100%-Pass-4A0-100-Real-Exam-Questions | Dumps-300-135-exams-date | Passed-200-105-Tech-Exams | Latest-Updated-200-310-Exam | Download-300-070-Exam-PDF | Hot-Sale-JN0-360-Exam | 100%-Pass-JN0-360-Exams | 100%-Pass-JN0-360-Real-Exam-Questions | Dumps-JN0-360-exams-date | Exam-Description-1Z0-876-dumps | Latest-exam-1Z0-876-Dumps | Dumps-HPE0-Y53-exams-date | 2017-Latest-HPE0-Y53-Exam | 100%-Pass-HPE0-Y53-Real-Exam-Questions | Pass-4A0-100-Exam | Latest-4A0-100-Questions | Dumps-98-365-exams-date | 2017-Latest-98-365-Exam | 100%-Pass-VCS-254-Exams | 2017-Latest-VCS-273-Exam | Dumps-200-355-exams-date | 2017-Latest-300-320-Exam | Pass-300-101-Exam | 100%-Pass-300-115-Exams |

http://www.portvapes.co.uk/ | http://www.portvapes.co.uk/ |