Vega/Navi Rumors (Updated)


JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Well, the problem with all that technical data is the fact that currently game performance just tanks when hitting VRAM limits. It's not like current GPUs cannot stream/prefetch data from RAM to GPU RAM; some games already do that magic pretty well, managing huge open-world games without any intermediate level loads. I am certain AMD has found ways to improve the prefetch, caching policies and so on.


Look, all that info is nice and dandy, but at the end of the day PCIe 3.0 x16 has a TOTAL of roughly 16 GB/s of bandwidth. Let's say you want 100 FPS; that is 163.84 MB per frame MAX, and that is very, very generous. Probably, when all is said and done, you have maybe a third of that due to overhead, other traffic, and not streaming 100% of the time. Is 60 MB per frame a lot? Well, no.
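For rough scale, here's the arithmetic behind that (a quick Python sketch; the 16 GB/s figure and the one-third guess are just the assumptions above, not measurements):

Code:
# Back-of-envelope PCIe budget per frame (illustrative figures, not measurements)
PCIE3_X16_GBPS = 16.0                              # raw PCIe 3.0 x16 bandwidth in GB/s (~15.75 in practice)
FPS = 100
raw_per_frame_mb = PCIE3_X16_GBPS * 1024 / FPS     # ~164 MB per frame at best
usable_per_frame_mb = raw_per_frame_mb / 3         # assume ~1/3 survives overhead and other traffic
print(f"raw budget:     {raw_per_frame_mb:.1f} MB/frame")
print(f"usable (guess): {usable_per_frame_mb:.1f} MB/frame")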
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
Look, all that info is nice and dandy, but at the end of the day PCIe 3.0 x16 has a TOTAL of roughly 16 GB/s of bandwidth. Let's say you want 100 FPS; that is 163.84 MB per frame MAX, and that is very, very generous. Probably, when all is said and done, you have maybe a third of that due to overhead, other traffic, and not streaming 100% of the time. Is 60 MB per frame a lot? Well, no.
I have to say, I chuckled when reading your post.

Nothing rude, it is just an oversimplification of the situation, and of the difference between the Vega framebuffer, the memory pool, and the HBCC. You should not worry about this. The GPU knows what it is doing, and because it is based on Infinity Fabric, it is capable of understanding what can happen in the future and "preparing" itself. There is a reason why there were rumors of next-generation scheduling in the Vega architecture.
 

Krteq

Golden Member
May 22, 2015
1,005
713
136
Well, there is still quite a big difference between how much VRAM an application allocates and how much VRAM the app is really using.

Yes, it's about prefetching, caching, compression and so on, but there are other techniques to optimize VRAM/RAM usage as well.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
With the talk about the "big install size," the HBCC, the 4GB option, and the Fiji GPU with an SSD/NAND/extra semi-local memory built onto it -

Is it possible that maybe even consumer Vega will ship with a small onboard SSD that acts as a "2nd tier" of VRAM, swapped in by the HBCC as needed for high-speed data? Basically HBCC = on-interposer memory, 2nd tier is off-interposer but on the PCB? That would make all these other things fall into place - 4GB makes a lot more sense when that's just the high-speed cache in front of a 32GB SSD or whatever else. Somewhat like the big iGPU FPS uplift you get from Crystalwell despite it being just 128 MB. I'm purely speculating here.
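To make the tiering idea concrete, here's a toy Python sketch of a small fast pool backed by a bigger slow pool, with least-recently-used pages spilling down a tier. Purely illustrative of the concept I'm speculating about; it has nothing to do with how HBCC is actually implemented.

Code:
from collections import OrderedDict
class TwoTierCache:
    """Toy model: small fast pool (think HBM) backed by a big slow pool (think onboard SSD)."""
    def __init__(self, fast_capacity_pages):
        self.fast = OrderedDict()              # page_id -> data, kept in LRU order
        self.slow = {}                         # the "second tier"
        self.capacity = fast_capacity_pages
    def access(self, page_id):
        if page_id in self.fast:               # hit in the fast tier
            self.fast.move_to_end(page_id)
            return self.fast[page_id]
        data = self.slow.pop(page_id, b"\x00" * 4096)   # miss: pull the page up from the slow tier
        self.fast[page_id] = data
        if len(self.fast) > self.capacity:     # evict the least-recently-used page down a tier
            victim, victim_data = self.fast.popitem(last=False)
            self.slow[victim] = victim_data
        return data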
 
Last edited:

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
This is what sebbbi wrote:
"Brand new Frostbite GDC presentation is perfect example of this:
http://www.frostbite.com/2017/03/framegraph-extensible-rendering-architecture-in-frostbite/

Pages 57 and 58 describe their PC 4K GPU memory utilization. Old system used 1042 MB with 4K. New system uses only 472 MB. This is a modern engine with lots of post processing passes. Assets (textures and meshes) are obviously additional memory cost on top of this, and this is where a good fine grained texture streaming technology helps a lot (whether it is a fully custom solution or automatic GPU caching/paging solution)."

So the challenge on a modern engine is to prefetch textures and meshes? Seems doable to me in a game like BF1. Done properly, I can't even imagine how low you can get; 2 GB should be fine for 4K.
But BF1 is also a game where FPS dips and frametime bumps are a huge pain, so it needs to be done 99.999% perfectly. No slips. It will be interesting to see if it can be done that precisely.
 

Snarf Snarf

Senior member
Feb 19, 2015
399
327
136
With the talk about the "big install size," the HBCC, the 4GB option, and the Fiji GPU with an SSD built onto it -

Is it possible that maybe even consumer Vega will ship with a small onboard SSD that acts as a "2nd tier" of VRAM, swapped in by the HBCC as needed for high-speed data? That would make all these other things fall into place - 4GB makes a lot more sense when that's just the high-speed cache in front of a 32GB SSD or whatever else. Somewhat like the big iGPU FPS uplift you get from Crystalwell despite it being just 128 MB.

That would be an interesting development, though the driver work to get that working properly in every engine seems like a big task. I imagine this would all require HSA and would maybe only be possible on a full AMD platform, unless they could get software companies to buy into the idea. 16 GB of high-speed DDR4 for great Infinity Fabric clocks on the CPU, an HBM cache on the GPU, and an M.2 64-128 GB enormous frame buffer for seamless streaming of assets. Sounds great on paper, but making it work and turning it into a performance benefit over the normal software paradigm would be an enormous undertaking.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
That would be an interesting development, though the driver work to get that working properly in every engine seems like a big task. I imagine this would all require HSA and would maybe only be possible on a full AMD platform, unless they could get software companies to buy into the idea. 16 GB of high-speed DDR4 for great Infinity Fabric clocks on the CPU, an HBM cache on the GPU, and an M.2 64-128 GB enormous frame buffer for seamless streaming of assets. Sounds great on paper, but making it work and turning it into a performance benefit over the normal software paradigm would be an enormous undertaking.
Definitely - but they've kind of already started with the existing Fiji implementation: http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-fiji-with-m2-ssds-onboard. That needs its own programming, but this HBCC piece with its caching logic could be the missing element to make it work out of the box.

I'd expect it still needs developers to make tweaks to use it, but the HBCC may provide some API hooks and abstraction for the developer/drivers.

Notice in his talk he always mentions "Off GPU" or "Off Chip" memory when he talks about HBCC. He doesn't say "Off Card." Is this intentional?
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
I think you guys are making a simple thing/idea enormously complex.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
I think you guys are making a simple thing/idea enormously complex.
Possibly, yeah. But I doubt HBCC on its own is going to work magic; you can't just magic current VRAM requirements into less. If you do what sebbbi describes for BF1, you have to have architecture-level developer support in the game itself. It would be as large an undertaking as programming for Vulkan or DX12 vs. DX11. Which very well could be the case, but that would be another long-term play on AMD's behalf.
 
Reactions: Arachnotronic

Snarf Snarf

Senior member
Feb 19, 2015
399
327
136
I think you guys are making a simple thing/idea enormously complex.

You opened the can of worms by linking sebbbi, lol. He goes on to mention that there will come a time when we need an extra cache layer somewhere in the stack beyond just VRAM and system RAM, similar to Intel's Optane.
It's two sides of the same coin: you can have massive frame buffers and lazy engine devs with the SSD storage solution backed by a very fast cache, or you can go AMD's route, which is to make the hardware aware of what assets need to be streamed in, and when, with low latency and high bandwidth.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,028
3,800
136
Well, the problem with all that technical data is the fact that currently game performance just tanks when hitting VRAM limits. It's not like current GPUs cannot stream/prefetch data from RAM to GPU RAM; some games already do that magic pretty well, managing huge open-world games without any intermediate level loads. I am certain AMD has found ways to improve the prefetch, caching policies and so on.


Look, all that info is nice and dandy, but at the end of the day PCIe 3.0 x16 has a TOTAL of roughly 16 GB/s of bandwidth. Let's say you want 100 FPS; that is 163.84 MB per frame MAX, and that is very, very generous. Probably, when all is said and done, you have maybe a third of that due to overhead, other traffic, and not streaming 100% of the time. Is 60 MB per frame a lot? Well, no.

Except you have just gone and ignored half my post, which is about data locality: objects will live for far longer than one frame, in all likelihood thousands to millions of frames. It will then only push out data that hasn't been used or doesn't look like it will be used. This is completely different from the driver constantly swapping/paging the same data back and forth because it is too slow at selecting data to move and has little idea of what is important or not.

You realize your argument is that hardware caching, prefetching and prediction don't work, yet we have relied heavily on them for the last 30 years on the CPU side. On the GPU side, the fact is that bandwidth and memory capacity aren't scaling with the amount of compute. Consider that R300 had ~34 GFLOPS, ~20 GB/s of bandwidth and 256 MB of memory; at the high end we now have something like 1/20th of the bandwidth per FLOP and 1/100th of the memory, and those ratios will only grow. The concept of HBCC is being forced by the physical realities of this.
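Just to put rough numbers on that ratio argument; the exact factors depend on which parts you compare (these are ballpark specs, not exact), but the direction is clear:

Code:
# Bandwidth-per-FLOP and memory-per-FLOP, then vs. now (ballpark, illustrative figures only)
r300   = {"gflops": 34.0,    "bw_gbs": 19.8,  "mem_mb": 256}     # Radeon 9700-class card
modern = {"gflops": 12000.0, "bw_gbs": 500.0, "mem_mb": 8192}    # ~2017 high-end card
def ratios(gpu):
    return gpu["bw_gbs"] / gpu["gflops"], gpu["mem_mb"] / gpu["gflops"]
bw_old, mem_old = ratios(r300)
bw_new, mem_new = ratios(modern)
print(f"bandwidth per FLOP shrank by roughly {bw_old / bw_new:.0f}x")
print(f"memory per FLOP shrank by roughly {mem_old / mem_new:.0f}x")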
 
Reactions: w3rd and psolord

Mopetar

Diamond Member
Jan 31, 2011
8,365
7,458
136
To some degree that's a result of memory compression getting better, and of the nature of games: at a certain point it becomes difficult to use exponentially more memory.

I think memory use will increase (or games will require larger quantities) once open-world games can increase draw distance to show more varied terrain and environments at high-resolution settings, with the pixels to allow for that detail.

I think HBCC will be a bigger deal for compute and HPC applications than it will be for gaming. If it means AMD can skimp on VRAM and not take a big performance hit, I only care if that means I pay less. But HBCC is extra die space, so it's probably break-even at best.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
I think memory use will increase (or games will require larger quantities) once open-world games can increase draw distance to show more varied terrain and environments at high-resolution settings, with the pixels to allow for that detail.

Nintendo somehow managed to make one of the best open-world games ever on literally a tablet SoC (Tegra X1) with 4GB of shared RAM at far lower bandwidth than even entry-level GPUs. Granted, it runs at 900p max, but even 5-6x the resources for 4K would still be within reach of a modern midrange PC card. Breath of the Wild demonstrates just how terribly optimized most PC AAA titles really are.

That said, AMD needs to take into account that game devs are lazy bastards who won't lift a finger. HBCC is only going to make low-VRAM configurations viable if it doesn't require any work on the developer's part to implement. It has to be completely transparent and in the background.
 

Wall Street

Senior member
Mar 28, 2012
691
44
91
Nintendo somehow managed to make one of the best open-world games ever on literally a tablet SoC (Tegra X1) with 4GB of shared RAM at far lower bandwidth than even entry-level GPUs. Granted, it runs at 900p max, but even 5-6x the resources for 4K would still be within reach of a modern midrange PC card. Breath of the Wild demonstrates just how terribly optimized most PC AAA titles really are.

That said, AMD needs to take into account that game devs are lazy bastards who won't lift a finger. HBCC is only going to make low-VRAM configurations viable if it doesn't require any work on the developer's part to implement. It has to be completely transparent and in the background.

This is a poor argument. While the new Zelda is a good open-world game, there is a reason it runs in so little memory. If flat textures and cartoon-style cel-shaded lighting models were acceptable for all games, we could go back to GeForce 2 cards. The reason most AAA games need much more memory is that most real-world surfaces (cloth, metal, skin, wood grain) don't compress as easily as the flat colors of Zelda, and most games have lighting models which require normal data and radiosity data, unlike Zelda's cartoon style. This all costs memory to get the realistic effect.
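To put numbers on why those extra maps cost so much, here's a rough sketch of the per-material cost of a PBR-style texture set at 4K, with and without block compression (the formats and channel counts are illustrative assumptions, not figures from any particular game):

Code:
# Rough VRAM cost of one 4K material: albedo + normal + roughness/metal/AO, with full mip chains
def texture_mb(width, height, bytes_per_pixel, mips=True):
    size = width * height * bytes_per_pixel
    return size * (4 / 3 if mips else 1) / (1024 ** 2)   # a full mip chain adds roughly 1/3
uncompressed = {"albedo": 4,   "normal": 4, "roughness_metal_ao": 4}   # RGBA8, bytes per pixel
block_comp   = {"albedo": 0.5, "normal": 1, "roughness_metal_ao": 1}   # BC1/BC5/BC7-class rates
for label, maps in (("uncompressed", uncompressed), ("block compressed", block_comp)):
    total = sum(texture_mb(4096, 4096, bpp) for bpp in maps.values())
    print(f"{label}: ~{total:.0f} MB per 4K material")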
 
Reactions: Bacon1 and Yakk

tential

Diamond Member
May 13, 2008
7,348
642
121
This is a poor argument. While the new Zelda is a good open-world game, there is a reason it runs in so little memory. If flat textures and cartoon-style cel-shaded lighting models were acceptable for all games, we could go back to GeForce 2 cards. The reason most AAA games need much more memory is that most real-world surfaces (cloth, metal, skin, wood grain) don't compress as easily as the flat colors of Zelda, and most games have lighting models which require normal data and radiosity data, unlike Zelda's cartoon style. This all costs memory to get the realistic effect.
The reason it costs so much memory is that they're bad at it.

How many times have we seen random forum users or gamers or whoever do a far better job at textures/graphics for the resources used?

If a game allows mods, I do not play it until someone has redone the game's textures/graphics.

Look at the "HD texture packs" from an AAA studio.

LOL

An AAA studio doesn't mean an optimized game... far from it.
 

beginner99

Diamond Member
Jun 2, 2009
5,312
1,750
136
Notice in his talk he always mentions "Off GPU" or "Off Chip" memory when he talks about HBCC. He doesn't say "Off Card." Is this intentional?

I always wondered that as well. Why not just put 16 GB of DDR4 on the board? Or does that need an additional DDR4 controller besides the HBCC? Adding the cache could even be left as a user option.

And yeah, if it's fast enough, a small SSD as you mention. It would be a twist if this AMD tech gave the otherwise useless Intel Optane cache drive a great usage scenario.
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
Vega has a 512 GB/s data fabric, and with HBCC, aren't you guys curious how AMD will do dual cards?
They might get 20 TFLOPS inside 300 W and get it to scale properly.
 
Reactions: w3rd

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
So why are vendors bothering with increasing memory amounts? Must be some anti-AMD conspiracy; 256 KB + PCIe must be enough.

Does it occur to you guys that, for example, Sony with its 8 GB of GDDR has an advantage over MS, who are using DDR3 + an "HBCC" of their own? Caching schemes are always like that: difficult to manage, prone to fail.

The reason why vendors are bothering with increasing memory is simple. It's much easier to set up a memory management system that is wasteful with VRAM usage than it is to set up one that is efficient with VRAM usage.

As previously mentioned by Zlatan, current memory management systems load tons of data into VRAM that isn't actually used, data that would be perfectly happy sitting in system RAM and being streamed as necessary instead. The problem is that streaming data from system memory as necessary requires a significantly more advanced memory management system, something that most developers simply haven't gotten around to making yet (partly because they haven't had to, due to large amounts of VRAM being available).

So basically, vendors are increasing memory amounts as a hardware solution to a software problem. In other words, developers are being wasteful with their memory management systems, and so in turn vendors have to be wasteful with their VRAM amounts to compensate.

AMD's HBCC is essentially a more direct solution to the wasteful memory management issue, since it tackles the problem head on (by reducing the waste), instead of circumventing it (by adding more VRAM) as has traditionally been done.

Look, all that info is nice and dandy, but at the end of the day PCI3 X16 has a TOTAL of 16GB bandwith. Lets say you want 100 FPS, that is 163.84MB/s per frame MAX. and that is very very generous. Probably when all is said and done you have maybe a third of that due to overhead and other traffic and not streaming 100% of time. is 60MB per frame a lot? Well no.

You didn't actually answer my question about how much data you think needs to be sent over PCIe for a new frame, but apparently you think it's more than 60 MB.

Well, since we're quoting sebbbi anyway, it's worth noting that he had an example where his game only required 5 MB of streaming per frame. Now, this was with a texture pool size of 256 MB, whereas something like UE4 has a default size of 1024 MB I believe, but that would then still only be 20 MB per frame.

Also, why do you think only a third of the PCIe bandwidth is available for asset data streaming? What other data could possibly use up the remaining bandwidth? Do you really think sending command lists and the like takes up that much bandwidth?
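For what it's worth, scaling sebbbi's figure up with pool size (a simplifying assumption on my part) still leaves plenty of headroom against the PCIe budget:

Code:
# If a 256 MB texture pool needed ~5 MB of new data per frame, scale linearly with pool size
# (a simplifying assumption) and compare against PCIe 3.0 x16 at 100 FPS and 80% efficiency.
REF_POOL_MB, REF_STREAM_MB = 256, 5
PCIE_BUDGET_MB = 16.0 * 1024 * 0.8 / 100      # ~131 MB per frame
for pool_mb in (256, 512, 1024):
    per_frame = REF_STREAM_MB * pool_mb / REF_POOL_MB
    print(f"{pool_mb:4d} MB pool -> ~{per_frame:.0f} MB/frame vs ~{PCIE_BUDGET_MB:.0f} MB/frame budget")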
 
Last edited:
Reactions: Bacon1 and Headfoot

french toast

Senior member
Feb 22, 2017
988
825
136
And the post is from sebbbi. Look nowhere else for information and knowledge of GPU architectures and game development.
Second that; I've been a member there for years, and sebbbi is certainly considered an expert in game development and GPU knowledge.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Also, why do you think only a third of the PCIe bandwidth is available for asset data streaming? What other data could possibly use up the remaining bandwidth? Do you really think sending command lists and the like takes up that much bandwidth?

It's basic math: PCIe bandwidth after overhead is maybe 80%, and you have 10 ms for each frame. Quite obviously you can't make use of the last milliseconds, because that's when the frame has to be ready, right? So that leaves less time, and during that time you are competing with other data.
Currently framerate just tanks once a game runs out of VRAM (just like CPU performance tanks when apps start taking hard page faults), therefore it is easy to assume that the same will continue to happen in the future.

There are reasons why NV is going for NVLink with 160 GB/s: PCIe is not enough. But that has nothing to do with the desktop, as we are stuck with PCIe for years.

So despite the fancy buzzwords, you can only get maybe 50-100 MB per frame, and all that buzzword caching is focused on compression and HW-assisted caching/prefetching of hot assets while minimizing impacts. It is definitely not going to use the hUMA and HSA buzzwords on the desktop, as they are unsupported on Intel and very likely going to be unsupported on Windows as well.
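For the record, here's roughly where a 50-100 MB per frame figure comes from; the efficiency, usable-window and other-traffic factors are my guesses, not measured numbers:

Code:
# Usable PCIe data per frame under pessimistic assumptions (all factors are guesses)
PCIE_GBS   = 16.0     # raw PCIe 3.0 x16
FPS        = 100
EFFICIENCY = 0.8      # protocol overhead
WINDOW     = 0.5      # fraction of the frame interval actually usable for streaming
SHARE      = 0.8      # fraction left after command lists / readbacks / other traffic
mb_per_frame = PCIE_GBS * 1024 / FPS * EFFICIENCY * WINDOW * SHARE
print(f"~{mb_per_frame:.0f} MB per frame under these assumptions")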
 

beginner99

Diamond Member
Jun 2, 2009
5,312
1,750
136
It's basic math: PCIe bandwidth after overhead is maybe 80%, and you have 10 ms for each frame. Quite obviously you can't make use of the last milliseconds, because that's when the frame has to be ready, right? So that leaves less time, and during that time you are competing with other data.
Currently framerate just tanks once a game runs out of VRAM (just like CPU performance tanks when apps start taking hard page faults), therefore it is easy to assume that the same will continue to happen in the future.

There are reasons why NV is going for NVLink with 160 GB/s: PCIe is not enough. But that has nothing to do with the desktop, as we are stuck with PCIe for years.

So despite the fancy buzzwords, you can only get maybe 50-100 MB per frame, and all that buzzword caching is focused on compression and HW-assisted caching/prefetching of hot assets while minimizing impacts. It is definitely not going to use the hUMA and HSA buzzwords on the desktop, as they are unsupported on Intel and very likely going to be unsupported on Windows as well.

And what if you put some DDR4 on the card? Or an SSD, as with the Radeon Pro SSG? You could then load the stuff onto the card and not have any PCIe bandwidth or latency penalty.
 
Reactions: w3rd

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Look where Intel got with a puny 128 MB, or MS with 47 MB of SRAM in the Xbox.
Perhaps in Vega the problem is actually mostly bandwidth, by far, and not VRAM size. Lol. They'll probably beef up the L2 at the same time.
Damn interesting to see how this plays out.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
It's basic math: PCIe bandwidth after overhead is maybe 80%, and you have 10 ms for each frame. Quite obviously you can't make use of the last milliseconds, because that's when the frame has to be ready, right? So that leaves less time, and during that time you are competing with other data.

So with 80% of the bandwidth, that leaves you with 131 MB per frame at 100 FPS. If only 60 MB is available for asset data, then something else must be using the remainder. Again, what is this other data that uses up ~70 MB per frame? You can't just handwave it and say "other data" and expect to be taken seriously; be specific please.

And no, you can't necessarily use the last milliseconds, but remember that when you're looking at how long you have to send your data, it's not a question of the frame interval (which is of course 10 milliseconds at 100 FPS), but rather your frame latency (which will quite likely be in the 40-50 millisecond range at 100 FPS).
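And if a transfer is kicked off a few frames before the data is needed, it doesn't even have to fit inside one frame interval. A quick sketch (the four-frames-of-lead-time figure is just an assumption for illustration):

Code:
# Transfer time vs. the budget you get from requesting data a few frames ahead
FPS = 100
FRAME_INTERVAL_MS = 1000 / FPS                  # 10 ms
LEAD_FRAMES = 4                                 # assumption: prefetch issued 4 frames early
EFFECTIVE_GBS = 16.0 * 0.8                      # PCIe 3.0 x16 at ~80% efficiency
transfer_mb = 64
transfer_ms = transfer_mb / (EFFECTIVE_GBS * 1024) * 1000
print(f"{transfer_mb} MB takes ~{transfer_ms:.1f} ms; lead-time budget is {LEAD_FRAMES * FRAME_INTERVAL_MS:.0f} ms")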

Currently framerate just tanks once a game runs out of VRAM (just like CPU performance tanks when apps start taking hard page faults), therefore it is easy to assume that the same will continue to happen in the future.

The whole point is that with either a proper fine-grained memory streaming setup or a hardware setup like HBCC, you will waste significantly less VRAM, and thus won't run out of it.

So if such systems are widely implemented (which I'll agree is far from certain), then we won't see such issues continue to happen in the future.

There are reasons why NV is going for NVLink with 160 GB/s: PCIe is not enough. But that has nothing to do with the desktop, as we are stuck with PCIe for years.

Nvidia isn't going with NVLink because PCIe is not enough for gaming purposes; they are going with NVLink because PCIe is not enough for GPGPU/HPC purposes.

So despite the fancy buzzwords, you can only get maybe 50-100 MB per frame, and all that buzzword caching is focused on compression and HW-assisted caching/prefetching of hot assets while minimizing impacts.

And as mentioned above, 50-100 MB per frame is fine when you only need 5-20 MB. That's the whole point.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
So with 80% of the bandwidth, that leaves you with 131 MB per frame at 100 FPS. If only 60 MB is available for asset data, then something else must be using the remainder. Again, what is this other data that uses up ~70 MB per frame? You can't just handwave it and say "other data" and expect to be taken seriously; be specific please.

And no, you can't necessarily use the last milliseconds, but remember that when you're looking at how long you have to send your data, it's not a question of the frame interval (which is of course 10 milliseconds at 100 FPS), but rather your frame latency (which will quite likely be in the 40-50 millisecond range at 100 FPS).

The GPU renders a frame, here and now, and the data is not in VRAM. It needs 20 MB; how many milliseconds does it have to get it and keep 100 FPS? What about 50 MB? 100 MB? What about DRAM latency, what about GPU DMA controller latency, and how does that impact bandwidth?
I can't believe I am arguing with a person who talks about a latency of 50 ms and 100 FPS in the same sentence. What's next? A GPU with an AMD-patented time machine that knows what data to prefetch perfectly, or a revelation that AMD has long since solved the P=NP problem and is about to reveal a GPU containing it?
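For scale, the raw transfer times alone (ignoring DMA setup and paging latency, which would add on top):

Code:
# Raw PCIe 3.0 x16 transfer time for a mid-frame fetch, assuming ~80% of peak is usable
EFFECTIVE_GBS = 16.0 * 0.8
for size_mb in (20, 50, 100):
    ms = size_mb / (EFFECTIVE_GBS * 1024) * 1000
    print(f"{size_mb:3d} MB -> ~{ms:.1f} ms (the whole frame interval at 100 FPS is 10 ms)")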

And as mentioned above, 50-100 MB per frame is fine when you only need 5-20 MB. That's the whole point.

Who told you that? What games has that expert shipped? Can I extend your assumption to a GPU with 1 GB of VRAM? What about 256 MB? This whole assumption that you can fetch what a frame needs over PCIe while rendering it is crazy and does not pass common-sense checks, as the GPU has half a terabyte per second of bandwidth and PCIe has 16 GB/s before overhead.

Anyway, I am done with this thread. Some clear tendencies here: "attack the messenger", "learn about pointers", "but hUMA is supported on future desktop systems, you are wrong and I will keep bringing this point up and not take you seriously", "these calculations have an error of 10 MB/frame, let's ignore the orders of magnitude in bandwidth/latency between PCIe and local VRAM".
 
Reactions: xpea