NVIDIA Pascal Thread


ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Actually it is, since the 290/290X is already faster than the 970 in DX12 or any situation where it's running optimized games. Since GCN > Polaris is a bigger difference than Maxwell > Pascal, it's possible that a 40 CU Polaris chip can beat the 970 in the 90-110 W range. But green eyes won't see it. In any case we'll see in 2-3 months.

People already dream of Fury X performance. Hawaii was passed long ago.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
The 4.2 TFLOPs GTX 980 was able to outperform the 5.3 TFLOPs GTX 780 Ti in OpenCL compute, and by a huge margin. Two things come to mind: either Nvidia GPUs may have 5 or more TFLOPs of compute power but are held back by something in how their design executes compute tasks, or drivers gimp Kepler performance, and the same thing will happen with Maxwell.
Look at this: http://images.anandtech.com/graphs/graph8526/67745.png
http://images.anandtech.com/graphs/graph8526/67744.png

GP106 with 1280 CUDA cores and 1480 core clock will have 3.8 TFLOPs of compute power.
GM204 used in GTX 970 has 3.4 TFLOPs of compute power.

Big difference?
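
For anyone who wants to check the TFLOPs figures above, here is a minimal sketch of the usual peak-throughput arithmetic (cores x 2 FLOPs per clock x clock speed). The GP106 numbers are the rumored ones from this post. The GTX 970 core count and base clock are assumptions not stated in the thread, and `peak_fp32_tflops` is just an illustrative helper name.

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, counting one fused multiply-add (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12


# Rumored GP106 configuration quoted in the post: 1280 cores at 1480 MHz.
gp106 = peak_fp32_tflops(1280, 1480)

# GTX 970 (GM204): 1664 cores at a 1050 MHz base clock (assumed; not stated in the thread).
gtx970 = peak_fp32_tflops(1664, 1050)

print(f"GP106 (rumored): {gp106:.2f} TFLOPs")
print(f"GTX 970:         {gtx970:.2f} TFLOPs")
print(f"Ratio:           {gp106 / gtx970:.2f}x")
```

With those inputs the two chips land within roughly 10% of each other on paper. This is theoretical peak only and says nothing about delivered performance.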

The other thing is performance. TSMC's 16 nm process provides 60% lower power consumption or a 30% frequency increase compared to 28 nm. A 1280 CUDA core GPU is slightly bigger than GM206, so it cannot deliver both lower power consumption and a much higher frequency (which would itself increase power consumption) at the same time. The 1024 CUDA core GPU at 1178 MHz has a 120 W TDP and draws around that much. I'm having a hard time believing GP106 will have higher performance than that at under 75 W. It would be a true miracle if that were possible.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Ah, I also forgot one thing: drivers, which can also be used to nerf the 970, and that memory configuration will take the blame. LOL


I'm not saying there won't be any p/w improvements, but you need to look at bandwidth too; there's a limit to how efficient one can make it. Since Maxwell already has color compression and there's very little chance of the 1070 using GDDR5X, I can't see it delivering that kind of performance in a 75 W envelope, especially with less bandwidth than the 970 has.


NV has very inefficient memory controllers.
Short version is NV GPUs can't achieve their theoretical bandwidth. Maybe Pascal will improve on that front?
@Mahigan could probably explain it better than me.
 

kraatus77

Senior member
Aug 26, 2015
266
59
101
By definition.

Comparing it to AMD doesn't make it any less inefficient.
Nv has a more efficient memory controller vs AMD, but it's already limited by what it has.

AMD's memory controller is less efficient vs Nv, but it has more than what it needs. (Fury is the exception because it has wider but slower memory.)

You want to test? Downclock the 380X's and 980's memory by 25% and you will see the 980 take a bigger hit in performance, percentage-wise, than the 380X. Both GPUs have color compression too.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
Nv has a more efficient memory controller vs AMD, but it's already limited by what it has.

AMD's memory controller is less efficient vs Nv, but it has more than what it needs. (Fury is the exception because it has wider but slower memory.)

You want to test? Downclock the 380X's and 980's memory by 25% and you will see the 980 take a bigger hit in performance, percentage-wise, than the 380X. Both GPUs have color compression too.

That is because the ROPs in the Maxwell architecture are starved for bandwidth. Narrow the stream and you get even less performance. Simple as that.

What worries me in this context is that Pascal GP104 will also have 64 ROPs, and... more cores to feed. Imagine a 2560 CUDA core, 1.48 GHz GPU with 8000 MHz GDDR5X on a 256-bit bus and 64 ROPs. Will it be enough? We will have to wait and see...
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Nv has a more efficient memory controller vs AMD, but it's already limited by what it has.

AMD's memory controller is less efficient vs Nv, but it has more than what it needs. (Fury is the exception because it has wider but slower memory.)

You want to test? Downclock the 380X's and 980's memory by 25% and you will see the 980 take a bigger hit in performance, percentage-wise, than the 380X. Both GPUs have color compression too.

Architecture bottlenecks, color compression, etc. have nothing to do with how efficient the memory controller is.

I don't know why you want to drag AMD into this...

Here is a great post explaining it:
http://forums.anandtech.com/showpost.php?p=38148709&postcount=30
5&6. In order to understand what I mean by ROp to cache or ROp to Memory Controller ratio, we need to look at a schematic of GM107.
GM20x differs from GM107 in that NVIDIA increased the ROps ratio from 8:1 to 16:1. So lets look at both GM204 and GM200.
GM204
- 64 ROps divided by 16 = 4.
- 2MB of L2 cache divided by 4 = 512KB.
- 256bits/4 = 64bits
- Each grouping of 16 ROps has 512KB L2 cache and a 64-bit memory controller at its disposal (aside from the color cache).

GM200
- 96 ROps divided by 16 = 6.
- 3MB of L2 cache divided by 6 = 512KB.
- 384bits/6 = 64bits
- Each grouping of 16 ROps has 512KB L2 cache and a 64-bit memory controller at its disposal (aside from the color cache).
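
The per-partition arithmetic in the two lists above can be reproduced with a short sketch. The ROP counts, L2 sizes and bus widths are the ones quoted in the post; `rop_partitions` is a hypothetical helper, and the even split of L2 and bus width across groups simply follows the post's own division.

```python
def rop_partitions(rops: int, l2_kb: int, bus_bits: int, rops_per_group: int = 16) -> dict:
    """Split ROPs into groups and divide L2 cache and bus width evenly among them."""
    groups = rops // rops_per_group
    return {
        "groups": groups,
        "l2_kb_per_group": l2_kb // groups,
        "bus_bits_per_group": bus_bits // groups,
    }


# Figures as quoted in the post: GM204 = 64 ROPs, 2 MB L2, 256-bit bus.
gm204 = rop_partitions(rops=64, l2_kb=2048, bus_bits=256)
# GM200 = 96 ROPs, 3 MB L2, 384-bit bus.
gm200 = rop_partitions(rops=96, l2_kb=3072, bus_bits=384)

for name, layout in (("GM204", gm204), ("GM200", gm200)):
    print(name, layout)
```

Both chips come out with the same per-group resources, which is the point the post is making: GM200 is wider, but each group of 16 ROPs is fed no better than on GM204.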

The result is that there isn't enough bandwidth to feed these ROps and they're consistently 10GPixel/s behind their theoretical throughput. This is without any other work straining the memory controller or L2 cache as seen here:

NVIDIA thus, knowing this was a limitation, invested heavily in color compression algorithms in order to reach parity, or near parity, with Fiji and its 64 ROps as seen here:

This issue is further compounded by the inefficient memory controllers used by GM20x. NVIDIA had to sacrifice efficiency in order to keep die size down and power usage low as seen here:

263/320=82%

So no, AMD has more efficient memory controllers.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
One of your benchmarks with an actual comparison shows otherwise.

Fiji gets ~65%
GM204 gets ~77%
GM200 gets ~70%
Hawaii gets ~76%

263/320=82%

The 290X in the test got 346 and not 320.

And ask yourself why, for example, the GM200 scores less than GM204. Or Fiji less than Hawaii. Because it's not due to the memory controller efficiency.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
Beautiful, but I hate the name. Geforce X80 sounds better.

Agreed.

"Hey regular consumers, buy this GTX 1080 for 4K and VR even though it has the number you associate with your old TV (1080p) on it!"

If that is really it then that is the biggest Nvidia marketing screw up in a while.
 

kraatus77

Senior member
Aug 26, 2015
266
59
101
Architecture bottlenecks, color compression, etc. have nothing to do with how efficient the memory controller is.

I don't know why you want to drag AMD into this...


263/320=82%

So no, AMD has more efficient memory controllers.
Not dragging AMD into this, just stating what it is.

333/512=65%, so it just proves my point, since Fury does have color compression too. But I'd like to see those benchmarks with the 380X, since it has the same memory config as the 980.
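
The two ratios being argued over (263/320 and 333/512) are just measured throughput divided by theoretical peak. A small sketch, using only the numbers quoted in this thread; the helper name `utilization` is made up for illustration, and which measured and theoretical figures are the right ones to compare is exactly what is disputed above.

```python
def utilization(measured: float, theoretical: float) -> float:
    """Fraction of theoretical memory bandwidth actually achieved."""
    return measured / theoretical


# (measured, theoretical) pairs exactly as quoted in the thread.
quoted = {
    "263 of 320": (263, 320),
    "333 of 512": (333, 512),
}

for label, (measured, theoretical) in quoted.items():
    print(f"{label}: {utilization(measured, theoretical):.0%}")
```

Note that a low ratio on its own does not say where the bottleneck is; it only says the peak was not reached in that particular test.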


I like reading Mahigan's posts because they're mostly very informative. But the fact that Nvidia can deliver the same or better performance with the same or less memory bandwidth just shows they are able to handle the available bandwidth more efficiently; ROPs, shaders, etc. don't matter as long as that holds. This also means AMD has more chance of delivering better performance on 256-bit GDDR5 at 7-8 GHz than Nvidia, because there's still some room left to improve. Nvidia, on the other hand, is already limited by bandwidth and already has color compression too, so there's very little they can do.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
They'll go multi-die acting as one larger GPU. Not impossible if designed properly from the get-go.

I'd think that would add in costs and complexity. I would think a bigger single die is probably less expensive. Unless they try a multi-gpu card or something like they have in the past. But that requires CF support for it to be effective. It sounds like you are thinking multi-gpu but with some kind of scheduling engine above to feed each gpu efficiently?
 

Adored

Senior member
Mar 24, 2016
256
1
16
I'd think that would add in costs and complexity. I would think a bigger single die is probably less expensive.

Once you pay for the interposer everything else on it is basically cheaper based on smaller die sizes. I'd guess an interposer won't come in at much over $20 but I haven't looked closely at it.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
Once you pay for the interposer everything else on it is basically cheaper based on smaller die sizes. I'd guess an interposer won't come in at much over $20 but I haven't looked closely at it.

That may be, but remember there are still limitations on said design. Intel's original multi-core started at the board level and was done via chipsets. Then they moved to two chips on a single PCB. Then they built them right on the same die. There are reasons why, when possible, CPU manufacturers move everything onto the die. There are performance and power costs associated with distances. And in reality GPUs have that logic already built into the die. The execution units are really cores, and the logic that feeds them is acting like an interposer.
 

Adored

Senior member
Mar 24, 2016
256
1
16
That may be, but remember there are still limitations on said design. Intel's original multi-core started at the board level and was done via chipsets. Then they moved to two chips on a single PCB. Then they built them right on the same die. There are reasons why, when possible, CPU manufacturers move everything onto the die. There are performance and power costs associated with distances. And in reality GPUs have that logic already built into the die. The execution units are really cores, and the logic that feeds them is acting like an interposer.

This is a good read - http://www.eecg.toronto.edu/~enright/Kannan_MICRO48.pdf

It gets eye-watering pretty fast but even the first few pages are worth it. For me this is the only obvious way forward for all the big tech companies.
 

rainy

Senior member
Jul 17, 2013
518
445
136
So between August-September for desktop GP106.
That's earlier than Bits and Chips predicted (Q4).

Since when does autumn begin in August?
Officially it starts in the last third of September, and that's very close to Q4.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,143
136
Since when does autumn begin in August?
Officially it starts in the last third of September, and that's very close to Q4.

Headline says autumn but the article mentions August-September.

No firm date has been nailed down so far for the GP106 launch, but the period being talked about is August and September.

Great news for mainstream gamers (except fanboys), hopefully competitively priced with Polaris parts.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Nobody is doubting anything. but to say we'll get 970's performance without any power connector is nonsense. it will require same amount of bandwidth + more than 2x p/w.

For what it's worth, the last time we had a node jump (from 40nm to 28nm), Nvidia achieved a 2.2x improvement in p/w for their ~230mm2 die (GF116 to GK106, or GTX 550 Ti to GTX 660). This was basically entirely due to increase in performance and not a decrease in power usage.

As such, an increase in performance of more than 100% for GP106 over GM206 isn't entirely impossible. However, it's worth noting that Nvidia traditionally uses their increases in efficiency to drive performance, not power usage (Maxwell being the obvious exception). So a sub-75W GP106 seems unlikely, but an increase in performance of 100% or more over GM206 doesn't necessarily seem so (although that would put GP106 between the 980 and 980 Ti in performance, which admittedly sounds crazy).

For sub-75W GPUs we will probably have to wait for GP107 (GX107/8 GPUs have traditionally been the sub-75W GPUs for Nvidia; alternatively, severely cut-down GX106 chips have also been used). Assume a 2x increase in p/w over GM107 and we get something roughly in between the 960 and 970 in performance, not the 970 and 980.

So I guess what I'm trying to say is that the necessary increase in p/w isn't the biggest hurdle here, but that doesn't mean that there aren't several other issues with the rumor (market segmentation between GP107/GP106, bandwidth, etc.).

Also it's worth noting that although Nvidia achieved a 120% increase in p/w for GK106 over GF116, it was much less for GK104 and GK110 over their respective predecessors (50-70%).
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
Not dragging AMD into this, just stating what it is.

333/512=65%, so it just proves my point, since Fury does have color compression too. But I'd like to see those benchmarks with the 380X, since it has the same memory config as the 980.


I like reading Mahigan's posts because they're mostly very informative. But the fact that Nvidia can deliver the same or better performance with the same or less memory bandwidth just shows they are able to handle the available bandwidth more efficiently; ROPs, shaders, etc. don't matter as long as that holds. This also means AMD has more chance of delivering better performance on 256-bit GDDR5 at 7-8 GHz than Nvidia, because there's still some room left to improve. Nvidia, on the other hand, is already limited by bandwidth and already has color compression too, so there's very little they can do.

Nvidia is definitely closer to the wall in bandwidth constraints than AMD, but not so much that they can't make gains at current bandwidth levels. http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/22 AMD has more... "opportunity" in the bandwidth conservation department, but that means they'll have to dedicate a higher percentage of transistors to it with Polaris vs. previous GCN than Nvidia will going to Pascal from Maxwell. Pros and cons, give and take.

With 224 GB/s, the GTX 980 still averaged a 13% OC in Anandtech's review 19 months ago. Core and memory OC combined was 18%, albeit with only a mediocre memory OC. Still, with 8 Gbps chips, a 256-bit bus will get 14% more bandwidth right off the bat. Coupled with some minor improvements and the available scaling room left, as demonstrated by OC'ing the GTX 980 core only, there is definitely room for slightly better than GTX 980 Ti performance at 256 GB/s. 10 Gbps GDDR5X will get 320 GB/s of bandwidth and should have zero problems outperforming Titan X in all situations.
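
The bandwidth numbers in this post follow from bus width times per-pin data rate. A quick sketch of that arithmetic: the 7 Gbps rate for the GTX 980 is an assumption chosen because it reproduces the 224 GB/s figure quoted above, and `bandwidth_gb_s` is an illustrative name.

```python
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


gtx980 = bandwidth_gb_s(256, 7)     # assumed 7 Gbps GDDR5, matches the 224 GB/s quoted
gddr5_8 = bandwidth_gb_s(256, 8)    # 8 Gbps chips on the same 256-bit bus
gddr5x_10 = bandwidth_gb_s(256, 10) # 10 Gbps GDDR5X on the same 256-bit bus

gain_8 = gddr5_8 / gtx980 - 1       # relative gain over the GTX 980

print(f"GTX 980:        {gtx980:.0f} GB/s")
print(f"8 Gbps GDDR5:   {gddr5_8:.0f} GB/s (+{gain_8:.0%})")
print(f"10 Gbps GDDR5X: {gddr5x_10:.0f} GB/s")
```

Whether that extra bandwidth turns into a matching performance gain depends on how bandwidth-bound the chip actually is, which is the open question in this thread.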
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
Makes sense. Nvidia doesn't roll out entire product lines all at once. In fact, neither company has intro'd its entire new lineup all at once when it's all new chips. Coupled with the fact that several companies announced low-power GTX 950s within the past month, it probably means they are working hard to clear out surplus inventory.

Yeah, this is par for the course for Nvidia releases. This shows confidence in their higher-end GP104 chips. The only thing is they will be hurting in the low-power segment, as Polaris 11 will probably be available in June and will beat the 950/960 soundly.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Not dragging AMD into this, just stating what it is.

333/512=65%, so it just proves my point, since Fury does have color compression too. But I'd like to see those benchmarks with the 380X, since it has the same memory config as the 980.


I like reading Mahigan's posts because they're mostly very informative. But the fact that Nvidia can deliver the same or better performance with the same or less memory bandwidth just shows they are able to handle the available bandwidth more efficiently; ROPs, shaders, etc. don't matter as long as that holds. This also means AMD has more chance of delivering better performance on 256-bit GDDR5 at 7-8 GHz than Nvidia, because there's still some room left to improve. Nvidia, on the other hand, is already limited by bandwidth and already has color compression too, so there's very little they can do.
No, it doesn't prove your point. That's a ROP limitation on the Fury X causing the low memory bandwidth utilization.

GCN's 64 ROPs are tapped out by Fiji's use of HBM in that test, whereas Hawaii had its ROPs limited by GDDR5 memory bandwidth.

Basically, Fiji has more than enough bandwidth to spare by using HBM but lacks the ROPs to make any true use of the available memory bandwidth.

This allowed AMD to use the extra memory bandwidth for their 4GB framebuffer tuning: transferring data from system memory to the framebuffer and back based on priority, which lets Fiji run at full speed even when a game goes over the 4GB framebuffer. AMD have two engineers dedicated to that task. Rise of the Tomb Raider is an example of that fine tuning.

So no, AMD don't have weaker memory controllers; their memory controllers are superior to NVIDIA's. Heck, GCN's entire memory hierarchy, from cache to framebuffer, is more robust than NVIDIA's.

What NVIDIA excelled at, especially with Maxwell, is color compression. This made up for NVIDIA's inferior memory controllers and cache hierarchy in Maxwell.
 