Saying that the memory bandwidth for GCN 1.0 and 1.1 is very inefficient is quite a statement.
It's not really an opinion, it's a fact.
Based on AMD's own comments, Tonga has at least 40% more memory bandwidth efficiency vs. GCN 1.0/1.1. However, 3DMark Vantage results show better than that, close to double.
As I noted in my previous post, AT's testing shows an R9 285 providing 20% higher 3DMark Vantage fill-rate even though it only has 176GB/sec of memory bandwidth vs. the R9 290's 320GB/sec. Both Tonga and Hawaii have the same rasterization rate of 4 triangles/clock. TechReport shows similar numbers, with Tonga having 70% more performance than the R9 280/7950B despite 176GB/sec vs. the 7950's 240GB/sec!
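To put rough numbers on the "close to double" claim, here is a quick back-of-the-envelope sketch in Python. It assumes the fill-rate results are purely bandwidth-bound, which is a simplification, and the function name is made up for illustration only.

```python
def per_bandwidth_gain(perf_ratio, bw_new, bw_old):
    """Performance delivered per GB/sec of raw bandwidth, new card vs. old.

    perf_ratio: new card's result divided by the old card's result.
    bw_new, bw_old: raw memory bandwidth in GB/sec.
    Assumption: the test is limited only by memory bandwidth.
    """
    return perf_ratio * bw_old / bw_new


# R9 285 (176GB/sec) vs. R9 290 (320GB/sec): 20% higher fill-rate (AT)
at = per_bandwidth_gain(1.20, 176, 320)

# R9 285 (176GB/sec) vs. 7950-class card (240GB/sec): 70% higher (TechReport)
tr = per_bandwidth_gain(1.70, 176, 240)

print(f"AT:         {at:.2f}x per GB/sec")
print(f"TechReport: {tr:.2f}x per GB/sec")
```

Under that assumption both sets of numbers land a bit above 2x per GB/sec, which is where the 40-100% range comes from.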
If Tonga has 40-100% more efficient memory bandwidth, that means the GCN 1.0 7970 and GCN 1.1 290X are inefficient.
AMD's engineers aren't fools, so I don't think they would add a 512-bit memory bus to Hawaii without serious consideration, as that would increase the power consumption significantly.
They aren't fools, but they went with a 512-bit memory controller not for the reason one might think. They were targeting memory bandwidth/mm2 efficiency to save die space. They did not target a 512-bit bus in order to increase the memory bandwidth throughput efficiency.
The main benefits of the 512-bit bus were 20% less die area than Tahiti XT's 384-bit controller, while increasing the total available memory bandwidth by 20%. Yet both Hawaii and Tahiti still incur a penalty of at least 40% on their total memory bandwidth relative to Tonga.
That means the 7970's 264GB/sec is really the "Tonga/GCN 1.2 equivalent" of 189GB/sec, while the R9 290X's 320GB/sec is at best 229GB/sec in GCN 1.2 terms. That's why, while the 7970/290X's memory bandwidth looks great on paper, in the real world it's more like marketing. Like a 700 HP car (Challenger Hellcat) that can't put its power to the ground as well as a 500 HP all-wheel drive sports car (Nissan GT-R).
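For anyone who wants to check the 189GB/sec and 229GB/sec figures, this is a minimal sketch of the arithmetic. It takes AMD's "at least 40%" number at face value and treats it as a flat divisor, which is my simplification; the names below are just illustrative.

```python
# Assumption: Tonga's efficiency gain is a flat 40% across the board.
TONGA_EFFICIENCY_GAIN = 0.40


def tonga_equivalent(raw_gb_per_sec, gain=TONGA_EFFICIENCY_GAIN):
    """Scale raw GCN 1.0/1.1 bandwidth down to 'GCN 1.2 equivalent' GB/sec."""
    return raw_gb_per_sec / (1.0 + gain)


for name, raw in [("HD 7970", 264), ("R9 290X", 320)]:
    equiv = tonga_equivalent(raw)
    print(f"{name}: {raw}GB/sec raw is about {equiv:.0f}GB/sec in GCN 1.2 terms")
```

If the real gain is closer to the 100% end of the range, the equivalent figures drop even further.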
Also, just because your benchmark scores don't go up that much from overclocking your memory, that doesn't mean the architecture is inefficient with memory bandwidth. It could also mean that the computational units aren't bandwidth limited. Some GPUs aren't bandwidth limited at all, even at stock memory speeds.
I get that, but my point was that piling extra memory bandwidth onto cards like the 7970/R9 280X/290X does very little to boost their performance, probably, like you said, because they don't have enough functional units. Now with the 390X (or 380X?), what if the goal is to increase performance by 50%? Suddenly you add Tonga's 40% increase in efficiency and add HBM to reduce power usage.
When I had my GTX 770s, overclocking the memory yielded a larger performance increase than overclocking the core, so it was definitely bandwidth starved to a large degree.
I never owned 770s or tested them, but AT's testing shows that the card was more GPU limited than memory bandwidth limited. With GPU clocks up 9% and memory clocks up 14%, the average net gain was 9-12%.
http://www.anandtech.com/show/6994/nvidia-geforce-gtx-770-review/17
Either way, since my point was about GCN 1.0/1.1 inefficiency, how the 770 scales with extra bandwidth isn't relevant.
It's impressive, because bandwidth is crucial at higher resolutions and the GTX 980 has much less raw bandwidth than Hawaii.
We don't know that. What if 980 and Tonga have similarly efficient memory bandwidth? That means in "980/Tonga" terms, R9 290X's 320GB/sec is only as high as 229GB/sec!!! What if 980 is even more memory bandwidth efficient than Tonga itself? That's why comparing memory bandwidth between AMD and NV is a waste of time most of the time. But we can make more interesting comparisons of what major improvements 300 series will have based on what AMD has already done with Tonga.
See, since 980 barely beats a 290X at 4K, 380X will have the benefit of at least 40% more efficient memory bandwidth AND piles more of it due to HBM.
This is kind of a contradiction to what you said earlier. You claimed that Tahiti and Hawaii were inefficient with their bandwidth. So if that's the case, how would increasing their bandwidth even more make it beat the 980 at 4K if it can't already do so with 43% greater bandwidth compared to the GTX 980?
That's only greater on paper. While I should probably backtrack on the claim that improving the R9 290X's memory bandwidth would automatically let it beat a 980, since it could still be more ROP/TMU/SP limited, I don't think your comparison of Maxwell's memory bandwidth to GCN 1.1 is relevant at all. The only way that would work is if both Maxwell and GCN 1.1 had identical memory bandwidth architecture efficiency, but we don't know that.
Thing is, a GPU's bandwidth usage is tied to its computational ability. So increasing bandwidth will not increase performance if the GCN cores and ROPs aren't starved for bandwidth.
You are right.
Yes, those numbers are impressive, but like I said above, unless the R9 390X's computational units can utilize the bandwidth, it won't make a difference.
We have to assume that AMD's engineers wouldn't increase bandwidth beyond what the GPU is capable of using... but we'll have to see.
I would not assume that. AMD's engineers piled way too much memory bandwidth onto the HD4890 and 7950/7970.
HD4870 = 115GB/sec
HD4890 = 129GB/sec
HD5850 = destroys them both with only 128GB/sec, meaning HD4870/4890 had too much wasted memory bandwidth available.
HD7970 = 264GB/sec
HD7970Ghz = 288GB/sec
R9 290X = 320GB/sec, yet it is 30-35% faster despite having only 21% and 11% more memory bandwidth than the HD7970 and HD7970Ghz respectively, meaning the HD7970 had too much wasted memory bandwidth.
Tonga though shows that both 7970 and R9 290X are wasting memory bandwidth. :sneaky:
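The 21% and 11% figures above come straight from the raw bandwidth numbers in the list. Here is a small sketch of that calculation, with a helper name I made up for the purpose:

```python
def pct_more(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


R9_290X_BW = 320  # GB/sec, from the list above

for name, bw in [("HD7970", 264), ("HD7970Ghz", 288)]:
    extra = pct_more(R9_290X_BW, bw)
    print(f"R9 290X has {extra:.0f}% more raw bandwidth than {name}")
```

So a 30-35% performance lead comes with a noticeably smaller bump in raw bandwidth, which is the point about the 7970 having bandwidth it couldn't use.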
If the 390X has 512-640GB/sec, it might also be 'too much' of a good thing, but if HBM starts at 1GB/sec, you have your minimum speed there, and, well, for marketing reasons it sounds nice.