Vega/Navi Rumors (Updated)

Status
Not open for further replies.

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
Vega x2 is a thing.
And AMD's Infinity Fabric is their future.

I am a keen observer of the industry, and given Dr. Su's and Raja's comments over the last few months, AMD's trajectory isn't hard to place. For gamers and end-users, AMD is going to take the crown.



Let's face it, AMD now has better support and drivers than NVidia. They have a unified architecture (HSA) that AMD has been working towards aggressively for 7 years, and that research, development and effort is now coming together and starting to pay off (APU).

But it is not hard to figure that AMD's "Infinity Fabric" plays a big role in all of this. So does their partnership with Hynix, and thus HBM2 and the new "unified cache controller". AMD's development of HSA has given them a head start on things like Vega X2 (i.e. a $799 Titan Xp?), which allows them to showcase their fabric with an array of multi-GPU chips.


Not important? Ask yourself this... if baby Vega is nearly equal to a GTX 1080, then how much wattage would two 1080s draw in SLI? What would the GPU scaling of two 1080s in SLI be? And what would it cost the end-user to buy two 1080s?

Then realize that two baby Vegas sitting on fabric don't suffer from any of those^ problems. It gives AMD tremendous value. And that is something their competitors typically have never offered.

Infinity Fabric is going to be about as big a win for AMD as HyperTransport was with the Athlon 64. It's a bridge/mesh that works at as low a level as connecting two core or GCU modules together, but also as a comm link that goes between dies on the same package, to modules in a different socket, and even across other connection types. On top of that it can be used for GPU-to-CPU communication, and when you add HBCC to the equation magic happens, and that is without having HBM in there yet.

Also, to the others doubting scalability: keep in mind that AMD isn't approaching it like Intel would, at least not now. Right now AMD has 4 different configurations available on their CPUs. In a little while they will have at least 12c, 16c, 24c and 32c configurations added to that. That is 8 configurations. Maybe a 2c4t R3 too, so let's say 9. Those are all Zeppelins as far as we can tell. Maybe we scratch the R3, because considering the H2 release at the same time as Raven Ridge I think there is a chance that they are all single-CCX devices. That is at least 8 CPU combinations from $100 to probably $3k that will be using a single die. This allows for both macro and micro scalability. They can change the configuration from 2 CCXs to 1, they can increase or decrease the GCUs, they can add something like HBM, all easier than before.

But since IF isn't restricted to on-die communication, the options for what to do with a single die become immense. Think of little Vega. Maybe little Vega's job isn't really to compete with the 1070. Maybe its job is to be small enough that they could clock it down, go with 4GB or 2GB of HBM2, and have 4 of them all working efficiently on a single compute card. Losing out to the 1070 won't matter as much if it can double the density of chips in an HPC setting.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
The increase in performance of Polaris vs Tahiti is not because of increased throughput of the cores, but because of architectural improvements.
 

Paratus

Lifer
Jun 4, 2004
17,430
15,316
146
The increase in performance of Polaris vs Tahiti is not because of increased throughput of the cores, but because of architectural improvements.

Assuming big Vega is actually the same layout as Fiji (TMU: 256, SP: 4096, ROP: 64, ~512 GB/s), then we should be able to get a good handle on Fiji-to-Vega architectural performance improvements at the same clock, as well as clock scaling improvements.

Hope AT still has a Fury for direct comparison.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
Assuming big Vega is actually the same layout as Fiji (TMU: 256, SP: 4096, ROP: 64, ~512 GB/s), then we should be able to get a good handle on Fiji-to-Vega architectural performance improvements at the same clock, as well as clock scaling improvements.

Hope AT still has a Fury for direct comparison.
Well, everything depends on two things: how big an impact the architectural improvements have on GPU performance, and whether AMD increased the throughput of the cores.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
That's like 3% per year. It is nothing for all intents and purposes.

You started off with a false statement that a Polaris CU is equal to a Tahiti CU, which was proven wrong by the computerbase comparison of Tahiti vs Polaris at the same SP count, same clocks, same bandwidth, same ROPs and same TMUs. Now that you were proven wrong, you have shifted the goalposts to saying it's 3% per year. You should at least drop the argument and accept that you were wrong. Vega will be the first major architectural change in half a decade for AMD and GCN. Let's wait and see what they have come up with before writing them off.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Assuming big Vega is actually the same layout as Fiji (TMU: 256, SP: 4096, ROP: 64, ~512 GB/s), then we should be able to get a good handle on Fiji-to-Vega architectural performance improvements at the same clock, as well as clock scaling improvements.

Hope AT still has a Fury for direct comparison.

Vega is designed for higher IPC and higher frequencies. So even though we could drop Vega down to Fury X clocks, it would not really illustrate the architecture's capabilities. What matters is what performance AMD achieves within a 250W TDP and 500 sq mm on 14nm. I think Vega 10 should clock around 1.6 GHz, given we know MI25 is clocked at 1525 MHz and server parts usually clock a bit slower than consumer graphics. I am very interested in seeing Vega's perf, perf/sp, perf/sp/clock, perf/sq mm and perf/watt. These will give an idea of the architecture's performance and efficiency.
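
For anyone who wants to tabulate those ratios once reviews land, here is a rough Python sketch. The `GpuResult` type and `normalized_metrics` helper are made-up names, the score of 100 is a placeholder and not a benchmark result, and the 4096 SP count is only the Fiji-layout assumption floated earlier in the thread. The 1.6 GHz, 500 sq mm and 250W inputs are the guesses from the post above.

```python
from dataclasses import dataclass


@dataclass
class GpuResult:
    """One card's benchmark result plus the specs needed to normalize it."""
    name: str
    score: float        # any benchmark score, e.g. average FPS (placeholder)
    shaders: int        # stream processors / CUDA cores
    clock_mhz: float    # sustained core clock during the run
    die_mm2: float      # die area in square millimetres
    board_watts: float  # board power during the run


def normalized_metrics(gpu: GpuResult) -> dict:
    """Return the five ratios listed in the post for a single card."""
    return {
        "perf": gpu.score,
        "perf/sp": gpu.score / gpu.shaders,
        "perf/sp/clock": gpu.score / gpu.shaders / gpu.clock_mhz,
        "perf/sq mm": gpu.score / gpu.die_mm2,
        "perf/watt": gpu.score / gpu.board_watts,
    }


# Hypothetical card: every number here is a guess or a placeholder.
vega_guess = GpuResult("Vega 10 (guess)", score=100.0, shaders=4096,
                       clock_mhz=1600.0, die_mm2=500.0, board_watts=250.0)
metrics = normalized_metrics(vega_guess)
for label, value in metrics.items():
    print(f"{label:>14}: {value:.6g}")
```

Run the same function over a Fury X or a GTX 1080 result from the same benchmark and the ratios can be compared side by side.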
 
Reactions: Bacon1

Paratus

Lifer
Jun 4, 2004
17,430
15,316
146
Vega is designed for higher IPC and higher frequencies. So even though we could drop Vega down to Fury X clocks, it would not really illustrate the architecture's capabilities. What matters is what performance AMD achieves within a 250W TDP and 500 sq mm on 14nm. I think Vega 10 should clock around 1.6 GHz, given we know MI25 is clocked at 1525 MHz and server parts usually clock a bit slower than consumer graphics. I am very interested in seeing Vega's perf, perf/sp, perf/sp/clock, perf/sq mm and perf/watt. These will give an idea of the architecture's performance and efficiency.

Well sure, but I was thinking about comparing Fiji and Vega at several different clock speeds. That way it would be possible to see which architecture scales better with clocks.

Say Vega is 55% faster when both are at 1000 MHz, 57% faster at 1050 MHz, and 60% faster at 1100 MHz. That would tell us Vega is architected to be more efficient at higher clock speeds than Fury.
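
A quick Python sketch of that comparison, in case it helps. The function names are my own and the scores are invented so that the percentages come out to the 55/57/60 example above. None of this is measured data.

```python
def speedup_by_clock(base_scores: dict, new_scores: dict) -> dict:
    """Percent speedup of the new card over the base card at each shared clock."""
    shared = sorted(base_scores.keys() & new_scores.keys())
    return {mhz: (new_scores[mhz] / base_scores[mhz] - 1.0) * 100.0
            for mhz in shared}


def gap_widens_with_clock(speedups: dict) -> bool:
    """True if the speedup grows at every step up in clock speed."""
    clocks = sorted(speedups)
    return all(speedups[lo] < speedups[hi]
               for lo, hi in zip(clocks, clocks[1:]))


# Invented scores keyed by core clock in MHz (placeholders only).
fiji = {1000: 100.0, 1050: 104.0, 1100: 108.0}
vega = {1000: 155.0, 1050: 163.28, 1100: 172.8}

speedups = speedup_by_clock(fiji, vega)
for mhz, pct in speedups.items():
    print(f"{mhz} MHz: Vega {pct:.0f}% faster")
print("Gap widens with clock:", gap_widens_with_clock(speedups))
```

If the gap widens, the newer architecture is getting more out of each extra MHz than the older one, which is the point of testing at several clocks instead of one.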


Well, everything depends on two things: how big an impact the architectural improvements have on GPU performance, and whether AMD increased the throughput of the cores.

Yup. Although I can't see a possible 50% increase in transistors bringing no increase in performance other than from higher clock speeds.

GCN already supports more features than Pascal. It just depends on how many transistors are required to support HPC functions for the MI25.
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
That's like 3% per year. It is nothing for all intents and purposes.
And yet, they're still competitive in perf/mm^2 with Pascal. The reason for this is that AMD has not been focusing on increasing per shader per clock performance, but rather packing far more shaders in the same area and bumping clocks a bit, while NV has been focusing on per shader per clock and big clock speed increases, but not a huge increase in shaders/mm^2. In the end, it's fairly even, though NV's designs remain more power efficient.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
Yup. Although I can't see a possible 50% increase in transistors bringing no increase in performance other than from higher clock speeds.

GCN already supports more features than Pascal. It just depends on how many transistors are required to support HPC functions for the MI25.
The actual fact is that AMD GPUs have higher IPC than Nvidia GPUs (at least previous generations of NV architectures; GP100 and GV100 may be on par or higher in IPC). That's why they do not clock as high.
 

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
That was more or less officially confirmed:
When fully populating a 42U server rack with six Falconwitch 4U servers and six K888 2U servers, compute performance presumably reached up to 3 petaflops by utilizing a total of 120x MI25 GPUs. AMD’s Senior Vice President and Chief Architect, Raja Koduri, claimed “And the whole rack will cost less than a single DGX-1 server.” Assuming this to be true, we are talking about twelve servers, RAM, and 120x MI25 GPUs for around the $129,000 price range.
https://exxactcorp.com/blog/digging...instinct-gpus-miopen-gpu-accelerated-library/
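
Back-of-the-envelope check on those rack numbers in Python. The inputs are only the figures from the quote (3 petaflops, 120 GPUs, twelve servers, about $129,000), and the variable names are my own. Keep in mind the dollar figure covers servers and RAM too, so the per-GPU number is a share of the rack and not a GPU price.

```python
# Figures taken from the quoted article; nothing here is measured.
GPU_COUNT = 120
SERVER_COUNT = 6 + 6            # six 4U servers plus six 2U servers
RACK_PFLOPS = 3.0               # "up to 3 petaflops"
RACK_COST_USD = 129_000         # "around the $129,000 price range"

rack_tflops = RACK_PFLOPS * 1000
tflops_per_gpu = rack_tflops / GPU_COUNT
avg_gpus_per_server = GPU_COUNT / SERVER_COUNT
usd_per_gpu_share = RACK_COST_USD / GPU_COUNT
usd_per_tflop = RACK_COST_USD / rack_tflops

print(f"TFLOPS per GPU:       {tflops_per_gpu:.1f}")
print(f"Avg GPUs per server:  {avg_gpus_per_server:.0f}")
print(f"Rack cost per GPU:    ${usd_per_gpu_share:,.0f}")
print(f"Rack cost per TFLOP:  ${usd_per_tflop:,.0f}")
```

The roughly 25 TFLOPS per GPU that falls out looks like a half-precision peak and not FP32, but that is my reading and not something the quote states.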
 

exquisitechar

Senior member
Apr 18, 2017
722
1,019
136
A 5376 CUDA core GPU has more performance MHz for MHz than a 4096 SP GPU?
An 815mm^2 GPU is more expensive than a ~530mm^2 GPU?

Shocker lol

Well, if he's talking about GV100 then it's not much of a scoop, I agree. I just assumed he meant core for core as that would actually matter.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
The actual fact is that AMD GPUs have higher IPC than Nvidia GPUs (at least previous generations of NV architectures; GP100 and GV100 may be on par or higher in IPC). That's why they do not clock as high.
That only makes sense if you entirely disregard core counts, in which case you're not actually talking about IPC, but rather instructions per clock per die area or some such metric. Per core, Nvidia wins hands down, even at the same clocks, but AMD packs far more cores into the same area, making up for this. Now let's just cross our fingers for AMD pulling a serious clock bump off with Vega.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
I was skimming through this after it was posted in the Beyond3D forums, and it makes me think that GPU "IPC" isn't a viable metric for comparing different GPU architectures.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
I was skimming through this after it was posted in the Beyond3D forums, and it makes me think that GPU "IPC" isn't a viable metric for comparing different GPU architectures.

When people talk about GPU "IPC" what they usually really mean is DX11 performance per TFlop. But that's a mouthful.

RX 480, at full boost clocks, has a peak throughput of 5.83 TFlops.
GTX 1060, at full boost clocks, has a peak throughput of 4.37 TFlops.
However, despite this, GTX 1060 is as fast or faster in most DX11 AAA titles. This is what people mean when they say that Pascal has higher "IPC" than GCN (Polaris).
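
For reference, those peak numbers fall out of shaders x 2 FLOPs per clock (one fused multiply-add) x boost clock. A small Python sketch follows. The shader counts and boost clocks are the public reference specs as I remember them (2304 SP at 1266 MHz for RX 480, 1280 cores at 1708 MHz for the 6GB GTX 1060), so treat them as assumptions, and `peak_tflops` is just an illustrative helper.

```python
def peak_tflops(shaders: int, boost_mhz: float) -> float:
    """Peak FP32 throughput, counting one FMA (2 FLOPs) per shader per clock."""
    flops_per_second = 2 * shaders * boost_mhz * 1e6
    return flops_per_second / 1e12


# Assumed reference specs (not stated in the post).
rx480 = peak_tflops(shaders=2304, boost_mhz=1266)
gtx1060 = peak_tflops(shaders=1280, boost_mhz=1708)

print(f"RX 480:   {rx480:.2f} TFlops")
print(f"GTX 1060: {gtx1060:.2f} TFlops")

# If both cards land at the same frame rate, the 1060 is getting this much
# more delivered performance out of each peak TFlop.
perf_per_tflop_advantage = rx480 / gtx1060
print(f"GTX 1060 perf/TFlop advantage at equal FPS: {perf_per_tflop_advantage:.2f}x")
```

So at equal DX11 frame rates the 1060 would be doing about a third more work per theoretical TFlop, which is what the "IPC" shorthand is gesturing at.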
 
Reactions: Headfoot

xpea

Senior member
Feb 14, 2014
458
156
116
From the link:

and from https://devblogs.nvidia.com/parallelforall/inside-volta/?ncid=so-twi-vt-13918


Figure 6: Tesla V100 Tensor Cores and CUDA 9 deliver up to 9x higher performance for GEMM operations. (Measured on pre-production Tesla V100 using pre-release CUDA 9 software.)


hmmm Vega must be very very cheap to be even relevant against V100 at Machine Learning workflows. Tensor cores are the kiss of death.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
We should get more information about AMD's GPU roadmap on May 16th at their Financial Analyst Day. AMD needs a Vega refresh at 14nm in 2018 to compete with GV102/GV104/GV106. We are unlikely to see 7nm GPUs from AMD and Nvidia before mid-2019. So AMD needs something to compete with Volta, or else they could end up going back to the sub-20% market share we saw during the Maxwell generation.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
A 5376 CUDA core GPU has more performance MHz for MHz than a 4096 SP GPU?
An 815mm^2 GPU is more expensive than a ~530mm^2 GPU?

Shocker lol
It may mean that Pascal will achieve higher core clocks, not that it has higher performance per clock.
 

iBoMbY

Member
Nov 23, 2016
175
103
86
hmmm Vega must be very very cheap to be even relevant against V100 at Machine Learning workflows. Tensor cores are the kiss of death.

So, it will probably be NVidia's fault when AI is going to kill us all? Also you get at least 10 MI25 for the price of a V100.
 

alcoholbob

Diamond Member
May 24, 2005
6,379
445
126
When people talk about GPU "IPC" what they usually really mean is DX11 performance per TFlop. But that's a mouthful.

RX 480, at full boost clocks, has a peak throughput of 5.83 TFlops.
GTX 1060, at full boost clocks, has a peak throughput of 4.37 TFlops.
However, despite this, GTX 1060 is as fast or faster in most DX11 AAA titles. This is what people mean when they say that Pascal has higher "IPC" than GCN (Polaris).

Using "IPC" just seems like confused jargon then. To me, if you are going to reduce it to an efficiency comparison, it should be something more like perf/mm^2, in which case Pascal is ahead by about 25% in DX12 and 35% in DX11 (once you factor in that GF 14nm is slightly denser than TSMC 16nm).
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
So, it will probably be NVidia's fault when AI is going to kill us all? Also you get at least 10 MI25 for the price of a V100.
Power consumption and space consumption are going to make a big difference here. If I can get 10x as many GPUs for the same price, but only the same performance as a single one from the other side, there's no contest.
 
Reactions: CatMerc