Nvidia Pascal Lineup Speculation

nvgpu · Apr 7, 2016

http://cdn.videocardz.com/1/2016/04/NVIDIA-GP106-GTC-2016-vs-GM206-GPU.png

GP106 is smaller than GM206.

Qwertilot · Apr 7, 2016

Only barely so, looking at it, so it'll presumably be ~75% faster? Unless it ends up bandwidth throttled, of course.

RussianSensation · Apr 7, 2016

Qwertilot said:
Only barely so, looking at it, so it'll presumably be ~75% faster? Unless it ends up bandwidth throttled, of course.

Based on what? It's NV = milk, milk, milk. 960 is 13-14% faster than 760 and 960 came out 1.5 years later at $200.

The speed you just proposed would put GP106 at 93% of the performance of a 980 at 1440p (35% x 1.75 = 61.25%, and 61.25/66 => 93%).

jpiniero · Apr 7, 2016

nvgpu said:
http://cdn.videocardz.com/1/2016/04/NVIDIA-GP106-GTC-2016-vs-GM206-GPU.png

GP106 is smaller than GM206.

Are they sure that's the dGPU and not the Drive's CPU?

MrTeal · Apr 7, 2016

jpiniero said:
Are they sure that's the dGPU and not the Drive's CPU?

Yeah. It's from the GTC Keynote, it was on the back of the PX2. The two chips are extremely different.

moonbogg · Apr 7, 2016

Nvidia will milk the ever loving Christian cow out of this next gen. Don't expect any big jumps. They want you to buy once, then twice, then realize you are silly for wasting money on titans, sell them, then buy thrice.

JDG1980 · Apr 7, 2016

nvgpu said:
http://cdn.videocardz.com/1/2016/04/NVIDIA-GP106-GTC-2016-vs-GM206-GPU.png

GP106 is smaller than GM206.

GP106 could easily be smaller than GM206 and still provide better performance than GM204.

GM206 is a 228mm^2 chip with 2.94 billion transistors. If we assume that 16FF+ has roughly double the transistor density of the 28nm process, a GM206-sized chip would be able to have 5.88 billion transistors. In comparison, GM204 only needs 5.2 billion. If they just did a straight die-shrink of GM204, it would be right around 200mm^2.
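
The density arithmetic above can be checked with a quick sketch. It takes the post's premise that 16FF+ roughly doubles transistor density over 28nm. The 398 mm^2 die size for GM204 is not stated in the thread and is assumed here, and the function names are made up for illustration.

```python
# Rough die-shrink arithmetic under a flat 2x density gain (the post's premise).
DENSITY_GAIN = 2.0  # assumed 16FF+ vs 28nm transistor density ratio


def transistor_budget(billions_on_28nm, density_gain=DENSITY_GAIN):
    """Transistors (in billions) that fit in the same area on the new node."""
    return billions_on_28nm * density_gain


def shrunk_area(area_mm2_on_28nm, density_gain=DENSITY_GAIN):
    """Area (mm^2) of the same design after a straight shrink."""
    return area_mm2_on_28nm / density_gain


gm206_budget = transistor_budget(2.94)  # GM206-sized die on 16FF+
gm204_shrunk = shrunk_area(398.0)       # 398 mm^2 is an assumed GM204 die size

print(f"GM206-sized die on 16FF+: {gm206_budget:.2f}B transistors")
print(f"GM204 straight shrink: {gm204_shrunk:.0f} mm^2")
```

Under these assumptions the GM206-sized budget exceeds the 5.2 billion transistors GM204 needs, and the shrunk GM204 lands just under 200 mm^2.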

In fact, I think GP106 pretty much will be a slightly modified die-shrink of GM204. It will have FP16 support added (the Drive PX 2 needs that), an updated video decoder block, a few other minor tweaks here and there, and a 128-bit GDDR5X bus instead of 256-bit standard GDDR5. But in terms of actual shader power and ROPs, I suspect it will be almost identical. The performance benefits of GP106 over GM204 will come from much higher clock speeds - the announced specs for Tesla P100 indicate that the FinFET process can go far faster than 28nm.

JDG1980 · Apr 8, 2016

RussianSensation said:
Based on what? It's NV = milk, milk, milk. 960 is 13-14% faster than 760 and 960 came out 1.5 years later at $200.

That's a poor comparison for two reasons. First, the 760 is a cut GK104 chip, so it's one full size class higher. Instead, you should be comparing 660 (GK106) to 960 (GM206). In this review, done around the time GM206 was first released, we see that the stock GTX 960 does about 37% better than the stock GTX 660.

Secondly, and more importantly, there was no node shrink from Kepler->Maxwell. That was done with architectural improvements alone (a minor miracle, IMO). You can't use that as precedent for what will happen on a new node. In the past, a node shrink usually meant 60%-100% increases in performance at the same mm^2. For instance, GTX 680 (full GK104) beat GTX 560 Ti (full GF114) by ~59%, even though the old Fermi chip had ~13% more die area. That means an ~80% boost in perf/mm^2.
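
The perf/mm^2 figure follows from multiplying the two ratios in the example above. A minimal sketch, using only the percentages quoted in the post (the function name is hypothetical):

```python
def perf_per_area_gain(perf_ratio, old_area_over_new_area):
    """Gain in performance per unit die area.

    perf_ratio: new chip performance / old chip performance
    old_area_over_new_area: old die area / new die area
    """
    return perf_ratio * old_area_over_new_area


# GTX 680 is ~59% faster; the GF114 die is ~13% larger than GK104.
gain = perf_per_area_gain(1.59, 1.13)
print(f"perf/mm^2 gain: {(gain - 1) * 100:.0f}%")
```

With those two inputs the result comes out at roughly 80%.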

RussianSensation said:
The speed you just proposed would put GP106 at 93% of the performance of a 980 at 1440p (35% x 1.75 = 61.25%, and 61.25/66 => 93%).

I expect GP106 to be faster than GM204, just as GK106 beat out GF114. The combination of the added transistor density and higher clocks enabled by FinFET means that Nvidia would have to drop the ball pretty badly to not match this expectation. And they can't afford to hold back, because AMD won't be. Do you think Nvidia wants to have GP106 be considerably slower than Polaris 10?

Silverforce11 · Apr 8, 2016

GP106 should easily match GM204, that's the point of a next-gen node after all. 2x the transistors.

Now that we know GP100 = 3,840 CC, with 1:2 FP64, made up of 6 GPCs.

How to get the lower variants the most cost-effective way? Don't need a whole new uarch, just chop up the GPCs like NV has been doing for over a decade! Before anyone says that this gen is different from all the historic precedent, they need to come up with a really compelling case as to why NV would waste so much more R&D $ to change the status quo.

GP104 = 2,560 CC, maybe 1:32 FP64. 4 GPCs.

GP106 = 1,280 CC. 2 GPCs.

GP107 = 640 CC. 1 GPC.

Don't be offended by the lower CC numbers, paper specs don't mean much without knowing the real uarch gains, IPC and all.

Kepler GK110 has higher CC than GM204, yet is a lot slower, worse in modern games by far, as the 980 routinely stomps on the 780Ti of late.

ps. There is a possibility of GP102, a gaming focused GP100, stripped of FP64. Such a chip could have 3840 CC and be ~400-450mm2, much smaller than GP100 and yield better.

JDG1980 · Apr 8, 2016

xpea said:
up
After GP100 announcement and before Pascal consumer cards, let's review predictions and make new ones...

Well, I made two clear errors in my original set of predictions. First of all, I failed to foresee that Nvidia would be creating GP100 as a dedicated HPC chip, and not using it for any gaming cards at all. Secondly, at the time I made that post, the existence of GDDR5X was not public knowledge yet, so I overestimated the necessary bus widths.

My expectation at this point is still that most of the new Pascal chips will provide around 2x the shader cores of their Maxwell counterparts. Bus widths should be the same as on the corresponding Maxwell chips, but should be using GDDR5X to get more bandwidth. And clocks will be way up, due to the use of a FinFET process. It's also looking like the GP102 rumors are probably true, because if not, then Nvidia won't have anything to compete with Vega 10 when that comes out in late 2016 - early 2017.

Silverforce11 · Apr 8, 2016

JDG1980 said:
Well, I made two clear errors in my original set of predictions. First of all, I failed to foresee that Nvidia would be creating GP100 as a dedicated HPC chip, and not using it for any gaming cards at all.

Do not be so sure.

When GK110 was first revealed, its diagram also had no ROPs, but the consumer GTX diagram added the ROPs.

There's ZERO historic precedent for not using these chips for both HPC and gaming.

3840 CC vs 2880 CC is a ~33% increase. Add 20% higher clocks, we've reached 50%. Add 20% IPC gains, or even 30% in GCN-optimized engines, and you have 80%.

Once HPC demands are met and yields on 16nm FF improve, NV will sell GP100 as GTX Titan class, priced higher, probably $1249 - $1499, with 16GB HBM2. Who would buy it you say? Well, a lot of folks who bought Titan don't care about price. And if it keeps the 1:2 FP64 unlocked, that's justification enough, a "Prosumer SKU".

JDG1980 · Apr 8, 2016

Silverforce11 said:
Don't be offended by the lower CC numbers, paper specs don't mean much without knowing the real uarch gains, IPC and all.

Pascal isn't a major architectural change from Maxwell. In fact, Maxwell wasn't even part of Nvidia's original roadmap; it's basically 90% of Pascal back-ported to 28nm.

GP100 has a low core count because it wastes an insane amount of die space on FP64, due to Nvidia's inefficient method of supporting that feature. There's no magic that makes an individual GP100 CUDA core better than a GM200 CUDA core. FinFET enables higher speeds, but that's down to the process, not the architecture. I stand by my prediction that GP104 will have more FP32 power than GP100.

Silverforce11 · Apr 8, 2016

JDG1980 said:
GP100 has a low core count because it wastes an insane amount of die space on FP64, due to Nvidia's inefficient method of supporting that feature. There's no magic that makes an individual GP100 CUDA core better than a GM200 CUDA core.

It's not magic, it's just minor tweaks here and there that lead to a nice IPC gain.

More cache, registers per CC. Optimal 64 warp -> 64 CC setup. On Maxwell, it was 128 CC for 2x 32 Warp Schedulers, it had to fire twice to reach all 128CC. With Pascal, it hits 100% CC utilization in one cycle.

Do the maths.

33% more cores, 20% higher clocks, 20% IPC gains, it's a nice result.
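
As a quick check on the maths, here is a minimal sketch that compounds the three quoted factors. It assumes the gains are independent and multiply together, which the post does not state; the variable names are illustrative only.

```python
# Compound the three gains quoted in the post, assuming they multiply.
core_ratio = 3840 / 2880  # GP100 vs GK110 CUDA core counts (~1.33x)
clock_ratio = 1.20        # assumed 20% higher clocks
ipc_ratio = 1.20          # assumed 20% IPC gain

total = core_ratio * clock_ratio * ipc_ratio
print(f"cores only:        +{(core_ratio - 1) * 100:.0f}%")
print(f"cores and clocks:  +{(core_ratio * clock_ratio - 1) * 100:.0f}%")
print(f"all three factors: +{(total - 1) * 100:.0f}%")
```

If the factors really do multiply, cores and clocks together give about 60% and all three give about 92%, a little higher than the 50% and 80% figures quoted in the thread.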

JDG1980 · Apr 8, 2016

Silverforce11 said:
Do not be so sure.

When GK110 was first revealed, its diagram also had no ROPs, but the consumer GTX diagram added the ROPs.

There's ZERO historic precedent for not using these chips for both HPC and gaming.

Sure there is: GK210. Separate mask, never used for anything except Tesla K80M.

Silverforce11 said:
3840 CC vs 2880 CC is a ~33% increase. Add 20% higher clocks, we've reached 50%. Add 20% IPC gains, or even 30% in GCN-optimized engines, and you have 80%.

Once HPC demands are met and yields on 16nm FF improve, NV will sell GP100 as GTX Titan class, priced higher, probably $1249 - $1499, with 16GB HBM2. Who would buy it you say? Well, a lot of folks who bought Titan don't care about price. And if it keeps the 1:2 FP64 unlocked, that's justification enough, a "Prosumer SKU".

That only works if Nvidia doesn't have any competition. The word from the GTC 2016 presentation is that even the Tesla P100 card won't be available to the public until Q1 2017. (All the chips for 2016 are going into the $129,000 DGX-1 boxes.) Thus, if they were going to make a Titan from GP100, we can assume it wouldn't be available until even later than that. (For reference, GK110 Tesla hit the mass market in November 2012; GK110 Titan not until February 2013.)

AMD is expected to have Vega 11 and 10 out in Q4 2016 - Q1 2017. Even if the biggest Vega is only about Hawaii-sized (~450mm^2) it could still beat GP100 if Nvidia was foolish enough to use that HPC card as their gaming flagship. Note how Hawaii managed to match or beat the much larger GK110 chip in gaming, especially newer titles.

antihelten · Apr 8, 2016

nvgpu said:
http://cdn.videocardz.com/1/2016/04/NVIDIA-GP106-GTC-2016-vs-GM206-GPU.png

GP106 is smaller than GM206.

I did a bit of measuring and GP106 in that picture is roughly 63% the size of GM206, which would put it at 142-143 mm2.

Assuming a doubling in transistor density from 16FF+* and a corresponding doubling in functional units, then we end up with 1290 cores if we use GM206 as a basis, and 1472 cores if we use GM204 as a basis. This would round off to 10 SMs (1280 cores total) and 12 SMs (1536 cores total) respectively. Using GM206 as a basis is probably more accurate since it probably has a ratio between shaders and uncore closer to what GP106 would have.

1280 cores at 1480 MHz (same frequency as P100), would put GP106 at roughly 10% slower than a reference GTX 970. 1536 cores at 1480 MHz would put GP106 at roughly 10% slower than a GTX 980.

*It's worth noting that GP100 achieved an 88% increase in transistor density relative to GM200, and a 95% increase relative to GK110.
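
The core-count estimate above can be reproduced with a short sketch. Several inputs are assumptions not given in the post: 1024 and 2048 CUDA cores for GM206 and GM204, a 398 mm^2 GM204 die, and a 227 mm^2 GM206 die (the area implied by 143 / 0.63). The peak-throughput helper counts one fused multiply-add, i.e. 2 FLOPs, per core per clock; both function names are hypothetical.

```python
def scaled_cores(ref_cores, ref_area_mm2, target_area_mm2, density_gain=2.0):
    """Scale a reference chip's core count by die area, then by the density gain."""
    return ref_cores * (target_area_mm2 / ref_area_mm2) * density_gain


def fp32_tflops(cores, clock_mhz):
    """Peak FP32 throughput in TFLOPS, at 2 FLOPs per core per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


GP106_AREA = 143.0  # mm^2, from the measurement in the post

from_gm206 = scaled_cores(1024, 227.0, GP106_AREA)  # assumed GM206 specs
from_gm204 = scaled_cores(2048, 398.0, GP106_AREA)  # assumed GM204 specs

print(f"GM206 basis: {from_gm206:.0f} cores")
print(f"GM204 basis: {from_gm204:.0f} cores")
print(f"1280 cores @ 1480 MHz: {fp32_tflops(1280, 1480):.2f} TFLOPS")
print(f"1536 cores @ 1480 MHz: {fp32_tflops(1536, 1480):.2f} TFLOPS")
```

Peak FLOPS is only a paper figure, so the comparisons against the GTX 970 and GTX 980 still depend on how the clocks and architecture behave in real games.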

Silverforce11 · Apr 8, 2016

JDG1980 said:
That only works if Nvidia doesn't have any competition.

Don't worry man, according to nvgpu's claims from Digitimes, AMD is only paper launching, they got nothing to show, so no competition for NV to worry about.

http://forums.anandtech.com/showpost.php?p=38152125&postcount=17

airfathaaaaa · Apr 8, 2016

Silverforce11 said:
Don't worry man, according to nvgpu's claims from Digitimes, AMD is only paper launching, they got nothing to show, so no competition for NV to worry about.

http://forums.anandtech.com/showpost.php?p=38152125&postcount=17

I always find it funny how people tend to just wipe the recent past, with its two live demos, only to make their statement look better.

xthetenth · Apr 8, 2016

airfathaaaaa said:
I always find it funny how people tend to just wipe the recent past, with its two live demos, only to make their statement look better.

Mockups are a more reliable indicator of working chips than working chips, doncha know?

LTC8K6 · Apr 8, 2016

nvgpu said:
http://cdn.videocardz.com/1/2016/04/NVIDIA-GP106-GTC-2016-vs-GM206-GPU.png

GP106 is smaller than GM206.

I can't tell if those are scaled correctly, though.

el etro · Apr 8, 2016

Won't get much into it, but I think that Pascal will use GDDR5X on all but the high-end and halo class, and will have 2x gaming perf/watt vs. Maxwell.
