NVIDIA Pascal Thread

Sweepr · Apr 5, 2016

Arachnotronic said:
Gaming is NVIDIA's largest market by far & one of its fastest growing, if you think NV is taking its eye off the ball and giving AMD an opening I don't know what to tell you.

It's obvious that NVIDIA is building GPUs tailored for each application because it has the luxury to do so. They have the revenue in these major segments to justify doing an HPC/professional oriented GPU as well as a set of gaming-oriented versions which I'm sure we'll learn about closer to launch.

Agreed, and it is painfully obvious that today's event focused on HPC/professionals. The fact that they didn't demo any gaming-oriented Pascal GPUs today doesn't tell us much (if anything). Concerning the previous rumours, where there's smoke, there's fire - they could very well announce the new GeForces at Computex for a Q3 launch. GP104/GP106 should be the first if SweClockers is right.

Props to NVIDIA for creating this beast (P100). I'm updating the OP with the latest news today.

Adored · Apr 5, 2016

GeForce sells itself, Nvidia barely even has to try. It makes a lot more money, but that doesn't mean it's the priority for Nvidia. They've been trying to get out of their PC dependency since the late 2000s, when they realised just how vulnerable the company is to asshattery from Intel and to general losses to integrated graphics.

The kinds of numbers Nvidia posted the past 18 months will never be seen in PC graphics again. This deep learning stuff is really promising for them though - way better than previous attempts to get off their PC dependency.

parvadomus · Apr 5, 2016

Looks bad taking into account specs only. Probably 2xHawaii performance, if AMD does not get a better product this round they probably never will.

JDG1980 · Apr 5, 2016

My first reaction when I saw Nvidia's blog post on Pascal was: are you kidding me? Only 3584 shaders from a 610mm^2 chip on a new node shrink? That's pathetic. Maxwell has 3072 shaders, so for a similarly sized chip, we should be seeing 6144. Instead we get a measly ~17% increase (okay, 25% if you count the full die, which apparently even Tesla buyers won't be getting at first).

After thinking it over, I realized that this is strong evidence that the "GP102" rumors are true - and that Nvidia's double-precision HPC chips will be completely separate from their graphics chips from now on. We saw the first steps in this direction with GK210, which was a completely new revision of Kepler's big die that never saw a consumer-grade release. As others have pointed out, the blog post on GP100 contains no mention of ROPs at all. It could just be an oversight, but I think they aren't mentioned because they are not included on the chip. GP100 isn't designed to support video output at all, it's purely a HPC computing chip.

The apparent reason why Nvidia is doing this is that their method of DP computing takes up a tremendous amount of die space. To get the 1/2 DP/SP ratio, they need to have one DP CUDA core for every two SP CUDA cores. Assuming the DP cores need more hardware because they're wider, that means as much as half the chip's shader power is for DP support only, and completely useless both to gaming and to ordinary single-precision GPGPU apps. If a chip of this same size was done Maxwell-style with minimal DP support, we could be looking at a whopping 7680 shaders! Now that's more like it. That said, though, I don't think GP102 will be quite this massive. A stronger possibility is 6144 shaders, double GM200's count. Whether Nvidia will use HBM2 on GP102 is another question, but given their traditional conservatism when it comes to new memory types, I think a 384-bit GDDR5X bus is more likely. GP102 would then be the highest actual GPU in the Pascal lineup, powering the most expensive Quadro card and the next-generation Titan when it arrives.

We might well see a situation where even GP104 has more single-precision power than GP100. That would be unprecedented, but today's announcement is a spanner in the works and is different from everything that Nvidia has done up until now. If they really plan on having the GP100 being the best in their lineup all-around, and everything else inferior to it, then AMD is going to win this generation without a doubt.
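For anyone who wants to check the shader arithmetic in this post, here is a rough sketch in Python. The counts are the ones quoted above (3072 for GM200, 3584 enabled and 3840 on the full GP100 die, with the 1/2 DP/SP ratio). The idea that one DP core costs about as much die area as two SP cores is purely an assumption, not a published figure, and the function names are made up for illustration.

```python
# Shader counts quoted in the post.
GM200_SP = 3072           # full Maxwell big die
GP100_SP_ENABLED = 3584   # Tesla P100 as announced
GP100_SP_FULL = 3840      # full GP100 die


def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def sp_only_equivalent(sp_cores, dp_per_sp=0.5, dp_area_in_sp=2.0):
    """Estimate how many SP cores would fit if the DP cores were dropped.

    dp_per_sp:     DP cores per SP core (1/2 on GP100, per the post).
    dp_area_in_sp: ASSUMPTION - area of one DP core, measured in SP cores.
    """
    dp_cores = sp_cores * dp_per_sp
    return int(sp_cores + dp_cores * dp_area_in_sp)


if __name__ == "__main__":
    print(f"P100 vs GM200:       +{percent_increase(GP100_SP_ENABLED, GM200_SP):.1f}%")
    print(f"Full GP100 vs GM200: +{percent_increase(GP100_SP_FULL, GM200_SP):.1f}%")
    print(f"SP-only die, DP core = 2 SP cores: {sp_only_equivalent(GP100_SP_FULL)}")
    print(f"SP-only die, DP core = 1 SP core:  {sp_only_equivalent(GP100_SP_FULL, dp_area_in_sp=1.0)}")
```

Changing `dp_area_in_sp` shows how sensitive the 7680 figure is to the area guess: if a DP core is no bigger than an SP core, the estimate drops to 5760.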

JDG1980 · Apr 5, 2016

Rvenger said:
I can tell you that I sell way more Quadros than anything. Geforce is not Nvidia's primary focus. Not even close.

Don't most Quadro users care about single-precision GPGPU performance and rendering? GP100 is fairly lackluster on that front; sure, it beats the 28nm products, but by a far smaller margin than you'd expect. 10.6 TFlops of single-precision performance is only ~23% better than Fiji. We should be seeing at least 12.3 TFlops (double GM200's performance), and probably more since 16FF+ can apparently clock higher than its predecessor. Nvidia pretty much has to have a GP102 chip in the pipeline at this point.
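The TFlops figures above can be roughly reproduced with the usual peak-throughput rule of thumb (2 FLOPs per shader per clock, i.e. one fused multiply-add). The sketch below is only illustrative: the clock speeds are assumptions that are not stated in this post (P100 boost, Titan X base, Fury X reference clock), and the function name is made up.

```python
def peak_fp32_tflops(shaders, clock_mhz):
    """Paper peak FP32 throughput: 2 FLOPs (one FMA) per shader per clock."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12


# ASSUMED clocks, not taken from the post.
gp100 = peak_fp32_tflops(3584, 1480)   # Tesla P100 at boost clock
gm200 = peak_fp32_tflops(3072, 1000)   # Titan X (Maxwell) at base clock
fiji = peak_fp32_tflops(4096, 1050)    # Fury X

if __name__ == "__main__":
    print(f"GP100: {gp100:.2f} TFlops")
    print(f"Fiji:  {fiji:.2f} TFlops")
    print(f"GP100 over Fiji: +{(gp100 / fiji - 1) * 100:.0f}%")
    print(f"Double GM200: {2 * gm200:.2f} TFlops")
```

These are paper numbers only; they say nothing about how well each architecture keeps its shaders busy.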

Silverforce11 · Apr 5, 2016

Arachnotronic said:
Your understanding of NVIDIA's business needs some work. Tesla is a very, very small portion of NV's overall revenues; GeForce GTX is far more important to the company in terms of total revenue and gross profit dollars.

http://www.nextplatform.com/2015/05/08/tesla-gpu-accelerator-grows-fast-for-nvidia/

HPC and Enterprise (Tesla & Quadro) for 2015 FY is ~1B, while Gaming is ~2B.

But here's the kicker: when it comes to PROFIT and not just revenue, the margins on HPC/Enterprise are much higher than the ~50% on gaming. This is where they actually earn their profit.

HPC in particular is a strong growth market. Whereas gaming is tapped, due to their taking of AMD's marketshare (80:20), it's not likely to grow that much in the future and potentially drop if AMD is competitive.

GP100 is made for Tesla, a compute beast. But it is still going to be very good for gaming: with its new GCN-like SM/CC design, it will be a lot faster than Maxwell in console ports.

This is where NV's focus is in case you didn't see the talk:

JDG1980 · Apr 5, 2016

Arachnotronic said:
AMD is hyping Polaris 10/11 because their current GPU sales are in the toilet and by hyping them up in the press, they would create an image of "being ahead" of NV (no doubt to help boost its stock price). NV, which actually makes a lot of $ from selling GPUs obviously doesn't want to signal to gamers that "hey, the stuff we're trying to sell you is crappy and obsolete, wait for the new stuff!"

The fact remains that AMD has demonstrated working FinFET GPU silicon, and Nvidia has not.

Arachnotronic said:
Anyway, the point is that there were some really comical posts across the web claiming that NVIDIA hadn't taped out Pascal when in fact they had not only taped out a monster of a GPU using state-of-the-art process tech & packaging technology but have now gone into volume production.

And yet they couldn't seem to spare an actual GP100 unit for the presentation. Instead all we saw is a 3D render.

Head1985 · Apr 5, 2016

JDG1980 said:
My first reaction when I saw Nvidia's blog post on Pascal was: are you kidding me? Only 3584 shaders from a 610mm^2 chip on a new node shrink? That's pathetic. Maxwell has 3072 shaders, so for a similarly sized chip, we should be seeing 6144. Instead we get a measly ~17% increase (okay, 25% if you count the full die, which apparently even Tesla buyers won't be getting at first).

After thinking it over, I realized that this is strong evidence that the "GP102" rumors are true - and that Nvidia's double-precision HPC chips will be completely separate from their graphics chips from now on. We saw the first steps in this direction with GK210, which was a completely new revision of Kepler's big die that never saw a consumer-grade release. As others have pointed out, the blog post on GP100 contains no mention of ROPs at all. It could just be an oversight, but I think they aren't mentioned because they are not included on the chip. GP100 isn't designed to support video output at all, it's purely a HPC computing chip.

The apparent reason why Nvidia is doing this is that their method of DP computing takes up a tremendous amount of die space. To get the 1/2 DP/SP ratio, they need to have one DP CUDA core for every two SP CUDA cores. Assuming the DP cores need more hardware because they're wider, that means as much as half the chip's shader power is for DP support only, and completely useless both to gaming and to ordinary single-precision GPGPU apps. If a chip of this same size was done Maxwell-style with minimal DP support, we could be looking at a whopping 7680 shaders! Now that's more like it. That said, though, I don't think GP102 will be quite this massive. A stronger possibility is 6144 shaders, double GM200's count. Whether Nvidia will use HBM2 on GP102 is another question, but given their traditional conservatism when it comes to new memory types, I think a 384-bit GDDR5X bus is more likely. GP102 would then be the highest actual GPU in the Pascal lineup, powering the most expensive Quadro card and the next-generation Titan when it arrives.

We might well see a situation where even GP104 has more single-precision power than GP100. That would be unprecedented, but today's announcement is a spanner in the works and is different from everything that Nvidia has done up until now. If they really plan on having the GP100 being the best in their lineup all-around, and everything else inferior to it, then AMD is going to win this generation without a doubt.

A full GP100 with zero DP units would have 5760 SP.
Each SMX has 32 DP units and a full GP100 has 60 SMX, so 32 x 60 = 1920 DP units. 3840 + 1920 = 5760 SP.

If that is true and NV is making a gaming GP100 with zero DP units, then GP104 should have 3500-3800 SP.
GP104 with zero DP units (assuming it has 40 SMX) would have 2560 + 1280 (the former DP units) = 3840 SP.
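The same counting as a small Python sketch, in case anyone wants to plug in other SM counts. It assumes every DP unit is swapped one-for-one for an SP unit, and the 40-SM GP104 is only a guess; the names are made up for illustration.

```python
SP_PER_SM = 64   # FP32 units per SM on GP100
DP_PER_SM = 32   # FP64 units per SM on GP100


def sp_if_dp_replaced(sm_count):
    """Return (sp, dp, total) where total treats each DP unit as one extra SP unit.

    ASSUMPTION: a DP unit can be traded for exactly one SP unit.
    """
    sp = sm_count * SP_PER_SM
    dp = sm_count * DP_PER_SM
    return sp, dp, sp + dp


if __name__ == "__main__":
    print("Full GP100 (60 SMs):", sp_if_dp_replaced(60))
    print("Guessed GP104 (40 SMs):", sp_if_dp_replaced(40))
```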

raghu78 · Apr 5, 2016

P100 is a massive chip. With Pascal, Nvidia seems to have gone towards a more GCN-like architecture. But that SP:DP ratio of 2:1 seems to have come at a huge power cost: 300W for a single Tesla GPU. Wow, this seems to be a damn power-hungry chip. It's going to be interesting to see how Vega 10 and P100 stack up in DP compute, gaming perf and perf/watt. Pascal seems to have gone towards a lower number of cores at higher clocks, which costs power, since power generally scales with the square of voltage and a higher frequency requires a higher voltage. AMD might go for more units and lower clocks, having been criticized for poor power efficiency. Nice contest.
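To put rough numbers on the clocks-versus-units trade-off, here is a toy model in Python using the textbook relation that dynamic power scales with units x voltage^2 x frequency. Everything is relative to a baseline of 1.0, and the voltage and clock values for the "wide and slow" design are hypothetical, picked only to illustrate the idea; they are not real figures for any GPU.

```python
def relative_throughput(units, clock):
    """Throughput relative to a baseline design with units=1, clock=1."""
    return units * clock


def relative_dynamic_power(units, clock, volts):
    """Dynamic power relative to baseline, using P ~ units * V^2 * f."""
    return units * volts ** 2 * clock


if __name__ == "__main__":
    # Baseline: fewer units at a higher clock and voltage.
    narrow_fast = dict(units=1.0, clock=1.0, volts=1.0)
    # HYPOTHETICAL alternative: 25% more units, 20% lower clock, 10% lower voltage.
    wide_slow = dict(units=1.25, clock=0.8, volts=0.9)

    for name, cfg in (("narrow/fast", narrow_fast), ("wide/slow", wide_slow)):
        perf = relative_throughput(cfg["units"], cfg["clock"])
        power = relative_dynamic_power(cfg["units"], cfg["clock"], cfg["volts"])
        print(f"{name}: throughput {perf:.2f}, dynamic power {power:.2f}")
```

The model ignores leakage and the extra die area of the wider design, so it only shows the direction of the effect, not its size.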

Anyway we are going to see a Q1 2017 face off most likely between GP100 and Vega.

Mopetar · Apr 5, 2016

JDG1980 said:
My first reaction when I saw Nvidia's blog post on Pascal was: are you kidding me? Only 3584 shaders from a 610mm^2 chip on a new node shrink? That's pathetic.

I'm guessing that they're being conservative with the design in order to keep the yields up given the size of the chip and the maturity of the process.

JDG1980 · Apr 5, 2016

Head1985 said:
Each SMX has 32 DP units and a full GP100 has 60 SMX, so 32 x 60 = 1920 DP units. 3840 + 1920 = 5760 SP.

I am assuming that a DP shader core uses about twice the die area of a SP shader core. It may indeed be less than that, but DP cores pretty much have to be bigger.

Silverforce11 · Apr 5, 2016

Head1985 said:
A full GP100 with zero DP units would have 5760 SP.

If that is true and NV is making a gaming GP100 with zero DP units, then GP104 should have 3500-3800 SP.

Despite NV's dominance in both HPC and Gaming, their profits aren't that big. Look at the numbers.

Do you think they will waste more money making a gaming focused GP100 in parallel? On what production capability? TSMC 16nm FF is tapped out between all the mobile demands.

What do they make with the few 16nm wafers available at TSMC?

GP100 for Tesla that sells for over $10,000 (just look at the prices of their custom Pascal rig).

Or

GP100 for Gaming that sells for $1,000? With HBM2 adding to the expense.

Let's just play a very simple thought game for those who don't understand this concept.

NV sells the chip for ~$500 to go into a $1000 GTX SKU. With a 50% margin, the profit per chip is $250.

NV sells the same chip on a Tesla SKU for $9,500. They earn so much more profit for the same chip.

When production is limited and yields are low, it does not make any sense to rush to market a consumer GTX SKU for anything big.

There is no gaming-focused high FP32 GP100. It's illogical because it competes with the same 16nm wafers from TSMC. A big gaming chip is a recipe for failure unless the node is very good with excellent yields.

What they can do, and what they are doing, is make a big HPC chip that has insane profits, so it can offset any yield issues, basically paying for the wafer and more. Then the harvested chips will later go to consumer GTX SKUs, making the most of products and profit per wafer.
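The thought game above, written out as a quick Python sketch. It uses the prices from this post (~$500 per chip for GTX at a 50% margin, $9,500 on a Tesla SKU) and assumes the chip costs the same to make either way, which ignores the extra board, memory and support costs of a Tesla product. The function names are made up for illustration.

```python
def unit_cost(price, gross_margin):
    """Cost implied by a selling price and a gross margin (0.5 = 50%)."""
    return price * (1.0 - gross_margin)


def profit(price, cost):
    """Profit per chip at a given selling price and cost."""
    return price - cost


if __name__ == "__main__":
    chip_cost = unit_cost(500, 0.5)          # cost implied by the GTX example
    gtx_profit = profit(500, chip_cost)
    # ASSUMPTION: the same chip costs the same when sold as a Tesla.
    tesla_profit = profit(9500, chip_cost)

    print(f"GTX profit per chip:   ${gtx_profit:,.0f}")
    print(f"Tesla profit per chip: ${tesla_profit:,.0f}")
    print(f"Ratio: {tesla_profit / gtx_profit:.0f}x")
```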

Head1985 · Apr 5, 2016

Silverforce11 said:
Despite NV's dominance in both HPC and Gaming, their profits aren't that big. Look at the numbers.

Do you think they will waste more money making a gaming focused GP100 in parallel? On what production capability? TSMC 16nm FF is tapped out between all the mobile demands.

What do they make with the few 16nm wafers available at TSMC?

GP100 for Tesla that sells for over $10,000 (just look at the prices of their custom Pascal rig).

Or

GP100 for Gaming that sells for $1,000? With HBM2 adding to the expense.

Let's just play a very simple thought game for those who don't understand this concept.

NV sells the chip for ~$500 to go into a $1000 GTX SKU. With a 50% margin, the profit per chip is $250.

NV sells the same chip on a Tesla SKU for $9,500. They earn so much more profit for the same chip.

When production is limited and yields are low, it does not make any sense to rush to market a consumer GTX SKU for anything big.

There is no gaming-focused high FP32 GP100. It's illogical because it competes with the same 16nm wafers from TSMC. A big gaming chip is a recipe for failure unless the node is very good with excellent yields.

What they can do, and what they are doing, is make a big HPC chip that has insane profits, so it can offset any yield issues, basically paying for the wafer and more. Then the harvested chips will later go to consumer GTX SKUs, making the most of products and profit per wafer.

But GP100 doesn't have any ROPs, at least on the GPU diagram. There are no ROPs. It looks like a 100% compute SKU.

PhonakV30 · Apr 5, 2016

If we look at table "Pascal Compute Capability" in this link :

http://videocardz.com/58838/nvidia-announces-pascal-gp100-with-3840-cuda-cores

we can say that Nvidia will add a hardware scheduler for async compute in the next uArch.

Maverick177 · Apr 5, 2016

They did say Pascal has pre-emption, not sure to what degree.

JDG1980 · Apr 5, 2016

Silverforce11 said:
Do you think they will waste more money making a gaming focused GP100 in parallel? On what production capability? TSMC 16nm FF is tapped out between all the mobile demands.

Source? iPhone sales have been tapering off, with Apple facing its first year-over-year decline in iPhone sales since the first model debuted. That doesn't indicate to me that TSMC is "tapped out". To the contrary, it seems likely they're going to need more GPU orders to fill up their capacity.

That makes all of your other arguments moot, since they rely on TSMC capacity constraints that we have no reason to believe exist at this point.

Silverforce11 said:
What they can do, and what they are doing, is make a big HPC chip that has insane profits, so it can offset any yield issues, basically paying for the wafer and more. Then the harvested chips will later go to consumer GTX SKUs, making the most of products and profit per wafer.

Even if GP100 does have ROPs (and nothing so far indicates that), it would not make a good gaming chip. Yes, it could do better than Titan X, but it will definitely lose to Vega. Again, we're talking about a meager 25% increase in shader count from GM200. Unless Nvidia wants to write off the high-end gaming market, they will need a GP102 chip that focuses on shader performance and sacrifices double precision. Quadro sales will make it worthwhile (look at the price on the Maxwell-based Quadro M6000). Most Quadro users don't care about DP support any more than gamers do. DP support is now a niche category that gets its own chip. And in retrospect it already was - the HPC market got the exclusive GK210, and the non-HPC market got GM200.

Silverforce11 · Apr 5, 2016

Head1985 said:
But GP100 doesn't have any ROPs, at least on the GPU diagram. There are no ROPs. It looks like a 100% compute SKU.

It's definitely not full details. Note the TPC, what's that?

GP100

It's interesting they also seem to scale the FP32 and FP64 CC, with the FP64 being around 1.5x larger.

GM200

Why are they calling it that? TPC was the Texture Processing Cluster, used in the old days, pre-Fermi. It has no purpose anymore... unless it's something new. Thread Processing Cluster?

More details are needed.

Silverforce11 · Apr 5, 2016

JDG1980 said:
Source? iPhone sales have been tapering off, with Apple facing its first year-over-year decline in iPhone sales since the first model debuted. That doesn't indicate to me that TSMC is "tapped out". To the contrary, it seems likely they're going to need more GPU orders to fill up their capacity.

That makes all of your other arguments moot, since they rely on TSMC capacity constraints that we have no reason to believe exist at this point.

Even if GP100 does have ROPs (and nothing so far indicates that), it would not make a good gaming chip. Yes, it could do better than Titan X, but it will definitely lose to Vega. Again, we're talking about a meager 25% increase in shader count from GM200. Unless Nvidia wants to write off the high-end gaming market, they will need a GP102 chip that focuses on shader performance and sacrifices double precision. Quadro sales will make it worthwhile (look at the price on the Maxwell-based Quadro M6000). Most Quadro users don't care about DP support any more than gamers do. DP support is now a niche category that gets its own chip. And in retrospect it already was - the HPC market got the exclusive GK210, and the non-HPC market got GM200.

Apple isn't the only one tapping into TSMC's FF node. Come on. TSMC was last reported as saying 16nm FF is still only a small share of its projected revenue for 2016. It's ramping up, and definitely not what we could consider a mature node.

If it was mature, yeah, we could expect a consumer GP100 as a GTX in June, but it ain't going to happen. Q1 2017 for Tesla GP100 (a harvested chip itself, only 56/60 active) means there are serious yield issues.

This has all the hallmarks of Fermi, if you recall the launch as a harvested Tesla chip for 9 months before the consumer GTX 480 was released.

Gaming performance will not be only slightly better than Titan X. You're thinking CC = CC and that's not the case at all.

64 CC per SM means one warp (consoles optimized for GCN performance use a 64-wide wavefront/warp) will hit peak SM/CC utilization in Pascal.

Compare Kepler's CUDA core count with Maxwell's: Maxwell GM204 has fewer cores but beats the 780 Ti. You cannot base performance targets on CUDA core counts alone, or on paper TFlops. These assume 100% efficiency on these individual units. Not going to happen. What Pascal's change in the SM layout does is ensure it hits peak performance in games that are optimized for GCN.

http://www.computerbase.de/2016-04/...sser-pascal-soll-all-in-fuer-hpc-markt-gehen/

It seems I am not the only one who notices this:

It is noticeable that every Streaming Multiprocessor thus has only about half as many shader units as in Maxwell; at 64, it is the same number as in AMD's GCN architecture.

xorbe · Apr 5, 2016

New GPU + new fab tech can be a bring-up nightmare. What if they hit us with a surprise Maxwell shrink to 14nm of, say, the 980, as an interim before consumer Pascal hits later?

And we weren't hating when we said pascal Titan wouldn't launch this Spring. It's that og Titan (kepler) was March '13, then Titan X (maxwell) was March '15, so it seems really unlikely that they would slash cycle time in half suddenly. Titan Black was just a minor fuse-off adjustment not a new product family, just sayin'. Maybe they could roll out a refreshed maxwell Titan X shrunk to 14nm, but that seems unlikely to me, at this point. Maybe one of the lesser chips like I mentioned above.

Arachnotronic · Apr 5, 2016

Rvenger said:
I can tell you that I sell way more Quadros than anything. Geforce is not Nvidia's primary focus. Not even close.

No.

"Gaming", which is really GeForce GTX, is ~4x the size of Professional Visualization.

nsavop · Apr 5, 2016

JDG1980 said:
The fact remains that AMD has demonstrated working FinFET GPU silicon, and Nvidia has not.

And yet they couldn't seem to spare an actual GP100 unit for the presentation. Instead all we saw is a 3D render.

Do you not believe Nvidia has working finfet gpus? With P100 in mass production and shipping in June in the DGX-1? When did it become a thing that companies need to demo future products for them to be real? Someone needs to tell Intel, Apple etc. they need to get with the program and start demoing unreleased products.

Arachnotronic · Apr 5, 2016

Fastest growth segment for NV outside of Autos is GeForce GTX...

Gaming is the lifeblood of NVIDIA's business, plain and simple. It is extremely unlikely that they neglect GeForce GTX for Quadro/Tesla.

jpiniero · Apr 5, 2016

Silverforce11 said:
Apple isn't the only one tapping into TSMC's FF node. Come on. TSMC was last reported as saying 16nm FF is still only a small share of its projected revenue for 2016. It's ramping up, and definitely not what we could consider a mature node.

TSMC said that 16FF and 20p were 20% of their revenue in 2015. It was 24% in the 4Q. That doesn't sound like much but 28 nm was 28% for the year. That doesn't sound like a node that's "not mature".

Silverforce11 · Apr 5, 2016

nsavop said:
Do you not believe Nvidia has working finfet gpus? With P100 in mass production and shipping in June in the DGX-1?

Ofc they have.

8x GP100 per cluster for ~$127K? Heck yes! Profits!

Much better than selling 8x GTX Titan Pascal.

Arachnotronic · Apr 5, 2016

nsavop said:
Do you not believe Nvidia has working finfet gpus? With P100 in mass production and shipping in June in the DGX-1? When did it become a thing that companies need to demo future products for them to be real? Someone needs to tell Intel, Apple etc. they need to get with the program and start demoing unreleased products.

It's the argument that the NVIDIA haters who spent months spreading FUD about how NV hadn't taped out Pascal are resorting to in the face of evidence to the contrary.

It's clear now that the Drive PX2 will use low-end Pascal GPUs and those typically come out after the higher end/higher-margin GeForce GTX and Tesla/Quadro parts, which is why they are using Maxwell now but will switch to Pascal in Q3.

It never made sense to think that NVIDIA would so ineptly drop the ball as some have suggested, and the evidence is finally coming out.
