NVIDIA Pascal Thread

RussianSensation · Apr 5, 2016

Kris194 said:
It seems that Pascal will smash Maxwell cards

That's not even up for debate. What's rather interesting is über high clocks + narrow chip compared to what some of us expected (the 16nm FinFET CC count isn't a big jump when 28nm was 3072 CCs). That could be a lifeline to AMD. 780Ti/980Ti were wide, huge chips and both had great overclocking headroom. This time with Pascal NV isn't going that much wider than GM200. Of course we don't know if there is a corresponding increase in IPC with those CUDA cores (128 Maxwell cores were 90% as efficient as 192 Kepler cores = 35% increase in IPC). But my point is this could be the first time in a while where AMD actually has a choice of going a lot more SPs @ lower clocks OR high clocks with fewer cores. If AMD can manage a wide design (5120-6144) and still have overclocking headroom, then this gen will be a lot more interesting if GP100 maxes out at 3840.

Where are you guys getting that GP100 has no ROPs?
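For anyone wondering where the 35% figure comes from, here is a minimal sketch of the arithmetic, assuming "90% as efficient" means 128 Maxwell cores deliver 0.90x the performance of 192 Kepler cores at the same clock. The function name is made up for illustration.

```python
def per_core_gain(new_cores, old_cores, relative_perf):
    """Per-core throughput of the new design relative to the old one.

    relative_perf is the performance of `new_cores` new-style cores
    expressed as a fraction of `old_cores` old-style cores.
    """
    return relative_perf * old_cores / new_cores

# Numbers quoted in the post: 128 Maxwell cores ~ 90% of 192 Kepler cores.
gain = per_core_gain(new_cores=128, old_cores=192, relative_perf=0.90)
print(f"per-core gain: {(gain - 1) * 100:.0f}%")
```

This only reproduces the ratio; it says nothing about which benchmark the 90% was measured on.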

AtenRa · Apr 5, 2016

Timmah! said:
How does it work then?

Each Cuda Core has ALUs and FP units. Each ALU/FPU can operate at 16, 32 or 64 bit depending on the registers, datapath widths and operands.

There are no independent 32bit Cuda Cores and 64bit Cuda Cores. There is a single Cuda Core design whose ALUs/FPUs can operate at 32bit or 64bit.

For 64bit you need double the datapath width and operands of the 32bit. Thus you have 1/2 (half) the single (32bit) precision units (half the performance of the 32bit).
But you may want to narrow your 64bit even further and not all your Cuda Cores have 64bit datapath widths, in order to save space and make your IC simpler. So perhaps you want only 1 out of 4 Cuda Cores to have a datapath width of 64bits. Thus you will have 1/4 the 32bit performance etc etc.
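A rough sketch of the two cases above, treating the FP64:FP32 rate simply as the ratio of 64-bit-wide lanes to 32-bit lanes. The block size of 128 and the helper name are placeholders for illustration, not figures from any spec sheet.

```python
from fractions import Fraction

def fp64_to_fp32_ratio(fp32_lanes: int, fp64_lanes: int) -> Fraction:
    """FP64 throughput as a fraction of FP32 throughput (simplified model)."""
    return Fraction(fp64_lanes, fp32_lanes)

cores = 128  # hypothetical block of cores

# Case 1: pairs of 32-bit lanes are ganged together for each 64-bit operation.
half_rate = fp64_to_fp32_ratio(cores, cores // 2)

# Case 2: only 1 out of 4 cores has a 64-bit wide datapath.
quarter_rate = fp64_to_fp32_ratio(cores, cores // 4)

print(half_rate, quarter_rate)
```

Real chips also differ in scheduling and register bandwidth, so treat this as counting units only.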

Arachnotronic · Apr 5, 2016

Didn't somebody promise us that AMD was 6+ months ahead of NVIDIA in terms of a 14/16nm GPU? NV is in volume production today on this monster while all AMD has talked about are relatively small Polaris chips.

Lol.

BlitzWulf · Apr 5, 2016

Anyone posted this yet?

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Arachnotronic · Apr 5, 2016

BlitzWulf said:
Anyone posted this yet?

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Interesting quote:

While TSMC’s 16nm Fin-FET manufacturing process plays an important role, many GPU architectural modifications were also implemented to further reduce power consumption while maintaining high performance.

Guess the people who said Pascal was Maxwell on 16FF+ were dead wrong.

AtenRa · Apr 5, 2016

Arachnotronic said:
Didn't somebody promise us that AMD was 6+ months ahead of NVIDIA in terms of a 14/16nm GPU? NV is in volume production today on this monster while all AMD has talked about are relatively small Polaris chips.

Lol.

I don't know if AMD is 6+ months ahead or if they are ahead at all, but I'm 110% sure we will see and use 14nm AMD GPUs before GP100 products reach the market.

MrTeal · Apr 5, 2016

RussianSensation said:
That's not even up for debate. What's rather interesting is über high clocks + narrow chip compared to what some of us expected (the 16nm FinFET CC count isn't a big jump when 28nm was 3072 CCs). That could be a lifeline to AMD. 780Ti/980Ti were wide, huge chips and both had great overclocking headroom. This time with Pascal NV isn't going that much wider than GM200. Of course we don't know if there is a corresponding increase in IPC with those CUDA cores (128 Maxwell cores were 90% as efficient as 192 Kepler cores = 35% increase in IPC). But my point is this could be the first time in a while where AMD actually has a choice of going a lot more SPs @ lower clocks OR high clocks with fewer cores. If AMD can manage a wide design (5120-6144) and still have overclocking headroom, then this gen will be a lot more interesting if GP100 maxes out at 3840.

Where are you guys getting that GP100 has no ROPs?

By what metric do you mean that 128 Maxwell cores come within 90% of 192 Kepler cores? One Maxwell core had the same number of FLOPs per clock as one Kepler core, obviously. Even in gaming at launch, normalized for frequency and core count, Titan X was only 12% faster than 780Ti at 1080p, but there are a lot more things than the cores that affect that.

Kenmitch · Apr 5, 2016

Arachnotronic said:
Interesting quote:

Guess the people who said Pascal was Maxwell on 16FF+ were dead wrong.

Nothing really has been proven wrong yet. No info = nothing as far as the cards for mortals go.

Unless... you count the thread title.

" NVIDIA Pascal Thread - Geforce GTX 1080 launching in May "

Not sure if the odds are slim to none yet, but as each day goes by May is getting closer and closer.

Really wish some real leaks, demos, etc would drop pretty soon.

Snarf Snarf · Apr 5, 2016

Arachnotronic said:
Didn't somebody promise us that AMD was 6+ months ahead of NVIDIA in terms of a 14/16nm GPU? NV is in volume production today on this monster while all AMD has talked about are relatively small Polaris chips.

Lol.

We still don't know if Nvidia is actually ahead of AMD in anything other than a monstrous die that mere mortals won't get to touch for 18 months. No mention of GP104 at all, not even a "this is coming soon" hint. AMD showed the industry working parts actually running games with drivers... still looks like AMD is ahead right now (at least for the time being)

I think it goes without saying this chip really is a marvel of engineering, but Nvidia needed this badly, and needed it fast. High margin low volume massive die vs high volume low margin smaller dies on consumer space is an interesting twist here. My bet is on Nvidia coming out ahead here from a financial standpoint. Maybe AMD can claw some market share back for a couple of months while Nvidia is still (maybe) waiting on GDDR5X

RussianSensation · Apr 5, 2016

Arachnotronic said:
Interesting quote:

Guess the people who said Pascal was Maxwell on 16FF+ were dead wrong.

I think the general theme in regard to Pascal = refined Maxwell + HBM2 + 16nm FinFET came directly from NV's slide. From what you posted, all it tells us is higher priority on perf/watt but doesn't mean the architecture is dramatically different. Sure, it's improved but doesn't sound like a major redesign as we saw with VLIW-4/5 --> GCN. That was revolutionary. Pascal restructured CCs to 64 per SM and added HBM2. Not a single word was spoken on Asynchronous Compute suggesting all those people who said Pascal won't be a huge leap in this area are probably correct. Seems when Pascal was designed 3-4 years ago, they went all in on AI/Neural Networks/Deep Learning. That means AMD can just retain 8 ACEs as they will still be way ahead in Async, and focus on rasterizer, culling, geometry shaders, TMUs, ROPs and SPs. This is another lifeline to AMD in the gaming market since NV clearly prioritized different areas with Pascal. I think NV has the right approach as those are faster growing markets than gaming. However, I truly think this is a huge opportunity for AMD. For the first time in a long time, Pascal doesn't sound mind-blowing for games/Async. Seems the heavy focus on compute side took its toll finally. Otherwise, wouldn't we have expected a 1200+MHz 4500-6000 CUDA Async Compute 610mm2 monster? Are you honestly not shocked it's "only" a 3840 chip clocked at nearly 1.5Ghz?

Most of the features unveiled are professional: NV-Link, FP64. Those don't matter for GeForce. I didn't see anything about Pascal itself that is revolutionary vs. Maxwell. Sounds like more of the same, just improved courtesy of FinFET and HBM and huge L2 cache. I am going to wager that GCN is going to be a bigger leap in perf/mm2 vs. GCN 1.0/1.1 than Pascal is against Maxwell. I mean they literally almost upped the clocks 40%. 25% is coming from more CCs. That alone accounts for about a 70% improvement over Maxwell. How much more will be from new architecture? Maybe another 15-20%? Granted NV is in the driver seat and AMD is asleep at the wheel with small Polaris 10/11 chips; so not like NV cares
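A quick sketch of how those percentages compound, assuming clock and core-count gains multiply and scaling is perfect (which real games won't give you). With the full 40% and 25% the product is 1.75x; "almost 40%" lands a bit lower, which fits "about 70%". The helper name is made up for illustration.

```python
def combined_speedup(*factors):
    """Multiply independent speedup factors (1.40 means +40%)."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total

# Clock and core-count gains quoted in the post.
clocks_and_cores = combined_speedup(1.40, 1.25)
print(f"clocks + cores: +{(clocks_and_cores - 1) * 100:.0f}%")

# Adding the guessed 15-20% from architecture on top.
low = combined_speedup(1.40, 1.25, 1.15)
high = combined_speedup(1.40, 1.25, 1.20)
print(f"with architecture: {low:.2f}x to {high:.2f}x")
```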

airfathaaaaa · Apr 5, 2016

Arachnotronic said:
Interesting quote:

Guess the people who said Pascal was Maxwell on 16FF+ were dead wrong.

How so? They are identical. It's like they slapped a Kepler on a Maxwell, put in some of GCN 1.0 and voila.
Also, a card that is almost 100% pure compute with no ROPs isn't really a measuring point for gaming, is it now D:

And since we didn't see anything gaming-wise, it's safe to say that we will see NVIDIA cards from Q3 at best.

Arachnotronic · Apr 5, 2016

RussianSensation said:
I think the general theme in regard to Pascal = refined Maxwell + HBM2 + 16nm FinFET came directly from NV's slide. From what you posted, all it tells us is higher priority on perf/watt but doesn't mean the architecture is dramatically different. Sure, it's improved but doesn't sound like a major redesign as we saw with VLIW-4/5 --> GCN. That was revolutionary. Pascal restructured CCs to 64 per SM and added HBM2. Not a single word was spoken on Asynchronous Compute suggesting all those people who said Pascal won't be a huge leap in this area are probably correct. Seems when Pascal was designed 3-4 years ago, they went all in on AI/Neural Networks/Deep Learning. That means AMD can just retain 8 ACEs as they will still be way ahead in Async, and focus on rasterizer, culling, geometry shaders, TMUs, ROPs and SPs. This is another lifeline to AMD in the gaming market since NV clearly prioritized different areas with Pascal. I think NV has the right approach as those are faster growing markets than gaming. However, I truly think this is a huge opportunity for AMD. For the first time in a long time, Pascal doesn't sound mind-blowing for games/Async. Seems the heavy focus on compute side took its toll finally. Otherwise, wouldn't we have expected a 1200+MHz 4500-6000 CUDA core 610mm2 monster? Are you honestly not shocked it's "only" a 3840 chip clocked at nearly 1.5Ghz?

Most of the features unveiled are professional: NV-Link, FP64. Those matter for squat for us. I didn't see anything about Pascal itself that is revolutionary vs. Maxwell. Sounds like more of the same, just improved courtesy of FinFET and HBM and huge L2 cache. I am going to wager that GCN is going to be a bigger leap in perf/mm2 vs. GCN 1.0/1.1 than Pascal is against Maxwell. I mean they literally almost upped the clocks 40%. 25% is coming from more CCs. That alone accounts for about a 70% improvement over Maxwell. How much more will be from new architecture? Maybe another 15-20%? Granted NV is in the driver seat and AMD is asleep at the wheel with small Polaris 10/11 chips; so not like NV cares

Gaming is NVIDIA's largest market by far & one of its fastest growing, if you think NV is taking its eye off the ball and giving AMD an opening I don't know what to tell you.

It's obvious that NVIDIA is building GPUs tailored for each application because it has the luxury to do so. They have the revenue in these major segments to justify doing an HPC/professional oriented GPU as well as a set of gaming-oriented versions which I'm sure we'll learn about closer to launch.

AMD is hyping Polaris 10/11 because their current GPU sales are in the toilet and by hyping them up in the press, they would create an image of "being ahead" of NV (no doubt to help boost its stock price). NV, which actually makes a lot of $ from selling GPUs obviously doesn't want to signal to gamers that "hey, the stuff we're trying to sell you is crappy and obsolete, wait for the new stuff!"

Anyway, the point is that there were some really comical posts across the web claiming that NVIDIA hadn't taped out Pascal when in fact they had not only taped out a monster of a GPU using state-of-the-art process tech & packaging technology but have now gone into volume production. Oh, and I guess NV is the first to use HBM2 memory in a product that is selling for revenue, so much for the claims that NV would be behind because they didn't put out an HBM1 product with a measly 4GB of RAM.

RussianSensation · Apr 5, 2016

Arachnotronic said:
Gaming is NVIDIA's largest market by far & one of its fastest growing, if you think NV is taking its eye off the ball and giving AMD an opening I don't know what to tell you.

It's obvious that NVIDIA is building GPUs tailored for each application because it has the luxury to do so. They have the revenue in these major segments to justify doing an HPC/professional oriented GPU as well as a set of gaming-oriented versions which I'm sure we'll learn about closer to launch.

So you think GP100 for Tesla and GP102 will have > 4000 CCs, > 240 TMUs and Async Compute? Hmm... I don't believe it. NV has never done that before IIRC. They used to subsidize Flagship GeForce with Quadro/Tesla; hence why it made sense to make a Big Daddy for 3 of those markets. Now you are suggesting NV will start making Big Quadro/Tesla chips which are completely separate from Big Daddy gaming line?

Btw, GP104 is nowhere to be found. If it were launching in May, it's pretty odd to not unveil it now. Starting to sound like GP104 May launch may have been rumormill.

Seems NV is in no rush to ship GeForce GP100/102 if they are selling 8x Tesla P100 cards via DGX-1 for $129,000! This just reinforces what many of us predicted -- mid-range Pascal 970/980 successors for 2016.

Glo. · Apr 5, 2016

Arachnotronic, if you look closer at Pascal's SMs you will see that they are the 128-core units from Maxwell split into two parts that share memory, with FP64 units slapped on.

Nothing so far proves that Pascal is not refined Maxwell.

airfathaaaaa · Apr 5, 2016

Have you seen any consumer cards so far? Did they announce anything for consumers? Supplying Oak Ridge is something NVIDIA does EVERY TIME first, and only then does it announce the consumer cards.

If you think they will have enough cards to supply both Oak Ridge and consumers for the rest of 2016, you need to wake up lol

Arachnotronic · Apr 5, 2016

RussianSensation said:
So you think GP100 for Tesla and GP102 will have > 4000 CCs, > 240 TMUs and Async Compute? Hmm... I don't believe it. NV has never done that before IIRC. They used to subsidize Flagship GeForce with Quadro/Tesla; hence why it made sense to make a Big Daddy for 3 of those markets. Now you are suggesting NV will start making Big Quadro/Tesla chips which are completely separate from Big Daddy gaming line?

I think it is a fallacy that Tesla/Quadro "subsidized" the GeForce line. Quadro and Tesla are relatively tiny businesses for NV. They are higher margin but the gaming GeForce cards are quite high margin & the revenue from them is huge.

I'm not going to pretend to be able to know what the specs of NV's gaming Pascal cards will be, but it's important not to underestimate NVIDIA in light of what has been phenomenal execution over the last several years. This company knows that it lives & dies by its performance in the gaming GPU market, so I would expect products tailored with that notion in mind to roll off the line when the time comes.

Qwertilot · Apr 5, 2016

Ummm, this architecture is clearly rather different from Maxwell if it can't play games! Seems to qualify as a pretty radical step to me

There are also reports online about a talk on that self-driving car module. That would be GP106(?whatever it is?)/the Pascal Tegra. The Pascal version of that is shipping in Q3, it seems, which puts a worst-case time scale on a chunk of their gaming product stack.

Nothing about consumers, but then if they have really split off the compute/consumer stuff they wouldn't talk about that at this conference. Keep the markets distinct from each other.

nvgpu · Apr 5, 2016

http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu

Meanwhile GK210 will be in an odd place as it will likely be the first NVIDIA GPU not to end up in a consumer card; prior to this generation every GPU has pulled double duty as both a compute powerhouse and a graphics king. But with GM204 clearly ahead of GK110/GK210 in graphics, GK210 seems destined to Tesla cards and at most a Titan card for the budget compute market. Given the costs in bringing a new GPU revision to market – just the masks alone are increasingly expensive – the situation implies that NVIDIA expects to more than make back their money on additional sales enabled by GK210, which in turn indicates that they have quite a bit of faith in the state of the GPU compute market since it alone would be where the additional revenue would come from.

Nvidia has already done it before, their first HPC only GPU was GK210.

Timmah! · Apr 5, 2016

AtenRa said:
Each Cuda Core has ALUs and FP units. Each ALU/FPU can operate at 16, 32 or 64 bit depending on the registers, datapath widths and operands.

There are no independent 32bit Cuda Cores and 64bit Cuda Cores. There is a single Cuda Core design whose ALUs/FPUs can operate at 32bit or 64bit.

For 64bit you need double the datapath width and operands of the 32bit. Thus you have 1/2 (half) the single (32bit) precision units (half the performance of the 32bit).
But you may want to narrow your 64bit even further and not all your Cuda Cores have 64bit datapath widths, in order to save space and make your IC simpler. So perhaps you want only 1 out of 4 Cuda Cores to have a datapath width of 64bits. Thus you will have 1/4 the 32bit performance etc etc.

Thanks. Does this mean that full GP100 has 5760 cuda cores, but only 1920 out of them have that 64bit wide datapath? Are all the 5760 cores available for 32bit computing? Or only those 3840?

AtenRa · Apr 5, 2016

Arachnotronic said:
Oh, and I guess NV is the first to use HBM2 memory in a product that is selling for revenue, so much for the claims that NV would be behind because they didn't put out an HBM1 product with a measly 4GB of RAM.

Tesla GP100 will only launch in Q1 2017, and that will only be in servers.

Vega with HBM2 will launch in early 2017 and that will be for retail.

So no, NV is not first with HBM2.

ThatBuzzkiller · Apr 5, 2016

Honestly I was expecting a little more from GP100 ...

The clocks just aren't that special considering an overclocked GM200 chip can hit 1200MHz base and 1400+MHz boost easily while Pascal seems to be already at its thermal limits ...

For a 610mm^2 die size with double the transistor density you would think that there'd be a 50% increase in shader count but it's half of that. Maybe there's an increase in IPC but I thought Maxwell had good IPC already plus it needs less parallelism too ...
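The "half of that" can be checked against the core counts quoted earlier in the thread (3072 for the 28nm GM200, 3840 for GP100). A small sketch, with the 50% expectation taken straight from the post:

```python
gm200_cores = 3072  # 28nm GM200, as quoted earlier in the thread
gp100_cores = 3840  # full GP100, as quoted earlier in the thread

actual_increase = gp100_cores / gm200_cores - 1
expected_increase = 0.50  # the poster's expectation, not a measured value

print(f"actual shader increase: {actual_increase:.0%}")
print(f"share of the expected increase: {actual_increase / expected_increase:.0%}")
```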

AtenRa · Apr 5, 2016

Timmah! said:
Thanks. Does this mean that full GP100 has 5760 cuda cores, but only 1920 out of them have that 64bit wide datapath? Are all the 5760 cores available for 32bit computing? Or only those 3840?

GP100 has 3840 Cuda Cores capable of 32bit, and half of those can do 64bit.

Arachnotronic · Apr 5, 2016

AtenRa said:
Tesla GP100 will only launch in Q1 2017, and that will only be in servers.

Vega with HBM2 will launch in early 2017 and that will be for retail.

So no, NV is not first with HBM2.

Try again:

Availability
General availability for the Pascal-based NVIDIA Tesla P100 GPU accelerator in the new NVIDIA DGX-1 deep learning system is in June. It is also expected to be available beginning in early 2017 from leading server manufacturers.

If you've got the cash to buy DGX-1, you can get GP100 in June.

Despoiler · Apr 5, 2016

airfathaaaaa said:
who was the one shouting a week ago that pascal will be on the level of gcn 1.0 but nothing more? yeah i guess he was the one that fall into it spot on i guess

You must have missed the context of the post. Pascal's async compute abilities are supposedly only as good as GCN 1.0

RussianSensation said:
Where are you guys getting that GP100 has no ROPs?

P100 has no ROPs listed on the spec sheet. That's probably why.

EDIT: Jesus, I was thinking GP100 was the consumer-class chip, but Nvidia has both a product-level name (P100) and a chip-level name (GP100). They seemingly interchange the two on their website, when they are the same thing.

Arachnotronic said:
Interesting quote:

Guess the people who said Pascal was Maxwell on 16FF+ were dead wrong.

Actually what people have said is that Maxwell is a derivative of Pascal, Maxwell being created when 20nm fell through. The GP consumer series chips could be radically different from P100 to be more suited for gaming. You can see that P100 Pascal adds back the DP and compute that Maxwell is missing as well as adds NVLINK and HBM2, both of which have been known for some time. Pascal has changed its compute capability level, which is new information. It's really not all that different spec-wise compared to Maxwell. We still don't know if it can be used in conjunction with DX12 or Vulkan.

Edit:spelling
