GPU-Driven Rendering Pipelines

NTMBK

Lifer
Nov 14, 2011
10,409
5,673
136
Talks from SIGGRAPH have started going up, and there's an interesting one from the Assassin's Creed developers: http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf

Summary of their results:

CPU:
•1-2 Orders of magnitude less drawcalls
•~75% of previous AC, with ~10x objects

GPU:
•20-40% triangles culled (backface + cluster bounds)
•Only small overall gain: <10% of geometry rendering
•30-80% shadow triangles culled

Work in progress:
•More GPU-driven for static objects
•More batch friendly data
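
To make the "backface + cluster bounds" culling a bit more concrete, here is a minimal sketch of one common way to reject a whole cluster of triangles that faces away from the camera: a normal-cone test. This is not taken from the talk, which may well do it differently. The function names, the cone representation (axis plus half-angle), and the use of a single view direction to the cluster center (reasonable only when the cluster is small relative to its distance) are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cluster_is_backfacing(camera_pos, cluster_center, cone_axis, cone_half_angle):
    """True if every triangle normal inside the cone faces away from the camera.

    Assumption: all normals of the cluster lie within cone_half_angle (radians)
    of cone_axis, and one view direction (camera -> cluster center) is a good
    enough approximation for the whole cluster.
    """
    view_dir = normalize(tuple(c - e for c, e in zip(cluster_center, camera_pos)))
    # A normal n is back-facing when dot(view_dir, n) >= 0. That holds for the
    # whole cone when angle(view_dir, axis) + half_angle <= 90 degrees, which
    # is the same as dot(view_dir, axis) >= sin(half_angle).
    return dot(view_dir, normalize(cone_axis)) >= math.sin(cone_half_angle)
```

A cluster whose normals point along the view direction gets rejected in one test instead of one test per triangle, which is where the savings would come from.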
 

NTMBK

what's this mean for us? They made Unity more efficient?

It's an alternative approach to reducing CPU overhead. Instead of trying to push millions of draw calls, your CPU only launches a couple of draw calls, and the GPU handles the rest of the rendering process: it does the culling and launches the actual rendering parts of the pipeline.
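
For anyone wondering what "the GPU handles the rest" could look like, here is a rough CPU-side simulation of the idea: a culling pass walks the objects and writes a compacted list of draw arguments, and a single indirect draw then consumes that list. The five fields mirror the layout of indexed indirect draw arguments in D3D12; everything else (the `MeshRange` type, the trivial "entirely behind the camera" test, +z pointing forward in view space) is a hypothetical simplification and not the method from the talk.

```python
from dataclasses import dataclass


@dataclass
class DrawIndexedArgs:
    # Same five fields as an indexed indirect draw argument record.
    index_count: int
    instance_count: int
    first_index: int
    base_vertex: int
    first_instance: int


@dataclass
class MeshRange:
    # Hypothetical per-object record living in one big shared index buffer.
    first_index: int
    index_count: int
    view_z: float   # view-space depth of the bounding-sphere center (+z forward)
    radius: float   # bounding-sphere radius


def build_indirect_args(meshes):
    """Simulate the culling pass: keep visible objects, emit compacted draw args."""
    args = []
    for object_id, mesh in enumerate(meshes):
        if mesh.view_z + mesh.radius <= 0.0:
            continue  # bounding sphere is entirely behind the camera
        # first_instance carries the object id so a shader could look up
        # per-object data such as the transform.
        args.append(DrawIndexedArgs(mesh.index_count, 1, mesh.first_index, 0, object_id))
    return args
```

On real hardware the loop would be a compute shader and the list would stay in GPU memory, so the CPU never sees which objects survived.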
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Haha, funny joke. I thought the last 10 years of OpenGL development had shown how that plans out.

I thought a large part of the issue with OpenGL was the insistence upon backwards compatibility throughout all the different versions. Something that is being "fixed" with Vulkan (of course there may very well be other fundamental issues).

Anyway, getting back on topic and talking about novel rendering pipelines, I personally found this quite interesting. It's Media Molecule's game "Dreams", where their engine has essentially dropped the normal rasterizer pipeline and is 100% compute based, with the graphics made up of point clouds.
 

NTMBK

Yup, that one is pretty awesome too! Great read too, as they go through a bunch of failed prototypes before they get to the final working solution.
 

antihelten

Yeah, following the whole process is really the most fascinating thing in that talk, at least for me personally (although the end product is also really really nice).
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
IMO the scope and sheer amount of stuff in Assassin's Creed Unity is staggeringly impressive. Truly a monument of modern technology. It's a crying shame that Ubisoft forced such a competent team to rush the product out well before it was done, so they got a bad rap.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
The MSAA trick from RedLynx is nice. 1080p 60fps w/ MSAA on the Xbox One, and with 10x as many objects rendered as before (Trials Fusion, I assume). Very impressive.
 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
That presentation's a bit too technical for me. XD What I want to know is, does this use Deferred Contexts in DirectX 11, or does it use DirectX 12? Deferred Contexts, from what I understand, are an inferior approach compared to the simpler closer-to-the-metal approach of DirectX 12.

Edit: Ok, seems like they're talking about DirectX 11, and only mention the future of DirectX 12. My takeaway from all that is this: Whatever they've accomplished with DirectX 11, they can do even better with low-level APIs like DirectX 12 and Vulkan. I'm most intrigued by this slide:

NEW DX12 (PC) FEATURES
•ExecuteIndirect
•Asynchronous Compute
•VS RT index (GS bypass)
•Resource management
•Explicit multiadapter
•Tiled resources + bindless
•Conservative raster + ROV
FEATURES IN OTHER APIs
•Custom MSAA patterns
•GPU side dispatch
•SIMD lane swizzles
•Ordered atomics
•SV_Barycentric to PS
•Exposed CSAA/EQAA samples
•Shading language with templates

In the slide, "Conservative raster + ROV" is colored differently than the other items in that list. Perhaps in reference to how it's feature level 12.1? Ubisoft has been in bed with Nvidia for a while, so if anyone makes use of 12.1 features it's likely going to be them. I also wonder what they mean by "other APIs", do they mean Vulkan or the Gameworks/CUDA SDK? It's also interesting to see them mention EQAA along with CSAA, as EQAA is AMD's equivalent to CSAA. I've yet to see a game natively support EQAA.
 

NTMBK

Just noticed that the PowerPoint version of this has presenter notes, which give some more information: http://advances.realtimerendering.c...iggraph2015_combined_final_footer_220dpi.pptx One example:

On the GPU we achieved significantly more effective culling, but this translates to only a small gain overall for the geometry rendering. We also had to move the gpu pipeline to async compute to remove the cost from the many stalls required for dependent compute jobs and compute jobs. For many passes the compute jobs of the gpu pipeline also do not completely fill the gpu, wasting time if not done in parallel with rendering.

And:

In the future use of bindless textures will allow us to further reduce the number of drawcalls significantly. Apart from the CPU benefits this will also help with the problem of increased GPU overdraw due to the unpredictable drawing order of batched meshes. As the number of drawcalls becomes very low, the drawing order will once again approach the order sorted by object center camera distance. With a very low number of drawcalls actual cluster camera distance sorting could become feasible.
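
The "order sorted by object center camera distance" mentioned in the note above is just a front-to-back sort, which helps the depth test reject hidden pixels early and so reduces overdraw. A minimal sketch, with a hypothetical function name and plain tuples standing in for positions:

```python
def sort_front_to_back(camera_pos, centers):
    """Return object indices ordered from nearest to farthest center.

    Squared distance is enough for ordering, so the square root is skipped.
    """
    def dist_sq(i):
        return sum((c - e) ** 2 for c, e in zip(centers[i], camera_pos))

    return sorted(range(len(centers)), key=dist_sq)
```

When many meshes are merged into one batch, the order inside the batch is fixed by the data layout rather than by a sort like this, which is the overdraw problem the note describes.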

As DX12/Vulkan reduces the cost per individual drawcall significantly, the biggest immediate benefit of a GPU-driven rendering pipeline (given the rendering algorithms and data used in ACU) would be reduced. Whether this kind of pipeline provides a benefit under DX12/Vulkan will depend on the data and rendering algorithms that are used going forward.
The next part of the talk will show some of the very interesting possibilities.

---------------------------------
GPU-driven still has advantages:
  - With DX12 (especially on PC), the CPU still doesn't have low latency access to GPU depth buffer --> CPU cannot cull shadows based on visible pixels.
  - CPU is still bad at culling 100k+ objects at sub-object granularity (= more than a million visibility tests per frustum).
  - GPU driven culling is more power efficient. Reduced power usage increases CPU & GPU clocks -> runs faster.
  - Reference: Modified Intel DirectX 12 asteroids demo with ExecuteIndirect. Runs faster on Intel GPU and has much lower CPU usage.
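
To put the "more than a million visibility tests per frustum" figure in perspective, here is a small sketch of a single bounding-sphere-versus-frustum test plus the arithmetic for how many of them sub-object culling needs. The plane convention (a point is inside when `n·p + d >= 0`) and the figure of 16 clusters per object in the usage note are assumptions for illustration, not numbers from the notes.

```python
def sphere_in_frustum(center, radius, planes):
    """Conservative test: False only if the sphere is fully outside some plane.

    Each plane is (nx, ny, nz, d) with the normal pointing into the frustum,
    so a point p is inside when nx*p.x + ny*p.y + nz*p.z + d >= 0.
    """
    for nx, ny, nz, d in planes:
        signed_dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed_dist < -radius:
            return False
    return True


def count_visibility_tests(num_objects, clusters_per_object):
    """Number of sphere tests needed to cull every cluster of every object."""
    return num_objects * clusters_per_object
```

With 100,000 objects and an assumed 16 clusters each, `count_visibility_tests(100_000, 16)` gives 1,600,000 tests for one frustum, and each shadow cascade or extra view adds another frustum on top.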

With more control over async compute we hope to move the non-rendering part of the GPU-driven pipeline into async compute to remove all pipeline bubbles due to many small synchronous compute jobs (e.g. the compute jobs to generate the compute arguments for the next indirect compute job). We do already run other GPU-tasks in parallel during geometry only rendering (shadow, visibility), but due to the very coarse load balancing between async and graphics pipeline, the async compute jobs do not fully utilise the GPU during bubbles in the graphics pipe.
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
It's an alternative approach to reducing CPU overhead. Instead of trying to push millions of draw calls, your CPU only launches a couple of draw calls, and the GPU handles the rest of the rendering process: it does the culling and launches the actual rendering parts of the pipeline.

There is absolutely nothing new about this. Most game engines allow this and designers do it all the time. It minimizes the number of draw calls and hence 'stalling' of the pipeline, basically giving the GPU much more work to do in a single draw call by making that draw call very large and complex.


http://docs.unity3d.com/Manual/DrawCallBatching.html


To draw an object on the screen, the engine has to issue a draw call to the graphics API (e.g. OpenGL or Direct3D). Draw calls are often expensive, with the graphics API doing significant work for every draw call, causing performance overhead on the CPU side. This is mostly caused by the state changes done between the draw calls (e.g. switching to a different material), which causes expensive validation and translation steps in the graphics driver.

Unity uses static batching to address this. The goal of static batching is to regroup as many meshes as possible into fewer buffers to get better performance, rendering giant meshes instead of a lot of small meshes, which is inefficient. Unity will only loop on the same resources to render different ranges of these resources. Effectively it executes a series of fast draw calls for each statically batched mesh.
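
The batching described in that quote boils down to concatenating meshes into one vertex buffer and one index buffer, and remembering which index range belongs to which mesh. A minimal sketch of that merge step follows; the function name and the list-of-tuples mesh format are made up for illustration and say nothing about how Unity implements it internally.

```python
def merge_meshes(meshes):
    """Merge (vertices, indices) pairs into shared buffers.

    Returns (vertices, indices, ranges), where ranges[i] is the
    (first_index, index_count) pair for mesh i in the merged index buffer.
    """
    vertices, indices, ranges = [], [], []
    for mesh_vertices, mesh_indices in meshes:
        base_vertex = len(vertices)   # where this mesh's vertices start
        first_index = len(indices)    # where this mesh's indices start
        vertices.extend(mesh_vertices)
        # Rebase the indices so they point into the shared vertex buffer.
        indices.extend(i + base_vertex for i in mesh_indices)
        ranges.append((first_index, len(mesh_indices)))
    return vertices, indices, ranges
```

Each range can then be drawn without rebinding buffers, which is the "fast draw calls" part. Note that the CPU still decides what to draw here, which is the difference from the GPU-driven approach in the talk.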


What they've done is keep the draw calls down by doing a massive amount of linking up different geometric shapes that might otherwise appear unrelated.


PCPer talks about this too -

This is a single draw call for animating 3 birds in flight.

 

dogen1

Do you mean batching? RedLynx's engine only does 2 indirect draw calls for any arbitrary scene. Then they handle scene visibility on the GPU itself. I don't think that's the same thing. Correct me if I'm wrong.
 

shady28

I don't know where you got 2 indirect calls for a scene, I think that may be true of a particular *object* if there is only one light source. Multiple light sources cause more draw calls for all objects affected by the light. So if you have say 500 separate polygons and one light source, you got at least 1000 draw calls.

I think their main point was a novel way of applying textures. Normally that's part of the draw call, but the slides indicate they are linking up to textures somehow without it being part of the draw call. They are also doing a lot to make sure they aren't issuing draw calls for things that can't be seen on screen.

This has some answers on draw calls :

http://stackoverflow.com/questions/4853856/why-are-draw-calls-expensive

and

http://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf
 

dogen1

Sebbbi himself (the lead graphics programmer at RedLynx, the guy who did that presentation) said that they only use 2 draw calls from the CPU for any given scene. From there the GPU takes over.

Here
https://forum.beyond3d.com/threads/...-a-useful-reference.46956/page-9#post-1865977
 

buletaja

Member
Jul 1, 2013
80
0
66
The X1 is using DX 11.x, not even the latest XDK.
Too bad games still don't use it ...

It seems MS has to please the IHVs too, plus some politics.

The actual X1 performance, DX11.x versus PC DX11, this is from a single thread.

And the maximum deferred context count under the DX11 core is way beyond PC or PS4; for example, PS4 (based on its CPU) has a maximum of only 6-8 deferred contexts.

An example of the relation between cores and deferred contexts under DX11.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
I also wonder what they mean by "other APIs", do they mean Vulkan or the Gameworks/CUDA SDK?

Custom MSAA patterns, GPU side dispatch, Ordered atomics, SV_Barycentric to PS, Exposed EQAA samples - can be used in Mantle (some of these probably will be available in Vulkan)
SIMD lane swizzles - can be used in OpenCL/CUDA
Shading language with templates - I don't want to break NDA, but khm...Vulkan
 