Understanding NVIDIA's architecture

Cybercat

Member
Feb 28, 2004
http://www.anandtech.com/video/showdoc.aspx?i=2031

Using this article, I've been trying to get an understanding of the architecture of NVIDIA's pixel pipeline. One part I'm confused about is this:

"In both NV3x and NV40 architectures, z and color can be calculated per pixel at the same time. In addition, rather than coloring a pixel, a z or stencil operation can be performed in the color unit. This allows NV3x to perform 8 z or stencil ops per clock and NV40 to perform 32 z or stencil ops per clock. NVIDIA has started to call this "8x0" and "32x0", respectively, as no new pixels are drawn. This mode is very useful if a z only pass is performed first, or if stencil shadows are used (as is the case with Doom 3)."

First I thought that the NV3x could do 8 of those operations per clock because it has two texture units in each of its four pipelines. But then it says the NV40 can do 32 of them per clock, when it only has one texture unit per pipeline. Then I realized it's not saying they're being done in the texture unit, they're being done in the "color" unit. I've never heard of this one before. Then later on in the article a "math/shader" unit is mentioned.

So from what I gather, it breaks down like this. In the pixel pipeline of the NV40 you have all of these units:

a texture unit
two math/shader units
a color unit

Is this correct? And if it is, are there more unit types in the pipeline I'm not picking up on?

------------------------------------------------------------------------------------------------------------------------

Now for the second half of this. The article says:

"If we had enough processing power, we could actually process every single pixel on the screen at the same time. Even though going to such extremes is currently not an option (I wonder where we'll be in another decade or two), currently graphics cards are able to process multiple pixels at a time."

In other words, current graphics cards can't process all the pixels on the screen at once. Then it says later on:

"The way NVIDIA overcame these [scheduling] issues in NV40 was to revamp the internals of their shader pipelines by adding an extra math unit to all the pixel pipes (pixel shaders can now execute two math instructions at the same time, or a math and texture instruction), and expanding the number of registers available for shader programs to use."

So, without texturing going on, in a pure pixel shader environment, the NV40 architecture can run two math operations per clock. Then the register space is increased.

Common resolutions by today's standards are usually 1024x768, 1280x960, and 1600x1200, assuming we're talking about high-end graphics cards (which the NV40 is). The total number of pixels for each of these resolutions is as follows:

1024x768 = 786432 total pixels
1280x960 = 1228800 total pixels
1600x1200 = 1920000 total pixels
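
To double-check those totals, here's a quick Python sketch (the names are just for illustration):

```python
# Resolutions listed above, as (width, height) pairs.
RESOLUTIONS = [(1024, 768), (1280, 960), (1600, 1200)]

def total_pixels(width, height):
    """Number of pixels in one full frame."""
    return width * height

for w, h in RESOLUTIONS:
    print(f"{w}x{h} = {total_pixels(w, h)} total pixels")
```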

So 786,432 is the minimum amount we're talking about. Since it is said current graphics cards can't process all the pixels on the screen at the same time, this is going to be my threshold. This, along with the statement that the NV40 can have up to 8 times the shader performance of the NV3x, will be my guideline for finding how many (unspecified) registers are in the NV3x and NV40 architectures. By trying different values, I've come up with these as the most likely numbers:

NV3x
4 pipelines
2 texture units (not used in pixel shading)
1 math unit
35 registers (amount of space)
5900 Ultra: 450 (clockspeed) x 4 (pipes) x 1 (math) x 35 (reg's) = 63000 pixels per sec
5950 Ultra: 475 (clockspeed) x 4 (pipes) x 1 (math) x 35 (reg's) = 66500 pixels per sec

NV40
16 pipelines
1 texture unit (not used in pixel shading)
2 math units
40 registers (amount of space)
6800 Ultra: 400 (clockspeed) x 16 (pipelines) x 2 (math) x 40 (reg's) = 512000 pixels per sec

Keep in mind of course that I pulled the register counts here out of my ass, so I have no idea how accurate they are. But if you round it out, the NV40 in that equation ends up having roughly 8x the pixel shader performance of the NV3x.

Now, I don't know nearly as much about ATI's pixel-pipeline architecture as I do for NVIDIA, but let's just say they're the same, except that the R42x architecture doesn't have a second math unit. So this is what it comes out to:

R42x
16 pipelines
1 texture unit (not used in pixel shading)
1 math unit
40 registers (amount of space)
X850XT PE: 540 (clockspeed) x 16 (pipelines) x 1 (math) x 40 (reg's) = 345600 pixels per sec

So this is suggesting that in a pure pixel shader environment, the 6800 Ultra is faster than the X850XT PE (by 48%; it would be 54% faster than the X800XT PE). This goes along with my theory that the R42x is faster in vertex processing (higher fillrate), but slower in pixel shading than the NV40.

HOWEVER, most games aren't just using pixel shading for all of their effects. They're also using texturing (normal mapping for instance). Since the NV40 architecture can do either one math op and one texture op per clock, or two math ops per clock, in a real-world setting where texturing does occur, the NV40 isn't able to do two math ops at once. So the numbers come down to:

6800 Ultra: 400 (clockspeed) x 16 (pipes) x 1 (math op) x 40 (reg's) = 256000 pixels per sec
X850XT PE: 540 (clockspeed) x 16 (pipes) x 1 (math op) x 40 (reg's) = 345600 pixels per sec

This then suggests that the X850XT PE is faster in real-world shader performance than the 6800 Ultra (by 35%; the X800XT PE would be 30% faster).
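
Here's a small Python sketch of how I'm getting these numbers, in case anyone wants to check the arithmetic. The formula itself (clock x pipes x math ops x registers) is purely my guess and not an official metric, the 40-register figure is an assumption, and the function names are made up for illustration:

```python
def guessed_rate(clock, pipes, math_ops, registers):
    """My guessed formula: clock x pipes x math ops x registers.
    This is speculation, not a documented hardware metric."""
    return clock * pipes * math_ops * registers

def percent_faster(a, b):
    """How much larger a is than b, as a percentage."""
    return (a / b - 1) * 100

ultra_6800_pure = guessed_rate(400, 16, 2, 40)  # pure shader case, two math ops
ultra_6800_tex = guessed_rate(400, 16, 1, 40)   # one slot taken by texturing
x850xt_pe = guessed_rate(540, 16, 1, 40)
x800xt_pe = guessed_rate(520, 16, 1, 40)

print(round(percent_faster(ultra_6800_pure, x850xt_pe)))  # 6800 Ultra vs X850XT PE, pure shading
print(round(percent_faster(x850xt_pe, ultra_6800_tex)))   # X850XT PE vs 6800 Ultra, with texturing
print(round(percent_faster(x800xt_pe, ultra_6800_tex)))   # X800XT PE vs 6800 Ultra, with texturing
```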

Keep in mind these are all theoretical numbers. They don't take into account driver and application efficiency (along with influences from other pieces of hardware).

So then, how far off the ball am I here?
 

Insomniak

Banned
Sep 11, 2003
Go look at the benchmarks and win. Theoretical Calculations vs. Real World Results never equate.
 

Cybercat

Member
Feb 28, 2004
Originally posted by: Insomniak
Go look at the benchmarks and win. Theoretical Calculations vs. Real World Results never equate.

I have looked at the benchmarks. Many, many times, from different sources. I'm not trying to figure who "should" perform better, or who actually does perform better, I'm trying to find out if my understanding of the architecture is correct. And to the second statement, I've clearly acknowledged this fact when I wrote:

"Keep in mind these are all theoretical numbers. They don't take into account driver and application efficiency (along with influences from other pieces of hardware)."

Had you read it more closely (or all of it), you would have seen that statement. I think you underestimate me.

Anyway, bump.
 

jiffylube1024

Diamond Member
Feb 17, 2002
I don't know where you got the idea of multiplying the fill rate (clock speed times # of pipelines) by the alleged number of 'math units' and registers to get the number of pixels/sec the video cards can process, but you are way, way off. Registers are just registers - places to store the data in!

Fill rate is how cards are measured and that is how you get the theoretical peaks of the cards. Radeon X800XT: 520MHz X 16 pipelines = 8.32 Gigapixels/sec (also 8.32 Gigatexels/sec theoretically - explained below).

GeForce 6800Ultra: 400 MHz X 16 pipelines = 6.4 Gigapixels/sec (also 6.4 Gigatexels/sec theoretically)

Gigatexels are a made up term for 'textured pixels', so older cards such as the GeForce2 which used a 4X2 pipeline configuration (two texture units per pipeline) would render double the number of texels to pixels per second, while all current gen cards run at 4X1 / 8X1 / 12X1 / 16X1 (they all have 1 texture unit per pipeline), so they have the same theoretical rating for gigapixels/sec and gigatexels/sec.
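
If it helps, here's a minimal Python sketch of that fill rate arithmetic. The function name and its parameters are just mine for illustration, and the 200 MHz clock on the 4X2 example is an assumed value to show the doubling, not a spec I'm quoting:

```python
def fill_rates(clock_mhz, pipelines, texture_units_per_pipe=1):
    """Return theoretical peaks as (gigapixels/sec, gigatexels/sec)."""
    clock_hz = clock_mhz * 1_000_000            # MHz -> cycles per second
    pixels_per_sec = clock_hz * pipelines       # one pixel per pipe per clock
    texels_per_sec = pixels_per_sec * texture_units_per_pipe
    return pixels_per_sec / 1e9, texels_per_sec / 1e9

print(fill_rates(520, 16))    # Radeon X800XT, 16X1
print(fill_rates(400, 16))    # GeForce 6800 Ultra, 16X1
print(fill_rates(200, 4, 2))  # a 4X2 part at an assumed 200 MHz: texel rate is double the pixel rate
```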

Remember, the clockspeed is in MHz, so that's 540,000,000 cycles per second, not 540.

According to your calculations the 6800 ultra can process 512,000 pixels/sec while the X850XT PE can process 345,600 pixels/sec.

The X850XT PE figure is less than the number of pixels on a screen even at 800X600! According to your units the X850XT PE can process less than 1/5 of the pixels on the screen at 1600X1200 per second, meaning if it didn't have to do any calculations on the pixels whatsoever but just output them, it would get (maximum) about 0.18fps! Imagine what having to do many additional DirectX9 shader calculations would do to the framerate (along with the usual geometry translations, etc)!
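
A rough sketch of that sanity check in Python, assuming every pixel on the screen is written exactly once per frame and nothing else limits the card (the function name is just for illustration):

```python
def max_fps(pixels_per_sec, width, height):
    """Upper bound on frames/sec if each screen pixel is written once per frame."""
    return pixels_per_sec / (width * height)

# The X850XT PE figure from the calculation being answered:
print(max_fps(345_600, 1600, 1200))

# Same card with the clock taken in Hz (540 MHz x 16 pipelines):
print(max_fps(540_000_000 * 16, 1600, 1200))
```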
 

knyghtbyte

Senior member
Oct 20, 2004
whoa... don't read threads like this when you're drunk.....hahahah
love it

hands u guys a :beer:
 

Cybercat

Member
Feb 28, 2004
Originally posted by: jiffylube1024
I don't know where you got the idea of multiplying the fill rate (clock speed times # of pipelines) by the alleged number of 'math units' and registers to get the number of pixels/sec the video cards can process, but you are way, way off. Registers are just registers - places to store the data in!

Fill rate is how cards are measured and that is how you get the theoretical peaks of the cards. Radeon X800XT: 520MHz X 16 pipelines = 8.32 Gigapixels/sec (also 8.32 Gigatexels/sec theoretically - explained below).

GeForce 6800Ultra: 400 MHz X 16 pipelines = 6.4 Gigapixels/sec (also 6.4 Gigatexels/sec theoretically)

Gigatexels are a made up term for 'textured pixels', so older cards such as the GeForce2 which used a 4X2 pipeline configuration (two texture units per pipeline) would render double the number of texels to pixels per second, while all current gen cards run at 4X1 / 8X1 / 12X1 / 16X1 (they all have 1 texture unit per pipeline), so they have the same theoretical rating for gigapixels/sec and gigatexels/sec.

Remember, the clockspeed is in MHz, so that's 540,000,000 cycles per second, not 540.

According to your calculations the 6800 ultra can process 512,000 pixels/sec while the X850XT PE can process 345,600 pixels/sec.

The X850XT PE figure is less than the number of pixels on a screen even at 800X600! According to your units the X850XT PE can process less than 1/5 of the pixels on the screen at 1600X1200 per second, meaning if it didn't have to do any calculations on the pixels whatsoever but just output them, it would get (maximum) about 0.18fps! Imagine what having to do many additional DirectX9 shader calculations would do to the framerate (along with the usual geometry translations, etc)!

So are you saying all that matters is clockspeed and pipelines when it comes to theoretical performance?

And also, the article did say that not all the pixels on the screen can be processed at the same time, so that's what led me to come up with such low numbers. I also thought registers were the width that math data could flow through (I came to that conclusion because of the Tetris analogy). I understand now. Thanks for the clarification so far.
 

Matthias99

Diamond Member
Oct 7, 2003
Originally posted by: Cybercat
So are you saying all that matters is clockspeed and pipelines when it comes to theoretical performance?

For fillrate, yes. I mean, that's basically how fillrate is *defined*. If you run a fillrate test (such as the ones included with the various 3DMark benchmarks), you'll get a number very close to the theoretical limit of the GPU.

However, shader performance is a very different beast, and it's very hard (if not impossible) to compare shader performance directly between cards with different architectures. The GeForceFX cards are very different from the GeForce6 cards in terms of shaders, and are even further from, say, ATI's R420 architecture.

The "math" and "color" 'units' being described in that AT article are not really separate entities from the shader pipeline, but rather are part of it. It's just that, if you are not using the "color" calculation hardware for calculating color (for instance, when doing a z-only pass for shadowing), you can use that same hardware to do an extra z-calculation in parallel with the normal math hardware on the NV3X and NV40. So they act (effectively) like they have twice as many pixel shaders when doing this sort of operation (hence the '8x0' and '32x0' modes for NV3X and NV40, respectively). This is a big part of why these cards perform so well at Doom3 (which makes extensive use of stencil shadows).
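
As a back-of-the-envelope sketch in Python (the function name is just for illustration, and the pipeline counts of 4 for NV3X and 16 for NV40 are the ones used earlier in this thread), the '8x0' and '32x0' figures fall out like this:

```python
def z_stencil_ops_per_clock(pipelines, color_unit_does_z=True):
    """Z/stencil ops per clock across all pipes.

    Each pipe does one z/stencil op normally; when no color is being
    written, the color hardware can do a second one in parallel.
    """
    per_pipe = 2 if color_unit_does_z else 1
    return pipelines * per_pipe

print(z_stencil_ops_per_clock(4))   # NV3X in z-only mode
print(z_stencil_ops_per_clock(16))  # NV40 in z-only mode
```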
 