Stream processors...

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
Originally posted by: keysplayr2003
So those other 640 sp's sit by idly and watch the 160 do all the work then? LOL

In some cases, yes.

AT comparison of ATi & nVidia stream processor architecture

Take a look at the SP Issue Width table (near the bottom). The coding of a game basically determines how well it will run on nV or ATi hardware. ATi architecture has the potential to be considerably faster than nV but only if the game is optimized to take advantage of the architecture.

And I doubt anyone will want to code specifically to favor ATi hardware as the result would be harder to process on nV so a lot of gamers would be pissed.

Really, the mouse vs rat example is quite good. Think of it in terms of five mice working on a block of cheese versus one rat. If you can distribute the cheese to all five mice they will eat (process) it much faster than the single rat. However, if the cheese cannot be broken apart into pieces the rat will easily outpace a single mouse trying to eat the same size piece of cheese alone. (graphic here.)
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: myocardia
Originally posted by: Azn
It still does more processing. That's all that matters.

Sure, to the children that couldn't comprehend what his post said.

Let me guess. You must be the child who doesn't understand.



Originally posted by: keysplayr2003
Originally posted by: Azn
Originally posted by: Cuular
Unfortunately in the case of ATI and nvidia there is no shared definition of stream processor.

Each company defines an SP differently.

For ATI they have a processing unit with 5 "streams" in it, which could in the right circumstances do 5 things at once. So in reality they only have 160 real processors(800=5*160), but to make it look better, the marketing people say it's 800.

So if you look at the number of real processing units, not the maximum number of operations that could be going on at once, with the best workload imaginable, it's amazing that the 160 processors in the ATI card can meet or sometimes beat the 240 in the nvidia card.

Since the general public can only understand larger numbers ATI uses the "800 stream processors" as their number instead of 160 multi-stream processors. Technophiles understand the amazing work that 160 real processing units can keep up with 240. The public in general doesn't.

It still does more processing. That's all that matters.

So those other 640 sp's sit by idly and watch the 160 do all the work then? LOL

You are getting way skewed by your signature. Pity.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: Denithor
Originally posted by: keysplayr2003
So those other 640 sp's sit by idly and watch the 160 do all the work then? LOL

In some cases, yes.

AT comparison of ATi & nVidia stream processor architecture

Take a look at the SP Issue Width table (near the bottom). The coding of a game basically determines how well it will run on nV or ATi hardware. ATi architecture has the potential to be considerably faster than nV but only if the game is optimized to take advantage of the architecture.

And I doubt anyone will want to code specifically to favor ATi hardware as the result would be harder to process on nV so a lot of gamers would be pissed.

Really, the mouse vs rat example is quite good. Think of it in terms of five mice working on a block of cheese versus one rat. If you can distribute the cheese to all five mice they will eat (process) it much faster than the single rat. However, if the cheese cannot be broken apart into pieces the rat will easily outpace a single mouse trying to eat the same size piece of cheese alone. (graphic here.)

How do you explain games like GRID or Assassin's Creed? Are they specifically optimized for ATI hardware?
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: Azn
Originally posted by: myocardia
Originally posted by: Azn
It still does more processing. That's all that matters.

Sure, to the children that couldn't comprehend what his post said.

Let me guess. You must be the child who doesn't understand.



Originally posted by: keysplayr2003
Originally posted by: Azn
Originally posted by: Cuular
Unfortunately in the case of ATI and nvidia there is no shared definition of stream processor.

Each company defines an SP differently.

For ATI they have a processing unit with 5 "streams" in it, which could in the right circumstances do 5 things at once. So in reality they only have 160 real processors(800=5*160), but to make it look better, the marketing people say it's 800.

So if you look at the number of real processing units, not the maximum number of operations that could be going on at once, with the best workload imaginable, it's amazing that the 160 processors in the ATI card can meet or sometimes beat the 240 in the nvidia card.

Since the general public can only understand larger numbers ATI uses the "800 stream processors" as their number instead of 160 multi-stream processors. Technophiles understand the amazing work that 160 real processing units can keep up with 240. The public in general doesn't.

It still does more processing. That's all that matters.

So those other 640 sp's sit by idly and watch the 160 do all the work then? LOL

You are getting way skewed by your signature. Pity.

The real pity is that you can't have a conversation without descending to my sig. It's cheap.
So my question was a real one. What my question has to do with my sig remains a mystery.
Do the other 640 sp's just sit by idly and watch the other 160 do all the work?
If that was the case, why bother with the other 640 if they do nothing?

@ Denithor: You make a nice example. But just by looking at the SP Issue Width table, we can't tell how much work the 640 lesser shaders do at any given time. It could be rats, mice, and dust mites for all we know.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Just let the issue rest, guys. SPs by themselves are a useless metric for comparing performance between architectures, just as megahertz is for a CPU (or a GPU for that matter), and that's the entire point of this thread. Do you really need to argue for the (n+1)th time about which card is "really" faster when that wasn't even the question?
 

AzN

Banned
Nov 26, 2001
4,112
2
0
If you weren't so cheaply skewed then I wouldn't even mention your sig. No point ending your comment with LOL if you were for REAL.

Yet they don't do nothing, as mentioned before. Games that are heavily shader dependent are exactly where RV770 does well.

 

AzN

Banned
Nov 26, 2001
4,112
2
0
Games like GRID and Assassin's Creed in particular. It could also be that RV770's FP32 texture blending rate is higher than the GTX 280's, which might also be a factor.

An easy way to test this theory would be to run Crysis with shaders at very high and the rest at high, then measure the difference from all high. But this has already been shown in the AnandTech benches where they test the 4670, 3870, etc. with very high shaders and everything else set to high. The GSO loses to the 4670, which has higher theoretical processing power, in those settings, more so than with everything set to high.
 

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
Originally posted by: Azn
How do you explain games like GRID or Assassin's Creed? Are they specifically optimized for ATI hardware?

Not specifically, as they were in development and/or released before the new ATi hardware was available. However, the coding instructions happen to fit the profile for highly efficient processing on the RV770 architecture so these games run faster on ATi GPUs.

And wasn't Assassin's Creed the one mainstream game to support DirectX 10.1 (briefly--until nV squashed it)? So maybe it was developed with ATi in mind.
 

Griswold

Senior member
Dec 24, 2004
630
0
0
Originally posted by: justinburton
The fastest stream processor is still useless with drivers. Ati might have better stream p's but Nvidia has better drivers and CUDA.

That's exactly what a marketing victim would say. CUDA sounds cool, so it must be so much better...

 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: Denithor
Originally posted by: Azn
How do you explain games like GRID or Assassin's Creed? Are they specifically optimized for ATI hardware?
Not specifically, as they were in development and/or released before the new ATi hardware was available. However, the coding instructions happen to fit the profile for highly efficient processing on the RV770 architecture so these games run faster on ATi GPUs.

Coding instructions fit what profile? I don't quite understand what this profile is. Shader dependent? Is that what you mean?

You say a game needs to be specifically optimized for the 5 branches in each of RV770's 160 SPs, yet these shader-heavy games that aren't optimized for ATI run better on RV770, which has less of everything except theoretical processing power.
 

Cuular

Senior member
Aug 2, 2001
804
18
81
Originally posted by: keysplayr2003

So those other 640 sp's sit by idly and watch the 160 do all the work then? LOL

In normal use no.

The 160 multi-stream processors contain the total 800 stream procs. So a portion of those 160 are always in use. Each of those 160 can do up to 5 things at once. Hence the 5*160=800

So if the program running is written in such a way that it can feed the right workload to the GPU, you could have 800 things going at once. But in the normal case it's less than that.

So in a worst-case scenario it can be running 160 things at once, using only one of those 5 "streams" within each of the processor units. Which, like you pointed out, would leave the other 640 streams unused.

In general the drivers find a way to balance the workload such that it doesn't ever have a single stream from each processor working.
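
As a rough illustration of the counting in this post, here is a minimal sketch. It only restates the 160 x 5 arithmetic; the function name and the idea of an "average number of filled streams per unit" are assumptions made for the example, not anything ATI publishes.

```python
# Hypothetical model of the "160 units x 5 streams" layout described in the post.
UNITS = 160            # multi-stream processors on RV770 (figure from the post)
STREAMS_PER_UNIT = 5   # operations one unit can issue at once (figure from the post)

def busy_streams(avg_filled: float, units: int = UNITS) -> float:
    """Streams doing work at once when each unit fills `avg_filled` of its 5 streams."""
    if not 1 <= avg_filled <= STREAMS_PER_UNIT:
        raise ValueError("avg_filled must be between 1 and 5")
    return units * avg_filled

total = UNITS * STREAMS_PER_UNIT   # 800, the marketing number
worst = busy_streams(1)            # one stream per unit busy
best = busy_streams(5)             # every stream busy
idle_in_worst_case = total - worst # the "other 640" from the question
```

With one stream filled per unit this gives 160 busy and 640 idle; with all five filled it gives the full 800.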
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: Griswold
Originally posted by: justinburton
The fastest stream processor is still useless with drivers. Ati might have better stream p's but Nvidia has better drivers and CUDA.

That's exactly what a marketing victim would say. CUDA sounds cool, so it must be so much better...

Gris, it not only sounds cool, but actually is pretty freakin cool. It could have been named SPLOITALFEERSH, and wouldn't sound cool anymore, but still would be for what it is/does/enables.

 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: Cuular
Originally posted by: keysplayr2003

So those other 640 sp's sit by idly and watch the 160 do all the work then? LOL



In general the drivers find a way to balance the workload such that it doesn't ever have a single stream from each processor working.

Bingo! The drivers take care of spreading whatever is being processed across the 800 SPs. So those 5 branches in each of the 160 SPs don't just sit idle.

 

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
Take a look at the page I referenced before.

http://www.anandtech.com/video/showdoc.aspx?i=3341&p=6

That shows visually how the instructions are distributed among the SPs on different hardware. On the ATi SP diagram you will note that there are a total of five "mini" SPs that make up each "full" SP. If the instructions can be run simultaneously, each of these will handle a separate instruction, much improving the flow through the SP.

In the given example the instruction string could be broken up into a chain of eleven elements (each of 1-5 units per element) to be fed through the ATi hardware, while the same instruction string has twenty single elements that have to be fed individually through the nVidia gpu.

Now, consider a different string, where every block is a single instruction, so the ATi gpu won't be able to distribute at all. In this case both gpus have to simply crunch through all twenty steps. Then expand this to several thousand instruction strings coming through at once. The ATi gpu can handle 160 strings simultaneously (1 string per "full" SP) while the nVidia gpu can process 240 strings simultaneously. You figure out which will be faster under those circumstances.
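
The two extremes described here can be sketched with a toy packer. This is a simplification under stated assumptions: instructions are packed strictly in program order, a new element is started whenever the next instruction depends on one already in the current element or the element is full, and the dependency lists are made up for illustration. It is not how the real shader compiler works, and it does not reproduce the dependencies in the AnandTech diagram.

```python
def pack_bundles(deps, width=5):
    """Pack instructions, in order, into elements of up to `width` slots.

    deps[i] is the set of earlier instruction indices that instruction i needs.
    An instruction cannot share an element with something it depends on.
    """
    bundles = []
    current = []
    for i, needs in enumerate(deps):
        # Start a new element if this one is full or holds a dependency.
        if len(current) == width or any(j in current for j in needs):
            bundles.append(current)
            current = []
        current.append(i)
    if current:
        bundles.append(current)
    return bundles

# Twenty instructions with no dependencies at all: the best case.
independent = [set() for _ in range(20)]
# Twenty instructions where each one needs the previous result: the worst case.
serial = [set()] + [{i - 1} for i in range(1, 20)]

best_case = len(pack_bundles(independent, width=5))   # 5-wide unit, fully packed
worst_case = len(pack_bundles(serial, width=5))       # 5-wide unit, one slot used
scalar_case = len(pack_bundles(independent, width=1)) # 1-wide unit, always one per step
```

In this toy model the 5-wide unit needs 4 steps for the independent string and 20 for the serial one, while a 1-wide unit needs 20 steps either way. Real shaders fall somewhere between.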

Of course, most games are not going to be either extreme case but rather somewhere in between. The ATi hardware can do some distribution of load but in most games not enough to make up for having fewer "full" SPs.

This also explains why the GTX260 cannot beat the 4870 in most games. The 4870 is able to do enough distribution that its virtual SP count is higher than the 192 available to the GTX260 (and why the GTX260-216 is more of a challenge to the 4870).

EDIT: You guys tossed in some more comments there while I was typing.
@AZN: But in many cases the 5 branches do just sit idle because otherwise the 4870 would absolutely skunk the GTX280.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
A single instruction? I haven't programmed in a while, but we can rest assured that single instructions are very rare in the programming world.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: Denithor
@AZN: But in many cases the 5 branches do just sit idle because otherwise the 4870 would absolutely skunk the GTX280.

So not true when game performance isn't determined by SP alone.

If those 5 branches just sat idle, RV770 would be skunked by GT200, considering that with only 160 SPs it would have a theoretical 240 GFLOPS (2 * 160 * 750 MHz) compared to GT200's 933 GFLOPS.
 

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
Which is exactly why I said most games don't hit either extreme but rather somewhere between the two points. Therefore the 4870 wins in a few games, matches in others, and loses to the GTX 280 in most.

Notable examples of the extremes are Oblivion (GTX 260 beats 4870) and Bioshock (4870 "demolishes" the GTX 280). Obviously in Oblivion more of the instructions have to be processed in series so the 4870 functions like it only has 160 SPs most of the time. In Bioshock, however, more of the instructions can be processed in parallel so the 4870 easily surpasses the GTX 280 (functions like it has 300 or more SPs).

EDIT: The way I see it, the GTX 280 always has 240 SPs while the 4870 varies between 160 and 800 depending on how the game is coded (really, varies with each instruction set, depending on how much of each set can be handled in parallel). Very rarely the instructions will be highly parallel, allowing the full power of the 4870 to be unleashed, but when it does happen you have an outcome like Bioshock.
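
To put a number on "functions like it has N SPs", here is a small sketch. It compares SP counts only, the same way the post does, and ignores shader clock differences; the function name is made up for the example.

```python
FULL_SPS = 160  # "full" SPs on the 4870, each with 5 slots (figure from the thread)

def slots_needed(target_sps: float, full_sps: int = FULL_SPS) -> float:
    """Average slots each full SP must fill to act like `target_sps` single-issue SPs."""
    return target_sps / full_sps

match_gtx280 = slots_needed(240)   # slots per SP to "function like" 240 SPs
like_300_sps = slots_needed(300)   # slots per SP to "function like" 300 SPs
all_five = slots_needed(800)       # every slot filled
```

By this count the 4870 only has to fill 1.5 of its 5 slots on average to look like 240 SPs, and 1.875 to look like 300.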
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Only a handful of games out today are dictated by shader performance. Even then, pixel and texel performance is still king.

Bioshock
GRID
Oblivion used to be, but it's so old that we've already reached the point of diminishing gains from better shaders. Now it's dictated by texture and pixel performance more than anything else
Colin McRae: DiRT
NFS ProStreet
Assassin's Creed
Crysis (very high shader)

Of these games, RV770 does particularly well against the GTX 280 in all of them except Oblivion, since GT200 has much more pixel and texel fillrate than the 4870.

The way I see it, RV770 does better in THESE shader-heavy games because single instructions rarely exist in the complex world of modern games. Single instructions would be like Pong, literally. Shader power doesn't command performance, but it's part of the equation. We already went through this when the 9600 GT was released. Not to mention BFG did his ultra findings on SPs, fillrate, and bandwidth, in which fillrate came out on top as long as you weren't shader limited like the GeForce 7 series. RV770 does particularly well with the newer generation of games because it has higher FP32/FP16 blending rates than the GTX 280, while the GTX 280 excels in older 8-bit texture formats and color fill.
 

Zap

Elite Member
Oct 13, 1999
22,377
2
81
Originally posted by: FalseChristian
So are the rodents in my 8800GT the same as the rodents in the GTX 260/280?

No. They are as similar as comparing African to European swallows.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Originally posted by: Griswold
Originally posted by: justinburton
The fastest stream processor is still useless with drivers. Ati might have better stream p's but Nvidia has better drivers and CUDA.

That's exactly what a marketing victim would say. CUDA sounds cool, so it must be so much better...

Plus I happen to believe Nvidia has worse drivers.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Originally posted by: Azn
Originally posted by: keysplayr2003
Originally posted by: Azn
Originally posted by: Zap
That's the equivalent to how many rodents running on wheels it takes to power the cards (wheels hooked up to generators). The big difference comes from the way they measure things. ATI uses mice, while NVIDIA uses rats. Obviously rats are bigger and thus can do more work, so 240 rats are kinda-sorta equivalent to 800 mice in performance, with slight variations of course depending on individual rodent strength, age and working conditions.

Good analogy but you are forgetting one thing. SP clock speeds. How fast those mice and rats run to make the wheel spin. ATI 800SP are a little bit more powerful than Nvidia's 240SP at current clocks. In theory at least.

How are you figuring this? Are you talking about shader clock speeds?
In that case:

800sp * 750MHz = 600,000 (theoretical RP (rodent power))
240sp * 1300MHz = 312,000 (theoretical RP (rodent power))

Looks like Nvidia's rodents get a lot more done with a lot less.

If you meant something else, let me know.

Ummm no...

Nvidia GTX 280
3 cycles * 240SP * 1296mhz = 933 GFLOP

ATI 4870
2 cycles * 800SP * 750mhz = 1.2 TFLOP

Rodents or mice does it matter? In the end ATI gets more work done.

And gets lower fps in a majority of tests relating to single gpu performance?
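
For reference, the peak figures in the quoted post are just a three-way product. The sketch below reproduces them, reading what the post calls "cycles" as floating-point operations per SP per clock (that reading is an interpretation, and the function name is made up). These are theoretical peaks, which is why they say nothing about fps on their own.

```python
def peak_gflops(flops_per_clock: int, sp_count: int, shader_mhz: float) -> float:
    """Theoretical peak: FLOPs per SP per clock * number of SPs * clock (MHz -> GFLOPS)."""
    return flops_per_clock * sp_count * shader_mhz / 1000.0

gtx_280 = peak_gflops(3, 240, 1296)        # about 933 GFLOPS
hd_4870 = peak_gflops(2, 800, 750)         # 1200 GFLOPS, i.e. 1.2 TFLOPS
hd_4870_one_slot = peak_gflops(2, 160, 750)  # only one of five slots busy per unit
```

The last line shows the other end of the range: with a single slot busy per unit, the same formula gives 240 GFLOPS.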
 

cusideabelincoln

Diamond Member
Aug 3, 2008
3,275
46
91
Originally posted by: sonnygdude
Originally posted by: Avalon
We should make a video benchmark that has scores in terms of rodent power instead of stuff like Futuremark and their dreary 3Dmarks.

Hmmmm... Isn't that what Furmark is for? Think of it as a rat chasing its tail

What would be awesome is if Futuremark included a demo or test that actually displayed rodents doing work. Just imagine a hundred tiny, furry animals on several wheels. As more wheels turn, a single giant lightbulb becomes brighter and brighter. Shadows are cast, the camera pans. Now that would be one hell of a demo.
 