Anand R420 review analysis

Page 5

Pete

Diamond Member
Oct 10, 1999
4,953
0
0
Edge, let me be blunt. Your support for SM3.0 seems unreasonably heated. A lot of ppl in this thread have already been through this excitement with R300 and PS2.0, and that was an extended let-down up to very recently. We're a bit more cautious now when it comes to new features. They may sound great, but we're more interested in playing games that demonstrate a card's features than reading about said features.

UltraShadow2 is not "just" stencil shadows. Play Splinter Cell on a NV3X and then a R3XX and you'll see the difference.
I assume you're referring to buffer vs. projector shadows, as R420 seems to be faster than NV40 in SC at the same settings. nV's IQ advantage has less to do with "UltraShadow" than "Xbox."

A few thoughts on your ATi/nV post:

Yes, it would seem nV made a relatively small leap from PS2.0a to PS3.0 functionality, potentially to their advantage. By the same token, ATi made a relatively small leap from PS1.4 to PS2.0, to great advantage with R300. PS2.x -> PS3.x is, by all accounts, a smaller leap than PS1.x -> PS2.x (though it's certainly still an important one), so I'm not sure ATi will have too tough a time adjusting. Just as you consider NV30 -> NV35 -> NV40 three revisions of PS3.0-level functionality, so I would consider R200 -> R300 -> R350 -> R420 four revisions of PS2.0-level functionality. But that's just talk; what ultimately matters most to consumers (and the developers who desire them) is speedy features available on a large number of cards, not idealism (witness PS1.1 vs. PS1.4).

Is NV40 IEEE32, or is it just "FP32?" Because the two aren't the same, AFAIK, and I don't think NV40 is fully IEEE32-compliant. FP32 hasn't seemed to help nV at all in the IQ dept., anyway, as it's still occasionally noticeably uglier than ATi in both Far Cry and Lock On. (This may still be due to FP16/NV3x, though, so I suppose I'll have to suspend judgement for a month or two, to give devs a chance to retarget for NV40.)
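For reference, the precision gap between these formats comes down to mantissa width. A minimal sketch, assuming the usual s10e5 FP16, s16e7 FP24 (ATi), and IEEE-style s23e8 FP32 layouts:

```python
# Relative precision of the shader float formats under discussion.
# Assumed layouts: FP16 = s10e5, FP24 (ATi) = s16e7, FP32 = s23e8.
mantissa_bits = {"FP16": 10, "FP24": 16, "FP32": 23}

for name, bits in mantissa_bits.items():
    # Machine epsilon: gap between 1.0 and the next representable value.
    eps = 2.0 ** -bits
    decimal_digits = bits * 0.30103  # log10(2) per mantissa bit
    print(f"{name}: eps ~ {eps:.2e}, ~{decimal_digits:.1f} decimal digits")
```

So FP24 sits roughly two decimal digits of precision above FP16 and two below FP32; whether that last gap is ever visible on screen is exactly the open question in these posts.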

Truth of the matter is and moral of the story, ATI got stuck with their pants down somehow by deciding to not take the time or money to develop SM3 class hardware sooner, rather than later.
I'm not convinced this is the case. ATi is working on Xbox2 tech as we speak ("R500"), and I'm pretty sure that's more advanced than R420. ATi may well be developing hardware more advanced than SM3.0, just not mass-producing it.

Its not as easy as people think, they cant just throw on 32bit FP precision... the rest of the SM3.0 requirements and run to the bank with the performance crown. It wont happen.
Well, ATi's DX9 vertex shaders have been FP32 all along, and they're still faster than anything nV offers. So ATi does seem to be able to run FP32 at speed, even if they haven't chosen to integrate it into their pixel shaders yet. I think ATi may have stuck with FP24 in their PSs because they had already laid them out in 130nm low-k with RV350. ATi obviously opted to hold back this gen, and only games can prove their decision right or wrong. I was one of many who thought R300's great PS2.0 performance would mean lots of nice PS2.0 games making NV30 owners a bit jealous. 18 months later, and we've only got a handful of PS2.0 games that run better on ATi. Will it be the same case with SM3.0? I think it will, and mainly because there's now a generationally larger base of SM2.0 cards out there than SM3.0, and devs won't want to ignore that larger potential audience.

I agree, though, that drivers for SM3.0+ hardware will get more complicated, more like CPUs. OTOH, CPUs have been down this road before, so it's not like either ATi or nV are entirely reinventing the wheel with conditionals and loops.

The rest of the market has to catchup to NV before moving on. And the rest of the market is not even at 32bit FP, they are very likely to not have but half of the performance when they do either.
Why do you keep assuming implementing FP32 in hardware automatically means slower performance? AFAIK, it doesn't. FP32 mainly requires more transistors than FP24, for both the logic and the registers/cache. NV3x choked in FP32 because of lack of registers, and NV40 still shows greater performance in FP16 because of its lower register usage. ATi used their transistor budget more wisely than nV with R300, IMO, and they may do well betting on the same horse again (shades of GF3->GF3Ti->GF4).
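The register argument can be made concrete with a toy occupancy model (all numbers below are made up for illustration, not real NV3x specs): a pixel pipe has a fixed-size register file, every pixel in flight needs its temporaries resident, so doubling the width of each temporary halves the number of pixels available to hide latency:

```python
# Toy model: why register pressure, not FP32 math itself, throttles a GPU.
# All numbers are hypothetical, chosen only to show the scaling.
def pixels_in_flight(register_file_bytes, live_temps, bytes_per_component):
    # Each live temporary is a 4-component vector.
    bytes_per_pixel = live_temps * 4 * bytes_per_component
    return register_file_bytes // bytes_per_pixel

REG_FILE = 4096   # hypothetical per-pipe register file (bytes)
LIVE = 4          # live temporaries in the shader being run

fp32_pixels = pixels_in_flight(REG_FILE, LIVE, 4)  # 4 bytes/component
fp16_pixels = pixels_in_flight(REG_FILE, LIVE, 2)  # 2 bytes/component
print(fp32_pixels, fp16_pixels)  # -> 64 128
```

Same ALU throughput either way; the FP16 case just keeps twice as many pixels in flight to cover texture latency, which matches the NV3x behavior described above.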

Also, the NV40 core DOES have the power to perform displacement mapping, and all the nifty features of SM3.0.. dont believe me? Well, I could elaborate.
I'd love to see real examples of NV40's features. Would you mind elaborating?

I'm still curious why you hold all sites other than AT in such low regard.

BTW, I'm not trying to "shut you up," as Matthias said, but, frankly, I can understand his frustration. A lot of what you're saying strikes me as less than reasonable or informed, as I've tried to say in my responses. I'm just trying to understand your POV.
 

DeeSlanger

Member
Oct 9, 2003
136
0
0
Thank you, Pete, for a most welcome breath of ethical common sense. All most of these PFYs understand is immediate gratification and allegiance.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Oy. Where do I start?

Originally posted by: Edge3D
Originally posted by: Matthias99
Originally posted by: Edge3D

No, my spindoctoring friend. I never said PS3 increased IQ. Vertex Shader 3.0 compliance DOES.

VS3.0 compliance *could*. Displacement mapping and other complex vertex shader programs might also tank performance on the NV40; I haven't seen any numbers.

Right. But that isnt what I said.
My whole point of my existence here has been to tell some of you that VS3.0 DOES have HUGE IQ gains when used.
That is all. Its nice to see at least you agree with the facts of the matter... /phew!

No. VS3.0 *might* have *some* IQ gains. They also might come at a high price in terms of performance. Unless you have some sort of numbers -- or anything -- to back up your position with, you really have nothing to go on here.

Doom 3 is the only game engine with full per pixel dynamic shadowing and lighting. Even the much touted HL2 still uses light and shadow maps. For everyones' information... that was the technology used in Quake 2!
Its not really comparable.

Different lighting models are not necessarily better. HL2 uses a very different rendering system, with lots of little shaders to handle its dynamic lighting needs.

Hehe. "different" lighting models?? Interesting way of putting it.
Thats like saying SM2 is merely "different" than SM3.. not inferior as it is in reality.

SM2 is a subset of SM3; of course it is 'inferior'.

HL2 does its lighting differently than Doom3 does. That doesn't mean it's going to look worse, or that the way they're doing it is 'wrong' or 'inferior'.

/sigh
NV has improved their drivers to the point where its now just as fast in ARB2 as NV30 mixed mode specific path.

<snip -- info from Carmack>

IE, they didn't really improve their speed, they just made it so it runs at lower precision automatically, instead of Carmack having to tell them to do it explicitly via a different codepath. No big changes here; it's still running mixed-mode FP16/FP32, just doing it in the driver instead of in the game code itself.

There's actually quite a bit of good discussion in that thread about how Doom3's shading works, and how it compares to HL2. Perhaps you should take a look.

I was already aware of all this stuff. It IS good that it is in the driver though. Believe me, I am no NV3X fan.. I was just stating the facts.

You said "NV ... improved their drivers ... its (sic) now just as fast in ARB as [when using the] NV30 mixed mode specific path." That's not true; they just made it so that the NV30 automatically runs in the mixed mode path without having to be specifically told to do so by the game engine (it just marks some of the shaders as 'can be used with lower precision'). So they didn't make it any faster, they just made it easier to write mixed-mode code. While that's a good thing, it's not what you said they did.

It sucks seeing OGL compared to DX versions.

Then what the xxxx *should* I compare it to? It uses shader code that has features comparable to the ones found in SM1.4 and SM2.0. Sorry if I've offended you by mentioning DirectX in the same sentence as OpenGL.

Well. I was just being a biyatch about it. I prefer OGL. I'm not a programmer..

Stop. You're not allowed to bitch about the internals of graphics programming languages unless you've used them.

1. If Vertex and Pixel shader 3.0 is so useless and 2.0 is fine and dandy.. then WHY is ATI planning to implement it into future R420 revisions if there is no need? Please, I'm all ears.

It *won't* be useless in the future, when a) there are actually games that have meaningful support for it, and b) the cards are fast enough to actually use its nifty new features (super-long shaders with dynamic branching, hardware displacement mapping, etc.). For now (until proven otherwise), it's a paper feature, much like the SM2.0 support in the FX5200.

Thank you again. My point is well illustrated now by someone else other than me. I wouldnt disagree with what you said but you are intelligent enough to see what it is at LEAST capable of.
I give you that its worth remains to be proven.

"its worth remains to be proven". Good. Will you shut up now about how it provides "HUGE" IQ gains until someone demonstrates them?

Plus, it might actually shut up all the people going "Ooh, look, NVIDIA has SM3.0 and ATI only has 2.0! ATI sucks!"

<snip -- giant rant implying I'm an ATI fanboy because I think SM3.0 is useless right now>

And in all honesty, NV has had pretty much "DX9C" class hardware since the NV30.. not totally but MANY of its features were in the NV30 core... yes that 5800 leafblower. Reason I say this? They have much, much, much, much more experience with "SM3" class hardware.. you say that you wonder about NV's future SM3.0 performance.

No, I wonder about their current SM3.0 performance, and if the NV40 is actually fast enough to use its features effectively.

I will say this right now- NV is exponentially likely to have GREAT performance in SM3 based games. This is, essentially, 3rd generation SM3.0 hardware (NV30 was missing some crucial features like displacement mapping, it is like the new ATI part "SM2+").
While ATI with the R420 could be said to be on their 1st revision of anything much more than bare SM2.0 requirements and taken at least a step towards full SM3 compliance.

What does 'exponentially likely' mean? You *do* understand that SM3.0 is essentially SM2.0 with a few new instructions, right? So unless the shader code somehow becomes WAY less complicated because you're able to use dynamic branching (or some of the new VS3.0 features like texture lookup), it runs just as fast in SM2.0 as in SM3.0.

They dont even have FP32 precision yet. Thats huge in DX9C and SM3.0. NV had that in the NV30.. expect the performance and IQ delta to widen considerably with DX9Cs release and supporting games.
And dont expect ATI to catch up fast, or at all... they are actually behind on tech.
FP24 (ATI) is partial precision. FP32 is full precision.

Generally, FP16 is called 'partial precision', although I suppose both FP16 and FP24 are 'partial' in SM3.0, since it specifies 32-bit precision. You wouldn't happen to have any examples that show why FP32 is necessary, or even noticeable, do you? Didn't think so.

Truth of the matter is and moral of the story, ATI got stuck with their pants down somehow by deciding to not take the time or money to develop SM3 class hardware sooner, rather than later.

Even though the worth of SM3 "still has yet to be proven"?

32bit FP precision is considered across the industry as "full precision", has been for 20+ years.
The rest of the market has to catchup to NV before moving on. And the rest of the market is not even at 32bit FP, they are very likely to not have but half of the performance when they do either.

Consumer graphics cards didn't have 32-bit floating-point math at all until the NV30. It may have been the standard in CPUs for 20+ years, but not in GPUs.

Also, the NV40 core DOES have the power to perform displacement mapping, and all the nifty features of SM3.0.. dont believe me?

I believe it has the power to use them in a demo. That doesn't mean it will be able to use them extensively in real games. I don't believe it until I see it running, generally.

Or could just point to the simple known fact that SM3.0 is as much about speed increases as IQ. Case in point.

Explain why. Or provide a demonstration of such speed increases. I've shown counterexamples in other threads where it doesn't help you.

To sum it up once again, NV has most of the performance of every single ATI card but sometimes the x800XT runs away.. but its absolutely absurd IMO to spend $500 and short yourself something like a overall, very longtime developed, mature SM3.0 architechure from Nvidia.. RIGHT at the time when SM3.0 IS going to be relevant in the market place.

You on SM3: "its worth remains to be proven". So... NV is *almost* as fast, except in the cases where ATI beats it by a lot, but you should buy it because it has SM3.0, which *might* be worth something eventually.
 

sxr7171

Diamond Member
Jun 21, 2002
5,079
40
91
Looking at the review again it looks like the 6800GT will serve me well. It reminds me of the Ti4200 days where you could buy it and overclock it to get close to top of the line performance. Let's see how the 6800GT and the X800pro overclock.
 

CrystalBay

Platinum Member
Apr 2, 2002
2,175
1
0
^^^Very Nice Matt,^^^

OT..... I know HL2 is still going to blow the doors off people's expectations. Think Far Cry is bad, HL2 shaders will make Fart CRY... Ouchey..!!
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,000
126
Hey I dont see that forum. Could you point me to it or create the post and create a poll for it?
It seems they've removed it.

My whole point of my existence here has been to tell some of you that VS3.0 DOES have HUGE IQ gains when used.
I've seen absolutely zero evidence to support this. Not even FP32 appears tied to it as FP32 was possible under SM 2.0 (and even then I wouldn't class the difference in IQ to be huge).
 

Edge3D

Banned
Apr 26, 2004
274
0
0
Everyone's responses are what I expected. As I said, I'm pulling out from posting on this.. it's turning into a pissing match. I laid it down pretty hard on old ACKmed and that was the main goal.
Anyway, it seems many aren't going to change their minds.. they're made up, for many of you that's pretty clear.

You are convinced ATI has the better implementation this round. Thats your opinions and that is fine.
I have no problem with that. I was just trying to explain some things that it seemed many of you were missing.
Like I said, downplaying SM3.0 (which many are still doing) seems a bit odd to me. But whatever, you guys can believe what you want.
I would have a VERY hard time laying down $500 for a x800XT and not getting SM3.0 regardless of the propaganda being spread attempting to downplay it. Thats a pretty simple statement, and based off of my own investigation into the technology. I guess I am extremely curious who will, and who WON'T be rockin the X800XT like they claimed.

I'm going to do a roll call this summer on how many people currently own a x800XT. I want to see 40 votes, but I wouldnt doubt those who hold me in ill regard will vote just to spite me, whether they have one or not, but I'm still going to be interested.

As far as SM3.0, what else do you guys need to know? I've provided links, tech demos illustrating it. If your not convinced, then why argue? Let it die and go to another thread. I've read my info and I KNOW what is the best choice.

I will respond to one thing, and I find this comment about as silly as most of the other replies I've gotten.
It's the comment that light and shadow maps are not inferior to full, dynamic per-pixel lighting... guys, seriously. What's going on in this forum?
Do you read what you're saying?
"HL2 does its lighting differently than Doom3 does. That doesn't mean it's going to look worse, or that the way they're doing it is 'wrong' or 'inferior'."

Um.. Right. Is this Lawyer time? Am I going to be held to some dictionary definition of "wrong" and "inferior" next? I mean, I suppose it isn't Wrong. But it sure as hell IS inferior!
Hands down, no questions asked, thank you, come again, you've been served, etc., and so on.
There is no question, point blank period.. that Doom 3 has the BEST, most superior, super duper, fraglicous shadows and lighting of any game in the history of PC gaming.
Seriously. There is NO competition. Amazing how no one will back me up on any of my points... biased crowd, eh?

I honestly dont see too many responses that arent any less ludicrous as that one in response to my post besides Petes.
The facts have been laid out about 3.0 shaders and my opinions have been illustrated from those facts.
If you don't agree, then congratulations. I just hope you read it with an honest eye.

Pete is my favorite so far. Damn good points and great posts. To anyone who might like my outlook and finds it good and informative.. pay attention to Pete as well. He makes good sense; he's not correct, but he makes sense. No really (ChkSix), he has good points and I *think* he's got a pretty open mind. He is, at least, the best at portraying his POV in words.
I no longer wish to check this thread, but if you PM that post to me, Pete, I'll answer ya. But I tire of replying to 5 people. I did very well and now I'm tired of the endless hoopla; if you don't agree now, then you never will. Just face it and buy all yer x800XTs.. I don't give a toot. But I love ya all, and look forward to seeing ya post in my "Who Bought a X800XT again?" thread. But I've got this thread for evidence.

PS. It's likely the x800XTs won't o/c fer nothin.. we'll see. In summer, we'll find out if my GT can get some nice watercooled o/c's and beat all of your XTs!
 

Ackmed

Diamond Member
Oct 1, 2003
8,498
560
126
Keep dreaming about "laying it down hard on me". You are not even worth the trouble. I won't be posting back to any of your ignorance anymore. This forum needs an ignore feature.

This is exactly the reason why these forums have gone downhill.

Go ahead and get the last word in so you can "lay it down hard on me" again.
 

MemberSince97

Senior member
Jun 20, 2003
527
0
0
Edge, you're a good dood and you are right about Nvidia. This is a monster chip that even the architects don't yet fully understand how it behaves. It will be at least 6 months before drivers start scratching the surface of these pipes. Whereas the R300 is in its second-to-last incarnation as of now, one more refresh. The foundation is there for NV to go many years. It's a great video card ...
 

ChkSix

Member
May 5, 2004
192
0
0
For me personally, I cannot see laying down 500 dollars on a card that so far doesn't seem to overclock well, is missing key features that may or may not make a difference in short time if the developers implement them in some upcoming titles, and is only ahead of the Extreme by a small FPS lead in some (not all) benchmarks. I would rather have the superior performance in OpenGL apps (Doom 3 to me is much better visually than HL2 and the game I'd rather have), the overclocking headroom (my system is watercooled) and the better driver support that Nvidia brings. I own both a 9800pro and a 5950Ultra, and this time around, at least for me, I cannot see giving a company a big chunk of cash for nothing but a souped-up 9700-9800 (R300). To each his own, but for me, I like having all the latest whiz-bangs, at least for the longevity, and at this stage in the game, only Nvidia has those offerings.

What is striking to me is that since the R300, ATi has yet to create a new architecture for its video cards. Some might say why invest that money if the hardware isn't broken, and that's a very true and valid point, yet I would have thought that by now, considering PS/VS 3.0 and DX9.0c, ATi (which follows DX to the letter) would have given us something completely new that would have been just as technologically superior, "at least on paper," as the NV40. I have been around computers for quite some time, and knowing what ATi was before the 9700-9800 series is a little troubling in my eyes. I keep thinking, although I may be wrong, that they got the design right one time and cannot build something brand new on its success. For me, it is a little unnerving seeing that the R420 is an extension of an already seemingly old (yet still awesome) R300 core. And although the R420 is a wonderful and fast card, it's a little disappointing to me, only because it doesn't have 128-bit or PS 3.0 support.

The fact that the R420 doesn't overclock well (although it may under exotic phase-change or TEC systems) is what ultimately keeps me away. I love overclocking, and if something is only getting 15MHz on air (according to Hexus) then it isn't going much further on water. And since Gainward's watercooled variant of the NV40 is supposed to offer a 20% increase over the Extreme, I think that is where my money is personally going.
 

reever

Senior member
Oct 4, 2003
451
0
0
As far as SM3.0, what else do you guys need to know? I've provided links, tech demos illustrating it. If your not convinced, then why argue?

Convinced of what? Showing us tech demos convinces us that 3.0 is a real instruction set that exists; try showing us something supporting anything you say. No facts have been laid down to prove any assumptions you are making, simply because nothing exists to prove them, and nobody is going to believe it until the 3.0 games show up.
 

ChkSix

Member
May 5, 2004
192
0
0
Here is something on PS3.0 shaders written by Crytek themselves. I think downplaying what they can be in the very near future is wrong, whether 4 or 20 games will be released utilizing it. It isn't the company but the technology that is important here, and that should not be brushed away.

As a developer, what are the most convincing arguments for the use of Shader 3.0 over 2.0?

· In VS3.0 shader model actually is possible to support general displacement mapping (with smart shader design when vertex shader has to do something during waiting for texture access).

· In PS3.0 shaders it's possible to decrease number of shaders using dynamic branching (single shader for general lighting) and in such way decrease number of shader switches, passes, and as result increase speed, and also we can utilize dynamic conditional early reject for some cases in both PS and VS and this also will increase speed. As to NV40 generally possible to use co-issues better to take advantage of super-scalar architecture (we can execute 4 instructions per cycle in a single pipeline).

· We can handle several light sources in single pixel shaders by using dynamic loops in PS3.0.

· We can decrease number of passes for 2-sided lighting using additional face register in PS3.0.

· We can use geometry instancing to decrease number of draw-calls (remove CPU limitations as much as possible).

· We can use unrestricted dependent texture read capabilities to produce more advanced post-processing effects and other in-game complex particles/surfaces effects (like water).

· We can use full swizzle support in PS3.0 to make better instructions co-issue and as result speedup performance.

· We can take advantage of using 10 texture interpolators in PS3.0 shader model to reduce number of passes in some cases.
 

chsh1ca

Golden Member
Feb 17, 2003
1,179
0
0
Originally posted by: Edge3D
You are convinced ATI has the better implementation this round. Thats your opinions and that is fine.
Evidently it isn't; you've spent three pages arguing with people about why they choose to buy a certain card.

I have no problem with that. I was just trying to explain some things that it seemed many of you were missing.
Like I said, downplaying SM3.0 (which many are still doing) seems a bit odd to me.
There's no reason to hold it aloft as the key to performance now and in the future. It's akin to SSE3 -- probably be VERY useful somewhere down the road, but COMPLETELY useless right now.
IMO the on-chip Hardware MPEG2 encoder is FAR more useful to FAR more people than SM3.0 support is.

But whatever, you guys can believe what you want.
Is there a problem with belief rooted in fact?

I would have a VERY hard time laying down $500 for a x800XT and not getting SM3.0 regardless of the propaganda being spread attempting to downplay it. Thats a pretty simple statement, and based off of my own investigation into the technology. I guess I am extremely curious who will, and who WON'T be rockin the X800XT like they claimed.
If I have the cash to spend, I will be aiming for an X800XT, but it will likely run ~$700, so I may end up settling for an FX5900U or 9800Pro if I can find either.

I'm going to do a roll call this summer on how many people currently own a x800XT. I want to see 40 votes, but I wouldnt doubt those who hold me in ill regard will vote just to spite me, whether they have one or not, but I'm still going to be interested.
I think you overestimate how much people give a damn about you.

As far as SM3.0, what else do you guys need to know? I've provided links, tech demos illustrating it. If your not convinced, then why argue? Let it die and go to another thread. I've read my info and I KNOW what is the best choice.
Tech demos do not equate to usability in games at acceptable speeds. Some time ago 3dfx convinced everyone to write games with GLIDE support, and many did. Did that stop their company from having to adapt to technology they didn't want to adopt?

There is no question, point blank period.. that Doom 3 has the BEST, most superior, super duper, fraglicous shadows and lighting of any game in the history of PC gaming.
There is a question, since Doom3 isn't out yet, nor is its primary competitor, HL2. Shadows and lighting aren't why I buy a game -- I buy it for playability, and if it happens to come with some cool eye candy, that's a bonus. I suggest you stick to playing 3DM03 if you prefer synthetic reasons to practical ones. Even assuming you're right, and Doom3 does have the most amazing shaders, SM3.0 is still irrelevant, since any example anyone has seen of Doom3 has been run on cards using SM2.0 or lower.

Seriously. There is NO competition. Amazing how no one will back me up on any of my points... biased crowd, eh?
The alternate possibility is that you're wrong and you haven't clued into it yet.

I honestly dont see too many responses that arent any less ludicrous as that one in response to my post besides Petes.
What about all the people saying "wait until you see it in a game before making up your mind"? How is that ludicrous?
IMO, ludicrous is lauding a minor piece of technology as the next great revolution in PC gaming when it has never been seen in use in a game before.

IMO people who post for the explicit purpose of baiting and verbally abusing a specific person should get to take a free all expenses paid vacation from the forums.
 

Deeko

Lifer
Jun 16, 2000
30,213
12
81
Alright... if you're going to play nothing but Doom3, by all means get the nVidia. If you want to base your $500 decision on ONE game that isn't even out, hey, by all means. Face it.... OpenGL superiority doesn't hold the same weight it once did. While there are still very good OGL games, most games are DX. As for 3.0 vs 2.0.... by the time it is really necessary, ATI will have a card supporting it.
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
I think the funniest thing is that many of the people touting ATi have AMD-64 systems. Let's see how their "much improved" drivers perform on AMD-64, where they have to completely rewrite them.
 

ChkSix

Member
May 5, 2004
192
0
0
I'm not basing my decision on one game. The real fact of the matter is that unlike the NV3X, which didn't stand a chance next to R300, this time around the situation is quite different. Both top-end cards from either company are pretty much locked in performance, with one taking some benchmarks and the other taking the rest, or a complete draw. Stating that Nvidia is only good for one game, considering the performance benchmarks on any review site showing how neck and neck they both are, is completely false.

Claiming one is flatly better than the other, regardless of which one it is, is utter nonsense, as anyone reading all the reviews thus far can see. They are both performing on par with each other in DX9 games for the most part, and OpenGL is a no-brainer on Nvidia, but that has been the case for some time now... no new news.
 

Deeko

Lifer
Jun 16, 2000
30,213
12
81
Originally posted by: Acanthus
I think the funniest thing is that many of the people touting ATi have AMD-64 systems. Let's see how their "much improved" drivers perform on AMD-64, where they have to completely rewrite them.

Huh?
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: ChkSix
Here is something on PS3.0 shaders written by Crytek themselves. I think downplaying what they can be in the very near future is wrong, whether 4 or 20 games will be released utilizing it. It isn't the company but the technology that is important here, and that should not be brushed away.

The problem is, this is a 'laundry list' of every new feature in SM3.0. This reads like MS and NVIDIA's press releases.

As a developer, what are the most convincing arguments for the use of Shader 3.0 over 2.0?

· In VS3.0 shader model actually is possible to support general displacement mapping (with smart shader design when vertex shader has to do something during waiting for texture access).

Yes, although performance remains to be seen. I'm also confused by his statement about doing it 'while waiting for texture access', since displacement mapping generally *requires* texture access. Maybe he meant doing it while the pixel shader is waiting for textures? I guess that's possible.

· In PS3.0 shaders it's possible to decrease number of shaders using dynamic branching (single shader for general lighting) and in such way decrease number of shader switches, passes, and as result increase speed, and also we can utilize dynamic conditional early reject for some cases in both PS and VS and this also will increase speed. As to NV40 generally possible to use co-issues better to take advantage of super-scalar architecture (we can execute 4 instructions per cycle in a single pipeline).

There's a lot of qualifiers in that paragraph. His first two points are essentially this:

Let's say you have four PS2.0 shaders, maybe 50 instructions each. You want to run either 3 or 4 of them on each pixel. Right now (using SM2.0), you have to do this in 4 passes of 50 instructions each (with the card excluding some pixels from the last pass entirely). In SM3.0, you can build one large shader with the same 200 instructions and a conditional branch that skips the last 50 instructions in certain situations. You then run one pass with the 200-instruction shader, and it internally decides whether or not to execute the last 50 instructions on each pixel. Unless the change to SM3.0 *also* decreases the overall shader length (and it may actually increase here, since you have to add the logic for deciding whether or not to skip the last part of the shader), all this saves you is the setup time for each shader pass -- the shader core still has to execute the same 150 or 200 instructions for each pixel. I don't know exactly what that setup time is (I would assume it's pretty short), but we're talking percentage points here, not orders of magnitude.
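That argument can be put into a toy cost model (the overhead figures are invented for illustration; none of these are measured numbers):

```python
# Toy per-pixel cost model for the 4-shader example above.
# PASS_SETUP and BRANCH_OVERHEAD are invented illustrative numbers.
PASS_SETUP = 5        # per-pass overhead, in instruction-equivalents
STAGE_LEN = 50        # instructions per shader stage
BRANCH_OVERHEAD = 2   # cost of the dynamic-branch test in the SM3.0 version

def sm20_multipass(stages_run):
    # One pass per stage, paying setup every time.
    return stages_run * (PASS_SETUP + STAGE_LEN)

def sm30_branched(stages_run):
    # One pass over the combined shader; setup paid once,
    # skipped stages cost only the branch test.
    return PASS_SETUP + stages_run * STAGE_LEN + BRANCH_OVERHEAD

print(sm20_multipass(4), sm30_branched(4))  # -> 220 207
print(sm20_multipass(3), sm30_branched(3))  # -> 165 157
```

A handful of percent either way -- the shader body dominates, which is the "percentage points, not orders of magnitude" point.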

His last point is no different than the situation with NV30. In fact, you could rephrase that to say: "With NV40, we have to be very careful about the order in which we issue shader instructions, because their architecture is very inefficient if they're issued in the wrong order and it can't pipeline them." With a HLSL this is usually not a big deal, but it's not like this is some fantastic new feature they've added.

· We can handle several light sources in single pixel shaders by using dynamic loops in PS3.0.

...or by using multiple passes and the F-Buffer in ATI's SM2.0 hardware. It will likely run a little faster with dynamic loops, but I don't know how much (it's much the same situation as above; it has to do the same amount of work, but it might do it a little more efficiently in a single pass rather than two or three).

· We can decrease number of passes for 2-sided lighting using additional face register in PS3.0.

Well, yes... but I don't know how much real impact this has.

· We can use geometry instancing to decrease number of draw-calls (remove CPU limitations as much as possible).

...in some situations. This requires that they have multiple instances of the same models on-screen at the same time, and unless there are a *lot* of them (or they're very large/complex), the savings are not that great. I have no doubt this will be great for CAD, but probably not killer for gaming.
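The draw-call arithmetic behind that caveat is easy to sketch. The per-call CPU overhead below is a made-up figure for illustration only:

```python
# Toy sketch of why instancing only pays off with many identical objects.
# The overhead constant is invented for illustration, not a real measurement.

DRAW_CALL_OVERHEAD_US = 30  # hypothetical CPU cost per draw call, in microseconds

def without_instancing(num_instances):
    """One draw call per object instance."""
    return num_instances * DRAW_CALL_OVERHEAD_US

def with_instancing(num_instances):
    """One draw call submits every instance of the same model."""
    return DRAW_CALL_OVERHEAD_US

# 500 trees in a forest: big win. 3 crates in a room: who cares.
print(without_instancing(500) - with_instancing(500))  # 14970 us of CPU time saved
print(without_instancing(3) - with_instancing(3))      # a mere 60 us saved
```

The savings scale with the instance count, which is why it helps CAD-style scenes full of repeated geometry far more than a typical game scene with a handful of duplicates.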

· We can use unrestricted dependent texture read capabilities to produce more advanced post-processing effects and other in-game complex particles/surfaces effects (like water).

It's true, although I wasn't exactly blown away by NVIDIA's tech demo on this feature. You can do most of the same things in different ways in SM2.0 (although I'll admit that it's *easier* in SM3.0, which is a good thing). Maybe developers will find better ways to use it.

· We can use full swizzle support in PS3.0 to make better instructions co-issue and as result speedup performance.

Beats me. No clue what he's even talking about.

· We can take advantage of using 10 texture interpolators in PS3.0 shader model to reduce number of passes in some cases.

And even then, it may not be much (if any) reduction in total instruction count, depending on what the shader is doing.

SM3.0 is an improvement on SM2.0. Yes. But it's an iterative improvement, not a huge jump like SM2.0 was over PS/VS 1.1/1.4.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: Edge3D
I will respond to one thing, and I find this comment just as silly as most of the other replies I've gotten.
It's the comment that light and shadow maps are not inferior to full, dynamic per-pixel lighting... guys, seriously. What's going on in this forum?
Do you read what you're saying?
"HL2 does its lighting differently than Doom3 does. That doesn't mean it's going to look worse, or that the way they're doing it is 'wrong' or 'inferior'."

Um... right. Is this lawyer time? Am I going to be held to some dictionary definition of "wrong" and "inferior" next? I mean, I suppose it isn't wrong. But it sure as hell IS inferior!
Hands down, no questions asked, thank you, come again, you've been served, etc., and so on.
There is no question, point blank, period... Doom 3 has the BEST, most superior, super-duper, fraglicious shadows and lighting of any game in the history of PC gaming.
Seriously. There is NO competition. Amazing how no one will back me up on any of my points... biased crowd, eh?

Are you aware that Doom3 'renders' objects by drawing dozens (if not hundreds) of so-called 'primitives'? This technique dates all the way back to Quake (and earlier, in computer graphics in general).

Also, Doom3 is built on a *ray-casting* engine, just like Wolfenstein 3D. Jeez. What a piece of junk.



Just because something is new and different doesn't mean that it's automatically better. Doom3 does some very interesting things with its lighting, but anybody who's watched the HL2 demos can vouch for the fact that the Source engine is no slouch in this department either. I'm going to hold off declaring either of them the "BEST" until I've seen them both in action; declaring a winner before that is what bias looks like. How can you call something the best when you haven't even seen its final version, or what it's going up against?
 

Blastman

Golden Member
Oct 21, 1999
1,758
0
76
We can handle several light sources in single pixel shaders by using dynamic loops in PS3.0.
This is just pie-in-the-sky stuff. Yes, you could loop several lights into one shader, but your performance would probably be killed. This isn't shader language, but for illustration purposes consider hypothetical shader code for 5 lights --

Example A, using looping:

For light=1 to 5
light shader
Next light

Example B, straight code:

light1 shader
light2 shader
light3 shader
light4 shader
light5 shader

The code with the loop (A) has to execute extra looping instructions in addition to the 5 light shaders. The straight code in B is going to execute much faster. The "best case" scenario for A would be that the DX9 runtime can unpack it to look like B so it won't run slower -- but I don't know if that's possible.
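Blastman's examples A and B can be given a rough instruction-count model. Both constants below are invented for illustration; real loop overhead depends on the hardware:

```python
# Rough instruction-count comparison of a dynamic loop vs. unrolled code.
# Both constants are invented numbers for illustration only.

LIGHT_SHADER_LEN = 10   # hypothetical instructions in one light's shader
LOOP_OVERHEAD = 3       # hypothetical counter/compare/jump cost per iteration

def looped(num_lights):
    """Example A: dynamic loop, paying loop bookkeeping every iteration."""
    return num_lights * (LIGHT_SHADER_LEN + LOOP_OVERHEAD)

def unrolled(num_lights):
    """Example B: straight-line code, as if the compiler unrolled the loop."""
    return num_lights * LIGHT_SHADER_LEN

print(looped(5))    # 65 instructions executed per pixel
print(unrolled(5))  # 50 -- the "best case" unrolling described above
```

With these numbers the loop costs 30% more instructions per pixel, which is the kind of overhead the "best case" unrolling would eliminate.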

The PS3.0 program extensions (not really PS3.0's) simply open up the programming model past 2.0, but when one considers how PSes are used, I don't think there is really going to be much use for looping and branching with PSes.

Consider Farcry's PS count --

44 -- PS 2.0/2.x
100 -- PS 1.1

PS1.1 is limited to a maximum instruction count of 8
PS1.4 is limited to a maximum instruction count of 14
PS2.0 is limited to a maximum instruction count of 96

Even on a shader heavy game like Farcry the vast majority of shaders are going to have a very small instruction count. That?s simply how shaders are used. Those 100 PS1.1 shaders in Farcry are limited to an instruction count of 8.

If you look at the 3Dmark2001 Nature demo, it has what, PS 1.1 and 1.4? Again, all shaders with a small instruction count -- less than 14 or 8. And it renders some very nice lighting effects. What counts is being able to run a lot of small shaders fast and having enough precision for high-dynamic-range lighting (PS2.0 24-bit, aka R300), not being able to run really long shaders, which probably have extremely little use in general for games.

Even a shader-heavy game like HL2 uses Shader Model 2.0 extensively, but the shaders are only 30-40 instructions long -- well short of the 96-instruction maximum even in standard PS2.0. The X800 can handle 1536 PS2.0+++ instructions in one pass, and effectively unlimited over many passes. It's doubtful we'll find much use for shaders hundreds of instructions long, as the vast majority of lighting effects only take a few instructions, and shaders hundreds of instructions long would probably grind frame rates too low anyway.
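A quick sanity check on the per-pass limits quoted above: ceiling division gives a crude lower bound on how many passes a shader of a given length would need under each model (real shaders can't always be split this cleanly, so this is an idealized sketch):

```python
import math

def passes_needed(total_instructions, per_pass_limit):
    """Idealized minimum number of passes to fit a shader under a per-pass limit."""
    return math.ceil(total_instructions / per_pass_limit)

# A 40-instruction HL2-style shader fits easily in one PS2.0 pass (limit 96)...
print(passes_needed(40, 96))  # 1
# ...while the same work under PS1.1's limit of 8 would need splitting.
print(passes_needed(40, 8))   # 5
```

Which is the thread's point in miniature: within PS2.0's limits, most real game shaders already fit in a single pass.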

From ATI's POV, the engineers had to make a call: add looping and branching, which would have added millions of transistors to the GPU while not being very useful for the PS programming model (not saying totally useless), or beef up PS2.0 to the hilt -- what we really need -- the ability to run lots of shaders like a runaway freight train. Looking at shader-'heavy' games like Farcry and Tomb Raider, the X800 has very good shader performance, performs very well in those games, and is looking better compared to the 6800 so far.
 

Edge3D

Banned
Apr 26, 2004
274
0
0
Originally posted by: ChkSix
I'm not basing my decision on one game. The real fact of the matter is that unlike NV3X that didn't stand a chance next to R300, this time around the situation is quite different. Both top end cards from either company are pretty much locked in performance, with one taking some benchmarks and the other taking the rest, or a complete draw. Stating that Nvidia is only good for one game, considering the performance benchmarks on any review site showing how neck and neck they both are, is completely false.

Claiming one is better than the other, regardless of which one it is, is utter nonsense, and that's visible to anyone reading all the reviews thus far. They are both performing on par with each other in DX9 games for the most part, and OpenGL is a no-brainer on Nvidia, but that has been the case for some time now... nothing new there.
 

robertsmcn

Member
Mar 15, 2004
86
0
0
I'm no fanboy of either manufacturer. I had a GeForce 3 Ti200 for two years that was solid. I just upgraded that to a 9600Pro which is also a great card.

Both Nvidia and ATI have put out some impressive products here. I'm never one to buy the latest and greatest. I'd rather save a few bucks and go for something one level down. As I see it right now, when I upgrade my system in about 6 months I will probably be looking at either the X800Pro or 6800GT. I'd like to see a shootout between these two cards soon and then make an educated guess based on that. That should also be enough time to see if there are any driver issues, etc. with either of the cards.
 