Originally posted by: Rollo
I wouldn't call that "knowing your GeForce lore"- you left out that even if the 9700Pro and FX5800 had bubble gum where the shaders should be, it was over a year after the 9700Pro's release before games that even barely used shaders started to trickle out. It was 16 months after the release of the 9700Pro that the first "big" DX9 game came out (Far Cry), so any DX9 "trouncing" ATI did was limited to a couple of games most people didn't have.
Saying they "trounced" based on that would seem biased to me.
However, in the games that were out, such as Tomb Raider: AOD (I know, it was crap-ass) and later Halo, the FX's problem with shaders was on display for all the world to see. The shaders the FX produced didn't look as good as ATI's, nor did they run as fast. As more and more games that used shaders came out, the flaws of the NV30 became even more apparent. Granted, by this time the NV35 had come out and partially fixed the problem, but some of it was still there.
I'll break it down for you:
The NV3x has a long processing pipeline, much like the P4. As we all know, the P4 sometimes chokes when software isn't coded to be "friendly" to its architecture. The R3x0, on the other hand, is more like the Athlon: it can't reach astounding clock speeds, but it has a high IPC (instructions per clock). That means ATI can't crank the core to 500MHz, but the chip does more work per clock than the NV30, and is thus a little "friendlier" to software (in this case, shaders) that isn't coded efficiently.
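Back-of-envelope, the trade-off looks like this (the clock speeds are real, but the ops-per-clock figures are made up purely to illustrate the idea):

    effective shader throughput ~ clock speed x shader ops per clock
    FX 5800 Ultra style: 500MHz x 1 op/clock  = 500M ops/s (only while the pipeline stays full)
    9700 Pro style:      325MHz x 2 ops/clock = 650M ops/s (and it stays full far more often)

So the higher-clocked chip can still lose the moment its long pipeline hits a hiccup.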
Shaders, as you may know, sometimes have to do several things to produce a result. For example, if a shader is supposed to fill an object with one texture and then blend it with another, a poorly coded shader might hit the GPU in the order: texture, blend, texture, blend. The NV30 would have to kick the shader out of its pipeline and start over for the next operation, and because the NV30 has such a long pipeline, doing that at every step takes a whopping 4 clock cycles to complete. Take that same shader, though, and re-order it to texture, texture, blend, blend, and it runs through the pipeline in one shot, no problem. Unfortunately for NVIDIA, programmers oftentimes coded shaders like the first example, which made the chip perform worse than the R300. There were a few exceptions to this, obviously, but for the most part the R300 was king when it came to shaders.
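To make that concrete, here's a rough sketch in DX9 ps_2_0-style assembly (illustrative only - the samplers, registers, and blend constant are made up, and this paraphrases the ordering problem rather than any real game's shader):

    ps_2_0
    dcl t0.xy              // texture coordinates from the vertex shader
    dcl_2d s0              // first texture
    dcl_2d s1              // second texture
                           // c0 = blend factor, set by the app

    // "Bad" ordering: texture, blend, texture, blend.
    // On NV30, each fetch that follows math work stalls the long pipeline:
    texld r0, t0, s0       // texture
    mul   r0, r0, c0       // blend
    texld r1, t0, s1       // texture
    lrp   r0, c0, r0, r1   // blend
    mov   oC0, r0

    // "Good" ordering: texture, texture, blend, blend.
    // Same math, but the fetches are grouped so it flows through in one shot:
    //    texld r0, t0, s0
    //    texld r1, t0, s1
    //    mul   r0, r0, c0
    //    lrp   r0, c0, r0, r1
    //    mov   oC0, r0

Both versions compute the exact same pixel; the only difference is instruction order, which is why NV30 performance swung so wildly from game to game while the R300 barely noticed.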
Now, for ATI's chip. The R300 has none of the problems the NV30 has with unoptimized shaders; to it, the code is all the same. Granted, it would still run the first example a bit slower, but the impact is much smaller than on the NV30 because the R300 has a much shorter pipeline.
Wow, I can't believe I'm arguing on the side of ATI. Something must be seriously wrong with me. Wait, I know, I've been up all night at a LAN party. Duh! Too much caffeine, I guess, and no sleep = weird stuff. Anyways, I hope this sheds some light on what I was talking about. Didn't know that stuff, didja, Rollo?
See, I do know what I'm talking about!