greater than 100% scaling? really?


Triggaaar

Member
Sep 9, 2010
138
0
71
Let's say 1x i/o * 1x mem * 1x GPU = 100%. You agreed that, individually, i/o, mem and GPU can, in theory, scale linearly (200%). So assuming they are independent, can't we have a case where each of them scales by 133%, making the formula 1.33 * 1.33 * 1.33 = 235%?
No, we can't. As I said already, increasing the performance of 1 of those functions by 33% doesn't increase the performance of the whole unit by 33%, unless it is basically responsible for all of the time. For example, say i/o takes a millisecond, memory use takes a millisecond, and the GPU takes an hour. Increasing GPU performance by 33% will increase overall performance by 33%. Increasing GPU performance by 100% will improve overall performance by 100%. But increasing i/o and memory performance by any amount you like will not noticeably increase performance at all. So no, you can't have a case where you get 1.33 * 1.33 * 1.33.
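To make that concrete, here is a quick Python sanity check of the same arithmetic (the millisecond/hour figures are the ones from the example, converted to seconds):

```python
# Speed up one component and see what happens to the whole.

def overall_speedup(times, speedups):
    """times[i] = seconds spent in component i; speedups[i] = how much faster it gets."""
    before = sum(times)
    after = sum(t / s for t, s in zip(times, speedups))
    return before / after

times = [0.001, 0.001, 3600.0]                 # i/o, memory, GPU
print(overall_speedup(times, [1, 1, 1.33]))    # ~1.33: a 33% faster GPU helps ~33%
print(overall_speedup(times, [10, 10, 1]))     # ~1.0000005: 10x faster i/o and memory do nothing
```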

If our goal is to see whether or not it is "logically possible" to go beyond 2, you guys continuously and repeatedly come back and say there is no way it can be 8.
No we don't - we're not just saying 'nah, 8 is way too much'. You came up with 8 by saying 2 cards can theoretically improve performance by 2 to the power of n, where n = the number of functions, which in this case is 3. So the theoretical limit is 2 * 2 * 2. I'm not saying 'mm, 8 is too much', I'm saying your method is totally wrong and inconceivable. I know how you got to 8, but it is total nonsense. As I explained, what if we didn't have 3 functions - what if graphics cards all had 2 different GPUs (to do different things), so there were 4 functions in a card - your logic would then say 2 cards could theoretically give 16 times the performance (and more functions would give even more exaggerated cases, as I already explained).

If you are being serious, then people here can't explain it to you more than they have. Your method is completely incorrect.

The concept is simple: no two frames are ever identical.

If one card renders 30 frames in one second, the second card is rendering a different set of frames, which might be a little easier to render, and might give the card the chance to render 32 for example instead of 30. Hence you end up with 30+32 = 62, greater than 100% scaling.

That's all there is to it.
That doesn't make sense, because we're not comparing the performance of card 1 against card 2. We're comparing a test a few minutes long with 1 card, and then again with 2 cards. Both scenarios will encounter the same mix of easy and difficult to render frames.
 

iCyborg

Golden Member
Aug 8, 2008
1,344
61
91
@Seero

Dude, you're just wrong. There are a bunch of people telling you that; could it be that not all of us are ignorant and just unable to understand you? Please try reading what people are trying to tell you.
The RAM on both cards contains the same thing: 2x 1GB cards in CF effectively have at most 1GB of unique data. I/O is doubled, but you need to send the same stuff TWICE, therefore there's no performance improvement there, and even if you could, you still would not get >2x scaling. You're left with processing power, and that's a factor of 2.

And yes, I do know what an upper bound is. We're not complaining about not being able to reach 8x in practice; it's your explanation of why it can be bigger than 2 that's incorrect. Why even stop at just those 3 things? 2 cards will also have 2x the ROPs and 2x the texture units and whatnot, so the theoretical max should be in the hundreds according to your way of computing it, right? CF must be very inefficient. And given the exponential nature of your scaling, 3x CF must be horrendously inefficient...
 
Last edited:

Triggaaar

Member
Sep 9, 2010
138
0
71
Some claim that doubling each individual factor can only give 133% performance at best, not 200%.

However, if the same rule applies to mem size and number of GPUs, then each can individually increase performance by up to 33%, which in total is 133% * 133% * 133% = 235.26%.
No, the maths doesn't work like that (ie, you have the wrong equation). If each individual factor were to have an equal effect on total performance, doubling the performance of each factor does not increase overall performance by a third each time. It would be a fifth for the first factor, a quarter for the second, and a third for the last. The reason the overall performance boost goes up is that the same performance increase has a greater overall effect once the total time has already been reduced - better explained with an example:
If something takes 90 seconds, 30 seconds for each component, and you double up 1 component (say 2 GPUs, halving that component's time), your new total time is 30 + 30 + 15 = 75. A 20% improvement (1/90 ≈ 0.0111 frames per second to 1/75 ≈ 0.0133 frames per second = a 20% increase).
Now doubling up the 2nd component gives a total time of 30 + 15 + 15 = 60. A 25% improvement over 75.
And doubling up the 3rd component gives a total time of 15 + 15 + 15 = 45. A 33% improvement over 60.
So doubling all of them gives you improvements of 1.2 * 1.25 * 1.333... = 2.

You can do your own examples Seero, where the time for each component is different, and the final result will be 200% max.
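If it helps, here is that 90-second example as a short Python sketch, so the compounding is explicit:

```python
# The 90-second example, step by step.
times = [30.0, 30.0, 30.0]        # seconds spent in each of the 3 components
total = sum(times)                # 90 s
gains = []
for i in range(len(times)):
    times[i] /= 2                 # double that component's performance
    new_total = sum(times)
    gains.append(total / new_total)
    total = new_total

print(gains)                             # [1.2, 1.25, 1.333...]
print(gains[0] * gains[1] * gains[2])    # 2.0 - the ceiling, however you split the time
```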
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
No, the maths doesn't work like that (ie, you have the wrong equation). If each individual factor were to have an equal effect on total performance, doubling the performance of each factor does not increase overall performance by a third each time. It would be a fifth for the first factor, a quarter for the second, and a third for the last. The reason the overall performance boost goes up is that the same performance increase has a greater overall effect once the total time has already been reduced - better explained with an example:
If something takes 90 seconds, 30 seconds for each component, and you double up 1 component (say 2 GPUs, halving that component's time), your new total time is 30 + 30 + 15 = 75. A 20% improvement (1/90 ≈ 0.0111 frames per second to 1/75 ≈ 0.0133 frames per second = a 20% increase).
Now doubling up the 2nd component gives a total time of 30 + 15 + 15 = 60. A 25% improvement over 75.
And doubling up the 3rd component gives a total time of 15 + 15 + 15 = 45. A 33% improvement over 60.
So doubling all of them gives you improvements of 1.2 * 1.25 * 1.333... = 2.

You can do your own examples Seero, where the time for each component is different, and the final result will be 200% max.
OK, so basically you are adjusting the numbers to keep everything under 2x performance, which is what most of the posters are trying to say.

Now, it is hard to find a time vs performance chart for CF/SLI vs a single card out there, you just need to look for it. I found one, 5770 vs 5770 CF, from HardOCP.

There are several pages of data. This page is particularly interesting, as on the second graph, at time 166, you see a single 4770 running at 30 FPS where 2x 4770 run at 100 FPS. There are many graphs where > 2x performance occurs at some time t. What is your explanation for that?

Clearly, I didn't make those graphs, and I don't think the purpose of those graphs has any relationship to this debate, so they can't be biased. Also note that the 4xxx series doesn't scale well compared to newer generations. If you can find more graphs like this that show time vs performance for SLI/CF, please share.
 

Rebel_L

Senior member
Nov 9, 2009
453
63
91
I will preface this by saying that I certainly don't have a good understanding of the inner workings of a GPU, but I can suggest how you could end up with a greater than 100% improvement from 2 cards vs 1.

Basically it boils down to the fact that in some situations having 2 cards can require you to do less work than 1.

Resources from a single card have to be used to decide what work has to be sent where, and the outputs received back from various places have to be combined again into a finished product. Adding a second card into the mix doesn't change this bit of work, so the second card has these resources available to do other work instead. If, for example, 90% of a card's resources were used for management, then the second card could add 100% of its resources to the 10% of the first card doing non-management work and give you 10x performance. (Obviously card management doesn't consume 90% of a card's resources, and resources suitable for doing one kind of work are not necessarily suitable for doing another kind, but it's still a way in which 2 cards working together have to do less work than 2 single cards doing the same thing.)

Another example where you can do less work with 2 cards than with 1 is with complex calculations. If a calculation takes multiple cycles to complete, the partial result from a cycle that doesn't finish the calculation has to be saved for the next cycle. This consumes resources, so being able to finish a calculation in fewer cycles means less carry-over of work and, in total, less work. Say, for instance, a card can do 100 units of work per cycle, and the resources used to store the results of a previous cycle for the next one to work on use up 1 unit of work. If we have a calculation that requires 200 units of work, the first card will do 99 of the 200 units in the first cycle and spend 1 unit storing the results for cycle 2, which will do another 99 and spend 1 storing for cycle 3, which does the remaining 2 units of work and outputs the result. If you had 2 cards you could do all 200 units of work in one cycle, going from 3 cycles for one card to 1 cycle for two. If the result is required for another calculation to start, this would be a huge boost, as the rest of cycle 3 for the single card would have to be wasted, but even otherwise the single card had to do 202 units of work compared to the 200 done by the two cards.
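Here is that carry-over example as a small Python sketch (same made-up numbers: 100 units of capacity per cycle, 1 unit to store a partial result):

```python
# A card does `capacity` units per cycle; saving a partial result for the next
# cycle costs `carry_cost` units out of that budget.

def cycles_needed(total_work, cards, capacity=100, carry_cost=1):
    per_cycle = capacity * cards
    remaining = total_work
    cycles = 0
    while remaining > 0:
        cycles += 1
        if remaining > per_cycle:
            # Not finished this cycle: reserve carry_cost units per card to store the partial result.
            remaining -= per_cycle - carry_cost * cards
        else:
            remaining = 0
    return cycles

print(cycles_needed(200, cards=1))   # 3 cycles (99 + 99 + 2), i.e. 202 units of work in total
print(cycles_needed(200, cards=2))   # 1 cycle, exactly 200 units of work
```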

Now, I would hope that cards are designed to minimize these types of inefficient overheads, such that the work required for the overhead is such a small % of the card's overall work in a cycle that saving on it would lead to performance increases of an unnoticeable size.
 
Last edited:

taltamir

Lifer
Mar 21, 2004
13,576
6
76
@Seero: Let's just agree to disagree on that point; I've had enough of this argument and it's not going anywhere.

Resources from a single card have to be used to decide what work has to be sent where, and the outputs received back from various places have to be combined again into a finished product. Adding a second card into the mix doesn't change this bit of work, so the second card has these resources available to do other work instead.
Cards work in AFR; that is, each card takes a turn to render an alternate frame. Thus this does not occur.

But it wouldn't work regardless. Let's call those things you just described step A and step B.
Step A can only be done by one card, step B can be done by both cards... Your logic is, card 1 is spending 20% of its time doing step A, then 80% doing step B... Card 2 is thus able to spend 100% of its time doing step B...
There are two major issues with this:
1. The way percentages work, if it happened this way and step A was a constant, then you wouldn't get a theoretical max of greater than a 100% increase, as I have pointed out before.
2. The way GPUs work, card 2 will actually be spending 20% of its time idle and doing nothing (while card 1 works on step A) and then spend 80% of its time (the same as card 1) working on step B. Video rendering is a very structured process and you can't just skip steps; if a step is unfinished you have to wait.

It was a neat idea though; for about a minute I was thinking it was a good explanation too, before figuring out why it wouldn't work.
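For what it's worth, point 1 is easy to check with a short Python sketch (the 20%/80% split is still just my made-up figure): step A stays serial on one card while step B splits perfectly across both.

```python
def dual_card_speedup(serial_frac):
    single = 1.0                                  # normalised single-card frame time
    dual = serial_frac + (1 - serial_frac) / 2    # step A unchanged, step B halved
    return single / dual

print(dual_card_speedup(0.20))   # ~1.67x - a 67% gain, nowhere near a >100% increase
print(dual_card_speedup(0.0))    # 2.0x, and only when there is no serial step at all
```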
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
@Seero: Let's just agree to disagree on that point; I've had enough of this argument and it's not going anywhere.
Did someone force you to read or post?


Cards work in AFR; that is, each card takes a turn to render an alternate frame. Thus this does not occur.

But it wouldn't work regardless. Let's call those things you just described step A and step B.
Step A can only be done by one card, step B can be done by both cards... Your logic is, card 1 is spending 20% of its time doing step A, then 80% doing step B... Card 2 is thus able to spend 100% of its time doing step B...
There are two major issues with this:
1. The way percentages work, if it happened this way and step A was a constant, then you wouldn't get a theoretical max of greater than a 100% increase, as I have pointed out before.
2. The way GPUs work, card 2 will actually be spending 20% of its time idle and doing nothing (while card 1 works on step A) and then spend 80% of its time (the same as card 1) working on step B. Video rendering is a very structured process and you can't just skip steps; if a step is unfinished you have to wait.

It was a neat idea though; for about a minute I was thinking it was a good explanation too, before figuring out why it wouldn't work.
You actually bring up something rather interesting. If each card spends 20% of its time doing nothing and 80% working, then the total performance scaling should be no more than 160% in the ideal case. Aren't you even further away from the results we see in benchmarks?

So 1 card working at 100% gives you X fps, and 2 cards working at 80% each give you 2X fps. Isn't it obvious that, in theory, both cards can work at 100%? If so, doesn't 2 cards working at 100% give 2.5x performance?
 

Triggaaar

Member
Sep 9, 2010
138
0
71
This page is particularly interesting, as on the second graph, at time 166, you see a single 4770 running at 30 FPS where 2x 4770 run at 100 FPS. There are many graphs where > 2x performance occurs at some time t. What is your explanation for that?
You can't take one split second in time from two different runs to compare a single card vs crossfire. Overall, the crossfire setup does not perform twice as well as a single card.
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
You can't take one split second in time from two different runs to compare a single card vs crossfire. Overall, the crossfire setup does not perform twice as well as a single card.
If it is impossible, then it is impossible. I only need 1 example to show you that it is indeed possible to have > 2x scaling to contradict your theory of "it is impossible." In fact, this is the OP's question: why is it possible?

You may say that I am taking 1 ms of data out of years and that it is just an error, but if that data does not fall within the margin of error, then it is indeed not an error. We are now seeing cases where taking the average FPS over the span of minutes (aka benchmarking) shows that the average FPS does scale more than 200%, which is not taking one split second, and these results are reproduced by different people, different hardware setups, and different software. If I can find the time vs performance graphs for those, you will see not only that it exists, but > 2x scaling throughout the graph, because there must be some readings where scaling is < 2x, so to average 200% there must be plenty above it, let alone to average more than 200%.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,695
387
126
If you double your resources, the max increase you can have is 2x.

A single point on the chart isn't relevant, because that point can be an anomaly, which in a PC ecosystem can quite easily happen.

That point can be explained by the PC ecosystem having some other bottleneck at that precise second.

The above is easy to understand - sometimes a graphics card won't reach its potential on a slower CPU compared to a faster one.


It is possible to reach twice the performance without even doubling the total resources - if, for example, a card simply hasn't enough bandwidth, it is useless to add more shaders and it is much better to add bandwidth.

But in this case we are talking about doubling the exact same resources, not resource tweaking.

Why didn't you include the caches in your function? They also double. Why did you agglomerate all those things like shaders, TMUs, caches, etc. into a single unit, instead of splitting them up?

The fact is you could agglomerate the memory, the processing power and the I/O too, and call it a graphics card + monitor.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
If it is impossible, then it is impossible. I only need 1 example to show you that it is indeed possible to have > 2x scaling to contradict your theory of "it is impossible." In fact, this is the OP's question: why is it possible?
Ok, now you've just got to prove that the situation in that particular benchmark was exactly the same both times, i.e. the SLI/CF configuration rendered exactly the frames that the single GPU rendered (technically impossible), while taking the OS scheduling, the caches and every other small detail into consideration.

Or if you want a simpler exercise, just invent a perpetual motion machine.
 

blanketyblank

Golden Member
Jan 23, 2007
1,149
0
0
Not saying this is the reason, but it would make sense if you think of it like the relationship between RAM and the hard drive when managing a pagefile. If the computer has enough RAM, more stuff can be loaded into much faster memory, so the hard drive is never used.
If this were to apply to multi-GPU, it would mean a single card will be much worse than CFed cards if it has to "page" or do something similar and the multi-GPU setup doesn't.

Maybe a decent analogy is a half truck and a full truck moving stuff. If you're moving more than a half-truck load, the half truck not only has to make more trips (potentially twice as many), but each time it loads and unloads it also adds a little extra time compared to the full truck.
 
Last edited:

Triggaaar

Member
Sep 9, 2010
138
0
71
If it is impossible, then it is impossible. I only need 1 example to show you that it is indeed possible to have > 2x scaling to contradict your theory of "it is impossible."
Yes, I agree, but picking a spike on one graph and comparing it against a graph from a different setup run at a different time is not an example of >2x scaling.

You may say that I am taking 1 ms of data out of years and that it is just an error, but if that data does not fall within the margin of error, then it is indeed not an error.
Those are two different things. Margin of error is one thing; a spike on one graph compared against a different graph from a run at a different time is another. The cards could be doing something completely different at that fraction of a second.

We are now seeing cases where taking the average FPS over the span of minutes (aka benchmarking) shows that the average FPS does scale more than 200%, which is not taking one split second, and these results are reproduced by different people, different hardware setups, and different software.
Yes, these are the results we want to look at more closely, to identify whether they really are scaling over 100%, and then consider what could cause it.

There will be times with a PC where you can get a large performance improvement relative to the increase in hardware - for example, increasing memory - if you have 1GB of memory and your machine is having to do a lot of reading from/writing to a page file, you could find that performance more than doubles by adding another GB of memory. There may be similar situations with graphics cards - maybe someone with a good understanding of how they work can tell us - but I do know that your method for calculating the theoretical limit is incorrect.
EDIT - seems Terry Wogan above me had already come up with the same example.
 
Last edited:

Triggaaar

Member
Sep 9, 2010
138
0
71
OK, so basically you are adjusting the numbers to keep everything under 2x performance, which is what most of the posters are trying to say.
Don't be rude, Seero. I've spent some time explaining to you the flaw in your equation. I haven't tried to keep the number under 2; I've told you how it is worked out (ie, why it's not simply 133% x 133% x 133%). If 2 cards are scaling above 100% (allowing for errors), that's because of efficiency savings (like in the paging example above), not something demonstrated by your equations.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Not saying this is the reason, but it would make sense if you think of it like the relationship between RAM and the hard drive when managing a pagefile. If the computer has enough RAM, more stuff can be loaded into much faster memory, so the hard drive is never used.
If this were to apply to multi-GPU, it would mean a single card will be much worse than CFed cards if it has to "page" or do something similar and the multi-GPU setup doesn't.
Good idea, but alas both GPUs will hold the same data in memory, i.e. using SLI/CF won't double your available memory.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,002
126
It is possible to get more than 100% scaling, but it’s not done in the manner that is being presented here.

It’s done by making an AFR system render ahead further than a single GPU system does. That way the framerate is higher during GPU bound situations, at the cost of input latency.
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
It is possible to get more than 100% scaling, but it’s not done in the manner that is being presented here.

It’s done by making an AFR system render ahead further than a single GPU system does. That way the framerate is higher during GPU bound situations, at the cost of input latency.
I thought Alternate Frame Rendering means, basically, allowing rendering to start on the second video card before the first card has finished its frame. That is one of the implementations that makes scaling possible. However, this implementation requires a small delay between frames, and this delay was already covered by taltamir. His number for that delay was 20%; that may be old, but the theory still holds because some such delay is required for AFR to work. Each card can only run at 80% efficiency, and therefore 160% in total, where perfect scaling requires 200%. Now even if the delay became so small that SLI appeared to have 199.99% efficiency, it still does not explain how it can go beyond 200% scaling. However, benchmarks show more than 200% under AFR, and I don't think you have covered the reason for this. You stated the reason isn't the one presented here; can you share what that reason is, then?

BTW, you mentioned input delay, but I don't think input delay has anything to do with this context, as, in theory, whatever input occurs at time t will show up in the frame generated right after time t on a single card, whereas with 2 cards it may show up in the frame after next, since processing of the next frame may have started before time t. This input has no impact on how frames are generated or how scaling works. Am I incorrect?
 
Last edited:

BFG10K

Lifer
Aug 14, 2000
22,709
3,002
126
You stated the reason isn't the one presented here; can you share what that reason is, then?
Compared to a single card system, AFR always operates with [n – 1] frame latency, where [n] is the number of GPUs in the system. This is because the ideal case of scaling is achieved when one frame is being displayed while all of the other GPUs are working on new frames, frames that will be displayed in the future. If the system didn’t render ahead then AFR wouldn’t gain any performance over a single card, because all you’d be doing is alternating between multiple GPUs at single card scheduling intervals.

The input delay occurs because the game is sampling the input rate independent of the framerate, which means a keyboard input at a given time will only be shown when the correct frame is displayed. But that frame is actually queued to be rendered in the future due to pre-rendering, so it can only be displayed when all older frames have been displayed first.

The explanation as to why you can get over 100% efficiency comes from that pre-render value. A GPU can only render one frame at a time, and with no pre-rendering the CPU must wait before constructing a new frame, which reduces the framerate even more in these situations. But pre-rendering lets the CPU keep preparing frames even when the GPU is busy, and the results are stored in an offline buffer. That way when the GPU is ready for a new frame, it can start working immediately without having to wait for the CPU to construct it.

If the pre-render value is higher for a multi-GPU system than it is for a single GPU system, it’s possible to achieve greater than 100% scaling in ideal cases, because a higher value makes it more likely the CPU can continue working on frames, while it might’ve stopped on a single card with a lower value. Of course since you’re rendering frames even further into the future relative to displaying them, input lag increases even further.
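To illustrate the mechanism, here is a toy Python simulation of the idea - this is only my model of what's described above, not actual driver code, and the 4 ms / 10 ms frame costs and queue depths are invented numbers. A CPU prepares frames, it may only run a limited number of frames ahead, and the GPUs take frames in AFR order:

```python
def average_fps(num_gpus, prerender_limit, cpu_ms=4.0, gpu_ms=10.0, n_frames=1000):
    gpu_free = [0.0] * num_gpus      # when each GPU finishes its current frame
    gpu_done = []                    # completion time of every rendered frame
    cpu_free = 0.0
    for i in range(n_frames):
        # Pre-render limit: before preparing frame i, the CPU must wait until
        # frame i - prerender_limit has actually been rendered.
        if i >= prerender_limit:
            cpu_free = max(cpu_free, gpu_done[i - prerender_limit])
        cpu_free += cpu_ms
        # AFR hands frames to the GPUs round-robin; rendering starts once the
        # GPU is free and the CPU has finished preparing the frame.
        g = i % num_gpus
        start = max(cpu_free, gpu_free[g])
        gpu_free[g] = start + gpu_ms
        gpu_done.append(gpu_free[g])
    return 1000.0 * n_frames / gpu_done[-1]

single = average_fps(num_gpus=1, prerender_limit=1)   # shallow queue: CPU and GPU take turns
dual = average_fps(num_gpus=2, prerender_limit=3)     # deeper queue keeps the CPU busy
print(single, dual, dual / single)                    # ~71 fps, ~200 fps, ~2.8x
```

With these made-up numbers the single card keeps stalling between frames waiting on the CPU, while the dual-card setup with the deeper queue stays GPU-bound, so the ratio comes out around 2.8x even though there is only twice the GPU power - the extra comes purely from rendering further ahead.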

Other factors can also contribute to greater than 100% scaling such as benchmarking noise, and also driver bugs that hold back a single GPU but don’t manifest themselves in multi-GPU situations for whatever reason.
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
The explanation as to why you can get over 100% efficiency comes from that pre-render value. A GPU can only render one frame at a time, and with no pre-rendering the CPU must wait before constructing a new frame, which reduces the framerate even more in these situations. But pre-rendering lets the CPU keep preparing frames even when the GPU is busy, and the results are stored in an offline buffer. That way when the GPU is ready for a new frame, it can start working immediately without having to wait for the CPU to construct it.
If I understand you correctly, you are saying that although the actual time required to render a frame does not decrease regardless of the number of cards present, with N cards it is possible, and practical in some specific scenarios, to render frames ahead of time. Assume it takes X ms to generate a frame; ideally, N cards can generate N frames in X ms.

Also, since there is more than one card, it is possible for the CPU to prepare a new frame even though one card is busy, because another card may actually be ready.

However, pre-rendering is not exclusive to SLI/CF, and it has a default value of 3. Now, suppose there are 2 cards and the delay between alternate frames is 0; then 2 cards can generate 2x the number of frames a single card can produce. But in that case the video cards themselves are the bottleneck and the CPU must wait. That wait is, at best, halved, letting the CPU prepare 2x the frames for the video cards to render. That means, in the end, it is at best 2x performance (FPS).

If the pre-render value is higher for a multi-GPU system than it is for a single GPU system, it’s possible to achieve greater than 100% scaling in ideal cases, because a higher value makes it more likely the CPU can continue working on frames, while it might’ve stopped on a single card with a lower value. Of course since you’re rendering frames even further into the future relative to displaying them, input lag increases even further.
Well, that is cheating. If the user doesn't go into the driver and change the settings, the pre-render value remains the same. Unless a new bottleneck arises with the introduction of SLI/CF, the number of actually pre-rendered frames should be smaller than, or equal to, that of a single card.

Consider this: suppose the CPU is a bottleneck, meaning the video cards must wait for the CPU to prepare a frame; then in the worst case there is no scaling. Suppose the CPU is not a bottleneck; then in the best case, scaling is 200%, not greater than 200%.

Other factors can also contribute to greater than 100% scaling such as benchmarking noise, and also driver bugs that hold back a single GPU but don’t manifest themselves in multi-GPU situations for whatever reason.
Of all the reasoning, I must admit that this is the most plausible, but I highly doubt it. AMD and Nvidia don't share their code, and the way they handle various tasks is different, yet both SLI and CF show > 200% performance.
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
While I'll stay out of the main debate because a certain someone is too stubborn for it to matter, I would like to point out that there can be times where, with certain things, scaling can exceed linear boundaries depending on what exactly you are measuring.

This does not apply to the GPU directly when using AFR. It also doesn't really apply even with SFR, due to the nature of rendering. I am not sure if this can apply to the video card driver somehow, because I just don't know enough about how the driver handles things.

Greater than linear scaling can occur quite often on the CPU when there are large amounts of overhead in the code. When the code requires X things to be done regardless of framerate and the rest is optional, done with resources to spare, you can often see massive boosts in performance from small changes. Note that the amount of processing done doesn't have to increase more than linearly (although it can, if the optional code uses the core resources better), but the framerate can see impressive changes.

The best example I can provide is Starcraft II. For those of you that play custom games, I'm sure you are aware of just how resource intensive some of those games can be. When you are computing at ~0.5 fps, an overclock can bring orders of magnitude better performance. The actual amount of work doesn't change that much, but since the "optional" frames don't take nearly as much processing as running the required engine calculations, a 50% overclock on an i7-920 can bring a 100-fold improvement in framerate.
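A back-of-the-envelope Python model of that effect - all numbers are invented, nothing here is measured from SC2. Treat the engine as eating a fixed work budget every second no matter what, with only the leftover capacity turning into frames:

```python
def fps(cpu_capacity, engine_cost=99.0, frame_cost=1.0):
    """Capacities and costs in arbitrary 'work units'; frame_cost is per frame."""
    spare = max(cpu_capacity - engine_cost, 0.0)
    return spare / frame_cost

print(fps(99.5))         # 0.5 fps: almost every unit goes to the required engine work
print(fps(99.5 * 1.5))   # ~50 fps: a 50% overclock buys roughly 100x the framerate
```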
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
I thought Alternate Frame Rendering means, basically, allowing rendering to start on the second video card before the first card has finished its frame. That is one of the implementations that makes scaling possible. However, this implementation requires a small delay between frames, and this delay was already covered by taltamir. His number for that delay was 20%; that may be old, but the theory still holds because some such delay is required for AFR to work.

1. I made up the 20% figure to flesh out the example; I thought I made that clear.
2. The 20% figure was NOT referring to AFR; it was referring to a hypothetical SFR situation as described by Rebel_L. I thought I made that clear too.

Sorry about any confusion here.
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
1. I made up the 20% figure to flesh out the example; I thought I made that clear.
2. The 20% figure was NOT referring to AFR; it was referring to a hypothetical SFR situation as described by Rebel_L. I thought I made that clear too.

Sorry about any confusion here.
Be it alternate frame rendering or split frame rendering, it can't be the reason for over 200% scaling, because those are only methods of splitting the load, which is what makes scaling possible in the first place. The ideal case is that the load is split in half perfectly with no other overheads, so that both cards are fully utilized. In short, if 2 frames need X ms to generate, then 2 cards can, in theory, generate those 2 frames in half the time, X/2 ms. That is the best possible case, where the load (in terms of data to be transferred, data to be stored, and data to be computed) is indeed perfectly split. That is all fine, as we are talking about the maximum possible scaling - or are we? Even supposing the load on the video cards is split perfectly throughout the benchmark, it still cannot reach 200% scaling in that best theoretical case because, as Rebel_L mentioned, there must be some form of management mechanism to control SLI/CF, and that is an overhead that does not exist with a single card, making 200% scaling impossible. Neither SFR nor AFR changes this, as the load is split, not reduced. Unless there exists an implementation where processing power increases performance exponentially, the implementation cannot be the cause of this phenomenon.
 
Last edited:

Seero

Golden Member
Nov 4, 2009
1,456
0
0
The best example I can provide is Starcraft II. For those of you that play custom games, I'm sure you are aware of just how resource intensive some of those games can be. When you are computing at ~0.5 fps, an overclock can bring orders of magnitude better performance. The actual amount of work doesn't change that much, but since the "optional" frames don't take nearly as much processing as running the required engine calculations, a 50% overclock on an i7-920 can bring a 100-fold improvement in framerate.
Have you ever tried to catch a bus and been 10 seconds too late, and ended up waiting 15-30 minutes for the next one?

Now picture this: the CPU passes data through a bus to the video card, and the video card only starts its task when the data arrives. Suppose the CPU needs 5.11 ms to generate the data, a bus leaves every 5.1 ms, and the video card needs 5 ms to render it. In this case, if the CPU can work just a little faster - a little under 5.1 ms - then it can catch every single bus. It doesn't matter whether that comes from more CPU power, a faster process, or simply a lighter load: if it brings the computation time down to under 5.1 ms, the system is capable of delivering 2x the FPS.
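Put into a few lines of Python - reading the example as a pipeline where the 5 ms of rendering overlaps the next frame's CPU work, so the bus departures set the frame rate (that overlap is my assumption, not something stated above):

```python
import math

def fps(cpu_ms, bus_interval_ms=5.1):
    # The CPU catches a bus once every ceil(cpu_ms / bus_interval) departures,
    # so that becomes the interval between frames reaching the card.
    buses_per_frame = math.ceil(cpu_ms / bus_interval_ms)
    return 1000.0 / (buses_per_frame * bus_interval_ms)

print(fps(5.11))   # ~98 fps: just misses every bus, a frame ships every 10.2 ms
print(fps(5.09))   # ~196 fps: catches every bus, a frame ships every 5.1 ms
```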

In practice, buses are never on schedule, as everyone wants to talk to the CPU and everyone is waiting for the CPU to reply, and for the CPU to be able to reply it must have all the replies from the others. It may only take 1 ms for the CPU to process the data, but it can take forever to get to its destination.

If something does indeed get processed faster, then what you said is applicable. Adding one more card, however, does not make anything faster. In fact, more traffic is generated by having more units, and extra management (i.e. synchronization) is needed, making perfect scaling impossible - and yet the results show otherwise.

To me, perfect scaling is a myth. You can never cut the time in half by doubling cores, because to make that work there are always new overheads, in other words more load. Scaling HDDs/SSDs is a good example: 2x HDD gives under 170% scaling in RAID 0. It never comes even close to 200% because of overheads. SSDs, on the other hand, scale really well, but can they exceed 200% scaling? No, though they get really close to 200%. That means those little controllers are really good at their job, as if they take no time at all.

Now a video card is much more complicated than an SSD (it has more parts) and does much more complicated tasks. Not only do I not believe that tasks can be divided perfectly all the time, I don't believe they actually scale perfectly either. However, that doesn't mean I don't believe in over 200% performance. If there exists another factor that also scales up performance independently, then it is possible to have > 200% performance while each individual factor is nowhere close to perfect scaling. The factors I chose may be wrong, but the likelihood that such factors exist is very high. Think about it.
 
Last edited:

Voo

Golden Member
Feb 27, 2009
1,684
0
76
However, that doesn't mean I don't believe in over 200% performance. If there exists another factor that also scales up performance independently, then it is possible to have > 200% performance while each individual factor is nowhere close to perfect scaling. The factors I chose may be wrong, but the likelihood that such factors exist is very high. Think about it.
Well, now that's a reasonable post. If you don't just double the resources but also shift some independent variables, you can get a larger performance boost (e.g. the adding more memory/cache example).
Though with GPUs I don't see where something similar could be expected - BFG makes an interesting point, but then that's more of a driver thing which should be applicable to single GPUs as well (maybe I misunderstand it), but if it's not implemented there, or has a different default value, we've got another factor.


The bus analogy doesn't really hold up though (but yeah I have, thanks for reminding me), since it's more like getting to your car 10 seconds later... you'll just drive away 10 seconds later and that's it. Since PCIe isn't a bus (and is bidirectional), you don't even get any interference from any other device.
 

Seero

Golden Member
Nov 4, 2009
1,456
0
0
Well, now that's a reasonable post. If you don't just double the resources but also shift some independent variables, you can get a larger performance boost (e.g. the adding more memory/cache example).
Which part of the reasoning in my posts changed?
Though with GPUs I don't see where something similar could be expected - BFG makes an interesting point, but then that's more of a driver thing which should be applicable to single GPUs as well (maybe I misunderstand it), but if it's not implemented there, or has a different default value, we've got another factor.
He is saying that it is caused by a bug which occurs only on single-GPU setups.
The bus analogy doesn't really hold up though (but yeah I have, thanks for reminding me), since it's more like getting to your car 10 seconds later... you'll just drive away 10 seconds later and that's it. Since PCIe isn't a bus (and is bidirectional), you don't even get any interference from any other device.
PCIe may not be a bus, but the connection between it and the north bridge is.

Edit: sometimes people refer to PCIe as a bus too, but you get the idea.
 
Last edited: