Interesting to know, what do you mean by "project horizontal scalability"?
Also, it may be that the bottleneck right now is the rate at which they can produce WUs - at least the LinusTech video mentioned that.
There are 2 types of performance scaling that we test - horizontal and vertical. Let's look at one example of each, based on processing 10 jobs that, in the current setup, each take 1 hour to complete. If it helps, you can think of them as 10x 1 hour WUs for Folding@home.
The first is the one most people are familiar with - vertical scalability. We take this set of 10x 1 hour jobs, and we beef up the computer running them - say we double the clock speed and go from a 2GHz CPU to a 4GHz CPU. Ideally, each job will now take 30 minutes, and the whole set 5 hours to complete. If someone suddenly sends us 20 work units and we have vertically scaled by 2x, we should be able to complete those 20 work units in the original 10 hours. The problem with vertical scaling is that computers only get so big - you might be able to go from a 2GHz CPU to a 4GHz CPU, but there are no 8GHz CPUs. Depending on the workload, you can run into other bottlenecks in a system as well: you can go from 256GB of RAM to 512GB, but 512GB to 1TB is a lot harder; or from 50TB of storage to 100TB, but maybe not to 200TB, and so on.
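To make the arithmetic concrete, here's a toy model of vertical scaling - job time shrinks as clock speed grows, but total time still depends on how many jobs one machine has to chew through sequentially. All numbers are illustrative, taken from the example above, not real benchmarks.

```python
def total_hours(num_jobs, hours_per_job_at_2ghz, clock_ghz):
    """Sequential processing on one server; per-job time scales inversely with clock speed."""
    hours_per_job = hours_per_job_at_2ghz * (2.0 / clock_ghz)
    return num_jobs * hours_per_job

print(total_hours(10, 1.0, 2.0))  # baseline: 10 jobs x 1 hour = 10.0 hours
print(total_hours(10, 1.0, 4.0))  # doubled clock: 5.0 hours
print(total_hours(20, 1.0, 4.0))  # doubled load AND doubled clock: back to 10.0 hours
```

The last line is the key point - doubling the hardware only buys you a doubling of load before you're right back where you started, and there's no 8GHz CPU to double again.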
The second is horizontal scalability, which is the ideal we strive for in cloud applications, because we typically have lots of identical servers to throw at a problem. Given our 2GHz server working through its 10 work units, if we add a second identical server, can we process 20 work units in the same time? Can we add 4 servers and process 40? Can we add 10 servers and do 100? In a well-designed cloud application, we should be able to add as many servers as necessary to sustain any load, and do it dynamically. If our current workload is 15 work units, we should have 2 servers. If 2 hours later we have 100 work units, we should be able to immediately add 8 more servers and process the load. If, 2 hours after that, our inbound work units drop to 6, we should be able to remove 9 servers without any issues.
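The autoscaling rule above boils down to a one-liner: divide the load by per-server capacity and round up. A minimal sketch, assuming the 10-WUs-per-server capacity implied by the example figures:

```python
import math

def servers_needed(work_units, capacity_per_server=10):
    """How many identical servers does the current load require? Never scale below 1."""
    return max(1, math.ceil(work_units / capacity_per_server))

for load in (15, 100, 6):
    print(load, "WUs ->", servers_needed(load), "servers")
# 15 -> 2 servers, 100 -> 10 servers, 6 -> 1 server
```

Real autoscalers (e.g. cloud provider auto scaling groups) layer cooldowns and headroom on top of this, but the core sizing decision is the same ceiling division.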
Now, despite working for a Fortune 100 company, they don't give me the resources to test at 20x and 50x load (200 and 500 work units), so what we do instead is measure the whole application while going from 10 work units to 100, and then try to deduce whether any part of the application will fail under additional load. Maybe our application depends on a database that we can't horizontally scale. If we use 5% of the DB's capacity at 10 work units and 50% at 100 work units, we can project that the current application will sustain about 200 work units before the DB prevents adding more. In a case like this, we know the DB will eventually be a problem, but since we are unlikely to get 20x more load overnight, we will have time to address the DB issue when needed.
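That projection is just a back-of-envelope linear extrapolation: fit a line through the two measured utilization points and solve for where it hits 100%. The 5%/50% figures come from the example, and linear scaling of DB utilization with load is an assumption - real databases often degrade non-linearly well before 100%.

```python
def projected_ceiling(load1, util1, load2, util2):
    """Work units at which a linearly-scaling resource reaches 100% utilization."""
    slope = (util2 - util1) / (load2 - load1)  # utilization fraction per work unit
    intercept = util1 - slope * load1          # utilization at zero load
    return (1.0 - intercept) / slope           # solve slope*x + intercept = 1.0

print(projected_ceiling(10, 0.05, 100, 0.50))  # -> 200.0 work units
```

In practice you'd want more than two data points and some safety margin, but even this crude estimate tells you which component to fix first and roughly how much runway you have.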