I actually think it's a good idea to have a discussion about stability and how to test for it, because on one hand I want a stable system, but on the other hand an all-out torture test might be more than what is needed for stability testing.
In normal use I've never been close to anything that resembles a LinX/Prime95 AVX torture test, so how do you do varied load testing that resembles "normal" full loads and still stresses all of the CPU, without resorting to those extreme tests?
Here's the deal with torture tests and stability - instability is a time-dependent phenomenon, and torture testing is one means of performing an accelerated stability test by reducing the average time-to-failure.
So what does that mean?
It means that your rig, just for example, might on average (statistically) be prone to a "destabilizing event" once every 3 hours, every 3 days, 3 weeks, 3 months, or once every 3 years, etc.
But you wouldn't know in advance whether it was 3 days or 3 years unless you generate stability data that speaks to the statistics of your particular rig's time-to-failure distribution.
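Just to make that concrete, here's a minimal sketch (in Python, with invented numbers) of what turning real-world crash data into a time-to-failure estimate could look like:

```python
# Made-up example: uptimes (in hours) logged between random resets/crashes.
# These numbers are invented purely to show the arithmetic.
uptimes_between_crashes = [52, 310, 7, 128, 201]

mean_ttf = sum(uptimes_between_crashes) / len(uptimes_between_crashes)
print(f"Estimated mean time-to-failure: ~{mean_ttf:.0f} h")
# With only a handful of events the estimate is very rough - which is exactly
# why waiting around for real-world failures is such a slow way to measure stability.
```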
This is where having the ability to "accelerate" the time-to-failure comes in handy. By heating up your CPU to temperatures hotter than it will typically experience, you shift the entire distribution of time-to-failure statistics to the left (to shorter times), but in a very predictable and straightforward (to an engineer, at least) way.
And thus you can generate your data in a time-compressed manner, rather than literally waiting 3 days, or 3 months, to see for yourself how frequently or infrequently your computer seems to randomly crap out.
Torture testing is one means of compressing the time-to-failure. Making your rig survive longer without becoming unstable during a torture test will have a direct causative impact on extending the time-to-failure for the same computer when it is operating at lower temperatures under less strenuous workloads.
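For anyone who wants a rough feel for how that "predictable shift" can be modeled, here's a minimal sketch using the standard Arrhenius acceleration-factor model from reliability engineering. The activation energy and the temperatures are assumptions I picked purely for illustration, not numbers for any particular chip:

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(t_use_c, t_stress_c, ea_ev=0.7):
    """Arrhenius estimate of how much faster failures accrue at the stress
    temperature than at the everyday-use temperature (temperatures in Celsius)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Assumed example: everyday load keeps the CPU near 60 C, a torture test pushes it to 90 C.
af = acceleration_factor(60, 90)
print(f"Acceleration factor: ~{af:.1f}x")
print(f"24 h of torture testing ~ {af * 24:.0f} h of everyday use (under these assumptions)")
```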
There is no such thing as a stable computer; not even a stock computer straight from the manufacturer/seller is stable in the sense that consumers think of the term.
Such stock computers have a statistically known timeline for their expected time-to-failure, and part of what goes into defining the stock operating parameters (voltage, max temperature, clockspeed) is the reliability the manufacturer intends the customer to experience (how frequently can the computer reset or have an instability issue before the customer finds the rate of failure unacceptable?).
By running various torture tests, and passing them for arbitrarily set minimum lengths of time (1 hr, 1 day, 1 wk) or cycles (10, 100, 1000), we are increasing the likelihood that the computer will have an even longer time-to-failure profile when operating at reduced temperatures with less strenuous workloads.
A computer that can pass 1 day of a given torture test is assured to be more stable (higher time-to-failure) than one that can pass only 1 hour of the same torture test, when those two computers are otherwise being used to handle everyday computing tasks.
We just can't say a priori whether that means the computer that passed 1 day of torture testing will now live 3 months on average before experiencing a random instability whereas the less stable rig might only survive 2 weeks, etc.
We can't speak to how much more time to failure we have bought ourselves by pursuing the longer time to failure in the accelerated testing environment, but we are guaranteed (as in physics and engineering) that we most certainly did buy ourselves more time.
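If it helps, here's a toy numerical illustration of that last point (my own sketch, with made-up numbers): whatever the true acceleration factor turns out to be, the rig that lasted longer under torture keeps its advantage in everyday use, but the absolute amount of time you've bought depends on a factor we don't know.

```python
# Toy model: assume everyday MTTF scales with torture-test MTTF by some unknown
# acceleration factor. All numbers are made up for illustration.
def everyday_mttf(torture_mttf_hours, acceleration_factor):
    """Mean time between destabilizing events in everyday use."""
    return torture_mttf_hours * acceleration_factor

rig_a = 24.0  # rig A survives ~1 day of torture testing on average
rig_b = 1.0   # rig B survives ~1 hour of the same test on average

for af in (5, 20, 100):  # we don't know which factor is the real one
    print(f"AF={af:>3}: rig A ~{everyday_mttf(rig_a, af):.0f} h, "
          f"rig B ~{everyday_mttf(rig_b, af):.0f} h between events")
```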