I actually think it's a good idea to have a discussion about stability and how to test for it, because on one hand I want a stable system, but on the other hand an all-out torture test might be more than what is needed for stability testing.
In normal use I've never been close to anything that resembles a LinX/Prime95 AVX torture test, so how do you do varied load testing that resembles "normal" full loads and still stresses all of the CPU, without resorting to those extreme tests?
Here's the deal with torture tests and stability - instability is a time-dependent phenomenon, and torture testing is one means of performing an accelerated stability test by reducing the average time-to-failure.
So what does that mean?
It means that your rig, just for example, might on average (statistically) be prone to a "destabilizing event" once every 3 hours, every 3 days, 3 weeks, 3 months, or once every 3 years, etc.
But you wouldn't know in advance whether it was 3 days or 3 years unless you generate stability data that speaks to the statistics of your particular rig's time-to-failure distribution.
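Just to make that concrete, here's a minimal sketch (in Python, with invented numbers) of what turning real-world crash data into a time-to-failure estimate could look like:

```python
# Made-up example: uptimes (in hours) logged between random resets/crashes.
# These numbers are invented purely to show the arithmetic.
uptimes_between_crashes = [52, 310, 7, 128, 201]

mean_ttf = sum(uptimes_between_crashes) / len(uptimes_between_crashes)
print(f"Estimated mean time-to-failure: ~{mean_ttf:.0f} h")
# With only a handful of events the estimate is very rough - which is exactly
# why waiting around for real-world failures is such a slow way to measure stability.
```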
This is where having the ability to "accelerate" the time-to-failure comes in handy. By heating up your CPU to temperatures hotter than it will typically experience, you shift the entire distribution of time-to-failure statistics to the left (to shorter times), but in a very predictable and straightforward (to an engineer, at least) way.
And thus you can generate your data in a time-compressed manner, rather than literally waiting 3 days, or 3 months, to see for yourself how frequently or infrequently your computer seems to randomly crap out.
Torture testing is one means of compressing the time-to-failure. Making your rig survive longer without becoming unstable during a torture test will have a direct causative impact on extending the time-to-failure for the same computer when it is operating at lower temperatures under less strenuous workloads.
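For anyone who wants a rough feel for how that "predictable shift" can be modeled, here's a minimal sketch using the standard Arrhenius acceleration-factor model from reliability engineering. The activation energy and the temperatures are assumptions I picked purely for illustration, not numbers for any particular chip:

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(t_use_c, t_stress_c, ea_ev=0.7):
    """Arrhenius estimate of how much faster failures accrue at the stress
    temperature than at the everyday-use temperature (temperatures in Celsius)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Assumed example: everyday load keeps the CPU near 60 C, a torture test pushes it to 90 C.
af = acceleration_factor(60, 90)
print(f"Acceleration factor: ~{af:.1f}x")
print(f"24 h of torture testing ~ {af * 24:.0f} h of everyday use (under these assumptions)")
```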
There is no such thing as a stable computer; not even a stock computer straight from the manufacturer/seller is stable in the sense that consumers think of the term.
Such stock computers have a statistically known timeline for their expected time-to-failure, and part of what goes into defining the stock operating parameters (voltage, max temperature, clockspeed) is the reliability the manufacturer intends the customer to experience (how frequently can the computer reset or have an instability issue before the customer finds the rate of failure unacceptable?).
By running various torture tests, and passing them for arbitrarily set minimum lengths of time (1 hr, 1 day, 1 wk) or cycles (10, 100, 1000), we are increasing the likelihood that the computer will have an even longer time-to-failure profile when operating at reduced temperatures with less strenuous workloads.
A computer that can pass 1 day of a given torture test is assured to be more stable (higher time-to-failure) than one that can pass only 1 hour of the same torture test, when those two computers are otherwise being used to handle everyday computing tasks.
We just can't say a priori whether that means the computer that passed 1 day of torture testing will now live 3 months on average before experiencing a random instability whereas the less stable rig might only survive 2 weeks, etc.
We can't speak to how much more time to failure we have bought ourselves by pursuing the longer time to failure in the accelerated testing environment, but we are guaranteed (as in physics and engineering) that we most certainly did buy ourselves more time.
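If it helps, here's a toy numerical illustration of that last point (my own sketch, with made-up numbers): whatever the true acceleration factor turns out to be, the rig that lasted longer under torture keeps its advantage in everyday use, but the absolute amount of time you've bought depends on a factor we don't know.

```python
# Toy model: assume everyday MTTF scales with torture-test MTTF by some unknown
# acceleration factor. All numbers are made up for illustration.
def everyday_mttf(torture_mttf_hours, acceleration_factor):
    """Mean time between destabilizing events in everyday use."""
    return torture_mttf_hours * acceleration_factor

rig_a = 24.0  # rig A survives ~1 day of torture testing on average
rig_b = 1.0   # rig B survives ~1 hour of the same test on average

for af in (5, 20, 100):  # we don't know which factor is the real one
    print(f"AF={af:>3}: rig A ~{everyday_mttf(rig_a, af):.0f} h, "
          f"rig B ~{everyday_mttf(rig_b, af):.0f} h between events")
```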