Windows 2000 Random Seizes (Involves: X5DAE, SRCU32, ATI, nvidia (nv log errors), and more...)
I hope I can get this all down right. For nearly 30 days straight I have been battling my new PC. It has been featuring random lockups as follows. I will be using it, often simply web browsing, and all the windows will freeze, but the mouse still moves. However, once the mouse is clicked a few times, the mouse seizes, the PC speaker beeps once, and W2K locks entirely and must be reset. There were no event logs for Windows, my MB, ECC, or RAID. No memory dumps either.
I cannot reproduce the error. It is utterly random. In the past, when at its very worst, I have seen it lock up twice during booting. (Boot-logging showed no errors.) It has locked up at least one time when no one was using it. Other than that, someone has always been using it, and 99.95% of the lockups have been during light, low-temperature work.
Here is the hardware:
Antec True 550 EPS12V (SSI) PSU
Supermicro X5DAE MB (Intel E7505)
1GB Infineon ECC Registered DDR (1st try)
1.5GB Kingston ECC Registered DDR (2nd try)
x2 Intel Xeon 2.4GHz
ATI 9700 (1st try)
nVIDIA TNT2 64 (2nd try)
Intel SRCU32 U160 RAID Controller
x4 Atlas 10k III HDDs
x1 WD800JB IDE HDD
x1 Samsung SM-332 DVD/CD-RW
Intel Intelligent Server NIC (1st try)
Intel 82545EM Gigabit NIC (2nd try)
Turtle Beach Santa Cruz Sound
Windows 2000 SP3, IE6SP1, DX9
(W2K Pro SP3 is a clean, slipstreamed copy)
First things I did:
Checked temps:
CPU1: 31-39degC
CPU2: 31-38degC
System: 35-41degC
Checked voltages:
+12V: 11.92
+5V: 4.97
+3.3V: 3.26
3.3VSB: 3.34
VCCP: 1.46-1.47
-12V: 11.96
Checked voltages and load coming from my Belkin 1200VA UPS. No problems. (NOTE: Belkin software is not installed on this machine, I used a remote machine to check.)
Pulled the older Intel NIC and reinstalled Windows with gigabit onboard NIC. No change.
Tried w/ and w/o Intel DMA drivers (Intel Application Accelerator). No change.
Made CD-RW (has latest firmware) master & WD800JB slave. No change.
I pulled the Turtle Beach sound card (onboard sound also disabled) and reinstalled Windows. No change.
Flashed latest firmware on the SRCU32. No change. Flashed older firmware, little change: the controller no longer showed in the BIOS and the lockups were unaffected. (NOTE: I also tried two different drivers. The original 2k driver, and the newer XP/2k driver.)
Flashed X5DAE BIOS from 1.0a to 1.0b. No change.
Replaced RAM. No change.
Tried running HDDs on a single SCSI channel. No change.
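For anyone who wants to sanity-check the rail readings listed above, here is a rough Python sketch that computes how far each one sits from nominal. The tolerances are my assumption (the usual ATX-style figures of +/-5% on the positive rails and +/-10% on -12V), and the -12V reading is treated as a magnitude because that is how the monitor reported it. VCCP and 3.3VSB are left out since I am not sure what their nominal values should be on this board.

```python
# Rail readings from the hardware monitor, compared against nominal values.
# ASSUMPTION: tolerances are the usual ATX-style limits (+/-5% on positive
# rails, +/-10% on -12V); they are not taken from the board or PSU manual.
RAILS = {
    # name: (nominal volts, measured volts, allowed fraction of nominal)
    "+12V": (12.0, 11.92, 0.05),
    "+5V": (5.0, 4.97, 0.05),
    "+3.3V": (3.3, 3.26, 0.05),
    "-12V": (12.0, 11.96, 0.10),  # magnitude only, as reported by the monitor
}


def deviation_percent(nominal, measured):
    """Signed deviation of the measured value from nominal, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(nominal, measured, tolerance):
    """True if the measured value lies inside nominal +/- tolerance."""
    return abs(measured - nominal) <= nominal * tolerance


for name, (nominal, measured, tolerance) in RAILS.items():
    status = "ok" if within_tolerance(nominal, measured, tolerance) else "OUT OF RANGE"
    print(f"{name}: {deviation_percent(nominal, measured):+.2f}% ({status})")
```

Under those assumed limits every rail comes out well inside its band, which matches what I saw: nothing suspicious in the idle voltages. It says nothing about what the rails do under a sudden load, of course.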
Here is where it gets more interesting:
Pulled SRCU32, and loaded Windows on the IDE drive. No lockups for nearly four days(!), and I started looking for a new controller. Then it locked up on the morning of the fourth day while running from the IDE drive. (NOTE: I have tried the SRCU32 in all three 64-bit PCI slots.)
I had already tried adjusting cables, &c. But I totally disassembled the PC, down to the MB tray, and rehooked all devices. I took out and reseated the CPUs, swapping their sockets.
Pulled the ATI (catalyst 3.0 driver) and replaced it with this old TNT2 (Nvidia Driver 41.09) and reloaded Windows on the RAID array. Ran three days(!) without any lockups.
(NOTE: When both the SRCU32 and the 9700 were in the system, 5hrs was my record without it seizing.) So it's the video card? Not quite.
Last night I plugged my WD800JB back in. That is the only change from the previous few days... however, in the past it has locked whether or not the WD800JB was installed.
This morning it seized again, almost exactly like before, but this time it came back. (NOTE: It had never come back before, except perhaps one other time when someone else was using it; they said they got it back by quickly pressing Ctrl+Alt+Del, which is what I did this time. Unfortunately, I was not there to see that earlier occurrence.) At the exact time of the seizure it logged an error in the event log:
Source: nv
Event ID: 13
GR SW Notify Error on 0001 dd003f00 00000042 00000304 2f802f80 00000002
0000: 00 00 00 00 02 00 4e 00 ......N.
0008: 00 00 00 00 0d 00 aa c0 ......ªÀ
0010: 00 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
That happened at 10:33:51 MT. From 11:51:51 to 11:51:54 that error repeated one hundred and ten (110) times.
It has not happened since.
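In case it helps anyone read the data portion of that event, here is a small Python sketch that turns the hex dump above back into bytes and prints it as little-endian dwords. The parsing helper is my own and purely illustrative; I am not claiming to know what the nv driver means by each field.

```python
import struct

# Data portion of the nv Event ID 13 entry, copied from the event log.
DUMP = """\
0000: 00 00 00 00 02 00 4e 00 ......N.
0008: 00 00 00 00 0d 00 aa c0 ......ªÀ
0010: 00 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
"""


def parse_dump(text):
    """Collect the 8 hex bytes after each 'offset:' and ignore the ASCII column."""
    data = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue
        _, _, rest = line.partition(":")
        for token in rest.split()[:8]:
            data.append(int(token, 16))
    return bytes(data)


data = parse_dump(DUMP)
# Interpret the buffer as unsigned 32-bit little-endian values.
dwords = struct.unpack(f"<{len(data) // 4}I", data)

for offset, value in zip(range(0, len(data), 4), dwords):
    print(f"{offset:04x}: 0x{value:08x}")

# The dword at offset 0x0c is the only one that looks like an error code.
code = dwords[3]
print(f"low 16 bits of 0x{code:08x} = {code & 0xFFFF}")
```

The only thing I can say for certain from this is that the dword at offset 0x0c is 0xc0aa000d, and its low 16 bits are 13, the same number as the Event ID. Everything else in the buffer is zero apart from the 0x004e0002 at offset 4.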
Now remember, it was doing this on an ATI card first, though it left no traces. So either this nv error is an entirely separate problem (and I know others have seen it), or whatever caused the lockups with the ATI card is causing the nVidia card to throw fits as well.
Basically any time I have done anything I have cleanly reinstalled Windows (heh, nearly a hundred times now).
The only other oddity I experience is that when hitting the end of a form in Opera 6.05 with the cursor (moving it with arrow keys...) it beeps. And if you hold down the key you get a whole stack of beeps. Opera 6 does not do that on any other computer I have used. Opera 7 does not do it either. Perhaps a buffer overrun? When I told that to Supermicro tech support the guy cursed, but didn't tell me anything else.
To download my complete system info (per Computer Management's System Information):
216k TXT
http://www.redbudcomputer.com/sysinfo.txt
56k ZIP (TXT & MSInfo format)
http://www.redbudcomputer.com/sysinfo.zip