Repeated RAID5 Failures

JohnVM

Member
May 25, 2004
170
0
76
Hey Guys,

So, I bought six 500GB Samsung HD501LJ drives to run in a RAID array. I was planning on running RAID5. I bought a Gigabyte P35 DQ6 board and was planning on using the onboard RAID. I also bought an Antec P190 case. I did all of this at the end of July. I run Windows 2003 x64.

Since then I've been going through hell.

When I run RAID5 and start doing sustained writes, drives rapidly fail and the RAID array tries to rebuild. Just now, for instance, I was writing to the array (which hadn't finished initializing yet -- it takes about 10 hours to initialize, but I get the same results whether or not initialization has finished). I had imported about 20GB of data when the array failed and started rebuilding a drive. The drive that fails and needs to be rebuilt seems to change every time.

Running RAID0 seems to help this issue for a few reasons. #1, it seems to fail less, don't know why. #2, when it fails, the whole volume falls offline/writes don't continue, so if I just mark it as normal and/or reboot, the volume seems to operate just fine. For instance, I was able to import/use my RAID0 volume for the past 3 weeks or so, importing over 2.1TB of data in that time period, with no issues. But then it failed yesterday.

Attempting to figure out what was going on previously resulted in suspicions of heat issues. I have now added fans to all of the bays, and am continuing to have these problems. The drives are running cool.

What in the world could be causing this issue? The Antec P190 case has 1200W of power, so the drives should have enough power. I don't THINK it's a faulty PSU. All of the drives have fans blowing directly onto them, so they shouldn't be overheating. I doubt that I have six bad drives, and the drive that fails changes nearly every time, so I don't think it's faulty HDDs.

This is making my computer totally inoperable and really needs a solution, and I'm completely out of solutions. Any ideas appreciated.

Thanks in advance.

-John
 

jkresh

Platinum Member
Jun 18, 2001
2,436
0
71
Onboard raid5 is never all that good (outside of some server boards) and for a 6 drive array you really should pick up a good hardware raid card.
 

jkresh

Platinum Member
Jun 18, 2001
2,436
0
71
I don't think it should be failing as often as yours is, but I have never seen anyone try a 6 disk array on a motherboard (3 or 4 disks yes, but more than that and they buy a card). Also, from what I have seen, you will get a substantial performance improvement by going to a card (most likely more than 100%). Areca makes good cards, but as to whether one will work in an x16 slot, you will have to check with either Areca's or Gigabyte's support (from what I have seen, it will work in some boards but not others).
 

JackBurton

Lifer
Jul 18, 2000
15,993
14
81
Originally posted by: JohnVM
It's so bad it'd flat outright fail constantly?

I'm looking at raid cards right now -- expensive eh. Looking at this one: http://www.newegg.com/Product/...x?Item=N82E16816151016

Also, would I be able to plug that PCI-E x8 card into a PCI-E x16 slot on my mobo?

Yes, on board RAID is junk (non server board). I'd recommend this card and one of these if you don't have a UPS connected to your machine. GREAT card and yes, you can install it on one of your x16 PCI-E slots. You have 2 x16 PCI-E slots, right?
 

JohnVM

Member
May 25, 2004
170
0
76
Yep I do have 2 x16 PCI-E slots. How does that one compare to the Areca and why that card over the Areca?
 

JackBurton

Lifer
Jul 18, 2000
15,993
14
81
Originally posted by: JohnVM
the ones that come with the Antec P190 case. It actually has two power supplies in it: http://www.antec.com/us/productDetails.php?ProdID=81900

"Neo-Link 1200 Watt dual power supply system:
One 650 Watt Neo Power is responsible for powering the motherboard and add-in cards, while another 550 Watt, handles your drives and other peripherals"

It doesn't matter how many PSUs you have when the power goes out. I'd highly recommend getting a UPS.
 

JackBurton

Lifer
Jul 18, 2000
15,993
14
81
Originally posted by: JohnVM
Yep I do have 2 x16 PCI-E slots. How does that one compare to the Areca and why that card over the Areca?
The two top high end RAID controller manufacturers are Adaptec and 3Ware. I wouldn't trust my data integrity to just any company that can make a RAID controller. Adaptec and 3Ware are VERY safe bets. You really can't go wrong with either manufacturer.
 

Madwand1

Diamond Member
Jan 23, 2006
3,309
0
76
Some random suggestions:

It could be a PSU issue, motherboard issue, chipset - drive compatibility issue...

I'd use Prime95 or something like that to test system/RAM stability. If you're overclocking, try disabling that.

If the drives have any jumpers, try using them to reduce to SATA 1.5 Gb/s to simplify matters. If the controller has advanced options for NCQ, etc., try turning them off. Try reducing to fewer drives to see if the problem actually depends on the number of drives.

Try contacting Samsung to see if they have any suggestions / utilities that can change drive options that might help.

And of course, if there are any RAID driver updates, try applying them.
 

Are Back

Junior Member
Oct 2, 2007
3
0
0
If I can stick my nose in here (new to forums, don't want to start new thread):
I have been slowly acquiring parts to build a RAID 5 rig. I am going to be connecting six frugally purchased seagate 400gb drives to a promise sx6000 RAID card. I'll be investigating a new PSU to buy soon, but I am a bit worried about the 2.90 amps @ 12V each drive draws during spinup. Is staggered spin-up a standard feature on most RAID cards (not mentioned anywhere in the Promise docs), or should any decent PSU be able to supply this much power when I flip the switch?
(2.9 x 6 = 17.4 Amps, plus other components)
 

Madwand1

Diamond Member
Jan 23, 2006
3,309
0
76
Originally posted by: Are Back
If I can stick my nose in here (new to forums, don't want to start new thread):
I have been slowly acquiring parts to build a RAID 5 rig. I am going to be connecting six frugally purchased seagate 400gb drives to a promise sx6000 RAID card. I'll be investigating a new PSU to buy soon, but I am a bit worried about the 2.90 amps @ 12V each drive draws during spinup. Is staggered spin-up a standard feature on most RAID cards (not mentioned anywhere in the Promise docs), or should any decent PSU be able to supply this much power when I flip the switch?
(2.9 x 6 = 17.4 Amps, plus other components)

Don't be shy -- it's generally better to start a new thread for branches.

Staggered spin-up is a common feature in modern high-end controllers, but not all controllers are modern / high-end, and not all drives support it, and probably not all drive + controller combinations work. I don't think that PATA drives and controllers generally support staggered spin-up.

I'm not sure how critical the "2.9A" figure is for start-up. This is probably some sort of upper bound, and practical start-up requirements, sustained, could be somewhat lower.

In any case, you can do the math and ensure that you have a PSU that's beefy enough on 12V to handle that number of drives. I'm running 12 drives off a 420W Enermax Noisetaker PSU here (Athlon X2, on-board video), and it's been fine -- I measure about 400w peak draw at the plug during start-up (320w internally assuming 80% efficiency), much less normally and very little when sleeping.
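For anyone who wants to redo the math above with their own numbers, here is a rough sketch in Python. The function names are made up for illustration, and it assumes the worst case where every drive spins up at the same instant and draws the full quoted 12V current; real drives may draw less, as noted above.

```python
def spinup_budget(drives, amps_per_drive, volts=12.0):
    """Return (total amps, total watts) on the 12 V rail if all drives
    spin up simultaneously at the quoted per-drive current (worst case)."""
    total_amps = drives * amps_per_drive
    return total_amps, total_amps * volts


def internal_draw(wall_watts, efficiency):
    """Estimate DC-side draw from a wall-plug reading, given an assumed
    PSU efficiency (0.80 is the assumption used in the post above)."""
    return wall_watts * efficiency


# Six drives at 2.9 A each, the figures from the question.
amps, watts = spinup_budget(6, 2.9)
print(f"12 V spin-up load: {amps:.1f} A ({watts:.1f} W)")

# 400 W measured at the plug, assuming 80% efficiency.
print(f"Estimated internal draw: {internal_draw(400, 0.80):.0f} W")
```

Compare the amp figure against the 12V rating printed on the PSU label, leaving headroom for the CPU and video card, which also pull from 12V.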
 

JohnVM

Member
May 25, 2004
170
0
76
Thanks everyone for the help so far.

An Update:

I've tried many of the suggestions here with little luck. The problems persist. So, I ended up disabling the onboard RAID feature of my mobo altogether, so the drives just show up as individual drives in Windows (as IDE or whatever, I'm not sure). Then I set up Windows Software RAID on them (RAID 5), let it initialize, and then imported data. I've imported about 430GB of data over the past 24 hours and so far no problems. Very interesting, considering my RAID5 was failing within 3 minutes when using the Intel onboard RAID controller and RAID0 would show data errors eventually.

The fact that windows software raid works fine makes me think that it is not: 1) the PSU, 2) heat, 3) the hdd's themselves. It DOES make me think it's the onboard RAID controller being fucked up or too cheap/shitty to handle a 6 drive RAID array. So, it looks like it's time to look into RAID controllers.

Anyone else have any new opinions/anything to add considering this new information?

Thanks again for all the help so far you guys have been GREAT! I REALLY appreciate it.
 

RebateMonger

Elite Member
Dec 24, 2005
11,586
0
0
Remember that even a minor problem with a single drive can "kill" a RAID 5 array. A slow spinup, for instance, will require a rebuild. When you have six drives that depend on each other, a lot can go wrong.

If you don't have diagnostic tools from the RAID chip maker, then I'd use the drive maker's diagnostics to thoroughly test EACH single drive.

And I'd pick up a name-brand hardware RAID controller.
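To illustrate why one dropped drive is survivable but forces a rebuild, here is a toy sketch of the XOR parity idea behind RAID 5. It is only an illustration with made-up block contents: a real controller works on much larger stripes and rotates the parity block across drives, and nothing here reflects how the Intel onboard controller is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a six-drive RAID 5: five data blocks plus one parity block.
data = [bytes([value] * 4) for value in (1, 2, 3, 4, 5)]
parity = xor_blocks(data)

# Pretend the third drive dropped out: rebuild its block from the
# four surviving data blocks and the parity block.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[2])
```

Rebuilding means doing this for every stripe on the array, which is why a drive that merely drops and reconnects costs hours, and why a second dropout during the rebuild loses the array.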
 

cmbehan

Senior member
Apr 18, 2001
276
0
0
3ware and adaptec are GREAT brands of controllers.

I have a 4 port PATA 3ware controller that only cost me a shade over $120 that's been running a RAID 5 array 24x7 for the past 16 months.

Depending on how often you're going to be on the server, you might even want to consider a RAID 6 array...and the good news is most SATA/SAS cards you get now will include RAID 6.
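As a quick sketch of what RAID 6 would cost in space for the six 500GB drives discussed in this thread (the function name is hypothetical, and the figures ignore formatting overhead and the GB vs. GiB difference):

```python
def usable_capacity_gb(drives, size_gb, parity_drives):
    """Usable space when `parity_drives` worth of capacity goes to parity:
    1 for RAID 5, 2 for RAID 6."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * size_gb


raid5 = usable_capacity_gb(6, 500, parity_drives=1)
raid6 = usable_capacity_gb(6, 500, parity_drives=2)
print(f"RAID 5: {raid5} GB usable, survives 1 drive failure")
print(f"RAID 6: {raid6} GB usable, survives 2 drive failures")
```

So on this setup the second parity drive costs one drive's worth of space in exchange for surviving a second failure during a rebuild.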


Also, have you considered something like Windows Home Server? It builds a JBOD-like single volume, but spans parity data over multiple drives, resulting in a fault-tolerant array.
 

Madwand1

Diamond Member
Jan 23, 2006
3,309
0
76
Both Areca and 3ware have the following sort of note regarding compatibility with some Samsung drives:

SMART failure might be seen on this drive due to a drive firmware problem. NCQ is currently not recommended for this drive.

E.g. source:

http://www.3ware.com/products/...ist_9650SE_2007_09.pdf

However, they don't list your drives, so the note does not directly refer to them. Your drives might not be affected, and if they are, there might be a firmware update for them.

In addition, your drives don't appear on the compatibility list, so you might want to find out what your status would be on support / etc., before laying out the bundle of cash that such controllers cost.
 

JohnVM

Member
May 25, 2004
170
0
76
----------------------------------------------------------
Edit:

Yikes. Sorry, I hit the "Edit" button on your post, rather than the "Quote" button. (The option to Edit another's post is a Moderator tool).

I DEEPLY apologize. Hopefully you can re-create your original post. I have no way to do so.

My sincere apology,
RebateMonger - Moderator
 

RebateMonger

Elite Member
Dec 24, 2005
11,586
0
0
It could be a lot of things. Maybe bad drivers for the RAID controller and Windows Server x64. Maybe a loose SATA connection on the onboard RAID controller.

I had a new client who kept losing a software RAID 1 array. One day, it regenerated five times.

The solution was to replace EVERYTHING. New RAID controller (3Ware), new SATA cables, and new drives. The problem went instantly away and we kept the old drives as spares. That company could NOT afford to keep screwing with the problem, which had already cost them thousands of dollars.

Hopefully, you have a good backup system in place for this new array.
 

Snooper

Senior member
Oct 10, 1999
465
1
76
I have always found that a company that "could NOT afford to keep screwing around with the problem" would be better off not trying to set up storage arrays on desktop PC hardware in the first place. Of course, I'm a NetApp admin at work, so my opinion is probably a bit skewed!
 

ForumMaster

Diamond Member
Feb 24, 2005
7,792
1
0
Originally posted by: JohnVM
----------------------------------------------------------
Edit:

Yikes. Sorry, I hit the "Edit" button on your post, rather than the "Quote" button. (The option to Edit another's post is a Moderator tool).

I DEEPLY apologize. Hopefully you can re-create your original post. I have no way to do so.

My sincere apology,
RebateMonger - Moderator

:Q our perfect mods messing up! who would have believed that?

anyway, onboard RAID rarely works well and often gives inferior performance. before purchasing the RAID card though, try to look and see if anyone has been having problems with samsung drives.
 

jonmcc33

Banned
Feb 24, 2002
1,504
0
0
Should have gotten an Asus and Seagates instead. JK

Have you tried running a diagnostic on each drive? Samsung should provide its own utility for that. With every drive you increase redundancy but you also increase the chance of failure if one drive out of the bunch is bad. It's a catch-22.
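A back-of-the-envelope sketch of that trade-off: if each drive independently has some chance of acting up over a given period, the chance that at least one of them does grows with the drive count. The 3% per-drive figure below is purely an assumed number for illustration, not a measured rate for these Samsung drives, and independence is also an assumption (shared power, heat, or firmware problems would break it). Note too that RAID 5 tolerates a single drive failure no matter how many drives are in the array.

```python
def p_any_failure(drives, p_single):
    """Probability that at least one of `drives` fails, assuming each
    fails independently with probability `p_single`."""
    return 1 - (1 - p_single) ** drives


# ASSUMPTION: 3% per-drive failure chance, chosen only for illustration.
for n in (1, 3, 6):
    print(f"{n} drive(s): {p_any_failure(n, 0.03):.1%} chance of at least one failure")
```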
 

MerlinRML

Senior member
Sep 9, 2005
207
0
71
I've seen a number of problems with Samsung drives, although the ones I used were older than the ones you're using. I saw a number of problems with Samsung drives dropping connection and reconnecting. In a RAID5 configuration, this would cause a drive failure and very likely the drive is marked FAILED and cannot be rebuilt automatically.

I also saw a number of NCQ problems that caused the Samsung drives to lock up completely until I power-cycled them.

You should contact your motherboard manufacturer or even Intel Matrix RAID support to see if they can give you some suggestions. My guess is that they're going to point their fingers at the Samsung drive and tell you to get different drives.

Beyond that you can get an add-in RAID controller from your favorite vendor.
 