I still don't get it, Cerb.
Drives aren't perfect, and data errors happen. If you use software RAID on Linux, for instance, or ZFS, and have your logging turned up, you can see them being fixed during scrubs. You can't fix any that occur during the window when a RAID 5 is effectively a RAID 0. The chances of them occurring during a parity RAID rebuild are much higher than during a mirror-type RAID rebuild, due to the added load on all the drives, the sheer amount of reading that must be done, and the long rebuild time (which can be several days with big SATA drives). No additional drive even needs to fail for that to happen.
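To put rough numbers on that (my own back-of-the-envelope sketch, not anything from the thread, assuming the commonly quoted 1-error-per-1e14-bits URE rating for consumer SATA drives, independent errors, and a hypothetical 4-drive array of 4 TB disks): a mirror rebuild re-reads one surviving drive, while a RAID 5 rebuild has to re-read every surviving drive.

```python
import math

# Hedged back-of-the-envelope: probability of hitting at least one
# unrecoverable read error (URE) while re-reading data during a rebuild.
# Assumptions (mine, not from the thread): 1 URE per 1e14 bits read
# (a common consumer SATA spec), independent errors, 4 TB drives, 4 drives.

URE_RATE = 1e-14            # errors per bit read
TB = 1e12                   # decimal terabyte, as drive vendors count it

def p_at_least_one_ure(bytes_read, ure_rate=URE_RATE):
    """P(>= 1 URE) over a read of `bytes_read` bytes."""
    bits = bytes_read * 8
    # log1p/expm1 keep precision with such a tiny per-bit rate
    return -math.expm1(bits * math.log1p(-ure_rate))

drive_size = 4 * TB         # hypothetical drive size
n_drives = 4                # hypothetical array width

# Mirror (RAID 1/10) rebuild: read back one surviving drive.
print("mirror rebuild:", p_at_least_one_ure(drive_size))

# RAID 5 rebuild: read back every surviving drive in full.
print("RAID 5 rebuild:", p_at_least_one_ure((n_drives - 1) * drive_size))
```

With those assumptions it comes out to roughly a 27% chance of at least one URE for the mirror rebuild versus roughly 62% for the RAID 5 rebuild. Real drives usually beat the spec-sheet rate, so treat these as worst-case illustrations rather than predictions, but the gap between the two cases is the point.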
RAID 5 provides redundancy and the ability to continue providing access even when a drive has failed. That is the whole point, and it does it just fine.
That it does. What it doesn't provide is any protection for the data on the array while it's degraded. Between the time the array degrades and the time it is rebuilt, which can stretch into days, the protection you have is that of a RAID 0. RAID 6 gives you the data protection of a RAID 5 while degraded. The rebuild itself effectively even gives you scrubbing during that time between the degraded and optimal operational states. So RAID 6 remains a sound choice until drives reach the next point where it becomes likely that even RAID 6 won't make it through a rebuild (about 2020 based on the worst-case numbers, IIRC, so really another 5+ years on top of that, unless error rates get worse in upcoming drives).
In fact, RAID 5 is the most cost-effective way to provide redundant storage.
And RAID 6 adds protection to that, typically for <=$200 more, which is nothing compared to your time plus the potential productivity losses.
If another drive fails during the rebuild, you simply create a new array and restore from backup.
Most people expect to put a new drive in and keep going. So you pop a new drive in, and it fails the rebuild during working hours (i.e., not another drive failing, but a bad stripe, which, depending on your controller and config, may or may not halt the rebuild entirely, but may necessitate an fsck or chkdsk, which is another avoidable downtime), after days of being slow as molasses and hindering users, and now what? It could have been prevented with a different RAID setup, or you could have gone straight to backups. If the system is one where going to backups during work hours is fine, and an efficient means of doing so is part of your DR planning, then that's fine.
Sounds like you're one of those guys that tries to pass his RAID off as a backup and is cruising to get burned.
If your array has a real likelihood of failing during a rebuild, however, and you can't confine that rebuild to off hours, that's a risk that's generally easy to mitigate. The general point of RAID is so that you can keep on working without needing to go to backups (specifically, without the downtime involved in that), and/or to protect backups that are too large or too new for other storage media, especially when access to the volume is needed. That includes not only the drives functioning, but also being able to reconstruct inaccessible data, be it through parity or a mirror. The chances of being able to do that are staying in favor of 6 and 10, but are no longer in favor of 5, at least for arrays of big SATA drives, as the chance of not being able to read some sector grows with drive density/capacity.
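As a rough illustration of that trend (same hedged assumptions as the earlier sketch: the worst-case 1-per-1e14-bits URE spec, independent errors, and a hypothetical 4-drive RAID 5 whose rebuild re-reads the three surviving drives):

```python
import math

URE_RATE = 1e-14   # worst-case consumer SATA spec, errors per bit read
TB = 1e12          # decimal terabyte

def p_ure(bytes_read):
    """P(>= 1 unrecoverable read error) over a read of `bytes_read` bytes."""
    return -math.expm1(bytes_read * 8 * math.log1p(-URE_RATE))

# Hypothetical 4-drive RAID 5: a rebuild re-reads the 3 surviving drives.
for drive_tb in (1, 2, 4, 8):
    rebuild_read = 3 * drive_tb * TB
    print(f"{drive_tb} TB drives: {p_ure(rebuild_read):.0%} chance of a URE during rebuild")
```

By those worst-case numbers, the chance of tripping over an unreadable sector during a RAID 5 rebuild climbs from around 21% with 1 TB drives to around 85% with 8 TB drives, which is the sense in which the odds are drifting away from 5 and staying with 6 and 10.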
Backups need to be done anyway, and even they might be done first and foremost to another machine's RAID array.
Agree.
I use RAID5 for its read/write speed. I also have a mirrored server. Last year, when the main server went down, the mirrored server kicked in. Almost no downtime. The main server syncs data in real time to the backup, and I need fast read/write speed for that.
Aside from RAID 5 offering poor write speed for random IO, what you have there is a redundant whole server, effectively giving you a delayed mirror.
You effectively have RAID 5+1, just that the mirror portion is set up with eventual consistency. I'm only talking about a lone RAID 5 array, on a lone server, in the typical budget-limited SMB that's likely to want to make an 8TB 3-drive RAID 5 with 7200 RPM SATA drives. Another server storing the same data that can be brought right up changes everything, because in that case, there is no rebuild issue--you go ahead and treat the degraded array as failed, and can easily work from a fresh array when getting that server back up, if needed.