This is your drive's way of telling you "Hey, I am going to give up soon, so if you want to save your data, do it now!"
Only in cases of excessive bad sectors, generally because of a physical issue. For example, dust that somehow got loose in the contained atmosphere of the hard drive could wreak havoc and cause many more bad sectors to appear as the drive continues to be used. Such drives distinguish themselves from a normal hard drive with the occasional bad sector, because they will continue to generate bad sectors even after a full zero write. In normal cases, this does not happen.
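For the curious, here is a minimal sketch of that full zero write in Python, assuming a Linux block device; /dev/sdX is a placeholder and the 1 MiB chunk size is an arbitrary choice. Needless to say, this destroys all data on the target.

```python
# Full zero write: forces the drive to remap any pending sectors.
# WARNING: destroys ALL data on the target device. Run as root.
CHUNK = 1 << 20          # 1 MiB per write, an arbitrary choice
zeros = bytes(CHUNK)

with open("/dev/sdX", "wb", buffering=0) as disk:   # unbuffered raw writes
    total = 0
    while True:
        try:
            written = disk.write(zeros)
        except OSError:  # end of device reached (or an unwritable region)
            break
        if not written:
            break
        total += written

print(f"zero-filled {total} bytes - now re-read the SMART data")
```

If new pending sectors keep appearing after such a pass, you are dealing with physical damage; if the counters stabilise, the drive is behaving normally.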
In most other cases,
bad sectors do not indicate imminent failure! A hard drive with bad sectors could live for another 10 years, but it is still useless because it is used with an archaic filesystem that belongs in the 1990s, i.e. one without any protection against bad sectors. That is why many people replace such drives under warranty. Actually a gigantic waste of resources.
When you defrag, it should theoretically mark the bad sectors as it moves the data around.
Defrag has no effect on bad sectors other than a complete surface read of all sectors. It will not fix bad sectors either, just detect them.
Defects are mapped out in the factory; by the time defects start to show at the OS level, the spare area has been used up.
Absolutely not true. What you are saying is that the host will only see bad sectors once the hard drive has run out of reserve sectors. This is one big myth and absolutely incorrect.
If your reserve sectors are used up, you have tens of thousands of bad sectors, and the normalised value of Reallocated Sector Count in the SMART data is 1, below the failure threshold. This is extremely rare. You simply have been taught wrong, like most of you.
Let's try to merge our knowledge and see what we can all learn about bad sectors. ;-)
When a HD starts having bad sectors, that is a major symptom that the drive should be replaced.
Also not true. If you do the maths on the uBER of modern drives with 2TB+ capacities, you can easily show that more than 50% of all hard drives will develop bad sectors. Hard drives are designed that way. If manufacturers wanted something different, they would have increased the ECC error correction and fewer bad sectors would occur. If you do not understand the relation between ECC and bad sectors, you simply do not know what we are talking about here.
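Here is that calculation as a small Python sketch; the 2TB capacity, the uBER of 10^-14 and the number of full-capacity reads are illustrative assumptions:

```python
# Back-of-the-envelope uBER arithmetic: chance of at least one
# unrecoverable read error (i.e. a bad sector) on a 2 TB drive.
UBER = 1e-14               # unrecoverable bit errors per bit read (HDD class)
CAPACITY_BITS = 2e12 * 8   # 2 TB expressed in bits

# One full-capacity read:
p_one_pass = 1 - (1 - UBER) ** CAPACITY_BITS
print(f"one full read:  {p_one_pass:.1%}")        # roughly 15%

# The risk compounds over the drive's lifetime:
for passes in (5, 10, 20):
    p = 1 - (1 - p_one_pass) ** passes
    print(f"{passes:2d} full reads: {p:.1%}")      # crosses 50% at ~5 passes
```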
You are probably thinking of bad sectors that are physically damaged. However, this is actually very rare. In most cases, bad sectors occur due to insufficient error correction, without any physical damage. Such bad sectors will continue to be used after being overwritten.
This is why every single HD maker tells you to RMA once it has been detected.
And they will zero-write the disk and send it to another customer as a 'refurbished drive'. Very correct. ;-)
While ZFS might help in the short term, in some circumstances
Can you explain to me how, if you happen to know ZFS that well? ;-)
nor is it an option on Windows machines.
True. Windows users - just like Linux users and, to a lesser extent, Mac users - are vulnerable because they do not have a filesystem that can deal with current-era storage devices, like ZFS.
Sure, you can continue to use the HD, but it will crap out on you sooner or later, and one thing is for sure: it will never get better.
In many cases the hard drive stabilises: it has swapped a few bad sectors, and every few months a pending sector flies by. This is in no way abnormal or indicative of imminent failure. Bad sectors are normal for high-capacity hard drives. uBER = 10^-14, remember? What does that mean? It means 100 times more bad sectors than SSDs (10^-16). SSDs use more than half their raw storage space as error correction, preventing the occurrence of bad sectors.
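To put numbers on that factor of 100, a quick comparison; the 100TB lifetime workload is an assumed figure, purely for illustration:

```python
# Expected unrecoverable read errors over an assumed lifetime workload.
HDD_UBER = 1e-14          # typical hard drive spec
SSD_UBER = 1e-16          # typical SSD spec
BITS_READ = 100e12 * 8    # assume 100 TB read over the device's life

print(f"HDD: {BITS_READ * HDD_UBER:.2f} expected errors")   # 8.00
print(f"SSD: {BITS_READ * SSD_UBER:.2f} expected errors")   # 0.08
```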
Every time I get a customer who doesn't listen to me regarding bad sectors, they always come back crying. To OP: just replace it!
Probably because your customers do not use reliable filesystems. Not that strange, since there are only three filesystems that are safe at this time: ZFS, Btrfs and ReFS. Only ZFS is mature enough to be actually usable. So this would confirm your experience in my view.
No, they don't. They have a high chance of needing to re-read, a high chance of eventually doing a bad write, and a high chance of light data corruption under very heavy write utilization.
I really don't know what you mean by all this. Can you explain to me what a uBER of 10^-14 means? How does that translate to bad sectors?
Actual bad sectors are usually hidden from you.
No, all bad sectors show up as visible to the host. They only become invisible when they are overwritten - by the host. At that moment the SMART data will decrease Current Pending Sector by 1 and increase Reallocated Sector Count by 1. That is called a bad sector remap.
The only exception to this is the so-called 'weak sector'. Such a sector can still be read, but has to be read multiple times. The drive will replace such sectors as a preventive measure. This is the only exception where bad sectors can occur without first being visible to the host. These weak sectors are often discovered by the autonomous action of the hard drive itself, during background surface scans that it performs independently from the host. You can hear this happening and notice it with a power consumption monitor.
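You can watch this bookkeeping yourself. A rough sketch, shelling out to smartctl from smartmontools; /dev/sdX is a placeholder and the raw-value parsing is simplified (for attributes 5 and 197 the raw value is normally a plain integer):

```python
import subprocess

def read_attrs(device):
    """Return the raw values of Reallocated Sector Count (5) and
    Current Pending Sector (197) from the SMART attribute table."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    attrs = {}
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] in ("5", "197"):  # rows start with the ID
            attrs[fields[1]] = int(fields[-1])    # last column = raw value
    return attrs

before = read_attrs("/dev/sdX")
# ... overwrite the unreadable sector(s), e.g. by rewriting the file ...
after = read_attrs("/dev/sdX")
print(before, after)  # expect Current_Pending_Sector to drop by 1 and, if
                      # the drive remapped, Reallocated_Sector_Ct to rise by 1
```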
When they show up in a way that you can see them, that's generally bad. They may be inherent in mechanical storage, but the HDD controller and firmware are prepared to deal with the expected bad sectors transparently--you don't see problems until they are rather bad.
Sorry, but this is not true. If it were, you would not need TLER hard drives and all the trouble with bad sectors would be gone. Because then, somehow magically, the hard drive could read the data before the sector goes bad and write it somewhere else. This is not the case. When a sector becomes unreadable, it stays like that until it can be read again. During that time, it is visible to the host and shows up as Current Pending Sector.
What happens when ZFS meets a bad sector in free space? The same thing that happens on any sane FS: pretty much nothing (note the bad sector, and move on).
Both untrue. What happens with legacy storage (NTFS, Ext4) is that, due to long recovery times, your desktop will stall (i.e. the application freezes; only the mouse still moves). After a minute or so, you can get a blue screen, a crash or a sudden reboot. If you read the SMART data at that time, you will find that Current Pending Sector is not 0.
When using ZFS, the bad sector is fixed instantly, even before the hard drive finishes its recovery cycle. ZFS reads redundant data from another source (either RAID redundancy or ditto blocks) and uses it to determine what data should have been stored in the bad sector. It then writes that data to the affected hard drive, which will then initiate a remap of the bad sector to a reserve one (in case of physical damage only!).
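For clarity, an illustrative model of that self-heal sequence - not actual ZFS code. The disk and mirror objects and their read/write methods are hypothetical stand-ins, and SHA-256 stands in for ZFS's block checksums (fletcher4 by default):

```python
import hashlib

def self_heal_read(disk, mirror, offset, length, expected_checksum):
    """Return good data for a block, repairing the primary disk if needed."""
    data = disk.read(offset, length)  # returns None on an I/O error
    if data is not None and hashlib.sha256(data).digest() == expected_checksum:
        return data  # sector readable and checksum matches: nothing to do

    # Primary copy unreadable (pending sector) or corrupt: fetch the
    # redundant copy, verify it, and overwrite the bad sector in place.
    good = mirror.read(offset, length)
    assert hashlib.sha256(good).digest() == expected_checksum
    disk.write(offset, good)  # the overwrite lets the drive remap if needed
    return good
```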
What happens when ZFS meets a bad sector in some of your data? The same thing that happens in any sane FS: a CRC error.
CRC errors do not happen - only if corruption occurred between the hard drive and the controller (UDMA CRC Error Count). If a hard drive cannot read a sector, it is obliged to return an I/O error, conforming to the ATA ACS-2 standard. It may NEVER return corrupt data.
Your data is still corrupted, either way. ZFS is a server FS, recovering only metadata, like others before it (XFS and JFS1, FI). It is not remotely a solution to this problem, and it is not remotely immune to bad sectors.
100% incorrect as well. ZFS checksums and repairs regular data too, not only metadata, provided there is redundancy (mirroring, RAID-Z or ditto copies). How come you do not know this?? To me, the above statement is like saying the Earth is flat. It is not; it's a globe. But how do you prove that to someone who thinks the Earth is flat? That might be difficult. ;-)
The only current solution requires the entire hardware and software stack of RAID-Zn on Solaris, OpenIndiana, or FreeBSD, which also basically eats up an entire computer, and does no good for your desktop or laptop, running Windows, OS X, or Linux. Then, you still need to keep up with it, and replace the drive that got the bad sector(s).
I simply do not understand what you are trying to tell me here.
NTFS may be on borrowed time, but the best solution for now, IMO, would be for Windows to check SMART and warn about certain errors. The core problem is HDD QC (and, to a lesser extent, the ECC used).
The core problem is that current-era software solutions like NTFS and Ext4 treat the storage device as perfect, i.e. as containing no bad sectors. ZFS treats hard drives as imperfect and tries to create a reliable storage facility on top of imperfect hardware. This of course is the correct route, and it grows ever more important as hard drives reach higher data densities.
This whole problem is addressed in these articles:
Why RAID5 stops working in 2009
Why RAID6 stops working in 2019
These two articles address the growing problem of bad sectors as data densities increase. The magic word is: uBER. uBER uBER uBER!