Most modern hard disks are equipped with Self-Monitoring, Analysis and Reporting Technology, better known as SMART. SMART’s job is to predict when a hard disk is likely to fail, but just how concerned should you be when you see a “SMART failure predicted on hard disk” warning?
Generally speaking, if SMART predicts a failure on a hard disk, you should replace the disk. After all, hard disks cost far less than they once did, and replacing a disk before it fails catastrophically is the best way to ensure that you do not lose data.
But what if SMART predicts a failure and you can’t replace the disk right away? Say you don’t have a spare drive available and have to order one. Given the current shortage of electronic components and supply chain disruptions, you might not be able to get a replacement drive overnight. The real question at that point is how long the failing hard disk will hold out.
A SMART failure prediction does not usually mean that the disk is going to fail today or tomorrow (although it could). Remember, SMART’s job is to predict an impending disk failure with enough advance notice for you to be able to do something about it. This, of course, raises the question of what SMART is really telling you.
Some types of hard disk failures are predictable, while others are completely unpredictable, such as when a hard disk stops working after being dropped or hit by a power surge.
Predictable failures, on the other hand, are failures that the manufacturer knows will eventually happen. Before disk manufacturers bring a new hard disk model to market, they typically do extensive testing to determine the drive’s limits. One simple example is the testing that manufacturers do to determine a disk’s mean time to failure (MTTF). Because this value is a statistical average across many drives rather than a guarantee for any single disk, it can help customers gauge how long a disk should last when subjected to continuous use.
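To see why a mean time to failure (MTTF) figure says more about a population of drives than about any single disk, it can be converted into an approximate annualized failure rate (AFR). The worked example below assumes a constant failure rate and continuous operation (8,760 hours per year); the 1,000,000-hour MTTF is a hypothetical figure, not a rating for any particular model.

```latex
\mathrm{AFR} \approx 1 - e^{-8760/\mathrm{MTTF}},
\qquad
\mathrm{MTTF} = 1\,000\,000\ \text{h} \;\Rightarrow\;
\mathrm{AFR} \approx 1 - e^{-0.00876} \approx 0.0087 \approx 0.87\%
```

Under those assumptions, roughly 1 in 115 such drives would be expected to fail in a given year of continuous use, even though the MTTF itself is over a century.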
Not surprisingly, numerous factors can affect a disk’s predicted lifespan. For example, SSDs contain flash memory cells that wear out as they are repeatedly written to. As such, an SSD that is used by a write-intensive application can be expected to fail far sooner than an SSD that handles read-only workloads.
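The effect of workload on SSD wear can be illustrated with simple arithmetic. The sketch below assumes the drive’s endurance is rated in terabytes written (TBW) and that the daily write volume stays constant; the function name Get-EstimatedEnduranceDays and the 600 TBW rating are hypothetical, and real drives may keep working past, or fail before, their rated endurance.

```powershell
# Hypothetical helper: estimates how many days of writes an SSD's endurance
# rating allows. Assumes the rating is in terabytes written (TBW) and that the
# daily write volume is constant, which real workloads rarely are.
function Get-EstimatedEnduranceDays {
    param(
        [Parameter(Mandatory)] [double] $RatedTerabytesWritten,
        [Parameter(Mandatory)] [double] $TerabytesWrittenPerDay
    )
    if ($TerabytesWrittenPerDay -le 0) {
        throw 'TerabytesWrittenPerDay must be greater than zero.'
    }
    # Days until the rated write volume is reached at the given daily rate.
    return $RatedTerabytesWritten / $TerabytesWrittenPerDay
}

# The same hypothetical 600 TBW drive under two different workloads.
$readMostly = Get-EstimatedEnduranceDays -RatedTerabytesWritten 600 -TerabytesWrittenPerDay 0.25
$writeHeavy = Get-EstimatedEnduranceDays -RatedTerabytesWritten 600 -TerabytesWrittenPerDay 2
'Read-mostly workload: about {0} days' -f $readMostly
'Write-heavy workload: about {0} days' -f $writeHeavy
```

Writing 2 TB per day exhausts the rating eight times faster than writing 250 GB per day, which is the gap the paragraph above describes.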
Because so many different things can affect a hard disk’s life expectancy, manufacturers design their disks to keep track of numerous metrics, including, but not limited to, the number of spin-up cycles, the amount of time it takes a disk to spin up, the power cycle count, and the helium pressure within the disk (on helium-filled drives). Of course, the actual metrics tracked vary by manufacturer and by whether the disk is an HDD or an SSD.
When SMART examines a disk’s health, it compares the metrics it tracks against threshold values set by the manufacturer. If one or more metrics cross the manufacturer’s threshold, the disk is assumed to be wearing out and may be prone to failure.
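As a rough illustration, here is a minimal sketch of that comparison. It assumes the common ATA SMART convention in which each attribute reports a normalized value that decreases as the drive degrades, and an attribute counts as failing once that value drops to or below its threshold. The function Get-FailingAttribute and the sample attribute values are hypothetical.

```powershell
# Hypothetical helper: returns the attributes whose normalized value has fallen
# to or below the manufacturer's threshold (the common ATA SMART convention,
# where higher normalized values mean better health).
function Get-FailingAttribute {
    param(
        [Parameter(Mandatory)] [object[]] $Attribute
    )
    $Attribute | Where-Object { $_.Value -le $_.Threshold }
}

# Hypothetical sample data; real values and thresholds vary by drive and vendor.
$sample = @(
    [pscustomobject]@{ Id = 5;  Name = 'Reallocated Sectors Count'; Value = 8;  Threshold = 10 }
    [pscustomobject]@{ Id = 9;  Name = 'Power-On Hours';            Value = 90; Threshold = 0 }
    [pscustomobject]@{ Id = 12; Name = 'Power Cycle Count';         Value = 99; Threshold = 0 }
)

# Only attribute 5 has crossed its threshold in this sample.
Get-FailingAttribute -Attribute $sample | Format-Table Id, Name, Value, Threshold
```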
Again, this doesn’t necessarily mean that the disk is going to fail tomorrow. The disk could hang on for weeks or even months. In many cases, a predicted failure simply means that signs of wear have been detected.
However, a predicted failure can also point to a more urgent problem. SMART might alert you to a predicted failure because the disk is experiencing a significant number of read or write errors, or because sectors frequently need to be reallocated.
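One rough way to separate the two cases is to look at which attribute triggered the prediction. The sketch below groups a handful of widely used SMART attribute IDs into error-related and wear-related categories. Attribute IDs and their meanings vary by manufacturer, so the mapping and the function Get-FailureCategory are assumptions for illustration, not a vendor-defined scheme.

```powershell
# Hypothetical mapping of a few commonly used SMART attribute IDs to the kind of
# problem they usually indicate. IDs and meanings vary by manufacturer, so check
# this against the vendor's documentation before relying on it.
$errorAttributeIds = @{
    5   = 'Reallocated Sectors Count'
    187 = 'Reported Uncorrectable Errors'
    197 = 'Current Pending Sector Count'
    198 = 'Offline Uncorrectable Sector Count'
}
$wearAttributeIds = @{
    4  = 'Start/Stop Count'
    9  = 'Power-On Hours'
    12 = 'Power Cycle Count'
}

# Returns a rough category for the attribute that triggered a prediction.
function Get-FailureCategory {
    param([Parameter(Mandatory)] [int] $AttributeId)
    if ($errorAttributeIds.ContainsKey($AttributeId)) { return 'Errors (more urgent)' }
    if ($wearAttributeIds.ContainsKey($AttributeId))  { return 'Wear or age' }
    return 'Unknown (check the vendor documentation)'
}

Get-FailureCategory -AttributeId 197
Get-FailureCategory -AttributeId 9
```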
Telling the difference between a wear issue and a more urgent issue such as a high error rate can be a little bit tricky. In a Windows environment, you can open PowerShell and enter the following command:
Get-WMIObject -NameSpace root\wmi -class MSStorageDriver_FailurePredictStatus | Select-Object PredictFailure, Reason
The output is similar to what you see in Figure 1.
Figure 1: Checking disk health in PowerShell.
The PredictFailure value is listed as either True or False. The system used to create the screen capture above contains two disks, which is why False is listed twice (once for each disk). A status of False means that the disk is healthy and is not predicted to fail, while a status of True indicates a predicted failure.
The Reason value indicates why a failure has been predicted. Keep in mind that each disk manufacturer uses its own reason codes, although some codes are more or less standard. In any case, you should be able to find the manufacturer’s list of SMART reason codes with a web search and then compare a given reason code against that list to determine why a disk is predicted to fail.
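On a system with several disks, it can help to list only the ones with a predicted failure, along with their instance names and reason codes. The sketch below is meant to be used with Get-CimInstance, the newer counterpart to Get-WmiObject (which is not available in PowerShell 7), against the same namespace and class as the command shown earlier. The filter function Select-PredictedFailure is hypothetical, and querying root\wmi typically requires an elevated PowerShell session.

```powershell
# Hypothetical filter: passes through only the disks whose PredictFailure
# property is True, keeping the instance name so you can tell disks apart.
# Works on any objects with InstanceName, PredictFailure and Reason properties,
# such as those returned by the MSStorageDriver_FailurePredictStatus class.
function Select-PredictedFailure {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Status
    )
    process {
        if ($Status.PredictFailure) {
            [pscustomobject]@{
                InstanceName = $Status.InstanceName
                Reason       = $Status.Reason
            }
        }
    }
}

# Usage on Windows (run from an elevated session):
# Get-CimInstance -Namespace root\wmi -ClassName MSStorageDriver_FailurePredictStatus |
#     Select-PredictedFailure
```

If nothing is returned, no disk currently reports a predicted failure.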