Temperature is one of the biggest factors impacting hard disk longevity. Over the years, there have been numerous studies comparing disk failure rates to operating temperature. Most of these studies have found that as a disk’s operating temperature increases, so, too, does its probability of failure. This does not necessarily mean that a disk that is running hot (but within the manufacturer’s operating temperature range) is in imminent danger of failure, but continuously operating at high temperatures can shorten a disk’s lifespan.
How Hot Is Too Hot?
A disk’s safe operating temperature range varies by manufacturer and by disk type. Generally speaking, most hard disks can safely operate within a range of 5 to 50 degrees Celsius (41 to 122 degrees Fahrenheit). Some models are rated higher: certain Seagate drives, for example, can sustain temperatures of up to 60 degrees Celsius (140 degrees Fahrenheit). Because the actual range depends on the make and model, it is worth checking the specifications for the specific drives you use.
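The Fahrenheit figures above all follow from the standard conversion F = C × 9/5 + 32. A tiny Python helper reproduces them. The function name is just for illustration:

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# The hard disk figures quoted above: 5 C, 50 C and 60 C.
for c in (5, 50, 60):
    print(f"{c} C = {celsius_to_fahrenheit(c):g} F")
# 5 C = 41 F
# 50 C = 122 F
# 60 C = 140 F
```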
Don’t Forget About Solid State Disks
It’s also important to consider the temperature of solid state disks. Although these disks have no moving parts, they can still generate a considerable amount of heat. Once again, the specifications vary by make and model, but both NVMe and SATA SSDs are often rated to operate within a range of 0 to 70 degrees Celsius (32 to 158 degrees Fahrenheit).
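To put these numbers to work, a monitoring script can compare a reported temperature against the range for the disk type. The sketch below hard-codes the generic hard disk and SSD ranges quoted in this article as illustrative defaults. The names `TYPICAL_RANGES_C`, `in_operating_range` and `headroom_c` are hypothetical, and in practice you would substitute the limits from your own drive’s datasheet:

```python
# Generic operating ranges quoted in this article, in degrees Celsius.
# Illustrative defaults only: real limits vary by make and model.
TYPICAL_RANGES_C = {
    "hdd": (5.0, 50.0),
    "ssd": (0.0, 70.0),  # SATA and NVMe alike, per the article's rule of thumb
}


def in_operating_range(temp_c: float, disk_type: str, ranges: dict = TYPICAL_RANGES_C) -> bool:
    """True if temp_c lies within the (low, high) range for disk_type, inclusive."""
    low, high = ranges[disk_type]
    return low <= temp_c <= high


def headroom_c(temp_c: float, disk_type: str, ranges: dict = TYPICAL_RANGES_C) -> float:
    """Degrees Celsius left before the upper limit; negative if already over it."""
    return ranges[disk_type][1] - temp_c


print(in_operating_range(46, "hdd"), headroom_c(46, "hdd"))  # True 4.0
# A drive rated up to 60 C can be described with its own range.
print(in_operating_range(55, "hdd", {"hdd": (5.0, 60.0)}))  # True
```

Passing a per-model `ranges` dictionary keeps the generic defaults from masking a drive that has a narrower (or wider) rated range.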
Monitoring Disk Temperature
Because excessive heat can contribute to disk failure, it is important to monitor the temperature of your disks. Nearly all storage arrays report each disk’s temperature alongside other disk health information. In Figure 1, for example, Disk 12 is running at 46 C / 114 F, which is within the manufacturer’s stated operating temperature range.
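If some of your disks sit in ordinary servers rather than in an array with a Web interface, you can collect the same reading yourself. The sketch below assumes a Linux host with smartmontools 7.0 or later, whose `smartctl` command can emit JSON via `-j`, and it assumes the drive reports its temperature in a top-level `temperature` object with a `current` field, which is how smartctl’s JSON output usually presents it. The helper names are hypothetical, `smartctl` generally needs root privileges, and the sketch does not handle a missing `smartctl` binary or empty output:

```python
import json
import subprocess
from typing import Optional


def parse_temperature(smartctl_json: str) -> Optional[int]:
    """Return the current drive temperature (degrees C) from smartctl JSON, or None."""
    data = json.loads(smartctl_json)
    return data.get("temperature", {}).get("current")


def read_temperature(device: str) -> Optional[int]:
    """Run smartctl against a device such as /dev/sda and extract its temperature."""
    # smartctl's exit status is a bit mask that can be nonzero even when the
    # JSON output is usable, so a nonzero return code is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "-a", "-j", device],
        capture_output=True,
        text=True,
    )
    return parse_temperature(result.stdout)


if __name__ == "__main__":
    # Parsing a trimmed-down sample document; no smartctl call is made here.
    sample = '{"device": {"name": "/dev/sda"}, "temperature": {"current": 46}}'
    print(parse_temperature(sample))  # 46
```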
Figure 1. You can use the array’s Web interface to monitor the temperature of each disk.
Storage arrays generally also include an alerting mechanism that can tell you when a hard disk is getting too hot. As you can see in Figure 2, this storage array can send notifications by email, SMS text message or push service. Even so, you can’t assume that the array will automatically notify you about high temperatures. Typically, you have to specify the types of notifications you want to receive. The array shown in the figure, for instance, can send notifications related to warnings, alerts and firmware updates, while some other storage arrays offer far more granular notification settings.
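At its core, a temperature notification comes down to comparing each reading against thresholds. The sketch below is a hypothetical illustration, not how any particular array implements it: the `warning` and `alert` labels echo the categories shown in Figure 2, but the threshold values, class and function names are assumptions, and actually delivering the email, SMS or push message is left out:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TemperatureThresholds:
    """Hypothetical per-drive thresholds in degrees C; take real values from the datasheet."""
    warning_c: float
    alert_c: float


def classify(temp_c: float, t: TemperatureThresholds) -> Optional[str]:
    """Return 'alert', 'warning', or None if the reading is below both thresholds."""
    if temp_c >= t.alert_c:
        return "alert"
    if temp_c >= t.warning_c:
        return "warning"
    return None


def check_disks(readings: Dict[str, float], t: TemperatureThresholds) -> List[Tuple[str, str, float]]:
    """Return (disk, level, temperature) for every disk that crosses a threshold."""
    notices = []
    for disk, temp_c in sorted(readings.items()):
        level = classify(temp_c, t)
        if level is not None:
            notices.append((disk, level, temp_c))
    return notices


thresholds = TemperatureThresholds(warning_c=45, alert_c=50)
print(check_disks({"Disk 1": 38, "Disk 2": 47, "Disk 3": 52}, thresholds))
# [('Disk 2', 'warning', 47), ('Disk 3', 'alert', 52)]
```

Keeping the warning threshold below the hard limit gives you time to investigate before a disk leaves its rated range.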
Figure 2. The storage array supports a variety of alerting mechanisms.
Another best practice for monitoring the temperature of your storage arrays is to periodically spot-check your storage hardware with a thermal imaging camera. Prices for these cameras have come down considerably, and some models cost less than a thousand dollars.
You can use a thermal imaging camera to establish a temperature baseline for your storage arrays, which makes it easy to tell in the future whether an array is running hotter than it should. A thermal camera shows an array’s external (surface) temperature, however, not the temperature of the disks themselves, because the disks sit inside the array, beyond the camera’s reach. Even so, pointing the camera at the array’s individual drive bays can help you figure out whether any of your disks are running especially hot. In Figure 3, the white color in the center shows that the disk in the upper-left drive bay is running hotter than the array’s other disks.
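The baseline idea can be expressed as a simple comparison. This sketch assumes you have already read one spot temperature per drive bay from each thermal image; the bay labels, thresholds and the `hot_bays` function are hypothetical. A bay is flagged if it is noticeably hotter than its own baseline, or noticeably hotter than the median of all bays in the current image, which mirrors the relative comparison described above. Because the camera only sees surface temperature, a flagged bay is a prompt to check that disk’s own reported temperature, not a measurement of the disk itself:

```python
from statistics import median
from typing import Dict, List


def hot_bays(
    current: Dict[str, float],
    baseline: Dict[str, float],
    rise_c: float = 5.0,
    outlier_c: float = 5.0,
) -> Dict[str, List[str]]:
    """Flag drive bays whose surface temperature looks unusually high.

    current, baseline: bay label -> surface temperature (degrees C) read
    from thermal images. Returns bay label -> list of reasons it was flagged.
    """
    if not current:
        return {}
    mid = median(current.values())
    flagged = {}
    for bay, temp in current.items():
        reasons = []
        # Compare against this bay's own earlier baseline, if one was recorded.
        if bay in baseline and temp - baseline[bay] >= rise_c:
            reasons.append(f"{temp - baseline[bay]:.1f} C above baseline")
        # Compare against the other bays in the same image.
        if temp - mid >= outlier_c:
            reasons.append(f"{temp - mid:.1f} C above median of all bays")
        if reasons:
            flagged[bay] = reasons
    return flagged


baseline = {"A1": 30.0, "A2": 30.5, "B1": 29.5, "B2": 30.0}
current = {"A1": 38.0, "A2": 31.0, "B1": 30.0, "B2": 31.0}
print(hot_bays(current, baseline))
# {'A1': ['8.0 C above baseline', '7.0 C above median of all bays']}
```

Using the median rather than the mean keeps one very hot bay from dragging the reference point up and hiding itself.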
Figure 3. A thermal camera may be able to show you the temperature of the disks relative to one another.
What If Disks Are Running Hot?
So what should you do if you find that disks within an array are running hot? On occasion, I have seen a disk run hot simply because it was defective. More often, however, disks overheat because of inadequate airflow. For this reason, it is important to periodically inspect your storage hardware to make sure that the vents are not clogged with dust and that all of the array’s fans are working properly.
Regularly tracking the temperature of both hard disks and solid state disks can help you avoid disk failures and their consequences.