Recently I've been doing quite a bit of image and video editing, so when I decided to purchase a new high-end desktop system, flexible storage options were an important consideration. Keeping storage costs down also mattered to me, so I eventually chose a motherboard that supported Serial ATA (SATA) RAID and four fast 250GB SATA drives: 1TB of raw storage.
I configured one drive as a system drive, one as a data drive, and the remaining pair as a RAID 0 stripe set for optimum disk performance--a major consideration when working with large video files. I backed up the drives to a .5TB NAS device and to two 200GB USB 2.0 external drives. I needed the large system and data drives so that I could boot multiple OSs and run OS virtualization software on the system.
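My stripe set ran on the motherboard's onboard SATA RAID, but for readers who want to try the same layout without hardware RAID, the equivalent two-drive RAID 0 set can be sketched with Linux software RAID (mdadm). The device names /dev/sdc, /dev/sdd, and the /mnt/video mount point are stand-ins, not anything from my system:

```shell
# Hypothetical equivalent of the column's setup using Linux software RAID.
# /dev/sdc and /dev/sdd stand in for the two 250GB member drives.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
mkfs.ext4 /dev/md0            # format the ~500GB striped volume
mkdir -p /mnt/video
mount /dev/md0 /mnt/video     # mount it for large video files
```

Note that RAID 0 offers no redundancy--a single member failure loses the whole volume--which is why everything on my stripe set was replicated elsewhere.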
After I got the system up and running, the configuration proved to be exactly what I needed. It worked as I expected, and I comfortably installed all my applications and transferred data from my old desktop system. With its fast storage subsystem, a bleeding-edge CPU, and plenty of RAM, the system was a dream to work on.
Then I started getting write failures on the stripe set. The write failures weren't that big a deal because I had no data on the drives that wasn't replicated elsewhere. I used the storage-repair tools supplied by the system-board vendor to repair the problem. Everything was fine for a few days; then the problem recurred. This time I updated the RAID drivers, let the tool repair the volume again, and went back to work. Once again everything was fine for a few days, and then, of course, the problem happened again.
Because the driver update had seemed to fix the problem temporarily, I considered the possibility that the RAID driver wasn't happy with the RAID 0 configuration, so I killed the stripe set and reformatted the member drives as individual drives. Taking the hardware RAID out of the picture once again seemed to fix the problem. But this workaround defeated one of the reasons I'd bought this particular motherboard (the SATA RAID), so I was determined to find a solution that let the RAID functionality remain.
At this point, one of the SATA drives became completely unavailable, disappearing from the BIOS setup as well as the OS. This made me think that I'd found the culprit: I had a bad drive. I called the vendor, and I had a replacement drive in hand the next day. I reinstalled the drive, reset the stripe set, and everything ran fine.
For one day.
Then the write problems to the stripe set started happening again, this time with the other drive in the pair. Being a bit less credulous this time, I swapped the cables on the offending drives, and the stripe set started working properly again, but only for a few hours.
At this point, I was sure that the problem wasn't the drive hardware, but the controller hardware. Because I was using a motherboard with integrated SATA RAID, I'd have to swap the entire motherboard. The motherboard vendor drop-shipped me a replacement motherboard in 24 hours, but now I had to pull the system completely apart to replace the motherboard and move all the system components (CPU, memory, add-in cards, and so on). Disassembling the computer was a little tricky since I hadn't assembled it in the first place (it came from a well-known custom builder).
After a couple of hours of careful disassembly and assembly, I was up and running again. Sure enough, the storage problem had been caused by an intermittent failure of the onboard RAID controller.
The system has been humming along nicely in the few weeks since the complete motherboardectomy, which solved my storage problems. Now all that concerns me is what to do about backup if I should manage to fill up the 2TB of online storage that's easily available to me. Maybe I'll need to upgrade all the drives to the new 500GB SATA 2.5 drives that Seagate released last week.
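For the curious, that "2TB of online storage" figure comes from adding up everything described earlier in the column; a quick back-of-the-envelope tally (sizes in GB, as marketed):

```python
# Tally of the storage described in the column (decimal GB, as marketed).
internal = 4 * 250     # four SATA drives: system, data, and the 2-drive RAID 0 set
nas_backup = 500       # the .5TB NAS device
usb_backup = 2 * 200   # two 200GB USB 2.0 external drives

total = internal + nas_backup + usb_backup
print(total)           # 1900 GB, i.e. roughly 2TB
```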