Like a lot of readers, I believe firmly in the maxim, "If it ain't broke, don't fix it." How else can we explain the large numbers of Windows NT servers that readers tell me are still in production? When a product—whatever its age—provides quality service, a compelling reason rarely exists to upgrade to a newer version and potentially face new problems. Although customer resistance to change is a big part of the challenge Microsoft faces in moving customers to its latest server OS versions, that resistance stems not from inertia but from the simple fact that satisfied customers rarely see a need for change.
A Baffling Problem
A solid, stable networking environment is the goal of every network and systems administrator, and I hear regularly from readers who are happy with their base network's performance and reliability and with servers and applications that are debugged and running 99 percent of the time. Recently, however, I've received email from administrators who are seeing problems with their client computers and aren't sure why. These messages describe a common scenario: networks that have been stable for a long time; moderate (from 50 to 100) numbers of users; small (from 3 to 10) numbers of servers; and odd problems on client PCs that don't have an obvious cause, such as previously reliable applications that stop running and running applications that exhibit strange behavior. In most cases, the problems weren't consistent: A user would call for IT support; an IT staffer would sit at the user's PC but usually couldn't replicate the problem. Even when the current problem was resolved, a new problem would appear fairly quickly on the same computer, with the same irregular pattern of occurrence.
Partly because I received three messages in one day describing these problems, my first thought was that a virus attack was responsible. However, each of the systems administrators involved assured me that they kept their antivirus software updated, and each was confident that the problem wasn't virus-related. These administrators' networks had a lot in common: Each maintained the current Microsoft Office suite of applications, enforced moderately strict policies limiting user-installed software, had no real "problem" users who could be linked to the erratically behaving PCs, maintained one or two custom applications for their business process, and serviced a user environment that had been stable for more than 2 years.
I first looked to my own small office/home office (SOHO) network for clues to the problems. Although my SOHO network doesn't have as many users as the networks of the administrators I was corresponding with, I have multiple servers and a fairly stable network environment. Some of my servers haven't been rebooted in more than 2 years except for patch installations. And although I might not beat on my servers as hard as a big enterprise network would, I do a fair amount of testing and install weird applications that can easily cause server problems if not installed correctly. Because my systems hadn't exhibited the problems that the administrators were describing, I was 90 percent certain that their problems weren't server- or even application-related.
A Surprising Solution
As soon as I was sure that these administrators' networks didn't have virus problems, I asked them when they had last defragmented the hard disks on their systems. After a moment of silence, each commented that he scheduled defrags on his servers; no one mentioned the workstations. That comment gave me part of the answer. I asked each administrator to run a defragmentation analysis on each workstation that exhibited problems, then to let me know the results. In every case, the computer in question had a severely fragmented hard disk and not much free space available, which struck the administrators as somewhat odd.
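The fragmentation analysis itself requires the OS's own defrag tool (on recent Windows versions, for example, the defrag command line offers an analysis-only mode), but the free-space half of the check is easy to script. Here's a minimal cross-platform Python sketch of the kind of report the administrators gathered; the function name and the 15 percent threshold are my own illustrative choices, the latter being a common rule of thumb for how much free space defrag tools need to work effectively.

```python
import shutil

def free_space_report(path):
    """Report total and free space for the volume containing path."""
    usage = shutil.disk_usage(path)
    pct_free = usage.free / usage.total * 100
    return {
        "total_gb": usage.total / 2**30,
        "free_gb": usage.free / 2**30,
        "pct_free": pct_free,
        # Rule of thumb (my assumption, not a hard limit): defrag tools
        # generally want roughly 15% free space to work with.
        "low_space": pct_free < 15.0,
    }

if __name__ == "__main__":
    # On Windows you'd pass a drive root such as "C:\\".
    print(free_space_report("/"))
```

A machine that flags as low on space here is a candidate for the housecleaning described next, before any defrag pass is attempted.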
The details of the defragmentation analysis gave me confidence in my diagnosis: The client computers that exhibited the problems needed a good housecleaning. Severely fragmented hard disks and a lack of available storage space can cause many applications to behave erratically, and I was hoping that was the case with these computers. The first step, however, wasn't to defrag the affected computers but to check for orphaned files. Windows applications have an annoying tendency to leave temporary files on the hard disk after the application is finished with them. The problematic client computers were older machines, and most had hard disks well under 10GB. One of the administrators told me that he discovered more than a gigabyte of undeleted .tmp files on one of his affected computers, and all the administrators reported finding hundreds of megabytes of leftover .tmp files on the machines they checked.
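The administrators did this hunt by hand, but the search for orphaned temporary files is easy to automate. Here's a short Python sketch, with a function name of my own invention, that walks a directory tree, totals up the leftover .tmp files, and lists the largest offenders first:

```python
import os

def scan_tmp_files(root):
    """Walk root, collecting leftover .tmp files, largest first.

    Returns (total_bytes, [(path, size_bytes), ...]).
    """
    leftovers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".tmp"):
                full = os.path.join(dirpath, name)
                try:
                    leftovers.append((full, os.path.getsize(full)))
                except OSError:
                    pass  # file vanished or is locked; skip it
    leftovers.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _path, size in leftovers)
    return total, leftovers
```

Pointing a scan like this at a user's profile directory or the system temp folder would surface the same gigabyte-scale surprises the administrators found.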
As soon as they had cleaned out the attic, so to speak, the next task was to tackle the severely fragmented hard disks themselves. On the next Friday night, each administrator identified a subset of his problem machines, started a defragmentation application, and went home. We had scheduled a conference call for Monday to analyze the results of our test.
Unfortunately, the results were mixed. The machines in the test subset still had some problems, but at a lower rate than before, and they were significantly more stable than the problematic computers that hadn't been defragmented. However, the continued erratic behavior meant that we weren't finished with our work. The next step was to clean the registry, which is a significantly more difficult task. I use Iolo Technologies' System Mechanic application, which includes the Registry Cleaner Tool. After running this tool on the test computers, the administrators found dozens, and in some cases more than 100, invalid or obsolete registry entries, any of which could easily have been causing erratic system behavior. Repairing these fouled-up registry entries made the problems disappear from almost all of the affected systems.
An Important Lesson
I don't know what was causing the problems in the handful of systems (fewer than five) that continued to exhibit behavioral problems after all our efforts, but the housecleaning solved the majority of the aberrant behavior the administrators were seeing. The important lesson to be gleaned from this episode is that even when client computers are running well, you can't afford to neglect basic prophylactic tasks. All of the administrators I worked with are now running network-based defragmentation applications on their client computers, and each has started to institute procedures that teach users to delete unwanted .tmp files and empty their recycle bins on a regular basis to help keep disk-space problems at bay. These administrators didn't need to upgrade their client computers to solve their problems—they just needed to update their procedures for maintaining the client systems, a far less expensive alternative.
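The routine cleanup the administrators instituted can also be scripted rather than left to users' memories. Here's one hedged sketch of such a maintenance script; the function name, the 7-day default, and the dry-run safety switch are my own choices, not anything the administrators actually deployed. It deletes only .tmp files older than a cutoff, so temp files still in active use are left alone:

```python
import os
import time

def delete_stale_tmp_files(root, max_age_days=7, dry_run=True):
    """Delete (or, when dry_run, just list) .tmp files under root
    whose last-modified time is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    affected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".tmp"):
                continue
            full = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full) < cutoff:
                    if not dry_run:
                        os.remove(full)
                    affected.append(full)
            except OSError:
                pass  # locked or already gone; skip it
    return affected
```

Run with dry_run=True first to review what would be removed, then schedule the destructive pass as a recurring job, which is exactly the kind of low-cost procedural fix this episode argues for.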