You probably noticed that my column didn't appear in last week's newsletter. The primary reason was that my local telephone provider couldn't repair my DSL line for 5 days, but I experienced other problems as well. A few days before the newsletter deadline, one of my Windows NT 4.0 domain controllers started misbehaving—the desktop was extremely sluggish and unresponsive, services logged error messages in the system event log, Exchange logged startup and runtime error messages in the Application Event Log (AEL), and WINS refused to start, even after I removed the database and all the log files in the WINS directory.
When I booted the crippled domain controller, NT sometimes installed the correct graphics driver and other times started up in VGA mode, despite the fact that both my Control Sets had identical configurations. I never did figure out why NT alternated between the two modes (and no, I didn't select the VGA restart option from the boot menu). At the low point in this experience, I could enter only one or two GUI or command-line commands before the system hung and forced me to reboot—a lesson in patience. The desktop became more responsive after I reinstalled the video driver, but every time I logged on, NT reported that I was running without a properly sized pagefile.
When I checked the pagefile settings, the Performance tab indicated I had a permanent pagefile more than three times the size of the installed memory, and I could see a pagefile.sys of the correct size on my NTFS-formatted boot drive. Hoping for an easy solution, I set the size of the pagefile to zero, rebooted, manually deleted the old pagefile.sys, created a new pagefile on the Performance tab of My Computer\Properties, and rebooted. Unfortunately, NT displayed the same error message when I logged on again. After repeating the pagefile purge procedure a few more times without success, I started digging in the Registry to find out why NT thought there was not a properly sized pagefile, even though the setting in the GUI was correct and the file appeared on the hard disk.
NT memory management and pagefile settings reside in the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management. When I looked at the Memory Management value entries, the PagingFiles entry was set to system32\temppf.sys, an important clue. I compared the other Memory Management value entries to those on a healthy NT 4.0 system, and I found two entries that didn't appear on my other NT 4.0 systems. I don’t remember exactly what the value entries were called, but one indicated the number of pages in the pagefile allocated to the system, and the other indicated that the system was using a temporary paging file. Hoping for another quick fix (hope springs eternal), I set the PagingFiles value entry to the pagefile on my boot drive, changed the temporary pagefile value from one to zero, and rebooted, a technique that produced no results.
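If you would rather script this check than browse the Registry by hand, the following is a minimal Python sketch of the idea, not a tool I used at the time. It assumes each PagingFiles string has the form "path initial-MB maximum-MB" and treats any entry whose file name is temppf.sys as suspicious. The names PagingFileEntry, parse_entry, looks_temporary, and read_paging_files are hypothetical, and read_paging_files relies on Python's winreg module, which exists only on Windows.

```python
import ntpath
from typing import List, NamedTuple

# Key path under HKEY_LOCAL_MACHINE that holds the memory management settings.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"


class PagingFileEntry(NamedTuple):
    path: str
    initial_mb: int
    maximum_mb: int


def parse_entry(raw: str) -> PagingFileEntry:
    """Parse one PagingFiles string, assumed to be '<path> <initial MB> <max MB>'."""
    parts = raw.strip().rsplit(None, 2)
    if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
        return PagingFileEntry(parts[0], int(parts[1]), int(parts[2]))
    # No sizes present: keep the whole string as the path and report 0/0.
    return PagingFileEntry(raw.strip(), 0, 0)


def looks_temporary(entry: PagingFileEntry) -> bool:
    """True when the entry points at a file named temppf.sys."""
    return ntpath.basename(entry.path).lower() == "temppf.sys"


def read_paging_files() -> List[PagingFileEntry]:
    """Read and parse the PagingFiles value (Windows only)."""
    import winreg  # imported here so the parser above still loads elsewhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, "PagingFiles")
    return [parse_entry(item) for item in value]
```

On a Windows machine, calling read_paging_files() and passing each result to looks_temporary() would flag the kind of entry I found on my domain controller.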
Next, I deleted all three value entries and rebooted. When I logged back on, NT displayed the same pagefile error message. Now, armed with indisputable evidence that the pagefile problem was not the result of incorrect memory management Registry settings, I examined the temporary pagefile more closely. I verified that the file was, in fact, in the system32 directory. Its size, 20MB, explained why the system was so slow—20MB is inadequate to start most standard system services (especially on a system with 128MB of RAM), let alone all the Exchange services I needed to run my mail server. The only rational explanation for the domain controller’s behavior is that whenever this file exists in the system32 directory, NT uses it as the pagefile, even though a valid and properly sized pagefile exists somewhere else.
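A quick way to confirm whether a leftover temporary pagefile is present, and how large it is, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name find_temp_pagefile is hypothetical, and the location (system32\temppf.sys under the system root) is simply where I found the file on my machine.

```python
import os
from typing import Optional


def find_temp_pagefile(system_root: str) -> Optional[float]:
    """Return the size in MB of system32/temppf.sys under system_root, or None if absent."""
    candidate = os.path.join(system_root, "system32", "temppf.sys")
    if not os.path.isfile(candidate):
        return None
    return os.path.getsize(candidate) / (1024 * 1024)
```

Pointing find_temp_pagefile at the system root (C:\WINNT on a default NT 4.0 install) would have returned roughly 20 on my crippled domain controller and None on a healthy one.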
So, here was my dilemma: How could I delete a temporary pagefile that was open while the system was running? Two methods came to mind. Either I could install a second system root, boot up, delete the temporary pagefile in the original system root, reboot the original system, and recreate the pagefile—or I could find a utility that would let me boot NT from a disk and delete the file. I tried option one, and the install failed several times, often while NT attempted to load the kbdus.dll file. This left option two.
I sent Mark Russinovich (one of our contributing editors) a desperate plea for help, and he replied promptly with a copy of the Emergency Repair Disk (ERD) Commander Pro utility. ERD Commander Pro creates a set of three NT boot disks with a robust set of command-line commands that operate on NTFS volumes. After I granted the Everyone group access to the boot drive, I could delete the temporary pagefile, reboot, and create a permanent pagefile. Lo and behold, my system was up and running in fine form. To clean up after the failed installs, I reapplied the NT and Exchange service packs, set all the services to start automatically, and proceeded to pick up 5 days of mail, smiling all the while.
So how does an NT system end up in this state? Because the domain controller had so many problems, it’s hard to backtrack to one specific cause. I tried several times to reinstall NT in the existing system root or in an alternate root, and the install failed repeatedly. Perhaps the installation procedure creates a temporary pagefile, and when the install fails, it doesn’t delete the temporary pagefile from the system root. Also, I’m sure that NT creates a temporary pagefile when you purge your pagefile (as I did repeatedly) and reboot. When I tried to recreate the permanent pagefile, the system hung after I entered only one or two commands, and it wouldn't shut down. To clear the hang, I abruptly powered the system down, so NT might have started to delete the temporary pagefile but not completed the operation because I hit the restart button at just the wrong time. File this lesson away for future troubleshooting and share my celebration of a successful conclusion to yet another tech story from the trenches.