Towards the end of the Dotcom boom, I was responsible for the administration of a web site and forum that ran off IIS 4. The server running the forums was housed in a hosting company’s cage somewhere in the mid-western United States. Although that is not really unusual today, I was going to manage the computer from half a world away in Melbourne, Australia, via a VNC remote desktop connection.
When responsibility for this particular server was passed on to me, I did what every systems admin should do and checked the configuration. This particular NT4 box was configured with hardware disk mirroring and had a built-in DAT drive for backup tapes.
Systems administrators know what I mean when I say disk mirroring. However, realizing that all kinds of people might be reading this, I should explain how mirrored hard disks work. In a nutshell: you get two identical hard disk drives and configure them so that all data written to the first drive is simultaneously written to the second. It works in a similar way to carbon paper. If the first drive fails, you can “break the mirror” and boot the server off the second drive, which is an up-to-date copy of the first. When you have time, you can then replace the failed drive and create a new mirror. If something goes completely awry with your operating system, you can break the mirror, reinstall the operating system on one disk and then recover the data from the second disk.
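For readers who think better in code, here is a toy Python sketch of that idea. It is only an illustration, not how a real hardware controller works: the class `MirroredVolume` and its methods are hypothetical names of my own, and each “disk” is just a dictionary mapping a block number to some bytes.

```python
class MirroredVolume:
    """Toy model of a two-disk mirror (hypothetical, for illustration only).

    Each "disk" is a dict mapping a block number to the bytes stored there.
    """

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, block, data):
        # Every write lands on all disks still attached to the mirror.
        for disk in self.disks:
            disk[block] = data

    def format(self):
        # A format is just another write: it wipes every attached disk.
        for disk in self.disks:
            disk.clear()

    def break_mirror(self):
        # Detach the second disk; later writes and formats no longer reach it.
        if len(self.disks) < 2:
            raise RuntimeError("mirror is already broken")
        return self.disks.pop()


volume = MirroredVolume()
volume.write(0, b"forum data")

spare = volume.break_mirror()   # set the second disk aside first
volume.format()                 # reinstall on the remaining disk
recovered = spare[0]            # the data is still on the detached disk
```

The order matters in this sketch: the detached disk keeps its contents only because `break_mirror()` is called before `format()`. While both disks are attached, a format is copied to the second disk as faithfully as any other write.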
During my audit I learned that a backup hadn’t been taken in months. No special backup software was available and the guy who had managed it before me said something about having a problem with the built-in software. When I connected to the server and tried to manually perform a backup, I got a Dr Watson screen. The reason that no backups had been taken was that the backup utility was corrupt and the previous guy hadn’t gotten around to dealing with that.
Never a great fan of the built-in NT4 backup utility, I recommended to my boss that he purchase one of the more popular commercial backup solutions. Whilst the paperwork went through, I zipped up all of the forum data and manually downloaded several gigabytes to the hard disk drive on one of my PCs back home in Australia.
A few days later the new backup software arrived at the hosting company. They emailed me asking me what I wanted to do and I instructed them to place it in the server’s CD-ROM drive. Once this was done I remotely connected to the server and attempted an install.
At this point I became aware that not only was the NT Backup utility on the blink, the NT4 installation management software was also being recalcitrant. After trying a few different things it became clear that it was going to be impossible to install any software on this particular NT4 server without the installation procedure crashing.
I spent some time researching the issue and found that other people had encountered it. The way to fix it was to follow a complex procedure detailed on Technet, which described how the installation management component of NT4 could be reinstalled by rebooting with an emergency repair disk and utilizing the installation media. You can do a lot of things remotely over VNC, but swapping floppy disks requires someone in front of the server itself. It was time to involve the hosting company.
When setting up the job, I explained in some detail that we hadn’t been able to back up the server for some time, that I wanted the procedure in the Technet article carried out, and that if they had any problems, they were to contact me on the phone number I supplied before taking any action.
In the back of my mind I figured that if worst came to worst, I’d get them to break the mirrored drives and re-install NT4. I would then be able to bring all of the forum and website data back from the newly unpaired disks.
The downtime was scheduled for a Sunday afternoon, which was early Monday morning in Australia. The server was meant to come back online at around 5am. I set my alarm and went to sleep.
I woke at 5am and tried to ping the server. No response. I figured that it was still down and that they hadn’t started on time. I checked the hosting company’s job tracking database and saw that the job was still in progress. Given that there wasn’t much I could do, I surfed the web for a while, intermittently trying to ping the server.
At about 5:45am I got a response to my ping. Cool - the server was back up. I tried to connect via VNC and didn’t get anywhere. I then tried to bring up the home page of the site that the server hosted.
I received the “welcome to IIS 4.0” screen. For a moment it didn’t register in my mind what had happened. Under no conditions should I have received that screen. If the procedure hadn’t worked, they were to call me and I wouldn’t be able to connect to the server. If it had worked, I shouldn’t be seeing the default “welcome to IIS 4.0” screen.
I rang the hosting company and asked to speak to the tech who had been working on my server.
“Why am I getting the IIS 4.0 welcome screen?”
“Oh, I couldn’t get the Windows NT installer to work, so I just formatted the disk and installed NT 4 from scratch.”
“Did you break the mirror before you did this?”
“No. But you can restore from your backup tape!”
Every systems administrator has had a moment like this one. It is that moment when time stretches out for an eternity, when you realize that you are up that well-traveled creek in a canoe made out of barbed wire, and that for some reason your paddle has gone AWOL.
I’d done my best. I’d written in the job report that the backups weren’t working and that was the whole reason I’d wanted them to try this procedure. I’d written in the job report that if the procedure I’d outlined had not worked, they needed to call me so I could decide what should be done.
If only the mirror had been broken, the data would still have been there…
By the middle of the week, the hosting company had provided a free upgrade to a significantly better provisioned server running Windows 2000. I then uploaded the gigabytes of zipped data to the server and was able to restore it, only losing two weeks’ worth of changes.