To troubleshoot Windows NT problems, you need a working knowledge of NT's diagnostic tools and the Microsoft Windows NT Server 4.0 Resource Kit. But sometimes you have problems that this knowledge cannot help you solve. For example, if a problem occurs in the boot process, you cannot bring up NT to run its diagnostic tools. Understanding the steps of the NT boot process will help you troubleshoot problems that might occur.
Problems with the boot process can generate error messages about missing files, cause the system to hang, or crash the system with the infamous blue screen of death. Fixes might involve replacing a damaged file or using Emergency Repair Disks (ERDs) to reinstall NT. As I discuss the steps in the boot sequence, I will examine possible problems and suggest solutions.
The NT boot process requires certain files. If these files are missing or corrupted, the boot process cannot complete properly. Intel-based computers require the following files: ntldr, boot.ini, bootsect.dos (if you want to boot to DOS on a multiple-boot computer), ntdetect.com, and ntbootdd.sys (if you have a SCSI controller with the BIOS disabled). Each of these files plays a specific role in the boot process.
The files ntldr and ntdetect.com are not computer-specific. Thus, you can copy these files directly from one computer to another. The file boot.ini is computer-specific, but you can easily edit the file if you need to copy it from one computer to another. The file bootsect.dos is computer-specific, and you cannot copy it from one computer to another. Instead, back up this file from the same computer to a 3.5" disk and restore it from that disk, or use NT Backup to restore it. The file ntbootdd.sys is specific to your SCSI controller. You can copy this file from another computer that has the same SCSI controller.
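The file list above lends itself to a simple presence check. The following Python sketch assumes you can read the boot partition's root as an ordinary directory (for example, from a second NT installation). The function name and its two flags are hypothetical and are not part of any NT tool.

```python
import os

# Files the article lists as always required on Intel-based computers.
ALWAYS_REQUIRED = ["ntldr", "boot.ini", "ntdetect.com"]


def missing_boot_files(root, dual_boot_dos=False, scsi_bios_disabled=False):
    """Return the required boot files that are absent from the directory `root`."""
    required = list(ALWAYS_REQUIRED)
    if dual_boot_dos:
        # Needed only to boot DOS on a multiple-boot computer.
        required.append("bootsect.dos")
    if scsi_bios_disabled:
        # Needed only for a SCSI controller with its BIOS disabled.
        required.append("ntbootdd.sys")
    # Compare names case-insensitively, because FAT and NTFS lookups ignore case.
    present = {name.lower() for name in os.listdir(root)}
    return [name for name in required if name not in present]
```

An empty result means every file the article names for that configuration is present. It says nothing about whether a file is corrupted.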
The Preboot Sequence
The preboot sequence is not truly a part of NT. The process involves the computer going through its startup routine. First, the computer runs the power-on self test (POST). If the POST finds a problem, it identifies the cause of the problem with a series of beeps. (For information about beep codes, see Table 1, page 214. These beep codes are for older AMI BIOS systems. For the beep codes specific to your system, see the motherboard and BIOS manufacturers' specifications.) Possible trouble areas include a hard disk failure and the absence of a working video card.
After the POST completes successfully, the BIOS must load the Master Boot Record (MBR) from the hard disk or a 3.5" disk and run the program that the MBR contains (i.e., the boot program). If the hard disk starts to fail, the MBR can become corrupted. A virus might also infect the MBR. Viruses have a harder time running under NT than under many other operating systems (OSs), but a user might reboot the computer with an infected 3.5" disk in the drive and thus damage the MBR. If you damage the MBR, the boot process cannot proceed. (For information about recovering from MBR problems, see Bob Chronister, "Tricks & Traps," March 1998.)
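One quick sanity test for a suspect MBR is its boot signature: a valid 512-byte MBR ends with the bytes 0x55 0xAA at offsets 510 and 511. The Python sketch below assumes you have already read the first sector of the disk into memory with some disk utility, because raw disk access is platform-specific. The function name is hypothetical. A passing check rules out only gross corruption; it does not show that the MBR is free of a virus.

```python
def has_boot_signature(sector):
    """Return True if a 512-byte sector ends with the 0x55 0xAA boot signature."""
    return len(sector) == 512 and sector[510:512] == b"\x55\xaa"
```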
The next step in the preboot sequence is for the system to load the boot sector from the hard disk's active partition into system memory. The boot sector on an NT computer or NT-formatted 3.5" disk contains instructions to load the ntldr file. If this file is missing, you receive the message BOOT: Couldn't find NTLDR. Insert another disk. If the drive partition that contains the boot files uses the FAT file system, you can use a 3.5" DOS disk to boot and then copy ntldr back to the hard disk's root directory. If the partition uses NTFS, you must use a 3.5" NT boot disk. (For information about creating this disk, see the sidebar "Building an NT Boot Disk," page 214.) Use the disk to boot the system. Then, start NT and copy ntldr from the 3.5" disk to the hard disk.
The Boot Sequence
After the boot sector loads ntldr, the NT boot sequence starts. The system switches the processor to the 32-bit flat memory model, which supports as much as 4GB of RAM. Next, ntldr loads the mini file system drivers into memory and starts them. These drivers contain just enough code to read the hard disk (whether FAT or NTFS) and to load the rest of the OS from the hard disk.
During this stage of the boot sequence, the boot.ini file might cause problems. The boot.ini file controls the NT startup menu. The startup menu typically gives you the option of starting NT or starting NT in VGA mode only. If your computer can boot several OSs, you might see options such as NT, DOS, Windows 98, and Win95. You might see several versions of NT, especially if you are using a development system with NT Workstation and NT Server installed.
Listing 1 shows a typical boot.ini file. You'll notice that some of the lines in Listing 1 wrap to fit the page. These lines do not wrap in the original file, so if you edit boot.ini in Notepad, be careful not to break the long lines, and keep in mind that Notepad's Word Wrap feature makes one line look like several.
The boot.ini timeout parameter controls how long the user has to decide which OS to boot. The default is 30 seconds, but I prefer 5 or 10 seconds. If you set a timeout of 0 seconds, the user cannot choose an OS, and the system starts with the default OS. You might want to set a timeout of 0 seconds if the computer dual-boots to NT or DOS but you do not want users to know about the dual boot, or you do not want to let them boot to DOS. Setting a timeout value of -1 causes the computer to wait for the user to make a choice, rather than using the default OS after the time delay expires.
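The three cases for the timeout value can be summarized in a few lines of Python. This is only a restatement of the behavior described above, not code that NT runs. The function name is hypothetical, and rejecting negative values other than -1 is my own assumption, because the article describes only -1, 0, and positive values.

```python
def describe_timeout(seconds):
    """Summarize how the startup menu behaves for a boot.ini timeout value."""
    if seconds == -1:
        return "wait until the user chooses an OS"
    if seconds == 0:
        return "start the default OS immediately; no choice is offered"
    if seconds > 0:
        return f"show the menu for {seconds} seconds, then start the default OS"
    # Assumption: other negative values are treated as invalid in this sketch.
    raise ValueError(f"unsupported timeout value: {seconds}")
```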
The second entry in the [boot loader] section of boot.ini names the default OS. For the system to boot, the default OS listed must be one of the OSs in the list of possible choices. If the list does not include the default OS, the user must choose a new default OS.
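To check that the default entry matches one of the listed choices, you can parse the file with a few lines of Python. The sketch below is a minimal example under stated assumptions: the sample text is an illustrative boot.ini of the usual shape (it is not a reproduction of Listing 1), and the function names are hypothetical. The parser keeps entries as a list of pairs because two menu lines often share the same path and differ only in their switches.

```python
# Illustrative boot.ini contents; an assumption, not the article's Listing 1.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00 [VGA mode]" /basevideo /sos
C:\="MS-DOS"
"""


def parse_boot_ini(text):
    """Return {section name: [(key, value), ...]} for boot.ini-style text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            sections[current] = []
        elif current is not None and "=" in line:
            # Split on the first "=" only; menu text and switches stay in the value.
            key, value = line.split("=", 1)
            sections[current].append((key.strip(), value.strip()))
    return sections


def default_is_listed(sections):
    """Return True if the default path appears among the OS choices."""
    loader = dict(sections.get("boot loader", []))
    default = loader.get("default", "").lower()
    choices = {key.lower() for key, _ in sections.get("operating systems", [])}
    return default in choices
```

Calling default_is_listed(parse_boot_ini(SAMPLE_BOOT_INI)) returns True for the sample. Changing the default line to point at partition(2) makes it return False.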
The boot.ini file lists various other OS choices. The file uses the Advanced RISC Computing (ARC) convention to designate where the OS is located, and on which disk.
The first entry in the default OS line is multi(number) or scsi(number). The scsi option is only for SCSI controllers without an enabled BIOS. The multi option is for all other situations, including SCSI controllers with an enabled BIOS. The number (n) identifies the controller number, because a computer might have multiple disk controllers. Numbering starts at 0, so multi(0) designates the first controller.
The next setting, disk(n), is significant only for the SCSI option. This setting tells you which disk on the controller contains the OS files. Numbering starts at 0.
The rdisk(n) setting shows which disk on the multi controller contains the OS files. Numbering starts at 0.
The partition(n) setting determines which partition holds the OS. Partition numbering starts at 1 rather than 0.
The last setting specifies the directory path for the OS files. If the NT files are on the first disk on the first controller and reside on the first partition, the ARC path is multi(0)disk(0)rdisk(0)partition(1)\Winnt.
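The ARC fields described above can be pulled apart with a regular expression. The following Python sketch is a minimal parser for paths of the form shown in this article. The function name and the returned dictionary layout are hypothetical, and the sketch does not attempt to cover other ARC forms.

```python
import re

# One group per ARC field, in the order the article describes them.
ARC_PATTERN = re.compile(
    r"^(?P<adapter>multi|scsi)\((?P<controller>\d+)\)"
    r"disk\((?P<disk>\d+)\)"
    r"rdisk\((?P<rdisk>\d+)\)"
    r"partition\((?P<partition>\d+)\)"
    r"(?P<directory>\\.*)?$",
    re.IGNORECASE,
)


def parse_arc_path(path):
    """Split an ARC path such as multi(0)disk(0)rdisk(0)partition(1)\\Winnt."""
    match = ARC_PATTERN.match(path.strip())
    if match is None:
        raise ValueError(f"not a recognized ARC path: {path!r}")
    fields = match.groupdict()
    result = {
        "adapter": fields["adapter"].lower(),
        "controller": int(fields["controller"]),
        "disk": int(fields["disk"]),
        "rdisk": int(fields["rdisk"]),
        "partition": int(fields["partition"]),
        "directory": fields["directory"] or "",
    }
    # Per the article, partition numbering starts at 1 rather than 0.
    if result["partition"] < 1:
        raise ValueError("partition numbering starts at 1")
    return result
```

For the default path above, the parser reports adapter multi, controller 0, disk 0, rdisk 0, partition 1, and directory \Winnt.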
If you accidentally change your boot.ini file to point to the wrong partition, NT cannot start. This situation might occur if you copy a boot.ini file from another computer, where NT is on a different drive or partition. Alternatively, you might add a new primary partition to the disk and thus change the partition numbering. You can edit boot.ini to fix this type of problem. Before you edit the file, clear its attributes, because boot.ini is a hidden, read-only system file.
Even if your boot.ini file is missing, NT can start. If NT is on the first controller, first drive, and first partition in the \Winnt directory (i.e., the default path), the OS can start without boot.ini.
After you clear up any boot.ini problems, the boot process can continue. If your system can boot NT or a DOS-based OS such as Win98 or Win95, you must decide which OS to boot to.
If you boot to NT, the next file that runs is ntdetect.com. This program surveys your system's hardware and builds the HKEY_LOCAL_MACHINE\HARDWARE hive in the Registry. If the file is missing or damaged, you receive a confusing error message: NTDETECT V4.0 CHECKING HARDWARE. You can use a DOS or NT boot disk to boot, and then copy ntdetect.com to the hard disk.
If you choose the DOS boot option, the required file is bootsect.dos. This file contains information about where NT relocated the old DOS boot sector of the disk when you installed NT. This location is different for every system, so you cannot copy bootsect.dos from another computer. If you do not have the file on a 3.5" disk or backup tape, you might be out of luck. To fix the problem, try restoring the DOS boot sector. Use a 3.5" DOS system disk to boot the computer. Then, use the SYS C: command (for the C drive) to restore the DOS boot sector. After you restore the DOS boot sector, the system will boot only to DOS. To restore the system's ability to boot to other OSs, you can use the three boot disks that came with NT, or the ERD, to restore the boot sector pointers to ntldr and write the DOS boot sector information to a new bootsect.dos file (which you'll want to back up).
Problems with the SCSI Device
The file ntbootdd.sys is the device driver for your SCSI controller. (If you use IDE, you do not need this file.) If the SCSI disk is device 0 or 1, you also do not need this file, because the BIOS on the SCSI card lets the boot process access the disk. If the disk has another device number, the system does not use the BIOS to boot, and it needs the device driver. You can copy ntbootdd.sys from another computer with the same SCSI card, or you can copy the driver from the NT CD-ROM and rename it if you know which file your SCSI card uses.
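You can tie this requirement back to boot.ini: an ARC path that begins with scsi(n) is the form used for SCSI controllers without an enabled BIOS, which is the case in which the system depends on ntbootdd.sys. The Python sketch below only inspects the path prefix. The function name is hypothetical, and the check is a rule of thumb based on the ARC convention rather than a test of the hardware.

```python
def needs_ntbootdd(arc_path):
    """Return True if the ARC path uses the scsi(n) form, which relies on ntbootdd.sys."""
    return arc_path.strip().lower().startswith("scsi(")
```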
Here Comes the Blue Screen
At this point in the NT boot process, the device drivers are loaded (as you watch the dots moving across the top of the screen). Then the screen turns blue, and the NT kernel load process starts. If the kernel files are missing or damaged, you receive the error message that Screen 1 shows. (If you wonder how I captured this screen before the OS loaded, I cheated and re-created the screen.)
In this case, the kernel files might be missing (e.g., on a multiple-boot system in which someone booted to DOS and deleted the NT directory). However, a more likely situation is that the boot.ini file is missing or is pointing to the wrong place. Check the boot.ini file, and make sure the NT files are still on the disk. If they are missing, you must restore NT from a backup tape.
You might wonder how you can restore the files if you run your backups under NT but NT will not boot. One option is to perform a quick, basic installation of NT in a different directory. Use this version to restore the original version of NT. Make sure you modify or restore the boot.ini file to point to the correct directory. Then you can delete the temporary installation or save it as an emergency copy. Some administrators like to plan ahead and install a backup copy of NT Server or NT Workstation on their server for this type of situation.
I have given you only an overview of the NT boot process. However, a basic knowledge of the required files and the roles they play will help you troubleshoot problems. (For more in-depth information about the NT boot process, see Mark Russinovich, "Inside the Boot Process, Part 1," November 1998.)