
More on Troubleshooting NT Boot Failures

Last month, I started explaining the Windows NT boot process, with an eye to diagnosing and repairing failures. I explained that the system starts with a Power On Self Test, loads the first hard-disk sector (the Master Boot Record--MBR), and (using information in the MBR) finds and loads the first sector of the C drive. That first sector, the boot record, loads the NTLDR program, which must be in the root of the C drive. NTLDR reads and interprets the boot.ini file and displays the operating system picker (Windows NT Server Version 4.00, Windows NT Server Version 4.00 [VGA mode], and whatever other operating systems you have on your computer). After you choose NT, NTLDR loads and executes NTDETECT, a program that sniffs out what hardware you have in your system and passes that information to NTLDR, which uses it to create part of the Registry, HKEY_LOCAL_MACHINE\Hardware.
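To make the boot.ini step concrete, here is a minimal Python sketch that parses a boot.ini and lists the entries the operating system picker would show. The sample file contents and the function names parse_boot_ini and picker_entries are illustrative assumptions; this is not how NTLDR itself is written. The /basevideo and /sos switches on the VGA entry are the ones NT 4.0 Setup typically writes.

```python
# Hypothetical boot.ini contents for a one-disk system; adjust for your machine.
SAMPLE_BOOT_INI = r'''[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00 [VGA mode]" /basevideo /sos
C:\="MS-DOS"
'''


def parse_boot_ini(text):
    """Return {section name: [(key, value), ...]} for a boot.ini-style file."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].lower()
            sections[current] = []
            continue
        if current is None or "=" not in line:
            continue
        key, value = line.split("=", 1)
        sections[current].append((key.strip(), value.strip()))
    return sections


def picker_entries(sections):
    """Turn the [operating systems] lines into picker entries."""
    entries = []
    for path, rest in sections.get("operating systems", []):
        if rest.startswith('"') and '"' in rest[1:]:
            # Quoted text is the menu label; whatever follows is switches.
            label, _, switches = rest[1:].partition('"')
        else:
            label, switches = rest, ""
        entries.append({"path": path, "label": label,
                        "switches": switches.split()})
    return entries


if __name__ == "__main__":
    for entry in picker_entries(parse_boot_ini(SAMPLE_BOOT_INI)):
        print(entry["label"], "->", entry["path"], entry["switches"])
```

Each entry pairs a path (where the NT files live) with the label shown on the picker, which is the link between the menu you see and the files NTLDR goes looking for next.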

Loading the Operating System
The entries in boot.ini tell NTLDR where to find the NT files. Then NTLDR loads three files to start NT. The first is ntoskrnl.exe, the basic NT kernel and most of the NT Executive. Before NTOSKRNL can do anything, it must be able to communicate with the basic computer system, so NTLDR loads hal.dll, the Hardware Abstraction Layer; think of HAL as the motherboard driver. Both files are in the \winnt\system32 directory.

The third file that NTLDR loads is sort of a config.sys for NT, a file that describes both the computer's hardware and the services running on that computer. This hive file, called SYSTEM, is part of the Registry and is in \winnt\system32\config. (Microsoft calls the files that the Registry lives in hives. Why? Good question. A Microsoft person once told me it was a leftover from UNIX; UNIX folks I know just give me a blank look when I ask them about it.) SYSTEM holds the entire key HKEY_LOCAL_MACHINE\SYSTEM, which contains control sets. A control set is a list of the drivers and system services NT needs to load, along with configuration information for those drivers.

You'll see three control sets in your HKEY_LOCAL_MACHINE\SYSTEM. One is the CurrentControlSet; it contains a description of your current configuration, and any changes that you've made to your configuration today. Another control set is a copy of your current control set as of the last time you successfully started your system; this copy is the control set for Last Known Good Configuration. The third control set assumes that you're running the VGA video driver--that's what you get when you choose the [VGA mode] option on the operating system picker.

Loading a Configuration
At this point, the base operating system (kernel), the driver (HAL) that handles motherboard and multiprocessor peculiarities, and the descriptions (the SYSTEM hive) of possible configurations are loaded. But which configuration to load? By default, CurrentControlSet will load. But NT lets you choose between CurrentControlSet and the Last Known Good configuration. NT begins analyzing the CurrentControlSet at about the same time that the Last Known Good message comes up on your screen, but you can interrupt the process by pressing the spacebar. NT then loads the LastKnownGood control set instead.
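The choice between the two configurations can be sketched in a few lines of Python. One piece of background that this article does not cover: the Registry records which numbered control set plays which role in HKEY_LOCAL_MACHINE\SYSTEM\Select, under the values Current, Default, Failed, and LastKnownGood. The numbers in SAMPLE_SELECT and the function name control_set_name are hypothetical, chosen only for illustration.

```python
# Hypothetical contents of HKEY_LOCAL_MACHINE\SYSTEM\Select on one machine.
SAMPLE_SELECT = {"Current": 1, "Default": 1, "Failed": 0, "LastKnownGood": 2}


def control_set_name(select, spacebar_pressed):
    """Pick the numbered control set to boot from.

    A normal boot uses the Default value; choosing Last Known Good
    (the spacebar) uses the LastKnownGood value instead.
    """
    role = "LastKnownGood" if spacebar_pressed else "Default"
    return "ControlSet%03d" % select[role]


if __name__ == "__main__":
    print(control_set_name(SAMPLE_SELECT, spacebar_pressed=False))
    print(control_set_name(SAMPLE_SELECT, spacebar_pressed=True))
```

With these sample numbers, a normal boot uses ControlSet001 and a Last Known Good boot uses ControlSet002.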

The main branch inside a control set is the Services branch. In Screen 1, I've opened my HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services key. On the left panel, you see names such as Aha154x, atapi, and Atdisk. Each name refers to an NT driver--Aha154x refers to some 16-bit Adaptec SCSI host adapters, atapi is the generic driver for EIDE drives, and Atdisk is the generic driver for IDE drives. I compared this listing to config.sys under DOS. config.sys includes commands such as DEVICE=ASPI4DOS.SYS, which loads drivers, but a very important and fundamental difference is that config.sys names the drivers that you want to load. NT's Services key, in contrast, names every driver it's ever heard of, regardless of what driver you intend to load. The computer that I took that screen shot from does not have an IDE drive, an EIDE drive, or an Adaptec 154x SCSI host adapter. But how does NT know not to load the drivers for those devices? Look in the right pane. See the Start entry? The value is equal to 4, which in NT-ese means, "You never need to load this driver." Oddly enough, the NT Services key is organized so that it names every driver known to Microsoft, but the vast majority of them have Start values of 4.
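A short Python sketch shows how the Start value sorts the Services key into "load this" and "never load this." The Start codes themselves (0 = boot, 1 = system, 2 = automatic, 3 = manual, 4 = disabled) are the standard NT values; the sample service list and the function name group_by_start are hypothetical and stand in for a real Services key.

```python
# Standard meanings of the Start value under a Services subkey.
START_TYPES = {
    0: "boot",       # loaded by NTLDR
    1: "system",     # loaded by NTOSKRNL during kernel initialization
    2: "automatic",  # started once the system is up
    3: "manual",     # started on demand
    4: "disabled",   # never loaded
}

# Hypothetical Services key for a SCSI-only machine: driver name -> Start value.
SAMPLE_SERVICES = {
    "Aha154x": 4,
    "atapi": 4,
    "Atdisk": 4,
    "aic78xx": 0,
    "Mouclass": 1,
}


def group_by_start(services):
    """Group driver names by the meaning of their Start value."""
    groups = {name: [] for name in START_TYPES.values()}
    for driver, start in sorted(services.items(), key=lambda kv: kv[0].lower()):
        groups[START_TYPES[start]].append(driver)
    return groups


if __name__ == "__main__":
    for kind, drivers in group_by_start(SAMPLE_SERVICES).items():
        print("%-9s %s" % (kind, ", ".join(drivers) or "-"))
```

In this made-up listing, the three disk drivers the machine has no hardware for all land in the disabled group, which is exactly why NT never tries to load them.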

Loading Drivers
NT loads drivers in three passes. The first set is a small group of low-level drivers essential to getting the basic text-mode portion of the kernel going. NTLDR looks through the Services key to find those drivers (e.g., disk drivers). Even the video drivers don't load yet. When NTLDR finds a driver with a 0 Start value, it loads that driver. You see a black screen with "OS Loader Vx.xx," and as NTLDR loads a driver, it places a period on the screen, giving you a small measure of boot progress. NTLDR loads the drivers; it doesn't start them.

Once the drivers are loaded, NTLDR's job is done. NTLDR passes control over to NTOSKRNL, which is already loaded into memory but has been inactive. You know NTOSKRNL has woken up when the screen turns blue--the normal bootup blue, not the something-bad-happened-and-you-no-longer-have-a-server blue--and the message "Microsoft (R) Windows NT (TM) Version x.x (Build xxxx)" appears. As NTOSKRNL initializes, it reports the version number, the amount of memory, and the number of processors in the system. After initializing itself, NTOSKRNL initializes the drivers that NTLDR loaded into memory.


This point is where the first bunch of blue screens--the bad type--can appear. Buggy drivers often reveal themselves when they initialize. The error blue screen I see most often is INACCESSIBLE_BOOT_DEVICE, which is caused by either a buggy SCSI host adapter driver or a boot sector virus. The second common error is IRQL_NOT_LESS_OR_EQUAL, which is generated when a driver attempts to access memory that it's not allowed to access. Although this error is not welcome, it does generate some useful information. On an error blue screen, you see a message similar to *** STOP: 0x0000000a (0x0000004c, 0x00000002, 0x00000001, 0x803214d2). The first and last numbers in the parentheses are good guides to what went wrong. The first number is the memory address that the buggy driver tried to access; that access attempt is what caused the blue screen. The last number is the address of the instruction that made the illegal access--the address of the culprit. In the lines following the STOP error message, you see a listing of driver names, file dates, and start addresses. Compare the start addresses with that last number to pinpoint which driver (or program of any kind) contains the instruction that attempted the illegal access.
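The address matching described above can be sketched in Python. The STOP line is the one quoted in the text; the driver list in SAMPLE_MODULES (names and start addresses) is hypothetical, and the function names parse_stop and find_module are mine. Because the blue screen gives start addresses but not sizes, the sketch simply picks the module with the highest start address at or below the faulting instruction, which is a reasonable guess rather than a guarantee.

```python
import re

# Matches "*** STOP: 0x0000000a (p1, p2, p3, p4)", upper or lower case.
STOP_RE = re.compile(r"stop:\s*(0x[0-9a-f]+)\s*\(([^)]*)\)", re.IGNORECASE)


def parse_stop(line):
    """Return (stop_code, [parameters]) as integers."""
    match = STOP_RE.search(line)
    if match is None:
        raise ValueError("not a STOP line: %r" % line)
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def find_module(address, modules):
    """Guess which module contains address, given (start_address, name) pairs."""
    best = None
    for start, name in modules:
        if start <= address and (best is None or start > best[0]):
            best = (start, name)
    return best[1] if best else None


# Hypothetical driver listing copied from below the STOP message.
SAMPLE_MODULES = [
    (0x80100000, "ntoskrnl.exe"),
    (0x80010000, "hal.dll"),
    (0x80320000, "aic78xx.sys"),
]

if __name__ == "__main__":
    line = "*** STOP: 0x0000000a (0x0000004c, 0x00000002, 0x00000001, 0x803214d2)"
    code, params = parse_stop(line)
    print("stop code      : 0x%08x" % code)
    print("address touched: 0x%08x" % params[0])
    print("culprit address: 0x%08x" % params[-1])
    print("likely module  :", find_module(params[-1], SAMPLE_MODULES))
```

With this made-up listing, the culprit address 0x803214d2 falls just above the start of aic78xx.sys, so that driver is the first suspect.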

Often this blue screen comes from a new driver--knowing that can come in handy for quick fixes. Suppose you find a new and improved version of your AIC78XX.SYS driver, the driver that controls the Adaptec 2940 SCSI host adapter in your computer, and load it, only to get a blue screen. What do you do?

Rather than pull out some kind of system repair disk, all you need to know is that most NT drivers are stored in \winnt\system32\drivers. You boot your system in DOS and grab the old AIC78XX.SYS driver (the simplest method is to back up the old driver before playing with the new driver, or you can uncompress the original driver off the NT Server CD-ROM). Copy the driver over the newer, buggy driver. (This process assumes, of course, that the drive that you've put NT on is a FAT partition, which I recommend.)

Presuming you don't get an error from initializing the Start=0 drivers, NTOSKRNL scans the Services key again, looking this time for the Start=1 drivers. In general, these are the drivers that the GUI will need, and the foundations of system services such as networking services. For example, the video, mouse, and sound card drivers load here. NTOSKRNL loads them, and as with NTLDR, puts a period on the screen for each driver loaded.

Once the Start=1 drivers are loaded, they initialize one by one. Again, you have the possibility of a blue screen from any of them, but I've never seen a blue screen at this stage. The Start=0 drivers seem to have the greatest system-killing power.
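The two passes described so far can be summarized as a small Python simulation: who loads which drivers, and who initializes them, in what order. The sample Services data and the function name boot_sequence are hypothetical. The sketch also orders drivers alphabetically within a pass, which is a simplification; a real system orders them by other Registry information that this article doesn't cover.

```python
# Hypothetical Services key: driver name -> Start value.
SAMPLE_SERVICES = {
    "aic78xx": 0,   # SCSI host adapter driver
    "Disk": 0,
    "Mouclass": 1,
    "VgaSave": 1,
    "Atdisk": 4,    # disabled, so it never appears in the sequence
}


def boot_sequence(services):
    """Return the ordered list of (actor, action, driver) boot events."""
    def with_start(value):
        names = [n for n, start in services.items() if start == value]
        return sorted(names, key=str.lower)

    boot, system = with_start(0), with_start(1)
    events = []
    events += [("NTLDR", "load", name) for name in boot]       # pass 1: load
    events += [("NTOSKRNL", "init", name) for name in boot]    # kernel takes over
    events += [("NTOSKRNL", "load", name) for name in system]  # pass 2: load
    events += [("NTOSKRNL", "init", name) for name in system]  # then initialize
    return events


if __name__ == "__main__":
    for actor, action, driver in boot_sequence(SAMPLE_SERVICES):
        print("%-8s %-4s %s" % (actor, action, driver))
```

The point of the sketch is the ordering: NTLDR only loads the Start=0 drivers, and every initialization, which is where the blue screens come from, happens under NTOSKRNL.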

In the Clear
From there, the GUI loads, you log on, and you start a new day. From this point on, you have to start loading applications to crash your system!
