I'll never forget the first time I encountered the infamous Blue Screen of Death (BSOD). Although my experience
with Windows was very limited at the time, I was absolutely positive of two things:
- Something bad had happened.
- All of those hieroglyphics on the screen contained useful information, if I could just figure out how to read them.
In retrospect, both of those assumptions turned out to be true. A Blue Screen of Death is Windows' way of telling you that an unrecoverable kernel-mode error has occurred. A number of different things can cause a BSOD error, so the screen gives you information designed to help you figure out what went wrong. In this article, I will show you how to make sense of these rather cryptic screens.
BSOD has changed over time
Before I get started, I want to point out that the BSOD has evolved over the years. In Figure A, you can see a Windows NT style BSOD that I captured back in 1999. Figure B shows a Windows XP BSOD. Although there are some major differences between those two screens, there are also some similarities.
For the purposes of this article series, I will be examining the Windows XP style BSOD (which is also used in Windows Server 2003). If you happen to be running another version of Windows, you can still use the information in this series of articles to help figure out what's going on with your system, but you won't be able to follow along step by step.
The anatomy of a stop message
The text that is displayed on the BSOD is known in Microsoft circles as a stop message. The stop message is broken into four parts, each of which has its own purpose: bug check information, recommended user action, driver information, and debug port and dump status information. Figure C shows the same stop message that's displayed in Figure B, but also shows the stop message's various parts.
- The Bug Check Information section
The Bug Check Information is made up of a stop error number and four additional parameters that are listed in parentheses immediately following the stop error number.
Here is where I need to stop for a quick reality check. Each of these five numbers has its own significance, as does everything else on the screen. If you happen to be a software developer, and a software component that you created is causing the stop error, then each of these five numbers is going to be critically important to you. From an administrator's standpoint, though, the four numbers found in parentheses are almost always unimportant.
Over the years, I have fixed more blue screen errors than I care to think about, and only on extremely rare occasions has the diagnostic process required me to look at the individual parameters. Typically, knowing the stop error code is sufficient. I will be talking about the stop error code in depth later in this series.
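If you have copied the bug check line down from a crashed machine, a short script can split it into the stop error number and its four parameters. The following is only a minimal sketch: the sample line is a made-up illustration rather than the one shown in the figures, and the names `parse_stop_line` and `BugCheck` are hypothetical helpers of my own choosing.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches a line shaped like:
#   *** STOP: 0x000000D1 (0x00000000,0x00000002,0x00000000,0xF7A5B2C3)
# i.e. a stop error number followed by four parameters in parentheses.
STOP_LINE = re.compile(
    r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(\s*"
    r"(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,\s*"
    r"(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*\)"
)


class BugCheck(NamedTuple):
    """Hypothetical container for the five numbers in the bug check line."""
    stop_code: int
    parameters: Tuple[int, ...]


def parse_stop_line(line: str) -> Optional[BugCheck]:
    """Return the stop code and its four parameters, or None if no match."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    # int(..., 16) accepts the leading "0x" prefix.
    values = [int(group, 16) for group in match.groups()]
    return BugCheck(stop_code=values[0], parameters=tuple(values[1:]))


if __name__ == "__main__":
    # Illustrative sample only; not taken from a real crash.
    sample = "*** STOP: 0x000000D1 (0x00000000,0x00000002,0x00000000,0xF7A5B2C3)"
    bug_check = parse_stop_line(sample)
    if bug_check is not None:
        print(f"Stop code: 0x{bug_check.stop_code:08X}")
        print("Parameters:", [f"0x{p:08X}" for p in bug_check.parameters])
```

For most administrative troubleshooting, only the `stop_code` field matters; the parameters are kept in case a developer needs them.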
- The Recommended User Action section
The second part of the stop message is the Recommended User Action. Figure C shows that the recommended user action is usually a generic message that tells you to try disabling or removing whatever hardware or software was recently installed. While this is good advice, it won't always fix the problem.
By far the most important part of the recommended user action is the very first line. In Figure C, this is the line that reads:
This line directly corresponds to the stop error number. Using this bit of text in conjunction with the stop error number gives you a lot of insight into the problem.
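To illustrate how a stop error number pairs with a line of descriptive text, here is a small lookup table sketched in Python. It covers only a handful of well-known stop codes and their symbolic names from Microsoft's bug check documentation; it is nowhere near a complete list, and the function name `describe_stop_code` is a hypothetical helper.

```python
# A few well-known stop codes and their symbolic names.
# This table is deliberately tiny and is NOT a complete list.
KNOWN_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x0000008E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe_stop_code(code: int) -> str:
    """Format a stop code alongside its symbolic name, if this table has one."""
    name = KNOWN_STOP_CODES.get(code, "unknown stop code (not in this table)")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    print(describe_stop_code(0x000000D1))
    print(describe_stop_code(0x00000050))
```

Searching for the stop number and the symbolic name together tends to turn up far more relevant results than searching for either one alone.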
- The Driver Information section
The Driver Information section provides the third important piece of information. It tells you which file triggered the stop error. By looking at the driver listed in this section and the information provided in the Bug Check Information section and the Recommended User Action section, you can usually gain a fairly clear picture of what happened.
- The Debug Port and Dump Status Information section
The Debug Port and Dump Status Information section tells you two main things. The first is which COM port is being used by the debugger and at what speed the COM port is running. You can usually ignore this bit of information. In the old days, you could connect a serial cable between a functional machine and a machine that had crashed, and use a debugger on the functional machine to figure out what had happened to the machine that had crashed. Today, though, most computers are no longer equipped with serial ports, so this information is rarely relevant.
The other thing this section tells you is that a dump file was created. Essentially this means that the contents of the system's memory, or a portion of them depending on how the system is configured, were written to a file and placed on the hard drive. Some administrators like to use this file as a tool for debugging the problem. But as I mentioned earlier, it is usually possible to fix the problem without delving into that level of complexity.
Memory dumps can come in a few different forms. You can use registry settings to control whether Windows performs a complete memory dump, a kernel memory dump or a small memory dump. In addition, there is a setting you can use to control whether or not the dump file is overwritten should a subsequent crash occur. I will discuss the dump file and the various configuration options in a lot more detail later on in this series.
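As a rough preview of those registry settings, the sketch below decodes the two values involved. It assumes the standard `CrashControl` key under `HKEY_LOCAL_MACHINE` and its `CrashDumpEnabled` and `Overwrite` values; the function names are hypothetical, and the registry read only works on Windows, so treat this as an illustration rather than a finished tool.

```python
import sys

# Registry location of the crash dump settings (under HKEY_LOCAL_MACHINE).
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meaning of the CrashDumpEnabled value on Windows XP / Server 2003.
DUMP_TYPES = {
    0: "no memory dump",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def describe_dump_settings(crash_dump_enabled: int, overwrite: int) -> str:
    """Turn the two raw registry values into a readable summary."""
    dump_type = DUMP_TYPES.get(
        crash_dump_enabled, f"unrecognized value {crash_dump_enabled}"
    )
    action = "overwrites" if overwrite else "keeps"
    return f"{dump_type}; a new crash {action} the existing dump file"


def read_dump_settings() -> str:
    """Read the live settings from the registry (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # imported here so the module still loads elsewhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        enabled, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        overwrite, _ = winreg.QueryValueEx(key, "Overwrite")
    return describe_dump_settings(enabled, overwrite)


if __name__ == "__main__":
    # Decode example values without touching the registry.
    print(describe_dump_settings(2, 1))
```

The same settings are also exposed through the Startup and Recovery dialog in the System control panel, which is the safer place to change them.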
Some final advice
I will delve much deeper into the information that is within the stop message in part two and beyond in this series of articles.
I realize, however, that some of you may need to correct a stop error now, and may not have time to wait for me to write part two. That being the case, here is a final piece of advice based on my own experience:
In most cases, stop errors occur immediately after installing a piece of hardware or software or changing some aspect of the system's configuration. If you notice this type of cause and effect pattern, you can usually boot the system into Safe Mode, and then correct whatever action it was that caused the problem (or remove the new hardware).
If the problem starts happening for no apparent reason, look for these two things: file corruption and memory problems. Try reinstalling the latest Windows service pack (to refresh the system files) and download the latest versions of all of the device drivers that are used by the system. If that doesn't work, then try removing the computer's memory and replacing it with known good memory. Nine times out of ten this will fix the problem.
Brien M. Posey, MCSE, has received Microsoft's Most Valuable Professional Award four times for his work with Windows Server, IIS and Exchange Server. He has served as CIO for a nationwide chain of hospitals and healthcare facilities, and was once a network administrator for Fort Knox. You can visit his personal website at www.brienposey.com.