Vol. 7: December 1998
Greetings once again!
Those of you who are running Windows NT have
most likely encountered the BSOD, or Blue Screen of Death. This is the blue
screen with all of that cryptic writing on it you see after a crash or system
lockup. Much information can be gleaned from that screen, and there are many
things to consider when debugging the cause of these screens.
So,
before we begin:
First, I will skim over what you can learn from the actual BSOD, and then
go over what you can do to help avoid it and track down the possible causes.
Near the top will be listed a program/driver name. The actual STOP event (IRQL
not less, page fault, divide by zero etc.) is often not as important as the
programs running at the time. If you see something like NTOSKRNL.EXE, that is
the kernel itself and likely indicates a corrupt system file and/or a hardware
failure, possibly an intermittent one. On the lower half of the screen, look to
the right side. These are other program instances; see if any are recognizable,
and make a note of them. Several may well have the same name. In the middle are
many others but they have a lower likelihood of causing the problem in my
experience. If a certain driver for a device of yours comes up on every BSOD,
you may have nailed the culprit. Only drivers (and parts of NT itself) run in
kernel mode, that is, with enough privilege to crash the machine. Below I will
post more advanced info from Microsoft:
One of the more frequent trap codes generated by Windows NT is STOP
0x0000000A. This STOP message can be caused by both hardware and software
problems. To determine the specific cause, you must debug the STOP. However,
some general information can be learned by examining the parameters of the STOP
message and the STOP screen information.
MORE INFORMATION
================
STOP 0x0000000A indicates a kernel mode process or driver attempted
to access a memory address that it did not have permission to access. The most
common cause of this error is a bad or corrupt pointer that references an
incorrect location in memory. A pointer is a variable used by a program to refer
to a block of memory. If the variable has a bad value in it, then the program
tries to access memory that it should not. When this occurs in a user mode
application, it generates an access violation. When it occurs in kernel mode, it
generates a STOP 0x0000000A message. To determine what process or driver tried
to access memory it should not, look at the parameters displayed on the STOP
screen information. For example, in the following STOP message:

STOP 0x0000000A(0xWWWWWWWW, 0xXXXXXXXX, 0xYYYYYYYY, 0xZZZZZZZZ)
IRQL_NOT_LESS_OR_EQUAL
** Address 0xZZZZZZZZ has base at (address) - (driver)

The four parameters inside the parentheses have the following meanings:

0xWWWWWWWW   Address that was referenced improperly
0xXXXXXXXX   IRQL that was required to access the memory
0xYYYYYYYY   Type of access, 0=Read, 1=Write
0xZZZZZZZZ   Address of instruction which attempted to reference
             the memory at 0xWWWWWWWW

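If you copy the STOP line down by hand, a few lines of script can split it into the four fields described above. This is only a rough sketch in Python, not anything from Microsoft: the function name parse_stop_0a and the sample values in the usage note are made up for illustration, and it assumes the line looks like the example format shown above.

```python
import re

# Matches a line such as:
#   STOP 0x0000000A(0x00000010, 0x00000002, 0x00000001, 0x80123456)
# An optional colon after STOP is tolerated.
STOP_RE = re.compile(
    r"STOP:?\s*(0x[0-9A-Fa-f]+)\s*\(\s*"
    r"(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,\s*"
    r"(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*\)"
)

# Meaning of the third parameter, per the table above.
ACCESS_TYPES = {0: "read", 1: "write"}


def parse_stop_0a(line):
    """Return the four STOP 0x0000000A parameters as a dict.

    Returns None if the line is not a STOP 0x0000000A message.
    (Hypothetical helper, written for this article only.)
    """
    match = STOP_RE.search(line)
    if match is None:
        return None
    code, address, irql, access, instruction = (
        int(group, 16) for group in match.groups()
    )
    if code != 0x0A:
        return None
    return {
        "referenced_address": address,       # 0xWWWWWWWW
        "irql": irql,                        # 0xXXXXXXXX
        "access": ACCESS_TYPES.get(access, "unknown"),  # 0xYYYYYYYY
        "instruction_address": instruction,  # 0xZZZZZZZZ
    }
```

For instance, feeding it the invented line "STOP 0x0000000A(0x00000010, 0x00000002, 0x00000001, 0x80123456)" gives back a write access to address 0x10 by the instruction at 0x80123456.
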
If the last parameter (0xZZZZZZZZ) falls within the address range of one of the
device drivers loaded on the system, you will know which device driver was
running when the memory access occurred. This driver is often identified in the
third line of the STOP screen:

** Address 0xZZZZZZZZ has base at (address) - (driver name)

If (driver name) is a specific driver, search in the Microsoft Knowledge Base
on the keyword 0x0000000A and the driver name.
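When the STOP screen does not name the driver for you, you can make the same match by hand from a list of base addresses. The sketch below, again in Python, assumes you have written down (base address, name) pairs for the loaded modules; the function name guess_driver and every address and driver name in the usage note are hypothetical. Since it only knows where each module starts and not how big it is, it simply picks the module with the closest base at or below the address, so treat the answer as a best guess.

```python
from bisect import bisect_right


def guess_driver(address, modules):
    """Guess which module contains `address`.

    `modules` is a list of (base_address, name) pairs. Module sizes are
    not known here, so the module with the highest base address that is
    still <= `address` is returned. Returns None if the address lies
    below every base. (Hypothetical helper, for illustration only.)
    """
    ordered = sorted(modules)
    bases = [base for base, _name in ordered]
    index = bisect_right(bases, address)
    if index == 0:
        return None
    return ordered[index - 1][1]
```

With an invented module list of ntoskrnl.exe at 0x80100000, hal.dll at 0x80400000 and exampledrv.sys at 0xF9000000, an instruction address of 0xF9001234 would point at exampledrv.sys.
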
Many things can contribute to a BSOD. Chief among them are incorrect BIOS
or jumper settings, followed by overheating or failing hardware. Although
software can cause them, it generally needs to be running in kernel mode to
produce a BSOD, which means either a hardware driver or part of NT itself.
Since a BSOD in a version of NT itself is fairly rare, the first thing you
should do if you still can't find the cause is:
This list was far from comprehensive, and I hope to post more on this
subject at a later time, but this should give you insight into some possible
causes of BSODs.
Until next time!