Troubleshooting the System Board
The microprocessor, RAM modules, ROM BIOS, and CMOS battery are typically replaceable units on the system board. If enough of the system is running to perform tests on these units, you can replace them.
Problems with key system board components produce symptoms similar to those described for a bad power supply. Both the microprocessor and the ROM BIOS can be sources of such problems. You should check both by substitution when dead system symptoms are encountered but the power supply is good.
System Board Symptoms
Typical symptoms associated with system board hardware failures include the following:
The On/Off indicator lights are on and the display is visible on the monitor screen, but there is no disk drive action and no bootup occurs.
The On/Off indicator lights are visible and the hard drive spins up, but the system appears dead and there is no bootup.
The system locks up during normal operation.
The system produces a beep code with one, two, three, five, seven, or nine beeps (BIOS dependent).
The system produces a beep code of one long and three short beeps (BIOS dependent).
The system does not hold the current date and time.
A DMA Error message displays, indicating that the DMA controller failed the page register test.
A CMOS Battery Low message displays, indicating failure of the CMOS battery or the CMOS checksum test.
A CMOS Checksum Failure message displays, indicating that the CMOS battery is low or that the CMOS checksum test failed.
A 201 error code displays, indicating a RAM failure.
A Parity Check error message displays, indicating a RAM error.
Typical symptoms associated with system board CMOS setup failures include the following:
A CMOS Inoperational message displays, indicating failure of the CMOS shutdown register.
A CMOS Memory Size Mismatch message displays, indicating a system configuration and setup failure.
A CMOS Time & Date Not Set message displays, indicating a system configuration and setup failure.
Typical symptoms associated with system board I/O failures include the following:
The speaker doesn't work during operation. The rest of the system works, but no sounds are produced through the speaker.
The keyboard does not function after being replaced with a known-good unit.
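The message-based symptoms in the lists above amount to a lookup from what appears onscreen to a failure category and a suspect component. The following minimal sketch shows that idea in Python; the names `SYMPTOM_TABLE` and `suspect_for` are hypothetical, and the message strings are taken from the lists above, whereas real BIOS wording and beep codes vary by manufacturer and version.

```python
# Hypothetical lookup built from the symptom lists above.
# Keys are message text as displayed; real BIOS wording varies by vendor.
SYMPTOM_TABLE = {
    "DMA Error": ("hardware", "DMA controller (page register test)"),
    "CMOS Battery Low": ("hardware", "CMOS battery or CMOS checksum"),
    "CMOS Checksum Failure": ("hardware", "CMOS battery or CMOS checksum"),
    "201": ("hardware", "RAM"),
    "Parity Check": ("hardware", "RAM"),
    "CMOS Inoperational": ("setup", "CMOS shutdown register"),
    "CMOS Memory Size Mismatch": ("setup", "system configuration and setup"),
    "CMOS Time & Date Not Set": ("setup", "system configuration and setup"),
}


def suspect_for(message: str) -> tuple[str, str]:
    """Return (failure category, suspect) for a displayed message."""
    for key, result in SYMPTOM_TABLE.items():
        # Case-insensitive substring match, so extra text around the
        # message (error numbers, addresses) does not prevent a hit.
        if key.lower() in message.lower():
            return result
    return ("unknown", "check CMOS settings and the BIOS documentation")


print(suspect_for("CMOS Checksum Failure"))
```

A plain dictionary keeps the table easy to extend with the beep codes documented for a specific BIOS.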
Configuration Problems
Configuration problems typically occur when the system is being set up for the first time, or when a new option has been installed. The values stored in CMOS must accurately reflect the configuration of the system; otherwise, an error occurs. Incorrectly set CMOS parameters cause the corresponding hardware to fail. Therefore, check the enabling functions of the advanced CMOS settings as a part of every hardware configuration troubleshooting procedure.
The many configuration options available in a modern BIOS require the user to have a good deal of knowledge about the particular function being configured. When you face serious configuration problems, remember that the CMOS setup utility normally lets you load the default configuration settings.
Typically, if the bootup process reaches the point at which the system's CMOS configuration information is displayed onscreen, you can safely assume that no hardware configuration conflicts exist between the system's basic components. After this point in the bootup process, the system begins loading drivers for optional devices and additional memory.
If errors occur after the CMOS screen has been displayed and before the bootup tone, you must clean boot the system and single-step through the remainder of the bootup sequence to locate the cause of the failure. These techniques are described in detail in Chapter 4, "Operating System Troubleshooting."
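The rule of thumb in the two preceding paragraphs can be expressed as a small decision function keyed on how far the bootup got. This is a sketch only; `suggest_next_step` and its two Boolean inputs are hypothetical, and it covers only the window the text describes.

```python
def suggest_next_step(cmos_screen_shown: bool, bootup_tone_heard: bool) -> str:
    """Map the point where bootup fails to the next troubleshooting step.

    Follows the rule of thumb in the text: once the CMOS configuration
    screen appears, conflicts between the basic components are unlikely.
    """
    if not cmos_screen_shown:
        # Failure before the CMOS screen: basic hardware or its configuration.
        return "check basic hardware and CMOS configuration"
    if not bootup_tone_heard:
        # Failure while drivers for optional devices and extra memory load.
        return "clean boot and single-step through the rest of the bootup"
    # Past the bootup tone: this particular rule no longer applies.
    return "outside the scope of this rule"


for shown, tone in [(False, False), (True, False), (True, True)]:
    print(shown, tone, "->", suggest_next_step(shown, tone))
```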
Microprocessors
In the event of a microprocessor failure, the system might issue a slow single beep from the speaker along with no display or other I/O operation. This indicates that an internal error has disabled a portion of the processor's internal circuitry (usually the internal cache). Internal problems can also allow the microprocessor to begin processing, but then fail as it attempts additional operations. Such a problem results in the system continuously counting RAM during the bootup process. It might also lock up while counting RAM. In either case, the only way to remedy the problem is to replace the microprocessor.
If the system consistently locks up after being on for a few minutes, this is a good indication that the microprocessor's fan is not running or that some other heat buildup problem is occurring. You should also check the microprocessor if its fan has not been running while the power was on. This situation might indicate that the microprocessor has been without adequate ventilation and has overheated. When this happens, you must replace the fan unit and the microprocessor. Verify that the new fan works correctly; otherwise, the replacement microprocessor will also be damaged.
Microprocessor Cooling Systems
Microprocessor-based equipment is designed to provide certain performance levels under specified environmental conditions such as operating temperature. PC systems that use Pentium-class microprocessors are designed to keep the processor's operating temperature in the range of 30 to 40 degrees C.
The ideal operating temperature setting varies between microprocessor types and manufacturers. Also, the location of the corresponding CMOS configuration setting varies between different BIOS makers and versions. Some CMOS setup utilities provide a separate Hardware Health configuration screen, whereas others integrate it into the Power Management screen. Many systems include an additional fan control circuit for use with an optional chassis (case) fan. In these cases, the system board features additional BERG connectors for the chassis temperature sensor and fan control cable.
If temperature-related problems like those described in the preceding section occur, you should access the CMOS Hardware Health configuration screen, similar to the one depicted in Figure 3.7, and check the fan speed and processor temperature readings.
If these readings are outside of the designated range, you can enter a different value for the temperature set point. If no fan speed measurement is being shown, check to see if the fan is actually turning. If not, you should turn the system off as soon as possible, check the operation of the fan, and replace it before the microprocessor is damaged.
Figure 3.7 The CMOS Hardware Health configuration screen.
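The checks described above (is a fan speed being reported, and is the processor temperature inside the designated range) can be sketched as a function over one set of Hardware Health readings. The names `HealthReading` and `evaluate` are hypothetical, the readings are assumed to be copied off the setup screen by hand, and the default range is the 30 to 40 degrees C figure quoted earlier for Pentium-class systems; other processors use different limits.

```python
from dataclasses import dataclass
from typing import Optional

# Range given in the text for Pentium-class systems; other CPUs differ.
DEFAULT_RANGE_C = (30.0, 40.0)


@dataclass
class HealthReading:
    """One set of values read off a Hardware Health screen (hypothetical)."""
    cpu_temp_c: float
    fan_rpm: Optional[int]  # None when no fan speed is shown


def evaluate(reading: HealthReading,
             temp_range: tuple[float, float] = DEFAULT_RANGE_C) -> list[str]:
    """Return recommended actions; an empty list means no problem found."""
    actions = []
    if not reading.fan_rpm:  # None or 0 rpm: the fan may not be turning
        actions.append("check that the fan is turning; if not, power off and replace it")
    low, high = temp_range
    if not low <= reading.cpu_temp_c <= high:
        actions.append("temperature outside range; review set point and airflow")
    return actions


print(evaluate(HealthReading(cpu_temp_c=36.0, fan_rpm=4200)))
print(evaluate(HealthReading(cpu_temp_c=52.0, fan_rpm=None)))
```

The fan check comes first because a stopped fan calls for powering down immediately, whatever the temperature reading says.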
Other alternatives when dealing with thermal problems in a PC include installing an additional chassis fan to help move cooler air through the system unit, changing the microprocessor fan for one that runs faster over a given range of temperatures, and flashing the BIOS to provide different fan control parameters.
Check for missing slot covers that can disrupt airflow in the case and route internal signal cables so that they do not block the flow of air through the case. Likewise, check the case's front cover alignment as well as any upper or side access panels to ensure they are well fitted. If the airflow openings in the front cover are blocked, the system fans cannot properly circulate air through the case.
If the front panel or any of the access doors or covers are not in proper position, they could create alternate airflow paths that disrupt the designed cooling capabilities of the system. In addition to disrupting the designed airflow capabilities of the case, missing or misaligned case panels can permit radio frequency interference (RFI) signals to escape from the case and disrupt the operation of other electronic devices, such as radio receivers and televisions.
RAM
The system board RAM is a serviceable part of the system board. RAM failures fall into two major categories:
Soft-memory errors: Errors caused by infrequent and random glitches in the operation of applications and the system. You can clear these events just by restarting the system.
Hard-memory errors: Permanent physical failures that generate NMI errors in the system and require that the memory units be checked by substitution.
Observe the bootup RAM count on the display to verify that it is correct for the amount of physical RAM actually installed in the system. If not, swap RAM devices around to see whether the count changes. Use logical rotation of the RAM devices to locate the defective part. The burn-in tests in most diagnostic packages can prove helpful in locating borderline RAM modules.
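Burn-in and memory tests work, at their core, by writing known bit patterns to memory and reading them back. The sketch below simulates that idea, together with the RAM count comparison, on an in-memory byte array with an injected stuck bit. `SimulatedRAM`, `pattern_test`, and `count_matches` are hypothetical names, and real diagnostic packages run far more elaborate patterns against physical memory.

```python
from typing import Optional

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits


class SimulatedRAM:
    """Byte array with an optional stuck-at-1 bit, standing in for a module."""

    def __init__(self, size: int, stuck_addr: Optional[int] = None,
                 stuck_bit: int = 0):
        self._cells = bytearray(size)
        self._stuck_addr = stuck_addr
        self._stuck_mask = 1 << stuck_bit

    def write(self, addr: int, value: int) -> None:
        if addr == self._stuck_addr:
            value |= self._stuck_mask  # simulated fault: this bit always reads 1
        self._cells[addr] = value

    def read(self, addr: int) -> int:
        return self._cells[addr]

    def __len__(self) -> int:
        return len(self._cells)


def pattern_test(ram: SimulatedRAM) -> list[int]:
    """Write each pattern to every cell, read it back, return failing addresses."""
    bad = set()
    for pattern in PATTERNS:
        for addr in range(len(ram)):
            ram.write(addr, pattern)
        for addr in range(len(ram)):
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)


def count_matches(reported_mb: int, module_sizes_mb: list[int]) -> bool:
    """Compare the bootup RAM count with the physical RAM installed."""
    return reported_mb == sum(module_sizes_mb)


print(pattern_test(SimulatedRAM(64, stuck_addr=10, stuck_bit=3)))
print(count_matches(384, [256, 256]))
```

Using both all-zeros and all-ones patterns matters: a bit stuck at 1 only shows up when a 0 is written to it, and a bit stuck at 0 only when a 1 is written.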
You can also swap out RAM modules one at a time to isolate defective modules. When swapping RAM into a system for troubleshooting purposes, take care to ensure that the new RAM is of the correct type for the system and that it meets the system's bus speed rating. Also, ensure that the replacement RAM is consistent with the installed RAM. Mixing RAM types and speeds can cause the system to lock up and produce hard-memory errors.
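Swapping modules one at a time is a simple linear search: substitute a known-good module into one slot, test, and move on to the next slot if the fault persists. The sketch below assumes a hypothetical `system_passes` callback that stands in for rebooting and running a memory test; in practice that step is done by hand.

```python
from typing import Callable, Optional


def isolate_bad_module(installed: list[str], known_good: str,
                       system_passes: Callable[[list[str]], bool]) -> Optional[int]:
    """Swap a known-good module into one slot at a time.

    Returns the index of the first slot whose substitution makes the
    system pass, or None if no single swap clears the fault.
    The known-good module must match the installed type and bus speed.
    """
    for slot in range(len(installed)):
        trial = list(installed)       # leave the other modules in place
        trial[slot] = known_good      # substitute only this slot
        if system_passes(trial):
            return slot
    return None


# Simulated system: fails whenever module "B" is installed.
modules = ["A", "B", "C"]
print(isolate_bad_module(modules, "KNOWN_GOOD", lambda mods: "B" not in mods))  # prints 1
```

A `None` result suggests that more than one module is bad or that the fault lies elsewhere on the system board.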
ROM
A bad or damaged ROM BIOS typically stops the system completely. When you encounter a dead system board, examine the BIOS chip for physical damage. If the BIOS chip overheats, it is typical for it to crack or to blow a large piece out of the top of the IC package. Another symptom pointing toward a damaged BIOS is a bootup sequence that moves automatically into the CMOS configuration display but never returns to the bootup sequence. In either case, you must replace the defective BIOS with a version that matches the chipset used by the system.
In situations in which new devices (for example, microprocessors, RAM devices, hard drives) have been added to the system, there is always a chance that the original BIOS cannot support them. In these situations, the system might or might not function based on which device has been installed and how its presence affects the system. To compensate for these possible problems, always check the websites of the device and the system board manufacturers to obtain the latest BIOS upgrade and support information.
CMOS Batteries
Another condition that causes configuration problems involves the system board's CMOS backup battery.
If a system refuses to maintain time and date information, the CMOS backup battery or its recharging circuitry is usually faulty. When you replace the backup battery, also check the contacts of the battery holder for corrosion.
If the battery fails, or if it has been changed, the contents of the CMOS configuration are lost. After replacing the battery, it is always necessary to access the CMOS setup utility to reconfigure the system.
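The CMOS checksum test mentioned in the symptom lists is what detects configuration contents that were lost or corrupted, for example after a battery failure. The following sketch shows how such a test works on a simulated 128-byte CMOS image. It assumes the layout of the original IBM PC/AT (a 16-bit additive sum of bytes 10h through 2Dh, stored high byte first at 2Eh and 2Fh); many later BIOSes use different ranges or algorithms. Reading the real CMOS (through I/O ports 70h and 71h on PC-compatible systems) is outside the scope of this sketch.

```python
# Layout assumed from the original IBM PC/AT; many later BIOSes differ.
SUM_START, SUM_END = 0x10, 0x2D   # inclusive range of bytes that are summed
SUM_HI, SUM_LO = 0x2E, 0x2F       # stored 16-bit checksum, high byte first


def compute_checksum(cmos: bytes) -> int:
    """16-bit additive checksum over the configuration bytes."""
    return sum(cmos[SUM_START:SUM_END + 1]) & 0xFFFF


def checksum_ok(cmos: bytes) -> bool:
    """Compare the stored checksum with one recomputed from the contents."""
    stored = (cmos[SUM_HI] << 8) | cmos[SUM_LO]
    return stored == compute_checksum(cmos)


def write_checksum(cmos: bytearray) -> None:
    """Store a fresh checksum, as a setup utility must after saving settings."""
    value = compute_checksum(cmos)
    cmos[SUM_HI] = value >> 8
    cmos[SUM_LO] = value & 0xFF


image = bytearray(128)                               # simulated CMOS image
image[SUM_START:SUM_END + 1] = bytes(range(1, 31))   # hypothetical settings
write_checksum(image)
print(checksum_ok(image))   # True
image[0x14] ^= 0xFF         # simulate contents corrupted by battery loss
print(checksum_ok(image))   # False
```

This is also why reconfiguring through the setup utility clears the error: saving new settings writes a checksum that matches the new contents.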