Summary
The fundamental elements of a troubleshooting process are as follows:
- Gathering of information and symptoms
- Analyzing information
- Eliminating possible causes
- Formulating a hypothesis
- Testing the hypothesis
Some commonly used troubleshooting approaches are as follows:
- Top down
- Bottom up
- Divide and conquer
- Follow the path
- Spot the differences
- Move the problem
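Top down, bottom up, and divide and conquer differ mainly in the order in which you test the OSI layers. The following Python sketch makes that order explicit. It assumes a hypothetical probe function `ok(layer)` that returns True when a given layer tests healthy. It also assumes that if one layer is healthy, every layer below it is healthy too. Starting divide and conquer at layer 3 is an illustrative choice, not a fixed rule.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical probe: True if the given OSI layer (1-7) tests healthy.
Probe = Callable[[int], bool]
# (faulty layer, or None if every layer is healthy; layers tested, in order)
Result = Tuple[Optional[int], List[int]]


def bottom_up(ok: Probe) -> Result:
    """Start at the physical layer and move up until a layer fails."""
    tested: List[int] = []
    for layer in range(1, 8):
        tested.append(layer)
        if not ok(layer):
            return layer, tested
    return None, tested


def top_down(ok: Probe) -> Result:
    """Start at the application layer and move down until a layer works."""
    tested: List[int] = []
    for layer in range(7, 0, -1):
        tested.append(layer)
        if ok(layer):
            # First healthy layer found going down; the fault is the layer above it.
            return (layer + 1 if layer < 7 else None), tested
    return 1, tested


def divide_and_conquer(ok: Probe, start: int = 3) -> Result:
    """Start in the middle of the stack and move up or down based on the result."""
    tested = [start]
    if ok(start):
        # Layers at or below `start` are assumed healthy; search upward.
        for layer in range(start + 1, 8):
            tested.append(layer)
            if not ok(layer):
                return layer, tested
        return None, tested
    # The fault is at `start` or below; search downward for a healthy layer.
    for layer in range(start - 1, 0, -1):
        tested.append(layer)
        if ok(layer):
            return layer + 1, tested
    return 1, tested
```

For example, if you simulate a fault at layer 3 with `ok = lambda layer: layer < 3`, all three functions report layer 3. Bottom up tests layers 1, 2, and 3. Top down tests layers 7 through 2. Divide and conquer tests only layers 3 and 2.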
A structured approach to troubleshooting (no matter what the exact method is) will yield more predictable results in the long run and will make it easier to pick up the process where you left off at a later stage or to hand it over to someone else.
Structured troubleshooting begins with problem definition, followed by fact gathering. The gathered information, network documentation, baseline information, your research results, and your past experience all serve as input while you interpret and analyze the facts to eliminate possibilities and identify the source of the problem. Based on this continuous analysis and the assumptions you make, you eliminate possible causes from the pool of proposed causes until you have a final proposal, which takes you to the next step of the troubleshooting process: formulating and proposing a hypothesis. Depending on your hypothesis, the problem might or might not fall within your area of responsibility, so proposing a hypothesis is followed either by escalating it to another group or by testing it. If your tests confirm the hypothesis, you have to plan and implement a solution. The solution entails changes that must follow the change-control procedures within your organization. The results and all the changes you make must be clearly documented and communicated to all relevant parties.
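The elimination and hypothesis steps described above can be sketched in Python. This is a simplified model, not a prescribed procedure. Each proposed cause carries a hypothetical check that returns False once the gathered facts rule it out, plus a flag that says whether it falls within your area of responsibility. When exactly one cause survives, it becomes the hypothesis, which you either test yourself or escalate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Facts gathered so far, keyed by a short name (illustrative structure).
Facts = Dict[str, object]


@dataclass
class Cause:
    name: str
    # Hypothetical check: False once the facts rule this cause out.
    consistent_with: Callable[[Facts], bool]
    # Does this cause fall within our own area of responsibility?
    ours: bool


def eliminate(causes: List[Cause], facts: Facts) -> List[Cause]:
    """Keep only the proposed causes that the gathered facts do not rule out."""
    return [c for c in causes if c.consistent_with(facts)]


def next_step(causes: List[Cause], facts: Facts) -> str:
    remaining = eliminate(causes, facts)
    if len(remaining) != 1:
        # None or several left: go back to gathering and analyzing information.
        return "gather more information"
    hypothesis = remaining[0]
    if hypothesis.ours:
        return f"test hypothesis: {hypothesis.name}"
    return f"escalate: {hypothesis.name}"
```

As an illustration, a "duplex mismatch" cause might be ruled out when no late collisions are seen, and an "ISP outage" cause when the upstream network is reachable. Only the facts that remain unexplained narrow the pool down to a single hypothesis.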
Having accurate and current network documentation can tremendously increase the speed and effectiveness of troubleshooting processes. Documentation that is wrong or outdated is often worse than having no documentation at all.
When creating a network baseline, the following data is useful:
- Basic performance statistics obtained by running show commands
- Accounting of network traffic using RMON, NBAR, or NetFlow statistics
- Measurements of network performance characteristics using the IP SLA feature in IOS
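A baseline becomes useful once you compare new measurements against it. The following is a minimal Python sketch. It assumes you have already extracted numeric samples of one metric into a list, for example periodic CPU utilization readings taken from show command output. The parsing step is not shown, and the three-standard-deviation threshold is an illustrative assumption.

```python
import statistics
from typing import List, Tuple


def baseline(samples: List[float]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of a baseline metric."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return statistics.mean(samples), statistics.stdev(samples)


def deviates(value: float, mean: float, stdev: float, k: float = 3.0) -> bool:
    """Flag a measurement more than k standard deviations from the baseline mean.

    The default k = 3 is an illustrative threshold, not a standard value.
    """
    return abs(value - mean) > k * stdev
```

For example, with illustrative samples of 10, 12, 11, 13, and 9 percent, the baseline mean is 11. A new reading of 20 percent is flagged, while a reading of 12 percent is not.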
Communication is an essential part of the troubleshooting process, and it happens in all of the following stages of troubleshooting:
- Reporting the problem
- Gathering information
- Analyzing and eliminating possible causes
- Proposing and testing a hypothesis
- Solving the problem
Change control is one of the most fundamental processes in network maintenance. By strictly controlling when changes are made, defining what type of authorization is required, and specifying what actions need to be taken as part of that process, you can reduce the frequency and duration of unplanned outages and thereby increase the overall uptime of your network. Essentially, there is not much difference between making a change as part of the maintenance process and making one as part of troubleshooting.
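The kind of checks a change-control process enforces can be sketched in Python. Everything here is hypothetical: the maintenance window, the field names, and the three rules (authorization, timing, documentation) stand in for whatever your organization's procedures actually require.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional

# Hypothetical maintenance window: 01:00 to 04:00 local time.
WINDOW_START = time(1, 0)
WINDOW_END = time(4, 0)


@dataclass
class ChangeRequest:
    description: str
    scheduled: datetime
    approved_by: Optional[str] = None  # who authorized the change, if anyone
    documentation: str = ""            # what will change and why


def violations(cr: ChangeRequest) -> List[str]:
    """Return the reasons a change may not proceed; an empty list means it may."""
    problems: List[str] = []
    if not cr.approved_by:
        problems.append("missing authorization")
    if not (WINDOW_START <= cr.scheduled.time() < WINDOW_END):
        problems.append("outside maintenance window")
    if not cr.documentation.strip():
        problems.append("missing documentation")
    return problems
```

A fix identified during troubleshooting goes through the same `violations` check as a routine maintenance change. This mirrors the point that both kinds of change follow the same procedure.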