Step 1: Identify the Problem
Gather information, question users, identify symptoms, determine whether anything has changed, and duplicate the problem if possible.
Step 2: Establish a Theory of Probable Cause
Question the obvious and consider multiple approaches (OSI Top-to-Bottom, Bottom-to-Top, or Divide and Conquer).
Step 3: Test the Theory
Determine whether the theory is correct. If it is confirmed, determine the next steps to resolve the problem; if it is not confirmed, establish a new theory or escalate.
Step 4: Establish a Plan of Action
Create a specific plan to fix the issue and identify the potential ‘side effects’ or impact of the fix.
Step 5: Implement the Solution
Carry out the plan or escalate the issue to a senior engineer if necessary.
Step 6: Verify Full System Functionality
Ensure the problem is gone and the rest of the system still works; implement preventive measures (e.g., training, surge protectors).
Step 7: Document Findings
Write down the problem, the fix, the outcome, and lessons learned for future reference.
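One possible way to keep Step 7 records consistent is a small structured record with one field per item listed above. This is a sketch only: the class name `TroubleshootingRecord`, its field names, and the sample incident are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TroubleshootingRecord:
    """One documented incident: the problem, the fix, the outcome, and lessons learned."""

    problem: str
    fix: str
    outcome: str
    lessons_learned: List[str] = field(default_factory=list)


# Hypothetical incident used purely as sample data.
record = TroubleshootingRecord(
    problem="Workstation could not reach the file server",
    fix="Replaced a damaged patch cable",
    outcome="Connectivity restored and confirmed by the user",
    lessons_learned=["Check the physical link before changing settings"],
)

# asdict() converts the record to a plain dictionary, ready to store or export.
print(asdict(record))
```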
OSI Bottom-to-Top Approach
Starting troubleshooting at Layer 1 (Physical/Cables) and moving up to Layer 7 (Application).
OSI Top-to-Bottom Approach
Starting troubleshooting at Layer 7 (Application/Software) and moving down to Layer 1 (Physical).
Divide and Conquer
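Starting troubleshooting at a middle layer (for example, a Layer 3 ping test) and then moving up or down the OSI model based on the result: if the test succeeds, the fault is likely in a higher layer; if it fails, the fault is likely at that layer or below.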
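The three approaches differ only in the order in which the OSI layers are checked. The following is a minimal sketch of that ordering; the function names `check_order` and `divide_and_conquer` are hypothetical, and the choice of Layer 3 as the starting point is just the example used above.

```python
from typing import List

OSI_LAYERS = {
    1: "Physical",
    2: "Data Link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}


def check_order(approach: str) -> List[int]:
    """Return the layer numbers in the order the given approach checks them."""
    if approach == "bottom-to-top":
        return list(range(1, 8))      # Layer 1 up to Layer 7
    if approach == "top-to-bottom":
        return list(range(7, 0, -1))  # Layer 7 down to Layer 1
    raise ValueError(f"unknown approach: {approach}")


def divide_and_conquer(start: int, start_layer_ok: bool) -> List[int]:
    """Return the layers still worth checking after one test at a middle layer."""
    if start_layer_ok:
        # The start layer and everything below it work, so look higher.
        return list(range(start + 1, 8))
    # The test failed, so the fault is at the start layer or below it.
    return list(range(start, 0, -1))


# Example: a Layer 3 test (such as ping) failed, so work downward from Layer 3.
for layer in divide_and_conquer(3, start_layer_ok=False):
    print(layer, OSI_LAYERS[layer])
```

A single mid-layer test removes several layers from consideration at once, which is why this approach can narrow the search faster than walking all seven layers in order.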
Question the Obvious
Part of Step 2; checking simple things first (e.g., ‘is it plugged in?’ or ‘is the power switch on?’).
Identify Potential Effects
Part of Step 4; considering how a fix might break other things (e.g., ‘If I reboot this switch, who loses connection?’).