Troubleshooting 101

At some point in your career, you will need to troubleshoot an issue that crops up in your network. Cisco has developed a structured troubleshooting methodology that, if you follow it, greatly improves your chances of a successful outcome, or at least ensures that you have collected the critical information the Cisco Technical Assistance Center (TAC) needs to resolve the problem as quickly as possible. The methodology consists of the following steps:

Define the problem: Do you have a clear understanding of what the problem is? What are the real or perceived symptoms, and have they occurred before? What is the scope of the problem? Is it affecting only one user, a subset of users, or the entire company? Normally the help desk receives the problem report or trouble ticket and then escalates it to the next level or team, so your team may need to gather additional information to determine what the problem really is.

Gather the facts: This step is the cornerstone that all the other steps rely on. First, refer to previous trouble tickets to see whether this particular problem has occurred in the past and what methods were used to correct it. If this is the first occurrence, determine whether any changes were made to the system (e.g., IOS or other application upgrades). Interview the affected user(s) to find out whether the problem started recently or has been occurring over a period of time. Then review any protocols that may be involved in the problem to understand how they normally function. Collect any necessary trace files or dumps and analyze them for abnormal behavior, running debug commands when needed. Topology diagrams are also critical in helping to isolate the problem.

Analyze the data: Use deductive reasoning to narrow down the possible causes, or enlist help from others who may have greater knowledge of the process being affected. Make sure you use all the relevant troubleshooting tools to find the root cause. Once you have identified all the likely causes of the problem, move on to the next step.

Create an action plan: Write down the action plan(s) needed to fix the issue at hand. Write each plan down before implementing it: if it does not fix the issue and you have forgotten exactly what you changed, you cannot cleanly undo it and may end up injecting a new problem into the system. Writing it down also lets you focus on the proposed resolution and analyze clearly whether the fix could cause problems for other systems.

Implement the action plan: This is where the rubber meets the road. You have thoroughly researched the issue and come up with a sound solution, and you are now ready to implement the fix. Once the fix is in place, it is time to test.

Observe the results: Did testing confirm that the issue has been resolved?

Repeat the process: If not, undo the fix and go back to "Gather the facts", or to "Create an action plan" if you identified more than one possible fix. It is critical to undo the last fix before implementing a new one so that you know exactly which change resolved the problem. Once the problem is fixed, move on to the documentation phase.

Document: For some, this is the hardest step to complete. It is vitally important to document each problem and what was done to resolve it. Remember, history repeats itself, and this is especially true in the world of IT. You can also use this time to review the process documents used by the help desk and update them if needed, so that next time help desk personnel can resolve the issue before it is escalated to the engineering level.
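To make the "undo the last fix before implementing a new one" rule concrete, here is a minimal Python sketch of the implement, observe, and repeat steps, written under heavy assumptions: each written-down action plan is modeled as a pair of callables that apply and roll back a change, and a caller-supplied check stands in for testing. `ActionPlan`, `run_plans`, and the toy device state are hypothetical illustrations, not a Cisco tool or API; on real equipment the apply and undo steps would be configuration changes you wrote down in advance.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ActionPlan:
    """Hypothetical record of one written-down fix: how to apply it and how to undo it."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def run_plans(plans: List[ActionPlan],
              problem_resolved: Callable[[], bool]) -> Optional[str]:
    """Try each plan one change at a time.

    A plan that does not resolve the problem is rolled back before the next
    plan is applied, so a success can be attributed to exactly one change.
    Returns the name of the plan that worked, or None if none did (meaning:
    go back and gather more facts).
    """
    for plan in plans:
        plan.apply()               # implement the action plan
        if problem_resolved():     # observe the results
            return plan.name       # move on to documenting the fix
        plan.undo()                # undo before trying anything else
    return None


# Toy stand-in for a misconfigured device (hypothetical values).
state = {"mtu": 1400, "vlan": 10}

plans = [
    ActionPlan("change VLAN",
               lambda: state.update(vlan=20),
               lambda: state.update(vlan=10)),
    ActionPlan("raise MTU",
               lambda: state.update(mtu=1500),
               lambda: state.update(mtu=1400)),
]

fixed_by = run_plans(plans, lambda: state["mtu"] == 1500)
print(fixed_by, state)  # raise MTU {'mtu': 1500, 'vlan': 10}
```

Because the unsuccessful VLAN change is rolled back before the MTU change is tried, the returned name identifies the single change that resolved the problem, which is exactly what the documentation step needs. A return value of `None` corresponds to going back to "Gather the facts".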

Following this step-by-step guideline takes more time up front, but it can save you countless hours in the long run and lead to quicker resolutions on average.