Whether you're working as an IT professional or training to enter the field, you've probably evaluated numerous products, sat through several training programs, and perhaps earned a vendor certification or two. Eventually, you'll face a crisis that requires you to put all this knowledge to use. In such situations, nothing screams "paper MCSE" more than panic. What you need is a plan of action.
The first thing to remember in troubleshooting situations is to stay calm. Whether the CEO is breathing down your neck or you're stressing yourself out, remember that such pressures aren't going to help you resolve the situation any faster. You must get past the initial panic as quickly as possible while maintaining an appropriate sense of urgency about the situation.
In hospital emergency rooms, triage is a process wherein medical professionals assess the injured and prioritize them based on the severity of their injuries. If you face a situation in which multiple problems are developing at once, you might find that you need to employ a similar triage process. When you've isolated one problem that you can concentrate on, take the time to understand how everything should be working, then break down the problem itself. Determining the extent of a problem on one machine might be relatively simple, but determining the extent of a problem on a network is usually complicated—especially because many end users seem to assume that you know when problems occur and so don't bother to report them. Windows 2000 Server Terminal Services, Symantec's pcAnywhere, VNC, and other remote control software can help you test for trouble on remote servers and clients, assuming you can connect to them. Of course, the inability to connect can also help you determine the extent of the damage.
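The triage idea can be sketched as a priority queue. This is a minimal sketch that assumes each reported problem is scored with a hypothetical 1 to 5 severity rating and a count of affected users. Neither score comes from the article; they stand in for whatever judgment you apply in the moment.

```python
import heapq


def triage(problems):
    """Return problem descriptions ordered most urgent first.

    problems: iterable of (severity, users_affected, description) tuples,
    where severity is a hypothetical 1-5 scale (5 = most severe).
    """
    heap = []
    for index, (severity, users, description) in enumerate(problems):
        # heapq is a min-heap, so negate severity and user count to pop the
        # worst problem first; index breaks ties in reported order
        heapq.heappush(heap, (-severity, -users, index, description))
    return [heapq.heappop(heap)[3] for _ in range(len(heap))]


# Hypothetical problem reports arriving at the same time
reports = [
    (2, 1, "Printer on 3rd floor jams"),
    (5, 300, "Mail server down"),
    (5, 40, "File server slow"),
    (3, 12, "VPN drops for remote staff"),
]
print(triage(reports))
# ['Mail server down', 'File server slow', 'VPN drops for remote staff', 'Printer on 3rd floor jams']
```

Sorting on severity first and head count second is only one possible policy. A real triage decision also weighs business impact, which no formula captures fully.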
Network problems can affect external clients as well as internal ones. Whether the cause is a corrupt DNS entry or a downed router, determining who is affected and who isn't goes a long way toward isolating the source.
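One way to map who is affected is to probe a representative set of clients or services and split them into reachable and unreachable groups. The sketch below assumes a plain TCP connection test is a meaningful health check for your environment. The client names, addresses, and ports are hypothetical placeholders.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False


def who_is_affected(clients, probe=tcp_reachable):
    """Split clients into (affected, unaffected) lists of names.

    clients: mapping of client name -> (host, port) to probe.
    probe: callable(host, port) -> bool; swappable for testing.
    """
    affected, unaffected = [], []
    for name, (host, port) in clients.items():
        (unaffected if probe(host, port) else affected).append(name)
    return affected, unaffected


# Hypothetical mix of internal and external endpoints
clients = {
    "hq-web": ("10.0.0.5", 80),
    "branch-web": ("198.51.100.7", 80),
}
affected, unaffected = who_is_affected(clients)
print("affected:", affected, "unaffected:", unaffected)
```

The pattern matters more than any single result. If every external endpoint fails while internal ones respond, the edge router or external DNS becomes a stronger suspect than the servers themselves.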
Find out how to duplicate the problem. With this information, you can accurately determine whether a proposed solution works as expected. If you can't duplicate a problem reliably, then proposed solutions will address only the most likely causes of the problem, not necessarily the actual causes. The most difficult problem to troubleshoot is a seemingly random incident. In such cases, gather as much applicable information as you can. After the problem recurs a few times, you might have enough data to identify the factors the incidents share and pinpoint the cause.
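Correlating recurring incidents amounts to counting which details show up again and again. This is a minimal sketch that assumes you record each occurrence as a dictionary of hashable details. The field names used here (`user`, `switch_port`, `hour`) are hypothetical examples, not a recommended schema.

```python
from collections import Counter


def common_factors(incidents, min_share=1.0):
    """Return sorted (attribute, value) pairs seen in at least min_share of incidents.

    incidents: list of dicts, one per occurrence of the problem.
    min_share: fraction of incidents a pair must appear in (1.0 = every one).
    """
    counts = Counter()
    for incident in incidents:
        # Dict keys are unique, so each pair is counted at most once per incident
        counts.update(incident.items())
    threshold = min_share * len(incidents)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


# Hypothetical log of three occurrences of an intermittent fault
incidents = [
    {"user": "alice", "switch_port": "3/12", "hour": 9},
    {"user": "bob", "switch_port": "3/12", "hour": 14},
    {"user": "carol", "switch_port": "3/12", "hour": 9},
]
print(common_factors(incidents))                 # [('switch_port', '3/12')]
print(common_factors(incidents, min_share=0.6))  # [('hour', 9), ('switch_port', '3/12')]
```

A shared factor is a lead, not proof. It tells you where to try duplicating the problem next.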
After you isolate the likely source of the problem, you can begin testing solutions. If time is crucial, a shotgun approach—addressing all the most likely causes—improves your odds of fixing the problem but increases your chances of inviting unwanted side effects. If possible, avoid the shotgun approach unless you're aware of the consequences of your actions and potential interactions. If time permits, make only one change at a time and record the effects until you resolve the problem.
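The one-change-at-a-time discipline can be expressed as a loop that applies a single candidate fix, retests, and records the outcome before moving on. In this sketch, each change and the retest step are hypothetical callables you would supply. The retest is assumed to be your reliable way of duplicating the problem.

```python
def one_change_at_a_time(changes, problem_resolved):
    """Apply candidate fixes in order, testing after each one.

    changes: list of (description, apply_fn) pairs; apply_fn takes no arguments.
    problem_resolved: callable returning True once the problem no longer occurs.
    Returns a log of (description, resolved) entries.
    """
    log = []
    for description, apply_change in changes:
        apply_change()
        resolved = problem_resolved()
        log.append((description, resolved))  # record the effect of this change
        if resolved:
            break  # stop at the first change after which the test passes
    return log


# Hypothetical scenario: only restoring the DNS record fixes the fault
state = {"dns_fixed": False}
changes = [
    ("flush ARP cache", lambda: None),
    ("restore DNS record", lambda: state.update(dns_fixed=True)),
    ("reboot router", lambda: None),
]
print(one_change_at_a_time(changes, lambda: state["dns_fixed"]))
# [('flush ARP cache', False), ('restore DNS record', True)]
```

Note that the changes accumulate. If an earlier change had no effect, you may want to undo it before trying the next one, so the final fix isn't tangled up with changes that didn't matter.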
Use appropriate urgency in dealing with your problem. If necessary, use spare equipment to replace failed equipment until you can schedule downtime. Be aware that a "temporary" fix that works often becomes the permanent fix, so try to ensure that your temporary fixes are solid. Troubleshooting is an art, and finding the balance between simply making things work and making them work properly is a talent that you must refine over time.