Recently, I went on a vacation without my laptop. While I was gone, I tried to check my email from a convenient computer and found that I couldn't open my mailbox. Like many other Microsoft Exchange 2000 Server administrators, I run Exchange servers that provide mail to my own organization, as well as to several other businesses with which I'm affiliated. If I couldn't get my email messages, neither could my partners, subcontractors, and others who depend on my service. However, I was 500 miles away from my Exchange Server system, and no one (except possibly the cat) was available to fix the problem. What to do?
As with any server problem, I had several options. First, I could choose to ignore the problem, move on to other activities, and wait until I got home to fix whatever was ailing my server. Because my partners expect a reasonable degree of email availability, this choice wasn't really an option—and probably isn't an option for most Exchange administrators. Second, I could drop what I was doing, go to the server, and fix it. If I'd been close to home, this approach might have been a viable option, but in this case I wasn't willing to pack up, go home, and miss my vacation. My third and final option was to access the misbehaving server remotely over the Internet to fix the problem.
Preparing for the Worst
Fortunately, I had gathered several tools before I left on vacation and was prepared to troubleshoot the problem remotely. Naturally, these tools, most of which are free, will help you only if you have the foresight to install and configure them before you need them.
The most important tool for remote troubleshooting is one that gets you to the console of the server you want to troubleshoot. (For a primer on remote access tools, see Don Jones, "Must-Have Remote Administration Tools," May 2002, InstantDoc ID 24536.)
If you're running Exchange on Windows 2000, you can use the built-in Win2K Server Terminal Services component. You can install Terminal Services in one of two different modes: remote administration mode or application server mode. Remote administration mode lets you establish two separate Terminal Services sessions to each server you manage without any additional licensing; application server mode requires extra licenses. To activate Terminal Services, perform the following steps:
- Open the Control Panel Add/Remove Programs applet.
- Click Add/Remove Windows Components.
- Scroll down the Components list to find the Terminal Services component, select the Terminal Services check box, then click Details.
- In the Terminal Services dialog box, make sure Enable Terminal Services is checked. Click OK, then click Next in the components wizard.
- The component installer will ask you whether you want to configure Terminal Services for remote administration mode or application server mode. Select the Remote administration mode option, then click Next.
Of course, you need to make sure that the accounts you want to use for remote administration can log on locally to the servers you need to manage—as a result, you might need to use the Microsoft Management Console (MMC) Active Directory Users and Computers snap-in to tweak the user permissions.
If you're running Windows NT 4.0 on your Exchange server, you have two choices. You can uninstall the OS and all other software and install NT Server 4.0, Terminal Server Edition, which is an expensive and labor-intensive process. Alternatively, you can install remote desktop-sharing software, such as Symantec's pcAnywhere, Altiris's Carbon Copy, or AT&T Laboratories Cambridge's Virtual Network Computing (VNC). I like VNC because it's free; go to http://www.uk.research.att.com/vnc. (As a bonus, the VNC viewer software is small enough to fit on a 3.5" disk, so I can carry it with me and run it on whatever machine I need to.) Admittedly, VNC is not as sophisticated or as fast as its commercial counterparts, and no formal organization supports it. As a result, you might find that another tool better suits your needs. All these remote desktop-sharing tools run on Win2K as well. No matter what host OS you're using, be sure to follow the software's security recommendations closely: you certainly don't want to provide random Internet users with access to your Exchange server's console.
Determining Whether the Machine Is Running
When you're away from your server and you realize that something is wrong, your first step is to find out whether the server is running. If you're lucky and it is up, you can ascertain specific problems, such as inbound mail failing or Microsoft Outlook Web Access (OWA) not working. In my case, I could ping the server's IP address, so I knew that the server was running and on the network, but I could also tell that POP3, IMAP4, and OWA were all failing. The server was answering POP3 and IMAP4 connections but rejecting logons for those protocols. Messaging API (MAPI) connections were failing, and attempts to load the OWA page for any user brought up the regular authentication dialog box, followed by a 503 error page.
Depending on your firewall configuration, you might not be able to ping a machine inside the demilitarized zone (DMZ) or firewall. Find out now whether you can ping a machine from the outside so you'll know what to expect when you need to remotely troubleshoot a server for real. Any test that will tell you whether the server is on the network will do: You can telnet to a well-known port or you can connect to a system that you know is up and use Windows tools such as Server Manager or the MMC Computer Management snap-in to try to connect to the misbehaving machine.
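The "telnet to a well-known port" test can also be scripted so that you can run it from whatever machine is handy. The following is a minimal Python sketch, not part of the original toolset: the function names, the timeout, and the port map are illustrative assumptions, and the ports listed are simply the standard ones for the protocols mentioned in this article.

```python
import socket
from typing import Dict, Optional

# Standard ports for the protocols discussed in this article. The labels
# and the choice of ports to probe are assumptions; adjust them to match
# what your server actually exposes through the firewall.
DEFAULT_PORTS = {"SMTP": 25, "HTTP (OWA)": 80, "POP3": 110, "IMAP4": 143}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-lookup failures.
        return False


def survey(host: str,
           ports: Optional[Dict[str, int]] = None,
           timeout: float = 3.0) -> Dict[str, bool]:
    """Map each service label to whether its port accepted a connection."""
    if ports is None:
        ports = DEFAULT_PORTS
    return {label: port_is_open(host, port, timeout)
            for label, port in ports.items()}
```

Calling survey with your server's host name returns a dictionary such as one entry per service, each True or False. Keep in mind that an open port shows only that something is listening; as the POP3 and IMAP4 behavior described earlier shows, a service can accept connections and still reject logons.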
If your machine isn't running—say, because it's experienced a blue screen or because the power is out—you're probably out of luck. The one exception is if your server has one of those spiffy remote-management cards. Hewlett-Packard (HP), Compaq, Dell, and most other first-tier server vendors offer these cards, which let you dial or telnet into a text-based monitor that can identify the state of the machine's hardware. Some of these remote-management cards even show you a snapshot of what's currently on screen, a great feature for remotely debugging suspected or actual STOP errors. Most of these cards let you restart a dead machine because you connect to the card and not to the host system. This ability to revive a downed system is invaluable when you're not physically near the failed server.
Identifying What's Wrong
After you've ascertained that the computer is up and open for business, the next step is to use your remote console application. I fired up the Terminal Services client, which I had previously installed on a convenient computer at my remote location. If you can't install or use the dedicated Terminal Services client (perhaps you're at a customer site), Microsoft has an ActiveX control that you can install on your server. You can use this control to establish a Terminal Services session from any Windows machine that has Microsoft Internet Explorer (IE) installed. To download the ActiveX control, go to http://www.microsoft.com/windows2000/downloads/recommended/tsac/default.asp.
After you establish a remote console session, you can use the regular array of troubleshooting tools that you'd use if you were physically sitting in front of the machine. I typically start the troubleshooting process by using the Event Viewer to check for warning or informational events that can offer a clue as to what's wrong. Exchange logs most of its interesting events, including various types of errors and failures, in the Application event log. Applying a filter with the View, Filter command lets you see only warning or error events. Filtering out routine events can help speed the process of pinpointing a failure. After a bit of practice, you'll be able to look at an event ID and identify the problem. Get into the habit of searching Microsoft's Knowledge Base as soon as you encounter an unfamiliar event ID. Many common events are largely unmentioned in the Exchange documentation but are described in the Knowledge Base.
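The same filtering idea applies if you copy event data off the server and want to sift it elsewhere. The sketch below is only an illustration in Python: EventRecord is a hypothetical, simplified stand-in for an Application log entry (it is not a Windows API type), and getting real events into this shape is left as an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Event types worth reading first; informational events are treated as
# routine here, mirroring the Event Viewer filter described above.
INTERESTING_TYPES = {"Warning", "Error"}


@dataclass(frozen=True)
class EventRecord:
    """A simplified, hypothetical view of one Application log entry."""
    event_type: str  # "Information", "Warning", or "Error"
    source: str      # the component that logged the event
    event_id: int
    message: str


def filter_events(events: Iterable[EventRecord],
                  source_prefix: str = "") -> List[EventRecord]:
    """Keep warning and error events, optionally limited by source prefix.

    The source comparison is case-insensitive; an empty prefix matches
    every source.
    """
    prefix = source_prefix.lower()
    return [
        event for event in events
        if event.event_type in INTERESTING_TYPES
        and event.source.lower().startswith(prefix)
    ]
```

Passing a source prefix narrows the list to one family of components, which leaves a short set of event IDs to look up in the Knowledge Base.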
You can perform two other tricks from the command line, either through a remote console session or through Telnet, to help correct a problem:
- The Net Start and Net Stop commands let you start and stop individual services. Because Win2K and NT both understand service dependencies, you can simply stop the System Attendant (MSExchangeSA) service to kill all Exchange services. To run Net Stop in unattended mode, type
net stop msexchangesa /y
and watch your services shut down cleanly.
- The Microsoft Windows 2000 Server Resource Kit's Nltest command (described in Darren Mar-Elia, "10 Resource Kit Remote Administration Tools," April 2001, InstantDoc ID 20046) has a handy shutdown feature that lets you attempt to shut down a balky server from the command line on another machine. Sometimes a server that won't shut down from the console will honor an Nltest shutdown request. You need two switches to force a shutdown with Nltest:
nltest /server:<server name> /shutdown:<reason>
The /server switch tells Nltest which server you want to target, and the /shutdown switch lets you specify a reason (e.g., "server hung") that will appear in the event log.
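If you expect to type these commands under pressure, you might wrap them in a small script ahead of time. The following Python sketch is one way to do that, under stated assumptions: the helper names are hypothetical, it uses only the switches shown above, and it defaults to a dry run so that nothing is stopped or shut down by accident.

```python
import subprocess
from typing import List, Optional


def net_stop_command(service: str) -> List[str]:
    """Build the unattended Net Stop command for one service."""
    return ["net", "stop", service, "/y"]


def nltest_shutdown_command(server: str, reason: str) -> List[str]:
    """Build the Nltest shutdown command from its two switches."""
    return ["nltest", f"/server:{server}", f"/shutdown:{reason}"]


def run(command: List[str], dry_run: bool = True) -> Optional[int]:
    """Show the command, and execute it only when dry_run is False.

    Returns the process exit code, or None for a dry run. The printed
    form is for display only; arguments that contain spaces are passed
    to the program as single arguments even though no quotes are shown.
    """
    print(" ".join(command))
    if dry_run:
        return None
    return subprocess.run(command, check=False).returncode
```

For example, run(net_stop_command("msexchangesa")) prints the command without executing it; pass dry_run=False on the server itself, or on a machine that has Nltest installed, when you actually mean it.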
My Exchange server hadn't logged any interesting event messages, so I couldn't immediately tell what was wrong. The fact that POP and IMAP users couldn't log on was a clue to the problem. The services were up, but because Exchange 2000 uses DLLs that IIS loads to handle POP3 and IMAP4 traffic, I couldn't vouch for the health of the Store. I began to suspect that my server was having trouble reaching a Global Catalog (GC) server because logon requests for POP and IMAP were immediately failing. (I have only one domain, so the GC server and domain controller—DC—are the same machine, even though I have two GC/DC computers.) A quick flourish of the Dsadiag tool from the Win2K support tools helped me identify part of the problem: My Exchange server wanted to talk to the GC named thunderstorm, which I had shut down while waiting for a new Fibre Channel adapter. The quickest way to fix the problem would typically be to stop and restart the Store, forcing it to find another GC. You wouldn't want to take this approach during the middle of a business day, but I figured I could get away with it during a noncritical time. However, when I tried to stop and restart the Store, it hung—always a sign that something is amiss.
Cutting to the Chase
Because I couldn't just shut down the Store, I had to shut down the entire machine. I pressed Ctrl+Alt+End to trigger the Windows Security dialog box from within the Terminal Services client and selected Restart, but nothing happened. So I used the Terminal Services client to attach to another server, opened a command window, and typed
nltest /server:cyclone /shutdown:wedged
The command shut down the server cleanly, and when the server came back up, all was well. OWA, IMAP, POP, and MAPI users could all get their mail, and Exchange was bound to the correct GC.
Learning Valuable Lessons
Would I follow the same procedures again? Possibly. In this case, the server's availability requirements were fairly low and I had created a known good backup the day I left for my vacation. I judged the risk of data loss to be minimal and the odds of a successful restoration to be high. Of course, you might want to follow a more deliberate course of troubleshooting if you have someone at the server who can help you, particularly if the trouble you're having requires you to stop the Store, run Eseutil, or run Isinteg. You can benefit from my experience by doing the following:
- Set up some kind of remote console access tool on any server that you might need to fix remotely; install the corresponding client (or make it available for download) on the machines you'll need to use while you're out of the office. Make sure you do so securely, and be sure to test your setup while you're still in the office.
- Make sure that you install the Win2K and Exchange 2000 support tools, which are located in the support folder on the product CD-ROM, on any machine that you might need to fix remotely. Installing the Win2K Server and Exchange 2000 resource kits is probably a good idea as well.
- Consider using a tool to alert you when something unusual happens. Exchange 2000 and Exchange Server 5.5 both include server-monitoring tools that can email you when something's amiss; you should probably consider combining these built-in features with a third-party tool that can send you pager messages (when your email server is down, it can't send you email).
- If others will be near the server while you're on the road, teach them as much about Exchange as you can. In the best case, they can fix problems without your intervention; at worst, you'll have someone who can change tapes or otherwise provide a spare pair of hands when you're not there.
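The alerting suggestion in the list above comes down to one rule: the alert must travel over a channel that doesn't depend on the server being watched. The Python sketch below illustrates that separation and nothing more. The probe and alert callables are hypothetical placeholders; in practice the probe might be a TCP connection test and the alert a pager gateway, but neither is specified by this article.

```python
from typing import Callable, Dict, List, Tuple

Probe = Callable[[str, int], bool]  # returns True when host:port answers
Alert = Callable[[str], None]       # delivers a message out of band


def check_targets(targets: Dict[str, Tuple[str, int]],
                  probe: Probe,
                  alert: Alert) -> List[str]:
    """Probe every target once and send one alert per failed target.

    targets maps a service label to a (host, port) pair. Returns the
    labels of the targets that failed, in the order they were given.
    """
    failed = [label for label, (host, port) in targets.items()
              if not probe(host, port)]
    for label in failed:
        host, port = targets[label]
        alert(f"{label} on {host}:{port} is not answering")
    return failed
```

Run a check like this on a schedule from a machine other than the Exchange server, so that the watcher survives whatever takes the mail server down.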