Developing a Server Failure Notification System, Part 1


The situation was frustrating. Hard at work on a presentation, your company's director continually received errors when he tried to launch applications. After several hours, the director found that the server had gone offline, and the third-party notification software failed to detect the problem. In response, the director has asked you to develop a server failure notification system to report on server status and page someone to immediately fix any problems that arise. You begin by determining the script's requirements.

The Requirements
You need to write a script that doesn't make the same mistake that the notification software made, so the first step is to determine why the notification software didn't detect the offline server. After a quick investigation, you find that the notification software pinged all the servers but didn't provide an alert when the main application server went offline.

Although a positive ping response typically signifies server responsiveness and availability, occasionally a hung server responds to a ping but is unresponsive to clients' requests. So, you need to use more than just a standard ping test for such an occasion. Three tests can give you an accurate snapshot of server responsiveness and availability.

Test 1. Ping a server by name and IP address to determine whether it's responding to Internet Control Message Protocol (ICMP) echo requests. A positive response to the name ping confirms that the server name resolves correctly. A positive response to the IP address ping confirms connectivity to the server. If an IP address ping succeeds but a name ping fails, you might have a name-resolution problem.

Test 2. Test the availability of shares on the server. Standalone and clustered servers can lose shares in a server crash. When the server comes back online, users might not be able to access their folders and files because the shares no longer exist.

Test 3. Check whether your servers' support services are running. Because of startup dependencies, a support service (e.g., Web, FTP, Server, Browser, Backup, Exchange Server) might fail to start after a reboot or failover.

Your script must conduct these three tests. However, you have a dynamic computing environment in which your server names, IP addresses, shares, and services change frequently. If you hard-coded these parameters in the script, you would need to modify the script each time a change occurred. Doing so is not only time-consuming but also increases the chance of errors. So you decide to read in these parameters from text files when the script executes. That way, you can easily update the parameters without changing your script. In addition, you can use the same script at a different company location without modifying it. You also need a text file containing the pager personal identification numbers (PINs) for the IT staff members who need to respond to failed servers.

To meet these key requirements, you determine that your notification system must contain one script (ServerTester.bat), four input text files (PingList.txt, SharesList.txt, ServicesList.txt, and OnCallList.txt), and one output text file (FailureLog.txt). ServerTester.bat will test the ping response, share availability, and services running for the objects you specify in PingList.txt, SharesList.txt, and ServicesList.txt. OnCallList.txt will contain the pager PINs for the IT staff members who are on call at any particular time. FailureLog.txt will log all paging events. Each event will have a date/time stamp, describe the failure incident, and identify the paging recipient.

With the solution mapped out, you get to work. You decide to create the input text files before you tackle the ServerTester.bat script.

The Input Text Files
Creating the input text files is simple. You just use a text editor to enter the necessary information for each file.

PingList.txt. This input file includes all the servers that could affect responsiveness and availability, such as WINS servers, DNS servers, Web servers, file and application servers, and PDCs and BDCs in your domain and other trusted domains. For each server, specify the server name and its IP address in this format:

ServerName,IPAddress

Each server and its IP address must be on a separate line.

SharesList.txt. In this file, include the Universal Naming Convention (UNC) paths to the shares you want to test. Each path must be on a separate line. You can use spaces or underscores in the paths, such as

\\rlserver\e drive

ServicesList.txt. This input file includes the names of the services you want to check and the names of the servers running those services. First, you specify the server name, followed by the common, or friendly, service name (i.e., the name that appears under Net Start) and the real service name (i.e., the name that appears in the Registry). The ServerTester.bat script uses the friendly service name for paging messages and event logging and the real service name to test whether the service is running. You can use the Microsoft Windows NT Server 4.0 Resource Kit's Netsvc utility to obtain the real service names. Each service must be on a separate line in this format:

ServerName,Friendly Service Name,RealServiceName

OnCallList.txt. In this file, you specify the staff on call. Begin the file by including two lines:

Weekday Default Recipient(s),88888
Weekend Default Recipient(s),22222

In the first line, you specify the default IT staff members (i.e., recipients) to be notified if a failure occurs on a weekday, followed by those recipients' pager PINs. In the second line, you specify the default recipients to be notified if a failure occurs on the weekend, followed by those recipients' pager PINs.

After you specify the default on-call recipients, you can specify the rotating on-call recipients. Include the dates they're on call, followed by their names and pager PINs, using this format:

06/05/1999,06/06/1999,Linda Martinez

PingList.txt, SharesList.txt, ServicesList.txt, and OnCallList.txt are comma-delimited files. You can find examples of these files on the Win32 Scripting Journal Web site. An example of the FailureLog.txt output file is also on the Web site.
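To show how a batch script can read these files, here is a minimal sketch that only lists what each file contains. It is not part of ServerTester.bat. The file names come from the article, but the field order in each loop is an assumption based on the descriptions above, and the Echo messages are made up for illustration.

```batch
@echo off
rem Sketch only: print the fields found in three of the input files.
rem Field order is assumed from the file descriptions in the article.

rem PingList.txt: token 1 = server name, token 2 = IP address
for /f "tokens=1,2 delims=," %%I in (PingList.txt) do echo Ping target: %%I at %%J

rem SharesList.txt: one UNC path per line. An empty delims= keeps the
rem whole line in %%I, so a path that contains spaces stays intact.
for /f "delims=" %%I in (SharesList.txt) do echo Share: %%I

rem ServicesList.txt: server name, friendly service name, real service name
for /f "tokens=1-3 delims=," %%I in (ServicesList.txt) do echo Service: %%J [%%K] on %%I
```

Because the loops read the files each time the script runs, a change to a server name or share requires editing only the text file.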

The ServerTester.bat Script
Creating the ServerTester.bat script isn't as simple as creating the input files because ServerTester.bat automates many complex tasks. When you have to automate many tasks, dividing your script into logical sections, or modules, makes the code easier to write initially and easier to understand later if you have to change it. Another benefit of using modules in your code is that you can easily reuse modules in other scripts.

Creating modules means dividing the script into smaller, more manageable pieces. Creating a module for each key task is an effective approach. Thus, you decide to create three primary script modules: the IP connectivity (:PINGTEST) module, the shares availability (:SHARTEST) module, and the services availability (:SERVTEST) module. As with most scripts, you also need several secondary modules to accomplish common tasks. For example, you need a file-checking (:FILES) module to determine whether the input files are available and a PIN-selection (:GETPINS) module to obtain the pager PINs for the IT staff members who are on call. Finally, to bind all the modules together, you need a main block of code, the :MAIN block, that calls each module.

The modules in ServerTester.bat are subroutines. Some of the modules also contain subroutines, or procedures, and those procedures in turn include subroutines. For example, the :PINGTEST module contains the :PINGIT procedure, which includes the :NAMEFAIL, :TRYIP, :IPFAIL, and :FINPING subroutines. Every subroutine that you call must begin with a colon followed by a label. Although subroutine labels are case-insensitive, the labels in ServerTester.bat are capitalized for easy identification. (Because the ServerTester.bat script is long, I've included only some of its modules here. You can find the entire script on the Win32 Scripting Journal Web site.)
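The nesting of modules and subroutines can be illustrated with a small stand-alone sketch. The :OUTER and :INNER labels are hypothetical and do not appear in ServerTester.bat; they only show how Call enters a labeled subroutine and how Goto :EOF returns to the line after the Call.

```batch
@echo off
rem Sketch only: :OUTER and :INNER are hypothetical labels.
:MAIN
call :OUTER
echo Back in MAIN
rem At the top level, Goto :EOF ends the script.
goto :EOF

:OUTER
echo In OUTER
rem A subroutine can call another subroutine.
call :INNER
echo Back in OUTER
rem Inside a called subroutine, Goto :EOF returns to the caller.
goto :EOF

:INNER
echo In INNER
goto :EOF
```

Running the sketch displays In OUTER, In INNER, Back in OUTER, and Back in MAIN, in that order.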

The :MAIN Block
The :MAIN block in Listing 1 isn't a module because you can't use this section in any other script—the section is specific to ServerTester.bat. However, the :MAIN section is vital because it directs the script's flow.

The :MAIN block begins with the Setlocal command, which sets a local scope for any environment variable changes that you make. Next, the :MAIN block calls the various modules. After the script proceeds through a module, that module's Goto :EOF command (EOF specifies end of file) returns the script to the :MAIN block, which advances the script to the next module. (For more information about how the Goto :EOF command works, see the Web exclusive sidebar "How the Call and Goto Commands Work." To view this sidebar, open the article "Scriptwriting Methodology, Part 2: Advanced Data Manipulation and Formatting" in the April 1999 issue. In the Article Info box, click the file under the heading Files/Code.)

After the :MAIN block calls the various modules, it uses the Endlocal command to restore the environment variables to the values they had before the Setlocal command. The block then displays the message "Testing is complete. 2 minutes until next test" and runs the resource kit's sleep.exe utility for 60 seconds. The ampersand (&) appends the Echo. command to the same line; Echo. places a blank line on the screen for easier readability. The block then displays the message "1 minute until next test" and runs sleep.exe for another 60 seconds. After the second sleep period, the script returns to the top of the :MAIN block, causing the script to repeat.
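The flow just described might look roughly like the following sketch. It is not the article's Listing 1. The order in which the modules are called is an assumption, the module bodies are empty stubs, and the sketch assumes the resource kit's sleep.exe is on the path.

```batch
@echo off
rem Sketch of the :MAIN flow, not the article's Listing 1.
rem Assumptions: module call order, stub module bodies, sleep.exe on the path.
:MAIN
setlocal
call :FILES
call :GETPINS
call :PINGTEST
call :SHARTEST
call :SERVTEST
endlocal
echo Testing is complete. 2 minutes until next test & echo.
sleep 60
echo 1 minute until next test & echo.
sleep 60
rem Repeat the whole test cycle.
goto :MAIN

rem Stub modules: each one returns to :MAIN immediately.
:FILES
goto :EOF
:GETPINS
goto :EOF
:PINGTEST
goto :EOF
:SHARTEST
goto :EOF
:SERVTEST
goto :EOF
```

The sketch loops until you stop it (for example, with Ctrl+C). Changing the two sleep values changes the interval between test cycles.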

Although ServerTester.bat uses the sleep.exe utility to repeat every 2 minutes, you can reconfigure the script to run at a different interval. You can use the AutoExNT Service to start the script. The AutoExNT Service lets the script run in the background, making the script less vulnerable to accidental shutdown. (For more information about the AutoExNT Service, see autoexnt.doc in the resource kit.)

The :PINGTEST Module
Your script must test for a ping response by both name and IP address. You need a way to parse the ping results to determine success or failure. Figure 1 shows the possible results if you ping a server by server name and IP address with the Ping -n 1 command. This command sends only one ping to the server name or IP address instead of the default four pings. The ping results show that you can use the first word in the fourth line to determine whether a ping was successful. If the first word of the fourth line is Reply, the ping succeeded. Any other result (e.g., no value, the word Request, the word Destination) indicates a failure.
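A minimal sketch of this first-word test follows. The target localhost and the firstword variable are stand-ins chosen for illustration. The skip=3 setting assumes that the reply appears on the fourth line of output, as in Figure 1; Ping output on other Windows versions might place the reply on a different line, in which case the skip count would need adjusting.

```batch
@echo off
rem Sketch only: localhost and firstword are stand-ins for illustration.
rem skip=3 assumes the reply is on line 4 of the Ping output (see Figure 1).
set firstword=
rem The If Not Defined guard keeps the first line read after the skipped
rem lines, in case Ping prints more lines after the reply.
for /f "skip=3 tokens=1" %%I in ('ping -n 1 localhost') do if not defined firstword set firstword=%%I
if "%firstword%"=="Reply" (echo Ping succeeded) else (echo Ping failed)
```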

Listing 2 contains the :PINGTEST module. You begin this module by using the Echo command to display the message "Pinging each server by both name and IP address." Next, you use the For command's file-parsing (/f) switch to parse the comma-delimited PingList.txt file for the server name and IP address, which are tokens 1 and 2, respectively. You use iterator variables (e.g., %%I) to capture the information. (For more information about using tokens, delimiters, and iterator variables when parsing files, see "Scriptwriting Methodology, Part 1," March 1999.) You then assign the captured server name and IP address to the serverp and ipaddp variables, respectively, and call the :PINGIT procedure. After the For command finishes parsing each line in PingList.txt and applying the :PINGIT procedure, the Goto :EOF command tells the script to return to the :MAIN block.

The :PINGIT procedure performs the two ping tests. Here's how the procedure works:

  1. The procedure uses the Time and Date commands with the /t switch to set the date and time for the page message and event log. The /t switch tells the commands not to include a prompt that asks you to specify a new date or time. (For more information about using the Time and Date commands, see "Scriptwriting Methodology, Part 2," April 1999.) The procedure uses the date and timestamp only if a ping test fails.
  2. The procedure sets the responsename and responseip variables to nothing on each run so that they don't hold any values. If you were to skip this step, the responsename and responseip variables would retain the values from the previous iteration, causing errors.
  3. The procedure pings the server name. In the ping result set, the For command skips the first three lines (skip=3) and captures token 1 (tokens=1) from the fourth line with the %%I variable. (The delims= specifies that you're using the default delimiter of a space or tab.) The procedure then assigns the value of the %%I variable to the responsename variable.
  4. The procedure tests whether the responsename variable's string matches the 'Reply' string. (The double equal sign, ==, specifies the comparison of two strings for equality.) If no match occurs (i.e., the name ping failed), the procedure goes to the :NAMEFAIL subroutine in step 5. If a match occurs (i.e., the name ping succeeded), the procedure goes to the :TRYIP subroutine in step 6.
  5. If the name ping fails, the procedure initiates paging with the :NAMEFAIL subroutine. The subroutine uses the Start command to begin an executable program that pages the on-call staff members with a message specifying the server name, type of failure (i.e., NAME Ping Failure), and date and time of failure. The executable program then records the recipients' pager PINs, server name, type of failure, and date and time of failure in FailureLog.txt.
  6. If the name ping succeeds, the procedure initiates the IP address ping with the :TRYIP subroutine. In the ping result set, the For command skips the first three lines and captures token 1 from the fourth line with the %%I variable. The procedure assigns the value of the %%I variable to the responseip variable for further processing in step 7.
  7. The procedure tests whether the responseip variable's string matches the 'Reply' string. If no match occurs (i.e., the IP address ping failed), the procedure goes to the :IPFAIL subroutine in step 8. If a match occurs (i.e., the IP address ping succeeded), the procedure goes to the :FINPING subroutine in step 9.
  8. If the IP address ping fails, the procedure initiates paging with the :IPFAIL subroutine. This subroutine works the same as the :NAMEFAIL subroutine, except that it identifies the type of failure as IP Ping Failure in the page and in FailureLog.txt.
  9. If the IP address ping succeeds, the procedure initiates the :FINPING subroutine. This subroutine uses the Goto :EOF command to send the script to the first line in the :PINGTEST module, which parses the next line in the PingList.txt file to obtain another server name and IP address pair for testing.
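The nine steps can be sketched as the following control flow. This is not the article's Listing 2. The paging program is replaced by an Echo command and a direct write to FailureLog.txt, the log line layout and the dt and tm variable names are made up for illustration, and the log line omits the paging recipient. As in the article, skip=3 assumes that the reply appears on the fourth line of the Ping output.

```batch
@echo off
rem Sketch of steps 1 through 9, not the article's Listing 2.
rem Assumptions: paging is simulated with Echo; the log layout is hypothetical.
call :PINGTEST
goto :EOF

:PINGTEST
echo Pinging each server by both name and IP address
for /f "tokens=1,2 delims=," %%I in (PingList.txt) do (
    set serverp=%%I
    set ipaddp=%%J
    call :PINGIT
)
goto :EOF

:PINGIT
rem Step 1: capture the date and time without a prompt.
for /f "tokens=*" %%I in ('date /t') do set dt=%%I
for /f "tokens=*" %%I in ('time /t') do set tm=%%I
rem Step 2: clear the results left over from the previous server.
set responsename=
set responseip=
rem Steps 3 and 4: ping by name and test the first word of line 4.
for /f "skip=3 tokens=1" %%I in ('ping -n 1 %serverp%') do if not defined responsename set responsename=%%I
if "%responsename%"=="Reply" goto :TRYIP

:NAMEFAIL
rem Step 5: simulated page and log entry for a name ping failure.
echo PAGE: %serverp% NAME Ping Failure %dt% %tm%
echo %dt% %tm%,%serverp%,NAME Ping Failure>>FailureLog.txt
goto :EOF

:TRYIP
rem Steps 6 and 7: ping by IP address and test the first word of line 4.
for /f "skip=3 tokens=1" %%I in ('ping -n 1 %ipaddp%') do if not defined responseip set responseip=%%I
if "%responseip%"=="Reply" goto :FINPING

:IPFAIL
rem Step 8: simulated page and log entry for an IP ping failure.
echo PAGE: %serverp% IP Ping Failure %dt% %tm%
echo %dt% %tm%,%serverp%,IP Ping Failure>>FailureLog.txt
goto :EOF

:FINPING
rem Step 9: return to the For loop in :PINGTEST for the next server.
goto :EOF
```

Because :PINGIT is entered with Call, each Goto :EOF inside it returns to the For loop in :PINGTEST, which then processes the next line of PingList.txt.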

Next Month—The :SHARTEST, :SERVTEST, :FILES, and :GETPINS Modules
