Automate SQL Server Error Log Checking

SQL Server has a built-in GUI you can use to expand the SQL Server Logs node to check error logs. In SQL Server 2000 this GUI is the Enterprise Manager; in SQL Server 2005 it’s SQL Server Management Studio (SSMS). Because using these tools to examine log contents can be slow, many people instead log on to the server console to read the physical logs with WordPad. In addition, some DBAs use the undocumented stored procedure sp_readerrorlog to view logs. However, these methods are inefficient and labor intensive. If you’re managing thousands or even just hundreds of SQL Server systems, you might spend most of the day manually checking each server. An alternative is to use a third-party tool to automate the process. But even this solution isn’t ideal; the tool you choose might lack certain flexibility, such as the ability to add or remove target events at will or the ability to adjust the reporting time frame for error logging. Moreover, third-party tools generate additional costs and installation time.

My solution was to create my own automated, customizable, flexible, and accurate method for searching and reporting on SQL error logs. I designed my solution for an environment in which a large number of SQL Server 2005 and SQL Server 2000 instances coexist.

My Environment

The SQL Server farm that I manage consists of about 50 SQL Server 2005 systems and 50 SQL Server 2000 systems. I have a dedicated SQL Server 2005 box that serves as a central monitoring server. This system runs SQL Server 2005 Enterprise Edition SP2 on top of Windows Server 2003 SP2. I established linked servers on this system to connect to the remote client server via the current login’s security context. Separate jobs run on the central monitoring server and the remote client servers. You can view these processes as being built on a client/server architecture. The client-side job saves meaningful searched errors in a table based on certain customizable criteria. The server-side job polls information from the clients and sends email reports if errors are found. Note that I use the terms “clients,” “client servers,” and “remote client servers” interchangeably to describe the servers being monitored (like a client).

Findstr.exe

The DOS utility findstr.exe is fast and powerful for searching ASCII files for strings. When combined with files containing a group of strings to be included or excluded, findstr.exe is an ideal tool for searching SQL error logs in SQL Server 2000.

The following example demonstrates findstr.exe’s capabilities. First, build the folder structure on a remote client server by creating a folder named SQLErrorReport at the C root. Copy into this folder the inclusion file that Web Listing 1 contains and the exclusion file that Web Listing 2 contains. Rename these files findstr_incl.txt and findstr_excl.txt, respectively. Make sure the SQL Server Agent service account has read and write permissions to this folder. The remote client process calls the stored procedure Usp_SQLErrorLogReportClient. Web Listing 3 contains the code for this stored procedure, which enumerates and reads error logs within the desired time frame, as well as searches for and saves DBA-defined error strings. The client process is implemented through a scheduled job, which Web Listing 4 contains. Run Web Listing 3 and Web Listing 4 to prepare the client process. (Later in the article, I’ll explain how to quickly create a process in case you have many SQL Server systems to monitor and you don’t want to create each one manually.) You can change the job schedule to suit your needs.

The stored procedure Usp_SQLErrorLogReportClient now resides in the master database, with dbo as schema. The procedure takes three input parameters: @SinceWhen, @RootPath, and @ShowResult. @SinceWhen is a datetime value that defines the start time when error checking begins. The default is 12 hours ago, which ensures that no logs are skipped when a server restart happens during off-work hours. When a report runs, it typically logs the timestamp in a table so that it knows where in the logs to start on the next run. Customized reports typically scan the current error logs or the entire error log. This approach risks skipping logs because they are recycled every time a SQL Server service restarts. Another drawback to this method is that it repeats old search results. @RootPath is where the home working folder resides. Placing it at the C root makes the code more portable. @ShowResult is useful when a DBA wants to run ad hoc checks—for example, you can run the following T-SQL statement to check all errors for this year:

EXEC master.dbo.Usp_SQLErrorLogReportClient   '2008-01-01', NULL, 1

The findstr.exe utility doesn’t work with binary files—so you can’t use findstr.exe to check SQL Server 2005 logs, because SQL Server 2005 writes error logs in binary. The easiest method for searching SQL Server 2005 logs is to use T-SQL.

SQL Server 2005 error logs are formatted differently than SQL Server 2000 error logs. The most obvious difference is the separation of the log date from the log text, which makes sorting by date easy. The stored procedure Usp_SQLErrorLogReportClient has a WHERE clause that filters out a list of search strings if the database engine is SQL Server 2005.

Fine-Tuning

You can tighten or loosen the search criteria based on your environment. For example, if you don’t want to see the message “SQL Server terminating because of system shutdown” when searching SQL Server 2000 error logs, you can remove the keyword “terminating” from findstr_incl.txt (Web Listing 1). Alternatively, to avoid other error messages containing the word “terminating” from being filtered out, you can add the text “terminating because of system shutdown” to findstr_excl.txt. For SQL Server 2005 client servers, modify the WHERE clause to include the following condition in Usp_SQLErrorLogReportClient:

OR (LogText LIKE N'%terminat%' AND LogText    NOT LIKE N'%shut down%')

The DBA should customize the search strings to determine what data to include in reports.

The Central Monitoring Server

The central monitoring server’s process runs under the job in Web Listing 5, which calls Web Listing 6’s stored procedure Usp_SQLErrorLogReportServer in the master database. Run both listings. Web Listing 5 loops through a long list of remote client servers and extracts their saved error log information.

Many companies have numerous SQL Server systems that aren’t always accessible at the same time. To avoid interrogating a client server that is temporarily offline when the server-side job runs, I used the odbcping.exe utility to include database connectivity checking logic in Usp_SQLErrorLogReportServer. This tool is a command-shell utility that lets you test the ODBC data source’s integrity and the client’s ability to connect to the server.

Although odbcping.exe was phased out in SQL Server 2005, I’ve found it to be reliable for detecting whether SQL Server 2000 systems are accessible. The utility isn’t installed by default. You can find it in the SQL Server 2000 installation disk’s \x86\binn directory. Copy the executable to the C root. You can also download odbcping.exe from the SQL Server Magazine website; go to www.sqlmag.com, enter InstantDoc ID 99934, and click the Download the Code button. Remote login is required for odbcping to work. Later in the article, I discuss a quick way to create a remote login on client servers.

The table to store all error log information on the monitoring server is Tbl_SQLErrorLogReport, which resides in the tempdb database. This table can accommodate both SQL Server 2005 and SQL Server 2000 error log formats. If this table generates any entries, the Usp_SQLErrorLogReportServer stored procedure will send an email notification to a defined recipient, with the involved client server’s name in the header and error messages in the body of the message. For email notification to work, you must have correctly configured database mail on the central monitoring server. Figure 1 shows an email report sent to the DBA group and received by a Lotus Notes 7.0 client.

Mass Operations

Manually setting up the client processes on a large number of SQL Server systems is difficult. In addition, modifying the client processes on hundreds of servers (e.g., adding a new keyword in the inclusion files) is equally tedious. I created the stored procedure Usp_Mass_Operation_On_All_Clients, which WebListing 7 contains, to handle mass operations onSQL Server clients. This procedure includes fourinput parameters: @InputFileFullPath, @HasWritePermissionOnClient, @RootPathOnMonitoringServer, and @RootPathOnClient.

The parameter @InputFileFullPath must be a file containing valid T-SQL statements. The parameter @HasWritePermissionOnClient is a bit value that works solely with DOS commands, such as xcopy. When the parameter’s value is 1, the SQL Server service account on the central monitoring server has appropriate permission on the remote client servers. If the value is 0, the Mssqlserver account doesn’t have permission to run command shell commands on client servers. In this case, the stored procedure generates a result that can be copied and pasted into a batch file. You can then run the batch file after you log in to the client server via a proper user account. The parameter @RootPathOnMonitoringServer is the source file path; @RootPathOnClient is the destination file path. They are identical by default, but they don’t have to be.

To begin, create the SQLErrorReport folder under the central monitoring server’s C root and copy findstr_incl.txt and findstr_excl.txt into this folder. The SQLErrorReport folder and its contents will be rolled out to all the client servers.

You can use the same stored procedure to also create the client-side jobs. Open the SQL Editor on the central monitoring server and run the following code (you need remote write privileges to execute this code):

EXEC master.dbo.Usp_Mass_Operation_On_All_
  Clients @InputFileFullPath = 'C:\Client_
  job.sql', @HasWritePermissionOnClient = 
  1, @RootPathOnMonitoringServer = NULL, @
  RootPathOnClient = NULL

This code runs the client_job.sql file to create the job on all clients, and it copies the SQLErrorReport directory and its files to the same locations on all clients. Note that the client_job.sql file is the same file as Web Listing 4. Also note that the login account and password used with the OSQL utility in Web Listing 7 must be replaced with appropriate login credentials.

If you want to add, remove, or modify a search string in the inclusion file on all clients, but the central monitoring server’s SQL Server service account isn’t able to write to clients, you can make the change in the findstr_incl.txt file on the central server first, then save the result from running the following code to, for example, a file named Update_search_strings.bat.

EXEC master.dbo.Usp_Mass_Operation_On_
  All_Clients @InputFileFullPath = NULL,
  @HasWritePermissionOnClient = 0, @
  RootPathOnMonitoringServer = NULL, @
  RootPathOnClient = NULL

Figure 2 shows the results of running this code. Run the batch file under an account that can write to the client servers, such as a domain user account that is a member of the local administrator group on the client servers. Figure 3 shows the results of running the batch file.

Figure 2: SSMS query result

You can now automatically create the client stored procedure Usp_SQLErrorLogReportClient on multiple SQL Server systems. You need to save the stored procedure (which Web Listing 3 contains) as Usp_SQLErrorLogReportClient.sql. Then, run the following T-SQL statement to simultaneously create the stored procedure on all clients:

EXEC master.dbo.Usp_Mass_Operation_On_
  All_Clients @InputFileFullPath = 'C:  temp\ Usp_SQLErrorLogReportClient.sql', 
  @HasWritePermissionOnClient = NULL, @
  RootPathOnMonitoringServer = NULL, @
  RootPathOnClient = NULL

To create the client-side login, run the following code:

EXEC master.dbo.Usp_Mass_Operation_On_All_
  Clients ‘C:\temp\Create_OdbcpingLogin_
  account.sql’, NULL, NULL, NULL

The code for the Create_OdbcpingLogin_account.sql file is as follows:

IF NOT EXISTS (SELECT * FROM master.
  dbo.syslogins where \[loginname\] = 
  N'OdbcpingLogin')
  EXEC master.dbo.sp_addlogin 
  N'OdbcpingLogin', N'Pswd2008';

Refine As You Go

Although my solution can save DBAs a tremendous amount of time and labor, it shouldn’t prevent you from manually checking your error logs from time to time. Regularly investigating logs by hand will help you discover events that have been missed by the automated process, so that you can update your error checking processes in a dynamic database environment. Many factors can affect whether your automated error checking process works well. When rescheduling jobs, make sure the central monitoring server job runs after the completion of client-side jobs. In addition, if you modify the inclusion or exclusion files, be sure to watch for white spaces. You don’t typically see the hidden blanks in WordPad or Notepad but they can cause unexplained problems. Finally, remember that each line in the final report represents a problem or important event that needs further investigation. Although my technique works out of the box, it will be most effective if you continue to refine it for your environment.

Comments

Plain text