SQL Server Database Corruption, Part V: Storage Problem Alerts

Continuing on from our last post where we looked at setting up CHECKSUM verification as one means of helping more readily detect corruption early-on, we’ll now take a look at how to have SQL Server notify you of any instances where it runs into problems with the IO subsystem.

Configuring Alerts for Storage Problems

In addition to being able to quickly and easily detect corruption problems once CHECKSUM verification has been enabled, SQL Server is also robust enough to be able to watch out for other kinds of disk errors. In this regard, SQL Server can be set up to pro-actively send alerts when it encounters simple errors such as read-write failures, or even when it proactively attempts to issue multiple re-read attempts against ‘sticky’ or problematic data (that is typically a clear sign of impending problems at the IO subsystem level).

Therefore, to take advantage of SQL Server’s innate ability to track these kinds of errors, all you need to do is:

Ensure that you’ve got SQL Server’s Database Mail functionality set up and properly configured. (Setting this up is outside the scope of this document, but is easy enough to do by right-clicking on the node and walking through a wizard where you establish connectivity and settings that enable SQL Server to ‘talk’ to an SMTP server within your organization).

PartV_Figure1
Create at least one SQL Server Agent Operator – as per the screenshot below:

PartV_Figure2

And, again, the creation of operators is outside the scope of this document – but is easy enough to do by right-clicking the Operators node and walking through the process of creating at least a ‘catch all’ operator such as ‘Admins’ or ‘General’ that can be mapped to something like [email protected] or some other monitored email address where alerts can be sent.
Configure SQL Server to forward instances of common IO errors ‘up’ to a SQL Server Agent Operator so that you can be notified of errors AS they occur.

To implement this last step, execute the following code on your SQL Server to create the initial alerts needed to forward SQL Server errors 824, 825, and 826 as they occur:

USE msdb
GO

EXEC msdb.dbo.sp_add_alert @name = N'823 - Read/Write Failure', 
    @message_id = 823,
    @severity = 0,
    @enabled = 1,
    @delay_between_responses = 0,
    @include_event_description_in = 1
GO

EXEC msdb.dbo.sp_add_alert @name = N'824 - Page Error', 
    @message_id = 824,
    @severity = 0,
    @enabled = 1,
    @delay_between_responses = 0,
    @include_event_description_in = 1
GO

EXEC msdb.dbo.sp_add_alert @name = N'825 - Read-Retry Required', 
    @message_id = 825,
    @severity = 0,
    @enabled = 1,
    @delay_between_responses = 0,
    @include_event_description_in = 1
GO

Then, once these alerts are configured, refresh the Alerts node and then right click on each of them, select Properties and then set up the Response tab to Notify operators when an error occurs – as per the following screen shot:

PartV_Figure3

By means of this approach, you’ll be alerted any time SQL Server encounters disk failures – which are almost always related to the potential for corruption or associated with processes and situations that create corruption.

Up Next

Now that you’ve configured SQL Server to alert you when it encounters storage problems, you’re now better able to be alerted to problems with corruption (or other disk subsystem issues) when the happen – instead of letting them effectively go unnoticed until it becomes harder (or even too late) to deal with them. But, there’s much more that you can do to stay apprised of the potential for corruption, such as schedule regular checks – which we’ll look at in our next post.

Part VI: Regular Corruption Checks

Comments

Plain text