Transaction log corruption and backups

Question: I came across a situation where a transaction log backup failed because of corruption in the transaction log. One of my colleagues put the database offline and then online and problem went away. Can you explain why and whether log corruption can lead to a suspect database?

Answer: Transaction log corruption is interesting because it doesn’t usually cause any problems apart from failed backups. However, that doesn’t mean it should be ignored.

As I’ve stated many times, the vast majority of corruptions are caused by the I/O subsystem and a transaction log file is just as likely to be corrupted as a data file. Unless the corruption occurs in the active portion of the log (the portion that is required for some reason by SQL Server – see this article for background information) then its likely that no-one will ever know it occurred.

This may seem disturbing to you, and in a way it is – undiscovered corruption isn’t good – but there’s no way for SQL Server to process the entire log, only the active portion. DBCC CHECKDB doesn’t analyze the log at all any more since I rewrote it in SQL Server 2005, instead it will only use the active portion of the log as a by-product of creating a database snapshot to run the consistency checks on. There’s no consistency checking of the transaction log – only checksums that are checked as log records are read, for whatever reason.

The active portion of the log will also be used if a transaction rolls back, if crash recovery has to run, or when a transaction log backup occurs (plus a bunch of other uses like replication and mirroring – the aforementioned article has more information). If a damaged log record is encountered during one of these uses then an error will be thrown and the operation will fail.

The only time a database will be marked SUSPECT because of a corruption in the transaction log is if the corruption is encountered during crash recovery or during a transaction rollback. In that case, the operation will fail and the database will be transactionally inconsistent – the definition of a suspect database.

If a corrupt log record is encountered during a transaction log backup, the backup will fail – but that’s all. You can work around this by:

Switching the database to the Simple recovery model
Performing a checkpoint (which should clear the active log as long as nothing else requires the log to be kept active)
Switching back to the Full recovery model
Reestablishing the log backup chain by performing a full or differential backup

This isn’t guaranteed to work in all situations, but is the simplest method to have the backups essentially skip the damaged portion of the log. Just be aware that by doing this, you’re limiting your disaster recovery options if the data backups become corrupt and you want to be able to restore a long chain of log backups – you’ve broken the chain at the point you switched to Simple. However, you really have no choice as there’s no way to back up that log and be able to use the log backup in any meaningful way.

Of course, as in any corruption situation, make sure to do some root-cause analysis to figure out why the log became corrupt in the first place.

Comments

Plain text