Q: I’m confused by some behavior I’ve noticed with differential backups. When I update a large portion of a table (say, 10GB), the next differential backup is about 10GB. When I perform the same update and then roll back the transaction, the next differential backup is still 10GB. Why isn’t the differential backup empty in the second case?
A: You’re correct that the behavior doesn’t seem intuitive at first—nothing changed in the data, right? However, SQL Server is absolutely doing the right thing under the covers. The explanation of this behavior has two parts.
First, I’ll explain how SQL Server knows what to include in a differential backup. The behavior is the same for file, file groups, and database differential backups (I’ll just call them differential backups for simplicity). A differential backup includes everything that has changed since the last full backup, with the minimum unit of change being an extent (i.e., a 64KB chunk of a data file comprised of 8KB x 8KB data file pages).
When a change is made to a data file page, the extent it’s a part of is marked as changed in a special bitmap called the differential bitmap or differential change map. There’s a differential bitmap tracking each 4GB portion of each data file. A differential backup scans the differential bitmaps for the portion of the database being backed up and includes all extents marked as changed. You can read more about the differential bitmap and other specialized allocation bitmaps in my blog post "Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps."
A full backup is the only operation that clears the differential bitmaps, so subsequent differential backups without intervening full backups can become larger and larger as more of the database changes. You can see how much of a database has changed since the last full backup using the script in my blog post "New script: How much of the database has changed since the last full backup?" Some people use this script in production to help automatically decide when to take a full backup instead of another differential backup, and it can also give you a measure of the churn rate of the database.
The second part of the explanation gets to the crux of your question—why doesn’t rolling back a transaction clear the differential bitmap bits that the transaction caused to be set in the first place? Well, it can’t.
Simply put, every change made to a SQL Server database must generate a transaction log record describing the change. A complex operation might generate many transaction log records, each describing a change to a data file page. Each data file page has a field in the page header (i.e., the first 96 bytes of the page that contain metadata about the page and its contents) that tracks the last transaction log record that affected that page. This field is used for crash-recovery purposes, which is beyond the scope of this article, and each transaction log record tracks which data file page it affected. (You can read more about page header fields and their meanings in my blog post "Inside the Storage Engine: Anatomy of a page.")
When a transaction rolls back, all of the changes it made in the database essentially have to be reversed. Performing the reverse of an INSERT is a DELETE, the reverse of a DELETE is an INSERT, and so on. This process is driven from the transaction log records the transaction generated—they’re undone in the opposite order from which they were generated by the original transaction.
The main point here is that rolling back a transaction involves performing more changes to the database and generating more transaction log records. This means the data file pages involved are changed again, even though the change is undoing the effects of the first change.
As far as the logical operation of backups is concerned, it doesn’t matter whether the pages in a data file extent were changed by a transaction that was subsequently rolled back—they were changed, so they have to be backed up by the next differential backup. That’s why a rolled-back transaction still results in data being included in a differential backup.