Although the SQL Server database engine is the core service for processing data, it's often misunderstood. However, it helps to know how the database engine performs the fundamental operations of retrieving and modifying data, including the transaction log's role. This information can help you architect disaster recovery strategies and understand the role of memory and disk on database performance.
The database engine has two major components: the storage engine and the query processor. The storage engine writes data to and retrieves data from stable media (e.g., disks). The query processor accepts, parses, and executes SQL commands.
When a user submits a SELECT statement, he or she is asking the query processor to retrieve data that resides in data files on the disk. These files usually have an .mdf or .ndf file extension. In addition to the data, the data files contain indexes and other objects. The indexes are copied to a server's RAM, as Figure 1 shows. This memory has several names, including buffer cache, buffer pool, and bpool. If the indexes are already in the buffer cache, the database engine doesn't need to copy the indexes from the disk again.
Once in the buffer cache, the query processor uses the indexes to find the desired data rows in the data files. To simplify this discussion, let's assume the table queried with a SELECT statement has a clustered index. (Tables without a clustered index are called heaps and have a different structure.) The data files reside on the leaf level of the clustered index. A table can have only one clustered index. The other indexes are called nonclustered indexes. Nonclustered indexes contain row locators that point to the data row in the clustered index. After the query processor uses the available clustered and nonclustered indexes to retrieve the data, it returns the result set.
When a user submits an INSERT, UPDATE, or DELETE statement to modify data, the transaction log gets involved because data modification statements are always enclosed in a transaction, even if they're not coded as such. When you enclose a data modification statement in the BEGIN TRANSACTION and COMMIT TRANSACTION statements, it's considered an explicit transaction. When you don't use those statements, it's considered an implicit transaction. Whether implicit or explicit, committing the transaction makes all data modifications in that transaction a permanent part of the database.
A transaction log sequentially records all of a database's modifications and other types of transactions. (Microsoft recommends having only one transaction log per database.) A transaction log typically has an .ldf file extension.
The transaction log supports the A in ACID. That is, it helps transactions achieve atomicity, which is one of the four properties (Atomicity, Consistency, Isolation, and Durability) that transactions should possess if they're going to be processed reliably. Atomicity simply refers to the idea that a transaction needs to be processed completely or not at all. In other words, if a transaction fails at any point in the process, the entire transaction must be rolled back.
The process leading to a committed transaction is a complicated one. When a user submits a data modification statement, SQL Server reads the data files affected by the modification into the buffer cache (assuming they aren't already present in the buffer cache from a previous operation), updates the values in the buffer cache, then records the modification in the transaction log. Recording the modification is a vital step because, for a transaction to commit, the change information must be written to the transaction log. This requirement is called write-ahead logging.
Data files in the buffer cache that have been modified but the modifications haven't yet been written to disk are called dirty data pages. Dirty data pages are later written, or flushed, to disk by an asynchronous write process, as Figure 2 shows. There are three types of asynchronous write processes:
- Eager writing process. This process writes dirty data pages associated with nonlogged (e.g., BULK INSERT) operations to disk.
- Lazy writing process. This process wakes up periodically to write dirty data pages to disk and remove infrequently used pages from the buffer cache.
- Checkpoint process. The checkpoint process writes dirty data pages to disk, including those pages whose transactions haven't been committed yet. Its primary purpose is to keep the number of dirty data pages to a minimum.
Data can be modified several times in the buffer cache before the dirty data pages are flushed to disk.
Upon SQL Server startup or a database restore, the database undergoes a recovery process. During recovery:
- Log records of data modifications not flushed to disk are rolled forward (they're replayed). For example, suppose a modification was made to a data file in the buffer cache and that modification was recorded in the transaction log as required by write-ahead logging, but the data file wasn't written to disk because the SQL Server service failed. When the SQL Server service restarts, the data file will be reloaded into the buffer cache and updated with the modification based on the information found in the transaction log.
- Log records of data modifications associated with incomplete transactions are rolled back (they're undone). For example, suppose a modification was made to a data file in the buffer cache, the modification was recorded in the transaction log, and the modification was written to disk by the checkpoint process. However, the transaction remained open and did not commit when the SQL Server service failed. When the SQL Server service restarts, the incorrect data file will be read into the buffer cache and corrected with the information found in the transaction log.
Exploring the Transaction Log Further
The transaction log is logically divided into virtual log files. VLFs determine the boundaries of transaction log reuse and truncation. A transaction log is a "wrap around" file in that inactive portions are overwritten with new log records. As Figure 3 shows, the active portion of the transaction log is from the minimum recovery log sequence number (MinLSN) to the last written log record. MinLSN is set by the checkpoint process. Log records preceding MinLSN can be truncated. Log records after MinLSN are active and can't be truncated.
The checkpoint process first marks a VLF as reusable. The log truncation process then marks the VLF as logically truncated. Truncated VLFs can be overwritten. Note that the physical transaction log file isn't shrunk by the log truncation process.
For databases with the full recovery model, log truncation usually occurs after a log backup and requires a checkpoint beforehand. For databases with the simple recovery model, log truncation usually occurs after a checkpoint. Log truncation won't occur if a data backup is running, if there's an active transaction, or if transactional replication is waiting for a transaction to be delivered to the distribution database. If the log truncation process fails, the transaction log will grow until it fills up the disk it resides on.
If you'd like more information about how the transaction log, data modification operation, or data retrieval operation works, check out these articles: