Microsoft SQL Server databases comprise at least two files: the data file and the transaction log file. The data file is where all the focus tends to be. That is where you build out the schema, defining everything from tables and relationships between tables, to every defining property of every database object. Most importantly, this is where you store all the data you’ve created this database for in the first place. While you need at least one data file, you can (and, I normally recommend, should) have multiple files. This allows you to distribute data among physical storage locations and tiers of storage performance to fit your needs.
Meanwhile, you have the lowly transaction log file. Few data professionals tend to even look inside the transaction log. Many seem to regard the transaction log with trepidation or even fear because it can cause some real problems if not well cared for. The recommendation is that you only ever have a single transaction log file, as Microsoft SQL Server will “round-robin” through multiple log files versus striping across them as a performance boost. You can even attach a data file from a previously detached database to an instance without the associated transaction log file.
So, here you have a tale of two file types, where one is generally perceived as important and the other isn’t--it’s a black box that occassionally causes issues. In fact, just like all good partners, one isn’t good without the other. The transaction log plays a critical role in everything your database was initially created to do. It hardens logged transactions to disk as query requests are processed. It enforces the atomicity of transactions. It allows for rolling back uncommitted transactions and rolling forward committed transactions yet to be written successfully to the data file in event of server failures, fail-overs, and other situations where the Microsoft SQL Services are stopped and restarted either intentionally or otherwise.
I’d say that the transaction log is very much like the Wizard in The Wizard of Oz: It’s quietly running things while the real work appears to be done elsewhere in the data file. Of course, one doesn’t work without the other: The transaction log is no good without the data and structure associated with the data file; the data file is of no use without a process for logging transactions and providing the ability to harden transactions asynchronously to the database, as well as manage all the intricacies needed to support high availability and disaster recovery needs.
Microsoft SQL Server has Dynamic Management Views (DMVs) associated with surfacing transaction log metadata to the interested end user. There are four to be discussed here:
I’ll be reviewing each one individually below, then wrapping up this article with a diagnostic query that ties them all together.
The primary purpose for this DMV is to provide a join conduit from the transaction log DMVs I’m covering here to the stack of execution-centric Dynamic Management Objects (DMOs): dm_exec_requests | _sessions | _connections, dm_exec_sql_text, and others that provide insight into the who, what, and where of all user and system load running on a SQL Server instance. dm_tran_session_transactions exposes both session_id and transaction_id columns, which are the unique identifiers for their respective stacks. Any join you wish to make to look at transaction log metadata and execution info must use dm_tran_session_transactions to do so effectively.
There are only two other columns of interest provided beyond the linking functionality stated above: is_user_transaction, which allows you to filter out system transactions from user-initiated transactions, and open_transaction_count, which presents the total number of open transactions for any given session_id.
This dynamic management view offers insights into the current session’s transaction. Considering that scope, I rarely find this DMV useful because you really can’t query it for information on the current session’s transaction unless you call the DMV in the session you’re attempting to gain insight on from within the query being run. For example, the following query could be used to see all of the metadata from dm_tran_current_transaction:
SELECT * FROM sys.dm_tran_current_transaction;
Alone, the results would only return the metadata for that statement. This is of no value when looking to get information on true transaction load for the current session. The only way to make this DMV usable for diagnostic purposes is to add it in advance to whatever query you’re looking to gain insight into through this DMV. Furthermore, it must be embedded in the same explicit transaction as the code you’re looking to gain transaction log insight into. Something like this as an example:
BEGIN TRANSACTION FOO DELETE FROM TAT_demo.dbo.Temps WHERE Temperature IN ( SELECT TOP 50000 Temperature FROM TAT_demo.dbo.Temps ORDER BY Temperature ); SELECT * FROM sys.dm_tran_current_transaction; COMMIT TRANSACTION FOO
By contrast, if not part of the transaction, dm_tran_current_transaction will only return information about the log impact of the call to that DMV, not the relevant T-SQL code.
If you wish to get insight into the transaction log impact of code running inside your current session, I’d suggest using a query using dm_tran_session_transactions and dm_tran_active_transactions in order to filter on your session_id:
SELECT TST.session_id , transaction_id , open_transaction_count , is_user_transaction , TAT.interesting_columns_here FROM sys.dm_tran_session_transactions AS TST INNER JOIN sys.dm_tran_active_transactions AS TAT ON TCT.transaction_id = TAT.transaction_id WHERE TST.session_id = session_id_of_interest;
Not only does this work as a bypass to valuable info out of dm_tran_current_transaction, it also is appropriate for any session(s) you’re looking to filter to.
Of course, you’ll need to replace the code above with an appropriate list of columns from sys.dm_tran_active_transactions and a proper session_id value, but we have yet to discuss the information afforded from dm_tran_active_transactions, so I’ve left that to your imagination for the time being.
Imagination time is over. This DMV provides general transaction log metadata at the granularity of the transaction. That limits some of the insights we can gain because any database-level metrics are out of scope of this DMV. Like all the DMVs being discussed here, transaction_id is the unique identifier for any transaction in the database and is the column to join on for the diagnostic queries in this article.
The columns of interest here, in addition to transaction_id are:
- name - If this is a named transaction that name will be presented here. Otherwise it’s populated with the type of transaction the underlying code is running (for example, “SELECT” or “DELETE”).
- transaction_type - This column presents as an integer value and is identical to the column of the same name in dm_tran_database_transactions, but with an additional value available to denote a distributed transaction type. Insight into the meanings of these integer values can be found here. I’ve also included shortened descriptor values in the comprehensive query that closes this article. I prefer to use this DMV’s column in that query over the identically named one exposed in dm_tran_database_transactions because it includes that additional value for distributed transactions.
- transaction_state - Similar to transaction_type, this is presented as an integer code that refers to a listing of descriptors also available here. Also, as with transaction_type, there is an identically named column in dm_tran_database_transactions. But while there are many similarities between the columns in both DMVs, there are enough differences that I include them both in this article’s closing query.
There are also various distributed tranaction-based queries in dm_tran_active_transactions that are not within the scope of this article. More information on those columns can be found here.
The final DMV I’m reviewing today is dm_tran_database_transactions. This DMV provides a view of transactions at the database level of granularity. As such, I hit this DMV whenever I want to see what database(s) are associated with a given transaction or transactions through the DB_Name(database_id) system function. Like all the other DMVs in this article, transaction_id is used for joining and for identifying a particular transaction.
This DMV is also my main source of transaction log information. It provides detailed information on log consumption for any transaction (user or system sourced) in terms of transaction log record counts and the amount of space in bytes. dm_tran_database_transactions also presents metadata for LSNs associated with any transaction. I’ll briefly touch upon what LSNs and VLFs are in the next section of this article.
The listing of columns I find relevant on a regular basis from dm_tran_database_transactions and their definitions, if not obvious from their naming conventions, are:
- database_transaction_type - referenced earlier, this presents information similar to transaction_type in dm_tran_active_transactions.
- database_transaction_state - similar, but different when compared to dm_tran_active_transactions’ transaction_state column.
- database_transaction_log_record_count - sum of all records associated with a transaction_id, aggregated at the database level. If you have spills to tempdb you’ll see records in the results of this DMV to both the source database and tempdb.
- database_transaction_log_bytes_used | _reserved - amount of bytes used and reserved for consumption by a transaction. While the results shown in these columns are specifically for user transactions, there are two other columns by the same name appended with “_system” that accounts for any internals activity that may occur in order to facilitate the source user transaction.
- database_transaction_last_lsn - last LSN currently associated with the transaction.
- database_transaction_commit_lsn - LSN associated with the commit log record for the transaction.
- database_transaction_last_rollback_lsn - if there was a rollback, the LSN based upon that rollback will be listed here. If there have been no rollbacks, this will the the maximum LSN
- database_transaction_next_undo_lsn - LSN of next record to undo
An Aside about LSNs and VLFs
An LSN, or Log Sequence Number, uniquely identifies a log record in the transaction log. LSNs are ordered numerically. The LSN value also has significance that I’ll explain here, but is outside the scope of this article.
The LSN value is made up of three parts: the VLF (virtual log file) in the transaction log that the transaction information is stored, an offset value to the log block in the VLF for this transaction, and the slot number inside of the log block. Depending on where you look, you’ll see LSNs identified explicitly in their three-part naming or as the decimal equivalent. Inside of the DMVs they're presented as decimal values.
Here's the best way I can describe at a high level what a virtual log file (VLF) is: It’s a building block of the transaction log file. Each transaction log file is made up of many smaller VLFs. The original sizing and number of VLFs is a factor of the original size setting for the transaction_log. Any time a transaction log file has to grow, additional VLFs are created at a count and size based upon that growth increment. A transaction log with a large amount of small VLFs will lead to less efficient operations, including, but not limited to, log backup/restore activity and database startup. There are also implications on performance of certain high-availability constructs with regard to proper sizing of VLFs. A broader discussion around VLFs is out of scope for this article.
Putting it All Together
Now that I’ve given you some insight into these DMVs, it’s time to show you a comprehensive query that pulls all of them together. Please understand that not all stories you want to tell about the transaction log or questions you want answered need to have all these columns returned. Comment out those that are unnecessary or build your own using this logic as a guide. In a forthcoming article, I will show how you can use these DMVs to performance tune code you’re developing.
SELECT DB_NAME(TDT.database_id) AS the_db , TAT.transaction_id , TAT.[name] AS trx_name , TST.is_user_transaction , CASE TAT.transaction_type WHEN 1 THEN 'R/W' WHEN 2 THEN 'R/O' WHEN 3 THEN 'SYSTEM' WHEN 4 THEN 'DISTRIB' END AS TAT_type
,CASE TAT.transaction_state WHEN 0 THEN 'STILL INITIALIZING' WHEN 1 THEN 'INITIALIZED, NOT STARTED' WHEN 2 THEN 'ACTIVE' WHEN 3 THEN 'ENDED' WHEN 4 THEN 'DIST COMMIT IN PROGRESS' WHEN 5 THEN 'PREPARED, AWAITING RESOLUTION' WHEN 6 THEN 'COMMITTED' WHEN 7 THEN 'BEING ROLLED BACK' WHEN 8 THEN 'ROLLED BACK' END AS active_state , CASE TDT.database_transaction_state WHEN 1 THEN 'TRX NOT INITIALIZED' WHEN 3 THEN 'INITIALIZED, BUT NO LOG RECORDS YET' WHEN 4 THEN 'TRX HAS GENERATED LOG RECORDS' WHEN 5 THEN 'TRX PREPARED' WHEN 10 THEN 'TRX COMMITTED' WHEN 11 THEN 'TRX ROLLED BACK' WHEN 12 THEN 'STILL BEING COMMITTED-LOG RECORD BEING GENERATED, BUT NOT MATERIALIZED YET' END AS database_state , TAT.transaction_begin_time AS trx_begin_time , TDT.database_transaction_log_record_count AS trx_log_records , TDT.database_transaction_log_bytes_used AS trx_log_bytes_used , TDT.database_transaction_log_bytes_reserved AS trx_log_bytes_reserved , TDT.database_transaction_log_bytes_used_system AS trx_log_bytes_used_system , TDT.database_transaction_log_bytes_reserved_system AS trx_log_bytes_reserved_system , (TDT.database_transaction_log_bytes_used + TDT.database_transaction_log_bytes_reserved + TDT.database_transaction_log_bytes_used_system + TDT.database_transaction_log_bytes_reserved_system) AS total_log_bytes , TDT.database_transaction_begin_lsn AS begin_lsn , TDT.database_transaction_last_lsn AS last_lsn , TDT.database_transaction_most_recent_savepoint_lsn AS last_savepoint_lsn , TDT.database_transaction_commit_lsn AS last_commit_lsn , TDT.database_transaction_last_rollback_lsn AS last_rollback_lsn , TDT.database_transaction_next_undo_lsn AS next_undo_lsn FROM sys.dm_tran_active_transactions AS TAT INNER JOIN sys.dm_tran_database_transactions AS TDT ON TAT.transaction_id = TDT.transaction_id INNER JOIN sys.dm_tran_session_transactions AS TST ON TAT.transaction_id = TST.transaction_id ORDER BY 2, 1;
In my test environment, the results look something like the following. I’ve exported results to Excel for cleanup and have split them for sake of formatting constraints on this site, repeating the first 3 columns and adding a row number for easier reading:
Figure 1: Results - Type and State Information
Figure 2: Results - Physical Log Impact Information
Figure 3: Results - LSN Information
I’ve highlighted the two results that show how you may see more than a single record for one transaction. In this case, I caught the transaction after being initialized, but prior to log files being generated.
I hope the information here helps you to get more insight and understanding of the SQL Server transaction log. It’s a critical component of this platform and has many stories to tell.