Datetime calculations are at the heart of many databases. Every day, programmers face challenges related to the manipulation of datetime data. Nearly every time SQL Server Pro's editors and authors brainstorm about subjects that we should cover in the magazine, we agree about the need to cover datetime-related subjects simply because they're so practical. With that in mind, I'd like to begin a series of articles in which I discuss datetime challenges and calculations. This month, I focus on trimming techniques—that is, returning only the date or the time of a given datetime value.
Datetime Storage Format
One of the most common challenges in working with datetime datatypes (e.g., DATETIME, SMALLDATETIME) in SQL Server is that there's no separation between date and time. But, of course, you often have the need to store just dates or just times. Storing your dates and times in datetime datatypes has several advantages over storing them in other datatypes (e.g., character strings): You get inherent integrity enforcement (i.e., invalid values are rejected), and you can use the datetime functions in your calculations.
When you need to store only dates or only times, the trick is to trim the part you don't need. However, because the datetime datatypes include both a date and a time portion, you won't actually be trimming; rather, in practice, you'll be zeroing the irrelevant part. The storage format that SQL Server uses internally to represent datetime values is two 4-byte integers for DATETIME and two 2-byte integers for SMALLDATETIME. One integer is an offset in terms of days from the base date January 1, 1900, and the other is an offset from midnight (in terms of clockticks for DATETIME—in which one clocktick is 31/3 milliseconds—and in terms of minutes for SMALLDATETIME). When you need to store only dates, you'll store a date at midnight; technically, you'll "zero" the time portion of the datatime value. Knowing that the time portion is always midnight in the values you're manipulating, you can simply ignore it. By doing so, you'll be able to easily work with dates.
Notice what you get when you convert a character string containing only a string representation of a date to DATETIME:
SELECT CAST('20070212' AS DATETIME);
You get the output 2007-02-12 00:00:00.000. SQL Server assumes midnight as the time value. Internally, it stored 0 as the integer representing the offset from midnight. Assuming you specify dates with midnight in the time portion when you store them in a DATETIME column called date_col in a table, when you want to filter rows with a certain date (e.g., February 12, 2007) you'll use the filter
WHERE date_col = '20070212'
The column name date_col that appears to the left of the equals sign is a DATETIME, and the literal (i.e., constant) to the right of the equals sign is a character string (i.e., VARCHAR) that contains only a date. DATETIME has a higher datatype precedence than VARCHAR, so SQL Server will implicitly convert the VARCHAR value to DATETIME. Because no time component was specified in the literal, SQL Server will assume midnight as the time component, and thus there's basis for comparing date_col to a character string that contains only the date component. Similarly, if you want to store only times, you can zero the integer that represents the offset from the base date; in other words, you store the times with the base date. Notice what you get when you convert a character string that contains only a time to DATETIME:
SELECT CAST('01:23:43.210' AS DATETIME);
You get the output 1900-01-01 01:23:43.210. SQL Server assumes the base date as the date value. Internally, it stored 0 as the integer representing the offset from the base date. Assuming you stored times with the base date in a DATETIME column called time_col in a table, when you want to filter rows with a certain time (e.g., 01:23:43.210), you'll use the filter
WHERE time_col = '01:23:43.210'
Again, SQL Server will implicitly convert the literal that appears to the right of the equals sign to DATETIME assuming the base date, and thus the values are comparable.
Extracting Date Only
Now that you understand the storage format of datetime datatypes and the fact that date and time are technically inseparable, you can start handling common calculation needs. Suppose you need to extract only the date portion from a datetime value—for example, GETDATE(), which returns the system's datetime. You need to produce a datetime value with the input date at midnight.
You can perform this calculation in three ways. In the first technique
SELECT CAST( CONVERT(CHAR(8), GETDATE(), 112) -- 'YYYYMMDD' AS DATETIME);
the CONVERT function converts the input datetime value to a character string using style 112 (YYYYMMDD). This style extracts only the date portion from the input value. The CAST function converts the date character string back to DATETIME. When a character string expressed in this format is converted to a datetime datatype, it's independent of any language- or date-related settings that are in effect for your session.
The second technique that lets you set the time portion to midnight is to convert the input datetime value to an integer, subtract 0.50000004, and convert the result back to datetime:
SELECT CAST(CAST(GETDATE()- 0.50000004 AS INT) AS DATETIME);
When a datetime value is converted to an integer, SQL Server returns the offset in terms of days from the base date; the time portion is rounded down to 0 days if it's smaller than or equal to 11:59:59.993 and otherwise up to one day. By subtracting 0.50000004 portion of a day from the input datetime value, you compensate for cases in which the time portion is later than 11:59:59.993, in which case it would have otherwise been rounded up to the next day. When converting an integer to a datetime, SQL Server simply assumes this integer as the offset from the base date, and stores 0 as the other integer representing the offset from midnight. Although this expression is short (and efficient, as I'll demonstrate shortly), I have to say that I feel uneasy with it. I'm not sure I can put my finger on exactly why—maybe because it's too technical, and you can't see datetime-related logic in it.
I like the third technique best of all. I learned it from SQL Server MVP Steve Kass. It's very cool! Here goes:
SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101');
The DATEDIFF function calculates the offset in terms of days between the base date— January 1, 1900—and the input date—GETDATE(). Call that offset diff. The DATEADD function adds diff days to the base date. And you have the input date at midnight. The anchor date doesn't have to be the base date of January 1, 1900. The important thing to remember is that you should use the same date in both the DATEDIFF function and the DATEADD function.
Remember that there's no way for you to specify a datetime literal; rather, here you specify a character string (i.e., '19000101') that SQL Server will implicitly convert to a datetime datatype. Similarly, you can specify an integer value representing an anchor date. Remember that converting the integer 0 to a datetime yields the base date at midnight. Bearing this in mind, you can shorten the expression to
SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);
I ran a test to compare the performance of the three techniques that I've presented. Listing 1 shows the code I used, and Table 1 shows the performance measures. I ran the calculation in a loop of 1,000,000 iterations.
After subtracting the overhead time involved with the code surrounding the actual calculation, the first technique appears to be the slowest, taking. more than twice as long as the second and third techniques. The second technique seems to be the fastest, but it's just a bit faster than the third technique, which I believe to be the most elegant.
Extracting Time Only
The techniques to extract only the time are based on logic that's similar to the logic used for extracting only the date. You zero the date portion, setting it to the base date. I'll start with a calculation that relies on the logic from the third technique in the previous section:
SELECT DATEADD( ms, DATEDIFF( ms, DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0), -- date only GETDATE()), 0);
Notice that the DATEDIFF function calculates the difference in terms of milliseconds between the date-only portion of the input date and the input date. Call it diff. The outer DATEADD function adds diff milliseconds to the base date (represented by the integer 0).
If you want accuracy in terms of seconds instead of milliseconds, simply specify a second datepart instead of millisecond (ms):
SELECT DATEADD( second, DATEDIFF( second, DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0), -- date only GETDATE()), 0);
Of course, you also have the option to use logic similar to the logic used in the first technique—except that here you'll convert the input value to CHAR(14) using style 114 (hh:mi:ss:mmm) if you're after an accuracy of milliseconds:
SELECT CAST( CONVERT(CHAR(12), GETDATE(), 114) -- 'hh:mi:ss:mmm' AS DATETIME);
If you're after an accuracy of seconds, convert to CHAR(8). By doing so, you'll trim the milliseconds portion:
SELECT CAST( CONVERT(CHAR(8), GETDATE(), 114) -- 'hh:mi:ss:mmm' AS DATETIME);
Only the Beginning
I've discussed the storage format that SQL Server uses to store datetime values, focusing on challenges related to the fact that there's no separation between date and time. But this is only the beginning. There are so many more challenges related to datetime manipulation. I'll continue exploring those challenges in the coming months.