Simplify, Parameterize, Reuse

Downloads
26538.zip

User-defined scalar functions (scalar UDFs) are tools that can simplify access to real and contrived date and time data for the end user. In my September TSQL-Tutor article "Datetime for Durable Databases," InstantDoc ID 26170, I demonstrated how to put together a simple scalar UDF that you can use with datetime values. Now, let's look at the ways you can use UDFs to handle more complex date and time data. One opportunity for using a UDF arises when your data is a date and time value but it isn't in datetime format. For example, what if you had to handle a data dump from a system in which all date values are sent as the number of seconds past midnight of a system-defined date such as January 1, 1970? To test the syntax for this problem's examples, you can download the UserDefinedFunctionExamples.sql commented script file at http://www.tsqlsolutions.com, InstantDoc ID 26538.

The first thing you notice when you begin to develop a solution to this type of problem is that the date data is an integer that you must convert to a datetime data type—a relatively easy task because a function that handles this exact problem already exists: DATEADD(). To return the proper date and time, you could use the following statement:

SELECT DATEADD(ss, IntegerValue, BaseDate)

Adding this expression to all your queries doesn't seem like much additional work. However, you can minimize your typing and make your queries easier to use by creating a scalar function that automatically supplies input parameters. Let's use a function called PhoneSystemDate() and supply only one parameter for the input—the integer that needs to be converted to real time. This function returns the current date and time as a datetime value from the supplied integer. For the body of the function, use the DATEADD() function, specifying the component you want to add (seconds, represented as ss) and the base date you want to add this component to (January 1, 1970). Listing 1 creates the PhoneSystemDate scalar UDF. The following three statements are ways you can use the PhoneSystemDate() function. Remember, as I discussed in "Datetime for Durable Databases," you must owner-qualify functions when you use them.

SELECT dbo.PhoneSystemDate(981419317)
SELECT dbo.PhoneSystemDate(981419454)
SELECT dbo.PhoneSystemDate(981419537)
GO

Instead of receiving the date value as an integer, you might receive a date from a formatted string. Other formats used for date and time data relate to formatted strings (numeric values, as we've just discussed). The formatted value can be a character string (e.g., "Jan102002" for January 10, 2002) or a formatted integer (e.g., 20021012 stored in a 4-byte integer column for October 12, 2002). Just as you might need a current date value when an integer is supplied, you might find formatted character strings or formatted numeric values when you need to return a current date value. In some tables, the date and time components reside in one field; in other tables, they're separated into two or more fields. If you need to use these datetime values with datetime functions and also need the code to perform datetime math on the datetime values (e.g., you want to add days or weeks to an existing date), you need to convert them to a datetime datatype first. For example, you might choose to store only the date portion of a datetime value in a database. That database could store September 14, 2002, as the formatted character string 09142002 or as the character string or integer 20020914. You can convert each of these formats into a real datetime value, then pass the converted date value to other datetime functions. In fact, in tables that require both date and time components, you'll often find the date and time components separated and stored as integers. For example, an application can internally use the date and time data more easily when it controls the date values as numeric values. That is, storing the components as integers often gives the application the flexibility to look at times within a day without having to constantly strip out the date component. However, when you choose T-SQL to create reports and access the data, you might need to convert the values back to real datetime values so that you can manipulate the display or perform datetime calculations. Let's work through one of the more difficult scenarios for converting integers to real dates, and I'll leave the date-only character string up to you as homework. (See the T-SQL Project sidebar for your homework assignment.)

When date and time values are stored as integers, you need to convert each value to a string, use a substring to parse the value for the proper components, then reassemble the value as a string that's formatted appropriately as datetime input. You base these steps on the expected format for the DATEFORMAT() function's session setting. If you don't know the client setting for DATEFORMAT(), or if you don't know whether users typically change this setting, you might have a problem. In fact, you can't change session settings within UDFs, so you have to decide whether a function would work more effectively than a stored procedure (which could guarantee the session settings by changing them within the code). However, if users who access the database often change their session settings, using a stored procedure also might be a concern. When session settings aren't consistent, excessive stored-procedure compilations—which can compromise performance—might result. To solve this problem, you could write a stored procedure that changes the session setting to call the UDF. Although consistency is guaranteed, performance might be slowed. In general, I recommend that to achieve consistency in client applications, you should avoid changing session settings. Use the easiest method to achieve your goal. In this case, I recommend that the UDF should use the default setting for DATEFORMAT(). Then you can call the UDF from within a stored procedure to guarantee consistent and accurate use. If you can guarantee consistency from your client applications, you can skip creating the stored procedure.

Listing 2 creates and populates the ItemsRun test table, which is composed of four rows. For each item (i.e., each row), the code defines the ItemID, StartDate, StartTime, EndDate, and EndTime columns. Two groups use these values. Applications use them internally to control and track the time that items started and ended. Developers designed these values so that they could effectively write their own date-and-time manipulation algorithms for display in their custom applications. Users need ad-hoc access to this data to list items in the order in which they ran or the order in which they finished. In addition, users need to calculate the run time of each item, and to do so, they can use the DATEDIFF() system function if they have datetime values to pass in to the DATEDIFF() function. For this example, I'm going to focus on the users' needs and access to the data.

Users need real datetime values. To convert formatted integers to real datetime values, you need to combine the StartDate and StartTime into one datetime value and the EndDate and EndTime into another datetime value. To create the real datetime values for the start of the item and the end time of the item, you can use a UDF. For this function, you can pass in the formatted integer values representing the date and the time, break them down into their individual components for year, month, day, hour, minutes, and seconds, then piece them back together in the correct mdy input format. The formatted integer that stores time uses the 24-hour format, so you don't need to add AM or PM to the string; SQL Server understands times that use either the 24-hour format or AM and PM. Additionally, you can use a large character string (nvarchar(30)) as the return data type so that users can set the format for the final value instead of having to convert it again to change the style. Because display is the most important user need for this datetime data, allowing a large string is helpful. However, when you need to use this string for other purposes (e.g., ordering the data by date), you need to convert it back to a datetime value. Luckily, you won't need to complete this type of conversion very often because many functions such as DATEDIFF() and DATEADD() allow a string as input.

In this case, returning a string gives you flexibility in format and a real datetime value that you can use for input into most system functions. Users can choose any format style for display, but if they want to use this output as input to another datetime function, they must choose a format that's based on their session setting for DATEFORMAT(). Otherwise, their calculations could produce incorrect results. For more details about dealing with datetime data, see "The Datetime Dilemma," June 2002, InstantDoc ID 25173. Generally, coding the function to return a string makes the function more flexible than if you hard-code one final style into it.

To create this function, let's start with a quick review of the information stored in the ItemsRun table. This table tracks the start and end times for each item by storing the start date separately from the start time and the end date separately from the end time. Several options are available for creating a function to deal with this type of data. You can write one function that brings together all these date values and computes the run time (the lapse time between the start date and the end date). Or, you can write a more flexible function that pieces together the date and time values to create the datetime data value for the StartTime or EndTime columns. When you select the flexible function, you can leave the calculations and additional manipulation up to the users or to another function. I usually opt for the more reusable and more flexible function. In this case, let's piece together the individual dates to create a real datetime value. If you want additional homework fun, you can create a function that calls this article's function to compute the run time.

For the first row in Listing 2, 20020901 is the value for StartDate, and 102745 is the value for StartTime. The format for the date value is yyyymmdd, and the format for the time value is hhmmss. Because you need to concatenate the StartDate and StartTime string values and return a datetime value, you need to follow the client's session settings for DATEFORMAT(). Be aware that if you rely on certain configuration settings, users who change their environment settings might create out-of-range problems or incorrect dates. In all the following examples, I expect the mdy default setting for DATEFORMAT(). In addition, I show examples of what happens when users or applications change the default setting.

To create a string as input for a datetime value, you need to reformat the date and time formatted integers. In this example, you need to dissect the integer value and turn it into mm-dd-yyyy format to follow the client's mdy setting. You can use other separators, but for simplicity, let's use the hyphen (-). To create this format you need to piece together the components. To extract the month value and place it in the first position, you need the two characters starting at the fifth character in the integer. Next, you need to add the hyphen separator. To extract the day value and place it in the second position, you need the two characters starting at the seventh character in the integer. Between the second and third positions, you need another hyphen separator. Finally, to extract the year value, you need the four characters starting at the first character in the integer. The end result for the 20020901 integer value should be the string '09-01-2002.' You can separate this value from the time value by placing a space between the two values. Then, the time will follow the space, and colons will separate the time's hours, minutes, and seconds values. The completed date and time string for row 1's starting date and time will be '09-01-2002 10:27:45' and will represent AM, because all times without AM or PM within the string use the 24-hour clock.

The code in Listing 3, page 12, shows the completed function that returns a real datetime value from the two formatted integers in the ItemsRun table. The following examples show how you can use the StartDate and StartTime columns to produce one column of both date and time:

SELECT ItemID,
  dbo.DisplayDateFromINT(StartDate, 
StartTime, 109) AS 'Start Date & Time',
  dbo.DisplayDateFromINT(EndDate,
EndTime, 109) AS 'End Date & Time'
FROM dbo.ItemsRun

Additionally, if you want to see the difference between the two real datetime values in seconds, you can use DATEDIFF()and DisplayDateFromINT together. Calling functions within the input values to another function is completely legitimate. For example, you can make two function calls to DisplayDateFromINT within DATEDIFF(), as the following SELECT statement shows:

SELECT ItemID,
   DATEDIFF(ss,
      dbo.DisplayDateFromINT(StartDate,
 StartTime, 109),
      dbo.DisplayDateFromINT(EndDate, EndTime, 109))
   AS 'Total Time in Seconds'
 FROM dbo.ItemsRun

Notice that in these calls, you didn't need to convert the date values. Style 109 is an appropriate format for inputting a string value to the DATEDIFF() function because it returns a date in mdy format. However, user or application changes to the session settings could cause an out-of-range problem or invalid data values.

If you're concerned about the effect of the DATEFORMAT() session setting because applications or users in your environment change them, review and test the downloadable UserDefinedFunctionExamples.sql sample code that changes the DATEFORMAT() setting from mdy to dmy. This change is one of the most common changes to the DATEFORMAT() setting and produces an interesting effect with some dates. For example, the date January 10 might show up as October 1 because the values for day and month (1 and 10) are valid values for both month and day. Dates such as January 31 cause an overflow because 31 isn't valid for month. Regardless of date, the information is wrong unless you're consistent about the session setting that defines the order in which you list day and month. To head off user changes, you might need to force these session settings by using a stored procedure instead. All you need to do is set the session setting at the beginning of the stored procedure, then call the function from within the stored procedure, as Listing 4 shows.

Note that although explicitly setting the DATEFORMAT() value within a stored procedure usually isn't a problem, I don't recommend changing session settings within stored procedures. Changing some (not all) session settings can cause side effects that compromise performance. For more information about the effects of certain session settings, see the SQL Server Books Online (BOL) topic "SET Options that Affect Results" and the Microsoft article "Troubleshooting Stored Procedure Recompilation" at http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q243586&.

Finally, as I noted earlier, another use of a new datetime value is to order the data based on when the item started running or when the item finished. Because the function returns the value as a string, you need to convert it back to a date so that it ends up in the order you want. If you don't change it back to a date, the string value (April, August, December, February, January, July, June, March, May, November, October, September)—instead of the date value—will order your data alphabetically. To keep the string value from dictating the ordering of the data, you can use the function within the CONVERT() function (which resides in the ORDER BY clause), as the following statement shows:

SELECT ItemID,
   dbo.DisplayDateFromINT(StartDate, StartTime, 109) AS 'Start Date',
   dbo.DisplayDateFromINT(EndDate, EndTime, 109) AS 'End Date'
FROM dbo.ItemsRun
ORDER BY CONVERT(datetime, 
dbo.DisplayDateFromINT(StartDate, StartTime, 109))

Scalar UDFs are reusable and flexible because they can be parameterized. You can use functions within functions, stored procedures, views, and constraints to simplify as well as extend their usage. To encapsulate simple expressions, you can always consider hard-coding the expression in a view, but if you want to reuse the code, a function is a better choice.

Comments

Plain text