Skip navigation

T-SQL UDF Quickly Determines How Often a Substring Appears

Downloads
96786.zip

Executive Summary:

Microsoft SQL Server 2005 or Microsoft SQL Server 2000 database administrators sometimes come across the problem of hitting the 900-bytes limit for unique constraints. To work around the 900-byte limit, Microsoft SQL Server 2005 or Microsoft SQL Server 2000 database administrators can create an additional computed column that gives the checksum value of the string column. Because Checksum is an int data type, a unique index for the computed column can be created


I created a function, dbo.occurrences, that calculates how many times a substring appears in a source string. Rather than looping through the source string and checking whether the substring still exists in the part of the string that hasn’t been scanned yet, dbo.occurrences performs a simple arithmetic calculation. The number of occurrences of a substring within a string is expressed by the formula

Occur (s,sub ) =
\{ len(s) – len (replace(s,sub,’’) \}
/ len(sub) \{len(sub) 0\}

This formula calculates the number of occurrences of a string within a string by first replacing each substring inside the source string with an empty string. Next, the formula finds the difference between the length of the original source string and the length of the resulting string after you perform the replacement operation just described. Finally, the difference is divided by the length of the substring, providing the substring isn’t empty. Listing 1 shows how dbo.occurrences implements this formula.

To use dbo.occurrences, you pass in a source string and a substring as varchar(max) parameters. For example, if you want to see how many times the letter i is in the string Eli Leiba is the king, you use the call

select dbo.occurrences
(‘Eli Leiba is the king’,’i’)

which produces the result of 4. If you want to see how many times the substring ll is in the string Hello tell me hello, you use the call

select dbo.occurrences
(‘Hello tell me hello’, ‘ll’)

which produces the result of 3. If you try other pairs of source strings and substrings, you’ll realize that this simple calculation does the job!

The Occurrences.sql file contains the dbo.occurrences function. You can download Occurrences.sql from SQL Server Magazine’s Web site. (Go to www .sqlmag.com, enter 96786 in the InstantDoc ID text box, then click the 96786.zip hotlink.) I tested this function on SQL Server 2005, SP1. To make it work on SQL Server 2000, you need to use varchar(8000) instead of varchar(max) for the two parameters’ type.

—Eli Leiba, Senior Application DBA, Israel Electric Company

TAGS: SQL
Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish