Parsing Denormalized Strings with the STRING_SPLIT Function in SQL Server 2016

There is a new string parsing function available to you if you're dealing with SQL Server 2016 databases.

Tim Ford, Owner

March 22, 2017

5 Min Read
Parsing Denormalized Strings with the STRING_SPLIT Function in SQL Server 2016

This Was Not the Article I Intended to Write

This article developed as I was in the process of writing another one on string functions. I had not run across the existence of the new STRING_SPLIT() function that was part of SQL Server 2016 until my research on what will be a couple of future articles.  So long as your target database is in COMPATIBILITY_MODE = 130 you can take advantage of this function.

What advantage is that?” You say?  STRING_SPLIT() is a table-valued function which allows you to parse a delimited string passed in as a parameter (along with a separation character as a second parameter) returning the output column [value] as a table valued result set that presents as a row for each delimited value between each separator character.

STRING_SPLIT Syntax

The syntax for returning results from a function call to STRING_SPLIT is reminiscent of any call to a table-valued function:

SELECT valueFROM STRING_SPLIT( input string varchar(max), separator character varchar(1));

The input string parameter can be one of any of the following:

  • Delimited string value

  • varchar() typed variable assignment

  • Table column when using a CROSS APPLY

Examples for Returning Results from STRING_SPLIT()

Let’s take a look at how each of the aforementioned options for passing an input string parameter into STRING_SPLIT yields results:

Example One: Passing in a String Value:

This is by far the most straight-forward approach requiring simply the input string and the separator character. Using the following example should provide some clarity. Let’s see what happens when we input a list of airports I’ve visited last year as a sample:

SELECT value FROM STRING_SPLIT ('AZO,YYZ,SEA,PDX,SFO,RKV,AMS', ',');
value---------------------------AZOYYZSEAPDXSFORKVAMS

Of course the results are able to be filtered with predicates in the WHERE clause subject to sorting via the ORDER BY clause:

SELECT value FROM STRING_SPLIT ('AZO,YYZ,SEA,PDX,SFO,RKV,AMS', ',')WHERE value LIKE 'A%'ORDER BY value;
value---------------------------AMSAZO

 

Example Two: Normalizing Data using a Variable for the Delimited String… and a Cursor:

We can also declare a variable as varchar(n) or varchar(max) and pass that in as the first parameter and see similar results. This example uses the practical case of cleaning denormalized data from one table through parsing and inserting into a normalized table. I’ll make use of three tables:

  • [events] – storing event data related to SQL Cruise for up-coming events through 2018

  • [event_ports_denormalized] – storing an id column referencing the id column in [events] and a ports_list_csv column which stores a comma-separated list of each port for that event.

  • [event_ports] which will be a normalized version of the [event_ports_denormalized] table.

CREATE TABLE [events]( id INT IDENTITY(1,1),event_name VARCHAR(100));CREATE TABLE [event_ports_denormalized](id INT NOT NULL, ports_list_csv VARCHAR(500) NOT NULL);CREATE TABLE [event_ports](id INT NOT NULL, port_name VARCHAR(50) NOT NULL);

We also need to populate these tables to make sense of the results:

INSERT INTO [events](event_name)VALUES ('SQL Cruise Alaska 2017'), ('SQL Cruise Caribbean 2018'), ('SQL Cruise Alaska 2018') INSERT INTO [event_ports_denormalized](id, ports_list_csv) VALUES (1,'Seattle,Ketchikan,Juneau,Skagway,Victoria'), (2,'Miami,St. Thomas,Tortola,Nassau'), (3,'Seattle,Ketchikan,Juneau,Skagway,Victoria');

The tables’ contents looks like this:

[events]:

id          event_name----------- --------------------------1           SQL Cruise Alaska 20172           SQL Cruise Caribbean 20183           SQL Cruise Alaska 2018

[event_ports_denormalized]:

id          ports_list_csv----------- --------------------------------------------1           Seattle,Ketchikan,Juneau,Skagway,Victoria2           Miami,St. Thomas,Tortola,Nassau3           Seattle,Ketchikan,Juneau,Skagway,Victoria

If we wanted to look at a result set of ports and their associated events the resulting data doesn’t look conducive to further processing and joins with other tables in this state:

SELECT E.event_name, EPD.ports_list_csvFROM [events] AS EINNER JOIN [event_ports_denormalized] AS EPDON E.id = EPD.idORDER BY E.id;
event_name                    ports_list_csv----------------------------- ---------------------------------------------SQL Cruise Alaska 2017        Seattle,Ketchikan,Juneau,Skagway,VictoriaSQL Cruise Caribbean 2018     Miami,St. Thomas,Totola,NassauSQL Cruise Alaska 2018        Seattle,Ketchikan,Juneau,Skagway,Victoria

So it’s fair time we clean this data up so we can use it with other tables in the database. Let’s get to normalizing!

The approach we will take here is to use a cursor to assign each csv value and its associated id to variables to use STRING_SPLIT() to parse the values and insert into the [event_ports] table:

DECLARE @id INTDECLARE @csv VARCHAR(500)DECLARE splitting_cursor CURSOR FOR SELECT id, ports_list_csvFROM [event_ports_denormalized];OPEN splitting_cursorFETCH NEXT FROM splitting_cursor INTO @id, @csvWHILE @@FETCH_STATUS = 0BEGIN    INSERT INTO [event_ports] (id, port_name)SELECT @id, SS.valueFROM STRING_SPLIT(@csv,',') AS SSFETCH NEXT FROM splitting_cursor INTO @id, @csvEND CLOSE splitting_cursor;DEALLOCATE splitting_cursor;Using the following query we can now see a normalized listing of events and the associated ports:SELECT E.event_name, EP.port_nameFROM [events] AS EINNER JOIN [event_ports] AS EPON E.id = EP.idORDER BY E.id;
event_name                    port_name----------------------------- ------------SQL Cruise Alaska 2017        SeattleSQL Cruise Alaska 2017        KetchikanSQL Cruise Alaska 2017        JuneauSQL Cruise Alaska 2017        SkagwaySQL Cruise Alaska 2017        VictoriaSQL Cruise Caribbean 2018     MiamiSQL Cruise Caribbean 2018     St. ThomasSQL Cruise Caribbean 2018     TotolaSQL Cruise Caribbean 2018     NassauSQL Cruise Alaska 2018        SeattleSQL Cruise Alaska 2018        KetchikanSQL Cruise Alaska 2018        JuneauSQL Cruise Alaska 2018        SkagwaySQL Cruise Alaska 2018        Victoria

Example Three: Using table values through a CROSS APPLY (Cursor-less):

I showed you the last process in order to show you how much simpler and lightweight it could be by using a CROSS APPLY.  Let’s take the final step to using STRING_SPLIT() to normalize delimited data by a straight CROSS APPLY bypassing the cursor all together. I’ve truncated the [event_ports] table (take my word for it) and am ready to use the following process to reload the table:

INSERT INTO [event_ports] (id, port_name)SELECT id, SS.valueFROM [event_ports_denormalized] AS EPDCROSS APPLY STRING_SPLIT(EPD.ports_list_csv,',') AS SS;

The results are the same and run lighter with only 4 lines of code:

SELECT E.event_name, EP.port_nameFROM [events] AS E INNER JOIN [event_ports] AS EP  ON E.id = EP.idORDER BY E.id;
event_name                    port_name----------------------------- ------------SQL Cruise Alaska 2017        SeattleSQL Cruise Alaska 2017        KetchikanSQL Cruise Alaska 2017        JuneauSQL Cruise Alaska 2017        SkagwaySQL Cruise Alaska 2017        VictoriaSQL Cruise Caribbean 2018     MiamiSQL Cruise Caribbean 2018     St. ThomasSQL Cruise Caribbean 2018     TotolaSQL Cruise Caribbean 2018     NassauSQL Cruise Alaska 2018        SeattleSQL Cruise Alaska 2018        KetchikanSQL Cruise Alaska 2018        JuneauSQL Cruise Alaska 2018        SkagwaySQL Cruise Alaska 2018        Victoria

NULL or Empty String?

One last thing I want to cover involves what happens when you have two delimiter characters positioned sequentially in the input string for the function. Take the following parameter as an example (noting the double comma sequence between Ketchikan and Juneau):

SELECT valueFROM STRING_SPLIT('Ketchikan,,Juneau,Skagway,Victoria',',');
value--------------KetchikanJuneauSkagwayVictoria(5 row(s) affected)

The value for the row returned for the sequential comma delimiter characters in treated as an empty string and not NULL.

Conclusion

Whether due to age or poorly-coded, many applications and nosql solutions often store comma-separated values (CSVs) in a single column/field. In many cases Microsoft SQL Server becomes involved in this denormalized data through either migration, ETL efforts, or some form of coordinated merge of system data that could benefit from using the STRING_SPLIT() function. The function is, essentially, a string parsing function that normalizes that which isn’t. This opens up new possibilities for easily normalizing data – whether internal to your SQL Server database or through interaction with an outside data source.

About the Author(s)

Tim Ford

Owner, SQL Cruise

http://thesqlagentman.com/

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like