Choose Wisely

Five Ways to Search a DataSet

CoverStory

LANGUAGES: VB.NET

ASP.NET VERSIONS: 1.0 | 1.1

 

Choose Wisely

Five Ways to Search a DataSet

 

By Rob Walling

 

DataSets offer luxuries unavailable with connected data retrieval, such as passing results through a Web service; having multiple, related tables in memory; and the ability to perform complex operations on a set of records without re-querying the database. But with these benefits come some complexities that you don t encounter with connected data retrieval, such as having to decide between a number of ways to search within a DataSet. This article compares the features and performance of five approaches to searching a DataSet.

 

Searching a DataSet

A DataSet contains a collection of DataTables, and each DataTable contains a specialized collection of DataRows called a DataRowCollection. In fact, although the term searching a DataSet is the most common way to describe this subject, it is a misnomer. A DataSet has the ability to contain multiple tables, but the approaches used to search a DataSet only search a single DataTable.

 

At the most basic level there are two general approaches to searching a DataTable: searching the DataRowCollection, or creating a DataView and using its built-in search capabilities. For those who aren t familiar with them, a DataView is an object that contains a single view of a DataTable and has additional sorting and filtering capabilities. You ve most likely encountered DataViews when binding information to a DataGrid.

 

Here are the five main approaches to use when searching a DataTable:

  • A For..Next loop
  • DataTable.Select
  • DataRowCollection.Find
  • DataView.RowFilter
  • DataView.FindRows

 

Running a For..Next loop on the DataRowCollection is a brute-force way to search a DataTable. Including this approach in the performance test allows us to see how .NET s built-in search capabilities compare to a classic search method.

 

DataTable.Select is likely the most common method of searching, and can be performed with or without a column index. Select accepts a SQL-style filter expression, an optional SQL-style sort statement, and an optional row state parameter, and returns an array of DataRows matching the filter expression. The filter expression can search multiple columns and perform wildcard searches, just as with standard SQL. Rows are searched in the order of the primary key, if one exists; otherwise they are searched in the order they were added to the DataTable. A demonstration of this method is shown in Figure 1.

 

Dim ds as DataSet = ' Here we would populate the DataSet

Dim rows() as DataRow = ds.Tables(0).Select( _

      "CustomerName='Fred Smith'", "CustomerId ASC")

Dim customerId as Integer = DirectCast(rows(0).Item( _

      "CustomerId", Integer)

Figure 1: Demonstration of DataTable.Select.

 

The third approach, DataRowCollection.Find, returns a single DataRow based on a key parameter of type String. Find searches only the primary key (or keys) of a DataTable and throws an exception if no primary key is defined. Find is overloaded to accept an array of objects for DataTables with multi-column primary keys. A demonstration of this method is shown in Figure 2.

 

Dim ds as DataSet = ' Here we would populate the DataSet

' DataTable must have a primary key or an exception is thrown

Dim row as DataRow = ds.Tables(0).Rows.Find("Fred Smith")

Dim customerId as Integer = DirectCast( _

      row.Item("CustomerId", Integer)

Figure 2: Demonstration of DataRowCollection.Find.

 

The next approach, DataView.RowFilter, requires that a DataView be created. The RowFilter property of the DataView is set to a SQL-style filter expression that designates the desired rows. As with DataTable.Select, the filter expression can search multiple columns and perform wildcard searches. Because an index is created each time the RowFilter property is set, the best approach is to pass all values into the constructor (see the sidebar Creating Indexes on a DataSet ). The resulting DataRowViews can be accessed using the DataView s Items collection, as shown in Figure 3.

 

Dim ds as DataSet = ' Here we would populate the DataSet

Dim dv As DataView = New DataView(ds.Tables(0), _

"CustomerName='Fred Smith'", "CustomerId ASC", _

 DataViewRowState.CurrentRows)

Dim customerId As Integer = DirectCast( _

      dv.Item(0).Item("CustomerId"), Integer)

Figure 3: Demonstration of DataView.RowFilter.

 

The final approach uses the DataView s FindRows method (see Figure 4). This method is similar to the DataRowCollection s Find method, but it returns an array of DataRowView objects instead of an array of DataRows. FindRows requires that the DataView be sorted on the search column, because FindRows requires an index (again, see the sidebar). The sort expression is a SQL-style sort string.

 

Dim ds as DataSet = ' Here we would populate the DataSet

Dim dv As DataView = New DataView(ds.Tables(0), Nothing, _

      "CustomerId ASC", DataViewRowState.CurrentRows)

Dim rowViews() As DataRowView = dv.FindRows("Fred Smith")

Dim customerId As Integer = DirectCast( _

      rowViews(0).Item("CustomerId"), Integer)

Figure 4: Demonstration of DataView.FindRows.

 

The DataView also has a method called Find that returns an integer index of the selected row. Because Find uses the same search mechanism as FindRows, it was not included in this comparison.

 

The Performance Test

To compare the performance of these approaches not only to each other, but to executing a SQL query based on the same search criteria, I created a simple Web application to process and display the results (see Figure 5).

 


Figure 5: A simple application to process and display the performance test results.

 

The table script shown in Figure 6 supplied the data used in testing the six approaches. 50,000 rows of data were inserted, with the CustomerName consisting of the generic string CustomerName followed by the numeric CustomerId. The code sample includes a button that can populate this table.

 

CREATE TABLE Customers (

 CustomerId int IDENTITY (1, 1) NOT NULL ,

 CustomerName varchar (50) NULL

) ON PRIMARY

Figure 6: Customers table creation script.

 

Four tests were run that each pulled a specific number of rows (50, 500, 5,000, and 50,000) from the Customers table into a DataSet and called each search method 100 times. Each test was run three times, with the aspnet_wp process killed between each test to clear the cache. The three trials were averaged; the results can be seen in Figures 7-9. Next to each rank is how many times slower that approach was compared to the fastest method for that quantity of rows.

 

At 50 rows, the For..Next loop is a surprising winner, coming in almost 10 times faster than the slowest approach, DataTable.Select. DataRowCollection.Find is a close second, followed by DataView.FindRows as a close third.

 

50 Rows

Time

Rank

SqlDataReader

0.0467339

4th (7x slower)

For..Next loop

0.0066763

1st

DataTable.Select

0.0634245

5th (9.5x slower)

DataRowCollection.Find

0.0100144

2nd (1.5x slower)

DataView.RowFilter

0.0467339

4th (7x slower)

DataView.FindRows

0.0200288

3rd (3x slower)

Figure 7: Results for 50 rows.

 

For 500 and 5,000 rows, because DataRowCollection.Find came in at 0.0000000 seconds, the relative speeds cannot be calculated (see Figure 8). I re-ran the tests a number of times and consistently received the same result. It s obvious that several of these methods use some form of caching, because SqlDataReader, DataTable.Select, and DataRowCollection.Find actually performed better with 500 rows than with 50.

 

500 Rows

Time

Rank

SqlDataReader

0.0333813

4th

For..Next loop

0.0267051

3rd

DataTable.Select

0.0100144

2nd

DataRowCollection.Find

0.0000000

1st

DataView.RowFilter

0.1335253

6th

DataView.FindRows

0.0901296

5th

 

5,000 Rows

Time

Rank

SqlDataReader

0.0333813

3rd

For..Next loop

0.3171227

4th

DataTable.Select

0.0166907

2nd

DataRowCollection.Find

0.0000000

1st

DataView.RowFilter

1.3285771

6th

DataView.FindRows

1.1383035

5th

Figure 8: Results for 500 and 5,000 rows.

 

At 50,000 rows, DataRowCollection.Find is a commanding leader, with the SqlDataReader a surprisingly close second (see Figure 9). DataView.RowFilter and DataView.FindRows performed dramatically worse as the row count increased, most likely because of the creation of a new DataView during each iteration of the loop. This instantiation was included within the loop in order to mimic the performance of a high-traffic Web site where separate pages make repeated search requests. The results would likely change if the DataView was created outside of the loop.

 

50,000 Rows

Time

Rank

SqlDataReader

0.0300432

2nd (1.5x slower)

For..Next loop

3.0710827

4th (153x slower)

DataTable.Select

0.1402016

3rd (7x slower)

DataRowCollection.Find

0.0200288

1st

DataView.RowFilter

13.1923029

5th (659x slower)

DataView.FindRows

15.6424928

6th (781x slower)

Figure 9: Results for 50,000 rows.

 

There are some points to keep in mind when examining these results:

  • This test searched for Integer values. Searching for Strings or Classes could change the results.
  • Because of the way the Customers table is populated, the CustomerId column is in ascending order. This means that index and primary key creation is faster than it would be under normal circumstances.

 

Recommendations for Searching

The table shown in Figure 10 summarizes the features and performance of the tested approaches.

 

 

Requires Index

Multiple Columns

Accepts Wildcards

Rank @ 50 rows

Rank @ 500 rows

Rank @ 5,000 rows

Rank @ 50,000 rows

DataRowCollection.Find

Yes

No

No

2nd

1st

1st

1st

DataTable.Select

No

Yes

Yes

5th

2nd

2nd

3rd

SqlDataReader

No

Yes

Yes

4th

4th

3rd

2nd

For..Next loop

No

Yes

No

1st

3rd

4th

4th

DataView.RowFilter

Yes

Yes

Yes

4th

6th

6th

5th

DataView.FindRows

Yes

No

No

3rd

5th

5th

6th

Figure 10: Summarized search comparison.

 

Based on the results shown in Figure 10, there are two questions that will help you determine which approach to use when searching a DataTable:

1)     Are you certain that you will always have a small number of rows and you don t need to use wildcards?

YES: Use a For..Next loop.

NO: See Question 2.

2)     Do you need to search multiple columns or use wildcards?

YES: Use DataTable.Select.

NO: Use DataRowCollection.Find

 

Further Work

Many variations of this test could be explored. One interesting variation would be to search for a non-primary key value, such as the Customer Name. Another would be to search for a record somewhere in the middle of the DataTable, instead of at the end.

 

Conclusion

This article took a detailed look at several approaches for searching a DataTable. Although the For..Next loop got out to an early lead, DataRowCollection.Find quickly showed its speed with 500 rows and beyond. In the end, these tools have different features and performance curves, and it s up to the developer to choose the tool that best fits the task at hand.

 

The sample code in this article is available for download.

 

Rob Walling is a Microsoft Certified Application Developer (MCAD.NET) with five years of development experience. His areas of expertise include ASP.NET, VB.NET, and Web application architecture. His technical articles appear on various .NET-related Web sites. You can reach him at mailto:[email protected].

 

Creating Indexes on a DataSet

Two of the approaches, DataRowCollection.Find and DataView.FindRows, require that indexes exist on the columns to be searched. Although you cannot explicitly create indexes in a DataTable, there are two indirect methods for creating them:

1)     Designate a column or set of columns as the primary key; ADO.NET automatically creates indexes on a DataTable primary key.

2)     Create a DataView from the DataTable and populate its Sort property; DataViews automatically generate indexes on their sorted columns.

 

A word of warning when dealing with DataView indexes: ADO.NET generates an index each time the RowFilter, Sort, or RowStateFilter properties of the DataView are set. To avoid unnecessarily re-creating indexes, use the DataView constructor that accepts a DataTable, RowFilter, Sort expression, and RowState argument instead of setting these properties individually after instantiation.

 

Additional Resources

 

 

 

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish