LINQ to SQL in Action: Joining and Grouping

Learn ways to code data-access functionality in your .NET applications

Dino Esposito

March 31, 2009

15 Min Read
LINQ to SQL in Action: Joining and Grouping

Microsoft built the Language-Integrated Query (LINQ) to SQL component in .NET Framework on top of LINQ to achieve one main purpose: Extend the general-purpose query syntax of LINQ to the world of SQL Server databases. As you may know, accessing in-memory data collections, databases, and XML documents today requires a familiarity with a variety of languages and syntax—one for each technology. With LINQ, you can use the same syntax to query different types of data.

The LINQ query engine is extensible enough to support any collection of objects that exposes a made-to-measure interface—the IQueryable interface. This means that any team of developers could, in theory, expose to LINQ their own data and query it through the common syntax—which is what Microsoft did with the LINQ to SQL and LINQ to XML interfaces. LINQ to SQL, in particular, wraps the content of a SQL Server database and makes it queryable through the LINQ syntax. We'll explore LINQ to SQL in depth here, examining the T-SQL basis for LINQ to SQL queries and walking through examples of using LINQ to SQL to code data-access functionality in .NET applications.

LINQ to SQL doesn't work with databases other than SQL Server. So, unlike ADO.NET or industry-standard object-relational mapping tools, you can't use LINQ to SQL to work with, say, Oracle databases. (You can instead load content stored in Oracle databases through DataSets and query over that using the LINQ to DataSet model.) LINQ to SQL still uses T-SQL under the hood to run its queries. LINQ to SQL doesn't attempt to push an alternate route and doesn't aim to replace T-SQL; it simply offers a higher-level set of query tools for developers to leverage. LINQ to SQL is essentially a more modern tool to generate T-SQL dynamically based on the current configuration of some business-specific objects. T-SQL operates on a set of tables, whereas LINQ to SQL operates on an object model created after the original set of tables.

LINQ to SQL opens up SQL Server querying to the larger universe of .NET developers. However, with LINQ to SQL as well as with raw T-SQL, only T-SQL and SQL Server database experts can fine-tune the queries and remove any bottlenecks and performance hits.

Preparing for LINQ to SQL
LINQ to SQL works by exposing an object-based query language to developers and producing T-SQL statements in response to their input. With LINQ to SQL, you don't explicitly use a connection string, nor do you open or close a connection explicitly. All you do is interact with a central console called the data context.

The data context is a class that's ordinarily created by the Microsoft Visual Studio 2008 wizard when you choose to add a new LINQ to SQL class item to the current project, as Figure 1 shows. Adding a LINQ to SQL class to a project displays the Object Relational Designer, through which the developer selects tables, stored procedures, and functions to add to the LINQ to SQL data context.

The completed data context class incorporates references to collections that represent all selected tables. The data context source file also includes the definition of a class that represents an individual row on each selected table. In other words, if you select the Customers and Orders tables from the Northwind database, you'll have in the data context class properties, which the C# code sample in Listing 1 shows.

You'll work against these collections using the LINQ syntax. Any query you perform against the collections will then be translated to T-SQL statements and executed against the database. The data context class takes care of all this for you.

The data context—usually a class derived from the base class DataContext—has a Log property, as the following example shows, which you use to register a stream to capture all T-SQL statements actually generated for each LINQ query to the database.

NorthwindDataContext nwind = new NorthwindDataContext();nwind.Log = Console.Out;

Through the Log property, you can track what really happens under the hood—that is, snoop on generated T-SQL statements—and decide to intervene with changes if you think the T-SQL code could be improved.

Running a Query through LINQ
The LINQ query language works on the LINQ to SQL object model in much the same way as T-SQL works on table rows. To express a simple query that selects customers from a given country and returns them ordered by name, in LINQ you use the following query:

var data = from c in nwind.Customers           where c.Country 

"Spain"            orderby c.CompanyName            select c;

As you can see, in this query there's no explicit reference to T-SQL; furthermore, the only point of contact between LINQ and the outside world is the data collection you work with. That collection is responsible for taking the input data (filter, order) and selecting the results. Being a LINQ to SQL collection, it will do so by running a dynamically generated T-SQL query. As a result, LINQ to SQL offers a platform for you to express entities in the domain space as objects. You can then query entities and update them, using an object-oriented syntax.

The bold elements in the code fragment are context keywords added to C# that map to the underlying LINQ query engine. Their role is analogous to the role played by similar T-SQL keywords. You might wonder why select—the projection operator—appears at the end of the statement and from is instead at the beginning of it. The only reason for having this is the need to enable Visual Studio 2008 IntelliSense on LINQ expressions. Because the projection is expressed as a list of properties on any of the objects referenced in the query, IntelliSense would not in fact be possible if the projection is declared before the actual objects. Consider the following:

var data = from c in nwind.Customers           where c.Country  "Spain"           orderby c.CompanyName           select new { c.CompanyName,            c.ContactName };

With such a select expression at the beginning of the statement, no IntelliSense would be possible as c is not defined yet.

It's interesting to note that LINQ to SQL statements are not run immediately. This is why you need to use a new C# keyword—the var keyword—in the statement. The var keyword (the Dim keyword in Visual Basic) indicates that the developer might not know the data's type at writing time. By using var, the developer delegates the compiler to resolve the type at compile time. The var keyword isn't a sign of weak typing; no weak types are ever used. It is, instead, a sign that some dynamically generated code is going to run and the actual type of results may be anonymous.

Data doesn't flow in the assigned variable right away as the LINQ to SQL statement executes. The execution of the previous statement doesn't fill the variable with any data. Instead, it simply prepares the ground for the execution of the T-SQL statement. The T-SQL statement is generated and configured but isn't executed until the main code stream needs to process resulting data. In other words, as soon as the code consumes the content of the previously mentioned data variable, the T-SQL statement is sent to the database. For example, if you bind the content of the data variable to a control, triggering the data binding process will actually trigger the following T-SQL statement:

GridView1.DataSource = data;GridView1.DataBind();

This lazy loading feature can have some undesired effects if you don't master it properly. For example, consider the code in Listing 2, a code fragment from an ASP.NET page. You might want to run SQL profiler to trace its execution. First, the code attempts to get all orders shipped through the same shipping company. When the execution flow reaches the foreach statement, the T-SQL statement highlighted in Figure 2 runs.

The following is a simplified version of the statement:

SELECT * FROM orders WHERE shipvia=3

You may think that after running the first query to grab all matching records in the Orders table, you're fine and can work safely and effectively with any selected object. Well, not exactly. As Figure 2 shows, many additional queries actually hit the SQL Server database. Why is that? Let's expand on one of them, the exec sp_executesql statement, in Listing 3. All additional queries are similar to the preceding query, and all they do is retrieve information about the customer who placed the order being processed in the foreach loop in Listing 2.

As a result, you run one query to retrieve all the orders, plus one additional query for each selected order to access customer information. This is the default behavior, which you might not have expected.

By changing the parameters for lazy loading and specifying your fetch plan, you can force LINQ to SQL to run a totally different query and, more importantly, to get all order and customer data in a single shot. Here's how you need to modify your code to do so:

DataLoadOptions options = new DataLoadOptions();options.LoadWith(o => o.Customer);dataContext.LoadOptions = options;

By setting the LoadOptions property, you specify a static fetch plan that tells the runtime to load all customer information at the same time order information is loaded. Based on this, the LINQ to SQL engine can optimize the query and retrieve all data in a single statement. Listing 4 shows the SQL Server Profiler detail information for the exec sp_executesql query after the code modification. The query now includes a LEFT OUTER JOIN that loads orders and related customers in a single step.

What does this mean to you? LINQ to SQL queries have their own logic, and the underlying behavior might not be exactly what you expect. LINQ to SQL produces T-SQL statements, but it's a totally different engine—so don't make assumptions about LINQ to SQL behavior before you have verified how it works in practice. LINQ to SQL is a brand-new API for operating on a SQL Server database; it's not just another way of writing T-SQL code. Let's see now how to perform more advanced query that involve joining and grouping operations on data.

Joining Data
It goes without saying that many SQL Server queries are easier to formulate with words than with T-SQL. The more you master the SQL Server language, the easier it could be for you to express queries directly in T-SQL. However, for the vast majority of developers working on data-access components, finding a T-SQL counterpart for the following query could be difficult. Let's consider a query that returns for each customer the total number of orders he placed in given timeframe. The query clearly involves a join between customers and orders. Listing 5 shows how you'd write it using a LINQ to SQL object model and the LINQ query language.

Here's a breakdown of the statement: You perform a JOIN between the Customers table and the subset of the Orders table that contains only the orders placed during January 1997. The JOIN is executed on the CustomerID column, and the results of the JOIN for the single customer record are saved into a group. Next, you simply select the columns you need from the customer and compute a Count operation on the related group of orders.

In LINQ, you can use the select keyword to return an entire object or just a dynamically specified list of fields. In the following simple statement, in fact, you use select to return the entire customer row.

var data = from c in nwind.Customers     select c;

If you want to restrict the projection to just a few fields, you use the select new variation, as follows:

var data = from c in nwind.Customers     select new {c.CustomerID, c.CompanyName};

When you have more complex queries that involve JOIN and GROUPBY operations, you can project on joined fields as well and apply functions (Max, Min, Count, Avg) to groups of data. The syntax of the select new keyword is flexible enough to accommodate an explicit property naming. In the JOIN example shown earlier, you explicitly set the name of the column that reports the total number of orders for a customer to OrderCount.

What about the type of the projected data when you use a select and a select new keyword? In the former case, the returned type is a list of the specified type. For example, the following code will return a List type.

List data = from c in nwind.Customers     select c;

When a select new is used, instead, an anonymous type is returned—so the var keyword is an absolute necessity to delegate to the compiler the task of resolving the type dependency as late as possible. It should be noted, though, that strong typing never yields to weak typing in LINQ to SQL; using the var keyword instructs the compiler to resolve the type of the variable during compilation. For purely illustrative purposes, let's see what the ToString method on the data variable returns.


The variable contains a list of values of an anonymous, unnamed type comprising a collection of string and integer values.

Grouping Data
To group data in a LINQ to SQL expression, you use the group keyword. The companion by keyword, in conjunction with the new keyword, lets you shape the records that go in each group. The variable that receives grouped data will actually get a list of data containers each implementing the IGrouping interface. Let's focus on the expression in Listing 6 first.

The query organizes all customers in groups according to the city, region, and length of the city name. Each group will contain each customer's sole contact name, and empty groups are discarded. Listing 7 shows the system-generated name for the returned data type. (Again, this is for illustrative purposes only; as a developer, you never have to work with such a system-generated name explicitly.)

Each element in the list you store in the data variable is an enumerable type that additionally has a key. The key represents the attribute that's common to each value in the group. In the preceding example, the key is represented by city, region, and a Boolean value indicating whether the length of the city name is greater than 5.

The results obtained from a group-by operation can hardly be used as-is to populate some UI control in an ASP.NET or Windows form. It's therefore likely that you'll build a UI block by processing data in the group programmatically. The example in Listing 8 shows how you'd programmatically access records in the groups.

As Listing 8 shows, you use the Key property on each element in the returned collection to access the information you grouped by. The Key property is an anonymous type defined by the properties you specified through the by new keyword in the query. Next, for each group you can access specific record information in the format that you've indicated via the group keyword. With reference to the preceding example, record information in each group is only a customer's contact name.

An Alternative Data-Access Method
LINQ to SQL offers an alternative model to plan and develop the data-access layer of .NET applications. Through LINQ to SQL, you can realize a brand new data-access layer and use auto-generated types to exchange data with the business and presentation layers. Any query that you express through the LINQ syntax is translated into a T-SQL statement and run. The execution of this statement is transparent to developers, and so is the dynamically generated T-SQL. LINQ to SQL is much more than just a high-level version of T-SQL, but keep in mind that it may not always be able to achieve the same results you can get out of raw T-SQL.

Listing 1: Data Context Class Properties

public partial class DataClasses1DataContext: System.Data.Linq.DataContext{   :   public System.Data.Linq.Table Customers   {get { return this.GetTable(); }   }   public System.Data.Linq.Table Orders   {get { return this.GetTable(); }   }   :}

Listing 2: ASP.NET Code Fragment

protected void Button1_Click(object sender, EventArgs e){    var data = from o in nwind.Orders               where o.ShipVia 

3 select o; foreach (var o in data) { if (o.Freight > 300) SendCustomerNotification(o.Customer); ProcessOrder(o); } GridView1.DataSource = data; GridView1.DataBind(); }

Listing 3: Query Used in exec sp_executesql

exec sp_executesql    N'SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],             [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region],             [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]      FROM [dbo].[Customers] AS [t0]      WHERE [t0].[CustomerID] = @p0',    N'@p0 nvarchar(5)',    @p0=N'PICCO'

Listing 4: Exec sp_executesql Query After Modifying the LoadOptions property

exec sp_executesql     N'SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID],              [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate],              [t0].[ShipVia], [t0].[Freight], [t0].[ShipName],              [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion],              [t0].[ShipPostalCode], [t0].[ShipCountry],              [t2].[CustomerID] AS [CustomerID2], [t2].[CompanyName],              [t2].[ContactName], [t2].[ContactTitle], [t2].[Address],              [t2].[City], [t2].[Region], [t2].[PostalCode],              [t2].[Country], [t2].[Phone], [t2].[Fax]       FROM [dbo].[Orders] AS [t0]       LEFT OUTER JOIN (            SELECT [t1].[CustomerID], [t1].[CompanyName],                   [t1].[ContactName], [t1].[ContactTitle], [t1].[Address],                   [t1].[City], [t1].[Region], [t1].[PostalCode],                   [t1].[Country], [t1].[Phone], [t1].[Fax]            FROM [dbo].[Customers] AS [t1]            ) AS [t2] ON [t2].[CustomerID] = [t0].[CustomerID]       WHERE [t0].[ShipVia] = @p0',N'@p0 int',@p0=3

Listing 5: Sample LINQ Query

var data = from c in nwind.Customers           join o in               (from orderInJan97 in nwind.Orders                where orderInJan97.OrderDate.Value.Year  1997 &&                      orderInJan97.OrderDate.Value.Month == 1                select orderInJan97)                on c.CustomerID equals o.CustomerID into g           select new { c.CompanyName, OrderCount = g.Count() };

Listing 6: Grouping Query

var data = from c in nwind.Customers           group c.ContactName by new {               City = c.City, Region = c.Region, Length = c.City.Length > 5           } into g           where g.Count() > 1           select g;

Listing 7: System-Generated Name for Returned Data Type

System.Collections.Generic.List  `1[System.Linq.IGrouping  `2[f__AnonymousType0     `3[System.String,System.String,System.Boolean], System.String]    ]

Listing 8: Accessing Results of a Group-by Operation

foreach (var group in data){    Console.Writeline(group.Key.City + ", " +                      group.Key.Region);    foreach (var contact in group)    {        Console.Writeline(contact);    }}


Read more about:

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like