New Features in LINQ

A couple of years ago I wrote an initial introduction to Language-Integrated Query (LINQ) to SQL (see “LINQ to Your SQL Server Data”). One of LINQ’s original goals was to let application programmers replace the use of T-SQL with Visual Studio and LINQ queries that could be converted into T-SQL–based queries. LINQ to SQL soon fell by the wayside as Microsoft pushed its ADO.NET Entity Framework as the up-and-coming data access technology. However, LINQ still has a place in the application development architecture, and Microsoft continues to support the LINQ to SQL subset—albeit with few, if any, enhancements.

Although Microsoft is no longer moving forward with the specific subset of LINQ that focuses on SQL Server and the integration with a graphical table designer, the core of LINQ to SQL is still supported. In this article I review the core features of LINQ to SQL and discuss some of the new roles of LINQ in the typical application architecture. I take a look at new LINQ features, such as Parallel LINQ and the use of LINQ in SharePoint.

LINQ Capabilities

Although Microsoft isn’t moving forward with the portion of LINQ that generates T-SQL code to run against SQL Server, the company still supports LINQ for processing query results. LINQ tends to be referred to in the context of a second technology group—the provider—such as LINQ to XML, LINQ to SQL, or LINQ to Entities. Each of these categories builds on the core LINQ syntax, and the differences between categories (e.g., LINQ to DataSets versus LINQ to SQL) can be subtle. A developer might initially suggest using LINQ to SQL even if he or she actually means LINQ to Entities or LINQ to DataSets. Some developers will continue leveraging LINQ to SQL as a familiar tool in certain situations, although LINQ’s larger focus is on data already returned from SQL Server. The area in which LINQ continues to have a robust role is in processing data in the form of entities and the results from processing stored procedures against a database.

As Figure 1 shows, LINQ provides six standard methods for accessing data (through Objects, Entities, SQL, DataSets, CAML, and XML). Of course there are literally dozens of third-party providers for LINQ, which is one of the strengths of LINQ as a technology. For SQL access, Microsoft provides LINQ to SQL. Microsoft also supports other technologies such as straight ADO.NET, the Entity Framework, and WCF data services for database access. Although most of these technologies have unique capabilities, several of them also overlap at some level. For example, the capabilities of LINQ to SQL are essentially a subset of the Entity Framework 4.0 capabilities. However, the Entity Framework was designed from the ground up to provide a more abstract layer than what's provided by LINQ to SQL. The Entity Framework abstraction adds some complexity but with the benefit of greater adaptability.

Each of these data access technologies has different characteristics that make its use appropriate. Some of those characteristics are technical—for example, LINQ to SQL supports only SQL Server databases; if you need to query an Oracle database, LINQ to SQL isn’t an option. Others depend on application characteristics—for example, if you want to create abstractions of the core tables (i.e., convert the relational data into an object model), the Entity Framework is much more powerful than the other options, particularly LINQ to SQL.

LINQ to SQL

The logical starting point for getting into the nuts and bolts of LINQ and SQL Server is the core LINQ to SQL capability. Unlike old database technologies (e.g., Data Access Objects—DAOs), although Microsoft hasn’t been developing enhancements for LINQ to SQL, the company hasn’t completely abandoned it. For example, the version of LINQ that shipped with .NET Framework 4.0 includes approximately 50 changes, with a couple of them being potentially ground-breaking updates. (For more information, see Damien Guard’s blog post “LINQ to SQL changes in .NET 4.0.”) However, if you read the list, you’ll see that most of the enhancements focus on addressing issues rather than improving the tool or adding new data type support. As Guard notes in his blog, Microsoft’s official policy is that the company will improve the core of LINQ and might implement some customer requests, but the Entity Framework is the company’s primary focus.

Clarifying the true core LINQ to SQL technologies is important. LINQ to SQL has essentially two execution models. One such model is its support for the dynamic creation and submission of T-SQL queries to SQL Server. The second model is the execution of existing T-SQL such as stored procedures from which it returns a data set. The generation and execution of T-SQL commands is the heart of what's unique and might be better described as LINQ to SQL Server. The logic used is specific to SQL Server as opposed to any other database management system (DBMS) and is part of the reason LINQ to SQL (unlike the Entity Framework) is SQL Server–specific. However, the core of this logic is really tied to the generation of dynamic SQL and was considered one of the initial strengths of LINQ.

LINQ to SQL’s unique element isn’t its use of a data context, but rather what LINQ is doing as part of a dynamic query. For example, when you query a SQL Server data table with a LINQ query, LINQ actually generates T-SQL and submits that query to the database. However, if you instead use LINQ against the same context to execute a stored procedure, there’s no T-SQL to generate. Instead, your query will process the return from the stored procedure—a data set. Thus, although most people associate the use of stored procedures with LINQ to SQL and we talk about it and implement it in that context, the results of a query using a stored procedure could just as easily be run from an Entity Framework context or even from straight ADO.NET. The code is actually using LINQ to Datasets as opposed to LINQ to SQL.

Because the DataContext object (e.g., System.Data.Linq.DataContext) lets you reference a stored procedure, the initial and functional impression is that LINQ supports stored procedures. However, just using LINQ to execute a stored procedure is overkill. Some application developers might have only stored procedure access to their databases, in which case LINQ to SQL probably isn’t the best data access technology. You would probably leverage the Entity Framework in a .NET Framework 4.0 application, using the Entity Framework for other tasks.

Another scenario you might face is that you’re using .NET Framework 3.5 or 3.0 but aren’t yet on .NET Framework 4.0. In that case the Entity Framework is less than optimal because although previous versions support the Entity Framework, it isn’t really a viable choice until .NET Framework 4.0. You might opt to work with LINQ in such a case, or you might prefer to work with ADO.NET and data sets instead, leveraging LINQ in the middle tier.

On the other side, there’s the question of moving forward. What if you’ve already invested heavily in LINQ—are you stranded? The answer is no. When you’re ready to move forward, you should migrate to the Entity Framework and create a new data context based on your entities. As long as you have entities that match the tables you used with LINQ to SQL, you can simply change the data context for your LINQ query. Although the process isn’t automatic, taken in the larger context of moving to the Entity Framework the transition isn’t too painful and lets you continue to leverage the effort spent creating your LINQ to SQL queries.

Parallel LINQ (PLINQ)

Parallel extensions for LINQ are a set of commands that can be used to let LINQ process large quantities of data in parallel. A comprehensive discussion of PLINQ is beyond the scope of this article, but it’s important to address where PLINQ intersects with LINQ to SQL. For example, you might wonder whether parallel extensions let LINQ generate multiple queries against SQL Server. The short answer is no.

PLINQ queries are a set of extensions you can add to any LINQ query, even LINQ to SQL queries. However, PLINQ targets LINQ objects in memory. Thus, adding PLINQ syntax to a LINQ to SQL query has no beneficial effect. Keep in mind that PLINQ is optimized for LINQ to Objects and LINQ to XML, which are versions of LINQ that are wholly in memory. Although you might see a benefit with a different provider, these providers have been optimized for PLINQ.

As with any tool that involves parallel processing, PLINQ should be used with care. If you’re expecting results in a specific order, note that using a parallel query can change the order. PLINQ can even change which results are returned if you combine it with a filter such as Top. Use PLINQ extensions with care on existing queries.

Adding the AsParallel option tells the processor to try to run the query in parallel—but doing so requires a certain amount of processing. If the amount of processing to break a query across multiple threads is greater than the time needed to process the query, performance decreases. Note that the parallel engine attempts to detect this scenario and will run a query sequentially if it detects that there would be little or no performance gain. However, this solution still isn’t perfect and does have an associated cost.

To make a query run in parallel, simply call .AsParallel() when you create the query.

var orderList = from order in currentDataContext.AsParallel()
        where order.value > 100
        select order;

The .AsParallel() extension includes a series of operators that let you further define how the query should run. For example, using the .WithDegreeOfParallelism(#) operator instructs the library to limit the number of threads to the number provided.

var orderList = from order in currentDataContext.AsParallel().WithDegreeOfParallelism(2)
        where order.value > 100
        select order;

PLINQ extensions are targeted at providers such as LINQ to Objects and LINQ to XML that work in memory. They’re implemented as extension methods and as such are available with any LINQ provider. However, because PLINQ extensions don’t interact within the core of the LINQ provider, they don’t provide custom capabilities for a provider such as LINQ to SQL. PLINQ spreads the processing of in-memory query execution across multiple threads. For more information about PLINQ, see Microsoft’s Introduction to PLINQ page.

LINQ in SharePoint

Developers no longer need to learn Collaborative Application Markup Language (CAML) to create custom data views in SharePoint 2010; they can now use LINQ. The underpinning of LINQ in SharePoint is the ability to generate CAML queries using the LINQ syntax, making it much easier to work with custom SharePoint data. SharePoint’s use continues to spread within and across organizations. Given the fact that SharePoint’s Business Data Connectivity service integrates with SQL Server, you might soon have to integrate your data directly into SharePoint.

SharePoint 2010 introduces the concepts of server-side and client-side code, which have different object models. In theory, you should work on the server-side code because it’s important to screen data before it’s sent to clients.

Recommended solutions for using client-side code involve using the SharePoint client object model. This set of objects supports two sources for accessing data: the SPQuery object and several named collection objects within the object model. The first, SPQuery, lets you define a CAML query. Unfortunately, by definition this object’s use of CAML indicates that it doesn’t support the LINQ syntax.

Most of the other objects within the client object model support the IEnumerable(Of T)—aka IEnumerable—interface. Any object that supports this interface can be part of a LINQ to Objects query. However, this capability isn’t new, and LINQ to Objects is outside the scope of this article. Although you can use LINQ against data in the SharePoint client object model, doing so is a function of the existing LINQ to Objects implementation.

On the server, you can leverage the objects in the Microsoft.SharePoint.Linq namespace. This namespace implements a LINQ to CAML provider. The code runs on the server, which is where you need to screen data before displaying it. The core capabilities are similar to those of LINQ to SQL. Instead of using a data context that points at a database, you can use one that points to a URL within your SharePoint site. The return from this URL is a list, or more specifically an instance of an EntityList object. The EntityList object is a strongly typed generic class that represents a SharePoint list.

To use the LINQ to CAML provider, you must create an instance of your data context, then call GetList to get the EntityList. After you have this list you can create standard queries against it and return the contents of the list, or even insert the contents into a different list. Listing 1 contains the code to create a DataContext from a list of orders, which can then be queried for orders only from a specific store, which are then inserted into the context of the list shown to the store.

As you can see in Listing 1, the context represents your SharePoint site. Each list can be retrieved by name, but you must know and define the type of object used in the list. In this case, because I have control, I used the same type of object within my target list. However, if the Store Orders list had a unique object OrderDetail instead of Order, then within the foreach loop in the listing I could create a new object of that type, assign the appropriate values (perhaps omitting details the store shouldn’t see), and insert my new object into the ordersPerStore object.

Obviously, working with SharePoint involves more detail in terms of setting up a list and understanding the underlying types. However, as a method for working with data in SharePoint, LINQ provides a much richer interface than trying to perform the same queries in CAML.

Still a Viable Solution

LINQ is a powerful tool for working with SQL Server data. Just because Microsoft currently has the core of LINQ to SQL primarily in maintenance mode doesn’t mean that you shouldn’t continue to use LINQ to SQL. More importantly, you can still expect to encounter the LINQ syntax even if it isn’t used in a LINQ to SQL context. LINQ to SQL is a still a viable tool for 1:1 mapping of tables and stored procedures within a SQL Server database. But don’t expect a future release to include the graphical designer or bidirectional database structure updates.

LINQ, when used with relational data, has transitioned from focusing on translating an application development syntax such as C# into dynamic parameterized T-SQL. The Q in LINQ has always stood for Query—and working with query results is where LINQ continues to shine. The generation of create, update, and delete statements is better left to tools such as Entity Framework that can focus on abstracting the mapping of relational structures.

The introduction of LINQ for processing data within SharePoint and as part of PLINQ shows its versatility. Although LINQ to SQL isn’t getting new features, experienced application developers can still take advantage of its original capabilities. None of the newer technologies truly make those core elements obsolete. If you’re already using LINQ (as I have been for a while), there’s no reason to stop. If you need a particular functionality that isn’t part of the core LINQ to SQL package, you can always leverage stored procedures and the Entity Framework to manipulate your data store, while using LINQ to screen and manipulate the data sent to and from these existing tools.

Comments

Plain text