LINQ Deep Dive: Part II

We discussed in the first part of this series some of the different behaviors of LINQ queries in different domains, such as LINQ to Objects and LINQ to SQL. LINQ to SQL interprets the query differently and sometimes returns different results as compared to when you execute the query using LINQ to Objects. The behaviors of a LINQ query in different domains can significantly impact performance, output, and memory consumption.

We discussed in Part I how to apply joins and filter on a LINQ to Objects query in an efficient way to yield maximum performance. We also learned how to delay load a particular column in a table and fetch the column only when it is necessary so you don t incur the cost of bringing a column with huge content, such as a binary or varchar(max) column. We ll continue our exploration in this second installment by introducing users to the caching feature offered by the object tracking service and examining on which scenarios caching is used and how it ensures object reference consistency. We ll also discuss how LINQ to SQL automatically maps results returned to entities from dynamic SQL, as well as how a non-default constructor can prevent composing queries from other queries. We ll also learn to understand and identify if a query is being sent to a database for execution or if it is being applied in memory.

Tip One: Query Based on a Primary Key Optimized for Database Roundtrips

LINQ to SQL has its own caching mechanism that it uses to ensure optimistic concurrency. If you retrieve an object from the DataContext, the LINQ to SQL object tracking service starts tracking the object. The tracking service caches the object based on a primary key. The caching mechanism is not actually there to improve performance of the query, but rather to provide consistency with the original results you obtained; for instance, if you retrieved a customer object from a DataContext and modified a few properties on the object. This makes the customer object dirty in memory, and marked for update. However, the update has not been sent to the database yet. If you were to query for the same customer again on the same DataContext, you should get the object you modified, not a fresh copy from the database (which will not have your changes). If LINQ to SQL did not maintain a cached version of the object, you could end up losing the changes if you queried for the object again.

Not only does caching behavior offer consistency with the results, but also you get consistency in terms of object references. For example, when you retrieve the customer initially, the object reference given by the DataContext would be the same no matter how many times you query the DataContext for the same customer again. Getting the same object back offers a clean programming model and helps avoid bugs introduced in the application because of object comparisons that would not be valid if LINQ to SQL did not give us back the same entity on which we originally worked.

Does that mean we do not incur the cost of a database roundtrip if we query for an object that initially had been fetched from the same DataContext? The answer is, it depends on how you are querying. If you query based on any column other than the primary key, regardless of which LINQ query operator you use, LINQ to SQL will make a database call. When it comes to giving you back the object, LINQ to SQL will check in the object tracking service to see if it is tracking the object that was returned from the query executed based on the primary key. If it finds an object in the tracking service that has the primary key as the one returned from the query sent to the database, LINQ to SQL will give you the object from the tracking repository. This means you ll still incur the database hit, regardless of whether LINQ to SQL has fetched the object initially and is tracking it in its tracking repository. However, if you query using a Single or First operator using a primary key, LINQ to SQL will first check in its object tracking service to see if it can find an object that has the same primary key on which you are searching. If a match is found, LINQ to SQL will immediately return the object without bothering to go to the database.

Figure 1 shows I used the Single operator to retrieve the ALFKI customer based on the primary key column. The next time I query for the ALFKI customer using the Single operator based on the primary key, LINQ to SQL won t go to the database, but simply will return the object from the tracking service. The First operator used with a primary key column offers the same benefit; it, too, simply returns the object from the tracking service. The last query in Figure 1 is what makes the second database call. Despite the fact that we are using the Single operator, LINQ to SQL makes a call to the database. Because the contact name of Maria Anders belongs to the customer ID of ALFKI, and object services was tracking the ALFKI customer, you end up with the same customer reference. This confirms our behavior that if you don t query based on a primary key, LINQ to SQL will always make a call to the database. On our output window we are printing the test to see if all four queries return the same object reference; indeed, the result confirms that all queries got back the same object.

Figure 1: Single and First operators do not go back to the database if the object has been initially fetched

Tip Two: Order Lazy-loaded Child Collections on the Database

If you ve worked with other OR mappers on the market, you must ve come across this issue several times. One of the constraints I ve encountered in some OR mappers is that there is no clean way to define how to sort child collections based on a certain column. For example, if I have a customer instance in my hand and I want to get access to its orders, I simply can navigate to the Orders property of the customer. But what if I want those orders to be sorted by ShipCity? In LINQ, you can apply the Order By operator on the Orders collection for the customer. But does that Order By operation get executed on SQL Server? No. As soon as we access the Orders property of the customer, LINQ to SQL fetches all the orders for the customer in memory. From there on, any operations you perform will get executed in memory.

Figure 2 shows that orders for the ALFKI customer got sorted based on ShipCity. However, looking at the SQL capture, it s apparent that the order operation was done in memory because our SQL statement does not include any Order By operations. By accessing the Orders collection in this fashion, you don t get a chance to give any hints to LINQ to SQL that you want the orders to be sorted by the ship city on SQL Server.

Figure 2: An Order By operator on the Orders collection gets applied in memory and is not very efficient

Filtering child collections in LINQ to SQL can be accomplished by using the AssociateWith method of the DataLoadOptions class. The AssociateWith method takes in a lambda statement that not only can define how the child collection needs to be filtered, but also specify how the child collection needs to be sorted. Figure 3 uses the AssociateWith method to sort the orders for a given customer by ShipCity. To confirm the sort, ShipCity was applied at the database level; the SQL capture is also recorded, which includes the Order By clause for ShipCity.

Figure 3: Confirms the sort order gets executed by the database

The AssociateWith method is only available in LINQ to SQL. If you are going to be using Entity Framework, you can use the CreateSourceQuery method available on Entity Collection and Entity Reference. CreateSourceQuery returns an ObjectQuery, which gets converted to SQL and is sent to the database. If you want your child collections to be loaded differently, such as sorted in a different order or apply a filter, you must access the ObjectQuery instance that is responsible for loading the child subcollection, then modify the query to add ordering before ObjectQuery gets executed. Figure 4 shows an example of applying to a child collection a sort operation using Entity Framework.

Figure 4: Applying a sort on the Orders collection for the ALFKI customer using Entity Framework

Tip Three: Avoid Non-default Constructors in LINQ to SQL Queries

You must use an object initializer if you are going to be composing queries from other queries and want the entire query to be executed on the database. LINQ to SQL cannot convert non-default constructors into SQL that SQL Server can understand. Therefore, it is preferred that you use non-default constructors to ensure that the entire query is sent to the database and no part of the query is performed in memory. If the query uses non-default constructors in projection, when you try to apply further composition such as adding a where clause, you ll get a runtime error saying the translation is not supported.

Figure 5 shows I joined products, categories, and suppliers and projected the output to the ProductSummary class. Notice I used a non-default constructor of the ProductSummary class that takes CategoryID and SupplierID as the constructor parameters. Further, I filtered the projected results by CategoryId of 1. Upon iterating the final query, we get an exception saying the query is not supported. The reason, as we discussed earlier, is the use of a non-default constructor, which cannot be translated in SQL syntax by LINQ to SQL.

Figure 5: Using non-default constructors raises an exception if you try to further compose queries from the original query

What if you get an API that has no default constructor and you are forced to use the class with its non-default constructor? In that case, you can force the rest of the query to be executed in memory by using the AsEnumerable operator available in LINQ. As shown in Figure 6, I m using the AsEnumerable operator before I apply the CategoryId filter. This hint tells the compiler that any query operation performed after AsEnumerable must be done in memory, thus avoiding the non-default constructor exception.

Figure 6: Using the AsEnumerable operation ensures we don t get an exception in our LINQ query

If you want your entire query to be sent to SQL Server, including your category filter, you must avoid using a non-default constructor. Figure 7 shows the correct version of the query, which, in its entirety, gets executed by SQL Server (as confirmed by the SQL capture). Notice we are not calling any constructor that defaults to using a default constructor with no parameters.

Figure 7: Entire query gets executed in SQL Server because we are using a default constructor

Tip Four: Eager Load a Self-referencing Table Using LINQ to SQL

There are many places where you need to represent your table records in a hierarchy. A real-world example would be where an employee has a manager. Both the employee and the manager are stored in the same table. However, to find out the manager for an employee, you would have an extra column, like ManagerId, that represents the ID of the employee to whom a particular employee reports.

Another example would be an e-commerce database with categories that each have subcategories. To identify the parent for a category, you would have a column, like ParentCategoryId, which represents the parent category for a category. If ParentCategoryId is null, then that category is considered the top-level category. If you have a situation where you need to load a category and all its subcategories any level deep in one single database call, there is no way to get that working in LINQ in a clean fashion. You cannot use DataLoadOptions to eagerly load a self-referencing table. If you were to try, you would get an exception saying cycles not allowed.

Figure 8 shows the database diagram for the Category and Products tables. Notice I have a category relationship pointing back to itself, which merely describes that each category has subcategories within it. The Category table also has a one-to-many relationship with Products because products belong to a particular category. In Figure 8, we retrieve top-level categories by applying a filter where the parent category instance is null. The C# syntax of comparing objects to null references is translated by LINQ to SQL to a null comparison of a foreign key column.

Figure 8: Eagerly loading a self-referencing table raises an exception because LINQ to SQL does not allow loading cycles

To eagerly load the first-level subcategories of a category, I used DataLoadOptions and passed in the lambda expression to load the Categories navigation property. I got an exception saying LINQ to SQL does not support loading cycles when I execute the LINQ query. What it essentially boils down to is you cannot eagerly load a self-referencing table. From a functionality point of view, I can understand why LINQ to SQL cannot arbitrarily load any level-deep hierarchy. But most of the time, in real-world scenarios, we are aware of the depth we d like to go, and it would be nice to allow recursive queries with a depth level specified.

Because LINQ to SQL has limitations where it won t allow you to eagerly load a self-referencing table, developers are required to manually load recursive queries by explicitly traveling each tree of the category. Figure 9 shows a large query for eagerly loading two-level-deep subcategories for each parent category. I obtained my top-level category by only retrieving categories that don t have a parent. To load children for the category, I accessed the navigation property, Categories, for each category. For instance, I obtain my first-level subcategories by cat0.Categories, second level by cat1.SubCategories, and so on. This kind of query can potentially get complex if you have a deep hierarchy. The query produces one large SQL statement that may be very expensive to execute, so use it with caution and use it only when you want to eagerly load child entities for a self-referencing table.

Figure 9: Manually traveling two levels deep for each parent category

Entity Framework supports the Include method, which, when passed in QueryPath allows eager loading to a predefined depth. For instance, in Figure 10, to load two-level-deep categories, the Include method was called on Categories with a QueryPath of Categories.Categories. QueryPath is comprised of navigation properties; the number of times the navigation properties are traversed indicates the desired depth.

Figure 10: Using Include to specify the depth in our LINQ to SQL queries

Tip Five: Use AsQueryable with LINQ to Objects and LINQ to SQL

The AsQueryable method allows a query to be converted to an instance of IQueryable. When you use the AsQueryable operator on an existing query and apply further transformations, such as applying a filter or a sort order, those lambda statements are converted to expression trees. Depending on the provider you are using, expression trees will be converted to the domain-specific syntax, then executed. In the case of LINQ to SQL, the expression tree would be converted to SQL and executed on SQL Server. However, if you use the AsQueryable operator on a query that does not implement IQueryable and only implements IEnumerable, then any transformations you apply on the query will automatically fall back on the IEnumerable specification. What this means is that by tagging a query with AsQueryable, you get the benefits of both LINQ to SQL and LINQ to Objects implementation. If your existing query happens to implement IQueryable, the query is converted to SQL by LINQ to SQL; otherwise, the query is executed in memory in the form of IL code.

This offers excellent benefits in real-world scenarios where you have certain methods on an entity that returns an IQueryable of T and some methods return List. But then you have a business rule filter that must be applied on the entire collection, regardless of whether the collection is returned as IQueryable of T or IEnumerable of T. From a performance standpoint, you really want to leverage executing the business filter on the database if the collection implements IQueryable; otherwise, fall back to apply the business filter in memory using the LINQ to Objects implementation of delegates.

Figure 11 shows the TopSellingProducts method, which returns an IQueryable of products. I defined that all the products that have been ordered more than 50 times are considered top products. Because the TopSellingProducts method returns an IQueryable, I have not executed the query; I simply declared the query of how to get top-selling products.

Figure 11: Using AsQueryable to apply a filter either in memory or a database

Another method, MostProfitableProducts, returns a list of products that have a UnitPrice greater than 60. I arbitrarily came up with this rule and stated that all products that have a UnitPrice of more than 60 dollars are considered profitable to the company.

Looking further in Figure 11, the DisplayProducts method is responsible for displaying products. One of the business rules states that we can only display products that have either quantity greater than 0 or the product is not a discontinued product. To apply our business filter, I sent the results of both TopSellingProducts and MostProfitableProducts to our filter method, FilterNonDisplayableProducts. FilterNonDisplayableProducts simply removes the products that violate our business rules. Notice that FilterNonDisplayableProducts takes in an IEnumerable because this is the lowest common denominator with which we can work and is implemented by all generic collections in one form or another. Before applying the business filter I convert the IEnumerable of T collection passed in as a parameter to IQueryable of T using the AsQueryable operator. We do this because we want the filter to be applied on the database for collections that implement IQueryable. If the collection does not implement IQueryable, the filter is applied in memory. This way you can apply the business rule to both IQueryable and IEnumerable and get the benefits of both worlds.

Figure 12 shows the SQL capture when I execute the code from Figure 11. Based on the SQL query sent, we can confirm that our business filter was applied on the database for top-selling products because it was using IQueryable, whereas, for the MostProfitableProducts collection, the business filter was applied in memory because it implemented only IEnumerable of T.

Figure 12: Confirms that the business filter gets executed on the database for top-selling products

Tip Six: Use ExecuteQuery to Execute Dynamic SQL in LINQ to SQL

If you find a particular query is difficult to represent as a LINQ query, you can consider other options, such as writing your own SQL statements inside a stored procedure or a function, then calling the stored procedure and function using LINQ to SQL. You also can write dynamic SQL statements using the helper methods provided in DataContext. Some of you may be thinking this is new or different from ADO.NET, which allowed executing dynamic SQL and getting a datareader or dataset back. You can achieve the same results by making use of the ExecuteQuery method available on the DataContext.

The ExecuteQuery method does a lot more than simply executing dynamic SQL. First, it supports the concept of parameterized queries. You provide parameters like you provide parameters to string.format using placeholders, which LINQ to SQL converts to a parameterized query. When you execute dynamic SQL in traditional ADO.NET, you either get a datareader or dataset. It always had been left to the developer to convert the datareader to a customized business object. However, ExecuteQuery takes a generic type that tells LINQ to SQL that a particular SQL statement would return a strongly typed instance of the class passed in as a generic type. You no longer need to worry about converting a datareader to an object.

There are some restrictions you must follow to accomplish proper mapping of your columns to properties on your object. First, your query columns must contain the columns that are defined as the primary key. Second, the column names returned by the query should match the property names defined on your entity class. If the column names do not match, you can use an alias in T-SQL to match the property name defined on your entity. Furthermore, LINQ to SQL does not require that you return all the columns that map to all the properties on your entity. If your query is missing some column, LINQ to SQL will assign a default value to your property. Beware that because you are loading an entity from the database, an object tracking service will start tracking the object you just obtained. And, if you ve partially populated your object, the next time you try to fetch the same object from the DataContext, you ll end up with a partially filled object unless you refresh the object from the database or use a fresh DataContext.

Figure 13 shows a dynamic SQL statement that returns all the customers who placed at least one order that was shipped in the city of Seattle. I used placeholder syntax to pass parameters to my SQL query. The placeholders get converted to parameterized queries, which will free the developer from worrying about a SQL injection attack. I then pass my SQL statement to the ExecuteQuery method on the DataContext, with a generic type of Customer to return a collection of customers. Because my query only returns CustomerID and CompanyName, I m partially populating my customer object so any fields that I m not populating get assigned a default value. If my query did not return CustomerID, then LINQ to SQL would throw a runtime exception complaining that the query must at least have a CustomerID column, as it is the primary column that LINQ to SQL uses to track the object.

Figure 13: Executing dynamic SQL to return a strongly typed customer object

Tip Seven: Execute Your LINQ Query on the Database

Depending on the approach you take when writing a LINQ query, you may be surprised that some of the operations are getting performed in memory instead of the call being translated to SQL and executed on the database. When you apply aggregate operators on association relationships on an entity in the context of query syntax, the aggregation is performed on the database. However, when executed outside the query syntax, the same syntax would force the entire association relationship to be brought from the database and the aggregate operation performed in memory.

In Figure 14 shows two versions of the same query that returns the customer ID, total number of orders a customer has placed, and total amount they ve spent. In the first query I used the Orders association relationship available on each customer to get the total orders placed. To get the total amount spent, I first used the association relationship Orders on the customer, then applied the SelectMany operator to get a flattened list of all the order details for all the orders for a customer.

Figure 14: Aggregate operators are applied on the database if used inside a LINQ query

After obtaining all the order details, I used the sum operator to calculate the total spent by each customer. Because I used an association relationship inside a LINQ query, the entire query is converted to SQL to send to the database for execution.

The second query in Figure 14 uses the same association relationship on the customer entity to perform calculations. Because the calculations are not part of an existing query, LINQ to SQL must bring all the orders for the customer and for each order bring down all its OrderDetails to calculate the Sum and Count operations. Bringing all the orders and order details for a customer is an expensive operation that need not be performed if all you want to do is get the count and sum. Those operations easily can be done on the database. As a developer, it is important to understand the trade-offs and know which option may be better suited for your scenario. For instance, if you have the Orders and OrderDetails for a customer in memory, it may be more efficient to perform these operations in memory instead of making a database call to the server.

Conclusion

In this article we explored how LINQ uses a tracking service to ensure the same object on which the user worked is being returned. When searching for an object based on the primary key, LINQ to SQL saves the database roundtrip if it has initially fetched the object and is available in the tracking repository. If the query searches on any columns other than the primary key, the LINQ to SQL engine will make a database roundtrip, but will then check the tracking service to see if the object retrieved from a query is being tracked. If the object is available in the tracking service, it will return that object. These behaviors ensure that consecutive retrievals will not wipe out the changes you made on an existing object.

We also discussed using the AssociateWith operator to apply sorting on the database for child entities. We went through the limitations of loading self-referencing tables using LINQ to SQL. We also saw an easier syntax available in Entity Framework that allows us to navigate the depth of the child collection using the Include method. We then discussed how using the AsQueryable operator lets you leverage existing IQueryable queries to perform further filtering on the database, and if the collection only implements IEnumerable, query processing falls back to the IEnumerable implementation of LINQ to Objects.

We also saw a few examples of using ExecuteQuery to run our dynamic SQL statements and get objects back. ExecuteQuery is better than its predecessors because its uses placeholders that get converted to parameterized queries, saving you from having to worry about SQL injection attacks. Using ExecuteQuery you no longer have to convert readers to entities, and ExecuteQuery does not require you to fully populate your entire entity. Finally, we discussed how aggregate operators are applied on an association relationship of an entity. Aggregate operators used in the context of an existing query will be applied on the database. Outside the context of IQueryable, the aggregate operators force the entire collection to be brought on the client and calculations performed in memory.

Source code accompanying this article is available for download.

Zeeshan Hirani ([email protected]) is a senior developer at CheaperThanDirt.com. He specializes mainly in ASP.NET, AJAX, and leveraging OR-mapping solutions like LINQ and Entity Framework in business applications.

Comments

Plain text