Home > Uncategorized > GOTCHA – LINQtoSQL Paging Problem

GOTCHA – LINQtoSQL Paging Problem

A quick search for how to display paged records in ASP.NET will yield a ton of links and how to’s ranging from as simple as automatic paging with the grid controls to custom paging and server side paging in SQL Server, but what about paging in LINQ?

Consider a simple LINQ query to find all employees with marital status single using the AdventureWorks db

var query = from e in Employees
where e.MaritalStatus == "S"
select e;

To get to the first page is simple as we just need to add a TOP(n) clause to the generated SQL. In LINQ we do this with the Take(n) function. Assuming we have a page size of 10 our first page would be

query.Take(10);

To get subsequent pages we need to somehow tell SQL to skip a certain number of records and then send us the TOP 10 records. i.e.

Skip Rows = Page Size * (Required Page – 1)

Again in LINQ this is extremely simple

query.Skip(10).Take(10); //page 2

query.Skip(20).Take(10); //page 3

Now the problem we have just created exists in the generated SQL when we apply Skip() / Take(); the resulting SQL queries have an unstable sort order.

For the first page the generated query is

SELECT TOP (10) [t0].[EmployeeID], [t0].[NationalIDNumber], [t0].[ContactID], [t0].[LoginID], [t0].[ManagerID], [t0].[Title], [t0].[BirthDate], [t0].[MaritalStatus], [t0].[Gender], [t0].[HireDate], [t0].[SalariedFlag], [t0].[VacationHours], [t0].[SickLeaveHours], [t0].[CurrentFlag], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [HumanResources].[Employee] AS [t0]
WHERE [t0].[MaritalStatus] = @p0
ORDER BY [t0].[ManagerID]
– @p0: Input NVarChar (Size = 1; Prec = 0; Scale = 0) [S]

And for subsequent pages where Skip(n) is specified it generates:

SELECT [t1].[EmployeeID], [t1].[NationalIDNumber], [t1].[ContactID], [t1].[LoginID], [t1].[ManagerID], [t1].[Title], [t1].[BirthDate], [t1].[MaritalStatus], [t1].[Gender], [t1].[HireDate], [t1].[SalariedFlag], [t1].[VacationHours], [t1].[SickLeaveHours], [t1].[CurrentFlag], [t1].[rowguid] AS [Rowguid], [t1].[ModifiedDate]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ManagerID]) AS [ROW_NUMBER], [t0].[EmployeeID], [t0].[NationalIDNumber], [t0].[ContactID], [t0].[LoginID], [t0].[ManagerID], [t0].[Title], [t0].[BirthDate], [t0].[MaritalStatus], [t0].[Gender], [t0].[HireDate], [t0].[SalariedFlag], [t0].[VacationHours], [t0].[SickLeaveHours], [t0].[CurrentFlag], [t0].[rowguid], [t0].[ModifiedDate]
    FROM [HumanResources].[Employee] AS [t0]
    WHERE [t0].[MaritalStatus] = @p0
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p1 + 1 AND @p1 + @p2
ORDER BY [t1].[ROW_NUMBER]
– @p0: Input NVarChar (Size = 1; Prec = 0; Scale = 0) [S]
– @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
– @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]

Why different queries?

By definition ‘SELECT’ returns an unordered set, as SQL is set based, and sets have no order. If you want an order in the returned set, you have to define the ordering, with ORDER BY. The ROW_NUMBER() is required to determine which rows are in which page to return.

The problem is that different queries for page 1 vs page n returns the rows in a different sort order. Which means that some of the records displayed on the first page are displayed again on subsequent pages while other records may be missing entirely i.e. don’t show up at all on any page. Adding a Skip(0) to our first page query doesn’t solve things either because LINQ optimizes out the ROW_NUMMBER() from the generated SQL query for Skip(0).

query.Skip(0).Take(10); // doesn’t solve the problem

SELECT TOP (10) [t0].[EmployeeID], [t0].[NationalIDNumber], [t0].[ContactID], [t0].[LoginID], [t0].[ManagerID], [t0].[Title], [t0].[BirthDate], [t0].[MaritalStatus], [t0].[Gender], [t0].[HireDate], [t0].[SalariedFlag], [t0].[VacationHours], [t0].[SickLeaveHours], [t0].[CurrentFlag], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [HumanResources].[Employee] AS [t0]
WHERE [t0].[MaritalStatus] = @p0
ORDER BY [t0].[ManagerID]
– @p0: Input NVarChar (Size = 1; Prec = 0; Scale = 0) [S]

From the looks of it the problem seems to be that, for page 1, skip is 0, and the query is processed as if the skip wasn’t applied at all, all the row numbering, and order stabilizing stuff is optimized out. Skip(0) should generate the same row numbering SQL that skip(n) does so that paging systems can rely on the first page of a paged view operates on the same query, with the same row order that will be used for the other pages. Skip(n) imposes a consistent row order. Skip(0) should use the same row order.

But looks can be deceiving. No, you think that the ordering you see in the ROW_NUMBER() column is a valid order, but that’s applied to a set which happens to be not properly ordered! The order in which the rows are in that set with ROW_NUMBER() is determined based on the ordering in the query! The ROW_NUMBER() are added to the final, ordered set. So if in that set there are some rows with undeterminable ordering it can be that the order between queries flips.

So how to solve this? By adding an ORDER BY clause to our original query which forces a sort on the query and stabilizes the row order

var query = from e in Employees
where e.MaritalStatus == "S"
orderby e.NaionalIDNumber
select e;

Without a stated sort order, any standard SQL query is of indeterminate order, and thus in theory unstable. In reality, most databases will (as an implementation detail) honor a clustered index as a default sort (assuming it can pick one from the numerous tables involved in joins), but even this isn’t typically guaranteed. In reality, one should always specify an order condition if you are paging. Not specifying an ORDER BY clause in a query with TOP (n) specified is not recommended (and detected by the SQL Best Practices Analyzer tool) because the set of rows that will be returned will be plan dependent.

Specifying a mere order condition isn’t sufficient. If I were to sort by last name and there were multiple people with same last name I don’t want the rows with the same last name trading places on the first page and between subsequent pages. The order by should be 100% deterministic i.e. be as specific down to the unique column as possible.

 

Technorati Tags: ,,,

About these ads
Tags: ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: