Archive

Posts Tagged ‘T-SQL’

SQL to LINQ Cheat Sheet

September 27, 2009 2 comments

If you are already working with SQL and are familiar with SQL queries then you may find you at time are thinking of converting SQL syntax to LINQ syntax when writing LINQ. Following cheat sheet should help you with some of the common queries

 

SQL

LINQ

Lambda

SELECT *

FROM HumanResources.Employee

from e in Employees

select e

Employees
   .Select (e => e)

SELECT e.LoginID, e.JobTitle

FROM HumanResources.Employee AS e

from e in Employees

select new {e.LoginID, e.JobTitle}

Employees
   .Select (
      e =>
         new 
         {
            LoginID = e.LoginID,
            JobTitle = e.JobTitle
         }
   )

SELECT e.LoginID AS ID, e.JobTitle AS Title

FROM HumanResources.Employee AS e

from e in Employees

select new {ID = e.LoginID, Title = e.JobTitle}

Employees
   .Select (
      e =>
         new 
         {
            ID = e.LoginID,
            Title = e.JobTitle
         }
   )

SELECT DISTINCT e.JobTitle

FROM HumanResources.Employee AS e

(from e in Employees

select e.JobTitle).Distinct()

Employees
   .Select (e => e.JobTitle)
   .Distinct ()

SELECT e.*

FROM HumanResources.Employee AS e

WHERE e.LoginID = ‘test’

from e in Employees

where e.LoginID == "test"

select e

Employees
   .Where (e => (e.LoginID == "test"))

SELECT e.*

FROM HumanResources.Employee AS e

WHERE e.LoginID = ‘test’ AND e.SalariedFlag = 1

from e in Employees

where e.LoginID == "test" && e.SalariedFlag

select e

Employees
   .Where (e => ((e.LoginID == "test") && e.SalariedFlag))

SELECT e.*
FROM HumanResources.Employee AS e

WHERE e.VacationHours >= 2 AND e.VacationHours <= 10

from e in Employees

where e.VacationHours >= 2 && e.VacationHours <= 10

select e

Employees
   .Where (e => (((Int32)(e.VacationHours) >= 2) && ((Int32)(e.VacationHours) <= 10)))

SELECT e.*

FROM HumanResources.Employee AS e
ORDER BY e.NationalIDNumber

from e in Employees

orderby e.NationalIDNumber

select e

Employees
   .OrderBy (e => e.NationalIDNumber)

SELECT e.*

FROM HumanResources.Employee AS e

ORDER BY e.HireDate DESC, e.NationalIDNumber

from e in Employees

orderby e.HireDate descending, e.NationalIDNumber

select e

Employees
   .OrderByDescending (e => e.HireDate)
   .ThenBy (e => e.NationalIDNumber)

SELECT e.*
FROM HumanResources.Employee AS e

WHERE e.JobTitle LIKE ‘Vice%’ OR SUBSTRING(e.JobTitle, 0, 3) = ‘Pro’

from e in Employees

where e.JobTitle.StartsWith("Vice") || e.JobTitle.Substring(0, 3) == "Pro"

select e

Employees
   .Where (e => (e.JobTitle.StartsWith ("Vice") || (e.JobTitle.Substring (0, 3) == "Pro")))

SELECT SUM(e.VacationHours)

FROM HumanResources.Employee AS e

 

Employees.Sum(e => e.VacationHours);

SELECT COUNT(*)

FROM HumanResources.Employee AS e

 

Employees.Count();

SELECT SUM(e.VacationHours) AS TotalVacations, e.JobTitle

FROM HumanResources.Employee AS e

GROUP BY e.JobTitle

from e in Employees

group e by e.JobTitle into g

select new {JobTitle = g.Key, TotalVacations = g.Sum(e => e.VacationHours)}

Employees
   .GroupBy (e => e.JobTitle)
   .Select (
      g =>
         new 
         {
            JobTitle = g.Key,
            TotalVacations = g.Sum (e => (Int32)(e.VacationHours))
         }
   )

SELECT e.JobTitle, SUM(e.VacationHours) AS TotalVacations

FROM HumanResources.Employee AS e

GROUP BY e.JobTitle

HAVING e.COUNT(*) > 2

from e in Employees

group e by e.JobTitle into g

where g.Count() > 2

select new {JobTitle = g.Key, TotalVacations = g.Sum(e => e.VacationHours)}

Employees
   .GroupBy (e => e.JobTitle)
   .Where (g => (g.Count () > 2))
   .Select (
      g =>
         new 
         {
            JobTitle = g.Key,
            TotalVacations = g.Sum (e => (Int32)(e.VacationHours))
         }
   )

SELECT *

FROM Production.Product AS p, Production.ProductReview AS pr

from p in Products

from pr in ProductReviews

select new {p, pr}

Products
   .SelectMany (
      p => ProductReviews,
      (p, pr) =>
         new 
         {
            p = p,
            pr = pr
         }
   )

SELECT *

FROM Production.Product AS p

INNER JOIN Production.ProductReview AS pr ON p.ProductID = pr.ProductID

from p in Products

join pr in ProductReviews on p.ProductID equals pr.ProductID

select new {p, pr}

Products
   .Join (
      ProductReviews,
      p => p.ProductID,
      pr => pr.ProductID,
      (p, pr) =>
         new 
         {
            p = p,
            pr = pr
         }
   )

SELECT *

FROM Production.Product AS p

INNER JOIN Production.ProductCostHistory AS pch ON p.ProductID = pch.ProductID AND p.SellStartDate = pch.StartDate

from p in Products

join pch in ProductCostHistories on new {p.ProductID, StartDate = p.SellStartDate} equals new {pch.ProductID, StartDate = pch.StartDate}

select new {p, pch}

Products
   .Join (
      ProductCostHistories,
      p =>
         new 
         {
            ProductID = p.ProductID,
            StartDate = p.SellStartDate
         },
      pch =>
         new 
         {
            ProductID = pch.ProductID,
            StartDate = pch.StartDate
         },
      (p, pch) =>
         new 
         {
            p = p,
            pch = pch
         }
   )

SELECT *

FROM Production.Product AS p

LEFT OUTER JOIN Production.ProductReview AS pr ON p.ProductID = pr.ProductID

from p in Products

join pr in ProductReviews on p.ProductID equals pr.ProductID

into prodrev

select new {p, prodrev}

Products
   .GroupJoin (
      ProductReviews,
      p => p.ProductID,
      pr => pr.ProductID,
      (p, prodrev) =>
         new 
         {
            p = p,
            prodrev = prodrev
         }
   )

SELECT p.ProductID AS ID

FROM Production.Product AS p

UNION

SELECT pr.ProductReviewID

FROM Production.ProductReview AS pr

(from p in Products

select new {ID = p.ProductID}).Union(

from pr in ProductReviews

select new {ID = pr.ProductReviewID})

Products
   .Select (
      p =>
         new 
         {
            ID = p.ProductID
         }
   )
   .Union (
      ProductReviews
         .Select (
            pr =>
               new 
               {
                  ID = pr.ProductReviewID
               }
         )
   )

SELECT TOP (10) *

FROM Production.Product AS p

WHERE p.StandardCost < 100

(from p in Products

where p.StandardCost < 100

select p).Take(10)

Products
   .Where (p => (p.StandardCost < 100))
   .Take (10)

SELECT *

FROM [Production].[Product] AS p

WHERE p.ProductID IN(

    SELECT pr.ProductID

    FROM [Production].[ProductReview] AS [pr]

    WHERE pr.[Rating] = 5

    )

from p in Products

where (from pr in ProductReviews

where pr.Rating == 5

select pr.ProductID).Contains(p.ProductID)

select p

Products
   .Where (
      p =>
         ProductReviews
            .Where (pr => (pr.Rating == 5))
            .Select (pr => pr.ProductID)
            .Contains (p.ProductID)
   )

 

Also, here is an excellent LINQ query comprehension diagram http://www.albahari.com/nutshell/linqsyntax.emf

Tags: , , ,

Select top n rows from a table for each group

March 6, 2009 1 comment

We have a retail shop online and with just 2 weeks of our website launch we already have close to 30 orders. Now of course marketing wanted to get some ads up on the site to derive more orders and one for the reports was to get the top two products from each manufacturer which have the best promotional price.

A GROU BY clause immediately comes to mind for the above scenario, but SQL 2005/2008 offer a much better solution.

SELECT    
    ProductId,
    MfgPartNumber,
    DescriptionText,
    MfgListPrice,
    SellPrice,
    SavingsPercentage
FROM    (
    SELECT
        ROW_NUMBER() OVER(PARTITION BY P.MfgCode ORDER BY (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice DESC) AS RowNumber,
        P.ProductId,
        P.MfgPartNumber,
        P.DescriptionText,
        P.MfgListPrice,
        PR.SellPrice,
        (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice AS SavingsPercentage
    FROM    Product P
    INNER JOIN    Price PR
        ON    P.ProductId = PR.ProductId
    INNER JOIN    SpecialPricingXRef SP
        ON    PR.PriceGroupId = SP.PriceGroupId
    WHERE    SP.PricingType = 'Promo'
        AND    P.MfgListPrice > PR.SellPrice
    ) AS InnerTable
WHERE    RowNumber < 3
ORDER BY    SavingsPercentage DESC;

Aside from normal use of the ROW_NUMBER() function to generate sequence numbers for the result set the above is the perfect scenario where the ROW_NUMBER() function is really useful. ROW_NUMBER() function creates a new column in the result set with a unique / incremental number for each row. If the PARTITION BY clause (think of a partition as a category or group) is also specified then the sequence number of a row is reset to 1 for each new partition / category.

Lets dissect the above query; following query returns all the products which have a promo price record with a RowNumber incremented for each row. I’ll get to why we need the RowNumber in our query later.

SELECT
    ROW_NUMBER() OVER(ORDER BY (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice DESC) AS RowNumber,
    P.ProductId,
    P.MfgPartNumber,
    P.DescriptionText,
    P.MfgListPrice,
    PR.SellPrice,
    (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice AS SavingsPercentage
FROM    Product P
INNER JOIN    Price PR
    ON    P.ProductId = PR.ProductId
INNER JOIN    SpecialPricingXRef SP
    ON    PR.PriceGroupId = SP.PriceGroupId
WHERE    SP.PricingType = 'Promo'
    AND    P.MfgListPrice > PR.SellPrice

Adding the PARTITION BY clause to the above changes the output slightly. When the PARTITION BY clause is also specified the sequence number generated by ROW_NUMBER() is reset to 1 for each new partition. In concept the PARTITION BY clause is similar to a GROUP BY except it only applies to the function and not the select as a whole.

SELECT
    ROW_NUMBER() OVER(PARTITION BY P.MfgCode ORDER BY (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice DESC) AS RowNumber,
    P.ProductId,
    P.MfgPartNumber,
    P.DescriptionText,
    P.MfgListPrice,
    PR.SellPrice,
    (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice AS SavingsPercentage
FROM    Product P
INNER JOIN    Price PR
    ON    P.ProductId = PR.ProductId
INNER JOIN    SpecialPricingXRef SP
    ON    PR.PriceGroupId = SP.PriceGroupId
WHERE    SP.PricingType = 'Promo'
    AND    P.MfgListPrice > PR.SellPrice

The where clause in our outer select is just filtering the results so that only those rows with RowNumber less than 3 are returned. Recall from our previous query that RowNumber is reset to 1 for each manufacturer; so to get just the top 2 rows for each manufacturer we need to get rows where RowNumber is 1 and 2 (< 3). The result is the list of 2 rows with highest savings for each manufacturer… simple really.

P.S. If you prefer Common Table Expressions then you can rewrite the query to:

WITH InnerTable AS(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY P.MfgCode ORDER BY (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice DESC) AS RowNumber,
        P.ProductId, 
        P.MfgPartNumber,
        P.DescriptionText,
        P.MfgListPrice,
        PR.SellPrice,
        (P.MfgListPrice - PR.SellPrice) / P.MfgListPrice AS SavingsPercentage
    FROM    Product P
    INNER JOIN    Price PR
        ON    P.ProductId = PR.ProductId
    INNER JOIN    SpecialPricingXRef SP
        ON    PR.PriceGroupId = SP.PriceGroupId
    WHERE    SP.PricingType = 'Promo'
        AND    P.MfgListPrice > PR.SellPrice
) 

SELECT    ProductId,
    MfgPartNumber,
    DescriptionText,
    MfgListPrice,
    SellPrice,
    SavingsPercentage
FROM    InnerTable
WHERE    RowNumber <= 3
ORDER BY    SavingsPercentage DESC;

the query plan for sub query vs CTE is the same so its all a matter of choice and readability.

Table-Valued parameter in SQL Server 2005

August 3, 2008 3 comments

In pre-SQL Server 2005 in order to pass in a set of values one had to create a temporary table, populate it with data using INSERT, and then just use it in the procedure or function since they are created for the current session and are available to all processes in that session.

I did a blog on how to pass in Table-Value Parameters in SQL Server 2008 but what if we have a need to pass in multiple rows of data to a T-SQL statement, or a routine such as stored procedure or function in SQL Server 2005?

Turns out the same can be done in SQL Server 2005 without using temporary tables. By using the XML data type you can pass user-defined sets between queries and also between the client and the server.

The following code shows how you can create and use XML parameters.

USE AdventureWorks; GO CREATE PROCEDURE uspEmployeeList(@EmployeeList XML) AS BEGIN SET NOCOUNT ON; SELECT E.* FROM HumanResources.Employee E INNER JOIN @EmployeeList.nodes('/Root/E') AS Tbl(C) ON E.EmployeeID = Tbl.C.value('@EmployeeID', 'INT'); RETURN; END
 

How are XML parameters supported in .NET? ADO.NET allows full support using SqlDbType.Xml type. Adding a XML as a parameter to the stored procedure from C# would have something like:

//string EmployeeXml = "<Root><E EmployeeID=\"1\" /><E EmployeeID=\"3\" /><E EmployeeID=\"5\" /><E EmployeeID=\"7\" /><E EmployeeID=\"11\" /></Root>";
 
// Create the data table
DataTable dt = new DataTable("E");
// Create the table schema
DataColumn dc = dt.Columns.Add("EmployeeID", typeof(string));
// Add a few records to the data table.
for (int i = 1; i <= 10; i++)
{
      // create a new row
      
DataRow
dr = dt.NewRow();
      // populate the fields
      
dr[0] = i.ToString();
      // add the row to the table
      
dt.Rows.Add(dr);
}
... System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand("uspEmployeeList", sqlConn); cmd.Parameters.Add("@EmployeeList", System.Data.SqlDbType.Xml); //cmd.Parameters["@EmployeeList"].Value = EmployeeXml; ...
// Create a temporary MemoryStream to hold the output of the WriteXml method of the DataTable
using (MemoryStream memoryStream = new MemoryStream())
{
dt.WriteXml(memoryStream);
         UTF8Encoding encoding = new UTF8Encoding();
         cmd.Parameters["@EmployeeList"].Value = encoding.GetString(memoryStream.ToArray());
}

Now that’s cool. This is much better than passing in a comma separated list and using a dynamic query in our procedure or function.

Don’t Use SELECT *

July 7, 2008 1 comment

One of the most common performance and scalability problems are queries that return too many columns or too many rows. I’ve seen many developers actually using and abusing SELECT * FROM queries. SELECT * query not only returns unnecessary data, but it also can force clustered index scans for query plans because columns in the SELECT clause are also considered by the optimizer when it identifies indexes for execution plans.

The following two queries show this difference; note the difference in query plans as well as the relative cost to the batch. The query cost relative to the batch is 67% for the SELECT * compared to just 33% for the SELECT column query.

SELECT    SalesOrderID
FROM    Sales.SalesOrderHeader
WHERE    OrderDate< 01/01/1978

SELECT   *
FROM    Sales.SalesOrderHeader
WHERE    OrderDate< 01/01/1978

The first query uses a Clustered Index Scan to resolve the query because it has to retrieve all the data from the clustered index, even though there is an index on the OrderDate column.

The second query uses the OrderDate index to perform an Index Seek operation. Because the query returns only the SalesOrderID column, and because the column is the clustering key, the query is resolved using only that index.

Tags:

Table-Value Parameters in SQL Server 2008

April 26, 2008 2 comments

One of the cool new data types added in SQL Server 2008 is Table-Value Parameters. You can use table value parameters to send multiple rows of data to a Transact-SQL statement, or to a routine such as a stored procedure or function. This can be done without creating a temporary table or many parameters. Table-value parameters are like parameter arrays in Object Linking and Embedding Database (OLE DB) and Open Database Connectivity (ODBC), but offer better flexibility and closer integration with Transact-SQL. Table-value parameters can participate in set-based operations.

Table-value parameters allow you to pass user-defined tables between queries and also between the client and the server. In SQL Server 2008, you can use table-value parameters to pass an entire set of data into functions and procedures.

In earlier versions of SQL Server in order to pass in a set of values one had to create a temporary table, populate it with data using INSERT, and then just use it in the procedure or function since they are created for the current session and are available to all processes in that session.

With table-value parameters you need to declare a table variable, populate it with data using standard routines such as INSERT, and then pass it into a procedure or a function. The database engine handles this data by using a reference marked with the READONLY property to ensure that the data does not get copied again.

Table-value parameters have two primary components – a SQL Server type and a parameter. The following are the steps to create and use table-value parameters:

  1. Create a table type and define the table structure.
  2. Declare a routine that has a parameter of the table type.
  3. Declare a variable of the table type and then reference the table type.
  4. Fill the table variable by using an INSERT statement.
  5. Pass the variable to a routine.

The following code shows how you can create and use table-value parameters.

/* Create a table type. */ CREATE TYPE DeptTableType AS TABLE( DeptName VARCHAR(50), DeptCode VARCHAR(2)); GO /* Create a procedure to receive data for the table-value parameter.*/ CREATE PROCEDURE uspDepartmentInsert @TVP DeptTableType READONLY AS SET NOCOUNT ON; INSERT INTO Dept(Name, DeptCode, ModifiedDate) SELECT *, GETDATE() FROM @TVP; GO /* Declare a variable that references the type. */ DECLARE @DeptTVP AS DeptTableType; /* Add data to the table variable. */ INSERT INTO @DeptTVP(DeptName, DeptCode) VALUES('Software Development', 'SD'); INSERT INTO @DeptTVP(DeptName, DeptCode) VALUES('Quality Assurance', 'QA'); /* Pass the table variable data to a stored procedure. */ EXEC uspDepartmentInsert @DeptTVP; GO

Table-value parameters offer more flexibility and better performance than temporary tables to pass a list of parameters. Table-value parameters do not acquire locks for the initial population of data from a client and do not cause a statement to recompile.

This is all great but in any real application the parameters are usually passed in from .NET code. How are table-value parameters supported in .NET? ADO.NET 3.0 allows full support using SqlDbType.Structured type. The parameters which can be passed in from ADO.NET are DataTables, IEnumerable<SqlDataRecord> and DbDataReader. There is also support for ODBC, SSMS, and within Replication.

Adding a DataTable as a parameter to the stored procedure from C# would have something like:

DataTable dt = new DataTable(); dt.Columns.Add.... dt.Rows.Add..... SqlCommand cmd = new SqlCommand("uspDepartmentInsert", sqlConn); cmd.Parameters.AddWithValue("@DeptTVP", dt); cmd.ExecuteNonQuery();

Now that’s cool.

Table-value parameters offer the following benefits:

  • Provide a simple programming model.
  • Enable inclusion of complex business logic in a single routine.
  • Reduce round trips to the server.
  • Include a table structure of different cardinality.
  • Enable strongly typed and set-oriented queries.
  • Enable the client to specify sort order and unique keys.

Table-value parameters have the following restrictions:

  • Statistics are not maintained on columns of table-value parameters.
  • Table-value parameters must be passed as input READONLY parameters to Transact-SQL routines.
  • DML operations such as UPDATE, DELETE, or INSERT cannot be performed on a table-value parameter in the body of a routine. This may actually be a good thing if the data is not meant to change and a read-only copy of the data is all that is needed. However, if the routine needs to update the rows then you’d have to use temporary tables instead in procedures.
  • Table-value parameters cannot be used as the target of a SELECT INTO or INSERT EXEC statement. Table-value parameters can be in the FROM clause of SELECT INTO, or in the INSERT EXEC string or stored procedure.

 

Conditional INNER Joins

April 12, 2008 2 comments

Recently on  a newsgroup I found a question about conditional joins in TSQL. The OP was trying to get rid of dynamic SQL and was looking for a way to change the following example script

DECLARE @SQL VARCHAR(MAX) SET @SQL = 'SELECT * FROM T1' IF LEN(@Param1) > 0 SET @SQL = @SQL + ' INNER JOIN T2 ON T1.Col1 = T2.Col2' IF @Param2 > 0 SET @SQL = @SQL + ' WHERE T1.Col2 = @Param2' EXEC (@SQL)

What the OP was asking for was if it was possible to conditionally join to a table given a certain parameter value. What he was looking for was a way to do something like this

SELECT * FROM T1 IF LEN(@Param1) > 0 BEGIN INNER JOIN T2 ON T1.Col1 = T2.Col2 END IF @Param2 > 0 BEGIN WHERE T1.Col2 = @Param2 END

Now converting the second if statement is easy; just add a case statement to check the value of @Param2 and change the RHS of the equality operator based on value of @Param2

SELECT * FROM T1 IF LEN(@Param1) > 0 BEGIN INNER JOIN T2 ON T1.Col1 = T2.Col2 END WHERE T1.Col2 = CASE @Param2 WHEN 0 THEN T1.Col2 ELSE @Param2 END

Conditionally adding the INNER JOIN is very easy when building up a SQL string but the above would not work because of syntax errors. So how do we do this in pure TSQL? simple; we change our where clause to check for existence of matching rows in the second table T2 and OR with the parameter

SELECT * FROM T1 WHERE ( EXISTS( SELECT * FROM T2 WHERE T1.Col1 = T2.Col2 ) OR LEN(@Param1) = 0 ) AND T1.Col2 = CASE @Param2 WHEN 0 THEN T1.Col2 ELSE @Param2 END

The above generates an optimal query plan because SQL Server knows it needs to access T2 and it will appear in the plan but the LEN(@Param1) = 0 condition prevents it from being accessed when there is no need to.

An alternate approach would be to write separate queries and use IF statements to determine which query needs to be executed

IF LEN(@Param) > 0 BEGIN SELECT * FROM T1 INNER JOIN T2 ON T1.Col1 = T2.Col2 WHERE T1.Col2 = CASE @Param2 WHEN 0 THEN T1.Col2 ELSE @Param2 END END ELSE BEGIN SELECT * FROM T1 WHERE T1.Col2 = CASE @Param2 WHEN 0 THEN T1.Col2 ELSE @Param2 END END

But this can get pretty nasty if the number of parameters / conditions grows. An example would be a search SP which searches against Keyword, Product Category, Price, Inventory Quantity etc. Imagine the number of separate queries and the nightmare to maintain / modify them if needed. The conditional inner join version has the benefit of being a single query which can be easily modified if need be. But do experiment and perhaps a hybrid solution would be the best in your case where a couple of IF statements with separate queries each with some conditional inner joins are used.

 

Technorati Tags: ,

Tags: