Find gaps in a time data table (C#, LINQ to SQL): Optimized

In the last post I showed a LINQ to SQL query for finding gaps in a time data table. This query can be optimized to be very fast by applying a very little modification on it. First, let us fill the table with more than 1000 rows and adding an index to the dt column. This can be done by executing the following script:

USE [MeasuresDb]

-- Create table and index
CREATE TABLE [dbo].[dtTable](
	[id] [smallint] IDENTITY(1,1) NOT NULL,
	[dt] [datetimeoffset](7) NOT NULL,
 CONSTRAINT [PK_dtTable] PRIMARY KEY CLUSTERED
	(
		[id] ASC
	)
) ON [PRIMARY]

GO

CREATE UNIQUE NONCLUSTERED INDEX [Idx_Dt] ON [dbo].[dtTable]
(
	[dt] ASC
)
Go

-- Fill table with data
declare @dtStart datetimeoffset , @dtEnd datetimeOffset
set @dtStart = '2012-09-01 00:00:00'
set @dtEnd = dateadd(minute, 1440, @dtStart)

insert into [dbo].[dtTable]
    select dateadd(minute, number, @dtStart)
    from
        (select distinct number from master.dbo.spt_values
         where name is null
        ) n
    where dateadd (minute, number, @dtStart) < @dtEnd

-- Create gaps (delete rows)
delete from [dbo].[dtTable]
where id = 2 or id = 3 or id = 7

And the query that we have used in the previous post is:

from a in DtTables
where DtTables.Count(date => date.Dt == a.Dt.AddMinutes(1)) == 0 &&
				   a.Dt < DtTables.Max(date => date.Dt)
select new
{
		startGap = a.Dt,
		endGap = (from b in DtTables
			   where b.Dt > a.Dt
			   select b).Min(c => c.Dt)
}

Before optimizing the actual query, let us analyze it first. Each LINQtoSQL query is translated to T-SQL language before sending it to SQl Server. To see the translated T-SQL from this query, we will use LINQPad tool (free tool).  it’s an ergonomic C#/VB/F# scratchpad that instantly executes any expression, statement block or program with rich output formatting.  So, copy the linq query to LINQPad and execute it:

The translated  T-SQL query is :

-- Region Parameters
DECLARE @p0 Float = 1
DECLARE @p1 Int = 0
-- EndRegion
SELECT [t0].[dt] AS [startGap], (
    SELECT MIN([t4].[dt])
    FROM [dtTable] AS [t4]
    WHERE [t4].[dt] > [t0].[dt]
    ) AS [endGap]
FROM [dtTable] AS [t0]
WHERE (((
    SELECT COUNT(*)
    FROM (
        SELECT
            (CASE
                WHEN [t1].[dt] = DATEADD(ms, (CONVERT(BigInt,@p0 * 60000)) % 86400000, DATEADD(day, (CONVERT(BigInt,@p0 * 60000)) / 86400000, [t0].[dt])) THEN 1
                WHEN NOT ([t1].[dt] = DATEADD(ms, (CONVERT(BigInt,@p0 * 60000)) % 86400000, DATEADD(day, (CONVERT(BigInt,@p0 * 60000)) / 86400000, [t0].[dt]))) THEN 0
                ELSE NULL
             END) AS [value]
        FROM [dtTable] AS [t1]
        ) AS [t2]
    WHERE [t2].[value] = 1
    )) = @p1) AND ([t0].[dt] < ((
    SELECT MAX([t3].[dt])
    FROM [dtTable] AS [t3]
    )))

To know if this query has a problem, we must see its  execution plan.  SQL Sentry Explorer is a good free tool to accomplish this job.  Copy the T-SQL query to the commandText tab of the SQL Sentry and click on Actual Plan, we get the following execution plan:

We notic that the filter operator which is marked with red has 80.1% CPU and I/O cost. The job of this  operator is to restrict a set of rows based on a predicate. And the predicate in our T-SQL query is from line 12 to 21 in the above list. The overall statistics of the query execution is:

Now, let us optimize the Linq query. What we will do is a very simple modification. Change the Count operator of the second statement to Where and add a Count operator to the end of the satement as shown here:

from a in DtTables
where DtTables.Where(date => date.Dt == a.Dt.AddMinutes(1)).Count() == 0 &&
					a.Dt < DtTables.Max(date => date.Dt)
select new
{
		startGap = a.Dt,
		endGap = (from b in DtTables
			   where b.Dt > a.Dt
			   select b).Min(c => c.Dt)
}

The translated T-SQL query is:

-- Region Parameters
DECLARE @p0 Float = 1
DECLARE @p1 Int = 0
-- EndRegion
SELECT [t0].[dt] AS [startGap], (
    SELECT MIN([t3].[dt])
    FROM [dtTable] AS [t3]
    WHERE [t3].[dt] > [t0].[dt]
    ) AS [endGap]
FROM [dtTable] AS [t0]
WHERE (((
    SELECT COUNT(*)
    FROM [dtTable] AS [t1]
    WHERE [t1].[dt] = DATEADD(ms, (CONVERT(BigInt,@p0 * 60000)) % 86400000, DATEADD(day, (CONVERT(BigInt,@p0 * 60000)) / 86400000, [t0].[dt]))
    )) = @p1) AND ([t0].[dt] < ((
    SELECT MAX([t2].[dt])
    FROM [dtTable] AS [t2]
    )))

We notice that the translation is not tha same as in the first query. The fruit of the optimization is noticed in the value of the query execution duration which is very small now as shown here:

This T-SQL query does not contain the predicate that casues a poor performance as in the first version.

So, we must be careful with linq query, a little modification can change the execution performance from poor to good or vice-versa.

This entry was posted in C#, SQL Server. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>