Using DATEDIFF in T-SQL

I am using DATEDIFF in an SQL statement. I am selecting it, and I need to use it in the WHERE clause as well. This statement does not work...
SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable
WHERE InitialSave <= 10
It gives the message: Invalid column name "InitialSave"
But this statement works fine...
SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable
WHERE DATEDIFF(ss, BegTime, EndTime) <= 10
The programmer in me says that this is inefficient (seems like I am calling the function twice).
So two questions. Why doesn't the first statement work? Is it inefficient to do it using the second statement?

Note: When I originally wrote this answer I said that an index on one of the columns could create a query that performs better than other answers (and mentioned Dan Fuller's). However, I was not thinking 100% correctly. The fact is, without a computed column or indexed (materialized) view, a full table scan is going to be required, because the two date columns being compared are from the same table!
I believe there is still value in the information below, namely 1) the possibility of improved performance in the right situation, as when the comparison is between columns from different tables, and 2) promoting the habit in SQL developers of following best practice and reshaping their thinking in the right direction.
Making Conditions Sargable
The best practice I'm referring to is one of moving one column to be alone on one side of the comparison operator, like so:
SELECT InitialSave = DateDiff(second, T.BegTime, T.EndTime)
FROM dbo.MyTable T
WHERE T.EndTime <= T.BegTime + '00:00:10'
As I said, this will not avoid a scan on a single table; however, in a situation like this it could make a huge difference:
SELECT InitialSave = DateDiff(second, T.BegTime, T.EndTime)
FROM
dbo.BeginTime B
INNER JOIN dbo.EndTime E
ON B.BeginTime <= E.EndTime
AND B.BeginTime + '00:00:10' > E.EndTime
EndTime is now in both conditions alone on one side of the comparison. Assuming that the BeginTime table has many fewer rows, and the EndTime table has an index on column EndTime, this will perform far, far better than anything using DateDiff(second, B.BeginTime, E.EndTime). It is now sargable, which means there is a valid "search argument"--so as the engine scans the BeginTime table, it can seek into the EndTime table. Careful selection of which column is by itself on one side of the operator is required--it can be worth experimenting with putting BeginTime by itself instead, doing some algebra to switch the second condition to AND B.BeginTime > E.EndTime - '00:00:10' (see the sketch below).
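For reference, here is the rearranged form with BeginTime isolated instead (a sketch using the same hypothetical tables as above):
SELECT InitialSave = DateDiff(second, B.BeginTime, E.EndTime)
FROM
    dbo.BeginTime B
    INNER JOIN dbo.EndTime E
        ON B.BeginTime <= E.EndTime
        AND B.BeginTime > E.EndTime - '00:00:10'
In this form an index on BeginTime becomes the seek target while EndTime is scanned; which arrangement wins depends on the relative row counts and available indexes.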
Precision of DateDiff
I should also point out that DateDiff does not return elapsed time, but instead counts the number of boundaries crossed. If a call to DateDiff using seconds returns 1, this could mean 3 ms elapsed time, or it could mean 1997 ms! This is essentially a precision of +- 1 time unit. For the better precision of +- 1/2 time unit, you would want the following query comparing 0 to EndTime - BegTime:
SELECT DateDiff(second, 0, EndTime - BegTime) AS InitialSave
FROM MyTable
WHERE EndTime <= BegTime + '00:00:10'
This now has a maximum rounding error of only one second total, not two (in effect, a floor() operation). Note that you can only subtract the datetime data type--to subtract a date or a time value you would have to convert to datetime or use other methods to get the better precision (a whole lot of DateAdd, DateDiff and possibly other junk, or perhaps using a higher precision time unit and dividing).
This principle is especially important when counting larger units such as hours, days, or months. A DateDiff of 1 month could be nearly 62 days apart (think July 1, 2013 00:00 - Aug 31, 2013 23:59)!

You can't reference a column alias defined in the SELECT list from the WHERE clause, because the WHERE clause is logically evaluated before the SELECT list.
You can do this however
SELECT InitialSave
FROM (SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
      FROM MyTable) aTable
WHERE InitialSave <= 10
As a sidenote - this essentially moves the DATEDIFF into the WHERE clause in terms of where it's first evaluated. Using functions on columns in WHERE clauses stops indexes from being used efficiently and should be avoided if possible, but if you've got to use DATEDIFF then you've got to do it!
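As an aside, CROSS APPLY offers another way to define the expression exactly once and still reference it in the WHERE clause (a sketch against the question's MyTable; the alias x is made up):
SELECT x.InitialSave
FROM MyTable T
CROSS APPLY (SELECT DATEDIFF(ss, T.BegTime, T.EndTime)) AS x (InitialSave)
WHERE x.InitialSave <= 10
Like the derived table, this is a readability aid rather than a performance fix--the expression is still evaluated per row.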

Beyond making it "work", you need to use an index.
Use a computed column with an index, or a view with an index; otherwise you will table scan. When you get enough rows, you will feel the PAIN of the slow scan!
Computed column & index:
ALTER TABLE MyTable ADD
ComputedDate AS DATEDIFF(ss,BegTime, EndTime)
GO
CREATE NONCLUSTERED INDEX IX_MyTable_ComputedDate ON MyTable
(
ComputedDate
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
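With the computed column and its index in place, the filter from the question can use an index seek instead of a scan (a sketch reusing the question's table and the new column):
SELECT ComputedDate AS InitialSave
FROM MyTable
WHERE ComputedDate <= 10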
Create a view & index (an indexed view must be created WITH SCHEMABINDING, which requires two-part table names, and its first index must be a unique clustered index - here assuming KeyValues uniquely identifies the rows):
CREATE VIEW dbo.YourNewView
WITH SCHEMABINDING
AS
SELECT
KeyValues
,DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM dbo.MyTable
GO
CREATE UNIQUE CLUSTERED INDEX IX_YourNewView
ON dbo.YourNewView (InitialSave, KeyValues)
GO
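Querying the indexed view then looks like this (note that on editions other than Enterprise, the NOEXPAND hint is required for the view's index to actually be used):
SELECT KeyValues, InitialSave
FROM dbo.YourNewView WITH (NOEXPAND)
WHERE InitialSave <= 10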

You have to use the function instead of the column alias - it is the same with count(*), etc. PITA.

As an alternative, you can use computed columns.

Related

Postgres combining date and time fields, is this efficient

I am selecting rows based on a date range, which is held in a string, using the SQL below, which works, but is this an efficient way of doing it? As you can see, the date and the time are held in different fields. From my memory of doing Oracle work, as soon as you put a function around an attribute it can't use indexes.
select *
from events
where venue_id = '2'
and EXTRACT(EPOCH FROM (start_date + start_time))
between EXTRACT(EPOCH FROM ('2017-09-01 00:00')::timestamp)
and EXTRACT(EPOCH FROM ('2017-09-30 00:00')::timestamp)
So is there a way of doing this that can use indexes?
Preface: Since your query is limited to a single venue_id, both examples below create a compound index with venue_id first.
If you want an index for improving that query, you can create an expression index:
CREATE INDEX events_start_idx
ON events (venue_id, (EXTRACT(EPOCH FROM (start_date + start_time))));
(Note that Postgres will only use an expression index like this when a query repeats the exact same expression, as your original query does.)
If you don't want a dedicated function index, you can create a normal index on the start_date column and add extra logic so that index can be used. The index then limits the access path to the date range, and the fringe records with the wrong time of day on the first and last dates are filtered out afterwards.
In the following, I'm also eliminating the unnecessary extraction of epoch.
CREATE INDEX events_venue_start
ON events (venue_id, start_date);
SELECT *
FROM events
WHERE venue_id = '2'
AND start_date BETWEEN '2017-09-01'::date AND '2017-09-30'::date
AND start_date + start_time BETWEEN '2017-09-01 00:00'::timestamp
AND '2017-09-30 00:00'::timestamp
The first two parts of the WHERE clause will use the index to full benefit. The last part is then used to filter the records found by the index.

Calculate the sum of time column in PostgreSQL

Can anyone suggest the easiest way to find the summation of a time field in PostgreSQL? I just found a solution for MySQL but I need the PostgreSQL version.
MYSQL: https://stackoverflow.com/questions/3054943/calculate-sum-time-with-mysql
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(timespent))) FROM myTable;
Demo Data
id time
1 1:23:23
2 4:00:23
3 9:23:23
Desired Output
14:47:09
What you want is not possible. But you probably misunderstood the time type: it represents a precise time-point in a day. It doesn't make much sense to add two (or more) times, e.g. '14:00' + '14:00' = '28:00' (but there is no 28th hour in a day).
What you probably want is interval (which represents time intervals: hours, minutes, or even years). sum() supports interval arguments.
If you use intervals, it's just that simple:
SELECT sum(interval_col) FROM my_table;
However, if you stick to the time type (though you have no reason to do that), you can cast it to interval to calculate with it:
SELECT sum(time_col::interval) FROM my_table;
But again, the result will be interval, because time values cannot exceed the 24th hour in a day.
Note: PostgreSQL will even do the cast for you, so sum(time_col) should work too, but the result is interval in this case too.
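If you want the total rendered as HH:MM:SS text like the desired output, to_char can format the interval (a sketch with the same hypothetical column name; for intervals, the HH24 field can exceed 23, so multi-day totals display correctly):
SELECT to_char(sum(time_col::interval), 'HH24:MI:SS') FROM my_table;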
I tried this solution on SQL Fiddle:
link
Table creation:
CREATE TABLE time_table (
id integer, time time
);
Insert data:
INSERT INTO time_table (id,time) VALUES
(1,'1:23:23'),
(2,'4:00:23'),
(3,'9:23:23')
Query the data:
SELECT
sum(s.time)
FROM
time_table s;
If you need to calculate the sum of some field grouped by another field, you can do this:
select
keyfield,
sum(time_col::interval) totaltime
FROM myTable
GROUP BY keyfield
Output example:
keyfield; totaltime
"Gabriel"; "10:00:00"
"John"; "36:00:00"
"Joseph"; "180:00:00"
Data type of totaltime is interval.

MS SQL Server 2008/2012 Get Min Difference between any two values

Given a table with a single money column, how do I calculate the smallest difference between any two values in that table using TSQL? I'm looking for a performance-optimized solution that will work with millions of rows.
For SQL Server 2012 you could use
;WITH CTE
AS (SELECT YourColumn - Lag(YourColumn) OVER (ORDER BY YourColumn) AS Diff
FROM YourTable)
SELECT
Min(Diff) AS MinDiff
FROM CTE
This does it with one scan of the table (ideally you would have an index on YourColumn to avoid a sort and a narrow index on that single column would reduce IO).
I can't think of a nice way of getting it to short circuit and so do less than one scan of the table if it finds the minimum possible difference of zero. Adding MIN(CASE WHEN Diff = 0 THEN 1/0 END) to the SELECT list and trapping the divide by zero error as a signal that zero was found would probably work but I can't really recommend that approach...
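For SQL Server 2008, where LAG is not available, one workaround is a self-join on ROW_NUMBER (a sketch reusing the same placeholder names; note the CTE is referenced twice, so expect two scans or a spool):
;WITH Numbered
AS (SELECT YourColumn,
           ROW_NUMBER() OVER (ORDER BY YourColumn) AS rn
    FROM YourTable)
SELECT
    Min(n2.YourColumn - n1.YourColumn) AS MinDiff
FROM Numbered n1
    INNER JOIN Numbered n2
        ON n2.rn = n1.rn + 1
This works because in a sorted list the minimum difference between any two values always occurs between adjacent values, so only consecutive pairs need comparing.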

TSQL DateTime to DateKey Int

In Scaling Up Your Data Warehouse with SQL Server 2008 R2, the author recommends using an integer date key in the format of YYYYMMDD as a clustered index on your fact tables to help optimize query speed.
What is the best way to convert your key date field to the Date Key? I feel the following would work, but it is a bit sloppy:
select Replace(CONVERT(varchar,GETDATE(),102),'.','')
Clearly, I'm not using getdate, but rather a date column in the table that I will be using in my aggregations.
First, how would you suggest making this conversion? Is my idea acceptable?
Second, has anyone had much success using the Date Key as a clustered index?
ISO long (112) would do the trick:
SELECT CONVERT(INT, CONVERT(VARCHAR(8), GETDATE(), 112))
Casting getdate() straight to int with style 112 gives 41008 because the style argument is ignored when converting a datetime directly to int (you just get the day count since 1900-01-01, rounded by the time of day), so going via a VARCHAR is needed - I'll update if I think of a faster cast.
EDIT: In regard to the int-only vs varchar debate, here are my findings (repeatable on my test rig & production server): the varchar method uses more CPU time for half a million casts and is a fraction slower overall - negligible unless you're dealing with billions of rows.
EDIT 2: Revised test case to clear the cache and use different dates
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT YEAR([Date]) * 10000 + MONTH([Date]) * 100 + DAY([Date])
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 255ms.
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT CONVERT(INT, CONVERT(VARCHAR(8), [Date], 112))
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 602ms
Converting to strings and back again can be surprisingly slow. Instead, you could deal entirely with integers, like this:
Select Year(GetDate()) * 10000 + Month(GetDate()) * 100 + Day(GetDate())
In my brief testing, this is slightly faster than converting to string and then to int. The Year, Month and Day functions each return an integer, so the performance is slightly better.
Instead of creating a DateKey using the YYYYMMDD format, you could use the DATEDIFF function to get the number of days between 0 (i.e. "the date represented by 0") and the date you're making the DateKey for.
SELECT DATEDIFF(day,0,GETDATE())
The drawback is that you can't easily look at the value and determine the date, but you can use the DATEADD function to calculate the original date (I've also seen this trick used to truncate the time part of a datetime).
SELECT DATEADD(day, 41007, 0)
(Note: 41007 is the result of the DATEDIFF function above when I ran it on 4/10/2012.)
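For reference, the time-truncation trick mentioned above is the common idiom of counting whole days from day 0 and adding them back:
SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)
This returns the current date at midnight.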

TSQL Rolling Average of Time Groupings

This is a follow-up to: TSQL Group by N Seconds. (I got what I asked for, but didn't ask for the right thing.)
How can I get a rolling average of 1 second groups of count(*)?
So I want to return per second counts, but I also want to be able to smooth that out over certain intervals, say 10 seconds.
So one method might be to take the per-second average over every 10 seconds. Can that be done in TSQL?
Ideally, the time field would be returned in Unix Time.
SQL Server is not particularly good at rolling/cumulative queries.
You can use this:
WITH q (unix_ts, cnt) AS
(
SELECT DATEDIFF(s, '1970-01-01', ts), COUNT(*)
FROM record
GROUP BY
DATEDIFF(s, '1970-01-01', ts)
)
SELECT *
FROM q q1
CROSS APPLY
(
SELECT AVG(cnt) AS smooth_cnt
FROM q q2
WHERE q2.unix_ts BETWEEN q1.unix_ts - 5 AND q1.unix_ts + 5
) q2
This, however, may not be very efficient, since it will count the same overlapping intervals over and over again.
For larger intervals, it may be even better to use a CURSOR-based solution that would allow keeping intermediate results (though cursors are normally worse performance-wise than pure set-based solutions).
Oracle and PostgreSQL support this window clause (in PostgreSQL, use date_trunc('second', ts) rather than TRUNC):
WITH q (unix_ts, cnt) AS
(
SELECT TRUNC(ts, 'ss'), COUNT(*)
FROM record
GROUP BY
TRUNC(ts, 'ss')
)
SELECT q.*,
AVG(cnt) OVER (ORDER BY unix_ts RANGE BETWEEN INTERVAL '5' SECOND PRECEDING AND INTERVAL '5' SECOND FOLLOWING)
FROM q
which keeps an internal window buffer and is very efficient.
SQL Server, unfortunately, does not support moving windows.
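For what it's worth, later versions (SQL Server 2012 and up) added ROWS frames, which can approximate the same smoothing (a sketch against the same q CTE; ROWS counts neighboring rows rather than a time range, so it only matches the RANGE version when every second actually has a row):
SELECT unix_ts,
       AVG(cnt) OVER (ORDER BY unix_ts
                      ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING) AS smooth_cnt
FROM q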