TSQL DateTime to DateKey Int

In Scaling Up Your Data Warehouse with SQL Server 2008 R2, the author recommends using an integer date key in the format of YYYYMMDD as a clustered index on your fact tables to help optimize query speed.
What is the best way to convert your key date field to the Date Key? I feel the following would work, but is a bit sloppy:
select Replace(CONVERT(varchar,GETDATE(),102),'.','')
Clearly, I'm not using getdate, but rather a date column in the table that will be used in my aggregations.
First, how would you suggest making this conversion? Is my idea acceptable?
Second, has anyone had much success using the Date Key as a clustered index?

ISO long (112) would do the trick:
SELECT CONVERT(INT, CONVERT(VARCHAR(8), GETDATE(), 112))
Casting GETDATE() straight to INT with style 112 gives 41008, because the style argument is ignored when the target is not a string type: the cast returns the (rounded) number of days since 1900-01-01. Going via a VARCHAR works. I'll update if I think of a faster cast.
EDIT: Regarding the int-only vs. varchar debate, here are my findings (repeatable on my test rig and production server): the varchar method used less CPU time for half a million casts but was a fraction slower overall. The difference is negligible unless you're dealing with billions of rows.
EDIT 2: Revised test case to clear the cache and use different dates:
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT YEAR([Date]) * 10000 + MONTH([Date]) * 100 + DAY([Date])
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 255ms.
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT CONVERT(INT, CONVERT(VARCHAR(8), [Date], 112))
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 602ms

Converting to strings and back again can be surprisingly slow. Instead, you could deal entirely with integers, like this:
Select Year(GetDate()) * 10000 + Month(GetDate()) * 100 + Day(GetDate())
In my brief testing, this is slightly faster than converting to a string and then to an int. The Year, Month and Day functions each return an integer, so no string conversion is involved.
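To see that the two approaches above produce the same key, here is a minimal sketch in Python (not T-SQL) that mirrors both calculations. The function names are made up for illustration, and the snippet says nothing about SQL Server performance; it only checks the arithmetic.

```python
from datetime import date


def date_key_arithmetic(d: date) -> int:
    # Mirrors YEAR(d) * 10000 + MONTH(d) * 100 + DAY(d)
    return d.year * 10000 + d.month * 100 + d.day


def date_key_via_string(d: date) -> int:
    # Mirrors CONVERT(INT, CONVERT(VARCHAR(8), d, 112)): format as yyyymmdd, then parse
    return int(d.strftime("%Y%m%d"))


def date_from_key(key: int) -> date:
    # Reverses the key by peeling off the year, then the month and day
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


sample = date(2012, 4, 10)
print(date_key_arithmetic(sample))  # 20120410
print(date_key_via_string(sample))  # 20120410
```

Because the key is plain base-10 packing, it can be reversed with integer division, which is one reason the YYYYMMDD form stays readable.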

Instead of creating a DateKey using the YYYYMMDD format, you could use the DATEDIFF function to get the number of days between 0 (i.e. "the date represented by 0") and the date you're making the DateKey for.
SELECT DATEDIFF(day,0,GETDATE())
The drawback is that you can't easily look at the value and determine the date, but you can use the DATEADD function to calculate the original date (I've also seen this trick used to truncate the time part of a datetime).
SELECT DATEADD(day, 41007, 0)
(Note: 41007 is the result of the DATEDIFF function above when I ran it on 4/10/2012.)
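The day-number idea can be checked outside the database. The following Python sketch assumes that "the date represented by 0" is 1900-01-01, which is the base date of SQL Server's datetime type; the helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: date 0 corresponds to 1900-01-01
BASE = date(1900, 1, 1)


def day_number(d: date) -> int:
    # Mirrors DATEDIFF(day, 0, d)
    return (d - BASE).days


def from_day_number(n: int) -> date:
    # Mirrors DATEADD(day, n, 0)
    return BASE + timedelta(days=n)


print(day_number(date(2012, 4, 10)))  # 41007
print(from_day_number(41007))         # 2012-04-10
```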

Related

Calculate the sum of time column in PostgreSql

Can anyone suggest the easiest way to sum a time field in PostgreSQL? I only found a solution for MySQL, but I need the PostgreSQL version.
MYSQL: https://stackoverflow.com/questions/3054943/calculate-sum-time-with-mysql
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(timespent))) FROM myTable;
Demo Data
id time
1 1:23:23
2 4:00:23
3 9:23:23
Desired Output
14:47:09
What you want is not possible as stated. You have probably misunderstood the time type: it represents a precise point in time within a day. It doesn't make much sense to add two (or more) times, e.g. '14:00' + '14:00' = '28:00' (but there is no 28th hour in a day).
What you probably want, is interval (which represents time intervals; hours, minutes, or even years). sum() supports interval arguments.
If you use intervals, it's just that simple:
SELECT sum(interval_col) FROM my_table;
Although, if you stick to the time type (but you have no reason to do that), you can cast it to interval to calculate with it:
SELECT sum(time_col::interval) FROM my_table;
But again, the result will be interval, because time values cannot exceed the 24th hour in a day.
Note: PostgreSQL will even do the cast for you, so sum(time_col) should work too, but the result is interval in this case too.
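The distinction between a time of day and a duration is not specific to PostgreSQL. As a rough illustration, this Python sketch treats the question's demo values as durations (the analogue of interval) and sums them; parse_duration is a hypothetical helper written for this example.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    # Interpret 'H:MM:SS' as a length of time, not as a time of day
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


rows = ["1:23:23", "4:00:23", "9:23:23"]

# sum() needs a timedelta start value; the default start of 0 is an int
total = sum((parse_duration(r) for r in rows), timedelta())
print(total)  # 14:47:09
```

The total matches the desired output from the question, and a duration can keep growing past 24 hours, which a time-of-day value cannot.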
I tried this solution on SQL Fiddle.
Table creation:
CREATE TABLE time_table (
id integer, time time
);
Insert data:
INSERT INTO time_table (id,time) VALUES
(1,'1:23:23'),
(2,'4:00:23'),
(3,'9:23:23');
Query the data:
SELECT
sum(s.time)
FROM
time_table s;
If you need to sum one field grouped by another field, you can do this:
select
keyfield,
sum(time_col::interval) totaltime
FROM myTable
GROUP by keyfield
Output example:
keyfield; totaltime
"Gabriel"; "10:00:00"
"John"; "36:00:00"
"Joseph"; "180:00:00"
Data type of totaltime is interval.

T-sql issue with converting bigint to datetime

I have bigint value 635107999009730000.
I'm using this statement to convert this bigint to datetime:
select dateadd(second, 635107999009730000 /1000 + 635107999009730000 % 1000 + 8*60*60, '19700101')
I'm getting overflow error. Looks like dateadd function just cannot handle this bigint value.
How can I convert 635107999009730000 to datetime?
635107999009730000 value is grabbed from Microsoft LYNC 2013 database and I don't really know what datetime this should be.
I may be off the mark here but that value looks like nanoseconds and if that is the case you just have to divide it by a billion to get the seconds and add it to the unix time:
select dateadd(second, 635107999009730000 / 1000000000, '19700101')
You will have to test this against your data.
I ran into the same issue today. Both user1016945 and Suroh are correct to a degree. The value is in ticks, at 10^7 ticks per second, starting from 0001-01-01. So I slightly modified Suroh's statement. Since I'm no SQL expert, it's clumsy, but it works. You will lose the seconds, though: it overflows if I try to include them.
dateadd(year, -2000, dateadd(minute, 635107999009730000 / 600000000, '2001-01-01'))
I subtract 2000 years because I don't know how to pass 0001-01-01 to SQL. :)
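To sanity-check the tick interpretation, here is a small Python sketch. It assumes, as the answer above states, that the value counts 100-nanosecond ticks from 0001-01-01; it applies no time zone offset, and ticks_to_datetime is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Assumption from the answer above: 10^7 ticks per second, counted from 0001-01-01
TICKS_PER_SECOND = 10_000_000


def ticks_to_datetime(ticks: int) -> datetime:
    # Split into whole seconds and leftover ticks to stay in integer arithmetic
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    microseconds = remainder // 10  # 10 ticks per microsecond
    return datetime(1, 1, 1) + timedelta(seconds=seconds, microseconds=microseconds)


print(ticks_to_datetime(635107999009730000))  # 2013-07-30 16:51:40.973000
```

The result lands in mid-2013, which is plausible for a Lync 2013 database and agrees to the minute with the DATEADD workaround above.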

Postgresql. Dates interval issue

I'm trying to get difference in days, casting result to decimal:
SELECT
CAST( TO_DATE('2999-01-01','yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd') AS DECIMAL )
;
Now if I add 1 month to the 2nd date:
SELECT
CAST( TO_DATE('2999-01-01','yyyy-mm-dd') - (TO_DATE('2909-01-01','yyyy-mm-dd') + INTERVAL '1 MONTH' * (1) ) AS DECIMAL )
;
I'm getting an error:
ERROR: cannot cast type interval to numeric
OK, I can cast to char to get result:
SELECT
CAST( TO_CHAR( TO_DATE('2909-02-10','yyyy-mm-dd') - (TO_DATE('2909-01-01','yyyy-mm-dd') + INTERVAL '1 MONTH' * (1) ), 'DD') AS DECIMAL )
;
But in this case the 1st query, modified to use the TO_CHAR cast, stops working:
SELECT
CAST( TO_CHAR(TO_DATE('2999-01-01','yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd'), 'DD') AS DECIMAL )
;
I'm getting ERROR: multiple decimal points.
So, my question is, how can I get days using the same sql statement? For both sql queries.
Look at your first two examples again. If you remove the outer CAST ... AS DECIMAL you get
?column?
----------
32872
?column?
------------
32841 days
Clearly the difference is in the "days". The second is an interval value rather than a simple number. You only want the number (because you always just want days) so you need to extract that part. Then you can cast to whatever precision you like:
SELECT extract(days FROM '32841 days'::interval)::numeric(9,2);
date_part
-----------
32841.00
Edit responding to Alexandr's follow-up:
Your first example fails with a fairly specific error:
SELECT extract(days FROM (TO_DATE('2999-01-01','yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd'))::interval)::numeric(9,2);
ERROR: cannot cast type integer to interval
LINE 1: ...yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd'))::interval...
Here you've got an integer (which is what you originally wanted) and try to cast it to an interval (for reasons I don't understand). It's complaining it doesn't know what units you want. You want 32872 what in your interval - seconds, hours, weeks, centuries?
The second example is complaining because you are trying to extract the "day" part from a simple integer, and of course there's no extract() function in the system to do that.
I think you probably need to take a step back and just take the time to understand the values your various expressions return.
Subtracting one date from another gives the number of days separating them - as an integer. There is no other sensible measure, really.
Adding (or subtracting) an interval to a date gives you a timestamp (without time zone) since the interval may contain whole days, days and hours, seconds etc.
Subtracting a timestamp from a date will give you an interval since the result may contain days, hours, seconds etc.
If you have an interval and you just want the days part then you use extract() on it and you will get an integer number of days back.
You will need an integer (or floating-point) number of days if you want to cast to numeric, not an interval, because casting an interval to a scalar number makes no sense without units.
So - either stick to dates and date arithmetic (easy), or realise you are using timestamps (flexible) but understand which it is.
To get an illustration of what's happening you can do something like this (in psql):
CREATE TEMP TABLE tt AS SELECT
('2909-01-02'::date - '2909-01-01'::date) AS a,
('2909-01-02'::date - '2909-01-02 00:00:00'::timestamp) AS b;
\x
SELECT * FROM tt;
\d tt
That will show you the values and types you are dealing with. Repeat for as many columns as you find useful.
HTH
If you're doing interval arithmetic with dates, you should generally be using timestamps instead, as mentioned in the docs.
# SELECT extract(days FROM TO_TIMESTAMP('2999-01-01','yyyy-mm-dd') - TO_TIMESTAMP('2909-01-01','yyyy-mm-dd'))
date_part
-----------
32872
# SELECT extract(days FROM TO_TIMESTAMP('2999-01-01','yyyy-mm-dd') - (TO_TIMESTAMP('2909-01-01','yyyy-mm-dd') + '1 month'::interval) );
date_part
-----------
32841
The result of adding an interval to a date is actually a timestamp, not another date (the interval might have contained time portions), so you have to cast the result of the addition back down to date first:
SELECT
CAST( TO_DATE('2999-01-01','yyyy-mm-dd')
- CAST( (TO_DATE('2909-01-01','yyyy-mm-dd') + INTERVAL '1 MONTH' * (1) ) AS DATE)
AS DECIMAL )
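The two day counts quoted in the answers above can be cross-checked outside the database. This Python sketch only verifies the numbers; Python's date subtraction returns a duration object rather than a plain integer, so the day count is read from its .days attribute, and the "plus one month" date is written out by hand.

```python
from datetime import date

end = date(2999, 1, 1)
start = date(2909, 1, 1)
# 2909-01-01 plus one month, written out by hand for this check
start_plus_month = date(2909, 2, 1)

print((end - start).days)             # 32872
print((end - start_plus_month).days)  # 32841
```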

How to truncate seconds in TSQL?

I have time, select cast(SYSDATETIME() AS time)
14:59:09.2834595
What is the way to truncate seconds?
14:59
Description
You can use the T-SQL function convert.
Sample
PRINT convert(varchar(5), SYSDATETIME(), 108)
will give you hh:mm
More Information
MSDN - CAST and CONVERT
If you want to truncate seconds and still have a T-SQL datetime value, first compute the number of whole minutes between date 0 and your value, then add those minutes back to 0. This approach doesn't require any string parsing or converting, and it works for truncating other parts too: just change MINUTE.
Example:
SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, '2016-01-01 23:22:56.997'), 0)
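The same "count whole units from a base, then add them back" idea can be sketched in Python to show why it truncates rather than rounds. The base of 1900-01-01 is an assumption (what date 0 means for SQL Server's datetime), and truncate_to_minute is a hypothetical name.

```python
from datetime import datetime, timedelta

# Assumption: date 0 corresponds to 1900-01-01 00:00:00
BASE = datetime(1900, 1, 1)


def truncate_to_minute(value: datetime) -> datetime:
    # Whole minutes since the base (floor division), like DATEDIFF(MINUTE, 0, value)
    whole_minutes = (value - BASE) // timedelta(minutes=1)
    # Add them back to the base, like DATEADD(MINUTE, whole_minutes, 0)
    return BASE + timedelta(minutes=whole_minutes)


print(truncate_to_minute(datetime(2016, 1, 1, 23, 22, 56, 997000)))  # 2016-01-01 23:22:00
```

Swapping timedelta(minutes=1) for timedelta(hours=1) truncates to the hour, which parallels changing MINUTE in the T-SQL version.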
If you need to drop the seconds entirely, you can use the DATEPART() function (SQL Server) to extract the hour and minute, then concatenate them back together. DATEPART() returns an int, so each part has to be cast to a string before concatenating. (I like dknaack's solution more, if that works.)
SELECT CAST(CAST(DATEPART(hour, SYSDATETIME()) AS varchar(2)) + ':' + CAST(DATEPART(minute, SYSDATETIME()) AS varchar(2)) AS DATETIME)
select cast(left(cast(SYSDATETIME() AS time), 5) as time)

Using DATEDIFF in T-SQL

I am using DATEDIFF in an SQL statement. I am selecting it, and I need to use it in WHERE clause as well. This statement does not work...
SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable
WHERE InitialSave <= 10
It gives the message: Invalid column name "InitialSave"
But this statement works fine...
SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable
WHERE DATEDIFF(ss, BegTime, EndTime) <= 10
The programmer in me says that this is inefficient (seems like I am calling the function twice).
So two questions. Why doesn't the first statement work? Is it inefficient to do it using the second statement?
Note: When I originally wrote this answer I said that an index on one of the columns could create a query that performs better than other answers (and mentioned Dan Fuller's). However, I was not thinking 100% correctly. The fact is, without a computed column or indexed (materialized) view, a full table scan is going to be required, because the two date columns being compared are from the same table!
I believe there is still value in the information below, namely 1) the possibility of improved performance in the right situation, as when the comparison is between columns from different tables, and 2) promoting the habit in SQL developers of following best practice and reshaping their thinking in the right direction.
Making Conditions Sargable
The best practice I'm referring to is one of moving one column to be alone on one side of the comparison operator, like so:
SELECT InitialSave = DateDiff(second, T.BegTime, T.EndTime)
FROM dbo.MyTable T
WHERE T.EndTime <= T.BegTime + '00:00:10'
As I said, this will not avoid a scan on a single table, however, in a situation like this it could make a huge difference:
SELECT InitialSave = DateDiff(second, B.BeginTime, E.EndTime)
FROM
dbo.BeginTime B
INNER JOIN dbo.EndTime E
ON B.BeginTime <= E.EndTime
AND B.BeginTime + '00:00:10' > E.EndTime
EndTime is in both conditions now alone on one side of the comparison. Assuming that the BeginTime table has many fewer rows, and the EndTime table has an index on column EndTime, this will perform far, far better than anything using DateDiff(second, B.BeginTime, E.EndTime). It is now sargable, which means there is a valid "search argument"--so as the engine scans the BeginTime table, it can seek into the EndTime table. Careful selection of which column is by itself on one side of the operator is required--it can be worth experimenting by putting BeginTime by itself by doing some algebra to switch to AND B.BeginTime > E.EndTime - '00:00:10'
Precision of DateDiff
I should also point out that DateDiff does not return elapsed time, but instead counts the number of boundaries crossed. If a call to DateDiff using seconds returns 1, this could mean 3 ms elapsed time, or it could mean 1997 ms! This is essentially a precision of +- 1 time units. For the better precision of +- 1/2 time unit, you would want the following query comparing 0 to EndTime - BegTime:
SELECT DateDiff(second, 0, EndTime - BegTime) AS InitialSave
FROM MyTable
WHERE EndTime <= BegTime + '00:00:10'
This now has a maximum rounding error of only one second total, not two (in effect, a floor() operation). Note that you can only subtract the datetime data type--to subtract a date or a time value you would have to convert to datetime or use other methods to get the better precision (a whole lot of DateAdd, DateDiff and possibly other junk, or perhaps using a higher precision time unit and dividing).
This principle is especially important when counting larger units such as hours, days, or months. A DateDiff of 1 month could mean nearly 62 days apart (think early on July 1, 2013 to late on Aug 31, 2013)!
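The difference between counting boundaries and measuring elapsed time can be illustrated with a short Python sketch. It is a model of the behaviour described above, not SQL Server's implementation, and both function names are invented for this example.

```python
from datetime import datetime, timedelta


def second_boundaries_crossed(start: datetime, end: datetime) -> int:
    # Drop the fractional part of each value first, then count the seconds between them
    start_floor = start.replace(microsecond=0)
    end_floor = end.replace(microsecond=0)
    return (end_floor - start_floor) // timedelta(seconds=1)


def whole_seconds_elapsed(start: datetime, end: datetime) -> int:
    # Subtract first, then floor: the analogue of DateDiff(second, 0, EndTime - BegTime)
    return (end - start) // timedelta(seconds=1)


# 3 ms apart, but a second boundary lies between the two values
a_start = datetime(2013, 7, 1, 12, 0, 0, 997000)
a_end = datetime(2013, 7, 1, 12, 0, 1, 0)

# 1997 ms apart, and still only one second boundary is crossed
b_start = datetime(2013, 7, 1, 12, 0, 0, 0)
b_end = datetime(2013, 7, 1, 12, 0, 1, 997000)

print(second_boundaries_crossed(a_start, a_end), whole_seconds_elapsed(a_start, a_end))  # 1 0
print(second_boundaries_crossed(b_start, b_end), whole_seconds_elapsed(b_start, b_end))  # 1 1
```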
You can't reference column aliases defined in the SELECT list from the WHERE clause, because they aren't generated until after the WHERE has executed.
You can do this, however:
select InitialSave from
(SELECT DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM MyTable) aTable
WHERE InitialSave <= 10
As a side note, this essentially moves the DATEDIFF into the WHERE clause in terms of where it's first defined. Using functions on columns in WHERE clauses prevents indexes from being used as efficiently and should be avoided if possible. However, if you've got to use DATEDIFF, then you've got to do it!
Beyond making it "work", you need to use an index.
Use a computed column with an index, or a view with an index; otherwise you will table scan. When you get enough rows, you will feel the PAIN of the slow scan!
Computed column and index:
ALTER TABLE MyTable ADD
ComputedDate AS DATEDIFF(ss,BegTime, EndTime)
GO
CREATE NONCLUSTERED INDEX IX_MyTable_ComputedDate ON MyTable
(
ComputedDate
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
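Create a view and index (an indexed view needs SCHEMABINDING, two-part table names, and a unique clustered index as its first index; this assumes the table is in the dbo schema and KeyValues is unique):
CREATE VIEW dbo.YourNewView
WITH SCHEMABINDING
AS
SELECT
KeyValues
,DATEDIFF(ss, BegTime, EndTime) AS InitialSave
FROM dbo.MyTable
GO
CREATE UNIQUE CLUSTERED INDEX IX_YourNewView
ON dbo.YourNewView(KeyValues)
GO
CREATE NONCLUSTERED INDEX IX_YourNewView_InitialSave
ON dbo.YourNewView(InitialSave)
GO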
You have to use the function instead of the column alias. It is the same with count(*), etc. PITA.
As an alternative, you can use computed columns.