Get days difference between two datetimes in SparkSQL - date

I want to get the integer number of days between two datetimes in SparkSQL, but DATEDIFF ignores the time part and returns a different result than I expected.
For example, the query below returns 9, but I expected 8:
SELECT DATEDIFF(CAST('2021-07-10 02:26:16' AS TIMESTAMP), CAST('2021-07-01 19:10:28' AS TIMESTAMP))
I can get the expected result by casting each datetime to LONG, subtracting to get the difference in seconds, dividing by the number of seconds in a day, and casting the result to INTEGER:
SELECT CAST((CAST(CAST('2021-07-10 02:26:16' AS TIMESTAMP) AS LONG) - CAST(CAST('2021-07-01 19:10:28' AS TIMESTAMP) AS LONG))/(60*60*24) AS INTEGER)
Is there a more recommended way of doing this, such as a built-in SparkSQL function?
Thanks in advance.

I would recommend using the extract SQL function and applying it to the interval (the difference of the two timestamps). The documentation describes it as:
Extracts a part of the date/timestamp or interval source
Note: the extract function is available in Spark from version 3.x onward.
See the example below:
WITH input AS (
select TIMESTAMP'2021-07-10 02:26:16' t2,
TIMESTAMP'2021-07-01 19:10:28' t1
)
SELECT
datediff(t2, t1) `datediff`,
extract(day FROM t2-t1) `extract`
FROM input
returns
datediff | extract
---------+---------
9        | 8
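To see why the two columns differ, the arithmetic can be reproduced outside Spark. The following is only a sketch in Python's standard library, not Spark code: it assumes that DATEDIFF compares calendar dates, while extracting the day from the interval counts complete 24-hour periods.

```python
from datetime import datetime

t1 = datetime(2021, 7, 1, 19, 10, 28)
t2 = datetime(2021, 7, 10, 2, 26, 16)

# Calendar-date difference: the time of day is dropped before subtracting,
# which mirrors what DATEDIFF reports.
calendar_days = (t2.date() - t1.date()).days

# Elapsed whole days: subtract the full timestamps, then keep the day part,
# which mirrors extract(day FROM t2 - t1).
elapsed = t2 - t1
whole_days = elapsed.days

print(calendar_days)  # 9
print(whole_days)     # 8
print(elapsed)        # 8 days, 7:15:48
```

The two timestamps are 8 days and a little over 7 hours apart, but they fall on calendar dates that are 9 days apart, which is the discrepancy described in the question.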

Related

Find days between dates

I am looking to subtract 2 dates to get the number of days in between. However, the columns are defined as "timestamp without time zone". I'm not sure how to get a whole integer.
I have a stored procedure with this code:
v_days_between_changes := DATE_PATH('day',CURRENT_DATE - v_prev_rec.date_updated)::integer;
But I get this error:
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
QUERY: SELECT DATE_PATH('day',CURRENT_DATE - v_prev_rec.date_updated)::integer
Any help would be great.
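Note that the function is spelled DATE_PART; DATE_PATH does not exist, which is why no matching function is found. Alternatively, you can compute the difference between the dates, which returns an interval, and then extract the number of days from that interval.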
WITH src(dt1,dt2) AS (select '1/1/2019'::timestamp without time zone, CURRENT_DATE )
select EXTRACT(day FROM dt2-dt1) nbdays
from src;
nbdays
-----------
98

Convert integer to week interval

How can one convert an integer to a week interval?
CREATE TABLE integers( i integer);
INSERT INTO integers VALUES ('10');
The output would be a table with one column containing a 10-week interval.
http://sqlfiddle.com/#!17/4b404/5/0
One approach is to create a constant interval of 1 week and multiply it by the integer.
I would prefer a function that does this directly, but I am not aware of one.
SELECT interval '1 week' * i AS weeks_interval FROM integers;
Your solution is perfectly acceptable.
If you don't want to keep the "1" in the string, you could write this instead:
SELECT (i || ' week')::interval FROM integers;
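The "multiply a one-week unit by the integer" idea is not specific to PostgreSQL. As a rough illustration only, here is the same pattern in Python; the helper name `weeks_interval` is made up for this sketch.

```python
from datetime import timedelta


def weeks_interval(i: int) -> timedelta:
    """Scale a one-week unit by an integer, like interval '1 week' * i."""
    return timedelta(weeks=1) * i


print(weeks_interval(10))  # 70 days, 0:00:00
```

Keeping the unit as a typed constant and multiplying avoids building the interval from a string, which is the trade-off between the two SQL variants above.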

not able to perform comparison between two TIMESTAMP columns in apache beam

I'm trying to compare two columns that are declared as TIMESTAMP, as shown below:
select a.*, ROUND(SUM(b.CausalValue),2) as GRPs from table1 a
left join table2 b
on a.Channel = b.Outlet and
a.SubBrand = b.SubBrand and
a.Event = b.SalesComponent and
b.Week >= a.PeriodStartDate and
b.Week <= a.PeriodEndDate
group by Vehicle,Campaign,Copy,Event,CatLib
Week, PeriodStartDate and PeriodEndDate are declared as TIMESTAMP, and I'm not able to perform this operation.
My understanding so far is that Beam SQL may not allow comparison between two TIMESTAMP columns.
Any ideas?
Correct, comparison of date, time, timestamp and interval types is not implemented yet.
@Anton, @Andrew: I agree that this feature is not implemented yet. As a workaround, I replaced the timestamp with the Date datatype, converted the date column to BigInt (using the SQL extract and concat functions), and then did the comparison on the integers. It worked perfectly fine.
Thanks a lot, guys.
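The workaround relies on encoding each date as an integer whose numeric order matches chronological order. The poster did not show the exact encoding, so the following Python sketch assumes a yyyymmdd layout; `date_key` and the sample dates are hypothetical.

```python
from datetime import date


def date_key(d: date) -> int:
    """Encode a date as the integer yyyymmdd (assumed layout, not from the post)."""
    return d.year * 10000 + d.month * 100 + d.day


# Hypothetical sample values standing in for Week, PeriodStartDate, PeriodEndDate.
week = date(2018, 3, 12)
period_start = date(2018, 3, 1)
period_end = date(2018, 3, 31)

# Plain integer comparison now behaves like a date range check.
in_period = date_key(period_start) <= date_key(week) <= date_key(period_end)
print(date_key(week), in_period)  # 20180312 True
```

Because year, month and day occupy fixed decimal positions, comparing the integers gives the same result as comparing the dates. This only covers the date part; any time of day is lost.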

date and time concatenation in postgres

I want to get only the hour of the time and concatenate it with the date.
Here's my query:
SELECT distinct TOTALIZER_METER_READINGS.date + to_char(TOTALIZER_METER_READINGS.time ,'HH')
FROM TOTALIZER_METER_READINGS
Is there any other way to get the hour of the time without turning it into text?
+ is the operator to add numbers, dates or intervals.
The string concatenation operator in SQL is ||.
As you are storing date and time in two columns rather than using a single timestamp column, I would convert them to a single timestamp value, then apply to_char() on the "complete" timestamp.
Adding a time to a date returns a timestamp that can then be formatted as you want:
SELECT distinct to_char(TOTALIZER_METER_READINGS.date + TOTALIZER_METER_READINGS.time, 'yyyy-mm-dd HH')
FROM TOTALIZER_METER_READINGS
You can use EXTRACT (or the date_part function):
SELECT EXTRACT(hour FROM current_timestamp);
The result type is double precision (since PostgreSQL 14, EXTRACT returns numeric, while date_part still returns double precision).
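Both suggestions, combining the date and time first and then either formatting or extracting the hour as a number, can be sketched outside the database. This Python example uses hypothetical sample values and is only an analogy for the SQL above.

```python
from datetime import date, datetime, time

# Hypothetical sample values standing in for the date and time columns.
reading_date = date(2021, 7, 1)
reading_time = time(19, 10, 28)

# Combine the two parts into one timestamp, like date + time in PostgreSQL.
stamp = datetime.combine(reading_date, reading_time)

# Format date plus hour as text (analogous to to_char on the timestamp).
label = stamp.strftime("%Y-%m-%d %H")

# Or read the hour as a number, with no text conversion (analogous to EXTRACT).
hour = stamp.hour

print(label)  # 2021-07-01 19
print(hour)   # 19
```

Note that `%H` is a 24-hour value. In PostgreSQL's to_char, `HH` is a 12-hour value and `HH24` is the 24-hour equivalent, so the format string may need adjusting depending on which one is wanted.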

Postgresql. Dates interval issue

I'm trying to get the difference in days, casting the result to decimal:
SELECT
CAST( TO_DATE('2999-01-01','yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd') AS DECIMAL )
;
Now if I add 1 month to the 2nd date:
SELECT
CAST( TO_DATE('2999-01-01','yyyy-mm-dd') - (TO_DATE('2909-01-01','yyyy-mm-dd') + INTERVAL '1 MONTH' * (1) ) AS DECIMAL )
;
I'm getting an error:
ERROR: cannot cast type interval to numeric
OK, I can convert to char to get a result:
SELECT
CAST( TO_CHAR( TO_DATE('2909-02-10','yyyy-mm-dd') - (TO_DATE('2909-01-01','yyyy-mm-dd') + INTERVAL '1 MONTH' * (1) ), 'DD') AS DECIMAL )
;
But then the first query, modified to use TO_CHAR in the same way, stops working:
SELECT
CAST( TO_CHAR(TO_DATE('2999-01-01','yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd'), 'DD') AS DECIMAL )
;
I'm getting: ERROR: multiple decimal points.
So, my question is: how can I get the number of days using the same form of SQL statement for both queries?
Look at your first two examples again. If you remove the outer CAST ... AS DECIMAL you get
?column?
----------
32872
?column?
------------
32841 days
Clearly the difference is in the "days". The second is an interval value rather than a simple number. You only want the number (because you always just want days) so you need to extract that part. Then you can cast to whatever precision you like:
SELECT extract(days FROM '32841 days'::interval)::numeric(9,2);
date_part
-----------
32841.00
Edit responding to Alexandr's follow-up:
Your first example fails with a fairly specific error:
SELECT extract(days FROM (TO_DATE('2999-01-01','yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd'))::interval)::numeric(9,2);
ERROR: cannot cast type integer to interval
LINE 1: ...yyyy-mm-dd') - TO_DATE('2909-01-01','yyyy-mm-dd'))::interval...
Here you've got an integer (which is what you originally wanted) and try to cast it to an interval (for reasons I don't understand). It's complaining it doesn't know what units you want. You want 32872 what in your interval - seconds, hours, weeks, centuries?
The second example is complaining because you are trying to extract the "day" part from a simple integer, and of course there's no extract() function in the system to do that.
I think you probably need to take a step back and just take the time to understand the values your various expressions return.
Subtracting one date from another gives the number of days separating them - as an integer. There is no other sensible measure, really.
Adding an interval to (or subtracting one from) a date gives you a timestamp (without time zone) since the interval may contain whole days, days and hours, seconds etc.
Subtracting a timestamp from a date will give you an interval since the result may contain days, hours, seconds etc.
If you have an interval and you just want the days part then you use extract() on it and you will get an integer number of days back.
You will need an integer (or floating-point) number of days if you want to cast to numeric, not an interval, because casting an interval to a scalar number makes no sense without units.
So - either stick to dates and date arithmetic (easy), or realise you are using timestamps (flexible) but understand which it is.
To get an illustration of what's happening you can do something like this (in psql):
CREATE TEMP TABLE tt AS SELECT
('2909-01-02'::date - '2909-01-01'::date) AS a,
('2909-01-02'::date - '2909-01-02 00:00:00'::timestamp) AS b;
\x
SELECT * FROM tt;
\d tt
That will show you the values and types you are dealing with. Repeat for as many columns as you find useful.
HTH
If you're doing interval arithmetic with dates, you should generally be using timestamps instead, as mentioned in the docs.
SELECT extract(days FROM TO_TIMESTAMP('2999-01-01','yyyy-mm-dd') - TO_TIMESTAMP('2909-01-01','yyyy-mm-dd'));
date_part
-----------
32872
SELECT extract(days FROM TO_TIMESTAMP('2999-01-01','yyyy-mm-dd') - (TO_TIMESTAMP('2909-01-01','yyyy-mm-dd') + '1 month'::interval) );
date_part
-----------
32841
The result of adding an interval to a date is actually a timestamp, not another date (the interval might have contained time portions), so you have to cast the result of the addition back down to date first:
SELECT
CAST( TO_DATE('2999-01-01','yyyy-mm-dd')
- CAST( (TO_DATE('2909-01-01','yyyy-mm-dd') + INTERVAL '1 MONTH' * (1) ) AS DATE)
AS DECIMAL )
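The two day counts quoted in this thread (32872 and 32841) can be cross-checked with plain date arithmetic. The sketch below uses Python rather than PostgreSQL, and writes the "plus one month" date out by hand as 2909-02-01 instead of computing it.

```python
from datetime import date

end = date(2999, 1, 1)
start = date(2909, 1, 1)
start_plus_month = date(2909, 2, 1)  # 2909-01-01 plus one month, written by hand

# Subtracting two dates yields a whole number of days in both cases.
days_full = (end - start).days
days_after_month = (end - start_plus_month).days

print(days_full)         # 32872
print(days_after_month)  # 32841
```

The results differ by 31, the length of January, which matches the outputs shown above once both operands are plain dates.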