Get Impala equivalent of MONTHS_BETWEEN() in Snowflake - datediff

I am facing a problem while migrating queries from Impala to Snowflake :
Impala
SELECT period
, now() as dt_today
, MONTHS_BETWEEN(now(), period) as mb
FROM my_table
yields
period dt_today mb
--------------------------------------------------------------------------
2018-10-30T21:43:57Z 2020-02-21 10:21:12.827383000 15.709677419354838
Snowflake
SELECT period
, CURRENT_TIMESTAMP() as dt_today
, DATEDIFF('month', period, CURRENT_TIMESTAMP()) as mb
FROM my_table
yields
period dt_today mb
--------------------------------------------------------------------------
2018-10-30T21:43:57Z 2020-02-21 10:21:12.827383000 16
From the Snowflake documentation I understand that when the date part is month, DATEDIFF only uses "the month and year from the date", so the result is less precise than Impala's.
I tried a proxy: take the month difference and add a fractional part computed from the days, but I still get the wrong number of months:
DATEDIFF('month', period, CURRENT_TIMESTAMP()) + (GREATEST(DAY(period), DAY(CURRENT_TIMESTAMP())) - LEAST(DAY(period), DAY(CURRENT_TIMESTAMP()))) / 31
I also tried the following to be more precise, but it still isn't right:
DATEDIFF('day', period, CURRENT_TIMESTAMP())/31 + (GREATEST(DAY(period), DAY(CURRENT_TIMESTAMP())) - LEAST(DAY(period), DAY(CURRENT_TIMESTAMP()))) / 31
Question: how can I exactly reproduce Impala's MONTHS_BETWEEN() in Snowflake?

TL;DR
IFF(DAY(DATE1) >= DAY(DATE2), DATEDIFF('month', DATE2, DATE1), DATEDIFF('month', DATE2, DATE1) - 1)
+
IFF(DAY(DATE1) >= DAY(DATE2), (GREATEST(DAY(DATE1), DAY(DATE2)) - LEAST(DAY(DATE1), day(DATE2))) / 31, 1 - (GREATEST(DAY(DATE1), DAY(DATE2)) - LEAST(DAY(DATE1), DAY(DATE2))) / 31)
Impala's MONTHS_BETWEEN(DATE1, DATE2) function works as follows:
MONTHS_BETWEEN('2019-04-13', '2019-02-10') yields 2.0967 (2 full months + 3/31 = 0.0967)
MONTHS_BETWEEN('2019-04-13', '2019-02-20') yields 1.7741 (1 full month + 1 - 7/31 = 0.7741)
Snowflake's DATEDIFF('month', DATE2, DATE1), on the other hand, simply subtracts the month numbers:
DATEDIFF('month', '2019-02-10', '2019-04-13') yields 2 (04 - 02)
DATEDIFF('month', '2019-02-20', '2019-04-13') yields 2 (04 - 02)
In order to get the integer part of Impala's MONTHS_BETWEEN using Snowflake functions we apply the following logic :
IFF(DAY(DATE1) >= DAY(DATE2), DATEDIFF('month', DATE2, DATE1), DATEDIFF('month', DATE2, DATE1) - 1)
In order to get the fractional part of Impala's MONTHS_BETWEEN using Snowflake functions we apply the following logic :
IFF(DAY(DATE1) >= DAY(DATE2), (GREATEST(DAY(DATE1), DAY(DATE2)) - LEAST(DAY(DATE1), day(DATE2))) / 31, 1 - (GREATEST(DAY(DATE1), DAY(DATE2)) - LEAST(DAY(DATE1), DAY(DATE2))) / 31)
We simply add them together to get Impala's exact value :
IFF(DAY(DATE1) >= DAY(DATE2), DATEDIFF('month', DATE2, DATE1), DATEDIFF('month', DATE2, DATE1) - 1)
+
IFF(DAY(DATE1) >= DAY(DATE2), (GREATEST(DAY(DATE1), DAY(DATE2)) - LEAST(DAY(DATE1), day(DATE2))) / 31, 1 - (GREATEST(DAY(DATE1), DAY(DATE2)) - LEAST(DAY(DATE1), DAY(DATE2))) / 31)
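To sanity-check the logic outside the database, here is a minimal Python sketch of the same two-part calculation. The function name months_between is my own, it assumes DATE1 is the later date, and it ignores the time of day and any special handling Impala may apply to month-end dates, so treat it as an illustration of the formula above rather than a full reimplementation.

```python
from datetime import date


def months_between(date1: date, date2: date) -> float:
    """Sketch of the formula above; assumes date1 >= date2 and ignores time of day."""
    # Same as DATEDIFF('month', date2, date1): plain calendar-month difference.
    month_diff = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if date1.day >= date2.day:
        # Last month is complete: keep month_diff and add the extra days over 31.
        return month_diff + (date1.day - date2.day) / 31
    # Last month is incomplete: drop one month and add the remaining fraction.
    return (month_diff - 1) + 1 - (date2.day - date1.day) / 31


# The row from the question: period = 2018-10-30, today = 2020-02-21.
question_value = months_between(date(2020, 2, 21), date(2018, 10, 30))
```

Note that both branches reduce algebraically to month_diff + (day1 - day2) / 31, so in Snowflake the expression could probably be shortened to DATEDIFF('month', DATE2, DATE1) + (DAY(DATE1) - DAY(DATE2)) / 31.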

How precise does your difference need to be?
Because my first move would be to subtract the two dates to get the difference in days (or use DATEDIFF with the day part) and go from there: divide by 30 or 31 if an approximation will do, or use a more complex calculation if an exact result is needed.
Your solution is not right because DATEDIFF counts the boundaries crossed, so even consecutive days can differ by 1: 31 December and 1 January give 1 in DATEDIFF by year, by month and by day.

Snowflake natively supports it: 4.42 Release Notes.
MONTHS_BETWEEN
MONTHS_BETWEEN( <date_expr1> , <date_expr2> )
Returns the number of months between two DATE or TIMESTAMP values.


Rewrite dynamic T-SQL date variables in DAX

We're currently rebuilding basic emailed reports built using T-SQL to be paginated reports published on Power BI.
We're muddling through by creating the tables we need with the appropriate filters in Power BI Desktop to reconcile the numbers, then taking the DAX code from them using the Performance Analyser.
The one I'm working on at the minute has a simple bit of SQL code to get data for the previous calendar month. I have no idea how to express this in DAX, or whether it is even possible.
-- Validation to get previous month
IF (MONTH(GETDATE()) - 1) > 0
SET @MONTH = MONTH(GETDATE()) - 1
ELSE
SET @MONTH = '12'
-- Validation to get year of previous month
IF (@MONTH < 12)
SET @YEAR = YEAR(GETDATE())
ELSE
SET @YEAR = YEAR(GETDATE()) - 1
-- Set start date and finish date for extract
SET @PERIOD = @YEAR + RIGHT('00' + @MONTH, 2)
It needs to become a hidden SSRS parameter or just inline code to be used with this DAX variable:
VAR __DS0FilterTable =
TREATAS({"202212"}, 'Org View_VaultexCalendar'[Calendar Month No])
So the "202212" would become @PERIOD, or the equivalent if doable without a parameter.
SSRS Parameter:
=IIF(
Month(Today()) > 1,
Year(Today()) & RIGHT("00" & Month(Today()) - 1, 2),
Year(Today())-1 & "12"
)
DAX expression:
IF (
MONTH ( TODAY () ) > 1,
YEAR ( TODAY () ) & FORMAT ( MONTH ( TODAY () ) - 1, "00" ),
YEAR ( TODAY () ) - 1 & "12"
)
In both cases we look at today's month and check whether it is after January. If it is, we take the current year and concatenate it with the current month less one, padded with a leading zero when needed. Otherwise the month is January, so we take the current year less one and concatenate it with "12".
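If it helps to check the logic away from SSRS and DAX, the same rule can be sketched in Python. The function name previous_month_period is made up for this illustration, and passing the date in as an argument (instead of calling today inside) is only there to make it easy to try different dates.

```python
from datetime import date


def previous_month_period(today: date) -> str:
    """Return the previous calendar month as a 'YYYYMM' string."""
    if today.month > 1:
        # Same year, month less one, zero-padded to two digits.
        return f"{today.year}{today.month - 1:02d}"
    # January: the previous month is December of the previous year.
    return f"{today.year - 1}12"


# Any date in January 2023 maps to the "202212" value used in the TREATAS filter.
january_example = previous_month_period(date(2023, 1, 15))
```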

Postgres generate date series with exactly 100 steps

Let's say we have the dates
'2017-01-01'
and
'2017-01-04'
and I would like to get a series of exactly N timestamps between these dates (endpoints included), in this case 7:
SELECT * FROM
generate_series_n(
'2017-01-01'::timestamp,
'2017-01-04'::timestamp,
7
)
Which I would like to return something like this:
2017-01-01-00:00:00
2017-01-01-12:00:00
2017-01-02-00:00:00
2017-01-02-12:00:00
2017-01-03-00:00:00
2017-01-03-12:00:00
2017-01-04-00:00:00
How can I do this in postgres?
This may be useful: use generate_series and do the arithmetic in the select.
select '2022-01-01'::date + generate_series *('2022-05-31'::date - '2022-01-01'::date)/15
FROM generate_series(1, 15)
;
output
?column?
------------
2022-01-11
2022-01-21
2022-01-31
2022-02-10
2022-02-20
2022-03-02
2022-03-12
2022-03-22
2022-04-01
2022-04-11
2022-04-21
2022-05-01
2022-05-11
2022-05-21
2022-05-31
(15 rows)
WITH seconds AS
(
SELECT EXTRACT(epoch FROM('2017-01-04'::timestamp - '2017-01-01'::timestamp))::integer AS sec
),
step_seconds AS
(
SELECT sec / 6 AS step FROM seconds -- 7 values including both endpoints means 6 steps
)
SELECT generate_series('2017-01-01'::timestamp, '2017-01-04'::timestamp, (step || 'S')::interval)
FROM step_seconds
Converting this to a function is easy; let me know if you have trouble with it.
One caveat with this solution is that extracting the epoch from an interval assumes 30-day months. If that is a problem for your use case (long intervals), you can tweak the logic that gets the seconds from the interval.
You can divide the difference between the end and the start value by the number of steps, which is one less than the number of values you want when both endpoints are included:
SELECT *
FROM generate_series('2017-01-01'::timestamp,
'2017-01-04'::timestamp,
('2017-01-04'::timestamp - '2017-01-01'::timestamp) / 6)
This could be wrapped into a function if you want to avoid repeating the start and end value.
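To make the step arithmetic explicit, here is a small Python sketch of the same idea. It assumes both endpoints should be part of the result, so N values need N - 1 steps; the function name evenly_spaced is invented for this example and is not a Postgres function.

```python
from datetime import datetime


def evenly_spaced(start: datetime, end: datetime, n: int) -> list:
    """Return exactly n timestamps from start to end, both endpoints included."""
    if n < 2:
        raise ValueError("need at least 2 values to include both endpoints")
    # n values are separated by n - 1 equal steps.
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


series = evenly_spaced(datetime(2017, 1, 1), datetime(2017, 1, 4), 7)
```

With the dates from the question the step works out to 12 hours, which matches the spacing of the desired output.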

Is there a way to count days INSIDE a range of dates?

I'm quite a beginner in VB/SQL. I only started learning a few months ago, but I can follow the logic of algorithms as I used to do some Excel VBA.
I'm currently designing a database in which I would like to track every colleague's activity during the year.
The objective is to have a (Monthly) ratio of =>
Billable days / (Billable + Non Billable - Absent)
The context :
A single person can be : Working internally (Non billable), OR Working Externally (Billable) , OR on Holidays (Absent).
I have a [Planning] Table where it stores the following data : [Consultant_ID] (linked to another table [Consultant], [Activity] (A list with the three choices described above), [Beginning_Date], [End_Date].
Example :
Consultant 1 : Working externally from 01/01/2019 to 01/06/2019,
Working internally from 02/06/2019 to 31/12/2019,
Holidays from 02/03/2019 to 15/03/2019
Is there a way to have the Billable ratio of March for example ?
I created 4 queries (Maybe too much ?)
3 queries : [Consultant_ID] [Activity] [Beginning_Date] [End_Date] [Ratio : Datediff("d";[Beginning_Date];[End_Date])].
Each query has its own [Activity] criterion : one for Working Internally, one for Working Externally, one for Absent.
And the criteria for [Beginning_Date] and [End_Date] are : <=[Enter beginning date], >=[Enter End date]
And the 4th query [Consultant ID] [Billable] [Non billable] [Absent] (and planning to add the [RATIO]).
Problem is : the Datediff counts the dates of the whole activity of what it finds, and not only the dates between 01/03/2019 and 31/03/2019 as I wish to.
I Expect the output of the ratio to be : Billable days / (Billable + Non Billable - Absent) of the desired period.
The actual output shows the billable, non billable, and absent days of the whole period between the dates which are inputted
So instead of 31 Billable, 0 Non billable, 15 Absent
It shows 180 Billable, 0 Non Billable, 32 Absent
Sorry for the long post, it is actually my first, and thank you very much !
I've been struggling with this for a whole week
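We first need to clip each row to the reporting window: maxBegin is the later of Beginning_Date and the window start, and minEnd is the earlier of End_Date and the window end.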
SELECT
*,
(IIF (Beginning_Date > #3/1/2019#, Beginning_Date, #3/1/2019#) ) as maxBegin,
(IIF (End_Date < #3/31/2019#, End_Date, #3/31/2019#) ) as minEnd,
Datediff("d", maxBegin, minEnd) + 1 as theDiff
FROM Planning
Where Beginning_Date <= #3/31/2019# AND End_Date >= #3/1/2019#
Then use that to compute the durations. Note: DateDiff does not count both ends, so we need to add +1.
SELECT
Consultant_ID,
SUM(IIF (Activity = "Working Internally", Datediff("d", maxBegin, minEnd) +1, 0) ) as NonBillable,
SUM(IIF (Activity = "Working Externally", Datediff("d", maxBegin, minEnd) +1, 0) ) as Billable,
SUM(IIF (Activity = "Holidays", Datediff("d", maxBegin, minEnd) +1, 0) ) as Absent
FROM
(
SELECT
*,
(IIF (Beginning_Date > #3/1/2019#, Beginning_Date, #3/1/2019#) ) as maxBegin,
(IIF (End_Date < #3/31/2019#, End_Date, #3/31/2019#) ) as minEnd
FROM Planning
Where Beginning_Date <= #3/31/2019# AND End_Date >= #3/1/2019#
) as z
GROUP BY Consultant_ID;
Finally, you need to replace the hard-coded March dates with your actual begin/end parameters to run the query. Also note that the Holidays come to only 14 days, not 15.
You can also add the Ratio calculation right into this SQL and end up with a single query.
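The clipping step is easier to see in isolation, so here is a minimal Python sketch of it using the example consultant from the question. The helper name days_in_window and the in-memory planning list are assumptions made for the illustration; they are not part of the Access database.

```python
from datetime import date


def days_in_window(begin: date, end: date, window_start: date, window_end: date) -> int:
    """Count the days of [begin, end] that fall inside the window, both ends included."""
    clipped_begin = max(begin, window_start)  # same role as maxBegin
    clipped_end = min(end, window_end)        # same role as minEnd
    if clipped_begin > clipped_end:
        return 0  # the activity does not overlap the window at all
    return (clipped_end - clipped_begin).days + 1  # +1 to count both ends


# Example consultant from the question (dates are day/month/year there).
planning = [
    ("Working Externally", date(2019, 1, 1), date(2019, 6, 1)),
    ("Working Internally", date(2019, 6, 2), date(2019, 12, 31)),
    ("Holidays", date(2019, 3, 2), date(2019, 3, 15)),
]
march = (date(2019, 3, 1), date(2019, 3, 31))

totals = {}
for activity, begin, end in planning:
    totals[activity] = totals.get(activity, 0) + days_in_window(begin, end, *march)
```

For March 2019 this gives 31 days working externally, 0 working internally and 14 days of holidays, in line with the note above about 14 rather than 15.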

PostgreSQL: Date Difference with fractions

SELECT cu.user_id, cu.last_activity, cu.updated_time,
DATE_PART('day', cu.last_activity - cu.updated_time), to_char(cu.last_activity - cu.updated_time, 'DD.HH24')
FROM stats.core_users cu
WHERE cu.user_id = '117132014' or cu.user_id = '117132012';
I get results like:
117132014 2017-12-11 10:34:51.349905 2017-12-09 12:00:38.503518 1 01.22
117132012 2017-12-11 05:18:20.312283 2017-12-08 15:46:51.914085 2 02.13
Is it feasible to get the day difference as a fraction, like 1.91 days in the first case instead of 1 day and 22 hours, so that it is more precise and easier to feed into a machine learning model?
date_part() does what its name says: it returns one part of the several elements of a date, interval or timestamp. In your case it is one part of an interval (because timestamp - timestamp returns an interval).
If you want the result as a fraction, you need to extract the total seconds of the interval and then divide that by 86400 (the number of seconds in a day):
extract(epoch from cu.last_activity - cu.updated_time) / 86400
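The same arithmetic can be checked in Python with the first row from the question. This is only a sketch of the seconds-divided-by-86400 idea; the function name fractional_days is made up for the example.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def fractional_days(later: datetime, earlier: datetime) -> float:
    """Difference between two timestamps expressed as a fractional number of days."""
    return (later - earlier).total_seconds() / SECONDS_PER_DAY


# First row of the question: user 117132014.
last_activity = datetime(2017, 12, 11, 10, 34, 51, 349905)
updated_time = datetime(2017, 12, 9, 12, 0, 38, 503518)
diff_days = fractional_days(last_activity, updated_time)
```

For that row the interval is 1 day and about 22.6 hours, so the result comes out at roughly 1.94 days.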

Using expression in pick function

I have tried Num((today()-I_TRAN_DATE)/90 + 1,0) on its own and it returns an integer, but it does not seem to work when I combine it with the pick function. I know it's not finished, but it should at least return a result for 1-3.
pick(
Num((today()-I_TRAN_DATE)/90 + 1,0)
,'less than 3 months'
,'3-6 months'
,'6-12 months'
,'greater than 1 year'
)
This looks like an issue with the number passed from the expression to the pick function. The num function only changes how the number is displayed, not its underlying value, so pick still receives the unrounded value. Adding a round function (and a few more brackets) resolves the issue, as in the script below, which generates a list of dates and then applies the pick function at the end.
Let varMinDate = Num(31350); //30/10/1985
Let varMaxDate = Num(42400); //31/01/2016
TempCalendar:
LOAD
$(varMinDate) + Iterno()-1 As Num,
Date($(varMinDate) + IterNo() - 1) as TempDate
AutoGenerate 1 While $(varMinDate) + IterNo() -1 <= $(varMaxDate);
TestCalendar:
Load
TempDate AS I_TRAN_DATE,
week(TempDate) As Week,
Year(TempDate) As Year,
Month(TempDate) As PeriodMonth,
Day(TempDate) As Day,
if(Year2Date(TempDate),1,0) as CurYTDFlag,
if(Year2Date(TempDate,-1),1,0) as LastYTDFlag,
inyear(TempDate, Monthstart($(varMaxDate)),-1) as RC12,
date(monthstart(TempDate), 'MMM-YYYY') as MonthYear,
Week(weekstart(TempDate)) & '-' & WeekYear(TempDate) as WeekYear,
WeekDay(TempDate) as WeekDay,
if(TempDate>=MonthStart(AddMonths(Today(),-12)) and(TempDate<=MonthStart(Today())),1,0) as Rolling12Month
,'Q' & Ceil (Month(TempDate)/3) as Quarter
,Year(TempDate)&'-Q' & Ceil (Month(TempDate)/3) as YearQuarter
,if(Year(TempDate)=(Year(Today())-1),1,0) as LastYear
, if(MonthStart(TempDate)=MonthStart(Today()),null(),'Exclude Current Period') As ExcludeCurrentPeriod
Resident TempCalendar
Order By TempDate ASC;
Drop Table TempCalendar;
Let varMinDate = null();
Let varMaxDate = null();
PeriodTable:
Load
Pick(round((num((((today()-I_TRAN_DATE)/90) + 1),0)),1)
,'less than 3 months'
,'3-6 months'
,'6-12 months'
,'greater than 1 year') as Period
,I_TRAN_DATE
Resident TestCalendar;
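For readers who want to follow the index calculation outside Qlik, here is a rough Python imitation of the final Pick expression. The pick helper below is hypothetical: it only mimics 1-based selection and assumes that an index beyond the list yields nothing (None here), and the half-up rounding is my assumption about how round behaves, so treat it as a sketch of the idea and not as Qlik's exact semantics.

```python
import math

LABELS = (
    "less than 3 months",
    "3-6 months",
    "6-12 months",
    "greater than 1 year",
)


def pick(n, *choices):
    """Hypothetical 1-based selector; returns None when n is not a valid position."""
    if isinstance(n, int) and 1 <= n <= len(choices):
        return choices[n - 1]
    return None


def period_label(days_old: float):
    """Mirror round(((today - I_TRAN_DATE) / 90) + 1, 1) followed by pick."""
    raw_index = days_old / 90 + 1           # fractional value, e.g. 2.11
    index = math.floor(raw_index + 0.5)     # round half up to a whole number
    return pick(index, *LABELS)


label_for_100_days = period_label(100)
```

The point of the sketch is that the selector needs a whole number: the fractional value 2.11 has to become 2 before it can be used as a position in the list.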