postgres - window function - date difference within groups

I searched and found potentially relevant results, but with my skills I'm not able to adapt them.
I have a table:
date       | record status
-----------+--------------
1.10.2022  | open
2.10.2022  | waiting
3.10.2022  | approved
5.10.2022  | open
6.10.2022  | waiting
8.10.2022  | approved
10.10.2022 | open
12.10.2022 | waiting
I need the date difference between 'open' and 'approved' within groups that start with 'open' and end with 'approved'. The last group is not yet approved, so there the date difference is between the last 'open' and today (15.10.2022, just as an example):
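date       | record status | group/rank | date diff
-----------+---------------+------------+----------
1.10.2022  | open          | 1          | 2
2.10.2022  | waiting       | 1          | 2
3.10.2022  | approved      | 1          | 2
5.10.2022  | open          | 2          | 3
6.10.2022  | waiting       | 2          | 3
8.10.2022  | approved      | 2          | 3
10.10.2022 | open          | 3          | 5
12.10.2022 | waiting       | 3          | 5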
Questions then:
How do I define the groups? I thought maybe with a rank, because the original table has thousands of rows.
What does a date diff look like that considers only 'open' and 'approved', plus the special case where the record is not yet approved?
How do I apply this date diff to the groups only?
Thanks a lot :-)

Your idea of using a rank was good. What you were missing was a CTE holding the ranks, so that each group can be checked for an 'approved' row, with a fallback to the current day when there is none (see the EXISTS).
WITH RankedStatus AS (
SELECT MyTable.*,
DENSE_RANK() OVER (PARTITION BY RecordStatus ORDER BY Date) AS Rank
FROM MyTable
)
SELECT RS.*,
CASE WHEN EXISTS(SELECT FROM RankedStatus WHERE Rank = RS.Rank and RecordStatus = 'approved')
THEN MAX(Date) OVER (PARTITION BY Rank)
ELSE CURRENT_DATE END
- MIN(Date) OVER (PARTITION BY Rank) AS DateDiff
FROM RankedStatus RS
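The DENSE_RANK per status works for the sample because every group has exactly one row of each status. If a group were missing, say, its 'waiting' row, the per-status ranks would drift out of step. A common alternative is to number the groups with a running count of 'open' rows.

The sketch below tries that idea on SQLite from Python so it can be run anywhere (window functions need SQLite 3.25 or later). SQLite has no date subtraction, so julianday() is substituted, and the :today parameter stands in for CURRENT_DATE. The table and column names (records, rec_date, status) are assumptions. In PostgreSQL you would subtract the dates directly and use CURRENT_DATE.

```python
import sqlite3

# Sample data from the question, stored as ISO dates so they sort correctly.
ROWS = [
    ("2022-10-01", "open"), ("2022-10-02", "waiting"), ("2022-10-03", "approved"),
    ("2022-10-05", "open"), ("2022-10-06", "waiting"), ("2022-10-08", "approved"),
    ("2022-10-10", "open"), ("2022-10-12", "waiting"),
]

QUERY = """
WITH grouped AS (
    SELECT rec_date, status,
           -- running count of 'open' rows: each 'open' starts a new group
           SUM(CASE WHEN status = 'open' THEN 1 ELSE 0 END)
               OVER (ORDER BY rec_date) AS grp
    FROM records
)
SELECT rec_date, status, grp,
       -- the group's approved date (or today if not yet approved)
       -- minus the group's open date, in whole days
       CAST(julianday(COALESCE(
                MAX(CASE WHEN status = 'approved' THEN rec_date END)
                    OVER (PARTITION BY grp),
                :today))
            - julianday(MIN(CASE WHEN status = 'open' THEN rec_date END)
                    OVER (PARTITION BY grp)) AS INTEGER) AS date_diff
FROM grouped
ORDER BY rec_date
"""


def date_diffs(today):
    """Return (rec_date, status, grp, date_diff) rows, using `today` for open groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (rec_date TEXT, status TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY, {"today": today}).fetchall()
    conn.close()
    return result


for row in date_diffs("2022-10-15"):
    print(row)
```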

Related

User Sessions | Months Since Last Active Using SQL

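UserID | CalMonth | ActiveFlag | Months_since_last_active
-------+----------+------------+-------------------------
A      | 1/1/2021 | 1          | 0
A      | 2/1/2021 |            | 1
A      | 3/1/2021 |            | 2
A      | 4/1/2021 | 1          | 0
B      | 1/1/2021 | 1          | 0
B      | 2/1/2021 |            | 1
B      | 3/1/2021 | 1          | 0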
Problem: the first 3 columns are given. Generate the last one, 'Months_since_last_active', by adding 1 until the user is active again.
My solution is below:
With active_sessions as (
Select
User_Id
, CalMonth
, active flag as current_flag
, LAG (ActiveFlag,1) over (partition by User_Id order by CalMonth) as previous_flag
)
Select User_Id, CalMonth, current_flag, sum(case when current_flag =1 then 0
when current_flag IS NULL then Months_since_last_active + 1
END
) as Months_since_last_active
from active_sessions
order by 1,2
I was asked the above question in an interview and told that my proposed solution would not work because:
When it comes to 3/1/2021 and beyond, the previous values of 'Months_since_last_active' are not in the table yet -- they are only in the code
If I wanted to use the LAG function, it would take innumerable LAG calls to achieve what I was trying to do
I would appreciate it if someone could comment on my solution.

Your solution has 3 major problems, 2 of which may be copy/paste errors. The active_sessions CTE is missing the FROM clause, so there is no data source. Then the main query uses the aggregate function SUM, but it has no GROUP BY, which is required when the aggregate is selected together with non-aggregated columns. These are easily corrected. The other issue concerns the LAG function and how you use it.
First off, in the CTE you alias the LAG result as previous_flag, but in the main query you reference Months_since_last_active, which does not exist yet. I think this is the source of the interviewer's first point.
The interviewer's second point also stems from the LAG function. As written, it always looks back exactly 1 row from the current row, yet it needs to look back 2 rows for (userid, calmonth) = ('A', 2021-03-01), and further still for longer gaps. Basically, you need to look back to the last row with activeflag = 1. This leads directly to "it'd take innumerable LAG functions", as you do not know how far back you need to look. Suppose you had 30-40 or more inactive rows between active rows: you would need a LAG(activeflag, n) ... for each possibility.
A solution. I dislike the problem statement: it should not contain "by adding 1 until the user is active again" (is that yours or theirs?). Either way, this is an XY problem. If it is theirs, they should be telling you what to solve, i.e. find the number of months since the user was last active. If it is yours, you have created the problem for yourself. The problem statement should not say anything about how to solve it. I will ignore that portion of the problem (in a real interview I would ignore it too, and have, but be prepared to explain why).
What you have is a version of Gaps and Islands (google it; you will find more than enough to think about). In this version, consider each row with activeflag = 1 as an island, and anything else as a gap. What you are looking for is the length of the gaps between islands. In the following, the island_num CTE does 2 things: it numbers each user's rows in calmonth order (row_num) and generates a boolean is_island flag for each row. The gap_points CTE then joins those results with themselves through a lateral subquery, selecting the row number of the latest island whose calmonth is earlier than the current row's calmonth. In the main query, Months_since_last_active is 0 if the current row is an island, and the difference between the generated row numbers if it is a gap. (see demo)
with island_num (userid, cal_month, active_flag, is_island, row_num) as
( select am.*
, case when am.activeflag = 1 then true else false end is_island
, row_number() over (partition by am.userid order by am.calmonth) rn
from active_month am
) -- select * from island_num
, gap_points(userid, cal_month, active_flag, is_island, row_num, island_row) as
( select *
from island_num i1
join lateral
(select max(row_num)
from island_num i2
where i1.userid = i2.userid
and i2.cal_month < i1.cal_month
and i2.is_island
) s0
on true
) --select * from gap_points;
select userid "User Id"
, cal_month "Cal Month"
, active_flag "Active Flag"
, case when is_island then 0
else row_num - island_row
end "Months_since_last_active"
from gap_points;
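Another common way to solve this gaps-and-islands problem, without the lateral self-join, is a running count of active months. Every active month starts a new island, and the inactive months after it share its island number. The row number inside the island, minus one, is then the number of months since the last active month.

Below is a sketch on SQLite from Python (window functions need SQLite 3.25 or later). It uses the answer's active_month table. Storing the months as ISO dates is an assumption, made so that they sort chronologically.

```python
import sqlite3

# Sample data from the question; NULL activeflag marks an inactive month.
ROWS = [
    ("A", "2021-01-01", 1), ("A", "2021-02-01", None),
    ("A", "2021-03-01", None), ("A", "2021-04-01", 1),
    ("B", "2021-01-01", 1), ("B", "2021-02-01", None), ("B", "2021-03-01", 1),
]

QUERY = """
WITH islands AS (
    SELECT userid, calmonth, activeflag,
           -- running count of active months: one island number per active
           -- month, shared by the inactive months that follow it
           SUM(CASE WHEN activeflag = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY userid ORDER BY calmonth) AS island
    FROM active_month
)
SELECT userid, calmonth, activeflag,
       -- position inside the island minus one = months since last active
       ROW_NUMBER() OVER (PARTITION BY userid, island ORDER BY calmonth) - 1
           AS months_since_last_active
FROM islands
ORDER BY userid, calmonth
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_month (userid TEXT, calmonth TEXT, activeflag INTEGER)")
conn.executemany("INSERT INTO active_month VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

This differs from the answer's query in one respect. Months before a user's first active month would get a count starting from the user's first row instead of NULL.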

HQL: Max date of previous month

Good morning,
I have a problem I've been trying to solve but am getting nowhere.
I need to find the max date of the previous month. Normally I would just use the following to find the last day of the previous month: last_day(add_months(current_date, -1))
However, this particular data set doesn't always have data for the last day of the month. E.g. the last day with data for May was May 30th. Obviously, if I use the syntax above, it returns no data because it looks for 5/31.
So is there a way to find the "max" day available in the data of the previous month? Or the month prior etc.?

For example, like this (two scans of the table: one in the subquery to find the max date and one in the main query):
-- trunc(date, 'MM') returns the first day of that month
select *
from mytable
where as_of_date in (select max(as_of_date) from mytable where as_of_date between trunc(add_months(current_date, -1), 'MM') and last_day(add_months(current_date, -1)))
Or like this (single scan plus an analytic function):
select col1 ... colN
from
(
select t.*, rank() over (partition by month(t.as_of_date) order by t.as_of_date desc) rnk
from mytable t
where -- if the table is partitioned by date, this WHERE may improve performance
t.as_of_date between trunc(add_months(current_date, -1), 'MM') and last_day(add_months(current_date, -1))
)s
where rnk=1
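The subquery variant can be tried out without a Hive cluster. The sketch below runs it on SQLite from Python. SQLite has no add_months or last_day, so the date() modifiers 'start of month', '-1 month' and '-1 day' compute the bounds of the previous month instead. The :today parameter stands in for current_date. The sample rows are made up to mirror the question's May example.

```python
import sqlite3

# Made-up sample: the May 2022 data ends on the 30th, as in the question.
ROWS = [("2022-05-02",), ("2022-05-15",), ("2022-05-30",), ("2022-06-01",)]

QUERY = """
SELECT as_of_date
FROM mytable
WHERE as_of_date = (
    SELECT MAX(as_of_date)
    FROM mytable
    -- first and last day of the month before :today
    WHERE as_of_date BETWEEN date(:today, 'start of month', '-1 month')
                         AND date(:today, 'start of month', '-1 day')
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (as_of_date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", ROWS)
result = conn.execute(QUERY, {"today": "2022-06-15"}).fetchall()
print(result)
```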

Have Datetable with dates and if business day, need to find the 11th business day after a date

I need to find a date that is 11 business days after a date.
I did not have a date table. I requested one, but there is a long lead time for it.
I used a CTE to produce results that have a datekey, 1 if weekday (else 0), and 1 if holiday (else 0), and put those results into a table variable, so that Business_day is (weekday - holiday). Much Googling has already happened.
select dt.Datekey,
(dt.Weekdaycount - dt.HolidayCount) as Business_day
from @DateTable dt
UPDATE: I've figured it out in Excel: a running count of business days, a column of business day count + 11, then a VLOOKUP finding the +11 date. Now how do I do that in SQL?
Results like this:
Datekey    | Business_day
-----------+-------------
2019-01-01 | 0
2019-01-02 | 1

I will assume you want to set your weekend days yourself and can enter the holidays in a table variable, so you can do the following.
Set the weekend day names:
Declare @WeekDayName1 varchar(50)='Saturday'
Declare @WeekDayName2 varchar(50)='Sunday'
Set up the holiday table variable (you may already have this as a table in your database):
Declare @Holidays table (
[Date] date,
HolidayName varchar(250)
)
Let's insert a day or two to test it:
insert into @Holidays values (cast('2019-01-01' as date),'New Year')
insert into @Holidays values (cast('2019-01-08' as date),'some other holiday in your country')
Let's say the date you want to start from is the action date, and you need the date 11 business days after it:
Declare @ActionDate date='2018-12-28'
declare @BusinessDays int=11
A recursive CTE counts the days until you reach the correct one:
;with cte([date],BusinessDay) as (
select @ActionDate [date],cast(0 as int) BusinessDay
union all
select dateadd(day,1,cte.[date]),
case
when DATENAME(WEEKDAY,dateadd(day,1,cte.[date]))=@WeekDayName1
OR DATENAME(WEEKDAY,dateadd(day,1,cte.[date]))=@WeekDayName2
OR (select 1 from @Holidays h where h.Date=dateadd(day,1,cte.[date])) is not null
then cte.BusinessDay
else cte.BusinessDay+1
end BusinessDay
From cte where BusinessDay<@BusinessDays
)
--to see all the dates up to business day + 11
--select * from cte option (maxrecursion 0)
--to get the required date
select MAX([date]) from cte option (maxrecursion 0)
In my example, the date I get is:
ActionDate = 2018-12-28
After 11 business days: 2019-01-16
Hope this helps

The first step was to create a date table. Figuring out weekdays versus weekends is easy: weekdays are 1, weekends are 0. I borrowed someone else's holiday calendar: 1 if holiday, else 0. Then Business_Day = Weekday - Holiday. Next was to create a running total of business days. That lets you move from the running total of whatever day you're currently on to where you want to be in the future, say plus 10 business days. I hard-coded key milestones in the date table for 2 and 10 business days.
Then JOIN your date table with your transaction table on your zero day and the date key.
Finally, this allows you to make solid calculations of business days, e.g.:
WHERE CONVERT(date, D.DTRESOLVED) <= CONVERT(date, [10th_Bus_Day])
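The running-total idea can be written as a window SUM plus a self-join. The same applies to the question's Excel approach of a running count, + 11, and a VLOOKUP. Take the start date's running business-day count, add n, and pick the first business day with that count.

Below is a sketch on SQLite from Python (window functions need SQLite 3.25 or later). The date_table and is_business names and the date range are hypothetical. The two holidays are the ones from the recursive CTE example. SQL Server 2012 and later support the same SUM(...) OVER (ORDER BY ...) running total.

```python
import sqlite3
from datetime import date, timedelta

# Holidays taken from the recursive CTE example above.
HOLIDAYS = {date(2019, 1, 1), date(2019, 1, 8)}


def build_date_table(conn, first, last):
    """Create a hypothetical date_table with a 1/0 business-day flag."""
    conn.execute("CREATE TABLE date_table (datekey TEXT, is_business INTEGER)")
    day = first
    while day <= last:
        business = 1 if day.weekday() < 5 and day not in HOLIDAYS else 0
        conn.execute("INSERT INTO date_table VALUES (?, ?)", (day.isoformat(), business))
        day += timedelta(days=1)


QUERY = """
WITH numbered AS (
    SELECT datekey, is_business,
           -- running total of business days
           SUM(is_business) OVER (ORDER BY datekey) AS running_bd
    FROM date_table
)
SELECT MIN(t.datekey)
FROM numbered s
JOIN numbered t
  ON t.running_bd = s.running_bd + :n   -- n business days further on
 AND t.is_business = 1
WHERE s.datekey = :start
"""


def nth_business_day_after(start, n):
    conn = sqlite3.connect(":memory:")
    build_date_table(conn, date(2018, 12, 1), date(2019, 3, 31))
    (result,) = conn.execute(QUERY, {"start": start, "n": n}).fetchone()
    conn.close()
    return result


print(nth_business_day_after("2018-12-28", 11))
```

A weekend or holiday start date keeps the running count of the business day before it. The same lookup therefore also works when the start date is not a business day.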

Calculate best sale between several sellers

I'm using PostgreSQL.
Let's say there are 5 sellers.
Each month's sales are recorded in the database like this: (userId: 6, january: 10000$, february: 20000$, march: 10000$ ... , december: 50000$, year: 2018)
I need to calculate, possibly with only one query, the best sale of each month in one array of this format: (january: 15000$, february: 30000$, march: 40000$, year: 2018). I don't need the userId; I simply need to compare the sales per month and display the best amount.
For now, I've got this code, which works well and gives me user 6's sales per month for a given year:
SELECT date_trunc('month', date_vente) AS txn_month, sum(prix_vente) as monthly_sum,count(prix_vente) AS monthly_count
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
AND id_user = 6
GROUP BY txn_month ORDER BY txn_month
I wonder if somebody could tell me what kind of technique I could use to get the best sales for each of the 12 months among the 5 employees.
Could I use a view? Should I rather do a for loop in PHP over each user's sales per month, and then build some kind of comparison array?
No need to give me a full solution, but maybe some advice on how to do it directly in PostgreSQL? My only solution for now is to use PHP and write some not-so-nice code.
Have a nice day, I'll check back on Monday.
Sorry for my English.

WITH monthly_sales AS (
SELECT
date_trunc('month', date_vente) AS txn_month,
id_user,
sum(prix_vente) AS monthly_sum
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
GROUP BY txn_month, id_user
ORDER BY txn_month, id_user),
rank_monthly_sales_by_user_id AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY txn_month ORDER BY monthly_sum DESC) AS rank
FROM monthly_sales)
SELECT
txn_month,
monthly_sum
FROM rank_monthly_sales_by_user_id
WHERE rank = 1
ORDER BY txn_month ASC;
First, get the totals per month by user. This is the top subquery, monthly_sales, which sums each user's sales by month.
Next, to get the top user for each month in terms of total sales, you have to rank the rows returned by the previous subquery. This is done by ROW_NUMBER().
ROW_NUMBER() numbers the rows within a specified window. Here it orders the rows from monthly_sales within each month, and the numbering restarts at 1 for each month. PARTITION BY defines the window in which the rows are numbered; here that is the month, since we want to order the users' sales within each month. ORDER BY says how to number the rows from 1 to n. We use monthly_sum in descending order, so the highest monthly sum is 1 and the lowest is n (5, with five sellers).
The final query selects only the rows from rank_monthly_sales_by_user_id that are the top sales for the month (WHERE rank = 1).
This leaves us with an output where each row is a month, with the highest sale for that month.
Let me know if that was what you needed help with.
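The question only asks for the best amount per month and does not need the userId. The ranking step can therefore also be replaced by a plain MAX over the per-user monthly totals.

The sketch below runs both versions on SQLite from Python with made-up rows (window functions need SQLite 3.25 or later). SQLite has no date_trunc or date_part, so strftime() is substituted.

```python
import sqlite3

# Made-up sales rows (id_user, date_vente, prix_vente) for illustration.
SALES = [
    (6, "2018-01-05", 4000), (6, "2018-01-20", 6000), (7, "2018-01-10", 15000),
    (6, "2018-02-03", 20000), (7, "2018-02-14", 30000),
    (8, "2018-03-01", 40000), (6, "2018-03-09", 10000),
]

# Per-user monthly totals for 2018, shared by both variants.
MONTHLY = """
WITH monthly_sales AS (
    SELECT strftime('%Y-%m', date_vente) AS txn_month,
           id_user,
           SUM(prix_vente) AS monthly_sum
    FROM crm_vente
    WHERE strftime('%Y', date_vente) = '2018'
    GROUP BY txn_month, id_user
)
"""

# Variant from the answer: keep the row ranked first in each month.
RANKED = MONTHLY + """
SELECT txn_month, monthly_sum
FROM (
    SELECT txn_month, monthly_sum,
           ROW_NUMBER() OVER (PARTITION BY txn_month
                              ORDER BY monthly_sum DESC) AS rn
    FROM monthly_sales
) AS ranked_sales
WHERE rn = 1
ORDER BY txn_month
"""

# Simpler variant when the user id is not needed: take the maximum directly.
GROUPED = MONTHLY + """
SELECT txn_month, MAX(monthly_sum)
FROM monthly_sales
GROUP BY txn_month
ORDER BY txn_month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_vente (id_user INTEGER, date_vente TEXT, prix_vente INTEGER)")
conn.executemany("INSERT INTO crm_vente VALUES (?, ?, ?)", SALES)
ranked = conn.execute(RANKED).fetchall()
grouped = conn.execute(GROUPED).fetchall()
print(grouped)
```

If two sellers tie for a month, ROW_NUMBER() still keeps just one of their rows, whereas MAX() simply returns the tied amount.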

Select unique values sorted by date

I am trying to solve an interesting problem. I have a table that has, among other data, these columns (dates in this sample are shown in European format - dd/mm/yyyy):
n_place_id dt_visit_date
(integer) (date)
========== =============
1 10/02/2012
3 11/03/2012
4 11/05/2012
13 14/06/2012
3 04/10/2012
3 03/11/2012
5 05/09/2012
13 18/08/2012
Basically, each place may be visited multiple times - and the dates may be in the past (completed visits) or in the future (planned visits). For the sake of simplicity, today's visits are part of planned future visits.
Now, I need to run a select on this table, which would pull unique place IDs from this table (without date) sorted in the following order:
Future visits go before past visits
Future visits take precedence in sorting over past visits for the same place
For future visits, the earliest date must take precedence in sorting for the same place
For past visits, the latest date must take precedence in sorting for the same place.
For example, for the sample data shown above, the result I need is:
5 (earliest future visit)
3 (next future visit into the future)
13 (latest past visit)
4 (previous past visit)
1 (earlier visit in the past)
Now, I can achieve the desired sorting using case when in the order by clause like so:
select
n_place_id
from
place_visit
order by
(case when dt_visit_date >= now()::date then 1 else 2 end),
(case when dt_visit_date >= now()::date then 1 else -1 end) * extract(epoch from dt_visit_date)
This sort of does what I need, but it contains repeated IDs, whereas I need unique place IDs. If I try to add DISTINCT to the select statement, Postgres complains that the ORDER BY expressions must appear in the select list. But then DISTINCT won't be meaningful any more, as I would have dates in there.
Somehow I feel that there should be a way to get the result I need in one select statement, but I can't get my head around how to do it.
If this can't be done, then, of course, I'll have to do the whole thing in the code, but I'd prefer to have this in one SQL statement.
P.S. I am not worried about performance, because the dataset I will be sorting is not large. After the WHERE clause is applied, it will rarely contain more than about 10 records.

With DISTINCT ON you can easily show additional columns of the row with the resulting n_place_id:
SELECT n_place_id, dt_visit_date
FROM (
SELECT DISTINCT ON (n_place_id) *
,dt_visit_date < now()::date AS prio -- future first
,@(now()::date - dt_visit_date) AS diff -- closest first
FROM place_visit
ORDER BY n_place_id, prio, diff
) x
ORDER BY prio, diff;
Effectively I pick the row with the earliest future date (including "today") per n_place_id - or latest date in the past, failing that.
Then the resulting unique rows are sorted by the same criteria.
FALSE sorts before TRUE
The "absolute value" operator @ helps to sort "closest first"
More on the Postgres specific DISTINCT ON in this related answer.
Result:
n_place_id | dt_visit_date
------------+--------------
5 | 2012-09-05
3 | 2012-10-04
13 | 2012-08-18
4 | 2012-05-11
1 | 2012-02-10
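DISTINCT ON is PostgreSQL-specific. Where it is not available, the same "one row per place, chosen by prio, then diff" step can be written with ROW_NUMBER().

Below is a sketch on SQLite from Python (window functions need SQLite 3.25 or later). julianday() replaces the Postgres date subtraction. :today is fixed at 2012-08-25, an assumed date between the latest past visit and the earliest future visit that reproduces the result above.

```python
import sqlite3

# Sample data from the question, as ISO dates.
VISITS = [
    (1, "2012-02-10"), (3, "2012-03-11"), (4, "2012-05-11"), (13, "2012-06-14"),
    (3, "2012-10-04"), (3, "2012-11-03"), (5, "2012-09-05"), (13, "2012-08-18"),
]

QUERY = """
SELECT n_place_id, dt_visit_date
FROM (
    SELECT n_place_id, dt_visit_date, prio, diff,
           -- keep one row per place, like DISTINCT ON (n_place_id)
           ROW_NUMBER() OVER (PARTITION BY n_place_id ORDER BY prio, diff) AS rn
    FROM (
        SELECT n_place_id, dt_visit_date,
               dt_visit_date < :today AS prio,  -- 0 = today or future, 1 = past
               abs(julianday(:today) - julianday(dt_visit_date)) AS diff
        FROM place_visit
    ) AS scored
) AS ranked
WHERE rn = 1
ORDER BY prio, diff
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place_visit (n_place_id INTEGER, dt_visit_date TEXT)")
conn.executemany("INSERT INTO place_visit VALUES (?, ?)", VISITS)
result = conn.execute(QUERY, {"today": "2012-08-25"}).fetchall()
print([place for place, _ in result])
```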

Try this
select n_place_id
from
(
select *,
dt_visit_date >= current_date as is_future,
dt_visit_date - current_date as days_ahead
from place_visit
) v
group by n_place_id
order by bool_or(is_future) desc, -- places with a future visit first
min(days_ahead) filter (where is_future), -- then by earliest future visit
max(dt_visit_date) desc -- then by latest past visit

We Keep Coding
