Group by Day or Most Recent Value in PostgreSQL

I have the following table using postgresql:
Date Group Count Days
31/01/2021 Gr1 50 5
30/01/2021 Gr2 40 10
29/01/2021 Gr1 30 6
28/01/2021 Gr2 20 4
27/01/2021 Gr1 10 5
26/01/2021 Gr2 40 8
25/01/2021 Gr1 30 6
24/01/2021 Gr2 10 5
23/01/2021 Gr1 5 1
What I want is to produce a table for group 1 (Gr1) by day, showing the count and an 'Avg Count Per Days' calculated as Count/Days. Where there is no Gr1 row for a day, use the values from the most recent earlier day that has one. For example, Gr1 has no row for 30/01/2021, so that day takes the values from the most recent earlier Gr1 day (i.e. 29/01/2021).
The result should look like this:
Date Group Count Days Avg Count Per Days
31/01/2021 Gr1 50 5 10
30/01/2021 Gr1 30 6 5
29/01/2021 Gr1 30 6 5
28/01/2021 Gr1 10 5 2
27/01/2021 Gr1 10 5 2
26/01/2021 Gr1 30 6 5
25/01/2021 Gr1 30 6 5
24/01/2021 Gr1 5 1 5
23/01/2021 Gr1 5 1 5

First I get all rows where "Group" = 'Gr1' and calculate "Avg Count Per Days" (subquery grp1). Then I generate all days between MIN("Date") and MAX("Date") (subquery days). Finally I JOIN both subqueries, matching each day to every Gr1 row on or before it, and for each day I take the most recent row (the day's own row, or the closest earlier one) using the window function FIRST_VALUE():
WITH grp1 AS (SELECT *, "Count" / "Days" AS "Avg Count Per Days"
FROM t
WHERE "Group" = 'Gr1'),
days AS (SELECT generate_series(MIN("Date"), MAX("Date"), INTERVAL '1 day')::date AS day
FROM grp1)
SELECT DISTINCT
days.day AS "Date",
FIRST_VALUE("Group") OVER (PARTITION BY days.day ORDER BY "Date" DESC) AS "Group",
FIRST_VALUE("Count") OVER (PARTITION BY days.day ORDER BY "Date" DESC) AS "Count",
FIRST_VALUE("Days") OVER (PARTITION BY days.day ORDER BY "Date" DESC) AS "Days",
FIRST_VALUE("Avg Count Per Days") OVER (PARTITION BY days.day ORDER BY "Date" DESC) AS "Avg Count Per Days"
FROM days
LEFT JOIN grp1 ON grp1."Date" <= days.day
ORDER BY days.day DESC;
I am assuming you have no more than one row with "Group" = 'Gr1' per day.
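To make the fill-forward logic easier to check against the expected table, here is a minimal Python sketch of the same idea, outside the database. The dictionary `gr1` and the function `fill_by_day` are illustrative names, not part of the answer's query; the data is the Gr1 rows from the question.

```python
from datetime import date, timedelta

# Gr1 rows from the question's sample table: date -> (count, days)
gr1 = {
    date(2021, 1, 31): (50, 5),
    date(2021, 1, 29): (30, 6),
    date(2021, 1, 27): (10, 5),
    date(2021, 1, 25): (30, 6),
    date(2021, 1, 23): (5, 1),
}

def fill_by_day(rows):
    """Return one (day, count, days, avg) tuple per calendar day, newest first."""
    day, last_day = min(rows), max(rows)
    filled = []
    current = None
    while day <= last_day:
        # Use the day's own row if it exists, otherwise carry the
        # most recent earlier Gr1 row forward.
        current = rows.get(day, current)
        count, days = current
        filled.append((day, count, days, count / days))
        day += timedelta(days=1)
    return filled[::-1]

result = fill_by_day(gr1)
for day, count, days, avg in result:
    print(day.strftime("%d/%m/%Y"), "Gr1", count, days, avg)
```

Like the SQL, this assumes at most one Gr1 row per day.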

Related

6. Hive: Given a table t with schema (date, revenue)

6.Hive: Given a table t with schema (date, revenue), like this
date revenue
Jan. 1 100
Jan. 2 120
Jan. 3 80
Jan. 4 150
Jan. 5 50
What does the following query do?
SELECT t1.date AS date, sum(t2.revenue) AS revenue
FROM t as t1 JOIN t as t2 ON t2.date <= t1.date GROUP BY 1 ORDER BY 1
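The self-join pairs each row t1 with every row t2 whose date is on or before t1's date, so summing t2.revenue per t1.date gives a running (cumulative) total of revenue. A small Python sketch of that reading, using the sample rows with the day of January as the date (an illustrative encoding) and assuming dates are unique:

```python
# Rows of the sample table t: (day of January, revenue)
t = [(1, 100), (2, 120), (3, 80), (4, 150), (5, 50)]

def running_revenue(rows):
    """Mimic the self-join: for each t1 row, sum the revenue of
    every t2 row with t2.date <= t1.date."""
    out = []
    for d1, _ in sorted(rows):
        total = sum(rev for d2, rev in rows if d2 <= d1)
        out.append((d1, total))
    return out

cumulative = running_revenue(t)
print(cumulative)
```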

Cohort Analysis with RedShift by Month

I am trying to build a cohort analysis for monthly retention but am having trouble getting the month_number column right. The month number should be the number of months since registration in which the user transacted, i.e. 0 for the registration month, 1 for the first month after it, 2 for the second, and so on up to the last month. Currently, the query returns negative month numbers in some cells.
It should be like this table:
cohort_month total_users month_number percentage
------------ ----------- ------------ ----------
January 100 0 40
January 341 1 90
January 115 2 90
February 103 0 73
February 100 1 40
March 90 0 90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think the RANK window function is the right solution. The idea is to assign a rank to each user's months of activity, ordered by year and month.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date) , DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
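RANK numbering starts from 1, so you may want to subtract 1. Note that RANK leaves gaps after ties, so if a user can have several events in the same month, DENSE_RANK avoids skipped month numbers.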
Then, you can group by user_id and month_number to get the number of interactions for each user per month from the subscription (adapt to your use case accordingly).
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the documentation:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
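As a cross-check on what month_number is meant to be (0 for the registration month, 1 for the following month, and so on), here is a small Python sketch that counts calendar months using both year and month. The function name `month_number` and the sample dates are illustrative assumptions, not taken from the tables in the question.

```python
from datetime import date

def month_number(registered: date, transacted: date) -> int:
    """Calendar months from the registration month to the transaction month."""
    return (transacted.year - registered.year) * 12 + (transacted.month - registered.month)

same_month = month_number(date(2020, 1, 31), date(2020, 1, 2))
next_month = month_number(date(2020, 1, 31), date(2020, 2, 1))
across_year = month_number(date(2020, 11, 15), date(2021, 1, 3))
print(same_month, next_month, across_year)
```

With this definition a negative value can only mean a transaction dated before the user's registration month, which makes such rows easy to spot and inspect.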

Group Results By Month and Column

Is there any way I can group the results by month and by type? With the query below I managed to group by month, but the results are not grouped by "type".
query:
select date_trunc('month',processingdate),type,sum(itemcount), sum(itemamount)
from test
where date_trunc('month',processingdate) >= now() - interval '1 month'
group by type,date_trunc('month',processingdate)
order by processingdate
Data in table:
date_trunc type sum sum
1/1/20 11 5 1
1/2/20 12 3 2
1/3/20 11 2 3
1/4/20 12 3 5
Expected Results:
date_trunc type sum sum
1/1/2020 11 7 4
1/1/2020 12 6 7
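To pin down what the expected results imply, here is a minimal Python sketch that groups the sample rows by (month, type) and sums both columns. It assumes the sample dates are month/day/year (so all four rows fall in January 2020); the names `rows` and `group_by_month_and_type` are illustrative.

```python
from collections import defaultdict
from datetime import date

# Sample rows: (processingdate, type, itemcount, itemamount).
# Dates are read as month/day/year, which is an assumption.
rows = [
    (date(2020, 1, 1), 11, 5, 1),
    (date(2020, 1, 2), 12, 3, 2),
    (date(2020, 1, 3), 11, 2, 3),
    (date(2020, 1, 4), 12, 3, 5),
]

def group_by_month_and_type(rows):
    """Sum itemcount and itemamount per (first day of month, type)."""
    totals = defaultdict(lambda: [0, 0])
    for day, type_, count, amount in rows:
        key = (day.replace(day=1), type_)  # same role as date_trunc('month', ...)
        totals[key][0] += count
        totals[key][1] += amount
    return {key: tuple(sums) for key, sums in sorted(totals.items())}

grouped = group_by_month_and_type(rows)
for (month, type_), (count, amount) in grouped.items():
    print(month, type_, count, amount)
```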

PostgreSQL: first date cumulative score

I have this sample table:
id date score
11 1/1/2017 14:32 25.34
4 1/2/2017 12:14 34.34
25 1/2/2017 18:08 37.15
4 3/2/2017 23:42 47.24
4 4/2/2017 23:42 54.12
25 7/3/2017 22:07 65.21
11 9/3/2017 21:02 74.6
25 10/3/2017 5:15 11.3
4 10/3/2017 7:11 22.45
My aim is to calculate the first(!) date (YYYY-MM-DD) on which an id's cumulative score reaches 100 (>=). For that, I've written the following code:
SELECT date(date),id, score,
sum(score) over (partition by id order by date(date) rows unbounded preceding) as cumulative_score
FROM test_q1
GROUP BY id, date, score
Order by id, date
It returns:
date id score cumulative_score
1/1/2017 11 25.34 25.34
9/3/2017 11 74.6 99.94
1/2/2017 4 34.34 34.34
3/2/2017 4 47.24 81.58
4/2/2017 4 54.12 135.7
10/3/2017 4 22.45 158.15
1/2/2017 25 37.15 37.15
7/3/2017 25 65.21 102.36
10/3/2017 25 11.3 113.66
I tried to add either WHERE cumulative_score >= 100 or HAVING cumulative_score >= 100, but it returns:
ERROR: column "cumulative_score" does not exist
LINE 4: WHERE cumulative_score >= 100
^
SQL state: 42703
Character: 206
Does anyone know how to solve this?
Thanks
What I expect is:
date id score cumulative_score
4/2/2017 4 54.12 135.7
7/3/2017 25 65.21 102.36
The output should contain just id and date.
Try this:
with cumulative_sum AS (
-- running total of score per id, in date order
SELECT id,date,sum(score) over( partition by id order by date) as sum from test_q1
),
above_100_score_rank AS (
-- keep only rows at or above 100 and rank them by date, earliest first
SELECT *, rank() over (partition by id order by date) AS rank
FROM cumulative_sum where sum >= 100
)
SELECT * FROM above_100_score_rank WHERE rank= 1;
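The same logic can be sketched in plain Python to check it against the expected output. This assumes the sample dates are day/month/year (rewritten here as ISO strings so they sort chronologically); `rows` and `first_date_reaching` are illustrative names.

```python
# Sample rows: (id, timestamp, score), timestamps rewritten as ISO strings.
rows = [
    (11, "2017-01-01 14:32", 25.34),
    (4, "2017-02-01 12:14", 34.34),
    (25, "2017-02-01 18:08", 37.15),
    (4, "2017-02-03 23:42", 47.24),
    (4, "2017-02-04 23:42", 54.12),
    (25, "2017-03-07 22:07", 65.21),
    (11, "2017-03-09 21:02", 74.6),
    (25, "2017-03-10 05:15", 11.3),
    (4, "2017-03-10 07:11", 22.45),
]

def first_date_reaching(rows, threshold=100):
    """Map each id to the first date its running score total is >= threshold."""
    totals = {}
    first = {}
    for id_, stamp, score in sorted(rows, key=lambda r: r[1]):
        if id_ in first:
            continue  # this id already crossed the threshold earlier
        totals[id_] = totals.get(id_, 0) + score
        if totals[id_] >= threshold:
            first[id_] = stamp[:10]  # keep only YYYY-MM-DD
    return first

first_dates = first_date_reaching(rows)
print(first_dates)
```

Id 11 does not appear in the result because its total stays just under 100.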

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is known as the gaps-and-islands problem; specifically, it is an islands problem over a date range. You should,
Fix this question up.
Flag it to be sent to dba.stackexchange.com
To solve this,
Create a pseudo column with a window that has 1 if the row preceding it does not correspond to the preceding month
Create groups out of that with COUNT()
Check to make sure the count(*) for the group is greater than or equal to three.
Query,
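SELECT l.owneruserid, l.creationdaterange, count(*) AS months_in_a_row
FROM (
SELECT t.owneruserid,
t.postmonth,
-- running count of resets: every run of consecutive months gets its own number
count(range_reset) OVER (PARTITION BY t.owneruserid ORDER BY t.postmonth) AS creationdaterange
FROM (
SELECT owneruserid,
postmonth,
CASE
-- 1 starts a new run: the previous qualifying month is not the month just before this one
WHEN lag(postmonth) OVER (PARTITION BY owneruserid ORDER BY postmonth) = postmonth - interval '1 month'
THEN NULL
ELSE 1
END AS range_reset
FROM (
-- one row per user and month in which the user posted three times or more
SELECT owneruserid, date_trunc('month', creationdate) AS postmonth
FROM posts
GROUP BY owneruserid, date_trunc('month', creationdate)
HAVING count(id) >= 3
) AS m
) AS t
) AS l
GROUP BY l.owneruserid, l.creationdaterange
HAVING count(*) >= 3;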
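The island-detection steps can also be sketched in Python, which may help when checking the query's output. This is only an illustration: the input is assumed to be already reduced to (user, year, month) triples for months with three or more posts, and the sample data is hypothetical because the question's output shows months without years.

```python
from collections import defaultdict

def users_with_streak(qualifying, min_run=3):
    """Return the users who have at least min_run consecutive qualifying months."""
    by_user = defaultdict(set)
    for user, year, month in qualifying:
        # Number months on one axis so consecutive months differ by exactly 1,
        # including December followed by January.
        by_user[user].add(year * 12 + (month - 1))
    hits = set()
    for user, months in by_user.items():
        run, prev = 0, None
        for m in sorted(months):
            # Extend the current island, or start a new one after a gap.
            run = run + 1 if prev is not None and m == prev + 1 else 1
            prev = m
            if run >= min_run:
                hits.add(user)
                break
    return hits

# Hypothetical qualifying months (user, year, month).
sample = [
    (8, 2016, 12), (8, 2017, 1), (8, 2017, 2),  # three in a row across a year boundary
    (12, 2017, 4), (12, 2017, 5),               # only two in a row
    (22, 2017, 2), (22, 2017, 4),               # gap in March
]
streak_users = users_with_streak(sample)
print(streak_users)
```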