PostgreSQL grouped rolling average (for multiple labels) - postgresql

I could not find a clear example of this online.
I want a moving average for the last 2 days based on this data:
create table expenses as (
select 'food' as expense, 5.0 as cost, current_date as date
union select 'food', 5.0, current_date - 1
union select 'food', 4.0, current_date - 2
union select 'food', 4.0, current_date - 3
union select 'food', 3.0, current_date - 4
union select 'food', 3.0, current_date - 5
union select 'entertainment', 9.0, current_date
union select 'entertainment', 9.0, current_date - 1
union select 'entertainment', 8.0, current_date - 2
union select 'entertainment', 8.0, current_date - 3
union select 'entertainment', 7.0, current_date - 4
union select 'entertainment', 7.0, current_date - 5
)

Here is the solution I put together
select
expense,
date,
cost,
avg(cost) over
(partition by expense order by date rows 2 preceding) as rolling_avg_cost
from expenses
which gives the result:
expense date cost rolling_avg_cost
entertainment Thursday, March 23, 2017 12:00 AM 7 7
entertainment Friday, March 24, 2017 12:00 AM 7 7
entertainment Saturday, March 25, 2017 12:00 AM 8 7.3
entertainment Sunday, March 26, 2017 12:00 AM 8 7.6
entertainment Monday, March 27, 2017 12:00 AM 9 8.3
entertainment Tuesday, March 28, 2017 12:00 AM 9 8.6
food Thursday, March 23, 2017 12:00 AM 3 3
food Friday, March 24, 2017 12:00 AM 3 3
food Saturday, March 25, 2017 12:00 AM 4 3.3
food Sunday, March 26, 2017 12:00 AM 4 3.6
food Monday, March 27, 2017 12:00 AM 5 4.3
food Tuesday, March 28, 2017 12:00 AM 5 4.6
As can be seen, the window for the rolling average is 3 days inclusive of the current row (i.e. the current row plus the previous two, all divided by 3).
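Note that a ROWS frame counts rows, not days: if a date is missing for an expense, the frame simply reaches further back. On PostgreSQL 11 or later a RANGE frame can express the calendar window directly; a sketch against the same table:

-- average over the current day plus the two preceding calendar days,
-- however many rows actually fall in that span
select expense,
       date,
       cost,
       avg(cost) over (partition by expense
                       order by date
                       range between interval '2 days' preceding and current row) as rolling_avg_cost
from expenses;

With exactly one row per expense per day, as in the sample data, both frames give the same result.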

Related

How do I count discontinued dates in PowerBI?

I want to count the discontinued dates per ID, filtering on "FilterByValue" = 1.
What I mean by discontinued dates:
04.01.2021
05.01.2021
06.01.2021
08.01.2021
Here 07.01.2021 is missing, so the run of dates is broken; whenever a day between dates is missing, the dates are discontinued.
Dates also have to be distinct and within the last 90 days.
RowID is just for explanation purposes.
RowID  ID  FilterByValue  Date
1      1   1              Monday, 4. January 2021
2      1   1              Tuesday, 5. January 2021
3      1   1              Tuesday, 5. January 2021
4      1   1              Wednesday, 6. January 2021
5      1   1              Monday, 11. January 2021
6      1   99             Friday, 8. January 2021
7      2   1              Tuesday, 9. February 2021
8      2   1              Wednesday, 10. February 2021
9      2   1              Thursday, 11. March 2021
10     2   1              Friday, 12. March 2021
11     2   1              Monday, 15. March 2021
12     2   1              Tuesday, 16. March 2021
13     2   99             Sunday, 14. March 2021
14     2   1              Wednesday, 14. April 2021
What I want to achieve:
RowID  ID  CountDiscontinuedDates
1      1   2
2      2   4
What I tried (I think it is a bad / unhelpful approach):
discontinuesDates = COUNTAX(FILTER(TableName, [ID]=1 && TableName[Date] > (TODAY()-90) && OR (DATEADD( TableName[Date] = (TableName[Datum],1,DAY), DATEADD( TableName[Date] = (TableName[Datum],-1,DAY) ) && TableName[ID] = EARLIER(TableName[ID]) && TableName[Date] = TableName[Date] ), TableName[ID])
discontinuesDates = CALCULATE(COUNT(TableName[ID]), FILTER(TableName, TableName[FilterByValue]=34 && TableName[ID] = EARLIER( TableName[ID]) && DATEADD( TableName[Date],1,DAY) <> EARLIER( TableName[Date])) )
Maybe something like this:
Assuming that FilterByValue is available to use:
_Foo =
CALCULATE (
    DISTINCTCOUNT ( DiscontinuedDatesData[Date] ),
    FILTER (
        DiscontinuedDatesData,
        DiscontinuedDatesData[Date]
            >= CALCULATE (
                MIN ( DiscontinuedDatesData[Date] ),
                DiscontinuedDatesData,
                DiscontinuedDatesData[FilterByValue] = 99
            )
            && DiscontinuedDatesData[Date] >= TODAY () - 90
    )
)
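For comparison, the same rule is straightforward to express in SQL. A PostgreSQL sketch (table and column names are hypothetical) that counts every date whose preceding day is absent, including the first date of each run; against the sample data above this reading reproduces the expected output (2 for ID 1, 4 for ID 2):

with d as (
    -- distinct qualifying dates per id, limited to the last 90 days
    select distinct id, date
    from discontinued_dates_data
    where filter_by_value = 1
      and date > current_date - 90
),
gaps as (
    -- day_gap is null for the first date of an id, otherwise the distance
    -- in days to the previous date
    select id,
           date - lag(date) over (partition by id order by date) as day_gap
    from d
)
select id,
       count(*) filter (where day_gap is null or day_gap > 1) as count_discontinued_dates
from gaps
group by id;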

Group by year, calculate sum and percentage per year

I have a table with the columns datefield and area.
I want to calculate the sum of area per year and a percentage column:
year sum percentage
2022 5 12
2023 10 24
2024 6 15
[null] 20 49
(I have many more years in the table which I want to include)
WITH total as (
    select extract(YEAR from "datefield") as theyear, sum(area) as totalarea
    from thetable
    group by extract(YEAR from "datefield")
)
select total.theyear, total.totalarea,
       totalarea / (SUM(totalarea) OVER (PARTITION BY theyear)) * 100
from total
I get the correct sums, but all the percentages are 100. What am I doing wrong?
Some sample data:
2019 7.05
2020 4.77
2020 3.56
2021 1.64
2021 8.37
2021 3.51
2021 1.43
2021 9.94
2022 1.91
2022 5.3
I would like the result
2019 7.05 15
2020 8.33 18
2021 24.89 52
2022 7.21 15
Because the window is partitioned by theyear, each partition contains only that year's own row, so totalarea / SUM(totalarea) OVER (PARTITION BY theyear) is always 1. Use an unpartitioned window over the grouped result instead:
WITH total as (
    select extract(YEAR from "datefield") as theyear,
           sum(area) as totalarea,
           SUM(sum(area)) OVER () as sum_totalarea
    from thetable
    group by extract(YEAR from "datefield")
)
SELECT theyear, totalarea, 100.0 * totalarea / sum_totalarea AS percentage
FROM total
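The same result can be had without the CTE by nesting the regular aggregate inside the window function; a sketch using the question's table and column names:

select extract(YEAR from "datefield") as theyear,
       sum(area) as totalarea,
       100.0 * sum(area) / sum(sum(area)) over () as percentage
from thetable
group by extract(YEAR from "datefield");

Here sum(area) is the per-year total and sum(sum(area)) over () is the grand total across all years, evaluated after grouping.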

Group query results by month and year in postgresql with empty month sums

Based on this answer by Burak Arslan
SELECT date_trunc('month', txn_date) AS txn_month, sum(amount) as monthly_sum
FROM yourtable
GROUP BY txn_month
Is there a way to get months that have no results to show in the query?
So let's say I have :
id transDate Product Qty
1234 04/12/2019 ABCD 2
1245 04/05/2019 ABCD 1
1231 02/07/2019 ABCD 6
I also need the third month to be returned with a 0 value:
MonthYear totalQty
02/2019 6
03/2019 0
04/2019 3
Thanks,
---- UPDATE ----
Here is the final query, which gets the last 24 months from the current date, with year and month ready for any charts.
Thanks to @a_horse_with_no_name
SELECT
    -- only use the next line if you need to have the id in your result
    CASE WHEN t."ItemId" IS NULL THEN 10607 ELSE t."ItemId" END AS "ItemId",
    TO_CHAR(y."transactionDate", 'yyyy-mm-dd') AS txn_month,
    TO_CHAR(y."transactionDate", 'yyyy') AS "Year",
    TO_CHAR(y."transactionDate", 'Mon') AS "Month",
    -coalesce(SUM(t."transactionQty"), 0) AS "TotalSold"
FROM generate_series(
        TO_CHAR(CURRENT_DATE - INTERVAL '24 month', 'yyyy-mm-01')::date,
        TO_CHAR(CURRENT_DATE, 'yyyy-mm-01')::date,
        INTERVAL '1 month') AS y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
    ON date_trunc('month', t."transactionDate") = y."transactionDate"
    AND t."ItemTransactionTypeId" = 1
    AND t."ItemId" = 10607
GROUP BY txn_month, "Year", "Month", t."ItemId"
ORDER BY txn_month ASC;
EXAMPLE OUTPUT
ItemId txn_month Year Month TotalSold
10607 2018-03-01 2018 Mar 2
10607 2018-04-01 2018 Apr 0
10607 2018-05-01 2018 May 8
10607 2018-06-01 2018 Jun 12
10607 2018-07-01 2018 Jul 6
10607 2018-08-01 2018 Aug 4
10607 2018-09-01 2018 Sep 6
10607 2018-10-01 2018 Oct 8
10607 2018-11-01 2018 Nov 4
10607 2018-12-01 2018 Dec 0
10607 2019-01-01 2019 Jan 2
10607 2019-02-01 2019 Feb 3
10607 2019-03-01 2019 Mar 4
10607 2019-04-01 2019 Apr 1
10607 2019-05-01 2019 May 4
10607 2019-06-01 2019 Jun 3
10607 2019-07-01 2019 Jul 5
10607 2019-08-01 2019 Aug 6
10607 2019-09-01 2019 Sep 6
10607 2019-10-01 2019 Oct 6
10607 2019-11-01 2019 Nov 3
10607 2019-12-01 2019 Dec 0
10607 2020-01-01 2020 Jan 4
10607 2020-02-01 2020 Feb 2
10607 2020-03-01 2020 Mar 0
Left join to a list of months:
SELECT t.txn_month,
coalesce(sum(yt.amount),0) as monthly_sum
FROM generate_series(date '2019-02-01', date '2019-04-01', interval '1 month') as t(txn_month)
left join yourtable yt on date_trunc('month', yt.transdate) = t.txn_month
GROUP BY t.txn_month
Online example
In your actual query you need to move the conditions from the WHERE clause to the JOIN condition. Putting them into the WHERE clause turns the outer join back into an inner join:
SELECT t."ItemId",
       y."transactionDate" AS txn_month,
       -coalesce(SUM(t."transactionQty"), 0) AS "TotalSold"
FROM generate_series(date '2018-01-01', date '2020-04-01', INTERVAL '1 month') AS y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
    ON date_trunc('month', t."transactionDate") = y."transactionDate"
    AND t."ItemTransactionTypeId" = 1
    AND t."ItemId" = 10606
-- this WHERE clause isn't really needed because of the date values provided to generate_series()
WHERE y."transactionDate" >= NOW() - INTERVAL '2 year'
GROUP BY txn_month, t."ItemId"
ORDER BY txn_month DESC;
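A minimal, self-contained illustration of the pitfall (using VALUES lists in place of real tables): a month with no sales survives the LEFT JOIN as a row with NULLs, but a WHERE predicate on the joined table's columns rejects that row, silently turning the outer join into an inner join:

select m.txn_month, coalesce(s.qty, 0) as total
from (values (date '2019-02-01'), (date '2019-03-01')) as m(txn_month)
left join (values (date '2019-02-01', 6)) as s(sale_month, qty)
    on s.sale_month = m.txn_month
-- where s.qty > 0   -- uncommenting this drops the empty month 2019-03-01
;

Keeping such predicates in the ON clause preserves the months that have no matching rows.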

7 Day Return/Retention Rate

I've been trying to calculate 7 Day Return Rate (also known as Classic Retention Rate, as described here: https://www.braze.com/blog/calculate-retention-rate/) and then taking a 30 day average to reduce noise, in PostgreSQL.
However, I'm sure I'm doing something wrong. First of all, the numbers look way higher than I feel they intuitively should be (generally around 5% for the rest of the sector). Also, I believe the first 7 days should show 0, as theoretically users should take at least 7 days to count as a "return". However, I get around 40-70%, as shown below.
Would someone mind taking a look at the code below and seeing if there are any errors? 7 Day Return Rate is a really common metric for apps, and I haven't found any questions using postgresql that calculate it to this level of sophistication on Stack Exchange (or even the rest of the web), so I feel like a solid response could be very useful to a lot of people.
Sample data
Wednesday, August 1, 2018 12:00 AM 71.14
Thursday, August 2, 2018 12:00 AM 55.44
Friday, August 3, 2018 12:00 AM 50.09
Saturday, August 4, 2018 12:00 AM 45.81
Sunday, August 5, 2018 12:00 AM 43.27
Monday, August 6, 2018 12:00 AM 40.61
Tuesday, August 7, 2018 12:00 AM 39.38
Wednesday, August 8, 2018 12:00 AM 38.46
Thursday, August 9, 2018 12:00 AM 36.81
Friday, August 10, 2018 12:00 AM 35.94
with
user_first_event as (
    select distinct id, min(timestamp)::date as first_event_date
    from log
    where timestamp <= current_date
      and timestamp >= {{start_date}} and timestamp <= {{end_date}}
    group by id),
event as (
    select distinct id, timestamp::date as user_event_date
    from log
    where timestamp <= current_date and timestamp >= {{start_date}}),
gap as (
    select
        user_first_event.id,
        user_first_event.first_event_date,
        event.user_event_date,
        event.user_event_date - user_first_event.first_event_date as days_since_signup
    from user_first_event
    join event on user_first_event.id = event.id
    where user_first_event.first_event_date <= event.user_event_date),
conversion_rate as (
    select
        first_event_date,
        (sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
            count(distinct id)
        ) as seven_day_retention_rate
    from gap
    group by first_event_date
)
SELECT first_event_date,
       AVG(seven_day_retention_rate)
           OVER (ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate
The problem is a bit simpler than your query makes it seem; you can actually do it with just one subquery and one outer query, as follows:
select first_event_date
     , avg(seven_day_return) as seven_day_return_day_only
     , avg( avg(seven_day_return) ) OVER (ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW) AS thirty_day_rolling_retention
from (
    -- inner query to get value for user: 1 if they retain and 0 if they do not
    select min(timestamp)::date as first_event_date
         , case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
    from log
    group by id ) t
group by t.first_event_date;
Note that this weights each day equally rather than each user equally across days. If you want to weight the average by user across days then you can update the outer calculation using more aggregates and windows to compute the value with weightings.
Reference: http://sqlfiddle.com/#!17/ee17e/1/0
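For the user-weighted variant mentioned above, one option is to roll up retained users and cohort sizes separately, so that large cohorts count proportionally more. A sketch built on the same inner query; this weighting scheme is an assumption, not part of the original answer (the named WINDOW clause just keeps the two running sums on the same frame):

select first_event_date
     , avg(seven_day_return) as seven_day_return_day_only
     , 100.0 * sum(sum(seven_day_return)) over w   -- retained users across the last 30 cohort days
             / sum(count(*)) over w                -- total users across the last 30 cohort days
       as thirty_day_user_weighted_retention
from (
    select min(timestamp)::date as first_event_date
         , case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
    from log
    group by id ) t
group by first_event_date
window w as (order by first_event_date rows between 29 preceding and current row);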
If you don't have access to array_agg (but have access to window functions) you can use:
select first_event_date
     , avg(seven_day_return) as day_seven_day_return
     , avg( avg(seven_day_return) ) OVER (ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW) AS thirty_day_rolling_retention
from (
    -- inner query to get value for user
    select min(timestamp)::date as first_event_date
         , case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
    from log
    group by id ) t
group by t.first_event_date;

Creating a table of the last 12 months in PostgreSQL

I'm trying to create a table that looks like this (a table of the last 12 months)
month, year
10, 2016
9, 2016
8, 2016
7, 2016
6, 2016
5, 2016
4, 2016
3, 2016
2, 2016
1, 2016
12, 2015
11, 2015
The code I have looks something like this:
select date_part('month', current_date) as order_month,
       date_part('year', current_date) as order_year
union all
select date_part('month', current_date - interval '1 month') as order_month,
       date_part('year', current_date - interval '1 month') as order_year
union all
...
Is there a more concise way of writing this, rather than using 11 unions?
generate_series(start, stop, step) is useful here:
SELECT
    EXTRACT('month' FROM d) AS month,
    EXTRACT('year' FROM d) AS year
FROM
    GENERATE_SERIES(
        now(),
        now() - interval '11 months',  -- the current month plus the 11 before it = 12 rows
        interval '-1 month'
    ) AS d
;
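Since the question asks for an actual table rather than just a result set, the same query can feed CREATE TABLE ... AS; a sketch, with the table name last_12_months assumed:

create table last_12_months as
select extract('month' from d)::int as month,
       extract('year' from d)::int as year
from generate_series(
         now(),
         now() - interval '11 months',
         interval '-1 month'
     ) as d;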