Generating series Postgres - postgresql

I want to be able to generate groups of rows by day, week, or month, depending on the interval I set.
Following this solution, it works when the granularity is one month. But with an interval of 1 week, no records are returned.
These are the rows in my table
This is the current query I have for the per-month interval, which works perfectly.
SELECT *
FROM (
SELECT day::date
FROM generate_series(timestamp '2018-09-01'
, timestamp '2018-12-01'
, interval '1 month') day
) d
LEFT JOIN (
SELECT date_trunc('month', created_date)::date AS day
, SUM(escrow_amount) AS profit, sum(total_amount) as revenue
FROM (
select distinct on (order_id) order_id, escrow_amount, total_amount, create_time from order_item
WHERE created_date >= date '2018-09-01'
AND created_date <= date '2018-12-01'
-- AND ... more conditions
) t2 GROUP BY 1
) t USING (day)
ORDER BY day;
Result from this query
And this is the per week interval query. I will reduce the range to two months for brevity.
SELECT *
FROM (
SELECT day::date
FROM generate_series(timestamp '2018-09-01'
, timestamp '2018-11-01'
, interval '1 week') day
) d
LEFT JOIN (
SELECT date_trunc('week', created_date)::date AS day
, SUM(escrow_amount) AS profit, sum(total_amount) as revenue
FROM (
select distinct on (order_id) order_id, escrow_amount, total_amount, create_time from order_item
WHERE created_date >= date '2018-09-01'
AND created_date <= date '2018-11-01'
-- AND ... more conditions
) t2 GROUP BY 1
) t USING (day)
ORDER BY day;
Take note that I have records from October, but the result here doesn't show anything for October dates.
Any idea what I am missing here?

The dates produced by generate_series in your first subquery are not truncated to the beginning of the week, while the dates in the joined subquery are.
date_trunc('week', '2018-09-01'::date)::date
is equal to
'2018-08-27'::date
so your join using day finds no matching rows:
'2018-09-01'::date <> '2018-08-27'::date
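The mismatch can be checked directly. The following is a minimal sketch that only uses the dates from the question: it lists the first few values of the weekly series next to the Monday that date_trunc('week', ...) produces for each of them.

```sql
-- Each series value is a Saturday, but date_trunc('week', ...) returns the
-- Monday of that week, so the two columns never agree.
SELECT day::date                              AS series_day,
       date_trunc('week', day)::date          AS week_start,
       day::date = date_trunc('week', day)::date AS matches
FROM   generate_series(timestamp '2018-09-01'
                     , timestamp '2018-09-22'
                     , interval '1 week') AS day;
```

Every row reports matches = false, which is why the LEFT JOIN ... USING (day) returns only NULLs for the weekly query.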
Your query should look more like this:
SELECT *
FROM (
   SELECT day::date
   FROM generate_series(date_trunc('week', timestamp '2018-09-01')  -- start the series on a week boundary
                      , timestamp '2018-11-01'
                      , interval '1 week') day
   ) d
LEFT JOIN (
   SELECT date_trunc('week', created_date::date)::date AS day
        , SUM(escrow_amount) AS profit, SUM(total_amount) AS revenue
   FROM (
      -- select created_date (not create_time) so the outer query can reference it
      SELECT DISTINCT ON (order_id) order_id, escrow_amount, total_amount, created_date
      FROM order_item
      WHERE created_date::date >= date '2018-09-01'
      AND created_date::date <= date '2018-11-01'
      -- AND ... more conditions
      ) t2
   GROUP BY 1
   ) t USING (day)
WHERE day >= '2018-09-01'  -- skip the days between the start of the truncated week and the original series start
ORDER BY day;

Related

Calculations inside window function in PostgreSQL

I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is
date_purchase
number_of_customers
2022-12-19
200
2022-12-18
194
(...)
Please note this calculates, for any given date, the number of customers with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I haven't managed to calculate the correct number for each date.
A window function doesn't seem relevant here, because the result for one date does not depend on the other rows of the same window. A simple query or a self-join should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table :
1. Query for a given date reference_date :
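-- reference_date is a placeholder for the date of interest (e.g. a query parameter)
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date, client_id
       FROM my_table
       WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
       GROUP BY client_id
       HAVING count(*) >= 2
     ) AS a
GROUP BY a.reference_date  -- needed because count(*) is selected next to a plain column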
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days' :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase values in my_table :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
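To see how query 3 behaves, here is a small self-contained sketch. The four rows in the my_table CTE are hypothetical sample data, not taken from the question; the rest is the query above unchanged.

```sql
-- Hypothetical sample data standing in for the real my_table.
WITH my_table(client_id, date_purchase) AS (
    VALUES (1, date '2022-12-01'),
           (1, date '2022-12-10'),
           (2, date '2022-08-01'),
           (2, date '2022-12-10')
)
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM (
    SELECT ref.date, t.client_id
    FROM my_table AS t
    INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
        ON t.date_purchase <= ref.date
       AND t.date_purchase >= ref.date - INTERVAL '90 days'
    GROUP BY ref.date, t.client_id
    HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date;
-- Returns a single row: 2022-12-10 | 1
```

Only client 1 has two purchases within 90 days of 2022-12-10. Note that dates on which no customer qualifies (2022-08-01 and 2022-12-01 here) do not appear in the output at all, because the HAVING filter removes them before the outer count.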

Dynamic value passing in Postgres

Here is a complex query to which I need to pass some dates dynamically. As of now I have hardcoded these two dates: '2021-08-01' AND '2022-07-31'.
But I have to pass these dates dynamically, so that for another month, say 2022-06, the dates passed will be '2021-07-01' and '2022-06-30'; basically the trailing 12 months of data.
If we take 2022-05, then the dates passed should be '2021-06-01' and '2022-05-31'.
How can we achieve this? Any suggestions or help will be much appreciated.
Below is the query for reference:
WITH base as
(
SELECT created_at as period ,order_number, TRIM(email) as email ,is_first_order
FROM orders
WHERE created_at::DATE BETWEEN '2021-08-01' AND '2022-07-31'
)
,base_agg as
(
select TO_CHAR(period,'YYYY-MM') as period
,COUNT(DISTINCT email)FILTER(WHERE is_first_order IS TRUE) as new_users
,COUNT(DISTINCT order_number)FILTER(WHERE is_first_order IS FALSE) as returning_orders
FROM base
GROUP BY 1
)
,base_cumulative as
(
SELECT ROW_NUMBER() OVER(ORDER BY PERIOD DESC ) as rno
,period
,new_users
,returning_orders
,sum("new_users")over (order by "period" asc rows between unbounded preceding and current row) as "cumulative_total"
from base_agg
)
SELECT
(SELECT period FROM base_cumulative WHERE rno=1) period
,(SELECT cumulative_total FROM base_cumulative WHERE rno=1) as cumulated_customers
,SUM(returning_orders) as returning_orders
,SUM(returning_orders)/NULLIF((SELECT cumulative_total FROM base_cumulative WHERE rno=1),0) as rate
FROM base_cumulative
You can calculate the end of the current month from NOW() with some date arithmetic; the same approach applies to the rest of the calculation.
select date_trunc('month', now())::date + interval '1 month - 1 day' end_of_this_month,
date_trunc('month', now())::date + interval '1 month - 1 day'::interval - '1 year'::interval + '1 day'::interval first_day_of_prev_year_month
;
Result
end_of_this_month | first_day_of_prev_year_month
---------------------+------------------------------
2022-08-31 00:00:00 | 2021-09-01 00:00:00
(1 row)
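The same arithmetic can be applied to an arbitrary anchor date instead of now(). The sketch below is an illustration under assumptions: the anchor dates in the VALUES list and the column names period_start and period_end are made up for the example, chosen so that the output can be compared with the ranges listed in the question.

```sql
-- For each anchor date d: the last day of d's month, and the first day of
-- the month eleven months earlier (a trailing 12-month window).
SELECT d AS anchor,
       (date_trunc('month', d::timestamp) - interval '11 months')::date AS period_start,
       (date_trunc('month', d::timestamp) + interval '1 month' - interval '1 day')::date AS period_end
FROM  (VALUES (date '2022-07-15'),
              (date '2022-06-10'),
              (date '2022-05-01')) AS v(d);
-- 2022-07-15 | 2021-08-01 | 2022-07-31
-- 2022-06-10 | 2021-07-01 | 2022-06-30
-- 2022-05-01 | 2021-06-01 | 2022-05-31
```

In the original query the two expressions could replace the hardcoded literals in the BETWEEN clause, with d supplied as a query parameter or as now().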

Make date_trunc() start on Sunday instead of Monday

Select date_trunc('week',dateTime) Date_week, Max(Ranking) Runing_Total_ID
from (select datetime, id , dense_rank () over (order by datetime) as Ranking
from Table1)
group by 1
This query is working for me to give me the running total of total IDs by week. But the week starts on Monday in Postgres by default. Is there any way to change the week start to SUNDAY?
Shift the timestamp back and forth:
Add a day before feeding the timestamp to date_trunc(), then subtract again:
SELECT date_trunc('week', datetime + interval '1 day') - interval '1 day' AS date_week
, max(ranking) AS runing_total_id
FROM (
SELECT datetime, dense_rank() OVER (ORDER BY datetime) AS ranking
FROM table1
) sub
GROUP BY 1;
See:
PostgreSQL custom week number - first week containing Feb 1st
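As a quick check of the shift, this sketch compares the default Monday-based truncation with the shifted version for a Saturday, a Sunday, and a Monday. The three dates are arbitrary examples.

```sql
SELECT d::date                                                          AS day,
       to_char(d, 'Dy')                                                 AS weekday,
       date_trunc('week', d)::date                                      AS monday_start,
       (date_trunc('week', d + interval '1 day') - interval '1 day')::date AS sunday_start
FROM   generate_series(timestamp '2018-09-01'
                     , timestamp '2018-09-03'
                     , interval '1 day') AS d;
-- 2018-09-01 | Sat | 2018-08-27 | 2018-08-26
-- 2018-09-02 | Sun | 2018-08-27 | 2018-09-02
-- 2018-09-03 | Mon | 2018-09-03 | 2018-09-02
```

With the shift, Sunday 2018-09-02 opens a new week and the following Monday falls into that same week.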

Getting fortnight from timestamp in Postgres

I'm doing some cohort analysis and want to see for a group of customers in November, how many transact weekly, fortnightly, and monthly; and for how long
I have this for the week and month (weekly example):
WITH weekly_users AS (
SELECT user_fk
, DATE_TRUNC('week',created_at) AS week
, (DATE_PART('year', created_at) - 2016) * 52 + DATE_PART('week', created_at) - 45 AS weeks_between
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY user_fk, week, weeks_between
),
t2 AS (
SELECT weekly_users.*
, COUNT(*) OVER (PARTITION BY user_fk
ORDER BY week ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING) AS prev_rec_cnt
FROM weekly_users
)
SELECT week
, COUNT(*)
FROM t2
WHERE weeks_between = prev_rec_cnt
GROUP BY week
ORDER BY week;
But a week is too short an interval, and a month too long. So I want fortnights. Has anyone done this before? From Googling, it seems like a challenge.
Thanks in advance
Just worked it out, this is how you'd do it:
WITH fortnightly_users AS (
SELECT user_fk
, EXTRACT(YEAR FROM created_at) * 100 + CEIL(EXTRACT(WEEK FROM created_at)/2) AS fortnight
, (EXTRACT(YEAR FROM created_at) - 2016) * 26 + CEIL(EXTRACT(WEEK FROM created_at)/2) - 23 AS fortnights_between
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY user_fk, fortnight, fortnights_between
),
t2 AS (
SELECT fortnightly_users.*
, COUNT(*) OVER (PARTITION BY user_fk
ORDER BY fortnight ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING) AS prev_rec_cnt
FROM fortnightly_users
)
SELECT fortnight
, COUNT(*)
FROM t2
WHERE fortnights_between = prev_rec_cnt
GROUP BY fortnight
ORDER BY fortnight;
So you take the week number and divide it by 2, rounding up so that each pair of weeks maps to a whole fortnight number.
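The mapping from week number to fortnight number can be inspected on its own. This is a minimal sketch with three arbitrary example dates (consecutive Mondays in November 2016); the column aliases are made up for the illustration.

```sql
SELECT d::date                        AS day,
       EXTRACT(WEEK FROM d)           AS iso_week,
       CEIL(EXTRACT(WEEK FROM d) / 2) AS fortnight_no
FROM  (VALUES (timestamp '2016-11-07'),
              (timestamp '2016-11-14'),
              (timestamp '2016-11-21')) AS v(d);
```

Two consecutive weeks such as 45 and 46 both give fortnight 23, and week 47 starts fortnight 24. One caveat to keep in mind: EXTRACT(WEEK ...) returns the ISO week, which near the turn of the year can belong to a different year than EXTRACT(YEAR ...) reports, so the combined fortnight key may need extra care around January 1st.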

Count records grouped by day that counted by interval

Here is the query
WITH dates AS (
SELECT current_date - serie AS date
FROM generate_series(0, 365, 1) AS serie
), items AS (
SELECT *
FROM items
WHERE created_at BETWEEN now() - interval '6 months' AND now()
)
SELECT dates.date, count(items)
FROM dates
LEFT OUTER JOIN items ON items.created_at::date = dates.date
GROUP BY dates.date
Everything works fine except one thing: I need to somehow replace now() with the day of the current row.
So for each day, the items count should be calculated with conditions based on that day.
I just can't reference it.
Is there any solution for this?
Something like this?
WITH dates AS (
SELECT current_date - serie AS date
FROM generate_series(0, 365, 1) AS serie
)
SELECT dates.date, count(items)
FROM dates
LEFT OUTER JOIN items ON created_at BETWEEN dates.date- interval '6 months' AND dates.date
GROUP BY dates.date;
I came to the following solution, which gives the same result as the one Vao Tsun proposed:
WITH dates AS (
SELECT current_date - serie AS date
FROM generate_series(0, 365, 1) AS serie
), date_intervals AS (
SELECT
(dates.date - INTERVAL '6 months') AS start_date,
dates.date AS end_date
FROM dates
)
SELECT date_intervals.end_date, count(items)
FROM date_intervals
LEFT OUTER JOIN items ON items.created_at BETWEEN date_intervals.start_date AND date_intervals.end_date
GROUP BY 1
ORDER BY 1
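One detail worth checking with both queries above: dates.date is a date, so as the upper bound of BETWEEN it is compared as midnight at the start of that day, and items created later on the same day are not counted for it. The sketch below illustrates this with one hypothetical item and three hypothetical days; the half-open variant is an alternative to consider, not part of the answers above.

```sql
WITH items(created_at) AS (
    VALUES (timestamp '2024-03-05 12:00')          -- hypothetical item
), dates(date) AS (
    VALUES (date '2024-03-04'), (date '2024-03-05'), (date '2024-03-06')
)
SELECT dates.date,
       -- condition as used in the answers: upper bound is midnight of dates.date
       count(*) FILTER (WHERE items.created_at BETWEEN dates.date - interval '6 months'
                                                   AND dates.date)     AS between_count,
       -- half-open range that includes the whole of dates.date
       count(*) FILTER (WHERE items.created_at >= dates.date - interval '6 months'
                          AND items.created_at <  dates.date + 1)      AS half_open_count
FROM   dates
CROSS  JOIN items
GROUP  BY dates.date
ORDER  BY dates.date;
-- 2024-03-04 | 0 | 0
-- 2024-03-05 | 0 | 1
-- 2024-03-06 | 1 | 1
```

Whether the item should count for 2024-03-05 depends on the intended meaning of "for each day", so pick the bound that matches it.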