Best way to join on a date range in PostgreSQL

This Periscope Data blog post describes how to use the SQL query below to find the number of users who visited your website this month and also visited last month, i.e. "retention":
with monthly_activity as (
select distinct
date_trunc('month', created_at) as month,
user_id
from events
)
select
this_month.month,
count(distinct this_month.user_id) -- qualified: user_id exists in both aliases
from monthly_activity this_month
join monthly_activity last_month
on this_month.user_id = last_month.user_id
and this_month.month = last_month.month + interval '1 month' -- date join
group by this_month.month
But what if retention is defined as not "visited last month" but "visited in any month in the past"?
In that case, we would want the date join (highlighted as a comment in the above query) for not just 1 month in the past but all months in the past.
I.e. something like this, but without hardcoding:
and this_month.month = last_month.month + interval '1 month'
and this_month.month = last_month.month + interval '2 month'
and this_month.month = last_month.month + interval '3 month'
...
I'm stumped on what's the best way to do this.
I've tried the following workarounds, although I think they lead to weird cartesian products since my queries aren't running:
AND current_month.month > past_logs.month -- doesn't run
and
AND past_logs.month IN (select i::date from generate_series('2014-10-01',
current_month.month, '1 month'::interval) i) -- doesn't run
Thoughts on best way to do this?
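One possible approach, sketched under the assumption that "visited in any month in the past" means the user's first active month is earlier than the month being counted: compute each user's first month once and compare against it, instead of self-joining on every earlier month. The first_seen CTE and the returning_users alias are names introduced here for illustration; events, created_at and user_id come from the query above.

```sql
with monthly_activity as (
    -- one row per (month, user), as in the original query
    select distinct
        date_trunc('month', created_at) as month,
        user_id
    from events
),
first_seen as (
    -- hypothetical helper: the first month in which each user was active
    select user_id, min(month) as first_month
    from monthly_activity
    group by user_id
)
select
    m.month,
    count(*) as returning_users  -- monthly_activity is already distinct per user
from monthly_activity m
join first_seen f on f.user_id = m.user_id
where f.first_month < m.month    -- active in at least one earlier month
group by m.month
order by m.month;
```

Because each user joins to exactly one first_seen row, this avoids the row explosion that a plain this_month.month > last_month.month self-join produces.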

Related

Calculations inside window function in PostgreSQL

I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is
date_purchase number_of_customers
2022-12-19 200
2022-12-18 194
(...)
Please note this calculates, for any given date, the number of customers with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I can't correctly calculate the number for each date.
A window function doesn't seem to be relevant here because there is no relationship between the rows of the same window. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table :
1. Query for a given date reference_date :
-- reference_date is a placeholder for a date literal or query parameter
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date AS reference_date, client_id
FROM my_table
WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
GROUP BY client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.reference_date
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days' :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase values in my_table :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
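To see how the third query behaves, here is a small self-contained run of it. The four sample rows in the VALUES list are made up for illustration and stand in for my_table.

```sql
with my_table (client_id, date_purchase) as (
    -- hypothetical sample data
    values
        (1, date '2022-12-01'),
        (1, date '2022-12-10'),
        (2, date '2022-08-01'),
        (2, date '2022-12-10')
)
select a.date as date_purchase, count(*) as number_of_customers
from (
    select ref.date, t.client_id
    from my_table as t
    inner join (select distinct date_purchase as date from my_table) as ref
        on t.date_purchase <= ref.date
       and t.date_purchase >= ref.date - interval '90 days'
    group by ref.date, t.client_id
    having count(*) >= 2   -- keep customers with 2+ purchases in the window
) as a
group by a.date
order by a.date;
-- returns a single row: 2022-12-10 | 1
```

Only client 1 has two purchases within 90 days of 2022-12-10. Dates on which no customer qualifies (2022-08-01 and 2022-12-01 here) do not appear in the output at all, so a left join from the list of dates would be needed if zero counts should be shown.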

Fetch records of current month using PostgreSQL query

Suppose I have following data in a table
id createdAt
1 2021-02-26T06:29:03.482Z
2 2021-02-27T06:29:03.482Z
3 2021-03-14T06:29:03.482Z
4 2021-03-17T06:29:03.482Z
I want the data for the current month, i.e. if I generate the report in March, I need only the March rows from the table.
The desired output is
id createdAt
3 2021-03-14T06:29:03.482Z
4 2021-03-17T06:29:03.482Z
Anyone please help. Thank you.
You can use date_trunc():
select *
from the_table
where date_trunc('month', createdat) = date_trunc('month', current_timestamp);
date_trunc('month', ...) returns the first day of the month.
However, the above is not able to make use of an index on createdat. To improve performance, use a range query:
select *
from the_table
where createdat >= date_trunc('month', current_timestamp)
and createdat < date_trunc('month', current_timestamp) + interval '1 month'
The expression date_trunc('month', current_timestamp) + interval '1 month' returns the start of the next month (that's why this is compared with <).
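As a quick check of the two bounds, the same expressions can be evaluated for a fixed timestamp instead of current_timestamp. The literal below is taken from the sample data and is only an illustration.

```sql
select
    date_trunc('month', timestamp '2021-03-17 06:29:03')                      as month_start,
    date_trunc('month', timestamp '2021-03-17 06:29:03') + interval '1 month' as next_month_start;
-- month_start:      2021-03-01 00:00:00
-- next_month_start: 2021-04-01 00:00:00
```

The range is half-open: the lower bound is included and the upper bound is excluded, so a row stamped exactly at midnight on the first of the next month is not counted twice.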
You can also compare the month and year of the date with the current ones. However, a plain index on the column will not be used for this; you would need a separate index on the year and month expressions.
select *
from your_table
where extract(YEAR FROM createdAt) = extract(YEAR FROM now())
and extract(MONTH FROM createdAt) = extract(MONTH FROM now())

How to find Last Week entries and This Week entries from postgres tables

I want to find last week's entries in a Postgres table, with the week running from Monday to Sunday (both inclusive). For example, if I query the data today, i.e. on 2020/07/26 (or on any date between 2020/07/20 and 2020/07/26), I should get the data from 2020/07/13 to 2020/07/19.
Query:
Select user, date_sent
from users
where date_sent between (SELECT current_date - cast(extract(dow from current_date) as int) - 6)
and (SELECT current_date - cast(extract(dow from current_date) as int) + 1)
Similarly, I want to find this week's entries, with the week starting on Monday and ending on the present date. For example, if I query the data today, i.e. on 2020/07/26, I should get the data from 2020/07/20 to 2020/07/26. If I query on 2020/07/24, then I should get 2020/07/20 to 2020/07/24.
Query:
select user, date_sent
from users
where date_sent >= date_trunc('week', current_date)
and date_sent <= date_trunc('day',current_date+1)
You are almost there.
For "this week":
select user, date_sent
from users
where date_sent >= date_trunc('week', current_date)
and date_sent < date_trunc('week', current_date) + interval '1 week';
For last week it's quite similar:
select user, date_sent
from users
where date_sent >= date_trunc('week', current_date) - interval '1 week'
and date_sent < date_trunc('week', current_date)
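To check the boundaries against the dates in the question, the expressions can be evaluated with a fixed date in place of current_date. The literal 2020-07-26 (a Sunday) is the example date from the question.

```sql
select
    date_trunc('week', timestamp '2020-07-26')                     as this_week_start,
    date_trunc('week', timestamp '2020-07-26') - interval '1 week' as last_week_start;
-- this_week_start: 2020-07-20 00:00:00
-- last_week_start: 2020-07-13 00:00:00
```

date_trunc('week', ...) always returns the Monday that starts the week, so the "last week" range covers 2020-07-13 up to, but not including, 2020-07-20.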
Your desired results are inconsistent. In your description, before your initial query you state:
if I query the data today i.e on 2020/07/26 (or say if i query data on
any date between 2020/07/20 to 2020/07/26) i should get the data from
2020/07/13 to 2020/07/19
But after that query you state:
If I query the data today i.e on 2020/07/26 I should get the data from
2020/07/20 to 2020/07/26.
You cannot have both.
Assuming the latter to be correct and assuming the ISO-8601 week definition, your request can be rephrased as:
Given a specified date, if that date falls in the same week as the
current date then return the dates from the start of the week to the
specified date, inclusive. If the specified date does not fall in the
current week, return the dates from the Monday on or prior
to the specified date through Sunday on or after the specified date, inclusive.
The following implements that.
with targets (for_week_containing_date
,from_week_start
,iso_from_week
,iso_this_week) as
( select &for_week_containing_date
, date_trunc('week', &for_week_containing_date)
, extract(week from &for_week_containing_date)
, extract(week from now())
)
select user, date_sent
from user_days
cross join targets
where 1=1
and date_sent >= from_week_start
and date_sent <= case when iso_from_week = iso_this_week
then for_week_containing_date
else from_week_start + interval '6 days'
end
;
Since I do not care much for substitution variables, this would need bound variables from a script, or you could wrap it in an SQL function. See example of that here. Also note the last 2 queries: make sure you understand and are OK with what happens around year end. You may need to make end-of-year/beginning-of-year adjustments. That behaviour does not come from wrapping the query in a function; it results from the ISO-8601 definitions. End-of-year/beginning-of-year checking is needed any time you deal with date ranges.
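A small illustration of the year-end effect mentioned above, using a date chosen here as an example: under ISO-8601, 2021-01-01 still belongs to the last week of 2020.

```sql
select
    extract(week    from date '2021-01-01') as iso_week,       -- 53
    extract(isoyear from date '2021-01-01') as iso_year,       -- 2020
    extract(year    from date '2021-01-01') as calendar_year;  -- 2021
```

Comparing extract(week ...) values alone can therefore match dates from different years; comparing date_trunc('week', ...) results, or the isoyear together with the week, avoids that.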

Compare day in current month to same day previous month PostgreSQL

I'm trying to compare values of current month's data to previous months using PostgreSQL. So if today is 4/23/2018, I want the data for 3/23/2018.
I've tried current_date - interval '1 month' but it is problematic for months with 31 days.
My table is structured as simply as
date, value
Check this example query:
WITH dates AS (SELECT date::date FROM generate_series('2018-01-01'::date, '2018-12-31'::date, INTERVAL '1 day') AS date)
SELECT
start_dates.date AS start_date,
end_dates.date AS end_date
FROM
dates AS start_dates
RIGHT JOIN dates AS end_dates
ON ( start_dates.date + interval '1 month' = end_dates.date AND
end_dates.date - interval '1 month' = start_dates.date);
It will output all end_dates and corresponding start_dates. The corresponding dates are defined by interval '1 month' and checked in both ways:
start_dates.date + interval '1 month' = end_dates.date AND
end_dates.date - interval '1 month' = start_dates.date
The output looks like this:
....
2018-02-26 2018-03-26
2018-02-27 2018-03-27
2018-02-28 2018-03-28
2018-03-29
2018-03-30
2018-03-31
2018-03-01 2018-04-01
2018-03-02 2018-04-02
2018-03-03 2018-04-03
2018-03-04 2018-04-04
....
Note that there are 'gaps' for days without corresponding dates.
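The gaps come from how PostgreSQL clamps month arithmetic to the last day of a shorter month, which makes the two-way check fail. A minimal illustration with one of the dates from the output above:

```sql
select
    (date '2018-03-29' - interval '1 month')::date as minus_one_month,  -- 2018-02-28
    (date '2018-02-28' + interval '1 month')::date as plus_one_month;   -- 2018-03-28
```

Subtracting a month from 2018-03-29 lands on 2018-02-28, but adding a month to 2018-02-28 gives 2018-03-28, not 2018-03-29, so the pair is rejected by the join condition.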
Back to your table: join the table with itself (using aliases) and apply the given join condition, so the query would look like this:
SELECT
start_dates.value - end_dates.value AS change,
start_dates.date AS start_date,
end_dates.date AS end_date
FROM
_your_table_name_ AS start_dates
RIGHT JOIN _your_table_name_ AS end_dates
ON ( start_dates.date + interval '1 month' = end_dates.date AND
end_dates.date - interval '1 month' = start_dates.date);
Given the following table structure:
create table t (
d date,
v int
);
After populating with some dates and values, there is a way to find the value of the previous month using simple calculations and the LAG function, without resorting to joins. Note that LAG counts rows, not days, so this assumes exactly one row per calendar date with no gaps. I am not sure how it compares from a performance perspective, so please run your own tests before selecting which solution to use.
select
*,
lag(v, day_of_month) over (order by d) as v_end_of_last_month,
lag(v, last_day_of_previous_month + day_of_month - cast(extract(day from d - interval '1 month') as int)) over (order by d) as v_same_day_last_month
from (
select
*,
lag(day_of_month, day_of_month) over (order by d) as last_day_of_previous_month
from (
select
*,
cast(extract(day from d) as int) as day_of_month
from
t
) t_dom
) t_dom_ldopm;
You may note that between the 29th and 31st of March, the comparison will be made against the 28th of February, since the same day does not exist in February for those particular dates. The same logic applies to other months with different number of days.

Compare current data with data from a year ago, the same date (Postgres)

I am trying to return the data of the orders of the current date and the orders of the same date a year ago.
My idea was to create two similar tables and join them on the date by adding WHERE clauses. But it does not seem to work.
Could you have a look at my code and see if you can identify something wrong?
The result I get is completely empty.
Thanks a lot!
WITH
orders_channels AS
(SELECT
'BLH' AS brand,
date_trunc('week', date)::date AS date,
channel,
order_type,
case when (date_trunc('week', date)::date = current_date - interval '1 day') then 'current'
else 'previous' end
as week_type,
sum(orders) AS orders
FROM
de_data.orders_daily_channel_attribution_dashboard
WHERE
date > date_trunc('day', current_date) - interval '1 day'
GROUP BY 1,2,3,4),
wow_orders_channels AS
(SELECT
'BLH' AS brand,
date_trunc('week', date)::date AS date,
channel,
order_type,
case when (date_trunc('week', date)::date = current_date - interval '1 day') then 'current'
else 'previous' end
as week_type,
sum(orders) AS orders
FROM
de_data.orders_daily_channel_attribution_dashboard
WHERE
date >= date_trunc('week', current_date) - INTERVAL '1 year'
GROUP BY 1,2,3,4)
SELECT
*
FROM
(SELECT
o.brand,
date_trunc('week', o.date)::date as week,
'SEO_ACQ' AS name,
o.orders,
wow.orders as wow_orders
FROM
orders_channels o
join wow_orders_channels wow on wow.date = o.date - interval '1 year' and o.order_type = wow.order_type
where
o.channel = 'SEO'
AND o.order_type = 'ACQUISITION'
UNION ALL
SELECT
o.brand,
date_trunc('week', o.date) as week,
'CRM_ORDERS' AS name,
SUM(o.orders),
sum(wow.orders) as wow_orders
FROM
orders_channels o
join wow_orders_channels wow on wow.date = o.date - interval '1 year' and o.order_type = wow.order_type
WHERE
o.channel = 'CRM'
GROUP BY 1,2,3) x
ORDER BY 3,2
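One thing that may contribute to the empty result (an observation about the query text only, since the table contents are not shown): both CTEs truncate date to the start of the week, which is always a Monday, and the join then compares wow.date with o.date - interval '1 year'. A date exactly one year before a Monday is 365 or 366 days earlier, so it is never itself a Monday and the join condition cannot match. The literal date below is an arbitrary Monday picked for illustration.

```sql
select
    date_trunc('week', date '2023-06-05')::date                     as week_start,            -- 2023-06-05 (Monday)
    (date '2023-06-05' - interval '1 year')::date                   as one_year_back,         -- 2022-06-05 (Sunday)
    date_trunc('week', date '2023-06-05' - interval '1 year')::date as week_start_year_ago,   -- 2022-05-30 (Monday)
    (date '2023-06-05' - interval '52 weeks')::date                 as fifty_two_weeks_back;  -- 2022-06-06 (Monday)
```

Depending on which week should count as "the same week a year ago", joining on date_trunc('week', o.date - interval '1 year') or on o.date - interval '52 weeks' are two possible alternatives. The week_type CASE expression compares a week start with yesterday's date as well, so it only yields 'current' when yesterday was a Monday.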