extract days of daterange grouped by month postresql - postgresql

I have a pickupDate and returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped/ordered by month. A cte seems to be the solution but I don´t get how to implement it in my query since the cte´s i saw were refering to themselves where it says "FROM cte".
I tried something like this:
SELECT
SUM((EXTRACT (DAY FROM("OrderHistory"."returnDate")-("OrderHistory"."pickupDate")))) as traveltime
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
M
ORDER BY
M
But the outcome doesn´t split bookings btw two months (e.g. pickupDate=27th march 2022 and returnDate=03rd of april 2022) but will assign the whole 7 days to the month of march, since the returndate is in it. It should show 4 days in march and 3 in april.
Sorry for the probably very stupid question but I am a beginner. (my code is written in postgresql btw)

PostgreSQL naming conventions
Are PostgreSQL column names case-sensitive?
use legal, lower-case names exclusively so double-quoting is not
needed.
Final result in db fiddle
Add daterange column.
alter table order_history add column date_ranges daterange;
update order_history
with a(m_begin, m_end, pickup_date) as
(select date_trunc('month', pickup_date)::date,
(date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
pickup_date from order_history)
update order_history set date_ranges =
daterange(a.m_begin, a.m_end,'[]') from a
where a.pickup_date = order_history.pickup_date;
then final query:
WITH A AS(
select
pickup_date,
return_date,
return_date - pickup_date as total,
case when return_date <# date_ranges then (return_date - pickup_date)
else ( date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
end partial_mth
from order_history),
b as (SELECT *, a.total - partial_mth parital_not_mth FROM a)
select *,
case when to_char(pickup_date,'YYYY-MM') = to_char(return_date,'YYYY-MM')
then
sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM')) +
sum(parital_not_mth) over (partition by to_char(return_date,'YYYY-MM'))
else sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM'))
end
from b;

After trying different things I think I found the best answer to my question, that I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
1, 2, 3
ORDER BY
3
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.

Related

Calculations inside window function in PostgreSQL

I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is
date_purchase
number_of_customers
2022-12-19
200
2022-12-18
194
(...)
Please note this calculates, for any given date, the number of customer with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I can't get to calculate correctly the number for each date.
Window function doesn't seem to be relevant here because there is no relationship between the rows of the same window. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table :
1. Query for a given date reference_date :
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date , client_id
FROM my_table
WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
GROUP BY client_id
HAVING count(*) >= 2
) AS a
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days' :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase in mytable :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name Start End
---------------------------------------
Project 1 2020-08-01 2020-09-10
Project 2 2020-01-01 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select datetrunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start is from the aliased main query and the dates are hard-coded for now, this correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name Start End 2020-08-01 2020-09-01 ...
-------------------------------------------------------------------------
Project 1 2020-08-01 2020-09-10 21 8
Project 2 2020-01-01 2025-01-01 21 10
Or simply return the whole subquery as JSON, but it doesn't seem to working either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.

DATE ADD function in PostgreSQL

I currently have the following code in Microsoft SQL Server to get users that viewed on two days in a row.
WITH uservideoviewvideo (date, user_id) AS (
SELECT DISTINCT date, user_id
FROM clickstream_videos
WHERE event_name ='video_play'
and user_id IS NOT NULL
)
SELECT currentday.date AS date,
COUNT(currentday.user_id) AS users_view_videos,
COUNT(nextday.user_id) AS users_view_next_day
FROM userviewvideo currentday
LEFT JOIN userviewvideo nextday
ON currentday.user_id = nextday.user_id AND DATEADD(DAY, 1,
currentday.date) = nextday.date
GROUP BY currentday.date
I am trying to get the DATEADD function to work in PostgreSQL but I've been unable to figure out how to get this to work. Any suggestions?
I don't think PostgreSQL really has a DATEADD function. Instead, just do:
+ INTERVAL '1 day'
SQL Server:
Add 1 day to the current date November 21, 2012
SELECT DATEADD(day, 1, GETDATE()); # 2012-11-22 17:22:01.423
PostgreSQL:
Add 1 day to the current date November 21, 2012
SELECT CURRENT_DATE + INTERVAL '1 day'; # 2012-11-22 17:22:01
SELECT CURRENT_DATE + 1; # 2012-11-22 17:22:01
http://www.sqlines.com/postgresql/how-to/dateadd
EDIT:
It might be useful if you're using a dynamic length of time to create a string and then cast it as an interval like:
+ (col_days || ' days')::interval
You can use date + 1 to do the equivalent of dateadd(), but I do not think that your query does what you want to do.
You should use window functions, instead:
with plays as (
select distinct date, user_id
from clickstream_videos
where event_name = 'video_play'
and user_id is not null
), nextdaywatch as (
select date, user_id,
case
when lead(date) over (partition by user_id
order by date) = date + 1 then 1
else 0
end as user_view_next_day
from plays
)
select date,
count(*) as users_view_videos,
sum(user_view_next_day) as users_view_next_day
from nextdaywatch
group by date
order by date;

Adding rows to SQL query result

I have a custom query in my Java application that looks like that:
select
to_char(search.timestamp,'Mon') as mon,
COUNT(DISTINCT(search.ip_address))
from
searches
WHERE
searches.city = 1
group by 1;
which should return all months that occur within the database, and number of distinct IP addresses within each month. However, at this point, some months do not have any entries, and they are missing in the SQL query result. How can I make sure that all of the months are displayed there, even if their count is 0?
Got it working with:
select
to_char (gs.m,'Mon') as mon,
count (distinct search.ip_address)
from
generate_series (
date_trunc('month', current_date - interval '11 month'),
current_date,
'1 month'
) gs (m)
left join searches
on date_trunc('month', search.timestamp) = gs.m AND search.city = 1
group by gs.m
order by gs.m;
select
to_char (gs.m,'Mon') as mon,
count (distinct(search.ip_address))
from
searches
right join
generate_series (
date_trunc('month', current_date - interval '1 year'),
current_date,
'1 month'
) gs (m) on date_trunc('month', search.timestamp) = gs.m
where searches.city = 1
group by gs.m
order by gs.m;
Something like this (untested):
select
months.mon
, COUNT(DISTINCT(searchs.ip_address))
from
(select
to_char(searches.timestamp,'Mon') as mon
from
searches
group by 1
) months
left join searches
on to_char(searchs.timestamp,'Mon') = months.mon
and searches.city = 1
group by 1;
And if you wanted the years in there, too, try something like this (untested):
select
months.mon
, COUNT(DISTINCT(searchs.ip_address))
from
(select
extract(year from searches.timestamp) as yr
, to_char(searches.timestamp,'Mon') as mon
, to_char(yr,'9999') || mon yrmon
from
searches
group by 1
) months
left join searches
on to_char(extract(year from searches.timestamp),'9999' ||
to_char(searchs.timestamp,'Mon') = months.yrmon
and searches.city = 1
group by 1;

Last 12 months, group by week

I have a table with a column REGDATE, a registration date (YYYY-MM-DD HH:MM:SS). I would like to show an histogram (ExtJS) in order to understand in which period of the years users are signing up. I would like to do this for the past twelve months with respect to the current date and to group dates by week.
Any hints?
FWIW in PostgreSQL, Karaszi has an answer that works, but there is a faster query:
SELECT date_trunc('week', REGDATE) AS "Week" , count(*) AS "No. of users"
FROM <<TABLE>>
WHERE REGDATE > now() - interval '12 months'
GROUP BY 1
ORDER BY 1;
I based this off the work of Ben Goodacre
in MySQL:
SELECT COUNT(*), DATE_FORMAT(regdate, "%X%V") AS regweek FROM table GROUP BY regweek;
or
SELECT COUNT(*), YEARWEEK(NOW(), 2) as regweek FROM table GROUP BY regweek;
in PostgreSQL:
SELECT COUNT(*), EXTRACT(YEAR FROM regdate)::text || EXTRACT(WEEK FROM regdate)::text AS regweek FROM table GROUP BY regweek;
Maybe this?
select to_char(REGDATE,'WW') "Week number",
count(*) "number of signups",
from YOUR_TABLE
where REGDATE > current_date-365
group by to_char(REGDATE,'WW')
order by to_char(REGDATE,'WW')
Hint: (SQL)
SELECT CONVERT (VARCHAR(7), REGDATE, 120) AS [RegistrationMonth]
FROM ...
GROUP BY CONVERT (VARCHAR(7), REGDATE, 120)
ORDER BY CONVERT (VARCHAR(7), REGDATE, 120)