Adding rows to SQL query result - postgresql

I have a custom query in my Java application that looks like that:
select
to_char(search.timestamp,'Mon') as mon,
COUNT(DISTINCT(search.ip_address))
from
searches
WHERE
searches.city = 1
group by 1;
which should return all months that occur within the database, and number of distinct IP addresses within each month. However, at this point, some months do not have any entries, and they are missing in the SQL query result. How can I make sure that all of the months are displayed there, even if their count is 0?
Got it working with:
select
to_char (gs.m,'Mon') as mon,
count (distinct search.ip_address)
from
generate_series (
date_trunc('month', current_date - interval '11 month'),
current_date,
'1 month'
) gs (m)
left join searches
on date_trunc('month', search.timestamp) = gs.m AND search.city = 1
group by gs.m
order by gs.m;

select
to_char (gs.m,'Mon') as mon,
count (distinct(search.ip_address))
from
searches
right join
generate_series (
date_trunc('month', current_date - interval '1 year'),
current_date,
'1 month'
) gs (m) on date_trunc('month', search.timestamp) = gs.m
where searches.city = 1
group by gs.m
order by gs.m;

Something like this (untested):
select
months.mon
, COUNT(DISTINCT(searchs.ip_address))
from
(select
to_char(searches.timestamp,'Mon') as mon
from
searches
group by 1
) months
left join searches
on to_char(searchs.timestamp,'Mon') = months.mon
and searches.city = 1
group by 1;
And if you wanted the years in there, too, try something like this (untested):
select
months.mon
, COUNT(DISTINCT(searchs.ip_address))
from
(select
extract(year from searches.timestamp) as yr
, to_char(searches.timestamp,'Mon') as mon
, to_char(yr,'9999') || mon yrmon
from
searches
group by 1
) months
left join searches
on to_char(extract(year from searches.timestamp),'9999' ||
to_char(searchs.timestamp,'Mon') = months.yrmon
and searches.city = 1
group by 1;

Related

Calculations inside window function in PostgreSQL

I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is
date_purchase
number_of_customers
2022-12-19
200
2022-12-18
194
(...)
Please note this calculates, for any given date, the number of customer with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I can't get to calculate correctly the number for each date.
Window function doesn't seem to be relevant here because there is no relationship between the rows of the same window. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table :
1. Query for a given date reference_date :
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date , client_id
FROM my_table
WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
GROUP BY client_id
HAVING count(*) >= 2
) AS a
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days' :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase in mytable :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date

Postgresql how to get last year_month from previous year in where clause

I need to show how many active customers we had and the end of the year. Therefore I need to get always last year_month from the previous year. Working with PostgreSQL.
Here my SQL to get the customer base on monthly (year_month) view.
select *
from (
with data as (
select
a.brand,
a.d,
a.activations,
t.terminations,
a.activations-t.terminations count
from (select c.brand, dd.year_month d,
COALESCE(case when dd.year_month is not null then count(c.customer_number) else 0 end, 0) as activations
from generate_series(current_date - interval '8 years', current_date, '1 day') d
left join dim_date dd on dd."date" = d.d
left join r_contracts_report c on to_date(c.service_start_date, 'dd.mm.yyy') = d
where c.contract_status in ('aktiv', 'Kündigung vorgemerkt', 'gekündigt')
and c.contract in ('3048', '3049', '3050', '3055', '3056')
group by dd.year_month,
brand) a,
(select c.brand, dd.year_month d,
COALESCE(case when dd.year_month is not null then count(c.customer_number) else 0 end, 0) as terminations
from generate_series(current_date - interval '8 years', current_date, '1 day') d
left join dim_date dd on dd."date" = d.d
left join r_contracts_report c on to_date(c.termination_date, 'dd.mm.yyy') = d
where c.contract_status in ('aktiv', 'Kündigung vorgemerkt', 'gekündigt')
and c.contract in ('3048', '3049', '3050', '3055', '3056')
group by dd.year_month,
brand) t
where a.d = t.d
and a.brand = t.brand)
select
d.d year_month,
d.brand,
sum(count) over (order by d.d asc rows between unbounded preceding and current row) eop
from data d
where d.brand = '3'
) as foo
Using after "as foo" the following where clause I get the customer base for the last 12 months:
WHERE year_month >= to_char ((current_date - INTERVAL '12 months'), 'YYYY-MM')
And result looks like this:
But I always want to have only the December of the previous year. In this case it would be '2021-12'.
...
where year_month = '2021-12'
or automatically for the previous year:
...
where year_month = (extract(year from current_date) - 1)::text || '-12'
But this is a really inefficient way to get this data.

extract days of daterange grouped by month postresql

I have a pickupDate and returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped/ordered by month. A cte seems to be the solution but I don´t get how to implement it in my query since the cte´s i saw were refering to themselves where it says "FROM cte".
I tried something like this:
SELECT
SUM((EXTRACT (DAY FROM("OrderHistory"."returnDate")-("OrderHistory"."pickupDate")))) as traveltime
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
M
ORDER BY
M
But the outcome doesn´t split bookings btw two months (e.g. pickupDate=27th march 2022 and returnDate=03rd of april 2022) but will assign the whole 7 days to the month of march, since the returndate is in it. It should show 4 days in march and 3 in april.
Sorry for the probably very stupid question but I am a beginner. (my code is written in postgresql btw)
PostgreSQL naming conventions
Are PostgreSQL column names case-sensitive?
use legal, lower-case names exclusively so double-quoting is not
needed.
Final result in db fiddle
Add daterange column.
alter table order_history add column date_ranges daterange;
update order_history
with a(m_begin, m_end, pickup_date) as
(select date_trunc('month', pickup_date)::date,
(date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
pickup_date from order_history)
update order_history set date_ranges =
daterange(a.m_begin, a.m_end,'[]') from a
where a.pickup_date = order_history.pickup_date;
then final query:
WITH A AS(
select
pickup_date,
return_date,
return_date - pickup_date as total,
case when return_date <# date_ranges then (return_date - pickup_date)
else ( date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
end partial_mth
from order_history),
b as (SELECT *, a.total - partial_mth parital_not_mth FROM a)
select *,
case when to_char(pickup_date,'YYYY-MM') = to_char(return_date,'YYYY-MM')
then
sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM')) +
sum(parital_not_mth) over (partition by to_char(return_date,'YYYY-MM'))
else sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM'))
end
from b;
After trying different things I think I found the best answer to my question, that I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
1, 2, 3
ORDER BY
3
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.

DATE ADD function in PostgreSQL

I currently have the following code in Microsoft SQL Server to get users that viewed on two days in a row.
WITH uservideoviewvideo (date, user_id) AS (
SELECT DISTINCT date, user_id
FROM clickstream_videos
WHERE event_name ='video_play'
and user_id IS NOT NULL
)
SELECT currentday.date AS date,
COUNT(currentday.user_id) AS users_view_videos,
COUNT(nextday.user_id) AS users_view_next_day
FROM userviewvideo currentday
LEFT JOIN userviewvideo nextday
ON currentday.user_id = nextday.user_id AND DATEADD(DAY, 1,
currentday.date) = nextday.date
GROUP BY currentday.date
I am trying to get the DATEADD function to work in PostgreSQL but I've been unable to figure out how to get this to work. Any suggestions?
I don't think PostgreSQL really has a DATEADD function. Instead, just do:
+ INTERVAL '1 day'
SQL Server:
Add 1 day to the current date November 21, 2012
SELECT DATEADD(day, 1, GETDATE()); # 2012-11-22 17:22:01.423
PostgreSQL:
Add 1 day to the current date November 21, 2012
SELECT CURRENT_DATE + INTERVAL '1 day'; # 2012-11-22 17:22:01
SELECT CURRENT_DATE + 1; # 2012-11-22 17:22:01
http://www.sqlines.com/postgresql/how-to/dateadd
EDIT:
It might be useful if you're using a dynamic length of time to create a string and then cast it as an interval like:
+ (col_days || ' days')::interval
You can use date + 1 to do the equivalent of dateadd(), but I do not think that your query does what you want to do.
You should use window functions, instead:
with plays as (
select distinct date, user_id
from clickstream_videos
where event_name = 'video_play'
and user_id is not null
), nextdaywatch as (
select date, user_id,
case
when lead(date) over (partition by user_id
order by date) = date + 1 then 1
else 0
end as user_view_next_day
from plays
)
select date,
count(*) as users_view_videos,
sum(user_view_next_day) as users_view_next_day
from nextdaywatch
group by date
order by date;

Monthly retention in Amazon redshift

I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
For some reason this query returns zero retained_users and zero retention. I'd appreciate any help regarding why this may be happening or maybe a completely different query for monthly retention would work.
I modified the query as per another SO post and here it goes:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.
Any help would be much appreciated.
Query 1 looks good. I tried similar one. See below. You are using self join on table (ads.fbs_page_view_staging) and the same column (created_at). Assuming mongo_id is unique, the datediff('month'....) will always return 0 and datediff ('month',activity.created_at,future_activity.created_at) = 1 will always be false.
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct