Get an average monthly view of active members (PostgreSQL)

I am working with member data. I have the responsible coach, the coachee, and the coachee's entry and exit status and dates. Because some coachees might graduate or leave during a month, I want to calculate a daily number and then get a monthly average of active members for each coach. That means I need to take into account all coachees from previous months who are still active in the current month. This is my data:
I am thinking of creating a variable first where I can get the daily active member count for each coach. This is my first approach:
with all_years as (
select y.year, m.month, d.day
from generate_series(2019, 2022) as y(year)
cross join generate_series(1, 12) as m(month)
cross join generate_series(1, 31) as d(day) --<<*not sure how to adjust for days with less than 31 days??*
select ay.*, coach, coachee, entry_status, entry_date, exit_reason, exit_date, sum(count) over (partition by ay.coach order by ay.year, ay.month, ay.day)
from all_years ay
left join table t
on --.... *not sure what I can join on in this case*;
I am open to an easier approach, this logic is just an idea.

You can cross join the list of distinct coaches with all dates to generate the combinations, then bring in the table with a left join:
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day':: interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
Then you can use another level of aggregation to get the monthly average:
select date_trunc('month', dt) d_month, coach, avg(no_coachees) avg_coachees
from (
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day'::interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
) t
group by date_trunc('month', dt), coach
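To check the two-level aggregation on a small case, here is a minimal sketch that runs the same logic against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, a recursive CTE stands in for generate_series, and substr(dt, 1, 7) stands in for date_trunc('month', ...). It also assumes that a NULL exit_date means the coachee is still active, which the original join condition would not count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table mytable (coach text, coachee text, entry_date text, exit_date text);
    insert into mytable values
        ('anna', 'c1', '2022-01-01', '2022-01-11'),  -- active Jan 1..10 (10 days)
        ('anna', 'c2', '2022-01-21', null),          -- still active: Jan 21..31 (11 days)
        ('ben',  'c3', '2021-12-15', '2022-02-01');  -- active on every day of January
""")

rows = conn.execute("""
    with recursive days(dt) as (
        -- one row per calendar day of January 2022
        select '2022-01-01'
        union all
        select date(dt, '+1 day') from days where dt < '2022-01-31'
    ),
    daily as (
        -- daily number of active coachees per coach
        select d.dt, c.coach, count(t.coach) as no_coachees
        from (select distinct coach from mytable) c
        cross join days d
        left join mytable t
               on t.coach = c.coach
              and t.entry_date <= d.dt
              and (t.exit_date is null or t.exit_date > d.dt)  -- assumption: NULL = still active
        group by d.dt, c.coach
    )
    -- monthly average of the daily counts
    select substr(dt, 1, 7) as d_month, coach, avg(no_coachees) as avg_coachees
    from daily
    group by substr(dt, 1, 7), coach
    order by coach
""").fetchall()

for row in rows:
    print(row)
```

With this data, anna has 21 coachee-days over 31 days, so her average is 21/31, and ben averages exactly 1.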

Related

Is there a SQL code for cumulative count of SaaS customer over months?

I have a table with:
ID (id client), date_start (subscription of SaaS), date_end (could be a date value or be NULL).
So I need a cumulative count of active clients month by month.
Any idea how to write that in Postgres and achieve this result?
I started from this, but I don't know how to proceed:
select
date_trunc('month', c.date_start)::date,
count(*)
from customer
Please check the following solution:
select
subscrubed_date,
subscrubed_customers,
unsubscrubed_customers,
coalesce(subscrubed_customers, 0) - coalesce(unsubscrubed_customers, 0) cumulative
from (
select distinct
date_trunc('month', c.date_start)::date subscrubed_date,
sum(1) over (order by date_trunc('month', c.date_start)) subscrubed_customers
from customer c
order by subscrubed_date
) subscribed
left join (
select distinct
date_trunc('month', c.date_end)::date unsubscrubed_date,
sum(1) over (order by date_trunc('month', c.date_end)) unsubscrubed_customers
from customer c
where date_end is not null
order by unsubscrubed_date
) unsubscribed on subscribed.subscrubed_date = unsubscribed.unsubscrubed_date;
You have a table of customers with a start date and sometimes an end date. Since you want to group by date but there are two dates in the table, you need to split these first.
Then, you may have months where only customers came and others where only customers left. So, you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
came.cnt as cust_new,
went.cnt as cust_gone,
sum(coalesce(came.cnt, 0) - coalesce(went.cnt, 0)) over (order by month) as cust_active
from came full outer join went using (month)
order by month;
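As a cross-check of the running total, here is a small sketch using Python's sqlite3 module with invented sample rows. It swaps in a different technique from the full outer join: each start counts as +1 and each end as -1 in a single UNION ALL, so no month can produce a NULL on either side. substr(..., 1, 7) stands in for date_trunc('month', ...), and the window function needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (id integer, date_start text, date_end text);
    insert into customer values
        (1, '2023-01-05', null),
        (2, '2023-01-20', '2023-03-10'),
        (3, '2023-02-02', '2023-03-15'),
        (4, '2023-04-01', null);
""")

rows = conn.execute("""
    with deltas as (
        -- +1 in the month a customer starts, -1 in the month a customer ends
        select substr(date_start, 1, 7) as month, 1 as delta from customer
        union all
        select substr(date_end, 1, 7), -1 from customer where date_end is not null
    ),
    monthly as (
        select month, sum(delta) as net from deltas group by month
    )
    -- running total of the monthly net change
    select month, net, sum(net) over (order by month) as cust_active
    from monthly
    order by month
""").fetchall()

for row in rows:
    print(row)
```

March has only departures here, which is exactly the kind of month where subtracting a NULL count would silently drop the change from the running total.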

Sub query in SELECT - ungrouped column from outer query

I have to calculate the ARPU (Revenue / # users) but I got this error:
subquery uses ungrouped column "usage_records.date" from outer query
LINE 7: WHERE created_at <= date_trunc('day', usage_records.d... ^
Expected results:
Revenue(day) = SUM(quantity_eur) for that day
Users Count (day) = Total signed up users before that day
Postgresql (Query)
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
( SELECT
COUNT(users.id)
FROM users
WHERE created_at <= date_trunc('day', usage_records.date)
) as users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc
Your subquery (executed for each row) contains a column that is neither listed in the GROUP BY nor used in an aggregate, and this produces the error.
You could refactor your query to use a conditional aggregate for this value instead:
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
sum( case when created_at <= date_trunc('day', usage_records.date)
AND users.id is not null
then 1 else 0 end ) users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc
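Another way to avoid the "ungrouped column" error, sketched here as an alternative rather than as the answer's method, is to aggregate revenue per day in a derived table first and then run the correlated count against the already grouped day. The sketch uses Python's sqlite3 module with invented rows and a simplified schema: the ownerships and profiles joins are left out, and date(...) stands in for date_trunc('day', ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (id integer primary key, created_at text);
    create table usage_records (date text, quantity_eur real);
    insert into users values
        (1, '2024-01-01 08:00:00'),
        (2, '2024-01-02 09:00:00'),
        (3, '2024-01-05 10:00:00');
    insert into usage_records values
        ('2024-01-02 10:00:00', 10.0),
        ('2024-01-02 18:00:00', 5.0),
        ('2024-01-03 12:00:00', 30.0);
""")

rows = conn.execute("""
    select r.day,
           r.revenue,
           -- r.day is a column of the derived table, so it is no longer "ungrouped";
           -- ISO timestamps compare as text, so this counts sign-ups before that day
           (select count(*) from users u where u.created_at < r.day) as users_count
    from (
        select date(ur.date) as day, sum(ur.quantity_eur) as revenue
        from usage_records ur
        group by date(ur.date)
    ) r
    order by r.day
""").fetchall()

# ARPU per day; days without any earlier sign-up are skipped to avoid dividing by zero
arpu = {day: revenue / users_count for day, revenue, users_count in rows if users_count}
print(rows)
print(arpu)
```

Unlike a conditional sum over the joined rows, the count here is independent of how many usage records each user has.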

How to natural join the two queries having with clause?

I have written two queries that help in finding the minimum and maximum sales quantities for different products. Now, I need to merge these two queries using natural join to output one single table.
Query 1:
with max_quant_table as
(with maxquant_table as
(select distinct month,prod as prod, sum(quant) as quant from sales group by month,prod)
select month as month,prod as MOST_POPULAR_PROD, quant as MOST_POP_TOTAL_Q
from maxquant_table)
select t2.* from
(select month, max(MOST_POP_TOTAL_Q) maxQ FROM max_quant_table group by month order by month asc)
t1 join max_quant_table t2 on t1.month = t2.month and (t2.MOST_POP_TOTAL_Q =maxQ)
Query 2:
with min_quant_table as
(with minquant_table as
(select distinct month,prod as prod, sum(quant) as quant from sales group by month,prod)
select month as month,prod as LEAST_POPULAR_PROD, quant as LEAST_POP_TOTAL_Q
from minquant_table)
select t2.* from
(select month, min(LEAST_POP_TOTAL_Q) minQ FROM min_quant_table group by month order by month asc)
t1 join min_quant_table t2 on t1.month = t2.month and (t2.LEAST_POP_TOTAL_Q = minQ)
You are overcomplicating things. You don't need to join those two queries (and you should really stay away from a natural join); you only need to combine them. min() and max() can be used inside the same query, so there is no need to run two queries to evaluate both.
You also don't need to nest CTE definitions, you can just write one after the other.
So something like this:
with quant_table as (
select month, prod, sum(quant) as sum_q
from sales
group by month, prod
), min_max as (
select month, max(sum_q) as max_q, min(sum_q) as min_q
from quant_table
group by month
)
select t1.*
from quant_table t1
join min_max t2
on t2.month = t1.month
and t1.sum_q in (t2.min_q, t2.max_q)
order by month, prod;
The condition and t1.sum_q in (t2.min_q, t2.max_q) could also be written as and (t2.max_q = t1.sum_q or t2.min_q = t1.sum_q).
The above can be simplified further by combining group by with window functions and calculating the sum, min and max in a single query:
with min_max as (
select month, prod,
sum(quant) as sum_q,
max(sum(quant)) over (partition by month) as max_q,
min(sum(quant)) over (partition by month) as min_q
from sales
group by month, prod
)
select month, prod, sum_q
from min_max
where sum_q in (max_q, min_q)
order by month, prod;
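To see what the combined query returns, here is a small sketch of the two-CTE version run through Python's sqlite3 module on invented sales rows. The popularity column is a hypothetical addition, not part of the answer, that labels each row as the month's most or least popular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (month integer, prod text, quant integer);
    insert into sales values
        (1, 'A', 5), (1, 'A', 3), (1, 'B', 2), (1, 'C', 4),
        (2, 'A', 1), (2, 'B', 7);
""")

rows = conn.execute("""
    with quant_table as (
        -- total quantity per month and product
        select month, prod, sum(quant) as sum_q
        from sales
        group by month, prod
    ), min_max as (
        -- highest and lowest total per month
        select month, max(sum_q) as max_q, min(sum_q) as min_q
        from quant_table
        group by month
    )
    select t1.month, t1.prod, t1.sum_q,
           -- hypothetical label; a month with a single product is reported as 'most'
           case when t1.sum_q = t2.max_q then 'most' else 'least' end as popularity
    from quant_table t1
    join min_max t2
      on t2.month = t1.month
     and t1.sum_q in (t2.min_q, t2.max_q)
    order by t1.month, t1.prod
""").fetchall()

for row in rows:
    print(row)
```

Product C in month 1 is neither the minimum nor the maximum, so it drops out of the result.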

Generate count of group memberships x days after group creation

I have a table of group_id, member_id, and created_at.
I'm trying to track the growth in group membership across time. Since group_id's are created when the first member_id joins, the min(created_at) for a given group should give the created date. I think this broken code gets the point across for what I'm trying to do (at the month level in this case):
SELECT brand_id,
min(created_at) as created_date,
min(created_at) + INTERVAL '1 month' as end_date,
count(member_id)
FROM member_group
HAVING created_at < end_date
group by 1
It seems to me that you are looking for a query like this:
SELECT g.brand_id, x.created_date, x.end_date, count(g.member_id)
FROM member_group g
JOIN (
SELECT brand_id,
min(created_at) as created_date,
min(created_at) + INTERVAL '1' month as end_date
FROM member_group
GROUP BY brand_id
) x
ON ( g.brand_id = x.brand_id
AND g.created_at BETWEEN x.created_date AND x.end_date )
GROUP BY g.brand_id, x.created_date, x.end_date
Alternatively, join each row to its group's first date and use a conditional count:
select
brand_id,
count(created_at < brand_date + interval '1 month' or null) as total
from
member_group
inner join (
select brand_id, min(created_at) as brand_date
from member_group
group by 1
) s using (brand_id)
group by 1
order by 1
;
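The two queries above differ at the boundary: BETWEEN includes a member who joins exactly one month after the group was created, while the strict < comparison does not. A minimal sketch with Python's sqlite3 module and invented rows shows both counts side by side. date(brand_date, '+1 month') stands in for brand_date + interval '1 month', and dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table member_group (brand_id integer, member_id integer, created_at text);
    insert into member_group values
        (1, 10, '2020-01-10'),  -- first member, so this is the group's creation date
        (1, 11, '2020-01-25'),
        (1, 12, '2020-02-09'),
        (1, 13, '2020-02-10'),  -- exactly one month later: the boundary case
        (1, 14, '2020-03-01'),
        (2, 20, '2021-05-01'),
        (2, 21, '2021-07-01');
""")

rows = conn.execute("""
    select brand_id,
           -- "condition or null" is NULL when the condition is false, and count() skips NULLs
           count(created_at < date(brand_date, '+1 month') or null) as strict_total,
           count(created_at between brand_date and date(brand_date, '+1 month') or null) as between_total
    from member_group
    join (
        select brand_id, min(created_at) as brand_date
        from member_group
        group by brand_id
    ) s using (brand_id)
    group by brand_id
    order by brand_id
""").fetchall()

for row in rows:
    print(row)
```

Member 13 is counted only by the BETWEEN variant, so which comparison is right depends on whether the window should include its end date.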

PostgreSQL complex query joining the same table

I would like to get those customers from a table 'transactions' who haven't created any transactions in the last 6 months.
Table:
'transactions'
id, email, state, paid_at
To visualise:
|------------------------ time period with all transactions --------------------|
|-- period before month transactions > 0) ---|---- curr month transactions = 0 -|
I guess this is doable with a join showing only those that didn't have any transactions on the right side.
Example:
Month = November
The conditions for the left side should be:
COUNT(l.id) > 0
l.paid_at < '2013-05-01 00:00:00'
Conditions for the right side:
COUNT(r.id) = 0
r.paid_at BETWEEN '2013-05-01 00:00:00' AND '2013-11-30 23:59:59'
Is join the right approach?
Answer
SELECT
c.email
FROM
transactions c
WHERE
(
c.email NOT IN (
SELECT DISTINCT
email
FROM
transactions
WHERE
paid_at >= '2013-05-01 00:00:00'
AND paid_at <= '2013-11-30 23:59:59'
)
AND
c.email IN (
SELECT DISTINCT
email
FROM
transactions
WHERE
paid_at <= '2013-05-01 00:00:00'
)
)
AND c.paid_at <= '2013-11-30 23:59:59'
There are a couple of ways you could do this. Use a subquery to get distinct customer ids for transactions in the last 6 months, and then select customers where their id isn't in the subquery.
select c.id, c.name
from customer c
where c.id not in (select distinct customer_id from transaction where dt between <start> and <end>);
Or, use a left join from customer to transaction, and filter the results to have transaction id null. A left join includes all rows from the left-hand table, even when there are no matching rows in the right-hand table. Explanation of left joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
select c.id, c.name
from customer c
left join transaction t on c.id = t.customer_id
and t.dt between <start> and <end>
where t.id is null;
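The left join approach is likely to be faster on large tables, although the planner's choice depends on your data and indexes, so compare both with EXPLAIN ANALYZE.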
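One caveat with the NOT IN form is worth checking on real data: if the subquery can return a NULL, NOT IN evaluates to NULL for every row and the query returns nothing. The sketch below uses Python's sqlite3 module and the question's transactions table with invented rows to compare NOT IN against NOT EXISTS, which is not affected. The half-open date range (>= start, < end) is my substitution for BETWEEN ... '23:59:59'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table transactions (id integer, email text, state text, paid_at text);
    insert into transactions values
        (1, 'a@x.com', 'paid', '2013-03-01 10:00:00'),
        (2, 'a@x.com', 'paid', '2013-06-15 12:00:00'),  -- a@x.com is still active
        (3, 'b@x.com', 'paid', '2013-02-01 09:00:00'),  -- lapsed
        (4, 'c@x.com', 'paid', '2013-04-20 17:30:00'),  -- lapsed
        (5, null,      'paid', '2013-07-01 08:00:00');  -- NULL email inside the window
""")

not_in = conn.execute("""
    select distinct email
    from transactions
    where paid_at < '2013-05-01'
      and email not in (
          select email from transactions
          where paid_at >= '2013-05-01' and paid_at < '2013-12-01'
      )
    order by email
""").fetchall()

not_exists = conn.execute("""
    select distinct t.email
    from transactions t
    where t.paid_at < '2013-05-01'
      and not exists (
          select 1 from transactions r
          where r.email = t.email
            and r.paid_at >= '2013-05-01' and r.paid_at < '2013-12-01'
      )
    order by t.email
""").fetchall()

print(not_in)      # empty: the NULL in the subquery makes NOT IN unknown for every row
print(not_exists)  # the two lapsed customers
```

The left join with an "is null" filter behaves like NOT EXISTS here, so the caveat applies only to NOT IN.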