Select count between dates & all-time counts in one query (Postgres DB) - postgresql

I want to select the count of impressions between two dates and the all-time impression count as well. Can this be done in one query?
This is my current query, which only returns impressions between the dates:
SELECT
    robotAds."Ad_ID",
    count(robotScraper."adIDAdID") AS ad_impression
FROM
    robot__ads robotAds
    LEFT JOIN robot__session__scraper__data robotScraper
        ON robotScraper."adIDAdID" = robotAds."Ad_ID"
    LEFT JOIN robot__session_data robotSession
        ON robotSession."id" = robotScraper."sessionIDId"
        AND robotSession."Session_start" BETWEEN '2020-11-25 00:00:00'
                                             AND '2021-04-01 00:00:00'
GROUP BY
    robotAds."Ad_ID"
What do I have to do to get the all-time impression count in this same query?
Thanks

Yes, you can:
SELECT
    robotAds."Ad_ID",
    count(robotScraper."adIDAdID") FILTER (WHERE robotSession."Session_start" BETWEEN '2020-11-25 00:00:00' AND '2021-04-01 00:00:00') AS ad_impression,
    count(robotScraper."adIDAdID") AS count_alltime
FROM
    robot__ads robotAds
    LEFT JOIN robot__session__scraper__data robotScraper
        ON robotScraper."adIDAdID" = robotAds."Ad_ID"
    LEFT JOIN robot__session_data robotSession
        ON robotSession."id" = robotScraper."sessionIDId"
GROUP BY
    robotAds."Ad_ID"

"Conditional aggregation" should meet this need. Essentially this is using a case expression inside the aggregation function, like this:
SELECT
    robotAds."Ad_ID"
    , count(CASE
                WHEN robotSession."Session_start" BETWEEN '2020-11-25 00:00:00'
                                                      AND '2021-04-01 00:00:00'
                THEN 1
            END) AS range_ad_impression
    , count(robotScraper."adIDAdID") AS all_ad_impression
FROM robot__ads robotAds
LEFT JOIN robot__session__scraper__data robotScraper ON robotScraper."adIDAdID" = robotAds."Ad_ID"
LEFT JOIN robot__session_data robotSession ON robotSession."id" = robotScraper."sessionIDId"
GROUP BY robotAds."Ad_ID"
Note: the count() function ignores NULLs. Above I have omitted an explicit instruction to return NULL, but some prefer to spell it out with ELSE, i.e.
,count(CASE
WHEN robotSession."Session_start" BETWEEN '2020-11-25 00:00:00'
AND '2021-04-01 00:00:00'
THEN 1 ELSE NULL
END) AS range_count
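For anyone who wants to verify that the two forms agree, here is a minimal, self-contained sketch with made-up sample data (the table and values are hypothetical, not from the question):
-- Hypothetical sample data, for illustration only
CREATE TEMP TABLE demo_events (ad_id int, happened_at timestamp);
INSERT INTO demo_events VALUES
    (1, '2020-12-01'), (1, '2021-05-01'), (2, '2021-01-15');

SELECT
    ad_id,
    -- FILTER form (available since PostgreSQL 9.4)
    count(*) FILTER (WHERE happened_at BETWEEN '2020-11-25' AND '2021-04-01') AS in_range_filter,
    -- equivalent CASE form: count() skips the NULLs produced when the condition is false
    count(CASE WHEN happened_at BETWEEN '2020-11-25' AND '2021-04-01' THEN 1 END) AS in_range_case,
    count(*) AS all_time
FROM demo_events
GROUP BY ad_id;
Both range columns should come out identical for every ad_id.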

Related

Optimize a query containing a join and a subquery

I need to run this query, but it takes so long that I get a timeout exception.
Could you please help me decrease the execution time of this query, or show me how to make it simpler?
Here is my Postgres query:
select
    AR1.patient_id,
    CONCAT(AC."firstName", ' ', AC."lastName") as doctor_full_name,
    to_json(AC.expertise::json->0->'id')::text as expertise_id,
    to_json(AC.expertise::json->0->'title')::text as expertise_title,
    AP."phoneNumbers" as mobile,
    AC.account_id as account_id,
    AC.city_id
from
    tb1 as AR1
    LEFT JOIN tb2 as AA on AR1.appointment_id = AA.id
    LEFT JOIN tb3 as AC on AC.account_id = AA.appointment_owner_id
    LEFT JOIN tb4 as AP on AP.id = AR1.patient_id
where AR1.status = 'canceled'
  and AR1.updated_at >= '2022-12-30 00:00:00'
  and AR1.updated_at < '2022-12-30 23:59:59'
  and AP."phoneNumbers" <> ''
  and patient_id not in (
      select
          AR2.patient_id
      from
          tb1 as AR2
          LEFT JOIN tb2 as AA2 on AR2.appointment_id = AA2.id
          LEFT JOIN tb3 as AC2 on AC2.account_id = AA2.appointment_owner_id
      where AR2.status = 'submited'
        and AR2.created_at >= '2022-12-30 00:00:00'
        and ( to_json(AC2.expertise::json->0->'id')::text = to_json(AC.expertise::json->0->'id')::text
              or AC2.account_id = AC.account_id )
  )
Try creating an index on tb1 to handle the WHERE clauses you use in your outer query.
CREATE INDEX status_updated ON tb1
(status, updated_at, patient_id);
And, create this index to handle your inner query.
CREATE INDEX status_created ON tb1
(status, created_at, patient_id);
These work because the query planner can random-access these BTREE indexes to find the first eligible row by status and date, and then sequentially scan the index until the last eligible row.
The comments about avoiding f(column) expressions in WHERE and ON conditions are correct. You want those conditions to be sargable whenever possible.
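As a hypothetical illustration (this exact predicate is not in the query above): wrapping the indexed column in an expression hides it from the index, while the equivalent range predicate lets the planner seek into it.
-- Not sargable: the cast prevents an index seek on (status, updated_at, patient_id)
-- where AR1.status = 'canceled' and AR1.updated_at::date = '2022-12-30'

-- Sargable: the planner can seek to the first matching row and scan the range
where AR1.status = 'canceled'
  and AR1.updated_at >= '2022-12-30 00:00:00'
  and AR1.updated_at <  '2022-12-31 00:00:00'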
And, by the way, for a timestamp range you want this:
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-31 00:00:00'
You have
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-30 23:59:59'
which excludes, rather than includes, rows from the last moment of 2022-12-30. It can be very hard to figure out what went wrong when a date-range off-by-one error improperly excludes a row. (Ask me how I know this sometime :-)

Is there SQL code for a cumulative count of SaaS customers over months?

I have a table with:
ID (client id), date_start (SaaS subscription start), date_end (either a date or NULL).
So I need a cumulative count of active clients, month by month.
Any idea how to write that in Postgres and achieve this result?
I'm starting from this, but I don't know how to proceed:
select
date_trunc('month', c.date_start)::date,
count(*)
from customer
Please check the following solution:
select
    subscribed_date,
    subscribed_customers,
    unsubscribed_customers,
    coalesce(subscribed_customers, 0) - coalesce(unsubscribed_customers, 0) cumulative
from (
    select distinct
        date_trunc('month', c.date_start)::date subscribed_date,
        sum(1) over (order by date_trunc('month', c.date_start)) subscribed_customers
    from customer c
    order by subscribed_date
) subscribed
left join (
    select distinct
        date_trunc('month', c.date_end)::date unsubscribed_date,
        sum(1) over (order by date_trunc('month', c.date_end)) unsubscribed_customers
    from customer c
    where date_end is not null
    order by unsubscribed_date
) unsubscribed on subscribed.subscribed_date = unsubscribed.unsubscribed_date;
You have a table of customers, with a start date and sometimes an end date. As you want to group by date, but there are two dates in the table, you need to split these into two sets first.
Then, you may have months where only customers came and others where only customers left. So, you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
came.cnt as cust_new,
went.cnt as cust_gone,
sum(coalesce(came.cnt, 0) - coalesce(went.cnt, 0)) over (order by month) as cust_active
from came full outer join went using (month)
order by month;
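To sanity-check the running total, here is a quick run with made-up rows (the table contents are illustrative only, not from the question):
-- Hypothetical sample data
CREATE TEMP TABLE customer (id int, date_start date, date_end date);
INSERT INTO customer VALUES
    (1, '2023-01-05', NULL),
    (2, '2023-01-20', '2023-03-10'),
    (3, '2023-02-02', NULL);

-- The query above should then yield roughly:
--   month      | cust_new | cust_gone | cust_active
--   2023-01-01 |    2     |           |     2
--   2023-02-01 |    1     |           |     3
--   2023-03-01 |          |    1      |     2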

Get an average monthly view of active members (PostgreSQL)

I am working with member data. I have the responsible coach, the coachee, the entry status and date, and the exit reason and date. Because some coachees might graduate/leave during a month, I want to calculate a daily number and then take a monthly average of active members for each coach. That means I need to take into account all coachees from previous months that are still active in the current month. This is my data:
I am thinking of first creating a variable where I can get the daily active member count for each coach. This is my first approach:
with all_years as (
    select y.year, m.month, d.day
    from generate_series(2019, 2022) as y(year)
    cross join generate_series(1, 12) as m(month)
    cross join generate_series(1, 31) as d(day) -- not sure how to adjust for months with fewer than 31 days?
)
select ay.*, coach, coachee, entry_status, entry_date, exit_reason, exit_date,
       sum(count) over (partition by ay.coach order by ay.year, ay.month, ay.day)
from all_years ay
left join table t
    on -- ... not sure what I can join on in this case
I am open to an easier approach; this logic is just an idea.
You can cross join the list of distinct coaches with all dates to generate the combinations, then bring in the table with a left join:
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day':: interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
Then you can use another level of aggregation to get the monthly average:
select date_trunc('month', d.dt) d_month, coach, avg(no_coachees) avg_coaches
from (
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day':: interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
) t
group by date_trunc('month', d.dt), coach
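One caveat to check against the data (this is an assumption on my part; the post doesn't say how still-active coachees are stored): if exit_date is NULL while a coachee is still active, the condition t.exit_date > d.dt drops those rows, so the join would need to allow NULL explicitly:
left join mytable t
    on t.coach = c.coach
    and t.entry_date <= d.dt
    and (t.exit_date > d.dt or t.exit_date is null) -- assumption: NULL exit_date means still active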

Sub query in SELECT - ungrouped column from outer query

I have to calculate the ARPU (revenue / number of users), but I get this error:
subquery uses ungrouped column "usage_records.date" from outer query
LINE 7: WHERE created_at <= date_trunc('day', usage_records.d... ^
Expected results:
Revenue(day) = SUM(quantity_eur) for that day
Users Count (day) = Total signed up users before that day
PostgreSQL query:
SELECT
    date_trunc('day', usage_records.date) AS day,
    SUM(usage_records.quantity_eur) AS revenue,
    ( SELECT COUNT(users.id)
      FROM users
      WHERE created_at <= date_trunc('day', usage_records.date)
    ) AS users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY day
ORDER BY day ASC
Your subquery (executed for each row) contains a column that is not mentioned in the GROUP BY and not involved in an aggregation; this produces the error.
But you could refactor your query using conditional aggregation for this value as well:
SELECT
    date_trunc('day', usage_records.date) AS day,
    SUM(usage_records.quantity_eur) AS revenue,
    sum(case when created_at <= date_trunc('day', usage_records.date)
              and users.id is not null
         then 1 else 0 end) AS users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY day
ORDER BY day ASC
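One thing to watch, depending on the data: the conditional SUM counts one per joined row, so a user with several usage records on the same day is counted more than once. Counting distinct user ids inside the aggregate (shown here with the FILTER form used in the first question above) avoids that double counting:
count(distinct users.id) filter (
    where users.created_at <= date_trunc('day', usage_records.date)
) AS users_count -- assumes created_at is the users column, as in the original subquery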

postgresql complex query joining the same table

I would like to get those customers from a table 'transactions' who haven't created any transactions in the last 6 months.
Table:
'transactions'
id, email, state, paid_at
To visualise:
|------------------------ time period with all transactions --------------------|
|-- period before: month transactions > 0 ------|---- curr month: transactions = 0 ----|
I guess this is doable with a join showing only those that didn't have any transactions on the right side.
Example:
Month = November
The conditions for the left side should be:
COUNT(l.id) > 0
l.paid_at < '2013-05-01 00:00:00'
Conditions for the right side:
COUNT(r.id) = 0
r.paid_at BETWEEN '2013-05-01 00:00:00' AND '2013-11-30 23:59:59'
Is a join the right approach?
Answer
SELECT
    c.email
FROM
    transactions c
WHERE
    (
        c.email NOT IN (
            SELECT DISTINCT email
            FROM transactions
            WHERE paid_at >= '2013-05-01 00:00:00'
              AND paid_at <= '2013-11-30 23:59:59'
        )
        AND
        c.email IN (
            SELECT DISTINCT email
            FROM transactions
            WHERE paid_at <= '2013-05-01 00:00:00'
        )
    )
    AND c.paid_at <= '2013-11-30 23:59:59'
There are a couple of ways you could do this. Use a subquery to get distinct customer ids for transactions in the last 6 months, and then select customers where their id isn't in the subquery.
select c.id, c.name
from customer c
where c.id not in (select distinct customer_id from transaction where dt between <start> and <end>);
Or, use a left join from customer to transaction, and filter the results to rows where the transaction id is null. A left join includes all rows from the left-hand table, even when there are no matching rows in the right-hand table. Explanation of left joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
select c.id, c.name
from customer c
left join transaction t on c.id = t.customer_id
and t.dt between <start> and <end>
where t.id is null;
The left join approach is likely to be faster.
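For completeness, mapped onto the transactions table from the question (column names come from the post, the date bounds from the example above), the left-join version could look roughly like this sketch:
select t.email
from transactions t
left join transactions recent
    on recent.email = t.email
    and recent.paid_at >= '2013-05-01 00:00:00'
    and recent.paid_at <  '2013-12-01 00:00:00' -- half-open upper bound
where t.paid_at < '2013-05-01 00:00:00'         -- had at least one earlier transaction
  and recent.id is null                         -- ... and none in the last 6 months
group by t.email;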