PostgreSQL complex query joining same table

I would like to get those customers from a table 'transactions' who haven't created any transactions in the last 6 months.
Table:
'transactions'
id, email, state, paid_at
To visualise:
|------------------------ time period with all transactions --------------------|
|-- period before (transactions > 0) -------|---- last 6 months (transactions = 0) ----|
I guess this is doable with a join showing only those that didn't have any transactions on the right side.
Example:
Month = November
The conditions for the left side should be:
COUNT(l.id) > 0
l.paid_at < '2013-05-01 00:00:00'
Conditions for the right side:
COUNT(r.id) = 0
r.paid_at BETWEEN '2013-05-01 00:00:00' AND '2013-11-30 23:59:59'
Is join the right approach?
Answer
SELECT c.email
FROM transactions c
WHERE c.email NOT IN (
        SELECT DISTINCT email
        FROM transactions
        WHERE paid_at >= '2013-05-01 00:00:00'
          AND paid_at <= '2013-11-30 23:59:59'
      )
  AND c.email IN (
        SELECT DISTINCT email
        FROM transactions
        WHERE paid_at < '2013-05-01 00:00:00'
      )
  AND c.paid_at <= '2013-11-30 23:59:59'
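One caution about NOT IN, not raised in the answer: if email is nullable, a single NULL coming back from the first subquery makes the whole NOT IN predicate return no rows at all. Guarding the subquery avoids that surprise:
c.email NOT IN (
    SELECT email
    FROM transactions
    WHERE email IS NOT NULL
      AND paid_at >= '2013-05-01 00:00:00'
      AND paid_at <= '2013-11-30 23:59:59'
)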

There are a couple of ways you could do this. Use a subquery to get distinct customer ids for transactions in the last 6 months, and then select customers where their id isn't in the subquery.
select c.id, c.name
from customer c
where c.id not in (select distinct customer_id from transaction where dt between <start> and <end>);
Or, use a left join from customer to transaction, and filter the results to have transaction id null. A left join includes all rows from the left-hand table, even when there are no matching rows in the right-hand table. Explanation of left joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
select c.id, c.name
from customer c
left join transaction t on c.id = t.customer_id
and t.dt between <start> and <end>
where t.id is null;
The left join approach is likely to be faster.
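A third option, usually planned just as efficiently as the left join in PostgreSQL, is not exists; a minimal sketch against the same customer and transaction tables:
select c.id, c.name
from customer c
where not exists (
    select 1
    from transaction t
    where t.customer_id = c.id
      and t.dt between <start> and <end>
);
Unlike not in, not exists also behaves sanely when the subquery can return nulls.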

Related

Optimize Query contains join and SubQuery

I need to run this query, but it takes so long that I get a timeout exception.
Could you please help me decrease the execution time of this query, or show me how to make it simpler?
Here is my Postgres query:
select
AR1.patient_id,
CONCAT(Ac."firstName", ' ', Ac."lastName") as doctor_full_name,
to_json(Ac.expertise::json->0->'id')::text as expertise_id,
to_json(Ac.expertise::json->0->'title')::text as expertise_title,
AP."phoneNumbers" as mobile,
AC.account_id as account_id,
AC.city_id
from
tb1 as AR1
LEFT JOIN tb2 as AA
on AR1.appointment_id = AA.id
LEFT JOIN tb3 as AC
on AC.account_id = AA.appointment_owner_id
LEFT JOIN tb4 as AP
on AP.id = AR1.patient_id
where AR1.status = 'canceled'
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-30 23:59:59'
and AP."phoneNumbers" <> ''
and patient_id not in (
select
AR2.patient_id
from
tb1 as AR2
LEFT JOIN tb2 as AA2
on AR2.appointment_id = AA2.id
LEFT JOIN tb3 as AC2
on AC2.account_id = AA2.appointment_owner_id
where AR2.status = 'submited'
and AR2.created_at >= '2022-12-30 00:00:00'
and ( to_json(Ac2.expertise::json->0->'id')::text = to_json(Ac.expertise::json->0->'id')::text or ac2.account_id = ac.account_id )
)
Try creating an index on tb1 to handle the WHERE clauses you use in your outer query.
CREATE INDEX status_updated ON tb1
(status, updated_at, patient_id);
And, create this index to handle your inner query.
CREATE INDEX status_created ON tb1
(status, created_at, patient_id);
These work because the query planner can random-access these BTREE indexes to find the first eligible row by status and date, and then sequentially scan the index until the last eligible row.
The comments about avoiding f(column) expressions in WHERE and ON conditions are correct. You want those conditions to be sargable whenever possible.
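A minimal illustration of the difference, with hypothetical predicates (and assuming status values are stored in lowercase):
-- not sargable: the function hides the column, so the index on (status, ...) cannot seek
where upper(AR1.status) = 'CANCELED'
-- sargable: the planner can seek straight to the matching index range
where AR1.status = 'canceled'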
And, by the way, for a full-day timestamp range you want this:
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-31 00:00:00'
You have
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-30 23:59:59'
which excludes, rather than includes, rows from the last moment of 2022-12-30. It can be very hard to figure out what went wrong when a date-range off-by-one error improperly excludes a row. (Ask me how I know this sometime :-)

Postgresql query where statement positions

I have two tables that I want to join. The query works without the where conditions; after adding them, I get a syntax error near a (the alias I give table1). From my understanding, the syntax looks correct?
My query
select * from table1 where date >= '2020-10-01' and date <= '2020-10-31' a
left join table2 b where registered >= '2020-10-01' and registered <= '2020-10-31' b
on a.id = cast(b.id as varchar)
Some issues:
where goes after all tables (and their join conditions)
aliases go immediately after the table name
Applying these two corrections and some formatting:
select *
from table1 a
left join table2 b on a.id = cast(b.id as varchar)
and registered >= '2020-10-01' and registered <= '2020-10-31'
where date >= '2020-10-01' and date <= '2020-10-31'
Conventionally, join conditions that describe access to joined rows (typically the keys) are coded first, then filtering conditions (ones involving only columns in the joined table) are coded last.
Which can be slightly simplified using between to:
select *
from table1 a
left join table2 b on a.id = cast(b.id as varchar)
and registered between '2020-10-01' and '2020-10-31'
where date between '2020-10-01' and '2020-10-31'
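One caveat worth noting: with a left join, the registered filter has to stay in the on clause, as written above. Moving it into where would discard the rows where table2 has no match (all of b's columns are null there), silently turning the left join into an inner join. If the filter genuinely must live in where, the unmatched rows have to be let through explicitly; a sketch, assuming b.id is never null in table2:
select *
from table1 a
left join table2 b on a.id = cast(b.id as varchar)
where date between '2020-10-01' and '2020-10-31'
  and (b.registered between '2020-10-01' and '2020-10-31' or b.id is null)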

Get an average monthly view of active members (Postgresql)

I am working with members data. I have the responsible coach, the coachee, the entry and exit status, and the corresponding dates. Because some coachees might graduate or leave during a month, I want to calculate a daily number and then get a monthly average of active members for each coach. That means I need to take into account all coachees from previous months that are still active in the current month. This is my data:
I am thinking of creating a variable first where I can get the daily active member count for each coach. This is my first approach:
with all_years as (
    select y.year, m.month, d.day
    from generate_series(2019, 2022) as y(year)
    cross join generate_series(1, 12) as m(month)
    cross join generate_series(1, 31) as d(day) --<< *not sure how to adjust for months with fewer than 31 days??*
)
select ay.*, coach, coachee, entry_status, entry_date, exit_reason, exit_date,
       sum(count) over (partition by coach order by ay.year, ay.month, ay.day)
from all_years ay
left join table t
    on -- .... *not sure what I can join on in this case*;
I am open to an easier approach, this logic is just an idea.
You can cross join the list of distinct coaches with all dates to generate the combinations, then bring in the table with a left join:
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day'::interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
Then you can use another level of aggregation to get the monthly average:
select date_trunc('month', d.dt) d_month, coach, avg(no_coachees) avg_coachees
from (
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day'::interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
) t
group by date_trunc('month', d.dt), coach
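One assumption baked into the join above is that exit_date is always populated. If coachees who are still active carry a null exit_date (the question doesn't say), the t.exit_date > d.dt test would drop them; a hedged variant treats null as still active:
left join mytable t
    on t.coach = c.coach
   and t.entry_date <= d.dt
   and (t.exit_date > d.dt or t.exit_date is null)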

Sub query in SELECT - ungrouped column from outer query

I have to calculate the ARPU (Revenue / # users) but I got this error:
subquery uses ungrouped column "usage_records.date" from outer query
LINE 7: WHERE created_at <= date_trunc('day', usage_records.d... ^
Expected results:
Revenue(day) = SUM(quantity_eur) for that day
Users Count (day) = Total signed up users before that day
Postgresql (Query)
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
( SELECT
COUNT(users.id)
FROM users
WHERE created_at <= date_trunc('day', usage_records.date)
) as users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc
Your subquery (executed for each row) contains a column that is neither mentioned in the GROUP BY nor involved in an aggregation.
This produces the error.
But you could refactor your query using a conditional aggregate for this value instead:
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
sum( case when created_at <= date_trunc('day', usage_records.date)
AND users.id is not null
then 1 else 0 end ) users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc
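On PostgreSQL 9.4 and later, the same conditional count can also be written with the aggregate FILTER clause, which reads more directly; a sketch with the same join structure (the users.id is not null check drops out, since the inner joins already guarantee it):
SELECT
    date_trunc('day', usage_records.date) AS day,
    SUM(usage_records.quantity_eur) AS revenue,
    COUNT(*) FILTER (WHERE users.created_at <= date_trunc('day', usage_records.date)) AS users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY day
ORDER BY day ASC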

Postgresql count by past weeks

select id, wk0_count
from teams
left join
(select team_id, count(team_id) as wk0_count
from (
select created_at, team_id, trunc(EXTRACT(EPOCH FROM age(CURRENT_TIMESTAMP,created_at)) / 604800) as wk_offset
from loan_files
where loan_type <> 2
order by created_at DESC) as t1
where wk_offset = 0
group by team_id) as t_wk0
on teams.id = t_wk0.team_id
I've created the query above that shows me how many loans each team did in a given week. Week 0 is the past seven days.
Ideally I want a table that shows how many loans each team did in the last 8 weeks, grouped by week. The output would look like:
Any ideas on the best way to do this?
select
t.id,
count(week = 0 or null) as wk0,
count(week = 1 or null) as wk1,
count(week = 2 or null) as wk2,
count(week = 3 or null) as wk3
from
teams t
left join
loan_files lf on lf.team_id = t.id and loan_type <> 2
cross join lateral
(select (current_date - created_at::date) / 7 as week) w
group by 1
In 9.4+ versions use the aggregate filter syntax:
count(*) filter (where week = 0) as wk0,
lateral is available from 9.3. On an earlier version, move the week expression into each filter condition.
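Put together, a 9.4+ version of the query above might look like this (a sketch, same schema assumed):
select
    t.id,
    count(*) filter (where w.week = 0) as wk0,
    count(*) filter (where w.week = 1) as wk1,
    count(*) filter (where w.week = 2) as wk2,
    count(*) filter (where w.week = 3) as wk3
from teams t
left join loan_files lf on lf.team_id = t.id and loan_type <> 2
cross join lateral
    (select (current_date - lf.created_at::date) / 7 as week) w
group by 1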
How about the following query?
SELECT team_id AS id, count(team_id) AS wk0_count
FROM teams LEFT JOIN loan_files ON teams.id = team_id
WHERE loan_type <> 2
AND trunc(EXTRACT(epoch FROM age(CURRENT_TIMESTAMP, created_at)) / 604800) = 0
GROUP BY team_id
Notable changes are:
ORDER BY clause in subquery was pointless;
created_at in innermost subquery was never used;
the wk_offset test is moved into the WHERE clause instead of being done in two distinct steps;
outermost subquery was not needed.
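One caveat about this simplification, not raised above: because the loan_type and wk_offset tests sit in the WHERE clause, teams with no qualifying loans drop out of the result entirely, so the LEFT JOIN effectively becomes an INNER JOIN. If zero-count teams should still appear, the tests belong in the ON clause; a sketch:
SELECT teams.id, count(team_id) AS wk0_count
FROM teams
LEFT JOIN loan_files ON teams.id = team_id
    AND loan_type <> 2
    AND trunc(EXTRACT(epoch FROM age(CURRENT_TIMESTAMP, created_at)) / 604800) = 0
GROUP BY teams.id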