Optimize query containing a join and subquery - PostgreSQL

I need to run this query, but it takes so long that I get a timeout exception.
Could you please help me decrease the execution time of this query, or show me how to make it simpler?
Here is my Postgres query:
select
AR1.patient_id,
CONCAT(Ac."firstName", ' ', Ac."lastName") as doctor_full_name,
to_json(Ac.expertise::json->0->'id')::text as expertise_id,
to_json(Ac.expertise::json->0->'title')::text as expertise_title,
AP."phoneNumbers" as mobile,
AC.account_id as account_id,
AC.city_id
from
tb1 as AR1
LEFT JOIN tb2 as AA
on AR1.appointment_id = AA.id
LEFT JOIN tb3 as AC
on AC.account_id = AA.appointment_owner_id
LEFT JOIN tb4 as AP
on AP.id = AR1.patient_id
where AR1.status = 'canceled'
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-30 23:59:59'
and AP."phoneNumbers" <> ''
and patient_id not in (
select
AR2.patient_id
from
tb1 as AR2
LEFT JOIN tb2 as AA2
on AR2.appointment_id = AA2.id
LEFT JOIN tb3 as AC2
on AC2.account_id = AA2.appointment_owner_id
where AR2.status = 'submited'
and AR2.created_at >= '2022-12-30 00:00:00'
and ( to_json(Ac2.expertise::json->0->'id')::text = to_json(Ac.expertise::json->0->'id')::text or ac2.account_id = ac.account_id )
)

Try creating an index on tb1 to handle the WHERE clauses you use in your outer query.
CREATE INDEX status_updated ON tb1
(status, updated_at, patient_id);
And, create this index to handle your inner query.
CREATE INDEX status_created ON tb1
(status, created_at, patient_id);
These work because the query planner can random-access these BTREE indexes to find the first eligible row by status and date, and then sequentially scan the index until the last eligible row.
The comments about avoiding f(column) expressions in WHERE and ON conditions are correct. You want those conditions to be sargable whenever possible.
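As an illustration of sargability (a generic sketch, not tied to the schema in the question), compare a non-sargable and a sargable version of the same date filter:

```sql
-- Non-sargable: wrapping the column in a function hides it from a plain index
WHERE date_trunc('day', updated_at) = '2022-12-30'

-- Sargable: a half-open range on the bare column can use an index on updated_at
WHERE updated_at >= '2022-12-30' AND updated_at < '2022-12-31'
```

If an expression genuinely cannot be avoided (like the to_json(...) comparisons above), Postgres also supports indexes on expressions, provided the index expression matches the query's expression exactly.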
And, by the way, you want this for a datestamp range
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-31 00:00:00'
You have
and AR1.updated_at >= '2022-12-30 00:00:00'
and AR1.updated_at < '2022-12-30 23:59:59'
which excludes, rather than includes, rows from the last moment of 2022-12-30. It can be very hard to figure out what went wrong if you exclude a row improperly with a date-range off-by-one error. (Ask how I know this sometime :-)


Select count between dates & All time counts in one query Postgres DB

I want to select the count of impressions between two dates, and the all-time impression count as well. Can we do this in one query?
This is my query, in which I am only able to get impressions between the dates:
SELECT
robotAds."Ad_ID",
count(robotScraper."adIDAdID") as ad_impression
FROM
robot__ads robotAds
LEFT JOIN robot__session__scraper__data robotScraper
ON robotScraper."adIDAdID" = robotAds."Ad_ID"
LEFT JOIN robot__session_data robotSession
ON robotSession."id" = robotScraper."sessionIDId"
AND robotSession."Session_start" BETWEEN '2020-11-25 00:00:00'
AND '2021-04-01 00:00:00'
GROUP BY
robotAds."Ad_ID"
What do I have to do to get the count of all-time impressions in this same query?
Thanks
Yes, you can:
SELECT
robotAds."Ad_ID",
count(robotScraper."adIDAdID") filter (where robotSession."Session_start" BETWEEN '2020-11-25 00:00:00' AND '2021-04-01 00:00:00') as ad_impression,
count(robotScraper."adIDAdID") as count_alltime
FROM
robot__ads robotAds
LEFT JOIN robot__session__scraper__data robotScraper
ON robotScraper."adIDAdID" = robotAds."Ad_ID"
LEFT JOIN robot__session_data robotSession
ON robotSession."id" = robotScraper."sessionIDId"
GROUP BY
robotAds."Ad_ID"
"Conditional aggregation" should meet this need. Essentially this is using a case expression inside the aggregation function, like this:
SELECT
robotAds."Ad_ID"
, count(CASE
WHEN robotSession."Session_start" BETWEEN '2020-11-25 00:00:00'
AND '2021-04-01 00:00:00'
THEN 1
END) AS range_ad_impression
, count(robotScraper."adIDAdID") AS all_ad_impression
FROM robot__ads robotAds
LEFT JOIN robot__session__scraper__data robotScraper ON robotScraper."adIDAdID" = robotAds."Ad_ID"
LEFT JOIN robot__session_data robotSession ON robotSession."id" = robotScraper."sessionIDId"
GROUP BY robotAds."Ad_ID"
Note: the count() function ignores NULLs. Above I have omitted an explicit instruction to return NULL, but some prefer to make it explicit using ELSE, i.e.
,count(CASE
WHEN robotSession."Session_start" BETWEEN '2020-11-25 00:00:00'
AND '2021-04-01 00:00:00'
THEN 1 ELSE NULL
END) AS range_count
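For reference, on PostgreSQL 9.4+ this CASE form and the FILTER clause from the other answer are interchangeable (a generic equivalence, with some_condition standing in for any boolean predicate):

```sql
-- These two produce the same count; FILTER is often considered more readable
count(CASE WHEN some_condition THEN 1 END)
count(*) FILTER (WHERE some_condition)
```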

Sub query in SELECT - ungrouped column from outer query

I have to calculate the ARPU (Revenue / # users) but I got this error:
subquery uses ungrouped column "usage_records.date" from outer query
LINE 7: WHERE created_at <= date_trunc('day', usage_records.d... ^
Expected results:
Revenue(day) = SUM(quantity_eur) for that day
Users Count (day) = Total signed up users before that day
Postgresql (Query)
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
( SELECT
COUNT(users.id)
FROM users
WHERE created_at <= date_trunc('day', usage_records.date)
) as users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc
Your subquery (executed for each row) contains a column that is not mentioned in GROUP BY and not involved in aggregation;
this produces the error.
But you could refactor your query using conditional aggregation for this value as well:
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
sum( case when created_at <= date_trunc('day', usage_records.date)
AND users.id is not null
then 1 else 0 end ) users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc

Does a clustered index on time increase the speed of a query where we want the max time grouped by a certain id?

Consider the following query
SELECT my_id, my_info FROM my_table as r
JOIN (
SELECT my_id, max(my_time) as max_time FROM my_table
WHERE my_time > timestamp '2019-01-10 00:00:00'
GROUP BY my_id) as k
ON k.my_id = r.my_id and k.max_time = r.my_time
And the following table
my_table
my_id [text, secondary index]
my_info [arbitrary]
my_time [timestamp with timezone, clustered index]
I think the most efficient plan, if the cardinality of my_id is not big, would be the following:
Get the set of all unique my_id values from the index.
Scan through the entire table from the first row (guaranteed to have the highest timestamp due to clustering) and fetch my_info for each my_id if it has not been fetched before.
I am not sure if Postgres does exactly that, but I am interested in knowing whether having a clustered index helps with my original query.
If the answer is no, is there a way to increase the speed of the query above, given the table structure?
I believe the clustered index should assist the filtering predicate WHERE my_time > timestamp '2019-01-10 00:00:00', but you need to examine EXPLAIN plans to determine how the query has been handled. You might also want to consider using a window function approach instead:
SELECT k.my_id, k.my_info
FROM (
SELECT my_id, my_info
, ROW_NUMBER() OVER(PARTITION BY my_id ORDER BY my_time DESC) as rn
FROM my_table
WHERE my_time > timestamp '2019-01-10 00:00:00'
) as k
WHERE k.rn = 1
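In PostgreSQL, DISTINCT ON is another idiomatic way to express "latest row per my_id" (a sketch against the same assumed columns):

```sql
-- One row per my_id: the first row in (my_id, my_time DESC) order,
-- i.e. the row with the greatest my_time for each my_id
SELECT DISTINCT ON (my_id) my_id, my_info
FROM my_table
WHERE my_time > timestamp '2019-01-10 00:00:00'
ORDER BY my_id, my_time DESC;
```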

Postgresql count by past weeks

select id, wk0_count
from teams
left join
(select team_id, count(team_id) as wk0_count
from (
select created_at, team_id, trunc(EXTRACT(EPOCH FROM age(CURRENT_TIMESTAMP,created_at)) / 604800) as wk_offset
from loan_files
where loan_type <> 2
order by created_at DESC) as t1
where wk_offset = 0
group by team_id) as t_wk0
on teams.id = t_wk0.team_id
I've created the query above that shows me how many loans each team did in a given week. Week 0 is the past seven days.
Ideally I want a table that shows how many loans each team did in the last 8 weeks, grouped by week. The output would look like:
Any ideas on the best way to do this?
select
t.id,
count(week = 0 or null) as wk0,
count(week = 1 or null) as wk1,
count(week = 2 or null) as wk2,
count(week = 3 or null) as wk3
from
teams t
left join
loan_files lf on lf.team_id = t.id and loan_type <> 2
cross join lateral
(select (current_date - created_at::date) / 7 as week) w
group by 1
In 9.4+ versions use the aggregate filter syntax:
count(*) filter (where week = 0) as wk0,
lateral is available from 9.3. In earlier versions, move the week expression into the aggregate's condition.
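Why count(week = 0 or null) works: count() skips NULLs, and OR NULL maps false to NULL while leaving true alone. A standalone sketch:

```sql
-- (x = 1)        -> true / false / NULL
-- false OR NULL  -> NULL   (not counted)
-- true  OR NULL  -> true   (counted)
SELECT count(x = 1 OR NULL) AS c
FROM (VALUES (1), (2), (NULL::int)) AS v(x);  -- c = 1
```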
How about the following query?
SELECT team_id AS id, count(team_id) AS wk0_count
FROM teams LEFT JOIN loan_files ON teams.id = team_id
WHERE loan_type <> 2
AND trunc(EXTRACT(epoch FROM age(CURRENT_TIMESTAMP, created_at)) / 604800) = 0
GROUP BY team_id
Notable changes are:
ORDER BY clause in subquery was pointless;
created_at in innermost subquery was never used;
wk_offset test is moved on the WHERE clause and not done in two distinct steps;
outermost subquery was not needed.

PostgreSQL complex query joining the same table

I would like to get those customers from a table 'transactions' which haven't created any transactions in the last 6 Months.
Table:
'transactions'
id, email, state, paid_at
To visualise:
|------------------------ time period with all transactions --------------------|
|-- period before month transactions > 0) ---|---- curr month transactions = 0 -|
I guess this is doable with a join showing only those that didn't have any transactions on the right side.
Example:
Month = November
The conditions for the left side should be:
COUNT(l.id) > 0
l.paid_at < '2013-05-01 00:00:00'
Conditions for the right side:
COUNT(r.id) = 0
r.paid_at BETWEEN '2013-05-01 00:00:00' AND '2013-11-30 23:59:59'
Is join the right approach?
Answer
SELECT
C.email
FROM
transactions C
WHERE
(
C.email NOT IN (
SELECT DISTINCT
email
FROM
transactions
WHERE
paid_at >= '2013-05-01 00:00:00'
AND paid_at <= '2013-11-30 23:59:59'
)
AND
C.email IN (
SELECT DISTINCT
email
FROM
transactions
WHERE
paid_at <= '2013-05-01 00:00:00'
)
)
AND C.paid_at <= '2013-11-30 23:59:59'
There are a couple of ways you could do this. Use a subquery to get distinct customer ids for transactions in the last 6 months, and then select customers where their id isn't in the subquery.
select c.id, c.name
from customer c
where c.id not in (select distinct customer_id from transaction where dt between <start> and <end>);
Or, use a left join from customer to transaction, and filter the results to have transaction id null. A left join includes all rows from the left-hand table, even when there are no matching rows in the right-hand table. Explanation of left joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
select c.id, c.name
from customer c
left join transaction t on c.id = t.customer_id
and t.dt between <start> and <end>
where t.id is null;
The left join approach is likely to be faster.
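One caveat with the NOT IN form (a general PostgreSQL point, not specific to this schema): if the subquery returns any NULL customer_id, NOT IN matches nothing at all. NOT EXISTS sidesteps this and typically plans as an anti-join:

```sql
-- Same intent as the NOT IN query, but immune to NULLs in customer_id
SELECT c.id, c.name
FROM customer c
WHERE NOT EXISTS (
    SELECT 1
    FROM transaction t
    WHERE t.customer_id = c.id
      AND t.dt BETWEEN <start> AND <end>
);
```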