Postgres Query Optimization without adding an extra index

Postgres Query Optimization without adding an extra index - postgresql

I was trying to optimize this query differently, but before that, Can we make any slight change in this query to reduce the time without adding any index?
Postgres version: 13.5
Query:
SELECT
orders.id as order_id,
orders.*, u1.name as user_name,
u2.name as driver_name,
u3.name as payment_by_name, referrals.name as ref_name,
array_to_string(array_agg(orders_payments.payment_type_name), ',') as payment_type_name,
array_to_string(array_agg(orders_payments.amount), ',') as payment_type_amount,
array_to_string(array_agg(orders_payments.reference_code), ',') as reference_code,
array_to_string(array_agg(orders_payments.tips), ',') as tips,
array_to_string(array_agg(locations.name), ',') as location_name,
(select
SUM(order_items.tax) as tax from order_items
where order_items.order_id = orders.id and order_items.deleted = 'f'
) as tax,
(select
SUM(orders_surcharges.surcharge_tax) as surcharge_tax from orders_surcharges
where orders_surcharges.order_id = orders.id
)
FROM "orders"
LEFT JOIN
users as u1 ON u1.id = orders.user_id
LEFT JOIN
users as u2 ON u2.id = orders.driver_id
LEFT JOIN
users as u3 ON u3.id = orders.payment_received_by
LEFT JOIN
referrals ON referrals.id = orders.referral_id
INNER JOIN
locations ON locations.id = orders.location_id
LEFT JOIN
orders_payments ON orders_payments.order_id = orders.id
WHERE
(orders.company_id = '626')
AND
(orders.created_at BETWEEN '2021-04-23 20:00:00' AND '2021-07-24 20:00:00')
AND
orders.order_status_id NOT IN (10, 5, 50)
GROUP BY
orders.id, u1.name, u2.name, u3.name, referrals.name
ORDER BY
created_at ASC LIMIT 300 OFFSET 0
Current Index:
"orders_pkey" PRIMARY KEY, btree (id)
"idx_orders_company_and_location" btree (company_id, location_id)
"idx_orders_created_at" btree (created_at)
"idx_orders_customer_id" btree (customer_id)
"idx_orders_location_id" btree (location_id)
"idx_orders_order_status_id" btree (order_status_id)
Execution Plan
Seems this takes more time on the parallel heap scan.

You're looking for 300 orders and try to get some additional information about these records. I would see if I could first get these 300 records, instead of getting all the data and then limit it to 300. Something like this:
WITH orders_300 AS (
SELECT * -- just get the columns that you really need, never use * in production
FROM orders
INNER JOIN locations ON locations.id = orders.location_id
WHERE orders.company_id = '626'
AND orders.created_at BETWEEN '2021-04-23 20:00:00' AND '2021-07-24 20:00:00'
AND orders.order_status_id NOT IN (10, 5, 50)
ORDER BY
created_at ASC LIMIT 300 -- LIMIT
OFFSET 0
)
SELECT
orders.id as order_id,
orders.*, -- just get the columns that you really need, never use * in production
u1.name as user_name,
u2.name as driver_name,
u3.name as payment_by_name, referrals.name as ref_name,
array_to_string(array_agg(orders_payments.payment_type_name), ',') as payment_type_name,
array_to_string(array_agg(orders_payments.amount), ',') as payment_type_amount,
array_to_string(array_agg(orders_payments.reference_code), ',') as reference_code,
array_to_string(array_agg(orders_payments.tips), ',') as tips,
array_to_string(array_agg(locations.name), ',') as location_name,
(SELECT SUM(order_items.tax) as tax
FROM order_items
WHERE order_items.order_id = orders.id
AND order_items.deleted = 'f'
) as tax,
( SELECT SUM(orders_surcharges.surcharge_tax) as surcharge_tax
FROM orders_surcharges
WHERE orders_surcharges.order_id = orders.id
)
FROM "orders_300" AS orders
LEFT JOIN users as u1 ON u1.id = orders.user_id
LEFT JOIN users as u2 ON u2.id = orders.driver_id
LEFT JOIN users as u3 ON u3.id = orders.payment_received_by
LEFT JOIN referrals ON referrals.id = orders.referral_id
LEFT JOIN orders_payments ON orders_payments.order_id = orders.id
GROUP BY
orders.id, u1.name, u2.name, u3.name, referrals.name
ORDER BY
created_at;
This will at least have a huge impact on the slowest part of your query, all these index scans on orders_payments. Every single scan is fast, but the query is doing 165000 of them... Limit this to just 300 and will be much faster.
Another issue is that none of your indexes covers the entire WHERE condition on the table "orders". But if you can't create a new index, you're out of luck.

Related

How to select from relationship based on the height value of field postgres?

i have three tables vehicles and trips and componentValues they are related to each other by
vehicles -> trips -> componentValues
vehicles Table : id, ...
trips Table: id, vehicle_id, ...
componentValues Table: id, trip_id, damage, ...
and i'm trying to get all the trips with the highest damage component form the componentValues table like this
SELECT *
FROM (select * from trips WHERE "trips"."vehicle_id" = '7') as t
LEFT JOIN (
select * from "component_values"
where trip_id = '85'
order by damage
desc nulls last limit 1
) as h on t.id = h.trip_id
how can i change the line where trip_id = '85' to be dynamic or is their another way to do this and many thanks in advance.
expected result:
UPDATE
i have did some query that get what i want but how can i improve it by not using sub queries in the select statement
select * ,
(select damage from "component_values" where trip_id = trips.id order by damage desc nulls last limit 1) as h_damage,
(select damage from "component_values" where trip_id = trips.id order by damage asc nulls first limit 1) as l_damage,
(select component_types.name from "component_values" left join component_types on component_values.component_type_id = component_types.id where trip_id = trips.id order by damage desc nulls last limit 1) as hc_damage,
(select component_types.name from "component_values" left join component_types on component_values.component_type_id = component_types.id where trip_id = trips.id order by damage asc nulls first limit 1) as lc_damage
from trips
WHERE trips."vehicle_id" = '7'

I think you want distinct on:
select distinct on (t.id) t.*, dv.damage
from trips t join
component_values cv
on cv.trip_id = t.id
where t.vehicle_id = 7 -- not sure if this is needed
order by t.id, cv.damage desc nulls last;
distinct on is usually the most efficient method in Postgres. You can also do this with window functions:
select distinct on (t.id) t.*, cv.damage
from trips t join
(select cv.*,
row_number() over (partition by cv.trip_id, cv.damage desc nulls last) as seqnum
from component_values cv
) cv
on cv.trip_id = t.id and cv.seqnum = 1
where t.vehicle_id = 7; -- not sure if this is needed

I think you want a lateral join.
SELECT *
FROM (select * from trips WHERE "trips"."vehicle_id" = '7') as t
LEFT JOIN lateral (
select * from "component_values"
where trip_id = t.id
order by damage
desc nulls last limit 1
) as h on true
Although I don't think there is a reason for the first subquery, so:
SELECT *
FROM trips
LEFT JOIN lateral (
select * from "component_values"
where trip_id = trips.id
order by damage
desc nulls last limit 1
) as h on true
WHERE "trips"."vehicle_id" = '7'

Strange Behaviour on Postgresql query

We created a view in Postgres and I am getting strange result.
View Name: event_puchase_product_overview
When I try to get records with *, I get the correct result. but when I try to get specific fields, I get wrong values.
I hope the screens attached here can explain the problem well.
select *
from event_purchase_product_overview
where id = 15065;
select id, departure_id
from event_puchase_product_overview
where id = 15065;
VIEW definition:
CREATE OR REPLACE VIEW public.event_puchase_product_overview AS
SELECT row_number() OVER () AS id,
e.id AS departure_id,
e.type AS event_type,
e.name,
p.id AS product_id,
pc.name AS product_type,
product_date.attribute AS option,
p.upcomming_date AS supply_date,
pr.date_end AS bid_deadline,
CASE
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_hotel'::text) tt)) THEN e.maximum_rooms
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_flight'::text) tt)) THEN e.maximum_seats
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_bike'::text) tt)) THEN e.maximum_bikes
ELSE e.maximum_seats
END AS departure_qty,
CASE
WHEN now()::date > pr.date_end AND po.state::text = 'draft'::text THEN true
ELSE false
END AS is_deadline,
pl.product_qty::integer AS purchased_qty,
pl.comments,
pl.price_unit AS unit_price,
rp.id AS supplier,
po.id AS po_ref,
po.state AS po_state,
po.date_order AS po_date,
po.user_id AS operator,
pl.po_state_line AS line_status
FROM event_event e
LEFT JOIN product_product p ON p.related_departure = e.id
LEFT JOIN product_template pt ON pt.id = p.product_tmpl_id
LEFT JOIN product_category pc ON pc.id = pt.categ_id
LEFT JOIN purchase_order_line pl ON pl.product_id = p.id
LEFT JOIN purchase_order po ON po.id = pl.order_id
LEFT JOIN purchase_order_purchase_requisition_rel prr ON prr.purchase_order_id = po.id
LEFT JOIN purchase_requisition pr ON pr.id = prr.purchase_requisition_id
LEFT JOIN res_partner rp ON rp.id = po.partner_id
LEFT JOIN ( SELECT p_1.id AS product_id,
pav.name AS attribute
FROM product_product p_1
LEFT JOIN product_attribute_value_product_product_rel pa ON pa.prod_id = p_1.id
LEFT JOIN product_attribute_value pav ON pav.id = pa.att_id
LEFT JOIN product_attribute pat ON pat.id = pav.attribute_id
WHERE pat.name::text <> ALL (ARRAY['Date'::character varying, 'Departure'::character varying]::text[])) product_date ON product_date.product_id = p.id
WHERE (p.id IN ( SELECT DISTINCT mrp_bom_line.product_id
FROM mrp_bom_line)) AND p.active
ORDER BY e.id, pt.categ_id, p.id;

If I add new event_event or new product_product I'll get a new definition of row_number in my view, then the column ID of my view is not stable.
at least you can't use row_number as Id of the view,
If you insist to use row_number, you can use the Order By "creation DATE" by this way all new records will be as last lines in the view and this will not change the correspondency between ID (row_number) and other columns.
Hope that helps !

Very likely the execution plan of your query depends on the columns you select. Compare the execution plans!
Your id is generated using the row_number window function. Now window functions are executed before the ORDER BY clause, so the order will depend on the execution plan and hence on the columns you select.
Using row_number without an explicit ordering doesn't make any sense.
To fix that, don't use
row_number() OVER ()
but
row_number() OVER (ORDER BY e.id, pt.categ_id, p.id)
so that you have a reliable ordering.
In addition, you should omit the ORDER BY clause at the end.

postgres JOIN with left table null

my query is:
SELECT main.group_id, s_ref.title, s_ref.username, main.m_per_group, main.pos, u.lang
FROM (
SELECT user_id, group_id, COUNT(user_id) AS m_per_group,
ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY COUNT(group_id) DESC
) AS pos
FROM messages
WHERE message_date > date_trunc('week', now())
GROUP BY group_id, user_id
) AS main
LEFT OUTER JOIN supergroups_ref AS s_ref
USING (group_id)
RIGHT JOIN users AS u
ON u.user_id = main.user_id
WHERE main.user_id = %s
ORDER BY m_per_group DESC
the problem is that when main returns 0 elements, i don't get neither the language of the user of the users JOIN but i get exactly []
i instead would like to get [(None, None, None, None, 'en')] this is why i used a right join. How can i get the result i want?

Move this condition:
WHERE main.user_id = %s
To the main subquery:
WHERE message_date > date_trunc('week', now()) and main.user_id = %s
The way it is now it is turning an outer join into an inner join.

Avoiding Order By in T-SQL

Below sample query is a part of my main query. I found SORT operator in below query is consuming 30% of the cost.
To avoid SORT, there is need of creation of Indexes. Is there any other way to optimize this code.
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC

Looks like you can use not exists rather than the sorts. I think you'll probably get a better performance boost by use a CTE or derived table instead of the a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3
where t1.status = 3 and t1.t_date < t2.t_date
group by t1.ID
having count(t2.ID) = 0
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.

Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID AND a1.TableA_ID > coalesce(a2.MAXAID,0)
WHERE a1.ID = r.ID AND a1.Status = 3
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concern me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a sub query (unless this is for an APPLY operation).

limiting a correlated subquery to just one record

I am trying to use a correlated subquery, but I am trying to limit it to the "best" record. When I use SQL very similiar to what follows, I get two rows per BigTable.identifier, and I wish to have only one. In the 'UNION' statement, the second half is more desirable than the first half. However, sometimes the first half will be needed. Any ideas? Here's the code:
select
BigTable.identifier,
Correlated.ID,
Correlated.Effective_Date,
Correlated.Period_Number
from
BigTable
inner join
(
select
TOP 2147483647
Table3.identifier,
Table4.Effective_Date,
Table4.Period_Number
from
Table3
inner join Table4 on Table3.matching_key = Table4.matching_key
where
Table4.Period_Number = 0
order by Table4.Effective_Date desc
UNION
select
TOP 2147483647
Table3.Identifer,
Table4.Effective_Date,
Table4.Period_Number
from
Table3
inner join Table5 on Table3.matching-key = Table5.matching-key
inner join Table4 on Table5.key1 = Table4.key1 and
Table5.key2 = Table4.key2
where
Table4.period_number = 1
order by Table4.Effective_Date desc
) as Correlated
on BigTable.identifier = Correlated.identifier

If each sub-query in that UNION had some condition which EXCLUDED the row if it was less-preferred, you would never see the less-preferred rows in the UNION.
So, if each were to have a NOT EXISTS (.... a better row in the other side of the union ....), you would eliminate less-preferred rows at the root.
I'm not clear on how you want to use effective date. Assuming you mean that you prefer Period=1 but if the Effective date is less you prefer Period=0, then something like this might work.
select
BigTable.identifier,
Correlated.ID,
Correlated.Effective_Date,
Correlated.Period_Number
from
BigTable
inner join
(
select
TOP 2147483647
Table3.identifier,
Table4.Effective_Date,
Table4.Period_Number
from
Table3
inner join Table4 on Table3.matching_key = Table4.matching_key
where
Table4.Period_Number = 0
AND NOT EXISTS
(select 1
from Table5 T5 inner join Table4 T4
on T5.key1 = T4.key1 and T5.key2 = T4.key2
where Table3.matching-key = T5.matching-key
and (T4.Effective_Date >= Table4.Effective_Date and T4.Period_Number = 1)
)
order by Table4.Effective_Date desc
UNION
select
TOP 2147483647
Table3.Identifer,
Table4.Effective_Date,
Table4.Period_Number
from
Table3
inner join Table5 on Table3.matching-key = Table5.matching-key
inner join Table4 on Table5.key1 = Table4.key1 and
Table5.key2 = Table4.key2
where
Table4.period_number = 1
AND NOT EXISTS
(select 1
from Table4 T4
where Table3.matching-key = T4.matching-key
and (T4.Period_Number > 0)
and (T4.Effective_Date > Table4.Effective_Date and T4.Period_Number = 0)
)
order by Table4.Effective_Date desc
) as Correlated
on BigTable.identifier = Correlated.identifier

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Postgres Query Optimization without adding an extra index - postgresql

Related

How to select from relationship based on the height value of field postgres?

Strange Behaviour on Postgresql query

postgres JOIN with left table null

Avoiding Order By in T-SQL

limiting a correlated subquery to just one record

Categories

Resources