Query optimization - PostgreSQL

I have 50,964,218 records in a table. I am fetching data from this table and inserting it back into the same table, which takes a long time. How can I optimize this query?
The Query is
INSERT INTO contacts_lists (contact_id, list_id, is_excluded, added_by_search)
SELECT contact_id, 68114, TRUE, added_by_search
FROM contacts_lists cl1
WHERE list_id = 67579
AND is_excluded = TRUE
AND NOT EXISTS
(SELECT 1 FROM contacts_lists cl2
WHERE cl1.contact_id = cl2.contact_id AND cl2.list_id = 68114 )
Index: (list_id, contact_id)

You will probably get better results with a left join:
select t1.[field], ...
from t1
left join t2
on [conditions]
where t2.[any pkey field] is null;
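Applied to the query in the question, that pattern would look roughly like this (a sketch, not tested against your data):
INSERT INTO contacts_lists (contact_id, list_id, is_excluded, added_by_search)
SELECT cl1.contact_id, 68114, TRUE, cl1.added_by_search
FROM contacts_lists cl1
LEFT JOIN contacts_lists cl2
       ON cl2.contact_id = cl1.contact_id
      AND cl2.list_id = 68114
WHERE cl1.list_id = 67579
  AND cl1.is_excluded = TRUE
  AND cl2.contact_id IS NULL;
Note that recent PostgreSQL versions usually plan both NOT EXISTS and this LEFT JOIN ... IS NULL form as an anti-join, so compare the EXPLAIN output of the two on your data rather than assuming one is faster.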

Related

Postgres Query Optimization without adding an extra index

I was planning to optimize this query differently, but before that: can we make any slight change to this query to reduce its run time without adding an index?
Postgres version: 13.5
Query:
SELECT
orders.id as order_id,
orders.*, u1.name as user_name,
u2.name as driver_name,
u3.name as payment_by_name, referrals.name as ref_name,
array_to_string(array_agg(orders_payments.payment_type_name), ',') as payment_type_name,
array_to_string(array_agg(orders_payments.amount), ',') as payment_type_amount,
array_to_string(array_agg(orders_payments.reference_code), ',') as reference_code,
array_to_string(array_agg(orders_payments.tips), ',') as tips,
array_to_string(array_agg(locations.name), ',') as location_name,
(select
SUM(order_items.tax) as tax from order_items
where order_items.order_id = orders.id and order_items.deleted = 'f'
) as tax,
(select
SUM(orders_surcharges.surcharge_tax) as surcharge_tax from orders_surcharges
where orders_surcharges.order_id = orders.id
)
FROM "orders"
LEFT JOIN
users as u1 ON u1.id = orders.user_id
LEFT JOIN
users as u2 ON u2.id = orders.driver_id
LEFT JOIN
users as u3 ON u3.id = orders.payment_received_by
LEFT JOIN
referrals ON referrals.id = orders.referral_id
INNER JOIN
locations ON locations.id = orders.location_id
LEFT JOIN
orders_payments ON orders_payments.order_id = orders.id
WHERE
(orders.company_id = '626')
AND
(orders.created_at BETWEEN '2021-04-23 20:00:00' AND '2021-07-24 20:00:00')
AND
orders.order_status_id NOT IN (10, 5, 50)
GROUP BY
orders.id, u1.name, u2.name, u3.name, referrals.name
ORDER BY
created_at ASC LIMIT 300 OFFSET 0
Current Index:
"orders_pkey" PRIMARY KEY, btree (id)
"idx_orders_company_and_location" btree (company_id, location_id)
"idx_orders_created_at" btree (created_at)
"idx_orders_customer_id" btree (customer_id)
"idx_orders_location_id" btree (location_id)
"idx_orders_order_status_id" btree (order_status_id)
Execution Plan
It seems most of the time is spent on the parallel heap scan.
You're looking for 300 orders and want some additional information about those records. I would first fetch those 300 records, instead of collecting all the data and then limiting it to 300. Something like this:
WITH orders_300 AS (
    SELECT orders.*, locations.name AS location_name -- just get the columns that you really need, never use * in production
    FROM orders
    INNER JOIN locations ON locations.id = orders.location_id
    WHERE orders.company_id = '626'
      AND orders.created_at BETWEEN '2021-04-23 20:00:00' AND '2021-07-24 20:00:00'
      AND orders.order_status_id NOT IN (10, 5, 50)
    ORDER BY orders.created_at ASC
    LIMIT 300 OFFSET 0
)
SELECT
orders.id as order_id,
orders.*, -- list only the columns you really need here (and add them to the GROUP BY below); never use * in production
u1.name as user_name,
u2.name as driver_name,
u3.name as payment_by_name, referrals.name as ref_name,
array_to_string(array_agg(orders_payments.payment_type_name), ',') as payment_type_name,
array_to_string(array_agg(orders_payments.amount), ',') as payment_type_amount,
array_to_string(array_agg(orders_payments.reference_code), ',') as reference_code,
array_to_string(array_agg(orders_payments.tips), ',') as tips,
array_to_string(array_agg(orders.location_name), ',') as location_name,
(SELECT SUM(order_items.tax) as tax
FROM order_items
WHERE order_items.order_id = orders.id
AND order_items.deleted = 'f'
) as tax,
( SELECT SUM(orders_surcharges.surcharge_tax) as surcharge_tax
FROM orders_surcharges
WHERE orders_surcharges.order_id = orders.id
)
FROM "orders_300" AS orders
LEFT JOIN users as u1 ON u1.id = orders.user_id
LEFT JOIN users as u2 ON u2.id = orders.driver_id
LEFT JOIN users as u3 ON u3.id = orders.payment_received_by
LEFT JOIN referrals ON referrals.id = orders.referral_id
LEFT JOIN orders_payments ON orders_payments.order_id = orders.id
GROUP BY
orders.id, u1.name, u2.name, u3.name, referrals.name
ORDER BY
created_at;
This will at least have a huge impact on the slowest part of your query: all those index scans on orders_payments. Every single scan is fast, but the query is doing about 165,000 of them. Limiting this to just 300 will be much faster.
Another issue is that none of your indexes covers the entire WHERE condition on the "orders" table. But if you can't create a new index, there isn't much more you can do there.
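If adding an index does become an option later, a composite index matching the WHERE clause is the usual next step. A sketch (untested; the column order and whether to also include order_status_id should be verified with EXPLAIN (ANALYZE) on your data):
CREATE INDEX idx_orders_company_created_at
    ON orders (company_id, created_at);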

Select multiple non-aggregated columns with GROUP BY in Postgres

I'm writing a query with multiple non-aggregated columns and a GROUP BY clause, but Postgres throws an error saying I have to either add the non-aggregated columns to the GROUP BY or apply an aggregate function to them. This is the query I'm trying to run:
select
tb1.pipeline as pipeline_id,
tb3.pipeline_name as pipeline_name,
tb2."name" as integration_name,
cast(tb1.integration_id as VARCHAR) as integration_id,
tb1.created_at as created_at,
cast(tb1.id as VARCHAR) as batch_id,
sum(tb1.row_select) as row_select,
sum(tb1.row_insert) as row_insert,
from
table1 tb1
join
table2 tb2 on tb1.integration_id = tb2.id
join
table3 tb3 on tb1.pipeline = tb3.id
where
tb1.pipeline is not null
and tb1.is_super_parent = false
group by
tb1.pipeline
I found one solution/hack for this error: adding a max() function to all the other non-aggregated columns solves my problem.
select
tb1.pipeline as pipeline_id,
max(tb3.pipeline_name) as pipeline_name,
max(tb2."name") as integration_name,
max(cast(tb1.integration_id as VARCHAR)) as integration_id,
max(tb1.created_at) as created_at,
max(cast(tb1.id as VARCHAR)) as batch_id,
sum(tb1.row_select) as row_select,
sum(tb1.row_insert) as row_insert
from
table1 tb1
join
table2 tb2 on tb1.integration_id = tb2.id
join
table3 tb3 on tb1.pipeline = tb3.id
where
tb1.pipeline is not null
and tb1.is_super_parent = false
group by
tb1.pipeline
But I don't want to add max() functions where there is no need for them, and applying max() to all the other columns makes the query more expensive. Is there a better approach to solve this issue? Thanks in advance.
Well, the first thing you need to do is learn to format your queries so you can get an idea of their flow at a glance. Also note that the extra comma after row_insert in your query will give a syntax error. With that said, how do you solve your issue?
You cannot avoid the additional aggregates or the expanded GROUP BY as long as they exist in the same query scope. You need to separate the aggregation from the selection of the additional columns. You basically have two choices:
Perform the aggregation in a CTE.
with sums (pipeline_id, row_select, row_insert) as
  ( select tb1.pipeline
         , sum(tb1.row_select) as row_select
         , sum(tb1.row_insert) as row_insert
      from table1 tb1
     where tb1.pipeline is not null
       and tb1.is_super_parent = false
     group by tb1.pipeline
  )
select s.pipeline_id
     , tb3.pipeline_name
     , tb2."name" integration_name
     , s.row_select
     , s.row_insert
  from sums s
  join table2 tb2 on (s.pipeline_id = tb2.id)
  join table3 tb3 on (s.pipeline_id = tb3.id);
Perform the aggregation in a sub-query.
select s.pipeline_id
     , tb3.pipeline_name
     , tb2."name" integration_name
     , s.row_select
     , s.row_insert
  from ( select tb1.pipeline as pipeline_id
              , sum(tb1.row_select) as row_select
              , sum(tb1.row_insert) as row_insert
           from table1 tb1
          where tb1.pipeline is not null
            and tb1.is_super_parent = false
          group by tb1.pipeline
       ) s
  join table2 tb2 on (s.pipeline_id = tb2.id)
  join table3 tb3 on (s.pipeline_id = tb3.id);
NOTE: Not tested as no sample data supplied.

How to insert data with select query in postgresql?

How do I insert data with a SELECT query that uses SUM? I tried it; the result reports success (Affected rows: 1), but no row is added to the table.
CREATE TABLE Clases(Segment1_ID char(4), Segment1_Name varchar(75), CompanyID char(4), amount Float(53));
INSERT INTO Clases (Segment1_ID, Segment1_Name, CompanyID, amount)
select Segment1_ID, Segment1_Name, dataupload1h_companyid CompanyID, sum(case when Segment3_Formula = '+' then dataupload1d_amount else dataupload1d_amount * -1 end) Amount
from t_dataupload1_header
inner join t_dataupload1_detail
on dataupload1h_id = dataupload1d_id
inner join M_Account
on dataupload1d_accountid = Account_ID
and dataupload1h_companyid = Account_CompanyID
inner join M_Segment4
on Account_Segment4ID = Segment4_ID
and dataupload1h_companyid = Segment4_CompanyID
inner join M_Segment3
on Segment4_Segment3ID = Segment3_ID
and dataupload1h_companyid = Segment3_CompanyID
inner join M_Segment2
on Segment3_Segment2ID = Segment2_ID
and dataupload1h_companyid = Segment2_CompanyID
inner join M_Segment1
on Segment2_Segment1ID = Segment1_ID
and dataupload1h_companyid = Segment1_CompanyID
where dataupload1h_companyid = '1000'
group by Segment1_ID, Segment1_Name, dataupload1h_companyid
order by Segment1_ID, Segment1_Name, dataupload1h_companyid;
Please help. Thank you
If the SELECT returns the desired results, check whether auto-commit is turned on in pgAdmin (I suppose you are using it), or put COMMIT; after the INSERT.
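With auto-commit off, the inserted row only becomes visible to other sessions after an explicit commit (and is rolled back if the connection closes first). A minimal sketch, wrapping the statement from the question:
BEGIN;

INSERT INTO Clases (Segment1_ID, Segment1_Name, CompanyID, amount)
SELECT ...;  -- the SELECT ... GROUP BY query from the question goes here unchanged

COMMIT;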

What is wrong with this t-sql query using left join?

I have two SQL Server 2005 tables: a Task table and a Travel table.
I am trying to write a query that selects all records from the Task table for a specific Empid, and if the Task is travel, also get the mileage value from another table.
I expect only one row to be selected, but all rows in the Task table are returned.
The query I am using is
Select Tasklog.TaskLogPkey, Tasklog.Empid , Tasklog.Task, TravelLog.Mileage from
Tasklog
left join TravelLog
on
TaskLog.EmpId = 12 and TaskLog.Task ='Travel' and
TaskLog.TaskLogPkey = TravelLog.TaskLogPkey
But it makes no difference if I don’t include
-- TaskLog.EmpId = 12 and TaskLog.Task ='Travel' and
What am I doing wrong?
Clearly the following doesn't work:
Select Tasklog.TaskLogPkey, Tasklog.Empid , Tasklog.Task, TravelLog.Mileage from
Tasklog where TaskLog.EmpId = 12 and TaskLog.Task ='Travel'
left join TravelLog
on
TaskLog.TaskLogPkey = TravelLog.TaskLogPkey
These are the tables I am using:
Create table TaskLog(
TaskLogPkey int not null,
Empid int,
Task varchar(30)
)
Insert Tasklog values(1,12,'Sales')
Insert Tasklog values(2,4,'Travel')
Insert Tasklog values(3,63,'Meeting')
Insert Tasklog values(4,12,'Travel')
Insert Tasklog values(5,12,'Email')
Insert Tasklog values(6,4,'Travel')
Insert Tasklog values(7,63,'Meeting')
Insert Tasklog values(8,12,'PhoneCall')
Create table TravelLog(
TaskLogPkey int not null,
Mileage int
)
Insert TravelLog values(2,45)
Insert TravelLog values(4,25)
Insert TravelLog values(6,18)
When you perform an outer join (a left join is a left outer join), you get all the records from the preserved (left) side of the join, no matter what predicates (filters) on that table you put in the join conditions.
To filter rows of the preserved table you need to add a WHERE clause.
Select t.TaskLogPkey, t.Empid , t.Task, v.Mileage
From Tasklog t
Left Join TravelLog v
On t.TaskLogPkey = v.TaskLogPkey
Where t.Task ='Travel'
And t.EmpId = 12
TaskLog.EmpId = 12 and TaskLog.Task ='Travel' is part of the selection criteria, NOT join criteria.
And "Charles Bretana" beats me by less than 1 minute.
You should try placing a WHERE clause with the additional criteria:
Select Tasklog.TaskLogPkey, Tasklog.Empid , Tasklog.Task, TravelLog.Mileage
from Tasklog
left join TravelLog
on TaskLog.TaskLogPkey = TravelLog.TaskLogPkey
where TaskLog.EmpId = 12
and TaskLog.Task ='Travel'
See SQL Fiddle with Demo
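The question's wording ("if the Task is travel, also get the mileage value") could also be read as wanting every task for EmpId 12, with mileage attached only to the travel rows. A sketch of that variant keeps the Task filter in the ON clause and only the EmpId filter in the WHERE clause:
Select t.TaskLogPkey, t.Empid, t.Task, v.Mileage
From Tasklog t
Left Join TravelLog v
  On t.TaskLogPkey = v.TaskLogPkey
 And t.Task = 'Travel'
Where t.EmpId = 12
Against the sample data this returns four rows (TaskLogPkey 1, 4, 5, 8), with Mileage populated only for TaskLogPkey 4.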

t-sql NOT IN with multiple columns

I have a Microsoft SQL Server database where I am trying to insert some data. I have a unique key on 4 columns, and I want to insert data from multiple tables into this table while checking that the data will not violate the uniqueness of the key. If I were doing this on a single column, I would use NOT IN, like so:
INSERT TABLE_A (FLD_1)
SELECT FLD_1
FROM TBL_B
INNER JOIN TBL_C
ON TBL_B.FLD_1 = TBL_C.FLD_1
WHERE TBL_B.FLD_1 NOT IN
(
SELECT TBL_A.FLD_1 FROM TBL_A
)
Any thoughts?
Use NOT EXISTS instead since you have to deal with multiple columns.
http://www.techonthenet.com/sql/exists.php
EDIT:
Untested, but roughly it will be this:
SELECT TBL_B.FLD_1
FROM TBL_B
INNER JOIN TBL_C ON TBL_B.FLD_1 = TBL_C.FLD_1
WHERE NOT EXISTS
(
    SELECT 1 FROM TBL_A WHERE TBL_A.FLD_1 = TBL_B.FLD_1
)
For a multi-column check it would be roughly this:
SELECT TBL_B.FLD_1, TBL_B.FLD_2, TBL_B.FLD_3, TBL_B.FLD_4
FROM TBL_B
INNER JOIN TBL_C ON TBL_B.FLD_1 = TBL_C.FLD_1
WHERE NOT EXISTS
(
    SELECT 1
    FROM TBL_A
    WHERE TBL_A.FLD_1 = TBL_B.FLD_1 AND
          TBL_A.FLD_2 = TBL_B.FLD_2 AND
          TBL_A.FLD_3 = TBL_B.FLD_3 AND
          TBL_A.FLD_4 = TBL_B.FLD_4
)
The EXCEPT keyword can also be used:
SELECT TBL_B.FLD_1, TBL_B.FLD_2, TBL_B.FLD_3, TBL_B.FLD_4
FROM TBL_B
INNER JOIN TBL_C ON TBL_B.FLD_1 = TBL_C.FLD_1
EXCEPT
SELECT TBL_A.FLD_1, TBL_A.FLD_2, TBL_A.FLD_3, TBL_A.FLD_4
FROM TBL_A
INNER JOIN TBL_B ON TBL_B.FLD_1 = TBL_A.FLD_1 AND
                    TBL_B.FLD_2 = TBL_A.FLD_2 AND
                    TBL_B.FLD_3 = TBL_A.FLD_3 AND
                    TBL_B.FLD_4 = TBL_A.FLD_4
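To perform the actual insert from the question, either form can feed the INSERT directly. A rough sketch using the NOT EXISTS version (assuming, as above, that FLD_1 through FLD_4 all come from TBL_B, and using TBL_A as the target table name throughout):
INSERT INTO TBL_A (FLD_1, FLD_2, FLD_3, FLD_4)
SELECT TBL_B.FLD_1, TBL_B.FLD_2, TBL_B.FLD_3, TBL_B.FLD_4
FROM TBL_B
INNER JOIN TBL_C ON TBL_B.FLD_1 = TBL_C.FLD_1
WHERE NOT EXISTS
(
    SELECT 1
    FROM TBL_A
    WHERE TBL_A.FLD_1 = TBL_B.FLD_1 AND
          TBL_A.FLD_2 = TBL_B.FLD_2 AND
          TBL_A.FLD_3 = TBL_B.FLD_3 AND
          TBL_A.FLD_4 = TBL_B.FLD_4
)
Keep in mind that if other sessions can insert concurrently, the unique key can still be violated between the check and the insert, so be prepared to handle that error.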