postgresql:
I'm writing a query in PostgreSQL which gets stuck while running: it never returns any records. Could anyone help me with this?
Actual Query:
select a.auditdate,b.description as auditcategory,remoteaddress,u.name as user1,e.name || '[' || e.employeenumber || ']' as employee,a.additionalinfo from tblauditlog a
inner join tblauditcategory b on b.cid = a.auditcategory and b.cid<>756
left outer join tbluser u on a.userid=u.cid and u.usertype<>250
left outer join tblemployee e on (a.affectedemployeeid=e.cid or a.affectedemployee=e.cid or a.affectedemployee=e.employeenumber)
where auditdate >= '01 sep 2022' and auditdate <= '15 sep 2022' order by a.auditdate desc
Query Plan:
Nested Loop Left Join (cost=0.71..659969657.09 rows=1026362 width=151)
Join Filter: (a.userid = u.cid)
-> Nested Loop Left Join (cost=0.71..654861596.89 rows=1026362 width=142)
Join Filter: ((a.affectedemployeeid = e.cid) OR ((a.affectedemployee)::text = textin(int4out(e.cid))) OR ((a.affectedemployee)::text = (e.employeenumber)::text))
-> Nested Loop (cost=0.71..268627.45 rows=566679 width=134)
-> Index Scan Backward using idx_tblauditlog_auditdate on tblauditlog a (cost=0.43..101251.69 rows=567370 width=112)
Index Cond: ((auditdate >= '2022-09-01 00:00:00'::timestamp without time zone) AND (auditdate <= '2022-09-15 00:00:00'::timestamp without time zone))
-> Index Scan using tblauditcategory_pkey on tblauditcategory b (cost=0.28..0.30 rows=1 width=30)
Index Cond: (cid = a.auditcategory)
Filter: (cid <> 756)
-> Materialize (cost=0.00..8005.07 rows=46205 width=33)
-> Seq Scan on tblemployee e (cost=0.00..7774.05 rows=46205 width=33)
-> Materialize (cost=0.00..4554.94 rows=331 width=14)
-> Seq Scan on fk_tbluser u (cost=0.00..4553.29 rows=331 width=14)
Filter: (usertype <> '250'::numeric)
(15 rows)
Actual number of records in each table:
tblAuditlog : 6852333
tblAuditCategory : 825
tbluser : 46342
tblemployee : 46014
Index created:
tblAuditlog:
"tblauditlog_pkey" PRIMARY KEY, btree (cid)
"idx_tblauditlog_auditdate" btree (auditdate, auditcategory, userid)
tblAuditcategory:
1. "tblauditcategory_pkey" PRIMARY KEY, btree (cid)
2. "tblauditCategory_unique" UNIQUE CONSTRAINT, btree (code)
3. "idx_tblauditcategory_code" btree (code)
tbluser:
1. "tbluser_pkey" PRIMARY KEY, btree (cid)
2. "tbluser_employeeid_key" UNIQUE CONSTRAINT, btree (employeeid)
3. "tbluser_name_key" UNIQUE CONSTRAINT, btree (name)
4. "uq_fk_tbluser_name_type_employeeid" UNIQUE CONSTRAINT, btree (name, usertype, employeeid)
5. "idx_tbluser_utype" btree (cid, usertype)
tblemployee:
1. "tblemployee_pkey" PRIMARY KEY, btree (cid)
2. "tblemployee_employeeno_key" UNIQUE CONSTRAINT, btree (employeenumber)
3. "tblemployee_guid_key" UNIQUE CONSTRAINT, btree (sid)
4. "idx_tblemployee_employeeno" btree (employeenumber)
Thanks in advance...
Thanks @jjanes. As you suggested, the OR condition was the culprit: an OR in a join condition prevents hash or merge joins, so the planner fell back to a nested loop over the materialized employee table. I have changed the query as shown below, and after the modification it executes within a fraction of a second. Thank you all for your support.
The modified query is:
select a.auditdate,b.description as auditcategory,remoteaddress,u.name as user1,coalesce(e.name,e1.name,e2.name) || '[' || coalesce(e.employeeno,e1.employeeno,e2.employeeno) || ']' as employee,a.additionalinfo from tblauditlog a
inner join tblauditcategory b on b.cid = a.auditcategory and b.cid<>756
left outer join tbluser u on a.userid=u.cid and u.usertype<>250
left outer join tblemployee e on (a.affectedemployeeid=e.cid)
left join tblemployee e1 on (a.affectedemployee=e1.cid)
left join tblemployee e2 on (a.affectedemployee=e2.employeeno)
where auditdate >= '01 sep 2022' and auditdate <= '15 sep 2022' order by a.auditdate desc ;
There is a huge improvement in the query plan:
Sort (cost=433884.73..436109.96 rows=890089 width=151)
Sort Key: a.auditdate DESC
-> Hash Join (cost=252015.26..306723.76 rows=890089 width=151)
Hash Cond: (a.auditcategory = b.cid)
-> Merge Left Join (cost=251985.75..297667.18 rows=891175 width=184)
Merge Cond: ((a.affectedemployee)::text = (e2.employeeno)::text)
-> Merge Left Join (cost=251985.33..275914.91 rows=891175 width=172)
Merge Cond: ((a.affectedemployee)::text = (textin(int4out(e1.cid))))
-> Sort (cost=227637.20..229094.12 rows=582767 width=143)
Sort Key: a.affectedemployee
-> Hash Left Join (cost=20079.20..147328.20 rows=582767 width=143)
Hash Cond: (a.userid = u.cid)
-> Hash Left Join (cost=15491.10..141210.24 rows=582767 width=137)
Hash Cond: (a.affectedemployeeid = e.cid)
-> Index Scan Backward using idx_tblauditlog_auditdate on tblauditlog a (cost=0.43..103968.76 rows=582767 width=112)
Index Cond: ((auditdate >= '2022-09-01 00:00:00'::timestamp without time zone) AND (auditdate <= '2022-09-15 00:00:00'::timestamp without time zone))
-> Hash (cost=13227.63..13227.63 rows=111363 width=33)
-> Seq Scan on tblemployee e (cost=0.00..13227.63 rows=111363 width=33)
-> Hash (cost=4583.89..4583.89 rows=337 width=14)
-> Seq Scan on tbluser u (cost=0.00..4583.89 rows=337 width=14)
Filter: (usertype <> '250'::numeric)
-> Materialize (cost=24348.13..24904.95 rows=111363 width=33)
-> Sort (cost=24348.13..24626.54 rows=111363 width=33)
Sort Key: (textin(int4out(e1.cid)))
-> Seq Scan on tblemployee e1 (cost=0.00..13227.63 rows=111363 width=33)
-> Index Scan using tblemployee_employeeno_key on tblemployee e2 (cost=0.42..15775.39 rows=111363 width=29)
-> Hash (cost=19.26..19.26 rows=820 width=30)
-> Seq Scan on tblauditcategory b (cost=0.00..19.26 rows=820 width=30)
Filter: (cid <> 756)
(29 rows)
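One caveat with the COALESCE version: if a branch matches but its name is NULL, the label can combine the name from one branch with the employee number from another. Building the whole label per branch avoids that; a sketch, using the same aliases as the modified query (a branch whose name is NULL yields NULL for the whole label and simply falls through to the next branch):
coalesce(e.name  || '[' || e.employeeno  || ']',
         e1.name || '[' || e1.employeeno || ']',
         e2.name || '[' || e2.employeeno || ']') as employee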
Thank you once again...
I'm trying to migrate from SQL Server to PostgreSQL.
Here is my PostgreSQL code:
Create View person_names As
SELECT lp."Code", n."Name", n."Type"
from "Persons" lp
Left Join LATERAL
(
Select *
From "Names" n
Where n.id = lp.id
Order By "Date" desc
Limit 1
) n on true
limit 100;
Explain
Select "Code" From person_names;
It prints
"Subquery Scan on person_names (cost=0.42..448.85 rows=100 width=10)"
" -> Limit (cost=0.42..447.85 rows=100 width=56)"
" -> Nested Loop Left Join (cost=0.42..303946.91 rows=67931 width=56)"
" -> Seq Scan on ""Persons"" lp (cost=0.00..1314.31 rows=67931 width=10)"
" -> Limit (cost=0.42..4.44 rows=1 width=100)"
" -> Index Only Scan Backward using ""IX_Names_Person"" on ""Names"" n (cost=0.42..4.44 rows=1 width=100)"
" Index Cond: ("id" = (lp."id")::numeric)"
Why is there an "Index Only Scan" on the "Names" table? That table is not required to produce the result. On SQL Server I get only a single scan over the "Persons" table.
How can I tune Postgres to get a better query plan? I'm using the latest version, PostgreSQL 15 beta 3.
Here is the SQL Server version:
Create View person_names As
SELECT top 100 lp."Code", n."Name", n."Type"
from "Persons" lp
Outer Apply
(
Select Top 1 *
From "Names" n
Where n.id = lp.id
Order By "Date" desc
) n
GO
SET SHOWPLAN_TEXT ON;
GO
Select "Code" From person_names;
It gives the correct execution plan:
|--Top(TOP EXPRESSION:((100)))
|--Index Scan(OBJECT:([Persons].[IX_Persons] AS [lp]))
Change the lateral join to a regular left join, then Postgres is able to remove the select on the Names table:
create View person_names
As
SELECT lp.Code, n.Name, n.Type
from Persons lp
Left Join (
Select distinct on (id) *
From Names n
Order By id, Date desc
) n on n.id = lp.id
limit 100;
The following index will support the distinct on () in case you do include columns from the Names table:
create index on "Names"(id, "Date" desc);
For select code from person_names this gives me this plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on persons lp (cost=0.00..309.00 rows=20000 width=7) (actual time=0.009..1.348 rows=20000 loops=1)
Planning Time: 0.262 ms
Execution Time: 1.738 ms
For select Code, name, type From person_names; this gives me this plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=559.42..14465.93 rows=20000 width=25) (actual time=5.585..68.545 rows=20000 loops=1)
Hash Cond: (n.id = lp.id)
-> Unique (cost=0.42..13653.49 rows=20074 width=26) (actual time=0.053..57.323 rows=20000 loops=1)
-> Index Scan using names_id_date_idx on names n (cost=0.42..12903.49 rows=300000 width=26) (actual time=0.052..41.125 rows=300000 loops=1)
-> Hash (cost=309.00..309.00 rows=20000 width=11) (actual time=5.407..5.407 rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1116kB
-> Seq Scan on persons lp (cost=0.00..309.00 rows=20000 width=11) (actual time=0.011..2.036 rows=20000 loops=1)
Planning Time: 0.460 ms
Execution Time: 69.180 ms
Of course I had to guess the table structures as you haven't provided any DDL.
Change your view definition like this:
create view person_names as
select p."Code",
(select "Name"
from "Names" n
where n.id = p.id
order by "Date" desc
limit 1)
from "Persons" p
limit 100;
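With this form, the scalar subquery is just an output column of the view, so when a query references only "Code", Postgres can discard the unused column and skip "Names" entirely. A quick check (a sketch, using the same assumed table names as above):
EXPLAIN SELECT "Code" FROM person_names;
-- expected: a plan with a single scan on "Persons" and no node touching "Names"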
I have a log table with about 5 million records:
id BIGSERIAL,
object_type_name VARCHAR(255),
object_id BIGINT,
user_id BIGINT,
service_id BIGINT,
op_id INTEGER,
dt TIMESTAMP(0) WITHOUT TIME ZONE DEFAULT now(),
property_name VARCHAR(255),
CONSTRAINT object_log_object_log_pkey PRIMARY KEY(id)
I need to delete duplicate records leaving only the latest record (the one with the max id). The problem is that my query is very slow (> 1 min):
DELETE FROM sys.object_log AS t5
USING (
SELECT t3.id
FROM sys.object_log t3 LEFT JOIN (
SELECT t1.id
FROM sys.object_log t1
WHERE t1.id = (
SELECT max(t2.id)
FROM sys.object_log t2
WHERE t2.object_type_name = t1.object_type_name
AND t2.object_id = t1.object_id
AND t2.property_name = t1.property_name
)
) t4 ON t3.id=t4.id
WHERE t4.id IS NULL
) t6
WHERE t5.id = t6.id
QUERY PLAN
Delete on object_log t5 (cost=1.30..72821293.06 rows=8298362 width=18)
-> Merge Join (cost=1.30..72821293.06 rows=8298362 width=18)
Merge Cond: (t3.id = t5.id)
-> Merge Anti Join (cost=0.86..72365877.02 rows=8298362 width=20)
Merge Cond: (t3.id = t1.id)
-> Index Scan using object_log_object_log_pkey on object_log t3 (cost=0.43..330836.36 rows=8340062 width=14)
-> Index Scan using object_log_object_log_pkey on object_log t1 (cost=0.43..72013669.25 rows=41700 width=14)
Filter: (id = (SubPlan 1))
SubPlan 1
-> Aggregate (cost=8.58..8.59 rows=1 width=8)
-> Index Only Scan using object_log_idx1 on object_log t2 (cost=0.56..8.58 rows=1 width=8)
Index Cond: ((object_type_name = (t1.object_type_name)::text) AND (object_id = t1.object_id) AND (property_name = (t1.property_name)::text))
-> Index Scan using object_log_object_log_pkey on object_log t5 (cost=0.43..330836.36 rows=8340062 width=14)
Any idea how to improve performance?
UPD.1
The next query is also slow:
DELETE FROM sys.object_log
WHERE id IN (
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER w AS rnum
FROM sys.object_log
WINDOW w AS (PARTITION BY object_type_name, object_id, property_name ORDER BY id)
) t
WHERE t.rnum > 1)
QUERY PLAN
Delete on object_log (cost=1703454.67..1960873.74 rows=2780021 width=38)
-> Hash Semi Join (cost=1703454.67..1960873.74 rows=2780021 width=38)
Hash Cond: (object_log.id = t.id)
-> Seq Scan on object_log (cost=0.00..197648.62 rows=8340062 width=14)
-> Hash (cost=1668704.40..1668704.40 rows=2780021 width=40)
-> Subquery Scan on t (cost=1355952.08..1668704.40 rows=2780021 width=40)
Filter: (t.rnum > 1)
-> WindowAgg (cost=1355952.08..1564453.63 rows=8340062 width=38)
-> Sort (cost=1355952.08..1376802.23 rows=8340062 width=30)
Sort Key: object_log_1.object_type_name, object_log_1.object_id, object_log_1.property_name, object_log_1.id
-> Seq Scan on object_log object_log_1 (cost=0.00..197648.62 rows=8340062 width=30)
This is what I use for this: https://wiki.postgresql.org/wiki/Deleting_duplicates
Hope you find it useful.
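For example, the pattern from that page, adapted here to keep the highest id per (object_type_name, object_id, property_name) group; a sketch that assumes those three columns are NOT NULL, since a NULL key would not match the equality conditions:
DELETE FROM sys.object_log a
USING (
    SELECT max(id) AS max_id,
           object_type_name, object_id, property_name
    FROM sys.object_log
    GROUP BY object_type_name, object_id, property_name
    HAVING count(*) > 1          -- only groups that actually have duplicates
) b
WHERE a.object_type_name = b.object_type_name
  AND a.object_id = b.object_id
  AND a.property_name = b.property_name
  AND a.id <> b.max_id;          -- keep only the row with the max id per group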
Bjarni
I want to index my tables for the following query:
select
t.*
from main_transaction t
left join main_profile profile on profile.id = t.profile_id
left join main_customer customer on (customer.id = profile.user_id)
where
(upper(t.request_no) like upper('%'||#requestNumber||'%') or upper(customer.phone) like upper('%'||#phoneNumber||'%'))
and t.service_type = 'SERVICE_1'
and t.status = 'SUCCESS'
and t.mode = 'AUTO'
and t.transaction_type = 'WITHDRAW'
and customer.client = 'corp'
and t.pub_date>='2018-09-05' and t.pub_date<='2018-11-05'
order by t.pub_date desc, t.id asc
LIMIT 1000;
This is how I tried to index my tables:
CREATE INDEX main_transaction_pr_id ON main_transaction (profile_id);
CREATE INDEX main_profile_user_id ON main_profile (user_id);
CREATE INDEX main_customer_client ON main_customer (client);
CREATE INDEX main_transaction_gin_req_no ON main_transaction USING gin (upper(request_no) gin_trgm_ops);
CREATE INDEX main_customer_gin_phone ON main_customer USING gin (upper(phone) gin_trgm_ops);
CREATE INDEX main_transaction_general ON main_transaction (service_type, status, mode, transaction_type); --> don't know if this one is true!!
After creating the indexes above, my query still takes over 4.5 seconds just to select 1000 rows!
I am selecting from the following table, which has 34 columns (including 3 FOREIGN KEYs) and over 3 million rows:
CREATE TABLE main_transaction (
id integer NOT NULL DEFAULT nextval('main_transaction_id_seq'::regclass),
description character varying(255) NOT NULL,
request_no character varying(18),
account character varying(50),
service_type character varying(50),
pub_date" timestamptz(6) NOT NULL,
"service_id" varchar(50) COLLATE "pg_catalog"."default",
....
);
I am also joining two tables (main_profile, main_customer) to search customer.phone and to select customer.client. To get from the main_transaction table to the main_customer table, I can only go through main_profile.
My question is: how can I index my tables to increase performance for the above query?
Please do not suggest using UNION instead of OR for the condition (upper(t.request_no) like upper('%'||#requestNumber||'%') or upper(customer.phone) like upper('%'||#phoneNumber||'%')); could we use a CASE WHEN condition instead? I have to convert this PostgreSQL query into Hibernate JPA, and I don't know how to express a UNION except through Hibernate native SQL, which I am not allowed to use.
Explain:
Limit (cost=411601.73..411601.82 rows=38 width=1906) (actual time=3885.380..3885.381 rows=1 loops=1)
-> Sort (cost=411601.73..411601.82 rows=38 width=1906) (actual time=3885.380..3885.380 rows=1 loops=1)
Sort Key: t.pub_date DESC, t.id
Sort Method: quicksort Memory: 27kB
-> Hash Join (cost=20817.10..411600.73 rows=38 width=1906) (actual time=3214.473..3885.369 rows=1 loops=1)
Hash Cond: (t.profile_id = profile.id)
Join Filter: ((upper((t.request_no)::text) ~~ '%20181104-2158-2723948%'::text) OR (upper((customer.phone)::text) ~~ '%20181104-2158-2723948%'::text))
Rows Removed by Join Filter: 593118
-> Seq Scan on main_transaction t (cost=0.00..288212.28 rows=205572 width=1906) (actual time=0.068..1527.677 rows=593119 loops=1)
Filter: ((pub_date >= '2016-09-05 00:00:00+05'::timestamp with time zone) AND (pub_date <= '2018-11-05 00:00:00+05'::timestamp with time zone) AND ((service_type)::text = 'SERVICE_1'::text) AND ((status)::text = 'SUCCESS'::text) AND ((mode)::text = 'AUTO'::text) AND ((transaction_type)::text = 'WITHDRAW'::text))
Rows Removed by Filter: 2132732
-> Hash (cost=17670.80..17670.80 rows=180984 width=16) (actual time=211.211..211.211 rows=181516 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 3166kB
-> Hash Join (cost=6936.09..17670.80 rows=180984 width=16) (actual time=46.846..183.689 rows=181516 loops=1)
Hash Cond: (customer.id = profile.user_id)
-> Seq Scan on main_customer customer (cost=0.00..5699.73 rows=181106 width=16) (actual time=0.013..40.866 rows=181618 loops=1)
Filter: ((client)::text = 'corp'::text)
Rows Removed by Filter: 16920
-> Hash (cost=3680.04..3680.04 rows=198404 width=8) (actual time=46.087..46.087 rows=198404 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 2966kB
-> Seq Scan on main_profile profile (cost=0.00..3680.04 rows=198404 width=8) (actual time=0.008..20.099 rows=198404 loops=1)
Planning time: 0.757 ms
Execution time: 3885.680 ms
With the restriction to not use UNION, you won't get a good plan.
You can slightly speed up processing with the following indexes:
main_transaction ((service_type::text), (status::text), (mode::text),
(transaction_type::text), pub_date)
main_customer ((client::text))
These should at least get rid of the sequential scans, but the hash join that takes the lion's share of the processing time will remain.
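Spelled out as DDL, those suggestions would look roughly like this (a sketch; the index names are made up for illustration):
CREATE INDEX idx_main_transaction_filter ON main_transaction
    ((service_type::text), (status::text), (mode::text),
     (transaction_type::text), pub_date);

CREATE INDEX idx_main_customer_client ON main_customer ((client::text));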
I have a query running in Postgres 9.3.9 where I want to delete some records from a temp table based on using an EXISTS clause that matches a specific partial index condition I created. The following related query uses an Index Only Scan on this partial index (abbreviated as 'conditions' below):
EXPLAIN
SELECT l.id
FROM temp_table l
WHERE NOT EXISTS
(SELECT 1
FROM customers cx
WHERE cx.id = l.customer_id
AND ( conditions ));
QUERY PLAN
----------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=0.42..252440.38 rows=43549 width=4)
-> Seq Scan on temp_table l (cost=0.00..1277.98 rows=87098 width=8)
-> Index Only Scan using customers__bad on customers cx (cost=0.42..3.35 rows=1 width=4)
Index Cond: (id = l.customer_id)
(4 rows)
Here is the actual DELETE query. It doesn't use the same Index Only Scan as above, though I am convinced it should, and I wonder if this is a bug in Postgres. Notice the higher cost:
DELETE
FROM temp_table l
WHERE EXISTS(SELECT 1
FROM cnu.customers cx
WHERE cx.id = l.customer_id
AND ( conditions ));
QUERY PLAN
------------------------------------------------------------------------------------------------
Delete on temp_table l (cost=0.42..495426.94 rows=43549 width=12)
-> Nested Loop Semi Join (cost=0.42..495426.94 rows=43549 width=12)
-> Seq Scan on temp_table l (cost=0.00..1277.98 rows=87098 width=10)
-> Index Scan using customers__bad on customers cx (cost=0.42..6.67 rows=1 width=10)
Index Cond: (id = l.customer_id)
(5 rows)
To show that the DELETE should be able to get the same plan, I rewrote it as follows. This gave me the plan I wanted, and it ran twice as fast as the query above that uses an Index Scan instead of an Index Only Scan:
WITH the_right_records AS
(SELECT l.id
FROM temp_table l
WHERE NOT EXISTS
(SELECT 1
FROM cnu.customers cx
WHERE cx.id = l.customer_id
AND ( conditions )))
DELETE FROM temp_table t
WHERE NOT EXISTS (SELECT 1
FROM the_right_records x
WHERE x.id = t.id);
QUERY PLAN
------------------------------------------------------------------------------------------------------
Delete on temp_table t (cost=253855.72..256902.88 rows=43549 width=34)
CTE the_right_records
-> Nested Loop Anti Join (cost=0.42..252440.38 rows=43549 width=4)
-> Seq Scan on temp_table l (cost=0.00..1277.98 rows=87098 width=8)
-> Index Only Scan using customers__bad on customers cx (cost=0.42..3.35 rows=1 width=4)
Index Cond: (id = l.customer_id)
-> Hash Anti Join (cost=1415.34..4462.50 rows=43549 width=34)
Hash Cond: (t.id = x.id)
-> Seq Scan on temp_table t (cost=0.00..1277.98 rows=87098 width=10)
-> Hash (cost=870.98..870.98 rows=43549 width=32)
-> CTE Scan on the_right_records x (cost=0.00..870.98 rows=43549 width=32)
(11 rows)
I've noticed this same behavior in other examples. So, does anyone have any ideas?
explain select count(1) from tab1_201502 t1, tab2_201502 t2
where t1.serv_no=t2.serv_no
and t1.PC_LOGIN_COUNT1 >5
and t1.FET_WZ_FEE < 80
and t2.ALL_FLOW_2G<50;
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=4358706.25..4358706.26 rows=1 width=0)
-> Merge Join (cost=4339930.99..4358703.30 rows=1179 width=0)
Merge Cond: ((t1.serv_no)::text = (t2.serv_no)::text)
-> Index Scan using tab1_201502_serv_no_idx on tab1_201502 t1
(cost=0.56..6239071.57 rows=263219 width=12)
Filter: ((pc_login_count1 > 5::numeric)
AND (fet_wz_fee < 80::numeric))
-> Sort (cost=4339914.76..4340306.63 rows=156747 width=12)
Sort Key: t2.serv_no
-> Seq Scan on tab2_201502 t2
(cost=0.00..4326389.00 rows=156747 width=12)
Filter: (all_flow_2g < 50::numeric)
All tables are indexed on serv_no.
Why is PostgreSQL ignoring the tab2_201502 index for this scan?
This is your query:
select count(1)
from tab1_201502 t1 join
tab2_201502 t2
on t1.serv_no = t2.serv_no
where t1.PC_LOGIN_COUNT1 > 5 and t1.FET_WZ_FEE < 80 and t2.ALL_FLOW_2G < 50;
Postgres is deciding that filtering by the where clause is more important than performing the join.
I would recommend trying two sets of indexes for this query. The first pair is tab2_201502(ALL_FLOW_2G, serv_no) and tab1_201502(serv_no, PC_LOGIN_COUNT1, FET_WZ_FEE).
The second pair is tab1_201502(PC_LOGIN_COUNT1, FET_WZ_FEE, serv_no) and tab2_201502(serv_no, ALL_FLOW_2G).
Which works better depends on which table is the driving table for the join.
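As DDL, the two candidate sets would look roughly like this (a sketch; try each pair and keep whichever one the planner actually uses):
-- first pair: drive the join from tab2_201502
CREATE INDEX ON tab2_201502 (all_flow_2g, serv_no);
CREATE INDEX ON tab1_201502 (serv_no, pc_login_count1, fet_wz_fee);

-- second pair: drive the join from tab1_201502
CREATE INDEX ON tab1_201502 (pc_login_count1, fet_wz_fee, serv_no);
CREATE INDEX ON tab2_201502 (serv_no, all_flow_2g);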