My Postgres server details are below:
Postgres version 13, RAM 52 GB, SSD 1000 GB, and the DB size is 300 GB.
Here is my query:
select distinct "col2","col3","col1"
from table1(foreign table)
where "col2" not in (select "col4"
from table2(foreign table)
where "col9" = 'data1'
and "col10"='A')
and "col2" not in (select "col13"
from table5(foreign table)
where "col11" = 'A'
and "col12" in ('data1', 'data2', 'data3', 'data4'))
and "col6" > '2022-01-01' and "col10" = 'A' and "col18" = 'P'
and not "col7" = 'V' and "Type" = 'A'
order by "col1"
Here is my explain plan:
"Unique (cost=372.13..372.14 rows=1 width=1074) (actual time=145329.010..145329.136 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Buffers: shared hit=3"
" -> Sort (cost=372.13..372.14 rows=1 width=1074) (actual time=145329.008..145329.027 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Sort Key: table1.""col1"", table1.""col2"", table1.""col3"""
" Sort Method: quicksort Memory: 63kB"
" Buffers: shared hit=3"
" -> Foreign Scan on public.table1 (cost=360.38..372.12 rows=1 width=1074) (actual time=144430.980..145327.532 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))"
" Rows Removed by Filter: 253144"
" Remote SQL: SELECT ""col2"", ""col3"", ""col1"" FROM dbo.table4 WHERE ((""col6"" > '2022-01-01 00:00:00'::timestamp without time zone)) AND ((""col7"" <> 'V'::text)) AND ((""col8"" = 'A'::text))"
" SubPlan 1"
" -> Foreign Scan on public.table2 (cost=100.00..128.63 rows=1 width=42) (actual time=2.169..104702.862 rows=50573365 loops=1)"
" Output: table2.""col4"""
" Remote SQL: SELECT ""col5"" FROM dbo.table3 WHERE ((""col9"" = 'data1'::text)) AND ((""col10"" = 'A'::text))"
" SubPlan 2"
" -> Foreign Scan on public.table5 (cost=100.00..131.74 rows=1 width=42) (actual time=75.363..1015.498 rows=360240 loops=1)"
" Output: table5.""col13"""
" Remote SQL: SELECT ""col14"" FROM dbo.table6 WHERE ((""col11"" = 'A'::text)) AND ((""col12"" = ANY ('{data1,data2,data3,data4}'::text[])))"
"Planning:"
" Buffers: shared hit=142"
"Planning Time: 1.887 ms"
"Execution Time: 145620.958 ms"
table1 - 4 million rows
table2 - 250 million rows
table3 - 400 million rows
Table Definition table1
CREATE TABLE IF NOT EXISTS table1
(
"col1" character varying(12) ,
"col" character varying(1) ,
"col" character varying(1) ,
...
...
);
Indexes exist on other columns, but not on the query columns "col2", "col3", "col1".
Table Definition table2
CREATE TABLE IF NOT EXISTS table2
(
"col4" character varying(12) ,
"col9" character varying(1) ,
"col10" character varying(1) ,
...
...
);
Indexes on table2:
CREATE INDEX index1 ON table2("col4" ASC,"col9" ASC,"col" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col" ASC,"col9" ASC,"col4" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col9" ASC,"col4" ASC,"col" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col" ASC,"col9" ASC,"col10" ASC,"col" ASC);
Table Definition table5
CREATE TABLE IF NOT EXISTS table5
(
"col11" character varying(12) ,
"col13" character varying(1) ,
"col" character varying(1) ,
...
...
);
Indexes on table5:
CREATE INDEX index ON table5("col" ASC, "col" ASC,"col11" ASC);
CREATE INDEX index ON table5("col13" ASC,"col11" ASC);
CREATE INDEX index ON table5("col" ASC,"col13" ASC,"col11" ASC)INCLUDE ("col");
CREATE INDEX index ON table5("col" ASC, "col" ASC,"col11" ASC);
How can I speed up this query? It took 3 minutes just to retrieve 365 records.
Here is my EXPLAIN (ANALYZE, BUFFERS)
"Unique (cost=372.13..372.14 rows=1 width=1074) (actual time=110631.114..110631.262 rows=336 loops=1)"
" -> Sort (cost=372.13..372.14 rows=1 width=1074) (actual time=110631.111..110631.142 rows=336 loops=1)"
" Sort Key: table1.""col1"", table1.""col2"", table1.""col3"""
" Sort Method: quicksort Memory: 63kB"
" -> Foreign Scan on table1 (cost=360.38..372.12 rows=1 width=1074) (actual time=110432.132..110629.640 rows=336 loops=1)"
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))"
" Rows Removed by Filter: 253144"
" SubPlan 1"
" -> Foreign Scan on table2 (cost=100.00..128.63 rows=1 width=42) (actual time=63638.173..71979.772 rows=50573365 loops=1)"
" SubPlan 2"
" -> Foreign Scan on table5 (cost=100.00..131.74 rows=1 width=42) (actual time=569.126..630.782 rows=360240 loops=1)"
"Planning Time: 0.266 ms"
"Execution Time: 111748.715 ms"
Here is my EXPLAIN (ANALYZE, BUFFERS) of the "remote SQL" when executed on the remote database
"Limit (cost=4157478.69..4157602.66 rows=1000 width=47) (actual time=68356.908..68681.831 rows=336 loops=1)"
" Buffers: shared hit=66205118"
" -> Unique (cost=4157478.69..4164948.04 rows=60253 width=47) (actual time=68356.905..68681.801 rows=336 loops=1)"
" Buffers: shared hit=66205118"
" -> Gather Merge (cost=4157478.69..4164496.14 rows=60253 width=47) (actual time=68356.901..68681.718 rows=336 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=66205118"
" -> Sort (cost=4156478.66..4156541.43 rows=25105 width=47) (actual time=66154.447..66154.459 rows=112 loops=3)"
" Sort Key: table4.""col1"", table4.""col2"", table4.""col3"""
" Sort Method: quicksort Memory: 63kB"
" Buffers: shared hit=66205118"
" Worker 0: Sort Method: quicksort Memory: 25kB"
" Worker 1: Sort Method: quicksort Memory: 25kB"
" -> Parallel Seq Scan on table4 (cost=3986703.25..4154644.03 rows=25105 width=47) (actual time=66041.929..66153.663 rows=112 loops=3)"
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (""col6"" > '2022-01-01 00:00:00'::timestamp without time zone) AND ((""col7"")::text <> 'V'::text) AND ((""col8"")::text = 'A'::text))"
" Rows Removed by Filter: 1236606"
" Buffers: shared hit=66205102"
" SubPlan 1"
" -> Index Only Scan using col20 on table3 (cost=0.70..2696555.01 rows=50283867 width=13) (actual time=0.134..25085.583 rows=50573365 loops=3)"
" Index Cond: ((""col9"" = 'data1'::text) AND (""col10"" = 'A'::text))"
" Heap Fetches: 0"
" Buffers: shared hit=65737946"
" SubPlan 2"
" -> Bitmap Heap Scan on table6 (cost=4962.91..1163549.12 rows=355779 width=13) (actual time=160.770..440.978 rows=360240 loops=3)"
" Recheck Cond: (((""col12"")::text = ANY ('{data1,data2,data3,data4}'::text[])) AND ((""col11"")::text = 'A'::text))"
" Heap Blocks: exact=110992"
" Buffers: shared hit=333992"
" -> Bitmap Index Scan on col21 (cost=0.00..4873.97 rows=355779 width=0) (actual time=120.354..120.354 rows=360240 loops=3)"
" Index Cond: (((""col12"")::text = ANY ('{data1,data2,data3,data4}'::text[])) AND ((""col11"")::text = 'A'::text))"
" Buffers: shared hit=1016"
"Planning:"
" Buffers: shared hit=451"
"Planning Time: 4.039 ms"
"Execution Time: 69001.171 ms"
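For reference, `NOT IN (subquery)` forces Postgres to materialize the full subquery result (here, 50 million rows pulled through the foreign scan), and it also behaves surprisingly when the subquery can return NULLs. The equivalent anti-join form uses `NOT EXISTS`, which the planner can turn into a hashed anti-join. This is only a sketch against the anonymized names above; with postgres_fdw the actual benefit depends on how much of the query can be pushed down to the remote server:

```sql
SELECT DISTINCT "col2", "col3", "col1"
FROM table1 t1
WHERE NOT EXISTS (
        SELECT 1
        FROM table2 t2
        WHERE t2."col4" = t1."col2"      -- correlated anti-join replaces NOT IN
          AND t2."col9" = 'data1'
          AND t2."col10" = 'A')
  AND NOT EXISTS (
        SELECT 1
        FROM table5 t5
        WHERE t5."col13" = t1."col2"
          AND t5."col11" = 'A'
          AND t5."col12" IN ('data1', 'data2', 'data3', 'data4'))
  AND t1."col6" > '2022-01-01'
  AND t1."col10" = 'A'
  AND t1."col18" = 'P'
  AND t1."col7" <> 'V'
  AND t1."Type" = 'A'
ORDER BY "col1";
```

Note one semantic difference: if either subquery could return a NULL, `NOT IN` returns no rows at all, while `NOT EXISTS` simply ignores the NULLs.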
I use Postgres 10.4 in a Linux environment.
I have a query which is very slow when I run it.
Searching on subject in Arabic is very slow, and the problem also exists with subjects in other languages.
I have about 1 million records per year.
I tried to add an index on the transfer table, but the result is the same:
CREATE INDEX subject
ON public.transfer
USING btree
(subject COLLATE pg_catalog."default" varchar_pattern_ops);
This is the query:
select * from ( with scope as (
select unit_id from public.sec_unit
where emp_id= 'EM-001'and app_type in ('S','E') )
select CAST (row_number() OVER (PARTITION BY advsearch.correspondenceid)
as VARCHAR(15)) as numline, advsearch.*
from (
SELECT Transfer.id AS id, CORRESP.id AS correspondenceId,
Transfer.correspondencecopy_id AS correspondencecopyId, Transfer.datesendjctransfer AS datesendjctransfer
FROM Transfer Transfer
Left outer JOIN correspondencecopy CORRESPCPY ON Transfer.correspondencecopy_id = CORRESPCPY.id
Left outer JOIN correspondence CORRESP ON CORRESP.id = CORRESPCPY.correspondence_id
LEFT OUTER JOIN scope sc on sc.unit_id = Transfer.unittransto_id or sc.unit_id='allorg'
LEFT OUTER JOIN employee emp on emp.id = 'EM-001'
WHERE transfer.status='1' AND (Transfer.docyearhjr='1441' )
AND (Transfer.subject like '%'||'رقم'||'%')
AND ( sc.unit_id is not null )
AND (coalesce(emp.confidentiel,'0') >= coalesce(Transfer.confidentiel,'0'))
)
advsearch ) Searchlist
WHERE Searchlist.numline='1'
ORDER BY Searchlist.datesendjctransfer
Can someone help me optimise the query?
Updated:
I tried to change the query: I eliminated the use of scope and replaced it with a simple condition.
I get the same result (the same number of records), but the problem remains: the query is still very slow.
select * from (
select CAST (row_number() OVER (PARTITION BY advsearch.correspondenceid)
as VARCHAR(15)) as numline, advsearch.*
from (
SELECT Transfer.id AS id, CORRESP.id AS correspondenceId,
Transfer.correspondencecopy_id AS correspondencecopyId, Transfer.datesendjctransfer AS datesendjctransfer
FROM Transfer Transfer
Left outer JOIN correspondencecopy CORRESPCPY ON Transfer.correspondencecopy_id = CORRESPCPY.id
Left outer JOIN correspondence CORRESP ON CORRESP.id = CORRESPCPY.correspondence_id
LEFT OUTER JOIN employee emp on emp.id = 'EM-001'
WHERE transfer.status='1' and ( Transfer.unittransto_id in (
select unit_id from public.security_employee_unit
where employee_id= 'EM-001'and app_type in ('E','S') )
or 'allorg' in ( select unit_id from public.security_employee_unit
where employee_id= 'EM-001'and app_type in ('S')))
AND (Transfer.docyearhjr='1441' )
AND (Transfer.subject like '%'||'رقم'||'%')
AND (coalesce(emp.confidentiel,'0') >= coalesce(Transfer.confidentiel,'0'))
)
advsearch ) Searchlist
WHERE Searchlist.numline='1'
ORDER BY Searchlist.datesendjctransfer
Updated:
I analyzed the query using EXPLAIN ANALYZE. This is the result:
"Sort (cost=412139.09..412139.13 rows=17 width=87) (actual time=1481.951..1482.166 rows=4497 loops=1)"
" Sort Key: searchlist.datesendjctransfer"
" Sort Method: quicksort Memory: 544kB"
" -> Subquery Scan on searchlist (cost=412009.59..412138.74 rows=17 width=87) (actual time=1457.717..1480.381 rows=4497 loops=1)"
" Filter: ((searchlist.numline)::text = '1'::text)"
" Rows Removed by Filter: 38359"
" -> WindowAgg (cost=412009.59..412095.69 rows=3444 width=87) (actual time=1457.715..1477.146 rows=42856 loops=1)"
" CTE scope"
" -> Bitmap Heap Scan on security_employee_unit (cost=8.59..15.83 rows=2 width=7) (actual time=0.043..0.058 rows=2 loops=1)"
" Recheck Cond: (((employee_id)::text = 'EM-001'::text) AND ((app_type)::text = ANY ('{SE,I}'::text[])))"
" Heap Blocks: exact=2"
" -> Bitmap Index Scan on employeeidkey (cost=0.00..8.59 rows=2 width=0) (actual time=0.037..0.037 rows=2 loops=1)"
" Index Cond: (((employee_id)::text = 'EM-001'::text) AND ((app_type)::text = ANY ('{SE,I}'::text[])))"
" -> Sort (cost=411993.77..412002.38 rows=3444 width=39) (actual time=1457.702..1461.773 rows=42856 loops=1)"
" Sort Key: corresp.id"
" Sort Method: external merge Disk: 2440kB"
" -> Nested Loop Left Join (cost=18315.99..411791.43 rows=3444 width=39) (actual time=1271.209..1295.423 rows=42856 loops=1)"
" Filter: ((COALESCE(emp.confidentiel, '0'::character varying))::text >= (COALESCE(transfer.confidentiel, '0'::character varying))::text)"
" -> Nested Loop (cost=18315.71..411628.14 rows=10333 width=41) (actual time=1271.165..1283.365 rows=42856 loops=1)"
" Join Filter: (((sc.unit_id)::text = (transfer.unittransto_id)::text) OR ((sc.unit_id)::text = 'allorg'::text))"
" Rows Removed by Join Filter: 42856"
" -> CTE Scan on scope sc (cost=0.00..0.04 rows=2 width=48) (actual time=0.045..0.064 rows=2 loops=1)"
" Filter: (unit_id IS NOT NULL)"
" -> Materialize (cost=18315.71..411292.44 rows=10328 width=48) (actual time=53.970..635.651 rows=42856 loops=2)"
" -> Gather (cost=18315.71..411240.80 rows=10328 width=48) (actual time=107.919..1254.600 rows=42856 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Nested Loop Left Join (cost=17315.71..409208.00 rows=4303 width=48) (actual time=104.436..1250.461 rows=14285 loops=3)"
" -> Nested Loop Left Join (cost=17315.28..405979.02 rows=4303 width=48) (actual time=104.382..1136.591 rows=14285 loops=3)"
" -> Parallel Bitmap Heap Scan on transfer (cost=17314.85..377306.25 rows=4303 width=39) (actual time=104.287..996.609 rows=14285 loops=3)"
" Recheck Cond: ((docyearhjr)::text = '1441'::text)"
" Rows Removed by Index Recheck: 437299"
" Filter: (((subject)::text ~~ '%رقم%'::text) AND ((status)::text = '1'::text))"
" Rows Removed by Filter: 297178"
" Heap Blocks: exact=14805 lossy=44734"
" -> Bitmap Index Scan on docyearhjr (cost=0.00..17312.27 rows=938112 width=0) (actual time=96.028..96.028 rows=934389 loops=1)"
" Index Cond: ((docyearhjr)::text = '1441'::text)"
" -> Index Scan using pk_correspondencecopy on correspondencecopy correspcpy (cost=0.43..6.66 rows=1 width=21) (actual time=0.009..0.009 rows=1 loops=42856)"
" Index Cond: ((transfer.correspondencecopy_id)::text = (id)::text)"
" -> Index Only Scan using pk_correspondence on correspondence corresp (cost=0.42..0.75 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=42856)"
" Index Cond: (id = (correspcpy.correspondence_id)::text)"
" Heap Fetches: 14227"
" -> Materialize (cost=0.28..8.31 rows=1 width=2) (actual time=0.000..0.000 rows=1 loops=42856)"
" -> Index Scan using pk_employee on employee emp (cost=0.28..8.30 rows=1 width=2) (actual time=0.038..0.038 rows=1 loops=1)"
" Index Cond: ((id)::text = 'EM-001'::text)"
"Planning time: 1.595 ms"
"Execution time: 1487.303 ms"
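Worth noting: a b-tree index with `varchar_pattern_ops` only helps left-anchored patterns such as `subject LIKE 'abc%'`; it cannot serve an infix search like `LIKE '%رقم%'`, which is why the index above changes nothing. The usual tool for `%...%` searches is a trigram index. A sketch (assumes the pg_trgm extension can be installed; the index name is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index: can serve LIKE/ILIKE '%...%' predicates
CREATE INDEX transfer_subject_trgm
    ON public.transfer
    USING gin (subject gin_trgm_ops);
```

pg_trgm works on any text, including Arabic, though very short search terms (under three characters) gain little from it.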
Disclaimer: I am using Entity Framework Core, so I am somewhat restricted in what shape the query below can take.
I have a large customer database (1.1 million records) and am using an API to select a single customer by mca_id, bringing back the customer and all their related data from contact_information, address, and so on. It currently takes around 5 - 700ms, which seems very slow for retrieving a single record.
Here is the query, designed to bring back all information related to this customer. Note that Entity Framework Core (.NET/C#) enforces the ORDER BY at the bottom, so there's not much I can do about that.
SELECT
t.customer_internal_id,
t.business_partner_id,
t.created_date,
t.customer_type,
t.date_of_birth,
t.first_name,
t.gender,
t.home_store_id,
t.home_store_updated,
t.last_name,
t.loyalty_db_id,
t.mca_id,
t.status,
t.status_reason,
t.store_joined,
t.title,
t.updated_by,
t.updated_date,
t.updating_store,
c0.contact_internal_id,
c0.contact_type,
c0.contact_value,
c0.created_date,
c0.customer_internal_id,
c0.updated_by,
c0.updated_date,
c0.updating_store,
c0.validated,
a.address_internal_id,
a.address_line_1,
a.address_line_2,
a.address_type,
a.address_undeliverable,
a.address_validated,
a.country,
a.created_date,
a.customer_internal_id,
a.postcode,
a.region,
a.suburb,
a.updated_by,
a.updated_date,
a.updating_store,
m.customer_internal_id,
m.channel_id,
m.created_date,
m.opt_in,
m.updated_by,
m.updated_date,
m.updating_store,
m.valid_from_date,
c1.customer_internal_id,
c1.channel_id,
c1.type_id,
c1.created_date,
c1.opt_in,
c1.updated_by,
c1.updated_date,
c1.updating_store,
c1.valid_from_date,
e.customer_internal_id,
e.card_number,
e.card_design,
e.card_status,
e.card_type,
e.created_date,
e.updated_by,
e.updated_date,
e.updating_store
FROM
(
SELECT
c.customer_internal_id,
c.business_partner_id,
c.created_date,
c.customer_type,
c.date_of_birth,
c.first_name,
c.gender,
c.home_store_id,
c.home_store_updated,
c.last_name,
c.loyalty_db_id,
c.mca_id,
c.status,
c.status_reason,
c.store_joined,
c.title,
c.updated_by,
c.updated_date,
c.updating_store
FROM
customer AS c
WHERE
c.mca_id = '2701159742879#priceline.com.au'
LIMIT
1
) AS t
LEFT JOIN contact_information AS c0 ON t.customer_internal_id = c0.customer_internal_id
LEFT JOIN address AS a ON t.customer_internal_id = a.customer_internal_id
LEFT JOIN marketing_preferences AS m ON t.customer_internal_id = m.customer_internal_id
LEFT JOIN content_type_preferences AS c1 ON t.customer_internal_id = c1.customer_internal_id
LEFT JOIN external_cards AS e ON t.customer_internal_id = e.customer_internal_id
ORDER BY
t.customer_internal_id,
c0.contact_internal_id,
c0.contact_type,
a.address_internal_id,
m.customer_internal_id,
m.channel_id,
c1.customer_internal_id,
c1.channel_id,
c1.type_id,
e.customer_internal_id,
e.card_number
The following are the primary / foreign keys:
Customer PRIMARY KEY ("customer_internal_id")
Address PRIMARY KEY ("address_internal_id"), CONSTRAINT
"address_customer_internal_id_fkey" FOREIGN KEY
("customer_internal_id") REFERENCES "public"."customer"
("customer_internal_id")
Contact_Information PRIMARY KEY ("contact_internal_id", "contact_type"),
CONSTRAINT "contact_information_customer_internal_id_fkey" FOREIGN KEY ("customer_internal_id") REFERENCES "public"."customer" ("customer_internal_id")
External_Cards PRIMARY KEY ("customer_internal_id", "card_number"),
CONSTRAINT "external_cards_customer_internal_id_fkey" FOREIGN KEY ("customer_internal_id") REFERENCES "public"."customer" ("customer_internal_id")
The following indexes are in place:
CREATE INDEX idx_cust_contact ON contact_information (customer_internal_id);
CREATE INDEX idx_cust_address ON address (customer_internal_id);
CREATE INDEX idx_cust_mkpref ON marketing_preferences (customer_internal_id);
CREATE INDEX idx_cust_content ON content_type_preferences (customer_internal_id);
CREATE INDEX idx_cust_cards ON external_cards (customer_internal_id);
CREATE INDEX idx_cust_mcaid ON customer (mca_id);
This is the EXPLAIN from the query:
"Sort (cost=103957.16..103957.20 rows=18 width=687)"
" Sort Key: c.customer_internal_id, c0.contact_internal_id, c0.contact_type, a.address_internal_id, m.customer_internal_id, m.channel_id, c1.customer_internal_id, c1.channel_id, c1.type_id, e.customer_internal_id, e.card_number"
" -> Nested Loop Left Join (cost=35817.63..103956.78 rows=18 width=687)"
" -> Nested Loop Left Join (cost=35813.32..103867.36 rows=6 width=631)"
" -> Nested Loop Left Join (cost=35809.02..103833.42 rows=3 width=506)"
" -> Hash Right Join (cost=35808.74..103808.50 rows=3 width=433)"
" Hash Cond: (c0.customer_internal_id = c.customer_internal_id)"
" -> Seq Scan on contact_information c0 (cost=0.00..59117.35 rows=2368635 width=115)"
" -> Hash (cost=35808.73..35808.73 rows=1 width=318)"
" -> Hash Right Join (cost=8.47..35808.73 rows=1 width=318)"
" Hash Cond: (a.customer_internal_id = c.customer_internal_id)"
" -> Seq Scan on address a (cost=0.00..31425.09 rows=1166709 width=148)"
" -> Hash (cost=8.46..8.46 rows=1 width=170)"
" -> Limit (cost=0.43..8.45 rows=1 width=170)"
" -> Index Scan using idx_cust_mcaid on customer c (cost=0.43..8.45 rows=1 width=170)"
" Index Cond: ((mca_id)::text = '2701159742879#priceline.com.au'::text)"
" -> Index Scan using external_cards_pkey on external_cards e (cost=0.28..8.30 rows=1 width=73)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Heap Scan on content_type_preferences c1 (cost=4.30..11.29 rows=2 width=125)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Index Scan on content_type_preferences_pkey (cost=0.00..4.30 rows=2 width=0)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Heap Scan on marketing_preferences m (cost=4.31..14.87 rows=3 width=56)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Index Scan on marketing_preferences_pkey (cost=0.00..4.31 rows=3 width=0)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
It seems most of the cost is in these nested loop joins, but I'm not sure how to attack that problem. I did have a seq scan on customer originally; I addressed that with an index on mca_id, but it made virtually no difference to the execution time.
EDIT: Update. I added a couple of hash indexes to cater for the customer_internal_id = customer_internal_id joins:
CREATE INDEX idx_contact_hash ON contact_information USING hash (customer_internal_id);
CREATE INDEX idx_address_hash ON address USING hash (customer_internal_id);
And the query time has dropped to around 70ms. This is great, but I recall something about hash indexes being frowned upon, or not recommended for use? Can anyone help out? Here is the new EXPLAIN (ANALYZE, BUFFERS):
"Sort (cost=119.30..119.33 rows=12 width=687) (actual time=0.082..0.082 rows=2 loops=1)"
" Sort Key: c.customer_internal_id, c0.contact_internal_id, c0.contact_type, a.address_internal_id, m.customer_internal_id, m.channel_id, c1.customer_internal_id, c1.channel_id, c1.type_id, e.customer_internal_id, e.card_number"
" Sort Method: quicksort Memory: 26kB"
" Buffers: shared hit=18"
" -> Nested Loop Left Join (cost=9.31..119.08 rows=12 width=687) (actual time=0.062..0.070 rows=2 loops=1)"
" Buffers: shared hit=18"
" -> Nested Loop Left Join (cost=5.01..59.47 rows=4 width=631) (actual time=0.054..0.059 rows=2 loops=1)"
" Buffers: shared hit=14"
" -> Nested Loop Left Join (cost=0.71..36.85 rows=2 width=506) (actual time=0.045..0.048 rows=2 loops=1)"
" Buffers: shared hit=10"
" -> Nested Loop Left Join (cost=0.71..24.79 rows=1 width=391) (actual time=0.039..0.040 rows=1 loops=1)"
" Buffers: shared hit=8"
" -> Nested Loop Left Join (cost=0.43..16.48 rows=1 width=318) (actual time=0.031..0.033 rows=1 loops=1)"
" Buffers: shared hit=6"
" -> Limit (cost=0.43..8.45 rows=1 width=170) (actual time=0.023..0.024 rows=1 loops=1)"
" Buffers: shared hit=4"
" -> Index Scan using idx_cust_mcaid on customer c (cost=0.43..8.45 rows=1 width=170) (actual time=0.022..0.022 rows=1 loops=1)"
" Index Cond: ((mca_id)::text = '2701159742879#priceline.com.au'::text)"
" Buffers: shared hit=4"
" -> Index Scan using idx_address_hash on address a (cost=0.00..8.02 rows=1 width=148) (actual time=0.006..0.006 rows=1 loops=1)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=2"
" -> Index Scan using external_cards_pkey on external_cards e (cost=0.28..8.30 rows=1 width=73) (actual time=0.006..0.006 rows=0 loops=1)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=2"
" -> Index Scan using idx_contact_hash on contact_information c0 (cost=0.00..12.04 rows=2 width=115) (actual time=0.004..0.005 rows=2 loops=1)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=2"
" -> Bitmap Heap Scan on content_type_preferences c1 (cost=4.30..11.29 rows=2 width=125) (actual time=0.004..0.004 rows=0 loops=2)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
" -> Bitmap Index Scan on content_type_preferences_pkey (cost=0.00..4.30 rows=2 width=0) (actual time=0.002..0.002 rows=0 loops=2)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
" -> Bitmap Heap Scan on marketing_preferences m (cost=4.31..14.87 rows=3 width=56) (actual time=0.004..0.004 rows=0 loops=2)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
" -> Bitmap Index Scan on marketing_preferences_pkey (cost=0.00..4.31 rows=3 width=0) (actual time=0.002..0.002 rows=0 loops=2)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
"Planning Time: 0.770 ms"
"Execution Time: 0.181 ms"
Up to version 9.6, hash indexes were not crash safe and so were discouraged for that reason. WAL was added to them in v10, so there is no longer anything wrong with using them.
Although they also shouldn't be necessary, and I see no reason the regular (btree) indexes were not being used. Are you sure they were actually present and marked as valid? Or maybe they were very bloated and could have been fixed with a REINDEX INDEX ... command, but it is hard to see how they could be so bloated that they wouldn't still have been preferred over the seq scans.
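You can check from the system catalogs whether those btree indexes are present and valid; a sketch, using the index names from the question:

```sql
SELECT c.relname AS index_name,
       i.indisvalid,   -- false e.g. after a failed CREATE INDEX CONCURRENTLY
       i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname IN ('idx_cust_contact', 'idx_cust_address');
```

An index with `indisvalid = false` exists but is ignored by the planner, which would explain the seq scans.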
Using PG 9.3 I have a report query that SELECTs phone call data.
If the current User is an 'admin', they can see all the calls.
Otherwise, the User can only see his own calls.
Hence we have (simplified)
create table phonecalls (phone_id int, destination varchar(100));
create table users (user_id int);
create table usergroups (user_id int, group_id int);
create table groups (group_id int, is_admin bool);
create table userphones (user_id int, phone_id int);
and the following permissions clause:
SELECT * FROM phonecalls
WHERE
CASE WHEN ( SELECT is_admin FROM users join usergroups using (user_id) join groups using (group_id) WHERE user_id = 1 )
THEN true
ELSE
exists ( SELECT phone_id FROM userphones
WHERE user_id = 1
AND userphones.phone_id = phonecalls.phone_id )
END
When the database has many, many records in it, performance is an issue.
What I'm finding is, if the user with user_id 1 is an admin, the query speeds up if I remove the ELSE part of the permissions clause, i.e.
ELSE
exists ( SELECT 1 )
END
But this seems to contradict the following statement from the Postgres documentation:
https://www.postgresql.org/docs/9.4/functions-conditional.html
A CASE expression does not evaluate any subexpressions that are not needed to determine the result.
If the User is an admin, shouldn't the ELSE clause have no effect on query execution time?
Am I misunderstanding something?
EDIT Query plan output:
Seq Scan on phonecalls (cost=139.44..421294.43 rows=5000 width=10) (actual time=0.071..5.598 rows=10000 loops=1)
Filter: CASE WHEN $0 THEN true ELSE (alternatives: SubPlan 2 or hashed SubPlan 3) END
InitPlan 1 (returns $0)
-> Nested Loop (cost=36.89..139.44 rows=1538 width=1) (actual time=0.018..0.018 rows=0 loops=1)
-> Hash Join (cost=36.89..80.21 rows=128 width=5) (actual time=0.018..0.018 rows=0 loops=1)
Hash Cond: (groups.group_id = usergroups.group_id)
-> Seq Scan on groups (cost=0.00..33.30 rows=2330 width=5) (actual time=0.002..0.002 rows=1 loops=1)
-> Hash (cost=36.75..36.75 rows=11 width=8) (actual time=0.001..0.001 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Seq Scan on usergroups (cost=0.00..36.75 rows=11 width=8) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (user_id = 1)
-> Materialize (cost=0.00..40.06 rows=12 width=4) (never executed)
-> Seq Scan on users (cost=0.00..40.00 rows=12 width=4) (never executed)
Filter: (user_id = 1)
SubPlan 2
-> Seq Scan on userphones (cost=0.00..42.10 rows=1 width=0) (never executed)
Filter: ((user_id = 1) AND (phone_id = phonecalls.phone_id))
SubPlan 3
-> Seq Scan on userphones userphones_1 (cost=0.00..36.75 rows=11 width=4) (actual time=0.009..0.010 rows=1 loops=1)
Filter: (user_id = 1)
Total runtime: 6.229 ms
EDIT 2 Query Plan for 'SELECT 1' option
"Result (cost=139.44..294.44 rows=10000 width=10) (actual time=0.044..3.713 rows=10000 loops=1)"
" One-Time Filter: CASE WHEN $0 THEN true ELSE $1 END"
" InitPlan 1 (returns $0)"
" -> Nested Loop (cost=36.89..139.44 rows=1538 width=1) (actual time=0.028..0.028 rows=0 loops=1)"
" -> Hash Join (cost=36.89..80.21 rows=128 width=5) (actual time=0.026..0.026 rows=0 loops=1)"
" Hash Cond: (groups.group_id = usergroups.group_id)"
" -> Seq Scan on groups (cost=0.00..33.30 rows=2330 width=5) (actual time=0.009..0.009 rows=1 loops=1)"
" -> Hash (cost=36.75..36.75 rows=11 width=8) (actual time=0.000..0.000 rows=0 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 0kB"
" -> Seq Scan on usergroups (cost=0.00..36.75 rows=11 width=8) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: (user_id = 1)"
" -> Materialize (cost=0.00..40.06 rows=12 width=4) (never executed)"
" -> Seq Scan on users (cost=0.00..40.00 rows=12 width=4) (never executed)"
" Filter: (user_id = 1)"
" InitPlan 2 (returns $1)"
" -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)"
" -> Seq Scan on phonecalls (cost=0.00..155.00 rows=10000 width=10) (actual time=0.012..1.502 rows=10000 loops=1)"
"Total runtime: 4.307 ms"
The difference is Filter vs. One-Time Filter.
In the first query, the condition in the CASE expression depends on phonecalls.phone_id from the sequential scan (even if that branch is never executed), so the filter is applied to all 10000 result rows.
In the second query, the filter has to be evaluated only once: it is run in an InitPlan that is executed before the main query.
These 10000 checks must make the difference.
If the CASE expression is in the SELECT/projection part, it does not have a considerable performance impact. If it is part of an ORDER BY, GROUP BY, WHERE, or join condition, it may prevent the use of a suitable index and cause performance issues.
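One way around the per-row CASE is to decide the admin branch first (in the application, or in a separate query) and then issue one of two plain statements; a sketch under that assumption:

```sql
-- If the admin lookup returned true, run the unfiltered query:
SELECT * FROM phonecalls;

-- Otherwise run only the EXISTS-filtered form, which can use an
-- index on userphones (user_id, phone_id):
SELECT *
FROM phonecalls
WHERE EXISTS (SELECT 1
              FROM userphones
              WHERE userphones.user_id = 1
                AND userphones.phone_id = phonecalls.phone_id);
```

Each statement then gets its own plan, so the non-admin case is never penalized by a filter it does not need.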
I am looking for optimization suggestions for the below query on Postgres. I'm not a DBA, so I'm looking for some expert advice here.
The devices table holds device_id values which are hexadecimal.
To achieve high throughput we run 6 instances of this query in parallel, with pattern matching for device_id
beginning with [0-2], [3-5], [6-9], [a-c], [d-f].
When we run just one instance of the query it works fine, but with 6 instances we get this error:
[6669]:FATAL: connection to client lost
explain analyze select notifications.id, notifications.status, events.alert_type,
events.id as event_id, events.payload, notifications.device_id as device_id,
device_endpoints.region, device_endpoints.device_endpoint as endpoint
from notifications
inner join events
on notifications.event_id = events.id
inner join devices
on notifications.device_id = devices.id
inner join device_endpoints
on devices.id = device_endpoints.device_id
where notifications.status = 'pending' AND notifications.region = 'ap-southeast-2'
AND devices.device_id ~ '[0-9a-f].*'
limit 10000;
Output of EXPLAIN ANALYZE:
"Limit (cost=25.62..1349.23 rows=206 width=202) (actual time=0.359..0.359 rows=0 loops=1)"
" -> Nested Loop (cost=25.62..1349.23 rows=206 width=202) (actual time=0.357..0.357 rows=0 loops=1)"
" Join Filter: (notifications.device_id = devices.id)"
" -> Nested Loop (cost=25.33..1258.73 rows=206 width=206) (actual time=0.357..0.357 rows=0 loops=1)"
" -> Hash Join (cost=25.04..61.32 rows=206 width=52) (actual time=0.043..0.172 rows=193 loops=1)"
" Hash Cond: (notifications.event_id = events.id)"
" -> Index Scan using idx_notifications_status on notifications (cost=0.42..33.87 rows=206 width=16) (actual time=0.013..0.100 rows=193 loops=1)"
" Index Cond: (status = 'pending'::notification_status)"
" Filter: (region = 'ap-southeast-2'::text)"
" -> Hash (cost=16.50..16.50 rows=650 width=40) (actual time=0.022..0.022 rows=34 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 14kB"
" -> Seq Scan on events (cost=0.00..16.50 rows=650 width=40) (actual time=0.005..0.014 rows=34 loops=1)"
" -> Index Scan using idx_device_endpoints_device_id on device_endpoints (cost=0.29..5.80 rows=1 width=154) (actual time=0.001..0.001 rows=0 loops=193)"
" Index Cond: (device_id = notifications.device_id)"
" -> Index Scan using devices_pkey on devices (cost=0.29..0.43 rows=1 width=4) (never executed)"
" Index Cond: (id = device_endpoints.device_id)"
" Filter: (device_id ~ '[0-9a-f].*'::text)"
"Planning time: 0.693 ms"
"Execution time: 0.404 ms"
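As an aside, `~` patterns are not anchored in PostgreSQL, so `device_id ~ '[0-9a-f].*'` matches any value that contains a hex character anywhere, and per-instance ranges written this way would overlap rather than partition the data. A sketch of the anchored form (illustrative; one pattern per instance):

```sql
-- unanchored: true for any device_id containing a hex character anywhere
WHERE devices.device_id ~ '[0-9a-f].*'

-- anchored: true only when the FIRST character is in the range,
-- so the six instances scan disjoint slices of the table
WHERE devices.device_id ~ '^[0-2]'
```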