My Postgres server details are below:
Postgres version 13, RAM 52 GB, SSD 1000 GB, and the DB size is 300 GB.
Here is my query:
select distinct "col2","col3","col1"
from table1(foreign table)
where "col2" not in (select "col4"
from table2(foreign table)
where "col9" = 'data1'
and "col10"='A')
and "col2" not in (select "col13"
from table5(foreign table)
where "col11" = 'A'
and "col12" in ('data1', 'data2', 'data3', 'data4'))
and "col6" > '2022-01-01' and "col10" = 'A' and "col18" = 'P'
and not "col7" = 'V' and "Type" = 'A'
order by "col1"
Here is my explain plan:
"Unique (cost=372.13..372.14 rows=1 width=1074) (actual time=145329.010..145329.136 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Buffers: shared hit=3"
" -> Sort (cost=372.13..372.14 rows=1 width=1074) (actual time=145329.008..145329.027 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Sort Key: table1.""col1"", table1.""col2"", table1.""col3"""
" Sort Method: quicksort Memory: 63kB"
" Buffers: shared hit=3"
" -> Foreign Scan on public.table1 (cost=360.38..372.12 rows=1 width=1074) (actual time=144430.980..145327.532 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))"
" Rows Removed by Filter: 253144"
" Remote SQL: SELECT ""col2"", ""col3"", ""col1"" FROM dbo.table4 WHERE ((""col6"" > '2022-01-01 00:00:00'::timestamp without time zone)) AND ((""col7"" <> 'V'::text)) AND ((""col8"" = 'A'::text))"
" SubPlan 1"
" -> Foreign Scan on public.table2 (cost=100.00..128.63 rows=1 width=42) (actual time=2.169..104702.862 rows=50573365 loops=1)"
" Output: table2.""col4"""
" Remote SQL: SELECT ""col5"" FROM dbo.table3 WHERE ((""col9"" = 'data1'::text)) AND ((""col10"" = 'A'::text))"
" SubPlan 2"
" -> Foreign Scan on public.table5 (cost=100.00..131.74 rows=1 width=42) (actual time=75.363..1015.498 rows=360240 loops=1)"
" Output: table5.""col13"""
" Remote SQL: SELECT ""col14"" FROM dbo.table6 WHERE ((""col11"" = 'A'::text)) AND ((""col12"" = ANY ('{data1,data2,data3,data4}'::text[])))"
"Planning:"
" Buffers: shared hit=142"
"Planning Time: 1.887 ms"
"Execution Time: 145620.958 ms"
table1 - 4 million rows
table2 - 250 million rows
table3 - 400 million rows
Table Definition table1
CREATE TABLE IF NOT EXISTS table1
(
"col1" character varying(12) ,
"col" character varying(1) ,
"col" character varying(1) ,
...
...
);
Indexes exist on other columns, but not on the query columns "col2", "col3", "col1".
Table Definition table2
CREATE TABLE IF NOT EXISTS table2
(
"col4" character varying(12) ,
"col9" character varying(1) ,
"col10" character varying(1) ,
...
...
);
Indexes on table2:
CREATE INDEX index1 ON table2("col4" ASC,"col9" ASC,"col" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col" ASC,"col9" ASC,"col4" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col9" ASC,"col4" ASC,"col" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col" ASC,"col9" ASC,"col10" ASC,"col" ASC);
Table Definition table5
CREATE TABLE IF NOT EXISTS table5
(
"col11" character varying(12) ,
"col13" character varying(1) ,
"col" character varying(1) ,
...
...
);
Indexes on table5:
CREATE INDEX index ON table5("col" ASC, "col" ASC,"col11" ASC);
CREATE INDEX index ON table5("col13" ASC,"col11" ASC);
CREATE INDEX index ON table5("col" ASC,"col13" ASC,"col11" ASC)INCLUDE ("col");
CREATE INDEX index ON table5("col" ASC, "col" ASC,"col11" ASC);
How can I speed up this query? It took 3 minutes just to retrieve 365 records.
Here is my EXPLAIN (ANALYZE, BUFFERS)
"Unique (cost=372.13..372.14 rows=1 width=1074) (actual time=110631.114..110631.262 rows=336 loops=1)"
" -> Sort (cost=372.13..372.14 rows=1 width=1074) (actual time=110631.111..110631.142 rows=336 loops=1)"
" Sort Key: table1.""col1"", table1.""col2"", table1.""col3"""
" Sort Method: quicksort Memory: 63kB"
" -> Foreign Scan on table1 (cost=360.38..372.12 rows=1 width=1074) (actual time=110432.132..110629.640 rows=336 loops=1)"
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))"
" Rows Removed by Filter: 253144"
" SubPlan 1"
" -> Foreign Scan on table2 (cost=100.00..128.63 rows=1 width=42) (actual time=63638.173..71979.772 rows=50573365 loops=1)"
" SubPlan 2"
" -> Foreign Scan on table5 (cost=100.00..131.74 rows=1 width=42) (actual time=569.126..630.782 rows=360240 loops=1)"
"Planning Time: 0.266 ms"
"Execution Time: 111748.715 ms"
Here is my EXPLAIN (ANALYZE, BUFFERS) of the "remote SQL" when executed on the remote database
"Limit (cost=4157478.69..4157602.66 rows=1000 width=47) (actual time=68356.908..68681.831 rows=336 loops=1)"
" Buffers: shared hit=66205118"
" -> Unique (cost=4157478.69..4164948.04 rows=60253 width=47) (actual time=68356.905..68681.801 rows=336 loops=1)"
" Buffers: shared hit=66205118"
" -> Gather Merge (cost=4157478.69..4164496.14 rows=60253 width=47) (actual time=68356.901..68681.718 rows=336 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=66205118"
" -> Sort (cost=4156478.66..4156541.43 rows=25105 width=47) (actual time=66154.447..66154.459 rows=112 loops=3)"
" Sort Key: table4.""col1"", table4.""col2"", table4.""col3"""
" Sort Method: quicksort Memory: 63kB"
" Buffers: shared hit=66205118"
" Worker 0: Sort Method: quicksort Memory: 25kB"
" Worker 1: Sort Method: quicksort Memory: 25kB"
" -> Parallel Seq Scan on table4 (cost=3986703.25..4154644.03 rows=25105 width=47) (actual time=66041.929..66153.663 rows=112 loops=3)"
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (""col6"" > '2022-01-01 00:00:00'::timestamp without time zone) AND ((""col7"")::text <> 'V'::text) AND ((""col8"")::text = 'A'::text))"
" Rows Removed by Filter: 1236606"
" Buffers: shared hit=66205102"
" SubPlan 1"
" -> Index Only Scan using col20 on table3 (cost=0.70..2696555.01 rows=50283867 width=13) (actual time=0.134..25085.583 rows=50573365 loops=3)"
" Index Cond: ((""col9"" = 'data1'::text) AND (""col10"" = 'A'::text))"
" Heap Fetches: 0"
" Buffers: shared hit=65737946"
" SubPlan 2"
" -> Bitmap Heap Scan on table6 (cost=4962.91..1163549.12 rows=355779 width=13) (actual time=160.770..440.978 rows=360240 loops=3)"
" Recheck Cond: (((""col12"")::text = ANY ('{data1,data2,data3,data4}'::text[])) AND ((""col11"")::text = 'A'::text))"
" Heap Blocks: exact=110992"
" Buffers: shared hit=333992"
" -> Bitmap Index Scan on col21 (cost=0.00..4873.97 rows=355779 width=0) (actual time=120.354..120.354 rows=360240 loops=3)"
" Index Cond: (((""col12"")::text = ANY ('{data1,data2,data3,data4}'::text[])) AND ((""col11"")::text = 'A'::text))"
" Buffers: shared hit=1016"
"Planning:"
" Buffers: shared hit=451"
"Planning Time: 4.039 ms"
"Execution Time: 69001.171 ms"
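For reference, `NOT IN (subquery)` forces Postgres to materialize the full subquery result (here, 50 million rows pulled through the foreign scan), and it also behaves surprisingly when the subquery can return NULLs. The equivalent anti-join form uses `NOT EXISTS`, which the planner can turn into a hashed anti-join. This is only a sketch against the anonymized names above; with postgres_fdw the actual benefit depends on how much of the query can be pushed down to the remote server:

```sql
SELECT DISTINCT "col2", "col3", "col1"
FROM table1 t1
WHERE NOT EXISTS (
        SELECT 1
        FROM table2 t2
        WHERE t2."col4" = t1."col2"      -- correlated anti-join replaces NOT IN
          AND t2."col9" = 'data1'
          AND t2."col10" = 'A')
  AND NOT EXISTS (
        SELECT 1
        FROM table5 t5
        WHERE t5."col13" = t1."col2"
          AND t5."col11" = 'A'
          AND t5."col12" IN ('data1', 'data2', 'data3', 'data4'))
  AND t1."col6" > '2022-01-01'
  AND t1."col10" = 'A'
  AND t1."col18" = 'P'
  AND t1."col7" <> 'V'
  AND t1."Type" = 'A'
ORDER BY "col1";
```

Note one semantic difference: if either subquery could return a NULL, `NOT IN` returns no rows at all, while `NOT EXISTS` simply ignores the NULLs.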
I use Postgres 10.4 in a Linux environment.
I have a query which is very slow when I run it.
Searching on subject in Arabic is very slow, and the problem also exists with subjects in other languages.
I have about 1 million records per year.
I tried to add an index on the transfer table, but the result is the same:
CREATE INDEX subject
ON public.transfer
USING btree
(subject COLLATE pg_catalog."default" varchar_pattern_ops);
This is the query:
select * from ( with scope as (
select unit_id from public.sec_unit
where emp_id= 'EM-001'and app_type in ('S','E') )
select CAST (row_number() OVER (PARTITION BY advsearch.correspondenceid)
as VARCHAR(15)) as numline, advsearch.*
from (
SELECT Transfer.id AS id, CORRESP.id AS correspondenceId,
Transfer.correspondencecopy_id AS correspondencecopyId, Transfer.datesendjctransfer AS datesendjctransfer
FROM Transfer Transfer
Left outer JOIN correspondencecopy CORRESPCPY ON Transfer.correspondencecopy_id = CORRESPCPY.id
Left outer JOIN correspondence CORRESP ON CORRESP.id = CORRESPCPY.correspondence_id
LEFT OUTER JOIN scope sc on sc.unit_id = Transfer.unittransto_id or sc.unit_id='allorg'
LEFT OUTER JOIN employee emp on emp.id = 'EM-001'
WHERE transfer.status='1' AND (Transfer.docyearhjr='1441' )
AND (Transfer.subject like '%'||'رقم'||'%')
AND ( sc.unit_id is not null )
AND (coalesce(emp.confidentiel,'0') >= coalesce(Transfer.confidentiel,'0'))
)
advsearch ) Searchlist
WHERE Searchlist.numline='1'
ORDER BY Searchlist.datesendjctransfer
Can someone help me optimise the query?
Updated:
I tried to change the query: I eliminated the use of scope and replaced it with a simple condition.
I get the same result (the same number of records), but the problem remains: the query is still very slow.
select * from (
select CAST (row_number() OVER (PARTITION BY advsearch.correspondenceid)
as VARCHAR(15)) as numline, advsearch.*
from (
SELECT Transfer.id AS id, CORRESP.id AS correspondenceId,
Transfer.correspondencecopy_id AS correspondencecopyId, Transfer.datesendjctransfer AS datesendjctransfer
FROM Transfer Transfer
Left outer JOIN correspondencecopy CORRESPCPY ON Transfer.correspondencecopy_id = CORRESPCPY.id
Left outer JOIN correspondence CORRESP ON CORRESP.id = CORRESPCPY.correspondence_id
LEFT OUTER JOIN employee emp on emp.id = 'EM-001'
WHERE transfer.status='1' and ( Transfer.unittransto_id in (
select unit_id from public.security_employee_unit
where employee_id= 'EM-001'and app_type in ('E','S') )
or 'allorg' in ( select unit_id from public.security_employee_unit
where employee_id= 'EM-001'and app_type in ('S')))
AND (Transfer.docyearhjr='1441' )
AND (Transfer.subject like '%'||'رقم'||'%')
AND (coalesce(emp.confidentiel,'0') >= coalesce(Transfer.confidentiel,'0'))
)
advsearch ) Searchlist
WHERE Searchlist.numline='1'
ORDER BY Searchlist.datesendjctransfer
Updated:
I analyzed the query using EXPLAIN ANALYZE. This is the result:
"Sort (cost=412139.09..412139.13 rows=17 width=87) (actual time=1481.951..1482.166 rows=4497 loops=1)"
" Sort Key: searchlist.datesendjctransfer"
" Sort Method: quicksort Memory: 544kB"
" -> Subquery Scan on searchlist (cost=412009.59..412138.74 rows=17 width=87) (actual time=1457.717..1480.381 rows=4497 loops=1)"
" Filter: ((searchlist.numline)::text = '1'::text)"
" Rows Removed by Filter: 38359"
" -> WindowAgg (cost=412009.59..412095.69 rows=3444 width=87) (actual time=1457.715..1477.146 rows=42856 loops=1)"
" CTE scope"
" -> Bitmap Heap Scan on security_employee_unit (cost=8.59..15.83 rows=2 width=7) (actual time=0.043..0.058 rows=2 loops=1)"
" Recheck Cond: (((employee_id)::text = 'EM-001'::text) AND ((app_type)::text = ANY ('{SE,I}'::text[])))"
" Heap Blocks: exact=2"
" -> Bitmap Index Scan on employeeidkey (cost=0.00..8.59 rows=2 width=0) (actual time=0.037..0.037 rows=2 loops=1)"
" Index Cond: (((employee_id)::text = 'EM-001'::text) AND ((app_type)::text = ANY ('{SE,I}'::text[])))"
" -> Sort (cost=411993.77..412002.38 rows=3444 width=39) (actual time=1457.702..1461.773 rows=42856 loops=1)"
" Sort Key: corresp.id"
" Sort Method: external merge Disk: 2440kB"
" -> Nested Loop Left Join (cost=18315.99..411791.43 rows=3444 width=39) (actual time=1271.209..1295.423 rows=42856 loops=1)"
" Filter: ((COALESCE(emp.confidentiel, '0'::character varying))::text >= (COALESCE(transfer.confidentiel, '0'::character varying))::text)"
" -> Nested Loop (cost=18315.71..411628.14 rows=10333 width=41) (actual time=1271.165..1283.365 rows=42856 loops=1)"
" Join Filter: (((sc.unit_id)::text = (transfer.unittransto_id)::text) OR ((sc.unit_id)::text = 'allorg'::text))"
" Rows Removed by Join Filter: 42856"
" -> CTE Scan on scope sc (cost=0.00..0.04 rows=2 width=48) (actual time=0.045..0.064 rows=2 loops=1)"
" Filter: (unit_id IS NOT NULL)"
" -> Materialize (cost=18315.71..411292.44 rows=10328 width=48) (actual time=53.970..635.651 rows=42856 loops=2)"
" -> Gather (cost=18315.71..411240.80 rows=10328 width=48) (actual time=107.919..1254.600 rows=42856 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Nested Loop Left Join (cost=17315.71..409208.00 rows=4303 width=48) (actual time=104.436..1250.461 rows=14285 loops=3)"
" -> Nested Loop Left Join (cost=17315.28..405979.02 rows=4303 width=48) (actual time=104.382..1136.591 rows=14285 loops=3)"
" -> Parallel Bitmap Heap Scan on transfer (cost=17314.85..377306.25 rows=4303 width=39) (actual time=104.287..996.609 rows=14285 loops=3)"
" Recheck Cond: ((docyearhjr)::text = '1441'::text)"
" Rows Removed by Index Recheck: 437299"
" Filter: (((subject)::text ~~ '%رقم%'::text) AND ((status)::text = '1'::text))"
" Rows Removed by Filter: 297178"
" Heap Blocks: exact=14805 lossy=44734"
" -> Bitmap Index Scan on docyearhjr (cost=0.00..17312.27 rows=938112 width=0) (actual time=96.028..96.028 rows=934389 loops=1)"
" Index Cond: ((docyearhjr)::text = '1441'::text)"
" -> Index Scan using pk_correspondencecopy on correspondencecopy correspcpy (cost=0.43..6.66 rows=1 width=21) (actual time=0.009..0.009 rows=1 loops=42856)"
" Index Cond: ((transfer.correspondencecopy_id)::text = (id)::text)"
" -> Index Only Scan using pk_correspondence on correspondence corresp (cost=0.42..0.75 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=42856)"
" Index Cond: (id = (correspcpy.correspondence_id)::text)"
" Heap Fetches: 14227"
" -> Materialize (cost=0.28..8.31 rows=1 width=2) (actual time=0.000..0.000 rows=1 loops=42856)"
" -> Index Scan using pk_employee on employee emp (cost=0.28..8.30 rows=1 width=2) (actual time=0.038..0.038 rows=1 loops=1)"
" Index Cond: ((id)::text = 'EM-001'::text)"
"Planning time: 1.595 ms"
"Execution time: 1487.303 ms"
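Worth noting: a b-tree index with `varchar_pattern_ops` only helps left-anchored patterns such as `subject LIKE 'abc%'`; it cannot serve an infix search like `LIKE '%رقم%'`, which is why the index above changes nothing. The usual tool for `%...%` searches is a trigram index. A sketch (assumes the pg_trgm extension can be installed; the index name is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index: can serve LIKE/ILIKE '%...%' predicates
CREATE INDEX transfer_subject_trgm
    ON public.transfer
    USING gin (subject gin_trgm_ops);
```

pg_trgm works on any text, including Arabic, though very short search terms (under three characters) gain little from it.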
Disclaimer: I am using Entity Framework Core, so I am somewhat restricted in what shape the query below can take.
I have a large customer database (1.1 million records) and am using an API to select a single customer by mca_id, bringing back the customer and all their related data from contact_information, address, and so on. It currently takes around 5 - 700ms, which seems very slow for retrieving a single record.
Here is the query, designed to bring back all information related to this customer. Note that Entity Framework Core (.NET/C#) enforces the ORDER BY at the bottom, so there's not much I can do about that.
SELECT
t.customer_internal_id,
t.business_partner_id,
t.created_date,
t.customer_type,
t.date_of_birth,
t.first_name,
t.gender,
t.home_store_id,
t.home_store_updated,
t.last_name,
t.loyalty_db_id,
t.mca_id,
t.status,
t.status_reason,
t.store_joined,
t.title,
t.updated_by,
t.updated_date,
t.updating_store,
c0.contact_internal_id,
c0.contact_type,
c0.contact_value,
c0.created_date,
c0.customer_internal_id,
c0.updated_by,
c0.updated_date,
c0.updating_store,
c0.validated,
a.address_internal_id,
a.address_line_1,
a.address_line_2,
a.address_type,
a.address_undeliverable,
a.address_validated,
a.country,
a.created_date,
a.customer_internal_id,
a.postcode,
a.region,
a.suburb,
a.updated_by,
a.updated_date,
a.updating_store,
m.customer_internal_id,
m.channel_id,
m.created_date,
m.opt_in,
m.updated_by,
m.updated_date,
m.updating_store,
m.valid_from_date,
c1.customer_internal_id,
c1.channel_id,
c1.type_id,
c1.created_date,
c1.opt_in,
c1.updated_by,
c1.updated_date,
c1.updating_store,
c1.valid_from_date,
e.customer_internal_id,
e.card_number,
e.card_design,
e.card_status,
e.card_type,
e.created_date,
e.updated_by,
e.updated_date,
e.updating_store
FROM
(
SELECT
c.customer_internal_id,
c.business_partner_id,
c.created_date,
c.customer_type,
c.date_of_birth,
c.first_name,
c.gender,
c.home_store_id,
c.home_store_updated,
c.last_name,
c.loyalty_db_id,
c.mca_id,
c.status,
c.status_reason,
c.store_joined,
c.title,
c.updated_by,
c.updated_date,
c.updating_store
FROM
customer AS c
WHERE
c.mca_id = '2701159742879#priceline.com.au'
LIMIT
1
) AS t
LEFT JOIN contact_information AS c0 ON t.customer_internal_id = c0.customer_internal_id
LEFT JOIN address AS a ON t.customer_internal_id = a.customer_internal_id
LEFT JOIN marketing_preferences AS m ON t.customer_internal_id = m.customer_internal_id
LEFT JOIN content_type_preferences AS c1 ON t.customer_internal_id = c1.customer_internal_id
LEFT JOIN external_cards AS e ON t.customer_internal_id = e.customer_internal_id
ORDER BY
t.customer_internal_id,
c0.contact_internal_id,
c0.contact_type,
a.address_internal_id,
m.customer_internal_id,
m.channel_id,
c1.customer_internal_id,
c1.channel_id,
c1.type_id,
e.customer_internal_id,
e.card_number
The following are the primary / foreign keys:
Customer PRIMARY KEY ("customer_internal_id")
Address PRIMARY KEY ("address_internal_id"), CONSTRAINT
"address_customer_internal_id_fkey" FOREIGN KEY
("customer_internal_id") REFERENCES "public"."customer"
("customer_internal_id")
Contact_Information PRIMARY KEY ("contact_internal_id", "contact_type"),
CONSTRAINT "contact_information_customer_internal_id_fkey" FOREIGN KEY ("customer_internal_id") REFERENCES "public"."customer" ("customer_internal_id")
External_Cards PRIMARY KEY ("customer_internal_id", "card_number"),
CONSTRAINT "external_cards_customer_internal_id_fkey" FOREIGN KEY ("customer_internal_id") REFERENCES "public"."customer" ("customer_internal_id")
The following indexes are in place:
CREATE INDEX idx_cust_contact ON contact_information (customer_internal_id);
CREATE INDEX idx_cust_address ON address (customer_internal_id);
CREATE INDEX idx_cust_mkpref ON marketing_preferences (customer_internal_id);
CREATE INDEX idx_cust_content ON content_type_preferences (customer_internal_id);
CREATE INDEX idx_cust_cards ON external_cards (customer_internal_id);
CREATE INDEX idx_cust_mcaid ON customer (mca_id);
This is the EXPLAIN from the query:
"Sort (cost=103957.16..103957.20 rows=18 width=687)"
" Sort Key: c.customer_internal_id, c0.contact_internal_id, c0.contact_type, a.address_internal_id, m.customer_internal_id, m.channel_id, c1.customer_internal_id, c1.channel_id, c1.type_id, e.customer_internal_id, e.card_number"
" -> Nested Loop Left Join (cost=35817.63..103956.78 rows=18 width=687)"
" -> Nested Loop Left Join (cost=35813.32..103867.36 rows=6 width=631)"
" -> Nested Loop Left Join (cost=35809.02..103833.42 rows=3 width=506)"
" -> Hash Right Join (cost=35808.74..103808.50 rows=3 width=433)"
" Hash Cond: (c0.customer_internal_id = c.customer_internal_id)"
" -> Seq Scan on contact_information c0 (cost=0.00..59117.35 rows=2368635 width=115)"
" -> Hash (cost=35808.73..35808.73 rows=1 width=318)"
" -> Hash Right Join (cost=8.47..35808.73 rows=1 width=318)"
" Hash Cond: (a.customer_internal_id = c.customer_internal_id)"
" -> Seq Scan on address a (cost=0.00..31425.09 rows=1166709 width=148)"
" -> Hash (cost=8.46..8.46 rows=1 width=170)"
" -> Limit (cost=0.43..8.45 rows=1 width=170)"
" -> Index Scan using idx_cust_mcaid on customer c (cost=0.43..8.45 rows=1 width=170)"
" Index Cond: ((mca_id)::text = '2701159742879#priceline.com.au'::text)"
" -> Index Scan using external_cards_pkey on external_cards e (cost=0.28..8.30 rows=1 width=73)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Heap Scan on content_type_preferences c1 (cost=4.30..11.29 rows=2 width=125)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Index Scan on content_type_preferences_pkey (cost=0.00..4.30 rows=2 width=0)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Heap Scan on marketing_preferences m (cost=4.31..14.87 rows=3 width=56)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" -> Bitmap Index Scan on marketing_preferences_pkey (cost=0.00..4.31 rows=3 width=0)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
It seems most of the cost is in these nested loop joins, but I'm not sure how to attack that problem. I did have a seq scan on customer originally; I addressed that with an index on mca_id, but it made virtually no difference to the execution time.
EDIT: Update. I added a couple of hash indexes to cater for the customer_internal_id = customer_internal_id joins:
CREATE INDEX idx_contact_hash ON contact_information USING hash (customer_internal_id);
CREATE INDEX idx_address_hash ON address USING hash (customer_internal_id);
And the query time has dropped to around 70ms. This is great, but I recall something about hash indexes being frowned upon, or not recommended for use? Can anyone help out? Here is the new EXPLAIN (ANALYZE, BUFFERS):
"Sort (cost=119.30..119.33 rows=12 width=687) (actual time=0.082..0.082 rows=2 loops=1)"
" Sort Key: c.customer_internal_id, c0.contact_internal_id, c0.contact_type, a.address_internal_id, m.customer_internal_id, m.channel_id, c1.customer_internal_id, c1.channel_id, c1.type_id, e.customer_internal_id, e.card_number"
" Sort Method: quicksort Memory: 26kB"
" Buffers: shared hit=18"
" -> Nested Loop Left Join (cost=9.31..119.08 rows=12 width=687) (actual time=0.062..0.070 rows=2 loops=1)"
" Buffers: shared hit=18"
" -> Nested Loop Left Join (cost=5.01..59.47 rows=4 width=631) (actual time=0.054..0.059 rows=2 loops=1)"
" Buffers: shared hit=14"
" -> Nested Loop Left Join (cost=0.71..36.85 rows=2 width=506) (actual time=0.045..0.048 rows=2 loops=1)"
" Buffers: shared hit=10"
" -> Nested Loop Left Join (cost=0.71..24.79 rows=1 width=391) (actual time=0.039..0.040 rows=1 loops=1)"
" Buffers: shared hit=8"
" -> Nested Loop Left Join (cost=0.43..16.48 rows=1 width=318) (actual time=0.031..0.033 rows=1 loops=1)"
" Buffers: shared hit=6"
" -> Limit (cost=0.43..8.45 rows=1 width=170) (actual time=0.023..0.024 rows=1 loops=1)"
" Buffers: shared hit=4"
" -> Index Scan using idx_cust_mcaid on customer c (cost=0.43..8.45 rows=1 width=170) (actual time=0.022..0.022 rows=1 loops=1)"
" Index Cond: ((mca_id)::text = '2701159742879#priceline.com.au'::text)"
" Buffers: shared hit=4"
" -> Index Scan using idx_address_hash on address a (cost=0.00..8.02 rows=1 width=148) (actual time=0.006..0.006 rows=1 loops=1)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=2"
" -> Index Scan using external_cards_pkey on external_cards e (cost=0.28..8.30 rows=1 width=73) (actual time=0.006..0.006 rows=0 loops=1)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=2"
" -> Index Scan using idx_contact_hash on contact_information c0 (cost=0.00..12.04 rows=2 width=115) (actual time=0.004..0.005 rows=2 loops=1)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=2"
" -> Bitmap Heap Scan on content_type_preferences c1 (cost=4.30..11.29 rows=2 width=125) (actual time=0.004..0.004 rows=0 loops=2)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
" -> Bitmap Index Scan on content_type_preferences_pkey (cost=0.00..4.30 rows=2 width=0) (actual time=0.002..0.002 rows=0 loops=2)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
" -> Bitmap Heap Scan on marketing_preferences m (cost=4.31..14.87 rows=3 width=56) (actual time=0.004..0.004 rows=0 loops=2)"
" Recheck Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
" -> Bitmap Index Scan on marketing_preferences_pkey (cost=0.00..4.31 rows=3 width=0) (actual time=0.002..0.002 rows=0 loops=2)"
" Index Cond: (c.customer_internal_id = customer_internal_id)"
" Buffers: shared hit=4"
"Planning Time: 0.770 ms"
"Execution Time: 0.181 ms"
Up to version 9.6, hash indexes were not crash safe and so were discouraged for that reason. WAL was added to them in v10, so there is no longer anything wrong with using them.
Although they also shouldn't be necessary, and I see no reason the regular (btree) indexes were not being used. Are you sure they were actually present and marked as valid? Or maybe they were very bloated and could have been fixed with a REINDEX INDEX ... command, but it is hard to see how they could be so bloated that they wouldn't still have been preferred over the seq scans.
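You can check from the system catalogs whether those btree indexes are present and valid; a sketch, using the index names from the question:

```sql
SELECT c.relname AS index_name,
       i.indisvalid,   -- false e.g. after a failed CREATE INDEX CONCURRENTLY
       i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname IN ('idx_cust_contact', 'idx_cust_address');
```

An index with `indisvalid = false` exists but is ignored by the planner, which would explain the seq scans.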
Using PG 9.3 I have a report query that SELECTs phone call data.
If the current User is an 'admin', they can see all the calls.
Otherwise, the User can only see his own calls.
Hence we have (simplified)
create table phonecalls (phone_id int, destination varchar(100));
create table users (user_id int);
create table usergroups (user_id int, group_id int);
create table groups (group_id int, is_admin bool);
create table userphones (user_id int, phone_id int);
and the following permissions clause:
SELECT * FROM phonecalls
WHERE
CASE WHEN ( SELECT is_admin FROM users join usergroups using (user_id) join groups using (group_id) WHERE user_id = 1 )
THEN true
ELSE
exists ( SELECT phone_id FROM userphones
WHERE user_id = 1
AND userphones.phone_id = phonecalls.phone_id )
END
When the database has many, many records in it, performance is an issue.
What I'm finding is, if the user with user_id 1 is an admin, the query speeds up if I remove the ELSE part of the permissions clause, i.e.
ELSE
exists ( SELECT 1 )
END
But this seems to contradict the following statement from the Postgres documentation:
https://www.postgresql.org/docs/9.4/functions-conditional.html
A CASE expression does not evaluate any subexpressions that are not needed to determine the result.
If the User is an admin, shouldn't the ELSE clause have no effect on query execution time?
Am I misunderstanding something?
EDIT Query plan output:
Seq Scan on phonecalls (cost=139.44..421294.43 rows=5000 width=10) (actual time=0.071..5.598 rows=10000 loops=1)
Filter: CASE WHEN $0 THEN true ELSE (alternatives: SubPlan 2 or hashed SubPlan 3) END
InitPlan 1 (returns $0)
-> Nested Loop (cost=36.89..139.44 rows=1538 width=1) (actual time=0.018..0.018 rows=0 loops=1)
-> Hash Join (cost=36.89..80.21 rows=128 width=5) (actual time=0.018..0.018 rows=0 loops=1)
Hash Cond: (groups.group_id = usergroups.group_id)
-> Seq Scan on groups (cost=0.00..33.30 rows=2330 width=5) (actual time=0.002..0.002 rows=1 loops=1)
-> Hash (cost=36.75..36.75 rows=11 width=8) (actual time=0.001..0.001 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Seq Scan on usergroups (cost=0.00..36.75 rows=11 width=8) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (user_id = 1)
-> Materialize (cost=0.00..40.06 rows=12 width=4) (never executed)
-> Seq Scan on users (cost=0.00..40.00 rows=12 width=4) (never executed)
Filter: (user_id = 1)
SubPlan 2
-> Seq Scan on userphones (cost=0.00..42.10 rows=1 width=0) (never executed)
Filter: ((user_id = 1) AND (phone_id = phonecalls.phone_id))
SubPlan 3
-> Seq Scan on userphones userphones_1 (cost=0.00..36.75 rows=11 width=4) (actual time=0.009..0.010 rows=1 loops=1)
Filter: (user_id = 1)
Total runtime: 6.229 ms
EDIT 2 Query Plan for 'SELECT 1' option
"Result (cost=139.44..294.44 rows=10000 width=10) (actual time=0.044..3.713 rows=10000 loops=1)"
" One-Time Filter: CASE WHEN $0 THEN true ELSE $1 END"
" InitPlan 1 (returns $0)"
" -> Nested Loop (cost=36.89..139.44 rows=1538 width=1) (actual time=0.028..0.028 rows=0 loops=1)"
" -> Hash Join (cost=36.89..80.21 rows=128 width=5) (actual time=0.026..0.026 rows=0 loops=1)"
" Hash Cond: (groups.group_id = usergroups.group_id)"
" -> Seq Scan on groups (cost=0.00..33.30 rows=2330 width=5) (actual time=0.009..0.009 rows=1 loops=1)"
" -> Hash (cost=36.75..36.75 rows=11 width=8) (actual time=0.000..0.000 rows=0 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 0kB"
" -> Seq Scan on usergroups (cost=0.00..36.75 rows=11 width=8) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: (user_id = 1)"
" -> Materialize (cost=0.00..40.06 rows=12 width=4) (never executed)"
" -> Seq Scan on users (cost=0.00..40.00 rows=12 width=4) (never executed)"
" Filter: (user_id = 1)"
" InitPlan 2 (returns $1)"
" -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)"
" -> Seq Scan on phonecalls (cost=0.00..155.00 rows=10000 width=10) (actual time=0.012..1.502 rows=10000 loops=1)"
"Total runtime: 4.307 ms"
The difference is Filter vs. One-Time Filter.
In the first query, the condition in the CASE expression depends on phonecalls.phone_id from the sequential scan (even if that branch is never executed), so the filter is applied to all 10000 result rows.
In the second query, the filter has to be evaluated only once: it is run in an InitPlan that is executed before the main query.
These 10000 checks must make the difference.
If the CASE expression is in the SELECT/projection part, it does not have a considerable performance impact. If it is part of an ORDER BY, GROUP BY, WHERE, or join condition, it may prevent the use of a suitable index and cause performance issues.
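One way around the per-row CASE is to decide the admin branch first (in the application, or in a separate query) and then issue one of two plain statements; a sketch under that assumption:

```sql
-- If the admin lookup returned true, run the unfiltered query:
SELECT * FROM phonecalls;

-- Otherwise run only the EXISTS-filtered form, which can use an
-- index on userphones (user_id, phone_id):
SELECT *
FROM phonecalls
WHERE EXISTS (SELECT 1
              FROM userphones
              WHERE userphones.user_id = 1
                AND userphones.phone_id = phonecalls.phone_id);
```

Each statement then gets its own plan, so the non-admin case is never penalized by a filter it does not need.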
I am looking for optimization suggestions for the below query on Postgres. I'm not a DBA, so I'm looking for some expert advice here.
The devices table holds device_id values which are hexadecimal.
To achieve high throughput we run 6 instances of this query in parallel, with pattern matching for device_id
beginning with [0-2], [3-5], [6-9], [a-c], [d-f].
When we run just one instance of the query it works fine, but with 6 instances we get this error:
[6669]:FATAL: connection to client lost
explain analyze select notifications.id, notifications.status, events.alert_type,
events.id as event_id, events.payload, notifications.device_id as device_id,
device_endpoints.region, device_endpoints.device_endpoint as endpoint
from notifications
inner join events
on notifications.event_id = events.id
inner join devices
on notifications.device_id = devices.id
inner join device_endpoints
on devices.id = device_endpoints.device_id
where notifications.status = 'pending' AND notifications.region = 'ap-southeast-2'
AND devices.device_id ~ '[0-9a-f].*'
limit 10000;
Output of EXPLAIN ANALYZE:
"Limit (cost=25.62..1349.23 rows=206 width=202) (actual time=0.359..0.359 rows=0 loops=1)"
" -> Nested Loop (cost=25.62..1349.23 rows=206 width=202) (actual time=0.357..0.357 rows=0 loops=1)"
" Join Filter: (notifications.device_id = devices.id)"
" -> Nested Loop (cost=25.33..1258.73 rows=206 width=206) (actual time=0.357..0.357 rows=0 loops=1)"
" -> Hash Join (cost=25.04..61.32 rows=206 width=52) (actual time=0.043..0.172 rows=193 loops=1)"
" Hash Cond: (notifications.event_id = events.id)"
" -> Index Scan using idx_notifications_status on notifications (cost=0.42..33.87 rows=206 width=16) (actual time=0.013..0.100 rows=193 loops=1)"
" Index Cond: (status = 'pending'::notification_status)"
" Filter: (region = 'ap-southeast-2'::text)"
" -> Hash (cost=16.50..16.50 rows=650 width=40) (actual time=0.022..0.022 rows=34 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 14kB"
" -> Seq Scan on events (cost=0.00..16.50 rows=650 width=40) (actual time=0.005..0.014 rows=34 loops=1)"
" -> Index Scan using idx_device_endpoints_device_id on device_endpoints (cost=0.29..5.80 rows=1 width=154) (actual time=0.001..0.001 rows=0 loops=193)"
" Index Cond: (device_id = notifications.device_id)"
" -> Index Scan using devices_pkey on devices (cost=0.29..0.43 rows=1 width=4) (never executed)"
" Index Cond: (id = device_endpoints.device_id)"
" Filter: (device_id ~ '[0-9a-f].*'::text)"
"Planning time: 0.693 ms"
"Execution time: 0.404 ms"
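As an aside, `~` patterns are not anchored in PostgreSQL, so `device_id ~ '[0-9a-f].*'` matches any value that contains a hex character anywhere, and per-instance ranges written this way would overlap rather than partition the data. A sketch of the anchored form (illustrative; one pattern per instance):

```sql
-- unanchored: true for any device_id containing a hex character anywhere
WHERE devices.device_id ~ '[0-9a-f].*'

-- anchored: true only when the FIRST character is in the range,
-- so the six instances scan disjoint slices of the table
WHERE devices.device_id ~ '^[0-2]'
```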