I have the following tables and sample data:
https://www.db-fiddle.com/f/hxQ7BGdgDJtQcv5xTChY9u/0
I would like to improve the performance of the query contained in the db-fiddle, ideally by removing the sub-query that produces the success field (the query was taken from ChatGPT output, but ChatGPT was unable to remove this sub-query without destroying the results). How can I do this?
My question to ChatGPT was this:
using the tables in <db-fiddle link>, write a select sql query to
return all cfg_commissioning_tags columns and
dat_commissioning_test_log.success. These tables are joined by
cfg_commissioning_tags.id = dat_commissioning_test_log.tag_id. If the
tag_source is 'plc' and any rows have success = true, return true for
the success field for all matching type_id and relative_tag_path rows.
To the result ChatGPT produced, I added the AND ct_2.device_name != ct_1.device_name condition to the sub-query, which is also required.
The current query, table creation queries, and the current query results are all copied below for posterity:
SELECT
ct_1.device_parent_path
,ct_1.device_name
,ct_1.relative_tag_path
,ct_1.tag_source
,ct_1.type_id
,CASE
WHEN EXISTS (
SELECT 1
FROM dat_commissioning_test_log ctl_2
JOIN cfg_commissioning_tags ct_2 ON ct_2.id = ctl_2.tag_id
WHERE
ct_2.type_id = ct_1.type_id
AND ct_2.relative_tag_path = ct_1.relative_tag_path
AND ct_2.device_name != ct_1.device_name -- without this, it runs super fast, but I need this
AND ctl_2.success = TRUE
AND ct_2.tag_source = 'plc'
) THEN 'true*'
ELSE CASE ctl_1.success WHEN true THEN 'true' ELSE 'false' END
END AS success
FROM cfg_commissioning_tags ct_1
LEFT JOIN dat_commissioning_test_log ctl_1 ON ct_1.id = ctl_1.tag_id
ORDER BY type_id, relative_tag_path;
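For reference, a possible rewrite that drops the correlated sub-query is to compute the set of passing 'plc' devices once per (type_id, relative_tag_path) group and join that back. This is only a sketch: it reproduces the sample results shown further down, but is otherwise unverified.

SELECT
 ct_1.device_parent_path
 ,ct_1.device_name
 ,ct_1.relative_tag_path
 ,ct_1.tag_source
 ,ct_1.type_id
 ,CASE
  WHEN grp.ok_devices > 1
       OR (grp.ok_devices = 1 AND ct_1.device_name <> ALL (grp.devices))
   THEN 'true*'
  ELSE CASE ctl_1.success WHEN true THEN 'true' ELSE 'false' END
 END AS success
FROM cfg_commissioning_tags ct_1
LEFT JOIN dat_commissioning_test_log ctl_1 ON ct_1.id = ctl_1.tag_id
LEFT JOIN (
 -- one row per group: how many distinct 'plc' devices have a passing test, and which ones
 SELECT ct_2.type_id
  ,ct_2.relative_tag_path
  ,count(DISTINCT ct_2.device_name) AS ok_devices
  ,array_agg(DISTINCT ct_2.device_name) AS devices
 FROM cfg_commissioning_tags ct_2
 JOIN dat_commissioning_test_log ctl_2 ON ctl_2.tag_id = ct_2.id
 WHERE ctl_2.success = TRUE
  AND ct_2.tag_source = 'plc'
 GROUP BY ct_2.type_id, ct_2.relative_tag_path
) grp ON grp.type_id = ct_1.type_id
 AND grp.relative_tag_path = ct_1.relative_tag_path
ORDER BY ct_1.type_id, ct_1.relative_tag_path;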
CREATE TABLE dat_commissioning_test_log
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
tag_id integer,
-- tested_on timestamp with time zone,
success boolean,
-- username character varying(50) COLLATE pg_catalog."default",
-- note character varying(300) COLLATE pg_catalog."default",
CONSTRAINT dat_commissioning_test_log_pkey PRIMARY KEY (id)
);
CREATE TABLE cfg_commissioning_tags
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
-- full_path character varying(400) COLLATE pg_catalog."default",
device_name character varying(50) COLLATE pg_catalog."default",
device_parent_path character varying(400) COLLATE pg_catalog."default",
-- added_on timestamp with time zone,
relative_tag_path character varying(100) COLLATE pg_catalog."default",
-- retired_on timestamp with time zone,
tag_source character varying(10) COLLATE pg_catalog."default",
type_id character varying(400) COLLATE pg_catalog."default",
CONSTRAINT cfg_commissioning_tags_pkey PRIMARY KEY (id)
);
INSERT INTO cfg_commissioning_tags (id, device_name, device_parent_path, relative_tag_path, tag_source, type_id) VALUES
(1, 'PC13A','DUMMY','Run Mins','plc','DOL'),
(2, 'PC12A','DUMMY','Run Mins','plc','DOL'),
(3, 'PC11A','DUMMY','Run Mins','plc','DOL'),
(4, 'PC11A','DUMMY','Status','io','DOL'),
(5, 'PC11A','DUMMY','Alarms/Isolator Tripped','io','DOL'),
(6, 'PC12A','DUMMY','Status','io','DOL');
INSERT INTO dat_commissioning_test_log (tag_id, success) VALUES
(1, true),
(6, true);
These are the results of the query:
device_parent_path | device_name | relative_tag_path       | tag_source | type_id | success
-------------------+-------------+-------------------------+------------+---------+--------
DUMMY              | PC11A       | Alarms/Isolator Tripped | io         | DOL     | FALSE
DUMMY              | PC13A       | Run Mins                | plc        | DOL     | TRUE
DUMMY              | PC12A       | Run Mins                | plc        | DOL     | true*
DUMMY              | PC11A       | Run Mins                | plc        | DOL     | true*
DUMMY              | PC12A       | Status                  | io         | DOL     | TRUE
DUMMY              | PC11A       | Status                  | io         | DOL     | FALSE
Edit:
Here is the EXPLAIN (ANALYZE, VERBOSE, BUFFERS) result:
"Sort (cost=4368188.41..4368208.24 rows=7932 width=188) (actual time=10378.617..10378.916 rows=8108 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id, (CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE ctl_1.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END)"
" Sort Key: ct_1.type_id, ct_1.relative_tag_path"
" Sort Method: quicksort Memory: 2350kB"
" Buffers: shared hit=2895186"
" -> Hash Left Join (cost=60.69..4367674.67 rows=7932 width=188) (actual time=1.991..10357.671 rows=8108 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id, CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE ctl_1.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END"
" Hash Cond: (ct_1.id = ctl_1.tag_id)"
" Buffers: shared hit=2895186"
" -> Seq Scan on public.cfg_commissioning_tags ct_1 (cost=0.00..426.32 rows=7932 width=156) (actual time=0.013..1.313 rows=7932 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id"
" Buffers: shared hit=347"
" -> Hash (cost=40.86..40.86 rows=1586 width=5) (actual time=0.326..0.326 rows=1593 loops=1)"
" Output: ctl_1.success, ctl_1.tag_id"
" Buckets: 2048 Batches: 1 Memory Usage: 79kB"
" Buffers: shared hit=25"
" -> Seq Scan on public.dat_commissioning_test_log ctl_1 (cost=0.00..40.86 rows=1586 width=5) (actual time=0.012..0.171 rows=1593 loops=1)"
" Output: ctl_1.success, ctl_1.tag_id"
" Buffers: shared hit=25"
" SubPlan 1"
" -> Hash Join (cost=505.71..550.57 rows=1 width=0) (actual time=1.267..1.267 rows=0 loops=8108)"
" Inner Unique: true"
" Hash Cond: (ctl_2.tag_id = ct_2.id)"
" Buffers: shared hit=2894814"
" -> Seq Scan on public.dat_commissioning_test_log ctl_2 (cost=0.00..40.86 rows=1521 width=4) (actual time=0.003..0.112 rows=1300 loops=3800)"
" Output: ctl_2.id, ctl_2.tag_id, ctl_2.tested_on, ctl_2.success, ctl_2.username, ctl_2.note"
" Filter: ctl_2.success"
" Rows Removed by Filter: 56"
" Buffers: shared hit=81338"
" -> Hash (cost=505.64..505.64 rows=6 width=4) (actual time=1.183..1.183 rows=98 loops=8108)"
" Output: ct_2.id"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Buffers: shared hit=2813476"
" -> Seq Scan on public.cfg_commissioning_tags ct_2 (cost=0.00..505.64 rows=6 width=4) (actual time=0.620..1.169 rows=98 loops=8108)"
" Output: ct_2.id"
" Filter: (((ct_2.device_name)::text <> (ct_1.device_name)::text) AND ((ct_2.type_id)::text = (ct_1.type_id)::text) AND ((ct_2.relative_tag_path)::text = (ct_1.relative_tag_path)::text) AND ((ct_2.tag_source)::text = 'plc'::text))"
" Rows Removed by Filter: 7834"
" Buffers: shared hit=2813476"
"Planning Time: 0.382 ms"
"Execution Time: 10379.346 ms"
Edit 2:
EXPLAIN after adding compound indexes:
"Sort (cost=540847.20..540867.03 rows=7932 width=198) (actual time=1142.282..1142.843 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, (CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END)"
" Sort Key: ct_1.full_path"
" Sort Method: quicksort Memory: 2290kB"
" Buffers: shared hit=778254"
" -> Hash Left Join (cost=149.19..540333.47 rows=7932 width=198) (actual time=1.775..1108.469 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END"
" Hash Cond: (ct_1.id = dat_commissioning_test_log.tag_id)"
" Buffers: shared hit=778254"
" -> Seq Scan on public.cfg_commissioning_tags ct_1 (cost=0.00..426.32 rows=7932 width=140) (actual time=0.011..0.837 rows=7932 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id"
" Buffers: shared hit=347"
" -> Hash (cost=139.24..139.24 rows=796 width=35) (actual time=1.404..1.404 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 83kB"
" Buffers: shared hit=50"
" -> Hash Join (cost=85.28..139.24 rows=796 width=35) (actual time=0.938..1.249 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Inner Unique: true"
" Hash Cond: (dat_commissioning_test_log.id = "ANY_subquery".max)"
" Buffers: shared hit=50"
" -> Seq Scan on public.dat_commissioning_test_log (cost=0.00..40.93 rows=1593 width=39) (actual time=0.009..0.089 rows=1593 loops=1)"
" Output: dat_commissioning_test_log.id, dat_commissioning_test_log.tag_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.success, dat_commissioning_test_log.username, dat_commissioning_test_log.note"
" Buffers: shared hit=25"
" -> Hash (cost=82.78..82.78 rows=200 width=4) (actual time=0.926..0.926 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 66kB"
" Buffers: shared hit=25"
" -> HashAggregate (cost=80.78..82.78 rows=200 width=4) (actual time=0.710..0.804 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Group Key: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> Subquery Scan on "ANY_subquery" (cost=48.90..77.23 rows=1417 width=4) (actual time=0.297..0.475 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> HashAggregate (cost=48.90..63.07 rows=1417 width=8) (actual time=0.297..0.413 rows=1417 loops=1)"
" Output: max(dat_commissioning_test_log_1.id), dat_commissioning_test_log_1.tag_id"
" Group Key: dat_commissioning_test_log_1.tag_id"
" Buffers: shared hit=25"
" -> Seq Scan on public.dat_commissioning_test_log dat_commissioning_test_log_1 (cost=0.00..40.93 rows=1593 width=8) (actual time=0.006..0.090 rows=1593 loops=1)"
" Output: dat_commissioning_test_log_1.id, dat_commissioning_test_log_1.tag_id, dat_commissioning_test_log_1.tested_on, dat_commissioning_test_log_1.success, dat_commissioning_test_log_1.username, dat_commissioning_test_log_1.note"
" Buffers: shared hit=25"
" SubPlan 1"
" -> Hash Join (cost=23.10..68.04 rows=1 width=0) (actual time=0.133..0.133 rows=0 loops=7932)"
" Inner Unique: true"
" Hash Cond: (ctl_2.tag_id = ct_2.id)"
" Buffers: shared hit=777857"
" -> Seq Scan on public.dat_commissioning_test_log ctl_2 (cost=0.00..40.93 rows=1528 width=4) (actual time=0.002..0.098 rows=1301 loops=3796)"
" Output: ctl_2.id, ctl_2.tag_id, ctl_2.tested_on, ctl_2.success, ctl_2.username, ctl_2.note"
" Filter: ctl_2.success"
" Rows Removed by Filter: 56"
" Buffers: shared hit=81286"
" -> Hash (cost=23.02..23.02 rows=6 width=4) (actual time=0.057..0.057 rows=100 loops=7932)"
" Output: ct_2.id"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Buffers: shared hit=696571"
" -> Index Scan using cfg_commissioning_tags_idx on public.cfg_commissioning_tags ct_2 (cost=0.41..23.02 rows=6 width=4) (actual time=0.016..0.049 rows=100 loops=7932)"
" Output: ct_2.id"
" Index Cond: (((ct_2.type_id)::text = (ct_1.type_id)::text) AND ((ct_2.relative_tag_path)::text = (ct_1.relative_tag_path)::text) AND ((ct_2.tag_source)::text = 'plc'::text))"
" Filter: ((ct_2.device_name)::text <> (ct_1.device_name)::text)"
" Rows Removed by Filter: 1"
" Buffers: shared hit=696571"
"Planning Time: 0.550 ms"
"Execution Time: 1143.359 ms"
Edit 3:
After replacing the index on cfg_commissioning_tags with a covering index:
"Sort (cost=540847.20..540867.03 rows=7932 width=198) (actual time=1152.113..1152.682 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, (CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END)"
" Sort Key: ct_1.full_path"
" Sort Method: quicksort Memory: 2290kB"
" Buffers: shared hit=784891"
" -> Hash Left Join (cost=149.19..540333.47 rows=7932 width=198) (actual time=2.016..1115.111 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END"
" Hash Cond: (ct_1.id = dat_commissioning_test_log.tag_id)"
" Buffers: shared hit=784891"
" -> Seq Scan on public.cfg_commissioning_tags ct_1 (cost=0.00..426.32 rows=7932 width=140) (actual time=0.014..0.755 rows=7932 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id"
" Buffers: shared hit=347"
" -> Hash (cost=139.24..139.24 rows=796 width=35) (actual time=1.613..1.613 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 83kB"
" Buffers: shared hit=50"
" -> Hash Join (cost=85.28..139.24 rows=796 width=35) (actual time=1.117..1.449 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Inner Unique: true"
" Hash Cond: (dat_commissioning_test_log.id = "ANY_subquery".max)"
" Buffers: shared hit=50"
" -> Seq Scan on public.dat_commissioning_test_log (cost=0.00..40.93 rows=1593 width=39) (actual time=0.010..0.100 rows=1593 loops=1)"
" Output: dat_commissioning_test_log.id, dat_commissioning_test_log.tag_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.success, dat_commissioning_test_log.username, dat_commissioning_test_log.note"
" Buffers: shared hit=25"
" -> Hash (cost=82.78..82.78 rows=200 width=4) (actual time=1.103..1.103 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 66kB"
" Buffers: shared hit=25"
" -> HashAggregate (cost=80.78..82.78 rows=200 width=4) (actual time=0.798..0.940 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Group Key: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> Subquery Scan on "ANY_subquery" (cost=48.90..77.23 rows=1417 width=4) (actual time=0.332..0.549 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> HashAggregate (cost=48.90..63.07 rows=1417 width=8) (actual time=0.331..0.482 rows=1417 loops=1)"
" Output: max(dat_commissioning_test_log_1.id), dat_commissioning_test_log_1.tag_id"
" Group Key: dat_commissioning_test_log_1.tag_id"
" Buffers: shared hit=25"
" -> Seq Scan on public.dat_commissioning_test_log dat_commissioning_test_log_1 (cost=0.00..40.93 rows=1593 width=8) (actual time=0.006..0.095 rows=1593 loops=1)"
" Output: dat_commissioning_test_log_1.id, dat_commissioning_test_log_1.tag_id, dat_commissioning_test_log_1.tested_on, dat_commissioning_test_log_1.success, dat_commissioning_test_log_1.username, dat_commissioning_test_log_1.note"
" Buffers: shared hit=25"
" SubPlan 1"
" -> Hash Join (cost=23.10..68.04 rows=1 width=0) (actual time=0.134..0.134 rows=0 loops=7932)"
" Inner Unique: true"
" Hash Cond: (ctl_2.tag_id = ct_2.id)"
" Buffers: shared hit=784494"
" -> Seq Scan on public.dat_commissioning_test_log ctl_2 (cost=0.00..40.93 rows=1528 width=4) (actual time=0.002..0.098 rows=1301 loops=3796)"
" Output: ctl_2.id, ctl_2.tag_id, ctl_2.tested_on, ctl_2.success, ctl_2.username, ctl_2.note"
" Filter: ctl_2.success"
" Rows Removed by Filter: 56"
" Buffers: shared hit=81286"
" -> Hash (cost=23.02..23.02 rows=6 width=4) (actual time=0.057..0.057 rows=100 loops=7932)"
" Output: ct_2.id"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Buffers: shared hit=703208"
" -> Index Scan using cfg_commissioning_tags_idx on public.cfg_commissioning_tags ct_2 (cost=0.41..23.02 rows=6 width=4) (actual time=0.015..0.049 rows=100 loops=7932)"
" Output: ct_2.id"
" Index Cond: (((ct_2.type_id)::text = (ct_1.type_id)::text) AND ((ct_2.relative_tag_path)::text = (ct_1.relative_tag_path)::text) AND ((ct_2.tag_source)::text = 'plc'::text))"
" Filter: ((ct_2.device_name)::text <> (ct_1.device_name)::text)"
" Rows Removed by Filter: 1"
" Buffers: shared hit=703208"
"Planning Time: 0.514 ms"
"Execution Time: 1153.156 ms"
You need these compound indexes:
CREATE index cfg_commissioning_tags_idx on cfg_commissioning_tags(type_id, relative_tag_path, device_name, tag_source);
CREATE index dat_commissioning_test_log_idx on dat_commissioning_test_log(tag_id, success);
You can extend the first index beyond the columns used in the WHERE clause and join by adding the columns that are used in the SELECT (in this case we can add device_parent_path to our index); this type of index is called a covering index.
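A sketch of that covering variant, assuming PostgreSQL 11 or later (where INCLUDE is available; on older versions the extra columns can simply be appended as trailing key columns):

-- key columns serve the join/WHERE; INCLUDE columns let the planner answer from the index alone
CREATE INDEX cfg_commissioning_tags_covering_idx
 ON cfg_commissioning_tags (type_id, relative_tag_path, device_name, tag_source)
 INCLUDE (id, device_parent_path);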
I have a PostgreSQL table named events:

Name         | Data Type
-------------+------------------
id           | character varying
idx          | integer
module       | character varying
method       | character varying
block_height | integer
data         | jsonb
where data looks like this:
["hex_address", 101, 1001, 51637660000, 51324528436, 6003, 235458709417729, 234683610487930] //trade method
["hex_address", 200060013, 1, 250000000000, 3176359357709794, 6006, "0x00000000000000001431f1724a499fcd", 114440794831460] //add method
["hex_address", 200060013, 1, 42658905229340, 407285893749, "0x000000000000000000110204f76c06e2", 6006, 121017475390243, "0x000000000000000013bd9463821aedee"] //remove method
The table contains about 3 million rows. These indexes have been created:
CREATE INDEX IDX_event_multicolumn_index ON public.events ("module", "method", "block_height") INCLUDE ("id", "idx", "data");
CREATE INDEX event_data_gin_index ON public.events USING gin (data);
When I run this SQL:
SELECT data
FROM public.events
WHERE module = 'amm'
  AND ((data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved')
    OR (data::jsonb->5 IN ('6002', '6003') AND method IN ('Traded', 'LiquidityAdded')))
ORDER BY block_height DESC, idx DESC
LIMIT 500;
it takes about 1 minute. The EXPLAIN ANALYZE result (this is another table with fewer rows):
"Limit (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.484..14019.524 rows=100 loops=1)"
" -> Sort (cost=1909.21..1909.22 rows=5 width=334) (actual time=14019.477..14019.504 rows=100 loops=1)"
" Sort Key: block_height DESC, idx DESC"
" Sort Method: top-N heapsort Memory: 128kB"
" -> Bitmap Heap Scan on events (cost=114.28..1909.15 rows=5 width=334) (actual time=703.038..13957.503 rows=25625 loops=1)"
" Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Rows Removed by Filter: 9435"
" Heap Blocks: exact=28532"
" -> BitmapOr (cost=114.28..114.28 rows=462 width=0) (actual time=696.569..696.580 rows=0 loops=1)"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..4.59 rows=3 width=0) (actual time=24.375..24.382 rows=896 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..109.69 rows=459 width=0) (actual time=672.191..672.191 rows=34164 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
If I delete the multicolumn index, the query is indeed faster, taking about 20 seconds.
"Gather Merge (cost=477713.00..477720.00 rows=60 width=130) (actual time=22151.357..22210.826 rows=79864 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=476712.97..476713.05 rows=30 width=130) (actual time=22090.960..22097.308 rows=26621 loops=3)"
" Sort Key: block_height"
" Sort Method: external merge Disk: 5400kB"
" Worker 0: Sort Method: external merge Disk: 5264kB"
" Worker 1: Sort Method: external merge Disk: 5416kB"
" -> Parallel Seq Scan on events (cost=0.00..476712.24 rows=30 width=130) (actual time=5.151..21985.878 rows=26621 loops=3)"
" Filter: (((module)::text = 'amm'::text) AND ((((data -> 6) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{6002,6003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))))"
" Rows Removed by Filter: 2858160"
"Planning Time: 0.559 ms"
"Execution Time: 22217.351 ms"
Then I tried to create a multicolumn index on the specific fields:
CREATE EXTENSION btree_gin;
CREATE INDEX extcondindex ON public.events USING gin (((data -> 5)), ((data -> 6)), module, method);
The result is just the same as with the original multicolumn index.
If I remove one of the OR branches:
SELECT data FROM public.events WHERE module = 'amm' AND (data::jsonb->6 IN ('6002', '6003') AND method = 'LiquidityRemoved') ORDER BY block_height DESC, idx DESC LIMIT 500;
it is fast enough, taking about 3 seconds.
I want to know why the multicolumn index slows down the query, and how I should index the specific fields to optimize my query.
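Since each branch is fast on its own, one rewrite worth testing is to split the OR into a UNION ALL so that each branch can use its own index; the two branches match disjoint method values, so no duplicates can arise. This is only a sketch and has not been verified against this table:

SELECT data
FROM (
 SELECT data, block_height, idx
 FROM public.events
 WHERE module = 'amm'
  AND method = 'LiquidityRemoved'
  AND data::jsonb->6 IN ('6002', '6003')
 UNION ALL
 SELECT data, block_height, idx
 FROM public.events
 WHERE module = 'amm'
  AND method IN ('Traded', 'LiquidityAdded')
  AND data::jsonb->5 IN ('6002', '6003')
) branches
ORDER BY block_height DESC, idx DESC
LIMIT 500;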
The I/O timing and buffer analysis for the table with the multicolumn index:
"Sort (cost=1916.98..1916.99 rows=5 width=142) (actual time=14204.316..14210.109 rows=25683 loops=1)"
" Sort Key: block_height"
" Sort Method: external merge Disk: 5264kB"
" Buffers: shared hit=93 read=31211, temp read=658 written=659"
" I/O Timings: read=13533.823"
" -> Bitmap Heap Scan on events (cost=114.30..1916.92 rows=5 width=142) (actual time=926.714..14156.662 rows=25683 loops=1)"
" Recheck Cond: ((((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Filter: ((((data -> 6) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = 'LiquidityRemoved'::text)) OR (((data -> 5) = ANY ('{5002,5003}'::jsonb[])) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[]))))"
" Rows Removed by Filter: 9503"
" Heap Blocks: exact=28626"
" Buffers: shared hit=90 read=31211"
" I/O Timings: read=13533.823"
" -> BitmapOr (cost=114.30..114.30 rows=464 width=0) (actual time=919.777..919.779 rows=0 loops=1)"
" Buffers: shared hit=27 read=2648"
" I/O Timings: read=892.499"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..4.59 rows=3 width=0) (actual time=32.255..32.255 rows=898 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = 'LiquidityRemoved'::text))"
" Buffers: shared hit=7 read=72"
" I/O Timings: read=30.095"
" -> Bitmap Index Scan on ""IDX_event_multicolumn_index"" (cost=0.00..109.71 rows=461 width=0) (actual time=887.519..887.520 rows=34288 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
" Buffers: shared hit=20 read=2576"
" I/O Timings: read=862.404"
"Planning:"
" Buffers: shared hit=230"
"Planning Time: 0.663 ms"
"Execution Time: 14214.038 ms"
I am a newbie to database optimisation.
My table contains around 29 million rows.
I am running the SELECT below in pgAdmin, and it takes 9 seconds.
What can I do to optimize performance?
SELECT
F."Id",
F."Name",
F."Url",
F."CountryModel",
F."RegionModel",
F."CityModel",
F."Street",
F."Phone",
F."PostCode",
F."Images",
F."Rank",
F."CommentCount",
F."PageRank",
F."Description",
F."Properties",
F."IsVerify",
count(*) AS Counter
FROM
public."Firms" F,
LATERAL unnest(F."Properties") AS P
WHERE
F."CountryId" = 1
AND F."RegionId" = 7
AND F."CityId" = 4365
AND P = ANY (ARRAY[126, 128])
AND F."Deleted" = FALSE
GROUP BY
F."Id"
ORDER BY
Counter DESC,
F."IsVerify" DESC,
F."PageRank" DESC OFFSET 10 ROWS FETCH FIRST 20 ROW ONLY
That's my query plan:
" -> Sort (cost=11945.20..11948.15 rows=1178 width=369) (actual time=8981.514..8981.515 rows=30 loops=1)"
" Sort Key: (count(*)) DESC, f.""IsVerify"" DESC, f.""PageRank"" DESC"
" Sort Method: top-N heapsort Memory: 58kB"
" -> HashAggregate (cost=11898.63..11910.41 rows=1178 width=369) (actual time=8981.234..8981.305 rows=309 loops=1)"
" Group Key: f.""Id"""
" Batches: 1 Memory Usage: 577kB"
" -> Nested Loop (cost=7050.07..11886.85 rows=2356 width=360) (actual time=79.408..8980.167 rows=322 loops=1)"
" -> Bitmap Heap Scan on ""Firms"" f (cost=7050.06..11716.04 rows=1178 width=360) (actual time=78.414..8909.649 rows=56071 loops=1)"
" Recheck Cond: ((""CityId"" = 4365) AND (""RegionId"" = 7))"
" Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
" Heap Blocks: exact=55330"
" -> BitmapAnd (cost=7050.06..7050.06 rows=1178 width=0) (actual time=70.947..70.947 rows=0 loops=1)"
" -> Bitmap Index Scan on ""IX_Firms_CityId"" (cost=0.00..635.62 rows=58025 width=0) (actual time=11.563..11.563 rows=56072 loops=1)"
" Index Cond: (""CityId"" = 4365)"
" -> Bitmap Index Scan on ""IX_Firms_RegionId"" (cost=0.00..6413.60 rows=588955 width=0) (actual time=57.795..57.795 rows=598278 loops=1)"
" Index Cond: (""RegionId"" = 7)"
" -> Function Scan on unnest p (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=0 loops=56071)"
" Filter: (p = ANY ('{126,128}'::integer[]))"
" Rows Removed by Filter: 2"
"Planning Time: 0.351 ms"
"Execution Time: 8981.725 ms"```
Create a GIN index on F."Properties",
create index on "Firms" using gin ("Properties");
then add a clause to your WHERE
...
AND P = ANY (ARRAY[126, 128])
AND "Properties" && ARRAY[126, 128]
....
That added clause is redundant with the one preceding it, but the planner is not smart enough to reason that out, so you need to make it explicit.
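Putting it together, the statement would look roughly like this — the same query as above with only the overlap clause added:

SELECT
    F."Id", F."Name", F."Url", F."CountryModel", F."RegionModel", F."CityModel",
    F."Street", F."Phone", F."PostCode", F."Images", F."Rank", F."CommentCount",
    F."PageRank", F."Description", F."Properties", F."IsVerify",
    count(*) AS Counter
FROM
    public."Firms" F,
    LATERAL unnest(F."Properties") AS P
WHERE
    F."CountryId" = 1
    AND F."RegionId" = 7
    AND F."CityId" = 4365
    AND P = ANY (ARRAY[126, 128])
    AND F."Properties" && ARRAY[126, 128] -- redundant, but enables the GIN index
    AND F."Deleted" = FALSE
GROUP BY
    F."Id"
ORDER BY
    Counter DESC,
    F."IsVerify" DESC,
    F."PageRank" DESC
OFFSET 10 ROWS FETCH FIRST 20 ROWS ONLY;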
I use Postgres 10.4 in a Linux environment.
I have a query which is very slow when I run it.
Searching by subject in Arabic is very slow; the problem also exists with subjects in other languages.
I have about 1 million records per year.
I tried to add an index on the transfer table, but the result is the same:
CREATE INDEX subject
ON public.transfer
USING btree
(subject COLLATE pg_catalog."default" varchar_pattern_ops);
This is the query:
select * from ( with scope as (
select unit_id from public.sec_unit
where emp_id = 'EM-001' and app_type in ('S','E'))
select CAST (row_number() OVER (PARTITION BY advsearch.correspondenceid)
as VARCHAR(15)) as numline, advsearch.*
from (
SELECT Transfer.id AS id, CORRESP.id AS correspondenceId,
Transfer.correspondencecopy_id AS correspondencecopyId, Transfer.datesendjctransfer AS datesendjctransfer
FROM Transfer Transfer
Left outer JOIN correspondencecopy CORRESPCPY ON Transfer.correspondencecopy_id = CORRESPCPY.id
Left outer JOIN correspondence CORRESP ON CORRESP.id = CORRESPCPY.correspondence_id
LEFT OUTER JOIN scope sc on sc.unit_id = Transfer.unittransto_id or sc.unit_id='allorg'
LEFT OUTER JOIN employee emp on emp.id = 'EM-001'
WHERE transfer.status='1' AND (Transfer.docyearhjr='1441' )
AND (Transfer.subject like '%'||'رقم'||'%')
AND ( sc.unit_id is not null )
AND (coalesce(emp.confidentiel,'0') >= coalesce(Transfer.confidentiel,'0'))
)
advsearch ) Searchlist
WHERE Searchlist.numline='1'
ORDER BY Searchlist.datesendjctransfer
Can someone help me optimise the query?
Update:
I tried changing the query: I eliminated the use of scope and replaced it with a simple condition. I get the same result (the same number of records), but the problem is the same: the query is still very slow.
select * from (
select CAST (row_number() OVER (PARTITION BY advsearch.correspondenceid)
as VARCHAR(15)) as numline, advsearch.*
from (
SELECT Transfer.id AS id, CORRESP.id AS correspondenceId,
Transfer.correspondencecopy_id AS correspondencecopyId, Transfer.datesendjctransfer AS datesendjctransfer
FROM Transfer Transfer
Left outer JOIN correspondencecopy CORRESPCPY ON Transfer.correspondencecopy_id = CORRESPCPY.id
Left outer JOIN correspondence CORRESP ON CORRESP.id = CORRESPCPY.correspondence_id
LEFT OUTER JOIN employee emp on emp.id = 'EM-001'
WHERE transfer.status='1' and ( Transfer.unittransto_id in (
select unit_id from public.security_employee_unit
where employee_id = 'EM-001' and app_type in ('E','S'))
or 'allorg' in ( select unit_id from public.security_employee_unit
where employee_id = 'EM-001' and app_type in ('S')))
AND (Transfer.docyearhjr='1441' )
AND (Transfer.subject like '%'||'رقم'||'%')
AND (coalesce(emp.confidentiel,'0') >= coalesce(Transfer.confidentiel,'0'))
)
advsearch ) Searchlist
WHERE Searchlist.numline='1'
ORDER BY Searchlist.datesendjctransfer
Update:
I analyzed the query using EXPLAIN ANALYZE; this is the result:
"Sort (cost=412139.09..412139.13 rows=17 width=87) (actual time=1481.951..1482.166 rows=4497 loops=1)"
" Sort Key: searchlist.datesendjctransfer"
" Sort Method: quicksort Memory: 544kB"
" -> Subquery Scan on searchlist (cost=412009.59..412138.74 rows=17 width=87) (actual time=1457.717..1480.381 rows=4497 loops=1)"
" Filter: ((searchlist.numline)::text = '1'::text)"
" Rows Removed by Filter: 38359"
" -> WindowAgg (cost=412009.59..412095.69 rows=3444 width=87) (actual time=1457.715..1477.146 rows=42856 loops=1)"
" CTE scope"
" -> Bitmap Heap Scan on security_employee_unit (cost=8.59..15.83 rows=2 width=7) (actual time=0.043..0.058 rows=2 loops=1)"
" Recheck Cond: (((employee_id)::text = 'EM-001'::text) AND ((app_type)::text = ANY ('{SE,I}'::text[])))"
" Heap Blocks: exact=2"
" -> Bitmap Index Scan on employeeidkey (cost=0.00..8.59 rows=2 width=0) (actual time=0.037..0.037 rows=2 loops=1)"
" Index Cond: (((employee_id)::text = 'EM-001'::text) AND ((app_type)::text = ANY ('{SE,I}'::text[])))"
" -> Sort (cost=411993.77..412002.38 rows=3444 width=39) (actual time=1457.702..1461.773 rows=42856 loops=1)"
" Sort Key: corresp.id"
" Sort Method: external merge Disk: 2440kB"
" -> Nested Loop Left Join (cost=18315.99..411791.43 rows=3444 width=39) (actual time=1271.209..1295.423 rows=42856 loops=1)"
" Filter: ((COALESCE(emp.confidentiel, '0'::character varying))::text >= (COALESCE(transfer.confidentiel, '0'::character varying))::text)"
" -> Nested Loop (cost=18315.71..411628.14 rows=10333 width=41) (actual time=1271.165..1283.365 rows=42856 loops=1)"
" Join Filter: (((sc.unit_id)::text = (transfer.unittransto_id)::text) OR ((sc.unit_id)::text = 'allorg'::text))"
" Rows Removed by Join Filter: 42856"
" -> CTE Scan on scope sc (cost=0.00..0.04 rows=2 width=48) (actual time=0.045..0.064 rows=2 loops=1)"
" Filter: (unit_id IS NOT NULL)"
" -> Materialize (cost=18315.71..411292.44 rows=10328 width=48) (actual time=53.970..635.651 rows=42856 loops=2)"
" -> Gather (cost=18315.71..411240.80 rows=10328 width=48) (actual time=107.919..1254.600 rows=42856 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Nested Loop Left Join (cost=17315.71..409208.00 rows=4303 width=48) (actual time=104.436..1250.461 rows=14285 loops=3)"
" -> Nested Loop Left Join (cost=17315.28..405979.02 rows=4303 width=48) (actual time=104.382..1136.591 rows=14285 loops=3)"
" -> Parallel Bitmap Heap Scan on transfer (cost=17314.85..377306.25 rows=4303 width=39) (actual time=104.287..996.609 rows=14285 loops=3)"
" Recheck Cond: ((docyearhjr)::text = '1441'::text)"
" Rows Removed by Index Recheck: 437299"
" Filter: (((subject)::text ~~ '%رقم%'::text) AND ((status)::text = '1'::text))"
" Rows Removed by Filter: 297178"
" Heap Blocks: exact=14805 lossy=44734"
" -> Bitmap Index Scan on docyearhjr (cost=0.00..17312.27 rows=938112 width=0) (actual time=96.028..96.028 rows=934389 loops=1)"
" Index Cond: ((docyearhjr)::text = '1441'::text)"
" -> Index Scan using pk_correspondencecopy on correspondencecopy correspcpy (cost=0.43..6.66 rows=1 width=21) (actual time=0.009..0.009 rows=1 loops=42856)"
" Index Cond: ((transfer.correspondencecopy_id)::text = (id)::text)"
" -> Index Only Scan using pk_correspondence on correspondence corresp (cost=0.42..0.75 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=42856)"
" Index Cond: (id = (correspcpy.correspondence_id)::text)"
" Heap Fetches: 14227"
" -> Materialize (cost=0.28..8.31 rows=1 width=2) (actual time=0.000..0.000 rows=1 loops=42856)"
" -> Index Scan using pk_employee on employee emp (cost=0.28..8.30 rows=1 width=2) (actual time=0.038..0.038 rows=1 loops=1)"
" Index Cond: ((id)::text = 'EM-001'::text)"
"Planning time: 1.595 ms"
"Execution time: 1487.303 ms"