Improve PostgreSQL - Insert statement execution RUN TIME

SELECT version(): PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
I have a use case where I have to insert records DAILY into the DATAVANT_COVID_MATCH table while joining 3 other tables.
I've created indexes and also partitions to decrease the execution time of the INSERT SQL, but it still takes multiple hours to insert data into the DATAVANT_COVID_MATCH table.
Below is the code that currently runs DAILY:
INSERT INTO DATAVANT_O.DATAVANT_COVID_MATCH_{}
SELECT
CUST_LAST_NM,
CUST_FRST_NM,
CIGNA_DOB,
CIGNA_ZIP,
DATAVANT_DOD,
DATAVANT_DOB,
DEATH_VERIFICATION,
DATA_SOURCE,
INDIV_ENTPR_ID
FROM
(
SELECT
CR.PATNT_LAST_NM AS CUST_LAST_NM,
CR.PATNT_FRST_NM AS CUST_FRST_NM,
CRD.CUST_BRTH_DT AS CIGNA_DOB,
CR.PATNT_POSTL_CD AS CIGNA_ZIP,
MI.DOD AS DATAVANT_DOD,
MI.DOB AS DATAVANT_DOB,
MI.DEATH_VERIFICATION,
MI.DATA_SOURCE,
CRD.INDIV_ENTPR_ID,
ROW_NUMBER() OVER (PARTITION BY CRD.INDIV_ENTPR_ID ORDER BY CRD.INDIV_ENTPR_ID DESC) AS ROW_NUMBER
FROM DATAVANT_O.COVID_PATNT_REGISTRY_DEID CRD
INNER JOIN DATAVANT_STG_O.MORTALITY_INDEX_{} MI ON
CRD.TOKEN_1 = MI.TOKEN_1 AND
CRD.TOKEN_2 = MI.TOKEN_2 AND
CRD.TOKEN_4 = MI.TOKEN_4
INNER JOIN DATAVANT_O.COVID_PATNT_REGISTRY CR ON
CR.INDIV_ENTPR_ID = CRD.INDIV_ENTPR_ID
) x
WHERE
ROW_NUMBER = 1;
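For reference, the ROW_NUMBER()/WHERE ROW_NUMBER = 1 pattern above keeps one arbitrary row per INDIV_ENTPR_ID, since the window's ORDER BY is its own partition key. A minimal equivalent sketch using DISTINCT ON (semantics identical: with no ORDER BY, an arbitrary row per key is kept; the {} partition-suffix placeholders are left as in the original):
INSERT INTO DATAVANT_O.DATAVANT_COVID_MATCH_{}
SELECT DISTINCT ON (CRD.INDIV_ENTPR_ID)
    CR.PATNT_LAST_NM AS CUST_LAST_NM,
    CR.PATNT_FRST_NM AS CUST_FRST_NM,
    CRD.CUST_BRTH_DT AS CIGNA_DOB,
    CR.PATNT_POSTL_CD AS CIGNA_ZIP,
    MI.DOD AS DATAVANT_DOD,
    MI.DOB AS DATAVANT_DOB,
    MI.DEATH_VERIFICATION,
    MI.DATA_SOURCE,
    CRD.INDIV_ENTPR_ID
FROM DATAVANT_O.COVID_PATNT_REGISTRY_DEID CRD
INNER JOIN DATAVANT_STG_O.MORTALITY_INDEX_{} MI
    ON CRD.TOKEN_1 = MI.TOKEN_1
   AND CRD.TOKEN_2 = MI.TOKEN_2
   AND CRD.TOKEN_4 = MI.TOKEN_4
INNER JOIN DATAVANT_O.COVID_PATNT_REGISTRY CR
    ON CR.INDIV_ENTPR_ID = CRD.INDIV_ENTPR_ID;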
INDEX:
An index was created for every partition of the MORTALITY_INDEX table and of the DATAVANT_O.DATAVANT_COVID_MATCH table.
Example:
CREATE INDEX mortality_index_1940_dod_idx
ON datavant_stg_o.mortality_index_1940 USING btree
(dod ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX mortality_index_1941_1945_dod_idx
ON datavant_stg_o.mortality_index_1941_1945 USING btree
(dod ASC NULLS LAST)
TABLESPACE pg_default;
etc...
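Note that the INSERT's join matches MORTALITY_INDEX on token_1/token_2/token_4, not on dod, so the dod indexes above cannot serve that join. A per-partition index on the join columns is what could serve it, if the planner chose a nested loop or merge strategy over the hash join shown in the plan. A sketch (index name illustrative):
CREATE INDEX mortality_index_1940_tokens_idx
    ON datavant_stg_o.mortality_index_1940 USING btree
    (token_1, token_2, token_4);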
Number of records in each table:
DATAVANT_COVID_MATCH = 10k
COVID_PATNT_REGISTRY = 800k
COVID_PATNT_REGISTRY_DEID = 800k
MORTALITY_INDEX = 220 million total (~10 million records per partition interval)
So can someone direct me on how to decrease the execution time to <= 1 hr?
Any comments/suggestions are appreciated, and let me know if I need to add any additional information.
Thank you!
Below is the EXPLAIN plan for the INSERT statement
"Insert on datavant_covid_match (cost=52885719.27..52885719.44 rows=1 width=271)"
" -> Subquery Scan on x (cost=52885719.27..52885719.44 rows=1 width=271)"
" Filter: (x.row_number = 1)"
" -> WindowAgg (cost=52885719.27..52885719.37 rows=5 width=57)"
" -> Sort (cost=52885719.27..52885719.28 rows=5 width=49)"
" Sort Key: crd.indiv_entpr_id DESC"
" -> Nested Loop (cost=21573582.34..52885719.21 rows=5 width=49)"
" Join Filter: ((crd.indiv_entpr_id)::text = (cr.indiv_entpr_id)::text)"
" -> Hash Join (cost=21573582.34..52647701.34 rows=1 width=29)"
" Hash Cond: (((crd.token_1)::text = (mi.token_1)::text) AND ((crd.token_2)::text = (mi.token_2)::text) AND ((crd.token_4)::text = (mi.token_4)::text))"
" -> Seq Scan on covid_patnt_registry_deid crd (cost=0.00..13.20 rows=320 width=145)"
" -> Hash (cost=13011477.77..13011477.77 rows=219629118 width=148)"
" -> Append (cost=0.00..13011477.77 rows=219629118 width=148)"
" -> Seq Scan on mortality_index_1940 mi (cost=0.00..26129.12 rows=471412 width=149)"
" -> Seq Scan on mortality_index_1941_1945 mi_1 (cost=0.00..89439.94 rows=1615094 width=149)"
" -> Seq Scan on mortality_index_1946_1950 mi_2 (cost=0.00..110751.92 rows=1998492 width=149)"
" -> Seq Scan on mortality_index_1951_1955 mi_3 (cost=0.00..170548.84 rows=3077984 width=149)"
" -> Seq Scan on mortality_index_1956_1960 mi_4 (cost=0.00..228210.95 rows=4120895 width=149)"
" -> Seq Scan on mortality_index_1961_1965 mi_5 (cost=0.00..416877.60 rows=7535260 width=148)"
" -> Seq Scan on mortality_index_1966_1970 mi_6 (cost=0.00..721723.91 rows=13042691 width=148)"
" -> Seq Scan on mortality_index_1971_1975 mi_7 (cost=0.00..863088.56 rows=15582656 width=148)"
" -> Seq Scan on mortality_index_1976_1980 mi_8 (cost=0.00..932241.96 rows=16833796 width=149)"
" -> Seq Scan on mortality_index_1981_1985 mi_9 (cost=0.00..956751.74 rows=17281174 width=149)"
" -> Seq Scan on mortality_index_1986_1990 mi_10 (cost=0.00..972980.59 rows=17920859 width=145)"
" -> Seq Scan on mortality_index_1991_1995 mi_11 (cost=0.00..1059929.92 rows=19515892 width=145)"
" -> Seq Scan on mortality_index_1996_2000 mi_12 (cost=0.00..1147163.44 rows=20842344 width=146)"
" -> Seq Scan on mortality_index_2001_2005 mi_13 (cost=0.00..1197933.26 rows=21622326 width=148)"
" -> Seq Scan on mortality_index_2006_2010 mi_14 (cost=0.00..925468.03 rows=16956803 width=149)"
" -> Seq Scan on mortality_index_2011_2015 mi_15 (cost=0.00..1028501.34 rows=19858534 width=149)"
" -> Seq Scan on mortality_index_2016_2020 mi_16 (cost=0.00..1065579.24 rows=21352824 width=150)"
" -> Seq Scan on mortality_index_2021_2025 mi_17 (cost=0.00..10.80 rows=80 width=398)"
" -> Seq Scan on mortality_index_2026_2030 mi_18 (cost=0.00..1.02 rows=2 width=398)"
" -> Seq Scan on covid_patnt_registry cr (cost=0.00..178937.94 rows=4726394 width=29)"
EDIT: As requested, here is EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) for the SELECT:
"Subquery Scan on x (cost=28188731.46..28188731.66 rows=1 width=49) (actual time=38.348..38.348 rows=0 loops=1)"
" Output: x.cust_last_nm, x.cust_frst_nm, x.cigna_dob, x.cigna_zip, x.datavant_dod, x.datavant_dob, x.death_verification, x.data_source, x.indiv_entpr_id"
" Filter: (x.row_number = 1)"
" Buffers: shared hit=141 read=3"
" -> WindowAgg (cost=28188731.46..28188731.58 rows=6 width=57) (actual time=38.346..38.346 rows=0 loops=1)"
" Output: cr.patnt_last_nm, cr.patnt_frst_nm, crd.cust_brth_dt, cr.patnt_postl_cd, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source, crd.indiv_entpr_id, row_number() OVER (?)"
" Buffers: shared hit=141 read=3"
" -> Sort (cost=28188731.46..28188731.48 rows=6 width=49) (actual time=38.338..38.338 rows=0 loops=1)"
" Output: crd.indiv_entpr_id, cr.patnt_last_nm, cr.patnt_frst_nm, crd.cust_brth_dt, cr.patnt_postl_cd, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
" Sort Key: crd.indiv_entpr_id DESC"
" Sort Method: quicksort Memory: 25kB"
" Buffers: shared hit=141 read=3"
" -> Nested Loop (cost=1018.80..28188731.39 rows=6 width=49) (actual time=38.291..38.291 rows=0 loops=1)"
" Output: crd.indiv_entpr_id, cr.patnt_last_nm, cr.patnt_frst_nm, crd.cust_brth_dt, cr.patnt_postl_cd, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
" Join Filter: ((crd.indiv_entpr_id)::text = (cr.indiv_entpr_id)::text)"
" Buffers: shared hit=138 read=3"
" -> Gather (cost=1018.80..27906096.67 rows=1 width=29) (actual time=38.290..39.672 rows=0 loops=1)"
" Output: crd.cust_brth_dt, crd.indiv_entpr_id, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=138 read=3"
" -> Hash Join (cost=18.80..27905096.57 rows=1 width=29) (actual time=9.141..9.143 rows=0 loops=3)"
" Output: crd.cust_brth_dt, crd.indiv_entpr_id, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
" Hash Cond: (((mi_13.token_1)::text = (crd.token_1)::text) AND ((mi_13.token_2)::text = (crd.token_2)::text) AND ((mi_13.token_4)::text = (crd.token_4)::text))"
" Buffers: shared hit=138 read=3"
" Worker 0: actual time=9.014..9.017 rows=0 loops=1"
" Buffers: shared hit=69 read=1"
" Worker 1: actual time=11.521..11.523 rows=0 loops=1"
" Buffers: shared hit=69 read=1"
" -> Parallel Append (cost=0.00..11089723.14 rows=91512134 width=148) (actual time=8.920..8.920 rows=1 loops=3)"
" Buffers: shared read=3"
" Worker 0: actual time=8.689..8.689 rows=1 loops=1"
" Buffers: shared read=1"
" Worker 1: actual time=11.242..11.242 rows=1 loops=1"
" Buffers: shared read=1"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_2001_2005 mi_13 (cost=0.00..1071803.02 rows=9009302 width=148) (actual time=11.240..11.240 rows=1 loops=1)"
" Output: mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source, mi_13.token_1, mi_13.token_2, mi_13.token_4"
" Buffers: shared read=1"
" Worker 1: actual time=11.240..11.240 rows=1 loops=1"
" Buffers: shared read=1"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1996_2000 mi_12 (cost=0.00..1025583.10 rows=8684310 width=146) (actual time=8.687..8.687 rows=1 loops=1)"
" Output: mi_12.dod, mi_12.dob, mi_12.death_verification, mi_12.data_source, mi_12.token_1, mi_12.token_2, mi_12.token_4"
" Buffers: shared read=1"
" Worker 0: actual time=8.687..8.687 rows=1 loops=1"
" Buffers: shared read=1"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1991_1995 mi_11 (cost=0.00..946087.22 rows=8131622 width=145) (never executed)"
" Output: mi_11.dod, mi_11.dob, mi_11.death_verification, mi_11.data_source, mi_11.token_1, mi_11.token_2, mi_11.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_2016_2020 mi_16 (cost=0.00..941021.10 rows=8897010 width=150) (never executed)"
" Output: mi_16.dod, mi_16.dob, mi_16.death_verification, mi_16.data_source, mi_16.token_1, mi_16.token_2, mi_16.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_2011_2015 mi_15 (cost=0.00..912659.89 rows=8274389 width=149) (never executed)"
" Output: mi_15.dod, mi_15.dob, mi_15.death_verification, mi_15.data_source, mi_15.token_1, mi_15.token_2, mi_15.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1986_1990 mi_10 (cost=0.00..868442.25 rows=7467025 width=145) (never executed)"
" Output: mi_10.dod, mi_10.dob, mi_10.death_verification, mi_10.data_source, mi_10.token_1, mi_10.token_2, mi_10.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1981_1985 mi_9 (cost=0.00..855944.89 rows=7200489 width=149) (never executed)"
" Output: mi_9.dod, mi_9.dob, mi_9.death_verification, mi_9.data_source, mi_9.token_1, mi_9.token_2, mi_9.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1976_1980 mi_8 (cost=0.00..834044.82 rows=7014082 width=149) (never executed)"
" Output: mi_8.dod, mi_8.dob, mi_8.death_verification, mi_8.data_source, mi_8.token_1, mi_8.token_2, mi_8.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_2006_2010 mi_14 (cost=0.00..826553.35 rows=7065335 width=149) (never executed)"
" Output: mi_14.dod, mi_14.dob, mi_14.death_verification, mi_14.data_source, mi_14.token_1, mi_14.token_2, mi_14.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1971_1975 mi_7 (cost=0.00..772189.73 rows=6492773 width=148) (never executed)"
" Output: mi_7.dod, mi_7.dob, mi_7.death_verification, mi_7.data_source, mi_7.token_1, mi_7.token_2, mi_7.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1966_1970 mi_6 (cost=0.00..645641.55 rows=5434455 width=148) (never executed)"
" Output: mi_6.dod, mi_6.dob, mi_6.death_verification, mi_6.data_source, mi_6.token_1, mi_6.token_2, mi_6.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1961_1965 mi_5 (cost=0.00..372921.92 rows=3139692 width=148) (never executed)"
" Output: mi_5.dod, mi_5.dob, mi_5.death_verification, mi_5.data_source, mi_5.token_1, mi_5.token_2, mi_5.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1956_1960 mi_4 (cost=0.00..204172.40 rows=1717040 width=149) (never executed)"
" Output: mi_4.dod, mi_4.dob, mi_4.death_verification, mi_4.data_source, mi_4.token_1, mi_4.token_2, mi_4.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1951_1955 mi_3 (cost=0.00..152593.93 rows=1282493 width=149) (never executed)"
" Output: mi_3.dod, mi_3.dob, mi_3.death_verification, mi_3.data_source, mi_3.token_1, mi_3.token_2, mi_3.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1946_1950 mi_2 (cost=0.00..99094.05 rows=832705 width=149) (never executed)"
" Output: mi_2.dod, mi_2.dob, mi_2.death_verification, mi_2.data_source, mi_2.token_1, mi_2.token_2, mi_2.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1941_1945 mi_1 (cost=0.00..80018.56 rows=672956 width=149) (never executed)"
" Output: mi_1.dod, mi_1.dob, mi_1.death_verification, mi_1.data_source, mi_1.token_1, mi_1.token_2, mi_1.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_1940 mi (cost=0.00..23379.22 rows=196422 width=149) (never executed)"
" Output: mi.dod, mi.dob, mi.death_verification, mi.data_source, mi.token_1, mi.token_2, mi.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_2021_2025 mi_17 (cost=0.00..10.47 rows=47 width=398) (never executed)"
" Output: mi_17.dod, mi_17.dob, mi_17.death_verification, mi_17.data_source, mi_17.token_1, mi_17.token_2, mi_17.token_4"
" -> Parallel Seq Scan on datavant_stg_o.mortality_index_2026_2030 mi_18 (cost=0.00..1.01 rows=1 width=398) (actual time=6.825..6.825 rows=1 loops=1)"
" Output: mi_18.dod, mi_18.dob, mi_18.death_verification, mi_18.data_source, mi_18.token_1, mi_18.token_2, mi_18.token_4"
" Buffers: shared read=1"
" -> Hash (cost=13.20..13.20 rows=320 width=145) (actual time=0.010..0.011 rows=0 loops=3)"
" Output: crd.cust_brth_dt, crd.indiv_entpr_id, crd.token_1, crd.token_2, crd.token_4"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Worker 0: actual time=0.015..0.015 rows=0 loops=1"
" Worker 1: actual time=0.013..0.013 rows=0 loops=1"
" -> Seq Scan on datavant_o.covid_patnt_registry_deid crd (cost=0.00..13.20 rows=320 width=145) (actual time=0.010..0.010 rows=0 loops=3)"
" Output: crd.cust_brth_dt, crd.indiv_entpr_id, crd.token_1, crd.token_2, crd.token_4"
" Worker 0: actual time=0.015..0.015 rows=0 loops=1"
" Worker 1: actual time=0.012..0.012 rows=0 loops=1"
" -> Seq Scan on datavant_o.covid_patnt_registry cr (cost=0.00..213480.43 rows=5532343 width=29) (never executed)"
" Output: cr.covid_patnt_regstry_sv_key, cr.cret_ts, cr.indiv_entpr_id, cr.patnt_frst_nm, cr.patnt_last_nm, cr.patnt_brth_dt, cr.patnt_gendr_cd, cr.patnt_st_cd, cr.patnt_postl_cd, cr.patnt_dth_dt, cr.covid_idfd_frm_clm_ind, cr.covid_idfd_frm_lab_ind, cr.frst_diag_dt, cr.hosp_ind, cr.frst_covid_admsn_clm_event_key, cr.frst_covid_admsn_refined_clm_event_key, cr.frst_covid_icu_admsn_refined_clm_event_key, cr.frst_fllwup_clm_ln_key, cr.frst_fllwup_clm_svc_beg_dt, cr.subscrbr_indiv_entpr_id, cr.pre_covid_clncl_case_key, cr.post_covid_clncl_case_key, cr.prim_covid_diag_cd, cr.prim_covid_diag_dt, cr.sec_covid_diag_cd, cr.sec_covid_diag_dt, cr.frst_vacnn_dt, cr.sec_vacnn_dt, cr.vacn_manfctrer_nm, cr.load_ctl_key, cr.ingest_timestamp, cr.incr_ingest_timestamp"
"Planning Time: 4.229 ms"
"Execution Time: 40.007 ms"
DDL for the mortality table:
CREATE UNLOGGED TABLE datavant_stg_o.mortality_index
(
data_source character varying(25) COLLATE pg_catalog."default",
op_directive character varying(25) COLLATE pg_catalog."default",
dd_imp_flag integer,
dod date,
dob date,
death_verification integer,
gender_probability double precision,
gender character varying(25) COLLATE pg_catalog."default",
token_1 character varying(44) COLLATE pg_catalog."default",
token_2 character varying(44) COLLATE pg_catalog."default",
token_4 character varying(44) COLLATE pg_catalog."default",
token_5 character varying(44) COLLATE pg_catalog."default",
token_7 character varying(44) COLLATE pg_catalog."default",
token_16 character varying(44) COLLATE pg_catalog."default",
token_key character varying(44) COLLATE pg_catalog."default"
) PARTITION BY RANGE (dod);
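For context, a sketch of how the yearly range partitions that appear in the plan would be attached to this parent (the boundary dates are assumptions inferred from the partition names):
CREATE UNLOGGED TABLE datavant_stg_o.mortality_index_1941_1945
    PARTITION OF datavant_stg_o.mortality_index
    FOR VALUES FROM ('1941-01-01') TO ('1946-01-01');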

Related

How to improve efficiency of query by removing sub-query producing field?

I have the following tables and sample data:
https://www.db-fiddle.com/f/hxQ7BGdgDJtQcv5xTChY9u/0
I would like to improve the performance of the query contained in the db-fiddle, ideally by removing the sub-query that produces the success field (the query was taken from ChatGPT output, but ChatGPT was unable to remove this sub-query without destroying the results). How can I do this?
My question to ChatGPT was this:
using the tables in <db-fiddle link>, write a select sql query to
return all cfg_commissioning_tags columns and
dat_commissioning_test_log.success. These tables are joined by
cfg_commissioning_tags.id = dat_commissioning_test_log.tag_id. If the
tag_source is 'plc' and any rows have success = true, return true for
the success field for all matching type_id and relative_tag_path rows.
To the result ChatGPT produced, I added the AND ct_2.device_name != ct_1.device_name condition to the sub-query, which is also required.
The current query, table creation queries, and the current query results are all copied below for posterity:
SELECT
ct_1.device_parent_path
,ct_1.device_name
,ct_1.relative_tag_path
,ct_1.tag_source
,ct_1.type_id
,CASE
WHEN EXISTS (
SELECT 1
FROM dat_commissioning_test_log ctl_2
JOIN cfg_commissioning_tags ct_2 ON ct_2.id = ctl_2.tag_id
WHERE
ct_2.type_id = ct_1.type_id
AND ct_2.relative_tag_path = ct_1.relative_tag_path
AND ct_2.device_name != ct_1.device_name -- without this, it runs super fast, but I need this
AND ctl_2.success = TRUE
AND ct_2.tag_source = 'plc'
) THEN 'true*'
ELSE CASE ctl_1.success WHEN true THEN 'true' ELSE 'false' END
END AS success
FROM cfg_commissioning_tags ct_1
LEFT JOIN dat_commissioning_test_log ctl_1 ON ct_1.id = ctl_1.tag_id
ORDER BY type_id, relative_tag_path
CREATE TABLE dat_commissioning_test_log
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
tag_id integer,
-- tested_on timestamp with time zone,
success boolean,
-- username character varying(50) COLLATE pg_catalog."default",
-- note character varying(300) COLLATE pg_catalog."default",
CONSTRAINT dat_commissioning_test_log_pkey PRIMARY KEY (id)
);
CREATE TABLE cfg_commissioning_tags
(
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
-- full_path character varying(400) COLLATE pg_catalog."default",
device_name character varying(50) COLLATE pg_catalog."default",
device_parent_path character varying(400) COLLATE pg_catalog."default",
-- added_on timestamp with time zone,
relative_tag_path character varying(100) COLLATE pg_catalog."default",
-- retired_on timestamp with time zone,
tag_source character varying(10) COLLATE pg_catalog."default",
type_id character varying(400) COLLATE pg_catalog."default",
CONSTRAINT cfg_commissioning_tags_pkey PRIMARY KEY (id)
);
INSERT INTO cfg_commissioning_tags (id, device_name, device_parent_path, relative_tag_path, tag_source, type_id) VALUES
(1, 'PC13A','DUMMY','Run Mins','plc','DOL'),
(2, 'PC12A','DUMMY','Run Mins','plc','DOL'),
(3, 'PC11A','DUMMY','Run Mins','plc','DOL'),
(4, 'PC11A','DUMMY','Status','io','DOL'),
(5, 'PC11A','DUMMY','Alarms/Isolator Tripped','io','DOL'),
(6, 'PC12A','DUMMY','Status','io','DOL');
INSERT INTO dat_commissioning_test_log (tag_id, success) VALUES
(1, true),
(6, true);
These are the results of the query:
device_parent_path | device_name | relative_tag_path       | tag_source | type_id | success
DUMMY              | PC11A       | Alarms/Isolator Tripped | io         | DOL     | FALSE
DUMMY              | PC13A       | Run Mins                | plc        | DOL     | TRUE
DUMMY              | PC12A       | Run Mins                | plc        | DOL     | true*
DUMMY              | PC11A       | Run Mins                | plc        | DOL     | true*
DUMMY              | PC12A       | Status                  | io         | DOL     | TRUE
DUMMY              | PC11A       | Status                  | io         | DOL     | FALSE
Edit:
Here is the EXPLAIN (ANALYZE, VERBOSE, BUFFERS) result:
"Sort (cost=4368188.41..4368208.24 rows=7932 width=188) (actual time=10378.617..10378.916 rows=8108 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id, (CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE ctl_1.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END)"
" Sort Key: ct_1.type_id, ct_1.relative_tag_path"
" Sort Method: quicksort Memory: 2350kB"
" Buffers: shared hit=2895186"
" -> Hash Left Join (cost=60.69..4367674.67 rows=7932 width=188) (actual time=1.991..10357.671 rows=8108 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id, CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE ctl_1.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END"
" Hash Cond: (ct_1.id = ctl_1.tag_id)"
" Buffers: shared hit=2895186"
" -> Seq Scan on public.cfg_commissioning_tags ct_1 (cost=0.00..426.32 rows=7932 width=156) (actual time=0.013..1.313 rows=7932 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id"
" Buffers: shared hit=347"
" -> Hash (cost=40.86..40.86 rows=1586 width=5) (actual time=0.326..0.326 rows=1593 loops=1)"
" Output: ctl_1.success, ctl_1.tag_id"
" Buckets: 2048 Batches: 1 Memory Usage: 79kB"
" Buffers: shared hit=25"
" -> Seq Scan on public.dat_commissioning_test_log ctl_1 (cost=0.00..40.86 rows=1586 width=5) (actual time=0.012..0.171 rows=1593 loops=1)"
" Output: ctl_1.success, ctl_1.tag_id"
" Buffers: shared hit=25"
" SubPlan 1"
" -> Hash Join (cost=505.71..550.57 rows=1 width=0) (actual time=1.267..1.267 rows=0 loops=8108)"
" Inner Unique: true"
" Hash Cond: (ctl_2.tag_id = ct_2.id)"
" Buffers: shared hit=2894814"
" -> Seq Scan on public.dat_commissioning_test_log ctl_2 (cost=0.00..40.86 rows=1521 width=4) (actual time=0.003..0.112 rows=1300 loops=3800)"
" Output: ctl_2.id, ctl_2.tag_id, ctl_2.tested_on, ctl_2.success, ctl_2.username, ctl_2.note"
" Filter: ctl_2.success"
" Rows Removed by Filter: 56"
" Buffers: shared hit=81338"
" -> Hash (cost=505.64..505.64 rows=6 width=4) (actual time=1.183..1.183 rows=98 loops=8108)"
" Output: ct_2.id"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Buffers: shared hit=2813476"
" -> Seq Scan on public.cfg_commissioning_tags ct_2 (cost=0.00..505.64 rows=6 width=4) (actual time=0.620..1.169 rows=98 loops=8108)"
" Output: ct_2.id"
" Filter: (((ct_2.device_name)::text <> (ct_1.device_name)::text) AND ((ct_2.type_id)::text = (ct_1.type_id)::text) AND ((ct_2.relative_tag_path)::text = (ct_1.relative_tag_path)::text) AND ((ct_2.tag_source)::text = 'plc'::text))"
" Rows Removed by Filter: 7834"
" Buffers: shared hit=2813476"
"Planning Time: 0.382 ms"
"Execution Time: 10379.346 ms"
Edit 2
EXPLAIN after adding compound indexes:
"Sort (cost=540847.20..540867.03 rows=7932 width=198) (actual time=1142.282..1142.843 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, (CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END)"
" Sort Key: ct_1.full_path"
" Sort Method: quicksort Memory: 2290kB"
" Buffers: shared hit=778254"
" -> Hash Left Join (cost=149.19..540333.47 rows=7932 width=198) (actual time=1.775..1108.469 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END"
" Hash Cond: (ct_1.id = dat_commissioning_test_log.tag_id)"
" Buffers: shared hit=778254"
" -> Seq Scan on public.cfg_commissioning_tags ct_1 (cost=0.00..426.32 rows=7932 width=140) (actual time=0.011..0.837 rows=7932 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id"
" Buffers: shared hit=347"
" -> Hash (cost=139.24..139.24 rows=796 width=35) (actual time=1.404..1.404 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 83kB"
" Buffers: shared hit=50"
" -> Hash Join (cost=85.28..139.24 rows=796 width=35) (actual time=0.938..1.249 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Inner Unique: true"
" Hash Cond: (dat_commissioning_test_log.id = "ANY_subquery".max)"
" Buffers: shared hit=50"
" -> Seq Scan on public.dat_commissioning_test_log (cost=0.00..40.93 rows=1593 width=39) (actual time=0.009..0.089 rows=1593 loops=1)"
" Output: dat_commissioning_test_log.id, dat_commissioning_test_log.tag_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.success, dat_commissioning_test_log.username, dat_commissioning_test_log.note"
" Buffers: shared hit=25"
" -> Hash (cost=82.78..82.78 rows=200 width=4) (actual time=0.926..0.926 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 66kB"
" Buffers: shared hit=25"
" -> HashAggregate (cost=80.78..82.78 rows=200 width=4) (actual time=0.710..0.804 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Group Key: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> Subquery Scan on "ANY_subquery" (cost=48.90..77.23 rows=1417 width=4) (actual time=0.297..0.475 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> HashAggregate (cost=48.90..63.07 rows=1417 width=8) (actual time=0.297..0.413 rows=1417 loops=1)"
" Output: max(dat_commissioning_test_log_1.id), dat_commissioning_test_log_1.tag_id"
" Group Key: dat_commissioning_test_log_1.tag_id"
" Buffers: shared hit=25"
" -> Seq Scan on public.dat_commissioning_test_log dat_commissioning_test_log_1 (cost=0.00..40.93 rows=1593 width=8) (actual time=0.006..0.090 rows=1593 loops=1)"
" Output: dat_commissioning_test_log_1.id, dat_commissioning_test_log_1.tag_id, dat_commissioning_test_log_1.tested_on, dat_commissioning_test_log_1.success, dat_commissioning_test_log_1.username, dat_commissioning_test_log_1.note"
" Buffers: shared hit=25"
" SubPlan 1"
" -> Hash Join (cost=23.10..68.04 rows=1 width=0) (actual time=0.133..0.133 rows=0 loops=7932)"
" Inner Unique: true"
" Hash Cond: (ctl_2.tag_id = ct_2.id)"
" Buffers: shared hit=777857"
" -> Seq Scan on public.dat_commissioning_test_log ctl_2 (cost=0.00..40.93 rows=1528 width=4) (actual time=0.002..0.098 rows=1301 loops=3796)"
" Output: ctl_2.id, ctl_2.tag_id, ctl_2.tested_on, ctl_2.success, ctl_2.username, ctl_2.note"
" Filter: ctl_2.success"
" Rows Removed by Filter: 56"
" Buffers: shared hit=81286"
" -> Hash (cost=23.02..23.02 rows=6 width=4) (actual time=0.057..0.057 rows=100 loops=7932)"
" Output: ct_2.id"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Buffers: shared hit=696571"
" -> Index Scan using cfg_commissioning_tags_idx on public.cfg_commissioning_tags ct_2 (cost=0.41..23.02 rows=6 width=4) (actual time=0.016..0.049 rows=100 loops=7932)"
" Output: ct_2.id"
" Index Cond: (((ct_2.type_id)::text = (ct_1.type_id)::text) AND ((ct_2.relative_tag_path)::text = (ct_1.relative_tag_path)::text) AND ((ct_2.tag_source)::text = 'plc'::text))"
" Filter: ((ct_2.device_name)::text <> (ct_1.device_name)::text)"
" Rows Removed by Filter: 1"
" Buffers: shared hit=696571"
"Planning Time: 0.550 ms"
"Execution Time: 1143.359 ms"
EDIT 3
EXPLAIN after replacing the index on cfg_commissioning_tags with a covering index:
"Sort (cost=540847.20..540867.03 rows=7932 width=198) (actual time=1152.113..1152.682 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, (CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END)"
" Sort Key: ct_1.full_path"
" Sort Method: quicksort Memory: 2290kB"
" Buffers: shared hit=784891"
" -> Hash Left Join (cost=149.19..540333.47 rows=7932 width=198) (actual time=2.016..1115.111 rows=7932 loops=1)"
" Output: ct_1.full_path, ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path, ct_1.tag_source, ct_1.type_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, CASE WHEN (SubPlan 1) THEN 'true*'::text ELSE CASE dat_commissioning_test_log.success WHEN CASE_TEST_EXPR THEN 'true'::text ELSE 'false'::text END END"
" Hash Cond: (ct_1.id = dat_commissioning_test_log.tag_id)"
" Buffers: shared hit=784891"
" -> Seq Scan on public.cfg_commissioning_tags ct_1 (cost=0.00..426.32 rows=7932 width=140) (actual time=0.014..0.755 rows=7932 loops=1)"
" Output: ct_1.id, ct_1.full_path, ct_1.device_name, ct_1.device_parent_path, ct_1.added_on, ct_1.relative_tag_path, ct_1.retired_on, ct_1.tag_source, ct_1.type_id"
" Buffers: shared hit=347"
" -> Hash (cost=139.24..139.24 rows=796 width=35) (actual time=1.613..1.613 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 83kB"
" Buffers: shared hit=50"
" -> Hash Join (cost=85.28..139.24 rows=796 width=35) (actual time=1.117..1.449 rows=1417 loops=1)"
" Output: dat_commissioning_test_log.tested_on, dat_commissioning_test_log.note, dat_commissioning_test_log.success, dat_commissioning_test_log.tag_id"
" Inner Unique: true"
" Hash Cond: (dat_commissioning_test_log.id = "ANY_subquery".max)"
" Buffers: shared hit=50"
" -> Seq Scan on public.dat_commissioning_test_log (cost=0.00..40.93 rows=1593 width=39) (actual time=0.010..0.100 rows=1593 loops=1)"
" Output: dat_commissioning_test_log.id, dat_commissioning_test_log.tag_id, dat_commissioning_test_log.tested_on, dat_commissioning_test_log.success, dat_commissioning_test_log.username, dat_commissioning_test_log.note"
" Buffers: shared hit=25"
" -> Hash (cost=82.78..82.78 rows=200 width=4) (actual time=1.103..1.103 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 66kB"
" Buffers: shared hit=25"
" -> HashAggregate (cost=80.78..82.78 rows=200 width=4) (actual time=0.798..0.940 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Group Key: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> Subquery Scan on "ANY_subquery" (cost=48.90..77.23 rows=1417 width=4) (actual time=0.332..0.549 rows=1417 loops=1)"
" Output: "ANY_subquery".max"
" Buffers: shared hit=25"
" -> HashAggregate (cost=48.90..63.07 rows=1417 width=8) (actual time=0.331..0.482 rows=1417 loops=1)"
" Output: max(dat_commissioning_test_log_1.id), dat_commissioning_test_log_1.tag_id"
" Group Key: dat_commissioning_test_log_1.tag_id"
" Buffers: shared hit=25"
" -> Seq Scan on public.dat_commissioning_test_log dat_commissioning_test_log_1 (cost=0.00..40.93 rows=1593 width=8) (actual time=0.006..0.095 rows=1593 loops=1)"
" Output: dat_commissioning_test_log_1.id, dat_commissioning_test_log_1.tag_id, dat_commissioning_test_log_1.tested_on, dat_commissioning_test_log_1.success, dat_commissioning_test_log_1.username, dat_commissioning_test_log_1.note"
" Buffers: shared hit=25"
" SubPlan 1"
" -> Hash Join (cost=23.10..68.04 rows=1 width=0) (actual time=0.134..0.134 rows=0 loops=7932)"
" Inner Unique: true"
" Hash Cond: (ctl_2.tag_id = ct_2.id)"
" Buffers: shared hit=784494"
" -> Seq Scan on public.dat_commissioning_test_log ctl_2 (cost=0.00..40.93 rows=1528 width=4) (actual time=0.002..0.098 rows=1301 loops=3796)"
" Output: ctl_2.id, ctl_2.tag_id, ctl_2.tested_on, ctl_2.success, ctl_2.username, ctl_2.note"
" Filter: ctl_2.success"
" Rows Removed by Filter: 56"
" Buffers: shared hit=81286"
" -> Hash (cost=23.02..23.02 rows=6 width=4) (actual time=0.057..0.057 rows=100 loops=7932)"
" Output: ct_2.id"
" Buckets: 1024 Batches: 1 Memory Usage: 8kB"
" Buffers: shared hit=703208"
" -> Index Scan using cfg_commissioning_tags_idx on public.cfg_commissioning_tags ct_2 (cost=0.41..23.02 rows=6 width=4) (actual time=0.015..0.049 rows=100 loops=7932)"
" Output: ct_2.id"
" Index Cond: (((ct_2.type_id)::text = (ct_1.type_id)::text) AND ((ct_2.relative_tag_path)::text = (ct_1.relative_tag_path)::text) AND ((ct_2.tag_source)::text = 'plc'::text))"
" Filter: ((ct_2.device_name)::text <> (ct_1.device_name)::text)"
" Rows Removed by Filter: 1"
" Buffers: shared hit=703208"
"Planning Time: 0.514 ms"
"Execution Time: 1153.156 ms"
You need these compound indexes:
CREATE INDEX cfg_commissioning_tags_idx ON cfg_commissioning_tags (type_id, relative_tag_path, device_name, tag_source);
CREATE INDEX dat_commissioning_test_log_idx ON dat_commissioning_test_log (tag_id, success);
You can extend the first index so that, in addition to the columns used in the WHERE clause and the join, it also carries the columns used in the SELECT (in this case we can add device_parent_path). This type of index is called a covering index.
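A sketch of that covering variant (INCLUDE requires PostgreSQL 11 or later; the index name and the exact INCLUDE list are illustrative):
CREATE INDEX cfg_commissioning_tags_cover_idx
    ON cfg_commissioning_tags (type_id, relative_tag_path, tag_source)
    INCLUDE (device_name, id, device_parent_path);
device_name is carried in INCLUDE rather than in the key because the sub-query compares it with !=, which cannot be used as an index search condition anyway.
Separately, since the question asked about removing the correlated sub-query entirely, here is a hedged sketch of a rewrite using one pass of grouping plus a window aggregate. It assumes, as in the original EXISTS, that 'true*' depends only on whether some other device in the same (type_id, relative_tag_path) group has a successful 'plc' test:
WITH per_device AS (
    -- one row per device within each (type_id, relative_tag_path) group,
    -- flagging whether that device has a successful 'plc' test
    SELECT ct.type_id, ct.relative_tag_path, ct.device_name,
           bool_or(ctl.success AND ct.tag_source = 'plc') AS dev_ok
    FROM cfg_commissioning_tags ct
    LEFT JOIN dat_commissioning_test_log ctl ON ctl.tag_id = ct.id
    GROUP BY 1, 2, 3
), grp AS (
    SELECT per_device.*,
           sum(CASE WHEN dev_ok THEN 1 ELSE 0 END)
               OVER (PARTITION BY type_id, relative_tag_path) AS ok_devices
    FROM per_device
)
SELECT ct_1.device_parent_path, ct_1.device_name, ct_1.relative_tag_path,
       ct_1.tag_source, ct_1.type_id,
       CASE
           -- at least one *other* device in the group succeeded
           WHEN g.ok_devices - (CASE WHEN g.dev_ok THEN 1 ELSE 0 END) > 0
               THEN 'true*'
           WHEN ctl_1.success THEN 'true'
           ELSE 'false'
       END AS success
FROM cfg_commissioning_tags ct_1
LEFT JOIN dat_commissioning_test_log ctl_1 ON ctl_1.tag_id = ct_1.id
JOIN grp g ON g.type_id = ct_1.type_id
          AND g.relative_tag_path = ct_1.relative_tag_path
          AND g.device_name = ct_1.device_name
ORDER BY ct_1.type_id, ct_1.relative_tag_path;
This scans each table once instead of re-running the sub-query per output row; it reproduces the sample results above, but verify it against real data before adopting it.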

How to speed up the query in GCP PostgreSQL

My Postgres server details are below:
PostgreSQL 13, 52 GB RAM, 1000 GB SSD, and the DB size is 300 GB.
Here is my query:
select distinct "col2","col3","col1"
from table1(foreign table)
where "col2" not in (select "col4"
from table2(foreign table)
where "col9" = 'data1'
and "col10"='A')
and "col2" not in (select "col13"
from table5(foreign table)
where "col11" = 'A'
and "col12" in ('data1', 'data2', 'data3', 'data4'))
and "col6" > '2022-01-01' and "col10" = 'A' and "col18" = 'P'
and not "col7" = 'V' and "Type" = 'A'
order by "col1"
Here is my EXPLAIN plan:
"Unique (cost=372.13..372.14 rows=1 width=1074) (actual time=145329.010..145329.136 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Buffers: shared hit=3"
" -> Sort (cost=372.13..372.14 rows=1 width=1074) (actual time=145329.008..145329.027 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Sort Key: table1.""col1"", table1.""col2"", table1.""col3"""
" Sort Method: quicksort Memory: 63kB"
" Buffers: shared hit=3"
" -> Foreign Scan on public.table1 (cost=360.38..372.12 rows=1 width=1074) (actual time=144430.980..145327.532 rows=336 loops=1)"
" Output: table1.""col2"", table1.""col3"", table1.""col1"""
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))"
" Rows Removed by Filter: 253144"
" Remote SQL: SELECT ""col2"", ""col3"", ""col1"" FROM dbo.table4 WHERE ((""col6"" > '2022-01-01 00:00:00'::timestamp without time zone)) AND ((""col7"" <> 'V'::text)) AND ((""col8"" = 'A'::text))"
" SubPlan 1"
" -> Foreign Scan on public.table2 (cost=100.00..128.63 rows=1 width=42) (actual time=2.169..104702.862 rows=50573365 loops=1)"
" Output: table2.""col4"""
" Remote SQL: SELECT ""col5"" FROM dbo.table3 WHERE ((""col9"" = 'data1'::text)) AND ((""col10"" = 'A'::text))"
" SubPlan 2"
" -> Foreign Scan on public.table5 (cost=100.00..131.74 rows=1 width=42) (actual time=75.363..1015.498 rows=360240 loops=1)"
" Output: table5.""col13"""
" Remote SQL: SELECT ""col14"" FROM dbo.table6 WHERE ((""col11"" = 'A'::text)) AND ((""col12"" = ANY ('{data1,data2,data3,data4}'::text[])))"
"Planning:"
" Buffers: shared hit=142"
"Planning Time: 1.887 ms"
"Execution Time: 145620.958 ms"
table1 - 4 million rows
table2 - 250 million rows
table3 - 400 million rows
Table Definition table1
CREATE TABLE IF NOT EXISTS table1
(
"col1" character varying(12) ,
"col" character varying(1) ,
"col" character varying(1) ,
...
...
);
Indexes exist on other columns, not on the query columns "col2", "col3", "col1".
Table Definition table2
CREATE TABLE IF NOT EXISTS table2
(
"col4" character varying(12) ,
"col9" character varying(1) ,
"col10" character varying(1) ,
...
...
);
Indexes that exist on table2:
CREATE INDEX index1 ON table2("col4" ASC,"col9" ASC,"col" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col" ASC,"col9" ASC,"col4" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col9" ASC,"col4" ASC,"col" ASC,"col10" ASC);
CREATE INDEX index1 ON table2("col" ASC,"col9" ASC,"col10" ASC,"col" ASC);
Table Definition table5
CREATE TABLE IF NOT EXISTS table5
(
"col11" character varying(12) ,
"col13" character varying(1) ,
"col" character varying(1) ,
...
...
);
Indexes that exist on table5:
CREATE INDEX index ON table5("col" ASC, "col" ASC,"col11" ASC);
CREATE INDEX index ON table5("col13" ASC,"col11" ASC);
CREATE INDEX index ON table5("col" ASC,"col13" ASC,"col11" ASC)INCLUDE ("col");
CREATE INDEX index ON table5("col" ASC, "col" ASC,"col11" ASC);
How can I speed up this query's execution? It took 3 minutes just to retrieve 365 records.
Here is my EXPLAIN (ANALYZE, BUFFERS):
"Unique (cost=372.13..372.14 rows=1 width=1074) (actual time=110631.114..110631.262 rows=336 loops=1)"
" -> Sort (cost=372.13..372.14 rows=1 width=1074) (actual time=110631.111..110631.142 rows=336 loops=1)"
" Sort Key: table1.""col1"", table1.""col2"", table1.""col3"""
" Sort Method: quicksort Memory: 63kB"
" -> Foreign Scan on table1 (cost=360.38..372.12 rows=1 width=1074) (actual time=110432.132..110629.640 rows=336 loops=1)"
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))"
" Rows Removed by Filter: 253144"
" SubPlan 1"
" -> Foreign Scan on table2 (cost=100.00..128.63 rows=1 width=42) (actual time=63638.173..71979.772 rows=50573365 loops=1)"
" SubPlan 2"
" -> Foreign Scan on table5 (cost=100.00..131.74 rows=1 width=42) (actual time=569.126..630.782 rows=360240 loops=1)"
"Planning Time: 0.266 ms"
"Execution Time: 111748.715 ms"
Here is the EXPLAIN (ANALYZE, BUFFERS) of the "Remote SQL" when executed on the remote database:
"Limit (cost=4157478.69..4157602.66 rows=1000 width=47) (actual time=68356.908..68681.831 rows=336 loops=1)"
" Buffers: shared hit=66205118"
" -> Unique (cost=4157478.69..4164948.04 rows=60253 width=47) (actual time=68356.905..68681.801 rows=336 loops=1)"
" Buffers: shared hit=66205118"
" -> Gather Merge (cost=4157478.69..4164496.14 rows=60253 width=47) (actual time=68356.901..68681.718 rows=336 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=66205118"
" -> Sort (cost=4156478.66..4156541.43 rows=25105 width=47) (actual time=66154.447..66154.459 rows=112 loops=3)"
" Sort Key: table4.""col1"", table4.""col2"", table4.""col3"""
" Sort Method: quicksort Memory: 63kB"
" Buffers: shared hit=66205118"
" Worker 0: Sort Method: quicksort Memory: 25kB"
" Worker 1: Sort Method: quicksort Memory: 25kB"
" -> Parallel Seq Scan on table4 (cost=3986703.25..4154644.03 rows=25105 width=47) (actual time=66041.929..66153.663 rows=112 loops=3)"
" Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND (""col6"" > '2022-01-01 00:00:00'::timestamp without time zone) AND ((""col7"")::text <> 'V'::text) AND ((""col8"")::text = 'A'::text))"
" Rows Removed by Filter: 1236606"
" Buffers: shared hit=66205102"
" SubPlan 1"
" -> Index Only Scan using col20 on table3 (cost=0.70..2696555.01 rows=50283867 width=13) (actual time=0.134..25085.583 rows=50573365 loops=3)"
" Index Cond: ((""col9"" = 'data1'::text) AND (""col10"" = 'A'::text))"
" Heap Fetches: 0"
" Buffers: shared hit=65737946"
" SubPlan 2"
" -> Bitmap Heap Scan on table6 (cost=4962.91..1163549.12 rows=355779 width=13) (actual time=160.770..440.978 rows=360240 loops=3)"
" Recheck Cond: (((""col12"")::text = ANY ('{data1,data2,data3,data4}'::text[])) AND ((""col11"")::text = 'A'::text))"
" Heap Blocks: exact=110992"
" Buffers: shared hit=333992"
" -> Bitmap Index Scan on col21 (cost=0.00..4873.97 rows=355779 width=0) (actual time=120.354..120.354 rows=360240 loops=3)"
" Index Cond: (((""col12"")::text = ANY ('{data1,data2,data3,data4}'::text[])) AND ((""col11"")::text = 'A'::text))"
" Buffers: shared hit=1016"
"Planning:"
" Buffers: shared hit=451"
"Planning Time: 4.039 ms"
"Execution Time: 69001.171 ms"

Query optimization in Postgres

My table has over 25 million rows, and I'm trying to learn what needs to be done to speed up the query.
The "Properties" column is integer[], and I created a GIN index on F."Properties". What is slowing my query down, and how do I solve it?
SELECT
    F."Id",
    F."Name",
    F."Url",
    F."CountryModel",
    F."IsVerify",
    count(*) AS Counter
FROM
    public."Firms" F,
    LATERAL unnest(F."Properties") AS P
WHERE
    F."CountryId" = 1
    AND P = ANY (ARRAY[126,128])
    AND "Properties" && ARRAY[126,128]
    AND F."Deleted" = FALSE
GROUP BY
    F."Id"
ORDER BY
    F."IsVerify" DESC,
    Counter DESC,
    F."PageRank" DESC
OFFSET 0 ROWS FETCH FIRST 20 ROWS ONLY
Query EXPLAIN ANALYZE:
"Limit (cost=793826.15..793826.20 rows=20 width=100) (actual time=12255.433..12257.874 rows=20 loops=1)"
" -> Sort (cost=793826.15..794287.87 rows=184689 width=100) (actual time=12255.433..12257.872 rows=20 loops=1)"
" Sort Key: f.""IsVerify"" DESC, (count(*)) DESC, f.""PageRank"" DESC"
" Sort Method: top-N heapsort Memory: 29kB"
" -> GroupAggregate (cost=755368.13..788911.64 rows=184689 width=100) (actual time=12062.457..12224.136 rows=201352 loops=1)"
" Group Key: f.""Id"""
" -> Nested Loop (cost=755368.13..785217.86 rows=369378 width=92) (actual time=12062.450..12176.968 rows=205124 loops=1)"
" -> Gather Merge (cost=755368.12..776878.19 rows=184689 width=120) (actual time=12062.435..12090.924 rows=201352 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=754368.09..754560.48 rows=76954 width=120) (actual time=12050.371..12060.077 rows=67117 loops=3)"
" Sort Key: f.""Id"""
" Sort Method: external merge Disk: 9848kB"
" Worker 0: Sort Method: external merge Disk: 9840kB"
" Worker 1: Sort Method: external merge Disk: 9784kB"
" -> Parallel Bitmap Heap Scan on ""Firms"" f (cost=1731.34..743387.12 rows=76954 width=120) (actual time=44.825..12010.247 rows=67117 loops=3)"
" Recheck Cond: (""Properties"" && '{126,128}'::integer[])"
" Rows Removed by Index Recheck: 356198"
" Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
" Heap Blocks: exact=17368 lossy=47419"
" -> Bitmap Index Scan on ix_properties_gin (cost=0.00..1685.17 rows=184689 width=0) (actual time=47.787..47.787 rows=201354 loops=1)"
" Index Cond: (""Properties"" && '{126,128}'::integer[])"
" -> Memoize (cost=0.01..0.14 rows=2 width=0) (actual time=0.000..0.000 rows=1 loops=201352)"
" Cache Key: f.""Properties"""
" Hits: 179814 Misses: 21538 Evictions: 0 Overflows: 0 Memory Usage: 3076kB"
" -> Function Scan on unnest p (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=1 loops=21538)"
" Filter: (p = ANY ('{126,128}'::integer[]))"
" Rows Removed by Filter: 6"
"Planning Time: 2.143 ms"
"Execution Time: 12259.445 ms"

Postgres 29m rows query performance optimization

I am a newbie to database optimisations. My table has around 29 million rows, and the query takes 13 seconds. What can I do to optimize performance?
The "Properties" column is an int array, and I created a GIN index on F."Properties".
SELECT
    F."Id",
    F."Name",
    F."Url",
    F."CountryModel",
    F."Properties",
    F."PageRank",
    F."IsVerify",
    count(*) AS Counter
FROM
    public."Firms" F,
    LATERAL unnest(F."Properties") AS P
WHERE
    F."CountryId" = 1
    AND P = ANY (ARRAY[126,128])
    AND "Properties" && ARRAY[126,128]
    AND F."Deleted" = FALSE
GROUP BY
    F."Id"
ORDER BY
    F."IsVerify" DESC,
    Counter DESC,
    F."PageRank" DESC
OFFSET 0 ROWS FETCH FIRST 100 ROWS ONLY
That's my query plan (EXPLAIN ANALYZE):
"Limit (cost=801718.65..801718.70 rows=20 width=368) (actual time=12671.277..12674.826 rows=20 loops=1)"
" -> Sort (cost=801718.65..802180.37 rows=184689 width=368) (actual time=12671.276..12674.824 rows=20 loops=1)"
" Sort Key: f.""IsVerify"" DESC, (count(*)) DESC, f.""PageRank"" DESC"
" Sort Method: top-N heapsort Memory: 47kB"
" -> GroupAggregate (cost=763260.63..796804.14 rows=184689 width=368) (actual time=12284.752..12592.010 rows=201352 loops=1)"
" Group Key: f.""Id"""
" -> Nested Loop (cost=763260.63..793110.36 rows=369378 width=360) (actual time=12284.734..12488.106 rows=205124 loops=1)"
" -> Gather Merge (cost=763260.62..784770.69 rows=184689 width=360) (actual time=12284.716..12389.961 rows=201352 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=762260.59..762452.98 rows=76954 width=360) (actual time=12258.175..12309.931 rows=67117 loops=3)"
" Sort Key: f.""Id"""
" Sort Method: external merge Disk: 35432kB"
" Worker 0: Sort Method: external merge Disk: 35536kB"
" Worker 1: Sort Method: external merge Disk: 35416kB"
" -> Parallel Bitmap Heap Scan on ""Firms"" f (cost=1731.34..743387.12 rows=76954 width=360) (actual time=57.500..12167.222 rows=67117 loops=3)"
" Recheck Cond: (""Properties"" && '{126,128}'::integer[])"
" Rows Removed by Index Recheck: 356198"
" Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
" Heap Blocks: exact=17412 lossy=47209"
" -> Bitmap Index Scan on ix_properties_gin (cost=0.00..1685.17 rows=184689 width=0) (actual time=61.628..61.628 rows=201354 loops=1)"
" Index Cond: (""Properties"" && '{126,128}'::integer[])"
" -> Memoize (cost=0.01..0.14 rows=2 width=0) (actual time=0.000..0.000 rows=1 loops=201352)"
" Cache Key: f.""Properties"""
" Hits: 179814 Misses: 21538 Evictions: 0 Overflows: 0 Memory Usage: 3076kB"
" -> Function Scan on unnest p (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=1 loops=21538)"
" Filter: (p = ANY ('{126,128}'::integer[]))"
" Rows Removed by Filter: 6"
"Planning Time: 2.542 ms"
"Execution Time: 12675.382 ms"
That's the EXPLAIN (ANALYZE, BUFFERS) result:
"Limit (cost=793826.15..793826.20 rows=20 width=100) (actual time=12879.468..12882.414 rows=20 loops=1)"
" Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
" -> Sort (cost=793826.15..794287.87 rows=184689 width=100) (actual time=12879.468..12882.412 rows=20 loops=1)"
" Sort Key: f.""IsVerify"" DESC, (count(*)) DESC, f.""PageRank"" DESC"
" Sort Method: top-N heapsort Memory: 29kB"
" Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
" -> GroupAggregate (cost=755368.13..788911.64 rows=184689 width=100) (actual time=12623.980..12845.122 rows=201352 loops=1)"
" Group Key: f.""Id"""
" Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
" -> Nested Loop (cost=755368.13..785217.86 rows=369378 width=92) (actual time=12623.971..12785.946 rows=205124 loops=1)"
" Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
" -> Gather Merge (cost=755368.12..776878.19 rows=184689 width=120) (actual time=12623.945..12680.899 rows=201352 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
" -> Sort (cost=754368.09..754560.48 rows=76954 width=120) (actual time=12613.425..12624.658 rows=67117 loops=3)"
" Sort Key: f.""Id"""
" Sort Method: external merge Disk: 9848kB"
" Buffers: shared hit=108 read=194121 written=1, temp read=3685 written=3697"
" Worker 0: Sort Method: external merge Disk: 9824kB"
" Worker 1: Sort Method: external merge Disk: 9808kB"
" -> Parallel Bitmap Heap Scan on ""Firms"" f (cost=1731.34..743387.12 rows=76954 width=120) (actual time=42.098..12567.883 rows=67117 loops=3)"
" Recheck Cond: (""Properties"" && '{126,128}'::integer[])"
" Rows Removed by Index Recheck: 356198"
" Filter: ((NOT ""Deleted"") AND (""CountryId"" = 1))"
" Heap Blocks: exact=17323 lossy=47429"
" Buffers: shared hit=97 read=194118 written=1"
" -> Bitmap Index Scan on ix_properties_gin (cost=0.00..1685.17 rows=184689 width=0) (actual time=41.862..41.862 rows=201354 loops=1)"
" Index Cond: (""Properties"" && '{126,128}'::integer[])"
" Buffers: shared hit=4 read=74"
" -> Memoize (cost=0.01..0.14 rows=2 width=0) (actual time=0.000..0.000 rows=1 loops=201352)"
" Cache Key: f.""Properties"""
" Hits: 179814 Misses: 21538 Evictions: 0 Overflows: 0 Memory Usage: 3076kB"
" -> Function Scan on unnest p (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=1 loops=21538)"
" Filter: (p = ANY ('{126,128}'::integer[]))"
" Rows Removed by Filter: 6"
"Planning:"
" Buffers: shared hit=32 read=6 dirtied=1"
"Planning Time: 4.533 ms"
"Execution Time: 12883.604 ms"
You should increase work_mem to get rid of the lossy pages in the bitmap. I don't think this will make a big difference, because I suspect most of your time is going to read the pages from disk, and converting lossy pages to exact pages doesn't change how many pages get read (unless TOAST is involved, which I suspect is not--how large does the "Properties" array get?). But I might be wrong, so try it and see. Also, if you turn on track_io_timing and collect your plans with EXPLAIN (ANALYZE, BUFFERS), then we could immediately see if the IO read time was the problem.
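A sketch of trying both of those suggestions in one session (the work_mem value is an assumption to tune; setting track_io_timing requires superuser privileges or ALTER SYSTEM):
SET work_mem = '256MB';        -- large enough that the bitmap heap scan stays exact
SET track_io_timing = on;      -- adds I/O read/write times to the BUFFERS output
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;                    -- the query from the question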
Beyond that, this looks very hard to optimize with traditional methods. You can usually optimize ORDER BY...LIMIT by using an index to read rows already in order, but since the 2nd column in your ordering is computed dynamically, this is unlikely here. Are values within "Properties" unique? That is, can 126 and 128 each exist and be counted at most once per row, or can they exist and be counted multiple times?
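If they are unique, the per-row Counter reduces to how many of the two requested IDs are present, and the LATERAL unnest and GROUP BY can be dropped entirely. A sketch, valid only under that uniqueness assumption:
SELECT F."Id", F."Name", F."Url", F."CountryModel", F."Properties",
       F."PageRank", F."IsVerify",
       (F."Properties" @> ARRAY[126])::int
     + (F."Properties" @> ARRAY[128])::int AS Counter
FROM public."Firms" F
WHERE F."CountryId" = 1
  AND F."Properties" && ARRAY[126,128]
  AND F."Deleted" = FALSE
ORDER BY F."IsVerify" DESC, Counter DESC, F."PageRank" DESC
OFFSET 0 ROWS FETCH FIRST 100 ROWS ONLY;
This still does not make the ORDER BY indexable, but it removes the per-row function scans and the Memoize node.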
The easiest way to optimize this might be on the app or business end. Do we really need to run this query at all, and why? What if we queried only rows where "IsVerify" is true, rather than sorting by it? If that returns only 95 rows, is it really necessary to go back and fill in 5 more where "IsVerify" is false? And so on.

Stuck with a timeout issue. Here is the query I am getting a timeout for:

I am getting this timeout error:
Message: SQLSTATE[57014]: Query canceled: 7 ERROR: canceling statement due to statement timeout
This is the query that is timing out:
SELECT
log.id,
integration.id AS intid,
log.integration_id AS integration_id,
integration.name,
log.createddate
FROM integration log
LEFT JOIN integration__sf integration on ( integration.id = log.integration_id)
LEFT JOIN property prop on ( log.property_id = prop.id )
LEFT JOIN account acc on ( acc.sfid = integration.account )
WHERE
log.id IS NOT NULL
AND log.script_type = 'Pull'
AND log.script_name = 'ModifyTags'
AND log.createddate >= '2018-11-01 00:00:00'
AND log.createddate <= '2018-11-30 23:59:59'
ORDER BY log.id desc LIMIT 100 OFFSET 0;
Is there any scope to optimize this query any more?
Here is the EXPLAIN (ANALYZE, BUFFERS) output:
"Limit (cost=30809.27..30820.93 rows=100 width=262) (actual time=11.793..11.803 rows=21 loops=1)"
" Buffers: shared hit=5 read=935"
" -> Gather Merge (cost=30809.27..31199.66 rows=3346 width=262) (actual time=11.791..11.799 rows=21 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=5 read=935"
" -> Sort (cost=29809.24..29813.43 rows=1673 width=262) (actual time=6.844..6.844 rows=7 loops=3)"
" Sort Key: log.id DESC"
" Sort Method: quicksort Memory: 27kB"
" Buffers: shared hit=1967 read=937"
" -> Hash Left Join (cost=3003.36..29719.67 rows=1673 width=262) (actual time=6.774..6.819 rows=7 loops=3)"
" Hash Cond: ((integration.account__c)::text = (acc.sfid)::text)"
" Buffers: shared hit=1953 read=937"
" -> Nested Loop Left Join (cost=2472.13..29167.33 rows=1673 width=254) (actual time=3.643..3.686 rows=7 loops=3)"
" Buffers: shared hit=969 read=468"
" -> Hash Left Join (cost=2471.71..17895.82 rows=1673 width=228) (actual time=3.635..3.673 rows=7 loops=3)"
" Hash Cond: (log.integration_id = integration.id)"
" Buffers: shared hit=969 read=468"
" -> Parallel Bitmap Heap Scan on integration_log log (cost=1936.93..17339.92 rows=1673 width=148) (actual time=0.097..0.132 rows=7 loops=3)"
" Recheck Cond: (((script_name)::text = 'ModifyTags'::text) AND ((script_type)::text = 'Pull'::text) AND (createddate >= '2018-11-01 00:00:00+05:30'::timestamp with time zone) AND (createddate <= '2018-12-07 23:59:59+05: (...)"
" Filter: (id IS NOT NULL)"
" Heap Blocks: exact=19"
" Buffers: shared read=26"
" -> Bitmap Index Scan on ah_idx_integeration_log_script_name (cost=0.00..1935.93 rows=4016 width=0) (actual time=0.201..0.201 rows=21 loops=1)"
" Index Cond: (((script_name)::text = 'ModifyTags'::text) AND ((script_type)::text = 'Pull'::text) AND (createddate >= '2018-11-01 00:00:00+05:30'::timestamp with time zone) AND (createddate <= '2018-12-07 23:59:59 (...)"
" Buffers: shared read=5"
" -> Hash (cost=483.79..483.79 rows=4079 width=80) (actual time=3.463..3.463 rows=4079 loops=3)"
" Buckets: 4096 Batches: 1 Memory Usage: 481kB"
" Buffers: shared hit=887 read=442"
" -> Seq Scan on integration__c integration (cost=0.00..483.79 rows=4079 width=80) (actual time=0.012..2.495 rows=4079 loops=3)"
" Buffers: shared hit=887 read=442"
" -> Index Scan using property__c_pkey on property__c prop (cost=0.42..6.74 rows=1 width=30) (actual time=0.001..0.001 rows=0 loops=21)"
" Index Cond: (log.property_id = id)"
" -> Hash (cost=498.88..498.88 rows=2588 width=42) (actual time=3.098..3.098 rows=2577 loops=3)"
" Buckets: 4096 Batches: 1 Memory Usage: 220kB"
" Buffers: shared hit=950 read=469"
" -> Seq Scan on account acc (cost=0.00..498.88 rows=2588 width=42) (actual time=0.011..2.531 rows=2577 loops=3)"
" Buffers: shared hit=950 read=469"
"Planning time: 2.513 ms"
"Execution time: 13.904 ms"
Actually, I have found the optimization solution; here is what the query would look like:
SELECT
log.id,
integration.id AS intid,
log.integration_id AS integration_id,
integration.name,
log.createddate
FROM integration log
LEFT JOIN integration__sf integration on ( integration.id = log.integration_id)
LEFT JOIN property prop on ( log.property_id = prop.id )
LEFT JOIN account acc on ( acc.sfid = integration.account AND prop.account = acc.sfid AND prop.group_membership = integration.grouping)
WHERE log.id IS NOT NULL
AND log.script_type = 'Pull'
AND log.script_name = 'ModifyTags'
AND log.createddate >= '2018-11-01 00:00:00'
AND log.createddate <= '2018-11-30 23:59:59'
ORDER BY log.id desc LIMIT 100 OFFSET 0
If you can suggest anything more, I will be grateful.
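One further small point, independent of the join rewrite: an upper bound of 23:59:59 silently drops rows whose createddate falls within the last second of the month with fractional seconds. A half-open range avoids that; a sketch of just the date predicates:
AND log.createddate >= '2018-11-01 00:00:00'
AND log.createddate <  '2018-12-01 00:00:00'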