PostgreSQL DELETE statement using a big/long subquery hangs/fails

I am currently trying to delete 1+ million rows in a table (actually 30+ million, but I made a subset because of problems arising there as well), where the condition is that a row must not be referenced as a foreign key in any other table. I delete in batches of 30000 rows at a time.
The query looks like:
DELETE FROM table_name tn WHERE tn.id IN (
    SELECT tn2.id FROM table_name AS tn2
    LEFT JOIN table_name_join_1 ON table_name_join_1.table_id = tn2.id
    LEFT JOIN table_name_join_2 ON table_name_join_2.table_id = tn2.id
    ...
    LEFT JOIN table_name_join_19 ON table_name_join_19.table_id = tn2.id
    WHERE table_name_join_1.table_id IS NULL
      AND table_name_join_2.table_id IS NULL
      ...
      AND table_name_join_19.table_id IS NULL
    LIMIT 30000 OFFSET x
)
The table is referenced by 19 other tables, hence all the left joins in the subquery. Counting the total number of rows that will be affected (without LIMIT and OFFSET) takes 61 seconds.
The problem is that the query just hangs when used in the DELETE statement, but works when merely counting with COUNT(1). I am not sure whether there is a better way to delete a lot of rows in a table, or whether it is a matter of examining the tables referencing the table in question to see if some of their indexes are wack in some way.
Hope someone can help :D It is quite annoying to see a query work and then hang/fail straight afterwards when used as a subquery.
I use psycopg2 on Python 2.7.17 (a work thing). I also speculated about when to close the psycopg2 cursor to improve speed: currently I create the cursor outside the loop running the deletes and close it along with the connection when the script is done. Previously the cursor was closed after each commit of a delete statement, but that seemed a bit much to me, I don't know. The current loop looks like:
cursor = conn.cursor()          # one cursor, reused for every batch
while count >= offset:
    ...
    delete(cursor, batch_size, offset)
    conn.commit()               # currently committing after each batch
    ...
    offset += batch_size
Also, is it a bad idea to commit() after each delete statement is executed, or should I wait until the loop has finished executing all the delete statements and commit then? If so, shouldn't I look into using explicit transactions instead?
Basically, I hope someone can tell me why this is so slow/fails, even though a count without LIMIT and OFFSET "only" takes 60 seconds.

DELETE FROM xxx accepts almost the same sub-syntax as SELECT COUNT(*) FROM xxx, so just to test the plan, you can run the fragment below and check whether you get an indexed plan:
EXPLAIN
SELECT COUNT(*)
FROM table_name tn
WHERE NOT EXISTS ( SELECT *
                   FROM table_name_join_1 x1 WHERE x1.table_id = tn.id )
--
  AND NOT EXISTS ( SELECT *
                   FROM table_name_join_2 x2 WHERE x2.table_id = tn.id )
--
  AND NOT EXISTS ( SELECT *
                   FROM table_name_join_3 x3 WHERE x3.table_id = tn.id )
--
-- et cetera
--
;
Create some data, since it is hard to benchmark pseudocode:
SELECT version();
CREATE TABLE table_name
( id serial NOT NULL PRIMARY KEY
, name text
);
INSERT INTO table_name ( name )
SELECT 'Name_' || gs::text
FROM generate_series(1,100000) gs;
--
CREATE TABLE table_name_join_2
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_2(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 2 = 0
;
--
CREATE TABLE table_name_join_3
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_3(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 3 = 0
;
--
CREATE TABLE table_name_join_5
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_5(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 5 = 0
;
--
CREATE TABLE table_name_join_7
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_7(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 7 = 0
;
--
CREATE TABLE table_name_join_11
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_11(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 11 = 0
;
Now, vacuum-analyze the tables and run the DELETE query:
VACUUM ANALYZE table_name;
VACUUM ANALYZE table_name_join_2;
VACUUM ANALYZE table_name_join_3;
VACUUM ANALYZE table_name_join_5;
VACUUM ANALYZE table_name_join_7;
EXPLAIN ANALYZE
DELETE
FROM table_name tn
WHERE 1=1
AND NOT EXISTS ( SELECT * FROM table_name_join_2 x2 WHERE x2.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_3 x3 WHERE x3.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_5 x5 WHERE x5.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_7 x7 WHERE x7.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_11 x11 WHERE x11.table_id = tn.id)
--
-- et cetera
--
;
SELECT count(*) FROM table_name;
Now, exactly the same, but with supporting indexes on the FKs:
CREATE INDEX table_name_join_2_2 ON table_name_join_2( table_id);
CREATE INDEX table_name_join_3_3 ON table_name_join_3( table_id);
CREATE INDEX table_name_join_5_5 ON table_name_join_5( table_id);
CREATE INDEX table_name_join_7_7 ON table_name_join_7( table_id);
CREATE INDEX table_name_join_11_11 ON table_name_join_11( table_id);
VACUUM ANALYZE table_name;
VACUUM ANALYZE table_name_join_2;
VACUUM ANALYZE table_name_join_3;
VACUUM ANALYZE table_name_join_5;
VACUUM ANALYZE table_name_join_7;
EXPLAIN ANALYZE
DELETE
FROM table_name tn
WHERE 1=1
...
;
----------
Query plan #1:
----------
DROP SCHEMA
CREATE SCHEMA
SET
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)
CREATE TABLE
INSERT 0 100000
CREATE TABLE
INSERT 0 50000
CREATE TABLE
INSERT 0 33333
CREATE TABLE
INSERT 0 20000
CREATE TABLE
INSERT 0 14285
CREATE TABLE
INSERT 0 9090
SET
SET
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on table_name tn (cost=3969.52..7651.94 rows=11429 width=36) (actual time=812.010..812.011 rows=0 loops=1)
-> Hash Anti Join (cost=3969.52..7651.94 rows=11429 width=36) (actual time=206.775..712.982 rows=20779 loops=1)
Hash Cond: (tn.id = x7.table_id)
-> Hash Anti Join (cost=3557.10..7088.09 rows=13334 width=34) (actual time=183.070..654.030 rows=24242 loops=1)
Hash Cond: (tn.id = x5.table_id)
-> Hash Anti Join (cost=2979.10..6329.25 rows=16667 width=28) (actual time=149.870..578.173 rows=30303 loops=1)
Hash Cond: (tn.id = x3.table_id)
-> Hash Anti Join (cost=2016.11..5124.59 rows=25000 width=22) (actual time=95.589..461.053 rows=45455 loops=1)
Hash Cond: (tn.id = x2.table_id)
-> Merge Anti Join (cost=572.11..3271.21 rows=50000 width=16) (actual time=14.486..261.955 rows=90910 loops=1)
Merge Cond: (tn.id = x11.table_id)
-> Index Scan using table_name_pkey on table_name tn (cost=0.29..2344.99 rows=100000 width=10) (actual time=0.031..118.968 rows=100000 loops=1)
-> Sort (cost=571.82..589.22 rows=6960 width=10) (actual time=14.446..20.365 rows=9090 loops=1)
Sort Key: x11.table_id
Sort Method: quicksort Memory: 612kB
-> Seq Scan on table_name_join_11 x11 (cost=0.00..127.60 rows=6960 width=10) (actual time=0.029..6.939 rows=9090 loops=1)
-> Hash (cost=819.00..819.00 rows=50000 width=10) (actual time=80.439..80.440 rows=50000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2014kB
-> Seq Scan on table_name_join_2 x2 (cost=0.00..819.00 rows=50000 width=10) (actual time=0.019..36.848 rows=50000 loops=1)
-> Hash (cost=546.33..546.33 rows=33333 width=10) (actual time=53.678..53.678 rows=33333 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1428kB
-> Seq Scan on table_name_join_3 x3 (cost=0.00..546.33 rows=33333 width=10) (actual time=0.027..24.132 rows=33333 loops=1)
-> Hash (cost=328.00..328.00 rows=20000 width=10) (actual time=32.884..32.885 rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 832kB
-> Seq Scan on table_name_join_5 x5 (cost=0.00..328.00 rows=20000 width=10) (actual time=0.017..15.135 rows=20000 loops=1)
-> Hash (cost=233.85..233.85 rows=14285 width=10) (actual time=23.542..23.542 rows=14285 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 567kB
-> Seq Scan on table_name_join_7 x7 (cost=0.00..233.85 rows=14285 width=10) (actual time=0.016..10.742 rows=14285 loops=1)
Planning Time: 4.470 ms
Trigger for constraint table_name_join_2_table_id_fkey: time=172949.350 calls=20779
Trigger for constraint table_name_join_3_table_id_fkey: time=116772.757 calls=20779
Trigger for constraint table_name_join_5_table_id_fkey: time=71218.348 calls=20779
Trigger for constraint table_name_join_7_table_id_fkey: time=51760.503 calls=20779
Trigger for constraint table_name_join_11_table_id_fkey: time=36120.128 calls=20779
Execution Time: 449783.490 ms
(35 rows)
count
-------
79221
(1 row)
Query plan #2:
SET
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
SET
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on table_name tn (cost=1.73..6762.95 rows=11429 width=36) (actual time=776.987..776.988 rows=0 loops=1)
-> Merge Anti Join (cost=1.73..6762.95 rows=11429 width=36) (actual time=0.212..676.794 rows=20779 loops=1)
Merge Cond: (tn.id = x7.table_id)
-> Merge Anti Join (cost=1.44..6322.99 rows=13334 width=34) (actual time=0.191..621.986 rows=24242 loops=1)
Merge Cond: (tn.id = x5.table_id)
-> Merge Anti Join (cost=1.16..5706.94 rows=16667 width=28) (actual time=0.172..550.669 rows=30303 loops=1)
Merge Cond: (tn.id = x3.table_id)
-> Merge Anti Join (cost=0.87..4661.02 rows=25000 width=22) (actual time=0.147..438.036 rows=45455 loops=1)
Merge Cond: (tn.id = x2.table_id)
-> Merge Anti Join (cost=0.58..2938.75 rows=50000 width=16) (actual time=0.125..250.082 rows=90910 loops=1)
Merge Cond: (tn.id = x11.table_id)
-> Index Scan using table_name_pkey on table_name tn (cost=0.29..2344.99 rows=100000 width=10) (actual time=0.031..116.630 rows=100000 loops=1)
-> Index Scan using table_name_join_11_11 on table_name_join_11 x11 (cost=0.29..230.14 rows=9090 width=10) (actual time=0.090..11.228 rows=9090 loops=1)
-> Index Scan using table_name_join_2_2 on table_name_join_2 x2 (cost=0.29..1222.29 rows=50000 width=10) (actual time=0.019..59.500 rows=50000 loops=1)
-> Index Scan using table_name_join_3_3 on table_name_join_3 x3 (cost=0.29..816.78 rows=33333 width=10) (actual time=0.022..40.473 rows=33333 loops=1)
-> Index Scan using table_name_join_5_5 on table_name_join_5 x5 (cost=0.29..491.09 rows=20000 width=10) (actual time=0.016..23.105 rows=20000 loops=1)
-> Index Scan using table_name_join_7_7 on table_name_join_7 x7 (cost=0.29..351.86 rows=14285 width=10) (actual time=0.017..16.903 rows=14285 loops=1)
Planning Time: 4.737 ms
Trigger for constraint table_name_join_2_table_id_fkey: time=1114.497 calls=20779
Trigger for constraint table_name_join_3_table_id_fkey: time=1096.065 calls=20779
Trigger for constraint table_name_join_5_table_id_fkey: time=1094.951 calls=20779
Trigger for constraint table_name_join_7_table_id_fkey: time=1090.509 calls=20779
Trigger for constraint table_name_join_11_table_id_fkey: time=1173.987 calls=20779
Execution Time: 6426.626 ms
(24 rows)
count
-------
79221
(1 row)
So, the query speeds up from 450 seconds to 7 seconds, and most of the time turns out to be spent checking the FK constraints after the actual delete in the base table. [These constraints are implemented as invisible triggers in Postgres.]
Summary table:
query type   | indexes on all 5 FKs | work_mem | total time (ms) | trigger time (ms)
-------------+----------------------+----------+-----------------+------------------
NOT EXISTS() | No                   | 4MB      |      449783.490 |        448821.083
NOT EXISTS() | Yes                  | 4MB      |        6426.626 |          5570.009
NOT EXISTS() | Yes                  | 64kB     |        6405.273 |          5545.352
NOT IN()     | No                   | 4MB      |      449435.530 |        448829.179
NOT IN()     | Yes                  | 4MB      |        6113.690 |          5443.505
NOT IN()     | Yes                  | 64kB     |     8595341.467 |          5545.796
Conclusion: it is up to you to decide if you want indexes on Foreign Keys.
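If you want to see which tables reference the one you are deleting from (and therefore which FK columns may need a supporting index), a generic catalog query along these lines can help; this is a sketch against the standard pg_constraint catalog, not part of the benchmark above:
-- List all foreign keys pointing at table_name; each referencing column
-- is a candidate for an index like the ones created above.
SELECT conrelid::regclass        AS referencing_table,
       conname                   AS constraint_name,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'table_name'::regclass;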

Related

Postgresql max query on big indexed table has slow performance

I have a table in my PostgreSQL database called consumer_actions. It contains all the actions done by consumers registered in my app. At the moment, this table has ~500 million records. What I'm trying to do is get the maximum id per system the action came from.
The definition of the table is:
CREATE TABLE public.consumer_actions (
id int4 NOT NULL,
system_id int4 NOT NULL,
consumer_id int4 NOT NULL,
action_id int4 NOT NULL,
payload_json jsonb NULL,
external_system_date timestamptz NULL,
local_system_date timestamptz NULL,
CONSTRAINT consumer_actions_pkey PRIMARY KEY (id, system_id)
);
CREATE INDEX consumer_actions_ext_date ON public.consumer_actions USING btree (external_system_date);
CREATE INDEX consumer_actions_system_consumer_id ON public.consumer_actions USING btree (system_id, consumer_id);
When I try
select max(id) from consumer_actions where system_id = 1
it takes less than one second, but if I try to use the same index (consumer_actions_system_consumer_id) to get the max(id) for system_id = 2, it takes more than an hour.
select max(id) from consumer_actions where system_id = 2
I have also checked the query plans; they look similar for both queries. I also reran VACUUM ANALYZE on the table and a REINDEX. Neither helped. Any idea what I can do to improve the second query's time?
Here are the query plans for both queries, and the current size of the table:
explain analyze
select max(id) from consumer_actions where system_id = 1;
Result (cost=1.49..1.50 rows=1 width=4) (actual time=0.062..0.063 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.57..1.49 rows=1 width=4) (actual time=0.057..0.057 rows=1 loops=1)
-> Index Only Scan Backward using consumer_actions_pkey on consumer_actions ca (cost=0.57..524024735.49 rows=572451344 width=4) (actual time=0.055..0.055 rows=1 loops=1)
Index Cond: ((id IS NOT NULL) AND (system_id = 1))
Heap Fetches: 1
Planning Time: 0.173 ms
Execution Time: 0.092 ms
explain analyze
select max(id) from consumer_actions where system_id = 2;
Result (cost=6.46..6.47 rows=1 width=4) (actual time=7099484.855..7099484.858 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.57..6.46 rows=1 width=4) (actual time=7099484.839..7099484.841 rows=1 loops=1)
-> Index Only Scan Backward using consumer_actions_pkey on consumer_actions ca (cost=0.57..20205843.58 rows=3436129 width=4) (actual time=7099484.833..7099484.834 rows=1 loops=1)
Index Cond: ((id IS NOT NULL) AND (system_id = 2))
Heap Fetches: 1
Planning Time: 3.078 ms
Execution Time: 7099484.992 ms
(8 rows)
select count(*) from consumer_actions; --result is 577408504
Instead of using an aggregate function like max(), which potentially has to scan and aggregate a large number of rows for a table like yours, you could get the same result with a query designed to return the fewest rows possible:
SELECT id FROM consumer_actions WHERE system_id = ? ORDER BY id DESC LIMIT 1;
This should still benefit significantly from the existing indexes.
I think that you should create an index like this one:
CREATE INDEX consumer_actions_system_system_id_id ON public.consumer_actions USING btree (system_id, id);
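To verify that the new index is picked up, an EXPLAIN check like this should suffice (a sketch; the expected plan shape is an assumption based on the index definition above):
EXPLAIN ANALYZE
SELECT max(id) FROM consumer_actions WHERE system_id = 2;
-- Expect an Index Only Scan Backward using consumer_actions_system_system_id_id
-- with Index Cond: (system_id = 2), instead of the long walk down
-- consumer_actions_pkey shown in the slow plan above.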

Very bad query plan in PostgreSQL 9.6

I have a performance problem with PostgreSQL 9.6.17 (x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit). Sometimes a very inefficient query plan is chosen for a relatively simple query.
There is a dir_current table with 750M rows:
\d sf.dir_current
Table "sf.dir_current"
Column | Type | Collation | Nullable | Default
--------------------+-------------+-----------+----------+-----------------------------------------------
id | bigint | | not null | nextval('sf.object_id_seq'::regclass)
volume_id | bigint | | not null |
parent_id | bigint | | |
blocks | sf.blkcnt_t | | |
rec_aggrs | jsonb | | not null |
...
Indexes:
"dir_current_pk" PRIMARY KEY, btree (id), tablespace "sf_current"
"dir_current_parentid_idx" btree (parent_id), tablespace "sf_current"
"dir_current_volumeid_id_unq" UNIQUE CONSTRAINT, btree (volume_id, id), tablespace "sf_current"
Foreign-key constraints:
"dir_current_parentid_fk" FOREIGN KEY (parent_id) REFERENCES sf.dir_current(id) DEFERRABLE INITIALLY DEFERRED
(some columns omitted as they're irrelevant here).
Now, a temporary table is created with ca. 1K rows:
CREATE TEMP TABLE dir_process AS (
SELECT sf.dir_current.id, volume_id, parent_id, depth, size, blocks, atime, ctime, mtime, sync_time, local_aggrs FROM sf.dir_current
WHERE ....
);
CREATE INDEX dir_process_indx ON dir_process(volume_id, id);
ANALYZE dir_process;
The actual condition .... doesn't matter here - it selects some rows to be processed.
Here is a query which is sometimes very slow:
SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks FROM sf.dir_current AS dir
INNER JOIN dir_process ON dir.parent_id = dir_process.id AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids)
A few slow plans:
duration: 1822060.789 ms plan:
Merge Join (cost=150260.47..265775.37 rows=1 width=456) (actual rows=14305 loops=1)
Merge Cond: (dir.volume_id = dir_process.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 23117117695
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..922747.05 rows=624805 width=456) (actual rows=1231600 loops=1)
Index Cond: (volume_id = ANY ('{88}'::bigint[]))
-> Sort (cost=966.16..975.55 rows=18770 width=16) (actual rows=23115900401 loops=1)
Sort Key: dir_process.volume_id
Sort Method: quicksort Memory: 1648kB
-> Seq Scan on dir_process (cost=0.00..699.70 rows=18770 width=16) (actual rows=18770 loops=1)
duration: 10140968.829 ms plan:
Merge Join (cost=0.17..8389.13 rows=1 width=456) (actual rows=819 loops=1)
Merge Cond: (dir_process.volume_id = dir.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 2153506735
-> Index Only Scan using dir_process_indx on dir_process (cost=0.06..659.76 rows=1166 width=16) (actual rows=1166 loops=1)
Heap Fetches: 1166
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..885276.20 rows=602172 width=456) (actual rows=2153506389 loops=1)
Index Cond: (volume_id = ANY ('{5}'::bigint[]))
duration: 12524111.200 ms plan:
Merge Join (cost=480671.74..878819.79 rows=1 width=456) (actual rows=62 loops=1)
Merge Cond: (dir.volume_id = dir_process.volume_id)
Join Filter: (dir.parent_id = dir_process.id)
Rows Removed by Join Filter: 153595373018
-> Index Scan using dir_current_volumeid_id_unq on dir_current dir (cost=0.12..922747.05 rows=624805 width=456) (actual rows=2360101 loops=1)
Index Cond: (volume_id = ANY ('{441}'::bigint[]))
-> Sort (cost=2621.42..2653.96 rows=65080 width=16) (actual rows=153593012980 loops=1)
Sort Key: dir_process.volume_id
Sort Method: quicksort Memory: 4587kB
-> Seq Scan on dir_process (cost=0.00..1580.80 rows=65080 width=16) (actual rows=65080 loops=1)
The first reaction is, as usual: "do I have up-to-date statistics?" The answer is yes: dir_current is changed frequently but is also analyzed about once an hour. dir_process is analyzed as soon as it is created.
Note that the estimated number of rows matches pretty well:
est. 624805, actual=1231600
est. 624805, actual=2360101
Where the estimates are way off is in the inner loop of the merge join, which actually shows the number of rows times the number of loops (153593012980, 2153506389, or 23115900401). The poor executor is spinning in the inner loop, iterating over all rows with a given volume_id, looking for a given id.
The biggest problem seems to be that Postgres chooses to do a merge join on a very inefficient condition: dir.volume_id = dir_process.volume_id instead of dir.parent_id = dir_process.id. For a given volume_id there are a few million rows in dir_current; for a given parent_id there are hundreds or thousands of rows, but not millions.
The other effective query plan is a nested loop, where the outer loop iterates over dir_process and uses an index to fetch rows from dir_current.
I understand that I could disable merge joins before running this query, but I was wondering if there is a better solution.
Any other ideas about what can be done to avoid this inefficient plan? How is it possible that it is chosen over nested loops?
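For reference, a minimal sketch of the workaround mentioned above: scoping the planner switch to a single transaction with SET LOCAL (enable_mergejoin is a standard Postgres setting; volume_ids is the parameter from the question):
BEGIN;
SET LOCAL enable_mergejoin TO off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks
FROM sf.dir_current AS dir
INNER JOIN dir_process ON dir.parent_id = dir_process.id
                      AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids);
COMMIT;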

Seemingly Random Delay in queries

This is a follow-up to this issue I posted a while ago.
I have the following code:
SET work_mem = '16MB';
SELECT s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
FROM rm_o_resource_usage_instance_splits_new s
INNER JOIN rm_o_resource_usage r ON s.usage_id = r.id
INNER JOIN scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id
WHERE r.schedule_id = 10
ORDER BY r.resource_id, s.start_date
When I run EXPLAIN (ANALYZE, BUFFERS) I get the following:
Sort (cost=3724.02..3724.29 rows=105 width=89) (actual time=245.802..247.573 rows=22302 loops=1)
Sort Key: r.resource_id, s.start_date
Sort Method: quicksort Memory: 6692kB
Buffers: shared hit=198702 read=5993 written=612
-> Nested Loop (cost=703.76..3720.50 rows=105 width=89) (actual time=1.898..164.741 rows=22302 loops=1)
Buffers: shared hit=198702 read=5993 written=612
-> Hash Join (cost=703.34..3558.54 rows=105 width=101) (actual time=1.815..11.259 rows=22302 loops=1)
Hash Cond: (s.usage_id = r.id)
Buffers: shared hit=3 read=397 written=2
-> Bitmap Heap Scan on rm_o_resource_usage_instance_splits_new s (cost=690.61..3486.58 rows=22477 width=69) (actual time=1.782..5.820 rows=22302 loops=1)
Recheck Cond: (solution = 10)
Heap Blocks: exact=319
Buffers: shared hit=2 read=396 written=2
-> Bitmap Index Scan on rm_o_resource_usage_instance_splits_new_solution_idx (cost=0.00..685.00 rows=22477 width=0) (actual time=1.609..1.609 rows=22302 loops=1)
Index Cond: (solution = 10)
Buffers: shared hit=2 read=77
-> Hash (cost=12.66..12.66 rows=5 width=48) (actual time=0.023..0.023 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1 read=1
-> Bitmap Heap Scan on rm_o_resource_usage r (cost=4.19..12.66 rows=5 width=48) (actual time=0.020..0.020 rows=1 loops=1)
Recheck Cond: (schedule_id = 10)
Heap Blocks: exact=1
Buffers: shared hit=1 read=1
-> Bitmap Index Scan on rm_o_resource_usage_sched (cost=0.00..4.19 rows=5 width=0) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: (schedule_id = 10)
Buffers: shared read=1
-> Index Scan using scheduledactivities_activity_index_idx on scheduledactivities sa (cost=0.42..1.53 rows=1 width=16) (actual time=0.004..0.007 rows=1 loops=22302)
Index Cond: (activity_index = s.activity_index)
Filter: (solution_id = 10)
Rows Removed by Filter: 5
Buffers: shared hit=198699 read=5596 written=610
Planning time: 7.070 ms
Execution time: 248.691 ms
Every time I run EXPLAIN, I get roughly the same results. The execution time is always between 170ms and 250ms, which, to me, is perfectly fine. However, when this query is run through a C++ project (using PQexec(conn, query), where conn is a dedicated connection and query is the above query), the time it takes varies widely. In general the query is very quick and you don't notice a delay. The problem is that, on occasion, this query takes 2 to 3 minutes to complete.
If I open pgAdmin and look at the "server activity" for the database, there are about 30 connections, mostly sitting at "idle". The above query's connection is marked "active", and stays "active" for several minutes.
I am at a loss as to why it randomly takes several minutes to complete the same query, with no change in the data in the DB either. I have tried increasing work_mem, which didn't make any difference (nor did I really expect it to). Any help or suggestions would be greatly appreciated.
There aren't any more specific tags, but I'm currently using Postgres 10.11; it has also been an issue on other 10.x versions. The system is a quad-core Xeon @ 3.4GHz, with an SSD and 24GB of memory.
Per jjanes's suggestion, I put in auto_explain. Eventually got this output:
duration: 128057.373 ms
plan:
Query Text: SET work_mem = '32MB';SELECT s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset FROM rm_o_resource_usage_instance_splits_new s INNER JOIN rm_o_resource_usage r ON s.usage_id = r.id INNER JOIN scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id WHERE r.schedule_id = 12642 ORDER BY r.resource_id, s.start_date
Sort (cost=14.36..14.37 rows=1 width=98) (actual time=128042.083..128043.287 rows=21899 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Sort Key: r.resource_id, s.start_date
Sort Method: quicksort Memory: 6585kB
Buffers: shared hit=21198435 read=388 dirtied=119
-> Nested Loop (cost=0.85..14.35 rows=1 width=98) (actual time=4.995..127958.935 rows=21899 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Join Filter: (s.activity_index = sa.activity_index)
Rows Removed by Join Filter: 705476285
Buffers: shared hit=21198435 read=388 dirtied=119
-> Nested Loop (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
Inner Unique: true
Join Filter: (s.usage_id = r.id)
Buffers: shared hit=22102 read=388 dirtied=119
-> Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
Index Cond: (s.solution = 12642)
Buffers: shared hit=203 read=388 dirtied=119
-> Seq Scan on public.rm_o_resource_usage r (cost=0.00..1.29 rows=1 width=57) (actual time=0.002..0.002 rows=1 loops=21899)
Output: r.id, r.schedule_id, r.resource_id
Filter: (r.schedule_id = 12642)
Rows Removed by Filter: 26
Buffers: shared hit=21899
-> Index Scan using scheduled_activities_idx on public.scheduledactivities sa (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
Output: sa.usedresourceset, sa.activity_index, sa.solution_id
Index Cond: (sa.solution_id = 12642)
Buffers: shared hit=21176333
EDIT: Full definitions of the tables are below:
CREATE TABLE public.rm_o_resource_usage_instance_splits_new
(
start_time integer NOT NULL,
end_time integer NOT NULL,
resources jsonb NOT NULL,
activity_index integer NOT NULL,
usage_id bigint NOT NULL,
start_date text COLLATE pg_catalog."default" NOT NULL,
end_date text COLLATE pg_catalog."default" NOT NULL,
solution bigint NOT NULL,
CONSTRAINT rm_o_resource_usage_instance_splits_new_pkey PRIMARY KEY (start_time, activity_index, usage_id),
CONSTRAINT rm_o_resource_usage_instance_splits_new_solution_fkey FOREIGN KEY (solution)
REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE,
CONSTRAINT rm_o_resource_usage_instance_splits_new_usage_id_fkey FOREIGN KEY (usage_id)
REFERENCES public.rm_o_resource_usage (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_instance_splits_new_activity_idx
ON public.rm_o_resource_usage_instance_splits_new USING btree
(activity_index ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_instance_splits_new_solution_idx
ON public.rm_o_resource_usage_instance_splits_new USING btree
(solution ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_instance_splits_new_usage_idx
ON public.rm_o_resource_usage_instance_splits_new USING btree
(usage_id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE TABLE public.rm_o_resource_usage
(
id bigint NOT NULL DEFAULT nextval('rm_o_resource_usage_id_seq'::regclass),
schedule_id bigint NOT NULL,
resource_id text COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT rm_o_resource_usage_pkey PRIMARY KEY (id),
CONSTRAINT rm_o_resource_usage_schedule_id_fkey FOREIGN KEY (schedule_id)
REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_idx
ON public.rm_o_resource_usage USING btree
(id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_sched
ON public.rm_o_resource_usage USING btree
(schedule_id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE TABLE public.scheduledactivities
(
id bigint NOT NULL DEFAULT nextval('scheduledactivities_id_seq'::regclass),
solution_id bigint NOT NULL,
activity_id text COLLATE pg_catalog."default" NOT NULL,
sequence_index integer,
startminute integer,
finishminute integer,
issue text COLLATE pg_catalog."default",
activity_index integer NOT NULL,
is_objective boolean NOT NULL,
usedresourceset integer DEFAULT '-1'::integer,
start timestamp without time zone,
finish timestamp without time zone,
is_ore boolean,
is_ignored boolean,
CONSTRAINT scheduled_activities_pkey PRIMARY KEY (id),
CONSTRAINT scheduledactivities_solution_id_fkey FOREIGN KEY (solution_id)
REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE INDEX scheduled_activities_activity_id_idx
ON public.scheduledactivities USING btree
(activity_id COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX scheduled_activities_id_idx
ON public.scheduledactivities USING btree
(id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX scheduled_activities_idx
ON public.scheduledactivities USING btree
(solution_id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX scheduledactivities_activity_index_idx
ON public.scheduledactivities USING btree
(activity_index ASC NULLS LAST)
TABLESPACE pg_default;
EDIT: Additional output from auto_explain after adding index on scheduledactivities (solution_id, activity_index)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Sort Key: r.resource_id, s.start_date
Sort Method: quicksort Memory: 6283kB
Buffers: shared hit=20159117 read=375 dirtied=190
-> Nested Loop (cost=0.85..10.76 rows=1 width=100) (actual time=5.518..122489.627 rows=20761 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Join Filter: (s.activity_index = sa.activity_index)
Rows Removed by Join Filter: 668815615
Buffers: shared hit=20159117 read=375 dirtied=190
-> Nested Loop (cost=0.42..5.80 rows=1 width=112) (actual time=0.057..217.563 rows=20761 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
Inner Unique: true
Join Filter: (s.usage_id = r.id)
Buffers: shared hit=20947 read=375 dirtied=190
-> Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s (cost=0.42..4.44 rows=1 width=69) (actual time=0.049..17.622 rows=20761 loops=1)
Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
Index Cond: (s.solution = 12644)
Buffers: shared hit=186 read=375 dirtied=190
-> Seq Scan on public.rm_o_resource_usage r (cost=0.00..1.35 rows=1 width=59) (actual time=0.002..0.002 rows=1 loops=20761)
Output: r.id, r.schedule_id, r.resource_id
Filter: (r.schedule_id = 12644)
Rows Removed by Filter: 22
Buffers: shared hit=20761
-> Index Scan using scheduled_activities_idx on public.scheduledactivities sa (cost=0.42..4.94 rows=1 width=16) (actual time=0.007..4.654 rows=32216 loops=20761)
Output: sa.usedresourceset, sa.activity_index, sa.solution_id
Index Cond: (sa.solution_id = 12644)
Buffers: shared hit=20138170
The easiest way to reproduce the issue is to add more values to the three tables. I didn't delete any, only did a few thousand INSERTs.
-> Index Scan using .. s (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
Index Cond: (s.solution = 12642)
The planner thinks it will find 1 row, and instead finds 21899. That error can pretty clearly lead to bad plans. And a single equality condition should be estimated quite accurately, so I'd say the statistics on your table are way off. It could be that the autovac launcher is tuned poorly so it doesn't run often enough, or it could be that select parts of your data change very rapidly (did you just insert 21899 rows with s.solution = 12642 immediately before running the query?) and so the stats can't be kept accurate enough.
-> Nested Loop ...
Join Filter: (s.activity_index = sa.activity_index)
Rows Removed by Join Filter: 705476285
-> ...
-> Index Scan using scheduled_activities_idx on public.scheduledactivities sa (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
Output: sa.usedresourceset, sa.activity_index, sa.solution_id
Index Cond: (sa.solution_id = 12642)
If you can't get it to use the Hash Join, you can at least reduce the harm of the Nested Loop by building an index on scheduledactivities (solution_id, activity_index); see the sketch below. That way the activity_index criterion can be part of the Index Condition, rather than a Join Filter. You could probably then drop the index on solution_id alone, as there is little point in maintaining both indexes.
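A minimal sketch of that suggestion (the index name here is hypothetical, not from the question):
-- Covers both join criteria, so the lookup becomes
-- Index Cond: (solution_id = ... AND activity_index = ...)
CREATE INDEX scheduledactivities_solution_activity_idx
    ON public.scheduledactivities USING btree (solution_id, activity_index);
-- Once it exists, the single-column index is redundant:
DROP INDEX scheduled_activities_idx;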
The SQL statement of the fast plan uses WHERE r.schedule_id = 10 and returns about 22000 rows (estimated: 105).
The SQL statement of the slow plan uses WHERE r.schedule_id = 12642 and returns about 21000 rows (estimated: only 1).
The slow plan uses nested loops instead of hash joins, maybe because of a bad join estimate: the estimated row count is 1 but the actual count is 21899.
For example, in this step:
Nested Loop (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)
If the data does not change, there may be a statistics issue (skewed data) for some columns.
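One way to inspect what the planner believes about the skewed column is the standard pg_stats view (a generic sketch; table and column names taken from the question):
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'rm_o_resource_usage_instance_splits_new'
  AND attname   = 'solution';
-- If the value being queried (e.g. solution = 12642) is absent from
-- most_common_vals, the planner falls back to a generic, possibly tiny estimate.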

PostgreSQL increase group by over 30 million rows

Is there any way to increase the speed of a dynamic GROUP BY query? I have a table with 30 million rows.
create table if not exists tb
(
id serial not null constraint tb_pkey primary key,
week integer,
month integer,
year integer,
starttime varchar(20),
endtime varchar(20),
brand smallint,
category smallint,
value real
);
The query below takes 8.5 seconds.
SELECT category from tb group by category
Is there any way to increase the speed? I have tried with and without an index.
For that exact query, not really; doing this operation requires scanning every row. No way around it.
But if you're looking to be able to quickly get the set of unique categories, and you have an index on that column, you can use a variation of the WITH RECURSIVE example shown in the edit to the question here (look towards the end of the question):
Counting distinct rows using recursive cte over non-distinct index
You'll need to change it to return the set of unique values instead of counting them, but it looks like a simple change:
testdb=# create table tb(id bigserial, category smallint);
CREATE TABLE
testdb=# insert into tb(category) select 2 from generate_series(1, 10000)
testdb-# ;
INSERT 0 10000
testdb=# insert into tb(category) select 1 from generate_series(1, 10000);
INSERT 0 10000
testdb=# insert into tb(category) select 3 from generate_series(1, 10000);
INSERT 0 10000
testdb=# create index on tb(category);
CREATE INDEX
testdb=# WITH RECURSIVE cte AS
(
(SELECT category
FROM tb
WHERE category >= 0
ORDER BY 1
LIMIT 1)
UNION ALL SELECT
(SELECT category
FROM tb
WHERE category > c.category
ORDER BY 1
LIMIT 1)
FROM cte c
WHERE category IS NOT NULL)
SELECT category
FROM cte
WHERE category IS NOT NULL;
category
----------
1
2
3
(3 rows)
And here's the EXPLAIN ANALYZE:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on cte (cost=40.66..42.68 rows=100 width=2) (actual time=0.057..0.127 rows=3 loops=1)
Filter: (category IS NOT NULL)
Rows Removed by Filter: 1
CTE cte
-> Recursive Union (cost=0.29..40.66 rows=101 width=2) (actual time=0.052..0.119 rows=4 loops=1)
-> Limit (cost=0.29..0.33 rows=1 width=2) (actual time=0.051..0.051 rows=1 loops=1)
-> Index Only Scan using tb_category_idx on tb tb_1 (cost=0.29..1363.29 rows=30000 width=2) (actual time=0.050..0.050 rows=1 loops=1)
Index Cond: (category >= 0)
Heap Fetches: 1
-> WorkTable Scan on cte c (cost=0.00..3.83 rows=10 width=2) (actual time=0.015..0.015 rows=1 loops=4)
Filter: (category IS NOT NULL)
Rows Removed by Filter: 0
SubPlan 1
-> Limit (cost=0.29..0.36 rows=1 width=2) (actual time=0.016..0.016 rows=1 loops=3)
-> Index Only Scan using tb_category_idx on tb (cost=0.29..755.95 rows=10000 width=2) (actual time=0.015..0.015 rows=1 loops=3)
Index Cond: (category > c.category)
Heap Fetches: 2
Planning time: 0.224 ms
Execution time: 0.191 ms
(19 rows)
The number of loops for the WorkTable Scan node will equal the number of unique categories you have plus one, so it should stay very fast up to, say, hundreds of unique values.
Another route you can take is to add a table that stores only the unique values of tb.category, and have application logic check that table and insert the value when updating/inserting that column. This can also be done database-side with triggers, as in the sketch below; that solution is also discussed in the answers to the linked question.
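A sketch of the trigger-based variant, assuming PostgreSQL 9.5+ for ON CONFLICT (all names here are hypothetical):
-- Side table holding one row per distinct category.
CREATE TABLE tb_categories (category smallint PRIMARY KEY);
CREATE FUNCTION track_category() RETURNS trigger AS $$
BEGIN
    INSERT INTO tb_categories (category)
    VALUES (NEW.category)
    ON CONFLICT (category) DO NOTHING;  -- keep only distinct values
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tb_track_category
    AFTER INSERT OR UPDATE OF category ON tb
    FOR EACH ROW EXECUTE PROCEDURE track_category();
-- The distinct-category query then becomes a trivial scan:
SELECT category FROM tb_categories;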

PostgreSQL recursive CTE performance issue

I'm trying to understand such a huge difference in performance of two queries.
Let's assume I have two tables.
First one contains A records for some set of domains:
Table "public.dns_a"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(125) | | extended | |
a | inet | | main | |
Indexes:
"dns_a_a_idx" btree (a)
"dns_a_name_idx" btree (name varchar_pattern_ops)
Second table handles CNAME records:
Table "public.dns_cname"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(256) | | extended | |
cname | character varying(256) | | extended | |
Indexes:
"dns_cname_cname_idx" btree (cname varchar_pattern_ops)
"dns_cname_name_idx" btree (name varchar_pattern_ops)
Now I'm trying to solve a "simple" problem: getting all the domains pointing to the same IP address, including via CNAME.
The first attempt, using a recursive CTE, works kind of fine:
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=3051757.20..4337044.86 rows=64264383 width=1064) (actual time=0.037..1697.444 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..3051757.20 rows=64264383 width=45) (actual time=0.036..1697.395 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.035..0.064 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Merge Join (cost=4377.00..176448.06 rows=6426243 width=45) (actual time=498.101..848.648 rows=92 loops=2)
Merge Cond: ((c.cname)::text = (nt.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2)
-> Materialize (cost=4376.44..4474.09 rows=19530 width=516) (actual time=0.039..0.084 rows=187 loops=2)
-> Sort (cost=4376.44..4425.27 rows=19530 width=516) (actual time=0.037..0.053 rows=100 loops=2)
Sort Key: nt.name USING ~<~
Sort Method: quicksort Memory: 33kB
-> WorkTable Scan on names_traverse nt (cost=0.00..390.60 rows=19530 width=516) (actual time=0.001..0.007 rows=100 loops=2)
Planning time: 0.130 ms
Execution time: 1697.477 ms
(15 rows)
There are two loops in the example above, so if I make a simple outer join query, I get much better results:
EXPLAIN ANALYZE
SELECT *
FROM dns_a a
LEFT JOIN dns_cname c1 ON (c1.cname=a.name)
LEFT JOIN dns_cname c2 ON (c2.cname=c1.name)
WHERE a.a='118.145.5.20';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.68..65674.19 rows=1953 width=114) (actual time=1.086..12.992 rows=189 loops=1)
-> Nested Loop Left Join (cost=1.12..46889.57 rows=1953 width=69) (actual time=1.085..2.154 rows=189 loops=1)
-> Index Scan using dns_a_a_idx on dns_a a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.022..0.055 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Index Scan using dns_cname_cname_idx on dns_cname c1 (cost=0.56..19.70 rows=329 width=45) (actual time=0.137..0.148 rows=13 loops=14)
Index Cond: ((cname)::text = (a.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c2 (cost=0.56..6.33 rows=329 width=45) (actual time=0.057..0.057 rows=0 loops=189)
Index Cond: ((cname)::text = (c1.name)::text)
Planning time: 0.452 ms
Execution time: 13.012 ms
(10 rows)
Time: 13.787 ms
So, the performance difference is about 100 times, and that's the thing that worries me.
I like the convenience of recursive CTEs and prefer them to dirty tricks on the application side, but I don't get why the cost of Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2) is so high.
Am I missing something important regarding CTEs, or is the issue with something else?
Thanks!
Update: A friend of mine spotted the row count I missed: Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2). It equals the total number of rows in the table; if I understand correctly, the scan reads the whole index without a condition, and I don't get where the condition was lost.
Result: After applying SET LOCAL enable_mergejoin TO false;, execution time is much, much better:
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=4746432.42..6527720.02 rows=89064380 width=1064) (actual time=0.718..45.656 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..4746432.42 rows=89064380 width=45) (actual time=0.717..45.597 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..74.82 rows=2700 width=24) (actual time=0.716..0.717 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Nested Loop (cost=0.56..296507.00 rows=8906168 width=45) (actual time=11.276..22.418 rows=92 loops=2)
-> WorkTable Scan on names_traverse nt (cost=0.00..540.00 rows=27000 width=516) (actual time=0.000..0.013 rows=100 loops=2)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..7.66 rows=330 width=45) (actual time=0.125..0.225 rows=1 loops=199)
Index Cond: ((cname)::text = (nt.name)::text)
Planning time: 0.253 ms
Execution time: 45.697 ms
(11 rows)
The first query is slow because of the index scan, as you noted.
The plan has to scan the complete index in order to get dns_cname sorted by cname, which is needed for the merge join. A merge join requires that both input tables are sorted by the join key, which can either be done with an index scan over the complete table (as in this case) or by a sequential scan followed by an explicit sort.
You will notice that the planner grossly overestimates all row counts for the CTE evaluation, which is probably the root of the problem. For fewer rows, PostgreSQL might choose a nested loop join which would not have to scan the whole table dns_cname.
That may be fixable or not. One thing that I can see immediately is that the estimate for the initial value '118.145.5.20' is too high by a factor of 139.5, which is pretty bad. You might fix that by running ANALYZE on dns_a, perhaps after increasing the statistics target for the column:
ALTER TABLE dns_a ALTER a SET STATISTICS 1000;
See if that makes a difference.
If that doesn't do the trick, you can manually set enable_mergejoin and enable_hashjoin to off and see whether a plan with a nested loop join is really better or not. If you can get away with changing these parameters for this one statement only (probably with SET LOCAL) and get a better result that way, that is another option you have; see the sketch below.
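Putting both suggestions together, a minimal sketch (the settings are standard Postgres; whether the nested-loop plan actually wins is for you to measure):
ALTER TABLE dns_a ALTER COLUMN a SET STATISTICS 1000;
ANALYZE dns_a;   -- re-collect statistics so the new target takes effect
BEGIN;
SET LOCAL enable_mergejoin TO off;   -- scoped to this transaction only
SET LOCAL enable_hashjoin  TO off;
-- run the recursive CTE from above and compare EXPLAIN ANALYZE output
COMMIT;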