I discovered that a query I am running on an indexed column results in a sequential scan:
mydatabase=> explain analyze SELECT account_id,num_tokens,tok_holdings FROM tokacct WHERE address='00000000000000000';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on tokacct (cost=0.00..6.69 rows=1 width=27) (actual time=0.046..0.046 rows=0 loops=1)
Filter: (address = '00000000000000000'::text)
Rows Removed by Filter: 225
Planning time: 0.108 ms
Execution time: 0.075 ms
(5 rows)
mydatabase=>
However, \di shows that I have a unique index:
mydatabase=> \di
List of relations
Schema | Name | Type | Owner | Table
--------+-------------------------+-------+--------+----------------
......
public | tokacct_address_key | index | mydb | tokacct
.....
My table is defined like this:
CREATE TABLE tokacct (
tx_id BIGINT NOT NULL,
account_id SERIAL PRIMARY KEY,
state_acct_id INT NOT NULL DEFAULT 0,
num_tokens INT DEFAULT 0,
ts_created INT DEFAULT 0,
block_created INT DEFAULT 0,
address TEXT NOT NULL UNIQUE,
tok_holdings TEXT DEFAULT ''
);
As you can see, the address field is declared as UNIQUE, and \di confirms there is an index. So why does it use a sequential scan on the table?
Seq Scan on tokacct (cost=0.00..6.69 rows=1 width=27) (actual time=0.046..0.046 rows=0 loops=1)
Create a one-page table:
db=# create table small as select g, chr(g) from generate_series(1,200) g;
SELECT 200
db=# create index small_i on small(g);
CREATE INDEX
db=# analyze small;
ANALYZE
A seq scan is chosen:
db=# explain (analyze, verbose, buffers) select g from small where g = 200;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on public.small (cost=0.00..3.50 rows=1 width=4) (actual time=0.044..0.045 rows=1 loops=1)
Output: g
Filter: (small.g = 200)
Rows Removed by Filter: 199
Buffers: shared hit=1
Planning time: 1.360 ms
Execution time: 0.066 ms
(7 rows)
Create a three-page table:
db=# drop table small;
DROP TABLE
db=# create table small as select g, chr(g) from generate_series(1,500) g;
SELECT 500
db=# create index small_i on small(g);
CREATE INDEX
db=# analyze small;
ANALYZE
db=# explain (analyze, verbose, buffers) select g from small where g = 200;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Only Scan using small_i on public.small (cost=0.27..8.29 rows=1 width=4) (actual time=3.194..3.195 rows=1 loops=1)
Output: g
Index Cond: (small.g = 200)
Heap Fetches: 1
Buffers: shared hit=1 read=2
Planning time: 0.271 ms
Execution time: 3.747 ms
(7 rows)
Now the table takes three pages and the index two, so the index is cheaper.
How do I know the number of pages? The (verbose, buffers) execution plan says so. And for the table?
db=# select max(ctid) from small;
max
--------
(2,48)
(1 row)
Here 2 means page two (pages count from zero), so the table occupies three pages.
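As a quick sanity check, the page number can also be pulled out of the ctid's text form; a minimal Python sketch (illustrative only, not part of psql):

```python
# Parse the text form of a ctid, "(page,row)", to get the zero-based
# heap page number, as with max(ctid) = (2,48) above.
def ctid_page(ctid_text):
    page, _row = ctid_text.strip("()").split(",")
    return int(page)

max_ctid = "(2,48)"              # value returned by SELECT max(ctid) FROM small
pages = ctid_page(max_ctid) + 1  # pages are counted from zero
print(pages)  # 3
```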
Or, again, from the verbose plan, after forcing a seq scan:
db=# set enable_indexonlyscan to off;
SET
db=# set enable_indexscan to off;
SET
db=# set enable_bitmapscan to off;
SET
db=# explain (analyze, verbose, buffers) select g from small where g = 200;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on public.small (cost=0.00..9.25 rows=1 width=4) (actual time=0.124..0.303 rows=1 loops=1)
Output: g
Filter: (small.g = 200)
Rows Removed by Filter: 499
Buffers: shared hit=3
Planning time: 0.105 ms
Execution time: 0.327 ms
(7 rows)
Here, hit=3 confirms the table occupies three pages.
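For the curious, the seq-scan costs above can be reproduced by hand. A sketch, assuming the default planner settings (seq_page_cost=1.0, cpu_tuple_cost=0.01, cpu_operator_cost=0.0025) and one operator evaluation per row for the "g = 200" filter:

```python
# Reconstruct the planner's seq-scan total cost: one unit per page read,
# plus per-row tuple and operator CPU costs (default settings assumed).
def seq_scan_cost(pages, rows, seq_page_cost=1.0,
                  cpu_tuple_cost=0.01, cpu_operator_cost=0.0025):
    return pages * seq_page_cost + rows * (cpu_tuple_cost + cpu_operator_cost)

print(round(seq_scan_cost(1, 200), 2))  # 3.5  -- the 1-page, 200-row table
print(round(seq_scan_cost(3, 500), 2))  # 9.25 -- the 3-page, 500-row table
```

The 3.50 and 9.25 match the cost=0.00..3.50 and cost=0.00..9.25 totals in the two seq-scan plans above.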
In PostgreSQL (v15.1) I have a table foo created in the following way.
create table foo (
id integer primary key generated by default as identity,
id_mod_7 int generated always as (id % 7) stored
);
create index on foo (id_mod_7, id);
insert into foo (id) select generate_series(1, 10000);
If I query this table with a predicate that doesn't use a literal constant but rather uses a function, a sequential scan is used:
explain analyze
select count(1) from foo where id_mod_7 = extract(dow from current_date);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Aggregate (cost=245.12..245.13 rows=1 width=8) (actual time=7.218..7.219 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..245.00 rows=50 width=0) (actual time=0.020..7.028 rows=1428 loops=1)
Filter: ((id_mod_7)::numeric = EXTRACT(dow FROM CURRENT_DATE))
Rows Removed by Filter: 8572
Planning Time: 0.178 ms
Execution Time: 7.281 ms
However, if I query this table with a predicate that does use a literal constant, an index scan is used:
explain analyze
select count(1) from foo where id_mod_7 = 6;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=48.84..48.85 rows=1 width=8) (actual time=0.321..0.322 rows=1 loops=1)
-> Index Only Scan using foo_id_mod_7_id_idx on foo (cost=0.29..45.27 rows=1428 width=0) (actual time=0.022..0.214 rows=1428 loops=1)
Index Cond: (id_mod_7 = 6)
Heap Fetches: 0
Planning Time: 0.106 ms
Execution Time: 0.397 ms
I thought maybe I could fool it into using the index by exploiting the (alleged?) caching properties of Common Table Expressions (CTEs), but to no avail:
explain analyze
with param as (select extract(dow from current_date) as dow)
select count(1) from foo join param on id_mod_7 = dow;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Aggregate (cost=245.12..245.13 rows=1 width=8) (actual time=5.830..5.831 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..245.00 rows=50 width=0) (actual time=0.025..5.668 rows=1428 loops=1)
Filter: ((id_mod_7)::numeric = EXTRACT(dow FROM CURRENT_DATE))
Rows Removed by Filter: 8572
Planning Time: 0.234 ms
Execution Time: 5.894 ms
It's not fatal, but I'm just trying to understand what's going on here. Thanks!
P.S. and just to avoid confusion, it's not the table column that is being computed within the SQL query. It's the value in the predicate expression that is (or would be) computed within the SQL query.
Like I said, I tried using a CTE because I believed it would be cached or materialized and expected an index scan, but unfortunately I still got a sequential scan.
This is because extract() returns a numeric value, but the column is an integer. You can see this effect in the execution plan: (id_mod_7)::numeric = ... - the column has to be cast to numeric to be compared with the value from extract(). The index stores integer values, not their numeric casts, so it cannot be used for this predicate.
You need to cast the result of extract() to an int:
select count(*)
from foo
where id_mod_7 = extract(dow from current_date)::int
I am currently trying to delete about 1+ million rows in a table (actually 30+ million, but I have made a subset because of problems arising with that as well), where the condition is that the row must not be referenced as a foreign key in any of the other tables. I delete in batches of 30000 rows at a time.
So the query looks like:
DELETE FROM table_name tn WHERE tn.id IN (
SELECT tn2.id FROM table_name as tn2
LEFT JOIN table_name_join_1 ON table_name_join_1.table_id = tn2.id
LEFT JOIN table_name_join_2 ON table_name_join_2.table_id = tn2.id
...
LEFT JOIN table_name_join_19 ON table_name_join_19.table_id = tn2.id
WHERE table_name_join_1.table_id IS NULL
AND table_name_join_2.table_id IS NULL
...
AND table_name_join_19.table_id IS NULL
LIMIT 30000 OFFSET x
)
The table is referenced by 19 different tables, hence the many left joins in the sub-query; counting the total number of rows that will be affected takes 61 seconds without LIMIT & OFFSET.
The problem is that the query just hangs when used in the DELETE statement, but works when just counting with COUNT(1). I am not sure whether there is a better way of deleting a lot of rows in a table, or whether I should examine the tables referencing the table in question and see if some of their indexes are off in some way.
Hope someone can help :D It's quite annoying to see a query work and then hang/fail straight afterwards when used as a sub-query.
I use psycopg2 on Python 2.7.17 (a work thing). I have also wondered when to close the cursor from the psycopg2 connection to speed things up. Currently I create the cursor outside the loop running the deletes and close it along with the DB connection when the script is done; previously the cursor was closed after each commit of a delete statement, but that seemed a bit much to me. The current loop looks like:
cursor = conn.cursor()
while count >= offset:
...
delete(cursor, batch_size, offset)
...
offset += batch_size
Also, is it a bad idea to commit() after each delete statement is executed, or should I wait until the loop has finished executing all the delete statements and then commit? If so, shouldn't I look into using transactions instead?
Basically, I hope someone can tell me why this is so slow or fails, even though a count without LIMIT and OFFSET "only" takes 60 seconds.
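One thing worth checking, independent of the hang itself: batching with an advancing OFFSET while the DELETE removes the selected rows silently skips candidates, because each committed batch shifts the remaining candidate set down. A pure-Python simulation of that pattern (illustrative only, not the actual query):

```python
# Simulate "DELETE ... IN (SELECT ... LIMIT batch OFFSET x)" with x growing
# while the deleted rows vanish from the candidate set.
candidates = list(range(10))   # stand-in for the ids the sub-query returns
batch_size = 3
offset = 0
while offset < 10:
    batch = candidates[offset:offset + batch_size]           # LIMIT/OFFSET
    candidates = [c for c in candidates if c not in batch]   # the DELETE
    offset += batch_size
print(candidates)  # [3, 4, 5, 9] -- these ids were never deleted
```

Since every deleted batch removes itself from the result set, the OFFSET should stay at 0 (or be dropped entirely); the next iteration's LIMIT then naturally sees the next batch of remaining candidates.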
DELETE FROM xxx has almost the same sub-syntax as SELECT COUNT(*) FROM xxx, so just to test the plan you can run the fragment below and check whether you get an indexed plan:
EXPLAIN
SELECT COUNT(*)
FROM table_name tn
WHERE NOT EXISTS ( SELECT *
FROM table_name_join_1 x1 WHERE x1.table_id = tn.id
)
--
AND NOT EXISTS ( SELECT *
FROM table_name_join_2 x2 WHERE x2.table_id = tn.id
)
--
AND NOT EXISTS ( SELECT *
FROM table_name_join_3 x3 WHERE x3.table_id = tn.id
)
--
-- et cetera
--
;
Create some data, since it is hard to benchmark pseudocode:
SELECT version();
CREATE TABLE table_name
( id serial NOT NULL PRIMARY KEY
, name text
);
INSERT INTO table_name ( name )
SELECT 'Name_' || gs::text
FROM generate_series(1,100000) gs;
--
CREATE TABLE table_name_join_2
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_2(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 2 = 0
;
--
CREATE TABLE table_name_join_3
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_3(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 3 = 0
;
--
CREATE TABLE table_name_join_5
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_5(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 5 = 0
;
--
CREATE TABLE table_name_join_7
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_7(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 7 = 0
;
--
CREATE TABLE table_name_join_11
( id serial NOT NULL PRIMARY KEY
, table_id INTEGER REFERENCES table_name(id)
, name text
);
INSERT INTO table_name_join_11(table_id,name)
SELECT src.id , 'Name_' || src.id :: text
FROM table_name src
WHERE src.id % 11 = 0
;
Now, run the DELETE query:
VACUUM ANALYZE table_name;
VACUUM ANALYZE table_name_join_2;
VACUUM ANALYZE table_name_join_3;
VACUUM ANALYZE table_name_join_5;
VACUUM ANALYZE table_name_join_7;
EXPLAIN ANALYZE
DELETE
FROM table_name tn
WHERE 1=1
AND NOT EXISTS ( SELECT * FROM table_name_join_2 x2 WHERE x2.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_3 x3 WHERE x3.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_5 x5 WHERE x5.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_7 x7 WHERE x7.table_id = tn.id)
--
AND NOT EXISTS ( SELECT * FROM table_name_join_11 x11 WHERE x11.table_id = tn.id)
--
-- et cetera
--
;
SELECT count(*) FROM table_name;
Now, exactly the same, but with supporting indexes on the FKs:
CREATE INDEX table_name_join_2_2 ON table_name_join_2( table_id);
CREATE INDEX table_name_join_3_3 ON table_name_join_3( table_id);
CREATE INDEX table_name_join_5_5 ON table_name_join_5( table_id);
CREATE INDEX table_name_join_7_7 ON table_name_join_7( table_id);
CREATE INDEX table_name_join_11_11 ON table_name_join_11( table_id);
VACUUM ANALYZE table_name;
VACUUM ANALYZE table_name_join_2;
VACUUM ANALYZE table_name_join_3;
VACUUM ANALYZE table_name_join_5;
VACUUM ANALYZE table_name_join_7;
EXPLAIN ANALYZE
DELETE
FROM table_name tn
WHERE 1=1
...
;
----------
Query plan #1:
----------
DROP SCHEMA
CREATE SCHEMA
SET
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)
CREATE TABLE
INSERT 0 100000
CREATE TABLE
INSERT 0 50000
CREATE TABLE
INSERT 0 33333
CREATE TABLE
INSERT 0 20000
CREATE TABLE
INSERT 0 14285
CREATE TABLE
INSERT 0 9090
SET
SET
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on table_name tn (cost=3969.52..7651.94 rows=11429 width=36) (actual time=812.010..812.011 rows=0 loops=1)
-> Hash Anti Join (cost=3969.52..7651.94 rows=11429 width=36) (actual time=206.775..712.982 rows=20779 loops=1)
Hash Cond: (tn.id = x7.table_id)
-> Hash Anti Join (cost=3557.10..7088.09 rows=13334 width=34) (actual time=183.070..654.030 rows=24242 loops=1)
Hash Cond: (tn.id = x5.table_id)
-> Hash Anti Join (cost=2979.10..6329.25 rows=16667 width=28) (actual time=149.870..578.173 rows=30303 loops=1)
Hash Cond: (tn.id = x3.table_id)
-> Hash Anti Join (cost=2016.11..5124.59 rows=25000 width=22) (actual time=95.589..461.053 rows=45455 loops=1)
Hash Cond: (tn.id = x2.table_id)
-> Merge Anti Join (cost=572.11..3271.21 rows=50000 width=16) (actual time=14.486..261.955 rows=90910 loops=1)
Merge Cond: (tn.id = x11.table_id)
-> Index Scan using table_name_pkey on table_name tn (cost=0.29..2344.99 rows=100000 width=10) (actual time=0.031..118.968 rows=100000 loops=1)
-> Sort (cost=571.82..589.22 rows=6960 width=10) (actual time=14.446..20.365 rows=9090 loops=1)
Sort Key: x11.table_id
Sort Method: quicksort Memory: 612kB
-> Seq Scan on table_name_join_11 x11 (cost=0.00..127.60 rows=6960 width=10) (actual time=0.029..6.939 rows=9090 loops=1)
-> Hash (cost=819.00..819.00 rows=50000 width=10) (actual time=80.439..80.440 rows=50000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2014kB
-> Seq Scan on table_name_join_2 x2 (cost=0.00..819.00 rows=50000 width=10) (actual time=0.019..36.848 rows=50000 loops=1)
-> Hash (cost=546.33..546.33 rows=33333 width=10) (actual time=53.678..53.678 rows=33333 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1428kB
-> Seq Scan on table_name_join_3 x3 (cost=0.00..546.33 rows=33333 width=10) (actual time=0.027..24.132 rows=33333 loops=1)
-> Hash (cost=328.00..328.00 rows=20000 width=10) (actual time=32.884..32.885 rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 832kB
-> Seq Scan on table_name_join_5 x5 (cost=0.00..328.00 rows=20000 width=10) (actual time=0.017..15.135 rows=20000 loops=1)
-> Hash (cost=233.85..233.85 rows=14285 width=10) (actual time=23.542..23.542 rows=14285 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 567kB
-> Seq Scan on table_name_join_7 x7 (cost=0.00..233.85 rows=14285 width=10) (actual time=0.016..10.742 rows=14285 loops=1)
Planning Time: 4.470 ms
Trigger for constraint table_name_join_2_table_id_fkey: time=172949.350 calls=20779
Trigger for constraint table_name_join_3_table_id_fkey: time=116772.757 calls=20779
Trigger for constraint table_name_join_5_table_id_fkey: time=71218.348 calls=20779
Trigger for constraint table_name_join_7_table_id_fkey: time=51760.503 calls=20779
Trigger for constraint table_name_join_11_table_id_fkey: time=36120.128 calls=20779
Execution Time: 449783.490 ms
(35 rows)
count
-------
79221
(1 row)
Query plan #2:
SET
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
SET
VACUUM
VACUUM
VACUUM
VACUUM
VACUUM
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on table_name tn (cost=1.73..6762.95 rows=11429 width=36) (actual time=776.987..776.988 rows=0 loops=1)
-> Merge Anti Join (cost=1.73..6762.95 rows=11429 width=36) (actual time=0.212..676.794 rows=20779 loops=1)
Merge Cond: (tn.id = x7.table_id)
-> Merge Anti Join (cost=1.44..6322.99 rows=13334 width=34) (actual time=0.191..621.986 rows=24242 loops=1)
Merge Cond: (tn.id = x5.table_id)
-> Merge Anti Join (cost=1.16..5706.94 rows=16667 width=28) (actual time=0.172..550.669 rows=30303 loops=1)
Merge Cond: (tn.id = x3.table_id)
-> Merge Anti Join (cost=0.87..4661.02 rows=25000 width=22) (actual time=0.147..438.036 rows=45455 loops=1)
Merge Cond: (tn.id = x2.table_id)
-> Merge Anti Join (cost=0.58..2938.75 rows=50000 width=16) (actual time=0.125..250.082 rows=90910 loops=1)
Merge Cond: (tn.id = x11.table_id)
-> Index Scan using table_name_pkey on table_name tn (cost=0.29..2344.99 rows=100000 width=10) (actual time=0.031..116.630 rows=100000 loops=1)
-> Index Scan using table_name_join_11_11 on table_name_join_11 x11 (cost=0.29..230.14 rows=9090 width=10) (actual time=0.090..11.228 rows=9090 loops=1)
-> Index Scan using table_name_join_2_2 on table_name_join_2 x2 (cost=0.29..1222.29 rows=50000 width=10) (actual time=0.019..59.500 rows=50000 loops=1)
-> Index Scan using table_name_join_3_3 on table_name_join_3 x3 (cost=0.29..816.78 rows=33333 width=10) (actual time=0.022..40.473 rows=33333 loops=1)
-> Index Scan using table_name_join_5_5 on table_name_join_5 x5 (cost=0.29..491.09 rows=20000 width=10) (actual time=0.016..23.105 rows=20000 loops=1)
-> Index Scan using table_name_join_7_7 on table_name_join_7 x7 (cost=0.29..351.86 rows=14285 width=10) (actual time=0.017..16.903 rows=14285 loops=1)
Planning Time: 4.737 ms
Trigger for constraint table_name_join_2_table_id_fkey: time=1114.497 calls=20779
Trigger for constraint table_name_join_3_table_id_fkey: time=1096.065 calls=20779
Trigger for constraint table_name_join_5_table_id_fkey: time=1094.951 calls=20779
Trigger for constraint table_name_join_7_table_id_fkey: time=1090.509 calls=20779
Trigger for constraint table_name_join_11_table_id_fkey: time=1173.987 calls=20779
Execution Time: 6426.626 ms
(24 rows)
count
-------
79221
(1 row)
So, the query speeds up from 450 seconds to 7 seconds, and most of the time turns out to be spent checking the FK constraints after the actual delete in the base table. (In Postgres these constraints are implemented as internal, invisible triggers.)
Summary table:
query type    | indexes on all 5 FKs | work_mem | total time (ms) | trigger time (ms)
--------------+----------------------+----------+-----------------+------------------
NOT EXISTS()  | No                   | 4MB      |      449783.490 |        448821.083
NOT EXISTS()  | Yes                  | 4MB      |        6426.626 |          5570.009
NOT EXISTS()  | Yes                  | 64kB     |        6405.273 |          5545.352
NOT IN()      | No                   | 4MB      |      449435.530 |        448829.179
NOT IN()      | Yes                  | 4MB      |        6113.690 |          5443.505
NOT IN()      | Yes                  | 64kB     |     8595341.467 |          5545.796
Conclusion: it is up to you to decide whether you want indexes on your foreign keys, but here they make the difference between 450 seconds and about 7.
Postgres version: 9.3
postgresql.conf: all default configuration
I have 2 tables, A and B, both with 1 million rows.
There is a Postgres function that executes every 2 seconds; it updates Table A where the ids are in an array (array size = 20), and then deletes those rows in Table B.
DB function shows as below:
CREATE OR REPLACE FUNCTION test_function (ids NUMERIC[])
RETURNS void AS $$
BEGIN
UPDATE A a
SET status = 'begin', end_time = (NOW() AT TIME ZONE 'UTC')
WHERE a.id = ANY (ids);
DELETE FROM B b
WHERE b.aid = ANY (ids)
AND b.status = 'end';
END;
$$ LANGUAGE plpgsql;
Analysis shows as below:
explain(ANALYZE,BUFFERS,VERBOSE) select test_function('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Result (cost=0.00..0.26 rows=1 width=0) (actual time=14030.435..14030.436 rows=1 loops=1)
Output: test_function('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::numeric[])
Buffers: shared hit=24297 read=26137 dirtied=20
Total runtime: 14030.444 ms
(4 rows)
My questions are:
1. In the production environment, why does this function need up to 7 seconds before it succeeds?
2. While this function is executing, the process eats up to 60% CPU. --> This is the key problem
EDIT:
Analyzing each single SQL statement:
explain(ANALYZE,VERBOSE,BUFFERS) UPDATE A a SET status = 'begin',
end_time = (now()) WHERE a.id = ANY
('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Update on public.A a (cost=0.45..99.31 rows=20 width=143) (actual time=1.206..1.206 rows=0 loops=1)
Buffers: shared hit=206 read=27 dirtied=30
-> Index Scan using A_pkey on public.a a (cost=0.45..99.31 rows=20 width=143) (actual time=0.019..0.116 rows=19 loops=1)
Output: id, start_time, now(), 'begin'::character varying(255), xxxx... ctid
Index Cond: (t.id = ANY('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::integer[]))
Buffers: shared hit=75 read=11
Trigger test_trigger: time=5227.111 calls=1
Total runtime: 5228.357 ms
(8 rows)
explain(ANALYZE,BUFFERS,VERBOSE) DELETE FROM
B b WHERE b.aid = ANY
('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Delete on B b (cost=0.00..1239.11 rows=20 width=6) (actual time=6.013..6.013 rows=0 loops=1)
Buffers: shared hit=448
-> Seq Scan on B b (cost=0.00..1239.11 rows=20 width=6) (actual time=6.011..6.011 rows=0 loops=1)
Output: ctid
Filter: (b.aid = ANY ('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::bigint[]))
Rows Removed by Filter: 21743
Buffers: shared hit=448
Total runtime: 6.029 ms
(8 rows)
CPU usage (screenshots): before calling, and after frequent operations.
A table with a trigram index does not use it if there is mixed case or ILIKE in the query.
I'm not sure what I have missed. Any ideas?
(I'm using PostgreSQL 9.6.2)
CREATE TABLE public.tbltest (
"tbltestId" int NOT NULL,
"mystring1" text,
"mystring2" character varying,
CONSTRAINT "tbltest_pkey" PRIMARY KEY ("tbltestId")
);
insert into tbltest ("tbltestId","mystring1", "mystring2")
select x.id, x.id || ' Test', x.id || ' Test' from generate_series(1,100000) AS x(id);
CREATE EXTENSION pg_trgm;
CREATE INDEX tbltest_idx1 ON tbltest using gin ("mystring1" gin_trgm_ops);
CREATE INDEX tbltest_idx2 ON tbltest using gin ("mystring2" gin_trgm_ops);
Using lower-case text in the query works and uses the index:
explain analyse
select * from tbltest
where "mystring2" Like '%test%';
QUERY PLAN |
-----------------------------------------------------------------------------------------------------------------------------|
Bitmap Heap Scan on tbltest (cost=20.08..56.68 rows=10 width=24) (actual time=29.846..29.846 rows=0 loops=1) |
Recheck Cond: ((mystring2)::text ~~ '%test%'::text) |
Rows Removed by Index Recheck: 100000 |
Heap Blocks: exact=726 |
-> Bitmap Index Scan on tbltest_idx2 (cost=0.00..20.07 rows=10 width=0) (actual time=12.709..12.709 rows=100000 loops=1) |
Index Cond: ((mystring2)::text ~~ '%test%'::text) |
Planning time: 0.086 ms |
Execution time: 29.875 ms |
LIKE does not use the index if I add mixed case to the search:
explain analyse
select * from tbltest
where "mystring2" Like '%Test%';
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------|
Seq Scan on tbltest (cost=0.00..1976.00 rows=99990 width=24) (actual time=0.011..33.376 rows=100000 loops=1) |
Filter: ((mystring2)::text ~~ '%Test%'::text) |
Planning time: 0.083 ms |
Execution time: 51.259 ms |
ILIKE does not use the index either:
explain analyse
select * from tbltest
where "mystring2" ILike '%Test%';
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------|
Seq Scan on tbltest (cost=0.00..1976.00 rows=99990 width=24) (actual time=0.012..87.038 rows=100000 loops=1) |
Filter: ((mystring2)::text ~~* '%Test%'::text) |
Planning time: 0.134 ms |
Execution time: 105.757 ms |
PostgreSQL does not use the index in the last two queries because that is the best way to process the query, not because it cannot use it.
In your EXPLAIN output you can see that the first query returns zero rows (actual ... rows=0), while the other two queries return every single row in the table (actual ... rows=100000).
The PostgreSQL optimizer's estimates reflect that situation accurately.
Since it has to access most of the rows of the table anyway, PostgreSQL knows that it will be able to get the result much cheaper if it scans the table sequentially than by using the more complicated index access method.
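This is easy to see from the test data itself: every generated row contains the literal 'Test', so both patterns match the whole table. A minimal selectivity check (Python, illustrative only):

```python
# The inserts above produce strings like "1 Test", "2 Test", ..., so a
# case-insensitive '%test%' (and '%Test%') matches every single row; an
# index only pays off when it lets the scan skip most of the table.
rows = ["{} Test".format(i) for i in range(1, 100001)]
matching = sum("test" in s.lower() for s in rows)
print(matching)  # 100000 -- all rows match, so a sequential scan is cheapest
```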
Test table and indexes:
CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer);
INSERT INTO t(cb, ci, co)
SELECT ((round(random()*1))::int)::boolean, round(random()*100), round(random()*100)
FROM generate_series(1, 1000000);
CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);
The problem is that I can't force PostgreSQL to use the "right" index. The next query uses the "wrong" index. It's not optimal, because it applies a Filter (condition: cb = TRUE) and so reads more data from memory (and execution takes longer):
explain (analyze, buffers)
SELECT * FROM t WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000
"Limit (cost=0.42..4063.87 rows=1000 width=13) (actual time=0.057..4.405 rows=1000 loops=1)"
" Buffers: shared hit=1960"
" -> Index Scan using wrong on t (cost=0.42..21784.57 rows=5361 width=13) (actual time=0.055..4.256 rows=1000 loops=1)"
" Index Cond: (ci = 46)"
" Filter: cb"
" Rows Removed by Filter: 967"
" Buffers: shared hit=1960"
"Planning time: 0.318 ms"
"Execution time: 4.530 ms"
But when I cast the bool column to int, it works fine. This is unclear to me, because the selectivity of both indexes (right and right_hack) remains the same.
explain (analyze, buffers)
SELECT * FROM t WHERE cb::int = 1 AND ci = 46 ORDER BY co LIMIT 1000
"Limit (cost=0.42..2709.91 rows=1000 width=13) (actual time=0.027..1.484 rows=1000 loops=1)"
" Buffers: shared hit=1003"
" -> Index Scan using right_hack on t (cost=0.42..14525.95 rows=5361 width=13) (actual time=0.025..1.391 rows=1000 loops=1)"
" Index Cond: ((ci = 46) AND ((cb)::integer = 1))"
" Buffers: shared hit=1003"
"Planning time: 0.202 ms"
"Execution time: 1.565 ms"
Are there any limitations of using boolean column inside multicolumn index?
A partial (conditional) index, or two, does seem to work:
CREATE INDEX true_bits ON ttt (ci, co)
WHERE cb = True ;
CREATE INDEX false_bits ON ttt (ci, co)
WHERE cb = False ;
VACUUM ANALYZE ttt;
EXPLAIN (ANALYZE, buffers)
SELECT * FROM ttt
WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000
;
Plan
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.25..779.19 rows=1000 width=13) (actual time=0.024..1.804 rows=1000 loops=1)
Buffers: shared hit=1001
-> Index Scan using true_bits on ttt (cost=0.25..3653.46 rows=4690 width=13) (actual time=0.020..1.570 rows=1000 loops=1)
Index Cond: (ci = 46)
Buffers: shared hit=1001
Planning time: 0.468 ms
Execution time: 1.949 ms
(7 rows)
Still, there is very little gain from indexes on low-cardinality columns. The chance that an index entry can avoid a page read is very small. With a page size of 8 kB and a row size of ~20 bytes, there are ~400 records per page, and there will (almost) always be a true record (and a false record) on any given page, so the page has to be read anyway.
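A back-of-envelope check of that closing claim, assuming cb is true for about half the rows (a sketch, not exact planner math):

```python
# With ~400 rows of ~20 bytes on an 8 kB page and cb true for ~half of
# them, the probability that a page contains *no* true row is (1/2)^400:
# far too small to ever let the index skip a heap page.
rows_per_page = 8192 // 20        # ~400 rows per 8 kB page
p_no_true_row = 0.5 ** rows_per_page
print(p_no_true_row)              # astronomically small (order of 1e-123)
```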