Postgres does not pick partial index even when the clause matches - postgresql

I have a table that looks like so:
Column | Type | Collation | Nullable | Default
-----------+---------+-----------+----------+---------
app_id | uuid | | not null |
entity_id | uuid | | not null |
attr_id | uuid | | not null |
value | text | | not null |
ea_index | boolean | | |
Indexes:
"triples_pkey" PRIMARY KEY, btree (app_id, entity_id, attr_id, value)
"ea_index" UNIQUE, btree (app_id, entity_id, attr_id) WHERE ea_index
"triples_app_id" btree (app_id)
"triples_attr_id" btree (attr_id)
Foreign-key constraints:
"triples_app_id_fkey" FOREIGN KEY (app_id) REFERENCES apps(id) ON DELETE CASCADE
"triples_attr_id_fkey" FOREIGN KEY (attr_id) REFERENCES attrs(id) ON DELETE CASCADE
I have a special partial index, ea_index, covering all the rows where this column is true.
Now, when I run:
EXPLAIN (
SELECT
*
FROM triples
WHERE
app_id = '6b1ca162-0175-4188-9265-849f671d56cc' AND
entity_id = '6b1ca162-0175-4188-9265-849f671d56cc' AND
ea_index
);
I get:
Index Scan using triples_app_id on triples (cost=0.28..4.30 rows=1 width=221)
Index Cond: (app_id = '6b1ca162-0175-4188-9265-849f671d56cc'::uuid)
Filter: (ea_index AND (entity_id = '6b1ca162-0175-4188-9265-849f671d56cc'::uuid))
(3 rows)
I am a bit confused: why is this not using an index scan on ea_index? How could I debug this further?

This turned out to be a costing decision. EXPLAIN showed that the planner expected only 1 row, so it made no difference which index it chose. After changing the UUIDs, it did pick the partial index.
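One way to confirm a costing tie-break like this is to hide the competing index inside a transaction and re-run EXPLAIN (a debugging sketch only; DROP INDEX takes a lock, so don't do this on a busy production table):

```sql
BEGIN;
-- Temporarily hide the index the planner preferred.
DROP INDEX triples_app_id;
EXPLAIN
SELECT *
FROM triples
WHERE
  app_id = '6b1ca162-0175-4188-9265-849f671d56cc' AND
  entity_id = '6b1ca162-0175-4188-9265-849f671d56cc' AND
  ea_index;
-- Undo the drop; nothing is actually changed.
ROLLBACK;
```

If the plan then uses ea_index with a similar cost, the two plans were simply close enough that the choice didn't matter.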

Related

What are the scenarios that cause postgres to do a seq scan instead of an index scan?

I've run into a very strange issue that has come up multiple times. When I create a database and first query against a table, EXPLAIN shows an index scan.
But during development, maybe (I'm not sure) some modifications were made to the table or its indexes. Later I found that the same query no longer uses an index scan.
If I drop the table and rebuild it with the same table and index structure, it starts using an index scan again.
I know there are scenarios where, if the scan would return very many rows, Postgres may use a seq scan as an optimization. But my query can only ever return 0 or 1 rows.
I also know that an index scan can be more costly in some scenarios due to startup cost, but that is clearly not the case here either.
Can anyone give me a clue that I can investigate?
testdb=> \d tabverifies;
Table "public.tabverifies"
Column | Type | Collation | Nullable | Default
--------+----------+-----------+----------+------------------------------------------
vid | integer | | not null | nextval('tabverifies_vid_seq'::regclass)
lid | integer | | not null |
verify | integer | | not null |
secret | text | | not null |
Indexes:
"tabverifies_pkey" PRIMARY KEY, btree (vid)
"tabverifies_lid_verify_key" UNIQUE CONSTRAINT, btree (lid, verify)
Foreign-key constraints:
"tabverifies_lid_fkey" FOREIGN KEY (lid) REFERENCES tablogins(lid)
testdb=> explain select * from tabverifies where vid=1000;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tabverifies (cost=0.00..1.04 rows=1 width=44)
Filter: (vid = 1000)
(2 rows)
testdb=> \d tabverifies;
Table "public.tabverifies"
Column | Type | Collation | Nullable | Default
--------+----------+-----------+----------+------------------------------------------
vid | integer | | not null | nextval('tabverifies_vid_seq'::regclass)
lid | integer | | not null |
verify | integer | | not null |
secret | text | | not null |
Indexes:
"tabverifies_pkey" PRIMARY KEY, btree (vid)
"tabverifies_lid_verify_key" UNIQUE CONSTRAINT, btree (lid, verify)
Foreign-key constraints:
"tabverifies_lid_fkey" FOREIGN KEY (lid) REFERENCES tablogins(lid)
sigserverdb=> explain select * from tabverifies where vid=1;
QUERY PLAN
-------------------------------------------------------------------------------------
Index Scan using tabverifies_pkey on tabverifies (cost=0.15..8.17 rows=1 width=44)
Index Cond: (vid = 1)
(2 rows)
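One clue is in the first plan itself: a total cost of 1.04 means the planner thinks the table fits in a single page, and for such a tiny table a seq scan is cheaper than any index scan. A sketch of things to check, using the table above:

```sql
-- How big does the planner think the table is?
-- relpages near 1 explains the seq scan preference.
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tabverifies';

-- Refresh the statistics the planner relies on.
ANALYZE tabverifies;

-- Verify the index scan is still possible at all by discouraging
-- seq scans for this session only.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM tabverifies WHERE vid = 1000;
RESET enable_seqscan;
```

As the table grows (or after ANALYZE updates stale statistics), the plan can flip back to an index scan on its own.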

Postgres exclusion constraint on insert/update

I have a table defined like so
Table "public.foo"
Column | Type | Collation | Nullable | Default
----------+---------+-----------+----------+-------------------------------------
foo_id | integer | | not null | nextval('foo_foo_id_seq'::regclass)
bar_id | integer | | |
approved | boolean | | |
Indexes:
"foo_pkey" PRIMARY KEY, btree (foo_id)
Foreign-key constraints:
"foo_bar_id_fkey" FOREIGN KEY (bar_id) REFERENCES bar(bar_id)
How would I define an exclusion constraint, such that only one row of foo with a specific bar_id would be able to set approved to true?
For example with the following data:
foo_id | bar_id | approved
--------+--------+----------
1 | 1 | t
2 | 1 |
3 | 2 |
(3 rows)
I would be able to set approved on row 3 to true, because no other row with bar_id 2 has true for approved.
However, updating row 2's approved to true would fail, because row 1 also has bar_id 1 and is already approved.
You don't need an exclusion constraint, a partial unique index will do:
create unique index only_one_approved_bar
on foo (bar_id)
where approved;
I would also recommend defining approved as not null. Boolean columns that allow null values are typically a source of constant confusion.
try this
ALTER TABLE public.foo
ADD CONSTRAINT uniq_approved UNIQUE (bar_id, approved)
Note, however, that a plain unique constraint also rejects two rows with the same bar_id and approved = false, and it cannot take a WHERE clause. A partial unique index avoids both problems:
CREATE UNIQUE INDEX uniq_approved ON public.foo
USING btree (bar_id)
WHERE approved
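A quick sketch of how the partial unique index behaves, assuming the sample data from the question and the only_one_approved_bar index created above:

```sql
-- Succeeds: no approved row with bar_id = 2 exists yet.
UPDATE foo SET approved = true WHERE foo_id = 3;

-- Fails: row 1 is already approved for bar_id = 1.
UPDATE foo SET approved = true WHERE foo_id = 2;
-- ERROR:  duplicate key value violates unique constraint "only_one_approved_bar"
```

Rows with approved null or false are never indexed, so any number of unapproved rows per bar_id remain allowed.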

Why is my count query on index field slow?

I have the following schema:
leadgenie-django=> \d main_lead;
Table "public.main_lead"
Column | Type | Modifiers
-----------------+--------------------------+-----------
id | uuid | not null
body | text | not null
username | character varying(255) | not null
link | character varying(255) | not null
source | character varying(10) | not null
keyword_matches | character varying(255)[] | not null
json | jsonb | not null
created_at | timestamp with time zone | not null
updated_at | timestamp with time zone | not null
campaign_id | uuid | not null
is_accepted | boolean |
is_closed | integer |
raw_body | text |
accepted_at | timestamp with time zone |
closed_at | timestamp with time zone |
score | double precision |
Indexes:
"main_lead_pkey" PRIMARY KEY, btree (id)
"main_lead_campaign_id_75034b1f" btree (campaign_id)
Foreign-key constraints:
"main_lead_campaign_id_75034b1f_fk_main_campaign_id" FOREIGN KEY (campaign_id) REFERENCES main_campaign(id) DEFERRABLE INITIALLY DEFERRED
As you can see, campaign_id is indexed.
When I do a simple WHERE with a campaign_id, the query still takes 16 seconds.
leadgenie-django=> EXPLAIN ANALYZE select count(*) from main_lead where campaign_id = '9a183263-7a60-4ec0-a354-2175f8a2e5c9';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=202866.79..202866.80 rows=1 width=8) (actual time=16715.762..16715.763 rows=1 loops=1)
-> Seq Scan on main_lead (cost=0.00..202189.94 rows=270739 width=0) (actual time=1143.886..16516.490 rows=279405 loops=1)
Filter: (campaign_id = '9a183263-7a60-4ec0-a354-2175f8a2e5c9'::uuid)
Rows Removed by Filter: 857300
Planning time: 0.080 ms
Execution time: 16715.807 ms
I would have expected this query to be fast (under 1s), since this field is indexed. Is there a reason my expectation is wrong? Anything I could do to speed it up?
The query fetches about 25% of your table, so PostgreSQL thinks that this is most cheaply done with a sequential scan of the whole table. That is probably correct.
Try running
VACUUM main_lead;
That will update the visibility map, and if there are no long-running concurrent transactions, it should mark most of the table's blocks as all-visible, so the query can use a faster index-only scan.
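A sketch of how to check whether the visibility map covers enough of the table afterwards (relallvisible in pg_class counts the pages marked all-visible):

```sql
VACUUM main_lead;

-- relallvisible close to relpages means most blocks can be answered
-- from the index alone, without heap fetches.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'main_lead';

-- Re-check the plan: ideally it now shows an Index Only Scan
-- on main_lead_campaign_id_75034b1f with few heap fetches.
EXPLAIN ANALYZE
SELECT count(*) FROM main_lead
WHERE campaign_id = '9a183263-7a60-4ec0-a354-2175f8a2e5c9';
```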

Query planner using a primary key index instead of a more targeted column index when adding order by primary key

Please excuse the simplification of the actual query; it's just to make it readable. We are currently seeing slowdowns in our queries when adding an ORDER BY on the primary key.
select id, field1, field2
from table1
where field1 = 'value'
limit 1000;
Since there is an index on field1, this query uses that index, which makes it fast; I can confirm via EXPLAIN that the planner uses it.
Adding an ORDER BY, however, switches the plan to the primary key index, which makes the query a lot slower.
select id, field1, field2
from table1 where field1 = 'value'
order by id asc
limit 1000;
Is there a way to force the query planner to use the field1 index?
EDIT:
Actual table detail:
\d fax_message
Table "public.fax_message"
Column | Type | Modifiers
--------------------------+-----------------------------+-----------
id | bigint | not null
broadcast_ref | character varying(255) |
busy_retries | integer |
cli | character varying(255) |
dncr | boolean | not null
document_set_id | bigint | not null
fax_broadcast_attempt_id | bigint |
fps | boolean | not null
header_format | character varying(255) |
last_updated | timestamp without time zone | not null
max_fax_pages | integer |
message_ref | character varying(255) |
must_be_sent_before_date | timestamp without time zone |
request_id | bigint |
resolution | character varying(255) |
retries | integer |
send_from | character varying(255) |
send_ref | character varying(255) |
send_to | character varying(255) | not null
smartblock | boolean | not null
status | character varying(255) | not null
time_zone | character varying(255) |
total_pages | integer |
user_id | uuid | not null
delay_status_check_until | timestamp without time zone |
version | bigint | default 0
cost | numeric(40,10) | default 0
Indexes:
"fax_message_pkey" PRIMARY KEY, btree (id)
"fax_message_broadcast_ref_idx" btree (broadcast_ref)
"fax_message_delay_status_check_until" btree (delay_status_check_until)
"fax_message_document_set_idx" btree (document_set_id)
"fax_message_fax_broadcast_attempt_idx" btree (fax_broadcast_attempt_id)
"fax_message_fax_document_set_idx" btree (document_set_id)
"fax_message_message_ref_idx" btree (message_ref)
"fax_message_request_idx" btree (request_id)
"fax_message_send_ref_idx" btree (send_ref)
"fax_message_status_fax_broadcast_attempt_idx" btree (status, fax_broadcast_attempt_id)
"fax_message_user" btree (user_id)
Foreign-key constraints:
"fk2881c4e5106ed2de" FOREIGN KEY (request_id) REFERENCES core_api_send_fax_request(id)
"fk2881c4e5246f3088" FOREIGN KEY (document_set_id) REFERENCES fax_document_set(id)
"fk2881c4e555aad98b" FOREIGN KEY (user_id) REFERENCES users(id)
"fk2881c4e59920b254" FOREIGN KEY (fax_broadcast_attempt_id) REFERENCES fax_broadcast_attempt(id)
Referenced by:
TABLE "fax_message_status_modifier" CONSTRAINT "fk2dfbe52acb955ec1" FOREIGN KEY (fax_message_id) REFERENCES fax_message(id)
TABLE "fax_message_attempt" CONSTRAINT "fk82058973cb955ec1" FOREIGN KEY (fax_message_id) REFERENCES fax_message(id)
Actual index used:
\d fax_message_status_fax_broadcast_attempt_idx
Index "public.fax_message_status_fax_broadcast_attempt_idx"
Column | Type | Definition
--------------------------+------------------------+--------------------------
status | character varying(255) | status
fax_broadcast_attempt_id | bigint | fax_broadcast_attempt_id
btree, for table "public.fax_message"
Real queries:
With order by:
explain select this_.id as id65_0_, this_.version as version65_0_, this_.broadcast_ref as broadcast3_65_0_, this_.busy_retries as busy4_65_0_, this_.cli as cli65_0_, this_.cost as cost65_0_, this_.delay_status_check_until as delay7_5_0_, this_.dncr as dncr65_0_, this_.document_set_id as document9_65_0_, this_.fax_broadcast_attempt_id as fax10_65_0_, this_.fps as fps65_0_, this_.header_format as header12_65_0_, this_.last_updated as last13_65_0_, this_.max_fax_pages as max14_65_0_, this_.message_ref as message15_65_0_, this_.must_be_sent_before_date as must16_65_0_, this_.request_id as request17_65_0_, this_.resolution as resolution65_0_, this_.retries as retries65_0_, this_.send_from as send20_65_0_, this_.send_ref as send21_65_0_, this_.send_to as send22_65_0_, this_.smartblock as smartblock65_0_, this_.status as status65_0_, this_.time_zone as time25_65_0_, this_.total_pages as total26_65_0_, this_.user_id as user27_65_0_ from fax_message this_ where this_.status='TO_CHARGE_GROUP' order by id asc limit 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..53956.06 rows=1000 width=2234)
-> Index Scan using fax_message_pkey on fax_message this_ (cost=0.43..2601902.61 rows=48223 width=2234)
Filter: ((status)::text = 'TO_CHARGE_GROUP'::text)
(3 rows)
This one without the order by:
explain select this_.id as id65_0_, this_.version as version65_0_, this_.broadcast_ref as broadcast3_65_0_, this_.busy_retries as busy4_65_0_, this_.cli as cli65_0_, this_.cost as cost65_0_, this_.delay_status_check_until as delay7_5_0_, this_.dncr as dncr65_0_, this_.document_set_id as document9_65_0_, this_.fax_broadcast_attempt_id as fax10_65_0_, this_.fps as fps65_0_, this_.header_format as header12_65_0_, this_.last_updated as last13_65_0_, this_.max_fax_pages as max14_65_0_, this_.message_ref as message15_65_0_, this_.must_be_sent_before_date as must16_65_0_, this_.request_id as request17_65_0_, this_.resolution as resolution65_0_, this_.retries as retries65_0_, this_.send_from as send20_65_0_, this_.send_ref as send21_65_0_, this_.send_to as send22_65_0_, this_.smartblock as smartblock65_0_, this_.status as status65_0_, this_.time_zone as time25_65_0_, this_.total_pages as total26_65_0_, this_.user_id as user27_65_0_ from fax_message this_ where this_.status='TO_CHARGE_GROUP' limit 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..1744.13 rows=1000 width=2234)
-> Index Scan using fax_message_status_fax_broadcast_attempt_idx on fax_message this_ (cost=0.56..84080.59 rows=48223 width=2234)
Index Cond: ((status)::text = 'TO_CHARGE_GROUP'::text)
(3 rows)
The total cost of the plan that uses fax_message_pkey is far greater than the total cost of the plan that uses fax_message_status_fax_broadcast_attempt_idx.
I was hoping the query would still use the fax_message_status_fax_broadcast_attempt_idx index even with the ORDER BY there.
According to How do I force Postgres to use a particular index? (and the links from answers there) there does not seem to be a way to force the use of a particular index.
CTEs are an optimization fence (in PostgreSQL versions before 12, or when declared MATERIALIZED). You're not giving us enough information to tell why your query is being planned badly, but this should work if you don't want to fix the underlying problem. Note that it is not semantically identical to your original query: the inner LIMIT picks an arbitrary 1000 matching rows before sorting, rather than the 1000 smallest ids.
WITH t AS (
select id, field1, field2
from table1
where field1 = 'value'
limit 1000
)
SELECT *
FROM t
order by id asc;
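Another common workaround (a sketch, using the simplified names from the question) is to wrap the ORDER BY column in a trivial expression. The primary key index can then no longer supply the ordering, so the planner goes back to the field1 index and sorts the matching rows explicitly, and unlike the CTE trick this keeps the original semantics:

```sql
SELECT id, field1, field2
FROM table1
WHERE field1 = 'value'
ORDER BY id + 0 ASC   -- expression defeats the pkey index for ordering
LIMIT 1000;
```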

Is it possible to create a multitype multicolumn index?

If a table has a column A of geometry type and a column B of timestamp(1) without time zone type,
does PostgreSQL allow creating a multitype multicolumn index on columns A and B?
The index columns: (column A, column B)
I want a GiST index on the column A part and a btree index on the column B part.
The following is my case; I want to optimize this SQL:
SELECT id,content,the_geo,lon,lat,skyid,addtime FROM mapfriends.user_map_book
where the_geo && mapfriends.ST_BUFFER(mapfriends.geometryfromtext('POINT(100.54687 36.06684)'),0.001)
order by addtime desc limit 30
index of the table
db_lbs=> \d mapfriends.user_map_book
Table "mapfriends.user_map_book"
Column | Type | Modifiers
--------------+--------------------------------+-----------------------------------------------------------------------
id | integer | not null default nextval('mapfriends.user_map_book_id_seq'::regclass)
content | character varying(100) |
lon | double precision |
lat | double precision |
skyid | integer |
addtime | timestamp(1) without time zone | default now()
the_geo | mapfriends.geometry |
viewcount | integer | default 0
lastreadtime | timestamp without time zone |
ischeck | boolean |
Indexes:
"user_map_book_pkey" PRIMARY KEY, btree (id)
"idx_map_book_skyid" btree (skyid, addtime)
"idx_user_map_book_atime" btree (addtime DESC)
"user_map_book_idx_gin" gist ((the_geo::box))
From the manual:
Currently, only the B-tree, GiST and GIN index types support
multicolumn indexes.
Just give it a try!
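For mixing a geometry column with a timestamp in one GiST index, the btree_gist extension provides GiST operator classes for ordinary scalar types such as timestamps, so a combined index becomes possible (a sketch, assuming the geometry type has a GiST operator class, as PostGIS provides, and a hypothetical index name):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- One multicolumn GiST index covering both the spatial and the time column.
CREATE INDEX idx_user_map_book_geo_time
ON mapfriends.user_map_book
USING gist (the_geo, addtime);
```

This can help the && filter; whether it also helps the ORDER BY addtime DESC LIMIT 30 depends on the plan, since GiST does not return rows in btree order.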