Do Postgres multicolumn indexes get used for OR queries? - postgresql

This seems like a straightforward question, but I can't find the answer online.
I'm using Postgres 9.4 and have this table:
Table "public.title"
Column | Type | Collation | Nullable | Default
---------------------------------+-------------------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('title_id_seq'::regclass)
name1 | character varying(1000) | | |
name2 | character varying(1000) | | |
name3 | character varying(1000) | | |
name4 | character varying(1000) | | |
And I have a multicolumn index:
"idx_title_names" btree (name1, name2, name3, name4)
But for OR queries, the index isn't being used:
EXPLAIN ANALYZE SELECT * FROM "title" WHERE ("title"."name1" = 'foo'
OR "title"."name3" = 'foo' OR "title"."name3" = 'foo' OR "title"."name4" = 'foo');
Gather (cost=1000.00..436451.46 rows=659 width=4500) (actual time=561.418..1297.877 rows=3222 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on title (cost=0.00..435385.56 rows=275 width=4500) (actual time=551.627..1286.724 rows=1074 loops=3)
Filter: (((name1)::text = 'foo'::text) OR ((name2)::text = 'foo'::text) OR ((name3)::text = 'foo'::text) OR ((name4)::text = 'foo'::text))
Rows Removed by Filter: 1231911
Planning Time: 0.102 ms
Execution Time: 1298.148 ms
Is this because these indexes don't work with OR queries?
And: if so, is my best bet just to create 4 separate standard indexes?

One option is to create a GIN index on the array of the columns, then use an array operator:
create index on title using gin (array[name1,name2,name3,name4]);
Then use
SELECT *
FROM title
WHERE array[name1,name2,name3,name4] #> array['foo'];
Note that a GIN index is a bit more expensive to maintain than a BTree index.

OR is often a performance problem in SQL.
This index cannot be used for a condition like that.
Your best bet is to create four single-column indexes and hope for a Bitmap Or:
CREATE INDEX ON public.title (name1);
CREATE INDEX ON public.title (name2);
CREATE INDEX ON public.title (name3);
CREATE INDEX ON public.title (name4);

Having index on (col1, col2, col3, etc) it will be used for conditions/ordering on col1, or col1 and col2, or col1, col2 and col3 etc. It will be not used for conditions/ordering only on col3 for example.
Look at this:
# create table t as select random() as a, random() as b from generate_series(1,1000000);
# create index i on t(a,b);
# analyze t;
# explain analyze select * from t where a > 0.9;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=2246.83..8863.15 rows=96826 width=16) (actual time=10.973..28.023 rows=99311 loops=1)
Recheck Cond: (a > '0.9'::double precision)
Heap Blocks: exact=5406
-> Bitmap Index Scan on i (cost=0.00..2222.62 rows=96826 width=0) (actual time=10.251..10.252 rows=99311 loops=1)
Index Cond: (a > '0.9'::double precision)
Planning Time: 0.348 ms
Execution Time: 31.054 ms
# explain analyze select * from t where b > 0.9;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..17906.00 rows=99117 width=16) (actual time=0.015..70.505 rows=100137 loops=1)
Filter: (b > '0.9'::double precision)
Rows Removed by Filter: 899863
Planning Time: 0.090 ms
Execution Time: 73.656 ms
However when you are using or condition the DBMS actually should to perform several queries, for our example select * from t where a > 0.9 or b > 0.9 is equal to select * from t where a > 0.9 (index could be used) and select * from t where b > 0.9 (index could not be used) thus instead of two actions (scan index then scan whole table) DBMS performs only one action (scan whole table)
Hope it explains why your index is not used for your query.

Related

Postgres underestimating the number of rows leading to bad query plan

I have a query which is taking 2.5 seconds to run. On checking the query plan, I got to know that postgres is heavily underestimating the number of rows leading to nested loops.
Following is the query
explain analyze
SELECT
reprocessed_videos.video_id AS reprocessed_videos_video_id
FROM
reprocessed_videos
JOIN commit_info ON commit_info.id = reprocessed_videos.commit_id
WHERE
commit_info.tag = 'stop_sign_tbc_inertial_fix'
AND reprocessed_videos.reprocess_type_id = 28
AND reprocessed_videos.classification_crop_type_id = 0
AND reprocessed_videos.reprocess_status = 'success';
Following is the explain analyze output.
Nested Loop (cost=0.84..22941.18 rows=1120 width=4) (actual time=31.169..2650.181 rows=179524 loops=1)
-> Index Scan using commit_info_tag_key on commit_info (cost=0.28..8.29 rows=1 width=4) (actual time=0.395..0.397 rows=1 loops=1)
Index Cond: ((tag)::text = 'stop_sign_tbc_inertial_fix'::text)
-> Index Scan using ix_reprocessed_videos_commit_id on reprocessed_videos (cost=0.56..22919.99 rows=1289 width=8) (actual time=30.770..2634.546 rows=179524 loops=1)
Index Cond: (commit_id = commit_info.id)
Filter: ((reprocess_type_id = 28) AND (classification_crop_type_id = 0) AND ((reprocess_status)::text = 'success'::text))
Rows Removed by Filter: 1190
Planning Time: 0.326 ms
Execution Time: 2657.724 ms
As we can see index scan using ix_reprocessed_videos_commit_id anticipated 1289 rows, whereas there were 179524 rows. I have trying to find the reason for this but have been unsuccessful in whatever I tried.
Following are the things I tried.
Vacuum and analyzing all the involved tables. (helped a little but not much maybe because the tables were automatically vacuumed and analyzed)
Increasing the statistics count for commit_id column alter table reprocessed_videos alter column commit_id set statistics 1000; (helped a little)
I read about extended statistics, but not sure if they are of any use here.
Following are the number of tuples in each of these tables
kpis=> SELECT relname, reltuples FROM pg_class where relname in ('reprocessed_videos', 'video_catalog', 'commit_info');
relname | reltuples
--------------------+---------------
commit_info | 1439
reprocessed_videos | 3.1563756e+07
Following is some information related to table schemas
Table "public.reprocessed_videos"
Column | Type | Collation | Nullable | Default
-----------------------------+-----------------------------+-----------+----------+------------------------------------------------
id | integer | | not null | nextval('reprocessed_videos_id_seq'::regclass)
video_id | integer | | |
reprocess_status | character varying | | |
commit_id | integer | | |
reprocess_type_id | integer | | |
classification_crop_type_id | integer | | |
Indexes:
"reprocessed_videos_pkey" PRIMARY KEY, btree (id)
"ix_reprocessed_videos_commit_id" btree (commit_id)
"ix_reprocessed_videos_video_id" btree (video_id)
"reprocessed_videos_video_commit_reprocess_crop_key" UNIQUE CONSTRAINT, btree (video_id, commit_id, reprocess_type_id, classification_crop_type_id)
Foreign-key constraints:
"reprocessed_videos_commit_id_fkey" FOREIGN KEY (commit_id) REFERENCES commit_info(id)
Table "public.commit_info"
Column | Type | Collation | Nullable | Default
------------------------+-------------------+-----------+----------+-----------------------------------------
id | integer | | not null | nextval('commit_info_id_seq'::regclass)
tag | character varying | | |
commit | character varying | | |
Indexes:
"commit_info_pkey" PRIMARY KEY, btree (id)
"commit_info_tag_key" UNIQUE CONSTRAINT, btree (tag)
I am sure that postgres should not use nested loops in this case, but is using them because of bad row estimates. Any help is highly appreciated.
Following are the experiments I tried.
Disabling index scan
Nested Loop (cost=734.59..84368.70 rows=1120 width=4) (actual time=274.694..934.965 rows=179524 loops=1)
-> Bitmap Heap Scan on commit_info (cost=4.29..8.30 rows=1 width=4) (actual time=0.441..0.444 rows=1 loops=1)
Recheck Cond: ((tag)::text = 'stop_sign_tbc_inertial_fix'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on commit_info_tag_key (cost=0.00..4.29 rows=1 width=0) (actual time=0.437..0.439 rows=1 loops=1)
Index Cond: ((tag)::text = 'stop_sign_tbc_inertial_fix'::text)
-> Bitmap Heap Scan on reprocessed_videos (cost=730.30..84347.51 rows=1289 width=8) (actual time=274.250..920.137 rows=179524 loops=1)
Recheck Cond: (commit_id = commit_info.id)
Filter: ((reprocess_type_id = 28) AND (classification_crop_type_id = 0) AND ((reprocess_status)::text = 'success'::text))
Rows Removed by Filter: 1190
Heap Blocks: exact=5881
-> Bitmap Index Scan on ix_reprocessed_videos_commit_id (cost=0.00..729.98 rows=25256 width=0) (actual time=273.534..273.534 rows=180714 loops=1)
Index Cond: (commit_id = commit_info.id)
Planning Time: 0.413 ms
Execution Time: 941.874 ms
I also set updated the statistics for the commit_id column. I observe a approximately 3x speed increase.
On trying to disable bitmapscan, the query does a sequential scan and takes 19 seconds to run
The nested loop is the perfect join strategy, because there is only one row from commit_info. Any other join strategy would lose.
The question is if the index scan on reprocessed_videos is really too slow. To experiment, try again after SET enable_indexscan = off; to get a bitmap index scan and see if that is better. Then also SET enable_bitmapscan = off; to get a sequential scan. I suspect that your current plan will win, but the bitmap index scan has a good chance.
If the bitmap index scan is better, you should indeed try to improve the estimate:
ALTER TABLE reprocessed_videos ALTER commit_id SET STATISTICS 1000;
ANALYZE reprocessed_videos;
You can try with other values; pick the lowest that gives you a good enough estimate.
Another thing to try are extended statistics:
CREATE STATISTICS corr (dependencies)
ON (reprocess_type_id, classification_crop_type_id, reprocess_status)
FROM reprocessed_videos;
ANALYZE reprocessed_videos;
Perhaps you don't need even all three columns in there; play with it.
If the bitmap index scan does not offer enough benefit, there is one way how you can speed up the current index scan:
CLUSTER reprocessed_videos USING ix_reprocessed_videos_commit_id;
That rewrites the table in index order (and blocks concurrent access while it is running, so be careful!). After that, the index scan could be considerably faster. However, the order is not maintained, so you'll have to repeat the CLUSTER occasionally if enough of the table has been modified.
Create a covering index; one that has all the condition columns (first, in descending order of cardinality) and the value columns (last) needed for you query, which means the index alone can be used - avoiding accessing the table:
create index covering_index on reprocessed_videos(
reprocess_type_id,
classification_crop_type_id,
reprocess_status,
commit_id,
video_id
);
And ensure there's one on commit_info(id) too - indexes are not automatically defined in postgres, even for primary keys:
create index commit_info__id on commit_info(id);
To get more accurate query plans, you can manually set the cardinality of condition columns, for example:
select count(distinct reprocess_type_id) from reprocessed_videos;
Then set that value to the column:
alter table reprocessed_videos alter column reprocess_type_id set (n_distinct = number_from_above_query)

In PostgreSQL what does hashed subplan mean?

I want to know how the optimizer rewrote the query and how to read the execution plan in PostgreSQL
Here is the sample code.
DROP TABLE ords;
CREATE TABLE ords (
ORD_ID INT NOT NULL,
ORD_PROD_ID VARCHAR(2) NOT NULL,
ETC_CONTENT VARCHAR(100));
ALTER TABLE ords ADD CONSTRAINT ords_PK PRIMARY KEY(ORD_ID);
CREATE INDEX ords_X01 ON ords(ORD_PROD_ID);
INSERT INTO ords
SELECT i
,chr(64+case when i <= 10 then i else 26 end)
,rpad('x',100,'x')
FROM generate_series(1,10000) a(i);
SELECT COUNT(*) FROM ords WHERE ORD_PROD_ID IN ('A','B','C');
DROP TABLE delivery;
CREATE TABLE delivery (
ORD_ID INT NOT NULL,
VEHICLE_ID VARCHAR(2) NOT NULL,
ETC_REMARKS VARCHAR(100));
ALTER TABLE delivery ADD CONSTRAINT delivery_PK primary key (ORD_ID, VEHICLE_ID);
CREATE INDEX delivery_X01 ON delivery(VEHICLE_ID);
INSERT INTO delivery
SELECT i
, chr(88 + case when i <= 10 then mod(i,2) else 2 end)
, rpad('x',100,'x')
FROM generate_series(1,10000) a(i);
analyze ords;
analyze delivery;
This is the SQL I am interested in.
SELECT *
FROM ords a
WHERE ( EXISTS (SELECT 1
FROM delivery b
WHERE a.ORD_ID = b.ORD_ID
AND b.VEHICLE_ID IN ('X','Y')
)
OR a.ORD_PROD_ID IN ('A','B','C')
);
Here is the execution plan
| Seq Scan on portal.ords a (actual time=0.038..2.027 rows=10 loops=1) |
| Output: a.ord_id, a.ord_prod_id, a.etc_content |
| Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR ((a.ord_prod_id)::text = ANY ('{A,B,C}'::text[]))) |
| Rows Removed by Filter: 9990 |
| Buffers: shared hit=181 |
| SubPlan 1 |
| -> Index Only Scan using delivery_pk on portal.delivery b (never executed) |
| Index Cond: (b.ord_id = a.ord_id) |
| Filter: ((b.vehicle_id)::text = ANY ('{X,Y}'::text[])) |
| Heap Fetches: 0 |
| SubPlan 2 |
| -> Index Scan using delivery_x01 on portal.delivery b_1 (actual time=0.023..0.025 rows=10 loops=1) |
| Output: b_1.ord_id |
| Index Cond: ((b_1.vehicle_id)::text = ANY ('{X,Y}'::text[])) |
| Buffers: shared hit=8 |
| Planning: |
| Buffers: shared hit=78 |
| Planning Time: 0.302 ms |
| Execution Time: 2.121 ms
I don't know how the optimizer transformed the SQL.
What is the final SQL the optimizer rewrote?
I have only one EXISTS sub-query in the SQL above, why are there two sub-plans?
What does "hashed Sub-Plan 2" mean?
I would appreciate it if anyone share a little knowledge with me.
You have the misconception that the optimizer rewrites the SQL statement. That is not the case. Rewriting the query is the job of the query rewriter, which for example replaces views with their definition. The optimizer comes up with a sequence of execution steps to compute the result. It produces a plan, not an SQL statement.
The optimizer plans two alternatives: either execute subplan 1 for each row found, or execute subplan 2 once (note that it is independent of a), build a hash table from the result and probe that hash for each row found in a.
At execution time, PostgreSQL decides to use the latter strategy, that is why subplan 1 is never executed.

Why is a MAX query with an equality filter on one other column so slow in Postgresql?

I'm running into an issue in PostgreSQL (version 9.6.10) with indexes not working to speed up a MAX query with a simple equality filter on another column. Logically it seems that a simple multicolumn index on (A, B DESC) should make the query super fast.
I can't for the life of me figure out why I can't get a query to be performant regardless of what indexes are defined.
The table definition has the following:
- A primary key foo VARCHAR PRIMARY KEY (not used in the query)
- A UUID field that is NOT NULL called bar UUID
- A sequential_id column that was created as a BIGSERIAL UNIQUE type
Here's what the relevant columns look like exactly (with names modified for privacy):
Table "public.foo"
Column | Type | Modifiers
----------------------+--------------------------+--------------------------------------------------------------------------------
foo_uid | character varying | not null
bar_uid | uuid | not null
sequential_id | bigint | not null default nextval('foo_sequential_id_seq'::regclass)
Indexes:
"foo_pkey" PRIMARY KEY, btree (foo_uid)
"foo_bar_uid_sequential_id_idx", btree (bar_uid, sequential_id DESC)
"foo_sequential_id_key" UNIQUE CONSTRAINT, btree (sequential_id)
Despite having the index listed above on (bar_uid, sequential_id DESC), the following query requires an index scan and takes 100-300ms with a few million rows in the database.
The Query (get the max sequential_id for a given bar_uid):
SELECT MAX(sequential_id)
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f';
The EXPLAIN ANALYZE result doesn't use the proper index. Also, for some reason it checks if sequential_id IS NOT NULL even though it's declared as not null.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.75..0.76 rows=1 width=8) (actual time=321.110..321.110 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..0.75 rows=1 width=8) (actual time=321.106..321.106 rows=1 loops=1)
-> Index Scan Backward using foo_sequential_id_key on foo (cost=0.43..98936.43 rows=308401 width=8) (actual time=321.106..321.106 rows=1 loops=1)
Index Cond: (sequential_id IS NOT NULL)
Filter: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Rows Removed by Filter: 920761
Planning time: 0.196 ms
Execution time: 321.127 ms
(9 rows)
I can add a seemingly unnecessary GROUP BY to this query, and that speeds it up a bit, but it's still really slow for a query that should be near instantaneous with indexes defined:
SELECT MAX(sequential_id)
FROM foo
WHERE bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'
GROUP BY bar_uid;
The EXPLAIN (ANALYZE, BUFFERS) result:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=8510.54..65953.61 rows=6 width=24) (actual time=234.529..234.530 rows=1 loops=1)
Group Key: bar_uid
Buffers: shared hit=1 read=11909
-> Bitmap Heap Scan on foo (cost=8510.54..64411.55 rows=308401 width=24) (actual time=65.259..201.969 rows=309023 loops=1)
Recheck Cond: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Heap Blocks: exact=10385
Buffers: shared hit=1 read=11909
-> Bitmap Index Scan on foo_bar_uid_sequential_id_idx (cost=0.00..8433.43 rows=308401 width=0) (actual time=63.549..63.549 rows=309023 loops=1)
Index Cond: (bar_uid = 'fa61424d-389f-4e75-ba2d-b77e6bb8491f'::uuid)
Buffers: shared read=1525
Planning time: 3.067 ms
Execution time: 234.589 ms
(12 rows)
Does anyone have any idea what's blocking this query from being on the order of 10 milliseconds? This should logically be instantaneous with the right index defined. It should only require the time to follow links to the leaf value in the B-Tree.
Someone asked:
What do you get for SELECT * FROM pg_stats WHERE tablename = 'foo' and attname = 'bar_uid';?
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+------------------------+-------------+-----------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------+------------------+-------------+-------------------+------------------------+----------------------
public | foo | bar_uir | f | 0 | 16 | 6 | {fa61424d-389f-4e75-ba2d-b77e6bb8491f,5c5dcae9-1b7e-4413-99a1-62fde2b89c32,50b1e842-fc32-4c2c-b00f-4a17c3c1c5fa,7ff1999c-c0ea-b700-343f-9a737f6ad659,f667b353-e199-4890-9ffd-4940ea11fe2c,b24ce968-29fd-4587-ba1f-227036ee3135} | {0.203733,0.203167,0.201567,0.195867,0.1952,0.000466667} | | -0.158093 | | |
(1 row)

Postgresql - table with trigram gin index with mixed case or ILike does not use the index

A table with trigram index, does not work if there is mixed case or ILike in the query.
Im not sure what I have missed. Any ideas?
(Im using PostgreSQL 9.6.2)
CREATE TABLE public.tbltest (
"tbltestId" int NOT null ,
"mystring1" text,
"mystring2" character varying,
CONSTRAINT "tbltest_pkey" PRIMARY KEY ("tbltestId")
);
insert into tbltest ("tbltestId","mystring1", "mystring2")
select x.id, x.id || ' Test', x.id || ' Test' from generate_series(1,100000) AS x(id);
CREATE EXTENSION pg_trgm;
CREATE INDEX tbltest_idx1 ON tbltest using gin ("mystring1" gin_trgm_ops);
CREATE INDEX tbltest_idx2 ON tbltest using gin ("mystring2" gin_trgm_ops);
Using lower case text in the query works, and uses the index
explain analyse
select * from tbltest
where "mystring2" Like '%test%';
QUERY PLAN |
-----------------------------------------------------------------------------------------------------------------------------|
Bitmap Heap Scan on tbltest (cost=20.08..56.68 rows=10 width=24) (actual time=29.846..29.846 rows=0 loops=1) |
Recheck Cond: ((mystring2)::text ~~ '%test%'::text) |
Rows Removed by Index Recheck: 100000 |
Heap Blocks: exact=726 |
-> Bitmap Index Scan on tbltest_idx2 (cost=0.00..20.07 rows=10 width=0) (actual time=12.709..12.709 rows=100000 loops=1) |
Index Cond: ((mystring2)::text ~~ '%test%'::text) |
Planning time: 0.086 ms |
Execution time: 29.875 ms |
Like does not use the index if I add mixed case in the search
explain analyse
select * from tbltest
where "mystring2" Like '%Test%';
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------|
Seq Scan on tbltest (cost=0.00..1976.00 rows=99990 width=24) (actual time=0.011..33.376 rows=100000 loops=1) |
Filter: ((mystring2)::text ~~ '%Test%'::text) |
Planning time: 0.083 ms |
Execution time: 51.259 ms |
ILike does not use the index either
explain analyse
select * from tbltest
where "mystring2" ILike '%Test%';
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------|
Seq Scan on tbltest (cost=0.00..1976.00 rows=99990 width=24) (actual time=0.012..87.038 rows=100000 loops=1) |
Filter: ((mystring2)::text ~~* '%Test%'::text) |
Planning time: 0.134 ms |
Execution time: 105.757 ms |
PostgreSQL does not use the index in the last two queries because that is the best way to process the query, not because it cannot use it.
In your EXPLAIN output you can see that the first query returns zero rows (actual ... rows=0), while the other two queries return every single row in the table (actual ... rows=100000).
The PostgreSQL optimizer's estimates reflect that situation accurately.
Since it has to access most of the rows of the table anyway, PostgreSQL knows that it will be able to get the result much cheaper if it scans the table sequentially than by using the more complicated index access method.

Why isn't my index used

I'm running postgresql-9.1.6 on RHEL 5.8 OS. I got a statement implementing seq scan on which column is indexed.
Table "public.table"
Column | Type | Modifiers
----------+-----------------------+-----------------------------------------
col1 | character(3) | not null
col2 | character varying(20) | not null
col3 | character varying(20) |
col4 | character(1) | default 0
Indexes:
"table_pkey" PRIMARY KEY, btree (col1, col2)
postgres=# explain analyze select * from table where col1=right('10000081',3);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on table (cost=0.00..31053.24 rows=5650 width=286) (actual time=3.221..429.950 rows=110008 loops=1)
Filter: ((col1)::text = '081'::text)
Total runtime: 435.904 ms
(3 rows)
postgres=# explain analyze select * from table where col1=right('10000081',3)::char(3);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on table (cost=3097.81..18602.98 rows=112173 width=286) (actual time=18.125..32.707 rows=110008 loops=1)
Recheck Cond: (col1 = '081'::character(3))
-> Bitmap Index Scan on table_pkey (cost=0.00..3069.77 rows=112173 width=0) (actual time=17.846..17.846 rows=110008 loops=1)
Index Cond: (col1 = '081'::character(3))
Total runtime: 38.640 ms
(5 rows)
and I found that [alter column] is one of the solution....
postgres=# alter table table alter column col1 type varchar(3);
ALTER TABLE
postgres=# explain analyze select * from table where col1=right('10000081',3);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on table (cost=160.26..10902.32 rows=5650 width=295) (actual time=20.249..41.658 rows=110008 loops=1)
Recheck Cond: ((col1)::text = '081'::text)
-> Bitmap Index Scan on table_pkey (cost=0.00..158.85 rows=5650 width=0) (actual time=20.007..20.007 rows=110008 loops=1)
Index Cond: ((col1)::text = '081'::text)
Total runtime: 47.408 ms
(5 rows)
What I wonder is WHY???
It looks from the first plan that there is an implicit type cast performed on col1 to match the return type of right().
Filter: ((col1)::text = '081'::text)
Evidently the expression right('10000081',3) returns text.
So I would say that yes, you do have to type cast the expression, although an alternative would be to index on (col1)::text -- not my favourite solution though.
Avoid the char datatype. It's awful for many reasons, and this is only one of them.
If you stick to text or varchar you'll have fewer issues with implicit casts and confusing behavior.