How to add a prefix-length index in PostgreSQL?

As far as I know, some databases let you specify a prefix length for a column when creating an index. For example, in MySQL:
alter table j1 add index idx_j1_str1 (str1(5));
I could not find an equivalent in PostgreSQL after searching Google and Stack Overflow.
Is there a way to do this in PostgreSQL other than a function index?
Any reply will be appreciated.

create index idx_j1_str on j1 (left(str1,5));
But I don't think you need something like this in Postgres. An index on just str1 is probably much more versatile. But of course this depends heavily on the queries you run - which you did not show us, so it's impossible to say what kind of index you really need.
To make use of a function based index in Postgres (and basically any other DBMS that supports them) your query needs to contain the same expression as you used in the index:
select *
from j1
where left(str1,5) = '12345'
will use the above index (if it makes sense, e.g. if the table is large enough and the condition reduces the overall result substantially).
If you create a regular index on that column:
create index idx_j1_str on j1 (str1 varchar_pattern_ops);
then it can be used for something like:
select *
from j1
where str1 like '1234%'
(which is equivalent to left(str1,4) = '1234') but it can also be used for:
select *
from j1
where str1 like '1234678%'
or any other prefix search using a wildcard.
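A minimal sketch for comparing the two approaches side by side. The table definition and the index names are assumptions (the question does not show the real schema), and on a small table the planner may still prefer a sequential scan:

```sql
-- Hypothetical table; only the column name str1 comes from the question.
CREATE TABLE j1 (id serial PRIMARY KEY, str1 varchar(50));

-- Expression index on the first 5 characters.
CREATE INDEX idx_j1_str1_left5 ON j1 (left(str1, 5));

-- Regular index with the pattern operator class for LIKE prefix searches.
CREATE INDEX idx_j1_str1_pattern ON j1 (str1 varchar_pattern_ops);

-- Candidate for the expression index: the query repeats the indexed expression.
EXPLAIN SELECT * FROM j1 WHERE left(str1, 5) = '12345';

-- Candidate for the pattern index: a left-anchored LIKE with a prefix of any length.
EXPLAIN SELECT * FROM j1 WHERE str1 LIKE '1234%';
EXPLAIN SELECT * FROM j1 WHERE str1 LIKE '1234678%';
```

Comparing the EXPLAIN output on realistic data volumes is the only reliable way to see which index the planner actually picks.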

Related

Is it possible to index the position of an array column in PostgreSQL?

Let's say I want to find rows in the table my_table that have the value 5 at the first position of the array column my_array_column. To prepare the table, I executed the following statements:
CREATE TABLE my_table (
id serial primary key,
my_array_column integer[]
);
CREATE INDEX my_table_my_array_column_index on "my_table" USING GIN ("my_array_column");
SET enable_seqscan TO off;
INSERT INTO my_table (my_array_column) VALUES ('{5,7,10}');
Now, the query can look like this:
select * from my_table where my_array_column[1] = 5;
This works, but it doesn't use the created GIN index. Is it possible to search for the value 5 at a specific position with an index?
I want to find rows in the table my_table that have the value 5 at the first position of the array column
A partial index would be most efficient for that definition:
CREATE INDEX my_table_my_array_special_idx ON my_table ((true))
WHERE my_array_column[1] = 5;
If only a small fraction of rows qualifies, a partial index is accordingly smaller. Plus, the actual index column only occupies minimum space (typically 8 bytes). And, on top of that, Postgres 13 or later can apply index deduplication to make the index much smaller, yet.
Once the index is fully cached, its small size does not make it much faster, but still.
And most writes do not have to manipulate the index, which may be the most important benefit, depending on the workload.
Oh, and Postgres collects statistics for a partial index. So you can expect the query planner to make a fully educated choice when that index is involved.
Related:
PostgreSQL partial index unused when created on a table with existing data
Index that is not used, yet influences query
It's applicable when the query repeats the same condition.
Typically, you have something useful as index field on top of your declared purpose. But if you don't, just use any small constant - true in my example, but anything < 8 bytes is equally good.
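A short sketch of what "repeats the same condition" means in practice, assuming the table from the question. `IF NOT EXISTS` is only there so the script can be re-run:

```sql
-- Partial index on a constant: only rows with 5 at subscript 1 get an entry.
CREATE INDEX IF NOT EXISTS my_table_my_array_special_idx ON my_table ((true))
WHERE my_array_column[1] = 5;

-- Refresh statistics so the planner has current numbers to work with.
ANALYZE my_table;

-- The WHERE clause matches the index predicate, so the partial index is a candidate.
EXPLAIN SELECT * FROM my_table WHERE my_array_column[1] = 5;

-- A different constant is not implied by the index predicate,
-- so this query cannot use the partial index.
EXPLAIN SELECT * FROM my_table WHERE my_array_column[1] = 6;
```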
Minor disclaimer: The "first position" in a Postgres array does not necessarily have index 1. If non-standard array indexes are possible, consider:
...
WHERE (my_array_column[:])[1] = 5;
In index and queries.
See:
Normalize array subscripts for 1-dimensional array so they start with 1
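To illustrate the disclaimer, here is a small sketch using an array literal with explicit bounds. The column aliases are made up for the example, and the `arr[:]` slice syntax with omitted bounds assumes Postgres 9.6 or later:

```sql
-- The literal '[0:2]={5,7,10}' stores its three elements at subscripts 0, 1 and 2.
SELECT arr[1]              AS at_subscript_1,  -- the second stored element, not the first
       (arr[:])[1]         AS first_element,   -- a slice is renumbered to start at 1
       array_lower(arr, 1) AS lower_bound      -- lowest subscript of the original array
FROM  (SELECT '[0:2]={5,7,10}'::int[] AS arr) AS t;
```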
You can index just the first position. You need an extra set of parentheses in the create statement to do that:
create index on my_table ((my_array_column[1]));
Or you could augment your query to work with your gin index, on the theory that an array can't have the first element be 5 unless at least one element is 5.
select * from my_table where my_array_column[1] = 5 and my_array_column @> ARRAY[5];
Of course this won't be very efficient if a lot of your arrays contain 5, but in some other spot in the array. It would have to recheck all of those "false matches" to eliminate them. So if you only care about the first element, the first index I showed is better. (Of course, if you only care about the first element, why use an array to start with?)
If you always look at the first position a regular B-Tree index will do:
create index on my_table ( (my_array_column[1]) );
If you don't know the position, then a GIN index is indeed needed, but you need to use an operator that is supported by a GIN index, e.g. the @> operator. That requires a different query:
select *
from my_table
where my_array_column @> array[5];
That would find all rows where the array column contains the value 5.
But you should heed the advice given in the manual regarding the use of arrays:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

What is the correct way to create a case-insensitive trigram-index in postgres?

...and is it something I should do anyway?
From my brief testing, making a trigram index and searching using
where name like '%query%'
is faster than
where name ilike '%query%'
So it seems like I should do it, but I'm surprised I've not been able to find out how.
(My test data is fairly homogenous - 1.5M rows made up of 16 entries repeated. I can imagine this might mess with the results.)
This is how I expected it to work (note the lower(name)):
create extension pg_trgm;
create table users(name text);
insert into users values('Barry');
create index "idx" on users using gin (lower(name) gin_trgm_ops);
select count(*) from users where (name like '%bar%');
but this returns 0.
Either of
select count(*) from users where (name like '%Bar%');
or
select count(*) from users where (name ilike '%bar%');
work, which makes me believe the trigrams in the index are not lower()'d. Am I misunderstanding how this works under the hood? Is it not possible to call lower there?
I note that this
select show_trgm('Barry')
returns lowercase trigrams:
{" b"," ba",arr,bar,rry,"ry "}
So I'm perplexed.
The trigrams are definitely lower case.
The confusion clears up when you consider how trigram indexes are used: they act as a filter that eliminates the majority of non-matches but allows false positives (their case insensitivity is one of the reasons). That is why there always has to be a recheck to eliminate those false positives, and that is why you always get a bitmap index scan.
The ILIKE query may be slower because it has more results, or because case insensitive comparisons require more effort.
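An index never changes what a query returns: LIKE is case sensitive, so name like '%bar%' cannot match 'Barry' whichever index exists. A hedged sketch of two ways to get a case-insensitive search that a trigram index can support, using the users table from the question (the index names are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Option 1: index the column itself and search with ILIKE.
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);
SELECT count(*) FROM users WHERE name ILIKE '%bar%';

-- Option 2: index lower(name) and repeat the same expression in the query,
-- with a lowercase pattern. An expression index is only considered when
-- the query contains the indexed expression.
CREATE INDEX users_lower_name_trgm_idx ON users USING gin (lower(name) gin_trgm_ops);
SELECT count(*) FROM users WHERE lower(name) LIKE '%bar%';
```

Only one of the two indexes is needed. Which variant is faster depends on the data, so it is worth measuring both with EXPLAIN ANALYZE.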

Matching performance with pattern from table column

I have a query which looks like:
SELECT *
FROM my_table
WHERE 'some_string' LIKE my_table.some_column || '%%'
How can I index some_column to improve this query performance?
Or is there a better way to filter this?
This predicate effectively searches for all prefixes for a given string:
WHERE 'some_string' LIKE my_table.some_column || '%'
Maybe % is a special character in your client, which needs to be escaped with another % to pass a literal %, else '%%' would be just noise and can be replaced with '%'.
The most efficient solution should be a recursive CTE (or similar) that matches to every prefix exactly, starting with some_column = left('some_string', 1), up to some_column = left('some_string', length('some_string')) (= 'some_string').
You only need a plain btree index on the column for this. Depending on details of your implementation, partial expression indexes might improve performance ...
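One possible shape of the "or similar" variant, sketched with generate_series() instead of a recursive CTE. The index name is an assumption, and this version differs from the LIKE predicate in two edge cases: it ignores rows where some_column is an empty string, and it treats % or _ inside some_column as literal characters:

```sql
-- Plain btree index on the column (hypothetical name).
CREATE INDEX my_table_some_column_idx ON my_table (some_column);

-- Generate every prefix of the search string and look each one up by equality,
-- so each lookup can be answered by the btree index.
SELECT t.*
FROM   generate_series(1, length('some_string')) AS g(len)
JOIN   my_table AS t ON t.some_column = left('some_string', g.len);
```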
Related:
Reverse pattern matching: find the longest prefix
Algorithm for finding the longest prefix
PostgreSQL LIKE query performance variations
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
I believe you intend to write the following query:
SELECT *
FROM my_table
WHERE my_table.some_column LIKE 'some_string%';
In other words, you want to find records where some column begins with some_string followed by anything, possibly nothing at all.
As far as I know, a regular B-tree index on some_column will be effective, to a point, in your query. The reason is that Postgres can traverse the tree looking for the prefix some_string. Once it has found that entry, beyond that the index might not help. But an index on some_column should give you some performance benefit here.
A condition where an index would not help would be the following:
WHERE my_table.some_column LIKE '%some_string';
In this case, the index is rendered mostly useless, because we have no idea with what prefix the column value should begin.

Postgresql ILIKE versus TSEARCH

I have a query with a number of text fields, something like this:
SELECT * FROM some_table
WHERE field1 ILIKE '%thing%'
OR field2 ILIKE '%thing'
OR field3 ILIKE '%thing';
The columns are pretty much all varchar(50) or thereabouts. Now I understand to improve performance I should index the fields upon which the search operates. Should I be considering replacing ILIKE with TSEARCH completely?
A full text search setup is not identical to a "contains" like query. It stems words etc so you can match "cars" against "car".
If you really want a fast ILIKE then no standard database index or FTS will help. Fortunately, the pg_trgm module can do that.
http://www.postgresql.org/docs/9.1/static/pgtrgm.html
http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
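A minimal sketch of what that could look like for the query in the question, assuming Postgres 9.1 or later and the table and column names used there. The index names are made up:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One trigram index per searched column.
CREATE INDEX some_table_field1_trgm_idx ON some_table USING gin (field1 gin_trgm_ops);
CREATE INDEX some_table_field2_trgm_idx ON some_table USING gin (field2 gin_trgm_ops);
CREATE INDEX some_table_field3_trgm_idx ON some_table USING gin (field3 gin_trgm_ops);

-- Check whether the planner combines the indexes for the OR'ed conditions.
EXPLAIN
SELECT * FROM some_table
WHERE field1 ILIKE '%thing%'
   OR field2 ILIKE '%thing'
   OR field3 ILIKE '%thing';
```

Trigram indexes can be large and slow down writes, so it may be worth indexing only the columns that are actually searched this way.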
One thing that is very important: NO B-TREE INDEX will ever improve this kind of search:
where field ilike '%SOMETHING%'
What I am saying is that if you do a:
create index idx_name on some_table(field);
The only access you will improve is where field like 'something%'. (when you search for values starting with some literal). So, you will get no benefit by adding a regular index to field column in this case.
If you need to improve your search response time, definitely consider using FULL TEXT SEARCH.
Adding a bit to what the others have said.
First you can't really use an index based on a value in the middle of the string. Indexes are tree searches generally, and you have no way to know if your search will be faster than just scanning the table, so PostgreSQL will default to a seq scan. Indexes will only be used if they match the first part of the string. So:
SELECT * FROM invoice
WHERE invoice_number like 'INV-2012-435%'
may use an index but like '%44354456%' cannot.
In general in LedgerSMB we use both, depending on what kind of search we are doing. You might see a search like:
select * from parts
WHERE partnumber ilike ? || '%'
and plainto_tsquery(get_default_language(), ?) @@ description;
So these are very different. Use each one where it makes the most sense.

finding MAX(db_timestamp) query

I have a big table with multiple indexes in Postgres. It has indexes on db_timestamp, id, username.
I want to find the MAX timestamp for particular username.
The problem is the simple query like
SELECT MAX(db_timestamp) FROM Foo WHERE username = 'foo'
takes a very long time because of the huge table size (we are talking about a 450 GB table with over 30 GB of indexes).
Is there any way to optimize this query or tell Postgres which query plan to use?
Create an index on username and db_timestamp with the correct sort order:
CREATE INDEX idx_foo ON foo (username ASC, db_timestamp DESC);
Check EXPLAIN to see if things work as they should.
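A sketch of that check, assuming the index above exists. Plain EXPLAIN only shows the plan without running the query, which matters on a table of this size:

```sql
-- Show the plan for the original query.
EXPLAIN
SELECT max(db_timestamp) FROM foo WHERE username = 'foo';

-- Roughly equivalent formulation that reads the first matching index entry.
-- It returns no row (instead of NULL) when the username has no rows.
EXPLAIN
SELECT db_timestamp
FROM   foo
WHERE  username = 'foo'
AND    db_timestamp IS NOT NULL
ORDER  BY db_timestamp DESC
LIMIT  1;
```

If both plans show an index scan on idx_foo with a LIMIT, the query no longer depends on the size of the table, only on the depth of the index.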
Postgresql can't use the index on (db_timestamp,id,username) to satisfy that query- the query term you're after has to be a prefix of the index, i.e. using the first column(s).
So an index on (username,db_timestamp) would serve that query very well, since it just has to scan the subtree (username,0)..(username,+inf) (and IIRC PostgreSQL should actually know to try to find (username,+inf) and walk backwards in order).
In general, "covering indices" are not as useful a technique with PostgreSQL as with other databases, because PostgreSQL needs to refer to the heap tuples for visibility information (index-only scans, added in PostgreSQL 9.2, relax this for pages marked all-visible).