I have a table with hundreds of millions rows with schema like below.
tabe AA {
id integer primay key,
prop0 boolean not null,
prop1 boolean not null,
prop2 smallint not null,
...
}
The each "property" field (prop0, prop1, ...) has a small number of distinct values. And I usually query to find "id" from the given conditions of properties fields. I think Bitmap index is best for this query. But postgresql seems not support bitmap index.
I tried b-tree index on each field but these indexes are not used according to the query explain.
Is there a good alternative way to do this?
(i'm using postgresql 9)
Your real problem is a bad schema design, not the index. The properties should be placed in a different table and your current table should link to that table using a many to many relation.
The BIT datatype might also be of use, just check the manual.
Create a multicolumn index on properties which are always or almost always queried. Or several multicolumn indexes if needed.
The alternative, when you do not query the same properties almost always, is to make a tsvector column with words describing your data, maintained using trigger, for example
prop0=true
prop1=false
prop2=4
would be
'propzero nopropone proptwo4'::tsvector
index it using GIN and then use full text search for searching:
where tsv ## 'popzero & nopropone & proptwo4'::tsquery
An index is only used if it actually speeds up the query which is not necessarily always the case. Especially with smallish tables (say thousands of rows) a full table scan ("seq scan" in the Postgres execution plan) might indeed be a lot faster.
How many rows did the table have when you tried the statement?
How did the query look like? Maybe there are other conditions that prevent the index usage.
Did you analyze the table to have up-to-date statistics?
Related
I have a database with over 30,000,000 entries. When performing queries (including an ORDER BY clause) on a text field, the = operator results in relatively fast results. However we have noticed that when using the LIKE operator, the query becomes remarkably slow, taking minutes to complete. For example:
SELECT * FROM work_item_summary WHERE manager LIKE '%manager' ORDER BY created;
Creating indices on the keywords being searched will of course greatly speed up the query. The problem is we must support queries on any arbitrary pattern, and on any column, making this solution not viable.
My questions are:
Why are LIKE queries this much slower than = queries?
Is there any other way these generic queries can be optimized, or is about as good as one can get for a database with so many entries?
Your query plan shows a sequential scan, which is slow for big tables, and also not surprising since your LIKE pattern has a leading wildcard that cannot be supported with a plain B-tree index.
You need to add index support. Either a trigram GIN index to support any and all patterns, or a COLLATE "C" B-tree expression index on the reversed string to specifically target leading wildcards.
See:
PostgreSQL LIKE query performance variations
How to index a column for leading wildcard search and check progress?
One technic to speed up queries that search partial word (eg '%something%') si to use rotational indexing technic wich is not implementedin most of RDBMS.
This technique consists of cutting out each word of a sentence and then cutting it in "rotation", ie creating a list of parts of this word from which the first letter is successively removed.
As an example the word "electricity" will be exploded into 10 words :
lectricity
ectricity
ctricity
tricity
ricity
icity
city
ity
ty
y
Then you put all those words into a dictionnary that is a simple table with an index... and reference the root word.
Tables for this are :
CREATE TABLE T_WRD
(WRD_ID BIGINT IDENTITY PRIMARY KEY,
WRD_WORD VARCHAR(64) NOT NULL UNIQUE,
WRD_DROW GENERATED ALWAYS AS (REVERSE(WRD_WORD) NOT NULL UNIQUE) ;
GO
CREATE TABLE T_WORD_ROTATE_STRING_WRS
(WRD_ID BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
WRS_ROTATE SMALLINT NOT NULL,
WRD_ID_PART BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
PRIMARY KEY (WRD_ID, WRS_ROTATE));
GO
The past two days I've been reading a lot about jsonb, full text search, gin index, trigram index and what not but I still can not find a definitive or at least a good enough answer on how to fastly search if a row of type JSONB contains certain string as a value. Since it's a search functionality the behavior should be like that of ILIKE
What I have is:
Table, lets call it app.table_1 which contains a lot of columns one of which is of type JSONB, so lets call it column_jsonb
The data inside column_jsonb will always be flatten (no nested objects, etc) but the keys can vary. An example of the data in the column with obfuscated values looks like this:
"{""Key1"": ""Value1"", ""Key2"": ""Value2"", ""Key3"": null, ""Key4"": ""Value4"", ""Key5"": ""Value5""}"
I have a GIN index for this column which doesn't seems to affect the search time significantly (I am testing with 20k records now which takes about 550ms). The indes looks like this:
CREATE INDEX ix_table_1_column_jsonb_gin
ON app.table_1 USING gin
(column_jsonb jsonb_path_ops)
TABLESPACE pg_default;
I am interested only in the VALUES and the way I am searching them now is this:
EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
Here search_term is variable coming from the front end with the string that the user is searching for
I have the following questions:
Is it possible to make the check faster without modifying the data model? I've read that trigram index might be usfeul for similar cases but at least for me it seems that converting jsonb to text and then checking will be slower and actually I am not sure if the trigram index will actually work if the column original type is JSONB and I explicitly cast each row to text? If I'm wroing I would really appreciate some explanation with example if possible.
Is there some JSONB function that I am not aware of which offers what I am searching for out of the box, I'm constrained to PostgreSQL v 11.9 so some new things coming with version 12 are not available for me.
If it's not possible to achieve significant improvement with the current data structure can you propose a way to restructure the data in column_jsonb maybe another column of some other type with data persisted in some other way, I don't know...
Thank you very much in advance!
If the data structure is flat, and you regularly need to search the values, and the values are all the same type, a traditional key/value table would seem more appropriate.
create table table1_options (
table1_id bigint not null references table1(id),
key text not null,
value text not null
);
create index table1_options_key on table1_options(key);
create index table1_options_value on table1_options(value);
select *
from table1_options
where value ilike 'some search%';
I've used simple B-Tree indexes, but you can use whatever you need to speed up your particular searches.
The downsides are that all values must have the same type (doesn't seem to be a problem here) and you need an extra table for each table. That last one can be mitigated somewhat with table inheritance.
I want to increase the performance of queries on table in Postgrsql db i need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT null,
feature text NOT null,
...
);
The table is just in example but the thing is that there are no unique columns. article_number is the one used in the where clause but for example article_number='000.002-00A' can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use for such a situation?
I know there B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in postgres but which one would be the best for this.
If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality hash would also be an option but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types, there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix search e.g. foo like 'bar%')
hash indexes are useful only for strict equality they can be faster & smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN index so I'm not sure what their use case would be. But my understanding is that there's no case to even consider it before you have huge numbers of rows.
Basically, use btree unless you know that you can not.
I have some tables that are around 100 columns wide. I haven't normalized them because to put it back together would require almost 3 dozen joins and am not sure it would perform any better... haven't tested it yet (I will) so can't say for sure.
Anyway, that really isn't the question. I have been indexing columns in these tables that I know will be pulled frequently, so something like 50 indexes per table.
I got to thinking though. These columns will never be pulled by themselves and are meaningless without the primary key (basically an item number). The PK will always be used for the join and even in simple SELECT queries, it will have to be a specified column so the data makes sense.
That got me thinking further about indexes and how they work. As I understand them the locations of a values are committed to memory for that column so it is quickly found in a query.
For example, if you have:
SELECT itemnumber, expdate
FROM items;
And both itemnumber and expdate are indexed, is that excessive and really adding any benefit? Is it sufficient to just index itemnumber and the index will know that expdate, or anything else that is queried for that item, is on the same row?
Secondly, if multiple columns constitute a primary key, should the index include them grouped together, or is individually sufficient?
For example,
CREATE INDEX test_index ON table (pk_col1, pk_col2, pk_col3);
vs.
CREATE INDEX test_index1 ON table (pk_col1);
CREATE INDEX test_index2 ON table (pk_col2);
CREATE INDEX test_index3 ON table (pk_col3);
Thanks for clearing that up in advance!
Uh oh, there is a mountain of basics that you still have to learn.
I'd recommend that you read the PostgreSQL documentation and the excellent book “SQL Performance Explained”.
I'll give you a few pointers to get you started:
Whenever you create a PRIMARY KEY or UNIQUE constraint, PostgreSQL automatically creates a unique index over all the columns of that constraint. So you don't have to create that index explicitly (but if it is a multicolumn index, it sometimes is useful to create another index on any but the first column).
Indexes are relevant to conditions in the WHERE clause and the GROUP BY clause and to some extent for table joins. They are irrelevant for entries in the SELECT list. An index provides an efficient way to get the part of a table that satisfies a certain condition; an (unsorted) access to all rows of a table will never benefit from an index.
Don't sprinkle your schema with indexes randomly, since indexes use space and make all data modification slow.
Use them where you know that they will do good: on columns on which a foreign key is defined, on columns that appear in WHERE clauses and contain many different values, on columns where your examination of the execution plan (with EXPLAIN) suggests that you can expect a performance benefit.
I was wondering I'm not really sure how multiple indexes would work on the same column.
So lets say I have an id column and a country column. And on those I have an index on id and another index on id and country. When I do my query plan it looks like its using both those indexes. I was just wondering how that works? Can I force it to use just the id and country index.
Also is it bad practice to do that? When is it a good idea to index the same column multiple times?
It is common to have indexes on both (id) and (country,id), or alternatively (country) and (country,id) if you have queries that benefit from each of them. You might also have (id) and (id, country) if you want the "covering" index on (id,country) to support index only scans, but still need the stand along to enforce a unique constraint.
In theory you could just have (id,country) and still use it to enforce uniqueness of id, but PostgreSQL does not support that at this time.
You could also sensibly have different indexes on the same column if you need to support different collations or operator classes.
If you want to force PostgreSQL to not use a particular index to see what happens with it gone, you can drop it in a transactions then roll it back when done:
BEGIN; drop index table_id_country_idx; explain analyze select * from ....; ROLLBACK;