PostgreSQL indexes for hstore boolean attributes

I have an hstore column called extras that holds many attributes, some of which are boolean. I would like to index some of them, for example extras->'delivered'. What is the best way to index such attributes?
If you answer, could you also tell me whether your technique applies to decimal or other types?
Thanks.

Indexing individual keys in a hstore field
The current hstore version doesn't have typed values. All values are text. So you can't directly define a "boolean" index on an hstore value. You can, however, cast the value to boolean and index the cast expression.
CREATE INDEX sometable_extras_delivered_bool
ON sometable ( ((extras->'delivered')::boolean) );
Only queries that use the expression ((extras->'delivered')::boolean) will benefit from the index. If the index expression uses a cast, the query expressions must too.
This b-tree index on a hstore field will be less efficient to create and maintain than a b-tree index on a boolean column directly in the table. It'll be much the same to query.
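As a rough sketch of what that means in practice, the queries below show which shapes can match the index above, and how the same approach could extend to decimal values. The price key and the numeric index are assumptions made for illustration; they are not part of the question's schema. Note that building such an index fails if any existing row holds a value that cannot be cast.

```sql
-- Can use sometable_extras_delivered_bool: same expression as the index.
SELECT id
FROM sometable
WHERE (extras->'delivered')::boolean;

-- Cannot use it: this compares the raw text value, with no cast.
SELECT id
FROM sometable
WHERE extras->'delivered' = 'true';

-- Same technique for a decimal value (hypothetical key "price").
CREATE INDEX sometable_extras_price_numeric
ON sometable ( ((extras->'price')::numeric) );

SELECT id
FROM sometable
WHERE (extras->'price')::numeric > 100;
```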
Indexing all keys in a hstore field
If you want a general purpose index that indexes all hstore keys, you can only index them as text. There's no support for value typing in hstore in PostgreSQL 9.3.
See indexes on hstore.
This is useful when you don't know in advance which keys you need to index.
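A minimal sketch of such a general purpose index, assuming the same sometable and extras column as above (the index name is made up for illustration). Values are still compared as text, so 'true' and 't' are different values here.

```sql
-- One GIN index covering every key/value pair in the hstore column.
CREATE INDEX sometable_extras_gin
ON sometable USING gin (extras);

-- Containment: rows whose extras include delivered => true (text comparison).
SELECT id
FROM sometable
WHERE extras @> 'delivered=>true';

-- Key existence: rows that have a delivered key at all.
SELECT id
FROM sometable
WHERE extras ? 'delivered';
```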
(Users on later versions of PostgreSQL, pre-release at the time of writing, with the json-compatible hstore version 2 will find that their hstore supports typed values.)
Reconsider your data model
Frankly, if you're creating indexes on fields in a hstore that you treat as boolean, then consider re-thinking your data model. You are quite likely better off having this boolean as a normal field of the table that contains the hstore.
You can store typed values in json, but you don't get the GIN / GiST index support that's available for hstore. This will improve in 9.4 or 9.5, with hstore 2 adding support for typed, nested, indexable hstores and a new json representation being built on top of that.
Partial indexes
For booleans you may also want to consider partial index expressions where the boolean is a predicate on another index, instead of the actual indexed column. E.g:
CREATE INDEX sometable_ids_delivered ON sometable(id) WHERE (delivered);
or, for the hstore field:
CREATE INDEX sometable_ids_extras_delivered ON sometable(id) WHERE ((extras->'delivered')::boolean);
Exactly what's best depends on your queries.
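For illustration, a query that could take advantage of the hstore partial index might look like the sketch below. The id range is an arbitrary example value. The planner only considers a partial index when the query's WHERE clause implies the index predicate, so the cast expression is repeated verbatim.

```sql
-- The WHERE clause repeats the index predicate, so the planner can
-- consider the partial index and only visit delivered rows.
SELECT id
FROM sometable
WHERE (extras->'delivered')::boolean
  AND id BETWEEN 1000 AND 2000;
```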

Related

PostgreSQL: Index JSONB array that is queried with `@?` operator

My table (tbl) has a JSONB field (data) that contains a field with an array where I store tags (tags).
I query that table with an expression like:
SELECT * FROM tbl WHERE data->'tags' @? '$[*] ? (@ like_regex ".*(foo|bar).*" flag "i")';
With such use-case is there a way for me to index the data->'tags' array to speed up the query? Or should I rather work on moving the tags array out of the JSONB field and into a TEXT[] field and index that?
I've already tried:
CREATE INDEX foo ON tbl USING GIN ((data->'tags') jsonb_path_ops);
but it doesn't work: https://gist.github.com/vkaracic/a62ac917d34eb6e975c4daeefbd316e8

The index you built can be used (if you set enable_seqscan=off, you will see that it does get used), but it is generally not chosen because it is pretty useless for this query. The only rows the index could rule out are the ones that don't have the 'tags' key at all, and even that is poorly estimated, so it probably won't be used without drastic measures.
You could try converting to text[] and then using parray_gin, but it would probably be better to convert to a child table with a text column and then use pg_trgm.
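A sketch of the child table approach, under some assumptions: tbl has a unique id column, and the names tbl_tags, tbl_id, tag and tbl_tags_tag_trgm are invented for this example. Populating the child table from the existing JSONB data is left out.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical child table: one row per tag, pointing back at tbl.
CREATE TABLE tbl_tags (
    tbl_id bigint NOT NULL REFERENCES tbl(id),
    tag    text   NOT NULL
);

-- Trigram GIN index; it supports LIKE, ILIKE and regular-expression matches.
CREATE INDEX tbl_tags_tag_trgm
ON tbl_tags USING gin (tag gin_trgm_ops);

-- Case-insensitive regex match, similar in intent to the like_regex filter.
SELECT t.*
FROM tbl t
WHERE EXISTS (
    SELECT 1
    FROM tbl_tags tt
    WHERE tt.tbl_id = t.id
      AND tt.tag ~* 'foo|bar'
);
```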

Fast way to check if PostgreSQL jsonb column contains certain string

For the past two days I've been reading a lot about jsonb, full text search, GIN indexes, trigram indexes and whatnot, but I still cannot find a definitive, or at least good enough, answer on how to quickly check whether a JSONB column in a row contains a certain string as a value. Since this is search functionality, the behavior should be like that of ILIKE.
What I have is:
A table, let's call it app.table_1, which contains a lot of columns, one of which is of type JSONB; let's call it column_jsonb.
The data inside column_jsonb will always be flat (no nested objects, etc.), but the keys can vary. An example of the data in the column, with obfuscated values, looks like this:
{"Key1": "Value1", "Key2": "Value2", "Key3": null, "Key4": "Value4", "Key5": "Value5"}
I have a GIN index on this column, which doesn't seem to affect the search time significantly (I am testing with 20k records now, which takes about 550 ms). The index looks like this:
CREATE INDEX ix_table_1_column_jsonb_gin
ON app.table_1 USING gin
(column_jsonb jsonb_path_ops)
TABLESPACE pg_default;
I am interested only in the VALUES and the way I am searching them now is this:
EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
Here search_term is a variable coming from the front end, holding the string that the user is searching for.
I have the following questions:
Is it possible to make the check faster without modifying the data model? I've read that a trigram index might be useful for similar cases, but it seems to me that converting jsonb to text and then checking will be slower, and I am not sure whether a trigram index will actually work if the column's original type is JSONB and I explicitly cast each row to text. If I'm wrong, I would really appreciate an explanation, with an example if possible.
Is there some JSONB function that I am not aware of which offers what I am searching for out of the box? I'm constrained to PostgreSQL v11.9, so new features that come with version 12 are not available to me.
If it's not possible to achieve a significant improvement with the current data structure, can you propose a way to restructure the data in column_jsonb, maybe as another column of some other type with the data persisted in some other way?
Thank you very much in advance!

If the data structure is flat, and you regularly need to search the values, and the values are all the same type, a traditional key/value table would seem more appropriate.
create table table1_options (
table1_id bigint not null references table1(id),
key text not null,
value text not null
);
create index table1_options_key on table1_options(key);
create index table1_options_value on table1_options(value);
select *
from table1_options
where value ilike 'some search%';
I've used simple B-Tree indexes, but you can use whatever you need to speed up your particular searches.
The downsides are that all values must have the same type (doesn't seem to be a problem here) and you need an extra table for each table. That last one can be mitigated somewhat with table inheritance.
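Since the question asks for ILIKE behavior, a plain B-Tree index generally won't help with patterns such as '%term%'. One option, sketched here under the assumption that the pg_trgm extension can be installed (the index name is made up), is a trigram index on the value column. Very short search terms yield few or no trigrams, so they may still end up scanning most of the index.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index on the values; usable for LIKE and ILIKE,
-- including patterns with a leading wildcard.
CREATE INDEX table1_options_value_trgm
ON table1_options USING gin (value gin_trgm_ops);

-- Find the parent rows that have any value matching the search term.
SELECT DISTINCT table1_id
FROM table1_options
WHERE value ILIKE '%some search%';
```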

What is the best type of index to use on a materialized view in PostgreSQL

I want to increase the performance of queries on a table in a PostgreSQL database that I need to use.
CREATE TABLE mytable (
article_number text NOT NULL,
description text NOT NULL,
feature text NOT NULL,
...
);
The table is just an example, but the thing is that there are no unique columns. article_number is the one used in the WHERE clause, but, for example, article_number='000.002-00A' can have from 3 to 300 rows. The total number of rows is 102,165,920. What would be the best index to use in such a situation?
I know there are B-tree, Hash, GiST, SP-GiST, GIN and BRIN index types in Postgres, but which one would be the best for this?

If the lookups are filtered on article_number then an index should be created on that. Not quite sure what else you're asking.
The default index is a btree and that'll work fine. If you're only checking for strict equality, hash would also be an option, but it has issues before Postgres 10, so I wouldn't recommend it.
Other index types are for more complicated forms of querying or custom data types; there's no reason to even consider them if you just want to perform equality filters.
btrees are useful for strict equality and range searches (which includes prefix search, e.g. foo like 'bar%')
hash indexes are useful only for strict equality; they can be faster and smaller than btrees in some rare cases
GIN indexes are useful when you have multiple index values per row (arrays, json, gis, some FTS cases)
GiST indexes are useful for more complex querying than equality and range (geom/gis, FTS)
I've never looked into BRIN indexes, so I'm not sure what their use case would be. My understanding is that there's no reason to even consider them before you have huge numbers of rows.
Basically, use btree unless you know that you cannot.
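For the table in the question, that could look like the sketch below (the index names are invented). The second index is only relevant if you also need prefix searches with LIKE and the database does not use the C collation; in that case a plain btree index on a text column is not used for LIKE, and the text_pattern_ops operator class is the usual workaround.

```sql
-- Plain btree index for equality lookups on article_number.
CREATE INDEX mytable_article_number_idx
ON mytable (article_number);

-- Check which plan the planner picks for the lookup.
EXPLAIN
SELECT *
FROM mytable
WHERE article_number = '000.002-00A';

-- Optional: supports LIKE 'prefix%' searches under a non-C collation.
CREATE INDEX mytable_article_number_pattern_idx
ON mytable (article_number text_pattern_ops);

SELECT *
FROM mytable
WHERE article_number LIKE '000.002%';
```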

Postgres Array column vs JSONB column

Is a Postgres Array column more easily indexed than a JSONB column with a JSON array in it?
https://www.postgresql.org/docs/current/arrays.html
https://www.compose.com/articles/faster-operations-with-the-jsonb-data-type-in-postgresql/

Syntactically, the JSONB array may be easier to use as you don't have to wrap your query value in a dummy array constructor:
where jsonbcolumn ? 'abc';
vs
where textarraycolumn @> ARRAY['abc']
On the other hand, the planner is likely to make better decisions with the PostgreSQL array, as it collects statistics on its contents, but doesn't on JSONB.
Also, you should read the docs for the version of PostgreSQL you are using, which is hopefully greater than 9.4 and really really should be greater than 9.1.
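To make the comparison concrete, here is a small sketch with both representations side by side. The table name tagged and the index names are assumptions for illustration only. Both columns take a GIN index, and both queries below use operators those indexes support.

```sql
-- Hypothetical table holding the same tags in both representations.
CREATE TABLE tagged (
    id              bigint PRIMARY KEY,
    textarraycolumn text[],
    jsonbcolumn     jsonb
);

CREATE INDEX tagged_array_gin ON tagged USING gin (textarraycolumn);
CREATE INDEX tagged_jsonb_gin ON tagged USING gin (jsonbcolumn);

-- Array containment: needs the ARRAY[...] wrapper around the value.
SELECT id FROM tagged WHERE textarraycolumn @> ARRAY['abc'];

-- JSONB existence: true if 'abc' is a top-level key or array element.
SELECT id FROM tagged WHERE jsonbcolumn ? 'abc';
```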

jsonb data type lookup cost in postgres

This might be an obvious and simple question.
I read through the jsonb data type documentation, but nowhere does it mention the lookup cost of a key in jsonb data.
For example, let's say I have a table with following schema:
CREATE TABLE A (id character varying (20),
info jsonb);
I want to know how postgres would parse a where query as below:
SELECT * FROM A WHERE info->>'city' = 'portland';
While going through the jsonb field of a row, is the lookup constant time (O(1)) or linear time (checking each key one by one in the row's jsonb dictionary) within that jsonb data dictionary?
My intuition is that it must be constant time (else what's the point of a dictionary style data?) but I can't see it in the official documentation to convince my team.
Any help would be great!
Thanks!

As with any WHERE condition in SQL: if there is no index, the database has to go through all rows of the table to find those that satisfy your condition.
You can either index a specific expression, or you can index the whole json value using a GIN index which then enables Postgres to use the index if any of the supported operators are used.
If you always check for the city, you can create a regular B-Tree index:
create index on a ( (info->>'city') );
If you don't know what you will be looking for, a GIN index might be a better choice:
create index on a using gin (info);
But you will need to change your query to use one of the operators that are supported by a GIN index, e.g. the contains operator @>:
select *
from a
where info @> '{"city": "portland"}'::jsonb;
Note that an index lookup is not always the most efficient solution. Sometimes it's faster to simply go through all rows, sometimes the index lookup is faster.
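One way to see which strategy Postgres actually picks for your data is to look at the execution plan. This is only a sketch; keep in mind that EXPLAIN with the ANALYZE option really executes the query.

```sql
-- Shows the chosen plan (sequential scan vs. index scan), actual timings
-- and buffer usage for the containment query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM a
WHERE info @> '{"city": "portland"}'::jsonb;
```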
If you want to learn more about indexes in relational databases, go through the material here: http://use-the-index-luke.com/
