Best Way to Index All JSONB Fields With Boolean Values in PostgreSQL - postgresql

I am creating a table that has a features jsonb column. There will be a dynamic set of features (each row can have an unknown set of features). Each feature is boolean true/false values.
Example:
Row 1: feature: { "happy": true, "tall": false, "motivated": true }
Row 2: feature: { "happy": true, "fast": true, "strong": false }
Row 3: feature: { "smart": true, "fast": true, "sleepy": false }
What would be the best way to index this column such that I can make queries to find all rows with featureX = true? All the examples I have looked up seem to need a field name to base the index on.

You can create an index on the complete JSON value:
create index on the_table using gin (features);
It can be used for e.g. the #> operator:
select *
from the_table
where features #> '{"happy": true}'
Another method would be to not store key/value pairs, but only list the features that are "true" in an array: ["happy", "motivated"] and then use the ? operator. This way the JSON value is a bit smaller and that might be more efficient.
select *
from the_table
where features ? 'happy'
or if you want to test for multiple features:
select *
from the_table
where features ?| array['happy', 'motivated']
That too can make use of the GIN index

Related

In Postgres, how can I efficiently filter using the inner numbers of this jsonb structure?

So I work with Postgres SQL, and I have a jsonb column with the following structure:
{
"Store1":[
{
"price":5.99,
"seller":"seller"
},
{
"price":56.43,
"seller":"seller"
}
],
"Store2":[
{
"price":45.65,
"seller":"seller"
},
{
"price":44.66,
"seller":"seller"
}
]
}
I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question:
For each product, is one of the prices in this JSON is bigger/equal/smaller than X?
Basically filter the product to include only the ones who have at least one price that satisfies a mathematical condition.
How can I do it efficiently? What's the best way in Postgres to iterate a JSON like this, with a relatively complex inner structure?
Also, if I could control the way the data is structured (to an extent, I can), what changes can I do to make this query more efficient?
Thanks!
Use a json path expression:
WHERE col ## '$.*[*].price < 20'
or
WHERE col #? '$.*[*] ? (#.price < 20)'
If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically
WHERE col ## format('$.*[*].price < %s', $1)::jsonpath
WHERE col #? format('$.*[*] ? (#.price < %s)', $1)::jsonpath
or you can use the respective function and pass variables as an object:
WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
WHERE jsonb_path_exists(col, format('$.*[*] ? (#.price < $limit)', jsonb_build_object('limit', $1))
I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:
if a comparison operator needs to work with multiple values, it generally functions as an ANY
## does not work with ? (# …) filter expressions since they don't return a boolean,
#? does not work with predicates since they always return a value (even if it's false)
What changes can I do to make this query more efficient?
As #jjanes commented on my other answer, the jsonpath match col ## '$.*[*].price < $limit' isn't going to be fast and needs to do full table scan, at least for < and >. To make a useful index, a different approach is required. An index can only have a single value to compare with, not any number. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit) to (SELECT MIN(prices_of(col))) < $limit.
With this idea it is possible to build an expression index on the result of a custom immutable function:
CREATE FUNCTION min_price(data jsonb) RETURNS float
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT
RETURN (
SELECT min((offer ->> 'price')::float)
FROM jsonb_each(data) AS entries(name, store),
LATERAL jsonb_array_elements(store) AS elements(offer)
);
CREATE INDEX example_min_data_price_idx ON example (min_price(data));
which you can use as
SELECT * FROM example WHERE min_price(data) < 20;
Looking for rows with a price larger than a certain number requires a separate index on max_price(data). If you want to use the index in a JOIN with more conditions, consider making it a multi-column index.
Looking for row with a price equalling a certain number can be optimised by indexing the jsonb column and using a jsonpath:
CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
SELECT * FROM example WHERE data ## '$.*[*].price == 20';
SELECT * FROM example WHERE data #? '$.*[*] ? (#.price == 20)';
Unfortunately you can't use jsonb_path_ops here since that doesn't support the wildcard.

Properly using gin index on a JSONB flat object

I'm wondering how to correctly create an index that would index the keys of a JSONB column that has the following shape:
{
"perm1": true,
"perm2": false,
"perm3": true
}
Basically I'd like the lookup of perm1, perm2 and perm3 to be the keys to reference. I don't know the values of the keys so I can't using the regular create index. I did find this as a suggestion:
CREATE INDEX idx_visitors ON visitors USING GIN (jsonb jsonb_path_value_ops);
Is there a more appropriate way of creating asserting this index?

Indexing a josnb column in postgresql

I have a column in postgresql table with type jsonb.
{
.....
"type": "car",
"vehicleIds": [
"980e3761-935a-4e52-be77-9f9461dec4d1","980e3761-935a-4e52-be77-9f9461dec4d2"
]
.....
}
Application runs queries against these fields to fetch records. I need to index this column only for these fields.
How can this be done?
This is query structure with properties as the column name:
SELECT *
FROM Vehicle f
WHERE f.properties::text ## CONCAT('$.vehicleIds[*] >', :vehicleId )= true
AND f.properties::text ## CONCAT('$.type >', :type ) = true
The query you are using is highly confusing, as it boils down to be a text search query, as the ## is applied on a text value.
I also don't understand the '$.type > ... condition. With values like car I would expect an equality operator, rather than "greater than". Using > together with a UUID also doesn't seem to make sense.
If you want to search for values of type car and contain a list of IDs, using the "contains" operator #> is a better way to do that:
SELECT *
FROM Vehicle f
WHERE f.properties #> '{"type": "car", "vehicleIds": ["980e3761-935a-4e52-be77-9f9461dec4d1"]}'
The above could make use of a GIN index on the properties column:
create index on vehicles using gin (properties);
If the type key is always queried with equality (which I assume), a combined index might be more efficient:
create index on vehicles using gin ( (properties ->> 'type'), (properties -> 'vehicleIds') );
You need to install the btree_gin extension in order to create that index.
That index would be a bit smaller but needs a different query:
SELECT *
FROM Vehicle f
WHERE f.properties ->> 'type' = 'car'
AND f.properties -> 'vehicleIds' #> '["980e3761-935a-4e52-be77-9f9461dec4d1"]'
You will need to validate if the indexes are used and which ones is more efficient by looking at the execution plan

Postgres jsonb query missing index?

We have the following json documents stored in our PG table (identities) in a jsonb column 'data':
{
"email": {
"main": "mainemail#email.com",
"prefix": "aliasPrefix",
"prettyEmails": ["stuff1", "stuff2"]
},
...
}
I have the following index set up on the table:
CREATE INDEX ix_identities_email_main
ON identities
USING gin
((data -> 'email->main'::text) jsonb_path_ops);
What am I missing that is preventing the following query from hitting that index?? It does a full seq scan on the table... We have tens of millions of rows, so this query is hanging for 15+ minutes...
SELECT * FROM identities WHERE data->'email'->>'main'='mainemail#email.com';
If you use JSONB data type for your data column, in order to index ALL "email" entry values you need to create following index:
CREATE INDEX ident_data_email_gin_idx ON identities USING gin ((data -> 'email'));
Also keep in mind that for JSONB you need to use appropriate list of operators;
The default GIN operator class for jsonb supports queries with the #>,
?, ?& and ?| operators
Following queries will hit this index:
SELECT * FROM identities
WHERE data->'email' #> '{"main": "mainemail#email.com"}'
-- OR
SELECT * FROM identities
WHERE data->'email' #> '{"prefix": "aliasPrefix"}'
If you need to search against array elements "stuff1" or "stuff2", index above will not work , you need to explicitly add expression index on "prettyEmails" array element values in order to make query work faster.
CREATE INDEX ident_data_prettyemails_gin_idx ON identities USING gin ((data -> 'email' -> 'prettyEmails'));
This query will hit the index:
SELECT * FROM identities
WHERE data->'email' #> '{"prettyEmails":["stuff1"]}'

Formatting database when using multiple criteria

This should be a relatively simple question. I come from a Python background and don't do a lot of SQL stuff so thought I would ask this formatting question here.
Say I've got something that has
Criteria 1: True
Criteria 2: False
Criteria N: True
In Postgresql, is it better to set the database up as:
Column: Criteria
Row: [1:True,2:False,N:True]
or set each criteria as a column of its own?
Use three Boolean columns:
CREATE TABLE t (
criteria1 boolean,
criteria2 boolean,
criterian boolean
);
You can then formulate queries that involve these columns:
SELECT *
FROM t
WHERE criteria1 = true
AND criteria2 = false;
or
SELECT *
FROM t
WHERE criteria1 = false
OR criterian = true;
Relational databases are designed to do this. In addition, you can create an index on these columns.