Using an index for jsonb sub keys in PostgreSQL

I have a table like:
create table items
(
    id  int constraint items_pk primary key,
    acl jsonb
);
With items such as this one (id = 1, acl shown below):
{
    "users": {
        "2": {          <-- the key "2" is the user_id
            "role1": {...},
            "role2": {...}
        },
        "3": {
            "role1": {...}
        }
    },
    "groups": {...}
}
...
I want to count the number of items where the user "2" has the role "role2". This is what I run:
SELECT COUNT(*) FROM items WHERE ( acl->'users'->'2' ? 'role2')
The problem is that I want this query to use an index, but I can't get it to use one. Here are the indexes I set up:
CREATE INDEX _index1 ON items using gin (acl jsonb_ops);
CREATE INDEX _index2 ON items using gin ((acl->'users') jsonb_ops);
Then I tried the following query, which does use an index, but it is about 40x slower than the first one, so it is useless. It also requires matching a complete value, whereas I just want to verify the presence of the "role2" key in acl->'users'->'2'.
SELECT COUNT(*) FROM items WHERE ( acl #> '{"users": {"2": {"role2": {...}}}}');
My question is: how can I make this query use an index while keeping my current JSON data structure?
I know I could use string arrays and various other layouts to make this use case work, but they all imply changing the data structure, and that is not the point here: this structure is already used at scale, and I want to know whether something is possible with it as-is.
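One approach worth trying, which can use the existing GIN index, is to rewrite the key-presence test as a containment test with @>, using an empty object as a wildcard for the role's value. This is a sketch and assumes every role value is itself an object, as in the sample:
-- @> is supported by the GIN index on the whole column (_index1).
-- '{}' is contained in any object, so this only checks that the
-- key "role2" exists under acl -> 'users' -> '2'.
SELECT COUNT(*)
FROM items
WHERE acl @> '{"users": {"2": {"role2": {}}}}';
If @> is the only operator you need, a GIN index built with jsonb_path_ops instead of the default jsonb_ops is typically smaller and faster for this kind of containment query.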

Related

Postgres - Query for nested value in nested array

I have a table called clients with a column called configs containing a JSON object:
{
    "pos_link": {
        "primary_id": "000123",
        "sub_ids": ["000123", "000124", "00125", "000126"]
    },
    "prime_tags": {
        "tags": ["Children"]
    }
}
How do I find all entries where one of the sub_ids is '000124'? My attempt:
select *
from clients c,
jsonb_array_elements(c.configs->'pos_link') pos_link,
jsonb_array_elements(pos_link->'sub_ids') sub_ids
where sub_id IN ('00124')
You can use the key-existence operator ?:
select *
from clients
where configs -> 'pos_link' -> 'sub_ids' ? '000124';
This assumes that configs is defined as jsonb (which it should be). If it's not, you need to cast it: configs::jsonb
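If this lookup needs an index, the ? operator can be served by a GIN expression index on exactly that path (a sketch; the index name is made up):
-- The default jsonb_ops operator class supports the ? operator.
CREATE INDEX clients_sub_ids_idx
    ON clients USING gin ((configs -> 'pos_link' -> 'sub_ids'));
The query must use the same expression, configs -> 'pos_link' -> 'sub_ids', for the planner to match the index.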

How to get postgres to use index for wild card query on JSONB column?

I have a table in a postgres DB that looks like the following:
CREATE TABLE docs (
    id SERIAL PRIMARY KEY,
    doc JSONB
);
The JSON documents look like the following:
{
    "meta": {
        "name": "something",
        ...
    },
    ...
}
I want to run wildcard queries on the table, as follows:
SELECT id FROM docs WHERE doc -> 'meta' ->> 'name' LIKE '%Adam%';
However, this search does not use any index. I have tried creating indexes like the following:
CREATE INDEX myindex ON docs USING GIN ((doc));
CREATE INDEX myindex ON docs USING GIN ((doc -> 'meta' -> 'name'));
CREATE INDEX myindex ON docs ((doc -> 'meta' -> 'name'));
When I EXPLAIN the query, I always find that the index is not used. How do I execute this query and have Postgres use the index?
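None of those indexes can help with an infix pattern: a plain B-tree expression index only supports left-anchored LIKE, and the jsonb GIN operator classes do not support LIKE at all. One approach that can serve '%Adam%' is a trigram index on the extracted text, sketched here assuming the pg_trgm extension can be installed (the index name is illustrative):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- gin_trgm_ops indexes trigrams of the text, which supports LIKE/ILIKE '%...%'.
CREATE INDEX docs_meta_name_trgm
    ON docs USING gin ((doc -> 'meta' ->> 'name') gin_trgm_ops);
The query has to use the same expression, doc -> 'meta' ->> 'name', for the planner to consider the index.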

Jsonb object parsing in PostgreSQL

How do I parse a jsonb object in PostgreSQL? The problem is that the internal structure of the object is different every time, like the one below.
{
    "1": {
        "1": {
            "level": 2,
            "nodeType": 2,
            "id": 2,
            "parentNode": 1,
            "attribute_id": 363698007,
            "attribute_text": "Finding site",
            "concept_id": 386108004,
            "description_text": "Heart tissue",
            "hierarchy_id": 0,
            "description_id": -1,
            "deeperCnt": 0,
            "default": false
        },
        "level": 1,
        "nodeType": 1,
        "id": 1,
        "parentNode": 0,
        "concept_id": 22253000,
        "description_id": 37361011,
        "description_text": "Pain",
        "hierarchy_id": 404684003,
        "deeperCnt": 1,
        "default": false
    },
    "2": {
        "1": {
            "attribute_id": "363698007",
            "attribute_text": "Finding site (attribute)",
            "value_id": "321667001",
            "value_text": "Respiratory tract structure (body structure)",
            "default": true
        },
        "level": 1,
        "nodeType": 1,
        "id": 3,
        "parentNode": 0,
        "concept_id": 11833005,
        "description_id": 20419011,
        "description_text": "Dry cough",
        "hierarchy_id": 404684003,
        "deeperCnt": 1,
        "default": false
    },
    "level": 0,
    "recAddedLevel": 1,
    "recAddedId": 3,
    "nodeType": 0,
    "multiple": false,
    "currNodeId": 3,
    "id": 0,
    "lookForAttributes": false,
    "deeperCnt": 2
}
So how should I parse the whole object and, for example, check whether any nested object has "attribute_id" = 363698007?
If it does, the row should be selected by the WHERE clause of a SELECT in PostgreSQL.
Second question: what index should I use on the jsonb column to get the desired results?
I have already tried creating btree and GIN indexes, but even a simple select returns NULL with SQL like this:
SELECT object::jsonb -> 'id' AS id
FROM table;
whereas this:
SELECT object
FROM table;
returns the object shown above.
The quick and dirty way (building on Collect Recursive JSON Keys In Postgres):
WITH RECURSIVE doc_key_and_value_recursive(id, key, value) AS (
    -- Anchor: every top-level key/value pair of every document.
    SELECT
        my_json.id,
        t.key,
        t.value
    FROM my_json, jsonb_each(my_json.data) AS t
    UNION ALL
    -- Recurse into values that are themselves objects; scalar values
    -- get an empty object, which yields no rows and stops the recursion.
    SELECT
        doc_key_and_value_recursive.id,
        t.key,
        t.value
    FROM doc_key_and_value_recursive,
        jsonb_each(CASE
            WHEN jsonb_typeof(doc_key_and_value_recursive.value) <> 'object' THEN '{}'::jsonb
            ELSE doc_key_and_value_recursive.value
        END) AS t
)
SELECT t.id, t.data->'id' AS id
FROM doc_key_and_value_recursive AS c
INNER JOIN my_json AS t ON (t.id = c.id)
WHERE
    jsonb_typeof(c.value) <> 'object'
    AND c.key = 'attribute_id'
    AND c.value = '363698007'::jsonb;
Online example: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=57b7c4e817b2dd6580bbf28cbac10981
This could be improved a lot, e.g. by stopping the recursion as soon as the relevant key and value are found, by sorting in reverse with LIMIT 1, and so on. But it does the basic thing generically.
It also shows that jsonb->'id' does work as expected.
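One caveat: in the sample document attribute_id appears both as a number (363698007) and as a string ("363698007"), and the comparison above only matches the numeric form. A hedged tweak to the last condition would catch both:
AND (c.value = '363698007'::jsonb OR c.value = '"363698007"'::jsonb);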

How to query an n:n adjacency list map in DynamoDB without using scan

I'm trying to model a cataloging system in DynamoDB. It has "Catalogs", which contain "Collections". Each "Collection" can be tagged by many "Tags".
In an RDBMS I would create a table "Catalogs" with a 1:n relationship to "Collections". "Collections" would have an n:n relationship with "Tags", since a Collection can have multiple Tags and a Tag can belong to multiple Collections.
The queries I want to run are:
1) Get all catalogs
2) Get catalog by ID
3) Get collections by catalog ID
I read on AWS I can use the adjacency list map design (because I have the n:n with "Tags"). So here is my table structure:
PK      SK      name
cat-1   cat-1   Sales Catalog
cat-1   col-1   Sales First Collection
cat-1   col-2   Sales Second Collection
cat-2   cat-2   Finance Catalog
tag-1   tag-1   Recently Added Tag
col-1   tag-1   (collection, tag relationship)
The problem here is that getting all "Catalogs" requires a scan, which I understand to be inefficient, because a query's PK condition has to be '=' rather than 'begins_with'.
The only thing I can think of is creating another attribute like "GSI_PK" and adding "Catalog_1" when the PK is cat-1 and the SK is cat-1, "Catalog_2" when the PK is cat-2 and the SK is cat-2. I've never really seen this done, so I'm not sure if it's the way to go, and it takes some maintenance if I ever want to change IDs.
Any ideas how I would accomplish this?
In that case, you can make the PK the type of the object and the SK a uuid. A record would look like this: { PK: "Catalog", SK: "uuid", ...other catalog fields }. You can then get all catalogs by querying on PK = "Catalog".
To store the associations, you can have a GSI on two fields, sourcePK and relatedPK, where you store records that associate things. To associate two objects you would create a record like { PK: "Association", SK: "uuid", sourcePK: "catalog-1", relatedPK: "collection-1", ...other data on the association }. To find objects associated with the "Catalog" with id 1, you would query the GSI where sourcePK = "catalog-1".
With this setup you need to be careful about hot keys and should make sure you never have more than 10GBs of data under the same partition key in a table or index.
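Concretely, that layout might look like this (the uuids and ids are made up for illustration):
PK            SK       sourcePK    relatedPK      name
Catalog       uuid-1                              Sales Catalog
Catalog       uuid-2                              Finance Catalog
Association   uuid-3   catalog-1   collection-1
A query with PK = "Catalog" returns all catalogs, and a query on the GSI with sourcePK = "catalog-1" returns everything associated with that catalog.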
Let's walk through it. I'll use GraphQL SDL to lay out the design of the data model and queries, but you can apply the same concepts to DynamoDB directly.
Thinking data model first we will have something like:
type Catalog {
    id: ID!
    name: String
    # Use a DynamoDB query on the **Collection** table
    # where the **catalogId = $ctx.source.id**. Use a GSI or make catalogId the PK.
    collections: [Collection]
}

type Collection {
    id: ID!
    name: String
    # Use a DynamoDB query on the **CollectionTag** table where
    # the **collectionId = $ctx.source.id**. Use a GSI or make the collectionId the PK.
    tags: [CollectionTag]
}

# The "association map" idea as a GraphQL type. The underlying table has a collectionId and tagId.
# Create objects of this type to associate a collection and tag in the many-to-many relationship.
type CollectionTag {
    # Do a GetItem on the **Collection** table where **id = $ctx.source.collectionId**
    collection: Collection
    # Do a GetItem on the **Tag** table where **id = $ctx.source.tagId**
    tag: Tag
}

type Tag {
    id: ID!
    name: String
    # Use a DynamoDB query on the **CollectionTag** table where
    # the **tagId = $ctx.source.id**. If collectionId is the PK then make a GSI where this tagId is the PK.
    collections: [CollectionTag]
}

# Root-level queries
type Query {
    # GetItem on the **Catalog** table where **id = $ctx.args.id**
    getCatalog(id: ID!): Catalog
    # Scan on the **Catalog** table. As long as you don't care about ordering on a field in particular,
    # this will likely be okay at the top level. If you only want all catalogs where "arePublished = 1",
    # for example, then we would likely change this.
    allCatalogs: [Catalog]
    # Note: You don't really need a getCollectionsByCatalogId(catalogId: ID!) at the top level because you can
    # use `query { getCatalog(id: "***") { collections { ... } } }`, which is effectively the same thing.
    # You could add another field here if having it at the top level was a requirement.
    getCollectionsByCatalogId(catalogId: ID!): [Collection]
}
Note: Everywhere I use [Collection], [Catalog], etc. above, you should use a CollectionConnection, CatalogConnection, etc. wrapper type to enable pagination.

PostgreSQL json select from values in a second layer of nested arrays

I have a jsonb column 'data' that contains tree-like JSON, for example:
{
    "libraries": [
        {
            "books": [
                {
                    "name": "mybook",
                    "type": "fiction"
                },
                {
                    "name": "yourbook",
                    "type": "comedy"
                },
                {
                    "name": "hisbook",
                    "type": "fiction"
                }
            ]
        }
    ]
}
I want to be able to run an index-using query that selects a value from the nested "book" objects according to their type:
so, all book names whose type is fiction.
I was able to do this with a join query using jsonb_array_elements, but as I understand it, that is not optimized by a GIN index.
My query is:
select books -> 'name'
from data,
     jsonb_array_elements(data -> 'libraries') libraries,
     jsonb_array_elements(libraries -> 'books') books
where books ->> 'type' = 'fiction'
If the example data you are showing is typical of your JSON, I would suggest that you may be setting things up wrong.
Why not create a library table and a book table and not use JSON at all? JSON does not seem to be the right choice here.
CREATE TABLE library
(
    id serial,
    name text
);
CREATE TABLE book
(
    isbn BIGINT,
    name text,
    book_type text
);
CREATE TABLE library_books
(
    library_id integer,
    isbn BIGINT
);
select book.*
from library_books
join book on book.isbn = library_books.isbn
where library_books.library_id = 1;
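If the JSON structure has to stay, a containment predicate can at least let a GIN index narrow the candidate rows before unnesting (a sketch, assuming a table t with the jsonb column data):
CREATE INDEX t_data_gin ON t USING gin (data);
-- @> descends through nested objects and arrays, so the index can filter
-- to rows containing at least one fiction book; the joins then extract
-- the matching names from those rows only.
SELECT books ->> 'name'
FROM t,
     jsonb_array_elements(t.data -> 'libraries') libraries,
     jsonb_array_elements(libraries -> 'books') books
WHERE t.data @> '{"libraries": [{"books": [{"type": "fiction"}]}]}'
  AND books ->> 'type' = 'fiction';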