Indexing of tuples in a large PSQL JSONB array - postgresql

I have a JSONB column "deps" like this
[
  [
    { "name": "A" },
    "823"
  ],
  [
    { "name": "B" },
    "332"
  ],
  [
    { "name": "B" },
    "311"
  ]
]
The dictionary (e.g. {"name": "B"}) always comes first in each tuple. The number of tuples (items in the array) varies between zero and 15K.
Is it possible to create an index on the items by the "name" field of the dictionary?
Will such an index improve the performance of a search for all rows containing
"name" = "B" or "name" = "C"?
SELECT *
FROM the_table
WHERE deps @> ANY (
    ARRAY [
        '[[{"name": "B"}]]',
        '[[{"name": "C"}]]'
    ]::jsonb[]
);
I have discovered that the execution time of the query above does not depend much on the size of the array in the ANY argument. It seems that PostgreSQL builds a temporary (?) index when looking for "B", so the subsequent lookup for "C" runs fast.
The table is quite large: ~300 GB, 25M rows.
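For reference, the kind of index usually considered for this containment workload is a GIN index on the column. A minimal sketch, with the table and column names taken from the question:

-- Sketch: a GIN index supporting jsonb containment (@>) queries on deps.
-- jsonb_path_ops only supports the @> operator, but the index is smaller and
-- faster than one built with the default jsonb_ops operator class.
CREATE INDEX the_table_deps_gin
    ON the_table
    USING GIN (deps jsonb_path_ops);

Whether the planner actually uses it for a deps @> ANY (...) expression is worth checking with EXPLAIN; if it does not, rewriting the filter as separate deps @> '...' conditions combined with OR is a common workaround.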

Related

How to add new property in an object nested in 2 arrays (JSONB postgresql)

I'm looking for help adding a property to a JSON object nested inside two arrays.
Table Example :
CREATE TABLE events (
    seq_id   BIGINT PRIMARY KEY,
    data     JSONB NOT NULL,
    evt_type TEXT NOT NULL
);
example of my JSONB data column in my table:
{
  "Id": "1",
  "Calendar": {
    "Entries": [
      {
        "Id": 1,
        "SubEntries": [
          {
            "ExtId": {
              "Id": "10",
              "System": "SampleSys"
            },
            "Country": "FR",
            "Details": [
              {
                "ItemId": "1",
                "Quantity": 10
              },
              {
                "ItemId": "2",
                "Quantity": 3
              }
            ],
            "RequestId": "222",
            "TypeId": 1
          }
        ],
        "OrderResult": null
      }
    ],
    "OtherThingsArray": []
  }
}
So I need to add new properties into a SubEntries object based on the Id value of the ExtId object (the WHERE clause).
How can I do that, please?
Thanks a lot.
You can use jsonb_set() for this. It takes the target path as a text[] (array of text values):
SELECT jsonb_set(
    input_jsonb,
    path_array,
    new_jsonb_value,
    create_if_not_exists
)
where:
- input_jsonb is the starting jsonb document
- path_array has the form '{i,j,k[, ...]}'::text[], where the series {i, j, k} progresses one level at a time, each element being either the (string) key or the (integer, starting at zero) index at that level, with the last element denoting the new key (or index) to populate
- new_jsonb_value is the value to write; if adding a key-value pair, you can use something like to_jsonb('new_value_string'::text) here to force things to format correctly
- create_if_not_exists is a boolean; if adding new keys/indexes, give this as true so they'll be appended, otherwise you'll be limited to overwriting existing keys
Example
json
{
"array1": [
{
"id": 1,
"detail": "test"
}
]
}
SQL
SELECT jsonb_set(
    '{"array1": [{"id": 1, "detail": "test"}]}'::jsonb,
    '{array1,0,upd}'::TEXT[],
    to_jsonb('new'::text),
    true
)
Output
{
"array1": [
{
"id": 1,
"upd": "new",
"detail": "test"
}
]
}
Note that you can only add 1 nominal level of depth at a time (i.e. either a new key or a new index entry), but you can circumvent this by providing the full depth in the assignment value, or by using jsonb_set() iteratively:
SELECT jsonb_set(
    jsonb_set('{"array1": [{"id": 1, "detail": "test"}]}'::jsonb, '{array1,0,upd}'::TEXT[], '[{"new": "val"}]'::jsonb, true),
    '{array1,0,upd,0,check}'::TEXT[],
    '"test_val"',
    true
)
would be required to produce
{
"array1": [
{
"id": 1,
"upd": [
{
"new": "val",
"check": "test_val"
}
],
"detail": "test"
}
]
}
If you need other, more complex logic to evaluate which values need to be added to which objects, you can try:
- dynamically creating a set of jsonb_set() statements for execution
- using the outputs of jsonb_each() and jsonb_array_elements() to evaluate the row logic down at the SubEntries level, and then using jsonb_object_agg() and jsonb_agg() as appropriate to build the document back up to the root level from the resultant object-arrays and key-value collections (a sketch of this follows below)
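As a rough illustration of that second approach, here is a minimal sketch (not from the original answer) against the events table above. The new key "Approved" and the matched ExtId.Id value '10' are placeholders for whatever your actual matching logic is:

-- Sketch: add "Approved": true to every SubEntries object whose ExtId.Id = '10',
-- by exploding the arrays, patching the matching elements, and aggregating back up.
UPDATE events e
SET data = jsonb_set(e.data, '{Calendar,Entries}', rebuilt.entries)
FROM (
    SELECT ev.seq_id,
           -- re-aggregate the patched Entries array per row
           jsonb_agg(jsonb_set(entry, '{SubEntries}', COALESCE(sub.new_subentries, '[]'::jsonb))) AS entries
    FROM events ev
    CROSS JOIN LATERAL jsonb_array_elements(ev.data -> 'Calendar' -> 'Entries') AS e1(entry)
    CROSS JOIN LATERAL (
        -- patch each SubEntries element whose ExtId.Id matches
        SELECT jsonb_agg(
                   CASE
                       WHEN s.sub_entry #>> '{ExtId,Id}' = '10'
                           THEN s.sub_entry || '{"Approved": true}'::jsonb
                       ELSE s.sub_entry
                   END
               ) AS new_subentries
        FROM jsonb_array_elements(entry -> 'SubEntries') AS s(sub_entry)
    ) sub
    GROUP BY ev.seq_id
) rebuilt
WHERE e.seq_id = rebuilt.seq_id;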

Postgres JSONB Editing columns

I was going through the Postgres JSONB documentation but was unable to find a solution for a small issue I'm having.
I've got a table, MY_TABLE,
that has the following columns:
User, Name, Data and Purchased
One thing to note is that "Data" is a jsonb with multiple fields. One of the fields inside "Data" is "Attributes", but the values it can hold are not consistent: it could be a string, a list of strings, an empty list, or just an empty string. However, I want to change this.
The only values that I want to allow are a list of strings and an empty list. I want to convert all the empty strings to empty lists and regular strings to single-element lists.
I have tried using json_build_array but have not had any luck.
So, for example, I'd want my final jsonb to look like:
[{
"Id": 1,
"Attributes": ["Test"]
},
{
"Id": 2,
"Attributes": []
},
{
"Id": 3,
"Attributes": []
}]
when converted from
[{
"Id": 1,
"Attributes": "Test"
},
{
"Id": 2,
"Attributes": ""
},
{
"Id": 3,
"Attributes": []
}
]
I only care about the "Attributes" field inside the JSON, not any other fields.
I also want to ensure that Attributes holding an empty string ("Attributes": "") get mapped to an empty list, not a list containing an empty string ([], not [""]).
I also want to preserve the empty array values ([]) for the Attributes that already hold an empty array value.
This is what I have so far:
jsonb_set(
mycol,
'{Attributes}',
case when js ->> 'Attributes' <> ''
then jsonb_build_array(js ->> 'Attributes')
else '[]'::jsonb
end
)
However, Attributes: [] is getting mapped to ["[]"]
Use jsonb_array_elements() to iterate over the elements of each data cell and jsonb_agg() to regroup the transformed values into an array:
WITH test_data(js) AS (
    VALUES ($$
        [
            {
                "Id": 1,
                "Attributes": "Test"
            },
            {
                "Id": 2,
                "Attributes": ""
            },
            {
                "Id": 3,
                "Attributes": []
            }
        ]
    $$::JSONB)
)
SELECT transformed_elem
FROM test_data
JOIN LATERAL (
    SELECT jsonb_agg(jsonb_set(
               elem,
               '{Attributes}',
               CASE
                   WHEN elem -> 'Attributes' IN ('""', '[]') THEN '[]'::JSONB
                   WHEN jsonb_typeof(elem -> 'Attributes') = 'string' THEN jsonb_build_array(elem -> 'Attributes')
                   ELSE elem -> 'Attributes'
               END
           )) AS transformed_elem
    FROM jsonb_array_elements(test_data.js) AS f(elem) -- iterate over every element in the array
) s ON TRUE
returns
[{"Id": 1, "Attributes": ["Test"]}, {"Id": 2, "Attributes": []}, {"Id": 3, "Attributes": []}]

jsonb searching for keys in array and returning position

I have the following JSON object stored in a jsonb column:
{
  "msrp": 6000,
  "data": [
    {
      "supplier": "a",
      "price": 5775
    },
    {
      "supplier": "b",
      "price": 6129
    },
    {
      "supplier": "c",
      "price": 5224
    },
    {
      "supplier": "d",
      "price": 5775
    }
  ]
}
There are a few things I'm trying to do but am completely stuck on :(
Check if a supplier exists inside this array. So I'm looking up whether "supplier": "e" is in here. Here's what I tried, but it didn't work: where data @> '{"supplier": "e"}'
(Optional but really nice to have) Before returning results from a select *, inject into each array element a "price_diff" so that I can see the difference between msrp and the supplier price, as such:
{
"supplier": "d",
"price": 5775,
"price_diff": 225
}
where data @> '{"supplier": "e"}'
Do you have a column named data? You can't just treat a JSONB key name as if it were a column name.
Containment starts from the root:
colname @> '{"data": [{"supplier": "e"}]}'
You can redefine the 'root' dynamically, though:
colname -> 'data' @> '[{"supplier": "e"}]'

Postgresql get keys from array of objects in JSONB field

Here's some dummy data for the jsonb column:
[ { "name": [ "sun11", "sun12" ], "alignment": "center", "more": "fields" }, { "name": [ "sun12", "sun13" ], "alignment": "center" }, { "name": [ "sun14", "sun15" ] }]
I want to fetch all the name key values from the jsonb array of objects, expecting output like -
[ [ "sun11", "sun12" ], [ "sun12", "sun13" ], [ "sun14", "sun15" ] ]
The problem is that I'm only able to fetch the name key value by giving an index like 0, 1, etc.:
SELECT data->0->'name' FROM public."user";
[ "sun11", "sun12" ]
But I'm not able to get all the name key values from the same array of objects. I just want to get all of them from the array of JSON objects. Any help will be appreciated. Thanks.
demo:db<>fiddle (Final query first, intermediate steps below)
WITH data AS (
    SELECT '[ { "name": [ "sun11", "sun12" ], "alignment": "center", "more": "fields" }, { "name": [ "sun12", "sun13" ], "alignment": "center" }, { "name": [ "sun14", "sun15" ] }]'::jsonb AS jsondata
)
SELECT
    jsonb_agg(elems.value -> 'name')        -- 2
FROM
    data,
    jsonb_array_elements(jsondata) AS elems -- 1
1. jsonb_array_elements() expands every array element into one row.
2. The -> operator gives the array for the name attribute; after that, jsonb_agg() puts all the extracted arrays into one again.
my example
SELECT DISTINCT sub.name FROM (
    SELECT
        jsonb_build_object('name', u.data -> 'name') AS name
    FROM public."user" AS u
    WHERE u.data IS NOT NULL
) sub
WHERE sub.name != '{"name": null}';

Concurrent update of array elements which are embedded documents in MongoDB

I have documents like this one in collection x in MongoDB:
{
    "_id" : ...,
    "attrs" : [
        {
            "key" : "A1",
            "type" : "T1",
            "value" : "13"
        },
        {
            "key" : "A2",
            "type" : "T2",
            "value" : "14"
        }
    ]
}
The A1 and A2 elements above are just examples: the attrs field may hold any number of array elements.
I need to access the attrs array concurrently from several independent clients connecting to MongoDB. For example, consider two clients: one wants to change the value of the element identified by key "A1" to "80", and the other wants to change the value of the element identified by key "A2" to "20". Is there any compact way of doing this using MongoDB operations?
It is important to note that:
Clients don't know the position of each element in the attrs array, only the key of the element whose value has to be modified.
Reading the whole attrs array in client space, searching for the element to modify in client space, then updating attrs with the new array (in which the element to modify has been changed) would involve race conditions.
Clients may also add and remove elements in the array. Thus, doing a first search in MongoDB to locate the position of the element to modify, then updating it using that particular position, doesn't work in general, as elements could have been added or removed in the meantime, altering the position previously found.
The process here is really quite simple, it only varies in where you want to "find or create" the elements in the array.
First, assuming the elements for each key are in place already, then the simple case is to query for the element and update with the index returned via the positional $ operator:
db.collection.update(
    {
        "_id": docId,
        "attrs": { "$elemMatch": { "key": "A1", "type": "T1" } }
    },
    { "$set": { "attrs.$.value": "20" } }
)
That will only modify the element that is matched without affecting others.
In the second case where "find or create" is required and the particular key may not exist, then you use "two" update statements. But the Bulk Operations API allows you to do this in a single request to the server with a single response:
var bulk = db.collection.initializeOrderedBulkOp();

// Try to update where the element exists
bulk.find({
    "_id": docId,
    "attrs": { "$elemMatch": { "key": "A1", "type": "T2" } }
}).updateOne({
    "$set": { "attrs.$.value": "30" }
});

// Try to add where the element does not exist
bulk.find({
    "_id": docId,
    "attrs": { "$not": { "$elemMatch": { "key": "A1", "type": "T2" } } }
}).updateOne({
    "$push": { "attrs": { "key": "A1", "type": "T2", "value": "30" } }
});

bulk.execute();
The basic logic being that first the update attempt is made to match an element with the required values just as done before. The other condition tests for where the element is not found at all by reversing the match logic with $not.
In the case where the array element was not found, a new one can be added via $push.
I should really add that since we are specifically looking for negative matches here, it is always a good idea to match the "document" that you intend to update by some unique identifier such as the _id key. While this is possible with "multi" updates, you need to be careful about what you are doing.
So when running the "find or create" process, the element that was not matched is added to the array correctly, without interfering with other elements; the previous update for an expected match is applied in the same way:
{
    "_id" : ObjectId("55b570f339db998cde23369d"),
    "attrs" : [
        {
            "key" : "A1",
            "type" : "T1",
            "value" : "20"
        },
        {
            "key" : "A2",
            "type" : "T2",
            "value" : "14"
        },
        {
            "key" : "A1",
            "type" : "T2",
            "value" : "30"
        }
    ]
}
This is a simple pattern to follow, and of course the Bulk Operations here remove any overhead involved in sending and receiving multiple requests to and from the server. All of this happily works without interfering with other elements that may or may not exist.
Aside from that, there are the extra benefits of keeping the data in an array for easy querying and analysis with the standard operators, without the need to resort to server-side JavaScript processing to traverse the elements.