How to create a GIN index on nested arrays in PostgreSQL


I have a table with column "data" of jsonb type. The structure looks like this (synthetic simplified example):
[
{
"object": "car",
"fields": [
{ "id": "price", "value": "200000" },
{ "id": "weight", "value": "600" },
{ "id": "color", "value": "green" }
]
},
{
"object": "wheel1",
"fields": [
{ "id": "diameter", "value": "16" },
{ "id": "height", "value": "20" }
]
},
{
"object": "wheel2",
"fields": [
{ "id": "diameter", "value": "17" },
{ "id": "height", "value": "30" }
]
}
]
The goal is to find rows that contain objects with certain field properties, like so:
SELECT * FROM cars
WHERE data @> '[{"fields": [{"id":"diameter", "value":"16"}, {"id":"height", "value":"20"}]}]'
AND data @> '[{"fields": [{"id":"price", "value":"200000"}, {"id":"weight", "value":"600"}]}]';
EXPLAIN, of course, shows that a Seq Scan is used. I know how to create a GIN index on objects in an array:
CREATE INDEX cars_data_idx ON cars USING gin ((data->'some_array') jsonb_path_ops);
However, I'm unable to create a proper index on the "fields" arrays, which sit in objects inside the main array. Is this possible at all? What is the correct syntax for it?
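For reference, the containment test (`@>`) used in the query can be modelled outside the database. The following is a simplified Python sketch of jsonb containment semantics, not PostgreSQL's implementation: the function name `contains` is hypothetical, numbers are compared strictly by type, and the special case where an array contains a bare scalar is ignored. It illustrates that each pattern is only matched when a single object's "fields" array holds all the listed pairs.

```python
def contains(doc, pattern):
    """Simplified model of jsonb containment (doc @> pattern)."""
    if isinstance(pattern, dict):
        # Every key of the pattern must exist in doc with a contained value.
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in pattern.items()
        )
    if isinstance(pattern, list):
        # Every pattern element must be contained in some element of doc;
        # order and duplicates do not matter.
        return isinstance(doc, list) and all(
            any(contains(d, p) for d in doc) for p in pattern
        )
    # Scalars must be equal (and of the same type in this simplified model).
    return type(doc) is type(pattern) and doc == pattern


data = [
    {"object": "car", "fields": [
        {"id": "price", "value": "200000"},
        {"id": "weight", "value": "600"},
        {"id": "color", "value": "green"}]},
    {"object": "wheel1", "fields": [
        {"id": "diameter", "value": "16"},
        {"id": "height", "value": "20"}]},
    {"object": "wheel2", "fields": [
        {"id": "diameter", "value": "17"},
        {"id": "height", "value": "30"}]},
]

wheel = [{"fields": [{"id": "diameter", "value": "16"},
                     {"id": "height", "value": "20"}]}]
car = [{"fields": [{"id": "price", "value": "200000"},
                   {"id": "weight", "value": "600"}]}]
# These two pairs live in different wheels, so no single object matches.
mixed = [{"fields": [{"id": "diameter", "value": "16"},
                     {"id": "height", "value": "30"}]}]

row_matches = contains(data, wheel) and contains(data, car)
mixed_matches = contains(data, mixed)
```

With the sample document, both patterns from the query match, while the mixed pattern does not.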

Related

How to select filtered postgresql jsonb field with performance prioritization?

A table:
CREATE TABLE events_holder(
id serial primary key,
version int not null,
data jsonb not null
);
The data field can be very large (up to 100 MB) and looks like this:
{
"id": 5,
"name": "name5",
"events": [
{
"id": 255,
"name": "festival",
"start_date": "2022-04-15",
"end_date": "2023-04-15",
"values": [
{
"id": 654,
"type": "text",
"name": "importance",
"value": "high"
},
{
"id": 655,
"type": "boolean",
"name": "epic",
"value": "true"
}
]
},
{
"id": 256,
"name": "discovery",
"start_date": "2022-02-20",
"end_date": "2022-02-22",
"values": [
{
"id": 711,
"type": "text",
"name": "importance",
"value": "low"
},
{
"id": 712,
"type": "boolean",
"name": "specificAttribute",
"value": "false"
}
]
}
]
}
I want to select the data field by version, but filtered with an extra condition: only events whose end_date > '2022-03-15' should be kept. The output must look like this:
{
"id": 5,
"name": "name5",
"events": [
{
"id": 255,
"name": "festival",
"start_date": "2022-04-15",
"end_date": "2023-04-15",
"values": [
{
"id": 654,
"type": "text",
"name": "importance",
"value": "high"
},
{
"id": 655,
"type": "boolean",
"name": "epic",
"value": "true"
}
]
}
]
}
How can I do this with maximum performance? How should I index the data field?
My primary solution:
with cte as (
select eh.id, eh.version, jsonb_agg(events) as filteredEvents from events_holder eh
cross join jsonb_array_elements(eh.data #> '{events}') as events
where version = 1 and (events ->> 'end_date')::timestamp >= '2022-03-15'::timestamp
group by id, version
)
select jsonb_set(data, '{events}', cte.filteredEvents) from events_holder, cte
where events_holder.id = cte.id;
But I don't think it's a good approach.

You can do this using a JSON path expression:
select eh.id, eh.version,
jsonb_path_query_array(data,
'$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())')
from events_holder eh
where eh.version = 1
and eh.data @? '$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())'
Given your example JSON, this returns:
[
{
"id": 255,
"name": "festival",
"values": [
{
"id": 654,
"name": "importance",
"type": "text",
"value": "high"
},
{
"id": 655,
"name": "epic",
"type": "boolean",
"value": "true"
}
],
"end_date": "2023-04-15",
"start_date": "2022-04-15"
}
]
Depending on your data distribution, a GIN index on data or an index on version could help.
If you need to reconstruct the whole JSON content but with just a filtered events array, you can do something like this:
select (data - 'events')||
jsonb_build_object('events', jsonb_path_query_array(data, '$.events[*] ? (@.end_date.datetime() >= "2022-03-15".datetime())'))
from events_holder eh
...
(data - 'events') removes the events key from the JSON. The result of the JSON path query is then appended back to that (partial) object.
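To check the expected result outside the database, the same "drop the events key, then append the filtered array" idea can be sketched in Python. This is only an illustration of the logic, not a replacement for the SQL; the function name `filter_events` is hypothetical and the sample document is a trimmed version of the one in the question.

```python
from datetime import date


def filter_events(doc, cutoff):
    """Return a copy of doc whose "events" keep only entries ending on or after cutoff."""
    kept = [
        event for event in doc.get("events", [])
        if date.fromisoformat(event["end_date"]) >= cutoff
    ]
    # Mirror (data - 'events') || {"events": ...}: drop the key, then add it back.
    rest = {key: value for key, value in doc.items() if key != "events"}
    return {**rest, "events": kept}


doc = {
    "id": 5,
    "name": "name5",
    "events": [
        {"id": 255, "name": "festival",
         "start_date": "2022-04-15", "end_date": "2023-04-15"},
        {"id": 256, "name": "discovery",
         "start_date": "2022-02-20", "end_date": "2022-02-22"},
    ],
}

result = filter_events(doc, date(2022, 3, 15))
```

Here `result` keeps the top-level "id" and "name" and only the "festival" event; the original `doc` is left unchanged.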

Transform nested JSON in data factory to sql

I'm new to Data Factory. I have a JSON file that I need to manipulate, but I can't figure out how to go about it. Each field has a generic "Name" property, but the value of "Name" should become the key name. How can I use that value as the key?
So far I have been getting Complex JSON errors. This JSON comes from a file store.
[
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name1",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Birthday Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "12/1/1999"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "John"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Smith"
}
]
}
}
]
},
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name2",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Entry Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "4/3/2010"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Jane"
},
{
"Name": "LastName",
"$type": "Text",
"Value": "Doe"
}
]
}
}
]
}
]
Expected output
DocumentData: [
{
"Form":"Birthday Form",
"Date": "12/1/1999",
"FirstName": "John",
"LastName": "Smith"
},
{
"Form":"Entry Form",
"Date": "4/3/2010",
"FirstName": "Jane",
"LastName": "Doe"
}
]

@jaimers,
I was able to achieve it by using the Data Flow activity.
Below is the complete data flow:
1) Source1
This step gets the data from the source. You will have to configure the source dataset.
The only change I made in the source was to convert Fields.Name, Fields.Type and Fields.Value to string[] (from string).
This is required to create key-value pairs from the fields in the subsequent steps.
2) Flatten1
I used Flatten at the Document level
and got the values of DocumentData.DocumentName and DocumentData.Fields.
Note: if you don't want DocumentData.DocumentName, you can safely ignore it.
3) DerivedColumn1
This is the actual step where I convert Name: key1, Value: value1 into key1: value1.
To do that, I used the expression below:
keyValues(Fields.Name,Fields.Value)
Note: the keyValues() function expects two array arguments. Hence, in the first step we changed the type of Fields.Name and Fields.Value to array.
4) Select
Just selects the columns that need to be sent as output.
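The pairing done in the DerivedColumn step can be pictured with a small Python sketch. This only models the idea of zipping an array of names with an array of values into one object; it is not Data Factory code, and the helper name `key_values` is hypothetical. The fields are taken from the second document in the question.

```python
def key_values(names, values):
    """Pair two equally ordered arrays into a single name -> value mapping."""
    return dict(zip(names, values))


fields = [
    {"Name": "Form", "$type": "Text", "Value": "Entry Form"},
    {"Name": "Date", "$type": "Text", "Value": "4/3/2010"},
    {"Name": "FirstName", "$type": "Text", "Value": "Jane"},
    {"Name": "LastName", "$type": "Text", "Value": "Doe"},
]

# Equivalent of turning Fields.Name and Fields.Value into arrays first.
names = [field["Name"] for field in fields]
values = [field["Value"] for field in fields]

row = key_values(names, values)
```

In this Python model a repeated name keeps the last value; how duplicates are handled in Data Factory itself is not covered by this sketch.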

You mentioned SQL in your title, so if you have access to a SQL database, e.g. Azure SQL DB, it is quite capable of manipulating JSON, e.g. using OPENJSON and FOR JSON PATH. A simple example:
DECLARE @json VARCHAR(MAX) = '[
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name1",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Birthday Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "12/1/1999"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "John"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Smith"
}
]
}
}
]
},
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name2",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Entry Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "4/3/2010"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Jane"
},
{
"Name": "LastName",
"$type": "Text",
"Value": "Doe"
}
]
}
}
]
}
]'
-- Restructure the JSON and add a root
SELECT *
FROM OPENJSON ( @json )
WITH
(
Form VARCHAR(50) '$.Documents[0].DocumentData.Fields[0].Value',
[Date] DATE '$.Documents[0].DocumentData.Fields[1].Value',
FirstName VARCHAR(50) '$.Documents[0].DocumentData.Fields[2].Value',
LastName VARCHAR(50) '$.Documents[0].DocumentData.Fields[3].Value'
)
FOR JSON PATH, ROOT('DocumentData');
NB: I've used the ROOT clause to add a root to the JSON document. You could make @json a stored procedure parameter and use a Stored Procedure activity from the pipeline.

Using GraphQL, Spring Boot, MongoDB. The JSON is 1000+ lines and deeply nested. Instead of updating the whole doc, I need to update a specific key-value at any position

The mutation is required to behave like an upsert. For example, in the JSON below the mutation has to change "status" under the Rooms -> Availability section. I do not want to hard-code the path like this
db.collection.update({
'Rooms.Availability.status': true
},{
$set : {
'Rooms.Availability.status': false
}
})
for a specific array index, because the "status" or "Availability" key may be missing in some other documents.
Below is a similar JSON structure. Keys can be different in other JSON documents within the same collection.
@GraphQLMutation(name = "updateHotelDetails")
public Hotel updateHotelDetails(Hotel h){ // Do I need to pass the whole object as an argument, or only the specific key-value?
mongoTemplate.updateFirst(....); // How can I write the update code without hard-coding?
}
Document 1:
{
"_id" : ObjectId("71testsrtdtsea6995432"),
"HotelName": "Test71testsrtdtsea699fff",
"Description": ".....",
"Address": {
"Street": "....",
"City": "....",
"State": "...."
},
"Rooms": [
{
"Description": "......",
"Type": ".....",
"Price": "....."
"Availability": [
"status": false,
"readOnly": false
]
},
{
"Description": "......",
"Type": "....",
"Price": "..."
"Availability": [
"status": true,
"readOnly": false
]
"newDynamickey": [
{}
]
},
]
"AdditionalData": [
{
"key1": "Vlaue1",
"key2":"Value2"
},
{...}
]
}

PostgreSql jsonb field to view

I have this kind of jsonb data in a column named "FORM" in my table "process". I want to create a view from the data inside the array named "field" within each "row" object: I only need the name and the value of each entry in that array.
Here is the jsonb:
{
"column": [
{
"row": {
"id": "ebc7afddad474aee8f82930b6dc328fe",
"name": "Details",
"field": [
{
"name": {
"id": "50a5613e97e04cb5b8d32afa8a9975d1",
"label": "name"
},
"value": {
"stringValue": "yhfghg"
}
}
]
}
},
{
"row": {
"id": "5b7471413cbc44c1a39895020bf2ec58",
"name": "leave details",
"field": [
{
"name": {
"id": "bb127e8284c84692aa217539c4312394",
"label": "date"
},
"value": {
"dateValue": 1549065600
}
},
{
"name": {
"id": "33b2c5d1a968481d9d5e386db487de52",
"label": "days",
"options": {
"allowedValues": [
{
"item": "1"
},
{
"item": "2"
},
{
"item": "3"
},
{
"item": "4"
},
{
"item": "5"
}
]
},
"defaultValue": {
"radioButtonValue": "1"
}
},
"value": {
"radioButtonValue": "3"
}
}
]
}
}
]
}
I want the view to return this kind of jsonb, built from the subarray called "field" inside the object named "row":
[
{
"name": {
"id": "50a5613e97e04cb5b8d32afa8a9975d1"
},
"value": {
"stringValue": "yhfghg"
}
},
{
"name": {
"id": "bb127e8284c84692aa217539c4312394"
},
"value": {
"dateValue": 1549065600
}
},
{
"name": {
"id": "33b2c5d1a968481d9d5e386db487de52"
},
"value": {
"radioButtonValue": "3"
}
}
]
How can I do this?

I used jsonb_array_elements twice to expand the two arrays, then used json_build_object to build the result structure and jsonb_agg to combine the several rows generated above into a single JSONB array.
I included a row number in the results so I could later apply group by, so that results from several "process" rows would not be accidentally combined by jsonb_agg.
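-- n tags each "process" row so that its fields are aggregated separately
with cols as (select jsonb_array_elements( "FORM" ->'column') as r
,row_number() over () as n from "process" )
,cols2 as (select jsonb_array_elements(r->'row'->'field') as v
,n from cols)
-- keep only name.id and value, matching the requested structure
select jsonb_agg(json_build_object('name',json_build_object('id',v->'name'->'id'),'value',v->'value'))
from cols2 group by n;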
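The two-level expansion can be sanity-checked outside the database. The Python sketch below is only a model of the intended result, not part of the view; the function name `extract_fields` is hypothetical and the sample is a trimmed version of the jsonb from the question. It walks "column", then each row's "field" array, and keeps only name.id and value.

```python
def extract_fields(form):
    """Collect {"name": {"id": ...}, "value": ...} for every field of every row."""
    result = []
    for column in form.get("column", []):          # first expansion: "column"
        for field in column["row"].get("field", []):  # second expansion: "field"
            result.append({
                "name": {"id": field["name"]["id"]},
                "value": field["value"],
            })
    return result


form = {
    "column": [
        {"row": {"id": "ebc7afddad474aee8f82930b6dc328fe", "name": "Details", "field": [
            {"name": {"id": "50a5613e97e04cb5b8d32afa8a9975d1", "label": "name"},
             "value": {"stringValue": "yhfghg"}},
        ]}},
        {"row": {"id": "5b7471413cbc44c1a39895020bf2ec58", "name": "leave details", "field": [
            {"name": {"id": "bb127e8284c84692aa217539c4312394", "label": "date"},
             "value": {"dateValue": 1549065600}},
            {"name": {"id": "33b2c5d1a968481d9d5e386db487de52", "label": "days"},
             "value": {"radioButtonValue": "3"}},
        ]}},
    ]
}

fields = extract_fields(form)
```

For the sample, `fields` holds three entries in document order, each reduced to the name id and the value object.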

How do I filter a collection for a null foreign key or object relation in BackAnd?

I'd like to filter a collection for a null foreign key or object relation in BackAnd.
filter = [
{
"fieldName": "Parent",
"operator": "empty",
"value": ""
}
]
Here's my table/object definition:
{
"name": "Tree",
"fields": {
"Title": {
"type": "string"
},
"Description": {
"type": "string"
},
"Parent": {
"object": "Certifications"
},
"Children": {
"collection": "Certifications",
"via": "Parent"
}
}
}
When I try the filter above, I get this error:
The field "FK_Tree_Tree_Parent" is a relation field.
To filter relation fields please use the operator "in"
This just returns all of the values in the table:
filter = [
{
"fieldName": "Parent",
"operator": "in",
"value": ""
}
]
Is it possible to get back the records that don't have a parent assigned to them?

You can use Queries with the following SQL:
SELECT * FROM Tree WHERE Parent IS NULL
