Postgres find in jsonb nested array

I have a case where my data is in nested jsonb arrays, and to find a value I have to call JSONB_ARRAY_ELEMENTS multiple times, which is costly and takes a lot of nested code.
The JSON has countries nested inside continents, and cities nested inside countries.
I need to access a city value.
Is there a way to make this query simpler and faster?
I was trying to solve it using JSON_EXTRACT_PATH, but to get into an array that way I need the indexes.
WITH mydata AS (
SELECT '
{
"continents":[
{
"name":"America",
"area":43316000,
"countries":[
{
"country_name":"Canada",
"capital":"Toronto",
"cities":[
{
"city_name":"Ontario",
"population":2393933
},
{
"city_name":"Quebec",
"population":12332
}
]
},
{
"country_name":"Brazil",
"capital":"Brasilia",
"cities":[
{
"city_name":"Sao Paolo",
"population":34534534
},
{
"city_name":"Rio",
"population":445345
}
]
}
]
},
{
"name":"Europa",
"area":10530751,
"countries":[
{
"country_name":"Switzerland",
"capital":"Zurich",
"cities":[
{
"city_name":"Ginebra",
"population":4564565
},
{
"city_name":"Basilea",
"population":4564533
}
]
},
{
"country_name":"Norway",
"capital":"Oslo",
"cities":[
{
"city_name":"Oslo",
"population":3243534
},
{
"city_name":"Steinkjer",
"population":4565465
}
]
}
]
}
]
}
'::JSONB AS data_column
)
SELECT cit.city->>'city_name' AS city,
(cit.city->>'population')::INTEGER AS population
FROM (SELECT JSONB_ARRAY_ELEMENTS(coun.country->'cities') AS city
FROM (SELECT JSONB_ARRAY_ELEMENTS(cont.continent->'countries') AS country
FROM (SELECT JSONB_ARRAY_ELEMENTS(data_column->'continents') AS continent
FROM mydata
) AS cont
WHERE cont.continent @> '{"name":"Europa"}'
) AS coun
WHERE coun.country @> '{"country_name" : "Norway"}'
) AS cit
WHERE cit.city @> '{"city_name": "Oslo"}'
See my nested queries? They look ugly. I can get the answer using JSONB_EXTRACT_PATH(data_column->'continents', '1', 'countries', '1', 'cities', '0', 'population'), but I had to hardcode the array indexes.
Hope you can help me out.
Thanks.

You don't need any nesting; you can use lateral queries:
SELECT
city->>'city_name' AS city,
(city->>'population')::INTEGER AS population
FROM
mydata,
JSONB_ARRAY_ELEMENTS(data_column->'continents') AS continent,
JSONB_ARRAY_ELEMENTS(continent->'countries') AS country,
JSONB_ARRAY_ELEMENTS(country->'cities') AS city
WHERE continent ->> 'name' = 'Europa'
AND country ->> 'country_name' = 'Norway'
AND city ->> 'city_name' = 'Oslo';
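The commas in that FROM clause are implicit lateral joins: each set-returning function can refer to columns produced by the items before it. Written out explicitly, with identical semantics, it is:
SELECT
city->>'city_name' AS city,
(city->>'population')::INTEGER AS population
FROM mydata
CROSS JOIN LATERAL JSONB_ARRAY_ELEMENTS(data_column->'continents') AS continent
CROSS JOIN LATERAL JSONB_ARRAY_ELEMENTS(continent->'countries') AS country
CROSS JOIN LATERAL JSONB_ARRAY_ELEMENTS(country->'cities') AS city
WHERE continent ->> 'name' = 'Europa'
AND country ->> 'country_name' = 'Norway'
AND city ->> 'city_name' = 'Oslo';
Given the sample data, this returns a single row: Oslo with population 3243534.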
However, since you mentioned paths and having to specify indices in there, this is actually the perfect use case for Postgres 12 JSON paths:
SELECT jsonb_path_query(data_column, '$.continents[*] ? (@.name == "Europa").countries[*] ? (@.country_name == "Norway").cities[*] ? (@.city_name == "Oslo")') FROM mydata
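If you only want the population value rather than the whole city object, the same path expression can simply be extended; for example:
SELECT jsonb_path_query(data_column, '$.continents[*] ? (@.name == "Europa").countries[*] ? (@.country_name == "Norway").cities[*] ? (@.city_name == "Oslo").population') FROM mydata
-- returns 3243534 given the sample data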

Related

Sort by json element at nested level for jsonb data

I have the below table in PostgreSQL, which stores JSON data in a jsonb column.
CREATE TABLE "Trial" (
id SERIAL PRIMARY KEY,
data jsonb
);
Below is the sample json structure
{
"id": "000000007001593061",
"core": {
"groupCode": "DVL",
"productType": "ZDPS",
"productGroup": "005001000"
},
"plants": [
{
"core": {
"mrpGroup": "ZMTS",
"mrpTypeDesc": "MRP",
"supLeadTime": 777
},
"storageLocation": [
{
"core": {
"storageLocation": "H050"
}
},
{
"core": {
"storageLocation": "H990"
}
},
{
"core": {
"storageLocation": "HM35"
}
}
]
}
],
"discriminator": "Material"
}
These are the scripts to insert the JSON data:
INSERT INTO "Trial"(data)
VALUES(CAST('{"id":"000000007001593061","core":{"groupCode":"DVL","productType":"ZDPS","productGroup":"005001000"},"plants":[{"core":{"mrpGroup":"ZMTS","mrpTypeDesc":"MRP","supLeadTime":777},"storageLocation":[{"core":{"storageLocation":"H050"}},{"core":{"storageLocation":"H990"}},{"core":{"storageLocation":"HM35"}}]}],"discriminator":"Material"}' AS JSON))
INSERT INTO "Trial"(data)
VALUES(CAST('{"id":"000000000104107816","core":{"groupCode":"ELC","productType":"ZDPS","productGroup":"005001000"},"plants":[{"core":{"mrpGroup":"ZCOM","mrpTypeDesc":"MRP","supLeadTime":28},"storageLocation":[{"core":{"storageLocation":"H050"}},{"core":{"storageLocation":"H990"}}]}],"discriminator":"Material"}' AS JSON))
INSERT INTO "Trial"(data)
VALUES(CAST('{"id":"000000000104107818","core":{"groupCode":"DVK","productType":"ZDPS","productGroup":"005001000"},"plants":[{"core":{"mrpGroup":"ZMTL","mrpTypeDesc":"MRP","supLeadTime":28},"storageLocation":[{"core":{"storageLocation":"H050"}},{"core":{"storageLocation":"H990"}}]}]}' AS JSON))
If I try to sort at the first level, it works:
select id,data->'core'->'groupCode'
from "Trial"
order by data->'core'->'groupCode' desc
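(Side note: -> returns jsonb, so this sorts using jsonb ordering; if you want to compare the values as plain text instead, ->> is a small variation:)
select id, data->'core'->>'groupCode'
from "Trial"
order by data->'core'->>'groupCode' desc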
But when I try to sort at a nested level with the script below, it doesn't work. I'm sure I'm doing something wrong in this script, but I don't know what it is.
select id,data->'plants'
from sap."Trial"
order by data->'plants'->'core'->'mrpGroup' desc
I need assistance writing an ORDER BY on a nested level of JSONB data.
The query below works for me:
SELECT id, data FROM "Trial" ORDER BY jsonb_path_query_array(data, '$.plants[*].core[*].mrpGroup') desc limit 100
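For reference, this is what that path expression evaluates to per row, i.e. what the ORDER BY actually compares (a quick check against the three rows inserted above):
SELECT id, jsonb_path_query_array(data, '$.plants[*].core[*].mrpGroup') AS mrp_group FROM "Trial"
-- row 1 gives ["ZMTS"], row 2 ["ZCOM"], row 3 ["ZMTL"], so the descending sort orders ZMTS, ZMTL, ZCOM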

Insert new item in JSONB column based on value of other field

I have the following jsonb structure with many entries in it
[
{
"name":"test",
"features":[
{
"name":"feature1",
"granted":false
},
{
"name":"feature2",
"granted":true
}
]
}...
]
I'd like to add a new entry to the features array when the parent name element has the value "test" and feature1's granted is false.
The idea is to write a Flyway script to migrate my data.
I've been battling with jsonb_insert, but I can't figure out the path portion of it, since the array can have many elements and I can't just hardcode a subscript.
End result should be:
[
{
"name":"test",
"features":[
{
"name":"feature1",
"granted":false
},
{
"name":"feature2",
"granted":true
},
{
"name":"newFeature",
"granted":false
}
]
}
]
EDIT1
So far I've attempted:
UPDATE my_table SET modules =
jsonb_insert(my_column, '{features, [0]}', '{"name": "newFeature", "granted": false}')
WHERE my_column ->> 'name' = 'test' AND my_column @> '{"features": [{"name":"feature1", "granted": false}]}';
The statement executes but no updates are actually done.
EDIT2
I modified the query just to test the PATH out to
UPDATE my_table SET modules =
jsonb_insert(my_column, '{0, features, 0}', '{"name": "newFeature", "granted": false}')
WHERE my_column ->> 'name' = 'test' AND my_column @> '{"features": [{"name":"feature1", "granted": false}]}';
However, this always updates the first entry in the array, and the object I need to update is not guaranteed to be in that position.
This should be enough information to complete the query:
Let's create the mock data
create table a (id serial primary key , b jsonb);
insert into a (b)
values ('[
{
"name": "test",
"features": [
{
"name": "feature1",
"granted": false
},
{
"name": "feature2",
"granted": true
}
]
},
{
"name": "another-name",
"features": [
{
"name": "feature1",
"granted": false
},
{
"name": "feature2",
"granted": true
}
]
}
]');
Now explode the array using jsonb_array_elements with ordinality to get the index and the property
select first_level.id, position, feature_position, feature
from (select a.id, arr.*
from a,
jsonb_array_elements(a.b) with ordinality arr (elem, position)
where elem ->> 'name' = 'test') first_level,
jsonb_array_elements(first_level.elem -> 'features') with ordinality features (feature, feature_position);
The result of this query is:
1,1,1,"{""name"": ""feature1"", ""granted"": false}"
1,1,2,"{""name"": ""feature2"", ""granted"": true}"
There you have the info necessary to fetch the sub-elements, as well as all the indexes you needed for your query.
Now, as for your final edit, you already had the query you wanted:
UPDATE my_table SET modules =
jsonb_insert(my_column, '{0, features, 0}', '{"name": "newFeature", "granted": false}')
WHERE my_column ->> 'name' = 'test' AND my_column @> '{"features": [{"name":"feature1", "granted": false}]}';
In the WHERE you'll use the id, because those are the rows you are interested in, and the indexes come from the query above. So:
UPDATE my_table SET modules =
jsonb_insert(my_column, ('{' || exploded_info.position::text || ', features, ' || exploded_info.feature_position::text || '}')::text[], '{"name": "newFeature", "granted": false}')
FROM (/* previous query */) AS exploded_info
WHERE exploded_info.id = my_table.id AND exploded_info.feature -> 'granted' = 'false'::jsonb;
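Assembled into one executable statement against the mock table a from above, a sketch could look like this (two caveats: with ordinality counts from 1 while jsonb paths count from 0, hence the - 1; and jsonb_insert with array index -1 plus insert_after = true appends after the last element, which matches the desired end result):
UPDATE a
SET b = jsonb_insert(
    a.b,
    ARRAY[(exploded_info.position - 1)::text, 'features', '-1'],
    '{"name": "newFeature", "granted": false}',
    true  -- insert after the last feature, i.e. append
)
FROM (
    SELECT DISTINCT first_level.id, position
    FROM (SELECT a.id, arr.*
          FROM a, jsonb_array_elements(a.b) WITH ORDINALITY arr (elem, position)
          WHERE elem ->> 'name' = 'test') first_level,
         jsonb_array_elements(first_level.elem -> 'features')
           WITH ORDINALITY features (feature, feature_position)
    WHERE feature ->> 'name' = 'feature1'
      AND (feature ->> 'granted')::boolean = false
) AS exploded_info
WHERE exploded_info.id = a.id;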
As you can see, this easily gets very nasty.
I'd recommend using a more relational approach, that is, having features in a table instead of inside a JSON column, with a FK linking it to your table...
If you really need to use the JSON (for example, because the domain is really complex, defined at the application level, and very flexible), then I would recommend doing the updates in application code.

Creating an AND query on a list of items in Azure Cosmos

I'm building an application in Azure Cosmos and I'm having trouble creating a query. Using the dataset below, I want to create a query that only finds CharacterId "Susan" by searching for all characters that have the TraitId of "Athletic" and "Slim".
Here is my JSON data set
[
{
"characterId": "Bob",
"traits": [
{ "traitId": "Athletic" },
{ "traitId": "Overweight" }
]
},
{
"characterId": "Susan",
"traits": [
{ "traitId": "Athletic" },
{ "traitId": "Slim" }
]
},
{
"characterId": "Jerry",
"traits": [
{ "traitId": "Slim" },
{ "traitId": "Strong" }
]
}
]
The closest I've come is this query, but it acts as an OR statement and what I want is an AND statement.
SELECT * FROM Characters f WHERE f.traits IN ("Athletic", "Slim")
Any help is greatly appreciated.
EDITED: I figured out the answer to this question. If anyone is interested, the query in my answer below gives the results I was looking for.
The answer that worked for me is to use EXISTS statements with SELECT statements that search the traits list. In my program I can use a StringBuilder to build a SQL statement that concatenates an AND EXISTS clause for each of the traits I want to find:
SELECT * FROM Characters f
WHERE EXISTS (SELECT VALUE t FROM t IN f.traits WHERE t.traitId = 'Athletic')
AND EXISTS (SELECT VALUE t FROM t IN f.traits WHERE t.traitId = 'Slim')
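As a side note, the same AND logic can be expressed with Cosmos DB's ARRAY_CONTAINS using its partial-match flag, which reads a bit shorter; a sketch of the same idea:
SELECT * FROM Characters f
WHERE ARRAY_CONTAINS(f.traits, {"traitId": "Athletic"}, true)
AND ARRAY_CONTAINS(f.traits, {"traitId": "Slim"}, true)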

SequelizeJS: Recursive include same model

I want to fetch data from a Postgres database recursively including all associated models using SequelizeJS.
GeoLocations table
id | isoCode | name | parentId | type | updatedAt | deletedAt
parentId holds the id of the parent GeoLocation.
Model Associations
GeoLocation.belongsTo(GeoLocation, {
foreignKey: 'parentId',
as: 'GeoLocation'
});
Data explanation and example
GeoLocation types and their relations:
city -> subdivision -> country -> continent
id: 552, name: Brooklyn, type: city, parentId: 551
id: 551, name: New York, type: subdivision, parentId: 28
id: 28, name: United States, type: country, parentId: 27
id: 27, name: North America, type: continent, parentId: NULL
Now, when querying a city, I want all relations to be included as long as parentId is set.
GeoLocation.findOne({
where: {
id: 552
},
include: [
{
model: GeoLocation,
as: 'GeoLocation',
include: [
{
model: GeoLocation,
as: 'GeoLocation',
include: [
{
model: GeoLocation,
as: 'GeoLocation',
}
],
}
],
}
],
});
Response in JSON
{
"GeoLocation":{
"id":552,
"type":"city",
"name":"Brooklyn",
"GeoLocation":{
"id":551,
"type":"subdivision",
"name":"New York",
"GeoLocation":{
"id":28,
"type":"country",
"name":"United States",
"GeoLocation":{
"id":27,
"type":"continent",
"name":"North America"
}
}
}
}
}
The solution above works, but I have a strong feeling that there are better ways to do this without having to include the model multiple times. I can't find anything related in the docs. Am I missing something?
I have since moved on to a different solution. I am not using Sequelize's native query language to achieve this, because recursive includes are apparently not possible with Sequelize.
Here is the recursive model I use: I want the recursive info of which organisation is related to which. No matter how I get this info, I'll process it later to get it into shape (tree, flat list, whatever).
I use a Sequelize raw query involving recursive SQL:
exports.getArborescenceByLienId = function (relationType, id) {
const query = 'With cte as (\n' +
' select id_organisationSource, \n' +
' relationType,\n' +
' id_organisationTarget\n' +
' from (\n' +
' select * from relation where relationType= :relationType\n' +
' ) relations_sorted,\n' +
' (\n' +
' select #pv := :orgId\n' +
' ) initialisation\n' +
' where find_in_set(id_organisationSource, #pv)\n' +
' and length(#pv := concat(#pv, \',\', id_organisationTarget))\n' +
')\n' +
'select source.id as `idSource`, cte.relationType, target.id as `idTarget`\n' +
'from cte \n' +
'left outer join organisation source on cte.id_organisationSource = source.id\n' +
'left outer join organisation target on cte.id_organisationTarget = target.id;';
return models.sequelize.query(
query,
{
type: models.sequelize.QueryTypes.SELECT,
replacements: { orgId: id, relationType: relationType}
}
);
};
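Note that the query above uses MySQL-style session variables (@pv, find_in_set). On Postgres, which this question is about, the same traversal would normally be written as a standard recursive CTE; a minimal sketch, assuming the relation(id_organisationSource, relationType, id_organisationTarget) and organisation(id) tables implied above:
WITH RECURSIVE cte AS (
    SELECT r.id_organisationSource, r.relationType, r.id_organisationTarget
    FROM relation r
    WHERE r.relationType = :relationType
      AND r.id_organisationSource = :orgId
    UNION ALL
    SELECT r.id_organisationSource, r.relationType, r.id_organisationTarget
    FROM relation r
    JOIN cte ON r.id_organisationSource = cte.id_organisationTarget
    WHERE r.relationType = :relationType
)
SELECT source.id AS "idSource", cte.relationType, target.id AS "idTarget"
FROM cte
LEFT JOIN organisation source ON cte.id_organisationSource = source.id
LEFT JOIN organisation target ON cte.id_organisationTarget = target.id;
-- add a visited-ids array to the CTE if the relation graph can contain cycles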
Imagine an object model of organisations linked by relations of the same type (say isManaging). The original post illustrated this model and the query results for ID = 1 and ID = 2 with images, which are not reproduced here; each result is a flat list of (idSource, relationType, idTarget) rows.
I then transform this flat list into a tree, and VOILA :)
Let me know if you're interested in this flat-list-to-tree transforming algorithm.
Hope this helps. Cheers!

SQL Server ROW_NUMBER with PARTITION BY in MongoDB for returning a subset of rows

How do I write the below query using the MongoDB C# driver?
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
T.Price ,
ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
FROM myTable T
) SubSet
WHERE SubSet.ProductRepeat = 1
What I am trying to achieve is
Collection
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Cap|20|CD345
Cap|5|EC123
Expected result is
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Here is one attempt (please don't mind the object and field names):
public List<ProductOL> Search(ProductOL obj, bool topOneOnly)
{
List<ProductOL> products = new List<ProductOL>();
var database = MyMongoClient.Instance.OpenToRead(dbName: ConfigurationManager.AppSettings["MongoDBDefaultDB"]);
var collection = database.GetCollection<RawBsonDocument>("Products");
List<IMongoQuery> build = new List<IMongoQuery>();
if (!string.IsNullOrEmpty(obj.ProductName))
{
var ProductNameQuery = Query.Matches("ProductName", new BsonRegularExpression(obj.ProductName, "i"));
build.Add(ProductNameQuery);
}
if (!string.IsNullOrEmpty(obj.BrandName))
{
var brandNameQuery = Query.Matches("BrandName", new BsonRegularExpression(obj.BrandName, "i"));
build.Add(brandNameQuery);
}
var fullQuery = Query.And(build.ToArray());
products = collection.FindAs<ProductOL>(fullQuery).SetSortOrder(SortBy.Ascending("ProductName")).ToList();
if (topOneOnly)
{
var tmpProducts = new List<ProductOL>();
foreach (var item in products)
{
if (tmpProducts.Any(x => x.ProductName== item.ProductName)) { }
else
tmpProducts.Add(item);
}
products = tmpProducts;
}
return products;
}
My mongo query works and gives me the right results, but it is not efficient when I am dealing with huge data, so I was wondering if MongoDB has any concept like SQL Server's ROW_NUMBER() and partitioning.
If your query returns the expected results but isn't efficient, you should look into index usage with explain(). Given your query generation code includes conditional clauses, it seems likely you will need multiple indexes to efficiently cover common variations.
I'm not sure how the C# code you've provided relates to the original SQL query, as they seem to be entirely different. I'm also not clear how grouping is expected to help your query performance, aside from limiting the results returned.
Equivalent of the SQL query
There is no direct equivalent of ROW_NUMBER() .. PARTITION BY grouping in MongoDB, but you should be able to work out the desired result using either the Aggregation Framework (fastest) or Map/Reduce (slower but more functionality). The MongoDB manual includes an Aggregation Commands Comparison as well as usage examples.
As an exercise in translation, I'll focus on your SQL query which is pulling out the first product match by ProductName:
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
T.Price ,
ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
FROM myTable T
) SubSet
WHERE SubSet.ProductRepeat = 1
Setting up the test data you provided:
db.myTable.insert([
{ ProductName: 'Cap', Price: 10, SKU: 'AB123' },
{ ProductName: 'Bag', Price: 5, SKU: 'ED567' },
{ ProductName: 'Cap', Price: 20, SKU: 'CD345' },
{ ProductName: 'Cap', Price: 5, SKU: 'EC123' },
])
Here's an aggregation query in the mongo shell which will find the first match per group (ordered by ProductName). It should be straightforward to translate that aggregation query to the C# driver using the MongoCollection.Aggregate() method.
I've included comments with the rough equivalent SQL fragment in your original query.
db.myTable.aggregate(
// Apply a sort order so the $first product is somewhat predictable
// ( "ORDER BY T.ProductName")
{ $sort: {
ProductName: 1
// Should really have additional sort by Price or SKU (otherwise order may change)
}},
// Group by Product Name
// (" PARTITION BY T.ProductName")
{ $group: {
_id: "$ProductName",
// Find first matching product details per group (can use $$CURRENT in MongoDB 2.6 or list specific fields)
// "SELECT SubSet.* ... WHERE SubSet.ProductRepeat = 1"
Price: { $first: "$Price" },
SKU: { $first: "$SKU" },
}},
// Rename _id to match expected results
{ $project: {
_id: 0,
ProductName: "$_id",
Price: 1,
SKU: 1,
}}
)
Results given the test data appear to be what you were looking for:
{ "Price" : 10, "SKU" : "AB123", "ProductName" : "Cap" }
{ "Price" : 5, "SKU" : "ED567", "ProductName" : "Bag" }
Notes:
This aggregation query uses the $first operator, so if you want to find the second or third product per grouping you'd need a different approach (e.g. $group and then take the subset of results needed in your application code)
If you want predictable results for finding the first item in a $group there should be more specific sort criteria than ProductName (for example, sorting by ProductName & Price or ProductName & SKU). Otherwise the order of results may change in future as documents are added or updated.
Thanks to @Stennie; with the help of his answer I could come up with the C# aggregation code:
var match = new BsonDocument
{
{
"$match",
new BsonDocument{
{"ProductName", new BsonRegularExpression("cap", "i")}
}
}
};
var group = new BsonDocument
{
{"$group",
new BsonDocument
{
{"_id", "$ProductName"},
{"SKU", new BsonDocument{
{
"$first", "$SKU"
}}
}
}}
};
var project = new BsonDocument{
{
"$project",
new BsonDocument
{
{"_id", 0 },
{"ProductName","$_id" },
{"SKU", 1}
}}};
var sort = new BsonDocument{
{
"$sort",
new BsonDocument
{
{
"ProductName",1 }
}
}};
var pipeline = new[] { match, group, project, sort };
var aggResult = collection.Aggregate(pipeline);
var products = aggResult.ResultDocuments.Select(BsonSerializer.Deserialize<ProductOL>).ToList();
Using AggregateArgs
AggregateArgs args = new AggregateArgs();
List<BsonDocument> pipeline = new List<BsonDocument>();
pipeline.Add(match);
pipeline.Add(group);
pipeline.Add(project);
pipeline.Add(sort);
args.Pipeline = pipeline;
// var pipeline = new[] { match, group, project, sort };
var aggResult = collection.Aggregate(args);
products = aggResult.Select(BsonSerializer.Deserialize<ProductOL>).ToList();