Querying array of nested objects with nested array of objects in Redshift

Let's say I have the following JSON
{
  "id": 1,
  "sets": [
    {
      "values": [
        { "value": 1 },
        { "value": 2 }
      ]
    },
    {
      "values": [
        { "value": 5 },
        { "value": 6 }
      ]
    }
  ]
}
If the table name is X, I expect the query
SELECT x.id, v.value
FROM X as x,
x.sets as sets,
sets.values as v
to give me
id, value
1, 1
1, 2
1, 5
1, 6
and it does work if both sets and values have one object each. When there's more, the query fails with "column 'id' had 0 remaining values but expected 2". It seems to me I'm not iterating over "sets" properly?
So my question is: what's the proper way to query data structured like my example above in Redshift (using PartiQL)?
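For completeness: with the JSON loaded into a SUPER column, this style of FROM-clause unnesting is the documented PartiQL pattern, so the shape of the query should be right. A sketch, assuming a table x with columns id INT and sets SUPER (the values step is quoted because VALUES is a reserved word in Redshift):
SELECT x.id, v."value"
FROM x, x.sets AS s, s."values" AS v;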

Related

Postgres JSONB Editing columns

I was going through the Postgres jsonb documentation but was unable to find a solution for a small issue I'm having.
I've got a table, MY_TABLE, that has the following columns:
User, Name, Data and Purchased
One thing to note is that "Data" is a jsonb and has multiple fields. One of the fields inside "Data" is "Attributes", but the values it holds are not of a consistent type. What I mean is, it could be a string, a list of strings, an empty list, or just an empty string. However, I want to change this.
The only values that I want to allow are a list of strings and an empty list. I want to convert all the empty strings to empty lists and regular strings to a list of strings.
I have tried using json_build_array but have not had any luck.
So for example, I'd want my final jsonb to look like:
[
  { "Id": 1, "Attributes": ["Test"] },
  { "Id": 2, "Attributes": [] },
  { "Id": 3, "Attributes": [] }
]
when converted from
[
  { "Id": 1, "Attributes": "Test" },
  { "Id": 2, "Attributes": "" },
  { "Id": 3, "Attributes": [] }
]
I only care about the "Attributes" field inside the JSON, not any other fields.
I also want to ensure that Attributes holding an empty string ("Attributes": "") get mapped to an empty list, not a list containing an empty string ([], not [""]).
I also want to preserve the empty array values ([]) for the Attributes that already hold an empty array value.
This is what I have so far:
jsonb_set(
  mycol,
  '{Attributes}',
  case when js ->> 'Attributes' <> ''
    then jsonb_build_array(js ->> 'Attributes')
    else '[]'::jsonb
  end
)
However, Attributes: [] is getting mapped to ["[]"]
Use jsonb_array_elements() to iterate over the elements of each 'data' cell and jsonb_agg to regroup the transformed values into an array:
WITH test_data(js) AS (
  VALUES ($$[
    { "Id": 1, "Attributes": "Test" },
    { "Id": 2, "Attributes": "" },
    { "Id": 3, "Attributes": [] }
  ]$$::JSONB)
)
SELECT transformed_elem
FROM test_data
JOIN LATERAL (
  SELECT jsonb_agg(jsonb_set(
    elem,
    '{Attributes}',
    CASE
      WHEN elem -> 'Attributes' IN ('""', '[]') THEN '[]'::JSONB
      WHEN jsonb_typeof(elem -> 'Attributes') = 'string' THEN jsonb_build_array(elem -> 'Attributes')
      ELSE elem -> 'Attributes'
    END
  )) AS transformed_elem
  FROM jsonb_array_elements(test_data.js) AS f(elem) -- iterate over every element in the array
) s ON TRUE
returns
[{"Id": 1, "Attributes": ["Test"]}, {"Id": 2, "Attributes": []}, {"Id": 3, "Attributes": []}]

Is there a magic function which can extract selected keys/nested keys, including arrays, from jsonb

Given a jsonb value and a set of keys, how can I get a new jsonb with only the required keys?
I've tried extracting key/value pairs into a text[] and then using jsonb_object(text[]). It works well, but the problem comes when a key holds an array of JSON objects.
create table my_jsonb_table
(
  data_col jsonb
);
insert into my_jsonb_table (data_col) values ('{
  "schemaVersion": "1",
  "Id": "20180601550002",
  "Domains": [
    {
      "UID": "29aa2923",
      "quantity": 1,
      "item": "book",
      "DepartmentDomain": { "type": "paper", "departId": "10" },
      "PriceDomain": { "Price": 79.00, "taxA": 6.500, "discount": 0 }
    },
    {
      "UID": "bbaa2923",
      "quantity": 2,
      "item": "pencil",
      "DepartmentDomain": { "type": "wood", "departId": "11" },
      "PriceDomain": { "Price": 7.00, "taxA": 1.5175, "discount": 1 }
    }
  ],
  "finalPrice": { "totalTax": 13.50, "total": 85.0 },
  "MetaData": {
    "shopId": "1405596346",
    "locId": "95014",
    "countryId": "USA",
    "regId": "255",
    "Date": "20180601"
  }
}')
This is what I am trying to achieve:
SELECT some_magic_fun(data_col,'Id,Domains.UID,Domains.DepartmentDomain.departId,finalPrice.total')::jsonb FROM my_jsonb_table;
I am trying to create that magic function, which extracts the given keys as jsonb. As of now I am able to extract scalar items, put them in a text[], and use jsonb_object, but I don't know how to extract all elements of an array.
expected output:
{
  "Id": "20180601550002",
  "Domains": [
    {
      "UID": "29aa2923",
      "DepartmentDomain": { "departId": "10" }
    },
    {
      "UID": "bbaa2923",
      "DepartmentDomain": { "departId": "11" }
    }
  ],
  "finalPrice": { "total": 85.0 }
}
I don't know of any magic. You have to rebuild it yourself.
select jsonb_build_object(
  -- Straightforward
  'Id', data_col->'Id',
  'Domains', (
    -- Aggregate all the "rows" back together into an array.
    select jsonb_agg(
      -- Turn each array element into a new object
      jsonb_build_object(
        'UID', domain->'UID',
        'DepartmentDomain', jsonb_build_object(
          'departId', domain#>'{DepartmentDomain,departId}'
        )
      )
    )
    -- Turn each element of the Domains array into a row
    from jsonb_array_elements( data_col->'Domains' ) d(domain)
  ),
  -- Also pretty straightforward
  'finalPrice', jsonb_build_object(
    'total', data_col#>'{finalPrice,total}'
  )
) from my_jsonb_table;
This probably is not a good use of a JSON column. Your data is relational and would better fit traditional relational tables.
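For illustration, a minimal relational sketch of the same data (all table and column names here are made up):
create table orders (
  id        text primary key,  -- "Id"
  total     numeric,           -- finalPrice.total
  total_tax numeric            -- finalPrice.totalTax
);
create table order_domains (
  order_id  text references orders(id),
  uid       text,              -- Domains[].UID
  depart_id text               -- Domains[].DepartmentDomain.departId
);
Picking "only the keys you need" then becomes an ordinary column list and a join.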

MongoDB updating wrong subdocument in an array

My gamefamilies collection looks like this:
{
  "_id": ObjectId('54cc3ee7894ae60c1c9d6c74'),
  "game_ref_id": "REF123",
  ..
  "yearwise_details": [
    {
      "year": 1,
      ...
      "other_details": [
        {
          "type": "cash",
          "openingstock": 988
          ..
        },
        {
          "type": "FLU",
          "openingstock": 555
          ..
        },
        ..other items
      ]
    },
    {
      "year": 2,
      ...
      "other_details": [
        {
          "type": "cash",
          "openingstock": 3000,
          ....
        },
        ...
        {
          "type": "ghee",
          "openingstock": 3000,
          ...
        },
        ..
      ]
    }
  ]
}
My update query:
db.gamefamilies.update(
  { "game_ref_id": "REF123", "teamname": "manisha", "yearwise_details.year": 2, "yearwise_details.other_details.type": "ghee" },
  { "$set": { "yearwise_details.0.other_details.$.openingstock": 555 } }
);
The document is getting picked up correctly. I expect to update year 2's item with type "ghee", but instead year 1's second item (type "FLU") gets updated. What am I doing wrong?
Any help would be greatly appreciated.
Unfortunately, there is not yet support for nested $ positional operator updates.
So you can hardcode the update with
db.gamefamilies.update({"game_ref_id": "REF123",
"teamname": "manisha",
"yearwise_details.year": 2,
"yearwise_details.other_details.type": "ghee"},
{"$set":
{"yearwise_details.1.other_details.$.openingstock": 555}});
But notice that yearwise_details.1.other_details hardcodes that you want the second value of the array (it is 0-indexed, so the 1 references the second element). I am assuming you found the command in your question because it worked for the first element of the array. But it will only ever work on the first element, and the command above will only ever work on the second element.
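On MongoDB 3.6 and later, the filtered positional operator $[<identifier>] together with arrayFilters removes this limitation and can address both array levels without hardcoding an index. A sketch using the fields from the question:
db.gamefamilies.update(
  { "game_ref_id": "REF123", "teamname": "manisha" },
  { "$set": { "yearwise_details.$[y].other_details.$[o].openingstock": 555 } },
  // each filter binds one identifier; y picks the year, o picks the item type
  { "arrayFilters": [ { "y.year": 2 }, { "o.type": "ghee" } ] }
)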

Map Reduce Mongo DB: Sum of ODD and EVEN numbers with elements

I am trying to process a number series (a collection) and get the sum of odd and even numbers separately, along with the elements considered in each calculation.
The numberseries document structure is as follows:
{
  _id: <Autogenerated>,
  number: <any number, it can repeat. Even if it repeats, it should be added each time.>
}
The output is something like below (not exact, but in general):
{
  ..
  { "odd": <result>, "elements": [n1, n3, n5] },
  { "even": <result>, "elements": [n2, n4, n6] }
  ..
}
Map Function:
mapf = function() {
  var value = { sum: 0, elements: [] };
  value.sum = this.number;
  value.elements.push(this.number);
  print(tojson(value));
  if (this.number % 2 != 0) {
    emit("odd", value);
  }
  if (this.number % 2 == 0) {
    emit("even", value);
  }
}
Reduce Values argument:
values is an array of the JSON objects emitted from map:
[
  { "sum": 1, "elements": [1] },
  { "sum": 3, "elements": [3] },
  ...
]
Reduce Function:
reducef = function(key, values) {
  var result = { sum: 0, elements: [] };
  print("K " + key +"Values array " + tojson(values) );
  for (var i = 0; i < values.length; i++) {
    v = values[i];
    print("Key "+key+"V.JSON"+tojson(v)+" V.SUM -> "+v.sum);
    result.sum += v.sum;
    result.elements.push(v.elements[0]);
    print(tojson(result));
  }
  return result;
}
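For reference, the two functions above would be invoked with something like this (a sketch; the collection name numberseries is assumed from the question):
// run the map/reduce and return the results inline rather than into a collection
db.numberseries.mapReduce(mapf, reducef, { out: { inline: 1 } })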
I am getting sum correctly, but the elements array is not properly getting populated. It is containing only some of the elements considered for calculations.
UPDATE
As per the answer given by Neil, I further verified my code. I found that my code, without any modification, works for a small dataset but does not work for a large one.
Below are the points I verified as pointed out; I found my code to be correct.
print("K " + key +"Values array " + tojson(values) );
The above line in the reduce function results in the following values object being printed:
[
  { "sum": 1, "elements": [1] },
  { "sum": 3, "elements": [3] },
  { "sum": 5, "elements": [5] },
  { "sum": 7, "elements": [7] },
  { "sum": 9, "elements": [9] },
  { "sum": 11, "elements": [11] },
  { "sum": 13, "elements": [13] },
  { "sum": 15, "elements": [15] },
  { "sum": 17, "elements": [17] },
  { "sum": 19, "elements": [19] }
]
Hence the line that pushes elements into the result array, result.elements.push(v.elements[0]);, should be correct.
In the map function, before emitting, I set value.sum as follows:
value.sum = this.number;
This ensures that sum is not zero and that the numbers are properly added.
When I test this code with 20 records, 40 records, 100 records, it works perfectly.
When I test this code with 20,000 records, the sum value is correct but the elements arrays do not contain 10,000 elements each (odd and even numbers are equally distributed in the collection).
In the latter case, I get the message below:
query not recording (too large)
Okay, there is a clear reason and you do appear to have read some of the documentation and at least applied this rule:
"the type of the return object must be identical to the type of the value emitted by the map function ..."
By that, it means that both the map function and the reduce function must essentially have the same output shape, which yours do:
{ sum : 0, elements :[] };
But there was a piece of documentation that has not been understood:
"MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key."
So where the whole thing goes wrong is that you have assumed that, since your "map" function only emits one element, there will be only one element in the "elements" array. A careful re-read of the above says that this is not true. In fact, the output from "reduce" will very likely be fed back into the "reduce" function again. This is indeed how mapReduce deals with a large number of values for the "values" array.
To fix it, change this in the "reduce" function:
result.elements.push(v.elements[0]);
To this:
v.elements.forEach(function(element) {
  result.elements.push(element);
});
And in that way, when the "reduce" function returns a result that has summed up a few "elements" already and pushed them to the list, then that "input" will be processed correctly and merged with any other "values" that come in with it.
BTW, I think you actually meant this in your mapper:
var value = { sum : 1, elements :[] };
Otherwise this code down here would just be summing 0's:
result.sum += v.sum;
But aggregate does this better
All of that said, the following aggregation framework statement does the same thing, but better and faster, with an implementation in native code:
db.collection.aggregate([
  { "$project": {
    "type": { "$cond": [
      { "$eq": [ { "$mod": [ "$number", 2 ] }, 0 ] },
      "even",
      "odd"
    ]},
    "number": 1
  }},
  { "$group": {
    "_id": "$type",
    "sum": { "$sum": 1 },
    "elements": { "$push": "$number" }
  }}
])
Also note that in both cases you are not really "summing the elements", but rather "counting" them. So if you want the sum, then the mapReduce part becomes:
//result.sum += v.sum;
v.elements.forEach(function(element) {
  result.sum += element;
  result.elements.push(element);
});
And the aggregate part becomes:
{ "$group": {
"_id": "$type",
"sum": { "$sum": "$number" },
"elements": { "$push": "$number" }
}}
Which truly sums the "odd" or "even" numbers as found in your collection.

Conditional $inc in a nested MongoDB array

My database looks like this:
{ _id: 1, values: [ 1, 2, 3, 4, 5 ] },
{ _id: 2, values: [ 2, 4, 6, 8, 10 ] }, ...
I'd like to update every value in every document's nested array ("values") that meets some criterion. For instance, I'd like to increment every value that's >= 4 by one, which ought to yield:
{ _id: 1, values: [ 1, 2, 3, 5, 6 ] },
{ _id: 2, values: [ 2, 5, 7, 8, 11 ] }, ...
I'm used to working with SQL, where the nested array would be a separate table connected by a unique ID. I'm a little lost in this new NoSQL world.
Thank you kindly,
This sort of update is not really possible using nested arrays. The reason is given in the positional $ operator documentation, which states that you can only match the first array element for a given condition in the query.
So a statement like this:
db.collection.update(
  { "values": { "$gte": 4 } },
  { "$inc": { "values.$": 1 } }
)
Will not work in the sense that only the "first" array element that was matched would be incremented. So on your first document you would get this:
{ "_id" : 1, "values" : [ 1, 2, 3, 6, 6 ] }
In order to update the values as you are suggesting, you would need to iterate over the documents and the array elements to produce the result:
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
  for (var i = 0; i < doc.values.length; i++) {
    if (doc.values[i] >= 4) {
      doc.values[i]++;
    }
  }
  db.collection.update(
    { "_id": doc._id },
    { "$set": { "values": doc.values } }
  );
})
Or whatever code equivalent of that basic concept.
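On MongoDB 3.6 and later, the filtered positional operator with arrayFilters does this in a single statement; a sketch:
db.collection.updateMany(
  {},
  // increment every array element bound to the identifier v
  { "$inc": { "values.$[v]": 1 } },
  // v matches only the elements that are >= 4
  { "arrayFilters": [ { "v": { "$gte": 4 } } ] }
)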
Generally speaking, this sort of update does not lend itself well to a structure that contains elements in an array. If that is really your need, then the elements are better off listed within a separate collection.
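A sketch of that separate-collection layout (collection and field names are made up): one document per array element, e.g. { parent_id: 1, value: 4 }, after which the update is a plain multi-document $inc:
db.values.update(
  { "value": { "$gte": 4 } },
  { "$inc": { "value": 1 } },
  { "multi": true }  // apply to every matching document
)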
Then again, the presentation of this question is more of a "hypothetical" situation, without an understanding of your actual use case for performing this sort of update. So if you described what you actually need to do and how your data really looks in another question, you might get a more meaningful response in terms of the best approach for you to use.