How to flatten nested array (parent->child) tree (of any depth) in MongoDB aggregation? - mongodb

I have a nested tree level structure of item->items that looks something like this
{ "id":"1",
"type":"panel",
"items": [
{ "id":"2", "type":"input", },
{ "id":"4", "type":"group", "items": [
{ "id":"5", "type":"input" },
{ "id":"6", "type":"panel", "items":[...] },
]
}
]}
I'm looking to flatten the tree and get a single array list of all items like this:
[ { "id":"1", "type":"panel", },
{ "id":"2", "type":"input", },
{ "id":"4", "type":"group", },
{ "id":"5", "type":"input", },
...
]
Is there a generic way to flatten the tree (that would work for any depth level)?
All the answers I found here manually $unwind each child level, but I can't predict the number of levels, and I don't have a reference to the parent that would let me traverse the tree with $graphLookup.
Is there something like {'$*.items'}?

MQL doesn't have functions, so we can't recurse when we find an array.
Maybe there is a way to do it with MQL in 1 query, but there is a fast way to do it with more than 1 query.
The example below flattens 1 level per query.
With a small change it can flatten 10 or 100 levels per query, so only 1 query would be needed, but it would make some redundant attempts to flatten arrays that are already empty.
First, 1 small modification:
add 1 field to all documents, "all-items": [{"id": "$id","type": "$type"}], and remove the top-level "id" and "type", like below.
aggregate(
[ {
"$project" : {
"all-items" : [ {
"id" : "$id",
"type" : "$type"
} ],
"items" : 1
}
} ]
)
Modified data
[
{
"all-items": [
{
"id": "1",
"type": "panel"
}
],
"items": [...like it was...]
}
]
And now we can do it with multiple queries, 1 per level.
In the example, the first call flattens the first level and the second call runs on the result of the first call.
A third call is not needed, because the while condition will be false.
In each call we do $out, and we aggregate on the result of the previous call.
// pseudocode: repeat while at least 1 document still has a non-empty "items" array (checked with 1 find query)
while (there_is_1_document_with_not_empty_items)
db.collection.aggregate([
{
"$addFields": {
"level-nlevel": {
"$reduce": {
"input": "$items",
"initialValue": [
[],
[]
],
"in": {
"$let": {
"vars": {
"info": "$$value",
"i": "$$this"
},
"in": {
"$let": {
"vars": {
"level": {
"$arrayElemAt": [
"$$info",
0
]
},
"nlevel": {
"$arrayElemAt": [
"$$info",
1
]
}
},
"in": [
{
"$concatArrays": [
"$$level",
[
{
"id": "$$i.id",
"type": "$$i.type"
}
]
]
},
{
"$cond": [
{
"$isArray": [
"$$i.items"
]
},
{
"$concatArrays": [
"$$nlevel",
"$$i.items"
]
},
"$$nlevel"
]
}
]
}
}
}
}
}
}
}
},
{
"$project": {
"all-items": {
"$concatArrays": [
"$all-items",
{
"$arrayElemAt": [
"$level-nlevel",
0
]
}
]
},
"items": {
"$arrayElemAt": [
"$level-nlevel",
1
]
}
}
}
])
This flattens per document (no $unwind is used). If you want to flatten the whole collection, $unwind "$all-items" once after the while loop ends.
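To make the loop easier to follow, here is a minimal plain-JavaScript sketch of the same level-by-level idea, run on an in-memory copy of the sample tree instead of in MongoDB. The names flattenOneLevel, flattenTree and allItems are made up for this illustration, and the elided "items" of id 6 is assumed to be an empty array.

```javascript
// One pass: move the current level of `items` into `allItems` and
// replace `items` with the next level down (what the $reduce stage does).
function flattenOneLevel(doc) {
  const level = [];   // id/type of every item on the current level
  const nlevel = [];  // children of those items, i.e. the next level
  for (const item of doc.items) {
    level.push({ id: item.id, type: item.type });
    if (Array.isArray(item.items)) nlevel.push(...item.items);
  }
  return { allItems: doc.allItems.concat(level), items: nlevel };
}

function flattenTree(root) {
  // Same preparation as the $project stage: seed allItems with the root itself.
  let doc = { allItems: [{ id: root.id, type: root.type }], items: root.items || [] };
  // Same condition as the while loop: stop once `items` is empty.
  while (doc.items.length > 0) doc = flattenOneLevel(doc);
  return doc.allItems;
}

// Sample tree from the question; the contents of id 6 are an assumption.
const tree = {
  id: "1", type: "panel",
  items: [
    { id: "2", type: "input" },
    { id: "4", type: "group", items: [
      { id: "5", type: "input" },
      { id: "6", type: "panel", items: [] },
    ] },
  ],
};

const flat = flattenTree(tree);
console.log(flat.map((n) => n.id).join(","));
```

Each pass corresponds to one aggregate call above, so the number of passes equals the depth of the deepest branch.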

There is no MongoDB aggregation stage that supports flattening to an unknown depth, but $function (MongoDB 4.4+) lets you run a JavaScript function against the document.
Here is a JavaScript example:
var fn = function(items) {
var ret = [];
var toCheck = [...items];
while (toCheck.length) {
var nxtToCheck = [];
for (var item of toCheck) {
ret.push({ id: item.id, type: item.type });
nxtToCheck.push(...(item.items || []));
}
toCheck = nxtToCheck;
}
return ret;
}
db.myCol.aggregate([
{ $match: {} },
{ $addFields: { allItems: { $function: { body: fn, args: ["$items"], lang: "js" } } } }
]);

Related

MongoDB query to select documents with array with all of its elements matching some conditions

I am trying to come up with a query in MongoDB that lets me select documents in a collection based on the contents of subdocuments in a couple of levels deep arrays.
The collection in the example (simplified) represents situations. The purpose of the query is, given a moment in time, to know the currently active situation. The conditionGroups array represents different conditions in which the situation becomes active, and each of those has an array of conditions all of which have to be true.
In other words, the conditionGroups array operates as an OR condition, and its children array "conditions" operates as an AND. So, given any root document "situation", this situation will be active if at least one of its conditionGroups meets all of its conditions.
[
{
"name": "Weekdays",
"conditionGroups": [
{
"conditions": [
{
"type": "DayOfWeek",
"values": [1, 2, 3, 4, 5]
},
{
"type": "HourIni",
"values": [8]
},
{
"type": "HourEnd",
"values": [19]
}
]
}
]
},
{
"name": "Nights and weekends",
"conditionGroups": [
{
"conditions": [
{
"type": "DayOfWeek",
"values": [1, 2, 3, 4, 5]
},
{
"type": "HourIni",
"values": [20]
},
{
"type": "HourEnd",
"values": [23]
}
]
},
{
"conditions": [
{
"type": "DayOfWeek",
"values": [6, 7]
},
{
"type": "HourIni",
"values": [8]
},
{
"type": "HourEnd",
"values": [19]
}
]
}
]
},
{
"name": "Weekend night",
"conditionGroups": [
{
"conditions": [
{
"type": "DayOfWeek",
"values": [6, 7]
},
{
"type": "HourIni",
"values": [20]
},
{
"type": "HourEnd",
"values": [23]
}
]
}
]
}
]
Another thing to note is that there are other types of conditions, like DayOfMonth, Month, Year, and others that might come, so the query should look for conditions that match the type and value or do not exist at all.
Given this example data, and imagining a december monday at lunchtime (so DayOfWeek is 1, current hour is 12, DayOfMonth is 13, Month is 12, Year is 2021) only the first document should be selected, because it has a "conditionGroup" all of which conditions match the current parameters, even if parameters like DayOfMonth/Year/Month are not specified. The important thing is that all the conditions must be met.
Now, I've tried the following with no luck:
db.situations.find({
'conditionGroups': { $all: [
{
$elemMatch: { $nor: [
{ 'conditions.type': 'HourIni', 'conditions.values.0': { $gt: 12 } },
{ 'conditions.type': 'HourEnd', 'conditions.values.0': { $lte: 12 } },
{ 'conditions.type': 'DayOfWeek', 'conditions.values.0': { $nin: [1] } },
{ 'conditions.type': 'DayOfMonth', 'conditions.values.0': { $nin: [13] } },
{ 'conditions.type': 'Month', 'conditions.values.0': { $nin: [12] } },
{ 'conditions.type': 'Year', 'conditions.values.0': { $nin: [2021] } },
]}
}
] }
})
This query is coming back empty.
Another thing I've tried is to first unwind the conditionGroups with the aggregation pipeline, and then try $elemMatch on conditions, but getting odd results. My guess is that I don't fully understand the $elemMatch and other array operators and I'm confusing them somehow...
It's quite a tricky question...so I've simplified it, but a largely appreciated bonus would be to consider that every condition, apart from "type" and "values" can also have an "inverse" boolean attribute that acts like a "not", so that condition would have to be "reversed".
I've spent many hours trying to get this to work but I'm kind of lost now. I understand the info might not be enough, so if anyone was able to give me a hint I could provide extra info if needed...
Any tip would be appreciated as I'm quite lost! ;)
You can do the following in an aggregation pipeline:
1. $unwind conditionGroups so that each group can be checked on its own.
2. Use $map with a $switch to check every condition in the group: the result is true if the condition matches and false otherwise, so you get an array of booleans for the conditions array.
3. Use $allElementsTrue to check whether the array from step 2 is all true; if it is, that conditionGroup passed all of its conditions.
4. $group on _id and $lookup to get back the original documents that have at least one matching conditionGroup.
db.collection.aggregate([
{
"$addFields": {
"dateInput": ISODate("2021-12-13T12:00:00Z")
}
},
{
"$unwind": "$conditionGroups"
},
{
"$addFields": {
"matchedCondition": {
"$map": {
"input": "$conditionGroups.conditions",
"as": "c",
"in": {
"$switch": {
"branches": [
{
"case": {
$and: [
{
$eq: [
"$$c.type",
"DayOfWeek"
]
},
{
"$in": [
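{
// ISO numbering (1 = Monday ... 7 = Sunday) matches the question's data;
// $dayOfWeek would number the days 1 = Sunday ... 7 = Saturday
"$isoDayOfWeek": "$dateInput"
},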
"$$c.values"
]
}
]
},
"then": true
},
{
"case": {
$and: [
{
$eq: [
"$$c.type",
"HourIni"
]
},
{
"$gt": [
{
"$hour": "$dateInput"
},
{
"$arrayElemAt": [
"$$c.values",
0
]
}
]
}
]
},
"then": true
},
{
"case": {
$and: [
{
$eq: [
"$$c.type",
"HourEnd"
]
},
{
"$lte": [
{
"$hour": "$dateInput"
},
{
"$arrayElemAt": [
"$$c.values",
0
]
}
]
}
]
},
"then": true
}
],
default: false
}
}
}
}
}
},
{
"$match": {
$expr: {
$eq: [
true,
{
"$allElementsTrue": "$matchedCondition"
}
]
}
}
},
{
"$group": {
"_id": "$_id"
}
},
{
"$lookup": {
"from": "collection",
"localField": "_id",
"foreignField": "_id",
"as": "originalDocument"
}
},
{
"$unwind": "$originalDocument"
},
{
"$replaceRoot": {
"newRoot": "$originalDocument"
}
}
])
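The OR-of-ANDs rule itself is easier to see outside the pipeline. Below is a small plain-JavaScript sketch of it, not a replacement for the query: conditionHolds and isActive are hypothetical helper names, `now` is a hand-built object using the question's numbering (Monday = 1), the hour comparisons mirror the pipeline above, and the handling of the optional "inverse" flag is an assumption about how the asker wants it to behave.

```javascript
// Does a single condition hold for the given moment?
function conditionHolds(cond, now) {
  let ok;
  switch (cond.type) {
    case "DayOfWeek": ok = cond.values.includes(now.dayOfWeek); break;
    case "HourIni":   ok = now.hour > cond.values[0]; break;   // same as $gt above
    case "HourEnd":   ok = now.hour <= cond.values[0]; break;  // same as $lte above
    default:          ok = false; // unknown types fail, like the $switch default
  }
  // Assumed semantics of the optional "inverse" flag: negate the result.
  return cond.inverse ? !ok : ok;
}

// A situation is active if at least one group (OR) has all conditions true (AND).
function isActive(situation, now) {
  return situation.conditionGroups.some((group) =>
    group.conditions.every((cond) => conditionHolds(cond, now)));
}

const situations = [
  { name: "Weekdays", conditionGroups: [{ conditions: [
    { type: "DayOfWeek", values: [1, 2, 3, 4, 5] },
    { type: "HourIni", values: [8] },
    { type: "HourEnd", values: [19] },
  ] }] },
  { name: "Weekend night", conditionGroups: [{ conditions: [
    { type: "DayOfWeek", values: [6, 7] },
    { type: "HourIni", values: [20] },
    { type: "HourEnd", values: [23] },
  ] }] },
];

const now = { dayOfWeek: 1, hour: 12 }; // a Monday at lunchtime
const active = situations.filter((s) => isActive(s, now)).map((s) => s.name);
console.log(active);
```

Array.prototype.some plays the role of the $unwind plus $group over conditionGroups, and every plays the role of $allElementsTrue.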

How to filter an array of objects in mongoose by date field only selecting the most recent date

I'm trying to filter through an array of objects in a user collection on MongoDB. The structure of this particular collection looks like this:
name: "John Doe"
email: "john@doe.com"
progress: [
{
_id : ObjectId("610be25ae20ce4872b814b24")
challenge: ObjectId("60f9629edd16a8943d2cab9b")
date_unlocked: 2021-08-05T12:15:32.129+00:00
completed: true
date_completed: 2021-08-06T12:15:32.129+00:00
}
{
_id : ObjectId("611be24ae32ce4772b814b32")
challenge: ObjectId("60g6723efd44a6941l2cab81")
date_unlocked: 2021-08-06T12:15:32.129+00:00
completed: true
date_completed: 2021-08-07T12:15:32.129+00:00
}
]
date: 2021-08-04T13:06:34.129+00:00
How can I query the database using mongoose to return only the challenge with the most recent 'date_unlocked'?
I have tried: User.findById(req.user.id).select('progress.challenge progress.date_unlocked').sort({'progress.date_unlocked': -1}).limit(1);
but instead of returning a single challenge with the most recent 'date_unlocked', it is returning the whole user progress array.
Any help would be much appreciated, thank you in advance!
You can try this.
db.collection.aggregate([
{
"$unwind": {
"path": "$progress"
}
},
{
"$sort": {
"progress.date_unlocked": -1
}
},
{
"$limit": 1
},
{
"$project": {
"_id": 0,
"latestChallenge": "$progress.challenge"
}
}
])
An alternative solution is to use $reduce on that array.
db.collection.aggregate([
{
"$addFields": {
"latestChallenge": {
"$arrayElemAt": [
{
"$reduce": {
"input": "$progress",
"initialValue": [
"0",
""
],
"in": {
"$let": {
"vars": {
"info": "$$value",
"progress": "$$this"
},
"in": {
"$cond": [
{
"$gt": [
"$$progress.date_unlocked",
{
"$arrayElemAt": [
"$$info",
0
]
}
]
},
[
"$$progress.date_unlocked",
"$$progress.challenge"
],
"$$info"
]
}
}
}
}
},
1
]
}
}
},
{
"$project": {
"_id": 0,
"latestChallenge": 1
}
},
])
Mongoose can run raw aggregation pipelines, so you can use either of these with User.aggregate([...]).
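If the user document is already loaded in application code, the same "keep the entry with the largest date_unlocked" idea can be sketched in plain JavaScript. This is only an illustration of what the $reduce stage computes: latestUnlocked is a hypothetical helper, and the challenge values are placeholder strings rather than real ObjectIds.

```javascript
// Walk the array once and keep the entry with the most recent date_unlocked.
function latestUnlocked(progress) {
  return progress.reduce(
    (best, entry) =>
      best === null || entry.date_unlocked > best.date_unlocked ? entry : best,
    null // an empty progress array yields null
  );
}

// Placeholder data; the most recent entry is deliberately not the last one.
const progress = [
  { challenge: "challenge-a", date_unlocked: new Date("2021-08-05T12:15:32.129Z") },
  { challenge: "challenge-b", date_unlocked: new Date("2021-08-06T12:15:32.129Z") },
  { challenge: "challenge-c", date_unlocked: new Date("2021-08-04T12:15:32.129Z") },
];

const latest = latestUnlocked(progress);
console.log(latest.challenge);
```

The accumulator has to carry the best date seen so far along with the challenge, which is why the $reduce version keeps a two-element array as its value.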

Zip two array and create new array of object

Hello all, I'm working with a MongoDB database where each document looks like this:
{
"_id" : ObjectId("5cf12696e81744d2dfc0000c"),
"contributor": "user1",
"title": "Title 1",
"userhasRate" : [
"51",
"52",
],
"ratings" : [
4,
3
],
}
and I need to change it to look like this:
{
"_id" : ObjectId("5cf12696e81744d2dfc0000c"),
"contributor": "user1",
"title": "Title 1",
rate : [
{userhasrate: "51", value: 4},
{userhasrate: "52", value: 3},
]
}
I have already tried this method:
db.getCollection('contens').aggregate([
{ '$group':{
'rates': {$push:{ value: '$ratings', user: '$userhasRate'}}
}
}
]);
and my result looks like this:
{
"rates" : [
{
"value" : [
5,
5,
5
],
"user" : [
"51",
"52",
"53"
]
}
]
}
Can someone help me solve my problem?
Thank you.
You can use $zip, $arrayToObject and $objectToArray inside $map to achieve the required output.
db.collection.aggregate([
{
"$project": {
"rate": {
"$map": {
"input": {
"$objectToArray": {
"$arrayToObject": {
"$zip": {
"inputs": [
"$userhasRate",
"$ratings"
]
}
}
}
},
"as": "el",
"in": {
"userhasRate": "$$el.k",
"value": "$$el.v"
}
}
}
}
}
])
Alternative method
If userhasRate contains repeated values, the first solution will not work, because $arrayToObject keeps only one entry per key. In that case you can use $arrayElemAt and $map along with $zip.
db.collection.aggregate([
{
"$project": {
"rate": {
"$map": {
"input": {
"$zip": {
"inputs": [
"$userhasRate",
"$ratings"
]
}
},
"as": "el",
"in": {
"userhasRate": {
"$arrayElemAt": [
"$$el",
0
]
},
"value": {
"$arrayElemAt": [
"$$el",
1
]
}
}
}
}
}
}
])
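Try the aggregation below. First of all, you used $group without an _id, which groups all the documents in the collection into one; set it to "$_id" instead. You also need to build 2 arrays from the old data and then, in the next $project stage, concatenate them to get the desired output. Note that this version hard-codes the indexes 0 and 1, so it only handles arrays of exactly two elements: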
db.getCollection('contens').aggregate([
{
$group: {
_id: "$_id",
rate1: {
$push: {
userhasrate: {
$arrayElemAt: [
"$userhasRate",
0
]
},
value: {
$arrayElemAt: [
"$ratings",
0
]
}
}
},
rate2: {
$push: {
userhasrate: {
$arrayElemAt: [
"$userhasRate",
1
]
},
value: {
$arrayElemAt: [
"$ratings",
1
]
}
}
}
}
},
{
$project: {
_id: 1,
rate: {
$concatArrays: [
"$rate1",
"$rate2"
]
}
}
}
])
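For comparison, this is roughly what the $zip plus $map answers compute, written as a plain-JavaScript sketch over one in-memory document. zipRatings is a hypothetical helper name, and truncating to the shorter array is meant to mirror $zip's default behaviour.

```javascript
// Pair userhasRate[i] with ratings[i] and drop the two original arrays.
function zipRatings(doc) {
  const n = Math.min(doc.userhasRate.length, doc.ratings.length);
  const rate = [];
  for (let i = 0; i < n; i++) {
    rate.push({ userhasrate: doc.userhasRate[i], value: doc.ratings[i] });
  }
  // Keep every other field of the document unchanged.
  const { userhasRate, ratings, ...rest } = doc;
  return { ...rest, rate };
}

const zipped = zipRatings({
  contributor: "user1",
  title: "Title 1",
  userhasRate: ["51", "52"],
  ratings: [4, 3],
});
console.log(JSON.stringify(zipped));
```

Because the pairing is done by index, repeated values in userhasRate are not a problem here, just as in the $arrayElemAt variant.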

Using $elemMatch and $or to implement a fallback logic (in projection)

db.projects.findOne({"_id": "5CmYdmu2Aanva3ZAy"},
{
"responses": {
"$elemMatch": {
"match.nlu": {
"$elemMatch": {
"intent": "intent1",
"$and": [
{
"$or": [
{
"entities.entity": "entity1",
"entities.value": "value1"
},
{
"entities.entity": "entity1",
"entities.value": {
"$exists": false
}
}
]
}
],
"entities.1": {
"$exists": false
}
}
}
}
}
})
In a given project I need a projection containing only one response, hence $elemMatch. Ideally, look for an exact match:
{
"entities.entity": "entity1",
"entities.value": "value1"
}
But if such a match doesn't exist, look for a record where entities.value does not exist.
The query above doesn't work because if it finds an item with entities.value not set, it will return it even when an exact match exists. How can I get this fallback logic in a Mongo query?
Here is an example document:
{
"_id": "5CmYdmu2Aanva3ZAy",
"responses": [
{
"match": {
"nlu": [
{
"entities": [],
"intent": "intent1"
}
]
},
"key": "utter_intent1_p3vE6O_XsT"
},
{
"match": {
"nlu": [
{
"entities": [{
"entity": "entity1",
"value": "value1"
}],
"intent": "intent1"
}
]
},
"key": "utter_intent1_p3vE6O_XsT"
},
{
"match": {
"nlu": [
{
"intent": "intent2",
"entities": []
},
{
"intent": "intent1",
"entities": [
{
"entity": "entity1"
}
]
}
]
},
"key": "utter_intent2_Laag5aDZv2"
}
]
}
To answer the question, the first thing to start with is that doing what you want is not as simple as an $elemMatch projection and requires special projection logic of the aggregation framework. The second main principle here is "nesting arrays is a really bad idea", and this is exactly why:
db.collection.aggregate([
{ "$match": { "_id": "5CmYdmu2Aanva3ZAy" } },
{ "$addFields": {
"responses": {
"$filter": {
"input": {
"$map": {
"input": "$responses",
"in": {
"match": {
"nlu": {
"$filter": {
"input": {
"$map": {
"input": "$$this.match.nlu",
"in": {
"entities": {
"$let": {
"vars": {
"entities": {
"$filter": {
"input": "$$this.entities",
"cond": {
"$and": [
{ "$eq": [ "$$this.entity", "entity1" ] },
{ "$or": [
{ "$eq": [ "$$this.value", "value1" ] },
{ "$ifNull": [ "$$this.value", false ] }
]}
]
}
}
}
},
"in": {
"$cond": {
"if": { "$gt": [{ "$size": "$$entities" }, 1] },
"then": {
"$slice": [
{ "$filter": {
"input": "$$entities",
"cond": { "$eq": [ "$$this.value", "value1" ] }
}},
1 // keep only the first exact match
]
},
"else": "$$entities"
}
}
}
},
"intent": "$$this.intent"
}
}
},
"cond": { "$ne": [ "$$this.entities", [] ] }
}
}
},
"key": "$$this.key"
}
}
},
"cond": { "$ne": [ "$$this.match.nlu", [] ] }
}
}
}}
])
Will return:
{
"_id" : "5CmYdmu2Aanva3ZAy",
"responses" : [
{
"match" : {
"nlu" : [
{
"entities" : [
{
"entity" : "entity1",
"value" : "value1"
}
],
"intent" : "intent1"
}
]
},
"key" : "utter_intent1_p3vE6O_XsT"
}
]
}
That is extracting ( as best I can determine your specification ), the first matching element from the nested inner array of entities where the conditions for both entity and value are met OR where the value property does not exist.
Note the additional fallback in that if both conditions meant returning multiple array elements, then only the first match where the value was present and matching would be the result returned.
Querying deeply nested arrays requires chained usage of $map and $filter in order to traverse those array contents and return only items which match the conditions. You cannot specify these conditions in an $elemMatch projection, nor has it even been possible until recent releases of MongoDB to even atomically update such structures without overwriting significant parts of the document or introducing problems with update concurrency.
More detailed explanation of this is on my existing answer to Updating a Nested Array with MongoDB and from the query side on Find in Double Nested Array MongoDB.
Note that both responses there show usage of $elemMatch as a "query" operator, which is really only about "document selection" ( therefore does not apply to an _id match condition ) and cannot be used in concert with the former "projection" variant nor the positional $ projection operator.
You would be advised then to "not nest arrays" and instead take the option of "flatter" data structures as those answers already discuss at length.
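If the fallback can be applied in application code after fetching the document, the "exact match first, otherwise the entry without a value" rule from the question can be sketched in plain JavaScript as two passes. This is only an illustration under assumptions: nluMatches and findResponse are hypothetical helpers, and "exactly one entity" is how I read the question's "entities.1": { "$exists": false } condition.

```javascript
// True if this nlu entry has the intent and a single matching entity
// whose value passes valueTest.
function nluMatches(nlu, intent, entity, valueTest) {
  return nlu.intent === intent &&
    nlu.entities.length === 1 &&
    nlu.entities[0].entity === entity &&
    valueTest(nlu.entities[0].value);
}

function findResponse(responses, intent, entity, value) {
  const pick = (valueTest) => responses.find((r) =>
    r.match.nlu.some((n) => nluMatches(n, intent, entity, valueTest)));
  // First pass: exact value. Second pass (fallback): no value at all.
  return pick((v) => v === value) || pick((v) => v === undefined) || null;
}

// The three responses from the example document.
const responses = [
  { match: { nlu: [{ entities: [], intent: "intent1" }] },
    key: "utter_intent1_p3vE6O_XsT" },
  { match: { nlu: [{ entities: [{ entity: "entity1", value: "value1" }], intent: "intent1" }] },
    key: "utter_intent1_p3vE6O_XsT" },
  { match: { nlu: [
      { intent: "intent2", entities: [] },
      { intent: "intent1", entities: [{ entity: "entity1" }] },
    ] },
    key: "utter_intent2_Laag5aDZv2" },
];

const exact = findResponse(responses, "intent1", "entity1", "value1");
const fallback = findResponse(responses, "intent1", "entity1", "some-other-value");
console.log(exact.key, fallback.key);
```

Running the exact pass over all responses before the fallback pass is what a single $elemMatch projection cannot express, since it returns the first element that satisfies either branch.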

mongodb - how to aggregate/filter elements in different subdocuments?

I have a document that looks like this:
{
"contents": [
{
"translationId": "MENU",
},
{
"translationId": "PAGETITLE"
}
],
"slides": [
{
"translationId": "SLIDE1",
"imageUrl": "assets/img/room/1.jpg",
"desc": {
"translationId": "DESC",
}
},
{
"translationId": "SLIDE2",
"imageUrl": "assets/img/aa/2.jpg"
}
]}
I would like to aggregate against the translationId no matter in which subdocument the data is. My current query is like below which does not give me the expected result.
db.cursor.find({"contents.translationId": { $exists: true }},
{"contents.translationId":1,'slides.translationId':1,"slides.desc.translationId":1,'_id':0})
I expect result like below. Is there a good approach to retrieve such a result directly from mongodb query?
[
{
"translationId": "MENU"
},
{
"translationId": "PAGETITLE"
},
{
"translationId": "SLIDE1"
},
{
"translationId": "SLIDE2"
},
{
"translationId": "DESC"
}
]
Additionally, I might not know in which element translationId might exist. In this case it resides in contents, slides and slides.desc, but it might also be under some other elements. Is it possible?
Thanks!
As long as the items are unique you can use the $setUnion operator in modern MongoDB releases (2.6 and later), together with the $map operator to translate just the required element from the other array:
db.cursor.aggregate([
{ "$project": {
"joined": {
"$setDifference": [
{ "$setUnion": [
"$contents",
{ "$map": {
"input": "$slides",
"as": "slide",
"in": {
"translationId": "$$slide.translationId"
}
}},
{ "$map": {
"input": "$slides",
"as": "slide",
"in": {
"$cond": [
{ "$ifNull": [ "$$slide.desc.translationId", false] },
{ "translationId": "$$slide.desc.translationId" },
false
]
}
}}
]},
[false]
]
}
}}
])
You also need $setDifference to filter out any false values returned where the "desc" field is not present.
It produces:
{
"_id" : ObjectId("55f13f444db9bc30de351c84"),
"joined" : [
{
"translationId" : "DESC"
},
{
"translationId" : "SLIDE2"
},
{
"translationId" : "SLIDE1"
},
{
"translationId" : "PAGETITLE"
},
{
"translationId" : "MENU"
}
]
}
Of course if you have no idea of the structure "at all", then you need a recursive function with mapReduce instead:
db.cursor.mapReduce(
function() {
var tags = [];
function walkObj(obj) {
Object.keys(obj).forEach(function(key) {
if ( obj[key] !== null && typeof(obj[key]) == "object" ) {
walkObj(obj[key]);
} else if ( key == "translationId" ) {
tags.push({ "translationId": obj[key] })
}
});
}
walkObj(this);
emit(this._id,{ "joined": tags})
},
function(){},
{ "out": { "inline": 1 } }
)
Which gives basically the same output as before, but of course does not need to be aware of the structure.
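The recursive walk can also be tried on its own in plain JavaScript, for example on documents already fetched by the client. The sketch below is an assumption-laden illustration, not the mapReduce job itself: collectTranslationIds is a hypothetical name, and it handles arrays and null values explicitly.

```javascript
// Collect every "translationId" value found anywhere in a nested document.
function collectTranslationIds(node, out = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectTranslationIds(child, out);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "translationId" && typeof value !== "object") {
        out.push({ translationId: value });
      } else {
        collectTranslationIds(value, out); // no-op for strings, numbers, null
      }
    }
  }
  return out;
}

// The document from the question.
const sampleDoc = {
  contents: [{ translationId: "MENU" }, { translationId: "PAGETITLE" }],
  slides: [
    { translationId: "SLIDE1", imageUrl: "assets/img/room/1.jpg",
      desc: { translationId: "DESC" } },
    { translationId: "SLIDE2", imageUrl: "assets/img/aa/2.jpg" },
  ],
};

const found = collectTranslationIds(sampleDoc);
console.log(found.map((t) => t.translationId).join(","));
```

The ids come out in document order here, whereas the $setUnion version returns them in no guaranteed order.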