I have a question about a problem I came across while trying to use $setDifference on a collection of documents.
All I want is every document contained in Root 1, excluding the documents whose "reference.id" is also contained in Root 2.
My collection represents two tree structures and basically looks like this:
/* Tree Root 1 */
{
"_id" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"name" : "Root 1",
"children" : [
LUUID("ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac"),
LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213")
]
},
/* Child 1 - Root 1 */
{
"_id" : LUUID("ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac"),
"parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"reference" : {
"type" : "someType",
"id" : LUUID("331503FB-C4D1-4F7A-A461-933C701EF9AB")
},
"rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"name" : "Child 1 (Root 1)"
}
/* Child 2 - Root 1 */
{
"_id" : LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213"),
"parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"reference" : {
"type" : "someType",
"id" : LUUID("23E8B540-3EFB-455A-AA5C-2B67D6B59943")
},
"rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"displayName" : "Child 2 (Root 1)"
}
/* Tree Root 2 */
{
"_id" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"name" : "Root 2",
"children" : [
LUUID("ad4ad076-322e-4c26-8855-91c9b1912d1f"),
LUUID("66452420-dd2f-4d27-91c9-78bd0990817c")
]
},
/* Child 1 - Root 2 */
{
"_id" : LUUID("ad4ad076-322e-4c26-8855-91c9b1912d1f"),
"parentId" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"reference" : {
"type" : "someType",
"id" : LUUID("331503FB-C4D1-4F7A-A461-933C701EF9AB")
},
"rootReferenceId" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"displayName" : "Child 1 (Root 2)"
}
So in the end I want to get this document:
/* Child 2 - Root 1 */
{
"_id" : LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213"),
"parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"reference" : {
"type" : "someType",
"id" : LUUID("23E8B540-3EFB-455A-AA5C-2B67D6B59943")
},
"rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"displayName" : "Child 2 (Root 1)"
}
This is because its reference.id is contained in Root 1 but not in Root 2, so unlike Child 1 it is not excluded from the result set.
I already wrote an aggregation pipeline that groups the "reference.id" values like this:
db.getCollection('test').aggregate([
{
$match: {
rootReferenceId: { $ne: null }
}
},
{
$group: {
_id: "$rootReferenceId",
referenceIds: { $addToSet: "$reference.id" }
}
}
])
It returns this:
/* 1 */
{
"_id" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
"referenceIds" : [
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
}
/* 2 */
{
"_id" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
"referenceIds" : [
LUUID("23e8b540-3efb-455a-aa5c-2b67d6b59943"),
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
}
Does anyone have an idea how I can $project this into a format that $setDifference accepts?
I think it needs to look like this:
{
LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9") : [
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
],
LUUID("9f3a73df-bca7-48b7-b111-285359e50a02") : [
LUUID("23e8b540-3efb-455a-aa5c-2b67d6b59943"),
LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
]
}
Or is there a completely different way to achieve this that I am not aware of?
Any help is appreciated!
Edit (solution):
I went with the approach dnickless suggested. A really nice one! Thanks a lot for this!
Here is what you could do without storing duplicate values in a string format. What's nice about this solution is that
a) it returns the entire documents you are interested in, so you don't need a second query (if you do not need the entire documents, the $filter operator can simply be replaced with the $setDifference part)
b) it consists of very few, cheap stages (no grouping!). Note, however, that the sub-pipelines inside $facet cannot use indexes; to benefit from an index on the rootReferenceId field (which I would recommend), put a $match on both root ids in front of the $facet stage.
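db.getCollection('test').aggregate([
{ "$match": { "rootReferenceId": { "$in": [
LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9")
] } } },
{ "$facet": {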
"allInRoot1": [{
"$match": { "rootReferenceId": LUUID("9f3a73df-bca7-48b7-b111-285359e50a02") }
}],
"allInRoot2": [{
"$match": { "rootReferenceId": LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9") }
}]
}}, {
"$project": {
"difference": {
"$filter": {
"input": "$allInRoot1",
"as": "this",
"cond": { "$in": [ "$$this.reference.id", { "$setDifference": [ "$allInRoot1.reference.id", "$allInRoot2.reference.id" ] } ] }
}
}
}
}
])
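As a way to check what the $facet pipeline computes, here is a plain JavaScript (Node.js) sketch of its logic on the sample documents above. It is an illustration only, not MongoDB code: the LUUID values are assumed to be plain lowercase strings, and the helper names setDifference and differenceByReference are hypothetical.

```javascript
// Plain-JavaScript sketch of the $facet + $filter + $setDifference logic.
// Assumption: LUUID values are represented as lowercase strings.
const ROOT_1 = "9f3a73df-bca7-48b7-b111-285359e50a02";
const ROOT_2 = "27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9";

const docs = [
  { _id: "ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac", rootReferenceId: ROOT_1,
    reference: { type: "someType", id: "331503fb-c4d1-4f7a-a461-933c701ef9ab" } },
  { _id: "6dd8c8ed-4a60-41ca-abf1-a4d795a0c213", rootReferenceId: ROOT_1,
    reference: { type: "someType", id: "23e8b540-3efb-455a-aa5c-2b67d6b59943" } },
  { _id: "ad4ad076-322e-4c26-8855-91c9b1912d1f", rootReferenceId: ROOT_2,
    reference: { type: "someType", id: "331503fb-c4d1-4f7a-a461-933c701ef9ab" } },
];

// Like $setDifference: the distinct elements of a that do not appear in b.
function setDifference(a, b) {
  const exclude = new Set(b);
  return [...new Set(a)].filter((x) => !exclude.has(x));
}

function differenceByReference(allDocs, keepRoot, dropRoot) {
  // The two $facet branches: one $match per root.
  const allInRoot1 = allDocs.filter((d) => d.rootReferenceId === keepRoot);
  const allInRoot2 = allDocs.filter((d) => d.rootReferenceId === dropRoot);
  // "$allInRoot1.reference.id" minus "$allInRoot2.reference.id"
  const remaining = new Set(setDifference(
    allInRoot1.map((d) => d.reference.id),
    allInRoot2.map((d) => d.reference.id)
  ));
  // The $filter with an $in condition keeps the whole documents.
  return allInRoot1.filter((d) => remaining.has(d.reference.id));
}

const difference = differenceByReference(docs, ROOT_1, ROOT_2);
console.log(difference.map((d) => d._id)); // [ '6dd8c8ed-4a60-41ca-abf1-a4d795a0c213' ]
```

The $filter step is what lets the pipeline return whole documents rather than only the remaining reference ids.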
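You can try the aggregation below in MongoDB 4.0 and above (the $toString operator was added in 4.0). Note that older server versions may not be able to convert UUID (BinData) values with $toString.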
db.getCollection('test').aggregate([
{ "$match": { "rootReferenceId": { "$ne": null }}},
{ "$group": {
"_id": "$rootReferenceId",
"referenceIds": { "$addToSet": "$reference.id" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": { "k": { "$toString": "$_id" }, "v": "$referenceIds" }
}
}},
{ "$replaceRoot": { "newRoot": { "$arrayToObject": "$data" }}}
])
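To show the shape that the two $group stages and $arrayToObject produce, here is a plain JavaScript (Node.js) sketch, not MongoDB code. It assumes the UUIDs are already strings (which is what $toString is used for in the pipeline); the function name referenceIdsByRoot is hypothetical. In MongoDB, $addToSet does not guarantee element order, whereas the JavaScript Set used here keeps insertion order.

```javascript
// Plain-JavaScript sketch of $match + two $group stages + $arrayToObject.
// Assumption: UUIDs are already strings.
const ROOT_1 = "9f3a73df-bca7-48b7-b111-285359e50a02";
const ROOT_2 = "27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9";
const REF_A = "331503fb-c4d1-4f7a-a461-933c701ef9ab";
const REF_B = "23e8b540-3efb-455a-aa5c-2b67d6b59943";

const docs = [
  { name: "Root 1" }, // no rootReferenceId, so the $match stage drops it
  { rootReferenceId: ROOT_1, reference: { id: REF_A } },
  { rootReferenceId: ROOT_1, reference: { id: REF_B } },
  { name: "Root 2" },
  { rootReferenceId: ROOT_2, reference: { id: REF_A } },
];

function referenceIdsByRoot(allDocs) {
  // { $match: { rootReferenceId: { $ne: null } } }
  const matched = allDocs.filter((d) => d.rootReferenceId != null);
  // First $group: _id = rootReferenceId, referenceIds = $addToSet of reference.id
  const groups = new Map();
  for (const d of matched) {
    if (!groups.has(d.rootReferenceId)) groups.set(d.rootReferenceId, new Set());
    groups.get(d.rootReferenceId).add(d.reference.id);
  }
  // Second $group: push one { k, v } pair per root into a single array
  const data = [...groups].map(([k, ids]) => ({ k, v: [...ids] }));
  // $replaceRoot with $arrayToObject: each k becomes a field name
  return Object.fromEntries(data.map(({ k, v }) => [k, v]));
}

const result = referenceIdsByRoot(docs);
console.log(result);
```

The keys must be strings, which is why the pipeline converts the UUIDs with $toString before $arrayToObject.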
I have an aggregation pipeline that nearly does what I want. I've used match / unwind / project / sort to get 99% of the way. It is returning multiple documents:
[
{
"_id" : 254.8
},
{
"_id" : 93.7
},
{
"_id" : 89.9
},
{
"_id" : 94.15
},
{
"_id" : 102.1
},
{
"_id" : 93.9
},
{
"_id" : 102.7
}
]
Note: I've added the array brackets and commas to make it more readable, but you can also read it as:
{
"_id" : 254.8
}
{
"_id" : 93.7
}
{
"_id" : 89.9
}
{
"_id" : 94.15
}
{
"_id" : 102.1
}
I need the contents of the ID fields from all 7 documents in an array of values in one document:
{values: [254.8, 93.7, 89.9, 94.15, 102.1, 93.9, 102.7]}
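It would be easy to sort this out in JS once I have the results, but I'd rather do it in the pipeline if possible, so that my JS stays 100% generic and only returns pure pipeline data.
Append these two stages to your pipeline to complete the job: a $group with _id: null that collects every _id into one array, and a $project that drops the group's _id: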
db.collection.aggregate([
{
"$group": {
"_id": null,
"values": {
$push: "$_id"
}
}
},
{
"$project": {
_id: false
}
}
])
The result will be:
[
{
"values": [
254.8,
93.7,
89.9,
94.15,
102.1,
93.9,
102.7
]
}
]
https://mongoplayground.net/p/pTmR_rni0J1
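For reference, this plain JavaScript (Node.js) sketch mimics what the $group stage with _id: null and the $project stage do with the seven documents from the question. It is an illustration, not MongoDB code, and the function name collectIds is hypothetical.

```javascript
// Plain-JavaScript sketch of { $group: { _id: null, values: { $push: "$_id" } } }
// followed by { $project: { _id: false } }.
const docs = [254.8, 93.7, 89.9, 94.15, 102.1, 93.9, 102.7].map((v) => ({ _id: v }));

function collectIds(allDocs) {
  // _id: null puts every input document into one single group,
  // and $push appends each document's _id to the values array.
  const group = { _id: null, values: [] };
  for (const d of allDocs) group.values.push(d._id);
  // $project: { _id: false } drops the group key from the output document.
  const { _id, ...projected } = group;
  return [projected];
}

console.log(JSON.stringify(collectIds(docs)));
// [{"values":[254.8,93.7,89.9,94.15,102.1,93.9,102.7]}]
```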
I have a group of documents in MongoDB, as shown below:
/* 1 */
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"Name" : "Kevin",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2014-08-31"
}
]
}
/* 2 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"Name" : "Peter",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2015-03-24"
}
]
}
/* 3 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"Name" : "Pole",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2015-03-24"
},
{
"event_type" : "Work Anniversary",
"event_date" : "2015-03-24"
}
]
}
Now I want a result that is grouped first by event_date and then by event_type. Each event_type entry should contain the names of all related users and the count of records in its array.
Expected output:
/* 1 */
{
"event_date" : "2014-08-31",
"data" : [
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
},
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 2
},
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
}
],
"count" : 1
}
]
}
/* 2 */
{
"event_date" : "2015-03-24",
"data" : [
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 1
},
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
},
{
"event_type" : "Work Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
}
]
}
Using the aggregation framework, you would need to run a pipeline with the following stages to get the desired result:
db.collection.aggregate([
{ "$unwind": "$pb_event" },
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
},
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
])
In the above pipeline, the first step is the $unwind operator
{ "$unwind": "$pb_event" }
which comes in quite handy when the data is stored as an array. When $unwind is applied to an array field, it outputs a new document for each element of that array, with the field replaced by the element. It basically flattens the data.
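The flattening described above can be sketched in plain JavaScript (Node.js). This illustrates the default behavior of $unwind (without preserveNullAndEmptyArrays) and is not MongoDB code; the unwind helper is hypothetical, and the ObjectId is shown as a plain string.

```javascript
// Plain-JavaScript sketch of { $unwind: "$<field>" } with default options.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing and null values produce no output documents.
    if (value === undefined || value === null) return [];
    // A non-array value is treated like a single-element array;
    // an empty array produces no output documents.
    const items = Array.isArray(value) ? value : [value];
    return items.map((item) => ({ ...doc, [field]: item }));
  });
}

const kevin = {
  _id: "58736c7f7d43c305461cdb9b", // ObjectId shown as a string
  Name: "Kevin",
  pb_event: [
    { event_type: "Birthday", event_date: "2014-08-31" },
    { event_type: "Anniversary", event_date: "2014-08-31" },
  ],
};

const flattened = unwind([kevin], "pb_event");
console.log(flattened.map((d) => d.pb_event.event_type)); // [ 'Birthday', 'Anniversary' ]
```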
This is a necessary operation for the next pipeline stage, the $group step, where you group the flattened documents by the event_date and event_type fields of the deconstructed pb_event array:
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
}
The $group pipeline stage is similar to SQL's GROUP BY clause. Just as GROUP BY is usually combined with aggregate functions in SQL, in MongoDB you compute the per-group values with accumulator operators (see the $group documentation for the full list).
In this $group operation, the count aggregate, i.e. the total number of documents in each group, is calculated with the $sum accumulator operator. In the same stage, the $push operator builds a list of the name and _id subdocuments; it returns an array of expression values for each group.
The second $group stage
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
}
further aggregates the results of the previous stage by grouping on event_date, which forms the basis of the desired output by creating a new data list with $push. Finally, the $project stage
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
reshapes the documents by renaming the _id field to event_date and keeping the data field.
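Putting the stages together, here is a plain JavaScript (Node.js) sketch that mimics the whole pipeline on the three sample documents. It assumes ObjectIds are plain strings and uses Maps that keep groups in first-seen order; MongoDB does not guarantee the order of $group output, so add a $sort stage if the order matters. The function name groupEvents is hypothetical.

```javascript
// Plain-JavaScript sketch of:
// $unwind -> $group by (event_date, event_type) -> $group by event_date -> $project
const people = [
  { _id: "58736c7f7d43c305461cdb9b", Name: "Kevin", pb_event: [
    { event_type: "Birthday", event_date: "2014-08-31" },
    { event_type: "Anniversary", event_date: "2014-08-31" } ] },
  { _id: "58736cfc7d43c305461cdba8", Name: "Peter", pb_event: [
    { event_type: "Birthday", event_date: "2014-08-31" },
    { event_type: "Anniversary", event_date: "2015-03-24" } ] },
  { _id: "58736cfc7d43c305461cdba9", Name: "Pole", pb_event: [
    { event_type: "Birthday", event_date: "2015-03-24" },
    { event_type: "Work Anniversary", event_date: "2015-03-24" } ] },
];

function groupEvents(docs) {
  // $unwind: one document per pb_event element
  const unwound = docs.flatMap((d) => d.pb_event.map((e) => ({ ...d, pb_event: e })));

  // First $group: compound key (event_date, event_type); $push details, $sum: 1 as count
  const byDateAndType = new Map();
  for (const d of unwound) {
    const key = JSON.stringify([d.pb_event.event_date, d.pb_event.event_type]);
    if (!byDateAndType.has(key)) {
      byDateAndType.set(key, {
        _id: { event_date: d.pb_event.event_date, event_type: d.pb_event.event_type },
        details: [],
        count: 0,
      });
    }
    const g = byDateAndType.get(key);
    g.details.push({ _id: d._id, name: d.Name });
    g.count += 1;
  }

  // Second $group: key is event_date; $push one entry per event_type
  const byDate = new Map();
  for (const g of byDateAndType.values()) {
    const date = g._id.event_date;
    if (!byDate.has(date)) byDate.set(date, []);
    byDate.get(date).push({ event_type: g._id.event_type, details: g.details, count: g.count });
  }

  // $project: rename _id to event_date and keep data
  return [...byDate].map(([event_date, data]) => ({ event_date, data }));
}

const grouped = groupEvents(people);
console.log(JSON.stringify(grouped, null, 2));
```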
{
_id:1, members: [
{
name:"John",
status:"A"
},
{
name:"Alex",
status:"D"
},
{
name:"Jack",
status:"A"
},
{
name:"Robin",
status:"D"
}
]}
That is a Channel document.
Now I need to count all elements in the members array whose status equals 'A'.
For example, the document above has 2 members with status 'A'.
How can I achieve this?
You can use db.collection.count() to achieve the desired result.
Returns the count of documents that would match a find() query. The db.collection.count() method does not perform the find() operation but instead counts and returns the number of results that match a query.
So your query will be:
var recordCount = db.collName.count({"members.status":"A"});
Now recordCount will be the number of documents that match the {"members.status":"A"} query. Note that this counts documents that contain at least one member with status 'A', not the matching elements inside the members array: for the Channel document above it returns 1, not 2.
Here is your JSON document:
{
"_id" : ObjectId("575915653b3cc43fca1fca4c"),
"members" : [
{
"name" : "John",
"status" : "A"
},
{
"name" : "Alex",
"status" : "D"
},
{
"name" : "Jack",
"status" : "A"
},
{
"name" : "Robin",
"status" : "D"
}
]
}
You want the count of all elements in the members array whose status equals 'A'.
Try this to get the count:
db.CollectionName.aggregate([{
"$project": {
"members": {
"$filter": {
"input": "$members",
"as": "mem",
"cond": {
"$eq": ["$$mem.status", "A"]
}
}
}
}
}, {
"$project": {
"membersize": {
"$size": "$members"
}
}
}]).pretty()
The result looks like this:
{ "_id" : ObjectId("575915653b3cc43fca1fca4c"), "membersize" : 2 }
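For older versions (before MongoDB 3.2, which added $filter), try this instead. Note that documents without any matching member are dropped rather than reported with a count of 0:
db.CollectionName.aggregate([
{ "$unwind": "$members" },
{ "$match": { "members.status": "A" } },
{ "$group": { "_id": "$_id", "memberscount": { "$sum": 1 } } }
]).pretty()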
{ "_id" : ObjectId("575915653b3cc43fca1fca4c"), "memberscount" : 2 }
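The difference between counting documents and counting array elements can be sketched in plain JavaScript (Node.js) on the Channel document above. This is an illustration, not MongoDB code; the ObjectId is shown as a plain string.

```javascript
// Plain-JavaScript sketch contrasting two different counts:
// - count({"members.status": "A"}) counts matching documents
// - $filter + $size (or $unwind/$match/$group) counts matching array elements
const channels = [
  { _id: "575915653b3cc43fca1fca4c", members: [
    { name: "John", status: "A" }, { name: "Alex", status: "D" },
    { name: "Jack", status: "A" }, { name: "Robin", status: "D" } ] },
];

// A document matches {"members.status": "A"} if any element has status "A".
const documentCount = channels.filter((c) => c.members.some((m) => m.status === "A")).length;

// Per document: the $size of the $filter-ed members array.
const memberSizes = channels.map((c) => ({
  _id: c._id,
  membersize: c.members.filter((m) => m.status === "A").length,
}));

console.log(documentCount); // 1
console.log(memberSizes);   // [ { _id: '575915653b3cc43fca1fca4c', membersize: 2 } ]
```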
I want to compare two dynamic fields like I'd do with:
$where: "this.foo_update > this.bar_update"
I need this to get a bunch of objects that must be updated. Whether they must be updated depends on several conditions, which is why I want to do it with an aggregation.
The current query for the related part looks like this:
[
{ $sort: { "updated_at": 1 } },
{ $group: {
"_id" : "$bar",
"foo" : { "$first" : "$foo" },
"bar" : { "$first" : "$bar" },
"last_update" : { "$last" : "$updated_at" }
} },
{ $lookup: {
"from" : "table_foo",
"localField" : "foo",
"foreignField" : "_id",
"as" : "foo"
} },
{ $lookup: {
"from" : "table_bar",
"localField" : "bar",
"foreignField" : "_id",
"as" : "bar"
} }
]
Here I could follow with another $group stage to get the values I need out to the top level, but I cannot do that with the lookup values, because $lookup always produces an array of items.
Only one item is expected here (and I query for that too, because we need to update if the other item is removed).
So now I want to compare the last_update field with the foo.updated_at field. In my head, it would look something like this:
$match: {
"foo": {
$elemMatch: {
"updated_at": { "$gte": "$last_update" }
}
}
}
Is this even possible?
If yes, how would you do it?
And yes, it is possible. It turned out that you can pull a value out of the array to the top level with operators like $max.
So my solution looks like this now:
[
/** the beginning of the query above **/
{ $project: {
"foo" : 1,
"bar" : 1,
"last_update" : 1,
/** as an expression, $max returns the largest element of an array
(the $group accumulator would compare the whole array as one value) **/
"foo_updated" : { "$gt": [ { "$max": "$foo.updated_at" }, "$last_update" ] },
"bar_updated" : { "$gt": [ { "$max": "$bar.updated_at" }, "$last_update" ] }
} },
{ $match: {
"$or": [
/** other conditions **/
{ "foo_updated": true },
{ "bar_updated": true }
]
} }
]
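The per-document check that the $project and $match stages perform can be sketched in plain JavaScript (Node.js). This is an illustration, not MongoDB code: the sample documents and dates are hypothetical, as are the helper names maxOf and isNewer. (On MongoDB 3.6 and later, the $expr operator also lets a $match stage compare two fields of the same document directly.)

```javascript
// Plain-JavaScript sketch of the $project + $match comparison.
// Hypothetical sample data: after $lookup, foo and bar are arrays of documents.
const docs = [
  { _id: "a", last_update: new Date("2018-01-10T00:00:00Z"),
    foo: [{ updated_at: new Date("2018-01-12T00:00:00Z") }],
    bar: [{ updated_at: new Date("2018-01-01T00:00:00Z") }] },
  { _id: "b", last_update: new Date("2018-01-10T00:00:00Z"),
    foo: [{ updated_at: new Date("2018-01-05T00:00:00Z") }],
    bar: [] },
];

// Like the $max expression on an array: skip null/missing values and
// return the largest remaining one (undefined if there is none).
function maxOf(values) {
  const present = values.filter((v) => v !== null && v !== undefined);
  if (present.length === 0) return undefined;
  return present.reduce((a, b) => (b > a ? b : a));
}

// Like { $gt: [ { $max: "$<array>.updated_at" }, "$last_update" ] }.
// A missing value counts as "not newer", since null sorts before dates in MongoDB.
function isNewer(items, lastUpdate) {
  const newest = maxOf(items.map((item) => item.updated_at));
  return newest !== undefined && newest > lastUpdate;
}

// Like the $match with $or on foo_updated / bar_updated.
const toUpdate = docs
  .filter((d) => isNewer(d.foo, d.last_update) || isNewer(d.bar, d.last_update))
  .map((d) => d._id);

console.log(toUpdate); // [ 'a' ]
```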
I have a collection of companies, and I want to merge the deals of all documents, both the top-level deals and the ones inside locations.
My documents look like this:
{
"_id" : ObjectId("561637942d25a7644cae993e"),
"locations" : [
{
"deals" : [
{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},
{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
}
]
}
],
"deals" : [
{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
}
]
}
{
"_id" : ObjectId("561637942d25a7644cae993e"),
"locations" : [
{
"deals" : [
{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}
]
}
],
"deals" : []
}
I need them to look like this:
{
"deals": [{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
},{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
},{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}]
}
But so far I have failed to do this. It seems that if I want all the deals grouped together into one array, I should not use $unwind, since it creates more documents and I only need to group once.
This is my attempt, which does not work at all:
{
"$project": {
"_id": 1,
"locations": 1,
"deals": 1
}
}, {
"$unwind": "$locations"
}, {
"$unwind": "$locations.deals"
}, {
"$unwind": "$deals"
}, {
"$group": {
"_id": null,
"deals": {
"$addToSet": "$locations.deals",
"$addToSet": "$deals"
}
}
}
You should first filter your documents with the $match operator to reduce the number of documents to process in the pipeline. Then we $unwind the "locations" array, and after that we use the $project operator to reshape your documents. The $cond operator returns the single-element array [false] if the deals field is an empty array, or the deals value otherwise. (This guard is not strictly required here, since $setUnion handles an empty array on its own; note also that $unwind on an empty array does not throw an exception, it simply drops the document.) The $setUnion operator returns an array of the elements that appear in either the locations.deals array or the deals array. We then use the $setDifference operator to filter the false element out of the merged array. Another $unwind stage then deconstructs the deals array, and from there we can easily $group your documents.
db.collection.aggregate([
{ "$match": { "locations.0": { "$exists": true } } },
{ "$unwind": "$locations" },
{ "$project": {
"deals": {
"$setDifference": [
{ "$setUnion": [
{ "$cond": [
{ "$eq" : [ { "$size": "$deals" }, 0 ] },
[false],
"$deals"
]},
"$locations.deals"
]},
[false]
]
}
}},
{ "$unwind": "$deals" },
{ "$group": {
"_id": null,
"deals": { "$addToSet": "$deals" }
}}
])
Which returns the following (neither $setUnion nor $addToSet guarantees the order of the elements):
{
"_id" : null,
"deals" : [
{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},
{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
},
{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
},
{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}
]
}
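As a cross-check of the result above, here is a plain JavaScript (Node.js) sketch of the same merge. It is an illustration, not MongoDB code: ObjectIds are plain strings, and deals are deduplicated by their JSON form as a stand-in for MongoDB's whole-document equality (which, like JSON.stringify, depends on field order). The function name mergeDeals is hypothetical. Plain JavaScript does not need the [false] placeholder, because spreading an empty array is harmless.

```javascript
// Plain-JavaScript sketch of the deals-merging pipeline:
// $match -> $unwind locations -> union of deals -> $unwind deals -> $group with $addToSet
const companies = [
  { _id: "561637942d25a7644cae993e",
    locations: [{ deals: [
      { name: "1", _id: "561637942d25a7644cae9940" },
      { name: "2", _id: "562f868ce73962c626a16b15" } ] }],
    deals: [{ name: "3", _id: "562f86ebe73962c626a16b17" }] },
  { _id: "561637942d25a7644cae993e",
    locations: [{ deals: [{ name: "4", _id: "561637942d25a7644cae9940" }] }],
    deals: [] },
];

function mergeDeals(docs) {
  const seen = new Map(); // JSON key -> deal, mimicking $addToSet
  for (const doc of docs) {
    // { $match: { "locations.0": { $exists: true } } }
    if (!Array.isArray(doc.locations) || doc.locations.length === 0) continue;
    for (const location of doc.locations) { // $unwind: "$locations"
      // Union of the top-level deals and this location's deals
      const merged = [...(doc.deals || []), ...(location.deals || [])];
      for (const deal of merged) { // $unwind: "$deals"
        const key = JSON.stringify(deal);
        if (!seen.has(key)) seen.set(key, deal); // $group with $addToSet
      }
    }
  }
  return { _id: null, deals: [...seen.values()] };
}

const result = mergeDeals(companies);
console.log(result.deals.map((d) => d.name)); // [ '3', '1', '2', '4' ]
```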