How to solve empty array with $unwind? - mongodb

I have a collection whose documents look like the following:
{ "_id" : ObjectId("5716617f4af77ca97a9614bd"), "count" : 1, "author" : "Tony", "music" : [ { "_id" : ObjectId("571661cd4af77ca97a9614c1"), "count" : 2, "author" : "Tony" } ] }
{ "_id" : ObjectId("5716617f4af77ca97a9614be"), "count" : 2, "author" : "Joe", "music" : [ { "_id" : ObjectId("571661cd4af77ca97a9614c0"), "count" : 1, "author" : "Joe" } ] }
{ "_id" : ObjectId("5716617f4af77ca97a9614bf"), "count" : 3, "author" : "Mary", "music" : [ ] }
I want to count the records where "$count" > "$music.count". But when I apply {$unwind:"$music"}, I get the following:
{ "_id" : ObjectId("5716617f4af77ca97a9614bd"), "count" : 1, "author" : "Tony", "music" : { "_id" : ObjectId("571661cd4af77ca97a9614c1"), "count" : 2, "author" : "Tony" } }
{ "_id" : ObjectId("5716617f4af77ca97a9614be"), "count" : 2, "author" : "Joe", "music" : { "_id" : ObjectId("571661cd4af77ca97a9614c0"), "count" : 1, "author" : "Joe" } }
The third record disappears. How can I get a result like this:
{ "_id" : ObjectId("5716617f4af77ca97a9614bd"), "count" : 1, "author" : "Tony", "music" : { "_id" : ObjectId("571661cd4af77ca97a9614c1"), "count" : 2, "author" : "Tony" } }
{ "_id" : ObjectId("5716617f4af77ca97a9614be"), "count" : 2, "author" : "Joe", "music" : { "_id" : ObjectId("571661cd4af77ca97a9614c0"), "count" : 1, "author" : "Joe" } }
{ "_id" : ObjectId("5716617f4af77ca97a9614bf"), "count" : 3, "author" : "Mary", "music" : {"count": 0} }
The initial records come from a $lookup. The full code is as follows:
db.bookAuthors.aggregate([{
$lookup:{from:"musicAuthors", localField:"author", foreignField:"author",as:"music"}},
{$unwind:"$music"},
{$project:{_id:"$author",count:1,music:1}},
{$match:{$gt:["$count","$music.count"]}},
{$group:{_id:null,count:{$sum:1}}}
])
How can I find the number of records where "$count" > "$music.count"? In this example, the result should be 2, but because of the unwind problem I get 1. Thanks.

In MongoDB 3.2 ( which you are using if you have $lookup ) the $unwind operator has the preserveNullAndEmptyArrays option. This changes the behaviour to "not" remove the document from results where the array is in fact "empty":
db.bookAuthors.aggregate([
{ "$lookup":{
"from": "musicAuthors",
"localField": "author",
"foreignField": "author",
"as": "music"
}},
{ "$unwind": { "path": "$music", "preserveNullAndEmptyArrays": true } },
{ "$project": {
"count": 1,
"author": 1,
"music": {
"$ifNull": [ "$music", { "$literal": { "count": 0 } } ]
}
}}
])
And the $ifNull replaces the missing value in this case.
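To make the effect of the option concrete, here is a minimal plain-JavaScript sketch of the two $unwind behaviours, run against the three sample documents with the ids left out. The unwind helper is hypothetical: it only imitates the stage for array, null and missing values, and it is not MongoDB code.

```javascript
// Hypothetical helper imitating $unwind on one array field (plain JS, not MongoDB code).
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value) && value.length > 0) {
      // One output document per array element.
      for (const el of value) out.push({ ...doc, [field]: el });
    } else if (preserveNullAndEmptyArrays) {
      // Keep the document; an empty array is output with the field omitted.
      const copy = { ...doc };
      if (Array.isArray(value)) delete copy[field];
      out.push(copy);
    }
    // Otherwise the document is dropped, which is the default behaviour.
  }
  return out;
}

// The result of the $lookup in the question, ids omitted.
const joined = [
  { author: "Tony", count: 1, music: [{ author: "Tony", count: 2 }] },
  { author: "Joe", count: 2, music: [{ author: "Joe", count: 1 }] },
  { author: "Mary", count: 3, music: [] }
];

const dropped = unwind(joined, "music");
const kept = unwind(joined, "music", true);

// Imitates { "$ifNull": [ "$music", { "count": 0 } ] }.
const filled = kept.map(d => ({ ...d, music: d.music == null ? { count: 0 } : d.music }));

console.log(dropped.length); // 2
console.log(filled.length);  // 3
console.log(filled[2]);      // { author: 'Mary', count: 3, music: { count: 0 } }
```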
But since your association here is 1:1, you could forgo the $unwind altogether and simply replace the empty array:
db.bookAuthors.aggregate([
{ "$lookup":{
"from": "musicAuthors",
"localField": "author",
"foreignField": "author",
"as": "music"
}},
{ "$project": {
"count": 1,
"author": 1,
"music": {
"$ifNull": [
{ "$arrayElemAt": [ "$music", 0 ] },
{ "$literal": { "count": 0 } }
]
}
}}
])
And there if $arrayElemAt found nothing at the 0 index ( therefore "empty" ) then the $ifNull would return the alternate value just as before. Of course where it did find something, then that value is returned instead.
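The same "first element or a default" logic can be sketched in plain JavaScript. The function name firstOrDefault is made up for this illustration and is not part of any MongoDB API.

```javascript
// Imitates { "$ifNull": [ { "$arrayElemAt": [ "$music", 0 ] }, { "count": 0 } ] }.
// firstOrDefault is a hypothetical name used only for this sketch.
function firstOrDefault(arr, fallback) {
  const first = arr[0];                    // undefined when the array is empty
  return first == null ? fallback : first; // fall back on null or undefined
}

console.log(firstOrDefault([{ author: "Joe", count: 1 }], { count: 0 })); // { author: 'Joe', count: 1 }
console.log(firstOrDefault([], { count: 0 }));                             // { count: 0 }
```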
But again, your specific problem still has a better solution, which again does not need $unwind, since you can just calculate the "count" condition "in-line" with the array:
db.bookAuthors.aggregate([
{ "$lookup":{
"from": "musicAuthors",
"localField": "author",
"foreignField": "author",
"as": "music"
}},
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$cond": {
"if": {
"$gt": [
"$count",
{ "$sum": {
"$map": {
"input": "$music",
"as": "el",
"in": "$$el.count"
}
}}
]
},
"then": 1,
"else": 0
}
}
}
}}
])
Here the $sum operator is used in both of its use cases: its traditional "accumulator" role and its new role of "summing" values in an array. The $map operator looks at each array element and returns the value to $sum to produce a total. An "empty" array would return as 0.
Then there is the $cond comparison to determine if the returned total from the array was less than the "count" property on the document. Where true a 1 is returned for the accumulator, and where false it gets 0.
The end result is of course 2, since both the "second" and "third" documents actually match the condition inside the accumulator. So that really is the most efficient way to do this, even if it looks a bit "long winded" in the process.
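As a sanity check on the arithmetic, the same comparison can be worked through in plain JavaScript on the three sample documents (ids omitted). This is only an illustration of the logic, not a replacement for the pipeline.

```javascript
// The joined documents from the question, ids omitted.
const joined = [
  { author: "Tony", count: 1, music: [{ author: "Tony", count: 2 }] },
  { author: "Joe", count: 2, music: [{ author: "Joe", count: 1 }] },
  { author: "Mary", count: 3, music: [] }
];

const matching = joined.filter(doc => {
  // Like $sum over $map: total of the array's counts, 0 for an empty array.
  const musicTotal = doc.music.reduce((sum, el) => sum + el.count, 0);
  // Like the $cond test: is the document's own count greater?
  return doc.count > musicTotal;
});

console.log(matching.map(d => d.author)); // [ 'Joe', 'Mary' ]
console.log(matching.length);             // 2
```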

Related

Mongodb Search with embedded document.

Structure of mongodb collection is like this.
collection User
{
"name":"sufaid",
"age":"22",
"address":"zzzz",
"product":[{"id":1,"name":"A"},
{"id":6,"name":"N"},
{"id":3,"name":"D"},
{"id":7,"name":"q"},
]
}
I need to find the users who have product id "3".
The output should look like this:
{
"name":"sufaid",
"age":"22",
"address":"zzzz",
"product":{"id":3,"name":"D"}
}
Note: without using $unwind or a projection like "product.$".
"product.$" throws an error when using pymongo.
Is there any other option?
use $elemMatch. https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/
for your query:
db.User.find({},{name:1,age:1,address:1,product:{$elemMatch:{id:3}}})
or
db.User.find({},{product:{$elemMatch:{id:3}}})
Output:
{
"name" : "sufaid",
"age" : "22",
"address" : "zzzz",
"product" : [
{
"id" : 3.0,
"name" : "D"
}
]
}
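A small plain-JavaScript sketch of what the $elemMatch projection does per document may help here. The helper projectElemMatch and the second user are hypothetical. Note that with an empty query filter every user is returned, and a user without a matching product simply comes back without the product field, so a query filter such as {"product.id": 3} may be wanted as well.

```javascript
// Hypothetical helper imitating the projection { product: { $elemMatch: { id: wantedId } } }.
function projectElemMatch(doc, wantedId) {
  const { product, ...rest } = doc;
  const first = (product || []).find(p => p.id === wantedId);
  // Only the first match is kept, still wrapped in an array; no match means no field.
  return first === undefined ? rest : { ...rest, product: [first] };
}

const users = [
  { name: "sufaid", product: [{ id: 1, name: "A" }, { id: 3, name: "D" }] },
  { name: "other", product: [{ id: 7, name: "q" }] } // hypothetical second user
];

console.log(JSON.stringify(users.map(u => projectElemMatch(u, 3))));
// [{"name":"sufaid","product":[{"id":3,"name":"D"}]},{"name":"other"}]
```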
As you require it for aggregation:
db.User.aggregate([
{$unwind:'$product'},
{$match:{'product.id':3}},
{$project:{_id:0,name:1,age:1,address:1,product:1}}
])
Output:
{
"name" : "sufaid",
"age" : "22",
"address" : "zzzz",
"product" : {
"id" : 3.0,
"name" : "D"
}
}
This will give exactly what you indicated in the question.
You could use the aggregation framework which has a plethora of operators that you can use, in particular you'd need the $filter and $arrayElemAt operators in a $project pipeline.
For instance, you could return just the product field as an embedded document by running the following pipeline:
db.User.aggregate([
{ "$match": { "product.id": 3 } },
{
"$project": {
"name": 1,
"age": 1,
"address": 1,
"product": {
"$arrayElemAt": [
{
"$filter": {
"input": "$product",
"as": "item",
"cond": { "$eq": [ "$$item.id", 3 ] }
}
},
0
]
}
}
}
])
Sample Output
{
"_id" : ObjectId("5829ac89628123dcf8a64b7a"),
"name" : "sufaid",
"age" : "22",
"address" : "zzzz",
"product" : {
"id" : 3,
"name" : "D"
}
}
If you just need an output with the array filtered, skip the $arrayElemAt expression and use the $filter only:
db.User.aggregate([
{ "$match": { "product.id": 3 } },
{
"$project": {
"name": 1,
"age": 1,
"address": 1,
"product": {
"$filter": {
"input": "$product",
"as": "item",
"cond": { "$eq": [ "$$item.id", 3 ] }
}
}
}
}
])
Sample Output
{
"_id" : ObjectId("5829ac89628123dcf8a64b7a"),
"name" : "sufaid",
"age" : "22",
"address" : "zzzz",
"product" : [
{ "id" : 3, "name" : "D" }
]
}
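The difference between the two pipelines comes down to whether the filtered array is returned as is or reduced to its first element. A plain-JavaScript sketch of that difference, using the sample user from the question:

```javascript
const user = {
  name: "sufaid",
  age: "22",
  address: "zzzz",
  product: [
    { id: 1, name: "A" },
    { id: 6, name: "N" },
    { id: 3, name: "D" },
    { id: 7, name: "q" }
  ]
};

// Like $filter: keep every element whose id equals 3.
const matches = user.product.filter(item => item.id === 3);

// $filter alone leaves an array; wrapping it in $arrayElemAt with index 0 yields one document.
const asArray = { ...user, product: matches };
const asDocument = { ...user, product: matches[0] };

console.log(JSON.stringify(asArray.product));    // [{"id":3,"name":"D"}]
console.log(JSON.stringify(asDocument.product)); // {"id":3,"name":"D"}
```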
db.User.find({},{product:{$elemMatch:{id:3}}})
This is enough.

Return array of elements from multiple arrays

I have a collection of companies that looks like this. I also want to merge in the deals of the other documents.
I need this:
{
"_id" : ObjectId("561637942d25a7644cae993e"),
"locations" : [
{
"deals" : [
{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},
{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
}
]
}
],
"deals" : [
{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
}
]
}
{
"_id" : ObjectId("561637942d25a7644cae993e"),
"locations" : [
{
"deals" : [
{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}
]
}
],
"deals" : []
}
To be like this:
{
"deals": [{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
},{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
},{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9949")
}]
}
But so far I have failed to do this. It seems that if I want all the deals grouped together into one array I should not use unwind, since that creates more documents and I only need to group once.
This is my attempt, which does not work at all.
{
"$project": {
"_id": 1,
"locations": 1,
"deals": 1
}
}, {
"$unwind": "$locations"
}, {
"$unwind": "$locations.deals"
}, {
"$unwind": "$deals"
}, {
"$group": {
"_id": null,
"deals": {
"$addToSet": "$locations.deals",
"$addToSet": "$deals"
}
}
}
You should first filter your documents with the $match operator to reduce the number of documents processed in the pipeline. Then $unwind the "locations" array, and use the $project operator to reshape your documents. The $cond operator returns a single-element array [false] if the deals field is an empty array, or the deals value otherwise, because unwinding an empty array removes the document from the pipeline. The $setUnion operator then returns an array of the elements that appear in either the locations.deals array or the deals array, and the $setDifference operator filters the false element out of the merged array. Another $unwind stage then deconstructs the deals array, and from there we can easily $group your documents.
db.collection.aggregate([
{ "$match": { "locations.0": { "$exists": true } } },
{ "$unwind": "$locations" },
{ "$project": {
"deals": {
"$setDifference": [
{ "$setUnion": [
{ "$cond": [
{ "$eq" : [ { "$size": "$deals" }, 0 ] },
[false],
"$deals"
]},
"$locations.deals"
]},
[false]
]
}
}},
{ "$unwind": "$deals" },
{ "$group": {
"_id": null,
"deals": { "$addToSet": "$deals" }
}}
])
Which returns:
{
"_id" : null,
"deals" : [
{
"name" : "1",
"_id" : ObjectId("561637942d25a7644cae9940")
},
{
"name" : "2",
"_id" : ObjectId("562f868ce73962c626a16b15")
},
{
"name" : "3",
"_id" : ObjectId("562f86ebe73962c626a16b17")
},
{
"name" : "4",
"_id" : ObjectId("561637942d25a7644cae9940")
}
]
}
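The overall effect, collecting every deal from both places into one de-duplicated array, can be sketched in plain JavaScript. This is only an illustration of the intent: the ids are left out, whole-document identity is approximated with JSON.stringify, and unlike this loop $addToSet does not guarantee any order.

```javascript
// The two company documents from the question, ids omitted.
const companies = [
  { locations: [{ deals: [{ name: "1" }, { name: "2" }] }], deals: [{ name: "3" }] },
  { locations: [{ deals: [{ name: "4" }] }], deals: [] }
];

const seen = new Set();
const deals = [];
for (const company of companies) {
  // Deals nested under each location, followed by the top-level deals.
  const candidates = [...company.locations.flatMap(loc => loc.deals), ...company.deals];
  for (const deal of candidates) {
    const key = JSON.stringify(deal); // stand-in for whole-document equality
    if (!seen.has(key)) {
      seen.add(key);
      deals.push(deal);
    }
  }
}

console.log(deals.map(d => d.name)); // [ '1', '2', '3', '4' ]
```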

Mongodb output field with multiple $cond

Here's an example of documents I use :
{
"_id" : ObjectId("554a1f5fe36a768b362ea5c0"),
"store_state" : 1,
"services" : [
{
"id" : "XXX",
"state" : 1,
"active": true
},
{
"id" : "YYY",
"state" : 1,
"active": true
},
...
]
}
I want to output a new field with "Y" if the id is "XXX" and active is true, and "N" in all other cases. The service element with "XXX" as id is not present in every document (output "N" in that case).
Here's my query at the moment:
db.stores.aggregate({
$match : {"store_state":1}
},
{ $project : {
"XXX_active": {
$cond: [ {
$and:[
{$eq:["services.$id","XXX"]},
{$eq:["services.$active",true]}
]},"Y","N"
] }
}
}).pretty()
But it always outputs "N" for the "XXX_active" field.
The expected output I need is :
{
"_id" : ObjectId("554a1f5de36a768b362e7e6f"),
"XXX_active" : "Y"
},
{
"_id" : ObjectId("554a1f5ee36a768b362e9d25"),
"XXX_active" : "N"
},
{
"_id" : ObjectId("554a1f5de36a768b362e73a5"),
"XXX_active" : "Y"
}
Other example of possible result :
{
"_id" : ObjectId("554a1f5de36a768b362e7e6f"),
"XXX_active" : "Y",
"YYY_active" : "N"
},
{
"_id" : ObjectId("554a1f5ee36a768b362e9d25"),
"XXX_active" : "N",
"YYY_active" : "N"
},
{
"_id" : ObjectId("554a1f5de36a768b362e73a5"),
"XXX_active" : "Y",
"YYY_active" : "Y"
}
Only one XXX_active per object and no duplicate objects, but I need all objects to have an XXX_active even if the service element with id "XXX" is not present. Could someone help, please?
First $unwind the services array and then use $cond as below:
db.stores.aggregate({
"$match": {
"store_state": 1
}
}, {
"$unwind": "$services"
}, {
"$project": {
"XXX_active": {
"$cond": [{
"$and": [{
"$eq": ["$services.id", "XXX"]
}, {
"$eq": ["$services.active", true]
}]
}, "Y", "N"]
}
}
},{"$group":{"_id":"$_id","XXX_active":{"$max":"$XXX_active"}}}) // group by _id; "Y" sorts after "N", so $max yields "Y" if any unwound service matched
The following aggregation pipeline will give the desired result. You would first need to apply the $unwind operator on the services array field as an early aggregation pipeline step. This will deconstruct the services array field from the input documents to output a document for each element. Each output document replaces the array with an element value.
db.stores.aggregate([
{
"$match" : {"store_state": 1}
},
{
"$unwind": "$services"
},
{
"$project": {
"store_state" : 1,
"services": 1,
"XXX_active": {
"$cond": [
{
"$and": [
{"$eq":["$services.id", "XXX"]},
{"$eq":["$services.active",true]}
]
},"Y","N"
]
}
}
},
{
"$match": {
"services.id": "XXX"
}
},
{
"$group": {
"_id": {
"_id": "$_id",
"store_state": "$store_state",
"XXX_active": "$XXX_active"
},
"services": {
"$push": "$services"
}
}
},
{
"$project": {
"_id": "$_id._id",
"store_state" : "$_id.store_state",
"services": 1,
"XXX_active": "$_id.XXX_active"
}
}
])
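For comparison, the rule the question asks for ("Y" when some service has the given id and is active, "N" otherwise, including when the service is absent) can be written down in plain JavaScript. The store ids and the activeFlag helper below are hypothetical, and this sketch is a statement of the expected result rather than either of the pipelines above.

```javascript
// Hypothetical sample stores; only the shape follows the question.
const stores = [
  { _id: "a", store_state: 1, services: [{ id: "XXX", state: 1, active: true }, { id: "YYY", state: 1, active: false }] },
  { _id: "b", store_state: 1, services: [{ id: "YYY", state: 1, active: true }] },
  { _id: "c", store_state: 1, services: [{ id: "YYY", state: 1, active: true }, { id: "XXX", state: 1, active: true }] },
  { _id: "d", store_state: 0, services: [{ id: "XXX", state: 1, active: true }] }
];

// "Y" if any service has the given id and is active, otherwise "N".
function activeFlag(store, serviceId) {
  return store.services.some(s => s.id === serviceId && s.active === true) ? "Y" : "N";
}

const flags = stores
  .filter(s => s.store_state === 1) // like the $match stage
  .map(s => ({
    _id: s._id,
    XXX_active: activeFlag(s, "XXX"),
    YYY_active: activeFlag(s, "YYY")
  }));

console.log(JSON.stringify(flags));
// [{"_id":"a","XXX_active":"Y","YYY_active":"N"},{"_id":"b","XXX_active":"N","YYY_active":"Y"},{"_id":"c","XXX_active":"Y","YYY_active":"Y"}]
```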

Aggregating and comparing scores of teams using mongodb

I'm just starting out with mongodb and have been reading through the documentation on aggregation but still struggling to relate equivalent knowledge of sql statements to the methods used in mongo.
I have this data:
{
"_id" : ObjectId("53ac7bce4eaf6de4d5601c19"),
"uid" : ObjectId("53ac7bb84eaf6de4d5601c15"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 1
},{
"_id" : ObjectId("53ac7bce4eaf6de4d5601c1a"),
"uid" : ObjectId("53ac7bb84eaf6de4d5601c16"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 5
}
...
And I'm trying to get to this result:
{
"uid" : ObjectId("53ac7bb84eaf6de4d5601c15"),
"uid_2" : ObjectId("53ac7bb84eaf6de4d5601c16"),
"mid" : ObjectId("53ab27504eaf6de4d5601be4"),
"score" : 1,
"score_2" : 5,
"difference" : 4
}
...
Where I am comparing every uid against every other uid around a single mid and calculating the difference in their scores (can't be a negative difference, only positive).
Most of the examples I'm running into don't quite fit my requirements and hoping some mongo guru can help me out. Thanks!
As stated, I think your data modelling is a little off here as you need something to "pair" the "matches" as it were. I have a "simplified" case here:
{
"_id" : ObjectId("53ae9da2e24682cac4215e0c"),
"match" : ObjectId("53ae9d78e24682cac4215e0b"),
"score" : 1
}
{
"_id" : ObjectId("53ae9da5e24682cac4215e0d"),
"match" : ObjectId("53ae9d78e24682cac4215e0b"),
"score" : 5
}
{
"_id" : ObjectId("53aea6cde24682cac4215e15"),
"match" : ObjectId("53aea6c1e24682cac4215e14"),
"score" : 2
}
{
"_id" : ObjectId("53aea6e4e24682cac4215e16"),
"match" : ObjectId("53aea6c1e24682cac4215e14"),
"score" : 1
}
{
"_id" : ObjectId("53aea6eae24682cac4215e18"),
"match" : ObjectId("53aea6e6e24682cac4215e17"),
"score" : 2
}
{
"_id" : ObjectId("53aea6ece24682cac4215e19"),
"match" : ObjectId("53aea6e6e24682cac4215e17"),
"score" : 2
}
What that basically represents is the scores for "six" teams in "three" distinct matches.
Given that, my take on getting to results would be this:
db.matches.aggregate([
// Group on matches and find the "min" and "max" score
{ "$group": {
"_id": "$match",
"teams": {
"$push": {
"_id": "$_id",
"score": "$score"
}
},
"minScore": { "$min": "$score" },
"maxScore": { "$max": "$score" }
}},
// Unwind the "teams" array created
{ "$unwind": "$teams" },
// Compare scores for "win", "loss" or "draw"
{ "$group": {
"_id": "$_id",
"win": {
"$min": { "$cond": [
{ "$and": [
{ "$eq": [ "$teams.score", "$maxScore" ] },
{ "$gt": [ "$teams.score", "$minScore" ] }
]},
"$teams",
false
]}
},
"loss": {
"$min": { "$cond": [
{ "$and": [
{ "$eq": [ "$teams.score", "$minScore" ] },
{ "$lt": [ "$teams.score", "$maxScore" ] }
]},
"$teams",
false
]}
},
"draw": {
"$push": { "$cond": [
{ "$eq": [ "$minScore", "$maxScore" ] },
"$teams",
false
]}
},
"difference": {
"$max": { "$subtract": [ "$maxScore", "$minScore" ] }
}
}},
// Just fix up those "draw" results with a [false,false] array
{ "$project": {
"win": 1,
"loss": 1,
"draw": { "$cond": [
{ "$gt": [
{ "$size": { "$setDifference": [ "$draw", [false] ] } },
0
]},
"$draw",
false
]},
"difference": 1
}}
])
And this gives you a quite nice result:
{
"_id" : ObjectId("53ae9d78e24682cac4215e0b"),
"win" : {
"_id" : ObjectId("53ae9da5e24682cac4215e0d"),
"score" : 5
},
"loss" : {
"_id" : ObjectId("53ae9da2e24682cac4215e0c"),
"score" : 1
},
"draw" : false,
"difference" : 4
}
{
"_id" : ObjectId("53aea6c1e24682cac4215e14"),
"win" : {
"_id" : ObjectId("53aea6cde24682cac4215e15"),
"score" : 2
},
"loss" : {
"_id" : ObjectId("53aea6e4e24682cac4215e16"),
"score" : 1
},
"draw" : false,
"difference" : 1
}
{
"_id" : ObjectId("53aea6e6e24682cac4215e17"),
"win" : false,
"loss" : false,
"draw" : [
{
"_id" : ObjectId("53aea6eae24682cac4215e18"),
"score" : 2
},
{
"_id" : ObjectId("53aea6ece24682cac4215e19"),
"score" : 2
}
],
"difference" : 0
}
That is essentially the results per "match" and determines the "difference" between winner and loser while identifying which team "won" or "lost". The final stage there uses some operators only introduced in MongoDB 2.6, but that really is not necessary if you do not have that version available. Or you could actually still do the same thing if you wanted to by using $unwind and some other processing.
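The win, loss and draw rules used in the pipeline can also be read off a plain-JavaScript sketch. The team and match labels below are made-up stand-ins for the ObjectId values, and the code only mirrors the logic for two teams per match.

```javascript
// Hypothetical labels stand in for the ObjectId values of the sample data.
const scores = [
  { team: "t1", match: "m1", score: 1 },
  { team: "t2", match: "m1", score: 5 },
  { team: "t3", match: "m2", score: 2 },
  { team: "t4", match: "m2", score: 1 },
  { team: "t5", match: "m3", score: 2 },
  { team: "t6", match: "m3", score: 2 }
];

// First $group: collect the teams of each match.
const byMatch = new Map();
for (const s of scores) {
  if (!byMatch.has(s.match)) byMatch.set(s.match, []);
  byMatch.get(s.match).push({ team: s.team, score: s.score });
}

// Second $group and $project: compare each team against the min and max score.
const results = [...byMatch].map(([match, teams]) => {
  const values = teams.map(t => t.score);
  const maxScore = Math.max(...values);
  const minScore = Math.min(...values);
  const isDraw = minScore === maxScore;
  return {
    match,
    win: isDraw ? false : teams.find(t => t.score === maxScore),
    loss: isDraw ? false : teams.find(t => t.score === minScore),
    draw: isDraw ? teams : false,
    difference: maxScore - minScore
  };
});

console.log(results.map(r => r.difference)); // [ 4, 1, 0 ]
```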

mongodb multiple aggregations in single operation

I have an item collection with following documents.
{ "item" : "i1", "category" : "c1", "brand" : "b1" }
{ "item" : "i2", "category" : "c2", "brand" : "b1" }
{ "item" : "i3", "category" : "c1", "brand" : "b2" }
{ "item" : "i4", "category" : "c2", "brand" : "b1" }
{ "item" : "i5", "category" : "c1", "brand" : "b2" }
I want separate aggregation results: count by category and count by brand. Please note, it is not a count by (category, brand).
I am able to do this with map-reduce using the following code.
map = function(){
emit({type:"category",category:this.category},1);
emit({type:"brand",brand:this.brand},1);
}
reduce = function(key, values){
return Array.sum(values)
}
db.item.mapReduce(map,reduce,{out:{inline:1}})
And the result is
{
"results" : [
{
"_id" : {
"type" : "brand",
"brand" : "b1"
},
"value" : 3
},
{
"_id" : {
"type" : "brand",
"brand" : "b2"
},
"value" : 2
},
{
"_id" : {
"type" : "category",
"category" : "c1"
},
"value" : 3
},
{
"_id" : {
"type" : "category",
"category" : "c2"
},
"value" : 2
}
],
"timeMillis" : 21,
"counts" : {
"input" : 5,
"emit" : 10,
"reduce" : 4,
"output" : 4
},
"ok" : 1,
}
I can get same results by firing two different aggregation commands as below.
db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})
Is there any way I can do the same with a single aggregation command in the aggregation framework?
I have simplified my case here; in reality I need this grouping on fields in an array of subdocuments. Assume the above is the structure after I do the unwind.
It is a real-time query (someone waiting for response), though on smaller dataset, so execution time is important.
I am using MongoDB 2.4.
Starting in Mongo 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents:
// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate(
{ $facet: {
categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
brands: [{ $group: { _id: "$brand", count: { "$sum": 1 } } }]
}}
)
// {
// "categories" : [
// { "_id" : "c1", "count" : 3 },
// { "_id" : "c2", "count" : 2 }
// ],
// "brands" : [
// { "_id" : "b1", "count" : 3 },
// { "_id" : "b2", "count" : 2 }
// ]
// }
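Conceptually, each facet is an independent grouping over the same input documents. A plain-JavaScript sketch of that idea, counting both fields in one pass over the sample items (this illustrates the result, not how the server executes $facet):

```javascript
const items = [
  { item: "i1", category: "c1", brand: "b1" },
  { item: "i2", category: "c2", brand: "b1" },
  { item: "i3", category: "c1", brand: "b2" },
  { item: "i4", category: "c2", brand: "b1" },
  { item: "i5", category: "c1", brand: "b2" }
];

// One counter table per facet, both filled from the same input documents.
const facets = { categories: {}, brands: {} };
for (const it of items) {
  facets.categories[it.category] = (facets.categories[it.category] || 0) + 1;
  facets.brands[it.brand] = (facets.brands[it.brand] || 0) + 1;
}

console.log(facets);
// { categories: { c1: 3, c2: 2 }, brands: { b1: 3, b2: 2 } }
```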
Over a large data set I would say that your current mapReduce approach would be the best one, because the aggregation technique for this would not work well with large data. But possibly over a reasonably small size it might just be what you need:
db.item.aggregate([
{ "$group": {
"_id": null,
"categories": { "$push": "$category" },
"brands": { "$push": "$brand" }
}},
{ "$project": {
"_id": {
"categories": "$categories",
"brands": "$brands"
},
"categories": 1
}},
{ "$unwind": "$categories" },
{ "$group": {
"_id": {
"brands": "$_id.brands",
"category": "$categories"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.brands",
"categories": { "$push": {
"category": "$_id.category",
"count": "$count"
}},
}},
{ "$project": {
"_id": "$categories",
"brands": "$_id"
}},
{ "$unwind": "$brands" },
{ "$group": {
"_id": {
"categories": "$_id",
"brand": "$brands"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": null,
"categories": { "$first": "$_id.categories" },
"brands": { "$push": {
"brand": "$_id.brand",
"count": "$count"
}}
}}
])
This is not quite the same as the mapReduce output, and you could add more stages to change the output format, but it should be usable:
{
"_id" : null,
"categories" : [
{
"category" : "c2",
"count" : 2
},
{
"category" : "c1",
"count" : 3
}
],
"brands" : [
{
"brand" : "b2",
"count" : 2
},
{
"brand" : "b1",
"count" : 3
}
]
}
As you can see, this involves a fair bit of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline process. Again I will say, this will not do well for large data, but for something like "items in an order" it would probably do nicely.
Of course, as you say, you have simplified somewhat, so the first grouping key of null is either going to be something else, or the input will be narrowed down for that null case by an earlier $match stage, which is probably what you want to do.