MongoDB $aggregate: empty array and multiple documents

A MongoDB collection has the documents below:
> db.test.find({name:{$in:["abc","abc2"]}})
{ "_id" : 1, "name" : "abc", "scores" : [ ] }
{ "_id" : 2, "name" : "abc2", "scores" : [ 10, 20 ] }
I want to get the length of the scores array for each document. How should I do it?
I tried the command below:
db.test.aggregate({$match:{name:"abc2"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
Result:
{ "_id" : null, "count" : 2 }
But the command below:
db.test.aggregate({$match:{name:"abc"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
returns nothing. Questions:
How can I get the length of scores for each of 2 or more documents in one
command?
Why does the second command return nothing, and how
should I check whether the array is empty?

So this is actually a common problem. When the array is "empty", the $unwind stage of an aggregation pipeline "removes" the document from the pipeline results.
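To make that behaviour concrete, here is a minimal plain-JavaScript model of the default $unwind. It is only an illustration, not MongoDB's implementation: the `unwind` helper is a made-up name, and it skips non-array values as a simplification.

```javascript
// Plain-JavaScript model of the default $unwind behaviour (illustration only).
// Assumption: non-array values are simply skipped, which is a simplification.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (!Array.isArray(value)) continue;
    // One output document per array element; an empty array yields nothing.
    for (const item of value) out.push({ ...doc, [field]: item });
  }
  return out;
}

const docs = [
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] }
];
const unwound = unwind(docs, "scores");
console.log(unwound.length); // 2
```

The document with `_id: 1` contributes no rows at all, so a later $group stage never sees it and has nothing to count.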
In order to return a count of "0" for such an "empty" array, you need to do something like the following.
In MongoDB 2.6 or greater, just use $size:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$group": {
"_id": null,
"count": { "$sum": { "$size": "$scores" } }
}}
])
In earlier versions you need to do this:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$project": {
"name": 1,
"scores": {
"$cond": [
{ "$eq": [ "$scores", [] ] },
{ "$const": [false] },
"$scores"
]
}
}},
{ "$unwind": "$scores" },
{ "$group": {
"_id": null,
"count": { "$sum": {
"$cond": [
"$scores",
1,
0
]
}}
}}
])
The modern operation is simple since $size just "measures" the array. In the latter case you need to "replace" the array with a single false value when it is empty, so that $unwind does not "destroy" the document.
Replacing it with false allows the $cond "ternary" to choose whether to add 1 or 0 to the overall $sum.
That is how you get the length of "empty arrays".
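The question also asks for one length per document rather than a single total. A possible sketch, assuming MongoDB 2.6 or later for $size: project the size of each array instead of grouping. The helper names `buildScoreCountPipeline` and `countScoresLocally` are hypothetical, and the $ifNull guard is an added assumption to cover documents that have no scores field.

```javascript
// Hypothetical helper: builds a pipeline that returns one count per document.
function buildScoreCountPipeline(names) {
  return [
    { $match: { name: { $in: names } } },
    // $ifNull substitutes [] when "scores" is missing, so $size does not fail.
    { $project: { name: 1, count: { $size: { $ifNull: ["$scores", []] } } } }
  ];
}

// Plain-JavaScript mirror of the $project stage, for checking expectations.
function countScoresLocally(docs) {
  return docs.map(d => ({ _id: d._id, name: d.name, count: (d.scores || []).length }));
}

const pipeline = buildScoreCountPipeline(["abc", "abc2"]);
const localCounts = countScoresLocally([
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] }
]);
console.log(localCounts.map(d => d.count)); // [ 0, 2 ]
```

In the shell this would be run as `db.test.aggregate(buildScoreCountPipeline(["abc", "abc2"]))`.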

To get the length of scores for two or more documents, change the _id value in the $group stage, which holds the distinct group-by key. In this case, group by the document _id.
Your second aggregation returns nothing because the only document that passed the $match stage has an empty scores array, and $unwind drops such documents. To match only documents whose array is not empty, your match query should be
{'scores.0': {$exists: true}} or {scores: {$not: {$size: 0}}}
Overall, your aggregation should look like this (note that documents with an empty scores array are filtered out rather than reported with a count of 0):
db.test.aggregate([
{ "$match": {"scores.0": { "$exists": true } } },
{ "$unwind": "$scores" },
{
"$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}
}
])

Related

Get unique array elements in MongoDB by "master" element

I have mongodb rows with array element which looks like this:
{"data" : [1, 111]}
{"data" : [222, 1]}
{"data" : [1, 333]}
{"data" : [2, 444]}
How do I get unique array elements by a "master" element? For example, if the "master" element is 1, I should get the result [111, 222, 333] and not 444, because that array does not contain 1. If the master element were 2, the result should be [444].
I tried something like this aggregation. Is it correct? Are there any performance issues? What indexes should be on the collection to make it fast?
[
{$match: {"data": 1}},
{$project : {a : '$data'}},
{$unwind: '$a'},
{$group: {_id: 'a', items: {$addToSet: '$a'}}}
]
You can use the aggregation framework:
$match to keep only the documents that have the "master" element in the "data" array.
$group to collect the "data" arrays of all documents into one property called "result", with $filter removing the "master" element from each "data" array. ("result" will be an array whose elements are the filtered "data" arrays.)
$reduce with $concatArrays to concatenate all the arrays inside the "result" property.
db.collection.aggregate([
{
"$match": {
data: 1
}
},
{
"$group": {
"_id": null,
result: {
$addToSet: {
"$filter": {
"input": "$data",
"cond": {
"$ne": [
"$$this",
1
]
}
}
}
}
}
},
{
"$project": {
result: {
$reduce: {
input: "$result",
initialValue: [],
in: {
$concatArrays: [
"$$value",
"$$this"
]
}
}
}
}
}
])
Be aware that the "master" element has to be dynamically populated in first stage for $match pipeline, as well as in the second stage when performing filtering with $filter operator.
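One way to keep the two occurrences of the "master" element in sync is to build the pipeline from a single parameter. The sketch below is an assumption-laden variation, not the answer's exact pipeline: `buildMasterPipeline` and `uniqueByMaster` are hypothetical names, and $setUnion is swapped in for $concatArrays so that a value appearing in two different filtered arrays is not repeated.

```javascript
// Hypothetical helper: the master value is injected into both $match and $filter.
function buildMasterPipeline(master) {
  return [
    { $match: { data: master } },
    { $group: {
        _id: null,
        result: { $addToSet: { $filter: { input: "$data", cond: { $ne: ["$$this", master] } } } }
    } },
    { $project: {
        _id: 0,
        // $setUnion (instead of $concatArrays) also removes duplicates across arrays.
        result: { $reduce: { input: "$result", initialValue: [], in: { $setUnion: ["$$value", "$$this"] } } }
    } }
  ];
}

// Plain-JavaScript mirror of the intended result, for checking expectations.
function uniqueByMaster(docs, master) {
  const seen = new Set();
  for (const { data } of docs) {
    if (!data.includes(master)) continue;
    for (const x of data) if (x !== master) seen.add(x);
  }
  return [...seen].sort((a, b) => a - b);
}

const rows = [{ data: [1, 111] }, { data: [222, 1] }, { data: [1, 333] }, { data: [2, 444] }];
console.log(uniqueByMaster(rows, 1)); // [ 111, 222, 333 ]
```

The order of elements produced by $setUnion is not specified, so sort on the client if order matters.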
Here is the working example: https://mongoplayground.net/p/EtYwOqAE-PE
I think this works too.
Test code here
It keeps only the arrays that contain the master key,
unwinds them,
and groups with {"_id": 1}, which behaves like grouping by null (every document lands in one group); the constant is only there so the master key appears as _id. Inside the group, the $$REMOVE system variable is used so the master key itself is not added.
Query (wherever you see 1, put your master key or a variable):
db.collection.aggregate([
{
"$match": {
"data": 1
}
},
{
"$unwind": {
"path": "$data"
}
},
{
"$group": {
"_id": 1,
"members": {
"$addToSet": {
"$cond": [
{
"$ne": [
"$data",
1
]
},
"$data",
"$$REMOVE"
]
}
}
}
}
])

MongoDB: Null check in between Pipeline Stages

If I create a collection like so:
db.People.insert({"Name": "John"})
and run a simple mongo aggregate, like so:
db.People.aggregate([{$match: {Name: "John"}}, {$group: {_id: "null", count: {$sum: 1}}}])
This counts all the Johns in the collection and returns this
{ "_id" : "null", "count" : 1 }
Which is nice. But if I search for the name "Clarice" that does not exist at all, it returns null.
I would like it to return
{ "_id" : "null", "count" : 0 }
I have not found a way to achieve this. I would have to include some kind of null check between the $match and $group stages.
You have to use the $facet aggregation stage along with the $ifNull operator, e.g.:
db.People.aggregate([
{ "$facet": {
"array": [
{ "$match": { Name:"John" }},
{ "$group": {
"_id": null,
"count": { "$sum": 1 }
}},
{ "$project": { "_id": 0, "count": 1 }}
]
}},
{ "$project": {
"count": {
"$ifNull": [{ "$arrayElemAt": ["$array.count", 0] }, 0 ]
}
}}
])
Output:
{ "count" : 1 }
For any other name, it should be as follows:
{ "count" : 0 }
A similar answer is at "$addFields when no $match found".
Simply use count:
db.People.count({Name:"John"})
This will return the exact number.
Otherwise you need to check whether the result is an empty array. Below is the code for Node using LoopBack:
db.People.aggregate([
{$match: {Name: "John"}},
{$group: {_id: "null", count: {$sum: 1}}}
],(err,res)=>{
if(err) return cb(err)
if(res.length) return cb(err,res)
else return cb(err,{_id:null,count:0})
})
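The empty-result check in that callback can be pulled out into a small pure function, which makes it easy to test without a database. This is only a sketch; `normalizeCount` is a hypothetical name, and it assumes the aggregation result has already been read into an array.

```javascript
// Hypothetical helper: returns the single $group row, or a zero-count default
// when the aggregation produced no rows (no document matched).
function normalizeCount(rows) {
  return rows.length > 0 ? rows[0] : { _id: null, count: 0 };
}

console.log(normalizeCount([{ _id: "null", count: 1 }]).count); // 1
console.log(normalizeCount([]).count); // 0
```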
You can use $ifNull in your $match stage.
If you can provide a collection of examples, it will be easier to elaborate an answer on it.
Edit: if you group by Name, the result for "John" is one and for "Clarice" is an empty array, which is correct. Here is the aggregation query:
db.People.aggregate([
{
$match: { Name: "John" }
},
{
$group: { _id: "$Name", count: { $sum: 1 } }
}
])

Group sums of an attribute by the values of another array attribute

I have a collection "tagsCount" that looks like that:
{
"_id" : ObjectId("59e3a46a48507851d411ad78"),
"tags" : [ "Marketing" ],
"cpt" : 14354
},
{
"_id" : ObjectId("59e3a46a48507851d411ad79"),
"tags" : [
"chatbot",
"Content marketing",
"Intelligence artificielle",
"Marketing digital",
"Personnalisation"
],
"cpt" : 9037
}
Of course there are many more lines.
I want to get the sum of "cpt" grouped by the values of "tags".
I have come up with that:
db.tagsCount.aggregate([
{ "$project": { "tags":1 }},
{ "$unwind": "$tags"},
{ "$group": {
"_id" : "$tags",
cpt : "$cpt" ,
"count": { "$sum": "$cpt" }
}}
])
But that doesn't do the trick: I get the list of all the different tags, and the count has a value of 0.
Is it possible to do what I want?
The problem is that your aggregation pipeline starts with $project which selects only tags to the next stages and that's why you're executing $group on documents without cpt. Here's my working example:
db.tagsCount.aggregate([
{ "$unwind": "$tags"},
{ "$group": {
"_id": "$tags",
"count": { "$sum": "$cpt" }
}},
{ "$project": { "tag": "$_id", "_id": 0, "count": 1 }}
])
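To see why the leading $project produced zeros, here is a plain-JavaScript mirror of the $unwind and $group stages. It is an illustration only: `sumCptByTag` is a hypothetical name, and it mimics the way $sum ignores missing or non-numeric values.

```javascript
// Plain-JavaScript mirror of { $unwind: "$tags" } followed by
// { $group: { _id: "$tags", count: { $sum: "$cpt" } } }.
function sumCptByTag(docs) {
  const totals = new Map();
  for (const doc of docs) {
    // Like $sum, treat a missing or non-numeric cpt as contributing nothing.
    const cpt = typeof doc.cpt === "number" ? doc.cpt : 0;
    for (const tag of doc.tags) {
      totals.set(tag, (totals.get(tag) || 0) + cpt);
    }
  }
  return totals;
}

const tagDocs = [
  { tags: ["Marketing"], cpt: 14354 },
  { tags: ["chatbot", "Content marketing"], cpt: 9037 }
];
// Simulates { $project: { tags: 1 } }: cpt is no longer present downstream.
const projected = tagDocs.map(d => ({ tags: d.tags }));

console.log(sumCptByTag(tagDocs).get("chatbot")); // 9037
console.log(sumCptByTag(projected).get("chatbot")); // 0
```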

Mongodb aggregation, finding within an array of values

I have a schema that creates documents using the following structure:
{
"_id" : "2014-07-16:52TEST",
"date" : ISODate("2014-07-16T23:52:59.811Z"),
"name" : "TEST",
"values" : [
[
1405471921000,
0.737121
],
[
1405471922000,
0.737142
],
[
1405471923000,
0.737142
],
[
1405471924000,
0.737142
]
]
}
In the values, the first index is a timestamp. What I'm trying to do is query a specific timestamp to find the closest value ($gte).
I've tried the following aggregate query:
[
{ "$match": {
"values": {
"$elemMatch": { "0": {"$gte": 1405471923000} }
},
"name" : 'TEST'
}},
{ "$project" : {
"name" : 1,
"values" : 1
}},
{ "$unwind": "$values" },
{ "$match": { "values.0": { "$gte": 1405471923000 } } },
{ "$limit" : 1 },
{ "$sort": { "values.0": -1 } },
{ "$group": {
"_id": "$name",
"values": { "$push": "$values" },
}}
]
This seems to work, but it doesn't pull the closest value. It seems to pull anything greater or equal to and the sort doesn't seem to get applied, so it will pull a timestamp that is far in the future.
Any suggestions would be great!
Thank you
There are a couple of things wrong with the approach here, even though it is a fair effort. You are right that you need to $sort here, but the problem is that you cannot "sort" on an inner element within an array. In order to get a value that can be sorted you must $unwind the array first, as it otherwise will not sort on an array position.
You also certainly do not want $limit in the pipeline. You might be testing this against a single document, but "limit" will actually act on the entire set of documents in the pipeline. So if more than one document was matching your condition then they would be thrown away.
The key thing you want to do here is use $first in your $group stage, which is applied once you have sorted to get the "closest" element that you want.
db.collection.aggregate([
// Documents that have an array element matching the condition
{ "$match": {
"values": { "$elemMatch": { "0": {"$gte": 1405471923000 } } }
}},
// Unwind the top level array
{ "$unwind": "$values" },
// Filter just the elements that match the condition
{ "$match": { "values.0": { "$gte": 1405471923000 } } },
// Take a copy of the inner array
{ "$project": {
"date": 1,
"name": 1,
"values": 1,
"valCopy": "$values"
}},
// Unwind the inner array copy
{ "$unwind": "$valCopy" },
// Filter the inner elements
{ "$match": { "valCopy": { "$gte": 1405471923000 } }},
// Sort on the now "timestamp" values ascending for nearest
{ "$sort": { "valCopy": 1 } },
// Take the "first" values
{ "$group": {
"_id": "$_id",
"date": { "$first": "$date" },
"name": { "$first": "$name" },
"values": { "$first": "$values" },
}},
// Optionally push back to array to match the original structure
{ "$group": {
"_id": "$_id",
"date": { "$first": "$date" },
"name": { "$first": "$name" },
"values": { "$push": "$values" },
}}
])
And this produces your document with just the "nearest" timestamp value matching the original document form:
{
"_id" : "2014-07-16:52TEST",
"date" : ISODate("2014-07-16T23:52:59.811Z"),
"name" : "TEST",
"values" : [
[
1405471923000,
0.737142
]
]
}
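The selection rule the pipeline implements (the smallest timestamp that is greater than or equal to the target) can be written down in a few lines of plain JavaScript. This is a client-side sketch for checking expectations, not a replacement for the aggregation; `nearestAtOrAfter` is a hypothetical name.

```javascript
// Returns the [timestamp, value] pair with the smallest timestamp >= target,
// or null when every timestamp is earlier than the target.
function nearestAtOrAfter(values, target) {
  let best = null;
  for (const pair of values) {
    if (pair[0] >= target && (best === null || pair[0] < best[0])) best = pair;
  }
  return best;
}

const values = [
  [1405471921000, 0.737121],
  [1405471922000, 0.737142],
  [1405471923000, 0.737142],
  [1405471924000, 0.737142]
];
console.log(nearestAtOrAfter(values, 1405471923000)); // [ 1405471923000, 0.737142 ]
```

Unlike a $limit placed before the $sort, this picks the nearest element regardless of the order in which the pairs are stored.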

Mongodb $cond in aggregation framework

I have a collection with documents that look like the following:
{
ipAddr: '1.2.3.4',
"results" : [
{
"Test" : "Sight",
"Score" : "FAIL",
"Reason" : "S1002"
},
{
"Test" : "Speed",
"Score" : "FAIL",
"Reason" : "85"
},
{
"Test" : "Sound",
"Score" : "FAIL",
"Reason" : "A1001"
}
],
"finalGrade" : "FAILED"
}
Here's the aggregation query I'm trying to write. What I want to do (see the commented-out piece) is create a grouped field, per ipAddr, of the
'Reason / Error' code, but only if the Reason code begins with a specific letter, and add each code only once. I tried the following:
db.aggregate([
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push: "$finalGrade"},
// errorCodes: {$addToSet: {$cond: ["$results.Reason": /[A|B|S|N.*/, "$results.Reason", ""]}},
finalResult: {$last: "$finalGrade"} }
}
]);
Everything works, excluding the commented out 'errorCodes' line. The logic I'm attempting to create is:
"Add to the errorCodes set the value of the results.Reason code IF it begins with an A, B, S, or N; otherwise there is nothing to add".
For the Record above, the errorCodes set should contain:
...
errorCodes: [S1002,A1001],
...
$group cannot take conditional expressions, which is why that line is not working. $project is the phase where you can transform the original document based on $cond expressions (among other things).
You need two steps in the aggregation pipeline before you can $group - first you need to $unwind the results array, and next you need to $match to filter out the results you don't care about.
That would do the simple thing of just throwing out the results with error codes you don't care about keeping, but it sounds like you want to count the total number of failures including all error codes, but then only add particular ones to the output array? There isn't a straight-forward way to do that, you would have to make two $group $unwind passes in the pipeline.
Something similar to this will do it:
db.aggregate([
{$unwind : "$results"},
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push : "$results"},
finalGrade: {$last : "$finalGrade" }
}
},
{$unwind: "$results"},
{$match: {"results.Reason":/yourMatchExpression/} },
{$group:
{ _id: "$ipAddr",
attempts: {$last:"$attempts"},
errorCodes: {$addToSet: "$results.Reason"},
finalResult: {$last: "$finalGrade"}
}
}
]);
If you only want to count attempts that have the matching error code then you can do that with a single $group - you will need to do $unwind, $match and $group. You could use $project with $cond as you had it, but then your array of errorCodes will have an empty string entry along with all the proper error codes.
As of MongoDB 2.4, $regex can be used for pattern matching, but not as an expression returning a boolean, which is what $cond requires.
So you can either use a $match stage with the $regex operator:
http://mongotry.herokuapp.com/#?bookmarkId=52fb39e207fc4c02006fcfed
[
{
"$unwind": "$results"
},
{
"$match": {
"results.Reason": {
"$regex": "^[SA]"
}
}
},
{
"$group": {
"_id": "$ipAddr",
"attempts": {
"$sum": 1
},
"results": {
"$push": "$finalGrade"
},
"finalResult": {
"$last": "$finalGrade"
},
"errorCodes": {
"$addToSet": "$results.Reason"
}
}
}
]
or you can use $substr as your pattern matching is very simple
http://mongotry.herokuapp.com/index.html#?bookmarkId=52fb47bc7f295802001baa38
[
{
"$unwind": "$results"
},
{
"$group": {
"_id": "$ipAddr",
"errorCodes": {
"$addToSet": {
"$cond": [
{
"$or": [
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"A"
]
},
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"S"
]
}
]
},
"$results.Reason",
"null"
]
}
}
}
}
]
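On newer servers the repeated $substr comparisons may not be necessary. The sketch below assumes MongoDB 4.2 or later, where the $regexMatch expression returns a boolean that $cond can use, and it assumes $$REMOVE (used in an earlier answer on this page) keeps non-matching codes out of the set instead of adding a placeholder string. `buildErrorCodePipeline` is a hypothetical helper name.

```javascript
// Hypothetical helper: collects, per ipAddr, the Reason codes whose first
// character is one of the given letters.
function buildErrorCodePipeline(letters) {
  const pattern = "^[" + letters + "]"; // anchored: "begins with"
  return [
    { $unwind: "$results" },
    { $group: {
        _id: "$ipAddr",
        errorCodes: { $addToSet: { $cond: [
          // Assumes MongoDB 4.2+ for $regexMatch.
          { $regexMatch: { input: "$results.Reason", regex: pattern } },
          "$results.Reason",
          "$$REMOVE"
        ] } }
    } }
  ];
}

const errorPipeline = buildErrorCodePipeline("ABSN");
// The same pattern checked locally against the sample Reason codes.
const localMatches = ["S1002", "85", "A1001"].filter(r => new RegExp("^[ABSN]").test(r));
console.log(localMatches); // [ 'S1002', 'A1001' ]
```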