mongo - count returns "no document found" instead of 0 - mongodb

In SQL, the query
select count(*) from table where id=1
would return 0 when there is no record with that id.
I would like to get exactly the same behavior in MongoDB. Unfortunately, I can only use the aggregate function.
I tried something like this:
db.collection.aggregate([
{
"$match": {
"key": 1
}
},
{
$count: "s"
}
])
It works when there are records with key: 1, but when no document has that key the result is "no document found" instead of a count of 0.

You can use $facet to run two sub-pipelines over the same input: one for the case where a matching document exists and one for the case where it does not.
The notFound sub-pipeline always produces {count: 0}; the found sub-pipeline does the $match and the $count.
Then $replaceRoot with $mergeObjects merges the two results, so the found count (when present) overrides the 0.
db.collection.aggregate([
{
"$facet": {
"notFound": [
{
"$project": {
"_id": 0,
"count": {
"$literal": 0
}
}
},
{
"$limit": 1
}
],
"found": [
{
"$match": {
"key": 1
}
},
{
"$count": "count"
}
]
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
"$arrayElemAt": [
"$notFound",
0
]
},
{
"$arrayElemAt": [
"$found",
0
]
}
]
}
}
}
])
I've also tested a variant that uses $ifNull instead of $mergeObjects, and it seems to work too.

I think the right way to do it is in the driver code: if you get empty results, build the document {"count" : 0} yourself. I don't think you need to do anything in the database.
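As a rough sketch of that driver-side fallback, assuming the driver hands back the aggregation result as a plain array (the helper name countOrZero is made up for illustration and is not part of any driver API):

```javascript
// Hypothetical helper: `rows` stands for the array a driver returns for a
// pipeline ending in { $count: "count" }.
function countOrZero(rows) {
  // $count emits no document at all when nothing matched, so `rows` is empty.
  return rows.length > 0 ? rows[0] : { count: 0 };
}

console.log(countOrZero([]));             // { count: 0 }
console.log(countOrZero([{ count: 3 }])); // { count: 3 }
```

This keeps the pipeline as simple as the one in the question and moves the special case to the caller.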
Another solution is the pipeline below (replace the 5 with the key value you want).
It creates two groups, the matched (count > 0) and the not matched (count = 0),
sorts by {"count" : -1},
and takes the first: if there was a match, count is the number of matched documents;
otherwise it is 0.
db.collection.aggregate(
[ {
"$group" : {
"_id" : {
"$cond" : [ {"$eq" : [ "$key", 5 ]}, "$key", "not_match" ]
},
"count" : {
"$sum" : {"$cond" : [ {"$eq" : [ "$key", 5 ]}, 1, 0 ]}
}
}
},
{"$sort" : {"count" : -1}},
{
"$group" : {
"_id" : null,
"count" : {"$first" : "$count"}
}
},
{"$project" : {"_id" : 0}}
])

I did it with $facet and $project. When there were no documents to project, the field came back undefined, so I wrapped it in an $ifNull expression with 0 as the replacement value (see the $ifNull docs).
db.collection.aggregate([
{
"$facet": {
"keyFound": [
{
"$match": {
"key": 1
}
},
{
"$count": "count"
}
]
}
},
{
"$project": {
"keyFoundCount": {
"$ifNull": [
{
"$arrayElemAt": [
"$keyFound.count",
0
]
},
0
]
}
}
}
])

Related

MongoDB pipeline conditional counting

Documents in a collection contain title and active fields. The active field is boolean. My goal is to group by title and count all the records. Lastly, I want to count the documents where active is true.
This query does the counting, but total and active are always equal. Why isn't the conditional counting only the documents where active is true?
Here is my pipeline:
[
{
"$group" : {
"_id" : {
"student᎐campus᎐title" : "$student.campus.title"
},
"total" : {
"$sum" : NumberInt(1)
},
"active" : {
"$sum" : {
"$cond" : [
{
"active" : true
},
1.0,
0.0
]
}
}
}
}
]
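Your code doesn't work because { "active" : true } inside $cond is an expression object (a document literal), not an operator expression. It never reads the field; it evaluates to a non-null document, which is always truthy, so every document adds 1.
Try the working version below: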
db.collection.aggregate([
{
"$group": {
"_id": "$title",
"total": {
"$sum": 1
},
"active": {
"$sum": {
"$cond": [
"$active",
1.0,
0.0
]
}
}
}
}
])
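To make the intended result concrete, here is a plain-JavaScript emulation of what the corrected $group stage computes. It is only a sketch for reasoning about the logic, not MongoDB code; the function name countByTitle and the sample data are invented for illustration.

```javascript
// Emulates: { $group: { _id: "$title", total: { $sum: 1 },
//                       active: { $sum: { $cond: ["$active", 1, 0] } } } }
function countByTitle(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.title) || { _id: doc.title, total: 0, active: 0 };
    g.total += 1;                   // every document counts toward the total
    g.active += doc.active ? 1 : 0; // only documents with active === true count here
    groups.set(doc.title, g);
  }
  return [...groups.values()];
}

// Hypothetical sample data.
const sampleDocs = [
  { title: "Math", active: true },
  { title: "Math", active: false },
  { title: "Art", active: false },
];
console.log(countByTitle(sampleDocs));
// [ { _id: 'Math', total: 2, active: 1 }, { _id: 'Art', total: 1, active: 0 } ]
```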
EDIT: thanks to @wernfriedDomscheit's advice, here is a more concise version using $toInt for MongoDB v4.0+
db.collection.aggregate([
{
"$group": {
"_id": "$title",
"total": {
"$sum": 1
},
"active": {
"$sum": {
"$toInt": "$active"
}
}
}
}
])

Check duplicates of certain field for documents array with inner array

I have 2 objects,
{
_id: ObjectId("5cd9010310b80b3e38cd3f88"),
subGroup: [
{
bookList: [
{
title: "A good book",
id: "abc123"
}
]
}
]
}
{
_id: ObjectId("5cd9010710b80b3e38cd3f89"),
subGroup: [
{
bookList: [
{
title: "A good book",
id: "abc123"
}
]
}
]
}
These are 2 different objects. I would like to detect cases where the title is duplicated (i.e. the same) across such objects.
I tried this query
db.scope.aggregate({"$unwind": "$subGroup.bookList"}, {"$group" : { "_id": "$title", "count": { "$sum": 1 } } }, {"$match": {"id" :{ "$ne" : null } , "count" : {"$gt": 1} } })
which I based on other threads on Stack Overflow. However, it does not return anything. How can I solve this?
There are a few issues here:
$unwind should be run on subGroup and on subGroup.bookList separately.
When specifying _id for the $group stage, you should use the full path (subGroup.bookList.title).
In your $match stage, you want to check that _id (not id) is $ne null.
Try:
db.col.aggregate([
{"$unwind": "$subGroup"},
{"$unwind": "$subGroup.bookList"},
{"$group" : { "_id": "$subGroup.bookList.title", "count": { "$sum": 1 } } },
{"$match": { "_id" :{ "$ne" : null } , "count" : { "$gt": 1} } }
])
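For readers who want to check the logic outside the database, the same steps can be sketched in plain JavaScript. This is an emulation under the assumption that subGroup is an array of objects that each hold a bookList array; the function name duplicateTitles is invented for illustration.

```javascript
// Emulates: $unwind subGroup, $unwind subGroup.bookList,
// $group by title with a count, then $match count > 1 and title not null.
function duplicateTitles(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const group of doc.subGroup || []) {    // first $unwind
      for (const book of group.bookList || []) { // second $unwind
        if (book.title == null) continue;        // mirrors _id: { $ne: null }
        counts.set(book.title, (counts.get(book.title) || 0) + 1);
      }
    }
  }
  return [...counts]
    .filter(([, count]) => count > 1)            // mirrors count: { $gt: 1 }
    .map(([title, count]) => ({ _id: title, count }));
}

const scopeDocs = [
  { subGroup: [{ bookList: [{ title: "A good book", id: "abc123" }] }] },
  { subGroup: [{ bookList: [{ title: "A good book", id: "abc123" }] }] },
];
console.log(duplicateTitles(scopeDocs)); // [ { _id: 'A good book', count: 2 } ]
```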

How to find percentage of grouping containing a specific word

I am trying to calculate the percentage of listings in a MongoDB that contain a specific word grouped by a collection's object.
I have managed to group the count of listings containing the word but not the percentage on the total count of each group's listings.
My collection looks like this:
{
"_id" : "103456",
"metadata" : {
"type" : "Bike",
"brand" : "Siamoto",
"model" : "Siamoto vespa '01 - € 550 EUR (Negotiable)"
}
},
{
"_id" : "103457",
"metadata" : {
"type" : "Bike",
"brand" : "BMW",
"model" : "BMW ADFR '06 - € 5680 EUR"
}
}
I want to project the percentage of ads per metadata.brand that contain the word "Negotiable" in metadata.model.
I have used for the count something like:
db.advertisements.aggregate([
{ $match: { $text: { $search: "Negotiable" } } },
{ $group: { _id: "$metadata.brand", Count: { $sum: 1} } }
])
and it worked but I can't find a workaround for the percentage. Thanks to all
For what you are trying to do, using a $text search or even a $regex is the wrong approach. All these can do is return the "matching" documents only from within the collection.
Using Aggregate to Count String Matches
Whilst not as flexible as a regular expression (sadly there is no aggregation operator equivalent at this time, though one is planned for future releases; see SERVER-11947), the better option is to use $indexOfCP to test for the occurrence of the string and then count those matches against the total count for each grouping:
db.advertisements.aggregate([
{ "$group": {
"_id": "$metadata.brand",
"totalCount": { "$sum": 1 },
"matchedCount": {
"$sum": {
"$cond": [{ "$ne": [{ "$indexOfCP": [ "$metadata.model", "Negotiable" ] }, -1 ] }, 1, 0]
}
}
}},
{ "$addFields": {
"percentage": {
"$cond": {
"if": { "$ne": [ "$matchedCount", 0 ] },
"then": {
"$multiply": [
{ "$divide": [ "$matchedCount", "$totalCount" ] },
100
]
},
"else": 0
}
}
}},
{ "$sort": { "percentage": -1 } }
])
And the results:
{ "_id" : "Siamoto", "totalCount" : 1, "matchedCount" : 1, "percentage" : 100 }
{ "_id" : "BMW", "totalCount" : 1, "matchedCount" : 0, "percentage" : 0 }
Note that the $group is used to accumulate both the total documents found within the "brand" and those where the string was matched. The $cond operator used here is a "ternary" or if/then/else expression which evaluates a boolean expression and then returns one value where true or another where false. In this case the condition is that $indexOfCP does not return -1, the value that means "not found".
The "percentage" is actually done in a separate stage, which in this case we use $addFields to add the "additional field". The operation is basically a $divide over the two accumulated values from the previous stage. The $cond is just applied to avoid "divide by 0" errors and the $multiply is just moving the decimal places into something that looks more like a "percentage". But the basic premise is such calculations which require "totals" to be accumulated first will always be a manipulation in a "later stage".
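The two-stage idea (accumulate totals first, derive the percentage afterwards) can be sketched in plain JavaScript as well. This is an emulation for illustration only, with String.prototype.indexOf standing in for $indexOfCP and an invented function name, percentageByBrand.

```javascript
// Stage 1 emulates $group (totalCount, matchedCount per brand).
// Stage 2 emulates $addFields (percentage) followed by $sort descending.
function percentageByBrand(ads, word) {
  const groups = new Map();
  for (const ad of ads) {
    const brand = ad.metadata.brand;
    const g = groups.get(brand) || { _id: brand, totalCount: 0, matchedCount: 0 };
    g.totalCount += 1;
    if (ad.metadata.model.indexOf(word) !== -1) g.matchedCount += 1; // -1 means "not found"
    groups.set(brand, g);
  }
  return [...groups.values()]
    .map((g) => ({
      ...g,
      percentage: g.matchedCount !== 0 ? (g.matchedCount / g.totalCount) * 100 : 0,
    }))
    .sort((a, b) => b.percentage - a.percentage);
}

const ads = [
  { _id: "103456", metadata: { type: "Bike", brand: "Siamoto", model: "Siamoto vespa '01 - € 550 EUR (Negotiable)" } },
  { _id: "103457", metadata: { type: "Bike", brand: "BMW", model: "BMW ADFR '06 - € 5680 EUR" } },
];
console.log(percentageByBrand(ads, "Negotiable"));
```

With the two sample documents from the question this yields 100 for Siamoto and 0 for BMW, matching the aggregation output shown above.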
MongoDB 4.2 (proposed) Preview
FYI, with the current "unfinalized" syntax for $regexFind (proposed for MongoDB 4.2 and yet to be finalized, if it is included in that release at all), this would be something like:
db.advertisements.aggregate([
{ "$group": {
"_id": "$metadata.brand",
"totalCount": { "$sum": 1 },
"matchedCount": {
"$sum": {
"$cond": {
"if": {
"$ne": [
{ "$regexFind": {
"input": "$metadata.model",
"regex": /Negotiable/i
}},
null
]
},
"then": 1,
"else": 0
}
}
}
}},
{ "$addFields": {
"percentage": {
"$cond": {
"if": { "$ne": [ "$matchedCount", 0 ] },
"then": {
"$multiply": [
{ "$divide": [ "$matchedCount", "$totalCount" ] },
100
]
},
"else": 0
}
}
}},
{ "$sort": { "percentage": -1 } }
])
Again noting strongly that the "current" implementation may be subject to change by the time it is released. This is how it works on the current 4.1.9-17-g0a856820ba development release.
Using MapReduce
An alternate approach where either your MongoDB version does not support $indexOfCP OR you need more flexibility in how you "match the string" is to use mapReduce for the aggregation instead:
db.advertisements.mapReduce(
function() {
emit(this.metadata.brand, {
totalCount: 1,
matchedCount: (/Negotiable/i.test(this.metadata.model)) ? 1 : 0
});
},
function(key,values) {
var obj = { totalCount: 0, matchedCount: 0 };
values.forEach(value => {
obj.totalCount += value.totalCount;
obj.matchedCount += value.matchedCount;
});
return obj;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
value.percentage = (value.matchedCount != 0)
? (value.matchedCount / value.totalCount) * 100
: 0;
return value;
}
}
)
This has a similar result, but in a very "mapReduce" specific way:
{
"_id" : "BMW",
"value" : {
"totalCount" : 1,
"matchedCount" : 0,
"percentage" : 0
}
},
{
"_id" : "Siamoto",
"value" : {
"totalCount" : 1,
"matchedCount" : 1,
"percentage" : 100
}
}
The logic is pretty much the same. We "emit" using the "brand" as the key and then use another ternary to determine whether to count a "match" or not. In this case the condition is a regular expression test() operation, here with "case insensitive" matching as an example.
The "reducer" part simply accumulates the values that were emitted, and the finalize function is where the "percentage" is returned by the same division and multiplication process.
The main difference between the two, other than basic capabilities, is that mapReduce cannot do "further things" beyond the accumulation and the basic manipulation in finalize. The "sorting" demonstrated in the aggregation pipeline cannot be done with mapReduce without outputting to a separate collection and running a separate find() and sort() on the documents it contains.
Either way works, and it just depends on your needs and the capabilities of what you have available. Of course any aggregate() approach will be much faster than using the JavaScript evaluation of mapReduce. So you probably want aggregate() as your preference where possible.

MongoDB aggregate count based on multiple query fields - (Multiple field count)

My collection looks like this:
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "xxx",
"salary" : 10000,
"type" : "type1"
}
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "aaa",
"salary" : 10000,
"type" : "type2"
}
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "ccc",
"salary" : 10000,
"type" : "type2"
}
My query params will come in as,
{salary=10000, type=type2}
so based on the query I need to fetch the count for each of the above query params.
The result should be something like this,
{ category: 'type1', count: 500 } { category: 'type2', count: 200 } { category: 'name', count: 100 }
Right now I get the counts either by running three different queries and constructing the result, or by iterating on the server side.
Can anyone suggest a good way to get the above result?
Your question is not very clearly presented, but it seems that what you want to do here is count the occurrences of the data in the fields, optionally filtering those fields by the values that match the criteria.
Here the $cond operator allows you to transform a logical condition into a value:
db.collection.aggregate([
{ "$group": {
"_id": null,
"name": { "$sum": 1 },
"salary": {
"$sum": {
"$cond": [
{ "$gte": [ "$salary", 1000 ] },
1,
0
]
}
},
"type": {
"$sum": {
"$cond": [
{ "$eq": [ "$type", "type2" ] },
1,
0
]
}
}
}}
])
All values are in the same document, and it does not really make any sense to split them up here as this is additional work in the pipeline.
{ "_id" : null, "name" : 3, "salary" : 3, "type" : 2 }
Otherwise there is the long form, which is not very performant because it needs to make a copy of each document for every key. It looks like this:
db.collection.aggregate([
{ "$project": {
"name": 1,
"salary": 1,
"type": 1,
"category": { "$literal": ["name","salary","type"] }
}},
{ "$unwind": "$category" },
{ "$group": {
"_id": "$category",
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$category", "name"] },
{ "$ifNull": [ "$name", false ] }
]},
1,
{ "$cond": [
{ "$and": [
{ "$eq": [ "$category", "salary" ] },
{ "$gte": [ "$salary", 1000 ] }
]},
1,
{ "$cond": [
{ "$and": [
{ "$eq": [ "$category", "type" ] },
{ "$eq": [ "$type", "type2" ] }
]},
1,
0
]}
]}
]
}
}
}}
])
And its output:
{ "_id" : "type", "count" : 2 }
{ "_id" : "salary", "count" : 3 }
{ "_id" : "name", "count" : 3 }
If your documents do not have uniform key names, or you otherwise cannot specify each key in your pipeline condition, then use mapReduce instead:
db.collection.mapReduce(
function() {
var doc = this;
delete doc._id;
Object.keys(this).forEach(function(key) {
var value = (( key == "salary") && ( doc[key] < 1000 ))
? 0
: (( key == "type" ) && ( doc[key] != "type2" ))
? 0
: 1;
emit(key,value);
});
},
function(key,values) {
return Array.sum(values);
},
{
"out": { "inline": 1 }
}
);
And its output:
"results" : [
{
"_id" : "name",
"value" : 3
},
{
"_id" : "salary",
"value" : 3
},
{
"_id" : "type",
"value" : 2
}
]
This is basically the same thing with a conditional count, except that you only specify the "reverse" of the conditions you want, and only for the fields you want to filter on. And of course this output format is simple to emit as separate documents.
The same approach applies: test whether the condition is met on the fields you have conditions for, and return 1 where it is met or 0 where it is not, so that summing gives the count.
You can use an aggregation such as the following query:
db.collection.aggregate([
{
$match: {
salary: 10000
// add any other condition here
}
}, {
$group: {
_id: "$type",
"count": {
$sum: 1
}
}
}, {
$project: {
"category": "$_id",
"count": 1,
_id: 0
}
}
])

MongoDb aggregate and group by two fields depending on values

I want to aggregate over a collection where a type is given. If the type is foo I want to group by the field author, if the type is bar I want to group by user.
All this should happen in one query.
Example Data:
{
"_id": 1,
"author": {
"someField": "abc",
},
"type": "foo"
}
{
"_id": 2,
"author": {
"someField": "abc",
},
"type": "foo"
}
{
"_id": 3,
"user": {
"someField": "abc",
},
"type": "bar"
}
The user field only exists if the type is bar.
So basically something like this; I tried to express it with an $or.
function () {
var results = db.vote.aggregate( [
{ $or: [ {
{ $match : { type : "foo" } },
{ $group : { _id : "$author", sumAuthor : {$sum : 1} } } },
{ { $match : { type : "bar" } },
{ $group : { _id : "$user", sumUser : {$sum : 1} } }
} ] }
] );
return results;
}
Does someone have a good solution for this?
I think it can be done by
db.c.aggregate([{
$group : {
_id : {
$cond : [{
$eq : [ "$type", "foo"]
}, "$author", "$user"]
},
sum : {
$sum : 1
}
}
}]);
The solution below can be cleaned up a bit...
For "bar" (note: for "foo", you have to change a bit)
db.vote.aggregate(
{
$project:{
user:{ $ifNull: ["$user", "notbar"]},
type:1
}
},
{
$group:{
_id:{_id:"$user.someField"},
sumUser:{$sum:1}
}
}
)
Also note: in your final result, anything that is not of type "bar" will have an _id=null
What you want here is the $cond operator, which is a ternary operator returning a specific value where the condition is true or false.
db.vote.aggregate([
{ "$group": {
"_id": null,
"sumUser": {
"$sum": {
"$cond": [ { "$eq": [ "$type", "bar" ] }, 1, 0 ]
}
},
"sumAuthor": {
"$sum": {
"$cond": [ { "$eq": [ "$type", "foo" ] }, 1, 0 ]
}
}
}}
])
This basically tests the "type" of the current document and decides whether to pass either 1 or 0 to the $sum operation.
This also avoids errant grouping should the "user" and "author" fields contain the same values as they do in your example. The end result is a single document with the count of both types.