Check if embedded field exists - MongoDB aggregation framework

This is my test collection:
>db.test.find()
{
"_id": ObjectId("54906479e89cdf95f5fb2351"),
"reports": [
{
"desc": "xxx",
"order": {"$id": ObjectId("53fbede62827b89e4f86c12e")}
}
]
},
{
"_id": ObjectId("54906515e89cdf95f5fb2352"),
"reports": [
{
"desc": "xxx"
}
]
},
{
"_id": ObjectId("549067d3e89cdf95f5fb2353"),
"reports": [
{
"desc": "xxx"
}
]
}
I want to count all documents, and the documents that have an order, so:
>db.test.aggregate({
$group: {
_id: null,
all: {
$sum: 1
},
order: {
$sum: {
"$cond": [
{
"$ifNull": ["$reports.order", false]
},
1,
0
]
}
}
}
})
and my results:
{
"result" : [
{
"_id" : null,
"all" : 3,
"order" : 3
}
],
"ok" : 1
}
but expected:
{
"result" : [
{
"_id" : null,
"all" : 3,
"order" : 1
}
],
"ok" : 1
}
It makes no difference what I put there ("$reports.order", "$reports.xxx", etc.): the aggregation framework only checks whether the field reports exists, and ignores the embedded field.
Do $ifNull and $eq not work with embedded documents?
Is there any way to do something like this
db.test.find({"reports.order": {$exists: 1}})
in the aggregation framework?
Sorry for my English, and I hope you understand what I want to show you :)

I think it doesn't work because the field "reports" contains an array, not an object.
I mean, your aggregation works as you expect on this collection:
>db.test.find()
{
"_id": ObjectId("54906479e89cdf95f5fb2351"),
"reports":
{
"desc": "xxx",
"order": {"$id": ObjectId("53fbede62827b89e4f86c12e")}
}
},
{
"_id": ObjectId("54906515e89cdf95f5fb2352"),
"reports":
{
"desc": "xxx"
}
},
{
"_id": ObjectId("549067d3e89cdf95f5fb2353"),
"reports":
{
"desc": "xxx"
}
}
Note that I removed the "[" and "]", so now it's an object, not an array (a one-to-one relation).
Because you have an array inside the "reports" field, you need to unwind the array to output one document for each element. I suppose that if you have two "order" fields inside the "reports" array, you only want to count them once. I mean:
"reports": [
{
"desc": "xxx",
"order": {"$id": ObjectId("53fbede62827b89e4f86c12e")},
"order": "yyy"
}
]
should count only once toward the final "order" sum.
In this case, you need to unwind, group by _id (because the previous example outputs two documents for the same _id), and then group again to count all documents:
db.test.aggregate([
{$unwind: '$reports'},
{$group:{
_id:"$_id",
order:{$sum:{"$cond": [
{
"$ifNull": ["$reports.order", false]
},
1,
0
]
}
}
}},
{$group:{
_id:null,
all:{$sum:1},
order: {
$sum:{
"$cond": [{$eq: ['$order', 0]}, 0, 1]
}
}
}}])
Maybe there is a shorter solution, but this works.
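For reference, the per-document check this pipeline performs can be reproduced in plain JavaScript (a client-side sketch with the sample documents inlined; it is not a replacement for the aggregation, just a way to see the expected counts):

```javascript
// Count all documents, and those where at least one element of `reports`
// has an `order` field -- the same predicate as
// db.test.find({"reports.order": {$exists: 1}})
const docs = [
  { _id: 1, reports: [{ desc: "xxx", order: "ref" }] },
  { _id: 2, reports: [{ desc: "xxx" }] },
  { _id: 3, reports: [{ desc: "xxx" }] },
];

const all = docs.length;
const order = docs.filter((d) =>
  d.reports.some((r) => r.order !== undefined && r.order !== null)
).length;

console.log({ all, order }); // { all: 3, order: 1 }
```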

Related

MongoDb sum issue after match and group

Suppose I have these documents in a userDetails collection:
[
{
"roles": [
"author",
"reader"
],
"completed_roles": ["author", "reader"],
"address": {
"current_address": {
"city": "abc"
}
},
"is_verified": true
},
{
"roles": [
"reader"
],
"completed_roles": ["reader"],
"address": {
"current_address": {
"city": "abc"
}
},
"is_verified": true
},
{
"roles": [
"author"
],
"completed_roles": [],
"address": {
"current_address": {
"city": "xyz"
}
},
"is_verified": false
}
]
I want to fetch the sum of all roles that contain author, grouped by city, along with total_roles_completed and is_verified.
So the O/P should look like:
[
{
"_id": {
"city": "abc"
},
"total_author": 1,
"total_roles_completed": 1,
"is_verified": 1
},
{
"_id": {
"city": "xyz"
},
"total_author": 1,
"total_roles_completed": 0,
"is_verified": 0
}
]
Basic O/P required:
Filter the documents based on author in roles (other roles may be present, but author must be present)
Sum the authors based on city
Sum on the basis of completed_roles containing "author"
Sum on the basis of documents being verified.
For this I tried as:
db.userDetails.aggregate([
{
$match: {
roles: {
$elemMatch: {
$eq: "author"
}
}
}
},
{
$unwind: "$completed_roles"
},
{
"$group": {
_id: { city: "$address.current_address.city"},
total_authors: {$sum: 1},
total_roles_completed: {
$sum: {
$cond: [
{
$eq: ["$completed_roles", "author"]
},
1,
0
]
}
},
is_verified: {
$sum: {
$cond: [
{
$eq: ["$is_verified", true]
},
1,
0
]
}
}
}
}
]);
But the sum is incorrect. Please let me know where I made a mistake. Also, if anyone needs any further information, please let me know.
Edit: I figured out that the unwind is giving me the incorrect value; if I remove the unwind, the sum comes out correct.
Is there any other way by which I can calculate the sum of total_roles_completed for each city?
If I've understood correctly, you can try this query:
First, $match to get only documents where roles contains "author".
Then $group by the city (the sample document is not valid JSON, so I assume the structure is address: {current_address: {city: "abc"}}). This $group gets the authors for each city and also sums 1 if "author" is in completed_roles, and checks whether it is verified.
Here I don't know how to decide whether the author is verified (it could be true in one document and false in another; if it is the same value over all documents, you can use $first to get the first is_verified value). So I decided to use $allElementsTrue in a $project stage: this will only be true if is_verified is true in all of the documents grouped together.
db.collection.aggregate([
{
"$match": {
"roles": "author"
}
},
{
"$group": {
"_id": "$address.current_address.city",
"total_author": {
"$sum": 1
},
"total_roles_completed": {
"$sum": {
"$cond": {
"if": {
"$in": [
"author",
"$completed_roles"
]
},
"then": 1,
"else": 0
}
}
},
"is_verified": {
"$addToSet": "$is_verified"
}
}
},
{
"$project": {
"_id": 0,
"city": "$_id",
"is_verified": {
"$allElementsTrue": "$is_verified"
},
"total_author": 1,
"total_roles_completed": 1
}
}
])
The result from this query is:
[
{
"city": "xyz",
"is_verified": false,
"total_author": 1,
"total_roles_completed": 0
},
{
"city": "abc",
"is_verified": true,
"total_author": 2,
"total_roles_completed": 2
}
]
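As a sanity check, the same grouping logic can be mirrored in plain JavaScript over the three sample documents from the question (a sketch; field names are as in the question):

```javascript
const users = [
  { roles: ["author", "reader"], completed_roles: ["author", "reader"],
    address: { current_address: { city: "abc" } }, is_verified: true },
  { roles: ["reader"], completed_roles: ["reader"],
    address: { current_address: { city: "abc" } }, is_verified: true },
  { roles: ["author"], completed_roles: [],
    address: { current_address: { city: "xyz" } }, is_verified: false },
];

const byCity = {};
// $match: keep only documents whose roles contain "author"
for (const u of users.filter((u) => u.roles.includes("author"))) {
  const city = u.address.current_address.city;
  if (!byCity[city]) {
    byCity[city] = { total_author: 0, total_roles_completed: 0, flags: [] };
  }
  byCity[city].total_author += 1;                          // $sum: 1
  byCity[city].total_roles_completed +=
    u.completed_roles.includes("author") ? 1 : 0;          // $cond with $in
  byCity[city].flags.push(u.is_verified);                  // $addToSet
}
// $allElementsTrue over the collected flags
for (const g of Object.values(byCity)) {
  g.is_verified = g.flags.every(Boolean);
  delete g.flags;
}
```

Note that over the question's three sample documents this yields total_author: 1 for "abc", since only one of the two "abc" users actually has the author role.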

$merge whenMatched pipeline

I have been looking at $merge and Variables in Aggregation Expressions, but I am struggling to understand them. What I would like to do, in a very general sense, is take two collections, match them on the unique "Role ID" field, and see whether they are exactly the same or not. If they are the same, I want to update the "Status" field to "Updated".
Where I am struggling is the whenMatched pipeline. I am not sure how to target the "new" and "old" documents for the $cmp expression. I am also not tied to this approach; I feel like $mergeObjects could be used as well. I appreciate the help.
const mergePipeline = [
{'$unset': "_id"},
{'$addFields' : {"Status" : "New"}},
{'$merge' : {
into: "previous",
on: "Role ID",
whenMatched: [
// compare the documents with $cmp <-- is it possible to only compare a few fields without unsetting them?
// if different replace root with "new" document
// change status to "updated"
],
whenNotMatched: "insert"
}}
];
db.current.aggregate(mergePipeline);
Coll1 (we aggregate on this one)
[
{
"role_id": 1,
"a": 2
},
{
"role_id": 2,
"a": 3
},
{
"role_id": 3,
"a": 20
}
]
Coll2 (the one on disk)
the first should be updated (roots are equal)
the second should be replaced from the pipeline
and role_id: 3 has no match => inserted
[
{
"role_id": 1,
"a": 2
},
{
"role_id": 2,
"a": 10
}
]
Query
removes the _id (else error, we can't update that)
merges on role_id
if the 2 roots (from the pipeline and from disk) are equal => status updated
else replaces with the root of the pipeline
*let is used to have the ROOT of the pipeline as the variable "$$p_root"
coll1.aggregate(
[
{
"$unset": [
"_id"
]
},
{
"$merge": {
"into": {
"db": "testdb",
"coll": "coll2"
},
"on": [
"role_id"
],
"let": {
"p_root": "$$ROOT"
},
"whenMatched": [
{
"$unset": [
"_id"
]
},
{
"$replaceRoot": {
"newRoot": {
"$cond": [
{
"$eq": [
"$$p_root",
"$$ROOT"
]
},
{
"$mergeObjects": [
"$$ROOT",
{
"status": "updated"
}
]
},
"$$p_root"
]
}
}
}
],
"whenNotMatched": "insert"
}
}
])
Results that I got
(I used a simple $eq; you can use $cmp, but I don't think we need it, because we only care about equality, not ordering)
[
{
"role_id": 1,
"a": 2,
"status": "updated" // roots were equal (pipeline root,disk root)
},
{
"role_id": 2, // root not equal i kept the pipelines
"a": 3
},
{
"role_id": 3, // no match happened => insert
"a": 20
}
]
const mergePipeline = [
{'$unset': "_id"},
{'$merge' : {
into: "currentStatusSample",
on: "Role ID",
whenMatched: [
{$unset: ["_id", "Status"]},
{$addFields : {compare: {$cmp: ["$$new","$$ROOT"]}}},
{$set: {"Status" : {$cond: [ {$ne : ["$compare", 0]}, "Updated", "Unchanged"]}}},
{$unset: "compare"}
],
whenNotMatched: "insert"
}}
];
I wish there were a way to $cmp only specified fields, but for now this seems to have worked. I plan on moving forward with either this or the answer posted by Takis above.
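One possible way to compare only a few fields inside the whenMatched pipeline is to build a sub-document from each side and $cmp those (an untested sketch: role_id and a are illustrative field names from the sample collections, and $$new is the default variable holding the incoming document):

```javascript
// Sketch of a $merge pipeline comparing only selected fields; defined as a
// plain array, to be passed to db.current.aggregate(...)
const mergePipeline = [
  { $unset: "_id" },
  { $merge: {
      into: "previous",
      on: "role_id",
      whenMatched: [
        // Build one sub-document per side with the same key order,
        // so equal field values compare as 0.
        { $addFields: { compare: { $cmp: [
          { role_id: "$$new.role_id", a: "$$new.a" },  // fields from the new doc
          { role_id: "$role_id", a: "$a" },            // same fields on disk
        ] } } },
        { $set: { Status: { $cond: [{ $ne: ["$compare", 0] }, "Updated", "Unchanged"] } } },
        { $unset: "compare" },
      ],
      whenNotMatched: "insert",
  } },
];
```

Because $cmp on documents is order-sensitive, both sub-documents must list the fields in the same order.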

How to assign weights to searched documents in MongoDb?

This might sound like a simple question to you, but I have spent over 3 hours trying to achieve it and got stuck midway.
Inputs:
List of keywords
List of tags
Problem Statement: I need to find all the documents from the database which satisfy following conditions:
List documents that have 1 or many matching keywords. (achieved)
List documents that have 1 or many matching tags. (achieved)
Sort the found documents on the basis of weights: each keyword match carries 2 points and each tag match carries 1 point.
Query: How can I achieve requirement #3?
My attempt: I am able to sort only on the basis of keyword matches (and even that without multiplying the weight by 2).
tags is an array of documents. The structure of each tag looks like:
{
"id" : "ICC",
"some Other Key" : "some Other value"
}
keywords is an array of strings:
["women", "cricket"]
Query:
var predicate = [
{
"$match": {
"$or": [
{
"keywords" : {
"$in" : ["cricket", "women"]
}
},
{
"tags.id" : {
"$in" : ["ICC"]
}
}
]
}
},
{
"$project": {
"title":1,
"_id": 0,
"keywords": 1,
"weight" : {
"$size": {
"$setIntersection" : [
"$keywords" , ["cricket","women"]
]
}
},
"tags.id": 1
}
},
{
"$sort": {
"weight": -1
}
}
];
It seems that you were close in your attempt, but of course you need to implement something to "match your logic" in order to get the final "score" value you want.
It's just a matter of changing your projection logic a little, and assuming that both "keywords" and "tags" are arrays in your documents:
db.collection.aggregate([
// Match your required documents
{ "$match": {
"$or": [
{
"keywords" : {
"$in" : ["cricket", "women"]
}
},
{
"tags.id" : {
"$in" : ["ICC"]
}
}
]
}},
// Inspect elements and create a "weight"
{ "$project": {
"title": 1,
"keywords": 1,
"tags": 1,
"weight": {
"$add": [
{ "$multiply": [
{"$size": {
"$setIntersection": [
"$keywords",
[ "cricket", "women" ]
]
}}
,2] },
{ "$size": {
"$setIntersection": [
{ "$map": {
"input": "$tags",
"as": "t",
"in": "$$t.id"
}},
["ICC"]
]
}}
]
}
}},
// Then sort by that "weight"
{ "$sort": { "weight": -1 } }
])
So it is basically the $map logic here that "transforms" the other array to just give the id values for comparison against the "set" operation that you want.
The $add operator provides the additional "weight" to the member you want to "weight" your responses by.
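The same weight arithmetic can be checked in plain JavaScript (a client-side sketch using the search terms from the question; it mirrors the $add / $multiply / $setIntersection expression, not the server-side query):

```javascript
const keywords = ["cricket", "women"];
const tagIds = ["ICC"];

// 2 points per matching keyword, 1 point per matching tag id.
function weight(doc) {
  const kwMatches = doc.keywords.filter((k) => keywords.includes(k)).length;
  const tagMatches = doc.tags
    .map((t) => t.id)                          // the $map step
    .filter((id) => tagIds.includes(id)).length;
  return kwMatches * 2 + tagMatches;
}

weight({ keywords: ["women", "cricket"], tags: [{ id: "ICC" }] }); // 5
weight({ keywords: ["football"], tags: [{ id: "ICC" }] });         // 1
```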

How to group by different fields

I want to find all users named 'Hans' and aggregate their 'age' and number of 'childs' by grouping them.
Assuming I have following in my database 'users'.
{
"_id" : "01",
"user" : "Hans",
"age" : "50",
"childs" : "2"
}
{
"_id" : "02",
"user" : "Hans",
"age" : "40",
"childs" : "2"
}
{
"_id" : "03",
"user" : "Fritz",
"age" : "40",
"childs" : "2"
}
{
"_id" : "04",
"user" : "Hans",
"age" : "40",
"childs" : "1"
}
The result should be something like this:
"result" :
[
{
"age" :
[
{
"value" : "50",
"count" : "1"
},
{
"value" : "40",
"count" : "2"
}
]
},
{
"childs" :
[
{
"value" : "2",
"count" : "2"
},
{
"value" : "1",
"count" : "1"
}
]
}
]
How can I achieve this?
This should almost be a MongoDB FAQ, mostly because it is a good example of how you should alter your thinking away from SQL processing and embrace what engines like MongoDB do.
The basic principle here is "MongoDB does not do joins". Any way of "envisioning" how you would construct SQL to do this essentially requires a "join" operation. The typical form is "UNION" which is in fact a "join".
So how to do it under a different paradigm? Well first, let's approach how not to do it and understand the reasons why. Even if of course it will work for your very small sample:
The Hard Way
db.docs.aggregate([
{ "$group": {
"_id": null,
"age": { "$push": "$age" },
"childs": { "$push": "$childs" }
}},
{ "$unwind": "$age" },
{ "$group": {
"_id": "$age",
"count": { "$sum": 1 },
"childs": { "$first": "$childs" }
}},
{ "$sort": { "_id": -1 } },
{ "$group": {
"_id": null,
"age": { "$push": {
"value": "$_id",
"count": "$count"
}},
"childs": { "$first": "$childs" }
}},
{ "$unwind": "$childs" },
{ "$group": {
"_id": "$childs",
"count": { "$sum": 1 },
"age": { "$first": "$age" }
}},
{ "$sort": { "_id": -1 } },
{ "$group": {
"_id": null,
"age": { "$first": "$age" },
"childs": { "$push": {
"value": "$_id",
"count": "$count"
}}
}}
])
That will give you a result like this:
{
"_id" : null,
"age" : [
{
"value" : "50",
"count" : 1
},
{
"value" : "40",
"count" : 3
}
],
"childs" : [
{
"value" : "2",
"count" : 3
},
{
"value" : "1",
"count" : 1
}
]
}
So why is this bad? The main problem should be apparent in the very first pipeline stage:
{ "$group": {
"_id": null,
"age": { "$push": "$age" },
"childs": { "$push": "$childs" }
}},
What we asked to do here is group up everything in the collection for the values we want and $push those results into an array. When things are small then this works, but real world collections would result in this "single document" in the pipeline that exceeds the 16MB BSON limit that is allowed. That is what is bad.
The rest of the logic follows the natural course by working with each array. But of course real world scenarios would almost always make this untenable.
You could avoid this somewhat by doing things like "duplicating" the documents to be of "type" "age" or "childs" and grouping the documents individually by type. But it's all a bit too "over complex" and not a solid way of doing things.
The natural response is "what about a UNION?", but since MongoDB does not do the "join" then how to approach that?
A Better Way ( aka A New Hope )
Your best approach here both architecturally and performance wise is to simply submit "both" queries ( yes two ) in "parallel" to the server via your client API. As the results are received you then "combine" them into a single response you can then send back as a source of data to your eventual "client" application.
Different languages have different approaches to this, but the general case is to look for an "asynchronous processing" API that allows you to do this in tandem.
My example purpose here uses node.js as the "asynchronous" side is basically "built in" and reasonably intuitive to follow. The "combination" side of things can be any type of "hash/map/dict" table implementation, just doing it the simple way for example only:
var async = require('async'),
MongoClient = require('mongodb');
MongoClient.connect('mongodb://localhost/test',function(err,db) {
var collection = db.collection('docs');
async.parallel(
[
function(callback) {
collection.aggregate(
[
{ "$group": {
"_id": "$age",
"type": { "$first": { "$literal": "age" } },
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": -1 } }
],
callback
);
},
function(callback) {
collection.aggregate(
[
{ "$group": {
"_id": "$childs",
"type": { "$first": { "$literal": "childs" } },
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": -1 } }
],
callback
);
}
],
function(err,results) {
if (err) throw err;
var response = {};
results.forEach(function(res) {
res.forEach(function(doc) {
if ( !response.hasOwnProperty(doc.type) )
response[doc.type] = [];
response[doc.type].push({
"value": doc._id,
"count": doc.count
});
});
});
console.log( JSON.stringify( response, null, 2 ) );
}
);
});
Which gives the cute result:
{
"age": [
{
"value": "50",
"count": 1
},
{
"value": "40",
"count": 3
}
],
"childs": [
{
"value": "2",
"count": 3
},
{
"value": "1",
"count": 1
}
]
}
So the key thing to note here is that the "separate" aggregation statements themselves are actually quite simple. The only thing you face is combining those in your final result. There are many approaches to "combining", particularly to deal with large results from each of the queries, but this is the basic example of the execution model.
Key points here.
Shuffling data in the aggregation pipeline is possible but not performant for large data sets.
Use a language implementation and API that support "parallel" and "asynchronous" execution so you can "load up" all or "most" of your operations at once.
The API should support some method of "combination" or otherwise allow a separate "stream" write to process each result set received into one.
Forget about the SQL way. The NoSQL way delegates the processing of such things as "joins" to your "data logic layer", which is what contains the code as shown here. It does it this way because it is scalable to very large datasets. It is rather the job of your "data logic" handling nodes in large applications to deliver this to the end API.
This is fast compared to any other form of "wrangling" I could possibly describe. Part of "NoSQL" thinking is to "Unlearn what you have learned" and look at things a different way. And if that way doesn't perform better, then stick with the SQL approach for storage and query.
That's why alternatives exist.
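As a historical footnote: since this answer was written, MongoDB 3.4 added the $facet stage, which can run both groupings server-side in a single aggregation. A sketch (each facet is an independent sub-pipeline over the same input):

```javascript
// Sketch of a $facet pipeline (requires MongoDB 3.4+); defined here as a
// plain array, to be passed to db.docs.aggregate(...)
const pipeline = [
  { $facet: {
      age: [
        { $group: { _id: "$age", count: { $sum: 1 } } },
        { $sort: { _id: -1 } },
        { $project: { _id: 0, value: "$_id", count: "$count" } },
      ],
      childs: [
        { $group: { _id: "$childs", count: { $sum: 1 } } },
        { $sort: { _id: -1 } },
        { $project: { _id: 0, value: "$_id", count: "$count" } },
      ],
  } },
];
```

Note the caveat still applies: $facet emits a single output document, so the 16MB BSON limit matters for very large result sets.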
That was a tough one!
First, the bare solution:
db.test.aggregate([
{ "$match": { "user": "Hans" } },
// duplicate each document: one for "age", the other for "childs"
{ $project: { age: "$age", childs: "$childs",
data: {$literal: ["age", "childs"]}}},
{ $unwind: "$data" },
// pivot data to something like { data: "age", value: "40" }
{ $project: { data: "$data",
value: {$cond: [{$eq: ["$data", "age"]},
"$age",
"$childs"]} }},
// Group by data type, and count
{ $group: { _id: {data: "$data", value: "$value" },
count: { $sum: 1 },
value: {$first: "$value"} }},
// aggregate values in an array for each independent (type, value) pair
{ $group: { _id: "$_id.data", values: { $push: { count: "$count", value: "$value" }} }} ,
// project value to the correctly name field
{ $project: { result: {$cond: [{$eq: ["$_id", "age"]},
{age: "$values" },
{childs: "$values"}]} }},
// group all data in the result array, and remove unneeded `_id` field
{ $group: { _id: null, result: { $push: "$result" }}},
{ $project: { _id: 0, result: 1}}
])
Producing:
{
"result" : [
{
"age" : [
{
"count" : 3,
"value" : "40"
},
{
"count" : 1,
"value" : "50"
}
]
},
{
"childs" : [
{
"count" : 1,
"value" : "1"
},
{
"count" : 3,
"value" : "2"
}
]
}
]
}
And now, for some explanations:
One of the major issues here is that each incoming document has to be part of two different sums. I solved that by adding a literal array ["age", "childs"] to your documents, and then unwinding them by that array. That way, each document will be presented twice in the later stage.
Once that is done, to ease processing, I change the data representation to something much more manageable, like { data: "age", value: "40" }.
The following steps perform the data aggregation per se, up to the third $project step, which maps the value fields to the corresponding age or childs field.
The final two steps will simply wrap the two documents in one, removing the unneeded _id field.
Pfff!

MongoDb aggregate and group by two fields depending on values

I want to aggregate over a collection where a type is given. If the type is foo I want to group by the field author, if the type is bar I want to group by user.
All this should happen in one query.
Example Data:
{
"_id": 1,
"author": {
"someField": "abc"
},
"type": "foo"
}
{
"_id": 2,
"author": {
"someField": "abc"
},
"type": "foo"
}
{
"_id": 3,
"user": {
"someField": "abc"
},
"type": "bar"
}
The user field only exists if the type is bar.
So basically something like this... I tried to express it with an $or.
function () {
var results = db.vote.aggregate( [
{ $or: [ {
{ $match : { type : "foo" } },
{ $group : { _id : "$author", sumAuthor : {$sum : 1} } } },
{ { $match : { type : "bar" } },
{ $group : { _id : "$user", sumUser : {$sum : 1} } }
} ] }
] );
return results;
}
Does someone have a good solution for this?
I think it can be done by
db.c.aggregate([{
$group : {
_id : {
$cond : [{
$eq : [ "$type", "foo"]
}, "$author", "$user"]
},
sum : {
$sum : 1
}
}
}]);
The solution below can be cleaned up a bit...
For "bar" (note: for "foo" you have to change it a bit)
db.vote.aggregate(
{
$project:{
user:{ $ifNull: ["$user", "notbar"]},
type:1
}
},
{
$group:{
_id:{_id:"$user.someField"},
sumUser:{$sum:1}
}
}
)
Also note: in your final answer, anything that is not of type "bar" will have an _id = null
What you want here is the $cond operator, which is a ternary operator returning a specific value where the condition is true or false.
db.vote.aggregate([
{ "$group": {
"_id": null,
"sumUser": {
"$sum": {
"$cond": [ { "$eq": [ "$type", "bar" ] }, 1, 0 ]
}
},
"sumAuthor": {
"$sum": {
"$cond": [ { "$eq": [ "$type", "foo" ] }, 1, 0 ]
}
}
}}
])
This basically tests the "type" of the current document and decides whether to pass either 1 or 0 to the $sum operation.
This also avoids errant grouping should the "user" and "author" fields contain the same values as they do in your example. The end result is a single document with the count of both types.
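Applied to the sample documents from the question (types "foo" and "bar"), the conditional-sum idea reduces to this plain-JavaScript equivalent (a sketch for illustration only):

```javascript
const votes = [
  { _id: 1, author: { someField: "abc" }, type: "foo" },
  { _id: 2, author: { someField: "abc" }, type: "foo" },
  { _id: 3, user: { someField: "abc" }, type: "bar" },
];

// Each document contributes 1 or 0 to each sum, exactly like
// $sum over a $cond expression in a single $group stage.
const totals = votes.reduce(
  (acc, v) => ({
    sumAuthor: acc.sumAuthor + (v.type === "foo" ? 1 : 0),
    sumUser: acc.sumUser + (v.type === "bar" ? 1 : 0),
  }),
  { sumAuthor: 0, sumUser: 0 }
);

console.log(totals); // { sumAuthor: 2, sumUser: 1 }
```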