MongoDB unwind aggregate result?

So I'm trying to produce some statistics from my database using aggregation.
My current script looks like this:
db.posts.aggregate( [
{ $group: {_id : "$domain", "counter" : {$sum : 1}}},
{ $sort : { counter : -1}},
{ $match : { counter : {$gt : 10} } }
])
and it produces a result like this:
{
"result" : [
{
"_id" : "i.imgur.com",
"counter" : 1220
},
{
"_id" : "imgur.com",
"counter" : 459
}
],
"ok" : 1
}
This is quite satisfactory, but I want to go further. I'm using Robomongo (which I find more comfortable), and this appears to me as a single document.
I want each result to be a separate document, like this:
   | _id         | counter
---+-------------+--------
 1 | i.imgur.com | 1220
---+-------------+--------
 2 | imgur.com   | 459
I assumed I needed to use $unwind, but failed miserably: adding { $unwind : "$result"} to the aggregation pipeline produces this output:
/* 0 */
{
"result" : [],
"ok" : 1
}
What have I done wrong, and how do I do it right?

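I don't use Robomongo, but it seems you could append .result to your script to get the output you want. (Your $unwind attempt returned nothing because result is not a field of the documents flowing through the pipeline. It is the wrapper the server puts around the final output, so every pipeline document lacks $result, and $unwind drops documents whose field is missing.)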
db.posts.aggregate( [
{ $group: {_id : "$domain", "counter" : {$sum : 1}}},
{ $sort : { counter : -1}},
{ $match : { counter : {$gt : 10} } }
]).result
I've looked into this, and it appears the output you're getting is a consequence of Robomongo running db.posts.runCommand("aggregate", {pipeline: [<array of pipeline operators>]}) without instantiating a cursor, instead of using the aggregate() helper, which creates a cursor.
I don't think there's much you can do about this other than filing a ticket.
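To see why adding { $unwind : "$result" } produced an empty result, here is a small plain-JavaScript emulation of $unwind's default behavior. This is not MongoDB code. Documents whose field is missing, null, or an empty array produce no output. The `grouped` array stands in for the documents your $match stage emits. `result` exists only on the wrapper around the final output, never on those documents.

```javascript
// Plain-JavaScript emulation (not MongoDB itself) of $unwind's default behavior.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null field: the document is dropped entirely.
    if (value === undefined || value === null) continue;
    // Since MongoDB 3.2 a non-array value is treated as a one-element array.
    const items = Array.isArray(value) ? value : [value];
    // One output document per element; an empty array yields nothing.
    for (const item of items) out.push({ ...doc, [field]: item });
  }
  return out;
}

// Documents as they leave the $match stage of the question's pipeline.
const grouped = [
  { _id: "i.imgur.com", counter: 1220 },
  { _id: "imgur.com", counter: 459 },
];

console.log(unwind(grouped, "result")); // [] because no document has a "result" field
```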

Related

MongoDB $divide on aggregate output

Is it possible to perform a mathematical operation on fields that were already computed by an aggregation?
I have something like this:
db.collection.aggregate([
{
"$unwind" : {
"path" : "$users"
}
},
{
"$match" : {
"users.r" : {
"$exists" : true
}
}
},
{
"$group" : {
"_id" : "$users.r",
"count" : {
"$sum" : 1
}
}
},
])
This gives the following output:
{ "_id" : "A", "count" : 7 }
{ "_id" : "B", "count" : 49 }
Now I want to divide 7 by 49 or vice versa.
Is there a possibility to do that? I tried $project and $divide but had no luck.
Any help would be really appreciated.
Thank you,
From your question, it looks like you expect exactly 2 result documents. In that case, I can assume users.r has only 2 possible values (apart from null).
The simplest approach is to do this arithmetic in JavaScript if you're working in the mongo console, or, if this runs in a program, in whatever language you use to access MongoDB. For example:
var results = db.collection.aggregate([theAggregatePipelineQuery]).toArray();
print(results[0].count/results[1].count);
EDIT: Here is an alternative to the approach above, because the OP commented that they can't use JavaScript code and need this done entirely in the query:
db.collection.aggregate([
  // ...your existing stages that produce the two documents with a count field...
  // $group output order is not guaranteed, so sort to control which count comes first
  { $sort: { _id: 1 } },
  { $group: { "_id": 1, firstCount: { $first: "$count" }, lastCount: { $last: "$count" } } },
  { $project: { finalResult: { $divide: ["$firstCount", "$lastCount"] } } }
])
// The returned document has your answer in the `finalResult` field
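If relying on $first/$last ordering feels fragile, one order-independent alternative computes both counts in a single $group and then divides them. It uses $cond inside $sum, which is a different technique from the answer's. The helper below is hypothetical and only builds the pipeline array. It assumes the question's schema (a `users` array whose elements have an `r` field) and takes the two values to compare as arguments.

```javascript
// Hypothetical helper: builds an order-independent ratio pipeline for the
// question's schema (documents with a `users` array whose elements have `r`).
function buildRatioPipeline(numeratorValue, denominatorValue) {
  // Adds 1 for each unwound user whose r equals `value`, 0 otherwise.
  const countIf = (value) => ({ $sum: { $cond: [{ $eq: ["$users.r", value] }, 1, 0] } });
  return [
    { $unwind: "$users" },
    { $group: { _id: null, num: countIf(numeratorValue), den: countIf(denominatorValue) } },
    {
      $project: {
        _id: 0,
        // $divide errors on a zero divisor, so return null in that case.
        ratio: { $cond: [{ $eq: ["$den", 0] }, null, { $divide: ["$num", "$den"] }] },
      },
    },
  ];
}

// In the mongo shell: db.collection.aggregate(buildRatioPipeline("A", "B"))
```

With the counts from the question (7 for A, 49 for B), the ratio would be 7/49 regardless of the order in which $group emits its results.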

MongoDB update latest subdocument

Here is my Mongo document:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
}
]
}
I want to update (or upsert) invoice_id in the last element of the sub-array.
I have tried:
sort: {$natural: -1},
subscription.$.invoice
What I want it to be is:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}
While there are ways to get the last array element, like Saravana shows in her answer, I don't recommend doing it that way because it introduces race conditions. For example, if two subs are added simultaneously, you can't depend on which one is 'last' in the array.
If an invoice_id has to be tied to a specific sub_id, then it's far better to query and find that specific element in the array, then add the invoice_id to it.
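As a sketch of "find that specific element, then add the invoice_id to it," the hypothetical helper below builds the filter and update documents you would pass to db.sub.updateOne(filter, update) in the shell. It assumes the target is the element with the given sub_id that has no invoice_id yet, selected with $elemMatch. It then uses the positional $ operator to set invoice_id on that matched element. In the sample document both elements share the same sub_id, so the $exists: false condition is what selects the second one.

```javascript
// Hypothetical helper: builds arguments for db.sub.updateOne(filter, update).
function buildInvoiceUpdate(docId, subId, invoiceId) {
  const filter = {
    _id: docId,
    // Select the array element with this sub_id that has no invoice_id yet.
    subscriptions: { $elemMatch: { sub_id: subId, invoice_id: { $exists: false } } },
  };
  // "subscriptions.$" refers to the first array element matched by the filter.
  const update = { $set: { "subscriptions.$.invoice_id": invoiceId } };
  return { filter, update };
}

// In the mongo shell (docId would be ObjectId("5a69d0acb76d1c2e08e4ccd8")):
//   const { filter, update } = buildInvoiceUpdate(docId, "5a56fd399dd78e33948c9b8e", "5a56fd399dd78e33948c9b8f");
//   db.sub.updateOne(filter, update);
```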
In the comments, the OP indicated that the current order of operations is 1) add sub_id, 2) insert the invoice record into the INVOICE collection and get the invoice_id, 3) add the invoice_id into the new subscription.
However, if you already have the sub_id, then it's better to re-order your operations this way: 1) insert the invoice record and get the invoice_id 2) add both sub_id and invoice_id with a single operation.
Doing this improves performance (eliminates the second update operation), but more importantly, eliminates race conditions because you're adding both sub_id and invoice_id at the same time.
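The reordered flow can be sketched as one function. This is a hypothetical sketch. `db` stands for the shell's db object, and `INVOICE` and `sub` are the collection names used in this thread. It relies on insertOne returning an object with insertedId, as the shell's insertOne does. Note that insertedId is an ObjectId, while the sample document stores IDs as strings, so you may need to convert it to match your schema. The stand-in `db` object in the usage lines lets the logic run without a server.

```javascript
// Hypothetical sketch of the reordered flow: invoice first, then one update.
function addSubscriptionWithInvoice(db, docId, subId, invoiceRecord) {
  // 1) Insert the invoice record first so its id is known up front.
  const invoiceId = db.INVOICE.insertOne(invoiceRecord).insertedId;
  // 2) Push sub_id and invoice_id together in a single update, so no other
  //    writer can observe (or race on) a subscription without its invoice.
  db.sub.updateOne(
    { _id: docId },
    { $push: { subscriptions: { sub_id: subId, invoice_id: invoiceId } } }
  );
  return invoiceId;
}

// Usage with a stand-in object that mimics the two shell calls used above.
const calls = [];
const fakeDb = {
  INVOICE: { insertOne: (doc) => ({ acknowledged: true, insertedId: "inv1" }) },
  sub: { updateOne: (filter, update) => { calls.push([filter, update]); return {}; } },
};
addSubscriptionWithInvoice(fakeDb, "doc1", "sub1", { amount: 10 });
```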
We can fetch the document and update the last element by index:
> var doc = db.sub.findOne({"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8")})
> if ( doc.subscriptions.length - 1 >= 0 )
doc.subscriptions[doc.subscriptions.length-1].invoice_id="5a56fd399dd78e33948c9b8f"
> db.sub.update({_id:doc._id},doc)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Or write an aggregation pipeline that builds the updated document, and then use that document for the update:
db.sub.aggregate(
[
{$match : { "_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8") }},
{$addFields : { last : { $subtract : [{$size : "$subscriptions"},1]}}},
{$unwind : { path :"$subscriptions" , includeArrayIndex : "idx"}},
{$project : { "subscriptions.sub_id" : 1,
"subscriptions.invoice_id" : {
$cond : {
if: { $eq: [ "$idx", "$last" ] },
then: "5a56fd399dd78e33948c9b8f",
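else: "$subscriptions.invoice_id" // keep any existing invoice_id ("$$REMOVE" would drop it)
}
}
}
},
{$group : {_id : "$_id", subscriptions : {$push : "$subscriptions"}}}
]
).pretty()
Resulting document:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"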
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}

MongoDB unwind and match vs. match and unwind

I'm looking to optimize MongoDB performance by minimizing the number of records to unwind.
Currently I do something like this:
unwind(injectionRecords),
match("machineID" : "machine1"),
count(counter)
But because the data is huge, the unwind operation takes a lot of time, and only then does it match on the unwound records.
It unwinds all 4 records, then matches machineID in the result and gives me the count.
Instead, I would like to do something like this:
match("machineID": "machine1"),
unwind(injectionRecords)
count(counter)
That way, it would match only the records with that machineID, unwind 2 records instead of 4, and give me the count.
Is this possible? How can I do this?
Here are some sample documents:
{
"_id" : ObjectId("5981c24b90a7c215e4f166dd"),
"machineID" : "machine1",
"injectionRecords" : [
{
"startTime" : ISODate("2017-08-02T17:45:04.779+05:30"),
"endTime" : ISODate("2017-08-02T17:45:07.763+05:30"),
"counter" : 1
},
{
"startTime" : ISODate("2017-08-02T17:45:24.417+05:30"),
"endTime" : ISODate("2017-08-02T17:45:27.402+05:30"),
"counter" : 2
}
]
},
{
"_id" : ObjectId("5981c24b90a7c215e4f166de"),
"machineID" : "machine2",
"injectionRecords" : [
{
"startTime" : ISODate("2017-08-02T17:46:04.779+05:30"),
"endTime" : ISODate("2017-08-02T17:46:07.763+05:30"),
"counter" : 1
},
{
"startTime" : ISODate("2017-08-02T17:46:24.417+05:30"),
"endTime" : ISODate("2017-08-02T17:46:27.402+05:30"),
"counter" : 2
}
]
}
The following query will return a count of injectionRecords for a given machineID. I think this is what you are asking for.
db.collection.aggregate([
{$match: {machineID: 'machine1'}},
{$unwind: '$injectionRecords'},
{$group:{_id: "$_id",count:{$sum:1}}}
])
Of course, this query (where the unwind takes place before the match) is functionally equivalent:
db.collection.aggregate([
{$unwind: '$injectionRecords'},
{$match: {machineID: 'machine1'}},
{$group:{_id: "$_id",count:{$sum:1}}}
])
However, running that query with explain ...
db.collection.aggregate([
{$unwind: '$injectionRecords'},
{$match: {machineID: 'machine1'}},
{$group:{_id: "$_id",count:{$sum:1}}}
], {explain: true})
... shows that the unwind stage applies to the entire collection whereas if you match before unwinding then only the matched documents are unwound.
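The difference in work can be illustrated with a plain-JavaScript emulation of the two stage orders on the question's sample documents, trimmed to the fields the pipeline uses. It only counts the intermediate documents each order creates. It does not model MongoDB's actual execution, although a $match at the start of a pipeline can additionally use an index on machineID.

```javascript
// Sample documents from the question, trimmed to the fields the pipeline uses.
const docs = [
  { machineID: "machine1", injectionRecords: [{ counter: 1 }, { counter: 2 }] },
  { machineID: "machine2", injectionRecords: [{ counter: 1 }, { counter: 2 }] },
];

// Emulated $unwind: one output document per injectionRecords element.
const unwind = (input) =>
  input.flatMap((d) => d.injectionRecords.map((r) => ({ ...d, injectionRecords: r })));
// Emulated $match on machineID.
const match = (input) => input.filter((d) => d.machineID === "machine1");

const unwindFirst = unwind(docs);       // every document is unwound before filtering
const matchFirst = unwind(match(docs)); // only machine1's records are unwound

console.log(unwindFirst.length, match(unwindFirst).length); // 4 2
console.log(matchFirst.length); // 2
```

Both orders end with the same two records, which is why the queries are functionally equivalent. Only the amount of intermediate work differs.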

I need to count how many children orgs are assigned to a parent org in MongoDB

I'm new to the MongoDB world. I'm trying to figure out how to count the number of children organizations assigned to a parent organization. I have documents that have this general structure:
{
"_id" : "001",
"parentOrganization" : {
"organizationId" : "pOrg1"
},
"childOrganization" : {
"organizationId" : "cOrg1"
}
},
{
"_id" : "002",
"parentOrganization" : {
"organizationId" : "pOrg1"
},
"childOrganization" : {
"organizationId" : "cOrg2"
}
},
{
"_id" : "003",
"parentOrganization" : {
"organizationId" : "pOrg2"
},
"childOrganization" : {
"organizationId" : "cOrg3"
}
}
Each document has a parentOrganization with an associated childOrganization. There may be multiple documents with the same parentOrganization, but different childOrganizations. There may also be multiple documents with the same parent/child relationship. Additionally, there may even be a case where a child org may associate with multiple parent orgs.
I'm trying to group by parentOrganization and then count the number of unique childOrganizations associated with each parentOrganization, as well as display the unique IDs.
I have tried using the aggregation framework with $match and $group, but I still can't get at the child organization fields to count them. Here is what I'm currently attempting:
var s1 = {$match: {"parentOrganization.organizationId": {$exists: true}}};
var s2 = {$group: {_id: "$parentOrganization.organizationId", count: {$sum: "$childOrganization.organizationId"}}};
db.collection.aggregate(s1, s2);
My results are returning the parentOrganization, but my $sum is not returning the number of associated childOrganizations:
/* 1 */
{
"_id" : "pOrg1",
"count" : 0
}
/* 2 */
{
"_id" : "pOrg2",
"count" : 0
}
I get the feeling it is a bit more complicated than my limited knowledge has access to at this time. What details am I missing in this query?
Your $sum is referencing the childOrganization.organizationId value, which is a string. When $sum references a string, it will return the value 0.
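A tiny plain-JavaScript emulation of the $sum accumulator's treatment of its inputs shows the difference. Numeric values are added, and non-numeric values such as strings are ignored. It is a simplification, since MongoDB also distinguishes int, long, double, and decimal types.

```javascript
// Simplified emulation of the $sum accumulator: numbers are added,
// non-numeric values (such as strings) are ignored.
function mongoSum(values) {
  return values.reduce((acc, v) => (typeof v === "number" ? acc + v : acc), 0);
}

// What "$childOrganization.organizationId" yields for the pOrg1 documents.
const childIds = ["cOrg1", "cOrg2"];
console.log(mongoSum(childIds)); // 0, as in the question's output
console.log(mongoSum(childIds.map(() => 1))); // 2, which is what {$sum: 1} computes
```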
I was a bit unsure of exactly what you were asking for, but I believe these aggregations can help you on your way.
This will return a count of documents grouped by parentOrganization.organizationId:
db.collection.aggregate([{$group: {"_id":"$parentOrganization.organizationId", "count": {"$sum": 1}}}])
Output:
{ "_id" : "pOrg2", "count" : 1 }
{ "_id" : "pOrg1", "count" : 2 }
This will return a count for each unique parent/child organization pair:
db.collection.aggregate([
{$group: {"_id": {"parentOrganization": "$parentOrganization.organizationId", "childOrganization": "$childOrganization.organizationId"}, "count":{$sum:1}}}])
Output:
{ "_id" : { "parentOrganization" : "pOrg2", "childOrganization" : "cOrg3" }, "count" : 1 }
{ "_id" : { "parentOrganization" : "pOrg1", "childOrganization" : "cOrg2" }, "count" : 1 }
{ "_id" : { "parentOrganization" : "pOrg1", "childOrganization" : "cOrg1" }, "count" : 1 }
This will return the number of unique child organizations for each parent, along with the set of unique child organization IDs, using $addToSet. One caveat of $addToSet is that MongoDB's 16MB document size limit still applies: if a parent has so many distinct children that the resulting set would make one output document larger than 16MB, the command will fail. The $group builds the set of child organizations for each parent organization, and the $project simply adds the size of that set to the result.
db.collection.aggregate([
{$group: {"_id" : "$parentOrganization.organizationId", "childOrgs" : { "$addToSet" : "$childOrganization.organizationId"}}},
{$project: {"_id" : "$_id", "uniqueChildOrgsCount": {"$size" : "$childOrgs"}, "uniqueChildOrgs": "$childOrgs"}}])
Output:
{ "_id" : "pOrg2", "uniqueChildOrgsCount" : 1, "uniqueChildOrgs" : [ "cOrg3" ]}
{ "_id" : "pOrg1", "uniqueChildOrgsCount" : 2, "uniqueChildOrgs" : [ "cOrg2", "cOrg1" ]}
For simplicity, I left the $match stage you included out of these aggregations, but you could add it back.
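The $addToSet/$size pipeline above can be emulated in plain JavaScript to check the expected counts. The sample data includes a hypothetical extra document that repeats the pOrg1/cOrg1 relationship, since the question says duplicates can occur. This is an illustration of the logic, not MongoDB code. A JavaScript Set plays the role of $addToSet by keeping only distinct child IDs.

```javascript
// Emulation of:
//   {$group: {_id: "$parentOrganization.organizationId",
//             childOrgs: {$addToSet: "$childOrganization.organizationId"}}}
//   {$project: {uniqueChildOrgsCount: {$size: "$childOrgs"}, uniqueChildOrgs: "$childOrgs"}}
function uniqueChildrenByParent(docs) {
  const sets = new Map(); // parent id -> Set of child ids (duplicates dropped)
  for (const d of docs) {
    const parent = d.parentOrganization.organizationId;
    if (!sets.has(parent)) sets.set(parent, new Set());
    sets.get(parent).add(d.childOrganization.organizationId);
  }
  return [...sets].map(([parent, children]) => ({
    _id: parent,
    uniqueChildOrgsCount: children.size,
    uniqueChildOrgs: [...children],
  }));
}

const docs = [
  { _id: "001", parentOrganization: { organizationId: "pOrg1" }, childOrganization: { organizationId: "cOrg1" } },
  { _id: "002", parentOrganization: { organizationId: "pOrg1" }, childOrganization: { organizationId: "cOrg2" } },
  { _id: "003", parentOrganization: { organizationId: "pOrg2" }, childOrganization: { organizationId: "cOrg3" } },
  // Hypothetical extra document repeating the pOrg1/cOrg1 relationship.
  { _id: "004", parentOrganization: { organizationId: "pOrg1" }, childOrganization: { organizationId: "cOrg1" } },
];

// pOrg1 still has 2 unique children despite the duplicate; pOrg2 has 1.
console.log(uniqueChildrenByParent(docs));
```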

How to find a document with maximum field value in mongodb?

I have a number of MongoDB documents of the following form:
{
"auditedId" : "53d0f648e4b064e8d746b31c",
"modifications" : [
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31d"),
"modified" : "2014-07-22 18:33:05"
},
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e"),
"modified" : "2014-07-24 14:15:27"
},
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31f"),
"modified" : "2014-07-24 12:04:24"
}
]
}
For each of these documents, I want to find the "auditRecordId" value that corresponds to the latest modification. In the given example, I want to retrieve
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e")
Or, even better:
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e"),
"modified" : "2014-07-24 14:15:27"
}
Is there any way to do this without writing map-reduce functions?
Whenever you have an array in your document, the aggregate method is your friend :)
db.foo.aggregate([
// De-normalize the 'modifications' array
{"$unwind":"$modifications"},
// Sort by 'modifications.modified' descending
{"$sort":{"modifications.modified":-1}},
// Pick the first one i.e., the max
{"$limit":1}
])
Output:
{
"result" : [
{
"_id" : ObjectId("53d12be57a462c7459b6f1c7"),
"auditedId" : "53d0f648e4b064e8d746b31c",
"modifications" : {
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e"),
"modified" : "2014-07-24 14:15:27"
}
}
],
"ok" : 1
}
The query above uses $limit just to illustrate the $unwind operator. If you have multiple documents in this format and want the latest modification in each, you'll need to add a $group stage to the pipeline and use the $first operator:
db.foo.aggregate([
{"$unwind":"$modifications"},
{"$sort":{"modifications.modified":-1}},
{"$group":{
"_id" : "$auditedId",
"modifications" : {$first:"$modifications"}}}
])
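A plain-JavaScript emulation of the $unwind, $sort, $group/$first pipeline above can make its logic easier to check. This is not MongoDB code. Sorting on "modified" works only because these strings use the fixed-width "YYYY-MM-DD HH:MM:SS" format, which sorts lexicographically in chronological order. Storing real dates would avoid relying on that. ObjectIds are shown as plain hex strings here.

```javascript
// Emulation of: $unwind modifications -> $sort modified descending ->
// $group by auditedId taking $first (i.e. the latest) modification.
function latestModificationPerDoc(docs) {
  // $unwind: one row per modification, keeping the parent's auditedId.
  const rows = docs.flatMap((d) =>
    d.modifications.map((m) => ({ auditedId: d.auditedId, modifications: m }))
  );
  // $sort descending; plain string comparison mirrors ordering these strings.
  rows.sort((a, b) =>
    a.modifications.modified < b.modifications.modified ? 1
      : a.modifications.modified > b.modifications.modified ? -1 : 0
  );
  // $group with $first: after the sort, the first row per auditedId is the latest.
  const latest = new Map();
  for (const row of rows) {
    if (!latest.has(row.auditedId)) latest.set(row.auditedId, row.modifications);
  }
  return [...latest].map(([auditedId, modifications]) => ({ _id: auditedId, modifications }));
}

const docs = [{
  auditedId: "53d0f648e4b064e8d746b31c",
  modifications: [
    { auditRecordId: "53d0f648e4b064e8d746b31d", modified: "2014-07-22 18:33:05" },
    { auditRecordId: "53d0f648e4b064e8d746b31e", modified: "2014-07-24 14:15:27" },
    { auditRecordId: "53d0f648e4b064e8d746b31f", modified: "2014-07-24 12:04:24" },
  ],
}];

console.log(latestModificationPerDoc(docs)[0].modifications.auditRecordId); // 53d0f648e4b064e8d746b31e
```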