MongoDB aggregate positional fields inside an array

I have a collection with the following data:
{
"_id" : ObjectId("5516d416d0c2323619ddbca8"),
"date" : "28/02/2015",
"driver" : "user1",
"passengers" : [
{
"user" : "user2",
"times" : 2
},
{
"user" : "user3",
"times" : 3
}
]
}
{
"_id" : ObjectId("5516d517d0c2323619ddbca9"),
"date" : "27/02/2015",
"driver" : "user2",
"passengers" : [
{
"user" : "user1",
"times" : 2
},
{
"user" : "user3",
"times" : 2
}
]
}
I would like to run an aggregation that tells me, for a given passenger, how many times they rode with each driver. For my example data it would be:
for user1: [{ driver: user2, times: 2}]
for user2: [{ driver: user1, times: 2}]
for user3: [{ driver: user1, times: 3}, {driver: user2, times:2}]
I'm quite new to Mongo. I know how to perform a simple aggregation with $sum, but not when the data is inside arrays, and especially not when my subject is itself an element of the array.
What is the appropriate way to perform this kind of aggregation, and more specifically, how do I perform it in an Express.js-based server?

To achieve this with the aggregation framework, the first pipeline stage is a $match on the passenger in question, which selects the documents whose passengers array contains that user. It is followed by $unwind, which deconstructs the passengers array of those documents and outputs one document per array element. A second $match on the deconstructed array then filters that stream, so that only the element for the passenger in question passes unmodified into the next stage, which projects the required fields with $project. So your aggregation pipeline for user3 looks like this:
db.collection.aggregate([
{
"$match": {
"passengers.user": "user3"
}
},
{
"$unwind": "$passengers"
},
{
"$match": {
"passengers.user": "user3"
}
},
{
"$project": {
"_id": 0,
"driver": "$driver",
"times": "$passengers.times"
}
}
])
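To see what each stage does, here is a minimal in-memory sketch in plain JavaScript (Node.js) that mimics the four stages on the sample data with array methods. It only illustrates the stage semantics; it is not MongoDB's API, and the function name timesWithDrivers is hypothetical.

```javascript
// In-memory sketch of $match -> $unwind -> $match -> $project.
// Plain arrays stand in for the collection; no MongoDB is involved.
const rides = [
  {
    date: "28/02/2015",
    driver: "user1",
    passengers: [
      { user: "user2", times: 2 },
      { user: "user3", times: 3 },
    ],
  },
  {
    date: "27/02/2015",
    driver: "user2",
    passengers: [
      { user: "user1", times: 2 },
      { user: "user3", times: 2 },
    ],
  },
];

function timesWithDrivers(docs, passenger) {
  return docs
    // first $match: keep documents whose passengers array contains the user
    .filter((doc) => doc.passengers.some((p) => p.user === passenger))
    // $unwind: one document per passengers element
    .flatMap((doc) => doc.passengers.map((p) => ({ ...doc, passengers: p })))
    // second $match: keep only the element belonging to this passenger
    .filter((doc) => doc.passengers.user === passenger)
    // $project: keep driver and times only
    .map((doc) => ({ driver: doc.driver, times: doc.passengers.times }));
}

console.log(timesWithDrivers(rides, "user3"));
```

Running it for user1, user2 and user3 reproduces the three expected lists from the question.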
Result:
/* 0 */
{
"result" : [
{
"driver" : "user1",
"times" : 3
},
{
"driver" : "user2",
"times" : 2
}
],
"ok" : 1
}
UPDATE:
To merge duplicate drivers across different dates, as you mentioned, add a $group stage just before the final $project stage and compute the total passenger times with the $sum accumulator:
db.collection.aggregate([
{
"$match": {
"passengers.user": "user3"
}
},
{
"$unwind": "$passengers"
},
{
"$match": {
"passengers.user": "user3"
}
},
{
"$group": {
"_id": "$driver",
"total": {
"$sum": "$passengers.times"
}
}
},
{
"$project": {
"_id": 0,
"driver": "$_id",
"total": 1
}
}
])
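The $group stage can be sketched in memory the same way. This is an illustration only: totalsByDriver and the third, later-dated ride are hypothetical, added so that one driver appears twice. Note that MongoDB does not guarantee the order of $group output (the result below lists user2 first), so add a $sort stage if order matters.

```javascript
// Sketch of the $group stage: sum passengers.times per driver over rows
// that have already gone through $match / $unwind / $match.
function totalsByDriver(unwoundRows) {
  const totals = new Map(); // driver (the group _id) -> running $sum
  for (const row of unwoundRows) {
    totals.set(row.driver, (totals.get(row.driver) ?? 0) + row.passengers.times);
  }
  // final $project: rename the group key to "driver"
  return [...totals].map(([driver, total]) => ({ driver, total }));
}

// Hypothetical rows for user3: two rides with user1 on different dates
const unwound = [
  { date: "28/02/2015", driver: "user1", passengers: { user: "user3", times: 3 } },
  { date: "27/02/2015", driver: "user2", passengers: { user: "user3", times: 2 } },
  { date: "01/03/2015", driver: "user1", passengers: { user: "user3", times: 1 } },
];

console.log(totalsByDriver(unwound));
```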
Result:
/* 0 */
{
"result" : [
{
"total" : 2,
"driver" : "user2"
},
{
"total" : 3,
"driver" : "user1"
}
],
"ok" : 1
}
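For the Express.js part of the question, one option is to build the pipeline from a parameter instead of hard-coding "user3". The sketch below assumes the official MongoDB Node.js driver, whose collection.aggregate() returns a cursor that you can read with toArray(); the function name, the route path, and the app, db and "rides" names are all hypothetical. Route parameters arrive as strings, so they cannot carry query operators, but validate the type if the value comes from a JSON body instead.

```javascript
// Builds the per-driver totals pipeline for any passenger, so the user
// name can come from a request instead of being hard-coded.
function driverTotalsPipeline(passenger) {
  return [
    { $match: { "passengers.user": passenger } },
    { $unwind: "$passengers" },
    { $match: { "passengers.user": passenger } },
    { $group: { _id: "$driver", total: { $sum: "$passengers.times" } } },
    { $project: { _id: 0, driver: "$_id", total: 1 } },
  ];
}

// Possible use in an Express route (hypothetical app, db and "rides" names):
//
//   app.get("/passengers/:user/drivers", async (req, res) => {
//     const rows = await db.collection("rides")
//       .aggregate(driverTotalsPipeline(req.params.user))
//       .toArray();
//     res.json(rows);
//   });

console.log(JSON.stringify(driverTotalsPipeline("user3")));
```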

Related

Multiple Nested Group Within Array

I have a group of documents in MongoDB, as given below:
/* 1 */
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"Name" : "Kevin",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2014-08-31"
}
]
}
/* 2 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"Name" : "Peter",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2015-03-24"
}
]
}
/* 3 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"Name" : "Pole",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2015-03-24"
},
{
"event_type" : "Work Anniversary",
"event_date" : "2015-03-24"
}
]
}
Now I want a result that groups on event_date and then, within each date, on event_type. Each event_type entry should contain the names of all related users, followed by the count of records in the respective array.
Expected Output
/* 1 */
{
"event_date" : "2014-08-31",
"data" : [
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
},
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 2
},
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
}
],
"count" : 1
}
]
}
/* 2 */
{
"event_date" : "2015-03-24",
"data" : [
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 1
},
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
},
{
"event_type" : "Work Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
}
]
}
Using the aggregation framework, you would need to run a pipeline that has the following stages so that you get the desired result:
db.collection.aggregate([
{ "$unwind": "$pb_event" },
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
},
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
])
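As an in-memory illustration of the two-level grouping (plain JavaScript, not the MongoDB API), the following sketch runs the same steps over the three sample documents. The _id values are kept as plain strings rather than ObjectId, and groupEvents is a hypothetical name.

```javascript
const people = [
  { _id: "58736c7f7d43c305461cdb9b", Name: "Kevin", pb_event: [
      { event_type: "Birthday", event_date: "2014-08-31" },
      { event_type: "Anniversary", event_date: "2014-08-31" } ] },
  { _id: "58736cfc7d43c305461cdba8", Name: "Peter", pb_event: [
      { event_type: "Birthday", event_date: "2014-08-31" },
      { event_type: "Anniversary", event_date: "2015-03-24" } ] },
  { _id: "58736cfc7d43c305461cdba9", Name: "Pole", pb_event: [
      { event_type: "Birthday", event_date: "2015-03-24" },
      { event_type: "Work Anniversary", event_date: "2015-03-24" } ] },
];

function groupEvents(docs) {
  // $unwind + first $group: key on the (event_date, event_type) pair
  const inner = new Map();
  for (const doc of docs) {
    for (const ev of doc.pb_event) {
      const key = JSON.stringify([ev.event_date, ev.event_type]);
      if (!inner.has(key)) {
        inner.set(key, { event_date: ev.event_date, event_type: ev.event_type, details: [], count: 0 });
      }
      const g = inner.get(key);
      g.details.push({ _id: doc._id, name: doc.Name }); // $push
      g.count += 1; // $sum: 1
    }
  }
  // second $group: key on event_date, pushing each inner group into data
  const outer = new Map();
  for (const g of inner.values()) {
    if (!outer.has(g.event_date)) outer.set(g.event_date, []);
    outer.get(g.event_date).push({ event_type: g.event_type, details: g.details, count: g.count });
  }
  // $project: expose the group key as event_date
  return [...outer].map(([event_date, data]) => ({ event_date, data }));
}

console.log(JSON.stringify(groupEvents(people), null, 2));
```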
In the above pipeline, the first stage is the $unwind operator
{ "$unwind": "$pb_event" }
which is handy when the data is stored as an array. When $unwind is applied to an array field, it outputs a new document for each element of that array, with the field replaced by that single element. In effect, it flattens the data.
This is a necessary operation for the next pipeline stage, the $group step where you group the flattened documents by the deconstructed pb_event array fields event_date and event_type:
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
The $group pipeline operator is similar to SQL's GROUP BY clause. In SQL, GROUP BY is usually paired with aggregate functions such as COUNT() or SUM(); in the same way, a MongoDB $group stage computes per-group values with accumulator operators such as $sum and $push. You can read more about the accumulator operators in the MongoDB $group documentation.
In this $group stage, the count field, i.e. the total number of documents in each group, is calculated with the $sum accumulator operator. Within the same stage, the $push operator builds the details list of _id and name subdocuments, since $push returns an array of expression values for each group.
The second $group stage
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
}
further aggregates the results of the previous stage by grouping on event_date. This forms the basis of the desired output, building a new data list with $push. Finally, the $project stage
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
reshapes the documents by renaming the _id field to event_date and retaining the data field.

mongoDB query to find the document in nested array

[{
"username":"user1",
"products":[
{"productID":1,"itemCode":"CODE1"},
{"productID":2,"itemCode":"CODE1"},
{"productID":3,"itemCode":"CODE2"},
]
},
{
"username":"user2",
"products":[
{"productID":1,"itemCode":"CODE1"},
{"productID":2,"itemCode":"CODE2"},
]
}]
I want to find all the "productID" of "products" for "user1" such that "itemCode" for the product is "CODE1".
What query in mongoDB should be written to do so?
If you only need to match a single condition, then the dot notation is sufficient.
In Mongo shell:
db.col.find({"products.itemCode" : "CODE1", "username" : "user1"})
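This returns the user1 document if at least one of its nested products has itemCode "CODE1". Note that it returns the whole document, with all of its products, not just the matching ones.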
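To illustrate that matching rule in plain JavaScript: a dot-notation condition on an array field matches the document when any element satisfies it, and find() then returns the whole document. findLike below is a hypothetical helper that mimics this query in memory; it is not part of any MongoDB API.

```javascript
// In-memory sketch of find({"products.itemCode": code, "username": name}).
const users = [
  { username: "user1", products: [
      { productID: 1, itemCode: "CODE1" },
      { productID: 2, itemCode: "CODE1" },
      { productID: 3, itemCode: "CODE2" } ] },
  { username: "user2", products: [
      { productID: 1, itemCode: "CODE1" },
      { productID: 2, itemCode: "CODE2" } ] },
];

function findLike(docs, username, itemCode) {
  // A document matches if ANY product has the itemCode; it is returned whole.
  return docs.filter(
    (d) => d.username === username && d.products.some((p) => p.itemCode === itemCode)
  );
}

const hits = findLike(users, "user1", "CODE1");
console.log(hits.length, hits[0].products.length); // 1 3
```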
Updated
I wasn't clear on your requirements at first, but this should be it.
If you want each product as a separate entry, then you need to use the aggregation framework. First split the array entries into separate documents with $unwind, then use $match for your conditions.
db.col.aggregate([
{ $unwind: "$products" },
{ $match: { username: "user1", "products.itemCode": "CODE1" } }
]);
response:
{ "_id" : ObjectId("57cdf9c0f7f7ecd0f7ef81b6"), "username" : "user1", "products" : { "productID" : 1, "itemCode" : "CODE1" } }
{ "_id" : ObjectId("57cdf9c0f7f7ecd0f7ef81b6"), "username" : "user1", "products" : { "productID" : 2, "itemCode" : "CODE1" } }
The answer to your question is
db.col.aggregate([
{ $unwind: "$products" },
{ $match: { username: "user1", "products.itemCode": "CODE1" } },
{ $project: { _id: 0, "products.productID": 1 } }
]);
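Here is a rough in-memory equivalent of that pipeline in plain JavaScript. It also shows the shape that $project produces for a dotted field: products stays an embedded document that holds only productID. productIdsFor is a hypothetical name, and this mirrors the pipeline's logic, not MongoDB's implementation.

```javascript
const userDocs = [
  { username: "user1", products: [
      { productID: 1, itemCode: "CODE1" },
      { productID: 2, itemCode: "CODE1" },
      { productID: 3, itemCode: "CODE2" } ] },
  { username: "user2", products: [
      { productID: 1, itemCode: "CODE1" },
      { productID: 2, itemCode: "CODE2" } ] },
];

function productIdsFor(docs, username, itemCode) {
  return docs
    // $unwind: one document per product
    .flatMap((d) => d.products.map((p) => ({ username: d.username, products: p })))
    // $match: both conditions now apply to a single product
    .filter((d) => d.username === username && d.products.itemCode === itemCode)
    // $project { _id: 0, "products.productID": 1 }
    .map((d) => ({ products: { productID: d.products.productID } }));
}

console.log(JSON.stringify(productIdsFor(userDocs, "user1", "CODE1")));
```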
In my case it didn't work without the [ ] brackets.
You need multiple filters for this, as below, which is nothing but an AND condition (assuming your collection name is collection1):
db.collection1.find({"username":"user1", "products.itemCode" : "CODE1"})

MongoDB Advanced Query - Getting data based on Array of objects

On Mongo 2.4.6
Collection of Users
{
"_id" : User1,
"orgRoles" : [
{"_id" : 1, "app" : "ANGRYBIRDS", "orgId" : "CODOE"},
{"_id" : 2, "app" : "ANGRYBIRDS", "orgId" : "MSDN"}
],
},
{
"_id" : User2,
"orgRoles" : [
{"_id" : 1, "app" : "ANGRYBIRDS", "orgId" : "CODOE"},
{"_id" : 2, "app" : "HUNGRYPIGS", "orgId" : "MSDN"}
],
},
{
"_id" : User3,
"orgRoles" : [
{"_id" : 1, "app" : "ANGRYBIRDS", "orgId" : "YAHOO"},
{"_id" : 2, "app" : "HUNGRYPIGS", "orgId" : "MSDN"}
],
}
With data that looks like above, I'm trying to write a query to get:
All the IDs of the users that have only one ANGRYBIRDS app, where that ANGRYBIRDS app is in the CODOE organization.
So it should return User2, because they have one ANGRYBIRDS app and it is in the "CODOE" org, but not User1, because they have two ANGRYBIRDS apps, nor User3, because they don't have an ANGRYBIRDS app in the "CODOE" organization. I'm fairly new to Mongo queries, so any help is appreciated.
For conditions more detailed than the standard query operators offer, your best approach is the aggregation framework. It lets you do some processing to work out your conditions, such as the number of matches:
db.collection.aggregate([
// Filter the documents that are possible matches
{ "$match": {
"orgRoles": {
"$elemMatch": {
"app": "ANGRYBIRDS", "orgId": "CODOE"
}
}
}},
// De-normalize the array content
{ "$unwind": "$orgRoles" },
// Group and count the matches
{ "$group": {
"_id": "$_id",
"orgRoles": { "$push": "$orgRoles" },
"matched": {
"$sum": {
"$cond": [
{ "$eq": ["$orgRoles.app", "ANGRYBIRDS"] },
1,
0
]
}
}
}},
// Keep only documents where exactly one element is ANGRYBIRDS
{ "$match": {
"orgRoles": {
"$elemMatch": {
"app": "ANGRYBIRDS", "orgId": "CODOE"
}
},
"matched": 1
}},
// Optionally project to just keep the original fields
{ "$project": { "orgRoles": 1 } }
])
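The counting logic can be sketched in memory with plain JavaScript. This folds $unwind and the conditional $sum into a single reduce per document, so it mirrors the pipeline's logic rather than MongoDB's implementation. The sample data labels the third user User3, as the question's text intends, and singleAngryBirdsInCodoe is a hypothetical name.

```javascript
const userRoles = [
  { _id: "User1", orgRoles: [
      { _id: 1, app: "ANGRYBIRDS", orgId: "CODOE" },
      { _id: 2, app: "ANGRYBIRDS", orgId: "MSDN" } ] },
  { _id: "User2", orgRoles: [
      { _id: 1, app: "ANGRYBIRDS", orgId: "CODOE" },
      { _id: 2, app: "HUNGRYPIGS", orgId: "MSDN" } ] },
  { _id: "User3", orgRoles: [
      { _id: 1, app: "ANGRYBIRDS", orgId: "YAHOO" },
      { _id: 2, app: "HUNGRYPIGS", orgId: "MSDN" } ] },
];

function singleAngryBirdsInCodoe(docs) {
  return docs
    // first $match with $elemMatch: ONE element must satisfy both conditions
    .filter((d) => d.orgRoles.some((r) => r.app === "ANGRYBIRDS" && r.orgId === "CODOE"))
    // $unwind + $group with $sum/$cond: count the ANGRYBIRDS elements
    .map((d) => ({
      _id: d._id,
      matched: d.orgRoles.reduce((n, r) => n + (r.app === "ANGRYBIRDS" ? 1 : 0), 0),
    }))
    // second $match: exactly one ANGRYBIRDS element
    .filter((d) => d.matched === 1)
    .map((d) => d._id);
}

console.log(singleAngryBirdsInCodoe(userRoles)); // [ 'User2' ]
```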
The main work happens after the initial $match has returned only those documents with at least one array element matching the main condition, and after $unwind has split the array elements so they can be inspected individually.
The trick is the conditional $sum using the $cond operator, which is a "ternary". It counts how many array elements have an app equal to "ANGRYBIRDS". After that, you $match again to filter out any documents whose match count is not exactly one. The $elemMatch condition is left in that second $match, but it is not really necessary, since the first $match already guarantees it.
Just for the record, this is also possible using JavaScript evaluation in a $where clause, but because of that it is likely to be less efficient:
db.collection.find({
"orgRoles": {
"$elemMatch": {
"app": "ANGRYBIRDS", "orgId": "CODOE"
}
},
"$where": function() {
var orgs = this.orgRoles.filter(function(el) {
return el.app == "ANGRYBIRDS";
});
return ( orgs.length == 1 );
}
})
One way of doing it using the aggregation pipeline is:
db.users.aggregate([
// Match the documents with app being "ANGRYBIRDS" and orgId being "CODOE".
// Note that this step filters out most of the documents and is good to have
// at the start of the pipeline; moreover, it can make use of indexes when
// used at the beginning of the aggregation pipeline.
{
$match : {
"orgRoles.app" : "ANGRYBIRDS",
"orgRoles.orgId" : "CODOE"
}
},
// unwind the elements in the orgRoles array
{
$unwind : "$orgRoles"
},
// group by userid and app
{
$group : {
"_id" : {
"id" : "$_id",
"app" : "$orgRoles.app"
},
// take the id and app of the first document in each group, since all
// the other documents in the group will have the same values.
"id" : {
$first : "$_id"
},
"app" : {
$first : "$orgRoles.app"
},
// orgId can be different, so form an array for each group.
"orgId" : {
$push : {
"id" : "$orgRoles.orgId"
}
},
// count the number of documents in each group.
"count" : {
$sum : 1
}
}
},
// find the matching group
{
$match : {
"count" : 1,
"app" : "ANGRYBIRDS",
"orgId" : {
$elemMatch : {
"id" : "CODOE"
}
}
}
},
// project only the userid
{
$project : {
"id" : 1,
"_id" : 0
}
} ]);
Edit: Removed the mapping of the aggregation result, since the problem requires a solution for v2.4.6, and according to the documentation:
Changed in version 2.6: The db.collection.aggregate() method returns a cursor and can return result sets of any size. Previous versions returned all results in a single document, and the result set was subject to a size limit of 16 megabytes.

add where condition in aggregate and group function in mongodb

I have a Mongo model, let's say MYLIST, containing data like:
{
"_id" : ObjectId("542139f31284ad1461dbc15f"),
"Category" : "CENTER",
"Name" : "STAND",
"Url" : "center/stand",
"Img" : [ {
"url" : "www.google.com/images",
"main" : "1",
"home" : "1",
"id" : "34faf230-43cf-11e4-8743-311ea2261289"
},
{
"url" : "www.google.com/images1",
"main" : "1",
"home" : "0",
"id" : "34faf230-43cf-11e4-8743-311e66441289"
} ]
}
I execute the following query to the MYLIST collection:
db.MYLIST.aggregate([
{ "$group": {
"_id": "$Category",
"Name": { "$addToSet": {
"name": "$Name",
"url": "$Url",
"img": "$Img"
}}
}},
{ "$sort": { "_id" : 1 } }
]);
And I got the following result:
[
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images" , "main" : "1", "home" : "1", "id" : "350356a0-43cf-11e4-8743-311ea2261289" }
}]
},
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images1" , "main" : "1", "home" : "0", "id" : "34faf230-43cf-11e4-8743-311ea2261289" }
}]
}
]
As you can see, my Img key is itself an array of objects, hence I am getting multiple entries for the same category, one for each entry in the Img array.
What I actually need is to get only those images that have a non-zero value for the home key.
Expected result:
[
{ _id: 'CENTER',
Name:
[ { "name" : "Stand",
"url" : "center/stand",
"img": { "url" : "www.google.com/images" , "main" : "1", "home" : "1", "id" : "350356a0-43cf-11e4-8743-311ea2261289" }
}]
},
]
Hence I would like to add a where condition of img.home > 0 to the above query. Could anybody help me resolve this? I'm relatively new to MongoDB.
I'm still not really sure if this is what you want, or even why you would be using $addToSet on this grouping. But if all you want to do is "filter" the content of the array returned in your result, then you want to $match the array elements against your condition after an $unwind stage has "de-normalized" the content:
db.MYLIST.aggregate([
// If you only want those matching array members it makes sense to match the
// documents that contain them first
// (home is stored as the string "1" in the sample data, not the number 1)
{ "$match": { "Img.home": "1" } },
// Unwind to de-normalize or "un-join" the documents
{ "$unwind": "$Img" },
// Match again to "filter" out those elements that do not match
{ "$match": { "Img.home": "1" } },
// Then do your grouping
{ "$group": {
"_id": "$Category",
"Name": {
"$addToSet": {
"name": "$Name",
"url": "$Url",
"img": "$Img"
}
}
}},
// Finally sort
{ "$sort": { "_id" : 1 } }
]);
So the $match stage is the equivalent of a general query, or a "where clause" in SQL terms, and can be used at any point in the pipeline. It is usually best to put it first whenever it filters anything out: it reduces the overall load by cutting down the documents to be processed, even when it cannot remove "all" of the unwanted content, as is the case when working with an array.
The $unwind stage lets each array element be processed just like a separate document, so another $match stage after it can filter those elements by your query condition.
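A minimal in-memory sketch of the match, unwind, match, group and sort sequence on the sample document, in plain JavaScript. It assumes home holds numeric strings, as in the sample, and it uses a plain push where the pipeline uses $addToSet, so it does not de-duplicate; homeImagesByCategory is a hypothetical name.

```javascript
const mylist = [{
  Category: "CENTER", Name: "STAND", Url: "center/stand",
  Img: [
    { url: "www.google.com/images", main: "1", home: "1", id: "34faf230-43cf-11e4-8743-311ea2261289" },
    { url: "www.google.com/images1", main: "1", home: "0", id: "34faf230-43cf-11e4-8743-311e66441289" },
  ],
}];

function homeImagesByCategory(docs) {
  // home is stored as a string ("1" / "0"), so convert before comparing
  const isHome = (img) => Number(img.home) > 0;
  const groups = new Map(); // Category -> array of { name, url, img }
  for (const doc of docs) {
    // first $match: skip documents with no qualifying image at all
    if (!doc.Img.some(isHome)) continue;
    // $unwind + second $match: visit only the qualifying images
    for (const img of doc.Img.filter(isHome)) {
      if (!groups.has(doc.Category)) groups.set(doc.Category, []);
      // $group: collect entries per Category (plain push, no de-duplication)
      groups.get(doc.Category).push({ name: doc.Name, url: doc.Url, img });
    }
  }
  // $sort on the group key
  return [...groups]
    .map(([_id, Name]) => ({ _id, Name }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

console.log(JSON.stringify(homeImagesByCategory(mylist), null, 2));
```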

Obtaining $group result with group count

Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query posts by tag, group the results by tag, and display them paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output of the previous stage (the first group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some languages and its contents counted, but the pipeline output might exceed the 16 MB limit. Also, running the same query just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp.
Use $push or $addToSet to store tmp in your data list.
Code:
db.test.aggregate([
{$unwind: '$tags'},
{$group:{_id: '$tags', count:{$sum:1}}},
{$project:{tmp:{tag:'$_id', count:'$count'}}},
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
])
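The idea of counting the groups while keeping them can be sketched in memory: after the first grouping, the number of groups is simply the length of the collected list. The second post below is hypothetical sample data and tagCountsWithTotal is a hypothetical name; the sketch mirrors the pipeline's output shape, not MongoDB's implementation.

```javascript
function tagCountsWithTotal(posts) {
  // $unwind + first $group: count posts per tag
  const counts = new Map();
  for (const post of posts) {
    for (const tag of post.tags) counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  // $project into tmp, then $group on null: keep the list AND its size
  const data = [...counts].map(([tag, count]) => ({ tag, count }));
  return { _id: null, total: data.length, data };
}

const posts = [
  { title: "Lorem ipsum", tags: ["SOME", "RANDOM", "TAGS"] },
  { title: "Second post", tags: ["RANDOM", "SOME1", "TAGS1"] }, // hypothetical
];

console.log(JSON.stringify(tagCountsWithTotal(posts)));
```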
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
To paginate per tag, you can just use the normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)
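For the paginator, the page arithmetic behind skip() and limit() is simple. The helpers below are hypothetical and assume 1-based page numbers; the per-tag total needed for the page count can come from the count in the aggregation above.

```javascript
// Hypothetical helper: translate a 1-based page number into the
// skip/limit values used by find(...).skip(n).limit(n).
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Number of pages for the paginator, given a tag's total post count.
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

console.log(pageWindow(2, 10)); // { skip: 10, limit: 10 }
```

So skip(10).limit(10) is page 2 with ten posts per page.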