Is it possible to use the aggregation framework to group by a specific element of an array?
Such that with documents like this:
{
name: 'Russell',
favourite_foods: [
{ name: 'Pizza', type: 'Four Cheeses' },
{ name: 'Burger', type: 'Veggie'}
],
height: 6
}
I could get a distinct list of top favourite foods (i.e. the foods at index 0) along with the height of the tallest person whose top favourite food it is?
Something like this (although it doesn't work, as dot notation with an array index doesn't seem to be supported in the aggregation framework):
db.people.aggregate([
{ $group : { _id: "$favourite_foods.0.name", max_height: { $max : "$height" } } }
])
It seems you are relying on each person's favourite food being first in the array. If so, you can take advantage of the $first accumulator: $unwind the array, regroup per person keeping only the first element, and then group by that food.
Here is the pipeline you can use:
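db.people.aggregate(
[
    // One document per array element, in array order
    {
        "$unwind" : "$favourite_foods"
    },
    // Back to one document per person (grouped by _id, so two people with the
    // same name and height are not merged), keeping only the first element
    {
        "$group" : {
            "_id" : "$_id",
            "height" : {
                "$first" : "$height"
            },
            "faveFood" : {
                "$first" : "$favourite_foods"
            }
        }
    },
    // One result per top favourite food, with the tallest height
    {
        "$group" : {
            "_id" : "$faveFood.name",
            "height" : {
                "$max" : "$height"
            }
        }
    }
])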
On this sample dataset:
> db.people.find().pretty()
{
"_id" : ObjectId("508894efd4197aa2b9490741"),
"name" : "Russell",
"favourite_foods" : [
{
"name" : "Pizza",
"type" : "Four Cheeses"
},
{
"name" : "Burger",
"type" : "Veggie"
}
],
"height" : 6
}
{
"_id" : ObjectId("5088950bd4197aa2b9490742"),
"name" : "Lucy",
"favourite_foods" : [
{
"name" : "Pasta",
"type" : "Four Cheeses"
},
{
"name" : "Burger",
"type" : "Veggie"
}
],
"height" : 5.5
}
{
"_id" : ObjectId("5088951dd4197aa2b9490743"),
"name" : "Landy",
"favourite_foods" : [
{
"name" : "Pizza",
"type" : "Four Cheeses"
},
{
"name" : "Pizza",
"type" : "Veggie"
}
],
"height" : 5
}
{
"_id" : ObjectId("50889541d4197aa2b9490744"),
"name" : "Augie",
"favourite_foods" : [
{
"name" : "Sushi",
"type" : "Four Cheeses"
},
{
"name" : "Pizza",
"type" : "Veggie"
}
],
"height" : 6.2
}
You get these results:
{
"result" : [
{
"_id" : "Pasta",
"height" : 5.5
},
{
"_id" : "Pizza",
"height" : 6
},
{
"_id" : "Sushi",
"height" : 6.2
}
],
"ok" : 1
}
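To check the expected output without a server, here is a plain-JavaScript sketch (Node.js, no MongoDB driver) that mimics what the three stages do to the four sample documents. It only illustrates the pipeline's logic, not MongoDB's implementation; the ObjectIds are replaced by small numbers as an assumption for brevity.

```javascript
// Plain-JavaScript emulation of the pipeline (not MongoDB code): $unwind,
// $group with $first per person, then $group with $max per top food.
// Sample documents from above; ObjectIds are replaced by small numbers.
const people = [
  { _id: 1, name: "Russell", height: 6,
    favourite_foods: [{ name: "Pizza", type: "Four Cheeses" }, { name: "Burger", type: "Veggie" }] },
  { _id: 2, name: "Lucy", height: 5.5,
    favourite_foods: [{ name: "Pasta", type: "Four Cheeses" }, { name: "Burger", type: "Veggie" }] },
  { _id: 3, name: "Landy", height: 5,
    favourite_foods: [{ name: "Pizza", type: "Four Cheeses" }, { name: "Pizza", type: "Veggie" }] },
  { _id: 4, name: "Augie", height: 6.2,
    favourite_foods: [{ name: "Sushi", type: "Four Cheeses" }, { name: "Pizza", type: "Veggie" }] },
];

function maxHeightByTopFood(docs) {
  // $unwind: one document per array element, in array order
  const unwound = docs.flatMap(doc =>
    doc.favourite_foods.map(food => ({ ...doc, favourite_foods: food })));

  // First $group: one entry per person, keeping the first element seen ($first)
  const perPerson = new Map();
  for (const doc of unwound) {
    if (!perPerson.has(doc._id)) {
      perPerson.set(doc._id, { height: doc.height, faveFood: doc.favourite_foods });
    }
  }

  // Second $group: one entry per top food name with the maximum height ($max)
  const byFood = new Map();
  for (const { height, faveFood } of perPerson.values()) {
    const current = byFood.get(faveFood.name);
    byFood.set(faveFood.name, current === undefined ? height : Math.max(current, height));
  }
  return [...byFood].map(([_id, height]) => ({ _id, height }));
}

console.log(maxHeightByTopFood(people));
```

MongoDB does not guarantee the order of $group output, so compare the results as a set (or add a $sort stage) rather than relying on their order.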
At the time of this answer, it did not seem possible to extract a specific element from an array in the aggregation framework; see:
https://jira.mongodb.org/browse/SERVER-4589
Just adding more information about what the documents look like after "$unwind":
DOCUMENT :
> db.people.find().pretty()
{
"_id" : ObjectId("508894efd4197aa2b9490741"),
"name" : "Russell",
"favourite_foods" : [
{
"name" : "Pizza",
"type" : "Four Cheeses"
},
{
"name" : "Burger",
"type" : "Veggie"
}
],
"height" : 6
},
...
AGGREGATION :
db.people.aggregate([{
$unwind: "$favourite_foods"
}]);
RESULT :
{
"_id" : ObjectId("508894efd4197aa2b9490741"),
"name" : "Russell",
"favourite_foods" :{
"name" : "Pizza",
"type" : "Four Cheeses"
},
"height" : 6
},
{
"_id" : ObjectId("508894efd4197aa2b9490741"),
"name" : "Russell",
"favourite_foods" : {
"name" : "Burger",
"type" : "Veggie"
},
"height" : 6
}
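The behaviour shown above can be sketched in plain JavaScript (an illustration, not MongoDB code): each array element produces a copy of the document with the array field replaced by that single element, and the _id is repeated. The sketch follows the documented default of preserveNullAndEmptyArrays: false; the ObjectId is written as a plain string as a simplifying assumption.

```javascript
// Plain-JavaScript sketch of { $unwind: "$<field>" } with default options
function unwind(docs, field) {
  return docs.flatMap(doc => {
    const value = doc[field];
    if (Array.isArray(value)) {
      // One copy of the document per element; an empty array yields nothing
      return value.map(element => ({ ...doc, [field]: element }));
    }
    // Missing or null fields are dropped; any other non-array value is
    // kept as if it were a one-element array
    return value === undefined || value === null ? [] : [doc];
  });
}

const russell = {
  _id: "508894efd4197aa2b9490741",
  name: "Russell",
  favourite_foods: [
    { name: "Pizza", type: "Four Cheeses" },
    { name: "Burger", type: "Veggie" },
  ],
  height: 6,
};

console.log(unwind([russell], "favourite_foods"));
```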
In addition:
If a document has several array fields, you can add a "$project" stage before "$unwind" to keep only the array field you want to unwind.
db.people.aggregate([
{
$project:{
"favourite_foods": 1
}
},
{
$unwind: "$favourite_foods"
}
]);
I think you can make use of the $project and $unwind operators (let me know if this isn't what you're trying to accomplish):
> db.people.aggregate(
{$unwind: "$favourite_foods"},
{$project: {food : "$favourite_foods", height: 1}},
{$group : { _id: "$food", max_height: { $max : "$height" } } })
{
"result" : [
{
"_id" : {
"name" : "Burger",
"type" : "Veggie"
},
"max_height" : 6
},
{
"_id" : {
"name" : "Pizza",
"type" : "Four Cheeses"
},
"max_height" : 6
}
],
"ok" : 1
}
http://docs.mongodb.org/manual/applications/aggregation/
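Note that this pipeline groups on the whole food subdocument and counts every array element, not only the one at index 0, so the same name with a different type lands in a different group. A plain-JavaScript sketch of that grouping (an illustration, not MongoDB code); JSON.stringify is an assumed stand-in for the group key because, like MongoDB's comparison of embedded documents, it is sensitive to field order.

```javascript
// Plain-JavaScript sketch of $unwind + { $group: { _id: "$food", max_height: { $max: "$height" } } }
function maxHeightByFood(people) {
  const groups = new Map();
  for (const person of people) {
    for (const food of person.favourite_foods) {   // $unwind
      const key = JSON.stringify(food);              // _id: the whole subdocument
      const group = groups.get(key);
      if (group === undefined) {
        groups.set(key, { _id: food, max_height: person.height });
      } else {
        group.max_height = Math.max(group.max_height, person.height);  // $max
      }
    }
  }
  return [...groups.values()];
}

const russell = { name: "Russell", height: 6, favourite_foods: [
  { name: "Pizza", type: "Four Cheeses" }, { name: "Burger", type: "Veggie" } ] };
console.log(maxHeightByFood([russell]));
```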
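Since MongoDB 3.2 you can simply use $arrayElemAt to take the element at index 0, and then $group with $max. Note that the $set stage used below is an alias for $addFields and requires MongoDB 4.2 or later; on 3.4 use $addFields instead, and on 3.2 use $project (keeping height).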
db.collection.aggregate([
{
$set: {favourite_foods: {$arrayElemAt: ["$favourite_foods", 0]}}
},
{
$group: {
_id: "$favourite_foods.name",
maxHeight: {$max: "$height"}
}
}
])
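A plain-JavaScript sketch (not MongoDB code) of what these two stages compute, including how $arrayElemAt treats edge cases: a negative index counts from the end, an out-of-range index gives a missing value (undefined here), and a null or missing array gives null. A person whose array is empty therefore ends up in the null group.

```javascript
// Plain-JavaScript sketch of { $arrayElemAt: [array, idx] }
function arrayElemAt(array, idx) {
  if (array === null || array === undefined) return null;
  const i = idx < 0 ? array.length + idx : idx;
  return i >= 0 && i < array.length ? array[i] : undefined;
}

// Emulation of the two stages above on plain objects
function maxHeightByTopFood(people) {
  const groups = new Map();
  for (const person of people) {
    // $set: { favourite_foods: { $arrayElemAt: ["$favourite_foods", 0] } }
    const top = arrayElemAt(person.favourite_foods, 0);
    // $group _id: "$favourite_foods.name"; a missing name groups under null
    const key = top != null && top.name !== undefined ? top.name : null;
    // maxHeight: { $max: "$height" }
    groups.set(key, groups.has(key) ? Math.max(groups.get(key), person.height) : person.height);
  }
  return [...groups].map(([_id, maxHeight]) => ({ _id, maxHeight }));
}

console.log(arrayElemAt(["Pizza", "Burger"], -1));  // "Burger"
```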
I need help sorting these documents:
const docs = Docs.find(
{
'publishedOn.profileId': groupProfile._id,
},
{ sort: { ??? }}
);
I need to find the documents published on a given 'publishedOn.profileId' and sort them by the 'awards.score' of their award whose 'awards.type' is 'challengeWinner'.
Not all documents have an award with type 'challengeWinner'. I need the documents with 'awards.score' = 1 first, then 2, then 3, and then the rest sorted by 'writtenDate'.
I have no idea how to do this. Is it possible?
[
{
"_id" : "5FW9EDW8gi3M8R7XK",
"createdAt" : ISODate("2021-06-13T00:11:48.638Z"),
"title" : "My solution",
"writtenDateType" : 4,
"writtenDate" : ISODate("2021-06-13T00:00:00.000Z"),
"userId" : "dC35hwe6XMRhvqWBv",
"publishedOn" : [
{
"profileId" : "36oPw2zxYCpKxfiu2",
"publishedDate" : ISODate("2021-06-13T00:11:48.787Z"),
"userId" : "dC35hwe6XMRhvqWBv"
},
{
"profileId" : "9y2RwJpzzyk29ApiC",
"userId" : "dC35hwe6XMRhvqWBv",
"publishedDate" : ISODate("2021-06-13T00:16:01.529Z")
}
],
"awards" : [
{
"type" : "topPoem",
"score" : 5,
"addedAt" : ISODate("2021-06-24T23:04:10.454Z"),
"updatedAt" : ISODate("2021-06-25T23:30:00.069Z")
},
{
"type" : "challengeWinner",
"score" : 2,
"challengeId" : "9y2RwJpzzyk29ApiC",
"addedAt" : ISODate("2021-06-24T23:04:10.454Z"),
"updatedAt" : ISODate("2021-06-25T23:30:00.069Z")
}
]
},
{
"_id" : "upzvo8BeHyQ9r9Yfv",
"createdAt" : ISODate("2021-06-19T15:35:13.716Z"),
"title" : "Briches",
"writtenDateType" : 2,
"writtenDate" : ISODate("2003-01-01T00:00:00.000Z"),
"userId" : "A32228XMuZqxFe4Kz",
"publishedOn" : [
{
"profileId" : "MLGkCtNyZ64bGKedG",
"publishedDate" : ISODate("2021-06-19T15:35:13.861Z"),
"userId" : "A32228XMuZqxFe4Kz"
},
{
"profileId" : "9y2RwJpzzyk29ApiC",
"userId" : "A32228XMuZqxFe4Kz",
"publishedDate" : ISODate("2021-06-19T15:35:36.280Z")
}
],
"awards" : [
{
"type" : "challengeWinner",
"score" : 1,
"challengeId" : "9y2RwJpzzyk29ApiC",
"addedAt" : ISODate("2021-06-24T22:59:00.948Z"),
"updatedAt" : ISODate("2021-06-25T23:30:00.067Z"),
"claps" : 19,
"clapsUsers" : 4
},
{
"type" : "suggestedHomepage",
"score" : 1,
"addedAt" : ISODate("2021-06-24T22:59:59.981Z"),
"updatedAt" : ISODate("2021-06-24T22:59:59.981Z")
}
]
}
]
I have only just started learning aggregation, but I tried to solve your problem with an aggregation pipeline.
First, I $match the documents by publishedOn.profileId.
Then I $project only the fields that are needed; in this case, writtenDate and the matching awards.
To keep only the needed awards, I $filter them on the award type.
Last, I $sort by the award score first and then by writtenDate:
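db.collection.aggregate([
  // Keep only documents published on the given profile
  {
    "$match": {
      "publishedOn.profileId": "9y2RwJpzzyk29ApiC"
    }
  },
  // Keep writtenDate and only the "challengeWinner" awards
  {
    "$project": {
      "writtenDate": 1,
      "awards": {
        "$filter": {
          "input": "$awards",
          "as": "award",
          "cond": { "$eq": [ "$$award.type", "challengeWinner" ] }
        }
      }
    }
  },
  // Documents without a challengeWinner award have no score, which sorts before
  // any number in ascending order; flag them so they go last instead
  {
    "$addFields": {
      "noChallengeAward": { "$eq": [ { "$size": { "$ifNull": [ "$awards", [] ] } }, 0 ] }
    }
  },
  {
    "$sort": {
      "noChallengeAward": 1,
      "awards.score": 1,
      "writtenDate": 1
    }
  }
])
The query without the noChallengeAward stage can be tried here: https://mongoplayground.net/p/MzWQCR2Gshg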
Happy Coding !!!
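To make the intended ordering explicit, here is a plain-JavaScript sketch of it (not a MongoDB query). Assumptions: documents with equal scores, and the documents without the award, are ordered by writtenDate ascending as in the $sort above, and if a document has several challengeWinner awards its lowest score counts. Documents without the award need explicit handling because an ascending MongoDB sort places a missing or null value before any number.

```javascript
// Lowest challengeWinner score of a document, or null if it has none
function challengeScore(doc) {
  const scores = (doc.awards || [])
    .filter(award => award.type === "challengeWinner")
    .map(award => award.score);
  return scores.length > 0 ? Math.min(...scores) : null;
}

function sortForProfile(docs, profileId) {
  return docs
    // $match: at least one publishedOn entry with this profileId
    .filter(doc => (doc.publishedOn || []).some(p => p.profileId === profileId))
    .sort((a, b) => {
      const sa = challengeScore(a);
      const sb = challengeScore(b);
      if (sa === null && sb !== null) return 1;      // documents without the award go last
      if (sa !== null && sb === null) return -1;
      if (sa !== null && sa !== sb) return sa - sb;  // lower score first
      return a.writtenDate - b.writtenDate;          // then by writtenDate (Date objects)
    });
}
```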
I am having problems aggregating my Product Document in MongoDB.
My Product Document is:
{
"_id" : ObjectId("5d81171c2c69f45ef459e0af"),
"type" : "T-Shirt",
"name" : "Panda",
"description" : "Panda's are cool.",
"image" : ObjectId("5d81171c2c69f45ef459e0ad"),
"created_at" : ISODate("2019-09-17T18:25:48.026+01:00"),
"is_featured" : false,
"sizes" : [
"XS",
"S",
"M",
"L",
"XL"
],
"tags" : [ ],
"pricing" : {
"price" : 26,
"sale_price" : 8
},
"categories" : [
ObjectId("5d81171b2c69f45ef459e086"),
ObjectId("5d81171b2c69f45ef459e087")
],
"sku" : "5d81171c2c69f45ef459e0af"
},
And my Category Document is:
{
"_id" : ObjectId("5d81171b2c69f45ef459e087"),
"name" : "Art",
"description" : "These items are our artsy options.",
"created_at" : ISODate("2019-09-17T18:25:47.196+01:00")
},
My aim is to perform an aggregation on the Product documents to count the number of items in each Category. For example, for the Category "Art", I need to count the products that are in the "Art" Category:
My current aggregate:
db.product.aggregate(
{ $unwind : "$categories" },
{
$group : {
"_id" : { "name" : "$name" },
"doc" : { $push : { "category" : "$categories" } },
}
},
{ $unwind : "$doc" },
{
$project : {
"_id" : 0,
"name" : "$name",
"category" : "$doc.category"
}
},
{
$group : {
"_id" : "$category",
"name": { "$first": "$name" },
"items_in_cat" : { $sum : 1 }
}
},
{ "$sort" : { "items_in_cat" : -1 } },
)
This does actually work, but not as I need:
{
"_id" : ObjectId("5d81171b2c69f45ef459e082"),
"name" : null, // Why is the name of the category not here?
"items_in_cat" : 4
},
As we can see the name is null. How can I aggregate the output to be:
{
"_id" : ObjectId("5d81171b2c69f45ef459e082"),
"name" : "Art",
"items_in_cat" : 4
},
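In your pipeline, "$name" refers to the product name, and it no longer exists after your first $group (only _id and doc remain), so $first returns null. The category name lives in the category collection, so we need $lookup to fetch it.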
The following query can get us the expected output:
db.product.aggregate([
{
$unwind:"$categories"
},
{
$group:{
"_id":"$categories",
"items_in_cat":{
$sum:1
}
}
},
{
$lookup:{
"from":"category",
"let":{
"id":"$_id"
},
"pipeline":[
{
$match:{
$expr:{
$eq:["$_id","$$id"]
}
}
},
{
$project:{
"_id":0,
"name":1
}
}
],
"as":"categoryLookup"
}
},
{
$unwind:{
"path":"$categoryLookup",
"preserveNullAndEmptyArrays":true
}
},
{
$project:{
"_id":1,
"name":{
$ifNull:["$categoryLookup.name","NA"]
},
"items_in_cat":1
}
}
]).pretty()
Data set:
Collection: product
{
"_id" : ObjectId("5d81171c2c69f45ef459e0af"),
"type" : "T-Shirt",
"name" : "Panda",
"description" : "Panda's are cool.",
"image" : ObjectId("5d81171c2c69f45ef459e0ad"),
"created_at" : ISODate("2019-09-17T17:25:48.026Z"),
"is_featured" : false,
"sizes" : [
"XS",
"S",
"M",
"L",
"XL"
],
"tags" : [ ],
"pricing" : {
"price" : 26,
"sale_price" : 8
},
"categories" : [
ObjectId("5d81171b2c69f45ef459e086"),
ObjectId("5d81171b2c69f45ef459e087")
],
"sku" : "5d81171c2c69f45ef459e0af"
}
Collection: category
{
"_id" : ObjectId("5d81171b2c69f45ef459e086"),
"name" : "Art",
"description" : "These items are our artsy options.",
"created_at" : ISODate("2019-09-17T17:25:47.196Z")
}
{
"_id" : ObjectId("5d81171b2c69f45ef459e087"),
"name" : "Craft",
"description" : "These items are our artsy options.",
"created_at" : ISODate("2019-09-17T17:25:47.196Z")
}
Output:
{
"_id" : ObjectId("5d81171b2c69f45ef459e087"),
"items_in_cat" : 1,
"name" : "Craft"
}
{
"_id" : ObjectId("5d81171b2c69f45ef459e086"),
"items_in_cat" : 1,
"name" : "Art"
}
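A plain-JavaScript sketch of what this pipeline computes (not MongoDB code): count products per category id, then join the names from the category collection, falling back to "NA" as the $ifNull does. As simplifying assumptions, ObjectIds are plain strings and every product's categories field is an array.

```javascript
function countProductsPerCategory(products, categories) {
  // $unwind: "$categories" + $group: { _id: "$categories", items_in_cat: { $sum: 1 } }
  const counts = new Map();
  for (const product of products) {
    for (const categoryId of product.categories || []) {
      counts.set(categoryId, (counts.get(categoryId) || 0) + 1);
    }
  }
  // $lookup: match each category _id against the category collection
  const namesById = new Map(categories.map(category => [category._id, category.name]));
  return [...counts].map(([_id, items_in_cat]) => {
    const name = namesById.get(_id);
    // $ifNull: ["$categoryLookup.name", "NA"]
    return { _id, name: name === undefined || name === null ? "NA" : name, items_in_cat };
  });
}
```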
I have data in the following format in the projects collection of my database:
{ "_id" : ObjectId("5981a80f223e491a58230e5d"), "id" : 2, "name" : "gbqplhlqxzwl", "managerId" : 65151, "startDate" : "03.11.1999", "finishDate" : "02.01.2003", "projectStatus" : "POSTPONED", "participants" : [ ], "estimatedBudget" : 6017891.811079914 }
{ "_id" : ObjectId("5981a80f223e491a58230e5e"), "id" : 3, "name" : "erfekfsdgryu", "managerId" : 83749, "startDate" : "07.07.2007", "finishDate" : "26.12.2027", "projectStatus" : "POSTPONED", "participants" : [ 19229, 81856, 79270, 5509, 70344, 39424 ], "estimatedBudget" : 3086213.8981674756 }
{ "_id" : ObjectId("5981a80f223e491a58230e5f"), "id" : 1, "name" : "jvbzobhppntd", "managerId" : 18925, "startDate" : "29.04.1999", "finishDate" : "13.10.2008", "projectStatus" : "OPEN", "participants" : [ 46100, 96968, 6676, 56121, 4716, 68901, 43990, 48587, 62547, 30292, 65153, 17551, 27083, 20261, 27097, 50036, 86585, 69890, 18790, 22592, 60774, 93709, 78471, 27157, 4328, 36501, 47296, 16831 ], "estimatedBudget" : 3581496.7068344904 }
{ "_id" : ObjectId("5981a80f223e491a58230e60"), "id" : 4, "name" : "cdspkkqwvwld", "managerId" : 62042, "startDate" : "13.03.1998", "finishDate" : "20.06.2007", "projectStatus" : "OPEN", "participants" : [ 53480, 60897, 23677, 22064, 60807, 66637, 84609, 28378, 87143, 27675, 79283, 94992, 20429, 48769, 91671, 41747, 21651, 91134, 41684, 57228, 51949, 18756, 45679, 87781, 67287, 6902, 27526 ], "estimatedBudget" : 2126283.953787842 }
....
I need to find the busiest employee and list all of his projects.
The participants array contains the ids of the employees who participate in the project.
I use the following query to find the busiest employee:
db.projects.aggregate(
{
$unwind: '$participants'
},
{
$addFields: {
count: 1
}
},
{
$group: {
_id : '$participants',
participation_count : {
'$sum':'$count'
}
}
},
{
$sort:{participation_count:-1}
},
{
$limit:1
}
)
This works correctly, but I have no idea how to list all of his projects.
Any ideas?
db.projects.aggregate(
[
{
$unwind: '$participants'
},
{
$addFields: {
count: 1
}
},
{
$group: {
_id : '$participants',
participation_count : {'$sum':'$count'},
projectId : {$push: '$id'}
}
},
{
$sort:{participation_count:-1}
},
{
$limit:1
}
],
{
allowDiskUse:true
}
)
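The key change is the extra $push: "$id" accumulator, which collects the project ids while counting. A plain-JavaScript sketch of what the pipeline computes (not MongoDB code). Like $sort followed by $limit: 1, it returns a single employee even when several are tied; which one MongoDB returns in a tie is not specified, while this sketch keeps the first one it encounters.

```javascript
function busiestEmployee(projects) {
  const byEmployee = new Map();
  for (const project of projects) {
    for (const employeeId of project.participants) {   // $unwind: "$participants"
      const entry = byEmployee.get(employeeId) ||
        { _id: employeeId, participation_count: 0, projectId: [] };
      entry.participation_count += 1;                   // $sum of count: 1
      entry.projectId.push(project.id);                 // $push: "$id"
      byEmployee.set(employeeId, entry);
    }
  }
  // $sort: { participation_count: -1 } followed by $limit: 1
  let best = null;
  for (const entry of byEmployee.values()) {
    if (best === null || entry.participation_count > best.participation_count) best = entry;
  }
  return best;
}
```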
This is my data:
> db.bookmarks.find({"userId" : "56b9b74bf976ab70ff6b9999"}).pretty()
{
"_id" : ObjectId("56c2210fee4a33579f4202dd"),
"userId" : "56b9b74bf976ab70ff6b9999",
"items" : [
{
"itemId" : "28",
"timestamp" : "2016-02-12T18:07:28Z"
},
{
"itemId" : "29",
"timestamp" : "2016-02-12T18:07:29Z"
},
{
"itemId" : "30",
"timestamp" : "2016-02-12T18:07:30Z"
},
{
"itemId" : "31",
"timestamp" : "2016-02-12T18:07:31Z"
},
{
"itemId" : "32",
"timestamp" : "2016-02-12T18:07:32Z"
},
{
"itemId" : "33",
"timestamp" : "2016-02-12T18:07:33Z"
},
{
"itemId" : "34",
"timestamp" : "2016-02-12T18:07:34Z"
}
]
}
I want to get something like this (ideally with the userId as the _id, too):
{
"_id" : "56b9b74bf976ab70ff6b9999",
"items" : [
{ "itemId": "32", "timestamp": "2016-02-12T18:07:32Z" },
{ "itemId": "31", "timestamp": "2016-02-12T18:07:31Z" },
{ "itemId": "30", "timestamp": "2016-02-12T18:07:30Z" }
]
}
What I have now:
> db.bookmarks.aggregate(
... { $match: { "userId" : "56b9b74bf976ab70ff6b9999" } },
... { $unwind: '$items' },
... { $sort: { 'items.timestamp': -1} },
... { $skip: 2 },
... { $limit: 3},
... { $group: { '_id': '$userId' , items: { $push: '$items.itemId' } } }
... ).pretty()
{ "_id" : "56b9b74bf976ab70ff6b9999", "items" : [ "32", "31", "30" ] }
I read the MongoDB documentation and found that I can use $push, but I cannot find a way to push an object like this, whose shape is not defined anywhere in the original document. I want the timestamp too, but I don't know how to modify the $group (or other stages) to do so. Thanks for helping!
This code, which I tested in the MongoDB 3.2.1 shell, should give you the output format that you want. $push accepts an expression, so you can build a new object from the fields of each unwound item (rename myPlace and myStamp to itemId and timestamp to match your desired output exactly):
> db.bookmarks.aggregate(
{ "$match" : { "userId" : "Ursula" } },
{ "$unwind" : "$items" },
{ "$sort" : { "items.timestamp" : -1 } },
{ "$skip" : 2 },
{ "$limit" : 3 },
{ "$group" : { "_id" : "$userId", items: { "$push" : { "myPlace" : "$items.itemId", "myStamp" : "$items.timestamp" } } } } ).pretty()
Running the above will produce this output:
{
"_id" : "Ursula",
"items" : [
{
"myPlace" : "52",
"myStamp" : ISODate("2016-02-13T18:07:32Z")
},
{
"myPlace" : "51",
"myStamp" : ISODate("2016-02-13T18:07:31Z")
},
{
"myPlace" : "50",
"myStamp" : ISODate("2016-02-13T18:07:30Z")
}
]
}
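A plain-JavaScript sketch of what this pipeline computes (not MongoDB code), run against the data from the question. Assumptions: the pushed fields are named itemId and timestamp to match the requested output, and the timestamps are ISO-8601 UTC strings in a single format, as in the question, so comparing them as strings orders them chronologically.

```javascript
function compareDesc(a, b) {
  return a < b ? 1 : a > b ? -1 : 0;
}

function pageOfBookmarks(bookmarkDocs, userId, skip, limit) {
  const page = bookmarkDocs
    .filter(doc => doc.userId === userId)                                   // $match
    .flatMap(doc => doc.items.map(item => ({ userId: doc.userId, item })))  // $unwind
    .sort((a, b) => compareDesc(a.item.timestamp, b.item.timestamp))        // $sort: -1
    .slice(skip, skip + limit);                                             // $skip, $limit
  if (page.length === 0) return null;  // $group emits nothing when no documents reach it
  return {
    _id: userId,  // _id: "$userId"
    // $push: { itemId: "$items.itemId", timestamp: "$items.timestamp" }
    items: page.map(({ item }) => ({ itemId: item.itemId, timestamp: item.timestamp })),
  };
}
```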
In MongoDB 3.2.x, you can also add a $out stage as the very last stage of the aggregation pipeline to have the output of the aggregation written to a collection. Here is the code I used:
> db.bookmarks.aggregate(
{ "$match" : { "userId" : "Ursula" } },
{ "$unwind" : "$items" },
{ "$sort" : { "items.timestamp" : -1 } },
{ "$skip" : 2 },
{ "$limit" : 3 },
{ "$group" : { "_id" : "$userId", items: { "$push" : { "myPlace" : "$items.itemId", "myStamp" : "$items.timestamp" } } } },
{ "$out" : "ursula" } )
This gives me a collection named "ursula":
> show collections
ursula
and I can query that collection:
> db.ursula.find().pretty()
{
"_id" : "Ursula",
"items" : [
{
"myPlace" : "52",
"myStamp" : ISODate("2016-02-13T18:07:32Z")
},
{
"myPlace" : "51",
"myStamp" : ISODate("2016-02-13T18:07:31Z")
},
{
"myPlace" : "50",
"myStamp" : ISODate("2016-02-13T18:07:30Z")
}
]
}
>
Last of all, this is the input document I used in the aggregation query. You can compare this document to how I coded the aggregation query to see how I built the new items array.
> db.bookmarks.find( { "userId" : "Ursula" } ).pretty()
{
"_id" : ObjectId("56c240ed55f2f6004dc3b25c"),
"userId" : "Ursula",
"items" : [
{
"itemId" : "48",
"timestamp" : ISODate("2016-02-13T18:07:28Z")
},
{
"itemId" : "49",
"timestamp" : ISODate("2016-02-13T18:07:29Z")
},
{
"itemId" : "50",
"timestamp" : ISODate("2016-02-13T18:07:30Z")
},
{
"itemId" : "51",
"timestamp" : ISODate("2016-02-13T18:07:31Z")
},
{
"itemId" : "52",
"timestamp" : ISODate("2016-02-13T18:07:32Z")
},
{
"itemId" : "53",
"timestamp" : ISODate("2016-02-13T18:07:33Z")
},
{
"itemId" : "54",
"timestamp" : ISODate("2016-02-13T18:07:34Z")
}
]
}
I'm having trouble figuring out the right aggregation pipe operations to return the results I need.
I have a collection similar to the following:
{
"_id" : "writer1",
"Name" : "writer1",
"Website" : "website1",
"Reviews" : [
{
"Film" : {
"Name" : "Jurassic Park",
"Genre" : "Action"
},
"Score" : 4
},
{
"Technology" : {
"Name" : "Mad Max",
"Genre" : "Action"
},
"Score" : 5
}
]
}
{
"_id" : "writer2",
"Name" : "writer2",
"Website" : "website1",
"Reviews" : [
{
"Technology" : {
"Name" : "Mad Max",
"Genre" : "Action"
},
"Score" : 5
}
]
}
And this is my aggregation so far:
db.writers.aggregate([
{ "$unwind" : "$Reviews" },
{ "$match" : { "Reviews.Film.Name" : "Jurassic Park" } },
{ "$group" : { "_id" : "$Website" , "score" : { "$avg" : "$Reviews.Score" },
writers :{ $push: { name:"$Name", score:"$Reviews.Score" } }
}}
])
This returns only the writers who have reviewed the matching film, and only the websites that have at least one writer who has reviewed it.
However, I need to return all websites, each with a list of all of its writers, giving a score of 0 to writers who haven't reviewed the specified film.
So I am currently getting:
{ "_id" : "website1", "score" : 4, "writers" : [ { "name" : "writer1", "score" : 4 } ] }
when what I actually need is:
{ "_id" : "website1", "score" : 2, "writers" : [ { "name" : "writer1", "score" : 4 },{ "name" :"writer2", "score" : 0 } ] }
Can anyone point me in the right direction?
Cheers