MongoDB - Grouping by inner-documents and retrieving top results - mongodb

I'm trying to find the most common (and least common) skills stored in a MongoDB database. I'm using Mongoose to retrieve the results.
User is the root document, and each User has an inner Profile document. The profile has a 'skills' attribute containing an array of ProfileSkillEntry documents, each of which has a title (the skill name).
return User.aggregate([{
$group: {
'_id': '$profile.skills.title',
'count': {
$sum: 1
}
}
}, {
$sort: {
'count': -1
}
}, {
$limit: 5
}]);
I expect it to combine the skills of all registered users, find the 5 most frequent and return those. Instead it seems to be grouping per user and giving invalid results.
Example User document structure:
{
"_id" : ObjectId("..."),
"firstName" : "Harry",
"lastName" : "Potter",
"profile" : {
"_id" : ObjectId("..."),
"skills" : [
{
"_id" : ObjectId("..."),
"title" : "Java",
"description" : "Master",
"dateFrom" : "31/07/2019",
"coreSkill" : true
},
{
"_id" : ObjectId("..."),
"title" : "JavaScript",
"description" : "Proficient",
"dateFrom" : "31/07/2019",
"coreSkill" : false
}
],
}
}

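Please use the query below. $unwind emits one document per skill, so $group counts individual titles instead of treating each user's whole array of titles as a single key. Add $sort and $limit as required.
db.test.aggregate([
  { $unwind: { path: "$profile.skills" } },
  { $group: { _id: "$profile.skills.title", count: { $sum: 1 } } }
])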

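To see why the original pipeline appeared to group per user, here is a minimal plain-JavaScript sketch that mimics both groupings in memory, with no database involved. The two sample users and the helper name `countBy` are assumptions made up for illustration.

```javascript
// Hypothetical sample data shaped like the User documents above.
const users = [
  { firstName: "Harry", profile: { skills: [{ title: "Java" }, { title: "JavaScript" }] } },
  { firstName: "Ron", profile: { skills: [{ title: "Java" }] } },
];

// Count occurrences of each key, like { $group: { _id: <key>, count: { $sum: 1 } } }.
function countBy(keys) {
  const counts = new Map();
  for (const key of keys) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Without $unwind: "$profile.skills.title" resolves to the whole array of titles,
// so each distinct array becomes its own group.
const perUser = countBy(
  users.map((u) => JSON.stringify(u.profile.skills.map((s) => s.title)))
);

// With $unwind: one entry per skill, so titles are counted across all users.
const perSkill = countBy(users.flatMap((u) => u.profile.skills.map((s) => s.title)));

// Equivalent of { $sort: { count: -1 } } followed by { $limit: 5 }.
const top5 = [...perSkill.entries()]
  .map(([title, count]) => ({ _id: title, count }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 5);

console.log([...perUser.entries()]);
console.log(top5);
```

For the least common skills, the same sketch would sort ascending instead, which corresponds to `{ $sort: { count: 1 } }` in the pipeline.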
Related

mongodb get entries where id exists multiple times based on count condition

I have a collection 'bookings' with the following example structure:
{
"_id" : ObjectId("1"),
"user" : ObjectId("1"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("2"),
"user" : ObjectId("1"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("3"),
"user" : ObjectId("2"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("4"),
"user" : ObjectId("3"),
"event" : ObjectId("1"),
},
{
"_id" : ObjectId("5"),
"user" : ObjectId("4"),
"event" : ObjectId("2"),
},
{
"_id" : ObjectId("6"),
"user" : ObjectId("1"),
"event" : ObjectId("2"),
}
I can't figure out a query that shows all "event" ids in which the same "user" id appears multiple times. Something like this:
{
"event": 1,
"user": 1,
"count": 2
}
It does not have to be this exact output. I just want a query that returns all events for which the same "user" id has more than one entry in this "bookings" collection.
Any suggestions? Thanks!
You just need to do grouping and filtering.
In SQL it would be as simple as
SELECT count(*) AS cc, user, event FROM t1 GROUP BY user, event HAVING cc > 1
In MongoDB, you can use the aggregation framework to do the equivalent.
It does the same in three pipeline stages: group, filter, project.
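db.mycollection.aggregate([
  // Count bookings per (user, event) pair.
  { $group: { _id: { user: "$user", event: "$event" }, count: { $sum: 1 } } },
  // Keep only pairs that occur more than once.
  { $match: { count: { $gt: 1 } } },
  // Flatten the compound _id into top-level fields.
  { $project: { _id: 0, userId: "$_id.user", event: "$_id.event", count: 1 } }
])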
This documentation can help you to understand deeper: https://www.mongodb.com/docs/manual/reference/sql-aggregation-comparison/
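As a rough illustration of the group, filter, project steps, the following plain-JavaScript sketch applies the same logic to an in-memory copy of the sample bookings. Plain numbers stand in for the ObjectIds, and the function name `duplicateBookings` is made up for this example.

```javascript
// Hypothetical in-memory copy of the bookings, with numbers instead of ObjectIds.
const bookings = [
  { _id: 1, user: 1, event: 1 },
  { _id: 2, user: 1, event: 1 },
  { _id: 3, user: 2, event: 1 },
  { _id: 4, user: 3, event: 1 },
  { _id: 5, user: 4, event: 2 },
  { _id: 6, user: 1, event: 2 },
];

function duplicateBookings(docs) {
  // $group: count documents per (user, event) pair.
  const groups = new Map();
  for (const { user, event } of docs) {
    const key = `${user}|${event}`;
    const group = groups.get(key) || { userId: user, event, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  // $match: keep only pairs seen more than once.
  // The $project step is already reflected in the shape of each group object.
  return [...groups.values()].filter((g) => g.count > 1);
}

console.log(duplicateBookings(bookings));
// prints [ { userId: 1, event: 1, count: 2 } ]
```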

How to return just the nested documents of an array from all documents

I have a question about querying nested documents. I tried to search, but nothing answered my question, or maybe I am overlooking it. I have a structure like this:
{
"_id" : ObjectId("592aa441e0f8de09b0912fe9"),
"name" : "Patrick Rothfuss",
"books" : [
{
"title" : "Name of the wind",
"pages" : 400,
"_id" : ObjectId("592aa441e0f8de09b0912fea")
},
{
"title" : "Wise Man's Fear",
"pages" : 500,
"_id" : ObjectId("592aa441e0f8de09b0912feb")
}
]
},
{
"_id" : ObjectId("592aa441e0f8de09b0912fe9"),
"name" : "Rober Jordan",
"books" : [
{
"title" : "The Eye of the World",
"pages" : 400,
"_id" : ObjectId("592aa441e0f8de09b0912fea")
},
{
"title" : "The Great Hunt",
"pages" : 500,
"_id" : ObjectId("592aa441e0f8de09b0912feb")
}
]
}
And I would like to query for the list of all books in the entire collection of Authors, something like:
"books" : [
{
"title" : "The Eye of the World",
"pages" : 400,
"_id" : ObjectId("592aa441e0f8de09b0912fea")
},
{
"title" : "The Great Hunt",
"pages" : 500,
"_id" : ObjectId("592aa441e0f8de09b0912feb")
},
{
"title" : "Name of the wind",
"pages" : 400,
"_id" : ObjectId("592aa441e0f8de09b0912fea")
},
{
"title" : "Wise Man's Fear",
"pages" : 500,
"_id" : ObjectId("592aa441e0f8de09b0912fea")
}]
You can do this using .aggregate(), predominantly with the $unwind pipeline operator.
In MongoDB 3.4 and above you can use it in tandem with $replaceRoot:
Model.aggregate([
{ "$unwind": "$books" },
{ "$replaceRoot": { "newRoot": "$books" } }
],function(err,results) {
})
In earlier versions you specify all fields with $project:
Model.aggregate([
{ "$unwind": "$books" },
{ "$project": {
"_id": "$books._id",
"pages": "$books.pages",
"title": "$books.title"
}}
],function(err,results) {
})
So $unwind is what you use to deconstruct or "denormalise" the array entries for processing. Effectively this creates a copy of the whole document for each member of the array.
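The "copy of the whole document for each member of the array" behaviour can be sketched in plain JavaScript. This is only an in-memory approximation of what the two stages do, not how the server implements them; the trimmed sample data and the function name `unwindBooks` are assumptions for illustration.

```javascript
// Hypothetical authors, trimmed from the question's sample documents.
const authors = [
  {
    name: "Patrick Rothfuss",
    books: [
      { title: "Name of the wind", pages: 400 },
      { title: "Wise Man's Fear", pages: 500 },
    ],
  },
  { name: "Robert Jordan", books: [{ title: "The Eye of the World", pages: 400 }] },
];

// Rough equivalent of { $unwind: "$books" }: one copy of the parent per array
// element, with the array field replaced by that single element. Documents with
// a missing or empty array produce no output, matching the default behaviour.
function unwindBooks(docs) {
  return docs.flatMap((doc) =>
    (doc.books || []).map((book) => ({ ...doc, books: book }))
  );
}

const unwound = unwindBooks(authors);

// Rough equivalent of { $replaceRoot: { newRoot: "$books" } }:
// keep only the embedded document and discard the parent fields.
const books = unwound.map((doc) => doc.books);

console.log(unwound.length); // 3
console.log(books.map((b) => b.title));
```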
The rest of the task is about returning "only" those fields present in the array.
It's not a very wise thing to do, though. If your intent is to return only content embedded within an array of a document, then you would be better off putting that content into a separate collection instead.
That is far better for performance than pulling apart all the documents in a collection with the aggregation framework just to list the documents held in the array.
Based on the description above, try running the following query in the MongoDB shell.
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: "$books"
},
// Stage 2
{
$group: {
_id:null,
books:{$addToSet:'$books'}
}
},
// Stage 3
{
$project: {
books:1,
_id:0
}
},
]
);

combining distinct on projection in mongodb

Is there a query I can use on the following collection to get the result at the bottom?
Example:
{
"_id" : ObjectId(xyz),
"name" : "Carl",
"something":"else"
},
{
"_id" : ObjectId(aaa),
"name" : "Lenny",
"something":"else"
},
{
"_id" : ObjectId(bbb),
"name" : "Carl",
"something":"other"
}
I need a query to get this result:
{
"_id" : ObjectId(xyz),
"name" : "Carl"
},
{
"_id" : ObjectId(aaa),
"name" : "Lenny"
}
A set of documents with no identical names. It's not important which _ids are kept.
You can use aggregation framework to get this shape, the query could look like this:
db.collection.aggregate(
[
{
$group:
{
_id: "$name",
id: { $first: "$_id" }
}
},
{
$project:{
_id:"$id",
name:"$_id"
}
}
]
)
As long as you don't need other fields this will be sufficient.
If you need other fields, please update the question with the document structure and expected result.
As you don't care about the ids, it can be simplified (the name is then returned in the _id field):
db.collection.aggregate([{$group:{_id: "$name"}}])
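For comparison, here is a plain-JavaScript sketch of the first pipeline's logic (keep the first _id seen for each name) applied to an in-memory array. The string ids and the function name `firstPerName` are placeholders for illustration. Note that without a preceding $sort, which document $first picks on the server depends on the order in which documents reach the $group stage.

```javascript
// Hypothetical in-memory version of the sample collection, with string ids.
const people = [
  { _id: "xyz", name: "Carl", something: "else" },
  { _id: "aaa", name: "Lenny", something: "else" },
  { _id: "bbb", name: "Carl", something: "other" },
];

// Mimics { $group: { _id: "$name", id: { $first: "$_id" } } } followed by the
// $project that renames the fields back to _id and name.
function firstPerName(docs) {
  const seen = new Map();
  for (const { _id, name } of docs) {
    if (!seen.has(name)) {
      seen.set(name, { _id, name });
    }
  }
  return [...seen.values()];
}

console.log(firstPerName(people));
// prints [ { _id: 'xyz', name: 'Carl' }, { _id: 'aaa', name: 'Lenny' } ]
```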

How to do this query in mongo: get newest messages for a list of users

I have a collection of messages with fields user_id, created_time, and content. Given a list of user_id values, I would like to get back a list of messages containing, for each user_id, the newest message for that user. I thought about using a distinct command together with sort in mongo, but that doesn't seem to be supported. Is there a way to do this in mongo using a single query?
MongoDB has the aggregation framework, which you can use for tasks that require some manipulation of the data in your collection.
Consider the following dataset:
> db.messages.find().pretty()
{
"_id" : ObjectId("52ecb77486d35a12f3552aa1"),
"user_id" : "fred",
"create_date" : ISODate("1392-09-21T00:00:00Z")
}
{
"_id" : ObjectId("52ecb79286d35a12f3552aa2"),
"user_id" : "fred",
"create_date" : ISODate("1392-06-01T00:00:00Z")
}
{
"_id" : ObjectId("52ecb7a386d35a12f3552aa3"),
"user_id" : "marty",
"create_date" : ISODate("1393-04-06T00:00:00Z")
}
{
"_id" : ObjectId("52ecb7af86d35a12f3552aa4"),
"user_id" : "marty",
"create_date" : ISODate("1386-02-12T00:00:00Z")
}
So when passing this to aggregate, we want to group on user_id and get the most recent (maximum) create_date:
> db.messages.aggregate([
{ $group: { _id: { user_id: "$user_id" }, create_date: { $max: "$create_date" }} }
])
{
"result" : [
{
"_id" : {
"user_id" : "marty"
},
"create_date" : ISODate("1393-04-06T00:00:00Z")
},
{
"_id" : {
"user_id" : "fred"
},
"create_date" : ISODate("1392-09-21T00:00:00Z")
}
],
"ok" : 1
}
That's not bad but you can clean it up with $project
> db.messages.aggregate([
{ $group: { _id: { user_id: "$user_id" }, create_date: { $max: "$create_date" }} },
{ $project: { _id: 0, user_id: "$_id.user_id", create_date: 1} }
])
{
"result" : [
{
"create_date" : ISODate("1393-04-06T00:00:00Z"),
"user_id" : "marty"
},
{
"create_date" : ISODate("1392-09-21T00:00:00Z"),
"user_id" : "fred"
}
],
"ok" : 1
}
So that actually looks like a clean record to use. With recent drivers, the value returned from aggregate is a cursor you can iterate over, so the results are handled the same way as those from find.
Additional documentation on operators to use can be found here.
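The pipelines above return only the latest date per user and do not restrict the result to a given list of users. One possible way to cover both points, offered as a sketch rather than as the answer's own method, is to filter with $match and $in, sort newest first, and keep the first whole document per user via $first and the $$ROOT system variable (available from MongoDB 2.6). The function name `newestMessagesPipeline` is hypothetical, and the field names follow the sample dataset above (create_date) rather than the question's created_time.

```javascript
// Builds a pipeline that returns the newest full message for each requested user.
// Assumes documents shaped like the sample dataset: { user_id, create_date, ... }.
function newestMessagesPipeline(userIds) {
  return [
    // Restrict to the requested users.
    { $match: { user_id: { $in: userIds } } },
    // Newest messages first, so $first below picks the latest one per user.
    { $sort: { create_date: -1 } },
    // Keep the whole first document seen for each user.
    { $group: { _id: "$user_id", message: { $first: "$$ROOT" } } },
  ];
}

const pipeline = newestMessagesPipeline(["fred", "marty"]);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell this would be used as db.messages.aggregate(newestMessagesPipeline(["fred", "marty"])).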

Obtaining $group result with group count

Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query for posts by tags, group the results by tag and display the results paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output of the previous stage (the first group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some languages and have its contents counted, but the pipeline output may exceed the 16MB limit. Also, performing the same query again just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp.
Use $push or $addToSet to store tmp in your data list.
Code:
db.test.aggregate([
  { $unwind: '$tags' },
  { $group: { _id: '$tags', count: { $sum: 1 } } },
  { $project: { tmp: { tag: '$_id', count: '$count' } } },
  { $group: { _id: null, total: { $sum: 1 }, data: { $addToSet: '$tmp' } } }
])
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
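Because the final $group collects every tag into a single document, that document is still subject to the 16MB limit the question mentions. On MongoDB 3.4 and later, which postdates this answer, one alternative is the $facet stage, which can return the total and a single page of results together. The sketch below only builds such a pipeline; the function name `tagCountsPage` and the output field names total, value and data are choices made for this example.

```javascript
// Builds a pipeline returning the number of distinct tags plus one page of
// per-tag counts. Assumes MongoDB 3.4+ for the $facet and $count stages.
function tagCountsPage(skip, limit) {
  return [
    { $unwind: "$tags" },
    { $group: { _id: "$tags", count: { $sum: 1 } } },
    {
      $facet: {
        // Number of distinct tags, returned as [ { value: <n> } ].
        total: [{ $count: "value" }],
        // One page of results; sorting on _id as a tie-breaker keeps pages stable.
        data: [{ $sort: { count: -1, _id: 1 } }, { $skip: skip }, { $limit: limit }],
      },
    },
  ];
}

const page3 = tagCountsPage(20, 10);
console.log(JSON.stringify(page3, null, 2));
```

In the shell this would be used as db.posts.aggregate(tagCountsPage(20, 10)), which yields one document with a total array and a data array holding only the requested page.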
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate([
  { "$unwind" : "$tags" },
  { "$group" : {
    _id: { tag: "$tags" },
    count: { $sum: 1 }
  } }
]);
To paginate through the posts for a given tag, you can just use the normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)