Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" :
"John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query for posts by tag, group the results by tag, and display them paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output of the previous pipeline stage (the first $group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some language and its contents counted, but the pipeline output may exceed the 16 MB limit. Also, performing the same query again just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp.
Use $push or $addToSet to store tmp into your data list.
Code:
db.test.aggregate(
{$unwind: '$tags'},
{$group:{_id: '$tags', count:{$sum:1}}},
{$project:{tmp:{tag:'$_id', count:'$count'}}},
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
)
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
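On MongoDB 3.4 or later (an assumption; the output above is in the older single-document format), the $facet stage is another way to get one page of groups and the total number of groups in a single pass. A minimal sketch, where buildTagPagePipeline is a hypothetical helper name and the sort on _id is only there to keep page boundaries stable:

```javascript
// Hypothetical helper: builds a pipeline that returns one page of tag counts
// together with the total number of distinct tags.
// Assumes MongoDB 3.4+, where the $facet and $count stages are available.
function buildTagPagePipeline(page, pageSize) {
  return [
    { $unwind: "$tags" },
    { $group: { _id: "$tags", count: { $sum: 1 } } },
    {
      $facet: {
        // Number of groups, i.e. distinct tags (empty array if there are none).
        total: [{ $count: "total" }],
        // One page of groups; page is 1-based.
        data: [
          { $sort: { _id: 1 } },
          { $skip: (page - 1) * pageSize },
          { $limit: pageSize }
        ]
      }
    }
  ];
}

// Usage in the shell (not run here):
// db.posts.aggregate(buildTagPagePipeline(1, 10))
```

Because only one page of groups ends up in the output document, the result should stay well below the 16 MB document limit even when there are many tags.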
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate(
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
);
To paginate through the posts for a given tag, you can just use the normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)
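The skip and limit values above correspond to the second page of ten results. A small sketch of that arithmetic, assuming 1-based page numbers (pageWindow is a hypothetical helper, not part of the shell):

```javascript
// Hypothetical helper: converts a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Usage in the shell (not run here):
// const w = pageWindow(2, 10);
// db.posts.find({ tags: "RANDOM" }).skip(w.skip).limit(w.limit)
```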
I have the collection below and need to find duplicate records in MongoDB. How can we do that? Below is one sample document; the collection has more than 10,000 records.
/* 1 */
{
"_id" : 1814099,
"eventId" : "LAS012",
"eventName" : "CustomerTab",
"timeStamp" : ISODate("2018-12-31T20:09:09.820Z"),
"eventMethod" : "click",
"resourceName" : "CustomerTab",
"targetType" : "",
"resourseUrl" : "",
"operationName" : "",
"functionStatus" : "",
"results" : "",
"pageId" : "CustomerPage",
"ban" : "290824901",
"jobId" : "87377713",
"wrid" : "87377713",
"jobType" : "IBJ7FXXS",
"Uid" : "sc343x",
"techRegion" : "W",
"mgmtReportingFunction" : "N",
"recordPublishIndicator" : "Y",
"__v" : 0
}
We can first find the unique ids using
const data = await db.collection.aggregate([
{
$group: {
_id: "$eventId",
id: {
"$first": "$_id"
}
}
},
{
$group: {
_id: null,
uniqueIds: {
$push: "$id"
}
}
}
]).toArray();
And then we can make another query, which will find all the duplicate documents:
db.collection.find({_id: {$nin: data[0].uniqueIds}})
This will find all the documents that are redundant.
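To make the two steps concrete, here is a plain JavaScript sketch of the same logic on an in-memory array (no database involved; redundantDocs is a hypothetical name). Note that without a preceding $sort, which document counts as "first" in each group depends on the order in which documents reach the $group stage.

```javascript
// Hypothetical helper: returns the documents that are NOT the first one seen
// for their eventId, i.e. the redundant copies.
function redundantDocs(docs) {
  const firstIdByEvent = new Map();
  for (const doc of docs) {
    // Mirrors { $first: "$_id" } within each eventId group.
    if (!firstIdByEvent.has(doc.eventId)) {
      firstIdByEvent.set(doc.eventId, doc._id);
    }
  }
  const uniqueIds = new Set(firstIdByEvent.values());
  // Mirrors find({ _id: { $nin: uniqueIds } }).
  return docs.filter((doc) => !uniqueIds.has(doc._id));
}

// Made-up sample data, shaped like the collection above.
const sampleEvents = [
  { _id: 1, eventId: "LAS012" },
  { _id: 2, eventId: "LAS013" },
  { _id: 3, eventId: "LAS012" }
];
console.log(redundantDocs(sampleEvents)); // the document with _id 3
```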
Another way
To find the event ids which are duplicated
db.collection.aggregate(
{"$group" : { "_id": "$eventId", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }
)
To get the duplicates, you need only the groups that have a count of more than one, so we can use the $match operator to filter the results. Within the $match stage, we tell it to look at the count field and keep counts greater than one, using the $gt operator ("greater than") and the number 1. This looks like the following:
db.collection.aggregate([
{$group: {
_id: {eventId: "$eventId"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
]);
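For illustration, the $group plus $match combination above can be mimicked in plain JavaScript on an in-memory array. This is only a sketch of what the pipeline computes, with findDuplicateGroups as a hypothetical name; it is not how the server implements it.

```javascript
// Hypothetical helper: groups documents by eventId, collects their _ids and
// a count per group, then keeps only groups with more than one document.
function findDuplicateGroups(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.eventId)) {
      groups.set(doc.eventId, {
        _id: { eventId: doc.eventId },
        uniqueIds: [],
        count: 0
      });
    }
    const group = groups.get(doc.eventId);
    group.uniqueIds.push(doc._id); // _id values are unique, so this equals $addToSet
    group.count += 1;              // mirrors { $sum: 1 }
  }
  // Mirrors { $match: { count: { $gt: 1 } } }.
  return [...groups.values()].filter((group) => group.count > 1);
}
```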
I assume that eventId is a unique id.
I want to get the position of a particular user in a list after a $sort aggregation stage.
Let's say we have a leaderboard, and I need to get my rank in the leaderboard with a single query that returns only my data.
I have tried $addFields and some queries with $map.
Let's say we have these documents
/* 1 createdAt:8/18/2019, 4:42:41 PM*/
{
"_id" : ObjectId("5d5963e1c6c93b2da849f067"),
"name" : "x4",
"points" : 69
},
/* 2 createdAt:8/18/2019, 4:42:41 PM*/
{
"_id" : ObjectId("5d5963e1c6c93b2da849f07b"),
"name" : "x24",
"points" : 968
},
/* 3 createdAt:8/18/2019, 4:42:41 PM*/
{
"_id" : ObjectId("5d5963e1c6c93b2da849f06a"),
"name" : "x7",
"points" : 997
},
And I want to write a query like this
db.table.aggregate(
[
{ $sort : { points : 1 } },
{ $addFields: { order : "$index" } },
{ $match : { name : "x24" } }
]
)
I need to inject the order field with something like $index
I expect to have something like this in return
{
"_id" : ObjectId("5d5963e1c6c93b2da849f07b"),
"name" : "x24",
"points" : 968,
"order" : 2
}
I need something like the metadata of the result here, which returns 2:
/* 2 createdAt:8/18/2019, 4:42:41 PM*/
One workaround for this situation is to collect all your documents into one single array, resolve each document's index in that array with the help of $unwind (using includeArrayIndex), and finally project the fields as required.
db.collection.aggregate([
{ $sort: { points: 1 } },
{
$group: {
_id: 1,
register: { $push: { _id: "$_id", name: "$name", points: "$points" } }
}
},
{ $unwind: { path: "$register", includeArrayIndex: "order" } },
{ $match: { "register.name": "x4" } },
{
$project: {
_id: "$register._id",
name: "$register.name",
points: "$register.points",
order: 1
}
}
]);
To make it more efficient you can apply limit, match, and filter as per your requirement.
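One detail worth keeping in mind: includeArrayIndex is zero-based, so with the three sample documents the pipeline would report order 1 for "x24", and adding 1 gives the 2 the question expects. A plain JavaScript sketch of the same ranking on an in-memory array (rankOf is a hypothetical helper):

```javascript
// Hypothetical helper: zero-based position of the named player once the
// documents are sorted by points ascending, or -1 if the name is absent.
function rankOf(docs, name) {
  const sorted = [...docs].sort((a, b) => a.points - b.points);
  return sorted.findIndex((doc) => doc.name === name);
}

// The three sample documents from the question, without their _id fields.
const players = [
  { name: "x4", points: 69 },
  { name: "x24", points: 968 },
  { name: "x7", points: 997 }
];
console.log(rankOf(players, "x24"));     // prints 1 (zero-based)
console.log(rankOf(players, "x24") + 1); // prints 2 (one-based)
```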
I have a collection named users with the following structure to its documents
{
"_id" : <user_id>,
"NAME" : "ABC",
"TIME" : 53.0,
"OBJECTS" : 1
},
{
"_id" : <user_id>,
"NAME" : "ABCD",
"TIME" : 353.0,
"OBJECTS" : 70
}
Now, I want to sum the value of OBJECTS over the entire collection and return the value along with the objects.
Something like this
{
{
"_id" : <user_id>,
"NAME" : "ABC",
"TIME" : 53.0,
"OBJECTS" : 1
},
{
"_id" : <user_id>,
"NAME" : "ABCD",
"TIME" : 353.0,
"OBJECTS" : 70
},
"TOTAL_OBJECTS": 71
}
Or any way wherein I don't have to compute on the received object and can directly access from it. Now, I've tried looking this up but I found none where the hierarchy of the existing documents isn't destroyed.
You can use $group, specifying null as the grouping id. You'll gather all documents into one array (using the $$ROOT variable), and another field can hold the sum of OBJECTS, like below:
db.users.aggregate([
{
$group: {
_id: null,
documents: { $push: "$$ROOT" },
TOTAL_OBJECTS: { $sum: "$OBJECTS" }
}
}
])
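The shape of the result can be sketched in plain JavaScript as follows. This is an in-memory illustration only; withTotal is a hypothetical name, and it assumes every document has a numeric OBJECTS field.

```javascript
// Hypothetical helper: mirrors the single document produced by grouping on
// null, with all input documents in an array plus the sum of OBJECTS.
function withTotal(docs) {
  return {
    _id: null,
    documents: docs.slice(), // mirrors { $push: "$$ROOT" }
    TOTAL_OBJECTS: docs.reduce((sum, doc) => sum + doc.OBJECTS, 0)
  };
}

// The two sample documents from the question, with made-up _id values.
const sampleUsers = [
  { _id: "u1", NAME: "ABC", TIME: 53.0, OBJECTS: 1 },
  { _id: "u2", NAME: "ABCD", TIME: 353.0, OBJECTS: 70 }
];
console.log(withTotal(sampleUsers).TOTAL_OBJECTS); // prints 71
```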
db.users.aggregate(
// Pipeline
[
// Stage 1
{
$group: {
_id: null,
TOTAL_OBJECTS: {
$sum: '$OBJECTS'
},
documents: {
$addToSet: '$$CURRENT'
}
}
},
]
);
In the above aggregate query I have pushed all documents into an array using the $addToSet operator as part of the $group stage of the aggregate operation.
Is there a query I can use on the following collection to get the result at the bottom?
Example:
{
"_id" : ObjectId(xyz),
"name" : "Carl",
"something":"else"
},
{
"_id" : ObjectId(aaa),
"name" : "Lenny",
"something":"else"
},
{
"_id" : ObjectId(bbb),
"name" : "Carl",
"something":"other"
}
I need a query to get this result:
{
"_id" : ObjectId(xyz),
"name" : "Carl"
},
{
"_id" : ObjectId(aaa),
"name" : "Lenny"
},
A set of documents with no identical names. It's not important which _ids are kept.
You can use the aggregation framework to get this shape; the query could look like this:
db.collection.aggregate(
[
{
$group:
{
_id: "$name",
id: { $first: "$_id" }
}
},
{
$project:{
_id:"$id",
name:"$_id"
}
}
]
)
As long as you don't need other fields, this will be sufficient.
If you need to add other fields, please update the document structure and expected result.
As you don't care about ids, it can be simplified:
db.collection.aggregate([{$group:{_id: "$name"}}])
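As an in-memory illustration of the first variant (keep one _id per name), the following plain JavaScript sketch does the same grouping. distinctByName is a hypothetical helper, and the string ids stand in for the ObjectId placeholders in the question.

```javascript
// Hypothetical helper: keeps the first document seen for each name and
// returns only its _id and name, mirroring $group with $first plus $project.
function distinctByName(docs) {
  const seen = new Map();
  for (const doc of docs) {
    if (!seen.has(doc.name)) {
      seen.set(doc.name, { _id: doc._id, name: doc.name });
    }
  }
  return [...seen.values()];
}

const people = [
  { _id: "xyz", name: "Carl", something: "else" },
  { _id: "aaa", name: "Lenny", something: "else" },
  { _id: "bbb", name: "Carl", something: "other" }
];
console.log(distinctByName(people));
```

Unlike this sketch, $group does not promise any particular output order, so add a $sort stage if the order matters.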
On Mongo 2.4.6
Collection of Users
{
"_id" : User1,
"orgRoles" : [
{"_id" : 1, "app" : "ANGRYBIRDS", "orgId" : "CODOE"},
{"_id" : 2, "app" : "ANGRYBIRDS", "orgId" : "MSDN"}
],
},
{
"_id" : User2,
"orgRoles" : [
{"_id" : 1, "app" : "ANGRYBIRDS", "orgId" : "CODOE"},
{"_id" : 2, "app" : "HUNGRYPIGS", "orgId" : "MSDN"}
],
},
{
"_id" : User3,
"orgRoles" : [
{"_id" : 1, "app" : "ANGRYBIRDS", "orgId" : "YAHOO"},
{"_id" : 2, "app" : "HUNGRYPIGS", "orgId" : "MSDN"}
],
}
With data that looks like above, I'm trying to write a query to get:
All the ids of the users that have only one ANGRYBIRDS app, where that ANGRYBIRDS app is in the CODOE organization.
So it would return User2, because they have one ANGRYBIRDS app and it is in the "CODOE" organization, but not User1, because they have two ANGRYBIRDS apps, or User3, because they don't have an ANGRYBIRDS app in the "CODOE" organization. I'm fairly new to mongo queries, so any help is appreciated.
To do something with more detailed conditions that are not immediately offered by the standard operators, your best approach is to use the aggregation framework. This allows you to do some processing to work out your conditions, such as the number of matches:
db.collection.aggregate([
// Filter the documents that are possible matches
{ "$match": {
"orgRoles": {
"$elemMatch": {
"app": "ANGRYBIRDS", "orgId": "CODOE"
}
}
}},
// De-normalize the array content
{ "$unwind": "$orgRoles" },
// Group and count the matches
{ "$group": {
"_id": "$_id",
"orgRoles": { "$push": "$orgRoles" },
"matched": {
"$sum": {
"$cond": [
{ "$eq": ["$orgRoles.app", "ANGRYBIRDS"] },
1,
0
]
}
}
}},
// Keep only documents where exactly one element matched
{ "$match": {
"orgRoles": {
"$elemMatch": {
"app": "ANGRYBIRDS", "orgId": "CODOE"
}
},
"matched": 1
}},
// Optionally project to just keep the original fields
{ "$project": { "orgRoles": 1 } }
])
The main work happens after the initial $match, which returns only those documents that have at least one array element matching the main condition, and after the array elements are processed with $unwind so they can be inspected individually.
The trick is the conditional $sum operation with the $cond operator, which is a "ternary". This counts, in the "matched" field, how many array elements have an app equal to the "ANGRYBIRDS" string. Following this, you $match again in order to filter out any documents that had a match count of more than one. The other condition is still left in there, but it is not really necessary.
Just for the record, this is also possible using the JavaScript evaluation of the $where clause, but because of that it is likely to be less efficient:
db.collection.find({
"orgRoles": {
"$elemMatch": {
"app": "ANGRYBIRDS", "orgId": "CODOE"
}
},
"$where": function() {
var orgs = this.orgRoles.filter(function(el) {
return el.app == "ANGRYBIRDS";
});
return ( orgs.length == 1 );
}
})
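The combined condition can be checked outside the database with a plain JavaScript predicate run against the three sample users from the question. This is a sketch for illustration; hasSingleAngryBirdsInCodoe is a hypothetical name, and the string ids stand in for the User1/User2/User3 placeholders.

```javascript
// Hypothetical predicate: true when the user has exactly one ANGRYBIRDS role
// and an ANGRYBIRDS role in the CODOE organization.
function hasSingleAngryBirdsInCodoe(user) {
  const roles = user.orgRoles || [];
  // Mirrors the $elemMatch condition: both fields on the same array element.
  const inCodoe = roles.some(
    (role) => role.app === "ANGRYBIRDS" && role.orgId === "CODOE"
  );
  // Mirrors the count done in $where (or the conditional $sum).
  const angryBirdsCount = roles.filter((role) => role.app === "ANGRYBIRDS").length;
  return inCodoe && angryBirdsCount === 1;
}

const sampleUsers = [
  { _id: "User1", orgRoles: [
    { _id: 1, app: "ANGRYBIRDS", orgId: "CODOE" },
    { _id: 2, app: "ANGRYBIRDS", orgId: "MSDN" } ] },
  { _id: "User2", orgRoles: [
    { _id: 1, app: "ANGRYBIRDS", orgId: "CODOE" },
    { _id: 2, app: "HUNGRYPIGS", orgId: "MSDN" } ] },
  { _id: "User3", orgRoles: [
    { _id: 1, app: "ANGRYBIRDS", orgId: "YAHOO" },
    { _id: 2, app: "HUNGRYPIGS", orgId: "MSDN" } ] }
];
const matchingIds = sampleUsers.filter(hasSingleAngryBirdsInCodoe).map((u) => u._id);
console.log(matchingIds); // prints [ 'User2' ]
```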
One way of doing it using the aggregation pipeline is:
db.users.aggregate([
// Match the documents with app being "ANGRYBIRDS" and orgId being "CODOE"
// Note that this step filters out most of the documents and is good to have
// at the start of the pipeline, moreover it can make use of indexes, if
// used at the beginning of the aggregation pipeline.
{
$match : {
"orgRoles.app" : "ANGRYBIRDS",
"orgRoles.orgId" : "CODOE"
}
},
// unwind the elements in the orgRoles array
{
$unwind : "$orgRoles"
},
// group by userid and app
{
$group : {
"_id" : {
"id" : "$_id",
"app" : "$orgRoles.app"
},
// take the id and app of the first document in each group, since all
// the other documents in the group will have the same values.
"id" : {
$first : "$_id"
},
"app" : {
$first : "$orgRoles.app"
},
// orgId can be different, so form an array for each group.
"orgId" : {
$push : {
"id" : "$orgRoles.orgId"
}
},
// count the number of documents in each group.
"count" : {
$sum : 1
}
}
},
// find the matching group
{
$match : {
"count" : 1,
"app" : "ANGRYBIRDS",
"orgId" : {
$elemMatch : {
"id" : "CODOE"
}
}
}
},
// project only the userid
{
$project : {
"id" : 1,
"_id" : 0
}
} ]);
Edit: Removed the mapping of the aggregation result, since the problem requires a solution for v2.4.6. According to the documentation:
Changed in version 2.6: The db.collection.aggregate() method returns a cursor and can return result sets of any size. Previous versions
returned all results in a single document, and the result set was
subject to a size limit of 16 megabytes.
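If the same script has to cope with both return shapes described in the quote, a small wrapper along these lines might help. It is only a sketch: resultDocs is a hypothetical helper, and it assumes a shell-style cursor whose toArray() returns the array synchronously (in the Node.js driver, toArray() returns a promise instead).

```javascript
// Hypothetical helper: returns the result documents as an array for either
// return shape of db.collection.aggregate().
function resultDocs(res) {
  // 2.6 and later: a cursor-like object exposing toArray().
  if (res && typeof res.toArray === "function") {
    return res.toArray();
  }
  // Before 2.6: a single document with the results under "result".
  if (res && Array.isArray(res.result)) {
    return res.result;
  }
  throw new TypeError("unrecognized aggregate() return value");
}

// Usage in the shell (not run here):
// const ids = resultDocs(db.users.aggregate([ /* stages */ ])).map((d) => d.id);
```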