mongodb find matches based on count aggregation - mongodb

I have a mongodb collection like this:
{"uid": "01370mask4",
"title": "hidden",
"post": "hidden",
"postTime": "01-23, 2016",
"unixPostTime": "1453538601",
"upvote": [2, 3]}
and I'd like to select post records from the users with more than 5 posts. The structure should stay the same; I just don't need the documents from users who don't have many posts.
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } }
]
)
Now I'm stuck at how to use the count values to find. I searched but didn't find any methods to add the count values back to the same collection by uid. Saving the aggregation output and joining them together seems not supported by mongodb. Please advise, thanks!
Update:
Sorry that I didn't make it clear earlier. Thanks for your prompt answers! I want a subset of the original collection, with post text, post timestamp, etc. I don't want a subset of the aggregation output.

If there aren't millions of documents, you can take a shortcut: run one aggregate query to collect the qualifying users, then a find query to fetch their posts.
Aggregate query:
var users = db.collection.aggregate(
[
{$group:{_id:'$uid', count:{$sum:1}}},
{$match:{count:{$gt:5}}},
{$group:{_id:null,users:{$push:'$_id'}}}
]
).toArray()[0]['users']
Then it's a straightforward query to find the posts of those users:
db.collection.find({uid: {$in: users}})
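To see what the two steps compute, here is a minimal plain-JavaScript sketch that mimics them on an in-memory array. It does not talk to MongoDB; the sample documents and the helper name `postsFromFrequentUsers` are made up for illustration.

```javascript
// Hypothetical sample data: user "a" has 6 posts, user "b" has 2.
const posts = [
  ...Array.from({ length: 6 }, (_, i) => ({ uid: "a", title: `a${i}` })),
  ...Array.from({ length: 2 }, (_, i) => ({ uid: "b", title: `b${i}` })),
];

// Hypothetical helper that mirrors the two queries above.
function postsFromFrequentUsers(docs, minExclusive) {
  // Step 1: like $group + $sum, count the posts per uid.
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.uid, (counts.get(doc.uid) || 0) + 1);
  }
  // Like $match {count: {$gt: n}} followed by $push: collect qualifying uids.
  const users = [...counts]
    .filter(([, n]) => n > minExclusive)
    .map(([uid]) => uid);
  // Step 2: like find({uid: {$in: users}}), return the untouched originals.
  return docs.filter((doc) => users.includes(doc.uid));
}

const result = postsFromFrequentUsers(posts, 5);
console.log(result.length);
```

One thing to watch in the shell version: if no user qualifies, the final $group emits nothing, so `toArray()[0]` is undefined and the `['users']` lookup throws. A guard for the empty case is worth adding.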

Just add a $match stage with the right condition after your $group and it works:
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } },
{ $match : { count : { $gt : 5 } } }
]
)

Try this to select users with more than 5 posts. If each uid is unique, you can keep the original fields by adding them with $first, as below.
db.collection.aggregate([
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
])
If there are multiple values for the same uid, you should $push them into an array in the $group stage instead.
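A minimal sketch of that $push variant, assuming MongoDB 3.4 or later (for $replaceRoot): each group collects its original documents via $$ROOT, and after filtering on the count the documents are unwound back into their original shape. The field name `docs` and the variable name `pipeline` are illustrative choices, not part of the answer above.

```javascript
// Pipeline definition only; in the shell it would be run as
// db.collection.aggregate(pipeline).
const pipeline = [
  // Collect every original document of a user into "docs" and count them.
  { $group: { _id: "$uid", docs: { $push: "$$ROOT" }, count: { $sum: 1 } } },
  // Keep only users with more than 5 posts.
  { $match: { count: { $gt: 5 } } },
  // Emit one document per collected post again.
  { $unwind: "$docs" },
  // Promote the stored post to the top level, restoring the original shape.
  { $replaceRoot: { newRoot: "$docs" } },
];
```

Because each group holds all of a user's posts in a single document, this is bounded by the 16 MB document size limit, so it fits best when no single user has an enormous number of posts.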
If you want to save the result to the database, try it as below:
var cur = db.collection.aggregate(
[
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
]
)
cur.forEach(function(doc) {
db.collection.update({_id: doc._id}, {/*the field should be updated */});
});

Related

Looking for proper way to prioritize certain documents in Mongodb query

I was looking all over the place and I couldn't find a proper source for the problem I need to solve.
Given some record data, I need to prioritize some documents over others when I query all of them.
For example, let's say I'm doing this search:
db.users.find().limit(10)
and my documents have id = 1, 2, 3, ..., 50.
How can I get id=12 and id=49 returned first?
what I would want to get back:
array({id=12}, {id=49} ... fill the rest until pager limit)
I tried using $or like this:
{
"$or": [
{'_id': {'$in': [id=12,id=49]}},
{}
]
}
But I don't think this is the proper way of doing this and it's not working
Any help would be greatly appreciated
You can use the aggregate() method:
$addFields adds a temporary hasId field for sorting: if the document's _id is in your list of ids it is set to 1, otherwise the field is removed ($$REMOVE). Replace [8, 5] below with your own ids.
$sort by hasId in descending order, so the flagged documents come first
$limit the number of documents
db.collection.aggregate([
{
$addFields: {
hasId: {
$cond: [
{ $in: ["$_id", [8, 5]] },
1,
"$$REMOVE"
]
}
}
},
{ $sort: { hasId: -1 } },
{ $limit: 5 }
])
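The effect of those three stages can be sketched in plain JavaScript on an in-memory array. This is only an illustration: the sample data and the helper name `prioritize` are made up, and a flag of 0 stands in for the removed field.

```javascript
// Hypothetical sample data: documents with _id 1..50.
const users = Array.from({ length: 50 }, (_, i) => ({ _id: i + 1 }));

// Hypothetical helper mirroring $addFields + $sort + $limit.
function prioritize(docs, priorityIds, limit) {
  const wanted = new Set(priorityIds);
  return docs
    // Like $addFields with $cond: flag the documents whose _id is in the list.
    .map((doc) => ({ doc, hasId: wanted.has(doc._id) ? 1 : 0 }))
    // Like $sort {hasId: -1}: flagged documents first.
    .sort((a, b) => b.hasId - a.hasId)
    // Like $limit.
    .slice(0, limit)
    // Drop the temporary flag again.
    .map(({ doc }) => doc);
}

const page = prioritize(users, [12, 49], 10);
console.log(page.map((d) => d._id));
```

In the real pipeline the order among documents with the same hasId is not guaranteed, so you may want a second sort key, for example `{ hasId: -1, _id: 1 }`.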

Get the number of documents liked per document in MongoDB

I'm working on a project by using MongoDB as a database and I'm encountering a problem: I can't find the right query to make a simple count of the likes of a document. The collection that I use is this :
{ "username" : "example1",
"like" : [ { "document_id" : "doc1" },
{ "document_id" : "doc2" },
...]
}
So what I need to compute is the number of likes of each document, so that at the end I will have
{ "document_id" : "docA" , nbLikes : 30 }, {"document_id" : "docB", nbLikes : 1}
Can anyone help me on this because I failed.
You can do this by unwinding the like array of each doc and then grouping by document_id to get a count for each value:
db.test.aggregate([
// Duplicate each doc, once per 'like' array element
{$unwind: '$like'},
// Group them by document_id and assemble a count
{$group: {_id: '$like.document_id', nbLikes: {$sum: 1}}},
// Reshape the docs to match the desired output
{$project: {_id: 0, document_id: '$_id', nbLikes: 1}}
])
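If it helps to see what each stage does, here is the same computation as a plain-JavaScript sketch over an in-memory array. The two sample users are invented for illustration; nothing here queries MongoDB.

```javascript
// Hypothetical sample data in the shape shown in the question.
const users = [
  { username: "example1", like: [{ document_id: "doc1" }, { document_id: "doc2" }] },
  { username: "example2", like: [{ document_id: "doc1" }] },
];

// Like $unwind: one entry per element of every "like" array.
const unwound = users.flatMap((user) => user.like);

// Like $group with $sum: 1, keyed by document_id.
const counts = new Map();
for (const { document_id } of unwound) {
  counts.set(document_id, (counts.get(document_id) || 0) + 1);
}

// Like $project: reshape into { document_id, nbLikes }.
const likeCounts = [...counts].map(([document_id, n]) => ({ document_id, nbLikes: n }));
console.log(likeCounts);
```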
Alternatively, add a "likeCount" field, increment it with every $push, and simply read "likeCount" when you need the count:
db.test.update(
{ _id: "..." },
{
$inc: { likeCount: 1 },
$push: { like: { "document_id" : "doc1" } }
}
)

MongoDB: count the number of items in an array

I have a collection where every document in the collection has an array named foo that contains a set of embedded documents. Is there currently a trivial way in the MongoDB shell to count how many instances are within foo? something like:
db.mycollection.foos.count() or db.mycollection.foos.size()?
Each document in the array needs to have a unique foo_id and I want to do a quick count to make sure that the right amount of elements are inside of an array for a random document in the collection.
In MongoDB 2.6, the Aggregation Framework has a new array $size operator you can use:
> db.mycollection.insert({'foo':[1,2,3,4]})
> db.mycollection.insert({'foo':[5,6,7]})
> db.mycollection.aggregate([{$project: { count: { $size:"$foo" }}}])
{ "_id" : ObjectId("5314b5c360477752b449eedf"), "count" : 4 }
{ "_id" : ObjectId("5314b5c860477752b449eee0"), "count" : 3 }
If you are on a recent version of MongoDB (2.2 or later), you can use the aggregation framework:
db.mycollection.aggregate([
{$unwind: '$foo'},
{$group: {_id: '$_id', 'sum': { $sum: 1}}},
{$group: {_id: null, total_sum: {'$sum': '$sum'}}}
])
which will give you the total foos of your collection.
Omitting the last group will aggregate results per record.
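The difference between the per-record and the total result can be sketched in plain JavaScript. The first two arrays are the ones inserted in the previous answer; the empty one is added here as an assumption, to show how $unwind treats it.

```javascript
// Sample data: two documents from the earlier inserts plus a hypothetical
// document with an empty array.
const docs = [
  { _id: 1, foo: [1, 2, 3, 4] },
  { _id: 2, foo: [5, 6, 7] },
  { _id: 3, foo: [] },
];

// $unwind + first $group: one count per document. By default $unwind drops
// documents whose array is empty or missing, so they are filtered out here.
const perRecord = docs
  .filter((doc) => Array.isArray(doc.foo) && doc.foo.length > 0)
  .map((doc) => ({ _id: doc._id, sum: doc.foo.length }));

// Second $group with _id: null: a single grand total.
const totalSum = perRecord.reduce((acc, row) => acc + row.sum, 0);

console.log(perRecord, totalSum);
```

So with the $unwind approach, a document with an empty array does not show up with a count of 0 in the per-record output; it simply disappears.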
Using Projections and Groups
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo_count:{$size:"$foo"},
}
},
{
$group: {
_id: null,
foo_total:{$sum:"$foo_count"},
}
}
]
)
Multiple child array counts can also be calculated this way
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo1_count:{$size:"$foo1"},
foo2_count:{$size:"$foo2"},
}
},
{
$group: {
_id: null,
foo1_total:{$sum:"$foo1_count"},
foo2_total:{$sum:"$foo2_count"},
}
}
]
)

How to aggregate queries in mongodb

I have a document collection that look like the following:
{
name : "tester"
, activity: [
{
gear: "glasses"
, where: "outside"
}
, {
gear: "hat"
, where: "inside"
}
, {
gear: "glasses"
, where: "car"
}
]
}
How do I query the collection to return only documents with multiple activities that contain the value of "gear":"glasses"?
Thanks!
I think it's possible to do without aggregation framework, if you need full document filtered by your condition:
db.collection.find({
"activity": {$elemMatch: {gear:"glasses"}},
"activity.1" : {$exists: 1}
})
This is going to be ugly with aggregation framework, but it can be done:
db.collection.aggregate(
{$match: {"activity.gear": "glasses"}},
{$unwind: "$activity"},
{$group: {
_id: {_id: "$_id", name: "$name"},
_count: {$sum: {$cond: [{$eq: ["glasses", "$activity.gear"]}, 1, 0]}}
}},
{$match: {_count: {$gt: 1}}}
)
When analyzing the above query, I recommend walking through it step by step. Start with just the "$match", then the "$match" and "$unwind", and so on. You will see how each step works.
The response is not the full document. If you are looking for the full document, include a $project step that passes through a dummy activity, and reconstruct the full document on the output.
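As a rough illustration of the counting logic, here is a plain-JavaScript sketch that keeps the whole document. The second sample document and the helper name `countGear` are made up; this runs on an in-memory array, not against MongoDB.

```javascript
// Sample data: the first document is the one from the question,
// the second is a hypothetical one with only a single "glasses" entry.
const people = [
  { name: "tester", activity: [
    { gear: "glasses", where: "outside" },
    { gear: "hat", where: "inside" },
    { gear: "glasses", where: "car" },
  ] },
  { name: "other", activity: [
    { gear: "glasses", where: "office" },
    { gear: "hat", where: "park" },
  ] },
];

// Like $sum over $cond: count the activities whose gear equals the target.
function countGear(doc, gear) {
  return doc.activity.filter((a) => a.gear === gear).length;
}

// Like the final $match {_count: {$gt: 1}}, but keeping the whole document.
const matches = people.filter((doc) => countGear(doc, "glasses") > 1);
console.log(matches.map((doc) => doc.name));
```

Note the difference from the find() version with "activity.1": that query would also return "other", because it only requires at least two activities and at least one of them with glasses, while the aggregation requires glasses to appear more than once.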
You can also try this:
db.collection.find( { activity: { $elemMatch: { gear: "glasses" } } } )

Querying internal array size in MongoDB

Consider a MongoDB document in users collection:
{ username : 'Alex', tags: ['C#', 'Java', 'C++'] }
Is there any way, to get the length of the tags array from the server side (without passing the tags to the client) ?
Thank you!
If the username Alex is unique, you can use the following code:
db.test.insert({username:"Alex", tags: ['C#', 'Java', 'C++'] });
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$project: {count:{$add:1}}},
{$group: {_id: null, number: {$sum: "$count" }}}
);
{ "result" : [ { "_id" : null, "number" : 3 } ], "ok" : 1 }
Now MongoDB (2.6 release) supports $size operation in aggregation.
From the documentation:
{ <field>: { $size: <array> } }
What you want can be accomplished with either of the following:
db.users.aggregate(
[
{
$group: {
_id: "$username",
tags_count: {$first: {$size: "$tags" }}
}
}
]
)
or
db.users.aggregate(
[
{
$project: {
tags_count: {$size: "$tags"}
}
}
]
)
I think it might be more efficient to calculate the number of tags on each save (as a separate field) using $inc perhaps or via a job on a schedule.
You could also do this with map/reduce (the canonical example) but that doesn't seem to be what you'd want.
I'm not sure it's possible to do exactly what you are asking, but you can query all the documents that match a certain size with $size ...
> db.collection.find({ tags : { $size: 3 }});
That'd get you all the documents with 3 tags ...
xmm.dev's answer can be simplified: instead of using the intermediate field 'count', you can sum directly in $group:
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$group: {_id: null, number: {$sum: 1 }}}
)
Currently, the only way to do it seems to be using db.eval, but this locks the database for other operations.
The most speed-efficient way would be to add an extra field that stores the length of the array and maintain it with $inc and $push operations.
I used a small workaround, as I needed to query the array size and return the document if the size was greater than 0, where it could be anything from 1 to 3.
Here was my solution:
// "field" stands for the name of your array field
db.test.find({$or : [{field : { $exists : true, $size : 1}},
{field : { $exists : true, $size : 2}},
{field : { $exists : true, $size : 3}}]})
This basically returns a document when the attribute exists and its size is 1, 2, or 3. You can add more clauses if you are looking for a specific size or a different range. I know it's not perfect, but it worked and was relatively quick. I only had sizes 1-3 in my attribute, so this solution worked.
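A possible alternative to chaining $size clauses, offered as a suggestion rather than as part of the answer above, is to test array positions with dot notation: an element at index min - 1 must exist and an element at index max must not. The helper name `sizeBetweenFilter` and the field name "tags" are hypothetical, and the sketch assumes min is at least 1.

```javascript
// Hypothetical helper: builds a find() filter for arrays whose length is
// between min and max (inclusive), assuming min >= 1.
function sizeBetweenFilter(field, min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min < 1 || max < min) {
    throw new RangeError("expected integers with 1 <= min <= max");
  }
  return {
    // An element at index min - 1 exists, so there are at least min elements.
    [`${field}.${min - 1}`]: { $exists: true },
    // No element at index max exists, so there are at most max elements.
    [`${field}.${max}`]: { $exists: false },
  };
}

const filter = sizeBetweenFilter("tags", 1, 3);
console.log(JSON.stringify(filter));
```

The resulting object could then be passed straight to the shell, for example `db.test.find(filter)`, and widening the range only means changing the two numbers.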