Get the number of documents liked per document in MongoDB - mongodb

I'm working on a project that uses MongoDB as its database and I've run into a problem: I can't find the right query to count the likes of each document. The documents in my collection look like this:
{ "username" : "example1",
"like" : [ { "document_id" : "doc1" },
{ "document_id" : "doc2" },
...]
}
So what I need to compute is the number of likes of each document, so that at the end I will have
{ "document_id" : "docA" , nbLikes : 30 }, {"document_id" : "docB", nbLikes : 1}
Can anyone help me on this because I failed.

You can do this by unwinding the like array of each doc and then grouping by document_id to get a count for each value:
db.test.aggregate([
// Duplicate each doc, once per 'like' array element
{$unwind: '$like'},
// Group them by document_id and assemble a count
{$group: {_id: '$like.document_id', nbLikes: {$sum: 1}}},
// Reshape the docs to match the desired output
{$project: {_id: 0, document_id: '$_id', nbLikes: 1}}
])
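To see what each stage contributes, the pipeline above can be mimicked in plain JavaScript on an in-memory array. This is only an illustrative sketch: the sample users and the countLikes helper are made up for the example, and nothing here talks to a MongoDB server.

```javascript
// Hypothetical sample data in the shape shown in the question.
const users = [
  { username: "example1", like: [{ document_id: "doc1" }, { document_id: "doc2" }] },
  { username: "example2", like: [{ document_id: "doc1" }] },
  { username: "example3", like: [] }
];

// countLikes is an illustrative helper, not a MongoDB API.
function countLikes(docs) {
  // $unwind: one record per element of the 'like' array
  // (documents with an empty array produce no records).
  const unwound = docs.flatMap(doc => doc.like.map(like => ({ ...doc, like })));

  // $group: count the records per like.document_id.
  const counts = new Map();
  for (const rec of unwound) {
    const id = rec.like.document_id;
    counts.set(id, (counts.get(id) || 0) + 1);
  }

  // $project: reshape into { document_id, nbLikes }.
  return [...counts].map(([document_id, nbLikes]) => ({ document_id, nbLikes }));
}

const likeCounts = countLikes(users);
console.log(likeCounts);
```

With this sample data, doc1 is counted twice and doc2 once, while the user with an empty like array contributes nothing.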

Alternatively, add a "likeCount" field, increment it with every $push operation, and read the count from that field instead of computing it:
db.test.update(
{ _id: "..." },
{
$inc: { likeCount: 1 },
$push: { like: { "document_id" : "doc1" } }
}
)

Related

Aggregate on array of embedded documents

I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments is in this format:
{
"name": string,
"count": integer
}
Now I want to aggregate these subdocuments to find:
1. The top X counts and their names.
2. Same as 1, but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already - it does return me the top X but unordered, so I'd have to order them again which seems somewhat inefficient.
[{
$match: {
_id: id
}
}, {
$unwind: {
path: "$array"
}
}, {
$sort: {
'count': -1
}
}, {
$limit: x
}]
Since I'm rather new to MongoDB, this is pretty confusing for me. I'd be happy for any help. Thanks in advance.
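The sort key has to include the array name. After $unwind, each count lives under the array field (for example 'array.count'), so sorting on a bare 'count' refers to a field that does not exist at the top level and leaves the results unordered. Sorting on the full path avoids an additional sort later on.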
Given the following document to work with:
{
students: [{
count: 4,
name: "Ann"
}, {
count: 7,
name: "Brad"
}, {
count: 6,
name: "Beth"
}, {
count: 8,
name: "Catherine"
}]
}
As an example, the following aggregation query matches any name containing the letter "h" or "e" (the character class [he] matches either one). The name filter has to come after the $unwind stage so that it is applied to the individual array elements and only the ones you need are kept.
db.tests.aggregate([
{$match: {
_id: ObjectId("5c1b191b251d9663f4e3ce65")
}},
{$unwind: {
path: "$students"
}},
{$match: {
"students.name": /[he]/
}},
{$sort: {
"students.count": -1
}},
{$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
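The same unwind, filter, sort, limit sequence can be traced in plain JavaScript against the sample document. This is a sketch for illustration only: topStudents is a hypothetical helper, the _id is kept as a plain string, and no database is involved.

```javascript
// The sample document from above (with _id as a plain string for simplicity).
const doc = {
  _id: "5c1b191b251d9663f4e3ce65",
  students: [
    { count: 4, name: "Ann" },
    { count: 7, name: "Brad" },
    { count: 6, name: "Beth" },
    { count: 8, name: "Catherine" }
  ]
};

// topStudents is an illustrative helper mirroring the pipeline stages.
function topStudents(document, pattern, limit) {
  return document.students
    .map(student => ({ _id: document._id, students: student })) // $unwind
    .filter(rec => pattern.test(rec.students.name))             // $match on students.name
    .sort((a, b) => b.students.count - a.students.count)        // $sort, descending count
    .slice(0, limit);                                           // $limit
}

const top2 = topStudents(doc, /[he]/, 2);
console.log(top2.map(rec => rec.students.name));
```

Changing the order of the steps changes the result: limiting before filtering would pick the two highest counts first and only then apply the name pattern.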
=====================
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
university: "University X",
academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

mongodb find matches based on count aggregation

I have a MongoDB collection like this:
{"uid": "01370mask4",
"title": "hidden",
"post": "hidden",
"postTime": "01-23, 2016",
"unixPostTime": "1453538601",
"upvote": [2, 3]}
and I'd like to select the post records of users with more than 5 posts. The structure should stay the same; I just don't need the documents from users who don't have many posts.
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } }
]
)
Now I'm stuck at how to use the count values to find. I searched but didn't find any methods to add the count values back to the same collection by uid. Saving the aggregation output and joining them together seems not supported by mongodb. Please advise, thanks!
Update:
Sorry that I didn't make it clear earlier. Thanks for your prompt answers! I want a subset of the original collection, with post text, post timestamp, etc. I don't want a subset of the aggregation output.
If there aren't millions of documents, you can take a shortcut and achieve this with one aggregate query followed by a find query.
Aggregate query:
var users = db.collection.aggregate(
[
{$group:{_id:'$uid', count:{$sum:1}}},
{$match:{count:{$gt:5}}},
{$group:{_id:null,users:{$push:'$_id'}}}
]
).toArray()[0]['users']
Then it's a straightforward query to find the posts of those users:
db.collection.find({uid: {$in: users}})
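The two-step idea (first collect the qualifying uids, then filter the original documents by them) can be sketched in plain JavaScript. The sample posts and the postsOfActiveUsers helper are assumptions made up for this illustration; it does not call MongoDB.

```javascript
// Hypothetical sample data: userA has 6 posts, userB has 2.
const posts = [];
for (let i = 0; i < 6; i++) posts.push({ uid: "userA", title: `A${i}` });
for (let i = 0; i < 2; i++) posts.push({ uid: "userB", title: `B${i}` });

// postsOfActiveUsers is an illustrative helper, not a MongoDB API.
function postsOfActiveUsers(allPosts, minExclusive) {
  // Step 1 (the aggregate): count posts per uid, keep uids above the threshold.
  const counts = new Map();
  for (const p of allPosts) counts.set(p.uid, (counts.get(p.uid) || 0) + 1);
  const activeUids = new Set(
    [...counts].filter(([, n]) => n > minExclusive).map(([uid]) => uid)
  );

  // Step 2 (the find with $in): return the original, unmodified documents.
  return allPosts.filter(p => activeUids.has(p.uid));
}

const selected = postsOfActiveUsers(posts, 5);
console.log(selected.length);
```

One thing to watch in the shell version: if no user passes the threshold, toArray() returns an empty array and indexing [0]['users'] throws, so it may be worth checking for that case first.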
Just add a $match stage after your $group stage and it works:
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } },
{ $match : { count : { $gt : 5 } } }
]
)
Try the following to select users with more than 5 posts. To keep the original fields, add each one to the $group stage with $first, as below; this is enough if there is only one value per uid.
db.collection.aggregate([
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
])
If there are multiple values for the same uid, you should $push them into an array in the $group stage instead.
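The difference between keeping the first value and pushing all values can be seen in a small plain-JavaScript sketch. The sample posts, the groupByUid helper, and the output field names firstTitle and titles are hypothetical and only serve the illustration.

```javascript
// Hypothetical sample data: u1 has two posts, u2 has one.
const samplePosts = [
  { uid: "u1", title: "first" },
  { uid: "u1", title: "second" },
  { uid: "u2", title: "only" }
];

// groupByUid is an illustrative helper mirroring a $group stage that uses
// both a $first-style accumulator and a $push-style accumulator.
function groupByUid(docs) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.uid)) {
      // $first: remember only the value from the first document seen.
      groups.set(d.uid, { _id: d.uid, firstTitle: d.title, titles: [], count: 0 });
    }
    const g = groups.get(d.uid);
    g.titles.push(d.title); // $push: collect every value
    g.count += 1;           // $sum: 1
  }
  return [...groups.values()];
}

const grouped = groupByUid(samplePosts);
console.log(grouped);
```

For u1, the $first-style field holds only "first", while the $push-style array holds both titles, which is why $first alone loses posts when a user has more than one.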
If you want to save the result to db, please try it as below
var cur = db.collection.aggregate(
[
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
]
)
cur.forEach(function(doc) {
// After $group, doc._id holds the uid, so filter on uid
db.collection.update({uid: doc._id}, {/*the field should be updated */});
});

MongoDB: count the number of items in an array

I have a collection where every document in the collection has an array named foo that contains a set of embedded documents. Is there currently a trivial way in the MongoDB shell to count how many instances are within foo? something like:
db.mycollection.foos.count() or db.mycollection.foos.size()?
Each document in the array needs to have a unique foo_id and I want to do a quick count to make sure that the right amount of elements are inside of an array for a random document in the collection.
In MongoDB 2.6, the Aggregation Framework has a new array $size operator you can use:
> db.mycollection.insert({'foo':[1,2,3,4]})
> db.mycollection.insert({'foo':[5,6,7]})
> db.mycollection.aggregate([{$project: { count: { $size:"$foo" }}}])
{ "_id" : ObjectId("5314b5c360477752b449eedf"), "count" : 4 }
{ "_id" : ObjectId("5314b5c860477752b449eee0"), "count" : 3 }
If you are on MongoDB 2.2 or later, you can use the aggregation framework:
db.mycollection.aggregate([
{$unwind: '$foo'},
{$group: {_id: '$_id', 'sum': { $sum: 1}}},
{$group: {_id: null, total_sum: {'$sum': '$sum'}}}
])
which will give you the total foos of your collection.
Omitting the last group will aggregate results per record.
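The relationship between the per-record counts and the collection-wide total can be sketched in plain JavaScript. The two sample documents reuse the arrays from the $size example above; the variable names are made up for the illustration.

```javascript
// Sample documents with the same 'foo' arrays as in the $size example.
const collection = [
  { _id: 1, foo: [1, 2, 3, 4] },
  { _id: 2, foo: [5, 6, 7] }
];

// First $group: one count per document.
const perDocument = collection.map(d => ({ _id: d._id, sum: d.foo.length }));

// Second $group: add the per-document counts into one total.
const total = perDocument.reduce((acc, d) => acc + d.sum, 0);

console.log(perDocument, total);
```

One difference from the pipeline: a plain $unwind drops documents whose array is empty, so they would not appear in the per-record result at all, whereas this sketch would keep them with a count of 0. The total is the same either way.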
Using Projections and Groups
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo_count:{$size:"$foo"},
}
},
{
$group: {
_id: null, // $group requires an _id; null puts all documents in one group
foo_total:{$sum:"$foo_count"},
}
}
]
)
Multiple child array counts can also be calculated this way
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo1_count:{$size:"$foo1"},
foo2_count:{$size:"$foo2"},
}
},
{
$group: {
_id: null, // $group requires an _id; null puts all documents in one group
foo1_total:{$sum:"$foo1_count"},
foo2_total:{$sum:"$foo2_count"},
}
}
]
)

get total of sub documents in a collection

How do I get the total number of comments in the collection if my collection looks like this? (Not the total comments per post, but the total for the collection.)
{
_id: 1,
post: 'content',
comments: [
{
name: '',
comment: ''
}
]
}
If I have post A with 3 comments and post B with 5 comments, the result should be 8.
You could use the aggregation framework:
> db.prabir.aggregate(
{ $unwind : "$comments" },
{ $group: {
_id: '',
count: { $sum: 1 }
}
})
{ "result" : [ { "_id" : "", "count" : 8 } ], "ok" : 1 }
In a nutshell this (temporarily) creates a separate document for each comment and then increments count for each document.
For a large number of posts and comments it might be more efficient to keep track of the number of comments. Whenever a comment is added, you also increment a counter. Example:
// Insert a comment
> comment = { name: 'JohnDoe', comment: 'FooBar' }
> db.prabir.update(
{ post: "A" },
{
$push: { comments: comment },
$inc: { numComments: 1 }
}
)
Using the aggregation framework again:
> db.prabir.aggregate(
{ $project : { _id: 0, numComments: 1 }},
{ $group: {
_id: '',
count: { $sum: "$numComments" }
}
})
{ "result" : [ { "_id" : "", "count" : 8 } ], "ok" : 1 }
You can use the aggregate method of the aggregation framework for that:
db.test.aggregate(
// Only include docs with at least one comment.
{$match: {'comments.0': {$exists: true}}},
// Duplicate the documents, 1 per comments array entry
{$unwind: '$comments'},
// Group all docs together and count the number of unwound docs,
// which will be the same as the number of comments.
{$group: {_id: null, count: {$sum: 1}}}
);
UPDATE
As of MongoDB 2.6, there's a more efficient way to do this by using the $size aggregation operator to directly get the number of comments in each doc:
db.test.aggregate(
{$group: {_id: null, count: {$sum: {$size: '$comments'}}}}
);
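One caveat worth checking before relying on $size: as far as I know, it raises an error when its argument is not an array, which includes documents where the comments field is missing, so such documents may need a guard (for example with $ifNull). The plain-JavaScript sketch below shows the guarded sum; the sample posts and the totalComments helper are made up for the illustration and do not call MongoDB.

```javascript
// Hypothetical sample data: post A has 3 comments, post B has 5,
// and post C has no comments field at all.
const blogPosts = [
  { _id: 1, post: "A", comments: [{}, {}, {}] },
  { _id: 2, post: "B", comments: [{}, {}, {}, {}, {}] },
  { _id: 3, post: "C" }
];

// totalComments is an illustrative helper: it sums the array lengths and
// treats a missing or non-array comments field as zero comments.
function totalComments(docs) {
  return docs.reduce(
    (sum, d) => sum + (Array.isArray(d.comments) ? d.comments.length : 0),
    0
  );
}

const commentTotal = totalComments(blogPosts);
console.log(commentTotal);
```

The $match on 'comments.0' in the $unwind version above plays a similar role, since it filters out documents without at least one comment before the array is touched.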

Querying internal array size in MongoDB

Consider a MongoDB document in users collection:
{ username : 'Alex', tags: ['C#', 'Java', 'C++'] }
Is there any way, to get the length of the tags array from the server side (without passing the tags to the client) ?
Thank you!
If the username Alex is unique, you can use the following code:
db.test.insert({username:"Alex", tags: ['C#', 'Java', 'C++'] });
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$project: {count:{$add:1}}},
{$group: {_id: null, number: {$sum: "$count" }}}
);
{ "result" : [ { "_id" : null, "number" : 3 } ], "ok" : 1 }
As of the 2.6 release, MongoDB supports the $size operator in aggregation.
From the documentation:
{ <field>: { $size: <array> } }
What you want can be accomplished with either of the following:
db.users.aggregate(
[
{
$group: {
_id: "$username",
tags_count: {$first: {$size: "$tags" }}
}
}
]
)
or
db.users.aggregate(
[
{
$project: {
tags_count: {$size: "$tags"}
}
}
]
)
I think it might be more efficient to calculate the number of tags on each save (as a separate field) using $inc perhaps or via a job on a schedule.
You could also do this with map/reduce (the canonical example) but that doesn't seem to be what you'd want.
I'm not sure it's possible to do exactly what you are asking, but you can query all the documents that match a certain size with $size ...
> db.collection.find({ tags : { $size: 3 }});
That'd get you all the documents with 3 tags ...
xmm.dev's answer can be simplified: instead of having the intermediate field 'count', you can sum directly in $group:
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$group: {_id: null, number: {$sum: 1 }}}
)
Currently, the only way to do it seems to be using db.eval, but this locks the database for other operations.
The most speed-efficient way would be to add an extra field that stores the length of the array and maintain it with $inc and $push operations.
I used a small workaround, as I needed to query the array size and return the document if the size was greater than 0; in my case it could be anything from 1 to 3.
Here was my solution (field stands for the name of the array field):
db.test.find({$or: [{field: {$exists: true, $size: 1}},
{field: {$exists: true, $size: 2}},
{field: {$exists: true, $size: 3}}]})
This returns a document when the field exists and its size is 1, 2, or 3. You can add more clauses if you are looking for other specific sizes or a wider range. I know it's not perfect, but it worked and was relatively quick. I only had sizes 1 to 3 in my field, so this solution was enough.
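A range of sizes can also be expressed without listing every size, by testing whether particular array positions exist, in the same way the 'comments.0' check is used earlier on this page. The following is a sketch under assumptions: field is a placeholder field name that is assumed to hold an array whenever it is present, sizeRangeQuery is a hypothetical helper that only builds the filter document, and matchesSizeRange mirrors the intended semantics in plain JavaScript so the idea can be checked without a server.

```javascript
// Hypothetical helper: builds a filter for arrays whose length is between
// min and max inclusive (min must be at least 1).
function sizeRangeQuery(fieldName, min, max) {
  return {
    // An element at index min-1 exists, so the length is at least min.
    [`${fieldName}.${min - 1}`]: { $exists: true },
    // No element at index max, so the length is at most max.
    [`${fieldName}.${max}`]: { $exists: false }
  };
}

// Plain-JavaScript mirror of the intended semantics, for checking locally.
function matchesSizeRange(doc, fieldName, min, max) {
  const value = doc[fieldName];
  return Array.isArray(value) && value.length >= min && value.length <= max;
}

const query = sizeRangeQuery("field", 1, 3);
console.log(query);

const sampleDocs = [
  { field: [] },
  { field: ["a"] },
  { field: ["a", "b", "c"] },
  { field: ["a", "b", "c", "d"] },
  {}
];
const matched = sampleDocs.filter(d => matchesSizeRange(d, "field", 1, 3));
console.log(matched.length);
```

In the shell the generated filter would be passed to find, for example db.test.find(sizeRangeQuery("field", 1, 3)); it is worth verifying the behaviour against your own data before relying on it.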