MongoDB: count the number of items in an array

I have a collection where every document has an array named foo that contains a set of embedded documents. Is there a trivial way in the MongoDB shell to count how many elements are in foo? Something like:
db.mycollection.foos.count() or db.mycollection.foos.size()?
Each document in the array needs to have a unique foo_id, and I want to do a quick count to make sure the right number of elements is in the array for a random document in the collection.

In MongoDB 2.6, the Aggregation Framework has a new array $size operator you can use:
> db.mycollection.insert({'foo':[1,2,3,4]})
> db.mycollection.insert({'foo':[5,6,7]})
> db.mycollection.aggregate([{$project: { count: { $size:"$foo" }}}])
{ "_id" : ObjectId("5314b5c360477752b449eedf"), "count" : 4 }
{ "_id" : ObjectId("5314b5c860477752b449eee0"), "count" : 3 }

If you are on a recent version of MongoDB (2.2 or later), you can use the aggregation framework:
db.mycollection.aggregate([
{$unwind: '$foo'},
{$group: {_id: '$_id', 'sum': { $sum: 1}}},
{$group: {_id: null, total_sum: {'$sum': '$sum'}}}
])
This gives you the total number of foo elements across the collection.
Omitting the last $group stage gives the count per document instead.
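To make the stages of that pipeline easier to follow, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB; the sample documents and the helper names `unwind` and `countPerDocument` are made up for illustration and only approximate what the $unwind and $group stages do.

```javascript
// Hypothetical sample documents, mirroring the shape used in the question.
const docs = [
  { _id: "a", foo: [1, 2, 3, 4] },
  { _id: "b", foo: [5, 6, 7] },
];

// Rough equivalent of {$unwind: '$foo'}: one output row per array element.
// Documents with a missing or empty array produce no rows.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// Rough equivalent of {$group: {_id: '$_id', sum: {$sum: 1}}}.
function countPerDocument(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row._id, (counts.get(row._id) || 0) + 1);
  }
  return [...counts].map(([_id, sum]) => ({ _id, sum }));
}

const perDocument = countPerDocument(unwind(docs, "foo"));

// Rough equivalent of the final {$group: {_id: null, total_sum: {$sum: '$sum'}}}.
const totalSum = perDocument.reduce((acc, row) => acc + row.sum, 0);

console.log(perDocument);
console.log(totalSum);
```

Stopping at `perDocument` corresponds to leaving out the last $group stage.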

Using Projections and Groups
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo_count:{$size:"$foo"},
}
},
{
$group: {
_id: null, // $group requires an _id; null puts every document in one group
foo_total:{$sum:"$foo_count"},
}
}
]
)
Multiple child array counts can also be calculated this way
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo1_count:{$size:"$foo1"},
foo2_count:{$size:"$foo2"},
}
},
{
$group: {
_id: null, // $group requires an _id; null puts every document in one group
foo1_total:{$sum:"$foo1_count"},
foo2_total:{$sum:"$foo2_count"},
}
}
]
)

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format shown above, you can split the _id into two parts on the dot character and use aggregation to find the maximum ts_utc for each first (numeric) part.
That way you can do it in one shot instead of iterating over each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field that indicates the grouping. You can then index that field to improve performance.
Here is an ad-hoc way to derive the field and get the latest document per group:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
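As a rough illustration of what "latest document per group" means for the sample data, here is an in-memory sketch in plain JavaScript. It does not use MongoDB, and `latestPerGroup` is a hypothetical helper name. It relies on the assumption that every ts_utc value is an ISO 8601 string in the same UTC format, in which case plain string comparison agrees with chronological order.

```javascript
// Sample data from the question.
const records = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" },
];

// For every prefix before the first dot, keep the record with the largest timestamp.
function latestPerGroup(items) {
  const latest = new Map();
  for (const item of items) {
    const group = item._id.split(".")[0];
    const current = latest.get(group);
    // Assumption: same-format ISO 8601 UTC strings sort chronologically as strings.
    if (current === undefined || item.ts_utc > current.ts_utc) {
      latest.set(group, item);
    }
  }
  return latest;
}

const latestByGroup = latestPerGroup(records);
console.log(latestByGroup.get("42")._id);
console.log(latestByGroup.get("69")._id);
```

This is a single pass over the data, whereas the pipeline sorts first and then takes $first per group; both should select the same documents when timestamps within a group are distinct.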

How to query items in array alone in Mongodb

My collection is named employee and its documents are as follows:
{
"Title":"IssueFixingTeam",
"TeamLead":"Mr.Bean",
"workers":["xxx","yyy","zzz"]
},
{
"Title":"DevelopmentTeam",
"TeamLead":"Mr.John Doe",
"workers":["aa","dd","ss"]
}
How can I query how many workers there are under TeamLead "Mr.Bean"?
Thanks in advance
If you are interested in just one record belonging to "Mr.Bean" (otherwise, see the answer by @felix), then this gives you the required count:
db.employee.findOne({'TeamLead': 'Mr.Bean'}).workers.length
Use $match to filter on TeamLead: Mr.Bean, then use the $size operator in $project to get the size of the array:
db.collection.aggregate([{
$match: {
TeamLead: "Mr.Bean"
}
}, {
$project: {
"TeamLead":1,
workers: {
$size: "$workers"
}
}
}])
You can use the aggregation framework.
In case you are only interested in matching documents of a specific TeamLead and sum per document:
db.foo.aggregate([{$match: {"TeamLead": "Mr.Bean"}},
{$project: {"num_workers": {$size: "$workers"}}}])
Output:
{ "_id" : ObjectId("58c6a5ef9bc86fa5c7e4fa50"), "num_workers" : 3 }
If you want to group documents by TeamLead and get the number of unique workers under each TeamLead, unwind the workers first so that $addToSet collects individual names rather than whole arrays:
db.foo.aggregate([{$unwind: "$workers"},
{$group: {"_id": "$TeamLead", "workers": {$addToSet: "$workers"}}},
{$project: {"num_workers": {$size: "$workers"}}}])
Output:
{ "_id" : "Mr.John Doe", "num_workers" : 3 }
{ "_id" : "Mr.Bean", "num_workers" : 3 }
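To show what counting unique workers per TeamLead means when one lead appears in more than one document, here is a small in-memory sketch in plain JavaScript. It does not query MongoDB; the third team document and the helper name `uniqueWorkerCounts` are hypothetical additions for illustration.

```javascript
const teams = [
  { Title: "IssueFixingTeam", TeamLead: "Mr.Bean", workers: ["xxx", "yyy", "zzz"] },
  { Title: "DevelopmentTeam", TeamLead: "Mr.John Doe", workers: ["aa", "dd", "ss"] },
  // Hypothetical extra team, added to show that a repeated name counts once.
  { Title: "SupportTeam", TeamLead: "Mr.Bean", workers: ["xxx", "qqq"] },
];

// Collect each lead's workers into a Set, then report the Set sizes.
function uniqueWorkerCounts(documents) {
  const byLead = new Map();
  for (const doc of documents) {
    if (!byLead.has(doc.TeamLead)) {
      byLead.set(doc.TeamLead, new Set());
    }
    for (const worker of doc.workers) {
      byLead.get(doc.TeamLead).add(worker);
    }
  }
  return new Map([...byLead].map(([lead, workers]) => [lead, workers.size]));
}

const workerCounts = uniqueWorkerCounts(teams);
console.log(workerCounts.get("Mr.Bean"));
console.log(workerCounts.get("Mr.John Doe"));
```

Here "xxx" works on two of Mr.Bean's teams but is counted only once.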

mongodb find matches based on count aggregation

I have a MongoDB collection like this:
{"uid": "01370mask4",
"title": "hidden",
"post": "hidden",
"postTime": "01-23, 2016",
"unixPostTime": "1453538601",
"upvote": [2, 3]}
and I'd like to select post records from the users with more than 5 posts. The structure should be the same; I just don't need the documents from users who don't have many posts.
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } }
]
)
Now I'm stuck at how to use the count values to find. I searched but didn't find any methods to add the count values back to the same collection by uid. Saving the aggregation output and joining them together seems not supported by mongodb. Please advise, thanks!
Update:
Sorry that I didn't make it clear earlier. Thanks for your prompt answers! I want a subset of the original collection, with post text, post timestamp, etc. I don't want a subset of the aggregation output.
If there aren't millions of documents, you can take a shortcut and do this with one aggregate query followed by a find query.
Aggregate query:
var users = db.collection.aggregate(
[
{$group:{_id:'$uid', count:{$sum:1}}},
{$match:{count:{$gt:5}}},
{$group:{_id:null,users:{$push:'$_id'}}}
]
).toArray()[0]['users']
Then it's a straightforward query to find the documents of those users:
db.collection.find({uid: {$in: users}})
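The two-step idea (first collect the qualifying uids, then fetch their documents) can be sketched in memory with plain JavaScript. This is only an illustration that does not touch MongoDB; the sample posts and the helper name `activeUids` are hypothetical.

```javascript
// Hypothetical posts: "u1" has 6 posts, "u2" has 2.
const posts = [
  ...Array.from({ length: 6 }, (_, i) => ({ uid: "u1", title: `post ${i}` })),
  ...Array.from({ length: 2 }, (_, i) => ({ uid: "u2", title: `post ${i}` })),
];

// Step 1: count posts per uid and keep the uids with more than `minPosts` posts
// (roughly the $group + $match stages).
function activeUids(documents, minPosts) {
  const counts = new Map();
  for (const doc of documents) {
    counts.set(doc.uid, (counts.get(doc.uid) || 0) + 1);
  }
  return new Set(
    [...counts].filter(([, count]) => count > minPosts).map(([uid]) => uid)
  );
}

// Step 2: keep the original documents whose uid is in that set
// (roughly find({uid: {$in: users}})).
const activeUsers = activeUids(posts, 5);
const selectedPosts = posts.filter((doc) => activeUsers.has(doc.uid));

console.log([...activeUsers]);
console.log(selectedPosts.length);
```

The selected documents keep their original structure, which is what the question asks for.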
Just add the $match after your group with the correct query and it works :
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } },
{ $match : { count : { $gt : 5 } } }
]
)
Please try this one to select users with more than 5 posts. To keep the original fields, use $first; if the uid is unique, add each field as below.
db.collection.aggregate([
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}])
If there are multiple values for the same uid, use $push to collect them into an array in the $group stage.
If you want to save the result to the database, try the following:
var cur = db.collection.aggregate(
[
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
]
)
cur.forEach(function(doc) {
// after the $group stage, doc._id holds the uid
db.collection.update({uid: doc._id}, {/* the fields to update */});
});

Mongodb aggregate distinct with sort and limit

I have a collection objects.
{
"_id" : ObjectId("55fa65046db58e7d0c8b456a"),
"object_id" : "1651419",
"user" : {
"id" : "65593",
"cookie" : "9jgkm7ME1HDFD4K6j8WWvg",
},
"createddate" : ISODate("2015-09-17T10:00:20.945+03:00")
}
Every time a user visits an object's page, the visit is stored as a separate record in the collection. Now I need to get an array of the last N visited objects. It should be distinct, so the array should have N unique records. It should also be sorted by createddate.
So if the user visited object_id = 1, then object_id = 2 two times, after that visited object_id = 3 and again object_id = 1 the array should contain:
{
visits : [1, 3, 2]
}
(distinct and sorted by time of last visit).
I tried a construction like:
db.objects.aggregate([
{$match: {'user.id' : '65593'}},
{$sort: { 'createddate':-1 }},
{$project: {'id': '$user.id', 'obj' : '$object_id'}},
{$group: {_id:'$id', 'obj': {$addToSet: '$obj'}}},
{$project:{_id:0, 'obj':'$obj'}}
])
but it returns an array that is not sorted, and I also can't limit the array size.
The $addToSet operator and "sets" in general for MongoDB are not ordered in any way. Instead, get the "distinct" values by grouping on them first, then build the array after sorting them:
db.objects.aggregate([
{ "$match": { "user.id": "65593" } },
{ "$sort": { "user.id": 1, "createddate": -1 } },
{ "$group": {
"_id": {
"_id": "$user.id",
"object_id": "$object_id"
},
"createddate": { "$first": "$createddate" }
}},
{ "$sort": { "_id._id": 1, "createddate": -1 } },
{ "$group": {
"_id": "$_id._id",
"obj": { "$push": "$_id.object_id" }
}}
])
So if you want the discovery order by date, you $sort first, but since $group does not guarantee any order of results, you need to $sort again before you group with the $push operation to build the array.
Note that you are reducing "createddate" somehow, since the general "distinct" items appear to be the "user.id" and "object_id" fields, so it needs some sort of accumulator and needs to be included for your ordering.
Then the array items will be in the order you expect.
If you need to $limit, you must $unwind the array and then limit the results. Alternatively, apply a $limit after the first $group and the following $sort here.
But of course this is only practical for a single main grouping _id, being "user.id". Future MongoDB releases will support $slice, which will make this practical for multiple grouping ids and a bit simpler in general. But it still won't be possible to "limit" the array items before that initial group over multiple primary grouping ids.
I found the solution I expected.
db.objects.aggregate([
{$match: {'user.id' : '65593'}},
{$group : {
_id : '$object_id',
dt : {$max: '$createddate'}
}
},
{$sort: {'dt':-1}},
{$limit:5},
{$group : {
_id :null,
'objects' : {$push:'$_id'}
}
},
{$project: {_id:0, 'objects':'$objects'}}
])
It returns an array of at most N distinct objects, sorted by createddate in descending order.
Thanks, everyone, for the help!
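The logic of that pipeline (latest visit per object, newest first, capped at N) can be checked against the example from the question with a short in-memory sketch in plain JavaScript. It does not use MongoDB; the visit times and the helper name `lastVisited` are hypothetical.

```javascript
// Hypothetical visit log for one user, oldest first: objects 1, 2, 2, 3, 1.
const visits = [
  { object_id: "1", createddate: new Date("2015-09-17T10:00:00Z") },
  { object_id: "2", createddate: new Date("2015-09-17T10:01:00Z") },
  { object_id: "2", createddate: new Date("2015-09-17T10:02:00Z") },
  { object_id: "3", createddate: new Date("2015-09-17T10:03:00Z") },
  { object_id: "1", createddate: new Date("2015-09-17T10:04:00Z") },
];

function lastVisited(log, limit) {
  // Roughly {$group: {_id: '$object_id', dt: {$max: '$createddate'}}}.
  const lastSeen = new Map();
  for (const { object_id, createddate } of log) {
    const previous = lastSeen.get(object_id);
    if (previous === undefined || createddate > previous) {
      lastSeen.set(object_id, createddate);
    }
  }
  // Roughly {$sort: {dt: -1}}, {$limit: limit}, then $push of the ids.
  return [...lastSeen]
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
    .map(([id]) => id);
}

const recentObjects = lastVisited(visits, 5);
console.log(recentObjects);
```

For this log the order is 1, 3, 2, matching the expected result in the question; a smaller limit keeps only the most recent entries.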

Querying internal array size in MongoDB

Consider a MongoDB document in users collection:
{ username : 'Alex', tags: ['C#', 'Java', 'C++'] }
Is there any way to get the length of the tags array on the server side (without passing the tags to the client)?
Thank you!
If the username Alex is unique, you can use the following code:
db.test.insert({username:"Alex", tags: ['C#', 'Java', 'C++'] });
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$project: {count:{$add:1}}},
{$group: {_id: null, number: {$sum: "$count" }}}
);
{ "result" : [ { "_id" : null, "number" : 3 } ], "ok" : 1 }
Now MongoDB (2.6 release) supports $size operation in aggregation.
From the documentation:
{ <field>: { $size: <array> } }
What you want can be accomplished with either this:
db.users.aggregate(
[
{
$group: {
_id: "$username",
tags_count: {$first: {$size: "$tags" }}
}
}
]
)
or
db.users.aggregate(
[
{
$project: {
tags_count: {$size: "$tags"}
}
}
]
)
I think it might be more efficient to calculate the number of tags on each save (as a separate field), perhaps using $inc, or via a scheduled job.
You could also do this with map/reduce (the canonical example), but that doesn't seem to be what you'd want.
I'm not sure it's possible to do exactly what you are asking, but you can query all the documents that match a certain size with $size ...
> db.collection.find({ tags : { $size: 3 }});
That'd get you all the documents with 3 tags ...
xmm.dev's answer can be simplified: instead of having the intermediate field 'count', you can sum directly in $group:
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$group: {_id: null, number: {$sum: 1 }}}
)
Currently, the only way to do it seems to be using db.eval, but this locks the database for other operations.
The most speed-efficient way would be to add an extra field that stores the length of the array and to maintain it with $inc and $push operations.
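A minimal sketch of that counter idea in plain JavaScript follows. The field name `tags_count` and both helper names are assumptions made for illustration, and `applyUpdate` is only a tiny in-memory stand-in for what the server does with $push and $inc, not MongoDB's implementation.

```javascript
// Builds an update document that appends a tag and bumps the counter together.
// `tags_count` is a hypothetical field name, not part of the original schema.
function addTagUpdate(tag) {
  return { $push: { tags: tag }, $inc: { tags_count: 1 } };
}

// In-memory stand-in for applying $push and $inc, for illustration only.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const user = { username: "Alex", tags: ["C#", "Java", "C++"], tags_count: 3 };
const updatedUser = applyUpdate(user, addTagUpdate("Go"));
console.log(updatedUser.tags_count, updatedUser.tags.length);
```

In the shell, the same update document could be passed along the lines of db.users.update({username: "Alex"}, addTagUpdate("Go")), so the array and its counter change in a single operation. The count can then be read by projecting only the counter field, without sending the tags to the client.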
I used a small workaround, as I needed to query on the array size and return the document if the size was greater than 0; in my case it could be anything from 1 to 3.
Here was my solution (field stands for the name of the array attribute):
db.test.find({$or: [{field: {$exists: true, $size: 1}},
{field: {$exists: true, $size: 2}},
{field: {$exists: true, $size: 3}}]})
This returns a document when the attribute exists and its size is 1, 2, or 3. You can add more clauses if you are looking for a specific size or a wider range. I know it's not perfect, but it worked and was relatively quick. I only had sizes 1 to 3 in my attribute, so this solution was enough.
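Since the $size query operator matches one exact length rather than a range, the clauses of such an $or can be generated instead of written by hand. The sketch below is plain JavaScript that only builds the query object; `sizeRangeQuery` is a hypothetical helper name. It leaves out $exists on the assumption that $size alone never matches a missing field.

```javascript
// Builds {$or: [{field: {$size: min}}, ..., {field: {$size: max}}]}
// for an inclusive range of array lengths.
function sizeRangeQuery(field, min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min < 0 || min > max) {
    throw new RangeError("expected integers with 0 <= min <= max");
  }
  const clauses = [];
  for (let size = min; size <= max; size += 1) {
    clauses.push({ [field]: { $size: size } });
  }
  return { $or: clauses };
}

const tagSizeQuery = sizeRangeQuery("tags", 1, 3);
console.log(JSON.stringify(tagSizeQuery));
```

The resulting object could then be passed to find, for example db.test.find(sizeRangeQuery("tags", 1, 3)). For wide ranges the clause list grows linearly, so a stored length field may be the better fit there.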