Limiting Query result in MongoDB - mongodb

I have 20,000+ documents in my mongodb. I just learnt that you cannot query them all in one go.
So my question is this:
I want to get my documents using find(query), then limit the result to 3 documents, and be able to choose where those documents start.
For example, if my find() query returned 8 documents:
[{doc1}, {doc2}, {doc3}, {doc4}, {doc5}, {doc6}, {doc7}, {doc8}]
a command like limit(2, 3) would give [doc3, doc4, doc5].
I also need the total count of the result (without the limit); for example, length() would give 8 (the total number of documents returned by find()).
Any suggestion? Thanks

Add .skip(2).limit(3) to the end of your query.

Suppose you have the following documents in your collection:
{ "_id" : ObjectId("56801243fb940e32f3221bc2"), "a" : 0 }
{ "_id" : ObjectId("56801243fb940e32f3221bc3"), "a" : 1 }
{ "_id" : ObjectId("56801243fb940e32f3221bc4"), "a" : 2 }
{ "_id" : ObjectId("56801243fb940e32f3221bc5"), "a" : 3 }
{ "_id" : ObjectId("56801243fb940e32f3221bc6"), "a" : 4 }
{ "_id" : ObjectId("56801243fb940e32f3221bc7"), "a" : 5 }
{ "_id" : ObjectId("56801243fb940e32f3221bc8"), "a" : 6 }
{ "_id" : ObjectId("56801243fb940e32f3221bc9"), "a" : 7 }
From MongoDB 3.2 you can use the .aggregate() method and the $slice operator.
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" }
}},
{ "$project": {
"count": 1,
"_id": 0,
"docs": { "$slice": [ "$docs", 2, 3 ] }
}}
])
Which returns:
{
"count" : 8,
"docs" : [
{
"_id" : ObjectId("56801243fb940e32f3221bc4"),
"a" : 2
},
{
"_id" : ObjectId("56801243fb940e32f3221bc5"),
"a" : 3
},
{
"_id" : ObjectId("56801243fb940e32f3221bc6"),
"a" : 4
}
]
}
You may want to sort your documents before grouping, using the $sort stage; without an explicit sort, the order of the documents (and therefore which ones are skipped) is not guaranteed.
In MongoDB 3.0 and earlier, you first need to $group your documents, using the $sum accumulator to return the "count" of documents and, in the same stage, $push with the $$ROOT variable to return an array of all your documents. The next stage in the pipeline is $unwind, which denormalizes that array. From there, the $skip and $limit stages skip the first 2 documents and pass 3 documents to the next stage, which is another $group stage.
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" }
}},
{ "$unwind": "$docs" },
{ "$skip": 2 },
{ "$limit": 3 },
{ "$group": {
"_id": "$_id",
"count": { "$first": "$count" },
"docs": { "$push": "$docs" }
}}
])
As @JohnnyHK pointed out in a comment:
$group is going to read all documents and build a 20k element array with them just to get three docs.
For a large collection you should therefore run two queries instead, one with find()
db.collection.find().skip(2).limit(3)
and
db.collection.count()
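To make the skip/limit semantics concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB: the paginate helper and the sample array are hypothetical and only mimic what .skip(2).limit(3) plus a separate count would return for the eight sample documents.

```javascript
// In-memory model of find().skip(n).limit(m) plus a separate total count.
// `paginate` is a hypothetical helper, not part of any MongoDB driver.
function paginate(docs, skip, limit) {
  return {
    count: docs.length,                   // total matches, ignoring skip/limit
    docs: docs.slice(skip, skip + limit), // the requested page
  };
}

// Eight documents shaped like the sample collection: { a: 0 } ... { a: 7 }
const allDocs = [0, 1, 2, 3, 4, 5, 6, 7].map((a) => ({ a }));

const page = paginate(allDocs, 2, 3);
console.log(page.count, page.docs.map((d) => d.a));
```

The page holds the documents with a equal to 2, 3 and 4, matching the aggregation output shown earlier, while the count stays at 8.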

Related

Group sums of an attribute by the values of another array attribute

I have a collection "tagsCount" that looks like that:
{
"_id" : ObjectId("59e3a46a48507851d411ad78"),
"tags" : [ "Marketing" ],
"cpt" : 14354
},
{
"_id" : ObjectId("59e3a46a48507851d411ad79"),
"tags" : [
"chatbot",
"Content marketing",
"Intelligence artificielle",
"Marketing digital",
"Personnalisation"
],
"cpt" : 9037
}
Of course there are many more documents.
I want to get the sum of "cpt" grouped by the values of "tags".
I have come up with this:
db.tagsCount.aggregate([
{ "$project": { "tags":1 }},
{ "$unwind": "$tags"},
{ "$group": {
"_id" : "$tags",
cpt : "$cpt" ,
"count": { "$sum": "$cpt" }
}}
])
But that doesn't do the trick: I get the list of all the different tags, but the count is always 0.
Is it possible to do what I want?
The problem is that your aggregation pipeline starts with a $project stage that passes only tags to the next stages, so $group runs on documents without cpt and $sum has nothing to add up. Here's my working example:
db.tagsCount.aggregate([
{ "$unwind": "$tags"},
{ "$group": {
"_id": "$tags",
"count": { "$sum": "$cpt" }
}},
{ "$project": { "tag": "$_id", "_id": 0, "count": 1 }}
])
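The effect of the early $project can be reproduced with a small in-memory sketch in plain JavaScript. This is only a model of $unwind followed by $group with $sum, not driver code; the sumByTag helper is hypothetical and the second sample document is invented for illustration.

```javascript
// In-memory model of { $unwind: "$tags" } followed by
// { $group: { _id: "$tags", count: { $sum: "$cpt" } } }.
// `sumByTag` is a hypothetical helper, not a MongoDB API.
function sumByTag(docs) {
  const totals = new Map();
  for (const doc of docs) {
    // Like $sum, treat a missing or non-numeric value as contributing 0.
    const cpt = typeof doc.cpt === "number" ? doc.cpt : 0;
    for (const tag of doc.tags) {          // $unwind: one row per tag
      totals.set(tag, (totals.get(tag) || 0) + cpt);
    }
  }
  return totals;
}

const tagsCount = [
  { tags: ["Marketing"], cpt: 14354 },
  { tags: ["chatbot", "Marketing"], cpt: 9037 }, // made-up second document
];

// Like { $project: { tags: 1 } }: cpt is dropped before grouping.
const projected = tagsCount.map(({ tags }) => ({ tags }));

const totals = sumByTag(tagsCount);
const totalsAfterProject = sumByTag(projected);
console.log(Object.fromEntries(totals), Object.fromEntries(totalsAfterProject));
```

With the full documents the sums are non-zero, while the projected documents still produce every tag but with a count of 0, which is the symptom described in the question.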

MongoDB - Operations with nested fields

I have twitter data that looks like this:
db.users.findOne()
{
"_id" : ObjectId("578ffa8e7eb9513f4f55a935"),
"user_name" : "koteras",
"retweet_count" : 0,
"tweet_followers_count" : 461,
"source" : "Twitter for iPhone",
"coordinates" : null,
"tweet_mentioned_count" : 1,
"tweet_ID" : "755891629932675072",
"tweet_text" : "RT #ochocinco: I beat them all for 10 straight hours #FIFA16KING",
"user" : {
"CreatedAt" : ISODate("2011-12-27T09:04:01Z"),
"FavouritesCount" : 5223,
"FollowersCount" : 461,
"FriendsCount" : 619,
"UserId" : 447818090,
"Location" : "501"
}
For example, I want to find the number of users that have "FollowersCount" greater than "FavouritesCount". How can I do that?
The $where operator is specifically designed for this.
db.users.find( { $where: function() { return (this.user.FollowersCount > this.user.FavouritesCount) } } );
But keep in mind that $where evaluates JavaScript for every document and cannot use indexes, so it will be slower.
Another option is to use an aggregation pipeline that projects the difference and then applies a $match on it:
db.users.aggregate([
{$project: {
diff: {$subtract: ["$user.FollowersCount", "$user.FavouritesCount"]},
// project remaining fields here
}
},
{$match: {diff: {$gt: 0}}}
])
In my experience I have found the second one to be much faster than the first.
To get the number of users that have "FollowersCount" greater than "FavouritesCount", you can use the aggregation framework, which has comparison operators you can apply.
The first approach uses the $cmp comparison operator in a $project stage, followed by a $match stage that filters documents on the $cmp value (1 means the first value is greater). You can then get the final user count with a $group stage that counts the filtered documents:
db.users.aggregate([
{
"$project": {
"hasMoreFollowersThanFavs": {
"$cmp": [ "$user.FollowersCount", "$user.FavouritesCount" ]
}
}
},
{ "$match": { "hasMoreFollowersThanFavs": 1 } },
{
"$group": {
"_id": null,
"count": { "$sum": 1 }
}
}
])
Another option is a single $redact stage, which combines the functionality of $project and $match above: it keeps the documents that match the condition using the $$KEEP system variable and discards those that don't using the $$PRUNE system variable:
db.users.aggregate([
{
"$redact": {
"$cond": [
{
"$eq": [
{ "$cmp": [ "$user.FollowersCount", "$user.FavouritesCount" ] },
1
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
"$group": {
"_id": null,
"count": { "$sum": 1 }
}
}
])
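All of the pipelines above compute the same thing: a count of documents where one nested field is greater than another. As a sanity check, here is a minimal in-memory sketch in plain JavaScript of that comparison. It is not MongoDB code; the helper name and the two extra sample users are hypothetical.

```javascript
// In-memory model of "count documents where user.FollowersCount > user.FavouritesCount".
// `countMoreFollowersThanFavs` is a hypothetical helper, not a MongoDB API.
function countMoreFollowersThanFavs(users) {
  return users.filter(
    (u) => u.user.FollowersCount > u.user.FavouritesCount // same test as $cmp === 1
  ).length;
}

const sampleUsers = [
  { user_name: "koteras", user: { FollowersCount: 461, FavouritesCount: 5223 } },
  { user_name: "alice", user: { FollowersCount: 900, FavouritesCount: 10 } }, // invented
  { user_name: "bob", user: { FollowersCount: 5, FavouritesCount: 5 } },      // invented
];

const matching = countMoreFollowersThanFavs(sampleUsers);
console.log(matching);
```

Note that equal values do not count: $cmp returns 0 for them, so only strictly greater follower counts are kept.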

mongodb $aggregate empty array and multiple documents

The collection has the documents below:
> db.test.find({name:{$in:["abc","abc2"]}})
{ "_id" : 1, "name" : "abc", "scores" : [ ] }
{ "_id" : 2, "name" : "abc2", "scores" : [ 10, 20 ] }
I want to get the length of the scores array for each document. How should I do that?
I tried the command below:
db.test.aggregate({$match:{name:"abc2"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
Result:
{ "_id" : null, "count" : 2 }
But the command below:
db.test.aggregate({$match:{name:"abc"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
returns nothing. Questions:
How should I get the length of scores for 2 or more documents in one command?
Why does the second command return nothing, and how should I check whether the array is empty?
So this is actually a common problem. When the array is "empty", the $unwind stage "removes" the document from the pipeline results.
In order to return a count of "0" for such an "empty" array, you need to do something like the following.
In MongoDB 2.6 or greater, just use $size:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$group": {
"_id": null,
"count": { "$sum": { "$size": "$scores" } }
}}
])
In earlier versions you need to do this:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$project": {
"name": 1,
"scores": {
"$cond": [
{ "$eq": [ "$scores", [] ] },
{ "$const": [false] },
"$scores"
]
}
}},
{ "$unwind": "$scores" },
{ "$group": {
"_id": null,
"count": { "$sum": {
"$cond": [
"$scores",
1,
0
]
}}
}}
])
The modern operation is simple since $size will just "measure" the array. In the latter case you need to "replace" the array with a single false value when it is empty, so that $unwind does not "destroy" the document.
Replacing with false then allows the $cond "ternary" to choose whether to add 1 or 0 to the $sum.
That is how you get the length of "empty arrays".
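The difference between counting after $unwind and measuring with $size can be seen in a small in-memory sketch in plain JavaScript. It only models the behaviour described here and does not use a MongoDB driver; the unwind helper is hypothetical.

```javascript
// In-memory model of $unwind: one output document per array element,
// and no output at all for a document whose array is empty.
// `unwind` is a hypothetical helper, not a MongoDB API.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((value) => ({ ...doc, [field]: value }))
  );
}

const testDocs = [
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] },
];

// Counting rows per _id after unwinding: _id 1 has vanished.
const unwound = unwind(testDocs, "scores");
const countsAfterUnwind = {};
for (const row of unwound) {
  countsAfterUnwind[row._id] = (countsAfterUnwind[row._id] || 0) + 1;
}

// Measuring the array directly, as $size does: _id 1 is kept with length 0.
const sizes = Object.fromEntries(testDocs.map((d) => [d._id, d.scores.length]));

console.log(countsAfterUnwind, sizes);
```

The unwind-based count has no entry for the document with the empty array, while the size-based result reports 0 for it.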
To get the length of scores for 2 or more documents, you just need to change the _id value in the $group stage, which holds the group-by key; in this case, group by the document _id.
Your second aggregation returns nothing because the $match stage passed a document with an empty scores array, and $unwind produces no output for it. To keep only documents whose array is not empty, your match query should be
{'scores.0': {$exists: true}} or {scores: {$not: {$size: 0}}}
Overall, your aggregation should look like this:
db.test.aggregate([
{ "$match": {"scores.0": { "$exists": true } } },
{ "$unwind": "$scores" },
{
"$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}
}
])

MongoDB $sort usage

This is my database/document.
Running:
db.Students.find().pretty()
Result is:
{
"_id" : 1,
"scores" : [
{
"attempt" : 1,
"score" : 5
},
{
"attempt" : 2,
"score" : 10
},
{
"attempt" : 3,
"score" : 7
},
{
"attempt" : 4,
"score" : 9
}
]
}
How to display the scores in descending order using $sort ?
Well, you cannot do that using .find(), as any .sort() modifier there sorts the documents and not the contents of your array. But you can do it using .aggregate():
db.Students.aggregate([
// Unwind the array to de-normalize
{ "$unwind": "$scores" },
// Sort the documents with the scores descending
{ "$sort": { "_id": 1, "scores.score": -1 } },
// Group back to an array
{ "$group": {
"_id": "$_id",
"scores": { "$push": "$scores" }
}}
])
So once all the elements are "de-normalized" into individual documents, the $sort pipeline stage takes care of re-arranging the order.
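For comparison, the result this pipeline is expected to produce can be sketched in memory with plain JavaScript. This only models the unwind, sort and regroup steps for the single sample document; it is not MongoDB code and the variable names are made up.

```javascript
// The sample document from the question.
const student = {
  _id: 1,
  scores: [
    { attempt: 1, score: 5 },
    { attempt: 2, score: 10 },
    { attempt: 3, score: 7 },
    { attempt: 4, score: 9 },
  ],
};

// Equivalent of unwinding, sorting by scores.score descending, and pushing
// the elements back into an array. The copy keeps the original untouched.
const sortedStudent = {
  ...student,
  scores: [...student.scores].sort((x, y) => y.score - x.score),
};

console.log(sortedStudent.scores.map((s) => s.score));
```

The scores come back in the order 10, 9, 7, 5, and the original array keeps its attempt order.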

mongodb count num of distinct values per field/key

Is there a query for calculating how many distinct values a field contains in a database?
For example, I have a field for country and there are 8 distinct country values (spain, england, france, etc.).
If someone adds more documents with a new country, I would like the query to return 9.
Is there an easier way than group and count?
MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.
There is a shell db.collection.distinct() helper as well:
> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]
> db.countries.distinct('country').length
4
As noted in the MongoDB documentation:
Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.
Here is an example using the aggregation API. To complicate the case, we group by case-insensitive words from an array property of the document.
db.articles.aggregate([
{
$match: {
keywords: { $not: {$size: 0} }
}
},
{ $unwind: "$keywords" },
{
$group: {
_id: {$toLower: '$keywords'},
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{ $sort : { count : -1} },
{ $limit : 100 }
]);
which gives results such as:
{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator and a $replaceRoot stage to get the counts.
For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:
db.users.aggregate([
{ "$group": {
"_id": { "$toLower": "$role" },
"count": { "$sum": 1 }
} },
{ "$group": {
"_id": null,
"counts": {
"$push": { "k": "$_id", "v": "$count" }
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Example Output
{
"user" : 67,
"superuser" : 5,
"admin" : 4,
"moderator" : 12
}
I wanted a more concise answer and came up with the following, using the documentation on aggregation and $group:
db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])
You can use Mongo Shell Extensions. It's a single .js import that you can append to your $HOME/.mongorc.js, or load programmatically if you're coding in Node.js/io.js too.
Sample
For each distinct value of a field, count the occurrences in documents, optionally filtered by a query:
> db.users.distinctAndCount('name', {name: /^a/i})
{
"Abagail": 1,
"Abbey": 3,
"Abbie": 1,
...
}
The field parameter could be an array of fields
> db.users.distinctAndCount(['name','job'], {name: /^a/i})
{
"Austin,Educator" : 1,
"Aurelia,Educator" : 1,
"Augustine,Carpenter" : 1,
...
}
To find the distinct values of field_1 in a collection with a WHERE-style condition as well, you can do the following:
db.your_collection_name.distinct('field_1', { /* query filter document goes here */ })
So, finding the distinct names in a collection where age > 25 looks like this:
db.your_collection_name.distinct('names', {'age': {"$gt": 25}})
Hope it helps!
I use this query:
var collection = "countries"; var field = "country";
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})
Output:
country, England: 3536
country, France: 238
country, Australia: 1044
country, Spain: 16
This query first gets all the distinct values and then counts the number of occurrences of each one, issuing one count query per value.
If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:
db.users.aggregate([
{ $group: { _id: '$country' } },
{ $count: 'countOfUniqueCountries' }
]);
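The two-stage pipeline above groups by country and then counts the groups. The same idea can be sketched in memory with plain JavaScript using a Set. This is only a model of the logic, not driver code; the countDistinct helper and the sample documents are hypothetical.

```javascript
// In-memory model of { $group: { _id: "$country" } } followed by { $count: ... }:
// collect the distinct values of a field, then count them.
// `countDistinct` is a hypothetical helper, not a MongoDB API.
function countDistinct(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

const countryDocs = [
  { country: "Spain" },
  { country: "England" },
  { country: "Spain" },   // duplicate, counted once
  { country: "France" },
];

const before = countDistinct(countryDocs, "country");
// Adding a document with a new country increases the result by one.
const after = countDistinct([...countryDocs, { country: "Australia" }], "country");
console.log(before, after);
```

Unlike db.collection.distinct(), which returns the whole array of values to the client, the pipeline version returns only the final number.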