Mongo aggregation does not return the whole result, only a part of it - mongodb

Can somebody please tell me where the problem is in this simple aggregation command:
db.test.aggregate([
  {
    $group: {
      _id: "$type",
      numbers: { $sum: 1 }
    }
  }
]).pretty()
The collection has about 2 million documents, and every one of them has a type field. But the command returns only a few results, plus the message Type "it" for more. If I type "it", it returns the next part of the aggregation result, and so on until the end. But I want to have the whole aggregation in one result. What am I doing wrong?
Thanks.

The mongo shell won't print the whole result at once: it shows only the first batch of a cursor (20 documents by default) and then prompts with Type "it" for more.
Otherwise, printing a very large result (the collection has about 2 million documents) could exhaust the memory of your server/computer.
But if you'd like to get all the data in one place, it's better to collect it with a script.
You can write the script in your programming language: query the db, iterate through the cursor batch by batch, and store the documents in a variable.
Example
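A minimal sketch of such a script in Python, assuming the PyMongo driver and a `collection` object that points at db.test. The helper name `count_by_type` is made up for illustration. It relies only on the fact that `aggregate()` returns an iterable cursor, so `list()` drains every batch into one in-memory list.

```python
def count_by_type(collection):
    """Return one {"_id": <type>, "numbers": <count>} document per distinct type."""
    pipeline = [
        {"$group": {"_id": "$type", "numbers": {"$sum": 1}}},
    ]
    # aggregate() returns a cursor; list() keeps reading batches until it is exhausted.
    return list(collection.aggregate(pipeline))
```

With a real connection this could be called as `count_by_type(MongoClient()["mydb"]["test"])`, where the database name `mydb` is a placeholder. Since the pipeline groups by type, the result holds one document per distinct type, not one per source document.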

Related

Mongodb aggregate $count

I would like to count the number of documents returned by an aggregation.
I'm sure my initial aggregation works, because I use it later in my program. For it I created a pipeline variable (here called pipelineTest; ask me if you want to see it in detail, but it's quite long, which is why I don't include it here).
To count the number of documents returned, I append this stage to my pipeline:
{$count: "totalCount"}
Now I would like to get (or log) the totalCount value. What should I do?
Here is the aggregation :
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
console.log(cursorTest.?)
Thanks for your help. I have read a lot of documentation about aggregation and I still don't understand how to read the result of an aggregation...
Assuming you're using async/await syntax, you need to await the result of the aggregation.
aggregate() returns a cursor, not the documents themselves, so convert the cursor to an array, take the first element of that array and access totalCount.
pipelineTest.push({$count: "totalCount"});
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);
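For comparison, here is a hedged sketch of the same idea in Python with PyMongo. The helper name `count_pipeline_results` is hypothetical. One detail it handles: `$count` emits no document at all when nothing reaches that stage, so indexing the first element blindly would fail on an empty result.

```python
def count_pipeline_results(collection, pipeline):
    """Run `pipeline` with a trailing $count stage and return the count as an int."""
    # Copy the pipeline so the caller's list is not modified.
    counting_pipeline = list(pipeline) + [{"$count": "totalCount"}]
    # $count yields a single document, or none when no documents reach the stage.
    first = next(iter(collection.aggregate(counting_pipeline)), None)
    return first["totalCount"] if first is not None else 0
```

Copying the pipeline is a design choice: the original list stays reusable for the later aggregation the question mentions.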
Aggregation
db.mycollection.aggregate([
{
$count: "totalCount"
}
])
Result
[ { totalCount: 3 } ]
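Your Particulars
Try the following:
pipelineTest.push({$count: "totalCount"});
// aggregate() returns a cursor, so read its documents before accessing totalCount
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount)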

How to retrieve the total cases in my data set on a MongoDB query?

I want the total number of cases in all my documents,
This is the query I tried to use:
db.coviddatajson.aggregate([
{ $group: { _id: null, total: { $sum: "$total_cases"} } }
])
For some reason the result is 0, which does not make sense: it should be at least 1000, and probably a few thousand. Anything other than zero would be plausible.
This is the dataset I am using:
https://covid.ourworldindata.org/data/owid-covid-data.json
What am I doing wrong here?
Any ideas on how to fix this query?
The total_cases field is inside the data array, and in a $group stage $sum only adds numeric values (anything else is ignored, hence the 0). So first sum data.total_cases within each document using $project, then pass that per-document total to the $group stage to compute the overall sum:
db.coviddatajson.aggregate([
  {
    $project: { total_cases: { $sum: "$data.total_cases" } }
  },
  {
    $group: {
      _id: null,
      total: { $sum: "$total_cases" }
    }
  }
])
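To see what the two stages compute, here is a plain-Python walk-through that needs no database. The function name, the `location` field and the sample numbers are made up for illustration. Only the `data` array with `total_cases` entries comes from the question.

```python
def project_then_group(documents):
    """Mimic the $project + $group pipeline on a list of dicts."""
    # Stage 1 ($project): sum data[].total_cases inside each document,
    # skipping entries where the field is missing or not a number.
    per_document = [
        sum(
            entry["total_cases"]
            for entry in doc.get("data", [])
            if isinstance(entry.get("total_cases"), (int, float))
        )
        for doc in documents
    ]
    # Stage 2 ($group with _id: null): add the per-document totals together.
    return {"_id": None, "total": sum(per_document)}


# Made-up sample documents in the same shape as the data set (values are illustrative only).
sample = [
    {"location": "A", "data": [{"total_cases": 1}, {"total_cases": 3}, {"date": "2020-03-01"}]},
    {"location": "B", "data": [{"total_cases": 10}]},
]
result = project_then_group(sample)
print(result)  # {'_id': None, 'total': 14}
```

The first document contributes 4, the second 10. The entry without total_cases is simply skipped, which mirrors how $sum ignores non-numeric values.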
The data set has some issues:
The document is bigger than 16MiB, and you cannot load documents larger than 16MiB into MongoDB. This is an internal limitation. You would need to split the document into sub-documents.
The document contains data for each country but also summarized data for "World". Do you have to exclude the "World" data? Can you use it instead of a manual summary?
The data is not consistent. For example, some countries do not provide the number of male/female smokers or the median age. Not all countries provide all data for each date, so you may have missing values. How do you want to deal with them?
Do you want a simple sum of all total_cases? If yes, the query would be easy, but the result would be pointless (15'773'189'214 total cases, twice the population of the world).

Best way to count documents in mongoDB

We have a collection with a large number of documents, let's say around 100k. We now want to count the number of documents which have the key x set.
If I try it with Collection.countDocuments({ x: { $exists: true }}) I get the result, but it immediately triggers a warning in the console: Query Targeting: Scanned Objects / Returned has gone above 1000.
So, is there a better way to count the documents? There is an index on the field; is it possible to get the length of the index?
Thanks
There's no real way of viewing the index trees in Mongo. What other people have linked just returns the size of the tree, and I'm not sure how useful that information is in this context.
Now to your question: is this the best way to count?
The answer is Yes ... -ish.
countDocuments is a wrapper function, it just simulates the following pipeline:
db.collection.aggregate([
  { $match: <query> },
  { $group: { _id: null, n: { $sum: 1 } } }
])
This pipeline is the most efficient way to go, but the difference between running this aggregation yourself and using the wrapper function is about 100-200 milliseconds, depending on your machine spec.
Meaning that if you're looking for "way" better performance, you're not going to find it.
With that said, this warning is stupid: it just means you have more than 1000 documents with that field. Its true purpose is to alert you when you're trying to query 1-20 documents without a proper index.
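A rough Python sketch of the two equivalent ways to count, assuming PyMongo. Both helper names are hypothetical, and `field` stands for the key x from the question. `count_documents()` and `aggregate()` are the driver calls being compared.

```python
def count_with_field(collection, field):
    """Count documents that have `field` set, using the wrapper method."""
    return collection.count_documents({field: {"$exists": True}})


def count_with_field_pipeline(collection, field):
    """Same count, written as the $match + $group pipeline described above."""
    pipeline = [
        {"$match": {field: {"$exists": True}}},
        {"$group": {"_id": None, "n": {"$sum": 1}}},
    ]
    # $group emits no document when nothing matches, so fall back to 0.
    result = next(iter(collection.aggregate(pipeline)), None)
    return result["n"] if result is not None else 0
```

Either function should return the same number. Timing both against your own collection is the only reliable way to see whether the difference matters.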
You can use the indexSizes field returned by the stats() method.
The stats() method "Returns statistics about the collection".
See example here :
https://docs.mongodb.com/manual/reference/method/db.collection.stats/#basic-stats-lookup
{
...,
"indexSizes" : {
"_id_" : 237568,
"cuisine_1" : 143360,
"borough_1_cuisine_1" : 151552,
"borough_1_address.zipcode_1" : 151552
},
...
}
The indexSizes key returns the size as storage space used, not a document count.
Check with explain() whether the index is being used (and add the output to the question).
You can use the hint option to check the performance with a specific index.
Pre-calculating the count with the $inc operator might be a good option, if it fits your use case.
Try cursor.count() to see if it is faster. countDocuments should be faster, but there is no harm in checking:
https://docs.mongodb.com/manual/reference/method/cursor.count/
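The pre-calculated counter idea could look roughly like this in Python with PyMongo. The `counters` collection, its `_id` naming and both helper names are assumptions for illustration. The sketch only works if every write that adds or removes the key x also adjusts the counter.

```python
def adjust_count(counters, name, delta=1):
    """Add `delta` to the named counter, creating it on first use."""
    counters.update_one(
        {"_id": name},
        {"$inc": {"count": delta}},
        upsert=True,  # insert the counter document if it does not exist yet
    )


def read_count(counters, name):
    """Return the stored count, or 0 if the counter was never written."""
    doc = counters.find_one({"_id": name})
    return doc["count"] if doc is not None else 0
```

Reading the counter is then a single lookup by _id instead of a scan. The trade-off is that the application has to keep it in sync.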

how to populate field of one collection with count query results of another collection?

Kind of a complex one here, and I'm pretty new to Mongo, so hopefully someone can help. I have a db of users. Each user has a state/province listed. I'm trying to create another collection with the total number of users in each state/province. Because users sign up pretty regularly, this will be a growing total that I'm trying to generate and display on a map.
I'm able to query the database to find the total number of users in a specific state, but I want to do this for all users and come out with a list of totals for all states/provinces. I'd like a separate collection in the DB listing all states/provinces, with a TOTAL field dynamically populated by the count query on the other collection. But I'm not sure how to make the result of a query the value of a field in another collection.
I used this to get the user totals:
db.users.aggregate([
{"$group" : {_id:"$state", count:{$sum:1}}}
])
My main question is how to make the results of a query the value of a field in each corresponding record in another collection. Or if that's even possible.
Thanks for any help or guidance.
It looks like On-Demand Materialized Views (added in MongoDB 4.2) should solve your problem!
You can create an On-Demand Materialized View using the $merge operator.
A possible definition of the Materialized View could be:
updateUsersLocationTotal = function() {
  db.users.aggregate( [
    // Optional: put a $match stage here if you need to filter users first; otherwise leave it out
    { $group: { _id: "$state", users_quantity: { $sum: 1 } } },
    { $merge: { into: "users_total", whenMatched: "replace" } }
  ] );
};
You then perform updates just by calling updateUsersLocationTotal().
After that you can query the view just like a normal collection, using db.users_total.find() or db.users_total.aggregate().
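The same refresh could be triggered from application code. Here is a hedged sketch in Python, assuming PyMongo and a server that supports `$merge` (4.2 or later, per the answer). The helper names are made up. The stage contents mirror the shell function above.

```python
def build_state_totals_pipeline(target="users_total"):
    """Build the $group + $merge pipeline that refreshes the per-state totals."""
    return [
        {"$group": {"_id": "$state", "users_quantity": {"$sum": 1}}},
        # Replace the existing total for a state, or insert it if the state is new.
        {"$merge": {"into": target, "whenMatched": "replace", "whenNotMatched": "insert"}},
    ]


def refresh_state_totals(users, target="users_total"):
    """Run the pipeline; $merge writes to `target`, so no documents come back."""
    # Draining the (empty) cursor makes sure the command has completed.
    list(users.aggregate(build_state_totals_pipeline(target)))
```

Calling `refresh_state_totals(db.users)` on a schedule, or after sign-ups, keeps the totals collection reasonably fresh. How often to run it depends on how current the map needs to be.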

mongodb, pymongo, aggregate gives strange output (something about cursor)

I am trying to get a list of people with the most entries in my database.
print db.points.aggregate(
[
{
"$group":
{
"_id": "$created.user",
"count":{"$sum":1}
}
},
{
"$sort":
{"count":-1}
}
]
)
An entry looks like this :
{
u'id': u'342902',
u'_id': ObjectId('555af76a029d3b1b0ff9a4be'),
u'type': u'node',
u'pos': [48.9979746, 8.3719741],
u'created': {
u'changeset': u'7105928',
u'version': u'4',
u'uid': u'163673',
u'timestamp': u'2011-01-27T18:05:54Z',
u'user': u'Free_Jan'
}
}
I know that created.user exists and is otherwise accessible.
Still the output i get is:
<pymongo.command_cursor.CommandCursor object at 0x02ADD6B0>
Shouldn't I get a sorted list ?
The result of an aggregation query is a cursor, just as for a regular find query. In pymongo the CommandCursor is iterable, so you can do either of the following:
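cursor = db.points.aggregate(...)
# Option 1: read all results into a list
print(list(cursor))
# Option 2: iterate over the results one by one (use instead of option 1, not after it)
for document in cursor:
    print(document)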
Note: as arun noticed, in both cases, i.e. whether you build a list from the cursor or iterate over it in a for loop, you will not be able to iterate over the cursor again. If you need the results more than once, the first option is better: the list is already in memory, so you can reuse it as often as you want.
The reason you cannot re-iterate is that the cursor actually lives on the server, which sends the data chunk by chunk. Once it has sent you all the data (or the server terminates), the cursor is destroyed.
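The one-pass behaviour can be seen without a database by using a generator as a stand-in for the cursor. This is only an analogy: `fake_cursor` and its documents are made up, and a real CommandCursor fetches batches from the server rather than yielding local values.

```python
def fake_cursor():
    """Stand-in for a CommandCursor: yields each document once, then is exhausted."""
    yield {"_id": "Free_Jan", "count": 3}
    yield {"_id": "other_user", "count": 1}


cursor = fake_cursor()
first_pass = list(cursor)   # consumes every document
second_pass = list(cursor)  # nothing left to read

print(len(first_pass), len(second_pass))  # prints: 2 0
```

Keeping `first_pass` around is the practical fix: the list can be sorted, sliced and re-read as often as needed.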