Is the count function exact in MongoDB?

Is count() in MongoDB exact for collections with many documents, or is it an approximate number? If it is approximate, is there a function that returns the exact number?

If your MongoDB is version 4.0.3 or higher, use this for an accurate count:
db.collection.countDocuments({})
According to the latest documentation, count() can be inaccurate in cases such as an unclean shutdown, and it will be deprecated.
See:
https://docs.mongodb.com/manual/reference/method/db.collection.countDocuments/

It is an exact count. If it were not an exact count, the documentation would reflect that.
Counts the number of documents in a collection.
Reference - http://docs.mongodb.org/manual/reference/command/count/

Related

Why MongoDB find has same performance as count

I am running tests against my MongoDB and for some reason find has the same performance as count.
Stats:
orders collection size: ~20M,
orders with product_id 6: ~5K
product_id is indexed for improved performance.
Query: db.orders.find({product_id: 6}) vs db.orders.find({product_id: 6}).count()
Result: the orders for the product vs. 5K, both after 0.08ms
Why isn't count dramatically faster? It could find the positions of the first and last elements using the product_id index.
As the MongoDB documentation for count states, calling count is the same as calling find, but instead of returning the documents, it just counts them. To perform this count, it iterates over the cursor. It can't just read the index and determine the number of documents from the first and last values of some ID, especially since the index can be on a field that is not the ID (and MongoDB IDs are not auto-incrementing). So find and count are basically the same operation, except that count goes over the documents, sums their number, and returns that to you instead of the documents themselves.
Also, if you want a faster result, you could use estimatedDocumentCount, which goes straight to the collection's metadata. The trade-off is that it does not take a query filter, so you lose the ability to ask "What number of documents can I expect if I trigger this query?". If you need a faster count of the documents matching a query, you could use countDocuments, which is a wrapper around an aggregate query. From my knowledge of MongoDB, the provided query looks like the fastest way to count query results without calling count. I guess this should be the preferred way, performance-wise, to count documents from now on (since it was introduced in version 4.0.3).
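To make "a wrapper around an aggregate query" more concrete, here is a minimal sketch of the pipeline that the countDocuments documentation describes: a $match stage followed by a $group stage that adds 1 per document. The helper name countPipeline is hypothetical and not part of any MongoDB API; the orders collection and product_id filter are taken from the question.

```javascript
// Builds the aggregation pipeline that countDocuments(filter) is documented to wrap.
// countPipeline is a hypothetical helper name used only for illustration.
function countPipeline(filter) {
  return [
    { $match: filter },                        // keep only the matching documents
    { $group: { _id: null, n: { $sum: 1 } } }  // add 1 for every remaining document
  ];
}

// In the mongo shell, assuming the orders collection from the question:
//   db.orders.aggregate(countPipeline({ product_id: 6 }))
const pipeline = countPipeline({ product_id: 6 });
console.log(JSON.stringify(pipeline));
```

Because the $match stage still has to visit every matching index entry or document, this form should not be expected to beat find().count() by much; its advantage is that the result reflects the documents actually matched rather than collection metadata.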

How do I set `skip` (offset) for reactivemongo mongodb driver?

I need to skip a number of documents (offset) from a query, and only return limit number of documents that go after. I know the following naive approach:
collection.find(BSONDocument())
.cursor[T].collect[List](offset+limit).map(_.drop(offset))
but it is not really desired because it will load offset+limit number of documents in JVM memory, whereas I'd like to filter them on the "database" side.
Solution: use QueryOpts. Example:
collection.find(BSONDocument())
.options(QueryOpts(skipN = offset))
.cursor[T].collect[List](limit)
Note that skip is not very efficient: MongoDB does not support efficient pagination, so it just skips the requested number of documents by iterating through them.
VasyaNovikov's answer is certainly correct. ReactiveMongo also offers a more intuitive API:
collection.find(BSONDocument())
.skip(offset)
.cursor[T]
.collect[List](limit, Cursor.FailOnError[List[T]]())

how to get the max value of a field in MongoDB

Given documents like:
{id:4563214321,updateTime:NumberLong("124354354")}
New documents are always being added to the db, so I would like to always get the most recently updated document, i.e. the one with the largest update time. How should I write the shell script? Thanks in advance.
You can use a combination of limit and sort to achieve this goal.
db.collectionName.find({}).sort({"updateTime" : -1}).limit(1)
This sorts all of your documents by update time in descending order and then returns only the one with the largest value.
I would recommend adding an index to this field to improve performance.
This is a duplicate question; you can find an answer at this link:
Using findOne in mongodb to get element with max id
Use it like this:
db.collection.find().sort({updateTime:-1}).limit(1).pretty()
With findOne, you can use this syntax:
db.collection.findOne({$query:{},$orderby:{updateTime:-1}})

MongoDB query: Using Limit together with $near skips few documents

I am currently developing an app which gets a specific number of documents from a collection if their location coordinates fall within a certain distance. I am using an active record library for CodeIgniter, and the generated query is as follows:
db.updates.find({locs: { $near: [72.844102008984, 19.130207090604 ], $maxDistance: 5000 }, posted_on : { $lt :1398425538.1942 },}).sort( { posted_on: -1 } ).limit(10).toArray()
The problem I am facing is that the above query skips a few documents which should actually be returned. If I remove limit(10) from the query, the correct documents are returned.
I am not sure: does using limit() in MongoDB omit some results, or does it restrict the results to only the closest (nearest) documents?
P.S. The documents skipped when using the limit are not always the same; the results appear random.
I suspect you are running into problems with the special nature of the $near query. $near performs both a limit() and a sort() on the cursor returning the results -
Specifies a point for which a geospatial query returns the closest documents first. The query sorts the documents from nearest to farthest.
By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
http://docs.mongodb.org/manual/reference/operator/query/near/
While the documentation does specifically discuss overriding the limit of 100 with your own limit call
You can further limit the number of results using cursor.limit().
It is silent on adding your own sort() or both sorting and overriding the limit at the same time. I suspect you are running into side effects of doing both. Note that it's not incorrect to do both - it just may not produce the results you are looking for. I'd suggest trying the same query using $geoWithin
http://docs.mongodb.org/manual/reference/operator/query/geoWithin/
$geoWithin does not apply a sort or a limit on the results, so it gives you something of a more raw result set.
Do you have any identical posted_on dates in the system? I recommend sorting by a second key, perhaps _id. If the sort order is non-deterministic, the system may skip documents in a non-deterministic manner. Adding the _id field to your sort order is generally not that expensive if you have an index on the other fields, as they will already be very close to the correct order and _id is part of all indexes. ("By default, all collections have an index on the _id field, and applications and users may add additional indexes to support important queries and operations." http://docs.mongodb.org/manual/core/index-single/ )
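The effect of a second sort key can be sketched without a database. The snippet below is only an in-memory illustration, not MongoDB's implementation: the documents are made up, and byPostedOnThenId and topN are hypothetical helper names. It shows that once ties on posted_on are broken by _id, the first N results no longer depend on the order in which the documents happen to be stored.

```javascript
// Hypothetical documents; two of them share the same posted_on value.
const updates = [
  { _id: 3, posted_on: 200 },
  { _id: 1, posted_on: 300 },
  { _id: 2, posted_on: 200 },
  { _id: 4, posted_on: 100 }
];

// Comparator equivalent to a sort spec of { posted_on: -1, _id: 1 }:
// newest first, ties broken by ascending _id.
function byPostedOnThenId(a, b) {
  if (a.posted_on !== b.posted_on) return b.posted_on - a.posted_on;
  return a._id - b._id;
}

// Emulates sort(...).limit(n) on a copy of the in-memory array.
function topN(docs, n) {
  return [...docs].sort(byPostedOnThenId).slice(0, n);
}

// Same documents in two different storage orders give the same first two ids.
const firstTwo = topN(updates, 2).map(d => d._id);
const firstTwoReversed = topN([...updates].reverse(), 2).map(d => d._id);
console.log(firstTwo, firstTwoReversed);
```

In the shell, the corresponding change to the query from the question would be .sort({ posted_on: -1, _id: 1 }) before .limit(10).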

What is the difference between db.collection_name.find().count() and db.collection_name.count() in MongoDB

I believe they both return the same results, but which one is better to use, and in what scenarios?
Here is what the documentation says:
Returns the count of documents that would match a find() query. The db.collection.count() method does not perform the find() operation but instead counts and returns the number of results that match a query.
There's no difference. One is implemented in terms of the other:
> db.users.count
function ( x ){
return this.find( x ).count();
}
From my understanding, they are equivalent to each other. db.collection_name.count() does not use the find() function; therefore, I would assume that it performs slightly better.
Check out the official MongoDB page referencing this.
MongoDB Count