Difference between count() and find().count() in MongoDB - mongodb

What is the difference between, I basically wanted to find all the documents in the mycollection.
db.mycollection.count() vs
db.mycollection.find().count()?
They both returns the same result. Is there any reason why would somebody choose the count() vs the find().count()? In contrast to the fact that find() has a default limit applied (correct me if I'm wrong) to which you would have to type "it" in order to see more in the shell.

db.collection.count() and cursor.count() are simply wrappers around the count command thus running db.collection.count() and cursor.count() with/without the same will return the same query argument, will return the same result. However the count result can be inaccurate in sharded cluster.
MongoDB drivers compatible with the 4.0 features deprecate their
respective cursor and collection count() APIs in favor of new APIs for
countDocuments() and estimatedDocumentCount(). For the specific API
names for a given driver, see the driver documentation.
The db.collection.countDocuments method internally uses an aggregation query to return the document count while db.collection.estimatedDocumentCount/ returns documents count based on metadata.
It is worth mentioning that the estimatedDocumentCount output can be inaccurate as mentioned in the documentation.

db.collection.count() without parameters counts all documents in a collection. db.collection.find() without parameters matches all documents in a collection, and appending count() counts them, so there is no difference.
This is confirmed explicitly in the db.collection.count() documentation:
To count the number of all documents in the orders collection, use the
following operation:
db.orders.count()
This operation is equivalent to the following:
db.orders.find().count()

As is mentioned in another answer by sheilak, the two are equivalent - except that db.collection.count() can be inaccurate for sharded clusters.
The latest documentation says:
count() is equivalent to the db.collection.find(query).count()
construct.
And then,
Sharded Clusters
On a sharded cluster, db.collection.count() can result in an
inaccurate count if orphaned documents exist or if a chunk migration
is in progress.
The documentation explains how to mitigate this bug (use an aggregate).

db.collection.count() is equivalent to the db.collection.find(query).count() construct.
Examples
Count all Documents in a Collection
db.orders.count()
This operation is equivalent to the following:
db.orders.find().count()
Count all Documents that Match a Query
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012'):
db.orders.count( { ord_dt: { $gt: new Date('01/01/2012') } } )
The query is equivalent to the following:
db.orders.find( { ord_dt: { $gt: new Date('01/01/2012') } } ).count()
As per the documentation in the following scenario db.collection.count() can be inaccurate :
On a sharded cluster, db.collection.count() without a query predicate can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
After an unclean shutdown of a mongod using the Wired Tiger storage engine, count statistics reported by count() may be inaccurate.

I believe if you are using some kind of pagination like:
find(query).limit().skip().count()
You will not get the same result as
count(query)
So in cases like this, if you want to get the total, I think you might have to use both.

Related

MongoDB duplicate key in single document

relative noob.
I have a MongoDB collection, where it appears one of the keys has been duplicated within a document. The document count is 13603, but aggregating by the key and counting results in 13604. I have run this 3 times, 30 minutes apart, so I know it's not a timing issue. I am trying to find the document with the duplicate key, but don't understand aggregations enough to find it.
I found a similar thread here but I see no solution for finding the "corrupt" document within a collection.
This is NOT a duplicate key across documents or a duplicate document issue; it is a duplicate key within the same document issue. Any help is appreciated.
screen-shot comparing document count to key-aggregation count
It is most probably incorrect:
db.collection.count()
Check this ticket here
Try to count with this:
db.collection.countDocuments({})
The db.collection.count() is just reading the collection count metadata , it is fast but inacurate especially in sharded cluster since sometimes there is orphaned documents not updated in collection metadata , you need to clean orphans and then try again.
From the documentation:
Avoid using the db.collection.count() method without a query predicate since without the query predicate, the method returns results based on the collection's metadata, which may result in an approximate count. In particular,
On a sharded cluster, the resulting count will not correctly filter out orphaned documents.
After an unclean shutdown, the count may be incorrect.
For counts based on collection metadata, see also collStats pipeline stage with the count option.

MongoDB collection toArray() length is 20 less than collection.count()

I am using mongoDB version 3.6.3 on a ubuntu operating system.
I have created a collection with 100 records
To manipulate the data on the mongo shell I assign cursor like below
cur = db.dummyData.find({}, {_id: 0})
now the cur.count() is 100 but cur.toArray().length is 80.
I not sure why this is happening. I have tried with bunch of different collections toArray() length is always 20 less than the actual count.
Would appreciate any help to understand this behavior.
MongoDB keeps a running count of documents for each collection which is updated for each insert/delete operation. Some occurrences such a hard shutdown can result in this number in the metadata differing from the actual collection.
The cursor.count() function queries the MongoDB asking for this number from the metadata without fetching any documents, so it is very fast. The cursor.itcount() function will actually fetch the documents, so it will run slower, but will always return an accurate count.
To correct the count in the collections metadata, run db.collectionName.validate(true) on the collection in question from the mongo shell.

What is the difference between COUNT_SCAN and IXSCAN?

Whenever I run a count query on MongoDB with explain I can see two different stages COUNT_SCAN and IXSCAN. I want to know the difference between them according to performance and how can I improve the query.
field is indexed.
following query:
db.collection.explain(true).count({field:1}})
uses COUNT_SCAN and query like:
db.collection.explain(true).count({field:"$in":[1,2]})
uses IXSCAN.
The short: COUNT_SCAN is the most efficient way to get a count by reading the value from an index, but can only be performed in certain situations. Otherwise, IXSCAN is performed following by some filtering of documents and a count in memory.
When reading from secondary the read concern available is used. This concern level doesn't consider orphan documents in sharded clusters, and so no SHARDING_FILTER stage will be performed. This is when you see COUNT_SCAN.
However, if we use read concern local, we need to fetch the documents in order to perform the SHARDING_FILTER filter stage. In this case, there are multiple stages to fulfill the query: IXSCAN, then FETCH then SHARDING_FILTER.

Incorrect Count returned by MongoDB (WiredTiger)

This sounds odd, and I hope I am doing something wrong, but my MongoDB collection is returning the Count off by one in my collection.
I have a collection with (I am sure) 359671 documents. However the count() command returns 359670 documents.
I am executing the count() command using the mongo shell:
rs0:PRIMARY> db.COLLECTION.count()
359670
This is incorrect.
It is not finding each and every document in my collection.
If I provide the following query to count, I get the correct result:
rs0:PRIMARY> db.COLLECTION.count({_id: {$exists: true}})
359671
I believe this is a bug in WiredTiger. As far as I am aware each document has the same definition, an _id field of an integer ranging from 0 to 359670, and a BinData field. I did not have this problem with the older storage engine (or Mongo 2, either could have caused the issue).
Is this something I have done wrong? I do not want to use the {_id: {$exists: true}} query as that takes 100x longer to complete.
According to this issue, this behaviour can occur if mongodb experiences a hard crash and is not shut down gracefully. If not issuing any query, mongodb probably just falls back to the collected statistics.
According to the article, calling db.COLLECTION.validate(true) should reset the counters.
As now stated in the doc, db.collection.count() without using a query parameter, returns results based on the collection’s metadata:
This may result in an approximate count. In particular:
On a sharded cluster, the resulting count will not correctly filter out orphaned documents.
After an unclean shutdown, the count may be incorrect.
When using a query parameter, as you did in the second query ({_id: {$exists: true}}), then it forces count to not use the collection's metadata, but to scan the collection instead.
Starting Mongo 4.0.3, count() is considered deprecated and the following alternatives are recommended instead:
Exact count of douments:
db.collection.countDocuments({})
which under the hood actually performs the following "expensive", but accurate aggregation (expensive since the whole collection is scanned to count records):
db.collection.aggregate([{ $group: { _id: null, n: { $sum: 1 } } }])
Approximate count of documents:
db.collection.estimatedDocumentCount()
which performs exactly what db.collection.count() does/did (it's actually a wrapper around count), which uses the collection’s metadata.
This is thus almost instantaneous, but may lead to an approximate result in the particular cases mentioned above.

Mongos count items not real?

I have a strange behavior of count() function in a mongos instance.
More than one hour ago I updated about 8.000 items in posts collection because I needed to convert tags objects to Array.
Now, when I query mongos with:
mongos> db.posts.find({blog: 'blog1', tags: {$type: 3}}).count()
4139
mongos> db.posts.findOne({blog: 'blog1', tags: {$type: 3}})
null
Why count() shows 4139 items and findOne returns a null value, even if RS are synchronized ?
EDIT:
There are 4 RS (all synchronized).
I also did the same count query on all PRIMARIES and the result is always 0.
Only if I count on mongos the result is 4139!
count() takes corresponding value from metadata field count and on a sharded environment can show wrong value (there is a bug). It may count chunks which are currently moved by the balancer. I assume that you have more than one shard.
I would not really rely on count on environment with shards and use simple M/R script instead (try to see it with M/R by the way) until above mentioned bug will be fixed (2.5?). You can also take a look at my question regarding count - db.collection.count() returns a lot more documents for sharded collection in MongoDB
If count() and limit() are acting weird, maybe your best shot is trying to repair the database. Go into the Mongo Shell and enter the following command:
db.repairDatabase()
For further explanations you can check the MongoDB docs.