Mongos count items not real? - mongodb

I have a strange behavior of count() function in a mongos instance.
More than one hour ago I updated about 8.000 items in posts collection because I needed to convert tags objects to Array.
Now, when I query mongos with:
mongos> db.posts.find({blog: 'blog1', tags: {$type: 3}}).count()
4139
mongos> db.posts.findOne({blog: 'blog1', tags: {$type: 3}})
null
Why does count() show 4139 items while findOne() returns null, even though the replica sets are synchronized?
EDIT:
There are 4 RS (all synchronized).
I also did the same count query on all PRIMARIES and the result is always 0.
Only if I count on mongos the result is 4139!

count() takes its value from the count field in the collection metadata, and in a sharded environment it can return a wrong value (there is a bug). It may count documents in chunks that the balancer is currently moving. I assume that you have more than one shard.
I would not rely on count() in a sharded environment. Use a simple M/R script instead (try checking the number with M/R, by the way) until the above-mentioned bug is fixed (2.5?). You can also take a look at my question regarding count: "db.collection.count() returns a lot more documents for sharded collection in MongoDB".
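A quick way to check whether the fast count disagrees with what a query actually returns is to compare count() with itcount() on the same cursor. The helper below is a minimal sketch, not part of the original answer: compareCounts is a hypothetical name, and it assumes a mongo shell collection object (for example db.posts) whose cursors expose count() and itcount().

```javascript
// Hypothetical helper: compare the count command with a count obtained
// by actually iterating the cursor. `coll` is a shell collection such as db.posts.
function compareCounts(coll, query) {
  var fast = coll.find(query).count();        // answered by the count command
  var iterated = coll.find(query).itcount();  // fetches and counts every document
  return { fast: fast, iterated: iterated, mismatch: fast !== iterated };
}
```

From mongos this could be called as compareCounts(db.posts, {blog: 'blog1', tags: {$type: 3}}). A mismatch would point at the count path rather than at the data itself. Note that itcount() reads every matching document, so it is slow on large result sets.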

If count() and limit() are acting weird, maybe your best shot is trying to repair the database. Go into the Mongo Shell and enter the following command:
db.repairDatabase()
For further explanations you can check the MongoDB docs.

Related

MongoDB collection toArray() length is 20 less than collection.count()

I am using MongoDB version 3.6.3 on an Ubuntu operating system.
I have created a collection with 100 records.
To manipulate the data in the mongo shell I assign a cursor as below:
cur = db.dummyData.find({}, {_id: 0})
Now cur.count() is 100 but cur.toArray().length is 80.
I am not sure why this is happening. I have tried a bunch of different collections, and the toArray() length is always 20 less than the actual count.
Would appreciate any help to understand this behavior.
MongoDB keeps a running count of documents for each collection, which is updated on each insert/delete operation. Some occurrences, such as a hard shutdown, can result in this number in the metadata differing from the actual collection.
The cursor.count() function asks MongoDB for this number from the metadata without fetching any documents, so it is very fast. The cursor.itcount() function will actually fetch the documents, so it will run slower, but will always return an accurate count.
To correct the count in the collection's metadata, run db.collectionName.validate(true) on the collection in question from the mongo shell.
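A gap of exactly 20 also has another plausible cause, offered here as an assumption and not as something the answer above claims. When a cursor is assigned without the var keyword, the mongo shell prints the expression, which iterates the first batch of 20 documents. count() is unaffected by the cursor position, while toArray() only returns what is left. The toy model below illustrates that effect in plain JavaScript. ToyCursor and shellPrint are made-up names, not the shell's real implementation.

```javascript
// Toy stand-in for a shell cursor (hypothetical, for illustration only).
function ToyCursor(docs) {
  this.docs = docs;
  this.pos = 0;
}
// count() does not depend on how far the cursor has been iterated.
ToyCursor.prototype.count = function () { return this.docs.length; };
ToyCursor.prototype.hasNext = function () { return this.pos < this.docs.length; };
ToyCursor.prototype.next = function () { return this.docs[this.pos++]; };
// toArray() drains only the documents that have not been consumed yet.
ToyCursor.prototype.toArray = function () {
  var rest = [];
  while (this.hasNext()) rest.push(this.next());
  return rest;
};

// Mimics the shell displaying the first batch of a cursor.
function shellPrint(cursor, batchSize) {
  var shown = 0;
  while (cursor.hasNext() && shown < batchSize) {
    cursor.next();
    shown++;
  }
  return shown;
}

var docs = [];
for (var i = 0; i < 100; i++) docs.push({ n: i });

var cur = new ToyCursor(docs);
shellPrint(cur, 20);                  // the "cur = db...find()" line gets printed
var counted = cur.count();            // 100
var remaining = cur.toArray().length; // 80
```

If this is what happened, declaring the cursor with var cur = db.dummyData.find({}, {_id: 0}) should make both numbers agree, since the shell does not print a var declaration.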

MongoDB db.collection.count() vs db.collection.find().length()

I would like to understand why these commands, when run from a mongos instance against the same MongoDB collection, return different numbers?
db.users.count()
db.users.find().length()
What can be the reason and can it be a sign of underlying issues?
I believe your collection is sharded.
Most sharded database solutions have such discrepancies, because some commands consider the entire collection, meaning all the documents of all the shards, while other commands only consider the documents of the shard they are connected to.
This is something to always keep in mind. It mostly applies to commands which:
count
return the document having the lowest value for a given field
return the document having the biggest value for a given field
...
Found on Mongo docs:
count() is equivalent to the db.collection.find(query).count()
construct. ... Sharded Clusters
On a sharded cluster, db.collection.count() can result in an
inaccurate count if orphaned documents exist or if a chunk migration
is in progress. ...
So in the case of Mongo, it is simply because Mongo always runs, in a background process, some rebalancing of the documents across the shards, in order to keep the shard distribution compliant with the sharding policy defined on the collection.
Keep in mind that to offer the best performance, most sharded solutions will write the documents on the shard the client is connected to, and then later move them to where they are really meant to be.
This is why nosql DBs are often flagged as eventually consistent.
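The effect of orphaned documents on a count can be shown with a small toy model. This is only a sketch under simplifying assumptions, not how MongoDB is implemented: each shard is a plain object with the key ranges it owns and the documents it physically holds, and all names (metadataCount, ownedCount, shardA, shardB) are invented for the illustration.

```javascript
// True when `key` falls into one of the [min, max) ranges.
function inRanges(key, ranges) {
  return ranges.some(function (r) { return key >= r[0] && key < r[1]; });
}

// Naive count: add up whatever each shard physically stores.
function metadataCount(shards) {
  return shards.reduce(function (sum, s) { return sum + s.docs.length; }, 0);
}

// Filtered count: only count documents in ranges the shard currently owns.
function ownedCount(shards) {
  return shards.reduce(function (sum, s) {
    var owned = s.docs.filter(function (d) { return inRanges(d.key, s.owned); });
    return sum + owned.length;
  }, 0);
}

var shardA = { owned: [[0, 50]], docs: [] };
var shardB = { owned: [[50, 100]], docs: [] };
for (var a = 0; a < 50; a++) shardA.docs.push({ key: a });
for (var b = 50; b < 100; b++) shardB.docs.push({ key: b });

// Orphans: keys 50..59 were migrated to shardB but not yet deleted from shardA.
for (var o = 50; o < 60; o++) shardA.docs.push({ key: o });

var naive = metadataCount([shardA, shardB]);  // 110
var filtered = ownedCount([shardA, shardB]);  // 100
```

The naive total over-reports by exactly the number of orphans, which matches the kind of discrepancy the documentation quote describes.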

Difference between count() and find().count() in MongoDB

What is the difference between the two commands below? I basically wanted to find all the documents in mycollection.
db.mycollection.count() vs
db.mycollection.find().count()?
They both return the same result. Is there any reason why somebody would choose count() over find().count()? This is in contrast to the fact that find() has a default limit applied (correct me if I'm wrong), after which you have to type "it" in the shell to see more.
db.collection.count() and cursor.count() are simply wrappers around the count command, so running db.collection.count() and cursor.count() with the same query argument (or with none) will return the same result. However, the count result can be inaccurate in a sharded cluster.
MongoDB drivers compatible with the 4.0 features deprecate their
respective cursor and collection count() APIs in favor of new APIs for
countDocuments() and estimatedDocumentCount(). For the specific API
names for a given driver, see the driver documentation.
The db.collection.countDocuments() method internally uses an aggregation query to return the document count, while db.collection.estimatedDocumentCount() returns the document count based on metadata.
It is worth mentioning that the estimatedDocumentCount output can be inaccurate as mentioned in the documentation.
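The choice between the two newer methods can be wrapped in a small helper. This is a sketch only: countFor is a hypothetical name, and it assumes a mongo shell (4.0 or later) collection object whose countDocuments() and estimatedDocumentCount() return numbers directly. In drivers these calls are usually asynchronous.

```javascript
// Hypothetical helper: use the metadata-based estimate only when there is
// no filter and the caller does not insist on an exact number.
function countFor(coll, filter, wantExact) {
  var hasFilter = !!filter && Object.keys(filter).length > 0;
  if (hasFilter || wantExact) {
    return coll.countDocuments(filter || {}); // aggregation-based, accurate
  }
  return coll.estimatedDocumentCount();       // metadata-based, fast
}
```

For example, countFor(db.orders) would take the fast path, while countFor(db.orders, {}, true) or any call with a non-empty filter would take the exact one.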
db.collection.count() without parameters counts all documents in a collection. db.collection.find() without parameters matches all documents in a collection, and appending count() counts them, so there is no difference.
This is confirmed explicitly in the db.collection.count() documentation:
To count the number of all documents in the orders collection, use the
following operation:
db.orders.count()
This operation is equivalent to the following:
db.orders.find().count()
As is mentioned in another answer by sheilak, the two are equivalent - except that db.collection.count() can be inaccurate for sharded clusters.
The latest documentation says:
count() is equivalent to the db.collection.find(query).count()
construct.
And then,
Sharded Clusters
On a sharded cluster, db.collection.count() can result in an
inaccurate count if orphaned documents exist or if a chunk migration
is in progress.
The documentation explains how to mitigate this bug (use an aggregate).
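A minimal sketch of that aggregate-based count follows. countViaAggregate is a hypothetical helper name, and the code assumes a shell version in which aggregate() returns a cursor that supports toArray().

```javascript
// Hypothetical helper: count matching documents with $match + $group
// instead of the count command.
function countViaAggregate(coll, query) {
  var pipeline = [
    { $match: query || {} },
    { $group: { _id: null, count: { $sum: 1 } } }
  ];
  var result = coll.aggregate(pipeline).toArray();
  // $group emits no document at all when nothing matches.
  return result.length > 0 ? result[0].count : 0;
}
```

It could be called as countViaAggregate(db.orders, { ord_dt: { $gt: new Date('01/01/2012') } }). The empty-result branch matters because an aggregation over zero matching documents returns an empty result set, not a document with a count of 0.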
db.collection.count() is equivalent to the db.collection.find(query).count() construct.
Examples
Count all Documents in a Collection
db.orders.count()
This operation is equivalent to the following:
db.orders.find().count()
Count all Documents that Match a Query
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012'):
db.orders.count( { ord_dt: { $gt: new Date('01/01/2012') } } )
The query is equivalent to the following:
db.orders.find( { ord_dt: { $gt: new Date('01/01/2012') } } ).count()
As per the documentation, db.collection.count() can be inaccurate in the following scenarios:
On a sharded cluster, db.collection.count() without a query predicate can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
After an unclean shutdown of a mongod using the WiredTiger storage engine, count statistics reported by count() may be inaccurate.
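I believe that if you are using some kind of pagination like:
find(query).limit().skip().count()
you may not get the same result as
count(query)
In the mongo shell, cursor.count() ignores skip() and limit() unless you call count(true), but some drivers apply them by default. So in cases like this, if you want both the page and the total, I think you might have to use both.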
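A sketch of fetching one page together with the overall total is shown below. pageAndTotal is a hypothetical helper, and it assumes mongo shell cursor semantics, where count() called without arguments does not apply skip() or limit().

```javascript
// Hypothetical helper: return one page of results plus the total number
// of documents matching the query.
function pageAndTotal(coll, query, skip, limit) {
  // Separate cursor for the total, so paging options cannot affect it.
  var total = coll.find(query).count();
  var page = coll.find(query).skip(skip).limit(limit).toArray();
  return { total: total, page: page };
}
```

Using two cursors costs an extra round trip, but it keeps the total independent of how a particular driver treats skip and limit when counting.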

Incorrect Count returned by MongoDB (WiredTiger)

This sounds odd, and I hope I am doing something wrong, but MongoDB is returning a count that is off by one for my collection.
I have a collection with (I am sure) 359671 documents. However the count() command returns 359670 documents.
I am executing the count() command using the mongo shell:
rs0:PRIMARY> db.COLLECTION.count()
359670
This is incorrect.
It is not finding each and every document in my collection.
If I provide the following query to count, I get the correct result:
rs0:PRIMARY> db.COLLECTION.count({_id: {$exists: true}})
359671
I believe this is a bug in WiredTiger. As far as I am aware each document has the same definition, an _id field of an integer ranging from 0 to 359670, and a BinData field. I did not have this problem with the older storage engine (or Mongo 2, either could have caused the issue).
Is this something I have done wrong? I do not want to use the {_id: {$exists: true}} query as that takes 100x longer to complete.
According to this issue, this behaviour can occur if mongodb experiences a hard crash and is not shut down gracefully. If not issuing any query, mongodb probably just falls back to the collected statistics.
According to the article, calling db.COLLECTION.validate(true) should reset the counters.
As now stated in the doc, db.collection.count() without a query parameter returns results based on the collection’s metadata:
This may result in an approximate count. In particular:
On a sharded cluster, the resulting count will not correctly filter out orphaned documents.
After an unclean shutdown, the count may be incorrect.
When using a query parameter, as you did in the second query ({_id: {$exists: true}}), then it forces count to not use the collection's metadata, but to scan the collection instead.
Starting Mongo 4.0.3, count() is considered deprecated and the following alternatives are recommended instead:
Exact count of documents:
db.collection.countDocuments({})
which under the hood actually performs the following "expensive", but accurate aggregation (expensive since the whole collection is scanned to count records):
db.collection.aggregate([{ $group: { _id: null, n: { $sum: 1 } } }])
Approximate count of documents:
db.collection.estimatedDocumentCount()
which performs exactly what db.collection.count() does/did (it's actually a wrapper around count), which uses the collection’s metadata.
This is thus almost instantaneous, but may lead to an approximate result in the particular cases mentioned above.

Mongoid: why fetching count is slower than fetching documents

I noticed a strange behavior. It might be Mongoid or MongoDB, I am not sure, but counting documents is slower than fetching the documents. Here are the queries I fired:
Institution.all.any_of(:portaled_at.ne => nil).any_of(portaled: true).order_by(:portaled_at.desc).count
# mongodb query and timing as per mongoid logs,
# times are consistent over multiple runs
# MONGODB (236ms) db['$cmd'].find({"count"=>"institutions", "query"=>{"$or"=>[{:portaled_at=>{"$ne"=>nil}}, {:portaled=>true}]}, "fields"=>nil}).limit(-1)
# MONGODB (245ms) db['$cmd'].find({"count"=>"institutions", "query"=>{"$or"=>[{:portaled_at=>{"$ne"=>nil}}, {:portaled=>true}]}, "fields"=>nil}).limit(-1)
Institution.all.any_of(:portaled_at.ne => nil).any_of(portaled: true).order_by(:portaled_at.desc).to_a
# mongodb query and timing as per mongoid logs
# times are not so consistent over multiple runs,
# but consistently much lower than count query
# MONGODB (9ms) db['institutions'].find({"$or"=>[{:portaled_at=>{"$ne"=>nil}}, {:portaled=>true}]}).sort([[:portaled_at, :desc]])
# MONGODB (18ms) db['institutions'].find({"$or"=>[{:portaled_at=>{"$ne"=>nil}}, {:portaled=>true}]}).sort([[:portaled_at, :desc]])
I believe indexes are not used by mongodb for $and and $or queries, but just so if it matters, I have a sparse index on portaled_at in descending order. Out of around 200,000 documents only around 50-60 have portaled_at set.
rails 3.2.12
mongoid 2.6.0
mongodb 2.2.3
This is against my common sense and if anybody can explain what is going on I would really appreciate it.
While the two are running through different subsystems in MongoDB (one is using runCommand and the other the standard query engine), the specific issue in this case is very likely a known issue in the current version of MongoDB.
The quick summary is that counting without fetching is extremely slow, as MongoDB is doing a lot of extra work that often isn't necessary. It's been fixed in the development branch, so it should be in 2.4 when it is released.
For some reason Mongo defaults to not counting records using only indexes. However, if you construct a query correctly, Mongo will count from the index. The trick is to only fetch the fields that are in the index, and you have to specify a query.
In Mongo Shell:
db.MyCollection.find({"_id":{$ne:''}},{"_id":1}).count()
You can check with the explain method:
db.MyCollection.find({"_id":{$ne:''}},{"_id":1}).explain()
Which will include "indexOnly" : true in the output.
And similarly the command can be executed via the Moped driver directly like so:
Mongoid::Sessions.default.command(:count => "MyCollection", :query=>{"_id"=>{"$ne"=>""}}, :fields => {:_id=>1})
Which, in my benchmarks (on my live data, YMMV) is about 100x faster than simply doing MyMongoidDocumentClass.count
Unfortunately, there doesn't seem to be a way to do this quickly through the Mongoid gem.
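Checking the explain output by eye can be turned into a small predicate. The sketch below is an assumption-laden illustration: looksIndexOnly is a hypothetical name, the indexOnly flag is the one mentioned above for older servers, and the executionStats branch assumes the newer explain format obtained with explain("executionStats"), where totalKeysExamined and totalDocsExamined are reported.

```javascript
// Hypothetical helper: decide from an explain document whether the query
// was answered from the index alone.
function looksIndexOnly(explainDoc) {
  // Older explain format: a direct boolean flag.
  if (typeof explainDoc.indexOnly === 'boolean') {
    return explainDoc.indexOnly;
  }
  // Newer format (assumed): index keys were read but no documents were fetched.
  var stats = explainDoc.executionStats;
  if (stats) {
    return stats.totalKeysExamined > 0 && stats.totalDocsExamined === 0;
  }
  return false; // not enough information to tell
}
```

On an old server this could be used as looksIndexOnly(db.MyCollection.find({"_id":{$ne:''}},{"_id":1}).explain()). The exact shape of explain output differs between server versions, so the field names are worth verifying against the version in use.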