MongoDB closes connection on read operation

MongoDB closes connection on read operation - mongodb

I run MongoDB 4.0 on WiredTiger under Ubuntu Server 16.04 to store complex documents. There is an issue with one of the collections: the documents have many images written as strings in base64. I understand this is a bad practice, but I need some time to fix it.
Because of this some find operations fail, but only those which have a non-empty filter or skip. For example db.collection('collection').find({}) runs OK while db.collection('collection').find({category: 1}) just closes connection after a timeout. It doesn't matter how many documents should be returned: if there's a filter, the error will pop every time (even if it should return 0 docs), while an empty query always executes well until skip is too big.
UPD: some skip values make queries to fail. db.collection('collection').find({}).skip(5000).limit(1) runs well, db.collection('collection').find({}).skip(9000).limit(1) takes way much time but executes too, while db.collection('collection').find({}).skip(10000).limit(1) fails every time. Looks like there's some kind of buffer where the DB stores query related data and on the 10000 docs it runs out of the resources. The collection itself has ~10500 docs. Also, searching by _id runs OK. Unfortunately, I have no opportunity to make new indexes because the operation fails just like read.
What temporary solution I may use before removing base64 images from the collection?

This happens because such a problematic data scheme causes huge RAM usage. The more entities there are in the collection, the more RAM is needed not only to perform well but even to run find.
Increasing MongoDB default RAM usage with storage.wiredTiger.engineConfig.cacheSizeGB config option allowed all the operations to run fine.

Related

MongoDB Atlas Profiler: What is "Num Yields"?

In the MongoDB Atlas dashboard query profiler, there's a Num Yields field. What is it?
Screenshot

From Database Profiler Output documentation page:
The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. For more information, see the FAQ on when operations yield.
Basically most database operations in MongoDB has a "yield point", i.e. the point that it can yield control to other operations. This is usually waiting for data to be loaded from disk.
So in short, if you see a high number of yields, that means the query spent a lot of time waiting for data to be loaded from disk. The cause is typically:
A query returning or processing a large amount of data. If this is the cause, ensure the query only returns what you need. However this may not be avoidable in some use case (e.g. analytical).
Inefficient query that is not utilizing proper indexing, so the server was forced to load the full documents from disk all the time. If this is the cause, ensure that you have created proper indexes backing the query.
Small RAM in the server, so the data must be loaded from disk all the time. If none of the above, then the server is simply too small for the task at hand. Consider upgrading the server's hardware.
Note that a high number of yield is not necessarily bad if you don't see it all the time. However, it's certainly not good if you see this on a query that you run regularly.

mongodb taking 500 ms in fetching 150 records

We are using MongoDb(3.0.6) for the application we are using. We have around 150 entries in one of the collections, but it takes approx 500ms to fetch all of these records, which I didn't expect. Mongo is deployed on a company server. What can be done to reduce this time?
We are not getting too many reads and there is no CPU load, etc, What can be mistake which may be causing this or what config should be changes to affect these.
Here is my schema: http://pastebin.com/x18iDPKf
I am just querying all the entries, which are 160 in number. I don't think time taken is due to Mongoose or NodeJs, as when I quesry using RoboMongo, It still takes same time.
Output of db.<collection>.stats().size :
223000
The query I am doing is:
db.getCollection('collectionName').find({})

Definitely it shouldn't be a problem with MongoDB. It should be a temporary issue or might be due to system constraints such as Internal Memory and so on.
If still the problem exists, use Indexes on the appropriate field which you are querying.
https://docs.mongodb.com/manual/indexes/

How long will a mongo internal cache sustain?

I would like to know how long a mongo internal cache would sustain. I have a scenario in which i have some one million records and i have to perform a search on them using the mongo-java driver.
The initial search takes a lot of time (nearly one minute) where as the consecutive searches of same query reduces the computation time (to few seconds) due to mongo's internal caching mechanism.
But I do not know how long this cache would sustain, like is it until the system reboots or until the collection undergoes any write operation or things like that.
Any help in understanding this is appreciated!
PS:
Regarding the fields with which search is performed, some are indexed
and some are not.
Mongo version used 2.6.1

It will depend on a lot of factors, but the most prominent are the amount of memory in the server and how active the server is as MongoDB leaves much of the caching to the OS (by MMAP'ing files).
You need to take a long hard look at your log files for the initial query and try to figure out why it takes nearly a minute.

In most cases there is some internal cache invalidation mechanism that will drop your cached query internal record when write operation occurs. It is the simplest describing of process. Just from my own expirience.
But, as mentioned earlier, there are many factors besides simple invalidation that can have place.

MongoDB automatically uses all free memory on the machine as its cache.It would be better to use MongoDB 3.0+ versions because it comes with two Storage Engines MMAP and WiredTiger.
The major difference between these two is that whenever you perform a write operation in MMAP then the whole database is going to lock and whereas the locking mechanism is upto document level in WiredTiger.
If you are using MongoDB 2.6 version then you can also check the query performance and execution time taking to execute the query by explain() method and in version 3.0+ executionStats() in DB Shell Commands.
You need to index on a particular field which you will query to get results faster. A single collection cannot have more than 64 indexes. The more index you use in a collection there is performance impact in write/update operations.

MongoLab (Mongodb) is slow when query old data

We are using MongoLabs and when I query my old data it never returns me results and my web-app simply timeout. I can guess that old data is in the disk and searching through disk is slow. But really does mongodb store all the new data in RAM ? Most of recent data works fine.
What is the real cause for this ? Is there any solutions for make it even. I have 32,401,864 documents in my database and have already created enough indexes based on queries and have TTL set to 100 days.
Number of documents I have is super high for mongodb ?

If you haven't attempted this already, try turning on the profiler, in the tools tab and the commands subtab. Then select the profile (log slow) command. Also, try examining the system.profile collection, since the profiler deposits all profile data into a collection called system.profile. After running your application for a bit, go into the system.profile collection and take a looksee. Just a side note, you don'w ant to leave the profiler on if you're not using it. There is a bit of overhead if you don't turn it off. select profile turn off command from the menu and then run the command

mongodb java driver readFully is slow

db.collection.find in my app that uses mongodb java driver (latest) are super slow. I investigated one of them as follows
// about 300 hundred ids at a time (i've tried lower and higher numbers - no impact
db.users.find({_id : {$in : [1,2,3,4,5,6....]}})
Once I get the cursor I do: cursor.toArray() and then iterate of the results
The toArray operation is extremely slow. On average they take about a minute. IMPORTANT: my database is under very heavy load at all times. This particular collection has over 50mm entries.
I've narrowed down the issue in mongo java driver to com.mongodb.Response - specifically to this line:
final byte [] b = new byte[36];
Bits.readFully(in, b);
Incredibly readFully of just 36 bytes takes over a minute some times!
When I bring own the load on the databases, the improvements are drastic. From about a minute to 5-6 seconds. I mean 5-6 seconds to get 300 documents is still super slow, but definitely better then 1 minute.
What can I do to troubleshoot this further? Are there settings on MondoDB that I need to look at?

What happens
You are loading all of the 300 user documents.
What happens is that the _id index is searched and the respective documents are sent completely to your app. So mongoDB will access it's data files, read the first document and send it to you, then it jumps to the next document and sends it to you and so forth. If you used the cursor, you could start iterating over the returned documents as soon as a number of documents equalling your defined cursor size have been returned, as others will be lazily loaded from the cursor on the server on demand. (Bit of a simplification, but sufficient for answering this question). What you do is to explicitly wait until the index is scanned, the documents are located, sent back to your app and have reached it down to the last byte of the last document. As #wdberkeley (who works for 10gen) correctly pointed out, this is a Very Bad Idea™.
What might cause or intensify the problem
Under heavy load, two things might happen. The more likely is that your _id index isn't in RAM any more, causing thousands, if not millions of reads from disk - which is slow. Much slower than if the indices are kept in RAM (by several orders of magnitude). So it is not the code snippet you mentioned, but the response time of MongoDB which causes the delay. Another option under heavy load is that your disk IO is simply too low or (more likely) the random file read latency is too high. I assume you are using spinning disks plus not enough RAM for a database that size.
What to do to find the cause
Try to find out your index size using the db.users.stats(). I am pretty sure that your index size(s combined) exceed your available RAM.
Measure the disk IO and latency. If you use a GNU/Linux OS, you might want to find out how high your IOwait percentage is. A high percentage shows that your disk latency is too high for the load put on the server. It might even be that your are reaching the disk's IO limits.
Do your queries on a mongo shell. In case they are fast, you can be pretty sure that your toArray call is the cause of the problem.
What to do to resolve the problem
If you have not enough RAM, either scale up or scale out.
If your disk latency or throughput is too high, either scale out or ( better and cheaper in most cases ) use SSDs for storing MongoDB's data.
Use a cursor object to iterate over the documents. This is a better solution in almost every use case I can think of.

Upgrading MongoDB driver to 3.6.4 will fetch the data in no-time.
We have around 2 million documents in our collection and with previous version it was taking around ~3 minutes but after after upgrading to 3.6.4 it took only 5-7 sec.So what I feel is that there is some issue with the old version of mongoDB driver.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse