MongoDB concurrency reduces performance

I understand that MongoDB takes locks on read and write operations.
My Use case:
Only read operations. No write operations.
I have a collection of about 10 million documents. The storage engine is WiredTiger.
The MongoDB version is 3.4.
I made a request that should return 30k documents; it took 650 ms on average.
When I made the same request 100 times concurrently, it took from a few seconds up to 2 minutes for all requests to be handled.
I have a single node serving the data.
How do I access the data:
Each document contains 25 to 40 fields. I indexed a few fields and query on one indexed field.
The API returns all matching documents as JSON.
Other information: the API is written with Spring Boot.
Concurrency was tested through JMeter, run from a shell script on a remote machine.
So, my questions:
Am I missing any optimizations (storage-engine level, version)?
Can't I get all read requests served in less than a second?
If so, what SLA can I keep for this use case?
Any suggestions?
Edit:
I enabled the database profiler in MongoDB at level 2.
My single query is internally converted to 4 operations:
Initial read
getMore
getMore
getMore
These are the operations found through the profiler.
In total, they take less than 100 ms. Is that really true?
My concurrent queries:
Now, when I send 100 requests, roughly 150 operations take more than 100 ms, 100 operations take more than 200 ms, and 90 operations take more than 300 ms.
Per my single-query analysis, 100 requests translate into 400 internal operations; it is a fixed pattern, which I verified by checking the query tag in the profiler output.
I suspect this is what hurts my request performance.

My single query internally converted to 4 queries:
Initial read
getMore
getMore
getMore
That's the way MongoDB cursors work. Documents are transferred from the database to the application in batches. IIRC the first batch is around 100 documents plus a cursor id, and consecutive getMore calls retrieve the next batches by cursor id.
You can set the batch size (the number of documents per batch) from the application. A batch cannot exceed 16 MB, e.g. if you set a batch size of 30,000, it fits into a single batch only if each document is smaller than about 500 bytes.
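As a back-of-the-envelope sketch of that arithmetic (it ignores the smaller first batch and BSON overhead, so real operation counts will differ slightly):

```javascript
// Rough estimate of how many documents fit in one 16 MB cursor batch.
// avgDocSizeBytes would come from something like db.collection.stats().avgObjSize.
const MAX_BATCH_BYTES = 16 * 1024 * 1024;

function maxDocsPerBatch(avgDocSizeBytes) {
  return Math.floor(MAX_BATCH_BYTES / avgDocSizeBytes);
}

function batchesNeeded(totalDocs, avgDocSizeBytes, batchSize) {
  // The effective batch size is capped by both the requested batchSize
  // and the 16 MB wire limit.
  const effective = Math.min(batchSize, maxDocsPerBatch(avgDocSizeBytes));
  return Math.ceil(totalDocs / effective);
}

console.log(maxDocsPerBatch(500));              // 33554 docs of 500 B fit in 16 MB
console.log(batchesNeeded(30000, 500, 30000));  // 1 batch
console.log(batchesNeeded(30000, 2000, 30000)); // 4 batches: only 8388 docs fit per batch
```

Note how 2 KB documents already force 4 round trips for 30k results, which matches the "initial read + 3 getMore" pattern seen in the profiler.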
Your investigation clearly shows performance degradation under load. There are too many possible factors, but I believe locking is not one of them. WiredTiger takes exclusive locks at the document level only for regular write operations, and you are doing only reads during your tests, aren't you? If in doubt, you can compare the results of db.serverStatus().locks before and after the tests to see how many write locks were acquired. You can also run db.serverStatus().globalLock during the tests to check the queue. More details about locking and concurrency are here: https://docs.mongodb.com/manual/faq/concurrency/#for-wiredtiger
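A minimal sketch of that before/after comparison: take two snapshots of db.serverStatus().locks and diff the counters. The nested-numeric-object shape mirrors serverStatus() output, but the sample values below are made up:

```javascript
// Diff two snapshots of nested numeric counters (e.g. db.serverStatus().locks)
// and report only the counters that changed between the snapshots.
function diffCounters(before, after, path = '') {
  const changes = {};
  for (const key of Object.keys(after)) {
    const p = path ? `${path}.${key}` : key;
    if (typeof after[key] === 'object' && after[key] !== null) {
      Object.assign(changes, diffCounters(before[key] || {}, after[key], p));
    } else {
      const delta = after[key] - (before[key] || 0);
      if (delta !== 0) changes[p] = delta;
    }
  }
  return changes;
}

const before = { Global: { acquireCount: { r: 100, w: 5 } } };
const after  = { Global: { acquireCount: { r: 350, w: 5 } } };
console.log(diffCounters(before, after));
// { 'Global.acquireCount.r': 250 } — reads grew, no new write locks
```

If the write-lock counters stay flat across a read-only test run, locking is unlikely to be your bottleneck.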
The bottleneck is likely somewhere else. A few generic things to check:
Query optimisation. Ensure you use indexes; the profiler output should show no COLLSCAN stage in the execStats field.
System load. If your database shares system resources with the application, that can affect database performance. E.g. the BSON-to-JSON conversion in your API is quite CPU-hungry and may slow the queries down. Check the system's load average with top or htop on *nix systems.
MongoDB resources. Use mongostat and mongotop to check whether the server has enough RAM, I/O, file descriptors, connections, etc.
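For the COLLSCAN check, a small sketch of walking an explain() plan tree. The stage / inputStage / inputStages shape follows queryPlanner.winningPlan in explain output; the sample plans are illustrative:

```javascript
// Return true if a COLLSCAN stage appears anywhere in an explain() plan tree.
function hasCollScan(plan) {
  if (!plan) return false;
  if (plan.stage === 'COLLSCAN') return true;
  if (plan.inputStage && hasCollScan(plan.inputStage)) return true;
  return (plan.inputStages || []).some(hasCollScan);
}

const indexedPlan = {
  stage: 'FETCH',
  inputStage: { stage: 'IXSCAN', indexName: 'myField_1' },
};
const fullScanPlan = { stage: 'COLLSCAN', direction: 'forward' };

console.log(hasCollScan(indexedPlan));  // false — query is index-backed
console.log(hasCollScan(fullScanPlan)); // true — missing index
```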
If you cannot spot anything obvious, I'd recommend seeking professional help. I find the simplest way to get it is by exporting the data to Atlas and running your tests against the cluster; then you can ask the support team whether they can advise any improvements to the queries.

Related

Store many documents fast with Mongoose

I need to insert 10K documents as fast as possible, but it's taking a long time.
I am currently using Model.create([<huge array here>]) to do this.
Would it help to use multiple connections to the database? For example have 10 connections saving 1K each?
You can use Model.insertMany(docs, options).
Some things to note below.
Connection Pool
10 connections are usually sufficient, but it depends greatly on your hardware. Opening more connections may slow down your server.
In some cases, the number of connections between the applications and the database can overwhelm the ability of the server to handle requests.
Options
There are a couple of options for insertMany that can speed up insertion.
[options.lean «Boolean» = false] if true, skips hydrating and validating the documents. This option is useful if you need the extra performance, but Mongoose won't validate the documents before inserting.
[options.limit «Number» = null] this limits the number of documents being processed (validation/casting) by Mongoose in parallel; this does NOT send the documents in batches to MongoDB. Use this option if you're processing a large number of documents and your app is running out of memory.
Write concern
Setting writeConcern in options can also affect performance.
If applications specify write concerns that include the j option, mongod will decrease the duration between journal writes, which can increase the overall write load.
Use db.collection.insertMany([]).
insertMany accepts an array of objects and is used to perform bulk insertions.
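As a sketch of the batching idea: split the 10K documents into fixed-size chunks before handing them to insertMany, to cap per-call memory and payload size. The chunk size of 1000 is an assumption to tune against your document size, and Model stands in for your Mongoose model:

```javascript
// Split an array of documents into fixed-size chunks.
function chunk(docs, size) {
  const out = [];
  for (let i = 0; i < docs.length; i += size) {
    out.push(docs.slice(i, i + size));
  }
  return out;
}

const docs = Array.from({ length: 10000 }, (_, i) => ({ n: i }));
const batches = chunk(docs, 1000);
console.log(batches.length);    // 10
console.log(batches[0].length); // 1000

// Then, with Mongoose (not runnable here without a live database):
// for (const batch of batches) {
//   await Model.insertMany(batch, { lean: true, ordered: false });
// }
```

ordered: false lets MongoDB continue past individual failures instead of aborting the whole batch, which is usually what you want for bulk loads.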

Does a running MongoDB aggregation pipeline slow down reads and writes to the affected collection?

As the title suggests, I'd like to know whether reads and writes to a collection are delayed or paused while a MongoDB aggregation pipeline is running. I'm considering adding a pipeline on a user collection, and I think the query could sometimes affect a lot of users (possibly tens of thousands), or just run longer than I expect. So I'm wondering whether that will "block" reads and writes to the collection. The server isn't live, so I don't have real user data to inform this decision. I'd appreciate any feedback or suggestions, thanks!
Each server has a certain resource capacity. If you send a query to the server, it has less capacity remaining for other work (be that other queries or writes).
For locking and concurrency in MongoDB, see https://docs.mongodb.com/manual/faq/concurrency/.
If you are planning for high load or high throughput, you need to benchmark your specific use case.

MongoDB Atlas Profiler: What is "Num Yields"?

In the MongoDB Atlas dashboard query profiler, there's a Num Yields field. What is it?
Screenshot
From Database Profiler Output documentation page:
The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. For more information, see the FAQ on when operations yield.
Basically, most database operations in MongoDB have a "yield point", i.e. a point at which they can yield control to other operations. This is usually while waiting for data to be loaded from disk.
So in short, a high number of yields means the query spent a lot of time waiting for data to be loaded from disk. The cause is typically one of:
A query returning or processing a large amount of data. If this is the cause, ensure the query returns only what you need. However, this may not be avoidable in some use cases (e.g. analytics).
An inefficient query that does not use proper indexing, forcing the server to load full documents from disk all the time. If this is the cause, ensure you have created proper indexes backing the query.
Too little RAM in the server, so the data must be loaded from disk all the time. If it's none of the above, the server is simply too small for the task at hand; consider upgrading its hardware.
Note that a high number of yields is not necessarily bad if you don't see it all the time. However, it's certainly not good if you see it on a query you run regularly.
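If you want to sift profiler output for high-yield operations, a sketch over system.profile documents fetched elsewhere (e.g. db.system.profile.find().toArray()). The sample documents are made up; real profiler entries carry many more fields:

```javascript
// Return profiler entries whose yield count meets a threshold,
// sorted with the worst offenders first.
function highYieldOps(profileDocs, minYields) {
  return profileDocs
    .filter((op) => (op.numYield || 0) >= minYields)
    .sort((a, b) => b.numYield - a.numYield);
}

const ops = [
  { op: 'query', ns: 'app.users', numYield: 120, millis: 900 },
  { op: 'query', ns: 'app.users', numYield: 2, millis: 15 },
  { op: 'getmore', ns: 'app.events', numYield: 45, millis: 300 },
];

console.log(highYieldOps(ops, 40).map((o) => o.ns));
// [ 'app.users', 'app.events' ]
```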

How long will MongoDB's internal cache last?

I would like to know how long MongoDB's internal cache lasts. I have a scenario in which I have about one million records and have to search them using the mongo-java driver.
The initial search takes a long time (nearly one minute), whereas consecutive runs of the same query take only a few seconds thanks to MongoDB's internal caching mechanism.
But I don't know how long this cache lasts: until the system reboots, until the collection undergoes a write operation, or something like that?
Any help in understanding this is appreciated!
PS:
Regarding the fields the search is performed on, some are indexed and some are not.
MongoDB version used: 2.6.1
It will depend on a lot of factors, but the most prominent are the amount of memory in the server and how active the server is, as MongoDB leaves much of the caching to the OS (by memory-mapping files).
You need to take a long, hard look at your log files for the initial query and try to figure out why it takes nearly a minute.
In most cases there is an internal cache invalidation mechanism that drops your cached query data when a write operation occurs. That is the simplest description of the process, just from my own experience.
But, as mentioned earlier, many factors besides simple invalidation can play a role.
MongoDB automatically uses all free memory on the machine as its cache. It would be better to use a MongoDB 3.0+ version, because it comes with two storage engines: MMAPv1 and WiredTiger.
The major difference between the two is locking granularity: a write operation in MMAPv1 locks at the collection level (database level before 3.0), whereas WiredTiger locks at the document level.
If you are using MongoDB 2.6, you can check query performance and execution time with the explain() method; in 3.0+ use explain("executionStats") in the shell.
You need an index on the field you query to get results faster. A single collection cannot have more than 64 indexes, and the more indexes a collection has, the greater the performance impact on write/update operations.
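As a sketch of pulling the interesting numbers out of an explain("executionStats") result (the field names follow the executionStats section of explain output; the sample values below are made up):

```javascript
// Summarize the key metrics from an explain("executionStats") result.
function summarizeExplain(explainResult) {
  const s = explainResult.executionStats;
  return {
    millis: s.executionTimeMillis,
    returned: s.nReturned,
    keysExamined: s.totalKeysExamined,
    docsExamined: s.totalDocsExamined,
    // A ratio near 1 means the index is selective; a large ratio means
    // the server examined far more documents than it returned.
    examinedPerReturned: s.totalDocsExamined / Math.max(s.nReturned, 1),
  };
}

const sample = {
  executionStats: {
    executionTimeMillis: 42,
    nReturned: 100,
    totalKeysExamined: 100,
    totalDocsExamined: 100,
  },
};
console.log(summarizeExplain(sample).examinedPerReturned); // 1 — well indexed
```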

How to improve the performance of a feed system using MongoDB

I have a feed system using fan-out on write. I keep a list of feed ids in a Redis sorted set and save the feed content in MongoDB, so every time I read 30 feeds I have to make 30 queries to MongoDB. Is there any way to improve this?
It depends on your database setup. MongoDB has extensive documentation about how to increase simultaneous reads and writes: MongoDB concurrency.
If you need many writes with low latency, start using sharding: Sharded Cluster Deployment.
If you need to increase the number of reads, deploy each shard as a replica set and route your read queries to secondary nodes: Read Preference.
Also, each query should be covered by an index (see Indexing Strategies); you can check your query time by simply adding explain() to a find. It will show you the time and other details:
db.collection.find({a:"abcd"}).explain()
Make sure you have enough RAM so that your data set fits in RAM, or at least your indexes do, because fetching data from disk is about 10 times slower than from RAM.
Check your server status by running mongostat; it measures database performance: page faults, locks, query operations, and many other details.
Also measure your hardware performance with a program like iostat and make sure I/O wait is low, less than 1%.
A few good links on MongoDB deployment and performance tuning:
1. Production deployment of MongoDB
2. Performance tuning of MongoDB by 10gen
3. Using Redis in front of MongoDB to cache queries and result objects
4. Example of Redis and Mongo
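One direct improvement to the 30-queries-per-read pattern from the question is to batch the ids into a single $in query and restore the Redis ordering in memory. A sketch, with illustrative collection and field names:

```javascript
// Build one filter for all feed ids instead of 30 single-document queries.
function buildFeedQuery(feedIds) {
  return { _id: { $in: feedIds } };
}

// $in does not preserve order, so re-sort by the Redis sorted-set order.
function sortByIds(docs, feedIds) {
  const rank = new Map(feedIds.map((id, i) => [id, i]));
  return [...docs].sort((a, b) => rank.get(a._id) - rank.get(b._id));
}

const ids = ['f3', 'f1', 'f2'];                               // order from Redis
const fetched = [{ _id: 'f1' }, { _id: 'f2' }, { _id: 'f3' }]; // order from MongoDB
console.log(sortByIds(fetched, ids).map((d) => d._id)); // [ 'f3', 'f1', 'f2' ]

// With the Node driver (not runnable here without a live database):
// const docs = await db.collection('feeds').find(buildFeedQuery(ids)).toArray();
```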