Assume I have a mongodb instance on a remote server (1 core cpu and 8G memory).
When I send a simple query db.table.find({_id: ObjectId('xxx')}).limit(1).explain() to the instance, the result shows that the query costs 10 ms.
So can I come to the conclusion that "my MongoDB server can only handle 100 simple queries per second"?
"So can I come to a conclusion that "my mongodb server can only handle 100 simple query per second""
It's not the case that if 1 query takes 10 ms, then 100 queries will take 1 second.
db.table.find({_id: ObjectId('xxx')}).limit(1) will not lock your collection.
So if 100 clients send this request over 100 connections, all of them will return in about 10 ms.
Concurrency depends on locks and connection limits.
If a query locks a collection for reads and writes for 10 seconds, then every query after it will wait for the lock to clear.
MongoDB can handle multiple requests in parallel.
db.runCommand( { serverStatus: 1 } )
will return the current status of the mongod, including a connections object:
"connections" : {
"current" : <num>,
"available" : <num>,
"totalCreated" : <num>,
"active" : <num>
}
https://docs.mongodb.com/manual/reference/command/serverStatus/#connections
https://docs.mongodb.com/manual/reference/configuration-options/#net.maxIncomingConnections
https://docs.mongodb.com/manual/faq/concurrency/
These will give more clarity around connections and their limits.
You should know that your MongoDB server can handle many parallel queries.
Your assumption is incorrect because it ignores the fact that MongoDB supports concurrency.
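To see this in practice, here is a minimal sketch in Python with PyMongo (the host, database/collection names, and ObjectId are placeholders, not from the question) that fires 100 identical _id lookups concurrently. If queries serialized, 100 x 10 ms would take about 1 second; in practice the wall time stays close to a single query's latency:

import time
from concurrent.futures import ThreadPoolExecutor

from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://remote-server:27017")  # placeholder address
coll = client["test"]["table"]                         # placeholder names
target = ObjectId("5f2b0f0e9d1c2a3b4c5d6e7f")          # placeholder id

def one_query(_):
    return coll.find_one({"_id": target})

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:
    list(pool.map(one_query, range(100)))
print("100 concurrent queries took %.3fs" % (time.perf_counter() - start))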
I understand that MongoDB does locking on read and write operations.
My Use case:
Only read operations. No write operations.
I have a collection of about 10 million documents. The storage engine is WiredTiger.
The Mongo version is 3.4.
I made a request which should return 30k documents - it took 650 ms on average.
When I made concurrent requests - the same request, 100 times - it takes from a few seconds up to 2 minutes for all requests to be handled.
I have single node to serve the data.
How do I access the data:
Each document contains 25 to 40 fields. I indexed a few fields. I query based on one indexed field.
The API returns all the matching documents in JSON form.
Other information: the API is written using Spring Boot.
Concurrency is tested through a JMeter shell script from the command line on a remote machine.
So,
My question:
Am I missing any optimizations? [storage engine level, version]
Can't I get all read requests served in less than a second?
If so, what SLA can I keep for this use case?
Any suggestions?
Edit:
I enabled the database profiler in MongoDB at level 2.
My single query is internally converted to 4 operations:
Initial read
getMore
getMore
getMore
These are the queries found through the profiler.
In total, it is taking less than 100 ms. Is that really true?
My concurrent queries:
Now, when I hit 100 requests, nearly 150 operations take more than 100 ms, 100 operations take more than 200 ms, and 90 operations take more than 300 ms.
As per my single-query analysis, 100 requests will be converted to 400 queries internally. It is a fixed pattern, which I verified by checking the query tag in the profiler output.
I suspect this is what affects my request performance.
My single query is internally converted to 4 operations:
Initial read
getMore
getMore
getMore
It's the way Mongo cursors work. The documents are transferred from the db to the app in batches. IIRC the first batch is around 100 documents + a cursor id, then consecutive getMore calls retrieve the next batches by cursor id.
You can define the batch size (number of documents in the batch) from the application. A batch cannot exceed 16MB, e.g. if you set a batch size of 30,000, it will fit into a single batch only if the document size is less than about 500 bytes.
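For example, here is a minimal PyMongo sketch (the database, collection, and field names are assumed) that raises the batch size so a 30k-document result needs fewer getMore round trips:

from pymongo import MongoClient

coll = MongoClient()["test"]["collection"]  # assumed database/collection
# Ask for up to 10,000 documents per batch; each batch must still fit in 16MB.
cursor = coll.find({"indexed_field": "value"}).batch_size(10000)
docs = list(cursor)  # drains in ~3 batches instead of the default sizing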
Your investigation clearly shows performance degradation under load. There are too many factors, but I believe locking is not one of them. WiredTiger takes exclusive locks at the document level for regular write operations, and you are doing only reads during your tests, aren't you? If in any doubt, you can compare the results of db.serverStatus().locks before and after the tests to see how many write locks were acquired. You can also run db.serverStatus().globalLock during the tests to check the queue. More details about locking and concurrency are here: https://docs.mongodb.com/manual/faq/concurrency/#for-wiredtiger
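If you want to do that comparison from the application side, a rough PyMongo sketch (connection details assumed) looks like this:

from pymongo import MongoClient

admin = MongoClient().admin  # assumed local connection

before = admin.command("serverStatus")["locks"]
# ... run the load test here ...
after = admin.command("serverStatus")["locks"]
# Compare the acquireCount deltas per lock mode to see how many write
# locks were taken. During the test you can also poll the queue:
queue = admin.command("serverStatus")["globalLock"]["currentQueue"]
print(queue)  # readers/writers currently waiting for a lock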
The bottleneck is likely somewhere else. There are a few generic things to check:
Query optimisation. Ensure you use indexes. The profiler should show no "COLLSCAN" stage in the execStats field (see the sketch after this list).
System load. If your database shares system resources with the application, it may affect performance of the database. E.g. BSON to JSON conversion in your API is quite CPU hungry and may affect performance of the queries. Check the system's load average with top or htop on *nix systems.
MongoDB resources. Use mongostat and mongotop to check whether the server has enough RAM, IO, file descriptors, connections, etc.
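As a starting point for the first item, here is a small PyMongo sketch (names assumed) that checks the winning plan of a query:

from pymongo import MongoClient

coll = MongoClient()["test"]["collection"]  # assumed names
plan = coll.find({"indexed_field": "value"}).explain()
print(plan["queryPlanner"]["winningPlan"])
# You want an IXSCAN stage (possibly under FETCH), not COLLSCAN.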
If you cannot spot anything obvious, I'd recommend you seek professional help. I find the simplest way to get it is by exporting the data to Atlas and running your tests against the cluster. Then you can ask the support team whether they can advise any improvements to the queries.
Here is the issue:
If the collection has only the default "_id" index, then the time to upload a set of documents stays constant as the collection gets bigger.
But if I add the following index to the collection:
db.users.createIndex({"s_id": "hashed"}, {"background": true})
then the time to upload the same set of documents increases drastically (it looks like an exponential function).
Context:
I'm trying to insert about 80 million documents into a collection. I don't use Mongo's sharding; there is only one instance.
I'm using the python API and here is my code :
import pymongo

client = pymongo.MongoClient(ip_address, 27017)
users = client.get_database('local').get_collection('users')

bulk_op = users.initialize_unordered_bulk_op()
for s in iterator:
    bulk_op.insert(s)
bulk_op.execute()
client.close()
There are 15 concurrent connections (I'm using Apache Spark, and they correspond to the different partitions).
The instance has 4GB of RAM.
The total size of the indexes is about 1.5 GB when the upload is finished.
Thanks a lot for your help.
I am trying to run some wildcard/regex-based queries on a Mongo cluster from the Java driver.
Mongo Replica Set config:
3-member replica set
16 CPUs (hyperthreaded), 24 GB RAM, Linux x86_64
Collection size: 6M documents, 7 GB of data
Client is localhost (Mac OS X 10.8) with the latest mongo-java driver
Query using the Java driver with readPref = primaryPreferred
{ "$and" : [{ "$or" : [ { "country" : "united states"}]} , { "$or" : [ { "registering_organization" : { "$regex" : "^.*itt.*hartford.*$"}} , { "registering_organization" : { "$regex" : "^.*met.*life.*$"}} , { "registering_organization" : { "$regex" : "^.*cardinal.*health.*$"}}]}]}
I have a regular index on both "country" and "registering_organization". But as per the Mongo docs, a single query can utilize only one index, and I can see that from explain() on the above query as well.
So my question is: what is the best alternative to achieve better performance for the above query?
Should I break up the 'and' operations and do the intersection in memory? Going further, I shall have 'not' operations in the query too.
I think my application may turn into reporting/analytics in the future, but that's far down the line and I am not looking to design for it yet.
There are so many things wrong with this query.
Your nested conditional with regexes will never get faster in MongoDB. MongoDB is not the best tool for "data discovery" (e.g. ad-hoc, multi-conditional queries for uncovering unknown information). MongoDB is blazing fast when you know the metrics you are generating, but not for data discovery.
If this is a common query you are running, then I would create an attribute called "united_states_or_health_care", and set the value to the timestamp of the create date. With this method, you are moving your logic from your query to your document schema. This is one common way to think about scaling with MongoDB.
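A rough sketch of that idea in Python (the attribute name comes from this answer; every other name and the connection details are assumed):

from datetime import datetime, timezone
from pymongo import MongoClient

coll = MongoClient()["test"]["registrations"]  # assumed names
coll.create_index("united_states_or_health_care")

doc = {"country": "united states", "registering_organization": "MetLife"}
if doc["country"] == "united states":  # plus whatever org checks you need
    doc["united_states_or_health_care"] = datetime.now(timezone.utc)
coll.insert_one(doc)

# The multi-conditional query becomes a cheap indexed range scan by date:
recent = coll.find({"united_states_or_health_care":
                    {"$gte": datetime(2013, 1, 1, tzinfo=timezone.utc)}})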
If you are doing data discovery, you have a few different options:
Have your application concatenate the results of the different queries
Run query on a secondary MongoDB, and accept slower performance
Pipe your data to Postgresql using mosql. Postgres will run these data-discovery queries much faster.
Another Tip:
Your regexes are not anchored in a way that makes them fast. It would be best to run your "registering_organization" attribute through a "findable_registering_organization" filter. The filter would break apart the organization into an array of queryable name subsets, and you would quit using the regexes. +2 points if you can filter incoming names by an industry lookup.
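A minimal sketch of such a filter (the field names here are hypothetical, as is the connection):

import re
from pymongo import MongoClient

coll = MongoClient()["test"]["registrations"]  # assumed names
coll.create_index("org_tokens")

def tokenize(name):
    # Break the organization name into lowercase, queryable subsets.
    return [t for t in re.split(r"\W+", name.lower()) if t]

coll.insert_one({
    "registering_organization": "ITT Hartford",
    "org_tokens": tokenize("ITT Hartford"),  # ["itt", "hartford"]
})

# Indexed equality on the token array replaces /^.*itt.*hartford.*$/:
match = coll.find_one({"org_tokens": {"$all": ["itt", "hartford"]}})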
I have 2 EC2 instances, one as a MongoDB server and the other as a Python web app (same availability zone). The Python server connects to the Mongo server using PyMongo and everything works fine.
The problem is, when I profile execution time in Python, some calls (less than 5%) take even up to a couple of seconds to return. I was able to narrow down the problem, and the time delay was actually in the db calls to the Mongo server.
The two causes that I suspected were:
1. The Mongo server is slow/over-loaded
2. Network latency
So I tried upgrading the Mongo server to a 4x faster instance, but the issue still happens (some calls take even 3 seconds to return). I assumed that since both servers are on EC2, network latency should not be a problem... but maybe I was wrong.
How can I confirm if the issue is actually the network itself? If so, what the best way to solve it? Is there any other possible cause?
Any help is appreciated...
Thanks,
UPDATE: The entities that I am fetching are very small (and indexed), and usually the calls take only 0.01-0.02 seconds to finish.
UPDATE:
As suggested by "James Wahlin", I enabled profiling on my mongo server and got some interesting logs,
Fri Mar 15 18:05:22 [conn88635] query db.UserInfoShared query: { $or: [ { _locked: { $exists: false } }, { _locked: { $lte: 1363370603.297361 } } ], _id: "750837091142" } ntoreturn:1 nscanned:1 nreturned:1 reslen:47 2614ms

Fri Mar 15 18:05:22 [conn88635] command db.$cmd command: { findAndModify: "UserInfoShared", fields: { _id: 1 }, upsert: true, query: { $or: [ { _locked: { $exists: false } }, { _locked: { $lte: 1363370603.297361 } } ], _id: "750837091142" }, update: { $set: { _locked: 1363370623.297361 } }, new: true } ntoreturn:1 reslen:153 2614ms
You can see these two calls took more than 2 seconds to finish. The _id field has a unique index, and finding it should not have taken this much time. Maybe I have to post a new question for it, but can the MongoDB GLOBAL LOCK be the cause?
@James Wahlin, thanks a lot for helping me out.
As it turned out, the main cause of the latency was the MongoDB GLOBAL LOCK itself. We had a lock percentage averaging 5%, sometimes peaking at 30-50%, and that resulted in the slow queries.
If you are facing this issue, the first thing you have to do is enable the MongoDB MMS service (mms.10gen.com), which will give you a lot of insight into what exactly is happening in your db server.
In our case the LOCK PERCENTAGE was really high and there were multiple reasons for it. The first thing to do to figure it out is to read the MongoDB documentation on concurrency:
http://docs.mongodb.org/manual/faq/concurrency/
The reason for locking can be at the application level, in MongoDB, or in the hardware.
1) Our app was doing a lot of updates, and each update (more than 100 ops/sec) holds the global lock in MongoDB. The issue was that when an update happens for an entry which is not in memory, Mongo has to load the data into memory first and then update it (in memory), and the whole process happens inside the global lock. If, say, the whole thing takes 1 second to complete (0.75 s to load the data from disk and 0.25 s to update it in memory), all the update calls behind it wait (for the entire 1 second), and such updates start queuing up... and you will notice more and more slow requests in your app server.
The solution for it (while it might sound silly) is to query for the same data before you make the update. What that effectively does is move the 'load data into memory' (0.75 s) part out of the global lock, which greatly reduces your lock percentage.
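A rough PyMongo sketch of the trick (the collection and field names are taken from the log above; the connection and database name are assumed, and this applies to the old MMAPv1 global-lock behavior described here):

import time
from pymongo import MongoClient

coll = MongoClient()["db"]["UserInfoShared"]  # assumed connection/db name
doc_id = "750837091142"

# Read first: this pages the document into memory without holding the
# global lock for the disk I/O...
coll.find_one({"_id": doc_id})
# ...so the update only holds the lock for the in-memory write.
coll.update_one({"_id": doc_id}, {"$set": {"_locked": time.time()}})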
2) The other main cause of the global lock is MongoDB's data flush to disk. Basically, every 60 seconds (or less) MongoDB (or the OS) writes the data to disk, and a global lock is held during this process. (This kind of explains the random slow queries.) In your MMS stats, look at the graph for background flush avg... if it's high, that means you need faster disks.
In our case, we moved to a new EBS optimized instance in EC2 and also bumped our provisioned IOPS from 100 to 500 which almost halved the background flush avg and the servers are much happier now.
I am trying to retrieve 100,000 documents from MongoDB like below, and it's taking a very long time to return the collection.
var query = Query.EQ("Status", "E");
var items = collection.Find(query).SetLimit(100000).ToList();
Or
var query = Query.GT("_id", idValue);
var items = collection.Find(query).SetLimit(100000).ToList();
Explain:
{
    "cursor" : "BtreeCursor _id_",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "_id" : [[ObjectId("4f79a64eca98b5fc0e5ae35a"),
                  ObjectId("4f79a64eca98b5fc0e5ae35a")]]
    }
}
Any suggestions to improve query performance? My collection has 2 million documents.
-Venkat
This question was also asked on Google Groups:
https://groups.google.com/forum/?fromgroups#!topicsearchin/mongodb-user/100000/mongodb-user/a6FHFp5aOnA
As I responded on the Google Groups question I tried to reproduce this and was unable to observe any slowness. I was able to read 100,000 documents in 2-3 seconds, depending on whether the documents were near the beginning or near the end of the collection (because I didn't create an index).
My answer to the Google groups question has more details and a link to the test program I used to try and reproduce this.
Given the information you have provided, my best guess is that your document size is too large and the delay is not necessarily on the Mongo server but in the transmission of the result set back to your app machine. Take a look at your average document size in the collection; do you have large embedded arrays, for example?
Compare the response time when selecting only one field using the .SetFields method (see the example here: How to retrieve a subset of fields using the C# MongoDB driver?). If the response time is significantly faster, then you know that this is the issue.
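For illustration, here is the same comparison in PyMongo terms (the collection name and query are assumed from the question):

import time
from pymongo import MongoClient

coll = MongoClient()["test"]["collection"]  # assumed names

t0 = time.perf_counter()
full = list(coll.find({"Status": "E"}).limit(100000))
print("full documents:", time.perf_counter() - t0)

t0 = time.perf_counter()
slim = list(coll.find({"Status": "E"}, {"_id": 1}).limit(100000))
print("_id field only:", time.perf_counter() - t0)
# A large gap points at result-set size / transfer time, not the query.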
Have you defined indices?
http://www.mongodb.org/display/DOCS/Indexes
There are several things to check:
Is your query correctly indexed?
If your query is indexed, what are the odds that the data itself is in memory? If you have 20GB of data and 4GB of RAM, then most of your data is not in memory which means that your disks are doing a lot of work.
How much data do 100k documents represent? If your documents are really big, they could be sucking up all of the available disk IO, or possibly the network. Do you have enough space to store this in RAM on the client?
You can check disk usage using iostat (a common Linux tool) or perfmon (under Windows). If you run these while your query is running, you should get some idea of what's happening with your disks.
Otherwise, you will have to do some reasoning about how much data is moving around here. In general, queries that return 100k objects are not intended to be really fast (not in MongoDB or in SQL). That's more data than humans typically consume in one screen, so you may want to make smaller batches and read 10k objects 10 times instead of 100k objects once.
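One way to read in smaller batches is to page on _id, sketched here in PyMongo (connection and names assumed):

from pymongo import MongoClient

coll = MongoClient()["test"]["collection"]  # assumed names
last_id = None
while True:
    query = {"Status": "E"}
    if last_id is not None:
        query["_id"] = {"$gt": last_id}
    page = list(coll.find(query).sort("_id", 1).limit(10000))
    if not page:
        break
    last_id = page[-1]["_id"]
    # process the 10k-document page here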
If you don't create indexes for your collection, MongoDB will do a full collection scan - this is the slowest possible method.
You can run explain() on your query. Explain will tell you which indexes (if any) are used for the query, the number of scanned documents, and the total query duration.
If your query hits all the indexes and its execution is still slow, then you probably have a problem with the size of the collection relative to RAM.
MongoDB is fastest when collection data + indexes fit in memory. If your collection size is larger than the available RAM, the performance drop is very large.
You can check the size of your collection with totalSize(), totalIndexSize() or validate() (these are shell commands).
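The same numbers are available from a driver via the collStats command; a PyMongo sketch (database and collection names assumed):

from pymongo import MongoClient

db = MongoClient()["test"]  # assumed database
stats = db.command("collStats", "collection")
print("data size:       ", stats["size"])
print("storage size:    ", stats["storageSize"])
print("total index size:", stats["totalIndexSize"])
# Compare data + index sizes against the server's available RAM.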