MongoDB only uses one thread

macOS with mongodb-community@4.2 (installed using brew)
TLDR: MongoDB is only running as one process, seemingly not taking advantage of the 7 other available CPU cores.
I'm running a simple NodeJS application with PM2, making use of all 8 of my CPU cores.
Using Apache Benchmark, I try to stress-test the application for retrieving data. The endpoint I am hitting retrieves data from my MongoDB database. (Only reading, no write operations are performed).
During the stress-test I get these results:
There are 8 active NodeJS processes
There is only 1 active MongoDB process
CPU usage indicates that MongoDB is the bottleneck. How can I ensure that MongoDB takes advantage of more cores?
[Screenshot from top omitted]
Why is MongoDB only making use of 1 process/core?
Can I increase performance by configuring it to use more than one process/core?
Some additional information: serverStatus() was also run during the stress test [output omitted].

MongoDB, like most databases, runs as a single process to ensure consistency; it uses locking and other concurrency-control measures to prevent multiple clients from modifying the same piece of data simultaneously. Note that a single process is not the same as a single thread: mongod is heavily multi-threaded (typically one thread per client connection), so the one process you see in top can still make use of multiple cores.
MongoDB Performance
In some cases, the number of connections between the applications and the database can overwhelm the ability of the server to handle requests. The following fields in the serverStatus document can provide insight:
connections is a container for the following two fields:
connections.current: the total number of current clients connected to the database instance.
connections.available: the total number of unused connections available for new clients.
If there are numerous concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment.
For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members.
For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.
https://docs.mongodb.com/manual/administration/analyzing-mongodb-performance/#number-of-connections
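
For example, these counters can be read programmatically, and for the read-heavy case above reads can be directed at secondaries; a minimal sketch in Python with pymongo (the URIs are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI

# serverStatus is a diagnostic command; "connections" holds the counters above
conn = client.admin.command("serverStatus")["connections"]
print("current:", conn["current"], "available:", conn["available"])

# For read-heavy workloads on a replica set, spread reads to secondaries
read_client = MongoClient("mongodb://localhost:27017",
                          readPreference="secondaryPreferred")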

Related

How to locally test MongoDB multicollection transactions when standalone mode does not support them

I am just starting out with MongoDB and am using the docker mongo instance for local development and testing.
My code has to update 2 collections in the same transaction so that the data is logically consistent:
using (var session = _client.StartSession())
{
    session.StartTransaction();
    // The session must be passed to each operation for it to run inside the transaction
    ec.InsertOne(session, evt);
    sc.InsertMany(session, snapshot.Selections.Select(ms => new SelectionEntity(snapshot.Id, ms)));
    session.CommitTransaction();
}
This is failing with the error:
"Standalone servers do not support transactions"
The error is obvious: my standalone Docker container does not support transactions. I am confused, though, as this means it's impossible to test code such as the above unless I have a replica set running. This doesn't appear to be listed as a requirement in the documentation, which says that transactions can be multi-document or distributed:
For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions. With distributed transactions, transactions can be used across multiple operations, collections, databases, documents, and shards.
It's not clear to me how to create a multi-document transaction that does not require a replica-set server, or how to properly test such code locally when there may be no Mongo replica cluster to work against.
How do people handle this?
For testing purposes, you could set up a local replica set using docker-compose. There are various blog posts on the topic, e.g. Create a replica set in MongoDB with docker-compose.
Another option is to use a cluster on MongoDB Atlas. There is a free tier available so you can test this without any extra cost.
In addition, you could change your code so that transactions can be disabled depending on the configuration. This way, you can test the code without transactions locally and enable them on staging or production.
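
As a sketch of that configuration-gated approach (shown here in Python with pymongo rather than the question's C#; USE_TRANSACTIONS, the URI, and the collection names are illustrative):

import os
from pymongo import MongoClient

USE_TRANSACTIONS = os.getenv("USE_TRANSACTIONS", "false") == "true"  # hypothetical flag

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["appdb"]

def insert_event_and_selections(evt, selections):
    if USE_TRANSACTIONS:
        # Transactional path: requires a replica set (a single-node one is enough)
        with client.start_session() as session:
            with session.start_transaction():
                db.events.insert_one(evt, session=session)
                db.selections.insert_many(selections, session=session)
    else:
        # Standalone-friendly path for local development; no atomicity guarantee
        db.events.insert_one(evt)
        db.selections.insert_many(selections)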

Why is replica set mandatory for transactions in MongoDB?

As per the MongoDB documentation, transactions only work on replica sets, not on a single node. Why such a requirement? Isn't it easier to do transaction stuff on a single node than in a distributed system?
The implementation of transactions uses sessions, which in turn require an oplog. The oplog is provided by replica sets for data synchronization between nodes.
Isn't it easier to do transaction stuff on a single node than in a distributed system?
This is true, but in practice MongoDB positions itself as a high-availability database, so there are rather few production deployments using a standalone server (as far as I know this isn't even an option in Atlas, for example). Hence the lack of transaction support on standalone servers typically doesn't affect anything.
Conversely, implementing transactions only on standalone servers would not address the needs of the vast majority of MongoDB deployments/customers that use replica sets and sharded clusters.
For development purposes you can run a single-node replica set, which gives you the oplog required for sessions and transactions while still running only one mongod process.
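
For example, after starting mongod with a replica-set name (mongod --replSet rs0), the set can be initiated once; a minimal sketch in Python with pymongo (host, port, and set name are placeholders):

from pymongo import MongoClient

# Connect directly to the single node (no replica-set discovery yet)
client = MongoClient("mongodb://localhost:27017", directConnection=True)

# One-time initiation turns the standalone mongod into a one-member replica set
client.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [{"_id": 0, "host": "localhost:27017"}],
})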

mongodb low cpu utilization

I have two instances running on AWS (EC2). One instance runs only the MongoDB server, while the other runs a multi-process Python program that fetches data from the remote Mongo server.
On the Python instance I am using pymongo, and each process establishes its own connection (MongoClient) independently, as in the sketch below.
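
A minimal sketch of that setup (host, database, and collection names are illustrative), following pymongo's fork-safety rule that each worker process must create its own MongoClient:

from multiprocessing import Pool
from pymongo import MongoClient

client = None  # one MongoClient per worker process

def init_worker():
    global client
    # pymongo clients are not fork-safe, so each process creates its own
    client = MongoClient("mongodb://mongo-host:27017")  # placeholder host

def fetch(doc_id):
    return client["mydb"]["mycol"].find_one({"_id": doc_id})

if __name__ == "__main__":
    with Pool(processes=8, initializer=init_worker) as pool:
        results = pool.map(fetch, range(100))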
While monitoring the CPU utilization of the Mongo instance, I see very low CPU usage (about 2%).
In the free monitoring tool (https://cloud.mongodb.com/freemonitoring/cluster), I get about 40% CPU utilization.
Why is there such a big difference between the two values?
Does MongoDB need to be specially configured in order to utilize multiple CPU cores?
Does MongoDB need to be specially configured in order to utilize multiple CPU cores?
No.
Why is there such a big difference between the two values?
You have not described where the 2% value came from or what it is measuring, hence this question is impossible to answer.
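
One common source of this kind of discrepancy is aggregate versus per-core reporting: a workload saturating one core out of many looks small in a machine-wide average. A quick way to compare both views in Python (assumes the psutil package is installed):

import psutil

# Machine-wide average across all cores, sampled over one second
print(psutil.cpu_percent(interval=1))

# Per-core view: one busy core among many barely moves the average above
print(psutil.cpu_percent(interval=1, percpu=True))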

Running MongoDB and Redis on two different containers in the same host machine

I have read somewhere that a MongoDB server and a Redis server shouldn't run on the same host, because the way Redis manages memory hurts MongoDB. That advice predates Docker.io. Are things different now? Is it sensible to run a Redis server and MongoDB in two different containers on the same host machine?
Docker does not change your hardware; moreover, resource allocation is handled by the OS, which is not virtualized, so the same rules as on plain hardware apply here.
RAM
MongoDB and Redis don't share any memory. The problem with running both on the same host is that you can run out of RAM between the two processes. You can set a maximum memory size for Redis, and you can do the same for MongoDB's cache; here it is effectively mandatory.
If your sizing is good (MongoDB RAM + Redis RAM < hardware RAM), you won't get any swapping to disk for Redis (which is exactly what you want to prevent), but the MongoDB cache may not perform as well (not enough room for optimization). Less memory for Redis is always a challenge if your data grows: beware of out-of-memory conditions if the data size is unpredictable!
If you use backups with Redis, it uses more RAM than its dataset while producing the dump, so beware of that. It also implies additional IO.
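
As a sketch of capping Redis from code (the sizes are placeholders; assumes the redis-py package; the MongoDB side is capped at server startup instead):

import redis

r = redis.Redis(host="localhost", port=6379)  # placeholder host/port

# Cap Redis at 2 GB and evict least-recently-used keys when the cap is hit
r.config_set("maxmemory", "2gb")
r.config_set("maxmemory-policy", "allkeys-lru")

# MongoDB's WiredTiger cache is limited at startup instead, e.g.:
#   mongod --wiredTigerCacheSizeGB 2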
IO
In this case (less RAM), Mongo will do a lot more IO to access data. Redis, depending on your backup policy, may or may not use IO (your choice). Worst case: if you use AOF on Redis, that is a lot of IO, so IO may become the bottleneck in this architecture. If you don't use backups with Redis, you won't have problems. Also, an SSD is a good choice for Mongo.
CPU
I don't know whether MongoDB uses a lot of CPU, but Redis most of the time does not, except during backups. If you use backups with Redis, try to have two CPU cores available for it (one for Redis, one for the backup task).
Network
It depends on your number of clients, but you should check the throughput / input load of your machine to make sure you are not saturating it (using monit, for instance, with alerts). Sometimes the network is the bottleneck: not enough throughput in one machine!
Many of today's services, databases in particular, consume resources very aggressively and are designed on the assumption that they will (or should) run on a machine dedicated to them. MongoDB and Redis try to keep a lot of data in memory and will take as much memory for themselves as they can. To prevent these services from taking all the memory of your host machine, you can limit the maximum memory used by a container with -m="<number><optional unit>" in docker run. E.g.: docker run -d -m="2g" -p 27017:27017 --name mongodb dockerfile/mongodb
This way you can easily control the resource limits of your services and run them on the same host with fine-grained control over resources. Keep in mind, though, that these services are designed on the assumption that the host machine's resources will be fully available to them. For example, other databases such as Cassandra consume a lot of memory and, furthermore, are designed around sequential writes to disk. In such cases Docker will let you run them with limited resources, but if you run multiple services on the same host their performance can degrade severely.
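
The same limit can also be applied from the Python Docker SDK (a sketch; assumes the docker package is installed, and the image name is a placeholder):

import docker

client = docker.from_env()

# Equivalent of `docker run -d -m="2g" -p 27017:27017 --name mongodb ...`
client.containers.run(
    "mongo",  # placeholder image
    detach=True,
    mem_limit="2g",
    ports={"27017/tcp": 27017},
    name="mongodb",
)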

MongoDB : does it need 2 mongos per shard?

It's all in the title: do we need 2 mongos per shard in MongoDB? I am not sure I understand exactly what mongos instances are for, and whether my website will communicate with them or whether they are something internal to MongoDB.
If you have a sharded cluster set up (shards, not to be confused with a replica set), then you have to have mongos instances deployed. mongos is a router process: it knows which data resides where. The application talks to a mongos, which routes each request to the corresponding shard. Talking to shards directly is strongly discouraged.
You must have at least one mongos process. You can have more; they have a small resource footprint. I usually deploy one mongos per application server.
A mongos is basically nothing more than a router: it gathers your cluster's configuration from the config servers, caches it, and uses it to route targeted and scatter-gather operations within a cluster of shards. It also takes part in aggregations, so if aggregation queries are common in your app a mongos can consume some CPU and memory; for the most part, however, they carry little weight and can run on the smallest server.
You do not require 2 mongos; the number depends on the operations being sent through the router. You can in theory get by with one; however, that isn't very redundant and creates a single point of failure, and a second mongos makes that less likely.
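
For redundancy, drivers can be pointed at several mongos routers at once and will fail over between them; a minimal sketch in Python with pymongo (the hostnames are placeholders):

from pymongo import MongoClient

# Listing multiple mongos routers lets the driver route around a failed one
client = MongoClient("mongodb://mongos1.example.net:27017,mongos2.example.net:27017/")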