MongoDB write performance issue in a cluster

I am facing a performance issue while writing data into a MongoDB cluster (1 master, 1 config server, 2 shards) that we have set up.
For 1 GB of data (8 million documents) it takes more than 30 minutes, while on a single node the same data took 8 minutes.
Here are the configuration details of the VMs:
Master server – 7 GB RAM, 3 CPUs.
Config server – 1 GB RAM, 1 CPU.
2 shard servers – 2 GB RAM, 2 CPUs (each shard).
All servers are configured as virtual machines.
Please let me know if you have any ideas on how to resolve this issue.

It seems your secondary servers are struggling to keep up with the writes on the primary, most likely because they have a lower hardware spec. If it is acceptable, you could set the write concern to:
{ w: 1, j: 1 }
This allows the secondary servers to temporarily lag behind while you are uploading the data.
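As a sketch of how this write concern can be applied per operation from the mongo shell (the database and collection names are placeholders, not from the question):

```javascript
// Connect to the mongos router, then insert with a relaxed write concern.
// "mydb" and "mycoll" are placeholder names.
use mydb

db.mycoll.insertOne(
  { item: "example", qty: 1 },
  // w: 1 — acknowledge once the primary has the write (don't wait for secondaries);
  // j: 1 — but do wait until the write is in the primary's on-disk journal.
  { writeConcern: { w: 1, j: 1 } }
)
```

The write concern can also be set as the default on the connection string or at the database level, so bulk-load code does not need to pass it on every call.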

Related

MongoDB write performance goes down when WiredTiger cache utilization goes above 80% of total cache size

I am running a load test against MongoDB using the Java driver on the client side.
The cluster setup includes: 3 mongos routers, 3 config servers, and 3 shards, each a replica set with 3 members (primary, secondary, and an arbiter node). Everything runs on a separate machine.
The config servers and shard nodes are configured with a 4 GB WiredTiger cache.
On the client side, I insert the same document every time; the _id is auto-generated. The document size is fixed at 900 bytes.
When running a load of 40k transactions per second from a single client using the insertOne operation, I initially get close to 40k transactions per second. But this TPS drops significantly and gradually falls to zero once RAM is fully utilized (both the WiredTiger cache and the Linux filesystem cache), i.e. I am no longer able to insert any documents into the DB.
I tried inserting documents from a second client to check whether it was a client-side issue, but it also failed to insert.
(iostat, mongostat, and mongotop output from the server hosting the primary node were attached as images.)
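One way to watch the cache pressure described above is to query the WiredTiger section of serverStatus from the mongo shell. This is a sketch, assuming shell access to the shard primary; the two statistic names are standard serverStatus fields, but exact field availability can vary by MongoDB version:

```javascript
// Inspect WiredTiger cache usage on the primary.
var cache = db.serverStatus().wiredTiger.cache;
var used = cache["bytes currently in the cache"];
var max  = cache["maximum bytes configured"];
// Utilization sustained above ~80% is where eviction starts to throttle writers.
print("WT cache utilization: " + (100 * used / max).toFixed(1) + "%");
```

Sampling this alongside the insert load makes it easy to correlate the TPS drop with the cache crossing the eviction threshold.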

MongoDB on Amazon SSD-backed EC2

We have a MongoDB sharded cluster currently deployed on EC2 instances in Amazon. The shards are also replica sets. The instances use EBS with provisioned IOPS.
We have about 30 million documents in a collection. Our queries count the whole collection matching the filters, and we have indexes on almost all of the queryable fields. This results in RAM reaching 100% usage: our working set exceeds the size of RAM. We think the slow response of our queries is caused by EBS being slow, so we are considering migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that the SSD storage is ephemeral, meaning the data will be gone once the instance stops, terminates, or fails. How can we address this? How do we automate backups? Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set up a sharded cluster?
Working with the ephemeral disks is a risk, but if you have your replication set up correctly it shouldn't be a huge concern. I'm assuming you've set up a three-node replica set, correct? And that you have three nodes for your config servers?
I can speak to this from experience, as the company I'm at is set up this way. To help mitigate risk, I'm moving towards a backup strategy that involves a hidden replica set member. With this setup I can shut down the hidden member and one of the config servers (after first stopping the balancer), take a complete copy of the data files (replica and config server), and have a valid backup. If AWS went down in my availability zone, I'd still have a daily backup available on S3 to restore from.
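A minimal sketch of that setup in mongo shell commands (the hostname is a placeholder, and the member _id assignment assumes the existing members are numbered consecutively from 0):

```javascript
// Stop the balancer before taking the backup (run against a mongos).
sh.stopBalancer()

// Add a hidden member dedicated to backups ("backuphost:27017" is a
// placeholder). Run against the replica set primary.
var cfg = rs.conf();
cfg.members.push({
  _id: cfg.members.length,  // assumes existing _ids are 0..n-1
  host: "backuphost:27017",
  priority: 0,              // never becomes primary
  hidden: true              // invisible to clients; used only for backups
});
rs.reconfig(cfg);

// ...shut down the hidden member, copy its data files, then restart it.

// Re-enable the balancer once the backup is complete.
sh.startBalancer()
```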
Hope this helps.

ElasticSearch architecture and hosting

I am looking to use Elasticsearch with MongoDB to support our full-text search requirements. I am struggling to find information on the architecture and hosting and would like some help. I am planning to host ES on premises rather than in the cloud. We currently have MongoDB running as a replica set with three nodes.
How many servers are required to run Elasticsearch for high availability?
What is the recommended server specification? Currently my thoughts are 2 x CPU, 4 GB RAM, C drive: 40 GB, D drive: 40 GB.
How does ES support failover?
Thanks
Tariq
How many servers are required to run ElasticSearch for high availability?
At least 2
What is the recommended server specification. Currently my thoughts are 2 x CPU, 4GB RAM, C drive: 40GB , D drive: 40GB
It really depends on the amount of data you're indexing, but that amount of RAM (and, I'm assuming, a decent dual-core CPU) should be enough to get you started.
How does ES support failover
You set up a cluster with multiple nodes in such a way that each node's shards have a replica on another node.
In a simple example, your cluster would consist of two servers, each running one node.
You'd set the number of replicas to 1, so that the shards on one node have a backup copy stored on the other node, and vice versa.
If a node goes down, Elasticsearch detects the failure and routes requests for that node's shards to their replicas on the other node until you fix the problem.
You could make this even more robust by having, for example, 4 servers with one node each and 2 replicas. What you must understand is that Elasticsearch optimizes the distribution of replicas and primary shards based on the number of shards you have.
So in the 2-node, 1-replica example above, if you added 2 extra servers/nodes (1 node per server is recommended), Elasticsearch would move the replicas onto their own nodes, so that you'd have 2 nodes holding the primary shards and 2 other nodes each holding a copy of those shards (the replicas).
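For reference, the 1-replica setup described above corresponds to index settings like the following (a sketch; the shard count is an illustrative choice, and the settings go in the body of the index-creation request, e.g. `PUT /myindex` where `myindex` is a placeholder name):

```json
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}
```

With `number_of_replicas: 1`, every primary shard gets one replica copy, and Elasticsearch will never allocate a replica on the same node as its primary.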
How many servers are required to run ElasticSearch for high availability?
I recommend 3 servers with a replication factor of 3 for the index. It will be more stable in case one server goes down, and it's better for high load, because queries can be distributed across the cluster.
What is the recommended server specification. Currently my thoughts are 2 x CPU, 4GB RAM, C drive: 40GB , D drive: 40GB
I strongly recommend more RAM. We have 72 GB on each machine in the cluster and ES works perfectly smoothly (and we have never run into garbage-collector issues).
How does ES support failover
In our case at http://indexisto.com we ran a lot of tests and had some production cluster server failures. Starting from 3 servers, there were no issues when a server went down. The more servers in the cluster, the smaller the impact of one server failing.

Does it make sense to have more shards than physical servers to spread Mongo writes?

MongoDB uses master-slave replication, which means that all writes go to a single master node (slaves are just backups or standbys, or can serve reads that need not be absolutely current).
MongoDB also has sharding, which splits the data into shards, each of which has its own replica set (i.e. its own master). As a result, if writes are spread evenly across shards, write performance increases.
So far, I have only considered sharding as a scale-out option: Add more machines to host the shards.
Does it make sense to have multiple shards on the same machine just to spread out writes, too?
Machine A: [Shard A Master] [Shard B Replica]
Machine B: [Shard B Master] [Shard C Replica]
Machine C: [Shard C Master] [Shard A Replica]
In most cases, running multiple mongod processes per physical machine (either as a replica set or as shard servers for different shards) does not increase write performance compared to running a single mongod per machine. MongoDB performs very well under concurrent write loads as long as the "working set" of data (that is, frequently accessed data) and indexes fits within RAM. If you have only one mongod per machine, that mongod has access to all of the RAM; if you have more, they will eventually contend with one another for a limited resource.
In special circumstances, such as if you have a lot of excess capacity on your physical machines, running multiple mongods can be beneficial, as they can better take advantage of the resources available on the machine.
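For illustration, the Machine A/B/C layout above could be wired together with mongo shell commands like these (a sketch; hostnames, ports, shard names, and the database/collection are all placeholders). Each shard's replica set spans two machines, so each machine runs two mongods on different ports:

```javascript
// Run against a mongos. Each shard is a replica set mirroring the layout:
// shardA: master on machineA, replica on machineC, and so on.
sh.addShard("shardA/machineA:27018,machineC:27018")
sh.addShard("shardB/machineB:27018,machineA:27019")
sh.addShard("shardC/machineC:27019,machineB:27019")

// Shard a collection with a hashed key so writes spread evenly across shards.
sh.enableSharding("mydb")
sh.shardCollection("mydb.events", { _id: "hashed" })
```

The hashed shard key is what actually spreads inserts across the three masters; a monotonically increasing key would send every write to one shard regardless of the topology.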

What is the node/shard limit for MongoDB on a 64-bit machine?

We are currently setting up a MongoDB database for our environment. We are running into specific collections which will initially be more than 2 GB in size.
Our deployment environment is a 64-bit Ubuntu machine.
We are trying to find out what the size limits are for a specific collection and for a shard in a MongoDB sharded environment.
As far as I know, there is no limit to the size of a collection within MongoDB. The only limit is the amount of disk space available to you; in the case of sharding, it is the total disk space available across all shards. And according to the docs, you can only have 1000 shards.
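As a quick sketch, the current size of a collection can be checked from the mongo shell ("mydb" and "mycoll" are placeholder names):

```javascript
// Report collection sizes in megabytes ("mydb"/"mycoll" are placeholders).
use mydb
var s = db.mycoll.stats(1024 * 1024);  // scale argument: report sizes in MB
print("data size:    " + s.size + " MB");
print("storage size: " + s.storageSize + " MB");
```

This makes it easy to watch a collection grow past the 2 GB mark and confirm that, on a 64-bit build, nothing breaks at that point.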