MongoDB 32 Bit Master Node - mongodb

I know that if you run MongoDB on a 32 bit operating system it can only store up to 2 GB of data. Does this restriction apply to a master node in a MongoDB cluster?

Yes, I believe so. In a master-slave setup, the master receives all the writes and the slaves simply replicate from it, so the master is subject to the following restriction. From the FAQ:
MongoDB uses memory-mapped files. When running a 32-bit build of
MongoDB, the total storage size for the server, including data and
indexes, is 2 gigabytes. For this reason, do not deploy MongoDB to
production on 32-bit machines.

Related

Mongodb on the cloud

I'm preparing my production environment on the Hetzner cloud, but I have some doubts (I'm more a developer than a devops).
I will get 3 servers for the replica set, each with 8 cores, 32 GB of RAM and a 240 GB SSD. I'm a bit worried about the size of the SSD the server comes with, and Hetzner offers volumes that can be attached to the servers. Since MongoDB uses a single folder for the database data, I was wondering how I can use the 240 GB that comes with the server in combination with external volumes. At the beginning I can use the 240 GB, but I will have to move the data folder to a volume when it reaches capacity. I'm fine with this, but it looks to me that once I move to volumes, this 240 GB will no longer be used (yes, I could use it to store the MongoDB journal, since storing it on a separate partition is suggested).
So, my noob question is, how can I use both the disk that comes with the server and the external volumes?
Thank you

Cassandra and MongoDB minimum system requirements for Windows 10 Pro

RAM: 4 GB
Processor: i3-5010U CPU @ 2.10 GHz
OS: 64-bit
Can Cassandra and MongoDB be installed on such a laptop? Will it run successfully?
The hardware configuration proposed does not meet the minimum requirements. For Cassandra, the documentation requests a minimum of 8GB of RAM and at least 2 cores.
MongoDB's documentation also states that it needs at least 2 real cores or one multi-core physical CPU. With 4 GB of RAM, WiredTiger will by default allocate 1.5 GB for its cache. Please also note that on NUMA (Non-Uniform Memory Access) hardware MongoDB's production notes call for memory interleaving, which on Windows is enabled in the BIOS; such a change can affect the performance of other processes on the laptop.
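The 1.5 GB figure can be reproduced with a little arithmetic. This is a minimal sketch, assuming the default WiredTiger rule documented for recent MongoDB versions (the larger of 50% of (RAM - 1 GB) or 256 MB); the function name is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes: int) -> int:
    """Assumed default rule: larger of 50% of (RAM - 1 GiB) or 256 MiB."""
    half_of_ram_minus_1gib = (total_ram_bytes - GIB) // 2
    return max(half_of_ram_minus_1gib, 256 * MIB)


# A 4 GiB laptop, as in the question.
print(default_wiredtiger_cache_bytes(4 * GIB) / GIB)  # 1.5
```

The remaining memory is shared with the operating system, the filesystem cache and, if Cassandra is installed on the same machine, its JVM heap.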
Will it run successfully?
This will depend on the expected workload. There are documented examples of Cassandra installed on a Raspberry Pi array, which by design was expected to perform slowly and to hold only a limited amount of data in the cluster.
If you are looking for a small sandbox to start using these databases, there are other options. MongoDB has a database-as-a-service offering named Atlas, with a free tier of a 3-node replica set and up to 512 MB of storage. For Cassandra there are similar options: AWS offers a small cluster of its Managed Cassandra Service (MCS) in the free tier, and DataStax is also planning to offer a similar service with Constellation.

MongoDB Write performance issue in Cluster

I am facing a performance issue while writing data into the MongoDB cluster (1 master, 1 config server, 2 shards) that we have set up.
For 1 GB of data (8 million documents) it takes more than 30 minutes, while on a single node the same data took 8 minutes.
Here are the config details of VMs:
Master Server – 7GB RAM, 3CPUs.
Config Server - 1GB RAM, 1CPU.
2 Shard servers- 2GB RAM, 2 CPUs (each shard)
All servers configured in Virtual Machines.
Please let me know if you have any idea to resolve this issue.
It seems your secondary servers are struggling to keep up with the data writes on the primary, most likely because they have a lower hardware spec. If it is acceptable, you should set the write concern to:
{ w: 1,j: 1}
This will allow the secondary servers to temporarily lag behind while you are uploading the data.
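One way to apply that write concern to every write from a client is through the connection string. This is a minimal sketch, assuming the standard `w` and `journal` URI options; the host name, database name and helper function are made up for illustration.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, database, w=1, journal=True):
    """Build a MongoDB URI whose write concern matches { w: 1, j: 1 }."""
    options = {"w": w, "journal": "true" if journal else "false"}
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


# Hypothetical host and database, not taken from the question.
uri = build_mongo_uri(["router.example.net:27017"], "bulkload")
print(uri)  # mongodb://router.example.net:27017/bulkload?w=1&journal=true
```

With `w=1` the driver waits only for the primary to acknowledge the write, and `journal=true` additionally waits for the write to reach the primary's on-disk journal.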

MongoDB on Amazon SSD-backed EC2

We have a MongoDB sharded cluster currently deployed on EC2 instances in Amazon. These shards are also replica sets. The instances use EBS with provisioned IOPS.
We have about 30 million documents in a collection. Our queries count all documents in the collection that match the filters. We have indexes on almost all of the queryable fields. This results in RAM reaching 100% usage. Our working set exceeds the size of the RAM. We think the slow response of our queries is caused by EBS being slow, so we are thinking of migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that the SSD is ephemeral, meaning the data will be gone once the instance stops, terminates, or fails. How can we address this? How do we automate backups? Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set up a sharded cluster?
Working with the ephemeral disks is a risk, but if you have your replication set up correctly it shouldn't be a huge concern. I'm assuming you've set up a three-node replica set, correct? And you have three nodes for your config servers?
I can speak of this from experience, as the company I'm at has been set up this way. To help mitigate risk I'm moving towards a backup strategy that involves a hidden replica. With this setup I can shut down the hidden replica and one of the config servers (having first stopped balancing), take a complete copy of the data files (replica and config server), and have a valid backup. If AWS went down in my availability zone I'd still have a daily backup available on S3 to restore from.
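As a rough illustration of what the hidden replica looks like in the replica set configuration, here is a minimal sketch that builds the member document. The `_id`, `host`, `priority` and `hidden` fields are standard replica set member settings; the host name, member id and helper function are made up for illustration.

```python
def hidden_member(member_id: int, host: str) -> dict:
    """Member document for a hidden replica used only for backups.

    A hidden member must have priority 0, so it can never become primary
    and is not visible to client applications.
    """
    return {"_id": member_id, "host": host, "priority": 0, "hidden": True}


# Hypothetical backup node, not taken from the answer.
backup_node = hidden_member(3, "backup.example.net:27017")
print(backup_node)
```

Because clients never read from this member, shutting it down to copy its data files does not affect application traffic.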
Hope this helps.

ElasticSearch architecture and hosting

I am looking to use Elasticsearch with MongoDB to support our full text search requirements. I am struggling to find information on the architecture and hosting and would like some help. I am planning to host ES on premises rather than in the cloud. We currently have MongoDB running as a replica set with three nodes.
How many servers are required to run ElasticSearch for high availability?
What is the recommended server specification. Currently my thoughts are 2 x CPU, 4GB RAM, C drive: 40GB , D drive: 40GB
How does ES support failover
Thanks
Tariq
How many servers are required to run ElasticSearch for high availability?
At least 2
What is the recommended server specification. Currently my thoughts are 2 x CPU, 4GB RAM, C drive: 40GB , D drive: 40GB
It really depends on the amount of data you're indexing, but that amount of RAM and a decent dual-core CPU (I'm assuming) should be enough to get you started.
How does ES support failover
You set up a cluster with multiple nodes in such a way that each node holds a replica of another node's shards.
So in a simple example your cluster would consist of two servers, each with one node on them.
You'd set replicas to 1 so that the shards on each node have a backup copy stored on the other node, and vice versa.
So if a node goes down, Elasticsearch will detect the failure and route the requests for that node to its replica on another node until you fix the problem.
Of course you could make this even more robust by having 4 servers with one node each and 2 replicas, as an example. What you must understand is that Elasticsearch will optimize the distribution of replicas and primary shards based on the number of shards you have.
So with the 2 nodes and 1 replica example above, say you added 2 extra servers/nodes (1 node per server is recommended): Elasticsearch would move the replicas off the original nodes and onto their own nodes, so that you'd have 2 nodes holding only primary shards and 2 other nodes each holding one copy of those shards (replicas).
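The two examples above (2 nodes with 1 replica, 4 nodes with 2 replicas) can be checked with a small calculation. This is a minimal sketch that looks only at data availability: it relies on the fact that Elasticsearch does not allocate a replica on the same node as its primary, and it ignores master election. The function names are made up for illustration.

```python
def copies_per_shard(nodes: int, replicas: int) -> int:
    """Copies of each shard (primary + replicas) that can be allocated.

    Two copies of the same shard never share a node, so the number of
    nodes caps the number of copies.
    """
    return min(1 + replicas, nodes)


def node_failures_tolerated(nodes: int, replicas: int) -> int:
    """Nodes that can be lost while every shard keeps at least one copy."""
    return copies_per_shard(nodes, replicas) - 1


print(node_failures_tolerated(nodes=2, replicas=1))  # 1
print(node_failures_tolerated(nodes=4, replicas=2))  # 2
```

Setting more replicas than there are other nodes does not add safety: the extra replicas simply stay unassigned until more nodes join.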
How many servers are required to run ElasticSearch for high
availability?
I recommend 3 servers with an index replication factor of 3. It will be more stable if one server goes down, and it's better under high load because queries can be distributed across the cluster.
What is the recommended server specification. Currently my thoughts
are 2 x CPU, 4GB RAM, C drive: 40GB , D drive: 40GB
I strongly recommend more RAM. We have 72 GB on each machine in the cluster and ES runs perfectly smoothly (and we have never run into garbage collector issues).
How does ES support failover
In our case at http://indexisto.com we have had a lot of server failures in test clusters and some in production. Starting from 3 servers, there are no issues when a server goes down. The more servers in the cluster, the less impact a single server failure has.