Elasticsearch architecture and hosting with MongoDB

I am looking to use Elasticsearch with MongoDB to support our full-text search requirements, but I am struggling to find information on architecture and hosting and would like some help. I am planning to host ES on premises rather than in the cloud. We currently have MongoDB running as a replica set with three nodes.
How many servers are required to run Elasticsearch for high availability?
What is the recommended server specification? Currently my thoughts are 2 x CPU, 4 GB RAM, C drive: 40 GB, D drive: 40 GB.
How does ES support failover?
Thanks
Tariq

How many servers are required to run Elasticsearch for high availability?
At least two; three master-eligible nodes are better, since a majority of three can still be formed after one failure, which avoids split-brain.
What is the recommended server specification? Currently my thoughts are 2 x CPU, 4 GB RAM, C drive: 40 GB, D drive: 40 GB.
It really depends on how much data you're indexing, but that amount of RAM (and, I'm assuming, a decent dual-core CPU) should be enough to get you started.
How does ES support failover?
You set up a cluster with multiple nodes in such a way that each node holds a replica of another's shards.
In a simple example, your cluster would consist of two servers, each running one node.
You'd set the replica count to 1 so that the shards on one node have a backup copy stored on the other node, and vice versa.
If a node goes down, Elasticsearch will detect the failure and route requests for that node's shards to their replicas on the other node until you fix the problem.
Of course, you could make this even more robust by having, say, four servers with one node each and two replicas. What you must understand is that Elasticsearch optimizes the distribution of replicas and primary shards based on the number of shards you have.
So in the two-node, one-replica example above, if you added two extra servers/nodes (one node per server is recommended), Elasticsearch would move the replicas onto their own nodes: you'd have two nodes holding only the primary shards and two other nodes each holding a copy of those shards (the replicas).
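As a rough sketch of the two-node example above (the index name and host are illustrative, and the syntax assumes the older 1.x-era REST API):

```shell
# Create an index with 2 primary shards and 1 replica per shard.
# On a two-node cluster, Elasticsearch places each replica on the
# node that does not hold its primary, so either node can fail
# without losing data. (Index name "products" is illustrative.)
curl -XPUT 'http://localhost:9200/products' -d '{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}'

# Inspect how primaries and replicas were allocated across nodes:
curl 'http://localhost:9200/_cat/shards?v'
```

`number_of_replicas` can also be changed later on a live index via the update-settings API, which is how you would add replicas after growing the cluster.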

How many servers are required to run Elasticsearch for high availability?
I recommend three servers with a replication factor of 3 on the index (a full copy of the data on every server). The cluster will be more stable if one server goes down, and it copes better with high load because queries can be distributed across the cluster.
What is the recommended server specification? Currently my thoughts are 2 x CPU, 4 GB RAM, C drive: 40 GB, D drive: 40 GB.
I strongly recommend more RAM. We have 72 GB on each machine in the cluster and ES runs perfectly smoothly (and we have never run into garbage-collector issues).
How does ES support failover?
In our case at http://indexisto.com we ran a lot of tests and saw some production cluster server failures. Starting from three servers, we had no issues when a server went down. The more servers in the cluster, the smaller the impact of a single server failing.
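For a three-node cluster of this kind it is also worth guarding against split-brain. A minimal `elasticsearch.yml` sketch, assuming pre-7.x zen discovery settings and illustrative hostnames:

```shell
# Append minimal cluster settings to elasticsearch.yml on each node.
# minimum_master_nodes = 2 (a majority of 3) prevents the two halves
# of a partitioned cluster from each electing their own master.
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
cluster.name: search-cluster
discovery.zen.ping.unicast.hosts: ["es1", "es2", "es3"]
discovery.zen.minimum_master_nodes: 2
EOF
```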

Related

CockroachDB is slower in a 64 GB RAM cluster than in a 32 GB RAM cluster

I have installed CockroachDB in the clusters below:
a. a 3-node cluster in which every node has 32 GB RAM
b. a 3-node cluster in which every node has 64 GB RAM
I am testing performance by running the same queries (select, join, insert, delete, aggregate functions, nested queries, concurrent queries) in both clusters.
After testing 3 times, I have found that the 64 GB cluster is slower than the 32 GB cluster.
I was expecting the 64 GB RAM cluster to be faster than the 32 GB RAM cluster.
I am not able to find a suitable answer for this.
Any answers or insights would be greatly appreciated.
Thanks in Advance!
Thanks for the post! Without knowing the exact machine specs, configuration settings, and workload, it would be hard to figure out what could be happening here :)
We have a Slack channel that might make back-and-forth discussion of performance optimization a bit easier, or a support ticket could be opened with a best-effort SLA. If you have any more information on the run configuration, that would be great too!
https://www.cockroachlabs.com/join-community/
https://support.cockroachlabs.com/hc/en-us

Cassandra and MongoDB minimum system requirements for Windows 10 Pro

RAM: 4 GB
Processor: i3-5010U CPU @ 2.10 GHz
64-bit OS
Can Cassandra and MongoDB be installed on such a laptop? Will they run successfully?
The proposed hardware configuration does not meet the minimum requirements. For Cassandra, the documentation requests a minimum of 8 GB of RAM and at least 2 cores.
MongoDB's documentation also states that it needs at least 2 real cores or one multi-core physical CPU. With 4 GB of RAM, WiredTiger will allocate 1.5 GB for its cache. Please also note that MongoDB recommends BIOS changes to enable memory interleaving, which effectively disables NUMA (Non-Uniform Memory Access); such changes will impact the performance of the laptop for other processes.
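The 1.5 GB figure comes from WiredTiger's default cache size, which is the larger of 50% of (RAM minus 1 GB) and 256 MB:

```shell
# WiredTiger default cache = max(50% of (RAM - 1 GB), 256 MB)
ram_gb=4
cache_mb=$(( (ram_gb - 1) * 1024 / 2 ))
if [ "$cache_mb" -lt 256 ]; then cache_mb=256; fi
echo "WiredTiger cache: ${cache_mb} MB"   # 1536 MB = 1.5 GB
```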
Will they run successfully?
This will depend on the workload you expect to run; there are documented examples where Cassandra was installed on a Raspberry Pi array, which by design was expected to have slow performance and a limited amount of data that could be held in the cluster.
If you are looking for a small sandbox to start using these databases, there are other options. MongoDB has a database-as-a-service offering named Atlas, which includes a free tier with a 3-node replica set and up to 512 MB of storage. For Cassandra there are similar options: AWS offers a small cluster of its Managed Cassandra Service (MCS) in the free tier, and DataStax is also planning to offer similar services with Constellation.

MongoDB sharding: mongos and configuration servers together?

We want to create a MongoDB sharded cluster (v2.4). The official documentation recommends having 3 config servers.
However, our company's policies won't allow us to get 3 extra servers for this purpose. Since we already have 3 application servers (1 web node, 2 process nodes), we are considering putting the config servers on those same application servers, alongside the mongos instances. Availability is not critical for us.
What do you think about this configuration? Could we face problems, or is it discouraged for some reason?
Given that availability is not critical for your use case, it should be fine to place the config servers on the same servers as the application and the mongos.
If one of the process nodes goes down, you will lose 1 mongos, 1 application server, and 1 config server. During this downtime the other two config servers will be read-only, which means there will be no balancing of shards, no modifications to the cluster config, etc. Your other two mongos instances should still be operational (CRUD-wise), though. If your web node is down, you have a bigger problem to deal with.
If two of the nodes are down (2 process nodes, or the web node and a process node), again, you have a bigger problem to deal with, i.e. your applications are probably not going to work anyway.
Having said that, please make sure these nodes have the capacity (CPU, RAM, network connections, etc.) to handle a mongos, an application server, and a config server.
I would recommend testing the deployment architecture in a development/staging cluster first under your typical workload and use case.
Also see Sharded Cluster High Availability for more info.
Lastly, I would recommend checking out MongoDB v3.2, which is the current stable release. The config servers in v3.2 are modelled as a replica set; see Sharded Cluster Config Servers for more info.
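As a sketch of what the co-located v2.4 deployment could look like (hostnames `app1`/`app2`/`app3` and paths are illustrative; v2.4 uses three mirrored config servers rather than a replica set):

```shell
# On each of the three application servers, start a config server:
mongod --configsvr --dbpath /data/configdb --port 27019 --fork \
       --logpath /var/log/mongodb/configsvr.log

# On each application server, start a mongos that knows all three
# config servers (the list must be identical on every mongos):
mongos --configdb app1:27019,app2:27019,app3:27019 --port 27017 \
       --fork --logpath /var/log/mongodb/mongos.log
```

The application on each server then connects to its local mongos on port 27017.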

MongoDB Write performance issue in Cluster

I am facing a performance issue when writing data into a MongoDB cluster (1 master, 1 config server, 2 shards) that we have set up.
For 1 GB of data (8 million documents) it takes more than 30 minutes, while on a single node the same data took 8 minutes.
Here are the config details of the VMs:
Master server: 7 GB RAM, 3 CPUs
Config server: 1 GB RAM, 1 CPU
2 shard servers: 2 GB RAM, 2 CPUs (each shard)
All servers run as virtual machines.
Please let me know if you have any ideas for resolving this issue.
It seems your secondary servers are struggling to keep up with the data writes on the primary, most likely because they have a lower hardware spec. If it is acceptable, you could set the write concern to:
{ w: 1, j: 1 }
This will allow the secondary servers to temporarily lag behind while you are uploading the data.
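As a sketch, that write concern can be passed per operation from the mongo shell (database and collection names are illustrative):

```shell
# w: 1 acknowledges the write once the primary has applied it;
# j: 1 additionally waits for the primary's on-disk journal.
# Secondaries are allowed to catch up asynchronously.
mongo --eval '
  db.getSiblingDB("mydb").bulkload.insert(
    { _id: 1, payload: "example" },
    { writeConcern: { w: 1, j: 1 } }
  )
'
```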

MongoDB on Amazon SSD-backed EC2

We have a MongoDB sharded cluster currently deployed on EC2 instances in Amazon. The shards are also replica sets. The instances use EBS volumes with provisioned IOPS.
We have about 30 million documents in a collection. Our queries count all the documents in the collection that match the filters, and we have indexes on almost all of the queryable fields. This results in RAM usage reaching 100%: our working set exceeds the size of the RAM. We think the slow response of our queries is caused by EBS being slow, so we are thinking of migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that the SSD storage is ephemeral, meaning the data will be gone once an instance stops, terminates, or fails. How can we address this? How do we automate backups? Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set up a sharded cluster?
Working with the ephemeral disks is a risk, but if you have your replication set up correctly it shouldn't be a huge concern. I'm assuming you've set up a three-node replica set, correct? And you have three nodes for your config servers?
I can speak of this from experience, as the company I'm at is set up this way. To help mitigate risk, I'm moving towards a backup strategy that involves a hidden replica. With this setup I can shut down the hidden replica and one of the config servers (having first stopped the balancer) and take a complete copy of the data files (replica and config server), giving me a valid backup. If AWS went down in my availability zone, I'd still have a daily backup available on S3 to restore from.
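For reference, a hidden backup member of this kind could be added along these lines (hostname `backup1` and the member `_id` are illustrative; run against the primary):

```shell
# A hidden member replicates data but is invisible to clients and
# can never become primary, so it can be stopped for cold backups
# without affecting the application.
mongo --eval '
  rs.add({
    _id: 3,
    host: "backup1:27017",
    priority: 0,  // required for hidden members
    hidden: true
  })
'
```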
Hope this helps.