Cassandra and MongoDB minimum system requirements for Windows 10 Pro - mongodb

RAM- 4GB,
PROCESSOR-i3 5010ucpu #2.10 GHz
64 bit OS
can Cassandra and MongoDB be installed in such a laptop? Will it run successfully?

The hardware configuration proposed does not meet the minimum requirements. For Cassandra, the documentation requests a minimum of 8GB of RAM and at least 2 cores.
MongoDB's documentation also states that it will need at least 2 real cores or one multi-core physical CPU. With 4GB in RAM, the WiredTiger will allocate 1.5GB for the cache. Please also note that MongoDB will require changes in BIOS to allow memory interleaving to enable Non-Uniform Access Memory, a.k.a. NUMA, such changes will impact the performance of the laptop for other processes.
Will it run successfully?
This will depend on the workload expected to be executed; there are documented examples where Cassandra was installed on a Raspberry Pi array, which since the design it was expected to have slow performance and have a limited amount of data that can be held in the cluster.
If you are looking to have a small sandbox to start using these databases there are other options, MongoDB has a service named Atlas, with a model of a database as a service, it offers a free tier for a 3-node replica and up to 512Mb of storage. For Cassandra there are similar options, AWS offers in the free tier a small cluster of their Managed Cassandra Service (MCS), Datastax is also planning to offer similar services with Constellation

Related

Cockroach DB is slower in 64GB RAM cluster as compared to 32GB RAM cluster

I have installed CockroachDB in below clusters
a. 3 nodes cluster in which every node has 32GB RAM
b. 3 nodes cluster in which every node has 64GB RAM
I am testing the performance by running the same queries(Select, join, insert, delete, aggregate functions, nested queries, concurrent queries) in both the clusters.
After testing for 3 times, I have found that 64GB cluster is slower than 32 GB cluster.
I was expecting 64GB RAM cluster would be faster than 32GB RAM cluster.
I am not able to find the suitable answers for the same.
Any answers or insights would be greatly appreciated.
Thanks in Advance!
Thanks for the post! Without knowing the exact machine specs, configuration settings, and workload it would be hard to figure out what could be happening here :)
We have a Slack channel that might be a bit easier for back and forth on performance optimization, or a support ticket could be opened with a best effort SLA. If you have any more information on the run configuration that would be great too!
https://www.cockroachlabs.com/join-community/
https://support.cockroachlabs.com/hc/en-us

Presto on Preemptible GCE instances

I am running an instance group of 20 Preemptible GCE instance to read ORC files on Google storage, The data partitioned by hour, each hour about 2GB.
What type of instances should i use ?
How many of the Ram should be used by the JVM ?
I am using autoscale configuration of 80% CPU and 10 minute cooldown, Is there more subtitle config for Presto ?
Is there a solution for servers shutdowns, due to lack of resources ?
Partial responses will be appreciated as well.
As 0.199 version of PrestoDB there's no google cloud storage connector for Presto, which makes impossible to query GCS data.
Regarding hardware requirements, I'll cite Terada doc here.
Memory
You should allocate a minimum of 16GB of RAM per node for Presto. But
recommend 64GB for most production workloads.
Network Bandwidth
It is recommended to have 10 Gigabit Ethernet between all the nodes in
the cluster.
Other Recommendations
Presto can be installed on any normally configured Hadoop cluster.
YARN should be configured to account for resources dedicated to
Presto. For example, if a node has 64GB of RAM, perhaps you would
normally allocate 60GB to YARN. If you install Presto on that node and
give Presto 32GB of RAM, then you should subtract 32GB from the 60GB
and let YARN only allocate 28GB per node. An optimized configuration
might choose to have separate Presto and Hadoop nodes. The optimized
configuration allows you to give more memory to Presto, and thus
perform larger join queries, for example.

MongoDB Response Slower on Amazon EC2 Server Than Localhost

I'm loading the same amount of data (~100kb) on both my local server and a test Amazon EC2 server, but the response is 2x slower on EC2. Both are running Apache 2 and MongoDB on the same machine. On my local server, the response is about 209ms versus approximately 455ms on EC2.
I've setup a simple query and AJAX call that grabs point data to display on the map based on the current viewport of the device.
How can I debug this issue? How can I make it as faster as my local server? I even tried experimenting with different types of instances to make sure the specs are the same, but no luck. I also realize it could be because of network latency.
Local computer specs:
Intel Core i5 # 3.30GHz
8GB RAM
64-bit Windows 8
Amazon EC2 specs (m4.large):
2.4 GHz Intel Xeon Haswell (2 vCPUs)
8GB RAM
Amazon Linux
A remote query to EC2 is unlikely to return the result of the AJAX query as fast as your local server because it has network latency, while your local server does not. Measure the time in your AJAX handler from the start of the query to the point where it is ready to return data to get a meaningful baseline for comparison.
MongoDB is very sensitive to data being in RAM vs on disk. Depending on how you configured your EC2 instance, and on your local hardware, chances are pretty good that your local hardware is faster. EC2 instances can be configured to use SSD storage, and you can configure a guaranteed IOPS figure.
Is the 100KB the size of the result set, or the amount of data needed to form the result set? If you process 4GB of data down to get a 100KB result set, there's a good chance that disk IO is involved. If the amount of data you need to pull is small, repeat the test a few times to ensure data is entirely in RAM.
Finally, if both local and EC2 are pulling data from RAM, there's a good chance that your local CPU core is just faster than the EC2 CPU core, and that your RAM access is faster as well. EC2 is designed to provide low-cost commodity hardware. Developer setups are often much faster.
If you cannot account for the speed differences given the factors above, update your question with the time measurements that exclude network latency and provide more detailed specifications about your hardware. Update the question to indicate whether the data you are retrieving from MongoDB should be entirely in RAM, given it's size and the amount of RAM on your instance.

Running MongoDB and Redis on two different containers in the same host machine

I have read somewhere that MongoDB and Redis server shouldn't be executed in the same host because the way that Redis manages the memory damages MongoDb. This is before Docker.io. But now thing seems are pretty different or not? Is is convenient running Redis server and MongoDB on two different containers on the same host machine?
Docker does not change your hardware, also it is the OS that deals with resources which is not virtualized so the same rules as a normal hardware should apply here.
RAM
MongoDB and Redis don't share any memory. The problem of using the same host will be that you can run out of RAM with these two processes, you can put a max size for redis, you can probably do the same for MongoDB, it is mandatory.
If your sizing is good (MongoDB RAM + Redis RAM < Hardware RAM), you won't get any swap on disk for redis (which is absolutely what you want to prevent) but maybe mongodb cache won't be as good (not enough place for optimization). Less memory for redis is always a challenge if your data grows: beware of out of memory if the data size is unpredictable!
If you use backups with redis, it uses more RAM than its dataset to produce the dump, so beware of that. It implies also using IO.
IO
In this case (less RAM) mongo will do a lot more of IO to access data. Redis, depending on your backup policy, can use IO or not (your choice). Worst case: if you use AOF on redis, it is a lot of IO so maybe IO can become a bottleneck in this architecture. If you don't use backups with redis: you won't have problems. Also a SSD is a good choice for Mongo.
CPU
I don't know if MongoDB uses a lot of CPU, but redis most of the time does not except during backups. If you use backups with redis: try to have two CPU cores available for it (one for redis, one for backup task).
Network
It depends on your number of clients. But you should check the throughput / input load of your machine to see if you are not saturating (using monit for instance with alerts). Sometimes it is the bottleneck, not enought throughput in one machine!
Many of today's services, in particular Databases, are very aggressive consuming resources and are designed thinking they will (or should) be executed in a dedicated machine for them. MongoDB and Redis try to keep a lot of data in memory and will try to take the more memory they can for themselves. To avoid this services take all the memory of your host machine you can limit the maximum memory used by a container using -m="<number><optional unit>" in docker run. E.g.: docker run -d -m="2g" -p 27017:27017 --name mongodb dockerfile/mongodb
So you can control in an easy way the resource limits of your services, and run them in the same host with a fine grained control of the resources. Anyway it's important to consider that the performance of these services is designed thought that the resources of the host machine will be fully available for them. For example there are other databases as Cassandra that will consume a lot of memory, and furthermore, are designed to have sequential access writing to disk. In these cases Docker will let you to run limiting the resources used, but if you run multiple services in the same host the performance of them will decrease severely.

Put memcached on db or web server instance?

For my Drupal-based site, I have an architecture with 3 instances running nginx, postgresql, & solr, respectively. I'd like to install Memcached. Should I put it on the nginx or postgresql server? What are the performance implications?
Memcached is very light on CPU usage, so it is a great candidate to gobble up spare web server RAM. Also, you will scale out you web tier much more than your other tiers, and Memcached clustering can pool that RAM together into one logical cache.
If you have any spare RAM on the DB, it is almost always best for performance to let the DB gobble it up.
TL;DR Let DB have all of the RAM, colocate memcached on web tier.
Source: http://code.google.com/p/memcached/wiki/NewHardware
The best is to have a separate server (if you can do that).
Otherwise, it depends on your servers CPU & memory utilization and availability requirements. In general I would avoid running anything extra on a DB server machine...since DB is the foundation of the system and has to be available and performing well.
if your Solr server does not have high traffic an don't utilize much memory I'd put it in there. Memcached servers known to be light on CPU. Also you should estimate how much memory memcached instance will need...to make sure its enough on the server.