MongoDB on the cloud - mongodb

I'm preparing my production environment on the Hetzner cloud, but I have some doubts (I'm more of a developer than a DevOps person).
I will get 3 servers for the replica set, each with 8 cores, 32 GB of RAM and a 240 GB SSD. I'm a bit worried about the size of the SSD the server comes with, and Hetzner offers volumes that can be attached to the servers. Since MongoDB uses a single folder for the database data, I was wondering how I can use the 240 GB that comes with the server in combination with external volumes. At the beginning I can use the 240 GB, but then I will have to move the data folder to a volume when it reaches capacity. I'm fine with this, but it looks to me that once I move to volumes, this 240 GB will not be used anymore (yes, I can use it to store the MongoDB journal, as they suggest keeping it on a separate partition).
So, my noob question is: how can I use both the disk that comes with the server and the external volumes?
Thank you
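To make the "move the data folder when it reaches capacity" step concrete, here is a minimal Python sketch of how that moment could be detected. It only measures how full the filesystem holding a given path is. The function names, the 80% threshold and the use of "." as a stand-in for the real data folder are all assumptions for illustration, not MongoDB or Hetzner settings.

```python
import shutil


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Defensive guard for unusual filesystems that report no capacity.
        return 0.0
    return usage.used / usage.total


def should_migrate(path, threshold=0.80):
    """Return True when the filesystem holding `path` is at or above `threshold`.

    The 0.80 default is an arbitrary illustrative value.
    """
    return usage_fraction(path) >= threshold


if __name__ == "__main__":
    data_path = "."  # stand-in; point this at the real data folder
    print(f"{usage_fraction(data_path):.1%} used, migrate: {should_migrate(data_path)}")
```

Run periodically (for example from cron), a check like this would give some warning before the 240 GB disk fills up.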

Related

Migrate to Kubernetes

We're planning to migrate our software to run in Kubernetes with auto scaling. This is our current infrastructure:
PHP and Apache are running in Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySQL is running in Google Cloud SQL
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
I found many posts that recommend storing the data files in Google Cloud Storage and using the API to fetch files and upload them to the bucket. We have very limited time, so I decided to use NFS to share the data files across the pods. The problem is that NFS is slow: I get around 100mb/s when copying a file with pv, while the result from iperf is 1.96 Gbits/sec. Do you know how to achieve the same result without implementing Cloud Storage, or how to increase the NFS speed?
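To compare the two measurements, it helps to put them in the same unit. The sketch below converts the iperf figure to megabytes per second, assuming that the "100mb/s" reported by pv means megabytes (not megabits) per second and using decimal units. Both are assumptions about the measurements, and the function name is illustrative.

```python
def gbit_per_s_to_mb_per_s(gbit_per_s):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbit_per_s * 1000 / 8


link_mb_s = gbit_per_s_to_mb_per_s(1.96)  # iperf result from the question
nfs_mb_s = 100.0                          # pv result, assumed to be MB/s

print(f"link capacity: {link_mb_s:.0f} MB/s")
print(f"NFS copy uses {nfs_mb_s / link_mb_s:.0%} of the measured link")
```

Under those assumptions the link can carry about 245 MB/s, so the NFS copy reaches roughly 40% of what the network itself allows.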
"Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk"
There's nothing stopping you from volume mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, and thus you would only need to use a nodeSelector to ensure your Pods were scheduled onto machines that had said local storage available.
Where you're going to run into problems is if you are currently using just one php+apache instance, and thus just one SSD, but now want to scale the application up in a way that requires every php+apache instance to have access to the same SSD. That's a classic distributed application architecture problem, and something Kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try any one of the other distributed filesystems (Ceph, GlusterFS, etc) and see if they perform better for your situation. Then again, "We have very limited time" I guess pretty much means that's off the table.

MongoDB Response Slower on Amazon EC2 Server Than Localhost

I'm loading the same amount of data (~100 KB) on both my local server and a test Amazon EC2 server, but the response is 2x slower on EC2. Both are running Apache 2 and MongoDB on the same machine. On my local server, the response takes about 209ms versus approximately 455ms on EC2.
I've set up a simple query and AJAX call that grabs point data to display on the map based on the current viewport of the device.
How can I debug this issue? How can I make it as fast as my local server? I even tried experimenting with different types of instances to make sure the specs are the same, but no luck. I also realize it could be because of network latency.
Local computer specs:
Intel Core i5 @ 3.30GHz
8GB RAM
64-bit Windows 8
Amazon EC2 specs (m4.large):
2.4 GHz Intel Xeon Haswell (2 vCPUs)
8GB RAM
Amazon Linux
A remote query to EC2 is unlikely to return the result of the AJAX query as fast as your local server because it has network latency, while your local server does not. Measure the time in your AJAX handler from the start of the query to the point where it is ready to return data to get a meaningful baseline for comparison.
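As a rough illustration of measuring inside the handler, the Python sketch below wraps a query function with a monotonic timer. `fetch_points` is a hypothetical stand-in for the real MongoDB query (it just sleeps), and `timed` is an illustrative helper rather than part of any library. The same idea applies in whatever language the handler is actually written in.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_ms), measured on the server side."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fetch_points(viewport):
    """Hypothetical stand-in for the real viewport query against MongoDB."""
    time.sleep(0.01)  # simulate query work
    return [{"lat": 0.0, "lng": 0.0}]


points, ms = timed(fetch_points, {"sw": (0, 0), "ne": (1, 1)})
print(f"query took {ms:.1f} ms for {len(points)} point(s), network excluded")
```

Logging this number on both machines gives a comparison that does not include the round trip between the browser and EC2.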
MongoDB is very sensitive to data being in RAM vs on disk. Depending on how you configured your EC2 instance, and on your local hardware, chances are pretty good that your local hardware is faster. EC2 instances can be configured to use SSD storage, and you can configure a guaranteed IOPS figure.
Is the 100KB the size of the result set, or the amount of data needed to form the result set? If you process 4GB of data down to get a 100KB result set, there's a good chance that disk IO is involved. If the amount of data you need to pull is small, repeat the test a few times to ensure data is entirely in RAM.
Finally, if both local and EC2 are pulling data from RAM, there's a good chance that your local CPU core is just faster than the EC2 CPU core, and that your RAM access is faster as well. EC2 is designed to provide low-cost commodity hardware. Developer setups are often much faster.
If you cannot account for the speed differences given the factors above, update your question with time measurements that exclude network latency and provide more detailed specifications about your hardware. Also indicate whether the data you are retrieving from MongoDB should fit entirely in RAM, given its size and the amount of RAM on your instance.

Do you need to run RAID 10 on Mongo when using Provisioned IOPS on Amazon EBS?

I'm trying to set up a production Mongo system on Amazon to use as a datastore for a realtime metrics system.
I initially used the MongoDB AMIs[1] in the Marketplace, but I'm confused because there is only one data EBS volume. I've read that Mongo recommends RAID 10 on EBS storage (8 EBS volumes on each server). Additionally, I've read that the bare minimum for production is a primary/secondary with an arbiter. Is RAID 10 still the recommended setup, or is one provisioned IOPS EBS volume sufficient?
Please advise. We are a small shop, so what is the bare minimum we can get away with and still be reasonably safe?
[1] MongoDB 2.4 with 1000 IOPS - data: 200 GB @ 1000 IOPS, journal: 25 GB @ 250 IOPS, log: 10 GB @ 100 IOPS
So, I just got off of a call with an Amazon System Engineer, and he had some interesting insights related to this question.
First off, if you are going to use RAID, he said to simply do striping, as the EBS blocks are mirrored behind the scenes anyway, so RAID 10 seemed like overkill to him.
Standard EBS volumes tend to handle spiky traffic well (a volume may be able to handle 1K-2K IOPS for a few seconds), but eventually throughput tails off to an average of about 100 IOPS. One suggestion was to use many small EBS volumes and stripe them to get better IOPS throughput.
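The arithmetic behind the striping suggestion can be sketched as follows. This is an idealized model that assumes IOPS scale linearly with the number of volumes and that each RAID 10 write lands on both halves of a mirror. Real EBS behavior will vary, and the function names are illustrative. The inputs are the 100 IOPS average from above and the 8 volumes mentioned in the question.

```python
def raid0_iops(volumes, iops_per_volume):
    """Idealized RAID 0 (striping): sustained IOPS add up across volumes."""
    return volumes * iops_per_volume


def raid10_write_iops(volumes, iops_per_volume):
    """Idealized RAID 10: every write hits both mirrors, halving write IOPS."""
    if volumes % 2 != 0:
        raise ValueError("RAID 10 needs an even number of volumes")
    return (volumes // 2) * iops_per_volume


print(raid0_iops(8, 100))         # 800
print(raid10_write_iops(8, 100))  # 400
```

In this model, striping alone gets twice the write throughput from the same 8 volumes, which is why mirroring on top of already-mirrored EBS looked like overkill to him.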
Some of his customers use just the ephemeral storage on the EC2 instances, but then have multiple (3-5) nodes in the availability set. The ephemeral storage is the storage on the physical machine. Apparently, if you use an EC2 instance with SSD storage, you can get up to 20K IOPS.
Some customers will run a huge EC2 instance with SSD for the primary, then a smaller EC2 instance with EBS for the secondary. The primary machine is performant, and failover is available but with degraded performance.
Make sure you check 'EBS Optimized' when you spin up an instance. That means you have a dedicated channel to the EBS storage (of any kind) instead of sharing the NIC.
Important! Provisioned IOPS EBS is expensive, and the bill does not shut off when you shut down the EC2 instances the volumes are attached to (this sucks while you are testing). His advice was to take a snapshot of the EBS volumes, then delete them. When you need them again, just create new provisioned IOPS EBS volumes, restore the snapshot, then reconfigure your EC2 instances to attach the new storage. (It's more work than it should be, but it's worth it not to get sucker punched with the IOPS bill.)
I've got the same question. Both Amazon and MongoDB market provisioned IOPS heavily, stressing its advantages over a standard EBS volume. We run prod instances on m2.4xlarge AWS instances with a 1 primary and 2 secondaries setup per service. In the most heavily utilized service cluster, apart from a few slow queries, the monitoring charts do not reveal any drop in performance at all. Page faults are rare, and even then only between 0.0001 and 0.0004 faults once or twice a day. Background flushes take milliseconds, and locks and queues are so far at manageable levels. I/O wait on the primary node ranges between 0 and 2% at any time, mostly less than 1%, and %idle steadily stays above the 90% mark. Do I still need to consider provisioned IOPS, given that we still have budget to address any potential performance drag? Any guidance will be appreciated.

MongoDB 32 Bit Master Node

I know that if you run MongoDB on a 32 bit operating system it can only store up to 2 GB of data. Does this restriction apply to a master node in a MongoDB cluster?
Yes, I believe so. In a master-slave setup, the master receives all the writes and the slaves simply replicate from the master. So the master is the one to which the following restriction applies. From the FAQ:
MongoDB uses memory-mapped files. When running a 32-bit build of
MongoDB, the total storage size for the server, including data and
indexes, is 2 gigabytes. For this reason, do not deploy MongoDB to
production on 32-bit machines.

MongoDB in the cloud hosting, benefits

I'm still fighting with MongoDB, and I think this war will not end soon.
My database has a size of 15.95 Gb;
Objects - 9963099;
Data Size - 4.65g;
Storage Size - 7.21g;
Extents - 269;
Indexes - 19;
Index Size - 1.68g;
Powered by:
Quad Xeon E3-1220 4 × 3.10 GHz / 8Gb
Paying for a dedicated server is expensive for me.
On a VPS with 6 GB of memory, the database cannot be imported.
Should I migrate to a cloud service?
https://www.dotcloud.com/pricing.html
I tried to pick a plan there, but the maximum for MongoDB is 4 GB of memory (USD 552.96/month o_0), and I can't even import my database with that because there is not enough memory.
Or is there something I don't know about cloud services (I have no experience with them)?
Are cloud services not suitable for a large MongoDB database?
2 x Xeon 3.60 GHz, 2M Cache, 800 MHz FSB / 12Gb
http://support.dell.com/support/edocs/systems/pe1850/en/UG/p1295aa.htm
Will my database work on that server?
This is of course all fun and good development experience, but it is already beginning to pall ... =]
You shouldn't have an issue with a db of this size. We were running a MongoDB instance on Dotcloud with hundreds of GB of data. The problem may just be that Dotcloud only allows 10GB of HDD space per service by default.
We were able to back up and restore that instance with 4GB of RAM, although it took several hours.
I would suggest you email them directly at support@dotcloud.com to get help increasing the HDD allocation of your instance.
You can also consider using ObjectRocket, which is a MongoDB-as-a-service provider. For a 20GB database the price is $149 per month - http://www.objectrocket.com/pricing