MongoDB on Amazon SSD-backed EC2 - mongodb

We have a MongoDB sharded cluster deployed on EC2 instances in Amazon. Each shard is also a replica set. The instances use EBS volumes with provisioned IOPS.
We have about 30 million documents in a collection. Our queries count all documents in the collection that match the filters. We have indexes on almost all of the queryable fields. As a result, RAM usage reaches 100%, and our working set exceeds the size of RAM. We think the slow response of our queries is caused by EBS being slow, so we are thinking of migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that the SSD storage is ephemeral, meaning the data will be gone once the instance stops, terminates, or fails. How can we address this? How do we automate backups? Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set up a sharded cluster?

Working with ephemeral disks is a risk, but if you have your replication set up correctly it shouldn't be a huge concern. I'm assuming you've set up a three-node replica set, correct? And that you have three nodes for your config servers?
I can speak from experience, as the company I'm at is set up this way. To help mitigate the risk I'm moving towards a backup strategy that involves a hidden replica. With this setup I can shut down the hidden replica and one of the config servers (having first stopped the balancer), take a complete copy of the data files (replica and config server), and have a valid backup. If AWS went down in my availability zone I'd still have a daily backup available on S3 to restore from.
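For illustration, here is a minimal Python sketch of what a replica set configuration with one hidden backup member might look like. The function name, set name, and hostnames are hypothetical; the member fields (_id, host, priority, hidden) are the standard replica set configuration fields, and a hidden member has to have priority 0.

```python
def replica_set_config(set_name, data_hosts, backup_host):
    """Build a replica set config document with one hidden backup member."""
    # Regular data-bearing members that are eligible to become primary.
    members = [
        {"_id": i, "host": host, "priority": 1}
        for i, host in enumerate(data_hosts)
    ]
    # A hidden member must have priority 0 so it can never become primary;
    # hidden=True keeps it invisible to client applications.
    members.append({
        "_id": len(data_hosts),
        "host": backup_host,
        "priority": 0,
        "hidden": True,
    })
    return {"_id": set_name, "members": members}


# Hypothetical hostnames, for illustration only.
config = replica_set_config(
    "shard0",
    ["node-a.example.internal:27017", "node-b.example.internal:27017"],
    "backup.example.internal:27017",
)
```

A document of this shape is what you would pass to rs.initiate() or rs.reconfig(). Keep in mind that a hidden member still votes by default, so it counts towards the number of voting members in the set.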
Hope this helps.

Related

Autoscaling limited by RDS connection

I have some nightly jobs running on EC2, and the number of machines is scaled by the number of messages in SQS. My process requires reads from a Postgres RDS database. These are the issues I am facing:
I am not able to scale beyond a certain number of machines because the database runs out of available connections.
I tried creating a connection pool using pgbouncer, with several different settings, but a lot of data is missing from the result set.
Make your PostgreSQL RDS instance Multi-AZ. Then you can create read replicas on demand and scale read performance with your load.
To answer the comments:
Some extra "plumbing" is required to route connections to the read replicas: perhaps Route 53 records that are updated dynamically as scaling happens, or something like HAProxy.
The reason I mention Multi-AZ is that it would help prevent downtime when an auto scaling event brings up a read replica.
It would be simpler (but more costly) to keep a read replica up permanently and use DNS round robin to share the load.
See https://aws.amazon.com/blogs/aws/amazon-rds-announcing-read-replicas/ for information on read replicas
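If you would rather do the routing in the application than in DNS or HAProxy, the idea can be sketched in a few lines of Python. This is only an illustration of round-robin read routing: the class name and the endpoint strings are hypothetical, and it does not open any database connections itself.

```python
import itertools


class ReadReplicaRouter:
    """Send read-only work to replicas in round-robin order, writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() repeats the replica list forever; None means "no replicas yet".
        self._cycle = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, read_only):
        # Fall back to the primary when there is no replica or the work writes.
        if read_only and self._cycle is not None:
            return next(self._cycle)
        return self.primary


# Hypothetical endpoints, for illustration only.
router = ReadReplicaRouter(
    "primary.example.rds.amazonaws.com",
    ["replica-1.example.rds.amazonaws.com", "replica-2.example.rds.amazonaws.com"],
)
```

Each worker would call router.endpoint_for(read_only=True) before opening its connection. Replicas lag the primary slightly, so this only suits jobs that can tolerate slightly stale reads.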

Taking EBS snapshot for multiple mongo node EBS volumes in mongoDB cluster

I have the journal and data on the same volume for each MongoDB shard, so I do not need to lock with fsyncLock before taking a snapshot to get a consistent result. An EBS snapshot is a consistent point-in-time copy for a single shard.
I would like to know the preferred way of taking backups in a MongoDB cluster. I have explored two options:
Approximate point-in-time consistent backup by taking the EBS snapshots around the same time. The advantage is that no write lock needs to be taken.
Stop writes on the system, then take snapshots. This would give a point-in-time consistent backup.
Now, I'd like to know how this is actually done in production. I've read about using a replica set's secondary node, but it's not clear to me how that gives a point-in-time consistent backup. Unless all the secondary nodes hold data consistent to the same point in time, the EBS snapshots cannot be point-in-time. For example, what if the secondary for NodeA is in sync with its primary, but the secondary for NodeB is missing some data? Am I missing something here?
Also, can approach 1 ever lead to an inconsistent MongoDB cluster when restored, such that it crashes or otherwise misbehaves?
Consistent backups
The first steps in any sharded cluster backup procedure should be to:
Stop the balancer (including waiting for any migrations in progress to complete). Usually this is done with the sh.stopBalancer() shell helper.
Back up a config server (usually with the same method as your shard servers, so EBS or filesystem snapshot).
I would define a consistent backup of a sharded cluster as one where the sharded cluster metadata (i.e. the data stored on your config servers) corresponds with the backups for the individual shards, and each of the individual shards has been correctly backed up. Stopping the balancer ensures that no data migrations happen while your backup is underway.
Assuming your MongoDB data and journal files are on a single volume, you can take a consistent EBS snapshot or filesystem snapshot without stopping writes to the node you are backing up. Snapshots occur asynchronously. Once an initial snapshot has been created, successive snapshots are incremental (only needing to update blocks that have changed since the previous snapshot).
Point-in-time backup
With an active sharded cluster, you can only easily capture a true point-in-time backup of data that has been written by stopping all writes to the cluster and backing up the primaries for each shard. Otherwise, as you have surmised, there may be differing replication lag between shards if you back up from secondaries. It's more common to back up from secondaries as there is some I/O overhead while the snapshots are written.
If you aren't using replication for your shards (or prefer to back up from primaries) the replication lag caveat doesn't apply, but the timing will still be approximate for an active system as the snapshots need to be started simultaneously across all shards.
Point-in-time restore
Assuming all of your shards are backed by replica sets it is possible to use an approximate point-in-time consistent backup to orchestrate a restore to a more specific point-in-time using the replica set oplog for each of the shards (plus a config server). This is essentially the approach taken by backup solutions such as MongoDB Cloud Manager (née MMS): see MongoDB Backup for Sharded Cluster. MongoDB Cloud Manager leverages backup agents on each shard for continuous backup using the replication oplog, and periodically creates full snapshots on a schedule. Point-in-time restores can be built by starting from a full data snapshot and then replaying the relevant oplogs up to a requested point-in-time.
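To make the "snapshot plus oplog replay" idea concrete, here is a toy Python model. It is a sketch only: the entry layout (ts, op, _id, doc) is a simplified stand-in for real oplog documents, the function name is made up, and in this model the target timestamp is treated as inclusive.

```python
def restore_to_point_in_time(snapshot, oplog, target_ts):
    """Rebuild one shard's documents as of target_ts from a snapshot plus oplog."""
    # Start from a copy of the full snapshot taken at snapshot["ts"].
    state = {doc_id: dict(doc) for doc_id, doc in snapshot["docs"].items()}
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] <= snapshot["ts"]:
            continue  # already reflected in the snapshot
        if entry["ts"] > target_ts:
            break  # past the requested point in time
        if entry["op"] == "d":
            state.pop(entry["_id"], None)  # delete
        else:
            state[entry["_id"]] = dict(entry["doc"])  # insert or update
    return state


# Toy data, for illustration only.
snapshot = {"ts": 100, "docs": {1: {"n": 1}}}
oplog = [
    {"ts": 90, "op": "u", "_id": 1, "doc": {"n": 0}},
    {"ts": 110, "op": "u", "_id": 2, "doc": {"n": 2}},
    {"ts": 120, "op": "d", "_id": 1},
    {"ts": 130, "op": "u", "_id": 3, "doc": {"n": 3}},
]
restored = restore_to_point_in_time(snapshot, oplog, target_ts=120)
```

Running the same replay with the same target timestamp on every shard is what turns several approximately timed snapshots into one consistent restore point, provided each oplog still covers the gap between its snapshot and the target.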
What's the common production approach?
Downtime is generally not a desirable backup strategy for a production system, so the common approach is to take a consistent backup of a running sharded cluster at an approximate point-in-time using snapshots. Coordinating backup across a sharded cluster can be challenging, so backup tools/services are also worth considering. Backup services can also be more suitable if your deployment doesn't allow snapshots (for example, if your data and/or journal directories are spread across multiple volumes to maximise available IOPS).
Note: you should really, really consider using replication for your production deployment unless this is a non-essential cluster or downtime is acceptable. Replica sets help maximise uptime & availability for your deployment and some maintenance tasks (including backup) will be much more impactful without data redundancy.
Your backup will be divided into multiple phases:
Stop the balancer on the mongos with sh.stopBalancer()
You can now back up the config database of the config servers. It does not matter whether you do it using EBS snapshots or mongodump --oplog.
Now back up the shards. You can choose one of two ways:
Either: You back up every node with mongodump --oplog. You do not need to stop writes since you're capturing the oplog together with the database export. This backup allows a consistent restore. When restoring, you can use the --oplogReplay and --oplogLimit options to specify a timestamp (assuming your oplog is sized appropriately and did not roll over during the backup). You can dump all shards in parallel, and the restore is synchronized by the oplog.
Or: You fsync and lock, then create an EBS snapshot (described at http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/) for every shard. With WiredTiger, MongoDB 3.0 cannot guarantee that the data files do not change after the lock is taken. The cost here is that you're required to stop all reads and writes, since you have to unmount the device.
Now start the balancer on the mongos with sh.startBalancer()
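As a rough sketch of the mongodump/mongorestore variant of these phases, the Python helpers below only build the command lines and do not run anything. The helper names, hosts, and paths are hypothetical; the flags (--host, --oplog, --out, --oplogReplay, --oplogLimit) are the standard mongodump and mongorestore options. Note that --oplog needs an oplog to read, so it assumes the node is running as a replica set member.

```python
def dump_command(host, out_dir):
    """Command line for dumping one node together with its oplog."""
    return ["mongodump", "--host", host, "--oplog", "--out", out_dir]


def restore_command(host, dump_dir, oplog_limit_seconds=None):
    """Command line for restoring a dump and replaying the captured oplog."""
    cmd = ["mongorestore", "--host", host, "--oplogReplay"]
    if oplog_limit_seconds is not None:
        # Stop replaying at this timestamp (seconds since the Unix epoch).
        cmd += ["--oplogLimit", str(oplog_limit_seconds)]
    cmd.append(dump_dir)
    return cmd


# Hypothetical shard hosts, for illustration only.
shards = {
    "shard0": "shard0.example.internal:27018",
    "shard1": "shard1.example.internal:27018",
}
dump_commands = [dump_command(host, "/backup/" + name) for name, host in shards.items()]
```

Each list can be handed to subprocess.run(), one process per shard, after the balancer has been stopped. Passing the same --oplogLimit value to every restore is what lines the shards up at one point in time.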
Since you do not use replica sets, you do not have to worry about lagging secondaries or writes that have not yet replicated throughout the cluster. My favorite option is mongodump/mongorestore, which gives a lot of control over the restore.
Update:
In the end, you have to decide what you want to pay to get certain benefits:
Snapshots: You pay with space, a write lock, and a certain level of consistency, and get fast backups, fast restore times, and no performance impact after the backup.
Dumping: You pay with time and with the working set being evicted during the backup, and get smaller backups, consistent but slower restores, and no write locks.

setting development project with mongo database on EC2 cluster

I would like to create a development project on an EC2 cluster. The current design suggests accessing MongoDB database files stored on an EBS volume. Is it possible to run distributed computing and access the same files in /data/db/ simultaneously from different nodes?
No, that will not work. You cannot access the same mongodb database files from different processes on different nodes.
The way you use mongoDB in a distributed environment is with replica sets and sharding. In both cases you have mongodb instances running on each node. Replica sets duplicate the same data across all the nodes in the set, for data redundancy and fault tolerance. Sharding allows you to distribute different sets of data on different nodes to provide horizontal scaling. Large production environments use both replica sets and sharding.
Best place to read up on all of this is:
http://docs.mongodb.org/manual/administration/replica-sets/
http://docs.mongodb.org/manual/sharding/
http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/

Deploying large data on mongodb replicaset

Can I deploy a large database by copying its files (e.g. a testing database with the files testing.0, testing.1, testing.ns found in the MongoDB dbpath) from another server to the target servers (a replica set), to avoid using network bandwidth for replication (as would happen if it were deployed only to the primary)? Basically I want to avoid the slow process of replication.
If journaling is enabled, what is the effect on the process?
Yes, you can. This is a perfectly valid way to avoid tedious and time-consuming replication between members on a distant or high-latency network.
If journaling is enabled nothing really changes; copying via the file system bypasses MongoDB.

mongodb automatic failover / high availability on aws

I need a proper failover mechanism for MongoDB on AWS EC2. I know failover can be accomplished with replica sets, but what is the best way to launch a new node from an Ubuntu EC2 AMI with MongoDB installed, add it to the replica set automatically (with zero manual operation), and return the replica set to its proper state?
EBS has some problems, but if I use local instance storage I will lose the dead node's data. Does the replica have all of the master's data, so that a replica is enough to recover everything (on Mongo 1.8 with journaling), or do I have to use only EBS?
How should I start the mongo instances? If I should start with the repair option, how can I separate a node's first run from a failover restart?
The easiest way to bring up new nodes is to bring up a new node with a recent backup.
So now it's a question of how you do your backup and how you restore from the backup quickly.
The MongoDB site has a write up for backups (in general) and backups on EC2 specifically. There's also a write-up for adding a new set member.
You can do this with instance storage or EBS drives, but you'll need different strategies for each. There's really no single way to do this, so I would check out the docs I've linked to for a primer.
Highly recommend reading Sean Coates' article on multi-node MongoDB elections, failover and AWS, specifically the subtlety around distributed arbiter nodes (e.g., make sure to give yourself a voting majority when an AZ goes down). A similar recommendation can be found in a comment on this (now-closed) MongoDB vs. Cassandra thread.
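The voting majority point is easy to check with a little arithmetic. A minimal Python sketch, assuming you know how many voting members (data nodes plus arbiters) you have placed in each availability zone; the function name and zone names are made up for illustration:

```python
def survives_single_az_loss(votes_per_az):
    """True if a voting majority remains after losing any one availability zone."""
    total = sum(votes_per_az.values())
    # A primary can only be elected by a strict majority of all voting members.
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in votes_per_az.values())


# Two data nodes in one zone and an arbiter in another: losing the first
# zone leaves 1 of 3 votes, which is not a majority.
two_zones = survives_single_az_loss({"us-east-1a": 2, "us-east-1b": 1})

# One voting member in each of three zones: any single loss leaves 2 of 3.
three_zones = survives_single_az_loss({"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1})
```

Here two_zones is False and three_zones is True. With only two zones, whichever zone holds more than half of the votes is a single point of failure for elections.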