CockroachDB snapshot backups in Kubernetes

I am trying to take snapshot backups of a 12-node test CockroachDB cluster in Kubernetes with Velero, so that if the cluster failed we could rebuild it and restore CockroachDB from these snapshots.
The snapshot and restore seem to work, but on recovery we run into issues with CockroachDB losing ranges.
Has anyone gotten snapshot backups to work with CockroachDB on a large database? (Given the size of the dataset, doing dumps or restores from dumps is not viable.)

Performing backups of the underlying disks while CockroachDB nodes are running is unlikely to work as expected.
The main reason is that even if a persistent disk snapshot is atomic, there is no way to ensure that all disks are captured at the exact same time (time being defined by CockroachDB's consistency mechanism). The restore would contain data with replicas across nodes at different commit indices, resulting in data loss or loss of quorum (shown in the Admin UI as "unavailable" ranges).
You have a few options (in order of convenience):
CockroachDB BACKUP, which has all nodes write data to external storage (S3, GCS, etc.). Before version 20.2, this is only available with an enterprise license. (See the example after this list.)
SQL dump, which is impractical for large datasets.
Stop all nodes, snapshot all disks, start all nodes again. Warning: this is something we have used to quickly load test datasets, but we have not used it in production environments.
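For the BACKUP option, a minimal sketch of what this could look like (the bucket name, credential placeholders, and the Kubernetes service host are hypothetical, and the exact BACKUP/RESTORE syntax varies a little between CockroachDB versions):

    # Run from any node; CockroachDB distributes the backup work across the cluster.
    cockroach sql --host=cockroachdb-public --insecure --execute="
      BACKUP DATABASE mydb
      TO 's3://my-backup-bucket/crdb-backups?AWS_ACCESS_KEY_ID=<key>&AWS_SECRET_ACCESS_KEY=<secret>'
      AS OF SYSTEM TIME '-10s';"

    # After rebuilding the cluster, restore from the same location:
    cockroach sql --host=cockroachdb-public --insecure --execute="
      RESTORE DATABASE mydb
      FROM 's3://my-backup-bucket/crdb-backups?AWS_ACCESS_KEY_ID=<key>&AWS_SECRET_ACCESS_KEY=<secret>';"

Unlike disk snapshots, this is consistent by construction, because the backup is taken as of a single cluster timestamp.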

Related

Should I use EBS or EFS for database?

For database directories for MongoDB, Cassandra or Elasticsearch clusters with high availability, should I use EBS or EFS? MongoDB, Cassandra and Elasticsearch clusters take care of replicating data across nodes if they are configured with a replication factor > 1, so the EFS replication feature may not be needed, I guess.
EBS - for databases
EFS - for file sharing across applications, VMs, etc.
Here is a good article that differentiates between the storage types
https://dzone.com/articles/confused-by-aws-storage-options-s3-ebs-amp-efs-explained
EFS is for multiple servers having access to the same set of files. Cassandra has replication built in, so it has no use for that feature. You would not want multiple Cassandra nodes accessing the same files anyway as each node manages its own sstables.
Not to mention Cassandra is disk intensive and gets angry if there is latency. Cassandra connections time out really easily. So, using an NFS mount (EFS) instead of a “local” disk is just a bad idea.
Read this if you haven’t already: https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/
(Can’t speak for other databases like MongoDB.)

AWS Redshift Snapshot Restore - Benchmark

Is any benchmark available on restoring an X TB snapshot from an N-node Redshift cluster to another N-node Redshift cluster?
For example, does a 10 TB snapshot from a 12-node Redshift cluster take about 1 day to restore into another 12-node cluster? Is it more like 1 day or 1 week?
We are trying to get a benchmark.
Thank you
When you restore from a snapshot, Amazon Redshift creates a new cluster and makes the new cluster available before all of the data is loaded, so you can begin querying the new cluster immediately. The cluster streams data on demand from the snapshot in response to active queries, then loads the remaining data in the background.
"Amazon Redshift Snapshots"
The new cluster (restore target) becomes available very quickly. You can start running all queries and Redshift will retrieve any missing data from the snapshot.
Total time taken to finish restoring all data from the snapshot depends on multiple factors:
Is the cluster being heavily used during the restore? A cluster with no usage will complete the restore somewhat more quickly.
What is the cluster size and node type? Larger nodes and larger clusters will have more network resources.
Does the cluster use Enhanced VPC routing? This can improve network bandwidth further. https://docs.aws.amazon.com/redshift/latest/mgmt/enhanced-vpc-enabling-cluster.html
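If you want to measure it yourself, a rough sketch with the AWS CLI (the identifiers are made up); the restored cluster becomes queryable almost immediately, and you can poll its RestoreStatus to watch the background transfer finish:

    # Kick off the restore into a new cluster (same node type and count as the snapshot by default).
    aws redshift restore-from-cluster-snapshot \
        --cluster-identifier restored-cluster \
        --snapshot-identifier my-12-node-snapshot

    # Poll the background data transfer; queries can run while this is in progress.
    aws redshift describe-clusters \
        --cluster-identifier restored-cluster \
        --query 'Clusters[0].RestoreStatus'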

Taking EBS snapshot for multiple mongo node EBS volumes in mongoDB cluster

I have the journal and the data on the same volume for a MongoDB shard, so locking with fsyncLock before taking a snapshot isn't needed for consistency. An EBS snapshot would be a consistent point in time for a single shard.
I would like to know what is the preferred way of taking backups in mongodb cluster. I have explored two options:
Approximate point in time consistent backup by taking the EBS snapshots around the same time. Advantage being, no write lock needs to be taken.
Stop writes on the system, then take snapshots. This would give point in time consistent backup.
Now, I'd like to know how it is actually done in production. I've read about a replica set's secondary node being used, but it's not clear how that gives a point-in-time consistent backup. Unless all the secondary nodes have data consistent to the same point in time, the EBS snapshots cannot be point in time. For example, what if the secondary for NodeA is synced with its primary, but some data on the secondary for NodeB is not? Am I missing something here?
Also, can it ever happen that approach 1 leads to an inconsistent MongoDB cluster (when restored), such that it crashes or misbehaves?
Consistent backups
The first steps in any sharded cluster backup procedure should be to:
Stop the balancer (including waiting for any migrations in progress to complete). Usually this is done with the sh.stopBalancer() shell helper.
Back up a config server (usually with the same method as your shard servers, so EBS or filesystem snapshot)
I would define a consistent backup of a sharded cluster as one where the sharded cluster metadata (i.e. the data stored on your config servers) corresponds with the backups for the individual shards, and each of the individual shards has been correctly backed up. Stopping the balancer ensures that no data migrations happen while your backup is underway.
Assuming your MongoDB data and journal files are on a single volume, you can take a consistent EBS snapshot or filesystem snapshot without stopping writes to the node you are backing up. Snapshots occur asynchronously. Once an initial snapshot has been created, successive snapshots are incremental (only needing to update blocks that have changed since the previous snapshot).
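As a concrete (hypothetical) example, assuming the data and journal share one EBS volume, a snapshot of a running shard member can be kicked off with the AWS CLI:

    # Snapshot the single volume backing this shard member; it is crash-consistent
    # because data and journal live on the same volume. The volume ID is made up.
    aws ec2 create-snapshot \
        --volume-id vol-0123456789abcdef0 \
        --description "shard0 secondary backup $(date -u +%Y-%m-%dT%H:%MZ)"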
Point-in-time backup
With an active sharded cluster, you can only easily capture a true point-in-time backup of data that has been written by stopping all writes to the cluster and backing up the primaries for each shard. Otherwise, as you have surmised, there may be differing replication lag between shards if you back up from secondaries. It's more common to back up from secondaries as there is some I/O overhead while the snapshots are written.
If you aren't using replication for your shards (or prefer to back up from primaries) the replication lag caveat doesn't apply, but the timing will still be approximate for an active system, as the snapshots need to be started simultaneously across all shards.
Point-in-time restore
Assuming all of your shards are backed by replica sets it is possible to use an approximate point-in-time consistent backup to orchestrate a restore to a more specific point-in-time using the replica set oplog for each of the shards (plus a config server). This is essentially the approach taken by backup solutions such as MongoDB Cloud Manager (née MMS): see MongoDB Backup for Sharded Cluster. MongoDB Cloud Manager leverages backup agents on each shard for continuous backup using the replication oplog, and periodically creates full snapshots on a schedule. Point-in-time restores can be built by starting from a full data snapshot and then replaying the relevant oplogs up to a requested point-in-time.
What's the common production approach?
Downtime is generally not a desirable backup strategy for a production system, so the common approach is to take a consistent backup of a running sharded cluster at an approximate point-in-time using snapshots. Coordinating backup across a sharded cluster can be challenging, so backup tools/services are also worth considering. Backup services can also be more suitable if your deployment doesn't allow snapshots (for example, if your data and/or journal directories are spread across multiple volumes to maximise available IOPS).
Note: you should really, really consider using replication for your production deployment unless this is a non-essential cluster or downtime is acceptable. Replica sets help maximise uptime & availability for your deployment and some maintenance tasks (including backup) will be much more impactful without data redundancy.
Your backup will be divided into multiple phases:
Stop the balancer on the mongos with sh.stopBalancer()
You can now back up the config database of the config servers. It does not matter whether you do it using EBS snapshots or mongodump --oplog.
Now the shards, where you can decide between two approaches:
Either: you back up every node with mongodump --oplog. You do not need to stop writes, since you're snapshotting the oplog together with the database export. This backup allows a consistent restore. When restoring, you can use the --oplogReplay and --oplogLimit options to specify a timestamp (assuming your oplog is sized appropriately and did not roll over during the backup). You can perform a dump on all shards in parallel, and the restore is synchronized by the oplog.
Or: you fsync and lock and create an EBS snapshot for every shard (described at http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/). With WiredTiger, MongoDB 3.0 cannot guarantee that the data files do not change. The cost here is that you're required to stop all reads and writes, since you have to unmount the device.
Now start the balancer on the mongos with sh.startBalancer()
Since you do not use replica sets, you have no hassle with lagging secondaries or writes that have not replicated throughout the cluster. My favorite option is using mongodump/mongorestore, which gives a lot of control over the restore.
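For concreteness, a minimal sketch of those phases using the mongo shell and mongodump (hostnames, ports, and paths are made up; adjust for your deployment):

    # 1. Stop the balancer via a mongos.
    mongo --host mongos.example.com:27017 --eval "sh.stopBalancer()"

    # 2. Dump the config database from a config server.
    mongodump --host cfg1.example.com:27019 --db config --out /backup/config

    # 3. Dump every shard (these can run in parallel); --oplog captures writes made during the dump.
    mongodump --host shard0.example.com:27018 --oplog --out /backup/shard0
    mongodump --host shard1.example.com:27018 --oplog --out /backup/shard1

    # 4. Restart the balancer.
    mongo --host mongos.example.com:27017 --eval "sh.startBalancer()"

    # On restore, --oplogReplay (optionally with --oplogLimit <timestamp>) brings each shard
    # to a consistent point in time:
    mongorestore --host shard0.example.com:27018 --oplogReplay /backup/shard0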
Update:
In the end, you have to decide what you want to pay to get certain benefits:
Snapshots: pay with space, a write lock and a certain level of consistency to get fast backups, fast restore times, and no performance impact after the backup.
Dumping: pay with time and with evicting the working set during the backup to get smaller backups, consistent but slower restores, and no write locks.

MongoDB on Amazon SSD-backed EC2

We have a MongoDB sharded cluster currently deployed on EC2 instances in Amazon. These shards are also replica sets. The instances are using EBS volumes with provisioned IOPS.
We have about 30 million documents in a collection. Our queries count the whole collection that matches the filters. We have indexes on almost all of the queryable fields. This results in RAM reaching 100% usage. Our working set exceeds the size of the RAM. We think the slow response of our queries is caused by EBS being slow, so we are thinking of migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that the SSD storage is ephemeral, meaning the data will be gone once the instance stops, terminates, or fails. How can we address this? How do we automate backups? Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set up a sharded cluster?
Working with the ephemeral disks is a risk, but if you have your replication set up correctly it shouldn't be a huge concern. I'm assuming you've set up a three-node replica set, correct? And that you have three nodes for your config servers?
I can speak of this from experience, as the company I'm at has been set up this way. To help mitigate risk I'm moving towards a backup strategy that involves a hidden replica. With this setup I can shut down the hidden replica member and one of the config servers (having first stopped balancing), take a complete copy of the data files (replica and config server), and have a valid backup. If AWS went down in my availability zone I'd still have a daily backup available on S3 to restore from.
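If it helps, making a member hidden for this purpose is a small reconfiguration from the mongo shell (the member index is hypothetical; pick the member you want to use for backups):

    # Run against the replica set primary.
    mongo --host primary.example.com:27017 --eval '
      cfg = rs.conf();
      cfg.members[2].priority = 0;   // hidden members must have priority 0
      cfg.members[2].hidden = true;  // never elected primary, invisible to clients
      rs.reconfig(cfg);'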
Hope this helps.

EBS snapshots vs. WAL-E for PostgreSQL on EC2

I'm getting ready to move our PostgreSQL databases to EC2, but I'm a little unclear on the best backup and recovery strategy. The original plan was to build an EBS-backed server and set up WAL-E to handle WAL archiving and base backups to S3. I would take snapshots of the final production server volume to be used if the instance crashed. I also see that many people perform frequent snapshots of the EBS volume for recovery purposes.
What is the recommended strategy? Is there a reason to archive with WAL and perform scheduled EBS snapshots?
The EBS snapshots will give you a slightly different type of backup than the WAL-E backups. EBS backs up the entire drive, which means if your EC2 instance goes down you can just restart it from your last EBS snapshot and things will pick up right where you last snapshotted.
The frequency of your EBS snapshots would define how good your database backups are.
The appealing thing about WAL-E is the "continuous archiving". If I needed every DB transaction backed up, then WAL-E seems the right choice. Many apps I can envision cannot afford to lose transactions, so that seems a very prudent choice.
I think your plan to snapshot the production volumes as a baseline, then use WAL-E to continuously archive the database seems very reasonable. Personally I would likely add a periodic snapshot (once a day?) to that plan just to take a hard baseline and make your recovery process a bit easier.
The usual caveat of "Test your recovery plans!" applies here. You're mixing a number of technologies (EC2, EBS, Postgres, snapshots, S3, WAL-E), so making sure you can actually recover - rather than just back up - is of critical importance.
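For reference, a rough sketch of the WAL-E side (paths and version numbers are examples; this assumes WAL-E is installed and its AWS credentials and S3 prefix live under /etc/wal-e.d/env):

    # In postgresql.conf, ship every completed WAL segment to S3:
    #   wal_level = replica            (or archive/hot_standby on older 9.x)
    #   archive_mode = on
    #   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

    # Push a base backup of the data directory (e.g. nightly from cron):
    envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.6/main

    # To recover: fetch the latest base backup, then let PostgreSQL replay WAL from S3 via
    #   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'   (recovery.conf)
    envdir /etc/wal-e.d/env wal-e backup-fetch /var/lib/postgresql/9.6/main LATEST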
EBS snapshots will save the image of an entire disk, so you can back up all the disks in the server and recover it as a whole in case of data loss or disaster. Besides that, the block-level nature of EBS snapshots allows near-instant recovery: you can have a 1TB database restored and up and running in a few minutes. Recovering a 1TB database from scratch using a file-based solution (like WAL-E) requires copying the data from S3 first, a process that will take hours. Using WAL files for recovery is a good approach, since you can go back to any point in time by transaction, but snapshotting the entire server will include the WAL files as well, so you'll still have that option. The backup and rapid recovery process using EBS snapshots can be automated with scripts or EC2 backup solutions (for example, Backup solutions for AWS EC2 instances).