EBS snapshots vs. WAL-E for PostgreSQL on EC2 - postgresql

I'm getting ready to move our PostgreSQL databases to EC2, but I'm a little unclear on the best backup and recovery strategy. The original plan was to build an EBS-backed server and set up WAL-E to handle WAL archiving and base backups to S3. I would take snapshots of the final production server volume to be used if the instance crashed. I also see that many people perform frequent EBS snapshots for recovery purposes.
What is the recommended strategy? Is there a reason to archive with WAL and perform scheduled EBS snapshots?

EBS snapshots give you a slightly different type of backup than the WAL-E backups. An EBS snapshot captures the entire drive, which means that if your EC2 instance goes down you can just restart it from your last snapshot, and things will pick up right where you last snapshotted them.
The frequency of your EBS snapshots therefore determines how current your database backups are.
The appealing thing about WAL-E is the "continuous archiving". If I needed every DB transaction backed up, then WAL-E seems the right choice. Many apps I can envision cannot afford to lose transactions, so that seems a very prudent choice.
I think your plan to snapshot the production volumes as a baseline, then use WAL-E to continuously archive the database seems very reasonable. Personally I would likely add a periodic snapshot (once a day?) to that plan just to take a hard baseline and make your recovery process a bit easier.
The usual caveat of "Test your recovery plans!" applies here. You're mixing a number of technologies (EC2, EBS, Postgres, snapshots, S3, WAL-E), so making sure you can actually recover - rather than just back up - is of critical importance.
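To make the WAL-E half of that plan concrete, here is a minimal sketch of the archiving setup. This assumes WAL-E is installed and that S3 credentials live in an envdir-style directory at /etc/wal-e.d/env; the data directory path and schedule are placeholder examples, not prescriptions.

```shell
# postgresql.conf fragment: ship each completed WAL segment to S3
# wal_level and archive_mode require a restart to change
#   wal_level = replica
#   archive_mode = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# crontab fragment: a nightly base backup keeps the amount of WAL
# that must be replayed during recovery small
#   0 2 * * * postgres envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.4/main
```

The base-backup frequency is a recovery-time trade-off: the older the last base backup, the more WAL segments a restore has to replay.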

EBS snapshots save an image of the entire disk, so you can back up all the disks in the server and recover it as a whole in case of data loss or disaster. Beyond that, the block-level nature of EBS snapshots allows near-instant recovery: you can have a 1 TB database restored and up and running in a few minutes. Recovering a 1 TB database from scratch using a file-based solution (like WAL-E) requires copying the data from S3 first, a process that can take hours. Using WAL files for recovery is a good approach, since you can return to any point in time by transaction, but snapshotting the entire server will include the WAL files as well, so you'll still have that option. The backup and rapid-recovery process using EBS snapshots can be automated with scripts or EC2 backup solutions (for example, Backup solutions for AWS EC2 instances).
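The snapshot/restore cycle described above can be sketched with the AWS CLI. All IDs, the availability zone, and the device name below are placeholders; note also that a snapshot taken this way is only crash-consistent, so for PostgreSQL the data directory and WAL should be on the same volume.

```shell
# Take a snapshot of the data volume (crash-consistent)
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "pg-data $(date +%F)"

# Recovery: create a fresh volume from the snapshot and attach it.
# The volume is usable immediately - blocks are lazily loaded from S3
# in the background, which is why recovery is fast even for 1 TB.
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 \
    --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0fedcba9876543210 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdf
```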

Related

Cloud SQL - Growing each day, but not replicating

I've had a replica slave set up for about two weeks now. It has been failing replication due to configuration issues, but it is still growing each day at roughly the same rate as the master (about 5 GB a day).
Until today, binary logs were disabled. And if I go to Monitoring -> slave instance, under Backup Configuration, it says "false".
How do I determine why this is growing each day?
I noticed in the InnoDB Pages Read/Write section of Monitoring that there are upticks of writes each day, but no reads. But what is it writing? The DB hasn't changed, and there are no binary logs.
I noticed in the docs, it says "Point-in-time recovery is enabled by default when you create a new Cloud SQL instance."
But there has never been a "Backup" listed in the Operations list on the instance. And when I do gcloud sql instances describe my-instance, it's not listed under backUpConfiguration
The issue you are having could be due to point-in-time recovery, which constantly increases your storage. In the instance's backup settings, you can keep automated backups enabled while disabling point-in-time recovery. Once you disable it, the binary logs will be deleted and you will notice an immediate reduction in storage usage.
Here are the steps to disable Point-in-time recovery:
Select your instance
Select Backups
Under Settings, select Edit
Uncheck box for point-in-time recovery
To explain point-in-time recovery, here is what the Google Cloud SQL documentation says for both Postgres and MySQL:
Point-in-time recovery requires archiving the WAL files for any instance it is enabled on. This archiving is done automatically on the backend and consumes storage space (even if the instance is idle); consequently, using this feature increases the storage used by your DB instance.
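If you prefer the CLI over the console steps above, point-in-time recovery can also be toggled with gcloud. The instance name is a placeholder, and note that the relevant flag differs between PostgreSQL and MySQL instances:

```shell
# PostgreSQL instances: disable point-in-time recovery (WAL archiving)
gcloud sql instances patch my-instance --no-enable-point-in-time-recovery

# MySQL instances: point-in-time recovery is backed by binary logs
gcloud sql instances patch my-instance --no-enable-bin-log

# Verify: the backupConfiguration section of the output should now
# show the feature disabled
gcloud sql instances describe my-instance
```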

Difference between incremental backup and WAL archiving with PgBackRest

As far as I understood
WAL archiving is pushing the WAL logs to a storage place as the WAL files are generated
Incremental backup is pushing all the WAL files created since the last backup
So, assuming my WAL archiving is setup correctly
Why would I need incremental backups?
Shouldn't the cost of incremental backups be almost zero?
Most of the documentation I found focuses on high-level implementation (e.g. how to set up WAL archiving or incremental backups) rather than the internals (what happens when I trigger an incremental backup).
My question can probably be solved with a link to some documentation, but my google-fu has failed me so far
Backups are not copies of the WAL files, they're copies of the cluster's whole data directory. As it says in the docs, an incremental backup contains:
those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup)
WALs alone aren't enough to restore a database; they only record changes to the cluster files, so they require a backup as a starting point.
The need for periodic backups (incremental or otherwise) is primarily to do with recovery time. Technically, you could just hold on to your original full backup plus years worth of WAL files, but replaying them all in the event of a failure could take hours or days, and you likely can't tolerate that kind of downtime.
A new backup also means that you can safely discard any older WALs (assuming you don't still need them for point-in-time recovery), meaning less data to store, and less data whose integrity you're relying on in order to recover.
If you want to know more about what pgBackRest is actually doing under the hood, it's all covered pretty thoroughly in the Postgres docs.
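To make the distinction concrete, here is a hedged sketch of how a pgBackRest backup schedule typically looks. The stanza name "main" is a placeholder, and the schedule itself is just an example:

```shell
# Weekly full backup: copies the cluster's entire data directory
pgbackrest --stanza=main --type=full backup

# Daily incremental: copies only the cluster files that changed since
# the last backup of any type. This is separate from WAL archiving,
# which runs continuously via something like:
#   archive_command = 'pgbackrest --stanza=main archive-push %p'
pgbackrest --stanza=main --type=incr backup

# Expire old backups per the retention settings in pgbackrest.conf;
# WAL needed only by expired backups can then be discarded too
pgbackrest --stanza=main expire
```

This illustrates the answer's point: incrementals are file-level copies of the data directory, not bundles of WAL, and taking them regularly is what bounds both restore time and WAL retention.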

How to configure WAL archiving for a cluster that *only* hosts dev or test databases?

I've got a dev and test database for a project, i.e. databases that I use to either run my project or run tests, locally. They're both in the same cluster ('instance' – I come from Redmond).
Note that my local cluster is different than the cluster that hosts the production database.
How should I configure those databases with respect to archiving the WAL files?
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
But how should I configure the databases or the cluster for archiving WAL files? I understand that I need them if I want to recover the database. I think that's unlikely (I didn't even know about the 'WAL' or its files, or that they're presumably shared by all of the databases in the same cluster, which seems weird and scary coming from Microsoft SQL Server.)
In the event that I rebuild one of the databases, I should delete the WAL files since the base backup – how can I do that?
But I also don't want to have to worry about the size of the WAL files growing indefinitely. I don't want to be forced to rebuild just to save space. What can I do to prevent this?
My local cluster only contains a single dev and test database for my project, i.e. losing data from one of these databases is (or should be) no big deal. Even having to recreate the cluster itself, and the two databases, is fine and not an issue if it's even just easier than otherwise to restore the two databases to a 'working' condition for local development and testing.
In other words, I don't care about the data in either database. I will ensure – separate from WAL archiving – that I can restore either database to a state sufficient for my needs.
Also, I'd like to document (e.g. in code) how to configure my local cluster and the two databases so that other developers for the same project can use the same setup for their local clusters. These clusters are all distinct from the cluster that hosts the production database.
Rather than trying to manage your WAL files manually, it's generally recommended that you let a third-party tool take care of that for you. There are several options, but pgBackRest is the most popular of the open-source offerings out there.
Each database instance writes its WAL stream, chopped into segments of 16 MB (by default).
Every other relational database does the same thing, even Microsoft SQL Server (the differences are in the name and organization of these files).
The WAL contains the physical information required to replay transactions. Imagine it as information like: "in file x, block 2734, change 24 bytes at offset 543 as follows: ..."
With a base backup and this information you can restore any given point in time in the life of the database since the end of the base backup.
Each PostgreSQL cluster writes its own "WAL stream". The files are named with long weird hexadecimal numbers that never repeat, so there is no danger that a later WAL segment of a cluster can conflict with an earlier WAL segment of the same cluster.
You have to make sure that WAL is archived to a different machine, otherwise the exercise is pretty useless. If you have several clusters on the same machine, make sure that you archive them to different directories (or locations in general), because the names of the WAL segments of different clusters will collide.
About retention: You want to keep around your backups for some time. Once you get rid of a base backup, you can also get rid of all WAL segments from before that base backup. There is the pg_archivecleanup executable that can help you get rid of all archived WAL segments older than a given base backup.
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
Where is the base backup coming from? If you are restoring the PROD base backup and running the seed scripts over it, then you don't need WAL archiving at all on test/dev. But then what you get will be a clone of PROD, which means it will not have separate databases for test and dev in the same instance, since (presumably) PROD doesn't have that.
If the base backup is coming from someplace else, you will have to describe what it is. That will dictate your WAL needs.
Trying to run one instance with both test and dev on it seems like a false economy to me. Just run two instances.
Setting archive_mode=off will entirely disable the WAL archive. There will still be "live" WAL files in the pg_wal or pg_xlog directory, but these get removed/recycled automatically after each checkpoint - you should not need to manage them, other than by controlling how often checkpoints take place (and making sure you don't have any replication slots hanging around). The WAL archive and the live WAL files are different things. The live WAL files are mandatory and are needed to automatically recover from something like a power failure. The WAL archive may be needed to manually recover from a hard-drive crash or the total destruction of your server, and probably isn't needed at all on dev/test.
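For a throwaway dev/test cluster, the advice above boils down to a small config fragment. The size and timeout values are illustrative defaults, not recommendations:

```shell
# postgresql.conf fragment for a local dev/test cluster: no WAL archive
#   archive_mode = off
#
# Live WAL in pg_wal is still written, but is recycled after each
# checkpoint; these settings bound how much can pile up in between
#   max_wal_size = 1GB
#   checkpoint_timeout = 5min

# Check that no forgotten replication slot is pinning old WAL segments
psql -c "SELECT slot_name, restart_lsn FROM pg_replication_slots;"
```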

What is the best way to take snapshots of an EC2 instance running MongoDB?

I wanted to automate taking snapshots of the volume attached to an EC2 instance running the primary node of our production MongoDB replica set. While trying to gauge the pitfalls and best practices via Google, I came across the fact that data inconsistency and corruption are very much possible while creating a snapshot, but not if journaling is enabled, which it is in our case.
So my question is - is it safe to go ahead and execute aws ec2 create-snapshot --volume-id <volume-id> to get clean backups of my data?
Moreover, I plan on running the same command via a cron job that runs once every week. Is that a good enough plan to have scheduled backups?
For MongoDB on an EC2 instance I do the following:
mongodump to a backup directory on the EBS volume
zip the mongodump output directory
copy the zip file to an S3 bucket (with encryption and versioning enabled)
initiate a snapshot of the EBS volume
I wrote a script to perform the above tasks and scheduled it to run daily via cron. Now you have a backup of the database on the EC2 instance, in the EBS snapshot, and on S3. You could go one step further by enabling cross-region replication on the S3 bucket.
This setup provides multiple levels of backup. You should now be able to recover your database in the event of an EC2 failure, an EBS failure, an AWS Availability Zone failure or even a complete AWS Region failure.
I would recommend reading the official MongoDB documentation on EC2 deployments:
https://docs.mongodb.org/ecosystem/platforms/amazon-ec2/
https://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/
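The four steps above might look something like this as a cron-driven script. The bucket name, paths, and volume ID are all placeholders, and this is a sketch rather than a hardened production script (no retention cleanup, no alerting on failure):

```shell
#!/bin/sh
# Nightly MongoDB backup: dump, compress, ship to S3, snapshot EBS
set -e
STAMP=$(date +%Y%m%d)
BACKUP_DIR=/mnt/data/backups/$STAMP

# 1. Logical dump of the database to the EBS volume
mongodump --out "$BACKUP_DIR"

# 2. Compress the dump directory
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$STAMP"

# 3. Copy to S3 (bucket assumed to have versioning + encryption on)
aws s3 cp "$BACKUP_DIR.tar.gz" "s3://my-backup-bucket/mongo/"

# 4. Block-level snapshot of the data volume
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "mongo-data $STAMP"
```

Scheduling it is a one-line crontab entry, e.g. `0 2 * * * /usr/local/bin/mongo-backup.sh` for a nightly run.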

Taking EBS snapshot for multiple mongo node EBS volumes in mongoDB cluster

I have the journal and data on the same volume for a MongoDB shard, so locking with fsyncLock before taking a snapshot is not needed for consistency. An EBS snapshot will be a consistent point in time for a single shard.
I would like to know what is the preferred way of taking backups in mongodb cluster. I have explored two options:
Approximate point in time consistent backup by taking the EBS snapshots around the same time. Advantage being, no write lock needs to be taken.
Stop writes on the system, then take snapshots. This would give point in time consistent backup.
Now, I'd like to know how it is actually done in production. I've read about the replica set's secondary node being used, but it's not clear how that gives a point-in-time consistent backup. Unless all the secondary nodes have consistent point-in-time data, the EBS snapshots cannot be point-in-time. For example, what if the secondary for NodeA is synced with its primary, but some data on the secondary for NodeB is not? Am I missing something here?
Also, can it ever happen that approach 1 leads to an inconsistent MongoDB cluster (when restored), such that it crashes or similar?
Consistent backups
The first steps in any sharded cluster backup procedure should be to:
Stop the balancer (including waiting for any migrations in progress to complete). Usually this is done with the sh.stopBalancer() shell helper.
Back up a config server (usually with the same method as your shard servers, so EBS or filesystem snapshot)
I would define a consistent backup of a sharded cluster as one where the sharded cluster metadata (i.e. the data stored on your config servers) corresponds with the backups for the individual shards, and each of the individual shards has been correctly backed up. Stopping the balancer ensures that no data migrations happen while your backup is underway.
Assuming your MongoDB data and journal files are on a single volume, you can take a consistent EBS snapshot or filesystem snapshot without stopping writes to the node you are backing up. Snapshots occur asynchronously. Once an initial snapshot has been created, successive snapshots are incremental (only needing to update blocks that have changed since the previous snapshot).
Point-in-time backup
With an active sharded cluster, you can only easily capture a true point-in-time backup of data that has been written by stopping all writes to the cluster and backing up the primaries for each shard. Otherwise, as you have surmised, there may be differing replication lag between shards if you back up from secondaries. It's more common to back up from secondaries, as there is some I/O overhead while the snapshots are written.
If you aren't using replication for your shards (or prefer to back up from primaries) the replication-lag caveat doesn't apply, but the timing will still be approximate for an active system, as the snapshots need to be started simultaneously across all shards.
Point-in-time restore
Assuming all of your shards are backed by replica sets it is possible to use an approximate point-in-time consistent backup to orchestrate a restore to a more specific point-in-time using the replica set oplog for each of the shards (plus a config server). This is essentially the approach taken by backup solutions such as MongoDB Cloud Manager (née MMS): see MongoDB Backup for Sharded Cluster. MongoDB Cloud Manager leverages backup agents on each shard for continuous backup using the replication oplog, and periodically creates full snapshots on a schedule. Point-in-time restores can be built by starting from a full data snapshot and then replaying the relevant oplogs up to a requested point-in-time.
What's the common production approach?
Downtime is generally not a desirable backup strategy for a production system, so the common approach is to take a consistent backup of a running sharded cluster at an approximate point-in-time using snapshots. Coordinating backup across a sharded cluster can be challenging, so backup tools/services are also worth considering. Backup services can also be more suitable if your deployment doesn't allow snapshots (for example, if your data and/or journal directories are spread across multiple volumes to maximise available IOPS).
Note: you should really, really consider using replication for your production deployment unless this is a non-essential cluster or downtime is acceptable. Replica sets help maximise uptime & availability for your deployment and some maintenance tasks (including backup) will be much more impactful without data redundancy.
Your backup will be divided into multiple phases:
Stop the balancer on the mongos with sh.stopBalancer()
You can now back up the config database of the config servers. It does not matter whether you do it using EBS snapshots or mongodump --oplog
Now the shards; you can decide which way:
Either: you back up every node with mongodump --oplog. You do not need to stop writes, since you're snapshotting the oplog together with the database export. This backup allows a consistent restore. When restoring, you can use the --oplogReplay and --oplogLimit options to specify a timestamp (assuming your oplog is sized appropriately and did not roll over during backup). You can perform the dump on all shards in parallel, and the restore is synchronized by the oplog.
Or: you fsync and lock, then create an EBS snapshot for every shard (described at http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/). MongoDB 3.0 with WiredTiger cannot otherwise guarantee that the data files do not change. The cost here is that you're required to stop all reads and writes, since you have to unmount the device.
Now start the balancer on the mongos with sh.startBalancer()
Since you do not use replica sets, you have no hassle with lagging secondaries or writes that have not replicated throughout the cluster. My favorite option is using mongodump/mongorestore, which give a lot of control over the restore.
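The mongodump/mongorestore path above can be sketched as follows. Hostnames, paths, and the oplog timestamp are example values; the timestamp format is <seconds-since-epoch>:<increment>:

```shell
# Dump a shard with the oplog captured alongside the data, so the
# dump is consistent even while writes continue (run one dump per
# shard in parallel)
mongodump --host shardA-host --oplog --out /backups/shardA

# Restore, replaying the captured oplog only up to a chosen
# timestamp, so every shard converges on the same point in time
mongorestore --host shardA-host --oplogReplay \
    --oplogLimit 1457982456:1 /backups/shardA
```

Picking the same --oplogLimit for every shard is what turns the per-shard dumps into an approximately cluster-wide consistent restore.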
Update:
In the end, you have to decide what you want to pay to get certain benefits:
Snapshots: pay with space, a write lock, and a certain level of consistency to get fast backups, fast restore times, and no performance impact after the backup
Dumping: pay with time and eviction of the working set during backup to get smaller backups with consistent but slower restores, and no write locks