I've recently started tinkering with Google Cloud SQL for PostgreSQL.
I have created an empty database, and over 4-5 days its storage usage has grown to over 20 GB. It just keeps going up, but there is no data in the database. It's not even being used.
Does anyone know what would be doing this and how to stop it?
Yes, this is most likely due to point-in-time recovery, which will show an increase in your storage every few minutes. You can keep automated backups enabled while disabling point-in-time recovery. Once you disable it, the binary logs will be deleted and you will notice an immediate reduction in storage usage. That said, according to the documentation: "The binary logs are automatically deleted with their associated automatic backup, which generally happens after about 7 days."
To disable point-in-time recovery (a gcloud equivalent follows these steps):
Select your instance
Select Backups
Under Settings select Edit
Uncheck the box for point-in-time recovery
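If you prefer the command line, a rough gcloud sketch for a PostgreSQL instance looks like this (my-instance is a placeholder; double-check the --no-enable-point-in-time-recovery flag against your engine, since MySQL instances use binary logging instead):

    # Check whether point-in-time recovery is currently enabled
    gcloud sql instances describe my-instance \
        --format="value(settings.backupConfiguration.pointInTimeRecoveryEnabled)"

    # Disable point-in-time recovery while keeping automated backups enabled
    gcloud sql instances patch my-instance --no-enable-point-in-time-recovery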
Most likely you have turned on the automated backups setting. You can confirm this by clicking the Backups tab in your Cloud SQL instance. Be careful with disabling and deleting backups in case you start using your database later!
Related
I've had a replica (slave) set up for about two weeks now. It has been failing replication due to configuration issues, but its storage is still growing each day (about 5 GB a day).
Until today, binary logs were disabled. And if I go to Monitoring -> slave instance, under Backup Configuration, it says "false".
How do I determine why this is growing each day?
I noticed in monitoring, in the InnoDB Pages Read/Write section, that there are upticks of Write each day, but no Read. But what is it writing? The DB hasn't changed, and there are no binary logs.
I noticed in the docs, it says "Point-in-time recovery is enabled by default when you create a new Cloud SQL instance."
But there has never been a "Backup" listed in the Operations list on the instance. And when I do gcloud sql instances describe my-instance, it's not listed under backupConfiguration.
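For reference, this is roughly the command I'm using to inspect the backup settings (the instance name is a placeholder):

    # Show just the backup-related settings of the instance
    gcloud sql instances describe my-instance \
        --format="yaml(settings.backupConfiguration)"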
The issue you are having could be due to point-in-time recovery, which will constantly increase your storage usage.
You can keep automated backups enabled while disabling point-in-time recovery. Once you disable it, the binary logs will be deleted and you will notice an immediate reduction in storage usage.
Here are the steps to disable point-in-time recovery (a gcloud equivalent follows these steps):
Select your instance
Select Backups
Under Settings, select Edit
Uncheck the box for point-in-time recovery
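For a MySQL instance (like your replica), point-in-time recovery is backed by binary logging, so a rough gcloud equivalent (the instance name is a placeholder; the flag only applies to MySQL) would be:

    # Disable binary logging (and with it point-in-time recovery) on a MySQL instance
    gcloud sql instances patch my-instance --no-enable-bin-log

    # Confirm the change; automated backups can remain enabled
    gcloud sql instances describe my-instance \
        --format="value(settings.backupConfiguration.binaryLogEnabled)"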
To add an explanation of point-in-time recovery, here is what the Google Cloud SQL documentation says for both Postgres and MySQL:
It is necessary to archive the WAL files for instances on which it is enabled. This archiving is done automatically on the backend and consumes storage space (even if the instance is idle); consequently, using this feature increases the storage used by your DB instance.
I have a question about binary logs on Google Cloud SQL.
Since storage on Cloud SQL is constantly increasing, I want to delete the binary log files. I have read the documentation about it, but it is not clear whether, when I disable binary logging, the files will be deleted immediately or whether I have to wait up to 7 days for them to be deleted. Thank you.
https://cloud.google.com/sql/docs/mysql/backup-recovery/pitr#disk-usage
According to the official documentation:
Disk usage and point-in-time recovery
The binary logs are automatically deleted with their associated automatic backup, which generally happens after about 7 days.
Diagnosing issues with Cloud SQL instances
Binary logs cannot be manually deleted.
Binary logs are automatically deleted with their associated automatic backup, which generally happens after about seven days.
Therefore, you have to wait about 7 days for the binary logs and their associated automatic backup to be deleted.
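If you want to see how much space the binary logs are taking up in the meantime, a quick check (assuming you can connect to the instance with the mysql client; the host and user are placeholders) is:

    # List the binary log files and their sizes; they cannot be purged manually on Cloud SQL
    mysql -h INSTANCE_IP -u root -p -e "SHOW BINARY LOGS;"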
I am using RDS to run a PostgreSQL server (9.6.3), and this morning a backup was automatically kicked off. It is still going 6 hours later, which seems absurd. The database is not that big (~600 GB), and as far as I can tell, this is the first time I've experienced this problem. The machine is relatively beefy (db.m4.2xlarge), so it seems like these backups should take a lot less than 6 hours.
I am also surprised by the fact that a backup would be kicked off at 5:30 AM, which seems awfully close to standard business hours.
Any ideas?
You scheduled the 5:30 AM backup window. Amazon didn't randomly kick it off at that time. Look at your RDS instance's settings and you will see a backup window that was defined when you created the instance.
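A rough sketch of checking and moving that window with the AWS CLI (the instance identifier is a placeholder; backup windows are specified in UTC):

    # Show the backup window currently configured on the instance
    aws rds describe-db-instances \
        --db-instance-identifier mydb \
        --query "DBInstances[0].PreferredBackupWindow"

    # Move the window to, say, 08:30-09:00 UTC
    aws rds modify-db-instance \
        --db-instance-identifier mydb \
        --preferred-backup-window 08:30-09:00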
An RDS backup is like an EBS snapshot, and it shouldn't be reliant on the CPU of the server at all. It should also not affect server performance at all.
You should look into migrating to Amazon Aurora now that the PostgreSQL version is out of beta. Among other benefits, you will get extremely fast snapshot creation with Aurora.
Sometimes things like this become "stuck" due to an issue behind the scenes. If that happens, all you can do is open a support ticket with AWS to get it fixed.
I have a PostgreSQL 9.5 instance running on an Azure VM. As described here, I must specify a pre-script and a post-script to tell Azure: "Yes, I've taken care of putting the VM in a state where the entire VM/blob can be backed up as a snapshot that can be restored as a working new VM" and "Now I'm done", so that Azure will flag the backup as application consistent.
In terms of PostgreSQL, I have read the docs on continuous archiving, which explain why and how to enable WAL archiving to allow for backups. And here comes my question:
If I set archive_mode = on and wal_level = archive, can I leave the archive_command empty, and does this even make sense? Or should I do some kind of archiving here (e.g. copying the log segments to another location/disk), and is this archiving necessary to ensure a working database upon restoring the VM in my scenario?
I only need to tell PostgreSQL "Wait a minute / hold your data writes (or whatever goes on) while I create a snapshot of the entire VM". The plan is to execute pg_start_backup() before, then take the snapshot, and then run pg_stop_backup().
I do realize this method (if it's even valid) is essentially a file-system-level backup, and according to the docs, the postgres service must be shut down for the fs backup to be valid. Elsewhere I've read that calling pg_start_backup() should be enough to guarantee a valid stand-alone physical backup.
If the snapshots you plan to take are truly atomic, that is, the restored snapshot represents the state of the file system at some point in time, you can just restart the database from such a snapshot, and it will perform crash recovery and come up in a consistent state.
In that case, there is no need to care about WAL archiving or backup mode. You could set archive_mode = off and not worry about it.
If the snapshot is not truly atomic, or you want point-in-time-recovery (the ability to restore the database to a point in time between backups), you need WAL archiving set up and running, because you need the WALs to restore the database to a consistent state.
In that case, archive_mode must be on and archive_command must be a command that returns success only if the WAL file has been archived successfully. If even one WAL file is missing between your last backup and the point in time to which you want to restore the database, the restore will not work.
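As a rough sketch of what that could look like on 9.5, assuming the snapshot is not atomic and you do want archiving (the target directory, connection details, and backup label are placeholders):

    # postgresql.conf -- only needed if the snapshot is not atomic or you want PITR
    wal_level = archive
    archive_mode = on
    # Must exit non-zero unless the WAL segment was copied successfully
    archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'

    # Azure pre-script: put the cluster into backup mode before the snapshot
    psql -U postgres -c "SELECT pg_start_backup('azure_vm_snapshot', true);"

    # Azure post-script: end backup mode once the snapshot has completed
    psql -U postgres -c "SELECT pg_stop_backup();"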
I'm getting ready to move our PostgreSQL databases to EC2, but I'm a little unclear on the best backup and recovery strategy. The original plan was to build an EBS-backed server and set up WAL-E to handle WAL archiving and base backups to S3. I would take snapshots of the final production server volume to be used if the instance crashed. I also see that many people perform frequent snapshots of the EBS volume for recovery purposes.
What is the recommended strategy? Is there a reason to archive with WAL and perform scheduled EBS snapshots?
EBS snapshots will give you a slightly different type of backup than the WAL-E backups. EBS backs up the entire drive, which means if your EC2 instance goes down you can just restart it from your last EBS snapshot and things will pick up right where you last snapshotted them.
The frequency of your EBS snapshots therefore determines how up to date your database backups are.
The appealing thing about WAL-E is the "continuous archiving". If I needed every DB transaction backed up, then WAL-E seems the right choice. Many apps I can envision cannot afford to lose transactions, so that seems a very prudent choice.
I think your plan to snapshot the production volumes as a baseline, then use WAL-E to continuously archive the database, seems very reasonable. Personally, I would likely add a periodic snapshot (once a day?) to that plan just to take a hard baseline and make your recovery process a bit easier.
The usual caveat of "Test your recovery plans!" applies here. You're mixing a number of technologies (EC2, EBS, Postgres, snapshots, S3, WAL-E), so making sure you can actually recover - rather than just back up - is of critical importance.
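As a minimal sketch of the WAL-E side of that plan (the bucket configuration lives in an envdir per the WAL-E README conventions; the envdir path, data directory, and schedule here are placeholders):

    # postgresql.conf: ship every completed WAL segment to S3 via WAL-E
    wal_level = archive
    archive_mode = on
    archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

    # cron (e.g. /etc/cron.d/wal-e): nightly base backup in addition to continuous WAL archiving
    0 2 * * * postgres envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.6/main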
EBS snapshots save an image of the entire disk, so you can back up all the disks in the server and recover it as a whole in case of data loss or disaster. Besides that, the block-level nature of EBS snapshots allows near-instant recovery: you can have a 1 TB database restored and up and running in a few minutes. Recovering a 1 TB database from scratch using a file-based solution (like WAL-E) requires copying the data from S3 first, a process that can take hours. Using WAL files for recovery is a good approach, since you can go back to any point in time by transaction, but snapshotting the entire server will include the WAL files as well, so you'll still have that option. The backup and rapid recovery process using EBS snapshots can be automated with scripts or EC2 backup solutions (for example, Backup solutions for AWS EC2 instances).
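A rough sketch of the kind of scripted snapshot described above (the volume ID is a placeholder; in practice you would also tag snapshots and prune old ones):

    # Snapshot the EBS volume that holds the PostgreSQL data directory
    aws ec2 create-snapshot \
        --volume-id vol-0123456789abcdef0 \
        --description "nightly postgres data volume snapshot"

    # List the snapshots taken of that volume so far
    aws ec2 describe-snapshots \
        --filters Name=volume-id,Values=vol-0123456789abcdef0 \
        --query "Snapshots[].{Id:SnapshotId,Started:StartTime,State:State}"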