GCS: How to backup and retain versions with a least privilege service account - google-cloud-storage

I want to set up a service account that can save away backups of a file into Google Cloud Storage on a daily basis.
I was going to do it using object versioning and a life cycle policy that maintains the most recent 30 versions of the file.
However, I've discovered that gsutil requires the delete privilege to create a new version of the same file.
It seems a bit nuts to me to give a backup process delete privileges and not really in step with the principle of least privilege since my understanding is that this gives the service account the ability to do gsutil rm -a and nuke all versions of the backup in one go.
What, then, is the best, least privilege way to achieve this?
I could append a timestamp to the filename each time, but then I can't use lifecycle management and would have to write my own script to determine which are the recent 30 and delete the rest.
Is there a better/easier way to do this?

The best way I can think of to solve this is to have two service accounts -- one that can only create objects (creating your backups using timestamps), and one that can list and delete them.
Account 1 would create your backups, using timestamped filenames to avoid overwriting and thus requiring storage.objects.delete permission.
The credentials for Account 2 would be used for running a script that lists your backup objects and deletes all but the most recent 30 -- you could run this script as a cronjob on a VM somewhere, or only run it when a new backup is uploaded by utilizing Cloud Pub/Sub to trigger a Cloud Function.

We've ended up going with just saving to a different filename (eg backup-YYYYMMDD) and using retention policy to delete that file after 30 days.
It's not water tight, if backup fails for 30 days then all versions will be deleted, but we think we've put enough in place that someone would notice that before 30 days.
We didn't like leaving it up to a script to do the deleting because:
It's more error prone
It means we still end up with a service account with the ability to delete files, and we were really aiming to limit that privilege.

Related

How to recover PostgreSQL 13.5 database without backup file in Ubuntu 20.04 server?

I had 10-15 microservices databases running in production on Ubuntu server. Accidentally deleted everything in the /var/lib/postgresql/** folder with command sudo rm -r *. I think PGDATA is inside the /var/lib/postgresql/13/ folder.
I tried TestDisk to restore this folder but it showed everything deleted except the 13/ folder.
I only have backup files from a long time ago.
Is there any way to restore the last data?
If you don't have a backup of the deleted files and testdisk was not able to recover them, you may want to try using another data recovery tool such as extundelete or photorec. These tools work by scanning the partition and looking for data that is no longer referenced by the file system, which can include deleted files.
It's important to note that the chances of successfully recovering deleted files decrease as more time passes and more activity occurs on the partition, so it's best to try to recover the files as soon as possible after the deletion. In addition, the more you use the partition after the deletion, the more likely it is that the deleted data will be overwritten, making it impossible to recover.
If you are unable to recover the deleted files using these tools, you may want to consider seeking the assistance of a professional data recovery service. These services typically have specialized equipment and expertise that can be used to recover data from damaged or formatted disks. However, these services can be expensive, so it's important to weigh the value of the data against the cost of recovery.

Without retention policy or lifecycle rules, would Google Cloud Storage automatically delete files?

My app uses Google Cloud Storage through Firebase with Java, Angular & Flutter. It stores pictures and such there. Now, a lot of older files recently disappeared from Google Cloud Storage. A test version of my app is probably the culprit. But I want to make sure that I got the storage bucket configured correctly.
Please note that I don't have object versioning enabled. From what I know, it would keep a copy of deleted files around. That's why I plan to enable it in the future. But it doesn't help me with files deleted in the past.
Right now, my storage bucket is configured as follows:
Default storage class: Standard
Object versioning: Off
Retention policy: None
Lifecycle rules: None
So with that configuration, would Google Cloud Storage automatically delete files? Like, say, after a year or so?
No. If you don't ask Cloud Storage to delete your files, your files will stay around forever. There's no expiration of any sort by default. Cloud Storage is a popular tool for long term storage/backup/retention.
If you want to be especially careful not to delete certain objects, retention policies and object holds can be used to make it harder to delete objects by accident. For example, if you wanted to temporarily ensure that your scripts would not delete your most important object, you could run:
gsutil retention temp set gs://my_bucket_name/my_important_file.txt
This would set a "temporary object hold" on the object, which would make it so that my_important_file.txt could not be deleted with the delete command until you released the hold.

Rundeck MariaDB hot backup

On rundeck backup guide, noted that is mandatory to stop rundeck to take full backup when using data file. Now, that guide don't show any secure method to backup full rundeck instance (rundeck server + database) when using MariaDB, PostgreSQL, or any supported database as a backend.
In a real production scenario, not seems to be possible to stop rdeck on a daily basis.
Can anybody share best pratices to take a hot full backup on rdeck installation without stop rdeck?
Is there any secure and supported way to achive a full consistent rdeck projects and jobs definitions and database on a daily basis ?
In this post, answer is not clear, because question don't describe what kinbd of backend are used.
The documentation suggests the instance shutdown because some execution could be active, and that means a potentially active transaction in the middle of the "hot backup process" which means a potential data corruption in your backup. Is the safest way to backup your database.
If you want to do a "hot" backup you can export your projects (with all content, including jobs) and keys.

How to configure WAL archiving for a cluster that *only* hosts dev or test databases?

I've got a dev and test database for a project, i.e. databases that I use to either run my project or run tests, locally. They're both in the same cluster ('instance' – I come from Redmond).
Note that my local cluster is different than the cluster that hosts the production database.
How should I configure those databases with respect to archiving the WAL files?
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
But how should I configure the databases or the cluster for archiving WAL files? I understand that I need them if I want to recover the database. I think that's unlikely (as I didn't even know about 'WAL' or their files, or that, presumably they're shared by all of the databases in the same cluster, which seems weird and scary coming from Microsoft SQL Server.)
In the event that I rebuild one of the databases, I should delete the WAL files since the base backup – how can I do that?
But I also don't want to have to worry about the size of the WAL files growing indefinitely. I don't want to be forced to rebuild just to save space. What can I do to prevent this?
My local cluster only contains a single dev and test database for my project, i.e. losing data from one of these databases is (or should be) no big deal. Even having to recreate the cluster itself, and the two databases, is fine and not an issue if it's even just easier than otherwise to restore the two databases to a 'working' condition for local development and testing.
In other words, I don't care about the data in either database. I will ensure – separate from WAL archiving – that I can restore either database to a state sufficient for my needs.
Also, I'd like to document (e.g. in code) how to configure my local cluster and the two databases so that other developers for the same project can use the same setup for their local clusters. These clusters are all distinct from the cluster that hosts the production database.
Rather than trying to manage your WAL files manually, it's generally recommended that you let a third-party app take care of that for you. There are several options, but pg_backrest is the most popular of the open-source offerings out there.
Each database instance writes its WAL stream, chopped in segments of 16MB.
Every other relational database does the same thing, even Microsoft SQL Server (the differences are in the name and organization of these files).
The WAL contains the physical information required to replay transactions. Imagine it as information like: "in file x, block 2734, change 24 bytes at offset 543 as follows: ..."
With a base backup and this information you can restore any given point in time in the life of the database since the end of the base backup.
Each PostgreSQL cluster writes its own "WAL stream". The files are named with long weird hexadecimal numbers that never repeat, so there is no danger that a later WAL segment of a cluster can conflict with an earlier WAL segment of the same cluster.
You have to make sure that WAL is archived to a different machine, otherwise the exercise is pretty useless. If you have several clusters on the same machine, make sure that you archive them to different directories (or locations in general), because the names of the WAL segments of different clusters will collide.
About retention: You want to keep around your backups for some time. Once you get rid of a base backup, you can also get rid of all WAL segments from before that base backup. There is the pg_archivecleanup executable that can help you get rid of all archived WAL segments older than a given base backup.
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
Where is the basebackup coming from? If you are restoring the PROD base backup and running the seed scripts over it, then you don't need WAL archiving at all on test/dev. But then what you get will be a clone of PROD, which means it will not have different databases for test and for dev in the same instance, since (presumably) PROD doesn't have that.
If the base backup is coming from someplace else, you will have to describe what it is. That will dictate your WAL needs.
Trying to run one instance with both test and dev on it seems like a false economy to me. Just run two instances.
Setting archive_mode=off will entirely disable a wal archive. There will still be "live" WAL files in the pg_wal or pg_xlog directory, but these get removed/recycled automatically after each checkpoint--you should not need to manage these, other than by controlling how often checkpoints take place (and making sure you don't have any replication slots hanging around). The WAL archive and the live WAL files are different things. The live WAL files are mandatory and are needed to automatically recover from something like a power failure. The WAL archive may be needed to manually recover from a hard-drive crash or the total destruction of your server, and probably isn't needed at all on dev/test.

Can I schedule backups using the Heroku PG Backup add-on?

I have been using PG Backups add-on recently and everything has worked fine, however this morning the backup process triggered at 10:00 A.M. in the morning generating some blocks and timeouts in my application.
Is there a way to specify the schedule of the backups made with this add-on? I've been searching and haven't found anything specific.
Use Cron for Manual Backup Scheduling
Heroku gives you two types of backups: automated and user-initiated. Each plan has a different number of daily, weekly, and manual backups that are retained. You can't control when the automated backups occur with PG Backups Auto, but you can use cron to trigger a "manual" backup at any time.
For example:
# Trigger a "manual" backup every four hours.
0 */4 * * * source $HOME/database_credentials; heroku pgbackups:capture
See Creating a Backup for more information about using the pgbackups command.
No, there is no way to do it currently, aside from using an external process to fire the calls.
An email to support might reveal more.
While the original question is old, Heroku does have a schedule option for PGBackups now:
https://devcenter.heroku.com/articles/heroku-postgres-backups#scheduling-backups