Lifecycle Management Policy for Azure Data Lake Gen 2 does not remove folders?

I have some files stored in an Azure Data Lake Gen 2 storage account. I've set a lifecycle policy so that files are deleted after 2 days. The files are successfully deleted after that period, but of course the folders aren't, even when a folder ends up empty.
Is there some way to instruct the policy to delete a folder once it ends up empty? The problem is that I'm left with a lot of orphaned folders after the deletion takes place.
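As far as I know there is no built-in policy action for empty folders, so people usually script the cleanup. As a hedged sketch of the logic only: if the container were mounted locally (e.g. via blobfuse2 — the mount path and folder names below are made up), a scheduled job could prune empty directories bottom-up; the demo uses a throwaway directory instead of a real mount.

```shell
# Stand-in for a mounted container; in reality $MOUNT would be the
# blobfuse2 mount point (hypothetical setup, not from the question).
MOUNT=$(mktemp -d)
mkdir -p "$MOUNT/logs/2023/01" "$MOUNT/logs/2023/02"
touch "$MOUNT/logs/2023/02/app.log"   # 01 ends up empty, 02 does not

# -delete implies -depth, so nested empty folders are removed bottom-up
find "$MOUNT" -mindepth 1 -type d -empty -delete

find "$MOUNT" -type d | sort   # 01 is gone, 02 and its parents remain
```

Against the real account you would more likely call the Data Lake SDK's path-listing and directory-delete operations from a timer-triggered function; the bottom-up order is the important part.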
Thanks!

Related

Without retention policy or lifecycle rules, would Google Cloud Storage automatically delete files?

My app uses Google Cloud Storage through Firebase with Java, Angular & Flutter. It stores pictures and such there. Now, a lot of older files recently disappeared from Google Cloud Storage. A test version of my app is probably the culprit. But I want to make sure that I got the storage bucket configured correctly.
Please note that I don't have object versioning enabled. From what I know, it would keep a copy of deleted files around. That's why I plan to enable it in the future. But it doesn't help me with files deleted in the past.
Right now, my storage bucket is configured as follows:
Default storage class: Standard
Object versioning: Off
Retention policy: None
Lifecycle rules: None
So with that configuration, would Google Cloud Storage automatically delete files? Like, say, after a year or so?
No. If you don't ask Cloud Storage to delete your files, they will stay around forever; there is no expiration of any sort by default. Cloud Storage is a popular tool for long-term storage, backup, and retention.
If you want to be especially careful not to delete certain objects, retention policies and object holds can be used to make it harder to delete objects by accident. For example, if you wanted to temporarily ensure that your scripts would not delete your most important object, you could run:
gsutil retention temp set gs://my_bucket_name/my_important_file.txt
This would set a "temporary object hold" on the object, which would make it so that my_important_file.txt could not be deleted with the delete command until you released the hold.
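For completeness, a hold set this way can later be released with the matching subcommand (same placeholder bucket and object names as above):

```shell
gsutil retention temp release gs://my_bucket_name/my_important_file.txt
```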

How to recover the specific file which got accidentally deleted in azure file share

What is the exact process for recovering only the deleted file(s) in an Azure file share on an ADLS Gen2 storage account?
Azure file shares now support a soft delete feature, which retains a deleted file share for at least 7 days; you can increase the retention period as per your requirement.
When you click on Show deleted shares, you will be able to see the deleted shares in your storage account.
Click the three dots on the right side of the deleted share and choose the Undelete option.
You can also restore an entire file share, or specific files, from a restore point created by Azure Backup. Refer to Restore Azure file shares to learn more.

GCS: How to backup and retain versions with a least privilege service account

I want to set up a service account that can save away backups of a file into Google Cloud Storage on a daily basis.
I was going to do it using object versioning and a life cycle policy that maintains the most recent 30 versions of the file.
However, I've discovered that gsutil requires the delete privilege to create a new version of the same file.
It seems a bit nuts to me to give a backup process delete privileges, and it's not really in step with the principle of least privilege, since my understanding is that this gives the service account the ability to run gsutil rm -a and nuke all versions of the backup in one go.
What, then, is the best, least privilege way to achieve this?
I could append a timestamp to the filename each time, but then I can't use lifecycle management and would have to write my own script to determine which are the recent 30 and delete the rest.
Is there a better/easier way to do this?
The best way I can think of to solve this is to have two service accounts -- one that can only create objects (creating your backups using timestamps), and one that can list and delete them.
Account 1 would create your backups, using timestamped filenames to avoid overwriting an existing object (which is what would require the storage.objects.delete permission).
The credentials for Account 2 would be used for running a script that lists your backup objects and deletes all but the most recent 30 -- you could run this script as a cronjob on a VM somewhere, or only run it when a new backup is uploaded by utilizing Cloud Pub/Sub to trigger a Cloud Function.
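A minimal local sketch of what Account 2's cleanup script could look like, assuming the backup-YYYYMMDD naming from the question (so lexicographic order equals chronological order). Against GCS you would swap the ls/rm calls for gsutil ls / gsutil rm; the staging directory and file dates below are illustrative.

```shell
# Fake staging directory with 35 timestamped backups (names are illustrative)
WORK=$(mktemp -d)
cd "$WORK"
for d in $(seq -w 1 35); do touch "backup-202301$d"; done

# Keep the newest 30, delete the rest (head -n -30 drops all but the last 30 lines)
ls backup-* | sort | head -n -30 | xargs -r rm

ls backup-* | wc -l   # prints 30
```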
We've ended up going with just saving to a different filename (e.g. backup-YYYYMMDD) and using a lifecycle rule to delete each file after 30 days.
It's not watertight: if backups fail for 30 days, all copies will be deleted. But we think we've put enough in place that someone would notice before 30 days pass.
We didn't like leaving it up to a script to do the deleting because:
It's more error prone
It means we still end up with a service account with the ability to delete files, and we were really aiming to limit that privilege.
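In GCS, the 30-day age-based deletion described above is expressed as a lifecycle rule on the bucket; a hedged sketch (the bucket name is a placeholder):

```shell
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-backup-bucket
```

Applying this requires bucket-level update permission, but only once, by an administrator; the day-to-day backup account still needs nothing beyond object creation.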

About delete binary logs file on Cloud Sql

I have a question about binary log on Google Cloud Sql.
The storage used by my Cloud SQL instance is constantly increasing, so I want to delete the binary log files. I have read the documentation about it, but it is not clear whether, when I disable binary logging, the files are deleted immediately or only after the usual 7 days. Thank you.
https://cloud.google.com/sql/docs/mysql/backup-recovery/pitr#disk-usage
According to the official documentation:

Disk usage and point-in-time recovery: "The binary logs are automatically deleted with their associated automatic backup, which generally happens after about 7 days."

Diagnosing issues with Cloud SQL instances: "Binary logs cannot be manually deleted. Binary logs are automatically deleted with their associated automatic backup, which generally happens after about seven days."

Therefore you have to wait about 7 days for the binary logs and their associated automatic backup to be deleted.

Can't find wal backup files

I have two servers, a master and a replica, that work together in asynchronous replication mode; the replica seems to be working fine, since any change on the master is mirrored to the replica right away. Moreover, there is an archive process that copies the WAL files from the master to another filesystem to keep them safe. My question is: which WAL files can I delete by means of pg_archivecleanup? I guess I need to look for WAL files with the .backup extension on the master, and then I could delete WAL files older than the last backup.
For instance, if I have these files in the master server
000000010000000000000089
000000010000000000000088.00000028.backup
000000010000000000000088
000000010000000000000087
000000010000000000000086
...
I come to the conclusion that it is safe to delete 000000010000000000000088 and older files, and to keep the newer ones.
The problem is that I don't find .backup files anywhere: neither in the master, nor in the replica, nor in the archive location.
The *.backup files are created if you perform an online backup with pg_basebackup or similar and you are archiving WAL files using the archive_command.
In that case you can use pg_archivecleanup with such a file as argument to automatically remove all WAL archives that are older than that backup.
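For example, with the 000000010000000000000088.00000028.backup file from the question's listing (the archive path here is a placeholder), the -n flag does a dry run that only prints what would be removed:

```shell
# dry run: list WAL files older than the backup without deleting them
pg_archivecleanup -n /path/to/wal_archive 000000010000000000000088.00000028.backup

# real cleanup
pg_archivecleanup /path/to/wal_archive 000000010000000000000088.00000028.backup
```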
You are either using a different backup method (pg_dump?), or you are not using archive_command.
With pg_dump you cannot do archive recovery, so you wouldn't need to archive WAL at all. If you are using a different archiving method like pg_receivewal, you won't get the *.backup files, and you have to think of a different method to remove your old WAL archives.
One simple method to purge your old WAL archives is to simply remove all those that are older than your retention time.
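A hedged sketch of that age-based purge, demonstrated on a throwaway directory (the real archive path and the 30-day retention window are assumptions, pick your own):

```shell
ARCHIVE=$(mktemp -d)   # stand-in for the real WAL archive directory
touch -d "40 days ago" "$ARCHIVE/000000010000000000000080"   # past retention
touch "$ARCHIVE/000000010000000000000089"                    # recent

# delete archived segments whose mtime is more than 30 days old
find "$ARCHIVE" -type f -name '0000*' -mtime +30 -delete

ls "$ARCHIVE"   # prints 000000010000000000000089
```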
The files are still being generated and archived (unless you turned that off) on the master. They are also streamed to the replica, where they are kept in pg_wal, but the replica automatically cleans them up at every restartpoint. You can make the replica keep them permanently by setting archive_mode=always on the replica, but it sounds like you don't want that.
If the only purpose of the archive (made by the master) is to save files for use by the replica in case it falls too far behind for streaming (not for disaster recovery or PITR), then you can use pg_archivecleanup to automatically clean them up. This is invoked on the replica (not the master), but it must have write access to the archive directory. So you could mount the archive directory as a network file share, or wrap pg_archivecleanup in ssh so it runs on the master rather than on the replica.