Without a retention policy or lifecycle rules, would Google Cloud Storage automatically delete files? - google-cloud-storage

My app uses Google Cloud Storage through Firebase with Java, Angular & Flutter. It stores pictures and such there. Now, a lot of older files recently disappeared from Google Cloud Storage. A test version of my app is probably the culprit. But I want to make sure that I got the storage bucket configured correctly.
Please note that I don't have object versioning enabled. From what I know, it would keep a copy of deleted files around. That's why I plan to enable it in the future. But it doesn't help me with files deleted in the past.
Right now, my storage bucket is configured as follows:
Default storage class: Standard
Object versioning: Off
Retention policy: None
Lifecycle rules: None
So with that configuration, would Google Cloud Storage automatically delete files? Like, say, after a year or so?

No. If you don't ask Cloud Storage to delete your files, your files will stay around forever. There's no expiration of any sort by default. Cloud Storage is a popular tool for long term storage/backup/retention.
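If you want to double-check that nothing on the bucket is configured to expire or delete objects, you can inspect it yourself. A quick sketch, where my_bucket_name is a placeholder for your bucket:
# Shows the bucket's lifecycle configuration (prints a message if none is set).
gsutil lifecycle get gs://my_bucket_name
# Shows the bucket's metadata, including versioning and any retention policy.
gsutil ls -L -b gs://my_bucket_name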
If you want to be especially careful not to delete certain objects, retention policies and object holds can be used to make it harder to delete objects by accident. For example, if you wanted to temporarily ensure that your scripts would not delete your most important object, you could run:
gsutil retention temp set gs://my_bucket_name/my_important_file.txt
This would set a "temporary object hold" on the object, which would make it so that my_important_file.txt could not be deleted with the delete command until you released the hold.
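When you are ready to allow deletion again, the hold can be released the same way (same placeholder bucket and object as above):
# Removes the temporary hold so the object can be deleted again.
gsutil retention temp release gs://my_bucket_name/my_important_file.txt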

Related

GCS: How to backup and retain versions with a least privilege service account

I want to set up a service account that can save away backups of a file into Google Cloud Storage on a daily basis.
I was going to do it using object versioning and a life cycle policy that maintains the most recent 30 versions of the file.
However, I've discovered that gsutil requires the delete privilege to create a new version of the same file.
It seems a bit nuts to me to give a backup process delete privileges, and it's not really in step with the principle of least privilege, since my understanding is that this gives the service account the ability to run gsutil rm -a and nuke all versions of the backup in one go.
What, then, is the best, least privilege way to achieve this?
I could append a timestamp to the filename each time, but then I can't use lifecycle management and would have to write my own script to determine which are the recent 30 and delete the rest.
Is there a better/easier way to do this?
The best way I can think of to solve this is to have two service accounts -- one that can only create objects (creating your backups using timestamps), and one that can list and delete them.
Account 1 would create your backups, using timestamped filenames so that it never overwrites an existing object and therefore never needs the storage.objects.delete permission.
The credentials for Account 2 would be used for running a script that lists your backup objects and deletes all but the most recent 30 -- you could run this script as a cronjob on a VM somewhere, or only run it when a new backup is uploaded by utilizing Cloud Pub/Sub to trigger a Cloud Function.
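A minimal sketch of what that cleanup script could look like with gsutil, run under Account 2's credentials, assuming the backups are written under a backup- prefix in a bucket called my-backup-bucket (both names are placeholders). It keeps the 30 newest objects by name (i.e. timestamp) order and deletes the rest:
# List all backups, sort so the newest timestamps come last, drop the last
# 30 lines (the ones to keep), and delete whatever remains.
# Uses GNU head/xargs; does nothing if there are 30 or fewer backups.
gsutil ls "gs://my-backup-bucket/backup-*" | sort | head -n -30 | xargs -r gsutil rm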
We've ended up going with just saving to a different filename (e.g. backup-YYYYMMDD) and using a lifecycle rule to delete those files after 30 days (see the sketch after the list below).
It's not watertight - if backups fail for 30 days then all copies will be deleted - but we think we've put enough in place that someone would notice that before 30 days.
We didn't like leaving it up to a script to do the deleting because:
It's more error prone
It means we still end up with a service account with the ability to delete files, and we were really aiming to limit that privilege.
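For reference, the "delete after 30 days" part of that setup can be expressed as a bucket lifecycle rule rather than a script. A minimal sketch, where the bucket name is a placeholder:
# lifecycle.json: delete any object once it is 30 days old.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}
EOF
# Apply the rule to the backup bucket.
gsutil lifecycle set lifecycle.json gs://my-backup-bucket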

Mount a shared volume to Kubernetes cluster so that all users can access same storage and share files

I am following Zero to JupyterHub with Kubernetes to create a jupyterHub environment for my team to use.
I'm using Google Kubernetes Engine, and every user gets his/her own storage where files are stored - this setup works fine.
I am having trouble with how I should create a volume or shared storage so that everyone on the team can see each other's notebooks and share files and data.
To explain more, in the desired setup, when a user signs in and goes to his/her Jupyter image, every user sees the same "shared" folder; each user can create individual folders for themselves inside that folder, but is also able to reuse code that someone else has already written.
I looked into NFS with Google Cloud Filestore, but that seems very expensive.
As per the documentation, a gcePersistentDisk does not support being mounted read-write by multiple nodes (no ReadWriteMany access mode).
There is an alternative solution for the problem: Rook is a storage orchestrator for Kubernetes, and various storage provisioners are available through it. One of them is Ceph, which has a shared filesystem (CephFS) solution on Kubernetes.
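As a rough illustration of the consumer side once Rook and CephFS are set up, a shared volume is just a PersistentVolumeClaim with the ReadWriteMany access mode. This is a sketch only; the claim name, the storage class name rook-cephfs, and the size are assumptions that depend on your Rook installation:
# Create a ReadWriteMany claim backed by CephFS that every user pod can
# mount as the shared folder.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-notebooks
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 10Gi
EOF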

When creating a new cloud composer env, is it possible to set the bucket to preexisting one?

So I already have an empty storage bucket created for this and I don't want composer to create its own bucket for the dags - I'd like to use the one already created.
It's not ideal to have it just create a random bucket and then go
gcloud composer environments run test-environment --location europe-west1 variables -- --set gcs_bucket gs://my-bucket
I've dug around the docs, but it seems you cannot get around it creating a brand new bucket every time?
Currently, it is not possible.
In the environment's configuration in the Cloud Composer API, the dagGcsPrefix parameter is output only; you cannot set it. The documentation also mentions that a Cloud Storage bucket is always created along with the Composer environment, and that the name of the bucket is based on the environment's region, name and a random ID.
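You can, however, look up the bucket that was created for your environment. A small sketch, reusing the environment name and location from the command in the question:
# Prints the gs:// prefix of the auto-created DAGs bucket.
gcloud composer environments describe test-environment \
    --location europe-west1 \
    --format="get(config.dagGcsPrefix)"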
You may want to “Star” this Feature Request for the mentioned functionality, to receive notifications whenever an update on this regard is published. You can also review or subscribe to the Cloud Composer release notes to be updated about recently added features.
You are right, this is currently not supported in Composer.

how we can do automatic backup for compute engine disk everyday ? in google cloud

I have created an instance in Compute Engine with Windows Server 2012. I can't see any option to take an automatic backup of the instance disk (database) every day. There is an option to take a snapshot, but we need to do that manually. Please suggest a way to back up automatically that can be restored with a single click. If there is any other possibility using Cloud SQL storage or any other storage, please recommend it.
thanks
There's an API to take snapshots, see API section here:
https://cloud.google.com/compute/docs/disks/create-snapshots#create_your_snapshot
You can write a simple app to get triggered from Cron or something to take a snapshot periodically.
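A minimal sketch of such a periodic job, using the gcloud CLI instead of calling the API directly, run from cron on any machine with gcloud installed (the disk name and zone are placeholders you would replace with your own):
# Creates a snapshot named after the disk and the current date; run once a day.
gcloud compute disks snapshot my-disk \
    --zone=us-central1-a \
    --snapshot-names="my-disk-$(date +%Y%m%d)"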
There is no built-in provision for automatic backups of a Compute Engine disk, but you can do a manual disk backup by creating a snapshot.
The best alternative is to create a bucket and move your files there; Google Cloud Storage buckets have automated backup facilities available.
Cloud Storage and Cloud SQL are your options for automated backups in Google Cloud.
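If you go the bucket route, a sketch of how the copying itself could be automated with gsutil on a schedule (the local path and bucket name are placeholders):
# Mirror a local data directory into the bucket.
gsutil -m rsync -r /path/to/data gs://my-backup-bucket/data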

Updating Web Role applications (Azure) without deleting user data

I've got a Web Role on Azure with 2 Applications and 1 Virtual Directory.
1 Application is a backend, where admins can upload files, which are stored in the virtual directory (which is accessed by both applications).
Every time I deploy a new version to Azure, all the uploaded content in the virtual directory is deleted - this is what I don't want!
So how is it possible to publish a new version without deleting all my user generated files?
I've already managed to update the application with WebDeploy. But this is only possible for the "main" application, and not the 2nd application (which is configured as a Virtual Application).
Thanks
You can't. The web role is recreated on deployment. The same can also happen on hardware failure: Azure redeploys your system if an instance fails, spinning up a clean virtual machine and then deploying your app to it. You should never store data you want to keep on a web role. You need to use blob storage or similar to store files you want to persist.
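For the upload part specifically, the idea is simply that the backend writes user files into a blob container instead of the virtual directory. A rough sketch with the current Azure CLI, where the account, container and file names are placeholders and authentication is assumed to be configured separately:
# Upload a user file to a blob container instead of the web role's local disk.
az storage blob upload \
    --account-name mystorageaccount \
    --container-name uploads \
    --name report.pdf \
    --file ./report.pdf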
Virtual directories are stored on the "Application" partition, which is recreated on each upgrade - see this for more information. So the virtual directory folder is not the right place to store stuff you want preserved across upgrades. By the way, the "Application" partition only has 1 gigabyte of space, and some of that is used for storing your application's binary code, so you may find yourself in a "disk full" situation at some point.
If you want to store some data which you don't mind sacrificing on rare occasions - like cached results - you may use the "local resources" disk, which will survive in-place upgrades and reboots. However, it is not guaranteed to be preserved if your VM crashes - for that level of preservation you have to use persistent storage, such as blob storage.
Since you are talking about virtual directories and using Web Deploy to update the application outside of the usual Azure package deployment mechanism, it sounds like your architecture/application might be better suited to a persistent VM role than a Web role. These are available on Azure in preview only at the moment.
http://www.windowsazure.com/en-us/home/scenarios/virtual-machines/
They let you have persistent storage that will survive a recycle. The storage is actually backed by blob storage, but it looks like a normal disk from the PVM.