How to set Google Cloud Storage (GCS) bucket object expiration (TTL) using the CLI - google-cloud-storage

I want to set a policy on a new GCS bucket so that objects expire after 14 days (a TTL, or time to live).
I use
gsutil mb \
-p ${GCP_PROJECT_ID} \
--retention 14d \
gs://$GCS_BUCKET_NAME
but it doesn't work: the files never expire. Why is that?

GCS bucket TTL and retention policy
I had misunderstood the intention of --retention.
A retention policy governs how long objects in the bucket must be retained at minimum, not when they expire (their time to live).
https://cloud.google.com/storage/docs/bucket-lock
--retention 14d means the objects are not allowed to be deleted within 14 days. It does not mean the objects have a 14-day lifecycle and will expire and be deleted after 14 days.
To set a TTL correctly for a GCS bucket, set a lifecycle rule instead:
# set GCS bucket object TTL
echo '
{
  "rule":
  [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 14}
    }
  ]
}
' > gcs_lifecycle.tmp
gsutil lifecycle set gcs_lifecycle.tmp gs://$GCS_BUCKET_NAME
rm gcs_lifecycle.tmp
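To confirm the rule took effect, read the configuration back:
gsutil lifecycle get gs://$GCS_BUCKET_NAME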

Related

Is there a way to figure out in which region a Google Cloud Storage bucket is hosted?

NCBI (the National Center for Biotechnology Information) generously provides its data for third parties to consume. The data is located in cloud buckets such as gs://sra-pub-run-1/. I would like to read this data without incurring additional costs, which I believe can be achieved by reading it from the same region as where the bucket is hosted. Unfortunately, I can't figure out in which region the bucket is hosted (NCBI mentions in their docs that it's in the US, but not where in the US). So my questions are:
Is there a way to figure out in which region a bucket that I don't own, like gs://sra-pub-run-1/, is hosted?
Is my understanding correct that reading the data from instances in the same region is free of charge? What if the GCS bucket is multi-region?
Running a simple gsutil ls -b -L either provides no information (when listing a specific directory within sra-pub-run-1), or gives a permission-denied error if I try to list info on gs://sra-pub-run-1/ directly using:
gsutil -u metagraph ls -b gs://sra-pub-run-1/
You cannot specify a specific Compute Engine zone as a bucket location, but all Compute Engine VM instances in zones within a given region have similar performance when accessing buckets in that region.
Billing-wise, egressing data from Cloud Storage into a Compute Engine instance in the same location/region (for example, US-EAST1 to US-EAST1) is free, regardless of zone.
So, check the "Location constraint" of the GCS bucket (gsutil ls -Lb gs://bucketname ), and if it says "US-EAST1", and if your GCE instance is also in US-EAST1, downloading data from that GCS bucket will not incur an egress fee.

Google Cloud Storage versioned bucket not obeying lifecycle rule

I would like to have a maximum of only 2 versions of all objects in my Google Cloud Storage bucket. I have enabled object versioning and added a lifecycle rule to delete any objects with more than 2 versions. I then add objects to the bucket multiple times and run
gsutil ls -R -a gs://bucketname
I end up seeing 3 or 4 different generations of each object; even after several minutes of waiting, they are not deleted.
E.g.:
gs://bucketname/b331108b.csv.gz#1562856078193350
gs://bucketname/b331108b.csv.gz#1564856078195342
gs://bucketname/b331108b.csv.gz#1565856078143350
gs://bucketname/b331108b.csv.gz#1567856078193551
Is this the expected behaviour?
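Lifecycle actions run asynchronously; deletions are not guaranteed to happen immediately and can lag by up to about a day, so a few minutes of waiting is not conclusive. For reference, a rule that keeps at most 2 versions of each object could look like the sketch below (the bucket name is a placeholder), using the numNewerVersions condition: an object version is deleted once at least 2 newer versions of it exist.
# sketch: keep at most 2 versions per object
echo '
{
  "rule":
  [
    {
      "action": {"type": "Delete"},
      "condition": {"numNewerVersions": 2}
    }
  ]
}
' > gcs_lifecycle.tmp
gsutil lifecycle set gcs_lifecycle.tmp gs://bucketname
rm gcs_lifecycle.tmp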

How to mount all buckets in Google Cloud Storage

According to the document https://cloud.google.com/storage/docs/gcs-fuse, Cloud Storage FUSE only mounts a predefined bucket to a specified path.
If I have many buckets, how can I mount all of them under a root directory, so that I can access or create any bucket as a subdirectory?
gcsfuse doesn't support mounting multiple buckets in a single process, nor does it support creating buckets for you. You'll need to create buckets in the usual way outside gcsfuse, then run one gcsfuse process per desired bucket.
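A minimal sketch of that approach (the bucket names and mount root are placeholders), creating one mount point and one gcsfuse process per bucket:
# mount each bucket under its own directory, one gcsfuse process apiece
for bucket in bucket-one bucket-two bucket-three; do
  mkdir -p /mnt/gcs/$bucket
  gcsfuse $bucket /mnt/gcs/$bucket
done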

Object lifecycle seems to stop working on Google Cloud Storage

I've set a 3-day TTL on a bucket and it had been working for about a month,
but in the last 10 days nothing has been removed from the bucket. I've checked that the lifecycle rule still exists on the bucket using gsutil lifecycle get gs://bucketname, and it is still there:
{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 3}}]}
Is the rule correct? If yes, what could be the problem?
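The rule is valid; it deletes objects once they are 3 days old. Since the age condition counts from each object's creation time, one thing worth checking (a diagnostic sketch, with a placeholder bucket name) is whether the remaining objects are actually older than 3 days:
# list objects with size and creation time
gsutil ls -l gs://bucketname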

Does lifecycle depend on the time an object was deleted or created?

I have this lifecycle rule set on my Google Cloud Storage bucket:
"action": {"type": "Delete"},
"condition": {"age": 7, "isLive": false}
If I remove a file will the lifecycle delete event occur 7 days later or will it apply immediately if the file is already over 7 days old?
When I use gsutil ls -a, it seems like the generation doesn't change when I remove a file, which makes me think the lifecycle rule will treat it as if it is already over 7 days old.
If that is the case how can I have my files deleted 7 days after they are removed?
If you remove a file, it will be deleted immediately. Nothing will happen 7 days later.
If you have an existing object in a bucket with that lifecycle policy, it will be deleted by GCS at some point after it reaches 7 days old. There is no guarantee that it will be deleted immediately, but it will usually happen in less than a day.
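Note that the isLive: false condition in the question only matters on a bucket with object versioning enabled: there, removing a file just makes its current generation noncurrent, and the age condition still counts from the object's original creation time, not from when it was removed. To delete objects a fixed number of days after they become noncurrent, lifecycle rules also support a daysSinceNoncurrentTime condition; a sketch (the bucket name is a placeholder):
echo '
{
  "rule":
  [
    {
      "action": {"type": "Delete"},
      "condition": {"daysSinceNoncurrentTime": 7}
    }
  ]
}
' > gcs_lifecycle.tmp
gsutil lifecycle set gcs_lifecycle.tmp gs://bucketname
rm gcs_lifecycle.tmp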