Is there any command I can use with gsutil rm to delete the old versions? - google-cloud-storage

I am learning how to use Google Cloud. I used this command:
"gsutil ls -la gs://bucket01/*"
And I get the following information:
display.json#01
display.json#02
display.json#03
display.json#04
display.json#05
How can I delete all the previous versions and keep just the newest file, which would be display.json#05?

There is no wildcard that supports deleting all non-live versions, so you would need to delete them individually, like so:
gsutil -m rm gs://bucket01/display.json#01 gs://bucket01/display.json#02 gs://bucket01/display.json#03 gs://bucket01/display.json#04
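If there are many archived generations, a rough alternative is to pipe the versioned listing into gsutil rm -I. A minimal sketch, assuming (as the listing above suggests) that gsutil ls -a prints versions oldest-first, so the last line is the live object:
# Remove every archived version of display.json, keeping the newest (live) one.
gsutil ls -a gs://bucket01/display.json | sed '$d' | gsutil -m rm -I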
Depending on your use case, you may just wish to turn versioning off, or configure an Object Lifecycle Management rule on your bucket with Age and NumNewerVersions conditions.
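For example, a minimal lifecycle sketch (the rule values here are assumptions to adapt) that deletes noncurrent versions once at least one newer version exists:
# Write a lifecycle config and apply it to the bucket.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"isLive": false, "numNewerVersions": 1}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://bucket01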

Related

Nearline - Backup Solution - Versioning

I've setup some Nearline buckets and enabled versioning and object lifecycle management. The use-case is to replace my current backup solution, Crashplan.
Using gsutil I can see the different versions of a file using a command like gsutil ls -al gs://backup/test.txt.
First, is there any way of finding files that don't have a live version (e.g. deleted) but still have a version attached?
Second, is there any easier way of managing versions? For instance, if I delete a file from my PC, it will no longer have a live version in my bucket but will still have the older versions associated with it. Say I didn't know the file name; would I just have to do a recursive ls on the entire bucket and sift through the output?
Would love a UI that supported versioning.
Thanks.
To check whether an object currently has no live version, use the x-goog-if-generation-match header set to 0. For example:
gsutil -h x-goog-if-generation-match:0 cp file.txt gs://bucket/file.txt
will fail (PreconditionException: 412 Precondition Failed) if the file has a live version and will succeed if it has only archived versions.
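For the first question (finding objects that have no live version), a rough approach is to diff the versioned listing against the live listing. A sketch, using the backup bucket from the question (the temporary file names are made up):
# Names of all objects that have at least one version (archived or live)
gsutil ls -a gs://backup/** | sed 's/#[0-9]*$//' | sort -u > all_names.txt
# Names of objects that currently have a live version
gsutil ls gs://backup/** | sort -u > live_names.txt
# Names appearing only in the first list have archived versions but no live version
comm -23 all_names.txt live_names.txt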
In order to automatically synchronize your local folder with a folder in the bucket (or the other way around), use gsutil rsync:
gsutil rsync -r -d ./test gs://bucket/test/
notice the trailing / in gs://bucket/test/, without it you will receive
CommandException: arg (gs://graham-dest/test) does not name a directory, bucket, or bucket subdir.
-r synchronizes all the directories in ./test recursively to gs://bucket/test/
-d deletes all files from gs://bucket/test/ that are not found in ./test
Regarding a UI, there already exists a feature request. I don't know anything about third-party applications, however.

Google Cloud Storage: What is the easiest way to update the timestamp of all files under all subfolders?

I have datewise folders in the form of root-dir/yyyy/mm/dd
under which there are many files present.
I want to update the timestamp of all the files falling under a certain date range,
for example 2 weeks, i.e. 14 folders, so that these files can be picked up by my file-streaming data ingestion process.
What is the easiest way to achieve this?
Is there a way in the UI console? Or is it through gsutil?
Please help.
GCS objects are immutable, so the only way to "update" the timestamp would be to copy each object on top of itself, e.g., using:
gsutil cp gs://your-bucket/object1 gs://your-bucket/object1
(and looping over all objects you want to do this to).
This is a fast (metadata-only) operation, which will create a new generation of each object, with a current timestamp.
Note that if you have versioning enabled on the bucket, doing this will create an extra version of each file you copy this way.
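For the two-week range mentioned in the question, that loop might look roughly like this (the bucket name and the month are assumptions):
# Re-copy every object under 14 daily prefixes onto itself to refresh its timestamp.
# The root-dir/yyyy/mm/dd layout is taken from the question; my-bucket and 2016/12 are made up.
for day in $(seq -w 1 14); do
  gsutil ls "gs://my-bucket/root-dir/2016/12/$day/**" 2>/dev/null | while read -r obj; do
    gsutil cp "$obj" "$obj"
  done
done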
When you say "folders in the form of root-dir/yyyy/mm/dd", do you mean that you're copying those objects into your bucket with names like gs://my-bucket/root-dir/2016/12/25/christmas.jpg? If not, see Mike's answer; but if they are named with that pattern and you just want to rename them, you could use gsutil's mv command to rename every object with that prefix:
$ export BKT=my-bucket
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/01/15/file1.txt
gs://my-bucket/2016/01/15/some/file.txt
gs://my-bucket/2016/01/15/yet/another-file.txt
$ gsutil -m mv gs://$BKT/2016/01/15 gs://$BKT/2016/06/20
[...]
Operation completed over 3 objects/12.0 B.
# We can see that the prefixes changed from 2016/01/15 to 2016/06/20
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/06/20/file1.txt
gs://my-bucket/2016/06/20/some/file.txt
gs://my-bucket/2016/06/20/yet/another-file.txt

GCS CLI: using “gsutil rm” to remove files by creation date

Is there a way to remove files from Google Cloud Storage by their creation date, using the CLI?
For example:
I would like to remove all files in a specific path whose creation date is earlier than 2016-12-01.
There's no built-in way in the CLI to delete by date. There are a couple ways to accomplish something like this. One possibility is to use an object naming scheme that prefixes object names by their creation date. Then it is easy to remove them with wildcards, for example:
gsutil -m rm gs://your-bucket/2016-12-01/*
Another approach would be to write a short parser for gsutil ls -L gs://your-bucket that filters object names by their creation date, then call gsutil -m rm -I with the resulting object names.
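That second approach might look roughly like this (the path is illustrative; it relies on gsutil ls -l printing size, creation time, and URL for each object, and on rm -I reading object URLs from stdin):
# Delete objects whose creation time is before the cutoff (ISO dates compare lexically).
CUTOFF="2016-12-01"
gsutil ls -l "gs://your-bucket/specific/path/**" \
  | awk -v cutoff="$CUTOFF" '$3 ~ /^gs:/ && $2 < cutoff {print $3}' \
  | gsutil -m rm -I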
If you just want to automatically delete objects older than a certain age, then there is a much easier way than using the CLI: you can configure an Object Lifecycle Management policy on your bucket.

Prevent downtime using lftp mirror

I'm using lftp to deploy a website via Travis CI. There is a build process before the deployment, for that reason a build directory is present and pushed to the root of the ftp server.
lftp $FTP_URL -e "glob -d mirror build . --reverse --delete-first --parallel=10 && exit"
It works quite well, but I dislike having downtime / temporary PHP parse errors because of missing files on my website. What is the best way to work around that issue?
My first approach was an option to set a temporary directory, but the lftp man page says there is only an option for temporary files. I still tried that option, but it didn't help.
My second approach was to use "mirror build temp" to upload to a temporary folder and then replace the root with it. The problem here is that I cannot exclude the temp folder while deleting the old files and folders with something like rm -rf *.
For small changes not involving adding/removing PHP files, setting xfer:use-temp-file should be sufficient. Also, don't use --delete-first, as it causes lftp to delete obsolete files before uploading.
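Something along these lines (adapting the command from the question; the setting name comes from the lftp documentation):
# Upload each file to a temporary name and rename it only after its transfer completes.
lftp $FTP_URL -e "set xfer:use-temp-file on; mirror build . --reverse --delete --parallel=10; exit"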
For larger changes I'd create a separate directory for each version of the site and redirect the web server to the directory using .htaccess mod_rewrite or some other configuration file. This technique will allow atomic switch to the new version (and back if needed). Besides, you will be able to do final pre-production testing of the new version if you redirect to the new version conditionally based on your IP address or using some other rule.
If you don't want to re-upload the whole site for each new version and the FTP server supports FXP with itself, then you can copy the old version to a new directory using mirror old_directory ftp://user@example.com/new_directory, then update the new directory using mirror -eR local_dir new_directory.
This is a zero-downtime pattern - each placeholder should be replaced:
lftp $FTP_URL -e "mirror {SOURCE} {TARGET}-new-{TIMESTAMP} --reverse --delete-first;
mv {TARGET} {TARGET}-old-{TIMESTAMP};
mv {TARGET}-new-{TIMESTAMP} {TARGET};
rm -rf {TARGET}-old-{TIMESTAMP};
exit"

GCS - Global Consistency with delete + rename

My issue may be a result of my misunderstanding of global consistency in Google Storage, but since I had not experienced this issue until just recently (mid November) and it now seems easily reproducible, I wanted some clarification. The issue started happening in a piece of Spark code running on Compute Engine using bdutil, but I can reproduce it from the command line with gsutil.
My code is deleting a destination path and then immediately renaming a source path to the destination path. With global consistency I would expect that, since the destination path no longer exists, the src would be renamed to the destination; instead the src is being nested inside the destination as if it still existed, and that is not consistent.
The hadoop code to reproduce looks like:
fs.delete(new Path(dest), true)
fs.rename(new Path(src), new Path(dest))
From command line I can reproduce with:
gsutil -m rm -r gs://mybucket/dest
gsutil -m cp -r gs://mybucket/src gs://mybucket/dest
If the reason is that list operations are eventually consistent and the FileSystem implementation uses list operations to determine whether the destination still exists, then I understand; in that case, is there a recommended solution to ensure the destination no longer exists before renaming?
Thanks,
Luke
Travis's answer is a couple of years old and no longer true. Object list operations are strongly consistent now. Read Google's post.
Read-after-write (including delete) operations are strongly consistent, so for example, if you did:
gsutil -m rm -r gs://mybucket/dest
# Command output shows it removed gs://mybucket/dest/file1
gsutil cp gs://mybucket/dest/file1 my_local_dir/file1
That would always fail.
However, to determine if a "directory" exists, gsutil must perform an eventually-consistent listing operation to find out if any object in Google Cloud Storage's flat name space has a prefix with the name of that "directory". This can lead to the problem you described, and I expect that the hadoop code behaves similarly.
There isn't a strongly consistent workaround for this problem because there's no way to check for the existence of a prefix in a strongly-consistent way.