Is there a way to remove files from Google Cloud Storage by their creation date, using the CLI?
For example:
I would like to remove all files under a specific path whose creation date is earlier than 2016-12-01.
There's no built-in way in the CLI to delete by date. There are a couple ways to accomplish something like this. One possibility is to use an object naming scheme that prefixes object names by their creation date. Then it is easy to remove them with wildcards, for example:
gsutil -m rm gs://your-bucket/2016-12-01/*
Another approach would be to write a short parser for gsutil ls -L gs://your-bucket that filters object names by their creation date, then call gsutil -m rm -I with the resulting object names.
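For example, here is a rough sketch of that approach, using the simpler gsutil ls -l output instead of ls -L (each per-object line contains the size, an ISO 8601 creation timestamp, and the object URL); the bucket name, path, and cutoff date are placeholders:
# Rough sketch: delete objects created before a cutoff date.
CUTOFF="2016-12-01"
gsutil ls -l "gs://your-bucket/some/path/**" \
  | awk -v cutoff="$CUTOFF" 'substr($2, 1, 10) < cutoff {print $3}' \
  | grep '^gs://' \
  | gsutil -m rm -I
# awk keeps URLs whose timestamp (field 2) is before the cutoff;
# grep drops the trailing TOTAL summary line; rm -I reads the URLs from stdin.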
If you just want to automatically delete objects older than a certain age, then there is a much easier way than using the CLI: you can configure an Object Lifecycle Management policy on your bucket.
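For example, a minimal sketch of such a policy (the bucket name and the 30-day threshold are placeholders):
# Rough sketch: apply a lifecycle rule that deletes objects older than 30 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://your-bucket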
Is there no way to get a file listing out from a Google Cloud Storage bucket that is sorted by date descending? This is very frustrating. I need to check the status of files that are uploaded and the bucket has thousands of objects.
gsutil ls does not have the standard Linux -t option.
The Google Cloud Console also lists the files but does not offer sorting options.
I use this as a workaround:
gsutil ls -l gs://[bucket-name]/ | sort -k 2
This outputs the full listing, including the date as the second field; sort -k 2 then sorts by that field.
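If you want the newest files first, as the question asks, just reverse the sort:
gsutil ls -l gs://[bucket-name]/ | sort -r -k 2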
The only ordering supported by GCS is lexicographic.
As a workaround, if it's possible for you to name your objects with a datestamp, that would give you a way to list objects by date.
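For example, a small sketch that stamps the upload path with the current date (the file and bucket names are placeholders):
gsutil cp report.txt gs://your-bucket/$(date +%Y-%m-%d)/report.txt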
I've set up some Nearline buckets and enabled versioning and Object Lifecycle Management. The use case is to replace my current backup solution, CrashPlan.
Using gsutil I can see the different versions of a file using a command like gsutil ls -al gs://backup/test.txt.
First, is there any way of finding files that don't have a live version (e.g. deleted) but still have a version attached?
Second, is there any easier way of managing versions? For instance, if I delete a file from my PC, it will no longer have a live version in my bucket but will still have the older versions associated with it. Say I didn't know the file name: would I just have to do a recursive ls on the entire bucket and sift through the output?
Would love a UI that supported versioning.
Thanks.
To check whether an object currently has no live version, use the x-goog-if-generation-match header with a value of 0. For example:
gsutil -h x-goog-if-generation-match:0 cp file.txt gs://bucket/file.txt
This will fail (PreconditionException: 412 Precondition Failed) if the file has a live version, and will succeed if it has only archived versions.
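If you want to list all such objects rather than test them one at a time, here is a rough sketch (assuming a bash shell; the bucket name is a placeholder) that diffs the live object names against the names of every archived generation:
# Print names that still have archived generations but no live version.
comm -13 <(gsutil ls "gs://bucket/**" | sort) \
         <(gsutil ls -a "gs://bucket/**" | sed 's/#[0-9]*$//' | sort -u)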
To automatically synchronize your local folder with a folder in the bucket (or the other way around), use gsutil rsync:
gsutil rsync -r -d ./test gs://bucket/test/
Notice the trailing / in gs://bucket/test/; without it you will receive:
CommandException: arg (gs://graham-dest/test) does not name a directory, bucket, or bucket subdir.
-r synchronizes all the directories under ./test recursively to gs://bucket/test/
-d deletes all files from gs://bucket/test/ that are not found in ./test
Regarding a UI, there is already a feature request for this. However, I don't know anything about third-party applications.
I have date-wise folders in the form root-dir/yyyy/mm/dd, under which there are many files.
I want to update the timestamp of all the files falling within a certain date range, for example 2 weeks (i.e. 14 folders), so that these files can be picked up by my file-streaming data ingestion process.
What is the easiest way to achieve this?
Is there a way to do this in the UI console, or is it done through gsutil?
Please help.
GCS objects are immutable, so the only way to "update" the timestamp would be to copy each object on top of itself, e.g., using:
gsutil cp gs://your-bucket/object1 gs://your-bucket/object1
(and looping over all objects you want to do this to).
This is a fast (metadata-only) operation, which will create a new generation of each object, with a current timestamp.
Note that if you have versioning enabled on the bucket, doing this will create an extra version of each file you copy this way.
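For example, a rough sketch of that loop over a couple of date folders (the bucket and folder names are hypothetical; extend the list to cover your date range):
for day in 2016/12/01 2016/12/02; do
  gsutil ls "gs://your-bucket/root-dir/$day/**" | while read -r obj; do
    gsutil cp "$obj" "$obj"   # copy each object onto itself to refresh its timestamp
  done
done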
When you say "folders in the form of root-dir/yyyy/mm/dd", do you mean that you're copying those objects into your bucket with names like gs://my-bucket/root-dir/2016/12/25/christmas.jpg? If not, see Mike's answer; but if they are named with that pattern and you just want to rename them, you could use gsutil's mv command to rename every object with that prefix:
$ export BKT=my-bucket
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/01/15/file1.txt
gs://my-bucket/2016/01/15/some/file.txt
gs://my-bucket/2016/01/15/yet/another-file.txt
$ gsutil -m mv gs://$BKT/2016/01/15 gs://$BKT/2016/06/20
[...]
Operation completed over 3 objects/12.0 B.
# We can see that the prefixes changed from 2016/01/15 to 2016/06/20
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/06/20/file1.txt
gs://my-bucket/2016/06/20/some/file.txt
gs://my-bucket/2016/06/20/yet/another-file.txt
I am learning how to use Google Cloud. I used this command:
"gsutil ls -la gs://bucket01/*"
And I get the following information:
display.json#01
display.json#02
display.json#03
display.json#04
display.json#05
How can I delete all the previous versions and keep only the newest file, which would be display.json#05?
There is no wildcard that supports deleting all non-live versions, so you would need to delete them individually, like so:
gsutil -m rm gs://bucket01/display.json#01 gs://bucket01/display.json#02 gs://bucket01/display.json#03 gs://bucket01/display.json#04
Depending on your use case, you may just wish to turn versioning off, or configure an Object Lifecycle Management rule on your bucket with Age and NumNewerVersions conditions.
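For example, a rough sketch of such a rule (the age, version count, and bucket name are placeholders):
# Delete noncurrent versions that are older than 30 days and have at least 2 newer versions.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30, "numNewerVersions": 2}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://bucket01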
New to GCS (just got started with it today). Looks very promising.
Is there any way to use multiple S3 (or GCS) accounts in a single boto file? I only see the option to assign keys to one S3 and one GCS account in a single file. I'd like to use multiple credentials.
We'd like to copy from S3 to S3, or GCS to GCS, with each of those buckets using different keys.
You should be able to set up multiple profiles within your .boto file.
You could add something like:
[profile prod]
gs_access_key_id=....
gs_secret_access_key=....
[profile dev]
gs_access_key_id=....
gs_secret_access_key=....
And then from your code you can add a profile_name= parameter to the connection call:
con = boto.connect_gs(profile_name="dev")
You can definitely use multiple boto files; just make sure that the credentials in each of them are valid. Every time you need to switch between them, run the following command with the right path:
$ BOTO_CONFIG=/path/to_boto gsutil cp SOME_FILE gs://bucket
Example:
BOTO_CONFIG=/etc/boto.cfg gsutil -m cp text.txt gs://bucket
Additionally, you can have aliases for your different profiles. Just create an alias for each command and you are set!
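For example, a sketch with hypothetical config paths:
alias gsutil-prod='BOTO_CONFIG=/etc/boto_prod.cfg gsutil'
alias gsutil-dev='BOTO_CONFIG=/etc/boto_dev.cfg gsutil'
# then, for instance:
gsutil-dev cp text.txt gs://bucket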