I'm trying to use gsutil to remove the contents of a Cloud Storage bucket (but not the bucket itself). According to the documentation, the command should be:
gsutil rm gs://bucket/**
However, whenever I run that (with my bucket name substituted of course), I get the following response:
zsh: no matches found: gs://my-bucket/**
I've checked permissions, and I have owner permissions. Additionally, if I directly specify a file that is in the bucket, it is successfully deleted.
Other information which may matter:
My bucket name has a "-" in it (similar to "my-bucket")
It is the bucket that Cloud Storage saves my usage logs to
How do I go about deleting the contents of a bucket?
zsh is attempting to expand the wildcard before gsutil sees it (and is complaining that you have no local files matching that wildcard). Please try this, to prevent zsh from doing so:
gsutil rm 'gs://bucket/**'
Note that you need to use single (not double) quotes to prevent zsh wildcard handling.
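If you would rather sidestep shell globbing entirely, here is a minimal sketch using the Python client library instead of gsutil (it assumes the google-cloud-storage package is installed, and my-bucket is a placeholder for your bucket name):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# Delete every object in the bucket; no shell wildcard expansion is involved.
for blob in client.list_blobs(bucket):
    blob.delete()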
If you have variables to expand, you can also just escape the wildcard characters instead.
Examples with cp (with some interesting flags) and rm:
GCP_PROJECT_NAME=your-project-name
gsutil -m cp -r gs://${GCP_PROJECT_NAME}.appspot.com/assets/\* src/local-assets/
gsutil rm gs://${GCP_PROJECT_NAME}.appspot.com/\*\*
gsutil rm gs://bucketName/doc.txt
And to remove an entire bucket, including all of its objects:
gsutil rm -r gs://bucketname
Related
I have a problem with one of my automated jobs.
Before launching a Cloud Dataflow job, I perform a gsutil rm on the previous files, but it appears that it does not remove everything, because when I launch another Dataflow job some older shards remain.
I tried:
gsutil -m rm gs://mybucket/blahblah/*
and
gsutil rm -r gs://mybucket/blablah
But I get the same result...
The strange thing is that the files that are not removed are neither the first nor the last.
I thought it was my second job's fault, but the fact is that I saw in the logs that the files were indeed not removed by gsutil.
Is it possible that there are too many files to delete?
Are there known problems with gsutil rm reliability?
I use version 0.9.80 of the Google Cloud SDK.
Thanks
The gsutil rm commands you're using depend on listing the objects in a bucket, which is an eventually consistent operation in Google Cloud Storage. Thus, it's possible that attempting these commands in a bucket soon after objects were written will not remove all the objects. If you try again later it should succeed.
One way to avoid this problem would be to keep track of the names of the objects you uploaded, and explicitly list those objects in the gsutil rm command. For example, if you kept the object list in the file objects.manifest you could run a command like this on Linux or MacOS:
xargs gsutil -m rm < objects.manifest
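If your job is already written in Python, a similar sketch with the google-cloud-storage client, assuming objects.manifest holds one bare object name per line (not full gs:// URLs):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("mybucket")  # placeholder bucket name

# Delete exactly the objects named in the manifest, rather than relying on
# an (eventually consistent) bucket listing.
with open("objects.manifest") as manifest:
    for line in manifest:
        name = line.strip()
        if name:
            bucket.blob(name).delete()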
Scenario: there are multiple folders and many files stored in a storage bucket that is accessible by project team members. Instead of downloading individual files one at a time (which is very slow and time consuming), is there a way to download entire folders? Or at least multiple files at once? Is this possible without having to use one of the command consoles? Some of the team members are not tech savvy and need to access these files as simply as possible. Thank you for any help!
I would suggest downloading the files with gsutil. However, if you have a large number of files to transfer, you might want to use the gsutil -m option to perform a parallel (multi-threaded/multi-processing) copy:
gsutil -m cp -R gs://your-bucket .
The time reduction for downloading the files can be quite significant. See this Cloud Storage documentation for complete information on the GCS cp command.
If you want to copy into a particular directory, note that the directory must exist first, as gsutil won't create it automatically (e.g. mkdir my-bucket-local-copy && gsutil -m cp -r gs://your-bucket my-bucket-local-copy).
I recommend they use gsutil. GCS's API deals with only one object at a time, but its command-line utility, gsutil, is more than happy to download a bunch of objects in parallel. Downloading an entire GCS "folder" with gsutil is pretty simple:
$> gsutil cp -r gs://my-bucket/remoteDirectory localDirectory
To download files to your local machine, you need to:
install gsutil to local machine
run Google Cloud SDK Shell
run a command like this (for example, on the Windows platform):
gsutil -m cp -r gs://source_folder_path "%userprofile%/Downloads"
gsutil rsync -d -r gs://bucketName .
works for me
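If some team members would rather run a small script than remember gsutil flags, here is a rough sketch with the Python client (google-cloud-storage) that downloads everything under a given "folder" (prefix) to a local directory; the bucket, prefix, and destination names are placeholders:
import os
from google.cloud import storage

client = storage.Client()
bucket_name = "your-bucket"      # placeholder bucket name
prefix = "remoteDirectory/"      # placeholder "folder" inside the bucket
local_dir = "localDirectory"     # placeholder local destination

for blob in client.list_blobs(bucket_name, prefix=prefix):
    if blob.name.endswith("/"):
        continue  # skip zero-byte "folder" placeholder objects
    destination = os.path.join(local_dir, blob.name)
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    blob.download_to_filename(destination)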
Is this my only option or is there a faster way?
# Delete contents in bucket (takes a long time on a large bucket)
gsutil -m rm -r gs://my-bucket/*
# Remove bucket
gsutil rb gs://my-bucket/
Buckets are required to be empty before they're deleted. So before you can delete a bucket, you have to delete all of the objects it contains.
You can do this with gsutil rm -r (documentation). Just don't pass the * wildcard and it will delete the bucket itself after it has deleted all of the objects.
gsutil -m rm -r gs://my-bucket
Google Cloud Storage bucket deletes can't succeed until the bucket listing returns 0 objects. If objects remain, you can get a Bucket Not Empty error (or in the UI's case 'Bucket Not Ready') when trying to delete the bucket.
gsutil has built-in retry logic to delete both buckets and objects.
Another option is to enable Lifecycle Management on the bucket. You could specify an Age of 0 days and then wait a couple of days. All of your objects should be deleted.
Using Python client, you can force a delete within your script by using:
bucket.delete(force=True)
Try out a similar thing in your current language.
GitHub thread that discusses this
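For reference, a minimal sketch of that Python approach (note that, as far as I know, the client only allows force=True on buckets with at most a few hundred objects, and raises an error otherwise):
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# Deletes the contained objects first, then the bucket itself.
# Only intended for buckets with relatively few objects.
bucket.delete(force=True)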
This deserves to be summarized and pointed out.
Deleting with gsutil rm is slow if you have LOTS (terabytes) of data
gsutil -m rm -r gs://my-bucket
However, you can specify an expiration on the bucket's objects and let GCS do the work for you. Create a fast-delete.json policy:
{
"rule":[
{
"action":{
"type":"Delete"
},
"condition":{
"age":0
}
}
]
}
then apply
gsutil lifecycle set fast-delete.json gs://MY-BUCKET
Thanks, @jterrace and @Janosch
Use this to set an appropriate lifecycle rule, e.g. one that waits for a day.
https://cloud.google.com/storage/docs/gsutil/commands/lifecycle
Example command (read carefully before copy-pasting):
gsutil lifecycle set [LIFECYCLE_CONFIG_FILE] gs://[BUCKET_NAME]
Example lifecycle config file (read carefully before copy-pasting):
{
"rule":
[
{
"action": {"type": "Delete"},
"condition": {"age": 1}
}
]
}
Then delete the bucket.
This will delete the data asynchronously, so you don't have to keep some background job running on your end.
Shorter one liner for the lifecycle change:
gsutil lifecycle set <(echo '{"rule":[{"action":{"type":"Delete"},"condition":{"age":0}}]}') gs://MY-BUCKET
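If you are already scripting in Python, the same lifecycle rule can be applied with the client library; a sketch assuming the google-cloud-storage package:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("MY-BUCKET")  # placeholder bucket name

# Add a rule that deletes objects once they are 0 days old,
# then persist the updated lifecycle configuration on the bucket.
bucket.add_lifecycle_delete_rule(age=0)
bucket.patch()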
I've also had good luck creating an empty bucket then starting a transfer to the bucket I want to empty out. Our largest bucket took about an hour to empty this way; the lifecycle method seems to take at least a day.
I benchmarked deletes using three techniques:
Storage Transfer Service: 1200 - 1500 / sec
gcloud alpha storage rm: 520 / sec
gsutil -m rm: 240 / sec
The big winner is the Storage Transfer Service. To delete files with it you need a source bucket (or folder in a bucket) that is empty, and then you copy that to a destination bucket (or folder in that bucket) that you want to be empty.
If using the GUI, select the option in the advanced transfer options dialog to delete objects from the destination if they are not also in the source.
You can also create and run the job from the CLI. This example assumes you have access to gs://bucket1/empty/ (which has no objects in it) and you want to delete all objects from gs://bucket2/:
gcloud transfer jobs create \
gs://bucket1/empty/ gs://bucket2/ \
--delete-from=destination-if-unique \
--project my-project
If you want your deletes to happen even faster, you'll need to create multiple transfer jobs and have them target different sections of the bucket. Because it has to do a bucket listing to find the files to delete, you'd want to make the destination paths non-overlapping (e.g. gs://bucket2/folder1/ and gs://bucket2/folder2/, etc.). Each job will process in parallel, getting the work done in less total time.
Usually I like this better than using Object Lifecycle Management (OLM) because it starts right away (no waiting up to 24 hours for policy evaluation) but there may be times when OLM is the way to go.
Remove the bucket from the Developers Console. It will ask for confirmation before deleting a non-empty bucket. It works like a charm ;)
I've tried both ways (the expiration time and the gsutil command aimed at the bucket root), but I could not wait for the expiration time to propagate.
The gsutil rm was deleting 200 files per second, so I did this:
Opened several terminals and executed gsutil rm using different "folder" names with a trailing wildcard,
i.e.:
gsutil -m rm -r gs://my-bucket/a*
gsutil -m rm -r gs://my-bucket/b*
gsutil -m rm -r gs://my-bucket/c*
In this example, the commands together are able to delete 600 files per second.
So you just need to open more terminals and find the patterns to delete more files.
If one wildcard covers a huge number of files, you can split it up further, like this:
gsutil -m rm -r gs://my-bucket/b1*
gsutil -m rm -r gs://my-bucket/b2*
gsutil -m rm -r gs://my-bucket/b3*
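The same split-by-prefix trick can be scripted instead of juggling terminals; here is a rough sketch using the Python client and a thread pool, with the bucket name and prefixes as placeholders:
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

BUCKET = "my-bucket"  # placeholder bucket name

def delete_prefix(prefix):
    # One client per worker, each deleting its own slice of the bucket.
    client = storage.Client()
    for blob in client.list_blobs(BUCKET, prefix=prefix):
        blob.delete()

prefixes = ["a", "b", "c"]  # placeholder prefixes, like the a*/b*/c* patterns above
with ThreadPoolExecutor(max_workers=len(prefixes)) as pool:
    list(pool.map(delete_prefix, prefixes))  # list() surfaces any errors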
I am having a problem where gsutil does not seem to follow the behavior described in the documentation (at least in Windows). The documentation states:
When performing recursive directory copies, object names are constructed that mirror the source directory structure starting at the point of recursive processing. For example, the command:
gsutil cp -R dir1/dir2 gs://my_bucket
will create objects named like gs://my_bucket/dir2/a/b/c, assuming dir1/dir2 contains the file a/b/c.
However, in practice I have found that it will create objects named:
gs://my_bucket/dir1/dir2/a/b/c
i.e., it copies the entire directory path given in the gsutil command, rather than "starting at the point of recursive processing" (dir2) as stated in the documentation.
Am I missing/misunderstanding something here?
I noticed the same behavior when using the gsutil cp -R command with a similar directory structure. In order to copy the desired directory from within the 'dir2' level I used the command: gsutil rsync -r dir1/dir2 gs://mybucket
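Another way to control exactly how object names are constructed is to bypass gsutil's naming logic and upload with the Python client, choosing each object name yourself; a sketch assuming the dir1/dir2 layout from the question and the google-cloud-storage package:
import os
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my_bucket")  # bucket name from the question

source_root = "dir1/dir2"
for root, _dirs, files in os.walk(source_root):
    for filename in files:
        local_path = os.path.join(root, filename)
        # Name the object relative to dir2, so dir1/dir2/a/b/c becomes dir2/a/b/c.
        object_name = os.path.join(
            os.path.basename(source_root),
            os.path.relpath(local_path, source_root),
        ).replace(os.sep, "/")
        bucket.blob(object_name).upload_from_filename(local_path)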
We are in the process of moving our servers into Google Cloud Compute Engine and starting to look at Cloud Storage as a CDN option. I uploaded about 1,000 files through the Developer Console, but the problem is that the Object Permissions for All Users are set to None. I can't find any way to edit all the permissions to give All Users Reader access. Am I missing something?
You can use the gsutil acl ch command to do this as follows:
gsutil -m acl ch -R -g All:R gs://bucket1 gs://bucket2/object ...
where:
-m sets multi-threaded mode, which is faster for a large number of objects
-R recursively processes the bucket and all of its contents
-g All:R grants all users read-only access
See the acl documentation for more details.
You can use Google Cloud Shell as your console via a web browser if you just need to run a single command via gsutil, as it comes preinstalled in your console VM.
In addition to using the gsutil acl command to change the existing ACLs, you can use the gsutil defacl command to set the default object ACL on the bucket as follows:
gsutil defacl set public-read gs://«your bucket»
You can then upload your objects in bulk via:
gsutil -m cp -R «your source directory» gs://«your bucket»
and they will have the correct ACLs set. This will all be much faster than using the web interface.
You can set the access control permission by using "predefinedAcl"; with the Java API client the code is as follows (the three insert arguments are placeholders for your bucket name, object metadata, and upload content):
Storage.Objects.Insert insertObject = client.objects().insert(bucketName, objectMetadata, mediaContent);
insertObject.setPredefinedAcl("publicRead");
This will work fine.
Don't forget to put wildcard characters after the bucket path in order to apply the change to every file. For example:
gsutil -m acl ch -R -g All:R gs://bucket/files/*
for all files inside the 'files' folder, or:
gsutil -m acl ch -R -g All:R gs://bucket/images/*.jpg
for each jpg file inside the 'images' folder.
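If you prefer a script to the command line, a comparable sketch with the Python client that grants public read access to every object under a given prefix (the bucket and prefix are placeholders, and this assumes the bucket uses fine-grained rather than uniform bucket-level access control):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("bucket")  # placeholder bucket name

# Grant allUsers read access on every object under the images/ prefix.
for blob in client.list_blobs(bucket, prefix="images/"):
    blob.make_public()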