Why do some buckets not appear after a gsutil ls? - google-cloud-storage

When I do gsutil ls -p myproject-id I get a list of buckets (in my case 2 buckets), which I expect to be the list of all my buckets in the project:
gs://bucket-one/
gs://bucket-two/
But if I do gsutil ls -p myproject-id gs://asixtythreecharacterlongnamebucket I actually get the contents of that long-named bucket:
gs://asixtythreecharacterlongnamebucket/somefolder/
So my question is: why don't I see the long-named bucket in the results when I run ls on the project?
The only explanation that made sense to me was this: https://stackoverflow.com/a/34738829/3457432
But I'm not sure. Is this the reason? Or could there be other ones?

Are you sure that asixtythreecharacterlongnamebucket belongs to myproject-id? It really sounds like asixtythreecharacterlongnamebucket was created in a different project.
You can verify this by checking the bucket ACLs for asixtythreecharacterlongnamebucket and bucket-one and seeing if the project numbers in the listed entities match:
$ gsutil ls -Lb gs://asixtythreecharacterlongnamebucket | grep projectNumber
$ gsutil ls -Lb gs://bucket-one | grep projectNumber
Also note that the -p argument to ls has no effect in your second command when you're listing objects in some bucket. The -p argument only affects which project should be used when you're listing buckets in some project, as in your first command. Think of ls as listing the children resources belonging to some parent -- the parent of a bucket is a project, while the parent of an object is a bucket.
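To make the parent/child distinction concrete (reusing the names from the question; this is just an illustration, not your actual output):
$ gsutil ls -p myproject-id                            # lists buckets, the children of the project
$ gsutil ls gs://asixtythreecharacterlongnamebucket    # lists objects, the children of the bucket; no -p needed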

You don't perform the same request!
gsutil ls -p myproject-id
Here you ask for all the bucket resources that belong to a project.
gsutil ls -p myproject-id gs://asixtythreecharacterlongnamebucket
Here you ask for all the objects that belong to the bucket asixtythreecharacterlongnamebucket, and you use myproject-id as the quota project.
In both cases, you need permission to access the resources.

Related

Is "in the cloud" gsutil cp an atomic operation?

Assuming I have copied one object into a Google Cloud Storage bucket using the following command:
gsutil -h "Cache-Control:public,max-age=3600" cp -a public-read a.html gs://some-bucket/
I now want to copy this file "in the cloud" while keeping the public ACL and simultaneously updating the Cache-Control header:
gsutil -h "Cache-Control:no-store" cp -p gs://some-bucket/a.html gs://some-bucket/b.html
Is this operation atomic? I.e., can I be sure that the object gs://some-bucket/b.html will initially become available with the modified Cache-Control:no-store header?
The reason for my question: I'm using a Google Cloud Storage bucket as a CDN backend. While I want most of the objects in the bucket to be cached by the CDN according to the max-age provided in the Cache-Control header, I want to make sure that a few specific files, which are in fact copies of cacheable versions, are never cached by a CDN. It is therefore crucial that these objects never appear with Cache-Control:public,max-age=XXX when being copied, but immediately appear with a Cache-Control:no-store header. That eliminates the chance that a request coming from a CDN reads the copied object at a point in time where a max-age is still present and caches an object that is supposed to never be cached.
Yes, copying to the new object with Cache-Control set will be atomic. You can verify this by looking at the metageneration property of the object.
For example, upload an object:
$ BUCKET=mybucket
$ echo foo | ./gsutil cp - gs://$BUCKET/foo.txt
Copying from <STDIN>...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
and you'll see that its initial metageneration is 1:
$ ./gsutil ls -L gs://$BUCKET/foo.txt | grep Meta
Metageneration: 1
Whenever an object's metadata is modified, the metageneration is changed. For example, if the cache control is updated later like so:
$ ./gsutil setmeta -h "Cache-Control:no-store" gs://$BUCKET/foo.txt
Setting metadata on gs://mybucket/foo.txt...
/ [1 objects]
Operation completed over 1 objects.
The new metageneration is 2:
$ ./gsutil ls -L gs://$BUCKET/foo.txt | grep Meta
Metageneration: 2
Now, if we run the copy command:
$ ./gsutil -h "Cache-Control:no-store" cp -p gs://$BUCKET/foo.txt gs://$BUCKET/bar.txt
Copying gs://mybucket/foo.txt [Content-Type=application/octet-stream]...
- [1 files][ 4.0 B/ 4.0 B]
Operation completed over 1 objects/4.0 B.
The metageneration of the new object is 1:
$ ./gsutil ls -L gs://$BUCKET/bar.txt | grep Meta
Metageneration: 1
This means that the object was written once and has not been modified since.
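If you also want to confirm the header on the copy itself, the same kind of ls -L check should work; given the copy above I'd expect output along the lines of:
$ ./gsutil ls -L gs://$BUCKET/bar.txt | grep Cache-Control
Cache-Control: no-store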

gsutil returning "no matches found"

I'm trying to use gsutil to remove the contents of a Cloud Storage bucket (but not the bucket itself). According to the documentation, the command should be:
gsutil rm gs://bucket/**
However, whenever I run that (with my bucket name substituted of course), I get the following response:
zsh: no matches found: gs://my-bucket/**
I've checked permissions, and I have owner permissions. Additionally, if I specify a file, which is in the bucket, directly, it is successfully deleted.
Other information which may matter:
My bucket name has a "-" in it (similar to "my-bucket")
It is the bucket that Cloud Storage saves my usage logs to
How do I go about deleting the contents of a bucket?
zsh is attempting to expand the wildcard before gsutil sees it (and is complaining that you have no local files matching that wildcard). Please try this, to prevent zsh from doing so:
gsutil rm 'gs://bucket/**'
Note that you need to use single (not double) quotes to prevent zsh wildcard handling.
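If you'd rather not quote every URL, zsh's noglob modifier is another way to keep the shell's hands off the wildcard (a sketch; the alias is optional and would go in your ~/.zshrc):
noglob gsutil rm gs://my-bucket/**
alias gsutil='noglob gsutil'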
If you have shell variables that need to be expanded, you can also just escape the wildcard characters.
Examples with cp (with some useful flags) and rm:
GCP_PROJECT_NAME=your-project-name
gsutil -m cp -r gs://${GCP_PROJECT_NAME}.appspot.com/assets/\* src/local-assets/
gsutil rm gs://${GCP_PROJECT_NAME}.appspot.com/\*\*
gsutil rm gs://bucketName/doc.txt
And to remove an entire bucket, including all of its objects:
gsutil rm -r gs://bucketname

How to share entire Google Cloud Bucket with GSUTIL

Is there a command using GSUTIL that will allow me to share publicly everything in a specific Bucket? Right now, I'm forced to go through and check "share publicly" individually on EVERY SINGLE FILE in the console.
The best way to do this is:
gsutil -m acl ch -u 'AllUsers:R' gs://your-bucket/**
This will update the ACL for each existing object in the bucket.
If you want newly created objects in this bucket to also be public, you should also run:
gsutil defacl ch -u 'AllUsers:R' gs://your-bucket
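If you want to double-check the default object ACL afterwards, defacl also has a get subcommand:
gsutil defacl get gs://your-bucket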
This question was also asked here but the answer recommends using acl set public-read which has the downside of potentially altering your existing ACLs.
$> gsutil acl ch -g All:R -r gs://bucketName
gsutil is the command-line utility for GCS.
"acl ch" means "Modify an ACL."
"-g All:R" means "include read permissions for all users."
"-r" means "recursively"
and the rest is the path.
If you have a whole lot of files and you want MORE SPEED, you can use -m to mean "and also do this multithreaded!", like so:
$> gsutil -m acl ch -g All:R -r gs://bucketName

Fast way of deleting non-empty Google bucket?

Is this my only option or is there a faster way?
# Delete contents in bucket (takes a long time on large bucket)
gsutil -m rm -r gs://my-bucket/*
# Remove bucket
gsutil rb gs://my-bucket/
Buckets are required to be empty before they're deleted. So before you can delete a bucket, you have to delete all of the objects it contains.
You can do this with gsutil rm -r (documentation). Just don't pass the * wildcard and it will delete the bucket itself after it has deleted all of the objects.
gsutil -m rm -r gs://my-bucket
Google Cloud Storage bucket deletes can't succeed until the bucket listing returns 0 objects. If objects remain, you can get a Bucket Not Empty error (or in the UI's case 'Bucket Not Ready') when trying to delete the bucket.
gsutil has built-in retry logic to delete both buckets and objects.
Another option is to enable Lifecycle Management on the bucket. You could specify an Age of 0 days and then wait a couple days. All of your objects should be deleted.
Using the Python client, you can force a delete within your script by using:
bucket.delete(force=True)
Try out a similar thing in your current language.
Github thread that discusses this
This deserves to be summarized and pointed out.
Deleting with gsutil rm is slow if you have LOTS (terabytes) of data
gsutil -m rm -r gs://my-bucket
However, you can specify an expiration for the bucket's objects and let GCS do the work for you. Create a fast-delete.json policy:
{
  "rule": [
    {
      "action": {
        "type": "Delete"
      },
      "condition": {
        "age": 0
      }
    }
  ]
}
then apply it:
gsutil lifecycle set fast-delete.json gs://MY-BUCKET
Thanks, #jterrace and #Janosch
Use this to set an appropriate lifecycle rule, e.g. wait for a day:
https://cloud.google.com/storage/docs/gsutil/commands/lifecycle
Example command (read carefully before copy-pasting):
gsutil lifecycle set [LIFECYCLE_CONFIG_FILE] gs://[BUCKET_NAME]
Example lifecycle config (read carefully before copy-pasting):
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 1}
    }
  ]
}
Then delete the bucket.
This will delete the data asynchronously, so you don't have to keep some background job running on your end.
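Once the lifecycle rule has emptied the bucket, the bucket itself can be removed with the same rb command shown in the question:
gsutil rb gs://[BUCKET_NAME]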
Shorter one liner for the lifecycle change:
gsutil lifecycle set <(echo '{"rule":[{"action":{"type":"Delete"},"condition":{"age":0}}]}') gs://MY-BUCKET
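And a quick way to confirm the rule was applied:
gsutil lifecycle get gs://MY-BUCKET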
I've also had good luck creating an empty bucket then starting a transfer to the bucket I want to empty out. Our largest bucket took about an hour to empty this way; the lifecycle method seems to take at least a day.
I benchmarked deletes using three techniques:
Storage Transfer Service: 1200 - 1500 / sec
gcloud alpha storage rm: 520 / sec
gsutil -m rm: 240 / sec
The big winner is the Storage Transfer Service. To delete files with it you need a source bucket (or folder in a bucket) that is empty, and then you copy that to a destination bucket (or folder in that bucket) that you want to be empty.
If you're using the GUI, select the option in the advanced transfer options dialog to delete files from the destination if they're not also in the source.
You can also create and run the job from the CLI. This example assumes you have access to gs://bucket1/empty/ (which has no objects in it) and you want to delete all objects from gs://bucket2/:
gcloud transfer jobs create \
gs://bucket1/empty/ gs://bucket2/ \
--delete-from=destination-if-unique \
--project my-project
If you want your deletes to happen even faster you'll need to create multiple transfer jobs and have them target different sections of the bucket. Because it has to do a bucket listing to find the files to delete you'd want to make the destination paths non-overlapping (e.g. gs://bucket2/folder1/ and gs://bucket2/folder2/, etc). Each job will process in parallel at speed getting the job done in less total time.
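A sketch of what that might look like, reusing the empty source from the example above (the folder names are only illustrative):
gcloud transfer jobs create \
gs://bucket1/empty/ gs://bucket2/folder1/ \
--delete-from=destination-if-unique \
--project my-project
gcloud transfer jobs create \
gs://bucket1/empty/ gs://bucket2/folder2/ \
--delete-from=destination-if-unique \
--project my-project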
Usually I like this better than using Object Lifecycle Management (OLM) because it starts right away (no waiting up to 24 hours for policy evaluation) but there may be times when OLM is the way to go.
Remove the bucket from the Developers Console. It will ask for confirmation before deleting a non-empty bucket. It works like a charm ;)
I've tried both ways (expiration time and the gsutil command aimed directly at the bucket root), but I could not wait for the expiration to take effect.
The gsutil rm was deleting 200 files per second, so I did this:
Open several terminals and execute gsutil rm using different "folder" name prefixes with *, i.e.:
gsutil -m rm -r gs://my-bucket/a*
gsutil -m rm -r gs://my-bucket/b*
gsutil -m rm -r gs://my-bucket/c*
In this example, the command is able to delete 600 files per second.
So you just need to open more terminals and find the patterns to delete more files.
If one prefix still matches a huge number of objects, you can split it further, like this:
gsutil -m rm -r gs://my-bucket/b1*
gsutil -m rm -r gs://my-bucket/b2*
gsutil -m rm -r gs://my-bucket/b3*

Google Cloud Storage: bulk edit ACLs

We are in the process of moving our servers into Google Cloud Compute Engine and starting to look at Cloud Storage as a CDN option. I uploaded about 1,000 files through the Developer Console, but the problem is that the Object Permissions for All Users are all set to None. I can't find any way to edit all the permissions to give All Users Reader access. Am I missing something?
You can use the gsutil acl ch command to do this as follows:
gsutil -m acl ch -R -g All:R gs://bucket1 gs://bucket2/object ...
where:
-m sets multi-threaded mode, which is faster for a large number of objects
-R recursively processes the bucket and all of its contents
-g All:R grants all users read-only access
See the acl documentation for more details.
You can use Google Cloud Shell as your console via a web browser if you just need to run a single command via gsutil, as it comes preinstalled in your console VM.
In addition to using the gsutil acl command to change the existing ACLs, you can use the gsutil defacl command to set the default object ACL on the bucket as follows:
gsutil defacl set public-read gs://«your bucket»
You can then upload your objects in bulk via:
gsutil -m cp -R «your source directory» gs://«your bucket»
and they will have the correct ACLs set. This will all be much faster than using the web interface.
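To spot-check that an uploaded object picked up the public-read ACL (the object name here is just an example):
gsutil acl get gs://«your bucket»/«some object»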
You can set the access control permission by using predefinedAcl; the code is as follows (bucketName, storageObject, and mediaContent are placeholders for your own values):
Storage.Objects.Insert insertObject = client.objects().insert(bucketName, storageObject, mediaContent);
insertObject.setPredefinedAcl("publicRead");
This will work fine
Don't forget to put a wildcard after the bucket path to apply the change to every file. Example:
gsutil -m acl ch -R -g All:R gs://bucket/files/*
for all files inside the 'files' folder, or:
gsutil -m acl ch -R -g All:R gs://bucket/images/*.jpg
for each jpg file inside the 'images' folder.