Can I search Google Cloud Storage buckets recursively in the console? - google-cloud-storage

We have uploaded files to Google Cloud Storage buckets and are planning to set up permissions so that a number of people can access them. So far we can only filter/search files and folders in the directory we are currently in. Is it possible to search files recursively, though?

It seems what you are looking for is the following command for searching within a bucket recursively:
gsutil ls -r gs://bucket/**
Note: "bucket" is the name of the bucket you have set.
In the case you would like to search within a specific directory you can run the following:
gsutil ls -r gs://bucket/dir/**
Note: "dir" would be the directory in which you would like to search
You can find more information regarding searching through "Directory By Directory, Flat, And Recursive" by going to the following link.
Update
If this is not what you meant, then I would like to mention another option. You can also retrieve information about the contents of a bucket through the API: the Objects: list method returns a list of objects matching the criteria you specify.
Note: In order for this API call to work, the user must have "READER" permission or above.
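For example, a minimal sketch of calling that list operation from the command line with curl, assuming a bucket named "bucket", a prefix "dir/", and an access token from gcloud (all of these are placeholders):
# "bucket" and "dir/" below are placeholders for your own bucket and prefix.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" "https://storage.googleapis.com/storage/v1/b/bucket/o?prefix=dir/"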
Please let me know if this is what you were looking for.

Related

Downloading public data directory from google cloud storage with command line utilities like wget

I would like to download publicly available data from google cloud storage. However, because I need to be in a Python3.x environment, it is not possible to use gsutil. I can download individual files with wget as
wget http://storage.googleapis.com/path-to-file/output_filename -O output_filename
However, commands like
wget -r --no-parent https://console.cloud.google.com/path_to_directory/output_directoryname -O output_directoryname
do not seem to work, as they just download an index file for the directory. Neither do rsync or curl, based on some initial attempts. Any idea how to download publicly available data on Google Cloud Storage as a directory?
The approach you mentioned above does not work because Google Cloud Storage doesn't have real "directories". As an example, "path/to/some/files/file.txt" is the entire name of that object. A similarly named object, "path/to/some/files/file2.txt", just happens to share the same naming prefix.
As for how you could fetch these files: The GCS APIs (both XML and JSON) allow you to do an object listing against the parent bucket, specifying a prefix; in this case, you'd want all objects starting with the prefix "path/to/some/files/". You could then make individual HTTP requests for each of the objects specified in the response body. That being said, you'd probably find this much easier to do via one of the GCS client libraries, such as the Python library.
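As a rough sketch of that object-listing approach with plain command-line tools (curl and wget), assuming the objects are publicly listable and readable, and using "your-bucket" and "path/to/some/files/" as placeholders:
# "your-bucket" and the prefix are placeholders; this assumes the bucket allows public listing.
curl "https://storage.googleapis.com/storage/v1/b/your-bucket/o?prefix=path/to/some/files/"
# Each object "name" returned in the JSON above can then be fetched directly, for example:
wget "https://storage.googleapis.com/your-bucket/path/to/some/files/file.txt" -O file.txt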
Also, gsutil currently has a GitHub issue open to track adding support for Python 3.

How to download multiple objects from IBM Cloud Object Storage?

I am trying to use IBM Cloud Object Storage to store images uploaded to my site by users. I have this functionality working just fine.
However, based on the documentation, it appears as though only one object can be downloaded from a bucket at a time.
Is there any way a list of objects could all be downloaded from the bucket? Is there a different approach to requesting multiple objects from a COS bucket?
Via the REST API, no, you can only download a single object at a time. But most tools (like the AWS CLI or Minio Client) allow downloading all objects that share a prefix (e.g. foo/bar and foo/bas). The IBM forks of the S3 libraries are also now integrated with Aspera and can transfer large directories all at once. What are you trying to do?
According to the S3 spec (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html), you can only download one object at a time.
There are various tools which may help to download multiple objects at a time from COS. I used the AWS CLI tool to download and upload objects from/to COS.
So install the aws-cli tool and configure it by supplying your access_key_id and secret_access_key.
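For example (the values in angle brackets are placeholders for your own COS HMAC credentials):
# Replace the placeholders with the access_key_id and secret_access_key from your COS service credentials.
aws configure set aws_access_key_id <your_access_key_id>
aws configure set aws_secret_access_key <your_secret_access_key>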
Recursively copying S3 objects to a local directory: When passed with the parameter --recursive, the following cp command recursively copies all objects under a specified prefix and bucket to a specified directory.
C:\Users\Shashank>aws s3 cp s3://yourBucketName . --recursive
for example:
C:\Users\Shashank>aws --endpoint-url http://s3.us-east.cloud-object-storage.appdomain.cloud s3 cp s3://yourBucketName D:\s3\ --recursive
In my case the endpoint is based on the us-east region, and I am copying objects into the D:\s3 directory.
Recursively copying local files to S3: When passed with the parameter --recursive, the following cp command recursively copies all files under a specified directory to a specified bucket.
C:\Users\Shashank>aws s3 cp myDir s3://yourBucketName/ --recursive
for example:
C:\Users\Shashank>aws --endpoint-url http://s3.us-east.cloud-object-storage.appdomain.cloud s3 cp D:\s3 s3://yourBucketName/ --recursive
Here I am copying objects from the D:\s3 directory to COS.
For more details, see the AWS CLI documentation for the s3 cp command.
I hope it works for you.

Mount Bucket on Google Storage

I want to mount a Google Cloud Storage bucket on a local server. However, when I run the following command, the directory I point it at ends up empty. Any ideas?
gcsfuse mssng_vcf_files ./mountbucket/
It reports:
File system has been successfully mounted.
but the directory mountbucket/ is empty.
By default, gcsfuse will not show a directory that is only implied by an object with a slash in its name. So if your bucket contains files/index.txt, that file will not show until you create an object named "files/". I am assuming your bucket contains such directory-then-file paths, and if that is the case this may be your problem.
gcsfuse supports a flag called --implicit-dirs that changes the behaviour. When this flag is enabled, name lookup requests from the kernel use the GCS API's Objects.list operation to search for objects that would implicitly define the existence of a directory with the name in question. So, in the example above, there would appear to be a directory named "files".
There are some drawbacks, which are described here:
https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md#implicit-directories
So you have two options:
Create the directory objects in your bucket, which will make your files appear.
Use the --implicit-dirs flag to get them to always appear (example below).
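For example, option 2 applied to the bucket from the question would look roughly like this:
# Mount the bucket with implicit directory support (see the caveats linked above).
gcsfuse --implicit-dirs mssng_vcf_files ./mountbucket/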
Hope this helps.

gsutil acl set command AccessDeniedException: 403 Forbidden

I am following the steps for setting up Django on Google App Engine, and since Gunicorn does not serve static files, I have to store my static files in Google Cloud Storage.
I am at the step "Create a Cloud Storage bucket and make it publicly readable" on https://cloud.google.com/python/django/flexible-environment#run_the_app_on_your_local_computer. I ran the following commands as suggested:
$ gsutil mb gs://your-gcs-bucket
$ gsutil defacl set public-read gs://your-gcs-bucket
The first command is supposed to create a new storage bucket, and the second sets its default ACL. When I run the commands, the second one returns an error:
Setting default object ACL on gs://your-gcs-bucket/...
AccessDeniedException: 403 Forbidden
I also tried other commands for setting or getting ACLs, but they all return the same error, with no additional information.
I am a newbie with Google Cloud services; could anyone point out what the problem is?
I figured it out myself, and it is kind of silly. I didn't check whether the first command was successful or not, and apparently it was not.
For a newbie like me, it is important to note that things like bucket names and project IDs are globally unique across the whole namespace. What happened was that the name I used to create the new bucket was already taken by someone else, so no wonder I did not have permission to access that bucket.
A better way to deal with this is to choose the bucket name wisely, for example by prefixing it with the project name and application name (see the example below).
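For example, repeating the commands from the question with a hypothetical, more collision-resistant name:
# "your-project-your-app-static" is a made-up example; pick something similarly unique.
gsutil mb gs://your-project-your-app-static
gsutil defacl set public-read gs://your-project-your-app-static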

Can't use wildcards for bucket names with gsutil for Google Cloud Storage?

Question: can wildcards be used in GCS bucket names with gsutil?
I want to grab multiple files from GCS that are spread across buckets, using wildcards. But I consistently run into errors when using wildcards in bucket names with gsutil. I'm using wildcards like this:
gsutil ls gs://myBucket-abcd-*/log/data_*
I want to match all these file names (variations in bucket name AND in object name):
gs://myBucket-abcd-1234/log/data_foo.csv
gs://myBucket-abcd-1234/log/data_bar.csv
gs://myBucket-abcd-5678/log/data_foo.csv
gs://myBucket-abcd-5678/log/data_bar.csv
Documentation on Bucket Wildcards tells me I should be able to use wildcards both in the bucket name and the object name, but the code sample above always gets "BadRequestException: 400 Invalid argument."
gsutil is otherwise working when I use no wildcards or use wildcards in the object name only. But adding a wildcard to the bucket name results in the error. Are there workarounds to make the wildcard work in bucket names, or am I misinterpreting the linked documentation?
It turns out that not being able to use bucket wildcards in this case is working as intended, and is due to differences in permission settings. Google Cloud Storage permissions can be set at both the bucket and project levels.
Though the access token used in this case can access every individual bucket, it doesn't have reader/editor/owner access to the top-level project (shared across many users of the system). Without access to the project, wildcards cannot be used on buckets.
This can be fixed by having a project owner add the user as a reader/editor/owner to the project.
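For example, a project owner could grant project-level read access with something like this (a sketch; the project ID and email are placeholders, and roles/viewer is just the basic project-level reader role):
# PROJECT_ID and the email are placeholders; roles/viewer corresponds to the project "reader" role.
gcloud projects add-iam-policy-binding PROJECT_ID --member=user:someone@example.com --role=roles/viewer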
In this case, for security reasons we can't give an individual token access to all buckets in the project, but it's helpful to understand why the wildcard didn't work. Thanks all for the input, and especially Travis for the contact.
Some shells (e.g. Zsh) try to expand the * and **, so you need to enclose these in quotation marks, like this:
gsutil ls 'gs://myBucket-abcd-*/log/data_*'
I found this in the question "gsutil returning 'no matches found'".