How to download multiple files in Google Cloud Storage using gsutil - google-cloud-storage

I am trying to download the exported data from my G Suite (Google Workspace) account. I want to download several selected files.
I tried running the following command:
gsutil -m cp \
"gs://dummy-top-3bb5g8a2-s940-412a-bece-6f830889cc83/Status\ Red.pdf" \
"gs://dummy-top-export-3bb5g8a2-s940-412a-bece-6f830889cc83/Status\ Report.jpg" \
.
...but it failed with an error:
CommandException: Wrong number of arguments for "cp" command.
How do I download my files?

The backslashes are apparently not escaping the newline characters (perhaps there are whitespace characters after the trailing backslashes), so the arguments on the subsequent lines are never passed to the command.
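One way to check this is to search the script for whitespace after a trailing backslash. The sketch below is an illustration only: the temporary file and the bucket name in it are made up, and no gsutil call is made. It also shows a second, separate point that may matter here: inside double quotes the shell keeps a backslash before a space literally, so a quoted name does not need `\ `.

```shell
# Hypothetical script file, created only for this demonstration.
script=$(mktemp)
# Line 1 ends in backslash + space (broken continuation); line 2 ends in a clean backslash.
printf 'gsutil -m cp \\ \n"gs://bucket/a.pdf" \\\n.\n' > "$script"

# Report the numbers of lines where whitespace follows a trailing backslash.
bad_lines=$(grep -n '\\[[:space:]]\{1,\}$' "$script" | cut -d: -f1)
echo "continuation broken on line(s): $bad_lines"

# Inside double quotes, backslash-space stays as two characters,
# so the first name below is one character longer than the second.
escaped="Status\ Red.pdf"
plain="Status Red.pdf"
echo "escaped=${#escaped} chars, plain=${#plain} chars"

rm -f "$script"
```

If the grep reports any line numbers, deleting the trailing whitespace on those lines should let the continuation work. Writing the object names as "gs://.../Status Red.pdf" (no backslash) avoids asking for an object whose name contains a backslash.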

Related

What parameter(s) do I have to pass `gsutil` to access a Google Cloud local storage? (storage-testbench)

For test purposes, I want to run the storage-testbench simulator. It allows me to send REST commands to a local server which is supposed to work like a Google Cloud Storage facility.
In my tests, I want to copy 3 files from my local hard drive to that local GCS-like storage facility using gsutil cp .... I found out that in order to connect to that specific server, I need additional options on the command line, as follows:
gsutil \
-o "Credentials:gs_json_host=127.0.0.1" \
-o "Credentials:gs_json_port=9000" \
-o "Boto:https_validate_certificates=False" \
cp -p test my-file.ext gs://bucket-name/my-file.ext
See .boto for details on defining the credentials.
Unfortunately, I get this error:
CommandException: No URLs matched: test
The name at the end (test) is the project identifier (-p test). There is an example in the README.md of the storage-testbench project, although it's just a variable in a URI.
How do I make the cp command work?
Note:
The gunicorn process shows that the first GET from the cp command works as expected. It returns a 200. So the issue seems to be inside gsutil. Also, I'm able to create the bucket just fine:
gsutil \
-o "Credentials:gs_json_host=127.0.0.1" \
-o "Credentials:gs_json_port=9000" \
-o "Boto:https_validate_certificates=False" \
mb -p test gs://bucket-name
Trying the mb a second time gives me a 409 (Conflict) as expected.
More links:
gsutil global options
gsutil cp ...
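A possible explanation, offered as an assumption rather than a confirmed answer: in gsutil cp, the -p flag preserves ACLs and takes no argument, whereas in gsutil mb it names the project. Under that reading, test is parsed as a source path, which would produce "No URLs matched: test". The sketch below only counts the operands that would remain after the flags; count_operands is a hypothetical helper and gsutil is not invoked.

```shell
# Count arguments that do not start with '-' (illustration only; gsutil is not called).
count_operands() {
  n=0
  for arg in "$@"; do
    case "$arg" in
      -*) ;;                 # treat anything starting with '-' as a flag
      *)  n=$((n + 1)) ;;    # everything else is a source or destination
    esac
  done
  echo "$n"
}

with_p=$(count_operands -p test my-file.ext gs://bucket-name/my-file.ext)
without_p=$(count_operands my-file.ext gs://bucket-name/my-file.ext)
echo "with '-p test': $with_p operands; without: $without_p operands"

# Form to try against the testbench (commented out because it needs gsutil and the local server):
# gsutil \
#   -o "Credentials:gs_json_host=127.0.0.1" \
#   -o "Credentials:gs_json_port=9000" \
#   -o "Boto:https_validate_certificates=False" \
#   cp my-file.ext gs://bucket-name/my-file.ext
```

With "-p test" there are three operands, so cp would look for a local file named test as well as my-file.ext.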

Why am I not able to run this curl command?

I tried to create a custom model for my IBM Watson Visual Recognition API by following IBM's docs. I'm stuck at this point.
There are two issues here:
1. Automatic unzipping is enabled on your MacBook, so the downloaded archives are unpacked and the .zip files themselves are moved to the Trash. Move the .zip files from the Trash back into the folder your Terminal is pointing to, then run the command again. It works!
2. The API key does not need curly braces. Here is the curl command that worked for me (the APIKEY value shown is an example):
curl -X POST -u "apikey:4nsDxUBNqlcL1bU_aAJl9lxxxxxxxx" -F "beagle_positive_examples=@beagle.zip" -F "goldenretriever_positive_examples=@golden-retriever.zip" -F "husky_positive_examples=@husky.zip" -F "negative_examples=@cats.zip" -F "name=dogs" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classifiers?version=2018-03-19"

Gsutil rm does not remove everything

I have a problem with one of my automated jobs.
Before launching a cloud dataflow job, I perform a gsutil rm on previous files but it appears that it does not remove everything because when I launch another dataflow job some older shards remain.
I tried :
gsutil -m rm gs://mybucket/blahblah/*
and
gsutil rm -r gs://mybucket/blablah
But same result...
The strange thing is that the files that are not removed are neither the first nor the last.
I thought it was the fault of my second job, but the logs show that the files were indeed not removed by gsutil.
Is it possible that there are too many files to delete?
Are there known reliability problems with gsutil rm?
I use version 0.9.80 of google cloud sdk
Thanks
The gsutil rm commands you're using depend on listing the objects in a bucket, which is an eventually consistent operation in Google Cloud Storage. Thus, it's possible that attempting these commands in a bucket soon after objects were written will not remove all the objects. If you try again later it should succeed.
One way to avoid this problem would be to keep track of the names of the objects you uploaded, and explicitly list those objects in the gsutil rm command. For example, if you kept the object list in the file objects.manifest you could run a command like this on Linux or MacOS:
xargs gsutil -m rm < objects.manifest
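Before running the deletion for real, the manifest approach can be previewed by putting echo in front of gsutil, which prints the command instead of executing it. This is a minimal sketch: the shard names are made up, and the manifest is written to a temporary file rather than objects.manifest.

```shell
# Hypothetical manifest with two made-up object names.
manifest=$(mktemp)
printf '%s\n' \
  gs://mybucket/blahblah/shard-00000 \
  gs://mybucket/blahblah/shard-00001 > "$manifest"

# Dry run: echo prints the command that xargs would build.
preview=$(xargs echo gsutil -m rm < "$manifest")
echo "$preview"

# Remove "echo" to delete for real. gsutil rm can also read names from stdin
# with -I, for example: gsutil -m rm -I < objects.manifest
rm -f "$manifest"
```

Naming objects explicitly avoids relying on a bucket listing, which is the point of the suggestion above.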

gsutil returning "no matches found"

I'm trying to use gsutil to remove the contents of a Cloud Storage bucket (but not the bucket itself). According to the documentation, the command should be:
gsutil rm gs://bucket/**
However, whenever I run that (with my bucket name substituted of course), I get the following response:
zsh: no matches found: gs://my-bucket/**
I've checked permissions, and I have owner permissions. Additionally, if I specify a file, which is in the bucket, directly, it is successfully deleted.
Other information which may matter:
My bucket name has a "-" in it (similar to "my-bucket")
It is the bucket that Cloud Storage saves my usage logs to
How do I go about deleting the contents of a bucket?
zsh is attempting to expand the wildcard before gsutil sees it (and is complaining that you have no local files matching that wildcard). Please try this, to prevent zsh from doing so:
gsutil rm 'gs://bucket/**'
Quoting the URL is what stops zsh from expanding the wildcard. Single quotes are the safest choice; double quotes also suppress wildcard expansion but still allow variable expansion.
If the URL contains variables that need to be expanded, you can instead escape the wildcard characters with backslashes.
Examples with copy (with interesting flags) and rm:
GCP_PROJECT_NAME=your-project-name
gsutil -m cp -r gs://${GCP_PROJECT_NAME}.appspot.com/assets/\* src/local-assets/
gsutil rm gs://${GCP_PROJECT_NAME}.appspot.com/\*\*
gsutil rm gs://bucketName/doc.txt
And to remove the entire bucket, including all objects:
gsutil rm -r gs://bucketname
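As an alternative to backslash-escaping, the URL can be built in a double-quoted string, which expands the variable but leaves the wildcard for gsutil. A minimal sketch, reusing the placeholder project name from the example above; the command is only printed, not run.

```shell
GCP_PROJECT_NAME=your-project-name

# Double quotes expand ${GCP_PROJECT_NAME} but keep ** literal.
url="gs://${GCP_PROJECT_NAME}.appspot.com/**"

# Print what would be executed; keep the quotes around "$url" when calling gsutil.
printf 'would run: gsutil rm %s\n' "$url"
# gsutil rm "$url"
```

Keeping "$url" quoted at the call site matters in shells that would otherwise expand wildcards in unquoted variable values.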

Command to Download Remote URL Folder Contents

I was wondering if there was a command to download the contents of a remote folder, i.e all the files contained within that specific folder.
For instance, if we take the URL http://plugins.svn.wordpress.org/hello-dolly/trunk/ - How would it be possible to download the two files contained within the trunk onto my local machine without having to download each file manually?
Also, if there is a way to download all contents including both files AND any listed subdirectories that would be great.
If you ever need to download an entire Web site, perhaps for off-line viewing, wget can do the job.
For example:
$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains wordpress.org \
--no-parent \
http://plugins.svn.wordpress.org/hello-dolly/trunk/
This command downloads the Web site http://plugins.svn.wordpress.org/hello-dolly/trunk/
The options are:
--recursive: download the entire Web site.
--domains wordpress.org: don't follow links outside wordpress.org.
--no-parent: don't follow links outside the starting directory (here, hello-dolly/trunk/).
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
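If the goal is only the files in that one directory rather than an offline copy of a site, a shorter option set may be enough. The sketch below is one possible variant, not the command from the answer above: fetch_dir is a hypothetical helper, and the WGET variable is set to echo so that the command is printed instead of downloading anything.

```shell
# Hypothetical helper: $1 is the directory URL, $2 the number of leading
# path components to drop from the saved paths.
fetch_dir() {
  "${WGET:-wget}" --recursive --no-parent --no-host-directories \
    --cut-dirs="$2" --reject 'index.html*' "$1"
}

# Dry run: echo stands in for wget and prints the options that would be used.
WGET=echo
preview=$(fetch_dir http://plugins.svn.wordpress.org/hello-dolly/trunk/ 2)
echo "wget $preview"
```

Running `unset WGET` before calling fetch_dir performs the real download. With --no-host-directories and --cut-dirs=2, the files from hello-dolly/trunk/ should land directly in the current directory, and any subdirectories are kept below it.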