I have a problem with Google Cloud Storage and gsutil when using the -m option for file uploads. I want to upload ~400 files in multithreaded mode (using the -m option), but I get errors: AccessDeniedException: 403 Rate limit exceeded. Please retry this request later.
This is the command I use:
gsutil -m -q rsync -R -c -d -e mydir gs://mybucket/mydir1/mydir
I'm running this from a GCE instance, but with a custom service account that has the following roles: Editor, Storage Admin, Storage Object Admin, and Storage Object Creator; it is also the owner of the bucket.
If I upload the files without -m, it works fine. I can't find anything about these limits in the Google docs.
Please help me understand this situation, thanks!
Related
I am running a Google Compute Engine instance which must be able to read from and write to a bucket that is mounted locally.
At the moment, while SSH-ed into the machine, I have permission to read all the files in the directory but not to write to them.
Here are some more details:
gcloud init
account: PROJECT_NUMBER-compute@developer.gserviceaccount.com
When looking at the IAM roles in the Google Cloud console, this service account has the Owner role, so it should be able to access all the resources in the project.
gcsfuse -o allow_other --file-mode 777 --dir-mode 777 -o nonempty BUCKET LOCAL_DIR
Now looking at permissions, all files have (as expected):
ls -lh LOCAL_DIR/
drwxrwxrwx 1 ubuntu ubuntu 0 Jul 2 11:51 folder
However, when running a very simple Python script that saves a pickle into one of these directories, I get the following error:
OSError: [Errno 5] Input/output error: FILENAME
If I run gcsfuse with the --foreground flag, the error it produces is:
fuse: 2018/07/02 12:31:05.353525 *fuseops.GetXattrOp error: function not implemented
fuse: 2018/07/02 12:31:05.362076 *fuseops.SetInodeAttributesOp error: SetMtime: \
UpdateObject: googleapi: Error 403: Insufficient Permission, insufficientPermissions
This is weird, as the account on the VM has the Owner role.
Any guesses on how to overcome this?
Your instance requires the appropriate scopes to access GCS buckets. You can view the scopes through the console, or with:
gcloud compute instances describe [instance_name] | grep scopes -A 10
You must have the Storage read/write scope, i.e. https://www.googleapis.com/auth/devstorage.read_write.
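If the instance was created with narrower scopes, you can change them with gcloud; a minimal sketch, with placeholder instance and zone names (the instance must be stopped first):
# stop, re-scope, and restart the instance
gcloud compute instances stop my-instance --zone us-central1-a
gcloud compute instances set-service-account my-instance \
    --zone us-central1-a --scopes storage-rw
gcloud compute instances start my-instance --zone us-central1-a
Here storage-rw is the short alias for the devstorage.read_write scope.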
I would like to save a large file (approximately 50 GB) directly to Google Cloud Storage. I tried gsutil cp https://archive.org/download/archiveteam-twitter-stream-2015-08/archiveteam-twitter-stream-2015-08.tar gs://my/folder, but that didn't work (InvalidUrlError: Unrecognized scheme "https").
Is there a way of doing that, without having to first download the file to my local storage?
Thanks!
You can use curl to fetch the URL and pipe it to gsutil. For example:
curl -L https://archive.org/download/archiveteam-twitter-stream-2015-08/archiveteam-twitter-stream-2015-08.tar | gsutil cp - gs://your/folder/archiveteam-twitter-stream-2015-08.tar
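One caveat: streaming transfers like this cannot be resumed if the connection drops, so for a 50 GB file it may be worth running the command from a GCE VM (which has a fast network path to GCS) inside screen or tmux, and verifying the object size afterward:
gsutil ls -l gs://your/folder/archiveteam-twitter-stream-2015-08.tar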
I'm working on an instance in the us-central1-a zone and I can't copy a ~200 GB file.
I've tried:
gsutil -m cp -L my.log my.file gs://my-bucket/
gsutil -m cp -L my.second.log my.file gs://my-bucket2/
And after several "catch ups" I get the following error:
CommandException: Some temporary components were not uploaded successfully. Please retry this upload.
CommandException: X files/objects could not be transferred.
Any clues?
Thanks
This is a message you'll see if gsutil's parallel composite uploads feature fails to upload at least one of the pieces of the file.
A couple of questions...
Have you already tried performing this upload again, after you saw this message?
If this error persists, could you please provide the stack trace from gsutil -d cp...
If you're consistently seeing this error and need an immediate fix (in case this is a bug with parallel uploads), you can set parallel_composite_upload_threshold=0 in the [GSUtil] section of your boto config to disable parallel uploads.
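For a one-off run you can also pass the setting on the command line instead of editing the config file; a minimal sketch, reusing the file and bucket names from the question:
gsutil -o "GSUtil:parallel_composite_upload_threshold=0" -m cp -L my.log my.file gs://my-bucket/
To make it permanent, put the equivalent lines in your boto config:
[GSUtil]
parallel_composite_upload_threshold = 0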
I had the same experience using gsutil. I fixed it by installing crcmod.
First run the command you have issues with using the debug flag, for example:
gsutil -d -m cp gs://<path_to_file_in_bucket> <local_destination>
In the output I can see:
CommandException: Downloading this composite object requires integrity checking with CRC32c, but your crcmod installation isn't using the module's C extension, so the hash computation will likely throttle download performance. For help installing the extension, please see "gsutil help crcmod".
To download regardless of crcmod performance or to skip slow integrity checks, see the "check_hashes" option in your boto config file.
NOTE: It is strongly recommended that you not disable integrity checks. Doing so could allow data corruption to go undetected during uploading/downloading.
You can follow Google's instructions to install crcmod for your specific OS: https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod
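For example, on a Debian/Ubuntu machine the compiled-crcmod install from those docs looks roughly like this (adjust the pip/python commands for your setup):
# build tools needed to compile the C extension
sudo apt-get install gcc python3-dev python3-setuptools
# replace the pure-Python crcmod with the compiled one
sudo pip3 uninstall crcmod
sudo pip3 install --no-cache-dir -U crcmod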
I got the same error message. I tried logging in to gcloud again with:
gcloud auth login
and then I could run the command successfully.
I can migrate data from Amazon AWS S3 to Azure using the AWS SDK for Java and the Azure SDK for Java. Now I want to migrate data from Amazon AWS S3 to Google Cloud Storage using Java.
The gsutil command-line tool supports S3. After you've configured gsutil, you'll see this in your ~/.boto file:
# To add aws credentials ("s3://" URIs), edit and uncomment the
# following two lines:
#aws_access_key_id =
#aws_secret_access_key =
Fill in the aws_access_key_id and aws_secret_access_key settings with your S3 credentials and uncomment the variables.
Once that's set up, copying from S3 to GCS is as easy as:
gsutil cp -R s3://bucketname gs://bucketname
If you have a lot of objects, run with the -m flag to perform the copy in parallel with multiple threads:
gsutil -m cp -R s3://bucketname gs://bucketname
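If you want to keep the two buckets in sync rather than re-copy everything, gsutil rsync accepts s3:// URLs as well; a minimal sketch:
gsutil -m rsync -r s3://bucketname gs://bucketname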
Alternatively, use the Google Cloud Storage Transfer Service.
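In recent gcloud versions the Transfer Service can also be driven from the command line. A minimal sketch, assuming a creds.json file containing your AWS accessKeyId and secretAccessKey (bucket names here are placeholders):
gcloud transfer jobs create s3://source-bucket gs://dest-bucket \
    --source-creds-file=creds.json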
The answer suggested by jterrace (AWS key and secret in the .boto file) is correct and worked for me for many regions, but not for regions that only support AWS Signature Version 4. For instance, while connecting to the Mumbai region I got this error:
BadRequestException: 400 InvalidRequest
The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256
To overcome this problem (i.e. make gsutil use AWS Signature v4), I had to add the following additional lines to the ~/.boto file, creating a new [s3] section in the config file:
[s3]
host = s3.ap-south-1.amazonaws.com
use-sigv4 = True
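Note that the host value must match the region of your S3 bucket; for a bucket in eu-west-1, for example, you would use host = s3.eu-west-1.amazonaws.com instead.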
Reference:
Interoperability support for AWS Signature Version 4
Gsutil cannot copy to s3 due to authentication
Create a new .boto file:
[Credentials]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY
and run this command:
BOTO_CONFIG=.boto gsutil -m cp s3://bucket-name/filename gs://bucket-name
or this:
BOTO_CONFIG=.boto gsutil -m cp gs://bucket-name/filename s3://bucket-name
AWS_ACCESS_KEY_ID=XXXXXXXX AWS_SECRET_ACCESS_KEY=YYYYYYYY gsutil -m cp s3://bucket-name/filename gs://bucket-name
This approach lets you copy data from S3 to GCS without needing a boto file at all. There are situations where storing a credentials file on the running virtual machine is not recommended. With this approach you can integrate GCP Secret Manager, generate the above command at runtime, and execute it, avoiding the need to store the credentials permanently as a file on the machine; see the sketch below.
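A minimal sketch, assuming the two secrets have already been created in Secret Manager under the (hypothetical) names aws-access-key-id and aws-secret-access-key:
# fetch the AWS credentials from Secret Manager at runtime
AWS_ACCESS_KEY_ID="$(gcloud secrets versions access latest --secret=aws-access-key-id)" \
AWS_SECRET_ACCESS_KEY="$(gcloud secrets versions access latest --secret=aws-secret-access-key)" \
gsutil -m cp s3://bucket-name/filename gs://bucket-name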
Is it possible to automate gsutil-based file uploads to Google Cloud Storage so that user intervention is not required for login?
My use case is a Jenkins job which polls an SCM location for changes to a set of files. If it detects any changes, it will upload all the files to a specific Google Cloud Storage bucket.
Once you configure your credentials, gsutil requires no further intervention. I suspect that you ran gsutil config as user X but Jenkins runs as user Y. As a result, ~jenkins/.boto does not exist. If you place the .boto file in the right location, you should be all set.
Another alternative is to use multiple .boto files and then tell gsutil which one to use with the BOTO_CONFIG environment variable:
gsutil config # complete oauth flow
cp ~/.boto /path/to/existing.boto
# detect that we need to upload
BOTO_CONFIG=/path/to/existing.boto gsutil -m cp files gs://bucket
I frequently use this pattern to use gsutil with multiple accounts:
gsutil config # complete oauth flow for user A
mv ~/.boto user-a.boto
gsutil config # complete oauth flow for user B
mv ~/.boto user-b.boto
BOTO_CONFIG=user-a.boto gsutil cp a-file gs://a-bucket
BOTO_CONFIG=user-b.boto gsutil cp b-file gs://b-bucket