Can't connect to GCS bucket from Python despite being logged in - google-cloud-storage

I have a GCS bucket set up that contains data that I want to access remotely. As per the instructions, I have logged in via gcloud auth login, and have confirmed that I have an active, credentialed account via gcloud auth list. However, when I try to access my bucket (using the Python google.cloud.storage API), I get the following:
HttpError: Anonymous caller does not have storage.objects.list access to <my-bucket-name>.
I'm not sure why it is being accessed anonymously, since I am clearly logged in. Is there something obvious I am missing?

The Python GCP library (and others) uses another authentication mechanism than the gcloud command.
Follow this guide to set up your environment and have access to GCS with Python.
gcloud aut login sets up the gcloud command tool with your credentials.
However, the way forward when executing code, is to have a Service Account. Then, when the env. variable GOOGLE_APPLICATION_CREDENTIALS has been set. Python will use the Service Account credentials
Edit
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_.json_credential_file"
Edit
And then, to download gs://my_bucket/my_file.csv to a file: (from the python-docs-samples)
download_blob('my_bucket', 'my_file.csv', 'local/path/to/file.csv')
def download_blob(bucket_name, source_blob_name, destination_file_name):
"""Downloads a blob from the bucket."""
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(source_blob_name)
blob.download_to_filename(destination_file_name)
print('Blob {} downloaded to {}.'.format(
source_blob_name,
destination_file_name))

Related

ML-Engine unable to access job_dir directory in bucket

I am attempting to submit a job for training in ML-Engine using gcloud but am running into an error with service account permissions that I can't figure out. The model code exists on a Compute Engine instance from which I am running gcloud ml-engine jobs submit as part of a bash script. I have created a service account (ai-platform-developer#....iam.gserviceaccount.com) for gcloud authentication on the VM instance and have created a bucket for the job and model data. The service account has been granted Storage Object Viewer and Storage Object Creator roles for the bucket and the VM and bucket all belong to the same project.
When I try to submit a job per this tutorial, the following are executed:
time_stamp=`date +"%Y%m%d_%H%M"`
job_name='ObjectDetection_'${time_stamp}
gsutil cp object_detection/samples/configs/faster_rcnn_resnet50.config
gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
gcloud ml-engine jobs submit training ${job_name} \
--project [project-name] \
--runtime-version 1.12 \
--job-dir=gs://[bucket-name]/jobs/${job_name} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region us-central1 \
--config object_detection/training-config.yml \
-- \
--model_dir=gs://[bucket-name]/output/${job_name}} \
--pipeline_config_path=gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
where [bucket-name] and [project-name] are placeholders for the bucket created above and the project it and the VM are contained in.
The config file is successfully uploaded to the bucket, I can confirm it exists in the cloud console. However, the job fails to submit with the following error:
ERROR: (gcloud.ml-engine.jobs.submit.training) User [ai-platform-developer#....iam.gserviceaccount.com] does not have permission to access project [project-name] (or it may not exist): Field: job_dir Error: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
- '#type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
field: job_dir
If I look in the cloud console, the files specified by --packages exist in that location, and I've ensured the service account ai-platform-developer#....iam.gserviceaccount.com has been given Storage Object Viewer and Storage Object Creator roles for the bucket, which has bucket level permissions set. After ensuring the service account is activated and the default, I can also run
gsutil ls gs://[bucket-name]/jobs/ObjectDetection_20190709_2001
which successfully returns the contents of the folder without a permission error. In the project, there exists a managed service account service-[project-number]#cloud-ml.google.com.iam.gserviceaccount.com and I have also granted this account Storage Object Viewer and Storage Object Creator roles on the bucket.
To confirm this VM is able to submit a job, I am able to switch the gcloud user to my personal account and the script runs and submits a job without any error. However, since this exists in a shared VM, I would like to rely on service account authorization instead of my own user account.
I had a similar problem with exactly the same error.
I found that the easiest way to troubleshoot those errors is to go to "Logging" and search for "PERMISSION DENIED" text.
In my case service account was missing permission "storage.buckets.get". Then you would need to find a role that have this permission. You could do that from IAM->Roles. In that view you could filter roles by permission name. It turned out that only following roles have the needed permission:
Storage Admin
Storage Legacy Bucket Owner
Storage Legacy Bucket Reader
Storage Legacy Bucket Writer
I added "Storage Legacy Bucket Writer" role to the service account in the bucket and then was able to submit a job.
Have you tried to look in the Compute Engine scope?
Shutdown instance, Edit and change Cloud API access scopes to:
Allow full access to all Cloud APIs

Recovering access after initially provisioning wrong scopes for an instance

I recently created a VM, but mistakenly gave the default service account Storage: Read Only permissions instead of the intended Read Write under "Identity & API access", so GCS write operations from the VM are now failing.
I realized my mistake, so following the advice in this answer, I stopped the VM, changed the scope to Read Write and started the VM. However, when I SSH in, I'm still getting 403 errors when trying to create buckets.
$ gsutil mb gs://some-random-bucket
Creating gs://some-random-bucket/...
AccessDeniedException: 403 Insufficient OAuth2 scope to perform this operation.
Acceptable scopes: https://www.googleapis.com/auth/cloud-platform
How can I fix this? I'm using the default service account, and don't have the IAM permissions to be able to create new ones.
$ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* (projectnum)-compute#developer.gserviceaccount.com
I will suggest you to try add the scope "cloud-platform" to the instance by running the gcloud command below
gcloud alpha compute instances set-scopes INSTANCE_NAME [--zone=ZONE]
[--scopes=[SCOPE,…] [--service-account=SERVICE_ACCOUNT
As a scopes put "https://www.googleapis.com/auth/cloud-platform" since it give Full access to all Google Cloud Platform resources.
Here is gcloud documentation
Try creating the Google Cloud Storage bucket with your user account.
Type gcloud auth login and access the link you are provided, once there, copy the code and paste it into the command line.
Then do gsutil mb gs://bucket-name.
The security model has 2 things at play, API Scopes and IAM permissions. Access is determined by the AND of them. So you need an acceptable scope and enough IAM privileges in order to do whatever action.
API Scopes are bound to the credentials. They are represented by a URL like, https://www.googleapis.com/auth/cloud-platform.
IAM permissions are bound to the identity. These are setup in the Cloud Console's IAM & admin > IAM section.
This means you can have 2 VMs with the default service account but both have different levels of access.
For simplicity you generally want to just set the IAM permissions and use the cloud-platform API auth scope.
To check if you have this setup go to the VM in cloud console and you'll see something like:
Cloud API access scopes
Allow full access to all Cloud APIs
When you SSH into the VM by default gcloud will be logged in as the service account on the VM. I'd discourage logging in as yourself otherwise you more or less break gcloud's configuration to read the default service account.
Once you have this setup you should be able to use gsutil properly.

How to (set permissions to) access a Google Storage bucket from a command line program?

In my particular case, I would like to open an object with Tensorboard (A Tensorflow component). The command line instruction is the following:
# tensorboard --logdir=gs://mybucket/myobject
gs://mybucket/myobject is not a public object. So the line above generates a forbidden access error. The closest thing I found is gsutil signurl but it generates an http download link. What I think I need is an authenticated gs://... link, how can I create it?
I believe that Tensorboard uses Application Default Credentials for auth. If you have the gcloud command installed, try running gcloud auth application-default login before running tensorboard, and I believe that it should use your credentials to fetch the GCS object.

Google cloud credentials totally hosed after attempting to setup boto

I had a gcloud user authenticated and was running gsutils fine from the command line (Windows 8.1). But I needed to access gsutils from a python application so I followed the instructions here:
https://cloud.google.com/storage/docs/xml-api/gspythonlibrary#credentials
I got as far as creating a .boto file, but now not only does the my python code fail (boto.exception.NoAuthHandlerFound: No handler was ready to authenticate.). But I can't run bsutils from the command line any more. I get this error:
C:\>gsutil ls
You are attempting to access protected data with no configured
credentials. Please visit https://cloud.google.com/console#/project
and sign up for an account, and then run the "gcloud auth login"
command to configure gsutil to use these credentials.
I have run gcloud auth and it appears to work, I can query my users:
C:\>gcloud auth list
Credentialed Accounts:
- XXXserviceuser#XXXXX.iam.gserviceaccount.com ACTIVE
- myname#company.name
To set the active account, run:
$ gcloud config set account `ACCOUNT`
I have tried both with the account associated with my email active, and the new serveruser account (created following instructions above). Same "protected data with no configured credentials." error. I tried removing the .boto file, and adding the secret CLIENT_ID and CLIENT_SECRET to my .boto file.
Anyone any ideas what the issue could be?
So I think the latest documentation/examples showing how to use (and authenticate) Google Cloud storage via python is in this repo:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/storage/api
That just works for me without messing around with keys and service users.
Would be nice if there was a comment somewhere in the old gspythonlibrary docs pointing this out.

gsutil copy returning "AccessDeniedException: 403 Insufficient Permission" from GCE

I am logged in to a GCE instance via SSH. From there I would like to access the Storage with the help of a Service Account:
GCE> gcloud auth list
Credentialed accounts:
- 1234567890-compute#developer.gserviceaccount.com (active)
I first made sure that this Service account is flagged "Can edit" in the permissions of the project I am working in. I also made sure to give him the Write ACL on the bucket I would like him to copy a file:
local> gsutil acl ch -u 1234567890-compute#developer.gserviceaccount.com:W gs://mybucket
But then the following command fails:
GCE> gsutil cp test.txt gs://mybucket/logs
(I also made sure that "logs" is created under "mybucket").
The error message I get is:
Copying file://test.txt [Content-Type=text/plain]...
AccessDeniedException: 403 Insufficient Permission 0 B
What am I missing?
One other thing to look for is to make sure you set up the appropriate scopes when creating the GCE VM. Even if a VM has a service account attached, it must be assigned devstorage scopes in order to access GCS.
For example, if you had created your VM with devstorage.read_only scope, trying to write to a bucket would fail, even if your service account has permission to write to the bucket. You would need devstorage.full_control or devstorage.read_write.
See the section on Preparing an instance to use service accounts for details.
Note: the default compute service account has very limited scopes (including having read-only to GCS). This is done because the default service account has Project Editor IAM permissions. If you use any user service account this is not typically a problem since user created service accounts get all scope access by default.
After adding necessary scopes to the VM, gsutil may still be using cached credentials which don't have the new scopes. Delete ~/.gsutil before trying the gsutil commands again. (Thanks to #mndrix for pointing this out in the comments.)
You have to log in with an account that has the permissions you need for that project:
gcloud auth login
gsutil config -b
Then surf to the URL it provides,
[ CLICK Allow ]
Then copy the verification code and paste to terminal.
Stop VM
goto --> VM instance details.
in "Cloud API access scopes" select "Allow full access to all Cloud APIs" then
Click "save".
restart VM and Delete ~/.gsutil .
I have written an answer to this question since I can not post comments:
This error can also occur if you're running the gsutil command with a sudo prefix in some cases.
After you have created the bucket, go to the permissions tab and add your email and set Storage Admin permission.
Access VM instance via SSH >> run command: gcloud auth login and follow the steps.
Ref: https://groups.google.com/d/msg/gce-discussion/0L6sLRjX8kg/kP47FklzBgAJ
So I tried a bunch of things trying to copy from GCS bucket to my VM.
Hope this post helps someone.
Via SSHed connection:
and following this script:
sudo gsutil cp gs://[BUCKET_NAME]/[OBJECT_NAME] [OBJECT_DESTINATION_IN_LOCAL]
Got this error:
AccessDeniedException: 403 Access Not Configured. Please go to the Google Cloud Platform Console (https://cloud.google.com/console#/project) for your project, select APIs and Auth and enable the Google Cloud Storage JSON API.
What fixed this was following "Activating the API" section mentioned in this link -
https://cloud.google.com/storage/docs/json_api/
Once I activated the API then I authenticated myself in SSHed window via
gcloud auth login
Following authentication procedure I was finally able to download from Google Storage Bucket to my VM.
PS
I did make sure to:
Make sure that gsutils are installed on my VM instance.
Go to my bucket, go to the permissions tab and add desired service accounts and set Storage Admin permission / role.
3.Make sure my VM had proper Cloud API access scopes:
From the docs:
https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#changeserviceaccountandscopes
You need to first stop the instance -> go to edit page -> go to "Cloud API access scopes" and choose "storage full access or read/write or whatever you need it for"
Changing the service account and access scopes for an instance If you
want to run the VM as a different identity, or you determine that the
instance needs a different set of scopes to call the required APIs,
you can change the service account and the access scopes of an
existing instance. For example, you can change access scopes to grant
access to a new API, or change an instance so that it runs as a
service account that you created, instead of the Compute Engine
Default Service Account.
To change an instance's service account and access scopes, the
instance must be temporarily stopped. To stop your instance, read the
documentation for Stopping an instance. After changing the service
account or access scopes, remember to restart the instance. Use one of
the following methods to the change service account or access scopes
of the stopped instance.
Change the permissions of bucket.
Add a user for "All User" and give "Storage Admin" access.