I am trying to run a PySpark application using gcloud on the Dataproc master node, and I get "Request had insufficient authentication scopes":
# gcloud dataproc jobs submit pyspark --cluster xxxxx test.py
ERROR: (gcloud.dataproc.jobs.submit.pyspark) You do not have permission
to access cluster [xxxxxx] (or it may not exist):
Request had insufficient authentication scopes
I can run the same application through the Jobs GUI. I don't have the link to the doc right now, but it says that if this is run on a Compute Engine VM no separate credential is required, which seems to be in line with what happens when I run the same application through the GUI. Any help?
When running from a Dataproc node, you'll be acting on behalf of a service account attached to the VM. Usually this is the default Compute Engine service account, but it can also be specified with the Dataproc service account configuration. Alongside the service account there's also a list of scopes, which limits which GCP services the service account is allowed to access from that VM. By default this includes BigQuery, GCS, logging, and some other small scopes, but not a general administrative scope for doing things like creating other VMs or Dataproc clusters.
To grant the necessary scope, you have to add --scopes when first creating your cluster:
gcloud dataproc clusters create --scopes cloud-platform ...
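If you're not sure which scopes the node currently has, one way to check is to query the standard Compute Engine metadata server from the VM (a quick sketch, nothing Dataproc-specific):
# List the OAuth scopes granted to the VM's service account
curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"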
Related
I've got a container inside a GKE cluster and I want it to be able to talk to the Kubernetes API of another GKE cluster to list some resources there.
This works well if I run the following command in a separate container to proxy the connection for me:
gcloud container clusters get-credentials MY_CLUSTER --region MY_REGION --project MY_PROJECT; kubectl --context MY_CONTEXT proxy --port=8001 --v=10
But this requires me to run a separate container that, due to the size of the gcloud CLI, is more than 1 GB in size.
Ideally I would like to talk directly from my primary container to the other GKE cluster, but I can't figure out how to determine its IP address and set up the authentication required for the connection.
I've seen a few questions:
How to Authenticate GKE Cluster on Kubernetes API Server using its Java client library
Is there a golang sdk equivalent of "gcloud container clusters get-credentials"
But it's still not really clear to me whether, or how, this would work with the Java libraries.
Ideally I would write something like this.
var info = gkeClient.GetClusterInformation(...);
var auth = gkeClient.getAuthentication(info);
...
// using the io.fabric8.kubernetes.client.ConfigBuilder / DefaultKubernetesClient
var config = new ConfigBuilder().withMasterUrl(info.url())
                                .withNamespace(null)
                                // certificate or other authentication mechanism
                                .build();
return new DefaultKubernetesClient(config);
Does that make sense, is something like that possible?
There are multiple ways to connect to your cluster without using the gcloud CLI. Since you are trying to access the cluster from another cluster within Google Cloud, you can use the Workload Identity authentication mechanism. Workload Identity is the recommended way for workloads running on Google Kubernetes Engine (GKE) to access Google Cloud services in a secure and manageable way. For more information refer to this official document; it details a step-by-step procedure for configuring Workload Identity and provides reference links for the code libraries.
This is drafted based on information provided in Google's official documentation.
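As a rough sketch of what that setup looks like with gcloud (MY_CLUSTER, MY_REGION, MY_PROJECT, MY_NAMESPACE, MY_KSA, and the gke-reader service account are placeholder names, not values from your environment):
# Enable Workload Identity on the cluster that runs the pod
gcloud container clusters update MY_CLUSTER --region MY_REGION --workload-pool=MY_PROJECT.svc.id.goog
# Create a Google service account that is allowed to read GKE cluster details
gcloud iam service-accounts create gke-reader
gcloud projects add-iam-policy-binding MY_PROJECT --member "serviceAccount:gke-reader@MY_PROJECT.iam.gserviceaccount.com" --role roles/container.viewer
# Let the pod's Kubernetes service account impersonate that Google service account
gcloud iam service-accounts add-iam-policy-binding gke-reader@MY_PROJECT.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:MY_PROJECT.svc.id.goog[MY_NAMESPACE/MY_KSA]"
kubectl annotate serviceaccount MY_KSA --namespace MY_NAMESPACE iam.gke.io/gcp-service-account=gke-reader@MY_PROJECT.iam.gserviceaccount.com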
I removed a bunch of IAM policies and think this is preventing me from creating k8s clusters in Google Cloud (through the UI).
Every time I click Create cluster, it processes for a bit, before hanging up and throwing the following error:
Create Kubernetes Engine cluster "standard-cluster-1"
Just now
MyProject
Google Compute Engine: Required 'compute.zones.get' permission for 'projects/<MY_PROJECT_ID>/zones/us-central1-a'.
I'm mainly doing this through my host shell (iTerm) and NOT through the interactive shell found on cloud.google.com.
Here's the IAM policy for a user (I use my Google email address under the Member column):
Really hoping to get unblocked so I can start creating clusters in my shell again and not have to use the interactive shell on the Google Cloud website.
You are missing ServiceAgent roles. But only service accounts can be granted those roles.
1) First, copy your project number.
2) Create the following members for the service agents, replacing 77597574896 with your project number, and assign the appropriate roles:
service-77597574896@container-engine-robot.iam.gserviceaccount.com - Kubernetes Engine Service Agent
service-77597574896@compute-system.iam.gserviceaccount.com - Compute Engine Service Agent
77597574896@cloudservices.gserviceaccount.com - Editor
This should work now; I've tested it with my own cluster.
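If you prefer the CLI, a sketch of the equivalent bindings (PROJECT_ID is a placeholder for your project ID, 77597574896 for your project number) would be:
gcloud projects add-iam-policy-binding PROJECT_ID --member "serviceAccount:service-77597574896@container-engine-robot.iam.gserviceaccount.com" --role roles/container.serviceAgent
gcloud projects add-iam-policy-binding PROJECT_ID --member "serviceAccount:service-77597574896@compute-system.iam.gserviceaccount.com" --role roles/compute.serviceAgent
gcloud projects add-iam-policy-binding PROJECT_ID --member "serviceAccount:77597574896@cloudservices.gserviceaccount.com" --role roles/editor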
In order to create a new cluster, please add a new role in your IAM settings:
- Kubernetes Engine Admin
Please share the results.
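For reference, the equivalent gcloud command (USER_EMAIL and PROJECT_ID are placeholders) would be something like:
gcloud projects add-iam-policy-binding PROJECT_ID --member user:USER_EMAIL --role roles/container.admin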
@dennis-huo
Using non-default service account in Google Cloud dataproc
In continuation of the above problem:
I wanted to set up a Dataproc cluster for multiple users. The Compute Engine VMs of a Dataproc cluster use the default or a custom service account's credentials to connect to storage buckets (via --properties core:fs.gs.auth.service.account.json.keyfile), and those credentials have no relation to the user principals who submit the jobs; I also couldn't find an option to control this. That makes the Dataproc cluster insecure and introduces another level of indirection in a multi-user environment, because the key file used does not correspond to the principal.
In my case we are submitting jobs using gcloud dataproc jobs submit hadoop, because my intent is to control access to the Dataproc cluster with IAM roles. However, during job submission the user principal is not forwarded to the Hadoop cluster, the gcloud CLI does not perform any access validation on the storage buckets at the client side, and the job is always executed as the root user. How can I map users to their service accounts, and is there a solution for this case?
All we need is that Hadoop MapReduce jobs submitted by users via gcloud dataproc jobs submit hadoop can only use the storage buckets or folders that the submitting user has access to.
Current:
gcloud dataproc jobs (IAM - user principal) -> Dataproc Cluster (IAM - user principal) -> (SA Default/custom) -> Storage Bucket
A user who has access to submit jobs to the Dataproc cluster can use any storage bucket that the service account has access to.
Required:
gcloud dataproc jobs (IAM - user principal) -> Dataproc Cluster (IAM - user principal) -> (IAM - user principal) -> Storage Bucket
A user who has access to submit jobs to the Dataproc cluster can only use the storage buckets that their user account has access to.
So far I couldn't find a way to do it. Can you please help me with it?
Is there any workaround or solution available to this problem?
You may try this:
Add custom roles, for example roleA for bucketA and roleB for bucketB.
Assign users or service accounts to these roles, for example user1 and user2 to roleA, and user1 and user3 to roleB.
Edit each bucket's permissions to add the members of the corresponding role, for example bucketA -> roleA.
Then a user with access to submit jobs to the Dataproc cluster can only use the storage buckets that their user account has access to.
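A minimal sketch of that setup with gcloud (roleA, bucketA, PROJECT_ID, and user1@example.com are hypothetical names, and the exact permission list depends on what the jobs need):
# Create a custom role containing only the Cloud Storage permissions the jobs need
gcloud iam roles create roleA --project PROJECT_ID --permissions storage.objects.get,storage.objects.list,storage.objects.create
# Grant that role to a specific user on a specific bucket
gcloud storage buckets add-iam-policy-binding gs://bucketA --member user:user1@example.com --role projects/PROJECT_ID/roles/roleA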
I created a Kubernetes cluster under my user account on IBM Bluemix and added another user to my organization, but they can't see my cluster. Is there any other configuration required?
To manage cluster access, see this link from the IBM Bluemix Container Service documentation. Summarised here:
Managing cluster access
You can grant access to your cluster to other users, so that they can
access the cluster, manage the cluster, and deploy apps to the
cluster.
Every user that works with IBM Bluemix Container Service must be
assigned a service-specific user role in Identity and Access
Management that determines what actions this user can perform.
Identity and Access Management differentiates between the following
access permissions.
IBM Bluemix Container Service access policies
Access policies determine the cluster management actions that you can
perform on a cluster, such as creating or removing clusters, and
adding or removing extra worker nodes.
Cloud Foundry roles
Every user must be assigned a Cloud Foundry user role. This role
determines the actions that the user can perform on the Bluemix
account, such as inviting other users, or viewing the quota usage. To
review the permissions of each role, see Cloud Foundry roles.
RBAC roles
Every user who is assigned an IBM Bluemix Container Service access
policy is automatically assigned an RBAC role. RBAC roles determine
the actions that you can perform on Kubernetes resources inside the
cluster. RBAC roles are set up for the default namespace only. The
cluster administrator can add RBAC roles for other namespaces in the
cluster. See Using RBAC Authorization in the
Kubernetes documentation for more information.
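Once the other user has been granted an access policy, a quick way to confirm the RBAC role binding that was created for them in the default namespace is plain kubectl (BINDING_NAME is a placeholder):
# List the RBAC role bindings in the default namespace
kubectl get rolebindings --namespace default
# Inspect one of them to see which subjects (users) it grants access to
kubectl describe rolebinding BINDING_NAME --namespace default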
I’m investigating this letsencrypt controller (https://github.com/tazjin/kubernetes-letsencrypt).
It requires pods to have permission to make changes to records in Cloud DNS. I thought that with the pods running on GKE I'd get that access with the default service account, but the requests are failing. What do I need to do to allow the pods access to Cloud DNS?
The Google Cloud DNS API's changes.create call requires either the https://www.googleapis.com/auth/ndev.clouddns.readwrite or https://www.googleapis.com/auth/cloud-platform scope, neither of which are enabled by default on a GKE cluster.
You can add a new Node Pool to your cluster with the DNS scope by running:
gcloud container node-pools create np1 --cluster my-cluster --scopes https://www.googleapis.com/auth/ndev.clouddns.readwrite
Or, you can create a brand new cluster with the scopes you need, either by passing the --scopes flag to gcloud container clusters create, or in the New Cluster dialog in Cloud Console, click "More", and set the necessary scopes to "Enabled".
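For example, the cluster-creation variant would look something like this (my-cluster is a placeholder name):
gcloud container clusters create my-cluster --scopes https://www.googleapis.com/auth/ndev.clouddns.readwrite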