I am spinning up a Kubernetes Job as a Helm pre-install hook on GKE.
The Job uses the google/cloud-sdk image, and I want it to create a Compute Engine persistent disk.
Here is its spec:
spec:
  restartPolicy: OnFailure
  containers:
    - name: create-db-hook-container
      image: google/cloud-sdk:latest
      command: ["gcloud"]
      args: ["compute", "disks", "create", "--size={{ .Values.volumeMounts.gceDiskSize }}", "--zone={{ .Values.volumeMounts.gceDiskZone }}", "{{ .Values.volumeMounts.gceDiskName }}"]
However this fails with the following error:
brazen-lobster-create-pd-hook-nc2v9 create-db-hook-container ERROR:
(gcloud.compute.disks.create) Could not fetch resource: brazen-lobster-create-pd-hook-nc2v9
create-db-hook-container
- Insufficient Permission: Request had insufficient authentication scopes.
brazen-lobster-create-pd-hook-nc2v9 create-db-hook-container
Apparently I have to grant the compute.disks.create permission.
My question is: to whom do I have to grant this permission?
This is a GCP IAM permission, so I assume it cannot be granted on a specific k8s resource (?) and therefore cannot be dealt with in the context of k8s RBAC, right?
edit: I have created a ComputeDiskCreate custom role that encompasses two permissions:
compute.disks.create
compute.disks.list
I have attached it to the service account
service-2340842080428@container-engine-robot.iam.gserviceaccount.com, which the IAM page of my Google Cloud console names
Kubernetes Engine Service Agent,
but the outcome is still the same.
In GKE, all nodes in a cluster are actually Compute Engine VM instances. They're assigned a service account at creation time to authenticate them to other services. You can check the service account assigned to nodes by checking the corresponding node pool.
By default, GKE nodes are assigned the Compute Engine default service account, which looks like PROJECT_NUMBER-compute@developer.gserviceaccount.com, unless you set a different one at cluster/node pool creation time.
Calls to other Google services (like the compute.disks.create endpoint in this case) will come from the node and be authenticated with the corresponding service account credentials.
You should therefore grant the compute.disks.create permission (via a role that contains it) to your nodes' service account (likely PROJECT_NUMBER-compute@developer.gserviceaccount.com) in your Developer Console's IAM page.
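For example, assuming the custom role from the edit above exists as ComputeDiskCreate (PROJECT_ID and PROJECT_NUMBER are placeholders), the binding could be added with something like:

# Grant the custom role (compute.disks.create / compute.disks.list) to the
# node service account. PROJECT_ID, PROJECT_NUMBER and the role ID are
# placeholders -- substitute your own values.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="projects/PROJECT_ID/roles/ComputeDiskCreate"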
EDIT: Prior to any authentication, the mere ability of a node to reach a given Google service is defined by its access scopes. These are set at node pool creation time and can't be edited afterwards. You'll need to create a new node pool and make sure you grant it the https://www.googleapis.com/auth/compute access scope so it can call Compute Engine methods. You can then instruct your particular pod to run on those specific nodes.
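A rough sketch of that, assuming a new pool called compute-scope-pool (cluster name, pool name and zone are made up for the example):

# Create a node pool whose nodes carry the Compute Engine access scope.
gcloud container node-pools create compute-scope-pool \
  --cluster=my-cluster \
  --zone=europe-west1-b \
  --scopes=https://www.googleapis.com/auth/compute

The hook pod can then be pinned to that pool with a nodeSelector on the cloud.google.com/gke-nodepool: compute-scope-pool label.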
I've deployed an EKS cluster using the Terraform module terraform-aws-modules/eks/aws. I’ve deployed Airflow on this EKS cluster with Helm (the official chart, not the community one), and I’ve annotated worker pods with the following IRSA:
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the release name
  name: "airflow-worker"
  # Annotations to add to worker kubernetes service account.
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/airflow-worker"
This airflow-worker role has a policy attached to it to enable it to assume a different role.
I have a Python program that assumes this other role and performs some S3 operations. I can exec into a running BashOperator pod, open a Python shell, assume this role, and issue the exact same S3 operations successfully.
But, when I create a Docker image with this program and try to call it from a KubernetesPodOperator task, I see the following error:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation:
User: arn:aws:sts::123456789:assumed-role/core_node_group-eks-node-group-20220726041042973200000001/i-089c64b96cf7878d8 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::987654321:role/TheOtherRole
I don't really know what this role is, but I believe it was created automatically by the Terraform module. However, when I kubectl describe one of these failed pods, I see this:
Environment:
...
...
...
AWS_ROLE_ARN: arn:aws:iam::123456789:role/airflow-worker
My questions:
Why is this role being used, and not the IRSA airflow-worker that I've specified in the Helm chart's values?
What even is this role? It seems the Terraform module creates a number of roles automatically, but it is very difficult to tell what their purpose is or where they're used from the Terraform documentation.
How am I able to assume this role and do everything the Dockerized Python program does when in a shell in the pod? Okay, this is because other operators (such as BashOperator) do use the airflow-worker role. Just not KubernetesPodOperators.
What is the AWS_ROLE_ARN environment variable shown above, and why isn't it being used?
Happy to provide more context if it's helpful.
In order for the EKS pod to assume the other role, you need to add this trust policy to the role being assumed (arn:aws:iam::987654321:role/TheOtherRole):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:role/airflow-worker"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Here you can find some information about AWS Security Token Service (STS).
Tasks running in the worker pod will use the role automatically, but a pod created by KubernetesPodOperator is separate from your worker pod, so you need to make it use the service account that has the role attached in order for the AWS role credentials (the web identity token file) to be injected into that pod.
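One way to check which identity a pod actually resolves to, assuming the aws CLI is available in its image (the pod name is a placeholder; the role ARN is the one from the error):

# Which identity do the pod's credentials resolve to?
kubectl exec -it <pod-name> -- aws sts get-caller-identity

# Can that identity assume the target role?
kubectl exec -it <pod-name> -- aws sts assume-role \
  --role-arn arn:aws:iam::987654321:role/TheOtherRole \
  --role-session-name irsa-test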
This is pretty much by design. The non-KubernetesPodOperator operators use an auto-generated pod template file that takes the Helm chart values as defaults, while a KubernetesPodOperator needs its own pod template file. That, or it needs to essentially build one from the arguments you pass to KubernetesPodOperator(....
I fixed the ultimate issue by passing service_account_name="airflow-worker" to KubernetesPodOperator(....
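A quick way to confirm the operator-launched pod actually picked up that service account (the pod name here is hypothetical):

# The pod spawned by KubernetesPodOperator should now report the IRSA-enabled SA
# and have the IRSA env vars injected by the EKS webhook.
kubectl get pod <task-pod-name> -o jsonpath='{.spec.serviceAccountName}'
kubectl describe pod <task-pod-name> | grep AWS_ROLE_ARN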
It seems that we cannot make the Snowplow container (snowplow/scala-stream-collector-kinesis) use the service account we provide. It always uses the shared-eks-node-role, not the provided service account. The config is set to default for both the accessKey and the secretKey.
This is the service account part we use:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: thijs-service-account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123:role/thijs-eks-service-account-role-snowplow
And when I inspect the pod I can see the account:
AWS_ROLE_ARN: arn:aws:iam::123:role/thijs-eks-service-account-role-snowplow
The error then shows the wrong account.
Exception in thread "main" com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts::123:assumed-role/shared-eks-node-role/i-123 is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:eu-west-1:123:stream/snowplow-good (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException; Request ID: 123-123-123; Proxy: null)
The collector itself doesn't do any role swapping. It only cares to receive credentials via one of three methods:
the default creds provider chain
a specific IAM role
environment variables.
The most popular deployment is on an EC2 instance, in which case the default EC2 role can be used to access other resources in the account.
It looks like when you are deploying it on EKS things are not as straightforward. The collector seems to work with this assumed role: arn:aws:sts::123:assumed-role/shared-eks-node-role/i-123 but it is not authorised with Kinesis permissions. Do you know what process creates that role? Perhaps you could add the missing Kinesis policies there?
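If adding the policies to the node role is acceptable as a workaround, that could look like the sketch below (the role name is taken from the error message; AmazonKinesisFullAccess is deliberately broad, and a least-privilege inline policy scoped to the stream ARNs would be tighter):

# Attach Kinesis permissions to the EKS node role the collector currently
# resolves to.
aws iam attach-role-policy \
  --role-name shared-eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFullAccess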
I had the same issue. First make sure you have the IAM role set up correctly according to https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html. Make sure the names are consistent and it has the right permissions.
Once you've double-checked that, make sure you are on a recent version of Snowplow. An old version might not have the right version of the AWS SDK. You need at least AWS SDK v1.12.128, or for AWS SDK v2, 2.10.11 [link].
Finally, set the aws accessKey and secretKey in your Snowplow configuration file to default. Redeploy and make sure the pod and service account have been recreated. You should be good at this point.
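A quick sanity check after redeploying, assuming the deployment is called snowplow-collector (that name and the pod name are placeholders; the service account is the one from the question):

# Confirm the service account carries the IRSA annotation.
kubectl get serviceaccount thijs-service-account -o yaml | grep role-arn

# Recreate the collector pods so they pick up the SA and the injected
# web-identity credentials.
kubectl rollout restart deployment/snowplow-collector

# Verify the running pod uses the SA and received the IRSA env vars.
kubectl get pod <collector-pod> -o jsonpath='{.spec.serviceAccountName}'
kubectl exec <collector-pod> -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'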
Reference:
https://github.com/snowplow/stream-collector/issues/186
I have the same issue.
It can't use env for the values, because those are not set. However, the collector is running as a container - it should use the default credential chain.
From the notes, it looks like without the env variables being set, I should use iam - but when I do this, it uses the IAM instance profile, which loads the underlying node's role, not the role specified by the SA.
The SDK supports IRSA (I have updated the Snowplow collector container image to one with a supported SDK newer than 1.11.704, as per the supported versions), and from what I can see in the collector docs, the streams config needs an aws block with either env or iam as values... but I want to use the default credential chain without specifying a method....
If I connect to the container, I can see that the creds are set up as per the SA:
$ env | grep -i aws
AWS_REGION=my-region
AWS_DEFAULT_REGION=my-region
AWS_ROLE_ARN=arn:aws:iam::<redacted>:role/sp-collector-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
But when I run the collector, it still uses the node's IAM instance profile, and I don't see any activity under sp-collector-role. Is there a way to use the default credential chain? E.g. with the aws CLI in a container under the same service account, I don't specify any credentials, but when I run aws sts get-caller-identity the SDK resolves the IRSA role correctly.
I am using getSignedUrl to get a public authenticated URL for a video. It is working fine on my local machine, but after deploying it to GKE it is not working. I have checked a related question on SigningError with Firebase getSignedUrl(). But I don't see a service account for GKE to configure those roles. I had already assigned full storage and service permissions to the cluster while creating the Kubernetes cluster.
Do I have to add any more permissions to get rid of this error, or should I do something else?
This issue got fixed. I followed https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform#console to fix it.
We have to make the Google Cloud service account available to GKE; GKE does not access Google Cloud service accounts directly. I followed the steps below to use a Google Cloud service account from GKE.
Create a service account with the required roles: Storage Object Creator and Service Account Token Creator.
Generate a key and save the JSON file in your app (a one-time step).
Add a volume, volumeMounts, and the GOOGLE_APPLICATION_CREDENTIALS env variable to deployment.yaml.
Create the secret with kubectl create secret generic [key name] --from-file=key.json=PATH-TO-KEY-FILE.json.
Deploy your manifest using kubectl apply -f deployment.yaml.
These steps give the workload access to storage and to the service account, which fixes the SigningError.
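A sketch of the service-account side of those steps with gcloud/kubectl (the account name signer-sa, PROJECT_ID and the secret name are placeholders):

# Create the service account and grant the two roles mentioned above.
gcloud iam service-accounts create signer-sa
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:signer-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:signer-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator"

# Generate a key and store it as the Kubernetes secret that deployment.yaml
# mounts and points GOOGLE_APPLICATION_CREDENTIALS at.
gcloud iam service-accounts keys create key.json \
  --iam-account=signer-sa@PROJECT_ID.iam.gserviceaccount.com
kubectl create secret generic signer-key --from-file=key.json=key.json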
Is it possible to invoke a Kubernetes CronJob from inside a pod? I have to run this job from the application running in the pod.
Do I have to use kubectl inside the pod to execute the job?
Appreciate your help
Use the Default Service Account to access the API server. When you
create a pod, if you do not specify a service account, it is
automatically assigned the default service account in the same
namespace. If you get the raw json or yaml for a pod you have created
(for example, kubectl get pods/<podname> -o yaml), you can see the
spec.serviceAccountName field has been automatically set.
You can access the API from inside a pod using automatically mounted
service account credentials, as described in Accessing the Cluster.
The API permissions of the service account depend on the authorization
plugin and policy in use.
In version 1.6+, you can opt out of automounting API credentials for a
service account by setting automountServiceAccountToken: false on the
service account
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
So the first task is to either grant the default service account of the pod the permission to create what you need, OR create a custom service account and use it inside the pod.
Programmatically access the API server using that service account to create the job you need.
It could be just a simple curl POST to the API server from inside the pod with the JSON for the job creation, for example:
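A rough sketch, assuming the Job is created in the default namespace by the pod's default service account (the role/binding names and the Job manifest are made up for the example):

# Run once from outside the pod (or from your deployment tooling): allow the
# pod's service account to create Jobs in its namespace.
kubectl create role job-creator --verb=create,get,list --resource=jobs
kubectl create rolebinding job-creator --role=job-creator \
  --serviceaccount=default:default

# From inside the pod: create a Job via the API server using the mounted token.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
curl --cacert "$CACERT" -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -X POST https://kubernetes.default.svc/apis/batch/v1/namespaces/default/jobs \
  -d '{
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "hello-job"},
        "spec": {
          "template": {
            "spec": {
              "containers": [{"name": "hello", "image": "busybox", "command": ["echo", "hello"]}],
              "restartPolicy": "Never"
            }
          }
        }
      }'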
How do I access the Kubernetes api from within a pod container?
You can also use an application-specific SDK; for example, if you have a Python application, you can import the kubernetes client library and create the job from code.
I'm trying to follow this tutorial to set up an nginx-ingress controller.
It seems it was written before RBAC was fully integrated into k8s. When I get to the final step of running nginx-controller.yaml, I get back an authorization error:
no service with name default/default-http-backend found: services "default-http-backend" is forbidden: User "system:serviceaccount:default:default" cannot get services in the namespace "default"
What do I need to do to make this work with RBAC?
That hackernoon post (like most of them) is incorrect. Specifically, there are no RBAC objects and the deployment is not assigned a service account (i.e. serviceAccountName:).
To ensure that you have the right (or enough) RBAC objects created, check out the RBAC-* objects at https://github.com/mateothegreat/k8-byexamples-ingress-controller/tree/master/manifests.
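As a minimal sketch of what those objects amount to (the names and the exact resource list here are illustrative; the linked manifests are more complete and also cover things like ingress status updates and leader election):

# Service account for the controller, plus read access to the objects it watches.
kubectl create serviceaccount nginx-ingress
kubectl create clusterrole nginx-ingress-reader \
  --verb=get,list,watch \
  --resource=services,endpoints,secrets,configmaps,pods,ingresses
kubectl create clusterrolebinding nginx-ingress-reader \
  --clusterrole=nginx-ingress-reader \
  --serviceaccount=default:nginx-ingress

# Point the controller deployment at the new service account (deployment name
# is an assumption -- use whatever nginx-controller.yaml created).
kubectl patch deployment nginx-ingress-controller \
  -p '{"spec":{"template":{"spec":{"serviceAccountName":"nginx-ingress"}}}}'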