How to create a GKE cluster using a service account in another project

I have a project A in which I have created a service account.
I want to create a GKE cluster in project B.
I followed the steps for service account impersonation listed here: https://cloud.google.com/iam/docs/impersonating-service-accounts
In project A,
the default service accounts of project B have roles/iam.serviceAccountTokenCreator and roles/iam.serviceAccountUser on the service account I created, which is my-service-account.
In project B,
my-service-account has the Kubernetes Engine Admin role.
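For reference, those grants look roughly like this in gcloud (a sketch; the project-B number is a placeholder, and roles/container.admin is the Kubernetes Engine Admin role):
gcloud iam service-accounts add-iam-policy-binding my-service-account@project-a.iam.gserviceaccount.com \
  --member="serviceAccount:PROJECT_B_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator"
gcloud projects add-iam-policy-binding project-b \
  --member="serviceAccount:my-service-account@project-a.iam.gserviceaccount.com" \
  --role="roles/container.admin"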
When I try to create the cluster, I end up with the error:
Error: Error waiting for creating GKE NodePool: All cluster resources were brought up, but: only 0 nodes out of 1 have registered; cluster may be unhealthy.
I am using Terraform to create this cluster, and the service account being used by Terraform has the Kubernetes Engine Admin and Service Account User roles.
This is what it shows in the console: (screenshot of the GKE error)
Edit:
I tried using the gcloud command line to create the GKE cluster:
gcloud beta container --project "my-project" clusters create "test-gke-sa" \
  --zone "us-west1-a" --no-enable-basic-auth --cluster-version "1.18.16-gke.502" \
  --release-channel "regular" --machine-type "e2-standard-16" --image-type "COS" \
  --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --num-nodes "3" --enable-stackdriver-kubernetes --enable-private-nodes \
  --master-ipv4-cidr "192.168.0.16/28" --enable-ip-alias \
  --network "projects/infgprj-sbo-n-hostgs-gl-01/global/networks/my-network" \
  --subnetwork "projects/my-network/regions/us-west1/subnetworks/my-subnetwork" \
  --cluster-secondary-range-name "gke1-pods" --services-secondary-range-name "gke1-services" \
  --default-max-pods-per-node "110" --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 \
  --enable-shielded-nodes --shielded-secure-boot --node-locations "us-west1-a" \
  --service-account="my-service-account@project-a.iam.gserviceaccount.com"
Got the same errors.
I see that the node pool is created, but not the nodes (or at least they are not attached to the node pool?).
Here are some more screenshots of the errors: (VM page, GKE page)
Solution: Finally, I figured out what was wrong. I had given the token creator role only to the default service accounts. It started working when I gave the same role to the default service agents as well. So basically:
role = "roles/iam.serviceAccountTokenCreator",
members = [
  "serviceAccount:{project-number}-compute@developer.gserviceaccount.com",
  "serviceAccount:service-{project-number}@container-engine-robot.iam.gserviceaccount.com",
  "serviceAccount:service-{project-number}@compute-system.iam.gserviceaccount.com",
]
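For anyone applying the same fix without Terraform, an equivalent gcloud sketch (project number and service account email are placeholders; run once per member listed above):
gcloud iam service-accounts add-iam-policy-binding my-service-account@project-a.iam.gserviceaccount.com \
  --role="roles/iam.serviceAccountTokenCreator" \
  --member="serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com"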

Just to confirm that it's a service account error and not something involving Terraform, I recommend that you:
A. impersonate Project A's service account and confirm that you are who you're trying to be with this command - gcloud auth list (the active account is the one with the star next to it), and then
B. try creating a cluster in Project B with gcloud container clusters create - here are the reference docs but you can also:
go to Console > Kubernetes Engine
click on "Create,"
scroll down to the bottom of the form and click on the "COMMAND LINE" link to launch a modal that generates the syntax of the CLI command you'd want to run
copy, paste, and tweak it to create only one node plus whatever other basic settings you want to change... make sure it's specifying --project=project-B
run the command
That will likely give you a more helpful error message. Or at least a different one, so, hurray?
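For a concrete starting point, a minimal sketch combining A and B (the --impersonate-service-account flag is one way to do the impersonation; names and zone are placeholders):
gcloud auth list
gcloud container clusters create test-cluster \
  --project=project-B \
  --zone=us-west1-a \
  --num-nodes=1 \
  --impersonate-service-account=my-service-account@project-a.iam.gserviceaccount.com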

The above error is usually caused by one of the following reasons:
1] If using a Shared VPC, verify that the IAM permissions are correct.
2] Verify that the auto-generated ingress firewall rules were created.
Usually three firewall rules are created:
gke-${cluster_name}-${random_char}-all: rule for pod-to-pod communication
gke-${cluster_name}-${random_char}-master: rule for the master to talk to nodes
gke-${cluster_name}-${random_char}-vms: node-to-node communication
(random_char is a random character suffix)
3] Check firewall rules for denial of egress.
By default GCP creates a firewall rule allowing all egress. If you delete that rule or deny all egress, then you must configure a firewall rule that allows egress to the master CIDR block on TCP ports 443 and 10250 (see the sketch after this list). The Private Cluster Firewall Rules documentation explains how to obtain the master CIDR block.
- If you enable other GKE add-ons, you may need to add additional egress firewall rules.
4] Check the DNS configuration for communication to Google APIs.
Check the kubelet logs for any failed curl requests, e.g. "Unable to resolve host" or a connection timeout during kubelet installation. There may be a chance that the DNS configuration is incorrect (e.g. resolving private Google APIs vs. hitting public Google APIs). A dig command, or looking at /etc/resolv.conf for the DNS servers, should confirm where requests are being routed.
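For 3] and 4], the checks might look roughly like this (a sketch; the rule name, network, node tag, and MASTER_CIDR are placeholders):
# Allow node egress to the master on TCP 443 and 10250
gcloud compute firewall-rules create allow-egress-to-master \
  --network=my-network \
  --direction=EGRESS \
  --action=ALLOW \
  --rules=tcp:443,tcp:10250 \
  --destination-ranges=MASTER_CIDR \
  --target-tags=my-node-tag
# Confirm where DNS requests for Google APIs are routed
dig www.googleapis.com
cat /etc/resolv.conf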

Related

Unable to Bind Google Service Account to Kubernetes Service Account

I am trying to bind my Google Service Account (GSA) to my Kubernetes Service Account (KSA) so I can connect to my Cloud SQL database from Google Kubernetes Engine (GKE). I am currently following the guide provided in Google's documentation (https://cloud.google.com/sql/docs/sqlserver/connect-kubernetes-engine).
Currently I have a cluster running on GKE named MY_CLUSTER, a GSA with the correct Cloud SQL permissions named MY_GCP_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com, and a KSA named MY_K8S_SERVICE_ACCOUNT. I am trying to bind the two accounts using the following command.
gcloud iam service-accounts add-iam-policy-binding \
--member "serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/MY_K8S_SERVICE_ACCOUNT]" \
--role roles/iam.workloadIdentityUser \
MY_GCP_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com
However when I run the previous command I get the following error message.
ERROR: Policy modification failed. For a binding with condition, run "gcloud alpha iam policies lint-condition" to identify issues in condition.
ERROR: (gcloud.iam.service-accounts.add-iam-policy-binding) INVALID_ARGUMENT: Identity Pool does not exist (PROJECT_ID.svc.id.goog). Please check that you specified a valid resource name as returned in the `name` attribute in the configuration API.
Why am I getting this error when I try to bind my GSA to my KSA?
In order to bind your Google Service Account (GSA) to your Kubernetes Service Account (KSA), you need to enable Workload Identity on the cluster. This is explained in more detail in Google's documentation (https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity).
To enable Workload Identity on an existing cluster, you can run:
gcloud container clusters update MY_CLUSTER \
--workload-pool=PROJECT_ID.svc.id.goog
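Per the same Workload Identity guide, the IAM binding is then paired with annotating the KSA (and note that existing node pools also need --workload-metadata=GKE_METADATA); roughly:
kubectl annotate serviceaccount MY_K8S_SERVICE_ACCOUNT \
  --namespace K8S_NAMESPACE \
  iam.gke.io/gcp-service-account=MY_GCP_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com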

How to create a Kubernetes cluster from within GitLab CI for e2e testing?

I want to create a Google Cloud Kubernetes cluster programmatically in a GitLab CI script .gitlab-ci.yml in order to do e2e testing against it. Therefore I:
created a separate project project-e2e (in order to separate the billing)
enabled the Kubernetes Engine API
created a service account gitlab-ci@project-e2e.iam.gserviceaccount.com with a key in JSON format, which I'm providing through CI variables and using as shown below
made the service account App Engine Admin, Compute Admin, Kubernetes Engine Admin, Kubernetes Engine Cluster Admin, Editor, Service Account User and Owner, following the permission-role mappings described at https://cloud.google.com/kubernetes-engine/docs/reference/api-permissions and https://cloud.google.com/compute/docs/access/iam
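For reference, each grant of that kind would look like this in gcloud (a sketch; only one of the roles shown):
gcloud projects add-iam-policy-binding project-e2e \
  --member="serviceAccount:gitlab-ci@project-e2e.iam.gserviceaccount.com" \
  --role="roles/container.admin"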
The script however fails due to missing permissions of the created service account, which as far as I understand should be covered by the assigned roles (the output contains the commands the stage in the CI script consists of):
$ echo "$GOOGLE_KEY" > key.json
$ gcloud config set project project-e2e
Updated property [core/project].
$ gcloud auth activate-service-account --key-file key.json --project project-e2e
Activated service account credentials for: [gitlab-ci@project-e2e.iam.gserviceaccount.com]
$ gcloud config set compute/zone us-central1-a
Updated property [compute/zone].
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
$ gcloud container clusters create project-e2e-$CI_COMMIT_SHORT_SHA --project project-e2e --service-account=gitlab-ci@project-e2e.iam.gserviceaccount.com
WARNING: In June 2019, node auto-upgrade will be enabled by default for newly created clusters and node pools. To disable it, use the `--no-enable-autoupgrade` flag.
WARNING: Starting in 1.12, new clusters will have basic authentication disabled by default. Basic authentication can be enabled (or disabled) manually using the `--[no-]enable-basic-auth` flag.
WARNING: Starting in 1.12, new clusters will not have a client certificate issued. You can manually enable (or disable) the issuance of the client certificate using the `--[no-]issue-client-certificate` flag.
WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using `--no-enable-ip-alias` flag. Use `--[no-]enable-ip-alias` flag to suppress this warning.
WARNING: Starting in 1.12, default node pools in new clusters will have their legacy Compute Engine instance metadata endpoints disabled by default. To create a cluster with legacy instance metadata endpoints disabled in the default node pool, run `clusters create` with the flag `--metadata disable-legacy-endpoints=true`.
WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
This will enable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Required "container.clusters.create" permission(s) for "projects/project-e2e". See https://cloud.google.com/kubernetes-engine/docs/troubleshooting#gke_service_account_deleted for more info.
I tried:
omitting --service-account=gitlab-ci@project-e2e.iam.gserviceaccount.com, which has no effect
finding answers in https://cloud.google.com/kubernetes-engine/docs/troubleshooting#gke_service_account_deleted, which seems unrelated
adding and omitting --no-enable-legacy-authorization, which has no effect
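A diagnostic sketch (not a confirmed fix) to verify the bindings actually landed on the project:
gcloud projects get-iam-policy project-e2e \
  --flatten="bindings[].members" \
  --filter="bindings.members:gitlab-ci@project-e2e.iam.gserviceaccount.com" \
  --format="value(bindings.role)"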

RouteController failed to create a route on GKE

I have a cluster on GKE whose node pool I create when I want to use the cluster, and delete when I'm done with it.
It's a two-node cluster with the master in europe-west2-a and whose node zones are europe-west2-a and europe-west2-b.
The most recent creation resulted in the node in zone B failing with NetworkUnavailable because RouteController failed to create a route. The reason was: Could not create route xxx 10.244.1.0/24 for node xxx after 342.263706ms: instance not found.
Why would this be happening all of a sudden, and what can I do to fix it?!
You didn't mention which version of GKE you are using, so just for clarification:
Changes in access scopes
Beginning with Kubernetes version 1.10, gcloud and the GCP Console no longer grant the compute-rw access scope on new clusters and new node pools by default. Furthermore, if --scopes is specified in gcloud container clusters create, gcloud no longer silently adds compute-rw or storage-ro.
In any case, you can still revert to the legacy access scopes, but this is not the recommended approach.
Hope this helps.
With GKE 1.13.6-gke.13, some of the default scopes were changed, including the compute-rw scope being removed. I think that, due to the age of the cluster, this scope was necessary for a route to be created correctly between nodes in a node pool.
In the end, my gcloud creation command had these scopes:
--scopes https://www.googleapis.com/auth/projecthosting,storage-rw,monitoring,trace,compute-rw

K8S dashboard not accessible after first cluster in GKE - GCP using console

Newbie setup:
Created my first project in GCP.
Created a cluster with the defaults, 3 nodes. Node version 1.7.6, cluster master version 1.7.6-gke.1.
Deployed an application in a pod, per the example.
Able to access "hello world" and the hostname, using the external IP and the port.
In the GCP / GKE page of my cloud console, clicked "Discovery and load balancing"; I was able to see the "kubernetes-dashboard" process with a green tick, but cannot access it through the IP listed. Tried 8001, 9090, /ui and nothing worked.
Not using any cloud shell or gcloud commands on my local laptop; everything is done in the console.
Questions:
How can anyone access the kubernetes-dashboard of a cluster created in the console?
The docs are unclear; are the dashboard components incorporated into the console itself? Are the docs out of sync with the GCP-GKE screens?
The tutorial says to run "kubectl proxy" and then open "http://localhost:8001/ui", but it doesn't work. Why?
If you create a cluster with version 1.9.x or greater, then you can access it using tokens.
Get the secret:
kubectl -n kube-system describe secrets `kubectl -n kube-system get secrets | awk '/clusterrole-aggregation-controller/ {print $1}'` | awk '/token:/ {print $2}'
Copy the secret.
Run kubectl proxy.
Open the UI at 127.0.0.1:8001/ui. This will redirect to the login page.
There will be two options to log in: kubeconfig and token.
Select token and paste the secret copied earlier.
Hope this helps.
It seems to be an issue with the internal Kubernetes DNS service starting at version 1.7.6 on Google Cloud.
The solution is to access the dashboard at this endpoint instead:
http://localhost:8001/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Github Issue links:
https://github.com/kubernetes/dashboard/issues/2368
https://github.com/kubernetes/kubernetes/issues/52729
The address of the dashboard service is only accessible from inside the cluster. If you SSH into a node in your cluster, you should be able to connect to the dashboard. You can verify this by noticing that the address is within the services CIDR range for your cluster.
The dashboard is running as a pod inside your cluster with an associated service. If you open the Workloads view you will see the kubernetes-dashboard deployment and the pod that was created by the deployment. I'm not sure which docs you are referring to, since you didn't provide a link.
When you run kubectl proxy it creates a secure connection from your local machine into your cluster. It works by connecting to your master and then running through a proxy on the master to the pod/service/host you are connecting to via an SSH tunnel. It's possible that it isn't working because the SSH tunnels are not running; you should verify that your project has the newly created SSH firewall rules allowing access from the cluster endpoint IP address (see the check below). Otherwise, if you could explain more about how it fails, that would be useful for debugging.
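A quick way to check for those rules (a sketch; the name pattern is an assumption about how the auto-created GKE rules are named):
gcloud compute firewall-rules list --filter="name~^gke-"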
First:
gcloud container clusters get-credentials cluster-1 --zone my-zone --project my-project
Then find your Kubernetes dashboard endpoint by running:
kubectl cluster-info
It will be something like https://42.42.42.42/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
Install kube-dashboard:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
Run:
$ kubectl proxy
Access:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login

openshift does not honor the USER directive in Dockerfile

I'm new to OpenShift/k8s. The Docker image I'm running in OpenShift uses USER blabla, but when I exec into the pod, it uses a different user rather than the one in the Dockerfile.
I'm wondering why, and how can I work around this?
Thanks
For security, cluster administrators have the option to force containers to run with cluster-assigned uids. By default, most containers run using a uid from a range assigned to the project.
This is controlled by the configured SecurityContextConstraints.
To allow containers to run as the user declared in their Dockerfile (even though this can expose the cluster, security-wise), grant the pod's service account access to the anyuid SecurityContextConstraint:
oadm policy add-scc-to-user anyuid system:serviceaccount:<your ns>:<your service account>
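To see which SCC and uid a pod actually received, and the uid range assigned to the project, something like this should work (a sketch; pod and namespace names are placeholders, and the annotations are the standard OpenShift ones):
oc get pod <your pod> -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
oc rsh <your pod> id -u
oc get namespace <your ns> -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'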