GKE autopilot cluster creation failure - kubernetes

I am trying to create a Composer environment using Terraform in GCP, and it is failing in one of the projects while creating the Kubernetes cluster in Autopilot mode; it works fine in the other two projects where we deployed it the same way.
I also tried to create the Autopilot Kubernetes cluster manually, but we are not able to track down the issue, as it shows only the error below:
k8s cluster creation error
Error when trying it from the command line:
gcloud container clusters create-auto test \
--region europe-west2 \
--project=project-id
Note: The Pod address range limits the maximum size of the cluster. Please refer to https://cloud.google.com/kubernetes-engine/docs/how-to/flexible-pod-cidr to learn how to optimize IP address allocation.
Creating cluster test in europe-west2... Cluster is being deployed...done.
ERROR: (gcloud.container.clusters.create-auto) Operation [<Operation
clusterConditions: [<StatusCondition
canonicalCode: CanonicalCodeValueValuesEnum(UNKNOWN, 2)
message: 'Failed to create cluster'>]
detail: 'Failed to create cluster'
endTime: '2022-05-31T20:00:07.8398558Z'
error: <Status
code: 2
details: []
message: 'Failed to create cluster'>
name: 'operation-1654027061293-a14298fa'
nodepoolConditions: []
operationType: OperationTypeValueValuesEnum(CREATE_CLUSTER, 1)
progress: <OperationProgress
metrics: [<Metric
intValue: 12
name: 'CLUSTER_CONFIGURING'>, <Metric
intValue: 12
name: 'CLUSTER_CONFIGURING_TOTAL'>, <Metric
intValue: 9
name: 'CLUSTER_DEPLOYING'>, <Metric
intValue: 9
name: 'CLUSTER_DEPLOYING_TOTAL'>]
stages: []>
selfLink: 'https://container.googleapis.com/v1/projects/projectid/locations/europe-west2/operations/operation-1654027061293-a14298fa'
startTime: '2022-05-31T19:57:41.293067757Z'
status: StatusValueValuesEnum(DONE, 3)
statusMessage: 'Failed to create cluster'
targetLink: 'https://container.googleapis.com/v1/projects/projectid/locations/europe-west2/clusters/test'
zone: 'europe-west2'>] finished with error: Failed to create cluster

The service account "service-xxxxxxxx@container-engine-robot.iam.gserviceaccount.com" needs the role Kubernetes Engine Service Agent (roles/container.serviceAgent); its absence caused the cluster creation to fail. After granting the permission, we were able to create clusters.
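For reference, here is a sketch of how that role can be granted with gcloud; the project ID and project number below are placeholders, so substitute your own values.

# Placeholder values; use your own project ID and project number.
PROJECT_ID=project-id
PROJECT_NUMBER=xxxxxxxx

# Grant the Kubernetes Engine Service Agent role to the GKE service agent account.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:service-${PROJECT_NUMBER}@container-engine-robot.iam.gserviceaccount.com" \
  --role="roles/container.serviceAgent"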

Related

Google Kubernetes Engine: Failed to create an ingress error

I tried to create an ingress with the hello world app here: https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/tree/main/hello-app. But I received an error: "Failed to create an ingress. Ingress.extensions "hello-world-ingress" is invalid: spec: Invalid value: []networking.IngressRule(nil): either defaultBackend or rules must be specified".
GKE Failed to create an ingress screenshot
But over the last two weeks I was able to create an ingress with this app and it worked perfectly.
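The validation error means the Ingress spec was left empty. Below is a minimal sketch of a manifest that specifies rules; the Service name (hello-app) and port (8080) are assumptions based on the hello-app sample, so adjust them to match your actual Service.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-app   # assumed Service name for the hello-app sample
            port:
              number: 8080    # assumed Service port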

coredns not deploying in new EKS cluster?

I'm deploying an AWS EKS cluster in Fargate (no EC2 nodes) using an existing VPC with both public and private subnets, and am able to create the cluster successfully with eksctl. However, I see that the coredns Deployment is stuck at 0/2 Pods ready in the EKS console. I was reading that I need to enable port 53 in my security group rules, and I have. Here's my config file.
$ eksctl create cluster -f eks-sandbox-cluster.yaml
eks-sandbox-cluster.yaml
------------------------
kind: ClusterConfig
apiVersion: eksctl.io/v1alpha5
metadata:
  name: sandbox
  region: us-east-1
  version: "1.18"
# The VPC and subnets are for the data plane, where the pods will
# ultimately be deployed.
vpc:
  id: "vpc-12345678"
  clusterEndpoints:
    privateAccess: true
    publicAccess: false
  subnets:
    # us-east-1a is full
    private:
      us-east-1b:
        id: "subnet-xxxxxxxx"
      us-east-1c:
        id: "subnet-yyyyyyy"
    public:
      us-east-1b:
        id: "subnet-aaaaaaaa"
      us-east-1c:
        id: "subnet-bbbbbbbb"
fargateProfiles:
  - name: fp-default
    selectors:
      - namespace: default
  - name: fp-kube
    selectors:
      - namespace: kube-system
  - name: fp-myapps
    selectors:
      - namespace: myapp
        labels:
          app: myapp
cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler"]
Why is the coredns Deployment not coming up?
I do see this in the kube-scheduler CloudWatch logs.
I0216 16:46:43.841076 1 factory.go:459] Unable to schedule kube-system/coredns-c79dcb98c-9pfrz: no nodes are registered to the cluster; waiting
I think this is also why I can't talk to my cluster via kubectl?
$ kubectl get pods
Unable to connect to the server: dial tcp 10.23.x.x:443: i/o timeout
When I deployed the EKS cluster using a config file with our existing VPC and private-only endpoints, the coredns Deployment was set to start on EC2 nodes. Of course, with Fargate there are no EC2 nodes. I had to edit the coredns Deployment to use Fargate and restart the Deployment.
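For reference, a sketch of that edit as documented by AWS: remove the compute-type annotation that pins coredns to EC2, then restart the Deployment (verify the annotation path against your cluster before running it).

# Remove the eks.amazonaws.com/compute-type: ec2 annotation from the pod template
# so the Fargate scheduler can place the coredns pods.
kubectl patch deployment coredns -n kube-system --type json \
  -p='[{"op": "remove", "path": "/spec/template/metadata/annotations/eks.amazonaws.com~1compute-type"}]'

# Restart the Deployment so new pods get scheduled onto Fargate.
kubectl rollout restart -n kube-system deployment coredns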

GKE: Service account for Config Connector lacks permissions

I'm attempting to get Config Connector up and running on my GKE project and am following this getting started guide.
So far I have enabled the appropriate APIs:
> gcloud services enable cloudresourcemanager.googleapis.com
Created my service account and added policy binding:
> gcloud iam service-accounts create cnrm-system
> gcloud iam service-accounts add-iam-policy-binding cnrm-system@test-connector.iam.gserviceaccount.com --member="serviceAccount:test-connector.svc.id.goog[cnrm-system/cnrm-controller-manager]" --role="roles/iam.workloadIdentityUser"
> kubectl wait -n cnrm-system --for=condition=Ready pod --all
Annotated my namespace:
> kubectl annotate namespace default cnrm.cloud.google.com/project-id=test-connector
And then run through trying to apply the Spanner yaml in the example:
~ >>> kubectl describe spannerinstance spannerinstance-sample
Name:         spannerinstance-sample
Namespace:    default
Labels:       label-one=value-one
Annotations:  cnrm.cloud.google.com/management-conflict-prevention-policy: resource
              cnrm.cloud.google.com/project-id: test-connector
API Version:  spanner.cnrm.cloud.google.com/v1beta1
Kind:         SpannerInstance
Metadata:
  Creation Timestamp:  2020-09-18T18:44:41Z
  Generation:          2
  Resource Version:    5805305
  Self Link:           /apis/spanner.cnrm.cloud.google.com/v1beta1/namespaces/default/spannerinstances/spannerinstance-sample
  UID:
Spec:
  Config:        northamerica-northeast1-a
  Display Name:  Spanner Instance Sample
  Num Nodes:     1
Status:
  Conditions:
    Last Transition Time:  2020-09-18T18:44:41Z
    Message:               Update call failed: error fetching live state: error reading underlying resource: Error when reading or editing SpannerInstance "test-connector/spannerinstance-sample": googleapi: Error 403: Request had insufficient authentication scopes.
    Reason:                UpdateFailed
    Status:                False
    Type:                  Ready
Events:
  Type     Reason        Age    From                        Message
  ----     ------        ----   ----                        -------
  Warning  UpdateFailed  6m41s  spannerinstance-controller  Update call failed: error fetching live state: error reading underlying resource: Error when reading or editing SpannerInstance "test-connector/spannerinstance-sample": googleapi: Error 403: Request had insufficient authentication scopes.
I'm not really sure what's going on here, because my cnrm service account has ownership of the project my cluster is in, and I have the APIs listed in the guide enabled.
The CC pods themselves appear to be healthy:
~ >>> kubectl wait -n cnrm-system --for=condition=Ready pod --all
pod/cnrm-controller-manager-0 condition met
pod/cnrm-deletiondefender-0 condition met
pod/cnrm-resource-stats-recorder-58cb6c9fc-lf9nt condition met
pod/cnrm-webhook-manager-7658bbb9-kxp4g condition met
Any insight in to this would be greatly appreciated!
Based on the error message you posted, I suspect it is an issue with your GKE scopes.
For GKE to access other GCP APIs, you must allow that access when creating the cluster. You can check the currently enabled scopes with:
gcloud container clusters describe <cluster-name>
and look for oauthScopes in the output.
Here you can see the scope name for Cloud Spanner; as a minimum, you must enable the scope https://www.googleapis.com/auth/cloud-platform.
To verify this in the GUI, go to Kubernetes Engine > <cluster-name> > expand the Permissions section and look for Cloud Platform.
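A quick sketch of checking this from the command line (the cluster name and zone are placeholders); note that node scopes can only be set when a cluster or node pool is created, so one option is to add a new node pool with the broader scope:

# Inspect the OAuth scopes of the existing node pools.
gcloud container clusters describe <cluster-name> --zone <zone> | grep -A 3 oauthScopes

# Example: create a new node pool with the cloud-platform scope (adjust flags to your cluster).
gcloud container node-pools create scoped-pool --cluster <cluster-name> --zone <zone> \
  --scopes=https://www.googleapis.com/auth/cloud-platform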

Kubernetes error building cluster, utility subnet can't be found

Why is it that when I try to update a new Kubernetes cluster it gives the following error:
$ kops update cluster --name k8s-web-dev
error building tasks: could not find utility subnet in zone: "us-east-1b"
I have not been able to deploy it into AWS yet; it only creates configs inside S3.
Also, because I have private and public subnets, I am manually updating the kops config to point to the correct subnet IDs, e.g. the IDs below were added manually.
subnets:
- cidr: 10.0.0.0/19
  id: subnet-3724bb40
  name: us-east-1b
  type: Private
  zone: us-east-1b
- cidr: 10.0.64.0/19
  id: subnet-918a35c8
  name: us-east-1c
  type: Private
  zone: us-east-1c
- cidr: 10.0.32.0/20
  id: subnet-4824bb3f
  name: utility-us-east-1b
  type: Public
  zone: us-east-1b
- cidr: 10.0.96.0/20
  id: subnet-908a35c9
  name: utility-us-east-1c
  type: Public
  zone: us-east-1c
Also, interestingly enough, I made no change to my config, but when I run kops update once and then once more, I get two different results. How is that possible?
kops update cluster --name $n
error building tasks: could not find utility subnet in zone: "us-east-1c"
and then this
kops update cluster --name $n
error building tasks: could not find utility subnet in zone: "us-east-1b"
Using the --bastion parameter with the kops command-line options assumes that a bastion instance group is already in place. To create a bastion instance group, you can use the --role flag:
kops create instancegroup bastions --role Bastion --subnet $SUBNET
Check this link for more information.
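Putting it together, a sketch with placeholder values (the cluster name and subnet below are examples; substitute your own):

# Placeholder values.
export NAME=k8s-web-dev
export SUBNET=utility-us-east-1b

# Create the bastion instance group in a utility (public) subnet,
# then preview and apply the cluster update.
kops create instancegroup bastions --role Bastion --subnet $SUBNET --name $NAME
kops update cluster --name $NAME        # preview
kops update cluster --name $NAME --yes  # apply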

Pulling Images from GCR into GKE

Today is my first day playing with GCR and GKE. So apologies if my question sounds childish.
So I have created a new registry in GCR. It is private. Using this documentation, I got hold of my Access Token using the command
gcloud auth print-access-token
#<MY-ACCESS_TOKEN>
I know that my username is oauth2accesstoken
On my local laptop when I try
docker login https://eu.gcr.io/v2
Username: oauth2accesstoken
Password: <MY-ACCESS_TOKEN>
I get:
Login Successful
So now it's time to create a docker-registry secret in Kubernetes.
I ran the below command:
kubectl create secret docker-registry eu-gcr-io-registry --docker-server='https://eu.gcr.io/v2' --docker-username='oauth2accesstoken' --docker-password='<MY-ACCESS_TOKEN>' --docker-email='<MY_EMAIL>'
And then my Pod definition looks like:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: eu.gcr.io/<my-gcp-project>/<repo>/<my-app>:latest
    ports:
    - containerPort: 8090
  imagePullSecrets:
  - name: eu-gcr-io-registry
But when I spin up the pod, I get the ERROR:
Warning Failed 4m (x4 over 6m) kubelet, node-3 Failed to pull image "eu.gcr.io/<my-gcp-project>/<repo>/<my-app>:latest": rpc error: code = Unknown desc = Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
I verified my secrets checking the YAML file and doing a base64 --decode on the .dockerconfigjson and it is correct.
So what have I missed here?
If your GKE cluster and GCR registry are in the same project: you don't need to configure authentication. GKE clusters are authorized to pull from private GCR registries in the same project without any extra config. (Very likely this is your case!)
If your GKE cluster and GCR registry are in different GCP projects: follow these instructions to give the service account of your GKE cluster access to read private images in your GCR registry: https://cloud.google.com/container-registry/docs/access-control#granting_users_and_other_projects_access_to_a_registry
In a nutshell, this can be done by:
gsutil iam ch serviceAccount:[PROJECT_NUMBER]-compute@developer.gserviceaccount.com:objectViewer gs://[BUCKET_NAME]
where [BUCKET_NAME] is the GCS bucket storing your GCR images (like artifacts.[PROJECT-ID].appspot.com) and [PROJECT_NUMBER] is the project number of the GCP project hosting your GKE cluster.
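For illustration, a filled-in sketch with placeholder values; for eu.gcr.io images the bucket is typically eu.artifacts.[PROJECT-ID].appspot.com, but verify yours first.

# Placeholder project number and project ID; substitute your own.
gsutil ls   # list buckets in the registry project to find the GCR storage bucket
gsutil iam ch \
  serviceAccount:123456789012-compute@developer.gserviceaccount.com:objectViewer \
  gs://eu.artifacts.my-gcp-project.appspot.com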