GCE Image suddenly not found - kubernetes

I'm running Kubernetes on GCE. I used kube-up.sh to create the cluster, and the nodes and masters are all running the image gci-stable-56-9000-84-2. I deleted a few nodes today, which triggered the autoscaler to create new ones, but they failed with the following error:
Instance 'kubernetes-minion-30gb-20180131-9jwn' creation failed: The resource 'projects/google-containers/global/images/gci-stable-56-9000-84-2' was not found (when acting as 'REDACTED')
Is it possible this image was deleted somehow? I don't think I changed any access controls or permissions for any service accounts.
The image is present on this page:
https://cloud.google.com/container-optimized-os/docs/release-notes#cos-stable-56-9000-84-2

This error could be due to authentication issues. Re-authenticate to the gcloud command-line tool with the command 'gcloud auth login'.
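For example, to re-authenticate and then confirm which account gcloud is acting as (a quick sketch; the accounts listed will be whatever you are logged in with):

gcloud auth login
gcloud auth list    # the account marked active is the one gcloud acts as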
It could also be that the Kubernetes Engine service account has been deleted or edited. Check this: https://cloud.google.com/kubernetes-engine/docs/troubleshooting#error_404

Related

Unable to enter a pod in the gke cluster

We have our k8s cluster set up with our app, including a Neo4j DB deployment and other artifacts. Overnight, we started facing an issue in our GKE cluster when trying to enter or otherwise interact with any pod running in the cluster. The following is a sample of the error we get for any issued command:
error: unable to upgrade connection: Authorization error (user=kube-apiserver, verb=create, resource=nodes, subresource=proxy)
Our GKE cluster is created as Standard (not Autopilot); the node pool details and cluster basics (versions) are shown in screenshots.
As said before, it was working fine regardless of the warning about the versions. However, we haven't yet been able to identify what could have changed between the last time it worked and now.
Any clue about what authorization setup might have changed to make it incompatible now is very welcome.

Missing edit permissions on a kubernetes cluster on GCP

This is a Google Cloud-specific problem.
I returned from vacation and noticed I can no longer manage workloads or the cluster due to this error: "Missing edit permissions on account"
I am the sole person with access to this account (owner role), and yet I see this issue.
The troubleshooting guide suggests checking the system service account role, and it looks like it's set up correctly (why would it not be, if I haven't edited it?).
If it's not set up correctly, the guide suggests turning the Kubernetes API off and back on in GCP, but when you press "disable" there is a scary-looking prompt that your Kubernetes resources are going to be deleted, so obviously I can't do that.
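(For reference, one command-line way to check that role; PROJECT_ID and PROJECT_NUMBER are placeholders for your own project, and the expected binding for the GKE service agent is roles/container.serviceAgent:)

gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
    --format="table(bindings.role, bindings.members)"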
Upon trying to connect to it I get
gcloud container clusters get-credentials cluster-1 --zone us-west1-b --project PROJECT_ID
Fetching cluster endpoint and auth data.
WARNING: cluster cluster-1 is not running. The kubernetes API may not be available.
In the logs I found a record (the last one) that is 4 days old:
"Readiness probe failed: Get http://10.20.0.5:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Does anyone here have any ideas?
Thanks in advance.
The issue is solved: I had to upgrade the node versions in the pool.
What a misleading error message.
Hopefully, this helps someone.
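For anyone hitting the same thing, node pools can also be upgraded from the command line (a sketch; default-pool is an assumed pool name, the cluster and zone come from the question above):

gcloud container clusters upgrade cluster-1 \
    --node-pool default-pool \
    --zone us-west1-b

Without --cluster-version this upgrades the nodes to match the current master version.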

How to enable the Kubernetes API in GCP? Not sorted out by following the docs

I am learning GCP and wanted to create a Kubernetes cluster with an instance. Here is what I did and what I followed, with no success:
First, I set the region to my default, us-east1-b:
xenonxie@cloudshell:~ (rock-perception-263016)$ gcloud config set compute/region us-east1-b
Updated property [compute/region].
Now proceed to create it:
xenonxie@cloudshell:~ (rock-perception-263016)$ gcloud container clusters create my-first-cluster --num-nodes 1
ERROR: (gcloud.container.clusters.create) One of [--zone, --region]
must be supplied: Please specify location.
So it seems the default region/zone us-east1-b is NOT picked up.
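Worth noting: us-east1-b is a zone rather than a region, so the compute/region property set above never gives gcloud a usable default. A sketch of setting the default zone instead, which gcloud container clusters create should pick up (assuming us-east1-b is the intended zone):

gcloud config set compute/zone us-east1-b
gcloud container clusters create my-first-cluster --num-nodes 1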
I then run the same command again with region specified explicitly:
xenonxie@cloudshell:~ (rock-perception-263016)$ gcloud container clusters create my-first-cluster --num-nodes 1 --zone us-east1-b
WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using --no-enable-ip-alias flag. Use --[no-]enable-ip-alias flag to suppress this warning.
WARNING: Newly created clusters and node-pools will have node auto-upgrade enabled by default. This can be disabled using the --no-enable-autoupgrade flag.
WARNING: Starting in 1.12, default node pools in new clusters will have their legacy Compute Engine instance metadata endpoints disabled by default. To create a cluster with legacy instance metadata endpoints disabled in the default node pool, run clusters create with the flag --metadata disable-legacy-endpoints=true.
WARNING: Your Pod address range (--cluster-ipv4-cidr) can accommodate at most 1008 node(s).
This will enable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Kubernetes Engine API is not enabled for this project. Please ensure it is enabled in Google Cloud Console and try again: visit https://console.cloud.google.com/apis/api/container.googleapis.com/overview?project=rock-perception-263016 to do so.
From the warning/error it seems I need to enable the Kubernetes Engine API, and a link is provided already. Wonderful. I clicked the link, it took me to enable the API, and I did. Right after enabling it, I was prompted to create credentials before I could use the API.
Clicking into it and choosing the right API, as you can see from the screenshot, doesn't give me a button to create the credential.
What is missing here?
Thank you very much.
Once the API is enabled, you can go ahead and create the cluster. The credentials are not used when you use gcloud, since the SDK wraps the API call and uses your logged-in user credentials.
As long as the Kubernetes Engine API shows as enabled, you should be able to run the same command you used and the cluster will be created. Most of those messages are just warnings letting you know about default settings that you did not specify.
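For reference, the API can also be enabled straight from Cloud Shell rather than through the Console (a sketch using the project ID from the question):

gcloud services enable container.googleapis.com --project rock-perception-263016
gcloud container clusters create my-first-cluster --num-nodes 1 --zone us-east1-b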

kubernetes provisioner for pv in a statefulset with aws-ebs pv issue

I have followed the documentation on how to set up k8s on AWS, including:
Adding provider=aws
Making sure the nodes have the correct IAM permissions
I keep getting the following error and am unsure where to find logs to see the underlying error that is making the AWS query fail.
This is how the error looks:
Failed to provision volume with StorageClass "gp2": error querying for all zones: no instances returned
I faced the same issue and found the solution.
I hope this applies to your issue as well.
So every EC2 instance that is a node in your kubernetes cluster should have a tag
kubernetes.io/cluster/CLUSTERNAME = owned
When you request a new PersistentVolume, Kubernetes will request it from AWS. AWS will then check in which AZs you have worker nodes, so it doesn't create the volume in an AZ where there are no nodes.
It seems to do this by listing all EC2 instances with the tag kubernetes.io/cluster/CLUSTERNAME = owned.
But if you have changed or removed this tag, so that it no longer matches your cluster name, you will get the exact error message you got here.
Let's say you changed it to
kubernetes.io/cluster/CLUSTERNAME-default = owned
That would trigger the issue.
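To confirm which instances carry the tag and which AZs they cover, an AWS CLI query along these lines can help (a sketch; CLUSTERNAME is a placeholder for your actual cluster name):

aws ec2 describe-instances \
    --filters "Name=tag-key,Values=kubernetes.io/cluster/CLUSTERNAME" \
    --query "Reservations[].Instances[].[InstanceId,Placement.AvailabilityZone]" \
    --output table

If your worker nodes are missing from this output, the tag is the likely culprit.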

kubernetes can't pull certain images from ibm cloud registry

My pod reports the following event:
Warning Failed 21m (x4 over 23m) kubelet, 10.76.199.35 Failed to pull image "registryname/image:version1.2": rpc error: code = Unknown desc = Error response from daemon: unauthorized: authentication required
but other images will work. The output of
ibmcloud cr images
doesn't show anything different about the images that don't work. What could be going wrong here?
Given that this is in Kubernetes and you can see the image in ibmcloud cr images, it is most likely a misconfiguration of your imagePullSecrets.
If you do kubectl get pod <pod-name> -o yaml, you will be able to see what imagePullSecrets are in scope for the pod and check if they look correct (it could be worth comparing with a pod that is working).
It's worth noting that if your cluster is an instance of the IBM Cloud Kubernetes Service, a default imagePullSecret for your account is added to the default namespace, so if you are running the pod in a different Kubernetes namespace you will need to do additional steps to make it work. This is a good place to start for information on this topic:
https://console.bluemix.net/docs/containers/cs_images.html#other
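For example, to see which imagePullSecrets the failing pod actually references and what secrets exist in its namespace (a sketch; the pod name and namespace are placeholders):

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'
kubectl get secrets -n <namespace>
kubectl get secrets -n default    # compare with the default namespace, where IKS adds its registry secret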
It looks like you haven't logged in to the IBM Cloud Container Registry. If you haven't done this yet, you should log in with this command:
ibmcloud cr login
Other possible issues are:
Docker is not installed.
The Docker client is not logged in to IBM Cloud Container Registry.
Your IBM Cloud access token might have expired.
You can find more troubleshooting instructions here.
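For completeness, a minimal re-login sequence (assuming the ibmcloud CLI and Docker are installed locally):

ibmcloud login      # refreshes your IBM Cloud access token
ibmcloud cr login   # logs the local Docker client in to IBM Cloud Container Registry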