Kubernetes cluster is suddenly down

Yesterday, my Kubernetes cluster suddenly went down.
I tried to investigate as follows, but I'm not sure what the reason was:
Unable to access the Kube Dashboard; it returns HTTP ERROR 502
Unable to access deployed apps on the cluster; they also return a 502 error
Cannot use the kubectl command; it shows the message: "Unable to connect to the server: x509: certificate has expired or is not yet valid"
I googled this error and found an article, but I'm not sure whether it applies.
Can you please advise?
Thank you so much.
Environment:
Kubernetes 1.5
Kube-aws
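
Given the x509 error, a sensible first check is whether the cluster's TLS certificates have expired. A minimal sketch, assuming shell access to a controller node; the path below is where kube-aws typically placed its TLS assets, so adjust it for your setup:

# Print the expiry date of the API server certificate on a controller node
openssl x509 -noout -enddate -in /etc/kubernetes/ssl/apiserver.pem

# Or inspect the certificate the live endpoint actually serves
# (replace <api-server-host> with your API server's address)
echo | openssl s_client -connect <api-server-host>:443 2>/dev/null | openssl x509 -noout -enddate

If the certificate has indeed expired, it needs to be regenerated and the control-plane components restarted before kubectl and the dashboard will work again.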

Related

kubectl command to GKE Autopilot sometimes returns a Forbidden error

Environment:
GKE Autopilot v1.22.12-gke.2300
kubectl run from an Ubuntu 20.04 VM
using the gke-gcloud-auth-plugin
What happens:
The kubectl command sometimes returns a (Forbidden) error, e.g.:
kubectl get pod
Error from server (Forbidden): pods is forbidden: User "my-mail#domain.com" cannot list resource "pods" in API group "" in the namespace "default": GKEAutopilot authz: the request was sent before policy enforcement is enabled
It doesn't always happen (roughly 40% of the time), so it must not be an IAM problem.
Before, on GKE Autopilot v1.21.xxxx I think, this error didn't happen; at least not as frequently.
I couldn't find any helpful info even when I searched for "GKEAutopilot authz" or "the request was sent before policy enforcement is enabled".
I hope someone who has faced the same issue has an idea.
Thank you in advance.
I asked Google Cloud support.
They said it was a bug in the GKE master, and they have fixed it.
This problem doesn't happen anymore.

Unable to enter a pod in the GKE cluster

We have our k8s cluster set up with our app, including a Neo4j DB deployment and other artifacts. Overnight, we've started facing an issue in our GKE cluster when trying to enter or otherwise interact with any pod running in the cluster. Any such command fails with:
error: unable to upgrade connection: Authorization error (user=kube-apiserver, verb=create, resource=nodes, subresource=proxy)
Our GKE cluster was created as Standard (not Autopilot); the node pool details and cluster basics were attached as screenshots.
As said before, it was working fine despite the warning about the versions. However, we haven't yet been able to identify what could have changed between the last time it worked and now.
Any clue as to what authorization setup might have changed, making it incompatible now, is very welcome.
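
The error names user=kube-apiserver being denied the nodes/proxy subresource, which is the API server calling the kubelet rather than your own account. A minimal sketch of how one might inspect the relevant default RBAC objects; the binding name kube-apiserver-kubelet-api-admin in the last command is an assumption, not a GKE-defined name:

# The default ClusterRole that grants full access to the kubelet API
kubectl get clusterrole system:kubelet-api-admin -o yaml

# Look for a binding granting that role to the user the API server uses
kubectl get clusterrolebindings -o wide | grep -i kubelet

# If the binding is missing, recreating it may restore exec/attach access
kubectl create clusterrolebinding kube-apiserver-kubelet-api-admin \
  --clusterrole=system:kubelet-api-admin --user=kube-apiserver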

cert-manager no endpoints available for service "cert-manager-webhook"

I'm facing this issue even with the latest cert-manager. I'm running on k8s v1.22, and the same chart was working as expected on v1.21.
error:
Not ready: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": no endpoints available for service "cert-manager-webhook"
This happens on the pod pod/cert-manager-startupapicheck-l4ccx started by job.batch/cert-manager-startupapicheck.
I am not sure why this is happening or how to fix it, as it looks like a k8s issue rather than a cert-manager one.
Can anyone please point me to some documentation or a similar case? I was not able to find anything related to this. I have read the cert-manager documentation and all the GitHub issues I could find, but was not able to get this fixed.
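
"No endpoints available" means the Service currently selects no ready pods. A minimal sketch of the usual checks, assuming cert-manager is installed in the cert-manager namespace and uses the standard Helm chart labels:

# Is the webhook pod running and Ready?
kubectl get pods -n cert-manager

# Does the webhook Service have any endpoints behind it?
kubectl get endpoints cert-manager-webhook -n cert-manager

# If the pod is not Ready, its events and logs usually say why
kubectl describe pod -n cert-manager -l app.kubernetes.io/name=webhook
kubectl logs -n cert-manager -l app.kubernetes.io/name=webhook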

Why would kubectl logs return Authorization error?

I am trying to get logs from a running pod with kubectl logs grafana-6bfd846fbd-nbv8r
and I am getting the following output:
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver, verb=get, resource=nodes, subresource=proxy)
I tried to figure out why I would not have this specific authorization even though I can manage everything with this user, but I have no clue. The weirdest part is that when I run kubectl auth can-i get pod/logs I get:
yes
After a few hours of going through ClusterRoles and ClusterRoleBindings, I am stuck and do not know what to do to get authorized. Thanks for your help!
The failure is the kube-apiserver trying to access the kubelet; it is not related to your user. This indicates your core system RBAC rules might be corrupted. Check whether your installer or K8s distro has a way to validate or repair them (most don't), or create a new cluster and compare its rules to yours.
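
One concrete way to do that comparison, sketched below; the kubeconfig context names broken and fresh are placeholders, and expect some noise in the diff from metadata such as resourceVersion and timestamps:

# Dump cluster-scoped RBAC from both clusters and diff to spot missing
# or altered default roles and bindings
kubectl --context broken get clusterroles,clusterrolebindings -o yaml > broken-rbac.yaml
kubectl --context fresh get clusterroles,clusterrolebindings -o yaml > fresh-rbac.yaml
diff broken-rbac.yaml fresh-rbac.yaml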

Port-forwarding fails on k8s GCP tutorial

I'm a k8s beginner and struggling with the error below.
E0117 18:24:47.596238 53015 portforward.go:400]
an error occurred forwarding 9999 -> 80: error forwarding port 80 to pod XXX,
uid : exit status 1: 2020/01/17 09:24:47 socat[840136] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused
I don't even know what the error means, let alone what causes it. Does anyone know in which situations this error occurs?
The error occurs while working through GCP's Deployment Manager tutorial, using the tutorial project GCP provides:
https://github.com/GoogleCloudPlatform/deploymentmanager-samples/tree/master/examples/v2/gke
The error occurs when running this command:
curl localhost:9999
If any expression is ambiguous or extra information is required, please let me know.
Thanks in advance!
The error is telling you that there's nothing listening on port 80 inside the pod. You should check the pod state:
kubectl get pods
Maybe it has crashed. Also check the log of the pod:
kubectl logs <pod-name>
Note that kubectl get pods alone does not show ports; to see which port(s) the pod's containers listen on, describe it with kubectl describe pod <pod-name>.
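
Once you know the container port the pod actually listens on, re-run the forward against it. A minimal sketch; the pod name my-app-pod and port 8080 are assumptions for illustration:

# Forward local port 9999 to the port the container really listens on
kubectl port-forward my-app-pod 9999:8080

# Then, in another terminal:
curl localhost:9999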
By the way, Google's Deployment Manager is a very particular kind of tool; Google itself suggests using Terraform instead. It is nevertheless part of their certification exams.