Multiple K8s Issues relating to Webhooks, Logs & I/O Timeouts

I have a really weird issue with one of my Linode K8s clusters running 1.23: there are multiple issues occurring and I can't quite pinpoint the root cause.
Linode have let me know it is not an issue with the master and there is nothing wrong on their end, so let me highlight all the identified problems to start.
Logs not Working
When trying to pull logs from any pod I get this error (which makes it very hard to troubleshoot):
root@aidan:~# kubectl logs <pod-name> -n revwhois-subdomain-enum
Error from server: Get "https://192.168.150.102:10250/containerLogs/revwhois-subdomain-enum/tldbrr-revwhois-worker12-twppv/tldbrr-revwhois-worker12": dial tcp 192.168.150.102:10250: i/o timeout
Metrics not Working
root@m0chan:~# kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
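My understanding is that this means the aggregated metrics API is unavailable; one way to confirm, assuming metrics-server is installed in kube-system with its default labels (an assumption), would be:
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl -n kube-system get pods -l k8s-app=metrics-server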
Pod Deletion not Working
When deleting a pod with kubectl delete pod <pod-name> -n <namespace>, the delete is accepted but the pod gets stuck in a Terminating state; the old pod is never removed and a new pod is not launched.
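A sketch of how I could inspect one of the stuck pods, on the assumption that a finalizer or the unreachable kubelet is what holds it in Terminating (names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force   # only removes the API object; the container may linger if the kubelet is unreachable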
Errors Editing Ingress
Error from server (InternalError): error when creating "yaml/xxx/xxx-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: Temporary Redirect
I also have errors in the metrics and cert-manager logs relating to failed calling webhook.
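For the webhook side, checks along these lines should show which admission webhooks are registered and whether the backing admission service has endpoints (the service name and namespace below are the ingress-nginx defaults, which may not match my install):
kubectl get validatingwebhookconfigurations
kubectl -n ingress-nginx get svc ingress-nginx-controller-admission
kubectl -n ingress-nginx get endpoints ingress-nginx-controller-admission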
This is all for now and I would really appreciate some help resolving this.
Aidan

Related

kubectl exec fails intermittently with error: unable to upgrade connection: Authorization error

I am just trying to do kubectl exec on one of my pods. When I look at the pod status, it is all fine. My worker nodes are also in a good state. But when I try kubectl exec, it fails intermittently with the below error. I am not able to understand why this is happening. We guessed it might be because of the worker nodes, so we deployed all new worker nodes freshly, but we still see the issue.
error: unable to upgrade connection: Authorization error (user=cluster_admin, verb=create, resource=nodes, subresource=proxy)
Any help is much appreciated. Thanks
Looks like it's not an issue with kubectl exec itself; it's failing due to an authorization issue.
First, verify access:
kubectl auth can-i create pods/exec
yes
kubectl auth can-i get pods/exec
yes
If you have admin access with kubectl and you are getting output from these commands, your kubectl-to-API-server connection is good.
It could be failing at the kubelet level, as the kubelet might be configured to authenticate/authorize all requests while the API server is not presenting the expected client credentials.
You can read more about it at: https://kubernetes.io/docs/reference/access-authn-authz/kubelet-authn-authz/#kubelet-authentication
The API server uses the --kubelet-client-certificate and --kubelet-client-key flags to authenticate to the kubelet.
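As a reference point, and not taken from the cluster in question, the relevant kube-apiserver flags look roughly like this (the paths shown are the kubeadm defaults; other installers place the client certificate elsewhere, and the remaining flags are omitted):
kube-apiserver \
  --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key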

My pods get SIGTERM and exit gracefully as part of the signal handler, but I can't find the root cause of why the kubelet sends SIGTERM to my pods

My pods are getting SIGTERM automatically for an unknown reason, and finding the root cause of why the kubelet sends SIGTERM to my pods is what I need in order to fix the issue.
When I run kubectl describe pod <podname> -n <namespace>, only a Killing event is present under the Events section; I don't see any unhealthy status before the kill event.
Is there any way to debug further with pod events, or are there specific log files where we can find a trace of the reason for sending SIGTERM?
I tried to do kubectl describe on the Killing event, but there seems to be no command to drill down into events further.
Any other approach to debugging this issue is appreciated. Thanks in advance!
kubectl describe pods snippet
Please can you share the YAML of your deployment so we can try to replicate your problem?
Based on your attached screenshot, it looks like your readiness probe repeatedly failed to complete (it didn't run and fail, it failed to complete entirely), and therefore the cluster killed the pod.
Without knowing what your Docker image is doing, it is hard to debug from here.
As a first point of debugging, you can try kubectl logs -f -n {namespace} {pod-name} to see what the pod is doing and whether it's erroring there.
The error Client.Timeout exceeded while waiting for headers implies your container is proxying something, so perhaps the upstream you're trying to proxy isn't responding.
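To see exactly which probe is configured and what the kubelet recorded for the pod, something along these lines should work (namespace and pod name are placeholders, matching the command above):
kubectl -n {namespace} get pod {pod-name} -o jsonpath='{.spec.containers[*].readinessProbe}'
kubectl -n {namespace} get events --field-selector involvedObject.name={pod-name}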

EKS kubectl logs <podname> suddenly stops working

I have pods running on EKS, and pulling the container logs worked fine a couple of days ago. But today when I try to run kubectl logs podname I get a TLS error.
Error from server: Get "https://host:10250/containerLogs/dev/pod-748b649458-bczdq/server": remote error: tls: internal error
Does anyone know how to fix this? The other answers on Stack Overflow seem to suggest deleting the Kubernetes cluster and rebuilding it... Are there no better solutions?
This could probably be due to some firewall rules or security settings that were introduced. I would encourage you to check that, along with the following troubleshooting steps (a few quick commands are sketched after this list):
Ensure all EKS nodes are in the Running state.
Restart nodes as required.
Check the networking configuration and see whether other kubectl commands are working.
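A minimal set of checks along those lines, assuming nothing EKS-specific beyond a working kubeconfig:
kubectl get nodes -o wide
kubectl get pods -A
kubectl describe node <node-name>   # look at Conditions and recent Events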

Why would kubectl logs return an Authorization error?

I am trying to get logs from a running pod using kubectl logs grafana-6bfd846fbd-nbv8r
and I am getting the following output:
Error from server (InternalError): Internal error occurred: Authorization error (user=kube-apiserver, verb=get, resource=nodes, subresource=proxy)
I tried to figure out why I would not have this specific authorisation even though I can manage everything with this user, but I have no clue. The weirdest part is that when I run kubectl auth can-i get pod/logs I get:
yes
After a few hours of going through ClusterRoles and ClusterRoleBindings, I am stuck and do not know what to do to be authorized. Thanks for your help!
The failure is the kube-apiserver trying to access the kubelet; it is not related to your user. This indicates your core system RBAC rules might be corrupted. Check whether your installer or K8s distro has a way to validate or repair them (most don't), or make a new cluster and compare its rules to yours.
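If it does turn out to be the kubelet-facing RBAC, one hedged thing to look for: the API server's kubelet client identity is normally bound to the built-in system:kubelet-api-admin ClusterRole, and the user shown in your error is kube-apiserver, so (assuming that identity is really the one your API server presents) a check and a possible repair would be:
kubectl get clusterrolebindings -o wide | grep kubelet-api-admin
kubectl create clusterrolebinding kube-apiserver-kubelet-api-admin --clusterrole=system:kubelet-api-admin --user=kube-apiserver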

How to find the pod that led to an error in GKE

If I look at my logs in GCP Logging, I see for instance that I got a request that returned a 500:
log_message: "Method: some_cloud_goo.Endpoint failed: INTERNAL_SERVER_ERROR"
I would like to quickly go to that pod and do a kubectl logs on it. But I did not find a way to do this.
I am fairly new to K8s and GKE; is there any way to trace back the pod that handled that request?
You could run command "kubectl get pods " on each node to check the status of all pods and could figure out accordingly by running for detail description of an error " kubectl describe pod pod-name"
As mentioned in @Neelam's answer, you can get the pod names with the command kubectl get pods -A and go through all your pods to find the error (a small loop for this is sketched at the end of this answer).
Or, alternatively, you could deploy a custom monitoring system like Elastic GKE Logging, available in GCP's GitHub Click-to-deploy.
See here to install it from the Marketplace with a few clicks.
It is a free alternative for a complete monitoring system, and you can filter your logs in the Kibana dashboard once it is deployed.
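If you'd rather script the "go through all your pods" step than do it by hand, a rough loop like this would flag which pods logged the failure (the namespace, time window and error string are placeholders; add --all-containers for multi-container pods):
for p in $(kubectl get pods -n <namespace> -o name); do
  kubectl logs -n <namespace> "$p" --since=1h | grep -q "INTERNAL_SERVER_ERROR" && echo "$p"
done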