EKS kubectl logs <podname> suddenly stops working - kubernetes

I have pods running on EKS, and pulling the container logs worked fine a couple of days ago, but today when I tried to run kubectl logs <podname> I got a TLS error:
Error from server: Get "https://host:10250/containerLogs/dev/pod-748b649458-bczdq/server": remote error: tls: internal error
Does anyone know how to fix this? The other answers on Stack Overflow seem to suggest deleting the Kubernetes cluster and rebuilding it... Is there no better solution?

This could be due to firewall rules or security settings that were introduced recently. I would encourage you to check those, along with the following troubleshooting steps (a sketch of the commands follows the list):
Ensure all EKS nodes are in the Running state.
Restart nodes as required.
Check the networking configuration and see whether other kubectl commands still work.
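A minimal sketch of the checks above, assuming you have working kubectl credentials and network access to a node's private IP (the IP is a placeholder; kubectl logs is served by the kubelet on TCP 10250, the port shown in the error):
kubectl get nodes -o wide
kubectl get pods --all-namespaces
nc -zv <node-private-ip> 10250
The first two confirm the nodes are Ready and that other kubectl commands still reach the API server; the last one checks whether the kubelet port is reachable, which is what a blocked security-group or firewall rule would break.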

Related

Why does kubectl give an "unable to list cronjobs" error while trying to list pods?

I'm having a weird issue at times: when trying to list pods in a Kubernetes cluster, it gives me the exact same error, which has nothing to do with cronjobs. The issue gets fixed if I restart the terminal (sometimes I have to restart the computer). When I'm having this issue, I checked with other computers and they don't have any problems, so I believe something is wrong on my end. Does anyone have any idea why I keep running into this?
➜ ~ kubectl get pods
Error from server (NotFound): Unable to list "tap.linkerd.io/v1alpha1, Resource=cronjobs": the server could not find the requested resource (get cronjobs.tap.linkerd.io)
Edit:
I can list deployments and cronjobs without any issues; this happens only when I run get pods. It also fixes itself if I wait a while.
This may be an indication of a version mismatch between your kubectl client and your server, not anything specific to Linkerd. You can confirm with kubectl version --short whether or not that is the case.
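A quick way to check is to compare the Client Version and Server Version lines in the output of:
kubectl version --short
If the client is more than one minor release away from the server, installing a kubectl that matches the server is a reasonable first thing to try.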

Deployed a service on k8s but not showing any pods even when it failed

I have deployed a k8s service, however it's not showing any pods. This is what I see:
kubectl get deployments
It should create the pods in the default namespace.
kubectl get nodes (this shows me nothing)
How do I troubleshoot a failed deployment? The test-control-plane is the one deployed by kind; that is the k8s cluster I'm using.
kubectl get nodes
If the above command doesn't show anything, it means there are no nodes in your cluster, so where would your workload run?
You need at least one worker node in the K8s cluster so the deployment can schedule pods on it and run the application.
You can check the worker nodes using the same command:
kubectl get nodes
You can debug further and check the reason for the issue using:
kubectl describe deployment <name of your deployment>
To find out what really went wrong, first follow the steps described by Harsh Manvar in his answer. Perhaps that information will help you find the problem. If not, check the logs of your deployment: list your pods, see which ones did not start properly, then check their logs.
You can also use kubectl describe on the pods to see in more detail what went wrong. Since you are using kind, I'm including a list of known errors for you.
You can also see this visual guide on troubleshooting Kubernetes deployments and 5 Tips for Troubleshooting Kubernetes Deployments.
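A rough sequence for that pod-level check, with the namespace and names as placeholders:
kubectl get pods -n default
kubectl describe pod <pod-name> -n default
kubectl logs <pod-name> -n default
kubectl get events -n default --sort-by=.metadata.creationTimestamp
describe and the events list usually surface scheduling or image-pull failures; the logs show application-level crashes.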

Cert-manager fails on kubernetes with webhooks

I'm following the Kubernetes install instructions for Helm: https://docs.cert-manager.io/en/latest/getting-started/install/kubernetes.html
With cert-manager v0.8.1 on Kubernetes v1.15, Ubuntu 18.04, on-premise.
When I get to testing the installation, I get these errors:
error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "issuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
If I apply the test-resources.yaml before installing with Helm, I'm not getting the errors but it is still not working.
These errors are new to me, as Cert-manager used to work for me on my previous install about a month ago, following the same installation instructions.
I've also tried cert-manager v0.7.2 (CRD 0.7), as I think that was the last version I managed to get installed, but it's not working either.
What do these errors mean?
Update: It turned out to be an internal CoreDNS issue on my cluster; it was somehow not configured correctly, possibly related to a wrong POD_CIDR configuration.
If you experience this problem, check the logs of CoreDNS (or kube-dns) and you may see lots of errors related to contacting services. Unfortunately, I no longer have the errors,
but this is how I figured out that my network setup was invalid.
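If you want to do the same check, the CoreDNS (and older kube-dns) pods carry the k8s-app=kube-dns label in the kube-system namespace, so something like this should surface the errors:
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns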
I'm using Calico (this will apply to other network plugins as well), and its pool was not set to the same network as the POD_CIDR I initialized my Kubernetes cluster with.
Example
1. Set up Kubernetes:
kubeadm init --pod-network-cidr=10.244.0.0/16
2. Configure calico.yaml:
- name: CALICO_IPV4POOL_CIDR
value: "10.244.0.0/16"
I also tried cert-manager v0.8.0 with a very similar setup on Ubuntu 18.04 and k8s v1.14.1, and I began to get the same error when I tore down cert-manager using kubectl delete and reinstalled it, after experiencing some network issues on the cluster.
I stumbled on a solution that worked. On the master node, simply restart the apiserver container:
$ sudo docker ps -a | grep apiserver
af99f816c7ec gcr.io/google_containers/kube-apiserver@sha256:53b987e5a2932bdaff88497081b488e3b56af5b6a14891895b08703129477d85 "/bin/sh -c '/usr/loc" 15 months ago Up 19 hours k8s_kube-apiserver_kube-apiserver-ip-xxxxxc_0
40f3a18050c3 gcr.io/google_containers/pause-amd64:3.0 "/pause" 15 months ago Up 15 months k8s_POD_kube-apiserver-ip-xxxc_0
$ sudo docker restart af99f816c7ec
af99f816c7ec
$
Then try applying the test-resources.yaml again:
$ kubectl apply -f test-resources.yaml
namespace/cert-manager-test unchanged
issuer.certmanager.k8s.io/test-selfsigned created
certificate.certmanager.k8s.io/selfsigned-cert created
If that does not work, this github issue mentions that the master node might need firewall rules to be able to reach the cert-manager-webhook pod. The exact steps to do so will depend on which cloud platform you are on.
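It can also help to confirm that the webhook pod is running and that its APIService reports as available, since "the server is currently unable to handle the request" usually means the API server cannot reach the aggregated API behind the webhook (the names below follow the cert-manager v0.8 Helm install, so treat them as a guess for other versions):
kubectl -n cert-manager get pods | grep webhook
kubectl get apiservice | grep certmanager
The AVAILABLE column for the certmanager APIService should read True; if it shows something like FailedDiscoveryCheck, the API server cannot reach the webhook service, which points back at DNS or firewall problems.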

Recovery from kubectl crash

What is the best way to troubleshoot when kubectl doesn't respond or exits with a timeout? How do I get it working again?
Both kubectl and Helm stopped working against my cluster while I was installing a Helm chart.
General advice:
Check whether your kubectl is connecting to the correct kube-api endpoint. You could take a look at your kubeconfig; by default it is stored at $HOME/.kube/config. Try a simple curl against the endpoint to make sure it is not a DNS problem, etc. (see the sketch below).
Take a look at your nodes' logs by SSHing into the nodes you have: see this for more detailed instructions and log locations.
Once you have more information, you can start investigating the actual problem.
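For the first point, a minimal version of the check could look like this (the server URL is whatever the first command prints from your own kubeconfig; -k only skips certificate verification for a quick connectivity test):
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
curl -k <api-server-url>/healthz
kubectl config view only reads the local file, so it still works even when the cluster itself is unreachable; if the curl hangs or fails to resolve, the problem is on the network or DNS side rather than in Kubernetes.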

Unable to resolve hostname using `kubectl logs` or `kubectl exec`

I've created a Kubernetes cluster using CoreOS on AWS and I'm having trouble communicating with nodes from the master.
For example, operations like kubectl exec or kubectl logs fail with an error similar to the following:
Error from server: dial tcp: lookup ip-XXX-X-XXX-XXX.eu-west-1.compute.internal: no such host
I've found some issues on GitHub that describe the problem, so I know the team is aware of this bug, but I would like to ask here whether it's possible to use some workaround until it gets addressed.
One workaround mentioned was to use the --hostname-override flag, but as I'm on AWS, this flag is ignored (see #22984).
Related issues on GitHub: #22770 #22063.
Have you made sure you're using the right context?
kubectl config use-context my-cluster-name
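To see which contexts exist in your kubeconfig and which one is currently active (the active one is marked with an asterisk):
kubectl config get-contexts
kubectl config current-context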