K8s cluster deployment error: nc: bad address 'xx'

I have deployed a k8s cluster. After the application pods were installed, they stay in the Init state. I tried to find out what is going wrong and only got the error below.
pml@pml:~/bfn-mon/k8s$ kubectl get pods
NAME                          READY   STATUS     RESTARTS   AGE
broker-59f66ff494-lwtxq       0/1     Init:0/2   0          41m
coordinator-9998c64b8-ql7xz   0/1     Init:0/2   0          41m
kafka-0                       0/1     Init:0/1   0          41m
host@host:~$ kubectl logs kafka-0 -c init-zookeeper
nc: bad address 'zookeeper-0.zookeeper-headless-service.default.svc.cluster.local'
Can someone tell me what's going wrong and how I can fix it?
I'm hoping someone who has had the same problem, or knows what's going wrong, can give me some debugging instructions.

During Pod startup, the kubelet delays running init containers until the networking and storage are ready. Then the kubelet runs the Pod's init containers in the order they appear in the Pod's spec.
For pods stuck in the Init state with a "bad address" error, it may mean that a PVC was not recycled correctly, so the storage is not ready and the pod stays in the Init state until that is cleared up.
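For context, "bad address" is the message BusyBox nc prints when it cannot resolve the hostname it was given, so the init container is presumably looping until the DNS record for zookeeper-0 appears. A headless service by default only publishes that record once the pod is ready, which is why a ZooKeeper pod stuck on storage would also block Kafka. The Python sketch below mimics such a wait loop; the retry count and delay are hypothetical defaults, and the resolver is injectable so the loop can be tried without a cluster.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    host: str,
    attempts: int = 30,   # hypothetical retry budget
    delay: float = 2.0,   # hypothetical seconds between attempts
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Retry DNS resolution of `host` until it succeeds or attempts run out.

    Returns the resolved IP address; raises TimeoutError if it never resolves.
    """
    for attempt in range(1, attempts + 1):
        try:
            return resolve(host)
        except socket.gaierror:
            # Same situation as "nc: bad address": the name does not resolve yet,
            # e.g. because the pod behind the headless service is not ready.
            if attempt < attempts:
                time.sleep(delay)
    raise TimeoutError(f"{host} did not resolve after {attempts} attempts")
```

Inside a pod you would call wait_for_dns("zookeeper-0.zookeeper-headless-service.default.svc.cluster.local"); outside the cluster that name never resolves, so expect a TimeoutError there.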
From this link, you can follow the solutions below:
Check if PVs are created and bound to all expected PVCs.
Run /opt/kubernetes/bin/kube-restart.sh to restart the cluster.
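The first check can be scripted. This is a minimal sketch, assuming kubectl is installed and configured for the cluster: it parses the JSON from kubectl get pvc --all-namespaces -o json and lists every PersistentVolumeClaim whose status.phase is not Bound.

```python
import json
import subprocess
from typing import Any


def unbound_pvcs(pvc_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for every PVC whose phase is not Bound."""
    result = []
    for item in pvc_list.get("items", []):
        phase = item.get("status", {}).get("phase")  # Pending, Bound or Lost
        if phase != "Bound":
            meta = item["metadata"]
            result.append(f"{meta.get('namespace', 'default')}/{meta['name']}")
    return result


def load_pvcs_from_cluster() -> dict[str, Any]:
    """Fetch all PVCs as JSON; requires kubectl on PATH and cluster access."""
    out = subprocess.run(
        ["kubectl", "get", "pvc", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling unbound_pvcs(load_pvcs_from_cluster()) and getting an empty list means every claim is bound; anything listed is worth a kubectl describe pvc.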

Related

Ingress-nginx is in CrashLoopBackOff after K8s upgrade

After upgrading the Kubernetes node pool from 1.21 to 1.22, the ingress-nginx-controller pods started crashing. The same deployment has been working fine in EKS; I'm only having this issue in GKE. Does anyone have any ideas about the root cause?
$ kubectl logs ingress-nginx-controller-5744fc449d-8t2rq -c controller
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.3.1
Build: 92534fa2ae799b502882c8684db13a25cde68155
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
W0219 21:23:08.194770 8 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0219 21:23:08.194995 8 main.go:209] "Creating API client" host="https://10.1.48.1:443"
Ingress pod events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned infra/ingress-nginx-controller-5744fc449d-8t2rq to gke-infra-nodep-ffe54a41-s7qx
Normal Pulling 27m kubelet Pulling image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974"
Normal Started 27m kubelet Started container controller
Normal Pulled 27m kubelet Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" in 6.443361484s
Warning Unhealthy 26m (x6 over 26m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 502
Normal Killing 26m kubelet Container controller failed liveness probe, will be restarted
Normal Created 26m (x2 over 27m) kubelet Created container controller
Warning FailedPreStopHook 26m kubelet Exec lifecycle hook ([/wait-shutdown]) for Container "controller" in Pod "ingress-nginx-controller-5744fc449d-8t2rq_infra(c4c166ff-1d86-4385-a22c-227084d569d6)" failed - error: command '/wait-shutdown' exited with 137: , message: ""
Normal Pulled 26m kubelet Container image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" already present on machine
Warning BackOff 7m7s (x52 over 21m) kubelet Back-off restarting failed container
Warning Unhealthy 2m9s (x55 over 26m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 502
The Beta API versions (extensions/v1beta1 and networking.k8s.io/v1beta1) of Ingress are no longer served (removed) for GKE clusters created on versions 1.22 and later. Please refer to the official GKE ingress documentation for changes in the GA API version.
Also refer to the official Kubernetes documentation on API removals in Kubernetes v1.22 for more information.
Before upgrading your Ingress API as a client, make sure that every ingress controller that you use is compatible with the v1 Ingress API. See Ingress Prerequisites for more context about Ingress and ingress controllers.
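To find manifests that still use one of the removed Ingress API versions, a small check like the one below can help. It is a sketch that assumes the manifests have already been parsed into Python dicts (for example with PyYAML's yaml.safe_load_all, which the block itself does not use).

```python
from typing import Any

# Ingress API versions no longer served on GKE clusters at 1.22 and later.
REMOVED_INGRESS_APIS = {"extensions/v1beta1", "networking.k8s.io/v1beta1"}


def outdated_ingresses(manifests: list[dict[str, Any]]) -> list[str]:
    """Return the names of Ingress manifests that still use a removed beta API."""
    return [
        m.get("metadata", {}).get("name", "<unnamed>")
        for m in manifests
        if m.get("kind") == "Ingress" and m.get("apiVersion") in REMOVED_INGRESS_APIS
    ]
```

Every name returned needs to be migrated to networking.k8s.io/v1, whose Ingress schema differs from the beta one (for example in how backends are specified).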
Also check the following possible causes of CrashLoopBackOff:
Increasing the initialDelaySeconds value of the livenessProbe may help alleviate the issue, as it gives the container more time to start up and perform its initial work before the liveness probe starts checking its health.
Check the container restart policy: the spec of a Pod has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always.
Out of memory or resources: try increasing the VM size. Containers may crash due to memory limits; new ones are then spun up, the health check fails, and Ingress serves a 502.
Check whether externalTrafficPolicy=Local is set on the NodePort service, since it prevents nodes from forwarding traffic to other nodes.
Refer to the GitHub issue "Document how to avoid 502s" (#34) for more information.
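To get a feel for how much headroom initialDelaySeconds buys, here is a rough back-of-the-envelope calculation. It assumes every liveness probe fails, ignores timeoutSeconds and kubelet jitter, and falls back to the Kubernetes defaults (initialDelaySeconds=0, periodSeconds=10, failureThreshold=3) when nothing is overridden.

```python
def seconds_until_liveness_restart(
    initial_delay: int = 0, period: int = 10, failure_threshold: int = 3
) -> int:
    """Approximate seconds after container start before the kubelet restarts it,
    assuming every liveness probe fails (timeouts and jitter ignored).

    The first probe runs after `initial_delay`; the restart happens once
    `failure_threshold` consecutive probes have failed, i.e. at the probe that
    runs (failure_threshold - 1) periods after the first one.
    """
    return initial_delay + (failure_threshold - 1) * period
```

With the defaults this gives 20 seconds; with hypothetical values initial_delay=10, period=10, failure_threshold=5 it gives 50. If the controller needs longer than that to start serving its health endpoint, it will be killed in a loop, which matches the Killing and BackOff events above.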

A single pod is in ContainerCreating Phase and others are in Running Phase

What does it mean if a single pod of a particular service is in the ContainerCreating phase while the others are in the Running phase? Is my service down, or is there anything I need to worry about?
I didn't find anything reliable.
Please use kubectl describe pod NAME to check more details about the pod.
What does it mean if a single pod of a particular service is in the ContainerCreating phase while the others are in the Running phase?
It's not about the particular service; focus on the Deployment here. If a single pod is in the ContainerCreating phase while the others are Running and also Ready (1/1), your service is not down. It is up and running, because a Service forwards requests only to the Ready pods.
A Service routes traffic only to pods that are Ready (1/1), regardless of the status of the others (ContainerCreating, Running but not ready, ImagePullBackOff).
Conversely, if your pods are Running but none of them is Ready (1/1), you can consider your service down.
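The routing rule described above can be illustrated in a few lines. This is a simplified model, not how kube-proxy or the endpoints controller are actually implemented: it takes pod objects as returned by kubectl get pods -o json and keeps only those whose Ready condition is True, ignoring selectors, ports and terminating pods.

```python
from typing import Any


def is_ready(pod: dict[str, Any]) -> bool:
    """True if the pod's Ready condition is "True" (roughly what READY 1/1 shows)."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def routable_pods(pods: list[dict[str, Any]]) -> list[str]:
    """Names of the pods a Service would send traffic to: only the Ready ones."""
    return [p["metadata"]["name"] for p in pods if is_ready(p)]
```

A pod in ContainerCreating is in the Pending phase with Ready=False, so it is simply left out, while its Ready siblings keep receiving traffic.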
Extra
Consider leveraging a readiness probe.

minikube service url connection refused

I am a beginner with Kubernetes. I am trying to install minikube because I want to run my application in Kubernetes. I am using Ubuntu 16.04.
I have followed the installation instructions provided here
https://kubernetes.io/docs/setup/learning-environment/minikube/#using-minikube-with-an-http-proxy
Issue1:
After installing kubectl, virtualbox and minikube I have run the command
minikube start --vm-driver=virtualbox
It fails with the following error:
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
E0912 17:39:12.486830 17689 start.go:305] Error restarting cluster: restarting kube-proxy: waiting for kube-proxy to be up for configmap update: timed out waiting for the condition
But when I check VirtualBox, I see the minikube VM running, and when I run kubectl
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10
I can see the deployment:
kubectl get deployment
NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
hello-minikube   1         1         1            1           27m
I exposed the hello-minikube deployment as a service:
kubectl get service
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
hello-minikube   LoadBalancer   10.102.236.236   <pending>     8080:31825/TCP   15m
kubernetes       ClusterIP      10.96.0.1        <none>        443/TCP          19h
I got the URL for the service:
minikube service hello-minikube --url
http://192.168.99.100:31825
When I try to curl the URL, I get the following error:
curl http://192.168.99.100:31825
curl: (7) Failed to connect to 192.168.99.100 port 31825: Connection refused
1) If the minikube cluster failed while starting, how was kubectl able to connect to minikube to create deployments and services?
2) If the cluster is fine, then why am I getting connection refused?
I was looking at this proxy section (https://kubernetes.io/docs/setup/learning-environment/minikube/#starting-a-cluster): what is my_proxy in it? Is it the minikube IP and some port?
I have tried the suggestions for this error:
Error restarting cluster: restarting kube-proxy: waiting for kube-proxy to be up for configmap update: timed out waiting for the condition
but I do not understand how step #3 (set proxy) in that solution is done. Can someone give me instructions for setting the proxy?
Adding the command output that was asked for in the comments:
kubectl get po -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
etcd-minikube                           1/1     Running   0          4m
kube-addon-manager-minikube             1/1     Running   0          5m
kube-apiserver-minikube                 1/1     Running   0          4m
kube-controller-manager-minikube        1/1     Running   0          6m
kube-dns-86f4d74b45-sdj6p               3/3     Running   0          5m
kube-proxy-7ndvl                        1/1     Running   0          5m
kube-scheduler-minikube                 1/1     Running   0          5m
kubernetes-dashboard-5498ccf677-4x7sr   1/1     Running   0          5m
storage-provisioner                     1/1     Running   0          5m
I deleted minikube, removed all files under ~/.minikube and reinstalled minikube. Now it is working fine. I did not capture the output before, but I have added it to the question now that it is working. Can you tell me what the output of this command tells me?
It is very difficult, or even impossible, to tell exactly what was wrong with your Minikube Kubernetes cluster now that it has been removed and set up again.
Basically, there were a few things you could have done to properly troubleshoot or debug your issue.
Adding the command output which was asked in the comments
The output you posted is actually only part of what @Eduardo Baitello asked you to do. The kubectl get po -n kube-system command simply lists the Pods in the kube-system namespace. In other words, this is the list of system pods that make up your Kubernetes cluster and, as you can imagine, the proper functioning of each of these components is crucial. As you can see in your output, the STATUS of your kube-proxy pod is Running:
kube-proxy-7ndvl 1/1 Running 0 5m
You were also asked by @Eduardo to check its logs. You can do so by running (the pod lives in the kube-system namespace, so the namespace has to be given):
kubectl logs kube-proxy-7ndvl -n kube-system
It could tell you what was wrong with this particular pod at the time the problem occurred. Additionally, in such a case you can use the describe command to see other pod details (looking at pod events is sometimes very helpful to figure out what's going on with it):
kubectl describe pod kube-proxy-7ndvl -n kube-system
The suggestion to check this particular Pod status and logs was most probably motivated by this fragment of the error messages shown during your Minikube startup process:
E0912 17:39:12.486830 17689 start.go:305] Error restarting cluster: restarting kube-proxy: waiting for kube-proxy to be up for configmap update: timed out waiting for the condition
As you can see, this message clearly suggests that, in short, "something is wrong" with kube-proxy, so it made a lot of sense to check it first.
There is one more thing you may have not noticed:
kubectl get service
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
hello-minikube   LoadBalancer   10.102.236.236   <pending>     8080:31825/TCP   15m
kubernetes       ClusterIP      10.96.0.1        <none>        443/TCP          19h
Your hello-minikube service was not completely ready: in the EXTERNAL-IP column you can see that its state was <pending>. Just as you can use the describe command for Pods, you can use it to get details of the service. Simply running:
kubectl describe service hello-minikube
could tell you quite a lot in such case.
1) If the minikube cluster failed while starting, how was kubectl able to connect to minikube to create deployments and services?
2) If the cluster is fine, then why am I getting connection refused?
Remember that a Kubernetes cluster is not a monolithic structure; it consists of many parts that depend on one another. The fact that kubectl worked and you could create a deployment doesn't mean that the whole cluster was working fine. As the error message suggested, one of its components, namely kube-proxy, may actually not have been functioning properly.
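On question 2, it also helps to read the curl error precisely: "Connection refused" means the target host actively rejected the TCP connection on port 31825, as opposed to silently dropping it, which would show up as a timeout. That is consistent with NodePort traffic not being forwarded, and forwarding Service traffic on each node is kube-proxy's job. The Python sketch below is a generic illustration (not a Minikube tool) that distinguishes the two cases for any host and port.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as "open", "refused" or "timeout".

    Other OSErrors (e.g. host unreachable) are not caught and propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing accepted connections on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: packets were dropped somewhere (firewall, routing, wrong IP).
        return "timeout"
```

For the case above you would call probe_tcp("192.168.99.100", 31825); "refused" points at the node side (nothing forwarding the port), "timeout" more at the network path to the VM.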
Going back to the beginning of your question...
I have followed the installation instructions provided here
https://kubernetes.io/docs/setup/learning-environment/minikube/#using-minikube-with-an-http-proxy
Issue1: After installing kubectl, virtualbox and minikube I have run the command
minikube start --vm-driver=virtualbox
As far as I understand, you don't use an HTTP proxy, so you didn't follow the instructions from this particular fragment of the docs that you posted, did you?
I have the impression that you are mixing up two concepts: kube-proxy, which is a Kubernetes cluster component deployed as a pod in the kube-system namespace, and the HTTP proxy server mentioned in this fragment of the documentation.
I was looking at this proxy section (https://kubernetes.io/docs/setup/learning-environment/minikube/#starting-a-cluster): what is my_proxy in it?
If you don't know your HTTP proxy address, you most probably don't use one, and if you don't use one to connect to the Internet from your computer, this section doesn't apply to your case in any way.
Otherwise, you need to set it up for Minikube by providing additional flags when you start it, as follows:
minikube start --docker-env http_proxy=http://$YOURPROXY:PORT \
--docker-env https_proxy=https://$YOURPROXY:PORT
If you were able to start your Minikube and now it works properly only using the command:
minikube start --vm-driver=virtualbox
your issue was caused by something else, and you don't need to provide the above-mentioned flags to tell Minikube which HTTP proxy server you use.
As far as I understand, everything is currently up and running and you can access the URL returned by minikube service hello-minikube --url without any problem, right? You can also run kubectl get service hello-minikube and check whether its output differs from what you posted before. As you didn't attach any YAML definition files, it's difficult to tell whether anything was wrong with your service definition. Also note that LoadBalancer is a service type designed to work with external load balancers provided by cloud providers; on Minikube the service is reached through its NodePort instead.
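For reference, the URL printed by minikube service hello-minikube --url is essentially the node's IP (what minikube ip reports) combined with the service's nodePort, 31825 in the output above. A LoadBalancer service gets a nodePort allocated as well by default, which is why the URL exists even while EXTERNAL-IP is <pending>. The sketch below rebuilds such URLs from the JSON of kubectl get service hello-minikube -o json; it only covers the plain-HTTP case.

```python
from typing import Any


def node_port_urls(service: dict[str, Any], node_ip: str) -> list[str]:
    """Build http://<node_ip>:<nodePort> URLs for every Service port that has a
    nodePort assigned (NodePort and, by default, LoadBalancer services do)."""
    urls = []
    for port in service.get("spec", {}).get("ports", []):
        node_port = port.get("nodePort")
        if node_port is not None:
            urls.append(f"http://{node_ip}:{node_port}")
    return urls
```

With a service whose only port has nodePort 31825 and node_ip "192.168.99.100", this returns ["http://192.168.99.100:31825"], the same URL as in the question.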

GCP GKE: View logs of terminated jobs/pods

I have a few cron jobs on GKE.
One of the pods did terminate and now I am trying to access the logs.
➣ $ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
23m Normal SuccessfulCreate Job Created pod: virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
22m Normal SuccessfulDelete Job Deleted pod: virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
22m Warning DeadlineExceeded Job Job was active longer than specified deadline
23m Normal Scheduled Pod Successfully assigned default/virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42 to staging-cluster-default-pool-4b4827bf-rpnl
23m Normal Pulling Pod pulling image "gcr.io/my-repo/myimage:v8"
23m Normal Pulled Pod Successfully pulled image "gcr.io/my-repo/my-image:v8"
23m Normal Created Pod Created container
23m Normal Started Pod Started container
22m Normal Killing Pod Killing container with id docker://virulent-angelfish-cronjob:Need to kill Pod
23m Normal SuccessfulCreate CronJob Created job virulent-angelfish-cronjob-netsuite-proservices-1562220000
22m Normal SawCompletedJob CronJob Saw completed job: virulent-angelfish-cronjob-netsuite-proservices-1562220000
So at least one CronJob run happened.
I would like to see the pod's logs, but there is nothing there:
➣ $ kubectl get pods
No resources found.
Given that my CronJob definition has:
failedJobsHistoryLimit: 1
successfulJobsHistoryLimit: 3
shouldn't at least one pod be there for me to do forensics?
Your pod is crashing or otherwise unhealthy
First, take a look at the logs of the current container:
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
If your container has previously crashed, you can access the previous container’s crash log with:
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
Alternately, you can run commands inside that container with exec:
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
Note: -c ${CONTAINER_NAME} is optional. You can omit it for pods that only contain a single container.
As an example, to look at the logs from a running Cassandra pod, you might run:
kubectl exec cassandra -- cat /var/log/cassandra/system.log
If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host.
Finally, check the logs in Google Stackdriver Logging.
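In this case the Job controller had already deleted the pod (see the SuccessfulDelete event and the DeadlineExceeded warning), so kubectl has nothing left to read; if logging to Stackdriver (now Cloud Logging) was enabled on the cluster, the logs should still be there. The sketch below builds a Logging query filter for a given pod. It assumes the k8s_container resource type used by Kubernetes Engine Monitoring; clusters on legacy Stackdriver logging used a different resource type, so adjust accordingly.

```python
from typing import Optional


def gke_container_log_filter(
    pod_name: str,
    namespace: str = "default",
    cluster_name: Optional[str] = None,
) -> str:
    """Build a Cloud Logging filter for a (possibly already deleted) pod's logs.

    Assumes the k8s_container monitored-resource type.
    """
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.namespace_name="{namespace}"',
        f'resource.labels.pod_name="{pod_name}"',
    ]
    if cluster_name:
        clauses.append(f'resource.labels.cluster_name="{cluster_name}"')
    return " AND ".join(clauses)
```

The resulting string can be pasted into the Logs Explorer query box or passed to gcloud logging read, e.g. gcloud logging read '<filter>' --limit 50.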
Debugging Pods
The first step in debugging a pod is taking a look at it. Check the current state of the pod and recent events with the following command:
kubectl describe pods ${POD_NAME}
Look at the state of the containers in the pod. Are they all Running? Have there been recent restarts?
Continue debugging depending on the state of the pods.
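The container states and restart counts that kubectl describe pods shows can also be summarized programmatically. A minimal sketch, assuming the input is a single pod object from kubectl get pod ${POD_NAME} -o json: it reports each container's current state (running, waiting or terminated, plus the reason if one is set) and its restart count.

```python
from typing import Any


def container_summary(pod: dict[str, Any]) -> list[str]:
    """One line per container: name, current state (with reason) and restart count."""
    lines = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        # The state object holds a single key: running, waiting or terminated.
        kind = next(iter(state), "unknown")
        reason = state.get(kind, {}).get("reason", "")
        label = f"{kind}({reason})" if reason else kind
        lines.append(f"{cs['name']}: {label}, restarts={cs.get('restartCount', 0)}")
    return lines
```

A line such as "app: waiting(CrashLoopBackOff), restarts=4" tells you to look at kubectl logs --previous for that container.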
Debugging ReplicationControllers
ReplicationControllers are fairly straightforward. They can either create pods or they can’t. If they can’t create pods, then please refer to the instructions above to debug your pods.
You can also use kubectl describe rc ${CONTROLLER_NAME} to inspect events related to the replication controller.
I hope this helps you find the exact problem.
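You can use the --previous flag to get the logs of the previous instance of a container.
So, you can use:
kubectl logs --previous virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
to get the logs of the container instance that ran before the current one in that pod. Note that this only works while the pod object still exists; it cannot return logs for a pod that has already been deleted.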

Kubernetes pods are pending not active

If I run this:
kubectl get pods -n kube-system
I get this output:
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-6fdd4f6856-6bl64                0/1     Pending   0          1h
coredns-6fdd4f6856-xgrbm                0/1     Pending   0          1h
kubernetes-dashboard-65c76f6c97-c69jg   0/1     Pending   0          13m
Supposedly I need a Kubernetes scheduler in order to actually launch containers? Does anyone know how to start kube-scheduler?
Rather than a Kubernetes scheduler issue, this looks like a case of not having enough resources on your nodes (or no nodes at all) in your cluster to schedule any workloads. You can check your nodes with:
$ kubectl get nodes
Also, you are probably not able to see any control plane components in the kube-system namespace because you may be using a managed service such as EKS or GKE, where the provider runs the control plane for you.
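To confirm, here is a short sketch that reads the JSON from kubectl get nodes -o json and returns the nodes whose Ready condition is True. If it returns an empty list, the scheduler has nowhere to place the CoreDNS and dashboard pods, which matches the Pending status above. It is an illustration only; kubectl describe pod on a Pending pod usually states the scheduling failure reason directly in its events.

```python
from typing import Any


def ready_nodes(node_list: dict[str, Any]) -> list[str]:
    """Names of the nodes whose Ready condition is "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            names.append(node["metadata"]["name"])
    return names
```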