ContainerCreating after a drain on K8s

After trying to drain one node of my cluster, I get this kind of error:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "gitlab-postgresql-stolon-keeper-2": Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused": unavailable
I'm using K8s 15.4 and docker v18.09.9.
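Since the kubelet is failing to dial /run/containerd/containerd.sock and getting connection refused, the container runtime itself looks down on that node. A first check worth doing there (my suggestion, not part of the original question) is:
systemctl status docker containerd
journalctl -u containerd --since "10 minutes ago"
If containerd or dockerd died during the drain, restarting them (systemctl restart docker) usually lets the sandboxes get created again.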

Related

K8s pod ImagePullBackoff

I created a very simple nginx pod and ran into status ImagePullBackOff.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned reloader/nginx to aks-appnodepool1-22779252-vmss000000
Warning Failed 29m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 52.200.78.26:443: i/o timeout
Warning Failed 27m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 52.21.28.242:443: i/o timeout
Warning Failed 23m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 3.223.210.206:443: i/o timeout
Normal Pulling 22m (x4 over 32m) kubelet Pulling image "nginx"
Warning Failed 20m (x4 over 29m) kubelet Error: ErrImagePull
Warning Failed 20m kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 3.228.155.36:443: i/o timeout
Warning Failed 20m (x7 over 29m) kubelet Error: ImagePullBackOff
Warning Failed 6m41s kubelet Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp 52.5.157.114:443: i/o timeout
Normal BackOff 2m17s (x65 over 29m) kubelet Back-off pulling image "nginx"
Checked network status:
A VM in the same subnet can access "https://registry-1.docker.io/v2/library/nginx/manifests/latest", and telnet 52.5.157.114 443 succeeds.
docker pull nginx succeeds on the VM in the same subnet.
kubectl exec into a running pod in the same cluster, and wget https://registry-1.docker.io/v2/library/nginx/manifests/latest succeeds.
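One more check that might have localized the failure (my addition, not from the original post) is to pull the image through the node's own container runtime, e.g. from a shell on the node:
crictl pull docker.io/library/nginx:latest
If that times out while all the tests above succeed, the block is specific to the node's egress path rather than to the registry or the subnet.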
What is the possible problem?
When you wget/curl (or use anything else) to access
https://registry-1.docker.io/v2/library/nginx/manifests/latest
it says
{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":[{"Type":"repository","Class":"","Name":"library/nginx","Action":"pull"}]}]}
That is because you need to be logged in to pull this image from this repository.
2 solutions:
The first is simple: in the image field, just replace this URL with nginx:latest and it should work.
The second: create a regcred.
In your pod YAML, change image: docker.io/library/nginx:latest to docker.io/nginx:latest.
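For the regcred route, a minimal sketch (the secret name and credential placeholders here are mine, not from the original answer): create a docker-registry secret, then reference it from the pod spec.
kubectl create secret docker-registry regcred --docker-server=https://index.docker.io/v1/ --docker-username=<your-username> --docker-password=<your-password> --docker-email=<your-email>
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: nginx
    image: nginx:latest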
Turned out to be the firewall dropping the packets.

coredns connection refused error while setting up kubernetes cluster

I've got a kubernetes cluster set up with kubeadm. I haven't deployed any pods yet, but the coredns pods are stuck in a ContainerCreating status.
[root@master-node ~]# kubectl get -A pods
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-f5kjh 0/1 ContainerCreating 0 151m
kube-system coredns-64897985d-xz9nt 0/1 ContainerCreating 0 151m
[...]
When I check it out with kubectl describe I see this:
[root@master-node ~]# kubectl describe -n kube-system pod coredns-64897985d-f5kjh
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 22m (x570 over 145m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4974dadd11fecf1ebfbcccd75701641b752426808889895672f34e6934776207": unable to allocate IP address: Post "http://127.0.0.1:6784/ip/4974dadd11fecf1ebfbcccd75701641b752426808889895672f34e6934776207": dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bce2558b24468c0d0e83fe1eedf2fa70108420a466d000b74ceaf351e595007d": unable to allocate IP address: Post "http://127.0.0.1:6784/ip/bce2558b24468c0d0e83fe1eedf2fa70108420a466d000b74ceaf351e595007d": dial tcp 127.0.0.1:6784: connect: connection refused
[...]
Warning FailedCreatePodSandBox 3m50s (x61 over 16m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a9254528ba611403a9b2293a2201c8758ff4adf75fd4a1d2b9690d15446cc92a": unable to allocate IP address: Post "http://127.0.0.1:6784/ip/a9254528ba611403a9b2293a2201c8758ff4adf75fd4a1d2b9690d15446cc92a": dial tcp 127.0.0.1:6784: connect: connection refused
Any idea what could be causing this?
Turns out this is a firewall issue. I was using Weave Net as my CNI, which requires port 6784 to be open to work. You can see this in the error, where it's trying to reach 127.0.0.1:6784 and getting connection refused (pretty obvious in hindsight). I fixed it by opening port 6784 on my firewall. For firewalld, I did
firewall-cmd --permanent --add-port=6784/tcp
firewall-cmd --reload
This might be a security problem: the Weave Net docs say something about this port only being meant for certain local processes, but I'm not sure of the details. For my application security isn't a big concern, so I didn't look into it.
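To confirm the fix took effect, one quick check (my addition, assuming a default Weave Net install, which serves its HTTP API on this port) is:
curl http://127.0.0.1:6784/status
This should now return Weave's status report instead of connection refused.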

Kubernetes deployments fail with istio-sidecar injection

Our K8s cluster had been working for more than a year. Recently it started behaving strangely: when we deploy an app using kubectl apply -f deployment-manifest.yaml, the pods don't show up in kubectl get pods, but the deployment shows in kubectl get deployments with 0/3 ready. kubectl describe deployment app-deployment gives:
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
ReplicaFailure True FailedCreate
Progressing False ProgressDeadlineExceeded
When I check the kube-apiserver logs:
I1115 12:55:56.110277 1 trace.go:116] Trace[16922026]: "Call validating webhook" configuration:istiod-istio-system,webhook:validation.istio.io,resource:networking.istio.io/v1alpha3, Resource=gateways,subresource:,operation:CREATE,UID:00c425da-6475-4ed3-bc25-5a81d866baf2 (started: 2021-11-15 12:55:26.109897413 +0000 UTC m=+8229.935658158) (total time: 30.00030708s):
Trace[16922026]: [30.00030708s] [30.00030708s] END
W1115 12:55:56.110327 1 dispatcher.go:128] Failed calling webhook, failing open validation.istio.io: failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: dial tcp 10.233.30.109:443: i/o timeout
E1115 12:55:56.110363 1 dispatcher.go:129] failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: dial tcp 10.233.30.109:443: i/o timeout
I1115 12:55:56.121271 1 trace.go:116] Trace[576910507]: "Create" url:/apis/networking.istio.io/v1alpha3/namespaces/istio-system/gateways,user-agent:pilot-discovery/v0.0.0 (linux/amd64) kubernetes/$Format,client:192.168.1.16 (started: 2021-11-15 12:55:26.108861126 +0000 UTC m=+8229.934621868) (total time: 30.012357263s):
kube-controller-manager logs:
I1116 07:55:06.218995 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"ops-executor-app-6647b7cbdb", UID:"0ef5fefd-88d7-480f-8a5d-f7e2c8025ae9", APIVersion:"apps/v1", ResourceVersion:"122334057", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: Internal error occurred: failed calling webhook "sidecar-injector.istio.io": Post https://istiod.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1116 07:56:12.342407 1 replica_set.go:535] sync "default/app-6769f4cb97" failed with Internal error occurred: failed calling webhook "sidecar-injector.istio.io": Post https://istiod.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
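Both log excerpts point at the istio admission webhooks timing out. One way to inspect the webhook configurations involved, including their failurePolicy and target service (my addition, not from the original post):
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml
(istio-sidecar-injector is the usual default name; check the list output first.)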
When I check kubectl get pods -n istio-system:
NAME READY STATUS RESTARTS AGE
istio-egressgateway-794d6f956b-8p5vz 0/1 Running 5 401d
istio-ingressgateway-784f857457-2fz4v 0/1 Running 5 401d
istiod-67c86464b4-vjp4j 1/1 Running 5 401d
The egress and ingress gateway logs contain:
2021-11-15T16:55:31.419880Z error citadelclient Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.419912Z error cache resource:default request:37d26b55-df29-465f-9069-9b9a1904e8ab CSR retrial timed out: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.419956Z error cache resource:default failed to generate secret for proxy: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.419981Z error sds resource:default Close connection. Failed to get secret for proxy "router~10.233.70.87~istio-egressgateway-794d6f956b-8p5vz.istio-system~istio-system.svc.cluster.local" from secret cache: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.420070Z info sds resource:default connection is terminated: rpc error: code = Canceled desc = context canceled
2021-11-15T16:55:31.420336Z warning envoy config StreamSecrets gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:48.020242Z warning envoy config StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2021-11-15T16:55:48.020479Z warning envoy config Unable to establish new stream
2021-11-15T16:55:51.025327Z info sds resource:default new connection
2021-11-15T16:55:51.025597Z info sds Skipping waiting for gateway secret
I tried to get details as described here, but it shows no resources.
Deploying the application in a namespace without istio injection works without any issue.
We have a bare-metal cluster running Ubuntu 18.04 LTS.
istioctl version
client version: 1.7.0
control plane version: 1.7.0
data plane version: none
Kubernetes v1.18.8
As described here, I ran kubectl get --raw /api/v1/namespaces/istio-system/services/https:istiod:https-webhook/proxy/inject -v4:
I1116 17:05:32.703339 28777 helpers.go:216] server response object: [{
"metadata": {},
"status": "Failure",
"message": "the server rejected our request for an unknown reason",
"reason": "BadRequest",
"details": {
"causes": [
{
"reason": "UnexpectedServerResponse",
"message": "no body found"
}
]
},
"code": 400
}]
F1116 17:05:32.703515 28777 helpers.go:115] Error from server (BadRequest): the server rejected our request for an unknown reason
From the ingress gateway:
istio-proxy@istio-ingressgateway-784f857457-2fz4v:/$ curl https://istiod.istio-system:443/inject -k
curl: (6) Could not resolve host: istiod.istio-system
Edit: on the master node, /var/lib/kubelet/config.yaml has
clusterDNS:
- 169.254.25.10
and we can ping this IP from our nodes.
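Ping only proves ICMP reachability, though, not that anything is answering DNS on port 53. A quick way to test whether that clusterDNS address actually resolves cluster services (my addition, not from the original post):
nslookup istiod.istio-system.svc.cluster.local 169.254.25.10
If that fails the way the gateway logs above do, the problem sits in the node-local DNS/coredns path rather than in basic connectivity to 169.254.25.10.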
I found this in the coredns pod logs:
E1123 08:57:05.386992 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E1123 08:57:05.387108 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
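Those coredns errors say coredns itself cannot reach the in-cluster API server service IP (10.233.0.1:443), which would explain why service names stop resolving. A follow-up check from a node (again my addition) is whether that service IP answers at all:
curl -k https://10.233.0.1:443/version
Connection refused here would point at kube-proxy rules or the apiserver endpoint rather than at coredns.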

minikube cluster pods are unhealthy and restarting because of connection refused error

I have installed a minikube cluster on a CentOS 7 VM. After installation, a few pods keep restarting because of an unhealthy pod status. All the failing pods have similar reasons:
Error from pods (describe command):
Readiness probe failed: Get "http://172.17.0.4:6789/readyz": dial tcp 172.17.0.4:6789: connect: connection refused
Liveness probe failed: Get "http://172.17.0.4:6789/healthz": dial tcp 172.17.0.4:6789: connect: connection refused
Liveness probe failed: HTTP probe failed with statuscode: 500
Readiness probe failed: HTTP probe failed with statuscode: 500
Liveness probe failed: HTTP probe failed with statuscode: 503
Because of these errors, the application in the pod is not working properly.
I am new to Kubernetes and not able to work out how to debug this error.
UPDATE: please see below the error logs of the pod (che-operator):
time="2021-09-09T15:37:13Z" level=info msg="Deployment plugin-registry is in the rolling update state."
I0909 15:37:15.651964 1 request.go:655] Throttling request took 1.04725976s, request: GET:https://10.96.0.1:443/apis/extensions/v1beta1?timeout=32s
time="2021-09-09T15:37:16Z" level=info msg="Deployment plugin-registry is in the rolling update state."
I0909 15:37:37.710602 1 request.go:655] Throttling request took 1.046990691s, request: GET:https://10.96.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
W0909 15:43:23.172829 1 warnings.go:70] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
E0909 15:47:05.403189 1 leaderelection.go:325] error retrieving resource lock eclipse-che/e79b08a4.org.eclipse.che: Get "https://10.96.0.1:443/api/v1/namespaces/eclipse-che/configmaps/e79b08a4.org.eclipse.che": context deadline exceeded
I0909 15:47:05.403334 1 leaderelection.go:278] failed to renew lease eclipse-che/e79b08a4.org.eclipse.che: timed out waiting for the condition
{"level":"info","ts":1631202425.4036877,"logger":"controller","msg":"Stopping workers","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster"}
{"level":"error","ts":1631202425.4034257,"logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/che-operator/vendor/github.com/go-logr/zapr/zapr.go:132\nmain.main\n\t/che-operator/main.go:254\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:204"}
Your readiness and liveness probes are failing, based on the error:
Readiness probe failed: Get "http://172.17.0.4:6789/readyz"
Readiness and liveness probes are used to check the status of the application inside the pod. Kubernetes continuously checks the application on the configured endpoint; if the liveness probe keeps failing, Kubernetes restarts the pod automatically, while a failing readiness probe takes the pod out of service.
In this case I would suggest checking the status of the application running inside the pod: it is the /readyz and /healthz endpoints failing that is making your pods fail.
Read more about readiness and liveness probes at: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
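For reference, a probe block like the one evidently failing here looks roughly like this in the container spec (path and port taken from the error messages above; the timing values are placeholders of mine):
livenessProbe:
  httpGet:
    path: /healthz
    port: 6789
  initialDelaySeconds: 15
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 6789
  initialDelaySeconds: 5
  periodSeconds: 10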
You can also check the logs of the application using:
kubectl logs <pod-name>

Failed to create pod sandbox [flannel]

I am running into this error on random pods. Thank you @matthew-l-daniel for the comment, as I didn't know where to start.
Here are the contents of /opt/cni/bin on the node:
:/opt/cni/bin$ ls
bridge host-local loopback
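To rule the CNI setup in or out, one quick look (my addition, not from the original post) is at the CNI config directory as well:
ls /etc/cni/net.d/
cat /etc/cni/net.d/*.conf*
A flannel setup typically has a 10-flannel.conflist (or similar) there referencing a flannel plugin binary in /opt/cni/bin; note that the directory listing above doesn't show one.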
Here are the kubelet logs for a container that failed.
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924370 32233 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "postgres-core-0": Error response from daemon: grpc: the connection is unavailable
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924380 32233 kuberuntime_manager.go:647] createPodSandbox for pod "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "postgres-core-0": Error response from daemon: grpc: the connection is unavailable
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924427 32233 pod_workers.go:186] Error syncing pod d8acae2f-24a2-11e9-b79c-0a0d1213cce2 ("postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)"), skipping: failed to "CreatePodSandbox" for "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" with CreatePodSandboxError: "CreatePodSandbox for pod \"postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"postgres-core-0\": Error response from daemon: grpc: the connection is unavailable"
As for the flannel container logs: there are many flannel pods running, and all are healthy.
Kubernetes v1.10.11
Docker version 17.03.2-ce, build f5ec1e2
Flannel logs:
E0130 15:34:16.536354 1 vxlan_network.go:187] DelFDB failed: no such file or directory
E0130 15:34:16.536411 1 vxlan_network.go:191] failed to delete vxlanRoute (100.107.178.0/24 -> 100.107.178.0): no such process
E0130 17:33:44.848163 1 vxlan_network.go:187] DelFDB failed: no such file or directory
E0130 17:33:44.848219 1 vxlan_network.go:191] failed to delete vxlanRoute (100.107.201.0/24 -> 100.107.201.0): no such process