Kubernetes pod deployment error FailedSync| Error syncing pod - kubernetes

Env:
Vbox on a windows 10 desktop machine
Two ubuntu VMs, one VM is master and the other one is k8s(1.7) worker.
I can see two nodes are "ready" when get nodes. But even deploy a very simple nginx pod, I got the error message from pod describe
"norm | SandboxChanged |Pod sandbox changed, it will be killed and re-created." and "warning | FailedSync| Error syncing pod".
But if I run the docker container directly on the worker, the container can be up and running. Anyone has a suggestion what I can check for?
k8s-master#k8smaster-VirtualBox:~$ **kubectl get pods** NAME
READY STATUS RESTARTS AGE
movie-server-1517284798-lbb01 0/1 CrashLoopBackOff 6
16m
k8s-master#k8smaster-VirtualBox:~$ **kubectl describe pod
movie-server-1517284798-lbb01**
--- clip --- kubelet, master-virtualbox spec.containers{movie-server} Warning FailedError: failed to start
container "movie-server": Error response from daemon:
{"message":"cannot join network of a non running container:
3f59947dbd404ecf2f6dd0b65dd9dad8b25bf0c418aceb8cf666ad0761402b53"}
kubelet, master-virtualbox spec.containers{movie-server}
Warning BackOffBack-off restarting failed container
kubelet, master-virtualbox Normal
SandboxChanged Pod sandbox changed, it will be killed and
re-created.
kubelet, master-virtualbox spec.containers{movie-server} Normal
PulledContainer image "nancyfeng/movie-server:0.1.0" already present
on machine
kubelet, master-virtualbox spec.containers{movie-server}
Normal CreatedCreated container
kubelet, master-virtualbox
Warning FailedSync Error syncing pod
kubelet, master-virtualbox spec.containers{movie-server}
Warning FailedError: failed to start container "movie-server": Error
response from daemon: {"message":"cannot join network of a non running
container:
72ba77b25b6a3969e8921214f0ca73ffaab4c82d8a2852e3d1b1f3ac5dde6ce1"}
--- clip ---

Related

istio bookinfo sample application deployment failed for "ImagePullBackOff"

I am trying to deploy istio's sample bookinfo application using the below command:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
from here
but each time I am getting ImagePullBackoff error like this:
NAME READY STATUS RESTARTS AGE
details-v1-c74755ddf-m878f 2/2 Running 0 6m32s
productpage-v1-778ddd95c6-pdqsk 2/2 Running 0 6m32s
ratings-v1-5564969465-956bq 2/2 Running 0 6m32s
reviews-v1-56f6655686-j7lb6 1/2 ImagePullBackOff 0 6m32s
reviews-v2-6b977f8ff5-55tgm 1/2 ImagePullBackOff 0 6m32s
reviews-v3-776b979464-9v7x5 1/2 ImagePullBackOff 0 6m32s
For error details, I have run :
kubectl describe pod reviews-v1-56f6655686-j7lb6
Which returns these:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m41s default-scheduler Successfully assigned default/reviews-v1-56f6655686-j7lb6 to minikube
Normal Pulled 7m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 7m39s kubelet Created container istio-init
Normal Started 7m39s kubelet Started container istio-init
Warning Failed 5m39s kubelet Failed to pull image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0": rpc error: code = Unknown desc = context deadline exceeded
Warning Failed 5m39s kubelet Error: ErrImagePull
Normal Pulled 5m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 5m39s kubelet Created container istio-proxy
Normal Started 5m39s kubelet Started container istio-proxy
Normal BackOff 5m36s (x3 over 5m38s) kubelet Back-off pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Warning Failed 5m36s (x3 over 5m38s) kubelet Error: ImagePullBackOff
Normal Pulling 5m25s (x2 over 7m38s) kubelet Pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Do I need to build dockerfile first and push it to the local repository? There are no clear instructions there or I failed to find any.
Can anybody help?
If you check in dockerhub the image is there:
https://hub.docker.com/r/istio/examples-bookinfo-reviews-v1/tags
So the error that you need to deal with is context deadline exceeded while trying to pull it from dockerhub. This is likely a networking error (a generic Go error saying it took too long), depending on where your cluster is running you can do manually a docker pull from the nodes and that should work.
EDIT: for minikube do a minikube ssh and then a docker pull
Solved the problem by below command :
minikube ssh docker pull istio/examples-bookinfo-reviews-v1:1.17.0
from this git issues here
Also How to use local docker images with Minikube?
Hope this may help somebody

Centos 8 microk8s Readiness probe failed: HTTP probe failed with statuscode: 503

I have installed microk8s on my centos 8 operating system.
kube-system coredns-7f9c69c78c-lxm7c 0/1 Running 1 18m
kube-system calico-node-thhp8 1/1 Running 1 68m
kube-system calico-kube-controllers-f7868dd95-dpsnl 0/1 CrashLoopBackOff 23 68m
When I do microk8s enable dns, coredns or calico-kube-controllers cannot be started as above.
Describe the pod for coredns :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned kube-system/coredns-7f9c69c78c-lxm7c to localhost.localdomain
Normal Pulled 14m kubelet Container image "coredns/coredns:1.8.0" already present on machine
Normal Created 14m kubelet Created container coredns
Normal Started 14m kubelet Started container coredns
Warning Unhealthy 11m (x22 over 14m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Normal SandboxChanged 2m8s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 2m7s kubelet Container image "coredns/coredns:1.8.0" already present on machine
Normal Created 2m7s kubelet Created container coredns
Normal Started 2m6s kubelet Started container coredns
Warning Unhealthy 2m6s kubelet Readiness probe failed: Get "http://10.1.102.132:8181/ready": dial tcp 10.1.102.132:8181: connect: connection refused
Warning Unhealthy 9s (x12 over 119s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Describe the pod for calico-kube-controllers :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 73m default-scheduler no nodes available to schedule pods
Warning FailedScheduling 73m (x1 over 73m) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 72m (x1 over 72m) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 72m default-scheduler Successfully assigned kube-system/calico-kube-controllers-f7868dd95-dpsnl to localhost.localdomain
Warning FailedCreatePodSandBox 72m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f3ea36b003b0c9142ae63fee31531f9102e40ab837f4d795d1efb5c85af223ec": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a1c405cdcebe79c586badcc8da47700247751a50ef9a1403e95fc4995485fba0": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4adb07610eef0d7a618105abf72a114e486c373a02d5d1b204da2bd35268dd1b": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "96aac009175973ac4c20034824db3443b3ab184cfcd1ed23786e539fb6147796": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "79639a18edcffddbdb93492157af43bb6c1f1a9ac2af1b3fbbac58335737d5dc": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 70m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3264f006447297583a37d8cc87ffe01311deaf2a31bf25867b3b18c83db2167d": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 70m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5c5cf6509bfcf515ad12bc51451e4c385e5242c4f7bb593779d207abf9c906a4": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Normal Pulling 70m kubelet Pulling image "calico/kube-controllers:v3.13.2"
Normal Pulled 69m kubelet Successfully pulled image "calico/kube-controllers:v3.13.2" in 50.744281789s
Normal Created 69m kubelet Created container calico-kube-controllers
Normal Started 69m kubelet Started container calico-kube-controllers
Warning Unhealthy 69m (x2 over 69m) kubelet Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Warning MissingClusterDNS 37m (x185 over 72m) kubelet pod: "calico-kube-controllers-f7868dd95-dpsnl_kube-system(d8c3ee40-7d3b-4a84-9398-19ec8a6d9082)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Warning Unhealthy 31m (x6 over 32m) kubelet Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Normal Pulled 30m (x4 over 32m) kubelet Container image "calico/kube-controllers:v3.13.2" already present on machine
Normal Created 30m (x4 over 32m) kubelet Created container calico-kube-controllers
Normal Started 30m (x4 over 32m) kubelet Started container calico-kube-controllers
Warning BackOff 22m (x42 over 32m) kubelet Back-off restarting failed container
Normal SandboxChanged 10m kubelet Pod sandbox changed, it will be killed and re-created.
Warning Unhealthy 9m36s (x6 over 10m) kubelet Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Normal Pulled 8m51s (x4 over 10m) kubelet Container image "calico/kube-controllers:v3.13.2" already present on machine
Normal Created 8m51s (x4 over 10m) kubelet Created container calico-kube-controllers
Normal Started 8m51s (x4 over 10m) kubelet Started container calico-kube-controllers
Warning BackOff 42s (x42 over 10m) kubelet Back-off restarting failed container
I cannot start my microk8s services. I don't encounter these on my Ubuntu server. What can I do in these error situations that I encounter for my Centos 8 server?
Have you tried updating the microk8s version?

helm3 installation fails with: failed post-install: timed out waiting for the condition

I'm new to Kubernetes and Helm. I have installed k3d and helm:
k3d version v1.7.0
k3s version v1.17.3-k3s1
helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"clean", GoVersion:"go1.13.12"}
I do have a cluster created with 10 worker nodes. When I try to install stackstorm-ha on the cluster I see the following issues:
helm install stackstorm/stackstorm-ha --generate-name --debug
client.go:534: [debug] stackstorm-ha-1592860860-job-st2-apikey-load: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed post-install: timed out waiting for the condition
helm.go:84: [debug] failed post-install: timed out waiting for the condition
njbbmacl2813:~ gangsh9$ kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
kubectl describe pods either shows :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/stackstorm-ha-1592857897-st2api-7f6c877b9c-dtcp5 to k3d-st2hatest-worker-5
Warning Failed 23m kubelet, k3d-st2hatest-worker-5 Error: context deadline exceeded
Normal Pulling 17m (x5 over 37m) kubelet, k3d-st2hatest-worker-5 Pulling image "stackstorm/st2api:3.3dev"
Normal Pulled 17m (x5 over 28m) kubelet, k3d-st2hatest-worker-5 Successfully pulled image "stackstorm/st2api:3.3dev"
Normal Created 17m (x5 over 28m) kubelet, k3d-st2hatest-worker-5 Created container st2api
Normal Started 17m (x4 over 28m) kubelet, k3d-st2hatest-worker-5 Started container st2api
Warning BackOff 53s (x78 over 20m) kubelet, k3d-st2hatest-worker-5 Back-off restarting failed container
or
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/stackstorm-ha-1592857897-st2timersengine-c847985d6-74h5k to k3d-st2hatest-worker-2
Warning Failed 6m23s kubelet, k3d-st2hatest-worker-2 Failed to pull image "stackstorm/st2timersengine:3.3dev": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/stackstorm/st2timersengine:3.3dev": failed to resolve reference "docker.io/stackstorm/st2timersengine:3.3dev": failed to authorize: failed to fetch anonymous token: Get https://auth.docker.io/token?scope=repository%3Astackstorm%2Fst2timersengine%3Apull&service=registry.docker.io: net/http: TLS handshake timeout
Warning Failed 6m23s kubelet, k3d-st2hatest-worker-2 Error: ErrImagePull
Normal BackOff 6m22s kubelet, k3d-st2hatest-worker-2 Back-off pulling image "stackstorm/st2timersengine:3.3dev"
Warning Failed 6m22s kubelet, k3d-st2hatest-worker-2 Error: ImagePullBackOff
Normal Pulling 6m10s (x2 over 6m37s) kubelet, k3d-st2hatest-worker-2 Pulling image "stackstorm/st2timersengine:3.3dev"
Kind of stuck here.
Any help would be greatly appreciated.
The TLS handshake timeout error is very common when the machine that you are running your deployment on is running out of resources. Alternative issue is caused by slow internet connection or some proxy settings but we ruled out that since you can pull and run docker images locally and deploy small nginx webserver in your cluster.
As you may notice in the stackstorm helm chart it installs a big amount of services/pods inside your cluster which can take up a lot of resources.
It will install 2 replicas for each component of StackStorm
microservices for redundancy, as well as backends like RabbitMQ HA,
MongoDB HA Replicaset and etcd cluster that st2 replies on for MQ, DB
and distributed coordination respectively.
I deployed stackstorm on both k3d and GKE but I had to use fast machines in order to deploy this quickly and successfully.
NAME: stackstorm
LAST DEPLOYED: Mon Jun 29 15:25:52 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
Congratulations! You have just deployed StackStorm HA!

Troubles while installing GlusterFS on Kubernetes cluster using Heketi

I try to install GlusterFS on my kubernetes cluster using heketi. I start gk-deploy but it shows that pods aren't found:
Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
gluster-s3 pod ... not found.
Creating initial resources ... Error from server (AlreadyExists): error when creating "/heketi/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view not labeled
OK
node/sapdh2wrk1 not labeled
node/sapdh2wrk2 not labeled
node/sapdh2wrk3 not labeled
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... pods not found.
I've started gk-deploy more than once.
I have 3 nodes in my kubernetes cluster and it seems like pods can't start up on none of them, but I don't understand why.
Pods are created but aren't ready:
kubectl get pods
NAME READY STATUS RESTARTS AGE
glusterfs-65mc7 0/1 Running 0 16m
glusterfs-gnxms 0/1 Running 0 16m
glusterfs-htkmh 0/1 Running 0 16m
heketi-754dfc7cdf-zwpwn 0/1 ContainerCreating 0 74m
Here is a log of one GlusterFS Pod, it ends with a warning:
Events:
Type Reason Age From Message
Normal Scheduled 19m default-scheduler Successfully assigned default/glusterfs-65mc7 to sapdh2wrk1
Normal Pulled 19m kubelet, sapdh2wrk1 Container image "gluster/gluster-centos:latest" already present on machine
Normal Created 19m kubelet, sapdh2wrk1 Created container
Normal Started 19m kubelet, sapdh2wrk1 Started container
Warning Unhealthy 13m (x12 over 18m) kubelet, sapdh2wrk1 Liveness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active glusterd.service
Warning Unhealthy 3m58s (x35 over 18m) kubelet, sapdh2wrk1 Readiness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active glusterd.service
Glusterfs-5.8-100.1 is installed and started up on every node including master.
What is the reason why Pods don't start up?

Add a node to cluster with Flannel : "cannot join network of a non running container"

I am adding a node to the Kubernetes cluster as a node using flannel. Here are the nodes on my cluster:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
jetson-80 NotReady <none> 167m v1.15.0
p4 Ready master 18d v1.15.0
This machine is reachable through the same network. When joining the cluster, Kubernetes pulls some images, among others k8s.gcr.io/pause:3.1, but for some reason failed in pulling the images:
Warning FailedCreatePodSandBox 15d
kubelet,jetson-81 Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Error response from daemon: Get https://k8s.gcr.io/v2/: read tcp 192.168.8.81:58820->108.177.126.82:443: read: connection reset by peer
The machine is connected to the internet but only wget command works, not ping
I tried to pull images elsewhere and copy them to the machine.
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.15.0 d235b23c3570 2 months ago 82.4MB
quay.io/coreos/flannel v0.11.0-arm64 32ffa9fadfd7 6 months ago 53.5MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 20 months ago 742kB
Here are the list of pods on the master :
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-gmsz7 1/1 Running 0 2d22h
coredns-5c98db65d4-j6gz5 1/1 Running 0 2d22h
etcd-p4 1/1 Running 0 2d22h
kube-apiserver-p4 1/1 Running 0 2d22h
kube-controller-manager-p4 1/1 Running 0 2d22h
kube-flannel-ds-amd64-cq7kz 1/1 Running 9 17d
kube-flannel-ds-arm64-4s7kk 0/1 Init:CrashLoopBackOff 0 2m8s
kube-proxy-l2slz 0/1 CrashLoopBackOff 4 2m8s
kube-proxy-q6db8 1/1 Running 0 2d22h
kube-scheduler-p4 1/1 Running 0 2d22h
tiller-deploy-5d6cc99fc-rwdrl 1/1 Running 1 17d
but it didn't work either when I check the associated flannel pod kube-flannel-ds-arm64-4s7kk:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 66s default-scheduler Successfully assigned kube-system/kube-flannel-ds-arm64-4s7kk to jetson-80
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 68ffc44cf8cd655234691b0362615f97c59d285bec790af40f890510f27ba298
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: a196d8540b68dc7fcd97b0cda1e2f3183d1410598b6151c191b43602ac2faf8e
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 9d05d1fcb54f5388ca7e64d1b6627b05d52aea270114b5a418e8911650893bc6
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 5b730961cddf5cc3fb2af564b1abb46b086073d562bb2023018cd66fc5e96ce7
Normal Created <invalid> (x5 over <invalid>) kubelet, jetson-80 Created container install-cni
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 1767e9eb9198969329eaa14a71a110212d6622a8b9844137ac5b247cb9e90292
Normal SandboxChanged <invalid> (x5 over <invalid>) kubelet, jetson-80 Pod sandbox changed, it will be killed and re-created.
Warning BackOff <invalid> (x4 over <invalid>) kubelet, jetson-80 Back-off restarting failed container
Normal Pulled <invalid> (x6 over <invalid>) kubelet, jetson-80 Container image "quay.io/coreos/flannel:v0.11.0-arm64" already present on machine
I still can't identify if it's a Kubernetes or Flannel issue and haven't been able to solve it despite multiple attempts. Please let me know if you need me to share more details
EDIT:
using kubectl describe pod -n kube-system kube-proxy-l2slz :
Normal Pulled <invalid> (x67 over <invalid>) kubelet, ahold-jetson-80 Container image "k8s.gcr.io/kube-proxy:v1.15.0" already present on machine
Normal SandboxChanged <invalid> (x6910 over <invalid>) kubelet, ahold-jetson-80 Pod sandbox changed, it will be killed and re-created.
Warning FailedSync <invalid> (x77 over <invalid>) kubelet, ahold-jetson-80 (combined from similar events): error determining status: rpc error: code = Unknown desc = Error: No such container: 03e7ee861f8f63261ff9289ed2d73ea5fec516068daa0f1fe2e4fd50ca42ad12
Warning BackOff <invalid> (x8437 over <invalid>) kubelet, ahold-jetson-80 Back-off restarting failed container
Your problem may be coused by the mutil sandbox container in you node. Try to restart the kubelet:
$ systemctl restart kubelet
Check if you have generated and copied public key to right node to have connection between them: ssh-keygen.
Please make sure the firewall/security groups allow traffic on UDP port 58820.
Look at the flannel logs and see if there are any errors there but also look for "Subnet added: " messages. Each node should have added the other two subnets.
While running ping, try to use tcpdump to see where the packets get dropped.
Try src flannel0 (icmp), src host interface (udp port 58820), dest host interface (udp port 58820), dest flannel0 (icmp), docker0 (icmp).
Here is useful documentation: flannel-documentation.