Tiller pod crashes after Vagrant VM is powered off - kubernetes

I have set up a Vagrant VM, and installed Kubernetes and Helm.
vagrant@vagrant:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
vagrant@vagrant:~$ helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
After the first vagrant up that creates the VM, Tiller has no issues.
When I power off the VM with vagrant halt and reactivate it with vagrant up, Tiller starts to misbehave.
It restarts many times and at some point enters a CrashLoopBackOff state.
etcd-vagrant 1/1 Running 2 1h
heapster-5449cf95bd-h9xk8 1/1 Running 2 1h
kube-apiserver-vagrant 1/1 Running 2 1h
kube-controller-manager-vagrant 1/1 Running 2 1h
kube-dns-6f4fd4bdf-xclbb 3/3 Running 6 1h
kube-proxy-8n8tc 1/1 Running 2 1h
kube-scheduler-vagrant 1/1 Running 2 1h
kubernetes-dashboard-5bd6f767c7-lrdjp 1/1 Running 3 1h
tiller-deploy-78f96d6f9-cswbm 0/1 CrashLoopBackOff 8 38m
weave-net-948jt 2/2 Running 5 1h
I take a look at the pod's events and see that the Liveness and Readiness probes are failing.
vagrant@vagrant:~$ kubectl describe pod tiller-deploy-78f96d6f9-cswbm -n kube-system
Name: tiller-deploy-78f96d6f9-cswbm
Namespace: kube-system
Node: vagrant/10.0.2.15
Start Time: Wed, 23 May 2018 08:51:54 +0000
Labels: app=helm
name=tiller
pod-template-hash=349528295
Annotations: <none>
Status: Running
IP: 10.32.0.28
Controlled By: ReplicaSet/tiller-deploy-78f96d6f9
Containers:
tiller:
Container ID: docker://389470b95c46f0a5ba6b4b5457f212b0e6f3e3a754beb1aeae835260de3790a7
Image: gcr.io/kubernetes-helm/tiller:v2.9.1
Image ID: docker-pullable://gcr.io/kubernetes-helm/tiller@sha256:417aae19a0709075df9cc87e2fcac599b39d8f73ac95e668d9627fec9d341af2
Ports: 44134/TCP, 44135/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 23 May 2018 09:26:53 +0000
Finished: Wed, 23 May 2018 09:27:12 +0000
Ready: False
Restart Count: 8
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-fl44z (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-fl44z:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-fl44z
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 38m kubelet, vagrant MountVolume.SetUp succeeded for volume "default-token-fl44z"
Normal Scheduled 38m default-scheduler Successfully assigned tiller-deploy-78f96d6f9-cswbm to vagrant
Normal Pulled 29m (x2 over 38m) kubelet, vagrant Container image "gcr.io/kubernetes-helm/tiller:v2.9.1" already present on machine
Normal Killing 29m kubelet, vagrant Killing container with id docker://tiller:Container failed liveness probe.. Container will be killed and recreated.
Normal Created 29m (x2 over 38m) kubelet, vagrant Created container
Normal Started 29m (x2 over 38m) kubelet, vagrant Started container
Warning Unhealthy 28m (x2 over 37m) kubelet, vagrant Readiness probe failed: Get http://10.32.0.19:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 17m (x30 over 37m) kubelet, vagrant Liveness probe failed: Get http://10.32.0.19:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal SuccessfulMountVolume 11m kubelet, vagrant MountVolume.SetUp succeeded for volume "default-token-fl44z"
Warning FailedCreatePodSandBox 10m (x7 over 11m) kubelet, vagrant Failed create pod sandbox.
Normal SandboxChanged 10m (x8 over 11m) kubelet, vagrant Pod sandbox changed, it will be killed and re-created.
Normal Pulled 10m kubelet, vagrant Container image "gcr.io/kubernetes-helm/tiller:v2.9.1" already present on machine
Normal Created 10m kubelet, vagrant Created container
Normal Started 10m kubelet, vagrant Started container
Warning Unhealthy 10m kubelet, vagrant Liveness probe failed: Get http://10.32.0.28:44135/liveness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
Warning Unhealthy 10m kubelet, vagrant Readiness probe failed: Get http://10.32.0.28:44135/readiness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
Warning Unhealthy 8m (x2 over 9m) kubelet, vagrant Liveness probe failed: Get http://10.32.0.28:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 8m (x2 over 9m) kubelet, vagrant Readiness probe failed: Get http://10.32.0.28:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning BackOff 1m (x22 over 7m) kubelet, vagrant Back-off restarting failed container
After entering this state, it stays there.
Only after I delete the Tiller pod does it come up again, and then everything runs smoothly.
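That workaround can be scripted. This is only a sketch based on the pod name and labels shown in the describe output above (app=helm, name=tiller); it is not a root-cause fix:

```shell
# Delete the crashed pod; the ReplicaSet recreates it immediately.
kubectl delete pod -n kube-system tiller-deploy-78f96d6f9-cswbm

# Or select it by label, so the command survives pod-name changes:
kubectl delete pod -n kube-system -l app=helm,name=tiller
```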
vagrant@vagrant:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
etcd-vagrant 1/1 Running 2 1h
heapster-5449cf95bd-h9xk8 1/1 Running 2 1h
kube-apiserver-vagrant 1/1 Running 2 1h
kube-controller-manager-vagrant 1/1 Running 2 1h
kube-dns-6f4fd4bdf-xclbb 3/3 Running 6 1h
kube-proxy-8n8tc 1/1 Running 2 1h
kube-scheduler-vagrant 1/1 Running 2 1h
kubernetes-dashboard-5bd6f767c7-lrdjp 1/1 Running 4 1h
tiller-deploy-78f96d6f9-tgx4z 1/1 Running 0 7m
weave-net-948jt 2/2 Running 5 1h
However, the new pod's events show the same Unhealthy warnings.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m default-scheduler Successfully assigned tiller-deploy-78f96d6f9-tgx4z to vagrant
Normal SuccessfulMountVolume 8m kubelet, vagrant MountVolume.SetUp succeeded for volume "default-token-fl44z"
Normal Pulled 7m kubelet, vagrant Container image "gcr.io/kubernetes-helm/tiller:v2.9.1" already present on machine
Normal Created 7m kubelet, vagrant Created container
Normal Started 7m kubelet, vagrant Started container
Warning Unhealthy 7m kubelet, vagrant Readiness probe failed: Get http://10.32.0.28:44135/readiness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
Warning Unhealthy 7m kubelet, vagrant Liveness probe failed: Get http://10.32.0.28:44135/liveness: dial tcp 10.32.0.28:44135: getsockopt: connection refused
Warning Unhealthy 1m (x6 over 3m) kubelet, vagrant Liveness probe failed: Get http://10.32.0.28:44135/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 41s (x14 over 7m) kubelet, vagrant Readiness probe failed: Get http://10.32.0.28:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Any insight would be appreciated.
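For what it's worth, the probe settings shown in the describe output (delay=1s, timeout=1s) leave Tiller almost no time to come up after the VM restarts. A hedged sketch of more forgiving values for the tiller-deploy Deployment follows; the field names are the standard Kubernetes probe schema, but the specific numbers are assumptions, not values tested against this setup:

```yaml
livenessProbe:
  httpGet:
    path: /liveness
    port: 44135
  initialDelaySeconds: 30   # was 1s in the output above
  timeoutSeconds: 5         # was 1s
readinessProbe:
  httpGet:
    path: /readiness
    port: 44135
  initialDelaySeconds: 30
  timeoutSeconds: 5
```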

Related

kube-scheduler and kube-controller-manager restarting

I have a Kubernetes 1.15.3 setup.
My kube-controller-manager and kube-scheduler are restarting very frequently. This started happening after Kubernetes was upgraded to 1.15.3.
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-nmt5d 1/1 Running 37 24d
coredns-5c98db65d4-tg4kx 1/1 Running 37 24d
etcd-ana01 1/1 Running 1 24d
kube-apiserver-ana01 1/1 Running 10 24d
**kube-controller-manager-ana01 1/1 Running 477 9d**
kube-flannel-ds-amd64-2srzb 1/1 Running 0 12d
kube-proxy-2hvcl 1/1 Running 0 23d
**kube-scheduler-ana01 1/1 Running 518 9d**
tiller-deploy-8557598fbc-kxntc 1/1 Running 0 11d
Here are the events for the kube-scheduler pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 39m (x500 over 23d) kubelet, ana01 Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
Warning BackOff 39m (x1873 over 23d) kubelet, ana01 Back-off restarting failed container
Normal Pulled 28m (x519 over 24d) kubelet, ana01 Container image "k8s.gcr.io/kube-scheduler:v1.15.3" already present on machine
Normal Created 28m (x519 over 24d) kubelet, ana01 Created container kube-scheduler
Normal Started 27m (x519 over 24d) kubelet, ana01 Started container kube-scheduler
The kube-scheduler logs are:
I0928 09:10:23.554335 1 serving.go:319] Generated self-signed cert in-memory
W0928 09:10:25.002268 1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0928 09:10:25.002523 1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0928 09:10:25.002607 1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0928 09:10:25.002947 1 authorization.go:177] **failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory**
W0928 09:10:25.003116 1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0928 09:10:25.021201 1 server.go:142] Version: v1.15.3
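A reasonable first debugging step (a sketch; the pod names are taken from the listing above) is to inspect the previously crashed container instances and, from the node itself, hit the health endpoint the liveness probe uses (127.0.0.1:10251, per the event above):

```shell
# Logs of the last crashed instances:
kubectl logs -n kube-system kube-scheduler-ana01 --previous
kubectl logs -n kube-system kube-controller-manager-ana01 --previous

# From the node, check the endpoint the liveness probe hits:
curl -s http://127.0.0.1:10251/healthz
```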

Add a node to cluster with Flannel : "cannot join network of a non running container"

I am adding a node to a Kubernetes cluster that uses Flannel. Here are the nodes in my cluster:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
jetson-80 NotReady <none> 167m v1.15.0
p4 Ready master 18d v1.15.0
This machine is reachable over the same network. When joining the cluster, Kubernetes pulls some images, among others k8s.gcr.io/pause:3.1, but for some reason fails to pull them:
Warning FailedCreatePodSandBox 15d
kubelet,jetson-81 Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Error response from daemon: Get https://k8s.gcr.io/v2/: read tcp 192.168.8.81:58820->108.177.126.82:443: read: connection reset by peer
The machine is connected to the internet, but only the wget command works; ping does not.
I tried pulling the images elsewhere and copying them to the machine:
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.15.0 d235b23c3570 2 months ago 82.4MB
quay.io/coreos/flannel v0.11.0-arm64 32ffa9fadfd7 6 months ago 53.5MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 20 months ago 742kB
Here are the list of pods on the master :
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-gmsz7 1/1 Running 0 2d22h
coredns-5c98db65d4-j6gz5 1/1 Running 0 2d22h
etcd-p4 1/1 Running 0 2d22h
kube-apiserver-p4 1/1 Running 0 2d22h
kube-controller-manager-p4 1/1 Running 0 2d22h
kube-flannel-ds-amd64-cq7kz 1/1 Running 9 17d
kube-flannel-ds-arm64-4s7kk 0/1 Init:CrashLoopBackOff 0 2m8s
kube-proxy-l2slz 0/1 CrashLoopBackOff 4 2m8s
kube-proxy-q6db8 1/1 Running 0 2d22h
kube-scheduler-p4 1/1 Running 0 2d22h
tiller-deploy-5d6cc99fc-rwdrl 1/1 Running 1 17d
It didn't work either. When I check the associated Flannel pod kube-flannel-ds-arm64-4s7kk, I see:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 66s default-scheduler Successfully assigned kube-system/kube-flannel-ds-arm64-4s7kk to jetson-80
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 68ffc44cf8cd655234691b0362615f97c59d285bec790af40f890510f27ba298
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: a196d8540b68dc7fcd97b0cda1e2f3183d1410598b6151c191b43602ac2faf8e
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 9d05d1fcb54f5388ca7e64d1b6627b05d52aea270114b5a418e8911650893bc6
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 5b730961cddf5cc3fb2af564b1abb46b086073d562bb2023018cd66fc5e96ce7
Normal Created <invalid> (x5 over <invalid>) kubelet, jetson-80 Created container install-cni
Warning Failed <invalid> kubelet, jetson-80 Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 1767e9eb9198969329eaa14a71a110212d6622a8b9844137ac5b247cb9e90292
Normal SandboxChanged <invalid> (x5 over <invalid>) kubelet, jetson-80 Pod sandbox changed, it will be killed and re-created.
Warning BackOff <invalid> (x4 over <invalid>) kubelet, jetson-80 Back-off restarting failed container
Normal Pulled <invalid> (x6 over <invalid>) kubelet, jetson-80 Container image "quay.io/coreos/flannel:v0.11.0-arm64" already present on machine
I still can't tell whether it is a Kubernetes or a Flannel issue, and I haven't been able to solve it despite multiple attempts. Please let me know if you need me to share more details.
EDIT:
Using kubectl describe pod -n kube-system kube-proxy-l2slz:
Normal Pulled <invalid> (x67 over <invalid>) kubelet, ahold-jetson-80 Container image "k8s.gcr.io/kube-proxy:v1.15.0" already present on machine
Normal SandboxChanged <invalid> (x6910 over <invalid>) kubelet, ahold-jetson-80 Pod sandbox changed, it will be killed and re-created.
Warning FailedSync <invalid> (x77 over <invalid>) kubelet, ahold-jetson-80 (combined from similar events): error determining status: rpc error: code = Unknown desc = Error: No such container: 03e7ee861f8f63261ff9289ed2d73ea5fec516068daa0f1fe2e4fd50ca42ad12
Warning BackOff <invalid> (x8437 over <invalid>) kubelet, ahold-jetson-80 Back-off restarting failed container
Your problem may be caused by multiple sandbox containers on your node. Try restarting the kubelet:
$ systemctl restart kubelet
Check that you have generated an SSH key pair (ssh-keygen) and copied the public key to the right node, so the nodes can connect to each other.
Please make sure the firewall/security groups allow Flannel's overlay traffic between the nodes (by default UDP port 8472 for the vxlan backend, or UDP 8285 for the udp backend). Note that 58820 in the error above is just an ephemeral client source port, not a port you need to open.
Look at the Flannel logs and see if there are any errors there, but also look for "Subnet added: " messages. Each node should have added the other nodes' subnets.
While running ping, use tcpdump to see where the packets get dropped.
Try src flannel0 (icmp), src host interface (Flannel's UDP port), dest host interface (Flannel's UDP port), dest flannel0 (icmp), docker0 (icmp).
Here is useful documentation: flannel-documentation.
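To follow the advice above, the Flannel logs on the failing node can be pulled per container. This is a sketch: the pod name comes from the listing above, and the container names (install-cni, kube-flannel) are assumed from the standard Flannel DaemonSet manifest:

```shell
# Init container that keeps failing to start:
kubectl logs -n kube-system kube-flannel-ds-arm64-4s7kk -c install-cni

# Main flannel container, if it has started at all:
kubectl logs -n kube-system kube-flannel-ds-arm64-4s7kk -c kube-flannel
```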

Issue on arm64: no endpoints, code: 503

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/arm64"}
Environment:
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a):
Linux node4 4.11.0-rc6-next-20170411-00286-gcc55807 #0 SMP PREEMPT Mon Jun 5 18:56:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
What happened:
I want to use kube-deploy/master.sh to set up a master on ARM64, but I encounter the following error when visiting $myip:8080/ui:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "no endpoints available for service "kubernetes-dashboard"",
"reason": "ServiceUnavailable",
"code": 503
}
My branch is 2017-2-7 (c8d6fbfc…)
By the way, the same installation steps work on the x86-64 platform.
Anything else we need to know:
5.1 kubectl get pod --namespace=kube-system
k8s-master-10.193.20.23 4/4 Running 17 1h
k8s-proxy-v1-sk8vd 1/1 Running 0 1h
kube-addon-manager-10.193.20.23 2/2 Running 2 1h
kube-dns-3365905565-xvj7n 2/4 CrashLoopBackOff 65 1h
kubernetes-dashboard-1416335539-lhlhz 0/1 CrashLoopBackOff 22 1h
5.2 kubectl describe pods kubernetes-dashboard-1416335539-lhlhz --namespace=kube-system
Name: kubernetes-dashboard-1416335539-lhlhz
Namespace: kube-system
Node: 10.193.20.23/10.193.20.23
Start Time: Mon, 12 Jun 2017 10:04:07 +0800
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=1416335539
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kubernetes-dashboard-1416335539","uid":"6ab170d2-4f13-11e7-a...
scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 10.1.70.2
Controllers: ReplicaSet/kubernetes-dashboard-1416335539
Containers:
kubernetes-dashboard:
Container ID: docker://fbdbe4c047803b0e98ca7412ca617031f1f31d881e3a5838298a1fda24a1ae18
Image: gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0
Image ID: docker-pullable://gcr.io/google_containers/kubernetes-dashboard-arm64@sha256:559d58ef0d8e9dbe78f80060401b97d6262462318c0b8e071937a73896ea1d3d
Port: 9090/TCP
State: Running
Started: Mon, 12 Jun 2017 11:30:03 +0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 12 Jun 2017 11:24:28 +0800
Finished: Mon, 12 Jun 2017 11:24:59 +0800
Ready: True
Restart Count: 23
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-0mnn8 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-0mnn8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-0mnn8
Optional: false
QoS Class: Guaranteed
Node-Selectors:
Tolerations:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
30m 30m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id b0562b3640ae: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
18m 18m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 477066c3a00f: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
12m 12m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 3e021d6df31f: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
11m 11m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 43fe3c37817d: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
5m 5m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 23cea72e1f45: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
1h 5m 7 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Warning Unhealthy Liveness probe failed: Get http://10.1.70.2:9090/: dial tcp 10.1.70.2:9090: getsockopt: connection refused
1h 38s 335 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Warning BackOff Back-off restarting failed docker container
1h 38s 307 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kubernetes-dashboard" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)"
1h 27s 24 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Pulled Container image "gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0" already present on machine
59m 23s 15 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Created (events with common reason combined)
59m 22s 15 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Started (events with common reason combined)
5.3 kubectl get svc,ep,rc,rs,deploy,pod -o wide --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default svc/kubernetes 10.0.0.1 443/TCP 16m
kube-system svc/kube-dns 10.0.0.10 53/UDP,53/TCP 16m k8s-app=kube-dns
kube-system svc/kubernetes-dashboard 10.0.0.95 80/TCP 16m k8s-app=kubernetes-dashboard
NAMESPACE NAME ENDPOINTS AGE
default ep/kubernetes 10.193.20.23:6443 16m
kube-system ep/kube-controller-manager <none> 11m
kube-system ep/kube-dns 16m
kube-system ep/kube-scheduler <none> 11m
kube-system ep/kubernetes-dashboard 16m
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINER(S) IMAGE(S) SELECTOR
kube-system rs/kube-dns-3365905565 1 1 0 16m kubedns,dnsmasq,dnsmasq-metrics,healthz gcr.io/google_containers/kubedns-arm64:1.9,gcr.io/google_containers/kube-dnsmasq-arm64:1.4,gcr.io/google_containers/dnsmasq-metrics-arm64:1.0,gcr.io/google_containers/exechealthz-arm64:1.2 k8s-app=kube-dns,pod-template-hash=3365905565
kube-system rs/kubernetes-dashboard-1416335539 1 1 0 16m kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0 k8s-app=kubernetes-dashboard,pod-template-hash=1416335539
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINER(S) IMAGE(S) SELECTOR
kube-system deploy/kube-dns 1 1 1 0 16m kubedns,dnsmasq,dnsmasq-metrics,healthz gcr.io/google_containers/kubedns-arm64:1.9,gcr.io/google_containers/kube-dnsmasq-arm64:1.4,gcr.io/google_containers/dnsmasq-metrics-arm64:1.0,gcr.io/google_containers/exechealthz-arm64:1.2 k8s-app=kube-dns
kube-system deploy/kubernetes-dashboard 1 1 1 0 16m kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0 k8s-app=kubernetes-dashboard
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system po/k8s-master-10.193.20.23 4/4 Running 50 15m 10.193.20.23 10.193.20.23
kube-system po/k8s-proxy-v1-5b831 1/1 Running 0 16m 10.193.20.23 10.193.20.23
kube-system po/kube-addon-manager-10.193.20.23 2/2 Running 6 15m 10.193.20.23 10.193.20.23
kube-system po/kube-dns-3365905565-jxg4f 1/4 CrashLoopBackOff 20 16m 10.1.5.3 10.193.20.23
kube-system po/kubernetes-dashboard-1416335539-frt3v 0/1 CrashLoopBackOff 7 16m 10.1.5.2 10.193.20.23
5.4 kubectl describe pods kube-dns-3365905565-lb0mq --namespace=kube-system
Name: kube-dns-3365905565-lb0mq
Namespace: kube-system
Node: 10.193.20.23/10.193.20.23
Start Time: Wed, 14 Jun 2017 10:43:46 +0800
Labels: k8s-app=kube-dns
pod-template-hash=3365905565
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-3365905565","uid":"4870aec2-50ab-11e7-a420-6805ca36...
scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 10.1.20.3
Controllers: ReplicaSet/kube-dns-3365905565
Containers:
kubedns:
Container ID: docker://729562769b48be60a02b62692acd3d1e1c67ac2505f4cb41240067777f45fd77
Image: gcr.io/google_containers/kubedns-arm64:1.9
Image ID: docker-pullable://gcr.io/google_containers/kubedns-arm64@sha256:3c78a2c5b9b86c5aeacf9f5967f206dcf1e64362f3e7f274c1c078c954ecae38
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=0
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 14 Jun 2017 10:56:29 +0800
Finished: Wed, 14 Jun 2017 10:58:06 +0800
Ready: False
Restart Count: 6
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
dnsmasq:
Container ID: docker://b6d7e98a4af2715294764929f901947ab3b985be45d9f213245bd338ab8c3101
Image: gcr.io/google_containers/kube-dnsmasq-arm64:1.4
Image ID: docker-pullable://gcr.io/google_containers/kube-dnsmasq-arm64@sha256:dff5f9e2a521816aa314d469fd8ef961270fe43b4a74bab490385942103f3728
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 14 Jun 2017 10:55:50 +0800
Finished: Wed, 14 Jun 2017 10:57:26 +0800
Ready: False
Restart Count: 6
Requests:
cpu: 150m
memory: 10Mi
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
dnsmasq-metrics:
Container ID: docker://51693aea0e732e488b631dcedc082f5a9e23b5b74857217cf005d1e947375367
Image: gcr.io/google_containers/dnsmasq-metrics-arm64:1.0
Image ID: docker-pullable://gcr.io/google_containers/dnsmasq-metrics-arm64@sha256:fc0e8b676a26ed0056b8c68611b74b9b5f3f00c608e5b11ef1608484ce55dd9a
Port: 10054/TCP
Args:
--v=2
--logtostderr
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Exit Code: 128
Started: Wed, 14 Jun 2017 10:57:28 +0800
Finished: Wed, 14 Jun 2017 10:57:28 +0800
Ready: False
Restart Count: 7
Requests:
memory: 10Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
healthz:
Container ID: docker://fab7ef143a95ad4d2f6363d5fcdc162eba1522b92726665916462be765289327
Image: gcr.io/google_containers/exechealthz-arm64:1.2
Image ID: docker-pullable://gcr.io/google_containers/exechealthz-arm64@sha256:e8300fde6c36b454cc00b5fffc96d6985622db4d8eb42a9f98f24873e9535b5c
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
State: Running
Started: Wed, 14 Jun 2017 10:44:31 +0800
Ready: True
Restart Count: 0
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-1t5v9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1t5v9
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15m 15m 1 default-scheduler Normal Scheduled Successfully assigned kube-dns-3365905565-lb0mq to 10.193.20.23
14m 14m 1 kubelet, 10.193.20.23 spec.containers{kubedns} Normal Created Created container with docker id 2fef2db445e6; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{kubedns} Normal Started Started container with docker id 2fef2db445e6
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq} Normal Created Created container with docker id 41ec998eeb76; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq} Normal Started Started container with docker id 41ec998eeb76
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 676ef0e877c8; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Pulled Container image "gcr.io/google_containers/exechealthz-arm64:1.2" already present on machine
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 676ef0e877c8 with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Created Created container with docker id fab7ef143a95; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Started Started container with docker id fab7ef143a95
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 45f6bd7f1f3a with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 45f6bd7f1f3a; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq-metrics" with CrashLoopBackOff: "Back-off 10s restarting failed container=dnsmasq-metrics pod=kube-dns-3365905565-lb0mq_kube-system(48845c1a-50ab-11e7-a420-6805ca369d7f)"
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 2d1e5adb97bb; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 2d1e5adb97bb with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 2 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq-metrics" with CrashLoopBackOff: "Back-off 20s restarting failed container=dnsmasq-metrics pod=kube-dns-3365905565-lb0mq_kube-system(48845c1a-50ab-11e7-a420-6805ca369d7f)"
So it looks like you have hit one (or several) bugs in Kubernetes. I suggest you retry with a more recent version (possibly a different Docker version too). It would also be a good idea to report these bugs (https://github.com/kubernetes/dashboard/issues).
All in all, bear in mind that Kubernetes on ARM is an advanced topic, and you should expect problems and be ready to debug and resolve them :/
There might be a problem with that Docker image (gcr.io/google_containers/dnsmasq-metrics-arm64); non-amd64 images are not well tested.
Could you try running:
kubectl set image --namespace=kube-system deployment/kube-dns dnsmasq-metrics=lenart/dnsmasq-metrics-arm64:1.0
You can't reach the dashboard because the dashboard Pod is unhealthy and failing its readiness probe. Because it is not ready, it is not considered an endpoint for the dashboard Service, so the Service has no endpoints, which leads to the error message you reported.
The dashboard is most likely unhealthy because kube-dns is not ready (1/4 containers in the Pod are ready; it should be 4/4).
kube-dns is most likely unhealthy because you have no pod networking (overlay network) deployed.
Go to the add-ons, pick a network add-on, and deploy it. Weave has a 1.5-compatible version and requires no setup.
After you have done that, give it a few minutes. If you are impatient, just delete the kubernetes-dashboard and kube-dns pods (not the deployment/controller!!). If this does not resolve your problem, please update your question with the new information.
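The two steps above (deploy a network add-on, then bounce only the unhealthy pods) can be sketched as a small shell function. The Weave URL is the one its docs published for 1.5-era clusters, and the label selectors are assumptions, so verify them with `kubectl get pods -n kube-system --show-labels` before running anything:

```shell
# Sketch: deploy Weave, then delete only the pods (not their controllers),
# so their Deployments recreate them once pod networking is up.
# URL and label selectors are assumptions -- check them on your cluster.
redeploy_network_and_bounce_pods() {
  kubectl apply -f https://git.io/weave-kube          # Weave add-on (1.5-era URL)
  kubectl --namespace=kube-system delete pod -l k8s-app=kube-dns
  kubectl --namespace=kube-system delete pod -l app=kubernetes-dashboard
}
```

Deleting the pods while leaving the Deployments in place is what forces a fresh start without losing the desired state.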

NFS volume sharing issue between wordpress pod and mysql pod

This repository, kubernetes-wordpress-with-nfs-volume-on-gke, tries to implement a WordPress application that shares an NFS volume between MySQL and WordPress. The idea behind sharing an NFS volume between pods is to implement, as a next step, a StatefulSet for MySQL. That StatefulSet application will need to share the database (the volume of the database) between all of the MySQL pods, so that a multi-node database is created that ensures the requested high performance.
There is an example of this, janakiramm/wp-statefulset, which uses etcd. So why not use NFS instead of etcd?
The commands to run to create this Kubernetes WordPress application, which shares the NFS volume between MySQL and WordPress, are:
kubectl create -f 01-pv-gce.yml
kubectl create -f 02-dep-nfs.yml
kubectl create -f 03-srv-nfs.yml
kubectl get services # you have to update the file 04-pv-pvc.yml with the new IP address of the service
kubectl create -f 04-pv-pvc.yml
kubectl create -f 05-mysql.yml
kubectl create -f 06-wordpress.yml
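The manual step in the list above (copying the service IP into 04-pv-pvc.yml before step 5) can be scripted. This is a sketch under two assumptions: the service is named nfs-server as in the repository's manifests, and the PV manifest carries a `server: <ip>` line for the NFS volume; the helper names are ours, not from the repository:

```shell
# Sketch: rewrite the "server:" line of the PV manifest with the NFS
# service's ClusterIP (hypothetical helpers; names are not from the repo).
substitute_server_ip() {
  manifest=$1
  ip=$2
  sed -i "s/server: .*/server: ${ip}/" "$manifest"
}

update_pv_manifest() {
  ip=$(kubectl get service nfs-server -o jsonpath='{.spec.clusterIP}')
  substitute_server_ip 04-pv-pvc.yml "$ip"
}
```

This avoids the error-prone hand edit between `kubectl get services` and `kubectl create -f 04-pv-pvc.yml`.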
This implementation did not succeed; the WordPress pod does not start:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-2899972627-jgjx0 1/1 Running 0 4m
wp01-mysql-1941769936-m9jjd 1/1 Running 0 3m
wp01-wordpress-2362719074-bv53t 0/1 CrashLoopBackOff 4 2m
There seems to be a problem accessing the NFS volume, as described below:
$ kubectl describe pods wp01-wordpress-2362719074-bv53t
Name: wp01-wordpress-2362719074-bv53t
Namespace: default
Node: gke-mappedinn-cluster-default-pool-6264f94a-z0sh/10.240.0.4
Start Time: Thu, 04 May 2017 05:59:12 +0400
Labels: app=wp01
pod-template-hash=2362719074
tier=frontend
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"wp01-wordpress-2362719074","uid":"44b91da0-306d-11e7-a0d1-42010a...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container wordpress
Status: Running
IP: 10.244.0.4
Controllers: ReplicaSet/wp01-wordpress-2362719074
Containers:
wordpress:
Container ID: docker://658c7392c1b7a5033fe1a1b456a9653161003ee2878a4f02c6a12abb49241d47
Image: wordpress:4.6.1-apache
Image ID: docker://sha256:ee397259d4e59c65e2c1c5979a3634eb3ab106bba389acea8b21862053359134
Port: 80/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 04 May 2017 06:03:16 +0400
Finished: Thu, 04 May 2017 06:03:16 +0400
Ready: False
Restart Count: 5
Requests:
cpu: 100m
Environment:
WORDPRESS_DB_HOST: wp01-mysql
WORDPRESS_DB_PASSWORD: <set to the key 'password' in secret 'wp01-pwd-wordpress'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-k650h (ro)
/var/www/html from wordpress-persistent-storage (rw)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
wordpress-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: wp01-pvc-data
ReadOnly: false
default-token-k650h:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-k650h
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 default-scheduler Normal Scheduled Successfully assigned wp01-wordpress-2362719074-bv53t to gke-mappedinn-cluster-default-pool-6264f94a-z0sh
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Pulling pulling image "wordpress:4.6.1-apache"
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Pulled Successfully pulled image "wordpress:4.6.1-apache"
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Created Created container with docker id 8647e997d6f4; Security:[seccomp=unconfined]
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Started Started container with docker id 8647e997d6f4
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Created Created container with docker id 37f4f0fd392d; Security:[seccomp=unconfined]
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Started Started container with docker id 37f4f0fd392d
4m 4m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "wordpress" with CrashLoopBackOff: "Back-off 10s restarting failed container=wordpress pod=wp01-wordpress-2362719074-bv53t_default(44ba1226-306d-11e7-a0d1-42010a8e0084)"
3m 3m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Created Created container with docker id b78a661388a2; Security:[seccomp=unconfined]
3m 3m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Started Started container with docker id b78a661388a2
3m 3m 2 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "wordpress" with CrashLoopBackOff: "Back-off 20s restarting failed container=wordpress pod=wp01-wordpress-2362719074-bv53t_default(44ba1226-306d-11e7-a0d1-42010a8e0084)"
3m 3m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Created Created container with docker id 2b6384407678; Security:[seccomp=unconfined]
3m 3m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Started Started container with docker id 2b6384407678
3m 2m 4 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "wordpress" with CrashLoopBackOff: "Back-off 40s restarting failed container=wordpress pod=wp01-wordpress-2362719074-bv53t_default(44ba1226-306d-11e7-a0d1-42010a8e0084)"
2m 2m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Created Created container with docker id 930a3410b213; Security:[seccomp=unconfined]
2m 2m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Started Started container with docker id 930a3410b213
2m 1m 7 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "wordpress" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=wordpress pod=wp01-wordpress-2362719074-bv53t_default(44ba1226-306d-11e7-a0d1-42010a8e0084)"
4m 1m 5 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Pulled Container image "wordpress:4.6.1-apache" already present on machine
1m 1m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Created Created container with docker id 658c7392c1b7; Security:[seccomp=unconfined]
1m 1m 1 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Normal Started Started container with docker id 658c7392c1b7
4m 10s 19 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh spec.containers{wordpress} Warning BackOff Back-off restarting failed docker container
1m 10s 5 kubelet, gke-mappedinn-cluster-default-pool-6264f94a-z0sh
Could you please help with this issue?
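Since the container exits with code 1 before any probe events appear, the container log (including the previous attempt) is the quickest way to confirm or rule out the NFS-mount theory. A minimal sketch, using the pod and claim names from the output above (the pod suffix will change after restarts):

```shell
# Sketch: gather the crashing container's logs and the PVC status.
# Take the current pod name from `kubectl get pods`; names here are examples.
inspect_wordpress_crash() {
  pod=$1
  kubectl logs "$pod"               # current attempt
  kubectl logs "$pod" --previous    # the attempt that exited with code 1
  kubectl get pvc wp01-pvc-data     # is the claim Bound?
}
```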

Deis builder keeps restarting with failed liveness probe

I tried deleting the pods, rescaling the replicas, and deleting the AWS instances, but I still cannot make the Deis builder work normally. It keeps restarting with a failed liveness probe. Below are the logs from the Deis builder:
$ kubectl describe pods/deis-builder-2995120344-mz2zg -n deis
Name: deis-builder-2995120344-mz2zg
Namespace: deis
Node: ip-10-0-48-189.ec2.internal/10.0.48.189
Start Time: Wed, 15 Mar 2017 22:29:03 -0400
Labels: app=deis-builder
pod-template-hash=2995120344
Status: Running
IP: 10.34.184.7
Controllers: ReplicaSet/deis-builder-2995120344
Containers:
deis-builder:
Container ID: docker://f2b7799712c347759832270716057b6ac3be68298eef3057c25727b66024c84a
Image: quay.io/deis/builder:v2.7.1
Image ID: docker-pullable://quay.io/deis/builder#sha256:3dab1dd4e6359d1588fee1b4f93ef9f5c70f268f17de5bed4bc13faa210ce5d0
Ports: 2223/TCP, 8092/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 15 Mar 2017 22:37:37 -0400
Finished: Wed, 15 Mar 2017 22:38:15 -0400
Ready: False
Restart Count: 7
Liveness: http-get http://:8092/healthz delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8092/readiness delay=30s timeout=1s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/api/auth from builder-key-auth (ro)
/var/run/secrets/deis/builder/ssh from builder-ssh-private-keys (ro)
/var/run/secrets/deis/objectstore/creds from objectstore-creds (ro)
/var/run/secrets/kubernetes.io/serviceaccount from deis-builder-token-qbqff (ro)
Environment Variables:
DEIS_REGISTRY_SERVICE_HOST: 127.0.0.1
DEIS_REGISTRY_SERVICE_PORT: 5555
HEALTH_SERVER_PORT: 8092
EXTERNAL_PORT: 2223
BUILDER_STORAGE: s3
DEIS_REGISTRY_LOCATION: ecr
DEIS_REGISTRY_SECRET_PREFIX: private-registry
GIT_LOCK_TIMEOUT: 10
SLUGBUILDER_IMAGE_NAME: <set to the key 'image' of config map 'slugbuilder-config'>
SLUG_BUILDER_IMAGE_PULL_POLICY: <set to the key 'pullpolicy' of config map 'slugbuilder-config'>
DOCKERBUILDER_IMAGE_NAME: <set to the key 'image' of config map 'dockerbuilder-config'>
DOCKER_BUILDER_IMAGE_PULL_POLICY: <set to the key 'pullpolicy' of config map 'dockerbuilder-config'>
DOCKERIMAGE: 1
DEIS_DEBUG: false
POD_NAMESPACE: deis (v1:metadata.namespace)
DEIS_BUILDER_KEY: <set to the key 'builder-key' in secret 'builder-key-auth'>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
builder-key-auth:
Type: Secret (a volume populated by a Secret)
SecretName: builder-key-auth
builder-ssh-private-keys:
Type: Secret (a volume populated by a Secret)
SecretName: builder-ssh-private-keys
objectstore-creds:
Type: Secret (a volume populated by a Secret)
SecretName: objectstorage-keyfile
deis-builder-token-qbqff:
Type: Secret (a volume populated by a Secret)
SecretName: deis-builder-token-qbqff
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
10m 10m 1 {default-scheduler } Normal Scheduled Successfully assigned deis-builder-2995120344-mz2zg to ip-10-0-48-189.ec2.internal
10m 10m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id 7eac3a357f61
10m 10m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id 7eac3a357f61; Security:[seccomp=unconfined]
9m 9m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id 8e730f2731ef; Security:[seccomp=unconfined]
9m 9m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id 8e730f2731ef
9m 9m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id 7eac3a357f61: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
9m 9m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id 5f4e695c595a; Security:[seccomp=unconfined]
9m 9m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id 8e730f2731ef: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
9m 9m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id 5f4e695c595a
8m 8m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id c87d762fc118; Security:[seccomp=unconfined]
8m 8m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id c87d762fc118
8m 8m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id 5f4e695c595a: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
7m 7m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id 416573d43fe4; Security:[seccomp=unconfined]
7m 7m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id 416573d43fe4
7m 7m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id c87d762fc118: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
7m 7m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id 416573d43fe4: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
7m 6m 4 {kubelet ip-10-0-48-189.ec2.internal} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "deis-builder" with CrashLoopBackOff: "Back-off 40s restarting failed container=deis-builder pod=deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)"
6m 6m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id bf5b29729c27; Security:[seccomp=unconfined]
6m 6m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id bf5b29729c27
9m 5m 4 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Warning Unhealthy Readiness probe failed: Get http://10.34.184.7:8092/readiness: dial tcp 10.34.184.7:8092: getsockopt: connection refused
9m 5m 4 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Warning Unhealthy Liveness probe failed: Get http://10.34.184.7:8092/healthz: dial tcp 10.34.184.7:8092: getsockopt: connection refused
5m 5m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id e457328db858
5m 5m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id e457328db858; Security:[seccomp=unconfined]
5m 5m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id bf5b29729c27: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
5m 5m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id e457328db858: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
5m 2m 13 {kubelet ip-10-0-48-189.ec2.internal} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "deis-builder" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=deis-builder pod=deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)"
2m 2m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Started Started container with docker id f2b7799712c3
10m 2m 8 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Pulled Container image "quay.io/deis/builder:v2.7.1" already present on machine
2m 2m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Created Created container with docker id f2b7799712c3; Security:[seccomp=unconfined]
10m 1m 6 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Warning Unhealthy Liveness probe failed: Get http://10.34.184.7:8092/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
1m 1m 1 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Normal Killing Killing container with docker id f2b7799712c3: pod "deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)" container "deis-builder" is unhealthy, it will be killed and re-created.
7m 9s 26 {kubelet ip-10-0-48-189.ec2.internal} spec.containers{deis-builder} Warning BackOff Back-off restarting failed docker container
1m 9s 9 {kubelet ip-10-0-48-189.ec2.internal} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "deis-builder" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=deis-builder pod=deis-builder-2995120344-mz2zg_deis(52027ebf-09f0-11e7-8bbf-0a73a2cd36e4)"
What does helm ls show for the workflow version of deis?
Is anything showing up in the container's logs when you run the command below?
kubectl --namespace deis logs deis-builder-2995120344-mz2zg
That log output will help anyone trying to help you figure out your unhealthy builder.
My solution was to delete deis and redeploy it.
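A sketch of that delete-and-redeploy approach with Helm 2 syntax. The release name "deis" and the Workflow chart repository URL are assumptions from the Deis Workflow docs of that era, so confirm your actual release name with helm ls first:

```shell
# Sketch: tear down and reinstall the Deis Workflow release (Helm 2).
# The release name and chart repo URL are assumptions -- check `helm ls`.
redeploy_deis() {
  release=$1
  helm delete "$release" --purge
  helm repo add deis https://charts.deis.com/workflow
  helm install deis/workflow --namespace deis --name "$release"
}
```

Note that `--purge` removes the release history too, so a subsequent `helm install` can reuse the same name.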