Helm init for Tiller deploy is stuck at ContainerCreating status - Kubernetes

NAME READY STATUS RESTARTS AGE
coredns-5644d7b6d9-289qz 1/1 Running 0 76m
coredns-5644d7b6d9-ssbb2 1/1 Running 0 76m
etcd-k8s-master 1/1 Running 0 75m
kube-apiserver-k8s-master 1/1 Running 0 75m
kube-controller-manager-k8s-master 1/1 Running 0 75m
kube-proxy-2q9k5 1/1 Running 0 71m
kube-proxy-dz9pk 1/1 Running 0 76m
kube-scheduler-k8s-master 1/1 Running 0 75m
tiller-deploy-7b875fbf86-8nxmk 0/1 ContainerCreating 0 17m
weave-net-nzb67 2/2 Running 0 75m
weave-net-t8kmk 2/2 Running 0 71m
Installed Kubernetes version v1.16.2, but when installing Tiller with a new service account the pod gets stuck at ContainerCreating. I have tried all the usual solutions, such as setting up RBAC, removing the Tiller role and redoing it, reinstalling Kubernetes, etc.
The output of kubectl describe is as follows.
[02:32:50] root@k8s-master$ kubectl describe pods tiller-deploy-7b875fbf86-8nxmk --namespace kube-system
Name: tiller-deploy-7b875fbf86-8nxmk
Namespace: kube-system
Priority: 0
Node: worker-node1/172.17.0.1
Start Time: Thu, 24 Oct 2019 14:12:45 -0400
Labels: app=helm
name=tiller
pod-template-hash=7b875fbf86
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/tiller-deploy-7b875fbf86
Containers:
tiller:
Container ID:
Image: gcr.io/kubernetes-helm/tiller:v2.15.1
Image ID:
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from tiller-token-rr2jg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tiller-token-rr2jg:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-token-rr2jg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/tiller-deploy-7b875fbf86-8nxmk to worker-node1
Warning FailedCreatePodSandBox 61s (x5 over 17m) kubelet, worker-node1 Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal SandboxChanged 60s (x5 over 17m) kubelet, worker-node1 Pod sandbox changed, it will be killed and re-created.
[~]

FailedCreatePodSandBox
Means that worker-node1 (172.17.0.1) does not have a CNI plugin installed or configured, and this is a frequently asked question. Whatever mechanism you used to install Kubernetes either did not do a robust job, or you missed a step along the way.
I am also fairly confident that the kubelet logs on worker-node1 are filled with error messages, if you actually look at them.
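As a starting point, here is a minimal set of checks you could run on worker-node1 (a sketch; paths assume a standard kubeadm-style install, adjust for your setup):
ls /etc/cni/net.d/                                     # should contain a CNI config such as 10-weave.conflist
ls /opt/cni/bin/                                       # should contain the CNI plugin binaries
journalctl -u kubelet --no-pager | tail -50            # look for "network plugin not ready" or sandbox errors
kubectl -n kube-system get pods -o wide | grep weave   # is weave-net actually Running on this node?
If the CNI directories are empty or the kubelet log keeps complaining about the network plugin, re-applying the weave (or another CNI) manifest and restarting the kubelet on that node is the usual next step.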

Related

How to resolve this error that nginx-ingress-controller start fail in my k8s cluster?

Rancher v2.4.2
kubernetes version: v1.17.4
In my k8s cluster, nginx-ingress-controller doesn't work and keeps restarting. I can't find anything useful in the logs; thanks for your help.
cluster nodes:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready controlplane,etcd,worker 18d v1.17.4
master2 Ready controlplane,etcd,worker 17d v1.17.4
node1 Ready worker 17d v1.17.4
node2 Ready worker 17d v1.17.4
cluster pods in ingress-nginx namespace
> kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
default-http-backend-5bb77998d7-k7gdh 1/1 Running 1 17d
nginx-ingress-controller-6l4jh 0/1 Running 10 27m
nginx-ingress-controller-bh2pg 1/1 Running 0 63m
nginx-ingress-controller-drtzx 1/1 Running 0 63m
nginx-ingress-controller-qndbw 1/1 Running 0 63m
the pod logs of nginx-ingress-controller-6l4jh
> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-0.25.1-rancher1
Build:
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: openresty/1.15.8.1
-------------------------------------------------------------------------------
>
describe info
> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name: nginx-ingress-controller-6l4jh
Namespace: ingress-nginx
Priority: 0
Node: node2/172.26.13.11
Start Time: Tue, 19 Apr 2022 07:12:16 +0000
Labels: app=ingress-nginx
controller-revision-hash=758cb9dbbc
pod-template-generation=8
Annotations: cattle.io/timestamp: 2022-04-19T07:08:51Z
field.cattle.io/ports:
[[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
field.cattle.io/publicEndpoints:
[{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Running
IP: 172.26.13.11
IPs:
IP: 172.26.13.11
Controlled By: DaemonSet/nginx-ingress-controller
Containers:
nginx-ingress-controller:
Container ID: docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
Image: 172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
Image ID: docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller@sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/default-http-backend
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Tue, 19 Apr 2022 07:40:12 +0000
Finished: Tue, 19 Apr 2022 07:41:32 +0000
Ready: False
Restart Count: 11
Liveness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-6l4jh (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-2kdbj:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-2kdbj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoExecute
:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
Normal Pulled 27m (x3 over 30m) kubelet, node2 Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
Normal Created 27m (x3 over 30m) kubelet, node2 Created container nginx-ingress-controller
Normal Started 27m (x3 over 30m) kubelet, node2 Started container nginx-ingress-controller
Normal Killing 27m (x2 over 28m) kubelet, node2 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 25m (x10 over 29m) kubelet, node2 Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning Unhealthy 10m (x21 over 29m) kubelet, node2 Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning BackOff 8s (x69 over 20m) kubelet, node2 Back-off restarting failed container
>
It sounds like the ingress controller pod fails the liveness/readiness checks, but only on one particular node. You could try the following (a few concrete checks are sketched below):
check that node for a firewall blocking the probe port (10254)
update to a newer version than nginx-0.25.1
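For example (a sketch, assuming you have shell access to node2; adjust the firewall commands to your distro):
curl -v http://172.26.13.11:10254/healthz                                  # from node2 itself: does the controller answer locally?
ss -lntp | grep 10254                                                      # is anything listening on the probe port?
iptables -S | grep 10254                                                   # any host rule affecting the port?
kubectl -n ingress-nginx logs nginx-ingress-controller-6l4jh --previous    # output of the container that was killed
If the local curl succeeds but the kubelet's probe still fails, a host firewall or a policy between the kubelet and the pod/host IP is the likely culprit.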

pod is not changing state from unready to ready

I am using the readiness probe feature in my k8s pod. The probe checks whether the file /tmp/healthy is present; if not, it marks the pod UN-READY. I have observed that if I create /tmp/healthy again, the pod does not come back to the READY state. Does that mean that in the pod lifecycle there is only a path from READY to UN-READY, but not vice versa?
/home/ravi/for_others/ric/stub>kubectl describe pod -n myns deployment-eterm
Name: deployment-eterm
Namespace: myns
Priority: 0
Status: Running
IP: 192.168.252.87
IPs:
IP: 192.168.252.87
Controlled By: ReplicaSet/deployment-myns-eterm-micro-6b896556c8
Containers:
Liveness: exec [/bin/sh -c /tmp/liveliness-status-collector.sh] delay=60s timeout=1s period=2s #success=1 #failure=5
Readiness: exec [/bin/sh -c cat /tmp/healthy] delay=60s timeout=1s period=10s #success=1 #failure=1
Environment Variables from:
configmap-myns-eterm-env-micro ConfigMap Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-57jpz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m11s default-scheduler Successfully assigned myns/deployment-eterm to mnode
Normal Pulled 9m10s kubelet, mnode Container image "porics.microlab.com:5000/eterm:latest" already present on machine
Normal Created 9m10s kubelet, mnode Created container container-myns-eterm
Normal Started 9m10s kubelet, mnode Started container container-myns-eterm
Normal Pulled 9m10s kubelet, mnode Container image "porics.microlab.com:5000/config-proxy:latest" already present on machine
Normal Created 9m10s kubelet, mnode Created container container-myns-configproxy
Normal Started 9m10s kubelet, mnode Started container container-myns-configproxy
Warning Unhealthy 3s (x3 over 23s) kubelet, mnode Readiness probe failed: cat: /tmp/healthy: No such file or directory
/home/ravi/for_others/ric/stub>
The statement below is not true:
Does it mean, in pod lifecycle, there is only one path of READY --> UN-READY
but not vice-versa?
You can run the below experiment to see the pod transition:
Create a pod with below manifest file:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-test
spec:
  nodeName: kube-master
  containers:
  - name: liveness
    image: bash
    command: ['bash', '-c', 'while true; do echo "$(date): creating /tmp/healthy"; touch /tmp/healthy; sleep 10; echo "$(date): deleting /tmp/healthy"; rm /tmp/healthy; sleep 10; done']
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 2
The above manifest will cause the pod to transition into ready/not-ready every 10 seconds.
kubectl get pod -w
NAME READY STATUS RESTARTS AGE
readiness-test 0/1 Running 0 79s
readiness-test 1/1 Running 0 83s
readiness-test 0/1 Running 0 97s
readiness-test 1/1 Running 0 103s
readiness-test 0/1 Running 0 117s
readiness-test 1/1 Running 0 2m3s
readiness-test 0/1 Running 0 2m17s
readiness-test 1/1 Running 0 2m23s
readiness-test 0/1 Running 0 2m37s
readiness-test 1/1 Running 0 2m43s
readiness-test 0/1 Running 0 2m57s
readiness-test 1/1 Running 0 3m3s
readiness-test 0/1 Running 0 3m17s
readiness-test 1/1 Running 0 3m23s
readiness-test 0/1 Running 0 3m37s
readiness-test 1/1 Running 0 3m43s
readiness-test 0/1 Running 0 3m57s
readiness-test 1/1 Running 0 4m3s
You can see the readiness status changing:
while true; do kubectl get pod readiness-test -o jsonpath='{.status.containerStatuses[*].ready}{"\n"}'; sleep 2; done
true
true
true
true
true
true
false
false
true
true
true
true
true
true
false
false
true
true
true
true
true
true
false
false
true
true
true
true
true
false
false
false
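You can also see the same transition reflected in Service endpoints, which is the practical effect of readiness. A small sketch (the Service below is hypothetical, created only for this experiment):
kubectl expose pod readiness-test --name=readiness-test --port=80
while true; do kubectl get endpoints readiness-test -o jsonpath='{.subsets[*].addresses[*].ip}{"\n"}'; sleep 2; done
The pod IP appears in the endpoints list while the pod is Ready and disappears while it is not, then comes back again, confirming that READY --> UN-READY is not a one-way transition.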

Kubernetes Dashboard CrashLoopBackOff: timeout error on Raspberry Pi cluster

Should be a simple task, I simply want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Setup the initial cluster (hostname, static ip, cgroup, swapspace, install and configure docker, install kubernetes, setup kubernetes network and join nodes)
I have flannel installed
I have applied the dashboard
Bunch of random testing trying to figure this out
Obviously, as seen below, the container in the dashboard pod is not working because it cannot access kubernetes-dashboard-csrf. I have no idea why this cannot be accessed; my only thought is that I missed a step when setting up the cluster. I've followed about 6 different guides without success, prioritizing the official guide. I have also seen quite a few people with the same or similar issues, but most have not posted a resolution. Thanks!
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name: kubernetes-dashboard-897c7599f-qss5p
Namespace: kubernetes-dashboard
Priority: 0
Node: shawn4/192.168.10.71
Start Time: Fri, 17 Dec 2021 18:52:15 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=897c7599f
Annotations: <none>
Status: Running
IP: 172.19.1.75
IPs:
IP: 172.19.1.75
Controlled By: ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
kubernetes-dashboard:
Container ID: docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
Image: kubernetesui/dashboard:v2.4.0
Image ID: docker-pullable://kubernetesui/dashboard@sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 17 Dec 2021 20:10:19 +0000
Finished: Fri, 17 Dec 2021 20:10:49 +0000
Ready: False
Restart Count: 19
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-wq9m8:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-wq9m8
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 21s (x327 over 79m) kubelet Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
UPDATE 12/29/21:
Fixed this issue by reinstalling the cluster to the newest versions of Kubernetes and Ubuntu.
Turned out there were several issues:
I was using Ubuntu Buster which is deprecated.
My client/server Kubernetes versions were +/-0.3 out of sync
I was following outdated instructions
I reinstalled the whole cluster following Kubernetes official guide and, with a few snags along the way, it works!
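For anyone hitting the same panic before resorting to a full reinstall: the timeout in the log is the dashboard pod failing to reach the apiserver's ClusterIP (172.20.0.1:443, taken from the panic message above) over the pod network, so a quick connectivity check from a throwaway pod can confirm whether flannel/kube-proxy are working on the node. A sketch (curlimages/curl is just a convenient image choice):
kubectl run nettest --rm -it --restart=Never --image=curlimages/curl -- curl -k -m 5 https://172.20.0.1:443/version
If that times out on some nodes but not others, the problem is the overlay network or kube-proxy on those nodes rather than the dashboard itself.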

JupyterHub not launching on Helm | K8s

I have a MetalLB load balancer, a k8s cluster (one master and one worker) v1.18.5, helm 3.7, and NFS dynamic volume provisioning set up using helm. I spin up a JupyterHub instance with helm. Within a minute everything is set up, but when I use the external IP to open JupyterHub in my browser, nothing loads up. Here is my kubectl get all:
pod/continuous-image-puller-4l5gj 1/1 Running 0 23s
pod/hub-6c9cb48df8-k5t4w 1/1 Running 0 23s
pod/nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
pod/nginx2-669c86457c-hc5mv 1/1 Running 0 35h
pod/proxy-66cb767659-svwbv 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 23s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hub ClusterIP 10.111.196.55 <none> 8081/TCP 23s
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 39h
service/nginx2 LoadBalancer 10.106.241.85 10.0.3.240 80:30746/TCP 32h
service/proxy-api ClusterIP 10.109.211.71 <none> 8001/TCP 23s
service/proxy-public LoadBalancer 10.111.233.85 10.0.3.241 80:31336/TCP 23s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/continuous-image-puller 1 1 1 1 1 <none> 23s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hub 1/1 1 1 23s
deployment.apps/nfs-subdir-external-provisioner 1/1 1 1 23h
deployment.apps/nginx2 1/1 1 1 35h
deployment.apps/proxy 1/1 1 1 23s
deployment.apps/user-scheduler 2/2 2 2 23s
NAME DESIRED CURRENT READY AGE
replicaset.apps/hub-6c9cb48df8 1 1 1 23s
replicaset.apps/nfs-subdir-external-provisioner-789697969b 1 1 1 23h
replicaset.apps/nginx2-669c86457c 1 1 1 35h
replicaset.apps/proxy-66cb767659 1 1 1 23s
replicaset.apps/user-scheduler-6d4698dd59 2 2 2 23s
NAME READY AGE
statefulset.apps/user-placeholder 0/0 23s
Also, below is my storage class for reference: kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client cluster.local/nfs-subdir-external-provisioner Delete Immediate true 23h
I will not paste the config file as it is very large; basically what I did was:
helm show values jupyterhub/jupyterhub > /tmp/jupyterhub.yaml
(after changing some values)
helm install jupyterhub jupyterhub/jupyterhub --values /tmp/jupyterhub.yaml
The only things I changed were the security key (hex, as mentioned on the website), writing nfs-client wherever it said storageClass and storageClassName, and perhaps altering the storage size (1Gi/2Gi). That's all. The LoadBalancer works fine, because I ran nginx and can easily open it in my browser. So I decided to check the JupyterHub pods by first getting the pod names using kubectl get pods:
NAME READY STATUS RESTARTS AGE
continuous-image-puller-4l5gj 1/1 Running 0 20m
hub-6c9cb48df8-k5t4w 1/1 Running 0 20m
nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
nginx2-669c86457c-hc5mv 1/1 Running 0 35h
proxy-66cb767659-svwbv 1/1 Running 0 20m
user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 20m
user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 20m
root@master:/home/ubuntu#
and then using kubectl describe pod hub-6c9cb48df8-k5t4w -n default which gave me this:
Name: hub-6c9cb48df8-k5t4w
Namespace: default
Priority: 0
Node: worker/10.0.0.126
Start Time: Sat, 27 Nov 2021 10:21:43 +0000
Labels: app=jupyterhub
component=hub
hub.jupyter.org/network-access-proxy-api=true
hub.jupyter.org/network-access-proxy-http=true
hub.jupyter.org/network-access-singleuser=true
pod-template-hash=6c9cb48df8
release=jupyterhub
Annotations: checksum/config-map: f746d7e563a064e9158fe6f7f59bdbd463ed24ad7a927d75a1f18c022c3afeaf
checksum/secret: 926186a1b18e5cb9aa5b8c0a177f379299bcf0f05ac4de17d1958422054d15e5
cni.projectcalico.org/podIP: 192.168.171.97/32
cni.projectcalico.org/podIPs: 192.168.171.97/32
Status: Running
IP: 192.168.171.97
IPs:
IP: 192.168.171.97
Controlled By: ReplicaSet/hub-6c9cb48df8
Containers:
hub:
Container ID: docker://1d5e3a812f9712f6d59c09d855b034e2f6bc3e058bad4932db87145ec09f70d1
Image: jupyterhub/k8s-hub:1.2.0
Image ID: docker-pullable://jupyterhub/k8s-hub@sha256:e4770285aaf7230b930643986221757c2cc2e9420f5e21ac892582c96a57ce1c
Port: 8081/TCP
Host Port: 0/TCP
Args:
jupyterhub
--config
/usr/local/etc/jupyterhub/jupyterhub_config.py
--upgrade-db
State: Running
Started: Sat, 27 Nov 2021 10:21:45 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:http/hub/health delay=300s timeout=3s period=10s #success=1 #failure=30
Readiness: http-get http://:http/hub/health delay=0s timeout=1s period=2s #success=1 #failure=1000
Environment:
PYTHONUNBUFFERED: 1
HELM_RELEASE_NAME: jupyterhub
POD_NAMESPACE: default (v1:metadata.namespace)
CONFIGPROXY_AUTH_TOKEN: <set to the key 'hub.config.ConfigurableHTTPProxy.auth_token' in secret 'hub'> Optional: false
Mounts:
/srv/jupyterhub from pvc (rw)
/usr/local/etc/jupyterhub/config/ from config (rw)
/usr/local/etc/jupyterhub/jupyterhub_config.py from config (rw,path="jupyterhub_config.py")
/usr/local/etc/jupyterhub/secret/ from secret (rw)
/usr/local/etc/jupyterhub/z2jh.py from config (rw,path="z2jh.py")
/var/run/secrets/kubernetes.io/serviceaccount from hub-token-zd25x (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hub
Optional: false
secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub
Optional: false
pvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: hub-db-dir
ReadOnly: false
hub-token-zd25x:
Type: Secret (a volume populated by a Secret)
SecretName: hub-token-zd25x
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: hub.jupyter.org/dedicated=core:NoSchedule
hub.jupyter.org_dedicated=core:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21m default-scheduler Successfully assigned default/hub-6c9cb48df8-k5t4w to worker
Normal Pulled 21m kubelet, worker Container image "jupyterhub/k8s-hub:1.2.0" already present on machine
Normal Created 21m kubelet, worker Created container hub
Normal Started 21m kubelet, worker Started container hub
Warning Unhealthy 21m (x3 over 21m) kubelet, worker Readiness probe failed: Get http://192.168.171.97:8081/hub/health: dial tcp 192.168.171.97:8081: connect: connection refused
So I know that the pod is unhealthy. But I do not have any other details to debug this. Any help on how to fix or debug this would be highly appreciated.
Thank you!
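Some debugging starting points, based on the resource names in the output above (a sketch, not a definitive answer):
kubectl logs deploy/hub                              # hub output around the failed readiness window
kubectl logs deploy/proxy                            # configurable-http-proxy: is it routing to the hub?
kubectl get endpoints proxy-public hub proxy-api     # are the Services actually backed by pod IPs?
kubectl describe svc proxy-public                    # did MetalLB hand out 10.0.3.241, and on which ports?
curl -v http://10.0.3.241/hub/api                    # from a cluster node: does the proxy answer at all?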

helm chart deployment liveness and readiness failed error

I have an OpenShift cluster. I have created a custom application and am trying to deploy it using Helm charts. When I deploy it on OpenShift using 'oc new-app', the deployment works perfectly fine. But when I deploy it using the helm chart, it does not work.
Following is the output of 'oc get all' ----
[root@worker2 ~]#
[root@worker2 ~]# oc get all
NAME READY STATUS RESTARTS AGE
pod/chart-acme-85648d4645-7msdl 1/1 Running 0 3d7h
pod/chart1-acme-f8b65b78d-k2fb6 1/1 Running 0 3d7h
pod/netshoot 1/1 Running 0 3d10h
pod/sample1-buildachart-5b5d9d8649-qqmsf 0/1 CrashLoopBackOff 672 2d9h
pod/sample2-686bb7f969-fx5bk 0/1 CrashLoopBackOff 674 2d9h
pod/vjobs-npm-96b65fcb-b2p27 0/1 CrashLoopBackOff 817 47h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/chart-acme LoadBalancer 172.30.174.208 <pending> 80:30222/TCP 3d7h
service/chart1-acme LoadBalancer 172.30.153.36 <pending> 80:30383/TCP 3d7h
service/sample1-buildachart NodePort 172.30.29.124 <none> 80:32375/TCP 2d9h
service/sample2 NodePort 172.30.19.24 <none> 80:32647/TCP 2d9h
service/vjobs-npm NodePort 172.30.205.30 <none> 80:30476/TCP 47h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/chart-acme 1/1 1 1 3d7h
deployment.apps/chart1-acme 1/1 1 1 3d7h
deployment.apps/sample1-buildachart 0/1 1 0 2d9h
deployment.apps/sample2 0/1 1 0 2d9h
deployment.apps/vjobs-npm 0/1 1 0 47h
NAME DESIRED CURRENT READY AGE
replicaset.apps/chart-acme-85648d4645 1 1 1 3d7h
replicaset.apps/chart1-acme-f8b65b78d 1 1 1 3d7h
replicaset.apps/sample1-buildachart-5b5d9d8649 1 1 0 2d9h
replicaset.apps/sample2-686bb7f969 1 1 0 2d9h
replicaset.apps/vjobs-npm-96b65fcb 1 1 0 47h
[root@worker2 ~]#
[root@worker2 ~]#
As per the output above, you can see that the deployment 'vjobs-npm' gives a 'CrashLoopBackOff' error.
Below is the output of 'oc describe pod' ----
[root@worker2 ~]#
[root@worker2 ~]# oc describe pod vjobs-npm-96b65fcb-b2p27
Name: vjobs-npm-96b65fcb-b2p27
Namespace: vjobs-testing
Priority: 0
Node: worker0/192.168.100.109
Start Time: Mon, 31 Aug 2020 09:30:28 -0400
Labels: app.kubernetes.io/instance=vjobs-npm
app.kubernetes.io/name=vjobs-npm
pod-template-hash=96b65fcb
Annotations: openshift.io/scc: restricted
Status: Running
IP: 10.131.0.107
IPs:
IP: 10.131.0.107
Controlled By: ReplicaSet/vjobs-npm-96b65fcb
Containers:
vjobs-npm:
Container ID: cri-o://c232849eb25bd96ae9343ac3ed1539d492985dd8cdf47a5a4df7d3cf776c4cf3
Image: quay.io/aditya7002/vjobs_local_build_new:latest
Image ID: quay.io/aditya7002/vjobs_local_build_new@sha256:87f18e3a24fc7043a43a143e96b0b069db418ace027d95a5427cf53de56feb4c
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 31 Aug 2020 09:31:23 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 31 Aug 2020 09:30:31 -0400
Finished: Mon, 31 Aug 2020 09:31:22 -0400
Ready: False
Restart Count: 1
Liveness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from vjobs-npm-token-vw6f7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
vjobs-npm-token-vw6f7:
Type: Secret (a volume populated by a Secret)
SecretName: vjobs-npm-token-vw6f7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 68s default-scheduler Successfully assigned vjobs-testing/vjobs-npm-96b65fcb-b2p27 to worker0
Normal Killing 44s kubelet, worker0 Container vjobs-npm failed liveness probe, will be restarted
Normal Pulling 14s (x2 over 66s) kubelet, worker0 Pulling image "quay.io/aditya7002/vjobs_local_build_new:latest"
Normal Pulled 13s (x2 over 65s) kubelet, worker0 Successfully pulled image "quay.io/aditya7002/vjobs_local_build_new:latest"
Normal Created 13s (x2 over 65s) kubelet, worker0 Created container vjobs-npm
Normal Started 13s (x2 over 65s) kubelet, worker0 Started container vjobs-npm
Warning Unhealthy 4s (x4 over 64s) kubelet, worker0 Liveness probe failed: Get http://10.131.0.107:80/: dial tcp 10.131.0.107:80: connect: connection refused
Warning Unhealthy 1s (x7 over 61s) kubelet, worker0 Readiness probe failed: Get http://10.131.0.107:80/: dial tcp 10.131.0.107:80: connect: connection refused
[root@worker2 ~]#
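The events show the liveness probe (HTTP GET on port 80, delay=0s) failing immediately and the container being killed with exit code 137 before the app ever answers, so two things are worth confirming: that the app really listens on the port the probes target, and that the probes give it enough time to start. A rough sketch (commands assume the image ships a shell and the ss utility):
oc logs vjobs-npm-96b65fcb-b2p27 --previous    # output of the container that was killed
oc rsh vjobs-npm-96b65fcb-b2p27 ss -lntp       # which port is the npm app actually listening on?
If the app listens on, say, 3000 instead of 80, fix the containerPort and the service targetPort in the chart; if it just starts slowly, raise initialDelaySeconds on the probes (in values.yaml if your chart exposes them there, otherwise directly in templates/deployment.yaml).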