k8s pod stuck in status "pending"

All newly created pods are stuck in status "Pending". It does not seem to be a resource issue: total cluster utilization is about 10% CPU and 30% memory.
How do I get more insight into the issue?
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
cq-iam-boarding-77fd94dc94-8pc6f 1/1 Running 0 30h
cq-iam-demo-cloud-6b99f6544d-9v7j7 1/1 Running 0 30h
cq-iam-mpm-dev-8c6cc58fd-fczlw 1/1 Running 0 30h
cq-iam-proxy-86854cc78d-49gfw 0/1 Terminating 0 7h42m
cq-iam-proxy-86854cc78d-dqlz8 0/1 Terminating 0 7h36m
cq-iam-proxy-86854cc78d-m7zs2 0/1 Pending 0 5h22m
cq-launchpad-app-7b57c478b9-gqcxj 1/1 Running 0 13h
cq-management-api-7c689c7846-q9fz2 1/1 Running 0 29h
cq-opa-api-8458db697c-75rzd 1/1 Running 0 30h
cq-settings-app-6874885794-mspj9 1/1 Running 0 29h
node-debugger-aks-nodepool1-31127038-vmss000000-czt8s 0/1 Pending 0 8h
$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
cq-iam-boarding-77fd94dc94-8pc6f 2m 482Mi
cq-iam-demo-cloud-6b99f6544d-9v7j7 2m 507Mi
cq-iam-mpm-dev-8c6cc58fd-fczlw 2m 443Mi
cq-launchpad-app-7b57c478b9-gqcxj 0m 2Mi
cq-management-api-7c689c7846-q9fz2 1m 88Mi
cq-opa-api-8458db697c-75rzd 1m 17Mi
cq-settings-app-6874885794-mspj9 1m 2Mi
$ kubectl describe pod cq-iam-proxy-86854cc78d-m7zs2
Name: cq-iam-proxy-86854cc78d-m7zs2
Namespace: dev
Priority: 0
Node: aks-nodepool1-31127038-vmss000000/
Labels: app=cq-iam-proxy
pod-template-hash=86854cc78d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/cq-iam-proxy-86854cc78d
Containers:
cq-iam-proxy:
Image: xxx.azurecr.io/karneval/cq-iam-proxy:1.0.14
Port: 80/TCP
Host Port: 0/TCP
Environment:
CQ_HOSTNAME: dev.hvt.zone
key1: TODO
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-pl6p4 (ro)
Conditions:
Type Status
PodScheduled True
Volumes:
default-token-pl6p4:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-pl6p4
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
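Since Events shows <none> for this pod, a hedged way to pull any events the API server still has for it, and to look at the node it is bound to, is to query them directly (the dev namespace and the node name are taken from the describe output above):
$ kubectl get events -n dev --field-selector involvedObject.name=cq-iam-proxy-86854cc78d-m7zs2
$ kubectl get events -n dev --sort-by=.metadata.creationTimestamp
$ kubectl describe node aks-nodepool1-31127038-vmss000000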
Check the status of nodepool1:
The nodepool is all good and running.
There are three nodes, and all of them show green (memory, disk, readiness).
Can you show the logs of the pod?
This is what I get when I print the pod logs:
$ kubectl logs cq-iam-proxy-86854cc78d-m7zs2
Error from server (NotFound): the server could not find the requested resource ( pods/log cq-iam-proxy-86854cc78d-m7zs2)
Please include the events of pods in Terminating status. There may be a clue there:
$ kubectl describe pod cq-iam-proxy-86854cc78d-49gfw
Name: cq-iam-proxy-86854cc78d-49gfw
Namespace: dev
Priority: 0
Node: aks-nodepool1-31127038-vmss000000/
Labels: app=cq-iam-proxy
pod-template-hash=86854cc78d
Annotations: <none>
Status: Terminating (lasts 2d18h)
Termination Grace Period: 30s
IP:
IPs: <none>
Controlled By: ReplicaSet/cq-iam-proxy-86854cc78d
Containers:
cq-iam-proxy:
Image: xxx.azurecr.io/karneval/cq-iam-proxy:1.0.14
Port: 80/TCP
Host Port: 0/TCP
Environment:
CQ_HOSTNAME: dev.hvt.zone
key1: TODO
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-pl6p4 (ro)
Conditions:
Type Status
PodScheduled True
Volumes:
default-token-pl6p4:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-pl6p4
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
There are no events there? Is there anything in the logs of those two pods?
$ kubectl logs cq-iam-proxy-86854cc78d-dqlz8
Error from server (NotFound): the server could not find the requested resource ( pods/log cq-iam-proxy-86854cc78d-dqlz8)
This seems like a problem with the application itself.
It does not seem to be a problem with the application itself. I ran these two commands:
$ kubectl run --image=busybox myapp -- false
$ kubectl run --image=busybox myapp2 -- false
myapp was scheduled and its container started (the CrashLoopBackOff is expected, since false exits immediately)
myapp2 is stuck in Pending, just like the other applications
myapp 0/1 CrashLoopBackOff 5 11m
myapp2 0/1 Pending 0 9m26s
$ kubectl describe pod myapp
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned dev/myapp to aks-nodepool1-31127038-vmss000001
Normal Created 11m (x4 over 11m) kubelet Created container myapp
Normal Started 11m (x4 over 11m) kubelet Started container myapp
Normal Pulling 10m (x5 over 11m) kubelet Pulling image "busybox"
Normal Pulled 10m (x5 over 11m) kubelet Successfully pulled image "busybox"
Warning BackOff 95s (x47 over 11m) kubelet Back-off restarting failed container
$ kubectl describe pod myapp2
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned dev/myapp2 to aks-nodepool1-31127038-vmss000000
The only difference between myapp and myapp2 is that they have been scheduled on different nodes:
myapp was successfully started on node aks-nodepool1-31127038-vmss000001
myapp2 does not start on node aks-nodepool1-31127038-vmss000000
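A hedged way to make this comparison deterministic next time, instead of relying on the scheduler's choice of node, is to pin a throwaway test pod to the suspect node; nodetest is a hypothetical name, and the --overrides JSON only sets spec.nodeName:
$ kubectl run nodetest --image=busybox --restart=Never \
    --overrides='{"apiVersion":"v1","spec":{"nodeName":"aks-nodepool1-31127038-vmss000000"}}' \
    -- sleep 3600
$ kubectl get pod nodetest -o wide   # if the node's kubelet or runtime is stuck, this should hang before Running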

After two weeks the cluster healed itself.
The node aks-nodepool1-31127038-vmss000000 was problematic and would get stuck starting containers.
Next time I encounter this problem I will try these commands to recover the node:
kubectl cordon my-node # Mark my-node as unschedulable
kubectl drain my-node # Drain my-node in preparation for maintenance
kubectl uncordon my-node # Mark my-node as schedulable
kubectl top node my-node # Show metrics for a given node
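For the record, a hedged sketch of how that sequence might look against the problem node; the extra drain flags are usually needed on AKS because of DaemonSet-managed pods and emptyDir volumes (flag names as of kubectl 1.20+):
kubectl cordon aks-nodepool1-31127038-vmss000000
kubectl drain aks-nodepool1-31127038-vmss000000 --ignore-daemonsets --delete-emptydir-data
# reboot or reimage the node (e.g. via the AKS portal or az CLI), then let it take work again
kubectl uncordon aks-nodepool1-31127038-vmss000000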

Related

calico-kube-controllers is in pending state

I'm trying to install a Kubernetes cluster using this tutorial:
https://www.linuxtechi.com/install-kubernetes-on-ubuntu-22-04/
But after I set up the control plane and run kubectl get pods -n kube-system I get:
kubernetes@kubernetes1:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-555bc4b957-kv6zz 0/1 Pending 0 5m38s
calico-node-kzfqn 1/1 Running 0 5m38s
coredns-6d4b75cb6d-lwdgx 1/1 Running 0 6m44s
coredns-6d4b75cb6d-mrkqj 1/1 Running 0 6m45s
etcd-kubernetes1 1/1 Running 0 6m50s
kube-apiserver-kubernetes1 1/1 Running 0 6m50s
kube-controller-manager-kubernetes1 1/1 Running 0 6m52s
kube-proxy-hqgxj 1/1 Running 0 6m45s
kube-scheduler-kubernetes1 1/1 Running 0 6m50s
Events:
kubernetes@kubernetes1:~$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
7m17s Normal NodeHasSufficientMemory node/kubernetes1 Node kubernetes1 status is now: NodeHasSufficientMemory
7m17s Normal NodeHasNoDiskPressure node/kubernetes1 Node kubernetes1 status is now: NodeHasNoDiskPressure
7m17s Normal NodeHasSufficientPID node/kubernetes1 Node kubernetes1 status is now: NodeHasSufficientPID
7m7s Normal Starting node/kubernetes1 Starting kubelet.
7m7s Warning InvalidDiskCapacity node/kubernetes1 invalid capacity 0 on image filesystem
7m7s Normal NodeAllocatableEnforced node/kubernetes1 Updated Node Allocatable limit across pods
7m7s Normal NodeHasSufficientMemory node/kubernetes1 Node kubernetes1 status is now: NodeHasSufficientMemory
7m7s Normal NodeHasNoDiskPressure node/kubernetes1 Node kubernetes1 status is now: NodeHasNoDiskPressure
7m7s Normal NodeHasSufficientPID node/kubernetes1 Node kubernetes1 status is now: NodeHasSufficientPID
7m4s Normal RegisteredNode node/kubernetes1 Node kubernetes1 event: Registered Node kubernetes1 in Controller
6m58s Normal Starting node/kubernetes1
5m15s Normal NodeReady node/kubernetes1 Node kubernetes1 status is now: NodeReady
kubernetes@kubernetes1:~$
Do you know how I can get calico-kube-controllers-555bc4b957-kv6zz into the Running state?
kubernetes@kubernetes1:~$ kubectl describe pod --namespace kube-system calico-kube-controllers-555bc4b957-kv6zz
Name: calico-kube-controllers-555bc4b957-kv6zz
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=calico-kube-controllers
pod-template-hash=555bc4b957
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/calico-kube-controllers-555bc4b957
Containers:
calico-kube-controllers:
Image: docker.io/calico/kube-controllers:v3.23.3
Port: <none>
Host Port: <none>
Liveness: exec [/usr/bin/check-status -l] delay=10s timeout=10s period=10s #success=1 #failure=6
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j2hn7 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-j2hn7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m10s (x3 over 14m) default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
kubernetes@kubernetes1:~$
From the pod's events you can see clearly that the scheduler was unable to place the pod on the control plane because of an untolerated taint.
Think of a taint as bug spray and a toleration as a bug that is immune to that particular spray; some bugs tolerate sprays designed to keep other species away. In your case, the control plane is tainted with node-role.kubernetes.io/control-plane, but your pod only has a toleration for node-role.kubernetes.io/master. To schedule the pod onto the control plane, make sure the pod tolerates the taints that are actually on the target node (the control plane).
You can fix it by adding a toleration to your pod spec:
kind: Pod
...
spec:
  tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
...
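Alternatively, since this is currently a single-node cluster, you could remove the control-plane taint from the node itself so that ordinary workloads can be scheduled there (a quick fix for a test cluster rather than something for production); the trailing dash removes the taint. A sketch, using the node name from your output:
kubectl taint nodes kubernetes1 node-role.kubernetes.io/control-plane:NoSchedule-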

Kubernetes Dashboard CrashLoopBackOff: timeout error on Raspberry Pi cluster

This should be a simple task: I just want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Set up the initial cluster (hostname, static IP, cgroups, swap space, install and configure Docker, install Kubernetes, set up the Kubernetes network, and join the nodes)
I have flannel installed
I have applied the dashboard
Bunch of random testing trying to figure this out
As seen below, the container in the dashboard pod is failing because it cannot access the kubernetes-dashboard-csrf secret. I have no idea why it cannot be reached; my only thought is that I missed a step when setting up the cluster. I've followed about six different guides without success, prioritizing the official one. Quite a few people seem to hit the same or a similar issue, but most have not posted a resolution. Thanks!
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name: kubernetes-dashboard-897c7599f-qss5p
Namespace: kubernetes-dashboard
Priority: 0
Node: shawn4/192.168.10.71
Start Time: Fri, 17 Dec 2021 18:52:15 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=897c7599f
Annotations: <none>
Status: Running
IP: 172.19.1.75
IPs:
IP: 172.19.1.75
Controlled By: ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
kubernetes-dashboard:
Container ID: docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
Image: kubernetesui/dashboard:v2.4.0
Image ID: docker-pullable://kubernetesui/dashboard@sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 17 Dec 2021 20:10:19 +0000
Finished: Fri, 17 Dec 2021 20:10:49 +0000
Ready: False
Restart Count: 19
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-wq9m8:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-wq9m8
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 21s (x327 over 79m) kubelet Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
UPDATE 12/29/21:
Fixed this issue by reinstalling the cluster to the newest versions of Kubernetes and Ubuntu.
Turned out there were several issues:
I was using Ubuntu Buster, which is deprecated.
My client/server Kubernetes versions were +/-0.3 out of sync
I was following outdated instructions
I reinstalled the whole cluster following the official Kubernetes guide and, with a few snags along the way, it works!
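For reference, the version skew is easy to spot by comparing the client and server versions and the kubelet version reported on each node (standard commands, nothing specific to this fix):
kubectl version
kubectl get nodes -o wide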

JupyterHub not launching on Helm | K8s

I have a MetalLB load balancer, a k8s cluster (one master and one worker) at v1.18.5, Helm 3.7, and NFS dynamic volume provisioning installed via Helm. I spin up a JupyterHub instance with Helm. Within a minute everything is set up, but when I use the external IP to open JupyterHub in my browser, nothing loads. Here is my kubectl get all:
pod/continuous-image-puller-4l5gj 1/1 Running 0 23s
pod/hub-6c9cb48df8-k5t4w 1/1 Running 0 23s
pod/nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
pod/nginx2-669c86457c-hc5mv 1/1 Running 0 35h
pod/proxy-66cb767659-svwbv 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 23s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hub ClusterIP 10.111.196.55 <none> 8081/TCP 23s
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 39h
service/nginx2 LoadBalancer 10.106.241.85 10.0.3.240 80:30746/TCP 32h
service/proxy-api ClusterIP 10.109.211.71 <none> 8001/TCP 23s
service/proxy-public LoadBalancer 10.111.233.85 10.0.3.241 80:31336/TCP 23s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/continuous-image-puller 1 1 1 1 1 <none> 23s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hub 1/1 1 1 23s
deployment.apps/nfs-subdir-external-provisioner 1/1 1 1 23h
deployment.apps/nginx2 1/1 1 1 35h
deployment.apps/proxy 1/1 1 1 23s
deployment.apps/user-scheduler 2/2 2 2 23s
NAME DESIRED CURRENT READY AGE
replicaset.apps/hub-6c9cb48df8 1 1 1 23s
replicaset.apps/nfs-subdir-external-provisioner-789697969b 1 1 1 23h
replicaset.apps/nginx2-669c86457c 1 1 1 35h
replicaset.apps/proxy-66cb767659 1 1 1 23s
replicaset.apps/user-scheduler-6d4698dd59 2 2 2 23s
NAME READY AGE
statefulset.apps/user-placeholder 0/0 23s
Also, below is my storage class for reference: kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client cluster.local/nfs-subdir-external-provisioner Delete Immediate true 23h
I will not paste the config file as it is very large; basically, what I did was:
helm show values jupyterhub/jupyterhub > /tmp/jupyterhub.yaml
(after changing some values)
helm install jupyterhub jupyterhub/jupyterhub --values /tmp/jupyterhub.yaml
The only things I changed were the security key (a hex string, as mentioned on the website), writing nfs-client wherever it said storageClass or storageClassName, and perhaps the storage size (1Gi/2Gi). That's all. The LoadBalancer itself works fine: I ran nginx through it and can open it in my browser without problems. So I decided to check the JupyterHub pods, first getting the pod names with kubectl get pods:
NAME READY STATUS RESTARTS AGE
continuous-image-puller-4l5gj 1/1 Running 0 20m
hub-6c9cb48df8-k5t4w 1/1 Running 0 20m
nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
nginx2-669c86457c-hc5mv 1/1 Running 0 35h
proxy-66cb767659-svwbv 1/1 Running 0 20m
user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 20m
user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 20m
root@master:/home/ubuntu#
and then ran kubectl describe pod hub-6c9cb48df8-k5t4w -n default, which gave me this:
Name: hub-6c9cb48df8-k5t4w
Namespace: default
Priority: 0
Node: worker/10.0.0.126
Start Time: Sat, 27 Nov 2021 10:21:43 +0000
Labels: app=jupyterhub
component=hub
hub.jupyter.org/network-access-proxy-api=true
hub.jupyter.org/network-access-proxy-http=true
hub.jupyter.org/network-access-singleuser=true
pod-template-hash=6c9cb48df8
release=jupyterhub
Annotations: checksum/config-map: f746d7e563a064e9158fe6f7f59bdbd463ed24ad7a927d75a1f18c022c3afeaf
checksum/secret: 926186a1b18e5cb9aa5b8c0a177f379299bcf0f05ac4de17d1958422054d15e5
cni.projectcalico.org/podIP: 192.168.171.97/32
cni.projectcalico.org/podIPs: 192.168.171.97/32
Status: Running
IP: 192.168.171.97
IPs:
IP: 192.168.171.97
Controlled By: ReplicaSet/hub-6c9cb48df8
Containers:
hub:
Container ID: docker://1d5e3a812f9712f6d59c09d855b034e2f6bc3e058bad4932db87145ec09f70d1
Image: jupyterhub/k8s-hub:1.2.0
Image ID: docker-pullable://jupyterhub/k8s-hub@sha256:e4770285aaf7230b930643986221757c2cc2e9420f5e21ac892582c96a57ce1c
Port: 8081/TCP
Host Port: 0/TCP
Args:
jupyterhub
--config
/usr/local/etc/jupyterhub/jupyterhub_config.py
--upgrade-db
State: Running
Started: Sat, 27 Nov 2021 10:21:45 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:http/hub/health delay=300s timeout=3s period=10s #success=1 #failure=30
Readiness: http-get http://:http/hub/health delay=0s timeout=1s period=2s #success=1 #failure=1000
Environment:
PYTHONUNBUFFERED: 1
HELM_RELEASE_NAME: jupyterhub
POD_NAMESPACE: default (v1:metadata.namespace)
CONFIGPROXY_AUTH_TOKEN: <set to the key 'hub.config.ConfigurableHTTPProxy.auth_token' in secret 'hub'> Optional: false
Mounts:
/srv/jupyterhub from pvc (rw)
/usr/local/etc/jupyterhub/config/ from config (rw)
/usr/local/etc/jupyterhub/jupyterhub_config.py from config (rw,path="jupyterhub_config.py")
/usr/local/etc/jupyterhub/secret/ from secret (rw)
/usr/local/etc/jupyterhub/z2jh.py from config (rw,path="z2jh.py")
/var/run/secrets/kubernetes.io/serviceaccount from hub-token-zd25x (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hub
Optional: false
secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub
Optional: false
pvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: hub-db-dir
ReadOnly: false
hub-token-zd25x:
Type: Secret (a volume populated by a Secret)
SecretName: hub-token-zd25x
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: hub.jupyter.org/dedicated=core:NoSchedule
hub.jupyter.org_dedicated=core:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21m default-scheduler Successfully assigned default/hub-6c9cb48df8-k5t4w to worker
Normal Pulled 21m kubelet, worker Container image "jupyterhub/k8s-hub:1.2.0" already present on machine
Normal Created 21m kubelet, worker Created container hub
Normal Started 21m kubelet, worker Started container hub
Warning Unhealthy 21m (x3 over 21m) kubelet, worker Readiness probe failed: Get http://192.168.171.97:8081/hub/health: dial tcp 192.168.171.97:8081: connect: connection refused
So I can see that the readiness probe failed a few times right after startup, but I do not have any other details to debug this. Any help on how to fix or debug this would be highly appreciated.
Thank you!
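For completeness, the checks I plan to try next are standard kubectl commands, not a confirmed fix: the hub pod's Ready condition is actually True in the describe output above, so the path from the proxy-public LoadBalancer through the proxy to the hub seems the more likely place to look.
kubectl logs deploy/proxy
kubectl logs deploy/hub
kubectl get endpoints proxy-public proxy-api hub
kubectl describe svc proxy-public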

A question about pods running on the Kubernetes (k8s) platform: the pods are Running but the containers are not ready

I built a k8s cluster on my virtual machines (CentOS 7) with VirtualBox:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master Ready control-plane,master 8d v1.21.2 192.168.0.186 <none> CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
k8s-worker01 Ready <none> 8d v1.21.2 192.168.0.187 <none> CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
k8s-worker02 Ready <none> 8d v1.21.2 192.168.0.188 <none> CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
A few days ago I ran some pods in the default namespace with a ReplicaSet.
They all worked fine at first, and then I shut down the VMs.
Today, after I restarted the VMs, I found that they are not working properly anymore:
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/dnsutils 1/1 Running 3 5d13h
pod/kubapp-6qbfz 0/1 Running 0 5d13h
pod/kubapp-d887h 0/1 Running 0 5d13h
pod/kubapp-z6nw7 0/1 Running 0 5d13h
NAME DESIRED CURRENT READY AGE
replicaset.apps/kubapp 3 3 0 5d13h
Then I deleted the ReplicaSet and re-created it to recreate the pods.
I ran this command to get more information:
[root@k8s-master ch04]# kubectl describe po kubapp-z887v
Name: kubapp-d887h
Namespace: default
Priority: 0
Node: k8s-worker02/192.168.0.188
Start Time: Fri, 23 Jul 2021 15:55:16 +0000
Labels: app=kubapp
Annotations: cni.projectcalico.org/podIP: 10.244.69.244/32
cni.projectcalico.org/podIPs: 10.244.69.244/32
Status: Running
IP: 10.244.69.244
IPs:
IP: 10.244.69.244
Controlled By: ReplicaSet/kubapp
Containers:
kubapp:
Container ID: docker://fc352ce4c6a826f2cf108f9bb9a335e3572509fd5ae2002c116e2b080df5ee10
Image: evalle/kubapp
Image ID: docker-pullable://evalle/kubapp@sha256:560c9c50b1d894cf79ac472a9925dc795b116b9481ec40d142b928a0e3995f4c
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 23 Jul 2021 15:55:21 +0000
Ready: False
Restart Count: 0
Readiness: exec [ls /var/ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m9rwr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-m9rwr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned default/kubapp-d887h to k8s-worker02
Normal Pulling 30m kubelet Pulling image "evalle/kubapp"
Normal Pulled 30m kubelet Successfully pulled image "evalle/kubapp" in 4.049160061s
Normal Created 30m kubelet Created container kubapp
Normal Started 30m kubelet Started container kubapp
Warning Unhealthy 11s (x182 over 30m) kubelet Readiness probe failed: ls: cannot access /var/ready: No such file or directory
I don't know why this happens or how to fix it, so here I am asking you guys for help.
I am a k8s newbie; please give me a hand.
Thanks to paul-becotte for the help and recommendation. I think I should post the definition of the pod:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  # here is the name of the replication controller (RC)
  name: kubapp
spec:
  replicas: 3
  # what pods the RC is operating on
  selector:
    matchLabels:
      app: kubapp
  # the pod template for creating new pods
  template:
    metadata:
      labels:
        app: kubapp
    spec:
      containers:
      - name: kubapp
        image: evalle/kubapp
        readinessProbe:
          exec:
            command:
            - ls
            - /var/ready
There is an example YAML definition at https://github.com/Evalle/k8s-in-action/blob/master/Chapter_4/kubapp-rs.yaml.
I don't know where to find the Dockerfile for the image evalle/kubapp,
and I don't know whether it ever creates /var/ready.
Look at your event:
Warning Unhealthy 11s (x182 over 30m) kubelet Readiness probe failed: ls: cannot access /var/ready: No such file or directory
Your readiness probe is failing: it is checking for the existence of a file at /var/ready.
Your next step is to ask: does that check make sense? Is my container actually going to write a file at /var/ready when it is ready? If so, look at the logs from your pod and figure out why it is not writing the file. If it is NOT the correct check, look at the YAML you used to create your pod/deployment/replicaset and replace that check with something that does make sense.
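As a quick sanity check of the probe mechanics (hedged: this assumes /var/ready is meant to be created by hand, as in the manual readiness switch exercise this manifest resembles), you can create the file in one of the running containers and watch the pod flip to Ready; kubapp-d887h is one of the pod names from the output above:
kubectl exec kubapp-d887h -- touch /var/ready
kubectl get pods -w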

Pods are in Pending state

My pods are staying in the Pending state. As all the answers mention, I tried kubectl describe, but its output gives no clue about why they are stuck in Pending:
k8s@k8s-master:~/deployment$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 12d v1.12.2
k8s-node-1 Ready <none> 12d v1.12.2
k8s-node-2 Ready <none> 12d v1.12.2
k8s@k8s-master:~/deployment$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 0/1 Pending 0 62m
webserver 0/1 Pending 0 13m
k8s@k8s-master:~/deployment$ kubectl describe pod webserver
Name: webserver
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: creator=rithin
Annotations: <none>
Status: Pending
IP:
Containers:
apache:
Image: httpd
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vdpls (ro)
Volumes:
default-token-vdpls:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vdpls
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
I already tried describing the pods, but there is no useful information there.
Your pods require manual scheduling.
In the YAML file for the pods, add
nodeName: k8s-master
at the same level as containers under spec.
Your pods will then be scheduled on the k8s-master node. If you want to schedule them on another node, replace "k8s-master" with the appropriate node name (see the sketch below).
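A minimal sketch of where that field goes, reusing the webserver pod from the describe output above (the apache container name, httpd image, and creator=rithin label come from there; everything else is assumed):
apiVersion: v1
kind: Pod
metadata:
  name: webserver
  labels:
    creator: rithin
spec:
  nodeName: k8s-master   # bypasses the scheduler; swap in k8s-node-1 or k8s-node-2 if preferred
  containers:
  - name: apache
    image: httpd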
One possibility is that the worker nodes are not reachable from the master node, since no node has been assigned to the pod.
Well, I couldn't find any logs related to the failure, so I recreated the cluster and now it is working. I assume it was a problem with flannel.
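If it happens again, a couple of hedged first checks before rebuilding the cluster would be to confirm that the CNI (flannel) pods are healthy on every node and to scan the most recent cluster events:
kubectl get pods -n kube-system -o wide
kubectl get events --all-namespaces --sort-by=.lastTimestamp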