A question about pods running on the Kubernetes (k8s) platform: the pods are running but the containers are not ready

I built a k8s cluster on my virtual machines (CentOS 7) with VirtualBox:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master Ready control-plane,master 8d v1.21.2 192.168.0.186 <none> CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
k8s-worker01 Ready <none> 8d v1.21.2 192.168.0.187 <none> CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
k8s-worker02 Ready <none> 8d v1.21.2 192.168.0.188 <none> CentOS Linux 7 (Core) 3.10.0-1160.31.1.el7.x86_64 docker://20.10.7
A few days earlier I had run some pods in the default namespace with a ReplicaSet.
They all worked fine at first, and then I shut down the VMs.
Today, after restarting the VMs, I found that they are no longer working properly:
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/dnsutils 1/1 Running 3 5d13h
pod/kubapp-6qbfz 0/1 Running 0 5d13h
pod/kubapp-d887h 0/1 Running 0 5d13h
pod/kubapp-z6nw7 0/1 Running 0 5d13h
NAME DESIRED CURRENT READY AGE
replicaset.apps/kubapp 3 3 0 5d13h
Then I deleted the ReplicaSet and re-created it to recreate the pods.
I ran the following command to get more information:
[root@k8s-master ch04]# kubectl describe po kubapp-d887h
Name: kubapp-d887h
Namespace: default
Priority: 0
Node: k8s-worker02/192.168.0.188
Start Time: Fri, 23 Jul 2021 15:55:16 +0000
Labels: app=kubapp
Annotations: cni.projectcalico.org/podIP: 10.244.69.244/32
cni.projectcalico.org/podIPs: 10.244.69.244/32
Status: Running
IP: 10.244.69.244
IPs:
IP: 10.244.69.244
Controlled By: ReplicaSet/kubapp
Containers:
kubapp:
Container ID: docker://fc352ce4c6a826f2cf108f9bb9a335e3572509fd5ae2002c116e2b080df5ee10
Image: evalle/kubapp
Image ID: docker-pullable://evalle/kubapp@sha256:560c9c50b1d894cf79ac472a9925dc795b116b9481ec40d142b928a0e3995f4c
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 23 Jul 2021 15:55:21 +0000
Ready: False
Restart Count: 0
Readiness: exec [ls /var/ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m9rwr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-m9rwr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned default/kubapp-d887h to k8s-worker02
Normal Pulling 30m kubelet Pulling image "evalle/kubapp"
Normal Pulled 30m kubelet Successfully pulled image "evalle/kubapp" in 4.049160061s
Normal Created 30m kubelet Created container kubapp
Normal Started 30m kubelet Started container kubapp
Warning Unhealthy 11s (x182 over 30m) kubelet Readiness probe failed: ls: cannot access /var/ready: No such file or directory
I don't know what happened or how to fix it.
So here I am, asking you guys for help.
I am a k8s newbie, please lend a hand.
Thanks to paul-becotte for the help and recommendation. I think I should post the definition of the ReplicaSet:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  # here is the name of the replication controller (RC)
  name: kubapp
spec:
  replicas: 3
  # what pods the RC is operating on
  selector:
    matchLabels:
      app: kubapp
  # the pod template for creating new pods
  template:
    metadata:
      labels:
        app: kubapp
    spec:
      containers:
      - name: kubapp
        image: evalle/kubapp
        readinessProbe:
          exec:
            command:
            - ls
            - /var/ready
There is an example YAML definition at https://github.com/Evalle/k8s-in-action/blob/master/Chapter_4/kubapp-rs.yaml.
I don't know where to find the Dockerfile of the image evalle/kubapp,
and I don't know whether it has the /var/ready directory.

Look at your event:
Warning Unhealthy 11s (x182 over 30m) kubelet Readiness probe failed: ls: cannot access /var/ready: No such file or directory
Your readiness probe is failing: it looks like it is checking for the existence of a file at /var/ready.
Your next step is to ask: "Does that make sense? Is my container actually going to write a file at /var/ready when it's ready?" If so, you'll want to look at the logs from your pod and figure out why it's not writing the file. If it's NOT the correct check, look at the YAML you used to create your pod/deployment/replicaset and replace that check with something that does make sense.
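For example, if the check is intentional (this manifest appears to come from a book exercise where readiness is toggled by hand), a minimal sketch, using the pod name from the describe output above, is simply to create the file the probe is looking for:
kubectl exec kubapp-d887h -- touch /var/ready
If it is not the check you want, remove the readinessProbe block from the template (or point it at a path the container actually creates), then delete and re-apply the ReplicaSet so new pods pick up the change:
kubectl delete rs kubapp
kubectl apply -f kubapp-rs.yaml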

Related

Kubernetes pod error: exec /usr/bin/mvn: exec format error

I am trying to create a pod in Kubernetes inside minikube, but I am getting the message exec /usr/bin/mvn: exec format error.
Background: I am using YAKS from the Citrus framework, which is used for BDD testing and supports Kubernetes.
I am creating this pod using yaks run helloworld.feature, which creates a pod that runs the tests we define in the .feature file. YAKS is based on Cucumber.
kubectl describe pod test
Name: test-test-cf93r5qsoe02b4hj9o2g-jhqxf
Namespace: default
Priority: 0
Service Account: yaks-viewer
Node: minikube/192.168.49.2
Start Time: Thu, 26 Jan 2023 08:45:11 +0000
Labels: app=yaks
controller-uid=ff467e84-6ed5-47d7-9b6a-bd10128c589e
job-name=test-test-cf93r5qsoe02b4hj9o2g
yaks.citrusframework.org/test=test
yaks.citrusframework.org/test-id=cf93r5qsoe02b4hj9o2g
Annotations: <none>
Status: Failed
IP: 172.17.0.5
IPs:
IP: 172.17.0.5
Controlled By: Job/test-test-cf93r5qsoe02b4hj9o2g
Containers:
test:
Container ID: docker://e9017e0e5727d736ddbb6057e804f181d558492db18aa914d2cbc0eaeb3d9ee3
Image: docker.io/citrusframework/yaks:0.12.0
Image ID: docker-pullable://citrusframework/yaks@sha256:3504e26ae47bf5a613d38f641f4eb1b97e0bf72678e67ef9d13f41b74b31a70c
Port: <none>
Host Port: <none>
Command:
mvn
--no-snapshot-updates
-B
-q
--no-transfer-progress
-f
/deployments/data/yaks-runtime-maven
-s
/deployments/artifacts/settings.xml
verify
-Dit.test=org.citrusframework.yaks.feature.Yaks_IT
-Dmaven.repo.local=/deployments/artifacts/m2
State: Terminated
Reason: Error
Message: exec /usr/bin/mvn: exec format error
Exit Code: 1
Started: Thu, 26 Jan 2023 08:45:12 +0000
Finished: Thu, 26 Jan 2023 08:45:12 +0000
Ready: False
Restart Count: 0
Environment:
YAKS_TERMINATION_LOG: /dev/termination-log
YAKS_TESTS_PATH: /etc/yaks/tests
YAKS_SECRETS_PATH: /etc/yaks/secrets
YAKS_NAMESPACE: default
YAKS_CLUSTER_TYPE: KUBERNETES
YAKS_TEST_NAME: test
YAKS_TEST_ID: cf93r5qsoe02b4hj9o2g
Mounts:
/etc/yaks/secrets from secrets (rw)
/etc/yaks/tests from tests (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hn7kv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tests:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: test-test
Optional: false
secrets:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-hn7kv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Here are the kubectl get all results:
NAME READY STATUS RESTARTS AGE
pod/hello-minikube 1/1 Running 3 (20h ago) 20h
pod/nginx 0/1 Completed 0 21h
pod/test-hello-world-cf94ttisoe02b4hj9o30-lcdmb 0/1 Error 0 56m
pod/test-test-cf93r5qsoe02b4hj9o2g-jhqxf 0/1 Error 0 131m
pod/yaks-operator-65b956c564-h5tsx 1/1 Running 0 131m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/yaks-operator 1/1 1 1 131m
NAME DESIRED CURRENT READY AGE
replicaset.apps/yaks-operator-65b956c564 1 1 1 131m
NAME COMPLETIONS DURATION AGE
job.batch/test-hello-world-cf94ttisoe02b4hj9o30 0/1 56m 56m
job.batch/test-test-cf93r5qsoe02b4hj9o2g 0/1 131m 131m
I tried to install mvn since the error mentions it, but it shouldn't be required to run YAKS; it should work without mvn and do its thing on its own.
Additionally, I found this, but I don't know if it helps:
kubectl logs deployment.apps/yaks-operator
{"level":"error","ts":1674732026.5891097,"logger":"controller.test-controller","msg":"Reconciler error","name":"test","namespace":"default","error":"invalid character 'e' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/cdeppisc/Projects/Go/pkg/mod/sigs.k8s.io/controller-runtime#v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users/cdeppisc/Projects/Go/pkg/mod/sigs.k8s.io/controller-runtime#v0.11.2/pkg/internal/controller/controller.go:227"}

CrashLoopBackOff: Back-off restarting failed container for a Flask application

I am a beginner in kubernetes and was trying to deploy my flask application following this guide: https://medium.com/analytics-vidhya/build-a-python-flask-app-and-deploy-with-kubernetes-ccc99bbec5dc
I have successfully built a docker image and pushed it to dockerhub https://hub.docker.com/repository/docker/beatrix1997/kubernetes_flask_app
but am having trouble debugging a pod.
This is my YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetesflaskapp-deploy
  labels:
    app: kubernetesflaskapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetesflaskapp
  template:
    metadata:
      labels:
        app: kubernetesflaskapp
    spec:
      containers:
      - name: kubernetesflaskapp
        image: beatrix1997/kubernetes_flask_app
        ports:
        - containerPort: 5000
And this is the description of the pod:
Name: kubernetesflaskapp-deploy-5764bbbd44-8696k
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Fri, 20 May 2022 11:26:33 +0100
Labels: app=kubernetesflaskapp
pod-template-hash=5764bbbd44
Annotations: <none>
Status: Running
IP: 172.17.0.12
IPs:
IP: 172.17.0.12
Controlled By: ReplicaSet/kubernetesflaskapp-deploy-5764bbbd44
Containers:
kubernetesflaskapp:
Container ID: docker://d500dc15e389190670a9273fea1d70e6bd6ab2e7053bd2480d114ad6150830f1
Image: beatrix1997/kubernetes_flask_app
Image ID: docker-pullable://beatrix1997/kubernetes_flask_app@sha256:1bfa98229f55b04f32a6b85d72860886abcc0f17295b14e173151a8e4b0f0334
Port: 5000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 20 May 2022 11:58:38 +0100
Finished: Fri, 20 May 2022 11:58:38 +0100
Ready: False
Restart Count: 11
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zq8n7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-zq8n7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 33m default-scheduler Successfully assigned default/kubernetesflaskapp-deploy-5764bbbd44-8696k to minikube
Normal Pulled 33m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 14.783413947s
Normal Pulled 33m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.243534487s
Normal Pulled 32m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.373217701s
Normal Pulling 32m (x4 over 33m) kubelet Pulling image "beatrix1997/kubernetes_flask_app"
Normal Created 32m (x4 over 33m) kubelet Created container kubernetesflaskapp
Normal Pulled 32m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.239794774s
Normal Started 32m (x4 over 33m) kubelet Started container kubernetesflaskapp
Warning BackOff 3m16s (x138 over 33m) kubelet Back-off restarting failed container
I am using Ubuntu as my OS, if it matters at all.
Any help would be appreciated!
Many thanks!
I would check the following:
Check whether your Docker image works in Docker; you can run it with the docker run command (see the official docs, and the quick sketch below).
If it doesn't work, then you can check what is wrong in your app first.
If it does, try checking the readiness and liveness probes (see the official documentation).
You can find more hints about failing pods in the official troubleshooting docs.
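For the first check, a rough sketch (the image name and port are taken from the deployment in the question; adjust the port if your app listens elsewhere):
docker run --rm -p 5000:5000 beatrix1997/kubernetes_flask_app
If the container exits immediately here as well, the problem is in the image or the application itself rather than in Kubernetes.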
The error may be due to an issue in the application, as the reported reason is "Back-off restarting failed container". Please paste the output of the following command in the question for further clarification:
kubectl logs -n <NS> <pod-name>
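For the pod in this question (default namespace, pod name from the describe output above), that would be:
kubectl logs -n default kubernetesflaskapp-deploy-5764bbbd44-8696k
Adding --previous shows the output of the last crashed attempt if the current container has already restarted.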

Kubernetes Dashboard CrashLoopBackOff: timeout error on Raspberry Pi cluster

This should be a simple task: I simply want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Set up the initial cluster (hostname, static IP, cgroups, swap space, install and configure Docker, install Kubernetes, set up the Kubernetes network, and join nodes)
I have flannel installed
I have applied the dashboard
Bunch of random testing trying to figure this out
Obviously, as seen below, the container in the dashboard pod is not working because it cannot access the kubernetes-dashboard-csrf secret. I have no idea why this cannot be accessed; my only thought is that I missed a step when setting up the cluster. I've followed about six different guides without success, prioritizing the official guide. I have also seen quite a few people with the same or similar issues, but most have not posted a resolution. Thanks!
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name: kubernetes-dashboard-897c7599f-qss5p
Namespace: kubernetes-dashboard
Priority: 0
Node: shawn4/192.168.10.71
Start Time: Fri, 17 Dec 2021 18:52:15 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=897c7599f
Annotations: <none>
Status: Running
IP: 172.19.1.75
IPs:
IP: 172.19.1.75
Controlled By: ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
kubernetes-dashboard:
Container ID: docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
Image: kubernetesui/dashboard:v2.4.0
Image ID: docker-pullable://kubernetesui/dashboard@sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 17 Dec 2021 20:10:19 +0000
Finished: Fri, 17 Dec 2021 20:10:49 +0000
Ready: False
Restart Count: 19
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-wq9m8:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-wq9m8
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 21s (x327 over 79m) kubelet Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
UPDATE 12/29/21:
Fixed this issue by reinstalling the cluster to the newest versions of Kubernetes and Ubuntu.
Turned out there were several issues:
I was using Ubuntu Buster which is deprecated.
My client/server Kubernetes versions were +/-0.3 out of sync
I was following outdated instructions
I reinstalled the whole cluster following Kubernetes official guide and, with a few snags along the way, it works!

Kubernetes Pod Stuck in Pending Without Indicating Any Reason

We are using client-go to create Kubernetes Jobs and Deployments. Today, in one of our clusters (Kubernetes v1.18.19), I encountered the weird problem below.
Pods created by a Kubernetes Job are always stuck in Pending status, without any reason given. kubectl describe pod shows there are no events. Creating Jobs from the host (via kubectl) works normally and the pods eventually become Running.
What surprises me is that creating Deployments works fine and the pods eventually run; it fails only for Kubernetes Jobs. Why? How can I fix this? What can I do? I have spent hours on this without progress.
kubeconfig by client-go:
Mount from host machine, path: /root/.kube/config
kubectl describe job shows:
Name: unittest
Namespace: default
Selector: controller-uid=f3cec901-c0f4-4098-86d7-f9a7d1fe6cd1
Labels: job-id=unittest
Annotations: <none>
Parallelism: 1
Completions: 1
Start Time: Sat, 19 Jun 2021 00:20:12 +0800
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=f3cec901-c0f4-4098-86d7-f9a7d1fe6cd1
job-name=unittest
Containers:
unittest:
Image: ubuntu:18.04
Port: <none>
Host Port: <none>
Command:
echo hello
Environment: <none>
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 21m job-controller Created pod: unittest-tt5b2
kubectl describe on the target pod shows:
Name: unittest-tt5b2
Namespace: default
Priority: 0
Node: <none>
Labels: controller-uid=f3cec901-c0f4-4098-86d7-f9a7d1fe6cd1
job-name=unittest
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: Job/unittest
Containers:
unittest:
Image: ubuntu:18.04
Port: <none>
Host Port: <none>
Command:
echo hello
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-72g27 (ro)
Volumes:
default-token-72g27:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-72g27
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
kubectl get events shows:
55m Normal ScalingReplicaSet deployment/job-scheduler Scaled up replica set job-scheduler-76b7465d74 to 1
19m Normal ScalingReplicaSet deployment/job-scheduler Scaled up replica set job-scheduler-74f8896f48 to 1
58m Normal SuccessfulCreate job/unittest Created pod: unittest-pp665
49m Normal SuccessfulCreate job/unittest Created pod: unittest-xm6ck
17m Normal SuccessfulCreate job/unittest Created pod: unittest-tt5b2
I fixed the issue.
We use a custom scheduler for NPU devices and the default scheduler for GPU devices. For GPU devices, the scheduler name is "default-scheduler", not "default". I had passed "default" as the scheduler name for those Jobs, which caused the pods to get stuck in Pending.
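For reference, a rough sketch of where the scheduler name lives in a Job manifest; the name and image are taken from the describe output above, the command is just a placeholder, and when creating the Job with client-go the corresponding field is PodSpec.SchedulerName:
apiVersion: batch/v1
kind: Job
metadata:
  name: unittest
spec:
  template:
    spec:
      schedulerName: default-scheduler   # must name a scheduler that actually exists; "default" matches nothing, so nothing ever schedules the pod
      restartPolicy: Never
      containers:
      - name: unittest
        image: ubuntu:18.04
        command: ["echo", "hello"]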

Dashboard not running

I have set up Kubernetes on an Ubuntu server using this link.
Then I installed kubernetes dashboard using:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-rc6/aio/deploy/recommended.yaml
Then I changed the service type from ClusterIP to NodePort, using node port 32323.
But the container is not running.
uday@dockermaster:~$ kubectl -n kubernetes-dashboard get all
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-779f5454cb-pqfrj 1/1 Running 0 50m
pod/kubernetes-dashboard-64686c4bf9-5jkwq 0/1 CrashLoopBackOff 14 50m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 10.103.22.252 <none> 8000/TCP 50m
service/kubernetes-dashboard NodePort 10.102.48.80 <none> 443:32323/TCP 50m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 50m
deployment.apps/kubernetes-dashboard 0/1 1 0 50m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-779f5454cb 1 1 1 50m
replicaset.apps/kubernetes-dashboard-64686c4bf9 1 1 0 50m
uday@dockermaster:~$ kubectl -n kubernetes-dashboard describe svc kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kubernetes-dashboard
Labels: k8s-app=kubernetes-dashboard
Annotations: Selector: k8s-app=kubernetes-dashboard
Type: NodePort
IP: 10.102.48.80
Port: <unset> 443/TCP
TargetPort: 8443/TCP
NodePort: <unset> 32323/TCP
Endpoints:
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Other apps are working fine with NodePort, whether Tomcat, nginx, or databases.
But here, the container itself fails to keep running.
C:\Users\uday\Desktop>kubectl.exe get pods -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-779f5454cb-pqfrj 1/1 Running 1 20h
kubernetes-dashboard-64686c4bf9-g9z2k 0/1 CrashLoopBackOff 84 18h
C:\Users\uday\Desktop>kubectl.exe describe pod kubernetes-dashboard-64686c4bf9-g9z2k -n kubernetes-dashboard
Name: kubernetes-dashboard-64686c4bf9-g9z2k
Namespace: kubernetes-dashboard
Priority: 0
Node: slave-node/10.0.0.6
Start Time: Sat, 28 Mar 2020 14:16:54 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=64686c4bf9
Annotations: <none>
Status: Running
IP: 182.244.1.12
IPs:
IP: 182.244.1.12
Controlled By: ReplicaSet/kubernetes-dashboard-64686c4bf9
Containers:
kubernetes-dashboard:
Container ID: docker://470ee8c61998c3c3dda86c58ad17817468f55aa73cd4feecf3b018977ce13ca3
Image: kubernetesui/dashboard:v2.0.0-rc6
Image ID: docker-pullable://kubernetesui/dashboard@sha256:61f9c378c427a3f8a9643f83baa9f96db1ae1357c67a93b533ae7b36d71c69dc
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sun, 29 Mar 2020 09:01:31 +0000
Finished: Sun, 29 Mar 2020 09:02:01 +0000
Ready: False
Restart Count: 84
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-pzfbl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-pzfbl:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-pzfbl
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m49s (x501 over 123m) kubelet, slave-node Back-off restarting failed container
kubectl.exe logs kubernetes-dashboard-64686c4bf9-g9z2k -n kubernetes-dashboard
2020/03/29 09:01:31 Starting overwatch
2020/03/29 09:01:31 Using namespace: kubernetes-dashboard
2020/03/29 09:01:31 Using in-cluster config to connect to apiserver
2020/03/29 09:01:31 Using secret token for csrf signing
2020/03/29 09:01:31 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get https://10.96.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf: dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0xc0004e2dc0)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:40 +0x3b0
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:65
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0xc00043ae80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:499 +0xc6
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0xc00043ae80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:467 +0x47
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:548
main.main()
/home/travis/build/kubernetes/dashboard/src/app/backend/dashboard.go:105 +0x20d
The Problem
The reason the application is not coming up is that the Dashboard container itself is not running. If you look at the output you provided, you can see this:
pod/kubernetes-dashboard-64686c4bf9-5jkwq 0/1 CrashLoopBackOff 14
So how do we troubleshoot this? Well, there are three principal ways, one of which you'll probably be using more than the other two.
Describe
Describe is a command that allows you to fetch details about a resource in Kubernetes. This could be metadata, the number of replicas you assigned, or even events depicting why a resource is failing to start; for example, a container image referenced in your pod manifest cannot be found in the configured container registries. The syntax for using Describe is like so:
kubectl describe pod -n kubernetes-dashboard kubernetes-dashboard-64686c4bf9-5jkwq
Here are some great docs on the tool as well.
Logs
The next troubleshooting step you'll likely take advantage of in Kubernetes is the logging architecture. As you're probably aware, when a Docker container is spawned it is common practice to have the logs produced by the application redirected to STDOUT or STDERR for the process. Kubernetes then captures this log data for you and provides an API abstraction layer through which you can interact with it. Sometimes your Describe events won't have any indication of why a process isn't running. In that case, you can proceed with grabbing logs from the process to determine what is going wrong. An example syntax might look like:
kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-64686c4bf9-5jkwq
Exec
The last common troubleshooting technique is Exec. Exec effectively allows you to attach to a shell in a running container so that you can interact with the live environment to troubleshoot the application. This allows you to do things like see if configuration files were properly staged on the container's filesystem, determine whether environment variables were properly expanded and set, etc. An example syntax for Exec might look like:
kubectl exec -it -n kubernetes-dashboard kubernetes-dashboard-64686c4bf9-5jkwq -- sh
In this case, however, your pod is in a CrashLoopBackOff state. This means that you will not be able to exec into it, because the container is not running. The kubelet has recognized a pattern of failures and automatically backs off its attempts to restart the container.
Here is a good thread on how to troubleshoot pods that enter this state.
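One technique worth spelling out for the CrashLoopBackOff case, since Exec is off the table: the kubelet keeps the output of the last terminated container, so you can still read its logs with the --previous flag. Using the pod name from this question, a sketch would be:
kubectl logs -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p --previous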
Summary
So, now that I've said all of this, how do we answer your question? Well, we can't answer it directly, but I sort of did with my summary above, because the real answer you're looking for is how to properly troubleshoot Linux containers running in Kubernetes. These issues will be a recurring theme in your experience with Kubernetes, so it's essential to develop debugging skills in the ecosystem as soon as possible.
If the Describe, Logs, and Exec commands are unable to help you find out why the Dashboard pod is failing to come up, feel free to add a comment on this answer requesting additional support, and I'll be happy to help where I can!