Debug a pod stuck in pending state [duplicate] - kubernetes

This question already has an answer here:
Error "pod has unbound immediate PersistentVolumeClaim" during statefulset deployment
(1 answer)
Closed 2 years ago.
How could I debug a pod stuck in pending state? I am using k8ssandra https://k8ssandra.io/docs/ to create a Cassandra cluster. It uses helm files. I created a 3 nodes cluster and changed size value to 3 in local values.yaml file to create a 3 node cluster - https://github.com/k8ssandra/k8ssandra/blob/main/charts/k8ssandra-cluster/values.yaml
no_reply#cloudshell:~ (k8ssandra-299315)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
cass-operator-86d4dc45cd-588c8 1/1 Running 0 29h
grafana-deployment-66557855cc-j7476 1/1 Running 0 29h
k8ssandra-cluster-a-grafana-operator-k8ssandra-5b89b64f4f-8pbxk 1/1 Running 0 29h
k8ssandra-cluster-a-reaper-k8ssandra-847c99ccd8-dsnj4 1/1 Running 0 28h
k8ssandra-cluster-a-reaper-k8ssandra-schema-5fzpn 0/1 Completed 0 28h
k8ssandra-cluster-a-reaper-operator-k8ssandra-87d56d56f-wn8hw 1/1 Running 0 29h
k8ssandra-dc1-default-sts-0 2/2 Running 0 29h
**k8ssandra-dc1-default-sts-1 0/2 Pending 0 14m**
k8ssandra-dc1-default-sts-2 2/2 Running 0 14m
k8ssandra-tools-kube-prome-operator-6bcdf668d4-ndhw9 1/1 Running 0 29h
prometheus-k8ssandra-cluster-a-prometheus-k8ssandra-0 2/2 Running 1 29h

The best way as described by Arghya is checking the events of the pod.
kubectl describe pod k8ssandra-dc1-default-sts-1
You could also check for the logs of the pod:
kubectl logs k8ssandra-dc1-default-sts-1

Related

How to debug GKE internal network issue?

UPDATE 1:
Some more logs from api-servers:
https://gist.github.com/nvcnvn/47df8798e798637386f6e0777d869d4f
This question is more about debugging method for current GKE but welcome for solution.
We're using GKE version 1.22.3-gke.1500 with following configuration:
We recently facing issue that commands like kubectl logs and exec doesn't work, deleting a namespace taking forever.
Checking some service inside the cluster, it seem for some reason some network operation just randomly failed. For example metric-server keep crashing with these error logs:
message: "pkg/mod/k8s.io/client-go#v0.19.10/tools/cache/reflector.go:156: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.97.0.1:443/api/v1/nodes?resourceVersion=387681528": net/http: TLS handshake timeout"
HTTP request timeout also:
unable to fully scrape metrics: unable to fully scrape metrics from node gke-staging-n2d-standard-8-78c35b3a-6h16: unable to fetch metrics from node gke-staging-n2d-standard-8-78c35b3a-6h16: Get "http://10.148.15.217:10255/stats/summary?only_cpu_and_memory=true": context deadline exceeded
and I also try to restart (by kubectl delete) most of the pod in this list:
kubectl get pod
NAME READY STATUS RESTARTS AGE
event-exporter-gke-5479fd58c8-snq26 2/2 Running 0 4d7h
fluentbit-gke-gbs2g 2/2 Running 0 4d7h
fluentbit-gke-knz2p 2/2 Running 0 85m
fluentbit-gke-ljw8h 2/2 Running 0 30h
gke-metadata-server-dtnvh 1/1 Running 0 4d7h
gke-metadata-server-f2bqw 1/1 Running 0 30h
gke-metadata-server-kzcv6 1/1 Running 0 85m
gke-metrics-agent-4g56c 1/1 Running 12 (3h6m ago) 4d7h
gke-metrics-agent-hnrll 1/1 Running 13 (13h ago) 30h
gke-metrics-agent-xdbrw 1/1 Running 0 85m
konnectivity-agent-87bc84bb7-g9nd6 1/1 Running 0 2m59s
konnectivity-agent-87bc84bb7-rkhhh 1/1 Running 0 3m51s
konnectivity-agent-87bc84bb7-x7pk4 1/1 Running 0 3m50s
konnectivity-agent-autoscaler-698b6d8768-297mh 1/1 Running 0 83m
kube-dns-77d9986bd5-2m8g4 4/4 Running 0 3h24m
kube-dns-77d9986bd5-z4j62 4/4 Running 0 3h24m
kube-dns-autoscaler-f4d55555-dmvpq 1/1 Running 0 83m
kube-proxy-gke-staging-n2d-standard-8-78c35b3a-8299 1/1 Running 0 11s
kube-proxy-gke-staging-n2d-standard-8-78c35b3a-fp5u 1/1 Running 0 11s
kube-proxy-gke-staging-n2d-standard-8-78c35b3a-rkdp 1/1 Running 0 11s
l7-default-backend-7db896cb4-mvptg 1/1 Running 0 83m
metrics-server-v0.4.4-fd9886cc5-tcscj 2/2 Running 82 33h
netd-5vpmc 1/1 Running 0 30h
netd-bhq64 1/1 Running 0 85m
netd-n6jmc 1/1 Running 0 4d7h
Some logs from metrics server
https://gist.github.com/nvcnvn/b77eb02705385889961aca33f0f841c7
if you cannot use kubectl to get info from your cluster, can you try to access them by using their restfull api
http://blog.madhukaraphatak.com/understanding-k8s-api-part-2/
try to delete "metric-server" pods or get logs from it using podman or curl command.

What is 'AVAILABLE' column in kubernetes daemonsets

I may have a stupid question but could someone explain what "Available" correctly represent in DaemonSets? I checked What is the difference between current and available pod replicas in kubernetes deployment? answer but there are no readiness errors.
In cluster i see below status:
$ kubectl get ds -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR
kube-proxy 6 6 5 6 5 beta.kubernetes.io/os=linux
Why it is showing as 5 instead of 6?
all pods are running perfectly fine without any "readiness" errors or restarts?
$ kubectl get pods -n kube-system | grep kube-proxy
kube-proxy-cv7vv 1/1 Running 0 20d
kube-proxy-kcd67 1/1 Running 0 20d
kube-proxy-l4nfk 1/1 Running 0 20d
kube-proxy-mkvjd 1/1 Running 0 87d
kube-proxy-qb7nz 1/1 Running 0 36d
kube-proxy-x8l87 1/1 Running 0 87d
Could someone tell what can be checked further?
The Available field shows the number of replicas or pods that are ready to accept traffic and passed all the criterion such as readiness or liveness probe or any other condition that verifies that your application is ready to serve the requests coming from user.

Delete all the pods created by applying Helm2.13.1

I'm new to Helm. I'm trying to deploy a simple server on the master node. When I do helm install and see the details using the command kubectl get po,svc I see lot of pods created other than the pods I intend to deploy.So, My precise questions are:
Why so many pods got created?
How do I delete all those pods?
Below is the output of the command kubectl get po,svc:
NAME READY STATUS RESTARTS AGE
pod/altered-quoll-stx-sdo-chart-6446644994-57n7k 1/1 Running 0 25m
pod/austere-garfish-stx-sdo-chart-5b65d8ccb7-jjxfh 1/1 Running 0 25m
pod/bald-hyena-stx-sdo-chart-9b666c998-zcfwr 1/1 Running 0 25m
pod/cantankerous-pronghorn-stx-sdo-chart-65f5699cdc-5fkf9 1/1 Running 0 25m
pod/crusty-unicorn-stx-sdo-chart-7bdcc67546-6d295 1/1 Running 0 25m
pod/exiled-puffin-stx-sdo-chart-679b78ccc5-n68fg 1/1 Running 0 25m
pod/fantastic-waterbuffalo-stx-sdo-chart-7ddd7b54df-p78h7 1/1 Running 0 25m
pod/gangly-quail-stx-sdo-chart-75b9dd49b-rbsgq 1/1 Running 0 25m
pod/giddy-pig-stx-sdo-chart-5d86844569-5v8nn 1/1 Running 0 25m
pod/hazy-indri-stx-sdo-chart-65d4c96f46-zmvm2 1/1 Running 0 25m
pod/interested-macaw-stx-sdo-chart-6bb7874bbd-k9nnf 1/1 Running 0 25m
pod/jaundiced-orangutan-stx-sdo-chart-5699d9b44b-6fpk9 1/1 Running 0 25m
pod/kindred-nightingale-stx-sdo-chart-5cf95c4d97-zpqln 1/1 Running 0 25m
pod/kissing-snail-stx-sdo-chart-854d848649-54m9w 1/1 Running 0 25m
pod/lazy-tiger-stx-sdo-chart-568fbb8d65-gr6w7 1/1 Running 0 25m
pod/nonexistent-octopus-stx-sdo-chart-5f8f6c7ff8-9l7sm 1/1 Running 0 25m
pod/odd-boxer-stx-sdo-chart-6f5b9679cc-5stk7 1/1 Running 1 15h
pod/orderly-chicken-stx-sdo-chart-7889b64856-rmq7j 1/1 Running 0 25m
pod/redis-697fb49877-x5hr6 1/1 Running 0 25m
pod/rv.deploy-6bbffc7975-tf5z4 1/2 CrashLoopBackOff 93 30h
pod/sartorial-eagle-stx-sdo-chart-767d786685-ct7mf 1/1 Running 0 25m
pod/sullen-gnat-stx-sdo-chart-579fdb7df7-4z67w 1/1 Running 0 25m
pod/undercooked-cow-stx-sdo-chart-67875cc5c6-mwvb7 1/1 Running 0 25m
pod/wise-quoll-stx-sdo-chart-5db8c766c9-mhq8v 1/1 Running 0 21m
You can run the command helm ls to see all the deployed helm releases in your cluster.
To remove the release (and every resource it created, including the pods), run: helm delete RELEASE_NAME --purge.
If you want to delete all the pods in your namespace without your Helm release (I DON'T think this is what you're looking for), you can run: kubectl delete pods --all.
On a side note, if you're new to Helm, consider starting with Helm v3 since it has many improvements, and specially because the migration from v2 to v3 can become cumbersome, and if you can avoid it - you should.

promethues operator alertmanager-main-0 pending and display

What happened?
kubernetes version: 1.12
promethus operator: release-0.1
I follow the README:
$ kubectl create -f manifests/
# It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding.
$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).
and then I use the command and then is showed like:
[root#VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 66s
alertmanager-main-1 1/2 Running 0 47s
grafana-54f84fdf45-kt2j9 1/1 Running 0 72s
kube-state-metrics-65b8dbf498-h7d8g 4/4 Running 0 57s
node-exporter-7mpjw 2/2 Running 0 72s
node-exporter-crfgv 2/2 Running 0 72s
node-exporter-l7s9g 2/2 Running 0 72s
node-exporter-lqpns 2/2 Running 0 72s
prometheus-adapter-5b6f856dbc-ndfwl 1/1 Running 0 72s
prometheus-k8s-0 3/3 Running 1 59s
prometheus-k8s-1 3/3 Running 1 59s
prometheus-operator-5c64c8969-lqvkb 1/1 Running 0 72s
[root#VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 0/2 Pending 0 0s
grafana-54f84fdf45-kt2j9 1/1 Running 0 75s
kube-state-metrics-65b8dbf498-h7d8g 4/4 Running 0 60s
node-exporter-7mpjw 2/2 Running 0 75s
node-exporter-crfgv 2/2 Running 0 75s
node-exporter-l7s9g 2/2 Running 0 75s
node-exporter-lqpns 2/2 Running 0 75s
prometheus-adapter-5b6f856dbc-ndfwl 1/1 Running 0 75s
prometheus-k8s-0 3/3 Running 1 62s
prometheus-k8s-1 3/3 Running 1 62s
prometheus-operator-5c64c8969-lqvkb 1/1 Running 0 75s
I don't know why the pod altertmanager-main-0 pending and disaply then restart.
And I see the event, it is showed as:
72s Warning FailedCreate StatefulSet create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
72s Warning FailedCreate StatefulSet create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
72s Warning^Z FailedCreate StatefulSet
[10]+ Stopped kubectl get events -n monitoring
Most likely the alertmanager does not get enough time to start correctly.
Have a look at this answer : https://github.com/coreos/prometheus-operator/issues/965#issuecomment-460223268
You can set the paused field to true, and then modify the StatefulSet to try if extending the liveness/readiness solves your issue.

Kubernetes dashboard (web UI) not working

I have just started a new Kubernetes 1.8.0 environment using minikube (0.27) on Windows 10.
I followed this steps but it didn't work:
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
When I list pods this is the result:
C:\WINDOWS\system32>kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-minikube 1/1 Running 0 23m
kube-system heapster-69b5d4974d-s9vrf 1/1 Running 0 5m
kube-system kube-addon-manager-minikube 1/1 Running 0 23m
kube-system kube-apiserver-minikube 1/1 Running 0 23m
kube-system kube-controller-manager-minikube 1/1 Running 0 23m
kube-system kube-dns-545bc4bfd4-xkt7l 3/3 Running 3 1h
kube-system kube-proxy-7jnk6 1/1 Running 0 23m
kube-system kube-scheduler-minikube 1/1 Running 0 23m
kube-system kubernetes-dashboard-5569448c6d-8zqnc 1/1 Running 2 52m
kube-system kubernetes-dashboard-869db7f6b4-ddlmq 0/1 CrashLoopBackOff 19 51m
kube-system monitoring-influxdb-78d4c6f5b6-b66m9 1/1 Running 0 4m
kube-system storage-provisioner 1/1 Running 2 1h
As you can see, I have 2 kubernets-dashboard pods now, one of then is running and the other one is CrashLookBackOff.
When I try to run minikube dashboard this is the result:
"Waiting, endpoint for service is not ready yet..."
I have tried to remove kubernetes-dashboard-869db7f6b4-ddlmq pod:
kubectl delete pod kubernetes-dashboard-869db7f6b4-ddlmq
This is the result:
"Error from server (NotFound): pods "kubernetes-dashboard-869db7f6b4-ddlmq" not found"
"Error from server (NotFound): pods "kubernetes-dashboard-869db7f6b4-ddlmq" not found"
You failed to delete the pod due to the lack of namespace (add -n kube-system). And it should be 1 dashboard pod if no modification's applied. If it still fails to run minikube dashboard after you delete the abnormal pod, more logs should be provided.