Kubernetes UnexpectedAdmissionError after rollout

I had a service failing to reply to some HTTP requests. Digging into its logs, it seemed to be some sort of DNS failure when reaching a proxy service:
'proxy' failed to resolve 'proxy.default.svc.cluster.local' after 2 queries
I could not find anything else wrong, so I tried kubectl rollout restart deployment/backend.
Just after that, these appeared in the pod list:
backend-54769cbb4-xkwf2 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xlpgf 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-xmnr5 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-xmq5n 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-xphrw 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-xrmrq 0/1 UnexpectedAdmissionError 0 4h1m
backend-54769cbb4-xrmw8 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xt4ck 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-xws8r 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xx6r4 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-xxpfd 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xzjql 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-xzzlk 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-z46ms 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-z4sl7 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-z6jpj 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-z6ngq 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-z8w4h 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-z9jqb 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zbvqm 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zcfxg 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zcvqm 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zf2f8 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zgnkh 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zhdr8 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zhx6g 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zj8f2 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zjbwp 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-zjc8g 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zjdcp 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-zkcrb 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zlpll 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zm2cx 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zn7mr 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-znjkp 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zpnk7 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zrrl7 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zsdsz 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-ztdx8 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-ztln6 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-ztplg 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-ztzfh 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zvb8g 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zwsr8 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zwvxr 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-zwx6h 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zz4bf 0/1 UnexpectedAdmissionError 0 4h1m
backend-54769cbb4-zzq6t 0/1 UnexpectedAdmissionError 0 4h2m
(and many more of these)
So I added two more nodes, and now everything seems fine except for this big list of pods in an error state which I don't understand. What is this UnexpectedAdmissionError, and what should I do about it?
Note: this is a DigitalOcean cluster
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:38:36Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
The following seems important: kubectl describe one_failed_pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m51s default-scheduler Successfully assigned default/backend-549f576d5f-xzdv4 to std-16gb-g7mo
Warning UnexpectedAdmissionError 2m51s kubelet, std-16gb-g7mo Update plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.543592130: no space left on device, which is unexpected.

I had the same issue. While describing one of the pods with UnexpectedAdmissionError, I saw the following:
Update plugin resources failed due to failed to write deviceplugin checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.525608957: no space left on device, which is unexpected.
When describing the node:
OutOfDisk Unknown Tue, 30 Jun 2020 14:07:04 -0400 Tue, 30 Jun 2020 14:12:05 -0400 NodeStatusUnknown Kubelet stopped posting node status.
I resolved this by rebooting the node.
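Before (or instead of) rebooting, it may be worth confirming the disk pressure first. A minimal sketch of the checks, assuming the node name from the event above and SSH access to it:
# Check the node's reported conditions (look for DiskPressure / OutOfDisk)
kubectl describe node std-16gb-g7mo | grep -A 10 "Conditions:"
# On the node itself, check the filesystem backing the kubelet state directory
df -h /var/lib/kubelet
# Inode exhaustion also surfaces as "no space left on device"
df -i /var/lib/kubelet
Freeing space on that filesystem (old container images, logs) and restarting the kubelet may clear the condition without a full reboot.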

Because the pod was never even started, you can't actually check its logs. However, describing the pod provided me with the error. We had some disk/CPU/memory utilization issues with the worker5 node.
kubectl get pods -A -o wide | grep -i err
kube-system coredns-autoscaler-79599b9dc6-6l8s8 0/1 UnexpectedAdmissionError 0 10h <none> worker5 <none> <none>
kube-system coredns-autoscaler-79599b9dc6-kzt9z 0/1 UnexpectedAdmissionError 0 10h <none> worker5 <none> <none>
kube-system coredns-autoscaler-79599b9dc6-tgkrc 0/1 UnexpectedAdmissionError 0 10h <none> worker5 <none> <none>
kubectl describe pod -n kube-system coredns-autoscaler-79599b9dc6-kzt9z
Reason: UnexpectedAdmissionError
Message: Pod Allocate failed due to failed to write checkpoint file "kubelet_internal_checkpoint": mkdir /var: file exists, which is unexpected
The first step was rebooting the node, which fixed the issue. The reason was that we had restored some backups to the new cluster, and the restore process caused this issue.
As for the pods: because they were part of a ReplicaSet, replacements got spawned on other worker nodes, so we deleted the failed pods.
As a quick way to delete a lot of pods, you can use:
kubectl get pods -n namespace | grep -i Error | cut -d' ' -f 1 | xargs kubectl delete pod -n namespace
To delete all the erroneous pods in the entire cluster:
kubectl get pods -A | grep -i Error | awk '{print $2 " --namespace=" $1}' | xargs -L 1 kubectl delete pod
You can use flag -A/--all-namespaces to get pods from all namespaces in the cluster.
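Alternatively, if you'd rather avoid grep/awk parsing, here is a sketch using a field selector (pods stuck in UnexpectedAdmissionError should be in the Failed phase, and this assumes your kubectl supports --field-selector on delete):
# Delete every Failed pod in one namespace
kubectl delete pods -n default --field-selector=status.phase=Failed
# Or sweep namespace by namespace across the cluster
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  kubectl delete pods -n "$ns" --field-selector=status.phase=Failed
done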
However, if they are not getting respawned automatically (which would be weird), you can use kubectl replace:
kubectl get pod coredns-autoscaler-79599b9dc6-6l8s8 -n kube-system -o yaml | kubectl replace --force -f -
For a more verbose read, please refer to kubectl replace --help and the following blog.

Related

Unable to find correct selector for vault on Kubernetes with MicroK8s

I am working with Vault for the first time and I am encountering an issue. I am following the official HashiCorp guide to install Vault on a Kubernetes cluster, and I am running it with MicroK8s.
However, I am not able to unseal my Vault. As you can see it is running, but when I try to use the selector so that I can unseal it afterwards, nothing is found in the namespace.
I am fairly new to this but I wasn't able to find the solution online.
Any help would be appreciated.
Please find below the console output.
Thank you !
root@vault-dev:/home/dev# kubectl --namespace="vault" get pods
NAME READY STATUS RESTARTS AGE
vault-0 0/1 Pending 0 12m
vault-1 0/1 Pending 0 12m
vault-2 0/1 Pending 0 12m
vault-3 0/1 Pending 0 12m
vault-4 0/1 Pending 0 12m
vault-agent-injector-7969bb745-nmswj 1/1 Running 0 12m
root@vault-dev:/home/dev# kubectl exec --stdin=true --tty=true vault-0 -- vault operator init
Error from server (NotFound): pods "vault-0" not found
root@vault-dev:/home/dev# kubectl --namespace="vault" get pods
NAME READY STATUS RESTARTS AGE
vault-0 0/1 Pending 0 15m
vault-1 0/1 Pending 0 15m
vault-2 0/1 Pending 0 15m
vault-3 0/1 Pending 0 15m
vault-4 0/1 Pending 0 15m
vault-agent-injector-7969bb745-nmswj 1/1 Running 0 15m
root@vault-dev:/home/dev# kubectl --namespace="vault" get all
NAME READY STATUS RESTARTS AGE
pod/vault-0 0/1 Pending 0 15m
pod/vault-1 0/1 Pending 0 15m
pod/vault-2 0/1 Pending 0 15m
pod/vault-3 0/1 Pending 0 15m
pod/vault-4 0/1 Pending 0 15m
pod/vault-agent-injector-7969bb745-nmswj 1/1 Running 0 15m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/vault-internal ClusterIP None <none> 8200/TCP,8201/TCP 15m
service/vault-ui LoadBalancer REDACTED <pending> 8200:32269/TCP 15m
service/vault-standby ClusterIP REDACTED <none> 8200/TCP,8201/TCP 15m
service/vault-agent-injector-svc ClusterIP REDACTED <none> 443/TCP 15m
service/vault ClusterIP REDACTED <none> 8200/TCP,8201/TCP 15m
service/vault-active ClusterIP REDACTED <none> 8200/TCP,8201/TCP 15m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/vault-agent-injector 1/1 1 1 15m
NAME DESIRED CURRENT READY AGE
replicaset.apps/vault-agent-injector-7969bb745 1 1 1 15m
NAME READY AGE
statefulset.apps/vault 0/5 15m
root@vault-dev:/home/dev# kubectl get pods --selector='statefulset.apps/name=vault' --namespace=' vault'
No resources found in vault namespace.
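Two things stand out in this transcript: the kubectl exec call omits --namespace vault (so it looks for vault-0 in the default namespace and gets NotFound), and the final query passes a namespace value with a leading space plus a label (statefulset.apps/name) that the chart does not appear to set. A sketch of corrected commands, assuming the standard Helm chart label app.kubernetes.io/name=vault:
kubectl exec --stdin=true --tty=true --namespace=vault vault-0 -- vault operator init
kubectl get pods --selector='app.kubernetes.io/name=vault' --namespace='vault'
Note that the vault-N pods are still Pending, so exec will not work until they are scheduled; kubectl describe pod vault-0 -n vault should show why (on MicroK8s this is often an unbound PersistentVolumeClaim).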

Kubernetes CrashLoopBackOff default timing

What are the defaults for the Kubernetes CrashLoopBackOff?
Say, I have a pod:
kubectl run mynginx --image nginx -- echo hello
And I inspect its status:
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
mynginx 0/1 Pending 0 0s
mynginx 0/1 Pending 0 0s
mynginx 0/1 ContainerCreating 0 0s
mynginx 0/1 Completed 0 2s
mynginx 0/1 Completed 1 4s
mynginx 0/1 CrashLoopBackOff 1 5s
mynginx 0/1 Completed 2 20s
mynginx 0/1 CrashLoopBackOff 2 33s
mynginx 0/1 Completed 3 47s
mynginx 0/1 CrashLoopBackOff 3 59s
mynginx 0/1 Completed 4 97s
mynginx 0/1 CrashLoopBackOff 4 109s
This is "expected". Kubernetes starts a pod, it quits "too fast", Kubernetes schedules it again and then Kubernetes sets the state to CrashLoopBackOff.
Now, if i start a pod slightly differently:
kubectl run mynginx3 --image nginx -- /bin/bash -c "sleep 10; echo hello"
I get the following
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
mynginx3 0/1 Pending 0 0s
mynginx3 0/1 Pending 0 0s
mynginx3 0/1 ContainerCreating 0 0s
mynginx3 1/1 Running 0 2s
mynginx3 0/1 Completed 0 12s
mynginx3 1/1 Running 1 14s
mynginx3 0/1 Completed 1 24s
mynginx3 0/1 CrashLoopBackOff 1 36s
mynginx3 1/1 Running 2 38s
mynginx3 0/1 Completed 2 48s
mynginx3 0/1 CrashLoopBackOff 2 62s
mynginx3 1/1 Running 3 75s
mynginx3 0/1 Completed 3 85s
mynginx3 0/1 CrashLoopBackOff 3 96s
mynginx3 1/1 Running 4 2m14s
mynginx3 0/1 Completed 4 2m24s
mynginx3 0/1 CrashLoopBackOff 4 2m38s
This is also expected.
But say I set the sleep to 24 hours: would I still get the same CrashLoopBackOff, first after a couple of exits and then after each subsequent exit?
Based on these docs:
The restartPolicy applies to all containers in the Pod. restartPolicy only refers to restarts of the containers by the kubelet on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.
I think that means that anything that executes for longer than 10 minutes before exiting will not trigger a CrashLoopBackOff status.
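A quick way to check this yourself is to run a container that stays up longer than the 10-minute reset window and watch the restart behaviour. A sketch (the pod name is illustrative):
# Runs ~11 minutes before exiting, past the 10-minute backoff reset window
kubectl run mynginx-slow --image nginx -- /bin/bash -c "sleep 660; echo hello"
# Watch the transitions; the backoff timer should be reset after each run,
# so the delays should not keep growing towards the 5-minute cap
kubectl get pods mynginx-slow -w
# Inspect the restart count directly
kubectl get pod mynginx-slow -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'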

rook-ceph-osd-prepare pod stuck for hours

I am new to Ceph and am using Rook to install Ceph in a k8s cluster. I see that the rook-ceph-osd-prepare pod stays in Running status forever, stuck on the line below:
2020-06-15 20:09:02.260379 D | exec: Running command: ceph auth get-or-create-key
client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph
--conf=/var/lib/rook/rook-ceph/rook-ceph.config
--name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring
--format json --out-file /tmp/180401029
When I logged into the container and ran the same command, I saw that it was stuck too, and after pressing ^C it showed this:
Traceback (most recent call last):
File "/usr/bin/ceph", line 1266, in <module>
retval = main()
File "/usr/bin/ceph", line 1197, in main
verbose)
File "/usr/bin/ceph", line 622, in new_style_command
ret, outbuf, outs = do_command(parsed_args, target, cmdargs, sigdict, inbuf, verbose)
File "/usr/bin/ceph", line 596, in do_command
return ret, '', ''
Below are all my pods:
rook-ceph csi-cephfsplugin-9k9z2 3/3 Running 0 9h
rook-ceph csi-cephfsplugin-mjsbk 3/3 Running 0 9h
rook-ceph csi-cephfsplugin-mrqz5 3/3 Running 0 9h
rook-ceph csi-cephfsplugin-provisioner-5ffbdf7856-59cf7 5/5 Running 0 9h
rook-ceph csi-cephfsplugin-provisioner-5ffbdf7856-m4bhr 5/5 Running 0 9h
rook-ceph csi-cephfsplugin-xgvz4 3/3 Running 0 9h
rook-ceph csi-rbdplugin-6k4dk 3/3 Running 0 9h
rook-ceph csi-rbdplugin-klrwp 3/3 Running 0 9h
rook-ceph csi-rbdplugin-provisioner-68d449986d-2z9gr 6/6 Running 0 9h
rook-ceph csi-rbdplugin-provisioner-68d449986d-mzh9d 6/6 Running 0 9h
rook-ceph csi-rbdplugin-qcmrj 3/3 Running 0 9h
rook-ceph csi-rbdplugin-zdg8z 3/3 Running 0 9h
rook-ceph rook-ceph-crashcollector-k8snode001-76ffd57d58-slg5q 1/1 Running 0 9h
rook-ceph rook-ceph-crashcollector-k8snode002-85b6d9d699-s8m8z 1/1 Running 0 9h
rook-ceph rook-ceph-crashcollector-k8snode004-847bdb4fc5-kk6bd 1/1 Running 0 9h
rook-ceph rook-ceph-mgr-a-5497fcbb7d-lq6tf 1/1 Running 0 9h
rook-ceph rook-ceph-mon-a-6966d857d9-s4wch 1/1 Running 0 9h
rook-ceph rook-ceph-mon-b-649c6845f4-z46br 1/1 Running 0 9h
rook-ceph rook-ceph-mon-c-67869b76c7-4v6zn 1/1 Running 0 9h
rook-ceph rook-ceph-operator-5968d8f7b9-hsfld 1/1 Running 0 9h
rook-ceph rook-ceph-osd-prepare-k8snode001-j25xv 1/1 Running 0 7h48m
rook-ceph rook-ceph-osd-prepare-k8snode002-6fvlx 0/1 Completed 0 9h
rook-ceph rook-ceph-osd-prepare-k8snode003-cqc4g 0/1 Completed 0 9h
rook-ceph rook-ceph-osd-prepare-k8snode004-jxxtl 0/1 Completed 0 9h
rook-ceph rook-discover-28xj4 1/1 Running 0 9h
rook-ceph rook-discover-4ss66 1/1 Running 0 9h
rook-ceph rook-discover-bt8rd 1/1 Running 0 9h
rook-ceph rook-discover-q8f4x 1/1 Running 0 9h
Please let me know if anyone has any hints on how to resolve or troubleshoot this.
In my case, the problem was that one of my Kubernetes hosts was not on the same kernel version as the others.
Once I upgraded the kernel to match all the other nodes, the issue was resolved.
In my case, one of my nodes' system clock was not synchronized with the hardware clock, so there was a time gap between nodes.
Maybe you should check the output of the timedatectl command.
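A sketch of the checks both answers suggest, using only standard commands:
# Kernel version per node is shown in the KERNEL-VERSION column
kubectl get nodes -o wide
# On each node, verify the clock is synchronized
# (look for "System clock synchronized: yes")
timedatectl status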

Failed to open topo server on vitess with etcd

I'm running a simple example with Helm. Take a look at the values.yaml file below:
cat << EOF | helm install helm/vitess -n vitess -f -
topology:
  cells:
    - name: 'zone1'
      keyspaces:
        - name: 'vitess'
          shards:
            - name: '0'
              tablets:
                - type: 'replica'
                  vttablet:
                    replicas: 1
      mysqlProtocol:
        enabled: true
        authType: secret
        username: vitess
        passwordSecret: vitess-db-password
      etcd:
        replicas: 3
      vtctld:
        replicas: 1
      vtgate:
        replicas: 3
vttablet:
  dataVolumeClaimSpec:
    storageClassName: nfs-slow
EOF
Take a look at the output of the currently running pods below:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-fb8b8dccf-8f5kt 1/1 Running 0 32m
kube-system coredns-fb8b8dccf-qbd6c 1/1 Running 0 32m
kube-system etcd-master1 1/1 Running 0 32m
kube-system kube-apiserver-master1 1/1 Running 0 31m
kube-system kube-controller-manager-master1 1/1 Running 0 32m
kube-system kube-flannel-ds-amd64-bkg9z 1/1 Running 0 32m
kube-system kube-flannel-ds-amd64-q8vh4 1/1 Running 0 32m
kube-system kube-flannel-ds-amd64-vqmnz 1/1 Running 0 32m
kube-system kube-proxy-bd8mf 1/1 Running 0 32m
kube-system kube-proxy-nlc2b 1/1 Running 0 32m
kube-system kube-proxy-x7cd5 1/1 Running 0 32m
kube-system kube-scheduler-master1 1/1 Running 0 32m
kube-system tiller-deploy-8458f6c667-cx2mv 1/1 Running 0 27m
vitess etcd-global-6pwvnv29th 0/1 Init:0/1 0 16m
vitess etcd-operator-84db9bc774-j4wml 1/1 Running 0 30m
vitess etcd-zone1-zwgvd7spzc 0/1 Init:0/1 0 16m
vitess vtctld-86cd78b6f5-zgfqg 0/1 CrashLoopBackOff 7 16m
vitess vtgate-zone1-58744956c4-x8ms2 0/1 CrashLoopBackOff 7 16m
vitess zone1-vitess-0-init-shard-master-mbbph 1/1 Running 0 16m
vitess zone1-vitess-0-replica-0 0/6 Init:CrashLoopBackOff 7 16m
Looking at the logs, I see this error:
$ kubectl logs -n vitess vtctld-86cd78b6f5-zgfqg
++ cat
+ eval exec /vt/bin/vtctld '-cell="zone1"' '-web_dir="/vt/web/vtctld"' '-web_dir2="/vt/web/vtctld2/app"' -workflow_manager_init -workflow_manager_use_election -logtostderr=true -stderrthreshold=0 -port=15000 -grpc_port=15999 '-service_map="grpc-vtctl"' '-topo_implementation="etcd2"' '-topo_global_server_address="etcd-global-client.vitess:2379"' -topo_global_root=/vitess/global
++ exec /vt/bin/vtctld -cell=zone1 -web_dir=/vt/web/vtctld -web_dir2=/vt/web/vtctld2/app -workflow_manager_init -workflow_manager_use_election -logtostderr=true -stderrthreshold=0 -port=15000 -grpc_port=15999 -service_map=grpc-vtctl -topo_implementation=etcd2 -topo_global_server_address=etcd-global-client.vitess:2379 -topo_global_root=/vitess/global
ERROR: logging before flag.Parse: E0422 02:35:34.020928 1 syslogger.go:122] can't connect to syslog
F0422 02:35:39.025400 1 server.go:221] Failed to open topo server (etcd2,etcd-global-client.vitess:2379,/vitess/global): grpc: timed out when dialing
I'm running behind Vagrant with 1 master and 2 nodes. I suspect that it is an issue with eth1.
The storage is configured to use NFS.
$ kubectl logs etcd-operator-84db9bc774-j4wml
time="2019-04-22T17:26:51Z" level=info msg="skip reconciliation: running ([]), pending ([etcd-zone1-zwgvd7spzc])" cluster-name=etcd-zone1 cluster-namespace=vitess pkg=cluster
time="2019-04-22T17:26:51Z" level=info msg="skip reconciliation: running ([]), pending ([etcd-zone1-zwgvd7spzc])" cluster-name=etcd-global cluster-namespace=vitess pkg=cluster
It appears that etcd is not fully initializing. Note that neither the pod for the global lockserver (etcd-global-6pwvnv29th) nor the local one for cell zone1 (etcd-zone1-zwgvd7spzc) is ready.
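Since both etcd pods are stuck in Init:0/1, here is a sketch of the next things to look at (the pod names come from the listing above; the init container name is whatever describe reports):
# Why is the init container not completing?
kubectl describe pod etcd-global-6pwvnv29th -n vitess
kubectl describe pod etcd-zone1-zwgvd7spzc -n vitess
# Logs of the named init container, once describe tells you its name
kubectl logs etcd-global-6pwvnv29th -n vitess -c <init-container-name>
# The etcd-operator logs already hint that no members ever report as running
kubectl logs -n vitess etcd-operator-84db9bc774-j4wml
Until the global etcd cluster comes up, vtctld and vtgate will keep crash-looping with the "Failed to open topo server" error, so fixing etcd first is the right order.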

How to debug Kubernetes the proper way?

I would like to run Istio to play around with, but I am facing issues with my local Kubernetes installation and am stuck trying to find a way to debug it.
This is my current situation:
root@node1:/tmp/istio-0.1.5# kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana 10.233.2.70 <pending> 3000:31202/TCP 1h
istio-egress 10.233.39.101 <none> 80/TCP 1h
istio-ingress 10.233.48.51 <pending> 80:30982/TCP,443:31195/TCP 1h
istio-manager 10.233.2.109 <none> 8080/TCP,8081/TCP 1h
istio-mixer 10.233.39.58 <none> 9091/TCP,9094/TCP,42422/TCP 1h
kubernetes 10.233.0.1 <none> 443/TCP 4h
prometheus 10.233.63.20 <pending> 9090:32170/TCP 1h
servicegraph 10.233.39.104 <pending> 8088:30814/TCP 1h
root@node1:/tmp/istio-0.1.5# kubectl get pods
NAME READY STATUS RESTARTS AGE
grafana-1261931457-3hx2p 0/1 Pending 0 1h
istio-ca-3887035158-6p3b0 0/1 Pending 0 1h
istio-egress-1920226302-vmlx1 0/1 Pending 0 1h
istio-ingress-2112208289-ctxj5 0/1 Pending 0 1h
istio-manager-2910860705-z28dp 0/2 Pending 0 1h
istio-mixer-2335471611-rsrhb 0/1 Pending 0 1h
prometheus-3067433533-l2m48 0/1 Pending 0 1h
servicegraph-3127588006-1k5rg 0/1 Pending 0 1h
kubectl get rs
NAME DESIRED CURRENT READY AGE
grafana-1261931457 1 1 0 1h
istio-ca-3887035158 1 1 0 1h
istio-egress-1920226302 1 1 0 1h
istio-ingress-2112208289 1 1 0 1h
istio-manager-2910860705 1 1 0 1h
istio-mixer-2335471611 1 1 0 1h
prometheus-3067433533 1 1 0 1h
servicegraph-3127588006 1 1 0 1h
kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
grafana-1261931457-3hx2p 0/1 Pending 0 1h app=grafana,pod-template-hash=1261931457
istio-ca-3887035158-6p3b0 0/1 Pending 0 1h istio=istio-ca,pod-template-hash=3887035158
istio-egress-1920226302-vmlx1 0/1 Pending 0 1h istio=egress,pod-template-hash=1920226302
istio-ingress-2112208289-ctxj5 0/1 Pending 0 1h istio=ingress,pod-template-hash=2112208289
istio-manager-2910860705-z28dp 0/2 Pending 0 1h istio=manager,pod-template-hash=2910860705
istio-mixer-2335471611-rsrhb 0/1 Pending 0 1h istio=mixer,pod-template-hash=2335471611
prometheus-3067433533-l2m48 0/1 Pending 0 1h app=prometheus,pod-template-hash=3067433533
servicegraph-3127588006-1k5rg 0/1 Pending 0 1h app=servicegraph,pod-template-hash=3127588006
root@node1:/tmp/istio-0.1.5# kubectl get nodes --show-labels
NAME STATUS AGE VERSION LABELS
node1 Ready 5h v1.6.4+coreos.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1,node-role.kubernetes.io/master=true,node-role.kubernetes.io/node=true
node2 Ready 5h v1.6.4+coreos.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2,node-role.kubernetes.io/master=true,node-role.kubernetes.io/node=true
node3 Ready 5h v1.6.4+coreos.0 app=prometeus,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node3,node-role.kubernetes.io/node=true
node4 Ready 5h v1.6.4+coreos.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node4,node-role.kubernetes.io/node=true
Unfortunately, after reading most of the documentation, I have only found a few ways to debug an installation:
journalctl -r -u kubelet
kubectl get events
kubectl describe deployment
Is there any common workflow to debug a Kubernetes installation?
It's in the documentation. Follow the Pod troubleshooting steps:
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
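Since every Istio pod above is stuck in Pending, the most useful first steps from that guide are to describe one of the pods and read the events. A sketch (the pod name is taken from the listing above):
# The Events section at the bottom usually explains why scheduling fails
kubectl describe pod istio-manager-2910860705-z28dp
# Recent cluster-wide events; FailedScheduling messages are the ones to look for
kubectl get events --sort-by=.metadata.creationTimestamp
# Check whether the nodes still have allocatable CPU/memory
kubectl describe nodes | grep -A 5 "Allocated resources"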