Why is the kubernetes dashboard pod always Pending? - kubernetes

I checked the cluster info and found that the kubernetes dashboard pod is Pending:
[root@ops001 data]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-nginx-756fb87568-rcdsm 0/1 Pending 0 81d
default my-nginx-756fb87568-vtf46 0/1 Pending 0 81d
default soa-room-service-768cfd68d-5zxgd 0/1 Pending 0 81d
kube-system coredns-89764d78c-mbcbz 0/1 Pending 0 123d
kube-system kubernetes-dashboard-74d7cc788-8fggl 0/1 Pending 0 15d
kube-system kubernetes-dashboard-74d7cc788-mk9c7 0/1 UnexpectedAdmissionError 0 123d
Is this a lack of resources? This is the detailed output:
[root@ops001 ~]# kubectl describe pod kubernetes-dashboard-74d7cc788-8fggl --namespace kube-system
Name: kubernetes-dashboard-74d7cc788-8fggl
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=74d7cc788
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
seccomp.security.alpha.kubernetes.io/pod: docker/default
Status: Pending
IP:
Controlled By: ReplicaSet/kubernetes-dashboard-74d7cc788
Containers:
kubernetes-dashboard:
Image: gcr.azk8s.cn/google_containers/kubernetes-dashboard-amd64:v1.10.1
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 50m
memory: 100Mi
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-pmxpf (ro)
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-pmxpf:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-pmxpf
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node.kubernetes.io/not-ready:NoExecute for 360s
node.kubernetes.io/unreachable:NoExecute for 360s
Events: <none>
this is the node top output:
[root@ops001 ~]# top
top - 23:45:57 up 244 days, 5:56, 7 users, load average: 3.45, 2.93, 3.77
Tasks: 245 total, 1 running, 244 sleeping, 0 stopped, 0 zombie
%Cpu(s): 38.6 us, 8.4 sy, 0.0 ni, 49.2 id, 3.4 wa, 0.0 hi, 0.4 si, 0.0 st
KiB Mem : 16266412 total, 3963688 free, 5617380 used, 6685344 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 10228760 avail Mem
The kube-scheduler service is up. I have no idea what is going wrong.

From what I can see, all of your pods are in Pending state, even coredns. That is the main reason the dashboard doesn't work.
I would focus on dealing with that first; for this I'd recommend checking Troubleshooting kubeadm.
It will tell you to install a networking addon, which can be found here.
You can also have a look at this question: Kube-dns always in pending state.
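A minimal diagnostic sketch, assuming a kubeadm-style cluster like the one in the question; the Flannel manifest URL is only an example of a pod network addon, so check the current kubeadm addon docs before applying anything:
kubectl get nodes -o wide
kubectl describe nodes | grep -iE 'taints|ready'
kubectl -n kube-system get pods -o wide
kubectl get events --all-namespaces --sort-by=.lastTimestamp | tail -20
# If no pod network addon (CNI) is running, coredns stays Pending; Flannel is one option:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml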

Related

Pod staying in Pending state

I have a kubernetes pod that is staying in Pending state. When I describe the pod, I cannot see why it fails to start; I can just see Back-off restarting failed container.
This is what I can see when I describe the pod.
kubectl describe po jenkins-68d5474964-slpkj -n infrastructure
Name: jenkins-68d5474964-slpkj
Namespace: infrastructure
Priority: 0
PriorityClassName: <none>
Node: ip-172-20-120-29.eu-west-1.compute.internal/172.20.120.29
Start Time: Fri, 05 Feb 2021 17:10:34 +0100
Labels: app=jenkins
chart=jenkins-0.35.0
component=jenkins-jenkins-master
heritage=Tiller
pod-template-hash=2481030520
release=jenkins
Annotations: checksum/config=fc546aa316b7bb9bd6a7cbeb69562ca9f224dbfe53973411f97fea27e90cd4d7
Status: Pending
IP: 100.125.247.153
Controlled By: ReplicaSet/jenkins-68d5474964
Init Containers:
copy-default-config:
Container ID: docker://a6ce91864c181d4fc851afdd4a6dc2258c23e75bbed6981fe1cafad74a764ff2
Image: jenkins/jenkins:2.248
Image ID: docker-pullable://jenkins/jenkins#sha256:352f10079331b1e63c170b6f4b5dc5e2367728f0da00b6ad34424b2b2476426a
Port: <none>
Host Port: <none>
Command:
sh
/var/jenkins_config/apply_config.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 05 Feb 2021 17:15:16 +0100
Finished: Fri, 05 Feb 2021 17:15:36 +0100
Ready: False
Restart Count: 5
Limits:
cpu: 2560m
memory: 2Gi
Requests:
cpu: 50m
memory: 256Mi
Environment:
ADMIN_PASSWORD: <set to the key 'jenkins-admin-password' in secret 'jenkins'> Optional: false
ADMIN_USER: <set to the key 'jenkins-admin-user' in secret 'jenkins'> Optional: false
Mounts:
/usr/share/jenkins/ref/secrets/ from secrets-dir (rw)
/var/jenkins_config from jenkins-config (rw)
/var/jenkins_home from jenkins-home (rw)
/var/jenkins_plugins from plugin-dir (rw)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5tbbb (rw)
Containers:
jenkins:
Container ID:
Image: jenkins/jenkins:2.248
Image ID:
Ports: 8080/TCP, 50000/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--argumentsRealm.passwd.$(ADMIN_USER)=$(ADMIN_PASSWORD)
--argumentsRealm.roles.$(ADMIN_USER)=admin
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2560m
memory: 2Gi
Requests:
cpu: 50m
memory: 256Mi
Environment:
JAVA_OPTS:
JENKINS_OPTS:
JENKINS_SLAVE_AGENT_PORT: 50000
ADMIN_PASSWORD: <set to the key 'jenkins-admin-password' in secret 'jenkins'> Optional: false
ADMIN_USER: <set to the key 'jenkins-admin-user' in secret 'jenkins'> Optional: false
Mounts:
/usr/share/jenkins/ref/plugins/ from plugin-dir (rw)
/usr/share/jenkins/ref/secrets/ from secrets-dir (rw)
/var/jenkins_config from jenkins-config (ro)
/var/jenkins_home from jenkins-home (rw)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5tbbb (rw)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
jenkins-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: jenkins
Optional: false
plugin-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
secrets-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
jenkins-home:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: jenkins
ReadOnly: false
default-token-5tbbb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5tbbb
Optional: false
docker-sock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
QoS Class: Burstable
Node-Selectors: nodePool=ci
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m default-scheduler Successfully assigned infrastructure/jenkins-68d5474964-slpkj to ip-172-20-120-29.eu-west-1.compute.internal
Normal Started 5m (x4 over 7m) kubelet, ip-172-20-120-29.eu-west-1.compute.internal Started container
Normal Pulling 4m (x5 over 7m) kubelet, ip-172-20-120-29.eu-west-1.compute.internal pulling image "jenkins/jenkins:2.248"
Normal Pulled 4m (x5 over 7m) kubelet, ip-172-20-120-29.eu-west-1.compute.internal Successfully pulled image "jenkins/jenkins:2.248"
Normal Created 4m (x5 over 7m) kubelet, ip-172-20-120-29.eu-west-1.compute.internal Created container
Warning BackOff 2m (x14 over 6m) kubelet, ip-172-20-120-29.eu-west-1.compute.internal Back-off restarting failed container
Once I run helm upgrade for that container, I can see:
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
jenkins 5 441d
jenkins-configs 1 441d
jenkins-tests 1 441d
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
jenkins 0/1 1 0 441d
==> v1/PersistentVolumeClaim
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
jenkins Bound pvc-8813319f-0d37-11ea-9864-0a7b1d347c8a 4Gi RWO aws-efs 441d
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
jenkins-7b85495f65-2w5mv 0/1 Init:0/1 3 2m9s
==> v1/Secret
NAME TYPE DATA AGE
jenkins Opaque 2 441d
jenkins-secrets Opaque 3 441d
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
jenkins LoadBalancer 100.65.2.235 a881a20a40d37... 8080:31962/TCP 441d
jenkins-agent ClusterIP 100.64.69.113 <none> 50000/TCP 441d
==> v1/ServiceAccount
NAME SECRETS AGE
jenkins 1 441d
==> v1beta1/ClusterRoleBinding
NAME AGE
jenkins-role-binding 441d
Can someone advise?
Right now you cannot get any logs with kubectl logs pod_name because the pod is still initializing.
When you use the kubectl logs command:
If the pod has multiple containers, you have to specify the container name explicitly.
If it has only one container, there is no need to specify the container name.
If you want to get the logs of initContainers, you need to specify the initContainer name.
In your case, the pod has one init container and it seems to be stuck.
Init Containers:
copy-default-config:
Command:
sh
/var/jenkins_config/apply_config.sh
You can check the log of this container.
kubectl logs jenkins-68d5474964-slpkj -c copy-default-config -n infrastructure
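A few variants of the same idea, as a sketch; the pod and container names are taken from the describe output above, and the -n flag matters because the pod lives in the infrastructure namespace:
# logs of the failing init container
kubectl logs jenkins-68d5474964-slpkj -n infrastructure -c copy-default-config
# logs of its previous (crashed) run, often more useful with CrashLoopBackOff
kubectl logs jenkins-68d5474964-slpkj -n infrastructure -c copy-default-config --previous
# logs of the main container, once the init container has finished
kubectl logs jenkins-68d5474964-slpkj -n infrastructure -c jenkins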
For me, the deployment was in this state because the installPlugins list was incorrectly set in the values passed to the Helm chart.
Hope it helps :)
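If you want to rule out the same cause, you can inspect the values that were actually applied to the release and compare the plugin list against the chart defaults. This is generic Helm usage, and the exact key for the plugin list (e.g. installPlugins) depends on the chart version, so treat the chart reference below as an assumption based on the chart=jenkins-0.35.0 label above:
helm get values jenkins
helm inspect values stable/jenkins --version 0.35.0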

Istio Prometheus pod in CrashLoopBackOff State

I am trying to set up Istio (1.5.4) for the bookinfo example provided on their website. I have used the demo configuration profile. But verifying the Istio installation fails, since the Prometheus pod has entered a CrashLoopBackOff state.
NAME READY STATUS RESTARTS AGE
grafana-5f6f8cbf75-psk78 1/1 Running 0 21m
istio-egressgateway-7f9f45c966-g7k9j 1/1 Running 0 21m
istio-ingressgateway-968d69c8b-bhxk5 1/1 Running 0 21m
istio-tracing-9dd6c4f7c-7fm79 1/1 Running 0 21m
istiod-86884c8c45-sw96x 1/1 Running 0 21m
kiali-869c6894c5-wqgjb 1/1 Running 0 21m
prometheus-589c44dbfc-xkwmj 1/2 CrashLoopBackOff 8 21m
The logs for the prometheus pod:
level=warn ts=2020-05-15T09:07:53.113Z caller=main.go:283 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.1, branch=HEAD, revision=8744510c6391d3ef46d8294a7e1f46e57407ab13)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:331 build_context="(go=go1.13.5, user=root#4b1e33c71b9d, date=20191225-01:04:15)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:332 host_details="(Linux 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 prometheus-589c44dbfc-xkwmj (none))"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:107 component=activeQueryTracker msg="Failed to create directory for logging active queries"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: no such file or directory"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x24dda5b, 0x5, 0x14, 0x2c62100, 0xc0005f63c0, 0x2c62100)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:362 +0x5229
Describe pod output:
Name: prometheus-589c44dbfc-xkwmj
Namespace: istio-system
Priority: 0
Node: inspiron-7577/192.168.0.9
Start Time: Fri, 15 May 2020 14:21:14 +0530
Labels: app=prometheus
pod-template-hash=589c44dbfc
release=istio
Annotations: sidecar.istio.io/inject: false
Status: Running
IP: 172.17.0.11
IPs:
IP: 172.17.0.11
Controlled By: ReplicaSet/prometheus-589c44dbfc
Containers:
prometheus:
Container ID: docker://b6820a000ab67a5ce31d3a38f6f0d510bd150794b2792147fc17ef8f730c03bb
Image: docker.io/prom/prometheus:v2.15.1
Image ID: docker-pullable://prom/prometheus#sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention=6h
--config.file=/etc/prometheus/prometheus.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 15 May 2020 14:37:50 +0530
Finished: Fri, 15 May 2020 14:37:53 +0530
Ready: False
Restart Count: 8
Requests:
cpu: 10m
Liveness: http-get http://:9090/-/healthy delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/istio-certs from istio-certs (rw)
/etc/prometheus from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
istio-proxy:
Container ID: docker://fa756c93510b6f402d7d88c31a5f5f066d4c254590eab70886e7835e7d3871be
Image: docker.io/istio/proxyv2:1.5.4
Image ID: docker-pullable://istio/proxyv2#sha256:e16e2801b7fd93154e8fcb5f4e2fb1240d73349d425b8be90691d48e8b9bb944
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
istio-proxy-prometheus
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system.svc:15012
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--connectTimeout
10s
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
--dnsRefreshRate
300s
--statusPort
15020
--trust-domain=cluster.local
--controlPlaneBootstrap=false
State: Running
Started: Fri, 15 May 2020 14:21:31 +0530
Ready: True
Restart Count: 0
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
OUTPUT_CERTS: /etc/istio-certs
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istio-pilot.istio-system.svc:15012
POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
ISTIO_META_POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: istio-system (v1:metadata.namespace)
ISTIO_META_MESH_ID: cluster.local
ISTIO_META_CLUSTER_ID: Kubernetes
Mounts:
/etc/istio-certs/ from istio-certs (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus
Optional: false
istio-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
prometheus-token-cgqbc:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-token-cgqbc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned istio-system/prometheus-589c44dbfc-xkwmj to inspiron-7577
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "prometheus-token-cgqbc" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulled 17m kubelet, inspiron-7577 Container image "docker.io/istio/proxyv2:1.5.4" already present on machine
Normal Created 17m kubelet, inspiron-7577 Created container istio-proxy
Normal Started 17m kubelet, inspiron-7577 Started container istio-proxy
Warning Unhealthy 17m kubelet, inspiron-7577 Readiness probe failed: HTTP probe failed with statuscode: 503
Normal Pulled 16m (x4 over 17m) kubelet, inspiron-7577 Container image "docker.io/prom/prometheus:v2.15.1" already present on machine
Normal Created 16m (x4 over 17m) kubelet, inspiron-7577 Created container prometheus
Normal Started 16m (x4 over 17m) kubelet, inspiron-7577 Started container prometheus
Warning BackOff 2m24s (x72 over 17m) kubelet, inspiron-7577 Back-off restarting failed container
It is unable to create the directory for logging active queries. Please help with any ideas.
As Istio 1.5.4 has just been released, there are some issues with Prometheus on minikube when installed with istioctl manifest apply.
I checked it on GCP and everything works fine there.
As a workaround, you can use the Istio operator, which was tested by me and the OP; as he mentioned in the comments, it's working.
Thanks a lot @jt97! It did work.
Steps to install istio operator
Install the istioctl command.
Deploy the Istio operator: istioctl operator init.
Install istio
To install the Istio demo configuration profile using the operator, run the following command:
kubectl create ns istio-system
kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo
EOF
Could you tell me why the normal installation failed?
As I mentioned in the comments, I don't know yet. If I find a reason, I will update this question.
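To confirm that the operator-based install actually brought Prometheus up, a quick check (a sketch; the namespace names assume the defaults used by istioctl operator init and the demo profile):
kubectl get pods -n istio-operator
kubectl get pods -n istio-system
kubectl logs -n istio-system deploy/prometheus -c prometheus --tail=20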

Helm init for tiller deploy is stuck at ContainerCreating status

NAME READY STATUS RESTARTS AGE
coredns-5644d7b6d9-289qz 1/1 Running 0 76m
coredns-5644d7b6d9-ssbb2 1/1 Running 0 76m
etcd-k8s-master 1/1 Running 0 75m
kube-apiserver-k8s-master 1/1 Running 0 75m
kube-controller-manager-k8s-master 1/1 Running 0 75m
kube-proxy-2q9k5 1/1 Running 0 71m
kube-proxy-dz9pk 1/1 Running 0 76m
kube-scheduler-k8s-master 1/1 Running 0 75m
tiller-deploy-7b875fbf86-8nxmk 0/1 ContainerCreating 0 17m
weave-net-nzb67 2/2 Running 0 75m
weave-net-t8kmk 2/2 Running 0 71m
Installed Kubernetes version v1.16.2, but when installing Tiller using a new service account it gets stuck at ContainerCreating. Tried all the solutions such as RBAC, removing the Tiller role and redoing it, reinstalling Kubernetes, etc.
The output of kubectl describe is as follows.
[02:32:50] root@k8s-master$ kubectl describe pods tiller-deploy-7b875fbf86-8nxmk --namespace kube-system
Name: tiller-deploy-7b875fbf86-8nxmk
Namespace: kube-system
Priority: 0
Node: worker-node1/172.17.0.1
Start Time: Thu, 24 Oct 2019 14:12:45 -0400
Labels: app=helm
name=tiller
pod-template-hash=7b875fbf86
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/tiller-deploy-7b875fbf86
Containers:
tiller:
Container ID:
Image: gcr.io/kubernetes-helm/tiller:v2.15.1
Image ID:
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from tiller-token-rr2jg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tiller-token-rr2jg:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-token-rr2jg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/tiller-deploy-7b875fbf86-8nxmk to worker-node1
Warning FailedCreatePodSandBox 61s (x5 over 17m) kubelet, worker-node1 Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal SandboxChanged 60s (x5 over 17m) kubelet, worker-node1 Pod sandbox changed, it will be killed and re-created.
[~]
FailedCreatePodSandBox means that worker-node1 (172.17.0.1) does not have CNI installed or configured, and this is a frequently asked question. Whatever mechanism you used to install kubernetes did not do a robust job, or else you missed a step along the way.
I also have pretty high confidence that the kubelet logs on worker-node1 are filled with error messages, if you actually look at them.
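A rough way to confirm that, assuming a systemd-managed kubelet on the worker; the commands are generic and not tied to any particular installer:
# on worker-node1: look for CNI-related errors and check the CNI config directory
journalctl -u kubelet --no-pager | tail -n 50
ls -l /etc/cni/net.d/
# from the master: make sure the CNI daemonset (weave-net here) has a healthy pod on worker-node1
kubectl get pods -n kube-system -o wide | grep worker-node1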

NoExecuteTaintManager falsely deleting Pod?

I am receiving NoExecuteTaintManager events that are deleting my pod, but I can't figure out why. The node is healthy and the Pod has the appropriate tolerations.
This is actually causing an infinite scale-up, because my Pod is set up so that it uses 3 of the node's 4 CPUs and has a toleration grace period > 0. This forces a new node whenever a Pod terminates, while the Cluster Autoscaler tries to keep replicas == 2.
How do I figure out which taint is causing it specifically, and why the taint manager thinks the node had that taint? Currently the pods are being killed at exactly 600 seconds (which is what I changed tolerationSeconds to for node.kubernetes.io/unreachable and node.kubernetes.io/not-ready), yet the node does not appear to be in either of those states.
NAME READY STATUS RESTARTS AGE
my-api-67df7bd54c-dthbn 1/1 Running 0 8d
my-api-67df7bd54c-mh564 1/1 Running 0 8d
my-pod-6d7b698b5f-28rgw 1/1 Terminating 0 15m
my-pod-6d7b698b5f-2wmmg 1/1 Terminating 0 13m
my-pod-6d7b698b5f-4lmmg 1/1 Running 0 4m32s
my-pod-6d7b698b5f-7m4gh 1/1 Terminating 0 71m
my-pod-6d7b698b5f-8b47r 1/1 Terminating 0 27m
my-pod-6d7b698b5f-bb58b 1/1 Running 0 2m29s
my-pod-6d7b698b5f-dn26n 1/1 Terminating 0 25m
my-pod-6d7b698b5f-jrnkg 1/1 Terminating 0 38m
my-pod-6d7b698b5f-sswps 1/1 Terminating 0 36m
my-pod-6d7b698b5f-vhqnf 1/1 Terminating 0 59m
my-pod-6d7b698b5f-wkrtg 1/1 Terminating 0 50m
my-pod-6d7b698b5f-z6p2c 1/1 Terminating 0 47m
my-pod-6d7b698b5f-zplp6 1/1 Terminating 0 62m
14:22:43.678937 8 taint_manager.go:102] NoExecuteTaintManager is deleting Pod: my-pod-6d7b698b5f-dn26n
14:22:43.679073 8 event.go:221] Event(v1.ObjectReference{Kind:"Pod", Namespace:"prod", Name:"my-pod-6d7b698b5f-dn26n", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TaintManagerEviction' Marking for deletion Pod prod/my-pod-6d7b698b5f-dn26n
# kubectl -n prod get pod my-pod-6d7b698b5f-8b47r -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
checksum/config: bcdc41c616f736849a6bef9c726eec9bf704ce7d2c61736005a6fedda0ee14d0
kubernetes.io/psp: eks.privileged
creationTimestamp: "2019-10-25T14:09:17Z"
deletionGracePeriodSeconds: 172800
deletionTimestamp: "2019-10-27T14:20:40Z"
generateName: my-pod-6d7b698b5f-
labels:
app.kubernetes.io/instance: my-pod
app.kubernetes.io/name: my-pod
pod-template-hash: 6d7b698b5f
name: my-pod-6d7b698b5f-8b47r
namespace: prod
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: my-pod-6d7b698b5f
uid: c6360643-f6a6-11e9-9459-12ff96456b32
resourceVersion: "2408256"
selfLink: /api/v1/namespaces/prod/pods/my-pod-6d7b698b5f-8b47r
uid: 08197175-f731-11e9-9459-12ff96456b32
spec:
containers:
- args:
- -c
- from time import sleep; sleep(10000)
command:
- python
envFrom:
- secretRef:
name: pix4d
- secretRef:
name: rabbitmq
image: python:3.7-buster
imagePullPolicy: Always
name: my-pod
ports:
- containerPort: 5000
name: http
protocol: TCP
resources:
requests:
cpu: "3"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-gv6q5
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: ip-10-142-54-235.ec2.internal
nodeSelector:
nodepool: zeroscaling-gpu-accelerated-p2-xlarge
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 172800
tolerations:
- key: specialized
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 600
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 600
volumes:
- name: default-token-gv6q5
secret:
defaultMode: 420
secretName: default-token-gv6q5
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-10-25T14:10:40Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2019-10-25T14:11:09Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2019-10-25T14:11:09Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2019-10-25T14:10:40Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://15e2e658c459a91a86573c1096931fa4ac345e06f26652da2a58dc3e3b3d5aa2
image: python:3.7-buster
imageID: docker-pullable://python#sha256:f0db6711abee8d406121c9e057bc0f7605336e8148006164fea2c43809fe7977
lastState: {}
name: my-pod
ready: true
restartCount: 0
state:
running:
startedAt: "2019-10-25T14:11:09Z"
hostIP: 10.142.54.235
phase: Running
podIP: 10.142.63.233
qosClass: Burstable
startTime: "2019-10-25T14:10:40Z"
# kubectl -n prod describe pod my-pod-6d7b698b5f-8b47r
Name: my-pod-6d7b698b5f-8b47r
Namespace: prod
Priority: 0
PriorityClassName: <none>
Node: ip-10-142-54-235.ec2.internal/10.142.54.235
Start Time: Fri, 25 Oct 2019 10:10:40 -0400
Labels: app.kubernetes.io/instance=my-pod
app.kubernetes.io/name=my-pod
pod-template-hash=6d7b698b5f
Annotations: checksum/config: bcdc41c616f736849a6bef9c726eec9bf704ce7d2c61736005a6fedda0ee14d0
kubernetes.io/psp: eks.privileged
Status: Terminating (lasts 47h)
Termination Grace Period: 172800s
IP: 10.142.63.233
Controlled By: ReplicaSet/my-pod-6d7b698b5f
Containers:
my-pod:
Container ID: docker://15e2e658c459a91a86573c1096931fa4ac345e06f26652da2a58dc3e3b3d5aa2
Image: python:3.7-buster
Image ID: docker-pullable://python#sha256:f0db6711abee8d406121c9e057bc0f7605336e8148006164fea2c43809fe7977
Port: 5000/TCP
Host Port: 0/TCP
Command:
python
Args:
-c
from time import sleep; sleep(10000)
State: Running
Started: Fri, 25 Oct 2019 10:11:09 -0400
Ready: True
Restart Count: 0
Requests:
cpu: 3
Environment Variables from:
pix4d Secret Optional: false
rabbitmq Secret Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gv6q5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-gv6q5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gv6q5
Optional: false
QoS Class: Burstable
Node-Selectors: nodepool=zeroscaling-gpu-accelerated-p2-xlarge
Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s
node.kubernetes.io/unreachable:NoExecute for 600s
specialized
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 12m (x2 over 12m) default-scheduler 0/13 nodes are available: 1 Insufficient pods, 13 Insufficient cpu, 6 node(s) didn't match node selector.
Normal TriggeredScaleUp 12m cluster-autoscaler pod triggered scale-up: [{prod-worker-gpu-accelerated-p2-xlarge 7->8 (max: 13)}]
Warning FailedScheduling 11m (x5 over 11m) default-scheduler 0/14 nodes are available: 1 Insufficient pods, 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 6 node(s) didn't match node selector.
Normal Scheduled 11m default-scheduler Successfully assigned prod/my-pod-6d7b698b5f-8b47r to ip-10-142-54-235.ec2.internal
Normal Pulling 11m kubelet, ip-10-142-54-235.ec2.internal pulling image "python:3.7-buster"
Normal Pulled 10m kubelet, ip-10-142-54-235.ec2.internal Successfully pulled image "python:3.7-buster"
Normal Created 10m kubelet, ip-10-142-54-235.ec2.internal Created container
Normal Started 10m kubelet, ip-10-142-54-235.ec2.internal Started container
# kubectl -n prod describe node ip-10-142-54-235.ec2.internal
Name: ip-10-142-54-235.ec2.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=p2.xlarge
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1b
kubernetes.io/hostname=ip-10-142-54-235.ec2.internal
nodepool=zeroscaling-gpu-accelerated-p2-xlarge
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 25 Oct 2019 10:10:20 -0400
Taints: specialized=true:NoExecute
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 25 Oct 2019 10:23:11 -0400 Fri, 25 Oct 2019 10:10:19 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 25 Oct 2019 10:23:11 -0400 Fri, 25 Oct 2019 10:10:19 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 25 Oct 2019 10:23:11 -0400 Fri, 25 Oct 2019 10:10:19 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 25 Oct 2019 10:23:11 -0400 Fri, 25 Oct 2019 10:10:40 -0400 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.142.54.235
ExternalIP: 3.86.112.24
Hostname: ip-10-142-54-235.ec2.internal
InternalDNS: ip-10-142-54-235.ec2.internal
ExternalDNS: ec2-3-86-112-24.compute-1.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 39
cpu: 4
ephemeral-storage: 209702892Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 62872868Ki
pods: 58
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 4
ephemeral-storage: 200777747706
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 61209892Ki
pods: 58
System Info:
Machine ID: 0e76fec3e06d41a6bf2c49a18fbe1795
System UUID: EC29973A-D616-F673-6899-A96C97D5AE2D
Boot ID: 4bc510b6-f615-48a7-9e1e-47261ddf26a4
Kernel Version: 4.14.146-119.123.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.13.11-eks-5876d6
Kube-Proxy Version: v1.13.11-eks-5876d6
ProviderID: aws:///us-east-1b/i-0f5b519aa6e38e04a
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
amazon-cloudwatch cloudwatch-agent-4d24j 50m (1%) 250m (6%) 50Mi (0%) 250Mi (0%) 12m
amazon-cloudwatch fluentd-cloudwatch-wkslq 50m (1%) 0 (0%) 150Mi (0%) 300Mi (0%) 12m
prod my-pod-6d7b698b5f-8b47r 3 (75%) 0 (0%) 0 (0%) 0 (0%) 14m
kube-system aws-node-6nr6g 10m (0%) 0 (0%) 0 (0%) 0 (0%) 13m
kube-system kube-proxy-wf8k4 100m (2%) 0 (0%) 0 (0%) 0 (0%) 13m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 3210m (80%) 250m (6%)
memory 200Mi (0%) 550Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 13m kubelet, ip-10-142-54-235.ec2.internal Starting kubelet.
Normal NodeHasSufficientMemory 13m (x2 over 13m) kubelet, ip-10-142-54-235.ec2.internal Node ip-10-142-54-235.ec2.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 13m (x2 over 13m) kubelet, ip-10-142-54-235.ec2.internal Node ip-10-142-54-235.ec2.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 13m (x2 over 13m) kubelet, ip-10-142-54-235.ec2.internal Node ip-10-142-54-235.ec2.internal status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 13m kubelet, ip-10-142-54-235.ec2.internal Updated Node Allocatable limit across pods
Normal Starting 12m kube-proxy, ip-10-142-54-235.ec2.internal Starting kube-proxy.
Normal NodeReady 12m kubelet, ip-10-142-54-235.ec2.internal Node ip-10-142-54-235.ec2.internal status is now: NodeReady
# kubectl get node ip-10-142-54-235.ec2.internal -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2019-10-25T14:10:20Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: p2.xlarge
beta.kubernetes.io/os: linux
failure-domain.beta.kubernetes.io/region: us-east-1
failure-domain.beta.kubernetes.io/zone: us-east-1b
kubernetes.io/hostname: ip-10-142-54-235.ec2.internal
nodepool: zeroscaling-gpu-accelerated-p2-xlarge
name: ip-10-142-54-235.ec2.internal
resourceVersion: "2409195"
selfLink: /api/v1/nodes/ip-10-142-54-235.ec2.internal
uid: 2d934979-f731-11e9-89b8-0234143df588
spec:
providerID: aws:///us-east-1b/i-0f5b519aa6e38e04a
taints:
- effect: NoExecute
key: specialized
value: "true"
status:
addresses:
- address: 10.142.54.235
type: InternalIP
- address: 3.86.112.24
type: ExternalIP
- address: ip-10-142-54-235.ec2.internal
type: Hostname
- address: ip-10-142-54-235.ec2.internal
type: InternalDNS
- address: ec2-3-86-112-24.compute-1.amazonaws.com
type: ExternalDNS
allocatable:
attachable-volumes-aws-ebs: "39"
cpu: "4"
ephemeral-storage: "200777747706"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 61209892Ki
pods: "58"
capacity:
attachable-volumes-aws-ebs: "39"
cpu: "4"
ephemeral-storage: 209702892Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 62872868Ki
pods: "58"
conditions:
- lastHeartbeatTime: "2019-10-25T14:23:51Z"
lastTransitionTime: "2019-10-25T14:10:19Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2019-10-25T14:23:51Z"
lastTransitionTime: "2019-10-25T14:10:19Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2019-10-25T14:23:51Z"
lastTransitionTime: "2019-10-25T14:10:19Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2019-10-25T14:23:51Z"
lastTransitionTime: "2019-10-25T14:10:40Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- python#sha256:f0db6711abee8d406121c9e057bc0f7605336e8148006164fea2c43809fe7977
- python:3.7-buster
sizeBytes: 917672801
- names:
- 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni#sha256:5b7e7435f88a86bbbdb2a5ecd61e893dc14dd13c9511dc8ace362d299259700a
- 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.5.4
sizeBytes: 290739356
- names:
- fluent/fluentd-kubernetes-daemonset#sha256:582770d951f81e0971e852089239ced0186e0bdc3226daf16b99ca4cc22de4f7
- fluent/fluentd-kubernetes-daemonset:v1.3.3-debian-cloudwatch-1.4
sizeBytes: 261867521
- names:
- amazon/cloudwatch-agent#sha256:877106acbc56e747ebe373548c88cd37274f666ca11b5c782211db4c5c7fb64b
- amazon/cloudwatch-agent:latest
sizeBytes: 131360039
- names:
- 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy#sha256:4767b441ddc424b0ea63c305b79be154f65fb15ebefe8a3b2832ce55aa6de2f0
- 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.13.8
sizeBytes: 80183964
- names:
- busybox#sha256:fe301db49df08c384001ed752dff6d52b4305a73a7f608f21528048e8a08b51e
- busybox:latest
sizeBytes: 1219782
- names:
- 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause-amd64#sha256:bea77c323c47f7b573355516acf927691182d1333333d1f41b7544012fab7adf
- 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause-amd64:3.1
sizeBytes: 742472
nodeInfo:
architecture: amd64
bootID: 4bc510b6-f615-48a7-9e1e-47261ddf26a4
containerRuntimeVersion: docker://18.6.1
kernelVersion: 4.14.146-119.123.amzn2.x86_64
kubeProxyVersion: v1.13.11-eks-5876d6
kubeletVersion: v1.13.11-eks-5876d6
machineID: 0e76fec3e06d41a6bf2c49a18fbe1795
operatingSystem: linux
osImage: Amazon Linux 2
systemUUID: EC29973A-D616-F673-6899-A96C97D5AE2D
Unfortunately, I don't have an exact answer to your issue, but I may have a workaround.
I think I had the same issue with an Amazon EKS cluster, version 1.13.11 - my pod was triggering a node scale-up, the pod was scheduled, ran for 300s and was then evicted:
74m Normal TaintManagerEviction pod/master-3bb760a7-b782-4138-b09f-0ca385db9ad7-workspace Marking for deletion Pod project-beta/master-3bb760a7-b782-4138-b09f-0ca385db9ad7-workspace
Interestingly, the same pod was able to run with no problem if it was scheduled on an existing node rather than a freshly created one.
From my investigation, it really looks like some issue with this specific Kubernetes version. Maybe some edge case of the TaintBasedEvictions feature (I think it was enabled by default in Kubernetes 1.13).
To "fix" this issue I updated the cluster version to 1.14. After that, the mysterious pod evictions did not happen anymore.
So, if it's possible for you, I suggest updating your cluster to version 1.14 (together with the cluster-autoscaler).
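If you want to catch the taint that triggers the eviction before (or instead of) upgrading, one approach is to watch the node's taints and the taint-manager events around the 600 s mark. A sketch, using the node name from the question:
# poll the node's taints; not-ready / unreachable taints are added and removed dynamically
while true; do kubectl get node ip-10-142-54-235.ec2.internal -o jsonpath='{.spec.taints}'; echo; sleep 10; done
# evictions and the node-related events that preceded them
kubectl get events -n prod --field-selector reason=TaintManagerEviction
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp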

pod hangs in Pending state

I have a kubernetes deployment in which I am trying to run 5 docker containers inside a single pod on a single node. The pod hangs in "Pending" state and is never scheduled. I don't mind running more than 1 pod, but I'd like to keep the number of nodes down. I assumed 1 node with 1 CPU and 1.7G RAM would be enough for the 5 containers, and I have attempted to split the workload across them.
Initially I came to the conclusion that I have insufficient resources. I enabled autoscaling of nodes, which produced the following (see the kubectl describe pod command):
pod didn't trigger scale-up (it wouldn't fit if a new node is added)
Anyway, each docker container has a simple command which runs a fairly simple app. Ideally I wouldn't want to have to deal with setting CPU and RAM resource allocations, but even after setting the CPU/mem limits within bounds so they don't add up to > 1, I still get this (see kubectl describe po/test-529945953-gh6cl):
No nodes are available that match all of the following predicates::
Insufficient cpu (1), Insufficient memory (1).
Below are various commands that show the state. Any help on what I'm doing wrong will be appreciated.
kubectl get all
user_s@testing-11111:~/gce$ kubectl get all
NAME READY STATUS RESTARTS AGE
po/test-529945953-gh6cl 0/5 Pending 0 34m
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kubernetes 10.7.240.1 <none> 443/TCP 19d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/test 1 1 1 0 34m
NAME DESIRED CURRENT READY AGE
rs/test-529945953 1 1 0 34m
user_s@testing-11111:~/gce$
kubectl describe po/test-529945953-gh6cl
user_s@testing-11111:~/gce$ kubectl describe po/test-529945953-gh6cl
Name: test-529945953-gh6cl
Namespace: default
Node: <none>
Labels: app=test
pod-template-hash=529945953
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"test-529945953","uid":"c6e889cb-a2a0-11e7-ac18-42010a9a001a"...
Status: Pending
IP:
Created By: ReplicaSet/test-529945953
Controlled By: ReplicaSet/test-529945953
Containers:
container-test2-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
test2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
container-kraken-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
arg2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
container-gdax-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
arg2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
container-bittrex-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
arg2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
cloudsql-proxy:
Image: gcr.io/cloudsql-docker/gce-proxy:1.09
Port: <none>
Command:
/cloud_sql_proxy
--dir=/cloudsql
-instances=testing-11111:europe-west2:testology=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment: <none>
Mounts:
/cloudsql from cloudsql (rw)
/etc/ssl/certs from ssl-certs (rw)
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
cloudsql:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-b2mxc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-b2mxc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
27m 17m 44 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2).
26m 8s 150 cluster-autoscaler Normal NotTriggerScaleUp pod didn't trigger scale-up (it wouldn't fit if a new node is added)
16m 2s 63 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
user_s@testing-11111:~/gce$
kubectl get nodes
user_s@testing-11111:~/gce$ kubectl get nodes
NAME STATUS AGE VERSION
gke-test-default-pool-abdf83f7-p4zw Ready 9h v1.6.7
kubectl get pods
user_s@testing-11111:~/gce$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-529945953-gh6cl 0/5 Pending 0 38m
kubectl describe nodes
user_s@testing-11111:~/gce$ kubectl describe nodes
Name: gke-test-default-pool-abdf83f7-p4zw
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/fluentd-ds-ready=true
beta.kubernetes.io/instance-type=g1-small
beta.kubernetes.io/os=linux
cloud.google.com/gke-nodepool=default-pool
failure-domain.beta.kubernetes.io/region=europe-west2
failure-domain.beta.kubernetes.io/zone=europe-west2-c
kubernetes.io/hostname=gke-test-default-pool-abdf83f7-p4zw
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Tue, 26 Sep 2017 02:05:45 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 26 Sep 2017 02:06:05 +0100 Tue, 26 Sep 2017 02:06:05 +0100 RouteCreated RouteController created a route
OutOfDisk False Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:06:05 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
KernelDeadlock False Tue, 26 Sep 2017 11:33:12 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KernelHasNoDeadlock kernel has no deadlock
Addresses:
InternalIP: 10.154.0.2
ExternalIP: 35.197.217.1
Hostname: gke-test-default-pool-abdf83f7-p4zw
Capacity:
cpu: 1
memory: 1742968Ki
pods: 110
Allocatable:
cpu: 1
memory: 1742968Ki
pods: 110
System Info:
Machine ID: e6119abf844c564193495c64fd9bd341
System UUID: E6119ABF-844C-5641-9349-5C64FD9BD341
Boot ID: 1c2f2ea0-1f5b-4c90-9e14-d1d9d7b75221
Kernel Version: 4.4.52+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.11.2
Kubelet Version: v1.6.7
Kube-Proxy Version: v1.6.7
PodCIDR: 10.4.1.0/24
ExternalID: 6073438913956157854
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system fluentd-gcp-v2.0-k565g 100m (10%) 0 (0%) 200Mi (11%) 300Mi (17%)
kube-system heapster-v1.3.0-3440173064-1ztvw 138m (13%) 138m (13%) 301456Ki (17%) 301456Ki (17%)
kube-system kube-dns-1829567597-gdz52 260m (26%) 0 (0%) 110Mi (6%) 170Mi (9%)
kube-system kube-dns-autoscaler-2501648610-7q9dd 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-gke-test-default-pool-abdf83f7-p4zw 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-490794276-25hmn 100m (10%) 100m (10%) 50Mi (2%) 50Mi (2%)
kube-system l7-default-backend-3574702981-flqck 10m (1%) 10m (1%) 20Mi (1%) 20Mi (1%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
728m (72%) 248m (24%) 700816Ki (40%) 854416Ki (49%)
Events: <none>
As you can see in the output of your kubectl describe nodes command under Allocated resources:, Pods running in the kube-system namespace already request 728m (72%) CPU and 700816Ki (40%) memory on the node. The sum of the resource requests of your test Pod exceeds both the remaining CPU and the remaining memory available on the node, as you can see under Events in your kubectl describe po/[…] output.
If you want to keep all containers in a single pod, you need to reduce the resource requests of your containers or run them on a node with more CPU and memory. The better solution would be to split your application into multiple pods; this enables distribution over multiple nodes.
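Rough arithmetic from the outputs above, to make the numbers concrete: the node's allocatable CPU is 1000m and kube-system pods already request 728m, leaving about 272m; allocatable memory is 1742968Ki (about 1700Mi) with 700816Ki (about 684Mi) requested, leaving about 1018Mi. The test pod asks for 5 x 100m = 500m CPU and 5 x 375Mi = 1875Mi memory, so both totals exceed what is left, which matches the Insufficient cpu / Insufficient memory events. After lowering the requests (or moving containers into separate pods), you can re-check the headroom with something like:
kubectl describe node gke-test-default-pool-abdf83f7-p4zw | grep -A 6 'Allocated resources'
kubectl describe pod test-529945953-gh6cl | grep -A 3 'Requests'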