I have just terminated an AWS K8s node.
Kubernetes recreated a new one and scheduled new pods on it. Everything seems good so far.
But when I do:
kubectl get po -A
I get:
kube-system cluster-autoscaler-648b4df947-42hxv 0/1 Evicted 0 3m53s
kube-system cluster-autoscaler-648b4df947-45pcc 0/1 Evicted 0 47m
kube-system cluster-autoscaler-648b4df947-46w6h 0/1 Evicted 0 91m
kube-system cluster-autoscaler-648b4df947-4tlbl 0/1 Evicted 0 69m
kube-system cluster-autoscaler-648b4df947-52295 0/1 Evicted 0 3m54s
kube-system cluster-autoscaler-648b4df947-55wzb 0/1 Evicted 0 83m
kube-system cluster-autoscaler-648b4df947-57kv5 0/1 Evicted 0 107m
kube-system cluster-autoscaler-648b4df947-69rsl 0/1 Evicted 0 98m
kube-system cluster-autoscaler-648b4df947-6msx2 0/1 Evicted 0 11m
kube-system cluster-autoscaler-648b4df947-6pphs 0 18m
kube-system dns-controller-697f6d9457-zswm8 0/1 Evicted 0 54m
When I do:
kubectl describe pod -n kube-system dns-controller-697f6d9457-zswm8
I get:
➜ monitoring git:(master) ✗ kubectl describe pod -n kube-system dns-controller-697f6d9457-zswm8
Name: dns-controller-697f6d9457-zswm8
Namespace: kube-system
Priority: 0
Node: ip-172-20-57-13.eu-west-3.compute.internal/
Start Time: Mon, 07 Oct 2019 12:35:06 +0200
Labels: k8s-addon=dns-controller.addons.k8s.io
k8s-app=dns-controller
pod-template-hash=697f6d9457
version=v1.12.0
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Failed
Reason: Evicted
Message: The node was low on resource: ephemeral-storage. Container dns-controller was using 48Ki, which exceeds its request of 0.
IP:
IPs: <none>
Controlled By: ReplicaSet/dns-controller-697f6d9457
Containers:
dns-controller:
Image: kope/dns-controller:1.12.0
Port: <none>
Host Port: <none>
Command:
/usr/bin/dns-controller
--watch-ingress=false
--dns=aws-route53
--zone=*/ZDOYTALGJJXCM
--zone=*/*
-v=2
Requests:
cpu: 50m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from dns-controller-token-gvxxd (ro)
Volumes:
dns-controller-token-gvxxd:
Type: Secret (a volume populated by a Secret)
SecretName: dns-controller-token-gvxxd
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Evicted 59m kubelet, ip-172-20-57-13.eu-west-3.compute.internal The node was low on resource: ephemeral-storage. Container dns-controller was using 48Ki, which exceeds its request of 0.
Normal Killing 59m kubelet, ip-172-20-57-13.eu-west-3.compute.internal Killing container with id docker://dns-controller:Need to kill Pod
And:
➜ monitoring git:(master) ✗ kubectl describe pod -n kube-system cluster-autoscaler-648b4df947-2zcrz
Name: cluster-autoscaler-648b4df947-2zcrz
Namespace: kube-system
Priority: 0
Node: ip-172-20-57-13.eu-west-3.compute.internal/
Start Time: Mon, 07 Oct 2019 13:26:26 +0200
Labels: app=cluster-autoscaler
k8s-addon=cluster-autoscaler.addons.k8s.io
pod-template-hash=648b4df947
Annotations: prometheus.io/port: 8085
prometheus.io/scrape: true
scheduler.alpha.kubernetes.io/tolerations: [{"key":"dedicated", "value":"master"}]
Status: Failed
Reason: Evicted
Message: Pod The node was low on resource: [DiskPressure].
IP:
IPs: <none>
Controlled By: ReplicaSet/cluster-autoscaler-648b4df947
Containers:
cluster-autoscaler:
Image: gcr.io/google-containers/cluster-autoscaler:v1.15.1
Port: <none>
Host Port: <none>
Command:
./cluster-autoscaler
--v=4
--stderrthreshold=info
--cloud-provider=aws
--skip-nodes-with-local-storage=false
--nodes=0:1:pamela-nodes.k8s-prod.sunchain.fr
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 300Mi
Liveness: http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
AWS_REGION: eu-west-3
Mounts:
/etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
/var/run/secrets/kubernetes.io/serviceaccount from cluster-autoscaler-token-hld2m (ro)
Volumes:
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs/ca-certificates.crt
HostPathType:
cluster-autoscaler-token-hld2m:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-autoscaler-token-hld2m
Optional: false
QoS Class: Guaranteed
Node-Selectors: kubernetes.io/role=master
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned kube-system/cluster-autoscaler-648b4df947-2zcrz to ip-172-20-57-13.eu-west-3.compute.internal
Warning Evicted 11m kubelet, ip-172-20-57-13.eu-west-3.compute.internal The node was low on resource: [DiskPressure].
It seems to be a resource issue. The weird thing is that before I killed my EC2 instance, I didn't have this issue.
Why is it happening, and what should I do? Is it mandatory to add more resources?
➜ scripts kubectl describe node ip-172-20-57-13.eu-west-3.compute.internal
Name: ip-172-20-57-13.eu-west-3.compute.internal
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-3
failure-domain.beta.kubernetes.io/zone=eu-west-3a
kops.k8s.io/instancegroup=master-eu-west-3a
kubernetes.io/hostname=ip-172-20-57-13.eu-west-3.compute.internal
kubernetes.io/role=master
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 28 Aug 2019 09:38:09 +0200
Taints: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 28 Aug 2019 09:38:36 +0200 Wed, 28 Aug 2019 09:38:36 +0200 RouteCreated RouteController created a route
OutOfDisk False Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:09 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:09 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Mon, 07 Oct 2019 14:14:32 +0200 Mon, 07 Oct 2019 14:11:02 +0200 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:09 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:35 +0200 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 172.20.57.13
ExternalIP: 35.180.187.101
InternalDNS: ip-172-20-57-13.eu-west-3.compute.internal
Hostname: ip-172-20-57-13.eu-west-3.compute.internal
ExternalDNS: ec2-35-180-187-101.eu-west-3.compute.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 7797156Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2013540Ki
pods: 110
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 7185858958
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1911140Ki
pods: 110
System Info:
Machine ID: ec2b3aa5df0e3ad288d210f309565f06
System UUID: EC2B3AA5-DF0E-3AD2-88D2-10F309565F06
Boot ID: f9d5417b-eba9-4544-9710-a25d01247b46
Kernel Version: 4.9.0-9-amd64
OS Image: Debian GNU/Linux 9 (stretch)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.3
Kubelet Version: v1.12.10
Kube-Proxy Version: v1.12.10
PodCIDR: 100.96.1.0/24
ProviderID: aws:///eu-west-3a/i-03bf1b26313679d65
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system etcd-manager-events-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 100Mi (5%) 0 (0%) 40d
kube-system etcd-manager-main-ip-172-20-57-13.eu-west-3.compute.internal 200m (10%) 0 (0%) 100Mi (5%) 0 (0%) 40d
kube-system kube-apiserver-ip-172-20-57-13.eu-west-3.compute.internal 150m (7%) 0 (0%) 0 (0%) 0 (0%) 40d
kube-system kube-controller-manager-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 40d
kube-system kube-proxy-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 40d
kube-system kube-scheduler-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 40d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (37%) 0 (0%)
memory 200Mi (10%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasNoDiskPressure 55m (x324 over 40d) kubelet, ip-172-20-57-13.eu-west-3.compute.internal Node ip-172-20-57-13.eu-west-3.compute.internal status is now: NodeHasNoDiskPressure
Warning EvictionThresholdMet 10m (x1809 over 16d) kubelet, ip-172-20-57-13.eu-west-3.compute.internal Attempting to reclaim ephemeral-storage
Warning ImageGCFailed 4m30s (x6003 over 23d) kubelet, ip-172-20-57-13.eu-west-3.compute.internal (combined from similar events): wanted to free 652348620 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete dd37681076e1 (cannot be forced) - image is being used by running container b1800146af29
I think a better command to debug it is:
devops git:(master) ✗ kubectl get events --sort-by=.metadata.creationTimestamp -o wide
LAST SEEN TYPE REASON KIND SOURCE MESSAGE SUBOBJECT FIRST SEEN COUNT NAME
10m Warning ImageGCFailed Node kubelet, ip-172-20-57-13.eu-west-3.compute.internal (combined from similar events): wanted to free 653307084 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete dd37681076e1 (cannot be forced) - image is being used by running container b1800146af29 23d 6004 ip-172-20-57-13.eu-west-3.compute.internal.15c4124e15eb1d33
2m59s Warning ImageGCFailed Node kubelet, ip-172-20-36-135.eu-west-3.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 639524044 bytes, but freed 0 bytes 7d9h 2089 ip-172-20-36-135.eu-west-3.compute.internal.15c916d24afe2c25
4m59s Warning ImageGCFailed Node kubelet, ip-172-20-33-81.eu-west-3.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 458296524 bytes, but freed 0 bytes 4d14h 1183 ip-172-20-33-81.eu-west-3.compute.internal.15c9f3fe4e1525ec
6m43s Warning EvictionThresholdMet Node kubelet, ip-172-20-57-13.eu-west-3.compute.internal Attempting to reclaim ephemeral-storage 16d 1841 ip-172-20-57-13.eu-west-3.compute.internal.15c66e349b761219
41s Normal NodeHasNoDiskPressure Node kubelet, ip-172-20-57-13.eu-west-3.compute.internal Node ip-172-20-57-13.eu-west-3.compute.internal status is now: NodeHasNoDiskPressure 40d 333 ip-172-20-57-13.eu-west-3.compute.internal.15bf05cec37981b6
Now df -h
admin@ip-172-20-57-13:/var/log$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 972M 0 972M 0% /dev
tmpfs 197M 2.3M 195M 2% /run
/dev/nvme0n1p2 7.5G 6.4G 707M 91% /
tmpfs 984M 0 984M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 984M 0 984M 0% /sys/fs/cgroup
/dev/nvme1n1 20G 430M 20G 3% /mnt/master-vol-09618123eb79d92c8
/dev/nvme2n1 20G 229M 20G 2% /mnt/master-vol-05c9684f0edcbd876
It looks like your master node is running low on storage: df shows the root filesystem at 91% used with only about 700 MB free, the DiskPressure condition on the node is True, and it carries the node.kubernetes.io/disk-pressure:NoSchedule taint.
Free up some space on that root volume; once usage drops back below the kubelet's eviction threshold, DiskPressure clears and the evictions stop. The ImageGCFailed events show the kubelet is already trying to reclaim space by deleting images but cannot, because the image it wants to remove is still used by a running container, so it needs some manual help.
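A quick way to confirm and reclaim space, as a minimal sketch (assuming the Docker-based kops master shown above, and a kubectl recent enough to support --field-selector on delete; the prune removes any image not used by a running container, so expect some re-pulls):
# on the master node: see what is using the root filesystem
sudo du -sh /var/lib/docker /var/log
# remove stopped containers, unused images and build cache
sudo docker system prune -a
# from your workstation: clean up the accumulated Evicted pod records (they are Failed pods)
kubectl -n kube-system delete pod --field-selector=status.phase=Failed
Longer term, giving the containers an ephemeral-storage request (and limit) under resources in their pod specs lets the scheduler and kubelet account for disk usage the same way they do for CPU and memory, instead of evicting whatever happens to be running when the node hits DiskPressure.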
Related
After a clean installation of a Kubernetes cluster with 3 nodes (2 masters and 3 workers, i.e. the masters are also assigned to be worker nodes), I got the roles below. The worker (node) role is missing for the masters, as shown.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready master 12d v1.18.5
node2 Ready master 12d v1.18.5
node3 Ready <none> 12d v1.18.5
inventory/mycluster/hosts.yaml
all:
hosts:
node1:
ansible_host: 10.1.10.110
ip: 10.1.10.110
access_ip: 10.1.10.110
node2:
ansible_host: 10.1.10.111
ip: 10.1.10.111
access_ip: 10.1.10.111
node3:
ansible_host: 10.1.10.112
ip: 10.1.10.112
access_ip: 10.1.10.112
children:
kube-master:
hosts:
node1:
node2:
kube-node:
hosts:
node1:
node2:
node3:
etcd:
hosts:
node1:
node2:
node3:
k8s-cluster:
children:
kube-master:
kube-node:
calico-rr:
hosts: {}
vault:
hosts:
node1:
node2:
node3:
Network Plugin : Flannel
Command used to invoke ansible:
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml
How can I make the master nodes work as worker nodes as well?
Kubectl describe node1 output:
kubectl describe node node1
Name: node1
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node1
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"a6:bb:9e:2a:7e:a8"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.1.10.110
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 01 Jul 2020 09:26:15 -0700
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: node1
AcquireTime: <unset>
RenewTime: Tue, 14 Jul 2020 06:39:58 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 10 Jul 2020 12:51:05 -0700 Fri, 10 Jul 2020 12:51:05 -0700 FlannelIsUp Flannel is running on this node
MemoryPressure False Tue, 14 Jul 2020 06:40:02 -0700 Fri, 03 Jul 2020 15:00:26 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 14 Jul 2020 06:40:02 -0700 Fri, 03 Jul 2020 15:00:26 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 14 Jul 2020 06:40:02 -0700 Fri, 03 Jul 2020 15:00:26 -0700 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 14 Jul 2020 06:40:02 -0700 Mon, 06 Jul 2020 10:45:01 -0700 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.1.10.110
Hostname: node1
Capacity:
cpu: 8
ephemeral-storage: 51175Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32599596Ki
pods: 110
Allocatable:
cpu: 7800m
ephemeral-storage: 48294789041
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 31997196Ki
pods: 110
System Info:
Machine ID: c8690497b9704d2d975c33155c9fa69e
System UUID: 00000000-0000-0000-0000-AC1F6B96768A
Boot ID: 5e3eabe0-7732-4e6d-b25d-7eeec347d6c6
Kernel Version: 3.10.0-1127.13.1.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.12
Kubelet Version: v1.18.5
Kube-Proxy Version: v1.18.5
PodCIDR: 10.233.64.0/24
PodCIDRs: 10.233.64.0/24
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default httpd-deployment-598596ddfc-n56jq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7d20h
kube-system coredns-dff8fc7d-lb6bh 100m (1%) 0 (0%) 70Mi (0%) 170Mi (0%) 3d17h
kube-system kube-apiserver-node1 250m (3%) 0 (0%) 0 (0%) 0 (0%) 12d
kube-system kube-controller-manager-node1 200m (2%) 0 (0%) 0 (0%) 0 (0%) 12d
kube-system kube-flannel-px8cj 150m (1%) 300m (3%) 64M (0%) 500M (1%) 3d17h
kube-system kube-proxy-6spl2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d17h
kube-system kube-scheduler-node1 100m (1%) 0 (0%) 0 (0%) 0 (0%) 12d
kube-system kubernetes-metrics-scraper-54fbb4d595-28vvc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7d20h
kube-system nodelocaldns-rxs4f 100m (1%) 0 (0%) 70Mi (0%) 170Mi (0%) 12d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 900m (11%) 300m (3%)
memory 205860Ki (0%) 856515840 (2%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
How can I make the master nodes work as worker nodes as well?
Remove the NoSchedule taint from the master nodes using the commands below:
kubectl taint node node1 node-role.kubernetes.io/master:NoSchedule-
kubectl taint node node2 node-role.kubernetes.io/master:NoSchedule-
After this, node1 and node2 will behave like worker nodes and pods can be scheduled on them.
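If you also want kubectl get nodes to show a worker role for the masters, note that the ROLES column is simply derived from node-role.kubernetes.io/<role> labels, so you can add one yourself; the role name worker below is purely cosmetic and has no effect on scheduling (that is governed by the taint removed above):
kubectl label node node1 node-role.kubernetes.io/worker=
kubectl label node node2 node-role.kubernetes.io/worker=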
I am a complete beginner with Kubernetes, so sorry if this is a dumb question.
I am using minikube and kvm2 (5.0.0). Here is the info about the minikube and kubectl versions.
Minikube status output
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
kubectl cluster-info output:
Kubernetes master is running at https://127.0.0.1:32768
KubeDNS is running at https://127.0.0.1:32768/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
I am trying to deploy a pod using kubectl apply -f client-pod.yaml. Here is my client-pod.yaml configuration
apiVersion: v1
kind: Pod
metadata:
name: client-pod
labels:
component: web
spec:
containers:
- name: client
image: stephengrider/multi-client
ports:
- containerPort: 3000
This is the kubectl get pods output:
NAME READY STATUS RESTARTS AGE
client-pod 0/1 Pending 0 4m15s
kubectl describe pods output:
Name: client-pod
Namespace: default
Priority: 0
Node: <none>
Labels: component=web
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"component":"web"},"name":"client-pod","namespace":"default"},"spec...
Status: Pending
IP:
IPs: <none>
Containers:
client:
Image: stephengrider/multi-client
Port: 3000/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-z45bq (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-z45bq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-z45bq
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
I have been searching for a way to see which taint is stopping the pod from being scheduled, without luck.
Is there a way to see which taint is failing?
kubectl get nodes output:
NAME STATUS ROLES AGE VERSION
m01 Ready master 11h v1.17.3
-- EDIT --
kubectl describe nodes output:
Name: home-pc
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=home-pc
kubernetes.io/os=linux
minikube.k8s.io/commit=eb13446e786c9ef70cb0a9f85a633194e62396a1
minikube.k8s.io/name=minikube
minikube.k8s.io/updated_at=2020_03_17T22_51_28_0700
minikube.k8s.io/version=v1.8.2
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 17 Mar 2020 22:51:25 -0500
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: home-pc
AcquireTime: <unset>
RenewTime: Tue, 17 Mar 2020 22:51:41 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:41 -0500 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.0.12
Hostname: home-pc
Capacity:
cpu: 12
ephemeral-storage: 227688908Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8159952Ki
pods: 110
Allocatable:
cpu: 12
ephemeral-storage: 209838097266
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8057552Ki
pods: 110
System Info:
Machine ID: 339d426453b4492da92f75d06acc1e0d
System UUID: 62eedb55-444f-61ce-75e9-b06ebf3331a0
Boot ID: a9ae9889-d7cb-48c5-ae75-b2052292ac7a
Kernel Version: 5.0.0-38-generic
OS Image: Ubuntu 19.04
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.5
Kubelet Version: v1.17.3
Kube-Proxy Version: v1.17.3
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-6955765f44-mbwqt 100m (0%) 0 (0%) 70Mi (0%) 170Mi (2%) 10s
kube-system coredns-6955765f44-sblf2 100m (0%) 0 (0%) 70Mi (0%) 170Mi (2%) 10s
kube-system etcd-home-pc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-apiserver-home-pc 250m (2%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-controller-manager-home-pc 200m (1%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-proxy-lk7xs 0 (0%) 0 (0%) 0 (0%) 0 (0%) 10s
kube-system kube-scheduler-home-pc 100m (0%) 0 (0%) 0 (0%) 0 (0%) 12s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (6%) 0 (0%)
memory 140Mi (1%) 340Mi (4%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 24s kubelet, home-pc Starting kubelet.
Normal NodeHasSufficientMemory 23s (x4 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 23s (x3 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 23s (x3 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 23s kubelet, home-pc Updated Node Allocatable limit across pods
Normal Starting 13s kubelet, home-pc Starting kubelet.
Normal NodeHasSufficientMemory 13s kubelet, home-pc Node home-pc status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 13s kubelet, home-pc Node home-pc status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 13s kubelet, home-pc Node home-pc status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 13s kubelet, home-pc Updated Node Allocatable limit across pods
Normal Starting 9s kube-proxy, home-pc Starting kube-proxy.
Normal NodeReady 3s kubelet, home-pc Node home-pc status is now: NodeReady
You have a taint on the node that is preventing the scheduler from placing the pod. Either remove the taint from the master node or add a matching toleration to the pod spec.
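To answer the "is there a way to see the taint" part: taints live in each node's spec, so either of the following shows them (a small sketch; the jsonpath prints one node per line):
kubectl describe nodes | grep -A 2 Taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
If you would rather keep the taint and let this particular pod schedule anyway, a toleration in client-pod.yaml would look roughly like this, assuming the usual master taint; match the key and effect to whatever the commands above report:
spec:
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: client
    image: stephengrider/multi-client
    ports:
    - containerPort: 3000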
I was able to remove the taint from the master, but my two worker nodes, installed on bare metal with kubeadm, keep the unreachable taint even after I issue the command to remove it. It says removed, but it is not permanent: when I check, the taint is still there. I also tried patching the node and setting the taints to null, but that did not work either. The only things I have found on SO or anywhere else deal with the master or assume these commands work.
UPDATE: I checked the timestamp of the taint and it is added back the moment it is deleted. So in what sense is the node unreachable? I can ping it. Is there any Kubernetes diagnostic I can run to find out how it is unreachable? I checked that I can ping both ways between the master and the worker nodes. So which log would show which component cannot connect?
kubectl describe no k8s-node1 | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
Tried:
kubectl patch node k8s-node1 -p '{"spec":{"Taints":[]}}'
And
kubectl taint nodes --all node.kubernetes.io/unreachable:NoSchedule-
kubectl taint nodes --all node.kubernetes.io/unreachable:NoSchedule-
node/k8s-node1 untainted
node/k8s-node2 untainted
error: taint "node.kubernetes.io/unreachable:NoSchedule" not found
The result is that it reports the two worker nodes as untainted, but then I see the taint again when I grep:
kubectl describe no k8s-node1 | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
$ k get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 10d v1.14.2
k8s-node1 NotReady <none> 10d v1.14.2
k8s-node2 NotReady <none> 10d v1.14.2
UPDATE: I found someone who had the same problem and could only fix it by resetting the cluster with kubeadm:
https://forum.linuxfoundation.org/discussion/846483/lab2-1-kubectl-untainted-not-working
I sure hope I don't have to do that every time the worker nodes get tainted.
k describe node k8s-node2
Name: k8s-node2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-node2
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"d2:xx:61:c3:xx:16"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.xx.1.xx
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 05 Jun 2019 11:46:12 +0700
Taints: node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure Unknown Fri, 14 Jun 2019 10:34:07 +0700 Fri, 14 Jun 2019 10:35:09 +0700 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Fri, 14 Jun 2019 10:34:07 +0700 Fri, 14 Jun 2019 10:35:09 +0700 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Fri, 14 Jun 2019 10:34:07 +0700 Fri, 14 Jun 2019 10:35:09 +0700 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Fri, 14 Jun 2019 10:34:07 +0700 Fri, 14 Jun 2019 10:35:09 +0700 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 10.10.10.xx
Hostname: k8s-node2
Capacity:
cpu: 2
ephemeral-storage: 26704124Ki
memory: 4096032Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 24610520638
memory: 3993632Ki
pods: 110
System Info:
Machine ID: 6e4e4e32972b3b2f27f021dadc61d21
System UUID: 6e4e4ds972b3b2f27f0cdascf61d21
Boot ID: abfa0780-3b0d-sda9-a664-df900627be14
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.3
Kubelet Version: v1.14.2
Kube-Proxy Version: v1.14.2
PodCIDR: 10.xxx.10.1/24
Non-terminated Pods: (18 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
heptio-sonobuoy sonobuoy-systemd-logs-daemon-set-6a8d92061c324451-hnnp9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d1h
istio-system istio-pilot-7955cdff46-w648c 110m (5%) 2100m (105%) 228Mi (5%) 1224Mi (31%) 6h55m
istio-system istio-telemetry-5c9cb76c56-twzf5 150m (7%) 2100m (105%) 228Mi (5%) 1124Mi (28%) 6h55m
istio-system zipkin-8594bbfc6b-9p2qc 0 (0%) 0 (0%) 1000Mi (25%) 1000Mi (25%) 6h55m
knative-eventing webhook-576479cc56-wvpt6 0 (0%) 0 (0%) 1000Mi (25%) 1000Mi (25%) 6h45m
knative-monitoring elasticsearch-logging-0 100m (5%) 1 (50%) 0 (0%) 0 (0%) 3d20h
knative-monitoring grafana-5cdc94dbd-mc4jn 100m (5%) 200m (10%) 100Mi (2%) 200Mi (5%) 3d21h
knative-monitoring kibana-logging-7cb6b64bff-dh8nx 100m (5%) 1 (50%) 0 (0%) 0 (0%) 3d20h
knative-monitoring kube-state-metrics-56f68467c9-vr5cx 223m (11%) 243m (12%) 176Mi (4%) 216Mi (5%) 3d21h
knative-monitoring node-exporter-7jw59 110m (5%) 220m (11%) 50Mi (1%) 90Mi (2%) 3d22h
knative-monitoring prometheus-system-0 0 (0%) 0 (0%) 400Mi (10%) 1000Mi (25%) 3d20h
knative-serving activator-6cfb97bccf-bfc4w 120m (6%) 2200m (110%) 188Mi (4%) 1624Mi (41%) 6h45m
knative-serving autoscaler-85749b6c48-4wf6z 130m (6%) 2300m (114%) 168Mi (4%) 1424Mi (36%) 6h45m
knative-serving controller-b49d69f4d-7j27s 100m (5%) 1 (50%) 100Mi (2%) 1000Mi (25%) 6h45m
knative-serving networking-certmanager-5b5d8f5dd8-qjh5q 100m (5%) 1 (50%) 100Mi (2%) 1000Mi (25%) 6h45m
knative-serving networking-istio-7977b9bbdd-vrpl5 100m (5%) 1 (50%) 100Mi (2%) 1000Mi (25%) 6h45m
kube-system canal-qbn67 250m (12%) 0 (0%) 0 (0%) 0 (0%) 10d
kube-system kube-proxy-phbf5 0 (0%) 0 (0%) 0 (0%) 0 (0%) 10d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1693m (84%) 14363m (718%)
memory 3838Mi (98%) 11902Mi (305%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
The problem was that swap was turned on on the worker nodes, which made the kubelet crash and exit. This was evident from the syslog file under /var/log; the taint keeps getting re-added until this is resolved. Perhaps someone can comment on the implications of allowing the kubelet to run with swap on:
kubelet[29207]: F0616 06:25:05.597536 29207 server.go:265] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename#011#011#011#011Type#011#011Size#011Used#011Priority /dev/xvda5 partition#0114191228#0110#011-1]
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Unit entered failed state.
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 16 06:25:15 k8s-node2 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jun 16 06:25:15 k8s-node2 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 16 06:25:15 k8s-node2 systemd[1]: Started kubelet: The Kubernetes Node Agent.
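For reference, the usual remedy (a sketch, assuming a systemd-managed Ubuntu node like the ones above, and that the fstab swap entry contains the word "swap") is to turn swap off now, keep it off across reboots, and restart the kubelet; journalctl is the quickest way to confirm it stays up:
sudo swapoff -a
# comment out the swap line so it does not come back after a reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab
sudo systemctl restart kubelet
sudo journalctl -u kubelet -f
The error message itself also mentions setting --fail-swap-on=false on the kubelet, but running with swap on is unsupported upstream, so disabling swap is the safer fix; once the kubelet is healthy again the node controller stops re-adding the unreachable taint.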
I'm having a problem with my GKE cluster: all the pods are stuck in ContainerCreating status. When I run kubectl get events I see this error:
Failed create pod sandbox: rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Does anyone know what the hell is happening? I can't find a solution anywhere.
EDIT
I saw this post https://github.com/kubernetes/kubernetes/issues/44273 saying that GKE instances smaller than the default GKE machine type (n1-standard-1) can have network problems. So I changed my instances to the default type, but without success. Here are my node and pod descriptions:
Name: gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/fluentd-ds-ready=true
beta.kubernetes.io/instance-type=n1-standard-1
beta.kubernetes.io/os=linux
cloud.google.com/gke-nodepool=pool-nodes-dev
failure-domain.beta.kubernetes.io/region=southamerica-east1
failure-domain.beta.kubernetes.io/zone=southamerica-east1-a
kubernetes.io/hostname=gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Thu, 27 Sep 2018 20:27:47 -0300
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
KernelDeadlock False Fri, 28 Sep 2018 09:58:58 -0300 Thu, 27 Sep 2018 20:27:16 -0300 KernelHasNoDeadlock kernel has no deadlock
FrequentUnregisterNetDevice False Fri, 28 Sep 2018 09:58:58 -0300 Thu, 27 Sep 2018 20:32:18 -0300 UnregisterNetDevice node is functioning properly
NetworkUnavailable False Thu, 27 Sep 2018 20:27:48 -0300 Thu, 27 Sep 2018 20:27:48 -0300 RouteCreated NodeController create implicit route
OutOfDisk False Fri, 28 Sep 2018 09:59:03 -0300 Thu, 27 Sep 2018 20:27:47 -0300 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 28 Sep 2018 09:59:03 -0300 Thu, 27 Sep 2018 20:27:47 -0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 28 Sep 2018 09:59:03 -0300 Thu, 27 Sep 2018 20:27:47 -0300 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 28 Sep 2018 09:59:03 -0300 Thu, 27 Sep 2018 20:27:47 -0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 28 Sep 2018 09:59:03 -0300 Thu, 27 Sep 2018 20:28:07 -0300 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.2
ExternalIP:
Hostname: gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Capacity:
cpu: 1
ephemeral-storage: 98868448Ki
hugepages-2Mi: 0
memory: 3787608Ki
pods: 110
Allocatable:
cpu: 940m
ephemeral-storage: 47093746742
hugepages-2Mi: 0
memory: 2702168Ki
pods: 110
System Info:
Machine ID: 1e8e0ecad8f5cc7fb5851bc64513d40c
System UUID: 1E8E0ECA-D8F5-CC7F-B585-1BC64513D40C
Boot ID: 971e5088-6bc1-4151-94bf-b66c6c7ee9a3
Kernel Version: 4.14.56+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.2
Kubelet Version: v1.10.7-gke.2
Kube-Proxy Version: v1.10.7-gke.2
PodCIDR: 10.0.32.0/24
ProviderID: gce://aditumpay/southamerica-east1-a/gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6
Non-terminated Pods: (11 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system event-exporter-v0.2.1-5f5b89fcc8-xsvmg 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-gcp-scaler-7c5db745fc-vttc9 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-gcp-v3.1.0-sz8r8 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system heapster-v1.5.3-75486b456f-sj7k8 138m (14%) 138m (14%) 301856Ki (11%) 301856Ki (11%)
kube-system kube-dns-788979dc8f-99xvh 260m (27%) 0 (0%) 110Mi (4%) 170Mi (6%)
kube-system kube-dns-788979dc8f-9sz2b 260m (27%) 0 (0%) 110Mi (4%) 170Mi (6%)
kube-system kube-dns-autoscaler-79b4b844b9-6s8x2 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-598d75cb96-6nhcd 50m (5%) 100m (10%) 100Mi (3%) 300Mi (11%)
kube-system l7-default-backend-5d5b9874d5-8wk6h 10m (1%) 10m (1%) 20Mi (0%) 20Mi (0%)
kube-system metrics-server-v0.2.1-7486f5bd67-fvddz 53m (5%) 148m (15%) 154Mi (5%) 404Mi (15%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 891m (94%) 396m (42%)
memory 817952Ki (30%) 1391392Ki (51%)
Events: <none>
The other node:
Name: gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/fluentd-ds-ready=true
beta.kubernetes.io/instance-type=n1-standard-1
beta.kubernetes.io/os=linux
cloud.google.com/gke-nodepool=pool-nodes-dev
failure-domain.beta.kubernetes.io/region=southamerica-east1
failure-domain.beta.kubernetes.io/zone=southamerica-east1-a
kubernetes.io/hostname=gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Thu, 27 Sep 2018 20:30:05 -0300
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
KernelDeadlock False Fri, 28 Sep 2018 10:11:03 -0300 Thu, 27 Sep 2018 20:29:34 -0300 KernelHasNoDeadlock kernel has no deadlock
FrequentUnregisterNetDevice False Fri, 28 Sep 2018 10:11:03 -0300 Thu, 27 Sep 2018 20:34:36 -0300 UnregisterNetDevice node is functioning properly
NetworkUnavailable False Thu, 27 Sep 2018 20:30:06 -0300 Thu, 27 Sep 2018 20:30:06 -0300 RouteCreated NodeController create implicit route
OutOfDisk False Fri, 28 Sep 2018 10:11:49 -0300 Thu, 27 Sep 2018 20:30:05 -0300 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 28 Sep 2018 10:11:49 -0300 Thu, 27 Sep 2018 20:30:05 -0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 28 Sep 2018 10:11:49 -0300 Thu, 27 Sep 2018 20:30:05 -0300 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 28 Sep 2018 10:11:49 -0300 Thu, 27 Sep 2018 20:30:05 -0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 28 Sep 2018 10:11:49 -0300 Thu, 27 Sep 2018 20:30:25 -0300 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.3
ExternalIP:
Hostname: gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Capacity:
cpu: 1
ephemeral-storage: 98868448Ki
hugepages-2Mi: 0
memory: 3787608Ki
pods: 110
Allocatable:
cpu: 940m
ephemeral-storage: 47093746742
hugepages-2Mi: 0
memory: 2702168Ki
pods: 110
System Info:
Machine ID: f1d5cf2a0b2c5472cf6509778a7941a7
System UUID: F1D5CF2A-0B2C-5472-CF65-09778A7941A7
Boot ID: f35bebb8-acd7-4a2f-95d6-76604638aef9
Kernel Version: 4.14.56+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.2
Kubelet Version: v1.10.7-gke.2
Kube-Proxy Version: v1.10.7-gke.2
PodCIDR: 10.0.33.0/24
ProviderID: gce://aditumpay/southamerica-east1-a/gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default aditum-payment-7d966c494c-wpk2t 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default aditum-portal-dev-5c69d76bb6-n5d5b 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default aditum-vtexapi-5c758fcfb7-rhvsn 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default admin-mongo-dev-7d9f7f7d46-rrj42 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default mongod-0 200m (21%) 0 (0%) 200Mi (7%) 0 (0%)
kube-system fluentd-gcp-v3.1.0-pgwfx 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 700m (74%) 0 (0%)
memory 200Mi (7%) 0 (0%)
Events: <none>
All the cluster's pods are stuck.
NAMESPACE NAME READY STATUS RESTARTS AGE
default aditum-payment-7d966c494c-wpk2t 0/1 ContainerCreating 0 13h
default aditum-portal-dev-5c69d76bb6-n5d5b 0/1 ContainerCreating 0 13h
default aditum-vtexapi-5c758fcfb7-rhvsn 0/1 ContainerCreating 0 13h
default admin-mongo-dev-7d9f7f7d46-rrj42 0/1 ContainerCreating 0 13h
default mongod-0 0/1 ContainerCreating 0 13h
kube-system event-exporter-v0.2.1-5f5b89fcc8-xsvmg 0/2 ContainerCreating 0 13h
kube-system fluentd-gcp-scaler-7c5db745fc-vttc9 0/1 ContainerCreating 0 13h
kube-system fluentd-gcp-v3.1.0-pgwfx 0/2 ContainerCreating 0 16h
kube-system fluentd-gcp-v3.1.0-sz8r8 0/2 ContainerCreating 0 16h
kube-system heapster-v1.5.3-75486b456f-sj7k8 0/3 ContainerCreating 0 13h
kube-system kube-dns-788979dc8f-99xvh 0/4 ContainerCreating 0 13h
kube-system kube-dns-788979dc8f-9sz2b 0/4 ContainerCreating 0 13h
kube-system kube-dns-autoscaler-79b4b844b9-6s8x2 0/1 ContainerCreating 0 13h
kube-system kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-bgb6 0/1 ContainerCreating 0 13h
kube-system kube-proxy-gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz 0/1 ContainerCreating 0 13h
kube-system kubernetes-dashboard-598d75cb96-6nhcd 0/1 ContainerCreating 0 13h
kube-system l7-default-backend-5d5b9874d5-8wk6h 0/1 ContainerCreating 0 13h
kube-system metrics-server-v0.2.1-7486f5bd67-fvddz 0/2 ContainerCreating 0 13h
A stuck pod:
Name: aditum-payment-7d966c494c-wpk2t
Namespace: default
Node: gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz/10.0.0.3
Start Time: Thu, 27 Sep 2018 20:30:47 -0300
Labels: io.kompose.service=aditum-payment
pod-template-hash=3852270507
Annotations: kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container aditum-payment
Status: Pending
IP:
Controlled By: ReplicaSet/aditum-payment-7d966c494c
Containers:
aditum-payment:
Container ID:
Image: gcr.io/aditumpay/aditumpaymentwebapi:latest
Image ID:
Port: 5000/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment:
CONNECTIONSTRING: <set to the key 'CONNECTIONSTRING' of config map 'aditum-payment-config'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qsc9k (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-qsc9k:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qsc9k
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 3m (x1737 over 13h) kubelet, gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz Failed create pod sandbox: rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Thanks!
Sorry for taking so long to respond. It was a very silly problem. After I reached Google Cloud support, I noticed that my NAT machine was not working properly. The Private Google Access route was passing through my NAT. Thanks everyone for the help.
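For anyone hitting the same symptom, one way to confirm it is node-level egress rather than the cluster itself is to SSH onto a node and try the registry directly, as a sketch (substitute your node name and zone; curl is available on Container-Optimized OS, otherwise use toolbox):
gcloud compute ssh gke-aditum-k8scluster--pool-nodes-dev-500ebc8b-m7bz --zone southamerica-east1-a
# then, on the node:
curl -v --max-time 10 https://k8s.gcr.io/v2/
If that curl times out the same way the sandbox error above does, the problem is in the VPC routing/NAT path (or missing Private Google Access on the subnet), not in Kubernetes.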
In addition to what the description of your nodes shows, it can depend on where you are launching them from.
As mentioned in kubernetes/minikube issue 2148 and kubernetes/minikube issue 3142, pulling from gcr.io won't work from China.
The workaround in that case is to find another source, pull the image from there, and re-tag it:
minikube ssh \
"docker pull registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0
docker tag registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0 gcr.io/google_containers/pause-amd64:3.0"
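To confirm the workaround took effect, you can check that the re-tagged pause image is now visible inside the minikube VM, for example:
minikube ssh "docker images | grep pause"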
I have a Kubernetes deployment in which I am trying to run 5 Docker containers inside a single pod on a single node. The pod hangs in the Pending state and is never scheduled. I do not mind running more than 1 pod, but I'd like to keep the number of nodes down. I assumed that 1 node with 1 CPU and 1.7 GB RAM would be enough for the 5 containers, and I have attempted to split the workload across them.
Initially I came to the conclusion that I have insufficient resources. I enabled node autoscaling, which produced the following (see the kubectl describe pod output):
pod didn't trigger scale-up (it wouldn't fit if a new node is added)
Anyway, each Docker container runs a fairly simple app via a simple command. Ideally I would rather not deal with setting CPU and RAM allocations at all, but even after setting the CPU/memory limits within bounds so they don't add up to more than 1, I still get this (see kubectl describe po/test-529945953-gh6cl):
No nodes are available that match all of the following predicates::
Insufficient cpu (1), Insufficient memory (1).
Below are various commands that show the state. Any help on what I'm doing wrong will be appreciated.
kubectl get all
user_s@testing-11111:~/gce$ kubectl get all
NAME READY STATUS RESTARTS AGE
po/test-529945953-gh6cl 0/5 Pending 0 34m
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kubernetes 10.7.240.1 <none> 443/TCP 19d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/test 1 1 1 0 34m
NAME DESIRED CURRENT READY AGE
rs/test-529945953 1 1 0 34m
user_s@testing-11111:~/gce$
kubectl describe po/test-529945953-gh6cl
user_s@testing-11111:~/gce$ kubectl describe po/test-529945953-gh6cl
Name: test-529945953-gh6cl
Namespace: default
Node: <none>
Labels: app=test
pod-template-hash=529945953
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"test-529945953","uid":"c6e889cb-a2a0-11e7-ac18-42010a9a001a"...
Status: Pending
IP:
Created By: ReplicaSet/test-529945953
Controlled By: ReplicaSet/test-529945953
Containers:
container-test2-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
test2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
container-kraken-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
arg2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
container-gdax-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
arg2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
container-bittrex-tickers:
Image: gcr.io/testing-11111/testology:latest
Port: <none>
Command:
process_cmd
arg1
arg2
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment:
DB_HOST: 127.0.0.1:5432
DB_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
DB_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
cloudsql-proxy:
Image: gcr.io/cloudsql-docker/gce-proxy:1.09
Port: <none>
Command:
/cloud_sql_proxy
--dir=/cloudsql
-instances=testing-11111:europe-west2:testology=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
Limits:
cpu: 150m
memory: 375Mi
Requests:
cpu: 100m
memory: 375Mi
Environment: <none>
Mounts:
/cloudsql from cloudsql (rw)
/etc/ssl/certs from ssl-certs (rw)
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
cloudsql:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-b2mxc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-b2mxc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
27m 17m 44 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2).
26m 8s 150 cluster-autoscaler Normal NotTriggerScaleUp pod didn't trigger scale-up (it wouldn't fit if a new node is added)
16m 2s 63 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
user_s@testing-11111:~/gce$
kubectl get nodes
user_s@testing-11111:~/gce$ kubectl get nodes
NAME STATUS AGE VERSION
gke-test-default-pool-abdf83f7-p4zw Ready 9h v1.6.7
kubectl get pods
user_s@testing-11111:~/gce$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-529945953-gh6cl 0/5 Pending 0 38m
kubectl describe nodes
user_s@testing-11111:~/gce$ kubectl describe nodes
Name: gke-test-default-pool-abdf83f7-p4zw
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/fluentd-ds-ready=true
beta.kubernetes.io/instance-type=g1-small
beta.kubernetes.io/os=linux
cloud.google.com/gke-nodepool=default-pool
failure-domain.beta.kubernetes.io/region=europe-west2
failure-domain.beta.kubernetes.io/zone=europe-west2-c
kubernetes.io/hostname=gke-test-default-pool-abdf83f7-p4zw
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Tue, 26 Sep 2017 02:05:45 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 26 Sep 2017 02:06:05 +0100 Tue, 26 Sep 2017 02:06:05 +0100 RouteCreated RouteController created a route
OutOfDisk False Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Tue, 26 Sep 2017 11:33:57 +0100 Tue, 26 Sep 2017 02:06:05 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
KernelDeadlock False Tue, 26 Sep 2017 11:33:12 +0100 Tue, 26 Sep 2017 02:05:45 +0100 KernelHasNoDeadlock kernel has no deadlock
Addresses:
InternalIP: 10.154.0.2
ExternalIP: 35.197.217.1
Hostname: gke-test-default-pool-abdf83f7-p4zw
Capacity:
cpu: 1
memory: 1742968Ki
pods: 110
Allocatable:
cpu: 1
memory: 1742968Ki
pods: 110
System Info:
Machine ID: e6119abf844c564193495c64fd9bd341
System UUID: E6119ABF-844C-5641-9349-5C64FD9BD341
Boot ID: 1c2f2ea0-1f5b-4c90-9e14-d1d9d7b75221
Kernel Version: 4.4.52+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.11.2
Kubelet Version: v1.6.7
Kube-Proxy Version: v1.6.7
PodCIDR: 10.4.1.0/24
ExternalID: 6073438913956157854
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system fluentd-gcp-v2.0-k565g 100m (10%) 0 (0%) 200Mi (11%) 300Mi (17%)
kube-system heapster-v1.3.0-3440173064-1ztvw 138m (13%) 138m (13%) 301456Ki (17%) 301456Ki (17%)
kube-system kube-dns-1829567597-gdz52 260m (26%) 0 (0%) 110Mi (6%) 170Mi (9%)
kube-system kube-dns-autoscaler-2501648610-7q9dd 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-gke-test-default-pool-abdf83f7-p4zw 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-490794276-25hmn 100m (10%) 100m (10%) 50Mi (2%) 50Mi (2%)
kube-system l7-default-backend-3574702981-flqck 10m (1%) 10m (1%) 20Mi (1%) 20Mi (1%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
728m (72%) 248m (24%) 700816Ki (40%) 854416Ki (49%)
Events: <none>
As you can see in the output of your kubectl describe nodes command under Allocated resources:, the pods running in the kube-system namespace already request 728m (72%) of the CPU and 700816Ki (40%) of the memory on the node. That leaves roughly 272m CPU and about 1 GiB of memory, while your test pod requests 5 × 100m = 500m CPU and 5 × 375Mi = 1875Mi of memory, so it exceeds both the remaining CPU and the remaining memory; this is exactly what the FailedScheduling events under your kubectl describe po/[…] command report. The NotTriggerScaleUp message says the same thing from the autoscaler's side: another g1-small node would not have enough allocatable memory for 1875Mi of requests either, so adding a node of the same type would not help.
If you want to keep all containers in a single pod, you need to reduce the resource requests of the containers or run the pod on a node with more CPU and memory. The better solution is to split your application into multiple pods, which lets the scheduler distribute them over multiple nodes.
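As a rough illustration only (the exact numbers just need to keep the pod's total requests below what is left on the node, roughly 270m CPU and about 1 GiB of memory here), per-container requests along these lines would let the existing single-pod layout fit, at the cost of much tighter limits:
resources:
  requests:
    cpu: 50m
    memory: 180Mi
  limits:
    cpu: 100m
    memory: 250Mi
Splitting the five containers into separate Deployments remains the more robust option, since the scheduler can then place each one wherever there is room and the cluster autoscaler can actually add capacity that helps.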