Kubernetes eviction manager evicting control plane pods to reclaim ephemeral storage

I am using Kubernetes v1.13.0. My master is also functioning as a worker-node, so it has workload pods running on it, apart from control plane pods.
The kubelet logs on my master show the following lines:
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
eviction_manager.go:358] eviction manager: pods ranked for eviction: kube-controller-manager-vm2_kube-system(1631c2c238e0c5117acac446b26d9f8c), kube-apiserver-vm2_kube-system(ce43eba098d219e13901c4a0b829f43b), etcd-vm2_kube-system(91ab2b0ddf4484a5ac6ee9661dbd0b1c)
Once the kube-apiserver pod is evicted, the cluster becomes unusable.
What can I do to fix this? Should I add more ephemeral storage? How would I go about doing that? Does that mean adding more space to the root partition on my host?
My understanding is that ephemeral storage consists of the /var/log and /var/lib/kubelet folders, which both live under the root partition.
A df -h on my host shows:
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 39G 33G 6.2G 85% /
So it looks like the root partition has plenty of space left, and there is no disk pressure. So what is causing this issue? Some of my worker pods must be doing something crazy with storage, but 6G free still seems like plenty of room.
Will adding more space to the root partition fix this issue temporarily?
kubectl describe node vm2 gives the following info:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 11 Jan 2019 21:25:43 +0000 Fri, 11 Jan 2019 20:58:07 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 11 Jan 2019 21:25:43 +0000 Thu, 06 Dec 2018 17:00:02 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Capacity:
cpu: 8
ephemeral-storage: 40593708Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32946816Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 37411161231
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32844416Ki
pods: 110
It seems to me that there was pressure on ephemeral-storage, and the eviction manager is trying to reclaim some storage by evicting the least recently used pods. But it should not evict the control plane pods; otherwise the cluster becomes unusable.
Currently, the kubelet evicts the control plane pods. Then I try to manually restart the apiserver and the other control plane pods by adding and removing a space in the /etc/kubernetes/manifests files (which forces the kubelet to re-read them). This does start the apiserver, but then it gets evicted again. Ideally, the kubelet should ensure that the static pods in /etc/kubernetes/manifests are always up and properly managed.
I am trying to understand what is going on here, and how to fix this issue, so that my kubernetes cluster becomes more robust, and I don't have to keep manually restarting the apiserver.

I had this same problem and solved it by changing the threshold for evictionHard.
Looking at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf I have:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
so I see my kubelet config file is /var/lib/kubelet/config.yaml
Opening that, I changed the evictionHard settings to the following (I think they were 10% or 15% before):
...
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%
...
There is also the --experimental-allocatable-ignore-eviction (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) setting which should completely disable eviction.
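For reference, here is a minimal sketch of what the whole /var/lib/kubelet/config.yaml can look like around that setting. The threshold values are illustrative rather than recommendations, and the kubelet has to be restarted after editing the file:
# Sketch of /var/lib/kubelet/config.yaml (KubeletConfiguration); thresholds are examples only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "5%"       # evict only when less than 5% of the node filesystem is free
  nodefs.inodesFree: "5%"
  imagefs.available: "5%"
# Apply with: systemctl daemon-reload && systemctl restart kubelet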

This is because your kubelet config sets the eviction thresholds for nodefs and imagefs too high. Set them lower and the issue will be resolved:
Modify the config in /var/lib/kubelet/config.yaml.
Find the evictionHard section and set the percentages lower, like this:
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%


Kubernetes OOMKilled with multiple containers

I run my service in a Kubernetes cluster (AWS EKS). Recently, I added a new container (a sidecar) to the pod. After that, I started observing OOMKilled restarts, but the metrics do not show any high memory usage. This is the config:
Containers:
side-car:
Container ID: ...
Image: ...
...
State: Running
Started: Mon, 21 Feb 2022 09:11:07 +0100
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Thu, 17 Feb 2022 18:36:28 +0100
Finished: Mon, 21 Feb 2022 09:11:06 +0100
Ready: True
Restart Count: 1
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 1
memory: 2Gi
...
...
my-service:
Container ID: ...
...
...
...
State: Running
Started: Thu, 17 Feb 2022 18:36:28 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 3
memory: 3Gi
Requests:
cpu: 2
memory: 3Gi
Both the sidecar and my service have memory limits (and requests) set. During the OOMKilled event, neither container used more memory than requested/limited. E.g. in one case the side-car was using 20MiB and my-service 800MiB, well below their limits. Still, Kubernetes restarted the container (side-car).
Just for the record, before adding the side-car, my-service was running without problem and no OOMKilled was observed.
Maybe your sidecar container has some performance issue that you can't catch, and at some point in time it requests more memory than its limit?
Check OOM kill due to container limit reached
State: Running
Started: Thu, 10 Oct 2019 11:14:13 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
The Exit code 137 is important because it means that the system
terminated the container as it tried to use more memory than its
limit.
In order to monitor this, you always have to look at the use of memory
compared to the limit. Percentage of the node memory used by a pod is
usually a bad indicator as it gives no indication on how close to the
limit the memory usage is. In Kubernetes, limits are applied to
containers, not pods, so monitor the memory usage of a container vs.
the limit of that container.
Most likely you don't get to see when the memory usage goes above the limit, as usually the metrics are pulled at defined intervals (cAdvisor, which is currently the de-facto source for metrics, only refreshes its data every 10-15 seconds by default).
How to troubleshoot further? Connect to the respective node that's running the sidecar container and look at the kernel logs. You can use tail /var/log/syslog -f | grep -i kernel (a sample of what this looks like is in this video). You should see 2 lines like the ones below, which indicate the aftermath of the container's cgroup limit being breached and the respective process being terminated:
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.895437] Memory cgroup out of memory: Killed process 14300 (dotnet) total-vm:172050596kB, anon-rss:148368kB, file-rss:25056kB, shmem-rss:0kB, UID:0 pgtables:568kB oom_score_adj:-997
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.906653] oom_reaper: reaped process 14300 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Pay special attention to the anon-rss and file-rss values, and compare their sum against the limit you've set for the sidecar container.
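For reference, the limit you are comparing those values against is set per container in the pod spec. Here is a minimal, hypothetical sketch of where that number lives; the container names reuse the ones from the question, while the pod name and images are placeholders:
# Hypothetical pod spec fragment: each container has its own memory limit,
# and the kernel's OOM kill decision is made against that container's limit.
apiVersion: v1
kind: Pod
metadata:
  name: my-service-pod                       # placeholder name
spec:
  containers:
  - name: my-service
    image: example.com/my-service:latest     # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 3Gi
      limits:
        cpu: "3"
        memory: 3Gi
  - name: side-car
    image: example.com/side-car:latest       # placeholder image
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "1"
        memory: 2Gi                          # compare anon-rss + file-rss against this value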
If you have control over the code that runs in the sidecar container, then you can add some instrumentation code to print out the amount of memory used at small enough intervals, and simply output that to the console. Once the container is OOMKilled, you'll still have access to the logs to see what happened (use the --previous flag with the kubectl logs command). Have a look at this answer for more info.
Including this just for completeness: your system could potentially run so low on memory that the OOM killer is invoked and your sidecar container is chosen to be terminated (such a scenario is described here). That's highly unlikely in your case though, as from my understanding the sidecar container is terminated repeatedly, which most likely points to an issue with that container only.

Why Redis in K8S restarts all the time?

Redis pod restarts like crazy.
How can I find out the reason for this behavior?
I figured out that the resource quota should be increased, but I have no clue what the best CPU/RAM ratio would be. And why are there no crash events or logs?
Here are the pods:
> kubectl get pods
redis-master-5d9cfb54f8-8pbgq 1/1 Running 33 3d16h
Here are the logs:
> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94
Here are the previous logs of the same pod.
> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...
Here is the description of the pod. As you can see, the memory limit is 250Mi, but I can't see the threshold after which the pod restarts.
> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name: redis-master-5d9cfb54f8-8pbgq
Namespace: cryptoman
Priority: 0
Node: gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time: Fri, 04 Sep 2020 18:52:17 +0300
Labels: app=redis
pod-template-hash=5d9cfb54f8
role=master
tier=backend
Annotations: <none>
Status: Running
IP: 10.36.2.124
IPs: <none>
Controlled By: ReplicaSet/redis-master-5d9cfb54f8
Containers:
master:
Container ID: docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
Image: k8s.gcr.io/redis:e2e
Image ID: docker-pullable://k8s.gcr.io/redis#sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
Port: 6379/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 09 Sep 2020 10:27:56 +0300
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Wed, 09 Sep 2020 07:34:18 +0300
Finished: Wed, 09 Sep 2020 10:27:55 +0300
Ready: True
Restart Count: 42
Limits:
cpu: 100m
memory: 250Mi
Requests:
cpu: 100m
memory: 250Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-5tds9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5tds9
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Pod sandbox changed, it will be killed and re-created.
Normal Killing 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Stopping container master
Normal Created 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Created container master
Normal Started 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Started container master
Normal Pulled 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Container image "k8s.gcr.io/redis:e2e" already present on machine
This is the limit after which it restarts: CPU is just throttled, but going over the memory limit gets the container OOM-killed.
Limits:
cpu: 100m
memory: 250Mi
Reason: OOMKilled
1. Remove the requests & limits.
2. Run the pod and make sure it doesn't restart.
3. If you already have Prometheus, run the VPA Recommender to check how many resources it needs. Or just use any monitoring stack (GKE Prometheus, prometheus-operator, DataDog, etc.) to check actual resource consumption and adjust the limits accordingly.
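If you go the VPA route, a rough sketch of a recommendation-only VerticalPodAutoscaler could look like the following. This assumes the vertical-pod-autoscaler CRDs and recommender are installed in the cluster, and that redis-master is a Deployment as in the question:
# Sketch: VPA in recommendation-only mode; assumes the VPA CRDs and recommender are installed.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: redis-master-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redis-master
  updatePolicy:
    updateMode: "Off"          # only produce recommendations, never evict or resize pods
# Read the recommendation with: kubectl describe vpa redis-master-vpa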
Max's answer is very complete. But if you don't have Prometheus installed, or don't want to install it, there is another simple way to check actual resource consumption: install the metrics-server project in your cluster. After installing it you can check CPU and memory usage with kubectl top node (consumption on the node) and kubectl top pod (consumption per pod). I use it and it is very useful.
Or you can just increase the CPU and memory limits, but then you will not know how many resources the container actually needs. Basically it will be a waste of resources.
The main problem is that you didn't limit the Redis application itself. So Redis just keeps growing its memory usage, and when it reaches the pod's limits.memory of 250Mi, it is OOM-killed and restarted.
Then, if you remove limits.memory, Redis will keep eating memory until the node does not have enough memory to run other processes, at which point Kubernetes kills the pod and marks it as "Evicted".
Therefore, limit the memory used by Redis itself in the redis.conf file, and depending on your needs set an LRU or LFU policy to remove some keys (https://redis.io/topics/lru-cache):
maxmemory 256mb
maxmemory-policy allkeys-lfu
And set the pod's memory limit to roughly double Redis's maxmemory, to give the rest of the processes and the objects saved in Redis some margin:
resources:
  limits:
    cpu: 100m
    memory: 512Mi
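One hedged way to get that redis.conf into the container is to ship it via a ConfigMap and start redis-server with it explicitly. A rough sketch follows; the ConfigMap name, mount path and command are illustrative and not taken from the question's manifests:
# Rough sketch: deliver maxmemory via a ConfigMap-mounted redis.conf.
# ConfigMap name, mount path and command are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    maxmemory 256mb
    maxmemory-policy allkeys-lfu
---
apiVersion: v1
kind: Pod
metadata:
  name: redis-master
spec:
  containers:
  - name: master
    image: k8s.gcr.io/redis:e2e
    command: ["redis-server", "/etc/redis/redis.conf"]   # start Redis with the mounted config
    resources:
      limits:
        cpu: 100m
        memory: 512Mi                                    # roughly double maxmemory, as suggested above
    volumeMounts:
    - name: config
      mountPath: /etc/redis
  volumes:
  - name: config
    configMap:
      name: redis-config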
Now the pods are getting evicted.
Can I find out the reason?
NAME READY STATUS RESTARTS AGE
redis-master-7d97765bbb-7kjwn 0/1 Evicted 0 38h
redis-master-7d97765bbb-kmc9g 1/1 Running 0 30m
redis-master-7d97765bbb-sf2ss 0/1 Evicted 0 30m

Kubernetes: specify CPUs for cpumanager

Is it possible to specify a CPU ID list to the Kubernetes cpumanager? The goal is to make sure pods get CPUs from a single socket (0). I brought all the CPUs on the peer socket offline, as mentioned here; for example:
$ echo 0 > /sys/devices/system/cpu/cpu5/online
After doing this, the Kubernetes master indeed sees the remaining online CPUs
kubectl describe node foo
Capacity:
cpu: 56 <<< socket 0 CPU count
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 197524872Ki
pods: 110
Allocatable:
cpu: 54 <<< 2 system reserved CPUs
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 71490952Ki
pods: 110
System Info:
Machine ID: 1155420082478559980231ba5bc0f6f2
System UUID: 4C4C4544-0044-4210-8031-C8C04F584B32
Boot ID: 7fa18227-748f-496c-968c-9fc82e21ecd5
Kernel Version: 4.4.13
OS Image: Ubuntu 16.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.3
Kubelet Version: v1.11.1
Kube-Proxy Version: v1.11.1
However, cpumanager still seems to think there are 112 CPUs (socket0 + socket1).
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-111"}
As a result, the kubelet system pods are throwing the following error:
kube-system kube-proxy-nk7gc 0/1 rpc error: code = Unknown desc = failed to update container "eb455f81a61b877eccda0d35eea7834e30f59615346140180f08077f64896760": Error response from daemon: Requested CPUs are not available - requested 0-111, available: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110 762 36d <IP address> foo <none>
I was able to get this working. Posting this as an answer so that someone in need might benefit.
It appears the CPU set is read from the /var/lib/kubelet/cpu_manager_state file and is not updated across kubelet restarts, so this file needs to be removed before restarting the kubelet.
The following worked for me:
# On a running worker node, bring desired CPUs offline. (run as root)
$ cpu_list=`lscpu | grep "NUMA node1 CPU(s)" | awk '{print $4}'`
$ chcpu -d $cpu_list
$ rm -f /var/lib/kubelet/cpu_manager_state
$ systemctl restart kubelet.service
# Check the CPU set seen by the CPU manager
$ cat /var/lib/kubelet/cpu_manager_state
# Try creating pods and check the syslog:
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122466 8070 state_mem.go:84] [cpumanager] updated default cpuset: "0,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122643 8070 policy_static.go:198] [cpumanager] allocateCPUs: returning "2,4,6,8,58,60,62,64"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122660 8070 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 356939cdf32d0f719e83b0029a018a2ca2c349fc0bdc1004da5d842e357c503a, cpuset: "2,4,6,8,58,60,62,64")
I have reported a bug here as I think the CPU set should be updated after kubelet restarts.
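One related note, in case it helps anyone reproducing this: with the static CPU manager policy, a container is only pinned to exclusive CPUs (and removed from the shared defaultCpuSet) when its pod has Guaranteed QoS and requests an integer number of CPUs. A hedged sketch of such a pod spec, with placeholder name and image:
# Sketch: a pod eligible for exclusive CPUs under the static CPU manager policy.
# Requests must equal limits (Guaranteed QoS) and the CPU count must be an integer.
# Pod name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-workload
spec:
  containers:
  - name: worker
    image: example.com/worker:latest
    resources:
      requests:
        cpu: "8"               # integer CPU count -> gets dedicated cores from the CPU manager
        memory: 4Gi
      limits:
        cpu: "8"
        memory: 4Gi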

Why the actual CPU utilization percentage exceeds Pod CPU limit in Kubernetes

I am running a few Kubernetes pods in my cluster (10 nodes). Each pod contains only one container, which hosts one working process. I have specified the CPU "limits" and "requests" for the container. The following is a description of one pod that is running on a node (crypt12).
Name: alexnet-worker-6-9954df99c-p7tx5
Namespace: default
Node: crypt12/172.16.28.136
Start Time: Sun, 15 Jul 2018 22:26:57 -0400
Labels: job=worker
name=alexnet
pod-template-hash=551089557
task=6
Annotations: <none>
Status: Running
IP: 10.38.0.1
Controlled By: ReplicaSet/alexnet-worker-6-9954df99c
Containers:
alexnet-v1-container:
Container ID: docker://214e30e87ed4a7240e13e764200a260a883ea4550a1b5d09d29ed827e7b57074
Image: alexnet-tf150-py3:v1
Image ID: docker://sha256:4f18b4c45a07d639643d7aa61b06bfee1235637a50df30661466688ab2fd4e6d
Port: 5000/TCP
Host Port: 0/TCP
Command:
/usr/bin/python3
cifar10_distributed.py
Args:
--data_dir=xxxx
State: Running
Started: Sun, 15 Jul 2018 22:26:59 -0400
Ready: True
Restart Count: 0
Limits:
cpu: 800m
memory: 6G
Requests:
cpu: 800m
memory: 6G
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hfnlp (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-hfnlp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hfnlp
Optional: false
QoS Class: Guaranteed
Node-Selectors: kubernetes.io/hostname=crypt12
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
The following is the output when I run "kubectl describe node crypt12":
Name: crypt12
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=crypt12
Annotations: kubeadm.alpha.kubernetes.io/cri-socket=/var/run/dockershim.sock
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Wed, 11 Jul 2018 23:07:41 -0400
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:42 -0400 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 172.16.28.136
Hostname: crypt12
Capacity:
cpu: 8
ephemeral-storage: 144937600Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8161308Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 133574491939
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8058908Ki
pods: 110
System Info:
Machine ID: f0444e00ba2ed20e5314e6bc5b0f0f60
System UUID: 37353035-3836-5355-4530-32394E44414D
Boot ID: cf2a9daf-c959-4c7e-be61-5e44a44670c4
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.11.0
Kube-Proxy Version: v1.11.0
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default alexnet-worker-6-9954df99c-p7tx5 800m (10%) 800m (10%) 6G (72%) 6G (72%)
kube-system kube-proxy-7kdkd 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-dpclj 20m (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 820m (10%) 800m (10%)
memory 6G (72%) 6G (72%)
Events: <none>
As shown in the node description ("Non-terminated Pods" section), the CPU limit is 10%. However, when I run the "ps" or "top" command on the node (crypt12), the CPU utilization of the working process exceeds 10% (it is about 20%). Why did this happen? Could anyone shed light on this?
UPDATED:
I found a GitHub issue discussion where I found the answer to my question: the CPU percentage from "kubectl describe node" is CPU-limits divided by the number of cores. Since I set the CPU limit to 0.8, 10% is the result of 0.8/8.
Here is the link: https://github.com/kubernetes/kubernetes/issues/24925
Firstly, by default, top shows percentage utilisation per core, so with 8 cores you can have 800% utilisation.
If you're reading the top statistics right, then it might have something to do with the fact that your node is running more processes than just your pod. Think kube-proxy, kubelet and any other controllers. GKE also runs a dashboard and calls the API for statistics.
Also note that CPU resources are accounted per 100ms period. A container can spike above 10 percent utilisation, but on average it can never use more than allowed within this interval.
In the official documentation it reads:
The spec.containers[].resources.limits.cpu is converted to its millicore value and multiplied by 100. The resulting value is the total amount of CPU time that a container can use every 100ms. A container cannot use more than its share of CPU time during this interval.
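Applied to the pod in this question, a hedged illustration of that rule on the relevant resource fragment:
# Illustration only: how the 800m limit from the pod above maps onto the 100ms accounting.
resources:
  limits:
    cpu: 800m     # 800 millicores -> 80ms of CPU time per 100ms period
                  # (roughly cfs_quota_us = 80000 with cfs_period_us = 100000)
# top reports utilisation per core, so a ~20% reading (0.2 cores) is still well
# below the average of 0.8 cores allowed per 100ms window.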

kubernetes node not responding after restart

I have a kubernetes cluster with one master and four nodes. kube-proxy was working fine on all four nodes, and I could access services on any of the nodes irrespective of where the pod was running; i.e. http://node1:30000 through http://node4:30000 gave the same response.
After restarting node4 by running shutdown -r now, it came back up, but I noticed that the node was no longer responding to requests. I am running the following command:
curl http://node4:30000
If I run it from my PC, or from any other node in the cluster -- node1 through node3, or master -- I get:
curl: (7) Failed to connect to node4 port 30000: Connection timed out
However, if I run it from node4, it responds successfully. This leads me to believe that kube-proxy is running fine, but something is preventing external connections.
When I run kubectl describe node node4, my output looks normal:
Name: node4
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=node4
Taints: <none>
CreationTimestamp: Tue, 21 Feb 2017 15:21:17 -0400
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:28 -0400 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses: 10.6.81.64,10.6.81.64,node4
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
System Info:
Machine ID: dbc0bb6ba10acae66b1061f958220ade
System UUID: 4229186F-AA5C-59CE-E5A2-258C1BBE9D2C
Boot ID: a3968e6c-eba3-498c-957f-f29283af1cff
Kernel Version: 4.4.0-63-generic
OS Image: Ubuntu 16.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.0
Kubelet Version: v1.5.2
Kube-Proxy Version: v1.5.2
ExternalID: node4
Non-terminated Pods: (27 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
<< application pods listed here >>
kube-system kube-proxy-0p3lj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-uqmr1 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
20m (1%) 0 (0%) 0 (0%) 0 (0%)
Is there anything specific I need to do to bring a node back online after a system restart?
My team was able to solve this one by downgrading docker to 1.12. It appears that the problem is related to this issue:
https://github.com/kubernetes/kubernetes/issues/40182
After downgrading docker to 1.12, everything is working now.