Why does Redis in K8s restart all the time?

The Redis pod restarts like crazy.
How can I find out the reason for this behavior?
I figured out that the resource quota should be increased, but I have no clue what the best CPU/RAM ratio would be. And why are there no crash events or logs?
Here are the pods:
> kubectl get pods
redis-master-5d9cfb54f8-8pbgq 1/1 Running 33 3d16h
Here are the logs:
> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94
Here are the previous logs of the same pod.
> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...
Here is the description of the pod. As you can see, the memory limit is 250Mi, but I can't see the threshold after which the pod restarts.
> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name: redis-master-5d9cfb54f8-8pbgq
Namespace: cryptoman
Priority: 0
Node: gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time: Fri, 04 Sep 2020 18:52:17 +0300
Labels: app=redis
pod-template-hash=5d9cfb54f8
role=master
tier=backend
Annotations: <none>
Status: Running
IP: 10.36.2.124
IPs: <none>
Controlled By: ReplicaSet/redis-master-5d9cfb54f8
Containers:
master:
Container ID: docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
Image: k8s.gcr.io/redis:e2e
Image ID: docker-pullable://k8s.gcr.io/redis#sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
Port: 6379/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 09 Sep 2020 10:27:56 +0300
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Wed, 09 Sep 2020 07:34:18 +0300
Finished: Wed, 09 Sep 2020 10:27:55 +0300
Ready: True
Restart Count: 42
Limits:
cpu: 100m
memory: 250Mi
Requests:
cpu: 100m
memory: 250Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-5tds9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5tds9
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Pod sandbox changed, it will be killed and re-created.
Normal Killing 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Stopping container master
Normal Created 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Created container master
Normal Started 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Started container master
Normal Pulled 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Container image "k8s.gcr.io/redis:e2e" already present on machine

This is the limit after which it restarts. CPU is only throttled, but going over the memory limit gets the container OOM-killed.
Limits:
cpu: 100m
memory: 250Mi
Reason: OOMKilled
Remove the requests & limits.
Run the pod and make sure it doesn't restart.
If you already have Prometheus, run the VPA Recommender to check how much resources the pod needs. Or just use any monitoring stack (GKE Prometheus, prometheus-operator, Datadog, etc.) to check actual resource consumption and adjust the limits accordingly.
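For reference, a minimal sketch of a recommendation-only VPA object, assuming the VPA components (including the Recommender) are installed in the cluster and that the Redis Deployment is named redis-master in the cryptoman namespace as in the output above:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: redis-master-vpa
  namespace: cryptoman
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redis-master
  updatePolicy:
    updateMode: "Off"   # recommendation only; no automatic eviction or resizing
kubectl describe vpa redis-master-vpa -n cryptoman should then show lower-bound, target and upper-bound recommendations that you can copy into the requests/limits.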

Max's answer is very complete. But if you don't have Prometheus installed, or don't want to install it, there is another simple way to check actual resource consumption: install the metrics-server project in your cluster. After installing it you can check CPU and memory usage with kubectl top node for consumption on the nodes and kubectl top pod for consumption on the pods. I use it and it is very useful.
Or you can just increase the CPU and memory limits, but then you still won't know how much the container really needs, which is basically a waste of resources.
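For example (the manifest URL below is the one the metrics-server project publishes at the time of writing; verify it against the project's README):
# Install metrics-server, then check live usage:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl top node                # per-node CPU/memory usage
kubectl top pod -n cryptoman    # per-pod usage in the namespace from the question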

The main problem is that you didn't limit the Redis application itself. So Redis just keeps growing its memory, and when it reaches the Pod's limits.memory of 250Mi it is OOM-killed and restarted.
If you remove limits.memory instead, Redis will keep eating memory until the node no longer has enough to run its other processes, at which point Kubernetes kills the pod and marks it as "Evicted".
Therefore, configure a memory limit inside the Redis application itself via the redis.conf file, and depending on your needs set an LRU or LFU policy to evict keys (https://redis.io/topics/lru-cache):
maxmemory 256mb
maxmemory-policy allkeys-lfu
Then set the Pod's memory limit to roughly double Redis' maxmemory, to leave some margin for the other processes and for Redis' own overhead:
resources:
  limits:
    cpu: 100m
    memory: 512Mi
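Putting the two snippets together, a minimal sketch of how the redis.conf settings and the Pod limits could be wired up via a ConfigMap. The object names, image tag and mount path are illustrative assumptions, not taken from the chart in the question; also note that the LFU policies need Redis 4.0 or newer, while the image in the question runs Redis 2.8, so either upgrade the image or use allkeys-lru instead:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    maxmemory 256mb
    maxmemory-policy allkeys-lfu
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: master
        image: redis:5.0          # assumed image; any Redis >= 4.0 supports allkeys-lfu
        command: ["redis-server", "/etc/redis/redis.conf"]
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
        volumeMounts:
        - name: config
          mountPath: /etc/redis   # the ConfigMap key becomes /etc/redis/redis.conf
      volumes:
      - name: config
        configMap:
          name: redis-config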

Now the pods are getting evicted.
Can I find out the reason?
NAME READY STATUS RESTARTS AGE
redis-master-7d97765bbb-7kjwn 0/1 Evicted 0 38h
redis-master-7d97765bbb-kmc9g 1/1 Running 0 30m
redis-master-7d97765bbb-sf2ss 0/1 Evicted 0 30m
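The eviction reason is recorded on the pod object itself and in the namespace events, so something along these lines should show it (add -n <namespace> if needed; for memory-pressure evictions the message is typically "The node was low on resource: memory"):
kubectl describe pod redis-master-7d97765bbb-7kjwn      # see Status, Reason and Message
kubectl get pod redis-master-7d97765bbb-7kjwn -o jsonpath='{.status.reason}: {.status.message}{"\n"}'
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i evict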

Related

Using GPU with Kubernetes GKE and Node auto-provisioning

I am trying to do something fairly simple: run a GPU machine in a K8s cluster using auto-provisioning. When deploying a Pod with a limits: nvidia.com/gpu specification, auto-provisioning correctly creates a node pool and scales up an appropriate node. However, the Pod stays Pending with the following message:
Warning FailedScheduling 59s (x5 over 2m46s) default-scheduler 0/10 nodes are available: 10 Insufficient nvidia.com/gpu.
It seems like taints and tolerations are added correctly by GKE. It just doesn't scale up.
I've followed the instructions here:
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers
To reproduce:
Create a new cluster in a zone with auto-provisioning that includes GPUs (I have replaced my own project name with MYPROJECT). This is the command the console produces for these settings:
gcloud beta container --project "MYPROJECT" clusters create "cluster-2" --zone "europe-west4-a" --no-enable-basic-auth --cluster-version "1.18.12-gke.1210" --release-channel "regular" --machine-type "e2-medium" --image-type "COS" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias --network "projects/MYPROJECT/global/networks/default" --subnetwork "projects/MYPROJECT/regions/europe-west4/subnetworks/default" --default-max-pods-per-node "110" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-autoprovisioning --min-cpu 1 --max-cpu 20 --min-memory 1 --max-memory 50 --max-accelerator type="nvidia-tesla-p100",count=1 --enable-autoprovisioning-autorepair --enable-autoprovisioning-autoupgrade --autoprovisioning-max-surge-upgrade 1 --autoprovisioning-max-unavailable-upgrade 0 --enable-vertical-pod-autoscaling --enable-shielded-nodes --node-locations "europe-west4-a"
Install the NVIDIA drivers by applying the driver-installer DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
Deploy a pod that requests a GPU:
my-gpu-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0-runtime-ubuntu18.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 1
kubectl apply -f my-gpu-pod.yaml
Help would be really appreciated as I've spent quite some time on this now :)
Edit: Here are the Pod and node specifications (the node that was auto-provisioned):
Name: my-gpu-pod
Namespace: default
Priority: 0
Node: <none>
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-gpu-container:
Image: nvidia/cuda:11.0-runtime-ubuntu18.04
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
--
Args:
while true; do sleep 600; done;
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9rvjz (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-9rvjz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9rvjz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nvidia.com/gpu:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 11m cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added):
Warning FailedScheduling 5m54s (x6 over 11m) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
Warning FailedScheduling 54s (x7 over 5m37s) default-scheduler 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
Name: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=n1-standard-1
beta.kubernetes.io/os=linux
cloud.google.com/gke-accelerator=nvidia-tesla-p100
cloud.google.com/gke-boot-disk=pd-standard
cloud.google.com/gke-nodepool=nap-n1-standard-1-gpu1-18jc7z9w
cloud.google.com/gke-os-distribution=cos
cloud.google.com/machine-family=n1
failure-domain.beta.kubernetes.io/region=europe-west4
failure-domain.beta.kubernetes.io/zone=europe-west4-a
kubernetes.io/arch=amd64
kubernetes.io/hostname=gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
kubernetes.io/os=linux
node.kubernetes.io/instance-type=n1-standard-1
topology.gke.io/zone=europe-west4-a
topology.kubernetes.io/region=europe-west4
topology.kubernetes.io/zone=europe-west4-a
Annotations: container.googleapis.com/instance_id: 7877226485154959129
csi.volume.kubernetes.io/nodeid:
{"pd.csi.storage.gke.io":"projects/exor-arctic/zones/europe-west4-a/instances/gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2"}
node.alpha.kubernetes.io/ttl: 0
node.gke.io/last-applied-node-labels:
cloud.google.com/gke-accelerator=nvidia-tesla-p100,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-nodepool=nap-n1-standar...
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 22 Mar 2021 11:32:17 +0100
Taints: nvidia.com/gpu=present:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
AcquireTime: <unset>
RenewTime: Mon, 22 Mar 2021 11:38:58 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
KernelDeadlock False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 KernelHasNoDeadlock kernel has no deadlock
ReadonlyFilesystem False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 FilesystemIsNotReadOnly Filesystem is not read-only
CorruptDockerOverlay2 False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoCorruptDockerOverlay2 docker overlay2 is functioning properly
FrequentUnregisterNetDevice False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentUnregisterNetDevice node is functioning properly
FrequentKubeletRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentKubeletRestart kubelet is functioning properly
FrequentDockerRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentDockerRestart docker is functioning properly
FrequentContainerdRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentContainerdRestart containerd is functioning properly
NetworkUnavailable False Mon, 22 Mar 2021 11:32:18 +0100 Mon, 22 Mar 2021 11:32:18 +0100 RouteCreated NodeController create implicit route
MemoryPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:19 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.164.0.16
ExternalIP: 35.204.55.105
InternalDNS: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2.c.exor-arctic.internal
Hostname: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2.c.exor-arctic.internal
Capacity:
attachable-volumes-gce-pd: 127
cpu: 1
ephemeral-storage: 98868448Ki
hugepages-2Mi: 0
memory: 3776196Ki
pods: 110
Allocatable:
attachable-volumes-gce-pd: 127
cpu: 940m
ephemeral-storage: 47093746742
hugepages-2Mi: 0
memory: 2690756Ki
pods: 110
System Info:
Machine ID: 307671eefc01914a7bfacf17a48e087e
System UUID: 307671ee-fc01-914a-7bfa-cf17a48e087e
Boot ID: acd58f3b-1659-494c-b83d-427f834d23a6
Kernel Version: 5.4.49+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.9
Kubelet Version: v1.18.12-gke.1210
Kube-Proxy Version: v1.18.12-gke.1210
PodCIDR: 10.100.1.0/24
PodCIDRs: 10.100.1.0/24
ProviderID: gce://exor-arctic/europe-west4-a/gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system fluentbit-gke-k22gv 100m (10%) 0 (0%) 200Mi (7%) 500Mi (19%) 6m46s
kube-system gke-metrics-agent-5fblx 3m (0%) 0 (0%) 50Mi (1%) 50Mi (1%) 6m47s
kube-system kube-proxy-gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 100m (10%) 0 (0%) 0 (0%) 0 (0%) 6m44s
kube-system nvidia-driver-installer-vmw8r 150m (15%) 0 (0%) 0 (0%) 0 (0%) 6m45s
kube-system nvidia-gpu-device-plugin-8vqsl 50m (5%) 50m (5%) 10Mi (0%) 10Mi (0%) 6m45s
kube-system pdcsi-node-k9brg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6m47s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 403m (42%) 50m (5%)
memory 260Mi (9%) 560Mi (21%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-gce-pd 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 6m47s kubelet Starting kubelet.
Normal NodeAllocatableEnforced 6m47s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasSufficientPID
Normal NodeReady 6m45s kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeReady
Normal Starting 6m44s kube-proxy Starting kube-proxy.
Warning NodeSysctlChange 6m41s sysctl-monitor
Warning ContainerdStart 6m41s systemd-monitor Starting containerd container runtime...
Warning DockerStart 6m41s (x2 over 6m41s) systemd-monitor Starting Docker Application Container Engine...
Warning KubeletStart 6m41s systemd-monitor Started Kubernetes kubelet.
As per the Kubernetes documentation (https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#nvidia-gpu-device-plugin-used-by-gce), we are supposed to use https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml.
So can you try running:
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml
A common error with GKE is project quotas limiting resources; this can prevent nodes from auto-provisioning or scaling up because the requested resources cannot be assigned.
Maybe your project quota for GPUs (or specifically for nvidia-tesla-p100) is set to 0 or to a number well below what is requested.
The GCP quota documentation has more information about how to check this and how to request a higher quota.
Also, I see that you're using shared-core E2 instances, which are not compatible with accelerators. It shouldn't be an issue, since GKE should automatically change the machine type to N1 if it detects that the workload contains a GPU (as described in the GKE documentation), but it may still be worth trying the cluster with other machine types such as N1.
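As a quick check of the quota point above, the regional accelerator quota can be listed from the CLI; look for the NVIDIA_P100_GPUS entry in the output:
# Prints the region's quota entries; each has a metric, limit and usage.
gcloud compute regions describe europe-west4 --project MYPROJECT | grep -B1 -A1 -i nvidia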
You might be having a problem with the scopes.
When using node auto-provisioning with GPUs, the auto-provisioned node pools by default do not have sufficient scopes to run the driver-installation DaemonSet. You need to manually change the default auto-provisioning scopes to enable that.
The documented scopes that are required at the time of writing are:
[ "https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/compute"
]
This article mentions this very issue: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#using_node_auto-provisioning_with_gpus
You might just have to expand the scopes and retry. Creating the node pool manually works because it then has the necessary scopes.
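A hedged sketch of what expanding the auto-provisioning scopes could look like from the CLI; the --autoprovisioning-scopes flag here is an assumption (depending on your gcloud version it may require the beta track or an --autoprovisioning-config-file instead, so check gcloud container clusters update --help and the linked page). The resource limits repeat the ones from the create command above:
gcloud container clusters update cluster-2 \
  --zone europe-west4-a \
  --enable-autoprovisioning \
  --min-cpu 1 --max-cpu 20 --min-memory 1 --max-memory 50 \
  --max-accelerator type=nvidia-tesla-p100,count=1 \
  --autoprovisioning-scopes=https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/compute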

Kubernetes eviction manager evicting control plane pods to reclaim ephemeral storage

I am using Kubernetes v1.13.0. My master also functions as a worker node, so it runs workload pods in addition to the control plane pods.
The kubelet logs on my master show the following lines:
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
eviction_manager.go:358] eviction manager: pods ranked for eviction: kube-controller-manager-vm2_kube-system(1631c2c238e0c5117acac446b26d9f8c), kube-apiserver-vm2_kube-system(ce43eba098d219e13901c4a0b829f43b), etcd-vm2_kube-system(91ab2b0ddf4484a5ac6ee9661dbd0b1c)
Once the kube-apiserver pod is evicted, the cluster becomes unusable.
What can I do to fix this? Should I add more ephemeral storage? How would I go about doing that? Does that mean adding more space to the root partition on my host?
My understanding is that ephemeral storage consists of the /var/log and /var/lib/kubelet folders, which both come under the root partition.
A df -h on my host shows:
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 39G 33G 6.2G 85% /
So it looks like the root partition has plenty of space left, and there is no disk pressure. So what is causing this issue? Some of my worker pods must be doing something crazy with storage, but 6G still seems like plenty of room.
Will adding more space to the root partition fix this issue temporarily?
kubectl describe node vm2 gives the following info:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 11 Jan 2019 21:25:43 +0000 Fri, 11 Jan 2019 20:58:07 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 11 Jan 2019 21:25:43 +0000 Thu, 06 Dec 2018 17:00:02 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Capacity:
cpu: 8
ephemeral-storage: 40593708Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32946816Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 37411161231
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32844416Ki
pods: 110
It seems to me that there was pressure on ephemeral storage, and the eviction manager tries to reclaim it by evicting the least recently used pods. But it should not evict the control plane pods, otherwise the cluster becomes unusable.
Currently, the kubelet evicts the control plane pods. I then try to manually restart the apiserver and the other control plane pods by adding and removing a space in the /etc/kubernetes/manifests files. This does start the apiserver, but then it gets evicted again. Ideally, the kubelet should ensure that the static pods in /etc/kubernetes/manifests are always running and properly managed.
I am trying to understand what is going on here and how to fix this issue, so that my Kubernetes cluster becomes more robust and I don't have to keep manually restarting the apiserver.
I had this same problem and solved it by changing the threshold for evictionHard.
Looking at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf I have:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
so I see my config file for kubelet is /var/lib/kubelet/config.yaml
Opening that, I changed the evictionHard settings to the following (I think they were 10% or 15% before):
...
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%
...
There is also the --experimental-allocatable-ignore-eviction kubelet flag (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/), which should completely disable eviction.
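In either case, after editing /var/lib/kubelet/config.yaml the kubelet has to be restarted for the new thresholds to take effect; on a systemd-managed kubeadm node like the one above that looks roughly like:
sudo systemctl restart kubelet
sudo journalctl -u kubelet -f    # watch for new eviction_manager messages after the restart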
This happens because your kubelet config sets the eviction thresholds for nodefs and imagefs too high; set them lower and the issue will be resolved:
Modify the config in /var/lib/kubelet/config.yaml
Find the evictionHard section and set the percentages lower, like this:
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%

Why the actual CPU utilization percentage exceeds Pod CPU limit in Kubernetes

I am running a few Kubernetes pods in my cluster (10 nodes). Each pod contains only one container, which hosts one working process. I have specified the CPU "limits" and "requests" for the container. The following is the description of one pod that is running on a node (crypt12).
Name: alexnet-worker-6-9954df99c-p7tx5
Namespace: default
Node: crypt12/172.16.28.136
Start Time: Sun, 15 Jul 2018 22:26:57 -0400
Labels: job=worker
name=alexnet
pod-template-hash=551089557
task=6
Annotations: <none>
Status: Running
IP: 10.38.0.1
Controlled By: ReplicaSet/alexnet-worker-6-9954df99c
Containers:
alexnet-v1-container:
Container ID: docker://214e30e87ed4a7240e13e764200a260a883ea4550a1b5d09d29ed827e7b57074
Image: alexnet-tf150-py3:v1
Image ID: docker://sha256:4f18b4c45a07d639643d7aa61b06bfee1235637a50df30661466688ab2fd4e6d
Port: 5000/TCP
Host Port: 0/TCP
Command:
/usr/bin/python3
cifar10_distributed.py
Args:
--data_dir=xxxx
State: Running
Started: Sun, 15 Jul 2018 22:26:59 -0400
Ready: True
Restart Count: 0
Limits:
cpu: 800m
memory: 6G
Requests:
cpu: 800m
memory: 6G
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hfnlp (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-hfnlp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hfnlp
Optional: false
QoS Class: Guaranteed
Node-Selectors: kubernetes.io/hostname=crypt12
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
The following is the output when I run "kubectl describe node crypt12":
Name: crypt12
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=crypt12
Annotations: kubeadm.alpha.kubernetes.io/cri-socket=/var/run/dockershim.sock
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Wed, 11 Jul 2018 23:07:41 -0400
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:22 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 16 Jul 2018 16:25:43 -0400 Wed, 11 Jul 2018 22:57:42 -0400 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 172.16.28.136
Hostname: crypt12
Capacity:
cpu: 8
ephemeral-storage: 144937600Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8161308Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 133574491939
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8058908Ki
pods: 110
System Info:
Machine ID: f0444e00ba2ed20e5314e6bc5b0f0f60
System UUID: 37353035-3836-5355-4530-32394E44414D
Boot ID: cf2a9daf-c959-4c7e-be61-5e44a44670c4
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.11.0
Kube-Proxy Version: v1.11.0
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default alexnet-worker-6-9954df99c-p7tx5 800m (10%) 800m (10%) 6G (72%) 6G (72%)
kube-system kube-proxy-7kdkd 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-dpclj 20m (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 820m (10%) 800m (10%)
memory 6G (72%) 6G (72%)
Events: <none>
As shown in the node description ("Non-terminated Pods" section), the CPU limit is 10%. However, when I run "ps" or "top" on the node (crypt12), the CPU utilization of the working process exceeds 10% (it is about 20%). Why does this happen? Could anyone shed light on this?
UPDATED:
I found a GitHub issue discussion with the answer to my question: the CPU percentage shown by "kubectl describe node" is the CPU limit divided by the number of cores. Since I set the CPU limit to 0.8 and the node has 8 cores, the 10% is the result of 0.8/8.
Here is the link: https://github.com/kubernetes/kubernetes/issues/24925
Firstly, by default, top shows percentage utilisation per core, so with 8 cores you can see up to 800% utilisation.
If you're reading the top statistics right, then it might have something to do with the fact that your node is running more processes than just your pod. Think of kube-proxy, the kubelet and any other controllers. GKE also runs a dashboard and calls the API for statistics.
Also note that CPU limits are enforced per 100ms period. A container can spike above 10 percent utilisation, but on average it can never use more than its allowed share within that period.
In the official documentation it reads:
The spec.containers[].resources.limits.cpu is converted to its millicore value and multiplied by 100. The resulting value is the total amount of CPU time that a container can use every 100ms. A container cannot use more than its share of CPU time during this interval.
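Applied to this pod, the 800m limit converts to 80000µs of CPU time per 100000µs period, i.e. 80% of a single core and therefore 10% of the 8-core node, enforced as an average over each 100ms window. This can be checked in the container's cgroup on the node; the paths below assume cgroup v1 with the cgroupfs layout and a Guaranteed-QoS pod, and the placeholders are illustrative:
# cpu.cfs_quota_us / cpu.cfs_period_us = 80000 / 100000 = 0.8 cores
cat /sys/fs/cgroup/cpu/kubepods/pod<POD_UID>/<CONTAINER_ID>/cpu.cfs_quota_us    # expect 80000
cat /sys/fs/cgroup/cpu/kubepods/pod<POD_UID>/<CONTAINER_ID>/cpu.cfs_period_us   # expect 100000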

Kubernetes init containers run every hour

I have recently set up Redis via https://github.com/tarosky/k8s-redis-ha. This repo includes an init container, and I have added an extra init container in order to set up passwords etc.
I am seeing some strange (and seemingly undocumented) behavior, whereby the init containers run as expected before the Redis container starts, but then they re-run roughly every hour. I have tested this using a busybox init container (which does nothing) on Deployments and StatefulSets and see the same behavior, so it is not specific to this Redis pod.
I have tested this on bare metal with K8s 1.6 and 1.8 with the same results; however, when applying init containers on GKE (K8s 1.7) this behavior does not happen. I can't see any flags for GKE's kubelet that would dictate this behavior.
See below for kubectl describe pod output showing that the init containers are re-run even though the main pod has not exited or crashed.
Name: redis-sentinel-1
Namespace: (redacted)
Node: (redacted)/(redacted)
Start Time: Mon, 12 Mar 2018 06:20:55 +0000
Labels: app=redis-sentinel
controller-revision-hash=redis-sentinel-7cc557cf7c
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"(redacted)","name":"redis-sentinel","uid":"759a3a3b-25bd-11e8-a8ce-0242ac110...
security.alpha.kubernetes.io/unsafe-sysctls=net.core.somaxconn=1024
Status: Running
IP: (redacted)
Controllers: StatefulSet/redis-sentinel
Init Containers:
redis-ha-server:
Container ID: docker://557d777a7c660b062662426ebe9bbf6f9725fb9d88f89615a8881346587c1835
Image: tarosky/k8s-redis-ha:sentinel-3.0.1
Image ID: docker-pullable://tarosky/k8s-redis-ha#sha256:98e09ef5fbea5bfd2eb1858775c967fa86a92df48e2ec5d0b405f7ca3f5ada1c
Port:
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 13 Mar 2018 03:01:12 +0000
Finished: Tue, 13 Mar 2018 03:01:12 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/opt from opt (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
-redis-init:
Container ID: docker://18c4e353233a6827999ae4a16adf1f408754a21d80a8e3374750fdf9b54f9b1a
Image: gcr.io/(redacted)/redis-init
Image ID: docker-pullable://gcr.io/(redacted)/redis-init#sha256:42042093d58aa597cce4397148a2f1c7967db689256ed4cc8d9f42b34d53aca2
Port:
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 13 Mar 2018 03:01:25 +0000
Finished: Tue, 13 Mar 2018 03:01:25 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/opt from opt (rw)
/secrets/redis-password from redis-password (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
Containers:
redis-sentinel:
Container ID: docker://a54048cbb7ec535c841022c543a0d566c9327f37ede3a6232516721f0e37404d
Image: redis:3.2
Image ID: docker-pullable://redis#sha256:474fb41b08bcebc933c6337a7db1dc7131380ee29b7a1b64a7ab71dad03ad718
Port: 26379/TCP
Command:
/opt/bin/k8s-redis-ha-sentinel
Args:
/opt/sentinel.conf
State: Running
Started: Mon, 12 Mar 2018 06:21:02 +0000
Ready: True
Restart Count: 0
Readiness: exec [redis-cli -p 26379 info server] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SERVICE: redis-server
SERVICE_PORT: redis-server
Mounts:
/opt from opt (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
redis-sword:
Container ID: docker://50279448bbbf175b6f56f96dab59061c4652c2117452ed15b3a5380681c7176f
Image: tarosky/k8s-redis-ha:sword-3.0.1
Image ID: docker-pullable://tarosky/k8s-redis-ha#sha256:2315c7a47d9e47043d030da270c9a1252c2cfe29c6e381c8f50ca41d3065db6d
Port:
State: Running
Started: Mon, 12 Mar 2018 06:21:03 +0000
Ready: True
Restart Count: 0
Environment:
SERVICE: redis-server
SERVICE_PORT: redis-server
SENTINEL: redis-sentinel
SENTINEL_PORT: redis-sentinel
Mounts:
/opt from opt (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
opt:
Type: HostPath (bare host directory volume)
Path: /store/redis-sentinel/opt
redis-password:
Type: Secret (a volume populated by a Secret)
SecretName: redis-password
Optional: false
default-token-hkj6d:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hkj6d
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
20h 30m 21 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Pulling pulling image "tarosky/k8s-redis-ha:sentinel-3.0.1"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Started Started container
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Created Created container
20h 30m 21 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Pulled Successfully pulled image "tarosky/k8s-redis-ha:sentinel-3.0.1"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Pulling pulling image "gcr.io/(redacted)/redis-init"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Pulled Successfully pulled image "gcr.io/(redacted)/redis-init"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Created Created container
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Started Started container
Note the containers in the pod, which started at Mon, 12 Mar 2018 06:21:02 +0000 (with 0 restarts), and the init containers, which started at Tue, 13 Mar 2018 03:01:12 +0000. The init containers seem to re-run roughly every hour.
Is our bare-metal setup misconfigured for init containers somewhere? Can anyone shed any light on this strange behavior?
If you are pruning away exited containers, then the container pruning/removal is the likely cause. In my testing, exited init containers that are removed from the Docker Engine (hourly, or otherwise), for example with "docker system prune -f", cause Kubernetes to re-launch the init containers. Is this the issue in your case, if it is still persisting?
Also see https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/ for the kubelet garbage collection documentation, which covers this kind of cleanup (rather than needing to implement it yourself).
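If you do need to prune, letting the kubelet's own garbage collection handle it keeps its bookkeeping consistent. For the kubelet versions in the question (1.6 to 1.8) the relevant flags from that page look like the following; the values shown are illustrative, not recommendations:
--minimum-container-ttl-duration=2h            # keep exited containers at least this long
--maximum-dead-containers-per-container=2      # old instances to retain per container
--maximum-dead-containers=240                  # global cap on dead containers per node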

kubernetes node not responding after restart

I have a Kubernetes cluster with one master and four nodes. kube-proxy was working fine on all four nodes, and I could access services on any of the nodes irrespective of where the pod was running; i.e. http://node1:30000 through http://node4:30000 gave the same response.
After restarting node4 with shutdown -r now, it came back up, but I noticed that the node was no longer responding to requests. I am running the following command:
curl http://node4:30000
If I run it from my PC, or from any other node in the cluster -- node1 through node3, or master -- I get:
curl: (7) Failed to connect to node4 port 30000: Connection timed out
However, if I run it from node4, it responds successfully. This leads me to believe that kube-proxy is running fine, but something is preventing external connections.
When I run kubectl describe node node4, my output looks normal:
Name: node4
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=node4
Taints: <none>
CreationTimestamp: Tue, 21 Feb 2017 15:21:17 -0400
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:28 -0400 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses: 10.6.81.64,10.6.81.64,node4
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
System Info:
Machine ID: dbc0bb6ba10acae66b1061f958220ade
System UUID: 4229186F-AA5C-59CE-E5A2-258C1BBE9D2C
Boot ID: a3968e6c-eba3-498c-957f-f29283af1cff
Kernel Version: 4.4.0-63-generic
OS Image: Ubuntu 16.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.0
Kubelet Version: v1.5.2
Kube-Proxy Version: v1.5.2
ExternalID: node4
Non-terminated Pods: (27 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
<< application pods listed here >>
kube-system kube-proxy-0p3lj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-uqmr1 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
20m (1%) 0 (0%) 0 (0%) 0 (0%)
Is there anything specific I need to do to bring a node back online after a system restart?
My team was able to solve this by downgrading Docker to 1.12. It appears that the problem is related to this issue:
https://github.com/kubernetes/kubernetes/issues/40182
After downgrading Docker to 1.12, everything is working now.
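For reference, a sketch of how the downgrade and pin might look on the Ubuntu 16.04 nodes shown above; the docker-engine package name and the exact 1.12.x version string are assumptions for that era, so check what apt actually offers first:
apt-cache madison docker-engine                                  # list available 1.12.x builds
sudo apt-get install -y docker-engine=1.12.6-0~ubuntu-xenial     # example version string
sudo apt-mark hold docker-engine                                 # prevent apt from upgrading it again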