I have a Kubernetes cluster running on a VM. A truncated overview of the mounts is:
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 4.5G 15G 24% /
/dev/mapper/vg001-lv--docker 140G 33G 108G 23% /var/lib/docker
As you can see, I added an extra disk to store the Docker images and their volumes. However, when querying the node's capacity, the following is returned:
Capacity:
cpu: 12
ephemeral-storage: 20145724Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 65831264Ki
nvidia.com/gpu: 1
pods: 110
ephemeral-storage is 20145724Ki, which is roughly 20G and matches the disk mounted at /.
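As a quick sanity check on the units, the Ki figure reported by the kubelet can be converted to GiB. This is only a minimal sketch; the helper name ki_to_gib is made up for illustration.

```python
def ki_to_gib(kibibytes: int) -> float:
    """Convert a Kubernetes 'Ki' quantity (1 Ki = 1024 bytes) to GiB."""
    return kibibytes / (1024 ** 2)


# Value taken from the node's Capacity output above.
capacity_ki = 20145724
print(f"{ki_to_gib(capacity_ki):.2f} GiB")
```

The result is a little over 19 GiB, which df -h rounds up to the 20G shown for /.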
How does the kubelet calculate its ephemeral-storage? Is it simply looking at the disk space available at /, or is it looking at another folder such as /var/log/containers?
This is a similar post, where the user eventually resorted to enlarging the disk mounted at /.
Some theory
By default, in a standard Kubernetes environment, Capacity and Allocatable for ephemeral-storage are sourced from the filesystem that holds /var/lib/kubelet.
This is the default location of the kubelet directory.
The kubelet supports the following filesystem partitions:
nodefs: The node's main filesystem, used for local disk volumes, emptyDir, log storage, and more. For example, nodefs contains
/var/lib/kubelet/.
imagefs: An optional filesystem that container runtimes use to store container images and container writable layers.
Kubelet auto-discovers these filesystems and ignores other
filesystems. Kubelet does not support other configurations.
From Kubernetes website about volumes:
The storage media (such as Disk or SSD) of an emptyDir volume is
determined by the medium of the filesystem holding the kubelet root
dir (typically /var/lib/kubelet).
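To see which size the kubelet is likely to pick up, one can look at the total size of the filesystem that holds the kubelet root dir. The sketch below assumes the default root dir and falls back to / when that directory does not exist; filesystem_total_ki is a hypothetical helper name, and the number it prints should be close to, but not necessarily identical to, the Capacity the kubelet reports.

```python
import os
import shutil


def filesystem_total_ki(path: str) -> int:
    """Return the total size, in Ki, of the filesystem that holds `path`."""
    return shutil.disk_usage(path).total // 1024


# /var/lib/kubelet is the default kubelet root dir; fall back to "/" when the
# script runs on a machine that has no kubelet directory.
root_dir = "/var/lib/kubelet" if os.path.isdir("/var/lib/kubelet") else "/"
print(root_dir, filesystem_total_ki(root_dir), "Ki")
```

If /var/lib/kubelet is not a separate mount, this reports the size of the filesystem mounted at /, which matches the behaviour described in the question.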
The location of the kubelet directory can be configured in either of two ways:
A command-line parameter passed when the kubelet starts:
--root-dir string
Default: /var/lib/kubelet
A kubeadm config file, for example:
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    root-dir: "/data/var/lib/kubelet"
Customizing kubelet:
To customize the kubelet you can add a KubeletConfiguration next to
the ClusterConfiguration or InitConfiguration separated by ---
within the same configuration file. This file can then be passed to
kubeadm init.
When bootstrapping a Kubernetes cluster with kubeadm, the Capacity reported by kubectl get node is equal to the capacity of the disk mounted at /var/lib/kubelet.
However, Allocatable will be reported as:
Allocatable = Capacity - 10% of nodefs. This holds for the standard kubeadm configuration, since the kubelet has the following default hard eviction threshold:
nodefs.available<10%
It can be configured at kubelet startup with:
--eviction-hard mapStringString
Default: imagefs.available<15%,memory.available<100Mi,nodefs.available<10%
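The Allocatable rule above can be written out as a small calculation. This is a rough sketch that assumes the only deduction is the nodefs.available hard eviction threshold (no kube-reserved or system-reserved storage); the function name and the integer-percent parameter are illustrative, and the value the kubelet actually reports may differ by a few bytes because of rounding.

```python
def allocatable_bytes(capacity_ki: int, eviction_percent: int = 10) -> int:
    """Estimate Allocatable ephemeral-storage in bytes from Capacity in Ki.

    Assumes Allocatable = Capacity - eviction_percent% of nodefs and nothing else.
    """
    capacity_bytes = capacity_ki * 1024
    return capacity_bytes * (100 - eviction_percent) // 100


# Capacity from the question: 20145724Ki on the filesystem mounted at /.
print(allocatable_bytes(20145724), "bytes allocatable (estimate)")
```

With the default threshold this leaves roughly 90% of the filesystem for pods; raising or lowering nodefs.available in --eviction-hard shifts the estimate accordingly.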
Example
I set up a test environment for Kubernetes with a master node and two worker nodes (worker-1 and worker-2).
Both worker nodes have volumes of the same capacity: 50 GB.
Additionally, I mounted a second volume with a capacity of 20 GB on the worker-1 node at the path /var/lib/kubelet.
Then I created a cluster with kubeadm.
Result
From worker-1 node:
skorkin@worker-1:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 49G 2.8G 46G 6% /
...
/dev/sdb 20G 45M 20G 1% /var/lib/kubelet
and
Capacity:
cpu: 2
ephemeral-storage: 20511312Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4027428Ki
pods: 110
The size of ephemeral-storage matches the volume mounted at /var/lib/kubelet.
From worker-2 node:
skorkin@worker-2:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 49G 2.7G 46G 6% /
and
Capacity:
cpu: 2
ephemeral-storage: 50633164Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4027420Ki
pods: 110
My Pods are getting killed and recreated with the status OutOfephemeral-storage.
Describing the pod shows the message below:
Message: Pod Node didn't have enough resource: ephemeral-storage, requested: 53687091200, used: 0, capacity: 0
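To make the numbers in that message easier to read, the fields can be pulled out and converted. The sketch below is only an illustration; parse_admission_message is a hypothetical helper, and it assumes the message always uses the "name: number" layout shown above.

```python
import re

MESSAGE = (
    "Pod Node didn't have enough resource: ephemeral-storage, "
    "requested: 53687091200, used: 0, capacity: 0"
)


def parse_admission_message(message: str) -> dict:
    """Extract the numeric fields (in bytes) from the message text."""
    return {key: int(value) for key, value in re.findall(r"(\w+): (\d+)", message)}


fields = parse_admission_message(MESSAGE)
print(fields["requested"] / 2 ** 30, "GiB requested, capacity", fields["capacity"])
```

The pod asked for 50 GiB, while the node reported a capacity of 0 at that moment, even though the Capacity section below lists about 1.7T.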
Node Capacity
Capacity:
cpu: 80
ephemeral-storage: 1845262880Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 790964944Ki
nvidia.com/gpu: 8
pods: 110
Allocatable:
cpu: 79900m
ephemeral-storage: 1700594267393
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 790612544Ki
nvidia.com/gpu: 8
pods: 110
node disk usage
]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 1.7T 25G 1.7T 2% /
devtmpfs 378G 0 378G 0% /dev
tmpfs 378G 16K 378G 1% /dev/shm
tmpfs 378G 3.8M 378G 1% /run
tmpfs 378G 0 378G 0% /sys/fs/cgroup
Still, the pod is getting rescheduled after some time. Any thoughts on why?
In most cases, this happens because an excess of log messages is consuming the storage. The solution is to configure the Docker logging driver to limit the amount of saved logs:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  }
}
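With these options the json-file driver keeps at most max-file files of at most max-size each per container, so the worst case per container is easy to estimate. The sketch below assumes that Docker reads the k/m/g suffixes as binary multiples (1m = 1024 * 1024 bytes); the helper name is made up for illustration.

```python
import json

DAEMON_JSON = """
{
  "log-driver": "json-file",
  "log-opts": {"max-size": "100m", "max-file": "10"}
}
"""

# Assumption: size suffixes are binary multiples.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_log_bytes_per_container(config: dict) -> int:
    """Upper bound on log disk usage for one container under these options."""
    opts = config["log-opts"]
    size = opts["max-size"]
    size_bytes = int(size[:-1]) * UNITS[size[-1].lower()]
    return size_bytes * int(opts["max-file"])


limit = max_log_bytes_per_container(json.loads(DAEMON_JSON))
print(limit // 1024 ** 2, "MiB per container")
```

Multiplying that bound by the number of containers on the node gives a rough ceiling for log usage on the node filesystem.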
It is also worth mentioning that Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so. This can cause Docker to use extra disk space.
What helped me was Docker's prune commands, which clean unused objects from the system. If you wish to clean up multiple types of objects at once, you can use docker system prune. Read more about pruning here.
The next possible scenario is that there are pods that use emptyDir without storage quotas, which will fill up the storage. The solution is to set a quota to limit this:
resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "1Gi"
Without this being set, any container can write any amount of data to its node's filesystem.
For more details on how ephemeral storage works, please see Ephemeral Storage Consumption.
The issue was with the filesystem. It was solved with the following steps:
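For context, the resources fragment above belongs under each container in the pod spec. The following sketch builds a complete Pod manifest as JSON, which kubectl also accepts; the pod name, the image, and the helper function are placeholders chosen for illustration only.

```python
import json


def pod_with_ephemeral_limit(name: str, image: str, size: str = "1Gi") -> dict:
    """Build a single-container Pod manifest with ephemeral-storage request and limit."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "requests": {"ephemeral-storage": size},
                        "limits": {"ephemeral-storage": size},
                    },
                }
            ]
        },
    }


# "demo" and "busybox" are placeholder values.
print(json.dumps(pod_with_ephemeral_limit("demo", "busybox"), indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f.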
]# systemctl stop kubelet
]# systemctl stop docker
]# umount -l /<MountFolder>
]# fsck -y /dev/sdb1
]# init 6
I am migrating from minikube to Microk8s and I want to change the configs of Microk8s and control the resources that it can use (cpu, memory, etc.).
In minikube we can use commands like the ones below to set the amount of resources for minikube:
minikube config set memory 8192
minikube config set cpus 2
But I don't know how to do it in Microk8s. I tried the commands below (with and without sudo):
microk8s.config set cpus 4
microk8s.config set cpu 4
And they returned:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: VORCBDRVJUSUZJQ0FURS0tLS0...
    server: https://10.203.101.163:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: microk8s
kind: Config
preferences: {}
users:
- name: admin
  user:
    username: admin
    password: ...
But when I describe the node, I see that Microk8s is using 8 CPUs:
Capacity:
cpu: 8
ephemeral-storage: 220173272Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32649924Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 219124696Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32547524Ki
pods: 110
How can I change the config of Microk8s?
You have a wrong understanding of the microk8s concept.
Unlike minikube, microk8s does not provision any VMs for you; it runs on your host machine, so all of the host's resources are available to microk8s.
So, to keep your cluster's resource usage within bounds, you have to manage it with k8s pod/container resource limits.
Let's say your host has 4 CPUs and you don't want your microk8s cluster to use more than half of its capacity.
You will need to set the limits below based on the number of running pods. For a single pod, it looks like this:
resources:
  requests:
    memory: "64Mi"
    cpu: 2
  limits:
    memory: "128Mi"
    cpu: 2
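Since the limit applies per pod, the CPU budget has to be divided across however many pods are running. Here is a minimal sketch of that arithmetic, assuming an even split; the function name and parameters are illustrative only.

```python
def per_pod_cpu_millicores(host_cpus: int, cluster_share: float, pod_count: int) -> int:
    """Split a share of the host's CPUs evenly across pods, in millicores."""
    budget_m = int(host_cpus * 1000 * cluster_share)
    return budget_m // pod_count


# Half of a 4-CPU host: one pod gets 2000m (cpu: 2), four pods get 500m each.
print(per_pod_cpu_millicores(4, 0.5, 1))
print(per_pod_cpu_millicores(4, 0.5, 4))
```

The resulting value goes into the cpu limit of each pod, for example cpu: 500m when four pods share the budget.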
On OS/X ...
First, stop multipassd:
sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
Next edit the config file:
sudo su -
vi /var/root/Library/Application\ Support/multipassd/multipassd-vm-instances.json
Then start multipassd again:
sudo launchctl load /Library/LaunchDaemons/com.canonical.multipassd.plist
Source: https://github.com/canonical/multipass/issues/1158
I need to check whether a Kubernetes node is configured correctly. I need to use nvidia-docker on one of the worker nodes.
Using: https://github.com/NVIDIA/k8s-device-plugin
How can I confirm that the configuration is correct for the device plugin?
$ kubectl describe node mynode
Roles: worker
Capacity:
cpu: 4
ephemeral-storage: 15716368Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 62710736Ki
nvidia.com/gpu: 1
pods: 110
Allocatable:
cpu: 3800m
ephemeral-storage: 14484204725
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 60511184Ki
nvidia.com/gpu: 1
pods: 110
System Info:
Machine ID: f32e0af35637b5dfcbedcb0a1de8dca1
System UUID: EC2A40D3-76A8-C574-0C9E-B9D571AA59E2
Boot ID: 9f2fa456-0214-4f7c-ac2a-2c62c2ef25a4
Kernel Version: 3.10.0-957.1.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.9.1
Kubelet Version: v1.11.2
Kube-Proxy Version: v1.11.2
I can see nvidia.com/gpu under the node resources. The question is: is the Container Runtime Version supposed to say nvidia-docker if the node is configured correctly? Currently it shows docker, which seems fishy to me.
Not sure if you did it already, but it seems to be clearly described:
After installing the NVIDIA drivers and NVIDIA Docker, you need to enable the nvidia runtime on your node by editing /etc/docker/daemon.json as specified here.
So, as the instructions say, if the runtime is installed correctly, you just need to edit that config.
Then deploy a DaemonSet (which is a way of ensuring that a pod runs on each node, with access to the host network and devices):
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
Now your containers are ready to consume the GPU - as described here.
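Both parts of that setup can be checked mechanically. The sketch below assumes that daemon.json follows the layout from the device plugin instructions (a "default-runtime" key set to "nvidia" plus a matching "runtimes" entry) and that the node object comes from kubectl get node mynode -o json; the sample data and helper names are illustrative, not taken from a real node.

```python
import json

# Assumed layout of /etc/docker/daemon.json after enabling the NVIDIA runtime.
SAMPLE_DAEMON_JSON = """
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
  }
}
"""


def nvidia_is_default_runtime(daemon_config: dict) -> bool:
    """True when nvidia is both registered as a runtime and set as the default."""
    return (
        daemon_config.get("default-runtime") == "nvidia"
        and "nvidia" in daemon_config.get("runtimes", {})
    )


def gpu_capacity(node: dict) -> int:
    """Number of nvidia.com/gpu resources the node advertises (0 if none)."""
    capacity = node.get("status", {}).get("capacity", {})
    return int(capacity.get("nvidia.com/gpu", "0"))


# Trimmed stand-in for the JSON form of the node shown in the question.
sample_node = {"status": {"capacity": {"cpu": "4", "nvidia.com/gpu": "1"}}}
print(nvidia_is_default_runtime(json.loads(SAMPLE_DAEMON_JSON)))
print(gpu_capacity(sample_node))
```

A non-zero GPU count in the node's capacity indicates that the device plugin registered the GPUs with the kubelet.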
Is it possible to specify a CPU ID list for the Kubernetes cpumanager? The goal is to make sure pods get CPUs from a single socket (0). I brought all the CPUs on the peer socket offline as mentioned here, for example:
$ echo 0 > /sys/devices/system/cpu/cpu5/online
After doing this, the Kubernetes master indeed sees only the remaining online CPUs:
kubectl describe node foo
Capacity:
cpu: 56 <<< socket 0 CPU count
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 197524872Ki
pods: 110
Allocatable:
cpu: 54 <<< 2 system reserved CPUs
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 71490952Ki
pods: 110
System Info:
Machine ID: 1155420082478559980231ba5bc0f6f2
System UUID: 4C4C4544-0044-4210-8031-C8C04F584B32
Boot ID: 7fa18227-748f-496c-968c-9fc82e21ecd5
Kernel Version: 4.4.13
OS Image: Ubuntu 16.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.3
Kubelet Version: v1.11.1
Kube-Proxy Version: v1.11.1
However, cpumanager still seems to think there are 112 CPUs (socket0 + socket1).
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-111"}
As a result, the kubelet system pods are throwing the following error:
kube-system kube-proxy-nk7gc 0/1 rpc error: code = Unknown desc = failed to update container "eb455f81a61b877eccda0d35eea7834e30f59615346140180f08077f64896760": Error response from daemon: Requested CPUs are not available - requested 0-111, available: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110 762 36d <IP address> foo <none>
I was able to get this working. Posting this as an answer so that someone in need might benefit.
It appears that the CPU set is read from the /var/lib/kubelet/cpu_manager_state file, and that this file is not updated across kubelet restarts. So the file needs to be removed before restarting the kubelet.
The following worked for me:
# On a running worker node, bring the desired CPUs offline (run as root).
$ cpu_list=$(lscpu | grep "NUMA node1 CPU(s)" | awk '{print $4}')
$ chcpu -d "$cpu_list"
# Remove the stale CPU manager state so the kubelet regenerates it on restart.
$ rm -f /var/lib/kubelet/cpu_manager_state
$ systemctl restart kubelet.service
# Check the CPU set seen by the CPU manager
$ cat /var/lib/kubelet/cpu_manager_state
# Try creating pods and check the syslog:
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122466 8070 state_mem.go:84] [cpumanager] updated default cpuset: "0,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122643 8070 policy_static.go:198] [cpumanager] allocateCPUs: returning "2,4,6,8,58,60,62,64"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122660 8070 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 356939cdf32d0f719e83b0029a018a2ca2c349fc0bdc1004da5d842e357c503a, cpuset: "2,4,6,8,58,60,62,64")
I have reported a bug here as I think the CPU set should be updated after kubelet restarts.
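To spot a stale state file before pods start failing, the defaultCpuSet recorded in cpu_manager_state can be compared with the CPUs that are actually online. This is a rough sketch; the helper names are made up, and the online list is a shortened stand-in for the contents of /sys/devices/system/cpu/online.

```python
import json


def parse_cpuset(spec: str) -> set:
    """Expand a cpuset string such as "0-3,8,10-11" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def stale_cpus(state_json: str, online_spec: str) -> set:
    """CPUs listed in the state file's defaultCpuSet that are not online."""
    default_set = parse_cpuset(json.loads(state_json)["defaultCpuSet"])
    return default_set - parse_cpuset(online_spec)


state = '{"policyName":"static","defaultCpuSet":"0-111"}'
online = "0,2,4,6"  # shortened stand-in for the real online CPU list
print(len(stale_cpus(state, online)), "CPUs in the state file are not online")
```

A non-empty result suggests that the state file predates the CPU change and should be removed before the kubelet is restarted.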