How to see what a k8s container is writing to ephemeral storage - kubernetes

One of our containers is using ephemeral storage but we don't know why. The app running in the container shouldn't be writing anything to the disk.
We set the storage limit to 20MB, but the pod is still being evicted. We could increase the limit, but that seems like a band-aid fix.
We're not sure what this container is writing or where, and I'm not sure how to check that. When the pod is evicted, the only information I can see is that the container exceeded its storage limit.
Is there an efficient way to know what's being written, or is our only option to comb through the code?

Adding details to the topic.
Pods use ephemeral local storage for scratch space, caching, and logs.
Pods can be evicted due to other pods filling the local storage, after which new pods are not admitted until sufficient storage has been reclaimed.
The kubelet can provide scratch space to Pods using local ephemeral storage to mount emptyDir volumes into containers.
For container-level isolation, if a container's writable layer and log usage exceeds its storage limit, the kubelet marks the Pod for eviction.
For pod-level isolation the kubelet works out an overall Pod storage limit by summing the limits for the containers in that Pod. In this case, if the sum of the local ephemeral storage usage from all containers and also the Pod's emptyDir volumes exceeds the overall Pod storage limit, then the kubelet also marks the Pod for eviction.
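For context, a per-container limit like the 20MB one mentioned in the question is usually declared in the Pod spec; a minimal sketch (the pod name and image are illustrative, not taken from the question):
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.25              # illustrative image
    resources:
      requests:
        ephemeral-storage: "20Mi"
      limits:
        ephemeral-storage: "20Mi"  # writable-layer plus log usage above this triggers eviction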
To see what files have been written since the pod started, you can run:
find / -mount -newer /proc -print
This will output a list of files modified more recently than '/proc'. For example, in an nginx container the output might look like this:
/etc/nginx/conf.d
/etc/nginx/conf.d/default.conf
/run/secrets
/run/secrets/kubernetes.io
/run/secrets/kubernetes.io/serviceaccount
/run/nginx.pid
/var/cache/nginx
/var/cache/nginx/fastcgi_temp
/var/cache/nginx/client_temp
/var/cache/nginx/uwsgi_temp
/var/cache/nginx/proxy_temp
/var/cache/nginx/scgi_temp
/dev
Also, try without the '-mount' option.
To see if any new files are being modified, you can run some variations of the following command in a Pod:
while true; do rm -f a; touch a; sleep 30; echo "monitoring..."; find / -mount -newer a -print; done
and check the file size using the du -h someDir command.
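If the tooling in the image allows it, a rough sketch for ranking the largest directories on the container's root filesystem (the flags assume GNU du and sort are available, which busybox images may not provide):
# stay on the root filesystem (-x) and rank directories by size in KB
du -x --max-depth=3 / 2>/dev/null | sort -rn | head -n 20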
Also, as gohm'c pointed out in their answer, you can use a sidecar or an ephemeral debug container.
Read more about local ephemeral storage in the Kubernetes documentation.

We're not sure what this container is writing or where, and I'm not sure how to check that.
Try looking into the container's volumeMounts section for anything mounted with emptyDir, then add a sidecar container (e.g. busybox) to start a shell session where you can check those paths. If your cluster supports ephemeral debug containers, you don't need the sidecar container.
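For instance, on a cluster with ephemeral debug containers enabled, something like this attaches a busybox shell that shares the target container's process namespace (the pod and container names are placeholders):
kubectl debug -it my-pod --image=busybox:1.36 --target=app -- sh
# With the shared process namespace, the target's filesystem is usually
# visible through /proc/<pid>/root of its main process (permissions permitting), e.g.:
du -sh /proc/1/root/var/cache/* 2>/dev/null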

Related

Debugging nfs volume "Unable to attach or mount volumes for pod"

I've set up an NFS server that serves a ReadWriteMany PV according to the example at https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs
This setup works fine for me in lots of production environments, but on one specific GKE cluster instance, mounting stopped working after pods restarted.
From the kubelet logs I see the following repeating many times:
Unable to attach or mount volumes for pod "api-bf5869665-zpj4c_default(521b43c8-319f-425f-aaa7-e05c08282e8e)": unmounted volumes=[shared-mount], unattached volumes=[geekadm-net deployment-role-token-6tg9p shared-mount]: timed out waiting for the condition; skipping pod
Error syncing pod 521b43c8-319f-425f-aaa7-e05c08282e8e ("api-bf5869665-zpj4c_default(521b43c8-319f-425f-aaa7-e05c08282e8e)"), skipping: unmounted volumes=[shared-mount], unattached volumes=[geekadm-net deployment-role-token-6tg9p shared-mount]: timed out waiting for the condition
Manually mounting the NFS share on any of the nodes works just fine: mount -t nfs <service ip>:/ /tmp/mnt
How can I further debug the issue? Are there any other logs I could look at besides kubelet?
If the pod gets kicked off the node because the mount is too slow, you may see messages like the following in the logs; the kubelet itself reports this issue. Sample kubelet log:
Setting volume ownership for /var/lib/kubelet/pods/c9987636-acbe-4653-8b8d-aa80fe423597/volumes/kubernetes.io~gce-pd/pvc-fbae0402-b8c7-4bc8-b375-1060487d730d and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
Cause:
The pod.spec.securityContext.fsGroup setting causes the kubelet to run chown and chmod on every file in the volumes mounted for the given pod. This can be very time-consuming for big volumes with many files.
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted (from the Kubernetes documentation).
Solution:
You can deal with it in the following ways (see the sketch below):
Reduce the number of files in the volume.
Stop using the fsGroup setting.
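As a rough sketch of where that setting lives, the Pod securityContext also accepts fsGroupChangePolicy, which on volume types that support ownership management skips the recursive chown when ownership already matches; apart from the securityContext field names, everything here is illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"    # only re-chown when the volume root does not already match
  containers:
  - name: api
    image: registry.example.com/api:latest   # illustrative
    volumeMounts:
    - name: shared-mount
      mountPath: /data
  volumes:
  - name: shared-mount
    persistentVolumeClaim:
      claimName: data-pvc                    # illustrative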
Did you specify an NFS version when mounting from the command line? I had the same issue on AKS, but, inspired by https://stackoverflow.com/a/71789693/1382108, I checked the NFS versions and noticed my PV had vers=3. When I tried mounting from the command line with mount -t nfs -o vers=3, the command just hung; with vers=4.1 it worked immediately. I changed the version in my PV and the next Pod worked just fine.
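For reference, the NFS version can be pinned on the PersistentVolume via mountOptions; a minimal sketch (the server IP and path are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  mountOptions:
  - vers=4.1            # vers=3 hung when mounting manually
  nfs:
    server: 10.0.0.10   # placeholder: your NFS service IP
    path: /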

How to get iostats of a container running in a pod on Kubernetes?

Memory and CPU resources of a container can be tracked using Prometheus. But can we track the I/O of a container? Are there any metrics available?
If you are using Docker containers, you can check the data with the docker stats command (as P... mentioned in the comment); see the Docker documentation for more information about this command.
If you want to check a pod's CPU/memory usage without installing any third-party tool, you can read it from the cgroup filesystem:
Open a shell in the pod: kubectl exec -it pod_name -- /bin/bash
For CPU usage, go to /sys/fs/cgroup/cpu and run cat cpuacct.usage
For memory usage, go to /sys/fs/cgroup/memory and run cat memory.usage_in_bytes
(These paths assume cgroup v1; for block I/O, see the sketch below.)
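For I/O specifically, a similar trick is reading the blkio controller; this again assumes cgroup v1, and on cgroup v2 nodes the file layout is different:
# cgroup v1: cumulative bytes read/written per block device
cat /sys/fs/cgroup/blkio/blkio.throttle.io_service_bytes
# cgroup v2 (unified hierarchy) exposes io.stat instead
cat /sys/fs/cgroup/io.stat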
For more look at this similar question.
Here you can find another interesting question. You should know that containers inside pods partially share /proc with the host system, including paths with memory and CPU information.
See also this article about Memory inside Linux containers.

Persistent volume with multiple local disks

I have a home Kubernetes cluster with multiple SSDs attached to one of the nodes.
I currently have one persistent volume per mounted disk. Is there an easy way to create a persistent volume that can access data from multiple disks? I thought about symlinks, but that doesn't seem to work.
You would have to combine them at a lower level. The simplest approach would be Linux LVM but there's a wide range of storage strategies. Kubernetes orchestrates mounting volumes but it's not a storage management solution itself, just the last-mile bits.
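For example, with LVM the disks could be combined into one logical volume whose single mount point then backs the PersistentVolume; a rough sketch, assuming the SSDs show up as /dev/sdb and /dev/sdc (adjust the device names and paths for your node):
pvcreate /dev/sdb /dev/sdc                   # register both SSDs as LVM physical volumes
vgcreate local-vg /dev/sdb /dev/sdc          # group them into one volume group
lvcreate -n local-lv -l 100%FREE local-vg    # one logical volume spanning both disks
mkfs.ext4 /dev/local-vg/local-lv
mkdir -p /mnt/local-pv && mount /dev/local-vg/local-lv /mnt/local-pv
# a single local PersistentVolume can now point at /mnt/local-pv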
As coderanger already mentioned, Kubernetes does not manage your storage at a lower level. While with cloud solutions there might be provisioners that do some of the work for you, with bare metal there aren't.
The closest thing to help you manage local storage is the local-volume-static-provisioner.
The local volume static provisioner manages the PersistentVolume lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and cleaning up the disks when released. It does not support dynamic provisioning.
Have a look at this article for more examples.
I have a trick that works for me.
You can mount these disks under a directory like /disk/, then create a small loop filesystem, mount it, and create symbolic links to the disks inside the loop filesystem.
for example:
touch ~/disk-bunch1 && truncate -s 32M ~/disk-bunch1 && mke2fs -t ext4 -F ~/disk-bunch1
Mount it and create symbolic links to the disks inside the loop filesystem:
mkdir -p /local-pv/bunch1 && mount ~/disk-bunch1 /local-pv/bunch1
ln -s /disk/disk1 /local-pv/bunch1/disk1
ln -s /disk/disk2 /local-pv/bunch1/disk2
Finally, use sig-storage-local-static-provisioner: set "hostDir" to "/local-pv" in values.yaml and deploy the provisioner. A pod can then use multiple disks.
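If it helps, the relevant part of the chart's values.yaml looks roughly like this; treat the exact keys as an assumption about the chart version you deploy and check its documented defaults:
classes:
- name: local-pv          # becomes the StorageClass the discovered PVs belong to
  hostDir: /local-pv      # directory whose entries are turned into PVs
  volumeMode: Filesystem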
But this method has a drawback: when you run "kubectl get pv", the CAPACITY shown is just the size of the loop filesystem instead of the sum of the disk capacities.
That said, this method is not really recommended; you'd be better off with something like RAID 0 or LVM.

How to free storage on node when status is "Attempting to reclaim ephemeral-storage"?

I have a 3 node Kubernetes cluster used for development.
One of the nodes has had the status "Attempting to reclaim ephemeral-storage" for 11 days.
How can I reclaim the storage?
Since it is just a development instance, I cannot extend the storage. I don't care about the existing data on it. How can I clear the storage?
Thanks
Just run the docker system prune command to free up space on the node; see below:
$ docker system prune -a --volumes
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all volumes not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Since it's a development environment, you can just drain the node to clear all pods and their data, and then uncordon it so pods can be scheduled again:
kubectl drain --delete-local-data --ignore-daemonsets $NODE_NAME && kubectl uncordon $NODE_NAME
The --delete-local-data flag cleans up the pods' local (emptyDir) data; in newer kubectl versions it has been renamed --delete-emptydir-data.

how to solve the disk pressure in kubernetes

I have a local OpenNESS network edge cluster using Kubernetes as its infrastructure management.
I'm facing a disk pressure issue, due to which pods are getting evicted or end up in CrashLoopBackOff state.
Also, images on the worker node went missing (they were deleted automatically).
If I check the disk usage, I see 83% being used by /dev/sda4, i.e. the overlay filesystem.
How can I solve this issue?
(The attached image shows the disk usage.)
Your disk usage chart reveals a lot of usage on the overlay filesystem, i.e. by the Docker containers' union file system. This suggests that you have some large containers running. They might have been large to start with, or they might be writing data to the container file system while running.
To get to the bottom of this, you can either have a look at your monitoring (if present), or you can ssh into the affected node and try to identify the "guilty" pod with:
du --max-depth=1 /var/lib/docker/overlay2/ | sort -n
and then a subsequent du | sort -n inside the biggest directory.
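To map a large overlay2 directory back to a container, you can also let Docker report the writable-layer sizes directly; a sketch, assuming Docker is the container runtime on that node:
# writable-layer size per container (SIZE column)
docker ps --size
# or list which container owns which overlay2 upper directory
docker ps -q | xargs docker inspect --format '{{.Name}} {{.GraphDriver.Data.UpperDir}}'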