How to analyse Minikube storage leakage?

I have a local cluster with 2 PVs:
1 for a MySQL dump of about 20 GB
1 for MongoDB of around 10 GB
plus a bunch of different services, mainly images of around 400 MB or less, and 1 legacy image of 2.1 GB.
I am running Skaffold + Minikube + VirtualBox.
I get the following error from mysql:
2020-06-09T12:49:49.854701Z 0 [ERROR] InnoDB: Write to file ./ibtmp1 failed at offset 0, 1048576 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2020-06-09T12:49:49.854724Z 0 [ERROR] InnoDB: Error number 28 means 'No space left on device'
Then I started looking at the current system memory:
minikube ssh "free -mh"
And I get the following:
               total  used   free   shared  buff/cache  available
Mem:           7.8Gi  2.3Gi  2.4Gi  517Mi   3.1Gi       5.3Gi
Then I ran the following command:
minikube ssh "df -h"
And I get the following:
Filesystem      Size  Used  Avail  Use%  Mounted on
tmpfs           7.1G  493M  6.6G     7%  /
devtmpfs        3.9G     0  3.9G     0%  /dev
tmpfs           3.9G     0  3.9G     0%  /dev/shm
tmpfs           3.9G   26M  3.9G     1%  /run
tmpfs           3.9G     0  3.9G     0%  /sys/fs/cgroup
tmpfs           3.9G   12K  3.9G     1%  /tmp
/dev/sda1        70G   69G     0   100%  /mnt/sda1
/Users          234G  190G   44G    82%  /Users
I can see that /mnt/sda1 is likely the main problem, but I am not able to track down how sda1 got filled. How can I see the contents of /dev/sda1 and find the process that filled this device?
I also ran the following commands:
docker image prune to remove unused images
docker volume prune to remove unused volumes
I also ran Skaffold with cleanup flags, per the documentation:
skaffold -f="file-skaffold-config" --no-prune=false --cache-artifacts=false
and docker rmi -f [IMAGE_ID] to remove particular images by ID, to see whether removing an image would eventually free some space on /dev/sda1.
But none of this improved the situation on sda1: minikube ssh "df -h" shows exactly the same results as before.
How can I see with minikube / skaffold / docker what is filling up /dev/sda1?
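As a side check, docker system df inside the VM (where the cluster's Docker daemon runs) summarizes how much space Docker attributes to images, containers, local volumes, and build cache; anything on /dev/sda1 beyond that total lives outside Docker:
$ minikube ssh "docker system df"
$ minikube ssh "sudo du -sh /var/lib/docker"   # what the Docker root directory itself holds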

After asking for advice about space consumption in the Minikube GitHub repository:
https://github.com/kubernetes/minikube/issues/8422
I was pointed to space-radar:
https://github.com/zz85/space-radar/tree/v5.1.0
With it I should be able to get further insight into what is producing this disk usage. There is a high chance the case will be resolved by changing something in the images I am running locally; I will update this answer once I know what happened.
[UPDATE]
In the end, I was able to take a closer look at the Minikube problem by comparing the disk consumption on my machine against my colleague's machine.
As pointed out by mWatney in the comments, I did:
$ minikube ssh "sudo du -sh /mnt/sda1/*"
17G /mnt/sda1/data
4.0K /mnt/sda1/hostpath-provisioner
4.0K /mnt/sda1/hostpath_pv
16K /mnt/sda1/lost+found
46G /mnt/sda1/var
But my colleague's machine had:
34G /mnt/sda1/data
4.0K /mnt/sda1/hostpath-provisioner
4.0K /mnt/sda1/hostpath_pv
16K /mnt/sda1/lost+found
36G /mnt/sda1/var
This meant he had a problem with the Persistent Volumes: after a fix on his machine in the past, a backup had been left in the Minikube hostpath PV by mistake. With that cleaned up, it is now working fine.
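For anyone doing the same comparison, the du approach drills down level by level; a sketch of going one step deeper into the big offender (if sort -h is not available in the VM, just drop the sort):
$ minikube ssh "sudo du -x -d 2 -h /mnt/sda1/var | sort -h | tail -n 15"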

Related

Ceph RBD real space usage is much larger than disk usage once mounted

I'm trying to understand how to find out the current, real disk usage of a Ceph cluster, and I noticed that the output of rbd du is very different from the output of df -h once that RBD is mounted as a disk.
Example:
Inside the ToolBox I have the following:
$ rbd du replicapool/csi-vol-da731ad9-eebe-11eb-9fbd-f2c976e9e23a
warning: fast-diff map is not enabled for csi-vol-da731ad9-eebe-11eb-9fbd-f2c976e9e23a. operation may be slow.
2021-09-01T13:53:23.482+0000 7f8c56ffd700 -1 librbd::object_map::DiffRequest: 0x557402c909c0 handle_load_object_map: failed to load object map: rbd_object_map.8cdeb6e704c7e0
NAME                                          PROVISIONED  USED
csi-vol-da731ad9-eebe-11eb-9fbd-f2c976e9e23a  100 GiB      95 GiB
But, inside the Pod that is mounting this rbd, I have:
$ k exec -it -n monitoring prometheus-prometheus-operator-prometheus-1 -- sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-prometheus-operator-prometheus-1 -n monitoring' to see all of the containers in this pod.
/prometheus $ df -h
Filesystem  Size   Used   Available  Use%  Mounted on
overlay     38.0G  19.8G  18.2G       52%  /
...
/dev/rbd5   97.9G  23.7G  74.2G       24%  /prometheus
...
Is there a reason for the two results to be so different? Can this be a problem when Ceph tracks the total space used by the cluster to know how much space is available?
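As a side note, the warning in the rbd du output is fixable: with the object-map and fast-diff features enabled and the object map rebuilt, rbd du runs fast and its USED figure becomes reliable. A sketch, assuming the image already has exclusive-lock enabled:
$ rbd feature enable replicapool/csi-vol-da731ad9-eebe-11eb-9fbd-f2c976e9e23a object-map fast-diff
$ rbd object-map rebuild replicapool/csi-vol-da731ad9-eebe-11eb-9fbd-f2c976e9e23a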

Google cloud after VM Import no free space available on root drive

I created a Postgres server locally in VirtualBox using Ubuntu 16.04. Moving it to Google Cloud with the import tool seemed to work fine, but the root drive shows 100% full. None of the disk-expansion instructions (including creating a snapshot and recreating the boot drive) seem to make any space available.
There seem to be a boot drive and a root drive, but the root drive shows it is completely used. The boot drive shows space available, but it should be 15G in size, not 720M.
Filesystem                     Size  Used  Avail  Use%  Mounted on
udev                           1.8G     0  1.8G     0%  /dev
tmpfs                          370M  5.3M  365M     2%  /run
/dev/mapper/techredo--vg-root  2.5G  2.5G     0   100%  /
tmpfs                          1.9G     0  1.9G     0%  /dev/shm
tmpfs                          5.0M     0  5.0M     0%  /run/lock
tmpfs                          1.9G     0  1.9G     0%  /sys/fs/cgroup
/dev/sdb1                      720M  121M  563M    18%  /boot
tmpfs                          370M     0  370M     0%  /run/user/406485188
I checked whether it is possible to use LVM in GCP instances and found out that you're free to use it, but it is not supported by Google Cloud, since instances don't use LVM by default.
On the other hand, you need to make sure that the Linux Guest Environment is installed in your instance so that you get the automatic resizing feature. Please follow this guide to learn how to validate it: https://cloud.google.com/compute/docs/images/install-guest-environment#wgei
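A quick way to check is to look for the guest agent processes; the unit name varies across image generations, so treat this as a sketch:
$ ps aux | grep -i "[g]oogle"                # any Google agent processes at all
$ sudo systemctl status google-guest-agent   # newer images; older ones run the google-compute-engine services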
Since your root partition is full and you're not able to install more programs, I suggest two workarounds:
Workaround 1: Create a new VirtualBox VM and import it again. Please note that your root partition is pretty small (2.5G), so next time create a partition of at least 10 GB, and avoid using LVM during the installation.
After your instance is ready in GCP, please check whether the Linux Guest Environment is installed and, if not, install it: https://cloud.google.com/compute/docs/images/install-guest-environment
Workaround 2: Check which directory is causing problems and which files are consuming your disk space, delete them to regain space, install the Guest Environment, and try to resize your instance.
a) To check directory and file sizes, follow these steps:
There are several tools that can display disk usage graphically, but since your root partition is full you'll have to get the information by running commands (old-school style).
Please go to the root directory:
cd /
Please run this command to get the size of the main subdirectories under the root partition:
sudo du -aBM -d 1 . | sort -nr | head -20
NOTE: Identify which directory is eating your root partition.
Please run this command to get a full list of the files and their sizes:
du -k * | sort -nr | cut -f2 | xargs -d '\n' du -sh
NOTE: The above command prints all files and directories at once; to scroll through the output slowly, run the same command piped into less:
du -k * | sort -nr | cut -f2 | xargs -d '\n' du -sh | less
Press the spacebar to scroll down.
Keep in mind that you have to change into the directory you want to analyze before running the du commands above (in case you want to analyze another directory).
In addition, you can run apt-get clean to clear the downloaded packages (.deb files), which usually consume a good part of your disk.
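For example, to see what the package cache holds before clearing it:
$ sudo du -sh /var/cache/apt/archives   # size of the cached .deb files
$ sudo apt-get clean                    # remove them
$ sudo apt-get autoremove --purge       # optionally drop orphaned packages too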
b) To resize your instance, you have two options:
Resize your VM instance "primary-server" by following this guide[1].
NOTE: The steps in this option are easy to follow; if it doesn't work, try the second option, which requires more advanced Linux skills.
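For reference, the resize in option 1 can be driven from the gcloud CLI; a sketch assuming the boot disk shares the instance name (since your root is on LVM, the in-guest part needs the LVM tools rather than a plain resize2fs):
$ gcloud compute disks resize primary-server --size=15GB --zone=<your-zone>
# then, inside the guest, grow the partition, the PV, and the root LV:
$ sudo pvs                                              # shows which partition backs techredo-vg
$ sudo growpart /dev/sda 1                              # adjust device/partition to the pvs output
$ sudo pvresize /dev/sda1
$ sudo lvextend -r -l +100%FREE /dev/techredo-vg/root   # -r grows the filesystem as well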
Create a snapshot from the VM instance "primary-server" (a condensed gcloud sketch of this flow follows the steps below).
2.1 Create a new instance based on a Linux distribution.
2.2 Once it's been created, stop the instance.
2.3 Follow this guide to add an additional disk[2].
NOTE: Basically, you have to edit the instance "primary-server" and add an additional disk; don't forget to select the snapshot option from the "Source type" list and click on the snapshot you just created.
2.4 Start the instance.
2.5 Mount the disk by following this guide[3].
NOTE: Please skip step 4. The additional disk is actually a boot disk, so it has already been formatted; don't apply the format to it, just mount it.
2.6 Check the permissions of the file "/etc/fstab".
NOTE: The permissions should be "-rw-r--r--" and the owner "root".
2.6.1 Delete files to reduce the disk size.
2.7 Unmount the disk at OS level.
2.8 Stop the instance.
2.9 Detach the additional disk from the new instance in GCP.
NOTE: Please follow this guide[4], and instead of clicking on the X next to the boot disk, click on the X next to the additional disk.
2.10 Create a new instance and instead of using an image in the "boot disk" section, please use the disk you just restored.
NOTE: For this please go to the "Boot disk" section and click on the "Change" button, then go to the "Existing" TAB and select the disk you just restored.
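The disk shuffle in steps 2.1 through 2.9 can also be driven from the gcloud CLI; a condensed sketch with hypothetical names (rescue-1 for the helper instance, fix-disk for the restored disk):
$ gcloud compute disks snapshot primary-server --snapshot-names=ps-snap --zone=<zone>
$ gcloud compute disks create fix-disk --source-snapshot=ps-snap --zone=<zone>
$ gcloud compute instances attach-disk rescue-1 --disk=fix-disk --zone=<zone>
# ... mount the disk, delete files, unmount (steps 2.5 to 2.7) ...
$ gcloud compute instances detach-disk rescue-1 --disk=fix-disk --zone=<zone>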
REFERENCES:
[1] https://cloud.google.com/compute/docs/disks/add-persistent-disk#inaccessible_instance
[2] https://cloud.google.com/compute/docs/disks/add-persistent-disk#create_disk
[3] https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
[4] https://cloud.google.com/compute/docs/disks/detach-reattach-boot-disk#detach_disk
Please let me know the results.

How to explain Ceph space usage

I looked up RBD disk space usage, but found different statistics from Ceph and the host which mounts the disk.
From Ceph:
$ rbd -p rbd du
NAME                                                         PROVISIONED  USED
kubernetes-dynamic-pvc-13a2d932-6be0-11e9-b53a-0a580a800339  40GiB        37.8GiB
From the host which mounts the disk
$ df -h
Filesystem  Size   Used   Available  Use%  Mounted on
/dev/rbd0   39.2G  26.6G  10.6G       72%  /data
How could I explain the difference?
Check the mount options of the /dev/rbd0 device; most likely there is no 'discard' option. Without that option the filesystem cannot report reclaimed space back to Ceph, so Ceph has no idea how much space is actually occupied on the RBD volume. This is not a big problem and can be safely ignored; you can rely on the stats reported by the kubelet.
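If you do want df and rbd du to converge, the freed blocks can be handed back explicitly; a sketch, assuming the RBD is mounted at /data and the kernel rbd client supports discard:
$ sudo fstrim -v /data   # reports freed blocks to the block layer so Ceph can reclaim them
Alternatively, mounting with -o discard trims continuously, at some write-performance cost.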

Docker container: MongoDb Insufficient free space for journal files

I am running MongoDB inside a Docker (version 1.10.1, on OSX) container and it is giving this error:
MongoDb Insufficient free space for journal files
I am not able to find out whether the issue is on the host, in the container, or in VirtualBox.
However, on my host I have:
Filesystem  Size   Used  Avail  Capacity  iused     ifree      %iused  Mounted on
/dev/disk1  465Gi  75Gi  389Gi  17%       19777401  102066309  16%     /
And on the docker container:
Filesystem  Inodes   IUsed   IFree   IUse%  Mounted on
none        1218224  742474  475750  61%    /
I have also mounted a volume from the host with:
docker run -it -v /Users/foobar/Projects/compose:/data/db mongoImage:0.1 /bin/bash
Thanks for the comments @andy, the issue did seem to be within the VirtualBox environment.
I was able to resolve the issue by:
backing up all Docker images
cloning the default VirtualBox ISO (as a backup)
deleting the default VirtualBox ISO and all associated files
restarting Docker, after which a new default VirtualBox ISO was created. This resolved the issue (which I expect to have again at some point).
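For the record, with that era of Docker on OSX the space to watch is inside the docker-machine VM rather than on the Mac itself; a quick sketch, assuming the machine is named default:
$ docker-machine ls                  # confirm the VM name
$ docker-machine ssh default df -h   # free space inside the VirtualBox VM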

Can't start emulator environment (error NAND: could not write file...file exists)

I'm trying to start developing for Android but have had problems setting up the development environment:
I am running Ubuntu 11.04 and have installed Eclipse Juno 4.2.0. and have updated the android sdk tools to the latest version.
When I try to run an Android emulator I get the error "NAND: Could not write file...file exists". When I searched on this error, one answer said I needed to free up some space on my hard drive. I have since freed up a few gigs, but I still get the same error.
Another site said to delete all emulator environments and create new ones from scratch. I tried this, but when I had just one environment listed in the AVD manager and tried to delete it, an error message popped up saying I can't because the emulator is currently running. Even when I reboot the computer, open the AVD manager, and try to delete it, I still get the same error.
I have tried
adb devices
to find the device that is running but no devices get listed.
I get this error whether I am running the AVD manager from Eclipse or from the command line. Does anyone know why I am getting the "NAND: Could not write file...file exists" error, or why I always get the message about the emulator running?
Regards,
John
Check the free space on your hard drive; this error is usually due to low storage space.
Try running df -h repeatedly while the emulator is starting up. You may see something like this:
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
...
tmpfs           3.7G  2.7G  1.1G    72%  /tmp
...
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
...
tmpfs           3.7G  3.6G  191M    95%  /tmp
...
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
...
tmpfs           3.7G  3.6G  160M    96%  /tmp
...
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
...
tmpfs           3.7G  3.6G  112M    98%  /tmp
...
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
...
tmpfs           3.7G  3.7G  8.8M   100%  /tmp
...
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
...
tmpfs           3.7G  2.7G  1.1G    72%  /tmp
...
That is, the partition fills up, then you get the error message, and then the partition frees up.
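Rather than rerunning df by hand, watch can poll it for you (assuming GNU watch is available):
$ watch -n 1 df -h /tmp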
The solution is either to remount the tmpfs at /tmp with a larger space allocation (5 GB should be enough) using sudo mount -o remount,size=5G tmpfs /tmp/, or to tell the AVD to put its temp directory somewhere else, as per "How to change the Android emulator temporary directory" and https://code.google.com/p/android/issues/detail?id=15716