CephFS meets bonnie++: No space left on device

I am evaluating CephFS as a distributed filesystem in my lab with bonnie++, in the
following setup:
VirtualBox VMs are used with the following settings:
RAM: 1024 MiB
One SATA hard drive for OS: 8 GiB
One dedicated SATA hard drive for distributed file system for each OSD: 8 GiB
OS: CentOS 7
Ceph version: 10.2
Ceph is configured for three (3) replicas, i.e.
three VMs operate as OSDs:
node1 hosts ceph-mon, ceph-osd and ceph-mds
node2 and node3 host ceph-mon and ceph-osd.
The admin node is idle.
The client runs bonnie++ on a CephFS mount.
Somehow, bonnie++ fails to complete. Instead, it exits with the following
error message:
Writing intelligently...Can't write block.: No space left on device
See attached screenshot:
Funnily enough, df -h reports 8.7 GiB of available disk space.
See next screenshot:
Is this normal Ceph behaviour? What do you recommend to fix it?
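For what it's worth, what the kernel client's df reports and what the cluster itself has free can differ, so commands along these lines are useful for cross-checking (the mount point /mnt/cephfs is only an assumed path, and the bonnie++ size flag is just one way to cap the dataset):
ceph df                                   # cluster-wide raw and per-pool usage
ceph osd df                               # per-OSD utilization; one nearly full OSD is enough for ENOSPC
ceph health detail                        # shows nearfull/full OSD warnings
bonnie++ -d /mnt/cephfs -s 2048 -u root   # limit the test dataset to roughly 2 GiB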

Related

All Ceph daemon container images disappeared on a single node after reconfiguring the Docker log driver

I changed log_driver to "local" in the daemon.json Docker configuration file, because a high level of activity on the RADOS gateway logs had saturated the disk space. My intent was to switch to journald so I could have logrotate. Unfortunately, after restarting the Docker daemon, many Ceph services disappeared, along with their container images. Now that node has caused a HEALTH_ERR, because it lost 1 mgr, 1 mon and 3 osd services at the same time.
I tried to use some ceph commands inside the cephadm shell (on another node), but it freezes and nothing happens. What can I try in order to restore the node's services and the cluster health?
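For reference, the driver switch itself is a one-key change in /etc/docker/daemon.json (note the key is spelled log-driver, with a hyphen), and existing containers keep whatever driver they were created with until they are recreated, so this alone should not make them disappear. A minimal sketch:
/etc/docker/daemon.json:
{
  "log-driver": "journald"
}
systemctl restart docker     # only containers created after this pick up the new default driver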

External hard disk not detected on my CentOS 8 server?

My CentOS 8 server does not detect the WD external hard drives, which are mounted under / in different folders like PHD10, PHD11, etc.
It fails while the machine is running.
In /etc/fstab all the disks are listed by UUID and the file system is ext4,
but lsblk and fdisk -l do not show those disks.
Can someone help?
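For context, a hypothetical /etc/fstab entry of the kind described would look roughly like this (the UUID and mount point are placeholders), and the commands underneath show whether the kernel currently sees the drives at all:
UUID=1234abcd-0000-0000-0000-000000000000  /PHD10  ext4  defaults  0 2
lsblk -o NAME,SIZE,FSTYPE,UUID,MOUNTPOINT   # block devices the kernel knows about
blkid                                       # UUIDs of the detected filesystems
dmesg | grep -i usb                         # whether the USB enclosures were seen at all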

Monitor daemon running but not in quorum

I'm currently testing OS and version upgrades for a Ceph cluster. Starting info:
The cluster is currently on CentOS 7 and Ceph version Nautilus. I'm trying to change the OS to Ubuntu 20.04 and the Ceph version to Octopus. I started with upgrading mon1 first. I will write down the things I did, in order.
First off I stopped the monitor service - systemctl stop ceph-mon@mon1
Then I removed the monitor from the cluster - ceph mon remove mon1
Then I installed Ubuntu 20.04 on mon1, updated the system and configured ufw.
Installed ceph octopus packages.
Copied ceph.client.admin.keyring and ceph.conf to mon1 /etc/ceph/
Copied ceph.mon.keyring to mon1 to a temporary folder and changed ownership to ceph:ceph
Got the monmap - ceph mon getmap -o ${MONMAP} - the thing is, I did this after removing the monitor.
Created /var/lib/ceph/mon/ceph-mon1 folder and changed ownership to ceph:ceph
Created the filesystem for monitor - sudo -u ceph ceph-mon --mkfs -i mon1 --monmap /folder/monmap --keyring /folder/ceph.mon.keyring
After noticing that I had fetched the monmap after the monitor's removal, I added the monitor manually - ceph mon add mon1 <ip> --fsid <fsid>
After starting it manually and checking the cluster state with ceph -s, I can see mon1 is listed but is not in quorum. The monitor daemon runs fine on the said mon1 node. I noticed in the logs that mon1 is stuck in the "probe" state, and in the other monitors' logs there is an output such as mon1 (rank 2) addr [v2:<ip>:3300/0,v1:<ip>:6789/0] is down (out of quorum). As I said, the monitor daemon is running on mon1 without any visible errors, just stuck in the probe state.
I wondered if it was caused by the OS & version change, so I first tried configuring the manager, mds and radosgw daemons by creating the respective folders in /var/lib/ceph/... and copying the keyrings. All these services work fine: I was able to reach my buckets, open the Octopus version dashboard, and the metadata server is listed as active in ceph -s. So evidently my problem is only with the monitor configuration.
After doing some checking I found this in the Red Hat Ceph documentation:
If the Ceph Monitor is in the probing state longer than expected, it
cannot find the other Ceph Monitors. This problem can be caused by
networking issues, or the Ceph Monitor can have an outdated Ceph
Monitor map (monmap) and be trying to reach the other Ceph Monitors on
incorrect IP addresses. Alternatively, if the monmap is up-to-date,
Ceph Monitor’s clock might not be synchronized.
There is no network error on the monitor, I can reach all the other machines in the cluster. The clocks are synchronized. If this problem is caused by the monmap situation, how can I fix it?
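In case it really is the stale-monmap scenario the documentation describes, the usual approach is to pull a fresh monmap from a monitor that is in quorum and inject it into the stuck one. A sketch, assuming the daemon is ceph-mon@mon1 and the other monitors are reachable:
ceph mon getmap -o /tmp/monmap                              # run on a node that can reach the quorum
systemctl stop ceph-mon@mon1                                # on mon1
sudo -u ceph ceph-mon -i mon1 --inject-monmap /tmp/monmap   # replace the local, outdated monmap
systemctl start ceph-mon@mon1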
OK, so as a result: going directly from CentOS 7 / Nautilus to Ubuntu 20.04 / Octopus is not possible for the monitor services only; apparently the issue is hostname resolution on the different operating systems. The rest of the services are fine. There is a longer way to do this without issues, and it is the correct solution: first change the OS from CentOS 7 to Ubuntu 18.04, install the ceph-nautilus packages and add the machines to the cluster (no issues at all). Then update & upgrade the system and apply "do-release-upgrade". Works like a charm. I think this is what eblock mentioned.

Google Cloud: after VM import, no free space available on root drive

I created a Postgres server locally using VirtualBox and Ubuntu 16.04. Using the import tool to move it to Google Cloud seemed to work fine, but the root drive shows 100% full. None of the disk expansion instructions (including creating a snapshot and recreating the boot drive) seem to make any space available.
There seems to be a boot drive and a root drive. But the root drive shows it is all used. The boot drive shows space available, but it should be 15G in size not 720M.
Filesystem Size Used Avail Use% Mounted on
udev 1.8G 0 1.8G 0% /dev
tmpfs 370M 5.3M 365M 2% /run
/dev/mapper/techredo--vg-root 2.5G 2.5G 0 100% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sdb1 720M 121M 563M 18% /boot
tmpfs 370M 0 370M 0% /run/user/406485188
I checked whether it is possible to use LVM in GCP instances, and I found that you're free to use it, but it is not supported by Google Cloud, since instances don't use LVM by default.
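Because the imported instance does use LVM (the root is /dev/mapper/techredo--vg-root), the automatic resize will not help; after enlarging the disk in the GCP console, the volume has to be grown by hand, roughly like this. The device and volume-group names are guesses based on the df output, so verify them with lsblk and pvs first:
sudo growpart /dev/sda 1                          # grow the partition that holds the LVM PV
sudo pvresize /dev/sda1                           # make LVM see the new space
sudo lvextend -l +100%FREE /dev/techredo-vg/root  # give all free space to the root LV
sudo resize2fs /dev/techredo-vg/root              # grow the ext4 filesystem online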
On the other hand, you need to make sure that the Linux Guest Environment is installed in your instance, so you can get the automatic resizing feature. Please follow this guide to learn how to validate: https://cloud.google.com/compute/docs/images/install-guest-environment#wgei
Since your root partition is full and you're not able to install more programs, I suggest 2 workarounds:
Workaround 1: Create a new VirtualBox VM and import it again. Please note that your root partition is pretty small (2.5G), so I suggest that next time you create a partition of at least 10 GB and avoid using LVM during the installation.
After your instance is ready in GCP, please check whether the Linux Guest Environment is installed in your instance, and if not, install it: https://cloud.google.com/compute/docs/images/install-guest-environment
Workaround 2: Check which directory is causing problems and then which files are consuming your disk space, delete them to gain space, install the Guest Environment and try to resize your instance.
a) To check the directories and files sizes, follow these steps:
There are several tools that can display your disk usage graphically but since your root partition is full you'll have to get the information by running commands (old school style).
Please follow these steps:
Please go to the root directory:
cd /
Please run this command to get the size of the main subdirectories under the root partition:
sudo du -aBM -d 1 . | sort -nr | head -20
NOTE: Identify which directory is eating your root partition.
Please run this command to get a full list of the files and its sizes:
du -k * | sort -nr | cut -f2 | xargs -d '\n' du -sh
NOTE: The above command will display all files and directories too fast, so in order to scroll down slowly, please run the same command piping it into "less":
du -k * | sort -nr | cut -f2 | xargs -d '\n' du -sh | less
Press the spacebar to scroll down.
Please keep in mind that you have to go to the directory you want to analyze before running the commands in step 3 or 4 (just in case you want to analyze another directory).
In addition to this, you can run the command "apt-get clean" to clear the downloaded packages (.deb files), which usually consume a significant part of your disk.
b) To resize your instance, you have 2 options:
Resize your VM instance "primary-server" by following this guide[1].
NOTE: The steps included in this option are pretty easy to follow; if this doesn't work, try the second option, which requires advanced Linux abilities.
Create a snapshot from the VM instance "primary-server".
2.1 Create a new instance based on a Linux distribution.
2.2 Once it's been created, stop the instance.
2.3 Follow this guide to add an additional disk[2].
NOTE: Basically you have to edit the instance "primary-server" and add an additional disk; don't forget to select the snapshot option from the "Source type" list and click on the snapshot you just created.
2.4 Start the instance.
2.5 Mount the disk by following this guide[3].
NOTE: Please skip step 4. The additional disk is actually a boot disk, so it has already been formatted. Don't apply a format to it; just mount it (see the command sketch after this list).
2.6 Check the permissions of the file "/etc/fstab".
NOTE: The permissions should be "-rw-r--r--" and the owner "root".
2.6.1 Delete files to reduce the disk size.
2.7 Unmount the disk at OS level.
2.8 Stop the instance.
2.9 Detach the additional disk from the new instance in GCP.
NOTE: Please follow this guide[4] and instead of clicking on X next to the boot disk, please click on X next to the additional disk.
2.10 Create a new instance and instead of using an image in the "boot disk" section, please use the disk you just restored.
NOTE: For this please go to the "Boot disk" section and click on the "Change" button, then go to the "Existing" TAB and select the disk you just restored.
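For steps 2.5, 2.6.1 and 2.7, the mount / cleanup / unmount part looks roughly like the following. The device name is an assumption (check lsblk after attaching the disk), and since the restored disk carries the LVM root described in the question, the volume group has to be activated first:
sudo lsblk -f                                     # find the attached disk and its partitions
sudo vgchange -ay                                 # activate the LVM volume group on the attached disk, if present
sudo mkdir -p /mnt/restored
sudo mount /dev/techredo-vg/root /mnt/restored    # or the plain partition, e.g. /dev/sdc1
# delete files under /mnt/restored to free space (step 2.6.1), then:
sudo umount /mnt/restored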
REFERENCES:
[1] https://cloud.google.com/compute/docs/disks/add-persistent-disk#inaccessible_instance
[2] https://cloud.google.com/compute/docs/disks/add-persistent-disk#create_disk
[3] https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
[4] https://cloud.google.com/compute/docs/disks/detach-reattach-boot-disk#detach_disk
Please let me know the results.

Docker container: MongoDb Insufficient free space for journal files

I am running MongoDB inside a Docker (version 1.10.1, on OSX) container and it is giving this error:
MongoDb Insufficient free space for journal files
I am not able to find out whether the issue is on the host, in the container, or in VirtualBox.
However, on my host I have:
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1 465Gi 75Gi 389Gi 17% 19777401 102066309 16% /
And on the docker container:
Filesystem Inodes IUsed IFree IUse% Mounted on
none 1218224 742474 475750 61% /
I have also mounted a volume from the host with:
docker run -it -v /Users/foobar/Projects/compose:/data/db mongoImage:0.1 /bin/bash
Thanks for the comments @andy, the issue did seem to be within the VirtualBox environment.
I was able to resolve the issue by:
backing up all Docker images
cloning the default VirtualBox VM (as a backup)
deleting the default VirtualBox VM and all associated files
restarting Docker, at which point a new default VirtualBox VM was created. This resolved the issue (which I expect to have again at some point).
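For anyone on the same Docker Toolbox / docker-machine setup, recreating the default machine with a larger virtual disk is roughly this (the size is in MB, and the flag assumes the VirtualBox driver):
docker-machine rm default
docker-machine create --driver virtualbox --virtualbox-disk-size "60000" default
eval $(docker-machine env default)       # point the docker client at the new machine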