ceph-deploy for CentOS 7 does not find ceph-02:/dev/sdb file - ceph

I am trying to activate a Ceph OSD with the following command:
ceph-deploy osd prepare ceph-02:/dev/sdb
and got the following error:
[ceph-02][WARNIN] OSError: [Errno 2] No such file or directory: '/dev/sdb'
[ceph-02][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-disk -v
prepare --cluster ceph --fs-type xfs -- /dev/sdb
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs

Step 1:
parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
Step 2:
reboot
Step 3:
mkfs.xfs /dev/sdb -f
ceph-deploy osd create --data /dev/sdb server-hostname
This worked; I tested these commands.
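If the prepare step still complains that /dev/sdb does not exist, it is worth confirming the device is actually visible on the OSD node first. A minimal check, assuming passwordless SSH to ceph-02 (which ceph-deploy already requires):
# list block devices directly on the OSD node
ssh ceph-02 lsblk
# or let ceph-deploy itself report the disks it can see on that node
ceph-deploy disk list ceph-02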

Related

Using newly added firewall service in rpm spec script fails

I have an RPM package that adds a new firewalld service and wants to enable that service during install. However, this fails with "Error: INVALID_SERVICE":
$ dnf localinstall -y firewall-spec-test-0.0.1-1.fc35.x86_64.rpm
Last metadata expiration check: 1:29:06 ago on Fri 27 May 2022 01:20:48 CEST.
Dependencies resolved.
==============================================================================
Package Arch Version Repository Size
==============================================================================
Installing:
firewall-spec-test x86_64 0.0.1-1.fc35 @commandline 7.2 k
Transaction Summary
==============================================================================
Install 1 Package
Total size: 7.2 k
Installed size: 164
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : firewall-spec-test-0.0.1-1.fc35.x86_64 1/1
Running scriptlet: firewall-spec-test-0.0.1-1.fc35.x86_64 1/1
Error: INVALID_SERVICE: 'dummy' not among existing services
Verifying : firewall-spec-test-0.0.1-1.fc35.x86_64 1/1
Installed:
firewall-spec-test-0.0.1-1.fc35.x86_64
Complete!
The dummy.xml file is
<?xml version="1.0" encoding="utf-8"?>
<service>
<description>dummy service</description>
<short>dummy</short>
<port port="1234" protocol="udp"/>
</service>
and the spec file I have trimmed down to for testing is:
Name: firewall-spec-test
Version: 0.0.1
Release: 1%{?dist}
Summary: ...
License: GPLv3
URL: https://stackoverflow.com/q/...
Source0: dummy.xml
BuildRequires: systemd-rpm-macros
Requires: firewalld
%description
...
%prep
cp %{SOURCE0} .
%build
%install
mkdir -p ${RPM_BUILD_ROOT}%{_sysconfdir}/firewalld/services
cp -a dummy.xml ${RPM_BUILD_ROOT}%{_sysconfdir}/firewalld/services
# https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax
%post
if [ $1 == 1 ]
then
# First time install
firewall-cmd --permanent --zone=public --add-service=dummy
firewall-cmd --reload --quiet
fi
exit 0
%preun
if [ $1 == 0 ]
then
# Complete uninstall
firewall-cmd --permanent --zone=public --remove-service=dummy
firewall-cmd --reload --quiet
fi
exit 0
%files
%defattr(-,root,root,-)
%config(noreplace) %{_sysconfdir}/firewalld/services/*
%changelog
...
So how do I get firewall to use the new service?
So apparently firewalld needs an initial reload in order to pick up the added service definition file:
--- firewall-spec-test.spec.fail 2022-05-27 02:58:34.747351419 +0200
+++ firewall-spec-test.spec 2022-05-27 02:59:13.925280222 +0200
@@ -25,6 +25,7 @@
if [ $1 == 1 ]
then
# First time install
+ firewall-cmd --reload --quiet # In order for firewall-cmd to pick up the added service file
firewall-cmd --permanent --zone=public --add-service=dummy
firewall-cmd --reload --quiet
fi
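After installing the patched package, a quick sanity check that the service definition was picked up and applied (assuming the default public zone, as in the scriptlet):
# should print the service definition instead of "Error: INVALID_SERVICE"
firewall-cmd --info-service=dummy
# and the zone's runtime configuration should now include it
firewall-cmd --zone=public --list-services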

Replacing disk while retaining osd id

In a ceph cluster, how do we replace failed disks while keeping the osd id(s)?
Here are the steps followed (unsuccessful):
# 1 destroy the failed osd(s)
for i in 38 41 44 47; do ceph osd destroy $i --yes-i-really-mean-it; done
# 2 create the new ones that take the previous osd ids
ceph orch apply osd -i replace.yaml
# Scheduled osd.ceph_osd_ssd update...
The replace.yaml:
service_type: osd
service_id: ceph_osd_ssd # "ceph_osd_hdd" for hdd
placement:
hosts:
- storage01
data_devices:
paths:
- /dev/sdz
- /dev/sdaa
- /dev/sdab
- /dev/sdac
osd_id_claims:
storage01: ['38', '41', '44', '47']
But nothing happens; the osd ids still show as destroyed and the devices have no osd ids assigned.
# ceph -s
cluster:
id: db2b7dd0-1e3b-11eb-be3b-40a6b721faf4
health: HEALTH_WARN
failed to probe daemons or devices
5 daemons have recently crashed
I also tried to run this
ceph orch daemon add osd storage01:/dev/sdaa
which gives:
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1177, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 141, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 318, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 103, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 92, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 713, in _daemon_add_osd
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 643, in raise_if_exception
raise e
RuntimeError: cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/ceph-authtool --gen-print-key
INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new daca7735-179b-4443-acef-412bc39865e3
INFO:cephadm:/bin/podman:stderr Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-daca7735-179b-4443-acef-412bc39865e3 ceph-0a533319-def2-4fbe-82f5-e76f971b7f48
INFO:cephadm:/bin/podman:stderr stderr: Calculated size of logical volume is 0 extents. Needs to be larger.
INFO:cephadm:/bin/podman:stderr --> Was unable to complete a new OSD, will rollback changes
INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.38 --yes-i-really-mean-it
INFO:cephadm:/bin/podman:stderr stderr: purged osd.38
INFO:cephadm:/bin/podman:stderr --> RuntimeError: command returned non-zero exit status: 5
Traceback (most recent call last):
File "<stdin>", line 5204, in <module>
File "<stdin>", line 1116, in _infer_fsid
File "<stdin>", line 1199, in _infer_image
File "<stdin>", line 3322, in command_ceph_volume
File "<stdin>", line 878, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=storage01 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -v /var/run/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/run/ceph:z -v /var/log/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/log/ceph:z -v /var/lib/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmp3vjwl32x:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpclrbifgb:/var/lib/ceph/bootstrap-osd/ceph.keyring:z --entrypoint /usr/sbin/ceph-volume docker.io/ceph/ceph:v15 lvm prepare --bluestore --data /dev/sdaa --no-systemd
zapping the devices also errors:
ceph orch device zap storage01 /dev/sdaa --force
Error EINVAL: Zap failed: INFO:cephadm:/bin/podman:stderr --> Zapping: /dev/sdaa
INFO:cephadm:/bin/podman:stderr --> Zapping lvm member /dev/sdaa. lv_path is /dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae
INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae bs=1M count=10 conv=fsync
INFO:cephadm:/bin/podman:stderr stderr: dd: fsync failed for '/dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae': Input/output error
INFO:cephadm:/bin/podman:stderr stderr: 10+0 records in
INFO:cephadm:/bin/podman:stderr 10+0 records out
INFO:cephadm:/bin/podman:stderr 10485760 bytes (10 MB, 10 MiB) copied, 0.00846806 s, 1.2 GB/s
INFO:cephadm:/bin/podman:stderr --> RuntimeError: command returned non-zero exit status: 1
Traceback (most recent call last):
File "<stdin>", line 5203, in <module>
File "<stdin>", line 1115, in _infer_fsid
File "<stdin>", line 1198, in _infer_image
File "<stdin>", line 3321, in command_ceph_volume
File "<stdin>", line 877, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=storage01 -v /var/run/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/run/ceph:z -v /var/log/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/log/ceph:z -v /var/lib/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm --entrypoint /usr/sbin/ceph-volume docker.io/ceph/ceph:v15 lvm zap --destroy /dev/sdaa
Here is the relevant documentation for this:
https://docs.ceph.com/en/latest/cephadm/drivegroups/#drivegroups
https://docs.ceph.com/en/latest/mgr/orchestrator_modules/#osd-replacement
https://docs.ceph.com/en/latest/mgr/orchestrator/#orchestrator-cli-placement-spec
https://github.com/ceph/ceph/blob/892c51dd3c5f7108e766bea30cd5e0d801b0abd3/src/python-common/ceph/deployment/drive_group.py#L26
https://github.com/ceph/ceph/blob/892c51dd3c5f7108e766bea30cd5e0d801b0abd3/src/python-common/ceph/deployment/drive_group.py#L38
https://github.com/ceph/ceph/blob/892c51dd3c5f7108e766bea30cd5e0d801b0abd3/src/python-common/ceph/deployment/drive_group.py#L161
The zap was failing on the leftover ceph logical volume, so remove the old LV and VG by hand first:
lvremove /dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae -y
vgremove ceph-0a533319-def2-4fbe-82f5-e76f971b7f48
Do this for each of the devices, then rerun the zaps:
for i in '/dev/sdz' '/dev/sdaa' '/dev/sdab' '/dev/sdac'; do ceph orch device zap storage01 $i --force; done
and finally re-apply the spec:
ceph orch apply osd -i replace.yaml
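Once the spec is re-applied, one way to confirm that the new OSDs really claimed the old ids (output will obviously differ per cluster):
# the destroyed ids should come back as regular OSDs in the tree
ceph osd tree | grep -E 'osd\.(38|41|44|47)'
# and the orchestrator should show a running osd daemon for each of them on the host
ceph orch ps storage01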

loop device setup (losetup, mount, etc) fails in a container immediately after host reboot

I'm trying to populate a disk image in a container environment (podman) on Centos 8. I had originally run into issues with accessing the loop device from the container until finding on SO and other sources that I needed to run podman as root and with the --privileged option.
While this did solve my problem in general, I noticed that after rebooting my host, my first attempt to set up a loop device in the container would fail (failed to set up loop device: No such file or directory), but after exiting and relaunching the container it would succeed (/dev/loop0). If for some reason I needed to set up a second loop device (/dev/loop1) in the container (after having gotten a first one working), it too would fail until I exited and relaunched the container.
Experimenting a bit further, I found that if, on the host, I ran losetup --find --show <file created with dd> enough times to attach the maximum number of loop devices I would need and then detached them all with losetup -D, I could avoid the loop device errors in the container entirely.
I suspect I'm missing something obvious about what losetup does on the host which it is apparently not able to do entirely within a container, or maybe this is more specifically a Centos+podman+losetup issue. Any insight as to what is going on and why I have to preattach/detach the loop devices after a reboot to avoid problems inside my container?
Steps to reproduce on a Centos 8 system (after having attached/detached once following a reboot):
$ dd if=/dev/zero of=file bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.00826706 s, 1.3 GB/s
$ cp file 1.img
$ cp file 2.img
$ cp file 3.img
$ cp file 4.img
$ sudo podman run -it --privileged --rm -v .:/images centos:8 bash
[root@2da5317bde3e /]# cd images
[root@2da5317bde3e images]# ls
1.img 2.img 3.img 4.img file
[root@2da5317bde3e images]# losetup --find --show 1.img
/dev/loop0
[root@2da5317bde3e images]# losetup --find --show 2.img
losetup: 2.img: failed to set up loop device: No such file or directory
[root@2da5317bde3e images]# losetup -D
[root@2da5317bde3e images]# exit
exit
$ sudo podman run -it --privileged --rm -v .:/images centos:8 bash
[root@f9e41a21aea4 /]# cd images
[root@f9e41a21aea4 images]# losetup --find --show 1.img
/dev/loop0
[root@f9e41a21aea4 images]# losetup --find --show 2.img
/dev/loop1
[root@f9e41a21aea4 images]# losetup --find --show 3.img
losetup: 3.img: failed to set up loop device: No such file or directory
[root@f9e41a21aea4 /]# losetup -D
[root@f9e41a21aea4 images]# exit
exit
$ sudo podman run -it --privileged --rm -v .:/images centos:8 bash
[root@c93cb71b838a /]# cd images
[root@c93cb71b838a images]# losetup --find --show 1.img
/dev/loop0
[root@c93cb71b838a images]# losetup --find --show 2.img
/dev/loop1
[root@c93cb71b838a images]# losetup --find --show 3.img
/dev/loop2
[root@c93cb71b838a images]# losetup --find --show 4.img
losetup: 4.img: failed to set up loop device: No such file or directory
I know it's a little old, but I've stumbled across a similar problem and here is what I've discovered:
After my VM boots up it does not have any loop devices configured, which is fine on its own, since mount can create additional devices if needed. But:
It seems that docker puts an overlay over /dev, so the container won't see any changes made in /dev after the container was started. So even if mount requested new loop devices to be created, and they actually were created on the host, my running container won't see them and the mount fails because no loop device is available.
Once you restart the container it will pick up the new entries in /dev, see the loop devices, and mount successfully, until it runs out of them and has to request more again.
So what I tried (and it seems to work) was passing /dev to docker as a volume mount, like this:
docker run -v /dev:/dev -it --rm <image> <command>
and it did work. If you still have this setup, I was wondering if you could try it too, to see if it helps.
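Since the question uses podman rather than docker, the equivalent invocation (untested sketch; podman accepts the same -v host:container syntax) would be:
sudo podman run -it --privileged --rm -v /dev:/dev -v .:/images centos:8 bash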
The only other method I can think of, beyond what you've already found, is to create the /dev/loop devices yourself on boot. Something like this should work:
modprobe loop  # This may not be necessary, depending on your kernel build, but is harmless.
major=$(grep loop /proc/devices | cut -c3)  # block-device major number of the loop driver
for i in 0 1 2 3 4 5
do
mknod /dev/loop$i b $major $i  # "b" = block device node
done
Put this in /etc/rc.local, your system's equivalent, or otherwise arrange for it to run on boot.
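On a systemd-based host like CentOS 8, where rc.local may not be enabled by default, one possible alternative is a small oneshot unit. This is only a sketch, assuming the snippet above has been saved as a hypothetical /usr/local/sbin/create-loop-devices.sh:
cat > /etc/systemd/system/create-loop-devices.service <<'EOF'
[Unit]
Description=Pre-create /dev/loop* nodes so containers can see them

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/create-loop-devices.sh

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now create-loop-devices.service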

gcsfuse won't write to folder

I am trying to figure out why, if I don't need to do this step:
(Ubuntu before wily only) Add yourself to the fuse group, then log out and back in:
sudo usermod -a -G fuse $USER
exit
then I still can't write to files. I keep getting the following error:
Using mount point: /mnt/c/Users/russe/Documents/gstorage
Opening GCS connection...
Opening bucket...
Mounting file system...
daemonize.Run: readFromProcess: sub-process: mountWithArgs: mountWithConn: Mount: mount: running fusermount: exit status 1
stderr:
fusermount: fuse device not found, try 'modprobe fuse' first
I am using Ubuntu (on Windows App Store).
Even running:
sudo mount -t gcsfuse -o implicit_dirs,allow_other,uid=1000,gid=1000,key_file=/mnt/c/Users/russe/Documents/RadioMedia-ba86f56a2aa6.json radiomediapodcast gstorage
had an error:
Calling gcsfuse with arguments: --uid 1000 --gid 1000 --key-file /mnt/c/Users/russe/Documents/RadioMedia-ba86f56a2aa6.json -o rw --implicit-dirs -o allow_other radiomediapodcast /mnt/c/Users/russe/Documents/gstorage
Using mount point: /mnt/c/Users/russe/Documents/gstorage
Opening GCS connection...
Opening bucket...
Mounting file system...
daemonize.Run: readFromProcess: sub-process: mountWithArgs: mountWithConn: Mount: mount: running fusermount: exit status 1
stderr:
fusermount: fuse device not found, try 'modprobe fuse' first
running gcsfuse: exit status 1
The problem you are having might be caused by one of two things:
- Permissions on the OS after mounting. To solve this, mount your bucket with the following command:
sudo mount -t gcsfuse -o implicit_dirs,allow_other,uid=1000,gid=1000,key_file=<KEY_FILE>.json <BUCKET> <PATH>
- Permissions of your service account. To validate this, go to IAM & admin in the console and verify that the service account being used has the Storage Admin role.
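If you prefer to check the second point from the command line instead of the console, one possible approach (using the key file and bucket from the question; gcloud and gsutil must be installed) is to authenticate as that service account and attempt a write:
# authenticate as the same service account gcsfuse is using
gcloud auth activate-service-account --key-file=/mnt/c/Users/russe/Documents/RadioMedia-ba86f56a2aa6.json
# listing should work with viewer-level roles; the write only succeeds with a role
# that allows object creation (e.g. Storage Admin)
gsutil ls gs://radiomediapodcast
echo test | gsutil cp - gs://radiomediapodcast/permission-check.txt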

Docker healthcheck command not found inside container

I have the following Dockerfile
FROM mongo
COPY docker-healthcheck /usr/local/bin/
HEALTHCHECK CMD ["docker-healthcheck"]
docker-healthcheck looks like this:
#!/bin/bash
set -eo pipefail
host="$(hostname --ip-address || echo '127.0.0.1')"
if mongo --quiet "$host/test" --eval 'quit(db.runCommand({ ping: 1 }).ok ? 0 : 2)'; then
exit 0
fi
exit 1
Although this is exactly the same as this example, the healthcheck returns the following error:
"Output": "OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused \"exec: \\\"docker-healthcheck\\\": executable file not found in $PATH\": unknown"
Permissions were wrong on the docker-healthcheck file. They were 644 but need to be 755 for Docker to be able to execute it:
chmod 755 docker-healthcheck
After building the image again, the healthcheck started working.
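To confirm the fix, one way (the image and container names here are just placeholders) is to check the file mode baked into the rebuilt image and then watch the health status flip to healthy:
# the script should now be executable inside the image
docker run --rm my-mongo ls -l /usr/local/bin/docker-healthcheck
# start a container and poll its health state
docker run -d --name mongo-hc my-mongo
docker inspect --format '{{.State.Health.Status}}' mongo-hc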