Ceph luminous rbd map hangs forever

I'm running a 1-node Ceph cluster and using the ceph-client from another node. Qemu works fine with the RBD mounting. When I try to map an RBD block device on the ceph-client, I get an indefinite hang with no output. How do I diagnose what's wrong?
The system is Ubuntu 16.04 server with Ceph Luminous.
sudo ceph tell osd.* version
{
"version": "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)"
}
ceph -s
cluster:
id: 4bfcc109-e432-4ac0-ba9d-bf81243aea
health: HEALTH_OK
services:
mon: 1 daemons, quorum gcmaster
mgr: gcmaster(active)
osd: 1 osds: 1 up, 1 in
data:
pools: 1 pools, 128 pgs
objects: 1512 objects, 5879 MB
usage: 7356 MB used, 216 GB / 223 GB avail
pgs: 128 active+clean
rbd info gcbase
rbd image 'gcbase':
size 512 MB in 128 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.376974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Fri Dec 29 17:58:02 2017
This hangs forever
rbd map gcbase --pool rbd
As does this
rbd map typo_gcbase --pool rbd
dmesg shows
Dec 29 13:27:32 cephclient1 kernel: [85798.195468] libceph: mon0 192.168.1.55:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
Dec 29 13:27:32 cephclient1 kernel: [85798.222070] libceph: mon0 192.168.1.55:6789 missing required protocol features

The dmesg output shows what's going on: the cluster requires a feature bit that the libceph kernel module does not support.
The feature bit in question is either CEPH_FEATURE_CRUSH_TUNABLES5, CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING or CEPH_FEATURE_FS_FILE_LAYOUT_V2 (they are overlapping because they were introduced at the same time) which only became available on kernel 4.5, whereas Ubuntu 16.04 uses a 4.4 kernel.
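The mismatch can be checked by hand from the two hex masks in the dmesg line. Here is a minimal Python sketch (the function name is made up for illustration) that lists the bit positions the server requires but the client lacks:

```python
def missing_feature_bits(client_mask, server_mask):
    """Return the bit positions set in server_mask but not in client_mask."""
    missing = server_mask & ~client_mask
    return [bit for bit in range(missing.bit_length()) if (missing >> bit) & 1]


# Values copied from the dmesg line above.
mine = 0x106b84a842a42
servers = 0x40106b84a842a42

print(hex(servers & ~mine))                 # the "missing" mask
print(missing_feature_bits(mine, servers))  # the bit positions behind it
```

With the values from the log, the only missing bit is bit 58, which matches the `missing 400000000000000` part of the kernel message.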
A similar question (although related to CephFS) came up on the mailing list with a possible solution:
Yes, you should be able to set your CRUSH tunables profile to hammer
with "ceph osd crush tunables hammer".
This will disable some features, but should make the older kernel compatible with the cluster.
Alternatively you could upgrade to a mainline kernel or to a newer OS release.

Ceph PGs not deep scrubbed in time keep increasing

I noticed this about 4 days ago and don't know what to do right now. The problem is as follows:
I have a 6-node, 3-monitor Ceph cluster with 84 OSDs: 72 x 7200 rpm spinning disks and 12 x NVMe SSDs for journaling. All scrub configuration values are the defaults. Every PG in the cluster is active+clean, and every cluster stat is green. Yet the number of PGs not deep-scrubbed in time keeps increasing, and it is at 96 right now. Output from ceph -s:
cluster:
id: xxxxxxxxxxxxxxxxx
health: HEALTH_WARN
1 large omap objects
96 pgs not deep-scrubbed in time
services:
mon: 3 daemons, quorum mon1,mon2,mon3 (age 6h)
mgr: mon2(active, since 2w), standbys: mon1
mds: cephfs:1 {0=mon2=up:active} 2 up:standby
osd: 84 osds: 84 up (since 4d), 84 in (since 3M)
rgw: 3 daemons active (mon1, mon2, mon3)
data:
pools: 12 pools, 2006 pgs
objects: 151.89M objects, 218 TiB
usage: 479 TiB used, 340 TiB / 818 TiB avail
pgs: 2006 active+clean
io:
client: 1.3 MiB/s rd, 14 MiB/s wr, 93 op/s rd, 259 op/s wr
How do I solve this problem? Also, the ceph health detail output shows that these non-deep-scrubbed PG alerts started on January 25th, but I didn't notice them before. I noticed them when an OSD went down for 30 seconds and came back up. Might that be related to this issue? Will it just resolve itself? Should I tamper with the scrub configuration? For example, how much performance loss might I face on the client side if I increase osd_max_scrubs from 1 to 2?
Usually the cluster deep-scrubs itself during low I/O intervals on the cluster. By default, every PG has to be deep-scrubbed once a week. If OSDs go down, their PGs can't be deep-scrubbed, of course, and this could cause some delay.
You could run something like this to see which PGs are behind and if they're all on the same OSD(s):
ceph pg dump pgs | awk '{print $1" "$23}' | column -t
Sort the output if necessary, and you can issue a manual deep-scrub on one of the affected PGs to see if the number decreases and if the deep-scrub itself works.
ceph pg deep-scrub <PG_ID>
Also please add ceph osd pool ls detail to see if any flags are set.
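To check whether the late PGs cluster on the same OSD(s), the timestamps can also be grouped programmatically. The following is only a sketch: the record layout (`pgid`, `last_deep_scrub`, `acting`) is an assumption made for the example, the sample data is invented, and you would have to fill the list from your own `ceph pg dump` output.

```python
from collections import Counter
from datetime import datetime, timedelta


def overdue_pgs(pgs, now, max_age=timedelta(days=7)):
    """Return (pgid, last_deep_scrub) pairs older than max_age, oldest first."""
    late = [(p["pgid"], p["last_deep_scrub"]) for p in pgs
            if now - p["last_deep_scrub"] > max_age]
    return sorted(late, key=lambda item: item[1])


def overdue_per_osd(pgs, now, max_age=timedelta(days=7)):
    """Count, per OSD id, how many overdue PGs have that OSD in their acting set."""
    counts = Counter()
    for p in pgs:
        if now - p["last_deep_scrub"] > max_age:
            counts.update(p["acting"])
    return counts


# Invented sample records, not real cluster output.
now = datetime(2020, 2, 10, 12, 0)
sample = [
    {"pgid": "1.a", "last_deep_scrub": datetime(2020, 1, 25, 3, 0), "acting": [4, 17, 30]},
    {"pgid": "1.b", "last_deep_scrub": datetime(2020, 2, 8, 3, 0), "acting": [4, 20, 41]},
    {"pgid": "1.c", "last_deep_scrub": datetime(2020, 1, 30, 3, 0), "acting": [4, 22, 55]},
]

print(overdue_pgs(sample, now))
print(overdue_per_osd(sample, now).most_common(3))
```

If one OSD id dominates the counter, that OSD is a good place to start looking.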
You can set the deep scrub period to 2 weeks to stretch the deep scrub window.
Instead of
osd_deep_scrub_interval = 604800
use:
osd_deep_scrub_interval = 1209600
Mr. Eblock has a good idea: manually force a deep scrub on some of the PGs, to spread the work evenly across the 2 weeks.
You have 2 options:
Increase the interval between deep scrubs.
Control deep scrubbing manually with a standalone script.
I've written a simple PHP script which takes care of deep scrubbing for me: https://gist.github.com/ethaniel/5db696d9c78516308b235b0cb904e4ad
It lists all the PGs and picks 1 PG whose last deep scrub was more than 2 weeks ago (the script takes the oldest one). It then checks that the OSDs the PG sits on are not being used for another scrub (that they are in the active+clean state), and only then starts a deep scrub on that PG. Otherwise it goes looking for another PG.
I have osd_max_scrubs set to 1 (otherwise OSD daemons start crashing due to a bug in Ceph), so this script works nicely with the regular scheduler - whichever starts the scrubbing on a PG-OSD first, wins.
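The selection logic described above can be sketched in Python roughly as follows. This is not a port of the linked PHP script, just an illustration of the same idea. The record fields (`pgid`, `last_deep_scrub`, `state`, `acting`) and the sample data are assumptions made for the example.

```python
from datetime import datetime, timedelta


def pick_pg_to_deep_scrub(pgs, now, max_age=timedelta(days=14)):
    """Return the pgid of the oldest overdue PG whose OSDs are idle, or None."""
    # OSDs that already take part in a scrub right now.
    busy = {osd for p in pgs if "scrubbing" in p["state"] for osd in p["acting"]}
    overdue = sorted(
        (p for p in pgs if now - p["last_deep_scrub"] > max_age),
        key=lambda p: p["last_deep_scrub"],
    )
    for p in overdue:
        if p["state"] == "active+clean" and busy.isdisjoint(p["acting"]):
            return p["pgid"]
    return None


# Invented sample records, not real cluster output.
now = datetime(2020, 3, 1)
sample = [
    {"pgid": "2.0", "last_deep_scrub": datetime(2020, 2, 1),
     "state": "active+clean", "acting": [1, 2, 3]},
    {"pgid": "2.1", "last_deep_scrub": datetime(2020, 2, 5),
     "state": "active+clean", "acting": [4, 5, 6]},
    {"pgid": "2.2", "last_deep_scrub": datetime(2020, 2, 28),
     "state": "active+clean+scrubbing+deep", "acting": [3, 7, 8]},
]

print(pick_pg_to_deep_scrub(sample, now))
```

In the sample, the oldest PG (2.0) shares OSD 3 with a PG that is currently scrubbing, so the next oldest one (2.1) is chosen. The returned id would then be passed to `ceph pg deep-scrub`.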

how to rejoin Mon and mgr Ceph to cluster

I have this situation and can't access the Ceph dashboard. I had 5 mons, but 2 of them went down. One of them is the bootstrap mon node, which also has the mgr, and I got this from that node:
2020-10-14T18:59:46.904+0330 7f9d2e8e9700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
cluster:
id: e97c1944-e132-11ea-9bdd-e83935b1c392
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum srv4,srv5,srv6 (age 2d)
mgr: no daemons active (since 2d)
mds: heyatfs:1 {0=heyfs.srv10.lxizhc=up:active} 1 up:standby
osd: 54 osds: 54 up (since 47h), 54 in (since 3w)
task status:
scrub status:
mds.heyfs.srv10.lxizhc: idle
data:
pools: 3 pools, 65 pgs
objects: 223.95k objects, 386 GiB
usage: 1.2 TiB used, 97 TiB / 98 TiB avail
pgs: 65 active+clean
io:
client: 105 KiB/s rd, 328 KiB/s wr, 0 op/s rd, 0 op/s wr
To tell the whole story: I used cephadm to create my cluster, and I'm very new to Ceph. I have 15 servers; 14 of them have OSD containers, 5 of them had a mon, and my bootstrap mon, srv2, has the mgr.
2 of these servers have public IPs, and I used one of them as a client (I know this setup raises a lot of questions, but my company forces me to do it this way, and I'm new to Ceph, so this is how it is for now). 2 weeks ago I lost 2 OSDs and asked the datacenter that provides these servers to replace those 2 HDDs. They restarted those servers, and unfortunately those servers were my mon servers. After the restart one of them, srv5, came back, but I could see that srv3 was out of quorum.
So I started trying to fix this and ran these commands in ceph shell --fsid ...
ceph orch apply mon srv3
ceph mon remove srv3
After a while I saw in my dashboard that srv2, my bootstrap mon and mgr, was not working. When I ran ceph -s, srv2 wasn't there, and I can see the srv2 mon in the removed directory:
root@srv2:/var/lib/ceph/e97c1944-e132-11ea-9bdd-e83935b1c392# ls
crash crash.srv2 home mgr.srv2.xpntaf osd.0 osd.1 osd.2 osd.3 removed
mgr.srv2.xpntaf is still running, but unfortunately I have now lost access to the Ceph dashboard.
I tried to add srv2 and srv3 to the monmap with
ceph orch daemon add mon srv2:172.32.X.3
history | grep dump
ceph mon dump
ceph -s
ceph mon dump
ceph mon add srv3 172.32.X.4:6789
and now
root@srv2:/# ceph -s
cluster:
id: e97c1944-e132-11ea-9bdd-e83935b1c392
health: HEALTH_WARN
no active mgr
2/5 mons down, quorum srv4,srv5,srv6
services:
mon: 5 daemons, quorum srv4,srv5,srv6 (age 16h), out of quorum: srv2, srv3
mgr: no daemons active (since 2d)
mds: heyatfs:1 {0=heyatfs.srv10.lxizhc=up:active} 1 up:standby
osd: 54 osds: 54 up (since 2d), 54 in (since 3w)
task status:
scrub status:
mds.heyatfs.srv10.lxizhc: idle
data:
pools: 3 pools, 65 pgs
objects: 223.95k objects, 386 GiB
usage: 1.2 TiB used, 97 TiB / 98 TiB avail
pgs: 65 active+clean
io:
client: 105 KiB/s rd, 328 KiB/s wr, 0 op/s rd, 0 op/s wr
I should add that ceph orch host ls doesn't work: it hangs when I run it, and I think that's because of the no active mgr error. Also, when I look in the removed directory, mon.srv2 is there with a unit.run file, so I used the command from that file to run the container again, but it says mon.srv2 isn't in the monmap and doesn't have a specific IP. By the way, after ceph orch apply mon srv3 I could see a new container with a new fsid on the srv3 server.
I know my whole problem is because I ran this command: ceph orch apply mon srv3
because the installation documentation says:
To deploy monitors on a specific set of hosts:
# ceph orch apply mon *<host1,host2,host3,...>*
Be sure to include the first (bootstrap) host in this list.
and I didn't see that line!
Now I have managed to get another mgr running, but I get this:
root@srv2:/var/lib/ceph/mgr# ceph -s
2020-10-15T13:11:59.080+0000 7f957e9cd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
cluster:
id: e97c1944-e132-11ea-9bdd-e83935b1c392
health: HEALTH_ERR
1 stray daemons(s) not managed by cephadm
2 mgr modules have failed
2/5 mons down, quorum srv4,srv5,srv6
services:
mon: 5 daemons, quorum srv4,srv5,srv6 (age 20h), out of quorum: srv2, srv3
mgr: srv4(active, since 8m)
mds: heyatfs:1 {0=heyatfs.srv10.lxizhc=up:active} 1 up:standby
osd: 54 osds: 54 up (since 2d), 54 in (since 3w)
task status:
scrub status:
mds.heyatfs.srv10.lxizhc: idle
data:
pools: 3 pools, 65 pgs
objects: 301.77k objects, 537 GiB
usage: 1.6 TiB used, 97 TiB / 98 TiB avail
pgs: 65 active+clean
io:
client: 180 KiB/s rd, 597 B/s wr, 0 op/s rd, 0 op/s wr
And when I run ceph orch host ls, I see this:
root@srv2:/var/lib/ceph/mgr# ceph orch host ls
HOST ADDR LABELS STATUS
srv10 172.32.x.11
srv11 172.32.x.12
srv12 172.32.x.13
srv13 172.32.x.14
srv14 172.32.x.15
srv15 172.32.x.16
srv2 srv2
srv3 172.32.x.4
srv4 172.32.x.5
srv5 172.32.x.6
srv6 172.32.x.7
srv7 172.32.x.8
srv8 172.32.x.9
srv9 172.32.x.10

Rookio Ceph cluster : mon c is low on available space message

I have set up a RookIO 1.4 cluster in Kubernetes 1.18 with 3 nodes, each allocated 1TB of storage.
After creating the cluster, when I run ceph status, the cluster status shows HEALTH_WARN with mon c is low on available space.
There is no data stored yet. Why does the status show low on available space? How do I clear this error?
[root@rook-ceph-tools-6bdcd78654-sfjvl /]# ceph status
cluster:
id: ad42764d-aa28-4da5-a828-2d87205aff08
health: HEALTH_WARN
mon c is low on available space
services:
mon: 3 daemons, quorum a,b,c (age 37m)
mgr: a(active, since 36m)
osd: 3 osds: 3 up (since 37m), 3 in (since 37m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 3.6 TiB / 3.6 TiB avail
pgs: 1 active+clean
All three nodes have the same size storage:
sdb 8:16 0 1.2T 0 disk
└─ceph--a6cd601d--7584--4b1f--bf82--48c95437f351-osd--data--ae1bc856--8ded--4b1e--8c87--30ca0f0959a3 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--ccaf7144--d6a0--441c--bcd5--6a09d056bd7a-osd--data--36a9b28c--7207--400a--936b--edfb3255ce0b 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--53e9b8a9--8925--4b21--a6ea--f8e17a322d5c-osd--data--6b1e779c--a18a--4e4d--960e--73ca9473d02f 253:3 0 1.2T 0 lvm
Thanks
SR
This alert is about the disk space for your monitor's data, which is normally stored in /var/lib/ceph/mon. This path is on the root filesystem, which isn't related to your OSD block devices. The warning is raised when this path has less than 30% available space (see mon_data_avail_warn, which is 30 by default).
You can change that threshold to silence the alert, or resize the filesystem holding that path to give its RocksDB data more space.
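To see how close a filesystem is to that threshold, the available percentage can be computed directly. This is a rough sketch of the kind of check described above, not the monitor's actual code. The function names are made up, and the path you pass in is up to you, since the directory holding the mon data depends on your deployment.

```python
import shutil


def avail_percent(total_bytes, free_bytes):
    """Percentage of the filesystem that is still available."""
    return 100.0 * free_bytes / total_bytes


def is_low_on_space(path, warn_percent=30.0):
    """True if the filesystem holding path has less than warn_percent available."""
    usage = shutil.disk_usage(path)
    return avail_percent(usage.total, usage.free) < warn_percent


# A 100 GiB root filesystem with 25 GiB free would trigger the warning.
print(avail_percent(100 * 2**30, 25 * 2**30))
```

Calling `is_low_on_space("/")` on the node that hosts the affected mon (or on whatever path actually holds its data) shows whether that filesystem is below the default 30%.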
As Seena explained, the warning appears because the available space is below 30%. In this case, you could compact the mon data with the following command:
ceph tell mon.`hostname -s` compact
Another way to trigger compaction of the mon data is to add this mon option to ceph.conf and then restart the mon:
[mon]
mon compact on start = true

Ran out of Docker disk space

I have this Docker command:
docker run -d mongo
This will pull and run a MongoDB server in a Docker container.
However, I get an error:
no space left on device
I am on macOS, using one of the newer versions of Docker, which use HyperKit instead of VirtualBox (I think that's correct).
Here is the exact error message from the mongo container:
$ docker logs efee16702c5756659d563b98d4ae0f58ecf1f1bba8a54f63443c0ae4b520ab4e
about to fork child process, waiting until server is ready for connections.
forked process: 21
2017-05-04T20:23:51.412+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2017-05-04T20:23:51.430+0000 I CONTROL [main] ERROR: Cannot write pid file to /tmp/tmp.Lo035QkbfL: No space left on device
ERROR: child process failed, exited with error number 1
Any idea how to fix this and prevent it from happening in future?
As suggested, the output of df -h is:
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1 465Gi 116Gi 349Gi 25% 1963838 4293003441 0% /
devfs 183Ki 183Ki 0Bi 100% 634 0 100% /dev
map -hosts 0Bi 0Bi 0Bi 100% 0 0 100% /net
map auto_home 0Bi 0Bi 0Bi 100% 0 0 100% /home
Output of docker info is:
$ docker info
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 741
Server Version: 17.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.13-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952 GiB
Name: moby
ID: OR4L:WYWW:FFAP:IDX3:B6UK:O2AN:UVTO:EPH6:GYSV:4GV4:L5WP:BQTH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 17
Goroutines: 30
System Time: 2017-05-04T20:45:27.056157913Z
EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
As you state in the comments to the question, ls -altrh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 returns the following:
-rw-r--r--@ 1 alexamil staff 53G
This is a known bug on macOS (and not only there), and an official dev comment can be found here. There is one catch: I have read that different people hit different size limits. In the comment it is 64 GB, but for another person it was 20 GB.
There are a couple of workarounds, but no definite solution that I could find.
The manual one
Run docker ps -a and manually remove all unused containers. Then run docker images and remove manually all the intermediate and unused images.
The simplest one
Delete the Docker.qcow2 file entirely. But you will lose all images and containers. Completely.
The less simple
Another way is to run docker volume prune, which will remove all unused volumes.
The resizing one (keeps the data)
Another idea that comes to me is to expand the disk image size with QEMU or something like it:
$ brew install qemu
$ /Applications/Docker.app/Contents/MacOS/qemu-img resize ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 +5G
After expanding the image, you will need to boot a VM and run GParted against Docker.qcow2 to grow the partition into the added space. You could use the GParted Live ISO for that:
$ qemu-system-x86_64 -drive file=Docker.qcow2 -m 512 -cdrom ~/Downloads/gparted-live.iso -boot d -device usb-mouse -usb
Some people report this either doesn't work or doesn't help.
Yet another resizing one (wipes the data)
Create a substitute image with desired size (120G):
$ qemu-img create -f qcow2 ~/data.qcow2 120G
$ cp ~/data.qcow2 /Applications/Docker.app/Contents/Resources/moby/data.qcow2
$ rm ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
data.qcow2 is copied to ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 when you restart docker.
This workaround comes from this comment.
Hope this helps. Good luck!

Ceph enters degraded state after Deis installation

I have successfully upgraded Deis to v1.0.1 on a 3-node cluster hosted by Digital Ocean, with each node having 2GB of RAM.
I then nse'ed into a deis-store-monitor service, ran ceph -s, and realized it has entered the active+undersized+degraded state and never gets back to the active+clean state.
Detailed messages follow:
root@deis-2:/# ceph -s
libust[276/276]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
cluster dfa09ba0-66f2-46bb-8d84-12795f281f7d
health HEALTH_WARN 1536 pgs degraded; 1536 pgs stuck unclean; 1536 pgs undersized; recovery 1314/3939 objects degraded (33.359%)
monmap e3: 3 mons at {deis-1=10.132.183.190:6789/0,deis-2=10.132.183.191:6789/0,deis-3=10.132.183.192:6789/0}, election epoch 28, quorum 0,1,2 deis-1,deis-2,deis-3
mdsmap e32: 1/1/1 up {0=deis-1=up:active}, 2 up:standby
osdmap e77: 3 osds: 2 up, 2 in
pgmap v109093: 1536 pgs, 12 pools, 897 MB data, 1313 objects
27342 MB used, 48256 MB / 77175 MB avail
1314/3939 objects degraded (33.359%)
1536 active+undersized+degraded
client io 817 B/s wr, 0 op/s
I am totally new to ceph. I wonder:
Is it a big deal to fix this issue, or could I let it be in this state?
If it is recommended to fix this, would you point out how should I go about it?
I read the Ceph troubleshooting section and the POOL, PG AND CRUSH CONFIG REFERENCE, but I still have no idea what I should do next.
Thanks a lot!
From this output: osdmap e77: 3 osds: 2 up, 2 in. It sounds like one of your deis-store-daemons isn't responding. deisctl restart store-daemon should recover your cluster, but I'd be curious about what happened to that daemon. I'd love to see journalctl --no-pager -u deis-store-daemon on all of your hosts. If you could add your logs to https://github.com/deis/deis/issues/2520 that'd help us figure out why the daemon isn't responding.
Also, 2GB nodes on DO will likely result in performance issues (and Ceph may be unhappy).
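The numbers in the status output can be read off with a few lines of Python. This is only an illustration: the function name is made up, and the regular expression assumes the old-style `osdmap` summary line shown in the question.

```python
import re


def parse_osdmap(line):
    """Extract OSD counts from a line like 'osdmap e77: 3 osds: 2 up, 2 in'."""
    m = re.search(r"(\d+) osds: (\d+) up, (\d+) in", line)
    if m is None:
        raise ValueError("not an osdmap summary line")
    total, up, in_ = map(int, m.groups())
    return {"total": total, "up": up, "in": in_, "down": total - up}


summary = parse_osdmap("osdmap e77: 3 osds: 2 up, 2 in")
print(summary)

# 1314 of 3939 object copies are degraded, per the health line in the question.
print(f"{1314 / 3939:.3%}")
```

One OSD out of three is down, and roughly one third of the object copies are degraded. 3939 is exactly 3 x 1313, which is consistent with three copies of each of the 1313 objects (an inference from the numbers, not something the output states), so the degraded share is what you would expect from a single missing daemon.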