Why doesn't ceph features match ceph versions after an upgrade?

Our Ceph cluster was upgraded a while ago from Luminous to Nautilus 14.2.8, but the process was evidently not completed, as shown for instance by the heterogeneous versions. Most likely ceph-deploy was used for the upgrade.
Judging by https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous, among the most obvious missing steps, the monitors at least were not restarted, and this important step was skipped:
# ceph osd require-osd-release nautilus
I've completed all the steps I could figure out from that page, and the cluster is healthy. But although the version is consistent across all nodes:
# ceph versions
...
"overall": {
"ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)": 20
}
}
# ceph features still only displays "release": "luminous" in every entry, while:
# ceph mon feature ls
all features
supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 3)
persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
required: [kraken,luminous,mimic,osdmap-prune,nautilus]
# ceph osd dump | grep min_compat_client
require_min_compat_client nautilus
min_compat_client jewel
This doesn't seem to impact the ability to run a Nautilus feature like decreasing the number of PGs.
Why doesn't # ceph features reflect the Nautilus version?
And could this have an impact when trying to upgrade the cluster further to Octopus, or Pacific (at least v16.2.7) when available?
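For what it's worth, ceph features emits JSON with a features bitmask, a release name, and a count for each group of daemons and clients. A small script like the following can summarize which release each connected group reports; the sample data below is made up, only its structure matches the command's real output:

```python
import json

# Sample shaped like `ceph features` JSON output (values here are invented)
sample = '''
{
  "mon": [{"features": "0x3ffddff8ffacffff", "release": "luminous", "num": 3}],
  "osd": [{"features": "0x3ffddff8ffacffff", "release": "luminous", "num": 12}],
  "client": [{"features": "0x3ffddff8ffacffff", "release": "luminous", "num": 5}]
}
'''

def releases_by_type(features_json: str) -> dict:
    """Map each daemon/client type to the sorted set of release names it reports."""
    data = json.loads(features_json)
    return {kind: sorted({entry["release"] for entry in entries})
            for kind, entries in data.items()}

print(releases_by_type(sample))
```

Feeding it the real output (ceph features --format json on newer releases, or the default output shown above) makes it easy to spot which group still reports an older release.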

Related

Ceph Mgr not responding to certain commands

We have a ceph cluster built with rook (2 mgrs, 3 mons, 2 mds per cephfs, 24 osds; rook 1.9.3, ceph 16.2.7, kubelet 1.24.1). Our operation requires constantly creating and deleting cephfilesystems. Over time we experienced issues with rook-ceph-mgr. A week or two after the cluster was built, rook-ceph-mgr failed to respond to certain ceph commands, like ceph osd pool autoscale-status and ceph fs subvolumegroup ls, while other commands, like ceph -s, worked fine. We had to restart rook-ceph-mgr to get it going. Now we have around 30 cephfilesystems and the issue happens more frequently.
We tried disabling the mgr modules dashboard, prometheus and iostat, set ceph progress off, and increased mgr_stats_period & mon_mgr_digest_period. That didn't help much; the issue happened again after one or two create/delete cycles.
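The mitigations listed above translate to commands roughly like the following; the module names come from the question, but the period values here are illustrative, not the ones actually used:

```shell
# Disable the mgr modules that were suspected of causing the hangs
ceph mgr module disable dashboard
ceph mgr module disable prometheus
ceph mgr module disable iostat

# Turn off the progress module's event tracking
ceph progress off

# Raise the mgr/mon digest periods (default is 5s; 15 is just an example)
ceph config set mgr mgr_stats_period 15
ceph config set mon mon_mgr_digest_period 15
```

These all require a reachable cluster, so they cannot be demonstrated standalone.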

How can I fix ceph commands hanging after a reboot?

I'm pretty new to Ceph, so I've included all my steps I used to set up my cluster since I'm not sure what is or is not useful information to fix my problem.
I have 4 CentOS 8 VMs in VirtualBox set up to teach myself how to bring up Ceph: 1 is a client and 3 are Ceph monitors. Each Ceph node has six 8 GB drives. Once I learned how the networking worked, it was pretty easy.
I set each VM to have a NAT (for downloading packages) and an internal network that I called "ceph-public". This network would be accessed by each VM on the 10.19.10.0/24 subnet. I then copied the ssh keys from each VM to every other VM.
I followed this documentation to install cephadm, bootstrap my first monitor, and added the other two nodes as hosts. Then I added all available devices as OSDs, created my pools, then created my images, then copied my /etc/ceph folder from the bootstrapped node to my client node. On the client, I ran rbd map mypool/myimage to mount the image as a block device, then used mkfs to create a filesystem on it, and I was able to write data and see the IO from the bootstrapped node. All was well.
Then, as a test, I shut down and restarted the bootstrapped node. When it came back up, I ran ceph status, but it just hung with no output. Every single ceph and rbd command now hangs, and I have no idea how to recover or properly reset my cluster.
Has anyone ever had the ceph command hang on their cluster, and what did you do to solve it?
Let me share a similar experience. Some time ago I also tried to run some tests on Ceph (Mimic, I think) and my VMs in VirtualBox acted very strangely, nothing compared to actual bare-metal servers, so please bear this in mind... the tests are not quite relevant.
As regards your problem, check the following:
have at least 3 monitors (an odd number). It's possible the hang is caused by a monitor election.
make sure the networking part is OK (separate VLANs for Ceph servers and clients)
DNS is resolving OK (you have added the server names to the hosts file)
...just my 2 cents...
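When every ceph and rbd command hangs, the client usually cannot reach a monitor quorum. A few hedged first diagnostic steps, assuming a cephadm deployment as in the question:

```shell
# Fail fast instead of hanging forever
ceph -s --connect-timeout 10

# On each monitor host, check whether the mon daemon actually came back up
sudo cephadm ls                     # lists the daemons cephadm manages on this host
sudo systemctl list-units 'ceph*'   # systemd units backing the containers

# Inspect a mon's recent log output (the daemon name varies per host/cluster)
sudo cephadm logs --name mon.$(hostname -s) | tail -n 50
```

If the mon container never started after the reboot, the systemd units and logs above will show why; these commands only make sense on an actual cluster host.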

Adding OSDs to Ceph with WAL+DB

I'm new to Ceph and am setting up a small cluster. I've set up five nodes and can see the available drives but I'm unsure on exactly how I can add an OSD and specify the locations for WAL+DB.
Maybe my Google-fu is weak but the only guides I can find refer to ceph-deploy which, as far as I can see, is deprecated. Guides which mention cephadm only mention adding a drive but not specifying the WAL+DB locations.
I want to add HDDs as OSDs and put the WAL and DB onto separate LVs on an SSD. How?!
It seems that for the more advanced cases, like using a dedicated WAL and/or DB, you have to use the concept of drivegroups.
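A hedged sketch of such a drivegroup (an OSD service specification for cephadm), assuming rotational HDDs for data and SSDs for the DB; the service_id and filename are arbitrary:

```shell
# Write an OSD service spec: HDDs become data devices, SSDs hold the DB
# (the WAL is colocated with the DB unless wal_devices is also specified)
cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: osd_hdd_ssd_db
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
EOF

# Hand the spec to the orchestrator
ceph orch apply -i osd_spec.yml
```

cephadm then creates the LVs on the SSD itself, one DB slice per HDD-backed OSD, so you don't have to carve them out by hand.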
If your Ceph version is Octopus (in which ceph-deploy is deprecated), I suppose you could try this:
sudo ceph-volume lvm create --bluestore --data /dev/data-device --block.db /dev/db-device
I built Ceph from source, but I think this method should be supported, and you could try
ceph-volume lvm create --help
to see more parameters.

Ceph Disaster Recovery Solution

I'm setting up a new cluster, and I would like to set up another cluster with sync enabled between the two; the second cluster would act as disaster recovery.
I'm new to Ceph and I'm not sure how to do it.
Disaster recovery in Ceph depends on how you are using Ceph and what exactly your requirements for disaster recovery are. Since Jewel there are two options.
For RBD you can use rbd-mirroring (http://docs.ceph.com/docs/mimic/rbd/rbd-mirroring/). Mirroring can be enabled per pool and will invoke the rbd-mirror daemon. With rbd-mirroring enabled, writes go into the RBD image journal first; after that the write is acknowledged to the client and then written to the image itself. The rbd-mirror daemon replays the image journal to your remote location.
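A hedged sketch of enabling pool-level journal-based mirroring; the pool, image, client, and cluster names below are examples, and the exact peer setup varies by release:

```shell
# On both clusters: mirror the whole pool in journal mode
rbd mirror pool enable mypool pool

# Journal-based mirroring requires the journaling feature on each image
# (journaling depends on exclusive-lock)
rbd feature enable mypool/myimage exclusive-lock journaling

# Register the peer cluster so rbd-mirror knows where to replay from
rbd mirror pool peer add mypool client.rbd-mirror-peer@remote
```

After that, an rbd-mirror daemon running on the recovery cluster pulls and replays the journals; these commands need two live clusters to demonstrate.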
Additionally, for radosgw, multisite is available: http://docs.ceph.com/docs/mimic/radosgw/multisite/.
For CephFS there is no such solution available yet.

Installing kubernetes on less ram

Is it possible to install Kubernetes with the kubeadm init command on a system that has less than 1 GB of RAM? I have tried, but the installation failed at kubeadm init.
As mentioned in the installation steps to be taken before you begin, you need to have:
a Linux-compatible system for the master and nodes
2 GB or more RAM per machine
network connectivity
swap disabled on every node
But going back to your question: it may be possible to run the installation process, but further usability is not possible. This configuration will not be stable.
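For completeness, kubeadm can be told to skip its preflight memory check, though as noted above the resulting cluster will not be stable, so this is only for experiments:

```shell
# Not recommended: bypass the preflight RAM check (the check's name is 'Mem')
sudo kubeadm init --ignore-preflight-errors=Mem
```

This only skips the check; kubelet and the control-plane components will still be starved for memory on a sub-1 GB machine.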