Rook Ceph cluster: "mon c is low on available space" message - Kubernetes

I have set up a Rook 1.4 cluster in Kubernetes 1.18 with 3 nodes, each allocated 1TB of storage.
After creating the cluster, when I run ceph status, the cluster status shows HEALTH_WARN with "mon c is low on available space".
There is no data stored yet, so why does it report low available space? How do I clear this warning?
[root@rook-ceph-tools-6bdcd78654-sfjvl /]# ceph status
cluster:
id: ad42764d-aa28-4da5-a828-2d87205aff08
health: HEALTH_WARN
mon c is low on available space
services:
mon: 3 daemons, quorum a,b,c (age 37m)
mgr: a(active, since 36m)
osd: 3 osds: 3 up (since 37m), 3 in (since 37m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 3.6 TiB / 3.6 TiB avail
pgs: 1 active+clean
All three nodes have the same storage size:
sdb 8:16 0 1.2T 0 disk
└─ceph--a6cd601d--7584--4b1f--bf82--48c95437f351-osd--data--ae1bc856--8ded--4b1e--8c87--30ca0f0959a3 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--ccaf7144--d6a0--441c--bcd5--6a09d056bd7a-osd--data--36a9b28c--7207--400a--936b--edfb3255ce0b 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--53e9b8a9--8925--4b21--a6ea--f8e17a322d5c-osd--data--6b1e779c--a18a--4e4d--960e--73ca9473d02f 253:3 0 1.2T 0 lvm
Thanks
SR

This alert concerns the disk space for your monitor's data, which is normally stored in /var/lib/ceph/mon. That path lives on the root filesystem, which is unrelated to your OSDs' block devices. The warning is raised when this path has less than 30% available space (see mon_data_avail_warn, which defaults to 30).
You can either change that threshold to silence the alert, or resize the filesystem behind that path so the monitor's RocksDB data has more room.
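To see why an empty cluster can still trip this warning, here is a minimal Python sketch of the threshold check as described above: compare the available percentage of the filesystem that holds the mon data with mon_data_avail_warn (30 by default). The function names are hypothetical, the sketch assumes the mon compares available space against the filesystem's total size, and which path to check depends on where your deployment keeps the mon store.

```python
import shutil

GiB = 1024 ** 3


def avail_percent(total_bytes: int, avail_bytes: int) -> float:
    """Percentage of the filesystem that is still available."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * avail_bytes / total_bytes


def mon_space_warning(total_bytes: int, avail_bytes: int, warn_pct: int = 30) -> bool:
    """True when available space is below warn_pct (assumed to mirror mon_data_avail_warn)."""
    return avail_percent(total_bytes, avail_bytes) < warn_pct


def check_path(path: str, warn_pct: int = 30) -> bool:
    """Apply the same check to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # .free is the space available to unprivileged users
    return mon_space_warning(usage.total, usage.free, warn_pct)


if __name__ == "__main__":
    # Hypothetical 50 GiB root filesystem: the OSD disks play no part in this check.
    print(mon_space_warning(50 * GiB, 12 * GiB))  # True  (24% available)
    print(mon_space_warning(50 * GiB, 20 * GiB))  # False (40% available)
    print(check_path("/"))
```

In other words, a cluster with terabytes of free OSD space can still warn if the root filesystem of the node running mon c is more than 70% full.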

As Seena explained, the warning appears because the available space is below 30%. In this case, you can compact the mon data with the following command:
ceph tell mon.`hostname -s` compact
Alternatively, you can have the mon compact its data at startup: add the following mon option to ceph.conf, then restart the mon.
[mon]
mon compact on start = true

Related

CentOS Partition Resize

I'm struggling to resize a CentOS partition on a server. I found some steps, but I'm not sure which situation applies to me or what the correct approach is, and I definitely cannot afford to mess this up.
The space should already be available, but as far as I can tell the partition has not been resized.
The goal is to extend the partition /dev/sdb1 from 197GB to 1TB.
Below are the lsblk, df -h and fdisk -l results, which show my current situation.
[ ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 3.7G 0 part [SWAP]
└─sda3 8:3 0 45.3G 0 part /
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part /var/www/vhosts
sdc 8:32 0 50G 0 disk
└─sdc1 8:33 0 50G 0 part /var/lib/psa
sr0 11:0 1 680M 0 rom
[ ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 12M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda3 45G 7.0G 36G 17% /
/dev/sda1 976M 135M 775M 15% /boot
/dev/sdc1 50G 53M 47G 1% /var/lib/psa
/dev/sdb1 197G 126G 62G 68% /var/www/vhosts
tmpfs 1.6G 0 1.6G 0% /run/user/0
[ ~]# fdisk -l
Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0009c4b4
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2099199 1048576 83 Linux
/dev/sda2 2099200 9910271 3905536 82 Linux swap / Solaris
/dev/sda3 9910272 104855551 47472640 83 Linux
Disk /dev/sdb: 1099.5 GB, 1099511627776 bytes, 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x8e948ef1
Device Boot Start End Blocks Id System
/dev/sdb1 2048 2147483647 1073740800 83 Linux
Disk /dev/sdc: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x7677284e
Device Boot Start End Blocks Id System
/dev/sdc1 2048 104857599 52427776 83 Linux
I found the following answer on an external page, but I'm not familiar with the commands and can't tell whether it's the right way to go (I can paste the URL if that's allowed). The partition paths have not been updated to match mine.
There are three steps:
alter your partition table so sda2 ends at the end of the disk
reread the partition table (this will require a reboot)
resize your LVM PV using pvresize
Step 1 - Partition table: Run fdisk /dev/sda. Issue p to print your current partition table and copy that output to some safe place. Now issue d followed by 2 to remove the second partition. Issue n to create a new second partition. Make sure its start equals the start of the second partition in the table you printed earlier. Make sure the end is at the end of the disk (usually the default). Issue t followed by 2 followed by 8e to change the partition type of your new second partition to 8e (Linux LVM).
Issue p to review your new partition layout and make sure the start of the new second partition is exactly where the old second partition was.
If everything looks right, issue w to write the partition table to disk. You will get an error message from partprobe saying that the partition table couldn't be reread (because the disk is in use).
Step 2 - Reboot your system: This step is necessary so that the partition table gets re-read.
Step 3 - Resize the LVM PV: After your system has rebooted, invoke pvresize /dev/sda2. Your physical LVM volume will now span the rest of the drive, and you can create or extend logical volumes into that space.
My question is: is that the right way to increase the partition size on a CentOS system without losing any data?
Thank you
As you can see, the partition
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part /var/www/vhosts
is already 1TB, so what you need to extend is the filesystem. You can check its type with df -T /var/www/vhosts. If the filesystem is ext4, run resize2fs against the device (it does not accept a mount point):
resize2fs /dev/sdb1
If the filesystem is XFS, run xfs_growfs with the mount point:
xfs_growfs /var/www/vhosts
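The reading of the output can be double-checked with a little arithmetic on the fdisk -l listing: fdisk's Start and End columns are inclusive sector numbers, so a partition's size is (End - Start + 1) × 512 bytes. A minimal Python sketch using only numbers copied from the output in the question (the helper name is illustrative):

```python
SECTOR_BYTES = 512  # "Sector size (logical/physical): 512 bytes / 512 bytes"
GiB = 1024 ** 3


def partition_bytes(start: int, end: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Size of a partition whose fdisk Start and End sectors are both inclusive."""
    return (end - start + 1) * sector_bytes


disk_bytes = 1099511627776  # "Disk /dev/sdb: 1099.5 GB, 1099511627776 bytes"
sdb1_bytes = partition_bytes(2048, 2147483647)

print(sdb1_bytes)                  # 1099510579200
print(sdb1_bytes // 1024)          # 1073740800, matches the Blocks column (1 KiB units)
print(round(sdb1_bytes / GiB, 3))  # 1023.999
print(disk_bytes - sdb1_bytes)     # 1048576: only the 1 MiB before sector 2048 is unused
```

Since the partition already covers the whole disk while df -h reports only 197G, the filesystem, not the partition, is what needs to grow.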

How to use ceph to store large amount of small data

I set up a CephFS cluster on virtual machines and want to use it to store a batch of image data (1.4G in total, each image about 8KB). The cluster keeps two copies and has 12G of total space. But when I write data to it, the system reports that there is not enough available space. How can I solve this? The cluster details are as follows:
Cluster Information:
cluster:
id: 891fb1a7-df35-48a1-9b5c-c21d768d129b
health: HEALTH_ERR
1 MDSs report slow metadata IOs
1 MDSs report slow requests
1 full osd(s)
1 nearfull osd(s)
2 pool(s) full
Degraded data redundancy: 46744/127654 objects degraded (36.618%), 204 pgs degraded
Degraded data redundancy (low space): 204 pgs recovery_toofull
too many PGs per OSD (256 > max 250)
clock skew detected on mon.node2, mon.node3
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node2(active), standbys: node1, node3
mds: cephfs-1/1/1 up {0=node1=up:active}, 2 up:standby
osd: 3 osds: 2 up, 2 in
data:
pools: 2 pools, 256 pgs
objects: 63.83k objects, 543MiB
usage: 10.6GiB used, 1.40GiB / 12GiB avail
pgs: 46744/127654 objects degraded (36.618%)
204 active+recovery_toofull+degraded
52 active+clean
Cephfs Space Usage:
[root@node1 0]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/nlas-root xfs 36G 22G 14G 62% /
devtmpfs devtmpfs 2.3G 0 2.3G 0% /dev
tmpfs tmpfs 2.3G 0 2.3G 0% /dev/shm
tmpfs tmpfs 2.3G 8.7M 2.3G 1% /run
tmpfs tmpfs 2.3G 0 2.3G 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 178M 837M 18% /boot
tmpfs tmpfs 2.3G 28K 2.3G 1% /var/lib/ceph/osd/ceph-0
tmpfs tmpfs 471M 0 471M 0% /run/user/0
192.168.152.3:6789,192.168.152.4:6789,192.168.152.5:6789:/ ceph 12G 11G 1.5G 89% /mnt/test
Ceph OSD:
[root@node1 mnt]# ceph osd pool ls
cephfs_data
cephfs_metadata
[root@node1 mnt]# ceph osd pool get cephfs_data size
size: 2
[root@node1 mnt]# ceph osd pool get cephfs_metadata size
size: 2
ceph.dir.layout:
[root@node1 mnt]# getfattr -n ceph.dir.layout /mnt/test
getfattr: Removing leading '/' from absolute path names
# file: mnt/test
ceph.dir.layout="stripe_unit=65536 stripe_count=1 object_size=4194304 pool=cephfs_data"
When storing small files, you need to watch the minimum allocation size. Until the Nautilus release, this defaulted to 16K for SSDs and 64K for HDDs, but in the newer Ceph Pacific release the default minimum allocation has been tuned to 4K for both.
I suggest you use Pacific, or manually tune Octopus to the same numbers if that's the version you installed.
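A rough Python sketch of why this matters for the workload in the question. It assumes 1.4 GiB of 8 KiB files, each stored as a single RADOS object (the directory layout's 4 MiB object_size means an 8 KiB file is not split), 2 replicas, and a simplified allocator that rounds every object up to a whole number of min_alloc_size units. RocksDB metadata, the CephFS metadata pool and other overheads are ignored, and the helper names are illustrative.

```python
import math

KiB = 1024
GiB = 1024 ** 3


def allocated_per_copy(obj_bytes: int, min_alloc: int) -> int:
    """Simplified model: BlueStore allocates whole min_alloc units per object."""
    return math.ceil(obj_bytes / min_alloc) * min_alloc


def raw_usage(n_objects: int, obj_bytes: int, min_alloc: int, replicas: int) -> int:
    """Raw bytes consumed across all replicas."""
    return n_objects * allocated_per_copy(obj_bytes, min_alloc) * replicas


file_bytes = 8 * KiB
n_files = int(1.4 * GiB) // file_bytes  # about 183,500 files

for min_alloc in (64 * KiB, 16 * KiB, 4 * KiB):
    used = raw_usage(n_files, file_bytes, min_alloc, replicas=2)
    print(f"min_alloc={min_alloc // KiB:>2}K -> {used / GiB:.1f} GiB raw")
```

Under this model, OSDs deployed with the 64K HDD default would need roughly 22 GiB of raw space for the data set, far more than the 12 GiB cluster, while at 4K it fits in under 3 GiB.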
You also want to use replication (as opposed to erasure coding) if your files are smaller than, or only a small multiple of, the minimum allocation size, because each EC chunk is also rounded up to the minimum allocation size and would otherwise waste slack space. You already made the right choice by using replication; I mention it only because you may be tempted by EC's touted space-saving properties, which unfortunately do not apply to small files.
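The erasure-coding point can be illustrated with the same simplified allocation model. This hedged Python sketch compares one 8 KiB object in a size-2 replicated pool against a hypothetical k=2, m=1 EC profile (both survive the loss of one OSD). It ignores EC stripe_unit padding and all metadata, so treat the numbers as illustrative only.

```python
import math

KiB = 1024


def round_up(n: int, unit: int) -> int:
    """Round n up to a whole number of allocation units."""
    return math.ceil(n / unit) * unit


def replicated_alloc(obj_bytes: int, min_alloc: int, size: int) -> int:
    """Every replica stores the whole object, rounded up to min_alloc."""
    return size * round_up(obj_bytes, min_alloc)


def ec_alloc(obj_bytes: int, min_alloc: int, k: int, m: int) -> int:
    """k data chunks plus m coding chunks, each rounded up to min_alloc.

    Simplified: ignores EC stripe_unit padding and metadata.
    """
    chunk = math.ceil(obj_bytes / k)
    return (k + m) * round_up(chunk, min_alloc)


obj = 8 * KiB
for min_alloc in (64 * KiB, 4 * KiB):
    rep = replicated_alloc(obj, min_alloc, size=2)
    ec = ec_alloc(obj, min_alloc, k=2, m=1)
    print(f"min_alloc={min_alloc // KiB}K: replicated x2 = {rep // KiB}K, EC 2+1 = {ec // KiB}K")
```

With a 64K allocation unit, each 4 KiB EC chunk, including the parity chunk, costs a full 64K, so EC uses more space than replication (192K vs 128K). EC only comes out ahead (12K vs 16K) once each chunk is at least a full allocation unit.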
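You need to set bluestore_min_alloc_size to 4096; before Pacific, the HDD default is 64 KiB. Note that this value is fixed when an OSD is created, so it only applies to OSDs deployed after the change, and existing OSDs have to be redeployed: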
[osd]
bluestore_min_alloc_size = 4096
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096

Why my new Ceph cluster status never shows 'HEALTH_OK'?

I'm setting up a Ceph cluster with Docker using the image 'ceph/daemon:v3.1.0-stable-3.1-luminous-centos-7'. But after the cluster has been set up, ceph status never reaches HEALTH_OK. Here is my cluster's information. It has enough disk space and the network is fine.
My questions are:
Why does Ceph not replicate the 'undersized' placement groups (PGs)?
How do I fix it?
Thank you very much!
➜ ~ ceph -s
cluster:
id: 483a61c4-d3c7-424d-b96b-311d2c6eb69b
health: HEALTH_WARN
Degraded data redundancy: 3 pgs undersized
services:
mon: 3 daemons, quorum pc-10-10-0-13,pc-10-10-0-89,pc-10-10-0-160
mgr: pc-10-10-0-89(active), standbys: pc-10-10-0-13, pc-10-10-0-160
mds: cephfs-1/1/1 up {0=pc-10-10-0-160=up:active}, 2 up:standby
osd: 5 osds: 5 up, 5 in
rbd-mirror: 3 daemons active
rgw: 3 daemons active
data:
pools: 6 pools, 68 pgs
objects: 212 objects, 5.27KiB
usage: 5.02GiB used, 12.7TiB / 12.7TiB avail
pgs: 65 active+clean
3 active+undersized
➜ ~ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 12.73497 root default
-5 0.90959 host pc-10-10-0-13
3 hdd 0.90959 osd.3 up 1.00000 1.00000
-7 0.90959 host pc-10-10-0-160
4 hdd 0.90959 osd.4 up 1.00000 1.00000
-3 10.91579 host pc-10-10-0-89
0 hdd 3.63860 osd.0 up 1.00000 1.00000
1 hdd 3.63860 osd.1 up 1.00000 1.00000
2 hdd 3.63860 osd.2 up 1.00000 1.00000
➜ ~ ceph osd pool ls detail
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application cephfs
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 27 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 30 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 32 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 13 pgp_num 13 last_change 34 flags hashpspool stripe_width 0 application rgw
@itsafire This is not a solution. He is asking for a solution, not for hardware recommendations.
I'm running multiple Ceph clusters with 8 and 5 nodes. I always use 2 replicas with multiple CRUSH maps (for SSD, SAS and 72k drives).
Why would you need 3 replicas in a small cluster with limited resources?
Could you please explain why my solution is a recipe for disaster? You have a good reputation, and I'm not sure how you got it. Maybe just by replying with recommendations rather than solutions.
Create a new pool with size 2 and min_size 1.
For pg-num use Ceph PG Calculator https://ceph.com/pgcalc/
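For a rough idea of what a PG calculator does: the older rule of thumb from the Ceph placement-group documentation, as I recall it, is (OSDs × 100) / pool size, rounded up to a power of two and shared across all pools. The PG calculator additionally weights each pool by its expected share of data, which this Python sketch does not model. It also computes the average PGs per OSD for the pools listed in the question; the helper names are illustrative.

```python
import math


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def suggested_total_pgs(num_osds: int, pool_size: int, target_per_osd: int = 100) -> int:
    """Rule of thumb: (OSDs * target) / size, rounded up to a power of two."""
    return next_power_of_two(math.ceil(num_osds * target_per_osd / pool_size))


def pgs_per_osd(pg_nums, pool_size: int, num_osds: int) -> float:
    """Average PG replicas per OSD (all pools assumed to share pool_size)."""
    return sum(pg_nums) * pool_size / num_osds


# From `ceph osd pool ls detail` and `ceph osd tree` above: 6 pools, size 3, 5 OSDs.
pools = [8, 8, 13, 13, 13, 13]
print(suggested_total_pgs(5, 3))  # 256
print(pgs_per_osd(pools, 3, 5))   # 40.8
```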
It seems you created a three-node cluster with different OSD configurations and sizes. The standard CRUSH rule tells Ceph to place 3 copies of each PG on different hosts. If there is not enough space to spread the PGs over the three hosts, your cluster will never become healthy.
It is always a good idea to start with a set of equally sized hosts (RAM, CPU, OSDs).
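To make the sizing point concrete: with 3 replicas, a host failure domain and exactly three hosts, every host must hold one copy of every PG, so the smallest host caps usable capacity. A back-of-the-envelope Python sketch using the CRUSH weights from the ceph osd tree output above (CRUSH weights are in TiB by default); it ignores full ratios and overhead, and the helper name is illustrative.

```python
host_weights = {
    "pc-10-10-0-13": 0.90959,
    "pc-10-10-0-160": 0.90959,
    "pc-10-10-0-89": 10.91579,
}


def usable_capacity(weights: dict, size: int) -> float:
    """Usable capacity when each of exactly `size` hosts stores one full copy."""
    if len(weights) != size:
        raise ValueError("this simple model needs exactly `size` hosts")
    return min(weights.values())


raw = sum(host_weights.values())
usable = usable_capacity(host_weights, size=3)
print(f"raw {raw:.5f} TiB, usable {usable:.5f} TiB ({100 * usable / raw:.1f}% of raw)")
```

Only about 7% of the raw capacity is usable in this layout, and roughly 10 TiB on pc-10-10-0-89 would sit idle. This illustrates the imbalance; it is not offered as the full explanation for the undersized PGs.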
Update for discussion about cluster with size of 2 vs 3
Don't use 2 replicas. Go for 3. Ceph started out with a default size of 2, but this was changed to 3 in Ceph 0.82 (Firefly release).
Why? Because if one drive fails, you are left with only one drive containing your data. Should that drive also fail while recovery is running, your data is gone for good.
See this thread on the ceph-users mailing list:
2 replicas isn't safe, no matter how big or small the cluster is. With disks becoming larger, recovery times will grow. In that window you don't want to run on a single replica.

Ceph Luminous: rbd map hangs forever

I'm running a single-node Ceph cluster and using the Ceph client from another node. QEMU works fine with RBD. When I try to map an RBD block device on the client, it hangs indefinitely with no output. How can I diagnose what's wrong?
The system is Ubuntu 16.04 Server with Ceph Luminous.
sudo ceph tell osd.* version
{
"version": "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)"
}
ceph -s
cluster:
id: 4bfcc109-e432-4ac0-ba9d-bf81243aea
health: HEALTH_OK
services:
mon: 1 daemons, quorum gcmaster
mgr: gcmaster(active)
osd: 1 osds: 1 up, 1 in
data:
pools: 1 pools, 128 pgs
objects: 1512 objects, 5879 MB
usage: 7356 MB used, 216 GB / 223 GB avail
pgs: 128 active+clean
rbd info gcbase
rbd image 'gcbase':
size 512 MB in 128 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.376974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Fri Dec 29 17:58:02 2017
This hangs forever
rbd map gcbase --pool rbd
As does this
rbd map typo_gcbase --pool rbd
dmesg shows
Dec 29 13:27:32 cephclient1 kernel: [85798.195468] libceph: mon0 192.168.1.55:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
Dec 29 13:27:32 cephclient1 kernel: [85798.222070] libceph: mon0 192.168.1.55:6789 missing required protocol features
The dmesg output tells what's going on: The cluster requires a feature bit that is not supported by the libceph kernel module.
The feature bit in question is either CEPH_FEATURE_CRUSH_TUNABLES5, CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING or CEPH_FEATURE_FS_FILE_LAYOUT_V2 (they are overlapping because they were introduced at the same time) which only became available on kernel 4.5, whereas Ubuntu 16.04 uses a 4.4 kernel.
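The dmesg line can be decoded by hand: the "missing" value is the set of feature bits the server requires but the client does not advertise. A small Python sketch (the helper name is illustrative) shows it is a single bit, bit 58, which to my knowledge is the bit that Ceph's feature table shares among CRUSH_TUNABLES5, NEW_OSDOPREPLY_ENCODING and FS_FILE_LAYOUT_V2.

```python
def missing_feature_bits(client: int, server: int) -> list:
    """Bit positions the server requires that the client does not advertise."""
    missing = server & ~client
    return [bit for bit in range(64) if (missing >> bit) & 1]


client = 0x106b84a842a42    # "my 106b84a842a42"
server = 0x40106b84a842a42  # "server's 40106b84a842a42"

print(hex(server & ~client))                 # 0x400000000000000
print(missing_feature_bits(client, server))  # [58]
```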
A similar question (although related to CephFS) came up on the mailing list with a possible solution:
Yes, you should be able to set your CRUSH tunables profile to hammer with "ceph osd crush tunables hammer".
This will disable some features, but should make the older kernel compatible with the cluster.
Alternatively you could upgrade to a mainline kernel or to a newer OS release.

Ceph enters degraded state after Deis installation

I have successfully upgraded Deis to v1.0.1 on a 3-node cluster hosted on DigitalOcean, with 2GB of RAM per node.
I then nse'd into a deis-store-monitor service, ran ceph -s, and found that the cluster had entered the active+undersized+degraded state and never gets back to active+clean.
Detailed output:
root@deis-2:/# ceph -s
libust[276/276]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
cluster dfa09ba0-66f2-46bb-8d84-12795f281f7d
health HEALTH_WARN 1536 pgs degraded; 1536 pgs stuck unclean; 1536 pgs undersized; recovery 1314/3939 objects degraded (33.359%)
monmap e3: 3 mons at {deis-1=10.132.183.190:6789/0,deis-2=10.132.183.191:6789/0,deis-3=10.132.183.192:6789/0}, election epoch 28, quorum 0,1,2 deis-1,deis-2,deis-3
mdsmap e32: 1/1/1 up {0=deis-1=up:active}, 2 up:standby
osdmap e77: 3 osds: 2 up, 2 in
pgmap v109093: 1536 pgs, 12 pools, 897 MB data, 1313 objects
27342 MB used, 48256 MB / 77175 MB avail
1314/3939 objects degraded (33.359%)
1536 active+undersized+degraded
client io 817 B/s wr, 0 op/s
I am totally new to Ceph. I wonder:
Is this a serious problem, or can I leave the cluster in this state?
If it should be fixed, could you point out how to go about it?
I read the Ceph troubleshooting section and the POOL, PG AND CRUSH CONFIG REFERENCE, but I still have no idea what to do next.
Thanks a lot!
From this output: osdmap e77: 3 osds: 2 up, 2 in. It sounds like one of your deis-store-daemons isn't responding. deisctl restart store-daemon should recover your cluster, but I'd be curious about what happened to that daemon. I'd love to see journalctl --no-pager -u deis-store-daemon on all of your hosts. If you could add your logs to https://github.com/deis/deis/issues/2520 that'd help us figure out why the daemon isn't responding.
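The degraded percentage itself points at the missing OSD. A quick Python check, assuming (as the numbers suggest, since 3939 is exactly 1313 × 3) that the pools keep 3 copies and that each of the 3 OSDs holds roughly one copy of everything:

```python
objects = 1313          # "1313 objects" in the pgmap line
pool_size = 3           # inferred: 3939 / 1313 == 3
osds_total, osds_up = 3, 2

expected_copies = objects * pool_size
degraded_copies = 1314  # "recovery 1314/3939 objects degraded"

print(expected_copies)                                      # 3939
print(f"{100 * degraded_copies / expected_copies:.3f}%")    # 33.359%
print(f"{100 * (osds_total - osds_up) / osds_total:.3f}%")  # 33.333%
```

The degraded share is essentially one third, which is what losing one of three OSDs would produce, consistent with a single store daemon being down.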
Also, 2GB nodes on DO will likely result in performance issues (and Ceph may be unhappy).