How to use Ceph to store a large amount of small data - ceph

I set up a CephFS cluster on my virtual machines and want to use it to store a batch of image data (1.4G in total, each image about 8KB). The cluster keeps two copies of the data and has 12G of available space in total. But when I write the data, the system reports that there is not enough available space. How can I solve this? The details of the cluster are as follows:
Cluster Information:
cluster:
id: 891fb1a7-df35-48a1-9b5c-c21d768d129b
health: HEALTH_ERR
1 MDSs report slow metadata IOs
1 MDSs report slow requests
1 full osd(s)
1 nearfull osd(s)
2 pool(s) full
Degraded data redundancy: 46744/127654 objects degraded (36.618%), 204 pgs degraded
Degraded data redundancy (low space): 204 pgs recovery_toofull
too many PGs per OSD (256 > max 250)
clock skew detected on mon.node2, mon.node3
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node2(active), standbys: node1, node3
mds: cephfs-1/1/1 up {0=node1=up:active}, 2 up:standby
osd: 3 osds: 2 up, 2 in
data:
pools: 2 pools, 256 pgs
objects: 63.83k objects, 543MiB
usage: 10.6GiB used, 1.40GiB / 12GiB avail
pgs: 46744/127654 objects degraded (36.618%)
204 active+recovery_toofull+degraded
52 active+clean
Cephfs Space Usage:
[root#node1 0]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/nlas-root xfs 36G 22G 14G 62% /
devtmpfs devtmpfs 2.3G 0 2.3G 0% /dev
tmpfs tmpfs 2.3G 0 2.3G 0% /dev/shm
tmpfs tmpfs 2.3G 8.7M 2.3G 1% /run
tmpfs tmpfs 2.3G 0 2.3G 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 178M 837M 18% /boot
tmpfs tmpfs 2.3G 28K 2.3G 1% /var/lib/ceph/osd/ceph-0
tmpfs tmpfs 471M 0 471M 0% /run/user/0
192.168.152.3:6789,192.168.152.4:6789,192.168.152.5:6789:/ ceph 12G 11G 1.5G 89% /mnt/test
Ceph OSD:
[root#node1 mnt]# ceph osd pool ls
cephfs_data
cephfs_metadata
[root#node1 mnt]# ceph osd pool get cephfs_data size
size: 2
[root#node1 mnt]# ceph osd pool get cephfs_metadata size
size: 2
ceph.dir.layout:
[root#node1 mnt]# getfattr -n ceph.dir.layout /mnt/test
getfattr: Removing leading '/' from absolute path names
# file: mnt/test
ceph.dir.layout="stripe_unit=65536 stripe_count=1 object_size=4194304 pool=cephfs_data"

When storing small files, you need to watch the minimum allocation size. Until the Nautilus release, this defaulted to 16k for SSD and 64k for HDD, but in Ceph Pacific the default minimum allocation has been tuned to 4k for both.
I suggest you use Pacific, or manually tune Octopus to the same numbers if that's the version you installed.
You also want to use replication (as opposed to Erasure Coding) if your files are smaller than a few multiples of the minimum allocation size, because each EC chunk is subject to the same minimum allocation and would otherwise waste slack space. You already made the right choice here by using replication; I am just mentioning it because you may be tempted by EC's touted space-saving properties -- which unfortunately do not apply to small files.
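To make the effect of the minimum allocation size concrete, here is a rough back-of-the-envelope sketch. It assumes every image is exactly 8 KiB and occupies one object, and it ignores metadata and other OSD overhead, so the numbers are only an estimate and not what Ceph itself reports. The function names are made up for this illustration.

```python
KIB = 1024

def allocated_bytes(file_size: int, min_alloc: int) -> int:
    """Round a file's size up to a whole number of allocation units."""
    units = -(-file_size // min_alloc)  # integer ceiling division
    return units * min_alloc

def raw_usage(n_files: int, file_size: int, min_alloc: int, replicas: int) -> int:
    """Raw bytes consumed across all replicas, ignoring metadata overhead."""
    return n_files * allocated_bytes(file_size, min_alloc) * replicas

N_OBJECTS = 63_830     # "63.83k objects" from the status output in the question
FILE_SIZE = 8 * KIB    # assumption: every image is exactly 8 KiB
REPLICAS = 2           # pool size reported by `ceph osd pool get ... size`

for min_alloc in (64 * KIB, 16 * KIB, 4 * KIB):
    used = raw_usage(N_OBJECTS, FILE_SIZE, min_alloc, REPLICAS)
    print(f"min_alloc={min_alloc // KIB:>2} KiB -> {used / 1024**3:.2f} GiB raw")
```

Under these assumptions, a 64 KiB allocation unit turns roughly 0.5 GiB of images into close to 8 GiB of raw usage, while a 4 KiB unit keeps it just under 1 GiB, which is in line with the 10.6GiB used for 543MiB of objects shown in the question.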

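You need to set bluestore_min_alloc_size to 4096; by default its value is 64KB. Note that this setting is applied when an OSD is created, so existing OSDs have to be redeployed before the new value takes effect.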
[osd]
bluestore_min_alloc_size = 4096
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096

CentOs Partition Resize

I'm struggling with resizing a CentOS partition on a server. I found some steps, but I'm not sure which situation I'm in or what the correct approach is, and I definitely cannot mess this up.
The space should already be available, but the partition is not resized as far as I can tell.
The goal is to extend the partition /dev/sdb1 from 197GB to 1TB.
Below are the "lsblk", "df -h" and "fdisk -l" results, which show my current situation.
[ ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 3.7G 0 part [SWAP]
└─sda3 8:3 0 45.3G 0 part /
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part /var/www/vhosts
sdc 8:32 0 50G 0 disk
└─sdc1 8:33 0 50G 0 part /var/lib/psa
sr0 11:0 1 680M 0 rom
[ ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 12M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda3 45G 7.0G 36G 17% /
/dev/sda1 976M 135M 775M 15% /boot
/dev/sdc1 50G 53M 47G 1% /var/lib/psa
/dev/sdb1 197G 126G 62G 68% /var/www/vhosts
tmpfs 1.6G 0 1.6G 0% /run/user/0
[ ~]# fdisk -l
Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0009c4b4
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2099199 1048576 83 Linux
/dev/sda2 2099200 9910271 3905536 82 Linux swap / Solaris
/dev/sda3 9910272 104855551 47472640 83 Linux
Disk /dev/sdb: 1099.5 GB, 1099511627776 bytes, 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x8e948ef1
Device Boot Start End Blocks Id System
/dev/sdb1 2048 2147483647 1073740800 83 Linux
Disk /dev/sdc: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x7677284e
Device Boot Start End Blocks Id System
/dev/sdc1 2048 104857599 52427776 83 Linux
I found this answer on an external page, but I'm not familiar with the commands and cannot tell if that's the right way to go (if allowed, I can paste the URL). The partition paths have not been updated to match mine.
There are three steps to make:
alter your partition table so sda2 ends at end of disk
reread the partition table (will require a reboot)
resize your LVM pv using pvresize
Step 1 - Partition table: Run fdisk /dev/sda. Issue p to print your current partition table and copy that output to some safe place. Now issue d followed by 2 to remove the second partition. Issue n to create a new second partition. Make sure the start equals the start of the partition table you printed earlier. Make sure the end is at the end of the disk (usually the default). Issue t followed by 2 followed by 8e to toggle the partition type of your new second partition to 8e (Linux LVM).
Issue p to review your new partition layout and make sure the start of the new second partition is exactly where the old second partition was.
If everything looks right, issue w to write the partition table to disk. You will get an error message from partprobe that the partition table couldn't be reread (because the disk is in use).
Step 2 - Reboot your system: This step is necessary so the partition table gets re-read.
Step 3 - Resize the LVM PV: After your system has rebooted, invoke pvresize /dev/sda2. Your physical LVM volume will now span the rest of the drive and you can create or extend logical volumes into that space.
The question is: is that the right way to increase the partition size without losing any data on it on a CentOS system?
Thank you
As you can see, the partition
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part /var/www/vhosts
is already 1TB, so you only need to extend the filesystem. If your filesystem is ext4, you can use the following command (resize2fs is documented as taking the device rather than the mount point):
resize2fs /dev/sdb1
If your filesystem is xfs, you can use the following command, which takes the mount point:
xfs_growfs /var/www/vhosts
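If you want to confirm that it really is the filesystem, and not the partition, that is too small before running either command, a minimal sketch along these lines may help. It assumes a Linux system where /sys/class/block/<name>/size reports the device size in 512-byte sectors; the helper names and the 5% slack are choices made for this illustration only.

```python
import os

def filesystem_bytes(mount_point: str) -> int:
    """Total size of the mounted filesystem, roughly what df reports."""
    st = os.statvfs(mount_point)
    return st.f_frsize * st.f_blocks

def partition_bytes(partition: str) -> int:
    """Size of a block device such as 'sdb1', read from sysfs (512-byte sectors)."""
    with open(f"/sys/class/block/{partition}/size") as f:
        return int(f.read()) * 512

def has_unused_space(part_bytes: int, fs_bytes: int, slack: float = 0.05) -> bool:
    """True when the filesystem is smaller than the partition by more than `slack`."""
    return fs_bytes < part_bytes * (1 - slack)

if __name__ == "__main__":
    # Device and mount point taken from the question; adjust for your system.
    part = partition_bytes("sdb1")
    fs = filesystem_bytes("/var/www/vhosts")
    print(f"partition: {part / 1024**3:.0f} GiB, filesystem: {fs / 1024**3:.0f} GiB")
    print("filesystem can be grown" if has_unused_space(part, fs) else "nothing to grow")
```

With the numbers from the question (a 1024G partition holding a 197G filesystem), the check reports that the filesystem can be grown.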

Postgres can't vacuum despite enough space left (could not resize shared memory segment bytes)

I have a docker-compose file with
postgres:
  container_name: second_postgres_container
  image: postgres:latest
  shm_size: 1g
and I wanted to vacuum a table, but got
ERROR: could not resize shared memory segment "/PostgreSQL.301371499" to 1073795648 bytes: No space left on device
The first number is smaller than the right one; also, I do have enough space on the server (only 32% is taken).
I wonder if it sees the docker container as not big enough (as it resizes on demand?), or where else the problem could be.
Note:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
95c689aa4d38 redis:latest "docker-entrypoint.s…" 10 days ago Up 10 days 0.0.0.0:6379->6379/tcp second_redis_container
f9efc8fad63a postgres:latest "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 0.0.0.0:5433->5432/tcp second_postgres_container
docker exec -it f9efc8fad63a df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
shm 1.0G 2.4M 1022M 1% /dev/shm
df -m
Filesystem 1M-blocks Used Available Use% Mounted on
udev 16019 0 16019 0% /dev
tmpfs 3207 321 2887 11% /run
/dev/md1 450041 132951 294207 32% /
tmpfs 16035 0 16035 0% /dev/shm
tmpfs 5 0 5 0% /run/lock
tmpfs 16035 0 16035 0% /sys/fs/cgroup
tmpfs 3207 0 3207 0% /run/user/1000
overlay 450041 132951 294207 32% /var/lib/docker/overlay2/0abe6aee8caba5096bd53904c5d47628b281f5d12f0a9205ad41923215cf9c6f/merged
overlay 450041 132951 294207 32% /var/lib/docker/overlay2/6ab0dde3640b8f2108d545979ef0710ccf020e6b122abd372b6e37d3ced272cb/merged
Thanks
That is a sign that parallel query is running out of memory. The cause may be restrictive settings for shared memory on the container.
You can work around the problem by setting max_parallel_maintenance_workers to 0. Then VACUUM won't use parallel workers.
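I figured it out (a friend helped :) ).
I guess I can't count: 1073795648 is slightly more than the 1g (1073741824 bytes) I had configured, so the vacuum needed a little more shared memory than was available. Setting shm_size to 10g instead of 1g indeed helped.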
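The arithmetic behind this is easy to check. A minimal sketch, assuming Docker interprets shm_size: 1g as 1 GiB (1024**3 bytes), which matches the 1.0G that df reported for /dev/shm inside the container:

```python
GIB = 1024 ** 3

requested = 1_073_795_648  # size from the PostgreSQL error message
shm_size = 1 * GIB         # shm_size: 1g from the docker-compose file

shortfall = requested - shm_size
print(f"shm_size is {shm_size} bytes, request exceeds it by {shortfall} bytes")
# The request exceeds the 1 GiB limit by 53824 bytes.
```

So the segment was only about 53 KiB larger than the limit, which is why the two numbers look almost identical at first glance.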

Partition's size in df -h is totally different than the size in /proc/partitions [closed]

I'm using Buildroot to build a custom Linux system for my raspi A+.
Using genimage, I've created two partitions on a 1 GB sdcard. The first partition is the boot partition. It's vfat and it is 32 MB. The second partition is ext4, it is the rootfs and it is 512 MB.
Once I boot my raspi with the newly burned sdcard and type df -h, I get this output:
Filesystem Size Used Available Use% Mounted on
/dev/root 17.1M 14.0M 1.8M 89% /
devtmpfs 200.6M 0 200.6M 0% /dev
tmpfs 200.7M 0 200.7M 0% /dev/shm
tmpfs 200.7M 0 200.7M 0% /tmp
tmpfs 200.7M 4.0K 200.7M 0% /run
As you can see, /dev/root is 17.1 MB instead of 512 MB.
Then, I issue cat /proc/partitions:
major minor #blocks name
1 0 4096 ram0
1 1 4096 ram1
1 2 4096 ram2
1 3 4096 ram3
1 4 4096 ram4
1 5 4096 ram5
1 6 4096 ram6
1 7 4096 ram7
1 8 4096 ram8
1 9 4096 ram9
1 10 4096 ram10
1 11 4096 ram11
1 12 4096 ram12
1 13 4096 ram13
1 14 4096 ram14
1 15 4096 ram15
179 0 969728 mmcblk0
179 1 32768 mmcblk0p1
179 2 524288 mmcblk0p2
We clearly see that the sdcard (mmcblk0) is 1 GB, the boot partition (mmcblk0p1) is 32 MB and the rootfs partition (mmcblk0p2) is 512 MB.
So, to check whether the mmcblk0p2 partition may have been improperly mounted, I mount it again with mount -t ext4 -o rw /dev/mmcblk0p2 /mnt and then I issue df -h again:
Filesystem Size Used Available Use% Mounted on
/dev/root 17.1M 14.0M 1.8M 89% /
devtmpfs 200.6M 0 200.6M 0% /dev
tmpfs 200.7M 0 200.7M 0% /dev/shm
tmpfs 200.7M 0 200.7M 0% /tmp
tmpfs 200.7M 4.0K 200.7M 0% /run
/dev/mmcblk0p2 17.1M 14.0M 1.8M 89% /mnt
Again, I see that mmcblk0p2 size is 17.1 MB.
So, my question is: why does cat /proc/partitions return the expected size for my rootfs partition while df -h returns a totally different and bogus size?
TL;DR: set BR2_TARGET_ROOTFS_EXT2_BLOCKS to 524288.
You have to distinguish the partition from the filesystem on the partition.
The partition sizes and offsets are specified in the partition table, and you can view them with cat /proc/partitions. Partitions are created with a tool like fdisk (or, when you're using Buildroot, they are often created by genimage).
The filesystem size is specified in the filesystem superblock, a piece of metadata that specifies the size of the filesystem, any options (e.g. if journalling is used), cluster sizes, etc. This is created by a tool like mke2fs. When you use mke2fs directly on a partition, it will use the full space of the partition for the filesystem, which is typically what you want. However, when you create the filesystem before partitioning the SD card (as is often the case when you generate an image with e.g. Buildroot), you have to specify the size to mke2fs (cfr. the man page: the second argument is blocks-count).
In Buildroot, you typically create an image as a file and don't write directly to the SD card. That is because the size of the SD card is not known a priori, and because you have to be root to be able to write the SD card. Therefore, there is no way for Buildroot to know how large the ext4 filesystem should be when you create the filesystem. Before the 2017.05 release of Buildroot, it would try to estimate how large the filesystem should be to fit everything, and create a filesystem of exactly that size. You are probably in that situation.
To fix this, you should set the configuration variable BR2_TARGET_ROOTFS_EXT2_BLOCKS to 524288 (= 512MB in 1024-byte blocks). Or if you use Buildroot more recent than the 2017.05 release, set BR2_TARGET_ROOTFS_EXT2_SIZE to 512M (the new option is in bytes but allows suffixes K, M, G).
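As a quick sanity check of the block count, here is a small sketch that converts a size in MiB into a number of 1024-byte blocks. The helper name ext2_blocks is made up for this illustration and is not part of Buildroot.

```python
def ext2_blocks(size_mib: int, block_bytes: int = 1024) -> int:
    """Number of blocks of `block_bytes` needed for a filesystem of `size_mib` MiB."""
    return size_mib * 1024 * 1024 // block_bytes

print(ext2_blocks(512))  # 524288
print(ext2_blocks(32))   # 32768
```

These values match the #blocks column that /proc/partitions shows for mmcblk0p2 and mmcblk0p1 in the question.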

From which file should I copy data to make an img on Raspbian if I want to back up a Raspberry Pi and restore it

I saw one highly-voted answer on the net and it goes like this:
On Linux, you can use the standard dd tool:
dd if=/dev/sdx of=/path/to/image bs=1M
Where /dev/sdx is your SD card.
But I checked my device and there is no /dev/sdx.
Others say dd if=/dev/mmcblk0 of=/path/to/image bs=1M should work fine.
I suppose it has something to do with the version of my Raspberry Pi. Mine runs the newest Raspbian version. I don't want to break the system, so I want to make sure the command is right before I run it. So I came here to ask for help from those who have tried it before.
This is the situation of my filesystems:
~ $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 15G 4.1G 9.5G 31% /
devtmpfs 214M 0 214M 0% /dev
tmpfs 218M 0 218M 0% /dev/shm
tmpfs 218M 4.7M 213M 3% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 218M 0 218M 0% /sys/fs/cgroup
/dev/mmcblk0p1 41M 21M 21M 51% /boot
tmpfs 44M 0 44M 0% /run/user/1000
Which file should I choose?
Does anybody know which file (similar to /dev/sdx) I should copy the data from?
Thank you very much!
I think what I was trying to do was copy the card of machine A while running machine A. Most answers on the Internet actually assume that you use another machine B to copy the card of machine A. That's why, when I ran "df -h" on the Pi itself, the terminal showed "/dev/root" instead of "/dev/sdX".
Maybe it's because a system that is running from the card cannot safely be copied at the same time. So I put the card into another machine B, ran "df -h" there, and it showed "/dev/sdX" as expected. Now I can follow the instructions on the Internet and do the backup.
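For illustration, this is roughly what dd if=... of=... bs=1M does, written as a short Python sketch. It is not a replacement for dd, only a way to see that the "file" being read is simply the whole-card device node. The function name copy_image is made up, reading a block device normally requires root, and the card should not be mounted read-write while it is being copied.

```python
def copy_image(source: str, destination: str, block_size: int = 1024 * 1024) -> int:
    """Copy `source` to `destination` in chunks of `block_size`; return bytes copied."""
    copied = 0
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:  # an empty read means end of device or file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied

if __name__ == "__main__":
    # On machine B the card appears as /dev/sdX (replace X with the actual letter);
    # /path/to/image is the placeholder used in the quoted answer.
    print(copy_image("/dev/sdX", "/path/to/image"), "bytes copied")
```

The source has to be the whole device (for example /dev/sdX on machine B), not a partition such as /dev/mmcblk0p1, if the image should contain both the boot and the root partition.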

How to create database on the disk with enough space in psql?

I want to import data into a database on AWS, but there is never enough space. I created the database using this command sudo -u postgres createdb ~/data/word2vec/AidaDB -O MyName and tried to import the data into the database using this command:
bzcat AIDA_entity_repository_2014-01-02v10.sql.bz2 | psql /home/ubuntu/data/word2vec/AidaDB
Here is the disk usage:
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 89G 84G 343M 100% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 16G 12K 16G 1% /dev
tmpfs 3.2G 848K 3.2G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 16G 76K 16G 1% /run/shm
none 100M 24K 100M 1% /run/user
/dev/xvdg 138G 60M 131G 1% /home/ubuntu/data/glove
/dev/xvdf 246G 32G 203G 14% /home/ubuntu/data/word2vec
Why is there not enough disk space? The data is 31GB, but I thought I had created the database in /home/ubuntu/data/word2vec. Is there a way to solve this problem? Many thanks.
You cannot specify the location of the database as part of the name of the database. PostgreSQL always creates the database in its data directory. However, you could create an additional tablespace and create your database within this tablespace.
CREATE TABLESPACE mydbspace LOCATION '/home/ubuntu/data/word2vec';
CREATE DATABASE AidaDB OWNER MyName TABLESPACE mydbspace;