Mapping nested device mapper mounts back to their physical drive - loopback

Looking for a reliable (and hopefully simple) way to trace a directory in an LVM or other dm-mounted filesystem back to the physical disk it resides on. The goal is to get the model and serial number of the drive no matter where the script ends up running.
This is not a problem when the filesystem is mounted on a physical partition, but it gets messy when layers of LVM and/or loopbacks are in between. The lsblk tree shows the dm relationships back to /dev/sda in the following example, but it wouldn't be easy or pleasant to parse:
# lsblk -po NAME,MODEL,SERIAL,MOUNTPOINT,MAJ:MIN
NAME MODEL SERIAL MOUNTPOINT MAJ:MIN
/dev/loop0 /mnt/test 7:0
/dev/sda AT1000MX500SSD1 21035FEA05B8 8:0
├─/dev/sda1 /boot 8:1
├─/dev/sda2 8:2
└─/dev/sda5 8:5
└─/dev/mapper/sda5_crypt 254:0
├─/dev/mapper/test5--vg-root / 254:1
└─/dev/mapper/test5--vg-swap_1 [SWAP] 254:2
Tried udevadm info, stat and a few other variations, but they all dead-end at the device mapper without a way (that I can see) of connecting the dots to the backing disk and its model/serial number.

I got a good-enough solution by enumerating the base /dev/sd? devices, looping through each one and its partitions with lsblk -ln devpart, and looking for the mountpoint in column 7. In the following example, the desired / shows up in the mappings under the /dev/sda5 partition. The serial number (and a lot of other data) for the base device can then be returned with udevadm info /dev/sda:
sda5 8:5 0 931G 0 part
sda5_crypt 254:0 0 931G 0 crypt
test5--vg-root 254:1 0 651G 0 lvm /
test5--vg-swap_1 254:2 0 976M 0 lvm [SWAP]
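A minimal shell sketch of that approach (the target mountpoint, the /dev/sd? naming and the column position are assumptions; NVMe or virtio disks would need a different glob):
#!/bin/sh
# Walk each base /dev/sd? device, list its partitions and dm descendants
# with lsblk, and look for the wanted mountpoint in column 7.
TARGET=/   # mountpoint to trace back (assumption)
for disk in /dev/sd?; do
    if lsblk -ln "$disk" | awk -v m="$TARGET" '$7 == m { found = 1 } END { exit !found }'; then
        echo "$TARGET is backed by $disk"
        # model and serial number of the base device
        udevadm info --query=property "$disk" | grep -E '^ID_(MODEL|SERIAL_SHORT)='
    fi
done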

Related

How to find the correct devicePaths to use in local storage for an OpenShift Persistent Volume?

I've seen the docs to create Custom Resources for Local Storage: https://docs.openshift.com/container-platform/4.5/storage/persistent_storage/persistent-storage-local.html#local-volume-cr_persistent-storage-local
But I'm not sure how to populate the spec.storageClassDevices.devicePaths field.
I've tried the lsblk command in one of my nodes and got this response:
sh-4.4# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 250G 0 disk
|-vda1 252:1 0 384M 0 part /boot
|-vda2 252:2 0 127M 0 part /boot/efi
|-vda3 252:3 0 1M 0 part
`-vda4 252:4 0 249.5G 0 part
`-coreos-luks-root-nocrypt 253:0 0 249.5G 0 dm /sysroot
vdb 252:16 0 200G 0 disk
vdc 252:32 0 200G 0 disk
On Linux and other Unix-like systems, block devices are represented in the filesystem as files, so each block device has its own unique path, just like a regular file.
Instead of running lsblk you can run the df command, which will also show you those paths, but only for devices that are already mounted.
All block device names on your system start with vd, so you can simply run:
ls -l /dev/vd*
to list them all.
So if you want to use e.g. your vdb and vdc disks, the values for spec.storageClassDevices.devicePaths will be /dev/vdb and /dev/vdc:
spec:
  ...
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/vdb
        - /dev/vdc
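As a quick sanity check before wiring the disks into the custom resource, you can confirm they are present and carry no partitions or mountpoints (device names taken from the lsblk output above; virtio disks only show up under /dev/disk/by-id/ if the hypervisor exposes serial numbers):
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/vdb /dev/vdc
ls -l /dev/disk/by-id/ 2>/dev/null | grep -E 'vd[bc]'   # stable by-id symlinks, if present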

What's the best practice for improving the IOPS of a CEPH cluster?

I am currently building a CEPH cluster for a KVM platform, and it is delivering catastrophic performance right now. The figures are dreadful. I am not really familiar with distributed storage systems; is there any general advice for improving the overall performance (i.e. latency, bandwidth and IOPS)?
The hardware configuration is not optimal right now, but I would still like to get the full potential out of what I currently have:
1x 10GbE Huawei switch
3x rack servers, each with the following hardware configuration:
Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz x2, 48 logical cores in total
128GB DDR3 RAM
Intel 1.84TB NVMe SSD x6 as data drives, with 1 OSD per disk (6 OSDs per server in total)
My current /etc/ceph/ceph.conf:
[global]
fsid = f2d6d3a7-0e61-4768-b3f5-b19dd2d8b657
mon initial members = ceph-node1, ceph-node2, ceph-node3
mon allow pool delete = true
mon host = 192.168.16.1, 192.168.16.2, 192.168.16.3
public network = 192.168.16.0/24
cluster network = 192.168.16.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 600
osd pool default pgp num = 600
osd memory target = 4294967296
max open files = 131072
[mon]
mon clock drift allowed = 1
mon osd min down reporters = 13
mon osd down out interval = 600
[OSD]
osd journal size = 20000
osd max write size = 512
osd client message size cap = 2147483648
osd deep scrub stride = 131072
osd op threads = 16
osd disk threads = 4
osd map cache size = 1024
osd map cache bl size = 128
osd recovery op priority = 2
osd recovery max active = 10
osd max backfills = 4
osd min pg log entries = 30000
osd max pg log entries = 100000
osd mon heartbeat interval = 40
ms dispatch throttle bytes = 1048576000
objecter inflight ops = 819200
osd op log threshold = 50
osd crush chooseleaf type = 0
journal max write bytes = 1073714824
journal max write entries = 10000
journal queue max ops = 50000
journal queue max bytes = 10485760000
[Client]
rbd cache = True
rbd cache size = 335544320
rbd cache max dirty = 134217728
rbd cache max dirty age = 30
rbd cache writethrough until flush = False
rbd cache max dirty object = 2
rbd cache target dirty = 235544320
The IO benchmark was done with fio, using the following configuration:
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randread -size=100G -filename=/data/testfile -name="CEPH Test" -iodepth=8 -runtime=30
Benchmark result screenshot: (image not reproduced here)
The benchmark was done on a separate machine that connects to the cluster via the 10GbE switch and has only the MDS installed. The benchmark machine is identical to the other 3 that form the cluster, apart from the absence of the Intel NVMe SSD drives.
Any help is appreciated.
First, I must note that Ceph is not an acronym, it is short for Cephalopod, because tentacles.
That said, you have a number of settings in ceph.conf that surprise me, like the extreme number of osdmaps you're caching. The thread settings can be tricky, and vary in applicability between releases. Building pools with 600 PGs isn't great: you generally want a power of 2, and to target a per-OSD ratio that factors in drive type and other pools. Setting the mon clock drift allowance to a full second (vs. the default 50ms) is downright alarming; with chrony or even the legacy ntpd it's not hard to get sub-millisecond syncing.
Three nodes may be limiting in the degree of parallelism / overlap clients can support, especially since you only have 6 drives per server. That's only 18 OSDs.
You have Filestore settings in there too, you aren't really using Filestore are you? Or a Ceph release older than Nautilus?
Finally, as more of an actual answer to the question posed, one simple thing you can do is to split each NVMe drive into two OSDs, with appropriate pg_num and pgp_num settings for the pool:
ceph-volume lvm batch --osds-per-device 2
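A hedged sketch of that split with explicit device arguments (the NVMe paths, pool name and PG values below are placeholders; size pg_num for your actual OSD count):
# create two OSDs on each listed NVMe device (paths are placeholders)
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
# then raise the pool's PG counts to a suitable power of two (example values)
ceph osd pool set <pool> pg_num 1024
ceph osd pool set <pool> pgp_num 1024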
I assume you meant 3 server blades and not 3 racks.
What was your rough estimate of the expected performance?
What is the performance profile of your disk hardware (outside Ceph) at 4K and 2MB? (See the fio sketch at the end of this answer.)
How many disks do you have in this pool, and what are the replication factor/strategy and the object size?
On the client side you are performing small reads: 4K.
On the server side, depending on your read-ahead settings and object size, each of these 4K reads may pull in much more data in the background.
Did you check whether one of your disks is really at its limits, and that there is no network/CPU throttling?
You can partition your drives with LVM and use multiple OSDs per drive. Since you have so many cores per server, one OSD per drive does not make use of them.
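For the raw-disk baseline, a hedged fio sketch for measuring one NVMe device outside Ceph (the device path is a placeholder; both runs are read-only, so they will not destroy data):
# 4K random-read and 2M sequential-read baseline of a single NVMe device
fio --name=raw-4k --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=32 --runtime=30 --time_based
fio --name=raw-2m --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --rw=read --bs=2M --iodepth=8 --runtime=30 --time_based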

"cat: write error: No space left on device" when I write to a character device using "cat"

I am trying to use a VS1053, an audio decoder, on Linux 4.14 to play music. The device communicates over the SPI bus, and I've developed a driver and registered the VS1053 as a character device, thanks to https://github.com/rvp-nl/vs10xx-linux. Here is the problem.
The way to play music is:
cat musicfile.mp3 > /dev/VS1053_device
When I send a WAV music file to the device, everything is OK and the music plays well. However, when I send an MP3 music file to the device, Linux reports an error:
cat: write error: No space left on device
I've searched for the reason on many sites. Many said to check the free space and free inodes on the filesystem, but this is my result:
root@s32v234sbc:~# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/root 956592 10795 945797 2% /
devtmpfs 234285 308 233977 1% /dev
tmpfs 234333 205 234128 1% /run
tmpfs 234333 10 234323 1% /var/volatile
root@s32v234sbc:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 15G 412M 14G 3% /
devtmpfs 916M 0 916M 0% /dev
tmpfs 916M 84K 916M 1% /run
tmpfs 916M 28K 916M 1% /var/volatile
root@s32v234sbc:~#
The music file is under 100MB, so there is no way the space is used up.
I tried to write a small program that writes the file out instead of using cat, but that doesn't work either. I have no idea why or how this error happens.
I would be super grateful if anyone could help me with this!
Try the command below:
tune2fs -l /dev/VS1053_device | grep -i reserved
Also, cat is not suitable for the operation you are performing. The MP3 format has all sorts of junk that can lurk at the front and end of the file, and this needs to be stripped out. Try ffmpeg, mp3wrap or aplay instead.
It seems like there is a problem with the driver you are using. Getting "No space left on device" in this case does not mean that the local filesystem is full. It probably means there is a problem on the local or the remote side of the SPI bus. It is possible that the driver you are using receives an -ENOSPC status from the kernel's SPI driver but does not handle that error properly, so you will need to dig a little into the driver you are using.
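A hedged way to confirm that the write() really fails with ENOSPC and to catch anything the driver logs (this assumes strace is available on the board; it may need to be installed or cross-compiled first):
strace -f -e trace=write cat musicfile.mp3 > /dev/VS1053_device
dmesg | tail -n 20   # look for messages from the vs10xx driver around the failure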

OpenWRT: Can't install packages - memory issue

I switched from the default firmware to OpenWrt on my TP-Link TL-WR1043N/ND v1 and now have the problem that I am not able to install any new packages.
What I did:
Flash OpenWrt
Installed Luci (no problem there)
And then, when I try to install anything else, I get:
Collected errors:
* xsystem: wget: vfork: Out of memory.
* opkg_download: Failed to download http://downloads.openwrt.org/snapshots/trunk/ar71xx/generic/packages/luci/luci-app-wshaper_git-15.338.68695-3bae3c7-1_all.ipk, wget returned -1.
* opkg_install_pkg: Failed to download luci-app-wshaper. Perhaps you need to run 'opkg update'?
* opkg_install_cmd: Cannot install package luci-app-wshaper.
The important part seems to be: * xsystem: wget: vfork: Out of memory. And yes, I did try rebooting and running 'opkg update' several times.
But under LuCI -> Software I can see: (screenshot omitted)
And here is my DF output:
root@OpenWrt:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 2048 2048 0 100% /rom
tmpfs 14052 1128 12924 8% /tmp
/dev/mtdblock3 4736 936 3800 20% /overlay
overlayfs:/overlay 4736 936 3800 20% /
tmpfs 512 0 512 0% /dev
root@OpenWrt:~#
OpenWRT Router Link: https://wiki.openwrt.org/toh/tp-link/tl-wr1043nd
Manufacturer Link: http://www.tp-link.com/en/download/TL-WR1043ND_V1.html
Does anyone have any idea what could cause the issue? I know the solution could be to use external USB storage, but I want to avoid that at all costs, plus I cannot imagine that this router would only have enough space for LuCI :)
Go into /etc/opkg/distfeeds.conf and comment out everything but base and luci, the first two.
No idea why, but opkg started using tons of RAM with all of those enabled, which they are by default.
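A rough sketch of doing that from the shell instead of editing the file by hand (it assumes the default snapshot feed URLs, which end in .../packages/base and .../packages/luci as in the example below):
# back up the feed list, comment out everything except the base and luci
# feeds, then refresh the package lists
cp /etc/opkg/distfeeds.conf /etc/opkg/distfeeds.conf.bak
awk '/^src\/gz/ && $0 !~ /packages\/(base|luci)$/ { print "#" $0; next } { print }' \
    /etc/opkg/distfeeds.conf.bak > /etc/opkg/distfeeds.conf
opkg update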
That's my distfeeds.conf. On the first try I had some trouble installing udpxy; after commenting out some entries in the conf, everything works:
$ cat /etc/opkg/distfeeds.conf
src/gz designated_driver_base http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/base
#src/gz designated_driver_kernel http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/kernel
#src/gz designated_driver_telephony http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/telephony
src/gz designated_driver_packages http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/packages
#src/gz designated_driver_routing http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/routing
src/gz designated_driver_luci http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/luci
src/gz designated_driver_management http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/management
# src/gz designated_driver_targets http://downloads.openwrt.org/snapshots/trunk/brcm63xx/generic/packages/targets

Understanding results of mongostat

I am trying to understand the results of mongostat:
example
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss %
0 2 4 0 0 10 0 976m 2.21g 643m 0 0.1 0
0 1 0 0 0 4 0 976m 2.21g 643m 0 0 0
0 0 0 0 0 1 0 976m 2.21g 643m 0 0 0
I see
mapped - 976m
vsize - 2.21g
res - 643m
res - RAM, so ~650MB of my database is in RAM
mapped - total size of database (via memory mapped files)
vsize - ???
I'm not sure why vsize is important or what exactly it means in this context. I'm running an m1.large, so I have about 400GB of disk space + 8GB of RAM.
Can someone help me out here and explain:
1) whether I am on the right page
2) what stats I should monitor in production
This should give you enough information
mapped - amount of data mmaped (total data size) megabytes
vsize - virtual size of process in megabytes
res - resident size of process in megabytes
1) I am on the right page
So mongostat is not really a "live monitor". It's mostly useful for connecting to a specific server and watching for something specific (what's happening when this job runs?). But it's not really useful for tracking performance over time.
Typically, for monitoring the server, you will want to use a tool like Zabbix, Cacti or Munin, or some third-party server monitor. The MongoDB website has a list.
2) what stats I should monitor in production
You should monitor the same basic stats you would monitor on any server:
CPU
Memory
Disk IO
Network traffic
For MongoDB specifically, you will want to run db.serverStatus() and track the
opcounters
connections
indexcounters
Note that these are increasing counters, so you'll have to create the correct "counter type" in your monitoring system (Zabbix, Cacti, etc.). A few of these monitoring programs already have MongoDB plug-ins available.
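As a minimal sketch for a monitoring script, those counters can be pulled from the mongo shell like this (the host defaults to localhost, and field names such as indexCounters vary between MongoDB versions, so treat them as assumptions):
mongo --quiet --eval '
  var s = db.serverStatus();
  printjson({ opcounters: s.opcounters, connections: s.connections, indexCounters: s.indexCounters });
'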
Also note that MongoDB has a "free" monitoring service called MMS. I say "free" because you will be receiving calls from salespeople in exchange for setting up MMS.
Also, you can use these mini tools for watching MongoDB:
http://openmymind.net/2011/9/23/Compressed-Blobs-In-MongoDB/
By the way, I remembered this great online tool from 10gen:
https://mms.10gen.com/user/login