kubectl top nodes showing extra memory compared to free -m - kubernetes

I am trying to understand the total memory usage of the cluster:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
Master Node 1 308m 7% 4286Mi 55%
Master Node 2 281m 7% 3959Mi 51%
Master Node 3 279m 6% 3959Mi 51%
Worker Node 1 3767m 9% 85715Mi 33%
Worker Node 2 3993m 9% 87353Mi 33%
Taking Worker Node 1 as an example, the memory usage here is approximately 85 GB. But when I run free -m on the same node, it shows a different result:
free -m
total used free shared buff/cache available
Mem: 257381 35450 120699 4094 101231 216970
Swap: 0 0 0
Here used is only 35 GB, and used + buff/cache is roughly 130 GB.
How do these two numbers relate?
Also, what is the correct Prometheus query for cluster overall memory usage?
sum (container_memory_working_set_bytes{id="/",kubernetes_io_hostname=~"^$Node$"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~"^$Node$"}) * 100
Is this right, or should it be something else?
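For reference, a minimal sketch of where the kubectl top number comes from, assuming cgroup v1 and the usual cadvisor/metrics-server pipeline: it reports the root cgroup's working set, i.e. memory usage including page cache minus the inactive file cache, whereas the used column of free -m excludes buff/cache entirely, which is why the two figures differ.
# Approximate the "working set" that kubectl top reports (assumes cgroup v1 paths)
usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
inactive=$(awk '/^total_inactive_file/ {print $2}' /sys/fs/cgroup/memory/memory.stat)
echo "working set: $(( (usage - inactive) / 1024 / 1024 )) MiB"   # should land near the 85715Mi above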

CentOS Partition Resize

I'm struggling with resizing a CentOS partition on a server. I found some steps, but I'm not sure which situation I'm actually facing or what the correct approach is, and I definitely cannot afford to mess this up.
The space should already be available, but the partition is not resized as far as I can tell.
The goal is to extend the partition /dev/sdb1 from 197GB to 1TB
Below are the "lsblk", "df -h" and "fdisk -l" results which should show my current situation.
[ ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
├─sda1 8:1 0 1G 0 part /boot
├─sda2 8:2 0 3.7G 0 part [SWAP]
└─sda3 8:3 0 45.3G 0 part /
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part /var/www/vhosts
sdc 8:32 0 50G 0 disk
└─sdc1 8:33 0 50G 0 part /var/lib/psa
sr0 11:0 1 680M 0 rom
[ ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 12M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda3 45G 7.0G 36G 17% /
/dev/sda1 976M 135M 775M 15% /boot
/dev/sdc1 50G 53M 47G 1% /var/lib/psa
/dev/sdb1 197G 126G 62G 68% /var/www/vhosts
tmpfs 1.6G 0 1.6G 0% /run/user/0
[ ~]# fdisk -l
Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0009c4b4
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2099199 1048576 83 Linux
/dev/sda2 2099200 9910271 3905536 82 Linux swap / Solaris
/dev/sda3 9910272 104855551 47472640 83 Linux
Disk /dev/sdb: 1099.5 GB, 1099511627776 bytes, 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x8e948ef1
Device Boot Start End Blocks Id System
/dev/sdb1 2048 2147483647 1073740800 83 Linux
Disk /dev/sdc: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x7677284e
Device Boot Start End Blocks Id System
/dev/sdc1 2048 104857599 52427776 83 Linux
I found the following answer on an external page, but I'm not familiar with the commands and cannot tell if that's the right way to go (if allowed, I can paste the URL). The partition paths have not been updated to match mine.
There are three steps to make:
alter your partition table so sda2 ends at end of disk
reread the partition table (will require a reboot)
resize your LVM pv using pvresize
Step 1 - Partition table: Run fdisk /dev/sda. Issue p to print your current partition table and copy that output to some safe place. Now issue d followed by 2 to remove the second partition. Issue n to create a new second partition. Make sure the start equals the start of the old second partition from the table you printed earlier. Make sure the end is at the end of the disk (usually the default). Issue t followed by 2 followed by 8e to set the partition type of your new second partition to 8e (Linux LVM).
Issue p to review your new partition layout and make sure the start of the new second partition is exactly where the old second partition was.
If everything looks right, issue w to write the partition table to disk. You will get an error message from partprobe that the partition table couldn't be reread (because the disk is in use).
Step 2 - Reboot your system: This step is necessary so the partition table gets re-read.
Step 3 - Resize the LVM PV: After your system has rebooted, invoke pvresize /dev/sda2. Your LVM physical volume will now span the rest of the drive and you can create or extend logical volumes into that space.
The question is: is that the right way to increase the partition size on a CentOS system without losing any data on it?
Thank you
As you can see, the partition
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part /var/www/vhosts
is already 1 TB, so you only need to extend the filesystem. If your filesystem is ext4 you can use the command:
resize2fs /dev/sdb1
If your filesystem is xfs you can use the command:
xfs_growfs /var/www/vhosts
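If you are not sure which of the two applies, a quick sketch for checking the filesystem type before growing it (either command should do):
# Check which filesystem is on sdb1 before growing it
lsblk -f /dev/sdb1
# or
blkid /dev/sdb1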

How to use Ceph to store a large amount of small data

I set up a CephFS cluster on my virtual machine and want to use it to store a batch of image data (1.4 GB in total, each image about 8 KB). The cluster stores two copies, with a total of 12 GB of available space. But when I store the data, the system reports that the available space is insufficient. How can I solve this? The details of the cluster are as follows:
Cluster Information:
cluster:
id: 891fb1a7-df35-48a1-9b5c-c21d768d129b
health: HEALTH_ERR
1 MDSs report slow metadata IOs
1 MDSs report slow requests
1 full osd(s)
1 nearfull osd(s)
2 pool(s) full
Degraded data redundancy: 46744/127654 objects degraded (36.618%), 204 pgs degraded
Degraded data redundancy (low space): 204 pgs recovery_toofull
too many PGs per OSD (256 > max 250)
clock skew detected on mon.node2, mon.node3
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node2(active), standbys: node1, node3
mds: cephfs-1/1/1 up {0=node1=up:active}, 2 up:standby
osd: 3 osds: 2 up, 2 in
data:
pools: 2 pools, 256 pgs
objects: 63.83k objects, 543MiB
usage: 10.6GiB used, 1.40GiB / 12GiB avail
pgs: 46744/127654 objects degraded (36.618%)
204 active+recovery_toofull+degraded
52 active+clean
Cephfs Space Usage:
[root@node1 0]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/nlas-root xfs 36G 22G 14G 62% /
devtmpfs devtmpfs 2.3G 0 2.3G 0% /dev
tmpfs tmpfs 2.3G 0 2.3G 0% /dev/shm
tmpfs tmpfs 2.3G 8.7M 2.3G 1% /run
tmpfs tmpfs 2.3G 0 2.3G 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 178M 837M 18% /boot
tmpfs tmpfs 2.3G 28K 2.3G 1% /var/lib/ceph/osd/ceph-0
tmpfs tmpfs 471M 0 471M 0% /run/user/0
192.168.152.3:6789,192.168.152.4:6789,192.168.152.5:6789:/ ceph 12G 11G 1.5G 89% /mnt/test
Ceph OSD:
[root@node1 mnt]# ceph osd pool ls
cephfs_data
cephfs_metadata
[root@node1 mnt]# ceph osd pool get cephfs_data size
size: 2
[root@node1 mnt]# ceph osd pool get cephfs_metadata size
size: 2
ceph.dir.layout:
[root@node1 mnt]# getfattr -n ceph.dir.layout /mnt/test
getfattr: Removing leading '/' from absolute path names
# file: mnt/test
ceph.dir.layout="stripe_unit=65536 stripe_count=1 object_size=4194304 pool=cephfs_data"
When storing small files, you need to watch the minimum allocation size. Until the Nautilus release, this defaulted to 16 KB for SSDs and 64 KB for HDDs, but with the new Ceph Pacific the default minimum allocation has been tuned to 4 KB for both.
I suggest you use Pacific, or manually tune Octopus to the same numbers if that's the version you installed.
You also want to use replication (as opposed to Erasure Coding) if your files are under a multiple of the minimum allocation size, as the chunks of EC would use the same minimum allocation and will waste slack space otherwise. You already made the right choice here by using replication, I am just mentioning it here because you may be tempted by EC's touted space-saving properties -- which unfortunately do not apply to small files.
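As a rough back-of-the-envelope check against the numbers above (assuming the pre-Pacific 64 KiB HDD default): each ~8 KiB image still occupies a full 64 KiB allocation unit per copy, so the ~63.8k objects reported consume roughly 63,830 × 64 KiB × 2 replicas ≈ 8 GiB of raw space for only 543 MiB of logical data, which lines up with the 10.6 GiB shown as used.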
You need to set bluestore_min_alloc_size to 4096; by default its value is 64 KB:
[osd]
bluestore_min_alloc_size = 4096
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096
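One caveat worth stating as an assumption: bluestore_min_alloc_size is baked in when an OSD is created, so existing OSDs have to be redeployed after changing it. A small sketch for checking the value an OSD currently has configured, via its admin socket (osd.0 is just an example ID):
# Inspect the allocation sizes an existing OSD is configured with
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
ceph daemon osd.0 config get bluestore_min_alloc_size_ssd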

Postgres can't vacuum despite enough space left (could not resize shared memory segment ... bytes)

I have a docker-compose file with
postgres:
container_name: second_postgres_container
image: postgres:latest
shm_size: 1g
and I wanted to vacuum a table, but got:
ERROR: could not resize shared memory segment "/PostgreSQL.301371499" to 1073795648 bytes: No space left on device
The first number in the error is smaller than the second one, and I do have enough space on the server (only 32% is taken).
I wonder if it considers the Docker container not big enough (does it resize on demand?), or where else the problem could be.
Note:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
95c689aa4d38 redis:latest "docker-entrypoint.s…" 10 days ago Up 10 days 0.0.0.0:6379->6379/tcp second_redis_container
f9efc8fad63a postgres:latest "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 0.0.0.0:5433->5432/tcp second_postgres_container
docker exec -it f9efc8fad63a df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
shm 1.0G 2.4M 1022M 1% /dev/shm
df -m
Filesystem 1M-blocks Used Available Use% Mounted on
udev 16019 0 16019 0% /dev
tmpfs 3207 321 2887 11% /run
/dev/md1 450041 132951 294207 32% /
tmpfs 16035 0 16035 0% /dev/shm
tmpfs 5 0 5 0% /run/lock
tmpfs 16035 0 16035 0% /sys/fs/cgroup
tmpfs 3207 0 3207 0% /run/user/1000
overlay 450041 132951 294207 32% /var/lib/docker/overlay2/0abe6aee8caba5096bd53904c5d47628b281f5d12f0a9205ad41923215cf9c6f/merged
overlay 450041 132951 294207 32% /var/lib/docker/overlay2/6ab0dde3640b8f2108d545979ef0710ccf020e6b122abd372b6e37d3ced272cb/merged
Thanks.
That is a sign that parallel query is running out of memory. The cause may be restrictive settings for shared memory on the container.
You can work around the problem by setting max_parallel_maintenance_workers to 0. Then VACUUM won't use parallel workers.
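A minimal sketch of doing that for a single run without touching postgresql.conf, assuming the container from the compose file above and placeholder names mydb/mytable:
# Run one VACUUM without parallel workers; mydb and mytable are placeholders
docker exec -it second_postgres_container \
  psql -U postgres -d mydb -c "SET max_parallel_maintenance_workers = 0" \
                           -c "VACUUM (VERBOSE) mytable"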
I figured it out (a friend helped :) ).
I guess I can't count: 1073795648 bytes is slightly more than the 1 GB of shm I had, so an shm_size of 10g instead of 1g indeed helped.
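For what it's worth, after recreating the container you can confirm it actually picked up the bigger /dev/shm the same way as above:
# Confirm the new shm size inside the container
docker exec -it second_postgres_container df -h /dev/shm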

Rook Ceph cluster: "mon c is low on available space" message

I have set up a Rook 1.4 cluster on Kubernetes 1.18, with 3 nodes and 1 TB of storage allocated on each of them.
After creating the cluster, when I run ceph status, the cluster status shows HEALTH_WARN with "mon c is low on available space".
There is no data stored yet, so why is the status reporting low available space? How do I clear this error?
[root@rook-ceph-tools-6bdcd78654-sfjvl /]# ceph status
cluster:
id: ad42764d-aa28-4da5-a828-2d87205aff08
health: HEALTH_WARN
mon c is low on available space
services:
mon: 3 daemons, quorum a,b,c (age 37m)
mgr: a(active, since 36m)
osd: 3 osds: 3 up (since 37m), 3 in (since 37m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 3.6 TiB / 3.6 TiB avail
pgs: 1 active+clean
All three nodes have the same size of storage:
sdb 8:16 0 1.2T 0 disk
└─ceph--a6cd601d--7584--4b1f--bf82--48c95437f351-osd--data--ae1bc856--8ded--4b1e--8c87--30ca0f0959a3 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--ccaf7144--d6a0--441c--bcd5--6a09d056bd7a-osd--data--36a9b28c--7207--400a--936b--edfb3255ce0b 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--53e9b8a9--8925--4b21--a6ea--f8e17a322d5c-osd--data--6b1e779c--a18a--4e4d--960e--73ca9473d02f 253:3 0 1.2T 0 lvm
Thanks
SR
This alert is about the monitor's disk space, which is normally stored under /var/lib/ceph/mon. That path lives on the root filesystem and is unrelated to your OSD block devices. The warning is raised when this path has less than 30% available space (see mon_data_avail_warn, which defaults to 30).
You can change that setting to silence the alert, or grow the filesystem behind that path so the monitor's RocksDB data has more room.
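A small sketch of checking the space and, if you accept the tradeoff, lowering the warning threshold (20 is only an example value):
# Check how full the mon's filesystem is (run on the node or in the pod hosting mon "c")
df -h /var/lib/ceph/mon
# Optionally lower the threshold below the 30% default
ceph config set mon mon_data_avail_warn 20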
As Seena explained, it is because the available space is less than 30%. In this case you can compact the mon data with the following command:
ceph tell mon.`hostname -s` compact
There is another way to trigger data compaction for the mon: add this option to ceph.conf and then restart the mon:
[mon]
mon compact on start = true
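In a Rook cluster, where editing ceph.conf by hand is awkward, the same option can presumably be set through the centralized config instead (a sketch, not verified on Rook specifically; the mons still need a restart to pick it up):
ceph config set mon mon_compact_on_start true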

Basic ContainerCreating Failure

Occasionally I see problems where creating my deployments takes much longer than usual (this one typically takes a minute or two). How do people normally deal with this? Is it best to remove the offending node? What's the right way to debug this?
error: deployment "hillcity-twitter-staging-deployment" exceeded its progress deadline
Waiting for rollout to complete (been 500s)...
NAME READY STATUS RESTARTS AGE IP NODE
hillcity-twitter-staging-deployment-5bf6b48779-5jvgv 2/2 Running 0 8m 10.168.41.12 gke-charles-test-cluster-default-pool-be943055-mq4j
hillcity-twitter-staging-deployment-5bf6b48779-knzkw 2/2 Running 0 8m 10.168.34.34 gke-charles-test-cluster-default-pool-be943055-czqr
hillcity-twitter-staging-deployment-5bf6b48779-qxmg8 0/2 ContainerCreating 0 8m <none> gke-charles-test-cluster-default-pool-be943055-rzg2
I've ssh-ed into the "rzg2" node but didn't see anything particularly wrong with it. Here's the k8s view:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-charles-test-cluster-default-pool-be943055-2q9f 385m 40% 2288Mi 86%
gke-charles-test-cluster-default-pool-be943055-35fl 214m 22% 2030Mi 76%
gke-charles-test-cluster-default-pool-be943055-3p95 328m 34% 2108Mi 79%
gke-charles-test-cluster-default-pool-be943055-67h0 204m 21% 1783Mi 67%
gke-charles-test-cluster-default-pool-be943055-czqr 342m 36% 2397Mi 90%
gke-charles-test-cluster-default-pool-be943055-jz8v 149m 15% 2299Mi 86%
gke-charles-test-cluster-default-pool-be943055-kl9r 246m 26% 1796Mi 67%
gke-charles-test-cluster-default-pool-be943055-mq4j 123m 13% 1523Mi 57%
gke-charles-test-cluster-default-pool-be943055-mx18 276m 29% 1755Mi 66%
gke-charles-test-cluster-default-pool-be943055-pb48 200m 21% 1667Mi 63%
gke-charles-test-cluster-default-pool-be943055-rzg2 392m 41% 2270Mi 85%
gke-charles-test-cluster-default-pool-be943055-wkxk 274m 29% 1954Mi 73%
Added: Here's some of the output of "$ sudo journalctl -u kubelet":
Sep 04 22:14:11 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:14:11.882166 1442 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay/83ed56fdfae736d5b1bd3afc3649555916a2ef24a287415256a408c463186107 with output stdout: , stderr: - signal: killed, rootInodeErr: <nil>, extraDiskErr: <nil>
[...repeated a lot...]
Sep 04 22:25:19 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:25:19.917177 1442 kube_docker_client.go:324] Cancel pulling image "gcr.io/able-store-864/hillcity-worker:0.0.1" because of no progress for 1m0s, latest progress: "43f9fd4bd389: Extracting [=====> ] 32.77 kB/295.9 kB"
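No definitive answer here, but a minimal sketch of the usual first debugging steps for a pod stuck in ContainerCreating; the pod's events normally say whether it is the image pull (as the kubelet log above suggests), a volume mount, or something else:
# Look at the stuck pod's events and at the node it was scheduled to
kubectl describe pod hillcity-twitter-staging-deployment-5bf6b48779-qxmg8
kubectl describe node gke-charles-test-cluster-default-pool-be943055-rzg2
# Recent cluster events, newest last
kubectl get events --sort-by=.lastTimestamp | tail -n 20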