Basic ContainerCreating Failure - kubernetes

Occasionally I see problems where creating my deployments takes much longer than usual (this one typically takes a minute or two). How do people normally deal with this? Is it best to remove the offending node? What's the right way to debug this?
error: deployment "hillcity-twitter-staging-deployment" exceeded its progress deadline
Waiting for rollout to complete (been 500s)...
NAME READY STATUS RESTARTS AGE IP NODE
hillcity-twitter-staging-deployment-5bf6b48779-5jvgv 2/2 Running 0 8m 10.168.41.12 gke-charles-test-cluster-default-pool-be943055-mq4j
hillcity-twitter-staging-deployment-5bf6b48779-knzkw 2/2 Running 0 8m 10.168.34.34 gke-charles-test-cluster-default-pool-be943055-czqr
hillcity-twitter-staging-deployment-5bf6b48779-qxmg8 0/2 ContainerCreating 0 8m <none> gke-charles-test-cluster-default-pool-be943055-rzg2
I've ssh-ed into the "rzg2" node but didn't see anything particularly wrong with it. Here's the k8s view:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-charles-test-cluster-default-pool-be943055-2q9f 385m 40% 2288Mi 86%
gke-charles-test-cluster-default-pool-be943055-35fl 214m 22% 2030Mi 76%
gke-charles-test-cluster-default-pool-be943055-3p95 328m 34% 2108Mi 79%
gke-charles-test-cluster-default-pool-be943055-67h0 204m 21% 1783Mi 67%
gke-charles-test-cluster-default-pool-be943055-czqr 342m 36% 2397Mi 90%
gke-charles-test-cluster-default-pool-be943055-jz8v 149m 15% 2299Mi 86%
gke-charles-test-cluster-default-pool-be943055-kl9r 246m 26% 1796Mi 67%
gke-charles-test-cluster-default-pool-be943055-mq4j 123m 13% 1523Mi 57%
gke-charles-test-cluster-default-pool-be943055-mx18 276m 29% 1755Mi 66%
gke-charles-test-cluster-default-pool-be943055-pb48 200m 21% 1667Mi 63%
gke-charles-test-cluster-default-pool-be943055-rzg2 392m 41% 2270Mi 85%
gke-charles-test-cluster-default-pool-be943055-wkxk 274m 29% 1954Mi 73%
Added: Here's some of the output of "$ sudo journalctl -u kubelet"
Sep 04 22:14:11 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:14:11.882166 1442 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay/83ed56fdfae736d5b1bd3afc3649555916a2ef24a287415256a408c463186107 with output stdout: , stderr: - signal: killed, rootInodeErr: <nil>, extraDiskErr: <nil>
[...repeated a lot...]
Sep 04 22:25:19 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:25:19.917177 1442 kube_docker_client.go:324] Cancel pulling image "gcr.io/able-store-864/hillcity-worker:0.0.1" because of no progress for 1m0s, latest progress: "43f9fd4bd389: Extracting [=====> ] 32.77 kB/295.9 kB"
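Since the kubelet log shows the image pull being cancelled for lack of progress, a couple of checks I would start with (just a sketch; the pod and image names are taken from the output above, and namespace flags are omitted):
kubectl describe pod hillcity-twitter-staging-deployment-5bf6b48779-qxmg8    # the Events section usually says why the pod is stuck in ContainerCreating
kubectl get events --field-selector involvedObject.name=hillcity-twitter-staging-deployment-5bf6b48779-qxmg8
docker pull gcr.io/able-store-864/hillcity-worker:0.0.1    # run on the rzg2 node itself, to tell slow registry access apart from slow local disk I/O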

Related

kubectl top nodes giving extra memory compared to free -m

I am trying to understand the total memory usage of the cluster:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
Master Node 1 308m 7% 4286Mi 55%
Master Node 2 281m 7% 3959Mi 51%
Master Node 3 279m 6% 3959Mi 51%
Worker Node 1 3767m 9% 85715Mi 33%
Worker Node 2 3993m 9% 87353Mi 33%
Taking Worker Node 1 as an example, the memory usage reported here is approximately 85 GB. But when I run free -m on the same node, it shows a different result:
free -m
total used free shared buff/cache available
Mem: 257381 35450 120699 4094 101231 216970
Swap: 0 0 0
Here used is only about 35 GB, and used + buff/cache is roughly 130 GB.
How do these two figures relate to each other?
Also, what is the correct Prometheus query for overall cluster memory usage?
sum (container_memory_working_set_bytes{id="/",kubernetes_io_hostname=~"^$Node$"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~"^$Node$"}) * 100
Is it this query, or something else?
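For what it's worth: kubectl top node reports the kubelet's working-set figure, which includes active page cache, whereas the used column of free -m excludes buff/cache entirely, so the two numbers are not expected to match. If node_exporter is being scraped (the metric names below assume a reasonably recent node_exporter), a query closer to the "available" notion of free -m would be:
(1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)) * 100
The container_memory_working_set_bytes query from the question is fine if what you want is the same figure that kubectl top shows.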

How to use Ceph to store a large amount of small data

I set up a CephFS cluster on my virtual machine and want to use it to store a batch of image data (1.4 GB in total, each image about 8 KB). The cluster stores two copies and has 12 GB of available space in total. But when I store the data, the system reports that the available space is insufficient. How can I solve this? The details of the cluster are as follows:
Cluster Information:
cluster:
id: 891fb1a7-df35-48a1-9b5c-c21d768d129b
health: HEALTH_ERR
1 MDSs report slow metadata IOs
1 MDSs report slow requests
1 full osd(s)
1 nearfull osd(s)
2 pool(s) full
Degraded data redundancy: 46744/127654 objects degraded (36.618%), 204 pgs degraded
Degraded data redundancy (low space): 204 pgs recovery_toofull
too many PGs per OSD (256 > max 250)
clock skew detected on mon.node2, mon.node3
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node2(active), standbys: node1, node3
mds: cephfs-1/1/1 up {0=node1=up:active}, 2 up:standby
osd: 3 osds: 2 up, 2 in
data:
pools: 2 pools, 256 pgs
objects: 63.83k objects, 543MiB
usage: 10.6GiB used, 1.40GiB / 12GiB avail
pgs: 46744/127654 objects degraded (36.618%)
204 active+recovery_toofull+degraded
52 active+clean
Cephfs Space Usage:
[root@node1 0]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/nlas-root xfs 36G 22G 14G 62% /
devtmpfs devtmpfs 2.3G 0 2.3G 0% /dev
tmpfs tmpfs 2.3G 0 2.3G 0%
/dev/shm
tmpfs tmpfs 2.3G 8.7M 2.3G 1% /run
tmpfs tmpfs 2.3G 0 2.3G 0%
/sys/fs/cgroup
/dev/sda1 xfs 1014M 178M 837M 18% /boot
tmpfs tmpfs 2.3G 28K 2.3G 1%
/var/lib/ceph/osd/ceph-0
tmpfs tmpfs 471M 0 471M 0%
/run/user/0
192.168.152.3:6789,192.168.152.4:6789,192.168.152.5:6789:/ ceph 12G 11G 1.5G 89% /mnt/test
Ceph OSD:
[root@node1 mnt]# ceph osd pool ls
cephfs_data
cephfs_metadata
[root@node1 mnt]# ceph osd pool get cephfs_data size
size: 2
[root@node1 mnt]# ceph osd pool get cephfs_metadata size
size: 2
ceph.dir.layout:
[root@node1 mnt]# getfattr -n ceph.dir.layout /mnt/test
getfattr: Removing leading '/' from absolute path names
# file: mnt/test
ceph.dir.layout="stripe_unit=65536 stripe_count=1 object_size=4194304 pool=cephfs_data"
When storing small files, you need to watch the minimum allocation size. Until the Nautilus release, this defaulted to 16k for SSD and 64k for HDD, but with the new Ceph Pacific the default minimum allocation has been tuned to 4k for both.
I suggest you use Pacific, or manually tune Octopus to the same numbers if that's the version you installed.
You also want to use replication (as opposed to erasure coding) if your files are smaller than a few multiples of the minimum allocation size, because each EC chunk is subject to the same minimum allocation and would otherwise waste slack space. You already made the right choice here by using replication; I am just mentioning it because you may be tempted by EC's touted space savings, which unfortunately do not apply to small files.
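A rough back-of-the-envelope with the numbers from the question shows the effect (assuming the 64k HDD default applies to these OSDs; the object count also includes metadata-pool objects, so treat it as approximate):
63,830 objects × 64 KiB minimum allocation ≈ 3.9 GiB allocated per replica
3.9 GiB × 2 replicas ≈ 7.8 GiB on disk for only 543 MiB of actual data
That accounts for most of the 10.6 GiB the cluster reports as used. With a 4 KiB minimum allocation the same ~8 KB objects would allocate roughly 63,830 × 8 KiB × 2 ≈ 1 GiB.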
You need to set bluestore_min_alloc_size to 4096; by default its value is 64 KB:
[osd]
bluestore_min_alloc_size = 4096
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096
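One caveat to add here (not stated above): bluestore_min_alloc_size is baked into an OSD when it is created, so changing ceph.conf alone does not affect existing OSDs; they need to be recreated/redeployed afterwards. To check what a running OSD is configured with, via its admin socket (osd.0 is just an example id):
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd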

Pod killed with "The node was low on resource: inodes" but df -i shows only 29% usage

Sometimes, when running certain builds via self-hosted GitLab CI in a Kubernetes cluster, the runners fail with the K8s event The node was low on resource: inodes. Upon checking the respective node's inodes I don't see any problem. This occurs only on specific CI runs which seem to install a lot of files with npm. Sadly, the Node Exporter always reports zero free inodes with its metric container_fs_inodes_free.
Since the inode usage on the node itself is very low, I don't know how to debug this problem further. Do you have any ideas or hints I can follow?
All nodes are set up on VMware.
Here is the build-log:
$ npx lerna run typecheck --scope=myproduct --stream
lerna notice cli v3.22.1
lerna info versioning independent
lerna info ci enabled
lerna notice filter including "myproduct"
lerna info filter [ 'myproduct' ]
lerna info Executing command in 1 package: "yarn run typecheck"
myproduct: yarn run v1.22.5
myproduct: $ tsc --noEmit
Cleaning up file based variables
ERROR: Job failed: command terminated with exit code 137
Here is the output of df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 4079711 454 4079257 1% /dev
tmpfs 4083138 1 4083137 1% /dev/shm
tmpfs 4083138 1738 4081400 1% /run
tmpfs 4083138 17 4083121 1% /sys/fs/cgroup
/dev/mapper/vg00-lv_root 327680 93629 234051 29% /
/dev/mapper/vg00-lv_opt 131072 18148 112924 14% /opt
/dev/mapper/vg00-lv_var 327680 16972 310708 6% /var
/dev/sda1 128016 351 127665 1% /boot
/dev/mapper/vg00-lv_home 32768 73 32695 1% /home
/dev/mapper/vg00-lv_tmp 65536 53 65483 1% /tmp
/dev/mapper/vg01-lv_docker 3211264 409723 2801541 13% /var/lib/docker
/dev/mapper/vg00-lv_varlog 65536 283 65253 1% /var/log
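One way to narrow this down (a sketch; <node-name> is a placeholder): the kubelet evicts pods based on its own nodefs/imagefs inode signals, and an npm install can create and then delete a huge number of files within a minute or two, so a df -i taken after the job has already been killed can easily look healthy again. Watching the filesystems while the build runs, and checking what the kubelet itself reported, is usually more telling:
watch -n 5 'df -i / /var/lib/docker'                          # run on the node while the CI job is executing
kubectl describe node <node-name> | grep -A 10 Conditions     # look for DiskPressure and the reason attached to it
kubectl get events -A --field-selector reason=Evicted         # the eviction event names the resource that ran short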

Postgres can't vacuum despite enough space left (could not resize shared memory segment bytes)

I have a docker-compose file with
postgres:
container_name: second_postgres_container
image: postgres:latest
shm_size: 1g
and I wanted to vacuum a table, but got:
ERROR: could not resize shared memory segment "/PostgreSQL.301371499" to 1073795648 bytes: No space left on device
The first number is smaller than the second one, and I do have enough space on the server (only 32% is taken).
I wonder if it considers the Docker container not big enough (does it resize on demand?), or where else the problem could be.
Note:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
95c689aa4d38 redis:latest "docker-entrypoint.s…" 10 days ago Up 10 days 0.0.0.0:6379->6379/tcp second_redis_container
f9efc8fad63a postgres:latest "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 0.0.0.0:5433->5432/tcp second_postgres_container
docker exec -it f9efc8fad63a df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
shm 1.0G 2.4M 1022M 1% /dev/shm
df -m
Filesystem 1M-blocks Used Available Use% Mounted on
udev 16019 0 16019 0% /dev
tmpfs 3207 321 2887 11% /run
/dev/md1 450041 132951 294207 32% /
tmpfs 16035 0 16035 0% /dev/shm
tmpfs 5 0 5 0% /run/lock
tmpfs 16035 0 16035 0% /sys/fs/cgroup
tmpfs 3207 0 3207 0% /run/user/1000
overlay 450041 132951 294207 32% /var/lib/docker/overlay2/0abe6aee8caba5096bd53904c5d47628b281f5d12f0a9205ad41923215cf9c6f/merged
overlay 450041 132951 294207 32% /var/lib/docker/overlay2/6ab0dde3640b8f2108d545979ef0710ccf020e6b122abd372b6e37d3ced272cb/merged
Thanks!
That is a sign that parallel query is running out of memory. The cause may be restrictive settings for shared memory on the container.
You can work around the problem by setting max_parallel_maintenance_workers to 0. Then VACUUM won't use parallel workers.
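For example, just for the session that runs the VACUUM (my_table is a placeholder):
SET max_parallel_maintenance_workers = 0;
VACUUM (VERBOSE) my_table;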
I figured it out (a friend helped :) ).
I guess I can't count: 1073795648 bytes is slightly more than the 1 GB of shm I had, so raising shm_size from 1g to 10g indeed helped.
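For reference, the corresponding change in the compose file from the question is just the shm_size line:
postgres:
  container_name: second_postgres_container
  image: postgres:latest
  shm_size: 10g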

From which device file should I copy data to make an img on Raspbian if I want to back up my Raspberry Pi and restore it?

I saw one highly-voted answer on the net and it goes like this:
On Linux, you can use the standard dd tool:
dd if=/dev/sdx of=/path/to/image bs=1M
Where /dev/sdx is your SD card.
But I checked my device and there is no /dev/sdx.
Some others say dd if=/dev/mmcblk0 of=/path/to/image bs=1M should work fine.
I suppose it has something to do with the version of my Raspberry Pi. Mine is the newest Raspbian version. I don't want to break the system, so I just want to make sure the command is right before I run it. So I've come here to ask for help from those who have tried it before.
This is the situation of my filesystems:
~ $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 15G 4.1G 9.5G 31% /
devtmpfs 214M 0 214M 0% /dev
tmpfs 218M 0 218M 0% /dev/shm
tmpfs 218M 4.7M 213M 3% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 218M 0 218M 0% /sys/fs/cgroup
/dev/mmcblk0p1 41M 21M 21M 51% /boot
tmpfs 44M 0 44M 0% /run/user/1000
Which file should I choose?
Does anybody know which file (similar to /dev/sdx) to copy the data from?
Thank you very much!
I think what I was trying to do was to copy the files of machine A while using machine A. Most answers on the Internet actually assume using another machine B to copy the files of machine A. That's why, when I ran df -h, the terminal showed /dev/root instead of /dev/sdX.
Maybe it's because, while the files are being read, the device itself can't be used for other operations. So I used another machine B, ran df -h there, and it showed /dev/sdX successfully. Now I can follow the instructions on the Internet and do the backup.
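For reference, a minimal sketch of that second approach, run from machine B with the SD card attached (/dev/sdX and the output path are placeholders, so double-check the device with lsblk first):
lsblk                                                          # identify the SD card, e.g. sdb with partitions sdb1 and sdb2
sudo dd if=/dev/sdX of=/path/to/raspbian-backup.img bs=4M status=progress conv=fsync
Imaging the running system from the Pi itself via /dev/mmcblk0 also works, but the copy can end up inconsistent because files change while dd is reading, so cloning from a second machine is the safer route.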