How to deploy Docker services to Swarm using CephFS? - CentOS

I have configured Ceph in a five-node Docker Swarm using the daemon/ceph Docker image. The cluster status is shown below.
[root@master1 mnt]# ceph -s
  cluster:
    id:     eec52783-a0b5-4195-82b7-b66ea5514650
    health: HEALTH_WARN
            clock skew detected on mon.slave2, mon.master1

  services:
    mon: 3 daemons, quorum slave1,slave2,master1
    mgr: slave3(active)
    mds: cephfs-1/1/1 up {0=master1=up:active}, 1 up:standby
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   2 pools, 128 pgs
    objects: 22 objects, 2.2 KiB
    usage:   1.0 GiB used, 142 GiB / 143 GiB avail
    pgs:     128 active+clean
I want to start using this storage and deploy Docker services to the Swarm in such a way that they share the Ceph storage.
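Since this is essentially a how-to, here is a minimal sketch of one common approach, not a verified recipe: mount CephFS at the same path on every Swarm node, then bind-mount that path into the service. The monitor addresses, secret file path, mount point and service details below are assumptions; substitute your own values.

# on every Swarm node: mount CephFS with the kernel client
sudo mkdir -p /mnt/cephfs
sudo mount -t ceph 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# on a manager node: deploy a service that bind-mounts the shared CephFS path
docker service create \
    --name web \
    --replicas 3 \
    --mount type=bind,source=/mnt/cephfs/web-data,target=/usr/share/nginx/html \
    nginx:alpine

An fstab entry (or systemd mount unit) on each node keeps the mount in place across reboots; a Docker volume plugin for CephFS/RBD is the alternative if you would rather not manage host mounts yourself.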

Related

Zookeeper stuck in CrashLoopBackOff status when installing on RedHat 8.6; working fine on RedHat 7.9

We are using Strimzi Kafka on RedHat 7.9 and it works fine, but when we try to install the same on RHEL 8.6, Zookeeper gets stuck in CrashLoopBackOff status.
Docker version: 19.03.15
K8s version: v1.19.9
RedHat version: Red Hat Enterprise Linux release 8.6 (Ootpa)
Operator version: 0.27.1
Kafka version: 0.27.1-kafka-3.0.0
$ kubectl describe pod zookeeper-0 -n kafka
Events:
Type Reason Age From Message
Normal Scheduled 16m default-scheduler Successfully assigned kafka/atomiq-cluster-zookeeper-0 to atomiqplatformrhel8601-vm
Warning Unhealthy 15m kubelet Readiness probe failed:
Warning Unhealthy 15m kubelet Liveness probe failed:
Normal Killing 14m kubelet Container zookeeper failed liveness probe, will be restarted
Normal Created 14m (x4 over 16m) kubelet Created container zookeeper
Normal Pulled 14m (x4 over 16m) kubelet Container image "atomiqplatformrhel8601-vm:7443/strimzi/kafka:0.27.1-kafka-3.0.0" already present on machine
Normal Started 14m (x4 over 16m) kubelet Started container zookeeper
Warning Unhealthy 14m (x3 over 16m) kubelet Readiness probe failed: Ncat: Connection refused.
Warning Unhealthy 14m (x3 over 16m) kubelet Liveness probe failed: Ncat: Connection refused.
Warning BackOff 83s (x58 over 15m) kubelet Back-off restarting failed container
$ kubectl logs zookeeper-0 -n kafka
2022-11-29 07:52:23,063 ERROR Couldn't bind to atomiq-cluster-zookeeper-0.atomiq-cluster-zookeeper-nodes.kafka.svc:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=1](secure=0.0.0.0:2181)]
java.net.SocketException: Unresolved address
at java.base/java.net.ServerSocket.bind(ServerSocket.java:388)
at java.base/java.net.ServerSocket.bind(ServerSocket.java:349)
at org.apache.zookeeper.server.quorum.Leader.createServerSocket(Leader.java:315)
at org.apache.zookeeper.server.quorum.Leader.lambda$new$0(Leader.java:294)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3566)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:297)
at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1272)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1479)
2022-11-29 07:52:23,208 ERROR Leaving listener thread for address atomiq-cluster-zookeeper-0.atomiq-cluster-zookeeper-nodes.kafka.svc:3888 after 3 errors. Use zookeeper.electionPortBindRetry property to increase retry count. (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-atomiq-cluster-zookeeper-0.atomiq-cluster-zookeeper-nodes.kafka.svc:3888]
2022-11-29 07:52:23,209 ERROR As I'm leaving the listener thread, I won't be able to participate in leader election any longer: atomiq-cluster-zookeeper-0.atomiq-cluster-zookeeper-nodes.kafka.svc:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [QuorumPeerListener]
2022-11-29 07:52:23,210 ERROR Exiting JVM with code 14 (org.apache.zookeeper.util.ServiceUtils) [QuorumPeerListener]
Is there any specific kernel setting we have to apply on RedHat 8.6?
We checked the issues related to this error and it seems to be DNS-related, but we didn't find anything wrong in the DNS settings; a diagnostic sketch is included after the kube-flannel, kube-proxy and lsmod output below.
Kube-flannel logs:
I1129 06:18:36.102150 1 main.go:514] Determining IP address of default interface
I1129 06:18:36.103234 1 main.go:527] Using interface with name eth0 and address 10.67.39.73
I1129 06:18:36.103265 1 main.go:544] Defaulting external address to interface address (10.67.39.73)
I1129 06:18:36.201993 1 kube.go:126] Waiting 10m0s for node controller to sync
I1129 06:18:36.202043 1 kube.go:309] Starting kube subnet manager
I1129 06:18:37.202166 1 kube.go:133] Node controller sync successful
I1129 06:18:37.202209 1 main.go:244] Created subnet manager: Kubernetes Subnet Manager - atomiqplatformrhel8601-vm
I1129 06:18:37.202214 1 main.go:247] Installing signal handlers
I1129 06:18:37.202322 1 main.go:386] Found network config - Backend type: vxlan
I1129 06:18:37.202427 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I1129 06:18:37.204109 1 main.go:317] Wrote subnet file to /run/flannel/subnet.env
I1129 06:18:37.204125 1 main.go:321] Running backend.
I1129 06:18:37.204152 1 main.go:339] Waiting for all goroutines to exit
I1129 06:18:37.204171 1 vxlan_network.go:60] watching for new subnet leases
Kube-proxy logs:
W1129 05:49:26.754768 1 proxier.go:649] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
I1129 05:49:26.878377 1 node.go:136] Successfully retrieved node IP: 10.67.39.73
I1129 05:49:26.878419 1 server_others.go:142] kube-proxy node IP is an IPv4 address (10.67.39.73), assume IPv4 operation
W1129 05:49:27.085025 1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I1129 05:49:27.085155 1 server_others.go:185] Using iptables Proxier.
I1129 05:49:27.085496 1 server.go:650] Version: v1.19.9
I1129 05:49:27.086000 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I1129 05:49:27.086090 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1129 05:49:27.086131 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1129 05:49:27.086443 1 config.go:315] Starting service config controller
I1129 05:49:27.086454 1 shared_informer.go:240] Waiting for caches to sync for service config
I1129 05:49:27.086475 1 config.go:224] Starting endpoint slice config controller
I1129 05:49:27.086479 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I1129 05:49:27.186686 1 shared_informer.go:247] Caches are synced for endpoint slice config
I1129 05:49:27.186731 1 shared_informer.go:247] Caches are synced for service config
lsmod output:
Module Size Used by
iptable_nat 16384 1
iptable_filter 16384 1
ip_tables 28672 2 iptable_filter,iptable_nat
xt_statistic 16384 3
vxlan 65536 0
ip6_udp_tunnel 16384 1 vxlan
udp_tunnel 20480 1 vxlan
ipt_REJECT 16384 8
nf_reject_ipv4 16384 1 ipt_REJECT
ip_vs_sh 16384 0
ip_vs_wrr 16384 0
ip_vs_rr 16384 0
ip_vs 172032 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
xt_comment 16384 214
xt_mark 16384 6
xt_nat 16384 43
veth 28672 0
ipt_MASQUERADE 16384 11
nf_conntrack_netlink 49152 0
xt_addrtype 16384 17
nft_chain_nat 16384 12
nf_nat 45056 4 ipt_MASQUERADE,xt_nat,nft_chain_nat,iptable_nat
br_netfilter 24576 0
bridge 278528 1 br_netfilter
stp 16384 1 bridge
llc 16384 2 bridge,stp
nft_counter 16384 278
xt_conntrack 16384 11
nf_conntrack 172032 6 xt_conntrack,nf_nat,ipt_MASQUERADE,xt_nat,nf_conntrack_netlink,ip_vs
nf_defrag_ipv6 20480 2 nf_conntrack,ip_vs
nf_defrag_ipv4 16384 1 nf_conntrack
xt_owner 16384 1
nft_compat 20480 453
overlay 139264 152
nf_tables 180224 1047 nft_compat,nft_counter,nft_chain_nat
nfnetlink 16384 4 nft_compat,nf_conntrack_netlink,nf_tables
vfat 20480 1
fat 81920 1 vfat
intel_rapl_msr 16384 0
intel_rapl_common 24576 1 intel_rapl_msr
isst_if_mbox_msr 16384 0
isst_if_common 16384 1 isst_if_mbox_msr
kvm_intel 339968 0
kvm 905216 1 kvm_intel
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 1
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
rapl 20480 0
hv_utils 45056 1
hv_balloon 32768 0
hyperv_fb 24576 1
i2c_piix4 24576 0
ata_generic 16384 0
pcspkr 16384 0
joydev 24576 0
binfmt_misc 20480 1
xfs 1556480 11
libcrc32c 16384 5 nf_conntrack,nf_nat,nf_tables,xfs,ip_vs
sd_mod 53248 12
t10_pi 16384 1 sd_mod
sg 40960 0
hv_netvsc 94208 0
hv_storvsc 24576 8
scsi_transport_fc 81920 1 hv_storvsc
hid_hyperv 16384 0
hyperv_keyboard 16384 0
ata_piix 36864 0
libata 262144 2 ata_piix,ata_generic
crc32c_intel 24576 1
hv_vmbus 131072 7 hv_balloon,hv_utils,hv_netvsc,hid_hyperv,hv_storvsc,hyperv_keyboard,hyperv_fb
serio_raw 16384 0
dm_mirror 28672 0
dm_region_hash 20480 1 dm_mirror
dm_log 20480 2 dm_region_hash,dm_mirror
dm_mod 151552 32 dm_log,dm_mirror
ipmi_devintf 20480 0
ipmi_msghandler 110592 1 ipmi_devintf
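A hedged diagnostic sketch for the DNS angle, rather than a definitive fix: the "Unresolved address" error refers to the pod's own headless-service name, so the first thing to confirm is that in-cluster DNS answers for it. The pod and namespace names below are taken from the logs above; the busybox image tag is an assumption.

# check that the cluster DNS pods are healthy
kubectl get pods -n kube-system -l k8s-app=kube-dns

# try to resolve the Zookeeper headless-service name from inside the cluster
kubectl run dns-test -n kafka --rm -it --restart=Never --image=busybox:1.28 -- \
    nslookup atomiq-cluster-zookeeper-0.atomiq-cluster-zookeeper-nodes.kafka.svc

# on RHEL 8 it is also worth checking whether firewalld/nftables blocks the
# probe and quorum ports (2181/2888/3888) between pods
sudo firewall-cmd --list-all

If the nslookup fails only on the RHEL 8.6 node, the problem is more likely CoreDNS/flannel reachability than a kernel setting; if it succeeds, the firewall and nftables backend differences between RHEL 7 and RHEL 8 are the next suspects.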

Prometheus metrics yield multiplied values for Kubernetes monitoring of cores, memory and storage

I'm trying to import some pre-built Grafana dashboards for Kubernetes monitoring but I don't get why some metrics seem to be duplicated or multiplied.
For example, this metric is yielding 6 nodes:
sum(kube_node_info{node=~"$node"})
Which is double what the cluster actually has:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-agentpool-vmss000000 Ready agent 57d v1.23.5
aks-agentpool-vmss000001 Ready agent 57d v1.23.5
aks-agentpool-vmss000007 Ready agent 35d v1.23.5
Another example:
This metric yields a total of 36 cores, when in reality there are only 12 (3 nodes x 4 cores each):
sum (machine_cpu_cores{kubernetes_io_hostname=~"^$Node$"})
Capacity:
cpu: 4
ephemeral-storage: 129900528Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16393292Ki
pods: 110
If I filter the query by system_uuid, each of the 3 UUIDs yields 12 cores.
The same goes for memory usage, filesystem storage and so on. Any idea why the metrics are multiplied?
The dashboard in question is this: https://grafana.com/grafana/dashboards/10000
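A hedged guess rather than a confirmed diagnosis: exact integer multiples like this usually mean the same exporter is scraped by more than one Prometheus job (or the dashboard variable matches more than one instance), so every series exists once per job/instance label and sum() multiplies the total. You can check how many scrape jobs contribute to the node-info series with:
count by (job) (kube_node_info)
and, if duplicates show up, count each node only once by collapsing the extra labels before aggregating, for example:
count(count by (node) (kube_node_info{node=~"$node"}))
sum(max by (kubernetes_io_hostname) (machine_cpu_cores{kubernetes_io_hostname=~"^$Node$"}))
The same max-then-sum pattern applies to the memory and filesystem panels.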

How to restart a Mon in a Ceph cluster

I am running a Ceph cluster (Pacific) consisting of 3 nodes, with 3 monitor daemons started (one on each node).
Just for a test I wanted to remove node3 from the list of monitors with the command
sudo ceph mon remove node3
The mon disappeared from the dashboard and also from the output of the ceph status command:
$ sudo ceph status
  cluster:
    id:     xxxxx-yyyyy-zzzzzzz
    health: HEALTH_WARN
            1 failed cephadm daemon(s)

  services:
    mon: 2 daemons, quorum node1,node2 (age 53m)
    mgr: node1.gafnwm(active, since 102m), standbys: node2.jppwxo
    osd: 3 osds: 3 up (since 15m), 3 in (since 96m)
But now I cannot find a way to restart the monitor or add it back into my cluster. I tried:
$ sudo ceph orch daemon add mon node3:10.0.0.4
Error EINVAL: name mon.node3 already in use
What does this mean? Why is the name mon.node3 already in use even though I can't see it?
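A hedged sketch of what usually resolves this on a cephadm-managed cluster (the host and daemon names are taken from the question; the exact state of your cluster may differ): ceph mon remove only drops the monitor from the monmap, while the cephadm-managed daemon and its data directory can still exist on node3, which would explain both the "already in use" error and the "1 failed cephadm daemon(s)" warning above. Removing the stale daemon first and then letting the orchestrator redeploy it is the usual sequence:

# check whether cephadm still tracks a mon daemon on node3
sudo ceph orch ps node3

# remove the stale daemon record/container on node3
sudo ceph orch daemon rm mon.node3 --force

# let the orchestrator place monitors on the three hosts again
sudo ceph orch apply mon --placement="node1,node2,node3"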

GlusterFS volume with replica 3 arbiter 1 mounted in Kubernetes PODs contains zero size files

I was planning to migrate from replica 3 to replica 3 with arbiter 1, but ran into a strange issue on my third node (which acts as the arbiter).
When I mount the new volume endpoint on the node where the Gluster arbiter pod is running, I get strange behavior: some files are fine, but some have zero size. When I mount the same share on another node, all files are fine.
GlusterFS is running as a Kubernetes DaemonSet and I'm using Heketi to manage GlusterFS from Kubernetes automatically.
I'm using GlusterFS 4.1.5 and Kubernetes 1.11.1.
gluster volume info vol_3ffdfde93880e8aa39c4b4abddc392cf
Type: Replicate
Volume ID: e67d2ade-991a-40f9-8f26-572d0982850d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.2.70:/var/lib/heketi/mounts/vg_426b28072d8d0a4c27075930ddcdd740/brick_35389ca30d8f631004d292b76d32a03b/brick
Brick2: 192.168.2.96:/var/lib/heketi/mounts/vg_3a9b2f229b1e13c0f639db6564f0d820/brick_953450ef6bc25bfc1deae661ea04e92d/brick
Brick3: 192.168.2.148:/var/lib/heketi/mounts/vg_7d1e57c2a8a779e69d22af42812dffd7/brick_b27af182cb69e108c1652dc85b04e44a/brick (arbiter)
Options Reconfigured:
user.heketi.id: 3ffdfde93880e8aa39c4b4abddc392cf
user.heketi.arbiter: true
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
Status Output:
gluster volume status vol_3ffdfde93880e8aa39c4b4abddc392cf
Status of volume: vol_3ffdfde93880e8aa39c4b4abddc392cf
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.2.70:/var/lib/heketi/mounts/v
g_426b28072d8d0a4c27075930ddcdd740/brick_35
389ca30d8f631004d292b76d32a03b/brick 49152 0 Y 13896
Brick 192.168.2.96:/var/lib/heketi/mounts/v
g_3a9b2f229b1e13c0f639db6564f0d820/brick_95
3450ef6bc25bfc1deae661ea04e92d/brick 49152 0 Y 12111
Brick 192.168.2.148:/var/lib/heketi/mounts/
vg_7d1e57c2a8a779e69d22af42812dffd7/brick_b
27af182cb69e108c1652dc85b04e44a/brick 49152 0 Y 25045
Self-heal Daemon on localhost N/A N/A Y 25069
Self-heal Daemon on worker1-aws-va N/A N/A Y 12134
Self-heal Daemon on 192.168.2.70 N/A N/A Y 13919
Task Status of Volume vol_3ffdfde93880e8aa39c4b4abddc392cf
------------------------------------------------------------------------------
There are no active volume tasks
Heal output:
gluster volume heal vol_3ffdfde93880e8aa39c4b4abddc392cf info
Brick 192.168.2.70:/var/lib/heketi/mounts/vg_426b28072d8d0a4c27075930ddcdd740/brick_35389ca30d8f631004d292b76d32a03b/brick
Status: Connected
Number of entries: 0
Brick 192.168.2.96:/var/lib/heketi/mounts/vg_3a9b2f229b1e13c0f639db6564f0d820/brick_953450ef6bc25bfc1deae661ea04e92d/brick
Status: Connected
Number of entries: 0
Brick 192.168.2.148:/var/lib/heketi/mounts/vg_7d1e57c2a8a779e69d22af42812dffd7/brick_b27af182cb69e108c1652dc85b04e44a/brick
Status: Connected
Number of entries: 0
Any ideas how to resolve this issue?
Update: the issue was fixed after updating the glusterfs-client and glusterfs-common packages on the Kubernetes workers to a more recent version.
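For anyone hitting the same symptom, a hedged sketch of the version check and upgrade that the fix above refers to (the package names are the Debian/Ubuntu ones, and the pod name and mount path are placeholders; adjust for your setup):

# compare the client version on the worker with the server version inside the pods
glusterfs --version
kubectl exec <glusterfs-server-pod> -- glusterfs --version

# upgrade the client-side packages on each Kubernetes worker
sudo apt-get update
sudo apt-get install --only-upgrade glusterfs-client glusterfs-common

# remount (or reschedule the pods using the volume) so the new client is used
sudo umount /mnt/glustervol
sudo mount -t glusterfs 192.168.2.70:/vol_3ffdfde93880e8aa39c4b4abddc392cf /mnt/glustervol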

Ceph status HEALTH_WARN while adding an RGW Instance

I want to create a Ceph cluster and then connect to it through the S3 RESTful API.
So I've deployed a Ceph cluster (Mimic 13.2.4) on "Ubuntu 16.04.5 LTS" with 3 OSDs (one per 10 GB HDD), using these tutorials:
1) http://docs.ceph.com/docs/mimic/start/quick-start-preflight/#ceph-deploy-setup
2) http://docs.ceph.com/docs/mimic/start/quick-ceph-deploy/
At this point, ceph status is OK:
root@ubuntu-srv:/home/slavik/my-cluster# ceph -s
  cluster:
    id:     d7459118-8c16-451d-9774-d09f7a926d0e
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum ubuntu-srv
    mgr: ubuntu-srv(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 27 GiB / 30 GiB avail
    pgs:
3) "To use the Ceph Object Gateway component of Ceph, you must deploy an instance of RGW. Execute the following to create an new instance of RGW:"
root#ubuntu-srv:/home/slavik/my-cluster# ceph-deploy rgw create ubuntu-srv
....
[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host ubuntu-srv and default port 7480
root@ubuntu-srv:/home/slavik/my-cluster# ceph -s
  cluster:
    id:     d7459118-8c16-451d-9774-d09f7a926d0e
    health: HEALTH_WARN
            too few PGs per OSD (2 < min 30)

  services:
    mon: 1 daemons, quorum ubuntu-srv
    mgr: ubuntu-srv(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   1 pools, 8 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 27 GiB / 30 GiB avail
    pgs:     37.500% pgs unknown
             62.500% pgs not active
             5 creating+peering
             3 unknown
The Ceph status has changed to HEALTH_WARN. Why, and how do I resolve it?
Your issue is:
health: HEALTH_WARN
too few PGs per OSD (2 < min 30)
Look at your current PG configuration by running:
ceph osd dump | grep pool
See what PG count each pool is configured with, then go to https://ceph.com/pgcalc/ to calculate what your pools should be configured for.
The warning means you have a low number of PGs per OSD: right now you have 2 per OSD, where the minimum should be 30.
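As a concrete follow-up (a hedged sketch, assuming pgcalc suggests something like 64 PGs for your RGW pool; the pool name below is just the one rgw typically creates first, so substitute whatever ceph osd dump actually shows):

ceph osd dump | grep pool              # note the pool names and their current pg_num
ceph osd pool set .rgw.root pg_num 64
ceph osd pool set .rgw.root pgp_num 64
ceph -s                                # PGs should go active+clean once peering finishes

On Mimic there is no PG autoscaler, so pg_num and pgp_num have to be raised by hand, and pg_num can only be increased, never decreased.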