CockroachDB pods on a single-cluster Kubernetes install fail with CrashLoopBackOff

Using VirtualBox and 4 x CentOS 7 OS installs.
Following a basic single-cluster Kubernetes install:
https://kubernetes.io/docs/setup/independent/install-kubeadm/
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
[root@k8s-master cockroach]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 41m v1.13.2
k8s-slave1 Ready <none> 39m v1.13.2
k8s-slave2 Ready <none> 39m v1.13.2
k8s-slave3 Ready <none> 39m v1.13.2
I have created 3 x NFS PVs on the master for my slaves to pick up as part of the cockroachdb-statefulset.yaml, as described here:
https://www.cockroachlabs.com/blog/running-cockroachdb-on-kubernetes/
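For reference, each PV looked roughly like this (a sketch; the NFS server address and export path shown are placeholders, not my actual values):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cockroachdbpv0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 192.168.0.100    # master's IP (placeholder)
    path: /srv/cockroachdb0  # NFS export (placeholder)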
However, my CockroachDB pods continually fail to communicate with each other.
[root@k8s-slave1 kubernetes]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cockroachdb-0 0/1 CrashLoopBackOff 6 8m47s
cockroachdb-1 0/1 CrashLoopBackOff 6 8m47s
cockroachdb-2 0/1 CrashLoopBackOff 6 8m47s
[root@k8s-slave1 kubernetes]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
datadir-cockroachdb-0 Bound cockroachdbpv0 10Gi RWO 17m
datadir-cockroachdb-1 Bound cockroachdbpv2 10Gi RWO 17m
datadir-cockroachdb-2 Bound cockroachdbpv1 10Gi RWO 17m
...the cockroach pod logs do not really tell me why...
[root@k8s-slave1 kubernetes]# kubectl logs cockroachdb-0
++ hostname -f
+ exec /cockroach/cockroach start --logtostderr --insecure --advertise-host cockroachdb-0.cockroachdb.default.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%
W190113 17:00:46.589470 1 cli/start.go:1055 RUNNING IN INSECURE MODE!
- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.
Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v2.1/secure-a-cluster.html
I190113 17:00:46.595544 1 server/status/recorder.go:609 available memory from cgroups (8.0 EiB) exceeds system memory 3.7 GiB, using system memory
I190113 17:00:46.600386 1 cli/start.go:1069 CockroachDB CCL v2.1.3 (x86_64-unknown-linux-gnu, built 2018/12/17 19:15:31, go1.10.3)
I190113 17:00:46.759727 1 server/status/recorder.go:609 available memory from cgroups (8.0 EiB) exceeds system memory 3.7 GiB, using system memory
I190113 17:00:46.759809 1 server/config.go:386 system total memory: 3.7 GiB
I190113 17:00:46.759872 1 server/config.go:388 server configuration:
max offset 500000000
cache size 947 MiB
SQL memory pool size 947 MiB
scan interval 10m0s
scan min idle time 10ms
scan max idle time 1s
event log enabled true
I190113 17:00:46.759896 1 cli/start.go:913 using local environment variables: COCKROACH_CHANNEL=kubernetes-insecure
I190113 17:00:46.759909 1 cli/start.go:920 process identity: uid 0 euid 0 gid 0 egid 0
I190113 17:00:46.759919 1 cli/start.go:545 starting cockroach node
I190113 17:00:46.762262 22 storage/engine/rocksdb.go:574 opening rocksdb instance at "/cockroach/cockroach-data/cockroach-temp632709623"
I190113 17:00:46.803749 22 server/server.go:851 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190113 17:00:46.804168 22 storage/engine/rocksdb.go:574 opening rocksdb instance at "/cockroach/cockroach-data"
I190113 17:00:46.828487 22 server/config.go:494 [n?] 1 storage engine initialized
I190113 17:00:46.828526 22 server/config.go:497 [n?] RocksDB cache size: 947 MiB
I190113 17:00:46.828536 22 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 60536
W190113 17:00:46.838175 22 gossip/gossip.go:1499 [n?] no incoming or outgoing connections
I190113 17:00:46.838260 22 cli/start.go:505 initial startup completed, will now wait for `cockroach init`
or a join to a running cluster to start accepting clients.
Check the log file(s) for progress.
I190113 17:00:46.841243 22 server/server.go:1402 [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W190113 17:01:16.841095 89 cli/start.go:535 The server appears to be unable to contact the other nodes in the cluster. Please try:
- starting the other nodes, if you haven't already;
- double-checking that the '--join' and '--listen'/'--advertise' flags are set up correctly;
- running the 'cockroach init' command if you are trying to initialize a new cluster.
If problems persist, please see https://www.cockroachlabs.com/docs/v2.1/cluster-setup-troubleshooting.html.
I190113 17:01:31.357765 1 cli/start.go:756 received signal 'terminated'
I190113 17:01:31.359529 1 cli/start.go:821 initiating graceful shutdown of server
initiating graceful shutdown of server
I190113 17:01:31.361064 1 cli/start.go:872 too early to drain; used hard shutdown instead
too early to drain; used hard shutdown instead
...any ideas how to debug this further?

I have gone through the *.yaml file at https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset.yaml
I noticed that towards the bottom there is no storageClassName mentioned, which means that during the volume claim process the pods are going to look for the default (standard) storage class.
I am not sure if you used the annotation below while provisioning the 3 NFS volumes:
storageclass.kubernetes.io/is-default-class=true
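For reference, that annotation is set on a StorageClass object; a sketch of applying it, where nfs is a hypothetical StorageClass name:
kubectl patch storageclass nfs -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'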
You should be able to check the same using -
kubectl get storageclass
If the output does not show a default (standard) storage class, then I would suggest either adjusting the persistent volume provisioning by adding that annotation, or adding an empty string as the storageClassName in the volumeClaimTemplates towards the end of the cockroachdb-statefulset.yaml file, as sketched below.
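A minimal sketch of the second option, based on the volumeClaimTemplates section of the linked StatefulSet (the empty storageClassName is the only addition; the storage request should match your PV size):
volumeClaimTemplates:
- metadata:
    name: datadir
  spec:
    accessModes:
      - "ReadWriteOnce"
    storageClassName: ""
    resources:
      requests:
        storage: 10Gi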
More detail (events) can be viewed using:
kubectl describe statefulset cockroachdb

OK, it came down to the fact that I had NAT as my VirtualBox external-facing network adapter. I changed it to Bridged and it all started working perfectly. If anyone can tell me why, that would be awesome :)

In my case, I used the Helm chart, like below:
$ helm install stable/cockroachdb \
-n cockroachdb \
--namespace cockroach \
--set Storage=10Gi \
--set NetworkPolicy.Enabled=true \
--set Secure.Enabled=true
Then wait for the CSRs for CockroachDB to finish being created:
$ watch kubectl get csr
Several CSRs are pending:
$ kubectl get csr
NAME AGE REQUESTOR CONDITION
cockroachdb.client.root 130m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
cockroachdb.node.cockroachdb-cockroachdb-0 130m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
cockroachdb.node.cockroachdb-cockroachdb-1 129m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
cockroachdb.node.cockroachdb-cockroachdb-2 130m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
To approve them, run the following command:
$ kubectl get csr -o json | \
jq -r '.items[] | select(.metadata.name | contains("cockroachdb.")) | .metadata.name' | \
xargs -n 1 kubectl certificate approve
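Alternatively, each pending CSR can be approved individually by name, for example:
$ kubectl certificate approve cockroachdb.client.root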


How do you increase maximum pods per node in K3S?

I have tried setting max pods per node using the following upon install:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--max-pods 250" sh -s -
However, the K3s server will then fail to load. It appears that the --max-pods flag has been deprecated per the kubernetes docs:
--max-pods int32 Default: 110
(DEPRECATED: This parameter should be set via the config file
specified by the Kubelet's --config flag. See
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
for more information.)
So with K3s, where is that kubelet config file and can/should it be set using something like the above method?
To update your existing installation with an increased max-pods, add a kubelet config file into a k3s associated location such as /etc/rancher/k3s/kubelet.config:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
edit /etc/systemd/system/k3s.service to change the k3s server args:
ExecStart=/usr/local/bin/k3s \
server \
'--disable' \
'servicelb' \
'--disable' \
'traefik' \
'--kubelet-arg=config=/etc/rancher/k3s/kubelet.config'
reload systemd to pick up the service change:
sudo systemctl daemon-reload
restart k3s:
sudo systemctl restart k3s
Check the output of kubectl describe node <node> and look for allocatable resources:
Allocatable:
cpu: 32
ephemeral-storage: 199789251223
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 131811756Ki
pods: 250
and a message in Events noting that the allocatable node limit has been updated:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 20m kube-proxy Starting kube-proxy.
Normal Starting 20m kubelet Starting kubelet.
...
Normal NodeNotReady 7m52s kubelet Node <node> status is now: NodeNotReady
Normal NodeAllocatableEnforced 7m50s kubelet Updated Node Allocatable limit across pods
Normal NodeReady 7m50s kubelet Node <node> status is now: NodeReady
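As an alternative to editing the systemd unit directly, newer k3s releases also read a config file at /etc/rancher/k3s/config.yaml, where the same kubelet argument can be expressed as (a sketch, reusing the kubelet config file above):
kubelet-arg:
  - "config=/etc/rancher/k3s/kubelet.config"
After writing it, restart k3s as above.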
As described in the documentation, it is possible to set the Kubelet's configuration parameters via an on-disk config file.
NOTE: Using an on-disk config file, we can set only the subset of the Kubelet's configuration parameters that we want to override; all other Kubelet configuration values are left at their built-in defaults, unless overridden by flags.
I've created a simple config file to override the maxPods value (default 110):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
And then we have to pass this config file as an argument during K3s installation (I recommend specifying an absolute path):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=config=<KUBELET_CONFIG_FILE_LOCATION>" sh -
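For example, assuming the config file above was saved as /etc/rancher/k3s/kubelet.config:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=config=/etc/rancher/k3s/kubelet.config" sh -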
Finally we can check that maxPods is equal to 250:
# kubectl describe nodes <NODE_NAME> | grep -i pod
pods: 250
pods: 250
PodCIDR: 10.42.0.0/24
PodCIDRs: 10.42.0.0/24
In addition, you can find an interesting discussion here; I believe you can find another way to solve your issue there.

Using Microk8s and OpenEBS cStor leads to an error when creating pool claims. Anybody know why this is occurring, and how to fix it?

I am using Microk8s (1.19, on Ubuntu 20.04.1 LTS) and am trying to use OpenEBS (cStor engine) for storage.
Since I'm running this locally before pushing to prod, I created virtual block devices with:
blockdevicedisk='/k8storage/diskimage'
blockdevicesize=10000
sudo dd if=/dev/zero of=$blockdevicedisk bs=1M count=$blockdevicesize
sudo mkfs.ext4 $blockdevicedisk
sudo losetup /dev/loop0 /k8storage/diskimage
$lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 9.8G 0 loop
loop1 7:1 0 7.8G 0 loop
sda 8:0 0 256G 0 disk
sdb 8:16 0 256G 0 disk /
I installed OpenEBS with Helm, then removed 'loop' from the openebs-ndm-config -> filterconfigs -> path-filter -> exclude list so that NDM would display these as block devices.
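For reference, the relevant part of the openebs-ndm-config ConfigMap looked roughly like this after the edit (a sketch; the exact default exclude entries vary by OpenEBS version):
filterconfigs:
  - key: path-filter
    name: path filter
    state: true
    include: ""
    exclude: "/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md"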
$kubectl get blockdevices -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-87ca7d6819eab3ea3af2884f2f6e9f8e v 274877906944 Unclaimed Active 19h
blockdevice-0a6c8d26081660a37f0a87dbb316c7ae v 10485760000 Unclaimed Active 19h
blockdevice-cd43d37664edd1c880e11f5b8e9cbe60 v 8388608000 Unclaimed Active 19h
^ the last 2 are the ones I made.
I then wrote the config to create a cStor StoragePoolClaim
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-pool-claim
spec:
  name: cstor-pool-claim
  type: disk
  poolSpec:
    poolType: striped
  blockDevices:
    blockDeviceList:
    - blockdevice-0a6c8d26081660a37f0a87dbb316c7ae
    - blockdevice-cd43d37664edd1c880e11f5b8e9cbe60
When I apply it, both block devices are claimed:
$kubectl get blockdevices -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-87ca7d6819eab3ea3af2884f2f6e9f8e v 274877906944 Unclaimed Active 19h
blockdevice-0a6c8d26081660a37f0a87dbb316c7ae v 10485760000 Claimed Active 19h
blockdevice-cd43d37664edd1c880e11f5b8e9cbe60 v 8388608000 Claimed Active 19h
This is expected.
$kubectl get spc
NAME AGE
cstor-pool-claim 18h
However, there is a problem!
$kubectl get csp
NAME ALLOCATED FREE CAPACITY STATUS READONLY TYPE AGE
cstor-pool-claim-nf0g Init false striped 19h
It never changes from Init status. There is a pod created whose describe output shows the error:
$kubectl describe pod cstor-pool-claim-nf0g-6cb75f8f49-sw6q2 -n openebs
This spews a lot of text, which I could share if it helps. The key part is the error message:
Error: failed to create containerd task: OCI runtime create failed:
container_linux.go:370: starting container process caused:
process_linux.go:459: container init caused: rootfs_linux.go:59:
mounting
"/var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/a9b84df9076c91b83982f157e9bacdc5a10f80846d32034dd15cdae1c1d4c4c1/shm"
to rootfs at "/dev/shm" caused: secure join: too many levels of
symbolic links: unknown
I've tried resetting my setup and re-entering the commands one by one to ensure that I was following the documentation and other examples correctly; however, I keep encountering this error.
Is this a limitation of microk8s? A fault of OpenEBS? Something weird about my setup? Or did I do something wrong?
More importantly: Is there a way to get this to work correctly?

After rebooting nodes, pods stuck in ContainerCreating state due to insufficient Weave IPs

I have a 3-node Kubernetes 1.11 cluster deployed with kubeadm and Weave (CNI) version 2.5.1. I am providing Weave a CIDR with an IP range of 128 IPs. After two reboots of the nodes, some of the pods are stuck in the ContainerCreating state.
Once you run kubectl describe pod <pod_name>, you will see the following errors:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 20m (x20 over 1h) kubelet, 10.0.1.63 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 30s (x25 over 1h) kubelet, 10.0.1.63 Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
If I check how many containers are running and how many IP addresses are allocated to them, I can see 24 containers:
[root@ip-10-0-1-63 centos]# weave ps | wc -l
26
The total number of IPs given to Weave on that node is 42.
[root@ip-10-0-1-212 centos]# kubectl exec -n kube-system -it weave-net-6x4cp -- /home/weave/weave --local status ipam
Defaulting container name to weave.
Use 'kubectl describe pod/weave-net-6x4cp -n kube-system' to see all of the containers in this pod.
6e:0d:f3:d7:f5:49(10.0.1.63) 42 IPs (32.8% of total) (42 active)
7a:24:6f:3c:1b:be(10.0.1.212) 40 IPs (31.2% of total)
ee:00:d4:9f:9d:79(10.0.1.43) 46 IPs (35.9% of total)
You can see that all 42 IPs are active, so no more IPs are available to allocate to new containers. But out of the 42, only 26 are actually allocated to containers; I am not sure where the remaining IPs are. It is happening on all three nodes.
Here is the output of weave status for your reference:
[root@ip-10-0-1-212 centos]# weave status
Version: 2.5.1 (version 2.5.2 available - please upgrade!)
Service: router
Protocol: weave 1..2
Name: 7a:24:6f:3c:1b:be(10.0.1.212)
Encryption: disabled
PeerDiscovery: enabled
Targets: 3
Connections: 3 (2 established, 1 failed)
Peers: 3 (with 6 established connections)
TrustedSubnets: none
Service: ipam
Status: waiting for IP(s) to become available
Range: 192.168.13.0/25
DefaultSubnet: 192.168.13.0/25
If you need any more information, I would be happy to provide it. Any clue?
Not sure if we have the same problem, but before I reboot a node I need to drain it first, so all pods on that node are evicted and it is safe to reboot. After the node is up, you need to uncordon it again; the node will then be available for scheduling pods again.
My reference: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
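A sketch of that drain/uncordon sequence (--ignore-daemonsets is typically needed because Weave runs as a DaemonSet):
$ kubectl drain <node-name> --ignore-daemonsets
$ # reboot the node and wait for it to come back up
$ kubectl uncordon <node-name>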
I guess that the 16 IPs have been reserved for pod reuse purposes. These are the maximum pods per node based on CIDR ranges:
Maximum Pods per Node    CIDR Range per Node
8                        /28
9 to 16                  /27
17 to 32                 /26
33 to 64                 /25
65 to 110                /24
In case your Weave IPs are exhausted and some of the IPs are not released after a reboot, you can delete the file /var/lib/weave/weave-netdata.db and restart the weave pods.
In my case, I added a systemd unit which, on every reboot or shutdown of the system, removes the /var/lib/weave/weave-netdata.db file. Once the system comes up, new IPs are allocated to all the pods and the Weave IP exhaustion was never seen again.
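A minimal sketch of such a unit (the unit name and targets are assumptions, not from the original setup):
# /etc/systemd/system/weave-ipam-cleanup.service (hypothetical name)
[Unit]
Description=Remove stale Weave IPAM data before shutdown/reboot
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target

[Service]
Type=oneshot
# Delete the persisted IPAM database so Weave re-allocates IPs on the next boot
ExecStart=/bin/rm -f /var/lib/weave/weave-netdata.db

[Install]
WantedBy=shutdown.target reboot.target halt.target
Enable it with sudo systemctl enable weave-ipam-cleanup.service so it runs on the shutdown path.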
Posting this here in hope someone else will find it useful for their use case.

Kubernetes: specify CPUs for cpumanager

Is it possible to specify a CPU ID list to the Kubernetes CPU manager? The goal is to make sure pods get CPUs only from a single socket (0). I brought all the CPUs on the peer socket offline as mentioned here, for example:
$ echo 0 > /sys/devices/system/cpu/cpu5/online
After doing this, the Kubernetes master indeed sees the remaining online CPUs:
kubectl describe node foo
Capacity:
cpu: 56 <<< socket 0 CPU count
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 197524872Ki
pods: 110
Allocatable:
cpu: 54 <<< 2 system reserved CPUs
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 71490952Ki
pods: 110
System Info:
Machine ID: 1155420082478559980231ba5bc0f6f2
System UUID: 4C4C4544-0044-4210-8031-C8C04F584B32
Boot ID: 7fa18227-748f-496c-968c-9fc82e21ecd5
Kernel Version: 4.4.13
OS Image: Ubuntu 16.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.3
Kubelet Version: v1.11.1
Kube-Proxy Version: v1.11.1
However, the CPU manager still seems to think there are 112 CPUs (socket 0 + socket 1).
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-111"}
As a result, the kubelet system pods are throwing the following error:
kube-system kube-proxy-nk7gc 0/1 rpc error: code = Unknown desc = failed to update container "eb455f81a61b877eccda0d35eea7834e30f59615346140180f08077f64896760": Error response from daemon: Requested CPUs are not available - requested 0-111, available: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110 762 36d <IP address> foo <none>
I was able to get this working. Posting this as an answer so that someone in need might benefit.
It appears the CPU set is read from the /var/lib/kubelet/cpu_manager_state file and is not updated across kubelet restarts, so this file needs to be removed before restarting the kubelet.
The following worked for me:
# On a running worker node, bring desired CPUs offline. (run as root)
$ cpu_list=`lscpu | grep "NUMA node1 CPU(s)" | awk '{print $4}'`
$ chcpu -d $cpu_list
$ rm -f /var/lib/kubelet/cpu_manager_state
$ systemctl restart kubelet.service
# Check the CPU set seen by the CPU manager
$ cat /var/lib/kubelet/cpu_manager_state
# Try creating pods and check the syslog:
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122466 8070 state_mem.go:84] [cpumanager] updated default cpuset: "0,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122643 8070 policy_static.go:198] [cpumanager] allocateCPUs: returning "2,4,6,8,58,60,62,64"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122660 8070 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 356939cdf32d0f719e83b0029a018a2ca2c349fc0bdc1004da5d842e357c503a, cpuset: "2,4,6,8,58,60,62,64")
I have reported a bug here as I think the CPU set should be updated after kubelet restarts.

kubectl get nodes shows NotReady

I have installed a two-node Kubernetes 1.12.1 cluster in cloud VMs, both behind an internet proxy. Each VM has a floating IP associated for connecting over SSH; kube-01 is the master and kube-02 is a node. I executed the export:
no_proxy=127.0.0.1,localhost,10.157.255.185,192.168.0.153,kube-02,192.168.0.25,kube-01
before running kubeadm init, but I am getting the following status from kubectl get nodes:
NAME STATUS ROLES AGE VERSION
kube-01 NotReady master 89m v1.12.1
kube-02 NotReady <none> 29s v1.12.2
Am I missing any configuration? Do I need to add 192.168.0.153 and 192.168.0.25 to the respective VMs' /etc/hosts?
It looks like the pod network is not installed on your cluster yet. You can install Weave Net, for example, with the command below:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network.
You can install a pod network of your choice. Here is a list.
After this, check:
$ kubectl describe nodes
Check that all is fine, like below:
Conditions:
Type Status
---- ------
OutOfDisk False
MemoryPressure False
DiskPressure False
Ready True
Capacity:
cpu: 2
memory: 2052588Ki
pods: 110
Allocatable:
cpu: 2
memory: 1950188Ki
pods: 110
Next, SSH to the node which is not ready and observe the kubelet logs. The most likely errors concern certificates and authentication.
You can also use journalctl on systemd to check kubelet errors.
$ journalctl -u kubelet
Try this:
Your CoreDNS is in a pending state; check the networking plugin you have used and make sure the proper add-ons are added.
Check the Kubernetes troubleshooting guides:
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-or-kube-dns-is-stuck-in-the-pending-state
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Install the required add-ons from those pages, and then check:
kubectl get pods -n kube-system
On the off chance it might be the same for someone else, in my case, I was using the wrong AMI image to create the nodegroup.
Run
journalctl -u kubelet
Then check the node logs; if you get the error below, disable swap using swapoff -a:
"Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fa
Main process exited, code=exited, status=1/FAILURE
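To keep swap disabled across reboots, a common follow-up (a sketch; back up /etc/fstab before editing it) is:
$ sudo swapoff -a
$ sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # comment out the swap entry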