Rook Ceph broken on Kubernetes?

Using Ceph v1.14.10 and Rook v1.3.8 on Kubernetes 1.16, on-premise. After 10 days without any trouble, we decided to drain some nodes. Since then, all of the moved pods can't attach to their PVs any more; it looks like the Ceph cluster is broken:
My ConfigMap rook-ceph-mon-endpoints is referencing 2 missing mon pod IPs:
csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["10.115.0.129:6789","10.115.0.4:6789","10.115.0.132:6789"]}]'
But
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-mon-e-56b849775-4g5wg 1/1 Running 0 6h42m 10.115.0.2 XXXX <none> <none>
rook-ceph-mon-h-fc486fb5c-8mvng 1/1 Running 0 6h42m 10.115.0.134 XXXX <none> <none>
rook-ceph-mon-i-65666fcff4-4ft49 1/1 Running 0 30h 10.115.0.132 XXXX <none> <none>
Is this normal, or must I run some kind of "reconciliation" task to update the ConfigMap with the new mon pod IPs?
(could be related to https://github.com/rook/rook/issues/2262)
I had to manually update the following (commands sketched below):
secret rook-ceph-config
cm rook-ceph-mon-endpoints
cm rook-ceph-csi-config
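For reference, a minimal sketch of how those three objects can be inspected and edited by hand; the object names come from the question, while which keys hold the stale mon IPs is an assumption based on the symptom:
kubectl -n rook-ceph get cm rook-ceph-mon-endpoints -o yaml    # look for the old mon IPs under "data"
kubectl -n rook-ceph edit cm rook-ceph-mon-endpoints           # swap in the current mon pod IPs
kubectl -n rook-ceph edit cm rook-ceph-csi-config              # same fix inside the csi-cluster-config-json entry
kubectl -n rook-ceph edit secret rook-ceph-config              # note: secret values are base64-encoded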

As @travisn said:
The operator owns updating that configmap and secret. It's not expected to update them manually unless there is some disaster recovery situation as described at https://rook.github.io/docs/rook/v1.4/ceph-disaster-recovery.html.
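Short of a full disaster-recovery procedure, one hedged option is to force the operator to reconcile, since it owns those objects; deleting its pod is safe (its Deployment recreates it), and a fresh reconcile may rewrite the stale endpoints on its own:
kubectl -n rook-ceph delete pod -l app=rook-ceph-operator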


Pods assigned to node with no role

I am currently doing some scenarios on Katacoda and I am curious about the node roles.
My cluster has 2 nodes and one has the role Master and the other has no role.
NAME STATUS ROLES AGE VERSION
controlplane Ready master 5m11s v1.18.0
node01 Ready <none> 4m40s v1.18.0
Now when I look at the pods in my cluster, it seems like all three pods are assigned to node01
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
random-logger-7687d48b59-jftl9 1/1 Running 0 26s 10.244.1.5 node01 <none> <none>
random-logger-7687d48b59-mwz4h 1/1 Running 0 26s 10.244.1.4 node01 <none> <none>
random-logger-7687d48b59-vgbc8 1/1 Running 0 117s 10.244.1.3 node01 <none> <none>
My understanding is that the scheduler puts pods on nodes that are assigned the worker role, so why are they being scheduled on a node with no role? Is a node with no role in the cluster assumed to be a worker node?
Use kubectl describe node my-node to see the details of the node.
A node role is just a label with the format node-role.kubernetes.io/<role>; the scheduler itself pays no attention to it. Any Ready node is eligible for scheduling unless it carries a taint, and kubeadm-provisioned masters typically carry the node-role.kubernetes.io/master:NoSchedule taint, which is why all three pods land on node01.
You can add a role label yourself with kubectl label.
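A minimal sketch, assuming you want node01 to show up as a worker (the role name after the final / is your choice, and an empty label value is enough):
kubectl label node node01 node-role.kubernetes.io/worker=
kubectl get nodes    # node01 now reports ROLES worker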

OpenShift: How to Delete or Manage a Specific Pod

I spun up an OpenShift 4.6 cluster on AWS, 3 masters, 2 workers, for learning/play. Since it was just for learning, I shut down all nodes at once using the AWS web console. When I brought them back up a few days later, the web console would not spin up.
So I followed this doc on restarting the cluster. That didn't work, so I started looking at how the console's own pods were doing. I ran oc get pods -n openshift-console -o wide -w and got:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-59f557f67d-ks5kq 0/1 Pending 0 13m <none> <none> <none> <none>
console-59f557f67d-q6zrc 0/1 UnexpectedAdmissionError 0 4d6h <none> ip-10-0-139-41.us-west-2.compute.internal <none> <none>
console-59f557f67d-w4q7l 0/1 UnexpectedAdmissionError 0 56m <none> ip-10-0-131-234.us-west-2.compute.internal <none> <none>
console-59f557f67d-zvxzn 0/1 UnexpectedAdmissionError 0 59m <none> ip-10-0-131-234.us-west-2.compute.internal <none> <none>
downloads-55f4ff79-lqdj7 1/1 Running 0 59m 10.131.0.4 ip-10-0-208-19.us-west-2.compute.internal <none> <none>
downloads-55f4ff79-mrfzn 1/1 Running 0 59m 10.131.0.13 ip-10-0-208-19.us-west-2.compute.internal <none> <none>
So, seeing that there were a few messed-up pods, I wanted to look at their logs, but I didn't know how to target a specific pod by the NAME column above. I tried each of these:
oc get pods/console-59f557f67d-q6zrc
oc get podtemplates/console-59f557f67d-q6zrc
I consistently get Error from server (NotFound): pods "console-59f557f67d-q6zrc" not found.
I then found the command oc get pods -n openshift-console -o name which reveals:
pod/console-59f557f67d-ks5kq
pod/console-59f557f67d-q6zrc
pod/console-59f557f67d-w4q7l
pod/console-59f557f67d-zvxzn
pod/downloads-55f4ff79-lqdj7
pod/downloads-55f4ff79-mrfzn
So I was right, it is a "pod", but if I try to run anything like oc logs <name> it returns the same error that it can't be found. Is this a bug? Does OpenShift think there are pods around that no longer exist and keep routing to them?
If not, what resource type is the thing under the NAME column? How do I target it with, say, an oc logs or oc delete command?
I discovered the correct syntax in this Red Hat Bugzilla issue.
The trick is to pass the namespace flag and then the pod name as a separate argument; without -n openshift-console, oc searches the current project, which is why the earlier commands returned NotFound.
Examples:
oc describe pod -n openshift-console console-59f557f67d-zvxzn
oc logs -n openshift-console console-59f557f67d-zvxzn
oc delete pod -n openshift-console console-59f557f67d-zvxzn
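The type/name form printed by oc get pods -o name works as well:
oc logs pod/console-59f557f67d-zvxzn -n openshift-console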
I'll follow up and update when I find this reference in the official docs or command line help.

Inquiring pod and service subnets from inside Kubernetes cluster

How can one inquire the Kubernetes pod and service subnets in use (e.g. 10.244.0.0/16 and 10.96.0.0/12 respectively) from inside a Kubernetes cluster in a portable and simple way?
For instance, kubectl get cm -n kube-system kubeadm-config -o yaml reports podSubnet and serviceSubnet. But this is not fully portable because a cluster may have been set up by another means than kubeadm.
kubectl get cm -n kube-system kube-proxy -o yaml reports clusterCIDR (i.e. the pod subnet), and kubectl get pod -n kube-system kube-apiserver-master1 -o yaml reports the value passed as the --service-cluster-ip-range command-line option to kube-apiserver (i.e. the service subnet), where master1 stands for the name of any control plane node. But this seems a bit complex.
Is there a better way available e.g. with the Kubernetes 1.17 API?
I don't think it is possible to obtain what you want in a portable and simple way.
If you don't specify the CIDR parameters, defaults are assigned.
Since there are many ways to run Kubernetes, unmanaged clusters like kubeadm, minikube, k3s, or microk8s, as well as managed offerings from cloud providers (GKE, Azure, AWS), it's hard to find one way to list all CIDRs in all environments. Another obstacle can be the Kubernetes version or the CNI in use.
In the Kubernetes 1.17 release notes you can find this information:
Deprecate the default service IP CIDR. The previous default was 10.0.0.0/24 which will be removed in 6 months/2 releases. Cluster admins must specify their own desired value, by using --service-cluster-ip-range on kube-apiserver.
As an example with kubeadm: $ kubeadm init --pod-network-cidr 10.100.0.0/12 --service-cidr 10.99.0.0/12
There are a few ways to get the pod and service CIDRs:
$ kubectl cluster-info dump | grep -E '(service-cluster-ip-range|cluster-cidr)'
"--service-cluster-ip-range=10.99.0.0/12",
"--cluster-cidr=10.100.0.0/12",
$ kubeadm config view | grep Subnet
podSubnet: 10.100.0.0/12
serviceSubnet: 10.99.0.0/12
But if you check all the pods in this cluster, some of them have IPs starting with 192.168.190.X or 192.168.137.X:
$ kubectl get pods -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default nginx 1/1 Running 0 62m 192.168.190.129 kubeadm-worker <none> <none>
kube-system calico-kube-controllers-77c5fc8d7f-9n6m5 1/1 Running 0 118m 192.168.137.66 kubeadm-master <none> <none>
kube-system calico-node-2kx2v 1/1 Running 0 117m 10.128.0.4 kubeadm-worker <none> <none>
kube-system calico-node-8xqd9 1/1 Running 0 118m 10.128.0.3 kubeadm-master <none> <none>
kube-system coredns-66bff467f8-sgmkw 1/1 Running 0 120m 192.168.137.65 kubeadm-master <none> <none>
kube-system coredns-66bff467f8-t84ht 1/1 Running 0 120m 192.168.137.67 kubeadm-master <none> <none>
If you describe any of the CNI pods, you can find yet another CIDR:
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
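A quick way to check this, assuming the stock Calico manifest where the pool CIDR is set as an environment variable on the calico-node DaemonSet:
$ kubectl -n kube-system get ds calico-node -o yaml | grep -A1 CALICO_IPV4POOL_CIDR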
For a GKE example you will have:
node CIDRs
$ kubectl describe node | grep CIDRs
PodCIDRs: 10.52.1.0/24
PodCIDRs: 10.52.0.0/24
PodCIDRs: 10.52.2.0/24
$ gcloud container clusters describe cluster-2 --zone=europe-west2-b | grep Cidr
clusterIpv4Cidr: 10.52.0.0/14
clusterIpv4Cidr: 10.52.0.0/14
clusterIpv4CidrBlock: 10.52.0.0/14
servicesIpv4Cidr: 10.116.0.0/20
servicesIpv4CidrBlock: 10.116.0.0/20
podIpv4CidrSize: 24
servicesIpv4Cidr: 10.116.0.0/20
Honestly I don't think there is an easy and portable way to list all podCidrs and serviceCidrs in one simple command.
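That said, two checks travel reasonably well. Per-node pod CIDRs are stored in the Node spec whenever the controller-manager allocates them, and the apiserver reveals the service range if you ask it to allocate a ClusterIP that is out of range (the error text below is approximate):
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
$ kubectl create service clusterip dummy --clusterip=1.1.1.1 --tcp=80:80
The Service "dummy" is invalid: spec.clusterIP: Invalid value: "1.1.1.1": provided IP is not in the valid range. The range of valid IPs is 10.96.0.0/12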

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I am having trouble accessing the Kubernetes dashboard. I already created another question about it that you can see here, but while digging into my cluster, I think that the problem might be somewhere else, and that's why I am creating a new question.
I start my master by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see that I have two coredns pods stuck in the Pending state forever, and when I run the command:
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see the following Warning in the Events part:
FailedScheduling: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not an Error, and since, as a Kubernetes newbie, taints do not mean much to me, I tried to connect a node to the master (using the previously saved command):
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]@[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods:
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod has been created and is stuck in the ContainerCreating status.
And when I do a describe again:
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see multiple identical Warnings in the Events part:
Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:43133->[::1]:53: read: connection refused
Here are my local Docker images:
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
And so, for the dashboard part, I tried to start it with the command
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, even though my problem originally was that I cannot access my dashboard, I guess that the real problem is deeper than that.
I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.
There is an issue I experienced with coredns pods stuck in Pending when setting up my own cluster, which I resolved by adding a pod network.
It looks like, because there is no network add-on installed, the nodes are tainted as not-ready. Installing the add-on removes the taints and the pods can be scheduled. In my case, adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS
will not start up before a network is installed. kubeadm only
supports Container Network Interface (CNI) based networks (and does
not support kubenet).
Actually it is the opposite of a deep or serious issue; it is a trivial one. Whenever you see a pod stuck in the Pending state, it means the scheduler is having a hard time scheduling the pod, mostly because there are not enough resources on the node.
In your case the node has a taint and your pod doesn't have the matching toleration. What you have to do is describe the node and get the taint:
kubectl describe node | grep -i taints
Note: you might have more than one taint. So you might want to do kubectl describe no NODE, since with grep you will only see one taint.
Once you get the taint, which will be something like hello=world:NoSchedule (meaning key=value:effect), you will have to add a tolerations section to your Deployment. This is an example Deployment so you can see how it should look:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute   # or NoSchedule, PreferNoSchedule
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600
As you can see there is the tolerations section in the yaml. So, if I had a node with the node=not-ready:NoExecute taint, no pod would be able to be scheduled on that node unless it had this toleration.
You can also remove the taint if you don't need it. To remove a taint, describe the node, get the key of the taint, and do:
kubectl taint node NODE key-
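For example, to clear the hello=world:NoSchedule taint used above (the node name is a placeholder):
kubectl taint node my-node hello-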
Hope it makes sense. Just add this section to your deployment, and it will work.
Set up the flannel network tool.
Run these commands:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
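Once the flannel pods are Running, the not-ready taint should clear, the nodes should go Ready, and the Pending coredns pods should get scheduled. A quick way to watch this happen:
$ kubectl get nodes -w
$ kubectl -n kube-system get pods -w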

Kubernetes Canal CNI error on masters

I'm setting up a Kubernetes cluster on a customer.
I've done this process multiple times before, including dealing with Vagrant specifics, and I've consistently been able to get a K8s cluster up and running without too much fuss.
Now, on this customer I'm doing the same, but I've been running into a lot of issues when setting things up, which is completely unexpected.
Compared to other places where I've set up Kubernetes, the only obvious difference I see is that I have a proxy server which I constantly have to battle with. Nothing that a NO_PROXY env var hasn't been able to handle.
The main issue I'm facing is setting up Canal (Calico + Flannel).
For some reason, on Masters 2 and 3 it just won't start.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system canal-2pvpr 2/3 CrashLoopBackOff 7 14m 10.136.3.37 devmn2.cpdprd.pt
kube-system canal-rdmnl 2/3 CrashLoopBackOff 7 14m 10.136.3.38 devmn3.cpdprd.pt
kube-system canal-swxrw 3/3 Running 0 14m 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-apiserver-devmn1.cpdprd.pt 1/1 Running 1 1h 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-apiserver-devmn2.cpdprd.pt 1/1 Running 1 4h 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-apiserver-devmn3.cpdprd.pt 1/1 Running 1 1h 10.136.3.38 devmn3.cpdprd.pt
kube-system kube-controller-manager-devmn1.cpdprd.pt 1/1 Running 0 15m 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-controller-manager-devmn2.cpdprd.pt 1/1 Running 0 15m 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-controller-manager-devmn3.cpdprd.pt 1/1 Running 0 15m 10.136.3.38 devmn3.cpdprd.pt
kube-system kube-dns-86f4d74b45-vqdb4 0/3 ContainerCreating 0 1h <none> devmn2.cpdprd.pt
kube-system kube-proxy-4j7dp 1/1 Running 1 2h 10.136.3.38 devmn3.cpdprd.pt
kube-system kube-proxy-l2wpm 1/1 Running 1 2h 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-proxy-scm9g 1/1 Running 1 2h 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-scheduler-devmn1.cpdprd.pt 1/1 Running 1 1h 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-scheduler-devmn2.cpdprd.pt 1/1 Running 1 4h 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-scheduler-devmn3.cpdprd.pt 1/1 Running 1 1h 10.136.3.38 devmn3.cpdprd.pt
Looking for the specific error, I've come to find out that the issue is with the kube-flannel container, which is throwing an error:
[exXXXXX#devmn1 ~]$ kubectl logs canal-rdmnl -n kube-system -c kube-flannel
I0518 16:01:22.555513 1 main.go:487] Using interface with name ens192 and address 10.136.3.38
I0518 16:01:22.556080 1 main.go:504] Defaulting external address to interface address (10.136.3.38)
I0518 16:01:22.565141 1 kube.go:130] Waiting 10m0s for node controller to sync
I0518 16:01:22.565167 1 kube.go:283] Starting kube subnet manager
I0518 16:01:23.565280 1 kube.go:137] Node controller sync successful
I0518 16:01:23.565311 1 main.go:234] Created subnet manager: Kubernetes Subnet Manager - devmn3.cpdprd.pt
I0518 16:01:23.565331 1 main.go:237] Installing signal handlers
I0518 16:01:23.565388 1 main.go:352] Found network config - Backend type: vxlan
I0518 16:01:23.565440 1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0518 16:01:23.565619 1 main.go:279] Error registering network: failed to acquire lease: node "devmn3.cpdprd.pt" pod cidr not assigned
I0518 16:01:23.565671 1 main.go:332] Stopping shutdownHandler...
I just can't understand why.
Some relevant info:
My clusterCIDR and podCIDR are: 192.168.151.0/25 (I know, it's weird, don't ask unless it's a huge issue)
I've set up etcd under systemd.
I've modified the kube-controller-manager.yaml to change the mask size to 25 (otherwise the CIDR mentioned before wouldn't work).
I'm installing everything with kubeadm. One weird thing I did notice was that, when viewing the config (kubeadm config view), much of the information that I had set up in the kubeadm config.yaml (for kubeadm init) was not present in the config view, including the paths to the etcd certs.
I'm also not sure why that happened, but I've fixed it (hopefully) by editing the kubeadm config map (kubectl edit cm kubeadm-config -n kube-system) and saving it.
Still no luck with canal.
Can anyone help me figure out what's wrong?
I have documented pretty much every step of the configuration I've done, so if required I may be able to provide it.
EDIT:
I figured out in the meantime that my masters 2 and 3 indeed do not have a podCIDR associated. Why would this happen? And how can I add one?
Try to edit:
/etc/kubernetes/manifests/kube-controller-manager.yaml
and add
--allocate-node-cidrs=true
--cluster-cidr=192.168.151.0/25
then restart the kubelet.
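A sketch of where those flags sit in the static pod manifest. One caveat worth flagging: with --cluster-cidr=192.168.151.0/25 and a node mask size of 25 there is room for exactly one node CIDR (2^(25-25) = 1), which would explain why only one master got a podCIDR; the mask has to leave room for every node (the /27 below is a hypothetical value giving four node CIDRs):
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --cluster-cidr=192.168.151.0/25
    - --node-cidr-mask-size=27   # hypothetical; must be >= the cluster-cidr prefix length, with one subnet per node
Since kubelet watches /etc/kubernetes/manifests, saving the file recreates the controller-manager pod; systemctl restart kubelet forces it if needed.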
I found this information here and it was useful for me.