Kubernetes 1.6.2 flannel configuration in centos 7 - kubernetes

Using kueadm command I have configured 3 nodes Kubernetes cluster. Unlike earlier version 1.6.2 kubeadm command configures all the Kubernetes process automatically. For flannel I used this yml file kube-flannel.yml. my understanding with Kubernetes is it will create the container and run the process inside the container but I see flannel process running on node itself but /opt/bin/flannel binary not in my node. How Kubernetes running the flannel?
How Kubernetes handles this? Is there right document explains this concepts?
flannel pod running in master node itself.
[root#master01 ~]# kubectl get pods -o wide --namespace=kube-system -l app=flannel
NAME READY STATUS RESTARTS AGE IP NODE
kube-flannel-ds-3694s 2/2 Running 37 3d 192.168.15.101 master01
kube-flannel-ds-mbh9b 2/2 Running 10 3d 192.168.15.102 node-01
kube-flannel-ds-vlm20 2/2 Running 12 3d 192.168.15.103 node-02
I see flanneld process
[root#master01 ~]# ps -fed |grep flan
root 5447 5415 0 May10 ? 00:00:08 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root 5604 5582 0 May10 ? 00:00:00 /bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done
but flanneld is not in the master node
> [root#master01 ~]# ls -ld /opt/bin/flanneld
> ls: cannot access /opt/bin/flanneld: No such file or directory
Thanks
SR

After some more reading found the answer flanneld run inside the continerd.
here is the run details.
https://github.com/opencontainers/runc
we can extract the flannel docker images like below.
> docker save -o flannel-v0.7.1-amd64.tar
> quay.io/coreos/flannel:v0.7.1-amd64 tar tvf flannel-v0.7.1-amd64.tar

Related

How to cause an intentional restart of a single kubernetes pod

I am testing a log previous command and for that I need a pod to restart.
I can get my pods using a command like
kubectl get pods -n $ns -l $label
Which shows that my pods did not restart so far. I want to test the command:
kubectl logs $podname -n $ns --previous=true
That command fails because my pod did not restart making the --previous=true switch meaningless.
I am aware of this command to restart pods when configuration changed:
kubectl rollout restart deployment myapp -n $ns
This does not restart the containers in a way that is meaningful for my log command test but rather terminates the old pods and creates new pods (which have a restart count of 0).
I tried various versions of exec to see if I can shut them down from within but most commands I would use are not found in that container:
kubectl exec $podname -n $ns -- shutdown
kubectl exec $podname -n $ns -- shutdown now
kubectl exec $podname -n $ns -- halt
kubectl exec $podname -n $ns -- poweroff
How can I use a kubectl command to forcefully restart the pod with it retaining its identity and the restart counter increasing by one so that my test log command has a previous instance to return the logs from.
EDIT:
Connecting to the pod is well described.
kubectl -n $ns exec --stdin --tty $podname -- /bin/bash
The process list shows only a handful running processes:
ls -1 /proc | grep -Eo "^[0-9]{1,5}$"
proc 1 seems to be the one running the pod.
kill 1 does nothing, not even kill the proc with pid 1
I am still looking into this at the moment.
There are different ways to achieve your goal. I'll describe below most useful options.
Crictl
Most correct and efficient way - restart the pod on container runtime level.
I tested this on Google Cloud Platform - GKE and minikube with docker driver.
You need to ssh into the worker node where the pod is running. Then find it's POD ID:
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
9863a993e0396 87a94228f133e 3 minutes ago Running nginx-3 2 6d17dad8111bc
OR
$ crictl pods -s ready
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
6d17dad8111bc About an hour ago Ready nginx-3 default 2 (default)
Then stop it:
$ crictl stopp 6d17dad8111bc
Stopped sandbox 6d17dad8111bc
After some time, kubelet will start this pod again (with different POD ID in CRI, however kubernetes cluster treats this pod as the same):
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
f5f0442841899 87a94228f133e 41 minutes ago Running nginx-3 3 b628e1499da41
This is how it looks in cluster:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-3 1/1 Running 3 48m
Getting logs with --previous=true flag also confirmed it's the same POD for kubernetes.
Kill process 1
It works with most images, however not always.
E.g. I tested on simple pod with nginx image:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 27h
$ kubectl exec -it nginx -- /bin/bash
root#nginx:/# kill 1
root#nginx:/# command terminated with exit code 137
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 1 27h
Useful link:
Debugging Kubernetes nodes with crictl

Connect to CoreDNS pod in microk8s

For some troubleshooting, I want to connect to my coredns pod. Is this possible?
$ microk8s kubectl get pod --namespace kube-system
NAME READY STATUS RESTARTS AGE
hostpath-provisioner-5c65fbdb4f-w6fmn 1/1 Running 1 7d22h
coredns-7f9c69c78c-mcdl5 1/1 Running 1 7d23h
calico-kube-controllers-f7868dd95-hbmjt 1/1 Running 1 7d23h
calico-node-rtprh 1/1 Running 1 7d23h
When I try, I get the following error msg:
$ microk8s kubectl --namespace kube-system exec --stdin --tty coredns-7f9c69c78c-mcdl5 -- /bin/bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "f1d08ed8494894d1281cd5c43dee36119225ab1ba414def333659538e5edc561": OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
User AndD has good mentioned in the comment:
Coredns Pod have no shell, I think. Check this to kind-of exec with a sidecar: How to get into CoreDNS pod kuberrnetes?
Yes. This image has no shell. You can read more about this situation in this thread:
The image does not contain a shell. Logs can be viewed with kubectl.
You have asked:
I want to connect to my coredns pod, is this possible?
Theoretically yes, but you need to make a workaroud with docker. It is described in this answer:
In short, do this to find a node where a coredns pod is running:
kubectl -n kube-system get po -o wide | grep coredns
ssh to one of those nodes, then:
docker ps -a | grep coredns
Copy the Container ID to clipboard and run:
ID=<paste ID here>
docker run -it --net=container:$ID --pid=container:$ID --volumes-from=$ID alpine sh
You will now be inside the "sidecar" container and can poke around. I.e.
cat /etc/coredns/Corefile
Additionally, you can check the logs, with kubectl. See also official documentation about DNS debugging.

Flannel is crashing for Slave node

I am getting this result for flannel service on my slave node. Flannel is running fine on master node.
kube-system kube-flannel-ds-amd64-xbtrf 0/1 CrashLoopBackOff 4 3m5s
Kube-proxy running on the slave is fine but not the flannel pod.
I have a master and a slave node only. At first its say running, then it goes to error and finally, crashloopbackoff.
godfrey#master:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system kube-flannel-ds-amd64-jszwx 0/1 CrashLoopBackOff 4 2m17s 192.168.152.104 slave3 <none> <none>
kube-system kube-proxy-hxs6m 1/1 Running 0 18m 192.168.152.104 slave3 <none> <none>
I am also getting this from the logs:
I0515 05:14:53.975822 1 main.go:390] Found network config - Backend type: vxlan
I0515 05:14:53.975856 1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0515 05:14:53.976072 1 main.go:291] Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
I0515 05:14:53.976154 1 main.go:370] Stopping shutdownHandler...
I could not find a solution so far. Help appreciated.
As solution came from OP, I'm posting answer as community wiki.
As reported by OP in the comments, he didn't passed the podCIDR during kubeadm init.
The following command was used to see that the flannel pod was in "CrashLoopBackoff" state:
sudo kubectl get pods --all-namespaces -o wide
To confirm that podCIDR was not passed to flannel pod kube-flannel-ds-amd64-ksmmh that was in CrashLoopBackoff state.
$ kubectl logs kube-flannel-ds-amd64-ksmmh
kubeadm init --pod-network-cidr=172.168.10.0/24 didn't pass the podCIDR to the slave nodes as expected.
Hence to solve the problem, kubectl patch node slave1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}' command had to be used to pass podCIDR to each slave node.
Please see this link: coreos.com/flannel/docs/latest/troubleshooting.html and section "Kubernetes Specific"
The described cluster configuration doesn't look correct in two aspects:
First of all, PodCIDR reasonable minimum subnet size is /16. Each Kubernetes node usually gets /24 subnet because it can run up to 100 pods.
PodCIDR and ServicesCIDR (default: "10.96.0.0/12") must not interfere with your existing LAN network and with each other.
So, correct kubeadm command would look like:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
In your case PodCIDR subnet is only /24 and it was assigned to master node. Slave node didn't get its own /24 subnet, so Flannel Pod showed the error in the logs:
Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
Assigning the same subnet to several nodes manually will lead to the other connectivity problems.
You can find more details on Kubernetes IP subnets in GKE documentation.
The second problem is the IP subnet number.
Recent Calico network addon versions are able to detect the correct Pod subnet based on kubeadm parameter --pod-network-cidr. Older version was using predefined subnet 192.168.0.0/16 and you had to adjust it in its YAML file in the Deaemonset specification :
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
Flannel is still requires default subnet ( 10.244.0.0/16 ) to be specified for kubeadm init.
To use custom subnet for your cluster, Flannel "installation" YAML file should be adjusted before applying to the cluster.
...
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
...
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
...
So the following should work for any version of Kubernetes and Calico:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Latest Calico version
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# or specific version, v3.14 in this case, which is also latest at the moment
# kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
Same for Flannel:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# For Kubernetes v1.7+
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# For older versions of Kubernetes:
# For RBAC enabled clusters:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-legacy.yml
$
There are many other network addons. You can find the list in the documentation:
Cluster Networking
Installing Addons

Rook Ceph Operator hangs when checking for cluster status

I've setup a k8s cluster on digital ocean Ubuntu 18.04 LTS droplets using calico on top of wireguard vpn, and was able to setup nginx-ingress with traefik as external LB. I'm now on the step of setting up distributed storage using rook ceph, by following the quickstart at https://rook.io/docs/rook/master/ceph-quickstart.html, but it seems like the monitors never reach a quorum (even when its just one). Actually, monitor a reaches by itself, but neither the operator or any other monitors seem to know that, and the operator hangs when trying to check the status.
I've tried troubleshooting network issues, all the way from wireguard, calico and ufw. I've even set ufw to temporarily allow all traffic by default just to make sure I wasn't allowing one port but the traffic was on another interface (i have wg0, eth1, tunl0 and the calico interfaces).
The I followed the ceph troubleshooting guide unsuccessfully: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
I've been 4 days at this and I'm out of solutions.
Here's how I setup the storage cluster
cd cluster/examples/kubernetes/ceph
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster-test.yaml
Running kubectl get pods returns
NAME READY STATUS RESTARTS AGE
pod/rook-ceph-agent-9ws2p 1/1 Running 0 24s
pod/rook-ceph-agent-v6v9n 1/1 Running 0 24s
pod/rook-ceph-agent-x2jv4 1/1 Running 0 24s
pod/rook-ceph-mon-a-74cc6db5c8-8s5l5 1/1 Running 0 9s
pod/rook-ceph-operator-7cd5d8bd4c-pclxp 1/1 Running 0 25s
pod/rook-discover-24cfj 1/1 Running 0 24s
pod/rook-discover-6xsnp 1/1 Running 0 24s
pod/rook-discover-hj4tc 1/1 Running 0 24s
However, when I try to check the status of the monitors, from the operator pod I get:
#This hangs forever
kubectl exec -it rook-ceph-operator-7cd5d8bd4c-pclxp ceph status
#This hangs foverer
kubectl exec -it rook-ceph-operator-7cd5d8bd4c-pclxp ceph ping mon.a
#This returns [errno 2] error calling ping_monitor
#Which I guess should, becasue mon.b does/should not exist
#But I expected a response such as mon.b does not exist
kubectl exec -it rook-ceph-operator-7cd5d8bd4c-pclxp ceph ping mon.b
Pinging the monitor pod from the operator works just fine by the way
Operator logs
https://gist.github.com/figassis/0a3f499f5e3f79a430c9bd58718fd29f#file-operator-log
Monitor a logs
https://gist.github.com/figassis/0a3f499f5e3f79a430c9bd58718fd29f#file-mon-a-log
Monitor a status, obtainer directly form monitor pod via socket
https://gist.github.com/figassis/0a3f499f5e3f79a430c9bd58718fd29f#file-mon-a-status
You can execute ceph status command inside ceph toolbox pod.
https://github.com/rook/rook/blob/master/Documentation/ceph-toolbox.md

kubectl get nodes not showing workers

I am following this tutorial with 2 vms running CentOS7. Everything looks fine (no errors during installation/setup) but I can't see my nodes.
NOTE:
I am running this on VMWare VMs
kub1 is my master and kub2 my worker node
kubectl get nodes output:
[root#kub1 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root#kub2 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
nodes:
[root#kub1 ~]# kubectl get nodes
[root#kub1 ~]# kubectl get nodes -a
[root#kub1 ~]#
[root#kub2 ~]# kubectl get nodes -a
[root#kub2 ~]# kubectl get no
[root#kub2 ~]#
cluster events:
[root#kub1 ~]# kubectl get events -a
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kubelet kub2.local} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
/var/log/messages:
kubelet.go:1194] Unable to construct api.Node object for kubelet: can't get ip address of node node-kub2: lookup node-kub2: no such host
QUESTION: any idea why my nodes are not shown using "kubectl get nodes"?
My issue was that the KUBELET_HOSTNAME on /etc/kubernetes/kubeletvalue didn't match the hostname.
I commented that line, then restarted the services and I could see my worker after that.
hope that helps
Not sure about your scenario, but I have solved it after 3-4 hours of efforts.
Solved
I was facing this issue, because my docker cgroup driver was different than kubernetes cgroup driver.
Just updated it to cgroupfs using following commands mentioned in doc.
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart docker service service docker restart.
Reset kubernetes on slave node: kubeadm reset
Joined master again: kubeadm join <><>
It was visible on master using kubectl get nodes.
I had a similar problem after installing k8s using kubespray on fedora31, and to debug the issue, tried to run a random container directly using docker run that failed with:
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
this is a known problem cause by cgroup version on fedora 31, and the fix is to update grub to use the previous version:
sudo dnf install grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"