Failed to install rook on k8s cluster - kubernetes

I am trying to create a rook cluster inside a k8s cluster.
Setup: 1 master node, 1 worker node.
These are the steps I have followed:
Master node:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
sudo sysctl net.bridge.bridge-nf-call-iptables=1
sudo sysctl net.bridge.bridge-nf-call-ip6tables=1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/32a765fd19ba45b387fdc5e3812c41fff47cfd55/Documentation/kube-flannel.yml
kubeadm token create --print-join-command
Worker node:
kubeadm join {master_ip_address}:6443 --token {token} --discovery-token-ca-cert-hash {hash} --apiserver-advertise-address={worker_private_ip}
Master node - install rook (reference: https://rook.github.io/docs/rook/master/ceph-quickstart.html):
kubectl create -f ceph/common.yaml
kubectl create -f ceph/operator.yaml
kubectl create -f ceph/cluster-test.yaml
Error while creating rook-ceph-operator pod:
(combined from similar events): Failed create pod sandbox: rpc error: code =
Unknown desc = failed to set up sandbox container "4a901f12e5af5340f2cc48a976e10e5c310c01a05a4a47371f766a1a166c304f"
network for pod "rook-ceph-operator-fdfbcc5c5-jccc9": networkPlugin cni failed to
set up pod "rook-ceph-operator-fdfbcc5c5-jccc9_rook-ceph" network: failed to set bridge addr:
"cni0" already has an IP address different from 10.244.1.1/24
Can anybody help me with this issue?

This issue appears if you ran kubeadm reset and then reinitialized Kubernetes with kubeadm init. To fix it, clean up the leftover network state:
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
After this, start docker and kubelet again, then re-run kubeadm init.
Workaround
You can also try this simpler solution:
ip link delete cni0
ip link delete flannel.1
Which interfaces you need to delete depends on which network plugin you are using inside k8s.
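If you want to confirm the mismatch before deleting anything, you can compare the pod CIDR Kubernetes assigned to the node with the address currently on the cni0 bridge (a quick check; replace the placeholder with your worker's node name):
kubectl get node {worker_node_name} -o jsonpath='{.spec.podCIDR}'
ip addr show cni0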

Related

Flannel is crashing for Slave node

I am getting this result for the flannel pod on my slave node. Flannel is running fine on the master node.
kube-system kube-flannel-ds-amd64-xbtrf 0/1 CrashLoopBackOff 4 3m5s
The kube-proxy pod running on the slave is fine, but not the flannel pod.
I have a master and a slave node only. At first it says Running, then it goes to Error and finally CrashLoopBackOff.
godfrey#master:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system kube-flannel-ds-amd64-jszwx 0/1 CrashLoopBackOff 4 2m17s 192.168.152.104 slave3 <none> <none>
kube-system kube-proxy-hxs6m 1/1 Running 0 18m 192.168.152.104 slave3 <none> <none>
I am also getting this from the logs:
I0515 05:14:53.975822 1 main.go:390] Found network config - Backend type: vxlan
I0515 05:14:53.975856 1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0515 05:14:53.976072 1 main.go:291] Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
I0515 05:14:53.976154 1 main.go:370] Stopping shutdownHandler...
I could not find a solution so far. Help appreciated.
As the solution came from the OP, I'm posting this answer as a community wiki.
As reported by the OP in the comments, they didn't pass the podCIDR during kubeadm init.
The following command was used to see that the flannel pod was in CrashLoopBackOff state:
sudo kubectl get pods --all-namespaces -o wide
The logs confirmed that the podCIDR was not passed to the flannel pod kube-flannel-ds-amd64-ksmmh that was in CrashLoopBackOff state:
$ kubectl logs kube-flannel-ds-amd64-ksmmh
kubeadm init --pod-network-cidr=172.168.10.0/24 didn't pass the podCIDR to the slave nodes as expected.
Hence, to solve the problem, the command kubectl patch node slave1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}' had to be used to pass the podCIDR to each slave node.
Please see this link: coreos.com/flannel/docs/latest/troubleshooting.html, section "Kubernetes Specific".
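Before patching, it can help to check which nodes already have a podCIDR assigned (a quick check; node names will differ in your cluster):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'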
The described cluster configuration doesn't look correct in two aspects:
First of all, a reasonable minimum PodCIDR subnet size is /16. Each Kubernetes node usually gets a /24 subnet because, by default, it can run up to 110 pods.
The PodCIDR and Service CIDR (default: "10.96.0.0/12") must not overlap with your existing LAN network or with each other.
So, the correct kubeadm command would look like:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
In your case the PodCIDR subnet is only /24, and it was assigned to the master node. The slave node didn't get its own /24 subnet, so the Flannel pod on it showed this error in the logs:
Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
Assigning the same subnet to several nodes manually will lead to other connectivity problems.
You can find more details on Kubernetes IP subnets in GKE documentation.
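You can also check the per-node pod limit that drives the /24 sizing; the default allocatable pod count is 110 (the node name below is a placeholder):
kubectl get node {node_name} -o jsonpath='{.status.allocatable.pods}'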
The second problem is the IP subnet itself.
Recent Calico network addon versions are able to detect the correct Pod subnet based on the kubeadm parameter --pod-network-cidr. Older versions used the predefined subnet 192.168.0.0/16, and you had to adjust it in the YAML file in the DaemonSet specification:
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
Flannel still requires the default subnet (10.244.0.0/16) to be specified for kubeadm init.
To use a custom subnet for your cluster, the Flannel "installation" YAML file should be adjusted before applying it to the cluster.
...
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
...
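For example, if your cluster was initialized with a custom --pod-network-cidr, one way to adjust the manifest is to download it, replace the Network value, and apply the edited file (a rough sketch; 10.10.0.0/16 is just an example subnet and must match what you passed to kubeadm init):
curl -sLO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sed -i 's|10.244.0.0/16|10.10.0.0/16|' kube-flannel.yml
kubectl apply -f kube-flannel.yml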
So the following should work for any version of Kubernetes and Calico:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Latest Calico version
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# or specific version, v3.14 in this case, which is also latest at the moment
# kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
Same for Flannel:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# For Kubernetes v1.7+
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# For older versions of Kubernetes:
# For RBAC enabled clusters:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-legacy.yml
There are many other network addons. You can find the list in the documentation:
Cluster Networking
Installing Addons

About: CreateContainerError

I installed a K8s cluster on my laptop. It was running fine in the beginning, but when I restarted my laptop some services were not running:
kube-system coredns-5c98db65d4-9nm6m 0/1 Error 594 12d
kube-system coredns-5c98db65d4-qwkk9 0/1 CreateContainerError
kube-system kube-scheduler-kubemaster 0/1 CreateContainerError
I searched online for a solution but could not find an appropriate answer. Please help me resolve this issue.
I encourage you to look at the official Kubernetes documentation. Remember that your master should have at least the following resources: 2 CPUs or more and 2 GB or more of RAM.
First, install docker and kubeadm (as the root user) on each machine.
Initialize kubeadm (on master):
kubeadm init <args>
For example, for Calico to work correctly, you need to pass --pod-network-cidr=192.168.0.0/16 to kubeadm init:
kubeadm init --pod-network-cidr=192.168.0.0/16
Install a pod network add-on (which one depends on what you would like to use). You can install it with the following command:
kubectl apply -f <add-on.yaml>
e.g. for Calico:
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
To start using your cluster, you need to run the following on the master as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You can now join any number of machines by running the following on each node as root:
kubeadm join <master-ip>:<master-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
By default, tokens expire after 24 hours. If you are joining a node to the cluster after the current token has expired, you can create a new token by running the following command on the control-plane node:
kubeadm token create
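If you also want the full join command (including the CA cert hash) instead of just the token, the same command can print it for you:
kubeadm token create --print-join-command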
Please let me know if it works for you.
Did you check the status of the docker and kubelet services? If not, please run the command below and verify that the services are up and running.
systemctl status docker kubelet
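If either service is reported as inactive, you can enable and start both in one step (adjust if you only need one of them):
sudo systemctl enable --now docker kubelet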

How to start kubelet service?

I ran the command
systemctl stop kubelet
then tried to start it:
systemctl start kubelet
but I was not able to start it.
Here is the output of systemctl status kubelet:
kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2019-06-05 15:35:34 UTC; 7s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 31697 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 31697 (code=exited, status=255)
Because of this, I am not able to run any kubectl command.
For example, kubectl get pods gives:
The connection to the server 172.31.6.149:6443 was refused - did you specify the right host or port?
This worked for me:
Disable swap using swapoff -a,
then try to start it: systemctl start kubelet
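Note that swapoff -a only disables swap until the next reboot. To make the change persistent, you can also comment out the swap entry in /etc/fstab (a sketch; double-check your fstab layout before editing):
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab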
So you need to reset the kubelet service.
Here are the steps:
Check the status of your docker service.
If stopped, start it with sudo systemctl start docker.
If it is not installed, install it:
# yum install -y kubelet kubeadm kubectl docker
Turn swap off with # swapoff -a
Now reset kubeadm with # kubeadm reset
Now try # kubeadm init
After that, check # systemctl status kubelet
It should now be working.
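If kubelet still fails to start after these steps, its logs usually show the reason (for example, the swap error); a quick way to view the most recent entries:
journalctl -u kubelet -n 50 --no-pager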
Check the nodes:
kubectl get nodes
If the master node is not ready, refer to the following.
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
If you are not able to create a pod, check DNS:
kubectl get pods --namespace=kube-system
If the DNS pods are in Pending state, you need to install a network add-on.
I used Calico:
kubectl apply -f https://docs.projectcalico.org/v3.7/manifests/calico.yaml
Now your master node is ready and you can deploy pods.
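As a quick smoke test once the node reports Ready, you can deploy something simple and check that it gets scheduled (nginx here is just an example image):
kubectl create deployment nginx --image=nginx
kubectl get pods -o wide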

Kubectl connectivity issue

I first installed etcd, kube-apiserver, and kubelet as systemd services. The services are running fine and listening on all required ports.
When I run kubectl cluster-info, I get the output below:
Kubernetes master is running at http://localhost:8080
When I run kubectl get componentstatuses, I get the output below:
etcd-0 Healthy {"health": "true"}
But running kubectl get nodes, I get the error below:
Error from server (ServerTimeout): the server cannot complete the requested operation at this time, try again later (get nodes)
Can anybody help me out with this?
For the message:
:~# k get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
--------
Modify the following files on all master nodes:
$ sudo vim /etc/kubernetes/manifests/kube-scheduler.yaml
Comment or delete the line:
- --port=0
in (spec->containers->command->kube-scheduler)
$ sudo vim /etc/kubernetes/manifests/kube-controller-manager.yaml
Comment or delete the line:
- --port=0
in (spec->containers->command->kube-controller-manager)
Then restart the kubelet service:
$ sudo systemctl restart kubelet.service
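After kubelet recreates the static pods, the component statuses should report Healthy again; you can verify with:
kubectl get cs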
You're missing the kubeconfig file. kubectl looks for the config file at $HOME/.kube/config.
As part of the install, you can copy the config file like this on the master node:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
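Alternatively, if you are running kubectl as root, you can point it at the admin kubeconfig directly instead of copying it:
export KUBECONFIG=/etc/kubernetes/admin.conf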
What is the status of the controller manager and scheduler? Do you see them listed as Healthy when you run the command below?
kubectl get cs

Unable to start MiniKube: Stuck on Starting Cluster Components

I am new to minikube. I followed the steps below to install minikube on Oracle Linux 7.5 (kernel 3.10.0-327.28.3.el7.x86_64):
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
After installing, I ran minikube:
sudo minikube start --vm-driver=none
Minikube failed on:
sudo minikube start --vm-driver=none
Starting local Kubernetes v1.12.4 cluster...
Starting VM...
Waiting for SSH to be available...
Detecting the provisioner...
Setting Docker configuration on the remote daemon...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Stopping extra container runtimes...
Starting cluster components...
E0105 13:00:41.436961 19330 start.go:343] Error starting cluster: timed out waiting to elevate kube-system RBAC privileges: Temporary Error: creating clusterrolebinding: Post https://192.168.99.100:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?timeout=1m0s: net/http: TLS handshake timeout
Temporary Error: creating clusterrolebinding: Post https://192.168.99.100:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?timeout=1m0s: net/http: TLS handshake timeout
Temporary Error: creating clusterrolebinding: Post https://192.168.99.100:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?timeout=1m0s: net/http: TLS handshake timeout
Temporary Error: creating clusterrolebinding: Post https://192.168.99.100:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?timeout=1m0s: net/http: TLS handshake timeout
Temporary Error: creating clusterrolebinding: Post https://192.168.99.100:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?timeout=1m0s: net/http: TLS handshake timeout
I also checked the logs and found that it is stuck in some loop and retrying, but I am unable to understand why:
I0105 12:24:24.522907 19330 utils.go:224] > Your Kubernetes master has initialized successfully!
I0105 12:24:24.522916 19330 utils.go:224] > To start using your cluster, you need to run the following as a regular user:
I0105 12:24:24.522919 19330 utils.go:224] > mkdir -p $HOME/.kube
I0105 12:24:24.522925 19330 utils.go:224] > sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
I0105 12:24:24.522929 19330 utils.go:224] > sudo chown $(id -u):$(id -g) $HOME/.kube/config
I0105 12:24:24.522934 19330 utils.go:224] > You should now deploy a pod network to the cluster.
I0105 12:24:24.522944 19330 utils.go:224] > Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
I0105 12:24:24.522950 19330 utils.go:224] > https://kubernetes.io/docs/concepts/cluster-administration/addons/
I0105 12:24:24.522957 19330 utils.go:224] > You can now join any number of machines by running the following on each node
I0105 12:24:24.522959 19330 utils.go:224] > as root:
I0105 12:24:24.522972 19330 utils.go:224] > kubeadm join localhost:8443 --token 5apexw.uv7nfpirz4on2e33 --discovery-token-ca-cert-hash sha256:6dcf73220b8bc229269bb8c6a350592fe6b0cd068ef8f336163cc5b3a384990e
I0105 12:24:45.792762 19330 utils.go:117] sleeping 500ms
I0105 12:24:46.292883 19330 utils.go:106] retry loop 1
I0105 12:25:07.561761 19330 utils.go:117] sleeping 500ms
I0105 12:25:08.062042 19330 utils.go:106] retry loop 2
I0105 12:25:29.330434 19330 utils.go:117] sleeping 500ms
I0105 12:25:29.830625 19330 utils.go:106] retry loop 3
I0105 12:25:51.096835 19330 utils.go:117] sleeping 500ms
I0105 12:25:51.597099 19330 utils.go:106] retry loop 4
I suppose you did it on AWS.
Remove everything and recreate from scratch. I just reproduced this and everything works fine.
Remove everything:
minikube delete
rm -rf ~/.minikube
My steps from the beginning (as root):
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
sudo yum install docker-engine
systemctl enable docker.service
systemctl start docker.service
minikube start --vm-driver=none
Result:
========================================
Starting local Kubernetes v1.12.4 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Stopping extra container runtimes...
Starting cluster components...
Verifying kubelet health ...
Verifying apiserver health ...
Kubectl is now configured to use the cluster.
===================
WARNING: IT IS RECOMMENDED NOT TO RUN THE NONE DRIVER ON PERSONAL
WORKSTATIONS The 'none' driver will run an insecure kubernetes
apiserver as root that may leave the host vulnerable to CSRF attacks
When using the none driver, the kubectl config and credentials
generated will be root owned and will appear in the root home
directory. You will need to move the files to the appropriate location
and then set the correct permissions. An example of this is below:
sudo mv /root/.kube $HOME/.kube # this will write over any previous configuration
sudo chown -R $USER $HOME/.kube
sudo chgrp -R $USER $HOME/.kube
sudo mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
sudo chown -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube
This can also be done automatically by setting the env var CHANGE_MINIKUBE_NONE_USER=true
Loading cached images from config file.
Everything looks great. Please enjoy minikube!
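Once minikube reports success, a quick way to confirm the single-node cluster is usable (assuming the kubeconfig was moved as described in the warning above):
minikube status
kubectl get nodes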