I am trying to setup my very first Kubernetes cluster and it seems to have setup fine until nginx-ingress controller.
Here is my cluster information:
Nodes: three RHEL7 and one RHEL8 nodes
Master is running on RHEL7
Kubernetes server version: 1.19.1
Networking used: flannel
coredns is running fine.
selinux and firewall are disabled on all nodes
Here are my all pods running in kube-system
I then followed instructions on following page to install nginx ingress controller: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
Instead of deployment, I decided to use daemon-set since I am going to have only few nodes running in my kubernetes cluster.
After following the instructions, pod on my RHEL8 is constantly failing with the following error:
Readiness probe failed: Get "http://10.244.3.2:8081/nginx-ready": dial
tcp 10.244.3.2:8081: connect: connection refused Back-off restarting
failed container
Here is the screenshot shows that RHEL7 pods are working just fine and RHEL8 is failing:
All nodes are setup exactly the same way and there is no difference.
I am very new to Kubernetes and don't know much internals of it. Can someone please point me on how can I debug and fix this issue? I am really willing to learn from issues like this.
This is how I provisioned RHEL7 and RHEL8 nodes
Installed docker version: 19.03.12, build 48a66213fe
Disabled firewalld
Disabled swap
Disabled SELinux
To enable iptables to see bridged traffic, set net.bridge.bridge-nf-call-ip6tables = 1 and net.bridge.bridge-nf-call-iptables = 1
Added hosts entry for all the nodes involved in Kubernetes cluster so that they can find each other without hitting DNS
Added IP address of all nodes in Kubernetes cluster on /etc/environment for no_proxy so that it doesn't hit corporate proxy
Verified docker driver to be "systemd" and NOT "cgroupfs"
Reboot server
Install kubectl, kubeadm, kubelet as per kubernetes guide here at: https://kubernetes.io/docs/tasks/tools/install-kubectl/
Start and enable kubelet service
Initialize master by executing the following:
kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
Apply node-selector patch for mixed OS scheduling
wget https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/flannel/l2bridge/manifests/node-selector-patch.yml
kubectl patch ds/kube-proxy --patch "$(cat node-selector-patch.yml)" -n=kube-system
Apply flannel CNI
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Modify net-conf.json section of kube-flannel.yml for a type "host-gw"
kubectl apply -f kube-flannel.yml
Apply node selector patch
kubectl patch ds/kube-flannel-ds-amd64 --patch "$(cat node-selector-patch.yml)" -n=kube-system
Thanks
According to kubernetes documentation the list of supported host operating systems is as follows:
Ubuntu 16.04+
Debian 9+
CentOS 7
Red Hat Enterprise Linux (RHEL) 7
Fedora 25+
HypriotOS v1.0.1+
Flatcar Container Linux (tested with 2512.3.0)
This article mentioned that there are network issues on RHEL 8:
(2020/02/11 Update: After installation, I keep facing pod network issue which is like deployed pod is unable to reach external network
or pods deployed in different workers are unable to ping each other
even I can see all nodes (master, worker1 and worker2) are ready via
kubectl get nodes. After checking through the Kubernetes.io official website, I observed the nfstables backend is not compatible with the
current kubeadm packages. Please refer the following link in “Ensure
iptables tooling does not use the nfstables backend”.
The simplest solution here is to reinstall the node on supported operating system.
Related
I have a single node kubernetes setup on Ubuntu 20.04. Am using microk8s and longhorn storage for my single node cluster. I install packages using Helm via Lens IDE. I have configured everything as per the respective guides but anytime I install a package that requires persistence eg Mariadb or Wordpress, the following happens:
pv and pvc get created and Bound successfully
pod does not successfully create and throws the error below
MountVolume.SetUp failed for volume "pvc-fdada93c-c4af-4916-942f-abf9897feaf9" : applyFSGroup failed for vol pvc-fdada93c-c4af-4916-942f-abf9897feaf9: lstat /var/snap/microk8s/common/var/lib/kubelet/pods/f69173e1-cd98-4f86-9e52-edf62fa723da/volumes/kubernetes.io~csi/pvc-fdada93c-c4af-4916-942f-abf9897feaf9/mount: no such file or directory
when I manually create a directory using the command below, the pod will successfully start
mkdir -p /var/snap/microk8s/common/var/lib/kubelet/pods/f69173e1-cd98-4f86-9e52-edf62fa723da/volumes/kubernetes.io~csi/pvc-fdada93c-c4af-4916-942f-abf9897feaf9/mount
the issue will then repeat if I do server reboot
Question: How can I get the pods to automatically mount when I install a package from Helm. I have seen this happen on similar single node clusters using the same software.
NOTE: nfs-common and open-iscsi are both running
I was able to figure out the issue.
The issue was actually not due to Longhorn itself. It was due to CoreDNS.
Due to firewall restrictions, CoreDNS could not resolve internal kubernetes DNS, especially longhorn-backend
Provided the UI and Driver could not reach longhorn-backend, they could never start. Fixing CoreDNS issues fixed caused the longhorn services to work well and my PVCs and PVs also worked as expected.
Steps to resolve were as follow
Check the coredns pod for errors
kubectl logs coredns-7f9c69c78c-7dsjg -n kube-system
Any output other than simply the coredns version means you need to resolve the errors shown
For me it was done by disabling firewalls and adding 8.8.8.8 in my Node's /etc/resolv.conf file
Once resolved, you can ether wait a minute for coredns to resolve internal DNS or restart it with the command below
kubectl rollout restart deployment/coredns -n kube-system
Everything worked well after that!
I have make my deployment work with istio ingressgateway before. I am not aware of any changes made in istio or k8s side.
When I tried to deploy, I see an error in replicaset side that's why it cannot create new pod.
Error creating: Internal error occurred: failed calling webhook
"namespace.sidecar-injector.istio.io": Post
"https://istiod.istio-system.svc:443/inject?timeout=10s": dial tcp
10.104.136.116:443: connect: no route to host
When I try to go inside api-server and ping 10.104.136.116 (istiod service IP) it just hangs.
What I have tried so far:
Deleted all coredns pods
Deleted all istiod pods
Deleted all weave pods
Reinstalling istio via istioctl x uninstall --purge
turning all of VMs firewall
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -F
restarted all of the nodes
manual istio pod injection
Setup
k8s version: 1.21.2
istio: 1.10.3
HA setup
CNI: weave
CRI: containerd
In my case this was related to firewall. More info can be found here.
The gist of it is that on GKE at least you need to open another port 15017 in addition to 10250 and 443. This is to allow communication from your master node(s) to you VPC.
I don't have a definite answer unto why is this happening. But kube-apiserver cannot access istiod via service IP, wherein it can connect when I used the istiod pod IP.
Since I don't have the control over the VM and lower networking layer and not sure if they have changed something (because it is working before).
I made this work by changing my CNI from weave to flannel
In my case it was due to firewall. Following this Istio debug guide, I identified that the kubectl get --raw /api/v1/namespaces/istio-system/services/https:istiod:https-webhook/proxy/inject -v4 command was timing out while all other cluster internal calls were ok.
The best way to diagnose this is to open temporarly your AWS Security Groups involved to 0.0.0.0/0 for port 15017 and then try again.
If the errror won't show again, you know there's need to fix this part.
I am using EKS with Amazon VPC CNI v1.12.2-eksbuild.1
I install a 3 node Kubernetes Cluster with RKE tool. The installation was successful with no errors but I'm unable to ping from one pod to another pod.
If I ping a pod running on worker2 node(NODE-IP-10.222.22.47) I get a response, but no responses from pods running on worker1(NODE-IP-10.222.22.46).
My Pods are as follows -
Also I noticed for some pods it has given node-ip addresses. The node IP addresses are
Master1=10.222.22.45
Worker1=10.222.22.46
Worker2=10.222.22.47
cluster_cidr: 10.42.0.0/16
service_cluster_ip_range: 10.43.0.0/16
cluster_dns_server: 10.43.0.10
Overlay network - canal
OS- CentOS Linux release 7.8.2003
Kubernetes - v1.20.8 installed with rke tool
Docker - 20.10.7
Sysctl entries in all nodes
firewall was disabled in all nodes before install.
Check - sysctl net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-ip6tables = 1
Check - sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
From description and screenshots provided, some addons and
controllers of master01 do not seem ready, which might need to be
reconfigured.
You can also use routes to connect to pods on worker1. Check this for
detailed instructions.
The reason was the UDP ports were blocked
I am setting up my very first Kubernetes cluster. We are expecting to have mix of Windows and Linux node so I picked flannel as my cni. I am using RHEL 7.7 as my master node and I have two other RHEL 7.7 machines as worker node and then rest are Windows Server 2019. For most of the part, I was following documentation provided on Microsoft site: https://learn.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/getting-started-kubernetes-windows and also one on Kubernetes site: https://kubernetes.cn/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/ . I know article on Microsoft site is more than 2 years old but this is only the guide I found for mixed mode operations.
I have done following so far on Master and worker RHEL nodes:
stopped and disabled firewalld
disabled selinux
update && upgrade
Disabled swap partition
Added /etc/hosts entry for all nodes involved in my Kubernetes cluster
Installed Docker CE 19.03.11
Install kubectl, kubeadm and kubelet 1.18.3 (Build date 2020-05-20)
Prepare Kubernetes control plane for Flannel: sudo sysctl net.bridge.bridge-nf-call-iptables=1
I have now done following on RHEL Master node
Initialize cluster
kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
kubectl as non-root user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Patch the Daemon set for the node selector
wget https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/flannel/l2bridge/manifests/node-selector-patch.yml
kubectl patch ds/kube-proxy --patch "$(cat node-selector-patch.yml)" -n=kube-system
After the patch, kube-proxy looks like this:
Add Flannel
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Modify the net-conf.json section of the flannel manifest in order to set the VNI to 4096 and the Port to 4789. It should look as follows:
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan",
"VNI" : 4096,
"Port": 4789
}
}
Apply modified kube-flannel
kubectl apply -f kube-flannel.yml
After adding network, here is what I get for pods in kube-system
Add Windows Flannel and kube-proxy DaemonSets
curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/VERSION/v1.18.0/g' | kubectl apply -f -
kubectl apply -f https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/flannel-overlay.yml
Join Worker node
I am now trying to join the RHEL 7.7 worker node by executing the kubeadm join command generated when IU initialized my cluster.
Worker node initializes fine as seen below:
when I go to my RHEL worker node, I see that k8s_install-cni_kube-flannel-ds-amd64-f4mtp_kube-system container is exited as seen below:
Can you please let me know if I am following the correct procedure? I believe Flannel CNI is required to talk to pods within kubernetes cluster
If Flannel is difficult to setup for mixed mode, can we use other network which can work?
If we decide to go only and only RHEL nodes, what is the best and easiest network plugin I can install without going through lot of issues?
Thanks and I appreciate it.
There are a lot of materials about Kubernetes on official site and I encourage you to check it out:
Kubernetes.io
I divided this answer on parts:
CNI
Troubleshooting
CNI
What is CNI?
CNI (Container Network Interface), a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. CNI concerns itself only with network connectivity of containers and removing allocated resources when the container is deleted. Because of this focus, CNI has a wide range of support and the specification is simple to implement.
-- Github.com: Containernetworking: CNI
Your CNI plugin in simple terms is responsible for pod's networking inside your cluster.
There are multiple CNI plugins like:
Flannel
Calico
Multus
Weavenet
What I mean about that, you don't need to use Flannel. You can use other plugin like Calico. The major consideration is that they are different from each other and you should pick option best for your use case (support for some feature for example).
There are a lot of materials/resources on this topic. Please take a look at some of them:
Youtube.com: Kubernetes and the CNI: Where We Are and What's Next - Casey Callendrello, CoreOS
Youtube.com: Container Network Interface (CNI) Explained in 7 Minutes
Kubernetes.io: Docs: Concepts: Cluster administration: Networking
As for:
If Flannel is difficult to setup for mixed mode, can we use other network which can work?
If you mean mixed mode by using nodes that are Windows and Linux machines, I would stick to guides that are already written like one you mentioned: Kubernetes.io: Adding Windows nodes
As for:
If we decide to go only and only RHEL nodes, what is the best and easiest network plugin I can install without going through lot of issues?
The best way to choose CNI plugin would entail looking for solution fitting your needs the most. You can follow this link for an overview:
Kubernetes.io: Docs: Concepts: Cluster administration: Networking
Also you can look here (Please have in mind that this article is from 2018 and could be outdated):
Itnext.io: Benchmark results of Kubernetes network plugin cni over 10gbit's network
Troubleshooting
when I go to my RHEL worker node, I see that k8s_install-cni_kube-flannel-ds-amd64-f4mtp_kube-system container is exited as seen below:
Your k8s_install-cni_kube-flannel-ds-amd64-f4mtp_kube-system container exited with status 0 which should indicate correct provisioning.
You can check the logs of flannel pods by invoking below command:
kubectl logs POD_NAME
You can also refer to official documentation of Flannel: Github.com: Flannel: Troubleshooting
As I said in the comment:
To check if your CNI is working you can spawn 2 pods on 2 different nodes and try make a connection between them (like ping them).
Steps:
Spawn pods
Check their IP addresses
Exec into pods
Ping
Spawn pods
Below is example deployment definition that will spawn ubuntu pods. They will be used to check if pods have communication between nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ubuntu
spec:
selector:
matchLabels:
app: ubuntu
replicas: 5
template:
metadata:
labels:
app: ubuntu
spec:
containers:
- name: ubuntu
image: ubuntu:latest
command:
- sleep
- infinity
Please have in mind that this example is for testing purposes only. Apply above definition with:
kubectl apply -f FILE_NAME.yaml
Check their IP addresses
After pods spawned you should be able to run command:
$ kubectl get pods -o wide
and see output similar to this:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ubuntu-557dc88445-lngt7 1/1 Running 0 8s 10.20.0.4 NODE-1 <none> <none>
ubuntu-557dc88445-nhvbw 1/1 Running 0 8s 10.20.0.5 NODE-1 <none> <none>
ubuntu-557dc88445-p8v86 1/1 Running 0 8s 10.20.2.4 NODE-2 <none> <none>
ubuntu-557dc88445-vm2kg 1/1 Running 0 8s 10.20.1.9 NODE-3 <none> <none>
ubuntu-557dc88445-xwt86 1/1 Running 0 8s 10.20.0.3 NODE-1 <none> <none>
You can see from above output:
what IP address each pod has
what node is assigned to each pod.
By above example we will try to make a connection between:
ubuntu-557dc88445-lngt7 (first one) with ip address of 10.20.0.4 on the NODE-1
ubuntu-557dc88445-p8v86 (third one) with ip address of 10.20.2.4 on the NODE-2
Exec into pods
You can exec into the pod to run commands:
$ kubectl exec -it ubuntu-557dc88445-lngt7 -- /bin/bash
Please take a look on official documentation here: Kubernetes.io: Get shell running container
Ping
Ping was not built into the ubuntu image but you can install it with:
$ apt update && apt install iputils-ping
After that you can ping the 2nd pod and check if you can connect to another pod:
root#ubuntu-557dc88445-lngt7:/# ping 10.20.2.4 -c 4
PING 10.20.2.4 (10.20.2.4) 56(84) bytes of data.
64 bytes from 10.20.2.4: icmp_seq=1 ttl=62 time=0.168 ms
64 bytes from 10.20.2.4: icmp_seq=2 ttl=62 time=0.169 ms
64 bytes from 10.20.2.4: icmp_seq=3 ttl=62 time=0.174 ms
64 bytes from 10.20.2.4: icmp_seq=4 ttl=62 time=0.206 ms
--- 10.20.2.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3104ms
rtt min/avg/max/mdev = 0.168/0.179/0.206/0.015 ms
I'm trying to deploy Kubernetes with Calico (IPIP) with Kubeadm. After deployment is done I'm deploying Calico using these manifests
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
Before applying it, I'm editing CALICO_IPV4POOL_CIDR and setting it to 10.250.0.0/17 as well as using command kubeadm init --pod-cidr 10.250.0.0/17.
After few seconds CoreDNS pods (for example getting addr 10.250.2.2) starts restarting with error 10.250.2.2:8080 connection refused.
Now a bit of digging:
from any node in cluster ping 10.250.2.2 works and it reaches pod (tcpdump in pod net namespace shows it).
from different pod (on different node) curl 10.250.2.2:8080 works well
from any node to curl 10.250.2.2:8080 fails with connection refused
Because it's coredns pod it listens on 53 both udp and tcp, so I've tried netcat from nodes
nc 10.250.2.2 53 - connection refused
nc -u 10.250.2.2 55 - works
Now I've tcpdump each interface on source node for port 8080 and curl to CoreDNS pod doesn't even seem to leave node... sooo iptables?
I've also tried weave, canal and flannel, all seem to have same issue.
I've ran out of ideas by now...any pointers please?
Seems to be a problem with Calico implementation, CoreDNS Pods are sensitive on the CNI network Pods successful functioning.
For proper CNI network plugin implementation you have to include --pod-network-cidr flag to kubeadm init command and afterwards apply the same value to CALICO_IPV4POOL_CIDR parameter inside calico.yml.
Moreover, for a successful Pod network installation you have to apply some RBAC rules in order to make sufficient permissions in compliance with general cluster security restrictions, as described in official Kubernetes documentation:
For Calico to work correctly, you need to pass
--pod-network-cidr=192.168.0.0/16 to kubeadm init or update the calico.yml file to match your Pod network. Note that Calico works on
amd64 only.
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
In your case I would switched to the latest Calico versions at least from v3.3 as given in the example.
If you've noticed that you run Pod network plugin installation properly, please take a chance and update the question with your current environment setup and Kubernetes components versions with a health statuses.