Pods not accessible (timeout) on 3 Node cluster created in AWS ec2 from master - kubernetes

I have 3 node cluster in AWS ec2 (Centos 8 ami).
When I try to access pods scheduled on worker node from master:
kubectl exec -it kube-flannel-ds-amd64-lfzpd -n kube-system /bin/bash
Error from server: error dialing backend: dial tcp 10.41.12.53:10250: i/o timeout
kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-54ff9cd656-8mpbx 1/1 Running 2 7d21h 10.244.0.7 master <none> <none>
kube-system coredns-54ff9cd656-xcxvs 1/1 Running 2 7d21h 10.244.0.6 master <none> <none>
kube-system etcd-master 1/1 Running 2 7d21h 10.41.14.198 master <none> <none>
kube-system kube-apiserver-master 1/1 Running 2 7d21h 10.41.14.198 master <none> <none>
kube-system kube-controller-manager-master 1/1 Running 2 7d21h 10.41.14.198 master <none> <none>
kube-system kube-flannel-ds-amd64-8zgpw 1/1 Running 2 7d21h 10.41.14.198 master <none> <none>
kube-system kube-flannel-ds-amd64-lfzpd 1/1 Running 2 7d21h 10.41.12.53 worker1 <none> <none>
kube-system kube-flannel-ds-amd64-nhw5j 1/1 Running 2 7d21h 10.41.15.9 worker3 <none> <none>
kube-system kube-flannel-ds-amd64-s6nms 1/1 Running 2 7d21h 10.41.15.188 worker2 <none> <none>
kube-system kube-proxy-47s8k 1/1 Running 2 7d21h 10.41.15.9 worker3 <none> <none>
kube-system kube-proxy-6lbvq 1/1 Running 2 7d21h 10.41.15.188 worker2 <none> <none>
kube-system kube-proxy-vhmfp 1/1 Running 2 7d21h 10.41.14.198 master <none> <none>
kube-system kube-proxy-xwsnk 1/1 Running 2 7d21h 10.41.12.53 worker1 <none> <none>
kube-system kube-scheduler-master 1/1 Running 2 7d21h 10.41.14.198 master <none> <none>
kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 7d21h v1.13.10
worker1 Ready <none> 7d21h v1.13.10
worker2 Ready <none> 7d21h v1.13.10
worker3 Ready <none> 7d21h v1.13.10
I tried below steps in all nodes, but no luck so far:
iptables -w -P FORWARD ACCEPT on all nodes
Turn on Masquerade
Turn on port 10250/tcp
Turn on port 8472/udp
Start kubelet
Any pointer would be helpful.

Flannel does not support NFT, and since you are using CentOS 8, you can't fallback to iptables.
Your best bet in this situation would be to switch to Calico.
You have to update Calico DaemonSet with:
....
Environment:
FELIX_IPTABLESBACKEND: NFT
....
or use version 3.12 or newer, as it adds
Autodetection of iptables backend
Previous versions of Calico required you to specify the host’s iptables backend (one of NFT or Legacy). With this release, Calico can now autodetect the iptables variant on the host by setting the Felix configuration parameter IptablesBackend to Auto. This is useful in scenarios where you don’t know what the iptables backend might be such as in mixed deployments. For more information, see the documentation for iptables dataplane configuration
Or switch to Ubuntu 20.04. Ubuntu doesn't use nftables yet.

Issue was because of inbound port in SG.I added these ports in SG I am able to get pass that issue.
2222
24007
24008
49152-49251
My original installer script does not need to follow above steps while running on VMs and standalone machine.
As SG is specific to EC2, so port should be allowed in inbound.
Point to note here is all my nodes(master and worker) are on same SG. even then port hast to be opened in inbound rule, that's the way SG works.

Related

K8s pods running in diffrent node can't communicate with each other

I have k8s cluster with two node, master and worker node, installed Calico.
I initialized cluster and installed calico with following commands
# Initialize cluster
kubeadm init --apiserver-advertise-address=<MatserNodePublicIP> --pod-network-cidr=192.168.0.0/16
# Install Calico. Refer to official document
# https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-with-kubernetes-api-datastore-50-nodes-or-less
curl https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml
After that, I found pods running in different node can't communicate with each other, but pods running in same node can communicate with each other.
Here are my operations:
# With following command, I ran a nginx pod scheduled to worker node
# and assigned pod id 192.168.199.72
kubectl create nginx --image=nginx
# With following command, I ran a busybox pod scheduled to master node
# and assigned pod id 192.168.119.197
kubectl run -it --rm --restart=Never busybox --image=gcr.io/google-containers/busybox sh
# In busybox bash, I executed following command
# '>' represents command output
wget 192.168.199.72
> Connecting to 192.168.199.72 (192.168.199.72:80)
> wget: can't connect to remote host (192.168.199.72): Connection timed out
However, if nginx pod run in master node (same as busybox), the wget would output a correct welcome html.
(For scheduling nginx pod to master node, I cordon worker node, and restarted nginx pod)
I also tried to schedule nginx and busybox pod to worker node, the wget ouput is a correct welcome html.
Here are my cluster status, everything looks fine. I searched all I can find but couldn't find solution.
matser and worker node can ping each other with private IP.
For firewall
systemctl status firewalld
> Unit firewalld.service could not be found.
For node infomation
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
pro-con-scrapydmanager Ready control-plane,master 26h v1.21.2 10.120.0.5 <none> CentOS Linux 7 (Core) 3.10.0-957.27.2.el7.x86_64 docker://20.10.5
pro-con-scraypd-01 Ready,SchedulingDisabled <none>
For pod infomation
kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default busybox 0/1 Error 0 24h 192.168.199.72 pro-con-scrapydmanager <none> <none>
default nginx 1/1 Running 1 26h 192.168.119.197 pro-con-scraypd-01 <none> <none>
kube-system calico-kube-controllers-78d6f96c7b-msrdr 1/1 Running 1 26h 192.168.199.77 pro-con-scrapydmanager <none> <none>
kube-system calico-node-gjhwh 1/1 Running 1 26h 10.120.0.2 pro-con-scraypd-01 <none> <none>
kube-system calico-node-x8d7g 1/1 Running 1 26h 10.120.0.5 pro-con-scrapydmanager <none> <none>
kube-system coredns-558bd4d5db-mllm5 1/1 Running 1 26h 192.168.199.78 pro-con-scrapydmanager <none> <none>
kube-system coredns-558bd4d5db-whfnn 1/1 Running 1 26h 192.168.199.75 pro-con-scrapydmanager <none> <none>
kube-system etcd-pro-con-scrapydmanager 1/1 Running 1 26h 10.120.0.5 pro-con-scrapydmanager <none> <none>
kube-system kube-apiserver-pro-con-scrapydmanager 1/1 Running 1 26h 10.120.0.5 pro-con-scrapydmanager <none> <none>
kube-system kube-controller-manager-pro-con-scrapydmanager 1/1 Running 2 26h 10.120.0.5 pro-con-scrapydmanager <none> <none>
kube-system kube-proxy-84cxb 1/1 Running 2 26h 10.120.0.2 pro-con-scraypd-01 <none> <none>
kube-system kube-proxy-nj2tq 1/1 Running 2 26h 10.120.0.5 pro-con-scrapydmanager <none> <none>
kube-system kube-scheduler-pro-con-scrapydmanager 1/1 Running 1 26h 10.120.0.5 pro-con-scrapydmanager <none> <none>
lens-metrics kube-state-metrics-78596b555-zxdst 1/1 Running 1 26h 192.168.199.76 pro-con-scrapydmanager <none> <none>
lens-metrics node-exporter-ggwtc 1/1 Running 1 26h 192.168.199.73 pro-con-scrapydmanager <none> <none>
lens-metrics node-exporter-sbz6t 1/1 Running 1 26h 192.168.119.196 pro-con-scraypd-01 <none> <none>
lens-metrics prometheus-0 1/1 Running 1 26h 192.168.199.74 pro-con-scrapydmanager <none> <none>
For services
kubectl get services -o wide --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26h <none>
default nginx ClusterIP 10.99.117.158 <none> 80/TCP 24h run=nginx
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 26h k8s-app=kube-dns
lens-metrics kube-state-metrics ClusterIP 10.104.32.63 <none> 8080/TCP 26h name=kube-state-metrics
lens-metrics node-exporter ClusterIP None <none> 80/TCP 26h name=node-exporter,phase=prod
lens-metrics prometheus ClusterIP 10.111.86.164 <none> 80/TCP 26h name=prometheus
Ok. It's fault of firewall. I opened all of the following ports on my master node and recreated my cluster, then everything got fine and cni0 interface appeared. Although I still don't know why.
During the proccessing of tring, I find cni0 interface is important. If there is no cni0, I could not ping pod running in diffrent node.
(Refer: https://docs.projectcalico.org/getting-started/bare-metal/requirements)
Configuration Host(s) Connection type Port/protocol
Calico networking (BGP) All Bidirectional TCP 179
Calico networking with IP-in-IP enabled (default) All Bidirectional IP-in-IP, often represented by its protocol number 4
Calico networking with VXLAN enabled All Bidirectional UDP 4789
Calico networking with Typha enabled Typha agent hosts Incoming TCP 5473 (default)
flannel networking (VXLAN) All Bidirectional UDP 4789
All kube-apiserver host Incoming Often TCP 443 or 6443*
etcd datastore etcd hosts Incoming Officially TCP 2379 but can vary

Is it possible to show the k8s network topology using kubectl?

I installed the minikube in my CentOS 7.7 Server.
there are several pods in it:
[dele#att root]$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-f9fd979d6-4p6xg 1/1 Running 1 23h 172.18.0.2 minikube <none> <none>
kube-system etcd-minikube 1/1 Running 0 22h 172.17.0.2 minikube <none> <none>
kube-system kube-apiserver-minikube 1/1 Running 0 22h 172.17.0.2 minikube <none> <none>
kube-system kube-controller-manager-minikube 1/1 Running 1 23h 172.17.0.2 minikube <none> <none>
kube-system kube-proxy-4k468 1/1 Running 1 23h 172.17.0.2 minikube <none> <none>
kube-system kube-scheduler-minikube 1/1 Running 1 23h 172.17.0.2 minikube <none> <none>
kube-system storage-provisioner 1/1 Running 2 23h 172.17.0.2 minikube <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-c95fcf479-k7zpn 1/1 Running 1 23h 172.18.0.3 minikube <none> <none>
kubernetes-dashboard kubernetes-dashboard-5c448bc4bf-f9swt 1/1 Running 1 23h 172.18.0.4 minikube <none> <none>
but I can not see a clear network topology diagram, is it possible to show the network topology using the kubectl?
This is not possible out of the box with kubernetes (and kubectl) as far as I know.
With additional software in your cluster I know about three possiblities with visualization:
Istio has the possibility to visualize the communication within the mesh with kiali (For reference: https://istio.io/latest/docs/tasks/observability/kiali/)
The second option is spekt8
Weavescope comes with agents that gather data and visualizes them
Despite these options others could exist and I would really like to see more options because not everyone wants to add Istio and accept the performance impact just to visualize the pod/network landscape.
And as far as I understand spekt8 it's more about the visualization of relations between Kubernetes resources than about network topology visualization.
Weavescope needs cluster administration rights therefore it isn't advisable to make it public accessible without setting up some form of authentication.

kubernetes dashboard CrashLoopBackOff

Brand new to kubernetes, but managed to install kubernetes, ubuntu 20.04 LTS, but having issues with the dashboard. followed the procedure, using flannel as CNF.
The log states issues with connection to 10.96.0.1:443, but telnet seems to work? Any suggestion how to getting further ?
bwa#prod3:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-66bff467f8-jgmpl 0/1 Running 1 27h 10.244.0.6 prod3 <none> <none>
kube-system coredns-66bff467f8-ldr9d 0/1 Running 1 27h 10.244.0.9 prod3 <none> <none>
kube-system etcd-prod3 1/1 Running 1 27h 192.168.0.93 prod3 <none> <none>
kube-system kube-apiserver-prod3 1/1 Running 1 27h 192.168.0.93 prod3 <none> <none>
kube-system kube-controller-manager-prod3 1/1 Running 1 27h 192.168.0.93 prod3 <none> <none>
kube-system kube-flannel-ds-amd64-xm26h 1/1 Running 2 27h 192.168.0.93 prod3 <none> <none>
kube-system kube-proxy-7lk5d 1/1 Running 1 27h 192.168.0.93 prod3 <none> <none>
kube-system kube-scheduler-prod3 1/1 Running 1 27h 192.168.0.93 prod3 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-6b4884c9d5-xrdbh 1/1 Running 1 27h 10.244.0.7 prod3 <none> <none>
kubernetes-dashboard kubernetes-dashboard-7f99b75bf4-lfqtf 0/1 CrashLoopBackOff 310 27h 10.244.0.8 prod3 <none> <none>
bwa#prod3:~$ kubectl logs kubernetes-dashboard-7f99b75bf4-lfqtf --namespace=kubernetes-dashboard --tail=100
2020/08/05 12:02:31 Starting overwatch
2020/08/05 12:02:31 Using namespace: kubernetes-dashboard
2020/08/05 12:02:31 Using in-cluster config to connect to apiserver
2020/08/05 12:02:31 Using secret token for csrf signing
2020/08/05 12:02:31 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://10.96.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0xc00000c640)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x446
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0xc00044f800)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:501 +0xc6
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0xc00044f800)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:469 +0x47
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:550
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:105 +0x20d
bwa#prod3:~$ telnet 10.96.0.1 443
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.
^CConnection closed by foreign host.
bwa#prod3:~$
By the looks of that cluster, you do not have a networking plugin (CNI) installed. I do not see any flannel pods in the kube-system namespace, and the coredns pods are not starting.
This would also explain why the dashboard panics, as it is unable to reach the K8s API server via the 10.96.0.1 service.
Can you check the flannel installation (or just reinstall flannel on the cluster)?

Virtualbox kubernetes NodePort access

Morning,
I have a simple nginx setup that is using NodePort to access on an alternate port 30000. I cannot seem to figure out how to actually access it on my workstation that has the virtualbox installed.
Some basic stuff:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP
25h
nginx-55bc597fbf-zb2ml ClusterIP 10.101.124.73 <none> 8080/TCP
24h
nginx-service-np NodePort 10.105.157.230 <none>
8082:30000/TCP 22h
user-login-service NodePort 10.106.129.60 <none>
5000:31395/TCP 38m
I am using flannel
kubectl cluster-info
Kubernetes master is running at https://192.168.56.101:6443
KubeDNS is running at https://192.168.56.101:6443/api/v1/namespaces/kube-
system/services/kube-dns:dns/proxy
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 25h v1.15.1
k8s-worker1 Ready <none> 94m v1.15.1
k8s-worker2 Ready <none> 98m v1.15.1
I did port forwarding for NAT where it is supposed to forward 30000 to 80 and also did 31395 to 31396 for the user-login-service
Trying to access using master ip https://192.168.56.101:80 or https://192.168.56.101:31396 fails. I did try http as well, but cluster-info seems to show master using https and kubernetes is using 443/tcp.
There are two adapters for master and the workers. One adapter is NAT and used to allow flow of traffic outbound (e.g., for use with apt-get commands)
This seems to use 10.0.3.15 address assigned to all three nodes
The other adapter is host-ip and is what is giving the servers addresses in the 192.168.56.0 network. I did set those as static using netplan.
The three servers can see each other fine. I can do external traffic fine.
/etc/netplan# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-4xg8h 1/1 Running 17
120m
coredns-5c98db65d4-xn797 1/1 Running 17
120m
etcd-k8s-master 1/1 Running 8 25h
kube-apiserver-k8s-master 1/1 Running 8 25h
kube-controller-manager-dashap-k8s-master 1/1 Running 12 25h
kube-flannel-ds-amd64-6fw7x 1/1 Running 0 25h
kube-flannel-ds-amd64-hd4ng 1/1 Running 0
122m
kube-flannel-ds-amd64-z2wls 1/1 Running 0
126m
kube-proxy-g8k5l 1/1 Running 0 25h
kube-proxy-khn67 1/1 Running 0
126m
kube-proxy-zsvqs 1/1 Running 0
122m
kube-scheduler-k8s-master 1/1 Running 10 25h
weave-net-2l5cs 2/2 Running 0 44m
weave-net-n4zmr 2/2 Running 0 44m
weave-net-v6t74 2/2 Running 0 44m
This is my first setup, so it is hard to troubleshoot for me. Any help on how to reach the the two services using my browser on my workstation and not within the nodes would be appreciated.

Kubernetes Dashborad is not opening

My Master node ip address is 192.168.56.101. there is no node connected to master yet.
master#kmaster:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kmaster Ready master 125m v1.15.1
master#kmaster:~$
When i deployed my kubernetes-dashborad using below command, why running IP Address of kubernetes-dashboard-5c8f9556c4-f2jpz is 192.168.189.6
Similarly the other pods has also different IP address.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta1/aio/deploy/recommended.yaml
master#kmaster:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-7bd78b474d-r2bwg 1/1 Running 0 113m 192.168.189.2 kmaster <none> <none>
kube-system calico-node-dsgqt 1/1 Running 0 113m 192.168.56.101 kmaster <none> <none>
kube-system coredns-5c98db65d4-n2wml 1/1 Running 0 114m 192.168.189.3 kmaster <none> <none>
kube-system coredns-5c98db65d4-v5qc8 1/1 Running 0 114m 192.168.189.1 kmaster <none> <none>
kube-system etcd-kmaster 1/1 Running 0 114m 192.168.56.101 kmaster <none> <none>
kube-system kube-apiserver-kmaster 1/1 Running 0 114m 192.168.56.101 kmaster <none> <none>
kube-system kube-controller-manager-kmaster 1/1 Running 0 114m 192.168.56.101 kmaster <none> <none>
kube-system kube-proxy-bgtmr 1/1 Running 0 114m 192.168.56.101 kmaster <none> <none>
kube-system kube-scheduler-kmaster 1/1 Running 0 114m 192.168.56.101 kmaster <none> <none>
kubernetes-dashboard kubernetes-dashboard-5c8f9556c4-f2jpz 1/1 Running 0 107m 192.168.189.6 kmaster <none> <none>
kubernetes-dashboard kubernetes-metrics-scraper-86456cdd8f-w45w2 1/1 Running 0 107m 192.168.189.4 kmaster <none> <none>
master#kmaster:~$
And also not able to access the kubernetes-dashboard UI. i am using the link
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/.
and the link KubeDNS https://192.168.56.101:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy is also not working.
but when trying to access Kubernetes master at https://192.168.56.101:6443 is working.
master#kmaster:~$ kubectl cluster-info
Kubernetes master is running at https://192.168.56.101:6443
KubeDNS is running at https://192.168.56.101:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Any suggestions.
Solution (see comments): Don't mix your physical and overlay network ranges.
Accessing the KubeDNS is only possible with DNS as protocol, not HTTP. If you want to query the DNS service you need to kubectl port-forward, not the HTTP (API) proxy.
If you try to access the dashboard with localhost:8081, you have to run kubectl proxy --port 8081 from your console to setup the proxy between you localhost to the k8s apiserver.
If you want to access dashboard from apiserver directly without the local proxy, try the following url https://192.168.56.101:6443/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy (assuming your service name is kubernetes-dashboard)
You can also run kubectl port-forward svc/kubernetes-dashboard -n kubernetes-dashboard 443, then access the dashboard with https://localhost:443