Kubespray: Netchecker connectivity check fails

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances using Kubespray. Those instances are CentOS 7.6.1811 qcow2 images imported in Glance.
The install was successful, and I can see my nodes and pods with kubectl commands.
I used the deploy_netchecker option to deploy NetChecker and test the network within my cluster, and set network_plugin="flannel".
I also tried kube_proxy_mode="iptables", but it doesn't seem to affect the result.
That's pretty much all I changed in the k8s-cluster.yml file.
All the pods are running, and so are the services:
[centos@cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 46h
default netchecker-service NodePort 10.233.13.213 <none> 8081:31081/TCP 46h
kube-system coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 46h
kube-system dashboard-metrics-scraper ClusterIP 10.233.59.12 <none> 8000/TCP 46h
kube-system kubernetes-dashboard ClusterIP 10.233.63.20 <none> 443/TCP 46h
But the netchecker API gives the following answer:
[root@localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
For an unknown reason, I cannot access the API from a cluster node with localhost, so I used a floating IP with OpenStack.
Here are some logs from the agent :
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246 1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428 1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}
Logs from the server do not indicate any error.
I tried to check DNS resolution with the following:
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
nslookup: can't resolve 'kubernetes.default'
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
169.254.25.10 is the IP of nodelocaldns, but it doesn't seem to forward queries to the deployed coredns service.
When I run nslookup netchecker-service.default.svc.cluster.local 10.233.0.3 (the coredns IP), I get a correct answer.
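To narrow down where the lookup breaks, the same query can be pointed at each resolver in turn (a small sketch based on the IPs above, run from inside one of the agent pods):
/ $ nslookup netchecker-service.default.svc.cluster.local 169.254.25.10   # via nodelocaldns - fails here
/ $ nslookup netchecker-service.default.svc.cluster.local 10.233.0.3      # via coredns - answers correctly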
What can be wrong with my configuration?
Thanks in advance
UPDATE: The Flannel plugin has a known issue, with a fix to apply on all nodes of the cluster. Once that is done, the pods successfully report back to the netchecker server.
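The update does not spell out which fix was applied. For reference, a frequently cited Flannel problem on VXLAN setups is broken transmit checksum offload on the flannel.1 interface; the per-node workaround in that case looks like the following (an assumption on my part, not necessarily the fix the author used):
# hypothetical workaround for the VXLAN tx-checksum-offload bug; run on every node
sudo ethtool -K flannel.1 tx-checksum-ip-generic off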

Related

kube service domain name not working, but clusterIP does work with Jupyter Enterprise Gateway

I have a Jupyter notebook setup in the jupyter namespace on a kubernetes cluster, and Jupyter Enterprise Gateway setup in the enterprise-gateway namespace as a Service in the same cluster.
If I configure the notebook to connect to the enterprise-gateway service using the clusterIP it works fine.
--gateway-url=http://172.20.186.249:8888
but if I switch to using the service domain name the notebook receives a 503 Connection Refused error
--gateway-url=http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888
When I use busybox to check the kubernetes dns, the domain resolves as expected.
kubectl -n default exec -ti busybox nslookup enterprise-gateway.enterprise-gateway
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server: 172.20.0.10
Address 1: 172.20.0.10 kube-dns.kube-system.svc.cluster.local
Name: enterprise-gateway.enterprise-gateway
Address 1: 172.20.186.249 enterprise-gateway.enterprise-gateway.svc.cluster.local
How do I get the domain name to work?
The Service config for the JEG looks like this...
kubectl describe svc enterprise-gateway --namespace enterprise-gateway
Name: enterprise-gateway
Namespace: enterprise-gateway
Labels: app=enterprise-gateway
app.kubernetes.io/managed-by=Helm
chart=enterprise-gateway-2.6.0
component=enterprise-gateway
heritage=Helm
release=enterprise-gateway
Annotations: meta.helm.sh/release-name: enterprise-gateway
meta.helm.sh/release-namespace: enterprise-gateway
Selector: app=enterprise-gateway
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.250.15
IPs: 172.20.250.15
Port: http 8888/TCP
TargetPort: 8888/TCP
NodePort: http 31366/TCP
Endpoints: 10.1.16.136:8888,10.1.2.228:8888,10.1.30.90:8888
Port: response 8877/TCP
TargetPort: 8877/TCP
NodePort: response 31201/TCP
Endpoints: 10.1.16.136:8877,10.1.2.228:8877,10.1.30.90:8877
Session Affinity: ClientIP
External Traffic Policy: Cluster
Events: <none>
OK, I don't know where to start; I have a bunch of findings. I will start with the eye-catching one. I have a working test project I can share later on, and I can elaborate more in this answer if needed.
Step 1
I see a mismatch in your IPs. The DNS lookup did not resolve the service name to the correct IP:
Address 1: 172.20.186.249 is different from IP: 172.20.250.15
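A quick way to compare the two values side by side (my own sketch, using standard kubectl jsonpath):
$ kubectl get svc enterprise-gateway -n enterprise-gateway -o jsonpath='{.spec.clusterIP}{"\n"}'   # what Kubernetes thinks the ClusterIP is
$ kubectl -n default exec -ti busybox -- nslookup enterprise-gateway.enterprise-gateway            # what the cluster DNS returns
If the two differ, the DNS record is stale or points at a different (perhaps recreated) service.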
To debug DNS:
kubectl exec "YOURPODNAME" cat /etc/resolv.conf
Verify that a search path and a name server are set up correctly
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Check if the kube-dns/coredns pods are running:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
....
kube-dns-86f4d74b45-2qkfd 3/3 Running 232 133d
kube-proxy-b2frq 1/1 Running 0 15m
...
If the pods are running, there might be something wrong with the global DNS service; check it with:
kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP
You might also need to check whether DNS endpoints are exposed:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 172.17.0.5:53,172.17.0.5:53 133d
These debugging actions will usually indicate the problem with your DNS configuration, or it will simply show you that a DNS add-on should be enabled in your cluster configuration.
Step 2
When using busybox to check the kubernetes dns
This seems incorrect: looking at Address 1: 172.20.186.249, I'm expecting to get an IP of the form 10.X.X.X.
1. Install dnsutils in the pod as shown below:
   kubectl exec --stdin --tty "YOURPODNAME" -- sh -c "apt update && apt-get -y install dnsutils"
2. kubectl exec -it "YOURPODNAME" -- /bin/bash
3. Inside the pod, run apt-get install dnsutils again (weird, but it may be needed).
4. Stay inside the pod and run nslookup "YOURSERVICENAME"; you will get an IP and a Name (DNS).
5. Check this IP, since it needs to match the IP in the service description.
6. kubectl describe svc "YOURSERVICENAME" - the IP should be the same as in #4.
What you must see:
Step 3
Once you have Step 2 solved, you will be able to use the service name (FQDN) returned in Step 2 item #4.
To be continued...

minikube dashboard: unable to access it from outside/internet

Here is the output of minikube dashboard:
ubuntu#ip-172-31-5-166:~$ minikube dashboard
* Enabling dashboard ...
* Verifying dashboard health ...
* Launching proxy ...
* Verifying proxy health ...
* Opening http://127.0.0.1:45493/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser...
- http://127.0.0.1:45493/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
I have enabled port 45493 at the Security Group level and also on the Linux VM. However, when I try to access the Kube dashboard, I have no luck:
wget http://13.211.44.210:45493/
--2020-04-16 05:50:52-- http://13.211.44.210:45493/
Connecting to 13.211.44.210:45493... failed: Connection refused.
However, when I do the below, it works and produces an index.html file with status code 200:
wget http://127.0.0.1:45493/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
--2020-04-16 05:52:55-- http://127.0.0.1:45493/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
Connecting to 127.0.0.1:45493... connected.
HTTP request sent, awaiting response... 200 OK
Steps to reproduce at a high level are as below:
1. EC2 Ubuntu instance of size t2.large
2. Install minikube, then minikube start --driver=docker
3. Perform the deployment: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml
4. kubectl get pods -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-84bfdf55ff-xx8pl 1/1 Running 0 26m
kubernetes-dashboard-bc446cc64-7nl68 1/1 Running 0 26m
5. kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.102.85.110 <none> 8000/TCP 40m
kubernetes-dashboard ClusterIP 10.99.75.241 <none> 80/TCP 40m
My question is: why am I unable to access the dashboard from the internet?
This is by design; minikube is a development tool for local environments.
You can deploy an ingress or loadbalancer service to expose the dashboard, if you really know what you are doing.
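If you do want to reach the dashboard from outside the VM anyway, one common approach (a sketch, not the only option, and with obvious security implications) is to run kubectl proxy bound to all interfaces instead of the localhost-only proxy that minikube dashboard starts:
$ kubectl proxy --address='0.0.0.0' --accept-hosts='.*' --port=8001
# then browse to:
# http://<EC2-public-IP>:8001/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
Remember to open the chosen port (8001 here) in the security group as well.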

Kong Ingress Controller at Home

I'm learning about Kubernetes and ingress controllers, but I'm stuck on this error when I try to apply the Kong ingress manifest...
ingress-kong-7dd57556c5-bh687 0/2 Init:0/1 0 29s
kong-migrations-gzlqj 0/1 Init:0/1 0 28s
postgres-0 0/1 Pending 0 28s
Is it possible to run this ingress on my home server without minikube? If so, how?
Note: I have a FQDN pointing to my home server.
I guess you ran the manifest from GitHub.
Issues with Pods
I have reproduced your case. As you have 3 pods, you have used the option with a DB.
If you describe the pods using
$ kubectl describe pod <podname> -n kong
you will receive this error output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7s (x4 over 17s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
You can also check the job in the kong namespace.
It works correctly on a fresh Minikube cluster, so I guess you might need to make changes to your storageclass.
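A sketch of how to check that (standard kubectl commands; the PVC names will differ in your cluster):
$ kubectl get storageclass                 # is there a default StorageClass that can bind the Postgres PVC?
$ kubectl get pvc -n kong                  # which claims are stuck Pending?
$ kubectl describe pvc <pvcname> -n kong   # the events explain why the claim is not bound
On Minikube a default "standard" StorageClass is provisioned automatically, which is why the same manifest works there.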
Is it possible to run this ingress on my home server without minikube? If so, how?
You have to use Kubernetes to do it. Since Minikube supports LoadBalancer services, you can use it at home.
You can check this thread about FQDN. As mentioned:
The host machine should be able to resolve the name of that FQDN. You might add a record into the /etc/hosts at the Mac host to achieve that:
10.0.0.2 mydb.mytestdomain
But in your case it should be the IP address of the LoadBalancer, kong-proxy.
Obtain LoadBalancer IP in Minikube
If you deploy everything correctly, you can check your services:
$ kubectl get svc -n kong
You will see the kong-proxy service with type LoadBalancer and a <pending> EXTERNAL-IP.
To obtain the ExternalIP you have to use minikube tunnel.
Please note that you need to keep $ sudo minikube tunnel running in one console the whole time.
Before Minikube tunnel
$ kubectl get svc -n kong
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kong-proxy LoadBalancer 10.110.218.74 <pending> 80:31881/TCP,443:31319/TCP 103m
kong-validation-webhook ClusterIP 10.108.204.137 <none> 443/TCP 103m
postgres ClusterIP 10.105.9.54 <none> 5432/TCP 103m
After
$ kubectl get svc -n kong
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kong-proxy LoadBalancer 10.110.218.74 10.110.218.74 80:31881/TCP,443:31319/TCP 104m
kong-validation-webhook ClusterIP 10.108.204.137 <none> 443/TCP 104m
postgres ClusterIP 10.105.9.54 <none> 5432/TCP 104m
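The EXTERNAL-IP shown above is the address to use (for example in /etc/hosts); it can also be read directly with jsonpath (a sketch):
$ kubectl get svc kong-proxy -n kong -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
10.110.218.74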
Testing Kong
Here you can find how to get started with Kong. It will show you how to create an Ingress. Later, as I mentioned, you have to edit the Ingress and add a rule (host), similar to the one in the K8s docs.
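For reference, a minimal Ingress with a host rule might look like the sketch below; the hostname, backend service name and port are placeholders to adapt, not the exact manifest from the linked guide:
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
  - host: mydb.mytestdomain        # the FQDN / hosts entry pointing at kong-proxy
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service       # hypothetical backend service
            port:
              number: 80
EOF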

What happens when a service receives a request but has no ready pods?

Having a kubernetes service (of type ClusterIP) connected to a set of pods, but none of them are currently ready - what will happen to the request?
Will it:
fail eagerly
timeout
wait until a ready pod is available (or forever, whichever is earlier)
something else?
It will time out.
Kube-proxy takes the IP addresses of healthy pods and sets them as the endpoints of the service (backends). Also note that all kube-proxy does is rewrite the iptables rules when you create, delete or modify a service.
So, when you send a request within your network and there is no one to reply, your request will time out.
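You can see this directly on the empty service from the example below: with no pods behind it, the Endpoints object has no addresses, so there is nothing for the iptables rules to forward to (a sketch):
$ kubectl get endpoints my-nginx
NAME       ENDPOINTS   AGE
my-nginx   <none>      9s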
Deployed nginx service
[node1 ~]$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2h
my-nginx ClusterIP 10.100.1.134 <none> 80/TCP 9s
$ curl 10.100.1.134
curl: (7) Failed connect to 10.100.1.134:80; Connection refused
Deployed nginx deployment
$ kubectl create -f nginx-depl.yaml
$ kubectl get po
NAME READY STATUS RESTARTS AGE
my-nginx-f9945ffdd-2f77f 1/1 Running 0 1m
my-nginx-f9945ffdd-rk68v 1/1 Running 0 1m
$ curl 10.100.1.134
Welcome to nginx!
Most likely you would get a Connection refused error.

coredns does not resolve service names correctly

I use Kubernetes v1.11.3, which uses coredns to resolve hosts or service names, but I find that inside a pod the resolution does not work correctly.
# kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 50d <none>
kube-system calico-etcd ClusterIP 10.96.232.136 <none> 6666/TCP 50d k8s-app=calico-etcd
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 50d k8s-app=kube-dns
kube-system kubelet ClusterIP None <none> 10250/TCP 32d <none>
testalex grafana NodePort 10.96.51.173 <none> 3000:30002/TCP 2d app=grafana
testalex k8s-alert NodePort 10.108.150.47 <none> 9093:30093/TCP 13m app=alertmanager
testalex prometheus NodePort 10.96.182.108 <none> 9090:30090/TCP 16m app=prometheus
The following command gets no response:
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root@k8s-monitor-7ddcb74b87-n6jsd /]# ping k8s-alert
PING k8s-alert.testalex.svc.cluster.local (10.108.150.47) 56(84) bytes of data.
and there is no output in the coredns log:
# kubectl logs coredns-78fcdf6894-h78sd -n kube-system
I think maybe something is wrong, but I cannot locate the problem. Another question: why are both coredns pods on the master node? I supposed there would be one on each node.
UPDATE
It seems coredns works fine, but I do not understand why the ping command gets no reply.
[root@k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
[root@k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.121.3:53,192.168.121.4:53,192.168.121.3:53 + 1 more... 50d
Also, the DNS server cannot be reached:
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root@k8s-monitor-7ddcb74b87-n6jsd /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
[root@k8s-monitor-7ddcb74b87-n6jsd /]# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8000ms
I think maybe I misconfigured the network.
This is my cluster init command:
kubeadm init --kubernetes-version=v1.11.3 --apiserver-advertise-address=10.100.1.20 --pod-network-cidr=172.16.0.0/16
and this is the calico IP pool setting:
# kubectl exec -it calico-node-77m9l -n kube-system /bin/sh
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-77m9l -n kube-system' to see all of the containers in this pod.
/ # cd /tmp
/tmp # ls
calicoctl tunl-ip
/tmp # ./calicoctl get ipPool
CIDR
192.168.0.0/16
You can start by checking if the DNS is working.
Run nslookup on kubernetes.default from inside the pod k8s-monitor-7ddcb74b87-n6jsd and check if it is working:
[root@k8s-monitor-7ddcb74b87-n6jsd /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If this returns output, that means everything is working from the coredns side. If the output is not okay, then look into the resolv.conf inside the pod k8s-monitor-7ddcb74b87-n6jsd; it should look something like this:
[root@metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
Lastly, check that the coredns endpoints are exposed using:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify whether queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka the Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns.
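A sketch of how to enable it (the surrounding Corefile contents will differ per cluster):
$ kubectl -n kube-system edit configmap coredns
# then add the word "log" inside the server block of the Corefile, e.g.:
#   .:53 {
#       log
#       errors
#       health
#       ...
#   }
# CoreDNS will pick up the change (via the reload plugin, or after restarting the coredns pods) and log every query it receives.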
Hope this helps.
EDIT:
You might be having this issue; please have a look:
https://github.com/kubernetes/kubeadm/issues/1056
You cannot always ping the IP address or hostname of a service's cluster IP, since it is a virtual IP.
A service's cluster IP is a virtual IP and only has meaning when combined with the service port. You can try the same via an SRV record (a combination of the virtual IP and port) (refer to Kubernetes in Action by Marko Luksa).
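For example (a sketch; the port name "alert" is a placeholder for whatever named port the k8s-alert service actually defines):
# the A record resolves even though ping gets no reply
nslookup k8s-alert.testalex.svc.cluster.local
# SRV record: _<port-name>._<protocol>.<service>.<namespace>.svc.cluster.local
nslookup -type=SRV _alert._tcp.k8s-alert.testalex.svc.cluster.local
# or simply test the real port instead of ICMP
curl http://k8s-alert.testalex.svc.cluster.local:9093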
Thanks for the answer. This is the output. The IPs are certainly not real.
[root@master ~]# nslookup kubernetes.default
Server: 203.150.92.12
Address: 203.150.92.12#53
** server can't find kubernetes.default: NXDOMAIN
[root@master ~]# kubectl cluster-info
Kubernetes master is running at https://203.150.72.81:6443
coredns is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy
kubernetes-dashboard is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
metrics-server is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@master ~]# cat /etc/resolv.conf
search invalid
nameserver 203.150.92.12
nameserver 203.150.92.10
nameserver 1111:c207::2:55
[root@master ~]# kubectl get ep kube-dns --namespace=kube-system
Error from server (NotFound): endpoints "kube-dns" not found
[root@master ~]#
I think the reason you cannot get ping working is that iptables is used to redirect requests for the service cluster IP to the correct pods. The iptables rules only redirect traffic sent to the service cluster IP on its exposed ports; an ICMP request is never redirected to the real endpoints.
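In practice that means testing the service on its exposed port instead of with ICMP; for the DNS service above, a query against port 53 is the relevant check (a sketch; dig comes from bind-utils/dnsutils):
# ping 10.96.0.10 will always fail, but a DNS query to the same IP should succeed
dig @10.96.0.10 kubernetes.default.svc.cluster.local +short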