Kubernetes CoreDNS is not reachable from the pod

I have a pod named 'sample_pod' deployed in a Rancher cluster, with a container named 'sample_container'. The pod is exposed through a service named 'test'. Inside sample_container, whenever I try to resolve cluster domain names with the 'host', 'dig' or 'nslookup' commands, I always get "connection timed out; no servers could be reached".
I have CoreDNS pods running inside my cluster:
user@abc$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7fbff695b4-f7vxc 1/1 Running 0 21h
canal-928m6 2/2 Running 0 21h
canal-d7vjr 2/2 Running 0 20h
coredns-6f85d5fb88-9txmx 1/1 Running 0 21h
coredns-autoscaler-79599b9dc6-ndgfj 1/1 Running 0 21h
kube-multus-ds-769n6 1/1 Running 0 20h
metrics-server-8449844bf-jz66w 1/1 Running 0 21h
rke-coredns-addon-deploy-job-dlvlh 0/1 Completed 0 21h
rke-ingress-controller-deploy-job-jcj6w 0/1 Completed 0 21h
rke-metrics-addon-deploy-job-wnhbq 0/1 Completed 0 21h
rke-network-plugin-deploy-job-wzqfb 0/1 Completed 0 21h
whereabouts-p6vcc 1/1 Running 0 20h
I have not touched the default Corefile of CoreDNS.
Corefile:
.:53 {
    log
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . "/etc/resolv.conf"
    cache 30
    loop
    reload
    loadbalance
}
/etc/hosts file of sample_container:
[root@sample_container]# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.1.18 sample_pod
# Entries added by HostAliases.
127.0.0.1 localhost
10.94.66.8 netboot.com
/etc/resolv.conf of sample_container:
[root@sample_container]# cat /etc/resolv.conf
nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
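(For completeness, the nameserver above should be the ClusterIP of the kube-dns Service in kube-system, which fronts the CoreDNS pods; a quick way to confirm it and to check that the Service has endpoints:)
kubectl -n kube-system get svc kube-dns
kubectl -n kube-system get endpoints kube-dns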
I tried to ping the nameserver and to resolve names with host and dig, and got errors:
[root@sample_container]# ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10) 56(84) bytes of data.
^C
--- 10.43.0.10 ping statistics ---
99 packets transmitted, 0 received, 100% packet loss, time 98003ms
[root@sample_container]# host kube-dns.kube-system
;; connection timed out; no servers could be reached
[root@sample_container]# host localhost
;; connection timed out; no servers could be reached
I tried to resolve the test service in the default namespace (sample_pod and the test service are in the same namespace):
[root@sample_container]# host test
;; connection timed out; no servers could be reached
dig and nslookup return the same error:
[root@sample_container]# nslookup localhost
;; connection timed out; no servers could be reached
[root@sample_container]# dig localhost
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> localhost
;; global options: +cmd
;; connection timed out; no servers could be reached
Additional information on the pod IP and service IP:
root@user$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/sample_pod 1/1 Running 0 177m 10.42.1.18 dsc-worker-node <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/test ClusterIP 10.43.19.85 <none> 80/TCP,443/TCP 177m role=test
Note: I deployed this pod so that some of its containers access a baremetal machine to do their work, and I need certain domain names to be forwarded to a DNS server on that baremetal host, which will answer those queries. I am aware that the forward plugin does this job. But even without touching the Corefile, I cannot reach CoreDNS for the cluster domain names themselves.
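(For reference, once basic resolution works, the forward plugin mentioned here would be an extra server block in the Corefile, roughly like the sketch below; the zone name and the upstream address 10.94.66.8 are only illustrative, taken from the HostAliases entry above.)
netboot.com:53 {
    errors
    cache 30
    forward . 10.94.66.8
}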
Could someone help me to solve this issue? It would be really helpful for me. Thanks in advance!!!

I solved this issue by changing the route. By default, the DNS queries were sent to the Kubernetes nameserver via the private interface instead of via the default gateway (public interface). After changing the route so that DNS queries are sent via the default gateway, the problem was solved.
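(A rough sketch of the kind of route change involved, run inside the pod's network namespace and assuming the RKE default service CIDR 10.43.0.0/16; <gateway-ip> and eth0 are placeholders for the pod's actual default gateway and public interface, which `ip route` will show.)
# see which interface traffic to the nameserver currently uses
ip route get 10.43.0.10
# send service-CIDR traffic (including DNS to 10.43.0.10) via the default gateway
ip route add 10.43.0.0/16 via <gateway-ip> dev eth0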

Related

How does a TCP connection work in a Kubernetes LoadBalancer?

Hi, I am running 5 replicas of tcp-client (this can be scaled up) and have exposed 3 Services of type LoadBalancer to the external network to accept incoming connections. The client listens internally on port 7777, which is mapped to the external ports 17777, 27777 and 37777.
Pods
[root@pwconfig-k8s-master0 tcp_poc]# kubectl get pods -l app=tcp-client
NAME READY STATUS RESTARTS AGE
tcp-client-7dd545dcc9-54bdl 1/1 Running 0 4m47s
tcp-client-7dd545dcc9-628jn 1/1 Running 0 4m47s
tcp-client-7dd545dcc9-7pm44 1/1 Running 0 2m30s
tcp-client-7dd545dcc9-b287n 1/1 Running 0 4m47s
tcp-client-7dd545dcc9-mrmnm 1/1 Running 0 2m30s
Service
[root@pwconfig-k8s-master0 tcp_poc]# kubectl get svc | grep tcp-client
tcp-client ClusterIP y.y.y.y <none> 7777/TCP 4m36s
tcp-client-0 LoadBalancer y.y.y.y x.x.x.x 17777:30859/TCP 2m55s
tcp-client-1 LoadBalancer y.y.y.y x.x.x.x 27777:30089/TCP 2m55s
tcp-client-2 LoadBalancer y.y.y.y x.x.x.x 37777:31031/TCP 2m55s
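(For context, a minimal sketch of what one of these LoadBalancer Services might look like; the selector and metadata below are assumptions based on the listing above, not the actual manifests.)
apiVersion: v1
kind: Service
metadata:
  name: tcp-client-0
spec:
  type: LoadBalancer
  selector:
    app: tcp-client        # assumed label; matches the -l app=tcp-client filter above
  ports:
  - name: tcp
    port: 17777            # external port exposed by the load balancer
    targetPort: 7777       # port the client listens on inside the pod
    protocol: TCP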
We have seen that once an external client makes a TCP connection, the connection gets pinned to a particular pod and stays alive until the external client closes it. I would like to know how the routing and the TCP connection work here, since the load balancing appears to happen per client TCP connection rather than per packet.
So if there are 100 external clients, it will load-balance across the clients, routing each TCP connection to one pod and keeping it there for the lifetime of that connection.
Thanks for the help in advance.

kubernetes - nginx ingress - How to access

I could not access my application from the k8s cluster.
With NodePort everything works. If I use the ingress controller, I can see that it is created successfully and I am able to ping the IP. If I try to telnet, it says connection refused, and I am also unable to access the application. What am I missing? I do not see any exceptions in the ingress pod.
kubectl get ing -n test
NAME CLASS HOSTS ADDRESS PORTS AGE
web-ingress <none> * 192.168.0.102 80 44m
ping 192.168.0.102
PING 192.168.0.102 (192.168.0.102) 56(84) bytes of data.
64 bytes from 192.168.0.102: icmp_seq=1 ttl=64 time=0.795 ms
64 bytes from 192.168.0.102: icmp_seq=2 ttl=64 time=0.860 ms
64 bytes from 192.168.0.102: icmp_seq=3 ttl=64 time=0.631 ms
^C
--- 192.168.0.102 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2038ms
rtt min/avg/max/mdev = 0.631/0.762/0.860/0.096 ms
telnet 192.168.0.102 80
Trying 192.168.0.102...
telnet: Unable to connect to remote host: Connection refused
kubectl get all -n ingress-nginx
shows this
NAME READY STATUS RESTARTS AGE
pod/ingress-nginx-admission-create-htvkh 0/1 Completed 0 99m
pod/ingress-nginx-admission-patch-cf8gj 0/1 Completed 0 99m
pod/ingress-nginx-controller-7fd7d8df56-kll4v 1/1 Running 0 99m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ingress-nginx-controller NodePort 10.102.220.87 <none> 80:31692/TCP,443:32736/TCP 99m
service/ingress-nginx-controller-admission ClusterIP 10.106.159.230 <none> 443/TCP 99m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ingress-nginx-controller 1/1 1 1 99m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ingress-nginx-controller-7fd7d8df56 1 1 1 99m
NAME COMPLETIONS DURATION AGE
job.batch/ingress-nginx-admission-create 1/1 7s 99m
job.batch/ingress-nginx-admission-patch 1/1 8s 99m
Answer
The IP from kubectl get ing -n test is not an externally accessible address that you should be using.
Your NGINX Ingress Controller Deployment has a Service deployed alongside it. You can use the external IP of this Service (if it has one) to hit your Ingress Controller.
Because your Service is of NodePort type (and does not show an external IP), you must address the Ingress Controller Pods through your cluster's Node IPs. You would need to track which Node the Pod is on, then find the Node's IP. Here is an example of doing this:
NODE=$(kubectl get pods -n ingress-nginx -o wide | grep "ingress-nginx-controller" | awk '{print $7}')
NODE_IP=$(kubectl get nodes "$NODE" -o wide | grep Ready | awk '{print $6}')
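Once you have the node IP, you can reach the controller through the NodePort shown for the ingress-nginx-controller Service above (31692 for HTTP, 32736 for HTTPS), for example:
curl -v http://$NODE_IP:31692/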
More Info
If your cluster is managed (e.g. GKE/Azure/AWS), you can use a LoadBalancer Service to provide an external IP to hit the Ingress Controller.
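(If your environment can actually provision an external load balancer, one way to switch the existing Service over is roughly the following; this is a sketch, not something your current NodePort setup requires.)
kubectl -n ingress-nginx patch svc ingress-nginx-controller -p '{"spec": {"type": "LoadBalancer"}}'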

Failed to create SubnetManager: asn1: structure error: tags don't match

While this error message is just a symptom, my problem is real.
My bare-metal cluster ran into an expired-certificate situation. I managed to renew all certificates, but after a restart most pods wouldn't work. The pod that seems responsible is the flannel one (CrashLoopBackOff).
Logs for the flannel pods show:
I1120 22:24:00.541277 1 main.go:475] Determining IP address of default interface
I1120 22:24:00.541546 1 main.go:488] Using interface with name eth0 and address xxx.xxx.xxx.xxx
I1120 22:24:00.541565 1 main.go:505] Defaulting external address to interface address (xxx.xxx.xxx.xxx)
E1120 22:24:03.572745 1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-dmrzh': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-amd64-dmrzh: dial tcp 10.96.0.1:443: getsockopt: network is unreachable
On the host there is no flannel interface anymore, and no systemd unit file either.
Running flanneld manually yields this output:
I1120 20:12:15.923966 26361 main.go:446] Determining IP address of default interface
I1120 20:12:15.924171 26361 main.go:459] Using interface with name eth0 and address xxx.xxx.xxx.xxx
I1120 20:12:15.924187 26361 main.go:476] Defaulting external address to interface address (xxx.xxx.xxx.xxx)
E1120 20:12:15.924285 26361 main.go:223] Failed to create SubnetManager: asn1: structure error: tags don't match (16 vs {class:0 tag:2 length:1 isCompound:false}) {optional:false explicit:false application:false defaultValue:<nil> tag:<nil> stringType:0 timeType:0 set:false omitEmpty:false} tbsCertificate #2
The available pieces of evidence point in several directions, but whenever I check one out, it points somewhere else, so I need a pointer to which part is causing the problem.
Is it etcd?
Is it the new etcd certificate?
Is it the missing flannel interface?
Is it the non-operational flanneld?
Is it something not listed here?
If there is information missing here, please ask; I can surely provide it.
Key specs:
- host: Ubuntu 18.04
- kubeadm 1.13.2
Thank you and best regards,
scones
UPDATE1
$ k get cs,po,svc
NAME STATUS MESSAGE ERROR
componentstatus/controller-manager Healthy ok
componentstatus/scheduler Healthy ok
componentstatus/etcd-0 Healthy {"health": "true"}
NAME READY STATUS RESTARTS AGE
pod/cert-manager-6dc5c68468-hkb6j 0/1 Error 51 89d
pod/coredns-86c58d9df4-dtdxq 0/1 Completed 23 304d
pod/coredns-86c58d9df4-k7h7m 0/1 Completed 23 304d
pod/etcd-redacted 1/1 Running 2506 304d
pod/hostpath-provisioner-5c6754fbd4-ckvnp 0/1 Error 12 222d
pod/kube-apiserver-redacted 1/1 Running 1907 304d
pod/kube-controller-manager-redacted 1/1 Running 2682 304d
pod/kube-flannel-ds-amd64-dmrzh 0/1 CrashLoopBackOff 338 372d
pod/kube-proxy-q8jgs 1/1 Running 15 304d
pod/kube-scheduler-redacted 1/1 Running 2694 304d
pod/metrics-metrics-server-65cd865c9f-dbh85 0/1 Error 2658 120d
pod/tiller-deploy-865b88d89-8ftzs 0/1 Error 170 305d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 372d
service/metrics-metrics-server ClusterIP 10.97.186.19 <none> 443/TCP 120d
service/tiller-deploy ClusterIP 10.103.184.226 <none> 44134/TCP 354d
Unfortunately I don't recall how I installed flannel a year ago.
kubectl version is also 1.13.2, as is the cluster.
The post linked by @hanx is about renewing certificates, not about broken network overlays, so it is not applicable.
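(Since the error is an ASN.1 parse failure on a certificate, one quick sanity check is to confirm that the renewed certificates, and the client credentials embedded in the kubeconfig files, actually parse and carry the new expiry dates. This is only a sketch and assumes kubeadm's default layout under /etc/kubernetes.)
# check the renewed API server certificate
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
# decode and check the client certificate embedded in admin.conf
grep client-certificate-data /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates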

Why doesn't kube-proxy route traffic to another worker node?

I've deployed several different services and always get the same error.
The service is reachable on the node port from the machine where the pod is running. On the two other nodes I get timeouts.
The kube-proxy is running on all worker nodes and I can see in the logfiles from kube-proxy that the service port was added and the node port was opened.
In this case I've deployed the stars demo from calico
Kube-proxy log output:
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.229458 659 service.go:309] Adding new service port "management-ui/management-ui:" at 10.32.0.133:9001/TCP
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.257483 659 proxier.go:1427] Opened local port "nodePort for management-ui/management-ui:" (:30002/tcp)
kube-proxy is listening on port 30002:
root@kuben1:/tmp# netstat -lanp | grep 30002
tcp6 0 0 :::30002 :::* LISTEN 659/kube-proxy
There are also some iptables rules defined:
root@kuben1:/tmp# iptables -L -t nat | grep management-ui
KUBE-MARK-MASQ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-MARK-MASQ tcp -- !10.200.0.0/16 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
The interesting part is that I can reach the service IP from any worker node:
root@kubem1:/tmp# kubectl get svc -n management-ui
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
management-ui NodePort 10.32.0.133 <none> 9001:30002/TCP 52m
The service IP/port can be accessed from any worker node if I do a "curl http://10.32.0.133:9001"
I don't understand why kube-proxy does not "route" this properly...
Does anyone have a hint as to where I can find the error?
Here are some cluster specs:
This is a hand-built cluster inspired by Kelsey Hightower's "Kubernetes the hard way" guide.
6 nodes (3 master, 3 worker), local VMs
OS: Ubuntu 18.04
K8s: v1.13.0
Docker: 18.9.3
Cni: calico
Component status on the master nodes looks okay:
root@kubem1:/tmp# kubectl get componentstatus
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
The worker nodes look okay if I trust kubectl:
root@kubem1:/tmp# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kuben1 Ready <none> 39d v1.13.0 192.168.178.77 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben2 Ready <none> 39d v1.13.0 192.168.178.78 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben3 Ready <none> 39d v1.13.0 192.168.178.79 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
As asked by P Ekambaram:
root@kubem1:/tmp# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-bgjdg 1/1 Running 5 40d
calico-node-nwkqw 1/1 Running 5 40d
calico-node-vrwn4 1/1 Running 5 40d
coredns-69cbb76ff8-fpssw 1/1 Running 5 40d
coredns-69cbb76ff8-tm6r8 1/1 Running 5 40d
kubernetes-dashboard-57df4db6b-2xrmb 1/1 Running 5 40d
I've found a solution for my problem.
This behavior was caused by a change in Docker v1.13.x, and the issue was fixed in Kubernetes with version 1.8.
The easy workaround was to change the forward rules via iptables.
Run the following command on all worker nodes: "iptables -A FORWARD -j ACCEPT"
To fix it the right way I had to tell kube-proxy the CIDR for the pods.
In theory that can be solved in two ways:
Add "--cluster-cidr=10.0.0.0/16" as an argument to the kube-proxy command line (in my case in the systemd service file)
Add 'clusterCIDR: "10.0.0.0/16"' to the kube-proxy configuration file (see the sketch below)
In my case the command-line argument didn't have any effect.
After I added the line to my kube-proxy configuration file and restarted kube-proxy on all worker nodes, everything worked well.
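(For reference, a minimal sketch of the second option, assuming kube-proxy reads a KubeProxyConfiguration file via its --config flag; only the clusterCIDR line is the relevant addition, and the value must match the pod CIDR used by your CNI plugin.)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clusterCIDR: "10.0.0.0/16"   # pod network CIDR; traffic from outside this range to cluster IPs gets masqueraded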
Here is the github merge request for this "FORWARD" issue: link

ipvsadm not showing any entry in kubeadm cluster

I have installed kubeadm and created service and pod:
packet@test:~$ kubectl get pod
NAME READY STATUS RESTARTS AGE
udp-server-deployment-6f87f5c9-466ft 1/1 Running 0 5m
udp-server-deployment-6f87f5c9-5j9rt 1/1 Running 0 5m
udp-server-deployment-6f87f5c9-g9wrr 1/1 Running 0 5m
udp-server-deployment-6f87f5c9-ntbkc 1/1 Running 0 5m
udp-server-deployment-6f87f5c9-xlbjq 1/1 Running 0 5m
packet@test:~$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1h
udp-server-service NodePort 10.102.67.0 <none> 10001:30001/UDP 6m
but I am still not able to access the udp-server pod:
packet@test:~$ curl http://192.168.43.161:30001
curl: (7) Failed to connect to 192.168.43.161 port 30001: Connection refused
While debugging I could see that kube-proxy is running, but there is no entry in IPVS:
root@test:~# ps auxw | grep kube-proxy
root 4050 0.5 0.7 44340 29952 ? Ssl 14:33 0:25 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
root 6094 0.0 0.0 14224 968 pts/1 S+ 15:48 0:00 grep --color=auto kube-proxy
root@test:~# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
It seems there is no entry in ipvsadm, which is causing the connection timeout.
Regards, Ranjith
From this issue (putting aside the load balancer part),
Both externalIPs and status.loadBalancer.ingress[].ip seem to be ignored by kube-proxy in IPVS mode, so external traffic is completely unrouteable.
In contrast, kube-proxy in iptables mode creates DNAT/SNAT rules for external and loadbalancer IPs.
So check if adding a network plugin (flannel, Calico, ...) would improve the situation.
Or check out cloudnativelabs/kube-router, which is also ipvs-based.
A lean yet powerful alternative to several network components used in typical Kubernetes clusters.
All this from a single DaemonSet/Binary. It doesn't get any easier.
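(As a quick check of which mode kube-proxy is actually running in, you can look at the mode field of its configuration file; the path below is the one shown in the kube-proxy command line in the question.)
grep -i mode /var/lib/kube-proxy/config.conf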
Since curl uses a TCP connection while 30001 is a UDP port, they won't work together; try a UDP probe tool, like nmap.
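For example, a UDP probe against the NodePort from the question (a sketch; the -sU scan usually needs root):
nmap -sU -p 30001 192.168.43.161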
Initially I created the VM (a Linux VM) using VirtualBox (running on Windows), and that is where I hit this type of issue.
Now I have created the VM (a Linux VM) using virt-manager (running on Linux); in this setup there is no issue and everything works fine.
It would be great if anyone could tell me whether there is any restriction coming from VirtualBox.