k3s - networking between pods not working - kubernetes

I'm struggling with cross-communication between pods even though ClusterIP services are set up for them. All the pods are on the same master node and in the same namespace. In summary:
$ kubectl get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE    IP           NODE          NOMINATED NODE   READINESS GATES
nginx-744f4df6df-rxhph     1/1     Running   0          136m   10.42.0.31   raspberrypi   <none>           <none>
nginx-2-867f4f8859-csn48   1/1     Running   0          134m   10.42.0.32   raspberrypi   <none>           <none>
$ kubectl get svc -o wide
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE    SELECTOR
nginx-service    ClusterIP   10.43.155.201   <none>        80/TCP    136m   app=nginx
nginx-service2   ClusterIP   10.43.182.138   <none>        85/TCP    134m   app=nginx-2
where I can't curl http://nginx-service2:85 from within the nginx container, or vice versa, while I validated that this works on my Docker Desktop installation:
# docker desktop
root@nginx-7dc45fbd74-7prml:/# curl http://nginx-service2:85
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
nginx.org.<br/>
Commercial support is available at
nginx.com.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
# k3s
root@nginx-744f4df6df-rxhph:/# curl http://nginx-service2.pwk3spi-vraptor:85
curl: (6) Could not resolve host: nginx-service2.pwk3spi-vraptor
After googling the issue (and please correct me if I'm wrong), it seems like a CoreDNS issue, because looking at the logs I see timeout errors:
$ kubectl get pods -n kube-system
NAME                                     READY   STATUS      RESTARTS   AGE
helm-install-traefik-qr2bd               0/1     Completed   0          153d
metrics-server-7566d596c8-nnzg2          1/1     Running     59         148d
svclb-traefik-kjbbr                      2/2     Running     60         153d
traefik-758cd5fc85-wzjrn                 1/1     Running     20         62d
local-path-provisioner-6d59f47c7-4hvf2   1/1     Running     72         148d
coredns-7944c66d8d-gkdp4                 1/1     Running     0          3m47s
$ kubectl logs coredns-7944c66d8d-gkdp4 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 1c648f07b77ab1530deca4234afe0d03
CoreDNS-1.6.9
linux/arm, go1.14.1, 1766568
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:50482->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:34160->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:53485->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:46642->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:55329->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:44471->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:49182->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:54082->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:48151->192.168.8.109:53: i/o timeout
[ERROR] plugin/errors: 2 1898797220.1916943194. HINFO: read udp 10.42.0.38:48599->192.168.8.109:53: i/o timeout
where people recommended
changing the CoreDNS ConfigMap to forward to your master node's IP:
... other Corefile stuff
forward . <master node IP>
... other Corefile stuff
or adding your CoreDNS ClusterIP as a nameserver to /etc/resolv.conf:
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.42.0.38
nameserver 192.168.8.1
nameserver fe80::266:19ff:fea7:85e7%wlan0
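For reference, the ConfigMap edit above would look something like this on this cluster (a sketch only; as noted next, it didn't help here):
kubectl -n kube-system edit configmap coredns
# in the Corefile, change
#     forward . /etc/resolv.conf
# to
#     forward . 192.168.8.109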
However, neither of these solutions worked for me.
Details for reference:
$ kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE    VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
raspberrypi   Ready    master   153d   v1.18.9+k3s1   192.168.8.109   <none>        Raspbian GNU/Linux 10 (buster)   5.10.9-v7l+      containerd://1.3.3-k3s2
$ kubectl get svc -n kube-system -o wide
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE    SELECTOR
kube-dns             ClusterIP      10.43.0.10      <none>          53/UDP,53/TCP,9153/TCP       153d   k8s-app=kube-dns
metrics-server       ClusterIP      10.43.205.8     <none>          443/TCP                      153d   k8s-app=metrics-server
traefik-prometheus   ClusterIP      10.43.222.138   <none>          9100/TCP                     153d   app=traefik,release=traefik
traefik              LoadBalancer   10.43.249.133   192.168.8.109   80:31222/TCP,443:32509/TCP   153d   app=traefik,release=traefik
$ kubectl get ep kube-dns -n kube-system
NAME       ENDPOINTS                                      AGE
kube-dns   10.42.0.38:53,10.42.0.38:9153,10.42.0.38:53   153d
No idea where I'm going wrong, or if I focused on the wrong stuff, or how to continue. Any help will be much appreciated, please.

When all else fails... go back to the manual. I had been looking for the 'issue' in all the wrong places, while I just had to follow Rancher's installation documentation for k3s (sigh).
Rancher's documentation is very good (you just have to actually follow it). It states that when installing k3s on Raspbian Buster environments
check version:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
you need to switch to legacy iptables, telling you to run (link):
sudo iptables -F
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo reboot
Note: when switching iptables, do it directly on the Pi, not via SSH; you will be kicked out.
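After the reboot, you can sanity-check that the legacy backend is active (output format varies by version, but it should say "legacy"):
sudo update-alternatives --display iptables
iptables --version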
After doing this, all my services were happy, and could curl each other from within the containers via their defined clusterIP service names etc.

For anyone who doesn't want to waste 3 hours like me: on CentOS with k3s, you need to disable the firewall for the services to call each other.
https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-red-hat-centos-enterprise-linux
It is recommended to turn off firewalld:
systemctl disable firewalld --now
If enabled, it is required to disable nm-cloud-setup and reboot the node:
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot
After I disabled it, the services were able to call each other through their DNS names in my config.
I'm still looking for a better way than disabling the firewall, but that depends on the developers of the k3s project.
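If you'd rather not disable firewalld entirely, the same Rancher page also documents rules for keeping it enabled (untested by me in this setup):
firewall-cmd --permanent --add-port=6443/tcp # apiserver
firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16 # pods
firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16 # services
firewall-cmd --reload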

Is there a reason why you're trying to curl this address:
curl http://nginx-service2.pwk3spi-vraptor:85
Shouldn't this be just:
curl http://nginx-service2:85
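The namespace-qualified form only resolves if the suffix is a real namespace. Within the same namespace the short name is enough; across namespaces the fully qualified form would be (namespace is a placeholder here):
curl http://nginx-service2.<namespace>.svc.cluster.local:85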

In my case, following the Rancher docs:
The nodes need to be able to reach other nodes over UDP port 8472 when Flannel VXLAN is used or over UDP ports 51820 and 51821 (when using IPv6) when Flannel Wireguard backend is used.
I just opened the UDP port in Oracle Cloud and it works.
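Opening the port in the cloud security list may not be enough on its own: some Oracle Cloud images also filter traffic with host-level iptables rules, in which case the equivalent rule on the nodes would be something like (a sketch):
sudo iptables -I INPUT -p udp --dport 8472 -j ACCEPT # flannel VXLAN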

Related

k3s debugging DNS resolution

I'm new to Kubernetes and I have some issues with DNS names in my k3s cluster, on a PC with ARM architecture.
I've tried to debug as the docs suggest (https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/).
I installed k3s as follows:
sudo curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" sh -
and applied the manifest for the debugging pod:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
I've checked that the pod is running:
kubectl get pods dnsutils
and tried to run
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
and expected something like this:
Server: 10.0.0.10
Address 1: 10.0.0.10
Name: kubernetes.default
Address 1: 10.0.0.1
But get:
;; connection timed out; no servers could be reached
command terminated with exit code 1
Any thoughts on how to debug this? It seems I'm missing something...
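The same debugging doc also suggests checking the pod's DNS configuration, which should point at the kube-dns ClusterIP (10.43.0.10 on k3s):
kubectl exec -i -t dnsutils -- cat /etc/resolv.conf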
UPD. Tried to debug as Rancher suggests (https://docs.ranchermanager.rancher.io/v2.5/troubleshooting/other-troubleshooting-tips/dns):
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default
And there is the output:
If you don't see a command prompt, try pressing enter.
Address 1: 10.43.0.10
nslookup: can't resolve 'kubernetes.default'
pod "busybox" deleted
pod default/busybox terminated (Error)
So I tried next step:
for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
and logs are:
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
.:53
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/reload: Running configuration SHA512 = b941b080e5322f6519009bb49349462c7ddb6317425b0f6a83e5451175b720703949e3f3b454a24e77f3ffe57fd5e9c6130e528a5a1dd00d9000e4afd6c1108d
CoreDNS-1.9.1
linux/arm64, go1.17.8, 4b597f8
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:39581->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:52272->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:41480->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:52059->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:46821->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:35222->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:38013->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:42222->8.8.8.8:53: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:50612->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 4288512074117887106.1437335397389171032. HINFO: read udp 10.42.0.5:50341->8.8.8.8:53: i/o timeout
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
...
UPD2
kubectl -n kube-system get cm coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    import /etc/coredns/custom/*.server
  NodeHosts: |
    192.168.0.103 ubuntu
kind: ConfigMap
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4yQwWrzMBCEX0Xs2fEf20nsX9BDybH02lMva2kdq1Z2g6SkBJN3L8IUCiVtbyNGOzvfzoAn90IhOmHQcKmgAIsJQc+wl0CD8wQaSr1t1PzKSilFIUiIix4JfRoXHQjtdZHTuafAlCgq488xUSi9wK2AybEFDXvhwR2e8QQFHCnh50ZkloTJCcf8lP6NTIqUyuCkNJiSp9LJP5czoLjryztTWB0uE2iYmvjFuVSFenJsHx6tFf41gvGY6Y0Eshz/9D2e0OSZfIJVvMZExwzusSf/I9SIcQQNvaG6a+r/XVdV7abBddPtsN9W66Eedi0N7aberM22zaHf6t0tcPsIAAD//8Ix+PfoAQAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: coredns
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2022-09-23T09:06:05Z"
  labels:
    objectset.rio.cattle.io/hash: bce283298811743a0386ab510f2f67ef74240c57
  name: coredns
  namespace: kube-system
  resourceVersion: "315"
  uid: 33a8ccf6-511f-49c4-9752-424859d67d70
UPD3
kubectl -n kube-system get po -o wide
Output:
coredns-b96499967-sct84                   1/1   Running     1 (17h ago)   20h   10.42.0.6   ubuntu   <none>   <none>
helm-install-traefik-crd-wrh5b            0/1   Completed   0             20h   10.42.0.3   ubuntu   <none>   <none>
helm-install-traefik-wx7s2                0/1   Completed   1             20h   10.42.0.5   ubuntu   <none>   <none>
local-path-provisioner-7b7dc8d6f5-qxjvs   1/1   Running     1 (17h ago)   20h   10.42.0.3   ubuntu   <none>   <none>
metrics-server-668d979685-ngbmr           1/1   Running     1 (17h ago)   20h   10.42.0.5   ubuntu   <none>   <none>
svclb-traefik-67fcd721-mz6sd              2/2   Running     2 (17h ago)   20h   10.42.0.2   ubuntu   <none>   <none>
traefik-7cd4fcff68-j74gd                  1/1   Running     1 (17h ago)   20h   10.42.0.4   ubuntu   <none>   <none>
kubectl -n kube-system get svc
Output:
NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                      AGE
kube-dns         ClusterIP      10.43.0.10     <none>          53/UDP,53/TCP,9153/TCP       20h
metrics-server   ClusterIP      10.43.178.64   <none>          443/TCP                      20h
traefik          LoadBalancer   10.43.36.41    192.168.0.103   80:30268/TCP,443:30293/TCP   20h
Actually I found a workaround. When installing k3s, one should use the flag --flannel-backend=ipsec:
curl -sfL https://get.k3s.io | sh -s - server --write-kubeconfig-mode 644 --flannel-backend=ipsec
By default it uses --flannel-backend=vxlan. I also tried --flannel-backend=host-gw, but what works well for me is --flannel-backend=ipsec.
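If you don't want to reinstall, the same flag can presumably be set through the k3s config file followed by a service restart (a sketch; I haven't verified that switching the backend on a running cluster is clean):
# /etc/rancher/k3s/config.yaml
flannel-backend: ipsec
# then:
sudo systemctl restart k3s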

Kubespray: Netchecker connectivity check fails

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances using Kubespray. Those instances are CentOS 7.6.1811 qcow2 images imported in Glance.
The install was successful, and I can see my nodes and pods with kubectl commands.
I used the deploy_netchecker option to deploy NetChecker and test the network within my cluster, and set network_plugin="flannel".
I also tried kube_proxy_mode="iptables", but it doesn't seem to affect the result.
That's pretty much all the changes I did in the k8s-cluster.yml file.
All the pods are running, services too:
[centos@cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE     NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes                  ClusterIP   10.233.0.1      <none>        443/TCP                  46h
default       netchecker-service          NodePort    10.233.13.213   <none>        8081:31081/TCP           46h
kube-system   coredns                     ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   46h
kube-system   dashboard-metrics-scraper   ClusterIP   10.233.59.12    <none>        8000/TCP                 46h
kube-system   kubernetes-dashboard        ClusterIP   10.233.63.20    <none>        443/TCP                  46h
But the netchecker API gives the following answer:
[root@localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
For an unknown reason, I cannot access the API from a cluster node with localhost, so I used a floating IP with OpenStack.
Here are some logs from the agent:
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246 1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428 1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}
Logs from the server do not indicate any error.
I tried to check DNS resolution with the following:
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
nslookup: can't resolve 'kubernetes.default'
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
169.254.25.10 is the IP of nodelocaldns, but it doesn't seem to query the deployed coredns service.
When I use nslookup netchecker-service.default.svc.cluster.local 10.233.0.3, with the coredns IP, I get a correct answer.
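Since querying CoreDNS directly works, one way to narrow this down might be to inspect the nodelocaldns pods themselves (the label value is an assumption about Kubespray's manifests):
kubectl -n kube-system get pods -l k8s-app=nodelocaldns
kubectl -n kube-system logs -l k8s-app=nodelocaldns --tail=20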
What can be wrong with my configuration?
Thanks in advance
UPDATE: The Flannel plugin has an issue, with a fix that must be applied on all nodes of the cluster. Once done, the pods successfully report back to the netchecker server.

Kube-state-metrics error: Failed to create client: ... i/o timeout

I'm running Kubernetes in virtual machines and going through the basic tutorials, currently Add logging and metrics to the PHP / Redis Guestbook example. I'm trying to install kube-state-metrics:
git clone https://github.com/kubernetes/kube-state-metrics.git kube-state-metrics
kubectl create -f kube-state-metrics/kubernetes
but it fails.
kubectl describe pod --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7
...
Warning Unhealthy 28m (x8 over 30m) kubelet, kubernetes-node1 Readiness probe failed: Get http://192.168.129.102:8080/healthz: dial tcp 192.168.129.102:8080: connect: connection refused
kubectl logs --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7 -c kube-state-metrics
I0514 17:29:26.980707 1 main.go:85] Using default collectors
I0514 17:29:26.980774 1 main.go:93] Using all namespace
I0514 17:29:26.980780 1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0514 17:29:26.980800 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0514 17:29:26.983504 1 main.go:169] Testing communication with server
F0514 17:29:56.984025 1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
I'm unsure if this 10.96.0.1 IP is correct. My virtual machines are in a bridged network 10.10.10.0/24 and a host-only network 192.168.59.0/24. When initializing Kubernetes I used the argument --pod-network-cidr=192.168.0.0/16 so that's one more IP range that I'd expect. But 10.96.0.1 looks unfamiliar.
I'm new to Kubernetes, just doing the basic tutorials, so I don't know what to do now. How to fix it or investigate further?
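For context, 10.96.0.1 is almost certainly the ClusterIP of the built-in kubernetes service (kubeadm's default service CIDR is 10.96.0.0/12), which can be checked with:
kubectl get svc kubernetes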
EDIT - additional info:
kubectl get nodes -o wide
NAME                STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kubernetes-master   Ready    master   15d   v1.14.1   10.10.10.11   <none>        Ubuntu 18.04.2 LTS   4.15.0-48-generic   docker://18.9.2
kubernetes-node1    Ready    <none>   15d   v1.14.1   10.10.10.5    <none>        Ubuntu 18.04.2 LTS   4.15.0-48-generic   docker://18.9.2
kubernetes-node2    Ready    <none>   15d   v1.14.1   10.10.10.98   <none>        Ubuntu 18.04.2 LTS   4.15.0-48-generic   docker://18.9.2
The command I used to initialize the cluster:
sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=192.168.0.0/16
The reason for this is probably an overlap of the Pod network with the Node network: you set the Pod network CIDR to 192.168.0.0/16, and your host-only network 192.168.59.0/24 falls inside that range.
To solve this you can either change the pod network CIDR to 192.168.0.0/24 (not recommended, as this gives you only 256 addresses for your pod networking),
or use a different range for your Calico. If you want to do it on a running cluster, here is an instruction.
Another way I tried: edit the Calico manifest to use a different range (for example 10.0.0.0/8), run sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=10.0.0.0/8, and apply the manifest after the init (see the snippet below).
Another way would be using a different CNI like Flannel (which uses 10.244.0.0/16 by default).
You can find more information about ranges of CNI plugins here.
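For the manifest edit mentioned above, the relevant knob is the CALICO_IPV4POOL_CIDR environment variable on the calico-node container (the value shown is the 10.0.0.0/8 example from above; the exact manifest layout depends on the Calico version):
- name: CALICO_IPV4POOL_CIDR
  value: "10.0.0.0/8"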

Why doesn't kube-proxy route traffic to another worker node?

I've deployed several different services and always get the same error.
The service is reachable on the node port from the machine where the pod is running. On the two other nodes I get timeouts.
The kube-proxy is running on all worker nodes and I can see in the logfiles from kube-proxy that the service port was added and the node port was opened.
In this case I've deployed the stars demo from calico
Kube-proxy log output:
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.229458 659 service.go:309] Adding new service port "management-ui/management-ui:" at 10.32.0.133:9001/TCP
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.257483 659 proxier.go:1427] Opened local port "nodePort for management-ui/management-ui:" (:30002/tcp)
The kube-proxy is listening on the port 30002
root@kuben1:/tmp# netstat -lanp | grep 30002
tcp6 0 0 :::30002 :::* LISTEN 659/kube-proxy
There are also some iptable rules defined:
root@kuben1:/tmp# iptables -L -t nat | grep management-ui
KUBE-MARK-MASQ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-MARK-MASQ tcp -- !10.200.0.0/16 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
The interesting part is that I can reach the service IP from any worker node
root@kubem1:/tmp# kubectl get svc -n management-ui
NAME            TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
management-ui   NodePort   10.32.0.133   <none>        9001:30002/TCP   52m
The service IP/port can be accessed from any worker node if I do a "curl http://10.32.0.133:9001"
I don't understand why kube-proxy does not "route" this properly...
Does anyone have a hint where I can find the error?
Here some cluster specs:
This is a hand-built cluster inspired by Kelsey Hightower's "Kubernetes the hard way" guide.
6 Nodes (3 master: 3 worker) local vms
OS: Ubuntu 18.04
K8s: v1.13.0
Docker: 18.9.3
Cni: calico
Component status on the master nodes looks okay
root@kubem1:/tmp# kubectl get componentstatus
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
The worker nodes are looking okay if I trust kubectl
root@kubem1:/tmp# kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kuben1   Ready    <none>   39d   v1.13.0   192.168.178.77   <none>        Ubuntu 18.04.2 LTS   4.15.0-46-generic   docker://18.9.3
kuben2   Ready    <none>   39d   v1.13.0   192.168.178.78   <none>        Ubuntu 18.04.2 LTS   4.15.0-46-generic   docker://18.9.3
kuben3   Ready    <none>   39d   v1.13.0   192.168.178.79   <none>        Ubuntu 18.04.2 LTS   4.15.0-46-generic   docker://18.9.3
As asked by P Ekambaram:
root@kubem1:/tmp# kubectl get po -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
calico-node-bgjdg                      1/1     Running   5          40d
calico-node-nwkqw                      1/1     Running   5          40d
calico-node-vrwn4                      1/1     Running   5          40d
coredns-69cbb76ff8-fpssw               1/1     Running   5          40d
coredns-69cbb76ff8-tm6r8               1/1     Running   5          40d
kubernetes-dashboard-57df4db6b-2xrmb   1/1     Running   5          40d
I've found a solution for my "problem".
This behavior was caused by a change in Docker v1.13.x, and the issue was fixed in Kubernetes version 1.8.
The easy solution was to change the forward rules via iptables.
Run the following cmd on all worker nodes: "iptables -A FORWARD -j ACCEPT"
To fix it the right way I had to tell kube-proxy the CIDR for the pods.
In theory that could be solved in two ways:
Add "--cluster-cidr=10.0.0.0/16" as an argument to the kube-proxy command line (in my case in the systemd service file)
Add 'clusterCIDR: "10.0.0.0/16"' to the kube-proxy configuration file (see the sketch below)
In my case the command line argument didn't have any effect.
After I added the line to my config file and restarted kube-proxy on all worker nodes, everything worked well.
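For reference, a minimal sketch of the config-file variant (field name per the KubeProxyConfiguration API; kube-proxy is then pointed at the file with --config):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clusterCIDR: "10.0.0.0/16"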
Here is the github merge request for this "FORWARD" issue: link

ipvsadm not showing any entry in kubeadm cluster

I have installed kubeadm and created service and pod:
packet@test:~$ kubectl get pod
NAME                                   READY   STATUS    RESTARTS   AGE
udp-server-deployment-6f87f5c9-466ft   1/1     Running   0          5m
udp-server-deployment-6f87f5c9-5j9rt   1/1     Running   0          5m
udp-server-deployment-6f87f5c9-g9wrr   1/1     Running   0          5m
udp-server-deployment-6f87f5c9-ntbkc   1/1     Running   0          5m
udp-server-deployment-6f87f5c9-xlbjq   1/1     Running   0          5m
packet@test:~$ kubectl get service
NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
kubernetes           ClusterIP   10.96.0.1     <none>        443/TCP           1h
udp-server-service   NodePort    10.102.67.0   <none>        10001:30001/UDP   6m
but I am still not able to access the udp-server pod:
packet@test:~$ curl http://192.168.43.161:30001
curl: (7) Failed to connect to 192.168.43.161 port 30001: Connection refused
While debugging I could see that kube-proxy is running, but there is no entry in IPVS:
root@test:~# ps auxw | grep kube-proxy
root 4050 0.5 0.7 44340 29952 ? Ssl 14:33 0:25 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
root 6094 0.0 0.0 14224 968 pts/1 S+ 15:48 0:00 grep --color=auto kube-proxy
root@test:~# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
It seems there is no entry in ipvsadm, which causes the connection timeout.
Regards, Ranjith
From this issue (putting aside the load balancer part),
Both externalIPs and status.loadBalancer.ingress[].ip seem to be ignored by kube-proxy in IPVS mode, so external traffic is completely unrouteable.
In contrast, kube-proxy in iptables mode creates DNAT/SNAT rules for external and loadbalancer IPs.
So check if adding a network plugin (flannel, Calico, ...) would improve the situation.
Or check out cloudnativelabs/kube-router, which is also ipvs-based.
A lean yet powerful alternative to several network components used in typical Kubernetes clusters.
All this from a single DaemonSet/Binary. It doesn't get any easier.
Since curl uses a TCP connection and 30001 is a UDP port, they won't work together; try a UDP probe tool like nmap.
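For example (a sketch; UDP scans need root and often report no more than "open|filtered"):
sudo nmap -sU -p 30001 192.168.43.161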
Initially I created the Linux VM using VirtualBox (running on Windows), where I found this type of issue.
Now I have created the Linux VM using Virtual Machine Manager (running on Linux); in this setup there is no issue and everything works fine.
It would be great if anyone could tell me whether there is some restriction in VirtualBox.