Why is kube-proxy not working? No DNS resolution [closed] - kubernetes

I've just set up a fresh cluster with kubeadm and Kubernetes 1.21. All pods are marked Ready, but I can't access any of them. After digging into the problem, it appears that no DNS resolution is possible. It seems that kube-proxy is not working.
This is the log of one of the kube-proxy pods:
I0712 05:50:46.511967 1 node.go:172] Successfully retrieved node IP: x.x.x.x
I0712 05:50:46.512039 1 server_others.go:140] Detected node IP x.x.x.x
W0712 05:50:46.512077 1 server_others.go:598] Unknown proxy mode "", assuming iptables proxy
I0712 05:50:46.545626 1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I0712 05:50:46.545672 1 server_others.go:212] Using iptables Proxier.
I0712 05:50:46.545692 1 server_others.go:219] creating dualStackProxier for iptables.
W0712 05:50:46.545715 1 server_others.go:512] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
I0712 05:50:46.546089 1 server.go:643] Version: v1.21.2
I0712 05:50:46.549861 1 conntrack.go:52] Setting nf_conntrack_max to 196608
I0712 05:50:46.550300 1 config.go:224] Starting endpoint slice config controller
I0712 05:50:46.550338 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0712 05:50:46.550332 1 config.go:315] Starting service config controller
I0712 05:50:46.550354 1 shared_informer.go:240] Waiting for caches to sync for service config
W0712 05:50:46.553020 1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
W0712 05:50:46.555115 1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
I0712 05:50:46.650614 1 shared_informer.go:247] Caches are synced for service config
I0712 05:50:46.650634 1 shared_informer.go:247] Caches are synced for endpoint slice config
W0712 05:57:14.556916 1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
W0712 06:06:34.558550 1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
And these are my running pods:
kube-system pod/coredns-558bd4d5db-qpf5m 1/1 Running 1 8h
kube-system pod/coredns-558bd4d5db-r5jwz 1/1 Running 0 8h
kube-system pod/etcd-master2 1/1 Running 3 20h
kube-system pod/kube-apiserver-master2 1/1 Running 3 20h
kube-system pod/kube-controller-manager-master2 1/1 Running 3 8h
kube-system pod/kube-flannel-ds-b7xrm 1/1 Running 0 8h
kube-system pod/kube-flannel-ds-hcn7f 1/1 Running 0 8h
kube-system pod/kube-flannel-ds-rx8j6 1/1 Running 1 8h
kube-system pod/kube-flannel-ds-wc2jc 1/1 Running 0 8h
kube-system pod/kube-proxy-48wmr 1/1 Running 0 25m
kube-system pod/kube-proxy-4gw8t 1/1 Running 0 25m
kube-system pod/kube-proxy-h9djp 1/1 Running 0 25m
kube-system pod/kube-proxy-r4k9t 1/1 Running 0 24m
kube-system pod/kube-scheduler-master2 1/1 Running 3 20h
The command kubectl run -it --rm --restart=Never busybox --image=gcr.io/google-containers/busybox nslookup kubernetes.default gives me:
Address 1: x.x.x.x
nslookup: can't resolve 'kubernetes.default'
pod "busybox" deleted
pod default/busybox terminated (Error)
My iptables rules:
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
DOCKER-USER all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- 10.244.0.0/16 anywhere
ACCEPT all -- anywhere 10.244.0.0/16
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain DOCKER (1 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target prot opt source destination
DROP all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere
Chain KUBE-EXTERNAL-SERVICES (2 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP all -- !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- anywhere anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination
Chain KUBE-SERVICES (2 references)
target prot opt source destination
Any idea?
[Edit]
# kubectl edit cm -n kube-system kubelet-config-1.21
apiVersion: v1
data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    authentication:
      anonymous:
        enabled: false
      webhook:
        cacheTTL: 0s
        enabled: true
      x509:
        clientCAFile: /etc/kubernetes/pki/ca.crt
    authorization:
      mode: Webhook
      webhook:
        cacheAuthorizedTTL: 0s
        cacheUnauthorizedTTL: 0s
    cgroupDriver: systemd
    clusterDNS:
    - 10.96.0.10
    clusterDomain: cluster.local
# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 22h

Kube-proxy is a network service.
The DNS provider is responsible for DNS resolution, and as I can see you already have CoreDNS installed.
Check your kubelet configuration. It should point to the correct DNS service, and that service should be accessible from within your pods.
Also, please check that the firewalld or iptables service is disabled on all nodes (a quick check is sketched after the nslookup example below).
Like this:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: "/var/lib/kubernetes/ca.pem"
authorization:
  mode: Webhook
clusterDomain: "cluster.local"
clusterDNS:
  - "10.33.0.10"
kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.33.0.10 <none> 53/UDP,53/TCP,9153/TCP 35h
And then:
kubectl exec -ti net-diag-86589fd8f5-r28qq -- nslookup kubernetes.default
Server: 10.33.0.10
Address: 10.33.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.33.0.1
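If a host firewall is active on a node it can silently break this. A minimal check, assuming systemd-based nodes (firewalld is the usual suspect on CentOS/RHEL):
# on every node: see whether firewalld is running
systemctl status firewalld
# if it is active, stop and disable it, then re-test DNS resolution
sudo systemctl disable --now firewalld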
UPD.
I just noticed that you have Docker as the container runtime and Flannel as the network provider. My understanding is that the problem may be Docker messing with your iptables rules; try setting all the Docker rules to permissive and see if it works.
I'm not a big expert in iptables configuration, but something like this may help:
https://unrouted.io/2017/08/15/docker-firewall/
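For instance, a minimal sketch (for testing only, since it opens forwarding wide; the DOCKER-USER chain is created by Docker and is already present in your iptables dump):
# run on each node: let forwarded traffic through before Docker's rules can drop it
sudo iptables -P FORWARD ACCEPT
sudo iptables -I DOCKER-USER -j ACCEPT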
Also, if you are using Flannel, make sure that you are using the correct iface option. It may be critical if you are running a non-cloud installation.
https://github.com/flannel-io/flannel/blob/master/Documentation/configuration.md#key-command-line-options
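For example (the interface name eth1 is an assumption; pick the interface your nodes use to reach each other), the option goes in the kube-flannel DaemonSet container args:
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth1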

Related

VPC native Clusters in GKE can't communicate in GKE 1.14

I have created two separate GKE clusters on K8s 1.14.10.
I have followed "VPN access to in-house network not working after GKE cluster upgrade to 1.14.6" and the IP masquerading agent documentation.
I have tried to test this using a client pod and a server pod to exchange messages.
I'm using the internal node IP to send messages and created a ClusterIP service to expose the pods.
I have allowed requests for every instance in the firewall rules for ingress and egress, i.e. 0.0.0.0/0.
(Image: description of the cluster I created.)
The config map of the IP masquerading agent stays the same as in the documentation.
I'm able to ping the other node from within the pod, but a curl request says connection refused and tcpdump shows no data.
Problem:
I need to communicate from cluster A to cluster B in GKE 1.14 with IP masquerading set to true. I either get connection refused or an i/o timeout. I have tried using internal and external node IPs as well as a load balancer.
You have provided quite general information, and without details I cannot give an answer for your specific scenario. It might be related to how you created the clusters or to other firewall settings. Because of that, I will describe the correct steps for creating and configuring 2 clusters with firewall rules and masquerading. Maybe you will be able to find which step you missed or misconfigured.
The cluster configurations (nodes, pods, svc) are at the bottom of the answer.
1. Create VPC and 2 clusters
The docs describe 2 different projects, but you can do it all in one project.
A good example of VPC creation and 2 clusters can be found in the GKE docs: Create a VPC and Create 2 clusters. In the Tier1 cluster you can enable NetworkPolicy right away instead of enabling it later.
After that you will need to create firewall rules. You will also need to add the ICMP protocol to the firewall rule.
At this point you should be able to ping between nodes from 2 clusters.
For additional firewall rules (allowing connections between pods, svc, etc.) please check these docs.
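For illustration (the rule name, network name, and source ranges below are assumptions; substitute your VPC and the peer cluster's node/pod CIDRs):
gcloud compute firewall-rules create allow-tier1-tier2 \
    --network=my-vpc \
    --allow=tcp,udp,icmp \
    --source-ranges=10.0.0.0/8,172.16.0.0/12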
2. Enable IP masquerade agent
As mentioned in the docs, to run IPMasquerade:
The ip-masq-agent DaemonSet is automatically installed as an add-on with --nomasq-all-reserved-ranges argument in a GKE cluster, if one or more of the following is true:
The cluster has a network policy.
OR
The Pod's CIDR range is not within 10.0.0.0/8.
This means that tier-2-cluster already has ip-masq-agent in the kube-system namespace (because the Pod's CIDR range is not within 10.0.0.0/8). And if you enabled NetworkPolicy during the creation of tier-1-cluster, it should have been installed there as well. If not, you will need to enable it using the command:
$ gcloud container clusters update tier-1-cluster --update-addons=NetworkPolicy=ENABLED --zone=us-central1-a
To verify that everything is OK, you have to check whether the ip-masq-agent DaemonSet pods were created (one pod per node).
$ kubectl get ds ip-masq-agent -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ip-masq-agent 3 3 3 3 3 beta.kubernetes.io/masq-agent-ds-ready=true 168m
If you SSH to any of your nodes, you will be able to see the default iptables entries.
$ sudo iptables -t nat -L IP-MASQ
Chain IP-MASQ (1 references)
target prot opt source destination
RETURN all -- anywhere 169.254.0.0/16 /* ip-masq: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq: RFC 1918 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 172.16.0.0/12 /* ip-masq: RFC 1918 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.168.0.0/16 /* ip-masq: RFC 1918 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 240.0.0.0/4 /* ip-masq: RFC 5735 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.0.2.0/24 /* ip-masq: RFC 5737 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 198.51.100.0/24 /* ip-masq: RFC 5737 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 203.0.113.0/24 /* ip-masq: RFC 5737 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 100.64.0.0/10 /* ip-masq: RFC 6598 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 198.18.0.0/15 /* ip-masq: RFC 6815 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.0.0.0/24 /* ip-masq: RFC 6890 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.88.99.0/24 /* ip-masq: RFC 7526 reserved range is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq: outbound traffic is subject to MASQUERADE (must be last in chain) */
3. Deploy test application
I've used the Hello application from the GKE docs and deployed it on both clusters. In addition, I have also deployed an ubuntu image for tests.
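A minimal sketch of the ubuntu test pod (the exact flags are an assumption; any image with a shell will do):
kubectl run ubuntu --image=ubuntu --restart=Never -- sleep infinity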
4. Apply proper configuration for IPMasquerade
This config needs to be applied on the source cluster.
In short, if the destination CIDR is in nonMasqueradeCIDRs:, traffic keeps the pod's internal IP as its source; otherwise the source is rewritten to the NodeIP.
Save the text below to a file named config:
nonMasqueradeCIDRs:
- 10.0.0.0/8
resyncInterval: 2s
masqLinkLocal: true
Create the IPMasquerade ConfigMap:
$ kubectl create configmap ip-masq-agent --from-file config --namespace kube-system
It will overwrite the iptables configuration:
$ sudo iptables -t nat -L IP-MASQ
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
5. Tests:
When IP is Masqueraded
SSH to a node from the Tier2 cluster and run:
sudo toolbox bash
apt-get update
apt install -y tcpdump
Now you should listen using the command below. Port 32502 is the NodePort of the service from the Tier 2 cluster:
tcpdump -i eth0 -nn -s0 -v port 32502
In cluster Tier1 you need to enter the ubuntu pod and curl NodeIP:NodePort.
$ kubectl exec -ti ubuntu -- bin/bash
You will need to install curl: apt-get install curl.
Then curl NodeIP:NodePort (the node which is listening, with the NodePort of the service from cluster Tier 2).
CLI:
root@ubuntu:/# curl 172.16.4.3:32502
Hello, world!
Version: 2.0.0
Hostname: hello-world-deployment-7f67f479f5-h4wdm
On the node you can see an entry like:
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:53:30.321641 IP (tos 0x0, ttl 63, id 25373, offset 0, flags [DF], proto TCP (6), length 60)
10.0.4.4.56018 > 172.16.4.3.32502: Flags [S], cksum 0x8648 (correct), seq 3001889856
10.0.4.4 is the NodeIP of the node where the Ubuntu pod is located.
When IP was not Masqueraded
Remove the ConfigMap from cluster Tier 1:
$ kubectl delete cm ip-masq-agent -n kube-system
Change the CIDR in the config file to 172.16.4.0/22, which is the Tier 2 node pool, and reapply the CM:
$ kubectl create configmap ip-masq-agent --from-file config --namespace kube-system
SSH to any node from Tier 1 to check whether the iptables rules were changed:
sudo iptables -t nat -L IP-MASQ
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 172.16.4.0/22 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
Now for the test I again used the Ubuntu pod and curled the same IP as before.
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:16:50.316234 IP (tos 0x0, ttl 63, id 53160, offset 0, flags [DF], proto TCP (6), length 60)
10.4.2.8.57876 > 172.16.4.3.32502
10.4.2.8 is the internal IP of the Ubuntu pod.
Configuration for Tests:
TIER1
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hello-world-deployment-7f67f479f5-b2qqz 1/1 Running 0 15m 10.4.1.8 gke-tier-1-cluster-default-pool-e006097b-5tnj <none> <none>
pod/hello-world-deployment-7f67f479f5-shqrt 1/1 Running 0 15m 10.4.2.5 gke-tier-1-cluster-default-pool-e006097b-lfvh <none> <none>
pod/hello-world-deployment-7f67f479f5-x7jvr 1/1 Running 0 15m 10.4.0.8 gke-tier-1-cluster-default-pool-e006097b-1wbf <none> <none>
ubuntu 1/1 Running 0 91s 10.4.2.8 gke-tier-1-cluster-default-pool-e006097b-lfvh <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hello-world NodePort 10.0.36.46 <none> 60000:31694/TCP 14m department=world,greeting=hello
service/kubernetes ClusterIP 10.0.32.1 <none> 443/TCP 115m <none>
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/gke-tier-1-cluster-default-pool-e006097b-1wbf Ready <none> 115m v1.14.10-gke.36 10.0.4.2 35.184.38.21 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-1-cluster-default-pool-e006097b-5tnj Ready <none> 115m v1.14.10-gke.36 10.0.4.3 35.184.207.20 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-1-cluster-default-pool-e006097b-lfvh Ready <none> 115m v1.14.10-gke.36 10.0.4.4 35.226.105.31 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
TIER2
$ kubectl get pods,svc,nodes -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hello-world-deployment-7f67f479f5-92zvk 1/1 Running 0 12m 172.20.1.5 gke-tier-2-cluster-default-pool-57b1cc66-xqt5 <none> <none>
pod/hello-world-deployment-7f67f479f5-h4wdm 1/1 Running 0 12m 172.20.1.6 gke-tier-2-cluster-default-pool-57b1cc66-xqt5 <none> <none>
pod/hello-world-deployment-7f67f479f5-m85jn 1/1 Running 0 12m 172.20.1.7 gke-tier-2-cluster-default-pool-57b1cc66-xqt5 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hello-world NodePort 172.16.24.206 <none> 60000:32502/TCP 12m department=world,greeting=hello
service/kubernetes ClusterIP 172.16.16.1 <none> 443/TCP 113m <none>
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/gke-tier-2-cluster-default-pool-57b1cc66-84ng Ready <none> 112m v1.14.10-gke.36 172.16.4.2 35.184.118.151 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-2-cluster-default-pool-57b1cc66-mlmn Ready <none> 112m v1.14.10-gke.36 172.16.4.3 35.238.231.160 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-2-cluster-default-pool-57b1cc66-xqt5 Ready <none> 112m v1.14.10-gke.36 172.16.4.4 35.202.94.194 Container-Optimized OS from Google 4.14.138+ docker://18.9.7

Unable to access nginx pod across nodes using ClusterIP

I have created an nginx deployment and an nginx service (ClusterIP) to access the nginx pod, but I'm not able to access the pod through the cluster IP from nodes other than the one where the pod is scheduled.
I looked at iptables too, but there is no DNAT entry there.
root@kdm-master-1:~# k get all -A -o wide |grep nginx
default pod/nginx-6db489d4b7-pfkm9 1/1 Running 0 3h16m 10.244.1.3 kdm-worker-1 <none> <none>
default service/nginx ClusterIP 10.102.239.131 <none> 80/TCP 3h20m run=nginx
default deployment.apps/nginx 1/1 1 1 3h32m nginx nginx run=nginx
default replicaset.apps/nginx-6db489d4b7 1 1 1 3h32m nginx nginx pod-template-hash=6db489d4b7,run=nginx
iptables:
root@kdm-master-1:~# iptables -L -t nat|grep nginx
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.102.239.131 /* default/nginx:80-80 cluster IP */ tcp dpt:http
KUBE-SVC-OVTWZ4GROBJZO4C5 tcp -- anywhere 10.102.239.131 /* default/nginx:80-80 cluster IP */ tcp dpt:http
# Warning: iptables-legacy tables present, use iptables-legacy to see them
Please advise how I can resolve it.
Set net.ipv4.ip_forward=1 in /etc/sysctl.conf and
run sysctl --system.
This will resolve the issue, and one will be able to access the pod from any node.
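A minimal sketch, run on every node:
# enable IPv4 forwarding persistently
echo 'net.ipv4.ip_forward=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl --system
# verify that it took effect (should print 1)
sysctl -n net.ipv4.ip_forward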

Service won't allow traffic to pods after re-creating deployments/pods on k8s centos7 baremetal cluster: No route to host

Created a bare metal multiple master cluster 1.18 release:
NAME STATUS ROLES AGE VERSION
master1 Ready master 9d v1.18.0
master2 Ready master 9d v1.18.0
master3 Ready master 9d v1.18.0
worker1 Ready <none> 9d v1.18.0
worker2 Ready <none> 9d v1.18.0
worker3 Ready <none> 9d v1.18.0
kubectl create deploy la --image=linuxacademycontent/kubeserve:v1
kubectl scale deploy la --replicas=6
kubectl get pod
curlpod 1/1 Running 0 55m
la-65776bc88f-4kp95 1/1 Running 0 42m
la-65776bc88f-7lb9f 1/1 Running 0 42m
la-65776bc88f-dgzsz 1/1 Running 0 42m
la-65776bc88f-l2hs2 1/1 Running 0 43m
la-65776bc88f-vkcts 1/1 Running 0 42m
la-65776bc88f-wkm2k 1/1 Running 0 42m
kubectl expose deploy la --port=80 --target-port=80
kubectl get svc
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 75m
la ClusterIP 10.103.199.187 <none> 80/TCP 46m
kubectl get ep
kubernetes 172.31.16.67:6443 78m
la 10.244.1.6:80,10.244.1.7:80,10.244.2.5:80 + 3 more... 49m
I can verify all is working by curling from a testpod - curlpod
kubectl exec -it curlpod bash
This is v1 pod la-65776bc88f-dgzsz
The problem is that if I now delete the deploy (or delete all the pods) and recreate them, the service won't allow traffic to the new pods.
Here is how to reproduce the problem.
kubectl delete deploy la
curlpod 1/1 Running 0 71m
la-65776bc88f-4kp95 1/1 Terminating 0 58m
la-65776bc88f-7lb9f 1/1 Terminating 0 58m
la-65776bc88f-dgzsz 1/1 Terminating 0 58m
la-65776bc88f-l2hs2 1/1 Terminating 0 59m
la-65776bc88f-vkcts 1/1 Terminating 0 58m
la-65776bc88f-wkm2k 1/1 Terminating 0 58m
kubectl create deploy la --image=linuxacademycontent/kubeserve:v1
kubectl scale deploy la --replicas=6
kubectl get pod
NAME READY STATUS RESTARTS AGE
curlpod 1/1 Running 0 73m
la-65776bc88f-5q7d8 1/1 Running 0 6s
la-65776bc88f-b69z2 1/1 Running 0 10s
la-65776bc88f-j2t4t 1/1 Running 0 6s
la-65776bc88f-ndxs5 1/1 Running 0 6s
la-65776bc88f-ns2gt 1/1 Running 0 6s
la-65776bc88f-vjp8x 1/1 Running 0 6s
All looks fine. However, the service won't allow any traffic to the new pods:
kubectl exec -it curlpod bash
root@curlpod:/app# curl la
curl: (7) Failed to connect to la port 80: No route to host
root@curlpod:/app# curl 10.103.199.187 #name resolution seems fine
curl: (7) Failed to connect to la port 80: No route to host
I can still reach the pods individually, e.g. la-65776bc88f-5q7d8 has IP 10.244.5.13:
root@test-pod:/app# curl 10.244.5.13
This is v1 pod la-65776bc88f-5q7d8
So I thought it was the service that wouldn't route the traffic, but I can see the service hasn't changed and all the new pods are listed as endpoints for the service.
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 93m
la ClusterIP 10.103.199.187 <none> 80/TCP 64m
kubectl get ep
NAME ENDPOINTS AGE
kubernetes 172.31.16.67:6443 93m
la 10.244.5.13,10.244.1.9:80,10.244.2.7:80 + 3 more... 64m
Somewhere, something is blocking the traffic from the service to the new pods.
Now, if we recreate the svc, the problem goes away!
kubectl delete svc la
service "la" deleted
kubectl expose deploy la --port=80 --target-port=80
service/la exposed
kubectl get svc
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 108m
la ClusterIP 10.96.124.200 <none> 80/TCP 71s
kubectl exec -it curlpod bash
This is v1 pod la-65776bc88f-z24qn
I tried a few alternatives, such as using YAML to create the deployment and service: same problem.
I also tried using kubectl scale to scale down and up. But whenever I create new pods in the deployment, the traffic from the service to the pods is always blocked.
Any help or suggestion welcome!
Flannel log added:
kubectl logs --namespace kube-system kube-flannel-ds-amd64-tbglf
I0423 15:15:52.311421 1 main.go:518] Determining IP address of default interface
I0423 15:15:52.312644 1 main.go:531] Using interface with name eth0 and address 10.30.193.13
I0423 15:15:52.312669 1 main.go:548] Defaulting external address to interface address (10.30.193.13)
W0423 15:15:52.312689 1 client_config.go:517] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0423 15:15:52.323517 1 kube.go:119] Waiting 10m0s for node controller to sync
I0423 15:15:52.324491 1 kube.go:306] Starting kube subnet manager
I0423 15:15:53.323919 1 kube.go:126] Node controller sync successful
I0423 15:15:53.323976 1 main.go:246] Created subnet manager: Kubernetes Subnet Manager - uk02sybk8smp3
I0423 15:15:53.323983 1 main.go:249] Installing signal handlers
I0423 15:15:53.324084 1 main.go:390] Found network config - Backend type: vxlan
I0423 15:15:53.324176 1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I0423 15:15:53.365832 1 main.go:355] Current network or subnet (10.244.0.0/16, 10.244.2.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0423 15:15:53.367815 1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0423 15:15:53.409089 1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0423 15:15:53.410118 1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0423 15:15:53.411017 1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j MASQUERADE --random-fully
I0423 15:15:53.412076 1 main.go:305] Setting up masking rules
I0423 15:15:53.412916 1 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0423 15:15:53.412999 1 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0423 15:15:53.413009 1 main.go:325] Running backend.
I0423 15:15:53.413017 1 main.go:343] Waiting for all goroutines to exit
I0423 15:15:53.413038 1 vxlan_network.go:60] watching for new subnet leases
I0423 15:15:53.510136 1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0423 15:15:53.510158 1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0423 15:15:53.510293 1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0423 15:15:53.510301 1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.511096 1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0423 15:15:53.512091 1 iptables.go:167] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.2.0/24 -j RETURN
I0423 15:15:53.512206 1 iptables.go:167] Deleting iptables rule: -d 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.512998 1 iptables.go:167] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully
I0423 15:15:53.513970 1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0423 15:15:53.609595 1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.612312 1 iptables.go:155] Adding iptables rule: -d 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.613132 1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0423 15:15:53.710527 1 iptables.go:155] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.2.0/24 -j RETURN
I0423 15:15:53.712938 1 iptables.go:155] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully
kube-proxy is flooded with
E0504 08:25:36.251542 1 proxier.go:1533] Failed to sync endpoint for service: 10.30.218.46:30080/TCP, err: parseIP Error ip=[10 244 5 3 0 0 0 0 0 0 0 0 0 0 0 0]
E0504 08:25:36.251632 1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[10 244 5 3 0 0 0 0 0 0 0 0 0 0 0 0]

Kubernetes pods cannot communicate outbound

I have installed Kubernetes v1.13.10 on a group of VMs running CentOS 7. When I deploy pods, they can connect to one another but cannot connect to anything outside of the cluster. The CoreDNS pods have these errors in the log:
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. A: unreachable backend: read udp 172.21.0.33:48105->10.20.10.52:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. AAAA: unreachable backend: read udp 172.21.0.33:49098->10.20.10.51:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. AAAA: unreachable backend: read udp 172.21.0.33:53113->10.20.10.51:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. A: unreachable backend: read udp 172.21.0.33:39648->10.20.10.51:53: i/o timeout
The IPs 10.20.10.51 and 10.20.10.52 are the internal DNS servers and are reachable from the nodes. I did a Wireshark capture from the DNS servers, and I see the traffic is coming in from the CoreDNS pod IP address 172.21.0.33. There would be no route for the DNS servers to get back to that IP as it isn't routable outside of the Kubernetes cluster.
My understanding is that an iptables rule should be in place to NAT the pod IPs to the address of the node when a pod tries to communicate outbound (correct?). Below is the POSTROUTING chain in iptables:
[root@kube-aci-1 ~]# iptables -t nat -L POSTROUTING -v --line-number
Chain POSTROUTING (policy ACCEPT 23 packets, 2324 bytes)
num pkts bytes target prot opt in out source destination
1 1990 166K KUBE-POSTROUTING all -- any any anywhere anywhere /* kubernetes postrouting rules */
2 0 0 MASQUERADE all -- any ens192.152 172.21.0.0/16 anywhere
Line 1 was added by kube-proxy; line 2 is a rule I added manually to try to NAT anything coming from the pod subnet 172.21.0.0/16 out of the node interface ens192.152, but that didn't work.
Here are the kube-proxy logs:
[root@kube-aci-1 ~]# kubectl logs kube-proxy-llq22 -n kube-system
W1117 16:31:59.225870 1 proxier.go:498] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.232006 1 proxier.go:498] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.233727 1 proxier.go:498] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.235700 1 proxier.go:498] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.255278 1 server_others.go:296] Flag proxy-mode="" unknown, assuming iptables proxy
I1117 16:31:59.289360 1 server_others.go:148] Using iptables Proxier.
I1117 16:31:59.296021 1 server_others.go:178] Tearing down inactive rules.
I1117 16:31:59.324352 1 server.go:484] Version: v1.13.10
I1117 16:31:59.335846 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1117 16:31:59.336443 1 config.go:102] Starting endpoints config controller
I1117 16:31:59.336466 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1117 16:31:59.336493 1 config.go:202] Starting service config controller
I1117 16:31:59.336499 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1117 16:31:59.436617 1 controller_utils.go:1034] Caches are synced for service config controller
I1117 16:31:59.436739 1 controller_utils.go:1034] Caches are synced for endpoints config controller
I have tried flushing the iptables nat table as well as restarting kube-proxy on all nodes, but the problem persists. Any clues in the output above, or thoughts on further troubleshooting?
Output of kubectl get nodes:
[root@kube-aci-1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-aci-1 Ready master 85d v1.13.10 10.10.52.217 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
kube-aci-2 Ready <none> 85d v1.13.10 10.10.52.218 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
It turns out it is necessary to use a pod subnet that is routable on the surrounding network with the CNI in use, if outbound communication from pods is required. I made the subnet routable on the external network, and the pods can now communicate outbound.
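For illustration only (the addresses are assumptions taken from the outputs above), "routable" means the upstream network has a return path for the pod CIDR, for example a static route on the router or on the DNS servers pointing a node's pod block back at that node:
# hypothetical: send return traffic for a pod block back via the node hosting it
ip route add 172.21.0.0/16 via 10.10.52.217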

Why doesn't kube-proxy route traffic to another worker node?

I've deployed several different services and always get the same error.
The service is reachable on the node port from the machine where the pod is running. On the two other nodes I get timeouts.
kube-proxy is running on all worker nodes, and I can see in the kube-proxy log files that the service port was added and the node port was opened.
In this case I've deployed the stars demo from Calico.
Kube-proxy log output:
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.229458 659 service.go:309] Adding new service port "management-ui/management-ui:" at 10.32.0.133:9001/TCP
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.257483 659 proxier.go:1427] Opened local port "nodePort for management-ui/management-ui:" (:30002/tcp)
kube-proxy is listening on port 30002:
root@kuben1:/tmp# netstat -lanp | grep 30002
tcp6 0 0 :::30002 :::* LISTEN 659/kube-proxy
There are also some iptables rules defined:
root@kuben1:/tmp# iptables -L -t nat | grep management-ui
KUBE-MARK-MASQ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-MARK-MASQ tcp -- !10.200.0.0/16 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
The interesting part is that I can reach the service IP from any worker node:
root@kubem1:/tmp# kubectl get svc -n management-ui
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
management-ui NodePort 10.32.0.133 <none> 9001:30002/TCP 52m
The service IP/port can be accessed from any worker node if I do a "curl http://10.32.0.133:9001"
I don't understand why kube-proxy does not "route" this properly...
Does anyone have a hint where I can find the error?
Here are some cluster specs:
This is a hand-built cluster inspired by Kelsey Hightower's "Kubernetes the hard way" guide.
6 nodes (3 masters, 3 workers), local VMs
OS: Ubuntu 18.04
K8s: v1.13.0
Docker: 18.9.3
CNI: Calico
Component status on the master nodes looks okay:
root@kubem1:/tmp# kubectl get componentstatus
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
The worker nodes look okay, if I trust kubectl:
root@kubem1:/tmp# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kuben1 Ready <none> 39d v1.13.0 192.168.178.77 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben2 Ready <none> 39d v1.13.0 192.168.178.78 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben3 Ready <none> 39d v1.13.0 192.168.178.79 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
As asked by P Ekambaram:
root@kubem1:/tmp# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-bgjdg 1/1 Running 5 40d
calico-node-nwkqw 1/1 Running 5 40d
calico-node-vrwn4 1/1 Running 5 40d
coredns-69cbb76ff8-fpssw 1/1 Running 5 40d
coredns-69cbb76ff8-tm6r8 1/1 Running 5 40d
kubernetes-dashboard-57df4db6b-2xrmb 1/1 Running 5 40d
I've found a solution for my "Problem".
This behavior was caused by a change in Docker v1.13.x, and the issue was fixed in Kubernetes 1.8.
The easy solution was to change the forward rules via iptables.
Run the following cmd on all worker nodes: "iptables -A FORWARD -j ACCEPT"
To fix it the right way, I had to tell kube-proxy the CIDR for the pods.
In theory, that can be done in two ways (a sketch of the second follows the list):
Add "--cluster-cidr=10.0.0.0/16" as an argument to the kube-proxy command line (in my case in the systemd service file)
Add 'clusterCIDR: "10.0.0.0/16"' to the configuration file for kube-proxy
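A minimal sketch of the second option, assuming kube-proxy reads a KubeProxyConfiguration file (the path and the surrounding fields vary per installation):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clusterCIDR: "10.0.0.0/16"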
In my case the command line argument didn't have any effect.
Once I added the line to my kube-proxy configuration file and restarted kube-proxy on all worker nodes, everything worked well.
Here is the GitHub merge request for this "FORWARD" issue: link