I am following the document below to create the service. The pod is running on both nodes, but when I make a request using the master node IP, I get no response.
curl http://192.168.15.101:30534
https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
# kubectl cluster-info
Kubernetes master is running at https://192.168.15.101:6443
KubeDNS is running at https://192.168.15.101:6443/api/v1/proxy/namespaces/kube-system/services/kube-dns
Here are my routes:
# ip r
default via 192.168.15.1 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.0.0/16 dev flannel.1
10.244.0.0/14 dev docker0 proto kernel scope link src 10.244.0.1
169.254.0.0/16 dev eth1 scope link metric 1003
192.168.15.0/24 dev eth1 proto kernel scope link src 192.168.15.101
kubectl get pods --selector="run=load-balancer-example" --output=wide
NAME READY STATUS RESTARTS AGE IP NODE
hello-world-3272482377-225z8 1/1 Running 1 18h 10.244.1.81 node-01
hello-world-3272482377-f6qqd 1/1 Running 1 18h 10.244.2.78 node-02
I checked the iptables rules; NAT is set for the cluster IP.
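For reference, these are the checks I can run on the node (the chain names assume kube-proxy in iptables mode, and 8080 is the container port used in the linked tutorial):
# iptables -t nat -L KUBE-NODEPORTS -n | grep 30534
# iptables -t nat -L KUBE-SERVICES -n
# curl http://10.244.1.81:8080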
How can I troubleshoot this connection issue?
Thanks
-SR
Related
I have a master and two worker nodes. I started with a single node and later added the two worker nodes using kubeadm; I can see the worker nodes when I use kubectl get nodes.
But the cluster loses internet connectivity soon after the workers join.
I found this error when I tried to deploy an nginx web server to a worker.
I have installed Calico as described here.
But kubectl get pods --all-namespaces does not show a CoreDNS pod.
When I run route -n on a worker node:
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.100.1 0.0.0.0 UG 100 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.100.0 0.0.0.0 255.255.255.0 U 0 0 0 ens32
192.168.100.1 192.168.100.222 255.255.255.255 UGH 0 0 0 tunl0
192.168.100.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens32
Here 192.168.100.1 is my network gateway server and 192.168.100.222 is my master node.
All 3 nodes are running Ubuntu 18.04 LTS. As I described earlier, I had a single-node cluster and then added two worker nodes using kubeadm join.
I am really new to Kubernetes. What am I missing here?
I have created two separate GKE clusters on K8s 1.14.10.
VPN access to in-house network not working after GKE cluster upgrade to 1.14.6
I have followed this and the IP masquerading agent documentation.
I have tried to test this using a client pod and server pod to exchange messages.
I'm using the internal node IP to send messages and have created a ClusterIP service to expose the pods.
I have allowed requests to every instance in the firewall rules for ingress and egress, i.e. 0.0.0.0/0.
[Image: description of the cluster I have created]
The config map of the IP masquerading agent stays the same as in the documentation.
I'm able to ping the other node from within the pod, but a curl request gives connection refused and tcpdump shows no data.
Problem:
I need to communicate from cluster A to cluster B in GKE 1.14 with IP masquerading set to true. I either get connection refused or an i/o timeout. I have tried using internal and external node IPs as well as a LoadBalancer.
You have provided quite general information, and without details I cannot give an answer for your specific scenario. It might be related to how you created the clusters or to other firewall settings. Because of that, I will describe the correct steps to create and configure 2 clusters with firewall rules and masquerading. Maybe you will be able to find which step you missed or misconfigured.
The cluster configuration (nodes, pods, svc) is at the bottom of the answer.
1. Create VPC and 2 clusters
The docs describe 2 different projects, but you can do it in one project.
A good example of creating the VPC and the 2 clusters can be found in the GKE docs: Create VPC and Create 2 clusters. In the Tier1 cluster you can enable NetworkPolicy now instead of enabling it later.
After that you will need to create Firewall Rules. You will also need to allow the ICMP protocol in the firewall rule.
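As an illustration, a rule on the Tier 1 side allowing ICMP/TCP/UDP from the Tier 2 ranges could look roughly like this (the rule name, network name and source ranges are examples and have to match your own VPC and CIDRs):
$ gcloud compute firewall-rules create allow-from-tier-2 --network my-vpc --allow tcp,udp,icmp --source-ranges 172.16.4.0/22,172.20.0.0/14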
At this point you should be able to ping between the nodes of the 2 clusters.
For additional firewall rules (allowing connections between pods, svc, etc.) please check these docs.
2. Enable IP masquerade agent
As mentioned in the docs, for the ip-masq-agent to run:
The ip-masq-agent DaemonSet is automatically installed as an add-on with --nomasq-all-reserved-ranges argument in a GKE cluster, if one or more of the following is true:
The cluster has a network policy.
OR
The Pod's CIDR range is not within 10.0.0.0/8.
This means that the tier-2-cluster already has ip-masq-agent in the kube-system namespace (because the Pod CIDR range is not within 10.0.0.0/8). And if you enabled NetworkPolicy during the creation of the tier-1-cluster, it should also have been installed. If not, you will need to enable it using this command:
$ gcloud container clusters update tier-1-cluster --update-addons=NetworkPolicy=ENABLED --zone=us-central1-a
To verify that everything is OK, check that the ip-masq-agent DaemonSet pods were created (one pod per node).
$ kubectl get ds ip-masq-agent -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ip-masq-agent 3 3 3 3 3 beta.kubernetes.io/masq-agent-ds-ready=true 168m
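To list the individual pods (one per node), something like this should work (the label selector is an assumption; check the DaemonSet's labels if it does not match):
$ kubectl get pods -n kube-system -l k8s-app=ip-masq-agent -o wide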
If you SSH to any of your nodes, you will be able to see the default iptables entries.
$ sudo iptables -t nat -L IP-MASQ
Chain IP-MASQ (1 references)
target prot opt source destination
RETURN all -- anywhere 169.254.0.0/16 /* ip-masq: local traffic is not subject to MASQUERADE */
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq: RFC 1918 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 172.16.0.0/12 /* ip-masq: RFC 1918 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.168.0.0/16 /* ip-masq: RFC 1918 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 240.0.0.0/4 /* ip-masq: RFC 5735 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.0.2.0/24 /* ip-masq: RFC 5737 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 198.51.100.0/24 /* ip-masq: RFC 5737 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 203.0.113.0/24 /* ip-masq: RFC 5737 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 100.64.0.0/10 /* ip-masq: RFC 6598 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 198.18.0.0/15 /* ip-masq: RFC 6815 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.0.0.0/24 /* ip-masq: RFC 6890 reserved range is not subject to MASQUERADE */
RETURN all -- anywhere 192.88.99.0/24 /* ip-masq: RFC 7526 reserved range is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq: outbound traffic is subject to MASQUERADE (must be last in chain) */
3. Deploy test application
I've used the Hello application from the GKE docs and deployed it on both clusters. In addition, I also deployed an ubuntu image for tests.
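For reference, the test workloads can be created roughly like this (the image tag and the plain sleep pod are my assumptions; the actual manifests in the GKE docs also set the department/greeting labels used by the Service selector shown in the configuration at the bottom):
$ kubectl create deployment hello-world-deployment --image=gcr.io/google-samples/hello-app:2.0
$ kubectl expose deployment hello-world-deployment --name hello-world --type NodePort --port 60000 --target-port 8080
$ kubectl run ubuntu --image=ubuntu --restart=Never -- sleep 3600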
4. Apply proper configuration for IPMasquerade
This config needs to be applied on the source cluster.
In short, if the destination CIDR is in nonMasqueradeCIDRs:, traffic keeps the internal (pod) IP as its source; otherwise the node IP is shown as the source.
Save the text below to a file named config:
nonMasqueradeCIDRs:
- 10.0.0.0/8
resyncInterval: 2s
masqLinkLocal: true
Create the IP masquerade ConfigMap:
$ kubectl create configmap ip-masq-agent --from-file config --namespace kube-system
It will overwrite the iptables configuration:
$ sudo iptables -t nat -L IP-MASQ
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 10.0.0.0/8 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
5. Tests:
When IP is Masqueraded
SSH to a node from the Tier 2 cluster and run:
sudo toolbox bash
apt-get update
apt install -y tcpdump
Now you should listen using the command below. Port 32502 is the NodePort of the service from the Tier 2 cluster:
tcpdump -i eth0 -nn -s0 -v port 32502
In cluster Tier 1 you need to enter the ubuntu pod and curl NodeIP:NodePort:
$ kubectl exec -ti ubuntu -- bin/bash
You will need to install curl: apt-get install curl.
Then curl NodeIP:NodePort (the node which is listening, and the NodePort of the service from cluster Tier 2).
CLI:
root@ubuntu:/# curl 172.16.4.3:32502
Hello, world!
Version: 2.0.0
Hostname: hello-world-deployment-7f67f479f5-h4wdm
On the node you can see an entry like:
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:53:30.321641 IP (tos 0x0, ttl 63, id 25373, offset 0, flags [DF], proto TCP (6), length 60)
10.0.4.4.56018 > 172.16.4.3.32502: Flags [S], cksum 0x8648 (correct), seq 3001889856
10.0.4.4 is the IP of the node where the Ubuntu pod is located.
When IP was not Masqueraded
Remove the ConfigMap from cluster Tier 1:
$ kubectl delete cm ip-masq-agent -n kube-system
Change the CIDR in the config file to 172.16.4.0/22, which is the Tier 2 node pool, and reapply the ConfigMap.
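The config file then contains only the Tier 2 range (the other keys stay the same):
nonMasqueradeCIDRs:
- 172.16.4.0/22
resyncInterval: 2s
masqLinkLocal: true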
$ kubectl create configmap ip-masq-agent --from-file config --namespace kube-system
SSH to any node from Tier 1 to check that the iptables rules were changed:
sudo iptables -t nat -L IP-MASQ
Chain IP-MASQ (2 references)
target prot opt source destination
RETURN all -- anywhere 172.16.4.0/22 /* ip-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- anywhere anywhere /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
Now for the test I again used the Ubuntu pod and curled the same IP as before.
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:16:50.316234 IP (tos 0x0, ttl 63, id 53160, offset 0, flags [DF], proto TCP (6), length 60)
10.4.2.8.57876 > 172.16.4.3.32502
10.4.2.8 is the internal IP of the Ubuntu pod.
Configuration for Tests:
TIER1
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hello-world-deployment-7f67f479f5-b2qqz 1/1 Running 0 15m 10.4.1.8 gke-tier-1-cluster-default-pool-e006097b-5tnj <none> <none>
pod/hello-world-deployment-7f67f479f5-shqrt 1/1 Running 0 15m 10.4.2.5 gke-tier-1-cluster-default-pool-e006097b-lfvh <none> <none>
pod/hello-world-deployment-7f67f479f5-x7jvr 1/1 Running 0 15m 10.4.0.8 gke-tier-1-cluster-default-pool-e006097b-1wbf <none> <none>
ubuntu 1/1 Running 0 91s 10.4.2.8 gke-tier-1-cluster-default-pool-e006097b-lfvh <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hello-world NodePort 10.0.36.46 <none> 60000:31694/TCP 14m department=world,greeting=hello
service/kubernetes ClusterIP 10.0.32.1 <none> 443/TCP 115m <none>
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/gke-tier-1-cluster-default-pool-e006097b-1wbf Ready <none> 115m v1.14.10-gke.36 10.0.4.2 35.184.38.21 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-1-cluster-default-pool-e006097b-5tnj Ready <none> 115m v1.14.10-gke.36 10.0.4.3 35.184.207.20 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-1-cluster-default-pool-e006097b-lfvh Ready <none> 115m v1.14.10-gke.36 10.0.4.4 35.226.105.31 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
TIER2
$ kubectl get pods,svc,nodes -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hello-world-deployment-7f67f479f5-92zvk 1/1 Running 0 12m 172.20.1.5 gke-tier-2-cluster-default-pool-57b1cc66-xqt5 <none> <none>
pod/hello-world-deployment-7f67f479f5-h4wdm 1/1 Running 0 12m 172.20.1.6 gke-tier-2-cluster-default-pool-57b1cc66-xqt5 <none> <none>
pod/hello-world-deployment-7f67f479f5-m85jn 1/1 Running 0 12m 172.20.1.7 gke-tier-2-cluster-default-pool-57b1cc66-xqt5 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hello-world NodePort 172.16.24.206 <none> 60000:32502/TCP 12m department=world,greeting=hello
service/kubernetes ClusterIP 172.16.16.1 <none> 443/TCP 113m <none>
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/gke-tier-2-cluster-default-pool-57b1cc66-84ng Ready <none> 112m v1.14.10-gke.36 172.16.4.2 35.184.118.151 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-2-cluster-default-pool-57b1cc66-mlmn Ready <none> 112m v1.14.10-gke.36 172.16.4.3 35.238.231.160 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
node/gke-tier-2-cluster-default-pool-57b1cc66-xqt5 Ready <none> 112m v1.14.10-gke.36 172.16.4.4 35.202.94.194 Container-Optimized OS from Google 4.14.138+ docker://18.9.7
This is a Kubespray deployment using Calico. All the defaults were left as-is, except for the fact that there is a proxy. Kubespray ran to the end without issues.
Access to Kubernetes services started failing and after investigation, there was no route to host to the coredns service. Accessing a K8S service by IP worked. Everything else seems to be correct, so I am left with a cluster that works, but without DNS.
Here is some background information:
Starting up a busybox container:
# nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
Now the output while explicitly pointing nslookup at the CoreDNS service IP:
# nslookup kubernetes.default 10.233.0.3
;; connection timed out; no servers could be reached
Notice that telnet to the Kubernetes API works:
# telnet 10.233.0.1 443
Connected to 10.233.0.1
kube-proxy logs:
10.233.0.3 is the service IP for coredns. The last line looks concerning, even though it is INFO.
$ kubectl logs kube-proxy-45v8n -nkube-system
I1114 14:19:29.657685 1 node.go:135] Successfully retrieved node IP: X.59.172.20
I1114 14:19:29.657769 1 server_others.go:176] Using ipvs Proxier.
I1114 14:19:29.664959 1 server.go:529] Version: v1.16.0
I1114 14:19:29.665427 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I1114 14:19:29.669508 1 config.go:313] Starting service config controller
I1114 14:19:29.669566 1 shared_informer.go:197] Waiting for caches to sync for service config
I1114 14:19:29.669602 1 config.go:131] Starting endpoints config controller
I1114 14:19:29.669612 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I1114 14:19:29.769705 1 shared_informer.go:204] Caches are synced for service config
I1114 14:19:29.769756 1 shared_informer.go:204] Caches are synced for endpoints config
I1114 14:21:29.666256 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.124.23:53
I1114 14:21:29.666380 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.122.11:53
All pods are running without crashing/restarts etc. and otherwise services behave correctly.
IPVS looks correct. CoreDNS service is defined there:
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.233.0.1:443 rr
-> x.59.172.19:6443 Masq 1 0 0
-> x.59.172.20:6443 Masq 1 1 0
TCP 10.233.0.3:53 rr
-> 10.233.122.12:53 Masq 1 0 0
-> 10.233.124.24:53 Masq 1 0 0
TCP 10.233.0.3:9153 rr
-> 10.233.122.12:9153 Masq 1 0 0
-> 10.233.124.24:9153 Masq 1 0 0
TCP 10.233.51.168:3306 rr
-> x.59.172.23:6446 Masq 1 0 0
TCP 10.233.53.155:44134 rr
-> 10.233.89.20:44134 Masq 1 0 0
UDP 10.233.0.3:53 rr
-> 10.233.122.12:53 Masq 1 0 314
-> 10.233.124.24:53 Masq 1 0 312
Host routing also looks correct.
# ip r
default via x.59.172.17 dev ens3 proto dhcp src x.59.172.22 metric 100
10.233.87.0/24 via x.59.172.21 dev tunl0 proto bird onlink
blackhole 10.233.89.0/24 proto bird
10.233.89.20 dev calib88cf6925c2 scope link
10.233.89.21 dev califdffa38ed52 scope link
10.233.122.0/24 via x.59.172.19 dev tunl0 proto bird onlink
10.233.124.0/24 via x.59.172.20 dev tunl0 proto bird onlink
x.59.172.16/28 dev ens3 proto kernel scope link src x.59.172.22
x.59.172.17 dev ens3 proto dhcp scope link src x.59.172.22 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
I have redeployed this same cluster in separate environments with flannel, and with Calico using iptables instead of IPVS. I have also temporarily disabled the Docker HTTP proxy after deployment. None of this made any difference.
Also:
kube_service_addresses: 10.233.0.0/18
kube_pods_subnet: 10.233.64.0/18
(They do not overlap)
What is the next step in debugging this issue?
I highly recommend avoiding the latest busybox image when troubleshooting DNS. There are a few reported issues with DNS lookups on versions newer than 1.28.
v 1.28.4
user@node1:~$ kubectl exec -ti busybox busybox | head -1
BusyBox v1.28.4 (2018-05-22 17:00:17 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
v 1.31.1
user@node1:~$ kubectl exec -ti busyboxlatest busybox | head -1
BusyBox v1.31.1 (2019-10-28 18:40:01 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busyboxlatest -- nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
command terminated with exit code 1
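If you want to reproduce this comparison yourself, test pods with pinned image versions can be created roughly like this (the pod names are arbitrary):
$ kubectl run busybox --image=busybox:1.28 --restart=Never -- sleep 3600
$ kubectl run busyboxlatest --image=busybox:latest --restart=Never -- sleep 3600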
Going deeper and exploring more possibilities, I've reproduced your problem on GCP and after some digging I was able to figure out what is causing this communication problem.
GCE (Google Compute Engine) blocks traffic between hosts by default; we have to allow Calico traffic to flow between containers on different hosts.
According to the Calico documentation, you can do this by creating a firewall rule that allows this communication:
gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"
You can verify the rule with this command:
gcloud compute firewall-rules list
This is not present in the most recent Calico documentation, but it is still true and necessary.
Before creating firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
After creating firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
It doesn't matter whether you bootstrap your cluster using Kubespray or kubeadm; this problem will happen because Calico needs to communicate between nodes and GCE blocks that by default.
This is what worked for me. I installed my k8s cluster using Kubespray, configured with Calico as the CNI and containerd as the container runtime:
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F
[delete coredns pod]
I am using cilium in Kubernetes 1.12 in Direct Routing mode. It is working fine in IPv4 mode. We are using cilium/cilium:no-routes image and cloudnativelabs/kube-router to advertise the routes through BGP.
Now I would like to configure the same in an IPv6-only Kubernetes cluster. But I found that the kube-router pod is crashing and not creating the route entries for the --pod-network-cidr.
Following are the lab details:
Master node: IPv6 private IP fd0c:6493:12bf:2942::ac18:1164
Worker node: IPv6 private IP fd0c:6493:12bf:2942::ac18:1165
The public IPs of both nodes are IPv4, as I don't have public IPv6 addresses.
The IPv6-only K8s cluster is created as follows:
master:
sudo kubeadm init --kubernetes-version v1.13.2 --pod-network-cidr=2001:2::/64 --apiserver-advertise-address=fd0c:6493:12bf:2942::ac18:1164 --token-ttl 0
worker:
sudo kubeadm join [fd0c:6493:12bf:2942::ac18:1164]:6443 --token 9k9sdq.el298rka0sjqy0ha --discovery-token-ca-cert-hash sha256:b830c22dc21561c9e9287275ecc675ec6de012662fabde3bd1aba03be66562eb
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP
EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master NotReady master 38h v1.13.2 fd0c:6493:12bf:2942::ac18:1164
<none> Ubuntu 18.10 4.18.0-13-generic docker://18.6.0
worker1 Ready <none> 38h v1.13.2 fd0c:6493:12bf:2942::ac18:1165
<none> Ubuntu 18.10 4.18.0-10-generic docker://18.6.0
The master node is not ready as the CNI is not configured yet and the CoreDNS pods are not up yet.
Now install Cilium in IPv6 mode.
1. Run etcd on the master node.
sudo docker run -d --network=host \
--name "cilium-etcd" k8s.gcr.io/etcd:3.2.24 \
etcd -name etcd0 \
-advertise-client-urls http://[fd0c:6493:12bf:2942::ac18:1164]:4001 \
-listen-client-urls http://[fd0c:6493:12bf:2942::ac18:1164]:4001 \
-initial-advertise-peer-urls http://[fd0c:6493:12bf:2942::ac18:1164]:2382 \
-listen-peer-urls http://[fd0c:6493:12bf:2942::ac18:1164]:2382 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://[fd0c:6493:12bf:2942::ac18:1164]:2382 \
-initial-cluster-state new
Here [fd0c:6493:12bf:2942::ac18:1164] is the master node's IPv6 IP.
2. sudo mount bpffs /sys/fs/bpf -t bpf
3. Run kube-router.
Expected Result:
Kube-router adds a routing entry for the pod CIDR corresponding to each of the other nodes in the cluster, with the node's public IP set as the gateway. The following result is obtained for IPv4: a routing entry is created on node-1 for node-2 (public IP 10.40.139.196 and pod CIDR 10.244.1.0/24). The device is the interface where the public IP is bound.
$ ip route show
10.244.1.0/24 via 10.40.139.196 dev ens4f0.116 proto 17
Note: For IPv6 only Kubernetes, --pod-network-cidr=2001:2::/64
Actual result:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-86c58d9df4-g7nvf 0/1 ContainerCreating 0 22h
coredns-86c58d9df4-rrtgp 0/1 ContainerCreating 0 38h
etcd-master 1/1 Running 0 38h
kube-apiserver-master 1/1 Running 0 38h
kube-controller-manager-master 1/1 Running 0 38h
kube-proxy-9xb2c 1/1 Running 0 38h
kube-proxy-jfv2m 1/1 Running 0 38h
kube-router-5xjv4 0/1 CrashLoopBackOff 15 73m
kube-scheduler-master 1/1 Running 0 38h
Question:
Can kube-router use the private IPv6 address that is used by the Kubernetes cluster instead of the public IP, which in our case is IPv4?
So I've got a Kubernetes cluster up and running using the Kubernetes on CoreOS Manual Installation Guide.
$ kubectl get no
NAME STATUS AGE
coreos-master-1 Ready,SchedulingDisabled 1h
coreos-worker-1 Ready 54m
$ kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default curl-2421989462-h0dr7 1/1 Running 1 53m 10.2.26.4 coreos-worker-1
kube-system busybox 1/1 Running 0 55m 10.2.26.3 coreos-worker-1
kube-system kube-apiserver-coreos-master-1 1/1 Running 0 1h 192.168.0.200 coreos-master-1
kube-system kube-controller-manager-coreos-master-1 1/1 Running 0 1h 192.168.0.200 coreos-master-1
kube-system kube-proxy-coreos-master-1 1/1 Running 0 1h 192.168.0.200 coreos-master-1
kube-system kube-proxy-coreos-worker-1 1/1 Running 0 58m 192.168.0.204 coreos-worker-1
kube-system kube-scheduler-coreos-master-1 1/1 Running 0 1h 192.168.0.200 coreos-master-1
$ kubectl get svc --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes 10.3.0.1 <none> 443/TCP 1h
As per the guide, I've set up a service network 10.3.0.0/16 and a pod network 10.2.0.0/16. The pod network seems fine, as the busybox and curl containers get IPs. But the service network has problems. Originally, I encountered this when deploying kube-dns: the service IP 10.3.0.1 couldn't be reached, so kube-dns couldn't start all containers and DNS was ultimately not working.
From within the curl pod, I can reproduce the issue:
[ root@curl-2421989462-h0dr7:/ ]$ curl https://10.3.0.1
curl: (7) Failed to connect to 10.3.0.1 port 443: No route to host
[ root@curl-2421989462-h0dr7:/ ]$ ip route
default via 10.2.26.1 dev eth0
10.2.0.0/16 via 10.2.26.1 dev eth0
10.2.26.0/24 dev eth0 src 10.2.26.4
It seems OK that there's only a default route in the container. As I understood it, the request (to the default route) should be intercepted by the kube-proxy on the worker node and forwarded to the proxy on the master node, where the IP is translated via iptables to the master's public IP.
There seems to be a common problem with a bridge/netfilter sysctl setting, but that seems fine in my setup:
core@coreos-worker-1 ~ $ sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
I'm having a really hard time troubleshooting this, as I lack an understanding of what the service IP is used for, how the service network is supposed to work in terms of traffic flow, and how best to debug this.
So here are the questions I have:
What is the 1st IP of the service network (10.3.0.1 in this case) used for?
Is the above description of the traffic flow correct? If not, what steps does a request from a container take to reach a service IP?
What are the best ways to debug each step in the traffic flow? (I can't get any idea what's wrong from the logs)
Thanks!
The Service network provides fixed IPs for Services. It is not a routable network (so don't expect ip ro to show anything, nor will ping work) but a collection of iptables rules managed by kube-proxy on each node (see iptables -L; iptables -t nat -L on the nodes, not in Pods). These virtual IPs act as a load-balancing proxy for endpoints (kubectl get ep), which are usually ports of Pods (but not always) with a specific set of labels, as defined in the Service.
The first IP on the Service network is for reaching the kube-apiserver itself. It's listening on port 443 (kubectl describe svc kubernetes).
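For instance, on your setup you can see both halves of this with something like the following (run on a node, not inside a Pod; the KUBE-SERVICES chain name assumes kube-proxy in iptables mode):
$ kubectl describe svc kubernetes
$ kubectl get ep kubernetes
$ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.3.0.1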
Troubleshooting is different on each network/cluster setup. I would generally check:
Is kube-proxy running on each node? On some setups it's run via systemd and on others there is a DaemonSet that schedules a Pod on each node. On your setup it is deployed as static Pods created by the kubelets themselves from /etc/kubernetes/manifests/kube-proxy.yaml
Locate logs for kube-proxy and find clues (can you post some?)
Change kube-proxy into userspace mode. Again, the details depend on your setup. For you it's in the file I mentioned above: append --proxy-mode=userspace as a parameter on each node (a sketch follows below)
Is the overlay (pod) network functional?
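A minimal sketch of that change, assuming a hyperkube-style static-pod manifest as in the CoreOS guide (the surrounding fields and the --master value are illustrative; only the appended flag matters):
    command:
    - /hyperkube
    - proxy
    - --master=https://192.168.0.200
    - --proxy-mode=userspace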
If you leave comments I will get back to you.
I had this same problem, and the ultimate solution that worked for me was enabling IP forwarding on all nodes in the cluster, which I had neglected to do.
$ sudo sysctl net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
Service IPs and DNS started working immediately afterwards.
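To make it persist across reboots you can also drop the setting into a sysctl config file, for example (the file name is arbitrary):
$ echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf
$ sudo sysctl --system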
I had the same issue; it turned out to be a configuration issue in kube-proxy.yaml. For the "master" parameter I had the IP address, as in - --master=192.168.3.240, but it actually needs to be a URL, like - --master=https://192.168.3.240
FYI, my kube-proxy successfully uses --proxy-mode=iptables (v1.6.x)