Hi, I have an 11-node Kubernetes cluster with Cilium 1.12.1 (kubeProxyReplacement=strict), built on bare metal in our data center. Pods on 4 of the nodes (node05-node08) have issues communicating with pods or services that are not on the same node; the other 7 nodes don't have this problem. I can ping the other pods' IPs, but when I telnet to a port, the packets never seem to arrive.
All 11 nodes run the same OS version and kernel, and the cluster was deployed with Kubespray, so the software environment is as identical as I could make it. (I'm not sure whether the hardware matters, but the 4 problematic nodes have gigabit NICs while the others all have 10-gigabit NICs.)
This is the node list:
❯ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master01 Ready control-plane 39h v1.24.4 10.252.55.22 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
master02 Ready control-plane 39h v1.24.4 10.252.54.44 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
master03 Ready control-plane 39h v1.24.4 10.252.55.39 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node05 Ready <none> 39h v1.24.4 10.252.34.27 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node06 Ready <none> 39h v1.24.4 10.252.33.44 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node07 Ready <none> 39h v1.24.4 10.252.33.52 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node08 Ready <none> 39h v1.24.4 10.252.33.45 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node01 Ready <none> 39h v1.24.4 10.252.144.206 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node02 Ready <none> 39h v1.24.4 10.252.145.13 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node03 Ready <none> 39h v1.24.4 10.252.145.163 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node04 Ready <none> 39h v1.24.4 10.252.145.226 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
This is what happens in a pod on node05 when communicating with an nginx pod running on master01:
# ping works fine
bash-5.1# ping 10.233.64.103
PING 10.233.64.103 (10.233.64.103) 56(84) bytes of data.
64 bytes from 10.233.64.103: icmp_seq=1 ttl=63 time=0.214 ms
64 bytes from 10.233.64.103: icmp_seq=2 ttl=63 time=0.148 ms
--- 10.233.64.103 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1026ms
rtt min/avg/max/mdev = 0.148/0.181/0.214/0.033 ms
# curl not working
bash-5.1# curl 10.233.64.103
curl: (28) Failed to connect to 10.233.64.103 port 80 after 3069 ms: Operation timed out
# hubble observe logs (hubble observe --to-ip 10.233.64.103 -f):
Sep 6 03:15:16.100: cilium-test/testubuntu-g2gv6 (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-overlay FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:15:16.100: cilium-test/testubuntu-g2gv6 (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:15:22.026: cilium-test/testubuntu-g2gv6:33722 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: SYN)
This is what happens in a pod on node04 when communicating with the same nginx pod:
# ping works fine
bash-5.1# ping 10.233.64.103
PING 10.233.64.103 (10.233.64.103) 56(84) bytes of data.
64 bytes from 10.233.64.103: icmp_seq=1 ttl=63 time=2.33 ms
64 bytes from 10.233.64.103: icmp_seq=2 ttl=63 time=2.30 ms
# curl works fine as well
bash-5.1# curl 10.233.64.103
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
nginx.org.<br/>
Commercial support is available at
nginx.com.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
# hubble observe logs (hubble observe --to-ip 10.233.64.103 -f):
Sep 6 03:16:24.808: cilium-test/testubuntu-wcwfg (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-overlay FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:16:24.810: cilium-test/testubuntu-wcwfg (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:16:27.043: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: SYN)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: SYN)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK, PSH)
Sep 6 03:16:27.047: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.047: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Sep 6 03:16:27.048: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK, FIN)
Sep 6 03:16:27.050: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Sep 6 03:16:27.050: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.051: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK)
This is the cilium-health status, which also shows the port connectivity issue on the 4 nodes:
❯ kubectl exec -it -n kube-system ds/cilium -- cilium-health status
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)
Probe time: 2022-09-06T03:10:24Z
Nodes:
node01 (localhost):
Host connectivity to 10.252.144.206:
ICMP to stack: OK, RTT=341.295µs
HTTP to agent: OK, RTT=100.729µs
Endpoint connectivity to 10.233.67.53:
ICMP to stack: OK, RTT=334.224µs
HTTP to agent: OK, RTT=163.289µs
master01:
Host connectivity to 10.252.55.22:
ICMP to stack: OK, RTT=1.994728ms
HTTP to agent: OK, RTT=1.610932ms
Endpoint connectivity to 10.233.64.235:
ICMP to stack: OK, RTT=2.100332ms
HTTP to agent: OK, RTT=2.489126ms
master02:
Host connectivity to 10.252.54.44:
ICMP to stack: OK, RTT=2.33033ms
HTTP to agent: OK, RTT=2.34166ms
Endpoint connectivity to 10.233.65.225:
ICMP to stack: OK, RTT=2.101561ms
HTTP to agent: OK, RTT=2.067012ms
master03:
Host connectivity to 10.252.55.39:
ICMP to stack: OK, RTT=1.688641ms
HTTP to agent: OK, RTT=1.593428ms
Endpoint connectivity to 10.233.66.74:
ICMP to stack: OK, RTT=2.210915ms
HTTP to agent: OK, RTT=1.725555ms
node05:
Host connectivity to 10.252.34.27:
ICMP to stack: OK, RTT=2.383001ms
HTTP to agent: OK, RTT=2.48362ms
Endpoint connectivity to 10.233.70.87:
ICMP to stack: OK, RTT=2.194843ms
HTTP to agent: Get "http://10.233.70.87:4240/hello": dial tcp 10.233.70.87:4240: connect: connection timed out
node06:
Host connectivity to 10.252.33.44:
ICMP to stack: OK, RTT=2.091932ms
HTTP to agent: OK, RTT=1.724729ms
Endpoint connectivity to 10.233.71.119:
ICMP to stack: OK, RTT=1.984056ms
HTTP to agent: Get "http://10.233.71.119:4240/hello": dial tcp 10.233.71.119:4240: connect: connection timed out
node07:
Host connectivity to 10.252.33.52:
ICMP to stack: OK, RTT=2.055482ms
HTTP to agent: OK, RTT=2.037437ms
Endpoint connectivity to 10.233.72.47:
ICMP to stack: OK, RTT=1.853614ms
HTTP to agent: Get "http://10.233.72.47:4240/hello": dial tcp 10.233.72.47:4240: connect: connection timed out
node08:
Host connectivity to 10.252.33.45:
ICMP to stack: OK, RTT=2.461315ms
HTTP to agent: OK, RTT=2.369003ms
Endpoint connectivity to 10.233.74.247:
ICMP to stack: OK, RTT=2.097029ms
HTTP to agent: Get "http://10.233.74.247:4240/hello": dial tcp 10.233.74.247:4240: connect: connection timed out
node02:
Host connectivity to 10.252.145.13:
ICMP to stack: OK, RTT=372.787µs
HTTP to agent: OK, RTT=168.915µs
Endpoint connectivity to 10.233.73.98:
ICMP to stack: OK, RTT=360.354µs
HTTP to agent: OK, RTT=287.224µs
node03:
Host connectivity to 10.252.145.163:
ICMP to stack: OK, RTT=363.072µs
HTTP to agent: OK, RTT=216.652µs
Endpoint connectivity to 10.233.68.73:
ICMP to stack: OK, RTT=312.153µs
HTTP to agent: OK, RTT=304.981µs
node04:
Host connectivity to 10.252.145.226:
ICMP to stack: OK, RTT=375.121µs
HTTP to agent: OK, RTT=185.484µs
Endpoint connectivity to 10.233.69.140:
ICMP to stack: OK, RTT=403.752µs
HTTP to agent: OK, RTT=277.517µs
Any suggestions on where I should start troubleshooting?
Since version 1.12, Cilium changed its routing behavior significantly.
Try enabling legacy host routing.
In your helm_values.yaml (if you deploy with Helm) you should add:
bpf:
  hostLegacyRouting: true
It configures whether traffic should be routed via the host stack (true) or directly and more efficiently out of BPF (false), if the kernel supports the latter. Routing out of BPF also implies bypassing netfilter in the host namespace.
You can read more about BPF host routing in the official Cilium docs; pay attention to the compatibility of the node OS and kernel with BPF.
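If you manage Cilium with Helm directly, a minimal sketch of applying the change (the release name cilium and the kube-system namespace are assumptions based on a typical install; adjust to your setup):
$ helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set bpf.hostLegacyRouting=true        # or: -f helm_values.yaml
$ kubectl -n kube-system rollout restart ds/cilium    # agents pick up the new config after a restart
$ kubectl -n kube-system exec ds/cilium -- cilium status | grep -i "host routing"   # should now report Legacy (if your version prints this line)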
Related
I implemented this on Oracle VM as per the document below, but failover didn't work.
https://github.com/justmeandopensource/kubernetes/tree/master/kubeadm-ha-multi-master
Role FQDN IP OS RAM CPU
Load Balancer loadbalancer.example.com 172.16.16.100 Ubuntu 20.04 1G 1
Master kmaster1.example.com 172.16.16.101 Ubuntu 20.04 2G 2
Master kmaster2.example.com 172.16.16.102 Ubuntu 20.04 2G 2
Worker kworker1.example.com 172.16.16.201 Ubuntu 20.04 1G 1
Below are the details before and after shutting down kmaster1.
root@kmaster2:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kmaster1 Ready master 22h v1.19.2
kmaster2 Ready master 22h v1.19.2
kworker1 Ready <none> 22h v1.19.2
===> shutdown now on kmaster1 ========
root@kmaster2:~# kubectl get nodes
Error from server: etcdserver: request timed out
root@kmaster2:~# kubectl get nodes
Error from server: etcdserver: request timed out
root@kmaster2:~# ping 172.16.16.100
PING 172.16.16.100 (172.16.16.100) 56(84) bytes of data.
64 bytes from 172.16.16.100: icmp_seq=1 ttl=64 time=0.580 ms
64 bytes from 172.16.16.100: icmp_seq=2 ttl=64 time=0.716 ms
64 bytes from 172.16.16.100: icmp_seq=3 ttl=64 time=1.08 ms
^C
--- 172.16.16.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.580/0.792/1.081/0.211 ms
root@kmaster2:~# kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
root@kmaster2:~# kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
I would like to access my application via localhost with the kubectl port-forward command. But when I run kubectl port-forward road-dashboard-dev-5cdc465475-jwwgz 8082:8080 I receive the error below.
> Forwarding from 127.0.0.1:8082 -> 8080
> Forwarding from [::1]:8082 -> 8080
> Handling connection for 8082
> Handling connection for 8082
> E0124 14:15:27.173395 4376 portforward.go:400] an error occurred forwarding 8082 -> 8080: error forwarding port 8080 to pod 09a76f6936b313e438bbf5a84bd886b3b3db8f499b5081b66cddc390021556d5, uid : exit status 1: 2020/01/24 11:15:27 socat[9064] E connect(6, AF=2 127.0.0.1:8080, 16): Connection refused
I also tried to connect to the pod in the cluster via exec -it, but that did not work either. What might I be missing?
node#road-dashboard-dev-5cdc465475-jwwgz:/usr/src/app$ curl -v localhost:8080
* Rebuilt URL to: localhost:8080/
* Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 8080 failed: Connection refused
* Failed to connect to localhost port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 8080: Connection refused
The kubectl get all output is below. I am sure that the container port value is set to 8080.
NAME READY STATUS RESTARTS AGE
pod/road-dashboard-dev-5cdc465475-jwwgz 1/1 Running 0 34m
pod/road-dashboard-dev-5cdc465475-rdk7g 1/1 Running 0 34m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/road-dashboard-dev NodePort 10.254.61.225 <none> 80:41599/TCP 18h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/road-dashboard-dev 2/2 2 2 18h
NAME DESIRED CURRENT READY AGE
replicaset.apps/road-dashboard-dev-5cdc465475 2 2 2 34m
Name: road-dashboard-dev-5cdc465475-jwwgz
Namespace: dev
Priority: 0
PriorityClassName: <none>
Node: c123
Containers:
road-dashboard:
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 24 Jan 2020 13:42:39 +0300
Ready: True
Restart Count: 0
Environment: <none>
To debug your issue, leave the port-forward command running in the foreground, curl from a second terminal, and watch what output you get at the port-forward prompt.
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx 1/1 Running 2 112m 10.244.3.43 k8s-node-3 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 11d <none>
service/nginx NodePort 10.96.130.207 <none> 80:31316/TCP 20m run=nginx
Example:
$ kubectl port-forward nginx 31000:80
Forwarding from 127.0.0.1:31000 -> 80
Forwarding from [::1]:31000 -> 80
From a second terminal window, curl the forwarded port:
$ curl localhost:31000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
nginx.org.<br/>
Commercial support is available at
nginx.com.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
You should see, on the first terminal, that the port-forward prompt reports it is handling a connection, like below (note the new line "Handling connection for 31000"):
$ kubectl port-forward nginx 31000:80
Forwarding from 127.0.0.1:31000 -> 80
Forwarding from [::1]:31000 -> 80
Handling connection for 31000
So if I have a wrong port forwarding like below (note I have mapped port 8080 for the nginx container, which exposes port 80):
$ kubectl port-forward nginx 31000:8080
Forwarding from 127.0.0.1:31000 -> 8080
Forwarding from [::1]:31000 -> 8080
The curl results in a clear error at the port-forward prompt, indicating the connection was refused by the container when connecting to port 8080, since that is not the correct port, and we get an empty reply back.
$ curl -v localhost:31000
* Rebuilt URL to: localhost:31000/
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 31000 (#0)
> GET / HTTP/1.1
> Host: localhost:31000
> User-Agent: curl/7.47.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
$ kubectl port-forward nginx 31000:8080
Forwarding from 127.0.0.1:31000 -> 8080
Forwarding from [::1]:31000 -> 8080
Handling connection for 31000
E0124 11:35:53.390711 10791 portforward.go:400] an error occurred forwarding 31000 -> 8080: error forwarding port 8080 to pod 88e4de4aba522b0beff95c3b632eca654a5c34b0216320a29247bb8574ef0f6b, uid : exit status 1: 2020/01/24 11:35:57 socat[15334] E connect(5, AF=2 127.0.0.1:8080, 16): Connection refused
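Since the refusal in your logs comes from inside the pod (socat gets "Connection refused" on 127.0.0.1:8080), it is worth double-checking which port the application actually listens on. A minimal sketch, assuming the image ships ss or netstat (pod name and namespace taken from your describe output):
$ kubectl -n dev exec -it road-dashboard-dev-5cdc465475-jwwgz -- ss -lntp
$ kubectl -n dev exec -it road-dashboard-dev-5cdc465475-jwwgz -- netstat -lntp   # if ss is not available
# the Local Address column shows the port the app really listens on; that value goes on the
# right-hand side of kubectl port-forward, e.g. 8082:<listen-port>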
I am trying to access a Flask server running in one OpenShift pod from another.
For that I created a service as below.
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-web-app ClusterIP 172.30.216.112 <none> 8080/TCP 8m
$ oc describe svc my-web-app
Name: my-web-app
Namespace: neo4j-sys
Labels: app=my-web-app
Annotations: openshift.io/generated-by=OpenShiftNewApp
Selector: app=my-web-app,deploymentconfig=my-web-app
Type: ClusterIP
IP: 172.30.216.112
Port: 8080-tcp 8080/TCP
TargetPort: 8080/TCP
Endpoints: 172.20.203.104:5000,172.20.49.150:5000
Session Affinity: None
Events: <none>
1)
First, I pinged from one pod to the other pod and got a response.
(app-root) sh-4.2$ ping 172.20.203.104
PING 172.20.203.104 (172.20.203.104) 56(84) bytes of data.
64 bytes from 172.20.203.104: icmp_seq=1 ttl=64 time=5.53 ms
64 bytes from 172.20.203.104: icmp_seq=2 ttl=64 time=0.527 ms
64 bytes from 172.20.203.104: icmp_seq=3 ttl=64 time=3.10 ms
64 bytes from 172.20.203.104: icmp_seq=4 ttl=64 time=2.12 ms
64 bytes from 172.20.203.104: icmp_seq=5 ttl=64 time=0.784 ms
64 bytes from 172.20.203.104: icmp_seq=6 ttl=64 time=6.81 ms
64 bytes from 172.20.203.104: icmp_seq=7 ttl=64 time=18.2 ms
^C
--- 172.20.203.104 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6012ms
rtt min/avg/max/mdev = 0.527/5.303/18.235/5.704 ms
But when I try curl, it does not respond.
(app-root) sh-4.2$ curl 172.20.203.104
curl: (7) Failed connect to 172.20.203.104:80; Connection refused
(app-root) sh-4.2$ curl 172.20.203.104:8080
curl: (7) Failed connect to 172.20.203.104:8080; Connection refused
2)
After that I tried to reach the cluster IP from one pod. In this case, neither ping nor curl can reach it.
(app-root) sh-4.2$ ping 172.30.216.112
PING 172.30.216.112 (172.30.216.112) 56(84) bytes of data.
From 172.20.49.1 icmp_seq=1 Destination Host Unreachable
From 172.20.49.1 icmp_seq=4 Destination Host Unreachable
From 172.20.49.1 icmp_seq=2 Destination Host Unreachable
From 172.20.49.1 icmp_seq=3 Destination Host Unreachable
^C
--- 172.30.216.112 ping statistics ---
7 packets transmitted, 0 received, +4 errors, 100% packet loss, time 6002ms
pipe 4
(app-root) sh-4.2$ curl 172.30.216.112
curl: (7) Failed connect to 172.30.216.112:80; No route to host
Please let me know where I am going wrong here. Why are the above cases #1 and #2 failing? How do I access ClusterIP services?
I am completely new to services and how to access them, so I might be missing some basics.
I have gone through other answers such as "How can I access the Kubernetes service through ClusterIP", but that one is about NodePort, which does not help me.
Update based on the comments from Graham Dumpleton below; here are my observations.
For information, this is the log of the Flask server I am running in the pods:
* Serving Flask app "wsgi" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [14/Nov/2019 04:54:53] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [14/Nov/2019 04:55:05] "GET / HTTP/1.1" 200 -
Is your pod listening on external interfaces on pod 8080?
If I understand the question correctly, my intention here is just to communicate between pods via the ClusterIP service. I am not looking to access the pods from external interfaces (other projects, or through web URLs as a load-balancer service).
If you get into the pod, can you do curl $HOSTNAME:8080?
Yes, if I use localhost or 127.0.0.1, I get a response from the same pod where I run it, as expected.
(app-root) sh-4.2$ curl http://127.0.0.1:5000/
Hello World!
(app-root) sh-4.2$ curl http://localhost:5000/
Hello World!
But if I try with my-web-app or the service IP (ClusterIP), I get no response.
(app-root) sh-4.2$ curl http://172.30.216.112:5000/
curl: (7) Failed connect to 172.30.216.112:5000; No route to host
(app-root) sh-4.2$ curl my-web-app:8080
curl: (7) Failed connect to my-web-app:8080; Connection refused
(app-root) sh-4.2$ curl http://my-web-app:8080/
curl: (7) Failed connect to my-web-app:8080; Connection refused
With the pod IP, I also get no response.
(app-root) sh-4.2$ curl http://172.20.49.150:5000/
curl: (7) Failed connect to 172.20.49.150:5000; Connection refused
(app-root) sh-4.2$ curl 172.20.49.150
curl: (7) Failed connect to 172.20.49.150:80; Connection refused
I am answering my own question. Here is how my issue got resolved based on inputs from Graham Dumpleton.
Initially, I started the Flask server as below.
from flask import Flask
application = Flask(__name__)
if __name__ == "__main__":
    application.run()
This binds the server to http://127.0.0.1:5000/ by default.
As part of the resolution, I changed the bind address to 0.0.0.0 as below:
from flask import Flask
application = Flask(__name__)
if __name__ == "__main__":
    application.run(host='0.0.0.0')
And the log after that is as below:
* Serving Flask app "wsgi" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
After that, the pods successfully communicated via the ClusterIP. Below are the service details (after scaling up by one more pod):
$ oc describe svc my-web-app
Name: my-web-app
Namespace: neo4j-sys
Labels: app=my-web-app
Annotations: openshift.io/generated-by=OpenShiftNewApp
Selector: app=my-web-app,deploymentconfig=my-web-app
Type: ClusterIP
IP: 172.30.4.250
Port: 8080-tcp 8080/TCP
TargetPort: 5000/TCP
Endpoints: 172.20.106.184:5000,172.20.182.118:5000,172.20.83.40:5000
Session Affinity: None
Events: <none>
Below is the successful response.
(app-root) sh-4.2$ curl http://172.30.4.250:8080 //with clusterIP which is my expectation
Hello World!
(app-root) sh-4.2$ curl http://172.20.106.184:5000 // with pod IP
Hello World!
(app-root) sh-4.2$ curl $HOSTNAME:5000 // with $HOSTNAME
Hello World!
I have a k8s service/deployment in a minikube cluster (named amq, in the default namespace):
D20181472:argo-k8s gms$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
argo argo-ui ClusterIP 10.97.242.57 <none> 80/TCP 5h19m
default amq LoadBalancer 10.102.205.126 <pending> 61616:32514/TCP 4m4s
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5h23m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 5h23m
I spun up infoblox/dnstools, and tried nslookup, dig and ping of amq.default with the following results:
dnstools# nslookup amq.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: amq.default.svc.cluster.local
Address: 10.102.205.126
dnstools# ping amq.default
PING amq.default (10.102.205.126): 56 data bytes
^C
--- amq.default ping statistics ---
28 packets transmitted, 0 packets received, 100% packet loss
dnstools# dig amq.default
; <<>> DiG 9.11.3 <<>> amq.default
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 15104
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;amq.default. IN A
;; Query time: 32 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Sat Jan 26 01:58:13 UTC 2019
;; MSG SIZE rcvd: 29
dnstools# ping amq.default
PING amq.default (10.102.205.126): 56 data bytes
^C
--- amq.default ping statistics ---
897 packets transmitted, 0 packets received, 100% packet loss
(NB: pinging the IP address directly gives the same result.)
I admittedly am not very knowledgeable about the deep workings of DNS, so I am not sure why I can do a lookup and dig for the hostname, but not ping it.
> I admittedly am not very knowledgeable about the deep workings of DNS, so I am not sure why I can do a lookup and dig for the hostname, but not ping it.
Because Service IP addresses are figments of your cluster's imagination, implemented by either iptables or ipvs, and don't actually exist. You can see them with iptables -t nat -L -n on any Node that is running kube-proxy (or ipvsadm -ln), as described by the helpful Debugging Services page.
Since they are not real IPs bound to actual NICs, they don't respond to any traffic other than the port numbers registered in the Service resource. The correct way of testing connectivity against a Service is with something like curl or netcat, using the port number on which you expect application traffic to travel.
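For example, against the amq Service from your output, the test should target the registered port rather than ICMP. A sketch, assuming the nc in the dnstools image supports -z (curl works the same way against HTTP services):
dnstools# nc -zv amq.default 61616        # TCP connect test against the service name and port
dnstools# nc -zv 10.102.205.126 61616     # same test against the ClusterIP directly
# ping will always fail here, because ICMP is not one of the ports registered in the Service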
That’s because the service’s cluster IP is a virtual IP, and only has meaning when combined with the service port.
Whenever a Service gets created by the API server, a virtual IP address is assigned to it immediately, and after that the API server notifies all kube-proxy agents running on the worker nodes that a new Service has been created. It is then kube-proxy's job to make that Service addressable on the node it is running on. kube-proxy does this by setting up a few iptables rules, which make sure each packet destined for the Service IP/port pair is intercepted and its destination address modified, so the packet is redirected to one of the pods backing the Service.
IPs and VIPs
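To see the rules kube-proxy installed for the amq Service above, a sketch (run on any node, e.g. after minikube ssh; it assumes kube-proxy is in iptables mode, with the ipvs variant shown for completeness):
$ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.102.205.126
# shows a jump to a KUBE-SVC-... chain for tcp dpt:61616, which in turn DNATs
# matching packets to one of the backing pod IP:port pairs
$ sudo ipvsadm -ln | grep -A 2 10.102.205.126    # equivalent check when kube-proxy runs in ipvs mode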
In my Kubernetes environment I have the following two pods running:
NAME READY STATUS RESTARTS AGE IP NODE
httpd-6cc5cff4f6-5j2p2 1/1 Running 0 1h 172.16.44.12 node01
tomcat-68ccbb7d9d-c2n5m 1/1 Running 0 45m 172.16.44.13 node02
One is a Tomcat instance and the other one is an Apache httpd instance.
From node01 and node02 I can curl the httpd, which is using port 80. But if I curl the Tomcat server running on node02 from node01, it fails. I get the output below.
[root@node1 ~]# curl -v 172.16.44.13:8080
* About to connect() to 172.16.44.13 port 8080 (#0)
* Trying 172.16.44.13...
* Connected to 172.16.44.13 (172.16.44.13) port 8080 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 172.16.44.13:8080
> Accept: */*
>
^C
[root@node1 ~]# wget -v 172.16.44.13:8080
--2019-01-16 12:00:21-- http://172.16.44.13:8080/
Connecting to 172.16.44.13:8080... connected.
HTTP request sent, awaiting response...
But I'm able to telnet to port 8080 on 172.16.44.13 from node01:
[root@node1 ~]# telnet 172.16.44.13 8080
Trying 172.16.44.13...
Connected to 172.16.44.13.
Escape character is '^]'.
^]
telnet>
Any reason for this behavior? Why am I able to telnet but unable to get the web content? I have also tried different ports, but curl only works for port 80.
I was able to get this fixed by disabling SELinux on my nodes.
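For reference, a minimal sketch of how that is usually done on CentOS/RHEL nodes (setting SELinux to permissive is often enough to confirm it is the culprit; consider a targeted policy rather than disabling it permanently):
[root@node1 ~]# getenforce                 # show the current mode: Enforcing / Permissive / Disabled
[root@node1 ~]# setenforce 0               # switch to permissive immediately (does not survive a reboot)
[root@node1 ~]# sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config   # persist across reboots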