Can't Connect to Kubernetes Service from Inside Service Pod? - kubernetes

I created a one-replica zookeeper + kafka cluster with the kafka chart from the official incubator repo:
helm install --name mykafka -f kafka.yaml incubator/kafka
This gives me two pods:
kubectl get pods
NAME READY STATUS
mykafka-kafka-0 1/1 Running
mykafka-zookeeper-0 1/1 Running
And four services (in addition to the default kubernetes service)
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP
mykafka-kafka ClusterIP 10.108.143.59 <none> 9092/TCP
mykafka-kafka-headless ClusterIP None <none> 9092/TCP
mykafka-zookeeper ClusterIP 10.109.43.48 <none> 2181/TCP
mykafka-zookeeper-headless ClusterIP None <none> 2888/TCP,3888/TCP
If I shell into the zookeeper pod:
> kubectl exec -it mykafka-zookeeper-0 -- /bin/bash
I use curl to test TCP connectivity. I expect a protocol error since the server isn't speaking HTTP, but if curl can't even connect and I have to Ctrl-C out, then the TCP connection isn't working.
I can access the local pod through curl localhost:2181:
root@mykafka-zookeeper-0:/# curl localhost:2181
curl: (52) Empty reply from server
I can access the other pod through curl mykafka-kafka:9092:
root@mykafka-zookeeper-0:/# curl mykafka-kafka:9092
curl: (56) Recv failure: Connection reset by peer
But I can't access mykafka-zookeeper:2181. That name resolves to the cluster IP, but the attempt to TCP connect hangs until I ctrl-C:
root@mykafka-zookeeper-0:/# curl -v mykafka-zookeeper:2181
* Rebuilt URL to: mykafka-zookeeper:2181/
* Trying 10.109.43.48...
^C
Similarly, I can shell into the kafka pod:
> kubectl exec -it mykafka-kafka-0 -- /bin/bash
Connecting to the Zookeeper pod by the service name works fine:
root@mykafka-kafka-0:/# curl mykafka-zookeeper:2181
curl: (52) Empty reply from server
Connecting to localhost kafka works fine:
root@mykafka-kafka-0:/# curl localhost:9092
curl: (56) Recv failure: Connection reset by peer
But connecting to the Kafka pod by the service name doesn't work and I must ctrl-C the curl attempt:
curl -v mykafka-kafka:9092
* Rebuilt URL to: mykafka-kafka:9092/
* Hostname was NOT found in DNS cache
* Trying 10.108.143.59...
^C
Can anyone explain why I can only connect to a Kubernetes service from outside the service and not from within it?

I believe what you're experiencing can be resolved by looking at how your kubelet is configured. There is a flag you can set when starting the kubelet called --hairpin-mode. By default it is set to promiscuous-bridge; when hairpin traffic isn't handled in that mode, a pod can't connect to itself through its own service. You can change it to hairpin-veth, which allows a pod to connect to its own service.
There are a few issues on the topic, but this seems to be referenced the most:
https://github.com/kubernetes/kubernetes/issues/45790
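A hedged sketch of how you might check and change this: --hairpin-mode is a documented kubelet flag, but where the kubelet arguments live varies by install, so the paths below are assumptions.
# See which hairpin mode the running kubelet was started with (prints nothing if the flag isn't set)
ps aux | grep kubelet | grep -o 'hairpin-mode=[^ ]*'
# On kubeadm-based installs the extra args often live in a file such as
# /var/lib/kubelet/kubeadm-flags.env (path is an assumption); add --hairpin-mode=hairpin-veth
# there, then restart the kubelet:
sudo systemctl restart kubelet
# Verify hairpin mode on the bridge ports (the bridge name docker0 is an assumption):
cat /sys/class/net/docker0/brif/*/hairpin_mode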

Related

client applications connection to kubernetes serviced pods are timing out if no container is listening

My own program is getting stuck connecting to a serviced pod in my Kubernetes cluster.
Let me explain: take a curl command trying to connect to one of the containers in a serviced pod from outside the cluster.
curl -X GET http://192.168.1.105:31003/ready
{ "ready": true }
No error, the service is working fine :)
When the deployment is deleted, the curl command reports a network error as expected:
curl -v -X GET http://192.168.1.105:31003/ready
curl: (7) Failed connect to 192.168.1.105:31003; Connection refused
Now, if I replace the webserver container in the pod with a sleep 3600 container and start the deployment, the curl command times out:
curl -v -X GET http://192.168.1.105:31003/ready
* About to connect() to 192.168.1.105 port 31003 (#0)
* Trying 192.168.1.105...
* Connection timed out
* Failed connect to 192.168.1.105:31003; Connection timed out
* Closing connection 0
curl: (7) Failed connect to 192.168.1.105:31003; Connection timed out
I don't understand why the curl client doesn't get an error when it tries to connect to a container running sleep with no port opened!
My pod has no liveness or readiness probe set, so all containers are declared as 'running'.
kubectl get pod
NAME READY STATUS RESTARTS AGE
alis-green-core-2hlgx 3/3 Running 0 104s
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-alis-green-core NodePort 172.30.28.252 <none> 11003:31003/TCP,11903:31229/TCP,11904:32364/TCP,11007:31183/TCP,14281:31419/TCP 2m11s
kubectl get endpoints
NAME ENDPOINTS AGE
service-alis-green-core 10.129.0.44:14281,10.129.0.44:11903,10.129.0.44:11007 + 2 more... 2m52s
I guess the issue is related to some kube-proxy configuration I may have missed.
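For reference, a hedged way to check both sides: whether kube-proxy programmed the NodePort on the node, and whether anything is actually listening inside the pod (the container name is a placeholder, and netstat may not exist in a minimal image):
# On the node: does kube-proxy have rules / a listener for the NodePort?
sudo iptables-save | grep 31003
sudo netstat -tlnp | grep 31003
# Inside the pod: is anything listening on the target port?
kubectl exec alis-green-core-2hlgx -c <container-name> -- netstat -tln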
Thanks

Kubespray : Netchecker connectivity check fails

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances using Kubespray. Those instances are CentOS 7.6.1811 qcow2 images imported in Glance.
The install was successful, and I can see my nodes and pods with kubectl commands.
I used the deploy_netchecker option to deploy NetChecker and test the network within my cluster, and set network_plugin="flannel".
I also tried kube_proxy_mode="iptables", but it doesn't seem to affect the result.
That's pretty much all the changes I did in the k8s-cluster.yml file.
All the pods are running, services too :
[centos@cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 46h
default netchecker-service NodePort 10.233.13.213 <none> 8081:31081/TCP 46h
kube-system coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 46h
kube-system dashboard-metrics-scraper ClusterIP 10.233.59.12 <none> 8000/TCP 46h
kube-system kubernetes-dashboard ClusterIP 10.233.63.20 <none> 443/TCP 46h
But the netchecker API gives the following answer:
[root@localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
For an unknown reason, I cannot access the API from a cluster node with localhost, so I used a floating IP with OpenStack.
Here are some logs from the agent :
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246 1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428 1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}
Logs from the server do not indicate any error.
I tried to check DNS resolve with the following :
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
nslookup: can't resolve 'kubernetes.default'
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
169.254.25.10 is the IP of the nodelocaldns, but it doesn't seem to query the coredns service deployed.
When I use nslookup netchecker-service.default.svc.cluster.local 10.233.0.3, with the coredns IP, I get a correct answer.
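For comparison, a hedged way to query nodelocaldns and CoreDNS side by side with the fully-qualified name (pod name taken from above):
kubectl exec -it netchecker-agent-4jsgf -- nslookup netchecker-service.default.svc.cluster.local 169.254.25.10
kubectl exec -it netchecker-agent-4jsgf -- nslookup netchecker-service.default.svc.cluster.local 10.233.0.3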
What could be wrong with my configuration?
Thanks in advance
UPDATE : The plugin Flannel has an issue and contains a fix to apply on all nodes of the cluster. Once done, the pods successfully report back to the netchecker server.

What happens when a service receives a request but has no ready pods?

Suppose a Kubernetes service (of type ClusterIP) is connected to a set of pods, but none of them is currently ready - what will happen to the request?
Will it:
fail eagerly
timeout
wait until a ready pod is available (or forever, whichever is earlier)
something else?
It will time out.
Kube-proxy pulls the IP addresses of healthy pods and sets them as the endpoints (backends) of the service. Also, note that all kube-proxy does is rewrite the iptables rules when you create, delete, or modify a service.
So, when you send a request within your network and there is no one to reply, your request will time out.
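For reference, a hedged way to inspect what kube-proxy programs for a service with no ready endpoints (iptables mode assumed; my-nginx matches the example that follows):
kubectl get endpoints my-nginx     # ENDPOINTS shows <none> while no pod is ready
sudo iptables-save | grep my-nginx # a service with no endpoints typically gets a
                                   # REJECT rule in the KUBE-SERVICES chain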
Deployed nginx service
[node1 ~]$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2h
my-nginx ClusterIP 10.100.1.134 <none> 80/TCP 9s
$ curl 10.100.1.134
curl: (7) Failed connect to 10.100.1.134:80; Connection refused
Deployed nginx deployment
$ kubectl create -f nginx-depl.yaml
$ kubectl get po
NAME READY STATUS RESTARTS AGE
my-nginx-f9945ffdd-2f77f 1/1 Running 0 1m
my-nginx-f9945ffdd-rk68v 1/1 Running 0 1m
$ curl 10.100.1.134
Welcome to nginx!
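For completeness, a minimal sketch of what nginx-depl.yaml might contain, assuming a standard two-replica nginx Deployment whose labels match the my-nginx service selector (the run=my-nginx label is an assumption, not shown above):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      run: my-nginx
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80   # port the service's targetPort must point at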
Most likely you would get a Connection refused error.

Kubernetes ClusterIP Service ENV set incorrectly

I created a ClusterIP service that has 3 pods in the deployment:
tornado-service ClusterIP 10.107.117.119 <none> 8085/TCP 2m
When I ssh into one of the pods it has these env variables:
TORNADO_SERVICE_PORT_8085_TCP_PROTO=tcp
TORNADO_SERVICE_PORT_8085_TCP=tcp://10.99.116.50:8085
TORNADO_SERVICE_SERVICE_HOST=10.99.116.50
This doesn't match what kubectl gave me. When I curl another pod using the env IP address, it hangs:
curl -XPOST 10.99.116.50:8085
But when I use the kubectl IP I get a 200 http response:
curl -XPOST 10.107.117.119:8085
Why is Kubernetes setting the service IP env incorrectly in my pods?
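One hedged check, for what it's worth: service environment variables are injected only when a pod starts, so they can go stale if the Service was deleted and recreated after the pods came up (whether that happened here is an assumption). Comparing the live Service with the pod's environment would show the mismatch (the pod name is a placeholder):
kubectl get svc tornado-service -o jsonpath='{.spec.clusterIP}'
kubectl exec <tornado-pod> -- printenv TORNADO_SERVICE_SERVICE_HOST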

K8s NodePort service is “unreachable by IP” only on 2/4 slaves in the cluster

I created a K8s cluster of 5 VMs (1 master and 4 slaves running Ubuntu 16.04.3 LTS) using kubeadm. I used flannel to set up networking in the cluster. I was able to successfully deploy an application and then exposed it via a NodePort service. From here things got complicated for me.
Before I started, I disabled the default firewalld service on master and the nodes.
As I understand from the K8s Services doc, the type NodePort exposes the service on all nodes in the cluster. However, when I created it, the service was exposed only on 2 nodes out of 4 in the cluster. I am guessing that's not the expected behavior (right?)
For troubleshooting, here are some resource specs:
root@vm-vivekse-003:~# kubectl get nodes
NAME STATUS AGE VERSION
vm-deepejai-00b Ready 5m v1.7.3
vm-plashkar-006 Ready 4d v1.7.3
vm-rosnthom-00f Ready 4d v1.7.3
vm-vivekse-003 Ready 4d v1.7.3 //the master
vm-vivekse-004 Ready 16h v1.7.3
root@vm-vivekse-003:~# kubectl get pods -o wide -n playground
NAME READY STATUS RESTARTS AGE IP NODE
kubernetes-bootcamp-2457653786-9qk80 1/1 Running 0 2d 10.244.3.6 vm-rosnthom-00f
springboot-helloworld-2842952983-rw0gc 1/1 Running 0 1d 10.244.3.7 vm-rosnthom-00f
root@vm-vivekse-003:~# kubectl get svc -o wide -n playground
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
sb-hw-svc 10.101.180.19 <nodes> 9000:30847/TCP 5h run=springboot-helloworld
root@vm-vivekse-003:~# kubectl describe svc sb-hw-svc -n playground
Name: sb-hw-svc
Namespace: playground
Labels: <none>
Annotations: <none>
Selector: run=springboot-helloworld
Type: NodePort
IP: 10.101.180.19
Port: <unset> 9000/TCP
NodePort: <unset> 30847/TCP
Endpoints: 10.244.3.7:9000
Session Affinity: None
Events: <none>
root@vm-vivekse-003:~# kubectl get endpoints sb-hw-svc -n playground -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2017-08-09T06:28:06Z
  name: sb-hw-svc
  namespace: playground
  resourceVersion: "588958"
  selfLink: /api/v1/namespaces/playground/endpoints/sb-hw-svc
  uid: e76d9cc1-7ccb-11e7-bc6a-fa163efaba6b
subsets:
- addresses:
  - ip: 10.244.3.7
    nodeName: vm-rosnthom-00f
    targetRef:
      kind: Pod
      name: springboot-helloworld-2842952983-rw0gc
      namespace: playground
      resourceVersion: "473859"
      uid: 16d9db68-7c1a-11e7-bc6a-fa163efaba6b
  ports:
  - port: 9000
    protocol: TCP
After some tinkering I realized that on those 2 "faulty" nodes, those services were not available from within those hosts themselves.
Node01 (working):
root@vm-vivekse-004:~# curl 127.0.0.1:30847 //<localhost>:<nodeport>
Hello Docker World!!
root@vm-vivekse-004:~# curl 10.101.180.19:9000 //<cluster-ip>:<port>
Hello Docker World!!
root@vm-vivekse-004:~# curl 10.244.3.7:9000 //<pod-ip>:<port>
Hello Docker World!!
Node02 (working):
root@vm-rosnthom-00f:~# curl 127.0.0.1:30847
Hello Docker World!!
root@vm-rosnthom-00f:~# curl 10.101.180.19:9000
Hello Docker World!!
root@vm-rosnthom-00f:~# curl 10.244.3.7:9000
Hello Docker World!!
Node03 (not working):
root@vm-plashkar-006:~# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
root@vm-plashkar-006:~# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
root@vm-plashkar-006:~# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Node04 (not working):
root@vm-deepejai-00b:/# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
root@vm-deepejai-00b:/# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
root@vm-deepejai-00b:/# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Tried netstat and telnet on all 4 slaves. Here's the output:
Node01 (the working host):
root@vm-vivekse-004:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 27808/kube-proxy
root@vm-vivekse-004:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node02 (the working host):
root@vm-rosnthom-00f:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 11842/kube-proxy
root@vm-rosnthom-00f:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node03 (the not-working host):
root@vm-plashkar-006:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 7791/kube-proxy
root@vm-plashkar-006:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Node04 (the not-working host):
root@vm-deepejai-00b:/# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 689/kube-proxy
root@vm-deepejai-00b:/# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Additional info:
From the kubectl get pods output, I can see that the pod is actually deployed on slave vm-rosnthom-00f. I am able to ping this host from all 5 VMs, and curl vm-rosnthom-00f:30847 also works from all the VMs.
I can clearly see that the internal cluster networking is messed up, but I am unsure how to fix it. The iptables -L output is identical on all the slaves, and even the local loopback (ifconfig lo) is up and running on all of them.
Use a service of type NodePort and access the NodePort on the IP address of your master node.
The Service knows on which node a Pod is running and redirects the traffic to one of the pods if you have several instances.
Label your pods and use the corresponding selectors in the service.
If you still run into issues, please post your service and deployment.
To check connectivity I would suggest using netcat:
nc -zv ip/service port
If the network is OK it responds: open
Inside the cluster, access the containers like so:
nc -zv servicename.namespace.svc.cluster.local port
Always consider that you have 3 kinds of ports (a small sketch tying them together follows):
The port on which your software is listening inside your container (the containerPort).
The port on which the Service exposes that container port; the Service has one IP address, the ClusterIP, and forwards that port to a specific port on the pod.
The NodePort, which allows you to reach the pod's port from outside the cluster's network.
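A minimal Service sketch of how those three ports line up, using the values from the kubectl describe output above:
apiVersion: v1
kind: Service
metadata:
  name: sb-hw-svc
  namespace: playground
spec:
  type: NodePort
  selector:
    run: springboot-helloworld
  ports:
  - port: 9000        # the Service port, reachable on the ClusterIP
    targetPort: 9000  # the containerPort the app listens on inside the pod
    nodePort: 30847   # the static port exposed on every node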
Either your firewall blocks some connections between nodes or your kube-proxy is not working properly. I guess your service only works on the nodes where pods are running.
If you want to reach the service from any node in the cluster you need to define the service type as ClusterIP. Since you defined the service type as NodePort, you can connect from the node where the service is running.
My above answer was not correct. Based on the documentation we should be able to connect from any NodeIP:NodePort, but it's not working in my cluster either.
https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services---service-types
NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
On one of my nodes, ip forwarding was not set. After enabling it, I was able to connect to my service using NodeIP:NodePort:
sysctl -w net.ipv4.ip_forward=1
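To check the current value on each node and make the change survive reboots (the sysctl drop-in path is the conventional location, an assumption about this setup):
sysctl net.ipv4.ip_forward                                   # 1 means forwarding is enabled
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ipforward.conf
sudo sysctl --system                                         # reload all sysctl configuration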