Kubernetes internal pods connecting to Mosquitto pod

I'm running a Mosquitto pod (docker.io/jllopis/mosquitto:v1.6.8-2) on an AKS instance (incidentally, using the HTTP auth backend plugin) and have exposed it through a K8s Service. Looking at the broker logs I can see a constant stream of records like this, several at the same timestamp:
1587048303: New connection from 10.240.0.6 on port 8883.
1587048303: New connection from 10.240.0.6 on port 1883.
1587048303: New connection from 10.240.0.6 on port 1883.
1587048305: Socket error on client <unknown>, disconnecting.
1587048305: Socket error on client <unknown>, disconnecting.
These come from different IP addresses, but all within the same range; checking with kubectl get pods --all-namespaces -o wide shows that they belong to internal k8s pods, such as more-fs-watchers-sb64w in the kube-system namespace.
What are all these doing and how can I stop them bombarding the broker? Why are they doing it? And could this be affecting other MQTT clients, legitimate ones, that are reporting intermittent connection problems?

I suspect that you are running the more-fs-watchers DaemonSet in your cluster.
It was recommended at one point as a workaround for the following issue: https://github.com/Azure/AKS/issues/772
Note that the issue is now fixed and rolled out to current AKS clusters, so it should be safe to remove the more-fs-watchers DaemonSet.
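For reference, removing it might look like this, assuming the DaemonSet is indeed named more-fs-watchers and lives in kube-system (the pod name in the question suggests so, but verify first):
# Confirm the DaemonSet's exact name and namespace (assumed below)
kubectl get daemonsets --all-namespaces | grep more-fs-watchers
# Delete it; its pods (e.g. more-fs-watchers-sb64w) are removed with it
kubectl delete daemonset more-fs-watchers -n kube-system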

Related

OpenVPN access to the K8s cluster: the client can reach pods on the host where the server is located, but cannot reach pods on other hosts in the cluster

I deployed the OpenVPN server in the K8s cluster and the OpenVPN client on a host outside the cluster. However, from the client I can only reach pods on the host where the OpenVPN server is located; I cannot reach pods on other hosts in the cluster.
The cluster network is Calico. I also added iptables rules on the cluster host where the OpenVPN server runs.
When I captured packets on tun0 on the server, I found that no reply packets came back.
When the server is deployed with hostNetwork, a FORWARD rule is missing from iptables.
I'm not sure how you set up iptables inside the server pod, as iptables/netfilter is not accessible from inside pods on most kube clusters I have seen.
If you want full access to cluster networking over that OpenVPN server, you probably want to use hostNetwork: true on your VPN server. The problem is that you still need a proper MASQUERADE/SNAT rule to get responses back to your client.
You should investigate the traffic going out of the server pod and check whether its source address is properly rewritten; otherwise the nodes in the cluster will have no idea how to route the response.
You probably have a common gateway for your nodes; depending on your kube implementation you might get around this issue by setting a route back to your VPN there, but that will likely require some scripting around the VPN server itself to make sure the route is updated each time the server pod is rescheduled.
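As a rough sketch of the MASQ/SNAT idea, assuming the server runs with hostNetwork: true, tun0 is its VPN interface and 10.8.0.0/24 is the (default) OpenVPN client subnet, adjust both to your setup:
# Rewrite the source of VPN client traffic heading towards the cluster,
# so other nodes route replies back via this host (subnet is an assumption)
iptables -t nat -A POSTROUTING -s 10.8.0.0/24 ! -o tun0 -j MASQUERADE
# Verify that packets leaving the host actually carry the rewritten source
tcpdump -ni any net 10.8.0.0/24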

Kubernetes: Kafka pod reachability issue from another pod

I know the information below may not be enough to trace the issue, but I would still like some guidance.
We have an Amazon EKS cluster.
Currently, we are facing a reachability issue with the Kafka pods.
Environment:
10 nodes in total, across availability zones ap-south-1a and ap-south-1b
A three-replica Kafka cluster (Helm chart installation)
A three-replica ZooKeeper ensemble (Helm chart installation)
Kafka uses an external advertised listener on port 19092
Kafka has a Service backed by an internal Network Load Balancer
A test pod is deployed to check reachability of the Kafka pods
We are using Cloud Map-based DNS for the advertised listener
Working:
When I run a telnet command from EC2, e.g. telnet 10.0.1.45 19092, it works as expected. 10.0.1.45 is the load balancer IP.
When I run a telnet command from EC2, e.g. telnet 10.0.1.69 31899, it works as expected. 10.0.1.69 is an actual node's IP and 31899 is the NodePort.
Problem:
When I run the same command from the test pod, e.g. telnet 10.0.1.45 19092, it sometimes works and sometimes fails with an error like telnet: Unable to connect to remote host: Connection timed out
The issue seems to be related to kube-proxy; we need help resolving it.
Can anyone help to guide me?
Can I restart kube-proxy? Does it affect other pods/deployments?
I believe this problem is caused by AWS's NLB TCP-only nature (as mentioned in the comments).
In a nutshell, your pod-to-pod communication fails whenever hairpinning is needed, i.e. when traffic leaves a node through the load balancer and is routed straight back to the same node.
To confirm this is the root cause, verify that telnet works whenever the Kafka pod and the client pod are not on the same EC2 node, and fails whenever they are on the same node.
There are (at least) two approaches to tackle this issue:
Use K8s internal networking - refer to the K8s Service's DNS name
Every K8s Service has its own DNS FQDN for internal usage (meaning k8s networking only, without going out to the LoadBalancer and coming back into k8s again). You can simply telnet this name instead of going through the LB or NodePort.
E.g. assume your Kafka Service is named kafka in namespace default; then you can telnet kafka.default.svc.cluster.local (or just kafka from a pod in the same namespace) on the port exposed by the Service.
Use K8s anti-affinity to make sure the client and Kafka are never scheduled on the same node; a sketch follows below.
And, as indicated in this answer, you might need to make that Service headless.
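A minimal sketch of the anti-affinity approach, assuming the Kafka pods carry a label like app: kafka and that this fragment goes into the client Deployment's pod template (both the label and the placement are assumptions to adapt to your Helm chart):
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: kafka   # assumed Kafka pod label; check your chart's values
          topologyKey: kubernetes.io/hostname
With this in place the scheduler will refuse to put the client on a node that already runs a pod with that label, so the hairpin path is never taken.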

How to prevent kube-dns from forwarding requests to 8.8.8.8:53

How can I prevent kube-dns from forwarding requests to Google's name servers (8.8.8.8:53 and 8.8.4.4:53)?
I just want to launch pods only for internal use, which means containers in pods are not supposed to connect to the outside at all.
When a Zookeeper client connects to a Zookeeper server using hostname (e.g. zkCli.sh -server zk-1.zk-headless), it takes 10 seconds for the client to change its state from [Connecting] to [Connected].
The reason I suspect kube-dns is that, with pods' IP address, the client gets connected instantly.
When I took a look at the kube-dns logs, I found the following two lines:
07:25:35:170773 1 logs.go:41] skydns: failure to forward request "read udp 10.244.0.13:43455->8.8.8.8:53: i/o timeout"
07:25:39:172847 1 logs.go:41] skydns: failure to forward request "read udp 10.244.0.13:42388->8.8.8.8:53: i/o timeout"
It was around 07:25:30 when the client started to connect to the server.
I'm running Kubernetes on a private cluster where internal servers communicate with the internet via http_proxy/https_proxy, which means I cannot connect to 8.8.8.8 for name resolution, AFAIK.
I found the following at https://github.com/skynetservices/skydns:
The default value of an environmental variable named SKYDNS_NAMESERVERS is "8.8.8.8:53,8.8.4.4:53"
I could achieve my purpose by setting no_rec to true
I've been initializing Kubernetes with kubeadm, and I couldn't find a way to modify that environment variable or set the corresponding skydns property.
How can I prevent kube-dns from forwarding requests to the outside of an internal Kubernetes cluster deployed by kubeadm?
I don't think there is an option to completely prevent the kube-dns addon from forwarding requests. There certainly isn't an option directly in kubeadm for that.
Your best bet is to edit the kube-dns Deployment (e.g. kubectl edit -n kube-system deploy kube-dns) yourself after kubeadm has started the cluster and change things to work for you.
You may want to try changing the upstream nameserver to something other than 8.8.8.8 that is accessible by the cluster. You should be able to do that by adding --nameservers=x.x.x.x to the args for the kubedns container.
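A sketch of what that container spec might look like after the edit, with 192.168.0.53 as a placeholder for a resolver that is actually reachable from your network (the other flags are typical kube-dns defaults; keep whatever your Deployment already has, and verify the --nameservers flag against your kube-dns version):
containers:
  - name: kubedns
    args:
      - --domain=cluster.local.
      - --dns-port=10053
      - --nameservers=192.168.0.53   # placeholder: an internal resolver the cluster can reach
On newer kube-dns versions the same effect is usually achieved with the upstreamNameservers key of the kube-dns ConfigMap instead.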

Kubernetes service ip isn't always accessible within cluster (with flannel)

I built a Kubernetes cluster using the flannel overlay network. The problem is that one of the service IPs isn't always accessible.
I tested from within the cluster by telnetting the service IP and port, which ended in a connection timeout. Checking with netstat, the connection was always in the SYN_SENT state; it seemed the peer never accepted the connection.
But if I telnet directly to the IP and port of a pod backing the service, the connection succeeds.
It only happens to one of the services; the other services are fine.
And if I scale the backend up, e.g. to 2 replicas, some requests to the service IP succeed. It seems the service isn't able to connect to one of the backing pods.
Which component could be the cause of such a problem: my service configuration, kube-proxy, or flannel?
Check the discussion here: https://github.com/kubernetes/kubernetes/issues/38802
You need to set net.bridge.bridge-nf-call-iptables=1 via sysctl on the nodes.
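On each node this can be applied roughly as follows (a sketch; the sysctl.d file name is an arbitrary choice):
# The setting belongs to the br_netfilter module, so make sure it is loaded
modprobe br_netfilter
# Make bridged traffic traverse iptables, which kube-proxy's rules rely on
sysctl -w net.bridge.bridge-nf-call-iptables=1
# Persist the setting across reboots
echo 'net.bridge.bridge-nf-call-iptables = 1' > /etc/sysctl.d/99-kubernetes.conf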

In Kubernetes, will a Service route requests to a pod that is alive but no longer serving the port?

I have a piece of code running on a K8s cluster. I need to shut down the K8s nodes once all of my code running in the pods has finished. I let my code serve on a port until its job is complete, and I keep the program running afterwards so that the replication controller does not start another pod. I have defined a Service in K8s to route the requests.
Externally, I wrote a script that polls the service until it no longer responds with code 200, and then I shut down the K8s nodes to save resources (a sketch of such a script follows the question).
My question is: when my code in the pod no longer serves the port, will the K8s Service still route incoming requests to that pod or not?
And is there any other way to achieve an equivalent result?
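A minimal sketch of such a polling script, assuming the service answers at http://my-service:8080/ (placeholder name and port):
# Poll until the service stops answering with HTTP 200, then shut the nodes down
while [ "$(curl -s -o /dev/null -w '%{http_code}' http://my-service:8080/)" = "200" ]; do
  sleep 10
done
echo "service no longer returns 200, shutting nodes down"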
If a TCP connection can't be opened to the pod's IP on the given port, a different pod will be connected to instead.
In other words, as long as the pod closes the socket that was listening on the port, no requests should be sent to it after that point.