EKS internal service connection unreliable - kubernetes

I just set up a new EKS cluster (latest available version, three nodes on the default AMI).
I deployed a Redis instance in it as a Kubernetes service and exposed it. I can reach the Redis database through internal DNS, e.g. mydatabase.redis (it's deployed in the redis namespace). From another pod I can connect to my Redis database, but sometimes the connection takes more than 10 seconds.
It doesn't seem to be a DNS resolution issue, as host mydatabase.redis responds immediately with the service IP address. However, when I try to connect, for example with nc mydatabase.redis 6379 -v, it sometimes connects instantly and sometimes takes more than 10 seconds.
All my services are affected and I don't know why. I didn't change any settings in my cluster; this is a basic EKS cluster.
How can I debug this?
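A few checks that usually narrow this kind of problem down (a sketch; the service name and redis namespace are taken from the description above, everything else is generic):
# Does connecting to the ClusterIP directly show the same delay? If so, DNS is ruled out.
kubectl -n redis get svc mydatabase
nc -vz <cluster-ip-from-above> 6379
# Time the name resolution itself; delays of roughly 5s per attempt usually point at CoreDNS/ndots retries.
time host mydatabase.redis
# Check that CoreDNS and kube-proxy are healthy.
kubectl -n kube-system get pods | grep -E 'coredns|kube-proxy'
kubectl -n kube-system logs deploy/coredns | tail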

Related

Access Kubernetes applications via localhost from the host system

Is there any way, other than port-forwarding, to access the apps running inside my K8s cluster via http://localhost:port from my host operating system?
For example
I am running a minikube setup to practise K8s, and I deployed three pods along with their services, choosing a different type for each: ClusterIP, NodePort and LoadBalancer.
For ClusterIP, I can use the port-forward option to access my app via localhost:port, but the problem is that I have to leave that command running, and if it is interrupted for some reason the connection is dropped. Is there an alternative solution here?
For NodePort, I can only access the app via the minikube node IP, not via localhost; so if I have to access it remotely, I won't have a route to that node IP address.
For LoadBalancer, this isn't a valid option since I am running minikube on my local system, not in a cloud.
Please let me know if there is another solution to this problem. The reason I am asking is that when I deploy the same application via Docker Compose, I can access all these services via localhost:port and can even call them via VM_IP:port from other systems.
Thanks,
-Rafi
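For reference, the two access paths described above look roughly like this on minikube (the service name and ports are assumptions):
# ClusterIP: foreground port-forward that must stay running
kubectl port-forward svc/my-service 8080:80
# NodePort: minikube can print a URL that is reachable from the host
minikube service my-service --url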

How to connect to a k8s cluster with Bitnami postgresql-ha deployed?

My setup (running locally in two minikubes) is two k8s clusters:
frontend cluster is running a golang api-server,
backend cluster is running an ha bitnami postgres cluster (used bitnami postgresql-ha chart for this)
If I set the pgpool service to use NodePort and grab the IP + port of the node the pgpool pod is running on, I can hardwire that host + port into the database connector in the api-server (in the other cluster), and it works.
However, what I haven't been able to figure out is how to connect to the other cluster (e.g. to pgpool) generically, without using the IP address.
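For reference, the hardwiring described above amounts to something like this (the minikube profile name and the pgpool service name are assumptions; check yours with kubectl get svc):
# node IP of the backend minikube cluster
minikube -p backend ip
# NodePort assigned to the pgpool service
kubectl get svc postgresql-ha-pgpool -o jsonpath='{.spec.ports[0].nodePort}'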
I also tried Skupper, which has an example of connecting to a backend cluster with postgres running on it, but their example doesn't use the Bitnami postgresql-ha Helm chart, just a plain postgres install, so it is not really the same.
Any ideas?
For those times when you have to, or purposely want to, connect pods/deployments across multiple clusters, Nethopper (https://www.nethopper.io/) is a simple and secure solution. The postgresql-ha scenario above is covered under their free tier. There is a two cluster minikube 'how to' tutorial at https://www.nethopper.io/connect2clusters which is very similar to your frontend/backend use case. Nethopper is based on skupper.io, but the configuration is much easier and user friendly, and is centralized so it scales to many clusters if you need to.
To solve your specific use case, you would:
First install your api server in the frontend and your bitnami postgresql-ha chart in the backend, as you normally would.
Go to https://mynethopper.com/ and
Register
Clouds -> define both clusters (clouds), frontend and backend
Application Network -> create an application network
Application Network -> attach both clusters to the network
Application Network -> install nethopper-agent in each cluster with copy paste instructions.
Objects -> import and expose pgpool (call the service 'pgpool') in your backend.
Objects -> distribute the service 'pgpool' to frontend, using a distribution rule.
Now you should see the 'pgpool' service in the frontend cluster:
kubectl get service
When the API server pods in the frontend request service from pgpool, they will connect to pgpool in the backend, magically. It's like the 'pgpool' pod is now running in the frontend.
The nethopper part should only take 5-10 minutes, and you do NOT need IP addresses, TLS certs, K8s ingresses or loadbalancers, a VPN, or an istio service mesh or sidecars.
After moving to a one-cluster architecture, it became easier to see how to connect to the Bitnami postgresql-ha cluster. After trying a few different things, this finally worked:
postgresql-ha-postgresql-headless:5432
(that's the host and port I'm using to call from my golang server)
Now I believe it should be fairly straightforward to also run the two-cluster case, using Skupper to bind to the headless service.
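As a quick sanity check from another pod in the same cluster, something like the following should reach that headless service (the postgres image and throwaway-pod flags are just one way to run a client; only the host and port come from the text above):
kubectl run pg-check --rm -it --restart=Never --image=postgres:15 -- pg_isready -h postgresql-ha-postgresql-headless -p 5432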

Kubernetes: Kafka pod reachability issue from another pod

I know the information below is not enough to trace the issue, but I would still like some guidance.
We have an Amazon EKS cluster.
Currently, we are facing a reachability issue with the Kafka pods.
Environment:
10 nodes in total, across availability zones ap-south-1a and ap-south-1b
A three-replica Kafka cluster (Helm chart installation)
A three-replica ZooKeeper ensemble (Helm chart installation)
Kafka uses an external advertised listener on port 19092
Kafka has a service fronted by an internal network load balancer
I have deployed a test pod to check reachability of the Kafka pods.
We are using Cloud Map-based DNS for the advertised listener.
Working:
When I run a telnet command from EC2, like telnet 10.0.1.45 19092, it works as expected. 10.0.1.45 is the load balancer IP.
When I run a telnet command from EC2, like telnet 10.0.1.69 31899, it also works as expected. 10.0.1.69 is an actual node's IP and 31899 is the NodePort.
Problem:
When I run the same command from the test pod, like telnet 10.0.1.45 19092, it sometimes works and sometimes gives an error like telnet: Unable to connect to remote host: Connection timed out
The issue seems to be related to kube-proxy; we need help resolving it.
Can anyone guide me?
Can I restart kube-proxy? Does it affect other pods/deployments?
I believe this problem is caused by AWS's NLB TCP-only nature (as mentioned in the comments).
In a nutshell, your pod-to-pod communication fails when hairpinning is needed.
To confirm this is the root cause, verify that when the telnet works, the Kafka pod and the client pod are not on the same EC2 node, and that when they are on the same node, the telnet fails.
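Co-location is easy to check by comparing the NODE column (the grep pattern assumes the pod names contain 'kafka' and 'test-pod'):
kubectl get pods -o wide | grep -E 'kafka|test-pod'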
There are (at least) two approaches to tackle this issue:
Use K8s internal networking - Refer to k8s Service's URL
Every K8s service has its own DNS FQDN for internal use (meaning it stays on the k8s network only, without going out to the LoadBalancer and back into k8s again). You can just telnet this name instead of going through the NodePort via the LB.
I.e. let's assume your Kafka service is named kafka. Then you can just telnet kafka.<namespace>.svc.cluster.local (on the port exposed by the kafka service).
Use K8s anti-affinity to make sure the client and Kafka are never scheduled on the same node (a sketch follows below).
Oh, and as indicated in this answer, you might need to make that service headless.
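A minimal sketch of the anti-affinity approach, added to the client's pod template spec; it assumes the Kafka pods carry a label like app=kafka, which depends on your Helm chart:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: kafka        # adjust to the labels your Kafka chart actually sets
        topologyKey: kubernetes.io/hostname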

Failing to connect GKE with GCE on the same VPC?

I am new to Google Cloud Platform, and here is my context:
I have one Compute Engine VM running as a MongoDB server and another Compute Engine VM running as a NodeJS server, already using Docker. The NodeJS application connects to Mongo via the default VPC internal IP. Now I'm trying to migrate the NodeJS application to Google Kubernetes Engine, but I can't connect to the MongoDB server when I deploy the NodeJS application's Docker image to the cluster.
All services like GCE and GKE are in the same region (us-east-1).
As a more direct test, I accessed a Kubernetes cluster node via SSH, deployed a simple MongoDB Docker image there, and tried to connect to the remote MongoDB server from the command line, but the problem is the same: a timeout when trying to connect.
I have also checked the firewall settings on GCP as well as the bindIp setting on the MongoDB server, and nothing is blocking there.
Does anyone know what may be happening? Thank you very much.
In my case, traffic from GKE to the GCE VM was blocked by the Google firewall even though both are in the same network (default).
I had to whitelist the cluster pod network listed in the cluster details:
Pod address range 10.8.0.0/14
https://console.cloud.google.com/kubernetes/list
https://console.cloud.google.com/networking/firewalls/list
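A rough sketch of the corresponding rule with gcloud (the rule name is made up; the network and pod range are the ones mentioned above, so adjust them to your cluster):
gcloud compute firewall-rules create allow-gke-pods \
    --network=default \
    --source-ranges=10.8.0.0/14 \
    --allow=tcp,udp,icmp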
By default, containers in a GKE cluster should be able to access GCE VMs in the same VPC through internal IPs. It is just like accessing the internet (e.g., google.com) from GKE containers: GKE and the VPC know how to route the traffic. The problem must be with some other configuration (firewall or your application).
You can do a test: start a simple HTTP server on the GCE VM, say with internal IP 10.138.0.5:
python3 -m http.server 8080   # python -m SimpleHTTPServer 8080 on Python 2
then create a GKE container and try to access the service:
kubectl run my-client -it --rm --restart=Never --image=tutum/curl -- curl http://10.138.0.5:8080

Kubernetes service ip isn't always accessible within cluster (with flannel)

I built a Kubernetes cluster using the flannel overlay network. The problem is that one of the service IPs isn't always accessible.
I tested from within the cluster by telnetting to the service IP and port, and it ended in a connection timeout. Checking with netstat, the connection was always in the "SYN_SENT" state; it seemed the peer didn't accept the connection.
But if I telnet directly to the IP and port of a pod backing the service, the connection succeeds.
It only happens to one of the services; the other services are fine.
And if I scale the backend pods to a larger number, like 2, then some of the requests to the service IP succeed. It seems the service isn't able to connect to one of the backing pods.
Which component could be the cause of such a problem? My service configuration, kube-proxy, or flannel?
Check the discussion here: https://github.com/kubernetes/kubernetes/issues/38802
You need to set net.bridge.bridge-nf-call-iptables=1 via sysctl on the nodes.
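To apply and persist that setting on each node, something like this (run as root; the br_netfilter module must be loaded for the sysctl to exist):
modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1
echo 'net.bridge.bridge-nf-call-iptables = 1' > /etc/sysctl.d/99-kubernetes.conf
sysctl --system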