Intermittent timeout issues when communicating between EKS and RDS (PostgreSQL)

I would like to know the cause of, or a solution for, a timeout that occurs in EKS.
When a pod in EKS communicates with other AWS resources, such as RDS or MSK, a timeout occurs intermittently.
There's not much traffic.
I am asking because this is something I have not seen when using ECS or EC2, only EKS.
All resources are in the same VPC and connections can be established, but they periodically time out.
The RDS error log shows:
Could not receive data from client: Connection reset by peer
I followed this link, but it didn't help:
https://aws.amazon.com/ko/premiumsupport/knowledge-center/rds-aurora-postgresql-connection-errors/
What am I doing wrong?
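For reference, below is a minimal repro sketch, assuming the psycopg2 driver and placeholder connection details (the RDS endpoint, credentials, and keepalive values are illustrative, not taken from the question). It runs a trivial query in a loop so failures can be timestamped and correlated with the RDS log, and it enables libpq's TCP keepalive options, a common mitigation when idle connections are silently dropped by NAT or conntrack along the path:

import time
import psycopg2  # assuming a Python PostgreSQL client; adjust for your stack

# Placeholder endpoint and credentials -- not taken from the question.
CONN_PARAMS = dict(
    host="my-db.xxxxxxxx.rds.amazonaws.com",
    port=5432,
    dbname="postgres",
    user="app",
    password="***",
    connect_timeout=5,      # fail fast instead of hanging
    keepalives=1,           # libpq TCP keepalive options
    keepalives_idle=30,
    keepalives_interval=10,
    keepalives_count=3,
)

while True:
    started = time.time()
    try:
        conn = psycopg2.connect(**CONN_PARAMS)
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
        cur.close()
        conn.close()
        print(f"{time.strftime('%H:%M:%S')}  ok    {time.time() - started:6.2f}s")
    except psycopg2.OperationalError as exc:
        # Timestamps of these failures can be correlated with the RDS log entries.
        print(f"{time.strftime('%H:%M:%S')}  FAIL  {time.time() - started:6.2f}s  {exc}")
    time.sleep(5)

If enabling keepalives makes the resets go away, the drops are likely happening on an idle path between the pod and RDS (for example an SNAT or conntrack timeout) rather than in the database itself.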

Related

Issues with outbound connections from pods on GKE cluster with NAT (and router)

I'm trying to investigate an issue with random 'Connection reset by peer' errors and long (up to 2 minutes) PDO connection initializations, but I'm failing to find a solution.
Similar issue: https://kubernetes.io/blog/2019/03/29/kube-proxy-subtleties-debugging-an-intermittent-connection-reset/, but that is supposed to be fixed in the version of Kubernetes that I'm running.
GKE config details:
GKE is running version 1.20.12-gke.1500, with a NAT network configuration and a router. The cluster has 2 nodes, and the router has 2 static IPs assigned, with dynamic port allocation and a range of 32728-65536 ports per VM.
On the Kubernetes side:
deployments: a Docker image with local nginx, php-fpm, and the Cloud SQL proxy
services: a LoadBalancer to expose the deployment
To reproduce the issue, I created a simple script that connects to the database in a loop and runs a simple count query. I ruled out issues with the database server by testing the script on a standalone GCE VM, where I didn't get any errors. When I run the script on any of the application pods in the cluster, I get random 'Connection reset by peer' errors. I have tested the script both through the Cloud SQL proxy service and against the direct database IP, with the same random connection issues.
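A rough equivalent of that test script, sketched in Python rather than PHP/PDO (the host, credentials, table name, and MySQL engine are assumptions for illustration), connects in a loop, runs a count query, and prints how long each attempt took so that resets and slow connection setups stand out:

import time
import pymysql  # assuming Cloud SQL for MySQL; use psycopg2 instead for PostgreSQL

while True:
    started = time.time()
    try:
        conn = pymysql.connect(
            host="127.0.0.1",        # Cloud SQL proxy sidecar, or the direct DB IP
            port=3306,
            user="app",
            password="***",
            database="appdb",
            connect_timeout=10,
        )
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM items")   # placeholder table
            rows = cur.fetchone()[0]
        conn.close()
        print(f"ok    {time.time() - started:6.2f}s  count={rows}")
    except pymysql.MySQLError as exc:
        # 'Connection reset by peer' shows up here as an OperationalError.
        print(f"FAIL  {time.time() - started:6.2f}s  {exc}")
    time.sleep(1)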
Any help would be appreciated.
Update
On https://cloud.google.com/kubernetes-engine/docs/release-notes I can see that a fix has been released that may address what I'm seeing: "The following GKE versions fix a known issue in which random TCP connection resets might happen for GKE nodes that use Container-Optimized OS with Docker (cos). To fix the issue, upgrade your nodes to any of these versions:"
I'm updating the nodes this evening, so I hope that will solve the issue.
Update
Updating the nodes solved the random connection resets.
Upgrading the cluster and nodes to version 1.20.15-gke.3400 via the Google Cloud console resolved the issue.

EKS internal service connection unreliable

I just set up a new EKS cluster (latest version available, using three nodes with the default AMI).
I deployed a Redis instance in it as a Kubernetes service and exposed it. I can access the Redis database through internal DNS, e.g. mydatabase.redis (it's deployed in the redis namespace). From another pod I can connect to my Redis database; however, sometimes the connection takes more than 10 seconds.
It doesn't seem to be a DNS resolution issue, as host mydatabase.redis responds immediately with the service IP address. However, when I try to connect to it, for example with nc mydatabase.redis 6379 -v, it sometimes connects instantly and sometimes takes more than 10 seconds.
All my services are impacted, and I don't know why. I didn't change any settings in my cluster; this is a basic EKS cluster.
How can I debug this?
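One way to narrow it down is to time DNS resolution and the raw TCP connect separately, in a loop, bypassing any Redis client logic. The service name and port below are taken from the question; everything else is illustrative:

import socket
import time

HOST, PORT = "mydatabase.redis", 6379   # service name and port from the question

while True:
    t0 = time.time()
    try:
        ip = socket.gethostbyname(HOST)                           # DNS resolution
        t1 = time.time()
        socket.create_connection((ip, PORT), timeout=15).close()  # raw TCP connect
        t2 = time.time()
        print(f"dns={t1 - t0:5.2f}s  connect={t2 - t1:5.2f}s  ok ({ip})")
    except OSError as exc:
        print(f"after {time.time() - t0:5.2f}s: {exc}")
    time.sleep(1)

If the slow attempts cluster around roughly 1, 3, or 7 seconds, that pattern is consistent with SYN retransmission backoff, which points to dropped packets (for example a conntrack/SNAT race) rather than DNS.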

How to establish connectivity in AWS EKS/Fargate hybrid mode (app on Fargate, db on EKS worker node)

I'm running a hybrid EKS cluster where I'm trying to use AWS Fargate for some of my workload. From what I know, AWS Fargate can be used for stateless pods, which makes it natural that, for a standard app/db scenario, you would have to use a hybrid mode where the app runs on Fargate while the db runs on one of the EKS worker nodes.
The problem I see is that the app cannot communicate with the db in this case.
Now, am I right to conclude that a workload on Fargate can be reached from outside Fargate only when using an ALB ingress in front of it?
If that is true, it still wouldn't solve this problem, since the app (on Fargate) needs to connect to the db (running on the EKS worker nodes), not the other way around. I guess this could be solved by putting an ALB ingress in front of the db, but that seems like overkill to me.
Is there any other way around this problem?
Thanks.
Do you really need the db running on EKS? If you do, I think you can create a Kubernetes Service (ClusterIP) for your db pods so that your application pods can access it without too much effort.
On the other hand, I would just go with an RDS instance and create a security group there to allow access from your EKS Fargate pods.
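As a sanity check for the first suggestion, something like the following, run from the Fargate app pod, would confirm that the db is reachable through an ordinary ClusterIP Service with no ALB in front of it (the Service name, namespace, and port are placeholders):

import socket

DB_HOST = "db.database.svc.cluster.local"   # <service>.<namespace>.svc.cluster.local
DB_PORT = 5432                              # e.g. PostgreSQL; use your db's port

try:
    socket.create_connection((DB_HOST, DB_PORT), timeout=5).close()
    print(f"reachable: {DB_HOST}:{DB_PORT}")
except OSError as exc:
    # If this fails, check the worker-node security group: it must allow inbound
    # traffic on the db port from the Fargate pods' security group.
    print(f"not reachable: {exc}")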

Kubernetes installation failing with on-premise HAProxy load balancer

I am currently trying to set up a high-availability Kubernetes cluster with 4 worker nodes and 3 masters on on-premise servers. I learned about implementing a high-availability cluster from the documentation. I have installed HAProxy 1.8.3.
While deploying Kubernetes using kubeadm, the installation fails with an API server timeout error.
When I checked a telnet connection from a master to the HAProxy IP on port 6443, I got "Connection closed by foreign host."
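One way to isolate this is to try a TCP connect to the HAProxy frontend and to each master's kube-apiserver directly; the addresses below are placeholders. If a master accepts connections on 6443 but the HAProxy address does not, the problem is most likely in the HAProxy configuration (for example, the 6443 frontend/backend not running in TCP mode, or the backends failing health checks) rather than in kubeadm itself:

import socket

TARGETS = {
    "haproxy": ("10.0.0.10", 6443),   # placeholder HAProxy VIP
    "master1": ("10.0.0.11", 6443),   # placeholder master IPs
    "master2": ("10.0.0.12", 6443),
    "master3": ("10.0.0.13", 6443),
}

for name, (host, port) in TARGETS.items():
    try:
        socket.create_connection((host, port), timeout=3).close()
        print(f"{name:8s} {host}:{port}  open")
    except OSError as exc:
        print(f"{name:8s} {host}:{port}  {exc}")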
Can anyone help me with this problem?

Accessing GCP Memorystore from Kubernetes

I'm trying to connect to Google Cloud Memorystore from a Kubernetes pod, but I always get a connection timeout error.
After investigation I found the following:
when I try to connect to Redis from a pod scheduled on the normal node pool, it works fine,
but when I try to connect to Redis from a pod scheduled on the preemptible node pool, it fails with a connection timeout error.
So how can I solve this problem?
It's a bit hard to give an answer with the little information you gave; we don't know anything about your cluster's configuration.
Not sure if I'm totally wrong, but this may help.
Whether a node is normal or preemptible should not have any effect on network connections if the nodes are on the same network. What could cause this for GKE pods is that Memorystore works by creating a VPC peering, and GKE works in a somewhat similar way, which can prevent Memorystore and the pods from talking to one another, since two peerings can't exchange traffic with each other (VPC peering is not transitive).
What should be done in this case is to use IP aliasing (a VPC-native cluster) when creating the GKE cluster: https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips
Hope this can help you.