I am currently trying to set up a high-availability Kubernetes cluster with 4 worker nodes and 3 masters on on-premise servers. I learned how to implement a high-availability cluster from the documentation. I have installed HAProxy 1.8.3.
While deploying Kubernetes using kubeadm, the installation fails with an API server timeout error.
When I check a telnet connection from a master to the HAProxy IP on port 6443, I get 'Connection closed by foreign host.'
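For context, a minimal sketch of the kind of TCP passthrough HAProxy config the HA guide describes (the backend IPs are placeholders, not my exact file):

# Sketch of a TCP passthrough config for the kube-apiserver; IPs are placeholders
frontend kubernetes-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-masters

backend kubernetes-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 192.168.0.11:6443 check
    server master2 192.168.0.12:6443 check
    server master3 192.168.0.13:6443 check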
Can anyone help me with this problem?
I'm trying to investigate an issue with random 'Connection reset by peer' errors, or long (up to 2 minutes) PDO connection initializations, but I'm failing to find a solution.
Similar issue: https://kubernetes.io/blog/2019/03/29/kube-proxy-subtleties-debugging-an-intermittent-connection-reset/, but that is supposed to be fixed in the version of Kubernetes that I'm running.
GKE config details:
GKE is running version 1.20.12-gke.1500, with a NAT network configuration and a router. The cluster has 2 nodes, and the router has 2 static IPs assigned, with dynamic port allocation and a range of 32728-65536 ports per VM.
On the kubernetes:
deployments: a Docker image with local nginx, php-fpm, and the Cloud SQL proxy
services: LoadBalancer to expose the deployment
To replicate the issue, I created a simple script that connects to the database in a loop and runs a simple count query. I ruled out issues with the database server by running the script on a standalone GCE VM, where I didn't get any errors. When I run the script on any of the application pods in the cluster, I get random 'Connection reset by peer' errors. I have tested the script both through the Google SQL proxy service and with the direct database IP, with the same random connection issues.
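The script itself is PHP/PDO, but a shell loop of the same shape reproduces it just as well (a sketch assuming a MySQL-flavoured Cloud SQL instance reached via the proxy; host, credentials, and table name are placeholders):

# Repeatedly run a COUNT query; failures are timestamped so resets are easy to spot
while true; do
  mysql -h 127.0.0.1 -P 3306 -u app_user -p'app_pass' app_db \
    -e 'SELECT COUNT(*) FROM some_table;' || echo "$(date) connection failed"
  sleep 1
done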
Any help would be appreciated.
Update
On https://cloud.google.com/kubernetes-engine/docs/release-notes I can see that a fix has been released that may address exactly what I'm hitting: "The following GKE versions fix a known issue in which random TCP connection resets might happen for GKE nodes that use Container-Optimized OS with Docker (cos). To fix the issue, upgrade your nodes to any of these versions:"
I'm updating the nodes this evening, so I hope that will solve the issue.
Update
The node update solved the random connection resets.
Updating the cluster and nodes to version 1.20.15-gke.3400 using the Google Cloud console resolved the issue.
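I did it through the console, but the equivalent gcloud commands would look roughly like this (cluster name, node pool, and zone are placeholders):

# Upgrade the control plane first, then the node pool
gcloud container clusters upgrade my-cluster --zone us-central1-a --master --cluster-version 1.20.15-gke.3400
gcloud container clusters upgrade my-cluster --zone us-central1-a --node-pool default-pool --cluster-version 1.20.15-gke.3400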
My setup (running locally in two minikube instances) is two k8s clusters:
a frontend cluster running a golang api-server,
a backend cluster running an HA Bitnami Postgres cluster (I used the bitnami postgresql-ha chart for this)
If I set the pgpool service to use a NodePort and grab the IP + port of the node that the pgpool pod is running on, I can hardwire this (host + port) into my database connector in the api-server (in the other cluster), and that works.
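For reference, this is roughly how that host and port can be pulled out (the minikube profile and service names are assumptions about what the chart and profiles are called):

# IP of the backend minikube node (profile name is a placeholder)
minikube -p backend ip
# NodePort assigned to the pgpool service (name depends on the Helm release name)
kubectl get svc my-release-postgresql-ha-pgpool -o jsonpath='{.spec.ports[0].nodePort}'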
However, what I haven't been able to figure out is how to connect to the other cluster (e.g. to pgpool) generically, without using the IP address.
I also tried Skupper, which has an example of connecting to a backend cluster with Postgres running on it, but their example doesn't use the Bitnami HA Postgres Helm chart, just a simple Postgres install, so it is not the same at all.
Any ideas?
For those times when you have to, or purposely want to, connect pods/deployments across multiple clusters, Nethopper (https://www.nethopper.io/) is a simple and secure solution. The postgresql-ha scenario above is covered under their free tier. There is a two-cluster minikube 'how to' tutorial at https://www.nethopper.io/connect2clusters which is very similar to your frontend/backend use case. Nethopper is based on skupper.io, but the configuration is much easier and more user-friendly, and it is centralized, so it scales to many clusters if you need it to.
To solve your specific use case, you would:
First install your api server in the frontend and your bitnami postgresql-ha chart in the backend, as you normally would.
Go to https://mynethopper.com/ and
Register
Clouds -> define both clusters (clouds), frontend and backend
Application Network -> create an application network
Application Network -> attach both clusters to the network
Application Network -> install nethopper-agent in each cluster with copy paste instructions.
Objects -> import and expose pgpool (call the service 'pgpool') in your backend.
Objects -> distribute the service 'pgpool' to frontend, using a distribution rule.
Now you should see the 'pgpool' service in the frontend cluster:
kubectl get service
When the API server pods in the frontend request service from pgpool, they will connect to pgpool in the backend, magically. It's like the 'pgpool' pod is now running in the frontend.
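A quick way to sanity-check that from the frontend cluster (the image, user, and password here are placeholders, not part of the Nethopper setup):

kubectl run pg-check -it --rm --restart=Never --image=postgres \
  --env="PGPASSWORD=secretpassword" -- psql -h pgpool -U postgres -c 'SELECT 1'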
The nethopper part should only take 5-10 minutes, and you do NOT need IP addresses, TLS certs, K8s ingresses or loadbalancers, a VPN, or an istio service mesh or sidecars.
After moving to a one-cluster architecture, it became easier to see how to connect to the Bitnami postgresql-ha cluster; after trying a few different things, this finally worked:
-postgresql-ha-postgresql-headless:5432
(that's the host and port I'm using to call from my golang server)
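One way to see what that headless service resolves to, i.e. the individual Postgres pod IPs (the <release> prefix is a placeholder for whatever your Helm release name is):

kubectl run dns-check -it --rm --restart=Never --image=busybox -- \
  nslookup <release>-postgresql-ha-postgresql-headless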
Now I believe it should be fairly straightforward to also run the two-cluster case, using Skupper to bind to the headless service.
I am trying to convert a single-master setup to a 3-master HA setup. The load balancer is configured to accept traffic on port 443, and one of its backends is the single master's IP on port 443. The connectivity between the load balancer and the k8s nodes is verified.
It worked fine before, when the server in my ~/.kube/config used the IP of the master node. But when I tried to change it to the dedicated load balancer DNS name, it throws an x509 'cannot validate certificate' error. We are using DigiCert; do I need to put the certificate somewhere in the kube-apiserver or kubelet?
I am using https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ as a guide. This is the init command I ran:
sudo kubeadm init --control-plane-endpoint companykube.com:443 --kubernetes-version v1.21.2 --pod-network-cidr 10.244.0.0/16 --cri-socket /run/containerd/containerd.sock --upload-certs --apiserver-cert-extra-sans=companykube.com,singlemasterIp,2ndmasterIP,3rdmasterIP --apiserver-bind-port=443
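For what it's worth, a quick way to check which names the certificate presented at that endpoint actually covers (a diagnostic only; the endpoint name is taken from the init command above):

openssl s_client -connect companykube.com:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'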
There are a lot of steps and aspects to creating a multi-master cluster, and you have described very few of them.
Usually a multi-master (and always an HA) setup contains 3 masters with a properly configured etcd.
Please check Install and configure a multi-master Kubernetes cluster with kubeadm - I used that article to create a multi-master cluster.
Your steps:
Installing the HAProxy load balancer
Generating the TLS certificates and distributing them
Creating the certificate for the Etcd cluster and distributing them
Installing and configuring Etcd
Initializing the master nodes (see the kubeadm sketch after this list)
Configuring kubectl on the client machine(kubeconfig you are asking for)
Deploying the overlay network
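For the 'initializing the master nodes' step, a minimal kubeadm sketch looks roughly like this (the load balancer DNS name, token, hash, and certificate key are placeholders):

# On the first master, point the control-plane endpoint at the load balancer
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs

# On the other masters, run the control-plane join command printed by the init above
kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>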
I also highly recommend checking Kubernetes High Availability: No Single Point of Failure - it's about HA in real life. Maybe you need HA rather than just multi-master to ensure your cluster is always accessible?
Unfortunately, the problem is on the load balancer side. The traffic is not going through from the front end to the back end.
I know the information below is not enough to trace the issue, but I would still like some guidance.
We have an Amazon EKS cluster.
Currently, we are facing a reachability issue with the Kafka pods.
Environment:
10 nodes in total, across availability zones ap-south-1a and ap-south-1b
A three-replica Kafka cluster (Helm chart installation)
A three-replica ZooKeeper ensemble (Helm chart installation)
Kafka uses an external advertised listener on port 19092
Kafka has a service with an internal network load balancer
A test pod deployed to check reachability of the Kafka pods
We are using Cloud Map-based DNS for the advertised listener
Working:
When I run a telnet command from EC2, like telnet 10.0.1.45 19092, it works as expected. IP 10.0.1.45 is a load balancer IP.
When I run a telnet command from EC2, like telnet 10.0.1.69 31899, it works as expected. IP 10.0.1.69 is an actual node's IP and 31899 is the NodePort.
Problem:
When I run the same command from the test pod, like telnet 10.0.1.45 19092, it sometimes works and sometimes gives an error like telnet: Unable to connect to remote host: Connection timed out.
The issue seems to be related to kube-proxy. We need help to resolve it.
Can anyone help to guide me?
Can I restart kube-proxy? Does it affect other pods/deployments?
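For context, kube-proxy runs as a DaemonSet in kube-system, so 'restarting' it would mean deleting its pods and letting the DaemonSet recreate them (the label below is the standard one on EKS; that label is an assumption I have not verified on our cluster):

# kube-proxy pods are recreated by the DaemonSet; application pods are not touched
kubectl -n kube-system delete pod -l k8s-app=kube-proxy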
I believe this problem is caused by AWS's NLB TCP-only nature (as mentioned in the comments).
In a nutshell, your pod-to-pod communication fails when hairpin is needed.
To confirm this is the root cause, you can verify that when the telnet works, the Kafka pod and the client pod are not on the same EC2 node, and that when they are on the same node, the telnet fails.
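A quick way to check that placement (the pod name patterns are assumptions; adjust them to your actual names):

# Show which node each pod landed on
kubectl get pods -o wide | grep -E 'kafka|test-pod'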
There are (at least) two approaches to tackle this issue:
Use K8s internal networking - Refer to k8s Service's URL
Every K8s service has its own DNS FQDN for internal use (meaning it uses the k8s network only, without going out to the LoadBalancer and coming back into k8s). You can just telnet to this instead of going through the NodePort via the LB.
I.e., let's assume your Kafka service is named kafka. Then you can just telnet kafka.<namespace>.svc.cluster.local (on the port exposed by the kafka service).
Use K8s pod anti-affinity to make sure the client and Kafka are never scheduled on the same node (see the sketch below).
Oh and as indicated in this answer you might need to make that service headless.
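A minimal sketch of what that anti-affinity could look like in the client pod spec (the app: kafka label is an assumption about how your chart labels the broker pods; match whatever labels they actually carry):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: kafka        # assumed broker pod label
      topologyKey: kubernetes.io/hostname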
I am new to Google Cloud Platform, and here is my context:
I have one Compute Engine VM running as a MongoDB server and another Compute Engine VM running as a NodeJS server, already using Docker. The NodeJS application connects to Mongo via the default VPC internal IP. Now I'm trying to migrate the NodeJS application to Google Kubernetes Engine, but I can't connect to the MongoDB server when I deploy the NodeJS application's Docker image to the cluster.
All services (GCE and GKE) are in the same region (us-east1).
As a hard test, I accessed a Kubernetes cluster node via SSH, deployed a simple MongoDB Docker image, and tried to connect to the remote MongoDB server from the command line, but the problem is the same: a timeout when trying to connect.
I have also checked the firewall settings on GCP, as well as the bindIp setting on the MongoDB server, and nothing is blocking there.
Does anyone know what may be happening? Thank you very much.
In my case, traffic from GKE to the GCE VM was blocked by the Google firewall, even though both are in the same network (default).
I had to whitelist the cluster's pod network, listed in the cluster details:
Pod address range 10.8.0.0/14
https://console.cloud.google.com/kubernetes/list
https://console.cloud.google.com/networking/firewalls/list
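A hedged example of the kind of rule I mean (the rule name is arbitrary, the port assumes MongoDB's default 27017, and the source range should match your own pod address range):

gcloud compute firewall-rules create allow-gke-pods-to-mongo \
  --network default --direction INGRESS --action ALLOW \
  --rules tcp:27017 --source-ranges 10.8.0.0/14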
By default, containers in a GKE cluster should be able to access GCE VMs in the same VPC through internal IPs. It is just like accessing the internet (e.g., google.com) from GKE containers: GKE and the VPC know how to route the traffic. The problem must be in some other configuration (the firewall or your application).
You can do a test: start a simple HTTP server on the GCE VM (say its internal IP is 10.138.0.5):
python -m SimpleHTTPServer 8080    # Python 2; with Python 3 use: python3 -m http.server 8080
then create a GKE container and try to access the service:
kubectl run my-client -it --rm --restart=Never --image=tutum/curl -- curl http://10.138.0.5:8080