How to set up a health check for a celery-sqs cluster on AWS - celery

I'm trying to set up an ELB for my Celery cluster that uses SQS as the broker. The instance is only running Celery, not a web server.
How can I go about setting up an ELB health check for the availability of the cluster? I've seen tutorials online suggesting a TCP check on port 5672 for a Celery cluster with RabbitMQ, but since I'm using SQS here, which works over HTTPS, I'm not quite sure how to go about this.
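One common approach (an assumption on my part, not something from this thread): since the worker instance doesn't serve HTTP, run a tiny health endpoint on each instance that shells out to celery inspect ping, and point an ELB/ALB HTTP health check at it. A minimal sketch; the Celery app module name proj and the port are placeholders for your project:

import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "celery inspect ping" exits non-zero if no worker replies.
        result = subprocess.run(
            ["celery", "-A", "proj", "inspect", "ping"],
            capture_output=True,
        )
        status = 200 if result.returncode == 0 else 503
        self.send_response(status)
        self.end_headers()
        self.wfile.write(b"ok" if status == 200 else b"unhealthy")

if __name__ == "__main__":
    # Point the load balancer's HTTP health check at port 8080, path "/".
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()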

Related

How to connect to a k8s cluster with bitnami postgresql-ha deployed?

My setup (running locally in two minikube instances) is two k8s clusters:
frontend cluster is running a golang api-server,
backend cluster is running an HA bitnami postgres cluster (used the bitnami postgresql-ha chart for this)
If I set the pgpool service to use NodePort and get the IP + port of the node the pgpool pod is running on, I can hardwire this (host + port) into my database connector in the api-server (in the other cluster), and it works.
However, what I haven't been able to figure out is how to connect generically to the other cluster (e.g. to pgpool) without using the IP address.
I also tried Skupper, which also has an example of connecting to a backend cluster with postgres running on it, but their example doesn't use the bitnami HA postgres helm chart, just a simple postgres install, so it is not at all the same.
Any ideas?
For those times when you have to, or purposely want to, connect pods/deployments across multiple clusters, Nethopper (https://www.nethopper.io/) is a simple and secure solution. The postgresql-ha scenario above is covered under their free tier. There is a two-cluster minikube 'how to' tutorial at https://www.nethopper.io/connect2clusters which is very similar to your frontend/backend use case. Nethopper is based on skupper.io, but the configuration is much easier and more user friendly, and it is centralized, so it scales to many clusters if you need to.
To solve your specific use case, you would:
First install your api server in the frontend and your bitnami postgresql-ha chart in the backend, as you normally would.
Go to https://mynethopper.com/ and
Register
Clouds -> define both clusters (clouds), frontend and backend
Application Network -> create an application network
Application Network -> attach both clusters to the network
Application Network -> install nethopper-agent in each cluster with copy paste instructions.
Objects -> import and expose pgpool (call the service 'pgpool') in your backend.
Objects -> distribute the service 'pgpool' to frontend, using a distribution rule.
Now you should see the 'pgpool' service in the frontend cluster:
kubectl get service
When the API server pods in the frontend request service from pgpool, they will connect to pgpool in the backend, magically. It's like the 'pgpool' pod is now running in the frontend.
The nethopper part should only take 5-10 minutes, and you do NOT need IP addresses, TLS certs, K8s ingresses or loadbalancers, a VPN, or an istio service mesh or sidecars.
After moving to a one-cluster architecture, it became easier to see how to connect to the bitnami postgres-ha cluster. After trying a few different things, this finally worked:
-postgresql-ha-postgresql-headless:5432
(that's the host and port I'm using to call it from my golang server)
Now I believe it should be fairly straightforward to also handle the two-cluster case by using Skupper to bind to the headless service.
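For illustration, here is a minimal connectivity check against that host/port, shown in Python with psycopg2 rather than Go purely for brevity; the database name and credentials are assumptions:

import psycopg2

# Connect to the headless service reported above. The leading dash suggests the
# Helm release-name prefix was omitted, so prepend your release name if needed.
conn = psycopg2.connect(
    host="-postgresql-ha-postgresql-headless",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="change-me",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
conn.close()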

Kubernetes: kafka pod reachability issue from another pod

I know the information below is not enough to trace the issue, but I still want some solution.
We have an Amazon EKS cluster.
Currently, we are facing a reachability issue with the Kafka pod.
Environment:
10 nodes in total, across availability zones ap-south-1a and ap-south-1b
A three-replica Kafka cluster (Helm chart installation)
A three-replica ZooKeeper cluster (Helm chart installation)
Kafka uses an external advertised listener on port 19092
Kafka has a service with an internal network load balancer
I have deployed a test pod to check reachability of the Kafka pod.
We are using Cloud Map-based DNS for the advertised listener
Working:
When I run a telnet command from EC2, like telnet 10.0.1.45 19092, it works as expected. IP 10.0.1.45 is a load balancer IP.
When I run a telnet command from EC2, like telnet 10.0.1.69 31899, it works as expected. IP 10.0.1.69 is an actual node's IP and 31899 is the NodePort.
Problem:
When I run the same command from the test pod, like telnet 10.0.1.45 19092, it sometimes works and sometimes gives an error like telnet: Unable to connect to remote host: Connection timed out
The issue seems to be something related to kube-proxy; we need help resolving it.
Can anyone help guide me?
Can I restart kube-proxy? Does it affect other pods/deployments?
I believe this problem is caused by AWS's NLB TCP-only nature (as mentioned in the comments).
In a nutshell, your pod-to-pod communication fails when hairpin is needed.
To confirm this is the root cause, you can verify that when the telnet works, the Kafka pod and the client pod are not on the same EC2 node, and that when they are on the same node, the telnet fails.
There are (at least) two approaches to tackle this issue:
Use K8s internal networking - refer to the k8s Service's DNS name
Every K8s service has its own DNS FQDN for internal use (meaning it stays on the k8s network only, without going out to the LoadBalancer and coming back into k8s again). You can just telnet to this name instead of going through the NodePort via the LB.
I.e. let's assume your kafka service is named kafka. Then you can just telnet kafka.<namespace>.svc.cluster.local (on the port exposed by the kafka service).
Use K8s anti-affinity to make sure the client and kafka are never scheduled on the same node.
Oh and as indicated in this answer you might need to make that service headless.
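To sanity-check the first approach, here is a minimal sketch of a TCP reachability test comparable to the telnet commands above; the service name, namespace, and ports are assumptions for your environment:

import socket

def can_connect(host, port, timeout=5):
    # Attempt a plain TCP connection, like telnet, and report any failure.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"{host}:{port} -> {exc}")
        return False

# In-cluster DNS name of the kafka service (no hairpin through the NLB):
print(can_connect("kafka.default.svc.cluster.local", 9092))
# Via the internal NLB, which may time out when hairpinning is needed:
print(can_connect("10.0.1.45", 19092))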

How to scale RabbitMQ across multiple Kubernetes Clusters

I have an application running in an Azure AKS Kubernetes cluster, made up of a website running in one deployment, background worker processes running as scheduled tasks in Kubernetes, RabbitMQ running as another deployment, and an Azure SQL DB which is not part of Kubernetes.
I would like to achieve load balancing and failover by deploying another Kubernetes cluster in another region and placing a Traffic Manager DNS load balancer in front of the website.
The problem I see is that if the two RabbitMQ instances are in separate Kubernetes clusters, items queued in one will not be available in the other.
Is there a way to cluster the RabbitMQ instances running in each Kubernetes cluster, or something besides clustering?
Or is there a common design pattern that might avoid problems from having separate queues?
I should also note that currently there is only one node running RabbitMQ in the current Kubernetes cluster, but as part of this upgrade it seems like a good idea to run multiple nodes in each cluster, which I think the current Helm charts support.
You shouldn't cluster RabbitMQ nodes across regions. Your cluster will get split-brain because of network delays. To synchronise RabbitMQ queues and exchanges between clusters you can use the federation or shovel plugins, depending on your use case.
The federation plugin can be enabled on a cluster by running these commands:
rabbitmqctl stop_app
rabbitmq-plugins enable rabbitmq_federation
rabbitmq-plugins enable rabbitmq_federation_management
rabbitmqctl start_app
More details on Federation.
For shovel:
rabbitmq-plugins enable rabbitmq_shovel
rabbitmq-plugins enable rabbitmq_shovel_management
rabbitmqctl stop_app
rabbitmqctl start_app
More details on Shovel.
A full example of how to set up Federation on a RabbitMQ cluster can be found here.
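Enabling the plugins is only the first step; you still need to define an upstream and a matching policy (see the linked docs). Once that is done, one way to verify the links are running is via the management API enabled above. A minimal sketch; the host, port and credentials are assumptions:

import requests

# Query the federation links reported by the management plugin on the
# downstream cluster. Replace host, port and credentials with your own.
resp = requests.get(
    "http://rabbitmq-downstream.example.com:15672/api/federation-links",
    auth=("guest", "guest"),
    timeout=10,
)
resp.raise_for_status()
for link in resp.json():
    # Each entry describes one upstream link and whether it is running.
    print(link.get("upstream"), link.get("type"), link.get("status"))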

Cannot connect to kafka connect cluster running on AWS from outside EC2

I have an ECS cluster with 3 EC2 instances all sitting in private subnets. I created a task definition to run the kafka-connect image provided by Confluent with the following environment variables:
CONNECT_CONFIG_STORAGE_TOPIC=quickstart-config
CONNECT_GROUP_ID=quickstart
CONNECT_INTERNAL_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter
CONNECT_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter
CONNECT_OFFSET_STORAGE_TOPIC=quickstart-offsets
CONNECT_PLUGIN_PATH=/usr/share/java
CONNECT_REST_ADVERTISED_HOST_NAME=localhost
CONNECT_REST_ADVERTISED_PORT=8083
CONNECT_SECURITY_PROTOCOL=SSL
CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM=
CONNECT_STATUS_STORAGE_TOPIC=quickstart-status
CONNECT_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter
I have an application load balancer in front of this cluster with a listener on port 8083. I have correctly set up the target group to include the EC2 instances running kafka-connect, so the load balancer should forward requests to the cluster. And it does, but I always get back a 502 Bad Gateway response. I can ssh into the EC2 instances and curl localhost:8083 and get the response back from kafka-connect, but from outside the EC2 instances, I don't get a response.
To rule out networking issues between the load balancer and the cluster, I created a separate task definition running Nginx on port 80, and I'm able to successfully hit it from outside the EC2 instances through the load balancer.
I have a feeling that I have not set CONNECT_REST_ADVERTISED_HOST_NAME to the correct value. It's my understanding that this is the host clients should connect to. However, because my EC2 instances are in a private subnet, I have no idea what to set this to, which is why I've set it to localhost. I tried setting it to the load balancer's DNS name, but that doesn't work.
You need to set CONNECT_REST_ADVERTISED_HOST_NAME to the host or IP that the other Kafka Connect workers can resolve and connect to.
It's used for internal communication between workers. If your REST request (via your load balancer) hits a worker that is not the current leader of the cluster, that worker will forward the request to the leader using CONNECT_REST_ADVERTISED_HOST_NAME. If that value is localhost, the worker will simply forward the request to itself, and hence things won't work.
For more details see https://rmoff.net/2019/11/22/common-mistakes-made-when-configuring-multiple-kafka-connect-workers/
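As an illustration of picking a resolvable value (an assumption, not part of the answer above): one option is to advertise the EC2 instance's private IP, which the other workers in the same VPC can reach, provided the Connect REST port is mapped to the host. A minimal sketch that could run from an entrypoint wrapper before the worker starts, assuming the classic IMDSv1 metadata endpoint is reachable:

import os
import urllib.request

# Standard EC2 instance metadata endpoint for the instance's private IPv4 address.
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"

with urllib.request.urlopen(METADATA_URL, timeout=2) as resp:
    private_ip = resp.read().decode().strip()

# Export the value for the Connect worker process launched by this wrapper.
os.environ["CONNECT_REST_ADVERTISED_HOST_NAME"] = private_ip
print(f"CONNECT_REST_ADVERTISED_HOST_NAME={private_ip}")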

Connecting RabbitMQ | Pika | Kubernetes

I have a Kubernetes cluster running. Inside it I have a Python application running as a pod.
This app talks to RabbitMQ, which is also a pod in the same cluster, so the whole connection is done using internal IPs.
I have a service for RabbitMQ, so I am providing the service IP for the connection. It looks something like this (with pika):
credentials = pika.PlainCredentials(rabbituser, rabbitpass)
connection = pika.BlockingConnection(pika.ConnectionParameters(host=QHost, port=80, credentials=credentials))
channel = connection.channel()
As you can see, the connection I am trying to make is to port 80, which is open on the k8s service.
The service should forward this to the respective pod, which is rabbitmq, but it is trying to connect to port 5672.
That means the error is because it is trying to connect to service-ip:5672, which is not there.
I want it to forward the request to the rabbitmq pod that the service points to.
I hope that I have explained main things.
Please ask if more details required.
Thanks