Artemis k8s cluster client connection handling - kubernetes

I am using the Artemis Cloud operator for deploying ActiveMQ Artemis in a k8s cluster. The number of replicas configured is 3. Queues are expected to be created by client applications. The operator creates a headless service, plus a service for each pod in the cluster setup.
Whenever a client connects to a pod, it creates a queue in that broker pod. So if a client connects to the 3 brokers at random times, three queues get created, one in each pod. When a producer then sends a message to a queue, the message goes only to the connected pod; it is not present in every broker pod (no replication of messages).
My question is: what service name should client applications use in order to connect to the Artemis pods while maintaining session affinity? In other words, what should be done to make a client connect to the same broker whenever it attempts a connection (and avoid duplicate queue creation)?
What I currently use is a Kubernetes ClusterIP service I created that splits traffic across the pods, and the queues are created via a STOMP producer.

I am not sure which Helm chart it is using in the background, but you should use the service name inside the Kubernetes app for the connection instead of the ClusterIP.
The service name could start with the Helm chart name. I was checking and referring to this chart for reference (yours could be different): https://github.com/deviceinsight/activemq-artemis-helm
It looks like that chart creates the service with ClusterIP: None (a headless service), so connecting to the existing service does not work; I suspect it is currently returning all of the pod IPs.
Option 2: if the above does not work, try this.
Alternatively, you can create a new Service of type ClusterIP with a different name; everything else, like the ports, stays the same.
In the service you will notice the AMQP, MQTT and other ports, so your app can connect to the new service on whichever port it needs. For example: active-mq-service:61613 for STOMP.
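As an illustration, a minimal STOMP producer in Python using the stomp.py library might look like this (the service name, credentials, and queue name are placeholders):

```python
import stomp

# Connect through the ClusterIP service name; Kubernetes DNS resolves it
# and kube-proxy forwards the connection to one of the broker pods.
conn = stomp.Connection([("active-mq-service", 61613)])
conn.connect("artemis", "artemis", wait=True)  # placeholder credentials

# Artemis auto-creates the queue on first use, on whichever broker pod
# this particular connection landed on.
conn.send(destination="/queue/orders", body="hello")
conn.disconnect()
```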
If everything is working fine for you and you are just looking for session affinity, you can add this config to your service and it will start managing session affinity:
sessionAffinity: ClientIP
If you want to make sure that connections from a particular client are passed to the same Pod each time, you can select the session affinity based on the client's IP addresses by setting service.spec.sessionAffinity to "ClientIP" (the default is "None"). You can also set the maximum session sticky time by setting service.spec.sessionAffinityConfig.clientIP.timeoutSeconds appropriately (the default value is 10800, which works out to be 3 hours).
Ref doc : https://kubernetes.io/docs/concepts/services-networking/service/
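Putting the above together, a sketch of such a Service (the name, labels, and STOMP port are assumptions based on the setup described in the question):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: active-mq-service        # illustrative name
spec:
  type: ClusterIP
  selector:
    app: artemis                 # must match your broker pod labels
  ports:
    - name: stomp
      port: 61613
      targetPort: 61613
  sessionAffinity: ClientIP      # pin each client IP to one pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800      # the default: 3 hours
```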

Related

Does sessionAffinity over ClientIP work with the UDP protocol in a Kubernetes setup?

Let's say we have two independent Kubernetes clusters, Cluster 1 and Cluster 2, each running two replicas of the same application pod:
Cluster 1: Pod A & Pod B
Cluster 2: Pod C & Pod D
Application code in Pod A (the client) wants to connect to any pod running in Cluster 2 via a NodePort/LoadBalancer service over the UDP protocol to send messages. The only requirement is to maintain affinity so that all messages from Pod A go to one pod only (either Pod C or Pod D). Since UDP is a connectionless protocol, my concern is around session affinity based on ClientIP. Will setting sessionAffinity to ClientIP solve my issue?
sessionAffinity pins each session to a backend based on source IP, regardless of the protocol, within the same cluster. But that alone does not mean your session is kept as you expect across the whole access path.
In other words, using sessionAffinity by itself does not ensure the session is preserved end to end.
For example, Pod A's outbound IP is translated to the running node's IP (SNAT) unless you use an egress IP solution for Pod A.
It also depends on how the NodePort and LoadBalancer Services in Cluster 2 are configured with respect to source IP; refer to the Kubernetes 'Using Source IP' tutorial for more details.
So you should consider how to keep the session reliably when the clusters access each other. Personally, I think you are better off with application-layer (layer 7) sticky sessions for keeping the session, rather than the Service's sessionAffinity.
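One source-IP knob worth knowing on the Cluster 2 side is externalTrafficPolicy: Local, which preserves the original client IP so that ClientIP affinity can actually see it; a minimal sketch (the names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: udp-receiver             # illustrative name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # keep the real client source IP (no node SNAT)
  sessionAffinity: ClientIP      # pin a given source IP to one backend pod
  selector:
    app: receiver
  ports:
    - name: messages
      protocol: UDP
      port: 5005
      targetPort: 5005
```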

Connect ArangoDB cluster hosted on Kubernetes pods via Python

I have a 3-node system on which I have hosted a basic ArangoDB cluster with 3 dbservers. I am using the python-arango library, which says that to connect to a cluster we need to supply a list of IPs with port 8529 to the ArangoClient class. My Python code is running in a different pod.
Using arangosh, I can access the coordinator pods using the localhost:8529 configuration. However, with arangosh I can only access one coordinator pod at a time. Since the Arango arrangement is a cluster, I want to connect to all three coordinator pods at once in a round-robin manner. Per the Kubernetes documentation, in the case of MongoDB I can supply a list of <pod_name.service_name> entries to the MongoClient and it connects. I am trying something similar with the Arango coordinator pods, but that does not seem to work. The examples in the Arango documentation only show writing 'localhost-1', 'localhost-2', etc.
To summarize, the issue at hand is that I do not know how to identify the IPs of the Arango coordinator pods. I tried using the internal IPs of the coordinators, I tried using http://pod_name.8529, and I tried using the endpoint of the coordinator shown in the web UI (which begins with ssl, though I replaced ssl with http), and yet had no success.
I would be grateful for any help in this context. Please let me know if additional information is needed. I have been stuck on this problem for a week now.
If you follow this guide: https://www.arangodb.com/2018/12/deploying-arangodb-3-4-on-kubernetes/
its YAML config files create multiple pods. Your application should connect to ArangoDB using the Kubernetes service.
So your application will connect to the Kubernetes service, and by default the Kubernetes service handles round-robin load balancing.
In the example you can see it creates three services:
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
my-arangodb-cluster       ClusterIP      10.11.247.191   <none>           8529/TCP         46m
my-arangodb-cluster-ea    LoadBalancer   10.11.241.253   35.239.220.180   8529:31194/TCP   46m
my-arangodb-cluster-int   ClusterIP      None            <none>           8529/TCP         46m
The service my-arangodb-cluster-ea is publicly exposed, while the service my-arangodb-cluster can be accessed internally by the application.
Your flow will be something like
Application > Arango service (Round robin) > Arango Pods
I guess you can either connect using the LoadBalancer mentioned above, in which case k8s will balance your requests between the coordinator pods,
or you can explicitly create a service for each coordinator and use their addresses in your Python client configuration however you want.
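For instance, a minimal python-arango sketch against the internal service from the listing above (the database name and credentials are placeholders):

```python
from arango import ArangoClient

# Point the client at the in-cluster ClusterIP service; Kubernetes
# spreads connections across the coordinator pods behind it.
client = ArangoClient(hosts="http://my-arangodb-cluster:8529")

# Alternatively, list the coordinators explicitly and let the driver
# rotate between them (the host names here are illustrative):
# client = ArangoClient(
#     hosts=["http://coordinator-0:8529", "http://coordinator-1:8529"],
#     host_resolver="roundrobin",
# )

db = client.db("_system", username="root", password="password")  # placeholders
print(db.version())
```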

How does Cassandra driver update contactPoints if all pods are restarted in Kubernetes without restarting the client application?

We have created a StatefulSet and a headless service. There are 2 ways to define the peer IPs in the application:
1. Use the 'cassandra-headless-service-name' in contactPoints (a minimal sketch follows this list).
2. Fetch the peer IPs from the headless service, externalize them, and read them when initializing the connection.
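For option 1, a minimal sketch with the Python cassandra-driver (the service name is illustrative):

```python
from cassandra.cluster import Cluster

# Use the headless service name as the contact point: cluster DNS
# resolves it to the current pod IPs, and after the first successful
# connection the driver discovers the rest of the ring by itself.
cluster = Cluster(
    contact_points=["cassandra-headless-service-name"],  # illustrative
    port=9042,
)
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())
```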
So far so good.
The above works if one or some pods are restarted, but not all; in that case the driver picks up the new IPs automatically.
But how will this work in the case of a complete outage? If all pods go down and, when they come back, all the pod IPs have changed (IPs can change in Kubernetes), how will the application connect to Cassandra?
In a complete outage, you're right, the application will not have any valid endpoints for the cluster. Those will need to be refreshed (and the app restarted) before the app will connect to Cassandra.
We actually wrote a RESTful API that we can use to query the current, valid endpoints by cluster. That way, the app teams can find the current IPs for their cluster at any time. I recommend doing something similar.

Why headless service to be used for Kafka in Kubernetes, why not Cluster IP with load balancing out of box?

Most of the examples I come across for running Kafka in Kubernetes deploy it behind a headless service, but I have not yet found an answer to why it should be headless and not ClusterIP. In my opinion, ClusterIP provides load balancing out of the box, which ensures that no single broker is always loaded with the traffic. With a headless service, Kafka clients, be it Sarama or the Java client, seem to always pick the first IP from the DNS lookup and connect to it. Will this not be a bottleneck if 100+ clients do the same and open connections to that first IP? Or does Kafka handle this internally in a way I am still trying to understand?
When there is no differentiation between the instances of a service (replicas of a pod serving a stateless application), you can expose them under a ClusterIP service, as connecting to any replica to serve the current request is fine. This is not the case with stateful services (like Kafka, databases, etc.). Each instance is responsible for its own data; each instance might own a different partition/topic, etc. The instances of the service are not exact "replicas". Solutions for running such stateful services on Kubernetes usually use headless services and/or StatefulSets so that each instance of the service has a unique identity. Such stateful applications usually have their own clustering technology that relies on each instance in the cluster having a unique identity.
Now that you know why stable identities are required for stateful applications and how StatefulSets with headless services provide stable identities, you can check how your Kafka distribution might be using them to run Kafka on Kubernetes.
This blog post explains how Strimzi does it:
For StatefulSets – which Strimzi is using to run the Kafka brokers – you can use the Kubernetes headless service to give each of the pods a stable DNS name. Strimzi is using these DNS names as the advertised addresses for the Kafka brokers. So with Strimzi:
The initial connection is done using a regular Kubernetes service to get the metadata.
The subsequent connections are opened using the DNS names given to the pods by another headless Kubernetes service.
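As a sketch of the mechanism (not Strimzi's actual manifests), a headless service for a Kafka StatefulSet could look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless           # illustrative name
spec:
  clusterIP: None                # headless: DNS returns the pod IPs directly
  selector:
    app: kafka
  ports:
    - name: kafka
      port: 9092
```

With a StatefulSet named kafka, each broker pod then gets a stable name such as kafka-0.kafka-headless.<namespace>.svc.cluster.local, which can be used as its advertised listener address.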
A headless service is used in cases where communication to specific pods is needed.
For example, a monitoring service must be able to reach all pods behind a service to check their status, so it needs the addresses of all the pods and not just any one of them. This is one use case for a headless service.
Likewise, when a cluster of pods is being set up, it is important to coordinate between the pods to keep the cluster working for consumers. In Kafka, this work is done by ZooKeeper, thus ZooKeeper needs a headless service.
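To make the difference concrete, here is a quick sketch of what DNS returns for each kind of service (the names are illustrative):

```python
import socket

# For a normal ClusterIP service this returns the single virtual IP;
# for a headless service (clusterIP: None) it returns every pod IP,
# which is what lets a client address specific pods.
addrs = {info[4][0] for info in
         socket.getaddrinfo("kafka-headless.default.svc.cluster.local", 9092)}
print(addrs)
```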
Stateful:
The Kafka streaming platform maintains replicas of partitions across Kafka brokers based on the replication factor. It keeps its data on persistent storage. When it comes to K8s, the StatefulSet type is suggested: pods in StatefulSets are not interchangeable; each pod has a unique identifier that is maintained no matter where it is scheduled.
Headless:
To maintain internal communication between the pods. Let's not forget that ZooKeeper orchestrates the Kafka brokers; within the cluster, the pods should know each other, who is running and who has stopped.

Proxy outgoing traffic of Kubernetes cluster through a static IP

I am trying to build a service that needs to be connected to a socket over the internet without downtime. The service will be reading and publishing info to a message queue; messages should be published only once and in the order received.
For this reason I thought of deploying it on Kubernetes, where I can automatically have replicas for failover if one process fails; i.e., just one process (pod) should be running at any time, not multiple pods publishing the same messages to the queue.
These requests need to be routed through a proxy with a static IP; otherwise I cannot connect to the socket. I understand this may not be a standard use case, since proxies are normally used in reverse with load balancers such as Nginx.
How is it possible to build this kind of forward proxy in Kubernetes?
I will be deploying this on Google Container Engine.
Assuming you're happy to use Terraform, you can use this:
https://github.com/GoogleCloudPlatform/terraform-google-nat-gateway
However, there is one caveat: it may also affect inbound traffic to other clusters in that same region/zone.
Is a LoadBalancer what you need?
Kubernetes can create an external load balancer; you can see this doc.