ActiveMQ Artemis: Internal and external IP addresses - kubernetes

We are running an ActiveMQ Artemis cluster in a Kubernetes cluster. All of our applications (Java/Springboot/JMS) running in the Kubernetes cluster take advantage of connecting directly to the broker instances.
However, the IP addresses from the Kubernetes Pod network are unavailable outside of the cluster. Exposing the broker instances to the public network is possible — but with different IP addresses. This is similar to hiding the Artemis cluster behind a NAT configuration. When connecting to the brokers through the public IP addresses, client applications receive cluster topology information containing IP addresses (or hostnames?) that are unreachable outside of the cluster.
Is there any way to deal with “internal” and “external” IP addresses and/or hostnames and make topology discovery work for cluster-external applications?
And, related (I am not a Java developer): Is there any way to log received topology information for JMS applications?

ActiveMQ Artemis CORE client provides the useTopologyForLoadBalancing url parameter to disable the use of the cluster topology information for load balancing, i.e.
tcp://localhost:61616?useTopologyForLoadBalancing=false
The log of the cluster topology information can be enabled setting the TRACE for the org.apache.activemq.artemis.core.protocol.core logger in the logging.properties file, see the documentation, i.e.
loggers=...,org.apache.activemq.artemis.core.protocol.core
logger.org.apache.activemq.audit.message.level=TRACE
handler.CONSOLE.level=TRACE
handler.FILE.level=TRACE

You can't rely on topology discovery from outside clients. What you can do is either provide a list of the external ips or have a router / load-balancer in front of your cluster.

Related

Kakfa brokers inside kubernets can be addresed by clients outside k8s cluster

Basically I'm builing a system on google cloud. Most services are working on k8s cluster but some codes are not. Lambda and operator of composer, dataflow job are the examples. (Composer is also k8s but different cluster)
I picked kafka as event channel to interconnect the services and I have to decide proper place of kafka broker. K8s pods or VM. I prefer k8s pods, but I worry about the communication between brokers and services, espicially with services outside of k8s cluster.
Consumer addresses broker with "bootstrap server" that is list of some broker's static unique address. I suppose if brokers are installed inside k8s, addresses of them will be not static unique from outside. Can brokers are connected from service outside of k8s? If possible, which string must be provided to bootstrap sever config?
Conventional virtual machine is the solution without any suspicion. But I want put more and more things into k8s.
There are a different solutions to your problems
You can deploy the Kafka on K8s cluster and use the service mesh to interconnect both clusters. So broker and service can connect with each other without any worry.
If you are on GCP you can use the MCS service or traffic director and other service mesh.
You can also set up Kafka on VM and expose it over the IP and further that will be used by services to connect.
Can brokers are connected from service outside of k8s?
Yes, you can expose your Kafka broker using the service type Loadblanacer or Node Port. Reference doc
I suppose if brokers are installed inside k8s, addresses of them will
be not static unique from outside.
You dont need to bind Kafka to any specific hostname for the interface, Kafka will listen to all the interfaces and you can expose it using the K8s service if running on K8s.

Why headless service to be used for Kafka in Kubernetes, why not Cluster IP with load balancing out of box?

Most of the examples I come across to use Kafka in Kubernetes is to deploy it as a headless service but I am not able to get the answer yet on why it should be headless and not Cluster IP? In my opinion cluster, IP provides the load balancing in which we ensure out of the box that not only one of the broker gets loaded always with its resources as I see with headless the Kafka clients be it sarma or java client tries to pick always the first IP from the DNS lookup and connects to it, will this not be a bottleneck if there are around 100+ clients trying to do the same and open connection to the first IP? or Kafka handles this inbuilt already which I am still trying to understand how it really happens.
When there is no differentiation between various instances of a services(replicas of a pod serving a stateless application), you can expose them under a ClusterIP service as connecting to any of the replica to serve the current request is okay. This is not the case with stateful services(like Kafka, databases etc). Each instance is responsible for it's own data. Each instance might be owning a different partition/topic etc. The instances of the service are not exact "replicas". Solutions for running such stateful services on Kubernetes usually use headless services and/or statefulsets so that each instance of the service has a unique identity. Such stateful applications usually have their own clustering technology that rely on each instance in the cluster having a unique identity.
Now that you know why stable identities are required for stateful applications and how statefulsets with headless services provide stable identities, you can check how your Kafka distributions might using them to run Kafka on kubernetes.
This blog post explains how strimzi does it:
For StatefulSets – which Strimzi is using to run the Kafka brokers –
you can use the Kubernetes headless service to give each of the pods a
stable DNS name. Strimzi is using these DNS names as the advertised
addresses for the Kafka brokers. So with Strimzi:
The initial connection is done using a regular Kubernetes service to
get the metadata.
The subsequent connections are opened using the DNS
names given to the pods by another headless Kubernetes service.
It's used in cases where communication to specific Pods is needed.
For example, A monitoring service must be able to reach all pods behind a service, to check their status, so it needs the addresses of all Pods and not just any one of them. This would be a use case of headless service.
Or when there is a cluster of Pods being set up, it's important to coordinate with the Pods to keep the cluster working for consumers. In Kafka, this work is done by Zookeeper. thus a headless service is needed by Zookeeper
Stateful:
Kafka streaming platform maintain replicas of partition across kafka brokers based on RELICATION_FACTOR. It maintains it data across persistent storage. When it comes to K8s ; stateful type is suggested; Pods in StatefulSets are not interchangeable: each Pod has a unique identifier that is maintained no matter where it is scheduled.
Headless:
To maintain internal communication between PODS. Lets not forget Zookeeper orchestrates kafka brokers.
Thanks
Within POD they should know eachother who is running and who stopped

Connect to external database cluster from kubernetes

Is there option to connect to external database cluster from POD? I need to connect to elastic search, zookeeeper, Kafka and couchbase, each of them has its own cluster. Per my understanding the documentation, I can define multi external IPs, but I cannot find how will k8s behave if one of them is down. I am working with pure k8s 1.6 now, and we will migrate to 1.7 soon. Information about OpenShift 3.7 will be also welcome because I cannot find anything specific in its documentation.
The k8s doc on your link has more info on exposing services running on k8s but not externally
You generally want to expose your service using a DNS entry and manage the HA for that service separately.
For example you can a single DNS entry mykafka.mydomain.com and then assign IP addresses to that entry:
kafka1 ip
kafka2 ip
kafka3 ip
You can see that approach on the Openshift docs in the USING AN EXTERNAL DOMAIN NAME section. Yes, its not clear from the docs whether k8s/openshift does a round robin on the multiple IPs for an external service and if automatically fails over.
Hope it helps.

Proxy outgoing traffic of Kubernetes cluster through a static IP

I am trying to build a service that needs to be connected to a socket over the internet without downtime. The service will be reading and publishing info to a message queue, messages should be published only once and in the order received.
For this reason I thought of deploying it into Kubernetes where I can automatically have multiple replicas in case one process fails, i.e. just one process (pod) should be running all time, not multiple pods publishing the same messages to the queue.
These requests need to be routed through a proxy with a static IP, otherwise I cannot connect to the socket. I understand this may not be a standard use case as a reverse proxy as it is normally use with load balancers such as Nginx.
How is it possible to build this kind of forward proxy in Kubernetes?
I will be deploying this on Google Container Engine.
Assuming you're happy to use Terraform, you can use this:
https://github.com/GoogleCloudPlatform/terraform-google-nat-gateway
However, there's one caveat and that is it may inbound traffic to other clusters in that same region/zone.
Is the LoadBalancer that you need?
kubernetes create external loadbalancer,you can see this doc.

zookeeper initial discovery

I want to use Apache Zookeeper (or Curator) as a replicated naming service. Let's say I run 3 zookeeper servers and I have a dozen of computers with different applications which can connect to these servers.
How should I communicate zookeeper IP addresses to clients? A configuration file which should be distributed manually to each machine?
Corba Naming service had an option of UDP broadcast discovery in which case no configuration file is needed. Is there a similar possibility in Zookeeper?
It depends where/how you are deploying. If this is at AWS you can use Route 53 or elastic IPs. In general, the solution is some kind of DNS. i.e. a well known hostname for each of the ZK instances.
If you use something like Exhibitor (disclaimer, I wrote it) it's easier in that Exhibitor can work with Apache Curator to provide up-to-date cluster information.