Can Kafka brokers inside Kubernetes be addressed by clients outside the k8s cluster? - kubernetes

Basically I'm building a system on Google Cloud. Most services run on a k8s cluster, but some code does not; Lambdas, Composer operators, and Dataflow jobs are examples. (Composer also runs on k8s, but in a different cluster.)
I picked Kafka as the event channel to interconnect the services, and I have to decide on the proper place for the Kafka brokers: k8s pods or a VM. I prefer k8s pods, but I worry about the communication between brokers and services, especially services outside the k8s cluster.
A consumer addresses the brokers with the "bootstrap servers" setting, which is a list of static, unique broker addresses. I suppose that if the brokers are installed inside k8s, their addresses won't be static or unique from outside. Can the brokers be connected to from a service outside of k8s? If so, which string must be provided in the bootstrap server config?
A conventional virtual machine is the obvious solution, but I want to put more and more things into k8s.

There are a few different solutions to your problem.
You can deploy Kafka on the k8s cluster and use a service mesh to interconnect both clusters, so the brokers and the services can reach each other without any worry.
If you are on GCP, you can use the MCS (Multi-cluster Services) feature, Traffic Director, or another service mesh.
You can also set up Kafka on a VM and expose it over an IP address, which the services can then use to connect.
Can the brokers be connected to from a service outside of k8s?
Yes, you can expose your Kafka broker using a Service of type LoadBalancer or NodePort. Reference doc
I suppose that if the brokers are installed inside k8s, their addresses won't be static or unique from outside.
You don't need to bind Kafka to any specific hostname or interface; Kafka will listen on all interfaces, and you can expose it using a k8s Service if it is running on k8s.
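As a concrete illustration of what the bootstrap string looks like from outside the cluster, here is a minimal consumer sketch. The external IP and port are placeholders for whatever your LoadBalancer (or NodePort) Service actually exposes, and the brokers must advertise that same external address on the listener, since clients reconnect to whatever addresses the brokers advertise.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExternalConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder: external address exposed by the LoadBalancer/NodePort Service
        props.put("bootstrap.servers", "35.200.10.1:9094");
        props.put("group.id", "external-app");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            consumer.poll(Duration.ofSeconds(5))
                    .forEach(record -> System.out.println(record.value()));
        }
    }
}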

Related

Cross cluster communication in GKE Multi-cluster Service

I’m using GKE multi-cluster service and have configured two clusters.
On one cluster I have an endpoint I want to consume, and it's hard-coded to the address:
redpanda-0.redpanda.processing.svc.cluster.local.
Does anyone know how I can reach this from the other cluster?
EDIT:
I have exported the service, which is then automatically imported into the other cluster. Previously, I was able to connect to the other cluster using SERVICE_EXPORT_NAME.NAMESPACE.svc.clusterset.local, but then I had to change the endpoint address manually to that exact address. In my new case, the endpoint address is not configurable.

ActiveMQ Artemis: Internal and external IP addresses

We are running an ActiveMQ Artemis cluster in a Kubernetes cluster. All of our applications (Java/Springboot/JMS) running in the Kubernetes cluster take advantage of connecting directly to the broker instances.
However, the IP addresses from the Kubernetes Pod network are unavailable outside of the cluster. Exposing the broker instances to the public network is possible — but with different IP addresses. This is similar to hiding the Artemis cluster behind a NAT configuration. When connecting to the brokers through the public IP addresses, client applications receive cluster topology information containing IP addresses (or hostnames?) that are unreachable outside of the cluster.
Is there any way to deal with “internal” and “external” IP addresses and/or hostnames and make topology discovery work for cluster-external applications?
And, related (I am not a Java developer): Is there any way to log received topology information for JMS applications?
The ActiveMQ Artemis CORE client provides the useTopologyForLoadBalancing URL parameter to disable the use of the cluster topology information for load balancing, e.g.
tcp://localhost:61616?useTopologyForLoadBalancing=false
Logging of the cluster topology information can be enabled by setting the org.apache.activemq.artemis.core.protocol.core logger to TRACE in the logging.properties file (see the documentation), e.g.
loggers=...,org.apache.activemq.artemis.core.protocol.core
logger.org.apache.activemq.artemis.core.protocol.core.level=TRACE
handler.CONSOLE.level=TRACE
handler.FILE.level=TRACE
You can't rely on topology discovery for outside clients. What you can do is either provide a list of the external IPs or put a router / load balancer in front of your cluster.
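Putting the two together, a minimal sketch of a cluster-external JMS client (Artemis JMS client, JMS 2.0) that is given the brokers' externally reachable addresses explicitly and has topology-based load balancing disabled; the host names are placeholders:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ExternalArtemisClient {
    public static void main(String[] args) throws Exception {
        // Explicit list of external broker addresses (placeholders), with
        // topology-based load balancing turned off so the client never tries
        // the internal, unreachable addresses announced by the cluster.
        ConnectionFactory factory = new ActiveMQConnectionFactory(
                "(tcp://broker-0.example.com:61616,tcp://broker-1.example.com:61616)"
                        + "?useTopologyForLoadBalancing=false");
        try (Connection connection = factory.createConnection()) {
            connection.start();
            // ... create sessions, producers and consumers as usual
        }
    }
}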

Connect to Cassandra on Kubernetes using java-driver

We are bringing up a Cassandra cluster using the k8ssandra Helm chart, which exposes several services. Our client applications use the DataStax Java driver and run in the same k8s cluster as the Cassandra cluster (this is the testing phase).
CqlSessionBuilder builder = CqlSession.builder();
What is the recommended way to connect the application (via the Driver) to Cassandra?
Adding all nodes?
for (String node :nodes) {
builder.addContactPoint(new InetSocketAddress(node, 9042));
}
Adding just the service address?
builder.addContactPoint(new InetSocketAddress(service-dns-name , 9042))
Adding the service address as unresolved? (would that even work?)
builder.addContactPoint(InetSocketAddress.createUnresolved(service-dns-name , 9042))
The k8ssandra Helm chart deploys a CassandraDatacenter object and cass-operator, in addition to a number of other resources. cass-operator is responsible for managing the CassandraDatacenter. It creates the StatefulSet(s) and several headless services, including:
datacenter service
seeds service
all pods service
The seeds service only resolves to pods that are seeds. Its name is of the form <cluster-name>-seed-service. Because of the ephemeral nature of pods, cass-operator may designate different C* nodes as seed nodes. Do not use the seeds service for connecting client applications.
The all-pods service resolves to all Cassandra pods, regardless of whether they are ready. Its name is of the form <cluster-name>-<dc-name>-all-pods-service. This service is intended to facilitate monitoring. Do not use the all-pods service for connecting client applications.
The datacenter service resolves only to ready pods. Its name is of the form <cluster-name>-<dc-name>-service. This is the service that you should use for connecting client applications. Do not use pod IPs directly, as they will change over time.
Adding all nodes?
You definitely do not need to add all of the nodes as contact points. Even in vanilla Cassandra, adding only a few is fine, since the driver will discover the rest of the cluster from them.
Adding just the service address?
Your second option, binding to the service address, is all you should need to do. The nice thing about the service address is that it accounts for IPs changing or being removed in the cluster.
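For illustration, a minimal sketch using driver 4.x against the datacenter service. The cluster name demo, datacenter dc1, and namespace cassandra are assumptions; substitute the values from your k8ssandra release (authentication omitted for brevity):

import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

public class CassandraClient {
    public static void main(String[] args) {
        // Contact point is the cass-operator datacenter service (names are assumptions)
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress(
                        "demo-dc1-service.cassandra.svc.cluster.local", 9042))
                // Required by driver 4.x when explicit contact points are given
                .withLocalDatacenter("dc1")
                .build()) {
            String version = session
                    .execute("SELECT release_version FROM system.local")
                    .one()
                    .getString("release_version");
            System.out.println("Connected, Cassandra " + version);
        }
    }
}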

Why should a headless service be used for Kafka in Kubernetes, and why not ClusterIP with its out-of-the-box load balancing?

Most of the examples I come across for running Kafka in Kubernetes deploy it behind a headless service, but I have not been able to find an answer to why it should be headless and not ClusterIP. In my opinion, a ClusterIP service provides load balancing out of the box, which ensures that no single broker is always loaded with requests. With a headless service, the Kafka clients (be it Sarama or the Java client) seem to always pick the first IP from the DNS lookup and connect to it. Won't this become a bottleneck if 100+ clients do the same thing and open connections to that first IP? Or does Kafka already handle this internally? I'm still trying to understand how that really happens.
When there is no differentiation between the various instances of a service (replicas of a pod serving a stateless application), you can expose them under a ClusterIP service, since connecting to any replica to serve the current request is fine. This is not the case with stateful services (like Kafka, databases, etc.). Each instance is responsible for its own data, and each instance might own a different partition or topic, so the instances of the service are not exact "replicas". Solutions for running such stateful services on Kubernetes usually use headless services and/or StatefulSets so that each instance of the service has a unique identity. Such stateful applications usually have their own clustering technology that relies on each instance in the cluster having a unique identity.
Now that you know why stable identities are required for stateful applications, and how StatefulSets with headless services provide stable identities, you can check how your Kafka distribution might be using them to run Kafka on Kubernetes.
This blog post explains how strimzi does it:
For StatefulSets – which Strimzi is using to run the Kafka brokers – you can use the Kubernetes headless service to give each of the pods a stable DNS name. Strimzi is using these DNS names as the advertised addresses for the Kafka brokers. So with Strimzi:
The initial connection is done using a regular Kubernetes service to get the metadata.
The subsequent connections are opened using the DNS names given to the pods by another headless Kubernetes service.
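To make the difference concrete, here is a small sketch that resolves both kinds of service from inside the cluster; the service and namespace names are hypothetical. The ClusterIP service resolves to a single virtual IP, while the headless service resolves to one record per broker pod (and each pod additionally gets a stable name such as kafka-0.kafka-headless.kafka.svc.cluster.local, which is what gets advertised).

import java.net.InetAddress;
import java.util.Arrays;

public class DnsCheck {
    public static void main(String[] args) throws Exception {
        // Regular (ClusterIP) bootstrap service: resolves to a single virtual IP
        Arrays.stream(InetAddress.getAllByName("kafka-bootstrap.kafka.svc.cluster.local"))
              .forEach(a -> System.out.println("bootstrap -> " + a.getHostAddress()));

        // Headless service (clusterIP: None): resolves to one A record per broker pod
        Arrays.stream(InetAddress.getAllByName("kafka-headless.kafka.svc.cluster.local"))
              .forEach(a -> System.out.println("broker    -> " + a.getHostAddress()));
    }
}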
A headless service is used in cases where communication with specific pods is needed.
For example, a monitoring service must be able to reach all the pods behind a service to check their status, so it needs the addresses of all the pods and not just any one of them. This is one use case for a headless service.
Or, when a cluster of pods is being set up, it's important to coordinate with the individual pods to keep the cluster working for consumers. In Kafka, this work is done by ZooKeeper, and thus ZooKeeper needs a headless service.
Stateful:
The Kafka streaming platform maintains replicas of each partition across the Kafka brokers according to the REPLICATION_FACTOR, and it keeps its data on persistent storage. On k8s, the StatefulSet type is therefore suggested; pods in StatefulSets are not interchangeable: each pod has a unique identifier that is maintained no matter where it is scheduled.
Headless:
To maintain internal communication between the pods. Let's not forget that ZooKeeper orchestrates the Kafka brokers; within the set of pods, they should know which peers are running and which have stopped.
Thanks

Forward Traffic to POD in Kubernetes Cluster

I installed and configured a 3-node K8s cluster. The worker nodes are Windows nodes. We have a .NET application that we want to containerize. This application internally uses Apache Ignite for its distributed cache.
We built a Docker image for this application, wrote a deployment file, and deployed it to the K8s cluster. The deployment also creates a Service of type “LoadBalancer”. Using this service we connect to the application from the outside world. All is good so far.
Coming to the issue: since we are using Apache Ignite for the distributed cache, one of the pods will be the master. We want to always forward the traffic to the pod which is acting as the master node of the Apache Ignite cluster, and the identification of the Apache Ignite master node must be dynamic.
I had gone through the link below. There, the pod configuration is static; we want to identify the master pod dynamically and forward the traffic to it. What do we have to do on the service side?
https://appscode.com/products/voyager/7.4.0/guides/ingress/http/statefulset-pod/
Any help on how to forward the traffic to the POD is greatly appreciated.
The very fact that you have a leader/follower topology means the ask to direct traffic to a particular node (the master node) is flawed, for a couple of reasons:
What happens when the current leader fails over and there is a new election to select a new leader?
Pods are ephemeral, so they should not have major roles to play in production; instead, work with Deployments and their replicas. What you are trying to achieve is an anti-pattern.
In any case, if this is what you want, you may want to read about gateways in Istio, which can be found here.