Kafka on Kubernetes does not receive messages from external producer

Hello (I am using Google Translate).
I have the following problem: I have a Kafka service in Kubernetes, everything is managed by Rancher, and Kafka is deployed through the catalogs that Rancher provides (I attach an image of the service).
Everything works correctly within Kubernetes, but now I need a producer external to Kubernetes to connect to Kafka and send messages so that they are received inside Kubernetes.
I have not been able to accomplish this task, and I have already tried another Kafka deployment following this guide:
https://www.weave.works/blog/kafka-on-kubernetes-and-deploying-best-practice
But in both the Rancher catalog version and the version installed through YAML files, I can't figure out where and what I should configure so that a producer outside of Kubernetes can reach Kafka. I also tried setting the service type to NodePort, but that didn't work. Any help is welcome, and thank you.
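For reference, the NodePort service I tried looked roughly like the sketch below (names, labels and the nodePort value are placeholders from memory):

apiVersion: v1
kind: Service
metadata:
  name: kafka-external            # placeholder name
spec:
  type: NodePort
  selector:
    app: kafka                    # placeholder label, must match the broker pods
  ports:
    - name: kafka
      port: 9092
      targetPort: 9092
      nodePort: 31090             # should be reachable as <any-node-ip>:31090 from outside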

NodePort is one option. A LoadBalancer or Ingress is another.
Rather than use Rancher catalogs, I'd recommend that you read through the Strimzi operator documentation, which covers all options for external client communication.
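The usual reason a plain NodePort service is not enough is that Kafka clients only use it for the first connection; after that they connect to whatever addresses the brokers advertise, which by default are the in-cluster DNS names and are unreachable from outside. Strimzi's external listeners take care of creating the per-broker services and setting the advertised addresses for you. A minimal sketch of a Kafka custom resource with a nodeport listener (cluster name and other values are just placeholders):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                # placeholder name
spec:
  kafka:
    replicas: 3
    listeners:
      # internal listener for clients inside the cluster
      - name: plain
        port: 9092
        type: internal
        tls: false
      # external listener exposed through per-broker NodePort services
      - name: external
        port: 9094
        type: nodeport
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator: {}

External producers then bootstrap against <node-ip>:<bootstrap nodePort>, and the operator makes sure each broker advertises an address that is reachable from outside the cluster.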

Related

Strimzi Kafka Connect how to mount PersistentVolumeClaim

I am deploying KafkaConnect using a Custom Resource. I would like to mount a PersistentVolumeClaim into the Kafka Connect cluster. The idea is that another application, responsible for file transfer, will place a file there that will then be picked up by a Kafka connector.
I checked the KafkaConnect resource config docs and it seems that I cannot simply add the volume to the Pod.
My understanding is that if I patch the Pod, the Strimzi operator will detect the modification and overwrite it on the next reconciliation.
Would anyone have an idea how I can still use the KafkaConnect CR and mount the PVC volume?
This is currently not supported by Strimzi. There is an enhancement issue for this (https://github.com/strimzi/strimzi-kafka-operator/issues/2571), but nobody has implemented it yet.
I wonder if, for the use case you describe, something like running Kafka Connect as a sidecar would make more sense. You could then share the storage directly without any networking, etc. (This is not something supported by Strimzi, but Kafka Connect itself can of course be used like this.)
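A hand-rolled sketch of that sidecar idea, just to illustrate it (this bypasses the operator entirely; images, paths and the PVC name are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: connect-with-file-drop          # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: connect-with-file-drop
  template:
    metadata:
      labels:
        app: connect-with-file-drop
    spec:
      containers:
        # Kafka Connect run manually, outside the Strimzi operator;
        # how Connect itself is configured/started is omitted here
        - name: kafka-connect
          image: my-kafka-connect:latest         # placeholder image
          volumeMounts:
            - name: file-drop
              mountPath: /data/incoming          # connector reads files from here
        # the application that drops files for the connector to pick up
        - name: file-transfer
          image: my-file-transfer:latest         # placeholder image
          volumeMounts:
            - name: file-drop
              mountPath: /data/outgoing          # transfer app writes files here
      volumes:
        - name: file-drop
          persistentVolumeClaim:
            claimName: file-drop-pvc             # placeholder PVC name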

Why is a headless service used for Kafka in Kubernetes, and why not ClusterIP with out-of-the-box load balancing?

Most of the examples I come across for using Kafka in Kubernetes deploy it behind a headless service, but I have not yet found an answer on why it should be headless and not ClusterIP. In my opinion, ClusterIP provides load balancing out of the box, which ensures that not just one of the brokers is always loaded with requests. With a headless service, the Kafka clients (be it Sarama or the Java client) seem to always pick the first IP from the DNS lookup and connect to it. Will this not be a bottleneck if there are around 100+ clients trying to do the same and all opening connections to the first IP? Or does Kafka already handle this internally? I am still trying to understand how it really works.
When there is no differentiation between the various instances of a service (replicas of a pod serving a stateless application), you can expose them under a ClusterIP service, as connecting to any replica to serve the current request is okay. This is not the case with stateful services (like Kafka, databases, etc.). Each instance is responsible for its own data; each instance might own a different partition/topic, etc. The instances of the service are not exact "replicas". Solutions for running such stateful services on Kubernetes usually use headless services and/or StatefulSets so that each instance of the service has a unique identity. Such stateful applications usually have their own clustering technology that relies on each instance in the cluster having a unique identity.
Now that you know why stable identities are required for stateful applications and how StatefulSets with headless services provide stable identities, you can check how your Kafka distribution might be using them to run Kafka on Kubernetes.
This blog post explains how Strimzi does it:
For StatefulSets – which Strimzi is using to run the Kafka brokers – you can use the Kubernetes headless service to give each of the pods a stable DNS name. Strimzi is using these DNS names as the advertised addresses for the Kafka brokers. So with Strimzi: the initial connection is done using a regular Kubernetes service to get the metadata, and the subsequent connections are opened using the DNS names given to the pods by another, headless Kubernetes service.
It's used in cases where communication with specific Pods is needed.
For example, a monitoring service must be able to reach all pods behind a service to check their status, so it needs the addresses of all Pods and not just any one of them. This is a use case for a headless service.
Or, when a cluster of Pods is being set up, it's important to coordinate with the Pods to keep the cluster working for consumers. In Kafka, this work is done by ZooKeeper, so ZooKeeper needs a headless service.
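As a concrete sketch, a headless service is simply an ordinary Service with clusterIP set to None; combined with a StatefulSet that references it, every broker pod gets a stable per-pod DNS name (names below are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: kafka-headless            # placeholder name
spec:
  clusterIP: None                 # headless: no virtual IP, DNS returns the pod IPs
  selector:
    app: kafka                    # placeholder label matching the broker pods
  ports:
    - name: tcp-clients
      port: 9092

# With a StatefulSet whose serviceName is "kafka-headless", pod N is reachable at
# kafka-N.kafka-headless.<namespace>.svc.cluster.local, which is what gets used
# as the broker's advertised address.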
Stateful:
The Kafka streaming platform maintains replicas of each partition across Kafka brokers based on the replication factor, and it keeps its data on persistent storage. When it comes to K8s, the StatefulSet type is suggested; Pods in StatefulSets are not interchangeable: each Pod has a unique identifier that is maintained no matter where it is scheduled.
Headless:
To maintain internal communication between the pods. Let's not forget that ZooKeeper orchestrates the Kafka brokers.
Thanks
Within the cluster the pods should know about each other: who is running and who has stopped.

Usage of custom resource definition (CRD) vs Service Catalog in k8s

I recently started to explore k8s extensions and got introduced to two concepts:
CRD.
Service catalogs.
They look pretty similar to me. The only difference, to my understanding, is that CRDs are deployed inside the same cluster to be consumed, whereas catalogs expose services provided from outside the cluster, for example a database service (a client can order a MySQL cluster which will be accessible from their cluster).
My query here is:
Is my understanding correct? If yes, can there be any other scenario where I would want to create a catalog and not a CRD?
Yes, your understanding is correct. Taken from the official documentation:
Example use case
An application developer wants to use message queuing as part of their application running in a Kubernetes cluster. However, they do not want to deal with the overhead of setting such a service up and administering it themselves. Fortunately, there is a cloud provider that offers message queuing as a managed service through its service broker.
A cluster operator can setup Service Catalog and use it to communicate with the cloud provider’s service broker to provision an instance of the message queuing service and make it available to the application within the Kubernetes cluster. The application developer therefore does not need to be concerned with the implementation details or management of the message queue. The application can simply use it as a service.
With a CRD, you are responsible for provisioning resources, running the backend logic, and so on.
More info can be found in this KubeCon 2018 presentation.
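To make the difference concrete: a CRD on its own is only a schema registration like the sketch below; the controller that actually provisions a queue or a database is code you have to write, package and run yourself (group and kind here are made up for illustration):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: messagequeues.example.com      # must be <plural>.<group>
spec:
  group: example.com                   # placeholder API group
  scope: Namespaced
  names:
    plural: messagequeues
    singular: messagequeue
    kind: MessageQueue
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string         # e.g. "small", interpreted by your own controller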

Forward Traffic to POD in Kubernetes Cluster

I installed and configured a 3-node K8s cluster. The worker nodes are Windows nodes. We have one .NET application that we want to containerize. This application internally uses Apache Ignite for its distributed cache.
We built a Docker image for this application, wrote a deployment file, and deployed it in the K8s cluster. The deployment also creates a service of type "LoadBalancer". Using this service we connect to the application from the outside world. All is good so far.
Coming to the issue: as we are using Apache Ignite for the distributed cache, one of the pods will be the master. We want to always forward traffic to the pod that is acting as the master node in the Apache Ignite cluster. The identification of the Apache Ignite master node must be dynamic.
I have gone through the link below. There the pod configuration is static. We want to dynamically identify the master pod and forward traffic to it. What do we have to do on the service side?
https://appscode.com/products/voyager/7.4.0/guides/ingress/http/statefulset-pod/
Any help on how to forward the traffic to the pod is greatly appreciated.
Given that you have a leader/follower topology, the ask to direct traffic to a given node (the master node) is flawed for a couple of reasons:
What happens when the current leader fails over and there is a new election to select a new leader?
Pods are ephemeral, so they should not have fixed roles to play in production; instead, work with Deployments and their replicas. What you are trying to achieve is an anti-pattern.
In any case, if this is what you want, you may want to read about gateways in Istio, which can be found here.
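If you still want to go down that road without Istio, one common pattern is a separate Service whose selector includes a label such as role=leader, together with some piece of logic (for example a small sidecar or controller) that moves that label to whichever pod Ignite currently considers the leader. Everything below is purely illustrative:

apiVersion: v1
kind: Service
metadata:
  name: ignite-leader              # placeholder name
spec:
  type: LoadBalancer
  selector:
    app: my-dotnet-app             # placeholder app label
    role: leader                   # only the pod labelled role=leader receives traffic
  ports:
    - name: http
      port: 80
      targetPort: 8080             # placeholder container port

# Something (a small controller or sidecar) has to run the equivalent of
#   kubectl label pod <current-leader> role=leader --overwrite
# whenever Ignite elects a new leader, and remove the label from the old one.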

K8s Istio-enabled pod can't reach regular services

I'm trying to use Istio in a K8s 1.6 cluster on AWS.
I have a Kafka pod/service running the old-fashioned way, with a "kafka-zk-broker-kafka.dev" service without a cluster IP, so the kafka-zk-broker-kafka.dev service (I'm in the dev namespace) resolves to the internal names of my 3 Kafka pods. This is working great.
~ # nslookup kafka-zk-broker-kafka.dev
Name: kafka-zk-broker-kafka.dev
Address 1: 10.33.0.11 kafka-zk-kafka-0.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 2: 10.38.96.16 kafka-zk-kafka-2.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 3: 10.40.128.13 kafka-zk-kafka-1.kafka-zk-broker-kafka.dev.svc.cluster.local
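For reference, the service is defined roughly like this (selector and labels abbreviated from memory):

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka
  namespace: dev
spec:
  clusterIP: None        # headless, hence the three pod addresses in the nslookup output
  selector:
    app: kafka-zk        # placeholder label matching the broker pods
  ports:
    - name: kafka
      port: 9092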
I deployed a Kafka producer application using the Istio sidecar, as it also exposes a gRPC port for internal use.
Deployment went fine, but my application can't connect to the "kafka-broker" service. DNS resolution is OK, but I can't reach the service port (TCP 9092) using either a Kafka client or telnet.
What I understand is that, when the Istio (Envoy) sidecar is deployed, everything leaving the pod goes through the Envoy proxy...
So does the Envoy proxy not know how to reach regular services?
Am I missing something? Is there a way to mix Istio/Envoy with regular k8s services?
What you are doing should work, but I think you're running into this known bug: https://github.com/istio/issues/issues/37