How do Kafka nodes communicate with each other? - apache-kafka

I could not find any details on how Kafka nodes within a cluster communicate with each other. During replication, for example, the producer contacts one of the nodes; let's say it is the leader. The data then has to be copied to the other nodes as per the configuration. How does that happen? Does this communication go through the ZooKeeper cluster, or does the leader talk directly to the followers? If they communicate directly, which port do they use?

Producers send messages to the Kafka leader. The other Kafka nodes act as clients of this leader for replication: they fetch the data from it directly, just as any external Kafka client would, so replication traffic does not pass through ZooKeeper. To communicate with the leader, the followers use the same port that is exposed to normal clients, 9092 by default.
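
Since followers reach the leader over the ordinary broker port, one rough way to check replication connectivity is to test, from each broker host, whether the other brokers' ports accept TCP connections. The following is a minimal sketch under that assumption; the function names are made up for illustration, and a successful TCP connect does not prove that the Kafka protocol itself works.

```python
import socket


def can_connect(host: str, port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def unreachable_brokers(brokers, timeout: float = 3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(h, p) for h, p in brokers if not can_connect(h, p, timeout)]
```

Run on one broker host, a call such as `unreachable_brokers([("kafka-2.example.internal", 9092), ("kafka-3.example.internal", 9092)])` (hypothetical host names) would list the peers that host cannot reach.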

Related

Client cannot publish because of a firewall issue with full replication factor

Setup: Three-node Kafka cluster running version 2.12-2.3.0. The replication factor is 3, with 20 partitions per topic.
Description:
All three nodes in the Kafka cluster can communicate among themselves without issue. A misconfigured firewall is introduced on the Kafka client side, which blocks the client from communicating with one Kafka node. The client can no longer publish to any of the Kafka nodes. Two Kafka nodes are still reachable over the network from the Kafka client. We understand this is a network split-brain issue.
Question: Is there a way to configure Kafka so that the Kafka client can communicate with the "surviving" Kafka nodes?
"The client can no longer publish to any of the Kafka nodes"
That shouldn't happen. The client should only be unable to communicate with the partition leaders hosted on that one node, and should continue communicating with the partition leaders on the other, reachable nodes.
There are no changes you could make on the server side if the client's host network/firewall is the issue.
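
To make the expected behavior concrete, here is a toy model of the setup described in the question. It assumes a hypothetical round-robin spread of the 20 partition leaders over brokers 1, 2 and 3; a real layout would come from cluster metadata and may differ.

```python
def reachable_partitions(partition_leaders, blocked_brokers):
    """Return the partitions whose leader the client can still reach.

    partition_leaders: dict mapping partition number -> leader broker id.
    blocked_brokers: set of broker ids the client cannot connect to.
    """
    return sorted(p for p, leader in partition_leaders.items()
                  if leader not in blocked_brokers)


# Hypothetical layout: 20 partitions with leaders spread round-robin
# over brokers 1, 2 and 3 (an assumption, not read from a real cluster).
leaders = {p: (p % 3) + 1 for p in range(20)}

still_writable = reachable_partitions(leaders, blocked_brokers={3})
print(len(still_writable), "of", len(leaders), "partitions remain writable")
```

With this assumed layout the script prints `14 of 20 partitions remain writable`. Records routed to the other six partitions would still fail until the firewall is fixed, and the client's bootstrap list has to contain at least one reachable broker.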

Is it possible / best practice to add different kafka brokers to the same cluster?

Suppose I have a cluster containing a broker that consumes/produces event x for microservice MS-1.
Can I add another broker to the same cluster so that it consumes/produces event y for microservice MS-2, or do I have to create a dedicated cluster for each broker type?
Is this possible, and is it best practice?
I am asking because I have seen brokers used as leader and followers in the same cluster, meaning all of them are replicas of the leader.
Your brokers are the nodes in the cluster that handle requests from your clients. Your clients are Consumers or Producers (or both) that interact with your cluster (Consumers and Producers are not Brokers).
While you can add brokers to a running cluster, the concept I think you're looking for is a Topic, which is a group of related event/message types. Your cluster can support many Topics, and yes, microservice1 could produce events to Topic1, and microservice2 could produce events to Topic2.

Is it possible to produce to a kafka topic when only 1 of the brokers is reachable?

Is it possible to produce to a Kafka topic when only 1 of the brokers is reachable from the producer, none of the zookeeper nodes are reachable from the producer, but all of the brokers are healthy and are reachable from each other?
For example, this would be required if I were to produce messages via an SSH tunnel. If this were for a temporary push I could possibly create the topic with replication factor 1 and have all partitions assigned to the broker in question, and reassign the partitions after the fact, but I'm hoping there is a more flexible setup.
This is all using the java client.
Producers don't interact with Zookeeper so it's not an issue.
The only requirement for Producers is to be able to connect to the brokers that are leaders for the partitions they want to use.
If the broker you connect to is the leader for the partitions you want to use, then yes you can produce to it.
Otherwise it's not going to work. Creating a topic may not help either, as its partitions could be assigned to any broker. Moreover, in order to create a topic, a client has to connect to the controller, which may not be the broker you can reach.
If you can only connect to 1 "thing", you may want to consider using something like a REST Proxy. Your "isolated" environment could send REST requests to the proxy which is able to connect to all brokers in the cluster.
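
As an illustration of the proxy approach, the sketch below builds the HTTP request an isolated client might send. It assumes the Confluent REST Proxy with its v2 JSON produce endpoint (`POST /topics/<topic>`); the proxy address `http://localhost:8082` and the topic name `events` are hypothetical, and the actual send is left commented out because it needs a running proxy.

```python
import json
import urllib.request


def build_produce_request(proxy_url: str, topic: str, values) -> urllib.request.Request:
    """Build a POST that asks the REST proxy to produce JSON records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical proxy address and topic name.
request = build_produce_request("http://localhost:8082", "events", [{"id": 1}])

# Sending requires a reachable proxy, so it is not executed here:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read().decode("utf-8"))
```

The client then only needs network access to the proxy; the proxy in turn must be able to reach every broker in the cluster.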

How many connections are made by producer and consumer to a Kafka cluster?

Can anyone shed some light on the number and type of connections made by Kafka Java producers and Kafka Java consumers to a Kafka cluster?
Is the number of connections based on the number of topics, partitions, or brokers in the cluster?
Each consumer/producer needs to be connected to the broker that is the leader for each partition it wants to read/write.
This means a client doesn't need to be connected to all brokers in a cluster, only to the brokers it needs for reading/sending messages.
During the initial configuration, we provide a list of brokers to connect to (which could even be a single one). Using these broker(s), the client gets metadata about the topics/partitions it wants to use and where they are placed (possibly on other brokers in the cluster). Connections to those brokers need to be in place for the client to work with the desired topics/partitions.
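
The rule above can be sketched as a small calculation: the brokers a client must connect to are the distinct leaders of the partitions it uses. The metadata below is invented for illustration, and the model ignores any additional connections a real client may hold (for example to a bootstrap broker or, for consumers, to the group coordinator).

```python
def brokers_to_connect(partition_leaders, wanted_partitions):
    """Return the sorted broker ids that lead at least one wanted partition."""
    return sorted({partition_leaders[tp] for tp in wanted_partitions})


# Hypothetical metadata: (topic, partition) -> leader broker id.
metadata = {
    ("orders", 0): 1, ("orders", 1): 2, ("orders", 2): 3,
    ("payments", 0): 2, ("payments", 1): 2,
}

payments = [tp for tp in metadata if tp[0] == "payments"]
orders = [tp for tp in metadata if tp[0] == "orders"]

print(brokers_to_connect(metadata, payments))  # both partitions share one leader
print(brokers_to_connect(metadata, orders))    # leaders are spread over three brokers
```

In this made-up layout a client using only `payments` needs one broker connection, while a client using `orders` needs three, so the count follows partition leadership rather than the number of topics.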

Kafka Producer and Consumer Info

I am trying to create a tool that can kill Kafka producers and consumers at random in order to simulate a production environment. Is there a way to find the active producers and consumers in a Kafka cluster? I want to know exactly which thread on which host is acting as a producer or consumer.
I started by getting the IP address of at least one Kafka broker from the HDP cluster.
I checked the open connections on the Kafka broker on its configured port (6667 by default on HDP) and retrieved the IP addresses of the connected machines.
Using these IP addresses, I found the processes that are connected to the Kafka broker.
Thus, I figured out which machines are Kafka producers and consumers.
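
The second step above could be scripted roughly as follows. This is a sketch that assumes the connection list comes from `ss -tn` on the broker host and that its lines have the usual five-column shape; the sample text and IP addresses are made up.

```python
from collections import Counter


def peers_on_port(ss_output: str, broker_port: int = 6667) -> Counter:
    """Count established connections per peer IP on the broker port.

    Expects lines shaped like the output of `ss -tn`:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    peers = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue  # skip the header and non-established sockets
        local, peer = fields[3], fields[4]
        if local.rpartition(":")[2] != str(broker_port):
            continue  # connection is not on the broker port
        peers[peer.rpartition(":")[0]] += 1
    return peers


# Made-up sample; on a broker host the text would come from `ss -tn`.
sample = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      10.0.0.5:6667       10.0.0.21:51234
ESTAB  0      0      10.0.0.5:6667       10.0.0.21:51240
ESTAB  0      0      10.0.0.5:6667       10.0.0.22:40412
ESTAB  0      0      10.0.0.5:22         10.0.0.9:60022
"""
print(dict(peers_on_port(sample)))
```

Other brokers also connect to this port for replication, so their addresses would need to be filtered out before treating the remaining peers as producers or consumers. Identifying the owning process still has to happen on each client machine.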