Do we need a Kafka connection pool on the client? - apache-kafka

I have a REST web service that handles about 1 million requests per day. For each request, the service sends a message to a remote Kafka topic (Confluent Platform). Do I have to set up a Kafka connection pool to improve performance?

No, you don't need a Kafka connection pool. The Kafka client maintains its own connections to the brokers in the cluster and manages them for you. In the Java client, a single producer instance is thread-safe and can be shared across request threads. As long as the topic has enough partitions configured, it should be alright.
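For a sense of scale, the arithmetic behind the traffic figure in the question can be sketched as follows. The 10x burst multiplier is an assumption made purely for illustration; it does not come from the question.

```python
# Back-of-envelope message rate for the workload described in the question.
REQUESTS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

avg_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY

# Assumed burst multiplier (hypothetical; measure your own traffic).
PEAK_FACTOR = 10
peak_per_second = avg_per_second * PEAK_FACTOR

print(f"average: {avg_per_second:.1f} msg/s, assumed peak: {peak_per_second:.0f} msg/s")
```

That works out to roughly 12 messages per second on average, a rate that one shared client instance would normally be expected to absorb without pooling.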

Related

Client cannot publish because of a firewall issue, with full replication factor

Setup: Three-node Kafka cluster running version 2.12-2.3.0. The replication factor is 3, with 20 partitions per topic.
Description:
All three nodes in the Kafka cluster can communicate with each other without issue. An incorrect firewall rule was introduced on the Kafka client side, which blocks the client from communicating with one Kafka node. The client can no longer publish to any of the Kafka nodes, even though two Kafka nodes are still reachable from the client over the network. We understand this is a network split-brain issue.
Question: Is there a way to configure Kafka so that the Kafka client can communicate with the "surviving" Kafka nodes?
"The client can no longer publish to any of the Kafka nodes"
That shouldn't happen. The client should only be unable to communicate with the partition leaders hosted on that one node, and should continue communicating with the partition leaders on the other, reachable nodes.
There are no changes you could make on the server side if the client's host network or firewall is the issue.
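The expected behaviour can be illustrated with a toy model. This is only a sketch: the round-robin leader layout, the broker ids 0 to 2, and the function name `reachable_partitions` are assumptions for illustration, not how the cluster in the question is actually laid out.

```python
def reachable_partitions(leaders, blocked):
    """Return the partitions whose leader broker the client can still reach."""
    return sorted(p for p, broker in leaders.items() if broker not in blocked)

# Hypothetical layout: 20 partitions, leaders spread round-robin over brokers 0..2.
leaders = {p: p % 3 for p in range(20)}

# The firewall blocks the client from broker 2 only.
ok = reachable_partitions(leaders, blocked={2})
print(len(ok), "of", len(leaders), "partitions still writable")
```

In this model only the partitions led by the blocked broker are lost. A real producer may still stall or time out on records routed to the unreachable leader, which this model ignores.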

Does the Kafka client connect to ZooKeeper, or is it behind the scenes?

Kafka client code refers directly to the broker IP and port. If that broker is down, will ZooKeeper direct the client to another broker? Is ZooKeeper always behind the scenes?
If you provide only one broker address in the client code and that broker goes down, then your client will also be down once it restarts. ZooKeeper will not be used here because the broker will not be reachable.
If you give more than one broker address in the client, then it's more resilient. The Kafka controller tracks the list of live brokers through ZooKeeper and propagates that metadata to the brokers, which return it to clients in metadata responses. ZooKeeper is used indirectly here, but it does not communicate with any external clients.
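Why a longer broker list helps at startup can be sketched with a simplified model. The addresses and the `first_reachable` helper are hypothetical, and trying brokers strictly in order is a simplification; the real client's selection order may differ.

```python
def first_reachable(bootstrap, can_connect):
    """Return the first address in `bootstrap` that accepts a connection, else None."""
    for address in bootstrap:
        if can_connect(address):
            return address
    return None

# Hypothetical addresses; pretend broker1 is down.
down = {"broker1:9092"}

def is_up(address):
    return address not in down

single = first_reachable(["broker1:9092"], is_up)
multi = first_reachable(["broker1:9092", "broker2:9092", "broker3:9092"], is_up)
print(single, multi)
```

With a single address the client has nowhere to go when that broker is down, while the three-address list still yields a broker to fetch metadata from.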
If I understood the question correctly, the answer is no.
Kafka clients need a connection only to the Kafka brokers; ZooKeeper isn't involved at all. Clients need to write to and read from the partition leaders on the brokers.
If the Kafka brokers set in the broker list aren't available, the clients cannot connect and cannot start to send or receive messages.
Only in the old 0.8.0 version was ZooKeeper involved for consumers, which saved their offsets in ZooKeeper. Starting from 0.9.0, consumers save their offsets in a Kafka topic, so ZooKeeper isn't needed by clients anymore.

How many connections are made by producer and consumer to a Kafka cluster?

Can anyone shed some light on the number and type of connections made by Kafka Java producers and Kafka Java consumers to a Kafka cluster?
Is the number of connections based on the number of topics, partitions, or brokers in the cluster?
Each consumer/producer needs to be connected to the broker that is the leader for the partition that the consumer/producer wants to read/write.
This means that a client doesn't need to be connected to all brokers in a cluster, only to the brokers needed for reading/sending messages.
During the initial configuration, we provide a list of brokers to connect to (which could even be only one). Using those broker(s), the client gets metadata about the topic/partitions it wants to use and where they are placed (possibly on other brokers in the cluster). Connections to those brokers need to be in place for the client to work with the desired topic/partitions.
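The rule above can be expressed as a small model: the brokers a client must reach are the leaders of the partitions it uses. The partition-to-leader map and the helper name `brokers_needed` are made up for illustration. The sketch also ignores the bootstrap connection and, for consumers, the connection to the group coordinator.

```python
def brokers_needed(leaders, partitions_used):
    """Return the set of brokers that lead at least one of the partitions in use."""
    return {leaders[p] for p in partitions_used}

# Hypothetical topic with 6 partitions; values are leader broker ids.
leaders = {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3}

# A consumer assigned partitions 0 and 3 only talks to broker 1.
print(brokers_needed(leaders, [0, 3]))

# A producer writing to every partition needs all three brokers.
print(brokers_needed(leaders, range(6)))
```

So in this model the connection count follows the number of distinct leader brokers involved, not the number of topics or partitions as such.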

Kafka Producer and Consumer Info

I am trying to create a tool that can kill Kafka producers and consumers randomly in order to simulate a production environment. Is there a way to find the active producers and consumers in a Kafka cluster? I want to know exactly which thread on which host is acting as a producer or a consumer.
I started by getting the IP address of at least one Kafka broker from the HDP cluster.
I checked the open connections on the Kafka broker on its configured port (6667 by default on HDP) and retrieved the IP addresses of the connected machines.
Using those IP addresses, I found the processes that are connected to the Kafka broker.
That is how I identified the machines acting as Kafka producers and consumers.
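The second step, extracting peer IP addresses from a connection listing, could look something like the sketch below. The sample text is invented and only resembles the column layout of a tool such as `ss -tn`; the function name `client_ips` is hypothetical, and the parsing would need adjusting to whatever tool and output format you actually use.

```python
# Invented sample listing captured on the broker host (layout resembles `ss -tn`).
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      10.0.0.5:6667       10.0.0.21:53412
ESTAB  0      0      10.0.0.5:6667       10.0.0.22:40110
ESTAB  0      0      10.0.0.5:22         10.0.0.99:51000
ESTAB  0      0      10.0.0.5:6667       10.0.0.21:53420
"""

def client_ips(listing, broker_port=6667):
    """Collect peer IPs of connections whose local port is the broker port."""
    ips = set()
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        local, peer = fields[3], fields[4]
        if local.endswith(f":{broker_port}"):
            ips.add(peer.rsplit(":", 1)[0])
    return ips

print(sorted(client_ips(SAMPLE)))
```

Note that other brokers in the cluster may also appear as peers on this port, so the resulting list would still need filtering before treating every address as a producer or consumer.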

Apache Kafka Consumer Connection

I am looking at the docs of Apache Kafka.
The consumer connects to Kafka using the IP address and port of the ZooKeeper nodes.
Is it possible to use the IP address and port of a broker instead?
Yes, when using the Simple consumer API you get to manage consumption directly from the brokers.
Of course you can.
Connecting to Kafka through the IP address and port of ZooKeeper, as you describe, is the high-level API.
Connecting to the broker directly is the low-level API.