Does kafka client connect to zookeeper or is it behind the scene - apache-kafka

Kafka client code directly refers to the broker ip and port and in case if it is down will zookeeper direct to another broker. is zookeper always behind the scene

In the case you provide only one broker address in the client code, and it goes down, plus your client restarts, then your client will also be down. Zookeeper will not be used here because the broker will not be reachable.
If you give more than one broker address in the client, then it's more resilient in that the Kafka Controller process periodically fetches a list of all alive brokers in the cluster from Zookeeper and is responsible for sending that information back to the clients via the leader of the partitions they get assigned. Zookeeper is indirectly used here, but does not communicate with any external clients

If I got the question in the right way the answer is no.
The Kafka clients need connection only to Kafka brokers and Zookeeper isn't involved at all. Clients needs to write/read leader partitions on brokers.
If the Kafka brokers set in the brokers list aren't available, the clients can connect and cannot start to send/receive messages.
Only in the old version 0.8.0 the Zookeeper was involved for consumers which saved offset on Zookeeper. Starting from 0.9.0, the consumers save offset in Kafka topics so Zookeeper isn't needed anymore.

Related

Does a kafka consumer machine need to run zookeeper?

So my question is this: If i have a server running Kafka (And zookeeper), and another machine only consuming messages, does the consumer machine need to run zookeeper too? Or does the server take care of all?
No.
Role of Zookeeper in Kafka is:
Broker registration: (cluster membership) with heartbeats mechanism to keep the list current
Storing topic configuration: which topics exist, how many partitions each
has, where are the replicas, who is the preferred leader, list of ISR for
partitions
Electing controller: The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions.
So Zookeeper is required only for kafka broker. There is no need to have Zookeper on the producer or consumer side.
The consumer does not need zookeeper
You have not mentioned which version of Kafka or the clients you're using.
Kafka consumers using 0.8 store their offsets in Zookeeper, so it is required for them. However, no, you would not run Zookeeper and consumers on the same server
From 0.9 and later, clients are separate from needing it (unless you want to manage external connections to Zookeeper on your own for storing data)

Is it possible to produce to a kafka topic when only 1 of the brokers is reachable?

Is it possible to produce to a Kafka topic when only 1 of the brokers is reachable from the producer, none of the zookeeper nodes are reachable from the producer, but all of the brokers are healthy and are reachable from each other?
For example, this would be required if I were to produce messages via an SSH tunnel. If this were for a temporary push I could possibly create the topic with replication factor 1 and have all partitions assigned to the broker in question, and reassign the partitions after the fact, but I'm hoping there is a more flexible setup.
This is all using the java client.
Producers don't interact with Zookeeper so it's not an issue.
The only requirement for Producers is to be able to connect to the brokers that are leaders for the partitions they want to use.
If the broker you connect to is the leader for the partitions you want to use, then yes you can produce to it.
Otherwise it's not going to work. Also creating a topic may not help as its partitions could be assigned to any brokers. Also in order to create a topic, a client has to connect to the controller which may not be the broker you can reach.
If you can only connect to 1 "thing", you may want to consider using something like a REST Proxy. Your "isolated" environment could send REST requests to the proxy which is able to connect to all brokers in the cluster.

How many connections are made by producer and consumer to a Kafka cluster?

Can anyone shed some light on the number of connections and what type of connections are made by the Kafka Java producers and Kafka Java consumers to a Kafka cluster.
Are the number of connections based on the number of topics or partitions or brokers in the cluster?
Each consumer/producer needs to be connected to the broker which is leader for the partition that the consumer/producer wants to read/write.
It means that a client doesn't need to be connected to all brokers inside a cluster but just the brokers needed for reading/sending messages.
During the initial configuration, we provide a list of brokers to connect to (which could be even only one). Using such broker(s), the client gets metadata information about the topic/partitions it wants to use and where they are placed (other brokers in the cluster). Such connections need to be in place for client working on the desired topic/partitions.

Kafka Producer and Consumer Info

I am trying to create a tool that can kill kafka producers and consumers randomly in order to simulate production environment. Is there a way to find the active producers and consumers in a kafka cluster? I want to exactly know which thread in which host is acting as the producer or the consumer?
I started by getting the I.P. of at least one kafka broker from the HDP cluster.
I checked the open connections on the kafka broker on its specified port (by default it is 6667) and retrieved the IP addresses of the connected machines.
Using their IP Addresses, I found out the processes that are connected to the kafka broker.
Thus, I figured out the machines that are kafka producers and consumers.

How to setup/connect distributed Kafka brokers, producers and consumers on a local network?

I have setup Apache Kafka and confirmed that producers and consumers work on the localhost.
How to set up Kafka so that:
multiple producers feed messages into a broker on a network computer
many consumers on the network can consume the messages from the broker
I noticed the following line: zookeeper.connect=localhost:2181 in server.properties which is used to start the kafka server. If this is the setting, is the setting for what addresses it listens to or is it specifying the server's address/port is on the network?
The zookeeper is used internally by Kafka to coordinate the cluster (leader election). In versions of Kafka before 0.8, ZK was the exclusive store for consumer offsets (what is consumed so far), but from 0.8.1, I think, you can choose whether to store offsets in ZK or in a special Kafka topic called __consumer_offsets.
What you're interested in is advertised.host.name and advertised.port settings that Kafka exposes to clients (or "what addresses it listens to", as you put it).
It is the name of the zookeeper server that Kafka connects to. The documentation for the Broker configuration can be found here http://kafka.apache.org/documentation.html#brokerconfigs