How to run Kafka on different machines

For the last 10 days I have been trying to set up Kafka on two different machines:
Server32
Server56
Below is the list of tasks I have done so far.
Configured ZooKeeper and started it on both servers with:
server.1=Server32_IP:2888:3888
server.2=Server56_IP:2888:3888
I also changed the server and server-1 properties as below:
broker.id=0
port=9092
log.dir=/tmp/kafka0-logs
host.name=Server32
zookeeper.connect=Server32_IP:9092,Server56_IP:9062
and in server-1:
broker.id=1
port=9062
log.dir=/tmp/kafka1-logs
host.name=Server56
zookeeper.connect=Server32_IP:9092,Server56_IP:9062
I ran server.properties on Server32 and server-1.properties on Server56.
The problem is: when I start a producer on both servers, I can consume from either one and it works, BUT when I stop either server, the other one is no longer able to send messages.
Please help me understand the process.

Running 2 ZooKeepers is not fault tolerant. If one of the ZooKeepers is stopped, the system will not work. Unlike Kafka brokers, ZooKeeper needs a quorum (a majority) of the configured nodes in order to work, which is why ZooKeeper is typically deployed with an odd number of instances (nodes). Since 1 of 2 nodes is not a majority, it really is no better than running a single ZooKeeper. You need at least 3 ZooKeepers to tolerate a failure, because 2 of 3 is a majority, so the system will stay up.
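For illustration, a minimal three-node ensemble configuration might look like the sketch below (the hostnames zk1/zk2/zk3 and the data directory are placeholders, not taken from the question):

# zoo.cfg, identical on all three ZooKeeper nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# quorum ports: 2888 for follower-to-leader sync, 3888 for leader election
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

Each node also needs a myid file in dataDir containing just its own number (1, 2 or 3), matching its server.N entry.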
Kafka is different: you can have any number of Kafka brokers, and if they are configured correctly and you create your topics with a replication factor of 2 or greater, the Kafka cluster can continue even if you take any one of the broker nodes down, even if it's just 1 of 2.
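As a sketch, creating such a topic on a two-broker cluster could look like this (the topic name and ZooKeeper address are placeholders, matching the 0.10.x-era tooling used further down):

kafka-topics.sh --zookeeper ZHOST:2181 --create --topic my-topic --partitions 1 --replication-factor 2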

There's a lot of information missing here, like the Kafka version and whether you're using the new consumer APIs or the old ones. I'm assuming you're probably using a newer version of Kafka, like 0.10.x, along with the new client APIs. With the new client APIs, consumer offsets are stored on the Kafka brokers and not in ZooKeeper as in the older versions. I think your issue here is that you created your topics with a replication factor of 1, and coincidentally the Kafka broker you shut down was hosting the only replica, so you won't be able to produce or consume messages. You can confirm the health of your topics by running the command:
kafka-topics.sh --zookeeper ZHOST:2181 --describe
You might want to increase the replication factor to 2. That way you might be able to get away with one broker failing. Ideally you would have 3 or more Kafka broker servers and a replication factor of 2 or higher (obviously not more than the number of brokers in your cluster). Refer to the link below:
https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor
"For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log."
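For an existing topic, the linked procedure boils down to a partition reassignment; a sketch, assuming a single-partition topic named my-topic and broker ids 0 and 1:

# increase-replication.json: assign partition 0 to brokers 0 and 1
{"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[0,1]}]}

# hand the file to the reassignment tool
kafka-reassign-partitions.sh --zookeeper ZHOST:2181 --reassignment-json-file increase-replication.json --execute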

Related

How do replicas in Kafka work if one broker goes down for 2 hours and comes back?

We have 3 ZooKeeper and 3 Kafka broker nodes set up as a cluster running on different systems in AWS, and we changed the properties below to ensure high availability and prevent data loss.
server.properties
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=1
I have the following question.
Assume brokers A, B, and C.
Since we set the replication factor to 3, all the data will be available on all of A, B, and C, so if broker A goes down it won't affect the flow.
But suppose broker A goes down while we are continuously receiving data from the connector, so that data is stored only on brokers B and C,
and after 2 hours broker A comes back up.
Is the data that arrived between A going down and coming back up available on broker A or not?
Is there any specific configuration we need to set for that?
How does replication between the brokers happen when one broker comes back online after being offline?
I don't know whether this is a valid question, but please share your thoughts to help me understand how the replication factor works.
While A is recovering, it will be out of the ISR list. If you've disabled unclean leader election, then A cannot become the leader of any partition it holds (no client can write to or read from it), and it will replicate data from the other replicas until it's up to date, then rejoin the ISR.
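A way to observe this, assuming a topic named my-topic: pin unclean leader election off on the brokers, then watch the Isr column shrink while A is down and grow again once it has caught up:

# server.properties: an out-of-sync replica may never become leader
unclean.leader.election.enable=false

# run before, during and after A's outage and compare the Isr column
kafka-topics.sh --zookeeper ZHOST:2181 --describe --topic my-topic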

Unable to connect Kafka consumer to Kafka cluster

We have a Kafka cluster with 3 broker nodes. When all are up and running, the consumer is able to read data from Kafka. However, if I stop all the Kafka servers and bring up only 2 of them, excluding the one that was stopped last, the consumer is unable to connect to the Kafka cluster.
What could be the reason behind this? Thanks in advance.
I would guess that the problem is offsets.topic.replication.factor on the brokers, which is 3 by default, while you are now running a cluster with only 2 brokers.
This is the internal topic where consumers store their offsets when consuming, and it was created with a replication factor of 3 on the first run.
When, on the second run, you start only 2 brokers, that could be the problem.
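You can check this by describing the internal offsets topic; with only 2 of the original 3 brokers up, some of its partitions will show a shrunken Isr or a missing leader (the ZooKeeper address is a placeholder):

kafka-topics.sh --zookeeper ZHOST:2181 --describe --topic __consumer_offsets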

ZooKeeper resiliency

We have a cluster of 17 brokers and 5 ZooKeepers. I wanted to test the resiliency of the ZooKeepers, so I took down 3 of them, as my understanding is that a cluster with 5 ZooKeepers can withstand the failure of at most 2 (using the 2n+1 rule). But to my surprise I was still able to produce and consume data. And even with all the ZooKeepers (i.e. all 5) down, I was able to produce data. Can someone explain the reason behind these two behaviors?
ZooKeeper is only required for notifications when changes happen to the cluster, say, brokers joining the cluster or going down. If all brokers are running, and there are consumers/producers connected to them sending and receiving data, there's no need for ZooKeeper communication, and everything will just keep working. New producer/consumer connections might not work, but I'm not 100% confident of that.
When brokers join/leave the cluster, consumers need to be notified, so they can point to the correct leaders for the partitions/topics they're consuming from. Also, other brokers need to be notified to start syncing data to the new broker, or to take on leadership for topic/partitions that are now leaderless. All those notifications are sent through zookeeper.
There are many more details at these links:
https://www.waitingforcode.com/apache-kafka/the-role-of-apache-zookeeper-in-apache-kafka/read
https://data-flair.training/blogs/zookeeper-in-kafka/
Kafka can operate just fine without ZooKeeper as long as there is no need to change the in-sync replicas. Kafka will start throwing errors once there is any update to the ISR for any partition, such as when brokers bounce or join.
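One way to watch this experiment from the ZooKeeper side is the four-letter-word commands (assuming ZooKeeper listens on localhost:2181; note that newer ZooKeeper releases require whitelisting these commands via 4lw.commands.whitelist):

# "imok" means this node is serving; "stat" also reports its mode (leader/follower)
echo ruok | nc localhost 2181
echo stat | nc localhost 2181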

Kafka setup strategy for replication?

I have two VM servers (say S1 and S2) and need to install Kafka in cluster mode, with a topic that has only one partition and two replicas (one the leader itself, the other a follower) for reliability.
I got the high-level idea from this cluster setup. I want to confirm whether the strategy below is correct.
First set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions: Is Zookeeper a must for Kafka?
Start the Kafka broker on both nodes. It can be on the same port, since the brokers are hosted on different nodes.
Create the topic on any node with 1 partition and 2 replicas.
ZooKeeper will select the broker on one node as leader and the other as follower.
The producer will connect to any broker and start publishing messages.
If the leader goes down, ZooKeeper will select another node as leader automatically. I am not sure how a replication factor of 2 will be maintained then, as there is only one node live?
Is the above strategy correct?
Useful resources
ISR
ISR vs replication factor
First set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions: Is Zookeeper a must for Kafka?
Answer: Yes. ZooKeeper is still a must until KIP-500 is released. ZooKeeper is responsible for electing the controller, storing metadata about the Kafka cluster, and managing broker membership (link). Ideally the number of ZooKeeper nodes should be at least 3; that way you can tolerate one node failure (2 healthy ZooKeeper nodes, a majority of the cluster, are still capable of electing a controller). You should also consider setting up the ZooKeeper cluster on machines other than the ones Kafka is installed on, so that the failure of a single server won't lead to the loss of both a ZooKeeper and a Kafka node.
Start the Kafka broker on both nodes. It can be on the same port, since the brokers are hosted on different nodes.
Answer: You should first start the ZooKeeper cluster, then the Kafka cluster. The same port on different nodes is fine.
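A sketch of the per-broker settings, assuming a ZooKeeper ensemble reachable at the placeholder hostnames zk1/zk2/zk3; note that zookeeper.connect points at the ZooKeeper client port 2181, not at broker ports:

# server.properties on S1 (S2 is identical except for broker.id)
broker.id=1
listeners=PLAINTEXT://:9092
log.dirs=/var/lib/kafka-logs
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181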
Create the topic on any node with 1 partition and 2 replicas.
Answer: Partitions are used for horizontal scalability. If you don't need this, one partition is okay. With a replication factor of 2, one of the nodes will be the leader and the other the follower at any given time. But that is not enough to avoid data loss completely while also providing HA. In the ideal configuration for avoiding data loss without compromising HA, you should have at least 3 Kafka brokers, a topic replication factor of 3, min.insync.replicas=2 as a broker config, and acks=all as a producer config. (You can check this for more information.)
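A sketch of that ideal combination, with placeholder names (my-replicated-topic matches the describe example below):

# server.properties on each of the 3 brokers
min.insync.replicas=2

# topic with 3 replicas (kafka-topics.sh accepts --bootstrap-server since Kafka 2.2)
kafka-topics.sh --create --bootstrap-server localhost:9092 --topic my-replicated-topic --partitions 1 --replication-factor 3

# producer configuration: wait for all in-sync replicas to acknowledge
acks=all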
ZooKeeper will select the broker on one node as leader and the other as follower.
Answer: The controller broker is responsible for maintaining the leader/follower relationship for all the partitions. One broker will be the partition leader and another one will be the follower. You can check partition leaders/followers with this command:
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
The producer will connect to any broker and start publishing messages.
Answer: Yes. Setting only one broker in bootstrap.servers is enough to connect to the Kafka cluster, but for redundancy you should provide more than one broker in bootstrap.servers, as sketched after the quote below.
bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
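As a sketch, a client configuration could list every broker it knows about; any one reachable entry is enough to discover the rest (hostnames are placeholders):

# client configuration: tolerates one of the listed brokers being down at connect time
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092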
If the leader goes down, ZooKeeper will select another node as leader automatically. I am not sure how a replication factor of 2 will be maintained then, as there is only one node live?
Answer: If the controller broker goes down, ZooKeeper will select another broker as the new controller. If the broker that is the leader of your partition goes down, one of the in-sync replicas will become the new leader (the controller broker is responsible for this). But of course, if you have just two brokers, then replication won't be possible while one is down. That's why you should have at least 3 brokers in your Kafka cluster.
Yes, ZooKeeper is still needed in Kafka 2.4, but you can read about KIP-500, which plans to remove the dependency on ZooKeeper in the near future and use the Raft algorithm to form the quorum.
As you already understood, if you install ZK on a single node it will run in standalone mode and you won't have any resiliency. The classic ZK ensemble consists of 3 nodes, which allows you to lose 1 ZK node.
After pointing your Kafka brokers to the right ZK cluster, you can start your brokers and the cluster will be up and running.
In your example, I would suggest you add another node in order to gain better resiliency and meet the replication factor you wanted, while still being able to lose one node without losing data.
Bear in mind that using a single partition means you are limited to a single active consumer per consumer group; the rest of the consumers will be idle.
I suggest you read this blog about Kafka best practices and how to choose the number of topics/partitions in a Kafka cluster.

Why is my kafka topic not consumable with a broker down?

My issue is that I have a three-broker Kafka cluster and an availability requirement: I must be able to consume from and produce to a topic when one or two of my three brokers are down.
I also have a reliability requirement of a replication factor of 3. These seem to be conflicting requirements to me. Here is how my problem manifests:
I create a new topic with replication factor 3
I send several messages to that topic
I kill one of my brokers to simulate a broker issue
I attempt to consume the topic I created
My consumer hangs
I review my logs and see the error:
Number of alive brokers '2' does not meet the required replication factor '3' for the offsets topic
If I set offsets.topic.replication.factor to 1 on all my brokers, then I'm able to produce and consume my topics, even if I set the topic-level replication factor to 3.
Is this an okay configuration? Or can you see any pitfalls in setting things up this way?
You only need as many brokers as your replication factor when creating the topic.
I'm guessing that in your case you start with a fresh cluster and no consumers have connected yet. In this case, the __consumer_offsets internal topic does not exist, as it is only created when it's first needed. So first connect a consumer for a moment, and then kill one of the brokers.
Apart from that, in order to consume you only need 1 broker up: the leader of the partition.
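To reproduce a healthy setup, assuming a topic named my-topic: attach a console consumer once while all three brokers are up, so that __consumer_offsets gets created with three replicas, and only then kill a broker:

# the first consumer connection triggers creation of __consumer_offsets
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning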