Creating a topic in Kafka only in one cluster - apache-kafka

I ran the Docker file at this location, which creates a 3-broker cluster. When I run docker ps, I can see 3 ZooKeeper and 3 Kafka instances.
When I create a topic on one of the Kafka brokers (0.0.0.0:9092) and then list the topics on each broker, I see that topic listed on all of them.
My expectation was to see that topic only under the broker I picked.
What am I missing?
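For reference, a sketch of the commands in question, assuming the three brokers are exposed on ports 9092-9094 as is common in such compose files (topic name is illustrative; older Kafka versions take --zookeeper instead of --bootstrap-server):
bin/kafka-topics.sh --create --bootstrap-server 0.0.0.0:9092 --replication-factor 1 --partitions 1 --topic test-topic
bin/kafka-topics.sh --list --bootstrap-server 0.0.0.0:9093
The second command lists test-topic too: a topic is cluster-wide metadata shared by all brokers, not something owned by the broker you happened to connect to.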

Related

Unable to connect Kafka consumer to Kafka cluster

We have a Kafka cluster with 3 broker nodes. When all are up and running, the consumer is able to read data from Kafka. However, if I stop all the Kafka servers and bring up only 2 of them, leaving out the one that was stopped last, the consumer is unable to connect to the Kafka cluster.
What could be the reason behind this? Thanks in advance.
I would guess that the problem could be offsets.topic.replication.factor on the brokers, which defaults to 3, while you are now running a cluster with only 2 brokers.
This is the internal topic where consumers store their offsets, and it was created with a replication factor of 3 on the first run.
When, on the second run, you start only 2 brokers, that topic can no longer keep all 3 replicas in sync, which could be the problem now.
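A quick way to check is to describe the internal offsets topic and look at its replica placement (a sketch; host and port are illustrative, and older Kafka versions take --zookeeper localhost:2181 instead of --bootstrap-server):
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets
If some partitions of this topic are left without any in-sync replica among the 2 running brokers, consumers cannot commit or fetch offsets for them until the stopped broker returns.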

How to tell which Kafka broker hosting a partition replica is down

There's a topic with 22 replicas, 50 partitions, and 22 running Kafka brokers.
The topic's manual assignment screen in Kafka Manager shows "Broker Down" for every topic partition, as seen in the image.
How do I determine which Kafka broker is down using the CLI or Kafka Manager?
Currently, I look at which broker id is missing from the partition replicas.
This information on brokers is also maintained in ZooKeeper, so you could go onto one of the ZooKeeper nodes and use the CLI to extract it. Here's a command sequence you could use:
On the command line, issue the command zookeeper-client; this should invoke the ZooKeeper command prompt.
At the new prompt, issue the command ls /brokers/ids; this should return the ids of all the active brokers.
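For illustration, a session might look like this (the ids shown are made up; depending on your distribution the command may be zkCli.sh or zookeeper-shell.sh instead of zookeeper-client):
$ zookeeper-client
[zk: localhost:2181(CONNECTED) 0] ls /brokers/ids
[0, 1, 2, 4, 5]
Here broker 3 is absent from the active list, so it is the one that is down; with 22 brokers you would compare the returned list against the full expected set of ids.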

Kafka Topic, Broker, ZooKeeper architecture overview

I have read a bunch of articles regarding Kafka architecture, but I'm still brand new to this, and when it came to coding there was some confusion about whether I understood things correctly.
From what I understand, Kafka server, broker, and node are synonyms. There can be a few brokers within a Kafka cluster. There is a Kafka topic (T1) and it consists of a few partitions (P1, P2, ...). These partitions can be replicated across the brokers (B1, B2, ...). B1 can be the leader for P1, B2 for P2, and so on. Do we say that there is topic T1 defined for broker or cluster, and if we treat topic as set of partitions can we say 'topic replicas'?
From the official Kafka documentation:
bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
So from what I understand, defining host1:port1,host2:port2 says that there are two brokers.
In this case, does ZooKeeper automatically route a message to the leader when executing bin/kafka-console-producer.sh --broker-list host1:port1,host2:port2 --topic test? (I believe I read somewhere that a producer should read the broker id from ZooKeeper, but wouldn't that be unnecessary here?)
Is it equal to publishing using bin/kafka-console-producer.sh --zookeeper host1:z_port1,host2:z_port2 --topic test ?
How should I understand bin/kafka-configs.sh --zookeeper host1:z_port1,host2:z_port2? Do we have only one ZooKeeper instance?
Do we say that there is topic T1 defined for broker or cluster, and if we treat topic as set of partitions can we say 'topic replicas'?
1) Cluster. 2) Each partition is individually replicated across multiple brokers, according to the topic's replication factor. The copies are called replicas, and the subset of them that is fully caught up with the leader is the "in-sync replicas" (ISR).
does ZooKeeper automatically distribute a message to a leader when executing
ZooKeeper does not, no. Your client contacts one of the bootstrap brokers and receives metadata about all brokers in the cluster, including which broker is the leader for each topic-partition. The client then connects directly to the relevant leader brokers and produces to each of them for the computed partitions.
Is it equal to publishing
Producing*, yes.
We have only one zookeeper instance?
One ZooKeeper ensemble can manage multiple Kafka clusters via a feature called a chroot: a root path in the ZooKeeper znode tree under which all the metadata for a given managed cluster is stored.
Also, the kafka-topics command can now use --bootstrap-server instead of --zookeeper.
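For example, the topic inspection from this question can then be done directly against a broker, with no ZooKeeper address involved (host1:port1 is the placeholder from the question):
bin/kafka-topics.sh --bootstrap-server host1:port1 --describe --topic test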

Doubts Regarding Kafka Cluster Setup

I have a use case where I want to set up a Kafka cluster. Initially, I have 1 Kafka broker (A) and 1 ZooKeeper node. Below are my queries:
On adding a new Kafka broker (B) to the cluster, will all the data present on broker A be distributed automatically? If not, what do I need to do to distribute the data?
Now let's suppose case 1 is somehow solved and my data is distributed across both brokers. Due to a maintenance issue, I want to take down server B.
How do I transfer the data on broker B to the already existing broker A, or to a new broker C?
How can I increase the replication factor of my topics at runtime?
How can I change the ZooKeeper IPs in the Kafka broker config at runtime without restarting Kafka?
How can I dynamically change the Kafka configuration at runtime?
Regarding Kafka Client:
Do I need to specify every Kafka broker IP to the Kafka client for the connection?
And every time a broker is added or removed, do I need to add or remove its IP from the Kafka client connection string? Will that always require restarting my producers and consumers?
Note:
Kafka Version: 2.0.0
Zookeeper: 3.4.9
Broker Size: (2 cores, 8 GB RAM) [4 GB for Kafka and 4 GB for OS]
To keep a topic on a single Kafka broker, you will have to set a replication factor of 1 when creating that topic (explicitly, or implicitly via default.replication.factor). This means that the topic's partitions will stay on a single broker, even after increasing the number of brokers.
You will have to increase the number of replicas as described in the Kafka documentation. You will also have to make sure that the internal __consumer_offsets topic has enough replicas. This will start the replication process, and eventually the original broker will be the leader of every topic partition, with the other broker as a fully caught-up follower. You can use kafka-topics.sh --describe to check that every partition has both brokers in the ISR (in-sync replicas).
Once that is done you should be able to take the original broker offline and kafka will elect the new broker as the leader of every topic partition. Don't forget to update the clients so they are aware of the new broker as well, in case a client needs to restart when the original broker is down (otherwise it won't find the cluster).
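For example, a describe call might look like this (topic name, broker ids and output are illustrative; on Kafka 2.2+ you can use --bootstrap-server localhost:9092 instead of --zookeeper):
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic
Topic: my-topic  PartitionCount: 2  ReplicationFactor: 2  Configs:
    Topic: my-topic  Partition: 0  Leader: 0  Replicas: 0,1  Isr: 0,1
    Topic: my-topic  Partition: 1  Leader: 1  Replicas: 1,0  Isr: 1,0
Every partition here lists both broker ids under Isr, which is the state you want before taking the original broker offline.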
Here are the answers in brief:
No, the data already present on broker A will not be redistributed automatically when broker B joins; newly created topics will be spread across both brokers, but existing partitions have to be moved with the partition reassignment tool (see below).
You can set up three brokers A, B, and C, so that if A fails then B and C take over, if B also fails then C takes over, and so on.
You can increase the replication factor of your topics. For example, you could create increase-replication-factor.json and put this content in it:
{"version":1,
"partitions":[
{"topic":"signals","partition":0,"replicas":[0,1,2]},
{"topic":"signals","partition":1,"replicas":[0,1,2]},
{"topic":"signals","partition":2,"replicas":[0,1,2]}
]}
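The JSON file by itself does nothing; it is applied with the partition reassignment tool. A sketch, assuming ZooKeeper runs at localhost:2181:
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --verify
The --verify run reports when the added replicas have caught up.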
Note that adding partitions is a separate operation from adding replicas. To increase the number of partitions of an existing topic (say from 2 to 3), alter the topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic-to-increase --partitions 3
There is a zoo.cfg file where you can add the IPs and configuration related to ZooKeeper.
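For reference, the ensemble membership section of zoo.cfg looks like this (server ids and hostnames are illustrative):
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
Note, however, that the brokers read their ZooKeeper address list from zookeeper.connect in server.properties, and that setting is not dynamically reconfigurable, so changing it does require a broker restart.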

Shutdown Kafka Cluster and then Start Kafka Cluster

I have a 2-broker Kafka cluster with a 3-node ZooKeeper cluster. When stopping and starting the Kafka cluster, what steps should I take?
Do I stop the 2 brokers individually first and then stop the 3 ZKs individually?
And then start the ZKs individually, followed by the 2 Kafka brokers individually?
Assumptions
This is a production cluster and you don't want any data loss.
You have partition replicas spread across the brokers
For each partition, you have at least one replica on each broker
All ZKs are accessible by each broker
This is how I would do it:
Take down one broker at a time.
When one of the brokers is down, describe the topics to check that the only replicas not shown belong to the broker that was taken down.
Restart that broker and verify that all partitions are back in sync before moving on to the next broker.
Then stop and start each ZK individually, each time checking that all replicas and partitions are in sync (see the command sketch below). That way a ZooKeeper quorum stays available for the 2 brokers to maintain their metadata.
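To check the in-sync state at each step, one option is the --under-replicated-partitions flag of kafka-topics (host and port are illustrative; older versions take --zookeeper localhost:2181 instead of --bootstrap-server):
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
Empty output means every partition has its full replica set in sync, so it is safe to move on to the next node.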