Get total partition count in each Kafka broker - apache-kafka

I would like to calculate the number of partitions on each of my brokers. We have a multi-DC distributed architecture and would like to get the partition count per broker for maintenance and admin tasks.
This is what was suggested in one of the blog posts. It works fine, but it counts partitions at the cluster level, and I need a similar script that counts per broker.
zookeeper="ZK_SERVER1:2181,ZK_SERVER2:2181,ZK_SERVER3:2181"
sum=0
for i in $(/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper "$zookeeper"); do
  count=$(/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --zookeeper "$zookeeper" --topic "$i" | grep -c Leader)
  sum=$((sum + count)); echo "total partitions is $sum"
done

The partition count is exposed as a JMX MBean on each broker (kafka.server:type=ReplicaManager,name=PartitionCount).
Install an agent such as the Prometheus JMX Exporter, Datadog, or New Relic on each broker, then collect and aggregate that metric, adding a tag for the DC for further grouping as necessary.
Otherwise, I don't see why you couldn't add another loop to your script over a list of ZooKeeper endpoints, one per Kafka cluster.
You would then need to parse the --describe output to get the counts per broker.
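The parsing step mentioned above could look roughly like the sketch below. It is only a sketch: the function name count_partitions_per_broker is made up for illustration, and it assumes the partition lines printed by kafka-topics.sh --describe have the usual "Topic: ... Partition: ... Leader: ... Replicas: ..." layout. It counts, per broker id, how many partition replicas the broker hosts and how many partitions it leads.

```shell
#!/usr/bin/env bash
# Count partition replicas and partition leaders per broker from the text
# printed by `kafka-topics.sh --describe` (read from stdin).
# Assumption: partition lines look like
#   Topic: t  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
count_partitions_per_broker() {
  awk '
    /Partition:/ && /Leader:/ {
      for (i = 1; i < NF; i++) {
        if ($i == "Leader:") leaders[$(i + 1)]++
        if ($i == "Replicas:") {
          n = split($(i + 1), ids, ",")
          for (j = 1; j <= n; j++) replicas[ids[j]]++
        }
      }
    }
    END {
      for (b in replicas)
        printf "broker %s replicas %d leaders %d\n", b, replicas[b], leaders[b]
    }
  ' | sort -k2,2n
}

# Usage (needs a reachable cluster; path and variable are the ones from the question):
#   /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --zookeeper "$zookeeper" \
#     | count_partitions_per_broker
```

Running --describe without --topic describes every topic in one call, so the per-topic loop is not needed for this. For several clusters, the same pipeline can be wrapped in a loop over the ZooKeeper endpoints.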

You could use the Admin interface (AdminClient): list the topics, describe them (the partition metadata contains the IDs of the brokers hosting each replica), then describe the cluster and match the IDs.
This is more or less what kafka-topics does underneath with different commands, as it is just a wrapper around the underlying Java application.

Related

Is it safe to reduce the replication factor count in altering kafka topic? [duplicate]

There are currently 22 replicas configured for specific topic in Kafka 0.9.0.1.
Is it possible to reduce the replication factor of the topic to 3?
How to do it via Kafka CLI or Kafka Manager?
So far I have only found a way to increase the number of replicas, here
Yes. Changing (increasing or decreasing) the replication factor can be done using the following 2-step process:
First, you'll need to create a partition assignment structure for the given topic in the form of a json file. Here's an example:
{
"version":1,
"partitions":[
{"topic":"<topic-name>","partition":0,"replicas":[<broker-ids>]},
{"topic":"<topic-name>","partition":1,"replicas":[<broker-ids>]},
...
{"topic":"<topic-name>","partition":n,"replicas":[<broker-ids>]}
]
}
Save this file under any name, for example decrease-replication-factor.json.
Note: <broker-ids> stands for the comma-separated list of IDs of the brokers you want the replicas of that partition to live on.
Second, run the kafka-reassign-partitions script and supply the above JSON file as input:
kafka-reassign-partitions --zookeeper <zookeeper-server-list>:2181 \
--reassignment-json-file decrease-replication-factor.json --execute
Now, if you run the describe command for the given topic, you should see the reduced replica list as per the supplied JSON.
There are also tools created by the Kafka community that can help you achieve this. Here is one such example created by LinkedIn.
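For a topic with many partitions, writing the JSON by hand gets tedious. A minimal sketch of generating it is shown below. The function name make_reassignment_json and its argument order are made up for illustration, and the round-robin placement is just one simple way to spread replicas and preferred leaders; it is not what any particular Kafka tool does.

```shell
#!/usr/bin/env bash
# Print a reassignment JSON for one topic, spreading replicas round-robin.
# Arguments (hypothetical): topic, partition count, replication factor,
# then the broker ids to use.
make_reassignment_json() {
  local topic=$1 partitions=$2 rf=$3
  shift 3
  local brokers=("$@")
  local n=$#
  local p k replicas sep
  if ((rf > n)); then
    echo "replication factor $rf exceeds number of brokers $n" >&2
    return 1
  fi
  printf '{"version":1,"partitions":[\n'
  for ((p = 0; p < partitions; p++)); do
    replicas=""
    for ((k = 0; k < rf; k++)); do
      # Rotate the starting broker per partition so leaders are spread out.
      replicas+="${replicas:+,}${brokers[(p + k) % n]}"
    done
    sep=","
    ((p == partitions - 1)) && sep=""   # JSON does not allow a trailing comma
    printf '{"topic":"%s","partition":%d,"replicas":[%s]}%s\n' "$topic" "$p" "$replicas" "$sep"
  done
  printf ']}\n'
}

# Usage (topic name and broker ids are placeholders):
#   make_reassignment_json my-topic 6 3 1001 1002 1003 > decrease-replication-factor.json
```

The resulting file can then be passed to kafka-reassign-partitions with --reassignment-json-file as described above. It is worth reviewing the generated file before running with --execute.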

Kafka Topic, Broker, ZooKeeper architecture overview

I have read a bunch of articles about Kafka architecture, but I'm still brand-new to this, and when it came to coding I was not sure whether I had understood things correctly.
From what I understand, Kafka server, broker and node are synonyms. There can be a few brokers within a Kafka cluster. There is a Kafka topic (T1) and it consists of a few partitions (P1, P2..). These partitions can be replicated across the brokers (B1, B2..). B1 can be the leader for P1, B2 for P2 and so on. Do we say that topic T1 is defined for a broker or for the cluster, and if we treat a topic as a set of partitions, can we say 'topic replicas'?
From the official Kafka documentation:
bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
So from what I understand, defining host1:port1,host2:port2 says that there are two brokers.
In this case, does ZooKeeper automatically distribute a message to a leader when executing bin/kafka-console-producer.sh --broker-list host1:port1,host2:port2 --topic test ? (I believe somewhere I have read that a producer should read broker id from ZooKeeper, but wouldn't it be unnecessary here?)
Is it equal to publishing using bin/kafka-console-producer.sh --zookeeper host1:z_port1,host2:z_port2 --topic test ?
How should I basically understand bin/kafka-configs.sh --zookeeper host1:z_port1,host2:z_port2? We have only one zookeeper instance?
Do we say that there is topic T1 defined for broker or cluster, and if we treat topic as set of partitions can we say 'topic replicas'?
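1) Cluster. 2) Each partition is replicated individually across brokers (a cluster often has more brokers than the replication factor), so it is the partitions, not the topic, that have replicas. The replicas that are caught up with the leader are called the "in-sync replicas (ISR)".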
does ZooKeeper automatically distribute a message to a leader when executing
ZooKeeper does not, no. Your client connects to one of the bootstrap brokers and requests the cluster metadata, which lists all brokers in the cluster and which broker is the leader for each topic-partition. The client then connects to each leader broker individually and produces to it for the partitions it has calculated.
Is it equal to publishing
Producing*, yes.
We have only one zookeeper instance?
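Listing several host:port pairs names several nodes of the same ZooKeeper cluster, not several clusters. One ZooKeeper cluster can serve multiple Kafka clusters via a feature called a chroot: a path appended to the connection string (for example host1:z_port1,host2:z_port2/kafka-a, where /kafka-a is a made-up path) that becomes the root znode under which that Kafka cluster keeps its data.
Also, the kafka-topics command can now use --bootstrap-server instead of --zookeeper.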

Kafka consumer lag through JMX

I'm trying to monitor the lag of a consumer group in Kafka 0.10.
Our consumers keep track of their offsets in Kafka rather than ZooKeeper. This means I can get the figures using the following:
bin/kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group-name>
This works fine. However, my broker already makes use of the Prometheus JMX exporter for collecting a number of stats. I've connected JConsole to the brokers but can't see the same data being reported in JMX as reported by kafka-consumer-groups.sh.
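Is there any way to get this information from Kafka with JMX without needing any additional tools?
You could retrieve the attributes {topic}-{partition}.records-lag of the MBean kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id} for all partitions. Note that these are client metrics exposed by the consumer's JVM, not by the broker, so JConsole or the exporter has to be attached to the consumer process. That should be equivalent to the output of kafka-consumer-groups.sh.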
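If attaching JMX to the consumers is not an option, one fallback is to aggregate the output of kafka-consumer-groups.sh yourself. The sketch below sums the LAG column per topic. The function name sum_lag_per_topic is made up, and it assumes the output has a header row that contains TOPIC and LAG; the columns are found by name because their position differs between Kafka versions.

```shell
#!/usr/bin/env bash
# Sum the LAG column per topic from `kafka-consumer-groups.sh --describe`
# output read from stdin.
# Assumption: a header row containing the column names TOPIC and LAG precedes
# the data rows; rows whose lag is not a number (for example "-") are skipped.
sum_lag_per_topic() {
  awk '
    !lag_col {
      for (i = 1; i <= NF; i++) {
        if ($i == "TOPIC") topic_col = i
        if ($i == "LAG")   lag_col = i
      }
      next
    }
    $lag_col ~ /^[0-9]+$/ { total[$topic_col] += $lag_col }
    END { for (t in total) printf "%s %d\n", t, total[t] }
  ' | sort
}

# Usage (same placeholders as the command in the question):
#   bin/kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group-name> \
#     | sum_lag_per_topic
```

The per-topic totals could then be pushed to whatever monitoring system is already in place.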

Find broker id used in the Kafka cluster

I want to know the list of broker ids that are taken in a Kafka cluster. For example, in a cluster with 10 nodes, if I create a topic with 10 partitions (or more), I can see from the output of the describe command which brokers it has been assigned to.
./bin/kafka-topics --describe --zookeeper <zkconnect>:2181 --topic rbtest3
Can I collect this information without creating a topic?
You can get the list of used broker ids using the ZooKeeper CLI.
zookeeper-3.4.8$ ./bin/zkCli.sh -server zookeeper-1:2181 ls /brokers/ids | tail -1
[0]
You also can use the zookeeper-shell.sh script that ships with the Kafka distribution, like this:
linux$ ./zookeeper-shell.sh zookeeper-IPaddress:2181 <<< "ls /brokers/ids"
Just add the IP address of any of your Zookeeper servers (and/or change the port if necessary, for example when running multiple Zookeeper instances on the same server).
This alternative can be useful when, for example, you find yourself inside a container (Docker, LXC, etc.) that is exclusively running a Kafka client; but Zookeeper itself is somewhere else (say, in a different container).
I hope it helps. =:)
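# Kafka broker id (meta.properties lives in each directory listed in log.dirs; this path assumes log.dirs is $KAFKA_HOME/logs)
cat $KAFKA_HOME/logs/meta.properties
If you want to know the broker ID of a specific broker, the easiest way I found is to look at its controller.log: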
cat /var/log/kafka/controller.log
[2021-02-18 13:20:22,639] INFO [ControllerEventThread controllerId=1003] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
[2021-02-18 13:20:22,646] DEBUG [Controller id=1003] Broker 1002 has been elected as the controller, so stopping the election process. (kafka.controller.KafkaController)
controllerId=1003 ---> this is your brokerID (1003)
[substitute your path to the kafka logs, of course ...]
You can use kafka-manager, an open-source tool from Yahoo.
You can run the following command:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=<your topic> --broker-list=<your broker list> --time=-2
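This prints one topic:partition:offset line per partition of the topic, where --time=-2 selects the beginning offset. Note that the output identifies partitions rather than broker ids, so it mainly confirms that the brokers in the list are reachable.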
Using the Zookeeper CLI
sh /bin/zkCli.sh -server zookeeper-1:2181 ls /brokers/ids
Then, to get the details of a broker, use "get" with one of the ids output by the previous command:
sh /bin/zkCli.sh -server zookeeper-1:2181 get /brokers/ids/<broker-id>
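The two steps can be chained in a small script. The sketch below turns the bracketed list printed by ls /brokers/ids into one id per line. The function name parse_broker_ids is made up, and it assumes the id list is the last output line of the form [..]; other lines such as connection messages are ignored.

```shell
#!/usr/bin/env bash
# Turn the "[0, 1, 2]" line printed by `ls /brokers/ids` into one id per line.
# Assumption: the id list is the last line of the form [..] in the output read
# from stdin; connection logs and watcher messages are ignored.
parse_broker_ids() {
  grep -E '^\[[0-9, ]*\]$' | tail -n 1 | sed 's/[][ ]//g' | tr ',' '\n' | sed '/^$/d'
}

# Usage (requires a reachable ZooKeeper; host name as in the answers above):
#   for id in $(sh /bin/zkCli.sh -server zookeeper-1:2181 ls /brokers/ids | parse_broker_ids); do
#     sh /bin/zkCli.sh -server zookeeper-1:2181 get "/brokers/ids/$id"
#   done
```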