Reduce topic replication factor with Kafka Manager or the Kafka CLI

There are currently 22 replicas configured for a specific topic in Kafka 0.9.0.1.
Is it possible to reduce the replication factor of the topic to 3?
How can this be done via the Kafka CLI or Kafka Manager?
So far I have only found a way to increase the number of replicas.

Yes. Changing (increasing or decreasing) the replication factor can be done using the following two-step process:
First, you'll need to create a partition assignment structure for the given topic in the form of a JSON file. Here's an example:
{
  "version": 1,
  "partitions": [
    {"topic": "<topic-name>", "partition": 0, "replicas": [<broker-ids>]},
    {"topic": "<topic-name>", "partition": 1, "replicas": [<broker-ids>]},
    ...
    {"topic": "<topic-name>", "partition": n, "replicas": [<broker-ids>]}
  ]
}
Save this file with any name, say decrease-replication-factor.json.
Note: <broker-ids> is a comma-separated list of the broker IDs you want each partition's replicas to live on, so to reduce the replication factor to 3, list exactly three broker IDs per partition.
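For instance, a hypothetical three-partition topic my-topic being reduced to a replication factor of 3 on brokers 1, 2 and 3 (the topic name and broker IDs here are purely illustrative) would look like:
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [1,2,3]},
    {"topic": "my-topic", "partition": 1, "replicas": [2,3,1]},
    {"topic": "my-topic", "partition": 2, "replicas": [3,1,2]}
  ]
}
Rotating the broker order per partition spreads the preferred leaders (the first broker in each list) across the cluster.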
Run the kafka-reassign-partitions script and supply the above JSON as input in the following way:
kafka-reassign-partitions --zookeeper <zookeeper-server-list>:2181 \
  --reassignment-json-file decrease-replication-factor.json --execute
Now, if you run the describe command for the given topic, you should see the reduced replica set as per the supplied JSON.
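To check progress and the final state, something along these lines should work (the --verify flag belongs to the same tool):
kafka-reassign-partitions --zookeeper <zookeeper-server-list>:2181 \
  --reassignment-json-file decrease-replication-factor.json --verify
kafka-topics --zookeeper <zookeeper-server-list>:2181 --describe --topic <topic-name>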
The Kafka community has also created tools that can help you achieve this; one such example was created by LinkedIn.

Related

How to distribute messages to all partitions in the topic defined by `offset.storage.topic` in Kafka Connect

I have deployed Debezium using the Docker image pulled via docker pull debezium/connect.
In the documentation provided at https://hub.docker.com/r/debezium/connect, the description of the environment variable OFFSET_STORAGE_TOPIC is as follows:
This environment variable is required when running the Kafka Connect
service. Set this to the name of the Kafka topic where the Kafka
Connect services in the group store connector offsets. The topic must
have a large number of partitions (e.g., 25 or 50), be highly
replicated (e.g., 3x or more) and should be configured for compaction.
I've created the required topic named mydb-connect-offsets with 25 partitions and a replication factor of 5.
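For reference, a topic like that can be created with a command along these lines (the broker address is a placeholder, and --bootstrap-server requires a reasonably recent Kafka CLI):
kafka-topics.sh --create --bootstrap-server localhost:9092 --topic mydb-connect-offsets \
  --partitions 25 --replication-factor 5 --config cleanup.policy=compact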
The deployment is successful and everything is working fine. A sample message in the mydb-connect-offsets topic looks like this: the key is ["sample-connector",{"server":"mydatabase"}] and the value is
{
  "transaction_id": null,
  "lsn_proc": 211534539955768,
  "lsn_commit": 211534539955768,
  "lsn": 211534539955768,
  "txId": 709459398,
  "ts_usec": 1675076680361908
}
As the key is fixed, all the messages end up in the same partition of the topic. My question is: why does the documentation say the topic must have a large number of partitions when only one partition is ever going to be used? Also, what needs to be done to distribute the messages across all partitions?
The offsets are keyed by connector name because they must be ordered.
The large partition count is to manage offset storage of many distinct connectors in parallel, not only one.
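You can observe this key-to-partition pinning by consuming the offsets topic with keys and partitions printed (the broker address is a placeholder, and print.partition requires a reasonably recent console consumer):
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic mydb-connect-offsets --from-beginning \
  --property print.key=true --property print.partition=true
Each distinct connector key hashes to one partition, so a single connector pins to one partition while many connectors spread across all of them.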

Get total partition count in each Kafka broker

I would like to calculate the number of partitions on each of my brokers. We have a multi-DC distributed architecture, and I would like to get the partition count per broker for maintenance and admin tasks.
This is what was suggested in one of the blog posts; it works fine, but at the cluster level, whereas I need a similar script per broker:
zookeeper="ZK_SERVER1:2181,ZK_SERVER2:2181,ZK_SERVER3:2181"
sum=0
for i in $(/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper $zookeeper ); do count=$(/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --zookeeper $zookeeper --topic $i |grep Leader | wc -l); sum=`expr $sum + $count` ; echo 'total partitions is ' $sum; done
Partition count is exposed as a JMX MBean.
Install an agent such as Prometheus JMX Exporter, Datadog, or New Relic on each broker, then collect and aggregate that information, adding tags per DC for further grouping as necessary.
Otherwise, I don't see why you couldn't add another loop to your script for a list of different Zookeeper endpoints for each Kafka cluster.
You would need to parse that output to get per-broker counts.
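For example, a rough sketch along these lines (assuming each partition line of the describe output carries a Replicas: field, as in the usual kafka-topics output) would total replicas per broker:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --zookeeper $zookeeper \
  | grep 'Leader:' \
  | awk '{for (i = 1; i < NF; i++) if ($i == "Replicas:") {n = split($(i+1), r, ","); for (j = 1; j <= n; j++) cnt[r[j]]++}}
         END {for (b in cnt) print "broker", b, "hosts", cnt[b], "partition replicas"}'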
You could use the Admin interface: list the topics, describe their metadata (which contains the hosting broker IDs), then describe the cluster and match the IDs.
This is more or less what kafka-topics does underneath with different commands, as it's just a wrapper around the underlying Java application.
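On newer clusters (Kafka 1.0+), the kafka-log-dirs tool does much of this for you over that same Admin API, reporting every partition each broker hosts; a sketch piping its JSON output through jq (the broker address and jq filter are illustrative):
kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe \
  | grep '^{' \
  | jq -r '.brokers[] | "broker \(.broker): \([.logDirs[].partitions[]] | length) partitions"'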

Is it safe to reduce the replication factor count when altering a Kafka topic? [duplicate]

This is the same question as the one at the top of this page; see the two-step kafka-reassign-partitions answer there.

How can we dump a Kafka topic into Presto

I need to push a JSON file into a Kafka topic, connect the topic in Presto, and structure the JSON data into a queryable table.
I am following this tutorial https://prestodb.io/docs/current/connector/kafka-tutorial.html#step-2-load-data
I am not able to understand how this command will work.
$ ./kafka-tpch load --brokers localhost:9092 --prefix tpch. --tpch-type tiny
Suppose I have already created a test topic in Kafka using a producer. How will the tpch data be generated for this topic?
If you already have a topic, you should skip to step 3, where the tutorial actually sets up the topics to query via Presto.
kafka-tpch load creates new topics with the specified prefix
The above command creates a tpch schema and loads various tables under it, which can be used for testing purposes. If you want to work with your actual Kafka topics, you need to enlist them in etc/catalog/kafka.properties against the kafka.table-names property. If you simply provide a topic name without a prefix (such as test_topic), it will land in the "default" schema. However, if you specify a topic name with a prefix (such as test_schema.test_topic), the topic will appear under test_schema. When querying with Presto, you can use that schema name.
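A minimal catalog file might look like this (the node address and table names are placeholders):
# etc/catalog/kafka.properties
connector.name=kafka
kafka.nodes=localhost:9092
kafka.table-names=test_topic,test_schema.test_topic
kafka.hide-internal-columns=false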

Kafka: set different partition counts for different topics

As far as I know, 'num.partitions' in the Kafka server.properties applies to all topics.
Now I want to set partitionNumber=1 for topicA and partitionNumber=2 for topicB.
Is that possible to implement with the high-level API?
num.partitions is only used when a topic is created automatically. If you create a topic yourself, you can set any number of partitions you want.
You can create a topic yourself with the following command (replication factor 3 and 2 partitions; the capitalized words are what you have to replace):
bin/kafka-topics.sh --create --zookeeper ZOOKEEPER_HOSTNAME:ZOOKEEPER_PORT \
--replication-factor 3 --partitions 2 --topic TOPIC_NAME
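Applied to the question's example (the ZooKeeper address localhost:2181 is assumed for illustration):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic topicA
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topicB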
There is a configuration value that can be set on a Kafka broker:
auto.create.topics.enable=true
True is actually the default setting:
Enable auto creation of topic on the server. If this is set to true then attempts to produce data or fetch metadata for a non-existent topic will automatically create it with the default replication factor and number of partitions.
So if you read from or write to a non-existent topic as if it existed, the broker will automatically create it for you. I've never heard of using the high-level API to explicitly create one.
Looking over the Kafka protocol documentation, there doesn't seem to be a provided way to create topics (later Kafka versions added a CreateTopics protocol API and an AdminClient for this).
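For reference, these broker-side defaults live in server.properties and only apply to auto-created topics (the values shown are Kafka's shipped defaults):
# server.properties
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1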