kafka different topics set different partitions - apache-kafka

As far as I know, 'num.partitions' in Kafka's server.properties applies to all topics.
Now I want to set partitionNumber=1 for topicA and partitionNumber=2 for topicB.
Is it possible to implement this with the high-level API?

num.partitions is only used when a topic is created automatically. If you create a topic yourself, you can set any number of partitions you want.
You can create a topic yourself with the following command (replication factor 3 and 2 partitions; the capitalized words are what you have to replace):
bin/kafka-topics.sh --create --zookeeper ZOOKEEPER_HOSTNAME:ZOOKEEPER_PORT \
--replication-factor 3 --partitions 2 --topic TOPIC_NAME

There is a configuration value that can be set on a Kafka broker:
auto.create.topics.enable=true
true is actually the default setting:
Enable auto creation of topic on the server. If this is set to true then attempts to produce data or fetch metadata for a non-existent topic will automatically create it with the default replication factor and number of partitions.
So if you read from or write to a non-existent topic as if it existed, it will automatically be created for you. I've never heard of using the high-level API to automatically create one.
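For reference, the defaults that auto-created topics pick up come from the broker configuration, e.g. in server.properties (the values below are just illustrative):
auto.create.topics.enable=true
# defaults applied to any auto-created topic
num.partitions=2
default.replication.factor=1
Since these apply to every auto-created topic, though, getting different partition counts per topic still requires creating each topic explicitly.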
Looking over the Kafka Protocol Documentation, there doesn't seem to be a provided way to create topics.
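That said, later Kafka versions (0.10.1+) did add a CreateTopics request to the protocol, and since 0.11 the Java client ships an AdminClient that exposes it. A minimal sketch creating the two topics from the question with different partition counts (the broker address and replication factor 1 are assumptions for a single-broker dev setup):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // topicA with 1 partition, topicB with 2; replication factor 1 for a single-broker setup
            NewTopic topicA = new NewTopic("topicA", 1, (short) 1);
            NewTopic topicB = new NewTopic("topicB", 2, (short) 1);
            admin.createTopics(Arrays.asList(topicA, topicB)).all().get(); // block until both are created
        }
    }
}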

Related

Reduce topic replication factor with Kafka manager or Kafka cli

There are currently 22 replicas configured for a specific topic in Kafka 0.9.0.1.
Is it possible to reduce the replication factor of the topic to 3?
How can I do this via the Kafka CLI or Kafka Manager?
So far I have only found a way to increase the number of replicas.
Yes. Changing (increasing or decreasing) the replication factor can be done using the following 2-step process:
First, you'll need to create a partition assignment structure for the given topic in the form of a JSON file. Here's an example:
{
  "version": 1,
  "partitions": [
    {"topic": "<topic-name>", "partition": 0, "replicas": [<broker-ids>]},
    {"topic": "<topic-name>", "partition": 1, "replicas": [<broker-ids>]},
    ...
    {"topic": "<topic-name>", "partition": n, "replicas": [<broker-ids>]}
  ]
}
Save this file with any name. Let's say - decrease-replication-factor.json.
Note: <broker-ids> stands for the comma-separated list of broker IDs you want the replicas to live on. Listing fewer broker IDs per partition than the current replication factor is what reduces it.
Run the kafka-reassign-partitions script and supply the above JSON as input in the following way:
kafka-reassign-partitions --zookeeper <zookeeper-server-list>:2181 \
--reassignment-json-file decrease-replication-factor.json --execute
Now, if you run the describe command for the given topic, you should see the reduced replicas as per the supplied json.
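For instance, to shrink every partition of a hypothetical two-partition topic my-topic down to three replicas on brokers 1, 2 and 3 (the topic name and broker IDs here are made up), the file would be:
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [1,2,3]},
    {"topic": "my-topic", "partition": 1, "replicas": [1,2,3]}
  ]
}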
The Kafka community has also built tools that can help you achieve this; one such example was created by LinkedIn.

Kafka Consumer does not receive data when one of the brokers is down

Kafka Quickstart
Using Kafka v2.1.0 on RHEL v6.9
Consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shut down Kafka-Server0, the consumer stops getting data from the producer.
When I bring Kafka-Server0 back up, the consumer starts getting messages again from where it left off.
These are the commands used:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no message received on the consumer side) when I run the consumer with two servers specified in the --bootstrap-server option.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when server0 is down even though the replication factor for the topic test1 was set to 2?
There is already a similar question, but it was not answered completely:
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down
If the offsets topic is unavailable, you cannot consume.
Look for these settings in the server.properties file, read the comment above them, and increase the values accordingly (this only applies if the topic doesn't already exist):
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability (e.g. 3).
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
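In a two-broker setup like the one in the question, something like the following would keep the offsets topic available when one broker dies (2 is the most you can use with only two brokers, and it must be set before the internal topics are first created):
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2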
According to your previous question, it looks like yours only has one replica.
See the reassignment procedure shown earlier for how to increase the replication factor of an existing topic.
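As a sketch, the reassignment JSON for the internal topic would look like this (the broker IDs and the file name are assumptions, and __consumer_offsets has 50 partitions by default, so a real file needs one entry per partition):
{
  "version": 1,
  "partitions": [
    {"topic": "__consumer_offsets", "partition": 0, "replicas": [0,1]},
    {"topic": "__consumer_offsets", "partition": 1, "replicas": [0,1]},
    ...
  ]
}
kafka-reassign-partitions --zookeeper localhost:2181 \
--reassignment-json-file increase-offsets-rf.json --execute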
In early versions of Kafka, offsets were managed in ZooKeeper, but Kafka has evolved a lot over time; it now manages offsets in an internal topic, __consumer_offsets.
Think of a scenario where you created a topic with a replication factor of 1: if the broker hosting it goes down, the data exists only on that one node, so you can't get at it. The same applies to the __consumer_offsets topic.
You need to revisit server.properties in order to get the behaviour you are expecting. In the meantime, if you still want to consume messages from the surviving replica, you may need to restart the console consumer with the --from-beginning flag.

Kafka Topic creation upon specific broker

Regarding Kafka topic creation: I understand that a Kafka cluster can have several brokers/nodes/servers, and that each broker can host one or more topics. A topic can be spread across one or more brokers depending on the number of partitions given at creation. Is there any way to tell on which broker(s) a topic and its partitions should be created?
Regards,
Lokesh
When creating a topic, you can either just specify the number of partitions and replicas and let Kafka distribute them, or you can directly specify the assignment: which partition and its replicas go on which brokers.
If you are using the kafka-topics.sh script that ships with Kafka, you can use its --replica-assignment option for this. For example:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic topic1 --replica-assignment 0:1:2,0:1:2,0:1:2
Each comma-separated entry describes one partition, and each colon-separated list gives the broker IDs of its replicas (the first being the preferred leader), so this example creates topic1 with three partitions, each replicated on brokers 0, 1 and 2.
If the topic already exists, you can use the kafka-reassign-partitions.sh tool to change the assignment.
This page might contain some more details about it: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.2CreateTopics

Consume and produce message in particular Kafka partition?

For reading all partitions of a topic:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic myTopic --from-beginning
How can I consume a particular partition of the topic (for instance, partition number 13)?
And how can I produce a message to a particular partition? Is that possible?
You can't with the console consumer and producer, but you can with higher-level clients (in any language that works for you).
You may, for example, use the assign method to manually assign a specific topic-partition to consume (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L906).
You may use a custom Partitioner to override the partitioning logic and decide manually how to partition your messages (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L206-L208).
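A minimal Java sketch of the consumer side of this, assuming a local broker and the topic name myTopic from the question:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PartitionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // assign() pins the consumer to partition 13 of myTopic; no group rebalancing takes place
            TopicPartition partition = new TopicPartition("myTopic", 13);
            consumer.assign(Collections.singletonList(partition));
            consumer.seekToBeginning(Collections.singletonList(partition)); // like --from-beginning
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}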
With the many clients that are available you can specify the partition number just like serejja has stated.
Also look into https://github.com/cakesolutions/scala-kafka-client which uses actors and provides multiple modes for manual partitions and offsets.
If you want to do the same on the terminal, I suggest using kafkacat. (https://github.com/edenhill/kafkacat)
My personal choice during development.
You can do things like
kafkacat -b localhost:9092 -f 'Topic %t[%p], offset::: %o, data: %s key: %k\n' -t testtopic
And for a specific partition, you just need to add the -p flag, e.g. -p 13.
The console producer and consumer do not provide this flexibility, but you can achieve it through the Kafka APIs.
You can manually assign a partition to a consumer using the assign() operation (KafkaConsumer/Assign). This disables group rebalancing, so please use it very carefully.
You can also specify the partition directly in a KafkaProducer record. If you don't, the partition is chosen according to the configured Partitioner.
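For example, a producer sketch (again assuming a local broker and the topic myTopic; the four-argument ProducerRecord constructor is the one that takes the partition explicitly):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PartitionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send straight to partition 13; the Partitioner is bypassed when a partition is given
            producer.send(new ProducerRecord<>("myTopic", 13, "some-key", "some-value"));
            producer.flush(); // make sure the record is actually sent before closing
        }
    }
}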
How can I consume a particular partition of the topic (for instance, partition number 13)?
There is a flag called --partition in kafka-console-consumer
--partition <Integer: partition>   The partition to consume from. Consumption
                                   starts from the end of the partition unless
                                   '--offset' is specified.
The command is as follows:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 0 --from-beginning

Are broker nodes in kafka cluster configured to handle number of partition?

Kafka places partitions and replicas so that the brokers with the fewest existing partitions are used first. Does that mean brokers are pre-configured to handle a certain number of partitions?
When you create a topic, you set the number of partitions.
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
Also, there is a num.partitions parameter you can use (this is applied when a topic is created automatically).
A broker can host as many partitions as you like, as long as it has enough disk space, memory, and network bandwidth.
If you look inside the broker's log directory (log.dirs), you will see a folder for the single partition of test, named test-0. If you create a topic with three partitions, you will have two more folders, test-1 and test-2.
Each partition folder contains an index file, a timeindex file, and a log file. The log file holds the actual Kafka data for that partition.
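For illustration, the folder of the single-partition topic above would typically contain something like this (segment files are named after their base offset):
test-0/
    00000000000000000000.index
    00000000000000000000.log
    00000000000000000000.timeindex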