Limiting log size for a particular topic in Kafka

I am trying to limit the log size for a Kafka topic.
I followed the Kafka documentation (Kafka 0.10.0 Documentation) and set these two properties: cleanup.policy=delete and retention.bytes=1.
Result of the topic describe command:
./kafka-topics.sh --describe --zookeeper localhost:2181 --topic kafkatest1
Topic:kafkatest1 PartitionCount:3 ReplicationFactor:1 Configs:cleanup.policy=delete,retention.bytes=1
I was expecting that any message I write to the topic 'kafkatest1' would be deleted automatically, since I have set retention.bytes to 1, but messages keep getting appended.
Is any additional configuration required to achieve this?

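Please check your log.segment.bytes (the default is 1073741824 bytes, i.e. 1 GiB; the topic-level equivalent is segment.bytes). As far as I know, retention is only applied to closed log segments: the active segment, the one currently being written to, is not deleted, so nothing is removed until a segment has been rolled. Note also that retention.bytes applies per partition, not per topic.
Could you explain what you are trying to achieve? Why send a message only to delete it?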
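To illustrate why retention.bytes=1 does not empty the topic, here is a minimal sketch of size-based retention at segment granularity. It is a simplified model, not Kafka's actual code: the Segment class and apply_size_retention function are hypothetical names, and the rule assumed here is that the oldest closed segment is dropped only while the partition would still hold at least retention_bytes afterwards, and that the active segment is never dropped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment of a partition."""
    size_bytes: int
    active: bool = False  # True for the segment currently being written to


def apply_size_retention(segments: List[Segment], retention_bytes: int) -> List[Segment]:
    """Simplified model: drop oldest closed segments while the partition
    would still hold at least retention_bytes after the deletion."""
    remaining = list(segments)  # ordered oldest first
    total = sum(s.size_bytes for s in remaining)
    while remaining and not remaining[0].active:
        oldest = remaining[0]
        if total - oldest.size_bytes < retention_bytes:
            break  # deleting would take the partition below the limit
        total -= oldest.size_bytes
        remaining.pop(0)
    return remaining


# A freshly created topic: everything sits in the active segment.
only_active = [Segment(500, active=True)]
print(len(apply_size_retention(only_active, retention_bytes=1)))

# After two segments have been rolled, the closed ones become eligible.
rolled = [Segment(100), Segment(100), Segment(50, active=True)]
print(len(apply_size_retention(rolled, retention_bytes=1)))
```

Under this model the first case keeps its single active segment no matter how small retention.bytes is, which matches the behaviour described in the question. Lowering segment.bytes makes segments roll sooner, so they become eligible for deletion sooner.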

Related

delete specific messages from kafka topic __consumer_offsets

I want to delete all messages in the __consumer_offsets topic that start with a given key (resetting one particular consumer group without affecting the rest).
Is there a way to do this?
Kafka comes with a ConsumerGroupCommand tool. You can find some information in the Kafka documentation.
If you plan to reset a particular consumer group ("myConsumerGroup") without affecting the rest, you can use
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group myConsumerGroup --topic topic1 --to-latest
As far as I know, this only prints the planned offsets (a dry run) unless you add --execute, and the group must have no active members while it is reset.
Depending on your requirements, you can reset the offsets for each partition of the topic with that tool. The help output and the documentation explain the options.

How to find the consumer topic and the group id from the __consumer_offsets topic in kafka?

I am trying to parse the logs from the __consumer_offsets topic in Kafka. The idea is to find the group id, topic and consumer that are creating load on my Kafka cluster.
The command I am executing is below
bin/kafka-console-consumer.sh --topic __consumer_offsets --bootstrap-server brokers --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --new-consumer --consumer.config consumer.conf
Now the output looks like this
[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,4]::[OffsetMetadata[8646,NO_METADATA],CommitTime 1538115746766,ExpirationTime 1538202146766]
[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,6]::[OffsetMetadata[8639,NO_METADATA],CommitTime 1538115746766,ExpirationTime 1538202146766]
Can someone help me understand this log or point me to the documentation? Thanks in advance.
[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,6]::[OffsetMetadata[8639,NO_METADATA],CommitTime 1538115746766,ExpirationTime 1538202146766]
It means that the consumer group named dedupeconsumergroup committed offset 8639 for partition 6 of the topic daas.dedupe.avrosyslog.incoming. CommitTime and ExpirationTime are epoch timestamps in milliseconds.
Which particular consumer read a particular offset is probably a little more complicated to find, as you would have to know which partition was assigned to which consumer at a given point in time; the assignment can change over time due to rebalancing.
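If you want to aggregate these lines per group and topic, a small parser helps. The following is a sketch that assumes the formatter output has exactly the shape shown above; the regular expression and the parse_offset_commit function are illustrative names, not part of Kafka, and group or topic names containing commas or brackets would not match.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
# [group,topic,partition]::[OffsetMetadata[offset,metadata],CommitTime ms,ExpirationTime ms]
LINE_RE = re.compile(
    r"\[(?P<group>[^,\]]+),(?P<topic>[^,\]]+),(?P<partition>\d+)\]::"
    r"\[OffsetMetadata\[(?P<offset>\d+),(?P<metadata>[^\]]*)\],"
    r"CommitTime (?P<commit>\d+),ExpirationTime (?P<expire>\d+)\]"
)


def parse_offset_commit(line):
    """Return a dict describing one offset commit, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    commit_ms = int(m.group("commit"))
    expire_ms = int(m.group("expire"))
    return {
        "group": m.group("group"),
        "topic": m.group("topic"),
        "partition": int(m.group("partition")),
        "offset": int(m.group("offset")),
        "metadata": m.group("metadata"),
        # Timestamps are milliseconds since the Unix epoch.
        "commit_time": datetime.fromtimestamp(commit_ms / 1000, tz=timezone.utc),
        "retention_ms": expire_ms - commit_ms,
    }


sample = (
    "[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,6]::"
    "[OffsetMetadata[8639,NO_METADATA],CommitTime 1538115746766,"
    "ExpirationTime 1538202146766]"
)
record = parse_offset_commit(sample)
print(record["group"], record["topic"], record["partition"], record["offset"])
```

Counting parsed records per (group, topic) over a time window gives a rough view of which groups commit most often; it does not identify the individual consumer instance.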

Data deleted from kafka topics

The topics that I started yesterday no longer have any data. The topics appear in the list command, but all data is lost. The data was produced and saved in the topic; however, after a period of time that I don't know, the data was gone. It should be noted that I did consume the data once.
How did this happen?
What settings should I change?
If you are trying to consume using the same consumer group, and that group's committed offsets have not expired yet, you will not get anything back: the group resumes from its last committed offset.
If you are not consuming as part of the same consumer group, perhaps the topic data has expired and been deleted?
You can check the start and end offsets of the topic, e.g.
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-orders --time -2
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-orders --time -1
If the reported start and end offsets match, there is no data in the topic. In that case, take a look at retention.ms at the topic level (or log.retention.ms / log.retention.hours at the broker level). Increasing this value keeps records longer at the expense of additional disk space.
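The comparison can be scripted. This sketch assumes GetOffsetShell prints one topic:partition:offset line per partition; the sample outputs below are made up for illustration, and parse_offsets and retained_counts are hypothetical helper names.

```python
def parse_offsets(output):
    """Parse 'topic:partition:offset' lines into {(topic, partition): offset}."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so a topic name is kept whole.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def retained_counts(earliest_output, latest_output):
    """Difference between end and start offset for each partition seen in both outputs."""
    earliest = parse_offsets(earliest_output)
    latest = parse_offsets(latest_output)
    return {tp: latest[tp] - earliest[tp] for tp in latest if tp in earliest}


# Hypothetical output of the two commands (--time -2, then --time -1).
earliest_out = "test-orders:0:42\ntest-orders:1:10\n"
latest_out = "test-orders:0:42\ntest-orders:1:25\n"

for (topic, partition), count in sorted(retained_counts(earliest_out, latest_out).items()):
    print(topic, partition, count)
```

A difference of zero for a partition means its start and end offsets match, so nothing is currently retained there.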

Kafka 0.10.2 new consumer vs old consumer

I've spent some hours trying to figure out what is going on but haven't managed to find a solution.
Here is my setup on a single machine:
1 ZooKeeper instance running
3 brokers running (on ports 9092/9093/9094)
1 topic with 3 partitions and a replication factor of 3 (the partitions are properly assigned across the brokers)
I'm using the Kafka console producer to insert messages. If I check the replication offset (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the (new) Kafka console consumer:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I don't see anything consumed. I tried deleting my logs folders (/tmp/kafka-logs-[1,2,3]) and creating new topics; still nothing.
However, when I use the old Kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make the new consumer work?
Thanks in advance.
Check what setting the consumer is using for the auto.offset.reset property.
This affects where a consumer group without a previously committed offset starts reading messages from a partition.
Check the Kafka docs for more on this.
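The decision can be summarised in a few lines. This is a simplified model of the behaviour rather than the client's actual code; starting_offset is a hypothetical function, and it assumes the three documented values of auto.offset.reset (earliest, latest, none), with latest being the default for the new consumer.

```python
def starting_offset(committed, log_start, log_end, auto_offset_reset="latest"):
    """Simplified model of where a consumer begins reading one partition.

    committed: the group's committed offset, or None if there is none.
    log_start / log_end: first retained offset and next offset to be written.
    """
    # A valid committed offset always wins; the reset policy is not consulted.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset (or it is out of range): apply the reset policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError("no valid committed offset and auto.offset.reset=none")


# New group, default policy: starts at the end and sees only new messages.
print(starting_offset(None, log_start=0, log_end=100))
# New group with earliest: starts at the first retained message.
print(starting_offset(None, log_start=0, log_end=100, auto_offset_reset="earliest"))
# Existing group: resumes from its committed offset regardless of the policy.
print(starting_offset(40, log_start=0, log_end=100, auto_offset_reset="earliest"))
```

The last case is why re-running a consumer with a group that already has committed offsets does not replay old messages even when the policy is earliest.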
Try providing all your brokers to the --bootstrap-server argument to see if you notice any difference:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long. I assume you've already made sure you provide the correct topic name.

Consume and produce message in particular Kafka partition?

For reading all partitions in topic:
~bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic myTopic --from-beginning
How can I consume a particular partition of the topic (for instance, the one with partition key 13)?
And how can I produce a message to the partition with a particular partition key? Is it possible?
You can't with the console consumer and producer, but you can with higher-level clients (in any language that works for you).
You may, for example, use the assign method to manually assign a specific topic-partition to consume (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L906)
You may use a custom Partitioner to override the partitioning logic and decide manually how to partition your messages (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L206-L208)
With the many clients that are available you can specify the partition number just like serejja has stated.
Also look into https://github.com/cakesolutions/scala-kafka-client which uses actors and provides multiple modes for manual partitions and offsets.
If you want to do the same on the terminal, I suggest using kafkacat. (https://github.com/edenhill/kafkacat)
My personal choice during development.
You can do things like
kafkacat -b localhost:9092 -f 'Topic %t[%p], offset::: %o, data: %s key: %k\n' -t testtopic
And for a specific partition, you just need to use the -p flag.
The console producer and consumer do not provide this flexibility. You can achieve this through the Kafka APIs.
You can manually assign a partition to a consumer using the KafkaConsumer assign() operation. This disables group rebalancing, so please use it carefully.
You can specify the partition in the message (ProducerRecord) sent through KafkaProducer. If it is not specified, the partition is chosen by the Partitioner policy.
How can I consume particular partition of the topic? (for instance with partition key 13)
There is a flag called --partition in kafka-console-consumer:
--partition <Integer: partition>    The partition to consume from. Consumption starts from the end of the partition unless '--offset' is specified.
The command is as follows:
bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic test --partition 0 --from-beginning