When the log reaches the maximum size set by retention.bytes and Kafka deletes messages, will the offset be reset to zero? - apache-kafka

I am new to Kafka. When we use Kafka, we can set retention.bytes; say we set it to 1GB. Once the log reaches 1GB, Kafka will delete messages. I want to ask: will the offset be reset to zero?
Second, if the consumer sets auto.offset.reset to largest, what offset will the consumer start from after Kafka deletes the messages?

For your question #1: when honoring both the size-based and time-based policies, the log may be rolled over to a new, empty log segment. The new log segment file's starting offset will be the offset of the next message appended to the log, so offsets keep growing and are never reset to zero.
For your question #2: it depends. If the offset tracked by the consumer is out of range because of the message deletion, it will be reset to the largest offset.
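For illustration, here is a minimal consumer-configuration sketch with the modern Java client. The broker address, group id, and topic name are placeholders, and note that the modern property value is latest rather than the old consumer's largest:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ResetDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder address
            props.put("group.id", "reset-demo");              // hypothetical group id
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            // "latest" is the modern equivalent of the old consumer's "largest":
            // if the tracked offset has been deleted by retention, the consumer
            // jumps to the end of the log instead of failing.
            props.put("auto.offset.reset", "latest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic")); // hypothetical topic
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
            }
        }
    }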

Related

How can I know that a kafka topic is full?

Let's say I have one Kafka broker configured with one partition:
log.retention.bytes=80000
log.retention.hours=6
What will happen if I try to send a record with the producer API to a broker and the log of the topic gets full before the retention period?
Will my message get dropped?
Or will Kafka free some space from the old messages and add mine?
How can I know if a topic is getting full and logs are being deleted before being consumed?
Is there a way to monitor or expose a metric when a topic is getting full?
What will happen if I try to send a record with the producer API to a broker and the log of the topic gets full before the retention period?
Will my message get dropped? Or will Kafka free some space from the old messages and add mine?
The cleanup.policy property from the topic config, which defaults to delete, says: "The delete policy will discard old segments when their retention time or size limit has been reached."
So if you send a record with the producer API and the topic is full, Kafka will discard old segments to make room; your message is not dropped.
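If you wanted to reproduce this setup programmatically, here is a sketch using the Java AdminClient to put a size limit on a topic so the delete policy applies. The broker address and topic name are placeholders:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class RetentionConfigDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // hypothetical
                // Keep at most ~80 KB per partition; older segments beyond
                // that are eligible for deletion under cleanup.policy=delete.
                AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.bytes", "80000"),
                    AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention)))
                     .all().get();
            }
        }
    }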
How can I know if a topic is getting full and logs are being deleted before being consumed?
Is there a way to monitor or expose a metric when a topic is getting full?
You can get the partition size using the script below (host, port, and topic name are placeholders):
bin/kafka-log-dirs.sh --describe --bootstrap-server <host>:<port> --topic-list <topic>
You will need to develop a script that runs the command above periodically, fetches the current size of the topic, and sends it to Datadog.
In Datadog, you can create a monitor that triggers an appropriate action (e.g., sending email alerts) once the size reaches a particular threshold.
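If you would rather fetch the sizes programmatically instead of parsing shell output, something along these lines should work with the Java AdminClient. The broker id and address are assumptions, and error handling is omitted:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.LogDirDescription;
    import org.apache.kafka.clients.admin.ReplicaInfo;
    import org.apache.kafka.common.TopicPartition;

    public class PartitionSizeDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                int brokerId = 0; // assumed broker id
                Map<String, LogDirDescription> dirs =
                    admin.describeLogDirs(List.of(brokerId))
                         .descriptions().get(brokerId).get();
                for (Map.Entry<String, LogDirDescription> dir : dirs.entrySet()) {
                    for (Map.Entry<TopicPartition, ReplicaInfo> replica :
                             dir.getValue().replicaInfos().entrySet()) {
                        // Bytes on disk for each partition replica; this is the
                        // number you would ship to Datadog.
                        System.out.printf("%s -> %d bytes%n",
                            replica.getKey(), replica.getValue().size());
                    }
                }
            }
        }
    }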
That's not exactly true: a topic is never full, at least by default.
I say by default because, as @Mukesh said, the cleanup.policy will discard old segments when their retention time or size limit is reached, but by default there is no size limit, only a time limit: the property that controls the size limit, retention.bytes, is set to -1 (unlimited) by default.
So out of the box only a time limit applies to messages. Note also that retention.bytes is set per partition, so to get the limit for a topic you have to multiply it by the number of partitions in that topic; e.g., retention.bytes=1073741824 (1GB) on a topic with 6 partitions allows up to about 6GB for the topic.
EDIT:
There are tons of metrics that Kafka exports (via JMX), and among those you can find global metrics about segments (total number, per-topic number, size, rate of segment rolls, etc.).
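For example, brokers expose a per-partition log-size gauge over JMX. Here is a sketch of reading it, assuming JMX is enabled on port 9999 and using placeholder topic and partition names:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxLogSizeDemo {
        public static void main(String[] args) throws Exception {
            // Assumes the broker was started with JMX enabled on port 9999.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Per-partition log size MBean; topic and partition are placeholders.
                ObjectName logSize = new ObjectName(
                    "kafka.log:type=Log,name=Size,topic=my-topic,partition=0");
                Number bytes = (Number) mbs.getAttribute(logSize, "Value");
                System.out.println("Partition size on disk: " + bytes + " bytes");
            }
        }
    }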

Kafka topics beyond retention period

What happens to topics that are beyond their retention period? The messages will get wiped out, but will the topic still exist? And if so, will it write to offset 0 if there is only one partition on the topic?
Each offset within a partition is always assigned to a single message, and it won't be reassigned. From Log Compaction Basics documentation:
Note that the messages in the tail of the log retain the original offset assigned when they were first written—that never changes. Note also that all offsets remain valid positions in the log, even if the message with that offset has been compacted away ...
The brokers will hold no data for those topics, but the offsets will be set at their "high water mark" until new messages are produced.
The topic metadata will still exist, and the offsets always increase, never reset.
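One way to observe this is to ask a consumer for the first and last offsets of a partition; after retention removes data, the beginning offset moves forward instead of resetting to 0. A minimal sketch, with broker address and topic as placeholders:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class OffsetBoundsDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0); // hypothetical
                Map<TopicPartition, Long> first = consumer.beginningOffsets(List.of(tp));
                Map<TopicPartition, Long> last = consumer.endOffsets(List.of(tp));
                // After retention deletes old segments, the beginning offset is the
                // oldest offset still on disk (not 0); the end offset keeps growing.
                System.out.println("first=" + first.get(tp) + " last=" + last.get(tp));
            }
        }
    }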

Max number of messages that can be stored in a Kafka topic partition?

I have a retention policy set for 48 hours, so old logs are eventually flushed. But the topic's offset number keeps growing. When does this number get reset? What happens when the max offset number is reached? Also, new segments are rolled with the base offset as the filename at the time a new segment is created. What will the filenames of the .log and .index files be when this limit is reached?
[Screenshot: the current base offset used as the log segment filename]
The offset is never reset, because the maximum offset value is so big (a signed 64-bit integer, about 9.2 × 10^18) that you won't ever reach it. Even at one million messages per second, a single partition would need roughly 292,000 years to exhaust the offset space. Segment filenames are simply the 20-digit, zero-padded base offset, so they grow along with the offsets.

Kafka Topic Partition

Does a Kafka topic partition's offset always start from 0, or from a random value? And how can I ensure that a consumer record is the first record in the partition? Is there any way to find out? If there is, please let me know. Thanks.
Yes and no.
When you start a new topic, the offsets start at zero. Depending on the Kafka version you are using, the offsets are
logical – incremented message by message (since 0.8.0: https://issues.apache.org/jira/browse/KAFKA-506) – or
physical – i.e., the offset is increased by the number of bytes of each message.
Furthermore, old log entries are cleared under configurable conditions:
retention time: e.g., keep only the messages of the last week
retention size: e.g., use at most 10GB of storage; delete old messages that cannot be stored any more
log compaction (since 0.8.1): preserve only the latest value for each key (see https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction)
Thus, the first offset might not be zero if old messages have been deleted. Furthermore, if you turn on log compaction, some offsets might be missing.
In any case, you can always seek to any offset safely, as Kafka can figure out whether the offset is valid. For an invalid offset, it automatically advances to the next valid offset. Thus, if you seek to offset zero, you will always get the oldest message that is stored.
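To make that concrete, here is a sketch with the modern Java consumer (topic and address are placeholders). Note that in the new consumer an out-of-range position is handled by the auto.offset.reset policy, so it is set to earliest here to land on the oldest retained message:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SeekToZeroDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            // If the seek target is out of range, reset to the oldest valid offset.
            props.put("auto.offset.reset", "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0); // hypothetical
                consumer.assign(List.of(tp));
                consumer.seek(tp, 0L); // offset 0 may already be deleted...
                for (ConsumerRecord<String, String> r :
                         consumer.poll(Duration.ofSeconds(1))) {
                    // ...in which case the fetch snaps to the oldest retained offset.
                    System.out.println("first available offset: " + r.offset());
                    break;
                }
            }
        }
    }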
Yes, offsets in a new Kafka partition start at 0. (In very old versions the offset was a physical byte position, with the next record's offset picking up where the previous record ended; since 0.8.0 offsets are logical and simply increment per message.) Also note that because Kafka is distributed, ordering is guaranteed only within a partition, not across the partitions of a topic.

consumer offset update for batch message need example

I would like my consumer to update ZK about its offsets once it has received messages totaling 10MB in size.
Is there a way to customize my consumer to update the offset after it has received 10MB of messages?
First, set the auto.commit.enable property to false to disable the consumer's auto-commit behavior. Then keep the total size of the messages you've received so far in a variable, and when it reaches 10MB, commit the offsets using the ConsumerConnector interface's commitOffsets method; the offset in ZK will then be updated. After that, reset the size variable to 0.
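ConsumerConnector belongs to the old ZooKeeper-based consumer API. For reference, here is a sketch of the same size-based commit strategy with the modern Java consumer, which commits offsets to Kafka rather than ZK; broker address, group id, and topic are placeholders:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SizeBasedCommitDemo {
        private static final long COMMIT_THRESHOLD = 10L * 1024 * 1024; // 10MB

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "size-based-committer");    // hypothetical
            props.put("enable.auto.commit", "false");         // we commit manually
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic")); // hypothetical topic
                long bytesSinceCommit = 0;
                while (true) {
                    for (ConsumerRecord<byte[], byte[]> r :
                             consumer.poll(Duration.ofSeconds(1))) {
                        bytesSinceCommit += Math.max(0, r.serializedKeySize())
                                          + Math.max(0, r.serializedValueSize());
                        if (bytesSinceCommit >= COMMIT_THRESHOLD) {
                            consumer.commitSync(); // commit positions from the last poll
                            bytesSinceCommit = 0;  // reset the running size
                        }
                    }
                }
            }
        }
    }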