Kafka: Can we limit the number of messages per key? - apache-kafka

Suppose I want to retain no more than the 100 most recent messages per key in a Kafka topic. Can I achieve this somehow? For example, can I configure the compaction policy to keep the most recent N messages per key (not only one)?

There is no built-in way to do this in Kafka. Kafka doesn't remove records based on per-key recency; compaction keeps only the latest record per key, and log deletion otherwise depends on the age and size of the log segments.
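For reference, compaction itself can only ever keep the single most recent record per key; a rough sketch of enabling it at the topic level (topic name and broker address are placeholders, and older Kafka versions use --zookeeper instead of --bootstrap-server):
# cleanup.policy=compact keeps only the latest record for each key, not the latest N
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config cleanup.policy=compact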

Related

Is it possible in Spring Kafka to send messages that will expire on a per-message (not per-template or higher) basis

I am trying to use Kafka as a request-response system between two clients much like RabbitMQ and I was wondering if it is possible to set the expiration of a message so that after it is posted it will automatically get deleted from the Kafka servers.
I'm trying to do it on a per-message level as well (even per-topic would be okay, but I'd like to use the same template if possible).
I was checking ProducerRecord, but all it had was a timestamp. I also don't see any mention of it in KafkaHeaders.
Kafka records are deleted in segments (a group of messages at a time) based on the overall topic retention.
Spring is just a client; it doesn't control the server-side logic of the log cleaner.
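If per-topic (rather than per-message) expiration is acceptable, the retention can be lowered for that one topic; a rough sketch, where the topic name, broker address and retention value are placeholders (older Kafka versions use --zookeeper instead of --bootstrap-server):
# retention.ms applies to the whole topic, not to individual messages
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name requests --add-config retention.ms=60000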

How can I know that a kafka topic is full?

Let's say I have one Kafka broker configured with one partition:
log.retention.bytes=80000
log.retention.hours=6
What will happen if I try to send a record with the producer API to a broker and the log of the topic gets full before the retention period?
Will my message get dropped?
Or will kafka free some space from the old messages and add mine?
How can I know if a topic is getting full and logs are being deleted before being consumed?
Is there a way to monitor or expose a metric when a topic is getting full?
What will happen if I try to send a record with the producer api to a broker and the log of the topic got full before the retention period? Will my message get dropped? Or will kafka free some space from the old messages and add mine?
The cleanup.policy property from the topic config, which is delete by default, says that "The delete policy will discard old segments when their retention time or size limit has been reached."
So, if you send a record with the producer API and the topic has hit its limit, old segments will be discarded to make room.
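To check which cleanup policy and retention limits a topic is actually using, something along these lines should work (topic name and broker address are placeholders):
# lists per-topic overrides such as cleanup.policy, retention.ms and retention.bytes
bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name my-topic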
How can I know if a topic is getting full and logs are being deleted before being consumed? Is there a way to monitor or expose a metric when a topic is getting full?
You can get the partition sizes using the script below:
bin/kafka-log-dirs.sh --describe --bootstrap-server <host>:<port> --topic-list <topic>
You will need to develop a script that runs the command above to fetch the current size of the topic and sends it periodically to Datadog.
In Datadog, you can create a widget that will trigger an appropriate action (e.g. sending email alerts) once the size reaches a particular threshold.
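As a rough sketch of the size-fetching step, assuming kafka-log-dirs.sh prints its result as a single JSON line after a couple of informational lines and that jq is installed (broker address and topic name are placeholders):
# sum the reported size of every partition of the topic, in bytes
bin/kafka-log-dirs.sh --describe --bootstrap-server localhost:9092 --topic-list my-topic \
  | grep '^{' \
  | jq '[.brokers[].logDirs[].partitions[].size] | add'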
It's not exactly true: a topic is never full, at least by default.
I say "by default" because, as @Mukesh said, the cleanup.policy will discard old segments when their retention time or size limit is reached, but by default there is no size limit, only a time limit; the property that controls the size limit is retention.bytes, which is set to -1 (unlimited) by default.
So out of the box only a time limit applies to messages. Note that retention.bytes is set per partition, so to put a limit on a whole topic you have to multiply it by the number of partitions in that topic.
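For example (hypothetical numbers), to cap a 3-partition topic at roughly 3 GB in total you would set the per-partition limit to about 1 GB (older Kafka versions use --zookeeper instead of --bootstrap-server):
# retention.bytes is enforced per partition: 3 partitions x 1073741824 bytes ≈ 3 GB for the topic
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config retention.bytes=1073741824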
EDIT:
There are tons of metrics that Kafka exports (over JMX), and among those you can find metrics about segments (total number, per-topic counts, size, rate of segment rolls, etc.).
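For example, the per-partition log size is exposed under an MBean along these lines (name pattern quoted from memory, so verify it against your broker's JMX tree):
kafka.log:type=Log,name=Size,topic=<topic>,partition=<partition>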

Implementation of queues using kafka server

I want to implement a queue mechanism using Kafka, but I could not find anywhere whether it is possible to just peek at data in the queue created for a topic without moving forward through it.
I want to read data from the queue and, based on different conditions, either remove the existing message or add another message to this queue. Also, is it possible to use a single Kafka server from different machines?
I referred to tutorialspoint for learning more about it.
Thanks in advance. Any leads would be appreciated.
Keep in mind that Kafka scales with multiple partitions per topic, and it doesn't give any ordering guarantee between partitions. So don't use Kafka if you want strict ordering. Within a consumer group, if you want n consumers per topic, you need to have at least n partitions.
Consumers don't remove messages; they commit the offset of a message. The default configuration in most clients is to auto-commit the offset on read. You can re-insert messages into the topic at any time, but you cannot skip a message and expect to process it later.
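The committed offsets, and the lag between them and the end of the log, can be inspected with the consumer-groups tool, e.g. (group name and broker address are placeholders):
# CURRENT-OFFSET is the committed position; LAG is how far behind the log end the group is
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group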
You can connect as many machines as you want to a Kafka server. Typically, you have multiple servers forming a Kafka cluster, with replication for fault tolerance.

Can I delete a Kafka Partition version 0.10.0.1

We have a requirement that demands deleting/purging data for any given partition within a topic. I am using Kafka 0.10.0.1. Is there any way I can delete the entire partition content on demand? If yes, then how? I see that we can use log compaction to post a null message for a key and delete it, but other than that, is there any way to achieve deletion?
Kafka does not currently support reducing the number of partitions of a topic, so there is no out-of-the-box tool for removing a partition directly.

Partitions and Replications for the Apache Kafka

I have read the entire documentation from the suggested website http://kafka.apache.org/ and was not able to understand the hardware requirements.
1) I need clarification on how many partitions and replicas are required to collect a minimum of 50 GB of data per day for a single topic.
2) It is stated that the 0000000000000.log file can store up to 100 GB of data. Is it possible to reduce this log file size to reduce I/O usage?
If the data is ingested uniformly over the entire day, that means you need to ingest something like 600 KB per second; it all depends on how many messages are in those 600 KB (according to Jay Kreps' explanation here, you need to budget something like 22 bytes of overhead per message). Keep in mind that the way you ack the messages from the producer is also very important.
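As a rough back-of-the-envelope check: 50 GB per day is about 50,000,000,000 bytes / 86,400 seconds ≈ 580 KB per second, which is where the ~600 KB/s figure comes from.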
But you should be able to get this throughput from a producer with 1 topic and 1 partition.
1. Check this link, it has the answer for choosing the number of partitions:
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
2. Yes, it is possible to change the maximum size of the log segment file in Kafka. You have to set the property mentioned below on each of the brokers and then restart the brokers.
log.segment.bytes=1073741824
The line above will set the log segment size to 1 GB.
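If restarting the brokers is not desirable, there is also a per-topic override; a rough sketch, where the topic name and broker address are placeholders (older Kafka versions use --zookeeper instead of --bootstrap-server):
# segment.bytes is the per-topic counterpart of the broker-level log.segment.bytes; 104857600 bytes = 100 MB
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config segment.bytes=104857600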