Removing one message from a topic in Kafka - apache-kafka

I'm new to Kafka and I have one question. Can I delete only ONE message from a topic if I know the topic, the partition and the offset? And if not, is there any alternative?

It is not possible to remove a single message from a Kafka topic, even if you know its partition and offset.
Keep in mind that Kafka is not a key/value store; a topic is an append-only(!) log that represents a stream of data.
If you are looking for alternatives to removing a single message, you may:
Have your consumer clients ignore that message.
Enable log compaction and send a tombstone message (a record with the same key and a null value). This only works for keyed messages, and it removes all earlier records with that key.
Write a simple job (Kafka Streams) to consume the data, filter out that one message and produce all other messages to a new topic.
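The last two options can be pictured with a small plain-Python model. This is only a sketch and does not use a Kafka client: a partition is modelled as a list, and `Record`, `copy_without` and `compact` are hypothetical names chosen for illustration. It assumes every record has a key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    key: str
    value: Optional[str]  # None models a tombstone


def copy_without(log: List[Record], bad_offset: int) -> List[Record]:
    """Model of the filter job: re-produce every record except one.

    The new topic assigns its own offsets, so they are renumbered here.
    """
    kept = [r for r in log if r.offset != bad_offset]
    return [Record(i, r.key, r.value) for i, r in enumerate(kept)]


def compact(log: List[Record]) -> List[Record]:
    """Model of log compaction: keep only the latest record per key.

    Keys whose latest record is a tombstone disappear entirely.
    Surviving records keep their original offsets.
    """
    latest: Dict[str, Record] = {}
    for r in log:
        latest[r.key] = r
    return [r for r in log if latest[r.key] is r and r.value is not None]


topic = [
    Record(0, "user-1", "a"),
    Record(1, "user-2", "b"),
    Record(2, "user-1", "c"),
]

# Filter job: drop the record at offset 1 while copying to a new topic.
new_topic = copy_without(topic, bad_offset=1)

# Compaction: a tombstone for "user-2" removes that key's records.
compacted = compact(topic + [Record(3, "user-2", None)])
```

In this model, compaction also drops the older "user-1" record at offset 0, which shows why a tombstone is not a way to delete one specific offset.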

Related

Kafka Consumer writing to the same queue from where it is reading

I have a use case where I want to consume from a Kafka topic and, depending on some logic, if I am not able to process the message right now, I want to enqueue the message back to the same topic it was read from.
Something like this:
Topic1 ---> Consumer ---> Can't process now
  ^                              |
  |_________Re-enqueues__________|
Is it possible?
Yes, this is possible.
However, be aware that, depending on your retention settings, the re-ingested message might exist in the topic multiple times. Also, the consumer will keep consuming messages as long as it is running, so it may end up having consumed all valid messages while re-ingesting the remaining ones over and over again.
The typical pattern for messages that should be re-ingested into your pipeline is to send them to a dedicated Kafka topic. Once your consumer has been fixed so that it can process those messages, you can have it read that dedicated topic just once.
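A rough model of that pattern in plain Python, without a Kafka client, might look like the following. The topic names and the `consume_once` helper are hypothetical, and `popleft` merely stands in for advancing the consumer's offset (a real Kafka topic does not drop messages when they are read).

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def consume_once(
    topics: Dict[str, Deque[str]],
    source: str,
    retry: str,
    can_process: Callable[[str], bool],
) -> List[str]:
    """Read `source` to the end once; park unprocessable messages in `retry`."""
    processed = []
    while topics[source]:
        msg = topics[source].popleft()
        if can_process(msg):
            processed.append(msg)
        else:
            topics[retry].append(msg)
    return processed


topics = {
    "topic1": deque(["a", "bad", "b"]),
    "topic1-retry": deque(),
    "topic1-failed": deque(),
}

# First pass: the consumer cannot handle "bad" yet, so it goes to the retry topic.
first = consume_once(topics, "topic1", "topic1-retry", lambda m: m != "bad")

# After the consumer is fixed, read the retry topic just once.
second = consume_once(topics, "topic1-retry", "topic1-failed", lambda m: True)
```

Passing the same name for `source` and `retry` with a message that never becomes processable would loop forever, which is the re-ingestion problem described above.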

How to make fanout in Apache Kafka?

I need to send a message to all consumers, but first determine which of them should receive it. How can I do that with Kafka?
Should I use Kafka Streams to filter the data and then send it to the consumers?
As far as I know, each consumer should be added to its own consumer group, but how do I determine in real time who must receive a message?
Kafka decouples consumer and producer and when you write into a topic, you don't know which consumers might read the data.
Thus, in Kafka you never "send a message to a consumer", you just write the message into a topic and that's it.
Consumers just read from topics.
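To make this concrete, here is a toy model in plain Python (no Kafka client). The `Topic` class, the group names and the `region` field are all made up for illustration. Each consumer group keeps its own read position, so every group sees every message, and deciding what is relevant happens on the consumer side.

```python
from typing import Callable, Dict, List


class Topic:
    """Append-only log; every consumer group tracks its own read position."""

    def __init__(self) -> None:
        self.log: List[dict] = []
        self.positions: Dict[str, int] = {}

    def write(self, message: dict) -> None:
        # The producer only appends; it knows nothing about consumers.
        self.log.append(message)

    def poll(self, group: str) -> List[dict]:
        # Return everything this group has not read yet and advance its position.
        start = self.positions.get(group, 0)
        self.positions[group] = len(self.log)
        return self.log[start:]


def relevant(messages: List[dict], wanted: Callable[[dict], bool]) -> List[dict]:
    """Client-side filtering: each consumer decides what it cares about."""
    return [m for m in messages if wanted(m)]


topic = Topic()
topic.write({"region": "eu", "body": "hello"})
topic.write({"region": "us", "body": "world"})

eu_seen = relevant(topic.poll("eu-service"), lambda m: m["region"] == "eu")
us_seen = relevant(topic.poll("us-service"), lambda m: m["region"] == "us")
audit_seen = topic.poll("audit-service")  # reads everything, no filter
```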

How to make restart-able producer?

The latest version of Kafka supports exactly-once semantics (EOS). To support this, extra records (transaction markers) are written to the log in addition to the messages themselves. This means that if you print the offsets of messages in your consumer, they won't necessarily be sequential. This makes it harder to poll a topic to read the last committed message.
In my case, the consumer printed something like this:
Offset-0 0
Offset-2 1
Offset-4 2
Problem: In order to write a restartable producer, I poll the topic and read the content of the last message. In this case, the last message would be at offset 5, which is not a valid consumer record. Hence, I see errors in my code.
I can use the solution provided at: Getting the last message sent to a kafka topic. The only difference is that instead of using consumer.seek(partition, last_offset-1), I would use consumer.seek(partition, last_offset-2). This can immediately resolve my issue, but it's not an ideal solution.
What would be the most reliable solution to get the last committed message for a consumer written in Java? OR
Is it possible to use a local state store for a partition? OR
What is the recommended way to store the last message so that it survives a producer failure? OR
Are Kafka connectors restartable? Is there any specific API that I can use to make producers restartable?
FYI: I am not looking for a quick fix.
In my case, multiple producers push data to one big topic. Therefore, reading the entire topic would be a nightmare.
The solution that I found is to maintain another topic, e.g. "P1_Track", where the producer can store metadata. Within a single transaction, the producer sends data to both the big topic and P1_Track.
When I restart the producer, it reads P1_Track and figures out where to start from.
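The idea can be sketched with a toy in-memory model in plain Python. It does not use Kafka's transactional producer API: the `Broker` class, the `big_topic` name and the choice to store the source index in P1_Track are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


class Broker:
    """Toy broker: a transaction's writes become visible all together or not at all."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = {"big_topic": [], "P1_Track": []}

    def commit(self, writes: List[Tuple[str, str]]) -> None:
        # Models an atomic commit of one transaction spanning several topics.
        for topic, value in writes:
            self.topics[topic].append(value)


def resume_position(broker: Broker) -> int:
    """Index of the next source item to send, read from the tracking topic."""
    track = broker.topics["P1_Track"]
    return int(track[-1]) + 1 if track else 0


def run_producer(broker: Broker, source: List[str], crash_after: Optional[int] = None) -> None:
    sent = 0
    for i in range(resume_position(broker), len(source)):
        if crash_after is not None and sent == crash_after:
            return  # simulated crash: nothing uncommitted is visible
        # Data record and progress marker are written in the same transaction.
        broker.commit([("big_topic", source[i]), ("P1_Track", str(i))])
        sent += 1


source = ["m0", "m1", "m2", "m3"]
broker = Broker()
run_producer(broker, source, crash_after=2)  # first run dies after two messages
run_producer(broker, source)                 # restart picks up at index 2
```

Because the data and the progress marker are committed together in this model, the restarted producer neither skips nor repeats a message.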
I am thinking about storing the last committed message in a database and using it when the producer process restarts.

Can i consume based on specific condition in Kafka?

I'm writing a message to Kafka and consuming it at the other end.
I do some processing on it and write the result to another Kafka topic.
I want to know which response message belongs to which request.
Currently I have decided to capture the offset on the consumer side, write it into the response, and then read the response payload to match the two.
With this approach we need to read every message. Is there any other way to consume based on a condition in the consumer config?
Consumers can only read the whole topic. You can skip messages via seek(), but there are no conditions that the broker can evaluate to filter messages.
You will need to consume the whole topic and process/filter in the client.
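As an illustration of client-side filtering, the following plain-Python sketch follows the approach described in the question: the processing step copies the request's offset into the response, and the reader scans the response topic for it. There is no Kafka client here; `handle`, `responses_for` and the `request_offset` field are hypothetical names.

```python
from typing import Dict, List, Union

Message = Dict[str, Union[int, str]]


def handle(request_offset: int, payload: str) -> Message:
    """Processing step: carry the request's offset along in the response."""
    return {"request_offset": request_offset, "result": payload.upper()}


def responses_for(response_topic: List[Message], request_offset: int) -> List[Message]:
    """Client-side filter: every message is read, non-matching ones are skipped."""
    return [m for m in response_topic if m["request_offset"] == request_offset]


# The list index stands in for the offset of each request.
request_topic = ["first", "second", "third"]
response_topic = [handle(offset, payload) for offset, payload in enumerate(request_topic)]

matches = responses_for(response_topic, request_offset=1)
```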

Should a Kafka consumer consume from all partitions to get a complete message?

I was reading about Kafka, and I understand that a topic is divided into partitions, each of which is an ordered sequence of messages.
Since topics are partitioned, should the consumer consume from all of the partitions in order to get a complete message? If so, whose responsibility is it to assemble a complete message out of all the pieces?
Each complete message is written, in its entirety, at a single offset in a single partition. Just by reading a single message from any one partition using any of the available consumer APIs, you're assured of getting the whole thing.
See the Kafka site for more info.
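A small plain-Python model of this, with no Kafka client involved: `partition_for` and `produce` are hypothetical helpers, and `zlib.crc32` is only a stand-in hash (Kafka's default partitioner hashes keys with murmur2). The point is that partitioning picks one partition for the whole message and never splits its payload.

```python
import zlib
from typing import List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration; not the hash Kafka actually uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(partitions: List[List[str]], key: str, value: str) -> Tuple[int, int]:
    """Append the whole value to exactly one partition; return (partition, offset)."""
    p = partition_for(key, len(partitions))
    partitions[p].append(value)
    return p, len(partitions[p]) - 1


partitions: List[List[str]] = [[] for _ in range(3)]

p1, off1 = produce(partitions, "order-42", "complete payload A")
p2, off2 = produce(partitions, "order-42", "complete payload B")

# Reading one offset from one partition yields the full message.
message = partitions[p1][off1]
```

In this model, messages with the same key land in the same partition in the order they were produced, and each one can be read back whole from a single (partition, offset) pair.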