Topic messages disappear from Confluent topics after refreshing the page - apache-kafka

Messages are disappearing from a topic when I use the Confluent client. The only ones I can see (while not reloading the page) are messages I create using the "Produce" option on the same page. The Kafka configuration seems fine (I think), but I still don't understand what is wrong.

It looks like you are producing and consuming messages through a web browser.
Consumers typically subscribe to a topic and commit the offsets they have consumed. Subsequent polls do not return the older messages (unless you perform a seek operation), only the newly produced ones.
The term "disappearing" may apply in two contexts:
As said above, the consumer has already consumed the message and does not consume it again (because it has already polled it).
Your topic's retention policy could be deleting older messages. You can check this with the built-in tools kafka-console-consumer or kafka-avro-console-consumer and the --from-beginning flag. If the messages are there, the issue is with your consumer.
If you are calling consumer.poll() on every reload, you will only get the messages produced after the previous call to poll (i.e. after the last reload). If you want all the messages present in the topic, whether from the beginning or from some point in time, you need to seek to the beginning or to a specific timestamp or offset. See the seek methods in KafkaConsumer.
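For illustration, here is a minimal sketch of a consumer that rewinds to the beginning of its assigned partitions before polling; the broker address, group id and topic name are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReplayFromBeginning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "replay-demo");             // hypothetical group id
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
                consumer.poll(Duration.ofSeconds(1));            // first poll joins the group
                consumer.seekToBeginning(consumer.assignment()); // rewind all assigned partitions
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }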

Related

How to get only latest message when re-connecting to an existing consumer group in Kafka

In my case, it is entirely possible for a consumer to be offline for a longer period. During that offline period, events are still published to the topic.
When the consumer comes back online, it re-uses its existing consumer group, which has been lagging. Is it possible to skip forward to the latest message only, ignoring all earlier messages? In other words, I want to move the offset to the latest message before consuming.
There is the spring.kafka.consumer.auto-offset-reset property, but as far as I understand, it only applies to new consumer groups, and here I am re-using an existing consumer group when the consumer comes back online. That said, if it were possible to automatically prune a consumer group when its consumer goes offline, this property could work, but I am not sure such functionality exists.
I am working with the Spring Boot Kafka integration.
You can use the consumer's seek method: calculate the last offset, subtract one from it, commit, and start polling.
Otherwise, simply don't use the same group. Generate a unique one and/or disable auto commits; then you are guaranteed to always use the auto.offset.reset config, and lag is meaningless across app restarts.
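As a sketch of the first suggestion (assuming a plain KafkaConsumer rather than the Spring wrapper; the method name is illustrative), you can move every assigned partition to its end offset and commit, so the group's stored position becomes the latest message:

    import java.time.Duration;
    import java.util.Set;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SkipToLatest {
        // Moves an already-subscribed consumer past all backlogged messages.
        public static void skipToLatest(KafkaConsumer<String, String> consumer) {
            consumer.poll(Duration.ofSeconds(1));   // join the group and receive an assignment
            Set<TopicPartition> partitions = consumer.assignment();
            consumer.seekToEnd(partitions);         // request the end of each assigned partition
            for (TopicPartition tp : partitions) {
                consumer.position(tp);              // seekToEnd is lazy; position() resolves it
            }
            consumer.commitSync();                  // persist the new positions for the group
            // subsequent poll() calls return only messages produced after this point
        }
    }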

How to get notified about expired Kafka events

Is there any mechanism to get notified (by a specific logfile entry or similar) when an event in a Kafka topic expires due to retention policies? (I know this should be avoided by design, but still.)
I know about consumer lag monitoring tools for tracking offset discrepancies between a published event and the related consumer groups, but as far as I know they only provide numbers (the offset difference).
Put more simply: how can we find out whether Kafka events were never consumed and have therefore expired?
The log cleaner thread will output deletion events to the broker logs, but they reflect file segments, not particular messages.
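One indirect way to check programmatically (a sketch; the broker address and group id are placeholders) is to compare the group's committed offset with the earliest offset still on the broker. If the committed offset is behind the log start offset, the messages in between were deleted before anyone consumed them:

    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ExpiredUnconsumedCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (AdminClient admin = AdminClient.create(props);
                 KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {

                // committed offsets of the group being monitored (hypothetical group id)
                Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("my-group")
                    .partitionsToOffsetAndMetadata().get();

                // earliest offset the broker still has for each of those partitions
                Map<TopicPartition, Long> earliest = consumer.beginningOffsets(committed.keySet());

                for (Map.Entry<TopicPartition, OffsetAndMetadata> e : committed.entrySet()) {
                    long logStart = earliest.get(e.getKey());
                    if (e.getValue().offset() < logStart) {
                        System.out.printf("%s: %d messages expired unconsumed%n",
                            e.getKey(), logStart - e.getValue().offset());
                    }
                }
            }
        }
    }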

Kafka Consumer Re-reading Messages

I've seen an issue where all the messages in my topic get re-read by my consumer. I only have one consumer, and I turn it on/off while I'm developing/testing. I've noticed that sometimes, after days of not running the consumer, turning it on again suddenly re-reads all my messages.
The client id and group id stay the same throughout. I explicitly call commitSync, since my enable.auto.commit=false. I do set auto.offset.reset=earliest, but to my understanding that should only kick in if the offset is deleted on the server. I'm using IBM Bluemix's MessageHub service, so maybe that's automatically deleting an offset?
Does anyone have any clues/ideas?
Thanks.
Yes, offsets are automatically deleted if you don't commit for 24 hours (the offsets.retention.minutes broker setting).
This is the default setting with Kafka and we've not changed it.
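A common workaround (a sketch; the one-minute schedule is arbitrary) is to re-commit the current offsets periodically even when no new messages arrive, so the retention clock on the committed offsets keeps being reset:

    import java.time.Duration;

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KeepOffsetsAlive {
        // Assumes 'consumer' is already subscribed and enable.auto.commit=false.
        public static void pollLoop(KafkaConsumer<String, String> consumer) {
            long lastCommit = System.currentTimeMillis();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                // ... process records ...
                boolean idleCommitDue = System.currentTimeMillis() - lastCommit > 60_000;
                if (!records.isEmpty() || idleCommitDue) {
                    consumer.commitSync(); // re-commits current positions, refreshing offset retention
                    lastCommit = System.currentTimeMillis();
                }
            }
        }
    }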

What happens when the Kafka log is rolled over? Does a consumer miss the messages in the old log file?

Let's say I've just added a message that caused a log rollover in Kafka, so this message and the one immediately before it are now in the rolled log segment. Do I miss these messages? In other words, if a consumer comes along now, will it not receive them? How does Kafka handle this scenario?
You will still be able to retrieve messages in log segments that have been rolled, and from a consumer's perspective you will not notice any difference. Messages are available for consumption until the retention criteria are reached and old log segments are deleted.

JBoss Messaging: sending one message at a time

We are using JBoss 5.1.0, and we use a topic for storing our messages. Our client makes a durable subscription to get those messages.
Everything is working fine, but there is one issue: we receive data from a TCP client, process it, and put it in the topic at around 10 messages per second, while our client reads one message at a time. There is a huge gap between the two rates, so after some time the JBoss topic holds many messages and crashes with an out-of-memory error.
Is there any workaround for this?
Basically, the producer is producing 10x more messages than the consumer can handle. If this situation is sustained (not just during peaks), this will never work.
If you limit the producer to send only one message per second (which is of course possible; e.g. check out Guava's RateLimiter, sketched after the list below), what will you do with the extra messages on the producer side? If they are not queueing up in the topic, they will queue up on the producer side.
You have a few choices:
somehow tune your consumer to process messages faster, so the topic never fills up
tune the topic to use persistent storage. This is much better: not only does the topic avoid keeping everything in memory, you may also get transactional behaviour (messages are durable)
put a persistent queue in front of the topic and feed the topic one message per second. That queue must be able to hold more messages than the topic currently can
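For completeness, here is a minimal sketch of throttling the producer with Guava's RateLimiter; sendToTopic is a hypothetical stand-in for the actual JMS publish call:

    import com.google.common.util.concurrent.RateLimiter;

    public class ThrottledPublisher {
        // One permit per second; tune this to the consumer's real throughput.
        private final RateLimiter limiter = RateLimiter.create(1.0);

        public void publish(String message) {
            limiter.acquire();      // blocks until a permit is available
            sendToTopic(message);   // hypothetical JMS publish
        }

        private void sendToTopic(String message) {
            // ... actual JMS producer code goes here ...
        }
    }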