How long is a rolled-back message kept in a Kafka topic - apache-kafka

I came across this scenario when implementing a chained transaction manager in our Spring Boot application, which consumes messages from JMS and then publishes them to a Kafka topic.
My testing strategy is explained here:
Unable to synchronise Kafka and MQ transactions using ChainedKafkaTransaction
In short, I threw a RuntimeException on purpose after consuming messages from MQ and writing them to Kafka, just to test the transaction behaviour.
Although the rollback itself worked, I could see the number of uncommitted messages in the Kafka topic growing indefinitely, even though a rollback happened on every processing attempt. Within a few seconds I ended up with hundreds of uncommitted messages in the topic.
Naturally I asked myself: if a message is rolled back, why is it still there taking up storage? I understand that with the isolation level set to read_committed these messages will never be consumed, but the idea of a poison message being rolled back again and again and eating up storage does not sound right to me.
So my question is:
Am I missing something? Is there a configuration for a "time to live" or similar for a message that was rolled back? I tried to read the Kafka docs on this subject but could not find anything. If no such setting exists, what would be good practice for dealing with situations like this and avoiding wasted storage?
Thank you in advance for your inputs.

That's just the way Kafka works.
Publishing a record always takes a slot in the partition log. Whether or not a consumer can see that record depends on whether it is committed or not (assuming the isolation level is read_committed).
Kafka achieves its extraordinary throughput because of its simple log architecture.
Rollback is assumed to be somewhat rare.
If you are getting so many rollbacks then your application architecture is probably at fault.
You should probably shut things down for a while if you keep rolling back.
To specifically answer your question, see the broker setting log.retention.hours.
Uncommitted records are retained like any other record: for a week (168 hours) by default.
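To make the point about slots in the partition log concrete, here is a toy model in Python. It is only a sketch and not the Kafka client API: `PartitionLog`, `append`, `end_transaction`, and `read` are hypothetical names invented for illustration, and the model leaves out details such as the transaction markers that real Kafka also writes to the log. It shows that records from a rolled-back transaction keep their offsets and are merely filtered out for read_committed readers.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    txn_id: str
    value: str


@dataclass
class PartitionLog:
    """Toy append-only log; a simplified model, not Kafka's real storage format."""
    records: list = field(default_factory=list)
    outcomes: dict = field(default_factory=dict)  # txn_id -> "committed" | "aborted"

    def append(self, txn_id, value):
        # Every send takes the next offset, whatever happens to the transaction later.
        record = Record(len(self.records), txn_id, value)
        self.records.append(record)
        return record.offset

    def end_transaction(self, txn_id, committed):
        # Rolling back only records the outcome; nothing is deleted from the log.
        self.outcomes[txn_id] = "committed" if committed else "aborted"

    def read(self, isolation="read_committed"):
        if isolation == "read_uncommitted":
            return [r.value for r in self.records]
        # Simplification: keep only records whose transaction is known to be committed.
        return [r.value for r in self.records
                if self.outcomes.get(r.txn_id) == "committed"]


log = PartitionLog()
for attempt in range(3):
    # The same poison message is retried and rolled back each time.
    log.append(f"txn-{attempt}", "poison")
    log.end_transaction(f"txn-{attempt}", committed=False)
log.append("txn-ok", "good")
log.end_transaction("txn-ok", committed=True)

print(len(log.records))              # 4 slots are used in the log
print(log.read())                    # ['good']
print(log.read("read_uncommitted"))  # ['poison', 'poison', 'poison', 'good']
```

In this model, as in the answer above, the only thing that would ever shrink the log is retention, not the rollback itself.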

Related

kafka Manager not showing consumer lag or Sum of partition offsets

I don't know whether this is a config issue. Kafka Manager can show everything except consumer lag. Can anyone help me?
I'm using Kafka 2.2.0.
I also checked the logs; there are no errors at all.
Also, one important point is that my code is processing messages and inserting them into the database as well.
The reason I posted this question is that I am not able to see whether there is any lag on the topic.
I need that information to decide how many consumers I have to run.

Throttling of messages on consumer side

I am a beginner with Kafka and have developed a consumer for Kafka messages, which looks good right now.
However, a requirement came up while testing the consumer: some throttling of messages may be needed on the consumer side.
The consumer (.NET Core, using the Confluent client), after receiving a message, calls an API, and the API processes the message. As part of this process, it performs a few reads and writes to the database.
The consumer may receive millions, or at least a few thousand, messages daily. This puts load on the DB as part of processing.
So I am thinking of throttling the messages received by the Kafka consumer so that the DB will not be overloaded. I have checked the options for poll, but they don't seem to be all that I want.
For example, within 10 minutes the consumer may receive only 100k messages. Something like that.
Could anybody please suggest how to implement throttling of messages on a Kafka consumer, or is there a better way to handle this?
I investigated more and learned from an expert that "throttling on consumer side is not easy to implement, since kafka consumer is implemented in such way to read and process messages as soon as they are available in kafka topic. So, speed is a benefit in kafka world :)"
It seems I cannot do much on the Kafka consumer side. I am thinking of looking at the other side; maybe separating reads (to a replica) from writes to the database can help.
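For what it's worth, the "N messages per window" rule from the example can be modelled independently of any Kafka client, because a consumer pulls records and can therefore pace how fast it hands them to the API. The following Python sketch is only an illustration under that assumption: `WindowThrottle` and `process_all` are hypothetical names, the list of messages stands in for whatever the consumer loop receives, and the clock and sleep functions are injectable so the demo runs instantly.

```python
import time


class WindowThrottle:
    """Allow at most `limit` messages per `window_seconds` (fixed window)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def delay_before_next(self):
        """Return how many seconds to wait before handling one more message."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window has started: reset the budget.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return 0.0
        # Budget used up: wait until the current window ends.
        return self.window_start + self.window_seconds - now


def process_all(messages, throttle, handle, sleep=time.sleep):
    for message in messages:
        delay = throttle.delay_before_next()
        while delay > 0:
            sleep(delay)
            delay = throttle.delay_before_next()
        handle(message)


# Demo with a fake clock: 2 messages per 600 seconds instead of 100k per 10 minutes.
fake_now = [0.0]
handled = []


def fake_sleep(seconds):
    fake_now[0] += seconds


throttle = WindowThrottle(limit=2, window_seconds=600, clock=lambda: fake_now[0])
process_all(["a", "b", "c"], throttle, handled.append, sleep=fake_sleep)
print(handled, fake_now[0])  # ['a', 'b', 'c'] 600.0
```

One caveat if this idea is carried over to a real consumer: clients generally expect the poll loop to keep being called (see the `max.poll.interval.ms` setting), so very long sleeps between polls may cause the consumer to be treated as failed. Whether this pacing fits the .NET client in the question has not been verified here.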

Exactly-once semantics in spring Kafka

I need to apply transactions in a system that comprises the components below:
A Kafka producer: this is some external application which publishes messages to a Kafka topic.
A Kafka consumer: this is a Spring Boot application where I have configured the Kafka listener; after processing a message, it needs to save it to a NoSQL database.
I have gone through several blogs like this & this, and all of them talk about transactions in the context of a streaming application, where messages are read, processed, and written back to a Kafka topic.
I don't see any clear example or blog about achieving transactionality in a use case similar to mine, i.e. producing-processing-writing to a DB in a single atomic transaction. I believe it to be a very common scenario, and there must be some support for it as well.
Can someone please guide me on how to achieve this? Any relevant code snippet would be greatly appreciated.
Regarding "in a single atomic transaction":
There is no way to do it; Kafka doesn't support XA transactions (nor do most NoSQL DBs). You can use Spring's transaction synchronization for best-effort 1PC.
See the documentation.
Spring for Apache Kafka implements normal Spring transaction synchronization.
It provides "best efforts 1PC" - see Distributed transactions in Spring, with and without XA for more understanding and the limitations.
I'm guessing you're trying to solve the scenario where your consumer goes down after writing to the database but before committing the offsets, or other similar problems. Unfortunately this means you have to build your own fault-tolerance.
In the case of the problem I mentioned above, this means you would have to manage the consumer offsets in your end-output database, updating them in the same database transaction that you're writing the output of your consumer application to.
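The idea of keeping the consumer offsets in the output database can be sketched as follows. This is a minimal illustration, not a Spring Kafka recipe: Python's built-in `sqlite3` stands in for the end-output database (assuming that database can update two records atomically, which not every NoSQL store can), and the table names and the helpers `handle_record` and `resume_offset` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (msg_key TEXT PRIMARY KEY, payload TEXT)")
conn.execute(
    "CREATE TABLE consumer_offsets ("
    "topic TEXT, part INTEGER, next_offset INTEGER, PRIMARY KEY (topic, part))"
)
conn.commit()


def handle_record(conn, topic, partition, offset, key, payload):
    """Write the output and the next offset to read in ONE database transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO results (msg_key, payload) VALUES (?, ?)",
            (key, payload),
        )
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offsets (topic, part, next_offset) "
            "VALUES (?, ?, ?)",
            (topic, partition, offset + 1),
        )


def resume_offset(conn, topic, partition):
    """Offset to seek to after a restart; 0 if nothing was processed yet."""
    row = conn.execute(
        "SELECT next_offset FROM consumer_offsets WHERE topic = ? AND part = ?",
        (topic, partition),
    ).fetchone()
    return row[0] if row else 0


# The record at offset 41 is processed successfully.
handle_record(conn, "orders", 0, 41, "k1", "first")

# Simulate a crash after the output was written but before the offset was stored.
try:
    with conn:
        conn.execute("INSERT INTO results VALUES ('k2', 'second')")
        raise RuntimeError("simulated crash before the offset was stored")
except RuntimeError:
    pass

print(resume_offset(conn, "orders", 0))                            # 42
print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 1
```

Because the half-finished write is rolled back together with its offset, a restarted consumer that seeks to `resume_offset(...)` would reprocess that record instead of losing or duplicating it.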

Using a Kafka consumer in order for a message to be consumed by exactly once semantics

I am new to Kafka and I am seeking guidance on how to use Kafka in order to implement the following message pattern:
First, I want the message to be asynchronous and furthermore it needs to be "consumed" i.e. a single consumer should consume it and other consumers won't be able to consume it thereafter.
A use case of this message pattern is when you have multiple instances of a "delivery service" and you want only one of these instances to consume the message (this assumes one cannot leverage idempotency for some reason).
Can someone please advise how to configure the Kafka Consumer in order to achieve the above?
I think you're essentially looking to use Kafka as a traditional message queue (e.g. RabbitMQ), wherein the message gets removed after consumption. There has been quite a lot of debate on this. As is always the case, there are merits and demerits on both sides of the fence.
The answers on this post are more or less against the idea ...
However...
This article talks about an approach on how you could possibly try and make it work. The messages won't really be deleted but the approach is quite similar. It is a fairly comprehensive post that covers the overhead and the optimisations that you could explore to make it more efficient.
I hope this helps!
Great question, and it's something a lot of us struggle with when deploying and using Kafka. In fact, there were a number of times when a project I was working on tried to use Kafka for the use case you described, with very little success.
In a nutshell, there are a few Message Exchange Patterns that you come across when dealing with messaging:
Request->Reply
Publish/Subscribe
Queuing (which is what you are trying to do)
Without digging too deep into why, Kafka was really built simply for Publish/Subscribe. There are other products that implement the other features separately and one that actually does all three.
So a question I have for you is would you be open to using something other than Kafka for this project?
You may use Spring Kafka to do this. Spring Kafka takes care of a lot of configuration and boilerplate code. Check the example here: https://www.baeldung.com/spring-kafka. This should get you started.
Also, you may need to read up on how Kafka actually works. The messages that you publish to topics in Kafka are natively asynchronous. Your producers don't worry about who consumes them or what happens to the messages once published.
The consumers in your delivery services should then subscribe to the topics. If you want your delivery services to consume a message only once, the consumers for your delivery services should be in the same group (same group id). Kafka takes care of making sure that a message consumed by one of the consumers in a group won't be delivered to the other consumers in that group.
The default message retention period is seven days, which is configurable in Kafka.
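The consumer-group behaviour described in this answer can be pictured with a small Python model. It is a sketch only: `assign_partitions` and `deliver` are hypothetical helpers, the round-robin assignment is a simplification of the assignors Kafka actually uses, and the group and consumer names are made up. It shows that within one group each partition has a single owner, so each message reaches one instance, while a second group still receives everything.

```python
def assign_partitions(partitions, consumers):
    """Round-robin style assignment: each partition gets exactly one owner."""
    members = sorted(consumers)
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}


def deliver(records, groups):
    """records: list of (partition, value); groups: {group_id: [consumer names]}.

    Returns {group_id: {consumer: [values]}}.
    """
    partitions = {p for p, _ in records}
    delivered = {}
    for group_id, consumers in groups.items():
        owners = assign_partitions(partitions, consumers)
        inbox = {c: [] for c in consumers}
        for partition, value in records:
            # Only the owner of the partition within this group sees the record.
            inbox[owners[partition]].append(value)
        delivered[group_id] = inbox
    return delivered


records = [(0, "order-1"), (1, "order-2"), (0, "order-3")]
groups = {"delivery": ["delivery-a", "delivery-b"], "audit": ["audit-a"]}
result = deliver(records, groups)
print(result["delivery"])  # {'delivery-a': ['order-1', 'order-3'], 'delivery-b': ['order-2']}
print(result["audit"])     # {'audit-a': ['order-1', 'order-2', 'order-3']}
```

The model ignores failures. With real consumers, a crash or rebalance before offsets are committed can lead to a message being delivered again, so "only once" here means one owner per partition rather than a strict exactly-once guarantee.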

Jboss Messaging. sending one message per time

We are using JBoss 5.1.0 with a topic for storing our messages. Our client makes a durable subscription to get those messages.
Everything works fine, except for one issue: we get data from a TCP client, process it, and put it on the topic at around 10 messages per second, while our client reads one message at a time. There is a huge gap between the two, and after some time the JBoss topic holds many messages and crashes with an out-of-memory error.
Is there any workaround for this?
Basically the producer is producing 10x more messages than the consumer can handle. If this situation is stable (not only during peaks), this will never work.
If you limit the producer to send only one message per second (which is of course possible, e.g. check out RateLimiter), what will you do with the extra messages on the producer side? If they are not queueing up in the topic, they will queue up on the producer side.
You have a few choices:
somehow tune your consumer to process messages faster, so the topic never fills up
tune the topic to use persistent storage. This is much better. Not only will the topic not store everything in memory, but you might also get transactional behaviour (messages are durable)
put a queue in front of the topic for the messages that you want to send to it, and process one message per second. That queue must be persistent and must be able to hold more messages than the topic currently can
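The arithmetic behind this answer can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not JBoss code: `backlog_over_time` is a hypothetical helper, and the rates of 10 and 1 messages per second are taken from the question and the answer's reading of it.

```python
def backlog_over_time(produce_rate, consume_rate, seconds, send_limit=None):
    """Return (messages waiting on the producer side, messages waiting in the topic)."""
    producer_side = 0
    topic_side = 0
    for _ in range(seconds):
        producer_side += produce_rate
        # Without a limit everything is sent; with a limit the rest stays behind.
        sendable = producer_side if send_limit is None else min(producer_side, send_limit)
        producer_side -= sendable
        topic_side += sendable
        topic_side -= min(topic_side, consume_rate)
    return producer_side, topic_side


print(backlog_over_time(10, 1, 60))                # (0, 540)
print(backlog_over_time(10, 1, 60, send_limit=1))  # (540, 0)
```

Either way, 540 messages are waiting somewhere after one minute. Rate limiting only moves the backlog from the topic to the producer side, which is why the choices above come down to a faster consumer or persistent storage for the backlog.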