Kafka Manager not showing consumer lag or Sum of partition offsets

I don't know if it is a configuration issue or not. Kafka Manager can show everything except consumer lag. Can anyone help me?
I'm using Kafka 2.2.0.
I also checked the logs; there are no errors at all.
Another important point: my code base is processing messages and inserting them into the database correctly.
The reason I posted this question is that I cannot tell whether there is any lag on the topic,
which I need to know so that I can decide how many consumers I have to run.
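While Kafka Manager is misbehaving, lag can also be checked with the stock CLI tool that ships with Kafka 2.2.0. The broker address and group name below are placeholders:

```sh
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group
```

The LAG column in the output is the per-partition difference between the log-end offset and the group's last committed offset, which is the same number Kafka Manager would display.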

Related

How long is a rolled-back message kept in a Kafka topic

I came across this scenario when implementing a chained transaction manager inside our Spring Boot application, which consumes messages from JMS and then publishes them to a Kafka topic.
My testing strategy was explained here:
Unable to synchronise Kafka and MQ transactions using ChainedKafkaTransaction
In short, I threw a RuntimeException on purpose after consuming messages from MQ and writing them to Kafka, just to test the transaction behaviour.
However, although the rollback functionality worked fine, I could see the number of uncommitted messages in the Kafka topic growing forever, even though a rollback happened on every processing attempt. Within a few seconds I ended up with hundreds of uncommitted messages in the topic.
Naturally I asked myself: if a message is rolled back, why is it still there taking up storage? I understand that with transaction isolation set to read_committed it will never be consumed, but the idea of a poison message being rolled back again and again while eating up your storage does not sound right to me.
So my question is:
Am I missing something? Is there a configuration for a "time to live" or similar for a message that was rolled back? I tried to read the Kafka docs on this subject but could not find anything. If such a setting does not exist, what would be good practice for dealing with situations like this and avoiding wasted storage?
Thank you in advance for your input.
That's just the way Kafka works.
Publishing a record always takes a slot in the partition log. Whether or not a consumer can see that record depends on whether it is committed or not (assuming the isolation level is read_committed).
Kafka achieves its extraordinary throughput because of its simple log architecture.
Rollback is assumed to be somewhat rare.
If you are getting so many rollbacks then your application architecture is probably at fault.
You should probably shut things down for a while if you keep rolling back.
To answer your question specifically, see log.retention.hours.
The uncommitted records are kept for a week by default, just like committed ones.
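For reference, the relevant broker settings are the standard log-retention ones; aborted (rolled-back) records are removed by the same mechanism as committed ones, since retention operates on whole log segments rather than individual messages. Illustrative server.properties values:

```properties
# Broker-side retention settings (illustrative values)
log.retention.hours=168      # default: keep log segments for 7 days
# log.retention.ms=604800000 # millisecond variant; takes precedence if set
# log.retention.bytes=-1     # size-based retention; -1 disables it (default)
```

Lowering these trims rolled-back records sooner, but it shortens retention for committed records too; there is no per-message or per-transaction time-to-live.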

Throttling of messages on consumer side

I am at a beginner level with Kafka and have developed a consumer for Kafka messages, which looks good right now.
However, while testing the consumer a requirement came up: some throttling of messages may be needed on the consumer side.
The consumer (.NET Core, using Confluent's client) calls an API after receiving messages, and the API processes each message. As part of this processing it performs a number of reads and writes against a database.
The scenario: the consumer may receive millions, or at least a few thousand, messages daily. This puts load on the DB side during processing.
So I am thinking of putting some throttling on message consumption in the Kafka consumer so that the DB will not be overloaded. I have looked at the poll options, but they do not seem to be quite what I want.
For example: within 10 minutes, the consumer can receive only 100k messages. Something like that.
Could anybody please suggest how to implement throttling of messages on a Kafka consumer, or is there a better way to handle this?
I investigated further and learned from an expert that "throttling on the consumer side is not easy to implement, since the Kafka consumer is designed to read and process messages as soon as they are available in the topic. Speed is a benefit in the Kafka world :)"
It seems I cannot do much on the Kafka consumer side. I am thinking of looking at the other side instead; perhaps separating reads (to a replica) from writes to the database can help.
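Although consumer-side throttling is not built in, one common workaround is to rate-limit the poll loop yourself and use the client's pause()/resume() calls, so the consumer keeps polling (and stays in its group) without fetching more records; the Confluent .NET client exposes analogous Pause/Resume methods. Below is a minimal token-bucket sketch in Java; the class names and numbers are illustrative, not part of any Kafka API:

```java
// Sketch: a token-bucket rate limiter that a consumer's poll loop could use.
// Nothing below comes from the Kafka client API; it is plain Java.
public class ThrottleSketch {

    /** Refills ratePerSecond tokens per second, up to capacity. */
    static class TokenBucket {
        private final long capacity;
        private final double refillPerNano;
        private double tokens;
        private long last;

        TokenBucket(long capacity, double ratePerSecond) {
            this.capacity = capacity;
            this.refillPerNano = ratePerSecond / 1_000_000_000.0;
            this.tokens = capacity;
            this.last = System.nanoTime();
        }

        synchronized boolean tryAcquire(int n) {
            long now = System.nanoTime();
            tokens = Math.min(capacity, tokens + (now - last) * refillPerNano);
            last = now;
            if (tokens >= n) {
                tokens -= n;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // ~167 messages/sec sustains roughly 100k messages per 10 minutes.
        TokenBucket bucket = new TokenBucket(1000, 167);

        // In a real consumer loop (pseudocode in comments):
        //   var records = consumer.poll(timeout);
        //   if (!bucket.tryAcquire(records.count())) {
        //       consumer.pause(consumer.assignment());  // stop fetching, keep polling
        //   } else {
        //       consumer.resume(consumer.paused());     // resume once tokens refill
        //       process(records);
        //   }
        System.out.println("first batch allowed: " + bucket.tryAcquire(1000));
        System.out.println("second batch allowed: " + bucket.tryAcquire(1000));
    }
}
```

Separately, the max.poll.records consumer setting caps how many records a single poll() returns, which bounds batch size but not the overall rate.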

The following subscribed topics are not assigned to any members

I'm still quite new to Kafka, and something happened that I don't understand. I have two apps. One uses Spring Kafka's listeners to consume binary Avro messages and process them. For debugging purposes I wrote a trivial second CLI app, which takes the topic, offset, brokers, etc. and consumes using plain Kafka classes, without Spring Kafka. Both were working for a while. Recently something went wrong with the server.
The main app now does not consume any messages. During startup it sometimes(!) prints a message like: "The following subscribed topics are not assigned to any members ...". When I enable debug logging, I see it spinning on "Leader for partition ... unavailable for fetching offset, wait for metadata refresh".
OK, I can understand from that that "something went wrong" with the cluster. What puzzles me a lot is that our CLI app has no issues whatsoever connecting to the cluster and prints all messages from the beginning of the topic. I verified multiple times that both apps use the same brokers and the same topic, with different group IDs. Neither app specifies a partition ID anywhere.
What could cause one app to have no problem reading from the topic, while the other is unable to "connect" and consumes neither pre-existing nor new messages?

Is it possible to track the consumers in Kafka and the timestamp in which a consumer consumed a message?

Is it possible to track who the consumers are in Kafka? Perhaps some way for a consumer to 'register' itself as a consumer to a Kafka topic?
If #1 is possible, then is it also possible for Kafka to track the time when a consumer consumed a message?
It is possible to implement these features in the application itself, but I wonder if Kafka already provides some way to do this. I can't find any documentation on these features so perhaps this is not possible in Kafka, but it would be great to get confirmation. Thank you!
You can track consumer groups, but I do not think you can easily track individual consumers within a group. Within a group, Kafka gives you lag, and from the lag you would need to read that offset difference to actually get a time.
There is no other such "registration process".
What you could do is develop an "interceptor" that is able to track messages and times throughout the system. That is how Confluent Control Center is able to display graphically if/when consumers get messages.
However, that requires additional configuration on all consumers; more specifically, the interceptor has to be on the classpath.
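For completeness, the interceptor mechanism mentioned above is the ConsumerInterceptor interface from the Java client, enabled via the interceptor.classes consumer property. The sketch below is not self-contained (it needs the kafka-clients dependency on the classpath) and merely logs a consume timestamp per record; the class name is made up:

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Hypothetical interceptor: records when this consumer saw each message.
public class ConsumeTimeInterceptor implements ConsumerInterceptor<String, String> {
    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        long now = System.currentTimeMillis();
        for (ConsumerRecord<String, String> r : records) {
            System.out.printf("consumed %s-%d@%d at %d%n",
                    r.topic(), r.partition(), r.offset(), now);
        }
        return records; // must return the (possibly modified) records
    }
    @Override public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) { }
    @Override public void close() { }
    @Override public void configure(Map<String, ?> configs) { }
}
```

It is wired in with a consumer property such as interceptor.classes=com.example.ConsumeTimeInterceptor (package name illustrative).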

Kafka Consumer Re-reading Messages

I've seen an issue where all the messages in my topic get re-read by my consumer. I only have one consumer, and I turn it on and off while developing/testing. I have noticed that sometimes, after days of not running the consumer, when I turn it on again it suddenly re-reads all my messages.
The client.id and group.id stay the same throughout. I explicitly call commitSync, since enable.auto.commit=false. I do set auto.offset.reset=earliest, but to my understanding that should only kick in if the offset has been deleted on the server. I'm using IBM Bluemix's MessageHub service, so maybe that is automatically deleting the offset?
Does anyone have any clues/ideas?
Thanks.
Yes, offsets are automatically deleted if you don't commit for 24 hours.
This is the default setting in Kafka, and we have not changed it.
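The broker setting behind this behaviour is offsets.retention.minutes: committed offsets are dropped after the retention window, and then auto.offset.reset=earliest kicks in on the next start. Older Kafka releases (and hosted services of that era) defaulted to 1440 minutes, i.e. 24 hours; Kafka 2.0+ defaults to 7 days. On a self-managed broker it can be raised; the value below is illustrative:

```properties
# server.properties (broker side)
# How long committed consumer offsets are kept once the group stops committing.
offsets.retention.minutes=10080   # 7 days
```

Note this is a broker-side setting, so on a hosted service like MessageHub it cannot be changed from the client.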