I am interested in monitoring consumer behavior. In particular, I would like to know which messages were read by which consumer group, and when. Is there an offset or consumer history that I can access?
If it helps, I use Confluent Cloud for setting up the topics, etc.
If I understand your question correctly, you would like to know when events were processed by your consumer?
In that case, you should just add logging to your consumer code, then use a log-collection tool such as Elasticsearch or Splunk, just as you would for tracking logs/history across any other service.
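For illustration, here is a minimal sketch of what that logging could look like with the plain Java client (topic, group id, and bootstrap servers are placeholders; with Confluent Cloud you would also add your SASL_SSL credentials). It emits one log line per consumed record with its coordinates and the read time, which is exactly what you would ship to Elasticsearch/Splunk:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LoggingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-monitored-group");      // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // For Confluent Cloud you would also set security.protocol, sasl.mechanism and sasl.jaas.config here.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // One log line per consumed message: which message was read, by which group, and when.
                    System.out.printf("group=my-monitored-group topic=%s partition=%d offset=%d readAt=%s%n",
                            record.topic(), record.partition(), record.offset(), Instant.now());
                }
            }
        }
    }
}
```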
Is there any way we can detect a crash or shutdown of a consumer?
I want the Kafka server to publish an event to all Kafka clients (publishers, consumers, ...) when this situation occurs.
Is it possible?
Kafka keeps track of the consumed offsets per consumer group on special internal topics. You could set up a dedicated "monitoring service" that constantly consumes from those internal offsets topics and triggers whatever notification/alerting mechanism you need, so that your publishers and other consumers are notified programmatically. This other SO question has more details about that.
Depending on your use case, lag monitoring is also a really good way to know if your consumers are falling behind and/or have crashed. There are multiple solutions for that out there, or again, you could build your own to customize the alerting/notification behavior.
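As a rough sketch of the "build your own" option (group id, bootstrap servers, and the alerting hook are placeholders, nothing prescribed here), you can compute per-partition lag with the Kafka AdminClient by comparing the group's committed offsets against each partition's end offset:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagMonitor {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the group being watched (placeholder group id).
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("my-group").partitionsToOffsetAndMetadata().get();

            // End (latest) offsets for the same partitions.
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            committed.keySet().forEach(tp -> request.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(request).all().get();

            // Lag per partition = end offset - committed offset.
            committed.forEach((tp, meta) -> {
                long lag = ends.get(tp).offset() - meta.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
                // if (lag > threshold) { notify(...); }  // hypothetical alerting hook
            });
        }
    }
}
```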
Applications that have Kafka consumers may also expose REST services as part of the same deployment.
The problem here is that manipulation of offsets cannot happen while the consumer group is active; it requires the consumer group to be made inactive by stopping the application. This would also mean the REST services would be down for that amount of time.
Please suggest whether there are ways to keep them in the same deployment and still allow offset manipulation without downtime, or whether they should not be bundled together at all. Thanks.
manipulation of offsets cannot happen while the consumer group is active; it requires the consumer group to be made inactive by stopping the application.
Since you tagged the question with spring-kafka, I assume you are using Spring for Apache Kafka.
You can stop the listener container(s), which will close the consumer(s); then restart them.
You can also manipulate the offsets programmatically, while the consumers are running.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
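A minimal sketch, assuming Spring for Apache Kafka 2.3+ (listener id, topic, and target offset are placeholders): you can stop and restart just the listener container through the KafkaListenerEndpointRegistry while the rest of the application keeps running, or seek while the consumer stays live by extending AbstractConsumerSeekAware.

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.AbstractConsumerSeekAware;
import org.springframework.stereotype.Component;

@Component
class OffsetAwareListener extends AbstractConsumerSeekAware {

    private final KafkaListenerEndpointRegistry registry;

    OffsetAwareListener(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    @KafkaListener(id = "myListener", topics = "my-topic") // placeholder id and topic
    void listen(String value) {
        // normal record processing
    }

    // Option 1: stop/start only this listener container; the consumer leaves and rejoins
    // the group, but the application (and its REST endpoints) keeps running.
    void stopListener()  { registry.getListenerContainer("myListener").stop(); }
    void startListener() { registry.getListenerContainer("myListener").start(); }

    // Option 2: seek to an arbitrary offset on all currently assigned partitions
    // while the consumer is running (no stop required).
    void rewindTo(long offset) {
        getSeekCallbacks().forEach((tp, callback) -> callback.seek(tp.topic(), tp.partition(), offset));
    }
}
```

Either way, only the Kafka consumption is affected; the REST services in the same deployment stay up.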
I haven't used Kafka before and wanted to know: if messages are published through Kafka, what are the possible ways to capture that info?
Is receiving that info via Kafka "Consumers" the only way, or can REST APIs also be used here?
While reading up, I did find that Kafka needs ZooKeeper running too.
I don't need to publish info, just process the data received from a Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Is receiving that info via Kafka "Consumers" the only way, or can REST APIs also be used here?
Kafka has its own TCP-based protocol; there is no native HTTP interface (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data. However, plenty of external tooling exists so that, if you prefer, you barely have to write any code to work with that data.
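To make that concrete, here is a bare-bones sketch with the plain Java consumer client (topic, group id, and bootstrap servers are placeholders); this is essentially all the receiving-side code needed to process what a publisher sends:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-processing-group");     // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                // Whatever "processing" means for you goes here; printing stands in for it.
                records.forEach(record -> System.out.println("received: " + record.value()));
            }
        }
    }
}
```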
Is it possible to track who the consumers are in Kafka? Perhaps some way for a consumer to 'register' itself as a consumer to a Kafka topic?
If #1 is possible, then is it also possible for Kafka to track the time when a consumer consumed a message?
It is possible to implement these features in the application itself, but I wonder if Kafka already provides some way to do this. I can't find any documentation on these features so perhaps this is not possible in Kafka, but it would be great to get confirmation. Thank you!
You can track consumer groups, but I do not think you can track individual consumers within a group very easily. For a group, Kafka gives you lag (the difference between committed and end offsets), and to actually get a time you would need to read the records behind that offset difference.
There is no other such "registration process".
What you could do is develop an "interceptor" that is able to track messages and times throughout the system. That is how Confluent Control Center is able to graphically display if/when consumers get messages.
However, that requires additional configuration on all consumers; more specifically, the interceptor has to be on each consumer's classpath and enabled in its settings.
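As a hedged sketch of that idea (the class name and log format here are illustrative, not what Control Center actually ships), a ConsumerInterceptor can record when records reach a consumer; it has to be registered on every consumer via the interceptor.classes property:

```java
import java.time.Instant;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumeTimeInterceptor implements ConsumerInterceptor<String, String> {

    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        // Called just before records are handed to the application: log what arrived and when.
        records.forEach(r -> System.out.printf("consumed %s-%d@%d at %s%n",
                r.topic(), r.partition(), r.offset(), Instant.now()));
        return records;
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Called when offsets are committed; another natural place to record progress.
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}
```

You would enable it per consumer with props.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG, ConsumeTimeInterceptor.class.getName()) and then collect those log lines however you like.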
We want to use Apache Kafka as a live message broker only. A message is distributed and instantly utilized (fire and forget).
Could we theoretically keep no logs and just send and forget?
Which config entries should we change?
log.retention.hours and log.retention.bytes?
That's not how Kafka works. If you don't keep logs, then your messages won't be available to consume. If you set retention to a low value, you lose messages whenever a consumer is offline (crashed, etc.).
Kafka tracks the messages each consumer group has read (via committed offsets), so you don't need to delete anything to prevent re-reading.
If you still don't like this approach, then just use a 'traditional' MQ that does have the semantics you are looking for :)
You might find this whitepaper useful.
Disclaimer: I work for Confluent