Is there any way to detect the crash or shutdown of a consumer?
I want the Kafka server to publish an event to all Kafka clients (publishers, consumers, ...) when that happens.
Is it possible?
Kafka keeps track of the consumed offsets per consumer group in a special internal topic (__consumer_offsets). You could set up a dedicated "monitoring service" that constantly consumes from that internal topic and triggers any notification/alerting mechanisms as needed, so that your publishers and other consumers are notified programmatically. This other SO question has more details about that.
Depending on your use case, lag monitoring is also a really good way to know whether your consumers are falling behind and/or have crashed. There are multiple solutions for that out there, or again, you could build your own to customize the alerting/notification behavior.
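To make the lag idea concrete, here is a minimal sketch of the check such a monitoring service might run. It assumes you have already fetched the committed offsets and log end offsets for a group by some other means; the OffsetSnapshot structure and the function names are hypothetical, not part of any Kafka client.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


@dataclass
class OffsetSnapshot:
    """Offsets for one consumer group at one moment (hypothetical structure)."""
    committed: Dict[Partition, int]  # last committed offset per partition
    log_end: Dict[Partition, int]    # latest offset in the log per partition


def compute_lag(snapshot: OffsetSnapshot) -> Dict[Partition, int]:
    """Lag per partition = log end offset minus committed offset, floored at 0."""
    return {
        tp: max(snapshot.log_end[tp] - snapshot.committed.get(tp, 0), 0)
        for tp in snapshot.log_end
    }


def stalled_partitions(previous: OffsetSnapshot, current: OffsetSnapshot) -> List[Partition]:
    """Partitions that have pending messages but whose committed offset did not advance."""
    lag = compute_lag(current)
    return sorted(
        tp
        for tp, pending in lag.items()
        if pending > 0
        and current.committed.get(tp, 0) <= previous.committed.get(tp, 0)
    )
```

Comparing two snapshots taken some time apart, a partition that shows up in stalled_partitions is a hint (not proof) that its consumer has crashed or hung, and that is the point where you would fire your notification.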
Related
Is there a way to notify a consumer about new events being published to the Kafka topics it has subscribed to while the consumer is not actively listening? I know the question itself seems confusing, but I was wondering whether it is really necessary to have one process running continuously in order to consume messages. I think it would make the consumer process simpler if we knew when a message is available to read.
Consumers read messages by polling the topic, so fundamentally, you must have a process running continuously. If the consumer does not poll within the value of the property max.poll.interval.ms, the consumer will leave the group. A hallmark feature of event-driven architectures is that consumers and producers are decoupled; the consumer does not know whether the producer even exists. Therefore, there is no way to know when a message is available to read without actively polling.
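As a rough illustration of the max.poll.interval.ms rule described above, here is a toy model in plain Python. It is not how Kafka implements the check; PollWatchdog and its methods are hypothetical names, and the clock is injectable only so the behaviour can be tried without waiting.

```python
import time
from typing import Callable, Dict, List


class PollWatchdog:
    """Toy model of the max.poll.interval.ms rule (not Kafka's implementation)."""

    def __init__(self, max_poll_interval_ms: int,
                 clock: Callable[[], float] = time.monotonic):
        self.max_poll_interval_s = max_poll_interval_ms / 1000.0
        self.clock = clock
        self.last_poll: Dict[str, float] = {}  # member id -> time of last poll

    def record_poll(self, member_id: str) -> None:
        """Call this every time the member polls."""
        self.last_poll[member_id] = self.clock()

    def expired_members(self) -> List[str]:
        """Members whose last poll is older than the allowed interval."""
        now = self.clock()
        return sorted(
            member
            for member, polled_at in self.last_poll.items()
            if now - polled_at > self.max_poll_interval_s
        )
```

A member that keeps calling record_poll never shows up in expired_members; one that stops polling does once the interval has passed, which mirrors why the consuming process has to keep running.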
1. Is it possible to track who the consumers are in Kafka? Perhaps some way for a consumer to 'register' itself as a consumer to a Kafka topic?
2. If #1 is possible, then is it also possible for Kafka to track the time when a consumer consumed a message?
It is possible to implement these features in the application itself, but I wonder if Kafka already provides some way to do this. I can't find any documentation on these features so perhaps this is not possible in Kafka, but it would be great to get confirmation. Thank you!
You can track consumer groups, but I do not think you can track consumers within the group very easily. For a group, Kafka gives you lag, which is an offset difference; to get an actual time from that, you would need to read the messages at those offsets and look at their timestamps.
There is no other such "registration process".
What you could do is develop an "interceptor" that is able to track messages and times throughout the system. That is how Confluent Control Center is able to graphically display if/when consumers get messages.
However, that requires additional configuration on all consumers. More specifically, the interceptor has to be on each consumer's classpath and listed in its interceptor.classes setting.
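To show the shape of the interceptor idea, here is a sketch in plain Python. This is not the Java ConsumerInterceptor interface and not Control Center's code; ConsumptionTracker, ConsumptionEvent and on_consume are hypothetical names for a hook you would call yourself for every record your consumer handles.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ConsumptionEvent:
    group_id: str
    client_id: str
    topic: str
    partition: int
    offset: int
    consumed_at: float  # seconds since the epoch


class ConsumptionTracker:
    """Records who consumed which record and when (hypothetical helper)."""

    def __init__(self, group_id: str, client_id: str,
                 clock: Callable[[], float] = time.time):
        self.group_id = group_id
        self.client_id = client_id
        self.clock = clock
        self.events: List[ConsumptionEvent] = []

    def on_consume(self, topic: str, partition: int, offset: int) -> ConsumptionEvent:
        """Call once per consumed record; stores and returns the tracking event."""
        event = ConsumptionEvent(self.group_id, self.client_id,
                                 topic, partition, offset, self.clock())
        self.events.append(event)
        return event

    def last_consumed_at(self, topic: str, partition: int) -> Optional[float]:
        """Time of the most recent consumption on a partition, or None."""
        times = [e.consumed_at for e in self.events
                 if e.topic == topic and e.partition == partition]
        return max(times) if times else None
```

In a real setup you would probably ship these events to a separate monitoring topic or metrics store rather than keep them in memory, which is the part that needs the extra configuration on every consumer.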
We want to use Apache Kafka as a live message broker only. A message is distributed and instantly utilized (fire and forget).
Could we theoretically keep no logs and just send and forget?
Which config entries should we change?
log.retention.hours and log.retention.bytes?
That's not how Kafka works. If you don't keep logs, then your messages won't be available to consume. If you set the retention to a low value, you lose any message that expires while your consumer is offline (crashed, etc.).
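The trade-off above can be put into numbers with a deliberately simplified model. It treats retention as a per-message deadline, whereas Kafka actually deletes whole log segments, so real messages can survive somewhat longer than this suggests. The function name and parameters are hypothetical.

```python
from typing import List


def missed_while_offline(message_times_ms: List[int],
                         offline_from_ms: int,
                         back_online_ms: int,
                         retention_ms: int) -> List[int]:
    """Timestamps of messages produced during the outage that have already
    expired by the time the consumer is back (simplified per-message model)."""
    cutoff = back_online_ms - retention_ms  # anything older than this is gone
    return [
        t for t in message_times_ms
        if offline_from_ms <= t < back_online_ms and t < cutoff
    ]
```

With a 3 second retention and a 10 second outage, most of what was produced during the outage is gone before the consumer returns; with a retention longer than the outage, nothing is lost.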
Kafka tracks the offsets that each consumer group has read up to, so you don't need to think about deleting messages to prevent re-reading.
If you still don't like this approach, then just use a 'traditional' MQ that does have the semantics you are looking for :)
You might find this whitepaper useful.
Disclaimer: I work for Confluent
I have an event stream that I want to publish. It's partitioned into topics, continually updates, will need to scale horizontally (and not having a SPOF is nice), and may require replaying old events in certain circumstances. All of these requirements seem to match Kafka's capabilities.
I want to publish this to the world through a public API that anyone can connect to and get events. Is Kafka a suitable technology for exposing as a public API?
I've read the Documentation page, but not gone any deeper yet. ACLs seem to be sensible.
My concerns
Consumers will be anywhere in the world. I can't see that being a problem given Kafka's architecture. The rate of messages probably won't be more than 10 per second.
Is integration with zookeeper an issue?
Are there any arguments against letting subscriber clients connect that I don't control?
Are there any arguments against letting subscriber clients connect that I don't control?
One of the issues that I would consider is possible group.id collisions.
Let's say that you have one single topic to be used by the world for consuming your messages.
Now, if one of your clients has a multi-node system and wants to avoid reading the same message twice, they would set the same group.id on both nodes, forming a consumer group.
But what if someone else in the world uses the same group.id? Their consumers would join the same group and take over some of the partitions, so the first client would stop receiving part of the messages. There seems to be no security at that level.
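Here is a small simulation of that collision. It uses a simplified round-robin assignment, not Kafka's actual assignor code, and the member names are made up; the point is only that partitions are divided among everyone who shares the group.id.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Spread partitions across group members one at a time (simplified model)."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {member: [] for member in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


# The first client runs two nodes under one group.id and sees every partition.
before = assign_round_robin([0, 1, 2, 3], ["clientA-node1", "clientA-node2"])

# A stranger joins with the same group.id and is handed partition 2.
after = assign_round_robin(
    [0, 1, 2, 3], ["clientA-node1", "clientA-node2", "stranger-node1"]
)
```

After the stranger joins, partition 2 no longer goes to either of the first client's nodes, so from that client's point of view those messages silently disappear.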
I'm using node-amqp. For each queue, there is one sender and one consumer. On the sender side, I need to maintain a list of active consumers. The question is: when a consumer's machine crashes, how do I get notified so that I can delete it from the list on the sender side?
I think you may not be using the MQ concept correctly. The whole point is to decouple the consumers from the producers. On the whole, it is not the job of the producers to know anything about the consumers, except the type of message they will be consuming. So much so that the producer will keep producing if a consumer crashes, and the messages will simply build up in the queue it was reading from.
There is a way to do it by using RabbitMQ's HTTP API (at http://server-name:55672/api/, or port 15672 on RabbitMQ 3.0 and later) to get the list of connections, but it is too heavy-handed for frequent queries. Another way, in theory, is to use alternate exchanges to detect undelivered messages, but I haven't tried this yet.
Also, it may be possible to detect unexpected consumer disconnection by using dead-letter exchanges, as described here: http://www.rabbitmq.com/dlx.html
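For the HTTP API approach mentioned above, here is a minimal Python sketch. Note that it swaps in the queue listing (/api/queues, which reports a consumer count per queue) for the connection listing, since the question has one consumer per queue. The base URL, credentials and function names are assumptions, and polling this often has the cost already pointed out.

```python
import base64
import json
import urllib.request
from typing import Iterable, List, Set


def fetch_queues(base_url: str, user: str, password: str) -> list:
    """GET <base_url>/api/queues from the management plugin and parse the JSON."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def queues_without_consumers(queues: Iterable[dict], tracked: Set[str]) -> List[str]:
    """Tracked queue names that report zero consumers or are missing entirely."""
    consumer_counts = {q["name"]: q.get("consumers", 0) for q in queues}
    return sorted(name for name in tracked if consumer_counts.get(name, 0) == 0)
```

The sender would call fetch_queues on a timer, pass the result to queues_without_consumers together with the queue names it is tracking, and drop whatever comes back from its list of active consumers.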