Apache Kafka - Advisory Message

Does Apache Kafka have an "Advisory Message" option, similar to ActiveMQ?
ActiveMQ Advisory Message -> http://activemq.apache.org/advisory-message.html

No, there is no such functionality. Instead of managing the cluster with advisory messages, Kafka relies on ZooKeeper: whenever some action is required (e.g. deleting a topic or performing a rebalance), it creates an appropriate "command" node in ZooKeeper.
Having said this, Kafka exposes a lot of its underpinnings as JMX-accessible statistics.
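For example, here is a minimal sketch of reading one of those statistics remotely over JMX. It assumes the broker was started with JMX enabled (e.g. JMX_PORT=9999); the MBean shown is the standard broker message-rate metric:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class KafkaJmxProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Standard broker metric: incoming message rate across all topics
            ObjectName name = new ObjectName(
                "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
            Object oneMinuteRate = mbsc.getAttribute(name, "OneMinuteRate");
            System.out.println("MessagesInPerSec (1-min rate): " + oneMinuteRate);
        }
    }
}
```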

Related

Triggering a Kubernetes job for a Kafka message

I have a Kubernetes service that only does something when it consumes a message from a Kafka queue. The queue does not receive messages very often, and running the service as a job triggered whenever a message is found would save resources.
I see that Kubernetes has this functionality for AMQP-type message services: https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/
Is there a way to adapt this for Kafka, given that Kafka does not support AMQP? I'd switch to a different messaging system, but I have other services that also read from this queue that require Kafka.
That Kafka consumer Service is all you really need. If you want to save resources, it could be paired with the KEDA autoscaler so that it scales up and down depending on load or consumer-group lag.
Or you can use serverless platforms such as Knative to trigger based on Kafka (or other messaging system) events.
Kafka does not support AMQP
Kafka Connect should be able to bridge AMQP to Kafka. E.g. Apache Camel has connectors for both.
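As a rough sketch of what such a bridge could look like with Camel's Java DSL (the endpoint names and broker address here are placeholders, and the camel-amqp and camel-kafka components must be on the classpath):

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class AmqpToKafkaBridge {
    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Hypothetical endpoints: consume from an AMQP queue and
                // publish each message to a Kafka topic.
                from("amqp:queue:work-queue")
                    .to("kafka:jobs?brokers=localhost:9092");
            }
        });
        context.start();
        Thread.sleep(Long.MAX_VALUE); // keep the bridge running
    }
}
```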

Kafka 2.0 - Kafka Connect Sink - Creating a Kafka Producer

We are currently on HDF (Hortonworks Dataflow) 3.3.1 which bundles Kafka 2.0.0 and are trying to use Kafka Connect in distributed mode to launch a Google Cloud PubSub Sink connector.
We are planning on sending back some metadata into a Kafka Topic and need to integrate a Kafka producer into the flush() function of the Sink task java code.
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
Also, how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source? I need to use the same Bootstrap server list to start the producer.
Currently I am changing the config for the sink connector, adding bootstrap server list as a property and parsing it in the Java code for the connector. I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible.
Kindly help on this.
Thanks in advance.
need to integrate a Kafka producer into the flush() function of the Sink task java code
There is no producer instance exposed in the SinkTask API...
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously, adding more blocking code makes the other processes slower overall.
how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source?
Sinks and sources are not workers; the worker takes its bootstrap servers from its own configuration file. Look at the bootstrap.servers entry in connect-distributed.properties.
I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible
It's not possible. Adding extra properties to the sink/source configs is the only way. (Feel free to open a Kafka JIRA requesting such a feature of exposing the worker configs, though.)
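For reference, a minimal sketch of the workaround described in the question: the task builds its own producer in start() from custom connector-level properties. The property names, topic, and class name here are illustrative, not part of the Connect API:

```java
import java.util.Collection;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class PubSubSinkTask extends SinkTask {
    private KafkaProducer<String, String> metadataProducer;
    private String metadataTopic;

    @Override
    public void start(Map<String, String> props) {
        // Custom connector properties (illustrative names) -- the worker's own
        // bootstrap.servers is not exposed here, so it must be duplicated.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", props.get("metadata.bootstrap.servers"));
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        metadataProducer = new KafkaProducer<>(producerProps);
        metadataTopic = props.get("metadata.topic");
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // ... deliver records to Cloud Pub/Sub here ...
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Extra work before offsets are committed: publish one metadata
        // record per partition being committed.
        offsets.forEach((tp, offset) -> metadataProducer.send(
            new ProducerRecord<>(metadataTopic, tp.toString(),
                Long.toString(offset.offset()))));
        metadataProducer.flush();
    }

    @Override
    public void stop() {
        if (metadataProducer != null) {
            metadataProducer.close();
        }
    }

    @Override
    public String version() {
        return "0.1";
    }
}
```

Note that anything done in flush() runs on the offset-commit path, so the blocking metadataProducer.flush() call above is exactly the kind of overhead the question worries about.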

Is there any way to forward Kafka messages from topic on one server to topic on another server?

I have a scenario where we forward our application logs to a Kafka topic using Fluentd agents.
The Kafka team introduced Kerberos authentication, and our Fluentd version does not support that authentication, so I cannot forward the logs directly.
We have now introduced a new Kafka server without authentication and created a topic there. I want to forward messages from this topic on the new server to a topic on another server using Kafka connectors.
How can I achieve this?
There are several different tools that enable you to stream messages from a Kafka topic on one cluster to a different cluster, including:
MirrorMaker (open source, part of Apache Kafka)
Confluent's Replicator (commercial tool, 30-day free trial)
uReplicator (open sourced from Uber)
Mirus (open sourced from Salesforce)
Brucke (open source)
Disclaimer: I work for Confluent.
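At their core, all of these tools run a consume-and-produce loop between the two clusters. A bare-bones sketch of that loop (broker addresses and topic names are placeholders; the Kerberized destination would additionally need the appropriate security.protocol/sasl.* producer settings):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TopicForwarder {
    public static void main(String[] args) {
        // Source cluster (the new, unauthenticated server)
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "source-broker:9092");
        consumerProps.put("group.id", "topic-forwarder");
        consumerProps.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        // Destination cluster (the Kerberized server would need the
        // appropriate security.* / sasl.* settings added here)
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "dest-broker:9092");
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("source-topic"));
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>(
                        "dest-topic", record.key(), record.value()));
                }
            }
        }
    }
}
```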

Bypass Zookeeper in producer/consumer clients?

This is a follow-up question to an earlier discussion. I think of Zookeeper as a coordinator for instances of the Kafka broker, or "message bus". I understand why we might want producer/consumer clients transacting through Zookeeper -- because Zookeeper has built-in fault-tolerance as to which Kafka broker to transact with. But with the new model -- i.e., 0.10.1+ -- should we always bypass Zookeeper altogether in our producer/consumer clients? Are we giving up any advantages (e.g., better fault-tolerance) by doing that? Or is Zookeeper ultimately still at work behind the scenes?
To add to the answer of Hans Jespersen, recent Kafka producer/consumer clients (0.9+) do not interact with ZooKeeper anymore.
Nowadays ZooKeeper is only used by the Kafka brokers (i.e., the server side of Kafka). This means you can, for example, lock down external access from clients to all ZooKeeper instances for better security.
I understand why we might want producer/consumer clients transacting through Zookeeper -- because Zookeeper has built-in fault-tolerance as to which Kafka broker to transact with.
Producer/consumer clients are not "transacting" through ZooKeeper, see above.
But with the new model -- i.e., 0.10.1+ -- should we always bypass Zookeeper altogether in our producer/consumer clients?
If the motivation behind your question is that you want to implement your own Kafka producer or consumer client, then the answer is: your custom client should not use ZooKeeper any longer. The official Kafka producer/consumer clients (Java/Scala) and e.g. Confluent's C/C++, Python, and Go clients for Kafka demonstrate how scalability, fault-tolerance, etc. can be achieved by leveraging Kafka functionality (rather than having to rely on a separate service such as ZooKeeper).
Are we giving up any advantages (e.g., better fault-tolerance) by doing that? Or is Zookeeper ultimately still at work behind the scenes?
No, we are not giving up any advantages here. Otherwise the Kafka project would not have changed its producer/consumer clients to stop using ZooKeeper and start using Kafka themselves for their inner workings.
ZooKeeper is only still at work behind the scenes for the Kafka brokers, see above.
ZooKeeper is still at work behind the scenes, but the 0.9+ clients don't need to worry about it anymore because consumer offsets are now stored in a Kafka topic rather than in ZooKeeper.
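For illustration, a minimal modern consumer is configured with broker addresses only; no zookeeper.connect property appears anywhere (the addresses, group, and topic below are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NoZkConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Only the brokers are listed -- the old zookeeper.connect setting
        // from the pre-0.9 clients is gone entirely.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            for (ConsumerRecord<String, String> record :
                    consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("offset=%d value=%s%n",
                    record.offset(), record.value());
            }
        }
    }
}
```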

Basic kafka topic availability checker

I need a simple health checker for Apache Kafka. I don't want something large and complex like Yahoo's Kafka Manager; basically I want to check if a topic is healthy or not and if a consumer is healthy.
My first idea was to create a separate heart-beat topic and periodically send and read messages to/from it in order to check availability and latency.
The second idea is to read all the data from Apache ZooKeeper. I can get all brokers, partitions, topics etc. from ZK, but I don't know if ZK can provide something like failure-detection info.
As I said, I need something simple that I can use in my app health checker.
Some existing tools you can try out, if you haven't yet:
Burrow - LinkedIn's Kafka consumer lag checker
Exhibitor - Netflix's ZooKeeper co-process for instance monitoring, backup/recovery, cleanup and visualization
Kafka System Tools - Kafka's command-line tools
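If you would rather roll your own minimal check, here is a sketch using Kafka's AdminClient (available since 0.11) that treats a topic as healthy when it exists and every partition currently has a leader (the broker address and topic name are placeholders):

```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class TopicHealthCheck {
    public static boolean isTopicHealthy(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                .describeTopics(Collections.singletonList(topic))
                .values().get(topic)
                .get(5, TimeUnit.SECONDS);
            // Healthy here means: every partition currently has a leader.
            return description.partitions().stream()
                .allMatch(p -> p.leader() != null && !p.leader().isEmpty());
        } catch (Exception e) {
            return false; // unreachable broker, missing topic, timeout, ...
        }
    }

    public static void main(String[] args) {
        System.out.println(isTopicHealthy("localhost:9092", "heartbeat-topic"));
    }
}
```

This only covers topic availability; for consumer health, checking consumer-group lag (which is what Burrow automates) is the usual approach.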