Is there a way to collect Kafka Producer configs from the Kafka cluster?
I know that these settings are stored on the client itself.
I am interested in at least the client.id and the set of topics that the producer is publishing to.
There is no such tool provided by Apache Kafka (or Confluent) to acquire this information.
I worked on a team that built a tool called the Stream Registry, which did provide a centralized location for this information.
Maybe you can have a look into kafkacat (see its GitHub page).
We find it very helpful in troubleshooting Kafka issues.
What are the different ways to get Kafka Cluster Audit log to GCP Logging?
Can anyone share more information on how I can achieve it?
Thank you!
Assuming you have access to the necessary topic (from what I understand the Audit topic is not stored on your own cluster), to get data out of Kafka, you need a consumer. This could be in any language.
To get data into Cloud Logging, you need to use its API.
That being said, you could use any pair of Kafka and Cloud Logging clients that you are comfortable with.
For example, you could write or find a Kafka Connect Sink connector that wraps the Java Cloud Logging client.
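For instance, here is a minimal sketch of that consume-and-forward loop in Java, assuming the google-cloud-logging client library; the broker address, topic name (audit-topic) and log name (kafka-audit) are hypothetical placeholders:

    import com.google.cloud.MonitoredResource;
    import com.google.cloud.logging.LogEntry;
    import com.google.cloud.logging.Logging;
    import com.google.cloud.logging.LoggingOptions;
    import com.google.cloud.logging.Payload.StringPayload;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class AuditTopicToCloudLogging {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");   // assumption: your cluster address
            props.put("group.id", "audit-to-cloud-logging");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            // Cloud Logging client picks up credentials from the environment (GOOGLE_APPLICATION_CREDENTIALS)
            Logging logging = LoggingOptions.getDefaultInstance().getService();

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("audit-topic"));  // hypothetical topic name
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        LogEntry entry = LogEntry.newBuilder(StringPayload.of(record.value()))
                                .setLogName("kafka-audit")                     // hypothetical log name
                                .setResource(MonitoredResource.newBuilder("global").build())
                                .build();
                        logging.write(Collections.singleton(entry));           // push the entry to Cloud Logging
                    }
                }
            }
        }
    }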
I am trying to plot an overall topology for my Kafka cluster (i.e., producers-->topics-->consumers).
For the mapping from topics to consumers, I'm able to obtain it using the kafka-consumer-groups.sh script.
However, for the mapping from producers to topics, I understand there is no equivalent script in vanilla Kafka.
Question:
Does the Schema Registry allow us to associate metadata with producers and/or topics or otherwise create a mapping of all producers producing to a particular topic?
Schema Registry has no such functionality.
The closest I've seen to something like this is using distributed tracing (the Brave library) or Cloudera's SMM tool, which requires instrumented, authorized Kafka clients so it can trace requests and map producer client.id values to topics, and then consumer instances to groups.
There's also the Stream Registry project, whose initial version I helped with, with the vision of managing client state/discovery, but I think the project took a different direction and its documentation is not maintained.
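For the consumer side of the mapping asked about above, here is a rough sketch of deriving a topic-to-consumer-group view programmatically with the Java AdminClient (essentially what kafka-consumer-groups.sh reports); the broker address is an assumption:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ConsumerGroupListing;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collection;
    import java.util.Map;
    import java.util.Properties;

    public class TopicToConsumerGroups {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // assumption
            try (AdminClient admin = AdminClient.create(props)) {
                Collection<ConsumerGroupListing> groups = admin.listConsumerGroups().all().get();
                for (ConsumerGroupListing group : groups) {
                    // The committed offsets tell us which topics this group consumes
                    Map<TopicPartition, OffsetAndMetadata> offsets =
                            admin.listConsumerGroupOffsets(group.groupId())
                                 .partitionsToOffsetAndMetadata().get();
                    offsets.keySet().stream()
                           .map(TopicPartition::topic)
                           .distinct()
                           .forEach(topic -> System.out.println(topic + " <- " + group.groupId()));
                }
            }
        }
    }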
We are doing a stateful operation. Our cluster is managed. Every time internal topics need to be created, we have to ask the admins to unlock things so that the Kafka Streams app can create them. We have control over the target cluster but not the source cluster.
So, I wanted to understand on which cluster - source or target - the internal topics are created.
AFAIK, there is only one cluster that the Kafka Streams app connects to, and all topics (source, target, and internal) are created there.
As of now, a Kafka Streams application can connect to only one cluster, as defined by BOOTSTRAP_SERVERS_CONFIG in the Streams configuration.
As answered above, all source topics reside on those brokers, and all internal topics (changelog/repartition topics) are created in the same cluster. The KStream app will produce to the target topic in the same cluster as well.
It will be worth looking into the server logs to understand and analyze the actual root cause.
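To illustrate the single-cluster point, here is a minimal Kafka Streams sketch; the application id, broker address, and topic names are hypothetical. The stateful count() is backed by an internal changelog topic that is created on the same brokers listed in BOOTSTRAP_SERVERS_CONFIG and prefixed with the application.id:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    import java.util.Properties;

    public class SingleClusterStreamsApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The ONLY cluster the app talks to: source, target, and internal topics all live here
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // assumption
            // Internal topics are named <application.id>-...-changelog / -repartition
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stateful-app");  // hypothetical id
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> source = builder.stream("source-topic");    // hypothetical source topic
            source.groupByKey()
                  .count()            // stateful: backed by an internal changelog topic on the same cluster
                  .toStream()
                  .to("target-topic", Produced.with(Serdes.String(), Serdes.Long())); // hypothetical target topic

            new KafkaStreams(builder.build(), props).start();
        }
    }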
As the other answers suggest, there should be only one cluster that the Kafka Streams application connects to. Internal topics are created by the Kafka Streams application and are only used by the application that created them. However, there could be security-related configuration on the broker side that is preventing the streaming application from creating these topics:
If security is enabled on the Kafka brokers, you must grant the underlying clients admin permissions so that they can create internal topics set. For more information, see Streams Security.
Quoted from here
Another point to keep in mind is that the internal topics are created automatically by the Streams application; there is no explicit configuration for auto-creation of internal topics.
We are currently on HDF (Hortonworks Dataflow) 3.3.1 which bundles Kafka 2.0.0 and are trying to use Kafka Connect in distributed mode to launch a Google Cloud PubSub Sink connector.
We are planning on sending back some metadata into a Kafka Topic and need to integrate a Kafka producer into the flush() function of the Sink task java code.
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
Also, how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source? I need to use the same Bootstrap server list to start the producer.
Currently I am changing the config for the sink connector, adding bootstrap server list as a property and parsing it in the Java code for the connector. I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible.
Kindly help on this.
Thanks in advance.
need to integrate a Kafka producer into the flush() function of the Sink task java code
There is no producer instance exposed in the SinkTask API...
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously, adding more blocking code makes the other processes slower overall.
how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source?
Sinks and sources are not workers. Look at connect-distributed.properties.
I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible
It's not possible. Adding extra properties to the sink/source configs is the only way. (Feel free to open a Kafka JIRA requesting such a feature of exposing the worker configs, though.)
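As a rough sketch of that workaround (this is not part of the Connect API; the metadata.bootstrap.servers and metadata.topic property names are hypothetical extras you would add to the connector config yourself), you could create your own producer in the task:

    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    import java.util.Collection;
    import java.util.Map;
    import java.util.Properties;

    public class MetadataPublishingSinkTask extends SinkTask {

        private KafkaProducer<String, String> metadataProducer;
        private String metadataTopic;

        @Override
        public void start(Map<String, String> props) {
            // Both properties must be added to the connector config by hand;
            // the worker's own bootstrap.servers is not visible here.
            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                    props.get("metadata.bootstrap.servers"));               // hypothetical property
            producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            metadataProducer = new KafkaProducer<>(producerProps);
            metadataTopic = props.get("metadata.topic");                    // hypothetical property
        }

        @Override
        public void put(Collection<SinkRecord> records) {
            // ... forward records to Cloud Pub/Sub here ...
        }

        @Override
        public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
            // Publishing metadata here blocks the offset commit until the send completes
            metadataProducer.send(new ProducerRecord<>(metadataTopic, "flushed offsets: " + currentOffsets));
            metadataProducer.flush();
        }

        @Override
        public void stop() {
            if (metadataProducer != null) {
                metadataProducer.close();
            }
        }

        @Override
        public String version() {
            return "0.1";
        }
    }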
My Kafka consumers commit their offsets to Kafka (instead of ZooKeeper), so I cannot use Kafka Manager.
Burrow is great, however, I cannot use Go in our production environment. :(
So I'm wondering: are there any Apache Kafka consumer lag checkers besides the above two? I Googled it but didn't find much useful information. Thanks in advance!
You could use Remora: https://github.com/zalando-incubator/remora. It's an application that can be deployed alongside your Kafka cluster.
Not exactly the same, but it can be used to monitor lag:
https://github.com/quantifind/KafkaOffsetMonitor
There is also the records-lag-max JMX metric, available on every Kafka consumer instance.
So you can monitor this one either directly from your application by accessing the MBean server, or remotely.
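For example, here is a small sketch of reading that metric in-process from the consumer's own metrics map (metric group/name as exposed by recent Java clients; adapt if your client version differs):

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    import java.util.Map;

    public class ConsumerLagCheck {

        // Returns the consumer's records-lag-max, or NaN if the metric has not been recorded yet
        static double maxRecordsLag(KafkaConsumer<?, ?> consumer) {
            for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
                MetricName name = entry.getKey();
                if ("records-lag-max".equals(name.name())
                        && "consumer-fetch-manager-metrics".equals(name.group())) {
                    Object value = entry.getValue().metricValue();
                    return value instanceof Double ? (Double) value : Double.NaN;
                }
            }
            return Double.NaN;
        }
    }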