I am new to Apache Kafka and I'm trying to build a Python app that can handle Kafka messages. I've set Kafka up to produce and consume messages locally. Now I also want this to work non-locally, so that I can send messages from anywhere to my Python app.
My idea was to simply expose the port that Kafka is using via Localtunnel. I thought this would just mirror the local messages, so that I could consume them via the generated URL. But surprise, it doesn't work.
I just don't receive any messages at all. Do you have an idea why this is? Do I maybe have to configure the listeners in the Kafka server.properties first?
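For reference, here is what I'm guessing the listener settings in server.properties might need to look like (the tunnel host and port are placeholders for whatever Localtunnel generates):

    # address the broker binds to locally
    listeners=PLAINTEXT://0.0.0.0:9092
    # address the broker advertises to clients (placeholder tunnel address)
    advertised.listeners=PLAINTEXT://<tunnel-host>:<tunnel-port>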
Thanks!
We are currently on HDF (Hortonworks Dataflow) 3.3.1, which bundles Kafka 2.0.0, and are trying to use Kafka Connect in distributed mode to launch a Google Cloud PubSub Sink connector.
We are planning on sending back some metadata into a Kafka Topic and need to integrate a Kafka producer into the flush() function of the Sink task java code.
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
Also, how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source? I need to use the same Bootstrap server list to start the producer.
Currently I am changing the config for the sink connector, adding the bootstrap server list as a property and parsing it in the Java code for the connector. I would like to use the bootstrap server list from the Kafka Connect worker properties if that is possible.
Kindly help on this.
Thanks in advance.
need to integrate a Kafka producer into the flush() function of the Sink task java code
There is no producer instance exposed in the SinkTask API...
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously, adding more blocking code makes the other processes slower overall.
how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source?
Sinks and sources are not workers. Look at connect-distributed.properties.
I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible
It's not possible. Adding extra properties to the sink/source configs is the only way. (Feel free to open a Kafka JIRA requesting such a feature of exposing the worker configs, though.)
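As a rough sketch of that workaround, the sink connector config could carry an extra property such as the hypothetical metadata.bootstrap.servers below, which the SinkTask would then read in start() and use to build its own producer (the connector name, class, topic, and broker addresses are placeholders):

    {
      "name": "pubsub-sink",
      "config": {
        "connector.class": "<your PubSub sink connector class>",
        "topics": "input-topic",
        "metadata.bootstrap.servers": "broker1:9092,broker2:9092"
      }
    }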
I'm developing a Kafka Sink connector on my own. My deserializer is JSONConverter. However, when someone sends malformed JSON data to my connector's topic, I want to skip that record and send it to a specific topic of my company.
My confusion is: I can't find any API that gives me my Connect worker's bootstrap.servers. (I know it's in Confluent's etc directory, but hard-coding the path to connect-distributed.properties just to read the bootstrap.servers is not a good idea.)
So my question is: is there another way to get the value of bootstrap.servers conveniently in my connector program?
Instead of trying to send the "bad" records from a SinkTask to Kafka yourself, you should use the dead letter queue feature that was added in Kafka Connect 2.0.
You can configure the Connect runtime to automatically dump records that failed to be processed to a configured topic acting as a DLQ.
For more details, see the KIP that added this feature.
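As a rough sketch, the relevant error-handling properties in the sink connector config might look like this (the DLQ topic name and replication factor are placeholders for your environment):

    errors.tolerance=all
    errors.deadletterqueue.topic.name=my-connector-dlq
    errors.deadletterqueue.topic.replication.factor=1
    errors.deadletterqueue.context.headers.enable=true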
Is it possible to connect a CometD client with a Kafka producer? Any suggestions?
Currently I have a CometD client in Python which extracts data in real time from a Salesforce object.
Now I want to push that data to Kafka via a producer. Is it possible to do that? And how?
Solved.
By using https://github.com/dkmadigan/python-bayeux-client to extract the events from Salesforce, I was able to push them into the Kafka broker.
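The forwarding side can be quite small. A minimal sketch, assuming the kafka-python package, a local broker, a placeholder topic name, and a hypothetical on_event callback wired into the CometD client:

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    # JSON-encode each Salesforce event before it is sent to the broker
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def on_event(event):
        # Hypothetical hook: called by the CometD client for every received event
        producer.send("salesforce-events", event)

    # Call producer.flush() before shutting down so buffered events are delivered.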
I'm just starting with Kafka and Kafka Streams applications. I wrote a Kafka Streams app that consumes from one topic, processes these messages, and sends them to another topic.
To the best of my knowledge, the only ways I have found to run the Kafka Streams app I coded are:
Run the Java class from the IDE.
Generate a *.jar file and run it from the command prompt.
I would like to know if there is any way to run Kafka Streams applications automatically on Kafka server startup. For example: copy the *.jar file to some folder of my Kafka installation, and automatically run this streams app when I start my Kafka server.
Your Kafka broker (server) and your Kafka Streams application are independent from one another. You can start them however you manage processes on your server, whether that's something like init.d or systemd, or container-based solutions like Docker or Kubernetes.
In my experience, if your streams application starts well before your broker or ZooKeeper, then it may time out waiting for them to come online. So you may need to configure the streams process to restart in such a situation.
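For instance, a minimal systemd unit for the jar might look roughly like this (the unit description, jar path, and restart delay are placeholders); Restart=on-failure covers the case where the app exits because the broker was not up yet:

    [Unit]
    Description=Example Kafka Streams application
    After=network.target

    [Service]
    ExecStart=/usr/bin/java -jar /opt/my-streams-app/my-streams-app.jar
    Restart=on-failure
    RestartSec=10

    [Install]
    WantedBy=multi-user.target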
I'm new to Kafka and I'm trying to publish data from an external application via HTTP, but I cannot find a way to do this.
I already created a topic in Kafka and tested it by producing and consuming messages, but I don't know how to insert/publish a message via HTTP. I tried to invoke the following URL to retrieve the topics, but it does not return any data: http://servername:2181/topics/
I'm using Cloudera 5.12.1.
You can access your topics, if they were already created, using the client APIs; that's the easy way (see the client list).
Or see the Kafka Connect configuration to manage connectors over REST (the rest.host.name and rest.port parameters). But that only covers connectors...
To consume or produce messages in a topic, use a middleware component; it is more feasible.
Check out the open source Kafka REST Proxy from Confluent. It does exactly what you want.
You can get it standalone, or as part of Confluent Platform.
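For illustration, producing one JSON record through the REST Proxy could look roughly like this (the proxy host, default port 8082, and topic name are placeholders; the content type is the proxy's v2 JSON format):

    import requests

    # POST one JSON record to the topic via the Kafka REST Proxy (v2 API)
    resp = requests.post(
        "http://restproxy-host:8082/topics/test-topic",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        json={"records": [{"value": {"hello": "kafka"}}]},
    )
    print(resp.status_code, resp.json())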