Kafka Connect writes data to non-existing topic - apache-kafka

Does Kafka Connect create the topic on the fly if it doesn't exist (but is provided as a destination), or does it fail to copy messages to it?
I need to create such topics on the fly, or at least programmatically (via the Java API), not manually using scripts.
I searched for this, but it seems the topics have to already exist before the migration.

Kafka Connect doesn't really control this.
There's a setting in Kafka that enables/disables automatic topic creation.
If this is turned on, Kafka Connect will create its own topics; if not, you have to create them yourself.

Kafka will not create a new topic when a client produces to or subscribes to a non-existing topic unless auto.create.topics.enable=true is set in your Kafka broker configuration file; that setting enables auto-creation of topics on the server.
Once you turn this feature on, Kafka will create topics on the fly: when an application tries to connect to a non-existing topic, Kafka creates that topic automatically.
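If auto-creation has to stay disabled, topics can still be created programmatically (as the question asks) with Kafka's AdminClient API. A minimal sketch, assuming a broker at localhost:9092 and a placeholder topic name, partition count and replication factor:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicCreator {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address; use your own bootstrap servers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name, partition count and replication factor are examples only.
            NewTopic topic = new NewTopic("my-destination-topic", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}

Running something like this before starting the connector avoids relying on broker-side auto-creation.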

Related

MongoDB Atlas Source Connector Single Topic

I am using the Confluent MongoDB Atlas Source Connector to pull data from a MongoDB collection into Kafka. I have noticed that the connector is creating multiple topics in the Kafka cluster. I need the data to be available on one topic so that the consumer application can consume it from that topic. How can I do this?
Also, why is the Kafka connector creating so many topics? Isn't it difficult for consumer applications to retrieve the data with that approach?
Kafka Connect creates three internal topics for the whole cluster to manage its own workload. You should never need (or want) external consumers to read these.
In addition to that, connectors can create their own topics. Debezium, for example, creates a "database history" topic, and again, this shouldn't be read outside of the Connect framework.
Most connectors only need to create one topic for the source to pull data into, and that is the topic consumers should actually care about.
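For reference, those three internal topics are whatever the worker configuration names them. The names below are only illustrative (they are the example values shipped in connect-distributed.properties):

config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

Your consumer applications should only subscribe to the data topic(s) the source connector writes to, never to these.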

Creating a new Kafka topic using NiFi

I'm trying to put some data into a Kafka topic that hasn't been created yet, using the PublishKafka_2.0 processor in NiFi.
I don't have direct access to the Kafka server - only via the NiFi flow - and I need to create 3 new topics for the data.
How can this be done using NiFi?
Thank you!
You would need to enable automatic topic creation on the Kafka brokers themselves. NiFi doesn't have any control over Kafka; it just supports consuming and producing. From the sound of it, you may have a setup where automatic topic creation is disabled, so you'll need to have someone create the topics for you.
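If an admin has access to the Kafka CLI tools, each topic can be created ahead of time with something along these lines (topic name, partition and replica counts are placeholders; older Kafka releases take --zookeeper instead of --bootstrap-server):

kafka-topics.sh --bootstrap-server broker1:9092 --create --topic my-nifi-topic --partitions 3 --replication-factor 3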

Kafka internal topic : Where are the internal topics created - source or target broker?

We are doing a stateful operation. Our cluster is managed, so every time internal topics need to be created we have to ask the admin team to unlock topic creation so that the Kafka Streams app can create them. We have control over the target cluster, not the source cluster.
So I wanted to understand: on which cluster - source or target - are the internal topics created?
AFAIK, there is only one cluster that the Kafka Streams app connects to, and all topics - source, target and internal - are created there.
As of now, Kafka Streams applications can only connect to a single cluster, as defined by BOOTSTRAP_SERVERS_CONFIG in the Streams configuration.
As also answered above, all source topics reside on those brokers, and all internal topics (changelog/repartition topics) are created in the same cluster. The KStream app will create the target topic in the same cluster as well.
It will be worth looking into the server logs to understand and analyze the actual root cause.
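To make the single-cluster point concrete, here is a minimal sketch of a Streams configuration; the broker address and application.id are placeholders. The internal changelog/repartition topics end up on that same cluster and are prefixed with the application.id:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class SingleClusterStreamsConfig {
    public static Properties build() {
        Properties props = new Properties();
        // All topics -- source, target and internal -- live on this one cluster.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "target-broker:9092"); // placeholder address
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");        // placeholder id
        // Internal topics are created on that same cluster, named like
        // <application.id>-<storeName>-changelog and <application.id>-...-repartition.
        return props;
    }
}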
As the other answers suggest, there is only one cluster that the Kafka Streams application connects to. Internal topics are created by the Kafka Streams application and are only used by the application that created them. However, there could be security-related configuration on the broker side preventing the streaming application from creating these topics:
If security is enabled on the Kafka brokers, you must grant the underlying clients admin permissions so that they can create internal topics. For more information, see Streams Security.
Quoted from here
Another point to keep in mind is that the internal topics are created automatically by the Streams application; there is no explicit configuration for auto-creation of internal topics.

Kafka 2.0 - Kafka Connect Sink - Creating a Kafka Producer

We are currently on HDF (Hortonworks Dataflow) 3.3.1 which bundles Kafka 2.0.0 and are trying to use Kafka Connect in distributed mode to launch a Google Cloud PubSub Sink connector.
We are planning on sending back some metadata into a Kafka Topic and need to integrate a Kafka producer into the flush() function of the Sink task java code.
Would this have a negative impact on the process where Kafka Connect commits offsets back to Kafka (as we would be adding the overhead of running a Kafka producer before the flush)?
Also, how does Kafka Connect get the bootstrap servers list from the configuration when it is not specified in the connector properties for either the sink or the source? I need to use the same bootstrap server list to start the producer.
Currently I am changing the config for the sink connector, adding the bootstrap server list as a property and parsing it in the Java code for the connector. I would like to use the bootstrap server list from the Kafka Connect worker properties if that is possible.
Kindly help on this.
Thanks in advance.
need to integrate a Kafka producer into the flush() function of the Sink task java code
There is no producer instance exposed in the SinkTask API...
Would this have a negative impact on the process where Kafka Connect commits back the offsets to Kafka (as we would be adding a overhead of running a Kafka producer before the flush).
I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously, adding more blocking code makes the other processes slower overall.
how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source?
Sinks and sources are not workers. Look at connect-distributed.properties.
I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible
It's not possible. Adding extra properties to the sink/source configs is the only way. (Feel free to open a Kafka JIRA requesting such a feature for exposing the worker configs, though.)
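To illustrate the "extra property" approach the question already uses, here is a rough sketch of a sink task that reads a custom connector config key in start() and builds its own producer for the metadata topic. The property name metadata.bootstrap.servers and the topic name connector-metadata are made up for the example:

import java.util.Collection;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class PubSubSinkTaskWithMetadata extends SinkTask {
    private KafkaProducer<String, String> metadataProducer;

    @Override
    public void start(Map<String, String> props) {
        // "metadata.bootstrap.servers" is a hypothetical custom property added to the
        // connector config, since the worker's bootstrap servers are not exposed here.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", props.get("metadata.bootstrap.servers"));
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        metadataProducer = new KafkaProducer<>(producerProps);
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // ... push records to Pub/Sub here ...
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        // Example metadata write; topic name is a placeholder. This call blocks the task
        // briefly, which is exactly the overhead the question asks about.
        metadataProducer.send(new ProducerRecord<>("connector-metadata", "flushed", "ok"));
        metadataProducer.flush();
    }

    @Override
    public void stop() {
        if (metadataProducer != null) {
            metadataProducer.close();
        }
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}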

Kafka Connect - Delete Connector with configs?

I know how to delete a Kafka connector, as mentioned here: Kafka Connect - How to delete a connector
But I am not sure whether it also deletes/erases that connector's configs, offsets and status from the *.storage.topic topics for that worker.
For example:
Let's say I delete a connector with the connector name "connector-abc-1.0.0", and the Kafka Connect worker was started with the following config:
offset.storage.topic=<topic.name>.internal.offsets
config.storage.topic=<topic.name>.internal.configs
status.storage.topic=<topic.name>.internal.status
Now, after the DELETE call for that connector, will it erase all records for that specific connector from the above internal topics?
So that I can create a new connector with the "same name" on the same worker but a different config (different offset.start or connector.class)?
When you delete a connector, the offsets are retained in the offsets topic.
If you recreate the connector with the same name, it will re-use the offsets from the previous execution (even if the connector was deleted in between).
Since Kafka is append-only, the only way the messages in those Connect topics would be removed is if a record were published with the connector name as the message key and null as the value.
You could inspect those topics with the console consumer (including --property print.key=true) to see what data is in them, and keep the consumer running while you delete a connector.
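For example, using the topic names from the question (the broker address is a placeholder, and the same command works against the offsets and status topics too):

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic.name>.internal.configs --from-beginning --property print.key=true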
You can PUT a new config at /connectors/{name}/config, but any specific offsets that are used are dependent upon the actual connector type (sink / source); for example, there is the internal Kafka __consumer_offsets topic, used by Sink connectors, as well as the offset.storage.topic, optionally used by source connectors.
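For illustration, such a PUT against the Connect REST API might look like this; the connector class and settings shown are placeholders, and the worker's REST port is assumed to be the default 8083:

curl -X PUT -H "Content-Type: application/json" \
  --data '{"connector.class": "com.example.MySinkConnector", "topics": "my-topic", "tasks.max": "1"}' \
  http://localhost:8083/connectors/connector-abc-1.0.0/config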
"same name" on same worker but different config(different offset.start or connector.class)?
I'm not sure changing connector.class would be a good idea with the above in mind since it'd change the connector behavior completely. offset.start isn't a property I'm aware of, so you'll need to see the documentation of that specific connector class to know what it does.