Generating timestamp-based documentIds in Kafka Connect - apache-kafka

I am sending data from Kafka to Couchbase using the Couchbase Kafka sink connector (https://github.com/apache/kafka & https://github.com/couchbase/kafka-connect-couchbase).
I am using Couchbase v5.1.0 and Kafka 2.12.
I have not enabled any kind of documentId generation in Kafka Connect (in the file quickstart-couchbase-sink.properties), so the connector is using the whole document as the key. I want to generate the key as topic-partition-offset-randomString-timestamp.
How can this be achieved? I found something similar here - https://docs.confluent.io/current/connect/kafka-connect-elasticsearch/configuration_options.html - but I don't see a key.ignore option anywhere in the Kafka or kafka-connect-couchbase code.
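One approach worth sketching, assuming your version of kafka-connect-couchbase supports a document-ID format property (named couchbase.document.id in recent versions - check your version's documentation): use Kafka's built-in InsertField SMT to copy each record's Kafka coordinates into the document body, then reference those fields from the document-ID template. The meta_* field names below are illustrative, and the randomString part is not covered by any built-in transform, so that piece would need a small custom SMT.

# Sink connector properties (sketch): copy Kafka coordinates into the value
transforms=addMeta
transforms.addMeta.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addMeta.topic.field=meta_topic
transforms.addMeta.partition.field=meta_partition
transforms.addMeta.offset.field=meta_offset
transforms.addMeta.timestamp.field=meta_timestamp

# Build the document ID from those fields (placeholder syntax depends on the
# connector version - recent versions use JSON-pointer style ${/field} references)
couchbase.document.id=${/meta_topic}-${/meta_partition}-${/meta_offset}-${/meta_timestamp}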

Related

Stream both schema and data changes from MySQL to MySQL using Kafka Connect

How can we stream schema and data changes, along with some transformations, into another MySQL instance using a Kafka Connect source connector?
Is there also a way to propagate schema changes if I use Kafka's Python library (confluent_kafka) to consume and transform messages before loading them into the target DB?
You can use Debezium to stream MySQL binlogs into Kafka. Debezium is built upon the Kafka Connect framework.
From there, you can use whatever client you want, including Python, to consume and transform the data.
If you want to write to MySQL, you can use the Kafka Connect JDBC sink connector.
Here is an old post on this topic - https://debezium.io/blog/2017/09/25/streaming-to-another-database/
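A minimal sketch of what the two connector configs might look like, assuming the Debezium MySQL source connector and the Confluent JDBC sink connector; hostnames, credentials, database and topic names are placeholders, and exact property names vary between Debezium versions:

# Debezium MySQL source connector (sketch)
name=mysql-source
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=mysql-source-host
database.port=3306
database.user=debezium
database.password=secret
database.server.id=184054
database.server.name=dbserver1
database.include.list=inventory
database.history.kafka.bootstrap.servers=kafka:9092
database.history.kafka.topic=schema-changes.inventory

# JDBC sink connector (sketch) - unwraps Debezium's change-event envelope first
name=mysql-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:mysql://mysql-target-host:3306/inventory
topics=dbserver1.inventory.customers
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
insert.mode=upsert
pk.mode=record_key
auto.create=true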

How to make a Data Pipeline from MQTT to KAFKA Broker to MongoDB?

How can I build a data pipeline? I am sending data from MQTT to a Kafka topic using a source connector, and on the other side I have connected the Kafka broker to MongoDB using a sink connector. I am having trouble making a data pipeline that goes from MQTT to Kafka and then to MongoDB. Both connectors work properly individually. How can I integrate them?
Here is my MQTT connector and my MongoDB connector:
[Screenshots: Node 1 MQTT connector config, message published from MQTT, Kafka consumer output, Node 2 MongoDB connector config, MongoDB output]
It is hard to tell what exactly the problem is without more logs. Please provide your Connect config as well, and check the /status endpoint of your connector. I still don't understand exactly what issue you are facing: you are saying that the MQTT source connector sends messages successfully to the Kafka topic, and your MongoDB sink connector successfully reads that Kafka topic and writes to your MongoDB, hence your pipeline. Where is the error? Is your Kafka the same Kafka, or two separate Kafka clusters? Both seem to be localhost, but is it the same machine?
Please elaborate and explain what you are expecting. What does "pipeline" mean in your words?
You need both connectors to share the same Kafka cluster. What do node1 and node2 mean, are they separate Kafka instances? Your connectors need to connect to the same Kafka node/cluster in order to share the data inside the Kafka topic, one for input and one for output. Share your bootstrap server parameters, and share the Kafka server.properties as well.
In order to run two different Connect clusters against the same Kafka cluster, you need to set different internal topics for each Connect cluster (see the sketch after this list):
config.storage.topic
offset.storage.topic
status.storage.topic
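For example, two distributed worker configs pointing at the same Kafka cluster might look like this (a sketch; the group IDs and topic names are illustrative):

# connect-distributed-node1.properties (MQTT source side)
bootstrap.servers=kafka:9092
group.id=connect-cluster-mqtt
config.storage.topic=connect-mqtt-configs
offset.storage.topic=connect-mqtt-offsets
status.storage.topic=connect-mqtt-status

# connect-distributed-node2.properties (MongoDB sink side)
bootstrap.servers=kafka:9092
group.id=connect-cluster-mongo
config.storage.topic=connect-mongo-configs
offset.storage.topic=connect-mongo-offsets
status.storage.topic=connect-mongo-status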

Kafka Streams application integrate with Kafka JDBC sink connector

I am trying to use Kafka Streams for some computation and send the result of the computation to a topic that is sinked to a database by the JDBC sink connector. The result needs to be serialized as Avro using the Confluent Schema Registry. Is there any demo or guide showing how to handle this scenario?
It's not clear what you mean by "integrate"; Kafka Streams is independent of Kafka Connect, though both can be used from ksqlDB.
The existing Kafka Connect examples should be adequate; just point the sink connector at the output topic of your Streams application.
As for Kafka Streams, you'd need to use the Confluent Avro Serdes and add the Schema Registry URL to the StreamsConfig.
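As a sketch, assuming the Confluent kafka-streams-avro-serde artifact is on the classpath (the application ID, broker, and Schema Registry URL are placeholders), the relevant Streams configuration would be roughly:

application.id=my-streams-app
bootstrap.servers=kafka:9092
schema.registry.url=http://schema-registry:8081
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=io.confluent.kafka.streams.serdes.avro.GenericAvroSerde

The sink connector then simply reads the Streams output topic with the AvroConverter, as described in the other answers here.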

How to sink MessagePack-encoded messages into MongoDB from Kafka

I have a Kafka topic where the values are MessagePack-encoded.
Is there any way to sink the records from this topic into MongoDB using the MongoDB Kafka connector, or must the record values simply be stored as JSON?
You will need to find or create your own Kafka Connect converter, add that package to each Connect worker's classpath, and then set it as your key/value converter. From there, the existing MongoDB sink connector can deserialize the messages into Struct-and-Schema form and handle them correctly.
JSON was never a requirement; Avro and Protobuf should work as well.
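As a sketch of that wiring in the sink connector (or worker) configuration - the converter class here is hypothetical, since no MessagePack converter ships with Kafka Connect, and its JAR must be on every worker's classpath:

# Hypothetical custom converter for MessagePack-encoded values
value.converter=com.example.connect.msgpack.MessagePackConverter
# Keys can use a stock converter if they are not MessagePack-encoded
key.converter=org.apache.kafka.connect.storage.StringConverter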

How to fetch Kafka source connector schema based on connector name

I am using the Confluent JDBC Kafka connector to publish messages into a topic. The source connector sends data to the topic along with the schema on each poll. I want to retrieve this schema.
Is that possible? How? Can anyone suggest an approach?
My intention is to create a KSQL stream or table based on the schema built by the Kafka connector on each poll.
The best way to do this is to use Avro, in which the schema is stored separately and automatically used by Kafka Connect and KSQL.
You can use Avro by configuring Kafka Connect to use the AvroConverter. In your Kafka Connect worker configuration set:
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
(Update schema-registry to the hostname where your Schema Registry is running.)
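Since KSQL's VALUE_FORMAT applies to the message value, you will most likely want the value converter configured the same way:

value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081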
From there, in KSQL you just use
CREATE STREAM my_stream WITH (KAFKA_TOPIC='source_topic', VALUE_FORMAT='AVRO');
You don't need to specify the schema itself here, because KSQL fetches it from the Schema Registry.
You can read more about Converters and serialisers here.
Disclaimer: I work for Confluent, and wrote the referenced blog post.