How to fetch Kafka source connector schema based on connector name - apache-kafka

I am using the Confluent JDBC Kafka source connector to publish messages to a topic. On each poll, the source connector sends data to the topic along with its schema. I want to retrieve this schema.
Is this possible? If so, how? Can anyone suggest an approach?
My intention is to create a KSQL stream or table based on the schema built by the Kafka connector on each poll.

The best way to do this is to use Avro, in which the schema is stored separately and automatically used by Kafka Connect and KSQL.
You can use Avro by configuring Kafka Connect to use the AvroConverter. In your Kafka Connect worker configuration set:
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
(Update schema-registry to the hostname where your Schema Registry is running.)
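For the message value, which is what KSQL reads with VALUE_FORMAT='AVRO', you will typically also set the value converter. A minimal sketch, assuming the same Schema Registry host:
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081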
From there, in KSQL you just use
CREATE STREAM my_stream WITH (KAFKA_TOPIC='source_topic', VALUE_FORMAT='AVRO');
You don't need to specify the schema itself here, because KSQL fetches it from the Schema Registry.
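Once the stream is created you can check the columns that KSQL derived from the registered schema, for example:
DESCRIBE my_stream;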
You can read more about Converters and serialisers here.
Disclaimer: I work for Confluent, and wrote the referenced blog post.

Related

Stream both schema and data changes from MySQL to MySQL using Kafka Connect

How can we stream schema and data changes, along with some transformations, into another MySQL instance using a Kafka Connect source connector?
Also, is there a way to propagate schema changes if I use the confluent_kafka Python library to consume and transform messages before loading them into the target DB?
You can use Debezium to stream MySQL binlogs into Kafka. Debezium is built on the Kafka Connect framework.
From there, you can use whatever client you want, including Python, to consume and transform the data.
If you want to write to MySQL, you can use the Kafka Connect JDBC sink connector.
Here is an old post on this topic - https://debezium.io/blog/2017/09/25/streaming-to-another-database/
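For illustration, a minimal Debezium MySQL source connector configuration looks roughly like the sketch below; the hostnames, credentials, and database names are placeholders, and some property names vary between Debezium versions, so treat it as a sketch rather than a definitive config:
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=mysql
database.port=3306
database.user=debezium
database.password=dbz
database.server.id=184054
database.server.name=dbserver1
database.include.list=inventory
database.history.kafka.bootstrap.servers=kafka:9092
database.history.kafka.topic=schema-changes.inventory
A JDBC sink connector configuration pointed at the target MySQL instance and at the topics Debezium produces completes the pipeline.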

Sending Avro messages to Kafka

I have an app that periodically produces an array of messages in raw JSON. I was able to convert that to Avro using avro-tools. I did that because I needed the messages to include the schema due to the limitations of the Kafka Connect JDBC sink. I can open this file in Notepad++ and see that it includes the schema and a few lines of data.
Now I would like to send this to my central Kafka broker and then use the Kafka Connect JDBC sink to put the data in a database. I am having a hard time understanding how I should send these Avro files to my Kafka broker. Do I need a Schema Registry for my purposes? I believe kafkacat does not support Avro, so I suppose I will have to stick with the kafka-console-producer.sh that comes with the Kafka installation (please correct me if I am wrong).
Question is: can someone please share the steps to produce my Avro file to a Kafka broker without getting Confluent involved?
Thanks,
To use the Kafka Connect JDBC Sink, your data needs an explicit schema. The converter that you specify in your connector configuration determines where the schema is held. It can either be embedded within the JSON message (org.apache.kafka.connect.json.JsonConverter with schemas.enable=true) or held in the Schema Registry (one of io.confluent.connect.avro.AvroConverter, io.confluent.connect.protobuf.ProtobufConverter, or io.confluent.connect.json.JsonSchemaConverter).
To learn more about this see https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained
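As a sketch, those two options look like this in a sink connector's (or worker's) configuration; the Schema Registry URL is a placeholder for your environment:
# schema embedded in each JSON message
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
# or: schema held in the Schema Registry
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081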
To write an Avro message to Kafka you should serialise it as Avro and store the schema in the Schema Registry. There is a Go client library with examples that you can use.
without getting Confluent involved.
It's not entirely clear what you mean by this. The Kafka Connect JDBC Sink is written by Confluent. The best way to manage schemas is with the Schema Registry. If you don't want to use the Schema Registry then you can embed the schema in your JSON message but it's a suboptimal way of doing things.
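For example, a JSON message with the schema embedded in the JsonConverter envelope looks like this (the field names are made up for illustration):
{
  "schema": {
    "type": "struct",
    "optional": false,
    "fields": [
      { "field": "id",   "type": "int32",  "optional": false },
      { "field": "name", "type": "string", "optional": true }
    ]
  },
  "payload": { "id": 1, "name": "foo" }
}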

Integrating a Kafka Streams application with the Kafka JDBC sink connector

I am trying to use Kafka Streams for some computation and send the result to a topic that is written to a database by the JDBC sink connector. The result needs to be serialized as Avro using the Confluent Schema Registry. Is there a demo or guide that shows how to handle this scenario?
It's not clear what you mean by "integrate"; Kafka Streams is independent of Kafka Connect, although both can be used from ksqlDB.
The existing Kafka Connect examples should be adequate; point the sink connector at the output topic of your Streams application, as sketched below.
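A sketch of such a sink connector configuration, where the topic name, database URL, and credentials are assumptions for your setup:
name=jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=streams-output-topic
connection.url=jdbc:mysql://mysql:3306/mydb
connection.user=user
connection.password=pass
auto.create=true
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081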
As for Kafka Streams, you'd need to use the Confluent Avro Serdes and add the Schema Registry URL to the StreamsConfig.
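A minimal sketch of the relevant Streams properties, assuming the Confluent kafka-streams-avro-serde dependency and a Schema Registry at schema-registry:8081:
application.id=my-streams-app
bootstrap.servers=broker:9092
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=io.confluent.kafka.streams.serdes.avro.GenericAvroSerde
schema.registry.url=http://schema-registry:8081
With these defaults the Streams output topic is written as Avro and the schema is registered automatically, which is what the Avro-configured JDBC sink connector expects.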

How to sink MessagePack-encoded messages into MongoDB from Kafka

I have a Kafka topic where the values are MessagePack-encoded.
Is there any way to sink the records from this topic into MongoDB using the MongoDB Kafka connector, or must the record values simply be stored as JSON?
You will need to find or write your own Kafka Connect Converter, add that package to each Connect worker's classpath, and then set it as your key/value converter. From there, the existing MongoDB sink connector receives the messages deserialized into Struct-and-Schema form and can handle them correctly.
JSON was never a requirement; Avro and Protobuf should work as well.
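As a sketch, once such a converter is on every worker's classpath, the sink connector configuration would point at it like this; com.example.MessagePackConverter is a hypothetical class name (it would implement org.apache.kafka.connect.storage.Converter), and the topic, URI, and database names are placeholders:
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics=msgpack-topic
connection.uri=mongodb://mongo:27017
database=mydb
collection=mycoll
# hypothetical custom converter for MessagePack-encoded values
value.converter=com.example.MessagePackConverter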

Generating timestamp-based documentIds in Kafka Connect

I am sending data from Kafka to Couchbase using the Couchbase Kafka sink connector (https://github.com/apache/kafka & https://github.com/couchbase/kafka-connect-couchbase).
I am using Couchbase v5.1.0 and Kafka 2.12.
I have not enabled any kind of documentId generation in Kafka Connect (in the quickstart-couchbase-sink.properties file), so the connector is using the whole document as the key. I want to generate the key as topic-partition-offset-randomString-timestamp.
How can this be achieved? I found something here - https://docs.confluent.io/current/connect/kafka-connect-elasticsearch/configuration_options.html but I don't see a key.ignore option anywhere in the Kafka or kafka-connect-couchbase code.