How do I tell a topic on Confluent Cloud to use a specific schema programmatically? - apache-kafka

So I know how to create topics on Confluent Cloud with the confluent_kafka AdminClient instance, but I'm not sure how to set the topic's message schema programmatically. To clarify, I have the schema I want to use saved locally in an Avro schema file (.avsc).

Use the AdminClient to create the topic, then use the SchemaRegistryClient to register the schema under the topic's subject (with the default TopicNameStrategy, the subject for the message value is <topic>-value).
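A minimal sketch with the Python confluent_kafka client, assuming placeholder Confluent Cloud credentials and an assumed topic name (orders) and schema file (order.avsc):

# Assumes confluent-kafka[avro]; credentials, topic and file names are placeholders.
from confluent_kafka.admin import AdminClient, NewTopic
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

topic = "orders"

# 1. Create the topic with the AdminClient.
admin = AdminClient({
    "bootstrap.servers": "<BOOTSTRAP_SERVERS>",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",
    "sasl.password": "<API_SECRET>",
})
for future in admin.create_topics([NewTopic(topic, num_partitions=6, replication_factor=3)]).values():
    future.result()  # raises if topic creation failed

# 2. Register the local .avsc file under the topic's value subject.
sr = SchemaRegistryClient({
    "url": "<SCHEMA_REGISTRY_URL>",
    "basic.auth.user.info": "<SR_API_KEY>:<SR_API_SECRET>",
})
with open("order.avsc") as f:
    schema_str = f.read()
schema_id = sr.register_schema(f"{topic}-value", Schema(schema_str, schema_type="AVRO"))
print(f"Registered schema id {schema_id} for subject {topic}-value")

Note that, by default, the topic itself still just stores bytes; the registered schema is applied by producers and consumers that use the Schema Registry serialisers.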

Related

Integrating Flink Kafka with schema registry

We are using Confluent Platform for our Kafka deployment and a Schema Registry for storing schemas. Is it possible to integrate the Schema Registry with Flink? How do we read data in Avro format from Confluent Platform?
These classes are designed to meet this need:
ConfluentRegistryAvroSerializationSchema
ConfluentRegistryAvroDeserializationSchema
See the linked JavaDoc for more info on the classes.
Each can be provided to the Flink Kafka connector via its respective (de)serialization schema constructor argument.
Flink SQL can also be used.
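As an illustration of the Flink SQL route, a hedged sketch using PyFlink's Table API (the table name, topic, and endpoints are assumptions, and the avro-confluent option keys shown are those of recent Flink releases, so check the docs for your version):

# Sketch only: assumes PyFlink with the Kafka SQL connector and avro-confluent
# format jars on the classpath; option names follow recent Flink releases.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Declare a table over a Kafka topic whose values are Confluent-Avro encoded;
# Flink resolves the writer schemas from the Schema Registry at runtime.
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'broker:9092',
        'properties.group.id' = 'flink-demo',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'avro-confluent',
        'avro-confluent.url' = 'http://schema-registry:8081'
    )
""")

t_env.execute_sql("SELECT * FROM orders").print()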

Sending Avro messages to Kafka

I have an app that produces an array of messages in raw JSON periodically. I was able to convert that to Avro using the avro-tools. I did that because I needed the messages to include the schema, due to the limitations of the Kafka Connect JDBC sink. I can open this file in Notepad++ and see that it includes the schema and a few lines of data.
Now I would like to send this to my central Kafka broker and then use the Kafka Connect JDBC sink to put the data in a database. I am having a hard time understanding how I should be sending these Avro files to my Kafka broker. Do I need a Schema Registry for my purposes? I believe kafkacat does not support Avro, so I suppose I will have to stick with the kafka-producer.sh that comes with the Kafka installation (please correct me if I am wrong).
Question is: can someone please share the steps to produce my Avro file to a Kafka broker without getting Confluent involved?
Thanks,
To use the Kafka Connect JDBC Sink, your data needs an explicit schema. The converter that you specify in your connector configuration determines where the schema is held. This can either be embedded within the JSON message (org.apache.kafka.connect.json.JsonConverter with schemas.enable=true) or held in the Schema Registry (one of io.confluent.connect.avro.AvroConverter, io.confluent.connect.protobuf.ProtobufConverter, or io.confluent.connect.json.JsonSchemaConverter).
To learn more about this see https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained
To write an Avro message to Kafka you should serialise it as Avro and store the schema in the Schema Registry. There is a Go client library you can use, with examples.
without getting Confluent involved
It's not entirely clear what you mean by this. The Kafka Connect JDBC Sink is written by Confluent. The best way to manage schemas is with the Schema Registry. If you don't want to use the Schema Registry then you can embed the schema in your JSON message but it's a suboptimal way of doing things.
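As a concrete illustration of the "serialise it as Avro and store the schema in the Schema Registry" route, a hedged sketch using the Python confluent_kafka client (the topic, schema, and endpoints are assumptions; the Go client follows the same pattern):

# Sketch assuming confluent-kafka[avro]; topic name, schema and URLs are placeholders.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

value_schema = """
{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "id",    "type": "string"},
    {"name": "value", "type": "double"}
  ]
}
"""

sr = SchemaRegistryClient({"url": "http://schema-registry:8081"})

producer = SerializingProducer({
    "bootstrap.servers": "broker:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(sr, value_schema),
})

# The serializer registers the schema under "readings-value" on first use and
# prefixes each message with the schema id, which is the wire format the
# AvroConverter (and therefore the JDBC sink) expects.
producer.produce(topic="readings", key="sensor-1", value={"id": "sensor-1", "value": 42.0})
producer.flush()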

How to replicate schema with Kafka mirror maker?

We are using MirrorMaker to sync on-premises and AWS Kafka topics. How can a topic, with its schema registered on-premises, be replicated exactly the same in the other cluster (AWS in this case)?
How is the Avro schema replicated using MirrorMaker?
MirrorMaker only copies byte arrays, not schemas, and it doesn't care about the format of the data.
As of Confluent 4.x or later, the Schema Registry added the endpoint GET /schemas/ids/(number). So, if your destination consumers are configured to use the original registry, this shouldn't matter, since they can look up the schema by its ID.
You can otherwise mirror the _schemas topic as well, as recommended by Confluent when using Confluent Replicator.
If you absolutely need one-to-one schema copying, you would need to implement a MessageHandler interface and pass it to the MirrorMaker command, to get and post the schema, similar to the internal logic I added to this Kafka Connect plugin (you could use Connect instead of MirrorMaker): https://github.com/OneCricketeer/schema-registry-transfer-smt
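To illustrate that "get and post the schema" logic outside of an SMT, a hedged sketch using confluent_kafka's SchemaRegistryClient (the registry URLs and subject name are assumptions):

# Sketch: copy one subject's latest schema from the on-premises registry to the
# AWS registry. URLs and the subject name are placeholders.
from confluent_kafka.schema_registry import SchemaRegistryClient

source = SchemaRegistryClient({"url": "http://onprem-schema-registry:8081"})
dest = SchemaRegistryClient({"url": "http://aws-schema-registry:8081"})

subject = "my-topic-value"

latest = source.get_latest_version(subject)             # RegisteredSchema from the source
new_id = dest.register_schema(subject, latest.schema)   # post it to the destination
print(f"source id={latest.schema_id} version={latest.version} -> destination id={new_id}")

Note that the IDs are not guaranteed to match between registries, which is exactly why mirroring the _schemas topic (or pointing the destination consumers at the original registry) is usually the safer option.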

How to fetch Kafka source connector schema based on connector name

I am using the Confluent JDBC Kafka connector to publish messages to a topic. The source connector sends data to the topic along with a schema on each poll. I want to retrieve this schema.
Is it possible? How? Can anyone suggest an approach?
My intention is to create a KSQL stream or table based on the schema built by the Kafka connector on each poll.
The best way to do this is to use Avro, in which the schema is stored separately and automatically used by Kafka Connect and KSQL.
You can use Avro by configuring Kafka Connect to use the AvroConverter. In your Kafka Connect worker configuration set:
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
(Update schema-registry to the hostname where your Schema Registry is running.)
From there, in KSQL you can just use:
CREATE STREAM my_stream WITH (KAFKA_TOPIC='source_topic', VALUE_FORMAT='AVRO');
You don't need to specify the schema itself here, because KSQL fetches it from the Schema Registry.
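If you also want to retrieve the registered schema yourself, a hedged sketch with confluent_kafka's SchemaRegistryClient, assuming the default <topic>-value subject naming (the URL and topic name are placeholders):

# Sketch: fetch the schema the connector registered for the topic's value.
from confluent_kafka.schema_registry import SchemaRegistryClient

sr = SchemaRegistryClient({"url": "http://schema-registry:8081"})
registered = sr.get_latest_version("source_topic-value")
print(registered.schema_id, registered.version)
print(registered.schema.schema_str)  # the Avro schema JSON produced by the connector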
You can read more about Converters and serialisers here.
Disclaimer: I work for Confluent, and wrote the referenced blog post.

How does Schema Registry integrate with Kafka Source Connector?

I have added Topic-Key and Topic-Value schemas for a given topic using the REST API. In my custom connector, do I need to create a schema again using SchemaBuilder? How do I access the registered schemas inside my connector?