Validating AVRO schema on Confluent server - apache-kafka

If we enable the property confluent.value.schema.validation on Confluent server, how is the actual validation performed? Does the broker deserialize the message and check its format? Or does it validate only that the message has the correct id of the schema?

It would need to deserialize the data, at least partially, to actually get the ID, so yes, it does both.
Try testing by forging an Avro Kafka record with an existing ID but an invalid payload for the schema of that ID.
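One way to build such a forged record value is sketched below. It assumes the Confluent wire format (one magic byte of 0, then a 4-byte big-endian schema ID, then the payload). The ID 42 and the helper name forge_record are hypothetical, so substitute an ID that actually exists in your registry. Sending the bytes (for example with a producer configured with a byte-array serializer) is left out; the broker's reaction is what the test is meant to reveal.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Confluent wire format


def forge_record(schema_id: int, payload: bytes) -> bytes:
    """Prefix arbitrary payload bytes with a Confluent wire-format header."""
    # ">bI": big-endian, 1-byte magic marker, 4-byte unsigned schema ID
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


# Hypothetical ID 42 is assumed to already exist in the registry;
# the payload is deliberately not valid Avro for that schema.
forged = forge_record(42, b"not-avro-at-all")
```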

Related

How does a kafka connect connector know which schema to use?

Let's say I have a bunch of different topics, each with their own JSON schema. In Schema Registry, I indicated which schemas exist within the different topics, without directly referring to which topic a schema applies to. Then, in my sink connector, I only refer to the endpoint (URL) of the Schema Registry. So to my knowledge, I never indicated which registered schema a Kafka connector (e.g., JDBC sink) should use in order to deserialize a message from a certain topic?
Asking here as I can't seem to find anything online.
I am trying to decrease my kafka message size by removing overhead of having to specify the schema in each message, and using schema registry instead. However, I cannot seem to understand how this could work.
Your producer serializes the schema ID directly into the bytes of each record. Connect (or consumers with the JSON deserializer) reads that ID from each record and uses it to fetch the matching schema from the registry.
https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#wire-format
If you're trying to decrease message size, don't use JSON; use a binary format instead and enable topic compression such as ZSTD.
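To make the wire format concrete, here is a minimal sketch of what a registry-aware deserializer does with the first five bytes of a record value. The FAKE_REGISTRY dict, the schema ID 7, and the split_wire_format helper are stand-ins for illustration only; a real client would fetch the schema by ID from the registry over HTTP and cache it.

```python
import json
import struct

# Stand-in for Schema Registry: schema ID -> schema (hypothetical contents).
FAKE_REGISTRY = {7: {"title": "Order", "type": "object"}}


def split_wire_format(record: bytes):
    """Return (schema_id, payload) from a Confluent-framed record value."""
    if len(record) < 5:
        raise ValueError("record too short for wire-format header")
    # 1-byte magic marker followed by a 4-byte big-endian schema ID
    magic, schema_id = struct.unpack(">bI", record[:5])
    if magic != 0:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, record[5:]


# Build a framed value the way a registry-aware serializer would.
value = struct.pack(">bI", 0, 7) + json.dumps({"order_id": 1}).encode()

schema_id, payload = split_wire_format(value)
schema = FAKE_REGISTRY[schema_id]  # lookup by ID; no topic name is involved
```

Because the ID travels with every record, the connector never needs to be told which schema belongs to which topic.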

Is it possible to consume data using the JDBC connector with Schema Registry if a Java producer is producing data without a schema?

My requirement is that when a producer is producing data without a schema, I need to register a new schema in Schema Registry so that the JDBC connector can consume the data.
I have found this, but is there any other solution?
Schema Registry is not a requirement to use JDBC Connector, but JDBC Sink connector does require a schema in the record payload, as the linked answer says.
The source connector can read data and generate records without a schema, but this has no interaction with any external producer client.
If you have producers that generate records without any schema, then it's unclear what schema you would be registering anywhere. But you can try to use a ProducerInterceptor to intercept and inspect those records to do whatever you need to.
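As an illustration of "a schema in the record payload", the sketch below wraps a plain dict in the schema/payload envelope that Kafka Connect's JsonConverter reads when value.converter.schemas.enable=true. This assumes the sink is configured with that converter; the helper name with_connect_schema, the struct name "user", and the fields are hypothetical.

```python
import json


def with_connect_schema(name, fields, payload):
    """Wrap a plain dict in the schema/payload envelope read by JsonConverter."""
    schema = {
        "type": "struct",
        "name": name,
        "optional": False,
        "fields": [
            {"field": field, "type": type_, "optional": optional}
            for field, type_, optional in fields
        ],
    }
    return json.dumps({"schema": schema, "payload": payload})


# Hypothetical record: each field is (name, Connect type, optional?).
message = with_connect_schema(
    "user",
    [("id", "int32", False), ("name", "string", True)],
    {"id": 1, "name": "alice"},
)
```

The resulting string is what the producer would send as the record value. It repeats the schema in every message, which is the overhead that Schema Registry-based formats avoid.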

How to deserialize avro message using mirrormaker?

I want to replicate a kafka topic to an azure event hub.
The messages are in Avro format and use a schema that is behind a schema registry with USER_INFO authentication.
Using a Java client to connect to Kafka, I can use a KafkaAvroDeserializer to deserialize the message correctly.
But this configuration doesn't seem to work with MirrorMaker.
Is it possible to deserialize the Avro message using MirrorMaker before sending it?
Cheers
For MirrorMaker1, the consumer deserializer properties are hard-coded.
Unless you plan on re-serializing the data into a different format when the producer sends data to EventHub, you should stick to using the default ByteArrayDeserializer.
If you did want to manipulate the messages in any way, that would need to be done with a MirrorMakerMessageHandler subclass.
For MirrorMaker2, you can use AvroConverter followed by some transforms properties, but ByteArrayConverter would still be preferred for a one-to-one byte copy.

How to version a field in avro schema when Kafka Consumer updates?

Example: I have a field named
"abc":[
{"key1":"value1", "key2":"value2"},
{"key1":"value1", "key2":"value2"}
]
Consumer1 and consumer2 both consume this field, but consumer2 now requires a few more fields and needs to change the structure.
How to address this issue by following best practice?
You can use the map type in the Avro schema. The key is always a string, and the value can be any type, but it must be the same type for the whole map.
So, in your case, introduce a map into your schema. consumer_1 can consume the event and read only the keys it needs, and consumer_2 can do the same, while both still use the same Avro schema.
Note: you cannot send null for the map in this schema. You need to send an empty map instead.
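A minimal sketch of that idea follows, assuming "abc" is declared as an array of string-valued maps. The record name "Event", the extra key "key3", and the pick helper are hypothetical. The schema is shown as plain JSON and is not run through an Avro library here.

```python
import json

# Assumed schema: "abc" becomes an array of string-valued maps.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "abc",
     "type": {"type": "array", "items": {"type": "map", "values": "string"}},
     "default": []}
  ]
}
""")

# The producer adds key3 for consumer_2 without any schema change.
event = {"abc": [{"key1": "value1", "key2": "value2", "key3": "value3"}]}


def pick(entries, wanted):
    """Keep only the keys a given consumer cares about."""
    return [{k: e[k] for k in wanted if k in e} for e in entries]


consumer_1_view = pick(event["abc"], ["key1", "key2"])
consumer_2_view = pick(event["abc"], ["key1", "key2", "key3"])
```

The trade-off is that a map gives up per-key typing and validation: every value must share one type, and the schema no longer documents which keys exist.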
If possible, introduce a Schema Registry server for schema versioning. Register all the different Avro schemas in the Schema Registry, and each will be given a version ID. Connect your producer and consumer apps to the Schema Registry server so they can fetch the registered schema for each Kafka message. A consumer can then receive messages written with any registered schema version, provided the versions are compatible.

Why would an Apache Kafka consumer use a different version of the schema to deserialize a record than the one sent along with the data?

Let us assume I am using Avro serialization while sending data to kafka.
While consuming a record from Apache Kafka, I get both the schema and the record. I can use the schema to parse the record. I don't understand the scenario in which a consumer would use a different version of the schema to deserialize the record. Can someone help?
The message is serialized with the unique ID of a specific version of the schema when it is produced to Kafka. The consumer uses that schema ID to look up the writer's schema and deserialize the record. If the consumer is configured with its own, different reader schema, Avro resolves the two according to its schema evolution rules.
Taken from https://docs.confluent.io/current/schema-registry/avro.html#schema-evolution