We are using Confluent Platform for our Kafka deployment, with Schema Registry for storing schemas. Is it possible to integrate Schema Registry with Flink? How can we read Avro data from Confluent Platform?
These classes are designed to meet this need
See the linked JavaDoc for more info on the classes.
Each can be passed to the Flink Kafka connector through the corresponding serialization or deserialization schema argument.
Flink SQL can also be used, via the avro-confluent format.
Is the difference between vanilla Apache Avro and Avro with Confluent Schema Registry that, with plain Apache Avro, we send schema+message to the Kafka topic, whereas with Confluent Schema Registry we send schemaID+message? If so, Schema Registry improves performance because the schema is looked up in the registry instead of being sent with each message. Is there any other benefit of using Confluent Schema Registry? Also, does Apache Avro support schema-evolution compatibility rules the way Schema Registry does?
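To make the schemaID+message framing concrete: with Confluent's serializers, each record carries a 5-byte header (a magic byte of 0, then the schema ID as a 4-byte big-endian integer) in front of the Avro binary payload. The minimal sketch below only frames and unframes bytes using the standard library; the helper names `frame` and `unframe` are illustrative, not part of any Confluent API, and the payload is a hand-encoded Avro string.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of every Confluent-framed record
HEADER = struct.Struct(">BI")  # magic byte + 4-byte big-endian schema ID (5 bytes, no padding)


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-binary payload with the Confluent header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-framed record into (schema_id, avro_payload)."""
    if len(message) < HEADER.size:
        raise ValueError("record is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[HEADER.size:]


# Avro binary encoding of the string "alice": zigzag varint length (5 -> 0x0A), then UTF-8 bytes.
payload = b"\x0aalice"
record = frame(42, payload)
print(record.hex())     # 000000002a0a616c696365
print(unframe(record))  # (42, b'\nalice')
```

The per-record overhead stays at 5 bytes no matter how large the schema is, which is where the size and performance benefit in the question comes from.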
Note: There are other implementations of a "Schema Registry" that can be used with Kafka.
Here is a list of reasons:
Clients can discover schemas without interacting with Kafka. For example, Apache Hive / Presto / Spark can download schemas from the Registry to perform analytics.
The registry is centrally responsible for compatibility checks, rather than requiring each client to perform them on its own (this answers your second question).
The same applies to other serialization formats as well, not only Avro.
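Avro itself defines reader/writer schema-resolution rules, but nothing enforces them across independent producers; the registry applies such rules when a new version is registered under a subject. To illustrate what a BACKWARD check (Confluent Schema Registry's default compatibility level) involves, here is a deliberately simplified sketch. It is not the registry's implementation: it assumes flat record schemas and ignores type promotion, aliases, and nested types.

```python
import json
from typing import Any, Dict


def is_backward_compatible(old_schema: Dict[str, Any], new_schema: Dict[str, Any]) -> bool:
    """Simplified BACKWARD check: can a reader using new_schema decode data
    written with old_schema? Flat record schemas only; no type promotion
    (e.g. int -> long), no aliases, no nested types."""
    if old_schema.get("name") != new_schema.get("name"):
        return False  # resolution requires matching record names (aliases ignored here)
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            # Field added in the new schema: old data lacks it, so a default is required.
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            # Shared field whose type changed (promotions are not modelled in this sketch).
            return False
    # Fields present only in the old schema are skipped by the reader, which is allowed.
    return True


v1 = json.loads('{"type": "record", "name": "User", "fields": [{"name": "name", "type": "string"}]}')
v2 = {**v1, "fields": v1["fields"] + [{"name": "age", "type": "int", "default": 0}]}
v3 = {**v1, "fields": v1["fields"] + [{"name": "age", "type": "int"}]}
print(is_backward_compatible(v1, v2))  # True: the added field has a default
print(is_backward_compatible(v1, v3))  # False: the added field has no default
```

Centralizing this check means a producer cannot register (and therefore cannot start writing with) a version that existing consumers would fail to read.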
I came across the following article on how to use the Schema Registry available in Confluent Platform.
According to that article, we can set confluent.schema.registry.url in server.properties to point Kafka to the Schema Registry.
My question is: is it possible to point a Kafka cluster that is not part of a Confluent Platform deployment to a Schema Registry using confluent.schema.registry.url?
No. Server-side schema validation is a feature of Confluent Server, not Apache Kafka, so an Apache Kafka broker does not use confluent.schema.registry.url to validate records.
I'll make sure that docs page gets updated to be clearer. Thanks for raising it.
How to auto-save Avro schema in Confluent Schema Registry from Apache NiFi flow?
That's basically the question.
I cannot find a way to automatically store the Avro schema of a record in the Confluent Schema Registry from a NiFi flow. NiFi can flexibly read messages and populate them with a reference to a schema in the Confluent Schema Registry, but there should be a way to auto-create the schema in the registry instead of requiring the registry to be initialized before the NiFi flow starts.
Here is my current Flow:
I'm reading from a Postgres table using the QueryDatabaseTableRecord processor (version 1.10) and publishing [new] records to a Kafka topic using PublishKafkaRecord_2_0 (version 1.10.0).
I want to publish to Kafka in Avro format, storing (and passing around) the Avro schema in the Confluent Schema Registry (this works well elsewhere in my NiFi setup).
For that, I am using AvroRecordSetWriter in the "Record Writer" property on the QueryDatabaseTableRecord processor with the following properties:
The PublishKafkaRecord processor is configured to read the Avro schema from the input message (via the Confluent Schema Registry; the schema is not embedded in each FlowFile) and uses the same AvroRecordSetWriter as the QueryDatabaseTableRecord processor to write to Kafka.
That's basically it.
I tried replacing the first AvroRecordSetWriter with one that embeds the schema, hoping that the second AvroRecordSetWriter would auto-generate the schema in the Confluent Schema Registry on publish, since I don't want to bloat each message with an embedded Avro schema.
I also tried to follow the advice from the comment, as follows:
With that, I was trying to make the first access to the Confluent Schema Registry the last step in the chain. Unfortunately, my attempts were unsuccessful. The only option that worked was the initial setup described in this question, which requires the schema to exist in the registry in advance.
Both of the other approaches I tried ended with this exception:
org.apache.nifi.schema.access.SchemaNotFoundException: Cannot write Confluent Schema Registry Reference because the Schema Identifier is not known
Please note that I cannot use "Inherit Schema from record" as the last writer's schema access strategy, because NiFi's configuration validation rejects that combination as invalid.
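The exception above says the writer needs a schema ID, and an ID only exists once the schema is registered. One possible workaround, sketched here under assumptions, is to register the schema yourself before the flow first publishes, using the registry's REST endpoint POST /subjects/{subject}/versions. The sketch assumes the default TopicNameStrategy subject ("<topic>-value"), a hypothetical registry at http://schema-registry:8081, a hypothetical topic "users", and that you already know the schema (for example, derived from the table definition). Within NiFi, the same POST could presumably be issued with an HTTP processor such as InvokeHTTP.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

SR_CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(registry_url: str, topic: str, schema: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST /subjects/{subject}/versions request for the topic's value schema."""
    subject = urllib.parse.quote(f"{topic}-value", safe="")  # default TopicNameStrategy subject
    # The registry expects the Avro schema as a JSON *string* inside the "schema" field.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": SR_CONTENT_TYPE},
        method="POST",
    )


def register_schema(registry_url: str, topic: str, schema: Dict[str, Any], timeout: float = 10.0) -> int:
    """Send the registration request and return the schema ID from the response."""
    request = build_register_request(registry_url, topic, schema)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["id"]


# Build (but do not send) a request for a hypothetical "users" topic.
user_schema = {"type": "record", "name": "User", "fields": [{"name": "name", "type": "string"}]}
req = build_register_request("http://schema-registry:8081/", "users", user_schema)
print(req.get_method(), req.full_url)  # POST http://schema-registry:8081/subjects/users-value/versions
```

Registering a schema that is already present under the subject returns the existing ID, so calling `register_schema` on every startup should be harmless.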
Can Confluent Schema Registry be used by applications outside of Kafka Streams? I am specifically interested in using it with message queues other than Apache Kafka, such as Cloud Pub/Sub. Based on my investigation, the component seems tightly coupled to applications using Confluent Platform.
Well, the Confluent Schema Registry does depend on Kafka (it's where the schemas are actually stored). You don't need the rest of Confluent Platform.
While there is a Storage interface that could, in theory, be rewritten against an external system, I am not aware of a way to swap out the default implementation.
Once you have Kafka (and, with it, ZooKeeper), any external serialization library can wrap the REST API itself. Flink, NiFi, and StreamSets, for example, have taken this approach for Avro schema management.
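As a rough illustration of "wrapping the REST API": a client only needs the 5-byte header in the payload and the registry endpoint GET /schemas/ids/{id}. Nothing in the lookup itself depends on the transport, so the bytes could come from Kafka or from a Pub/Sub message, although the registry server still needs Kafka for storage. The class name `SchemaIdCache`, the registry URL, and the in-memory `fake_fetch` stand-in are hypothetical, chosen so the sketch runs offline.

```python
import json
import struct
import urllib.request
from typing import Callable, Dict


def http_get(url: str) -> str:
    """Default fetcher: plain HTTP GET against the registry."""
    with urllib.request.urlopen(url, timeout=10.0) as response:
        return response.read().decode("utf-8")


class SchemaIdCache:
    """Resolve Confluent schema IDs via GET /schemas/ids/{id}, caching each result
    so the registry is contacted at most once per ID."""

    def __init__(self, registry_url: str, fetch: Callable[[str], str] = http_get) -> None:
        self._base = registry_url.rstrip("/")
        self._fetch = fetch
        self._schemas: Dict[int, str] = {}

    def schema_for_id(self, schema_id: int) -> str:
        if schema_id not in self._schemas:
            body = self._fetch(f"{self._base}/schemas/ids/{schema_id}")
            self._schemas[schema_id] = json.loads(body)["schema"]
        return self._schemas[schema_id]

    def schema_for_message(self, message: bytes) -> str:
        """Read the 5-byte header (magic byte 0 + big-endian ID) from any transport's payload."""
        if len(message) < 5:
            raise ValueError("payload is shorter than the 5-byte header")
        magic, schema_id = struct.unpack_from(">BI", message)
        if magic != 0:
            raise ValueError(f"unexpected magic byte {magic}")
        return self.schema_for_id(schema_id)


# Stand-in for the registry so the sketch runs offline (hypothetical response body).
calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return json.dumps({"schema": '{"type": "string"}'})


cache = SchemaIdCache("http://schema-registry:8081", fetch=fake_fetch)
print(cache.schema_for_message(b"\x00\x00\x00\x00\x07\x0aalice"))  # {"type": "string"}
print(cache.schema_for_id(7), len(calls))  # {"type": "string"} 1
```

Caching by ID works because a schema ID always refers to the same schema, which is also why the serializers in Flink and NiFi can avoid hitting the registry on every record.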
I can run the example from "Start Streaming with Kafka and Spring Cloud", but unfortunately it doesn't use the Confluent Schema Registry. I read the Confluent Schema Registry section of the Spring Cloud Stream reference guide, but it didn't work with my Confluent 3.0.0, and the guide doesn't explain how to produce Avro messages using the Confluent Schema Registry. Can anyone guide me on how to achieve this? Thanks!
Spring Cloud Stream is not yet compatible with the Confluent Schema Registry. See the discussion in this thread: https://github.com/spring-cloud/spring-cloud-stream/issues/850