I am trying to connect a running Redpanda Kafka cluster to a Redpanda schema registry, so that the schema registry verifies messages coming into the topic and/or messages being read from it.
I am able to add a schema to the registry and read it back with curl requests, as well as add messages to the Kafka topic I created in the Redpanda cluster.
My question is: how do I tie the schema registry to a topic in the Kafka cluster? That is, how do I instruct the schema registry and/or the Kafka topic to validate incoming messages against the schema I added to the registry?
Thanks for your help or a pointer in the right direction!
Relevant info:
Cluster & topic creation:
https://vectorized.io/docs/guide-rpk-container
rpk container start -n 3
rpk topic create -p 6 -r 3 new-topic --brokers <broker1_address>,<broker2_address>...
Schema Registry creation:
https://vectorized.io/blog/schema_registry/
Command to add a schema:
curl -s \
-X POST \
"http://localhost:8081/subjects/sensor-value/versions" \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"schema": "{\"type\":\"record\",\"name\":\"sensor_sample\",\"fields\":[{\"name\":\"timestamp\",\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},{\"name\":\"identifier\",\"type\":\"string\",\"logicalType\":\"uuid\"},{\"name\":\"value\",\"type\":\"long\"}]}"}' \
| jq
The client is responsible for such integration. For example, the Confluent Schema Registry includes a KafkaAvroSerializer Java class that wraps an HTTP client and handles schema registration and message validation. The broker doesn't deal in "topic schemas", since schemas are really per-record. Unless Redpanda has something similar, broker-side validation is only offered by the enterprise "Confluent Server".
Redpanda is primarily a server that exposes a Kafka-compatible API; I assume it is up to you to create (de)serializer interfaces for your respective client languages. There is a Python example on the Vectorized GitHub.
That being said, the Confluent Schema Registry should work with Redpanda as well, so you can use its serializers and HTTP client libraries with it.
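For example, a Java-client producer pointed at the Redpanda brokers would only need configuration along these lines (a sketch; the broker placeholders and registry address are the ones used above, and the validation happens in the serializer, not in the broker):
bootstrap.servers=<broker1_address>,<broker2_address>
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
Note that with the default TopicNameStrategy the serializer looks up the subject <topic>-value, so the sensor-value subject registered above corresponds to a topic named sensor, not new-topic.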
Related
I have successfully set up Kafka Connect in distributed mode locally with the Confluent BigQuery connector. The topics are being made available to me by another party; I am simply moving these topics into my Kafka Connect on my local machine, and then to the sink connector (and thus into BigQuery).
Because the topics are created by someone else, the schema registry is also managed by them. So in my config I set "schema.registry.url": "https://url-to-schema-registry", but we have multiple topics which all use the same schema entry, which is located at, let's say, https://url-to-schema-registry/subjects/generic-entry-value/versions/1.
What is happening, however, is that Connect is looking for the schema entry based on the topic name. So let's say my topic is my-topic. Connect is looking for the entry at this URL: https://url-to-schema-registry/subjects/my-topic-value/versions/1. But instead, I want to use the entry located at https://url-to-schema-registry/subjects/generic-entry-value/versions/1, and I want to do so for any and all topics.
How can I make this change? I have tried looking at this doc: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#configuration-details as well as this class: https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/subject/TopicRecordNameStrategy.java
but this looks to be a config parameter for the schema registry itself (which I have no control over), not the sink connector. Unless I'm not configuring something correctly.
Is there a way for me to configure my sink connector to look for a specified schema entry like generic-entry-value/versions/..., instead of the default format topic-name-value/versions/...?
The strategy is configurable at the connector level.
e.g.
value.converter.value.subject.name.strategy=...
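In a sink connector config that uses the Avro converter, the relevant lines might look roughly like this (a sketch; the converter class and registry URL are assumed from the question):
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=https://url-to-schema-registry
value.converter.value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy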
The only built-in strategies, however, are for Topic and/or RecordName lookups. You'll need to write your own class for a static lookup of "generic-entry" if you cannot otherwise copy the "generic-entry-value" schema into new subjects, e.g.
# save the output of this to a file, wrapped as {"schema": "<escaped schema string>"}
curl ... https://url-to-schema-registry/subjects/generic-entry-value/versions/1/schema
# upload it again, where "new-entry" is the name of the other topic
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" -d @schema.json https://url-to-schema-registry/subjects/new-entry-value/versions
I've enabled SASL PLAIN authentication for my ZooKeeper and broker. It seems to be working: I can only see topics and their content by using the credentials I set. The problem is that even though the status of all connectors was "RUNNING", no data was arriving in the Kafka topics. So I restarted Kafka Connect and now I can't connect to it; I get a connection refused error.
It was already confusing me: how does Kafka Connect establish a connection with a SASL-enabled broker? It needs to be authenticated to be able to write data to a topic, right? How can I do that? For example, I've provided the Schema Registry basic authentication information to Kafka Connect in the connect-distributed.properties file like this:
schema.registry.basic.auth.user.info=admin:secret
key.converter.basic.auth.user.info=admin:secret
value.converter.basic.auth.user.info=admin:secret
schema.registry.basic.auth.credentials.source=USER_INFO
key.converter.basic.auth.credentials.source=USER_INFO
value.converter.basic.auth.credentials.source=USER_INFO
I believe I need to do something similar for the broker connection, but I didn't see anything about that in the tutorials.
EDIT:
The Connect service seems to be running, but connectors can't fetch the metadata of topics. That means there is a problem with authentication to Kafka.
It seems to be working with the configuration below. I am not sure whether the producer. and consumer. parts are needed, but they don't cause any problems. I've added these lines to the connect-distributed.properties file.
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="admin" \
password="secret";
producer.sasl.mechanism=PLAIN
producer.security.protocol=SASL_PLAINTEXT
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="admin" \
password="secret";
consumer.sasl.mechanism=PLAIN
consumer.security.protocol=SASL_PLAINTEXT
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="admin" \
password="secret";
I use the REST API for Kafka Connect. After rebooting the server, all connectors are deleted and the returned result is empty.
curl -H "Accept:application/json" localhost:8083/connectors
output is: []
Connectors are stored in Kafka topics.
By default, Kafka (and ZooKeeper) stores its topics/data under /tmp, which is wiped on restart.
As mentioned in the comments, there are also other ways you'd end up with temporary data.
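If /tmp is the cause, pointing the data directories somewhere persistent avoids it, for example (the paths here are just placeholders):
# server.properties
log.dirs=/var/lib/kafka/data
# zookeeper.properties
dataDir=/var/lib/zookeeper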
I have Confluent Kafka, ZooKeeper, Schema Registry, and KSQL running in containers on a Kubernetes cluster. Kafka, ZooKeeper, and the Schema Registry work fine; I can create topics and write data in Avro format. But when I try to check KSQL and create a stream with curl like this:
curl -XPOST http://ksql-svc.someapp:8080/ksql -H "Content-Type: application/json" -d $'
{"ksql": "CREATE STREAM kawabanga_stream (log_id varchar, created_date varchar) WITH (kafka_topic = '\'kawabanga\'', value_format = '\'avro\'');","streamsProperties":{}}'
I get an error:
[{"error":{"statementText":"CREATE STREAM kawabanga_stream (log_id varchar, created_date varchar) WITH (kafka_topic = 'kawabanga', value_format = 'avro');","errorMessage":{"message":"Avro schema file path should be set for avro topics.","stackTrace":["io.confluent.ksql.ddl.commands.RegisterTopicCommand.extractTopicSerDe(RegisterTopicCommand.java:75)","io.confluent.ksql.ddl.commands.RegisterTopicCommand.<init>
Please find below my KSQL server config:
# cat /etc/ksql/ksqlserver.properties
bootstrap.servers=kafka-0.kafka-hs:9093,kafka-1.kafka-hs:9093,kafka-2.kafka-hs:9093
schema.registry.host=schema-svc
schema.registry.port=8081
ksql.command.topic.suffix=commands
listeners=http://0.0.0.0:8080
I also tried to start the server without the schema.registry lines, but with no luck.
You must set the configuration ksql.schema.registry.url (see KSQL v0.5 documentation).
FYI: We will have better documentation for Avro usage in KSQL and for Confluent Schema Registry integration with the upcoming GA release of KSQL in early April (as part of Confluent Platform 4.1).
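In the properties file from the question, that would be a single line in place of the schema.registry.host and schema.registry.port entries (a sketch; the host and port are taken from those lines):
ksql.schema.registry.url=http://schema-svc:8081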
I don't know how to check the version, but I'm using the Docker image confluentinc/ksql-cli (I think it's 0.5). For Kafka I used the Confluent Kafka Docker image 4.0.0.
The KSQL server shows the version on startup.
If you can't see the KSQL server startup message, you can also check the KSQL version by entering "VERSION" in the KSQL CLI prompt (ksql> VERSION).
According to the KSQLConfig class, you should use the ksql.schema.registry.url property to specify the location of the Schema Registry.
This looks to have been the case since at least v0.5.
It's also worth noting that using the RESTful API directly isn't currently supported. So you may find the API changes between releases.
I was wondering: can I use the Confluent Schema Registry to generate (and then send to Kafka) schema-less Avro records? If so, can somebody please share some resources for it?
I am not able to find any example on the Confluent website or via Google.
I have a plain delimited file and a separate schema for it; currently I am using an Avro generic record schema to serialize the Avro records and send them through Kafka. This way the schema is still attached to each record, which makes it bulkier. My thinking is that if I remove the schema when sending the record to Kafka, I will be able to get higher throughput.
The Confluent Schema Registry's serializers send Avro messages without the entire Avro schema embedded in the message. I think this is what you mean by "schema less" messages.
The Confluent Schema Registry stores the Avro schemas, and only a short schema id is included at the front of each message on the wire.
The full docs, including a quickstart guide for testing the Confluent Schema Registry, are here:
http://docs.confluent.io/current/schema-registry/docs/index.html
You can register your Avro schema for the first time with the command below from the command line:
curl -X POST -i -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"schema": "{\"type\": \"string\"}"}' \
http://localhost:8081/subjects/topic
You can see all versions registered under your subject using:
curl -X GET -i http://localhost:8081/subjects/topic/versions
To see the complete Avro schema for version 1 of the versions present in the Confluent Schema Registry, use the command below; it will show the schema in JSON format:
curl -X GET -i http://localhost:8081/subjects/topic/versions/1
Avro schema registration is the task of the Kafka producer.
Once the schema is in the Confluent Schema Registry, you just need to publish Avro generic records to the specific Kafka topic; in our case it is 'topic'.
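As an illustration, a minimal Scala producer sketch could look like the following. The broker address, the topic name 'topic', and the record schema (a record type with a single string field f1, unlike the plain string schema registered above) are all assumptions for the example; the serializer registers or looks up the schema and puts only its id in the message:
import java.util.Properties
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
props.put("schema.registry.url", "http://localhost:8081")  // registry from the curl commands above

// hypothetical record schema for the example
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"example","fields":[{"name":"f1","type":"string"}]}""")
val record = new GenericData.Record(schema)
record.put("f1", "some value")

val producer = new KafkaProducer[String, Object](props)
producer.send(new ProducerRecord[String, Object]("topic", record))
producer.flush()
producer.close()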
Kafka Consumer: use the code below to fetch the latest schema for a specific Kafka topic.
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.avro.Schema

val schemaReg = new CachedSchemaRegistryClient(kafkaAvroSchemaRegistryUrl, 100) // 100 = max cached schemas
val schemaMeta = schemaReg.getLatestSchemaMetadata(kafkaTopic + "-value")       // subject follows the <topic>-value convention
val schemaString = schemaMeta.getSchema                                          // schema as a JSON string
val schema = new Schema.Parser().parse(schemaString)                             // parsed org.apache.avro.Schema
The above will be used to get the schema, and then we can use the Confluent deserializer to decode records from the Kafka topic.
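A rough Scala sketch of that last step, assuming the Confluent KafkaAvroDeserializer does the schema-id lookup (the group id and broker address are placeholders):
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val consumerProps = new Properties()
consumerProps.put("bootstrap.servers", "localhost:9092")  // assumed broker address
consumerProps.put("group.id", "avro-example")             // placeholder group id
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
consumerProps.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer")
consumerProps.put("schema.registry.url", kafkaAvroSchemaRegistryUrl)  // same registry URL as above

val consumer = new KafkaConsumer[String, Object](consumerProps)
consumer.subscribe(Collections.singletonList(kafkaTopic))
val records = consumer.poll(Duration.ofSeconds(1))
val it = records.iterator()
while (it.hasNext) println(it.next().value())  // values arrive already decoded (GenericRecord for record schemas)
consumer.close()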