I have a use case where I start from a JSON document, generate an Avro schema and a record from it, and publish the record.
I have configured the value serializer, and the schema compatibility setting is BACKWARD.
First JSON
String json = "{\n" +
" \"id\": 1,\n" +
" \"name\": \"Headphones\",\n" +
" \"price\": 1250.0,\n" +
" \"tags\": [\"home\", \"green\"]\n" +
"}\n"
;
Version 1 schema registered.
Received message in avro console consumer.
Second JSON.
String json = "{\n" +
" \"id\": 1,\n" +
" \"price\": 1250.0,\n" +
" \"tags\": [\"home\", \"green\"]\n" +
"}\n"
;
The schema was registered successfully.
The message was sent.
I then sent the first JSON again, and it was sent successfully.
Third JSON:
String json = "{\n" +
" \"id\": 1,\n" +
" \"name\": \"Headphones\",\n" +
" \"tags\": [\"home\", \"green\"]\n" +
"}\n"
;
This one failed with an error:
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409
Why was the schema generated from the second JSON registered while the
third one was rejected, even though I had no default for the deleted
field? Does the Schema Registry always accept the first evolution
(the second schema over the first)?
Schema in schema registry
Version 1 schema
{
"fields": [
{
"doc": "Type inferred from '1'",
"name": "id",
"type": "int"
},
{
"doc": "Type inferred from '\"Headphones\"'",
"name": "name",
"type": "string"
},
{
"doc": "Type inferred from '1250.0'",
"name": "price",
"type": "double"
},
{
"doc": "Type inferred from '[\"home\",\"green\"]'",
"name": "tags",
"type": {
"items": "string",
"type": "array"
}
}
],
"name": "myschema",
"type": "record" }
Version 2:
{
"fields": [
{
"doc": "Type inferred from '1'",
"name": "id",
"type": "int"
},
{
"doc": "Type inferred from '1250.0'",
"name": "price",
"type": "double"
},
{
"doc": "Type inferred from '[\"home\",\"green\"]'",
"name": "tags",
"type": {
"items": "string",
"type": "array"
}
}
],
"name": "myschema",
"type": "record" }
Let's go over the backward compatibility rules: https://docs.confluent.io/current/schema-registry/avro.html#compatibility-types
First, the default BACKWARD setting isn't transitive, so version 3 is only checked against version 2.
The backward rule states that you can delete fields or add optional fields (those with a default). I assume your schema generator doesn't emit defaults, so in practice you're only allowed to delete fields, not add them.
Between versions 1 and 2, you deleted the name field, which is valid.
Between version 2 and the incoming version 3, the registry sees a new schema that removes price (which is okay) but adds a required name field, which is not allowed.
Sending the first JSON again presumably worked because its schema was already registered as version 1, so nothing new had to pass the compatibility check.
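The rule can be sketched in a few lines of Python. This is a deliberately simplified check, not the registry's actual implementation: it only compares top-level field names and defaults, and the names `added_required_fields` and `record` are made up for illustration.

```python
def added_required_fields(old_schema, new_schema):
    """Return names of fields that exist only in new_schema and have no default.

    Under BACKWARD compatibility a consumer using new_schema must be able to
    read data written with old_schema, so every such field is a violation.
    Simplified: ignores type changes, aliases and nested records.
    """
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


def record(*fields):
    # Hypothetical helper: build a record schema from (name, type) pairs.
    return {
        "type": "record",
        "name": "myschema",
        "fields": [{"name": name, "type": type_} for name, type_ in fields],
    }


tags = {"type": "array", "items": "string"}
v1 = record(("id", "int"), ("name", "string"), ("price", "double"), ("tags", tags))
v2 = record(("id", "int"), ("price", "double"), ("tags", tags))
v3 = record(("id", "int"), ("name", "string"), ("tags", tags))

print(added_required_fields(v1, v2))  # v1 -> v2 only deletes "name"
print(added_required_fields(v2, v3))  # v2 -> v3 adds "name" without a default
```

The first call returns an empty list and the second returns the offending field, which matches what the registry accepted and rejected in the question.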
I want to add a new field of type "record" to an Avro schema. The field cannot be null and therefore has a default value. The topic's compatibility type is FULL_TRANSITIVE.
The schema is unchanged from the last version except that the last field, produktType, was added:
{
"type": "record",
"name": "Finished",
"namespace": "com.domain.finishing",
"doc": "Schema to indicate the end of the ongoing saga...",
"fields": [
{
"name": "numberOfAThing",
"type": [
"null",
{
"type": "string",
"avro.java.string": "String"
}
],
"default": null
},
{
"name": "previousNumbersOfThings",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
},
"default": []
},
{
"name": "produktType",
"type": {
"type": "record",
"name": "ProduktType",
"fields": [
{
"name": "art",
"type": "int",
"default": 1
},
{
"name": "code",
"type": "int",
"default": 10003
}
]
},
"default": { "art": 1, "code": 10003 }
}
]
}
I've checked with the Schema Registry that the new version of the schema is compatible.
But when we try to read old messages that do not contain the new field using the new schema (the one with the defaults), we get an EOFException and reading fails.
The part that causes headaches is the newly added field "produktType". It cannot be null, so we tried adding defaults, which works for primitive-type fields ("int" and so on). The line "default": { "art": 1, "code": 10003 } seems to be accepted by the Schema Registry but has no effect when we read messages from the topic that do not contain this field.
The Schema Registry also marks the schema as incompatible when the last "default": { "art": 1, "code": 10003 } line is missing (although "default": true also passes the compatibility check...).
The Avro specification for complex types contains an example for type "record" with the default {"a": 1}, which is where we got the idea from. But since it's not working, something is still wrong.
There are similar questions, one claiming that records can only have null as a default and another that is unanswered.
Is this supposed to work? If so, how can defaults for these "type": "record" fields be defined? Or is it still true that records can only have null as a default?
Thanks!
Update on the compatibility cases:
Schema V1 (old one without the new field): can read v1 and v2 records.
Schema V2 (new field added): cannot read v1 records, can read v2 records
The case where a consumer using schema v2 encounters records written with v1 is the surprising one, as I thought the defaults were meant for exactly that.
Even weirder: when I don't set the new field's values at all, the v2 record still contains some values:
I have no idea where the value for code comes from. The schema uses other numbers for its defaults:
So one of them seems to work, the other does not.
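For reference, the Avro specification's resolution rule for a field that the reader's schema has and the writer's schema lacks can be sketched as below. This is a dict-level illustration only, not how any Avro library is implemented; `resolve_record` is a hypothetical name, the schema is trimmed to two fields, and the sample record is made up. Note that the rule takes the writer's field list as an input: it describes what should happen when the reader knows which schema the data was written with.

```python
import copy


def resolve_record(datum, writer_fields, reader_schema):
    """Project a decoded record onto reader_schema.

    datum         -- dict decoded using the writer's schema
    writer_fields -- names of the fields present in the writer's schema
    reader_schema -- Avro record schema (as a dict) the consumer expects
    """
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]
        elif "default" in field:
            # Field unknown to the writer: fall back to the reader's default.
            result[name] = copy.deepcopy(field["default"])
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


reader_v2 = {
    "type": "record",
    "name": "Finished",
    "fields": [
        {"name": "numberOfAThing", "type": ["null", "string"], "default": None},
        {
            "name": "produktType",
            "type": {
                "type": "record",
                "name": "ProduktType",
                "fields": [
                    {"name": "art", "type": "int", "default": 1},
                    {"name": "code", "type": "int", "default": 10003},
                ],
            },
            "default": {"art": 1, "code": 10003},
        },
    ],
}

v1_datum = {"numberOfAThing": "42"}  # made-up v1 record without produktType
resolved = resolve_record(v1_datum, {"numberOfAThing"}, reader_v2)
print(resolved["produktType"])
```

Under this rule a record-typed default such as { "art": 1, "code": 10003 } is treated like any other default, so the schema itself looks consistent with the specification.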
I started exploring Kafka and Kafka Connect recently and did some initial setup.
Now I want to explore the Schema Registry part further.
My Schema Registry is running. What should I do next?
I have an Avro schema stored in avro_schema.avsc.
Here is the schema:
{
"name": "FSP-AUDIT-EVENT",
"type": "record",
"namespace": "com.acme.avro",
"fields": [
{
"name": "ID",
"type": "string"
},
{
"name": "VERSION",
"type": "int"
},
{
"name": "ACTION_TYPE",
"type": "string"
},
{
"name": "EVENT_TYPE",
"type": "string"
},
{
"name": "CLIENT_ID",
"type": "string"
},
{
"name": "DETAILS",
"type": "string"
},
{
"name": "OBJECT_TYPE",
"type": "string"
},
{
"name": "UTC_DATE_TIME",
"type": "long"
},
{
"name": "POINT_IN_TIME_PRECISION",
"type": "string"
},
{
"name": "TIME_ZONE",
"type": "string"
},
{
"name": "TIMELINE_PRECISION",
"type": "string"
},
{
"name": "AUDIT_EVENT_TO_UTC_DT",
"type": [
"string",
"null"
]
},
{
"name": "AUDIT_EVENT_TO_DATE_PITP",
"type": "string"
},
{
"name": "AUDIT_EVENT_TO_DATE_TZ",
"type": "string"
},
{
"name": "AUDIT_EVENT_TO_DATE_TP",
"type": "string"
},
{
"name": "GROUP_ID",
"type": "string"
},
{
"name": "OBJECT_DISPLAY_NAME",
"type": "string"
},
{
"name": "OBJECT_ID",
"type": [
"string",
"null"
]
},
{
"name": "USER_DISPLAY_NAME",
"type": [
"string",
"null"
]
},
{
"name": "USER_ID",
"type": "string"
},
{
"name": "PARENT_EVENT_ID",
"type": [
"string",
"null"
]
},
{
"name": "NOTES",
"type": [
"string",
"null"
]
},
{
"name": "SUMMARY",
"type": [
"string",
"null"
]
}
]
}
Is my schema valid? I converted it online from JSON.
I am not sure where I should keep this schema file.
Please guide me through the steps to follow.
I am sending records both from a Lambda function and from a JDBC source.
So basically, how can I enforce the Avro schema and test it?
Do I have to change anything in the avro-consumer properties file?
Or is this the correct way to register a schema?
./bin/kafka-avro-console-producer \
--broker-list b-3.**:9092,b-**:9092,b-**:9092 --topic AVRO-AUDIT_EVENT \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema" : "{\"type\":\"struct\",\"fields\":[{\"type\":\"string\",\"optional\":false,\"field\":\"ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"VERSION\"},{\"type\":\"string\",\"optional\":true,\"field\":\"ACTION_TYPE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"EVENT_TYPE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"CLIENT_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"DETAILS\"},{\"type\":\"string\",\"optional\":true,\"field\":\"OBJECT_TYPE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"UTC_DATE_TIME\"},{\"type\":\"string\",\"optional\":true,\"field\":\"POINT_IN_TIME_PRECISION\"},{\"type\":\"string\",\"optional\":true,\"field\":\"TIME_ZONE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"TIMELINE_PRECISION\"},{\"type\":\"string\",\"optional\":true,\"field\":\"GROUP_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"OBJECT_DISPLAY_NAME\"},{\"type\":\"string\",\"optional\":true,\"field\":\"OBJECT_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"USER_DISPLAY_NAME\"},{\"type\":\"string\",\"optional\":true,\"field\":\"USER_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"PARENT_EVENT_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"NOTES\"},{\"type\":\"string\",\"optional\":true,\"field\":\"SUMMARY\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_UTC_DT\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_DATE_PITP\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_DATE_TZ\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_DATE_TP\"}],\"optional\":false,\"name\":\"test\"}"}' http://localhost:8081/subjects/view/versions
What do I have to do next?
When I try to view my schema, I only get the following:
curl --silent -X GET http://localhost:8081/subjects/AVRO-AUDIT-EVENT/versions/latest
This is the result:
{"subject":"AVRO-AUDIT-EVENT","version":1,"id":161,"schema":"{\"type\":\"string\",\"optional\":false}"}
Why don't I see my full registered schema?
Also, when I try to delete the schema,
I get the error below:
{"error_code":405,"message":"HTTP 405 Method Not Allowed"
I am not sure if my schema is registered correctly.
Please help me.
Thanks in advance.
Is my schema valid?
You can use the REST API of the Registry to try to submit it and see...
Where should I keep this schema file?
It's not clear how you're sending messages...
If you actually wrote Kafka producer code, you store it within your code (as a string) or as a resource file. If using Java, you can instead use the SchemaBuilder class to create the Schema object.
You need to rewrite your producer to use an Avro schema and serializers if you haven't already.
If we create an Avro schema, will it work for JSON as well?
Avro is a binary format, but there is a JsonDecoder for it.
What should be the URL for our Avro schema properties file?
It needs to be the address of your Schema Registry, once you figure out how to start it (with schema-registry-start).
Do I have to change anything in the avro-consumer properties file?
You need to use the Avro deserializer.
Is this the correct way to register a schema?
./bin/kafka-avro-console-producer \
Not quite. That's how you produce a message with a schema (and you need to use the correct schema). You also must provide --property schema.registry.url.
You use the REST API of the Registry to register and verify schemas.
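As a rough sketch of that last step, the request that registers an Avro schema could be built like this, using only the Python standard library. The function name `build_register_request`, the registry address and the subject name are assumptions for illustration (the subject follows the common `<topic>-value` convention), and the schema is a trimmed stand-in for avro_schema.avsc. The record name uses underscores because Avro names may not contain hyphens. The body must carry an Avro schema ("type": "record"); the "struct"/"optional"/"field" layout in the curl command above is Kafka Connect's JSON schema format, not Avro.

```python
import json
import urllib.request


def build_register_request(registry_url, subject, avro_schema):
    """Build (but do not send) the POST request that registers avro_schema.

    The registry expects a JSON body whose "schema" value is the Avro schema
    serialized as a string, not a nested JSON object.
    """
    body = json.dumps({"schema": json.dumps(avro_schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


# Trimmed stand-in for avro_schema.avsc (only two of the fields).
schema = {
    "type": "record",
    "name": "FSP_AUDIT_EVENT",
    "namespace": "com.acme.avro",
    "fields": [
        {"name": "ID", "type": "string"},
        {"name": "VERSION", "type": "int"},
    ],
}

request = build_register_request(
    "http://localhost:8081", "AVRO-AUDIT_EVENT-value", schema
)
print(request.get_method(), request.full_url)

# To actually send it (needs a running Schema Registry):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

A GET on the same subject's /versions/latest path afterwards should return the schema that was stored, which is a quick way to verify the registration.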
I'm using an Avro schema to write data to a Kafka topic. Initially, everything worked fine. After adding one new field (scan_app_id) to the Avro file, I'm getting the error below.
Avro file:
{
"type": "record", "name": "Initiate_Scan", "namespace": "avro",
"doc": "Avro schema registry for Initiate_Scan", "fields": [
{
"name": "app_id",
"type": "string",
"doc": "3 digit application id"
},
{
"name": "app_name",
"type": "string",
"doc": "application name"
},
{
"name": "dev_stage",
"type": "string",
"doc": "development stage"
},
{
"name": "scan_app_id",
"type": "string",
"doc": "unique scan id for an app in Veracode"
},
{
"name": "scan_name",
"type": "string",
"doc": "scan details"
},
{
"name": "seq_num",
"type": "int",
"doc": "unique number"
},
{
"name": "result_flg",
"type": "string",
"doc": "Y indicates results of scan available",
"default": "Y"
},
{
"name": "request_id",
"type": "int",
"doc": "unique id"
},
{
"name": "scan_number",
"type": "int",
"doc": "number of scans"
} ] }
Error:
Caused by: org.apache.kafka.common.errors.SerializationException:
Error registering Avro schema:
{"type":"record","name":"Initiate_Scan","namespace":"avro","doc":"Avro
schema registry for
Initiate_Scan","fields":[{"name":"app_id","type":{"type":"string","avro.java.string":"String"},"doc":"3
digit application
id"},{"name":"app_name","type":{"type":"string","avro.java.string":"String"},"doc":"application
name"},{"name":"dev_stage","type":{"type":"string","avro.java.string":"String"},"doc":"development
stage"},{"name":"scan_app_id","type":{"type":"string","avro.java.string":"String"},"doc":"unique
scan id for an
App"},{"name":"scan_name","type":{"type":"string","avro.java.string":"String"},"doc":"scan
details"},{"name":"seq_num","type":"int","doc":"unique
number"},{"name":"result_flg","type":{"type":"string","avro.java.string":"String"},"doc":"Y
indicates results of scan
available","default":"Y"},{"name":"request_id","type":"int","doc":"unique
id"},{"name":"scan_number","type":"int","doc":"number of scans"}]}
INFO Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1017)
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:292)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:284)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:279)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:61)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:93)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:72)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:54)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:768)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:745)
at com.ssc.svc.svds.initiate.InitiateProducer.initiateScanData(InitiateProducer.java:146)
at com.ssc.svc.svds.initiate.InitiateProducer.topicsData(InitiateProducer.java:41)
at com.ssc.svc.svds.initiate.InputData.main(InputData.java:31)
I went through the Confluent documentation about error 50002, which says:
A schema should be compatible with the previously registered schema.
Does this mean I cannot change or update an existing schema?
How can I fix this?
Actually, the link says 50002 means "Operation timed out". If the schema were indeed incompatible, the response would say so.
In any case, if you add a new field and want the change to stay backward compatible, you are required to define a default value for it.
That way, consumers using the newer schema know what value to set for that field when they read older messages.
The most straightforward list of allowed Avro changes I found is by Oracle.
Possible errors are:
A field is added without a default value
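To make the fix concrete, here is a small sketch that adds a default to the new field before the schema is registered. The helper name `with_field_default` is hypothetical, the schema is trimmed to two of the fields from the question, and the empty string is only an assumed placeholder default; choose a value that makes sense for your data.

```python
import copy
import json


def with_field_default(schema, field_name, default):
    """Return a copy of schema in which field_name carries a default value."""
    updated = copy.deepcopy(schema)
    for field in updated["fields"]:
        if field["name"] == field_name:
            field["default"] = default
            return updated
    raise KeyError(field_name)


# Trimmed version of the Initiate_Scan schema from the question.
initiate_scan = {
    "type": "record",
    "name": "Initiate_Scan",
    "namespace": "avro",
    "fields": [
        {"name": "app_id", "type": "string", "doc": "3 digit application id"},
        {
            "name": "scan_app_id",
            "type": "string",
            "doc": "unique scan id for an app in Veracode",
        },
    ],
}

# Assumed placeholder default; not taken from the original post.
patched = with_field_default(initiate_scan, "scan_app_id", "")
print(json.dumps(patched["fields"][1], indent=2))
```

The original dict is left untouched, so the patched schema can be compared against it or written back to the .avsc file.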
I am trying to validate documents against a schema when they are inserted.
There are libraries like MongoEngine that say they do the work, but is there a way to do document validation directly via pymongo?
The Python driver docs are indeed a little light on how to use db.command. Here is a complete working example:
from pymongo import MongoClient
from collections import OrderedDict
import sys
client = MongoClient() # supply connection args as appropriate
db = client.testX
db.myColl.drop()
db.create_collection("myColl") # Force create!
# $jsonSchema expression type is preferred. New since v3.6 (2017):
vexpr = {"$jsonSchema":
{
"bsonType": "object",
"required": [ "name", "year", "major", "gpa" ],
"properties": {
"name": {
"bsonType": "string",
"description": "must be a string and is required"
},
"gender": {
"bsonType": "string",
"description": "must be a string and is not required"
},
"year": {
"bsonType": "int",
"minimum": 2017,
"maximum": 3017,
"exclusiveMaximum": False,
"description": "must be an integer in [ 2017, 3017 ] and is required"
},
"major": {
"enum": [ "Math", "English", "Computer Science", "History", None ],
"description": "can only be one of the enum values and is required"
},
"gpa": {
# In case you might want to allow doubles OR int, then add
# "int" to the bsonType array below:
"bsonType": [ "double" ],
"minimum": 0,
"description": "must be a double and is required"
}
}
}
}
# Per the docs, args to command() require that the first key/value pair
# be the command string and its principal argument, followed by other
# arguments. There are two ways to do this: Using an OrderedDict:
cmd = OrderedDict([('collMod', 'myColl'),
                   ('validator', vexpr),
                   ('validationLevel', 'moderate')])
db.command(cmd)
# Or, use the kwargs construct:
# db.command('collMod', 'myColl', validator=vexpr, validationLevel='moderate')
try:
    # insert_one() replaces the deprecated insert()
    db.myColl.insert_one({"x": 1})
    print("NOT good; the insert above should have failed.")
except Exception:
    print("OK. Expected exception:", sys.exc_info())
try:
    okdoc = {"name": "buzz", "year": 2019, "major": "Math", "gpa": 3.8}
    db.myColl.insert_one(okdoc)
    print("All good.")
except Exception:
    print("exc:", sys.exc_info())
MongoDB supports document validation at the engine level so you'll pick it up via pymongo. You declare your "schema" (rules actually) to the engine. Here's a great place to start: https://docs.mongodb.com/manual/core/document-validation/
You can make a separate JSON file for your document validation schema, like this:
{
"collMod": "users",
"validator": {
"$jsonSchema": {
"bsonType": "object",
"required": ["email", "password","name"],
"properties": {
"email": {
"bsonType": "string",
"description": "Email address"
},
"password": {
"bsonType": "string",
"description": "A hash representation of the password"
},
"name": {
"bsonType": "object",
"required": ["first", "last"],
"description": "Object that separates first names and last names",
"properties": {
"first": {
"bsonType": "string",
"description": "First and middle name"
},
"last": {
"bsonType": "string",
"description": "First and second surname"
}
}
}
}
}
}
}
Then you can use it in a Python script, for example:
from collections import OrderedDict  # preserves the (key, value) order given in the file
import json  # parses the JSON file
from pymongo import MongoClient
client = MongoClient("your_mongo_uri")
db = client.your_db_name
with open('your_schema_file.json', 'r') as j:
    # object_pairs_hook keeps "collMod" as the first key, which db.command() requires
    d = json.load(j, object_pairs_hook=OrderedDict)
db.command(d)
OrderedDict Info
collMod Info
Schema Validation Info
I know of two ways to handle this.
The first is to create the collection with a schema (or set one on an existing collection), so that every insertion is checked against it on the server side and rejected or logged as a warning, depending on validationAction.
The following code demonstrates creating and testing the schema:
from pymongo import MongoClient
mongo_client = MongoClient(host=...,
                           port=...,
                           username=...,
                           password=...,
                           authSource=...,
                           authMechanism=...,
                           connect=True, )
mongo_client.server_info()
db = mongo_client.your_db
users = db.create_collection(name="users",
                             validator={"$jsonSchema": {
                                 "bsonType": "object",
                                 "required": ["username"],
                                 "properties": {
                                     "username": {
                                         "bsonType": "string",
                                         # Anchored so the whole value must match
                                         "pattern": "^[a-z0-9]{5,15}$",
                                         "description": "user name (required), only lowercase letters "
                                                        "and digits allowed, from 5 to 15 characters long"
                                     },
                                     "email": {
                                         "bsonType": "string",
                                         "description": "User's email (optional)"
                                     },
                                 }
                             }},
                             validationAction="error",
                             )
# Inserting a user document that fits the schema
users.insert_one({"username": "admin", "email": "some_admin_mail"})
# The insertion below will be rejected with "pymongo.errors.WriteError: Document failed validation, full error"
# because the username (root) is too short
users.insert_one({"username": "root", "email": "some_root_mail"})
The second is to treat your MongoDB documents as ordinary JSON entities and check them in client code using standard JSON Schema validation:
from jsonschema import validate
from jsonschema.exceptions import ValidationError
from pymongo import MongoClient
db = MongoClient(...).your_db
schema = {
    "type": "object",
    "required": ["username"],
    "properties": {
        # Anchored so the whole value must match
        "username": {"type": "string", "pattern": "^[a-z0-9]{5,15}$"},
        "email": {"type": "string"},
    },
}
try:
    new_user = {"username": "admin", "email": "some_admin_mail"}
    # No exception will be raised in validation below
    validate(instance=new_user, schema=schema)
    db.users.insert_one(new_user)
    new_user = {"username": "root", "email": "some_root_mail"}
    # Exception <ValidationError: 'root' does not match '^[a-z0-9]{5,15}$'> will be raised
    validate(instance=new_user, schema=schema)
    db.users.insert_one(new_user)
except ValidationError as err:
    # Handle the validation error here
    print(err.message)