Deserialize Avro message with schema type as object - apache-kafka

I'm reading from a Kafka topic that contains Avro messages with the schema below. I'm unable to parse the schema and deserialize the messages.
{
  "type": "record",
  "name": "Request",
  "namespace": "com.sk.avro.model",
  "fields": [
    {
      "name": "empId",
      "type": [
        "null",
        "string"
      ],
      "default": null,
      "description": "REQUIRED"
    },
    {
      "name": "carUnit",
      "type": "com.example.CarUnit",
      "default": "ABC",
      "description": "Car id"
    }
  ]
}
I'm getting the error below:
The type of the "carUnit" field must be a defined name or a {"type": ...} expression
Can anyone please help out?

How about inlining the record definition:
{
  "name": "businessUnit",
  "type": {
    "type": "record",
    "name": "BusinessUnit",
    "fields": [
      {
        "name": "hostName",
        "type": "string"
      }
    ]
  }
}
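You can sanity-check a schema like this locally before wiring it into a consumer. A minimal sketch, assuming the fastavro Python library; the inlined CarUnit record mirrors the idea above and is an assumption about what com.example.CarUnit looks like:

import json
from fastavro import parse_schema

# Inline the record definition instead of referencing an undefined name
schema = {
    "type": "record",
    "name": "Request",
    "namespace": "com.sk.avro.model",
    "fields": [
        {"name": "empId", "type": ["null", "string"], "default": None},
        {
            "name": "carUnit",
            "type": {
                "type": "record",
                "name": "CarUnit",          # hypothetical shape of com.example.CarUnit
                "namespace": "com.example",
                "fields": [{"name": "hostName", "type": "string"}],
            },
        },
    ],
}

# parse_schema raises SchemaParseException if the schema is invalid
parsed = parse_schema(schema)
print("schema parsed OK")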

Related

PubSub Subscription error with REPEATED Column Type - Avro Schema

I am trying to use the PubSub Subscription "Write to BigQuery" feature but am running into an issue with the "REPEATED" column type. The message I get when updating the subscription is:
Incompatible schema mode for field 'Values': field is REQUIRED in the topic schema, but REPEATED in the BigQuery table schema
My Avro Schema is:
{
  "type": "record",
  "name": "Avro",
  "fields": [
    {
      "name": "ItemID",
      "type": "string"
    },
    {
      "name": "UserType",
      "type": "string"
    },
    {
      "name": "Values",
      "type": [
        {
          "type": "record",
          "name": "Values",
          "fields": [
            {
              "name": "AttributeID",
              "type": "string"
            },
            {
              "name": "AttributeValue",
              "type": "string"
            }
          ]
        }
      ]
    }
  ]
}
Input JSON That "Matches" Schema:
{
  "ItemID": "Item_1234",
  "UserType": "Item",
  "Values": {
    "AttributeID": "TEST_ID_1",
    "AttributeValue": "Value_1"
  }
}
My table looks like:
ItemID | STRING | NULLABLE
UserType | STRING | NULLABLE
Values | RECORD | REPEATED
AttributeID | STRING | NULLABLE
AttributeValue | STRING | NULLABLE
I am able to "Test" and "Validate Schema" and it comes back with a success. The question is: what am I missing in the Avro schema for the Values node to make it "REPEATED" instead of "REQUIRED" so the subscription can be created?
The issue is that Values is not an array type in your Avro schema, meaning it expects only one in the message, while it is a repeated type in your BigQuery schema, meaning it expects a list of them.
Per Kamal's comment above, this schema works:
{
  "type": "record",
  "name": "Avro",
  "fields": [
    {
      "name": "ItemID",
      "type": "string"
    },
    {
      "name": "UserType",
      "type": "string"
    },
    {
      "name": "Values",
      "type": {
        "type": "array",
        "items": {
          "name": "NameDetails",
          "type": "record",
          "fields": [
            {
              "name": "AttributeID",
              "type": "string"
            },
            {
              "name": "AttributeValue",
              "type": "string"
            }
          ]
        }
      }
    }
  ]
}
and the payload:
{
  "ItemID": "Item_1234",
  "UserType": "Item",
  "Values": [
    { "AttributeID": "TEST_ID_1", "AttributeValue": "Value_1" }
  ]
}
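For a quick end-to-end check you can publish a matching message from Python. A minimal sketch, assuming the google-cloud-pubsub client and that the topic's schema uses JSON message encoding; the project and topic names are placeholders:

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project/topic names -- substitute your own
topic_path = publisher.topic_path("my-project", "my-topic")

payload = {
    "ItemID": "Item_1234",
    "UserType": "Item",
    "Values": [
        {"AttributeID": "TEST_ID_1", "AttributeValue": "Value_1"},
    ],
}

# With JSON encoding, the topic validates these bytes against the Avro schema
future = publisher.publish(topic_path, data=json.dumps(payload).encode("utf-8"))
print(future.result())  # message ID on success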

Kafka & Avro producer - Schema being registered is incompatible with an earlier schema for subject

I'm running the schema-registry-confluent example locally, and I got an error when I modified the schema of the message.
This is my schema:
{
  "type": "record",
  "namespace": "io.confluent.tutorial.pojo.avro",
  "name": "OrderDetail",
  "fields": [
    {
      "name": "number",
      "type": "long",
      "doc": "The order number."
    },
    {
      "name": "date",
      "type": "long",
      "logicalType": "date",
      "doc": "The date the order was submitted."
    },
    {
      "name": "client",
      "type": {
        "type": "record",
        "name": "Client",
        "fields": [
          { "name": "code", "type": "string" }
        ]
      }
    }
  ]
}
I tried to send this message with the producer:
{"number": 2343434, "date": 1596490462, "client": {"code": "1234"}}
But I got this error:
org.apache.kafka.common.errors.InvalidConfigurationException: Schema being registered is incompatible with an earlier schema for subject "example-topic-avro-value"; error code: 409
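The 409 means the registry rejected the new schema under the subject's compatibility rules (under the default BACKWARD mode, for example, a new field without a default breaks older readers). One way to diagnose it is to ask Schema Registry directly before producing. A minimal sketch, assuming a registry at localhost:8081 and the Python requests library; the endpoint is the standard compatibility-check API:

import json
import requests

SUBJECT = "example-topic-avro-value"
REGISTRY = "http://localhost:8081"  # assumed local Schema Registry

new_schema = {
    "type": "record",
    "namespace": "io.confluent.tutorial.pojo.avro",
    "name": "OrderDetail",
    "fields": [
        {"name": "number", "type": "long"},
        {"name": "date", "type": "long"},  # logicalType omitted for brevity
        {
            "name": "client",
            "type": {
                "type": "record",
                "name": "Client",
                "fields": [{"name": "code", "type": "string"}],
            },
        },
    ],
}

# POST /compatibility/subjects/{subject}/versions/latest reports whether the
# candidate schema is compatible with the latest registered version
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(new_schema)}),
)
print(resp.json())  # e.g. {"is_compatible": false}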

Confluent Kafka producer message format for nested records

I have an AVRO schema registered in a Kafka topic and am trying to send data to it. The schema has nested records and I'm not sure how to correctly send data to it using the confluent_kafka Python client.
Example schema (ignore any typos in the schema; the real one is very large, this is just an example):
{
  "namespace": "company__name",
  "name": "our_data",
  "type": "record",
  "fields": [
    {
      "name": "datatype1",
      "type": ["null", {
        "type": "record",
        "name": "datatype1_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "datatype2",
      "type": ["null", {
        "type": "record",
        "name": "datatype2_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    }
  ]
}
I am trying to send data to this schema using the confluent_kafka Python client. When I have done this before, the records were not nested and I would use a typical dictionary of key: value pairs and serialize it. How can I send nested data that matches this schema?
What I tried so far...
message = {
    'datatype1': {
        'site': 'sitename',
        'units': 'm'
    }
}
This version does not cause any Kafka errors, but all of the columns show up as null,
and...
message = {
    'datatype1': {
        'datatype1_1': {
            'site': 'sitename',
            'units': 'm'
        }
    }
}
This version produced a Kafka error about the schema.
If you use namespaces, you don't have to worry about naming collisions and you can properly structure your optional records:
for example, both
{
  "meta": {
    "instanceID": "something"
  }
}
And
{}
are valid instances of:
{
  "doc": "Survey",
  "name": "Survey",
  "type": "record",
  "fields": [
    {
      "name": "meta",
      "type": [
        "null",
        {
          "name": "meta",
          "type": "record",
          "fields": [
            {
              "name": "instanceID",
              "type": [
                "null",
                "string"
              ],
              "namespace": "Survey.meta"
            }
          ],
          "namespace": "Survey"
        }
      ],
      "namespace": "Survey"
    }
  ]
}
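For the producing side, a minimal sketch of sending the optional nested record above, assuming a local broker and Schema Registry and the confluent_kafka Python client. Note that with the fastavro-based AvroSerializer, a ["null", record] union value is passed as the plain nested dict, not wrapped under the record's name; the topic name and URLs here are placeholders:

from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = "..."  # the Survey schema shown above, as a JSON string

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(schema_registry, schema_str)
producer = Producer({"bootstrap.servers": "localhost:9092"})

# For the ["null", record] union, pass the nested record as a plain dict
value = {"meta": {"instanceID": "something"}}

producer.produce(
    topic="survey",
    value=serializer(value, SerializationContext("survey", MessageField.VALUE)),
)
producer.flush()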

Invalid Schema on Confluent Controlcenter

I am just trying to set up a value schema for a topic in the web interface of Confluent Control Center.
I chose the Avro format and tried the following schema:
{
  "fields": [
    {
      "name": "date",
      "type": "dates",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": "timestamp-millis",
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
But the interface keeps on saying the input schema is invalid.
I have no idea why.
Any help is appreciated!
There are issues with the datatypes:
"type": "dates" => "type": "string"
"type": "timestamp-millis" => "type": {"type": "long", "logicalType": "timestamp-millis"}
The updated schema will look like this:
{
  "fields": [
    {
      "name": "date",
      "type": "string",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      },
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
Sample Payload:
{
  "date": "2020-07-10",
  "time": 12345678900,
  "count": 1473217
}
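To check the payload against the fixed schema outside of Control Center, a minimal sketch, assuming the fastavro Python library:

from fastavro import parse_schema
from fastavro.validation import validate

schema = {
    "type": "record",
    "name": "articleCount",
    "fields": [
        {"name": "date", "type": "string", "doc": "Date of the count"},
        {
            "name": "time",
            "type": {"type": "long", "logicalType": "timestamp-millis"},
            "doc": "date in ms",
        },
        {"name": "count", "type": "int", "doc": "Number of Articles"},
    ],
}

record = {"date": "2020-07-10", "time": 12345678900, "count": 1473217}

# validate() raises ValidationError when the record does not match the schema
validate(record, parse_schema(schema))
print("payload matches schema")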
More details on Avro datatypes can be found here:
https://docs.oracle.com/database/nosql-12.1.3.0/GettingStartedGuide/avroschemas.html

Camel Json Validation throws NoJsonBodyValidationException

I'm trying to perform header validation for an incoming GET request. I referred to the Camel JSON schema validator component and followed the steps below to implement it in my project:
Adding the camel-json-validator-starter dependency in build.gradle
Adding Employee.json (YAML converted to JSON) to the resources folder of my Spring Boot project. Initially I had an OpenAPI 3.0 YAML specification file, which I converted to JSON
Invoking validation with the code below:
rest("/employee").id("get-employee")
    .produces(JSON_MEDIA_TYPE)
    .get()
    .description("The employee API")
    .outType(EmployeeResponse.class)
    .responseMessage()
        .code(HttpStatus.OK.toString())
        .message("Get Employee")
    .endResponseMessage()
    .route()
    .to("json-validator:openapi.json")
    .to("bean:employeeService?method=getEmployee()");
Running the project throws org.apache.camel.component.jsonvalidator.NoJsonBodyValidationException. I'm using a GET request, so why is it expecting a request body? I just want to validate the headers and request params of the incoming request. I'm not sure if my approach is right or what I'm missing.
I ran into this problem last year when adopting OpenAPI and came to the conclusion that it was too much work. I could not get full validation from the JSON validator using OpenAPI because there were some differences between the way OpenAPI declares schema definitions and full JSON Schema definitions.
Looking at the documentation of the JSON validation component you find this:
The JSON Schema Validator component performs bean validation of the message body against JSON Schemas v4 draft using the NetworkNT JSON Schema library (https://github.com/networknt/json-schema-validator). This is a full standalone JSON Schema validator, and if you read its GitHub pages you find this:
OpenAPI Support
The OpenAPI 3.0 specification is using JSON schema to validate the request/response, but there are some differences. With a configuration file, you can enable the library to work with OpenAPI 3.0 validation.
OpenAPI schema appears to be a subset of the real JSON Schema.
Before I show you a more detailed example, look at the example given in the Camel documentation here: https://camel.apache.org/components/latest/json-validator-component.html. Compare that JSON schema file with the OpenAPI schema definitions and you will see they are not the same.
A useful tool here is https://jsonschema.net: you can paste your JSON example there and infer a schema. I used this tool with the OpenAPI Pet Store example below.
OpenAPI Petstore Pet Object Example:
{
  "id": 0,
  "category": {
    "id": 0,
    "name": "string"
  },
  "name": "doggie",
  "photoUrls": [
    "string"
  ],
  "tags": [
    {
      "id": 0,
      "name": "string"
    }
  ],
  "status": "available"
}
The openAPI specification saved in JSON produces this definition:
"Pet": {
"type": "object",
"required": [
"name",
"photoUrls"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"category": {
"$ref": "#/definitions/Category"
},
"name": {
"type": "string",
"example": "doggie"
},
"photoUrls": {
"type": "array",
"xml": {
"name": "photoUrl",
"wrapped": true
},
"items": {
"type": "string"
}
},
"tags": {
"type": "array",
"xml": {
"name": "tag",
"wrapped": true
},
"items": {
"$ref": "#/definitions/Tag"
}
},
"status": {
"type": "string",
"description": "pet status in the store",
"enum": [
"available",
"pending",
"sold"
]
}
},
"xml": {
"name": "Pet"
}
}
When I convert this to proper JSON Schema syntax, the schema looks like this:
{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/root.json",
  "type": "object",
  "title": "The Root Schema",
  "required": [
    "id",
    "category",
    "name",
    "photoUrls",
    "tags",
    "status"
  ],
  "properties": {
    "id": {
      "$id": "#/properties/id",
      "type": "integer",
      "title": "The Id Schema",
      "default": 0,
      "examples": [
        0
      ]
    },
    "category": {
      "$id": "#/properties/category",
      "type": "object",
      "title": "The Category Schema",
      "required": [
        "id",
        "name"
      ],
      "properties": {
        "id": {
          "$id": "#/properties/category/properties/id",
          "type": "integer",
          "title": "The Id Schema",
          "default": 0,
          "examples": [
            0
          ]
        },
        "name": {
          "$id": "#/properties/category/properties/name",
          "type": "string",
          "title": "The Name Schema",
          "default": "",
          "examples": [
            "string"
          ],
          "pattern": "^(.*)$"
        }
      }
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "title": "The Name Schema",
      "default": "",
      "examples": [
        "doggie"
      ],
      "pattern": "^(.*)$"
    },
    "photoUrls": {
      "$id": "#/properties/photoUrls",
      "type": "array",
      "title": "The Photourls Schema",
      "items": {
        "$id": "#/properties/photoUrls/items",
        "type": "string",
        "title": "The Items Schema",
        "default": "",
        "examples": [
          "string"
        ],
        "pattern": "^(.*)$"
      }
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "title": "The Tags Schema",
      "items": {
        "$id": "#/properties/tags/items",
        "type": "object",
        "title": "The Items Schema",
        "required": [
          "id",
          "name"
        ],
        "properties": {
          "id": {
            "$id": "#/properties/tags/items/properties/id",
            "type": "integer",
            "title": "The Id Schema",
            "default": 0,
            "examples": [
              0
            ]
          },
          "name": {
            "$id": "#/properties/tags/items/properties/name",
            "type": "string",
            "title": "The Name Schema",
            "default": "",
            "examples": [
              "string"
            ],
            "pattern": "^(.*)$"
          }
        }
      }
    },
    "status": {
      "$id": "#/properties/status",
      "type": "string",
      "title": "The Status Schema",
      "default": "",
      "examples": [
        "available"
      ],
      "pattern": "^(.*)$"
    }
  }
}
There are some differences between OpenAPI's schema definition and the JSON Schema definition.
failOnNullBody (producer) - Whether to fail if no body exists. Default is true.
Try setting that option in your call:
.to("json-validator:openapi.json?failOnNullBody=false")