Invalid Schema on Confluent Control Center - apache-kafka

I am just trying to set up a value schema for a topic in the web interface of Confluent Control Center.
I chose the Avro format and tried the following schema:
{
  "fields": [
    {
      "name": "date",
      "type": "dates",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": "timestamp-millis",
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
But the interface keeps on saying the input schema is invalid.
I have no idea why.
Any help is appreciated!

There are issues with the data types:
"type": "dates" => "type": "string" ("dates" is not a valid Avro type)
"type": "timestamp-millis" => "type": {"type": "long", "logicalType": "timestamp-millis"} (timestamp-millis is a logical type that must annotate a long)
The updated schema will look like this:
{
  "fields": [
    {
      "name": "date",
      "type": "string",
      "doc": "Date of the count"
    },
    {
      "name": "time",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      },
      "doc": "date in ms"
    },
    {
      "name": "count",
      "type": "int",
      "doc": "Number of Articles"
    }
  ],
  "name": "articleCount",
  "type": "record"
}
Sample Payload:
{
  "date": "2020-07-10",
  "time": 12345678900,
  "count": 1473217
}
More reference material on Avro data types can be found here:
https://docs.oracle.com/database/nosql-12.1.3.0/GettingStartedGuide/avroschemas.html
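To sanity-check the corrected schema outside Control Center, here is a minimal sketch using the Avro Java library (the class name and sample values are just for illustration):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class ArticleCountSchemaCheck {
    public static void main(String[] args) {
        String schemaJson =
              "{\"type\":\"record\",\"name\":\"articleCount\",\"fields\":["
            + "{\"name\":\"date\",\"type\":\"string\",\"doc\":\"Date of the count\"},"
            + "{\"name\":\"time\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"},\"doc\":\"date in ms\"},"
            + "{\"name\":\"count\",\"type\":\"int\",\"doc\":\"Number of Articles\"}]}";

        // Schema.Parser throws SchemaParseException for an invalid schema,
        // which gives a quick local check before pasting into Control Center.
        Schema schema = new Schema.Parser().parse(schemaJson);

        // Build a record matching the sample payload above.
        GenericRecord record = new GenericData.Record(schema);
        record.put("date", "2020-07-10");
        record.put("time", 12345678900L);
        record.put("count", 1473217);
        System.out.println(record);
    }
}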

Related

Kafka & Avro producer - Schema being registered is incompatible with an earlier schema for subject

I'm running the schema-registry-confluent example locally, and I got an error when I modified the schema of the message:
This is my schema:
{
"type": "record",
"namespace": "io.confluent.tutorial.pojo.avro",
"name": "OrderDetail",
"fields": [
{
"name": "number",
"type": "long",
"doc": "The order number."
},
{
"name": "date",
"type": "long",
"logicalType": "date",
"doc": "The date the order was submitted."
},
{
"name": "client",
"type": {
"type": "record",
"name": "Client",
"fields": [
{ "name": "code", "type": "string" }
]
}
}
]
}
And I tried to send this message on the producer:
{"number": 2343434, "date": 1596490462, "client": {"code": "1234"}}
But I got this error:
org.apache.kafka.common.errors.InvalidConfigurationException: Schema being registered is incompatible with an earlier schema for subject "example-topic-avro-value"; error code: 409

org.apache.avro.AvroTypeException: Unknown union branch EventId

I am trying to convert JSON to Avro using 'kafka-avro-console-producer' and publish it to a Kafka topic.
I am able to do that for flat JSON/schemas, but for the schema and JSON given below I am getting the
"org.apache.avro.AvroTypeException: Unknown union branch EventId" error.
Any help would be appreciated.
Schema:
{
"type": "record",
"name": "Envelope",
"namespace": "CoreOLTPEvents.dbo.Event",
"fields": [{
"name": "before",
"type": ["null", {
"type": "record",
"name": "Value",
"fields": [{
"name": "EventId",
"type": "long"
}, {
"name": "CameraId",
"type": ["null", "long"],
"default": null
}, {
"name": "SiteId",
"type": ["null", "long"],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Value"
}],
"default": null
}, {
"name": "after",
"type": ["null", "Value"],
"default": null
}, {
"name": "op",
"type": "string"
}, {
"name": "ts_ms",
"type": ["null", "long"],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Envelope"
}
And the JSON input is like below:
{
"before": null,
"after": {
"EventId": 12,
"CameraId": 10,
"SiteId": 11974
},
"op": "C",
"ts_ms": null
}
And in my case I can't alter the schema; I can only alter the JSON in such a way that it works.
If you are using the Avro JSON format, the input you have is slightly off. For unions, non-null values need to be wrapped in an object that names the selected branch type: https://avro.apache.org/docs/current/spec.html#json_encoding
See below for an example which I think should work.
{
  "before": null,
  "after": {
    "CoreOLTPEvents.dbo.Event.Value": {
      "EventId": 12,
      "CameraId": {
        "long": 10
      },
      "SiteId": {
        "long": 11974
      }
    }
  },
  "op": "C",
  "ts_ms": null
}
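This is essentially what the Avro JSON decoder (used by kafka-avro-console-producer) expects. Below is a minimal sketch that decodes the corrected input against the schema, assuming the Envelope schema above is saved locally as envelope.avsc (the file name is just for illustration):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class UnionJsonDecodeCheck {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(new File("envelope.avsc"));

        // Non-null union branches are wrapped with the branch's type name,
        // exactly as in the JSON above.
        String json = "{\"before\": null,"
            + " \"after\": {\"CoreOLTPEvents.dbo.Event.Value\": {"
            + "\"EventId\": 12, \"CameraId\": {\"long\": 10}, \"SiteId\": {\"long\": 11974}}},"
            + " \"op\": \"C\", \"ts_ms\": null}";

        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        GenericRecord record = reader.read(null, DecoderFactory.get().jsonDecoder(schema, json));
        System.out.println(record); // decodes without the "Unknown union branch" error
    }
}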
Remove "connect.name": "CoreOLTPEvents.dbo.Event.Value" and "connect.name": "CoreOLTPEvents.dbo.Event.Envelope", as a record type can only contain the {'namespace', 'aliases', 'fields', 'name', 'type', 'doc'} keys.
Could you try the schema below and see if you are able to produce the message?
{
"type": "record",
"name": "Envelope",
"namespace": "CoreOLTPEvents.dbo.Event",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "Value",
"fields": [
{
"name": "EventId",
"type": "long"
},
{
"name": "CameraId",
"type": [
"null",
"long"
],
"default": "null"
},
{
"name": "SiteId",
"type": [
"null",
"long"
],
"default": "null"
}
]
}
],
"default": null
},
{
"name": "after",
"type": [
"null",
"Value"
],
"default": null
},
{
"name": "op",
"type": "string"
},
{
"name": "ts_ms",
"type": [
"null",
"long"
],
"default": null
}
]
}

Deserialize Avro message with schema type as object

I'm reading from a Kafka topic which contains Avro messages. I have the schema given below.
I'm unable to parse the schema and deserialize the message.
{
"type": "record",
"name": "Request",
"namespace": "com.sk.avro.model",
"fields": [
{
"name": "empId",
"type": [
"null",
"string"
],
"default": null,
"description": "REQUIRED "
},
{
"name": "carUnit",
"type": "com.example.CarUnit",
"default": "ABC",
"description": "Car id"
}
]
}
I'm getting the error below:
The type of the "carUnit" field must be a defined name or a {"type": ...} expression
Can anyone please help out?
How about defining the nested record inline:
{
  "name": "businessUnit",
  "type": {
    "type": "record",
    "name": "BusinessUnit",
    "fields": [
      { "name": "hostName", "type": "string" }
    ]
  }
}
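Applied to the schema in the question, the idea is the same: define CarUnit inline as a record instead of referring to "com.example.CarUnit" before it has been defined. This is a sketch only; the question doesn't show CarUnit's fields, so hostName is purely a placeholder:
{
  "name": "carUnit",
  "type": {
    "type": "record",
    "name": "CarUnit",
    "namespace": "com.example",
    "fields": [
      { "name": "hostName", "type": "string" }
    ]
  }
}
Note also that a default for a record-typed field must itself be a JSON object matching the record, so "default": "ABC" would be rejected as well.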

Camel Json Validation throws NoJsonBodyValidationException

I'm trying to perform header validation for an incoming GET request. I referred to the Camel JSON schema validator component and followed the steps below to implement it in my project:
Adding the camel-json-validator-starter dependency in build.gradle
Adding Employee.json (YAML converted to JSON) to the resources folder of my Spring Boot project. Initially I had an OpenAPI 3.0 YAML specification file and converted it to JSON
Invoking validation with the code below
rest("/employee").id("get-employee")
.produces(JSON_MEDIA_TYPE)
.get()
.description("The employee API")
.outType(EmployeeResponse.class)
.responseMessage()
.code(HttpStatus.OK.toString())
.message("Get Employee")
.endResponseMessage()
.route()
.to("json-validator:openapi.json")
.to("bean:employeeService?method=getEmployee()");
Running the project throws an org.apache.camel.component.jsonvalidator.NoJsonBodyValidationException. I'm using a GET request, so why is it expecting a request body? I just wanted to validate the headers and request params from the incoming request. I'm not sure whether my approach is right or what I'm missing.
I ran into this problem last year when adopting OpenAPI and came to the conclusion that it was too much work. I could not get FULL validation from the JSON validator using OpenAPI because there were some differences between the way OpenAPI declares schema definitions and full JSON Schema definitions.
Looking at the documentation of the JSON validation component, you find this:
The JSON Schema Validator component performs bean validation of the message body against JSON Schemas v4 draft using the NetworkNT JSON Schema library (https://github.com/networknt/json-schema-validator).
This is a full standalone JSON Schema validator, and if you read its GitHub pages you find this:
OpenAPI Support
The OpenAPI 3.0 specification is using JSON schema to validate the request/response, but there are some differences. With a configuration file, you can enable the library to work with OpenAPI 3.0 validation.
OpenAPI schema appears to be a subset of the real JSON Schema.
Before I show you a more detailed example, look at the example given in the Camel documentation here: https://camel.apache.org/components/latest/json-validator-component.html. Compare that JSON schema file with the OpenAPI schema definitions and you will see they are not the same.
A useful tool here is https://jsonschema.net: you can paste your JSON example there and infer a schema. I used this tool and the OpenAPI Pet Store example in the example below.
OpenAPI Petstore Pet Object Example:
{
"id": 0,
"category": {
"id": 0,
"name": "string"
},
"name": "doggie",
"photoUrls": [
"string"
],
"tags": [
{
"id": 0,
"name": "string"
}
],
"status": "available"
}
The OpenAPI specification saved as JSON produces this definition:
"Pet": {
"type": "object",
"required": [
"name",
"photoUrls"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"category": {
"$ref": "#/definitions/Category"
},
"name": {
"type": "string",
"example": "doggie"
},
"photoUrls": {
"type": "array",
"xml": {
"name": "photoUrl",
"wrapped": true
},
"items": {
"type": "string"
}
},
"tags": {
"type": "array",
"xml": {
"name": "tag",
"wrapped": true
},
"items": {
"$ref": "#/definitions/Tag"
}
},
"status": {
"type": "string",
"description": "pet status in the store",
"enum": [
"available",
"pending",
"sold"
]
}
},
"xml": {
"name": "Pet"
}
}
When I convert this to proper JSON Schema syntax, the schema looks like this:
{
"definitions": {},
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://example.com/root.json",
"type": "object",
"title": "The Root Schema",
"required": [
"id",
"category",
"name",
"photoUrls",
"tags",
"status"
],
"properties": {
"id": {
"$id": "#/properties/id",
"type": "integer",
"title": "The Id Schema",
"default": 0,
"examples": [
0
]
},
"category": {
"$id": "#/properties/category",
"type": "object",
"title": "The Category Schema",
"required": [
"id",
"name"
],
"properties": {
"id": {
"$id": "#/properties/category/properties/id",
"type": "integer",
"title": "The Id Schema",
"default": 0,
"examples": [
0
]
},
"name": {
"$id": "#/properties/category/properties/name",
"type": "string",
"title": "The Name Schema",
"default": "",
"examples": [
"string"
],
"pattern": "^(.*)$"
}
}
},
"name": {
"$id": "#/properties/name",
"type": "string",
"title": "The Name Schema",
"default": "",
"examples": [
"doggie"
],
"pattern": "^(.*)$"
},
"photoUrls": {
"$id": "#/properties/photoUrls",
"type": "array",
"title": "The Photourls Schema",
"items": {
"$id": "#/properties/photoUrls/items",
"type": "string",
"title": "The Items Schema",
"default": "",
"examples": [
"string"
],
"pattern": "^(.*)$"
}
},
"tags": {
"$id": "#/properties/tags",
"type": "array",
"title": "The Tags Schema",
"items": {
"$id": "#/properties/tags/items",
"type": "object",
"title": "The Items Schema",
"required": [
"id",
"name"
],
"properties": {
"id": {
"$id": "#/properties/tags/items/properties/id",
"type": "integer",
"title": "The Id Schema",
"default": 0,
"examples": [
0
]
},
"name": {
"$id": "#/properties/tags/items/properties/name",
"type": "string",
"title": "The Name Schema",
"default": "",
"examples": [
"string"
],
"pattern": "^(.*)$"
}
}
}
},
"status": {
"$id": "#/properties/status",
"type": "string",
"title": "The Status Schema",
"default": "",
"examples": [
"available"
],
"pattern": "^(.*)$"
}
}
}
There are some differences between OpenAPI's schema definitions and JSON Schema definitions.
failOnNullBody (producer) - Whether to fail if no body exists.
Default is true
Try setting the option in your call:
.to("json-validator:openapi.json?failOnNullBody=false")

How do I COPY a nested Avro field to Redshift as a single field?

I have the following Avro schema for a record, and I'd like to issue a COPY to Redshift:
"fields": [{
"name": "id",
"type": "long"
}, {
"name": "date",
"type": {
"type": "record",
"name": "MyDateTime",
"namespace": "com.mynamespace",
"fields": [{
"name": "year",
"type": "int"
}, {
"name": "monthOfYear",
"type": "int"
}, {
"name": "dayOfMonth",
"type": "int"
}, {
"name": "hourOfDay",
"type": "int"
}, {
"name": "minuteOfHour",
"type": "int"
}, {
"name": "secondOfMinute",
"type": "int"
}, {
"name": "millisOfSecond",
"type": ["int", "null"],
"default": 0
}, {
"name": "zone",
"type": {
"type": "string",
"avro.java.string": "String"
},
"default": "America/New_York"
}],
"noregistry": []
}
}]
I want to condense the MyDateTime object into a single column in Redshift during the COPY. I saw that you can map nested JSON data to a top-level column: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json-jsonpaths, but I haven't figured out a way to concatenate the fields directly in the COPY command.
In other words, is there a way to convert the following record (originally in Avro format)
{
"id": 6,
"date": {
"year": 2010,
"monthOfYear": 10,
"dayOfMonth": 12,
"hourOfDay": 14,
"minuteOfHour": 26,
"secondOfMinute": 42,
"millisOfSecond": {
"int": 0
},
"zone": "America/New_York"
}
}
Into a row in Redshift that looks like:
id | date
---------------------------------------------
6 | 2010-10-12 14:26:42:000 America/New_York
I'd like to do this directly with COPY
You would need to declare the Avro file(s) as a Redshift Spectrum external table and then use a query over that to insert the data into the local Redshift table.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html