Avro invalid default for union field - scala

I'm trying to serialise and then write to the hortonworks schema registry an avro schema but I'm getting the following error message during the write operation.
Caused by: java.lang.RuntimeException: An exception was thrown while processing request with message: [Invalid default for field viewingMode: null not a [{"type":"record","name":"aName","namespace":"domain.assembled","fields":[{"name":"aKey","type":"string"}]},{"type":"record","name":"anotherName","namespace":"domain.assembled","fields":[{"name":"anotherKey","type":"string"},{"name":"yetAnotherKey","type":"string"}]}]]
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.handleSchemaIdVersionResponse(SchemaRegistryClient.java:678)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.doAddSchemaVersion(SchemaRegistryClient.java:664)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.lambda$addSchemaVersion$1(SchemaRegistryClient.java:591)
This is the avro schema
{
"type": "record",
"name": "aSchema",
"namespace": "domain.assembled",
"fields": [
{
"name": "viewingMode",
"type": [
{
"name": "aName",
"type": "record",
"fields": [
{"name": "aKey","type": "string"}
]
},
{
"name": "anotherName",
"type": "record",
"fields": [
{"name": "anotherKey","type": "string"},
{"name": "yetAnotherKey","type": "string"}
]
}
]
}
]
}
Whoever if I add a "null" as the first type of the union this the succeeds. Do avro union types require a "null"? In my case this would be an incorrect representation of data so I'm not keen on doing it.
If it makes any difference I'm using avro 1.9.1.
Also, apologies if the tags are incorrect but couldn't find a hortonworks-schema-registry tag and don't have enough rep to create a new one.

Turns out if was an issue with hortonwork's schema registry.
This has actually already been fixed here and I've requested a new release here. Hopefully this happens soon.

Related

Avro: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT

I am working on an Avro schema and trying to create a testing data to test it with Kafa, but when I produce the message got this error: "Caused by: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT"
The Schema I created is like this:
{
"name": "MyClass",
"type": "record",
"namespace": "com.acme.avro",
"doc":"This schema is for streaming information",
"fields":[
{"name":"batchId", "type": "long"},
{"name":"status", "type": {"type": "enum", "name": "PlannedTripRequestedStatus", "namespace":"com.acme.avro.Dtos", "symbols":["COMPLETED", "FAILED"]}},
{"name":"runRefId", "type": "int"},
{"name":"tripId", "type": ["null", "int"]},
{"name": "referenceNumber", "type": ["null", "string"]},
{"name":"errorMessage", "type": ["null", "string"]}
]
}
The testing data is like this:
{
"batchId": {
"long": 3
},
"status": "COMPLETED",
"runRefId": {
"int": 1000
},
"tripId": {
"int": 200
},
"referenceNumber": {
"string": "ReferenceNumber1111"
},
"errorMessage": {
"string": "Hello World"
}
}
However, when I registered this schema and try to produce a message with Confluent console tool, I got the error: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT The whole error message is like this:
org.apache.kafka.common.errors.SerializationException: Error deserializing {"batchId": ...} to Avro of schema {"type":...}" at io.confluent.kafka.formatter.AvroMessageReader.readFrom(AvroMessageReader.java:134)
at io.confluent.kafka.formatter.SchemaMessageReader.readMessage(SchemaMessageReader.java:325)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:51)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
Caused by: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:511)
at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:177)
at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:169)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:197)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at io.confluent.kafka.schemaregistry.avro.AvroSchemaUtils.toObject(AvroSchemaUtils.java:213)
at io.confluent.kafka.formatter.AvroMessageReader.readFrom(AvroMessageReader.java:124)
Does any know what I did wrong with my schema or test data? Thank you so much!
You only need the type object if the type is unclear (union of string or number, for example), or its nullable.
For batchId and runRefId, just use simple values

Avro union field with "local-timestamp-millis" deserialization issue

Avro: 10.1
Dataflow (Apache Beam): 2.28.0
Runner: org.apache.beam.runners.dataflow.DataflowRunner
Avro schema piece:
{
"name": "client_timestamp",
"type": [
"null",
{ "type": "long", "logicalType": "local-timestamp-millis" }
],
"default": null,
"doc": "Client side timestamp of this xxx"
},
An exception when writing Avro output file:
Caused by: org.apache.avro.UnresolvedUnionException:
Not in union ["null",{"type":"long","logicalType":"local-timestamp-millis"}]:
2021-03-12T12:21:17.599
Link to a longer stacktrace
Some of the steps taken:
Replacing "logicalType":"local-timestamp-millis" with "logicalType":"timestamp-millis" causes the same error.
Writing Avro locally also works.
Removing "type": "null" option eliminates the exception
Try something like following:
{
"name": "client_timestamp",
"type": ["null", "long"],
"doc": "Client side timestamp of this xxx",
"default": null,
"logicalType": "timestamp-millis"
}
It is a way to define logicalType in avro schema (link).

Problem producing Avro serialized object through kafka-avro-console-producer

I'm producing a message by using the kafka-avro-console-producer binary by doing:
kafka-avro-console-producer --broker-list broker:9092 --topic example-topic --property schema.registry.url='http://schema-registry:8081 --property value.schema='{"type": "record","name": "test","fields": [{"name": "before", "type": ["null", {"type": "record", "name": "columns", "fields":[{"name": "name", "type": "string"}]}],"default": "null"},{"name": "after", "type": ["null", "columns"],"default": "null"}]}'
{"before": null,"after": {"name": "John"}}'
sending the following message:
{"before": null,"after": {"name": "John"}}
and by appling the following Avro schema:
{
"type": "record",
"name": "test",
"fields": [{
"name": "before",
"type": ["null", {
"type": "record",
"name": "columns",
"fields": [{
"name": "name",
"type": "string"
}]
}],
"default": "null"
}, {
"name": "after",
"type": ["null", "columns"],
"default": "null"
}]
}
the error I'm getting in return is the following:
Caused by: org.apache.avro.AvroTypeException: Unknown union branch name
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
at io.confluent.kafka.formatter.AvroMessageReader.jsonToAvro(AvroMessageReader.java:213)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:180)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
For those of you that are willing to go deeper into the rabbit hole, I'm making an integration between Oracle Golden Gate and Apache Kafka by using the Oracle Golden Gate Big Data connector. I'm currently experiencing problems with an equivalent model of the one described in here:
https://www.ateam-oracle.com/oracle-goldengate-big-data-adapter-apache-kafka-producer
When trying to apply the schema described in the above webpage to it's corresponding model(and after completing the JSON model), I'm getting the same error as the one I'm getting with the model and schema in the question.
Thank you all very much.
This is the problem
"type": ["null", "columns"]
You cannot refer back to other record types. You'll need to expand that out like you did for the other field

Debezium might produce invalid schema

I face an issue with Avro and Schema Registry. After Debezium created a schema and a topic, I have downloaded the schema from Schema Registry. I put it into a .asvc file and it looks like this:
{
"type": "record",
"name": "Envelope",
"namespace": "my.namespace",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "MyValue",
"fields": [
{
"name": "id",
"type": "int"
}
]
}
],
"default": null
},
{
"name": "after",
"type": [
"null",
"MyValue"
],
"default": null
}
]
}
I ran two experiments:
I tried to put it back into Schema Registry but I get this error: MyValue is not correct. When I remove "after" record, the schema seems to work well.
I used 'generate-sources' from avro-maven-plugin to generate the Java classes. When I try to consume the topic above, I see this error:
Exception in thread "b2-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. [...]: Error registering Avro schema: [...]
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409
Did anyone faced the same problem? Is it Debezium that is producing an invalid schema or is Schema Registry that is having a bug?
MyValue is not correct.
That's not an Avro type. You would have to embed the actual record within the union, just like the before value
In other words, you're not able to cross reference the record types within a schema, AFAIK
When I try to consume the topic above, I see this error:
A consumer does not register schemas, so it's not clear how you're getting that error unless maybe using Kafka Streams, which produces into intermediate topics

Avro genericdata.Record ignores data types

I have the following avro schema
{ "namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
I use the following snippet to set up a Record
val schema = new Schema.Parser().parse(new File("data/user.avsc"))
val user1 = new GenericData.Record(schema) //strangely this schema only checks for valid fields NOT types.
user1.put("name", "Fred")
user1.put("favorite_number", "Jones")
I would have thought that this would fail to validate against the schema
When I add the line
user1.put("last_name", 100)
It generates a run time error, which is what I would expect in the first case as well.
Exception in thread "main" org.apache.avro.AvroRuntimeException: Not a valid schema field: last_name
at org.apache.avro.generic.GenericData$Record.put(GenericData.java:125)
at csv2avro$.main(csv2avro.scala:40)
at csv2avro.main(csv2avro.scala)
What's going on here?
It won't fail when adding it into the Record, it will fail when it tries to serialize because it is at that point when it is trying to match the type. As far as I'm aware that is the only place it does type checking.