Avro union field with "local-timestamp-millis" deserialization issue - apache-beam

Avro: 1.10.1
Dataflow (Apache Beam): 2.28.0
Runner: org.apache.beam.runners.dataflow.DataflowRunner
Avro schema piece:
{
  "name": "client_timestamp",
  "type": [
    "null",
    { "type": "long", "logicalType": "local-timestamp-millis" }
  ],
  "default": null,
  "doc": "Client side timestamp of this xxx"
},
An exception is thrown when writing the Avro output file:
Caused by: org.apache.avro.UnresolvedUnionException:
Not in union ["null",{"type":"long","logicalType":"local-timestamp-millis"}]:
2021-03-12T12:21:17.599
Link to a longer stacktrace
Some of the steps taken:
Replacing "logicalType":"local-timestamp-millis" with "logicalType":"timestamp-millis" causes the same error.
Writing the Avro file locally works.
Removing the "null" option from the union eliminates the exception.

Try something like the following:
{
  "name": "client_timestamp",
  "type": ["null", "long"],
  "doc": "Client side timestamp of this xxx",
  "default": null,
  "logicalType": "timestamp-millis"
}
This is one way to define a logicalType in an Avro schema (link).
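If changing the schema alone doesn't resolve it, a workaround is to convert the LocalDateTime to epoch milliseconds yourself before setting the field, so the writer sees a plain long that matches a branch of the union. A minimal sketch, assuming a GenericRecord pipeline (the helper name and the UTC offset are illustrative assumptions, not from the original post):

import java.time.LocalDateTime;
import java.time.ZoneOffset;
import org.apache.avro.generic.GenericRecord;

// Hypothetical helper (lives in whatever class builds the record): store the
// timestamp as epoch millis so the value fits the ["null", "long"] union
// instead of failing union resolution on a LocalDateTime object.
static void setClientTimestamp(GenericRecord record, LocalDateTime ts) {
    Long millis = (ts == null) ? null : ts.toInstant(ZoneOffset.UTC).toEpochMilli();
    record.put("client_timestamp", millis);
}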

Related

Avro: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT

I am working on an Avro schema and trying to create test data to test it with Kafka, but when I produce a message I get this error: "Caused by: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT"
The schema I created looks like this:
{
  "name": "MyClass",
  "type": "record",
  "namespace": "com.acme.avro",
  "doc": "This schema is for streaming information",
  "fields": [
    {"name": "batchId", "type": "long"},
    {"name": "status", "type": {"type": "enum", "name": "PlannedTripRequestedStatus", "namespace": "com.acme.avro.Dtos", "symbols": ["COMPLETED", "FAILED"]}},
    {"name": "runRefId", "type": "int"},
    {"name": "tripId", "type": ["null", "int"]},
    {"name": "referenceNumber", "type": ["null", "string"]},
    {"name": "errorMessage", "type": ["null", "string"]}
  ]
}
The test data looks like this:
{
  "batchId": {"long": 3},
  "status": "COMPLETED",
  "runRefId": {"int": 1000},
  "tripId": {"int": 200},
  "referenceNumber": {"string": "ReferenceNumber1111"},
  "errorMessage": {"string": "Hello World"}
}
However, when I register this schema and try to produce a message with the Confluent console tool, I get the error org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT. The whole error message looks like this:
org.apache.kafka.common.errors.SerializationException: Error deserializing {"batchId": ...} to Avro of schema {"type":...}
at io.confluent.kafka.formatter.AvroMessageReader.readFrom(AvroMessageReader.java:134)
at io.confluent.kafka.formatter.SchemaMessageReader.readMessage(SchemaMessageReader.java:325)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:51)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
Caused by: org.apache.avro.AvroTypeException: Expected long. Got START_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:511)
at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:177)
at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:169)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:197)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at io.confluent.kafka.schemaregistry.avro.AvroSchemaUtils.toObject(AvroSchemaUtils.java:213)
at io.confluent.kafka.formatter.AvroMessageReader.readFrom(AvroMessageReader.java:124)
Does anyone know what I did wrong with my schema or test data? Thank you so much!
You only need the branch-name wrapper object if the type is ambiguous (a union of string and number, for example) or the field is nullable.
For batchId and runRefId, just use plain values.
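For example, a payload matching the schema above might look like this (a sketch; only the nullable union fields keep the branch wrapper):

{
  "batchId": 3,
  "status": "COMPLETED",
  "runRefId": 1000,
  "tripId": {"int": 200},
  "referenceNumber": {"string": "ReferenceNumber1111"},
  "errorMessage": {"string": "Hello World"}
}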

Problem producing Avro serialized object through kafka-avro-console-producer

I'm producing a message using the kafka-avro-console-producer binary:
kafka-avro-console-producer --broker-list broker:9092 --topic example-topic --property schema.registry.url='http://schema-registry:8081' --property value.schema='{"type": "record","name": "test","fields": [{"name": "before", "type": ["null", {"type": "record", "name": "columns", "fields":[{"name": "name", "type": "string"}]}],"default": "null"},{"name": "after", "type": ["null", "columns"],"default": "null"}]}'
{"before": null,"after": {"name": "John"}}
sending the following message:
{"before": null,"after": {"name": "John"}}
and applying the following Avro schema:
{
  "type": "record",
  "name": "test",
  "fields": [{
    "name": "before",
    "type": ["null", {
      "type": "record",
      "name": "columns",
      "fields": [{
        "name": "name",
        "type": "string"
      }]
    }],
    "default": "null"
  }, {
    "name": "after",
    "type": ["null", "columns"],
    "default": "null"
  }]
}
The error I'm getting in return is the following:
Caused by: org.apache.avro.AvroTypeException: Unknown union branch name
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
at io.confluent.kafka.formatter.AvroMessageReader.jsonToAvro(AvroMessageReader.java:213)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:180)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
For those of you willing to go deeper into the rabbit hole: I'm integrating Oracle GoldenGate with Apache Kafka using the Oracle GoldenGate Big Data connector. I'm currently experiencing problems with a model equivalent to the one described here:
https://www.ateam-oracle.com/oracle-goldengate-big-data-adapter-apache-kafka-producer
When trying to apply the schema described on that page to its corresponding model (and after completing the JSON model), I'm getting the same error as with the model and schema in the question.
Thank you all very much.
This is the problem:
"type": ["null", "columns"]
You cannot refer back to other record types here. You'll need to expand that out like you did for the other field, as in the sketch below.
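A sketch of that expansion (my illustration, not from the original answer: Avro does not allow two records with the same name in one schema, so the second inline record gets a hypothetical name columns_after; the defaults are also changed to the JSON literal null, since the string "null" is not a valid default for a nullable union):

{
  "type": "record",
  "name": "test",
  "fields": [{
    "name": "before",
    "type": ["null", {
      "type": "record",
      "name": "columns",
      "fields": [{"name": "name", "type": "string"}]
    }],
    "default": null
  }, {
    "name": "after",
    "type": ["null", {
      "type": "record",
      "name": "columns_after",
      "fields": [{"name": "name", "type": "string"}]
    }],
    "default": null
  }]
}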

Avro invalid default for union field

I'm trying to serialise an Avro schema and then write it to the Hortonworks schema registry, but I'm getting the following error message during the write operation.
Caused by: java.lang.RuntimeException: An exception was thrown while processing request with message: [Invalid default for field viewingMode: null not a [{"type":"record","name":"aName","namespace":"domain.assembled","fields":[{"name":"aKey","type":"string"}]},{"type":"record","name":"anotherName","namespace":"domain.assembled","fields":[{"name":"anotherKey","type":"string"},{"name":"yetAnotherKey","type":"string"}]}]]
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.handleSchemaIdVersionResponse(SchemaRegistryClient.java:678)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.doAddSchemaVersion(SchemaRegistryClient.java:664)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.lambda$addSchemaVersion$1(SchemaRegistryClient.java:591)
This is the Avro schema:
{
  "type": "record",
  "name": "aSchema",
  "namespace": "domain.assembled",
  "fields": [
    {
      "name": "viewingMode",
      "type": [
        {
          "name": "aName",
          "type": "record",
          "fields": [
            {"name": "aKey", "type": "string"}
          ]
        },
        {
          "name": "anotherName",
          "type": "record",
          "fields": [
            {"name": "anotherKey", "type": "string"},
            {"name": "yetAnotherKey", "type": "string"}
          ]
        }
      ]
    }
  ]
}
However, if I add "null" as the first type of the union, this succeeds. Do Avro union types require a "null"? In my case that would be an incorrect representation of the data, so I'm not keen on doing it.
If it makes any difference, I'm using Avro 1.9.1.
Also, apologies if the tags are incorrect, but I couldn't find a hortonworks-schema-registry tag and don't have enough rep to create a new one.
Turns out it was an issue with Hortonworks' schema registry.
This has actually already been fixed here, and I've requested a new release here. Hopefully this happens soon.
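As a side note (my addition, not from the original answer): Avro unions do not require "null". A field default is optional, and when one is given for a union field it must match the first branch of the union, so a record-typed first branch takes a record default. A sketch, with an illustrative default value:

{
  "name": "viewingMode",
  "type": [
    {"name": "aName", "type": "record", "fields": [{"name": "aKey", "type": "string"}]},
    {"name": "anotherName", "type": "record", "fields": [{"name": "anotherKey", "type": "string"}, {"name": "yetAnotherKey", "type": "string"}]}
  ],
  "default": {"aKey": "someKey"}
}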

Debezium might produce invalid schema

I'm facing an issue with Avro and Schema Registry. After Debezium created a schema and a topic, I downloaded the schema from Schema Registry. I put it into an .avsc file, and it looks like this:
{
  "type": "record",
  "name": "Envelope",
  "namespace": "my.namespace",
  "fields": [
    {
      "name": "before",
      "type": [
        "null",
        {
          "type": "record",
          "name": "MyValue",
          "fields": [
            {"name": "id", "type": "int"}
          ]
        }
      ],
      "default": null
    },
    {
      "name": "after",
      "type": ["null", "MyValue"],
      "default": null
    }
  ]
}
I ran two experiments:
1. I tried to put it back into Schema Registry, but I got this error: MyValue is not correct. When I remove the "after" record, the schema seems to work well.
2. I used 'generate-sources' from avro-maven-plugin to generate the Java classes. When I try to consume the topic above, I see this error:
Exception in thread "b2-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. [...]: Error registering Avro schema: [...]
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409
Has anyone faced the same problem? Is Debezium producing an invalid schema, or does Schema Registry have a bug?
MyValue is not correct.
That's not an Avro type. You would have to embed the actual record within the union, just like the before value, as sketched below.
In other words, you're not able to cross-reference the record types within a schema, AFAIK.
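A sketch of the embedded form (my illustration; Avro forbids defining two records with the same name in one schema, so the second inline record gets a hypothetical name MyValueAfter):

{
  "name": "after",
  "type": [
    "null",
    {
      "type": "record",
      "name": "MyValueAfter",
      "fields": [
        {"name": "id", "type": "int"}
      ]
    }
  ],
  "default": null
}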
When I try to consume the topic above, I see this error:
A consumer does not register schemas, so it's not clear how you're getting that error, unless you're using Kafka Streams, which produces into intermediate topics.

Confluent Schema Registry timed out error

I'm using an Avro schema to write data to a Kafka topic. Initially, everything worked fine. After adding one new field (scan_app_id) to the Avro file, I'm facing this error.
Avro file:
{
  "type": "record",
  "name": "Initiate_Scan",
  "namespace": "avro",
  "doc": "Avro schema registry for Initiate_Scan",
  "fields": [
    {"name": "app_id", "type": "string", "doc": "3 digit application id"},
    {"name": "app_name", "type": "string", "doc": "application name"},
    {"name": "dev_stage", "type": "string", "doc": "development stage"},
    {"name": "scan_app_id", "type": "string", "doc": "unique scan id for an app in Veracode"},
    {"name": "scan_name", "type": "string", "doc": "scan details"},
    {"name": "seq_num", "type": "int", "doc": "unique number"},
    {"name": "result_flg", "type": "string", "doc": "Y indicates results of scan available", "default": "Y"},
    {"name": "request_id", "type": "int", "doc": "unique id"},
    {"name": "scan_number", "type": "int", "doc": "number of scans"}
  ]
}
Error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"Initiate_Scan","namespace":"avro","doc":"Avro schema registry for Initiate_Scan","fields":[{"name":"app_id","type":{"type":"string","avro.java.string":"String"},"doc":"3 digit application id"},{"name":"app_name","type":{"type":"string","avro.java.string":"String"},"doc":"application name"},{"name":"dev_stage","type":{"type":"string","avro.java.string":"String"},"doc":"development stage"},{"name":"scan_app_id","type":{"type":"string","avro.java.string":"String"},"doc":"unique scan id for an App"},{"name":"scan_name","type":{"type":"string","avro.java.string":"String"},"doc":"scan details"},{"name":"seq_num","type":"int","doc":"unique number"},{"name":"result_flg","type":{"type":"string","avro.java.string":"String"},"doc":"Y indicates results of scan available","default":"Y"},{"name":"request_id","type":"int","doc":"unique id"},{"name":"scan_number","type":"int","doc":"number of scans"}]}
INFO Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1017)
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:292)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:284)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:279)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:61)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:93)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:72)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:54)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:768)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:745)
at com.ssc.svc.svds.initiate.InitiateProducer.initiateScanData(InitiateProducer.java:146)
at com.ssc.svc.svds.initiate.InitiateProducer.topicsData(InitiateProducer.java:41)
at com.ssc.svc.svds.initiate.InputData.main(InputData.java:31)
I went through the Confluent documentation about the 50002 error, which says:
A schema should be compatible with the previously registered schema.
Does this mean I cannot make changes to or update an existing schema?
How do I fix this?
Actually, the link says 50002 -- Operation timed out. If it were indeed incompatible, the response would actually say so.
In any case, if you add a new field, you are required to define a default value, as sketched after the list below.
This way, any consumers defined with a newer schema that are reading older messages know what value to set for that field.
A straightforward list of allowed Avro changes I found is the one by Oracle.
Possible errors are:
A field is added without a default value
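For example, the newly added field could carry a default so that older records remain readable (a sketch; the empty-string default is illustrative, not from the original schema):

{
  "name": "scan_app_id",
  "type": "string",
  "doc": "unique scan id for an app in Veracode",
  "default": ""
}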