Problem producing Avro serialized object through kafka-avro-console-producer - apache-kafka

I'm producing a message by using the kafka-avro-console-producer binary by doing:
kafka-avro-console-producer --broker-list broker:9092 --topic example-topic --property schema.registry.url='http://schema-registry:8081 --property value.schema='{"type": "record","name": "test","fields": [{"name": "before", "type": ["null", {"type": "record", "name": "columns", "fields":[{"name": "name", "type": "string"}]}],"default": "null"},{"name": "after", "type": ["null", "columns"],"default": "null"}]}'
{"before": null,"after": {"name": "John"}}'
sending the following message:
{"before": null,"after": {"name": "John"}}
and by appling the following Avro schema:
{
"type": "record",
"name": "test",
"fields": [{
"name": "before",
"type": ["null", {
"type": "record",
"name": "columns",
"fields": [{
"name": "name",
"type": "string"
}]
}],
"default": "null"
}, {
"name": "after",
"type": ["null", "columns"],
"default": "null"
}]
}
the error I'm getting in return is the following:
Caused by: org.apache.avro.AvroTypeException: Unknown union branch name
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
at io.confluent.kafka.formatter.AvroMessageReader.jsonToAvro(AvroMessageReader.java:213)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:180)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
For those of you that are willing to go deeper into the rabbit hole, I'm making an integration between Oracle Golden Gate and Apache Kafka by using the Oracle Golden Gate Big Data connector. I'm currently experiencing problems with an equivalent model of the one described in here:
https://www.ateam-oracle.com/oracle-goldengate-big-data-adapter-apache-kafka-producer
When trying to apply the schema described in the above webpage to it's corresponding model(and after completing the JSON model), I'm getting the same error as the one I'm getting with the model and schema in the question.
Thank you all very much.

This is the problem
"type": ["null", "columns"]
You cannot refer back to other record types. You'll need to expand that out like you did for the other field

Related

JDBC sink topic with multiple structs to postgres

I am trying to sink a few topics top a postgres database. However the topic schema defines a array at the top level and within it multiple structs. Automapping does not work and I cannot find any reference how to handle this. I need all structs because they are dependent types, the second struct references the first struct as a field.
Currently it breaks when hitting the 2nd struct stating statusChangeEvent (struct) has no mapping to sql column type. This because it is using auto.create to make a table (probably called ProcessStatus) then when hitting the second entry there is no column of course.
[
{
"type": "record",
"name": "processStatus",
"namespace": "company.some.process",
"fields": [
{
"name": "code",
"doc": "The code of the processStatus",
"type": "string"
},
{
"name": "name",
"doc": "The name of the processStatus",
"type": "string"
},
{
"name": "description",
"type": "string"
},
{
"name": "isCompleted",
"type": "boolean"
},
{
"name": "isSuccessfullyCompleted",
"type": "boolean"
}
]
},
{
"type": "record",
"name": "StatusChangeEvent",
"namespace": "company.some.process",
"fields": [
{
"name": "contNumber",
"type": "string"
},
{
"name": "processId",
"type": "string"
},
{
"name": "processVersion",
"type": "int"
},
{
"name": "extProcessId",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "fromStatus",
"type": "process.status"
},
{
"name": "toStatus",
"doc": "The new status of the process",
"type": "company.some.process.processStatus"
},
{
"name": "changeDateTime",
"type": "long",
"logicalType": "timestamp-millis"
},
{
"name": "isPublic",
"type": "boolean"
}
]
}
]
I am not using ksql atm. Which connector settings are suited for this task? If there is a ksql alternative it would be nice to know but the current requirement is to use the JDBC connector.
I tried using flatten but it does not support struct fields that have a schema. Which seems kind of weird. Aren't schema's the whole selling point of connect with kafka? Or is it more of a constraint you have to work around?
Aren't schema's the whole selling point of connect with kafka?
Yes, but Postgres (or the JDBC Sink, in general) doesn't really support nested objects within columns. For that, you're better off with a document database, such as using Mongo Sink Connector.
Which connector settings are suited for this task?
None, really, other than transforms. You could write your own if flatten doesn't work.
You could try pre-defining your table to use JSONB for the two status columns, however, that's more of a workaround.

Avro union field with "local-timestamp-millis" deserialization issue

Avro: 10.1
Dataflow (Apache Beam): 2.28.0
Runner: org.apache.beam.runners.dataflow.DataflowRunner
Avro schema piece:
{
"name": "client_timestamp",
"type": [
"null",
{ "type": "long", "logicalType": "local-timestamp-millis" }
],
"default": null,
"doc": "Client side timestamp of this xxx"
},
An exception when writing Avro output file:
Caused by: org.apache.avro.UnresolvedUnionException:
Not in union ["null",{"type":"long","logicalType":"local-timestamp-millis"}]:
2021-03-12T12:21:17.599
Link to a longer stacktrace
Some of the steps taken:
Replacing "logicalType":"local-timestamp-millis" with "logicalType":"timestamp-millis" causes the same error.
Writing Avro locally also works.
Removing "type": "null" option eliminates the exception
Try something like following:
{
"name": "client_timestamp",
"type": ["null", "long"],
"doc": "Client side timestamp of this xxx",
"default": null,
"logicalType": "timestamp-millis"
}
It is a way to define logicalType in avro schema (link).

Avro invalid default for union field

I'm trying to serialise and then write to the hortonworks schema registry an avro schema but I'm getting the following error message during the write operation.
Caused by: java.lang.RuntimeException: An exception was thrown while processing request with message: [Invalid default for field viewingMode: null not a [{"type":"record","name":"aName","namespace":"domain.assembled","fields":[{"name":"aKey","type":"string"}]},{"type":"record","name":"anotherName","namespace":"domain.assembled","fields":[{"name":"anotherKey","type":"string"},{"name":"yetAnotherKey","type":"string"}]}]]
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.handleSchemaIdVersionResponse(SchemaRegistryClient.java:678)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.doAddSchemaVersion(SchemaRegistryClient.java:664)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.lambda$addSchemaVersion$1(SchemaRegistryClient.java:591)
This is the avro schema
{
"type": "record",
"name": "aSchema",
"namespace": "domain.assembled",
"fields": [
{
"name": "viewingMode",
"type": [
{
"name": "aName",
"type": "record",
"fields": [
{"name": "aKey","type": "string"}
]
},
{
"name": "anotherName",
"type": "record",
"fields": [
{"name": "anotherKey","type": "string"},
{"name": "yetAnotherKey","type": "string"}
]
}
]
}
]
}
Whoever if I add a "null" as the first type of the union this the succeeds. Do avro union types require a "null"? In my case this would be an incorrect representation of data so I'm not keen on doing it.
If it makes any difference I'm using avro 1.9.1.
Also, apologies if the tags are incorrect but couldn't find a hortonworks-schema-registry tag and don't have enough rep to create a new one.
Turns out if was an issue with hortonwork's schema registry.
This has actually already been fixed here and I've requested a new release here. Hopefully this happens soon.

How to use Schema registry for Kafka Connect AVRO

I have started exploring Kafka and Kafka connect recently and did some initial set up .
But wanted to explore more on schema registry part .
My schema registry is started now what i should do .
I have a AVRO schema stored in avro_schema.avsc.
here is the schema
{
"name": "FSP-AUDIT-EVENT",
"type": "record",
"namespace": "com.acme.avro",
"fields": [
{
"name": "ID",
"type": "string"
},
{
"name": "VERSION",
"type": "int"
},
{
"name": "ACTION_TYPE",
"type": "string"
},
{
"name": "EVENT_TYPE",
"type": "string"
},
{
"name": "CLIENT_ID",
"type": "string"
},
{
"name": "DETAILS",
"type": "string"
},
{
"name": "OBJECT_TYPE",
"type": "string"
},
{
"name": "UTC_DATE_TIME",
"type": "long"
},
{
"name": "POINT_IN_TIME_PRECISION",
"type": "string"
},
{
"name": "TIME_ZONE",
"type": "string"
},
{
"name": "TIMELINE_PRECISION",
"type": "string"
},
{
"name": "AUDIT_EVENT_TO_UTC_DT",
"type": [
"string",
"null"
]
},
{
"name": "AUDIT_EVENT_TO_DATE_PITP",
"type": "string"
},
{
"name": "AUDIT_EVENT_TO_DATE_TZ",
"type": "string"
},
{
"name": "AUDIT_EVENT_TO_DATE_TP",
"type": "string"
},
{
"name": "GROUP_ID",
"type": "string"
},
{
"name": "OBJECT_DISPLAY_NAME",
"type": "string"
},
{
"name": "OBJECT_ID",
"type": [
"string",
"null"
]
},
{
"name": "USER_DISPLAY_NAME",
"type": [
"string",
"null"
]
},
{
"name": "USER_ID",
"type": "string"
},
{
"name": "PARENT_EVENT_ID",
"type": [
"string",
"null"
]
},
{
"name": "NOTES",
"type": [
"string",
"null"
]
},
{
"name": "SUMMARY",
"type": [
"string",
"null"
]
}
]
}
Is my schema is valid .I converted it online from JSON ?
where should i keep this schema file location i am not sure about .
Please guide me with the step to follow
.
I am sending records from Lambda function and from JDBC source both .
So basically how can i enforce AVRO schema and test ?
Do i have to change anything in avro-consumer properties file ?
Or is this correct way to register schema
./bin/kafka-avro-console-producer \
--broker-list b-3.**:9092,b-**:9092,b-**:9092 --topic AVRO-AUDIT_EVENT \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema" : "{\"type\":\"struct\",\"fields\":[{\"type\":\"string\",\"optional\":false,\"field\":\"ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"VERSION\"},{\"type\":\"string\",\"optional\":true,\"field\":\"ACTION_TYPE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"EVENT_TYPE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"CLIENT_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"DETAILS\"},{\"type\":\"string\",\"optional\":true,\"field\":\"OBJECT_TYPE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"UTC_DATE_TIME\"},{\"type\":\"string\",\"optional\":true,\"field\":\"POINT_IN_TIME_PRECISION\"},{\"type\":\"string\",\"optional\":true,\"field\":\"TIME_ZONE\"},{\"type\":\"string\",\"optional\":true,\"field\":\"TIMELINE_PRECISION\"},{\"type\":\"string\",\"optional\":true,\"field\":\"GROUP_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"OBJECT_DISPLAY_NAME\"},{\"type\":\"string\",\"optional\":true,\"field\":\"OBJECT_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"USER_DISPLAY_NAME\"},{\"type\":\"string\",\"optional\":true,\"field\":\"USER_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"PARENT_EVENT_ID\"},{\"type\":\"string\",\"optional\":true,\"field\":\"NOTES\"},{\"type\":\"string\",\"optional\":true,\"field\":\"SUMMARY\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_UTC_DT\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_DATE_PITP\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_DATE_TZ\"},{\"type\":\"string\",\"optional\":true,\"field\":\"AUDIT_EVENT_TO_DATE_TP\"}],\"optional\":false,\"name\":\"test\"}"}' http://localhost:8081/subjects/view/versions
what next i have to do
But when i try to see my schema i get only below
curl --silent -X GET http://localhost:8081/subjects/AVRO-AUDIT-EVENT/versions/latest
this is the result
{"subject":"AVRO-AUDIT-EVENT","version":1,"id":161,"schema":"{\"type\":\"string\",\"optional\":false}"}
Why i do not see my full registered schema
Also when i try to delete schema
i get below error
{"error_code":405,"message":"HTTP 405 Method Not Allowed"
i am not sure if my schema is registered correctly .
Please help me.
Thanks in Advance
is my schema valid
You can use the REST API of the Registry to try and submit it and see...
where should i keep this schema file location i am not sure about
It's not clear how you're sending messages...
If you actually wrote Kafka producer code, you store it within your code (as a string) or as a resource file.. If using Java, you can instead use the SchemaBuilder class to create the Schema object
You need to rewrite your producer to use Avro Schema and Serializers if you've not already
If we create AVRO schema will it work for Json as well .
Avro is a Binary format, but there is a JSONDecoder for it.
what should be the URL for our AVRO schema properties file ?
It needs to be the IP of your Schema Registry once you figure out how to start it. (with schema-registry-start)
Do i have to change anything in avro-consumer properties file ?
You need to use the Avro Deserializer
is this correct way to register schema
.> /bin/kafka-avro-console-producer \
Not quite. That's how you produce a message with a schema (and you need to use the correct schema). You also must provide --property schema.registry.url
You use the REST API of the Registry to register and verify schemas

Confluent Schema Registry timed out error

I'm using Avro schema to write data to Kafka topic. Initially, everything worked fine. After adding one more new field(scan_app_id) in avro file. I'm facing this error.
Avro file:
{
"type": "record", "name": "Initiate_Scan", "namespace": "avro",
"doc": "Avro schema registry for Initiate_Scan", "fields": [
{
"name": "app_id",
"type": "string",
"doc": "3 digit application id"
},
{
"name": "app_name",
"type": "string",
"doc": "application name"
},
{
"name": "dev_stage",
"type": "string",
"doc": "development stage"
},
{
"name": "scan_app_id",
"type": "string",
"doc": "unique scan id for an app in Veracode"
},
{
"name": "scan_name",
"type": "string",
"doc": "scan details"
},
{
"name": "seq_num",
"type": "int",
"doc": "unique number"
},
{
"name": "result_flg",
"type": "string",
"doc": "Y indicates results of scan available",
"default": "Y"
},
{
"name": "request_id",
"type": "int",
"doc": "unique id"
},
{
"name": "scan_number",
"type": "int",
"doc": "number of scans"
} ] }
Error:
Caused by: org.apache.kafka.common.errors.SerializationException:
Error registering Avro schema:
{"type":"record","name":"Initiate_Scan","namespace":"avro","doc":"Avro
schema registry for
Initiate_Scan","fields":[{"name":"app_id","type":{"type":"string","avro.java.string":"String"},"doc":"3
digit application
id"},{"name":"app_name","type":{"type":"string","avro.java.string":"String"},"doc":"application
name"},{"name":"dev_stage","type":{"type":"string","avro.java.string":"String"},"doc":"development
stage"},{"name":"scan_app_id","type":{"type":"string","avro.java.string":"String"},"doc":"unique
scan id for an
App"},{"name":"scan_name","type":{"type":"string","avro.java.string":"String"},"doc":"scan
details"},{"name":"seq_num","type":"int","doc":"unique
number"},{"name":"result_flg","type":{"type":"string","avro.java.string":"String"},"doc":"Y
indicates results of scan
available","default":"Y"},{"name":"request_id","type":"int","doc":"unique
id"},{"name":"scan_number","type":"int","doc":"number of scans"}]}
INFO Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1017)
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:292)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:284)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:279)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:61)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:93)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:72)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:54)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:768)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:745)
at com.ssc.svc.svds.initiate.InitiateProducer.initiateScanData(InitiateProducer.java:146)
at com.ssc.svc.svds.initiate.InitiateProducer.topicsData(InitiateProducer.java:41)
at com.ssc.svc.svds.initiate.InputData.main(InputData.java:31)
I went through Confluent documentation about 50002 error, which says
A schema should be compatible with the previously registered schema.
Does this mean I cannot make changes/update existing schema ?
How to fix this?
Actually, the link says 50002 -- Operation timed out. If it was indeed incompatible, the response would actually say so.
In any case, if you add a new field, you are required to define a default value.
This way, any consumers defined with a newer schema that are reading older messages know what value to set to that field.
A straight-forward list of allowed Avro changes I found is by Oracle
Possible errors are:
A field is added without a default value