Cannot invoke "Object.getClass()" because "datum" is null of string in field 'schemaVersionId' - Confluent Schema Registry - scala

I have an Avro schema for which I am generating the Java bean using the avro-maven-plugin. I am then instantiating it and sending it to Kafka, also using Confluent's Schema Registry. I can also consume and deserialise the Avro into a Spark DataFrame just fine. The problem I am facing is that it forces me to set the schemaVersionId at the producer level. If I don't set it, the KafkaAvroSerializer throws the error in the title. Any ideas please?
val contractEvent: ContractEvent = new ContractEvent()
contractEvent.setSchemaVersionId("1") // I'd expect this to be filled in automatically, since the schema is auto-registered.
contractEvent.setIngestedAt("123")
contractEvent.setChangeType(changeTypeEnum.U)
contractEvent.setServiceName("Contract")
contractEvent.setPayload(avroContract)
{
  "type": "record",
  "namespace": "xxxxxxx",
  "name": "ContractEvent",
  "fields": [
    {"name": "ingestedAt", "type": "string"},
    {"name": "eventType",
      "type": {
        "name": "eventTypeEnum",
        "type": "enum", "symbols": ["U", "D", "B"]
      }
    },
    {"name": "serviceName", "type": "string"},
    {"name": "payload",
      "type": {
        "type": "record",
        "name": "Contract",
        "fields": [
          {"name": "identifier", "type": "string"},
          {"name": "createdBy", "type": "string"},
          {"name": "createdDate", "type": "string"}
        ]
      }
    }
  ]
}
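For context, the producer side is wired up roughly like this (a minimal sketch; the broker address, topic name and the explicit auto.register.schemas setting are placeholders rather than my real values):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal producer setup for the generated ContractEvent bean.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
props.put("schema.registry.url", "http://localhost:8081") // placeholder registry URL
props.put("auto.register.schemas", "true") // rely on auto-registration of the value schema

val producer = new KafkaProducer[String, ContractEvent](props)
producer.send(new ProducerRecord[String, ContractEvent]("contract-events", contractEvent)) // placeholder topic
producer.flush()
producer.close()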

Related

How can I save Kafka message Key in document for MongoDB Sink?

Right now I have a MongoDB sink and it saves the value of incoming Avro messages correctly.
I need it to save the Kafka message key in the document as well.
I have tried org.apache.kafka.connect.transforms.HoistField$Key in order to add the key to the value that is being saved, but this did nothing. It did work when using ProvidedInKeyStrategy, but I don't want my _id to be the Kafka message key.
My configuration:
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"connection.uri": "mongodb://mongo1",
"database": "mongodb",
"collection": "sink",
"topics": "topics.foo",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"transforms": "hoistKey",
"transforms.hoistKey.type":"org.apache.kafka.connect.transforms.HoistField$Key",
"transforms.hoistKey.field":"kafkaKey"
}
Kafka message schema:
{
  "type": "record",
  "name": "Smoketest",
  "namespace": "some_namespace",
  "fields": [
    {
      "name": "timestamp",
      "type": "int",
      "logicalType": "timestamp-millis"
    }
  ]
}
Kafka key schema:
[
  {
    "type": "enum",
    "name": "EnvironmentType",
    "namespace": "some_namespace",
    "doc": "DEV",
    "symbols": [
      "Dev",
      "Test",
      "Accept",
      "Sandbox",
      "Prod"
    ]
  },
  {
    "type": "record",
    "name": "Key",
    "namespace": "some_namespace",
    "doc": "The standard Key type that is used as key",
    "fields": [
      {
        "name": "conversation_id",
        "doc": "The first system producing an event sets this field",
        "type": "string"
      },
      {
        "name": "broker_key",
        "doc": "The key of the broker",
        "type": "string"
      },
      {
        "name": "user_id",
        "doc": "User identification",
        "type": [
          "null",
          "string"
        ]
      },
      {
        "name": "application",
        "doc": "The name of the application",
        "type": [
          "null",
          "string"
        ]
      },
      {
        "name": "environment",
        "doc": "The type of environment",
        "type": "type.EnvironmentType"
      }
    ]
  }
]
Using https://github.com/f0xdx/kafka-connect-wrap-smt I can now wrap all the data from the Kafka message into a single document to save in my MongoDB sink.

Kafka & Avro producer - Schema being registered is incompatible with an earlier schema for subject

I'm running the schema-registry-confluent example locally, and I got an error when I modified the schema of the message.
This is my schema:
{
  "type": "record",
  "namespace": "io.confluent.tutorial.pojo.avro",
  "name": "OrderDetail",
  "fields": [
    {
      "name": "number",
      "type": "long",
      "doc": "The order number."
    },
    {
      "name": "date",
      "type": "long",
      "logicalType": "date",
      "doc": "The date the order was submitted."
    },
    {
      "name": "client",
      "type": {
        "type": "record",
        "name": "Client",
        "fields": [
          { "name": "code", "type": "string" }
        ]
      }
    }
  ]
}
Then I tried to send this message with the producer:
{"number": 2343434, "date": 1596490462, "client": {"code": "1234"}}
But I got this error:
org.apache.kafka.common.errors.InvalidConfigurationException: Schema being registered is incompatible with an earlier schema for subject "example-topic-avro-value"; error code: 409
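The registry returns a 409 when the new schema fails the compatibility check against the versions already registered under the subject. One way to reproduce that check outside the producer is to test the modified schema against the subject directly (a sketch, assuming the Confluent kafka-schema-registry-client library is on the classpath; the registry URL and the OrderDetail.avsc file name are placeholders):

import io.confluent.kafka.schemaregistry.avro.AvroSchema
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient

// Ask the registry whether the modified schema would be accepted for the subject.
val client = new CachedSchemaRegistryClient("http://localhost:8081", 100)
val modified = new AvroSchema(scala.io.Source.fromFile("OrderDetail.avsc").mkString)
val compatible = client.testCompatibility("example-topic-avro-value", modified)
println(s"compatible with the registered schema: $compatible")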

Confluent Kafka producer message format for nested records

I have an Avro schema registered for a Kafka topic and am trying to send data to it. The schema has nested records and I'm not sure how to correctly send data to it using confluent_kafka in Python.
Example schema (ignore any typos in the schema; the real one is very large, this is just an example):
{
  "namespace": "company__name",
  "name": "our_data",
  "type": "record",
  "fields": [
    {
      "name": "datatype1",
      "type": ["null", {
        "type": "record",
        "name": "datatype1_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "datatype2",
      "type": ["null", {
        "type": "record",
        "name": "datatype2_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    }
  ]
}
I am trying to send data to this schema using the Python version of confluent_kafka. When I have done this before, the records were not nested and I would use a typical dictionary of key: value pairs and serialize it. How can I send nested data so that it works with the schema?
What I tried so far...
message = {
    'datatype1': {
        'site': 'sitename',
        'units': 'm'
    }
}
This version does not cause any Kafka errors, but all of the columns show up as null
and...
message = {
    'datatype1': {
        'datatype1_1': {
            'site': 'sitename',
            'units': 'm'
        }
    }
}
This version produced a Kafka error related to the schema.
If you use namespaces, you don't have to worry about naming collisions and you can properly structure your optional records:
For example, both
{
  "meta": {
    "instanceID": "something"
  }
}
And
{}
are valid instances of:
{
  "doc": "Survey",
  "name": "Survey",
  "type": "record",
  "fields": [
    {
      "name": "meta",
      "type": [
        "null",
        {
          "name": "meta",
          "type": "record",
          "fields": [
            {
              "name": "instanceID",
              "type": [
                "null",
                "string"
              ],
              "namespace": "Survey.meta"
            }
          ],
          "namespace": "Survey"
        }
      ],
      "namespace": "Survey"
    }
  ]
}
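To see this concretely, both instances can be validated against the schema with the plain Avro API (a sketch; survey.avsc is a hypothetical file holding the Survey schema above):

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecordBuilder}

// survey.avsc is assumed to contain the "Survey" schema shown above.
val schema: Schema = new Schema.Parser().parse(new File("survey.avsc"))

// {} -- the optional "meta" field left as null.
val empty = new GenericRecordBuilder(schema).set("meta", null).build()

// {"meta": {"instanceID": "something"}} -- the record branch of the union.
val metaSchema = schema.getField("meta").schema().getTypes.get(1) // index 1 = the record branch of ["null", record]
val meta = new GenericRecordBuilder(metaSchema).set("instanceID", "something").build()
val withMeta = new GenericRecordBuilder(schema).set("meta", meta).build()

// Both records validate against the same schema.
println(GenericData.get().validate(schema, empty))    // expected: true
println(GenericData.get().validate(schema, withMeta)) // expected: true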

Tags are not written in influxdb through kafka-connect-influxdb

I am trying to connect a Kafka sink to InfluxDB. It works, but it does not save tags. For example, if I send this to the Kafka topic:
{"id": 1, "product": "pencil", "quantity": 100, "price": 50, "tags": {"DEVICE": "living", "location": "home"}}
The data is saved to InfluxDB, but only the fields part.
I have been trying to debug this but failed. The versions I am using:
Kafka 2.11-2.4.0
InfluxDB 1.7.7
I encountered this too when I followed the Avro tags example on this page:
https://docs.confluent.io/kafka-connect-influxdb/current/influx-db-sink-connector/index.html
The "tags" schema in the example was incorrect. The example defines tags as:
{
  "name": "tags",
  "type": {
    "name": "tags",
    "type": "record",
    "fields": [{
      "name": "DEVICE",
      "type": "string"
    }, {
      "name": "location",
      "type": "string"
    }]
  }
}
It should actually be
{
  "name": "tags",
  "type": {
    "type": "map",
    "values": "string"
  }
}
This web page provided the solution: https://rmoff.net/2020/01/23/notes-on-getting-data-into-influxdb-from-kafka-with-kafka-connect/
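For illustration, a value that fits the corrected map-typed "tags" field looks like this when built with Avro's generic API (a sketch; the record name and the non-tag fields are placeholders modelled on the JSON payload above):

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecordBuilder

// Cut-down value schema using the corrected "tags" definition (map of strings).
val schemaJson =
  """{
    |  "type": "record", "name": "Purchase",
    |  "fields": [
    |    {"name": "id", "type": "long"},
    |    {"name": "product", "type": "string"},
    |    {"name": "tags", "type": {"type": "map", "values": "string"}}
    |  ]
    |}""".stripMargin
val schema = new Schema.Parser().parse(schemaJson)

// Avro maps are plain java.util.Map instances on the producer side.
val tags = new java.util.HashMap[String, String]()
tags.put("DEVICE", "living")
tags.put("location", "home")

val record = new GenericRecordBuilder(schema)
  .set("id", 1L)
  .set("product", "pencil")
  .set("tags", tags)
  .build()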

kafka: producer and consumer with different avro file

I am processing two different Avro files:
avroConsumer:
{"namespace": "autoGenerated.avro",
"type": "record",
"name": "UserConsumer",
"fields": [
{"name": "Name", "type": "string"},
{"name": "Surname", "type":["null","string"],"default": null},
{"name": "favorite_number", "type": ["long", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
avroProducer:
{"namespace": "autoGenerated.avro",
"type": "record",
"name": "UserProducer",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
When running this, a deserialization error occurs, but I thought that defining the "default" attribute in the consumer schema would make it work correctly.
Reference: http://avro.apache.org/docs/current/spec.html#Schema+Resolution
if the reader's record schema has a field that contains a default
value, and writer's schema does not have a field with the same name,
then the reader should use the default value from its field.
Do you have any ideas? Can I use a consumer Avro file that is different from the producer Avro file?
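For reference, Avro's schema resolution can be exercised directly, without Kafka, by writing a record with one schema and reading it back with the other (a minimal sketch with the plain Avro API; the .avsc file names are placeholders). With the two schemas above, the read is expected to fail: the record names differ, and the reader's "Name" field has neither a matching writer field nor a default.

import java.io.{ByteArrayOutputStream, File}
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord, GenericRecordBuilder}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// producer.avsc / consumer.avsc are assumed to hold the two schemas above.
val writerSchema = new Schema.Parser().parse(new File("producer.avsc"))
val readerSchema = new Schema.Parser().parse(new File("consumer.avsc"))

// Serialise one record with the producer (writer) schema.
val record = new GenericRecordBuilder(writerSchema)
  .set("name", "alice")
  .set("favorite_number", 7)
  .set("favorite_color", "blue")
  .build()
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
encoder.flush()

// Deserialise with both schemas: this is where schema resolution is applied.
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val reader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
val resolved = reader.read(null, decoder) // throws AvroTypeException if the schemas do not resolve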