Cannot invoke "Object.getClass()" because "datum" is null of string in field 'schemaVersionId' - Confluent Schema Registry - scala

I have an Avro schema for which I am generating the Java bean using the avro-maven-plugin. I am then instantiating it and sending it to Kafka, also using Confluent's Schema Registry. I can also consume and deserialise the Avro into a Spark DataFrame just fine. The problem I am facing is that it forces me to set the schemaVersionId at the producer level. If I don't set it, the KafkaAvroSerializer throws the error in the title. Any ideas please?
val contractEvent: ContractEvent = new ContractEvent()
contractEvent.setSchemaVersionId("1") // I'd expect this to be filled in automatically, since the schema is auto-registered.
contractEvent.setIngestedAt("123")
contractEvent.setChangeType(changeTypeEnum.U)
contractEvent.setServiceName("Contract")
contractEvent.setPayload(avroContract)
{
  "type": "record",
  "namespace": "xxxxxxx",
  "name": "ContractEvent",
  "fields": [
    {"name": "ingestedAt", "type": "string"},
    {"name": "eventType",
      "type": {
        "name": "eventTypeEnum",
        "type": "enum", "symbols": ["U", "D", "B"]
      }
    },
    {"name": "serviceName", "type": "string"},
    {"name": "payload",
      "type": {
        "type": "record",
        "name": "Contract",
        "fields": [
          {"name": "identifier", "type": "string"},
          {"name": "createdBy", "type": "string"},
          {"name": "createdDate", "type": "string"}
        ]
      }
    }
  ]
}
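For context, the producer side is wired up roughly like this (a minimal sketch; the broker address, topic name and the explicit auto.register.schemas setting are placeholders rather than my real values):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal producer setup for the generated ContractEvent bean.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
props.put("schema.registry.url", "http://localhost:8081") // placeholder registry URL
props.put("auto.register.schemas", "true") // rely on auto-registration of the value schema

val producer = new KafkaProducer[String, ContractEvent](props)
producer.send(new ProducerRecord[String, ContractEvent]("contract-events", contractEvent)) // placeholder topic
producer.flush()
producer.close()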

Related

How can I save Kafka message Key in document for MongoDB Sink?

Right now I have a MongoDB sink and it saves the value of incoming Avro messages correctly.
I need it to save the Kafka message key in the document as well.
I have tried org.apache.kafka.connect.transforms.HoistField$Key in order to add the key to the value that is being saved, but this did nothing. It did work when using ProvidedInKeyStrategy, but I don't want my _id to be the Kafka message key.
My configuration:
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"connection.uri": "mongodb://mongo1",
"database": "mongodb",
"collection": "sink",
"topics": "topics.foo",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"transforms": "hoistKey",
"transforms.hoistKey.type":"org.apache.kafka.connect.transforms.HoistField$Key",
"transforms.hoistKey.field":"kafkaKey"
}
Kafka message schema:
{
  "type": "record",
  "name": "Smoketest",
  "namespace": "some_namespace",
  "fields": [
    {
      "name": "timestamp",
      "type": "int",
      "logicalType": "timestamp-millis"
    }
  ]
}
Kafka key schema:
[
  {
    "type": "enum",
    "name": "EnvironmentType",
    "namespace": "some_namespace",
    "doc": "DEV",
    "symbols": [
      "Dev",
      "Test",
      "Accept",
      "Sandbox",
      "Prod"
    ]
  },
  {
    "type": "record",
    "name": "Key",
    "namespace": "some_namespace",
    "doc": "The standard Key type that is used as key",
    "fields": [
      {
        "name": "conversation_id",
        "doc": "The first system producing an event sets this field",
        "type": "string"
      },
      {
        "name": "broker_key",
        "doc": "The key of the broker",
        "type": "string"
      },
      {
        "name": "user_id",
        "doc": "User identification",
        "type": [
          "null",
          "string"
        ]
      },
      {
        "name": "application",
        "doc": "The name of the application",
        "type": [
          "null",
          "string"
        ]
      },
      {
        "name": "environment",
        "doc": "The type of environment",
        "type": "type.EnvironmentType"
      }
    ]
  }
]
Using https://github.com/f0xdx/kafka-connect-wrap-smt I can now wrap all the data from the Kafka message into a single document to save in my MongoDB sink.

Kafka & Avro producer - Schema being registered is incompatible with an earlier schema for subject

I'm running the schema-registry-confluent example locally, and I got an error when I modified the schema of the message.
This is my schema:
{
  "type": "record",
  "namespace": "io.confluent.tutorial.pojo.avro",
  "name": "OrderDetail",
  "fields": [
    {
      "name": "number",
      "type": "long",
      "doc": "The order number."
    },
    {
      "name": "date",
      "type": "long",
      "logicalType": "date",
      "doc": "The date the order was submitted."
    },
    {
      "name": "client",
      "type": {
        "type": "record",
        "name": "Client",
        "fields": [
          { "name": "code", "type": "string" }
        ]
      }
    }
  ]
}
Then I tried to send this message with the producer:
{"number": 2343434, "date": 1596490462, "client": {"code": "1234"}}
But I got this error:
org.apache.kafka.common.errors.InvalidConfigurationException: Schema being registered is incompatible with an earlier schema for subject "example-topic-avro-value"; error code: 409
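The registry returns a 409 when the new schema fails the compatibility check against the versions already registered under the subject. One way to reproduce that check outside the producer is to test the modified schema against the subject directly (a sketch, assuming the Confluent kafka-schema-registry-client library is on the classpath; the registry URL and the OrderDetail.avsc file name are placeholders):

import io.confluent.kafka.schemaregistry.avro.AvroSchema
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient

// Ask the registry whether the modified schema would be accepted for the subject.
val client = new CachedSchemaRegistryClient("http://localhost:8081", 100)
val modified = new AvroSchema(scala.io.Source.fromFile("OrderDetail.avsc").mkString)
val compatible = client.testCompatibility("example-topic-avro-value", modified)
println(s"compatible with the registered schema: $compatible")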

Confluent Kafka producer message format for nested records

I have an Avro schema registered for a Kafka topic and am trying to send data to it. The schema has nested records and I'm not sure how to correctly send data to it using confluent_kafka in Python.
Example schema (ignore any typos in the schema; the real one is very large, this is just an example):
{
  "namespace": "company__name",
  "name": "our_data",
  "type": "record",
  "fields": [
    {
      "name": "datatype1",
      "type": ["null", {
        "type": "record",
        "name": "datatype1_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    },
    {
      "name": "datatype2",
      "type": ["null", {
        "type": "record",
        "name": "datatype2_1",
        "fields": [
          {"name": "site", "type": "string"},
          {"name": "units", "type": "string"}
        ]
      }],
      "default": null
    }
  ]
}
I am trying to send data to this schema using the Python version of confluent_kafka. When I have done this before, the records were not nested and I would use a typical dictionary of key: value pairs and serialize it. How can I send nested data so that it works with the schema?
What I tried so far...
message = {
    'datatype1': {
        'site': 'sitename',
        'units': 'm'
    }
}
This version does not cause any Kafka errors, but all of the columns show up as null
and...
message = {
    'datatype1': {
        'datatype1_1': {
            'site': 'sitename',
            'units': 'm'
        }
    }
}
This version produced a Kafka error related to the schema.
If you use namespaces, you don't have to worry about naming collisions and you can properly structure your optional records:
For example, both
{
  "meta": {
    "instanceID": "something"
  }
}
And
{}
are valid instances of:
{
  "doc": "Survey",
  "name": "Survey",
  "type": "record",
  "fields": [
    {
      "name": "meta",
      "type": [
        "null",
        {
          "name": "meta",
          "type": "record",
          "fields": [
            {
              "name": "instanceID",
              "type": [
                "null",
                "string"
              ],
              "namespace": "Survey.meta"
            }
          ],
          "namespace": "Survey"
        }
      ],
      "namespace": "Survey"
    }
  ]
}
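To see this concretely, both instances can be validated against the schema with the plain Avro API (a sketch; survey.avsc is a hypothetical file holding the Survey schema above):

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecordBuilder}

// survey.avsc is assumed to contain the "Survey" schema shown above.
val schema: Schema = new Schema.Parser().parse(new File("survey.avsc"))

// {} -- the optional "meta" field left as null.
val empty = new GenericRecordBuilder(schema).set("meta", null).build()

// {"meta": {"instanceID": "something"}} -- the record branch of the union.
val metaSchema = schema.getField("meta").schema().getTypes.get(1) // index 1 = the record branch of ["null", record]
val meta = new GenericRecordBuilder(metaSchema).set("instanceID", "something").build()
val withMeta = new GenericRecordBuilder(schema).set("meta", meta).build()

// Both records validate against the same schema.
println(GenericData.get().validate(schema, empty))    // expected: true
println(GenericData.get().validate(schema, withMeta)) // expected: true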

Tags are not written in influxdb through kafka-connect-influxdb

I am trying to connect a Kafka sink to InfluxDB. It works, but it does not save tags. For example, if I send this to the Kafka topic:
{"id": 1, "product": "pencil", "quantity": 100, "price": 50, "tags": {"DEVICE": "living", "location": "home"}}
The data is saved to InfluxDB, but only the fields part.
I have been trying to debug this but failed. The versions I am using:
Kafka 2.11-2.4.0
InfluxDB 1.7.7
I encountered this too when I followed the Avro tags example on this page:
https://docs.confluent.io/kafka-connect-influxdb/current/influx-db-sink-connector/index.html
The "tags" schema in the example was incorrect. The example defines tags as:
{
  "name": "tags",
  "type": {
    "name": "tags",
    "type": "record",
    "fields": [{
      "name": "DEVICE",
      "type": "string"
    }, {
      "name": "location",
      "type": "string"
    }]
  }
}
It should actually be
{
  "name": "tags",
  "type": {
    "type": "map",
    "values": "string"
  }
}
This web page provided the solution: https://rmoff.net/2020/01/23/notes-on-getting-data-into-influxdb-from-kafka-with-kafka-connect/
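For illustration, a value that fits the corrected map-typed "tags" field looks like this when built with Avro's generic API (a sketch; the record name and the non-tag fields are placeholders modelled on the JSON payload above):

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecordBuilder

// Cut-down value schema using the corrected "tags" definition (map of strings).
val schemaJson =
  """{
    |  "type": "record", "name": "Purchase",
    |  "fields": [
    |    {"name": "id", "type": "long"},
    |    {"name": "product", "type": "string"},
    |    {"name": "tags", "type": {"type": "map", "values": "string"}}
    |  ]
    |}""".stripMargin
val schema = new Schema.Parser().parse(schemaJson)

// Avro maps are plain java.util.Map instances on the producer side.
val tags = new java.util.HashMap[String, String]()
tags.put("DEVICE", "living")
tags.put("location", "home")

val record = new GenericRecordBuilder(schema)
  .set("id", 1L)
  .set("product", "pencil")
  .set("tags", tags)
  .build()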

kafka: producer and consumer with different avro file

I am processing two different Avro files:
avroConsumer:
{"namespace": "autoGenerated.avro",
"type": "record",
"name": "UserConsumer",
"fields": [
{"name": "Name", "type": "string"},
{"name": "Surname", "type":["null","string"],"default": null},
{"name": "favorite_number", "type": ["long", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
avroProducer:
{"namespace": "autoGenerated.avro",
"type": "record",
"name": "UserProducer",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
When running this, a deserialization error occurs, but I thought that defining the "default" attribute in the consumer schema would make it work correctly.
Reference: http://avro.apache.org/docs/current/spec.html#Schema+Resolution
if the reader's record schema has a field that contains a default
value, and writer's schema does not have a field with the same name,
then the reader should use the default value from its field.
Do you have any ideas? Can I use a consumer Avro file that is different from the producer Avro file?
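For reference, Avro's schema resolution can be exercised directly, without Kafka, by writing a record with one schema and reading it back with the other (a minimal sketch with the plain Avro API; the .avsc file names are placeholders). With the two schemas above, the read is expected to fail: the record names differ, and the reader's "Name" field has neither a matching writer field nor a default.

import java.io.{ByteArrayOutputStream, File}
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord, GenericRecordBuilder}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// producer.avsc / consumer.avsc are assumed to hold the two schemas above.
val writerSchema = new Schema.Parser().parse(new File("producer.avsc"))
val readerSchema = new Schema.Parser().parse(new File("consumer.avsc"))

// Serialise one record with the producer (writer) schema.
val record = new GenericRecordBuilder(writerSchema)
  .set("name", "alice")
  .set("favorite_number", 7)
  .set("favorite_color", "blue")
  .build()
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
encoder.flush()

// Deserialise with both schemas: this is where schema resolution is applied.
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val reader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
val resolved = reader.read(null, decoder) // throws AvroTypeException if the schemas do not resolve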