org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null" - apache-kafka

I am trying to use the Confluent Kafka S3 connector with confluent-4.1.1.
s3-sink
"value.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
When I run Kafka connectors for the s3 sink, I get this error message:
ERROR WorkerSinkTask{id=singular-s3-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"
at io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1562)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1323)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1047)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My schema contains only one array-type field, whose definition is:
{"name":"item_id","type":{"type":"array","items":["null","string"]},"default":[]}
I can see the deserialized messages using the kafka-avro-console-consumer command. I have seen a similar question, but in that case the Avro serializer was used for the key as well.
./confluent-4.1.1/bin/kafka-avro-console-consumer --topic singular_custom_postback --bootstrap-server localhost:9092 -max-messages 2
"item_id":[{"string":"15552"},{"string":"37810"},{"string":"38061"}]
"item_id":[]
I cannot post the entire output from the console consumer because it contains sensitive user information, so I have included only the array-type field from my schema.
Thanks in advance.

io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649) is called while converting the Avro schema of the message you read into the Connect sink's internal schema. I believe it is not related to the data in your message; that is why the AbstractKafkaAvroDeserializer can successfully deserialize the message (e.g. via kafka-avro-console-consumer), as the message itself is valid Avro. The exception above can occur if your default value is null while null is not a valid value for your field, e.g.
{
  "name": "item_id",
  "type": {
    "type": "array",
    "items": [
      "string"
    ]
  },
  "default": null
}
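For comparison, a field for which null would be a legal default has to make the field type itself nullable, e.g. (a sketch, not the schema from the question):
{
  "name": "item_id",
  "type": ["null", {"type": "array", "items": ["null", "string"]}],
  "default": null
}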
I would suggest remotely debugging Connect to see what exactly is failing.
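One way to attach a debugger (a sketch; it assumes the worker is started with the stock Connect launch scripts, which pass KAFKA_OPTS through to the JVM, that port 5005 is free, and the worker properties path is just a placeholder):
export KAFKA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
./confluent-4.1.1/bin/connect-distributed ./etc/schema-registry/connect-avro-distributed.properties
Then attach your IDE's remote debugger to port 5005 and set a breakpoint in AvroData.defaultValueFromAvro.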

Same problem as the question that you have linked to.
In the source code, you can see this condition.
case ARRAY: {
  if (!jsonValue.isArray()) {
    throw new DataException("Invalid JSON for array default value: " + jsonValue.toString());
  }
The exception is thrown when the schema type is defined, as in your case, as "type": "array", but the payload itself carries a null (or any other non-array value) rather than an actual array, regardless of what you have defined as the schema's default value. The default is only applied when the field is absent altogether, not when it is explicitly null.
Other than that, I would suggest a schema like the following, i.e. a record object rather than just a named array, with a default of an empty array instead of null.
{
  "type": "record",
  "name": "Items",
  "namespace": "com.example.avro",
  "fields": [
    {
      "name": "item_id",
      "type": {
        "type": "array",
        "items": ["null", "string"]
      },
      "default": []
    }
  ]
}

Related

Kafka connect RabbitMQ unable to use insert field transform: Only Struct objects supported for [field insertion], found: [B

I'm trying to use the InsertField Kafka Connect transformation with the RabbitMQ connector.
My configuration:
"config": {
"connector.class": "io.confluent.connect.rabbitmq.RabbitMQSourceConnector",
"confluent.topic.bootstrap.servers": "kafka:29092",
"topic.creation.default.replication.factor": 1,
"topic.creation.default.partitions": 1,
"tasks.max": "2",
"kafka.topic": "test",
"rabbitmq.queue": "events",
"rabbitmq.host": "rabbitmq",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms": "InsertField",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertField.static.field": "MessageSource",
"transforms.InsertField.static.value": "Kafka Connect framework"
}
I have also tried using ByteArrayConverter as the value converter. Using Python, I send a message as follows:
msg = json.dumps(body)
self.channel.basic_publish(exchange="", routing_key="events", body=msg)
Using encode() to turn the message into a byte array does not work either.
The exception I'm receiving is:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [field insertion], found: [B
at org.apache.kafka.connect.transforms.util.Requirements.requireStruct(Requirements.java:52)
at org.apache.kafka.connect.transforms.InsertField.applyWithSchema(InsertField.java:162)
at org.apache.kafka.connect.transforms.InsertField.apply(InsertField.java:133)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
I understand the error and thought that using JsonConverter would solve it, but I was wrong. I've also tried "value.converter.schemas.enable": "false" to no avail.
I would appreciate any help. I don't mind sending the data in JSON form or bytes form; I just want a key:value pair to be added to the event.
Thanks
As the error indicates, you can only insert fields into Structs. To get a Struct from the RabbitMQ String/Bytes schemas, you must chain a HoistField transform before the InsertField one.
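A minimal sketch of that chain, reusing the InsertField settings from the question (the hoisted field name "message" is just an illustration):
"transforms": "HoistField,InsertField",
"transforms.HoistField.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.HoistField.field": "message",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertField.static.field": "MessageSource",
"transforms.InsertField.static.value": "Kafka Connect framework"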
To get a Struct from JsonConverter, your JSON needs two top-level fields named schema and payload, and the connector then needs:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/
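The envelope the converter expects looks roughly like this (a sketch; the field names inside payload are placeholders, not taken from the question):
{
  "schema": {
    "type": "struct",
    "fields": [
      {"field": "event_type", "type": "string", "optional": true},
      {"field": "count", "type": "int64", "optional": true}
    ],
    "optional": false
  },
  "payload": {"event_type": "example", "count": 1}
}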
Alternatively, use Kafka headers for "source" information, rather than trying to inject into the value
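A sketch of the header-based alternative, assuming a Connect version that ships the InsertHeader SMT (Apache Kafka 3.0 or newer; the transform alias is arbitrary):
"transforms": "InsertSource",
"transforms.InsertSource.type": "org.apache.kafka.connect.transforms.InsertHeader",
"transforms.InsertSource.header": "MessageSource",
"transforms.InsertSource.value.literal": "Kafka Connect framework"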

kafka mongodb sink connector issue while writing to mongodb

I am facing an issue while writing to MongoDB using the MongoDB Kafka sink connector. I am using MongoDB v5.0.3 and Strimzi Kafka v2.8.0. I have added p1/mongo-kafka-connect-1.7.0-all.jar and p2/mongodb-driver-core-4.5.0.jar to the Connect cluster plugins path. I created the connector using the following:
{
  "name": "mongo-sink",
  "config": {
    "topics": "sinktest2",
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "connection.uri": "mongodb://mm-0.mongoservice.st.svc.cluster.local:27017,mm-1.mongoservice.st.svc.cluster.local:27017",
    "database": "sinkdb",
    "collection": "sinkcoll",
    "mongo.errors.tolerance": "all",
    "mongo.errors.log.enable": true,
    "errors.log.include.messages": true,
    "errors.deadletterqueue.topic.name": "sinktest2.deadletter",
    "errors.deadletterqueue.context.headers.enable": true
  }
}
root@ubuntuserver-0:/persistent# curl http://localhost:8083/connectors/mongo-sink/status
{"name":"mongo-sink","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"localhost:8083"}],"type":"sink"}
When I check the status after creating the connector, it shows RUNNING, but when I start sending records to the Kafka topic the connector runs into issues. The connector status then shows the following.
root@ubuntuserver-0:/persistent# curl http://localhost:8083/connectors/mongo-sink/status
{
"name":"mongo-sink",
"connector":{
"state":"RUNNING",
"worker_id":"localhost:8083"
},
"tasks":[
{
"id":0,
"state":"FAILED",
"worker_id":"localhost:8083",
"trace":"org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error: \n\tat org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:324)\n\tat org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:540)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)\n\t... 
13 more\nCaused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])\"{ \"; line: 1, column: 1])\n at [Source: (byte[])\"{ \"; line: 1, column: 4]\nCaused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])\"{ \"; line: 1, column: 1])\n at [Source: (byte[])\"{ \"; line: 1, column: 4]\n\tat com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)\n\tat com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)\n\tat com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd2(UTF8StreamJsonParser.java:3033)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3003)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextFieldName(UTF8StreamJsonParser.java:989)\n\tat com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)\n\tat com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)\n\tat com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)\n\tat com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4270)\n\tat com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2734)\n\tat org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:64)\n\tat org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:322)\n\tat org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:540)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"
}
],
"type":"sink"
}
I am writing a sample JSON record to the Kafka topic:
./kafka-console-producer.sh --topic sinktest2 --bootstrap-server sample-kafka-kafka-bootstrap:9093 --producer.config /persistent/client.txt < /persistent/emp.json
emp.json is the following file:
{
  "employee": {
    "name": "abc",
    "salary": 56000,
    "married": true
  }
}
I don't see any logs in the connector pod, and no database or collection is being created in MongoDB.
Please help to resolve this issue. Thank you !!
I think you are missing some configuration parameters, namely the converter and schema settings.
Update your config to add the following:
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
If you are using Kafka Connect on Kubernetes with Strimzi, you can create the sink connector as shown below. Create a file named something like mongo-sink-connector.yaml:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mongodb-sink-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: com.mongodb.kafka.connect.MongoSinkConnector
  tasksMax: 2
  config:
    connection.uri: "mongodb://root:password@mongodb-0.mongodb-headless.default.svc.cluster.local:27017"
    database: test
    collection: sink
    topics: sink-topic
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: false
    value.converter.schemas.enable: false
Execute the command:
$ kubectl apply -f mongo-sink-connector.yaml
You should see the output:
kafkaconnector.kafka.strimzi.io/mongo-apps-sink-connector created
Before starting the producer, check the status of the connector and verify that the topic has been created, as follows:
Status:
[kafka@my-connect-cluster-connect-5d47fb574-69xpv kafka]$ curl http://localhost:8083/connectors/mongodb-sink-connector/status
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"IP-ADDRESS:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"IP-ADDRESS:8083"},{"id":1,"state":"RUNNING","worker_id":"IP-ADDRESS:8083"}],"type":"sink"}
[kafka@my-connect-cluster-connect-5d47fb574-69xpv kafka]$
Check topic creation; you should see sink-topic:
[kafka@my-connect-cluster-connect-5d47fb574-69xpv kafka]$ bin/kafka-topics.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --list
__consumer_offsets
__strimzi-topic-operator-kstreams-topic-store-changelog
__strimzi_store_topic
connect-cluster-configs
connect-cluster-offsets
connect-cluster-status
sink-topic
Now, go to the Kafka server and run the producer:
[kafka@my-cluster-kafka-0 kafka]$ bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic sink-topic
Successful execution will show you a > prompt where you can enter test data:
>{"employee": {"name": "abc", "salary": 56000, "married": true}}
>
In another terminal, connect to the Kafka server and start a consumer to verify the data:
[kafka@my-cluster-kafka-0 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sink-topic --from-beginning
{"employee": {"name": "abc", "salary": 56000, "married": true}}
If you see this data, everything is working fine. Now let us check MongoDB. Connect to your MongoDB server and check:
rs0:PRIMARY> use test
switched to db test
rs0:PRIMARY> show collections
sink
rs0:PRIMARY> db.sink.find()
{ "_id" : ObjectId("6234a4a0dad1a2638f57a6b2"), "employee" : { "name" : "abc", "salary" : NumberLong(56000), "married" : true } }
Et voilà!
You're hitting a serialization exception. I'll break the message out a bit:
com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input:
expected close marker for Object (start marker at [Source: (byte[])"{ "; line: 1, column: 1])
at [Source: (byte[])"{ "; line: 1, column: 4]
Caused by: com.fasterxml.jackson.core.io.JsonEOFException:
Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])"{ "; line: 1, column: 1])
at [Source: (byte[])"{ "; line: 1, column: 4]
"expected close marker for Object" suggests to me that the parser is expecting to see the entire JSON object as one line, rather than pretty-printed.
{"employee": {"name": "abc", "salary": 56000, "married": true}}

Error With RowKey Definition on Confluent BigTable Sink Connector

I'm trying to use the Bigtable sink connector from Confluent to read data from Kafka and write it into my Bigtable instance, but I'm receiving the following error message:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:614)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Error with RowKey definition: Row key definition was defined, but received, deserialized kafka key is not a struct. Unable to construct a row key.
at io.confluent.connect.bigtable.client.RowKeyExtractor.getRowKey(RowKeyExtractor.java:69)
at io.confluent.connect.bigtable.client.BufferedWriter.addWriteToBatch(BufferedWriter.java:84)
at io.confluent.connect.bigtable.client.InsertWriter.write(InsertWriter.java:47)
at io.confluent.connect.bigtable.BaseBigtableSinkTask.put(BaseBigtableSinkTask.java:99)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
... 10 more
Due to some technical limitations, the message producer is not able to produce the messages with a key, so I'm using some transforms to take information from the payload and set it as the message key.
Here's my connector payload:
{
  "name": "DATALAKE.BIGTABLE.SINK.QUEUEING.ZTXXD",
  "config": {
    "connector.class": "io.confluent.connect.gcp.bigtable.BigtableSinkConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "topics": "APP-DATALAKE-QUEUEING-ZTXXD_DATALAKE-V1",
    "transforms": "HoistField,AddKeys,ExtractKey",
    "gcp.bigtable.project.id": "bigtable-project-id",
    "gcp.bigtable.instance.id": "bigtable-instance-id",
    "gcp.bigtable.credentials.json": "XXXXX",
    "transforms.ExtractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.HoistField.field": "raw_data_cf",
    "transforms.ExtractKey.field": "KEY1,ATT1",
    "transforms.HoistField.type": "org.apache.kafka.connect.transforms.HoistField$Value",
    "transforms.AddKeys.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.AddKeys.fields": "KEY1,ATT1",
    "row.key.definition": "KEY1,ATT1",
    "table.name.format": "raw_ZTXXD_DATALAKE",
    "consumer.override.group.id": "svc-datalake-KAFKA_2_BIGTABLE",
    "confluent.topic.bootstrap.servers": "xxxxxx:9092",
    "input.data.format": "JSON",
    "confluent.topic": "_dsp-confluent-license",
    "input.key.format": "STRING",
    "key.converter.schemas.enable": "false",
    "confluent.topic.security.protocol": "SASL_SSL",
    "row.key.delimiter": "/",
    "confluent.topic.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"XXXXX\" password=\"XXXXXX\";",
    "value.converter.schemas.enable": "false",
    "auto.create.tables": "true",
    "auto.create.column.families": "true",
    "confluent.topic.sasl.mechanism": "PLAIN"
  }
}
And here's my message produced to Kafka:
{
  "MANDT": "110",
  "KEY1": "1",
  "KEY2": null,
  "ATT1": "1M",
  "ATT2": "0000000000",
  "TABLE_NAME": "ZTXXD_DATALAKE",
  "IUUC_OPERATION": "I",
  "CREATETIMESTAMP": "2022-01-24T20:26:45.247Z"
}
In my transforms I'm doing three operations:
HoistField puts my payload inside a two-level structure (the Connect docs for Bigtable say that Connect expects a two-level structure in order to infer the column families).
AddKeys adds the columns that I consider the key to the message key.
ExtractKey removes the field names from the key added in the previous step, leaving only the values themselves.
I've been reading the documentation for this Bigtable connector, and it's not clear to me whether it works well with the JSON format. Could you let me know?
JSON should work, but...
deserialized kafka key is not a struct
This is because you have set the schemas.enable=false property on the value converter, such that when you do ValueToKey, it's not a Connect Struct type; the HoistField makes a Java Map instead.
If you're not able to use the Schema Registry and switch the serialization format, then you'll need to find a way to get the REST Proxy to infer the schema of the JSON message before it produces the data (I don't think it can). Otherwise, your records need to include schema and payload fields, and you need to enable schemas on the converters.
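For illustration, the record value would then have to be produced in roughly this shape (a sketch using a subset of the fields from the question; the types are assumptions):
{
  "schema": {
    "type": "struct",
    "fields": [
      {"field": "KEY1", "type": "string", "optional": false},
      {"field": "ATT1", "type": "string", "optional": true}
    ],
    "optional": false
  },
  "payload": {"KEY1": "1", "ATT1": "1M"}
}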
Another option: there may be a third-party transform out there that sets the schema of the record, but it's not built in (it's not part of SetSchemaMetadata).

Handling empty/invalid Mqtt Messages with Kafka Connect

I am trying to ingest data from MQTT into Kafka. Unfortunately, some of those MQTT messages are either empty or invalid JSON. I assume that is what leads to the following exception:
{
"name": "source_mqtt_alarms",
"connector": {
"state": "RUNNING",
"worker_id": "-redacted-:8083"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "-redacted-:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException:
Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:196)\n\t
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:122)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(WorkerSourceTask.java:314)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:340)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:264)\n\t
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)\n\t
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)\n\t
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\t
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\t
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\t
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\t
at java.base/java.lang.Thread.run(Thread.java:834)\n
Caused by: org.apache.kafka.connect.errors.DataException: Conversion error: null value for field that is required and has no default value\n\t
at org.apache.kafka.connect.json.JsonConverter.convertToJson(JsonConverter.java:611)\n\t
at org.apache.kafka.connect.json.JsonConverter.convertToJsonWithEnvelope(JsonConverter.java:592)\n\t
at org.apache.kafka.connect.json.JsonConverter.fromConnectData(JsonConverter.java:346)\n\t
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$2(WorkerSourceTask.java:314)\n\t
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:146)\n\t
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:180)\n\t
... 11 more\n"
}
],
"type": "source"
}
From what I've learned so far, it looks like the incoming (empty/invalid) messages do not contain values that are declared as non-optional, which leads to the exception above.
My question is: where does the connector get that expectation from? It says "null value for field that is required and has no default value", but how can that field be required if the schema is (I assume) created per message?
Additional information:
I am using the Lenses.io Stream Reactor MQTT source connector. The configuration is as follows:
{
  "name": "source_mqtt_alarms",
  "config": {
    "topics": "alarms",
    "connect.mqtt.kcql": "INSERT INTO alarms SELECT * FROM `-redacted-/+/alarms` WITHCONVERTER=`com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter`",
    "connect.mqtt.client.id": "kafka_connect_alarms",
    "tasks.max": 1,
    "connector.class": "com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
    "connect.mqtt.service.quality": 2,
    "connect.mqtt.hosts": "ssl://-redacted-:8883",
    "connect.mqtt.ssl.ca.cert": "/usr/share/certs/cumu.crt",
    "connect.mqtt.ssl.cert": "/usr/share/certs/mqtt.crt",
    "connect.mqtt.ssl.key": "/usr/share/certs/mqtt.pem",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": true,
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": true
  }
}
Edit: I just went through the logs of the Kafka Connect worker, and they give a bit more information. Prior to the exception above, I get a lot of these:
[2021-05-26 08:27:19,552] ERROR Error handling message with id:0 on topic:-redacted-/alarms (com.datamountaineer.streamreactor.connect.mqtt.source.MqttManager)
java.util.NoSuchElementException: head of empty list
at scala.collection.immutable.Nil$.head(List.scala:430)
at scala.collection.immutable.Nil$.head(List.scala:427)
at com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter$.convert(JsonSimpleConverter.scala:76)
at com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter$.convert(JsonSimpleConverter.scala:70)
at com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter.convert(JsonSimpleConverter.scala:37)
at com.datamountaineer.streamreactor.connect.mqtt.source.MqttManager.messageArrived(MqttManager.scala:110)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.deliverMessage(CommsCallback.java:514)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.handleMessage(CommsCallback.java:417)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:214)
at java.base/java.lang.Thread.run(Thread.java:834)

Kafka Connect issue when reading from a RabbitMQ queue

I'm trying to read data into my topic from a RabbitMQ queue using the Kafka connector with the configuration below:
{
  "name": "RabbitMQSourceConnector1",
  "config": {
    "connector.class": "io.confluent.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max": "1",
    "kafka.topic": "rabbitmqtest3",
    "rabbitmq.queue": "taskqueue",
    "rabbitmq.host": "localhost",
    "rabbitmq.username": "guest",
    "rabbitmq.password": "guest",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true"
  }
}
But I'm having trouble converting the source stream to JSON format, as I'm losing the original message.
Original:
{'id': 0, 'body': '010101010101010101010101010101010101010101010101010101010101010101010'}
Received:
{"schema":{"type":"bytes","optional":false},"payload":"eyJpZCI6IDEsICJib2R5IjogIjAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMDEwMTAxMCJ9"}
Does anyone have an idea why this is happening?
EDIT: I tried converting the message to a String using "value.converter": "org.apache.kafka.connect.storage.StringConverter", but the result is the same:
11/27/19 4:07:37 PM CET , 0 , [B@1583a488
EDIT 2:
I'm now receiving the JSON file, but the content is still encoded in Base64.
Any idea how to convert it back to UTF-8 directly?
{
  "name": "adls-gen2-sink",
  "config": {
    "connector.class": "io.confluent.connect.azure.datalake.gen2.AzureDataLakeGen2SinkConnector",
    "tasks.max": "1",
    "topics": "rabbitmqtest3",
    "flush.size": "3",
    "format.class": "io.confluent.connect.azure.storage.format.json.JsonFormat",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "internal.value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "topics.dir": "sw66jsoningest",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner"
  }
}
UPDATE:
I got the solution, considering this flow:
Message (JSON) --> RabbitMQ (ByteArray) --> Kafka (ByteArray) --> ADLS (JSON)
I used this converter on the RabbitMQ-to-Kafka connector so the message is passed through as raw bytes instead of being Base64-encoded inside a JSON envelope:
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
Afterwards I treated the message as a String and saved it as JSON:
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"format.class":"io.confluent.connect.azure.storage.format.json.JsonFormat",
Many thanks!
If you set "schemas.enable": "false", you shouldn't be getting the schema and payload fields.
If you want no translation to happen at all, use ByteArrayConverter
If your data is just a plain string (which includes JSON), use StringConverter
It's not clear how you're printing the resulting message, but it looks like you're printing the byte array reference rather than decoding it to a String.
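For example, the plain console consumer writes the raw record bytes to stdout, so for a UTF-8 JSON payload it shows the original text (a sketch; the broker address and topic name are taken from the question):
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic rabbitmqtest3 --from-beginning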