kafka mongodb sink connector issue while writing to mongodb - mongodb

I am facing issue while writing to mongodb using mongo kafka sink connector.I am using mongodb of v5.0.3 and Strimzi kafka of v2.8.0. I have added p1/mongo-kafka-connect-1.7.0-all.jar and p2/mongodb-driver-core-4.5.0.jar in connect cluster plugins path.Created connector using below
{
"name": "mongo-sink",
"config": {
"topics": "sinktest2",
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max": "1",
"connection.uri": "mongodb://mm-0.mongoservice.st.svc.cluster.local:27017,mm-1.mongoservice.st.svc.cluster.local:27017",
"database": "sinkdb",
"collection": "sinkcoll",
"mongo.errors.tolerance": "all",
"mongo.errors.log.enable": true,
"errors.log.include.messages": true,
"errors.deadletterqueue.topic.name": "sinktest2.deadletter",
"errors.deadletterqueue.context.headers.enable": true
}
}
root#ubuntuserver-0:/persistent# curl http://localhost:8083/connectors/mongo-sink/status
{"name":"mongo-sink","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"localhost:8083"}],"type":"sink"}
When I check the status after creating connector it is showing running, but when I start sending records to kafka topic connector is running into issues.connector status is showing as below.
root#ubuntuserver-0:/persistent# curl http://localhost:8083/connectors/mongo-sink/status
{
"name":"mongo-sink",
"connector":{
"state":"RUNNING",
"worker_id":"localhost:8083"
},
"tasks":[
{
"id":0,
"state":"FAILED",
"worker_id":"localhost:8083",
"trace":"org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error: \n\tat org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:324)\n\tat org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:540)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)\n\t... 13 more\nCaused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])\"{ \"; line: 1, column: 1])\n at [Source: (byte[])\"{ \"; line: 1, column: 4]\nCaused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])\"{ \"; line: 1, column: 1])\n at [Source: (byte[])\"{ \"; line: 1, column: 4]\n\tat com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)\n\tat com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)\n\tat com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd2(UTF8StreamJsonParser.java:3033)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3003)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextFieldName(UTF8StreamJsonParser.java:989)\n\tat com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)\n\tat com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)\n\tat com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)\n\tat com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4270)\n\tat com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2734)\n\tat org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:64)\n\tat org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:322)\n\tat org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:540)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"
}
],
"type":"sink"
}
I am writing sample json record to kafka topic.
./kafka-console-producer.sh --topic sinktest2 --bootstrap-server sample-kafka-kafka-bootstrap:9093 --producer.config /persistent/client.txt < /persistent/emp.json
emp.json is below file
{
"employee": {
"name": "abc",
"salary": 56000,
"married": true
}
}
I don't see any logs in connector pod and no databse and collection being created in mongodb.
Please help to resolve this issue. Thank you !!

I think you are missing some configuration parameters like converter, and schema.
Update your config to add following:
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
If you are using KafkaConnect on kubernetes, you may create the sink connector as shown below. Create a file with name like mongo-sink-connector.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
name: mongodb-sink-connector
labels:
strimzi.io/cluster: my-connect-cluster
spec:
class: com.mongodb.kafka.connect.MongoSinkConnector
tasksMax: 2
config:
connection.uri: "mongodb://root:password#mongodb-0.mongodb-headless.default.svc.cluster.local:27017"
database: test
collection: sink
topics: sink-topic
key.converter: org.apache.kafka.connect.json.JsonConverter
value.converter: org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable: false
value.converter.schemas.enable: false
Execute the command:
$ kubectl apply -f mongo-sink-connector.yaml
you should see the output:
kafkaconnector.kafka.strimzi.io/mongo-apps-sink-connector created
Before starting the producer, check the status of connector and verify the topic has created as follows:
Status:
[kafka#my-connect-cluster-connect-5d47fb574-69xpv kafka]$ curl http://localhost:8083/connectors/mongodb-sink-connector/status
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"IP-ADDRESS:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"IP-ADDRESS:8083"},{"id":1,"state":"RUNNING","worker_id":"IP-ADDRESS:8083"}],"type":"sink"}
[kafka#my-connect-cluster-connect-5d47fb574-69xpv kafka]$
Check topic creation, you will see sink-topic
[kafka#my-connect-cluster-connect-5d47fb574-69xpv kafka]$ bin/kafka-topics.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --list
__consumer_offsets
__strimzi-topic-operator-kstreams-topic-store-changelog
__strimzi_store_topic
connect-cluster-configs
connect-cluster-offsets
connect-cluster-status
sink-topic
Now, go on kafka server to execute the producer
[kafka#my-cluster-kafka-0 kafka]$ bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic sink-topic
Successful execution will show you a prompt like > to enter/test the data
>{"employee": {"name": "abc", "salary": 56000, "married": true}}
>
On anther terminal, connect to kafka server and start consumer to verify the data
[kafka#my-cluster-kafka-0 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sink-topic --from-beginning
{"employee": {"name": "abc", "salary": 56000, "married": true}}
If you see this data, means everything is working fine. Now let us check on mongodb. Connect with your mongodb server and check
rs0:PRIMARY> use test
switched to db test
rs0:PRIMARY> show collections
sink
rs0:PRIMARY> db.sink.find()
{ "_id" : ObjectId("6234a4a0dad1a2638f57a6b2"), "employee" : { "name" : "abc", "salary" : NumberLong(56000), "married" : true } }
et Voila!

You're hitting a serialization exception. I'll break the message out a bit:
com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input:
expected close marker for Object (start marker at [Source: (byte[])"{ "; line: 1, column: 1])
at [Source: (byte[])"{ "; line: 1, column: 4]
Caused by: com.fasterxml.jackson.core.io.JsonEOFException:
Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])"{ "; line: 1, column: 1])
at [Source: (byte[])"{ "; line: 1, column: 4]
"expected close marker for Object" suggests to me that the parser is expecting to see the entire JSON object as one line, rather than pretty-printed.
{"employee": {"name": "abc", "salary": 56000, "married": true}}

Related

MongoDB Kafka connect ChangeStreamHandler do not support truncatedArrays

I am using ChangeStreamHandler in mongo Kafka sink connector to stream changes from mongo source to sink collection
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
On updates events from the source MongoDB collection the change stream handler is failing with exception
ERROR Unable to process record SinkRecord{kafkaOffset=3, timestampType=CreateTime} ConnectRecord{topic='quickstart.sampleData', kafkaPartition=0, key={"_id": {"_data": "8262A5CD4B000000012B022C0100296E5A1004B80560BF7F114B04962A5F523CEAB5D046645F6964006462A5CC9B84956FD488691BF10004"}}, keySchema=Schema{STRING}, value={"_id": {"_data": "8262A5CD4B000000012B022C0100296E5A1004B80560BF7F114B04962A5F523CEAB5D046645F6964006462A5CC9B84956FD488691BF10004"}, "operationType": "update", "clusterTime": {"$timestamp": {"t": 1655033163, "i": 1}}, "ns": {"db": "quickstart", "coll": "sampleData"}, "documentKey": {"_id": {"$oid": "62a5cc9b84956fd488691bf1"}}, "updateDescription": {"updatedFields": {"hello": "moto"}, "removedFields": [], "truncatedArrays": []}}, valueSchema=Schema{STRING}, timestamp=1655033166742, headers=ConnectHeaders(headers=)} (com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData)
org.apache.kafka.connect.errors.DataException: Warning unexpected field(s) in updateDescription [truncatedArrays]. {"updatedFields": {"hello": "moto"}, "removedFields": [], "truncatedArrays": []}. Cannot process due to risk of data loss.
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.OperationHelper.getUpdateDocument(OperationHelper.java:99)
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.Update.perform(Update.java:57)
at com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler.handle(ChangeStreamHandler.java:84)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModelCDC$3(MongoProcessedSinkRecordData.java:99)
at java.base/java.util.Optional.flatMap(Optional.java:294)
Below is the Change stream event received on the sink side
{"schema":{"type":"string","optional":false},"payload":"{\"_id\": {\"_data\": \"8262A5CD4B000000012B022C0100296E5A1004B80560BF7F114B04962A5F523CEAB5D046645F6964006462A5CC9B84956FD488691BF10004\"}, \"operationType\": \"update\", \"clusterTime\": {\"$timestamp\": {\"t\": 1655033163, \"i\": 1}}, \"ns\": {\"db\": \"quickstart\", \"coll\": \"sampleData\"}, \"documentKey\": {\"_id\": {\"$oid\": \"62a5cc9b84956fd488691bf1\"}}, \"updateDescription\": {\"updatedFields\": {\"hello\": \"moto\"}, \"removedFields\": [], \"truncatedArrays\": []}}"}
On looking at the code in class
com.mongodb.kafka.connect.sink.cdc.mongodb.operations.OperationHelper.getUpdateDocument(OperationHelper.java:99)
It shows that the updateDescription.updatedfields only handles updatedFields & removedFields.. support for truncatedArrays is not present.
Is this a bug? or I need to tune my source connector to somehow stop sending truncatedArrays in changeEvents.
I had the same issue here and i could solve that setting up the following configuration at the Source Connector:
"change.stream.full.document": "updateLookup"
A Full Exemple:
{
"name": "mongo-simple-source",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"connection.uri": "yourMongodbUri",
"database": "yourDataBase",
"collection": "yourCollection",
"change.stream.full.document": "updateLookup"
}
}

Why I receive a lot of duplicates with debezium?

I'm testing Debezium platform in a local deployment with docker-compose. Here's my test case:
run postgres, kafka, zookeeper and 3 replicas of debezium/connect:1.3
configure connector in one of the replica with the following configs:
{
"name": "database-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"plugin.name": "wal2json",
"slot.name": "database",
"database.hostname": "debezium_postgis_1",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname" : "database",
"database.server.name": "database",
"heartbeat.interval.ms": 5000,
"table.whitelist": "public.outbox",
"transforms.outbox.table.field.event.id": "event_uuid",
"transforms.outbox.table.field.event.key": "event_name",
"transforms.outbox.table.field.event.payload": "payload",
"transforms.outbox.table.field.event.payload.id": "event_uuid",
"transforms.outbox.route.topic.replacement": "${routedByValue}",
"transforms.outbox.route.by.field": "topic",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"max.batch.size": 1,
"offset.commit.policy": "io.debezium.engine.spi.OffsetCommitPolicy.AlwaysCommitOffsetPolicy",
"binary.handling.mode": "bytes"
}
}
run a script that executes 2000 insert in outbox table by calling this method from another class
#Transactional
public void write(String eventName, String topic, byte[] payload) {
Outbox newRecord = new Outbox(eventName, topic, payload);
repository.save(newRecord);
repository.delete(newRecord);
}
After some seconds (when I see the first messages on Kafka), I kill the replica who's handling the stream. Let's say it delivered successfully 200 messages on the right topic.
I get from the topic where debezium stores offsets the last offset message:
{
"transaction_id": null,
"lsn_proc": 24360992,
"lsn": 24495808,
"txId": 560,
"ts_usec": 1595337502556806
}
then I open a db shell and run the following
SELECT slot_name, restart_lsn - pg_lsn('0/0') as restart_lsn, confirmed_flush_lsn - pg_lsn('0/0') as confirmed_flush_lsn FROM pg_replication_slots; and postgres reply:
[
{
"slot_name": "database",
"restart_lsn": 24360856,
"confirmed_flush_lsn": 24360992
}
]
After 5 minutes I killed the replica, Kafka rebalances connectors and it deploy a new running task on one of the living replicas.
The new connector starts handling the stream, but it seems that it starts from the beginning because after it finish I found 2200 messages on Kafka.
With that configuration (max.batch.size: 1 and AlwaysCommitPolicy) I expect to see max 2001 messages.
Where am I wrong ?
I found the problem in my configuration:
"offset.commit.policy": "io.debezium.engine.spi.OffsetCommitPolicy.AlwaysCommitOffsetPolicy" works only with the Embedded API.
Moreover the debezium/connect:1.3 docker image has a default value for OFFSET_FLUSH_INTERVAL_MS of 1 minute. So if I stop the container within its first 1 minute, no offsets will be stored on kafka

Kafka jdbc connect sink: Is it possible to use pk.fields for fields in value and key?

The issue i'm having is that when jdbc sink connector consumes kafka message, the key variables when writing to db is null.
However, when i consume directly through the kafka-avro-consumer - I can see the key and value variables with it's values because I use this config: --property print.key=true.
ASK: is there away to make sure that jdbc connector is processing the message key variable values?
console kafka-avro config
/opt/confluent-5.4.1/bin/kafka-avro-console-consumer \
--bootstrap-server "localhost:9092" \
--topic equipmentidentifier.persist \
--property parse.key=true \
--property key.separator=~ \
--property print.key=true \
--property schema.registry.url="http://localhost:8081" \
--property key.schema=[$KEY_SCHEMA] \
--property value.schema=[$IDENTIFIER_SCHEMA,$VALUE_SCHEMA]
error:
org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "assignment_table" ("created_date","custome
r","id_type","id_value") VALUES('1970-01-01 03:25:44.567+00'::timestamp,123,'BILL_OF_LADING','BOL-123') was aborted: ERROR: null value in column "equipment_ide
ntifier_type" violates not-null constraint
Detail: Failing row contains (null, null, null, null, 1970-01-01 03:25:44.567, 123, id, 56). Call getNextException to see other errors in the batch.
org.postgresql.util.PSQLException: ERROR: null value in column "equipment_identifier_type" violates not-null constraint
Sink config:
task.max=1
topic=assignment
connect.class=io.confluet.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://localhost:5432/db
connection.user=test
connection.password=test
table.name.format=assignment_table
auto.create=false
insert.mode=insert
pk.fields=customer,equip_Type,equip_Value,id_Type,id_Value,cpId
transforms=flatten
transforms.flattenKey.type=org.apache.kafka.connect.transforms.Flatten$Key
transforms.flattenKey.delimiter=_
transforms.flattenKey.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flattenKey.delimiter=_
Kafka key:
{
"assignmentKey": {
"cpId": {
"long": 1001
},
"equip": {
"Identifier": {
"type": "eq",
"value": "eq_45"
}
},
"vendorId": {
"string": "vendor"
}
}
}
Kafka value:
{
"assigmentValue": {
"id": {
"Identifier": {
"type": "id",
"value": "56"
}
},
"timestamp": {
"long": 1234456756
},
"customer": {
"long": 123
}
}
}
You need to tell the connector to use fields from the key, because by default it won't.
pk.mode=record_key
However you need to use fields from either the Key or the Value, not both as you have in your config currently:
pk.fields=customer,equip_Type,equip_Value,id_Type,id_Value,cpId
If you set pk.mode=record_key then pk.fields will refer to the fields in the message key.
Ref: https://docs.confluent.io/current/connect/kafka-connect-jdbc/sink-connector/sink_config_options.html#sink-pk-config-options
See also https://rmoff.dev/kafka-jdbc-video

org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"

I am trying to use the confluent Kafka s3 connector using confluent-4.1.1.
s3-sink
"value.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
When I run Kafka connectors for the s3 sink, I get this error message:
ERROR WorkerSinkTask{id=singular-s3-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"
at io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1562)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1323)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1047)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My Schema contains only 1 array type field and its schema is like this
{"name":"item_id","type":{"type":"array","items":["null","string"]},"default":[]}
I am able to see the deserialized message using the kafka-avro-console-consumer command. I have seen a similar question but in his case, he was using Avro serializer for key also.
./confluent-4.1.1/bin/kafka-avro-console-consumer --topic singular_custom_postback --bootstrap-server localhost:9092 -max-messages 2
"item_id":[{"string":"15552"},{"string":"37810"},{"string":"38061"}]
"item_id":[]
I cannot put the entire output I get from the console consumer as it contains sensitive user information, so I have added the only array type field in my schema.
Thanks in advance.
The io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649) is called for the conversion of avro schema of the message you read to the connect sink's internal schema. I believe it is not related to the data of your message. That is why the AbstractKafkaAvroDeserializer can successfully deserialise your message (e.g. via kafka-avro-console-consumer), as your message is a valid avro message. The above exception may occur if your default value is null, while null is not a valid value of your field. E.g.
{
"name":"item_id",
"type":{
"type":"array",
"items":[
"string"
]
},
"default": null
}
I would propose you to remotely debug connect and see what exactly is failing.
Same problem as the question that you have linked to.
In the source code, you can see this condition.
case ARRAY: {
if (!jsonValue.isArray()) {
throw new DataException("Invalid JSON for array default value: " + jsonValue.toString());
}
And the exception can be thrown when the schema type is defined in your case as type:"array", but the payload itself has a null value (or any other value type) rather than actually an array, despite what you have defined as your schema default value. The default is only applied when the items element isn't there at all, not when "items":null
Other than that, I would suggest a schema like so, i.e. a record object, not just a named array, with a default of an empty array, not null.
{
"type" : "record",
"name" : "Items",
"namespace" : "com.example.avro",
"fields" : [ {
"name" : "item_id",
"type" : {
"type" : "array",
"items" : [ "null", "string" ]
},
"default": []
} ]
}

PostgreSQL and Kafka Connect integration issue

I am testing JDBC Sink connector to dump records from Kafka to PostgreSQL. Here is the Connector config:
{
"name": "jdbc-sink-postgresql-1",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "role",
"connection.url": "jdbc:postgresql://localhost:5432/postgres?user=&password=",
"auto.create": "false",
"insert.mode": "upsert",
"mode":"incrementing",
"table.name.format":"role",
"pk.mode":"record_value",
"pk.fields":"role_id"
}
}
When I run the connector, I am getting below exception:
java.sql.BatchUpdateException: Batch entry 1 INSERT INTO "role" ("role_id","role_name") VALUES (123,'admin') ON CONFLICT ("role_id") DO UPDATE SET "role_name"=EXCLUDED."role_name" was aborted.
Call getNextException to see the cause.
at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2778))
Any pointers as to what am I missing here? Please let me know if more information is needed.
So, the problem was with the table. This is how I created table at first:
CREATE TABLE role(
role_id int PRIMARY KEY,
role_name VARCHAR (255) UNIQUE NOT NULL
);
The test data in topic looked like this:
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic role --property schema.registry.url=http://localhost:8081/ --property value.schema='{"type":"record","name":"myRecord","fields": [{"name": "role_id","type": "int"},{"name": "role_name","type": "string"}]}' --key-serializer org.apache.kafka.common.serialization.StringSerializer --value-serializer io.confluent.kafka.serializers.KafkaAvroSerializer --property print.key=true
{"role_id":122, "role_name":"admin"}
{"role_id":123, "role_name":"admin"}
{"role_id":124, "role_name":"admin"}
{"role_id":125, "role_name":"admin"}
{"role_id":126, "role_name":"admin"}
So, when my test data had the same value for role_name field again and again, it violated the unique constraint and hence the error.
What all I did?
I dropped the table.
Created a new table without the unique key constraint and the above data was pushed to PostgreSQL without issues.