Kafka jdbc connect sink: Is it possible to use pk.fields for fields in value and key?

My problem is that when the JDBC sink connector consumes a Kafka message, the fields from the message key are null when the row is written to the database.
However, when I consume the topic directly with kafka-avro-console-consumer, I can see both the key and the value fields with their values, because I use this setting: --property print.key=true.
Question: is there a way to make sure the JDBC connector processes the field values from the message key?
console kafka-avro config
/opt/confluent-5.4.1/bin/kafka-avro-console-consumer \
--bootstrap-server "localhost:9092" \
--topic equipmentidentifier.persist \
--property parse.key=true \
--property key.separator=~ \
--property print.key=true \
--property schema.registry.url="http://localhost:8081" \
--property key.schema=[$KEY_SCHEMA] \
--property value.schema=[$IDENTIFIER_SCHEMA,$VALUE_SCHEMA]
error:
org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "assignment_table" ("created_date","customer","id_type","id_value") VALUES('1970-01-01 03:25:44.567+00'::timestamp,123,'BILL_OF_LADING','BOL-123') was aborted: ERROR: null value in column "equipment_identifier_type" violates not-null constraint
Detail: Failing row contains (null, null, null, null, 1970-01-01 03:25:44.567, 123, id, 56). Call getNextException to see other errors in the batch.
org.postgresql.util.PSQLException: ERROR: null value in column "equipment_identifier_type" violates not-null constraint
Sink config:
tasks.max=1
topics=assignment
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://localhost:5432/db
connection.user=test
connection.password=test
table.name.format=assignment_table
auto.create=false
insert.mode=insert
pk.fields=customer,equip_Type,equip_Value,id_Type,id_Value,cpId
transforms=flattenKey,flattenValue
transforms.flattenKey.type=org.apache.kafka.connect.transforms.Flatten$Key
transforms.flattenKey.delimiter=_
transforms.flattenValue.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flattenValue.delimiter=_
Kafka key:
{
"assignmentKey": {
"cpId": {
"long": 1001
},
"equip": {
"Identifier": {
"type": "eq",
"value": "eq_45"
}
},
"vendorId": {
"string": "vendor"
}
}
}
Kafka value:
{
"assigmentValue": {
"id": {
"Identifier": {
"type": "id",
"value": "56"
}
},
"timestamp": {
"long": 1234456756
},
"customer": {
"long": 123
}
}
}

By default the connector does not use any fields from the message key (pk.mode defaults to none), so you need to tell it to:
pk.mode=record_key
However, pk.fields can draw on either the key or the value, not both, which is what your current config attempts:
pk.fields=customer,equip_Type,equip_Value,id_Type,id_Value,cpId
With pk.mode=record_key, pk.fields refers to fields in the message key.
Ref: https://docs.confluent.io/current/connect/kafka-connect-jdbc/sink-connector/sink_config_options.html#sink-pk-config-options
See also https://rmoff.dev/kafka-jdbc-video
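To get a feel for which field names pk.fields can refer to once a Flatten transform has run, here is a rough plain-Python imitation of the flattening step. It is only a sketch: the flatten helper and the sample key are made up for illustration and are not the connector's code, and the real names depend on how the Avro converter presents your key to Connect.

```python
def flatten(record, delimiter="_", prefix=""):
    """Collapse nested dicts into one level, joining names with the delimiter."""
    flat = {}
    for name, value in record.items():
        full_name = f"{prefix}{delimiter}{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into nested structures, carrying the joined prefix along.
            flat.update(flatten(value, delimiter, full_name))
        else:
            flat[full_name] = value
    return flat


# Hypothetical key, simplified from the one in the question.
sample_key = {
    "cpId": 1001,
    "equip": {"type": "eq", "value": "eq_45"},
    "vendorId": "vendor",
}
flat_key = flatten(sample_key)
print(sorted(flat_key))
```

With pk.mode=record_key, pk.fields would then list flattened key names of this kind, and nothing from the value.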

Related

kafka mongodb sink connector issue while writing to mongodb

I am facing an issue while writing to MongoDB using the MongoDB Kafka sink connector. I am using MongoDB v5.0.3 and Strimzi Kafka v2.8.0. I have added p1/mongo-kafka-connect-1.7.0-all.jar and p2/mongodb-driver-core-4.5.0.jar to the Connect cluster's plugin path. I created the connector using the config below:
{
"name": "mongo-sink",
"config": {
"topics": "sinktest2",
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max": "1",
"connection.uri": "mongodb://mm-0.mongoservice.st.svc.cluster.local:27017,mm-1.mongoservice.st.svc.cluster.local:27017",
"database": "sinkdb",
"collection": "sinkcoll",
"mongo.errors.tolerance": "all",
"mongo.errors.log.enable": true,
"errors.log.include.messages": true,
"errors.deadletterqueue.topic.name": "sinktest2.deadletter",
"errors.deadletterqueue.context.headers.enable": true
}
}
root@ubuntuserver-0:/persistent# curl http://localhost:8083/connectors/mongo-sink/status
{"name":"mongo-sink","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"localhost:8083"}],"type":"sink"}
When I check the status right after creating the connector, it shows RUNNING, but once I start sending records to the Kafka topic the connector runs into problems. The connector status then looks like this:
root@ubuntuserver-0:/persistent# curl http://localhost:8083/connectors/mongo-sink/status
{
"name":"mongo-sink",
"connector":{
"state":"RUNNING",
"worker_id":"localhost:8083"
},
"tasks":[
{
"id":0,
"state":"FAILED",
"worker_id":"localhost:8083",
"trace":"org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error: \n\tat org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:324)\n\tat org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:540)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:496)\n\tat 
org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)\n\t... 13 more\nCaused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])\"{ \"; line: 1, column: 1])\n at [Source: (byte[])\"{ \"; line: 1, column: 4]\nCaused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])\"{ \"; line: 1, column: 1])\n at [Source: (byte[])\"{ \"; line: 1, column: 4]\n\tat com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)\n\tat com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)\n\tat com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd2(UTF8StreamJsonParser.java:3033)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3003)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextFieldName(UTF8StreamJsonParser.java:989)\n\tat com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)\n\tat com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)\n\tat com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)\n\tat com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4270)\n\tat com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2734)\n\tat org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:64)\n\tat 
org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:322)\n\tat org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertValue(WorkerSinkTask.java:540)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:496)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:473)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:328)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"
}
],
"type":"sink"
}
I am writing a sample JSON record to the Kafka topic:
./kafka-console-producer.sh --topic sinktest2 --bootstrap-server sample-kafka-kafka-bootstrap:9093 --producer.config /persistent/client.txt < /persistent/emp.json
emp.json contains:
{
"employee": {
"name": "abc",
"salary": 56000,
"married": true
}
}
I don't see any logs in the connector pod, and no database or collection is created in MongoDB.
Please help me resolve this issue. Thank you!

I think you are missing some configuration parameters, namely the converters and their schema settings.
Update your config to add the following:
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
If you are running Kafka Connect on Kubernetes, you can create the sink connector as shown below. Create a file named, for example, mongo-sink-connector.yaml:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mongodb-sink-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: com.mongodb.kafka.connect.MongoSinkConnector
  tasksMax: 2
  config:
    connection.uri: "mongodb://root:password@mongodb-0.mongodb-headless.default.svc.cluster.local:27017"
    database: test
    collection: sink
    topics: sink-topic
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: false
    value.converter.schemas.enable: false
Execute the command:
$ kubectl apply -f mongo-sink-connector.yaml
You should see the output:
kafkaconnector.kafka.strimzi.io/mongodb-sink-connector created
Before starting the producer, check the status of the connector and verify that the topic has been created, as follows:
Status:
[kafka@my-connect-cluster-connect-5d47fb574-69xpv kafka]$ curl http://localhost:8083/connectors/mongodb-sink-connector/status
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"IP-ADDRESS:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"IP-ADDRESS:8083"},{"id":1,"state":"RUNNING","worker_id":"IP-ADDRESS:8083"}],"type":"sink"}
[kafka@my-connect-cluster-connect-5d47fb574-69xpv kafka]$
Check that the topic was created; you should see sink-topic in the list:
[kafka@my-connect-cluster-connect-5d47fb574-69xpv kafka]$ bin/kafka-topics.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --list
__consumer_offsets
__strimzi-topic-operator-kstreams-topic-store-changelog
__strimzi_store_topic
connect-cluster-configs
connect-cluster-offsets
connect-cluster-status
sink-topic
Now, on the Kafka server, run the producer:
[kafka@my-cluster-kafka-0 kafka]$ bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic sink-topic
On success you get a > prompt at which you can enter test data:
>{"employee": {"name": "abc", "salary": 56000, "married": true}}
>
In another terminal, connect to the Kafka server and start a consumer to verify the data:
[kafka@my-cluster-kafka-0 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sink-topic --from-beginning
{"employee": {"name": "abc", "salary": 56000, "married": true}}
If you see this data, everything is working on the Kafka side. Now let us check MongoDB. Connect to your MongoDB server and check:
rs0:PRIMARY> use test
switched to db test
rs0:PRIMARY> show collections
sink
rs0:PRIMARY> db.sink.find()
{ "_id" : ObjectId("6234a4a0dad1a2638f57a6b2"), "employee" : { "name" : "abc", "salary" : NumberLong(56000), "married" : true } }
et Voila!

You're hitting a serialization exception. I'll break the message out a bit:
com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input:
expected close marker for Object (start marker at [Source: (byte[])"{ "; line: 1, column: 1])
at [Source: (byte[])"{ "; line: 1, column: 4]
Caused by: com.fasterxml.jackson.core.io.JsonEOFException:
Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])"{ "; line: 1, column: 1])
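at [Source: (byte[])"{ "; line: 1, column: 4]
"expected close marker for Object" suggests to me that the parser is expecting to see the entire JSON object on one line, rather than pretty-printed. The console producer sends each line of its input as a separate record, so the first record of a pretty-printed file is just the opening brace, which matches the "{ " source shown in the error. Put the whole object on a single line instead: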
{"employee": {"name": "abc", "salary": 56000, "married": true}}
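If the record lives in a pretty-printed file such as emp.json, one way to collapse it to a single line is to parse and re-serialize it. The following is a minimal sketch; the to_single_line helper is a name invented here, and it assumes the file holds exactly one JSON document.

```python
import json


def to_single_line(pretty_text):
    """Parse a pretty-printed JSON document and re-serialize it on one line."""
    return json.dumps(json.loads(pretty_text))


pretty = """{
  "employee": {
    "name": "abc",
    "salary": 56000,
    "married": true
  }
}"""
one_line = to_single_line(pretty)
print(one_line)
```

Writing that single line to a file and feeding the file to the console producer, as in the question, should then produce one complete record per line.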

how to access latest offset of topic in confluent kafka rest proxy to calculate lag

With the Confluent Kafka REST Proxy we can get the last committed offset of a particular consumer group, but how can we get the latest offset of the topic in order to calculate the lag?

You can use Kafka REST Proxy to fetch the latest offset committed for a particular partition. According to the Confluent Docs,
GET /consumers/(string: group_name)/instances/(string: instance)/offsets
Get the last committed offsets for the given partitions (whether the
commit happened by this process or another).
Note that this request must be made to the specific REST proxy
instance holding the consumer instance.
Parameters:
group_name (string) -- The name of the consumer group
instance (string) -- The ID of the consumer instance
Request JSON Array of Objects:
partitions -- A list of partitions to find the last committed offsets for
partitions[i].topic (string) -- Name of the topic
partitions[i].partition (int) -- Partition ID
Response JSON Array of Objects:
offsets -- A list of committed offsets
offsets[i].topic (string) -- Name of the topic for which an offset was committed
offsets[i].partition (int) -- Partition ID for which an offset was committed
offsets[i].offset (int) -- Committed offset
offsets[i].metadata (string) -- Metadata for the committed offset
Status Codes:
404 Not Found --
Error code 40402 -- Partition not found
Error code 40403 -- Consumer instance not found
Example Request:
GET /consumers/testgroup/instances/my_consumer/offsets HTTP/1.1
Host: proxy-instance.kafkaproxy.example.com
Accept: application/vnd.kafka.v2+json, application/vnd.kafka+json, application/json
{
"partitions": [
{
"topic": "test",
"partition": 0
},
{
"topic": "test",
"partition": 1
}
]
}
Example Response:
HTTP/1.1 200 OK
Content-Type: application/vnd.kafka.v2+json
{"offsets":
[
{
"topic": "test",
"partition": 0,
"offset": 21,
"metadata":""
},
{
"topic": "test",
"partition": 1,
"offset": 31,
"metadata":""
}
]
}

Looks like there is an early access feature for this: https://docs.confluent.io/platform/current/kafka-rest/api.html#get--clusters-cluster_id-consumer-groups-consumer_group_id-lags
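Whichever endpoint supplies the numbers, the lag arithmetic itself is the same: per partition, the log-end offset minus the last committed offset. A small sketch of that calculation follows; compute_lag is a hypothetical helper, the log-end offsets are invented for illustration, and treating a partition with no committed offset as 0 is an assumption made here.

```python
def compute_lag(end_offsets, committed_offsets):
    """Return {(topic, partition): lag}, where lag = end offset - committed offset."""
    lag = {}
    for topic_partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(topic_partition, 0)
        lag[topic_partition] = max(end - committed, 0)
    return lag


# Committed offsets as in the example response above; end offsets are made up.
committed = {("test", 0): 21, ("test", 1): 31}
end = {("test", 0): 25, ("test", 1): 31}
lag = compute_lag(end, committed)
print(lag)
```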

JSONParseException reading data using Kafka rest API

KAFKA TOPIC (test3)
$ kafka-console-consumer --bootstrap-server broker:9092 --topic test3 --from-beginning
"Can we write to a topic that does not exist?"
"Can we write to a topic that does not exist?"
{"foo":"bar"}
["foo","bar"]
confluent
confluent
confluent
kafka
logs
0
0
Consumer (kafka-rest API on localhost:8082)
Create a consumer POST request to http://localhost:8082/consumers/rested
Request Body:
{
"format": "json",
"auto.offset.reset": "earliest",
"auto.commit.enable": "false"
}
Response Body:
{
"instance_id": "rest-consumer-dfa6ee0e-4f24-46dc-b0dc-dda3b80866ff",
"base_uri": "http://rest-proxy:8082/consumers/rested/instances/rest-consumer-dfa6ee0e-4f24-46dc-b0dc-dda3b80866ff"
}
Create a subscription using POST to http://localhost:8082/consumers/rested/instances/rest-consumer-dfa6ee0e-4f24-46dc-b0dc-dda3b80866ff/subscription
using Headers:
Host: http://localhost:8082
Content-Type: application/vnd.kafka.v2+json
and Request Body:
{
"topics": [
"test3"
]
}
returns a Response of 204 No Content.
Read records by making a GET request to http://localhost:8082/consumers/rested/instances/rest-consumer-dfa6ee0e-4f24-46dc-b0dc-dda3b80866ff/records
using the Headers:
Host: http://localhost:8082
Accept: application/vnd.kafka.json.v2+json
returns the Response:
{
"error_code": 50002,
"message": "Kafka error: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'key': was expecting ('true', 'false' or 'null')\n at [Source: (byte[])\"key\"; line: 1, column: 7]"
}
How can we fix this issue and ensure that we receive the data?
Exception (on Kafka)
The running Kafka Rest Proxy server log has the following exception:
rest-proxy | [2018-12-31 03:09:27,232] INFO 172.25.0.1 - - [31/Dec/2018:03:09:26 +0000] "GET /consumers/rest-consumer/instances/rest-consumer-8e49873e-13ce-46a5-be1f-0237a0369efe/records HTTP/1.1" 500 211 341 (io.confluent.rest-utils.requests)
rest-proxy | [2018-12-31 03:09:27,235] ERROR Unexpected exception in consumer read task id=io.confluent.kafkarest.v2.KafkaConsumerReadTask@59611e28 (io.confluent.kafkarest.v2.KafkaConsumerReadTask)
rest-proxy | org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'key': was expecting ('true', 'false' or 'null')
rest-proxy | at [Source: (byte[])"key"; line: 1, column: 7]
rest-proxy | Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'key': was expecting ('true', 'false' or 'null')
rest-proxy | at [Source: (byte[])"key"; line: 1, column: 7]
rest-proxy | at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
rest-proxy | at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:679)
rest-proxy | at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3526)
rest-proxy | at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2621)
rest-proxy | at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:826)
rest-proxy | at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:723)
rest-proxy | at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4141)
rest-proxy | at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4000)
rest-proxy | at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3091)
rest-proxy | at io.confluent.kafkarest.v2.JsonKafkaConsumerState.deserialize(JsonKafkaConsumerState.java:79)
rest-proxy | at io.confluent.kafkarest.v2.JsonKafkaConsumerState.createConsumerRecord(JsonKafkaConsumerState.java:64)
rest-proxy | at io.confluent.kafkarest.v2.KafkaConsumerReadTask.maybeAddRecord(KafkaConsumerReadTask.java:158)
rest-proxy | at io.confluent.kafkarest.v2.KafkaConsumerReadTask.addRecords(KafkaConsumerReadTask.java:142)
rest-proxy | at io.confluent.kafkarest.v2.KafkaConsumerReadTask.doPartialRead(KafkaConsumerReadTask.java:99)
rest-proxy | at io.confluent.kafkarest.v2.KafkaConsumerManager$RunnableReadTask.run(KafkaConsumerManager.java:370)
rest-proxy | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
rest-proxy | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
rest-proxy | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
rest-proxy | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
rest-proxy | at java.lang.Thread.run(Thread.java:748)
Consumer-Groups CLI
I can view the consumer-group on the CLI but it has no active members:
$ kafka-consumer-groups --bootstrap-server broker:9092 --list
has the result:
console-consumer-60695
console-consumer-62259
console-consumer-19307
console-consumer-47906
console-consumer-40838
rested
However, when I attempt to retrieve the members:
$ kafka-consumer-groups --bootstrap-server localhost:29092 --group rest-consumer --describe --members
Consumer group 'rested' has no active members.

TL;DR
You need to wrap your key in double quotes. This is not because all Kafka keys need quotes, but because this consumer parses the key as JSON, and a string wrapped in double quotes is valid JSON while a bare word is not.
If you really need to process this message as it is, you would have to read it in a format other than JSON.
Long Answer
You have a record whose key is not quoted, which makes the key invalid JSON, so the Jackson JSON parser fails when it tries to parse it. This isn't obvious from the error message: when the parser doesn't see a quote, a square bracket, or a curly bracket, it assumes the token must be a boolean or null.
You can see where it grabs the key and tries to decode it as JSON here:
https://github.com/confluentinc/kafka-rest/blob/a9b7cc527a26fdf09db27d148f2e71bfe3d87a6a/kafka-rest/src/main/java/io/confluent/kafkarest/v2/JsonKafkaConsumerState.java#L69
I was able to reproduce your error as follows:
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{"name": "my_consumer_instance", "format": "json", "auto.offset.reset": "latest"}' \
http://localhost:8082/consumers/my_json_consumer
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" --data '{"topics":["testjsontopic"]}' \
http://localhost:8082/consumers/my_json_consumer/instances/my_consumer_instance/subscription
./bin/kafka-console-producer \
--broker-list :9092 \
--topic testjsontopic \
--property parse.key=true \
--property key.separator="&"
>"key"&{"foo":"bar"}
*Ctrl-C
curl -X GET -H "Accept: application/vnd.kafka.json.v2+json" \
http://localhost:8082/consumers/my_json_consumer/instances/my_consumer_instance/records
At this point I am able to read the record. But when I add a record whose key has no quotes, I get the same error as you:
./bin/kafka-console-producer \
--broker-list :9092 \
--topic testjsontopic \
--property parse.key=true \
--property key.separator="&"
>key&{"foo":"bar"}
Now when I read the records again:
curl -X GET -H "Accept: application/vnd.kafka.json.v2+json" \
http://localhost:8082/consumers/my_json_consumer/instances/my_consumer_instance/records
I receive this error:
com.fasterxml.jackson.core.JsonParseException: Unrecognized token
'key': was expecting ('true', 'false' or 'null')
Use this to print your topic's keys as well:
./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic testjsontopic --property print.key=true --from-beginning
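The quoted-versus-bare distinction can be checked without Kafka at all. The sketch below uses Python's json module rather than Jackson, so it only illustrates the JSON rule, not the REST Proxy's exact behaviour; is_valid_json is a helper name invented here.

```python
import json


def is_valid_json(raw: bytes) -> bool:
    """Return True if the raw key or value bytes parse as a JSON document."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False


# Keys and values similar to the ones seen on the topic in the question.
for raw in (b'"key"', b"key", b'{"foo":"bar"}', b"confluent", b"0"):
    print(raw, is_valid_json(raw))
```

By the same rule, bare values such as confluent, kafka, and logs in the test3 topic are not valid JSON either, while 0 is.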

org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"

I am trying to use the confluent Kafka s3 connector using confluent-4.1.1.
s3-sink
"value.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
When I run Kafka connectors for the s3 sink, I get this error message:
ERROR WorkerSinkTask{id=singular-s3-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"
at io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1562)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1323)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1047)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My schema contains only one array-type field, and its schema looks like this:
{"name":"item_id","type":{"type":"array","items":["null","string"]},"default":[]}
I am able to see the deserialized message using the kafka-avro-console-consumer command. I have seen a similar question, but in that case the Avro serializer was also used for the key.
./confluent-4.1.1/bin/kafka-avro-console-consumer --topic singular_custom_postback --bootstrap-server localhost:9092 --max-messages 2
"item_id":[{"string":"15552"},{"string":"37810"},{"string":"38061"}]
"item_id":[]
I cannot post the entire output from the console consumer because it contains sensitive user information, so I have included only the array-type field from my schema.
Thanks in advance.

io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649) is called while converting the Avro schema of the message you read into Connect's internal schema. I believe it is not related to the data in your message. That is why the AbstractKafkaAvroDeserializer can successfully deserialize your message (e.g. via kafka-avro-console-consumer): your message is a valid Avro message. The above exception may occur if your default value is null while null is not a valid value for your field, e.g.:
{
"name":"item_id",
"type":{
"type":"array",
"items":[
"string"
]
},
"default": null
}
I suggest remote-debugging Connect to see exactly what is failing.

Same problem as the question that you have linked to.
In the source code, you can see this condition.
case ARRAY: {
if (!jsonValue.isArray()) {
throw new DataException("Invalid JSON for array default value: " + jsonValue.toString());
}
The exception is thrown when the schema type is defined, as in your case, as "type":"array", but the payload itself has a null value (or any other non-array value) rather than an actual array, regardless of what you have defined as the schema default. The default is only applied when the items element isn't there at all, not when "items":null.
Other than that, I would suggest a schema like the following: a record object, not just a named array, with a default of an empty array, not null.
{
"type" : "record",
"name" : "Items",
"namespace" : "com.example.avro",
"fields" : [ {
"name" : "item_id",
"type" : {
"type" : "array",
"items" : [ "null", "string" ]
},
"default": []
} ]
}
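A quick way to look for this kind of problem before deploying is to scan the schema for array fields whose default is not a JSON array. The following is a rough sketch of such a check in plain Python; bad_array_defaults is a hypothetical helper that only inspects top-level record fields, and it is not part of any Confluent or Avro library.

```python
import json


def bad_array_defaults(schema_json):
    """Return names of array fields whose default is present but not a JSON array."""
    schema = json.loads(schema_json)
    bad = []
    for field in schema.get("fields", []):
        field_type = field["type"]
        is_array = isinstance(field_type, dict) and field_type.get("type") == "array"
        if is_array and "default" in field and not isinstance(field["default"], list):
            bad.append(field["name"])
    return bad


good_schema = """{"type": "record", "name": "Items", "fields": [
  {"name": "item_id", "type": {"type": "array", "items": ["null", "string"]}, "default": []}
]}"""
null_default_schema = """{"type": "record", "name": "Items", "fields": [
  {"name": "item_id", "type": {"type": "array", "items": ["string"]}, "default": null}
]}"""
print(bad_array_defaults(good_schema))
print(bad_array_defaults(null_default_schema))
```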

PostgreSQL and Kafka Connect integration issue

I am testing JDBC Sink connector to dump records from Kafka to PostgreSQL. Here is the Connector config:
{
"name": "jdbc-sink-postgresql-1",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "role",
"connection.url": "jdbc:postgresql://localhost:5432/postgres?user=&password=",
"auto.create": "false",
"insert.mode": "upsert",
"mode":"incrementing",
"table.name.format":"role",
"pk.mode":"record_value",
"pk.fields":"role_id"
}
}
When I run the connector, I am getting below exception:
java.sql.BatchUpdateException: Batch entry 1 INSERT INTO "role" ("role_id","role_name") VALUES (123,'admin') ON CONFLICT ("role_id") DO UPDATE SET "role_name"=EXCLUDED."role_name" was aborted.
Call getNextException to see the cause.
at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2778)
Any pointers as to what I am missing here? Please let me know if more information is needed.

So, the problem was with the table. This is how I created the table at first:
CREATE TABLE role(
role_id int PRIMARY KEY,
role_name VARCHAR (255) UNIQUE NOT NULL
);
The test data in topic looked like this:
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic role --property schema.registry.url=http://localhost:8081/ --property value.schema='{"type":"record","name":"myRecord","fields": [{"name": "role_id","type": "int"},{"name": "role_name","type": "string"}]}' --key-serializer org.apache.kafka.common.serialization.StringSerializer --value-serializer io.confluent.kafka.serializers.KafkaAvroSerializer --property print.key=true
{"role_id":122, "role_name":"admin"}
{"role_id":123, "role_name":"admin"}
{"role_id":124, "role_name":"admin"}
{"role_id":125, "role_name":"admin"}
{"role_id":126, "role_name":"admin"}
So, when my test data had the same value for role_name field again and again, it violated the unique constraint and hence the error.
What I did:
I dropped the table.
Created a new table without the unique key constraint and the above data was pushed to PostgreSQL without issues.
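The same failure can be reproduced in miniature without Kafka or PostgreSQL. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it assumes a bundled SQLite of 3.24 or newer for the ON CONFLICT clause; load_roles is a hypothetical helper, and PostgreSQL's error text will differ. The point is only that an upsert keyed on role_id does not absorb a violation of a separate UNIQUE constraint on role_name.

```python
import sqlite3

UPSERT = (
    "INSERT INTO role (role_id, role_name) VALUES (?, ?) "
    "ON CONFLICT (role_id) DO UPDATE SET role_name = excluded.role_name"
)


def load_roles(rows, unique_name):
    """Upsert rows into a fresh in-memory table; return the number of rows stored."""
    conn = sqlite3.connect(":memory:")
    name_constraint = "UNIQUE NOT NULL" if unique_name else "NOT NULL"
    conn.execute(
        "CREATE TABLE role (role_id INTEGER PRIMARY KEY, "
        f"role_name VARCHAR(255) {name_constraint})"
    )
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(UPSERT, rows)
        return conn.execute("SELECT COUNT(*) FROM role").fetchone()[0]
    finally:
        conn.close()


rows = [(122, "admin"), (123, "admin"), (124, "admin")]
print(load_roles(rows, unique_name=False))
try:
    load_roles(rows, unique_name=True)
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Dropping the UNIQUE constraint, as described above, is one fix; keeping it and making the test data use distinct role_name values would be another.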
