Cannot sink Kafka stream to JDBC: unrecoverable exception

I'm trying to sink a stream of data from an MQTT source (with schemas enabled) to a Microsoft SQL Server database.
I've followed many posts on the matter; regardless, I keep receiving the following error:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:484)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:265)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The MQTT Source configuration is:
connector.class=com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector
key.converter.schemas.enable=true
value.converter.schemas.enable=true
name=schema-mqtt-source
connect.mqtt.kcql=INSERT INTO schema_IoT SELECT * FROM machine/sensor/mytopic/test WITHCONVERTER=`com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter`
value.converter=org.apache.kafka.connect.json.JsonConverter
connect.mqtt.service.quality=0
key.converter=org.apache.kafka.connect.json.JsonConverter
connect.mqtt.hosts=tcp://host:1884
A sample of the data ingested into Kafka is the following:
{
"timestamp": 1526912884265,
"partition": 0,
"key": {
"schema": {
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "topic"
},
{
"type": "string",
"optional": false,
"field": "id"
}
],
"optional": false,
"name": "com.datamountaineer.streamreactor.connect.converters.MsgKey"
},
"payload": {
"topic": "machine/sensor/mytopic/test",
"id": "0"
}
},
"offset": 0,
"topic": "schema_IoT",
"value": {
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "sentOn"
},
{
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "fan"
},
{
"type": "int64",
"optional": false,
"field": "buzzer"
},
{
"type": "int64",
"optional": false,
"field": "light"
},
{
"type": "double",
"optional": false,
"field": "temperature"
},
{
"type": "string",
"optional": false,
"field": "assetName"
},
{
"type": "int64",
"optional": false,
"field": "led"
},
{
"type": "boolean",
"optional": false,
"field": "water"
}
],
"optional": false,
"name": "metrics",
"field": "metrics"
}
],
"optional": false,
"name": "machine_sensor_mytopic_test"
},
"payload": {
"sentOn": 1526913070679,
"metrics": {
"fan": 1,
"buzzer": 0,
"light": 255,
"temperature": 22.296352538102642,
"assetName": "SIMopcua",
"led": 0,
"water": false
}
}
}
}
Finally, the properties of the JDBC sink connector config file are:
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.password=password
topics=schema_IoT
batch.size=10
key.converter.schemas.enable=true
auto.evolve=true
connection.user=username
name=sink-mssql
value.converter.schemas.enable=true
auto.create=true
connection.url=jdbc:sqlserver://hostname:port;databaseName=mydb;user=username;password=mypsd;
value.converter=org.apache.kafka.connect.json.JsonConverter
insert.mode=insert
key.converter=org.apache.kafka.connect.json.JsonConverter
What am I doing wrong? Any help is appreciated.
Fabio

Related

Debezium MS SQL Server produces wrong JSON format not recognized by Flink

I have the following setting (verified using curl connector/connector-name):
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.user": "admin",
"database.dbname": "test",
"database.hostname": "mssql-host",
"database.password": "xxxxxxx",
"database.history.kafka.bootstrap.servers": "server:9092", "database.history.kafka.topic": "dbhistory.test", "value.converter.schemas.enable": "false",
"name": "mssql-cdc",
"database.server.name": "test",
"database.port": "1433",
"include.schema.changes": "false"
}
I was able to pull CDC events into a Kafka topic. They are in the following format:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": true,
"field": "value"
}
],
"optional": true,
"name": "test.dbo.tryme2.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "string",
"optional": true,
"field": "value"
}
],
"optional": true,
"name": "test.dbo.tryme2.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false,incremental"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": true,
"field": "sequence"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "string",
"optional": true,
"field": "change_lsn"
},
{
"type": "string",
"optional": true,
"field": "commit_lsn"
},
{
"type": "int64",
"optional": true,
"field": "event_serial_no"
}
],
"optional": false,
"name": "io.debezium.connector.sqlserver.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": false,
"field": "total_order"
},
{
"type": "int64",
"optional": false,
"field": "data_collection_order"
}
],
"optional": true,
"field": "transaction"
}
],
"optional": false,
"name": "test.dbo.tryme2.Envelope"
},
"payload": {
"before": null,
"after": {
"id": 777,
"value": "xxxxxx"
},
"source": {
"version": "1.8.1.Final",
"connector": "sqlserver",
"name": "test",
"ts_ms": 1647169350996,
"snapshot": "true",
"db": "test",
"sequence": null,
"schema": "dbo",
"table": "tryme2",
"change_lsn": null,
"commit_lsn": "00000043:00000774:000a",
"event_serial_no": null
},
"op": "r",
"ts_ms": 1647169350997,
"transaction": null
}
}
In Flink, when I created a source table using the topic, I get:
Caused by: java.io.IOException: Corrupt Debezium JSON message
I already have "value.converter.schemas.enable": "false"; why doesn't this work?
Just found out that the configuration is hierarchical, meaning you have to supply both value.converter and value.converter.schemas.enable at the connector level to override the Kafka Connect worker configuration.
I sincerely wish there were some sort of validation, so I did not have to wonder for hours.
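For example, applying the override at the connector level means adding both keys to the "config" block shown above (a minimal sketch showing only the two relevant keys):
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"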
Also, if the schema is desired, there is a Flink configuration option hidden in the docs:
In order to interpret such messages, you need to add the option 'debezium-json.schema-include' = 'true' into above DDL WITH clause (false by default). Usually, this is not recommended to include schema because this makes the messages very verbose and reduces parsing performance.
I have to say this is a really bad developer experience.

org.apache.kafka.connect.transforms.ReplaceField does not work

The documentation I used: https://docs.confluent.io/platform/current/connect/transforms/replacefield.html
I use this connector to rename the PersonId column to Id by using org.apache.kafka.connect.transforms.ReplaceField and setting renames to PersonId:Id:
{
"name": "SQL_Connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "hostname",
"database.port": "1433",
"database.user": "user",
"database.password": "password",
"database.dbname": "sqlserver",
"database.server.name": "server",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.test",
"transforms": "RenameField,addStaticField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "PersonId:Id",
"transforms.addStaticField.type":"org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addStaticField.static.field":"table",
"transforms.addStaticField.static.value":"changedtablename",
}
}
But when I check the value in the topic, the PersonId field has not been renamed:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "PersonId"
}
],
"optional": true,
"name": "test.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "PersonId"
}
],
"optional": true,
"name": "test.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "string",
"optional": true,
"field": "change_lsn"
},
{
"type": "string",
"optional": true,
"field": "commit_lsn"
},
{
"type": "int64",
"optional": true,
"field": "event_serial_no"
}
],
"optional": false,
"name": "io.debezium.connector.sqlserver.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"field": "table"
}
],
"optional": false,
"name": "test.Envelope"
},
"payload": {
"before": null,
"after": {
"PersonId": 1,
},
"source": {
"version": "1.0.3.Final",
"connector": "sqlserver",
"name": "test",
"ts_ms": 1627628793596,
"snapshot": "true",
"db": "test",
"schema": "dbo",
"table": "TestTable",
"change_lsn": null,
"commit_lsn": "00023472:00000100:0001",
"event_serial_no": null
},
"op": "r",
"ts_ms": 1627628793596,
"table": "changedtablename"
}
}
How do I rename the field?
You can only rename fields that are at the top level of the Kafka record, as the example in the docs shows.
That being said, you will need to extract the after field first.
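A minimal sketch of one way to do that, assuming Debezium's io.debezium.transforms.ExtractNewRecordState SMT is available on the worker: it unwraps the Debezium envelope so the columns of after become top-level fields, and only then are ReplaceField and InsertField applied (transforms run left to right):
"transforms": "unwrap,RenameField,addStaticField",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "PersonId:Id",
"transforms.addStaticField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addStaticField.static.field": "table",
"transforms.addStaticField.static.value": "changedtablename"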

Need primary key information in Debezium connector for postgres insert events

I am using the Debezium connector for Postgres with Kafka Connect.
For an insert row event written to Kafka by the connector, I need information about which columns are primary keys and which are not. Is there a way to achieve this?
Here is a sample insert event generated in Kafka:
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "int32",
"optional": false,
"field": "bucket_type"
}
],
"optional": true,
"name": "postgresconfigdb.config.alert_configs.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "int32",
"optional": false,
"field": "bucket_type"
}
],
"optional": true,
"name": "postgresconfigdb.config.alert_configs.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "int64",
"optional": true,
"field": "txId"
},
{
"type": "int64",
"optional": true,
"field": "lsn"
},
{
"type": "int64",
"optional": true,
"field": "xmin"
}
],
"optional": false,
"name": "io.debezium.connector.postgresql.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": false,
"field": "total_order"
},
{
"type": "int64",
"optional": false,
"field": "data_collection_order"
}
],
"optional": true,
"field": "transaction"
}
],
"optional": false,
"name": "postgresconfigdb.config.alert_configs.Envelope"
},
"payload": {
"before": null,
"after": {
"id": 1100,
"bucket_type": 10
},
"source": {
"version": "1.2.0.Final",
"connector": "postgresql",
"name": "postgresconfigdb",
"ts_ms": 1599830887858,
"snapshot": "true",
"db": "configdb",
"schema": "config",
"table": "alert_configs",
"txId": 2139888,
"lsn": 379356048,
"xmin": null
},
"op": "r",
"ts_ms": 1599830887859,
"transaction": null
}
}
Here the columns in the table are 'id' and 'bucket_type', whose values are reported under the JSON path payload->after.
The column-specific 'optional' boolean field tells you which columns are nullable, but there is no information about which columns are primary keys ('id' in this case).
You find the information about which fields are primary-key columns in the Kafka record key.
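For illustration, a sketch of what that key typically looks like for the sample above, assuming the default key built from the table's primary-key columns (the exact schema name may differ): the key struct contains only the primary-key fields, here just id.
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": false,
        "field": "id"
      }
    ],
    "optional": false,
    "name": "postgresconfigdb.config.alert_configs.Key"
  },
  "payload": {
    "id": 1100
  }
}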

While consuming from Kafka in Druid, roll-up merges two rows into 1 instead of adding them

I am trying to use Druid to consume events from Kafka; however, when I use roll-up to ingest the data, the number of events comes out wrong. Without roll-up the numbers are accurate. I am using Druid 0.17.1.
I have observed that while roll-up is happening, instead of aggregating the events to n it aggregates them to 1.
Here is my ingestion spec:
{
"dataSchema": {
"dataSource": "notificationstatus",
"timestampSpec": {
"column": "date",
"format": "yyyy-MM-dd-HH:mm:ss Z",
"missingValue": null
},
"dimensionsSpec": {
"dimensions": [{
"type": "string",
"name": "Process",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "Channel",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "Status",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "Message",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "CampaignID",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "BannerID",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
}
],
"dimensionExclusions": [
"date",
"count"
]
},
"metricsSpec": [{
"type": "count",
"name": "count"
}],
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true,
"intervals": null
},
"transformSpec": {
"filter": {
"type": "not",
"field": {
"type": "like",
"dimension": "Status",
"pattern": "INFO",
"escape": null,
"extractionFn": null
}
},
"transforms": []
}
},
"ioConfig": {
"topic": "notificationstatus",
"inputFormat": {
"type": "tsv",
"columns": [
"source",
"ymd",
"date",
"Process",
"deviceID",
"Channel",
"CampaignID",
"BannerID",
"Status",
"Message",
"11",
"12"
],
"listDelimiter": null,
"delimiter": "\t",
"findColumnsFromHeader": false,
"skipHeaderRows": 0
},
"replicas": 1,
"taskCount": 1,
"taskDuration": "PT3600S",
"consumerProperties": {},
"pollTimeout": 100,
"startDelay": "PT5S",
"period": "PT30S",
"useEarliestOffset": false,
"completionTimeout": "PT1800S",
"lateMessageRejectionPeriod": null,
"earlyMessageRejectionPeriod": null,
"lateMessageRejectionStartDateTime": null,
"stream": "notificationstatus",
"useEarliestSequenceNumber": false,
"type": "kafka"
},
"tuningConfig": {
"type": "kafka",
"maxRowsInMemory": 1000000,
"maxBytesInMemory": 0,
"maxRowsPerSegment": 5000000,
"maxTotalRows": null,
"intermediatePersistPeriod": "PT10M",
"basePersistDirectory": "/home/akash/Downloads/druidVer/apache-druid-0.17.1/var/tmp/druid-realtime-persist622909873559398926",
"maxPendingPersists": 0,
"indexSpec": {
"bitmap": {
"type": "concise"
},
"dimensionCompression": "lz4",
"metricCompression": "lz4",
"longEncoding": "longs"
},
"indexSpecForIntermediatePersists": {
"bitmap": {
"type": "concise"
},
"dimensionCompression": "lz4",
"metricCompression": "lz4",
"longEncoding": "longs"
},
"buildV9Directly": true,
"reportParseExceptions": false,
"handoffConditionTimeout": 0,
"resetOffsetAutomatically": false,
"segmentWriteOutMediumFactory": null,
"workerThreads": null,
"chatThreads": null,
"chatRetries": 8,
"httpTimeout": "PT10S",
"shutdownTimeout": "PT80S",
"offsetFetchPeriod": "PT30S",
"intermediateHandoffPeriod": "P2147483647D",
"logParseExceptions": false,
"maxParseExceptions": 2147483647,
"maxSavedParseExceptions": 0,
"skipSequenceNumberAvailabilityCheck": false,
"repartitionTransitionDuration": "PT120S"
},
"type": "kafka"
}
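One thing worth checking, as a sketch rather than a confirmed diagnosis: with roll-up enabled, the count metric defined in metricsSpec stores how many raw events were merged into each rolled-up row, so the total number of ingested events has to be obtained by summing that metric at query time instead of counting rows. A native query against the spec above might look like this (the interval is a placeholder):
{
  "queryType": "timeseries",
  "dataSource": "notificationstatus",
  "granularity": "hour",
  "intervals": ["2020-04-01/2020-04-02"],
  "aggregations": [
    {
      "type": "longSum",
      "name": "events",
      "fieldName": "count"
    }
  ]
}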

Kafka JDBC sink not handling null values

I am trying to insert data with the Kafka JDBC Sink connector, but it returns this exception:
org.apache.kafka.connect.errors.DataException: Invalid null value for required INT64 field
The records have the following schema:
[
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"field": "ID"
},
{
"type": "int64",
"field": "TENANT_ID"
},
{
"type": "string",
"field": "ITEM"
},
{
"type": "int64",
"field": "TIPO"
},
{
"type": "int64",
"field": "BUSINESS_CONCEPT"
},
{
"type": "string",
"field": "ETIQUETA"
},
{
"type": "string",
"field": "VALOR"
},
{
"type": "string",
"field": "GG_T_TYPE"
},
{
"type": "string",
"field": "GG_T_TIMESTAMP"
},
{
"type": "string",
"field": "TD_T_TIMESTAMP"
},
{
"type": "string",
"field": "POS"
}
]
},
"payload": {
"ID": 298457,
"TENANT_ID": 83,
"ITEM": "0-0-0",
"TIPO": 4,
"BUSINESS_CONCEPT": null,
"ETIQUETA": "¿Cuándo ha ocurrido?",
"VALOR": "2019-05-31T10:33:00Z",
"GG_T_TYPE": "I",
"GG_T_TIMESTAMP": "2019-05-31 14:35:19.002221",
"TD_T_TIMESTAMP": "2019-06-05T10:46:55.0106",
"POS": "00000000530096832544"
}
}
]
As you can see, the field BUSINESS_CONCEPT can be null. It is the only null value, so I suppose the exception is due to that field. How could I make the sink insert the value as null?
You need to change the definition of
{
  "type": "int64",
  "field": "BUSINESS_CONCEPT"
}
to
{
  "type": "int64",
  "optional": true,
  "field": "BUSINESS_CONCEPT"
}
in order to treat BUSINESS_CONCEPT as an optional field. Note that in Kafka Connect's JSON schema format, nullability is expressed with "optional": true; Avro-style union types such as ["null", "int64"] are not valid here.