Kafka Connect JDBC Sink "schema does not exist" - apache-kafka

I am trying to sink data into PostgreSQL with Kafka Connect, but I am getting an error saying that the schema does not exist.
Could the topic name, which contains dots, be causing the problem? The error says that the schema "logstash" does not exist, and "logstash" is exactly the part of the topic name up to the first dot.
ERROR:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:568)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:326)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: schema "logstash" does not exist
Position: 14
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
... 10 more
Caused by: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: schema "logstash" does not exist
Position: 14
... 12 more
Sink config:
{
  "name": "jdbc.apache.access.log.sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "topics": "logstash.apache.access.log",
    "connection.url": "jdbc:postgresql://<IP_OF_POSTGRESQL>:5432/kafka",
    "connection.user": "kafka",
    "connection.password": "<PASSWORD>",
    "insert.mode": "upsert",
    "pk.mode": "kafka",
    "auto.create": true,
    "auto.evolve": true,
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "true"
  }
}
Schema (retrieved via the Schema Registry API):
{
"subject": "logstash.apache.access.log-value",
"version": 3,
"id": 3,
"schema": "{\"type\":\"record\",\"name\":\"log\",\"namespace\":\"value_logstash.apache.access\",\"fields\":[{\"name\":\"clientip\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"verb\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"response\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"request\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"bytes\",\"type\":[\"null\",\"string\"],\"default\":null}]}"
}
EDIT:
I tried creating a new topic with underscores instead of dots, and it looks like the dots really are the cause of the error. Is there any way to avoid this, or did I make a mistake in my configuration?

You should be able to use the RegexRouter SMT to rename the topic to something without dots before the sink writes to the database; as the error shows, the part of the topic name before the first dot is otherwise treated as a database schema name.
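For example, the following lines could be added to the sink connector's "config" block above (the transform name renameTopic is just an illustrative label; note that RegexRouter only renames a topic when the regex matches the whole topic name, so the pattern here is written for this specific topic, with the dots escaped for both regex and JSON):
"transforms": "renameTopic",
"transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex": "logstash\\.apache\\.access\\.log",
"transforms.renameTopic.replacement": "logstash_apache_access_log"
With that in place the JDBC sink should create and write to a table named logstash_apache_access_log instead of trying to resolve a logstash schema. Alternatively, the sink's table.name.format property can be set to a fixed, dot-free table name, which sidesteps the topic name entirely.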

Related

Avro schema must be a record

I'm trying to use S3SinkConnector with the following settings:
{
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
"flush.size": 1,
"s3.bucket.name": "*****",
"s3.object.tagging": "true",
"s3.region": "us-east-2",
"aws.access.key.id": "*****",
"aws.secret.access.key": "*****",
"s3.part.retries": 5,
"s3.retry.backoff.ms": 1000,
"behavior.on.null.values": "ignore",
"keys.format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"headers.format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"store.kafka.headers": "true",
"store.kafka.keys": "true",
"topics": "***",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "kafka-backup",
"value.converter": "io.confluent.connect.json.JsonSchemaConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"locale": "en-US",
"timezone": "UTC",
"timestamp.extractor": "Record"
}
The records in Kafka are stored in JSON format and were written there via io.confluent.connect.json.JsonSchemaConverter, so all messages have a strict schema.
When the sink connector tries to read records from Kafka, I get the exception "Avro schema must be a record."
I don't understand why I get this error, because I don't use any Avro format.
The full stack trace:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:631)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Avro schema must be a record.
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:124)
at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:150)
at org.apache.parquet.avro.AvroParquetWriter.access$200(AvroParquetWriter.java:36)
at org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:182)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:563)
at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)
at io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:46)
at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:107)
at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:562)
at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:311)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:254)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:205)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)
Version of the S3 connector: 10.3.0
Version of Kafka Connect: 7.0.1
You need to either not use ParquetFormat or produce Avro; ParquetFormat requires data whose schema converts to an Avro record (see the source of the S3 sink).
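For example, to keep the JSON Schema records as they are and simply write JSON files to S3, only the value format line in the config above would need to change (a minimal sketch; the key and header formats are already JsonFormat):
"format.class": "io.confluent.connect.s3.format.json.JsonFormat"
Staying with ParquetFormat would instead mean producing records whose schemas translate to Avro records, e.g. Avro data read with io.confluent.connect.avro.AvroConverter.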

Kafka Connect Mongo Sink connector is failing due to schema

I'm taking my first steps with Kafka Connect and things are not going as expected.
I created a Mongo sink connector (Kafka to MongoDB), but it fails to consume the messages with the following error:
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic test_topic to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:114)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
I don't want to use any schema, but for some reason it fails during deserialization, probably because some schema configuration is missing.
My configuration:
{
  "name": "kafka-to-mongo",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "connection.uri": "mongodb://mongodb4:27017/test-db",
    "database": "auto-resolve",
    "value.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "collection": "test_collection",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.partial.value.projection.list": "id",
    "document.id.strategy.partial.value.projection.type": "AllowList",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
  }
}
Example of a message in the topic:
{
  "_id": "62a6e4d88c1e1e0011902616",
  "type": "alert",
  "key": "test444",
  "timestamp": 1655104728,
  "source_system": "api.test",
  "tags": [
    {
      "type": "aaa",
      "value": "bbb"
    },
    {
      "type": "fff",
      "value": "rrr"
    }
  ],
  "ack": "no",
  "main": "nochange",
  "all_ids": []
}
Any ideas? Thanks!

org.postgresql.util.PSQLException: ERROR: syntax error

I am trying to add debezium-connector-postgres to my Kafka Connect.
First I validated my config with
PUT http://localhost:8083/connector-plugins/io.debezium.connector.postgresql.PostgresConnector/config/validate
{
"name": "postgres-kafkaconnector",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "example.com",
"database.port": "5432",
"database.dbname": "my_db",
"database.user": "xxx",
"database.password": "xxx",
"database.server.name": "my_db_server",
"table.include.list": "public.products",
"plugin.name": "pgoutput"
}
It returns this, which shows no errors:
{
"name": "io.debezium.connector.postgresql.PostgresConnector",
"error_count": 0,
...
Then I tried to add the connector with
POST http://localhost:8083/connectors
{
  "name": "postgres-kafkaconnector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "example.com",
    "database.port": "5432",
    "database.dbname": "my_db",
    "database.user": "xxx",
    "database.password": "xxx",
    "database.server.name": "my_db_server",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput"
  }
}
The connector was added successfully.
However, I get an error when I run
GET http://localhost:8083/connectors/postgres-kafkaconnector/status
ERROR WorkerSourceTask{id=postgres-kafkaconnector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:179)
io.debezium.jdbc.JdbcConnectionException: ERROR: syntax error
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initPublication(PostgresReplicationConnection.java:180)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationSlot(PostgresReplicationConnection.java:351)
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:136)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:130)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:208)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2565)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2297)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:322)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:322)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:308)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:284)
at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:236)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initPublication(PostgresReplicationConnection.java:137)
... 11 more
How can I see the full log, especially those "... 11 more" lines? I also checked my Kubernetes pod log; it shows the same trace without those lines.
The current error message is not very informative. Any help with further debugging would be appreciated!
UPDATE 1:
GET http://localhost:8083/connectors
returns
[
"postgres-kafkaconnector"
]
Before, we were using Postgres 9.6.12; after switching to Postgres 13.6 with the same setup steps, it works fine this time.
My best guess is that the debezium-connector-postgres version 1.8.1.Final I am using does not work well with the old Postgres 9.6.12. That would fit the stack trace: the failure happens in initPublication, and publications and the pgoutput plugin only exist in PostgreSQL 10 and later, so they cannot work against 9.6.
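For reference, if upgrading had not been an option, a possible workaround (hedged: it assumes a supported logical decoding plugin such as wal2json is installed on the 9.6 server and that the Debezium version in use still supports it) would have been to switch the connector away from pgoutput, e.g.:
"plugin.name": "wal2json"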

Kafka Connector - Tolerance exceeded in error handler

I have over 50 source connectors against SQL Server, but two of them are failing. Please tell me what the reason could be, as we have only limited access to the Kafka server.
{
"name": "xxxxxxxxxxxxx",
"connector": {
"state": "RUNNING",
"worker_id": "xxxxxxxxxxxxxx:8083"
},
"tasks": [
{
"state": "FAILED",
"trace": "org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)\n\tat org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:44)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:292)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: org.apache.kafka.connect.errors.DataException: Schema required for [updating schema metadata]\n\tat org.apache.kafka.connect.transforms.util.Requirements.requireSchema(Requirements.java:31)\n\tat org.apache.kafka.connect.transforms.SetSchemaMetadata.apply(SetSchemaMetadata.java:64)\n\tat org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:44)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)\n\t... 11 more\n",
"id": 0,
"worker_id": "xxxxxxxxxxxxx:8083"
}
],
"type": "source"
}
Source Connector configurations:
{
"name": "xxxxxxxx",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.history.kafka.topic": "dbhistory.fullfillment.ecom",
"transforms": "unwrap,setSchemaName",
"internal.key.converter.schemas.enable": "false",
"offset.storage.partitons": "2",
"include.schema.changes": "false",
"table.whitelist": "dbo.abc",
"decimal.handling.mode": "double",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms.setSchemaName.schema.name": "com.data.meta.avro.abc",
"database.dbname": "xxxxxx",
"database.user": "xxxxxx",
"database.history.kafka.bootstrap.servers": "xxxxxxxxxxxx",
"database.server.name": "xxxxxxx",
"database.port": "xxxxxx",
"transforms.setSchemaName.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"key.converter.schemas.enable": "false",
"value.converter.schema.registry.url": "http://xxxxxxxxxx:8081",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "xxxxxxx",
"database.password": "xxxxxxx",
"internal.value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "xxxxxxxxxxx"
}
}
If you look at the stack trace in the trace field and replace the \n and \t characters with newlines and tabs, you will see:
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:44)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:292)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Schema required for [updating schema metadata]
at org.apache.kafka.connect.transforms.util.Requirements.requireSchema(Requirements.java:31)
at org.apache.kafka.connect.transforms.SetSchemaMetadata.apply(SetSchemaMetadata.java:64)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:44)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
So the cause of your error is thrown inside the SetSchemaMetadata Single Message Transform: org.apache.kafka.connect.errors.DataException: Schema required for [updating schema metadata]. In other words, the transform is receiving records that carry no schema.
I would check the configuration of your connectors, isolate the ones that have failed, and look closely at their Single Message Transform configuration. This issue might be relevant.
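One concrete thing to look for: with "transforms.unwrap.drop.tombstones": "false", Debezium delete tombstones (records with a null value and therefore no value schema) flow into SetSchemaMetadata$Value, which fails in exactly this way. If that turns out to be the cause, a hedged option on Kafka Connect 2.6+ is to guard the transform with a predicate so that tombstones skip it, for example (the predicate name isTombstone is arbitrary):
"predicates": "isTombstone",
"predicates.isTombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone",
"transforms.setSchemaName.predicate": "isTombstone",
"transforms.setSchemaName.negate": "true"
With negate set to true, the transform is applied only to records that are not tombstones.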

S3 sink record field TimeBasedPartitioner not working

I am trying to deploy an S3 sink connector where the S3 partitions need to be based on a field in the message payload, props.eventTime.
Here is my config:
{
"name" : "test_timeBasedPartitioner",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"partition.duration.ms": "3600000",
"s3.region": "us-east-1",
"topics.dir": "dream11",
"flush.size": "50000",
"topics": "test_topic",
"s3.part.size": "5242880",
"tasks.max": "5",
"timezone": "Etc/UTC",
"locale": "en",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"rotate.schedule.interval.ms": "1800000",
"path.format": "'EventDate'=YYYYMMdd",
"s3.bucket.name": "test_bucket",
"partition.duration.ms": "86400000",
"timestamp.extractor": "RecordField",
"timestamp.field": "props.eventTime"
}
Here is a sample JSON message from the Kafka topic:
{
  "eventName": "testEvent",
  "props": {
    "screen_resolution": "1436x720",
    "userId": 0,
    "device_name": "1820",
    "eventTime": "1565792661712"
  }
}
And the exception I am getting is:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:302)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid format: "1564561484906" is malformed at "4906"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at io.confluent.connect.storage.partitioner.TimeBasedPartitioner$RecordFieldTimestampExtractor.extract(TimeBasedPartitioner.java:281)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:199)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:176)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:195)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:524)
... 10 more
Is there something I am missing in the configuration?
Any help would be appreciated.
Your field props.eventTime arrives as a string (e.g. "1565792661712"), so the partitioner never treats it as an epoch-milliseconds number.
You can see this in the stack trace and in the relevant code: when timestamp.field resolves to a STRING, the TimeBasedPartitioner's RecordFieldTimestampExtractor hands the value to org.joda.time's parseMillis (via doParseMillis), which expects a formatted date string, so a raw epoch value like "1564561484906" fails to parse:
Caused by: java.lang.IllegalArgumentException: Invalid format: "1564561484906" is malformed at "4906"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at io.confluent.connect.storage.partitioner.TimeBasedPartitioner$RecordFieldTimestampExtractor.extract(TimeBasedPartitioner.java:281)
You could follow one of these solutions:
Write your own TimestampExtractor that parses the string as epoch milliseconds. You can check how to write a custom TimestampExtractor here.
Change/transform your source data so that the timestamp field is a numeric (long) epoch-milliseconds value instead of a string (see the sketch below).
Follow up on the issues where the default TimestampExtractor's flexibility is discussed, and suggest/contribute support for your use case.
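As a sketch of the second option, the producer would emit eventTime as a JSON number (epoch milliseconds) rather than a string, which the RecordField extractor can read directly as a long:
{
  "eventName": "testEvent",
  "props": {
    "screen_resolution": "1436x720",
    "userId": 0,
    "device_name": "1820",
    "eventTime": 1565792661712
  }
}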