I have over 50 SQL Server source connectors, but two of them are failing. What could the reason be? We have limited access to the Kafka server.
{
"name": "xxxxxxxxxxxxx",
"connector": {
"state": "RUNNING",
"worker_id": "xxxxxxxxxxxxxx:8083"
},
"tasks": [
{
"state": "FAILED",
"trace": "org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)\n\tat org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:44)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:292)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: org.apache.kafka.connect.errors.DataException: Schema required for [updating schema metadata]\n\tat org.apache.kafka.connect.transforms.util.Requirements.requireSchema(Requirements.java:31)\n\tat org.apache.kafka.connect.transforms.SetSchemaMetadata.apply(SetSchemaMetadata.java:64)\n\tat org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:44)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)\n\t... 11 more\n",
"id": 0,
"worker_id": "xxxxxxxxxxxxx:8083"
}
],
"type": "source"
}
Source Connector configurations:
{
"name": "xxxxxxxx",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.history.kafka.topic": "dbhistory.fullfillment.ecom",
"transforms": "unwrap,setSchemaName",
"internal.key.converter.schemas.enable": "false",
"offset.storage.partitons": "2",
"include.schema.changes": "false",
"table.whitelist": "dbo.abc",
"decimal.handling.mode": "double",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms.setSchemaName.schema.name": "com.data.meta.avro.abc",
"database.dbname": "xxxxxx",
"database.user": "xxxxxx",
"database.history.kafka.bootstrap.servers": "xxxxxxxxxxxx",
"database.server.name": "xxxxxxx",
"database.port": "xxxxxx",
"transforms.setSchemaName.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"key.converter.schemas.enable": "false",
"value.converter.schema.registry.url": "http://xxxxxxxxxx:8081",
"internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "xxxxxxx",
"database.password": "xxxxxxx",
"internal.value.converter.schemas.enable": "false",
"internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "xxxxxxxxxxx"
}
}
If you look at the stack trace in the trace field and replace the \n and \t characters with newlines and tabs, you will see:
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:44)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:292)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Schema required for [updating schema metadata]
at org.apache.kafka.connect.transforms.util.Requirements.requireSchema(Requirements.java:31)
at org.apache.kafka.connect.transforms.SetSchemaMetadata.apply(SetSchemaMetadata.java:64)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:44)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
So the cause of your error is thrown from the SetSchemaMetadata Single Message Transform: org.apache.kafka.connect.errors.DataException: Schema required for [updating schema metadata]
I would check the configuration of your connectors, isolate the ones that have failed, and compare their Single Message Transform configuration against the working ones. This issue might be relevant.
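One common way to hit this exact exception is a tombstone (null-value) record reaching SetSchemaMetadata, since a null value has no schema. If that turns out to be the case here, a minimal sketch of a guard, assuming your Connect version supports SMT predicates (Kafka 2.6 or later) and keeping the transform names and schema name from your posted config, would be to apply the transform only to records that are not tombstones:
"transforms": "unwrap,setSchemaName",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.setSchemaName.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.setSchemaName.schema.name": "com.data.meta.avro.abc",
"transforms.setSchemaName.predicate": "isTombstone",
"transforms.setSchemaName.negate": "true",
"predicates": "isTombstone",
"predicates.isTombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
This is only illustrative; if the failing records are not tombstones, then some other record is reaching setSchemaName without a value schema, and the transform and converter settings of the two failing connectors are the place to compare.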
Related
I'm trying to use S3SinkConnector with the following settings:
{
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
"flush.size": 1,
"s3.bucket.name": "*****",
"s3.object.tagging": "true",
"s3.region": "us-east-2",
"aws.access.key.id": "*****",
"aws.secret.access.key": "*****",
"s3.part.retries": 5,
"s3.retry.backoff.ms": 1000,
"behavior.on.null.values": "ignore",
"keys.format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"headers.format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"store.kafka.headers": "true",
"store.kafka.keys": "true",
"topics": "***",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "kafka-backup",
"value.converter": "io.confluent.connect.json.JsonSchemaConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"locale": "en-US",
"timezone": "UTC",
"timestamp.extractor": "Record"
}
The records in Kafka are stored in JSON format and were written via io.confluent.connect.json.JsonSchemaConverter, so all messages have a strict schema.
When the sink connector tries to read records from Kafka, I get the exception "Avro schema must be a record."
I don't understand why I get this error, because I don't use any Avro format.
The full stack trace:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:631)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Avro schema must be a record.
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:124)
at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:150)
at org.apache.parquet.avro.AvroParquetWriter.access$200(AvroParquetWriter.java:36)
at org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:182)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:563)
at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)
at io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:46)
at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:107)
at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:562)
at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:311)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:254)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:205)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)
Version of the S3 connector: 10.3.0
Version of Kafka Connect: 7.0.1
You either need to stop using ParquetFormat, or you need to produce Avro. ParquetFormat requires Avro (see the source of the S3 sink).
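For example, if the topic should stay in JSON Schema format, a minimal sketch of the first option, keeping the rest of the posted config unchanged, would be to swap the value format class (your keys and headers already use JsonFormat):
"format.class": "io.confluent.connect.s3.format.json.JsonFormat"
Alternatively, keep ParquetFormat and re-produce the topic as Avro, using io.confluent.connect.avro.AvroConverter as the value converter in the sink.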
I have data in a Kafka topic that is Avro-serialised and compressed using the zstd codec. To transfer this data to S3, I have created an S3SinkConnector with the config below:
{
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"s3.region": "ap-south-1",
"topics.dir": "0/test_debezium_sept_12_mon_5/public/test_table",
"flush.size": "10000",
"tasks.max": "1",
"s3.part.size": "67108864",
"timezone": "Asia/Calcutta",
"rotate.interval.ms": "60000",
"locale": "en_GB",
"format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"s3.bucket.name": "zeta-aws-aps1-metis-0-s3-pvt",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"partition.duration.ms": "86400000",
"schema.compatibility": "NONE",
"topics": "cdc_test_debezium_sept_12_mon_5.public.test_table",
"parquet.codec": "gzip",
"connect.meta.data": "true",
"value.converter.schema.registry.url": {{url}},
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"name": "cdc_test_debezium_sept_12_mon_5.public.test_table_cdc_zeta-aws-aps1-metis-0-s3-pvt_ap-south-1_sink",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"path.format": "'date'=YYYY-MM-dd",
"rotate.schedule.interval.ms": "180000",
"timestamp.extractor": "RecordField",
"key.converter.schema.registry.url": "{{url}}",
"timestamp.field": "cdc_source_ts_ms"
}
The above S3SinkConnector fails with the following error:
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic cdc_test_debezium_sept_12_mon_5.public.test_table to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:118)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:492)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:146)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:180)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1106
Caused by: java.io.EOFException
NOTE: If I disable compression on the producer/Kafka side, the S3 connector works fine. The issue occurs only when compression is enabled on the Kafka side.
I'm taking my first steps with Kafka Connect and things are not going as expected.
I created a MongoDB sink connector (Kafka to MongoDB), but it fails to consume the messages with the following error:
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic test_topic to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:114)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
I don't want to use any schema, but for some reason it fails during deserialization, probably because the schema config is missing.
My configuration:
{
"name": "kafka-to-mongo",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max": "1",
"topics": "test_topic",
"connection.uri": "mongodb://mongodb4:27017/test-db",
"database": "auto-resolve",
"value.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"collection": "test_collection",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"document.id.strategy.partial.value.projection.list": "id",
"document.id.strategy.partial.value.projection.type": "AllowList",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
}
}
Example of a message in the topic:
{
"_id": "62a6e4d88c1e1e0011902616",
"type": "alert",
"key": "test444",
"timestamp": 1655104728,
"source_system": "api.test",
"tags": [
{
"type": "aaa",
"value": "bbb"
},
{
"type": "fff",
"value": "rrr"
}
],
"ack": "no",
"main": "nochange",
"all_ids": []
}
Any ideas? Thanks!
I am trying to sink data into PostgreSQL with Kafka Connect, but I am getting an error that the schema does not exist.
Is it possible that the topic name, which includes dots, causes the problem? The error says that the schema "logstash" does not exist, and that is the string up to the first dot.
ERROR:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:568)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:326)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: schema "logstash" does not exist
Position: 14
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
... 10 more
Caused by: java.sql.SQLException: org.postgresql.util.PSQLException: ERROR: schema "logstash" does not exist
Position: 14
... 12 more
Sink config:
{
"name": "jdbc.apache.access.log.sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"topics": "logstash.apache.access.log",
"connection.url": "jdbc:postgresql://<IP_OF_POSTGRESQL>:5432/kafka",
"connection.user": "kafka",
"connection.password": "<PASSWORD>",
"insert.mode": "upsert",
"pk.mode": "kafka",
"auto.create": true,
"auto.evolve": true,
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true"
}
}
Schema (retrieved via the Schema Registry API):
{
"subject": "logstash.apache.access.log-value",
"version": 3,
"id": 3,
"schema": "{\"type\":\"record\",\"name\":\"log\",\"namespace\":\"value_logstash.apache.access\",\"fields\":[{\"name\":\"clientip\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"verb\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"response\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"request\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"bytes\",\"type\":[\"null\",\"string\"],\"default\":null}]}"
}
EDITED:
I tried creating a new topic with underscores, and it looks like the dots really are the cause of the error. Is there a way to avoid this, or did I make a mistake in my configuration?
You should be able to use a RegexRouter SMT to rename the topic to something without periods before the sink writes to the database.
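A minimal sketch of that, added to the sink config above (the transform alias renameTopic is just illustrative), could look like this:
"transforms": "renameTopic",
"transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex": "logstash\\.apache\\.access\\.log",
"transforms.renameTopic.replacement": "logstash_apache_access_log"
The JDBC sink derives the table name from the rewritten topic, so with auto.create enabled it should create and write to logstash_apache_access_log instead of trying to resolve logstash as a PostgreSQL schema.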
The initial sync works as expected, but then the connector just stops and does not pick up further table changes. No errors are thrown and the connector is still marked as active and running.
Database: Amazon Postgres v10.7
Debezium config:
"name": "postgres_cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "...",
"database.port": "5432",
"database.user": "...",
"database.password": "...",
"database.dbname": "...",
"database.server.name": "...",
"table.whitelist": "public.table1,public.table2,public.table3",
"plugin.name": "pgoutput",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "unwrap, route, extractId",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": false,
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "[^.]+\\.[^.]+\\.(.+)",
"transforms.route.replacement": "postgres_$1",
"transforms.extractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field": "id"
}
}
Any thoughts about what the problem could be?
Edit:
Log errors:
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to flush, timed out while waiting for producer to flush outstanding 75687 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)