JDBC Sink connector throwing java.sql.BatchUpdateException - apache-kafka

I started a Sink JDBC some weeks ago. Everything was fine until the logs started to thow this error:
[2019-06-27 11:35:44,121] WARN Write of 500 records failed, remainingRetries=10 (io.confluent.connect.jdbc.sink.JdbcSinkTask:68)
java.sql.BatchUpdateException: [Teradata JDBC Driver] [TeraJDBC 16.20.00.10] [Error 1338] [SQLState HY000] A failure occurred while executing a PreparedStatement batch request. Details of the failure can be found in the exception chain that is accessible with getNextException.
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:149)
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:138)
at com.teradata.jdbc.jdbc_4.TDPreparedStatement.executeBatchDMLArray(TDPreparedStatement.java:276)
at com.teradata.jdbc.jdbc_4.TDPreparedStatement.executeBatch(TDPreparedStatement.java:2754)
at io.confluent.connect.jdbc.sink.BufferedRecords.flush(BufferedRecords.java:99)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:78)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:62)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:66)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:429)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:179)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have already tried to low the batch.size property, evento as low as 100 and it is still failing.
Added connector status:
{"name":"teradata-sink-K_C_OSUSR_DGL_DFORM_I1-V2",
"connector":{
"state":"RUNNING",
"worker_id":"10.28.148.64:41029"},
"tasks":[{"state":"FAILED","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:451)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:250)
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:179)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)\n","id":0,"worker_id":"10.28.148.64:41029"}]}

I faced similar issue while trying to write to teradata from spark using JDBC. Apparently, this happens with teradata only. It happens when you try to write to one table using multiple JDBC connections parallely. In my case, spark was forking multiple jdbc connection to teradata while writing the data to Teradata.
Your options are:
You have to restrict your application not to make multiple jdbc
connections, you can monitor that from Viewpoint if you have access.
You can try providing a jdbc type=FASTLOAD and see if it is working.
like this
Or go for TPT which handles this issue internally.

Related

Unable to run connect-distributed.sh

I get the below error when I am executing the command.. My kafka messages are not flowing down into hdfs
/bin/connect-distributed.sh /etc/schema-registry/connect-avro-distributed.properties /kafka-connect/quickstart-hdfs.properties
ERROR Failed to start task local-file-source-0 (org.apache.kafka.connect.runtime.Worker:456)
org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.kafka.serializers.subject.TopicNameStrategy for configuration key.subject.name.strategy: Class io.confluent.kafka.serializers.subject.TopicNameStrategy could not be found.
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:718)
at org.apache.kafka.common.config.ConfigDef$ConfigKey.<init>(ConfigDef.java:1063)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:148)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:168)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:207)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:369)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:382)
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDeConfig.baseConfigDef(AbstractKafkaSchemaSerDeConfig.java:153)
at io.confluent.connect.avro.AvroConverterConfig.<init>(AvroConverterConfig.java:27)
at io.confluent.connect.avro.AvroConverter.configure(AvroConverter.java:63)
at org.apache.kafka.connect.runtime.isolation.Plugins.newConverter(Plugins.java:263)
at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:434)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:865)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:110)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:880)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:876)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
connect-distributedonly takes one property file, not two. The quickstart files are meant to be used with connect-standalone
The actual error you're getting about the topic name strategy is likely an issue/bug with the version of the Avro serializers you're using

Is there a way to using kafka schema registry without magic byte?

I'm trying to make my applications work using the schema registry from confluent but at this point I'm not in total control of the producers, you can even see them as legacy applications that simply are not bound to the confluent products.
I was looking at the confluent information and it seems all the messages should include in the payload a Magic Byte and Schema ID
https://docs.confluent.io/3.2.0/schema-registry/docs/serializer-formatter.html
or else when I try to consume it I get an error:
[2020-09-25 13:12:09,008] ERROR WorkerSinkTask{id=s3_parquet_connector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:324)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:200)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic com.obj_pos to Protobuf:
at io.confluent.connect.protobuf.ProtobufConverter.toConnectData(ProtobufConverter.java:123)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Protobuf message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-25 13:12:09,010] ERROR WorkerSinkTask{id=s3_parquet_connector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
my question is, if there is a way of somehow either disable this magic byte check or if I could create a kafka stream that would just append a this 5 bytes to the initial message so that afterwards I could consume it with a consumer that would connect to the schema registry.
What is happening is that the producer is out of my control so I would need somehow to be able to deserialize messages that do not contain those 5 bytes because they are produced by producers that don't rely on the confluent serializers/de-serializers
they are produced by producers that don't rely on the confluent serializers
Then the problem isn't the Registry.
You shouldn't be using the Converters written by Confluent to consume the messages, as those are bound to the Registry, and there is no way to skip it.
You would instead use the BlueApron ones (assuming the data is really protobuf), or write your own Converter classes.

io.debezium.DebeziumException: The db history topic or its content is fully or partially missing

I am facing frequent issues related to db history topic which is created by the connector itself. There is a temporary solution (by changing the name of the db history topic) which I tried but it's not the better way to handle it. Also, the retention byte is set to -1.
This is the error stack.
ERROR WorkerSourceTask{id=cdcit.ventures.sandbox.streamdomain.streamsubdomain.order-filter-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
io.debezium.DebeziumException: The db history topic or its content is fully or partially missing. Please check database history topic configuration and re-execute the snapshot.
at io.debezium.relational.HistorizedRelationalDatabaseSchema.recover(HistorizedRelationalDatabaseSchema.java:47)
at io.debezium.connector.sqlserver.SqlServerConnectorTask.start(SqlServerConnectorTask.java:87)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:101)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:213)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2020-09-04 19:12:26,445] ERROR WorkerSourceTask{id=cdcit.ventures.sandbox.streamdomain.streamsubdomain.order-filter-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
You must use a single database history topic per connector. The topic must not be used by more than one connector.
please change the value of parameter "name" in the config "connector.properties" to a new name.
Thanks.

debezium error: "option "include-unchanged-toast" = "0" is unknown"

I have a question about debezium and I was stuck for more than two weeks.
I use debezium to pull data from remote PostgreSQL every minute, and I always see the below error as output:
"option "include-unchanged-toast" = "0" is unknown".
the environment is as follows:
debezium 0.9.5 final
postgresql 10.8
kafka 2.1.1
the connector configuration is as follows:
name=diansheng340831
slot.name=diansheng340831
connector.class=io.debezium.connector.postgresql.PostgresConnector
database.hostname=*.*.*
database.port=5432
database.user=*
database.password=*
database.dbname=*
database.history.kafka.bootstrap.servers=localhost:9092
database.server.name=diansheng340831
plugin.name=wal2json
table.whitelist=*
transforms=dropFieldByValue
transforms.dropFieldByValue.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.dropFieldByValue.whitelist=after
max.batch.size=5120
snapshot.fetch.size=5120
database.tcpKeepAlive=true
schema.refresh.mode=columns_diff_exclude_unchanged_toast
max.queue.size=202400
snapshot.mode=never
the error is as follows:
ERROR unexpected exception while streaming logical changes (io.debezium.connector.postgresql.RecordsStreamProducer:147)
org.postgresql.util.PSQLException: ERROR: option "include-unchanged-toast" = "0" is unknown
Where: slot "guangdian10831", output plugin "wal2json", in the startup callback
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1116)
at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1035)
at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:41)
at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:155)
at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:124)
at org.postgresql.core.v3.replication.V3PGReplicationStream.read(V3PGReplicationStream.java:70)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.read(PostgresReplicationConnection.java:264)
at io.debezium.connector.postgresql.RecordsStreamProducer.streamChanges(RecordsStreamProducer.java:140)
at io.debezium.connector.postgresql.RecordsStreamProducer.lambda$start$0(RecordsStreamProducer.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-08-22 11:16:23,425] INFO WorkerSourceTask{id=guangdian10831-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:397)
[2019-08-22 11:16:23,426] INFO WorkerSourceTask{id=guangdian10831-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:414)
[2019-08-22 11:16:23,426] ERROR WorkerSourceTask{id=guangdian10831-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: An exception ocurred in the change event producer. This connector will be stopped.
at io.debezium.connector.base.ChangeEventQueue.throwProducerFailureIfPresent(ChangeEventQueue.java:170)
at io.debezium.connector.base.ChangeEventQueue.poll(ChangeEventQueue.java:151)
at io.debezium.connector.postgresql.PostgresConnectorTask.poll(PostgresConnectorTask.java:161)
at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:244)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:220)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Can somebody help me?
I have solved the problem.
I commit the following code in debezium source code and rebuild it.
#Override
public ChainedLogicalStreamBuilder tryOnceOptions(ChainedLogicalStreamBuilder builder) {
// return builder.withSlotOption("include-unchanged-toast", 0); ---original
return builder; ---modified
}

Kafka Connect: No suitable driver found

I am trying Kafka with Postgres Sink using JDBC-sink connector.
Exception:
INFO Unable to connect to database on attempt 1/3. Will retry in 10000 ms. (io.confluent.connect.jdbc.util.CachedConnectionProvider:91)
java.sql.SQLException: No suitable driver found for jdbc:postgresql://localhost:5432/casb
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at io.confluent.connect.jdbc.util.CachedConnectionProvider.newConnection(CachedConnectionProvider.java:85)
at io.confluent.connect.jdbc.util.CachedConnectionProvider.getValidConnection(CachedConnectionProvider.java:68)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:56)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:69)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:495)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:288)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Sink.properties:
name=test-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=fp_test
connection.url=jdbc:postgresql://localhost:5432/casb
connection.user=admin
connection.password=***
auto.create=true
I have set plugin.path=/usr/share/java/kafka-connect-jdbc
On /usr/share/java/kafka-connect-jdbc I have the following files:
kafka-connect-jdbc-4.0.0.jar , postgresql-9.4-1206-jdbc41.jar, sqlite-jdbc-3.8.11.2.jar and some other jars that basically come packaged along with confluent.
I then downloaded postgres-jdbc driver jar postgresql-42.2.2.jar, copied it in the same folder and tried again. Still the same exception.
Kindly help me out with this.
Setting plugin.path=/usr/share/java and CLASSPATH=/usr/share/java/kafka-connect-jdbc/ solved the issue.