I am currently configuring a Kafka JDBC sink connector to write my Kafka messages to a Postgres table. Everything works fine except the error handling. Sometimes messages in my topic carry bad data, so a database constraint fails with an expected SQL exception: duplicate key...
I would like to send these bad messages to a DLQ and commit the offset so the connector moves on to the next messages, so I configured the connector with
"errors.tolerance": "all"
"errors.deadletterqueue.topic.name": "myDLQTopicName"
but it does not change anything; the connector retries until it crashes.
Is there another configuration I'm missing? I only saw these two in the Confluent documentation.
(I see in the JDBC connector changelog that error handling in the put stage was implemented in version 10.1.0 (CCDB-192), and I'm using the latest version of the connector, 10.5.1.)
"The Kafka Connect framework provides generic error handling and dead-letter queue capabilities which are available for problems with [de]serialisation and Single Message Transforms. When it comes to errors that a connector may encounter doing the actual pull or put of data from the source/target system, it’s down to the connector itself to implement logic around that."
If duplicate keys are the only kind of bad record you need to deal with, you might consider using upsert for insert.mode.
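As a rough sketch (the connection URL, topic name, and key column id here are placeholders, not from the original question), an upsert configuration for the JDBC sink could look like this:
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"topics": "myTopic",
"connection.url": "jdbc:postgresql://postgres:5432/mydb",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id"
With insert.mode set to upsert, the Postgres dialect issues INSERT ... ON CONFLICT ... DO UPDATE, so a record that would otherwise hit the duplicate-key constraint updates the existing row instead of failing the task.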
Related
Currently I have set up 2 separate connectors running the JDBC Sink Connector to ingest topics produced by the producer into the database. Sometimes I see errors in the logs, which cause the produced messages to fail to get stored in the database.
The errors I constantly see are:
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id:11
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject 'topic-io..models.avro.Topic' not found; error code: 404
Which is true, because TopicRecordName is not supposed to point at this topic but at another topic that I directed it to; it is only supposed to resolve to models.avro.Topic.
I was wondering: since this happens constantly, is there a way to re-ingest those produced records/messages into the database after they were produced? For example, if messages were produced between 12am and 1am and some kind of error in the logs caused the messages from that timeframe not to be consumed, can the configuration or offsets be used to restore them by re-ingesting them into the database? The error is due to the Schema Registry lookup failing to resolve the correct schema. It failed because it read the wrong worker file: one of my worker files has value.converter.value.subject.name.strategy=io.confluent.kafka.serializers.subject.TopicRecordNameStrategy while the other connector does not use that subject name strategy.
Currently, I set consumer.auto.offset.reset=earliest to start reading messages.
Is there a way to get that data back, for example into a file, so I can restore it? I am deploying to production, and data must be consumed into the database at all times without any errors.
Rather than mess with the consumer group offsets, which would eventually cause correctly processed data to be consumed again and duplicated, you could use the dead letter queue configurations to send error records to a new topic, which you'd need to monitor and consume before the topic retention completely drops the events.
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
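As a sketch (the DLQ topic name and replication factor are placeholders, not from the original question), the relevant sink connector settings look like this:
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "jdbc-sink-dlq",
"errors.deadletterqueue.topic.replication.factor": "3",
"errors.deadletterqueue.context.headers.enable": "true",
"errors.log.enable": "true",
"errors.log.include.messages": "true"
Enabling the context headers adds the original topic, partition, offset, and the exception details to each DLQ record's headers, which makes it easier to see which converter/subject lookup failed and to replay the records later.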
one of my worker files has a [different config]
This is why configuration management software is important. Don't modify one server in a distributed system without a process that updates them all. Ansible/Terraform are the most common options if you're not running the connectors in Kubernetes.
What I have: http://prntscr.com/szmkn4
That's the most barebones version of it. Some stuff is going to come later, but for now the issue is that data is properly arriving in my consumer in the form of a JSON string.
I want to put it into a Flink table, which I create with this statement: http://prntscr.com/szmll3
I then check that it got created, just to be sure, and get this: http://prntscr.com/szmn79
Next I want to start it up and check my data with "SELECT * FROM RawData", but I get the following error:
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.kafka.shaded.org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
I assume it's an issue with how I created my table, but am honestly not sure where/what/how.
My publisher's properties in NiFi are:
https://prnt.sc/szoe6z
and
http://prntscr.com/szoeka
If you need any additional information from me, feel free to ask.
Thanks in advance,
Psy
[ERROR] Could not execute SQL statement. Reason: org.apache.flink.kafka.shaded.org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
That likely means that the Kafka bootstrap servers you've specified to Flink cannot have their hostnames resolved on the Flink servers. You'd know if it's a NiFi issue because you'd see errors in the NiFi flow saying it couldn't produce to Kafka. It might be producing to the wrong topic or even the wrong set of brokers if you have multiple Kafka clusters, but the error you posted isn't a NiFi issue.
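For reference, here is a minimal sketch of what the table definition could look like, assuming Flink 1.11+ with the 'kafka' SQL connector (the column list, topic name, and broker address are placeholders; older Flink versions use 'connector.type' and 'connector.properties.*' keys instead):
CREATE TABLE RawData (
  -- columns matching the JSON fields NiFi publishes (placeholder names)
  sensor_id STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'raw-data',
  -- must be a host:port the Flink nodes can actually resolve and reach,
  -- not a hostname that only exists on another machine or inside a container
  'properties.bootstrap.servers' = 'kafka-broker-1:9092',
  'properties.group.id' = 'flink-rawdata',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
The value of 'properties.bootstrap.servers' is what triggers the error you posted when it cannot be resolved from the Flink side.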
Reading the documentation of this connector, I can't find any mention of this characteristic.
So, does this connector guarantee that it won't produce duplicate records under broker crashes or other failures?
Do we have to configure something to get idempotence, the same way we would with any other Kafka producer (enable.idempotence: true)?
The Kafka Connect JDBC source connector is not idempotent at the moment. Here's the relevant KIP-318 and JIRA ticket.
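As a partial mitigation only (this does not make the connector itself idempotent, which is what KIP-318 is about), you can still enable the idempotent producer on the Connect worker so that duplicates caused purely by producer retries are removed; duplicates caused by the connector re-polling rows after a task restart are not covered. A sketch, assuming a distributed worker:
# connect-distributed.properties (worker level): settings prefixed with
# "producer." are passed to the producers that source tasks use
producer.enable.idempotence=true
producer.acks=all
If the worker allows client overrides (connector.client.config.override.policy=All), the same thing can be set per connector with "producer.override.enable.idempotence": "true".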
I'm using Kafka Connect (Confluent distribution) to connect an MQTT broker to a Kafka topic (https://docs.lenses.io/connectors/source/mqtt.html), but when a message arrives that doesn't conform to the expected schema, the connector stops!
How can I prevent this from happening?
I'd also like to handle the error and, for example, keep track of it!
If you are using a ready-made connector, you need to satisfy the expected schema; any error will stop the connector. So the best approach is to identify the schema error from the error message.
If it's impossible to use the existing connector, create your own connector that satisfies your need.
I'm developing my own Kafka sink connector. My deserializer is JsonConverter. However, when someone sends bad JSON data to my connector's topic, I want to skip that record and send it to a specific topic of my company.
My confusion is: I can't find any API that gives me my Connect worker's bootstrap.servers. (I know it's in Confluent's etc directory, but it's not a good idea to hard-code the path to "connect-distributed.properties" just to read bootstrap.servers.)
So my question is: is there another way to conveniently get the value of bootstrap.servers in my connector code?
Instead of trying to send the "bad" records from a SinkTask to Kafka yourself, try to use the dead letter queue feature that was added in Kafka Connect 2.0.
You can configure the Connect runtime to automatically dump records that failed to be processed to a configured topic acting as a DLQ.
For more details, see the KIP that added this feature.
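A sketch of the connector-level settings (the topic name is a placeholder); since the Connect runtime writes the DLQ using the worker's own Kafka connection, your connector code never needs to look up bootstrap.servers at all:
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "my-company-bad-json",
"errors.deadletterqueue.context.headers.enable": "true"
Records that JsonConverter fails to deserialize are then routed to that topic instead of failing the task.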