How to ignore error result in Kafka Connect Elasticsearch - apache-kafka

I am trying to run kafka connect for elastic search .
But because of some mistake i entered wrong record in kafka topic .
Now i fixed that issue and inserting correct value but elastic search is still throwing error on previous record in the topic
Here is the error
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'lambdaDemo0': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"lambdaDemo0-9749-0e710000fd04"; line: 1, column: 13]
Is there any way i can ignore the older record in the topic and tell kafka connect to pick latest record ?
I am trying to delete the topic i get topic marked for deletion but still records are present in the topic .
I tried below two properties but does seems to be working
drop.invalid.message=true
behavior.on.malformed.documents=ignore
Please suggest how i can clean up the wrong record in the topic

You can tell Kafka Connect to just skip bad records
errors.tolerance = all
Optionally, you can route these messages to another topic (known as a dead letter queue) for inspection by adding
errors.tolerance = all
errors.deadletterqueue.topic.name = my-dlq-topic
These settings are valid for Kafka Connect with any connector that is failing in the serialisation/deserialisation stage of processing. For more information see this article.

Related

How to solve my error in redshiftsinkconnector

I try to connect kafka and redshift in the redshiftsink connector.the connector is running and the task is
enter image description here failed .
Your error - Failed to deserialize data in topic ... to Avro
So, if your data is not Avro, then change your key.converter and/or value.converter to the appropriate config. You need to consult your Producer code for the matching serializers.

Kafka connect S3 source failing with read-only registry

I am trying to read avro records stored in S3 in order to put them back in a kafka topic using the S3 source provided by confluent.
I already have the topics and the registry setup with the right schemas but when the connect S3 source tries to serialize the my records to the topics I get this error
Caused by: org.apache.kafka.common.errors.SerializationException:
Error registering Avro schema: ... at
io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:121)
at
io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:143)
at
io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:84)
... 15 more Caused by:
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException:
Subject com-row-count-value is in read-only mode; error code: 42205
at
io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:495)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:486)
at
io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:459)
at
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:214)
at
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:276)
at
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:252)
at
io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:75)
it seems that the connect producer does not try to get the schema_id if it exists but tries to write it but my registry is readonly.
Anyone knows if this is an issue or there are some configuration I am missing ?
If you're sure the correct schema for that subject is already registered by some other means, you can try to set auto.register.schemas to false in the serializer configuration.
See here for more details: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#handling-differences-between-preregistered-and-client-derived-schemas

Does kafka connect restart failed task

We have a source connector that reads from rdbms and put to kafka. It uses schema registry with avro schema.
I am finding following exceptions in kafka connect log and schema registry log respectively.
1.
Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
WorkerSourceTask{id=A-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
.
.
Caused by: org.apache.kafka.connect.errors.DataException: Failed to serialize Avro data from topic A :
at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:91)
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)
.
.
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema:
.
.
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
.
.
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:187)
Stopping JDBC source task (io.confluent.connect.jdbc.source.JdbcSourceTask:314)
Closing the Kafka producer with timeoutMillis = 30000 ms.
(org.apache.kafka.clients.producer.KafkaProducer:1182)
2.
Wait to catch up until the offset at 1 (io.confluent.kafka.schemaregistry.storage.KafkaStore:304)
Request Failed with exception (io.confluent.rest.exceptions.DebuggableExceptionMapper:62)
io.confluent.kafka.schemaregistry.rest.exceptions.RestSchemaRegistryTimeoutException: Register operation timed out
at io.confluent.kafka.schemaregistry.rest.exceptions.Errors.operationTimeoutException(Errors.java:132)
.
.
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Write to the Kafka store timed out while
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.register(KafkaSchemaRegistry.java:508)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.registerOrForward(KafkaSchemaRegistry.java:553)
.
.
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreTimeoutException: KafkaStoreReaderThread failed to reach target offset within the timeout interval. targetOffset: 3, offsetReached: 1, timeout(ms): 50
0
So basically schema registry before registering schema moves offset to latest and there it time out 500ms.
My question was this.
How can I find why it is not able to read from kafka?
Does the source connector task restart or poll data for the failed task of one connector? Because in later section of the log I see this.
Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
WorkerSourceTask{id=A-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
So eariler it failed after this, but now it is not printing it, which means it passed.
The key thing to note is that when it failed eariler reading, it failed task for only one connector A and others passed. Later I didn't find the exception for the connector A.
If the task is not starting or connector is not polling again, I need to restart task using rest API.
Any help will be greatly appriciated.
Thanks in advance.
Regarding your question title, read the error.
task will not recover until manually restarted
If you have more than one task, you would still expect to see logs from other tasks.
As far as offset commits, source task offsets would not be committed until the task succeeds, and no logs given show something "moving to latest"
The error has nothing to do with reading from Kafka. The error is a timeout in your schema registry client in the AvroConverter, which isn't required for Kafka Connect.

Strange behaviour when using AvroConverter in Confluent Sink Connector

I was running confluent(v5.5.1) s3 sink connector with below config:
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://registryurl",
"value.converter.value.subject.name.strategy":"io.confluent.kafka.serializers.subject.RecordNameStrategy",
......
And got below error in the log like:
DEBUG Sending GET with input null to http://registryurl/schemas/ids/309?fetchMaxId=false
DEBUG Sending POST with input {......} to http://registryurl/MyRecordName?deleted=true
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro value schema version for id 309
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
There are 2 questions that baffles me here:
Why the sink connector sends additional POST request to schema registry given it's just a consumer? And I have successfully received messages when using a standard kafka consumer, which ONLY sends a GET request to the schema registry.
As per this docs and official doc, the schema subject format will be like SubjectNamingStrategy-value or -key. However judging by the log, it does not suffix the request with "-value". I have tried all the 3 strategies and found ONLY the default TopicNameStrategy works as expected.
Appreciated if anyone could shed some light here.
Thanks a lot

Kafka stream : Is there a way to ignore specific offsets in a topic partition while writing to another topic

Background: i used wrong avro schema registry while producing to prod topic and as a result the kafka connect went down because of the messages with wrong schema id.So as a recovery plan we wanted to copy the messages in the prod topic to a test topic and then write the good messages to the hdfs.But we are facing issues with certain offsets that have wrong schema id while reading from prod topic.Is there a way to ignore such offsets while writing to another topic.
Exception in thread "StreamThread-1"
org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value
for record. topic=xxxx, partition=9, offset=1259032
Caused by: org.apache.kafka.common.errors.SerializationException: Error
retrieving Avro schema for id 600
Caused by:
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException:
Schema not found io.confluent.rest.exceptions.RestNotFoundException: Schema not found
io.confluent.rest.exceptions.RestNotFoundException: Schema not found
{code}
You can change the deserialization exception handler to skip over those record as describe in the docs: https://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-records
Ie, you set LogAndContinueExceptionHandler in the config via parameter default.deserialization.exception.handler.