AVRO serialization exception Utf8/String - apache-kafka

I have a Kafka producer and consumer in different services. The consumer code was rolled out and worked fine; then today I rolled out the producer-side changes and now get the serialization exception below on the consumer. I use a Confluent AVRO schema registry server, which had also been working fine until today.
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 59
Caused by: java.lang.ClassCastException: class org.apache.avro.util.Utf8 cannot be cast to class java.lang.String (org.apache.avro.util.Utf8 is in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader #77bd7fe7; java.lang.String is in module java.base of loader 'bootstrap')
at com.mydev.ret.lib.avro.mark.put(Mark.java:132)
As part of this the schema has changed, but that is not the first time that has happened. What might be significant is that we have just moved to Avro 1.9.1 AND kafka-avro-serializer 6.0.0.
Any ideas? Seeing the String and Utf8 issue makes me think there might be an artifact mismatch between producer and consumer.
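For reference, this cast typically fails because generated code expects java.lang.String while the decoder hands back org.apache.avro.util.Utf8 (for example when producer and consumer are built against different Avro artifacts or string-type settings). A minimal, hypothetical sketch of a defensive read from a GenericRecord; the class and field names here are made up:
import org.apache.avro.generic.GenericRecord;

public final class AvroStrings {
    private AvroStrings() {}

    // Avro may return org.apache.avro.util.Utf8 (a CharSequence) rather than
    // java.lang.String for string fields; converting via toString() works for both.
    public static String asString(GenericRecord record, String fieldName) {
        Object value = record.get(fieldName); // Utf8, String, or null
        return value == null ? null : value.toString();
    }
}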

Related

How to resolve java.lang.IllegalArgumentException: Unsupported Avro type

private KafkaTemplate<String, KafkaMessage> kafkaTemplate;
Message<KafkaMessage> message = MessageBuilder
        .withPayload(kafkaMessage)
        .setHeader(KafkaHeaders.TOPIC, targetTopic)
        .setHeader(KafkaHeaders.MESSAGE_KEY, "someStringValue")
        .setHeader("X-Custom-Header", headerCreator.generateHeader(source, type)).build();
ListenableFuture<SendResult<String, KafkaMessage>> listenableFuture = kafkaTemplate.send(message);
This is my code, and the exception occurs at the send method. The exception is:
java.lang.IllegalArgumentException: Unsupported Avro type. Supported types are null, Boolean, Integer, Long, Float, Double, String, byte[] and IndexedRecord
Assuming that the Kafka topic is expecting an AVRO-serialized object, you can add the "avro-maven-plugin" plugin to the project POM and let Maven generate the AVRO classes for you.
This plugin reads the AVRO schema files and automatically generates the POJO classes once the project is built. If a schema contains an error or is not valid, you will be warned before executing any code.
The KafkaTemplate should use this generated POJO instead of KafkaMessage.
I recommend reading How to Use Schema Registry and Avro in Spring Boot Applications for a complete producer and consumer example using Confluent components, covering the overall project configuration (SerDes, schema registry, etc.).
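For illustration, a minimal sketch of the producer side once the plugin has generated a SpecificRecord class (here a hypothetical PaymentEvent) and the Confluent KafkaAvroSerializer plus schema.registry.url are configured for the template:
// Minimal sketch, assuming avro-maven-plugin has generated a SpecificRecord
// class named PaymentEvent (hypothetical) and that spring-kafka and the
// Confluent KafkaAvroSerializer are on the classpath.
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;

public class AvroProducer {

    private final KafkaTemplate<String, PaymentEvent> kafkaTemplate;

    public AvroProducer(KafkaTemplate<String, PaymentEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(PaymentEvent event, String targetTopic) {
        // The payload is the generated Avro POJO, not a hand-written KafkaMessage,
        // so KafkaAvroSerializer recognizes it as an IndexedRecord.
        Message<PaymentEvent> message = MessageBuilder
                .withPayload(event)
                .setHeader(KafkaHeaders.TOPIC, targetTopic)
                .setHeader(KafkaHeaders.MESSAGE_KEY, "someStringValue")
                .build();
        kafkaTemplate.send(message);
    }
}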

Implementing custom AvroConverter for confluent kafka-connect-s3

I am using Confluent's Kafka S3 connector to copy data from Apache Kafka to AWS S3.
The problem is that the Kafka data is in AVRO format but was NOT written with Confluent Schema Registry's Avro serializer, and I cannot change the Kafka producer. So I need to deserialize the existing Avro data from Kafka and then persist it in Parquet format in AWS S3. I tried using Confluent's AvroConverter as the value converter, like this:
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost/api/v1/avro
And I am getting this error:
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic dcp-all to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:110)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
As far as I understand, "io.confluent.connect.avro.AvroConverter" will only work if the data was written to Kafka using Confluent Schema Registry's Avro serializer, and hence I am getting this error. So my question is: do I need to implement a generic AvroConverter in this case? And if yes, how do I extend the existing source code - https://github.com/confluentinc/kafka-connect-storage-cloud?
Any help here will be appreciated.
You don't need to extend that repo. You just need to implement a Converter (part of Apache Kafka), shade it into a JAR, then place it on your Connect worker's CLASSPATH, like Blue Apron did for Protobuf.
Or see if this works - https://github.com/farmdawgnation/registryless-avro-converter
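If it helps, here is a rough sketch of what such a registry-less Converter could look like, assuming the messages are plain binary-encoded Avro with one fixed reader schema; the "schema.path" config key is made up, and AvroData comes from the Confluent converter artifact:
import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.storage.Converter;

import io.confluent.connect.avro.AvroData;

public class RegistrylessAvroConverter implements Converter {

    private org.apache.avro.Schema avroSchema;
    private AvroData avroData;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // "schema.path" is a made-up config key for this sketch.
        String schemaPath = (String) configs.get("schema.path");
        try {
            avroSchema = new org.apache.avro.Schema.Parser().parse(new File(schemaPath));
        } catch (IOException e) {
            throw new DataException("Cannot load Avro schema from " + schemaPath, e);
        }
        avroData = new AvroData(100); // schema conversion cache size
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        if (value == null) {
            return SchemaAndValue.NULL;
        }
        try {
            GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(avroSchema);
            GenericRecord record =
                reader.read(null, DecoderFactory.get().binaryDecoder(value, null));
            // AvroData maps Avro types to Connect's Schema/Struct model.
            return avroData.toConnectData(avroSchema, record);
        } catch (IOException e) {
            throw new DataException("Failed to deserialize Avro value from topic " + topic, e);
        }
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        // Sink-only sketch; serializing back to Avro is not needed for the S3 sink.
        throw new UnsupportedOperationException("This sketch only converts Kafka -> Connect data");
    }
}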
NOT using Confluent Schema Registry
Then what registry are you using? Each one that I know of has configurations to interface with the Confluent one.

SerializationException: Error serializing Avro message

I'm using KStreams that have an AVRO schema and are hooked up to the schema registry. When I start processing the stream, I get a NullPointerException as follows:
Caused by: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.lang.NullPointerException: null of int in field SCORE_THRSHLD_EXCD of gbl_au_avro
at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:92)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53)
at io.confluent.kafka.streams.serdes.avro.GenericAvroSerializer.serialize(GenericAvroSerializer.java:63)
at io.confluent.kafka.streams.serdes.avro.GenericAvroSerializer.serialize(GenericAvroSerializer.java:39)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:154)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:98)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
Not sure what the issue is here. Both source and sink have the schema registry linked to them and have their schemas defined correctly.
Can you please suggest what I might be doing wrong?
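For illustration, the error message points at the field SCORE_THRSHLD_EXCD being declared as a plain int, so a null value makes GenericDatumWriter throw. A minimal sketch (the schema below is illustrative, not the actual one) where the field is declared as a ["null","int"] union so that null becomes legal:
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class NullableIntExample {
    // Illustrative schema only; field and record names are borrowed from the error message.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"gbl_au_avro\",\"fields\":["
      + "{\"name\":\"SCORE_THRSHLD_EXCD\",\"type\":[\"null\",\"int\"],\"default\":null}"
      + "]}";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord record = new GenericData.Record(schema);
        record.put("SCORE_THRSHLD_EXCD", null); // legal now that the field is a nullable union
        System.out.println(record);
    }
}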

Kafka Streams: Is there a way to ignore specific offsets in a topic partition while writing to another topic

Background: I used the wrong Avro schema registry while producing to the prod topic, and as a result Kafka Connect went down because of the messages with the wrong schema id. So as a recovery plan we wanted to copy the messages in the prod topic to a test topic and then write the good messages to HDFS. But we are facing issues with certain offsets that have the wrong schema id while reading from the prod topic. Is there a way to ignore such offsets while writing to another topic?
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=xxxx, partition=9, offset=1259032
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 600
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found
io.confluent.rest.exceptions.RestNotFoundException: Schema not found
You can change the deserialization exception handler to skip over those records, as described in the docs: https://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-records
I.e., you set LogAndContinueExceptionHandler in the config via the parameter default.deserialization.exception.handler.
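A minimal sketch of that configuration in Java (the application id and bootstrap servers are placeholders):
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

public class StreamsConfigExample {
    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "copy-prod-to-test");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        // Skip records whose key/value cannot be deserialized (e.g. an unknown
        // schema id) instead of failing the stream thread.
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndContinueExceptionHandler.class);
        return props;
    }
}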

Deserializing exception while consuming from a Kafka Topic

While consuming messages from a Kafka topic, I am getting the exception below repeatedly. Could somebody explain what the exception means and how to avoid it?
Exception stacktrace -
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition TEST-TOPIC1.0-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 61
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:171)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:188)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:330)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:323)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:63)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndID(CachedSchemaRegistryClient.java:118)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:92)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:54)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:918)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$2600(Fetcher.java:93)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1095)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1200(Fetcher.java:944)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:567)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:528)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1086)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
Follow the issue details here: https://github.com/confluentinc/schema-registry/issues/667
Optionally, removing your Kafka and Confluent data folders can resolve this issue if there is a missing Avro schema.
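If skipping the bad record is acceptable, the exception message's own suggestion to "seek past the record" can be automated; a minimal sketch assuming kafka-clients 2.7+, where the consumer throws RecordDeserializationException carrying the partition and offset:
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.RecordDeserializationException;

public class SkipPoisonPills {
    public static void pollLoop(KafkaConsumer<String, Object> consumer) {
        while (true) {
            try {
                ConsumerRecords<String, Object> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(record -> {
                    // normal processing here
                });
            } catch (RecordDeserializationException e) {
                // Skip the record that cannot be deserialized (e.g. its schema id
                // is missing from the registry) and continue from the next offset.
                consumer.seek(e.topicPartition(), e.offset() + 1);
            }
        }
    }
}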