SerializationException: Error serializing Avro message - apache-kafka

I'm using KStreams with an Avro schema, hooked up to the Schema Registry. When I start processing the stream, I get a NullPointerException as follows:
Caused by: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.lang.NullPointerException: null of int in field SCORE_THRSHLD_EXCD of gbl_au_avro
at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:92)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53)
at io.confluent.kafka.streams.serdes.avro.GenericAvroSerializer.serialize(GenericAvroSerializer.java:63)
at io.confluent.kafka.streams.serdes.avro.GenericAvroSerializer.serialize(GenericAvroSerializer.java:39)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:154)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:98)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
I'm not sure what the issue is here. Both the source and the sink are linked to the schema registry, and their schemas are defined correctly.
Can you please suggest what I might be doing wrong?

Related

Schema Registry Issue - Protobuf Unions

We are currently working on a POC that uses a 'oneof' to publish multiple event types to the same topic. However, we get a serialization exception when publishing to the union Kafka topics.
We are creating a union protobuf schema that references the other event schemas through the oneof. These event schemas use imports coming from Google (such as google/type/date.proto) that can't be added as references while evolving schemas in the registry.
We are currently using Schema Registry version 6.1.1 and are not sure whether that is the cause or whether this is simply how it works. The error we are facing is below for reference. We are not sure if any additional setup or configuration is needed in such a scenario. We would appreciate some advice on this!
org.apache.kafka.common.errors.SerializationException: Error serializing Protobuf message
at io.confluent.kafka.serializers.protobuf.AbstractKafkaProtobufSerializer.serializeImpl(AbstractKafkaProtobufSerializer.java:106) ~[kafka-protobuf-serializer-6.1.1.jar:na]
Caused by: java.io.IOException: Incompatible schema syntax = "proto3";
ERROR 25580 --- [nio-8090-exec-1] o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [/kafka_producer_ri] threw exception [Request processing failed; nested exception is org.apache.camel.CamelExecutionException: Exception occurred during execution on the exchange: Exchange[3497788AEF9624A-0000000000000000]] with root cause
java.io.IOException: Incompatible schema syntax = "proto3";
Thanks

AVRO serialization exception UTF8\String

I have a Kafka producer and consumer in different services. The consumer code was rolled out and worked fine. Today I rolled out the producer-side changes, and now I get the serialization exception below on the consumer. I use a Confluent Avro schema registry server, which had also been working fine until today.
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 59
Caused by: java.lang.ClassCastException: class org.apache.avro.util.Utf8 cannot be cast to class java.lang.String (org.apache.avro.util.Utf8 is in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader #77bd7fe7; java.lang.String is in module java.base of loader 'bootstrap')
at com.mydev.ret.lib.avro.mark.put(Mark.java:132)
The schema changed as part of this rollout, but that is not the first time it has changed. What might be significant is that we have just moved to avro 1.9.1 and kafka-avro-serializer-6.0.0.
Any ideas? Seeing the String and Utf8 issue makes me think there might be an artifact mismatch between producer and consumer.

Is there a way to use the Kafka Schema Registry without the magic byte?

I'm trying to make my applications work with the Confluent Schema Registry, but at this point I'm not in full control of the producers. You can think of them as legacy applications that are simply not tied to the Confluent products.
I was looking at the Confluent documentation, and it seems every message should include a magic byte and a schema ID in its payload:
https://docs.confluent.io/3.2.0/schema-registry/docs/serializer-formatter.html
or else when I try to consume it I get an error:
[2020-09-25 13:12:09,008] ERROR WorkerSinkTask{id=s3_parquet_connector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:324)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:200)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic com.obj_pos to Protobuf:
at io.confluent.connect.protobuf.ProtobufConverter.toConnectData(ProtobufConverter.java:123)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:491)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Protobuf message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-25 13:12:09,010] ERROR WorkerSinkTask{id=s3_parquet_connector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
My question is whether there is a way to disable this magic byte check, or whether I could create a Kafka Streams application that simply prepends these 5 bytes to the original message, so that afterwards I could consume it with a consumer that connects to the schema registry.
The producer is out of my control, so I need to be able to deserialize messages that do not contain those 5 bytes, because they are produced by producers that don't rely on the Confluent serializers/deserializers.
You wrote: "they are produced by producers that don't rely on the confluent serializers".
Then the problem isn't the Registry.
You shouldn't be using the Converters written by Confluent to consume the messages, as those are bound to the Registry, and there is no way to skip it.
You would instead use the BlueApron ones (assuming the data is really protobuf), or write your own Converter classes.
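For reference, the 5-byte prefix mentioned in the question (one magic byte followed by a 4-byte schema ID) can be sketched with plain JDK classes. This is only an illustration: the class and method names are hypothetical, the schema ID must already exist in the registry for a consumer to resolve it, and for Protobuf the Confluent serializer also writes a message-index list after the schema ID, so prepending five bytes alone would not satisfy ProtobufConverter.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the magic byte + schema ID prefix.
final class ConfluentFraming {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema ID

    // Prepends the magic byte and the schema ID to a raw payload.
    static byte[] addHeader(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // ByteBuffer writes big-endian by default
                .put(payload)
                .array();
    }

    // Heuristic check: long enough for the prefix and starts with the magic byte.
    static boolean hasHeader(byte[] message) {
        return message != null
                && message.length >= HEADER_SIZE
                && message[0] == MAGIC_BYTE;
    }

    // Reads the schema ID from a framed message.
    static int schemaId(byte[] message) {
        if (!hasHeader(message)) {
            throw new IllegalArgumentException("Unknown magic byte!");
        }
        return ByteBuffer.wrap(message, 1, 4).getInt();
    }
}
```

A re-framing stream would call something like `ConfluentFraming.addHeader(id, rawBytes)` on each value and write the result to a new topic, using byte-array serdes on both sides.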

Kafka Streams: Is there a way to ignore specific offsets in a topic partition while writing to another topic?

Background: I used the wrong Avro schema registry while producing to a prod topic, and as a result Kafka Connect went down because of the messages with the wrong schema ID. As a recovery plan, we wanted to copy the messages from the prod topic to a test topic and then write the good messages to HDFS. However, we are facing issues with certain offsets that have the wrong schema ID when reading from the prod topic. Is there a way to ignore such offsets while writing to another topic?
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=xxxx, partition=9, offset=1259032
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 600
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found io.confluent.rest.exceptions.RestNotFoundException: Schema not found
io.confluent.rest.exceptions.RestNotFoundException: Schema not found
You can change the deserialization exception handler to skip over those records, as described in the docs: https://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-records
That is, you set LogAndContinueExceptionHandler in the config via the parameter default.deserialization.exception.handler.
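As a minimal sketch of that setting, the handler can be configured by its fully qualified class name in the Streams properties. The application ID and bootstrap server below are placeholders, and the class name `SkipBadRecordsConfig` is made up for illustration; only the handler key and class come from the answer above.

```java
import java.util.Properties;

// Hypothetical holder for the Streams configuration.
final class SkipBadRecordsConfig {
    static Properties streamsProperties() {
        Properties props = new Properties();
        // Placeholder values; replace with your own application ID and brokers.
        props.put("application.id", "copy-good-records");
        props.put("bootstrap.servers", "localhost:9092");
        // Log records that fail deserialization and continue with the next one,
        // instead of failing the stream thread.
        props.put("default.deserialization.exception.handler",
                "org.apache.kafka.streams.errors.LogAndContinueExceptionHandler");
        return props;
    }
}
```

These properties would then be passed to the KafkaStreams constructor together with the topology that copies the prod topic to the test topic.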

Deserializing exception while consuming from a Kafka Topic

While consuming messages from a Kafka topic, I am getting the below exception repeatedly. Could somebody explain what the exception means and how to avoid it?
Exception stacktrace -
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition TEST-TOPIC1.0-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 61
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:171)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:188)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:330)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:323)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:63)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndID(CachedSchemaRegistryClient.java:118)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:121)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:92)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:54)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:918)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$2600(Fetcher.java:93)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1095)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1200(Fetcher.java:944)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:567)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:528)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1086)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
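See the issue details here:
https://github.com/confluentinc/schema-registry/issues/667
Alternatively, if an Avro schema is missing, removing your Kafka and Confluent data folders can resolve this issue. Note that this wipes the locally stored topic data, so it is only appropriate for a local development setup.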
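The exception message itself suggests seeking past the bad record. With a plain consumer on a client version like the one in this trace, one workaround is to parse the partition and offset out of the message text and then seek to the next offset. The sketch below covers only the parsing step with JDK classes; the class name `PoisonPillLocator` is hypothetical, and it assumes the message format shown in the stack trace above. Newer client versions may expose the partition and offset directly on the exception, so check your version before relying on string parsing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that extracts topic, partition, and offset
// from the SerializationException message text.
final class PoisonPillLocator {
    // Matches "... for partition <topic>-<partition> at offset <offset>"
    private static final Pattern LOCATION =
            Pattern.compile("for partition (.+)-(\\d+) at offset (\\d+)");

    static final class Location {
        final String topic;
        final int partition;
        final long offset;

        Location(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Returns the record location, or null if the message has another format.
    static Location parse(String message) {
        Matcher m = LOCATION.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new Location(
                m.group(1),
                Integer.parseInt(m.group(2)),
                Long.parseLong(m.group(3)));
    }
}
```

In the poll loop you would catch the SerializationException, call `PoisonPillLocator.parse(e.getMessage())`, and then call `consumer.seek(new TopicPartition(loc.topic, loc.partition), loc.offset + 1)` to continue after the unreadable record.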