Confluent Kafka REST Proxy - Avro Deserialization

I am trying to use the Confluent Kafka REST Proxy to retrieve data in Avro format from one of my topics, but I get a deserialization error. I am querying the REST Proxy with the following command:
curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
  "http://localhost:8082/consumers/my-group/instances/my-consumer/records?timeout=30000"
and I get this response:
{
"error_code": 50002,
"message": "Kafka error: Error deserializing key/value for partition input-0 at offset 0. If needed, please seek past the record to continue consumption."
}
The logs on the Kafka REST Proxy server show:
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition input-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The data were produced using KafkaAvroSerializer and the schema is registered in the Schema Registry. Note also that the data are readable using kafka-avro-console-consumer on the CLI.
Does anybody know how to resolve this issue?

Most likely the topic holds invalid messages alongside the valid Avro ones. That's what this error means, and it is exactly the error I got when I tried to consume a non-Avro message locally with the REST Proxy:
ERROR Unexpected exception in consumer read task id=io.confluent.kafkarest.v2.KafkaConsumerReadTask#2e20d4f3 (io.confluent.kafkarest.v2.KafkaConsumerReadTask)
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition avrotest-0 at offset 2. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
I would use a tool such as kafkacat to inspect the actual messages at the offset given in the error, e.g.:
kafkacat -C -b localhost:9092 -t test_topic_avro -o 0 -c 1
The -o 0 will consume the message at offset 0, and -c 1 means consume just one message.
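A Confluent-serialised Avro message always starts with a magic byte 0x00 followed by a four-byte schema ID; that framing is what the deserialiser checks first. Assuming hexdump is available, you can inspect the raw bytes of the suspect record like so:
kafkacat -C -b localhost:9092 -t test_topic_avro -o 0 -c 1 | hexdump -C
If the output doesn't begin with 00, the record wasn't written by KafkaAvroSerializer, which is exactly what Unknown magic byte! is telling you.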
You can also seek past the problematic offset, e.g. for topic avrotest move the offset to 1 (using HTTPie here):
echo '{ "offsets": [ { "topic": "avrotest", "partition": 0, "offset": 1 } ] }' | \
  http POST localhost:8082/consumers/rmoff_consumer_group/instances/rmoff_consumer_instance/positions \
  Content-Type:application/vnd.kafka.v2+json
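If you don't have HTTPie installed, the equivalent call with curl (same consumer group and instance names) would be:
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
  --data '{ "offsets": [ { "topic": "avrotest", "partition": 0, "offset": 1 } ] }' \
  http://localhost:8082/consumers/rmoff_consumer_group/instances/rmoff_consumer_instance/positions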

Until recently it wasn't supported to have String keys and Avro values in the REST Proxy:
https://github.com/confluentinc/kafka-rest/issues/210
The code was merged only recently; the issue is still open and the docs haven't been fully updated:
https://github.com/confluentinc/kafka-rest/pull/797

Related

Error while consuming AVRO Kafka Topic from KSQL Stream

I created some dummy data as a stream in ksqlDB with
VALUE_FORMAT='JSON' TOPIC='MYTOPIC'
The setup runs via Docker Compose: a Kafka broker, Schema Registry, ksqlDB CLI, ksqlDB server, and ZooKeeper.
Now I want to consume these records from the topic.
My first (and so far only) approach was via the command line, with the following command:
docker run --net=host --rm confluentinc/cp-schema-registry:5.0.0 kafka-avro-console-consumer \
  --bootstrap-server localhost:29092 --topic DXT --from-beginning --max-messages 10 \
  --property print.key=true --property print.value=true \
  --value-deserializer io.confluent.kafka.serializers.KafkaAvroDeserializer \
  --key-deserializer org.apache.kafka.common.serialization.StringDeserializer
But that just returns this error:
[2021-04-22 21:45:42,926] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
I also tried different approaches in Java Spring, but to no avail. I just cannot consume the created topics.
If I need to define my own schema, where should I do that, and what would be the easiest way, given that I just created a stream in ksqlDB?
Is there an easy-to-follow example? I did not specify anything else when I created the stream, just like in the quickstart example on ksqldb.io. (I added the Schema Registry to my deployment.)
As I am a noob who has been sitting here for almost 10 hours, any help would be appreciated.
Edit: I found that pure JSON does not need the Schema Registry with ksqlDB. But how do I deserialize it?
If you've written JSON data to the topic then you can read it with the kafka-console-consumer.
The error you're getting (Error deserializing Avro message for id -1…Unknown magic byte!) is because you're using the kafka-avro-console-consumer which attempts to deserialise the topic data as Avro - which it isn't, hence the error.
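For example, a minimal invocation reusing the broker address and topic from your setup (flags as in your earlier command):
kafka-console-consumer --bootstrap-server localhost:29092 --topic DXT \
  --from-beginning --max-messages 10 --property print.key=true
No value deserializer needs to be specified; the default string deserializer prints the JSON as-is.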
You can also use PRINT DXT; from within ksqlDB.

Strange behaviour when using AvroConverter in Confluent Sink Connector

I was running the Confluent (v5.5.1) S3 sink connector with the config below:
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://registryurl",
"value.converter.value.subject.name.strategy":"io.confluent.kafka.serializers.subject.RecordNameStrategy",
......
and got errors like these in the log:
DEBUG Sending GET with input null to http://registryurl/schemas/ids/309?fetchMaxId=false
DEBUG Sending POST with input {......} to http://registryurl/MyRecordName?deleted=true
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro value schema version for id 309
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
Two questions baffle me here:
1. Why does the sink connector send an additional POST request to the Schema Registry, given that it's just a consumer? I have successfully received messages using a standard Kafka consumer, which only sends a GET request to the Schema Registry.
2. As per these docs and the official doc, the schema subject should look like SubjectNamingStrategy-value or -key. Judging by the log, however, the request is not suffixed with "-value". I have tried all 3 strategies and found that only the default TopicNameStrategy works as expected (see the naming sketch below).
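For reference, my reading of the naming rules, illustrated with a hypothetical topic orders and record type com.example.MyRecord:
TopicNameStrategy        -> orders-key / orders-value
RecordNameStrategy       -> com.example.MyRecord (no -key/-value suffix)
TopicRecordNameStrategy  -> orders-com.example.MyRecord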
I'd appreciate it if anyone could shed some light here.
Thanks a lot

How to ignore error result in Kafka Connect Elasticsearch

I am trying to run Kafka Connect for Elasticsearch.
But by mistake I wrote a bad record into the Kafka topic.
I have since fixed that and am inserting correct values, but Elasticsearch is still throwing an error on the earlier record in the topic.
Here is the error:
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'lambdaDemo0': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"lambdaDemo0-9749-0e710000fd04"; line: 1, column: 13]
Is there any way I can ignore the older record in the topic and tell Kafka Connect to pick up the latest records?
When I try to delete the topic I get "topic marked for deletion", but the records are still present in the topic.
I tried the two properties below, but they don't seem to work:
drop.invalid.message=true
behavior.on.malformed.documents=ignore
Please suggest how I can clean up the bad record in the topic.
You can tell Kafka Connect to just skip bad records:
errors.tolerance = all
Optionally, you can route these messages to another topic (known as a dead letter queue) for inspection by adding:
errors.tolerance = all
errors.deadletterqueue.topic.name = my-dlq-topic
These settings are valid for Kafka Connect with any connector that is failing in the serialisation/deserialisation stage of processing. For more information see this article.
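As a sketch, the relevant fragment of a sink connector's JSON config might look like the below (the DLQ topic name is an example; errors.deadletterqueue.topic.replication.factor defaults to 3, so on a single-broker dev cluster it needs lowering to 1):
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "my-dlq-topic",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true"
Enabling the context headers makes each DLQ record carry headers describing why it failed, which helps when inspecting the queue.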

Error retrieving Avro schema for id 1, Subject not found.; error code: 40401

Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 1
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
Confluent Version 4.1.0
I am consuming data from a couple of topics (topic_1, topic_2) using KTable, joining the data, and then pushing it onto another topic (topic_out) using KStream (KTable.toStream()).
The data is in Avro format.
When I check the registered subjects using
curl -X GET http://localhost:8081/subjects/
I find
topic_1-value
topic_1-key
topic_2-value
topic_2-key
topic_out-value
but there is no subject topic_out-key. Why is it not created?
output from topic_out:
kafka-avro-console-consumer --bootstrap-server localhost:9092 --from-beginning --property print.key=true --topic topic_out
"code1 " {"code":{"string":"code1 "},"personid":{"string":"=NA="},"agentoffice":{"string":"lic1 "},"status":{"string":"a"},"sourcesystem":{"string":"ILS"},"lastupdate":{"long":1527240990138}}
I can see the key being generated, but there is no subject for the key.
Why is a subject for the key required?
I am feeding this topic to another connector (hdfs-sink) to push the data to HDFS, but it fails with the error below:
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 5
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
When I look at the schema-registry logs, I can see:
[2018-05-24 15:40:06,230] INFO 127.0.0.1 - - [24/May/2018:15:40:06 +0530] "POST /subjects/topic_out-key?deleted=true HTTP/1.1" 404 51 9 (io.confluent.rest-utils.requests:77)
Any idea why the subject topic_out-key is not being created?
Because the key of your Kafka Streams output is a plain String, not an Avro-encoded string.
You can verify that by using kafka-console-consumer instead and adding --property print.value=false: you should see no special characters, whereas the same command printing the value will show them (the special characters are the binary Avro).
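Concretely, something like this (assuming a default local broker address) should print only the keys, which will come out as readable strings:
kafka-console-consumer --bootstrap-server localhost:9092 --topic topic_out \
  --from-beginning --property print.key=true --property print.value=false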
From Kafka Connect, you must use Kafka's StringConverter class for the key.converter property rather than the Confluent Avro one.
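A sketch of the relevant converter settings for the hdfs-sink connector (the registry URL is a placeholder):
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081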

Kafka Streams: Is there a way to ignore specific offsets in a topic partition while writing to another topic?

Background: I used the wrong Avro schema registry while producing to a prod topic, and as a result Kafka Connect went down because of the messages with a wrong schema ID. As a recovery plan, we want to copy the messages from the prod topic to a test topic and then write the good messages to HDFS. But we are facing issues with certain offsets that have a wrong schema ID when reading from the prod topic. Is there a way to ignore such offsets while writing to another topic?
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=xxxx, partition=9, offset=1259032
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 600
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found
io.confluent.rest.exceptions.RestNotFoundException: Schema not found
You can change the deserialization exception handler to skip over those records, as described in the docs: https://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-records
I.e., you set LogAndContinueExceptionHandler in the config via the parameter default.deserialization.exception.handler.
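As a sketch, that single line in the Streams configuration would be (fully-qualified class name from the Kafka Streams API):
default.deserialization.exception.handler=org.apache.kafka.streams.errors.LogAndContinueExceptionHandler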