Correct key-serializer to use for kafka avro - apache-kafka

If I use org.apache.kafka.common.serialization.StringSerializer as the key-serializer in my yml file, the key that gets published to Kafka is correct, but I get a SerializationException (Error deserializing Avro message for id -1) when that message is consumed.
But when I use io.confluent.kafka.serializers.KafkaAvroSerializer instead, I don't get the SerializationException, but leading characters get added to the key. The characters are \u00014H and I have no idea where they came from. I'm using a UUID as the key and the application is in Spring Boot.
What would be the proper serializer to use? The value-serializer I use is io.confluent.kafka.serializers.KafkaAvroSerializer.

The characters are \u00014H and I have no idea where they came from
They came from using a StringDeserializer to consume bytes that were written by the Confluent Avro serializer, which prepends a magic byte and a 4-byte schema ID to every record.
If you only have UUID strings, then you don't need Avro for the key. Kafka has its own UUIDSerializer.
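If the keys really are plain UUIDs, a minimal Spring Boot sketch (assuming Spring for Apache Kafka and a reasonably recent kafka-clients, since UUIDSerializer was added in Kafka 2.1; only the serializer/deserializer classes matter here, the rest of the yml is whatever you already have) would be:

spring:
  kafka:
    producer:
      key-serializer: org.apache.kafka.common.serialization.UUIDSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
    consumer:
      key-deserializer: org.apache.kafka.common.serialization.UUIDDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer

That keeps the key free of the Avro wire-format prefix while the value stays Avro.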

Related

How to deserialize avro message using mirrormaker?

I want to replicate a Kafka topic to an Azure Event Hub.
The messages are in Avro format and use a schema that is behind a Schema Registry with USER_INFO authentication.
Using a Java client to connect to Kafka, I can use a KafkaAvroDeserializer to deserialize the message correctly.
But this configuration doesn't seem to work with MirrorMaker.
Is it possible to deserialize the Avro message using MirrorMaker before sending it?
Cheers
For MirrorMaker 1, the consumer deserializer properties are hard-coded.
Unless you plan on re-serializing the data into a different format when the producer sends data to Event Hubs, you should stick to using the default ByteArrayDeserializer.
If you did want to manipulate the messages in any way, that would need to be done with a MirrorMakerMessageHandler subclass.
For MirrorMaker 2, you can use AvroConverter followed by some transforms properties, but ByteArrayConverter would still be preferred for a one-to-one byte copy, as sketched below.
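As a very rough MirrorMaker 2 sketch (cluster aliases, topic name, and bootstrap addresses are made up, and the exact property names and placement depend on how you run MM2), the byte-for-byte copy would look something like:

clusters = source, target
source.bootstrap.servers = source-kafka:9092
target.bootstrap.servers = eventhub-namespace:9093
source->target.enabled = true
source->target.topics = my-topic
# Copy records as raw bytes, with no Schema Registry lookups or re-serialization
# (shown explicitly here, though a plain byte copy is the usual MM2 behaviour)
key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter = org.apache.kafka.connect.converters.ByteArrayConverter

Consumers on the Event Hub side can then still use KafkaAvroDeserializer, since the bytes are untouched.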

Does Kafka Consumer Deserializer have to match Producer Serializer?

Does the deserializer used by a consumer have to match the serializer used by the producer?
If a producer adds JSON values to messages, could the consumer choose to use the ByteArrayDeserializer, StringDeserializer or JsonDeserializer regardless of which serializer the producer used, or does it have to match?
If they have to match, what is the consequence of using a different one? Would this result in an exception, no data, or something else?
It has to be compatible, not necessarily the matching pair.
ByteArrayDeserializer can consume anything.
StringDeserializer assumes UTF-8 encoded strings and might cast other types to strings upon consumption.
JsonDeserializer will attempt to parse the message and will fail on invalid JSON.
If you used an Avro, Protobuf, Msgpack, etc. binary-format consumer, those also attempt to parse the message and will fail if they don't recognize their respective containers.
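As a rough illustration of "compatible, not matching" (topic name, group id, and bootstrap address are made up): a producer wrote JSON text with StringSerializer, and the consumer below reads it with ByteArrayDeserializer and decodes the bytes itself.

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CompatibleDeserializerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        // ByteArrayDeserializer can consume anything, regardless of the producer's serializer
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("json-topic"));
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
            // The raw bytes happen to be UTF-8 JSON text, so decoding them as a String also works
            records.forEach(r -> System.out.println(new String(r.value(), StandardCharsets.UTF_8)));
        }
    }
}

StringDeserializer or a JSON deserializer would work here too; an Avro deserializer would fail because the bytes are not in the container format it expects.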

Kafka message Value conversion from Integer to String

I have a Kafka message in one Kafka topic. One of the keys of this message is key=ID, and the value of that key is value=12345678910111213141.
The type of this value is integer. I want to convert the type to string.
Currently I am doing this in a somewhat hacky way:
consume the message
convert the type
produce the message to another topic
Is there an easier way to do this?
PS: I don't have access to the first producer, which sends the message as an integer.
If I understand your question correctly, this will not be possible. As far as Kafka is concerned, all data is stored as bytes, and Kafka does not know which serializer was used to generate those bytes.
Therefore, you can only deserialize the value in the same way as it was serialized by the producer. As I understand it, this was done using an IntegerSerializer. But as you do not have access to the producer, you have no option but to read it as an integer and convert it to a String afterwards, along the lines of the sketch below.
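A minimal sketch of that consume-convert-produce loop (topic names and addresses are made up, and it assumes the original producer used Kafka's IntegerSerializer for the value):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IntToStringRepublisher {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "int-to-string");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Must match whatever serializer the original producer used for the value
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, Integer> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("source-topic"));
            while (true) {
                consumer.poll(Duration.ofSeconds(1)).forEach(record ->
                        // Re-publish the same key with the value converted to a String
                        producer.send(new ProducerRecord<>("target-topic", record.key(),
                                String.valueOf(record.value()))));
            }
        }
    }
}

A Kafka Streams topology with mapValues would be another way to express the same thing, but under the hood it is still a consume-and-re-produce.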

Structuring Data Received from Kafka Topic

I have fetched some data from a Kafka topic. I have used this configuration in the yml file:
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
The data I got in the console is
?8Enterprise recommended by AC
Is this Avro data?
And if so, what should I do to convert it to JSON data? Do I need to convert or deserialize?
It depends on the way you generated the message before writing it to the topic.
If you deserialized with the StringDeserializer and your output seems correct, you probably also used the StringSerializer to write the value of the message to the topic.
This is not Avro or structured data. You can use JSON or Avro to serialize data when creating the message and then use the same deserializer when consuming the messages to get the message back as structured data.
If you tell me your programming language, maybe I can give you an example.
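For example, in Java (purely as an illustration, assuming Jackson for the JSON handling and made-up topic and field names), you would serialize the payload to a JSON string before producing, so that your existing StringDeserializer config returns parseable JSON on the consumer side:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonStringProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Turn the payload into a JSON string before it is written to the topic
        String json = new ObjectMapper()
                .writeValueAsString(Map.of("name", "Enterprise", "recommendedBy", "AC"));

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "some-key", json));
        }
    }
}

On the consumer side you can then call new ObjectMapper().readTree(value) on the consumed String to get structured data back.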
I hope it helps.

Unable to read Kafka topic avro messages

Kafka Connect events for the Debezium connector are Avro encoded.
I mentioned the following in the connect-standalone.properties passed to the Kafka Connect standalone service:
key.converter=io.confluent.connect.avro.AvroConverter
value.confluent=io.confluent.connect.avro.AvroConverter
internal.key.converter=io.confluent.connect.avro.AvroConverter
internal.value.converter=io.confluent.connect.avro.AvroConverter
schema.registry.url=http://ip_address:8081
internal.key.converter.schema.registry.url=http://ip_address:8081
internal.value.converter.schema.registry.url=http://ip_address:8081
Configuring the Kafka consumer code with these properties:
Properties props = new Properties();
props.put("bootstrap.servers", "ip_address:9092");
props.put("zookeeper.connect", "ip_address:2181");
props.put("group.id", "test-consumer-group");
props.put("auto.offset.reset","smallest");
//Setting auto commit to false to ensure that on processing failure we retry the read
props.put("auto.commit.offset", "false");
props.put("key.converter.schema.registry.url", "ip_address:8081");
props.put("value.converter.schema.registry.url", "ip_address:8081");
props.put("schema.registry.url", "ip_address:8081");
In the consumer implementation, the following is the code to read the key and value components. I am getting the schema for the key and value from the Schema Registry using REST.
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
return reader.read(null, DecoderFactory.get().binaryDecoder(byteData, null));
Parsing the key worked fine. While parsing the value part of the message, I am getting ArrayIndexOutOfBoundsException.
Downloaded the source code for Avro and debugged. Found that the GenericDatumReader.readInt method is returning a negative value. This value is expected to be the index of an array (symbols) and hence should have been positive.
Tried consuming events using the kafka-avro-standalone-consumer, but it threw an ArrayIndexOutOfBoundsException too. So my guess is that the message is improperly encoded at Kafka Connect (the producer) and the issue is with the configuration.
Following are the questions:
Is there anything wrong with the configuration passed at producer or consumer?
Why is key de-serialization working but not that of value?
Is there anything else needed to be done for things to work? (like specifying character encoding somewhere).
Can Debezium with Avro be used in production, or is it an experimental feature for now? The post on Debezium Avro specifically says that examples involving Avro will be included in the future.
There have been many posts where Avro deserialization threw ArrayIndexOutOfBoundsException, but I could not relate them to the problem I am facing.
I followed the steps in http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html and things are working fine now.
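For anyone hitting the same exception: the key detail on that page is that the Confluent Avro serializer prepends a magic byte and a 4-byte schema ID to every record, so feeding the raw bytes straight into a GenericDatumReader reads from the wrong offsets and can throw ArrayIndexOutOfBoundsException. Letting KafkaAvroDeserializer handle the payload avoids that. A minimal consumer sketch (the topic name is illustrative; addresses follow the question's placeholders):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DebeziumAvroConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "ip_address:9092");
        props.put("group.id", "test-consumer-group");
        // KafkaAvroDeserializer strips the magic byte, fetches the schema by ID
        // from the Schema Registry, and decodes the payload into a GenericRecord
        props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://ip_address:8081");

        try (KafkaConsumer<GenericRecord, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("some-debezium-topic"));
            consumer.poll(Duration.ofSeconds(1)).forEach(r ->
                    System.out.println(r.key() + " -> " + r.value()));
        }
    }
}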