Read Avro buffer-encoded messages with ConsumeKafka (NiFi) - apache-kafka

I am new to NiFi (and don't have much experience with Kafka), and I am trying to consume the messages that a producer is generating. To do this, I am using the ConsumeKafka processor in NiFi.
The messages are arriving (I can see them in the queue), but when I try to view their content, I can only see it in hex format (for example, when I select the original format view, I just get the message: No viewer is registered for this content type).
The messages the producer is sending are Avro-encoded buffers (this is the reference I followed: https://blog.mimacom.com/apache-kafka-with-node-js/). When I check them with the console consumer, each message has this format:
02018-09-21T08:37:44.587Z #02018-09-21T08:37:44.587Z #
I have read that the UpdateRecord processor can help convert the hex content to plain text, but I can't make it work.
How can I configure this UpdateRecord processor?
Cheers

Instead of ConsumeKafka, it is better to use the ConsumeKafkaRecord processor appropriate to the Kafka version you're using, configure its Record Reader with an AvroReader, and set the Record Writer to the writer of your choice.
Once that's done, you have to configure the AvroReader controller service with a schema registry. You can use AvroSchemaRegistry, where you specify the schema for the Avro messages that you're receiving from Kafka.
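For illustration, a minimal Avro schema you could register in the AvroSchemaRegistry might look like the following; the record and field names here are hypothetical and assume the producer sends a single timestamp string, so adjust them to whatever your producer's schema actually defines:

```json
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example",
  "fields": [
    { "name": "timestamp", "type": "string" }
  ]
}
```

The AvroReader's Schema Access Strategy can then be set to look the schema up by name in that registry.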
A quick look at this tutorial would help you achieve what you want: https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

Related

How to deserialize an Avro message using MirrorMaker?

I want to replicate a Kafka topic to an Azure Event Hub.
The messages are in Avro format and use a schema stored behind a Schema Registry with USER_INFO authentication.
Using a Java client to connect to Kafka, I can use a KafkaAvroDeserializer to deserialize the messages correctly.
But this configuration doesn't seem to work with MirrorMaker.
Is it possible to deserialize the Avro messages using MirrorMaker before sending them?
Cheers
For MirrorMaker 1, the consumer deserializer properties are hard-coded.
Unless you plan on re-serializing the data into a different format when the producer sends it to Event Hubs, you should stick with the default ByteArrayDeserializer.
If you did want to manipulate the messages in any way, that would need to be done with a MirrorMakerMessageHandler subclass.
For MirrorMaker 2, you can use AvroConverter followed by some transforms properties, but ByteArrayConverter would still be preferred for a one-to-one byte copy.
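To make that concrete, here is a rough sketch of a dedicated MirrorMaker 2 configuration for a plain byte-for-byte copy. The cluster aliases, addresses, and topic name are placeholders, and the SASL/security settings that Event Hubs requires are omitted; MirrorMaker 2 uses byte-array converters by default, which is exactly the one-to-one copy recommended above:

```properties
# connect-mirror-maker.properties -- sketch only; aliases, addresses, and topic are placeholders
clusters = source, target
source.bootstrap.servers = source-kafka:9092
target.bootstrap.servers = my-namespace.servicebus.windows.net:9093

# Replicate one topic from "source" to "target"
source->target.enabled = true
source->target.topics = my-avro-topic

# No converter settings: the default byte-array converters copy the Avro
# payload unchanged, so consumers on the target side still need the
# Schema Registry to deserialize it.
# (SASL/security settings required by Event Hubs are omitted here.)
```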

How to monitor 'bad' messages written to a Kafka topic with no schema

I use Kafka Connect to bring data from RabbitMQ into a Kafka topic. The data comes without a schema, so in order to associate a schema with it I use a ksql stream. On top of that stream I create a new topic that now has a defined schema. At the end I load the data into a BigQuery database. My questions: how do I monitor messages that have not passed the stream stage? Does this approach support schema evolution? And if not, how can I use the Schema Registry functionality?
Thanks
use Kafka Connect to take data ... data comes without schema
I'm not specifically familiar with the RabbitMQ connector, but if you use the Confluent converter classes that do use schemas, then the data would have one, although maybe only a string or bytes schema.
If ksql is consuming the schema-less topic, then there's a consumer group associated with that process. You can monitor its lag to know how many messages have not yet been processed by ksql. If ksql is unable to parse a message because it's "bad", then I assume it's either skipped or the stream stops consuming completely; this is likely configurable.
If you've set the output topic format to Avro, for example, then the schema will automatically be registered with the Schema Registry. There will be no evolution until you modify the fields of the stream.
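To make the lag check concrete, here is a minimal Java sketch using the Kafka AdminClient. The broker address and the consumer group id are placeholders; ksqlDB generates its own internal group ids per query, so list the groups first (for example with the kafka-consumer-groups tool) to find the right one:

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class KsqlLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        String groupId = "my-ksql-query-group"; // placeholder: use the group id of your ksql query

        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the group has committed so far
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets(groupId)
                     .partitionsToOffsetAndMetadata().get();

            // Latest offsets of the same partitions
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(latestSpec).all().get();

            // Lag = latest offset - committed offset, per partition
            committed.forEach((tp, om) -> {
                if (om == null) return; // nothing committed yet for this partition
                long lag = latest.get(tp).offset() - om.offset();
                System.out.println(tp + " lag=" + lag);
            });
        }
    }
}
```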

How to send a file from producer to consumer in Kafka using Spring Boot?

I have a Kafka program that sends a single message from the producer, and it is consumed by the consumer successfully. But I have a question: is there any way to send a file instead of a single text message? If yes, how can it be done?
You can send whatever you want in Kafka. Kafka won't infer anything from your data; it handles it as a simple array of bytes. However, be careful about the size of your Kafka records: some cluster parameters, like message.max.bytes (and many others), may have to be updated accordingly.
So to answer your question, just read your file using any kind of I/O reader (depending on your programming language) and send its contents using a bytes or String serializer.
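As an illustration, a minimal sketch with the plain Java client might look like this. The topic name, file path, and broker address are placeholders; the same idea applies with Spring's KafkaTemplate, and the message.max.bytes caveat above still applies for large files:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FileProducer {
    public static void main(String[] args) throws Exception {
        // Read the whole file as raw bytes (placeholder path)
        byte[] fileBytes = Files.readAllBytes(Paths.get("/tmp/report.pdf"));

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Key = file name, value = file content as bytes
            producer.send(new ProducerRecord<>("files-topic", "report.pdf", fileBytes));
            producer.flush();
        }
    }
}
```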
Yannick

Kafka topic with different formats of data

I have written some Avro data to the topic “test-avro” using kafka-avro-console-producer.
Then I wrote some plain-text data to the same topic “test-avro” using kafka-console-producer.
After this, all the data in the topic appeared corrupted. Can anyone explain what caused this to happen?
You simply cannot use the avro-console-consumer (or a Consumer with an Avro deserializer) anymore to read those offsets because it'll assume all data in the topic is Avro and use Confluent's KafkaAvroDeserializer.
The plain console producer pushes non-Avro, UTF-8-encoded strings with the StringSerializer, which will not match the wire format expected by the Avro deserializer.
The only way to get past the bad records is to know which offsets are bad and either wait for them to expire from the topic or reset a consumer group to begin after those messages. Alternatively, you can always use the ByteArrayDeserializer and add conditional logic for parsing your messages to ensure no data loss.
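A rough sketch of that conditional-logic approach in Java, where the topic, group id, and broker address are placeholders: records produced through Confluent's Avro serializer start with a 0x00 magic byte followed by a 4-byte schema id, so checking the first byte is a workable, though not bulletproof, heuristic for separating them from the plain strings.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MixedFormatConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "mixed-format-reader");       // placeholder
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-avro"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, byte[]> record : records) {
                    byte[] value = record.value();
                    if (value == null) {
                        continue; // tombstone / empty value
                    } else if (value.length > 5 && value[0] == 0x0) {
                        // Looks like Confluent Avro wire format (magic byte + 4-byte schema id):
                        // hand these bytes to a Schema Registry aware decoder
                        System.out.println(record.offset() + ": probably Avro, " + value.length + " bytes");
                    } else {
                        // Otherwise treat it as the plain UTF-8 string the console producer wrote
                        System.out.println(record.offset() + ": " + new String(value, StandardCharsets.UTF_8));
                    }
                }
            }
        }
    }
}
```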
tl;dr The producer and consumer must agree on the data format of the topic.

Apache NiFi: Validate the FlowFile data created by ConsumeKafka

I am pretty new to NiFi. We already have a setup in place where we are able to consume the Kafka messages.
In the NiFi UI, I created a processor with ConsumeKafka_0_10. When the messages are published (by a different process), my processor is able to pick up the required data/messages properly.
I go to "Data provenance" and can see that the correct data is received.
However, I want the next step to be a validator that reads the FlowFile from ConsumeKafka and does basic validation (a user-supplied script would be fine).
How do we do that, or which processor works here?
Also, is there any way to convert the FlowFile input into CSV or JSON format?
You have a few options. Depending on the flowfile content format, you can use ValidateRecord with a *Reader record reader controller service configured to validate it. If you already have a script to do this in Groovy/Javascript/Ruby/Python, ExecuteScript is also a solution.
Similarly, to convert the FlowFile content into CSV or JSON, use a ConvertRecord processor with a ScriptedReader and a CSVRecordSetWriter or JsonRecordSetWriter to output the desired format. These processors use the Apache NiFi record structure internally to convert between arbitrary input/output formats with high performance. Further reading is available at blogs.apache.org/nifi and bryanbende.com.