MassTransit Kafka Rider get raw message

I need to get the raw message that was sent to Kafka, for logging. For example, when validation of context.Message fails.
I tried the answer from Is there a way to get raw message from MassTransit?, but it doesn't work: context.TryGetMessage<JToken>() returns null every time.

The Confluent.Kafka client does not expose the raw message data, only the deserialized message type. Therefore, MassTransit has no raw message body to make accessible.

Related

Differentiate target consumer in MassTransit

There are a few consumers listening on one Kafka topic. A message has a parameter by which it can be determined which consumer needs to consume it. What mechanism of MassTransit can be used to implement such a solution?
As explained in the interoperability documentation, MassTransit uses the messageType header to determine which message types are present in the serialized message body. If there are no message types, such as when the RawJson message deserializer is used, it will deliver the message to all registered consumers.
Now, with Kafka, the message type itself is part of the TopicEndpoint configuration, so only that message type is dispatched to the endpoint. With any given serialization format (Avro, JSON, etc.), the experience depends upon whether or not the message types are available.
You could certainly write your own deserializer that uses that parameter to determine which message types are in the message and properly responds to TryGetMessage<T> with the supported types. The best example of that would be either the JsonConsumeContext, or the recently updated RawJsonConsumeContext that now supports transport headers for message identification.
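For reference, a rough sketch of the envelope that MassTransit's JSON deserializer inspects; the namespace, type name, and values below are hypothetical, but the messageType URN array is what drives the dispatch described above:

```json
{
  "messageId": "0b6c6c90-0000-0000-0000-000000000000",
  "messageType": [
    "urn:message:Company.Contracts:OrderSubmitted"
  ],
  "message": {
    "orderId": "12345"
  }
}
```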

Spring Kafka Template send with retry based on different cases

I am using Spring Kafka's KafkaTemplate to send messages asynchronously, with proper error handling via a callback.
I have also configured the Kafka producer with the maximum number of retries (Integer.MAX_VALUE).
However, some errors are related to Avro serialization, and for those a retry won't help. How can I skip retries for those errors while still retrying broker-related issues?
The serialization exception will occur before the message is sent, so the retries property is irrelevant in that case; it only applies when the message is actually sent.
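A minimal sketch of that split, assuming Spring for Apache Kafka 2.x (where KafkaTemplate.send() returns a ListenableFuture) with an Avro value serializer; the AvroSender class and its names are hypothetical. Serialization runs inside send(), before any network I/O, so its failure is thrown synchronously; broker-side failures arrive asynchronously in the callback and are the only ones the retries setting covers:

```java
import org.apache.kafka.common.errors.SerializationException;
import org.springframework.kafka.core.KafkaTemplate;

public class AvroSender {

    private final KafkaTemplate<String, Object> kafkaTemplate;

    public AvroSender(KafkaTemplate<String, Object> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(String topic, String key, Object payload) {
        try {
            // Broker-related failures surface asynchronously in the failure
            // callback and are covered by the producer's retries setting.
            kafkaTemplate.send(topic, key, payload).addCallback(
                result -> System.out.println("Sent: " + result.getRecordMetadata()),
                ex -> System.err.println("Broker-side failure (producer retries apply): " + ex));
        } catch (SerializationException ex) {
            // Thrown before the record ever leaves the client, so no retry
            // would help; log it and move on.
            System.err.println("Serialization failed, not retried: " + ex);
        }
    }
}
```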

kafka use-case for error topics

I'm trying to put a pipeline in place, and I just realized I don't really know why there will be an error and why there will be an error topic. There is some metadata that I will be counting on to be certain values, but other than that, is there anything that is a "typical" Kafka error? I'm not sure what the "typical" Kafka error topic is used for. This is specifically for a streams application. Thanks for any help.
One example of an error topic in a streaming environment would be one that contains messages that failed to abide by their contract. For example: if your incoming events are meant to be in a certain JSON format, your Spark application will first try to parse the event into a class that fits the event's JSON contract.
If it is in the right format, it is parsed and the app continues.
If it is in the incorrect format, the parsing fails and the JSON string is sent to the error topic.
Another use case could be to send the event back to an error topic to be processed at a later time, e.g. after network issues connecting to other services.
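A minimal sketch of that routing with Kafka Streams, assuming string-encoded JSON values; the topic names are hypothetical, and branch() is the pre-2.8 API (newer releases use split() instead):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class ErrorTopicTopology {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void build(StreamsBuilder builder) {
        KStream<String, String> events = builder.stream("incoming-events");

        // Split the stream: records that parse as JSON continue through the
        // pipeline; everything else is shunted to the error topic.
        KStream<String, String>[] branches = events.branch(
            (key, value) -> parses(value), // well-formed JSON
            (key, value) -> true);         // everything else

        branches[0].to("parsed-events");
        branches[1].to("error-events");
    }

    private static boolean parses(String value) {
        try {
            MAPPER.readTree(value);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```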

Unexpected character in Kafka message with Flume

I have an ingestion pipeline using Flume & Kafka: it consumes CSV files, converts the events to JSON in a Flume interceptor, and pushes them to Kafka.
When I log the message before it is sent to Kafka, it's normal, valid JSON. But when consuming the same message from Kafka, I get errors when trying to serialize it, saying it's not valid JSON.
Indeed I have unrecognized chars at the beginning of my message:
e.g. �
I think it comes from the empty header that Flume tries to add to the event when posting to Kafka, but I can't seem to prevent this from happening.
Does anyone know how to completely remove headers from Flume events being sent, or more precisely, how to remove those chars?
This looks like a basic character-encoding issue, e.g. if Kafka runs on Linux while the producer runs on a Windows machine. You might want to triple-check that all machines handle UTF-8 encoded messages.
this post should be your friend.
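If it does turn out to be an encoding mismatch, the first things to check are any String.getBytes() calls on the producing side (without an explicit charset they use the platform default) and the Kafka (de)serializer configuration. A minimal sketch of pinning both ends to UTF-8 explicitly, with a hypothetical bootstrap server and group id; Kafka's StringSerializer and StringDeserializer already default to UTF-8, but setting the encoding properties removes any doubt:

```java
import java.util.Properties;

public class Utf8Config {

    // Producer pinned to UTF-8, regardless of the platform default encoding.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("serializer.encoding", "UTF-8"); // read by StringSerializer
        return props;
    }

    // The matching consumer side.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "csv-json-pipeline"); // hypothetical
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("deserializer.encoding", "UTF-8"); // read by StringDeserializer
        return props;
    }
}
```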

Does Kafka support request response messaging

I am investigating Kafka 0.9 as a hobby project and have completed a few "Hello World" type examples.
I have got to thinking about real-world Kafka applications based on request-response messaging in general, and more specifically how to link a Kafka request message to its response message.
I was thinking along the lines of using a generated UUID as the request message key and employing this request UUID as the associated response message key, much the same mechanism as WebSphere MQ's message correlation ID.
My end-to-end process would be:
1) The Kafka client generates a random UUID and sends a single Kafka request message.
2) The server consumes this request message, then extracts and stores the request UUID value.
3) The server completes a business process using the message payload.
4) The server responds with a response message that uses the stored UUID value from the request message as the response message key.
5) The Kafka client polls the response topic until it either times out or retrieves a message with the original request UUID value as its key.
What I'm concerned about is that one Kafka client's polling will consume other clients' messages from the response topic and increment the offsets, making those other clients fail.
Am I trying to apply Kafka in a use case it was never designed for?
Is it possible to implement request/response messaging in Kafka?
Even though Kafka provides convenience methods to persist the committed offsets for a given consumer group, you're not required to use that behavior and can write your own if you feel the need. Even so, the use of Kafka the way you've described it is a bit awkward for the use case as each client needs to repeatedly search the topic for a specific response. That's inefficient at best.
You could break the problem into two parts, continuing to use Kafka to deliver requests to and responses from your server. The only piece you'd need to add would be some sort of API layer that your clients talk to and which encapsulates the Kafka-specific logic from your clients. This layer would need a local DB (relational or NoSQL) that could store responses by UUID, making it very fast and easy for the API to answer whether a response is available for a specific UUID, as in the sketch below.
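A minimal sketch of that API layer, with an in-memory map standing in for the local DB and a hypothetical "responses" topic; one background consumer drains the topic and indexes responses by correlation UUID, so clients query the store rather than Kafka:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ResponseStore implements Runnable {

    // Stand-in for the local relational/NoSQL DB keyed by request UUID.
    private final Map<String, String> responsesByUuid = new ConcurrentHashMap<>();
    private final KafkaConsumer<String, String> consumer;

    public ResponseStore(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void run() {
        consumer.subscribe(Collections.singletonList("responses"));
        while (true) {
            // Index every response by its key (the request UUID).
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                responsesByUuid.put(record.key(), record.value());
            }
        }
    }

    // The client-facing API calls this to ask "is my response ready yet?".
    public Optional<String> lookup(String uuid) {
        return Optional.ofNullable(responsesByUuid.get(uuid));
    }
}
```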
Easier! You could simply record in ZooKeeper that UUID X should be answered on partition Y, and have the producer that sent that UUID consume partition Y... Does that make sense?
I think you need a well-defined shard key for the service that invokes the request. Your request should contain this shard key and the name of the topic where the response should be posted. You should also create some sort of state machine, and when a message regarding your task arrives, transition to the corresponding state... this would be for a strictly async design.
In theory, you could:
assign an ID to each request message that is supposed to get a result message;
create a hash function that maps this ID to a partition identifier;
when sending the result message, use the same hash function to pick the partition to send it to;
on the requesting side, observe only that given partition.
That would reduce the need to crawl many messages in that topic to filter out the result required by the waiting request handler. A sketch of this idea follows.
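A minimal sketch of that idea, assuming a fixed partition count on the response topic and hypothetical topic names; the server must use the same partitionFor() function when producing its response:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class CorrelatedClient {

    static final String REQUEST_TOPIC = "requests";
    static final String RESPONSE_TOPIC = "responses";
    static final int RESPONSE_PARTITIONS = 8; // must match the topic's actual partition count

    // The hash function shared by client and server: correlation ID -> partition.
    static int partitionFor(String correlationId) {
        return Math.abs(correlationId.hashCode() % RESPONSE_PARTITIONS);
    }

    public static String requestReply(KafkaProducer<String, String> producer,
                                      Properties consumerProps, // enable.auto.commit=false is sensible here
                                      String payload,
                                      Duration timeout) {
        String correlationId = UUID.randomUUID().toString();
        producer.send(new ProducerRecord<>(REQUEST_TOPIC, correlationId, payload));

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            // assign() (not subscribe()) pins this consumer to the one partition
            // its response can land on; responses to other requests that hash to
            // the same partition are skipped by the key check below, and no
            // offsets are committed on anyone's behalf.
            TopicPartition tp = new TopicPartition(RESPONSE_TOPIC, partitionFor(correlationId));
            consumer.assign(Collections.singletonList(tp));

            long deadline = System.currentTimeMillis() + timeout.toMillis();
            while (System.currentTimeMillis() < deadline) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(200))) {
                    if (correlationId.equals(record.key())) {
                        return record.value();
                    }
                }
            }
        }
        return null; // timed out
    }
}
```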