Differentiate target consumer in MassTransit - apache-kafka

There are a few consumers listening on one Kafka topic. Each message has a parameter that determines which consumer should consume it. Which MassTransit mechanism should I use to implement such a solution?

As explained in the interoperability documentation, MassTransit uses the messageType header to determine which message types are present in the serialized message body. If there are no message types, such as when the RawJson message deserializer is used, it will deliver the message to all registered consumers.
Now, with Kafka, the type itself is part of the TopicEndpoint configuration, so only that message type is dispatched to the endpoint. Depending on the serialization (Avro, JSON, etc.), the message types may or may not be available in the message.
You could certainly write your own deserializer that uses that parameter to determine which message types are in the message and responds to TryGetMessage<T> accordingly for the supported types. The best example of that would be either the JsonConsumeContext or the recently updated RawJsonConsumeContext, which now supports transport headers for message identification.
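The idea of a deserializer that answers type queries from a parameter in the message can be sketched in a language-neutral way. The Python below is only a conceptual illustration, not MassTransit code (MassTransit is a .NET library): the class DiscriminatingContext, the "kind" field, the TYPE_BY_KIND table, and both message classes are hypothetical names invented for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class OrderSubmitted:  # hypothetical message type
    order_id: str


@dataclass
class OrderCancelled:  # hypothetical message type
    order_id: str


# Hypothetical mapping from the discriminating parameter to a message type.
TYPE_BY_KIND = {"submitted": OrderSubmitted, "cancelled": OrderCancelled}


class DiscriminatingContext:
    """Decides which message type a raw body carries from its "kind" field."""

    def __init__(self, body: bytes) -> None:
        self._doc = json.loads(body)
        self._type = TYPE_BY_KIND.get(self._doc.get("kind"))

    def try_get_message(self, message_type: Type[T]) -> Optional[T]:
        # Only the type named by the parameter is supported; others get None.
        if self._type is not message_type:
            return None
        return message_type(order_id=self._doc["order_id"])
```

With this shape, every consumer can be offered the message, but only the consumer whose type matches the parameter receives a non-None result.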

Related

MassTransit Kafka Rider get raw message

I need to get the raw message that was sent to Kafka, for logging.
For example, when validation of context.Message fails.
I tried the answer from "Is there a way to get raw message from MassTransit?", but it doesn't work: context.TryGetMessage<JToken>() returns null every time.
The Confluent.Kafka client does not expose the raw message data, only the deserialized message type. Therefore, MassTransit does not have a message body accessible.

Is Kafka message headers the right place to put event type name?

In a scenario where multiple event types from a single domain are produced to a single topic and a consumer consumes only a subset of them, I need a good way to read the event type before taking action.
I see 2 options:
Put the event type (for example "ORDER_PUBLISHED") into the message body (payload) itself, which is a broker-agnostic approach and has other advantages, but requires parsing every message just to learn the event type.
Use Kafka message headers, which allow consuming messages without extra payload parsing.
The context is event sourcing: small commands, small payloads, no huge bodies to parse. The stack is Golang and gRPC, and all messages are protobufs.
What is the typical workflow in such a scenario?
I tried to google this topic but didn't find much on header use cases and good practices.
It would be great to hear when and how to use Kafka message headers, and when not to.
Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
Headers are relatively new in Kafka (record headers were added in version 0.11). Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and Kafka topology. Unfortunately, it is hard to find much on how an event-driven architecture can be mapped with the proper semantics onto elements of the Kafka message broker.
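A minimal sketch of header-based routing on the consumer side follows, in Python rather than the asker's Go, and without any Kafka client so that it runs on its own. It assumes headers arrive as a list of (key, raw bytes) pairs and borrows the "event-type" header name from the example above; the function names event_type and dispatch are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed header shape: a list of (key, raw bytes) pairs.
Headers = List[Tuple[str, bytes]]


def event_type(headers: Optional[Headers], key: str = "event-type") -> Optional[str]:
    """Return the decoded value of the first header named `key`, if any."""
    for name, value in headers or []:
        if name == key and value is not None:
            return value.decode("utf-8")
    return None


def dispatch(
    headers: Optional[Headers],
    payload: bytes,
    handlers: Dict[str, Callable[[bytes], object]],
) -> object:
    """Route the payload to the handler registered for its event type."""
    handler = handlers.get(event_type(headers))
    if handler is None:
        # Not an event this consumer cares about: the payload is never parsed.
        return None
    return handler(payload)
```

The payload is only handed to a handler (and so only parsed) when the header names an event type the consumer has registered, which is the saving that headers offer over a type field in the body.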

SQS: How to forward message to subscriber based on a certain key

I have a validation service which takes in validation requests and publishes them to an SQS queue. Now, based on the type of validation request, I want to forward the message to a specific service.
So basically, I have one producer and multiple consumers, but each message is to be consumed by only one consumer.
What approach should I use? Should I have a different SQS queue for each service, or can I do this using a single queue based on message type?
As I see it, you have three options:
The first option, like you say, is to have a unique consumer (with its own queue) for each message type. This is the approach we use, and we have thousands of queues and many different message types.
The second option would be to decorate the message being pushed onto SQS with something that indicates its desired consumer, then have a generic consumer in your application that forwards the message on to the right consumer. This approach is generally seen as an anti-pattern, though, and I personally agree.
Thirdly, you could take advantage of SNS filtering, but only if you already use SNS; otherwise you'd have to invest some time to set it up and make it work.
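To make the third option concrete, here is a rough local model of attribute-based filtering, written in Python. It covers only exact string matching on message attributes; real SNS filter policies support more operators and are evaluated by SNS itself, not by application code. The queue names and the "validation_type" attribute are hypothetical.

```python
from typing import Dict, List

# Simplified model of a filter policy: attribute name -> list of accepted values.
Policy = Dict[str, List[str]]


def matches(policy: Policy, attributes: Dict[str, str]) -> bool:
    """True when every attribute named in the policy has an accepted value."""
    return all(attributes.get(name) in accepted for name, accepted in policy.items())


def deliver_to(subscriptions: Dict[str, Policy], attributes: Dict[str, str]) -> List[str]:
    """Return the subscribed queues whose policy accepts the message attributes."""
    return [queue for queue, policy in subscriptions.items() if matches(policy, attributes)]


# Hypothetical subscriptions: one queue per consuming service.
SUBSCRIPTIONS: Dict[str, Policy] = {
    "address-validation-queue": {"validation_type": ["address"]},
    "payment-validation-queue": {"validation_type": ["payment", "refund"]},
}
```

In this model the producer publishes once with a validation_type attribute, and each message reaches only the queue whose policy lists that value.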
Hope that helps!

Kafka data types of messages

I was wondering what types of data we can have in Kafka topics.
As far as I know, at the application level a message is a key-value pair, and the key and value can be of any type the language supports.
For example, when we send messages to a topic, can they be JSON, Parquet files, or other serialized data, or can we only work with messages in plain text format?
Thanks for your help.
There are various message formats, depending on whether you are talking about the APIs, the wire protocol, or the on-disk storage.
Some of these Kafka message formats are described in the docs here:
https://kafka.apache.org/documentation/#messageformat
Kafka has the concept of a Serializer/Deserializer or SerDes (pronounced Sir-Deez).
https://en.m.wikipedia.org/wiki/SerDes
A Serializer is a function that takes any message and converts it into the byte array that is actually sent on the wire using the Kafka protocol.
A Deserializer does the opposite: it reads the raw message bytes portion of the Kafka wire protocol and re-creates the message as you want the receiving application to see it.
There are built-in SerDes libraries for Strings, Long, ByteArrays, ByteBuffers and a wealth of community SerDes libraries for JSON, ProtoBuf, Avro, as well as application-specific message formats.
You can build your own SerDes libraries as well; see the following:
How to create Custom serializer in kafka?
On the topic it's always just serialised data. Serialisation happens in the producer before sending and deserialisation in the consumer after fetching. Serializers and deserializers are pluggable, so as you said at application level it's key value pairs of any data type you want.
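As a small illustration of the serializer/deserializer pair described above, the following Python sketch turns an application value into bytes and back using JSON. It does not use any Kafka client; the function names serialize and deserialize are chosen for this example only.

```python
import json
from typing import Any


def serialize(value: Any) -> bytes:
    """Producer side: turn an application value into the bytes stored on the topic."""
    return json.dumps(value, separators=(",", ":"), sort_keys=True).encode("utf-8")


def deserialize(data: bytes) -> Any:
    """Consumer side: rebuild the application value from the fetched bytes."""
    return json.loads(data.decode("utf-8"))
```

The broker only ever sees the bytes in between, so any format works as long as the producer and consumer agree on it.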

Does Kafka support request response messaging

I am investigating Kafka 9 as a hobby project and completed a few "Hello World" type examples.
I have got to thinking about Real World Kafka applications based on request response messaging in general and more specifically how to link a Kafka request message to its response message.
I was thinking along the lines of using a generated UUID as the request message key and using the same UUID as the associated response message key. This is much the same mechanism as the message correlation ID in WebSphere MQ.
My end-to-end process would be:
1. The Kafka client generates a random UUID and sends a single Kafka request message.
2. The server consumes this request message, then extracts and stores the request UUID value.
3. The server completes a business process using the message payload.
4. The server responds with a response message that uses the stored UUID value from the request message as the response message key.
5. The Kafka client polls the response topic until it either times out or retrieves a message with the original request UUID value.
What I am concerned about is that the Kafka consumer polling will remove other clients' messages from the response topic and increment the offsets, making other clients fail.
Am I trying to apply Kafka in a use case it was never designed for?
Is it possible to implement request/response messaging in Kafka?
Even though Kafka provides convenience methods to persist the committed offsets for a given consumer group, you're not required to use that behavior and can write your own if you feel the need. Even so, the use of Kafka the way you've described it is a bit awkward for the use case as each client needs to repeatedly search the topic for a specific response. That's inefficient at best.
You could break the problem into two parts, continuing to use Kafka to deliver requests to and responses from your server. The only piece you'd need to add would be some sort of API layer that your clients talk to and which encapsulates the Kafka-specific logic from your clients. This layer would need a local DB (relational or NoSQL) that could store responses by uuid making it very fast and easy for the API to answer whether a response is available for a specific uuid.
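The API-layer idea can be sketched as follows, in Python. An in-memory dict stands in for the local database, and the Kafka consumer that would feed it is left out; the class ResponseStore and its method names are hypothetical.

```python
import uuid
from typing import Dict, Optional


class ResponseStore:
    """In-memory stand-in for the local DB that indexes responses by request UUID."""

    def __init__(self) -> None:
        self._responses: Dict[str, bytes] = {}

    def new_request_id(self) -> str:
        """Generate the UUID that is sent along with a request."""
        return str(uuid.uuid4())

    def record(self, request_id: str, payload: bytes) -> None:
        """Called by the layer's own consumer for every response it reads."""
        self._responses[request_id] = payload

    def poll(self, request_id: str) -> Optional[bytes]:
        """Called on behalf of a client; hands over the response at most once."""
        return self._responses.pop(request_id, None)
```

Clients ask the layer about their own UUID instead of scanning the response topic, so one client's polling cannot consume another client's response.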
Easier! You can simply record in ZooKeeper that UUID X should be answered on partition Y, and have the producer that sent that UUID consume partition Y. Does that make sense?
I think you need a well-defined shard key for the service that invokes the request. Your request should contain this shard key and the name of the topic where the response should be posted. You should also create some sort of state machine, so that when a message regarding your task arrives you transition to some state. This would be for a strictly async design.
In theory, you could:
assign an ID to each request and message that is supposed to get a result message;
create a hash function that maps this ID to a partition identifier;
when sending the result message, use the same hash function to get the identifier of the partition to send it to;
in the producer of the request, observe only that given partition.
That would reduce the need to crawl many messages in that topic to filter out the result required by the waiting request handler.
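The hash function in the steps above could look like the Python sketch below. It is a standalone stable hash, not Kafka's built-in partitioner, so both the requester and the responder would have to call it and set the partition explicitly; the name partition_for is hypothetical.

```python
import hashlib


def partition_for(request_id: str, num_partitions: int) -> int:
    """Map a request ID to a partition number in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # hashlib is used instead of the built-in hash() because hash() of a str
    # can differ between Python processes, and both sides must agree.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The mapping stays valid only while the partition count of the response topic is unchanged, since the result depends on num_partitions.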