Are Kafka message headers the right place to put the event type name? - apache-kafka

In a scenario where multiple domain event types are produced to a single topic and only a subset of them is consumed by a given consumer, I need a good way to read the event type before taking action.
I see 2 options:
Put the event type (e.g. "ORDER_PUBLISHED") into the message body (payload) itself, which would be a broker-agnostic approach and has other advantages, but it would involve parsing every message just to find out the event type.
Utilize Kafka message headers, which would allow consuming messages without extra payload parsing.
The context is event-sourcing. Small commands, small payloads. There are no huge bodies to parse. Golang. All messages are protobufs. gRPC.
What is the typical workflow in such a scenario?
I tried to google this topic, but didn't find much on header use cases and good practices.
It would be great to hear when and how to use Kafka message headers and when not to.

Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a Kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
Headers are relatively new in Kafka (added in 0.11). Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and Kafka topology. Unfortunately, we don't easily find much on how an event-driven architecture can be mapped with the proper semantics onto elements of the Kafka message broker.
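A minimal sketch of the header-based approach in Go, assuming the segmentio/kafka-go client (confluent-kafka-go exposes an equivalent Headers field); the broker address, topic, key, and group names are made up for illustration:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go" // assumed client library
)

// produceOrderPublished writes a protobuf-encoded event with its type in a
// header, so consumers can route without deserializing the payload.
func produceOrderPublished(ctx context.Context, payload []byte) error {
	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "orders",
	}
	defer w.Close()

	return w.WriteMessages(ctx, kafka.Message{
		Key:   []byte("order-42"), // aggregate ID as key keeps per-entity ordering
		Value: payload,
		Headers: []kafka.Header{
			{Key: "event-type", Value: []byte("ORDER_PUBLISHED")},
		},
	})
}

// consume inspects the event-type header first and only unmarshals payloads
// it actually cares about.
func consume(ctx context.Context) {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "order-projector",
		Topic:   "orders",
	})
	defer r.Close()

	for {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			log.Println("read:", err)
			return
		}
		var eventType string
		for _, h := range m.Headers {
			if h.Key == "event-type" {
				eventType = string(h.Value)
			}
		}
		switch eventType {
		case "ORDER_PUBLISHED":
			// only now unmarshal the protobuf payload and handle it
		default:
			// not interested: skip without parsing the body
		}
	}
}
```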

Related

Differentiate target consumer in MassTransit

There are a few consumers listening on one Kafka topic. A message has a parameter that determines which consumer needs to consume it. What mechanism does MassTransit use to implement such a solution?
As explained in the interoperability documentation, MassTransit uses the messageType header to determine which message types are present in the serialized message body. If there are no message types, such as when the RawJson message deserializer is used, it will deliver the message to all registered consumers.
Now, with Kafka, the type itself is part of the TopicEndpoint configuration, so only that message type is dispatched to the endpoint. Depending upon the serialization (AVRO, Json, etc.) the experience depends upon whether or not the message types are available.
You could certainly write your own deserializer that uses that parameter to determine which message types are in the message and properly respond to TryGetMessage<T> with the supported types. The best example of that would be either the JsonConsumeContext, or the recently updated RawJsonConsumeContext that now supports transport headers for message identification.

SQS: How to forward message to subscriber based on a certain key

I have a validation service which takes in validation-requests and publishes them to a SQS queue. Now based on the type of validation request, I want to forward the message to that specific service.
So basically, I have one producer and multiple consumers, but essentially, one message is to be consumed by only one consumer.
What approach should I use? Should I have a different SQS queue for each service or I can do this using a single queue based on message type?
As I see it, you have three options;
The first option, like you say, is to have a unique consumer for each message type. This is the approach we use, and we have thousands of queues and many different message types.
The second option would be to decorate the message being pushed onto SQS with something that indicates its desired consumer, then have a generic consumer in your application that can forward the message on to the right consumer. This approach is generally seen as an anti-pattern, though, and I would personally agree (a sketch of it follows below).
Thirdly, you could take advantage of SNS filtering, but that's only if you use SNS right now; otherwise you'd have to invest some time to set it up and make it work.
Hope that helps!
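For completeness, here is roughly what the second option (a generic consumer routing on a message attribute) could look like in Go with aws-sdk-go-v2; the queue URL and the "validation-type" attribute name are hypothetical:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := sqs.NewFromConfig(cfg)
	queueURL := "https://sqs.us-east-1.amazonaws.com/123456789012/validation-requests" // hypothetical

	out, err := client.ReceiveMessage(ctx, &sqs.ReceiveMessageInput{
		QueueUrl:              aws.String(queueURL),
		MessageAttributeNames: []string{"validation-type"}, // hypothetical attribute name
		MaxNumberOfMessages:   10,
		WaitTimeSeconds:       20,
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, m := range out.Messages {
		attr, ok := m.MessageAttributes["validation-type"]
		if !ok || attr.StringValue == nil {
			continue // no routing hint; ignore or dead-letter it
		}
		switch *attr.StringValue {
		case "ADDRESS":
			// hand off to the address-validation handler
		case "PAYMENT":
			// hand off to the payment-validation handler
		}
	}
}
```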

Ingesting data from REST api to Kafka

I have many REST APIs to pull data from different data sources, and now I want to publish these REST responses to different Kafka topics. I also want to make sure that duplicate data does not get produced.
Are there any tools available to do this kind of operation?
So, in general, a Kafka processing pipeline should be able to handle messages that are sent multiple times. Exactly-once delivery of Kafka messages is a feature that's only been around since mid-2017 (given that I'm writing this in Jan 2018) and Kafka 0.11, so in general, unless you're on the super bleeding edge with your Kafka installation, your pipeline should be able to handle multiple deliveries of the same message.
That's, of course, your pipeline. Now you have a separate problem: a data source that may deliver the same message multiple times to your HTTP -> Kafka microservice.
Theoretically, you should design your pipeline to be idempotent: multiple applications of the same change message should only affect the data once. This is, of course, easier said than done. But if you manage it, then "problem solved": just send the duplicate messages through and it doesn't matter. This is probably the best thing to aim for, regardless of whatever exactly-once-delivery, CAP-theorem-bending magic KIP-98 does. (And if you don't get why this is super magic, well, here's a homework topic :) )
Let's say your input data is posts about users. If your posted data includes some kind of updated_at date, you could create a transaction-log Kafka topic. Set the key to be the user ID and the values to be all the (say) updated_at fields applied to that user. When you're processing an HTTP POST, look up the user in a local KTable for that topic and check whether your post has already been recorded. If it's already recorded, then don't produce the change into Kafka.
Even without the updated_at field, you could save the user document in the KTable. If Kafka is a stream of transaction-log data (the database turned inside out), then KTables are the stream turned right side out: a database again. If the current value in the KTable (the accumulation of all applied changes) matches the object you were given in your POST, then you've already applied the changes.
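KTables are a Kafka Streams (Java) concept; since the question doesn't specify a language, here is a simplified in-memory stand-in for that lookup in Go, assuming segmentio/kafka-go. A real service would rebuild this state from the compacted topic on startup rather than keep it only in memory:

```go
package main

import (
	"context"
	"sync"
	"time"

	"github.com/segmentio/kafka-go" // assumed client library
)

// lastSeen is a simplified stand-in for the KTable described above: it tracks
// the latest updated_at already produced for each user ID.
type lastSeen struct {
	mu sync.Mutex
	m  map[string]time.Time
}

// shouldProduce reports whether this (userID, updatedAt) pair is new.
func (s *lastSeen) shouldProduce(userID string, updatedAt time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if prev, ok := s.m[userID]; ok && !updatedAt.After(prev) {
		return false // already recorded this change (or an older duplicate)
	}
	s.m[userID] = updatedAt
	return true
}

func ingest(ctx context.Context, w *kafka.Writer, seen *lastSeen,
	userID string, updatedAt time.Time, payload []byte) error {
	if !seen.shouldProduce(userID, updatedAt) {
		return nil // duplicate POST; do not produce the change again
	}
	return w.WriteMessages(ctx, kafka.Message{
		Key:   []byte(userID), // keyed by user ID, as in the transaction-log topic above
		Value: payload,
	})
}
```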

Does Kafka support request response messaging

I am investigating Kafka 0.9 as a hobby project and have completed a few "Hello World"-type examples.
I have got to thinking about real-world Kafka applications based on request-response messaging in general, and more specifically how to link a Kafka request message to its response message.
I was thinking along the lines of using a generated UUID as the request message key and employing this request UUID as the associated response message key. Much the same type of mechanism as WebSphere MQ's message correlation ID.
My end-to-end process would be:
1) The Kafka client generates a random UUID and sends a single Kafka request message.
2) The server consumes this request message and extracts and stores the request UUID value.
3) The server completes a business process using the message payload.
4) The server responds with a response message that uses the stored UUID value from the request message as the response message key.
5) The Kafka client polls the response topic until it either times out or retrieves a message with the original request UUID value.
What I'm concerned about is that the Kafka consumer polling will remove other clients' messages from the response topic and increment the offsets, making other clients fail.
Am I trying to apply Kafka in a use case it was never designed for?
Is it possible to implement request/response messaging in Kafka?
Even though Kafka provides convenience methods to persist the committed offsets for a given consumer group, you're not required to use that behavior and can write your own if you feel the need. Even so, the use of Kafka the way you've described it is a bit awkward for the use case as each client needs to repeatedly search the topic for a specific response. That's inefficient at best.
You could break the problem into two parts, continuing to use Kafka to deliver requests to and responses from your server. The only piece you'd need to add would be some sort of API layer that your clients talk to and which encapsulates the Kafka-specific logic from your clients. This layer would need a local DB (relational or NoSQL) that could store responses by uuid making it very fast and easy for the API to answer whether a response is available for a specific uuid.
Easier! You can simply record in ZooKeeper that UUID X should be answered on partition Y, and make the producer that sent that UUID consume partition Y... Does that make sense?
I think you need a well-defined shard key for the service that invokes the request. Your request should contain this shard key and the name of the topic where the response should be posted. Also, you should create some sort of state machine, and when a message regarding your task comes in, you transition to some state... this would be for a strictly async design.
In theory, you could:
1) assign an ID to each request and message that is supposed to get a result message;
2) create a hash function that maps this ID to a partition identifier;
3) when sending the result message, use the same hash function to get the identifier of the partition to send it to;
4) in the producer of the original request, observe only that given partition.
That would reduce the need to crawl many messages in that topic to filter out the result required by the waiting request handler. (A sketch of this approach follows below.)
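A minimal sketch of the client side of that idea in Go, assuming segmentio/kafka-go and google/uuid, and assuming the server echoes the correlation-id header and writes its reply to the partition named in the reply-partition header (e.g. via a custom Balancer). Topic names, broker address, and partition count are made up; reading only "our" partition also addresses the concern about touching other clients' offsets:

```go
package main

import (
	"context"
	"fmt"
	"hash/crc32"
	"strconv"
	"time"

	"github.com/google/uuid"        // assumed, for request IDs
	"github.com/segmentio/kafka-go" // assumed client library
)

const responsePartitions = 12 // assumed partition count of the "responses" topic

// request sends one request and waits for the matching response.
func request(ctx context.Context, payload []byte) ([]byte, error) {
	corrID := uuid.NewString()
	partition := int(crc32.ChecksumIEEE([]byte(corrID)) % responsePartitions)

	w := &kafka.Writer{Addr: kafka.TCP("localhost:9092"), Topic: "requests"}
	defer w.Close()
	err := w.WriteMessages(ctx, kafka.Message{
		Key:   []byte(corrID),
		Value: payload,
		Headers: []kafka.Header{
			{Key: "correlation-id", Value: []byte(corrID)},
			{Key: "reply-partition", Value: []byte(strconv.Itoa(partition))},
		},
	})
	if err != nil {
		return nil, err
	}

	// Read only our own partition of the response topic, without a consumer
	// group, so we never commit offsets for other clients' replies.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"},
		Topic:     "responses",
		Partition: partition,
	})
	defer r.Close()
	_ = r.SetOffset(kafka.LastOffset) // only wait for replies produced from now on

	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	for {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			return nil, fmt.Errorf("no response for %s: %w", corrID, err)
		}
		for _, h := range m.Headers {
			if h.Key == "correlation-id" && string(h.Value) == corrID {
				return m.Value, nil
			}
		}
	}
}
```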

kafka as event store in event sourced system

This question is similar to Using Kafka as a (CQRS) Eventstore. Good idea?, but more implementation specific.
How can I use Kafka as an event store when I have thousands of event "sources" (aggregate roots in DDD)? As I've read in the linked question and some other places, I'll have problems with a topic per source. If I split events into topics by type, it will be much easier to consume and store, but I need access to the event stream of a particular source. How do I do event sourcing with Kafka?
Post events from all of your sources to a single topic, with a data type (Thrift?) that includes some unique identifier for each event source. Then create consumers for each event type that you are interested in and identify each with a unique consumer group name. This way each unique source consumer will have its own offset value in ZooKeeper. Everybody reads the whole topic but only outputs (or deals with) info from a single source (or group of sources).
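A minimal sketch of what reading one aggregate's stream out of such a single topic could look like in Go, assuming segmentio/kafka-go, the aggregate ID used as the message key, and (for simplicity) a single-partition topic; with thousands of sources a real implementation would rely on key-based partitioning or a compacted index rather than scanning everything:

```go
package main

import (
	"context"

	"github.com/segmentio/kafka-go" // assumed client library
)

// loadAggregateEvents replays the single "events" topic and keeps only the
// events belonging to one aggregate root, identified here by the message key.
func loadAggregateEvents(ctx context.Context, aggregateID string) ([][]byte, error) {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"},
		Topic:     "events",
		Partition: 0, // assumed single-partition topic for this sketch
	})
	defer r.Close()

	var history [][]byte
	for {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			// in this sketch the caller's context deadline marks "caught up";
			// a real implementation would compare offsets against the high watermark
			if ctx.Err() != nil {
				return history, nil
			}
			return nil, err
		}
		if string(m.Key) == aggregateID {
			history = append(history, m.Value)
		}
	}
}
```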