Kafka Stream/Table Join and Message Headers

I have spent a whole day googling and experimenting, and I believe the current state of things is that message headers are accessible from the Processor API.
I wanted to access the Processor API from the DSL, so I implemented a ValueTransformerSupplier; from there I have access to the ProcessorContext, which gives me access to the headers of the stream record.
But here's the kicker...
I am doing a stream/table join, and the header I want to access is on the table record, NOT the message header from the stream, which is what the ProcessorContext holds.
So, is there a way to access the headers on the message that is represented in the KTable from the Stream/Table join?
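For what it's worth, here is a minimal sketch of the transformValues approach described above, which is how the stream-side headers become reachable from the DSL. The topic names and the header key are placeholders, not anything from the question:

```java
import org.apache.kafka.common.header.Header;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class StreamHeaderAccess {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("input-topic"); // hypothetical topic

        // transformValues() drops down to the Processor API, so init() receives the
        // ProcessorContext; note it only carries the stream-side record's headers.
        KStream<String, String> enriched = stream.transformValues(
            () -> new ValueTransformer<String, String>() {
                private ProcessorContext context;

                @Override
                public void init(ProcessorContext context) {
                    this.context = context;
                }

                @Override
                public String transform(String value) {
                    Header h = context.headers().lastHeader("my-header"); // hypothetical header key
                    return (h == null ? "missing" : new String(h.value())) + ":" + value;
                }

                @Override
                public void close() {}
            });

        enriched.to("output-topic"); // hypothetical topic
    }
}
```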

Related

Merge multiple JSON messages from one topic into a single JSON message

I am working on a requirement where I have to merge multiple JSON messages, coming out of different tables of a single transaction, into a single JSON message object. I am using Apache Kafka.
Example: "Customer Order" is the transaction, which has the 1) Customer_details, 2) Order_details, 3) Payment_details,
4) Delivery_instructions tables. From the "Orders" topic, I am getting a separate JSON for each of the above tables. Importantly, the transaction is opened with a "BEGIN" status and closed by an "END" status tag in the JSON messages.
My requirement is to merge these JSONs into a single nested JSON object and push it to another topic, which will be consumed by another party. Please let me know if there is any supporting documentation or tooling for this.
I did not find any useful info when I searched for supporting documentation.
You can use Kafka Streams join/cogroup/aggregate functions to group all data based on some information, such as the transaction ID. Session windowing may help.
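As a rough illustration of that suggestion, here is a Kafka Streams sketch that merges records per transaction inside a session window. It assumes the transaction ID is already the message key, the string concatenation stands in for a real JSON merge, and the output topic is invented:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.SessionWindows;

public class OrderMerger {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("Orders"); // topic from the question

        orders
            .groupByKey() // assumes the transaction ID is already the message key
            .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(5)))
            // Naive merge: concatenate the per-table JSONs; a real implementation
            // would parse each one and nest it under the right field.
            .reduce((merged, next) -> merged + "," + next)
            .toStream()
            .map((windowedKey, merged) -> KeyValue.pair(windowedKey.key(), "[" + merged + "]"))
            .to("merged-orders"); // hypothetical output topic
    }
}
```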
By the way, Kafka already has transactional producer support, so explicitly putting BEGIN and END in separate records may not be the best idea. There are other ways to do orchestration across multiple, chained topics, where the last consumer in the chain would assume it is the "end" (although it could turn into a producer itself and continue the chain).

How to get unmatched records from a stream to stream join in ksqldb?

I'm just starting out in ksqlDB and I can't figure out the right flow model for what I'm trying to achieve.
I have a Kafka topic that receives several types of events in JSON format. My requirement is to find pairs from 2 types of events in the topic, join them into a single message, and write it back to Kafka. Each pair will have the same value in the key field of the Kafka message.
The event types are 'meta' and 'data'.
This is pretty straightforward when talking about an inner join. I've created a source stream on top of the source topic and then built two streams on top of it, where each stream selects one of the event types. Afterwards, I've created an inner-join CSAS on top of those two streams and write the paired output message out to a Kafka topic. My requirement says I need to find a match within a 1 hour window.
The problem is that I also have to find 'orphaned' or unmatched records within this same time window and produce a message to Kafka (it will have an 'error' type in its event type field). The message should have the data from the 'orphaned' event, plus a custom header that I need to add.
So basically, I'm talking about a full outer join, and I need to get only the records that didn't find their pair within the 1 hour window.
I'm using ksqlDB version 0.18.
So what would be the right way to go about what I'm trying to achieve, without upgrading my ksqldb version?
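For what it's worth, the full outer join described above maps onto the Kafka Streams DSL (not ksqlDB) roughly as follows: unmatched records surface with a null on the other side, which is exactly the "orphan" signal needed. The topic names and value formats here are invented for illustration:

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class OrphanDetector {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> meta = builder.stream("meta-events"); // hypothetical topics
        KStream<String, String> data = builder.stream("data-events");

        // Full outer join over a 1-hour window: if one side never arrives,
        // the joiner fires with null on that side once the window closes.
        KStream<String, String> joined = meta.outerJoin(
            data,
            (m, d) -> (m == null || d == null)
                ? "ERROR:" + (m != null ? m : d) // orphan: tag with an error type
                : m + "|" + d,                   // matched pair
            JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofHours(1)));

        joined.filter((k, v) -> v.startsWith("ERROR:"))
              .to("orphaned-events"); // hypothetical output topic
    }
}
```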

Are Kafka message headers the right place to put the event type name?

In a scenario where multiple event types from a single domain are produced to a single topic, and only a subset of those event types is consumed by a given consumer, I need a good way to read the event type before taking action.
I see 2 options:
Put the event type (e.g., "ORDER_PUBLISHED") into the message body (payload) itself, which would be a broker-agnostic approach and has other advantages, but would involve parsing every message just to learn its event type.
Utilize Kafka message headers, which would allow consuming messages without extra payload parsing.
The context is event-sourcing. Small commands, small payloads. There are no huge bodies to parse. Golang. All messages are protobufs. gRPC.
What is the typical workflow in such a scenario?
I tried to google this topic, but didn't find much on header use cases and good practices.
It would be great to hear when and how to use Kafka message headers, and when not to use them.
Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a Kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
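A minimal sketch of this pattern with the plain Java clients follows. The topic, key, header name, and event type are placeholders (the question's actual stack is Go and protobuf, but the mechanics are the same):

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.serialization.*;

public class EventTypeHeaderExample {
    public static void main(String[] args) {
        Properties pprops = new Properties();
        pprops.put("bootstrap.servers", "localhost:9092");
        pprops.put("key.serializer", StringSerializer.class.getName());
        pprops.put("value.serializer", ByteArraySerializer.class.getName());

        // Producer: attach the event type as a header so consumers can route
        // without deserializing the (protobuf) payload.
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(pprops)) {
            ProducerRecord<String, byte[]> record =
                new ProducerRecord<>("bookings", "booking-42", new byte[0]); // hypothetical topic, key, payload
            record.headers().add("event-type", "BookingCreated".getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }

        Properties cprops = new Properties();
        cprops.put("bootstrap.servers", "localhost:9092");
        cprops.put("group.id", "booking-handlers");
        cprops.put("key.deserializer", StringDeserializer.class.getName());
        cprops.put("value.deserializer", ByteArrayDeserializer.class.getName());

        // Consumer: check the header first; payloads of other event types are never parsed.
        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(cprops)) {
            consumer.subscribe(List.of("bookings"));
            for (ConsumerRecord<String, byte[]> rec : consumer.poll(Duration.ofSeconds(1))) {
                Header type = rec.headers().lastHeader("event-type");
                if (type != null && "BookingCreated".equals(
                        new String(type.value(), StandardCharsets.UTF_8))) {
                    // parse the protobuf payload and handle the event here
                }
            }
        }
    }
}
```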
Headers are relatively new in Kafka (they were added in 0.11). Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and on Kafka topology. Unfortunately, we don't easily find much on how an event-driven architecture can be mapped, with the proper semantics, onto the elements of the Kafka message broker.

Join multiple logs through Kafka?

I have 2 types of logs...
HTTP "GET" logs. Stores the UUID, the raw HTTP request + total processing time. (Stored in the "logs" topic, NOT keyed.)
Event/command logs from the application. Stores the UUID + the event generated by the request in #1. (Stored in the "events" topic, has a key.)
What's the best way to join these guys? I know there are various platforms to do this...
I was thinking first I need to read the "logs" topic, parse it, and store it back into a "parsed" topic with the proper key.
Then join the "events" topic with the "parsed" topic.
Also, events/commands (the #2 logs) can come in days later (but mostly arrive almost instantly, within a minute or 2).
What's the purpose of the join - is it to drive further processing, or for analytics?
Since you already have your data in Apache Kafka, I would recommend using the Kafka Streams API, and/or KSQL. KSQL runs on top of Kafka Streams. You can join between topics using either of these.
You can do things like rekey topics with KSQL as well.
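A rough Kafka Streams rendering of the rekey-then-join plan from the question; extractUuid, the window size, and the output topic are all assumptions:

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class LogEventJoin {
    // Hypothetical parser: pull the UUID out of a raw HTTP log line.
    static String extractUuid(String rawLog) {
        return rawLog.substring(0, 36); // assumes the UUID leads the line
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // 1) Rekey the unkeyed "logs" topic by UUID (the "parsed" step from the question).
        KStream<String, String> logs = builder.<String, String>stream("logs")
            .selectKey((ignoredKey, value) -> extractUuid(value));

        // 2) Join with the already-keyed "events" topic inside a time window.
        //    "Days later" is too long for a practical stream-stream window;
        //    a few minutes covers the "mostly instant" case from the question.
        KStream<String, String> events = builder.stream("events");
        logs.join(events,
                  (log, event) -> log + "|" + event,
                  JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
            .to("joined-logs"); // hypothetical output topic
    }
}
```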

Does Kafka support request/response messaging?

I am investigating Kafka 0.9 as a hobby project and have completed a few "Hello World" type examples.
I have got to thinking about real-world Kafka applications based on request/response messaging in general, and more specifically how to link a Kafka request message to its response message.
I was thinking along the lines of using a generated UUID as the request message key and employing this request UUID as the associated response message key, much the same type of mechanism that WebSphere MQ has with its message correlation ID.
My end-to-end process would be:
1) The Kafka client generates a random UUID and sends a single Kafka request message.
2) The server consumes this request message, then extracts and stores the request UUID value.
3) The server completes a business process using the message payload.
4) The server responds with a response message that employs the stored UUID value from the request message as the response message key.
5) The Kafka client polls the response topic until it either times out or retrieves a message with the original request UUID value.
What I am concerned about is that the Kafka consumer's polling will consume other clients' messages from the response topic and advance the offsets, making other clients fail.
Am I trying to apply Kafka in a use case it was never designed for?
Is it possible to implement request/response messaging in Kafka?
Even though Kafka provides convenience methods to persist the committed offsets for a given consumer group, you're not required to use that behavior and can write your own if you feel the need. Even so, the use of Kafka the way you've described it is a bit awkward for the use case as each client needs to repeatedly search the topic for a specific response. That's inefficient at best.
You could break the problem into two parts, continuing to use Kafka to deliver requests to and responses from your server. The only piece you'd need to add would be some sort of API layer that your clients talk to and which encapsulates the Kafka-specific logic from your clients. This layer would need a local DB (relational or NoSQL) that could store responses by uuid making it very fast and easy for the API to answer whether a response is available for a specific uuid.
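A toy sketch of the core of such an API layer, with an in-memory map standing in for the local DB and an assumed "responses" topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// One consumer drains the response topic into a store indexed by UUID;
// clients query the store, never Kafka itself.
public class ResponseStore {
    private final Map<String, String> byUuid = new ConcurrentHashMap<>(); // stand-in for a real DB

    public void pump(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(List.of("responses")); // hypothetical topic
        while (true) {
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                byUuid.put(rec.key(), rec.value()); // response keyed by request UUID
            }
        }
    }

    // The client-facing API answers instantly from the store.
    public String lookup(String uuid) {
        return byUuid.get(uuid);
    }
}
```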
Easier! You could simply record in ZooKeeper that UUID X should be answered on partition Y, and make the producer that sent that UUID consume partition Y... Does that make sense?
I think you need a well-defined shard key for the service that invokes the request. Your request should contain this shard key and the name of the topic where the response should be posted. You should also create some sort of state machine, and when a message regarding your task arrives, you transition to some state... this would be for a strictly async design.
In theory, you could:
assign an ID to each request and to each message that is supposed to get a result message;
create a hash function that maps this ID to the identifier of a partition;
when sending the result message, use the same hash function to get the identifier of the partition to send it to;
in the requesting producer, observe only that given partition.
That would reduce the need to crawl many messages in that topic just to filter out the result required by the waiting request handler; a sketch of the idea follows.
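A minimal sketch of that hashing scheme with the plain Java clients, assuming a "requests"/"responses" topic pair, a fixed partition count, and a 30-second timeout (all invented for illustration). The responder is expected to write its result, keyed by the same UUID, to partitionFor(uuid) on the response topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class RequestResponseSketch {
    static final int RESPONSE_PARTITIONS = 8; // assumed partition count of the "responses" topic

    // Deterministically map a correlation ID onto one response partition.
    static int partitionFor(String uuid) {
        return Math.abs(uuid.hashCode() % RESPONSE_PARTITIONS);
    }

    static String sendAndWait(KafkaProducer<String, String> producer,
                              KafkaConsumer<String, String> consumer,
                              String payload) {
        String uuid = UUID.randomUUID().toString();

        // Listen on exactly one manually assigned partition instead of crawling the
        // whole topic. assign() keeps a private position, so skipping records meant
        // for other clients never disturbs their offsets.
        consumer.assign(List.of(new TopicPartition("responses", partitionFor(uuid))));

        // The correlation ID travels as the message key; the responder reuses it.
        producer.send(new ProducerRecord<>("requests", uuid, payload));

        long deadline = System.currentTimeMillis() + 30_000; // assumed 30 s timeout
        while (System.currentTimeMillis() < deadline) {
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                if (uuid.equals(rec.key())) {
                    return rec.value();
                }
            }
        }
        throw new IllegalStateException("timed out waiting for response to " + uuid);
    }
}
```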