Kafka DSL stream swallows custom headers - apache-kafka

Is it possible to forward incoming messages with custom headers from topic A to topic B in a DSL stream processor?
I notice that all of my incoming messages in topic A contain custom headers, but when I put them into topic B, all headers are swallowed by the stream processor.
I use the stream.to(outputTopic); method to process messages.
I have found this task, which is still OPEN.
https://issues.apache.org/jira/browse/KAFKA-5632?src=confmacro

Your observation is correct. Up to Kafka 1.1, Kafka Streams drops record headers.
Record header support is added in the (upcoming) Kafka 2.0 release, allowing you to read and modify headers using the Processor API (cf. https://issues.apache.org/jira/browse/KAFKA-6850). With KAFKA-6850, record headers are also preserved (i.e., auto-forwarded) if the DSL is used.
The mentioned issue KAFKA-5632 is about header manipulation at the DSL level, which is still not supported in Kafka 2.0.
To manipulate headers using the DSL in Kafka 2.0, you can mix and match the Processor API into the DSL by using KStream#transformValues(), #transform(), or #process(), for example as sketched below.
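A minimal sketch of that mix-and-match approach, assuming Kafka Streams 2.0+ with String serdes; the topic names and the "processed-by" header key are made up for illustration:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.ValueTransformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    public class HeaderAwareForwarder {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "header-aware-forwarder");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> stream = builder.stream("topic-A");

            stream.transformValues(() -> new ValueTransformer<String, String>() {
                private ProcessorContext context;

                @Override
                public void init(ProcessorContext context) {
                    this.context = context;
                }

                @Override
                public String transform(String value) {
                    // read or modify the headers of the record currently being processed
                    context.headers().add("processed-by",
                            "streams-app".getBytes(StandardCharsets.UTF_8));
                    return value;
                }

                @Override
                public void close() { }
            }).to("topic-B"); // with 2.0+, incoming headers are forwarded along with the record

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }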

Related

Kafka Consumer and Producer

Can I have the consumer act as a producer (publisher) as well? I have a use case where a consumer (C1) polls a topic and pulls messages. After processing a message and performing a commit, it needs to notify another process to carry on the remaining work. Given this use case, is it a valid design for consumer C1 to publish a message to a different topic, i.e. for C1 to also act as a producer?
Yes, this is a valid use case. We have many production applications that do the same: consume events from a source topic, perform data enrichment/transformation, and publish the output to another topic for further processing.
Again, the implementation pattern depends on which tech stack you are using. But if you are after a Spring Boot application, you can have a look at https://medium.com/geekculture/implementing-a-kafka-consumer-and-kafka-producer-with-spring-boot-60aca7ef7551
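If you do go the Spring Boot route, the pattern from that article boils down to something like the sketch below; the topic names, group id, and the "enrichment" step are placeholders, not taken from the question:

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class C1Mediator {

        private final KafkaTemplate<String, String> kafkaTemplate;

        public C1Mediator(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        // consume from the source topic; the listener container commits after the method returns
        @KafkaListener(topics = "source-topic", groupId = "c1-mediator")
        public void onMessage(String payload) {
            String enriched = payload.toUpperCase();           // placeholder enrichment
            kafkaTemplate.send("downstream-topic", enriched);  // notify the next process
        }
    }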
Totally valid scenario. For example, you can have a source connector or a producer which simply pushes raw data to a topic.
The receiver is loosely coupled to your publisher, so they cannot communicate with each other directly.
You then need C1 (a mediator) to consume messages from the source topic, transform the data, and publish the new data format to a different topic, as in the sketch below.
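A minimal sketch of that mediator with the plain Java clients; the topic names, group id, and the transformation are placeholders, and note that the commit happens only after the downstream publish:

    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.serialization.*;
    import java.time.Duration;
    import java.util.*;

    public class MediatorC1 {
        public static void main(String[] args) {
            Properties cp = new Properties();
            cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            cp.put(ConsumerConfig.GROUP_ID_CONFIG, "c1-mediator");
            cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            Properties pp = new Properties();
            pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
                consumer.subscribe(Collections.singletonList("source-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        String enriched = record.value().toUpperCase(); // placeholder transformation
                        producer.send(new ProducerRecord<>("downstream-topic", record.key(), enriched));
                    }
                    producer.flush();
                    consumer.commitSync(); // commit the source offsets only after publishing downstream
                }
            }
        }
    }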
If you're using a JVM-based client, this is precisely the use case for Kafka Streams rather than the base Consumer/Producer API.
A Kafka Streams application consumes from an initial topic and can then map, filter, aggregate, split, etc. into other topics.
https://kafka.apache.org/documentation/streams/
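For comparison, the same consume-transform-produce loop collapses to a few lines of the Kafka Streams DSL; the topic names and the filter/mapValues steps below are illustrative only:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import java.util.Properties;

    public class StreamsMediator {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "c1-streams-mediator");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("source-topic")
                   .filter((key, value) -> value != null)    // drop tombstones, for example
                   .mapValues(value -> value.toUpperCase())  // placeholder enrichment
                   .to("downstream-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }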

Using SmallRye Reactive Messaging in Quarkus for Fan-out use case

I'm new to Quarkus and SmallRye reactive messaging.
I have a use case where the application receives Kafka messages. Each message contains a batch of "records" and an identifier. Based on the identifier, these records are routed to other "routers", which are essentially Kafka topics, with each record sent as a new Kafka message to the particular topic(s).
Some requirements:
The original Kafka message should only be ack-ed if there is a matching router; otherwise it should be nack-ed and sent to a DLQ.
Context propagation is required as we use OTel for tracing.
I don't know if it is applicable here, but the solution should be optimized for throughput.
How can I achieve this with Quarkus SmallRye Reactive Messaging Kafka? Apologies if any points are vague.
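One way to sketch the fan-out (only the shape of it, not a complete answer): consume the batch as a Message, route each record through a per-topic Emitter, ack only when a router matched, and nack otherwise so the Kafka connector's failure-strategy=dead-letter-queue setting can forward it to a DLQ. All channel names, the BatchPayload type, and the routing rules below are made-up placeholders, and the imports may be javax.* rather than jakarta.* depending on your Quarkus version:

    import jakarta.enterprise.context.ApplicationScoped;
    import jakarta.inject.Inject;
    import org.eclipse.microprofile.reactive.messaging.Channel;
    import org.eclipse.microprofile.reactive.messaging.Emitter;
    import org.eclipse.microprofile.reactive.messaging.Incoming;
    import org.eclipse.microprofile.reactive.messaging.Message;
    import java.util.List;
    import java.util.concurrent.CompletionStage;

    @ApplicationScoped
    public class BatchRouter {

        // hypothetical outgoing channels, one per target topic, wired up in application.properties
        @Inject @Channel("orders-out")  Emitter<String> ordersEmitter;
        @Inject @Channel("billing-out") Emitter<String> billingEmitter;

        // application.properties would also carry, for example:
        // mp.messaging.incoming.batches-in.failure-strategy=dead-letter-queue
        @Incoming("batches-in")
        public CompletionStage<Void> route(Message<BatchPayload> message) {
            BatchPayload batch = message.getPayload();
            Emitter<String> target = "ORDERS".equals(batch.identifier()) ? ordersEmitter
                                   : "BILLING".equals(batch.identifier()) ? billingEmitter
                                   : null;
            if (target == null) {
                // no matching router: nack so the failure strategy can send it to the DLQ
                return message.nack(new IllegalArgumentException("unknown identifier"));
            }
            batch.records().forEach(target::send); // each record becomes its own Kafka message
            return message.ack();                  // ack the original only after fanning out
        }

        // hypothetical shape of the incoming payload
        public record BatchPayload(String identifier, List<String> records) {}
    }

For the OTel context propagation and throughput requirements you would still need to look at the connector's tracing and batching options; the above only shows the ack/nack routing skeleton.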

Is it possible to have a DeadLetter Queue topic on Kafka Source Connector side?

We have a challenge with events processed by the IBM MQ Source connector: it processes N messages but sends only N-100, where the 100 missing messages are poison messages.
But from the blog post below by Robin Moffatt, I can see that it is not possible to have a DLQ on the source connector side.
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
The following note is mentioned in the article above:
Note that there is no dead letter queue for source connectors.
1Q) Please confirm whether anyone has used a dead letter queue with the IBM MQ Source Connector (documentation below).
https://github.com/ibm-messaging/kafka-connect-mq-source
2Q) Has anyone used a DLQ with any other source connector?
3Q) Why is it a limitation that there is no DLQ on the source connector side?
Thanks.
errors.tolerance is available for source connectors too - refer to the docs.
However, if you compare that to sinks, then no, DLQ options are not available. You would instead need to parse the connector logs for the event details, then pipe those to a topic on your own.
Overall, how would a source connector decide which events are bad? A network connection exception means that no messages would be read at all, so there's nothing to produce. If messages fail to serialize to Kafka events, then they would also fail to be produced... Your options are either to fail fast, or to skip and log.
If you just want to send binary data through as-is, then nothing would be "poisonous"; that can be done with the ByteArrayConverter class. That isn't really a good use case for Kafka Connect, though, since it's primarily designed around structured types with parsable schemas. But at least with that option the data gets into Kafka, and you can use Kafka Streams to branch/filter the good messages from the bad ones, for example as sketched below.
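A rough sketch of that last option, assuming the connector writes raw bytes (ByteArrayConverter) to a single topic; the topic names and the validity check are placeholders you would replace with your own deserialization logic:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;
    import java.util.Properties;

    public class SourceDlqSplitter {

        // placeholder validity check; a real one would attempt to deserialize the payload
        static boolean isParsable(byte[] value) {
            return value != null && value.length > 0;
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "mq-source-dlq-splitter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            KStream<byte[], byte[]> raw =
                    builder.stream("mq-raw-topic", Consumed.with(Serdes.ByteArray(), Serdes.ByteArray()));

            raw.filter((k, v) -> isParsable(v)).to("mq-clean-topic");   // good records continue downstream
            raw.filterNot((k, v) -> isParsable(v)).to("mq-dlq-topic");  // poison records go to your own DLQ

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }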

Are Kafka message headers the right place to put the event type name?

In a scenario where multiple event types from a single domain are produced to a single topic, and only a subset of those event types is consumed by a given consumer, I need a good way to read the event type before taking action.
I see 2 options:
Put the event type (for example "ORDER_PUBLISHED") into the message body (payload) itself, which would be a broker-agnostic approach and has other advantages, but would involve parsing every message just to learn its event type.
Utilize Kafka message headers, which would allow consuming messages without extra payload parsing.
The context is event-sourcing. Small commands, small payloads. There are no huge bodies to parse. Golang. All messages are protobufs. gRPC.
What is the typical workflow in such a scenario?
I tried to google this topic, but didn't find much on header use cases and good practices.
It would be great to hear when and how to use Kafka message headers and when not to use them.
Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
Headers are relatively new in Kafka. Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and on Kafka topology. Unfortunately, it isn't easy to find much on how an event-driven architecture can be mapped with the proper semantics onto the elements of the Kafka message broker.
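For what it's worth, here is a minimal sketch of the header-based approach with the Java clients (the same idea applies to the Go client the question mentions); the topic, header key, and event names are made up:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.header.Header;
    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.util.Collections;

    public class EventTypeHeaderExample {

        static void produce(KafkaProducer<String, byte[]> producer, String bookingId, byte[] protoPayload) {
            ProducerRecord<String, byte[]> record =
                    new ProducerRecord<>("bookings", bookingId, protoPayload);
            // event type travels as metadata, not inside the payload
            record.headers().add("event-type", "BookingConfirmed".getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }

        static void consume(KafkaConsumer<String, byte[]> consumer) {
            consumer.subscribe(Collections.singletonList("bookings"));
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, byte[]> record : records) {
                Header h = record.headers().lastHeader("event-type");
                String eventType = h == null ? "" : new String(h.value(), StandardCharsets.UTF_8);
                if (!"BookingConfirmed".equals(eventType)) {
                    continue; // skip without deserializing the protobuf payload
                }
                // only now parse record.value() and run the handler for this event type
            }
        }
    }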

Protobuf within Avro-encoded message on Kafka

Wanted to know if there is a better way to solve the problem that we are having. Here is the flow:
Our client code understands only protocol buffers (protobuf). On the server side, our gateway gets the protobuf and puts it onto Kafka.
Now Avro is the recommended encoding scheme, so we put the specific protobuf within Avro (as a byte array) and put that onto the message bus. The reason we do this is to avoid having to do a full protobuf-to-Avro conversion.
On the consumer side, it reads the Avro message, extracts the protobuf from it, and works on that.
How reliable is protobuf with Kafka? Are there a lot of people using it? What exactly are the advantages/disadvantages of using Kafka with protobuf?
Is there a better way to handle our use case/scenario?
thanks
Kafka doesn't differentiate between encoding schemes, since in the end every message flows in and out of Kafka as bytes.
Both protobuf and Avro are binary encoding schemes, so why would you want to wrap a protobuf inside an Avro schema when you can put the protobuf message onto Kafka directly? For example:
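A minimal sketch of producing the protobuf bytes directly, with no Avro wrapper, using the standard ByteArraySerializer; the topic name and the example payload bytes are placeholders (in practice the payload would be yourProtoMessage.toByteArray()):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Properties;

    public class ProtobufDirectProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                // in practice: byte[] payload = yourProtoMessage.toByteArray();
                byte[] payload = new byte[] {0x0A, 0x03, 0x66, 0x6F, 0x6F}; // example protobuf wire bytes
                producer.send(new ProducerRecord<>("orders", "42", payload));
            }
        }
    }

The consumer simply reads the byte[] value and parses it with the generated protobuf class, so no intermediate Avro schema is needed.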