ProcessorContext#headers() is empty - apache-kafka

We have a Kafka Streams application. The producer adds a header to the Kafka message before sending it to the Kafka Streams application.
In the Kafka Streams app we are using AbstractProcessor and context.forward(null, Optional.of(event)); to forward the message to another topic.
But the header is getting lost. I want the headers to be carried over as-is from the input message to the output topic.
The ProcessorContext interface's headers() method says "Returns the headers of the current input record", but it's empty in my case even though I am sending the message with a header:
/**
 * Returns the headers of the current input record; could be null if it is not available
 * @return the headers
 */
Headers headers();
Kafka Streams API version: 2.3.1

context.headers() should be called within process() if using a Processor, or transform() if using a Transformer.
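For illustration, a minimal sketch (the class name and String types are made up here) of reading headers inside process(), where the context is bound to the current input record:

import org.apache.kafka.common.header.Header;
import org.apache.kafka.streams.processor.AbstractProcessor;

public class HeaderAwareProcessor extends AbstractProcessor<String, String> {

    @Override
    public void process(String key, String value) {
        // Valid here: the context points at the record currently being processed.
        for (Header header : context().headers()) {
            System.out.println(header.key());
        }
        // forward() carries the current record's headers along automatically.
        context().forward(key, value);
    }
}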

Related

KAFKA client library (confluent-kafka-go): synchronisation between consumer and producer in the case of auto.offset.reset = latest

I have a use case where I want to implement synchronous request / response on top of Kafka. For example, when the user sends an HTTP request, I want to produce a message on a specific Kafka input topic that triggers a dataflow eventually resulting in a response produced on an output topic. I then want to consume the message from the output topic and return the response to the caller.
The workflow is:
HTTP Request -> produce message on input topic -> (consume message from input topic -> app logic -> produce message on output topic) -> consume message from output topic -> HTTP Response.
To implement this, upon receiving the first HTTP request I want to be able to create a consumer on the fly that will consume from the output topic, before producing a message on the input topic. Otherwise there is a possibility that messages on the output topic are "lost". Consumers in my case have a random group.id and auto.offset.reset = latest for application reasons.
My question is how I can make sure that the consumer is ready before producing messages. I make sure that I call SubscribeTopics before producing messages, but in my tests so far, when there are no committed offsets and Kafka is resetting offsets to latest, there is a possibility that messages are lost and never read by my consumer, because Kafka sometimes considers that the consumer registered after the messages were produced.
My workaround so far is to sleep for a bit after I create the consumer, to allow Kafka to complete the offset-reset workflow before I produce messages.
I have also tried implementing logic in a rebalance callback (triggered by the consumer subscribing to a topic), in which I call Assign with offset = latest for the topic partition, but this doesn't seem to have fixed my issue.
Hopefully there is a better solution out there than sleep.
Most HTTP client libraries have an implicit timeout. There's no guarantee your consumer will ever consume an event or that a downstream producer will send data to the "response topic".
Instead, have your initial request immediately return a 201 Accepted status (or 400, for example, if you do request validation) along with some tracking ID. Then require the client to poll with GET requests by ID for status updates, returning either a 404 status or 200 plus some status field in the response body.
You'll need a database to store intermediate state.
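As an illustration, a minimal Java sketch of that pattern (class and method names are invented; a ConcurrentHashMap stands in for the real database):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class RequestTracker {

    private final Map<String, String> statusById = new ConcurrentHashMap<>();

    // POST handler: record the request, produce to the input topic,
    // and return 201 Accepted with the tracking ID.
    public String accept(String payload) {
        String trackingId = UUID.randomUUID().toString();
        statusById.put(trackingId, "PENDING");
        // produce(inputTopic, trackingId, payload);  // Kafka producer call goes here
        return trackingId;
    }

    // Called by the consumer loop that reads the output topic.
    public void onResult(String trackingId, String result) {
        statusById.put(trackingId, result);
    }

    // GET handler: null means "unknown ID" -> 404; otherwise 200 + the status.
    public String status(String trackingId) {
        return statusById.get(trackingId);
    }
}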

MassTransit Kafka Rider get raw message

I need to get the raw message that was sent to Kafka, for logging.
For example, if validation of context.Message failed.
I tried the answer from Is there a way to get raw message from MassTransit?, but it doesn't work; context.TryGetMessage<JToken>() returns null all the time.
The Confluent.Kafka client does not expose the raw message data, only the deserialized message type. Therefore, MassTransit does not have a message body accessible.

Publish messages that could not be de-serialized to DLT topic

I do not understand how messages that could not be de-serialized can be written to a DLT topic with Spring Kafka.
I configured the consumer according to the Spring Kafka docs, and this works well for exceptions that occur after de-serialization of the message.
But when the message is not de-serializable, an org.apache.kafka.common.errors.SerializationException is thrown while polling for messages.
Subsequently, SeekToCurrentErrorHandler.handle(Exception thrownException, List<ConsumerRecord<?, ?>> records, ...) is called with this exception, but with an empty list of records, and is therefore unable to write anything to the DLT topic.
How can I write those messages to the DLT topic as well?
The problem is that the exception is thrown by the Kafka client itself so Spring doesn't get to see the actual record that failed.
That's why we added the ErrorHandlingDeserializer2, which can be used to wrap the actual deserializer; the failure is passed to the listener container and re-thrown as a DeserializationException.
See the documentation.
When a deserializer fails to deserialize a message, Spring has no way to handle the problem, because it occurs before the poll() returns. To solve this problem, version 2.2 introduced the ErrorHandlingDeserializer2. This deserializer delegates to a real deserializer (key or value). If the delegate fails to deserialize the record content, the ErrorHandlingDeserializer2 returns a null value and a DeserializationException in a header that contains the cause and the raw bytes. When you use a record-level MessageListener, if the ConsumerRecord contains a DeserializationException header for either the key or value, the container’s ErrorHandler is called with the failed ConsumerRecord. The record is not passed to the listener.
The DeadLetterPublishingRecoverer has logic to detect the exception and publish the failed record.
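For reference, a configuration sketch in the spirit of the docs (Spring Kafka 2.2/2.3 era; the kafkaTemplate bean and the String delegate deserializer are assumptions to adapt to your setup):

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer2;

public class DltConfig {

    public DefaultKafkaConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        // plus bootstrap.servers, group.id, key deserializer, etc.
        // Wrap the real value deserializer so poll() no longer throws;
        // the DeserializationException travels in a record header instead.
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
        props.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    public SeekToCurrentErrorHandler errorHandler(KafkaTemplate<Object, Object> kafkaTemplate) {
        // The recoverer detects the DeserializationException header and
        // publishes the raw failed record to <originalTopic>.DLT by default.
        return new SeekToCurrentErrorHandler(new DeadLetterPublishingRecoverer(kafkaTemplate));
    }
}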

How to send a file from producer to consumer in kafka using spring boot?

I have a Kafka program that sends a single message from the producer, and it is consumed by the consumer successfully. But I have a question: is there any way to send a file instead of a single text message? If yes, how can it be done?
You can send whatever you want to Kafka. Kafka won't infer anything from your data; it will handle it as a simple array of bytes. However, be careful about the size of your Kafka records: some cluster parameters like message.max.bytes (and many others) have to be updated accordingly.
So to answer your question, just read your file using any kind of IO reader (depending on your programming language) and send it using a bytes or String serializer, for example as sketched below.
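As an illustrative sketch in Java (the topic name and file path are placeholders), shipping a file as a byte[] payload:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FileProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        // The whole file becomes the record value.
        byte[] fileBytes = Files.readAllBytes(Paths.get("/tmp/example.pdf"));

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Keep the record under message.max.bytes (broker) and
            // max.request.size (producer), or raise those limits.
            producer.send(new ProducerRecord<>("files-topic", "example.pdf", fileBytes));
        }
    }
}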
Yannick

Kafka DSL stream swallow custom headers

Is it possible to forward incoming messages with custom headers from topic A to topic B in a DSL stream processor?
I notice that all of my incoming messages in topic A contain custom headers, but when I put them into topic B all the headers are swallowed by the stream processor.
I use the stream.to(outputTopic); method to process messages.
I have found this ticket, which is still OPEN:
https://issues.apache.org/jira/browse/KAFKA-5632
Your observation is correct. Up to Kafka 1.1, Kafka Streams drops record headers.
Record header support was added in (the upcoming) Kafka 2.0, allowing you to read and modify headers using the Processor API (cf. https://issues.apache.org/jira/browse/KAFKA-6850). With KAFKA-6850, record headers are also preserved (i.e., auto-forwarded) if the DSL is used.
The mentioned issue KAFKA-5632 is about header manipulation at the DSL level, which is still not supported in Kafka 2.0.
To manipulate headers using the DSL in Kafka 2.0, you can mix-and-match the Processor API into the DSL by using KStream#transformValues(), #transform(), or #process().
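For example, a sketch against the 2.0 API (topic and header names are placeholders) that uses KStream#transform() to add a header, while the existing headers are auto-forwarded:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.kstream.TransformerSupplier;
import org.apache.kafka.streams.processor.ProcessorContext;

public class HeaderForwarding {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("topic-A");

        TransformerSupplier<String, String, KeyValue<String, String>> supplier =
                () -> new Transformer<String, String, KeyValue<String, String>>() {
            private ProcessorContext context;

            @Override
            public void init(ProcessorContext context) {
                this.context = context;
            }

            @Override
            public KeyValue<String, String> transform(String key, String value) {
                // Headers of the record currently being processed; existing
                // headers are preserved, this just adds one more.
                context.headers().add("processed-by", "my-app".getBytes());
                return KeyValue.pair(key, value);
            }

            @Override
            public void close() { }
        };

        // Original plus added headers travel with the record to topic-B.
        stream.transform(supplier).to("topic-B");
    }
}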