I am creating a Kafka streams / in-out kind of application.
Sample code looks like the following:
private MessageChannel output;
public void process(List<String> input) {
    // some logic that builds a Message<?> named message from the input
    output.send(message);
}
Based on my understanding, the Kafka producer buffers messages before sending them out. If the app or its container crashes, the buffered messages could be lost.
How can we ensure that messages are actually sent out (something like KafkaTemplate.flush() here)?
EDIT
Based on Gary Russell's suggestion, we should set the FLUSH header on the last message.
Follow-up question:
The last send() call becomes a blocking call, because KafkaProducer.flush() blocks. If an exception is thrown while the Kafka message is being sent (e.g. an I/O exception or an authentication exception), will it be raised in the same method context?
For example, in the following code, will the Kafka sender exception be caught in the catch block?
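public void process(List<String> input) {
    // some logic
    try {
        // Message<?> is immutable (it has no setHeader method), so headers are added via MessageBuilder;
        // payload is a placeholder for whatever the logic above produces
        Message<?> message = MessageBuilder.withPayload(payload)
                .setHeader(KafkaIntegrationHeaders.FLUSH, true)
                .build();
        output.send(message);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}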
The underlying KafkaProducerMessageHandler has a property:
/**
 * Specify a SpEL expression that evaluates to a {@link Boolean} to determine whether
 * the producer should be flushed after the send. Defaults to looking for a
 * {@link Boolean} value in a {@link KafkaIntegrationHeaders#FLUSH} header; false if
 * absent.
 * @param flushExpression the {@link Expression}.
 */
public void setFlushExpression(Expression flushExpression) {
Currently, the binder doesn't support customizing this property.
However, if you are sending Message<?>s, you can set the KafkaIntegrationHeaders.FLUSH header to true on the last message in the batch.
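The batch-level idea here (flag only the last message, so the blocking flush runs once per batch instead of once per record) can be sketched without Spring. This is a minimal, framework-free model under stated assumptions: OutboundMessage is a hypothetical stand-in for Message<?>, and FLUSH_HEADER is a hypothetical stand-in for KafkaIntegrationHeaders.FLUSH. In real Spring Integration code, each message would be built with MessageBuilder.withPayload(...), with .setHeader(KafkaIntegrationHeaders.FLUSH, true) added only for the last one, and then passed to output.send(...).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Framework-free model of "flush only after the last message of a batch".
// OutboundMessage is a hypothetical stand-in for Spring's Message<?>.
final class FlushOnLast {

    // Hypothetical header key; real code should use KafkaIntegrationHeaders.FLUSH.
    static final String FLUSH_HEADER = "flush";

    record OutboundMessage(String payload, Map<String, Object> headers) { }

    // One message per input element; only the final one requests a flush,
    // so the blocking producer flush happens once per batch.
    static List<OutboundMessage> toMessages(List<String> input) {
        List<OutboundMessage> out = new ArrayList<>(input.size());
        for (int i = 0; i < input.size(); i++) {
            Map<String, Object> headers = new HashMap<>();
            if (i == input.size() - 1) {
                headers.put(FLUSH_HEADER, Boolean.TRUE);
            }
            out.add(new OutboundMessage(input.get(i), Collections.unmodifiableMap(headers)));
        }
        return out;
    }
}
```

Flushing only on the last message keeps the per-record sends asynchronous while still guaranteeing that the batch has left the producer buffer before process() returns.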
I'm using Spring Kafka, and I have a Kafka consumer written in Java with Spring Boot. My consumer consumes in batches; the relevant configuration beans are given below.
@Bean
public ConsumerFactory<String, Object> consumerFactory() {
    Map<String, Object> config = new HashMap<>();
    // default configs like bootstrap servers, key and value deserializers are here
    config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
    return new DefaultKafkaConsumerFactory<>(config);
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.DEBUG);
    factory.setBatchListener(true);
    return factory;
}
I consume messages and send them to an API endpoint. If the API is not available, or if the RestTemplate throws an error, I want to send the whole batch to a DLT without retrying.
If we throw BatchListenerFailedException, only the record at the specified index is sent to the DLT. BatchListenerFailedException accepts a single integer index, not a list. What I want is to send the whole batch, as is, to a DLT topic without retrying. Is there a way to achieve that?
My Spring Kafka version is 2.8.6.
Edit
My default error handler looks like this:
@Bean
public CommonErrorHandler commonErrorHandler() {
    ExponentialBackOffWithMaxRetries exponentialBackOffWithMaxRetries = new ExponentialBackOffWithMaxRetries(5);
    exponentialBackOffWithMaxRetries.setInitialInterval(my val);
    exponentialBackOffWithMaxRetries.setMultiplier(my val);
    exponentialBackOffWithMaxRetries.setMaxInterval(my val);
    DefaultErrorHandler errorHandler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(kafkaTemplate(),
                    (record, exception) -> new TopicPartition(record.topic() + "-dlt", record.partition())),
            exponentialBackOffWithMaxRetries);
    errorHandler.addNotRetryableExceptions(ParseException.class);
    errorHandler.addNotRetryableExceptions(EventHubNonRetryableException.class);
    return errorHandler;
}
In my case I used ExponentialBackOffWithMaxRetries instead of FixedBackOff. I have 3 scenarios:
1 - Retry the messages and then send them to the DLT (throwing any exception other than BatchListenerFailedException)
2 - Send a couple of messages from the batch to the DLT without retrying (using BatchListenerFailedException for this)
3 - Send the whole batch to the DLT without retrying.
The 3rd one is where I'm struggling. If I throw some other exception, the batch is retried a couple of times (even if I use FixedBackOff instead of ExponentialBackOffWithMaxRetries).
Throw something other than BatchListenerFailedException, and use a DefaultErrorHandler with a DeadLetterPublishingRecoverer and no retries (new FixedBackOff(0L, 0L)).
EDIT
Starting with versions 3.0.0, 2.9.3, and 2.8.11, you can configure not-retryable exceptions for batch errors.
https://github.com/spring-projects/spring-kafka/issues/2459
See
/**
 * Add exception types to the default list. By default, the following exceptions will
 * not be retried:
 * <ul>
 * <li>{@link DeserializationException}</li>
 * <li>{@link MessageConversionException}</li>
 * <li>{@link ConversionException}</li>
 * <li>{@link MethodArgumentResolutionException}</li>
 * <li>{@link NoSuchMethodException}</li>
 * <li>{@link ClassCastException}</li>
 * </ul>
 * All others will be retried, unless {@link #defaultFalse()} has been called.
 * @param exceptionTypes the exception types.
 * @see #removeClassification(Class)
 * @see #setClassifications(Map, boolean)
 */
@SafeVarargs
@SuppressWarnings("varargs")
public final void addNotRetryableExceptions(Class<? extends Exception>... exceptionTypes) {
add(false, exceptionTypes);
notRetryable(Arrays.stream(exceptionTypes));
}
Note that 2.8.x is now out of OSS support. https://spring.io/projects/spring-kafka#support
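Either option (no retries via new FixedBackOff(0L, 0L), or classifying the exception as not retryable on versions that support it for batch errors) leads to the same outcome: the listener is invoked once and every record in the batch is recovered to the DLT. The following is a framework-free simulation of that flow, not Spring Kafka code. Rec, DltEntry and deliver(...) are hypothetical; the "-dlt" suffix follows the destination resolver in the question's DeadLetterPublishingRecoverer; and, unlike the real classifier, only the thrown exception itself (not its cause chain) is checked here.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Simulation (not Spring code): a batch listener throws an exception that is not a
// BatchListenerFailedException, so the whole batch is retried up to maxRetries times
// and then every record goes to "<topic>-dlt". Not-retryable types skip the retries.
final class WholeBatchDlt {

    record Rec(String topic, int partition, long offset, String value) { }

    record DltEntry(String dltTopic, int partition, Rec original) { }

    // Returns the number of listener invocations; recovered records are appended to dlt.
    static int deliver(List<Rec> batch, Consumer<List<Rec>> listener, long maxRetries,
                       Set<Class<? extends Exception>> notRetryable, List<DltEntry> dlt) {
        int invocations = 0;
        for (long attempt = 0; attempt <= maxRetries; attempt++) {
            invocations++;
            try {
                listener.accept(batch);
                return invocations;                 // success: nothing is recovered
            } catch (RuntimeException e) {
                boolean fatal = notRetryable.stream().anyMatch(c -> c.isInstance(e));
                if (fatal) {
                    break;                          // not retryable: recover immediately
                }
            }
        }
        for (Rec r : batch) {                       // retries exhausted or skipped
            dlt.add(new DltEntry(r.topic() + "-dlt", r.partition(), r));
        }
        return invocations;
    }
}
```

With maxRetries = 0 (the FixedBackOff(0L, 0L) case) the listener runs exactly once before the whole batch is recovered; with a retryable exception and five retries it runs six times, which matches the behavior described in scenario 3 of the question.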
I'm processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future, and I'd like to postpone processing them until after that timestamp. In the meantime, I'd like to keep processing other incoming messages.
What I tried to do is make my task queue these messages and implement WindowableTask to periodically check whether their timestamps allow them to be processed. The basic idea looks like this:
public class MyTask implements StreamTask, WindowableTask {
    private HashSet<MyMessage> waitingMessages = new HashSet<>();
    @Override
    public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
        byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
        MyMessage parsedMessage = MyMessage.parseFrom(message);
        if (parsedMessage.getValidFromDateTime().isBeforeNow()) {
            // Do the processing
        } else {
            waitingMessages.add(parsedMessage);
        }
    }
    @Override
    public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
        for (MyMessage message : waitingMessages) {
            if (message.getValidFromDateTime().isBeforeNow()) {
                // Do the processing and remove the message from the set
            }
        }
    }
}
This obviously has some downsides: I'd lose the waiting messages held in memory whenever I redeploy my task. So I'd like to know the best practice for delaying message processing with Samza. Do I need to re-emit the messages to the same topic again and again until I can finally process them? We're talking about delaying processing by anywhere from a few minutes up to 1-2 hours here.
When dealing with message queues, it's important to keep in mind that they perform a very specific function in a system: they hold messages while the processor(s) are busy processing preceding messages. A properly functioning message queue is expected to deliver messages on demand. This implies that as soon as a message reaches the head of the queue, the next pull on the queue will yield that message.
Notice that delay is not a configurable part of the equation. Instead, delay is an output variable of a system with a queue. In fact, Little's Law offers some interesting insights into this.
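Little's Law relates the average number of messages in a queueing system to the arrival rate and the time each message spends there. As a worked example with hypothetical numbers: if messages arrive at 50 per second and each one is deliberately held for an hour, the queue has to hold about 180,000 messages at steady state, which shows how a fixed delay turns the queue into storage.

```latex
L = \lambda W
% L: average number of messages in the system (queue)
% \lambda: average arrival rate (messages per second)
% W: average time a message spends in the system (seconds)
% Hypothetical example: \lambda = 50 msg/s, W = 3600 s (1 hour)
L = 50 \times 3600 = 180\,000 \text{ messages}
```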
So, in a system where a delay is necessary (for example, to join with or wait for a parallel operation to complete), you should be looking at other methods. Typically a queryable database makes sense in this case. If you find yourself keeping messages in a queue for a preset period of time, you're actually using the message queue as a database, a function it was not designed to provide. Not only is this risky, it is also likely to hurt the performance of your message broker.
I think you could use Samza's key-value store to keep the state of your task instance instead of an in-memory Set.
It should look something like this:
public class MyTask implements StreamTask, WindowableTask, InitableTask {
    private KeyValueStore<String, MyMessage> waitingMessages;
    @SuppressWarnings("unchecked")
    @Override
    public void init(Config config, TaskContext context) throws Exception {
        this.waitingMessages = (KeyValueStore<String, MyMessage>) context.getStore("messages-store");
    }
    @Override
    public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector,
            TaskCoordinator taskCoordinator) {
        byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
        MyMessage parsedMessage = MyMessage.parseFrom(message);
        if (parsedMessage.getValidFromDateTime().isBefore(LocalDate.now())) {
            // Do the processing
        } else {
            waitingMessages.put(parsedMessage.getId(), parsedMessage);
        }
    }
    @Override
    public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
        KeyValueIterator<String, MyMessage> all = waitingMessages.all();
        while (all.hasNext()) {
            MyMessage message = all.next().getValue();
            // Do the processing and remove the message from the set
        }
    }
}
If you redeploy your task, Samza should recreate the state of the key-value store (Samza keeps the values in a special Kafka changelog topic associated with the key-value store). You will, of course, need to provide some extra configuration for your store (in the example above, for messages-store).
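The window() body above leaves out the two steps that matter: checking whether each waiting message is due, and removing it once processed. A JDK-only sketch of that logic follows; a LinkedHashMap keyed by message id stands in for the KeyValueStore<String, MyMessage>, and the Waiting record is hypothetical. It compares against the current instant rather than LocalDate.now(), since a date-only comparison cannot express delays of minutes to hours. With the real store you would collect the due keys while iterating all(), call delete(key) for each afterwards, and close the KeyValueIterator when done.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// JDK-only sketch of delayed processing: park messages until validFrom, release them
// from window(). The map is a stand-in for Samza's KeyValueStore keyed by message id.
final class DelayBuffer {

    record Waiting(String id, Instant validFrom, String payload) { }

    private final Map<String, Waiting> store = new LinkedHashMap<>();

    // Called from process(): handle immediately if due, otherwise park it.
    void offer(Waiting m, Instant now, Consumer<Waiting> handler) {
        if (!m.validFrom().isAfter(now)) {
            handler.accept(m);
        } else {
            store.put(m.id(), m);
        }
    }

    // Called from window(): process every due message, then delete it from the store.
    int releaseDue(Instant now, Consumer<Waiting> handler) {
        List<String> done = new ArrayList<>();
        for (Waiting m : store.values()) {
            if (!m.validFrom().isAfter(now)) {
                handler.accept(m);
                done.add(m.id());
            }
        }
        done.forEach(store::remove); // delete after iterating, like store.delete(key)
        return done.size();
    }

    int waitingCount() {
        return store.size();
    }
}
```

Passing now in explicitly keeps the logic testable; in the task you would pass Instant.now().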
You could read about key-value store here (for the latest Samza version):
https://samza.apache.org/learn/documentation/0.14/container/state-management.html
This is a follow-up question to "Reading the same message several times from Kafka". If there is a better way to ask this question without posting a new question, let me know. In that post, Gary mentions:
"But you will still see later messages first if they have already been retrieved so you will have to discard those too."
Is there a clean way to discard messages already fetched by poll() after calling seek()? I started implementing logic for this by saving the initial offset (in recordOffset) and incrementing it on success. On failure, I call seek() and set recordOffset to record.offset(). Then, for every new message, I check whether record.offset() is greater than recordOffset. If it is, I simply call acknowledge(), thereby "discarding" all the previously fetched messages. Here is the code:
// in onMessage()...
if (record.offset() > recordOffset){
acknowledgment.acknowledge();
return;
}
try {
processRecord(record);
recordOffset = record.offset()+1;
acknowledgment.acknowledge();
} catch (Exception e) {
recordOffset = record.offset();
consumerSeekCallback.seek(record.topic(), record.partition(), record.offset());
}
The problem with this approach is that it gets complicated with multiple partitions. Is there an easier/cleaner way?
EDIT 1
Based on Gary's suggestion below, I tried adding an errorHandler like this:
@KafkaListener(topicPartitions =
        {@org.springframework.kafka.annotation.TopicPartition(topic = "${kafka.consumer.topic}", partitions = { "1" })},
        errorHandler = "SeekToCurrentErrorHandler")
Is there something wrong with this syntax? I get "Cannot resolve method 'errorHandler'".
EDIT 2
After Gary explained the two error handlers, I removed the errorHandler above and added the following to the config file:
@Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
    factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(kafkaProps()));
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
    return factory;
}
When I start the application, I now get this error:
java.lang.NoSuchMethodError: org.springframework.util.Assert.state(ZLjava/util/function/Supplier;)V
at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.determineInferredType(MessagingMessageListenerAdapter.java:396)
Here is line 396 -
Assert.state(!this.isConsumerRecordList || validParametersForBatch,
() -> String.format(stateMessage, "ConsumerRecord"));
Assert.state(!this.isMessageList || validParametersForBatch,
() -> String.format(stateMessage, "Message<?>"));
Starting with version 2.0.1, if the container's ErrorHandler is a RemainingRecordsErrorHandler, such as the SeekToCurrentErrorHandler, the remaining records (including the failed one) are sent to the error handler instead of the listener.
This allows the SeekToCurrentErrorHandler to reposition every partition so the next poll will return the unprocessed record(s).
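The repositioning idea can be sketched without Spring: for each partition that appears in the remaining records, seek to the lowest offset, i.e. the first record that has not been processed. This is a framework-free illustration of the idea, not the handler's actual implementation; Rec and the "topic-partition" string key are hypothetical (a real handler works with ConsumerRecord and TopicPartition).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Framework-free sketch: compute where each partition must be rewound so that the
// next poll returns the failed record and everything after it.
final class SeekPlan {

    // Hypothetical minimal record type.
    record Rec(String topic, int partition, long offset) { }

    // Returns "topic-partition" -> offset to seek to, in first-seen order.
    static Map<String, Long> seekOffsets(List<Rec> remaining) {
        Map<String, Long> seeks = new LinkedHashMap<>();
        for (Rec r : remaining) {
            // keep the lowest offset seen for each partition
            seeks.merge(r.topic() + "-" + r.partition(), r.offset(), Long::min);
        }
        return seeks;
    }
}
```

Because every partition in the remaining records is rewound, records that were already fetched but not yet processed are simply fetched again, so no application-level discard logic is needed.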
/**
 * An error handler that seeks to the current offset for each topic in the remaining
 * records. Used to rewind partitions after a message failure so that it can be
 * replayed.
 *
 * @author Gary Russell
 * @since 2.0.1
 *
 */
public class SeekToCurrentErrorHandler implements RemainingRecordsErrorHandler
EDIT
There are two types of error handler. The KafkaListenerErrorHandler (specified in the annotation) works at the listener level; it is wired into the listener adapter that invokes the @KafkaListener method, and thus only has access to the current record.
The second error handler (configured on the listener container) works at the container level and thus has access to the remaining records. The SeekToCurrentErrorHandler is a container-level error handler.
It is configured on the container properties in the container factory...
@Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
    factory.setConsumerFactory(this.consumerFactory);
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.RECORD);
    return factory;
}
You are on the right track, and yes, you have to deal with the different partitions as well. There is a FilteringMessageListenerAdapter, but you still have to write the logic.
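Tracking the poster's recordOffset per partition, rather than in a single field, is what makes the manual approach hold up with multiple partitions. A minimal JDK-only sketch follows; the decision logic mirrors the onMessage() snippet in the question, and partitions are identified by a hypothetical "topic-partition" string key (in Spring Kafka code you would key on TopicPartition instead).

```java
import java.util.HashMap;
import java.util.Map;

// JDK-only sketch of the question's "discard records fetched before the seek" logic,
// with the expected next offset tracked separately for each partition.
final class PerPartitionReplayFilter {

    private final Map<String, Long> nextExpected = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // True if the record was fetched after a failure in its partition and must be
    // skipped until the seek brings the failed offset back around.
    boolean shouldDiscard(String topic, int partition, long offset) {
        Long expected = nextExpected.get(key(topic, partition));
        return expected != null && offset > expected;
    }

    void onSuccess(String topic, int partition, long offset) {
        nextExpected.put(key(topic, partition), offset + 1);
    }

    // Call together with consumerSeekCallback.seek(topic, partition, offset).
    void onFailure(String topic, int partition, long offset) {
        nextExpected.put(key(topic, partition), offset);
    }
}
```

A failure in one partition then only affects records from that partition; records from other partitions keep flowing.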
After opening an order with our brokerage firm, we want to obtain the fill price from the ExecutionReport messages. Below is the callback code we use.
The MarketDataSnapshotFullRefresh messages are received properly, but the second if block is never triggered. Strangely, the corresponding messages.log file does contain multiple 35=8 messages.
We use QuickFIX/J as FIX engine.
@Override
public void fromApp(Message message, SessionID sessionID) throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType {
    if (message instanceof MarketDataSnapshotFullRefresh) {
        // do stuff with MarketDataSnapshotFullRefresh
    }
    if (message instanceof ExecutionReport) {
        // do stuff with ExecutionReport
    }
}
Message handling is ideally done by a quickfix.MessageCracker, though sometimes handling them in fromApp is the way to go.
You can read more about message cracking here: QuickFIX/J User Manual - Receiving Messages.
I'll outline both ways:
In fromApp
Messages coming in fromApp are not of specific message types as defined in the QuickFIX/J library, but are of type quickfix.Message. If you wanted to process them the way you are doing now (from fromApp), you would have to inspect the MsgType manually:
MsgType msgType = (MsgType) message.getHeader( ).getField( new MsgType( ) );
Based on the type retrieved, you would call a handler method for the specific message type:
if( msgType.valueEquals( MsgType.MARKET_DATA_SNAPSHOT_FULL_REFRESH ) )
handleMarketDataSnapshotFullRefresh( message, sessionID );
else if ...
...
private void handleMarketDataSnapshotFullRefresh( quickfix.Message msg, SessionID sessionID ) {
// handler implementation
}
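Handling messages in fromApp boils down to reading tag 35 (MsgType) and dispatching on its value. The following JDK-only sketch shows that routing on a raw, SOH-delimited FIX string; the MsgTypeRouter class and the raw-string input are hypothetical, and the parsing is deliberately naive. In QuickFIX/J itself you would read the value from the header (for example message.getHeader().getString(MsgType.FIELD)) and compare it with constants such as MsgType.MARKET_DATA_SNAPSHOT_FULL_REFRESH ("W") and MsgType.EXECUTION_REPORT ("8").

```java
import java.util.Map;
import java.util.function.Consumer;

// JDK-only sketch of dispatching on the FIX MsgType (tag 35) value.
final class MsgTypeRouter {

    static final String MARKET_DATA_SNAPSHOT_FULL_REFRESH = "W"; // 35=W
    static final String EXECUTION_REPORT = "8";                  // 35=8

    private final Map<String, Consumer<String>> handlers;

    MsgTypeRouter(Consumer<String> onSnapshot, Consumer<String> onExecutionReport) {
        this.handlers = Map.of(
                MARKET_DATA_SNAPSHOT_FULL_REFRESH, onSnapshot,
                EXECUTION_REPORT, onExecutionReport);
    }

    // Returns the value of tag 35 from a SOH-delimited FIX string.
    static String msgType(String rawFix) {
        for (String field : rawFix.split("\u0001")) {
            if (field.startsWith("35=")) {
                return field.substring(3);
            }
        }
        throw new IllegalArgumentException("no MsgType (35) field");
    }

    // Routes to the matching handler; returns false when no handler exists
    // (the case in which a MessageCracker throws UnsupportedMessageType).
    boolean route(String rawFix) {
        Consumer<String> handler = handlers.get(msgType(rawFix));
        if (handler == null) {
            return false;
        }
        handler.accept(rawFix);
        return true;
    }
}
```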
Using MessageCracker
Another way to handle incoming messages, as mentioned before, is through a MessageCracker. You could, for example, have the class that implements quickfix.Application extend quickfix.MessageCracker.
Add an onMessage method with two parameters: the first being the message type, the second a SessionID. Call crack from the fromApp method, which will route the message to the appropriate handler.
import quickfix.*;
public class MyApplication extends MessageCracker implements Application
{
    public void fromApp(Message message, SessionID sessionID)
            throws FieldNotFound, UnsupportedMessageType, IncorrectTagValue {
        crack(message, sessionID);
    }
    @Handler
    public void onMessage(quickfix.fix44.MarketDataSnapshotFullRefresh mdsfr, SessionID sessionID) {
        // handler implementation
    }
}
Why are you doing the message processing in the wrong place? If you check what QuickFIX recommends, you will see that message processing should happen in onMessage (which you might not have implemented), and that the fromApp method should contain only the message cracker call.
Otherwise, your fromApp method is going to be a hotchpotch of code, and the next person handling your code is not going to be a happy soul.
I'm not quite sure why this happened. I send a Market Data Request (with 263=1), and the counterparty definitely responds with a MarketDataSnapshotFullRefresh (35=W). I've included onMessage(QuickFix.FIX42.MarketDataSnapshotFullRefresh ...) in my message cracker, but the app throws the exception "QuickFix.UnsupportedMessageType".
So I tried to capture the snapshot market data directly in FromApp (without the message cracker), and that worked. So what's wrong with my message cracker? Any ideas?
This is FromApp currently:
public void FromApp(QuickFix.Message msg, SessionID sessionID) //every inbound Application-level message
{
if (msg.Header.GetField(Tags.MsgType) == MsgType.MARKET_DATA_SNAPSHOT_FULL_REFRESH)
Homepage._homepage.GetFixMessage(msg.ToString());
else
Crack(msg, sessionID);
}
And this is the message cracker handler I had before (before capturing directly in FromApp):
#region MessageCracker handlers
public void onMessage(QuickFix.FIX42.MarketDataSnapshotFullRefresh mdsnapshot, SessionID s)
{
Homepage._homepage.GetFixMessage(mdsnapshot.ToString());
}
#endregion
OnMessage needs to start with a capital "O".
QF/n uses the C# convention of capitalizing method names.