How to implement retry and recover logic with Spring Reactive Kafka - apache-kafka

We are using the https://github.com/reactor/reactor-kafka project for implementing Spring Reactive Kafka. But we want to utilize Kafka retry and recover logic with reactive Kafka.
Can anyone provide some sample code?

Since you are already in the Spring ecosystem, you can use spring-retry for retry and recovery; see the spring-retry documentation, and there are plenty of further references on the web.
In the sample below, the class consumes messages from a Kafka topic and processes them.
The consuming method is marked @Retryable, so if processing throws an exception it is retried, and if the retries do not succeed the corresponding @Recover method is called.
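import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Component;

// Requires spring-kafka and spring-retry, with @EnableRetry on a @Configuration class.
// The class is not named KafkaListener, because that would clash with the annotation of the same name.
@Component
public class BookMessageListener {

    @KafkaListener(topics = "books-topic", id = "group-1")
    @Retryable(maxAttempts = 3, value = Exception.class)
    public void consuming(String message) {
        // Process the message here.
        // Whenever an exception is thrown from this method:
        // - it is attempted 3 times in total
        // - if the last attempt still fails, the call is handed off to
        //   the recover method recoverConsuming below
    }

    @Recover
    public void recoverConsuming(Exception exception, String message) {
        // Recovery logic: implement your recovery scenario here
    }
}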
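For readers who want to see the control flow that @Retryable and @Recover stand for, here is a minimal plain-Java sketch with no Spring involved. The class RetryThenRecover and its execute method are hypothetical names made up for this illustration; they are not part of spring-retry or reactor-kafka.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

public class RetryThenRecover {

    /**
     * Calls action up to maxAttempts times. If every attempt throws,
     * the last exception and the original input are passed to recover.
     */
    public static <T, R> R execute(T input, int maxAttempts,
                                   Function<T, R> action,
                                   BiFunction<Exception, T, R> recover) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.apply(input);
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        return recover.apply(last, input);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        String result = RetryThenRecover.<String, String>execute(
                "book-42", 3,
                message -> {
                    attempts[0]++;
                    throw new IllegalStateException("processing failed");
                },
                (exception, message) -> "recovered " + message);
        // prints: recovered book-42 after 3 attempts
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In a Reactor pipeline the same shape is usually written with the retry and onErrorResume operators on Flux or Mono rather than with annotations.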

Related

Spring Batch partitioned job JMS acknowledgement

Let's say I have a Spring Batch remote partitioned job, i.e. I have a manager application instance which starts the job and partitions the work and I have multiple workers who are executing individual partitions.
The message channel where the partitions are sent to the workers is an ActiveMQ queue and the Spring Integration configuration is based on JMS.
Suppose I want to make sure that if a worker crashes in the middle of executing a partition, another worker picks up the same partition.
I think this is where JMS message acknowledgment would come in handy: a message would only be acknowledged once a worker has fully completed its work on a particular partition. However, it seems that the message is acknowledged as soon as a worker receives it, so if a Spring Batch step fails in the worker, the message won't reappear (obviously).
Is this even possible with Spring Batch? I've tried transacted sessions too but it doesn't really work either.
I know how to achieve this with JMS API. The difficulty comes from the fact that there is a lot of abstraction with Spring Batch in terms of messaging, and I'm unable to figure it out.
In this case, I think the best way to answer this question is to remove all these abstractions coming from Spring Batch (as well as Spring Integration), and try to see where the acknowledgment can be configured.
In a remote partitioning setup, workers are listeners on a queue in which messages coming from the manager are of type StepExecutionRequest. The most basic code form of a worker in this setup is something like the following (simplified version of StepExecutionRequestHandler, which is configured as a Spring Integration service activator when using the RemotePartitioningWorkerStepBuilder):
@Component
public class BatchWorkerStep {

    @Autowired
    private JobRepository jobRepository;

    @Autowired
    private StepLocator stepLocator;

    @JmsListener(destination = "requests")
    public void receiveMessage(final Message<StepExecutionRequest> message) {
        // Message is org.springframework.messaging.Message, so the body is read with getPayload()
        StepExecutionRequest request = message.getPayload();
        Long jobExecutionId = request.getJobExecutionId();
        Long stepExecutionId = request.getStepExecutionId();
        String stepName = request.getStepName();
        StepExecution stepExecution = jobRepository.getStepExecution(jobExecutionId, stepExecutionId);
        Step step = stepLocator.getStep(stepName);
        try {
            step.execute(stepExecution);
            stepExecution.setStatus(BatchStatus.COMPLETED);
        } catch (Throwable e) {
            stepExecution.addFailureException(e);
            stepExecution.setStatus(BatchStatus.FAILED);
        } finally {
            jobRepository.update(stepExecution); // this is needed in a setup where the manager polls the job repository
        }
    }
}
As you can see, the JMS message acknowledgment cannot be configured on the worker side (there is no way to do it with attributes of @JmsListener), so it has to be done somewhere else. That place is the message listener container level, with DefaultJmsListenerContainerFactory#setSessionAcknowledgeMode.
Now, if you are using Spring Integration to configure the messaging middleware, you can configure the acknowledgment mode in Spring Integration.
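As a rough illustration of the container-level setting for the plain @JmsListener worker, the configuration below is a sketch and not a tested setup. It assumes Spring 6 with the jakarta.jms namespace (older versions use javax.jms), a ConnectionFactory bean supplied elsewhere, and a hypothetical class name WorkerJmsConfiguration.

```java
import jakarta.jms.ConnectionFactory;
import jakarta.jms.Session;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.annotation.EnableJms;
import org.springframework.jms.config.DefaultJmsListenerContainerFactory;

@Configuration
@EnableJms
public class WorkerJmsConfiguration { // hypothetical name

    // "jmsListenerContainerFactory" is the bean name @JmsListener looks up by default.
    @Bean
    public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(ConnectionFactory connectionFactory) {
        DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        // With CLIENT_ACKNOWLEDGE the container acknowledges the message only
        // after the listener method has returned without throwing.
        factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
        return factory;
    }
}
```

Note that the worker shown earlier catches every Throwable, so from the container's point of view the listener always succeeds. For a failed step to trigger redelivery, the listener would have to rethrow after updating the job repository.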

InvalidPidMappingException causes kafka stream application to shut down

I have this application written in kafka streams. Every now and then it throws InvalidPidMappingException.
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.errors.InvalidPidMappingException: The producer attempted to use a producer id which is not currently assigned to its transactional id.
And I have this snippet of code which sets the Uncaught Exception Handler for the app
streams.setUncaughtExceptionHandler(
    (Thread thread, Throwable exp) -> {
        log.error("Unhandled exception in thread with name " + thread.getName(), exp);
        SpringApplication.exit(applicationContext, () -> 1);
    }
);
I understand that this exception occurs when the coordinator expires the producer's transaction id after it has not received any transaction status updates from it.
I have a few questions regarding this exception:
1. I thought that after this exception the producer would try again to sync its transactional id with the coordinator and resume, without the Kafka Streams thread being killed. Even if I change the above code snippet to not exit the Spring application on InvalidPidMappingException, it still kills the stream thread. Is there a way to avoid the death of the stream thread on InvalidPidMappingException? I have seen the desired behaviour with UnknownProducerIdException. Or am I missing something here?
2. Other than transactional id expiration, can there be any other reason for this exception to occur?
3. Why is InvalidPidMappingException handled differently from UnknownProducerIdException? The former kills the stream thread and the latter recovers just fine.
I am using the following versions of the libraries:
spring-kafka-version = '2.5.5.RELEASE'
apache-kafka-clientVersion = '2.5.1'
confluent-version = '5.4.2'
A few months late, but the discussion on this Apache Kafka Jira is helpful.
To summarise, with v2.8.0 and above of the Apache Kafka Streams library you can use the new setUncaughtExceptionHandler() overload in org.apache.kafka.streams.KafkaStreams, which takes a StreamsUncaughtExceptionHandler, to handle any uncaught exception. Returning org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD keeps the stream running by terminating the current thread and creating a new one for future processing.
e.g.
kafkaStreams.setUncaughtExceptionHandler(e -> {
    return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD;
});
Javadocs for the KafkaStreams::setUncaughtExceptionHandler method added in 2.8.0
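The stack trace in the question shows InvalidPidMappingException wrapped inside a KafkaException, so a handler that wants to react only to that error has to look through the cause chain. The helper below is a small stdlib-only sketch of such a check; CauseChain and hasCause are hypothetical names, and the exceptions in main are stand-ins rather than real Kafka classes.

```java
public class CauseChain {

    /** Returns true if t or any of its causes has the given simple class name. */
    public static boolean hasCause(Throwable t, String simpleName) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t.getClass().getSimpleName().equals(simpleName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped exception a stream thread reports.
        Throwable wrapped = new RuntimeException("stream thread failed",
                new IllegalStateException("producer id not assigned"));
        System.out.println(hasCause(wrapped, "IllegalStateException")); // true
        System.out.println(hasCause(wrapped, "TimeoutException"));      // false
    }
}
```

Inside the 2.8.0+ handler one could then return REPLACE_THREAD when hasCause(e, "InvalidPidMappingException") is true and SHUTDOWN_CLIENT otherwise, so that unrelated failures still stop the client.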

Kafka Transaction in case multi Threading

I am trying to use a Kafka producer in a transaction, i.e. I want to write a group of messages, and if any one of them fails I want to roll back all of them.
kafkaProducer.beginTransaction();
try {
    // code to produce to kafka topic
    // commit inside the try block, so that it is never called after an abort
    kafkaProducer.commitTransaction();
} catch (Exception e) {
    kafkaProducer.abortTransaction();
}
The problem is that this works fine with a single thread, but when multiple threads write, it throws this exception:
Invalid transaction attempted from state IN_TRANSITION to IN_TRANSITION
While debugging I found that if thread1's transaction is in progress and thread2 also calls beginTransaction, this exception is thrown. What I can't find is how to solve this. One possible approach I found is creating a pool of producers.
Is there an existing API for a Kafka producer pool, or will I have to create my own?
Below is the improvement Jira already reported for this:
https://issues.apache.org/jira/browse/KAFKA-6278
Any other suggestions would be really helpful.
You can only have a single transaction in progress at a time with a producer instance.
If you have multiple threads doing separate processing and they all need exactly once semantics, you should have a producer instance per thread.
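One way to get a producer instance per thread is a ThreadLocal. The sketch below only shows that wiring: StubProducer is a hypothetical stand-in for a transactional KafkaProducer, and the "tx-N" id scheme is an assumption made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadProducer {

    /** Hypothetical stand-in for a KafkaProducer configured with a transactional id. */
    public static final class StubProducer {
        public final String transactionalId;

        StubProducer(String transactionalId) {
            this.transactionalId = transactionalId;
        }
    }

    private static final AtomicInteger SEQUENCE = new AtomicInteger();

    // Each thread lazily creates its own producer with its own transactional id.
    private static final ThreadLocal<StubProducer> PRODUCER =
            ThreadLocal.withInitial(() -> new StubProducer("tx-" + SEQUENCE.incrementAndGet()));

    /** Returns the producer that belongs to the calling thread. */
    public static StubProducer current() {
        return PRODUCER.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> ids = ConcurrentHashMap.newKeySet();
        Runnable work = () -> ids.add(current().transactionalId);
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(ids.size()); // 2: one producer per thread
    }
}
```

With a real producer, the initializer would also be the place to call initTransactions() once, and each thread's producer should be closed when the thread finishes.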
Not sure if this was resolved.
You can use Apache Commons Pool 2 to create a pool of producer instances.
In the create() method of the factory implementation you can generate and assign a unique transactional.id to each producer to avoid a conflict (ProducerFencedException).
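To show the borrow-and-return idea without pulling in a dependency, here is a minimal pool sketch built on a plain BlockingQueue instead of Commons Pool 2. ProducerPool, withProducer and idleCount are hypothetical names, and the factory receives an index so that each pooled producer can be given its own transactional.id.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.IntFunction;

public class ProducerPool<P> {

    private final BlockingQueue<P> idle;

    /** Pre-creates size producers; the factory gets 0..size-1 to derive a unique id. */
    public ProducerPool(int size, IntFunction<P> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.apply(i));
        }
    }

    /** Borrows a producer, runs the work with it, and always returns it to the pool. */
    public <R> R withProducer(Function<P, R> work) {
        P producer;
        try {
            producer = idle.take(); // blocks while all producers are in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a producer", e);
        }
        try {
            return work.apply(producer);
        } finally {
            idle.add(producer);
        }
    }

    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // Strings stand in for real producers here.
        ProducerPool<String> pool =
                new ProducerPool<>(2, i -> "producer[transactional.id=tx-" + i + "]");
        String used = pool.withProducer(p -> p);
        System.out.println(used + ", idle again: " + pool.idleCount());
    }
}
```

Because a borrowed producer is used by one thread at a time, only one transaction is ever in progress per instance. A real pool would also need to discard, rather than return, a producer that hit a fatal error such as ProducerFencedException.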

Produce called with an IAsyncSerializer value serializer configured but an ISerializer is required when using Avro Serializer

I am working with Kafka cluster and using Transactional Producer for atomic streaming (read-process-write).
// Init Transactions
_transactionalProducer.InitTransactions(DefaultTimeout);
// Begin the transaction
_transactionalProducer.BeginTransaction();
// produce message to one or many topics
var topic = Topics.MyTopic;
_transactionalProducer.Produce(topic, consumeResult.Message);
I am using AvroSerializer since I publish messages with Schema.
Produce throws an exception:
"System.InvalidOperationException: Produce called with an IAsyncSerializer value serializer configured but an ISerializer is required.\r\n at Confluent.Kafka.Producer`2.Produce(TopicPartition topicPartition, Message`2 message, Action`1 deliveryHandler)"
All examples I've seen for transactional producer use Produce method rather than ProduceAsync so not sure I can simply switch to ProduceAsync and assume that transactional produce will function correctly. Correct me if I'm wrong or help me find documentation.
Otherwise, I am not able to find AvroSerializer that is not Async, inheriting from ISerializer.
public class AvroSerializer<T> : IAsyncSerializer<T>
I didn't realize that there is an AsSyncOverAsync method, which I can use when creating the serializer. It exists because the Kafka consumer is also still sync and not async.
For example:
new AvroSerializer<TValue>(schemaRegistryClient, serializerConfig).AsSyncOverAsync();
Here is Confluent documentation of that method.
//
// Summary:
// Create a sync serializer by wrapping an async one. For more information on the
// potential pitfalls in doing this, refer to Confluent.Kafka.SyncOverAsync.SyncOverAsyncSerializer`1.
public static ISerializer<T> AsSyncOverAsync<T>(this IAsyncSerializer<T> asyncSerializer);

JPA transaction handling between @Stateless and @Asynchronous EJBs

I have a stateless EJB which inserts data into the database, sends a response immediately and, as its last step, calls an asynchronous EJB. The asynchronous EJB can run for a long time (5-10 minutes, which is longer than the JPA transaction timeout). The asynchronous EJB needs to read, and work on, the same record tree that the stateless EJB persisted (read only).
It seems that the asynchronous bean tries to read the record tree before it has been committed or inserted (JPA) by the stateless EJB, so the record tree is not visible to the async bean.
Stateless EJB:
@Stateless
public class ReceiverBean {

    @EJB
    private ProcessorAsyncBean processorAsyncBean;

    public void receiverOfIncomingRequest(Data data) {
        long id = persistRequest(data);
        sendResponseToJmsBasedOnIncomingData(data);
        processorAsyncBean.calculate(id);
    }
}
Asynchronous EJB:
@Stateless
public class ProcessorAsyncBean {

    @Asynchronous
    public void calculate(long id) {
        Data data = dao.getById(id); // <- DATA IS ALWAYS NULL HERE!
        // the following method sends the data
        // to an external system over the internet (TCP/IP)
        Result result = doSomethingForLongWithData(data);
        updateData(id, result);
    }

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void updateData(long id, Result result) {
        dao.update(id, result);
    }
}
Maybe I can use a JMS queue to send a signal with the ID to the processor bean instead of calling the async EJB (a message-driven bean would then read the data from the database), but I want to avoid that if possible.
Another solution could be to pass the whole record tree as a detached JPA object to the async processor EJB instead of reading the data back from the database.
Can I make the async EJB work well in this structure somehow?
-- UPDATE --
I was thinking about using WebLogic JMS, but there is another issue. Under heavy load, when there are 100 000 or more messages in the queue (which will be normal) and there is no internet connection, all of the messages in the queue will fail. If that exception (or any other) occurs while sending data over the internet (in the doSomethingForLongWithData method), the message is rolled back to the original queue according to WebLogic's redelivery-limit and repetition settings. This rollback generates 100 000 or more threads on the WebLogic managed server to handle redelivery. That many new background tasks can kill, or at least slow down, the server.
I can use IBM MQ as well, because we have MQ infrastructure. MQ does not have this kind of effect on the WebLogic server, but MQ has no redelivery-limit or delay function. So in case of an error (rollback) the message reappears on the MQ queue immediately, without delay, and the same message is retried over and over in a tight loop. Thread.sleep() in the catch block is not a solution in an EE application, I guess...
"It seems that the asynchronous bean tries to read the record tree before it has been committed or inserted (JPA) by the stateless EJB, so the record tree is not visible to the async bean."
This is expected behavior with container-managed transactions. You are starting the asynchronous EJB from an EJB that has its own transaction context. The asynchronous EJB never uses the caller's transaction context (see EJB spec 4.5.3).
As long as you are not using the transaction isolation level "read uncommitted" with your persistence layer, you won't see the caller's not yet committed data.
You must also think about the case where the async job does not commit (e.g. application server shutdown or abnormal termination). Are the subsequent calculation and update critical? Is the asynchronous process recoverable if it did not execute successfully or was never even called?
You can think about using bean-managed transactions and committing before calling the asynchronous EJB. Or you can delegate the data update to another EJB with a new transaction context, which will be committed before the asynchronous EJB is called. This is usually OK for non-critical work where a missed or failed run is acceptable.
Using persistent, transactional JMS messages along with a dead letter queue has the advantage of reliable processing of your calculation and update, even if the application server is stopped and started in between or temporary errors occur during processing.
You just need to call the async method after the method with the transaction markup has returned, i.e. once its transaction has been committed.
For example, the caller of the receiverOfIncomingRequest() method could call
processorAsyncBean.calculate(id);
right after it.
UPDATE: extended example
CallerMDB:
@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
public void onMessage(Message message) {
    // "data" is whatever payload is extracted from the incoming message
    long id = receiverBean.receiverOfIncomingRequest(data);
    // the REQUIRED transaction of receiverOfIncomingRequest has committed by now
    processorAsyncBean.calculate(id);
}
ReceiverBean:
@TransactionAttribute(TransactionAttributeType.REQUIRED)
public long receiverOfIncomingRequest(Data data) {
    long id = persistRequest(data);
    sendResponseToJmsBasedOnIncomingData(data);
    return id;
}