I'm a bit confused about the expected behavior of poll(). In my app, if processLogic() succeeds, I commit the offset manually so that the next poll() returns new messages.
The problem occurs when processLogic() throws an error. I make the consumer seek to the offset that failed during processing, and on the next poll() it receives the same message again (correct behavior, since I told the consumer to reset the offset to that position). Suppose this time processing works and doCommitSync() is called.
The unexpected behavior occurs on the following poll(). It should return new messages, but it still retrieves the last message, which leads to another call to processLogic() and doCommitSync(). doCommitSync() also throws the following error:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
My consumer config:
enable.auto.commit=false
isolation.level=read_committed
auto.offset.reset=latest
    public void runConsumer() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::shutdown));
        try {
            consumer.subscribe(topics);
            while (!closed.get()) {
                processedStatus.set(false);
                final ConsumerRecords<String, String> consumedRecords = consumer.poll(numRecords);
                if (!consumedRecords.isEmpty()) {
                    StreamSupport.stream(consumedRecords.spliterator(), false)
                        .map(ConsumerRecord::value)
                        .forEach(record -> {
                            try {
                                processLogic(); //do some logic which can fail
                                processedStatus.set(true);
                            } catch (Exception e) {
                                logger.error("Error applying action: " + record.getUuid(), e);
                            }
                            if (processedStatus.get()) {
                                doCommitSync();
                            } else {
                                consumer.seek(new TopicPartition(recordTopic, recordPartition), recordOffset);
                            }
                        });
                }
            }
        } catch (WakeupException e) {
            logger.error("Kafka Consumer wakeup exception");
        } finally {
            alertConsumer.close();
            shutdownLatch.countDown();
        }
    }
I need to pause the KafkaConsumer if an error is thrown while consuming a message.
This is what I wrote:
    @KafkaListener(...)
    public void consume(
            @Header(KafkaHeaders.CONSUMER) KafkaConsumer<String,String> consumer,
            @Payload String message) {
        try {
            // consume message
        } catch(Exception e) {
            saveConsumer(consumer);
            consumer.pause(consumer.assignment()); // pause() needs the partitions to pause
        }
    }
Then I wrote a REST service in order to resume the consumer:
    @RestController
    @RequestMapping("/consumer")
    class ConsumerRestController {
        @PostMapping("/resume")
        public void resume() {
            KafkaConsumer<String,String> consumer = getConsumer();
            if(consumer != null) {
                consumer.resume(consumer.paused());
            }
        }
    }
Now I have two questions.
First question: what happens when I call consumer.pause() from the @KafkaListener-annotated method?
Is the consumer paused immediately, or can I still receive other messages at other offsets of the same topic-partition?
For example, I have "message1" at offset 3 and "message2" at offset 4, and "message1" causes an exception. What happens to "message2"? Is it consumed anyway?
Second question: resuming the consumer from the REST service throws a ConcurrentModificationException because KafkaConsumer is not thread-safe. How should I do this instead?
Do not pause the consumer directly; pause the container instead.
    @KafkaListener(id = "foo", ...)
    @Autowired KafkaListenerEndpointRegistry registry;
    ...
    registry.getListenerContainer("foo").pause();
The pause takes effect before the next poll. If you want to pause immediately (and not process the remaining records from the last poll), throw an exception after pausing (assuming you are using the now-default SeekToCurrentErrorHandler).
It seems to me that the Kafka transactional producer is behaving like a regular producer: the messages are visible on the topic as soon as send() is called for each message. Maybe I am missing something basic. I was expecting the messages to appear in the topic only after the producer's commitTransaction() method is called. In my code below, producer.commitTransaction() is commented out, but I still get the messages in the topic. Thanks for any pointers.
    public static void main(String[] args) {
        try {
            Properties producerConfig = new Properties();
            producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
            producerConfig.put(ProducerConfig.CLIENT_ID_CONFIG, "transactional-producer-1");
            producerConfig.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // enable idempotence
            producerConfig.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id-1"); // set transaction id
            producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
            producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
            Producer<String, String> producer = new KafkaProducer<>(producerConfig);
            producer.initTransactions(); //initiate transactions
            try {
                producer.beginTransaction(); //begin transactions
                for (Integer i = 0; i < 1000; i++) {
                    producer.send(new ProducerRecord<String, String>("t_test", i.toString(), "value_" + i));
                }
                // producer.commitTransaction(); //commit
            } catch (KafkaException e) {
                // For all other exceptions, just abort the transaction and try again.
                producer.abortTransaction();
            }
            producer.close();
        } catch (Exception e) {
            System.out.println(e.toString());
        }
    }
When it comes to transactions in Kafka, you need to consider a producer/consumer pair. A producer by itself, as you have observed, just produces data and either commits the transaction or not.
A transaction is only "completed" in interplay with a consumer, by setting the KafkaConsumer configuration isolation.level to read_committed (by default it is set to read_uncommitted). This configuration is described as:
isolation.level: Controls how to read messages written transactionally. If set to read_committed, consumer.poll() will only return transactional messages which have been committed. If set to read_uncommitted (the default), consumer.poll() will return all messages, even transactional messages which have been aborted. Non-transactional messages will be returned unconditionally in either mode.
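As a minimal sketch of the consumer-side setting described above, the properties could be built as follows. Plain string keys are used here instead of the ConsumerConfig constants so the snippet needs nothing beyond the JDK; the class name and the group id are hypothetical, and connection settings such as the bootstrap servers and deserializers are deliberately left out.

```java
import java.util.Properties;

final class ReadCommittedConfig {
    // Builds consumer properties that hide uncommitted and aborted transactional records.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("group.id", "demo-group");             // hypothetical group id (assumption)
        props.put("enable.auto.commit", "false");
        props.put("isolation.level", "read_committed");  // the default is read_uncommitted
        return props;
    }
}
```

A consumer created with these properties (plus the usual connection settings) should not see the 1000 records from the question until commitTransaction() has been called.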
I'm having a hard time figuring out simple patterns for handling exceptions in the consumer of a Kafka topic.
The scenario is as follows: in the consumer I call an external service. If the service is unavailable, I want to retry a few times and then stop consuming.
The simplest pattern seems to be a blocking, synchronous way of dealing with it, something like this in Java:
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        boolean processed = false;
        int count = 0;
        while (!processed) {
            try {
                callService(..);
                processed = true; // leave the retry loop once the call succeeds
            } catch (Exception e) {
                if (count++ < 3) {
                    Thread.sleep(5000);
                    continue;
                } else throw new RuntimeException();
            }
        }
    }
However, I have the feeling there must be a simpler approach (without using third party libraries), and one that avoids blocking the thread.
Seems like a common thing we would like to have, yet I could not find a simple example for this pattern.
Kafka provides no such retry mechanism out of the box. I have experience with RabbitMQ, where the broker provides a retry exchange; in RabbitMQ these are called dead letter exchanges.
https://www.rabbitmq.com/dlx.html
You can apply the same pattern in Kafka.
On message processing failure, we can publish a copy of the message to another topic and wait for the next message. Let’s call the new topic the ‘retry_topic’. The consumer of the ‘retry_topic’ receives the message from Kafka and then waits some predefined time, for example one hour, before starting the message processing. This way we can postpone further processing attempts without any impact on the ‘main_topic’ consumer. If processing in the ‘retry_topic’ consumer fails, we just have to give up and store the message in the ‘failed_topic’ for further manual handling. The ‘main_topic’ consumer code may look like this:
Pushing message to retry_topic on failure/exception
    void consumeMainTopicWithPostponedRetry() {
        while (true) {
            Message message = takeNextMessage("main_topic");
            try {
                process(message);
            } catch (Exception ex) {
                publishTo("retry_topic", message);
                LOGGER.warn("Message processing failure. Will try once again in the future.", ex);
            }
        }
    }
Consumer of the retry topic
    void consumeRetryTopic() {
        while (true) {
            Message message = takeNextMessage("retry_topic");
            try {
                waitSomeLongerTime(); // postpone the retry before processing
                process(message);
            } catch (Exception ex) {
                publishTo("failed_topic", message);
                LOGGER.warn("Message processing failure. Will skip it.", ex);
            }
        }
    }
The above strategy and examples are taken from the link below. All credit goes to the author of the blog post.
https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
The blog post also explains a non-blocking way of doing the above. Hope this helps.
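To make the routing between the three topics concrete, here is a toy, in-memory sketch. It is not Kafka code: the queues merely stand in for main_topic, retry_topic and failed_topic, the class and method names are hypothetical, and the delay before a retry is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

final class RetryTopicSketch {
    // In-memory stand-ins for main_topic, retry_topic and failed_topic.
    final Deque<String> mainTopic = new ArrayDeque<>();
    final Deque<String> retryTopic = new ArrayDeque<>();
    final List<String> failedTopic = new ArrayList<>();
    final List<String> processed = new ArrayList<>();

    // Drains the main topic; messages that fail are forwarded to the retry topic.
    void consumeMainTopic(Predicate<String> process) {
        String message;
        while ((message = mainTopic.poll()) != null) {
            if (process.test(message)) {
                processed.add(message);
            } else {
                retryTopic.add(message);
            }
        }
    }

    // Drains the retry topic; a second failure parks the message for manual handling.
    void consumeRetryTopic(Predicate<String> process) {
        String message;
        while ((message = retryTopic.poll()) != null) {
            // A real retry consumer would wait here before trying again (omitted).
            if (process.test(message)) {
                processed.add(message);
            } else {
                failedTopic.add(message);
            }
        }
    }
}
```

The predicate plays the role of process(message) and returns false on failure. A message that fails once but succeeds on the retry ends up in processed, while one that fails twice ends up in failedTopic, and the main consumer never waits in either case.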
If I have enable.auto.commit=false and I call consumer.poll() without calling consumer.commitAsync() afterwards, why does consumer.poll() return new records the next time it's called?
Since I did not commit my offset, I would expect poll() to resume from the last committed offset, which should return the same records again.
I'm asking because I'm trying to handle failure scenarios during my processing. I was hoping that, without committing the offset, poll() would return the same records again so I could re-process the failed records.
    public class MyConsumer implements Runnable {
        @Override
        public void run() {
            while (true) {
                ConsumerRecords<String, LogLine> records = consumer.poll(Long.MAX_VALUE);
                for (ConsumerRecord record : records) {
                    try {
                        //process record
                        consumer.commitAsync();
                    } catch (Exception e) {
                    }
                    /**
                    If an exception happens above, I was expecting poll to return the same records again so I can re-process the record that caused the exception.
                    **/
                }
            }
        }
    }
The starting offset of a poll is not decided by the broker but by the consumer. The consumer tracks the last received offset and asks for the following bunch of messages during the next poll.
Offset commits come into play when a consumer stops or fails and another instance that is not aware of the last consumed offset picks up consumption of a partition.
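The distinction between the consumer's in-memory position and the committed offset can be illustrated with a toy model. This is not the Kafka client: the class and its methods are hypothetical and only loosely mirror the ideas of polling, committing, seeking and restarting for a single partition.

```java
final class OffsetModel {
    private long position;   // next offset to fetch; lives only in the consumer's memory
    private long committed;  // offset stored for the group; survives a restart

    OffsetModel(long committedOffset) {
        this.committed = committedOffset;
        this.position = committedOffset;
    }

    // Returns up to max offsets starting at the in-memory position and advances it.
    long[] poll(int max, long logEndOffset) {
        int n = (int) Math.max(0, Math.min(max, logEndOffset - position));
        long[] offsets = new long[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = position + i;
        }
        position += n;
        return offsets;
    }

    void commit() { committed = position; }      // persist the current position
    void seek(long offset) { position = offset; } // rewind or skip explicitly
    void restart() { position = committed; }      // a new instance resumes from the commit
}
```

In this model, two consecutive poll() calls without a commit return offsets 0 to 2 and then 3 to 5. Only restart() (or an explicit seek()) brings the earlier offsets back, which matches the behavior observed in the question.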
KafkaConsumer has pretty extensive Javadoc that is well worth a read.
The consumer reads from the last committed offset if the group is rebalanced (that is, if a consumer leaves the group or a new consumer is added), so de-duplication is not straightforward in Kafka. You either have to store the last processed offset in an external store and, when a rebalance happens or the app restarts, seek to that offset and resume processing, or check a unique key in the message against a database to find out whether it is a duplicate.
I would like to share some code showing how you can solve this in Java.
The approach is to poll the records and try to process them; if an exception occurs, you seek to the minimum offset of each topic partition in the batch. After that, you call commitAsync().
    public class MyConsumer implements Runnable {
        @Override
        public void run() {
            while (true) {
                // sequential stream, so the records keep the order returned by poll()
                List<ConsumerRecord<String, LogLine>> records = StreamSupport
                    .stream( consumer.poll(Long.MAX_VALUE).spliterator(), false )
                    .collect( Collectors.toList() );
                boolean exceptionRaised = false;
                for (ConsumerRecord<String, LogLine> record : records) {
                    try {
                        // process record
                    } catch (Exception e) {
                        exceptionRaised = true;
                        break;
                    }
                }
                if( exceptionRaised ) {
                    Map<TopicPartition, Long> offsetMinimumForTopicAndPartition = records
                        .stream()
                        .collect( Collectors.toMap( r -> new TopicPartition( r.topic(), r.partition() ),
                            ConsumerRecord::offset,
                            Math::min
                        ) );
                    for( Map.Entry<TopicPartition, Long> entry : offsetMinimumForTopicAndPartition.entrySet() ) {
                        consumer.seek( entry.getKey(), entry.getValue() );
                    }
                }
                consumer.commitAsync();
            }
        }
    }
With this setup, you poll the same messages again and again until you successfully process all messages of one poll.
Please note that your code should be able to handle a poison pill. Otherwise, it will get stuck in an endless loop.
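One possible way to guard against a poison pill is to count failed attempts per record and give up after a limit. The sketch below is only an illustration under that assumption; the class, its method and the key format are hypothetical and not part of the Kafka client.

```java
import java.util.HashMap;
import java.util.Map;

final class PoisonPillGuard {
    private final int maxAttempts;
    private final Map<String, Integer> attempts = new HashMap<>();

    PoisonPillGuard(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Records one failed attempt for the record and reports whether to give up on it.
    boolean shouldSkip(String topic, int partition, long offset) {
        String key = topic + "-" + partition + "@" + offset;
        int count = attempts.merge(key, 1, Integer::sum);
        if (count >= maxAttempts) {
            attempts.remove(key); // forget the record once we decide to skip it
            return true;
        }
        return false;
    }
}
```

In the loop above, the catch block could call shouldSkip(record.topic(), record.partition(), record.offset()); when it returns true, the consumer would seek to the failed offset plus one (or forward the record elsewhere for manual handling) instead of replaying it again.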
I am using RabbitMQ and I want to make sure that if there is a connection problem in the client, the messages I posted won't be lost. I simulate this in Eclipse: I call System.exit in the fetching program after 100 messages. I posted 1000 messages. On the second run I don't limit the number of messages, and it returns me 840 messages with 3 times. Can you help me?
The code of the producer is:
    public void run() {
        String json = SimpleQueueServiceSample.getFromList();
        while (!(json.equals(""))) {
            json = SimpleQueueServiceSample.getFromList();
            try {
                c.basicPublish("", "test",
                        MessageProperties.PERSISTENT_TEXT_PLAIN, json.getBytes());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        try {
            c.waitForConfirmsOrDie();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
The code of the consumer is:
    QueueingConsumer consumer = new QueueingConsumer(channel);
    channel.basicConsume(QUEUE_NAME, true, consumer);
    while (true) {
        System.out.println(count++);
        QueueingConsumer.Delivery delivery = consumer.nextDelivery();
        String message = new String(delivery.getBody());
        System.out.println(" [x] Received '" + message + "'");
    }
So the challenge in your scenario is how you're handling the acknowledgements.
    channel.basicConsume(QUEUE_NAME, true, consumer);
is the problem. The second parameter, true, is the auto-acknowledge flag.
To fix that, use:
    channel.basicConsume(QUEUE_NAME, false, consumer);
    while (true) {
        QueueingConsumer.Delivery delivery = consumer.nextDelivery();
        //...
        channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
    }
It looks like you're using RabbitMQ's tutorials, and your code snippet is from part one. If you look at part two, they start talking about acknowledgements and setting up quality of service to provide round-robin dispatch.
It's worth pointing out that the basicConsume() and nextDelivery() combination relies upon a hidden queue that lives within the consumer. So when you call basicConsume(), several messages are pulled down to the client's local storage.
The benefit of that approach is that it avoids the additional network overhead of a call for each individual message. The problem is that it can put more messages in your local consumer than you wish, and you may lose messages if the consumer drops before processing all of the messages in the local hidden queue.
If you truly want your consumers working on only one message at a time so that nothing is lost, you probably want to look at the basicGet() method instead of basicConsume().
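The effect of the hidden client-side queue can be illustrated with a small counting model. This is not RabbitMQ client code: the class and method are hypothetical, and the model simply assumes that the broker forgets a message on delivery when auto-ack is on, and only on acknowledgement when it is off.

```java
final class AckModel {
    // Returns how many messages the broker still holds after the consumer crashes.
    //   published: messages sent to the queue
    //   delivered: messages already pushed into the client's local buffer
    //   processed: messages the client finished (and, with manual ack, acknowledged)
    static int remainingAfterCrash(int published, int delivered, int processed, boolean autoAck) {
        // Auto-ack: every delivered message is gone, even if it was never processed.
        // Manual ack: unacknowledged messages are requeued when the connection closes.
        int removed = autoAck ? delivered : processed;
        return published - removed;
    }
}
```

As a purely hypothetical reading of the numbers in the question: if 160 of the 1000 messages had been delivered to the client when it exited after processing 100, auto-ack would leave 840 on the broker, whereas manual acknowledgement would leave 900.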