How to assign partitions before seek on ConsumerSeekCallback? - apache-kafka

I get the following exception
java.lang.IllegalStateException: No current assignment for partition
on
(ConsumerSeekCallback)callback.seek(topic, partition, offset);
From the kafka documentation -
void seek​(java.lang.String topic, int partition, long offset)
Perform a seek operation. When called from ConsumerSeekAware.onPartitionsAssigned(Map, ConsumerSeekCallback) or from ConsumerSeekAware.onIdleContainer(Map, ConsumerSeekCallback) perform the seek immediately on the consumer. When called from elsewhere, queue the seek operation to the consumer. The queued seek will occur after any pending offset commits. The consumer must be currently assigned the specified partition.
What should i do if the partition is not assigned?

You can capture which topics are assigned to you in ConsumerSeekAware - only perform the seek if you actually have received the topic.
If you extend AbstractConsumerSeekAware, you can call this
/**
* Return the callback for the specified topic/partition.
* #param topicPartition the topic/partition.
* #return the callback (or null if there is no assignment).
*/
#Nullable
protected ConsumerSeekCallback getSeekCallbackFor(TopicPartition topicPartition) {
return this.callbacks.get(topicPartition);
}

Related

Spring Cloud Stream Kafka Producer force producer.flush

I am creating a kafka streams/in-out kind of application.
Sample code looks like the following
private MessageChannel output;
public void process(List<String> input) {
--somelogic
output.send()
}
Based on my understanding, kafka buffers the messages before sending them out. Now in case the app crashes / container crashes, there is a possibility of losing the buffered messages.
How can we ensure that messages are actually sent out
(something like KafkaTemplate.flush() here ?)
EDIT
Based on the suggestion by Gary Russell, we should set the FLUSH header on the last message.
Follow up question -
Given the last send method call will become a blocking call because of the blocking nature of the kafkaProducer.flush() method, if there are exceptions thrown during the sending process of the kafka message (e.g. io exception/ auth exception) will they be raised in the same method context ?
e.g. in the following code will the kafka sender exception be caught in the catch block ?
public void process(List<String> input) {
--somelogic
try {
Message<?> message = //
message.setHeader(KafkaIntegrationHeaders.FLUSH, true);
output.send()
}
catch(Exception e) {
e.printstacktrace()
}
}
The underlying KafkaProducerMessageHandler has a property:
/**
* Specify a SpEL expression that evaluates to a {#link Boolean} to determine whether
* the producer should be flushed after the send. Defaults to looking for a
* {#link Boolean} value in a {#link KafkaIntegrationHeaders#FLUSH} header; false if
* absent.
* #param flushExpression the {#link Expression}.
*/
public void setFlushExpression(Expression flushExpression) {
Currently, the binder doesn't support customizing this property.
However, if you are sending Message<?>s, you can set the KafkaIntegrationHeaders.FLUSH header to true on the last message in the batch.

Spring Kafka difference between ContainerProperties(TopicPartitionOffset... topicPartitions) and ContainerProperties(String... topics)

I am creating a new KafkaMessageListenerContainer using a ContainerProperties.
Using ContainerProperties(String... topics), the Consumer Group looks fine: "state": "STABLE", "isSimpleConsumerGroup": false
Using ContainerProperties(TopicPartitionOffset... topicPartitions), the Consumer Groups is not automatically created. It is finally created when a message is sent but the Consumer Group doesn't look fine: "state": "EMPTY", "isSimpleConsumerGroup": true
What's the difference between them, did I miss something. I am expecting to have the same result using the two different ContainerProperties constructors.
ContainerProperties containerProps = new ContainerProperties(tpo.toArray(new TopicPartitionOffset[tpo.size()]));
containerProps.setGroupId(name);
// ContainerProperties containerProps = new ContainerProperties("poc-topic1",
// "poc-topic2", "poc-topic3");
// containerProps.setGroupId(name);
containerProps.setMessageListener(new TopicMessageListener(name));
DefaultKafkaConsumerFactory<String, Serializable> factory = new DefaultKafkaConsumerFactory<>(
Utils.get().getConsumerProperties());
container = new KafkaMessageListenerContainer<>(factory, containerProps);
// container.setAutoStartup(true);
// container.setBeanName(name);
// container.checkGroupId();
container.start();
That's not correct. The topic subscription causes a consumer group and their partitions distribution between group members.
When you do explicit partition assignment, not consumer group is involved at all.
See more in Apache Kafka docs: https://docs.confluent.io/platform/current/clients/consumer.html#consumer-groups
And respective JavaDocs:
/**
* Manually assign a list of partitions to this consumer. This interface does not allow for incremental assignment
* and will replace the previous assignment (if there is one).
* <p>
* If the given list of topic partitions is empty, it is treated the same as {#link #unsubscribe()}.
* <p>
* Manual topic assignment through this method does not use the consumer's group management
* functionality. As such, there will be no rebalance operation triggered when group membership or cluster and topic
* metadata change. Note that it is not possible to use both manual partition assignment with {#link #assign(Collection)}
* and group assignment with {#link #subscribe(Collection, ConsumerRebalanceListener)}.
* <p>
* If auto-commit is enabled, an async commit (based on the old assignment) will be triggered before the new
* assignment replaces the old one.
*
* #param partitions The list of partitions to assign this consumer
* #throws IllegalArgumentException If partitions is null or contains null or empty topics
* #throws IllegalStateException If {#code subscribe()} is called previously with topics or pattern
* (without a subsequent call to {#link #unsubscribe()})
*/
#Override
public void assign(Collection<TopicPartition> partitions) {
and:
/**
* Subscribe to the given list of topics to get dynamically assigned partitions.
* <b>Topic subscriptions are not incremental. This list will replace the current
* assignment (if there is one).</b> It is not possible to combine topic subscription with group management
* with manual partition assignment through {#link #assign(Collection)}.
*
* If the given list of topics is empty, it is treated the same as {#link #unsubscribe()}.
*
* <p>
* This is a short-hand for {#link #subscribe(Collection, ConsumerRebalanceListener)}, which
* uses a no-op listener. If you need the ability to seek to particular offsets, you should prefer
* {#link #subscribe(Collection, ConsumerRebalanceListener)}, since group rebalances will cause partition offsets
* to be reset. You should also provide your own listener if you are doing your own offset
* management since the listener gives you an opportunity to commit offsets before a rebalance finishes.
*
* #param topics The list of topics to subscribe to
* #throws IllegalArgumentException If topics is null or contains null or empty elements
* #throws IllegalStateException If {#code subscribe()} is called previously with pattern, or assign is called
* previously (without a subsequent call to {#link #unsubscribe()}), or if not
* configured at-least one partition assignment strategy
*/
#Override
public void subscribe(Collection<String> topics) {

KafkaMessageListenerContainer.stop() is not stopping consumption of messages in message listener

UseCase: Given topic with 100 messages in kafka topic, I want to read messaged from offset 10 to offset 20. I could able to fetch from beginning offset. when i reach end offset, I have written code to stop the container.Even after execution of code, Consumer can consume further messages(from offset 21).It only stops after reading all messages in the topic
#Service
public class Consumer1 implements MessageListener<String, GenericRecord> {
#Override
public void onMessage(ConsumerRecord<String, GenericRecord> data) {
log.info("feed record {}", data);
if (data.offset() == 20) {
feedService.stopConsumer();
}
}
}
#Service
public class FeedService{
// start logic here
public void stopConsumer() {
kafkaMessageListenerContainer.stop();
}
}
Note: I am using spring-kafka latest version(2.6.4). One observation is container stop method is being executed but consumer is not getting closed.And no errors on output
The stop() doesn't terminate the current records batch cycle:
while (isRunning()) {
try {
pollAndInvoke();
}
catch (#SuppressWarnings(UNUSED) WakeupException e) {
// Ignore, we're stopping or applying immediate foreign acks
}
That pollAndInvoke() calls a KafkaConsumer.poll(), gets some records collection and invokes your onMessage() on each record. At some point you decide to call the stop, but it doesn't mean that we are really in the end of that records list to exit immediately.
We really stop on the next cycle when that isRunning() returns false for us already.

Which kafka property decides Poll frequency for KafkaConsumer?

I am trying to understand kafka in some details with respect to kafka streams (kafka stream client to kafka).
I understand that KafkConsumer (java client) would get data from kafka, however I am not able to understand at which frequency does client poll kakfa topic to fetch the data?
The frequency of the poll is defined by your code because you're responsible to call poll.
A very naive example of user code using KafkaConsumer is like the following
public class KafkaConsumerExample {
...
static void runConsumer() throws InterruptedException {
final Consumer<Long, String> consumer = createConsumer();
final int giveUp = 100; int noRecordsCount = 0;
while (true) {
final ConsumerRecords<Long, String> consumerRecords =
consumer.poll(1000);
if (consumerRecords.count()==0) {
noRecordsCount++;
if (noRecordsCount > giveUp) break;
else continue;
}
consumerRecords.forEach(record -> {
System.out.printf("Consumer Record:(%d, %s, %d, %d)\n",
record.key(), record.value(),
record.partition(), record.offset());
});
consumer.commitAsync();
}
consumer.close();
System.out.println("DONE");
}
}
In this case the frequency is defined by the duration of processing the messages in consumerRecords.forEach.
However, keep in mind that if you don't call poll "fast enough" your consumer will be considered dead by the broker coordinator and a rebalance will be triggered.
This "fast enough" is determined by the property max.poll.interval.ms in kafka >= 0.10.1.0. See this answer for more details.
max.poll.interval.ms default value is five minutes, so if your consumerRecords.forEach takes longer than that your consumer will be considered dead.
If you don't want to use the raw KafkaConsumer directly you could use alpakka kafka, a library for consume from and produce to kafka topics in a safe and backpressured way (is based on akka streams).
With this library, the frequency of poll is determined by configuration akka.kafka.consumer.poll-interval.
We say is safe because it will continue polling to avoid the consumer is considered dead even when your processing can't keep up the rate. It's able to do this because KafkaConsumer allows pausing the consumer
/**
* Suspend fetching from the requested partitions. Future calls to {#link #poll(Duration)} will not return
* any records from these partitions until they have been resumed using {#link #resume(Collection)}.
* Note that this method does not affect partition subscription. In particular, it does not cause a group
* rebalance when automatic assignment is used.
* #param partitions The partitions which should be paused
* #throws IllegalStateException if any of the provided partitions are not currently assigned to this consumer
*/
#Override
public void pause(Collection<TopicPartition> partitions) { ... }
To fully understand this you should read about akka-streams and backpressure.

Spring Kafka: How to discard messages already retrieved by poll() after doing a seek()?

This is a followup question to - Reading the same message several times from Kafka. If there is a better way to ask this question without posting a new question, let me know. In this post Gary mentions
"But you will still see later messages first if they have already been retrieved so you will have to discard those too."
Is there a clean way to discard messages already read by poll() after calling seek()? I started implementing logic to do this by saving the initial offset (in recordOffset), incrementing it on success. On failure, I call seek() and set the value of recordOffset to record.offset(). Then for every new message I check to see if the record.offset() is greater than recordOffset. If it is, I simply call acknowledge(), thereby "discarding" all the previously read messages. Here is the code -
// in onMessage()...
if (record.offset() > recordOffset){
acknowledgment.acknowledge();
return;
}
try {
processRecord(record);
recordOffset = record.offset()+1;
acknowledgment.acknowledge();
} catch (Exception e) {
recordOffset = record.offset();
consumerSeekCallback.seek(record.topic(), record.partition(), record.offset());
}
The problem with this approach is that it gets complicated with multiple partitions. Is there an easier/cleaner way?
EDIT 1
Based on Gary's suggestion below, I tried adding an errorHandler like this -
#KafkaListener(topicPartitions =
{#org.springframework.kafka.annotation.TopicPartition(topic = "${kafka.consumer.topic}", partitions = { "1" })},
errorHandler = "SeekToCurrentErrorHandler")
Is there something wrong with this syntax as I get "Cannot resolve method 'errorHandler'"?
EDIT 2
After Gary explained the 2 error handlers, I removed the above errorHandler and added below to the config file -
#Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(kafkaProps()));
factory.getContainerProperties().setAckOnError(false);
factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
return factory;
}
When I start the application, I get this error now...
java.lang.NoSuchMethodError: org.springframework.util.Assert.state(ZLjava/util/function/Supplier;)V
at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.determineInferredType(MessagingMessageListenerAdapter.java:396)
Here is line 396 -
Assert.state(!this.isConsumerRecordList || validParametersForBatch,
() -> String.format(stateMessage, "ConsumerRecord"));
Assert.state(!this.isMessageList || validParametersForBatch,
() -> String.format(stateMessage, "Message<?>"));
Starting with version 2.0.1, if the container's ErrorHandler is a RemainingRecordsErrorHandler, such as the SeekToCurrentErrorHandler, the remaining records (including the failed one) are sent to the error handler instead of the listener.
This allows the SeekToCurrentErrorHandler to reposition every partition so the next poll will return the unprocessed record(s).
/**
* An error handler that seeks to the current offset for each topic in the remaining
* records. Used to rewind partitions after a message failure so that it can be
* replayed.
*
* #author Gary Russell
* #since 2.0.1
*
*/
public class SeekToCurrentErrorHandler implements RemainingRecordsErrorHandler
EDIT
There are two types of error handler. The KafkaListenerErrorHandler (specified in the annotation) works at the listener level; it is wired into the listener adapter that invokes the #KafkaListener annotation and thus only has access to the current record.
The second error handler (configured on the listener container) works at the container level and thus has access to the remaining records. The SeekToCurrentErrorHandler is a container-level error handler.
It is configured on the container properties in the container factory...
#Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
factory.setConsumerFactory(this.consumerFactory);
factory.getContainerProperties().setAckOnError(false);
factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
factory.getContainerProperties().setAckMode(AckMode.RECORD);
return factory;
}
You go right way and yes you have to deal with different partitions as well. There is a FilteringMessageListenerAdapter, but you still have to write the logic.