We're trying to read data from Kafka only during a specified time window (so we have a Kafka consumer), which means avoiding reads at other times. However, we're not sure how to shut down the consumer once the time period has expired. Are there any examples of how to do that? Many thanks in advance for your help.
You can set autoStartup to false and then start and stop the Kafka listener containers manually with the start() and stop() methods of KafkaListenerEndpointRegistry.
@KafkaListener Lifecycle Management
public class KafkaConsumer {
    @Autowired
    private KafkaListenerEndpointRegistry registry;
    @KafkaListener(id = "myContainer", topics = "myTopic", autoStartup = "false")
    public void listen(...) { ... }
    @Scheduled(cron = "")
    public void scheduledMethod() {
        // call start() when the window opens and stop() when it closes
        // (normally from two separately scheduled methods)
        registry.start();
        registry.stop();
    }
}
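To make the start/stop idea more concrete, here is a minimal sketch of the window logic on its own, using only the JDK. `Switchable`, `ConsumptionWindow` and `InMemorySwitch` are hypothetical names introduced for illustration; in a Spring application the `Switchable` role would presumably be played by the container returned from `registry.getListenerContainer("myContainer")`, which exposes `start()`, `stop()` and `isRunning()`.

```java
import java.time.LocalTime;

/** Hypothetical stand-in for the start/stop/isRunning lifecycle of a listener container. */
interface Switchable {
    void start();
    void stop();
    boolean isRunning();
}

/** Keeps a target running only while the current time is inside [open, close). */
final class ConsumptionWindow {
    private final LocalTime open;
    private final LocalTime close;

    ConsumptionWindow(LocalTime open, LocalTime close) {
        this.open = open;
        this.close = close;
    }

    /** True when 'now' is inside the window; also handles windows that cross midnight. */
    boolean contains(LocalTime now) {
        if (open.isBefore(close)) {
            return !now.isBefore(open) && now.isBefore(close);
        }
        return !now.isBefore(open) || now.isBefore(close);
    }

    /** Meant to be called periodically, for example from a scheduled method. */
    void apply(LocalTime now, Switchable target) {
        boolean shouldRun = contains(now);
        if (shouldRun && !target.isRunning()) {
            target.start();
        } else if (!shouldRun && target.isRunning()) {
            target.stop();
        }
    }

    public static void main(String[] args) {
        ConsumptionWindow window = new ConsumptionWindow(LocalTime.of(9, 0), LocalTime.of(17, 0));
        InMemorySwitch container = new InMemorySwitch();
        window.apply(LocalTime.of(10, 30), container);
        System.out.println("running at 10:30: " + container.isRunning());
        window.apply(LocalTime.of(18, 0), container);
        System.out.println("running at 18:00: " + container.isRunning());
    }
}

/** Trivial in-memory implementation, used here only to exercise the sketch. */
final class InMemorySwitch implements Switchable {
    private boolean running;
    public void start() { running = true; }
    public void stop() { running = false; }
    public boolean isRunning() { return running; }
}
```

Calling `apply` every minute or so keeps the decision idempotent, so a missed tick or an application restart in the middle of the window does not leave the consumer in the wrong state.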
However, the approach above does not guarantee that all messages in Kafka will be consumed within that time frame; that depends on the load and the processing speed.
I had the same use case, but I wrote a scheduler that sets the maximum number of records per poll (max.poll.records) for one batch and keeps a counter. When the counter matches that maximum, I consider the batch finished, because every record returned by that single poll has been processed.
I then unsubscribe from the topic and close the consumer. The next time the scheduler runs, it again processes up to the configured maximum number of records.
fixedDelayString ensures that the next run starts only after the specified delay has elapsed since the previous run finished.
@EnableScheduling
public class MessageScheduler {
    @Scheduled(initialDelayString = "${fixedInitialDelay.in.milliseconds}", fixedDelayString = "${fixedDelay.in.milliseconds}")
    public void run() {
        /* write your Kafka consumer here, with manual commit */
        /* once the batch has been processed, unsubscribe and close the consumer */
        kafkaConsumer.unsubscribe();
        kafkaConsumer.close();
    }
}
Related
I'm using @KafkaListener and ConcurrentKafkaListenerContainerFactory to listen to 3 Kafka topics, and each topic has 10 partitions. I have a few questions about how this works.
ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(30);
    factory.getContainerProperties().setSyncCommits(true);
    return factory;
}
@KafkaListener(topics = "topic1", containerFactory = "kafkaListenerContainerFactory")
public void handleMessage(final ConsumerRecord<Object, String> arg0) throws Exception {
}
@KafkaListener(topics = "topic2", containerFactory = "kafkaListenerContainerFactory")
public void handleMessage(final ConsumerRecord<Object, String> arg0) throws Exception {
}
@KafkaListener(topics = "topic3", containerFactory = "kafkaListenerContainerFactory")
public void handleMessage(final ConsumerRecord<Object, String> arg0) throws Exception {
}
my listener.ackmode is return and enable.auto.commit is set to false and partition.assignment.strategy: org.apache.kafka.clients.consumer.RoundRobinAssignor
1) My understanding of concurrency is that, since I set the concurrency (at factory level) to 30 and I have a total of 30 partitions (across all three topics) to read from, each thread will be assigned one partition. Is my understanding correct? What is the effect if I override the concurrency again inside the @KafkaListener annotation?
2) When Spring calls the poll() method, does it poll from all three topics?
3) Since listener.ackmode is set to return, will it wait until all of the records returned by a single poll() have completed before issuing the next poll()? Also, what happens if my records take longer than max.poll.interval.ms to process? Let's say offsets 1-100 are returned by a single poll() call and my code can only process 50 before max.poll.interval.ms is hit: will Spring issue another poll at that point because max.poll.interval.ms has already been reached? If so, will the next poll() return records from offset 51?
I really appreciate your time and help.
my listener.ackmode is return
There is no such ackmode; since you don't set it on the factory, your actual ack mode is BATCH (the default). To use ack mode record (if that's what you mean), you must so configure the factory container properties.
my understanding about concurrency is ...
Your understanding is incorrect; concurrency can not be greater than the number of partitions in the topic with the most partitions (if a listener listens to multiple topics). Since you only have 10 partitions in each topic, your actual concurrency is 10.
Overriding the concurrency on a listener simply overrides the factory setting; you always need at least as many partitions as concurrency.
When spring call the poll() method, does it poll from all three topics?
Not with that configuration; you have 3 concurrent containers, each with 30 consumers listening to one topic. You have 90 consumers.
If you have a single listener for all 3 topics, the poll will return records from all 3; but you still may have 20 idle consumers, depending on how the partition assignor allocates the partitions - see the logs "partitions assigned" for exactly how the partitions are allocated. The round robin assignor should distribute them ok.
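To see where the idle consumers come from, the arithmetic can be sketched with a deliberately simplified model of the two assignment styles. This is not the code of Kafka's assignors (the real ones also sort members and handle differing subscriptions); `AssignmentSketch` and its method names are made up for illustration.

```java
import java.util.Arrays;

final class AssignmentSketch {

    /** Round-robin style: all partitions of all topics are dealt out one by one across the consumers. */
    static int[] roundRobin(int consumers, int topics, int partitionsPerTopic) {
        int[] owned = new int[consumers];
        int next = 0;
        for (int t = 0; t < topics; t++) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                owned[next % consumers]++;
                next++;
            }
        }
        return owned;
    }

    /** Range style: each topic is split independently, always starting from the first consumer. */
    static int[] range(int consumers, int topics, int partitionsPerTopic) {
        int[] owned = new int[consumers];
        int base = partitionsPerTopic / consumers;
        int extra = partitionsPerTopic % consumers;
        for (int t = 0; t < topics; t++) {
            for (int c = 0; c < consumers; c++) {
                owned[c] += base + (c < extra ? 1 : 0);
            }
        }
        return owned;
    }

    /** Number of consumers that ended up with no partition at all. */
    static long idle(int[] owned) {
        return Arrays.stream(owned).filter(n -> n == 0).count();
    }

    public static void main(String[] args) {
        // One listener per topic, concurrency 30, 10 partitions: 20 idle consumers per container.
        System.out.println(idle(roundRobin(30, 1, 10)));
        // One listener for all 3 topics, round-robin style: every consumer gets one partition.
        System.out.println(idle(roundRobin(30, 3, 10)));
        // One listener for all 3 topics, range style: 10 consumers get 3 partitions each, 20 are idle.
        System.out.println(idle(range(30, 3, 10)));
    }
}
```

Under this model the three-listener configuration from the question leaves 20 of the 30 consumers in each container idle, which is 60 of the 90 in total. The "partitions assigned" log lines remain the authoritative source for the actual allocation.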
will spring issue another poll at this time
Spring has no control - if you are taking too long, the Consumer thread is in the listener - the Consumer is not thread-safe so we can't issue an asynchronous poll.
You must process max.poll.records records within max.poll.interval.ms to prevent Kafka from rebalancing the partitions.
The ack mode makes no difference; it's all about processing the results of the poll in a timely fashion.
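As a rough worked example of that constraint: with the Kafka client defaults of max.poll.interval.ms = 300000 and max.poll.records = 500, the listener has on average 600 ms per record. The helper below is only back-of-the-envelope arithmetic; `PollBudget` and the 5-second processing cost are assumptions for illustration.

```java
final class PollBudget {

    /** Average time available per record before max.poll.interval.ms elapses. */
    static long perRecordMillis(long maxPollIntervalMs, int maxPollRecords) {
        return maxPollIntervalMs / maxPollRecords;
    }

    /** Largest max.poll.records that still fits the interval for a given per-record cost. */
    static long maxRecordsFor(long maxPollIntervalMs, long perRecordCostMs) {
        return maxPollIntervalMs / perRecordCostMs;
    }

    public static void main(String[] args) {
        // Client defaults: 300000 ms / 500 records
        System.out.println(perRecordMillis(300_000L, 500));   // 600
        // If one record takes about 5 s, at most 60 records fit into one poll interval
        System.out.println(maxRecordsFor(300_000L, 5_000L));  // 60
    }
}
```

If the measured per-record cost exceeds the budget, the usual levers are lowering max.poll.records or raising max.poll.interval.ms.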
I need to pause the Kafka consumer so that it stops consuming messages from the topic until a message reaches its waiting time. For this, I used the pause/resume methods in Kafka. But when I resume, the first message, the one that was consumed before pausing, is not received again, even though the offset of the topic has not been updated because I acknowledge manually (the lag is one).
@StreamListener(ChannelName.MESSAGE_INPUT_RETRY_CHANNEL)
public void onMessageRetryReceive(org.springframework.messaging.Message<Message> message, @Header(KafkaHeaders.CONSUMER) KafkaConsumer<?, ?> consumer) {
    long waitTime = //Calculate the wait time of the message
    Acknowledgment acknowledgment = message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
    if (waitTime > 0) {
        consumer.pause(Collections.singleton(new TopicPartition("message-retry-topic", 0)));
    } else {
        messageProducer.sendMessage(message.getPayload());
        acknowledgment.acknowledge();
    }
}
@Bean
public ApplicationListener<ListenerContainerIdleEvent> idleListener() {
    return event -> {
        boolean isReady = //Logic to check if ready to resume
        if (isReady) {
            event.getConsumer().resume(event.getConsumer().paused());
        }
    };
}
This relates to the question mentioned in KafkaConsumer resume partition cannot continue to receive uncommitted messages. But I'm not sure how the seek methods can help to retrieve the first consumed message. I'm using Spring Cloud Stream. I'd appreciate some suggestions on this.
The fact that you don't call acknowledgment.acknowledge() doesn't mean that your KafkaConsumer instance doesn't keep the last consumed position in memory.
We do need to commit offsets for the benefit of subsequent consumers on the partition. The currently running consumer doesn't need that information to be committed, because it keeps it in its own in-memory state.
To be able to re-consume the same record, you need to perform a seek() operation.
See Docs for more info: https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
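The difference between the in-memory position and the committed offset can be shown with a toy model. `PositionModel` below is purely illustrative and is not Kafka code; in the real listener the equivalent of `seek` would presumably be `consumer.seek(new TopicPartition(...), offset)` called with the offset of the deferred record before pausing.

```java
/** Toy model of one partition as seen by a single running consumer. */
final class PositionModel {
    private long position;   // next offset this consumer will fetch (kept in memory)
    private long committed;  // offset committed for the group (what a new consumer would start from)

    PositionModel(long committedOffset) {
        this.position = committedOffset;
        this.committed = committedOffset;
    }

    /** Hands out the record at the current position and advances the position. */
    long poll() {
        return position++;
    }

    /** Commits everything polled so far (the committed value is the next offset to read). */
    void commit() {
        committed = position;
    }

    /** Rewinds (or forwards) the in-memory position. */
    void seek(long offset) {
        position = offset;
    }

    /** Lag as reported for the group: log end offset minus committed offset. */
    long lag(long logEndOffset) {
        return logEndOffset - committed;
    }

    public static void main(String[] args) {
        PositionModel partition = new PositionModel(5);
        long deferred = partition.poll();          // record 5 delivered, not acknowledged
        System.out.println(partition.lag(6));      // 1: nothing was committed
        System.out.println(partition.poll());      // 6: record 5 is NOT delivered again
        partition.seek(deferred);                  // rewind to the deferred record
        System.out.println(partition.poll());      // 5: delivered again after the seek
    }
}
```

This mirrors the symptom in the question: the lag stays at one because nothing was committed, yet the running consumer has already moved past the record.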
We have one consumer group and three topics, each with a different schema. We created a single consumer and, in a for loop, subscribe to one topic at a time, then poll it, process the data and commit manually.
I am seeing random consumer lag: although the topic has data, my consumer sometimes fetches no records from it and sometimes does. When I work with a single topic instead of looping through three topics it works, but I am unable to reproduce the problem reliably.
I need help to debug the issue and to reproduce it.
Rather than looping three topics in a single method, you could create a skeleton thread like so that consumes from any topic. See examples here
I can't say if this will "fix" the problem, but trying to consume from topics with different schemas in one application is usually not a scalable pattern, but it's not really clear what you're trying to do.
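class ConsumerThread extends Thread {
    KafkaConsumer consumer;
    AtomicBoolean stopped = new AtomicBoolean();
    ConsumerThread(Properties props, String subscribePattern) {
        this.consumer = new KafkaConsumer...
        // subscribe() takes a collection of topic names or a java.util.regex.Pattern
        this.consumer.subscribe(Pattern.compile(subscribePattern));
    }
    @Override
    public void run() {
        while (!this.stopped.get()) {
            ... records = this.consumer.poll(Duration.ofMillis(100));
            for ( ... each record ... ) {
                // Process record
            }
        }
        // close on the same thread that polls
        this.consumer.close();
    }
    // Thread.stop() is final and cannot be overridden, hence a different name
    public void shutdown() {
        this.stopped.set(true);
    }
}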
Not meant to be production-grade
Then run three consumers independently.
new ConsumerThread(props, "t1").start();
new ConsumerThread(props, "t2").start();
new ConsumerThread(props, "t3").start();
Note: KafkaConsumer is not thread-safe.
I'm using a kafka spring consumer that is under group management.
I have the following code in my consumer class
public class ConsumerHandler implements Receiver<String, Message>, ConsumerSeekAware {
    @Value("${topic}")
    protected String topic;
    public ConsumerHandler(){}
    @KafkaListener(topics = "${topic}")
    public Message receive(List<ConsumerRecord<String, Message>> messages, Acknowledgment acknowledgment) {
        for (ConsumerRecord<String, Message> message : messages) {
            Message msg = message.value();
            this.handleMessage(any, message);
        }
        acknowledgment.acknowledge();
        return null;
    }
    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
    }
    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        for (Entry<TopicPartition, Long> pair : assignments.entrySet()) {
            TopicPartition tp = pair.getKey();
            callback.seekToEnd(tp.topic(), tp.partition());
        }
    }
    @Override
    public void onIdleContainer(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {}
}
This code works great while my consumer is running. However, sometimes the amount of messages being processed is too much and the messages stack up. I've implemented concurrency on my consumers and still sometimes there's delay in the messages over time.
So as a workaround, before I figure out why the delay is happening, I'm trying to keep my consumer up to the latest messages.
I'm having to restart my app to get partition assigned invoked so that my consumer seeks to end and starts processing the latest messages.
Is there a way to seek to end without having to bounce my application?
Thanks.
As explained in the JavaDocs and the reference manual, you can save off the ConsumerSeekCallback passed into registerSeekCallback in a ThreadLocal<ConsumerSeekCallback>.
Then, you can perform arbitrary seek operations whenever you want; however, since the consumer is not thread-safe, you must perform the seeks within your @KafkaListener so they run on the consumer thread - hence the need to store the callback in a ThreadLocal.
In version 2.0 and later, you can add the consumer as a parameter to the @KafkaListener method and perform the seeks directly thereon.
public Message receive(List<ConsumerRecord<String, Message>> messages, Acknowledgment acknowledgment,
Consumer<?, ?> consumer) {
The current version is 2.1.6.
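One way to trigger such a seek without bouncing the application is to raise a flag from any thread and act on it at the top of the listener method, so the seek itself still runs on the consumer thread. This is a sketch of that flag pattern only, not something taken from the Spring Kafka documentation; `EndSeeker` and `SeekRequest` are hypothetical names, and `EndSeeker` stands in for whatever seek call is available inside the listener (the stored callback or the `Consumer` parameter).

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the seek operation that is only safe on the consumer thread. */
interface EndSeeker {
    void seekToEnd();
}

final class SeekRequest {
    private final AtomicBoolean requested = new AtomicBoolean(false);

    /** Safe to call from any thread, for example an admin endpoint or a scheduler. */
    void request() {
        requested.set(true);
    }

    /**
     * Call at the top of the listener method, i.e. on the consumer thread.
     * Performs the seek at most once per request; returns true if a seek was performed.
     */
    boolean applyIfRequested(EndSeeker seeker) {
        if (requested.compareAndSet(true, false)) {
            seeker.seekToEnd();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SeekRequest seekRequest = new SeekRequest();
        EndSeeker seeker = () -> System.out.println("seeking to end");
        System.out.println(seekRequest.applyIfRequested(seeker)); // false: nothing requested yet
        seekRequest.request();
        System.out.println(seekRequest.applyIfRequested(seeker)); // prints "seeking to end", then true
    }
}
```

With a concurrent container there is one consumer thread per child container, so each listener thread would need its own flag (or a per-partition variant) for every consumer to skip ahead.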
I have never found or seen a fixed solution for this kind of problem. What I do is push performance as high as possible, based on the number of messages to be processed and the Kafka parameters.
Say you have an online shopping app: you can then estimate an upper bound N on the number of transactions per day. You should make the app work well in the scenario where 1.5*N or 2*N transactions need to be synced to the Kafka cluster. You keep that sizing until the day your shopping app reaches a new level, at which point you will need to upgrade your Kafka system again. Online shopping apps see an especially high number of transactions on promotion or mega-sale days, so those are the days you should size your system for.
I have a Kafka topic with 1 partition. One listener is defined in my spring-boot app using @KafkaListener. The listener uses a ThreadPoolTaskExecutor, which picks up the ConsumerRecord and processes it. However, I can see that the strict ordering that Kafka promises doesn't hold in this scenario, as I can see offsets jumping sometimes (verified using timestamps) when parallel threads start processing. So, my questions:
Why doesn't the ordering hold for parallel threads within the listener?
How can we achieve parallelism and ordering at the same time, so that the parallel thread picks up the next offset and doesn't jump?
EDIT 1
public class DefaultTopicListener {
    @Autowired
    ThreadPoolTaskExecutor executorPool;
    @KafkaListener(topicPartitions = @TopicPartition(topic = "defaultTopic",
            partitions = {"0"}))
    public void onMessage(ConsumerRecord<String, CustomPayload> request) {
        CustomPayload message = request.value();
        try {
            executorPool.execute(new Runnable() {
                @Override
                public void run() {
                    logger.info(
                            "onMessage : executorPool_THREAD_{}-> -> Offset {}.... ",
                            Thread.currentThread().getId(), request.offset());
                }
            });
        } catch (RejectedExecutionException ex) {
            logger.error(
                    "onMessage : executorPool -> Queue Full Request Rejected for offset -> {}",
                    request.offset(), ex);
        }
    }
}
public class Config {
    @Bean("executorPool")
    public ThreadPoolTaskExecutor executorPool() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(3);
        executor.setMaxPoolSize(5);
        executor.setQueueCapacity(5);
        return executor;
    }
}
Kindly advise.
Kafka typically recommends one thread per consumer. If you want to decouple processing from consumption, hand off ConsumerRecords instances to a blocking queue consumed by a pool of processor threads that actually handle the record processing.
https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
However, ordering is not guaranteed in this case: because the threads execute independently, an earlier chunk of data may actually be processed after a later chunk, purely because of thread execution timing.
Ordering and parallelism can both be achieved by having multiple partitions with a single thread responsible for each partition; all the records in a partition are then processed in order by that thread.
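The "one thread per partition" idea can be sketched with plain JDK executors: each partition gets its own single-threaded lane, so work for different partitions runs in parallel while work within a partition stays in submission order. `PartitionedWorkers` is a hypothetical name for this illustration; with Spring Kafka the same effect would normally come from the container's concurrency setting rather than from hand-rolled executors.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PartitionedWorkers {
    private final ExecutorService[] lanes;

    /** One single-threaded executor ("lane") per partition. */
    PartitionedWorkers(int partitions) {
        lanes = new ExecutorService[partitions];
        for (int i = 0; i < partitions; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Tasks for the same partition run one after another, in submission order. */
    void submit(int partition, Runnable task) {
        lanes[partition].execute(task);
    }

    /** Stops accepting work and waits for queued tasks to finish. */
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PartitionedWorkers workers = new PartitionedWorkers(2);
        for (int offset = 0; offset < 3; offset++) {
            final int o = offset;
            workers.submit(0, () -> System.out.println("partition 0, offset " + o));
            workers.submit(1, () -> System.out.println("partition 1, offset " + o));
        }
        workers.shutdownAndWait();
    }
}
```

The interleaving between the two partitions varies from run to run, but within each partition the offsets always appear in ascending order.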
It's not clear what you mean. Thread pools don't "pick" things, they are given tasks to run. You need to show your code.
Speculating...
If your listener is handing off a ConsumerRecord to a thread pool then, of course, record ordering is lost since the records are processed on different threads (unless the pool has a size of 1).
For a single partition, the listener container invokes the listener on a single thread. You must not hand off the work to other threads if you want to retain order.
The only way to achieve concurrency is to use multiple partitions and increase the concurrency on the container. The partitions will be distributed across the container threads.
Or, you need to manage the acknowledgments within your code to make sure no "jumps" are committed.
Ordering is only guaranteed within a partition so, again, you must not hand off to another thread.
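If work is handed off to other threads anyway, the "no jumps are committed" part can be sketched as a tracker that only ever releases the highest contiguous offset. This is an illustrative helper using only the JDK, not a Spring Kafka API; `ContiguousOffsetTracker` is a made-up name. It protects what gets committed, not the order in which records are processed.

```java
import java.util.TreeSet;

final class ContiguousOffsetTracker {
    private final TreeSet<Long> done = new TreeSet<>();
    private long nextToCommit; // lowest offset not yet known to be finished

    ContiguousOffsetTracker(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    /**
     * Marks an offset as fully processed (may be called from worker threads).
     * Returns the offset that is now safe to commit, i.e. the next offset to read;
     * it only moves forward once every lower offset has completed.
     */
    synchronized long complete(long offset) {
        done.add(offset);
        while (!done.isEmpty() && done.first() == nextToCommit) {
            done.pollFirst();
            nextToCommit++;
        }
        return nextToCommit;
    }

    public static void main(String[] args) {
        ContiguousOffsetTracker tracker = new ContiguousOffsetTracker(0);
        System.out.println(tracker.complete(2)); // 0: offsets 0 and 1 are still in flight
        System.out.println(tracker.complete(0)); // 1: offset 1 is still in flight
        System.out.println(tracker.complete(1)); // 3: offsets 0..2 are all done
    }
}
```

The actual commit still has to be issued on the consumer thread, since the consumer is not thread-safe.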