In Spring Kafka, which method does the same thing as onPartitionsRevoked? - apache-kafka

I know that Spring Kafka has the methods below:
void registerSeekCallback(ConsumerSeekCallback callback);
void onPartitionsAssigned(Map assignments, ConsumerSeekCallback callback);
void onIdleContainer(Map assignments, ConsumerSeekCallback callback);
But which one does the same thing as the native ConsumerRebalanceListener method onPartitionsRevoked?
"This method will be called before a rebalance operation starts and
after the consumer stops fetching data. It is recommended that offsets
should be committed in this callback to either Kafka or a custom
offset store to prevent duplicate data."
If I want to implement the ConsumerRebalanceListener, how can I pass the KafkaConsumer reference? I only see the Consumer from Spring-Kafka.
=========update======
Hi Gary, when I add this RebalanceListener to the ContainerProperties, I can see that both methods get triggered. However, I get an exception saying something like "Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing". Do you have any idea?
=========== update 2 ============
public ConcurrentMessageListenerContainer<Integer, String> createContainer(
ContainerProperties containerProps, IKafkaConsumer iKafkaConsumer) {
Map<String, Object> props = consumerProps();
DefaultKafkaConsumerFactory<Integer, String> cf = new DefaultKafkaConsumerFactory<Integer, String>(props);
RebalanceListner rebalanceListner = new RebalanceListner(cf.createConsumer());
CustomKafkaMessageListener ckml = new CustomKafkaMessageListener(iKafkaConsumer, rebalanceListner);
CustomRecordFilter cff = new CustomRecordFilter();
FilteringAcknowledgingMessageListenerAdapter faml = new FilteringAcknowledgingMessageListenerAdapter(ckml, cff, true);
SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
retryPolicy.setMaxAttempts(5);
FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
backOffPolicy.setBackOffPeriod(1500); // 1.5 seconds
RetryTemplate rt = new RetryTemplate();
rt.setBackOffPolicy(backOffPolicy);
rt.setRetryPolicy(retryPolicy);
rt.registerListener(ckml);
RetryingAcknowledgingMessageListenerAdapter rml = new RetryingAcknowledgingMessageListenerAdapter(faml, rt);
containerProps.setConsumerRebalanceListener(rebalanceListner);
containerProps.setMessageListener(rml);
containerProps.setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE);
containerProps.setErrorHandler(ckml);
containerProps.setAckOnError(false);
ConcurrentMessageListenerContainer<Integer, String> container = new ConcurrentMessageListenerContainer<>(
cf, containerProps);
container.setConcurrency(1);
return container;
}

You can add a ConsumerRebalanceListener to the container's ContainerProperties, which are passed into the container's constructor.

Related

Why is KafkaItemReader always including the last offset record of the previous job run in the new job execution?

I am using the Spring Batch KafkaItemReader in a job that runs on a fixed delay of 10 seconds. Once the job, with a chunk size of 1000, completes, the Spring scheduler re-submits the same job after a delay of 10 seconds. I observe that KafkaItemReader always includes the last offset record in subsequent job executions. Suppose the first job execution processes records at offsets 1-1000; in the next execution I expect KafkaItemReader to pick up records from offset 1001. Instead, it picks up from offset 1000, which has already been processed.
Adding code blocks
//Job is getting submitted with scheduled task scheduler with below parameters
<task:scheduled-tasks>
<task:scheduled ref="runScheduler" method="run" fixed-delay="5000"/>
</task:scheduled-tasks>
//Job Parameters for each submission
String dateParam = new Date().toString();
JobParameters param =
new JobParametersBuilder().addString("date", dateParam).toJobParameters();
//Below is the kafkaItemReader configuration
@Bean
public KafkaItemReader<String, String> kafkaItemReader() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "");
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "");
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "");
props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "");
props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "");
props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class);
Map<TopicPartition,Long> partitionOffset = new HashMap<>();
return new KafkaItemReaderBuilder<String, String>()
.partitions(0)
.consumerProperties(props)
.name("customers-reader")
.saveState(true)
.pollTimeout(Duration.ofSeconds(10))
.topic("")
.partitionOffsets(partitionOffset)
.build();
}
@Bean
public Step kafkaStep(StepBuilderFactory stepBuilderFactory,ItemWriter testItemWriter,KafkaItemReader kafkaItemReader) throws Exception {
return stepBuilderFactory.get("kafkaStep")
.chunk(10)
.reader(kafkaItemReader)
.writer(testItemWriter)
.build();
}
@Bean
public Job kafkaJob(Step kafkaStep,JobBuilderFactory jobBuilderFactory) throws Exception {
return jobBuilderFactory.get("kafkaJob").incrementer(new RunIdIncrementer())
.start(kafkaStep)
.build();
}
Am I missing some config that is causing this behaviour? I don't see this behaviour if I stop and re-run the application; it picks up the offset properly in that case.
You are running a new job instance on each schedule (by using a different date as an identifying job parameter), but your reader is a singleton bean. This means it is reused for each run without being reinitialized with the correct offset. You can make it step-scoped to get a new instance of the reader for each run:
@Bean
@StepScope
public KafkaItemReader<String, String> kafkaItemReader() {
...
}
This will give you the same behaviour as if you restart the application, which you said fixes the issue.
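Why reusing one reader instance can replay a record can be illustrated with a toy model. The sketch below is not Spring Batch code: `ToyReader`, its field, and the rule that a reused reader resumes at the offset of the last record it returned are assumptions made purely for illustration, and the real KafkaItemReader internals may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical stateful reader, not the Spring Batch KafkaItemReader.
class ToyReader {
    // Offset of the last record this instance returned; -1 means "nothing read yet".
    private long lastReadOffset = -1;

    // Reads `count` offsets. A fresh instance starts at `committedNext`; a reused
    // instance (assumption for illustration) resumes at its remembered offset.
    List<Long> run(long committedNext, int count) {
        long start = (lastReadOffset >= 0) ? lastReadOffset : committedNext;
        List<Long> read = new ArrayList<>();
        for (long offset = start; offset < start + count; offset++) {
            read.add(offset);
            lastReadOffset = offset;
        }
        return read;
    }
}

class ReaderScopeDemo {
    public static void main(String[] args) {
        // Singleton-style reuse: the second run starts at 3, which was already read.
        ToyReader shared = new ToyReader();
        System.out.println(shared.run(1, 3));
        System.out.println(shared.run(4, 3));

        // Step-scoped style: a new instance per run starts at the committed position.
        System.out.println(new ToyReader().run(1, 3));
        System.out.println(new ToyReader().run(4, 3));
    }
}
```

In this model the shared instance repeats the last offset of its previous run, while the per-run instances do not overlap, which mirrors the symptom described in the question.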

Kafka listener, get all messages

Good day, colleagues.
I have a project that uses Spring Kafka to listen to a particular topic.
Once a day I need to read all the messages, put them into a collection, and find a specific message there.
I can't work out how to read all the messages in one @KafkaListener method.
My class is:
@Component
public class KafkaIntervalListener {
public CountDownLatch intervalLatch = new CountDownLatch(1);
private final SCDFRunnerService scdfRunnerService;
public KafkaIntervalListener(SCDFRunnerService scdfRunnerService) {
this.scdfRunnerService = scdfRunnerService;
}
@KafkaListener(topics = "${kafka.interval-topic}", containerFactory = "intervalEventKafkaListenerContainerFactory")
public void intervalListener(IntervalEvent event) throws UnsupportedEncodingException, JSONException {
System.out.println("Recieved interval message: " + event);
IntervalType type = event.getType();
Instant instant = event.getInterval();
List<IntervalEvent> events = new ArrayList<>();
events.add(event);
events.size();
this.intervalLatch.countDown();
}
}
My events collection always has size 1.
I tried using different loops, but then my collection was filled with the same message 530 000 000 times.
UPDATE:
I have found a way to do it with factory.setBatchListener(true); but I need to launch it with @Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow"). Right now this method is always listening. I am now trying something like this:
@Scheduled(cron = "${kafka.cron}", zone = "Europe/Moscow")
public void run() throws Exception {
kafkaIntervalListener.intervalLatch.await();
}
It doesn't work; in debug mode my breakpoint here is never hit.
The listener container is, by design, message-driven.
For fetching messages on-demand, it's better to use the Kafka Consumer API directly and fetch messages using the poll() method.
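The on-demand approach usually comes down to a loop that polls until a poll comes back empty and collects everything seen along the way. The sketch below shows only that loop shape using JDK types: the `BlockingQueue` is a stand-in for the topic, and `OnDemandDrain` and `drain` are hypothetical names. With a real Kafka consumer the loop would call poll() with a timeout instead, and an empty result would only be a heuristic for "caught up", not a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class OnDemandDrain {
    // Polls the source until one poll times out with nothing, then returns everything collected.
    static <T> List<T> drain(BlockingQueue<T> source, long timeoutMillis) {
        List<T> collected = new ArrayList<>();
        try {
            while (true) {
                T next = source.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (next == null) {
                    return collected; // nothing arrived within the timeout: treat as drained
                }
                collected.add(next);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and return what we have
            return collected;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // stand-in for the topic
        topic.add("interval-1");
        topic.add("interval-2");
        topic.add("interval-3");

        // Called from a scheduled method, this would gather the whole backlog in one go.
        List<String> events = drain(topic, 100);
        System.out.println(events.size() + " events collected");
    }
}
```

Because the loop runs only when it is called, it can be invoked from a scheduled method, unlike a listener container that is always listening.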

How to check if Kafka Consumer is ready

I have the Kafka offset reset policy set to latest and I am missing the first few messages. If I sleep for 20 seconds before starting to send messages to the input topic, everything works as desired. I am not sure whether the problem is the consumer taking a long time to rebalance partitions. Is there a way to know that the consumer is ready before starting to poll?
You can use consumer.assignment(): it returns the set of assigned partitions, so you can verify whether all of the partitions available for that topic have been assigned.
If you are using the spring-kafka project, you can include the spring-kafka-test dependency and use the method below to wait for topic assignment, but you need to have a container.
ContainerTestUtils.waitForAssignment(Object container, int partitions);
You can do the following.
I have a test that reads data from a Kafka topic.
You can't use a KafkaConsumer from multiple threads, but you can pass in an AtomicReference parameter ("assignment"), update it in the consumer thread, and read it from another thread.
For example, here is a snippet of working code from a project's tests:
private void readAvro(String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
String bootstrapServers,
int readTimeout) {
// print the topic name
AtomicReference<Set<TopicPartition>> assignment = new AtomicReference<>();
new Thread(() -> readAvro(bootstrapServers, readFromKafka, needStop, events, readTimeout, assignment)).start();
long startTime = System.currentTimeMillis();
long maxWaitingTime = 30_000;
for (long time = System.currentTimeMillis(); System.currentTimeMillis() - time < maxWaitingTime;) {
Set<TopicPartition> assignments = Optional.ofNullable(assignment.get()).orElse(new HashSet<>());
System.out.println("[!kafka-consumer!] Assignments [" + assignments.size() + "]: "
+ assignments.stream().map(v -> String.valueOf(v.partition())).collect(Collectors.joining(",")));
if (assignments.size() > 0) {
break;
}
try {
Thread.sleep(1_000);
} catch (InterruptedException e) {
e.printStackTrace();
needStop.set(true);
break;
}
}
System.out.println("Subscribed! Wait summary: " + (System.currentTimeMillis() - startTime));
}
private void readAvro(String bootstrapServers,
String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
int readTimeout,
AtomicReference<Set<TopicPartition>> assignment) {
KafkaConsumer<String, byte[]> consumer = (KafkaConsumer<String, byte[]>) queueKafkaConsumer(bootstrapServers, "latest");
System.out.println("Subscribed to topic: " + readFromKafka);
consumer.subscribe(Collections.singletonList(readFromKafka));
long started = System.currentTimeMillis();
while (!needStop.get()) {
assignment.set(consumer.assignment());
ConsumerRecords<String, byte[]> records = consumer.poll(1_000);
events.addAll(CommonUtils4Tst.readEvents(records));
if (readTimeout == -1) {
if (events.size() > 0) {
break;
}
} else if (System.currentTimeMillis() - started > readTimeout) {
break;
}
}
needStop.set(true);
synchronized (MainTest.class) {
MainTest.class.notifyAll();
}
consumer.close();
}
P.S.
needStop - global flag to stop all running threads, if any, on failure or success
events - list of objects that I want to check
readTimeout - how long to wait to read all the data; if readTimeout == -1, stop as soon as anything has been read
Thanks to Alexey (I have also voted up), I seem to have resolved my issue by following essentially the same idea.
Just want to share my experience... in our case we are using Kafka in a request & response way, somewhat like RPC. A request is sent on one topic and we then wait for the response on another topic. We ran into a similar issue, i.e. missing the first response.
I tried calling KafkaConsumer.assignment(); repeatedly (with Thread.sleep(100);) but it didn't seem to help. Adding a KafkaConsumer.poll(50); seems to have primed the consumer (group), and the first response is now received too. I tested a few times and it works consistently now.
BTW, testing requires stopping the application and deleting the Kafka topics; for good measure, I restarted Kafka too.
PS: Just calling poll(50); without the assignment(); fetching logic that Alexey mentioned may not guarantee that the consumer (group) is ready.
You can modify an AlwaysSeekToEndListener (listens only to new messages) to include a callback:
public class AlwaysSeekToEndListener<K, V> implements ConsumerRebalanceListener {
private final Consumer<K, V> consumer;
private Runnable callback;
public AlwaysSeekToEndListener(Consumer<K, V> consumer) {
this.consumer = consumer;
}
public AlwaysSeekToEndListener(Consumer<K, V> consumer, Runnable callback) {
this.consumer = consumer;
this.callback = callback;
}
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
}
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
consumer.seekToEnd(partitions);
if (callback != null) {
callback.run();
}
}
}
and subscribe with a latch callback:
CountDownLatch initLatch = new CountDownLatch(1);
consumer.subscribe(singletonList(topic), new AlwaysSeekToEndListener<>(consumer, () -> initLatch.countDown()));
initLatch.await(); // blocks until the consumer is ready and listening (the callback fires from within poll(), so poll() must be running on another thread)
then proceed to start your producer.
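The latch handshake itself can be tried without a broker. In the sketch below a plain thread stands in for the polling consumer and counts the latch down at the point where onPartitionsAssigned would fire; `ReadinessLatchDemo`, `startFakeConsumer`, `awaitReady`, and the 50 ms delay are all made up for illustration. It also waits with a timeout, so a consumer that never receives an assignment cannot block the caller forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ReadinessLatchDemo {
    // Starts a background thread that signals readiness after a simulated rebalance delay.
    static Thread startFakeConsumer(CountDownLatch ready, long rebalanceMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(rebalanceMillis); // stand-in for the time spent joining the group
                ready.countDown();             // where onPartitionsAssigned would run the callback
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Returns true if readiness was signalled within the timeout, false otherwise.
    static boolean awaitReady(CountDownLatch ready, long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch ready = new CountDownLatch(1);
        startFakeConsumer(ready, 50);
        if (awaitReady(ready, 5_000)) {
            System.out.println("consumer ready, safe to start producing");
        } else {
            System.out.println("timed out waiting for assignment");
        }
    }
}
```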
If your policy is set to latest (which takes effect only when there are no previously committed offsets) and you have no previously committed offsets, then you should not worry about 'missing' messages, because you are telling Kafka not to care about messages that were sent before your consumers were ready.
If you care about 'previous' messages, you should set the policy to earliest.
In any case, whatever the policy, the behaviour you are seeing is transient: once committed offsets are saved in Kafka, the consumers will pick up where they left off on every restart.
I needed to know whether a Kafka consumer was ready before doing some testing, so I tried consumer.assignment(). It only returned the set of assigned partitions, though, and from that I could not tell whether the partitions assigned to the group had offsets set, so when I later tried to use the consumer its offsets were not set correctly.
The solution was to use committed(), which gives you the last committed offsets for the partitions you pass as arguments.
So you can do something like: consumer.committed(consumer.assignment())
If no partitions are assigned yet, it returns:
{}
If partitions are assigned but have no offsets yet:
{name.of.topic-0=null, name.of.topic-1=null}
But if partitions are assigned and have offsets:
{name.of.topic-0=OffsetAndMetadata{offset=5197881, leaderEpoch=null, metadata=''}, name.of.topic-1=OffsetAndMetadata{offset=5198832, leaderEpoch=null, metadata=''}}
With this information you can use something like:
consumer.committed(consumer.assignment()).isEmpty();
consumer.committed(consumer.assignment()).containsValue(null);
With these checks you can tell whether the Kafka consumer is ready.
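The two checks can be folded into one predicate. The sketch below uses a plain `Map<String, Long>` as a stand-in for the map returned by committed() (partition name to committed offset, with null meaning no offset yet); `CommittedCheck` and `isReady` are hypothetical names, not part of the Kafka client.

```java
import java.util.HashMap;
import java.util.Map;

class CommittedCheck {
    // Ready means: at least one partition is assigned and every one has a committed offset.
    static boolean isReady(Map<String, Long> committed) {
        return !committed.isEmpty() && !committed.containsValue(null);
    }

    public static void main(String[] args) {
        Map<String, Long> committed = new HashMap<>();
        System.out.println(isReady(committed)); // nothing assigned yet

        committed.put("name.of.topic-0", null);
        System.out.println(isReady(committed)); // assigned, but no offset yet

        committed.put("name.of.topic-0", 5197881L);
        System.out.println(isReady(committed)); // assigned with an offset
    }
}
```

A group that has never committed will keep reporting null offsets, so this check probably suits consumers whose group has committed at least once before.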

Kafka consumer does not start from latest message

I want to have a Kafka Consumer which starts from the latest message in a topic.
Here is the java code:
private static Properties properties = new Properties();
private static KafkaConsumer<String, String> consumer;
static
{
properties.setProperty("bootstrap.servers","localhost");
properties.setProperty("enable.auto.commit", "true");
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("group.id", "test");
properties.setProperty("auto.offset.reset", "latest");
consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singletonList("mytopic"));
}
@Override
public StreamHandler call() throws Exception
{
while (true)
{
ConsumerRecords<String, String> consumerRecords = consumer.poll(200);
Iterable<ConsumerRecord<String, String>> records = consumerRecords.records("mytopic");
for(ConsumerRecord<String, String> rec : records)
{
System.out.println(rec.value());
}
}
}
Although the value of auto.offset.reset is latest, the consumer starts from messages that are 2 days old and then catches up with the latest messages.
What am I missing?
Have you run this same code before with the same group.id? The auto.offset.reset parameter is only used if there is not an existing offset already stored for your consumer. So if you've run the example previously, say two days ago, and then you run it again, it will start from the last consumed position.
Use seekToEnd() if you would like to manually go to the end of the topic.
See https://stackoverflow.com/a/32392174/1392894 for a slightly more thorough discussion of this.
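The rule described above, that auto.offset.reset only matters when the group has no stored offset, can be written down as a small decision function. This is a simplified model rather than Kafka client code: `OffsetResetRule` and `startPosition` are hypothetical names, and the sketch ignores the case where a committed offset is out of range.

```java
class OffsetResetRule {
    // committed: the group's committed offset for the partition, or null if none exists.
    // Returns the offset the consumer would start reading from in this simplified model.
    static long startPosition(Long committed, String autoOffsetReset, long beginning, long end) {
        if (committed != null) {
            return committed; // a committed offset always wins; the reset policy is ignored
        }
        switch (autoOffsetReset) {
            case "earliest":
                return beginning;
            case "latest":
                return end;
            default:
                // models the "none" policy, under which the real client raises an error
                throw new IllegalStateException("no committed offset and no reset policy");
        }
    }

    public static void main(String[] args) {
        // Group "test" ran two days ago and committed offset 42: "latest" has no effect.
        System.out.println(startPosition(42L, "latest", 0L, 100L));
        // A brand-new group has no committed offset, so "latest" jumps to the end.
        System.out.println(startPosition(null, "latest", 0L, 100L));
    }
}
```

In the question's scenario the first case applies: the group id "test" already has stored offsets, so the consumer resumes from them regardless of the policy.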
If you want to manually control the position of your offsets you need to set enable.auto.commit = false.
If you want to position all offsets to the end of each partition then call seekToEnd()
https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToEnd(java.util.Collection)

Kafka Consumer not getting invoked when the kafka Producer is set to Sync

I have a requirement to maintain 2 topics: one with a synchronous approach and the other with an asynchronous one.
The asynchronous one works as expected and the consumer is invoked for each record; with the synchronous approach, however, the consumer code is not invoked.
Below is the code declared in the config file
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
I have enabled autoFlush (true) here:
@Bean(name = "KafkaPayloadSyncTemplate")
public KafkaTemplate<String, KafkaPayload> KafkaPayloadSyncTemplate() {
return new KafkaTemplate<String,KafkaPayload>(producerFactory(),true);
}
Control stops there, with no calls made to the consumer after the recordMetadataResults object is returned:
private List<RecordMetadata> sendPayloadToKafkaTopicInSync() throws InterruptedException, ExecutionException {
final List<RecordMetadata> recordMetadataResults = new ArrayList<RecordMetadata>();
KafkaPayload kafkaPayload = constructKafkaPayload();
ListenableFuture<SendResult<String,KafkaPayload>>
future = KafkaPayloadSyncTemplate.send(TestTopic, kafkaPayload);
SendResult<String, KafkaPayload> results;
results = future.get();
recordMetadataResults.add(results.getRecordMetadata());
return recordMetadataResults;
}
Consumer Code
public class KafkaTestListener {
@Autowired
TestServiceImpl TestServiceImpl;
public final CountDownLatch countDownLatch = new CountDownLatch(1);
@KafkaListener(id="POC", topics = "TestTopic", group = "TestGroup")
public void listen(ConsumerRecord<String,KafkaPayload> record, Acknowledgment acknowledgment) {
countDownLatch.countDown();
TestServiceImpl.consumeKafkaMessage(record);
System.out.println("Acknowledgment : " + acknowledgment);
acknowledgment.acknowledge();
}
}
Based on this issue, I have 2 questions:
Should we manually call listen() inside the listener class when it's a sync producer? If yes, how do we do that?
If the listener (@KafkaListener) gets called automatically, what other setup/configuration do I need to add to make this work?
Thanks for the inputs in advance
-Srikant
Make sure that you use consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); in the consumer properties.
I'm not sure what you mean by sync/async, but producing and consuming are completely separate operations. You can't affect the consumer from the producer side, because the Kafka broker sits in between.