Spring Kafka manual offset commit does not work as expected

In my Kafka consumer application, the configuration is as below. I am using Spring Kafka version 2.8.
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaConfig.getBootStrapServer());
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.IntegerDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put(ConsumerConfig.GROUP_ID_CONFIG, CONSUMER_GROUP_ID);
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, MAX_POLL_RECORDS);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
The Kafka listener container factory is configured as below, and I am using MANUAL_IMMEDIATE acknowledgement.
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Integer, Order_Response>> kafkaListenerContainerFactory() throws SSMUtilFailedException {
ConcurrentKafkaListenerContainerFactory<Integer, Order_Response> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setConcurrency(1);
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
factory.getContainerProperties().setSyncCommits(true);
factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.INFO);
return factory;
}
My Kafka listener looks like this. Here I manually acknowledge all the consumed records.
@KafkaListener(topics = KAFKA_CONSUME_TOPIC)
public void listenForOrderResponses(ConsumerRecord<Integer, Order_Response> consumedRecord, Acknowledgment ack) {
    ack.acknowledge();
}
When I forcefully crash the consumer application (kill the JVM) and start it again, the consumer does not fetch records from the last committed offset. It misses some of the messages, and the offset has increased. I want to consume from the last committed offset. Could you please tell me what is missing here?
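As a side note, here is a minimal debugging sketch (not part of the original question) for checking what the broker has actually recorded as the committed offset for the group, assuming the CONSUMER_GROUP_ID constant and bootstrap-server accessor used in the configuration above; running it before the crash and again after the restart shows whether acknowledge() really committed where you expect.
// Hypothetical debugging snippet using the Kafka AdminClient from kafka-clients.
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaConfig.getBootStrapServer());
try (AdminClient admin = AdminClient.create(adminProps)) {
    Map<TopicPartition, OffsetAndMetadata> committed = admin
            .listConsumerGroupOffsets(CONSUMER_GROUP_ID)
            .partitionsToOffsetAndMetadata()
            .get();
    committed.forEach((tp, oam) -> System.out.println(tp + " -> committed offset " + oam.offset()));
}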

Related

linger.ms not working as expected in Spring Kafka

I have a use case where I need to do batch processing using Kafka. Suppose around 100 requests come in within one minute; instead of publishing each request immediately, I want to batch all 100 requests and publish them to the topic at once.
But with the configuration below, batch processing is not happening: as soon as a message is sent, it is published to the topic and immediately received by the consumer.
ProducerConfig
public class KafkaProducerConfig {
@Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.ACKS_CONFIG, 1);
props.put(ProducerConfig.RETRIES_CONFIG, 0);
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
props.put(ProducerConfig.LINGER_MS_CONFIG, 60000);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 100000);
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, DefaultPartitioner.class);
return props;
}
@Bean
public ProducerFactory<String, String> producerFactory() {
return new DefaultKafkaProducerFactory<>(producerConfigs());
}
@Bean
public KafkaTemplate<String, String> kafkaTemplate() {
return new KafkaTemplate<>(producerFactory());
}
}
ConsumerConfig
public class KafkaConfig {
ConsumerFactory<String, String> kafkaConsumerFactory(Boolean autoCommit) {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, autoCommit);
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 1000);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 20000);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 5);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "batch");
return new DefaultKafkaConsumerFactory<>(props);
}
@Bean("kafkaListenerContainerFactory")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(kafkaConsumerFactory(Boolean.TRUE));
factory.setBatchListener(true);
factory.setConcurrency(1);
return factory;
}
}
Here I have set linger.ms = 60000, and as per my understanding, if linger.ms is set, the producer will wait at least that amount of time before sending, even if a sender thread becomes available earlier and the batch size has not been reached.
But in my case, as soon as a message is sent it is published to the topic immediately, without waiting for 60000 ms or for the batch to reach the configured size.
Producer
@Autowired
private KafkaTemplate<String, String> kafka;
kafka.send("batch-test", message);
Consumer
@KafkaListener(id = "testGroup", topics = {"batch-test"})
public void test(@Payload List<String> messages, @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
@Header(KafkaHeaders.OFFSET) List<Long> offsets) {
for (int i = 0; i < messages.size(); i++) {
System.out.println(messages.get(i) + partitions.get(i) + "-" + offsets.get(i) + "");
}
}
A small update here:
I checked the Spring Kafka logs and found that the ProducerConfig values are not being set properly; they are falling back to default values.
ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = producer-1
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
Why are linger.ms and batch.size falling back to their default values?
Found a solution for this: linger.ms had no effect because the value was not being applied from the code (I still don't know why), so it stayed at its default of 0, which is why batching was not happening.
Once I set the values for both linger.ms and batch.size in application.properties, it worked.
spring.kafka.producer.batch-size=1000000
spring.kafka.producer.properties.linger.ms=10000
As far as I understand, BATCH_SIZE_CONFIG (batch.size) only works together with LINGER_MS_CONFIG (linger.ms) when multiple records are headed for the same partition, not across partitions. That means you should have a keying mechanism or a suitable partitioning strategy in place to achieve batching while producing events into the topic (see the sketch after the quoted documentation below).
From the documentation:
batch.size
When multiple records are sent to the same partition, the producer will batch them together. This parameter controls the amount of memory in bytes (not messages!) that will be used for each batch. When the batch is full, all the messages in the batch will be sent. However, this does not mean that the producer will wait for the batch to become full. The producer will send half-full batches and even batches with just a single message in them. Therefore, setting the batch size too large will not cause delays in sending messages; it will just use more memory for the batches. Setting the batch size too small will add some overhead because the producer will need to send messages more frequently.
linger.ms
linger.ms controls the amount of time to wait for additional messages before sending the current batch. KafkaProducer sends a batch of messages either when the current batch is full or when the linger.ms limit is reached. By default, the producer will send messages as soon as there is a sender thread available to send them, even if there's just one message in the batch. By setting linger.ms higher than 0, we instruct the producer to wait a few milliseconds to add additional messages to the batch before sending it to the brokers. This increases latency but also increases throughput (because we send more messages at once, there is less overhead per message).
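To illustrate why batching only kicks in per partition, here is a rough sketch, assuming the kafka template field and batch-test topic from the question: records sent with the same key are routed to the same partition by the default partitioner, so they can accumulate in one batch, while records with distinct keys may be spread across partitions.
// Hedged sketch: all sends share one key, so the DefaultPartitioner routes them
// to a single partition where they can fill one batch (provided linger.ms and
// batch.size are actually applied to the producer).
for (int i = 0; i < 100; i++) {
    kafka.send("batch-test", "order-42", "request-" + i);
}
kafka.flush(); // force anything still buffered to be sent now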

Kafka Transactional read committed Consumer

I have a transactional and a normal producer in my application, both writing to the topic kafka-topic as below.
Configuration for transactional Kafka Producer
@Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
// list of host:port pairs used for establishing the initial connections to the Kafka cluster
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.RETRIES_CONFIG, 5);
/*The amount of time to wait before attempting to retry a failed request to a given topic partition.
* This avoids repeatedly sending requests in a tight loop under some failure scenarios.*/
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 3);
/*"The configuration controls the maximum amount of time the client will wait "
"for the response of a request. If the response is not received before the timeout "
"elapses the client will resend the request if necessary or fail the request if "
"retries are exhausted.";.*/
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1);
/*To avoid duplicate msg*/
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
/*Will wait for ack from broker n all replicas*/
props.put(ProducerConfig.ACKS_CONFIG, "all");
/*Kafka Transactional Properties */
props.put(ProducerConfig.CLIENT_ID_CONFIG, "transactional-producer");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id"); // set transaction id
return props;
}
@Bean
public KafkaProducer<String, String> kafkaProducer() {
return new KafkaProducer<>(producerConfigs());
}
The normal producer config is the same; only ProducerConfig.CLIENT_ID_CONFIG and ProducerConfig.TRANSACTIONAL_ID_CONFIG are not added.
The consumer config is as below:
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
//list of host:port pairs used for establishing the initial connections to the Kafka cluster
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
//allows a pool of processes to divide the work of consuming and processing records
props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka_group");
//automatically reset the offset to the earliest offset
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
//Auto commit is set false.Will do manual commit
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
/*Kafka Transactional Property ->Controls how to read messages written transactionally
* read_committed - poll transactional messages which have been committed only
* read_uncommitted - will return all messages, even transactional messages
* default is read_uncommitted
* */
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
return props;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
Since I am setting isolation.level to read_committed, the consumer should consume only transactional messages from the subscribed topic.
But it is consuming both transactional and non-transactional messages from the topic.
Am I missing any configuration so that the consumer will only consume transactional messages from the subscribed topic?
Thanks in advance :-)
It doesn't work that way. isolation.level only pertains to records committed by transactional producers. All consumers see records published by non-transactional producers.
You need to use two different topics to get the behavior you desire.
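For reference, a rough sketch of how the transactional producer above would publish inside a transaction, assuming the producerConfigs() shown earlier (with transactional.id set). Only records written this way are subject to read_committed filtering; anything from the plain producer is always visible to every consumer.
KafkaProducer<String, String> producer = new KafkaProducer<>(producerConfigs());
producer.initTransactions();                  // registers the transactional.id with the broker
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("kafka-topic", "key", "transactional value"));
    producer.commitTransaction();             // a read_committed consumer sees the record only now
} catch (KafkaException e) {
    producer.abortTransaction();              // aborted records are skipped by read_committed consumers
}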

My producer can create a topic, but data doesn't seem to be stored inside the broker

My producer can create a topic, but it doesn't seem to store any data inside a broker. I can check that the topic is created with kafka-topics script.
When I try to consume with kafka-console-consumer, it doesn't consume anything. (I know about --from-beginning.)
When I produce with kafka-console-producer, my consumer (kafka-console-consumer) can consume it right away, so there is something wrong with my Java code.
When I run my code against localhost:9092, it works fine, and my consumer code can consume the topic properly. So my producer works with the Kafka server on my local machine but doesn't work with the Kafka server on the remote machine.
Code :
//this code is inside the main method
Properties properties = new Properties();
//properties.put("bootstrap.servers", "localhost:9092");
//When I used localhost, my consumer code consumes it fine.
properties.put("bootstrap.servers", "192.168.0.30:9092");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test5", "1111","jin1111");
//topic is created, but consumer can't consume any data.
//I tried putting different values for key and value parameters but no avail.
try {
kafkaProducer.send(record);
System.out.println("complete");
} catch (Exception e) {
e.printStackTrace();
} finally {
kafkaProducer.close();
System.out.println("closed");
}
/*//try{
for(int i = 0; i < 10000; i++){
System.out.println(i);
kafkaProducer.send(new ProducerRecord("test", Integer.toString(i), "message - " + i ));
}*/
My CLI (PuTTY):
I want to see my consumer consuming when I run my Java code. (The data shown in the image are from the producer script.)
update
After reading the answers and comments, this is what I've tried so far. It is still not consuming any messages; I think the message produced by this code is not stored in the broker. I tried with a different server, too, with the same problem: the topic was created, but no consumer appears in the consumer group list, and no data can be consumed with the consumer script either.
I also tried changing permissions (chown) and editing the etc/hosts file, but no luck. I'll keep trying until I solve this.
public static void main(String[] args){
Properties properties = new Properties();
//properties.put("bootstrap.servers", "localhost:9092");
properties.put("bootstrap.servers", "192.168.0.30:9092");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("linger.ms", "1");
properties.put("batch.size", "16384");
properties.put("request.timeout.ms", "30000");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test5", "1111","jin1111");
System.out.println("1");
try {
kafkaProducer.send(record);
//kafkaProducer.send(record).get();
// implement Callback
System.out.println("complete");
kafkaProducer.flush();
System.out.println("flush completed");
} catch (Exception e) {
e.printStackTrace();
} finally {
kafkaProducer.flush();
System.out.println("another flush test");
kafkaProducer.close();
System.out.println("closed");
}
}
When I run this in Eclipse, the console shows :
To complete ppatierno's answer, you should call KafkaProducer.flush() before calling KafkaProducer.close(). This is a blocking call and will not return before all records have been sent.
Yannick
My guess is that your main method exits and the application ends before the message is sent by the Kafka client.
The send method is not synchronous. The client buffers messages and sends them after a timeout named the linger time is reached (see linger.ms) or the buffer is filled to a specific size (see the batch.size parameter, for example). The default linger time is 0 in any case.
So your main method hands the message to the send method but then exits, and the underlying thread in the Kafka client never gets a chance to send the message.
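Putting both suggestions together, a small sketch using the kafkaProducer and record variables from the question: attach a callback so the main thread learns whether the record actually reached the broker, and flush before closing so the buffered record is not lost when main exits.
// send() is asynchronous; the callback runs once the broker has accepted (or rejected) the record
kafkaProducer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();          // e.g. a timeout because the broker was unreachable
    } else {
        System.out.println("stored at " + metadata.topic() + "-" + metadata.partition()
                + "@" + metadata.offset());
    }
});
kafkaProducer.flush();  // block until buffered records have been sent
kafkaProducer.close();  // close() also flushes, but the explicit flush makes the intent clear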
I finally figured it out. If you experience a similar problem, there are things you can do.
In your server.properties, uncomment these and set the IP and port.
(There seemed to be a problem with the port, so I changed it.)
listeners=PLAINTEXT://192.168.0.30:9093
advertised.listeners=PLAINTEXT://192.168.0.30:9093
(Before restarting your broker with the changed server.properties, you might want to clean the existing log.dir. Try this if nothing else works.)
Some other things you might want to consider:
change your log.dir. Usually the default path is /tmp, but sometimes there is a noexec setting, so configure it to a different location
check your etc/hosts file
check your permissions, using chown and chmod
change the ZooKeeper port and Kafka port if necessary
change your broker.id
My working producer code:
public class Producer1 {
public static void main(String[] args){
Properties properties = new Properties();
properties.put("bootstrap.servers", "192.168.0.30:9093");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test", "1","jin");
try {
kafkaProducer.send(record);
System.out.println("complete");
} catch (Exception e) {
e.printStackTrace();
} finally {
kafkaProducer.close();
System.out.println("closed");
}
}
}
Working consumer code:
public class Consumer1 {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "192.168.0.30:9093");
props.put("group.id", "jin");
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Collections.singletonList("test"));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record : records){
System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(), record.value());
}
}
} catch (Exception e){
e.printStackTrace();
} finally {
consumer.close();
System.out.println("closed");
}
}
}
Console:

Is there a way to read only new (unread) messages in a Kafka consumer?

When the consumer subscribes to a topic and starts consuming, it reads messages from the beginning. Is there any way to read only unread messages? This is the code I used for consuming the messages.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
//Kafka Consumer subscribes list of topics here.
consumer.subscribe(Arrays.asList(topicName));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
System.out.printf("offset = %d, key = %s, value= %s\n", record.offset(), record.key(), record.value());
}
Turn off auto.commit and manually commit each message's offset after your app has successfully processed it. That way, if the app crashes, it will restart at exactly the last committed offset.
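A minimal sketch of that approach, adapted from the loop in the question (enable.auto.commit switched to "false", everything else unchanged); the committed value is the record's offset plus one, i.e. the position of the next record the group should read. The process() call is a placeholder for your application logic.
props.put("enable.auto.commit", "false");     // take over commit responsibility
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topicName));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        process(record);                      // hypothetical application logic
        // commit the position after this record, per partition
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1)));
    }
}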
I have tried your code, and it reads only unread messages. What are your client version and server version?

Kafka consumer does not start from latest message

I want to have a Kafka Consumer which starts from the latest message in a topic.
Here is the java code:
private static Properties properties = new Properties();
private static KafkaConsumer<String, String> consumer;
static
{
properties.setProperty("bootstrap.servers","localhost");
properties.setProperty("enable.auto.commit", "true");
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("group.id", "test");
properties.setProperty("auto.offset.reset", "latest");
consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singletonList("mytopic"));
}
@Override
public StreamHandler call() throws Exception
{
while (true)
{
ConsumerRecords<String, String> consumerRecords = consumer.poll(200);
Iterable<ConsumerRecord<String, String>> records = consumerRecords.records("mytopic");
for(ConsumerRecord<String, String> rec : records)
{
System.out.println(rec.value());
}
}
}
Although the value for auto.offset.reset is latest, the consumer starts from messages that are two days old and then catches up with the latest messages.
What am I missing?
Have you run this same code before with the same group.id? The auto.offset.reset parameter is only used if there is not an existing offset already stored for your consumer. So if you've run the example previously, say two days ago, and then you run it again, it will start from the last consumed position.
Use seekToEnd() if you would like to manually go to the end of the topic.
See https://stackoverflow.com/a/32392174/1392894 for a slightly more thorough discussion of this.
If you want to manually control the position of your offsets, you need to set enable.auto.commit = false.
If you want to position all offsets at the end of each partition, then call seekToEnd():
https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToEnd(java.util.Collection)
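A short sketch of the seekToEnd() approach, assuming the same consumer and topic as in the question; the seek has to happen after partitions have been assigned, so a ConsumerRebalanceListener is a convenient place for it.
consumer.subscribe(Collections.singletonList("mytopic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do when partitions are taken away in this sketch
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        consumer.seekToEnd(partitions);       // jump to the latest offset of every assigned partition
    }
});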