Spring Kafka consumer doesn't commit to Kafka server after leader change - apache-kafka

I am using spring-kafka 2.1.10.RELEASE. I have a consumer with the following properties (almost all of them copied here):
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [kafka1.local:9093, kafka2.local:9093, kafka3.local:9093]
check.crcs = true
client.id = kafkaListener-0
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = kafkaLisneterContainer
heartbeat.interval.ms = 3000
interceptor.classes = null
internal.leave.group.on.close = true
isolation.level = read_uncommitted
max.poll.interval.ms = 300000
max.poll.records = 50
metadata.max.age.ms = 300000
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
The Apache Kafka version in production is 2.11-1.0.0-0pan4.
There is a cluster with 3 Kafka nodes:
I faced a serious problem and cannot even reproduce it locally. This is what happened:
I started my application with both a Kafka producer and a consumer inside.
Everything worked fine until the leader node for my topic changed at 2019-01-17 06:47:39:
2019-01-17/controller.2019-01-17-03.aaa-aa3.gz:2019-01-17 06:47:39,365
+0000 [controller-event-thread] [kafka.controller.KafkaController] INFO [Controller id=3] New leader and ISR for partition topic_name-0
is {"leader":1,"leader_epoch":3,"isr":[1,3]}
(kafka.controller.KafkaController)
After that, my consumer stopped committing offsets to Kafka. The last commit took place in the same hour and minute the leader changed: 17 January 2019, 06:47.
4) Most mysterious: in the application everything seems to work OK. The Spring consumer reads new messages and sends them to Kafka, and I see logs such as the ones below. It looks as if the Spring consumer saves its offset in memory and sends commits to the remote Kafka (no errors, etc.):
2019-01-23 14:03:20,975 +0000
[kafkaLisneterContainer-0-C-1] [Fetcher] DEBUG [Consumer
clientId=kafkaListener-0,
groupId=kafkaLisneterContainer] Fetch READ_UNCOMMITTED at
offset 164871 for partition aaa-1 returned fetch data
(error=NONE, highWaterMark=164871, lastStableOffset = -1,
logStartOffset = 116738, abortedTransactions = null,
recordsSizeInBytes=0) 2019-01-23 14:03:20,975 +0000
[externalbetting] [kafkaLisneterContainer-0-C-1] [Fetcher]
DEBUG [Consumer clientId=kafkaListener-0,
groupId=kafkaLisneterContainer] Added READ_UNCOMMITTED fetch
request for partition eaaa-1 at offset 164871 to node
aaa-aa1.local:9093 (id: 1 rack: null) 2019-01-23 14:03:20,975
5) But the lag in Apache Kafka nevertheless grows. And if I restart my application, the Spring consumer bean will be re-created and will lose its in-memory offset. It will read that lag from Kafka and process those records a second time.
Please help me find the cause!

When you enable auto commit (the kafka-clients default), the commits are managed entirely by the kafka-clients library and Spring has no control over them.
Setting it to false allows the listener container to commit the offsets, which it does after each batch of records (poll result) by default, or after every record if you set the container AckMode property to RECORD.
The container will also reliably commit any pending offsets when partitions are revoked due to a rebalance.
I generally recommend not using auto commit.
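For illustration, a minimal sketch of such a setup for Spring Kafka 2.1.x (the class name, String deserializers and the exact property values below are assumptions for the example, not taken from the question):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.AbstractMessageListenerContainer.AckMode;

@Configuration
public class ConsumerConfiguration {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1.local:9093,kafka2.local:9093,kafka3.local:9093");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafkaLisneterContainer");
        // Hand commit responsibility to the listener container instead of the kafka-clients auto-commit.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        // Default is BATCH (commit after each poll result); RECORD commits after every record.
        factory.getContainerProperties().setAckMode(AckMode.RECORD);
        return factory;
    }
}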

Related

Timeout expired while fetching topic metadata - Kafka

We are trying to consume messages from a Kafka broker hosted on a standalone Windows machine.
The consumer is running in Kubernetes.
server.properties:
listeners=PLAINTEXT://:29092
advertised.listeners=PLAINTEXT://myhostname:29092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
Consumer Error:
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1627883198273
[main] INFO XXX.XXX.KafkaConsumerProperties - Kafka Topic Name : table-update
[main] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-GroupConsumer-1, groupId=GroupConsumer] Subscribed to topic(s): table-update
[main] INFO XX.XX.XXXXX- Could not run Loader: Timeout expired while fetching topic metadata .
Consumer config values :
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [myhostname:29092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = consumer-GroupConsumer-1
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = GroupConsumer
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 52428800
max.poll.interval.ms = 2147483647
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 120000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 60000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = XXXX
Please help me in resolving this issue.
While fetching topic metadata, several error conditions are possible: an invalid topic name, a missing metadata leader, offline partitions, etc. You can get more information about these errors by logging at debug level; the client logs the error condition with:
log.debug("Topic metadata fetch included errors: {}", errors);
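(As a side note, and depending on your logging backend, raising the kafka-clients logger to DEBUG is usually enough to see those errors; for example log4j.logger.org.apache.kafka.clients=DEBUG in a Log4j 1.x properties file, or logging.level.org.apache.kafka.clients=DEBUG in a Spring Boot application.properties.)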

@KafkaListener not recovering after DisconnectException

I have a Kafka consumer (Spring Boot) configured using @KafkaListener. It was running in production and all was good until the brokers were restarted as part of maintenance. Based on the docs, I expected the Kafka listener to recover once the brokers were back up. However, that is not what I observed from the logs; the logs stopped with the following exception:
2020-04-22 11:11:28,802|INFO|automator-consumer-app-id-0-C-1|org.apache.kafka.clients.FetchSessionHandler|[Consumer clientId=automator-consumer-app-id-0, groupId=automator-consumer-app-id] Node 10 was unable to process the fetch request with (sessionId=2138208872, epoch=348): FETCH_SESSION_ID_NOT_FOUND.
2020-04-22 11:24:23,798|INFO|automator-consumer-app-id-0-C-1|org.apache.kafka.clients.FetchSessionHandler|[Consumer clientId=automator-consumer-app-id-0, groupId=automator-consumer-app-id] Error sending fetch request (sessionId=499459047, epoch=314160) to node 7: org.apache.kafka.common.errors.DisconnectException.
2020-04-22 11:36:37,241|INFO|automator-consumer-app-id-0-C 1|org.apache.kafka.clients.FetchSessionHandler|[Consumer clientId=automator-consumer-app-id-0, groupId=automator-consumer-app-id] Error sending fetch request (sessionId=2033512553, epoch=342949) to node 4: org.apache.kafka.common.errors.DisconnectException.
Once the application was restarted, connectivity was re-established. I was wondering if this could be related to any of the consumer configuration below.
2020-04-22 12:46:59,681|INFO|main|org.apache.kafka.clients.consumer.ConsumerConfig|ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [msk-e00-br1.int.bell.ca:9092]
check.crcs = true
client.dns.lookup = default
client.id = automator-consumer-app-id-0
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = automator-consumer-app-id
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
Increase the value of max.incremental.fetch.session.cache.slots. The default value is 1000 (1K). You can refer to the answer here: How to check the actual number of incremental fetch session cache slots used in Kafka cluster?
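For example, in each broker's server.properties (the value 2000 is only illustrative; size it to the number of consumers and partitions in your cluster):
max.incremental.fetch.session.cache.slots=2000
Note that this is a broker-side setting, not a consumer property.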
Basically, your application is continuously listening for messages from the topic; if no messages are published to the topic for a while, you can get this type of exception:
org.apache.kafka.common.errors.DisconnectException: null
(see the DisconnectException class)
Once messages are sent to the topic again, the application resumes and consumes them.
To mitigate this, you need to increase the request timeout in your properties file:
consumer.request.timeout.ms
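If you construct the consumer with the plain kafka-clients API instead of a properties file, a rough sketch of the same change looks like this (the broker address, group id and timeout value are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TimeoutTunedConsumer {

    public static KafkaConsumer<String, String> build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "automator-consumer-app-id");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Give the broker more time to answer before the client drops the connection and retries.
        props.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");
        return new KafkaConsumer<>(props);
    }
}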

Offset commit failed on partition

My Kafka consumer keeps printing error messages.
I built a cluster (Kafka version 2.3.0) using 5 machines; the topic has a single partition (partition 0) with a replication factor of 3. When I consume it with the kafka-clients API, it keeps throwing exceptions:
Offset commit failed on partition test-0 at offset 1: The request timed out.
But otherwise, reading and writing messages to this topic works fine.
Consumer configuration:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [qs-kfk-01:9092, qs-kfk-02:9092, qs-kfk-03:9092, qs-kfk-04:9092, qs-kfk-05:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = erp-sales
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
Java code:
ConsumerRecords<K, V> consumerRecords = _kafkaConsumer.poll(50L);
for (ConsumerRecord<K, V> record : consumerRecords.records(topic)) {
    kafkaConsumer.receive(topic, record.key(), record.value(), record.partition(), record.offset());
}
I have tried the following:
Increased the request timeout to 5 minutes (did not help)
Switched to a different group-id (this worked): I found that as long as I use this particular group-id, the problem occurs.
Killed the machine hosting the group coordinator. After the group coordinator moved to another machine, the error remained.
I get continuous error output in the console:
2019-11-03 16:21:11.687 DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-1, groupId=erp-sales] Sending asynchronous auto-commit of offsets {test-0=OffsetAndMetadata{offset=1, metadata=''}}
2019-11-03 16:21:11.704 ERROR org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-1, groupId=erp-sales] Offset commit failed on partition test-0 at offset 1: The request timed out.
2019-11-03 16:21:11.704 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-1, groupId=erp-sales] Group coordinator qs-kfk-04:9092 (id: 2147483643 rack: null) is unavailable or invalid, will attempt rediscovery
2019-11-03 16:21:11.705 DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-1, groupId=erp-sales] Asynchronous auto-commit of offsets {test-0=OffsetAndMetadata{offset=1, metadata=''}} failed due to retriable error: {}
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.kafka.common.errors.TimeoutException: The request timed out.
2019-11-03 16:21:11.708 DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-1, groupId=erp-sales] Manually disconnected from 2147483643. Removed requests: OFFSET_COMMIT.
2019-11-03 16:21:11.708 DEBUG org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient - [Consumer clientId=consumer-1, groupId=erp-sales] Cancelled request with header RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=4, clientId=consumer-1, correlationId=42) due to node 2147483643 being disconnected
2019-11-03 16:21:11.708 DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-1, groupId=erp-sales] Asynchronous auto-commit of offsets {test-0=OffsetAndMetadata{offset=1, metadata=''}} failed due to retriable error: {}
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.kafka.common.errors.DisconnectException: null
Why did the offset commit fail?
Why is the offset information in the Kafka cluster still correct even though the commit fails?
Thank you for your help.
Group coordinator unavailability is the main cause of this issue.
Group coordinator is unavailable: this issue has already been raised with the Kafka community (KAFKA-7017).
You can fix this by deleting the internal offsets topic (__consumer_offsets) and restarting the cluster.
You can go through the following to get few more details.
https://www.stuckintheloop.com/posts/my_first_kafka_outage/
#Rohit Yadav, thanks for the answer. I have switched to another consumer group for the time being and it is working very well. But now the client has another problem, with continuous output:
2019-11-05 14:11:14.892 INFO org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-1, groupId=fs-sales-be] Error sending fetch request (sessionId=2035993355, epoch=1) to node 4: org.apache.kafka.common.errors.DisconnectException.
2019-11-05 14:11:14.892 INFO org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-1, groupId=fs-sales-be] Error sending fetch request (sessionId=181175071, epoch=INITIAL) to node 5: org.apache.kafka.common.errors.DisconnectException.
What is causing this? Nodes 4 and 5 are in OK status.

Testing Kafka HA and getting NetworkException: The server disconnected before a response was received

Running Confluent Kafka 4.1.1 Community edition.
I have:
min in-sync replicas = 2
Topic: 1 partition, replica count 3
Total of 3 brokers.
Producer is set to acks = -1
All other producer settings are default.
I launch my application, and as it starts to write records to Kafka I take down one of the brokers on purpose and immediately get: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
Based on the settings above, shouldn't the producer write() succeed and not throw an error?
Clarification:
I kill a broker on purpose.
This only seems to happen if the leader broker is killed.
Without seeing the full config and log messages it is hard to say, but still:
In Kafka, all writes go through the partition leader. In your setup, out of 3 brokers you killed 1, so it should be possible to write successfully to the remaining 2 and get an acknowledgement. But if the broker that was killed was the leader for the partition, the in-flight request can fail with an exception.
From the docs:
acks=all This means the leader will wait for the full set of in-sync
replicas to acknowledge the record. This guarantees that the record
will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee.
You can in any case set retries to a value higher than 0 and observe the behaviour: a new leader should be elected and your write should eventually succeed.
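As a rough sketch with the plain producer API (broker and topic names are placeholders and the retry values are only illustrative), acks=all combined with a non-zero retries lets the client retry the send against the newly elected leader instead of surfacing the NetworkException to your code:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ResilientProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for the full set of in-sync replicas to acknowledge each write.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry transient failures such as a leader re-election after a broker goes down.
        props.put(ProducerConfig.RETRIES_CONFIG, 5);
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 500);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}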
For Spring Cloud Stream with the Kafka binder against Azure Event Hubs for Kafka:
Exception:
{"timestamp":"2020-09-23 23:37:18.541","level":"ERROR","class":"org.springframework.kafka.support.LoggingProducerListener.onError 84", "thread":"kafka-producer-network-thread | producer-2","traceId":"","message":Exception thrown when sending a message with key='null' and payload='{123, 34, 115, 116, 97, 116, 117, 115, 34, 58, 34, 114, 101, 97, 100, 121, 34, 44, 34, 101, 118, 101...' to topic executor-networkexception and partition 3:}
org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
{"timestamp":"2020-09-23 23:37:18.545","level":"WARN ","class":"org.apache.kafka.clients.producer.internals.Sender.completeBatch 568", "thread":"kafka-producer-network-thread | producer-2","traceId":"","message":[Producer clientId=producer-2] Received invalid metadata error in produce request on partition executor-networkexception-3 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now}
Solution: set the connection idle time, retry count, and retry backoff time:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: srsmvsdneventhubstage.servicebus.windows.net:9093
          configuration:
            sasl.jaas.config: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://xxxxx.servicebus.windows.net/;=";'
            sasl.mechanism: PLAIN
            security.protocol: SASL_SSL
            retries: 3
            retry.backoff.ms: 60
            connections.max.idle.ms: 240000
Reference:
http://kafka.apache.org/090/documentation.html (read the #producerconfigs section)
https://github.com/Azure/azure-event-hubs-for-kafka/blob/master/CONFIGURATION.md (read connections.max.idle.ms)
Logs:
"org.apache.kafka.clients.producer.ProducerConfig.logAll 279", "thread":"hz._hzInstance_1_dev.cached.thread-14","traceId":"","message":ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [srsmvsdneventhubstage.servicebus.windows.net:9093]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = PLAIN
security.protocol = SASL_SSL
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
}
New-
"org.apache.kafka.clients.producer.ProducerConfig.logAll 279", "thread":"hz._hzInstance_1_dev.cached.thread-20","traceId":"","message":ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [xxxxx.servicebus.windows.net:9093]
buffer.memory = 33554432
client.id =
compression.type = none
**connections.max.idle.ms = 240000**
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
**retries = 3**
**retry.backoff.ms = 60**
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = PLAIN
security.protocol = SASL_SSL
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
}

StreamsException: No valid committed offset found for input topic

I have a two-node Kafka cluster with replication factor two. My Kafka broker version is 0.10.2.0.
Previously my application used the 0.11.0.0 Kafka Streams API. This was changed to the 0.11.0.1 Kafka Streams API to fix the Kafka Streams application throwing 'CommitFailedException' exceptions, as suggested at https://issues.apache.org/jira/browse/KAFKA-5786 and https://issues.apache.org/jira/browse/KAFKA-5152
Following are ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [1.1.1.1:9092, 1.1.1.2:9092]
check.crcs = true
client.id = sample-app-0.0.1-27c65ef7-d07f-4619-96be-852f5772d73d-StreamThread-1-consumer
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = sample-app-0.0.1
heartbeat.interval.ms = 3000
interceptor.classes = null
internal.leave.group.on.close = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 2147483647
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [org.apache.kafka.streams.processor.internals.StreamPartitionAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
But one of my application instances throws the exception below after running for some time, even though previously committed offsets are available. What could be the possible reason? Please explain.
2017-12-04 12:03:25,410 ERROR c.e.s.c.f.k.s.r.f.SampleStreamsApp [sample-app-0.0.1-096850bc-5f18-4c59-b0c7-63ad00e08aa1-StreamThread-1] No valid committed offset found for input topic sampel (partition 0) and no valid reset policy configured. You need to set configuration parameter "auto.offset.reset" or specify a topic specific reset policy via KStreamBuilder#stream(StreamsConfig.AutoOffsetReset offsetReset, ...) or KStreamBuilder#table(StreamsConfig.AutoOffsetReset offsetReset, ...) org.apache.kafka.streams.errors.StreamsException: No valid committed offset found for input topic sample (partition 0) and no valid reset policy configured. You need to set configuration parameter "auto.offset.reset" or specify a topic specific reset policy via KStreamBuilder#stream(StreamsConfig.AutoOffsetReset offsetReset, ...) or KStreamBuilder#table(StreamsConfig.AutoOffsetReset offsetReset, ...)
at org.apache.kafka.streams.processor.internals.StreamThread.resetInvalidOffsets(StreamThread.java:567)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:538)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:490)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
Caused by: org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partitions: [sample-0]
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsets(Fetcher.java:425)
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetsIfNeeded(Fetcher.java:254)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1640)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1083)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:536)
... 3 more