Kafka Consumer Group Rebalance and Group Coordinator dead - apache-kafka
I have been playing around with Kafka (1.0.0) for a couple of months, trying to understand how consumer groups work. I have a single-broker Kafka setup and I am using Kafka-Connect-Cassandra to write messages from topics to database tables. I have 10 topics, each with just one partition, and a single consumer group with 10 consumer instances (one for each topic).
While running this setup I sometimes see the following logs in kafka-connect console:
1:
[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Revoking previously assigned partitions [topic1-0, topic2-0, ....] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Successfully joined group with generation 349 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Setting newly assigned partitions [topic1-0, topic2-0, ....] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
After this, it starts consuming messages and writes to the Cassandra tables.
This happens frequently, at irregular intervals.
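To put a number on "frequently", log lines like those quoted above could be tallied per group. The following is a minimal sketch, assuming the message wording matches the lines shown here (other Kafka versions may phrase them differently); the `summarize` helper and its regular expressions are illustrative, not part of Kafka.

```python
import re
from collections import Counter

# Patterns mirror the log lines quoted above; they are assumptions about the
# message wording, not an official log format.
DEAD = re.compile(r"groupId=(?P<group>[^\]]+)\] Marking the coordinator .* dead")
JOINED = re.compile(
    r"groupId=(?P<group>[^\]]+)\] Successfully joined group with generation (?P<gen>\d+)"
)


def summarize(lines):
    """Count coordinator-dead events and keep the last seen generation per group."""
    dead = Counter()
    generation = {}
    for line in lines:
        match = DEAD.search(line)
        if match:
            dead[match.group("group")] += 1
            continue
        match = JOINED.search(line)
        if match:
            generation[match.group("group")] = int(match.group("gen"))
    return dead, generation


sample = [
    "[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator "
    "qa-server:9092 (id: 2147483647 rack: null) dead "
    "(org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)",
    "[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Marking the "
    "coordinator qa-server:9092 (id: 2147483647 rack: null) dead "
    "(org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)",
    "[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Successfully "
    "joined group with generation 349 "
    "(org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)",
]

dead, generation = summarize(sample)
print(dict(dead))
print(generation)
```

A generation number that climbs quickly between two runs of such a tally suggests the group is rebalancing often.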
However, sometimes the connector stops and shuts down. Then it starts and consumes messages again. This is the log:
INFO [Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO WorkerSinkTask{id=cassandra-sink-casb-0} Committing offsets asynchronously using sequence number 42: {topic1-0=OffsetAndMetadata{offset=1074, metadata=''}, topic2-0=OffsetAndMetadata{offset=112, metadata=''}, ...}} (org.apache.kafka.connect.runtime.WorkerSinkTask:311)
INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1214)
INFO Stopping connector cassandra-sink-casb (org.apache.kafka.connect.runtime.Worker:304)
INFO Stopping task cassandra-sink-casb-0 (org.apache.kafka.connect.runtime.Worker:464)
INFO Stopping Cassandra sink. (com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkTask:79)
INFO Shutting down Cassandra driver session and cluster. (com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraJsonWriter:253)
INFO Stopped connector cassandra-sink-casb (org.apache.kafka.connect.runtime.Worker:320)
INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
INFO [Worker clientId=connect-1, groupId=connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Successfully joined group with generation 7 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-1dc56cda-ed54-4181-a5f9-d11022d8e8c3', leaderUrl='http://127.0.1.1:8083/', offset=8, connectorIds=[cassandra-sink-casb], taskIds=[cassandra-sink-casb-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1192)
INFO Starting connectors and tasks using config offset 8 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:837)
INFO Starting connector cassandra-sink-casb (org.apache.kafka.connect.runtime.distributed.DistributedHerder:890)
2:
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms,
which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum
size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:722)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:600)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:299)
at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:327)
at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:547)
at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1300(WorkerSinkTask.java:62)
at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsRevoked(WorkerSinkTask.java:618)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:419)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:359)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1146)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1111)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:410)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:283)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Successfully joined group with generation 343 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Setting newly assigned partitions [topic1-0, topic2-0,...] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
INFO WorkerSinkTask{id=cassandra-sink-casb-0} Committing offsets asynchronously using sequence number 155: {topic1-0=OffsetAndMetadata{offset=836, metadata=''}, topic2-0=OffsetAndMetadata{offset=86, metadata=''}, ...}} (org.apache.kafka.connect.runtime.WorkerSinkTask:311)
Again, sometimes Kafka Connect starts consuming messages after the rebalance, and sometimes it shuts down.
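The exception text above ties the failed commit to the gap between poll() calls exceeding max.poll.interval.ms. A rough back-of-the-envelope check is sketched below. It assumes processing time grows linearly with batch size, which may not hold for a sink that writes in batches, and the 700 ms per-record latency is a made-up figure for illustration. The values 500 and 300000 are the consumer defaults for max.poll.records and max.poll.interval.ms.

```python
def worst_case_batch_ms(max_poll_records, per_record_ms):
    """Upper bound on the time spent processing one poll() batch, in milliseconds."""
    return max_poll_records * per_record_ms


def exceeds_poll_interval(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if a full batch could take longer than max.poll.interval.ms."""
    return worst_case_batch_ms(max_poll_records, per_record_ms) > max_poll_interval_ms


# 500 and 300000 are the defaults for max.poll.records and max.poll.interval.ms.
# The 700 ms per-record write latency is hypothetical.
print(exceeds_poll_interval(500, 700, 300_000))  # 350000 ms > 300000 ms
print(exceeds_poll_interval(100, 700, 300_000))  # 70000 ms <= 300000 ms
```

Under these assumed numbers, a smaller batch keeps the worst case inside the interval, which is the trade-off the exception message points at with max.poll.records.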
I have the following questions:
1) Why does the Group Coordinator (Kafka broker) die?
I am looking into several Kafka configs to resolve these issues, such as connections.max.idle.ms, max.poll.records, session.timeout.ms, group.min.session.timeout.ms and group.max.session.timeout.ms.
I am not sure what the best configs would be for things to run smoothly.
2) Why does the rebalance occur?
I know a group rebalance can occur when adding a new task, changing a task, etc. But I haven't changed anything. Sometimes the Kafka Connect framework seems to handle the error a bit too aggressively and kills the Connect tasks instead of carrying on working.
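For reference, consumer settings for sink tasks are given in the Connect worker properties with a `consumer.` prefix. The sketch below only renders such lines. The `worker_consumer_overrides` helper is hypothetical and the values are placeholders to show the shape, not recommendations.

```python
def worker_consumer_overrides(settings):
    """Render consumer settings as worker properties lines with the 'consumer.' prefix."""
    return [f"consumer.{key}={value}" for key, value in sorted(settings.items())]


# Placeholder values for illustration only; suitable numbers depend on the workload.
overrides = worker_consumer_overrides({
    "max.poll.records": 100,
    "max.poll.interval.ms": 600000,
    "session.timeout.ms": 30000,
})
print("\n".join(overrides))
```

Whatever session.timeout.ms is chosen has to fall between the broker's group.min.session.timeout.ms and group.max.session.timeout.ms, so the client and broker settings need to be checked together.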
Related
When collecting data with Modbus protocol through kafka Producer, collection stops after a certain period of time
I have deployed a Kafka cluster on a GCP instance. I run the connector with config/connect-distributed.properties and start collecting data through the REST API with the following command:
curl -X POST -H "Content-Type:application/json" \
  --data '{
    "name": "operation1",
    "config": {
      "connector.class": "org.apache.plc4x.kafka.Plc4xSourceConnector",
      "default.topic": "operation1",
      "tasks.max": "1",
      "sources": "Modbus",
      "sources.Modbus.connectionString": "modbus:tcp://<IP address:port>",
      "sources.Modbus.pollReturnInterval": "10000",
      "sources.Modbus.bufferSize": "5000",
      "sources.Modbus.jobReferences": "operation1",
      "jobs": "operation1",
      "jobs.operation1.fields": "BMS1-1, BMS1-2, BMS2-1, BMS2-2, BMS2-3, PCS, ETC",
      "jobs.operation1.interval": "1000",
      "jobs.operation1.fields.BMS1-1": "input-register:1[125]",
      "jobs.operation1.fields.BMS1-2": "input-register:126[12]",
      "jobs.operation1.fields.BMS2-1": "input-register:201[125]",
      "jobs.operation1.fields.BMS2-2": "input-register:326[125]",
      "jobs.operation1.fields.BMS2-3": "input-register:451[16]",
      "jobs.operation1.fields.PCS": "input-register:501[89]",
      "jobs.operation1.fields.ETC": "input-register:601[5]"
    }
  }' http://localhost:8083/connectors
In the log of config/connect-distributed.properties, the following messages appear and collection is successful. However, collection stops after a certain amount of time (minutes or hours).
[2022-05-10 05:36:44,522] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:36:54,526] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:04,530] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:14,534] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:24,550] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:34,554] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
After a certain amount of time, the log message changes to the following:
[2022-05-10 05:42:36,597] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:38,598] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:40,598] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:42,599] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:44,600] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
At this point, if you check the status of the connector with curl, it is still RUNNING:
curl -X GET localhost:8083/connectors/operation1/status
{"name":"operation1","connector":{"state":"RUNNING","worker_id":"<IP>:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"<IP>:8083"}],"type":"source"}
I really don't know why this happens. Any help would be appreciated.
Logs after changing the level to DEBUG:
[2022-05-10 08:14:18,708] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-12 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,708] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-0 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,708] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-6 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-18 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-9 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-3 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-15 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-21 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-24 at position FetchPosition{offset=18793, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, 
groupId=connect-cluster] Built incremental fetch (sessionId=714175396, epoch=1010) for node 1. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 9 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-offsets-12, connect-offsets-0, connect-offsets-6, connect-offsets-18, connect-offsets-9, connect-offsets-3, connect-offsets-15, connect-offsets-21, connect-offsets-24), canUseTopicIds=True) to broker <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274) [2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3034) and timeout 30000 to node 1: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=714175396, sessionEpoch=1010, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521) [2022-05-10 08:14:18,757] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Received FETCH response from node 1 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3030): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=1712137779, responses=[]) (org.apache.kafka.clients.NetworkClient:879) [2022-05-10 08:14:18,757] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Node 1 sent an incremental fetch response with throttleTimeMs = 0 for session 1712137779 with 0 response partition(s), 1 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584) [2022-05-10 08:14:18,758] DEBUG 
[Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-2 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Built incremental fetch (sessionId=1712137779, epoch=1006) for node 1. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 1 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351) [2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-status-2), canUseTopicIds=True) to broker <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274) [2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3033) and timeout 30000 to node 1: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=1712137779, sessionEpoch=1006, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521) [2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Received FETCH response from node 0 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3031): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=619420322, responses=[]) (org.apache.kafka.clients.NetworkClient:879) [2022-05-10 
08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Received FETCH response from node 0 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-3, correlationId=1014): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=208110829, responses=[]) (org.apache.kafka.clients.NetworkClient:879) [2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Node 0 sent an incremental fetch response with throttleTimeMs = 0 for session 208110829 with 0 response partition(s), 1 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584) [2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Node 0 sent an incremental fetch response with throttleTimeMs = 0 for session 619420322 with 0 response partition(s), 2 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-configs-0 at position FetchPosition{offset=698, offsetEpoch=Optional[54], currentLeader=LeaderAndEpoch{leader=Optional[<IP>92.153:9092 (id: 0 rack: null)], epoch=54}} to node <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-3 at position FetchPosition{offset=129, offsetEpoch=Optional[50], currentLeader=LeaderAndEpoch{leader=Optional[<IP>92.153:9092 (id: 0 rack: null)], epoch=54}} to node <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Built incremental fetch (sessionId=208110829, epoch=1008) 
for node 0. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 1 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-configs-0), canUseTopicIds=True) to broker <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-3, correlationId=1015) and timeout 30000 to node 0: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=208110829, sessionEpoch=1008, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-0 at position FetchPosition{offset=116, offsetEpoch=Optional[54], currentLeader=LeaderAndEpoch{leader=Optional[<IP>92.153:9092 (id: 0 rack: null)], epoch=54}} to node <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Built incremental fetch (sessionId=619420322, epoch=1008) for node 0. 
Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 2 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-status-0, connect-status-3), canUseTopicIds=True) to broker <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274) [2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3034) and timeout 30000 to node 0: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=619420322, sessionEpoch=1008, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521) [2022-05-10 08:14:18,812] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Received FETCH response from node 2 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3032): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=581764107, responses=[]) (org.apache.kafka.clients.NetworkClient:879) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Node 2 sent an incremental fetch response with throttleTimeMs = 0 for session 581764107 with 0 response partition(s), 8 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-8 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-14 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-2 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-20 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-11 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer 
clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-5 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-23 at position FetchPosition{offset=599, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-17 at position FetchPosition{offset=70, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:18,814] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Built incremental fetch (sessionId=581764107, epoch=1006) for node 2. 
Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 8 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351) [2022-05-10 08:14:18,814] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-offsets-8, connect-offsets-14, connect-offsets-2, connect-offsets-20, connect-offsets-11, connect-offsets-5, connect-offsets-23, connect-offsets-17), canUseTopicIds=True) to broker <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274) [2022-05-10 08:14:18,814] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3035) and timeout 30000 to node 2: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=581764107, sessionEpoch=1006, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521) [2022-05-10 08:14:18,977] DEBUG [operation1|task-0] Job statistics (operation1, Modbus) number of requests: 354 (201 success, 43.2 % failed, 0.0 % too slow), min latency: 82.47 ms, mean latency: 93.20 ms, median: 89.56 ms (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperImpl:250) [2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Received FETCH response from node 2 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3032): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=1118973913, responses=[]) (org.apache.kafka.clients.NetworkClient:879) [2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Node 2 sent an incremental fetch response 
with throttleTimeMs = 0 for session 1118973913 with 0 response partition(s), 2 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584) [2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-4 at position FetchPosition{offset=85, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-1 at position FetchPosition{offset=115, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245) [2022-05-10 08:14:19,074] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Built incremental fetch (sessionId=1118973913, epoch=1008) for node 2. 
Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 2 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351) [2022-05-10 08:14:19,074] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-status-1, connect-status-4), canUseTopicIds=True) to broker <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274) [2022-05-10 08:14:19,074] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3035) and timeout 30000 to node 2: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=1118973913, sessionEpoch=1008, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521) [2022-05-10 08:14:19,126] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Received FETCH response from node 0 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3033): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=407599491, responses=[]) (org.apache.kafka.clients.NetworkClient:879) [2022-05-10 08:14:19,126] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Node 0 sent an incremental fetch response with throttleTimeMs = 0 for session 407599491 with 0 response partition(s), 8 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
This seems to have been an issue with how the PLC4X connector handles errors. After a timeout occurred, the connector stopped requesting new messages from the Modbus server. Interestingly, if the TCP connection to the Modbus server was interrupted, the PLC4X connector would reconnect and start polling again. A fix has been pushed to the PLC4X GitHub repo (the PLC4X Kafka Connector repository); can you please try building the latest PLC4X connector from it? The PLC4X Kafka Connector doesn't fail the Kafka connector when the connection from the connector to the PLC fails. Instead, it waits for the connection to be restored and begins polling again. From your comments it would also seem that you have a Modbus server available on a public IP address. This isn't the best design, as Modbus provides no security.
Same offset and partition record is getting consumed twice causing duplicates
I am trying to consume records using an application written with spring-kafka. I am facing a very unusual condition and cannot understand why it is happening. My consumer application runs with a concurrency of 2, meaning 2 consumer threads are subscribed to a topic that has two partitions. I consume records and place them into a table using an upsert, storing the offset, the partition and the insert timestamp. I am seeing duplicate values with the same offset and partition in the table, which should not occur. There is no difference in the timestamp value, which means the inserts occurred at the same time. I am not sure how this is possible, and I don't see any issue in the log either. I am not sure what is happening at the producer end, but we can't have 2 values at the same offset anyway, so I am not sure whether this is an issue at the consumer end or the producer end. Any suggestion or thought that would help me triage this issue?
Kafka log (I don't see anything unusual in it):
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.4.0
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 77a89fcf8d7fa018
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1604154596318
14:29:56.319 [main] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Subscribed to topic(s): kaas.pe.enrollment.csp.ts2
14:29:57.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Cluster ID: 6cbv7QOaSW6j1vXrOCE4jA
14:29:57.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Cluster ID: 6cbv7QOaSW6j1vXrOCE4jA
14:29:57.915 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Discovered group coordinator apslp1563.uhc.com:9093 (id: 2147483574 rack: null)
14:29:57.915 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Discovered group coordinator apslp1563.uhc.com:9093 (id: 2147483574 rack: null)
14:29:57.923 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:57.924 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:58.121 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:58.125 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] (Re-)joining group
14:30:13.127 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Finished assignment for group at generation 23: {consumer-csp-prov-emerald-test-1-19d92ba5-5dc3-433d-b967-3cf1ce1b4174=org.apache.kafka.clients.consumer.ConsumerPartitionAssignor$Assignment@d7e2a1f, consumer-csp-prov-emerald-test-2-5833c212-7031-4ab1-944b-7e26f7d7a293=org.apache.kafka.clients.consumer.ConsumerPartitionAssignor$Assignment@53c3aad3}
14:30:13.131 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Successfully joined group with generation 23
14:30:13.131 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Successfully joined group with generation 23
14:30:13.134 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Adding newly assigned partitions: kaas.pe.enrollment.csp.ts2-1
14:30:13.134 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Adding newly assigned partitions: kaas.pe.enrollment.csp.ts2-0
14:30:13.143 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Setting offset for partition kaas.pe.enrollment.csp.ts2-0 to the committed offset FetchPosition{offset=500387, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=apslp1559.uhc.com:9093 (id: 69 rack: null), epoch=37}}
14:30:13.143 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Setting offset for partition kaas.pe.enrollment.csp.ts2-1 to the committed offset FetchPosition{offset=499503, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=apslp1562.uhc.com:9093 (id: 72 rack: null), epoch=36}}
Code:
@KafkaListener(topics = "${app.topic}", groupId = "${app.group_id_config}")
public void run(ConsumerRecord<String, GenericRecord> record, Acknowledgment acknowledgement) throws Exception {
    try {
        prov_tin_number = record.value().get("providerTinNumber").toString();
        prov_tin_type = record.value().get("providerTINType").toString();
        enroll_type = record.value().get("enrollmentType").toString();
        vcp_prov_choice_ind = record.value().get("vcpProvChoiceInd").toString();
        error_flag = "";
        // Upsert the row together with the record's partition and offset
        dataexecutor.peStremUpsertTbl(prov_tin_number, prov_tin_type, enroll_type, vcp_prov_choice_ind, error_flag, record.partition(), record.offset());
        acknowledgement.acknowledge();
    } catch (Exception ex) {
        System.out.println(record);
        System.out.println(ex.getMessage());
    }
}
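To narrow down whether the consumer really receives the same record twice, one option is to remember every (partition, offset) pair the listener sees and report repeats before the upsert runs. The following is only a minimal, JDK-only sketch for debugging. The class name SeenOffsets and its method firstTime are hypothetical helpers, not part of spring-kafka or of the code in the question, and the set grows without bound, so it is not meant for production use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical diagnostic helper: remembers which (partition, offset) pairs were already seen. */
public class SeenOffsets {
    // Thread-safe set, because with a concurrency of 2 two listener threads may call this at once.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true the first time a (partition, offset) pair is seen and false on every repeat. */
    public boolean firstTime(int partition, long offset) {
        return seen.add(partition + "-" + offset);
    }

    public static void main(String[] args) {
        SeenOffsets tracker = new SeenOffsets();
        // Same offset on two different partitions is not a duplicate.
        System.out.println(tracker.firstTime(0, 500387L));
        System.out.println(tracker.firstTime(1, 500387L));
        // Same partition and same offset again is a repeat.
        System.out.println(tracker.firstTime(0, 500387L));
    }
}
```

Calling tracker.firstTime(record.partition(), record.offset()) at the top of the listener, and logging the current thread name whenever it returns false, would show whether the duplicate already arrives through the consumer or only appears later, in the database layer.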
How to fix kafka streams problem related to group coordinator is unavailable or invalid, will attempt rediscovery
I have a problem when I try to run a Kafka Streams application with PROCESSING_GUARANTEE_CONFIG set to exactly-once semantics. With other settings, for example at-least-once semantics, it works very well. I noticed in the logs that something is going wrong, and I found some recommendations for fixing this problem, but unfortunately they didn't help me :(
03:35:28.627 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Discovered group coordinator kafka:9093 (id: 2147483646 rack: null)
03:35:28.627 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Group coordinator kafka:9093 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
03:35:28.628 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=to-transform] Discovered group coordinator kafka:9093 (id: 2147483646 rack: null)
03:35:28.628 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Group coordinator kafka:9093 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
03:35:48.628 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Discovered group coordinator kafka:9093 (id: 2147483646 rack: null)
03:35:48.630 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Found no committed offset for partition topic-0
03:35:48.631 INFO o.a.k.c.c.KafkaConsumer - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
03:35:48.631 INFO o.a.k.s.p.i.StreamThread - stream-thread [transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
03:35:48.631 INFO o.a.k.s.KafkaStreams - stream-client [transform-f8268b2b-4673-49ac-9396-6a2b86d45697] State transition from REBALANCING to RUNNING
03:35:48.632 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Attempt to heartbeat failed for since member id transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer-6aacbde6-4553-43ee-bc2f-2b5718e55acf is not valid.
03:35:48.632 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Found no committed offset for partition topic-0
03:35:48.633 INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Resetting offset for partition topic-0 to offset 0.
03:35:48.634 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
03:35:48.634 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Lost previously assigned partitions topic-0
03:35:48.634 INFO o.a.k.s.p.i.StreamThread - stream-thread [transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1] at state RUNNING: partitions [topic-0] lost due to missed rebalance.
The first recommendation was that, since I run just a single Kafka broker node, I have to set the partition and replication configs to 1 as well. The second recommendation was to restart the Kafka broker. Neither gave any results.
kafka:
  image: wurstmeister/kafka:2.12-2.4.1
  ports:
    - "9092:9092"
    - "9093:9093"
  depends_on:
    - zookeeper
  links:
    - zookeeper:zk
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_LISTENERS: OUTSIDE://kafka:9092,INSIDE://kafka:9093
    KAFKA_ADVERTISED_LISTENERS: OUTSIDE://localhost:9092,INSIDE://kafka:9093
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT,PLAINTEXT:PLAINTEXT
    KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    KAFKA_LOG_RETENTION_HOURS: 1
    KAFKA_MESSAGE_MAX_BYTES: 1048576
    KAFKA_REPLICA_FETCH_MAX_BYTES: 1048576
    KAFKA_GROUP_MAX_SESSION_TIMEOUT_MS: 30000
    KAFKA_NUM_PARTITIONS: 1
    KAFKA_DEFAULT_REPLICATION_FACTOR: 1
    KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: 1
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    KAFKA_TRANSACTION_STATE_LOG_NUM_PARTITIONS: 1
    KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
    KAFKA_DELETE_RETENTION_MS: 86400000
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_CREATE_TOPICS: topic:1:1, transform:1:1
Thanks for any help. Kind regards, Victor
There can be many reasons for the observed issue. In general, exactly-once is more expensive and puts a higher load on the brokers and the Kafka Streams application. Also note that if you really want exactly-once processing, you should run with at least 3 brokers (and topics should be configured with a replication factor of 3 and a min-ISR of 2). Otherwise, EOS cannot really be guaranteed. Increasing commit.interval.ms might help to mitigate the issue. Note that for EOS this might lead to higher processing latency (that is the reason why the default commit interval is reduced to 100 ms if EOS is enabled). If you can accept a higher latency, you might want to increase it to, for example, 1 second. Also, there is heavy investment in EOS, and newer versions contain many improvements. If you can, you might want to upgrade to the upcoming 2.6 release and test the new "eos_beta" processing mode (processing.guarantee=exactly_once_beta; requires brokers 2.5 or newer).
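As a rough illustration of the settings mentioned above, here is a minimal sketch that collects them in a java.util.Properties object. It uses plain string keys so that it compiles without the Kafka jars; the class name EosStreamsConfigSketch and the method build are hypothetical, and the application id and bootstrap address are simply taken from the question's logs. The min-ISR of 2 is a broker or topic setting (min.insync.replicas) and is therefore not part of the client configuration shown here.

```java
import java.util.Properties;

/** Hypothetical helper that assembles Kafka Streams settings for an exactly-once application. */
public class EosStreamsConfigSketch {

    /** Builds the settings with plain string keys so the sketch needs no Kafka dependency. */
    public static Properties build(String applicationId, String bootstrapServers, long commitIntervalMs) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Exactly-once processing, as in the question.
        props.put("processing.guarantee", "exactly_once");
        // A longer commit interval trades latency for fewer transaction commits.
        props.put("commit.interval.ms", Long.toString(commitIntervalMs));
        // Replication factor for the internal topics Streams creates; needs at least 3 brokers.
        props.put("replication.factor", "3");
        return props;
    }

    public static void main(String[] args) {
        // "transform" and "kafka:9093" are the values visible in the question's logs.
        Properties props = build("transform", "kafka:9093", 1000L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties object is the kind of argument that would be passed to the KafkaStreams constructor. On a single-broker setup such as the one in the question, the replication factor of 3 cannot be satisfied, which is the point the answer makes about EOS guarantees.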
Unable to consume with specific consumer group on a Kafka cluster
When I try to consume a topic with a specific consumer group, it fails. I can consume the same topic with a new consumer group. When the describe command is used, there are no consumers attached to any partition:
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
dracg  clog   1          105288          105588          300  -            -     -
dracg  clog   2          104232          104532          300  -            -     -
dracg  clog   3          104525          104820          295  -            -     -
dracg  clog   0          104941          105243          302  -            -     -
Even the console consumer cannot consume with this consumer group. I'm giving the relevant group-join section of the console consumer log below, group leader:
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Joining group with current subscription: [clog] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending JoinGroup (JoinGroupRequestData(groupId='dracg', sessionTimeoutMs=10000, rebalanceTimeoutMs=300000, memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', groupInstanceId='null', protocolType='consumer', protocols=[JoinGroupRequestProtocol(name='range', metadata=[0, 0, 0, 0, 0, 1, 0, 23, 112, 114, 101, 112, 114, 111, 100, 45, 70, 66, 77, 66, 45, 99, 104, 97, 110, 110, 101, 108, 108, 111, 103, 0, 0, 0, 0])])) to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful JoinGroup response: JoinGroupResponseData(throttleTimeMs=0, errorCode=0, generationId=5117, protocolName='range', leader='rdkafka-b5a0e7ac-8311-410f-bf04-b2b2712bad7a', memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', members=[]) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending follower SyncGroup to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null): SyncGroupRequestData(groupId='dracg', generationId=5117, memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', groupInstanceId='null', assignments=[]) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
INFO [Consumer clientId=consumer-1, groupId=dracg] Successfully joined group with generation 5117 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Enabling heartbeat thread (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
INFO [Consumer clientId=consumer-1, groupId=dracg] Setting newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending Heartbeat request to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending asynchronous auto-commit of offsets {} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Completed asynchronous auto-commit of offsets {} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending Heartbeat request to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending Heartbeat request to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
Kafka version: 2.3.0
If you have any clue about this, please share.
kafka consumer AbstractCoordinator: Discovered coordinator Java client
I have 3 brokers running with broker ids 0, 1 and 2. The consumer (Java client) picks up broker 0 as the group coordinator and starts to consume messages correctly. But when broker 0, which is the group coordinator, goes down, the consumer does not do anything and stalls in the poll() method. Processing resumes only when broker 0 is up and running again. How can I handle this scenario of a group coordinator change in the Java client? I get this error when the group coordinator dies:
16/09/22 17:42:45 INFO internals.AbstractCoordinator: Discovered coordinator datascience1.sv2.trulia.com:9092 (id: 2147483647 rack: null) for group group2.
16/09/22 17:42:45 INFO internals.AbstractCoordinator: (Re-)joining group group2
16/09/22 17:42:45 INFO internals.AbstractCoordinator: Marking the coordinator datascience1.sv2.trulia.com:9092 (id: 2147483647 rack: null) dead for group group2
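A side note on reading these log lines: the id 2147483647 is not a real broker id. To my knowledge, the Java client uses a separate connection for coordinator traffic and labels it with Integer.MAX_VALUE minus the broker id, so the value can be mapped back to the broker. The sketch below only does that arithmetic; the class and method names are hypothetical.

```java
/** Hypothetical helper that maps the coordinator id printed in client logs back to a broker id. */
public class CoordinatorIdDecoder {

    /** Assumes the client derives the logged id as Integer.MAX_VALUE minus the broker id. */
    public static int brokerIdOf(int loggedCoordinatorId) {
        return Integer.MAX_VALUE - loggedCoordinatorId;
    }

    public static void main(String[] args) {
        // The id from the log above.
        System.out.println(brokerIdOf(2147483647)); // prints 0
    }
}
```

Applied to the log above, this gives broker 0, which matches the statement that broker 0 is the group coordinator.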
The issue is in the offsets.topic.replication.factor and replication.factor configs. See these answers: https://stackoverflow.com/a/50954843/7321097 and https://stackoverflow.com/a/50595475/7321097
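Why the replication factor of the offsets topic matters can be sketched as follows. As far as I know, the coordinator of a group is the broker that leads one particular partition of the internal __consumer_offsets topic, chosen from the hash of the group id. If that partition has a replication factor of 1 and its only replica lives on the broker that went down, no other broker can take over as coordinator. The code below is a simplified, JDK-only rendering of that mapping under this assumption; the class and method names are hypothetical, and 50 is the default value of offsets.topic.num.partitions.

```java
/** Hypothetical sketch of how a group id maps to a partition of __consumer_offsets. */
public class CoordinatorPartitionSketch {

    /** Returns the assumed offsets-topic partition whose leader acts as the group's coordinator. */
    public static int offsetsPartitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) stays negative, so that case is mapped to 0 explicitly.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // "group2" is the group id from the question; 50 is the broker default.
        int partition = offsetsPartitionFor("group2", 50);
        System.out.println("group2 -> __consumer_offsets-" + partition);
    }
}
```

Describing the __consumer_offsets topic and looking at the replicas of the resulting partition would show whether brokers other than broker 0 hold a copy of it.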