There is a large file with 40M records. The Spool Dir connector processed about half of the records, but after that it stopped pushing records to the topic. The log looks like this:
327878 [2021-01-07 23:08:59,903] INFO Processed 20060000 lines of /dir/dir1/abc.txt_1607697517821.txt (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceTask:144)
327879 [2021-01-07 23:08:59,997] INFO Processed 20080000 lines of /dir/dir1/abc.txt_1607697517821.txt (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceTask:144)
327880 [2021-01-07 23:09:00,225] INFO Processed 20100000 lines of /dir/dir1/abc.txt_1607697517821.txt (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceTask:144)
327881 [2021-01-07 23:09:04,788] INFO WorkerSourceTask{id=cust-stream-1} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
327882 [2021-01-07 23:09:04,788] INFO WorkerSourceTask{id=cust-stream-1} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
327883 [2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-1} Finished commitOffsets successfully in 6 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
327884 [2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
327885 [2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
327886 [2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-2} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
The "Committing offsets" / "flushing 0 outstanding messages" lines at the end keep repeating in the log.
The abc_1607697517821.txt.PROCESSING file still exists, showing that processing has not finished yet.
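In case it helps narrow this down, one diagnostic step is to raise the connector's error reporting and logging so a swallowed record-level failure shows up instead of the task silently idling. This is only a sketch: the error-reporting keys are the standard per-connector Connect settings from KIP-298 (Kafka 2.0+), and the logger name is taken from the class names in the log lines above.

# added to the spooldir connector configuration (sketch)
errors.log.enable=true
errors.log.include.messages=true

# connect-log4j.properties - more verbose logging for the spooldir task classes
log4j.logger.com.github.jcustenborder.kafka.connect.spooldir=DEBUG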
Related
I am trying to set up the schema-transfer SMT (https://github.com/OneCricketeer/schema-registry-transfer-smt) with MM2.
The first iteration works successfully. A schema is created in the target registry and the messages in the topic are displayed correctly.
But then replication stops and new messages stop arriving in the target cluster.
After disabling the schema-transfer SMT, message replication starts working again, but, as expected without the SMT, the consumer tries to deserialize events using the source schema ID and fails.
We are on Kafka 2.8/Confluent 6.0 and using MM2 for A->B one-way replication.
MirrorMaker 2 settings
/etc/kafka/connect-mirror-maker.properties:
clusters=source, target-stage
source.bootstrap.servers=broker-src.net:9091
target-stage.bootstrap.servers=broker-target:9091
source->target-stage.enabled=True
source->target-stage.topics=test-mm-.*
topics.blacklist=.*[\\-\\.]internal, .*\\.replica, __.*
target-stage->source.enabled=False
replication.policy.class=com.amazonaws.kafka.samples.CustomMM2ReplicationPolicy
replication.factor=1
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
source->target-stage.transforms=avroSchemaTransfer
source->target-stage.transforms.avroSchemaTransfer.transfer.message.keys=false
source->target-stage.transforms.avroSchemaTransfer.src.schema.registry.url=http://schema-reg-src.net:8081
source->target-stage.transforms.avroSchemaTransfer.src.basic.auth.credentials.source=USER_INFO
source->target-stage.transforms.avroSchemaTransfer.src.basic.auth.user.info=user:pass
source->target-stage.transforms.avroSchemaTransfer.dest.schema.registry.url=http://schema-reg-target.net:8081
source->target-stage.transforms.avroSchemaTransfer.dest.basic.auth.credentials.source=USER_INFO
source->target-stage.transforms.avroSchemaTransfer.dest.basic.auth.user.info=user:pass
source->target-stage.transforms.avroSchemaTransfer.type=cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer
The logs show these errors:
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Caused by: cricket.jmoore.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema 1 not found; error code: 40403
I can't figure out what to do about the magic byte, or what is trying to look up schema ID 1.
MirrorMaker 2 log excerpt
/var/log/kafka/connect-mirror-maker.log:
[2022-12-29 15:50:17,618] INFO Initializing: org.apache.kafka.connect.runtime.TransformationChain{cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer} (org.apache.kafka.connect.runtime.Worker:606)
[2022-12-29 15:50:18,365] INFO WorkerSourceTask{id=MirrorSourceConnector-0} Source task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask:233)
[2022-12-29 15:50:18,438] INFO [Consumer clientId=consumer-null-12, groupId=null] Cluster ID: tIsZUjsuRvm3HYsWU0lUsA (org.apache.kafka.clients.Metadata:279)
[2022-12-29 15:50:18,536] INFO WorkerSourceTask{id=MirrorSourceConnector-1} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:50:18,536] INFO WorkerSourceTask{id=MirrorSourceConnector-1} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 15:50:18,537] ERROR WorkerSourceTask{id=MirrorSourceConnector-1} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:191)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:341)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:256)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:239)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2022-12-29 15:50:18,601] ERROR WorkerSourceTask{id=MirrorSourceConnector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:191)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:341)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:256)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:239)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2022-12-29 15:50:18,602] ERROR WorkerSourceTask{id=MirrorSourceConnector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:192)
[2022-12-29 15:50:18,604] INFO [Producer clientId=producer-9] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1189)
[2022-12-29 15:50:18,608] INFO Stopping task-thread-MirrorSourceConnector-0 took 6 ms. (org.apache.kafka.connect.mirror.MirrorSourceTask:120)
[2022-12-29 15:50:18,608] INFO [Producer clientId=target-stage-producer] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1189)
[2022-12-29 15:50:22,701] ERROR Unable to fetch schema id 1 in source registry for record value (cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer:159)
[2022-12-29 15:50:22,701] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:50:22,706] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 15:50:22,706] ERROR WorkerSourceTask{id=MirrorHeartbeatConnector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:191)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:341)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:256)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:239)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.ConnectException: cricket.jmoore.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema 1 not found; error code: 40403
at cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer.translateRegistrySchema(SchemaRegistryTransfer.java:163)
at cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer.updateKeyValue(SchemaRegistryTransfer.java:113)
at cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer.apply(SchemaRegistryTransfer.java:72)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
... 11 more
Caused by: cricket.jmoore.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema 1 not found; error code: 40403
at cricket.jmoore.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
at cricket.jmoore.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
at cricket.jmoore.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:660)
at cricket.jmoore.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:642)
at cricket.jmoore.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:225)
at cricket.jmoore.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaBySubjectAndId(CachedSchemaRegistryClient.java:299)
at cricket.jmoore.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaById(CachedSchemaRegistryClient.java:284)
at cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer.translateRegistrySchema(SchemaRegistryTransfer.java:157)
... 16 more
[2022-12-29 15:50:22,706] ERROR WorkerSourceTask{id=MirrorHeartbeatConnector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:192)
[2022-12-29 15:50:22,707] INFO [Producer clientId=target-stage-producer] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1189)
[2022-12-29 15:51:06,669] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:51:06,670] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 15:51:06,705] INFO WorkerSourceTask{id=MirrorCheckpointConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:51:06,705] INFO WorkerSourceTask{id=MirrorCheckpointConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 15:51:06,711] INFO Stopping task-thread-MirrorCheckpointConnector-0 took 6 ms. (org.apache.kafka.connect.mirror.MirrorCheckpointTask:98)
[2022-12-29 15:51:06,711] INFO [Producer clientId=target-stage-producer] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1189)
[2022-12-29 15:51:06,784] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Finished commitOffsets successfully in 115 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:586)
[2022-12-29 15:51:12,501] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:51:12,502] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 15:51:17,691] INFO WorkerSourceTask{id=MirrorSourceConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:51:17,691] INFO WorkerSourceTask{id=MirrorSourceConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 15:51:17,713] INFO WorkerSourceTask{id=MirrorCheckpointConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 15:51:17,714] INFO WorkerSourceTask{id=MirrorCheckpointConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 17:00:07,890] INFO Found 0 new topic-partitions on source. Found 0 deleted topic-partitions on source. Found 97 topic-partitions missing on target-stage. (org.apache.kafka.connect.mirror.MirrorSourceConnector:241)
[2022-12-29 17:10:07,976] INFO Found 0 new topic-partitions on source. Found 0 deleted topic-partitions on source. Found 97 topic-partitions missing on target-stage. (org.apache.kafka.connect.mirror.MirrorSourceConnector:241)
[2022-12-29 17:16:07,140] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 17:16:07,140] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 17:16:07,142] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Finished commitOffsets successfully in 2 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:586)
[2022-12-29 17:16:12,556] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 17:16:12,556] INFO WorkerSourceTask{id=MirrorHeartbeatConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 17:16:17,831] INFO WorkerSourceTask{id=MirrorSourceConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 17:16:17,832] INFO WorkerSourceTask{id=MirrorSourceConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 17:16:17,833] INFO WorkerSourceTask{id=MirrorCheckpointConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 17:16:17,833] INFO WorkerSourceTask{id=MirrorCheckpointConnector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
[2022-12-29 17:16:17,833] INFO WorkerSourceTask{id=MirrorSourceConnector-1} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:488)
[2022-12-29 17:16:17,833] INFO WorkerSourceTask{id=MirrorSourceConnector-1} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:505)
Thanks for using my SMT.
The unknown magic byte error will occur if someone produces a payload into your topic that is not in the Confluent wire format (i.e., it does not start with the 0x0 magic byte followed by the schema ID).
The schema-not-found error can only happen on the source registry, as the error says, since that is where the SMT copies schemas from.
After disabling the SMT schema transfer, message replication starts working again
Okay, but can you deserialize those events with a consumer successfully?
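One way to answer that is to read the raw bytes from the source topic and check whether each record actually follows the Confluent wire format (magic byte 0x0 plus a 4-byte schema ID). This is only a sketch under that assumption; the bootstrap server is taken from the question, while the group and topic names are placeholders.

import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class WireFormatCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-src.net:9091"); // source cluster from the question
        props.put("group.id", "wire-format-check");            // throwaway group, placeholder name
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-mm-topic")); // placeholder topic matching test-mm-.*
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<byte[], byte[]> r : records) {
                byte[] v = r.value();
                if (v == null || v.length < 5 || v[0] != 0x0) {
                    // Not Confluent wire format: no 0x0 magic byte, so the SMT would fail with "Unknown magic byte!"
                    System.out.printf("offset %d: NOT wire format (first byte %s)%n",
                            r.offset(),
                            (v == null || v.length == 0) ? "none" : String.format("0x%02x", v[0] & 0xFF));
                } else {
                    // Wire format: bytes 1-4 are the big-endian schema ID registered in the source registry
                    int schemaId = ByteBuffer.wrap(v, 1, 4).getInt();
                    System.out.printf("offset %d: wire format, schema id %d%n", r.offset(), schemaId);
                }
            }
        }
    }
}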
We have a Kafka Streams application (2.0) that communicates with Kafka brokers (1.1.0). The Streams application has been reprocessing the entire log for no discernible reason: the application hadn't been restarted, wasn't being rebalanced, and was just sitting around. In some cases it was processing messages; in others it was waiting to receive messages (having processed messages less than 6 hours earlier). We've done a fair amount of research and have ruled out one potential cause by setting offsets.retention.minutes to 1 week, the same amount of time as our message retention. Additionally, that wouldn't make sense as the root cause, because the consumer group offsets were reset while the application was actively processing messages.
There is nothing interesting in the broker logs around the time of the events:
[2019-02-21 09:02:20,009] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-02-21 09:12:20,009] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-02-21 09:12:51,084] INFO [ProducerStateManager partition=MY_TOPIC-1] Writing producer snapshot at offset 422924 (kafka.log.ProducerStateManager)
[2019-02-21 09:12:51,085] INFO [Log partition=MY_TOPIC-1, dir=/data1/kafka] Rolled new log segment at offset 422924 in 1 ms. (kafka.log.Log)
[2019-02-21 09:14:56,384] INFO [ProducerStateManager partition=MY_TOPIC-12] Writing producer snapshot at offset 295610 (kafka.log.ProducerStateManager)
[2019-02-21 09:14:56,384] INFO [Log partition=MY_TOPIC-12, dir=/data1/kafka] Rolled new log segment at offset 295610 in 1 ms. (kafka.log.Log)
[2019-02-21 09:15:19,365] INFO [ProducerStateManager partition=__transaction_state-8] Writing producer snapshot at offset 3939084 (kafka.log.ProducerStateManager)
[2019-02-21 09:15:19,365] INFO [Log partition=__transaction_state-8, dir=/data1/kafka] Rolled new log segment at offset 3939084 in 0 ms. (kafka.log.Log)
[2019-02-21 09:21:26,755] INFO [ProducerStateManager partition=MY_TOPIC-9] Writing producer snapshot at offset 319799 (kafka.log.ProducerStateManager)
[2019-02-21 09:21:26,755] INFO [Log partition=MY_TOPIC-9, dir=/data1/kafka] Rolled new log segment at offset 319799 in 1 ms. (kafka.log.Log)
[2019-02-21 09:22:20,009] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-02-21 09:23:31,283] INFO [ProducerStateManager partition=__consumer_offsets-17] Writing producer snapshot at offset 47345110 (kafka.log.ProducerStateManager)
[2019-02-21 09:23:31,297] INFO [Log partition=__consumer_offsets-17, dir=/data1/kafka] Rolled new log segment at offset 47345110 in 28 ms. (kafka.log.Log)
And absolutely nothing in the application logs (even with the log level set to DEBUG).
Any ideas about what might be causing this issue?
Upgrading the Kafka brokers to 2.0.0 resolved this issue.
I am using a 3-node Kafka Connect cluster to write data from a source to a Kafka topic, and from the topic to a destination. Everything works fine in distributed mode, but when one of the workers is stopped and then restarted, I get the messages below.
[2017-09-13 23:48:44,519] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:741)
[2017-09-13 23:48:44,519] INFO Current config state offset 5 is behind group assignment 20, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:785)
[2017-09-13 23:48:45,018] INFO Finished reading to end of log and updated config snapshot, new config log offset: 5 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:789)
[2017-09-13 23:48:45,018] INFO Current config state offset 5 does not match group assignment 20. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:765)
[2017-09-13 23:48:45,018] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1187)
[2017-09-13 23:48:45,018] INFO Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1219)
[2017-09-13 23:48:45,018] INFO (Re-)joining group connect-cluster (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-09-13 23:48:45,023] INFO Successfully joined group connect-cluster with generation 38 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:399)
[2017-09-13 23:48:45,023] INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-e51c1e8b-c95a-406b-8c56-2a0d4fc432f6', leaderUrl='http://10.10.10.10:8083/', offset=20, connectorIds=[], taskIds=[oracle_jdbc_sink_test-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1166)
[2017-09-13 23:48:45,023] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:741)
[2017-09-13 23:48:45,023] INFO Current config state offset 5 is behind group assignment 20, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:785)
[2017-09-13 23:48:45,535] INFO Finished reading to end of log and updated config snapshot, new config log offset: 5 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:789)
[2017-09-13 23:48:45,535] INFO Current config state offset 5 does not match group assignment 20. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:765)
[2017-09-13 23:48:45,535] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1187)
[2017-09-13 23:48:45,535] INFO Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1219)
[2017-09-13 23:48:45,535] INFO (Re-)joining group connect-cluster (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-09-13 23:48:45,540] INFO Successfully joined group connect-cluster with generation 38 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:399)
[2017-09-13 23:48:45,540] INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-e51c1e8b-c95a-406b-8c56-2a0d4fc432f6', leaderUrl='http://10.10.10.10:8083/', offset=20, connectorIds=[], taskIds=[oracle_jdbc_sink_test-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1166)
[2017-09-13 23:48:45,540] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:741)
[2017-09-13 23:48:45,540] INFO Current config state offset 5 is behind group assignment 20, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:785)
[2017-09-13 23:48:46,042] INFO Finished reading to end of log and updated config snapshot, new config log offset: 5 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:789)
[2017-09-13 23:48:46,042] INFO Current config state offset 5 does not match group assignment 20. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:765)
[2017-09-13 23:48:46,042] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1187)
[2017-09-13 23:48:46,042] INFO Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1219)
You can try resolving this issue by deleting and recreating the Connect config topic, or by changing the worker group.id.
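For example (a sketch only; the topic and group names below are made up, while the property keys are the standard Connect distributed-worker settings), pointing the workers at a freshly created config topic or giving the cluster a new group.id avoids the stale config offset conflicting with the group assignment:

# connect-distributed.properties (sketch)
group.id=connect-cluster-v2
# the config topic must be a single-partition, compacted topic
config.storage.topic=connect-configs-v2
offset.storage.topic=connect-offsets
status.storage.topic=connect-status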
I was running some tests on an old topic when I noticed some strange behaviour. Reading Kafka's log, I noticed this "Removed 8 expired offsets" message:
[GroupCoordinator 1001]: Stabilized group GROUP_NAME generation 37 (kafka.coordinator.GroupCoordinator)
[GroupCoordinator 1001]: Assignment received from leader for group GROUP_NAME for generation 37 (kafka.coordinator.GroupCoordinator)
Deleting segment 0 from log __consumer_offsets-31. (kafka.log.Log)
Deleting segment 0 from log __consumer_offsets-45. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-45/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-31/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-13. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-13/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-11. (kafka.log.Log)
Deleting segment 4885 from log __consumer_offsets-11. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-11/00000000000000004885.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-11/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-26. (kafka.log.Log)
Deleting segment 12406 from log __consumer_offsets-26. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-26/00000000000000012406.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-26/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-22. (kafka.log.Log)
Deleting segment 8643 from log __consumer_offsets-22. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-22/00000000000000008643.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-22/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-6. (kafka.log.Log)
Deleting segment 9757 from log __consumer_offsets-6. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-6/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-6/00000000000000009757.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-14. (kafka.log.Log)
Deleting segment 1 from log __consumer_offsets-14. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-14/00000000000000000001.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-14/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
[GroupCoordinator 1001]: Preparing to restabilize group GROUP_NAME with old generation 37 (kafka.coordinator.GroupCoordinator)
[GroupCoordinator 1001]: Stabilized group GROUP_NAME generation 38 (kafka.coordinator.GroupCoordinator)
[GroupCoordinator 1001]: Assignment received from leader for group GROUP_NAME for generation 38 (kafka.coordinator.GroupCoordinator)
[Group Metadata Manager on Broker 1001]: Removed 8 expired offsets in 1 milliseconds. (kafka.coordinator.GroupMetadataManager)
In fact, I have 2 questions:
How does this offset expiration work for a consumer group?
Can this offset expiration explain the behaviour where my consumer would not poll anything when it had auto.offset.reset = latest, but polled from the last committed offset when it had auto.offset.reset = earliest?
Update
Since Apache Kafka 2.1, offsets won't be deleted as long as the consumer group is active, regardless of whether the consumers commit offsets or not; i.e., the offsets.retention.minutes clock only starts to tick when the group becomes empty (in older releases, the clock started to tick as soon as the commit happened).
Cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
Original Answer
Kafka, by default, deletes committed offsets after a configurable period of time; see the parameter offsets.retention.minutes. I.e., if a consumer group is inactive (i.e., does not commit any offsets) for this amount of time, the offsets get deleted. Thus, even if the consumer is running, if it does not commit offsets for some partitions, those offsets are subject to offsets.retention.minutes.
If you start a consumer, the following happens:
look for a (valid) committed offset (for the consumer group)
if valid offset is found, resume from there
if no valid offset is found, reset offset according to auto.offset.reset parameter
Thus, if your offsets got deleted and auto.offset.reset = latest, your consumer will not poll anything until new data is added to the topic. If auto.offset.reset = earliest, it should consume the whole topic.
See these JIRAs for a discussion: https://issues.apache.org/jira/browse/KAFKA-3806 and https://issues.apache.org/jira/browse/KAFKA-4682
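To make the reset behaviour concrete, here is a minimal consumer sketch (the broker address and topic name are placeholders; the group name is taken from the question): auto.offset.reset is consulted only when no valid committed offset exists for the group, which is exactly the situation after the offsets have expired.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetResetDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "GROUP_NAME");              // group from the question
        // Only used when NO valid committed offset exists for this group:
        // "earliest" -> start from the beginning of the topic, "latest" -> wait for new records.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            System.out.printf("Fetched %d records%n", records.count());
        }
    }
}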
Check my answer here. You should not forget about file rolling; it impacts offset file removal.
I found this in my server.log:
[2016-03-29 18:24:59,349] INFO Scheduling log segment 3773408933 for log g17-4 for deletion. (kafka.log.Log)
[2016-03-29 18:24:59,349] INFO Scheduling log segment 3778380412 for log g17-4 for deletion. (kafka.log.Log)
[2016-03-29 18:24:59,403] WARN [ReplicaFetcherThread-3-4], Replica 2 for partition [g17,4] reset its fetch offset from 3501121050 to current leader 4's start offset 3501121050 (kafka.server.ReplicaFetcherThread)
[2016-03-29 18:24:59,403] ERROR [ReplicaFetcherThread-3-4], Current offset 3781428103 for partition [g17,4] out of range; reset offset to 3501121050 (kafka.server.ReplicaFetcherThread)
[2016-03-29 18:25:27,816] INFO Rolled new log segment for 'g17-12' in 1 ms. (kafka.log.Log)
[2016-03-29 18:25:35,548] INFO Rolled new log segment for 'g18-10' in 2 ms. (kafka.log.Log)
[2016-03-29 18:25:35,707] INFO Partition [g18,10] on broker 2: Shrinking ISR for partition [g18,10] from 2,4 to 2 (kafka.cluster.Partition)
[2016-03-29 18:25:36,042] INFO Partition [g18,10] on broker 2: Expanding ISR for partition [g18,10] from 2 to 2,4 (kafka.cluster.Partition)
The replica's offset is larger than the leader's, so the replica's data is deleted and then re-copied from the leader.
But while it is copying, the cluster is very slow; some Storm topologies fail because they get no response from Kafka.
How do I prevent this problem from occurring?
How do I slow down the replication rate while the replica is catching up?
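If the brokers are new enough to support replication quotas (KIP-73, Kafka 0.10.1+) and the incrementalAlterConfigs Admin API, catch-up traffic can be capped with the dynamic throttle configs. This is only a sketch; the bootstrap address, broker id, topic name, and rate are example values.

import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ThrottleReplication {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Cap leader/follower replication traffic on broker 2 to ~10 MB/s (example value).
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "2");
            // Mark which replicas of the affected topic the throttle applies to ("*" = all).
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "g17");

            Map<ConfigResource, Collection<AlterConfigOp>> ops = Map.of(
                broker, Arrays.asList(
                    new AlterConfigOp(new ConfigEntry("leader.replication.throttled.rate", "10485760"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("follower.replication.throttled.rate", "10485760"), AlterConfigOp.OpType.SET)),
                topic, Arrays.asList(
                    new AlterConfigOp(new ConfigEntry("leader.replication.throttled.replicas", "*"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("follower.replication.throttled.replicas", "*"), AlterConfigOp.OpType.SET)));

            admin.incrementalAlterConfigs(ops).all().get();
        }
    }
}

Remember to remove the throttle once the replica has caught up, since it also limits normal follower fetching.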