I have an Apache Kafka Streams application. I notice that it sometimes shuts down when a rebalance occurs, with no apparent reason for the shutdown. It doesn't even throw an exception.
Here are some of the relevant logs:
[2022-03-08 17:13:37,024] INFO [Consumer clientId=svc-stream-collector-StreamThread-1-consumer, groupId=svc-stream-collector] Adding newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2022-03-08 17:13:37,024] ERROR stream-thread [svc-stream-collector-StreamThread-1] A Kafka Streams client in this Kafka Streams application is requesting to shutdown the application (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,030] INFO stream-client [svc-stream-collector] State transition from REBALANCING to PENDING_ERROR (org.apache.kafka.streams.KafkaStreams)
old state:REBALANCING new state:PENDING_ERROR
[2022-03-08 17:13:37,031] INFO [Consumer clientId=svc-stream-collector-StreamThread-1-consumer, groupId=svc-stream-collector] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2022-03-08 17:13:37,032] INFO stream-thread [svc-stream-collector-StreamThread-1] Informed to shut down (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,032] INFO stream-thread [svc-stream-collector-StreamThread-1] State transition from PARTITIONS_REVOKED to PENDING_SHUTDOWN (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,067] INFO stream-thread [svc-stream-collector-StreamThread-1] Thread state is already PENDING_SHUTDOWN, skipping the run once call after poll request (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,067] WARN stream-thread [svc-stream-collector-StreamThread-1] Detected that shutdown was requested. All clients in this app will now begin to shutdown (org.apache.kafka.streams.processor.internals.StreamThread)
I suspect it's because there are no newly assigned partitions in the log line below:
[2022-03-08 17:13:37,024] INFO [Consumer clientId=svc-stream-collector-StreamThread-1-consumer, groupId=svc-stream-collector] Adding newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
However, I'm not exactly sure why this error occurs. Any help would be appreciated.
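For context, the "requesting to shutdown the application" line is logged when something asks Streams for an application-wide shutdown; one place that can come from is an uncaught exception handler returning SHUTDOWN_APPLICATION (Kafka Streams 2.8+). Below is a minimal, hypothetical sketch of that handler API, with placeholder broker and topic names; I'm not claiming this is what my application is configured to do.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

public class ShutdownHandlerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "svc-stream-collector");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");           // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                              // placeholder topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // The handler decides, per uncaught exception, whether to replace the failed thread,
        // shut down this client, or shut down the whole application. SHUTDOWN_APPLICATION is
        // the response that produces the "requesting to shutdown the application" log line.
        streams.setUncaughtExceptionHandler(exception -> {
            System.err.println("Stream thread died: " + exception);
            return StreamThreadExceptionResponse.REPLACE_THREAD;
        });

        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```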
Related
A JobManager and a TaskManager are running on a single VM. Kafka also runs on the same server.
I have 10 tasks; all of them read from different Kafka topics, process messages, and write back to Kafka.
Sometimes I find that my TaskManager is down and nothing is working. I tried to figure out the problem by checking the logs, and I believe it is a problem with the Kafka connection. (Or maybe a network problem? But everything is on a single server.)
What I want to ask is: what happens if I lose the connection to Kafka for a short period? Why are the tasks failing, and most importantly, why does the TaskManager crash?
Some logs:
2022-11-26 23:35:15,626 INFO org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-15] Disconnecting from node 0 due to request timeout.
2022-11-26 23:35:15,626 INFO org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-8] Disconnecting from node 0 due to request timeout.
2022-11-26 23:35:15,626 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=cpualgosgroup1-1, groupId=cpualgosgroup1] Disconnecting from node 0 due to request timeout.
2022-11-26 23:35:15,692 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=telefilter1-0, groupId=telefilter1] Cancelled in-flight FETCH request with correlation id 3630156 due to node 0 being disconnected (elapsed time since creation: 61648ms, elapsed time since send: 61648ms, request timeout: 30000ms)
2022-11-26 23:35:15,702 INFO org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-15] Cancelled in-flight PRODUCE request with correlation id 2159429 due to node 0 being disconnected (elapsed time since creation: 51069ms, elapsed time since send: 51069ms, request timeout: 30000ms)
2022-11-26 23:35:15,702 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=cpualgosgroup1-1, groupId=cpualgosgroup1] Cancelled in-flight FETCH request with correlation id 2344708 due to node 0 being disconnected (elapsed time since creation: 51184ms, elapsed time since send: 51184ms, request timeout: 30000ms)
2022-11-26 23:35:15,702 INFO org.apache.kafka.clients.NetworkClient [] - [Producer clientId=producer-15] Cancelled in-flight PRODUCE request with correlation id 2159430 due to node 0 being disconnected (elapsed time since creation: 51069ms, elapsed time since send: 51069ms, request timeout: 30000ms)
2022-11-26 23:35:15,842 WARN org.apache.kafka.clients.producer.internals.Sender [] - [Producer clientId=producer-15] Received invalid metadata error in produce request on partition tele.alerts.cpu-4 due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now
2022-11-26 23:35:15,842 WARN org.apache.kafka.clients.producer.internals.Sender [] - [Producer clientId=producer-8] Received invalid metadata error in produce request on partition tele.alerts.cpu-6 due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now
and then
2022-11-26 23:35:56,673 WARN org.apache.flink.runtime.taskmanager.Task [] - CPUTemperatureAnalysisAlgorithm -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_d0ae1ab03e621ff140fb6b0b0a2932f9_0_0) switched from RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException: Disconnect from JobManager responsible for 8d57994a59ab86ea9ee48076e80a7c7f.
at org.apache.flink.runtime.taskexecutor.TaskExecutor.disconnectJobManagerConnection(TaskExecutor.java:1702)
...
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.util.concurrent.TimeoutException: The heartbeat of JobManager with id 99d52303d7e24496ae661ddea2b6a372 timed out.
2022-11-26 23:35:56,682 INFO org.apache.flink.runtime.taskmanager.Task [] - Triggering cancellation of task code CPUTemperatureAnalysisAlgorithm -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_d0ae1ab03e621ff140fb6b0b0a2932f9_0_0).
2022-11-26 23:35:57,199 INFO org.apache.flink.runtime.taskmanager.Task [] - Attempting to fail task externally TemperatureAnalysis -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_15071110d0eea9f1c7f3d75503ff58eb_0_0).
2022-11-26 23:35:57,202 WARN org.apache.flink.runtime.taskmanager.Task [] - TemperatureAnalysis -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_15071110d0eea9f1c7f3d75503ff58eb_0_0) switched from RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException: Disconnect from JobManager responsible for 8d57994a59ab86ea9ee48076e80a7c7f.
at org.apache.flink.runtime.taskexecutor.TaskExecutor.disconnectJobManagerConnection(TaskExecutor.java:1702)
Why does the TaskExecutor lose its connection to the JobManager?
If I don't care about data loss, how should I configure the Kafka clients and Flink recovery? I just want the Kafka clients not to die. In particular, I don't want my tasks or TaskManagers to crash. If I lose the connection, is it possible to configure Flink to just wait? If we can't read, wait, and if we can't write back to Kafka, just wait?
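To make the "just wait" part concrete, the kind of thing I have in mind is a fixed-delay restart strategy, which keeps restarting failed tasks instead of failing the job for good. A rough sketch, assuming the strategy is set programmatically; the attempt count and delay are placeholders:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep restarting failed tasks instead of failing the job permanently:
        // up to 10 attempts, waiting 30 seconds between attempts (placeholder values).
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, Time.seconds(30)));

        // ... build the Kafka sources/sinks and the rest of the pipeline here ...

        env.execute("placeholder-job");
    }
}
```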
The heartbeat of JobManager with id 99d52303d7e24496ae661ddea2b6a372 timed out.
Sounds like the server is somewhat overloaded. But you could try increasing the heartbeat timeout.
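The relevant keys are heartbeat.timeout and heartbeat.interval in the Flink configuration (flink-conf.yaml on a standalone cluster). A minimal sketch of raising them when building the environment programmatically; the values below are placeholders, not recommendations:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.HeartbeatManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HeartbeatTimeoutSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Default heartbeat.timeout is 50 s; give an overloaded host more slack (placeholder values, in ms).
        conf.setLong(HeartbeatManagerOptions.HEARTBEAT_TIMEOUT, 120_000L);
        conf.setLong(HeartbeatManagerOptions.HEARTBEAT_INTERVAL, 10_000L);

        // This only takes effect for environments created with this Configuration;
        // on a real cluster the same keys go into flink-conf.yaml.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // ... build the job and call env.execute() as usual ...
    }
}
```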
I'm running Kafka Connect via this repo https://github.com/entechlog/kafka-examples/tree/master/kafka-connect-standalone, except that I have added extra configs for AWS MSK IAM authentication. I've also updated the .env file to use different variables, like the AWS MSK IAM jar file, AWS key/secret key credentials, and a few other minor things. Note that this repo runs in standalone mode, but I have updated the launch shell script to run in distributed mode: exec connect-distributed /etc/"${COMPONENT}"/"${COMPONENT}".properties. However, I have NOT created a file called kafka-connect.properties.template, because when I do, I get a whole host of errors like Missing required configuration "group.id" which has no default value., which makes no sense to me, as I can see it in the docker-compose.yml file.
My goal is to get data from a third party Kafka cluster into BigQuery.
When I run my docker-compose file, I see no errors and a few warnings, but nothing that stands out to me. I get a lot of warnings like this: [2021-12-14 01:57:37,917] WARN The configuration 'camelcase.default.dataset' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:380), and this dataset is required to send data to BigQuery. It doesn't make sense to me why settings like the dataset are not being used.
Here are the latest logs:
[2021-12-14 01:57:37,921] INFO Kafka version: 6.2.1-ce (org.apache.kafka.common.utils.AppInfoParser:119)
[2021-12-14 01:57:37,922] INFO Kafka commitId: 14770bfc4e973178 (org.apache.kafka.common.utils.AppInfoParser:120)
[2021-12-14 01:57:37,922] INFO Kafka startTimeMs: 1639447057921 (org.apache.kafka.common.utils.AppInfoParser:121)
[2021-12-14 01:57:39,332] INFO [Producer clientId=producer-3] Cluster ID: k2eIXxm_RkmWu2-R2d0N1Q (org.apache.kafka.clients.Metadata:279)
[2021-12-14 01:57:39,413] INFO [Consumer clientId=consumer-connect-kafka-connect-group-3, groupId=connect-kafka-connect-group] Cluster ID: k2eIXxm_RkmWu2-R2d0N1Q (org.apache.kafka.clients.Metadata:279)
[2021-12-14 01:57:39,428] INFO [Consumer clientId=consumer-connect-kafka-connect-group-3, groupId=connect-kafka-connect-group] Subscribed to partition(s): connect-configs-0 (org.apache.kafka.clients.consumer.KafkaConsumer:1123)
[2021-12-14 01:57:39,428] INFO [Consumer clientId=consumer-connect-kafka-connect-group-3, groupId=connect-kafka-connect-group] Seeking to EARLIEST offset of partition connect-configs-0 (org.apache.kafka.clients.consumer.internals.SubscriptionState:619)
[2021-12-14 01:57:40,727] INFO Finished reading KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:228)
[2021-12-14 01:57:40,728] INFO Started KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:230)
[2021-12-14 01:57:40,729] INFO Started KafkaConfigBackingStore (org.apache.kafka.connect.storage.KafkaConfigBackingStore:290)
[2021-12-14 01:57:40,729] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Herder started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:312)
[2021-12-14 01:57:45,059] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Cluster ID: k2eIXxm_RkmWu2-R2d0N1Q (org.apache.kafka.clients.Metadata:279)
[2021-12-14 01:57:45,089] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Discovered group coordinator <bootstrap server and port here> (id: 2147483643 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:848)
[2021-12-14 01:57:45,095] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221)
[2021-12-14 01:57:45,095] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:538)
[2021-12-14 01:57:45,957] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:538)
[2021-12-14 01:57:49,038] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Successfully joined group with generation Generation{generationId=1, memberId='connect-1-a4cc5355-60da-46c3-8228-bff2de664f2c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:594)
[2021-12-14 01:57:49,160] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Successfully synced group in generation Generation{generationId=1, memberId='connect-1-a4cc5355-60da-46c3-8228-bff2de664f2c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:758)
[2021-12-14 01:57:49,161] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Joined group at generation 1 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-a4cc5355-60da-46c3-8228-bff2de664f2c', leaderUrl='http://kafka-connect:8083/', offset=10, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1694)
[2021-12-14 01:57:49,162] WARN [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2021-12-14 01:57:49,162] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Current config state offset -1 is behind group assignment 10, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2021-12-14 01:57:49,359] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Finished reading to end of log and updated config snapshot, new config log offset: 10 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2021-12-14 01:57:49,359] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Starting connectors and tasks using config offset 10 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
[2021-12-14 01:57:49,359] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1272)
[2021-12-14 01:57:50,791] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Session key updated (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1582)
And also this:
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will be ignored.
WARNING: The (sub)resource method createConnector in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectors in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource contains empty path annotation.
And when I navigate to http://localhost:8083/connectors in my browser, I get an empty list [].
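For what it's worth, my understanding is that an empty list from GET /connectors just means the worker came up with no connectors configured; in distributed mode, connectors are created through the REST API rather than picked up from a properties file. A sketch of registering one (Java 11+; the worker URL, connector class, and config keys below are placeholders for whatever the BigQuery sink plugin actually expects):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config: replace the class and settings with the ones
        // documented for the BigQuery sink plugin you installed.
        String json = "{"
                + "\"name\": \"bigquery-sink\","
                + "\"config\": {"
                +   "\"connector.class\": \"com.wepay.kafka.connect.bigquery.BigQuerySinkConnector\","
                +   "\"topics\": \"my.source.topic\","
                +   "\"project\": \"my-gcp-project\","
                +   "\"defaultDataset\": \"my_dataset\""
                + "}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

        // 201 Created means the connector was accepted; GET /connectors should then list it.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```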
I'm using Kafka 1.1.0. A Kafka Streams application consistently throws this exception (albeit with different messages):
WARN o.a.k.s.p.i.RecordCollectorImpl#onCompletion:166 - task [0_0] Error sending record (key KEY value VALUE timestamp TIMESTAMP) to topic OUTPUT_TOPIC due to Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.; No more records will be sent and no more offsets will be recorded for this task.
WARN o.a.k.s.p.i.AssignedStreamsTasks#closeZombieTask:202 - stream-thread [90556797-3a33-4e35-9754-8a63200dc20e-StreamThread-1] stream task 0_0 got migrated to another thread already. Closing it as zombie.
WARN o.a.k.s.p.internals.StreamThread#runLoop:752 - stream-thread [90556797-3a33-4e35-9754-8a63200dc20e-StreamThread-1] Detected a task that got migrated to another thread. This implies that this thread missed a rebalance and dropped out of the consumer group. Trying to rejoin the consumer group now.
org.apache.kafka.streams.errors.TaskMigratedException: StreamsTask taskId: 0_0
ProcessorTopology:
KSTREAM-SOURCE-0000000000:
topics:
[INPUT_TOPIC]
children: [KSTREAM-PEEK-0000000001]
KSTREAM-PEEK-0000000001:
children: [KSTREAM-MAP-0000000002]
KSTREAM-MAP-0000000002:
children: [KSTREAM-SINK-0000000003]
KSTREAM-SINK-0000000003:
topic:
OUTPUT_TOPIC
Partitions [INPUT_TOPIC-0]
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:238)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:411)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:918)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:798)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720)
Caused by: org.apache.kafka.common.errors.ProducerFencedException: task [0_0] Abort sending since producer got fenced with a previous record
I'm not sure what is causing this exception. When I restart the application, it appears to successfully process a few records before failing with the same exception. Strangely enough, the records are successfully processed several times, even though the stream is set to exactly-once processing. Here is the stream configuration:
Properties streamProperties = new Properties();
streamProperties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
streamProperties.put(StreamsConfig.APPLICATION_ID_CONFIG, service.getName());
streamProperties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
//Should be DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG - but that field is private.
streamProperties.put("default.production.exception.handler", ErrorHandler.class);
streamProperties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokerUrl);
streamProperties.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
streamProperties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 10);
streamProperties.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
streamProperties.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);
Out of the three servers, only two generate relevant logs when restarting the streams application. Here are logs from the first server:
[2018-05-09 14:42:14,635] INFO [GroupCoordinator 1]: Member INPUT_TOPIC-09dd8ac8-2cd6-4dd1-b963-63ea804c8fcc-StreamThread-1-consumer-3fedb398-91fe-480a-b5ee-1b5879d0956c in group INPUT_TOPIC has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2018-05-09 14:42:14,636] INFO [GroupCoordinator 1]: Preparing to rebalance group INPUT_TOPIC with old generation 1 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2018-05-09 14:42:14,636] INFO [GroupCoordinator 1]: Group INPUT_TOPIC with generation 2 is now empty (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2018-05-09 14:42:15,848] INFO [GroupCoordinator 1]: Preparing to rebalance group INPUT_TOPIC with old generation 2 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2018-05-09 14:42:15,848] INFO [GroupCoordinator 1]: Stabilized group INPUT_TOPIC generation 3 (__consumer_offsets-29) (kafka.coordinator.group.GroupCoordinator)
[2018-05-09 14:42:15,871] INFO [GroupCoordinator 1]: Assignment received from leader for group INPUT_TOPIC for generation 3 (kafka.coordinator.group.GroupCoordinator)
And from the second server:
[2018-05-09 14:42:16,228] INFO [TransactionCoordinator id=0] Initialized transactionalId INPUT_TOPIC-0_0 with producerId 2010 and producer epoch 37 on partition __transaction_state-37 (kafka.coordinator.transaction.TransactionCoordinator)
[2018-05-09 14:44:22,121] INFO [TransactionCoordinator id=0] Completed rollback ongoing transaction of transactionalId: INPUT_TOPIC-0_0 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2018-05-09 14:44:42,263] ERROR [ReplicaManager broker=0] Error processing append operation on partition OUTPUT_TOPIC-0 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 37 (request epoch), 38 (server epoch)
It appears that the first server sees that the consumer has failed and removes it from the consumer group before it is registered with the second server. Any ideas what could be causing the consumer to fail? Or any ideas for handling this failure gracefully? It's possible that it is this bug; does anyone know of a possible workaround?
I'm not sure what caused the problem, but reducing max.poll.records to 1 fixed it.
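In config terms, that change is just the consumer-level max.poll.records passed through the streamProperties block shown in the question. The transaction timeout line below is a separate, optional knob mentioned here as an assumption, not something that was part of the fix:

```java
// The fix described above: cap each poll at a single record.
streamProperties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);

// Optional, assumed-related knob (not part of the original fix): give the transactional
// producer more time before the broker rolls back the transaction and fences the producer.
// 300000 ms is a placeholder value; ProducerConfig is org.apache.kafka.clients.producer.ProducerConfig.
streamProperties.put(StreamsConfig.producerPrefix(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG), 300000);
```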
What's the default behavior of a Kafka (version 0.10) consumer if it tries to rejoin the consumer group?
I am using a single consumer for a consumer group, but it seems like it got stuck rejoining.
Every 10 minutes it prints the following lines in the consumer logs:
2016-08-11 13:54:53,803 INFO o.a.k.c.c.i.ConsumerCoordinator [pool-5-thread-1] ****Revoking previously assigned partitions**** [] for group image-consumer-group
2016-08-11 13:54:53,803 INFO o.a.k.c.c.i.AbstractCoordinator [pool-5-thread-1] (Re-)joining group image-consumer-group
2016-08-11 14:04:53,992 INFO o.a.k.c.c.i.AbstractCoordinator [pool-5-thread-1] Marking the coordinator dead for group image-consumer-group
2016-08-11 14:04:54,095 INFO o.a.k.c.c.i.AbstractCoordinator [pool-5-thread-1] Discovered coordinator for group image-consumer-group.
2016-08-11 14:04:54,096 INFO o.a.k.c.c.i.AbstractCoordinator [pool-5-thread-1] (Re-)joining group image-consumer-group
Restarting the consumer application does not help.
If you're going to have only one consumer instance in a group, then use the consumer with a manual assignment strategy (i.e., as a simple consumer).
Manual topic assignment does not use the consumer's group management functionality, so heartbeats are not required.
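A rough sketch of what that looks like with the new consumer API; the broker address, topic, and partition below are placeholders. assign() replaces subscribe(), so there is no join/rebalance cycle at all:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignmentConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "image-consumer-group");      // still used for offset commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // assign() bypasses group management: no JoinGroup, no rebalances, no heartbeats needed.
            consumer.assign(Collections.singletonList(new TopicPartition("images", 0)));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```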
I would appreciate your help on this.
I am building an Apache Kafka consumer to subscribe to another, already running Kafka cluster. Now, my problem is that when my producer pushes messages to the server, my consumer does not receive them, and I get the info below printed in my logs:
13/08/30 18:00:58 INFO producer.SyncProducer: Connected to xx.xx.xx.xx:6667:false for producing
13/08/30 18:00:58 INFO producer.SyncProducer: Disconnecting from xx.xx.xx.xx:6667:false
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] Stopping leader finder thread
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] Stopping all fetchers
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] All connections stopped
I am not sure if I am missing any important configuration here. However, I can see some messages coming from my server using Wireshark, but they are not being consumed by my consumer.
My code is an exact replica of the sample consumer example:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
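For background, the high-level (ZooKeeper-based) consumer from that example boils down to roughly the following; the ZooKeeper address, group id, and topic name here are placeholders. The detail worth double-checking is that zookeeper.connect must point at ZooKeeper (port 2181), not at the broker's 6667 port:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class SimpleHighLevelConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "xx.xx.xx.xx:2181");        // ZooKeeper, not the broker port
        props.put("group.id", "my-consumer-group");                // placeholder group id
        props.put("zookeeper.session.timeout.ms", "6000");
        props.put("zookeeper.sync.time.ms", "2000");
        props.put("auto.commit.interval.ms", "1000");

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("my-topic", 1);                          // placeholder topic, one stream

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        // Blocks and prints each message as it arrives on the single stream.
        for (MessageAndMetadata<byte[], byte[]> msg : streams.get("my-topic").get(0)) {
            System.out.println(new String(msg.message()));
        }
    }
}
```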
UPDATE:
[2013-09-03 00:57:30,146] INFO Starting ZkClient event thread.
(org.I0Itec.zkclient.ZkEventThread)
[2013-09-03 00:57:30,146] INFO Opening socket connection to server /xx.xx.xx.xx:2181 (org.apache.zookeeper.ClientCnxn)
[2013-09-03 00:57:30,235] INFO Connected to xx.xx.xx:6667 for producing (kafka.producer.SyncProducer)
[2013-09-03 00:57:30,299] INFO Socket connection established to 10.224.62.212/10.224.62.212:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2013-09-03 00:57:30,399] INFO Disconnecting from xx.xx.xx.net:6667 (kafka.producer.SyncProducer)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] Stopping leader finder thread (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] Stopping all fetchers (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] All connections stopped (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx-1378195030443-cce6fc51], Cleared all relevant queues for this fetcher (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx.-1378195030443-cce6fc51], Cleared the data chunks in all the consumer message iterators (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], Committing all offsets after clearing the fetcher queues (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] ERROR [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], zk client is null. Cannot commit offsets (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], Releasing partition ownership (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], exception during rebalance (kafka.consumer.ZookeeperConsumerConnector)
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:185)
at scala.None$.get(Option.scala:183)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance$2.apply(ZookeeperConsumerConnector.scala:434)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance$2.apply(ZookeeperConsumerConnector.scala:429)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:429)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374)
at scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282)
at scala.collection.immutable.Range$$anon$2.foreach$mVc$sp(Range.scala:265)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369)
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:681)
at kafka.consumer.ZookeeperConsumerConnector$WildcardStreamsHandler.<init>(ZookeeperConsumerConnector.scala:715)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:140)
at kafka.consumer.ConsoleConsumer$.main(ConsoleConsumer.scala:196)
at kafka.consumer.ConsoleConsumer.main(ConsoleConsumer.scala)
Can you please provide your producer code sample?
Do you have the latest 0.8 version checked out? It appears that there has been a known issue with a consumer fetcher deadlock, which has been patched and fixed in the current version.
You can also try using the console consumer script to consume messages, to make sure your producer is working fine.
If possible, post some more logs and a code snippet; that should help with debugging further.
(It seems I need more reputation to make a comment, so I had to answer instead.)