JBoss 4.2.2 nodes start to cluster, then suspect each other

I have a website running with JBoss 4.2.2 on an existing Red Hat server. I'm setting up a second server so as to have a clustered pair (which will then be load-balanced). However, I can't get them to cluster successfully.
The existing server starts up JBoss with:
run.sh -c default -b 0.0.0.0
(I know the 'default' configuration doesn't support clustering out of the box - I'm using a modified version of it which includes clustering support.)
When I start the second JBoss instance with the same command, it forms its own cluster without noticing the first. Both use the same partition name and multicast address and port.
I tried the McastReceiverTest and McastSenderTest programs to check that the machines could communicate over multicast; they could.
I then noticed the info at http://docs.jboss.org/jbossas/docs/Clustering_Guide/beta422/html/ch07s07s07.html, saying that JGroups cannot bind to all interfaces, and instead binds to the default interface; so presumably it was binding to 127.0.0.1, and thereby not getting the messages through. So instead I set the instances to tell JGroups to use the internal IPs:
run.sh -c default -b 0.0.0.0 -Djgroups.bind_addr=10.51.1.131
run.sh -c default -b 0.0.0.0 -Djgroups.bind_addr=10.51.1.141
(.131 is the existing server, .141 is the new server).
The nodes now notice each other and form a cluster - at first. However, while trying to deploy the .ear, the server log says this:
2010-08-07 22:26:39,321 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:46294 (own address=10.51.1.141:47629)
2010-08-07 22:26:45,412 WARN [org.jgroups.protocols.FD] I was suspected by 10.51.1.131:48733; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2010-08-07 22:26:49,324 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:46294 (own address=10.51.1.141:47629)
2010-08-07 22:26:49,324 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.51.1.131:46294 (number=0)
2010-08-07 22:26:49,529 DEBUG [org.jgroups.protocols.MERGE2] initial_mbrs=[[own_addr=10.51.1.141:60365, coord_addr=10.51.1.141:60365, is_server=true]]
2010-08-07 22:26:52,092 WARN [org.jboss.cache.TreeCache] replication failure with method_call optimisticPrepare; id:18; Args: ( arg[0] = GlobalTransaction:<10.51.1.131:46294>:5421085 ...) exception org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/Yudu_ear,Yudu-ejb_jar,Yudu-ejbPU/com/yudu/ejb/entity, caller=GlobalTransaction:<10.51.1.131:46294>:5421085, lock=read owners=[GlobalTransaction:<10.51.1.131:46294>:5421081] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=1, waitingUpgrader=0)
...and the .ear fails to deploy.
If I change CacheMode in ejb3-entity-cache-service.xml from REPL_SYNC to LOCAL, the .ear deploys correctly, although of course the entity cache replication then doesn't happen. However, the log still shows interesting signs of the same problem.
It looks like this:
1. First the new node finds the existing one and forms a cluster.
2. Then the FD checks fail, and after a set number of failures the new node splits off from the cluster and forms its own cluster of one.
3. Then it finds the existing node again and re-clusters, and this time the FD checks work.
Relevant bits of the log file:
2010-08-07 23:47:07,423 INFO [org.jgroups.protocols.UDP] socket information: local_addr=10.51.1.141:35666, mcast_addr=228.1.2.3:45566, bind_addr=/10.51.1.141, ttl=2 sock: bound to 10.51.1.141:35666, receive buffer size=131071, send buffer size=131071 mcast_recv_sock: bound to 0.0.0.0:45566, send buffer size=131071, receive buffer size=131071 mcast_send_sock: bound to 10.51.1.141:59196, send buffer size=131071, receive buffer size=131071
2010-08-07 23:47:07,431 DEBUG [org.jgroups.protocols.UDP] created unicast receiver thread
2010-08-07 23:47:09,445 DEBUG [org.jgroups.protocols.pbcast.GMS] initial_mbrs are [[own_addr=10.51.1.131:48888, coord_addr=10.51.1.131:48888, is_server=true]]
2010-08-07 23:47:09,446 DEBUG [org.jgroups.protocols.pbcast.GMS] election results: {10.51.1.131:48888=1}
2010-08-07 23:47:09,446 DEBUG [org.jgroups.protocols.pbcast.GMS] sending handleJoin(10.51.1.141:35666) to 10.51.1.131:48888
2010-08-07 23:47:09,751 DEBUG [org.jgroups.protocols.pbcast.GMS] [10.51.1.141:35666]: JoinRsp=[10.51.1.131:48888|61] [10.51.1.131:48888, 10.51.1.141:35666] [size=2]
2010-08-07 23:47:09,752 DEBUG [org.jgroups.protocols.pbcast.GMS] new_view=[10.51.1.131:48888|61] [10.51.1.131:48888, 10.51.1.141:35666]
...
2010-08-07 23:47:10,047 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Number of cluster members: 2
2010-08-07 23:47:10,047 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Other members: 1
...
2010-08-07 23:47:20,034 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:48888 (own address=10.51.1.141:35666)
2010-08-07 23:47:30,037 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:48888 (own address=10.51.1.141:35666)
2010-08-07 23:47:30,038 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.51.1.131:48888 (number=0)
2010-08-07 23:47:40,040 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:48888 (own address=10.51.1.141:35666)
2010-08-07 23:47:40,040 DEBUG [org.jgroups.protocols.FD] heartbeat missing from 10.51.1.131:48888 (number=1)
...
2010-08-07 23:48:19,758 WARN [org.jgroups.protocols.FD] I was suspected by 10.51.1.131:48888; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2010-08-07 23:48:20,054 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:48888 (own address=10.51.1.141:35666)
2010-08-07 23:48:20,055 DEBUG [org.jgroups.protocols.FD] [10.51.1.141:35666]: received no heartbeat ack from 10.51.1.131:48888 for 6 times (60000 milliseconds), suspecting it
2010-08-07 23:48:20,058 DEBUG [org.jgroups.protocols.FD] broadcasting SUSPECT message [suspected_mbrs=[10.51.1.131:48888]] to group
...
2010-08-07 23:48:21,691 DEBUG [org.jgroups.protocols.pbcast.NAKACK] removing 10.51.1.131:48888 from received_msgs (not member anymore)
2010-08-07 23:48:21,691 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (127.0.0.1:1099) received membershipChanged event:
2010-08-07 23:48:21,691 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 0 ([])
2010-08-07 23:48:21,691 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
2010-08-07 23:48:21,691 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 1 ([127.0.0.1:1099])
...
2010-08-07 23:49:59,793 WARN [org.jgroups.protocols.FD] I was suspected by 10.51.1.131:48888; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2010-08-07 23:50:09,796 WARN [org.jgroups.protocols.FD] I was suspected by 10.51.1.131:48888; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2010-08-07 23:50:19,144 DEBUG [org.jgroups.protocols.FD] Recevied Ack. is invalid (was from: 10.51.1.131:48888),
2010-08-07 23:50:19,144 DEBUG [org.jgroups.protocols.FD] Recevied Ack. is invalid (was from: 10.51.1.131:48888),
...
2010-08-07 23:50:21,791 DEBUG [org.jgroups.protocols.pbcast.GMS] new=[10.51.1.131:48902], suspected=[], leaving=[], new view: [10.51.1.141:35666|63] [10.51.1.141:35666, 10.51.1.131:48902]
...
2010-08-07 23:50:21,792 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[10.51.1.141:35666|63] [10.51.1.141:35666, 10.51.1.131:48902]
2010-08-07 23:50:21,792 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=10.51.1.141:35666] view is [10.51.1.141:35666|63] [10.51.1.141:35666, 10.51.1.131:48902]
2010-08-07 23:50:21,822 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 63, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]
2010-08-07 23:50:21,822 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] membership changed from 1 to 2
...
2010-08-07 23:50:31,825 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 10.51.1.131:48902 (own address=10.51.1.141:35666)
2010-08-07 23:50:31,832 DEBUG [org.jgroups.protocols.FD] received ack from 10.51.1.131:48902
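The FD entries above follow a regular rhythm: an are-you-alive message about every 10 seconds, a "heartbeat missing" counter that increments on each silent interval, and a suspicion once the log reports 6 misses (60000 milliseconds). The sketch below is a simplified model of that counting, written only to make the arithmetic explicit. It is not JGroups' FD code; `timeoutMs` and `maxTries` are assumed names that loosely mirror FD's `timeout` and `max_tries` settings, with values read off this log rather than from the actual configuration.

```java
// Simplified model of a heartbeat-based failure detector, for illustration only.
// It is NOT JGroups' FD implementation; it only reproduces the arithmetic seen in the log.
final class HeartbeatModel {
    private final long timeoutMs; // interval between are-you-alive messages (assumed 10000 from the log)
    private final int maxTries;   // missed acks tolerated before suspecting (assumed 6 from the log)
    private int missed = 0;

    HeartbeatModel(long timeoutMs, int maxTries) {
        this.timeoutMs = timeoutMs;
        this.maxTries = maxTries;
    }

    /** Called once per interval; returns true if the peer should now be suspected. */
    boolean tick(boolean ackReceived) {
        if (ackReceived) {
            missed = 0; // any ack resets the counter
            return false;
        }
        missed++;
        return missed >= maxTries;
    }

    /** Time from the last good ack until suspicion, if no further acks arrive. */
    long timeToSuspectMs() {
        return timeoutMs * maxTries;
    }

    public static void main(String[] args) {
        HeartbeatModel fd = new HeartbeatModel(10_000, 6);
        int intervals = 0;
        boolean suspected = false;
        while (!suspected) {
            intervals++;
            suspected = fd.tick(false); // the peer never answers
        }
        System.out.println("suspected after " + intervals + " intervals, ~" + fd.timeToSuspectMs() + " ms");
    }
}
```

Read against this model, the log suggests that the unicast heartbeat acks from 10.51.1.131:48888 never reached this node for a full minute, while that node's SUSPECT messages, which are broadcast to the group, did arrive. That pattern is consistent with multicast working while unicast replies go astray, though the log alone does not prove it.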
But I can't understand why the FD checks fail the first time round. And although the node does eventually seem to cluster with the other one, the initial failure appears to be enough to break the deployment when it tries to share entity state, so the cluster never ends up working in a useful way.
If anyone can shed light on this I'll be hugely grateful!

I think that before you move on to JBoss 4.2.3 (which is probably a good place to be eventually) or build a new configuration (I agree with @skaffman that pruning is easier than adding), you might want to try the following:
On 10.51.1.131:
run.sh -c default -b 10.51.1.131 -Djgroups.bind_addr=10.51.1.131
On 10.51.1.141:
run.sh -c default -b 10.51.1.141 -Djgroups.bind_addr=10.51.1.141
According to all the documentation I can find on this, the -b parameter is the server instance's bind address, and having it differ from the JGroups bind address might be causing JGroups some significant confusion. I had a four-server clustered environment working successfully for over three years, and matching addresses were part of the recommended configuration from RH/JBoss (we had a support contract, and got help from Bela Ban).
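Before restarting with matching -b and -Djgroups.bind_addr values, it may be worth confirming that the address you pass really belongs to a local interface on that machine. A minimal sketch using only the JDK's java.net.NetworkInterface; the class and method names are hypothetical, and the default address is the new server's IP from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;

// Hypothetical helper: checks that an address is assigned to one of this host's interfaces,
// which is what -b and -Djgroups.bind_addr should both point at.
final class BindAddressCheck {

    /** True if the literal IP address is configured on a local network interface. */
    static boolean isLocalInterfaceAddress(String ip) {
        try {
            InetAddress addr = InetAddress.getByName(ip); // a literal IP is parsed, not resolved via DNS
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (IOException e) { // UnknownHostException and SocketException are both IOExceptions
            throw new UncheckedIOException(e);
        }
    }

    /** Builds run.sh arguments with the server and JGroups bind addresses kept identical. */
    static String[] runShArgs(String config, String bindAddress) {
        return new String[] {
            "-c", config,
            "-b", bindAddress,
            "-Djgroups.bind_addr=" + bindAddress
        };
    }

    public static void main(String[] args) {
        String ip = args.length > 0 ? args[0] : "10.51.1.141";
        System.out.println(ip + " is a local interface address: " + isLocalInterfaceAddress(ip));
        System.out.println("run.sh " + String.join(" ", runShArgs("default", ip)));
    }
}
```

The question's own log hints at the same issue: with -b 0.0.0.0, the partition lists both members as 127.0.0.1:1099 ("New cluster view for partition DefaultPartition ... [127.0.0.1:1099, 127.0.0.1:1099]"), so the two nodes appear to identify themselves by the same loopback address. That is an observation from the log, not a confirmed diagnosis.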

Related

Zookeeper error: Exception causing close of session 0x0 due to java.io.IOException: Len error

We have well-configured ZooKeeper and Kafka cluster nodes. A manual test that creates a topic and sends a message to it passes. But when I run a test from our test equipment to create a topic over the MQTT protocol, I receive:
Exception causing close of session 0x0 due to java.io.IOException: Len error 271056900
[myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1008] - Closed socket connection for client /192.18.0.1:15659 (no session established for client).
Can someone give me a hint on how to solve this issue?
Looks like you are exceeding your jute.maxbuffer. Try increasing it.
If you are using docker-compose, this worked for me:
environment:
  KAFKA_OPTS: -Djute.maxbuffer=500000000
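Before raising the limit, it may be worth looking at what the rejected length actually is. ZooKeeper reads the first four bytes of a client request as a big-endian length and reports "Len error" when that value exceeds jute.maxbuffer, so the bytes behind 271056900 can be recovered directly. A small sketch; the class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Decodes the 4-byte length prefix that ZooKeeper rejected, to see which bytes the client sent.
final class LenErrorDecoder {

    /** Returns the four big-endian bytes that ZooKeeper interpreted as a length. */
    static byte[] lengthBytes(int len) {
        return ByteBuffer.allocate(4).putInt(len).array(); // ByteBuffer is big-endian by default
    }

    /** Formats bytes as space-separated lowercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(lengthBytes(271056900))); // 10 28 00 04
    }
}
```

Read as MQTT, 0x10 is the CONNECT packet type, 0x28 a remaining length of 40 bytes, and 00 04 the length of the protocol name "MQTT". If that reading is right, the test equipment is sending MQTT directly to the ZooKeeper client port (2181), and raising jute.maxbuffer would only let ZooKeeper try to read a bogus 271 MB request; checking which port the equipment connects to may be more productive. This is an inference from the number alone, not something confirmed in the thread.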

Zookeeper unable to talk to new Kafka broker

In an attempt to reduce the storage on my AWS instance, I decided to launch a new, smaller instance and set up Kafka again from scratch using the Ansible playbook we had from before. I then terminated the old, larger instance and moved the IP address that it and the other brokers had been using onto my new instance.
When tailing my ZooKeeper logs, however, I see this error:
2018-04-13 14:17:34,884 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#810] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:153)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:211)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2018-04-13 14:17:34,885 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#813] - Interrupting SendWorker
2018-04-13 14:17:34,884 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
I double-checked that all 3 Kafka broker IP addresses are correctly listed in these locations, and restarted all their services to be safe:
/etc/hosts
/etc/kafka/config/server.properties
/etc/zookeeper/conf/zoo.cfg
/etc/filebeat/filebeat.yml
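The files above only show which IPs each node expects; the QuorumCnxManager errors are about the peer connections themselves. A minimal sketch that checks, from one node, whether a plain TCP connection can be opened to each peer's quorum and election ports. The hosts are placeholders and the ports 2888/3888 are the conventional zoo.cfg values, not ones taken from the question; substitute the server.N entries from your own zoo.cfg. The class name is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: tries a plain TCP connect to host:port within a timeout.
final class PortCheck {

    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, unreachable, or unresolvable host
        }
    }

    public static void main(String[] args) {
        // Placeholder hosts: replace with the hosts from your server.N lines.
        String[] peers = args.length > 0 ? args : new String[] {"10.0.0.1", "10.0.0.2", "10.0.0.3"};
        int[] ports = {2888, 3888}; // assumed quorum and election ports
        for (String host : peers) {
            for (int port : ports) {
                System.out.printf("%s:%d reachable=%b%n", host, port, canConnect(host, port, 2000));
            }
        }
    }
}
```

Run it on each node in turn. Only the current leader listens on the quorum port (the first port in server.N), so a refused connection there to a follower is expected; the election port (the second one) should be reachable on every peer.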

Eclipse console suddenly showing INFO output

I'm using Apache HttpClient and have started seeing some INFO output on the Eclipse console:
0 [main] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
3 [main] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
3861 [pool-1-thread-25] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
3861 [pool-1-thread-25] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request
3913 [pool-1-thread-16] INFO org.apache.commons.httpclient.HttpMethodBase - Response content length is not known
To my knowledge, nothing has changed. How can I get rid of it?
It's probably your logging setup. HttpClient depends on commons-logging, which automatically picks up a logging implementation from your classpath (log4j if it is present, otherwise java.util.logging), and that implementation writes to the console by default.
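If commons-logging has resolved to java.util.logging, raising the level of the org.apache.commons.httpclient logger hides these INFO lines. A minimal sketch; the class name is hypothetical. The "0 [main] INFO ..." format in the question looks more like log4j's basic console layout, in which case the equivalent would be a log4j 1.x entry such as log4j.logger.org.apache.commons.httpclient=WARN in log4j.properties.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Raises the java.util.logging level for all HttpClient 3.x loggers (org.apache.commons.httpclient.*).
final class QuietHttpClient {
    // Keep a strong reference: JUL holds loggers weakly, and a collected logger loses its level.
    private static final Logger HTTPCLIENT_LOGGER = Logger.getLogger("org.apache.commons.httpclient");

    static void install() {
        // Child loggers such as ...HttpMethodDirector inherit this unless they set their own level.
        HTTPCLIENT_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        install();
        Logger director = Logger.getLogger("org.apache.commons.httpclient.HttpMethodDirector");
        System.out.println("INFO enabled: " + director.isLoggable(Level.INFO)); // false after install()
    }
}
```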

Leader election with Curator and Zookeeper

I am running 3 instances of ZooKeeper and the config is this:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper1
clientPort=2181
maxClientCnxns=1000
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
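With all three servers on 127.0.0.1, each instance needs its own peer and election ports (as in the server.N lines above), plus its own clientPort, dataDir and myid file; the connection string further down implies client ports 2181, 2182 and 2183. A small sketch that loads such a zoo.cfg with java.util.Properties and checks that no host:port pair is reused across the server.N entries; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical checker for the server.N=host:peerPort:electionPort lines of a zoo.cfg.
final class ZooCfgCheck {

    static Properties load(String cfgText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfgText)); // only the first '=' separates key and value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** True if every host:port pair among the server.N entries is unique (hosts compared as written). */
    static boolean quorumPortsDistinct(Properties cfg) {
        Set<String> seen = new HashSet<>();
        for (String key : cfg.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue;
            }
            // 3.5-style entries may append ";clientPort"; drop it before splitting.
            String address = cfg.getProperty(key).trim().split(";")[0];
            String[] parts = address.split(":"); // host, peerPort, electionPort[, role]
            if (parts.length < 3) {
                throw new IllegalArgumentException("unexpected entry " + key + "=" + address);
            }
            String host = parts[0];
            if (!seen.add(host + ":" + parts[1]) || !seen.add(host + ":" + parts[2])) {
                return false;
            }
        }
        return true;
    }
}
```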
I am using the leader election example code given here:
https://git-wip-us.apache.org/repos/asf?p=curator.git;a=tree;f=curator-examples/src/main/java/leader;h=73b547eadb98995c0ccbd06a5b76d0741ffef263;hb=HEAD
The code runs fine with TestingServer, but when I change the connection string to "127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183", I get these exceptions:
[main-SendThread(127.0.0.1:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2183. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(127.0.0.1:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /127.0.0.1:56111, server: 127.0.0.1/127.0.0.1:2183
[main-SendThread(127.0.0.1:2183)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 127.0.0.1/127.0.0.1:2183, sessionid = 0x3521552283c0000, negotiated timeout = 40000
[main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
[main-SendThread(127.0.0.1:2183)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x3521552283c0000, likely server has closed socket, closing socket connection and attempting reconnect
[main-EventThread] INFO org.apache.curator.framework.imps.EnsembleTracker - New config event received: null
[main-EventThread] ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.lang.NullPointerException
at java.io.ByteArrayInputStream.<init>(ByteArrayInputStream.java:106)
at org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:163)
at org.apache.curator.framework.imps.EnsembleTracker.access$200(EnsembleTracker.java:48)
at org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:134)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:829)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:611)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:151)
at org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:210)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:528)
[main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
What could be the issue?
I am hitting the same issue. I think it might be related to ClientCnxn in ZooKeeper 3.5.1: even after going back to Curator 2.6.0, I still see the same stack trace. A GET_CONFIG event is sent without the event data.
My stack trace looks like this:
org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up
! java.lang.NullPointerException: null
! at java.io.ByteArrayInputStream.<init>(ByteArrayInputStream.java:106)
! at org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:163)
! at org.apache.curator.framework.imps.EnsembleTracker.access$200(EnsembleTracker.java:48)
! at org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:134)
! at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:829)
! at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:611)
! at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:151)
! at org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:210)
! at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
! at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:528)
If you use ZooKeeper 3.5.1, curator-recipes 3.2.1 or later fixes this issue.

Apache Kafka - Consumer not receiving messages from producer

I would appreciate your help on this.
I am building an Apache Kafka consumer to subscribe to another, already running Kafka instance. My problem is that when my producer pushes messages to the server, my consumer does not receive them, and I get the following in my logs:
13/08/30 18:00:58 INFO producer.SyncProducer: Connected to xx.xx.xx.xx:6667:false for producing
13/08/30 18:00:58 INFO producer.SyncProducer: Disconnecting from xx.xx.xx.xx:6667:false
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] Stopping leader finder thread
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] Stopping all fetchers
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] All connections stopped
I am not sure whether I am missing some important configuration here. I can see messages coming from my server in Wireshark, but they are not being consumed by my consumer.
My code is an exact replica of the sample consumer example:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
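For comparison, the configuration part of that Consumer Group Example amounts to a few properties handed to the 0.8 high-level consumer. A stdlib-only sketch of that Properties object (wrapping it in kafka.consumer.ConsumerConfig needs the Kafka jar); the class name is hypothetical, the timeout values are illustrative, and auto.offset.reset is an addition not taken from the question.

```java
import java.util.Properties;

// Sketch of the Properties for the Kafka 0.8 high-level consumer, as in the Consumer Group Example.
// Only the Properties are built here; kafka.consumer.ConsumerConfig itself requires the Kafka jar.
final class ConsumerProps {

    static Properties create(String zookeeperConnect, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zookeeperConnect); // ZooKeeper host:port, not the broker port
        props.put("group.id", groupId);
        props.put("zookeeper.session.timeout.ms", "400");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        // Addition (assumption): a new group starts at the latest offset by default ("largest"),
        // so messages produced before it joined are skipped; "smallest" reads from the start.
        props.put("auto.offset.reset", "smallest");
        return props;
    }
}
```

With the 0.8 default of "largest", a new consumer group only sees messages produced after it registers, which can look exactly like "the consumer receives nothing"; "smallest" is the programmatic counterpart of the console consumer's --from-beginning flag. Note also that zookeeper.connect must name the ZooKeeper address (port 2181 in the logs below), not the broker's 6667.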
UPDATE:
[2013-09-03 00:57:30,146] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2013-09-03 00:57:30,146] INFO Opening socket connection to server /xx.xx.xx.xx:2181 (org.apache.zookeeper.ClientCnxn)
[2013-09-03 00:57:30,235] INFO Connected to xx.xx.xx:6667 for producing (kafka.producer.SyncProducer)
[2013-09-03 00:57:30,299] INFO Socket connection established to 10.224.62.212/10.224.62.212:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2013-09-03 00:57:30,399] INFO Disconnecting from xx.xx.xx.net:6667 (kafka.producer.SyncProducer)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] Stopping leader finder thread (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] Stopping all fetchers (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] All connections stopped (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx-1378195030443-cce6fc51], Cleared all relevant queues for this fetcher (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx.-1378195030443-cce6fc51], Cleared the data chunks in all the consumer message iterators (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], Committing all offsets after clearing the fetcher queues (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] ERROR [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], zk client is null. Cannot commit offsets (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], Releasing partition ownership (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], exception during rebalance (kafka.consumer.ZookeeperConsumerConnector)
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:185)
at scala.None$.get(Option.scala:183)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance$2.apply(ZookeeperConsumerConnector.scala:434)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance$2.apply(ZookeeperConsumerConnector.scala:429)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:429)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374)
at scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282)
at scala.collection.immutable.Range$$anon$2.foreach$mVc$sp(Range.scala:265)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369)
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:681)
at kafka.consumer.ZookeeperConsumerConnector$WildcardStreamsHandler.<init>(ZookeeperConsumerConnector.scala:715)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:140)
at kafka.consumer.ConsoleConsumer$.main(ConsoleConsumer.scala:196)
at kafka.consumer.ConsoleConsumer.main(ConsoleConsumer.scala)
Can you please provide your producer code sample?
Do you have the latest 0.8 version checked out? There appears to have been a known deadlock issue in the consumer fetcher, which has been patched and fixed in the current version.
You can try consuming messages with the console consumer script to make sure your producer is working correctly.
If possible, post some more logs and a code snippet; that should help with further debugging.
(It seems I need more reputation to comment, so I had to answer instead.)