After a few successful updates to Solr, SolrException is thrown: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper

I am using Solr version 7.3.1 with 3 external ZooKeeper nodes.
Below is my ZooKeeper config for one of the nodes; all three have a similar config:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/<user>/zookeeper/zookdata/zk2/
clientPort=2182
server.1=localhost:2666:3666
server.2=localhost:2667:3667
server.3=localhost:2668:3668
Then, I am using these 3 nodes to start Solr. In my application I use localhost:2182,localhost:2183 to connect to Solr, with the code below.
List<String> zkHosts = Arrays.asList(solrZkHostPort.split(","));                           // comma-separated ZooKeeper host:port pairs
CloudSolrClient.Builder builder = new CloudSolrClient.Builder(zkHosts, Optional.empty());  // empty Optional = no ZK chroot
solrClient = builder.build();
I am using multiple Spark executors to update documents in Solr. It works fine for roughly 1100-1300 record updates; after that, updates fail with the exception below:
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2182,localhost:2183 within 10000 ms
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:183)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:120)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:110)
at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:285)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:155)
at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:399)
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:828)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:818)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
at com.package.SomeApplicationClass
... 16 more
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2182,localhost:2183 within 10000 ms
at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:232)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:175)
... 26 more
I get the exception below too; I'm not sure whether it has any significance:
18/09/20 12:45:40 WARN ClientCnxn: Session 0x0 for server localhost/127.0.0.1:2183, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Any idea what the problem could be? Do I need to change the ZooKeeper config, or does the way I create the Solr client need to change?

Noob mistake.
My Spark job was creating more than 1000 Spark executors, and a Solr client was created for each executor without ever being closed. Closing the solrClient after each executor completed fixed it.
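For anyone hitting the same thing, a minimal sketch of the fix (the collection name and document batch are placeholders, not from my job): build the client inside the executor, use it, and make sure close() runs so its ZooKeeper connection is released.
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

// SolrClient implements Closeable, so try-with-resources guarantees close()
List<String> zkHosts = Arrays.asList(solrZkHostPort.split(","));
try (CloudSolrClient solrClient = new CloudSolrClient.Builder(zkHosts, Optional.empty()).build()) {
    solrClient.add("my_collection", docs);   // placeholder collection name and SolrInputDocument batch
    solrClient.commit("my_collection");
} // close() releases the ZooKeeper connection held by this client once the executor's work is done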

Related

Zookeeper unable to talk to new Kafka broker

In an attempt to reduce the storage on my AWS instance, I decided to launch a new, smaller instance and set up Kafka again from scratch using the Ansible playbook we had from before. I then terminated the old, larger instance and assigned the IP address it had been using (the one it and the other brokers reference) to my new instance.
When tailing my ZooKeeper logs, however, I'm seeing this error:
2018-04-13 14:17:34,884 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#810] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:153)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:211)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2018-04-13 14:17:34,885 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#813] - Interrupting SendWorker
2018-04-13 14:17:34,884 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
I double-checked that all 3 Kafka broker IP addresses are correctly listed in these locations, and I restarted all of their services to be safe:
/etc/hosts
/etc/kafka/config/server.properties
/etc/zookeeper/conf/zoo.cfg
/etc/filebeat/filebeat.yml

Schema Registry won't start after upgrading to Confluent 4.1

I have recently upgraded Confluent to 4.1, but Schema Registry seems to have some issues. On confluent start, schema-registry (and consequently ksql-server) cannot start.
Here's the error I get in the logs of schema-registry:
[2018-04-20 11:27:38,426] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:65)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:203)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:63)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:41)
at io.confluent.rest.Application.createServer(Application.java:165)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: io.confluent.kafka.schemaregistry.storage.exceptions.StoreException: Failed to write Noop record to kafka store.
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:139)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:201)
... 4 more
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreException: Failed to write Noop record to kafka store.
at io.confluent.kafka.schemaregistry.storage.KafkaStore.getLatestOffset(KafkaStore.java:423)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.waitUntilKafkaReaderReachesLastOffset(KafkaStore.java:276)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:137)
... 5 more
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:77)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.getLatestOffset(KafkaStore.java:418)
... 7 more
Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
[2018-04-20 11:27:38,430] INFO Shutting down schema registry (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry:726)
[2018-04-20 11:27:38,430] INFO [kafka-store-reader-thread-_schemas]: Shutting down (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:66)
[2018-04-20 11:27:38,431] INFO [kafka-store-reader-thread-_schemas]: Stopped (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:66)
[2018-04-20 11:27:38,440] INFO [kafka-store-reader-thread-_schemas]: Shutdown completed (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:66)
[2018-04-20 11:27:38,446] INFO KafkaStoreReaderThread shutdown complete. (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:227)
I have no clue why this error is reported, and the error messages are not that meaningful to me.
After the failure, confluent start schema-registry and confluent start ksql-server bring both services up, but when starting KSQL I get the following warning:
**************** WARNING ******************
Remote server address may not be valid:
Error issuing GET to KSQL server
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Caused by: Could not connect to the server.
*******************************************
When trying to run a command (e.g. show tables;) the following error is reported:
ksql> show tables;
Error issuing POST to KSQL server
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Caused by: Could not connect to the server.
EDIT: I've fixed this by destroying the current run (confluent destroy), but it would be interesting if someone could explain this issue.
From the info you've posted it feels like you may have had some zombie processes or bad data somewhere, though I can't be sure.
The Schema Registry was complaining that it couldn't write a message to Kafka, because the Kafka broker was complaining that it didn't own the topic partition the Schema Registry was writing to. This might have been caused by a previous Kafka broker (from the old install) still running.
Did you confluent stop before upgrading?
Using confluent destroy, as you did, to flatten/reset the installation is always a good option, as long as you're not precious about your data. Checking for spurious processes (or using the old 'reboot machine' trick) can also be a good place to start when things aren't behaving as you'd expect.
Glad it's all sorted now :D
Andy

Root cause for Connection broken for id 1, my id = 3, error =

I am using Confluent 4 for the Kafka and ZooKeeper installation.
In our Kafka cluster environment (3 brokers and 3 ZooKeeper nodes running on 3 AWS instances),
we are seeing the set of warnings below repeatedly recorded in the brokers' server.log files.
We have not observed any functional issues due to this yet, but we have not been able to find the root cause, and there is a chance it will affect clients or other broker nodes in the future. Here is the set of warnings:
[2018-04-03 12:00:40,707] WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1097)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:932)
[2018-04-03 12:00:40,707] WARN Connection broken for id 1, my id = 3, error = (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1013)
[2018-04-03 12:00:40,708] WARN Interrupting SendWorker (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2018-04-03 12:00:40,707] WARN Send worker leaving thread (org.apache.zookeeper.server.quorum.QuorumCnxManager)
This set of warnings keeps repeating and is observed on all 3 Kafka nodes.
If anyone has any idea why this warning gets generated, please let me know.
Thanks in advance.
This sounds like a known issue with newer versions of ZooKeeper. Check out this JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-2938
In my case, I was replacing a ZK node and the old one was still running, which I didn't realize. So I had created two nodes with the same "myid"; see the illustration below.
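For anyone checking for the same problem, a hedged illustration (hostnames and the dataDir path are placeholders): every server.N entry in zoo.cfg should map to exactly one live host, and that host's dataDir/myid file must contain its own N, unique across the ensemble.
# zoo.cfg, identical on every ensemble member (example hostnames)
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
# On the host acting as server.2, the file <dataDir>/myid contains only:
2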

SQLTransientConnectionException after upgrading MariaDB connector

I've upgraded the "org.mariadb.jdbc" % "mariadb-java-client" connector from 1.5.9 to 1.6.0, and connections to the DB started to fail with timeout exceptions.
I'm using it with HikariCP 2.5.1 and Slick 3.2.0. If I roll the change back to MariaDB connector 1.5.9 it connects successfully, and if I try to upgrade directly to 2.0.1, it fails with the very same error.
The thing is that, based on the 1.6.0 changelog, we shouldn't see any breaking change. But judging from the differences in the GitHub repository, it may contain more modifications than the ones listed in the changelog :/
Exception with a local DB:
java.sql.SQLTransientConnectionException: xxx.db - Connection is not available, request timed out after 5006ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:18)
at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:439)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:47)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:38)
at slick.basic.BasicBackend$DatabaseDef$class.acquireSession(BasicBackend.scala:218)
at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:38)
at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:239)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to address=(host=localhost)(port=3306)(type=master) : Connection refused
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:156)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:118)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.throwException(ExceptionMapper.java:92)
at org.mariadb.jdbc.Driver.connect(Driver.java:108)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:341)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:193)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:430)
at com.zaxxer.hikari.pool.HikariPool.access$500(HikariPool.java:64)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:570)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:563)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 common frames omitted
Caused by: java.sql.SQLException: Could not connect to address=(host=localhost)(port=3306)(type=master) : Connection refused
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1020)
at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:481)
at org.mariadb.jdbc.Driver.connect(Driver.java:103)
... 12 common frames omitted
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:392)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1013)
... 14 common frames omitted
Exception with a remote DB:
java.sql.SQLTransientConnectionException: xxx.db - Connection is not available, request timed out after 5003ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:18)
at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:439)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:47)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:38)
at slick.basic.BasicBackend$DatabaseDef$class.acquireSession(BasicBackend.scala:218)
at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:38)
at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:239)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It certainly looks like, at least in the local case, that one of the following is true:
The port is misconfigured; or
the pool size exceeds the database maximum; or
the username/password is incorrect; or
the user does not have permission to connect to the server or the specified database.
java.net.ConnectException: Connection refused is a socket-level error. Typically, this indicates that either there was no server running on the specified port, or the server is rejecting the connection for some other reason (security, etc.).
You might double-check all of the driver/datasource properties to verify that they are correct. It would be useful if you could post your HikariCP configuration; a baseline of the settings worth checking is sketched below.
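For reference, a minimal HikariCP configuration sketch (all values below are placeholders, not taken from the question) covering the settings worth double-checking:
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mariadb://localhost:3306/mydb"); // verify host and port
config.setUsername("user");                              // verify credentials
config.setPassword("password");
config.setMaximumPoolSize(10);      // keep at or below the database's connection limit
config.setConnectionTimeout(5000);  // roughly the ~5000 ms timeout seen in the stack traces
HikariDataSource ds = new HikariDataSource(config);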
Disclaimer: I am one of the MariaDB developers.
In version 1.6.0, the "usePipelineAuth" option changed the connection implementation.
During connection setup, several queries are executed. When the option is active, those queries are sent pipelined (all queries are sent first, and only then are all results read), permitting faster connection creation.
This saves network latency.
Disabling this option (see the sketch below) will probably solve your issue.
At the same time, I've created an issue on the MariaDB tracker.
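For illustration, a hedged sketch of disabling the option via the JDBC URL (host, port and database name are placeholders); with HikariCP/Slick the same parameter can be appended to the configured connection URL:
import java.sql.Connection;
import java.sql.DriverManager;

// usePipelineAuth=false turns off pipelined authentication during connection setup
String url = "jdbc:mariadb://localhost:3306/mydb?usePipelineAuth=false";
Connection conn = DriverManager.getConnection(url, "user", "password");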

Zookeeper: Connection request from old client will be dropped if server is in r-o mode

Storm version: 0.8.2
ZooKeeper version: 3.4.5
We have a small Storm cluster (1 nimbus and 3 supervisors), so we are using just 1 ZooKeeper instance that is co-located with the Storm nimbus.
Infrequently, we start getting the following errors in the ZooKeeper logs, and our Storm cluster comes to a standstill.
2014-04-05 13:27:32,885 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#197] - Accepted socket connection from /10.0.1.183:56121
2014-04-05 13:27:32,886 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#793] - Connection request from old client /10.0.1.183:56121; will be dropped if server is in r-o mode
2014-04-05 13:27:32,886 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#832] - Client attempting to renew session 0x1452dd02834002e at /10.0.1.183:56121
2014-04-05 13:27:32,886 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#595] - Established session 0x1452dd02834002e with negotiated timeout 40000 for client /10.0.1.183:56121
On the storm end we start seeing the following in supervisor and worker logs:
2014-04-05 11:37:29 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-04-05 11:37:29 cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2014-04-05 11:37:31 ClientCnxn [WARN] Session 0x1452dd028340015 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-04-05 11:37:42 CuratorFrameworkImpl [ERROR] Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:380)
at com.netflix.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:49)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:617)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
Do we need to downgrade ZooKeeper to 3.3.3, or is there a known issue/config that we're missing?
We also experienced several issues with Storm 0.9 and ZooKeeper 3.4.X, though not exactly the one you describe.
The Storm mailing lists are also reporting such incompatibility issues:
https://mail.google.com/mail/u/0/#search/label%3Astorm+zookeeper+3.4/144313a45ba069b5
https://mail.google.com/mail/u/0/#search/label%3Astorm+zookeeper+3.4/1447d95d10ce7582
This latter one points us to this Storm pull request, which should hopefully let us use ZK 3.4.X with future versions of Storm once it is released:
https://github.com/apache/incubator-storm/pull/29
Until then, I would recommend downgrading ZK to 3.3.6 (you may install a separate ZK instance dedicated to Storm if you absolutely need ZK 3.4.X for another system). You could also clone the Storm code and merge that pull request locally, or compile the latest version of trunk, but that's a bit adventurous and more tiresome than just waiting for those nice folks to deliver a new release for us :)
A workaround for this situation is to clear Storm's data directory (configured in storm.yaml ==> storm.local.dir), then restart the supervisor. I did that in my test environment by clearing Storm's data directory and restarting the nimbus and supervisor.
I think it's caused by a previous crash of the Storm cluster, from which the supervisor cannot recover.