Unable to take backup by Cassy - scalardb

I have been trying to take a backup using Cassy, but I only get the metadata backup.
There are no error logs on Cassy, and the status stays "STARTED" in Cassy's BackupList.
Below are the steps I followed for deployment. Are any steps missing, or is something incorrect?
First, I created Scalar DL, Cassandra and Envoy from the repository below.
git clone https://github.com/scalar-labs/scalar-samples.git
I've checked that it works and that I can execute contracts correctly.
Then I added a Cassy container as follows.
Add ssh to cassandra nodes.
Change commitlog_sync from periodic to batch.
Git clone from below
git clone https://github.com/scalar-labs/cassy
Edit the cassy.properties file to add S3 information and other paths.
Create container using cloned Dockerfile.
cassy/build/docker/Dockerfile

I'm not fully sure about the environment you are working on.
Is the Cassy master running on your localhost while other components like Cassandra run in Docker?
If that is the case, then I doubt that Cassy can connect to Cassandra via JMX.
BTW, are there any logs in /var/log/scalar?

I deployed all nodes in Docker on a single EC2 instance.
Cassandra and Cassy can reach each other over the local Docker network.
I also checked that Cassandra is listening on the JMX port.
172.21.0.2 is the Cassandra container's Docker network IP.
# netstat -tln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:9042 0.0.0.0:* LISTEN
tcp 0 0 172.21.0.2:7000 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.11:46715 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN
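The netstat output above only shows that Cassandra listens on 7199 inside its own container. A minimal sketch for checking from the Cassy container that the port is actually reachable is below. The helper name can_connect is hypothetical, and the address and port are simply the ones from the output above. A successful TCP connect does not prove that a full JMX session can be established, so treat it as a first check only.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (run inside the Cassy container; values taken from the post above):
# print(can_connect("172.21.0.2", 7199))
```

If this returns False from the Cassy container, the problem is at the network level rather than in the Cassy configuration.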
If I deploy Cassy using "./gradlew installDist", I get /var/log/scalar/cassy.log.
It shows only the log line below.
2020-07-22 05:35:21.641 [pool-3-thread-1] INFO c.s.c.transferer.AwsS3FileUploader - Uploading /tmp/cassy.db.dump
But if I deploy Cassy using "./gradlew docker" instead, there is no such log file in the Cassy container.

Related

awx docker compose phase fails

I am trying to install AWX as part of a docker container image following the guide here but it fails in the docker-compose step like so:
docker-compose -f tools/docker-compose/_sources/docker-compose.yml up --remove-orphans
tools_postgres_1 is up-to-date
tools_redis_1 is up-to-date
Starting tools_awx_1 ...
Starting tools_awx_1 ... error
ERROR: for tools_awx_1 Cannot start service awx_1: driver failed programming external connectivity on endpoint tools_awx_1 (5711eb4a73513b5bd3c27c990761000325f71ae48dbe1e6308d33bf07382415a): Error starting userland proxy: listen tcp4 0.0.0.0:8080: bind: address already in use
ERROR: for awx_1 Cannot start service awx_1: driver failed programming external connectivity on endpoint tools_awx_1 (5711eb4a73513b5bd3c27c990761000325f71ae48dbe1e6308d33bf07382415a): Error starting userland proxy: listen tcp4 0.0.0.0:8080: bind: address already in use
ERROR: Encountered errors while bringing up the project.
make: *** [Makefile:485: docker-compose] Error 1
When I scan for the user of port 8080 it shows this:
$ sudo netstat -pna | grep 8080
tcp6 0 0 :::8080 :::* LISTEN 733/java
This is most likely Jenkins, since I recall installing Jenkins with all default options and I use this port to reach the Jenkins UI.
So what is the alternative here, without removing Jenkins or changing the port number for the Jenkins daemon (is it a daemon that runs Jenkins, by the way)? How can I change the endpoint for this AWX Docker image? I'm rather new to Docker and AWX, not to mention the DevOps ecosystem in general, so I would appreciate a bit of context around your suggestions.

Bind for 0.0.0.0:50000 failed: port is already allocated on MacOS

I initially ran Jenkins in a Docker container from my macOS terminal successfully after running docker-compose up, which generated the long admin password. However, after I restarted my machine, the setup vanished. Now each time I run docker-compose up, after exposing Jenkins port 8080 on port 8082 and Jira port 50000 on port 20000 (having previously tried exposing them on other external ports), I keep getting the error below:
Creating jenkins ... error
ERROR: for jenkins Cannot start service jenkins: driver failed programming external connectivity on endpoint jenkins (****************************************************): Bind for 0.0.0.0:20000 failed: port is already allocated
ERROR: for jenkins Cannot start service jenkins: driver failed programming external connectivity on endpoint jenkins (****************************************************): Bind for 0.0.0.0:20000 failed: port is already allocated
I have stopped, killed and removed all containers, removed all images and pruned all networks, but nothing seems to work.
What's a way around this and how do I free up allocated ports?
You can find the process that is running on port 20000 using:
lsof:
lsof -nP -iTCP -sTCP:LISTEN | grep <port-number>
or
netstat:
netstat -anv | grep <port-number>
It is probably just a stale process left over from an earlier run. Kill that process (you can use kill -9 <pid>) and try the same operation again.
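As a quick cross-check of the lsof/netstat result, here is a minimal sketch that simply tries to bind the port. The helper name port_is_free is hypothetical. It only tells you whether some socket on this machine already holds the port; it cannot tell you which process owns it, and it will not detect a stale allocation that exists only in Docker's own bookkeeping.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            # Typically "address already in use".
            return False


# Example: port_is_free(20000)
```

If this reports the port as free while Docker still says it is allocated, restarting the Docker daemon is worth trying before hunting for a process.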

VSCode Error: The terminal process failed to launch: Path to shell executable "/usr/bin/tmux" does not exist

After installing VSCode, every time I try to open the integrated terminal window I get the error mentioned in the title.
I don't know what the correct path to the shell executable is. Before installing VSCode, the only change I made to my terminal was installing zsh.
System Details -
OS: Ubuntu 20.04 LTS
VSCode version: 1.53.2
I solved this issue by changing the default shell for VSCode with the following steps:
Open the Command Palette in VSCode with Ctrl + Shift + P
Search for default
Click Terminal: Select Default Shell
Click zsh /usr/bin/zsh. I selected zsh because I recently installed it and like to use it; you can choose one of the other shell options as well.
Thank you.
I had the same issue. I resolved it by changing the path of "terminal.integrated.shell.linux" in the settings.json file.
Link : https://code.visualstudio.com/docs/supporting/troubleshoot-terminal-launch
The option "terminal.integrated.shell.linux" is deprecated.
The new VSCode config is (mine is 1.62.3 on Ubuntu):
"terminal.integrated.defaultProfile.linux": "zsh",
"terminal.integrated.profiles.linux": {
    "zsh": {
        "path": "/usr/bin/zsh"
    }
},
If only "terminal.integrated.defaultProfile." is set, it can not find zsh.
I had a similar issue, but I wasn't able to identify whether something was using port 5432.
This is how I solved it:
I used sudo ss -tulwn | grep LISTEN
➜ xxxx-xxxx-xxxxxx git:(main) sudo ss -tulwn | grep LISTEN
tcp LISTEN 0 244 127.0.0.1:5432 0.0.0.0:*
tcp LISTEN 0 511 127.0.0.1:35613 0.0.0.0:*
tcp LISTEN 0 100 127.0.0.1:49152 0.0.0.0:*
tcp LISTEN 0 128 127.0.0.1:30800 0.0.0.0:*
tcp LISTEN 0 128 127.0.0.1:30900 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:*
tcp LISTEN 0 5 127.0.0.1:631 0.0.0.0:*
tcp LISTEN 0 5 [::1]:631 [::]:*
docker container ls (just checking)
➜ xxxx-xxxx-xxxxxx git:(main) docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
docker ps -a (just checking)
➜ xxxx-xxxx-xxxxxx git:(main) docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
docker network prune
➜ xxxx-xxxx-xxxxxx git:(main) docker network prune
WARNING! This will remove all custom networks not used by at least one container.
After that, docker-compose down and docker-compose up --build works
And I was happy using docker... until now.
My next steps:
Configure container's environment for the first time
Run docker-compose run --rm web bash in a separate terminal pane/window. You should now be inside the container, which is indicated by a # prompt before commands.
Run rails db:create within container.
Run rails db:migrate within container.
Run the environment seeds (if you want to)
Run rails db:seed.

How to make a TCP outgoing connection with Docker container?

My Go application makes TLS connections via tls.Dial() to exchange data.
It works fine when run from the host:
But the outgoing connection doesn't seem to work when the app is run from a Docker container. The app hangs indefinitely.
Note 1: Same behavior with using docker run -p $(docker-machine ip):2500:2500 ...
Note 2: VM doesn't have extra port forwarding settings other than the default settings that came with docker-machine's default VM.
Docker image build with Dockerfile:
FROM golang:latest
RUN mkdir -p "$GOPATH/src/path/to/app"
# Install dependencies
RUN go get github.com/path/to/dep
VOLUME "$GOPATH/src/path/to/app"
EXPOSE 2500
WORKDIR "$GOPATH/src/path/to/app"
CMD ["go", "run", "main.go"]
Host is OS X running docker-machine.
Question
How can I make the outgoing TCP connection work?
You are either using boot2docker or docker-machine (since you are running docker on OSX). If you are using boot2docker, you have to forward the ports on VirtualBox as well as docker, have a look at this blog post:
https://fogstack.wordpress.com/2014/02/09/docker-on-osx-port-forwarding/
If you are using docker-machine, you have to connect to the docker-machine assigned ip, not localhost, have a look at this post:
https://github.com/docker/machine/issues/710
I see now that you are using docker-machine specifically, so the post about docker-machine should answer your question.
Edit: I misunderstood the question. You are trying to make an outgoing connection on a forwarded port. That is not correct. By default docker can make outgoing connections on any port. The port forwarding is for incoming connections only. Please try again without specifying any ports to forward. My suspicion is that you are trying to make an outgoing connection on the incoming (forwarded) port.
I've just had exactly the same problem. Was unable to connect out at all.
Restarted the container, and suddenly outgoing connections worked fine. It's possible that the container survived an update of docker?
Currently using Docker version 18.09.3, build 774a1f4

Zookeeper connection error

We have a standalone zookeeper setup on a dev machine. It works fine for every other dev machine except this one testdev machine.
We get this error over and over again when trying to connect to zookeeper through testdev:
2012-11-09 14:06:53,909 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@947] - Socket connection established to zk01.dev.bunchball.net/192.168.8.58:2181, initiating session
2012-11-09 14:06:53,911 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@1183] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2012-11-09 14:06:55,366 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@1058] - Opening socket connection to server zk01.dev.bunchball.net/192.168.8.58:2181
2012-11-09 14:06:55,368 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@947] - Socket connection established to zk01.dev.bunchball.net/192.168.8.58:2181, initiating session
2012-11-09 14:06:55,368 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@1183] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2012-11-09 14:06:57,271 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@1058] - Opening socket connection to server zk01.dev.bunchball.net/192.168.8.58:2181
2012-11-09 14:06:57,274 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@947] - Socket connection established to zk01.dev.bunchball.net/192.168.8.58:2181, initiating session
2012-11-09 14:06:57,275 - INFO [main-SendThread(zk01.dev.bunchball.net:2181):ClientCnxn$SendThread@1183] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
We tried restarting the testdev machine and also restarting the ZooKeeper host, but nothing worked. We are totally confused about why it works perfectly fine with other machines but not this one. What might be the cause of this?
I had the same situation as you and have just fixed the problem.
In my case the cause was that I had configured an even number of ZooKeeper nodes. Try changing the number of ZooKeeper nodes to an odd one.
For example, my ZooKeeper cluster originally consisted of 4 nodes; I removed one of them so that the number of nodes became 3.
After that, the ZooKeeper cluster started up fine.
Below is the output of a successful connection to the ZooKeeper server:
2013-04-22 22:07:05,654 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1321ed6
Welcome to ZooKeeper!
2013-04-22 22:07:05,704 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2013-04-22 22:07:05,727 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session
[zk: localhost:2181(CONNECTING) 0] 2013-04-22 22:07:05,846 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13e3211c06e0000, negotiated timeout = 30000
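Some background on the odd/even advice above: a ZooKeeper ensemble needs a strict majority of its servers to be up in order to serve requests. An even-sized ensemble still works, but it tolerates no more failures than the next smaller odd size. The arithmetic is sketched below; the function names are hypothetical and this is only an illustration, not ZooKeeper code.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

So going from 4 nodes to 3 loses no fault tolerance. If removing a node fixed the error, it may be that the removed node was the one that could not be reached.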
I faced the same issue and found that it was because the ZooKeeper cluster nodes need certain ports open to communicate with each other.
server.1=xx.xx.xx.xx:2888:3888
server.2=xx.xx.xx.xx:2888:3888
server.3=xx.xx.xx.xx:2888:3888
Once I allowed these ports through the AWS security group and restarted, everything worked fine for me.
I have just solved the problem. I am using CentOS 7, and the troublemaker was the firewall. Running "systemctl stop firewalld" on each server shuts it down entirely and resolves the problem. Alternatively, you can open the ports with a command like
firewall-cmd --zone=public --add-port=2181/udp --add-port=2181/tcp --permanent
Repeat this for all three ports (2181, 2888 and 3888) on each server, and then run
firewall-cmd --reload
Finally use
zkServer.sh restart
to restart your servers, and the problem is solved.
In my case, I configured zoo.cfg like this:
server.1=host-1:2888:3888
server.2=host-2:2888:3888
server.3=host-3:2888:3888
But on host-1, I had configured host-1 to resolve to 127.0.0.1 in /etc/hosts:
127.0.0.1 localhost host-1
which meant that the other hosts could not communicate with it. Resolving host-1 to its real IP solved this problem.
Hope this can help.
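To spot this kind of /etc/hosts mistake quickly, a small check like the following can be run on each server. It is only a sketch: the helper name resolves_to_loopback is hypothetical, it looks at IPv4 results only, and host-1 is the example hostname from the answer above.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname: str) -> bool:
    """Return True if any IPv4 address the name resolves to is a loopback address."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return any(ipaddress.ip_address(info[4][0]).is_loopback for info in infos)


# Example, run on host-1 itself:
# print(resolves_to_loopback("host-1"))
```

If this prints True for a name that appears in a server.N line of zoo.cfg, that server is probably binding its quorum ports to the loopback interface.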
Had the same error during setup on a 2 node cluster. I discovered I had mixed up the contents of the myid file versus the server.id=HOST_IP:port entry.
Essentially, if you have two servers (SERVER1 and SERVER2) for which you have created "myid" files in dataDir for zookeeper as below
SERVER1 (myid)
1
SERVER2 (myid)
2
Ensure that the entries in your zoo.cfg file correspond to these, i.e. server.1 should use the SERVER1 hostname and server.2 should use the SERVER2 hostname, each followed by the ports as below
SERVER1 (zoo.cfg)
... (other config omitted)
server.1=SERVER1:2888:3888
server.2=SERVER2:2888:3888
SERVER2 (zoo.cfg)
... (other config omitted)
server.1=SERVER1:2888:3888
server.2=SERVER2:2888:3888
Just to make sure, I also deleted the version-* folder in the dataDir then restarted Zookeeper to get it working.
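The myid/zoo.cfg correspondence described above can be checked mechanically. The sketch below assumes the plain server.N=host:port:port form shown in this thread; the function names are hypothetical, and you would read the two texts from your own zoo.cfg and dataDir/myid files.

```python
import re
from typing import Dict

# Matches lines such as "server.1=SERVER1:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(zoo_cfg_text: str) -> Dict[int, str]:
    """Map each server id in zoo.cfg to its configured host."""
    servers = {}
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def myid_matches(zoo_cfg_text: str, myid_text: str, this_host: str) -> bool:
    """Return True if the server.<myid> entry names this host."""
    servers = parse_servers(zoo_cfg_text)
    return servers.get(int(myid_text.strip())) == this_host


cfg = "server.1=SERVER1:2888:3888\nserver.2=SERVER2:2888:3888\n"
print(myid_matches(cfg, "1\n", "SERVER1"))  # prints True
print(myid_matches(cfg, "1\n", "SERVER2"))  # prints False
```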
I had this problem too, and it turned out that I was telling zookeeper to connect to the wrong port. Have you verified that zookeeper is actually running on port 2181 on the dev machine?
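One way to verify that something on that port really is a healthy ZooKeeper server is the ruok four-letter command, to which a running server answers imok. A minimal sketch follows; the helper name zk_ruok is hypothetical, and recent ZooKeeper versions may require the command to be allowed through the 4lw.commands.whitelist setting, in which case a False result does not necessarily mean the server is down.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send the 'ruok' four-letter command and return True if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes the socket.
            while len(reply) < 4:
                chunk = s.recv(16)
                if not chunk:
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:
        # Connection refused, timed out, or reset.
        return False


# Example: print(zk_ruok("zk01.dev.bunchball.net"))
```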
I had the same situation as you and have just fixed the problem.
My conf/zoo.cfg looks like this:
server.1=10.194.236.32:2888:3888
server.2=10.194.236.33:2888:3888
server.3=10.208.177.15:2888:3888
server.4=10.210.154.23:2888:3888
server.5=10.210.154.22:2888:3888
Then I set the content of the data/myid file on each host like this:
1 //at host 10.194.236.32
2 //at host 10.194.236.33
3 //at host 10.208.177.15
4 //at host 10.210.154.23
5 //at host 10.210.154.22
Finally, restart ZooKeeper.
I was getting the same error when trying to connect my broker to my ZooKeeper ensemble using A records that point to the ZooKeeper IPs. The problem was in my ZooKeeper servers: they were not able to bind to port 2181 because I had pointed my A records to the public IPs. This prevented the ZooKeeper ensemble from choosing a leader and the servers from communicating with each other. Pointing the A records to the private IPs enabled the ensemble to choose a leader, and the cluster became active. After this, when I tried to connect one of my brokers to the ensemble, it connected successfully.
Also check the local firewall,
service firewalld status
If it's running, just stop it
service firewalld stop
And then give it a try.
I had this problem too, and I found that I just needed to restart ZooKeeper and then restart Tomcat; after that my webapp connected without problems.
I was able to get ZooKeeper and Kafka running with 2 nodes each.
I got the error because I had started ZooKeeper with ./zkServer.sh instead of the Kafka wrapper
bin/zookeeper-server-start.sh config/zookeeper.properties
Make sure all required services are running
Step 1 : Check if hbase-master is running
sudo /etc/init.d/hbase-master status
if not, then start it sudo /etc/init.d/hbase-master start
Step 2 : Check if hbase-regionserver is running
sudo /etc/init.d/hbase-regionserver status
if not, then start it sudo /etc/init.d/hbase-regionserver start
Step 3 : Check if zookeeper-server is running
sudo /etc/init.d/zookeeper-server status
if not, then start it sudo /etc/init.d/zookeeper-server start
or simply run these 3 commands in a row.
sudo /etc/init.d/hbase-master restart
sudo /etc/init.d/hbase-regionserver restart
sudo /etc/init.d/zookeeper-server restart
after that don't forget to check the status
sudo /etc/init.d/hbase-master status
sudo /etc/init.d/hbase-regionserver status
sudo /etc/init.d/zookeeper-server status
You might find that ZooKeeper is still not running.
In that case you can start it directly:
sudo /usr/lib/zookeeper/bin/zkServer.sh stop
sudo /usr/lib/zookeeper/bin/zkServer.sh start
After that, check the status again and make sure it's running
sudo /etc/init.d/zookeeper-server status
This should work.
I started a standalone instance on my machine and encountered the same problem. In the end, I changed the address from the IP "127.0.0.1" to "localhost" and the problem went away.
Check the zookeeper logs (/var/log/zookeeper). It looks like a connection is established, which should mean there is a record of it.
I had the same situation and it was because a process opened connections and failed to close them. This eventually exceeded the per-host connection limit and my logs were overflowing with
2016-08-03 15:21:13,201 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@188] - Too many connections from /172.31.38.64 - max is 50
Assuming zookeeper is on the usual port, you could do a check for that with:
lsof -i -P | grep 2181
This can happen if there are too many open connections.
Try increasing the maxClientCnxns setting.
From documentation:
maxClientCnxns
(No Java system property)
Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble. This is used to prevent certain classes of DoS attacks, including file descriptor exhaustion. Setting this to 0 or omitting it entirely removes the limit on concurrent connections.
You can edit settings in the config file. Most likely it can be found at /etc/zookeeper/conf/zoo.cfg.
In modern ZooKeeper versions the default value is 60. You can increase it by adding the line maxClientCnxns=4096 to the end of the config file.
I also ran into this problem last week and have managed to fix it now. I got the idea for resolving it from the response shared by @gukoff.
My requirement and situation was slightly different from the ones shared so far but the issue was fundamentally the same so I thought of sharing it on this thread.
I was actually trying to query zookeeper quorum (after every 30 seconds) for some information from my application and was using the Curator Framework for this purpose (the methods available in LeaderLatch class). So, essentially I was starting up a CuratorFramework client and supplying this to LeaderLatch object.
Only after I ran into the error mentioned in this thread - I realised that I did not close the zookeeper client connection(s) established in my applications. The maxClientCnxns property had the value of 60 and as soon as the number of connections (all of them were stale connections) touched 60, my application started complaining with this error.
I found out about the number of open connections by:
Checking the zookeeper logs, where there were warning messages stating "Too many connections from {IP address of the host}"
Running the following netstat command from the same host mentioned in the above logs where my application was running:
netstat -no | grep :2181 | wc -l
Note: The 2181 port is the default for zookeeper supplied as a parameter in grep to match the zookeeper connections.
To fix this, I cleared up all of those stale connections manually and then added the code for closing the zookeeper client connections gracefully in my application.
I hope this helps!
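If you want to see which client is eating the connections, the netstat output can be grouped by remote IP. The sketch below assumes Linux-style `netstat -tn` output captured on a ZooKeeper server (so the local address ends in :2181); the function name count_clients and the sample addresses other than 172.31.38.64 are made up for illustration.

```python
from collections import Counter


def count_clients(netstat_output: str, port: int = 2181) -> Counter:
    """Count TCP connections to the given local port, grouped by remote IP."""
    counts = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign = fields[3], fields[4]
        if local.endswith(suffix):
            counts[foreign.rsplit(":", 1)[0]] += 1
    return counts


sample = """Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.5:2181 172.31.38.64:40112 ESTABLISHED
tcp 0 0 10.0.0.5:2181 172.31.38.64:40113 ESTABLISHED
tcp 0 0 10.0.0.5:2181 10.0.0.9:51000 ESTABLISHED
tcp 0 0 10.0.0.5:22 10.0.0.9:51001 ESTABLISHED
"""
print(count_clients(sample).most_common())
# prints [('172.31.38.64', 2), ('10.0.0.9', 1)]
```

Any IP whose count is close to maxClientCnxns is the one to look at for unclosed client connections.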
I encountered the same problem too. In my case the problem was the iptables rules.
To communicate with a ZooKeeper node, port 2181 must accept incoming requests; for internal communication between ZooKeeper nodes, ports 2888 and 3888 must also be open for incoming requests.
iptables -t nat -I PREROUTING -p tcp -s 10.0.0.0/24 --dport 2181 -j DNAT --to-destination serverIp:2181
iptables -t nat -I PREROUTING -p udp -s 10.0.0.0/24 --dport 2181 -j DNAT --to-destination serverIp:2181
iptables -t nat -I PREROUTING -p tcp -s 10.0.0.0/24 --dport 2888 -j DNAT --to-destination serverIp:2888
iptables -t nat -I PREROUTING -p udp -s 10.0.0.0/24 --dport 2888 -j DNAT --to-destination serverIp:2888
iptables -t nat -I PREROUTING -p tcp -s 10.0.0.0/24 --dport 3888 -j DNAT --to-destination serverIp:3888
iptables -t nat -I PREROUTING -p udp -s 10.0.0.0/24 --dport 3888 -j DNAT --to-destination serverIp:3888
sudo service iptables save
This is a common issue if the Zookeeper server is not running or no longer running (i.e. it crashed after you started it).
So first, check that you have the Zookeeper server running. A simple way to check is grep the running processes:
# ps -ef | grep zookeeper
(Run this a couple of times to see whether the same process ID is still there; it's possible that it keeps restarting with a new process ID. Alternatively, you can use 'systemctl status zookeeper' if your Linux distro supports systemd.)
You should see the process running as a java process:
# ps -ef | grep zookeeper
root 492 0 0 00:01 pts/1 00:00:00 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /root/zookeeper-3.5.0-alpha/bin/../build/classes:/root/zookeeper-3.5.0-alpha/bin/../build/lib/*.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/slf4j-log4j12-1.7.5.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/slf4j-api-1.7.5.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/servlet-api-2.5-20081211.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/netty-3.7.0.Final.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/log4j-1.2.16.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/jline-2.11.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/jetty-util-6.1.26.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/jetty-6.1.26.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/javacc.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/jackson-mapper-asl-1.9.11.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/jackson-core-asl-1.9.11.jar:/root/zookeeper-3.5.0-alpha/bin/../lib/commons-cli-1.2.jar:/root/zookeeper-3.5.0-alpha/bin/../zookeeper-3.5.0-alpha.jar:/root/zookeeper-3.5.0-alpha/bin/../src/java/lib/*.jar:/root/zookeeper-3.5.0-alpha/bin/../conf: -Xmx1000m -Xmx1000m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /root/zookeeper-3.5.0-alpha/bin/../conf/zoo.cfg
If it's not there, then there's likely something in the ZooKeeper log file indicating the issue.
To find the ZooKeeper log file, you should first figure out where it's configured to log. In my case I have ZooKeeper installed under my root directory (not suggesting you install it there):
[root@centos6_zookeeper conf]# pwd
/root/zookeeper-3.5.0-alpha/conf
And you can find the log setting in this file:
[root@centos6_zookeeper conf]# grep "zookeeper.log" log4j.properties
zookeeper.log.dir=/var/log
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=INFO
zookeeper.log.maxfilesize=256MB
zookeeper.log.maxbackupindex=20
So Zookeeper is configured to log under /var/log.
Then there's usually a zookeeper.log and/or zookeeper.out file which should indicate your startup error.
Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
I just changed the number of brokers in the zoo.cfg file and restarted the ZooKeeper and Kafka services.
I also got the same error when I started my replicated ZooKeeper setup: one of the zkClients could not connect to localhost:2181. I checked the log file under the apache-zookeeper-3.5.5-bin/logs directory and found this:
2019-08-20 11:30:39,763 [myid:5] - WARN
[QuorumPeermyid=5(secure=disabled):QuorumCnxManager@677]
- Cannot open channel to 3 at election address /xxxx:3888 java.net.SocketTimeoutException: connect timed out at
java.net.PlainSocketImpl.socketConnect(Native Method) at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at
java.net.Socket.connect(Socket.java:589) at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:733)
at
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:910)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1247)
2019-08-20 11:30:44,768 [myid:5] - WARN
[QuorumPeermyid=5(secure=disabled):QuorumCnxManager@677]
- Cannot open channel to 4 at election address /xxxxxx:3888 java.net.SocketTimeoutException: connect timed out at
java.net.PlainSocketImpl.socketConnect(Native Method) at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at
java.net.Socket.connect(Socket.java:589) at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:733)
at
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:910)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1247)
2019-08-20 11:30:44,769 [myid:5] - INFO
[QuorumPeermyid=5(secure=disabled):FastLeaderElection@919]
- Notification time out: 51200
That means this ZooKeeper server could not connect to the other servers. I found that pinging the other servers from this server failed, and after I removed this server from the ensemble the problem was solved.
Hope this is helpful.
This can happen despite the ZooKeeper servers being up and running and the socket open and accepting connections, if one or more of the ZooKeeper disks are out of space. This can easily happen if the old ZK snapshot and log files are never cleaned up:
The ZooKeeper server creates snapshot and log files, but never deletes them. The retention policy of the data and log files is implemented outside of the ZooKeeper server. The server itself only needs the latest complete fuzzy snapshot, all log files following it, and the last log file preceding it. The latter requirement is necessary to include updates which happened after this snapshot was started but went into the existing log file at that time. This is possible because snapshotting and rolling over of logs proceed somewhat independently in ZooKeeper. See the maintenance section in this document for more details on setting a retention policy and maintenance of ZooKeeper storage.
There is a maintenance job that can be run to clean up old snapshot and log files: See https://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#sc_maintenance.
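A quick way to rule out the disk-space cause before digging further is to check how much room is left on the filesystem that holds the ZooKeeper data. This is only a sketch: the helper name free_fraction is hypothetical, and /var/lib/zookeeper is an assumed dataDir, so substitute the dataDir (and dataLogDir, if set) from your own zoo.cfg.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of the filesystem containing path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


# Example (the path is an assumption, use your own dataDir):
# print(f"{free_fraction('/var/lib/zookeeper'):.1%} free")
```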
Leave only one entry for your host IP in the /etc/hosts file; that resolved it for me.
I got the same problem on Windows 10. After adding the following lines to my ZooKeeper properties file, the problem was fixed.
tickTime=2000
initLimit=5
syncLimit=2
I have just solved the same problem and written a blog post about it.
In brief, if the zoo.cfg on host xx looks like:
server.1=xx:2888:3888
server.2=yy:2888:3888
server.3=zz:2888:3888
then the myid file on xx must contain 1.