How to get the current ZooKeeper cluster's member server list - apache-zookeeper

I want to get the member server list and each server's type (leader or observer) in my Java application.
I also want to detect dead servers.
Is there any way to do that? I read the documentation but didn't find one.

It would be nice if there were a built-in answer for this without resorting to JMX. If you are on one of the zookeeper nodes, you can read the zoo.cfg file to get the list of servers (dead and alive ones) and then "stat" each one individually to see if it's alive and what its status is (note the "Mode" attribute on a successful response). E.g.:
$ echo stat | nc 127.0.0.1 2181
Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
Clients:
/127.0.0.1:54752[1](queued=0,recved=215524,sent=215524)
/127.0.0.1:59298[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/6
Received: 5596
Sent: 5596
Connections: 2
Outstanding: 0
Zxid: 0x10000010f
Mode: leader
Node count: 54
Note that "stat" does not show you the other members of the zookeeper ensemble--it only shows you the connected clients.
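Since the question asks about Java, here is a minimal sketch of that same approach in code: read the server list from zoo.cfg, send "stat" to each host, and parse the "Mode:" line. The zoo.cfg path and the 2181 client port below are assumptions; adjust them to your deployment. A host that refuses the connection or times out is reported as dead, which covers the second part of the question.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ZkEnsembleStatus {
    public static void main(String[] args) throws IOException {
        // zoo.cfg lists ensemble members as lines like "server.1=host1:2888:3888"
        List<String> hosts = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("/etc/zookeeper/conf/zoo.cfg"))) {
            if (line.startsWith("server.")) {
                hosts.add(line.split("=")[1].split(":")[0]);
            }
        }
        for (String host : hosts) {
            System.out.println(host + " -> " + statMode(host, 2181));
        }
    }

    // Sends the "stat" four-letter command and returns the Mode value
    // (leader/follower/observer), or "dead" if the host cannot be reached.
    static String statMode(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 3000);
            socket.getOutputStream().write("stat".getBytes());
            socket.getOutputStream().flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Mode:")) {
                    return line.substring("Mode:".length()).trim();
                }
            }
            return "unknown";
        } catch (IOException e) {
            return "dead";
        }
    }
}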

ZooKeeper exposes this information over JMX.

It can also be queried by sending the "stat" command over a direct connection to port 2181.
For an example of how to do that from python see:
https://github.com/apache/zookeeper/blob/765cedb5c65526384011ea958e59938fc7493168/src/contrib/huebrowser/zkui/src/zkui/stats.py
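If you go the JMX route from Java, something along these lines should work, assuming remote JMX is enabled on the ZooKeeper JVMs. The "org.apache.ZooKeeperService" MBean domain and the host/port in the service URL are assumptions, so verify the exact ObjectNames with jconsole on your version:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ZkJmxDump {
    public static void main(String[] args) throws Exception {
        // Hypothetical host/port; point this at a ZooKeeper JVM started with
        // -Dcom.sun.management.jmxremote.port=9999 (plus whatever auth you use).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://zk-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // List every ZooKeeper MBean; the replica/leader/follower structure
            // shows up in the ObjectNames themselves.
            Set<ObjectName> names = conn.queryNames(new ObjectName("org.apache.ZooKeeperService:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}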

Related

docker swarm - connections from wildfly to postgres randomly hang

I'm experiencing a weird problem when deploying a docker stack (compose file).
I have a three-node docker swarm: a master and two workers.
All machines are CentOS 7.5 with kernel 3.10.0 and docker 18.03.1-ce.
Most things run on the master, one of which is a wildfly (v9.x) application server.
On one of the workers is a postgres database.
After deploying the stack things work normally, but after a while (or maybe after a specific action in the web app) requests start to hang.
Running netstat -ntp inside the wildfly container shows 52 bytes stuck in the Send-q:
tcp 0 52 10.0.0.72:59338 10.0.0.37:5432 ESTABLISHED -
On the postgres side the connection is also in ESTABLISHED state, but the send and receive queues are 0.
It's always exactly 52 bytes. I read somewhere that ACK packets with timestamps are also 52 bytes. Is there any way I can verify that?
We have the following sysctl tunables set:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_timestamps = 0
The first three were needed because of this.
All services in the stack are connected to the same default network that docker creates.
Now, if I move the postgres service onto the same host as the wildfly service, the problem doesn't seem to surface. Likewise, if I declare a separate network for postgres and attach it only to the services that need the database (and to the database itself), the problem also doesn't seem to show.
Has anyone come across a similar issue? Can anyone provide any pointers on how I can debug the problem further?
Turns out this is a known issue with pooled connections in swarm with services on different nodes.
Basically the workaround is to set the above tunables and enable TCP keepalive on the socket. See here and here for more details.
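For example, with the standard PostgreSQL JDBC driver (pgjdbc) on the classpath, the keepalive part can be enabled per connection via its tcpKeepAlive property. This is only a sketch with placeholder host, database and credentials; in WildFly you would set the same property on the datasource definition rather than calling DriverManager yourself:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class PgKeepAliveExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app");        // placeholder credentials
        props.setProperty("password", "secret");
        // Ask the driver to enable SO_KEEPALIVE on its sockets, so idle pooled
        // connections keep generating traffic instead of silently going stale.
        props.setProperty("tcpKeepAlive", "true");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://postgres:5432/appdb", props)) {  // placeholder host/db
            System.out.println("connected: " + conn.isValid(2));
        }
    }
}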

Multiple A records for failover

I have 5 mongos servers at Amazon:
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5
and 2 HAProxy servers for load balancing:
192.168.1.6
192.168.1.7
My domain is registered at namecheap.com; let's call it domain.com.
1) Can I point database.domain.com to both HAProxy servers? If yes, how?
2) If HAProxy server 192.168.1.6 fails, will 192.168.1.7 take over?
3) Can I control the timeout (TTL) of the records?
Please explain to me how things work and how to make them work the way I want.
I'm trying to understand how such a system is set up for failover. I'm seeking
knowledge, not humiliation, so please either try to help or don't do anything.
Anna, stay positive; we are all learning from each other. You need to create a replica set out of all of your MongoDB servers. A replica set is MongoDB's answer to handling failover.
Please see https://docs.mongodb.org/manual/replication/
To connect to MongoDB, you don't need any proxy servers; just point directly at the MongoDB primary. Depending on your application, the MongoDB connection string can look slightly different, but normally it should be something like:
mongodb://192.168.1.1,192.168.1.2,192.168.1.3,192.168.1.4,192.168.1.5/?replicaSet=<replicaset_name>
See https://docs.mongodb.org/manual/reference/connection-string/
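For example, with the MongoDB Java driver you can pass the whole list in one connection string and the driver handles primary discovery and failover by itself; the replica set name rs0 below is a placeholder:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class ReplicaSetConnect {
    public static void main(String[] args) {
        // The driver discovers the current primary from any reachable member
        // and re-routes automatically when a failover happens.
        try (MongoClient client = MongoClients.create(
                "mongodb://192.168.1.1,192.168.1.2,192.168.1.3,192.168.1.4,192.168.1.5/?replicaSet=rs0")) {
            client.getDatabase("admin").runCommand(new Document("ping", 1));
            System.out.println("connected to the replica set");
        }
    }
}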
Further to #Saleem's answer, by way of explanation of the DNS side: multiple A records don't act as a failover; they act more like a load balancer. Your upstream DNS server will request the A record and select one of the listed records to return to you, and that record may change each time the time to live expires and your DNS server has to re-request the A record.
Some DNS servers are smart enough to request a new A record if the provided one doesn't work, which gives you a certain level of pseudo-redundancy, but most do not have this feature enabled.
(source: Using DNS for failover using multiple A records)

Cannot start up pepProxy service on FIWARE Orion Context Broker instance

I have created an Orion Context Broker instance (FIWARE cloud portal image) that appears to come with pepProxy installed. When I run "service pepProxy start", here is the feedback from the terminal:
Starting...
pepProxy dead but pid file exists
Starting pepProxy... Success
When I check the status with "service pepProxy status", it says:
pepProxy dead but pid file exists
What can be done?
It seems something is preventing the PEP Proxy from starting. Have you checked "/var/log/pepProxy"? Please, also check what port the PEP Proxy is trying to bind to (usually 1026) and whether there is any other process actually running on that port (maybe the Context Broker is already running in that standard port).
In case the problem is a port conflict, you should change the Context Broker port in /etc/sysconfig/contextBroker or the one of the PEP Proxy in /etc/sysconfig/pepProxy.
If that's not the problem we would need some more information in order to help you.

HAProxy continues to route sessions to a backend marked as down

I'm using HAProxy 1.5.0 in front of a 3-node MariaDB cluster.
HAProxy checks, via a custom query/xinetd service, that each DB node has a synced status.
When for some reason the check fails (the node, for instance, gets desynced or becomes a donor), the corresponding backend in HAProxy is marked down, but I can still see active sessions on it in the HAProxy statistics console, and queries in the DB process list (this is possible because the MariaDB service is still up and accepts queries, even though the cluster status is not synced).
I was wondering why HAProxy does not close active connections when a backend goes down and dispatch them to other active backends.
I get this expected behaviour when the MariaDB service is fully stopped on a given node (no session possible).
Is there a specific option to allow this? Option redispatch seemed promising but it applies when connections are closed (not in my case) and it's already active in my config.
Thanks for your help.
Here are the settings we're using to get the same behavior:
default-server port 9200 [snip] on-marked-down shutdown-sessions
The key part is on-marked-down shutdown-sessions, which tells HAProxy to close all connections to a backend server when it is marked as down.
Of course, you can add it to every individual server if you're not using a default-server directive :)
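For context, here is a sketch of a complete backend section with the directive in place; the clustercheck port (9200), check intervals and server addresses are illustrative, not taken from the original config:

backend mariadb
    mode tcp
    balance roundrobin
    option httpchk
    default-server port 9200 inter 2s fall 3 rise 2 on-marked-down shutdown-sessions
    server db1 10.0.0.11:3306 check
    server db2 10.0.0.12:3306 check
    server db3 10.0.0.13:3306 check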

JBoss clustering GMS, join

I have JBoss 5.1.0.
We have configured JBoss with clustering somehow, but in fact we do not use clustering while developing or testing. In order to launch the project I have to type the following:
./run.sh -c all -g uniqueclustername -b 0.0.0.0 -Djboss.messaging.ServerPeerID=1 -Djboss.service.binding.set=ports-01
But while JBoss is starting, I am able to see something like this in the console:
17:24:45,149 WARN [GMS] join(172.24.224.7:60519) sent to 172.24.224.2:61247 timed out (after 3000 ms), retrying
17:24:48,170 WARN [GMS] join(172.24.224.7:60519) sent to 172.24.224.2:61247 timed out (after 3000 ms), retrying
17:24:51,172 WARN [GMS] join(172.24.224.7:60519)
Here 172.24.224.7 is my local IP, while 172.24.224.2 is the IP of another developer in our room (and JBoss there is stopped).
So it tries to join the other node or something (I'm not very familiar with how JBoss acts in clusters), and as a result the application does not start.
What might the problem be? How do I avoid this joining?
You can probably fix this by specifying
-Djgroups.udp.ip_ttl=0
in your startup. This sets the IP time-to-live on the JGroups packets to zero, so they never get anywhere and the cluster will never form. We use this in dev here to stop the various developer machines from forming a cluster. There's no need to specify a unique cluster name.
I'm assuming you need to do clustering in production, is that right? Could you just use the default configuration instead of all? This would remove the clustering stuff altogether.
While setting up the server, keeping the host name as localhost and using --host=localhost instead of an IP address will solve the problem. That makes the server start in non-clustered mode.