How to create two Aerospike clusters on the same L2 network

I am using two Aerospike clusters (each with only one node/machine).
Since both machines are on the same LAN, they try to connect to each other and form a single cluster. Because of this I was getting an error while inserting a record:
Error: (11) AEROSPIKE_ERR_CLUSTER
So on my Ubuntu setup (one of the two machines) I blocked port 9918 with:
ufw deny 9918
After blocking the port, both Aerospike clusters started working (I was able to insert records).
What is a better way to keep two Aerospike machines on the same LAN from communicating with each other?

Just make sure to change the multicast address and/or port in the heartbeat configuration so the two nodes don't send heartbeats to each other:
heartbeat {
    mode multicast        # Send heartbeats using multicast
    address 239.1.99.2    # multicast address
    port 9918             # multicast port
    interval 150          # Number of milliseconds between heartbeats
    timeout 10            # Number of heartbeat intervals to wait
                          # before timing out a node
}
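For example, the second machine could keep the same port but join a different multicast group (239.1.99.3 below is just an illustrative address):
heartbeat {
    mode multicast        # Send heartbeats using multicast
    address 239.1.99.3    # different multicast group, so the two
                          # machines never see each other's heartbeats
    port 9918             # multicast port (can stay the same)
    interval 150          # Number of milliseconds between heartbeats
    timeout 10            # Number of heartbeat intervals to wait
                          # before timing out a node
}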
Alternatively, you can also switch to mode mesh and have only the node itself in the mesh-seed-address-port list:
heartbeat {
    mode mesh                                 # Send heartbeats using the mesh (unicast) protocol
    port 3002                                 # port on which this node listens for
                                              # heartbeats
    mesh-seed-address-port 192.168.1.100 3002 # IP address of a seed node in the cluster
                                              # (this IP happens to be the local node)
    interval 150                              # Number of milliseconds between heartbeats
    timeout 10                                # Number of heartbeat intervals to wait before
                                              # timing out a node
}
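Whichever mode you use, you can verify that each machine now forms its own single-node cluster. One way (assuming the asinfo tool is installed and the node listens on the default port) is to check the cluster_size statistic:
$ asinfo -v statistics | tr ';' '\n' | grep cluster_size
cluster_size=1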

Related

In JMeter SEVERE: No buffer space available (maximum connections reached?): connect exception

In JMeter I am testing 100000 concurrent MQTT users with a ramp-up of 10000 and a loop count of 1.
The library that I am using for MQTT in JMeter is https://github.com/emqx/mqtt-jmeter . But I am getting
SEVERE: No buffer space available (maximum connections reached?): connect exception after reaching 64378 connections.
Specification:
OS: Windows 10
RAM: 64 GB
CPU: i7
Configuration in Registry Editor: (screenshot not included)
This is due to Windows having too many active client connections.
The default number of ephemeral TCP ports is 5000. Sometimes this is insufficient when the machine has too many active client connections: the ephemeral TCP ports are all used up, no more can be allocated to a new client connection request, and the result is the error message above (for a Java application).
You can specify TCP/IP settings by editing the following registry values in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters registry subkey:
MaxUserPort
Specifies the maximum port number for ephemeral TCP ports.
TcpNumConnections
Specifies the maximum number of concurrent connections that TCP can open. This value significantly affects the number of concurrent osh.exe processes that are allowed. If the value of TcpNumConnections is too low, Windows cannot assign TCP ports to stages in parallel jobs, and the parallel jobs cannot run.
These keys are not added to the registry by default.
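Since the keys don't exist by default, you can create them from an elevated command prompt; the values below are only illustrative, and a reboot is required before they take effect:
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65534 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpNumConnections /t REG_DWORD /d 16777214 /f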
Follow the link Configuring the Windows registry: Specifying TCP/IP settings and make the necessary edits.
Hope this will help.

MongoDB nodes (AWS EC2 instances) are still responsive even after network partitioning using security groups

I have created a MongoDB replica set using 5 EC2 instances on AWS. I added the nodes using the rs.add("[IP_Address]") command.
I want to create a network partition in the replica set. To do that, I have defined 2 kinds of security groups: 'SG1' has port 27017 (the MongoDB port) open, while 'SG2' doesn't expose 27017.
I want to isolate 2 nodes from the replica set. When I apply SG2 to these 2 nodes (EC2 instances), they should ideally stop receiving reads and writes from the primary, since I am blocking port 27017 with security group SG2. But in my case, they are still writable: data written on the primary is reflected on the partitioned nodes. Can someone help? TYA.
Most firewalls, including AWS security groups, block incoming connections at the moment the connection is being opened. Changing the settings affects all new connections, but existing open connections are not re-evaluated when the rules are applied.
MongoDB maintains long-lived connections between hosts, and those would only be blocked after the existing connection between the hosts is lost.
On Linux you can restart networking, which resets the connections. You can do this after applying the new rules by running:
/etc/init.d/networking stop && /etc/init.d/networking start
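Alternatively, rather than restarting networking, a sketch of another option (assuming iptables is available on the partitioned nodes) is to actively reset the established MongoDB connections:
# Reject traffic on the MongoDB port with a TCP RST; established
# connections are reset the next time they exchange a packet.
iptables -A INPUT -p tcp --dport 27017 -j REJECT --reject-with tcp-reset
iptables -A OUTPUT -p tcp --dport 27017 -j REJECT --reject-with tcp-reset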

Linux kernel parameters which can be tuned when the TCP backlog is exceeded on a WebSphere MQ server

We are facing an issue where the TCP backlog exceeds the default value (100) on our MQ server (v7.5), running on Linux (Red Hat), during bursts of connection requests. The ListenerBacklog is configured as 100 in qm.ini, which is the default listener backlog value (maximum pending connection requests) for Linux. Whenever there is a connection burst and the TCP backlog is exceeded, the queue manager stops functioning and resumes only when the queue manager/server is restarted.
So we are looking at whether there are Linux kernel attributes related to socket tuning that can increase the TCP backlog at the network layer without harming the queue manager. Will increasing the values below in /etc/sysctl.conf help resolve this issue or improve the performance of the queue manager?
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
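For what it's worth, values edited in /etc/sysctl.conf only take effect once they are loaded, e.g.:
# Load the new settings without a reboot and verify them:
sysctl -p
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
Note that net.core.somaxconn caps the backlog a process passes to listen(), so the listener would likely need to be restarted before the larger value applies.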

Amazon EC2 Elastic Load Balancer TCP disconnect after a couple of hours

I am testing the reliability of TCP connections through an Amazon Elastic Load Balancer, compared to not using the load balancer, to see if it has any impact.
I have set up a small Elastic Load Balancer in the Amazon EC2 us-east zones with 8 t2.micro instances, using an auto-scaling group without a policy and with min/max set to 8 instances.
Each instance runs a simple TCP server that accepts connections on port 8017 and relays to the clients some data coming from another remote server located in my network. The same data is sent to all clients.
For the purpose of the test, the servers running on the micro instances send only 1 byte of data every 60 seconds (to be sure the connections don't time out).
I connected multiple clients from various outside networks using the ELB DNS name provided, and after maybe 6-24 hours, I always stop receiving data and eventually the connections all die.
All clients stop around the same time, even though they are on different networks/ISPs. Each "client" application holds about 10 TCP connections, and they all stop receiving data.
All server instances look fine after this happens; they are still sending data.
To test further and rule out a problem in the TCP server code, I also have external clients connected directly to the public IP of a single instance, without the ELB, and in that case the data doesn't stop and the connections are not lost (so far).
The load balancer idle timeout is set to 900 seconds.
Cross-zone load balancing is enabled and I am using the following zones: us-east-1e, us-east-1b, us-east-1c, us-east-1d.
I read the documentation and searched everywhere to see if this is a known behaviour, but I couldn't find any clear answer or confirmation of others having the same issue, though it seems clear it is happening in my case.
My question: is this known/expected behaviour for a TCP load balancer? If not, any idea what could be the problem with my setup?

How to get the current ZooKeeper cluster's member server list

I want to get the member server list and each server's role (leader or observer) in my Java application.
I also want to detect dead servers.
Is there any way to do that? I read the documentation but didn't find one.
It would be nice if there were a built-in answer for this that didn't resort to JMX. If you are on one of the ZooKeeper nodes, you can read the zoo.cfg file to get the list of servers (dead and alive ones) and then "stat" each one individually to see whether it's alive and what its status is (note the "Mode" attribute in a successful response). E.g.:
$ echo stat | nc 127.0.0.1 2181
Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
Clients:
/127.0.0.1:54752[1](queued=0,recved=215524,sent=215524)
/127.0.0.1:59298[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/6
Received: 5596
Sent: 5596
Connections: 2
Outstanding: 0
Zxid: 0x10000010f
Mode: leader
Node count: 54
Note that "stat" does not show you the other members of the zookeeper ensemble--it only shows you the connected clients.
ZooKeeper exposes this information over JMX.
It can also be queried by sending the "stat" command over a direct connection to port 2181.
For an example of how to do that from Python, see:
https://github.com/apache/zookeeper/blob/765cedb5c65526384011ea958e59938fc7493168/src/contrib/huebrowser/zkui/src/zkui/stats.py