How to start two JBoss EAP 6.4 instances on the same machine

I want to start two or more JBoss EAP 6.4 instances in the domain, but when I started the second domain I got this warning:
[Server:server-one] 15:34:35,606 WARN [org.hornetq.core.client]
(hornetq-discovery-group-thread-dg-group1) HQ212034: There are more
than one servers on the network broadcasting the same node id. You
will see this message exactly once (per node) if a node is restarted,
in which case it can be safely ignored. But if it is logged
continuously it means you really do have more than one node on the
same network active concurrently with the same node id. This could
occur if you have a backup node active at the same time as its live
node. nodeID=14bdbf74-f56c-11e4-a65f-738aa3641190
I cannot get this to work.

You must have copied one node's installation from another node, so both share the same messaging node ID. Delete the messagingjournal directory from the data directory on all the nodes and restart all the nodes again.
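A minimal sketch of that cleanup, assuming the default EAP 6 domain layout (paths may differ in your installation):

# stop the host controller and its servers first, then on every node:
rm -rf $JBOSS_HOME/domain/servers/*/data/messagingjournal
# on the next start HornetQ recreates the journal with a fresh node ID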

Related

Fail back from slave to master in Artemis colocated HA config is not working

I have a 4 node Artemis 2.10 cluster on Linux configured for async IO journal, replication and colocated HA servers. I have been testing failover and fail-back but it's not working. When shutting down one server (A) in an HA pair, the colocated backup on the second server (B) will activate and correctly process messages intended for the original server A. I modified the ColocatedFailoverExample from the Artemis examples to check this and it is working. The problem is that when I bring the original server A back up it starts, becomes live, registers acceptors and addresses and joins the cluster, but a number of things are wrong:
Looking at the Artemis console for server A, there is no longer a colocated_backup_1 listed to show that it is providing a colocated backup to server B.
Server A coming back up causes the server that was failed over to, server B, to go totally offline and only function as a backup. The master it was providing stops and no longer displays addresses or acceptors in the UI.
Although it says it's running as a backup, server B doesn't have the colocated_backup_1 object shown in its console either.
Server B still seems to be part of the cluster, but in the UI there is no green master node shown for it anymore - just a red slave node circle. Client connections to server B fail, most likely because the colocated master that was running on it was shut down.
In the Artemis UI for server B, under node-name > cluster-connections > cluster-name, the attributes for the cluster show no nodes in the node array and the node id is wrong. The node id is now the same as the id of the master broker on server A. It's almost as if the information for the colocated_backup_01 broker on server B that was running before failover has replaced the server B live server and there's now only one broker on server B - the colocated backup.
This all happens immediately when I bring up server A. The UI for server B refreshes at that moment: the colocated_backup_01 entry disappears, and the acceptors and addresses links under what was the master broker name for server B disappear as well. The cluster attributes page also refreshes, the 3 nodes that were listed in the "nodes" attribute disappear, and the "nodes" attribute is left empty.
Now if I take down server B instead and bring it back up, the roles between the server pair are swapped: server B becomes live again and is shown as a master node in the topology (but still with no colocated_backup_01 in the UI), while the server A master broker goes offline and server A reconfigures itself as a backup/slave node. Whichever of server A or B is in this "offline", backup-only state, the value of the Node property in the cluster attributes shown in the UI is the same for both. Prior to the failover test they had different node ids, which makes sense, although the colocated_backup_01 backup on each did share the node id of the node it was backing up.
To summarize what I think is happening: the master that is coming back up after failover seems to trigger its partner node to come up as a backup but also to stop being a master itself. From that point, the pair's colocation stops and there is only ever one live master between the two servers instead of one on each. The fail-back feature seems to be not only failing the original master back but also shutting down the colocated master on the backup server. It is almost as if the pair was configured to be colocated but is being treated as the standard two-node HA config where one server is the master and one is the slave.
The only way to fix the issue with the pair is to stop both servers and remove everything under the broker "data" directory on both boxes before starting them again. Just removing the colocated backup files on each machine isn't enough - everything under "data" has to go. After doing this they come up correctly, both are live masters, and they pair up as HA colocated backups for each other again.
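For reference, the cleanup that works is roughly this sketch (the instance path is a placeholder for wherever each broker's data directory lives):

# stop both brokers in the HA pair first, then on each box:
rm -rf /path/to/broker-instance/data/*
# restart both brokers; they come back as live masters and re-pair as colocated backups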
Here is the ha-policy section of the broker.xml file, which is the same for all 4 servers:
<ha-policy>
   <replication>
      <colocated>
         <backup-request-retries>5</backup-request-retries>
         <backup-request-retry-interval>5000</backup-request-retry-interval>
         <max-backups>1</max-backups>
         <request-backup>true</request-backup>
         <backup-port-offset>20</backup-port-offset>
         <master>
            <check-for-live-server>true</check-for-live-server>
            <vote-on-replication-failure>true</vote-on-replication-failure>
         </master>
         <slave>
            <max-saved-replicated-journals-size>-1</max-saved-replicated-journals-size>
            <allow-failback>true</allow-failback>
            <vote-on-replication-failure>true</vote-on-replication-failure>
            <restart-backup>false</restart-backup>
         </slave>
      </colocated>
   </replication>
</ha-policy>

JBoss EAP 6.4: Unable to load topology

In my JBoss web console, the topology view in the Domain tab is empty and I don't know why. Everything is up and running; I just can't see the domain topology in the JBoss console, only "Unable to load topology".
I just got the "Unable to load topology" error today.
I have separate multi-node JBoss domain configurations in my environment:
one is 6 nodes and one is 3 nodes, all running 6.4.22.GA.
The error came up for me when we were switching LDAP user authentication hosts and attempting to do that while leaving the servers up and running as much as possible.
When the domain node was changed to the new LDAP server and brought back up, we got the topology error.
The fix was to bounce jbossas-domain on the other nodes and point them to the new LDAP server. After we did that, the JBoss console was able to display the topology again.
In short, my solution was to make sure all the nodes in the JBoss domain had the same configuration and then bounce them.
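The bounce itself was just a restart of the domain service on each host, roughly as sketched below (assuming the jbossas-domain init script named above, and that each host's configuration already points at the new LDAP server):

# on every host in the domain:
sudo service jbossas-domain restart
# then reload the admin console and check the Domain tab again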

Prevent deployment to entry node, only deploy to other nodes

I have a free OpenShift account with the default 3 gears. On this I have installed the WildFly 8.1 image using the OpenShift web console. I set the minimum and maximum scaling to 3.
What happens now is that OpenShift will create 3 JBoss WildFly instances:
One on the entry node (which is also running HAProxy)
One on an auxiliary node
One on another auxiliary node
The weird thing is that the JBoss WildFly instance on the entry node is by default disabled in the load balancer config (haproxy.conf). BUT, OpenShift still deploys the war archive to it whenever I commit to the associated git repo.
What's extra problematic here is that, because of the incredibly low limit on user processes (250 via ulimit -u), the JBoss WildFly instance on the entry node cannot even start up. During startup JBoss WildFly throws random 'java.lang.OutOfMemoryError: unable to create new native thread' errors (and no, memory is fine; it's the OS process limit).
As a result, the deployment process will hang.
So to summarize:
A JBoss WildFly instance is created on the entry node, but disabled in the load balancer
JBoss WildFly in its default configuration cannot start up on the entry node, not even with a trivial war.
The deployer process attempts to deploy to JBoss WildFly on the entry node, despite it being disabled in the load balancer
Now my question:
How can I modify the deployer process (including the gear start command) to not attempt to deploy to the JBoss WildFly instance on the entry node?
When an app scales from 2 gears to 3, HAProxy stops routing traffic to your application on the head gear and routes it to the two other gears. This ensures that HAProxy gets as much CPU as possible, since the application on your head gear (where HAProxy is running) is no longer serving requests.
The out-of-memory message you're seeing might not be an actual out-of-memory issue but a bug related to ulimit: https://bugzilla.redhat.com/show_bug.cgi?id=1090092.
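To confirm that it is the process limit tripping the head gear, a quick check along these lines (the app name is a placeholder, assuming the rhc client tools are installed):

# SSH into the head gear of the application
rhc ssh myapp
# then, on the gear, print the per-user process limit discussed above
ulimit -u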

ZooKeeper Failover Strategies

We are a young team building an application using Storm and Kafka.
We have a common ZooKeeper ensemble of 3 nodes which is used by both Storm and Kafka.
I wrote test cases to test ZooKeeper failover:
1) Check that all three nodes are running and confirm one is elected as the leader.
2) Using the ZooKeeper command-line client, create a znode and set a value. Verify the value is reflected on the other nodes (sketched below).
3) Modify the znode: set the value on one node and verify the other nodes have the change reflected.
4) Kill one of the worker (follower) nodes and make sure the master/leader is notified about the crash.
5) Kill the leader node. Verify that one of the other two nodes is elected as the new leader.
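Steps 2 and 3 were exercised with the standard zkCli.sh client, roughly as in this sketch (host names, znode path and values are illustrative):

# on node 1: create a znode and then change its value
bin/zkCli.sh -server node1:2181 create /failover-test "v1"
bin/zkCli.sh -server node1:2181 set /failover-test "v2"
# on node 2 and node 3: confirm the latest value is visible
bin/zkCli.sh -server node2:2181 get /failover-test
bin/zkCli.sh -server node3:2181 get /failover-test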
Do I need to add any more test cases? Any additional ideas, suggestions or pointers?
From the Hadoop NameNode HA documentation:
Verifying automatic failover
Once automatic failover has been set up, you should test its operation. To do so, first locate the active NameNode. You can tell which node is active by visiting the NameNode web interfaces -- each node reports its HA state at the top of the page.
Once you have located your active NameNode, you may cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or, you could power cycle the machine or unplug its network interface to simulate a different kind of outage. After triggering the outage you wish to test, the other NameNode should automatically become active within several seconds. The amount of time required to detect a failure and trigger a fail-over depends on the configuration of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as well as the NameNode daemons in order to further diagnose the issue.
more on setting up automatic failover
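Applied to the ZooKeeper ensemble in the question, the same check-and-kill cycle looks roughly like this sketch (host names are placeholders; the stat four-letter-word command must be enabled on newer ZooKeeper releases):

# find out which node is currently the leader
echo stat | nc node1 2181 | grep Mode
echo stat | nc node2 2181 | grep Mode
echo stat | nc node3 2181 | grep Mode
# on the leader, simulate a JVM crash
kill -9 $(pgrep -f QuorumPeerMain)
# re-run the stat checks on the surviving nodes to confirm a new leader was elected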

How to disable HA-JNDI on JBoss 4.0.3sp1

My test bed is 2 servers which both run a service based on JBoss 4.0.3sp1; they are configured as a cluster and have HA-JNDI running between the 2 nodes.
Due to a framework change, I need to shut down the service on one node. How can we shut down HA-JNDI?
I cannot update cluster-service.xml to remove the HA-JNDI definition; that would cause an application start-up error.
thanks,
Emre
Here is an excerpt from the JBoss Clustering documentation:
The java.naming.provider.url JNDI setting can now accept a list of urls separated by commas. Example:
java.naming.provider.url=server1:1100,server2:1100,server3:1100,server4:1100
When initialising, the JNP client code will try to get in touch with each server from the list, one after the other, stopping as soon as one server has been reached.
So set it to a server that is up.
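On the client side, that can be done by pointing the JNP provider URL only at the node that stays up; a minimal sketch, assuming the standard JBoss 4 JNP client properties (the main class and jar names are placeholders):

# run the JNDI client against the node that remains up
java -cp myclient.jar:jbossall-client.jar \
     -Djava.naming.factory.initial=org.jnp.interfaces.NamingContextFactory \
     -Djava.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces \
     -Djava.naming.provider.url=server1:1100 \
     com.example.MyClient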
I hope this helps.