jboss-eap-6 HA singleton deploying multiple web archives in standalone configuration - jboss

I can deploy my EAR and WARs to my standalone cluster. Two of the WARs are HA singletons. Soon after starting the first standalone jboss-eap-6 instance, I start the second. Once all applications have deployed successfully, I open JConsole and see that one singleton WAR is running on the first jboss-eap-6 instance and the other singleton WAR is running on the second. JConsole also shows only one jboss-eap-6 instance reporting as primary.
My question: is there some setting in the jboss-eap-6 standalone.xml that forces a single instance to run both HA singleton WARs, or would I have to package the WARs into an EAR?

I don't think there is anything in standalone.xml that would change the behaviour of a war. In any case you should be using standalone-ha.xml for a cluster with HA singletons deployed.
The JBoss High Availability Singleton architecture changed significantly between JBoss EAP 5 and 6.
Under JBoss EAP 5 you just placed your deployable object in the special deploy-hasingleton deployment folder. Under JBoss EAP 6 your classes need to implement a JBoss service, specifically org.jboss.msc.service.Service, along with an org.jboss.msc.service.ServiceActivator. The implementation of these service classes controls the instantiation and management of your HA singleton. I have not tried deploying an HA singleton as a WAR, and I have some doubts, because I suspect the dependent service classes may not be available in the web container.
The ServiceActivator is responsible for managing the lifecycle of the Service. The ServiceActivator implementation class needs to be listed in the file META-INF/services/org.jboss.msc.service.ServiceActivator for JBoss to activate it during startup / deployment.
Example:
Create a Service Activator
import org.jboss.msc.service.DelegatingServiceContainer;
import org.jboss.msc.service.ServiceActivator;
import org.jboss.msc.service.ServiceActivatorContext;
import org.jboss.msc.service.ServiceController;
import org.jboss.msc.service.ServiceName;

// Concrete (not abstract) so that JBoss can instantiate it from the META-INF/services entry.
public class SingletonActivator implements ServiceActivator {

    public SingletonService<String> instantiateSingleton() {
        return new SingletonService<String>();
    }

    public ServiceName getServiceName() {
        return ServiceName.JBOSS.append("my", "ha", "singleton");
    }

    /**
     * Called by JBoss when the deployment is activated. Wraps our own service in the
     * clustering SingletonService and installs it under getServiceName(), the
     * singleton service name that is registered in the JBoss cluster.
     *
     * @param context the activation context supplied by JBoss
     */
    @Override
    public final void activate(ServiceActivatorContext context) {
        SingletonService<String> service = instantiateSingleton();
        // Fully qualified because our own service class is also called SingletonService.
        org.jboss.as.clustering.singleton.SingletonService<String> singleton =
                new org.jboss.as.clustering.singleton.SingletonService<String>(service, getServiceName());
        /*
         * The NamePreference is a combination of the node name (-Djboss.node.name) and the name of
         * the configured cache "singleton". If there is more than 1 node, it is possible to add more than
         * one name and the election will use the first available node in that list.
         */
        // e.g. singleton.setElectionPolicy(new PreferredSingletonElectionPolicy(new SimpleSingletonElectionPolicy(), new NamePreference("node1/singleton")));
        // or singleton.setElectionPolicy(new PreferredSingletonElectionPolicy(new SimpleSingletonElectionPolicy(), new NamePreference("node1/singleton"), new
        // NamePreference("node2/singleton")));
        singleton.build(new DelegatingServiceContainer(context.getServiceTarget(), context.getServiceRegistry()))
                .setInitialMode(ServiceController.Mode.ACTIVE)
                .install();
    }
}
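The comment in the activator says that, with several name preferences, the election uses the first available node in the list. As a rough illustration of that selection rule only (plain Java, not the JBoss election API; the class and method names here are made up for the sketch, and the fallback to the first available node is an assumption about how a simple delegate policy behaves):

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: "the first preferred node that is available wins". */
class PreferredElectionSketch {

    /**
     * Returns the first preferred name that is currently available. If none of
     * the preferred names is available, falls back to the first available node.
     * Returns null when no node is available at all.
     */
    static String elect(List<String> preferred, List<String> available) {
        if (available.isEmpty()) {
            return null; // nobody can run the singleton
        }
        for (String name : preferred) {
            if (available.contains(name)) {
                return name;
            }
        }
        return available.get(0); // assumed fallback behaviour
    }

    public static void main(String[] args) {
        List<String> preferred = Arrays.asList("node1/singleton", "node2/singleton");
        // node1 is down, so the second preference is chosen
        System.out.println(elect(preferred, Arrays.asList("node2/singleton", "node3/singleton")));
        // no preferred node is up, so the fallback is used
        System.out.println(elect(preferred, Arrays.asList("node3/singleton")));
    }
}
```

With both preferences listed, losing node1 moves the singleton to node2 rather than to an arbitrary node.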
Create an HA singleton service class that is solely responsible for looking up and invoking the EJB containing your business logic
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.ejb.EJBException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import org.jboss.logging.Logger;
import org.jboss.msc.service.Service;
import org.jboss.msc.service.StartContext;
import org.jboss.msc.service.StartException;
import org.jboss.msc.service.StopContext;

public class SingletonService<T> implements Service<T> {
    // Assumption: JBoss Logging; the original listing does not show which logger it uses.
    private static final Logger logger = Logger.getLogger(SingletonService.class);
    protected ScheduledExecutorService deployDelayThread = null;
    /** The node we are running on */
    protected String nodeName;
    /** A flag whether the service is started (or scheduled to be started) */
    protected final AtomicBoolean started = new AtomicBoolean(false);
    /** The EJB with the business logic; JmxMBean is your own bean interface */
    private JmxMBean bean;

    // Service<T> also requires getValue(); return whatever value your service exposes.

    /**
     * Container life cycle call upon activation. This will construct the singleton instance in this JVM and start the Timer.
     */
    @Override
    public final void start(StartContext context) throws StartException {
        this.nodeName = System.getProperty("jboss.node.name");
        logger.info("Starting service '" + this.getClass().getName() + "' on node " + nodeName);
        if (!started.compareAndSet(false, true)) {
            throw new StartException("The service " + this.getClass().getName() + " is already started!");
        }
        // MSC does not allow this thread to be blocked, so the real work is scheduled on a separate thread and start() returns immediately.
        // We delay the actual deployment of the singleton for a few seconds to allow time for an HA singleton election to be held and won by one of the instances.
        // If the winner is not this instance (prior to deployment) then stop(context) is invoked, which sets started to false, and the deployment does not occur.
        // context.complete() is only needed after context.asynchronous(), so neither is called here.
        deployDelayThread = Executors.newSingleThreadScheduledExecutor();
        deployDelayThread.schedule(new StartSingletonAsync(context), 10, TimeUnit.SECONDS);
    }

    /** Introduces a 10s delay in starting the singleton bean, giving time for the HA singleton election to be held and won */
    private class StartSingletonAsync implements Runnable {
        private StartSingletonAsync(StartContext context) {
        }

        @Override
        public void run() {
            try {
                startSingletonBean();
            } catch (StartException e) {
                logger.info("Start Exception", e);
            }
            // be nice to the garbage collector, we don't need this any more
            deployDelayThread.shutdown();
            deployDelayThread = null;
        }
    }

    private void startSingletonBean() throws StartException {
        try {
            if (!started.get()) {
                throw new StartException("Aborted due to service stopping");
            }
            // Start your EJB
            InitialContext ic = new InitialContext();
            bean = (JmxMBean) ic.lookup(getJndiName());
            bean.startHaSingleton();
            logger.info("*** Master Only: HASingleton service " + getJndiName() + " started on master:" + nodeName);
            if (!bean.isRunning()) {
                logger.error("ERROR Bean should be running");
            }
        } catch (NamingException e) {
            throwStartException(e);
        }
    }

    private void throwStartException(Exception e) throws StartException {
        String message = "Could not initialize HASingleton " + getJndiName() + " on " + nodeName;
        logger.error(message, e);
        throw new StartException(message, e);
    }

    /**
     * Container life cycle call when the service is stopped
     */
    @Override
    public final void stop(StopContext context) {
        if (deployDelayThread != null) {
            deployDelayThread.shutdownNow();
        }
        if (!started.compareAndSet(true, false) || bean == null) {
            logger.warn("The service '" + this.getClass().getName() + "' is not active!");
        } else {
            try {
                InitialContext ic = new InitialContext();
                bean = (JmxMBean) ic.lookup(getJndiName());
                bean.stopHaSingleton();
                logger.info("*** Master Only: HASingleton service " + getJndiName() + " stopped on master:" + nodeName);
            } catch (EJBException e) {
                // Note: all these exceptions are already logged by JBoss
            } catch (NamingException e) {
                logger.error("Could not stop HASingleton service " + getJndiName() + " on " + nodeName, e);
            }
            logger.info("MASTER ONLY HASingleton service '" + this.getClass().getName() + "' Stopped on node " + nodeName);
        }
    }

    private String getJndiName() {
        return "java:global/path/to/your/singleton/ejb";
    }
}
Finally, list your Activator class in META-INF/services/org.jboss.msc.service.ServiceActivator
com.mycompany.singletons.SingletonActivator
You may also need to add dependencies to the manifest META-INF/MANIFEST.MF file inside your jar as follows: Dependencies: org.jboss.msc, org.jboss.as.clustering.singleton, org.jboss.as.server
There is a more extensive implementation guide available from Red Hat at https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Application_Platform/6.4/html/Development_Guide/Implement_an_HA_Singleton.html. You may need to create a Red Hat account to access it. There is also a quickstart example in the JBoss distribution.

So after further evaluation, the 2 singletons eventually would merge together after a few minutes, thus creating 1 intended singleton.

Related

How to create Job in Kubernetes using Java API

I am able to create a job in the Kubernetes cluster using the CLI (https://kubernetesbyexample.com/jobs/).
Is there a way to create a job inside the cluster using the Java API?
You can use the Kubernetes Java client to create any object, such as a Job. Borrowing from one of the client's examples:
/*
* Creates a simple run to complete job that computes π to 2000 places and prints it out.
*/
public class JobExample {
private static final Logger logger = LoggerFactory.getLogger(JobExample.class);
public static void main(String[] args) {
final ConfigBuilder configBuilder = new ConfigBuilder();
if (args.length > 0) {
configBuilder.withMasterUrl(args[0]);
}
try (KubernetesClient client = new DefaultKubernetesClient(configBuilder.build())) {
final String namespace = "default";
final Job job = new JobBuilder()
.withApiVersion("batch/v1")
.withNewMetadata()
.withName("pi")
.withLabels(Collections.singletonMap("label1", "maximum-length-of-63-characters"))
.withAnnotations(Collections.singletonMap("annotation1", "some-very-long-annotation"))
.endMetadata()
.withNewSpec()
.withNewTemplate()
.withNewSpec()
.addNewContainer()
.withName("pi")
.withImage("perl")
.withArgs("perl", "-Mbignum=bpi", "-wle", "print bpi(2000)")
.endContainer()
.withRestartPolicy("Never")
.endSpec()
.endTemplate()
.endSpec()
.build();
logger.info("Creating job pi.");
client.batch().jobs().inNamespace(namespace).createOrReplace(job);
// Get All pods created by the job
PodList podList = client.pods().inNamespace(namespace).withLabel("job-name", job.getMetadata().getName()).list();
// Wait for pod to complete
client.pods().inNamespace(namespace).withName(podList.getItems().get(0).getMetadata().getName())
.waitUntilCondition(pod -> pod.getStatus().getPhase().equals("Succeeded"), 1, TimeUnit.MINUTES);
// Print Job's log
String joblog = client.batch().jobs().inNamespace(namespace).withName("pi").getLog();
logger.info(joblog);
} catch (KubernetesClientException e) {
logger.error("Unable to create job", e);
} catch (InterruptedException interruptedException) {
logger.warn("Thread interrupted!");
Thread.currentThread().interrupt();
}
}
}
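One caveat about the example above: it reads podList.getItems().get(0) right after creating the job, and the pod list can still be empty at that moment because the pods are created asynchronously. A minimal sketch of a generic polling helper that could wrap such a lookup (plain Java; the class and method names are hypothetical and not part of any Kubernetes client):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustration only: poll a lookup until it yields a value or a timeout passes. */
class PollSketch {

    /**
     * Calls the supplier repeatedly until it returns a non-null value.
     * Returns null if the timeout elapses or the thread is interrupted.
     */
    static <T> T pollUntilPresent(Supplier<T> lookup, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            T value = lookup.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() >= deadline) {
                return null; // gave up waiting
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            }
        }
    }
}
```

In the job example, the supplier would list the pods with the job-name label and return the first item, or null while the list is still empty.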
If you want to launch a job using a static manifest yaml from inside the cluster, it should be easy using the official library.
This code worked for me.
ApiClient client = ClientBuilder.cluster().build(); //create in-cluster client
Configuration.setDefaultApiClient(client);
BatchV1Api api = new BatchV1Api(client);
V1Job job = new V1Job();
job = (V1Job) Yaml.load(new File("/tmp/template.yaml")); //load static yaml file
ApiResponse<V1Job> response = api.createNamespacedJobWithHttpInfo("default", job, "true", null, null);
You can also modify any kind of information of the job before launching it with the combination of getter and setter.
// set metadata-name
job.getMetadata().setName("newName");
// set spec-template-metadata-name
job.getSpec().getTemplate().getMetadata().setName("newName");

Apache Curator : No leader is getting selected intermittently

I am using the Apache Curator leader election recipe (https://curator.apache.org/curator-recipes/leader-election.html) in my application.
Zookeeper version : 3.5.7
Curator : 4.0.1
Below is the sequence of steps:
1. Whenever my Tomcat server instance starts up, I create a single CuratorFramework instance (one instance per Tomcat server) and start it:
CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
client.start();
if(!client.blockUntilConnected(10, TimeUnit.MINUTES)){
LOGGER.error("Zookeeper connection could not establish!");
throw new RuntimeException("Zookeeper connection could not establish");
}
2. Create an instance of LSAdapter and start it:
LSAdapter adapter = new LSAdapter(client, <some_metadata>);
adapter.start();
Below is my LSAdapter class :
public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {
//<Class instance variables defined>
public LSAdapter(CuratorFramework client, <some_metadata>) {
leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
leaderSelector.autoRequeue();
}
public void start() throws IOException {
leaderSelector.start();
}
@Override
public void close() throws IOException {
leaderSelector.close();
}
@Override
public void takeLeadership(CuratorFramework client) throws Exception {
final int waitSeconds = (int) (5 * Math.random()) + 1;
LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
while (true) {
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
//do leader tasks
} catch (InterruptedException e) {
LOGGER.error(name + " was interrupted.");
//cleanup
Thread.currentThread().interrupt();
} finally {
}
}
}
}
3. When the server instance shuts down, close the LSAdapter instance (which the application is using) and close the CuratorFramework client that was created:
CloseableUtils.closeQuietly(lsAdapter);
curatorFrameworkClient.close();
The issue is that sometimes, when a server is restarted, no leader gets elected. I verified this by tracing the log inside takeLeadership(). I have two Tomcat server instances running the code above, both connecting to the same ZooKeeper quorum. Most of the time one instance becomes leader, but when this issue occurs both remain followers. Please suggest what I am doing wrong.
As I answered on Curator's Jira, you are swallowing the interrupted exception. When you get InterruptedException you must exit your takeLeadership(). In your code example, you are merely resetting the interrupted state and continuing the loop - this will cause an infinite loop of interrupted exceptions, btw. After calling Thread.currentThread().interrupt(); you should exit the while loop.
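The loop shape described in the answer can be sketched without Curator. In the following, leadLoop is a hypothetical stand-in for the body of takeLeadership(), and the worker thread plays the role of the thread that Curator runs the listener on; the point is only that the method returns once it has been interrupted:

```java
/** Illustration only: a leader loop that exits when its thread is interrupted. */
class LeadershipLoopSketch {

    /** Stand-in for takeLeadership(): returning from it is what gives up leadership. */
    static void leadLoop(long sleepMillis) {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(sleepMillis);
                // do leader tasks
            }
        } catch (InterruptedException e) {
            // restore the flag for the caller, then fall through and return
            Thread.currentThread().interrupt();
        }
    }

    /** Runs the loop on a worker thread, interrupts it, and reports whether it exited. */
    static boolean exitsOnInterrupt() {
        Thread worker = new Thread(() -> leadLoop(20));
        worker.start();
        try {
            Thread.sleep(100);
            worker.interrupt();
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("loop exited after interrupt: " + exitsOnInterrupt());
    }
}
```

Putting the try/catch outside the while loop, rather than inside it as in the question, is what makes the interrupt end the loop instead of restarting it.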

Netty Client Connect with Server, but server does not fire channelActive/Registered

I have the following architecture in use:
- [Client] - The enduser connecting to our service.
- [GameServer] - The game server on which the game is running.
- [GameLobby] - A server that is responsible for matching Clients with a GameServer.
If, for example, four Clients want to play a game and get matched through a GameLobby, all of these connections succeed the first time.
However, when they decide to rematch, one of the Clients does not connect properly.
All Clients connect to the GameServer simultaneously.
Clients that rematch first close their current connection with the GameServer and head into the lobby again.
The new connection succeeds and no errors are thrown. Even the ChannelFuture reports that the client connection was made properly; the following values show that the client thinks the connection is fine:
- ChannelFuture.isSuccess() = True
- ChannelFuture.isDone() = True
- ChannelFuture.cause() = Null
- ChannelFuture.isCancelled() = False
- Channel.isOpen() = True
- Channel.isActive() = True
- Channel.isRegistered() = True
- Channel.isWritable() = True
So, according to the Client, the connection was made properly. However, in the SimpleChannelInboundHandler on the GameServer, channelRegistered/channelActive is never called for that specific Client, only for the other three Clients.
All four Clients, the GameServer, and the Lobby run on the same IP address.
Since it only happens when (re)connecting to the GameServer, I thought it had to do with not closing the connection properly. Currently this is done through:
try {
group.shutdownGracefully();
channel.closeFuture().sync();
} catch (InterruptedException e) {
e.printStackTrace();
}
On the GameServer, channelUnregistered is called, so this works and the connection is destroyed.
I have tried adding listeners to the ChannelFuture of the malfunctioning connection; however, according to the ChannelFuture everything works, which is not the case.
I also tried adding ChannelOptions to allow more Clients to be queued on the server.
GameServer
The GameServer is initialized as follows:
// Create the bootstrap to make this act like a server.
ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.group(bossGroup)
.channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitialisation(new ClientInputReader(gameThread)))
.option(ChannelOption.SO_BACKLOG, 1000)
.childOption(ChannelOption.SO_KEEPALIVE, true)
.childOption(ChannelOption.TCP_NODELAY, true);
bossGroup.execute(gameThread); // Executing the thread that handles all games on this GameServer.
// Launch the server with the specific port.
serverBootstrap.bind(port).sync();
The GameServer ClientInputReader
@ChannelHandler.Sharable
public class ClientInputReader extends SimpleChannelInboundHandler<Packet> {
private ServerMainThread serverMainThread;
public ClientInputReader(ServerMainThread serverMainThread) {
this.serverMainThread = serverMainThread;
}
@Override
public void channelRegistered(ChannelHandlerContext ctx) throws Exception {
System.out.println("[Connection: " + ctx.channel().id() + "] Channel registered");
super.channelRegistered(ctx);
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Packet handling
}
}
The malfunctioning connection does not trigger any method of the SimpleChannelInboundHandler, not even exceptionCaught.
The GameServer ChannelInitialisation
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler channelInputReader;
public ChannelInitialisation(SimpleChannelInboundHandler channelInputReader) {
this.channelInputReader = channelInputReader;
}
@Override
protected void initChannel(SocketChannel ch) throws Exception {
ChannelPipeline pipeline = ch.pipeline();
// every packet is prefixed with the amount of bytes that will follow
pipeline.addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
pipeline.addLast(new LengthFieldPrepender(4));
pipeline.addLast(new PacketEncoder(), new PacketDecoder(), channelInputReader);
}
}
Client
Client creating a GameServer connection:
// Configure the client.
group = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.handler(new ChannelInitialisation(channelHandler));
// Start the client.
channel = b.connect(address, port).await().channel();
/* At this point, the client thinks that the connection was successful, as the channel is active, open, registered and writable... */
ClientInitialisation:
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler<Packet> channelHandler;
ChannelInitialisation(SimpleChannelInboundHandler<Packet> channelHandler) {
this.channelHandler = channelHandler;
}
@Override
public void initChannel(SocketChannel ch) throws Exception {
// prefix messages by the length
ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
ch.pipeline().addLast(new LengthFieldPrepender(4));
// our encoder, decoder and handler
ch.pipeline().addLast(new PacketEncoder(), new PacketDecoder(), channelHandler);
}
}
ClientHandler:
public class ClientPacketHandler extends SimpleChannelInboundHandler<Packet> {
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
super.channelActive(ctx);
System.out.println("Channel active: " + ctx.channel().id());
ctx.channel().writeAndFlush(new PacketSetupClientToGameServer());
System.out.println("Sending setup packet to the GameServer: " + ctx.channel().id());
// This is successfully called, as the client thinks the connection was properly made.
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Reading packets.
}
}
I expect the Client to connect properly to the server, since the other Clients connect properly and this Client could previously connect just fine.
TL;DR: When multiple Clients try to create a new match, one or more Clients may fail to connect properly to the server after their previous connection was closed.
For anyone who struggles with this issue in one way or another:
I found a workaround that lets me continue even though, as far as I am concerned, there is still a bug inside the Netty framework. The workaround is quite simple: create a connection pool.
My solution uses a maximum of five connections in the pool. If one of the connections gets no reply from the GameServer, it is not a big deal, since the four others have a high chance of succeeding. I know this is a bad workaround, but I could not find any information on this issue. It works and adds a maximum delay of 5 seconds (each retry takes one second).
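The workaround's code is not shown, so the following is only a guess at its shape: try up to a fixed number of connection attempts in turn, give each one a short window to get a reply, and keep the first that answers. It is written in plain Java without Netty; the class name, the method name and the Callable standing in for "connect and wait for the GameServer's reply" are all assumptions made for the sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntFunction;

/** Illustration only: sequential connection attempts with a per-attempt timeout. */
class ConnectRetrySketch {

    /**
     * Runs attempt 0, 1, 2, ... until one reports a reply within timeoutMillis.
     * Returns the index of the first responsive attempt, or -1 if none replied.
     */
    static int firstResponsive(IntFunction<Callable<Boolean>> attemptFactory,
                               int maxAttempts, long timeoutMillis) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < maxAttempts; i++) {
                Future<Boolean> reply = executor.submit(attemptFactory.apply(i));
                try {
                    if (Boolean.TRUE.equals(reply.get(timeoutMillis, TimeUnit.MILLISECONDS))) {
                        return i;
                    }
                } catch (TimeoutException | ExecutionException e) {
                    reply.cancel(true); // drop this connection and try the next one
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
            return -1;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With five attempts and a one-second timeout, the worst case matches the 5-second delay mentioned above.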

Distributed state-machine's zookeeper ensemble fails while processing parallel regions with error KeeperErrorCode = BadVersion

Background:
Diagram: [state machine UML state diagram]
We have a regular state machine, as depicted in the diagram, that monitors a Spring Batch microservice (deployed in a streams source/processor/sink design) for each batch that is started.
We receive a sequence of REST calls that internally fire events, per batch id, on the respective batch's machine object; that is, a new state machine object is created per batch id.
Each machine has n parallel regions (representing Spring Batch chunks), as also shown in the diagram.
The REST calls come from a multi-threaded environment, where two simultaneous calls for the same batchId may arrive for different region ids of the BATCHPROCESSING state.
Until now we ran a single node (single installation) of this state machine microservice, but now we want to deploy it on multiple instances to receive REST calls.
For this we want to introduce the Distributed State Machine. We have the configuration below in place for running it.
@Configuration
@EnableStateMachine
public class StateMachineUMLWayConfiguration extends
StateMachineConfigurerAdapter<String, String> {
..
..
@Override
public void configure(StateMachineModelConfigurer<String,String> model)
throws Exception {
model
.withModel()
.factory(stateMachineModelFactory());
}
@Bean
public StateMachineModelFactory<String,String> stateMachineModelFactory() {
StorehubBatchUmlStateMachineModelFactory factory =null;
try {
factory = new StorehubBatchUmlStateMachineModelFactory
(templateUMLInClasspath,stateMachineEnsemble());
} catch (Exception e) {
LOGGER.info("Config's State machine factory got exception :" + factory);
}
LOGGER.info("Config's State machine factory method Called:"+factory);
factory.setStateMachineComponentResolver(stateMachineComponentResolver());
return factory;
}
@Override
public void configure(StateMachineConfigurationConfigurer<String, String> config) throws Exception {
config
.withDistributed()
.ensemble(stateMachineEnsemble());
}
@Bean
public StateMachineEnsemble<String, String> stateMachineEnsemble() throws
Exception {
return new ZookeeperStateMachineEnsemble<String, String>(curatorClient(), "/batchfoo1", true, 512);
}
@Bean
public CuratorFramework curatorClient() throws Exception {
CuratorFramework client =
CuratorFrameworkFactory.builder().defaultData(new byte[0])
.retryPolicy(new ExponentialBackoffRetry(1000, 3))
.connectString("localhost:2181").build();
client.start();
return client;
}
StorehubBatchUmlStateMachineModelFactory's build method:
@Override
public StateMachineModel<String, String> build(String batchChunkId) {
Model model = null;
try {
model = UmlUtils.getModel(getResourceUri(resolveResource(batchChunkId)).getPath());
} catch (IOException e) {
throw new IllegalArgumentException("Cannot build model from resource " + resource + " or location " + location, e);
}
UmlModelParser parser = new UmlModelParser(model, this);
DataHolder dataHolder = parser.parseModel();
ConfigurationData<String, String> configurationData = new ConfigurationData<String, String>( null, new SyncTaskExecutor(),
new ConcurrentTaskScheduler() , false, stateMachineEnsemble,
new ArrayList<StateMachineListener<String, String>>(), false,
null, null,
null, null, false,
null , batchChunkId, null,
null ) ;
return new DefaultStateMachineModel<String, String>(configurationData, dataHolder.getStatesData(), dataHolder.getTransitionsData());
}
We created a new custom service method to use in place of DefaultStateMachineService.acquireStateMachine(machineId):
@Override
public StateMachine<String, String> acquireDistributedStateMachine(String machineId, boolean start) {
synchronized (distributedMachines) {
DistributedStateMachine<String,String> distributedStateMachine = distributedMachines.get(machineId);
StateMachine<String,String> distMachineDelegateX = null;
if (distributedStateMachine == null) {
StateMachine<String, String> machine = stateMachineFactory.getStateMachine(machineId);
distributedStateMachine = (DistributedStateMachine<String, String>) machine;
}
distributedMachines.put(machineId, distributedStateMachine);
return handleStart(distributedStateMachine, start);
}
}
Problem:
The microservice deployed as a single instance runs successfully, even when the events it receives come from a multi-threaded environment in which one thread issues the event REST call for region 1 while another thread simultaneously issues one for region 2 of the same batch. The machine proceeds in sync, processing the parallel regions successfully, until its last state, BATCHCOMPLETED.
We also checked on the ZooKeeper side that the BATCHCOMPLETED state was recorded in the node's current version at the end.
However, when we deploy the same microservice app jar in another location as a second instance that also accepts event REST calls (say, by listening on another Tomcat port, 9002), it fails at a random point partway through. The failure occurs after any one of the events among the parallel regions is fired, when ensemble.setState() is called internally on the state change for that event.
It gives the following error:
o.s.s.support.AbstractStateMachine : Interceptors threw exception, skipping state change
org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
at org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble.setState(ZookeeperStateMachineEnsemble.java:241) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
at org.springframework.statemachine.ensemble.DistributedStateMachine$LocalStateMachineInterceptor.preStateChange(DistributedStateMachine.java:209) ~[spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.StateMachineInterceptorList.preStateChange(StateMachineInterceptorList.java:101) ~[spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.callPreStateChangeInterceptors(AbstractStateMachine.java:859) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.switchToState(AbstractStateMachine.java:880) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.access$500(AbstractStateMachine.java:81) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine$3.transit(AbstractStateMachine.java:335) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.handleTriggerTrans(DefaultStateMachineExecutor.java:286) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.handleTriggerTrans(DefaultStateMachineExecutor.java:211) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.processTriggerQueue(DefaultStateMachineExecutor.java:449) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.access$200(DefaultStateMachineExecutor.java:65) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor$1.run(DefaultStateMachineExecutor.java:323) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.3.13.RELEASE.jar!/:4.3.13.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.scheduleEventQueueProcessing(DefaultStateMachineExecutor.java:352) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.execute(DefaultStateMachineExecutor.java:163) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.sendEventInternal(AbstractStateMachine.java:603) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.sendEvent(AbstractStateMachine.java:218) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.ensemble.DistributedStateMachine.sendEvent(DistributedStateMachine.java:108)
..skipping Lines....
Caused by: org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
at org.springframework.statemachine.zookeeper.ZookeeperStateMachinePersist.write(ZookeeperStateMachinePersist.java:113) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
at org.springframework.statemachine.zookeeper.ZookeeperStateMachinePersist.write(ZookeeperStateMachinePersist.java:50) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
at org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble.setState(ZookeeperStateMachineEnsemble.java:235) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
... 73 common frames omitted
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
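For readers unfamiliar with the error code: ZooKeeper raises BadVersion when a write names an expected node version that no longer matches the node's current version, which is what happens when two writers read the same version and both try to write. The sketch below only models that rule with an in-memory stand-in (all class and method names are invented for illustration); it does not reproduce Spring Statemachine's or ZooKeeper's code, and the thread itself does not confirm that this is the cause here.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Illustration only: an in-memory node with version-checked writes. */
class VersionedNodeSketch {

    /** Immutable (data, version) pair, loosely like a znode's data and its version. */
    static final class Versioned {
        final String data;
        final int version;

        Versioned(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private final AtomicReference<Versioned> node =
            new AtomicReference<>(new Versioned("", 0));

    Versioned read() {
        return node.get();
    }

    /** Conditional write: returns false (the "BadVersion" case) on a stale version. */
    boolean setData(String data, int expectedVersion) {
        Versioned current = node.get();
        if (current.version != expectedVersion) {
            return false;
        }
        return node.compareAndSet(current, new Versioned(data, current.version + 1));
    }

    /** Re-reads the current version before each attempt, up to maxRetries extra tries. */
    boolean setDataWithRetry(String data, int maxRetries) {
        for (int i = 0; i <= maxRetries; i++) {
            if (setData(data, read().version)) {
                return true;
            }
        }
        return false;
    }
}
```

Two writers that both read version 0 illustrate the failure: the first write succeeds and bumps the version to 1, and the second write, still naming version 0, is rejected.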
Question:
1. Does the configuration above need anything more to avoid this exception?
Both state machine microservice instances were tested connecting to the same ZooKeeper instance (the same .connectString("localhost:2181").build()) and also connecting to different ZooKeeper instances ('localhost:2181' and 'localhost:2182').
The same BadVersion exception occurs during the state machine ensemble's processing in both cases.
2. If batches run in parallel, their respective machines also need to be created and run in parallel on the state machine microservice side.
So technically we need a new state machine for each new batchId, running simultaneously.
But looking at ZookeeperStateMachineEnsemble, one znode path seems to be associated with one ensemble, and the ensemble object is instantiated once in the main config class ("StateMachineUMLWayConfiguration").
So is it expected that only that singleton ensemble instance is used? Can't multiple ensembles be created at run time, referencing different znode paths, and run in parallel so that each Distributed State Machine logs its states to its own znode path?
a. Batches running in parallel would need separate znode paths. Because we try to keep a separate znode path per batch, we need a separate ensemble instantiated per batch's machine. But that seems to run into a lock condition while connecting to the znode through the Curator client.
b. The REST call fired to trigger the event does not complete, because the machine it acquired is stuck in the ensemble trying to connect.
Thanks in advance.

Pax Exam how to start multiple containers

For a project I'm working on, we need to write Pax Exam integration tests that run across multiple Karaf containers.
The idea is to find a way to extend/configure Pax Exam to start up one or more Karaf containers, deploy a bunch of bundles there, and then start the test Karaf container, which tests the functionality.
We need this for performance tests and other things.
Does anyone know anything about this? Is it actually possible in Pax Exam?
I am answering my own question after finding this interesting article.
In particular, have a look at the sections "Using the Karaf Shell" and "Distributed integration tests in Karaf":
http://planet.jboss.org/post/advanced_integration_testing_with_pax_exam_karaf
This is basically what the article says:
First of all, you have to change the test probe header to allow dynamic package imports:
@ProbeBuilder
public TestProbeBuilder probeConfiguration(TestProbeBuilder probe) {
probe.setHeader(Constants.DYNAMICIMPORT_PACKAGE, "*;status=provisional");
return probe;
}
After that, the article suggests the following code that is able to execute commands in the Karaf shell
@Inject
CommandProcessor commandProcessor;

// Assumptions: the snippet uses these two names without declaring them.
private static final long COMMAND_TIMEOUT = 10000L;
private final ExecutorService executor = Executors.newCachedThreadPool();

protected String executeCommands(final String... commands) {
    String response;
    final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    final PrintStream printStream = new PrintStream(byteArrayOutputStream);
    final CommandSession commandSession = commandProcessor.createSession(System.in, printStream, System.err);
    // Run the commands on a separate thread so that a hanging command can be timed out.
    FutureTask<String> commandFuture = new FutureTask<String>(
        new Callable<String>() {
            public String call() {
                try {
                    for (String command : commands) {
                        System.err.println(command);
                        commandSession.execute(command);
                    }
                } catch (Exception e) {
                    e.printStackTrace(System.err);
                }
                return byteArrayOutputStream.toString();
            }
        });
    try {
        executor.submit(commandFuture);
        response = commandFuture.get(COMMAND_TIMEOUT, TimeUnit.MILLISECONDS);
    } catch (Exception e) {
        e.printStackTrace(System.err);
        response = "SHELL COMMAND TIMED OUT: ";
    }
    return response;
}
Then the rest is fairly straightforward: you have to implement a layer that can start up child instances of Karaf:
public void createInstances() {
//Install broker feature that is provided by FuseESB
executeCommands("admin:create --feature broker brokerChildInstance");
//Install producer feature that is provided by an imaginary feature repo.
executeCommands("admin:create --featureURL mvn:imaginary/repo/1.0/xml/features --feature producer producerChildInstance");
//Install consumer feature that is provided by an imaginary feature repo.
executeCommands("admin:create --featureURL mvn:imaginary/repo/1.0/xml/features --feature consumer consumerChildInstance");
//start child instances
executeCommands("admin:start brokerChildInstance");
executeCommands("admin:start producerChildInstance");
executeCommands("admin:start consumerChildInstance");
//You will need to destroy the child instances once you are done.
//Using @After seems the right place to do that.
}
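The closing comment in createInstances() says the child instances have to be destroyed afterwards. A minimal sketch of that cleanup step follows, assuming the Karaf admin:stop and admin:destroy shell commands are available in the container under test. To keep the sketch runnable on its own, executeCommands here is a hypothetical stub that only records the commands; in the real test it would be the shell-backed method from the article, and destroyInstances() would carry JUnit's @After annotation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: tear down the child instances created for the test. */
class ChildInstanceCleanupSketch {

    /** Commands "sent" so far; stands in for the Karaf shell in this sketch. */
    final List<String> executed = new ArrayList<>();

    /** Hypothetical stub for executeCommands(...); records instead of executing. */
    String executeCommands(String... commands) {
        executed.addAll(Arrays.asList(commands));
        return "";
    }

    /** Stops and then destroys each child instance, in the order they were created. */
    void destroyInstances() {
        List<String> names = Arrays.asList(
                "brokerChildInstance", "producerChildInstance", "consumerChildInstance");
        for (String name : names) {
            executeCommands("admin:stop " + name, "admin:destroy " + name);
        }
    }
}
```

Stopping each instance before destroying it keeps the teardown symmetrical with the create/start sequence above.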