JBPM - Process instance XXX is disconnected - jboss

We are facing error 'Process instance XXX is disconnected' very frequently in our project and holding up the tasks operations.
We are using SynchronizedTaskService for task operation:
Code snippet are below:
final RuntimeManager runtimeManager = RuntimeEngineFacory.getRuntimeManager();
final RuntimeEngine engine = runtimeManager.getRuntimeEngine(EmptyContext.get());
SynchronizedTaskService taskService = (SynchronizedTaskService) engine.GetTaskService();
It was raised in one of JBPM bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1161574
Please help if anyone has any clue.

After lots of research and finding (including different community discussion) came to little close to this issue solution.
Below are the main reason of this:
Using the (runtime manager) Singleton strategy with JTA transactions
(UserTransaction or CMT) is not recommended because there is a race
condition when using this. This race condition can result in an
IllegalStateException with a message similar to "Process instance XXX
is disconnected."
If you are using Singleton Strategy then make sure you are Synchronizing your call to JBPM.
Solution
Better to use Per Process Runtime Strategy so that JBPM engine will make sure strict relationship between Process Instance and Session. Session will remain associated till entire lifetime of Process Instance. This will also assure that Session are not shared across. This is I think most advance strategy available in JBPM.

Finally I am able to resolved this issue.
For benefit of whoever facing issue:
- This issue appears whenever you have not properly managed transactions
There were some place we were managed transaction incorrectly and somehow JBPM internally InternalKnowledgeRuntime set to null.
By the way this errors throws from
class: ProcessInstanceImpl
method: getProcess()
public Process getProcess(){
if(this.process == null){
if(processXml == null){
if(kruntime == null){
throw new RuntimeException("Process instance " + id + "[" + processId + "] is disconnected. "));
}else{
other code ...........
}
}
}
}

Related

JBoss 7.1.0: executor.submit() how to debug (no exception but task not started)

I have an issue with JBoss EAP 7.1.0 GA. On one server (my DEV) this works like a charm while on the other (TEST environment) the Callable executed using executor.submit() does not seem to be started (I do not see that "This is call" message in log), but no exception or any other clue is given.
The question is - where should I look like / how should I debug this issue?
The calling code:
#Resource(name = "DefaultManagedExecutorService")
ManagedExecutorService executor;
try {
DownloadPlayers dp = new DownloadPlayers();
Future<Queue<PlayerForDownload>> f = executor.submit(dp);
Queue<PlayerForDownload> q = f.get();
L.info(q.size());
} catch (Exception e) {
L.error("EXCEPTION" + e.getMessage());
}
The class it calls:
public class DownloadPlayers implements Callable<Queue<PlayerForDownload>> {
// the constructor gets called, I'm sure as it writes to log
// the call is as simple as this
#Override
public Queue<PlayerForDownload> call() {
L.info("This is call()");
try {
return this.getPlayersForDownload();
} catch (WorkerException e) {
L.error(e);
return null;
}
}
}
As stated above, the code itself seems to be OK as it works in one server but does not work on the other. Both are
7.1.0GA standalone.
Any advice how to debug the ManagedExecutorService?
Thanks.
In this particular case the problem was that on the TEST environment it was only allowed to run two threads which we already running (by some completely different part of application I didn't realize). So the problem is solved now by setting the "Core threads" parameter in ManagerExecutorService to higher value, the tasks are running.
However the tricky part was that there was no obvious visible difference between the JBoss servers (I compared the standalone.xml configs...) just because the ManagerExecutorService in JBoss has some default (blank) values that actually depend on the system config (v-CPU cores in my case). So despite the config being the same, the "Core threads" seem to default to 2 on TEST and some higher (unknown to me) value on my DEV.
So never depend on default settings in ManagerExecutorService if comparing two environments.
I have also rewritten the logic, instead of using blocking Future.get() or checking for Future.isDone() in a loop a do Future.get() with timeout and in the exception handler I decide whether to keep waiting or fail.

Clustering in AEM

I am facing an error, which is something peculiar. I am using AEM 5.6.1.
I have 2 author instances(a1 and a2) and both are in cluster. We are performing tar optimization on the instances daily between 2a.m. - 5a.m.(London Timezone). Now, in the error.log of a2, I am seeing the below error everyday in the above mentioned time:
419 ERROR [pool-6-thread-1] org.apache.sling.discovery.impl.cluster.ClusterViewServiceImpl getEstablishedView: the existing established view does not incude the enter code herelocal instance yet! Assming isolated mode.
Now, I did some research on this and has come to know that, AEM users ClusterViewServiceImpl.java for clustering. And in that, the below mentioned code snippet is basically failing:
EstablishedClusterView clusterViewImpl = new EstablishedClusterView(
config, view, getSlingId());
boolean foundLocal = false;
for (Iterator<InstanceDescription> it = clusterViewImpl
.getInstances().iterator(); it.hasNext();) {
InstanceDescription instance = it.next();
if (instance.isLocal()) {
foundLocal = true;
break;
}
}
if (foundLocal) {
return clusterViewImpl;
} else {
logger.info("getEstablishedView: the existing established view does not incude the local instance yet! Assuming isolated mode.");
return getIsolatedClusterView();
}
Can someone help me to understand more in depth regarding the same. Does it mean that, the clustering is not properly working? What can be the possible impacts because of this error?
I think you've got a classic case of split brain.
Clustering authors is not a good approach and has been disfavoured in future versions of AEM, as the authors often get out of sync when they can't talk to each other for whatever reason, usually temporarily network related. Believe me, they are sensitive.
When communication drops, the slave thinks it no longer has a master, and claims to be the master itself. When that occurs, and communication is re-established the damage has been done as there is no recovery mechanism.
At best, only ever allow users to connect to the primary author and have the secondary author as a High Availability server.
Better still, set up replication from the primary author that everyone writes to, and have it auto replicate on write to the secondary backup author.
Hope that helps.

General pattern for failing over from one database to another using Entity Framework?

We have an enterprise DB that is replicated through many sites throughout the world. We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations.
We are using Entity Framework, C#, and SQL Server.
At first I hoped I could just specify a "Failover Partner" in the connection string, but that only works in a mirrored DB environment, which this is not. I also looked into writing a custom IDbExecutionStrategy. But these strategies only allow you to specify the pattern for retrying a failed DB operation. It does not allow you to change the operation in any way like directing it to a new connection.
So, do you know of any good pattern for dealing with this type of operation, other than duplicating retry logic around each of our many DB operations?
Update on 2014-05-14:
I'll elaborate in response to some of the suggestions already made.
I have many places where the code looks like this:
try
{
using(var db = new MyDBContext(ConnectionString))
{
// Database operations here.
// var myList = db.MyTable.Select(...), etc.
}
}
catch(Exception ex)
{
// Log exception here, perhaps rethrow.
}
It was suggested that I have a routine that first checks each of the connections strings and returns the first one that successfully connects. This is reasonable as far as it goes. But some of the errors I'm seeing are timeouts on the operations, where the connection works but the DB has issues that keep it from completing the operation.
What I'm looking for is a pattern I can use to encapsulate the unit of work and say, "Try this on the first database. If it fails for any reason, rollback and try it on the second DB. If that fails, try it on the third, etc. until the operation succeeds or you have no more DBs." I'm pretty sure I can roll my own (and I'll post the result if I do), but I was hoping there might be a known way to approach this.
How about using some Dependency Injection system like autofac and registering there a factory for new context objects - it will execute logic that will try to connect first to local and in case of failure it will connect to enterprise db. Then it will return ready DbContext object. This factory will be provided to all objects that require it with Dependency Injection system - they will use it to create contexts and dispose of them when they are not needed any more.
" We would like our app to attempt to connect to one of the local sites, and if that site is down we want it to fall back to the enterprise DB. We'd like this behavior on each of our DB operations."
If your app is strictly read-only on the DB and data consistency is not absolutely vital to your app/users, then it's just a matter of trying to CONNECT until an operational site has been found. As M.Ali suggested in his remark.
Otherwise, I suggest you stop thinking along these lines immediately because you're just running 90 mph down a dead end street. As Viktor Zychla suggested in his remark.
Here is what I ended up implementing, in broad brush-strokes:
Define delegates called UnitOfWorkMethod that will execute a single Unit of Work on the Database, in a single transaction. It takes a connection string and one also returns a value:
delegate T UnitOfWorkMethod<out T>(string connectionString);
delegate void UnitOfWorkMethod(string connectionString);
Define a method called ExecuteUOW, that will take a unit of work and method try to execute it using the preferred connection string. If it fails, it tries to execute it with the next connection string:
protected T ExecuteUOW<T>(UnitOfWorkMethod<T> method)
{
// GET THE LIST OF CONNECTION STRINGS
IEnumerable<string> connectionStringList = ConnectionStringProvider.GetConnectionStringList();
// WHILE THERE ARE STILL DATABASES TO TRY, AND WE HAVEN'T DEFINITIVELY SUCCEDED OR FAILED
var uowState = UOWStateEnum.InProcess;
IEnumerator<string> stringIterator = connectionStringList.GetEnumerator();
T returnVal = default(T);
Exception lastException = null;
string connectionString = null;
while ((uowState == UOWStateEnum.InProcess) && stringIterator.MoveNext())
{
try
{
// TRY TO EXECUTE THE UNIT OF WORK AGAINST THE DB.
connectionString = stringIterator.Current;
returnVal = method(connectionString);
uowState = UOWStateEnum.Success;
}
catch (Exception ex)
{
lastException = ex;
// IF IT FAILED BECAUSE OF A TRANSIENT EXCEPTION,
if (TransientChecker.IsTransient(ex))
{
// LOG THE EXCEPTION AND TRY AGAINST ANOTHER DB.
Log.TransientDBException(ex, connectionString);
}
// ELSE
else
{
// CONSIDER THE UOW FAILED.
uowState = UOWStateEnum.Failed;
}
}
}
// LOG THE FAILURE IF WE HAVE NOT SUCCEEDED.
if (uowState != UOWStateEnum.Success)
{
Log.ExceptionDuringDataAccess(lastException);
returnVal = default(T);
}
return returnVal;
}
Finally, for each operation we define our unit of work delegate method. Here an example
UnitOfWorkMethod uowMethod =
(providerConnectionString =>
{
using (var db = new MyContext(providerConnectionString ))
{
// Do my DB commands here. They will roll back if exception thrown.
}
});
ExecuteUOW(uowMethod);
When ExecuteUOW is called, it tries the delegate on each database until it either succeeds or fails on all of them.
I'm going to accept this answer since it fully addresses all of concerns raised in the original question. However, if anyone provides and answer that is more elegant, understandable, or corrects flaws in this one I'll happily accept it instead.
Thanks to all who have responded.

WF4 InstancePersistenceCommand interrupted

I have a windows service, running workflows. The workflows are XAMLs loaded from database (users can define their own workflows using a rehosted designer). It is configured with one instance of the SQLWorkflowInstanceStore, to persist workflows when becoming idle. (It's basically derived from the example code in \ControllingWorkflowApplications from Microsoft's WCF/WF samples).
But sometimes I get an error like below:
System.Runtime.DurableInstancing.InstanceOwnerException: The execution of an InstancePersistenceCommand was interrupted because the instance owner registration for owner ID 'a426269a-be53-44e1-8580-4d0c396842e8' has become invalid. This error indicates that the in-memory copy of all instances locked by this owner have become stale and should be discarded, along with the InstanceHandles. Typically, this error is best handled by restarting the host.
I've been trying to find the cause, but it is hard to reproduce in development, on production servers however, I get it once in a while. One hint I found : when I look at the LockOwnersTable, I find the LockOnwersTable lockexpiration is set to 01/01/2000 0:0:0 and it's not getting updated anymore, while under normal circumstances the should be updated every x seconds according to the Host Lock Renewal period...
So , why whould SQLWorkflowInstanceStore stop renewing this LockExpiration and how can I detect the cause of it?
This happens because there are procedures running in the background and trying to extend the lock of the instance store every 30 seconds, and it seems that once the connection fail connecting to the SQL service it will mark this instance store as invalid.
you can see the same behaviour if you delete the instance store record from [LockOwnersTable] table.
The proposed solution is when this exception fires, you need to free the old instance store and initialize a new one
public class WorkflowInstanceStore : IWorkflowInstanceStore, IDisposable
{
public WorkflowInstanceStore(string connectionString)
{
_instanceStore = new SqlWorkflowInstanceStore(connectionString);
InstanceHandle handle = _instanceStore.CreateInstanceHandle();
InstanceView view = _instanceStore.Execute(handle,
new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(30));
handle.Free();
_instanceStore.DefaultInstanceOwner = view.InstanceOwner;
}
public InstanceStore Store
{
get { return _instanceStore; }
}
public void Dispose()
{
if (null != _instanceStore)
{
var deleteOwner = new DeleteWorkflowOwnerCommand();
InstanceHandle handle = _instanceStore.CreateInstanceHandle();
_instanceStore.Execute(handle, deleteOwner, TimeSpan.FromSeconds(10));
handle.Free();
}
}
private InstanceStore _instanceStore;
}
you can find the best practices to create instance store handle in this link
Workflow Instance Store Best practices
This is an old thread but I just stumbled on the same issue.
Damir's Corner suggests to check if the instance handle is still valid before calling the instance store. I hereby quote the whole post:
Certain aspects of Workflow Foundation are still poorly documented; the persistence framework being one of them. The following snippet is typically used for setting up the instance store:
var instanceStore = new SqlWorkflowInstanceStore(connectionString);
instanceStore.HostLockRenewalPeriod = TimeSpan.FromSeconds(30);
var instanceHandle = instanceStore.CreateInstanceHandle();
var view = instanceStore.Execute(instanceHandle,
new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(10));
instanceStore.DefaultInstanceOwner = view.InstanceOwner;
It's difficult to find a detailed explanation of what all of this
does; and to be honest, usually it's not necessary. At least not,
until you start encountering problems, such as InstanceOwnerException:
The execution of an InstancePersistenceCommand was interrupted because
the instance owner registration for owner ID
'9938cd6d-a9cb-49ad-a492-7c087dcc93af' has become invalid. This error
indicates that the in-memory copy of all instances locked by this
owner have become stale and should be discarded, along with the
InstanceHandles. Typically, this error is best handled by restarting
the host.
The error is closely related to the HostLockRenewalPeriod property
which defines how long obtained instance handle is valid without being
renewed. If you try monitoring the database while an instance store
with a valid instance handle is instantiated, you will notice
[System.Activities.DurableInstancing].[ExtendLock] being called
periodically. This stored procedure is responsible for renewing the
handle. If for some reason it fails to be called within the specified
HostLockRenewalPeriod, the above mentioned exception will be thrown
when attempting to persist a workflow. A typical reason for this would
be temporarily inaccessible database due to maintenance or networking
problems. It's not something that happens often, but it's bound to
happen if you have a long living instance store, e.g. in a constantly
running workflow host, such as a Windows service.
Fortunately it's not all that difficult to fix the problem, once you
know the cause of it. Before using the instance store you should
always check, if the handle is still valid; and renew it, if it's not:
if (!instanceHandle.IsValid)
{
instanceHandle = instanceStore.CreateInstanceHandle();
var view = instanceStore.Execute(instanceHandle,
new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(10));
instanceStore.DefaultInstanceOwner = view.InstanceOwner;
}
It's definitely less invasive than the restart of the host, suggested
by the error message.
you have to be sure about expiration of owner user
here how I am used to handle this issue
public SqlWorkflowInstanceStore SetupSqlpersistenceStore()
{
SqlWorkflowInstanceStore sqlWFInstanceStore = new SqlWorkflowInstanceStore(ConfigurationManager.ConnectionStrings["DB_WWFConnectionString"].ConnectionString);
sqlWFInstanceStore.InstanceCompletionAction = InstanceCompletionAction.DeleteAll;
InstanceHandle handle = sqlWFInstanceStore.CreateInstanceHandle();
InstanceView view = sqlWFInstanceStore.Execute(handle, new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(30));
handle.Free();
sqlWFInstanceStore.DefaultInstanceOwner = view.InstanceOwner;
return sqlWFInstanceStore;
}
and here how you can use this method
wfApp.InstanceStore = SetupSqlpersistenceStore();
wish this help

Quickly Testing Database Connectivity within the Entity Framework

[I am new to ADO.NET and the Entity Framework, so forgive me if this questions seems odd.]
In my WPF application a user can switch between different databases at run time. When they do this I want to be able to do a quick check that the database is still available. What I have easily available is the ObjectContext. The test I am preforming is getting the count on the total records of a very small table and if it returns results then it passed, if I get an exception then it fails. I don't like this test, it seemed the easiest to do with the ObjectContext.
I have tried setting the connection timeout it in the connection string and on the ObjectConntext and either seem to change anything for the first scenario, while the second one is already fast so it isn't noticeable if it changes anything.
Scenario One
If the connect was down when before first access it takes about 30 seconds before it gives me the exception that the underlying provider failed.
Scenario Two
If the database was up when I started the application and I access it, and then the connect drops while using the test is quick and returns almost instantly.
I want the first scenario described to be as quick as the second one.
Please let me know how best to resolve this, and if there is a better way to test the connectivity to a DB quickly please advise.
There really is no easy or quick way to resolve this. The ConnectionTimeout value is getting ignored with the Entity Framework. The solution I used is creating a method that checks if a context is valid by passing in the location you which to validate and then it getting the count from a known very small table. If this throws an exception the context is not valid otherwise it is. Here is some sample code showing this.
public bool IsContextValid(SomeDbLocation location)
{
bool isValid = false;
try
{
context = GetContext(location);
context.SomeSmallTable.Count();
isValid = true;
}
catch
{
isValid = false;
}
return isValid;
}
You may need to use context.Database.Connection.Open()