Part of DROOLS ruleflow not working randomly - drools

We have a main ruleflow which calls 8 more rule flows (Rule1.rf to Rule8.rf) through an AND splitter. One of the rule flows - say Rules4.rf - is fired sometimes and not fired sometimes.
This is for an online application and we use jBoss. When the server is started, everything works fine. After many hours, for some requests, Rules4.rf is not fired at all and for others, its fired properly.
We even posted the same request again and again and the issue happens some times only. There is no difference in the logs between the success & failure requests, except for the logs from the Rules4.rf which missing in failued requests.
We are using drools 5.1 and java 6.
Please help me. This is creating a very big issue.

It is very difficult to figure out what might be going on without being able to look at the actual code and log. Could you by any chance create a JIRA and attach the process and if possible (a part of) an audit log that shows the issue?
Kris

Related

Coffescripts not receiving data from broadcasted message relay job

Here's the code:
How to ensure the javascripts/channels/chatrooms.coffee is loading and receive: (data) works? Console.log data is not loading
I posted the problem the other day. It comes from a tutorial, but somewhere after continuing work on my project, I'm not sure where the bug came up,
But the message relay job posts on the server, and my config.yml has redis with redis up and running.
There have been similar bugs, but I've worked through those solutions and it's not enough. The received: (data) console log doesn't arrive in the js console of the browser.
The App subscriptions seem to all be in order according to the tutorial, I bet it's a really simple fix.
My messaging system still works, so it's not crucial to the project, but it's a difference of having the chat system work in realtime versus with a 4 second page-refresh delay.
The last problem I had was not including jquery for my other coffeescripts, so I'm guessing the channels coffeescripts is probably a one line fix.

AppFabric Hosted Workflow does not always reload after delay/unload

I have a WCF Windows Workflow (4.5) Workflow Service hosted under IIS and using AppFabric 1.1. The workflow instances are long-running (up to about a week), but much of the time is spent in Delay activities.
This seemed to work fine at first, but when running multiple instances of the workflow at the same time (2+ instances causes this), some of them just never wake up once they've unloaded from memory during the Delay step. When I look at the logs, the errors I find all look like this:
System.OperationCanceledException: The execution of InstancePersistenceCommands has been canceled because the InstanceHandle was freed.
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activities.Dispatcher.DurableInstanceManager.WaitAndHandleStoreEventsCallback(IAsyncResult result)
Unfortunately, I'm not finding any useful information on that error message.
The SuspensionExceptionName and SuspensionReason fields in the AppFabric Persisted Instances Table show System.NullReferenceException: Object reference not set to an instance of an object. But this doesn't happen inside my workflow, only outside.
Additional Info:
I'm running the activity as a Fire & Forget (receive activity, no send)
My workflow calls into other WCF services to fetch data.
I am running this on Server 2012 R2, IIS 8 (not azure)
Workflow Persistence is working. I can reset IIS, reboot... its just when I run 2 instances that it has problems.
I'm definitely not hitting any kind of throttling limits. While the workflow deals with a few MB of data, this issue happens at 2+ instances.
Any idea what might be happening here?
Edit:
I realized I found more information on how the issue operates and never added it to the question. When the delay issue happens, it operates a lot like a static variable getting written by 2 threads.
Here's a visualization:
WF1 Start ---->Do Stuff--->Sleep------------*1----->Cancelled Exception at some point
------WF 2 Start---->Do Stuff------->Sleep->Wake up---------*2------>More Stuff---->End Successfully
*1 - When WF Instance 1 Should Wake up (Same time as WF 2 wakes)
*2 - When WF Instance 2 Should have woken up (Seems to be ignored)
Before anyone asks... I got rid of every static variable, method, class in my code. Nothing is static anymore.
I've been struggling with similar issues for quite a while. I use WFW4 and I find similar errors when a workflow instance is in a long delay.
I don't know what the cause of the problem is, but I have a work around that you might find helpful.
In my case, the errors I get are from Workflow Management Service and say:
Failed to invoke service management endpoint at 'net.pipe://.svc' to activate service '/Alerts/Workflows/.xamlx'. Exception: 'Access is denied.'
These errors start happening sometime between 6 and 30 hours after the instance goes into a long delay.
I have found that if I create a new instance of the workflow when the first instance is in delay and the errors are happening, then Workflow Management Service is able to resume interacting with the first sleeping instance.
So, I made a new workflow whose sole purpose is to periodically launch and then kill instances of the workflow that contains the long delay.
It actually gets a bit more complicated to make this work. I wanted this new workflow to also go to sleep between times when it creates and kills a new instance of the first workflow. But this going to sleep causes the instance of the new workflow to suffer the same problem as the first workflow. So, I modified the new workflow so it does the following:
-- delay for some rather short period, such as 30 minutes
-- create an instance of the first workflow
-- wait a minute
-- kill the just-created instance of the first workflow
-- create a new instance of this new error-preventing workflow
-- terminate
Since having done this, I no longer get the Access is Denied error from Workflow Management Service!
Hope this helps
Turns out my first answer was not correct, but I believe this answer is right, and solves the issue ChrisG is having.
My workaround did not actually work. Took a while for the problem to resurface. 29 hours to be precise - the default time it takes for an app pool to recycle.
So for me, the solution was to make my app pool not recycle. When an app pool recycles while a workflow instance is in a delay activity, the workflowManagementService is not able to wake up the instance and throws Access is Denied errors. If you create a new instance of the workflow after the app pool has recycled, the first instance will pick up where it left off, but sometimes still has problems, which is what I believe is happening to ChrisG.
ChrisG, looking at your visualization, is it possible that an appPool is recycling during the time wf1 is sleeping? I believe that is the cause the exception. If you then launch a new wf instance after *2 has passed (and if an app pool recycle happened prior to *1), that will wake up both wf1 and wf2, but wf1 won't work properly (at least in my experience)
Also, this happens after iisresets and server reboots. To handle those, you need to use IIS7 which allows the web application (as well as the web site) which is hosting the xamlx files to autostart after an iisreset or server reboot. This option is not available in IIS6. See http://www.postseek.com/meta/991815402b369e71ce925cde47ac907d for details
Hope this helps!

Threading profile for one endpoint in Mule results in endless hanging when used for two endpoints

I added a program to a Mule project in order to avoid duplication of code. They are both REST, access the same webservice and return json. The first service's threading profile looked like this:
<configuration doc:name="Configuration">
<default-receiver-threading-profile maxThreadsActive="10" poolExhaustedAction="WAIT" threadWaitTimeout="10000" maxBufferSize="100" maxThreadsIdle="2" >
</configuration>
When I added my second service, the same threading profile caused the program to hang endlessly. No log entries, no nothing. Raising max active threads to 1000, or even deleting the WAIT option doesn't do anything. It's only when I change "maxThreadsIdle" to 3 or higher, or delete maxBufferSize or delete the entire thing that both can work in the same project. One other thing...when I edit the mule flow and save, the program automatically launches again. Oddly enough, the results end up appearing in any browser I left hanging from trying to submit.
What I want to know is why the min threads needs to be set to 3 or higher...I mean what the heck is actually going on here? Ideally, I would like to keep the threading configuration set to what I have here.

SWTBot doesn't work

I follow a book called Eclipse Plugin Development by Example: Beginner's Guide and all examples are hosted at github. However, I can't successfully run SWTBot example.
The first time it takes a very long time to run, but in the end it would pass all test cases.
However when I try to run the same code second time, it only testUI() will pass, the other three will have org.eclipse.swtbot.swt.finder.exceptions.WidgetNotFoundException: The widget was null.
Somewhere in the book is said
If one (shell) is not currently visible, it polls (every 500 milliseconds by default) until one is found or the default timeout period (5 seconds) ends when a WidgetNotFoundException is thrown
But I don't see why the first time all test cases will pass but not the second time.
but I have not idea why the first time will work but second time won't.
I also report this at github issue but so far no one response.
did you interfere with your desktop while the test was running? I found this can (!) cause problems with SWTBot.
Also, WidgetNotFound is an exception you'll be seeing a lot when using this framework. Sometimes it might be due to bugs, sometimes to unusual underlying UI code. It should be reproducible in those cases, though.

How to use a WF DelayActivity in an ASP.Net web based workflow

I have a web application that I am adding workflow functionality to using Windows Workflow Foundation. I have based my solution around K. Scott Allen's Orders Workflow example on OdeToCode. At the start I didn't realise the significance of the caveat "if you use Delay activities with and configure active timers for the manual scheduling service, these events will happen on a background thread that is not associated with an HTTP request". I now need to use Delay activities and it doesn't work as is with his solution architecture. Has anyone come across this and found a good solution to this? The example is linked to from a lot of places but I haven't seen anyone else come across this issue and it seems like a bit of a show stopper to me.
Edit: The problem is that the results from the workflow are returned to the the web application via HttpContext. I am using the ManualWorkflowSchedulerService with the useActiveTimers and this works fine for most situations because workflow events are fired from the web app and HttpContext still exists when the workflow results are returned and the web app can continue processing. When a delay activity is used processing happens on a background thread and when it tries to return results to the web app, there is no valid HttpContext (because there has been no Http Request), so further processing fails. That is, the webapp is trying to process the workflow results but there has been no http request.
I think I need to do all post Delay activity processing within the workflow rather than handing off to the web app.
Cheers.
You didn't describe the problem you are having. But maybe this is of some help.
You can use the ManualWorkflowSchedulerService with the useActiveTimers and the workflow will continue on another thread. Normally this is fine because your HTTP request has already finished and it doesn't really matter.
If however you need full control the workflow runtime will let you get a handle on all loaded workflows using the GetLoadedWorkflows() function. This will return acollection of WorkflowInstance objects. usign these you can can call the GetWorkflowNextTimerExpiration() to check which is expired. If one is you can manually resume it. In this case you want to use the ManualWorkflowSchedulerService with the useActiveTimers=false so you can control the last thread as well. However in most cases using useActiveTimers=true works perfectly well.