AppFabric Hosted Workflow does not always reload after delay/unload

I have a WCF Windows Workflow (4.5) Workflow Service hosted under IIS and using AppFabric 1.1. The workflow instances are long-running (up to about a week), but much of the time is spent in Delay activities.
This seemed to work fine at first, but when running multiple instances of the workflow at the same time (2+ instances triggers it), some of them just never wake up once they've been unloaded from memory during the Delay step. When I look at the logs, the errors I find all look like this:
```
System.OperationCanceledException: The execution of InstancePersistenceCommands has been canceled because the InstanceHandle was freed.
   at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at System.ServiceModel.Activities.Dispatcher.DurableInstanceManager.WaitAndHandleStoreEventsCallback(IAsyncResult result)
```
Unfortunately, I'm not finding any useful information on that error message.
The SuspensionExceptionName and SuspensionReason fields in the AppFabric Persisted Instances Table show System.NullReferenceException: Object reference not set to an instance of an object. But this doesn't happen inside my workflow, only outside.
Additional Info:
I'm running the activity as Fire & Forget (a Receive activity with no Send)
My workflow calls into other WCF services to fetch data.
I am running this on Server 2012 R2, IIS 8 (not azure)
Workflow persistence is working. I can reset IIS or reboot... it's just when I run 2 instances that it has problems.
I'm definitely not hitting any kind of throttling limits. While the workflow deals with a few MB of data, this issue happens at 2+ instances.
Any idea what might be happening here?
Edit:
I realized I had found more information about how the issue behaves but never added it to the question. When the delay issue happens, it acts a lot like a static variable being written by two threads.
Here's a visualization:
```
WF1 Start ---->Do Stuff--->Sleep------------*1----->Cancelled Exception at some point
------WF2 Start---->Do Stuff------->Sleep->Wake up---------*2------>More Stuff---->End Successfully
```
*1 - when WF instance 1 should wake up (same time as WF 2 wakes)
*2 - when WF instance 2 should have woken up (seems to be ignored)
Before anyone asks... I got rid of every static variable, method, and class in my code. Nothing is static anymore.

I've been struggling with similar issues for quite a while. I use WF4 and I see similar errors when a workflow instance is in a long delay.
I don't know what the cause of the problem is, but I have a work around that you might find helpful.
In my case, the errors I get are from the Workflow Management Service and say:
```
Failed to invoke service management endpoint at 'net.pipe://.svc' to activate service '/Alerts/Workflows/.xamlx'. Exception: 'Access is denied.'
```
These errors start happening sometime between 6 and 30 hours after the instance goes into a long delay.
I have found that if I create a new instance of the workflow when the first instance is in delay and the errors are happening, then Workflow Management Service is able to resume interacting with the first sleeping instance.
So, I made a new workflow whose sole purpose is to periodically launch and then kill instances of the workflow that contains the long delay.
Making this work is actually a bit more complicated. I wanted the new workflow to also go to sleep between the times it creates and kills an instance of the first workflow, but going to sleep causes the new workflow to suffer the same problem as the first one. So, I modified the new workflow to do the following:
-- delay for some rather short period, such as 30 minutes
-- create an instance of the first workflow
-- wait a minute
-- kill the just-created instance of the first workflow
-- create a new instance of this new error-preventing workflow
-- terminate
Since doing this, I no longer get the 'Access is denied' error from the Workflow Management Service!
Hope this helps

It turns out my first answer was not correct, but I believe this one is right and solves the issue ChrisG is having.
My workaround did not actually work; it just took a while for the problem to resurface. 29 hours, to be precise: the default interval at which an app pool recycles.
So for me, the solution was to stop my app pool from recycling. When an app pool recycles while a workflow instance is in a Delay activity, the Workflow Management Service is not able to wake up the instance and throws 'Access is denied' errors. If you create a new instance of the workflow after the app pool has recycled, the first instance will pick up where it left off, but it sometimes still has problems, which is what I believe is happening to ChrisG.
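For reference, here is a minimal sketch of the applicationHost.config settings involved (the pool name is illustrative; the same values can be set through IIS Manager or appcmd):
```xml
<!-- applicationHost.config, under <system.applicationHost>/<applicationPools> -->
<!-- "WorkflowPool" is an illustrative name -->
<add name="WorkflowPool" autoStart="true">
  <!-- 00:00:00 disables the default 29-hour (1740 minute) periodic recycle -->
  <recycling>
    <periodicRestart time="00:00:00" />
  </recycling>
  <!-- 0 disables the default 20-minute idle shutdown -->
  <processModel idleTimeout="00:00:00" />
</add>
```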
ChrisG, looking at your visualization, is it possible that an app pool is recycling during the time WF1 is sleeping? I believe that is the cause of the exception. If you then launch a new WF instance after *2 has passed (and if an app pool recycle happened prior to *1), that will wake up both WF1 and WF2, but WF1 won't work properly (at least in my experience).
Also, this happens after iisresets and server reboots. To handle those, you need IIS 7, which allows the web application (as well as the web site) hosting the .xamlx files to auto-start after an iisreset or server reboot. This option is not available in IIS 6. See http://www.postseek.com/meta/991815402b369e71ce925cde47ac907d for details.
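For the auto-start part, the wiring in applicationHost.config looks roughly like this; this is a sketch assuming the stock .NET 4 WCF/WF auto-start provider, with the application path and pool name being illustrative (both elements live under <system.applicationHost>):
```xml
<sites>
  <site name="Default Web Site">
    <!-- point the path at the application hosting the .xamlx files -->
    <application path="/Alerts/Workflows"
                 applicationPool="WorkflowPool"
                 serviceAutoStartEnabled="true"
                 serviceAutoStartProvider="Service" />
  </site>
</sites>
<serviceAutoStartProviders>
  <!-- the WCF/WF auto-start provider that ships with .NET 4 -->
  <add name="Service"
       type="System.ServiceModel.Activation.ServiceAutoStartProvider, System.ServiceModel.Activation, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
</serviceAutoStartProviders>
```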
Hope this helps!

Related

Is there a way to process queued webhooks in ADO?

We have a service hook created for one of our projects in ADO. It was going fine until last weekend. Suddenly a few webhooks became stuck in the queued state, and I am not sure how to force them to be processed. Can someone tell me if there is a way to force those items to be processed?
Thanks,
Venu
I am afraid you cannot get what you want here.
Once service hooks are stuck in the queued state, they will not be picked up or processed again.
While the main thread (for example, a work item update) is running, you cannot forcibly intervene in, or flush, the notifications that are already queued.
There is a similar issue discussing this situation.
Pending service hooks also depend on memory, because they are held and executed in memory. If memory is lost or other problems occur during execution, there is no guarantee that every service hook will run as expected.
Alternatively, you could interrupt the current process and reduce the number of service hooks, but that is not a good solution.
The best fix would be a built-in function for reprocessing queued service hooks, but no such function currently exists. I recommend submitting a suggestion ticket to the team asking them to add that feature.
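That said, you can at least inspect the notification history (including queued items) through the Service Hooks REST API's Notifications - List call. A minimal C# sketch; the organization, subscription id, and PAT values are placeholders:
```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class ServiceHookInspector
{
    static async Task Main()
    {
        const string org = "your-org";                                        // placeholder
        const string subscriptionId = "00000000-0000-0000-0000-000000000000"; // placeholder
        const string pat = "your-personal-access-token";                      // placeholder

        using (var client = new HttpClient())
        {
            // A PAT is sent as HTTP basic auth with an empty user name.
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
                "Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes(":" + pat)));

            // Lists recent notifications for the subscription; check each item's
            // status/result fields to see what is still queued.
            var url = $"https://dev.azure.com/{org}/_apis/hooks/subscriptions/{subscriptionId}/notifications?api-version=7.1";
            Console.WriteLine(await client.GetStringAsync(url));
        }
    }
}
```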

Azure WebJob - Limit Processing Time to Specific Hours

I have an MVC web site, a storage queue and a WebJob. Users can request the generation of a set of reports by clicking a button on the web page. This inserts a message into the storage queue. In the past, the WebJob ran continuously and processed those requests fine. But the demand and size of the reports has grown to the point where the WebJob is slowing down the web app. I would like to still place the request message in the queue, but delay processing of all requests until the evening, when the web app is mostly idle. This would allow me to continue using the WebJob code and QueueTrigger functionality without having to waste resources by moving to a dedicated Worker Role, etc. The reports don't need to be generated immediately, so a delay is acceptable.
I don't see a built-in way to set a time window on processing. The only thing I have found is a pair of PowerShell cmdlets for starting and stopping WebJobs (Start-AzureWebsiteJob / Stop-AzureWebsiteJob). So I was thinking I could create a scheduled PowerShell job that runs at midnight, starts the WebJob, lets it run, and then runs again early in the morning and stops it.
Does anyone know of a better option than this? Anything more "official" that perhaps I could not find?
One possible solution would be to hide the messages in the queue for a certain amount of time when they are inserted.
If you're using the AddMessage method, you can specify this timespan in the initialVisibilityDelay parameter.
This ensures the messages are not immediately visible in the queue to be picked up by the WebJob; they become visible only once this timespan elapses.
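As a minimal sketch with the classic WindowsAzure.Storage client (the queue name and connection string are illustrative), delaying each report request until the next 8 PM UTC:
```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class ReportQueuer
{
    // Enqueue a report request that stays invisible until the next 8 PM (UTC).
    static void EnqueueForTonight(string connectionString, string payload)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("report-requests");
        queue.CreateIfNotExists();

        DateTime now = DateTime.UtcNow;
        DateTime eightPm = now.Date.AddHours(20);
        if (eightPm <= now) eightPm = eightPm.AddDays(1); // already past 8 PM, use tomorrow

        // The message is hidden from GetMessage/QueueTrigger until the delay elapses.
        queue.AddMessage(
            new CloudQueueMessage(payload),
            timeToLive: null,
            initialVisibilityDelay: eightPm - now);
    }
}
```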
Will such a solution work for you?
Maybe I didn't fully understand your question, but couldn't you use a "Triggered" WebJob that runs on a CRON schedule? You can then limit it to specific hours:
```
0 * 20-22 * * *
```
This example will run every minute from 8 PM to 10:59 PM (hours 20-22).
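For completeness, with a scheduled triggered WebJob the CRON expression goes into a settings.job file deployed next to the job executable:
```json
{
  "schedule": "0 * 20-22 * * *"
}
```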

PhantomJS not killing webserver client connections

I have a kind of proxy server running on PhantomJS's WebServer module, and I noticed that this server gets killed due to its memory consumption.
Every time the server gets a new request it creates a child client process; the problem I see is that the process remains alive indefinitely.
Here is the server I'm using:
server.js
I thought response.close() was closing and killing client connections, but it is not.
Here is the list of child processes displayed in htop:
(There are even more processes; this is just a fragment of the list.)
I really need to kill those processes because they are using all the free memory. Am I missing something?
I could simply restart the server, but the memory will still be wasted.
Thank you!
EDIT:
The processes I mentioned before are threads, not independent processes as I thought (check this).
Every HTTP request creates a new thread, and that's OK, but this thread is not being killed after the script ends.
Also, I found out that no new threads are created if the request handler doesn't run casper (I mean casper.run(..)).
So, new threads are created only if the server runs a casper instance; the problem is that this instance doesn't end after the run function does.
I tried casper.done() as mentioned below, but it kills the whole process instead of just the currently running thread (I did not find any documentation for this function).
When I execute other casper scripts outside the server on the same machine, the spawned threads and the whole phantom process end successfully. What could be happening?
I am using Phantom 2.1.1 and Casper 1.1.1 versions.
Please ask me anything if you want more or specific information.
Thanks again for reading!
This is a well known issue with casper:
https://github.com/casperjs/casperjs/issues/1355
It has not been fixed by the casper guys and is currently marked as an enhancement. I guess it's not on their priority list.
Anyway, the workaround is to write a server-side component, e.g. a node.js server, to handle the incoming requests and, for every request, run the casper script in a new child process. That child process is closed when casper finishes its job. While this is a workaround, it is not an optimal solution: spawning a child process for every request is not cheap, and it will be hard to scale this approach heavily. However, it is a sufficient workaround. More on this fully sensible approach is in the link above.

Any way to have delayed_job execute some run-once code at startup and use across all jobs?

So I've got a delayed_job task that pushes some info to an XMPP server. Ideally you create a connection to XMPP once and then constantly push data to it, rather than creating a new connection every time you have some data to send.
Is there any kind of facility in delayed_job for running a sort of 'setup' method when a worker starts, have it set some instance variables (like the XMPP connection object) that can then be used by all the jobs that come up? It's okay if each worker runs its own setup method. I just don't want every job (thousands per day) connecting to the XMPP server from scratch every time.
Thanks for any help!
Delayed Job now has "Hooks" (enqueue, before, after, success, error, failure) - it looks like these were added around June 2010. The before hook would probably work in a case where you wanted to find an existing connection to reuse.

Windows Workflow: Persistence and Polling

I'm currently learning the WF framework, so bear with me; mostly I'm looking for where to start looking, not necessarily a direct answer. I just can't seem to figure out how to begin researching what I'd like in The Google.
Let's say I have a simple one-step workflow (much more complicated than that, but for simplicity's sake). This workflow needs to watch a certain record in the database to see when it changes. I don't have the capability to "push" via a trigger from the database when the row changes, so I need to poll for it every so often.
This workflow needs to be persisted to the database to be durable against restarts and whatnot as this is a long-running workflow. I'm trying to figure out the best way to get it to check every 3 minutes or so and also persist to the database. Do the persistence capabilities of the framework allow for that? It seems to be time-based. And since the workflow won't be reawakened by an external event, how does it reload from the database and check the same step it did previously again? Does it attempt the last unfulfilled activity automatically upon reloading?
Do "while" activities with a delay attached to it work at all, or can it be handled solely through the persistence services?
I'm not sure what you mean by "handled solely through persistence services". Persistence refers only to the storing of an idle workflow.
You could have a Delay and a Code activity in a Sequence inside a While loop. While in the Delay, the workflow will go idle and may be persisted if necessary. However, depending on how much state has to be persisted and how many such workflows are running at any one time, a leaner approach may be necessary.
A leaner approach would be to externalise the DB watching: have some "DB watching" workflow service raise an event when the desired change has occurred. This service would be added to the workflow runtime.
To that end you need a service contract, defined by an interface with the [ExternalDataExchange] attribute. This interface defines an event that the service raises when the desired DB change is detected. It also defines a method that a workflow can call to specify what change the service should look for. The method should accept an instance GUID so that the requesting instance can be found when the DB change is detected.
In the workflow you use a CallExternalMethodActivity to call this service's method. You then flow to a HandleExternalEventActivity which listens for the event. At this point the workflow will go idle and can be persisted. It will remain there until the service raises the event.
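A minimal sketch of that contract in WF 3.x terms; the IDbWatcherService, RecordChangedEventArgs, and recordKey names are illustrative, not part of the framework:
```csharp
using System;
using System.Workflow.Activities;
using System.Workflow.Runtime;

// Event args must derive from ExternalDataEventArgs so the runtime can route
// the event back to the requesting workflow instance by its GUID.
[Serializable]
public class RecordChangedEventArgs : ExternalDataEventArgs
{
    public RecordChangedEventArgs(Guid instanceId) : base(instanceId) { }
}

// The local service contract. The workflow calls WatchRecord through a
// CallExternalMethodActivity, then idles in a HandleExternalEventActivity
// until the service raises RecordChanged.
[ExternalDataExchange]
public interface IDbWatcherService
{
    void WatchRecord(Guid instanceId, string recordKey);
    event EventHandler<RecordChangedEventArgs> RecordChanged;
}

public static class WorkflowHostSetup
{
    // Hosting side: hook the DB-watching service into the workflow runtime.
    public static WorkflowRuntime CreateRuntime(IDbWatcherService watcher)
    {
        var runtime = new WorkflowRuntime();
        var exchange = new ExternalDataExchangeService();
        runtime.AddService(exchange);
        exchange.AddService(watcher); // the instance that polls the DB and raises the event
        return runtime;
    }
}
```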