Handling the 'Faulted' state of a Workflow

I'm wondering how best to handle the Faulted state in a WF4 workflow service host. I'm using a console self-hosted service. I understand one approach is to implement the IErrorHandler interface, but does anybody know how I then configure this on my service? i.e. how do I add it to the Behaviors collection?
Additionally, I wonder if anybody has any thoughts or advice on how best to handle a 'restart' scenario (or indeed whether one is possible) once the workflow service host has entered the Faulted state. My understanding is that once the service host enters the Faulted state, it's game over and the application is effectively terminated. Can anybody suggest a possible strategy for this? I'm thinking maybe a management service on top that handles failed instances of the workflow service host console application, though I'd be interested to hear from people who've faced this dilemma before I attempt anything.
EDIT:
Also, I'm working in a clustered environment. When the cluster enters a fail-over state, the workflow appears to lose connectivity with the database for a period of (no more than) one minute. Has anybody dealt with this scenario specifically?
Thanks in advance
Ian

We have a solution in Microsoft.Activities v1.8.4 (see WorkflowService Configuration Based Extensions) that allows you to add extensions using a service behavior and some config.
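For reference, wiring an IErrorHandler into a self-hosted WorkflowServiceHost follows the standard WCF pattern: implement IErrorHandler and IServiceBehavior on one class, then add it to the host's Description.Behaviors collection before opening the host. A minimal sketch, assuming a console host (the WorkflowErrorHandler name and the console logging are illustrative, not from any library):

    using System;
    using System.Collections.ObjectModel;
    using System.ServiceModel;
    using System.ServiceModel.Channels;
    using System.ServiceModel.Description;
    using System.ServiceModel.Dispatcher;

    // Implements both interfaces so one instance can be dropped straight
    // into the Behaviors collection.
    public class WorkflowErrorHandler : IErrorHandler, IServiceBehavior
    {
        // Called after an exception has been thrown; return true to tell
        // WCF the error was handled so the session is not torn down.
        public bool HandleError(Exception error)
        {
            Console.WriteLine("Unhandled error: {0}", error);
            return true;
        }

        // A chance to turn the exception into an explicit fault message.
        public void ProvideFault(Exception error, MessageVersion version, ref Message fault) { }

        // IServiceBehavior: register this error handler on every channel dispatcher.
        public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
        {
            foreach (ChannelDispatcherBase dispatcherBase in serviceHostBase.ChannelDispatchers)
            {
                var dispatcher = dispatcherBase as ChannelDispatcher;
                if (dispatcher != null)
                    dispatcher.ErrorHandlers.Add(this);
            }
        }

        public void AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase,
            Collection<ServiceEndpoint> endpoints, BindingParameterCollection bindingParameters) { }

        public void Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase) { }
    }

With that in place, the console host adds it like any other behavior (the service object and URI are placeholders):

    var host = new WorkflowServiceHost(workflowService, new Uri("http://localhost:8080/MyService"));
    host.Description.Behaviors.Add(new WorkflowErrorHandler());
    host.Open();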

Related

Retaining and Migrating Actor / Service State

I've been looking at using Service Fabric as a platform for a new solution that we are building, and I am getting hung up on data / state management. I really like the concept of reliable services and the actor model, and as we have started to prototype some things out, it seems to be working well.
With that being said, I am getting hung up on state management and how I would use it in a 'real' project. I am also a little concerned that the data feels like a black box that I can't interrogate or manipulate directly if needed. A couple of scenarios I've thought about are:
How would I share state between two developers on a project? I have an Actor, and as long as I am debugging the actor my state is maintained, replicated, etc. However, when I shut it down the state is all lost. More importantly, someone else on my team would need to set up the same data as I do; this is fine for transactional data, but certain 'master' data should just be constant.
Likewise, I am curious how I would migrate data changes between environments. We periodically pull production data down from our SQL Azure instance today to keep our test environment fresh, and we also push changes up from time to time depending on the requirements of the release.
I have looked at the backup and restore process, but it feels cumbersome, especially in the development scenario. Asking someone to restore (or scripting the restore of) every partition of every stateful service seems like quite a bit of work.
I think that the answer to both of these questions is that I can use the stateful services, but I need to rely on an external data store for anything that I want to retain. The service would check for state when it was activated and use the stateful service almost as a write-through cache. I'm not suggesting that this needs to be a uniform design choice; it could be made on a service-by-service basis, depending on each service's needs.
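A sketch of that write-through idea in a Reliable Service might look like the following; the IExternalStore interface is hypothetical (standing in for SQL Azure or similar), while the reliable-collection calls are the standard ones:

    using System.Fabric;
    using System.Threading.Tasks;
    using Microsoft.ServiceFabric.Data.Collections;
    using Microsoft.ServiceFabric.Services.Runtime;

    // Hypothetical durable store outside the cluster.
    public interface IExternalStore
    {
        Task SaveAsync(string key, string value);
    }

    public class CachingService : StatefulService
    {
        private readonly IExternalStore externalStore;

        public CachingService(StatefulServiceContext context, IExternalStore externalStore)
            : base(context)
        {
            this.externalStore = externalStore;
        }

        public async Task SetAsync(string key, string value)
        {
            var cache = await StateManager.GetOrAddAsync<IReliableDictionary<string, string>>("cache");

            // Write to the reliable collection first (replicated in-cluster)...
            using (var tx = StateManager.CreateTransaction())
            {
                await cache.SetAsync(tx, key, value);
                await tx.CommitAsync();
            }

            // ...then write through to the external store so the data can be
            // inspected and retained independently of the cluster.
            await externalStore.SaveAsync(key, value);
        }
    }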
Does that sound right, am I overthinking this, missing something, etc?
Thanks
Joe
If you want to share Actor state between developers, you can use a shared cluster (in Azure or on-premises). Make sure you always do upgrade-style deployments so state will survive. State is persisted if you configure the Actor to do so.
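As a sketch of what "configure the Actor to do so" looks like with Reliable Actors (the MyActor/IMyActor names and the counter state are illustrative):

    using System.Threading.Tasks;
    using Microsoft.ServiceFabric.Actors;
    using Microsoft.ServiceFabric.Actors.Runtime;

    public interface IMyActor : IActor
    {
        Task<int> GetCountAsync();
        Task SetCountAsync(int value);
    }

    // Persisted: state is written to disk and replicated, so it survives
    // failover and upgrade-style deployments.
    [StatePersistence(StatePersistence.Persisted)]
    internal class MyActor : Actor, IMyActor
    {
        public MyActor(ActorService actorService, ActorId actorId)
            : base(actorService, actorId) { }

        public Task<int> GetCountAsync()
        {
            // Initializes the value on first access, reads it thereafter.
            return this.StateManager.GetOrAddStateAsync("count", 0);
        }

        public Task SetCountAsync(int value)
        {
            return this.StateManager.SetStateAsync("count", value);
        }
    }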
You can migrate data by taking a backup of all replicas of your service and restoring them on a different cluster (have the service running and trigger data loss). It's cumbersome, yes, but at this time it's the only way (or store state externally).
Note that state is safe in the cluster: it's stored on disk and replicated. There's no need for an external store, provided you take regular state backups and keep them outside the cluster. Stateful services can be more than just caches.

How to manage state in microservices?

First of all, this is a question regarding my thesis for school. I have done some research about this, and it seems like a problem that hasn't been tackled yet (it might not be that common).
Before jumping right into the problem, I'll give a brief example of my use case.
I have multiple namespaces containing microservices depending on a state X. To manage this, the microservices are put in a namespace named after the state (so namespaces state_A, state_B, ...).
It is important to know that each microservice needs this state at startup; it downloads the necessary files, etc. according to the state. Say a service is launched with state A version 1; it is very likely that the state gets updated every month. When this happens, it is important that all the microservices that depend on state A upgrade whatever is necessary (databases, in-memory state, ...).
My current approach for this problem is simply using events: the microservices that need updates when the state changes can subscribe to the event and migrate/upgrade accordingly. The only problem I'm facing is that the service should keep working while it is upgrading. So somehow I should duplicate the service first, let the duplicate upgrade, and only when the upgrade is successful shut down the original. Because of this, the orchestration service used would have to be able to create duplicates (including duplicating the state).
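In pseudo-C# the duplicate-then-swap flow might look like this; every name here (IOrchestrator, StateUpgradeHandler, etc.) is hypothetical, since no specific platform is assumed:

    using System.Threading.Tasks;

    // Hypothetical orchestration surface, for illustration only.
    public interface IOrchestrator
    {
        Task<IDeployment> CloneServicesAsync(string stateName, int stateVersion);
        Task SwapTrafficAsync(string stateName, IDeployment upgraded);
        Task ShutDownOriginalAsync(string stateName);
    }

    public interface IDeployment
    {
        Task WaitUntilHealthyAsync();
    }

    // Subscribes to state-change events and performs the upgrade without
    // taking the original services offline until the duplicate is ready.
    public class StateUpgradeHandler
    {
        private readonly IOrchestrator orchestrator;

        public StateUpgradeHandler(IOrchestrator orchestrator)
        {
            this.orchestrator = orchestrator;
        }

        public async Task OnStateUpdatedAsync(string stateName, int newVersion)
        {
            // 1. Duplicate every service in the state's namespace against
            //    the new state version (including duplicating its state).
            var duplicate = await orchestrator.CloneServicesAsync(stateName, newVersion);

            // 2. Let the duplicate download files and migrate its databases.
            await duplicate.WaitUntilHealthyAsync();

            // 3. Only on success: move traffic over and retire the original.
            await orchestrator.SwapTrafficAsync(stateName, duplicate);
            await orchestrator.ShutDownOriginalAsync(stateName);
        }
    }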
My question is, are there already solutions for my problem (and if yes, which ones)? I have looked into Netflix Conductor (which seemed promising with its workflows and events), Amazon SWF, Marathon and Kubernetes, but none of them covers my problem.
Ideally, the existing solution should not be bound to a specific platform (Azure, GCE, ...).
For an uninterrupted upgrade you should use clusters of nodes providing your service and perform a rolling update, which takes out a single node at a time and upgrades it, leaving the rest of the nodes to continue serving. I recommend looking at the concept of virtual services (e.g. in Kubernetes) and at rolling updates.
For inducing state I would recommend looking into container initialization mechanisms. For example, in Docker you can use entrypoint scripts, and in Kubernetes there is the concept of init containers. You should note, though, that today there is a trend to decouple services from state, meaning the state is kept in a DB that is separate from the service deployment. That allows you to view the service as a stateless component that can be replaced without losing state (given that the interface between the service and the required state did not change). This is good in scenarios where the service changes more frequently and the DB design less frequently.
Another note: I am not sure that representing state in a namespace is a good idea. Typically a namespace is a static construct for organization (of code, services, etc.) that aims for stability.

Fake services mock for local development

This has happened to me more than once; I thought someone could give some insight.
I have worked on multiple projects where my project depends on an external service. When I have to run the application locally, I need that service to be up. But sometimes I am coding against the next version of their service, which may not be ready yet.
So the question is: is there already a way to have a mock service up and running that I could configure with some requests and responses?
For example, let's say that I have a local application that needs to make a REST call to some other service to obtain some data. E.g., for a given user, I need to find all pending shipments, which would come from the other service. But I don't have access to that service.
In order to run my application, I need a working external service, but I don't have access to one in my environment. Is there a better way than having to create a fake service myself?
You should separate the communications concerns from your business logic (something I call an "Edge Component"; see here and here).
For one, it will let you test the business logic by itself. It will also give you the opportunity to rethink the temporal coupling you currently have; e.g. you may want the layer that handles communications to pre-fetch, cache, etc. data from other services, so that you also get more resilient services at run time.
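As a minimal sketch of that separation, assuming a hypothetical shipments endpoint (the IShipmentGateway name, the URL path, and the data shapes are all illustrative): the business logic depends only on an interface, and local development wires in a canned implementation instead of the real HTTP one.

    using System.Collections.Generic;
    using System.Net.Http;
    using System.Text.Json;
    using System.Threading.Tasks;

    public record Shipment(string Id, string Status);

    // The edge-component boundary: business logic only ever sees this.
    public interface IShipmentGateway
    {
        Task<IReadOnlyList<Shipment>> GetPendingShipmentsAsync(string userId);
    }

    // Real implementation that calls the external service.
    public class HttpShipmentGateway : IShipmentGateway
    {
        private readonly HttpClient http;
        public HttpShipmentGateway(HttpClient http) => this.http = http;

        public async Task<IReadOnlyList<Shipment>> GetPendingShipmentsAsync(string userId)
        {
            // Hypothetical route on the external service.
            var json = await http.GetStringAsync($"/users/{userId}/shipments?status=pending");
            return JsonSerializer.Deserialize<List<Shipment>>(json);
        }
    }

    // Fake used for local development: canned responses, no network at all.
    public class FakeShipmentGateway : IShipmentGateway
    {
        public Task<IReadOnlyList<Shipment>> GetPendingShipmentsAsync(string userId)
        {
            return Task.FromResult<IReadOnlyList<Shipment>>(
                new[] { new Shipment("s-1", "pending"), new Shipment("s-2", "pending") });
        }
    }

Which implementation gets registered can then be a single composition-root or configuration decision, so switching between the real service and the fake is a one-line change.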

mqsvc.exe pegs cpu at full usage when deploying nservicebus to production

When I deployed my site that uses NServiceBus to a new production box, it was unusably slow...
After some debugging I discovered that mqsvc.exe was taking up 50% of the CPU usage and the other 50% was being taken up by w3wp.exe
I found this post here:
http://geekswithblogs.net/michaelstephenson/archive/2010/05/07/139717.aspx
which recommended the following:
Make sure you set the windows service for NserviceBus Generic Host to the right credentials
Make sure you have the queue set with the right permissions
Make sure you turn on the right logging configuration in NServiceBus
So I figured the issue was something related to permissions, but even after trying to set the permissions correctly (I thought) I still wasn't able to resolve the issue.
If you allow NServiceBus to create its own queues, then it will create them with the correct permissions it needs.
The problem comes in when you set up a web application, the queues get created, and then the identity the application runs under changes. Then you get exactly this problem: NServiceBus tries to check the queue for a message, it does not have access to do so, so it immediately retries over and over, and you spike the processor.
The fix: Delete the queue. Restart the web application. NServiceBus takes over.
Edit: As noted in the comments, NServiceBus 3.x doesn't invoke the installers by default, which means queues are not automatically created in production unless you ask it to. See the documentation page on Installers for more detail.
For a web application (or any other situation where you're not using NServiceBus.Host) you can invoke the installers as part of the fluent config. There is a full example in the NServiceBus download, but here is a link to the relevant file on GitHub.
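For the fluent-config route in a web application, the NServiceBus 3.x-era call looked roughly like this (the transport and serializer choices here are illustrative; check the documentation for your exact version):

    using NServiceBus;
    using NServiceBus.Installation.Environments;

    // Invoking the installers at startup (re)creates the queues with the
    // correct permissions for the identity the endpoint runs under.
    var bus = Configure.With()
        .DefaultBuilder()
        .XmlSerializer()
        .MsmqTransport()
        .UnicastBus()
        .CreateBus()
        .Start(() => Configure.Instance
            .ForInstallationOn<Windows>()
            .Install());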
The issue did end up being that the website needed to be granted explicit permissions to the queues.
I found a number of resources online telling me this, but I still had to spend a good amount of time monkeying around with exactly WHICH account needed access... It turned out that since my application pools were set to run as ApplicationPoolIdentity, I needed to grant permissions by adding the following account to the NServiceBus queue:
IIS AppPool\{APP POOL NAME}
I granted full access rights, though I'm sure you could refine that a bit if you needed to.
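If you'd rather script that grant than click through the Computer Management UI, System.Messaging can set it (the queue path and pool name below are placeholders for your own):

    using System.Messaging;

    // Grants the app pool identity full control over the endpoint's queue.
    // ".\private$\myendpoint" and "MyAppPool" are placeholders.
    var queue = new MessageQueue(@".\private$\myendpoint");
    queue.SetPermissions(@"IIS AppPool\MyAppPool",
        MessageQueueAccessRights.FullControl,
        AccessControlEntryType.Allow);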
Hopefully, this will help anyone who runs into the same issues.
(This is my first attempt at the "Answer your own question" mechanism, so please let me know if I am doing something wrong.)

Continuation of a process after a system crash/restart - Drools Flow

I've been playing with the examples I downloaded with the book Drools JBoss Rules 5.0. To my relief, they work :) Drools Flow has been my point of interest as a possible workflow engine replacement.
As I try to wrap my head around things, I've been wondering how a prematurely killed ruleflow process gets restarted. What I mean is: say a process is bouncing from one node to another as expected, then the containing process dies due to a crash, a restart or whatever. Is the current node/place of the ruleflow process retained, and can it just continue from that point on system restart? If so, how?
The group I work for is very Java EE centric with JBoss being our favorite application server. I see examples of Drools leveraging Spring's persistence and bean lookup support.
Are there examples of doing the same with JBoss?
If you persist the state of the process instances and tasks in the database, then even if the VM goes down and is restarted, you can retrieve the process instances.
Use JPAKnowledgeService. To create a session:

    ksession = JPAKnowledgeService.newStatefulKnowledgeSession(kbase, null, env);

To reload an existing session by its id:

    ksession = JPAKnowledgeService.loadStatefulKnowledgeSession(sessionId, kbase, null, env);

You only need to know the session id; session information is stored in the SessionInfo table. Download the example project below.
http://dl.dropbox.com/u/2634115/drools-test.zip
The example uses BTM (Bitronix) with an H2 database; it also works well with mysql-connector-java-5.1.13 and BTM. Note that processes that complete are automatically deleted from the database.
You are looking at the basic concept of process migration. During what is known as strong migration, a process can be stopped on one machine and the entire state of the process migrated to another machine (including the program counter and all existing stacks). Before you go thinking that this is completely insane, think about it from a JVM perspective: since your application is already running on virtual hardware, it isn't hard to stop the application and pick it back up where it left off, because it is completely virtualized.
If you would like another example, look at VMware; an entire machine can be paused, migrated to another machine, and started again. It's very interesting stuff and usually relates mainly to distributed computing, where you might have hundreds of agents that need to migrate from machine to machine as some go down for maintenance.
I realize that I didn't give an example of this through JBoss, but a background on what exactly you're looking for can give you much better insight into what to look for going forward.