Azure Service Fabric Actor Retry Logic on Exception

I'm working on an Azure Service Fabric solution.
I have implemented an actor with some state, and everything works fine.
Can anyone explain to me what happens if a user exception is thrown in my actor implementation? What does the Service Fabric environment do if a call to an actor throws an exception? Is there any default retry logic that forces the call again?

If an actor throws an exception, it is handled inside ActorRemotingExceptionHandler or another default implementation of IExceptionHandler. Currently, if the exception is an ordinary one, not related to network issues or to cluster or node availability, it is not retried: it is rethrown on the client side, where you can handle it.
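Since an ordinary exception simply surfaces on the client, any retry for it has to come from the caller. Below is a minimal sketch of a client-side retry wrapper, written as a self-contained C# top-level program so it runs without the Service Fabric SDK. CallWithRetryAsync and the isTransient predicate are hypothetical names, not part of the Service Fabric API, and a simulated call stands in for the actor proxy call; deciding which exception types count as transient is up to you.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not a Service Fabric API): retries only exceptions the
// caller classifies as transient and rethrows everything else immediately.
async Task<T> CallWithRetryAsync<T>(
    Func<Task<T>> call,
    Func<Exception, bool> isTransient,
    int maxAttempts,
    TimeSpan delay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await call();
        }
        catch (Exception ex) when (isTransient(ex) && attempt < maxAttempts)
        {
            // Transient failure with attempts left: wait, then try again.
            await Task.Delay(delay);
        }
    }
}

// Simulated actor call: fails twice with a "transient" error, then succeeds.
int attempts = 0;
int result = await CallWithRetryAsync(async () =>
{
    attempts++;
    await Task.Yield();
    if (attempts < 3) throw new TimeoutException("simulated transient failure");
    return 42;
}, ex => ex is TimeoutException, maxAttempts: 5, delay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"result = {result} after {attempts} attempts");

// A business exception from the actor is not retried: it reaches the caller at once.
int businessCalls = 0;
bool rethrown = false;
try
{
    await CallWithRetryAsync<int>(() =>
    {
        businessCalls++;
        throw new InvalidOperationException("simulated error thrown by actor code");
    }, ex => ex is TimeoutException, maxAttempts: 5, delay: TimeSpan.Zero);
}
catch (InvalidOperationException)
{
    rethrown = true;
}
Console.WriteLine($"business error rethrown = {rethrown} after {businessCalls} call(s)");
```

Retrying only makes sense if the actor method is idempotent; otherwise a retried call may apply the same state change twice.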

Related

Stateless Worker service in Service Fabric restarted in the same process

I have a stateless service that pulls messages from an Azure queue and processes them. The service also starts some threads in charge of cleanup operations. We recently ran into an issue where these threads, which should have been stopped when the service shut down, remained active (definitely a bug in our shutdown process).
Looking further at the logs, it seemed that the RunAsync method's cancellation token received a cancellation request and that later, within the same process, a new instance of the stateless service registered in ServiceRuntime.RegisterServiceAsync was created.
Is it expected behavior that Service Fabric can reuse the same process to start a new instance of the stateless service after shutting down the current instance?
The Service Fabric documentation https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-hosting-model does seem to suggest that different services can be hosted in the same process, but I could not find the above behavior mentioned there.
In the shared process model, there's one process per ServicePackage instance on every node. Adding or replacing services will reuse the existing process.
This is done to save some (process-level) overhead on the node that runs the service, so hosting is more cost-efficient. Also, it enables port sharing for services.
You can change this (default) behavior by configuring the 'exclusive process' mode in the application manifest, so every replica/instance runs in a separate process.
<Service Name="VotingWeb" ServicePackageActivationMode="ExclusiveProcess">
As you mentioned, you can monitor the CancellationToken passed to RunAsync and use it to stop the separate worker threads when the service stops.
More information is available in the documentation on the application model and on configuring the process activation mode.
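A minimal sketch of tying the worker threads to the instance lifetime, assuming a plain console host: the CancellationTokenSource stands in for the token Service Fabric passes to RunAsync, and CleanupLoopAsync is a hypothetical name for the cleanup work. The key point is that RunAsync awaits the helper before returning, so the helper cannot keep running after the instance has shut down, even if the process is reused for a new instance.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int cleanupRuns = 0;
Task cleanupTask = Task.CompletedTask;

// Hypothetical cleanup loop: it receives the same token as RunAsync,
// so it exits as soon as Service Fabric cancels this instance.
async Task CleanupLoopAsync(CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        cleanupRuns++;   // stand-in for one cleanup pass
        try
        {
            await Task.Delay(TimeSpan.FromMilliseconds(20), ct);
        }
        catch (OperationCanceledException)
        {
            break;       // shutdown requested
        }
    }
}

// Stand-in for the service's RunAsync override.
async Task RunAsync(CancellationToken cancellationToken)
{
    cleanupTask = Task.Run(() => CleanupLoopAsync(cancellationToken));
    try
    {
        // The queue-processing loop would go here; this just waits for cancellation.
        await Task.Delay(Timeout.Infinite, cancellationToken);
    }
    finally
    {
        // Don't return until the helper has stopped, so it cannot outlive this
        // instance in a process that Service Fabric may reuse.
        await cleanupTask;
    }
}

// Simulate Service Fabric starting the instance and later shutting it down.
using var cts = new CancellationTokenSource();
Task service = RunAsync(cts.Token);
await Task.Delay(100);
cts.Cancel();
try
{
    await service;
}
catch (OperationCanceledException)
{
    // Expected: RunAsync ended because its token was canceled.
}
Console.WriteLine($"cleanup ran {cleanupRuns} time(s); stopped = {cleanupTask.IsCompleted}");
```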

Blocking a Service Fabric service shutdown externally

I'm going to write a quick little SF service to report endpoints from a service to a load balancer. That part is easy and well understood. FabricClient. Find Services. Discover endpoints. Do stuff with load balancer API.
But I'd like to be able to deal with a graceful drain and shutdown situation. Basically, catch and block the shutdown of an SF service until after my app has had a chance to drain connections to it from the pool.
There's no API I can find to accomplish this. But I kinda bet one of the services would let me do this. Resource manager. Cluster manager. Whatever.
Is anybody familiar with how to pull this off?
From what I know, this isn't possible in the way you've described.
A Service Fabric service can be shut down for multiple reasons: rebalancing, errors, outages, upgrades, etc. Depending on the type of service (stateful or stateless), the shutdown routine differs slightly, but in general, if the service replica is shut down gracefully, the OnCloseAsync method is invoked. Inside this method the replica can perform a safe cleanup. There is also a second case, when the replica is forcibly terminated. Then the OnAbort method is called, and the documentation makes no clear statements about what guarantees you have inside OnAbort.
Going back to your case, I can suggest the following pattern:
1. When the replica is about to shut down, it calls lbservice from OnCloseAsync or OnAbort and reports that it is going to shut down.
2. lbservice then reconfigures the load balancer to exclude this replica from request processing.
3. The replica completes all requests it is already processing and shuts down.
Note that you would also need to implement a startup mechanism, i.e., when the replica starts, it reports to lbservice that it is active.
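The drain step (report to lbservice, then let in-flight requests finish) could look roughly like the sketch below. It is a self-contained C# program, not a Service Fabric service: ReportDrainingAsync and DrainAsync are hypothetical, the in-flight counter stands in for however your service tracks active requests, and in a real service the body of DrainAsync would be called from OnCloseAsync. The deadline matters because close cannot wait forever, and, as noted above, the documentation gives no clear guarantees for OnAbort.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int inFlight = 0;   // requests this replica is currently processing

// Hypothetical client for lbservice (not a real API).
Task ReportDrainingAsync(string replicaId, CancellationToken ct)
{
    Console.WriteLine($"lbservice: stop routing new requests to {replicaId}");
    return Task.CompletedTask;
}

// What an OnCloseAsync body could do: report, then wait until in-flight
// requests finish, giving up after maxWait or when the close token fires.
async Task<bool> DrainAsync(string replicaId, TimeSpan maxWait, CancellationToken ct)
{
    await ReportDrainingAsync(replicaId, ct);
    using var deadline = CancellationTokenSource.CreateLinkedTokenSource(ct);
    deadline.CancelAfter(maxWait);
    try
    {
        while (Volatile.Read(ref inFlight) > 0)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(10), deadline.Token);
        }
        return true;    // every in-flight request completed
    }
    catch (OperationCanceledException)
    {
        return false;   // timed out or close was canceled; remaining requests are cut off
    }
}

// Simulate three requests still running when the replica starts closing.
Interlocked.Add(ref inFlight, 3);
Task requests = Task.Run(async () =>
{
    for (int i = 0; i < 3; i++)
    {
        await Task.Delay(30);
        Interlocked.Decrement(ref inFlight);
    }
});

bool drained = await DrainAsync("replica-1", TimeSpan.FromSeconds(5), CancellationToken.None);
await requests;
Console.WriteLine($"drained = {drained}, in-flight = {inFlight}");
```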
In the meantime, I'd like to point out that Service Fabric already implements this mechanism: there are examples of how API Management can be used with Service Fabric, and of how the Reverse Proxy can be used to access Service Fabric services from the outside.
EDIT 2018-10-08
To receive notifications about service endpoint changes in general, you can try the FabricClient.ServiceManagementClient.ServiceNotificationFilterMatched event.
A similar situation was solved in a related question.

Restart Akka Actor System after terminated

We have an Akka HTTP app with approx. 100+ APIs and 15+ actors. After Http().bindAndHandle(routes, host, port), I terminate the ActorSystem in a shutdown hook:
Http().bindAndHandle(corsHandler(routes), "0.0.0.0", 9090/*, connectionContext = https*/)
sys.addShutdownHook(actorSystem.terminate())
I don't want my application to stop, so my questions are:
Is it mandatory to terminate the ActorSystem?
Does my application stop working after the ActorSystem is terminated?
What if a user hits the API after the ActorSystem is terminated? Does it restart to handle API requests?
What do I need to do if I want my application to always listen for client requests?
Thanks.
You are looking for fault tolerance in your application. Since the actor system is terminated in the event of an error or when you explicitly force it to terminate, you have to use a supervision strategy for your application to be fault tolerant. Please look into these links:
https://doc.akka.io/docs/akka/2.5/fault-tolerance.html
https://doc.akka.io/docs/akka/2.5/general/supervision.html
The purpose of a shutdown hook is to allow an orderly shutdown of the application when the JVM is about to shut down. It's not required in all circumstances, but an orderly shutdown can be useful if your ActorSystem needs to release resources in an orderly manner or signal to other nodes in a cluster that it's being shut down.
When the actor system has terminated, there are no more actors to handle HTTP requests, because actors cannot exist without a running actor system to be part of. So no: if a user hits the API after the actor system has terminated, the actor system will not be restarted; the request will simply be rejected (connection refused or similar).
You can't avoid that happening in your code because a JVM shutdown cannot be cancelled.
However, the good news is that you can avoid it at the infrastructure level using various operational techniques; e.g., blue-green deployments with an HTTP load balancer can support downtime-free upgrades of stateless applications.

Error while deleting stateless service in Service Fabric Cluster

In a Service Fabric cluster I have a stateless service with a while(true) loop running continuously in its RunAsync method. Because of this loop, I am finding it hard to delete the application from the cluster: every time I try to delete it, an error occurs stating that it cannot detach the process. Normally I deploy the application once to remove the code; to redeploy the code on top of the application I have to deploy twice. Is there a workaround for this without removing the infinite while loop?
Update: RunAsync method
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // making sure the thread is active
    while (true)
    {
        // do something
    }
}
Thank you for the input.
During shutdown, the cancellation token passed to RunAsync is canceled.
You need to check the cancellation token's IsCancellationRequested property in your main loop. Once it becomes true, the token's ThrowIfCancellationRequested method, if called, throws an OperationCanceledException.
If your service does not respond to these API calls in a reasonable amount of time, Service Fabric can forcibly terminate your service. Usually this only happens during application upgrades or when a service is being deleted. This timeout is 15 minutes by default.
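Here is the RunAsync loop from the question reworked along those lines, as a minimal sketch that keeps the while (true) loop. It assumes a console host in which a CancellationTokenSource plays the role of Service Fabric, and the iteration counter is a stand-in for "do something". The loop calls ThrowIfCancellationRequested and also passes the token to Task.Delay, so both the work and the waits end promptly once shutdown is requested.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int iterations = 0;

// Same shape as the RunAsync override in the question, but the loop observes the token.
async Task RunAsync(CancellationToken cancellationToken)
{
    while (true)
    {
        // Throws OperationCanceledException once shutdown has been requested.
        cancellationToken.ThrowIfCancellationRequested();

        iterations++;   // stand-in for "do something"

        // Passing the token also interrupts the wait between iterations.
        await Task.Delay(TimeSpan.FromMilliseconds(10), cancellationToken);
    }
}

// Simulate Service Fabric starting the service and then deleting or upgrading it.
using var cts = new CancellationTokenSource();
Task service = RunAsync(cts.Token);
await Task.Delay(100);
cts.Cancel();

bool stoppedOnCancellation = false;
try
{
    await service;
}
catch (OperationCanceledException)
{
    stoppedOnCancellation = true;
}
Console.WriteLine($"ran {iterations} iteration(s); stopped on cancellation = {stoppedOnCancellation}");
```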
See this document for a good reference: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-lifecycle#stateless-service-shutdown

Azure service fabric does not cancel with lots of tasks

I'm trying to test various performance aspects of Azure Service Fabric to understand how it all works, and have run into some problems when canceling a service.
In one particular test, I create 101 tasks, 50 reading a queue, 50 writing a queue and 1 showing progress and reporting it.
When the service is stopped, for example by simply redeploying the application, I can see that the cancellation token receives the request and some tasks cancel, but I also see a lot of events in the Event Viewer saying essentially:
AsyncCalloutAdapter-22542743: end delegate threw an exception
System.OperationCanceledException: Operation canceled. ---> System.Runtime.InteropServices.COMException: Operation aborted (Exception from HRESULT: 0x80004004 (E_ABORT))
at System.Fabric.Interop.NativeRuntime.IFabricStateReplicator2.EndReplicate(IFabricAsyncOperationContext context)
at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
--- End of inner exception stack trace ---
The only way to get this working again is to reset the local cluster.
With fewer tasks, everything seems to work OK.
This is all using a local development cluster.
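The question does not include its code, so the following is only a hypothetical reconstruction of the described setup (50 readers, 50 writers, 1 progress reporter). It shows one way to make every task observe the RunAsync token and to have RunAsync wait for all of them before it returns. A ConcurrentQueue stands in for the reliable queue, so there are no transactions or replication here; the sketch therefore does not reproduce, explain, or fix the E_ABORT errors from EndReplicate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>();   // in-memory stand-in for the reliable queue
long[] counts = new long[2];             // [0] = items written, [1] = items read

async Task WriterAsync(CancellationToken ct)
{
    while (true)
    {
        ct.ThrowIfCancellationRequested();
        queue.Enqueue(1);
        Interlocked.Increment(ref counts[0]);
        await Task.Delay(1, ct);
    }
}

async Task ReaderAsync(CancellationToken ct)
{
    while (true)
    {
        ct.ThrowIfCancellationRequested();
        if (queue.TryDequeue(out _)) Interlocked.Increment(ref counts[1]);
        await Task.Delay(1, ct);
    }
}

async Task ReporterAsync(CancellationToken ct)
{
    while (true)
    {
        await Task.Delay(50, ct);
        Console.WriteLine($"written={Interlocked.Read(ref counts[0])} read={Interlocked.Read(ref counts[1])}");
    }
}

// Stand-in for RunAsync: start 50 readers, 50 writers and 1 reporter that all share
// the same token, then wait for every one of them before returning.
async Task<int> RunAsync(CancellationToken ct)
{
    var tasks = new List<Task>();
    for (int i = 0; i < 50; i++) tasks.Add(ReaderAsync(ct));
    for (int i = 0; i < 50; i++) tasks.Add(WriterAsync(ct));
    tasks.Add(ReporterAsync(ct));
    try
    {
        await Task.WhenAll(tasks);
    }
    catch (OperationCanceledException) when (ct.IsCancellationRequested)
    {
        // Expected during shutdown: every task ended because the token was canceled.
    }
    return tasks.FindAll(t => t.IsCompleted).Count;
}

using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
int stopped = await RunAsync(cts.Token);
Console.WriteLine($"{stopped} of 101 tasks stopped before RunAsync returned");
```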