Restart Akka ActorSystem after termination - scala

We have an Akka HTTP app with approximately 100+ API endpoints and 15+ actors. After Http().bindAndHandle(routes, host, port) I terminate the ActorSystem in a shutdown hook:
Http().bindAndHandle(corsHandler(routes), "0.0.0.0", 9090/*, connectionContext = https*/)
sys.addShutdownHook(actorSystem.terminate())
I don't want my application to stop. My questions are:
Does the ActorSystem need to be terminated at all?
Does my application stop working after the ActorSystem is terminated?
What if a user hits the API after the ActorSystem has terminated? Does it restart to handle the request?
What do I need to do to keep my application always listening for client requests?
Thanks.

You are looking for fault tolerance in your application. The actor system will only be terminated when some error occurs or when we explicitly force it to terminate. To make your application fault tolerant, use a supervision strategy. Please look into these links:
https://doc.akka.io/docs/akka/2.5/fault-tolerance.html
https://doc.akka.io/docs/akka/2.5/general/supervision.html
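For illustration, a minimal sketch of a classic Akka supervision strategy (the actor and the exception choices are illustrative, not taken from the original app):

import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

// Parent actor that restarts failing children instead of letting the
// failure escalate and bring down the whole hierarchy.
class Supervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => Stop    // bad input: stop the child
      case _: Exception                => Restart // transient failure: restart it
    }

  def receive: Receive = {
    case props: Props => sender() ! context.actorOf(props) // create a supervised child
  }
}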

The purpose of a shutdown hook is to allow an orderly shut down of the application when the JVM is about to shutdown. It's not necessarily required in all circumstances, but an orderly shutdown could be useful if your ActorSystem wants to release resources in an orderly manner, or signal to other nodes in a cluster that it's being shut down.
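For example, a minimal sketch of such an orderly shutdown, assuming binding is the Future[ServerBinding] returned by Http().bindAndHandle above:

import scala.concurrent.Await
import scala.concurrent.duration._

sys.addShutdownHook {
  // Stop accepting new connections first, then stop the actors.
  val unbound = binding.flatMap(_.unbind())(actorSystem.dispatcher)
  Await.ready(unbound, 10.seconds)
  Await.ready(actorSystem.terminate(), 30.seconds)
}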
When the actor system has terminated, there will be no more actors to handle HTTP requests, because actors cannot exist without a running actor system to be part of. So no: if a user hits the API after the actor system has terminated, the actor system will not be restarted; the request will simply be rejected (most likely with a connection-refused error).
You can't avoid that happening in your code because a JVM shutdown cannot be cancelled.
However, the good news is that you can avoid it at the infrastructure level using various operational techniques; for example, blue-green deployments behind an HTTP load balancer can support downtime-free upgrades of stateless applications.

Related

Should/could a Kubernetes pod process.exit(1) itself, or is it better to use a liveness probe?

I have an important service running in Kubernetes.
You can picture it as a dispatcher service which connects to a publisher and dispatch the information to a RabbitMQ queue:
[Service1: publisher] -> [Service2: dispatcher] -> [Service3: RabbitMQ]
(everything is in the same namespace, there is one replica of each service, and they are set up as StatefulSets)
If the connection between the publisher and the dispatcher is down, the publisher will buffer the messages; all good.
However, if the connection between the dispatcher and RabbitMQ is down, the messages will be lost.
Thus, when I lose the connection to RabbitMQ, I'd like to somehow process.exit(1) the dispatcher so it instantly stops receiving messages from the publisher and the messages get buffered instead.
I'm also thinking about doing it in a more "k8s way" by setting up a liveness probe, but I'm afraid it could take some time before the probe detects the failure and restarts the pod (without "DDoSing" my pod with a check every second). Because to set up this probe I would have to listen for disconnects anyway, and if I already know I am disconnected, why should I wait for k8s to take action (and lose precious messages) when I could simply kill the service myself?
I know the question might be a bit vague but I'm also asking for some hints / best practices here. Thanks.
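A sketch of the trade-off being described (not from the original post; the names and the Scala setting are hypothetical, since the fail-fast-versus-probe reasoning is the same on any runtime):

import java.util.concurrent.atomic.AtomicBoolean

object DispatcherHealth {
  // Flipped to false by the RabbitMQ connection's disconnect listener.
  val rabbitConnected = new AtomicBoolean(true)

  def onRabbitDisconnect(): Unit = {
    rabbitConnected.set(false)
    // Fail fast: exiting makes Kubernetes restart the pod immediately,
    // so the publisher buffers messages instead of sending into a black hole.
    sys.exit(1)
  }

  // Backstop for failures the disconnect callback cannot see (e.g. hangs):
  // an HTTP /healthz endpoint backed by this returns 200 or 500 to the
  // livenessProbe.
  def healthStatus: Int = if (rabbitConnected.get()) 200 else 500
}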

Blocking a Service Fabric service shutdown externally

I'm going to write a quick little SF service to report endpoints from a service to a load balancer. That part is easy and well understood. FabricClient. Find Services. Discover endpoints. Do stuff with load balancer API.
But I'd like to be able to deal with a graceful drain-and-shutdown situation: basically, catch and block the shutdown of an SF service until after my app has had a chance to drain connections to it from the pool.
There's no API I can find to accomplish this. But I kinda bet one of the services would let me do this. Resource manager. Cluster manager. Whatever.
Is anybody familiar with how to pull this off?
From what I know, this isn't possible in the way you've described.
A Service Fabric service can be shut down for multiple reasons: re-balancing, errors, outages, upgrades, etc. Depending on the type of service (stateful or stateless), the shutdown routines differ slightly (see more), but in general, if the service replica is shut down gracefully, the OnCloseAsync method is invoked, and inside this method the replica can perform a safe cleanup. There is also a second case: when the replica is forcibly terminated, OnAbort is called instead, and the documentation makes no clear statements about the guarantees you have inside OnAbort.
Going back to your case, I can suggest the following pattern:
1. When a replica is about to shut down, inside OnCloseAsync or OnAbort it calls lbservice and reports that it is going to shut down.
2. lbservice then reconfigures the load balancer to exclude this replica from request processing.
3. The replica completes all requests it is already processing and shuts down.
Please note that you would also need to implement a startup mechanism, i.e. when a replica starts, it reports to lbservice that it is now active.
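A language-agnostic sketch of that pattern (the original service is C#; LbClient and the method names here are hypothetical):

trait LbClient {
  def register(replicaId: String): Unit   // called at startup
  def deregister(replicaId: String): Unit // called from OnCloseAsync/OnAbort
}

class Replica(lb: LbClient, replicaId: String) {
  def onOpen(): Unit = lb.register(replicaId) // report "active" to lbservice

  def onClose(): Unit = {
    lb.deregister(replicaId)  // steps 1-2: ask lbservice to drain this replica
    awaitInFlightRequests()   // step 3: finish requests already in progress
  }

  // App-specific: block until in-flight work has drained.
  private def awaitInFlightRequests(): Unit = ()
}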
In the meantime, I'd like to point out that Service Fabric already implements this mechanism. Here is an example of how API Management can be used with Service Fabric, and here is an example of how the Reverse Proxy can be used to access Service Fabric services from the outside.
EDIT 2018-10-08
To receive notifications about changes to service endpoints in general, you can try using the FabricClient.ServiceManagementClient.ServiceNotificationFilterMatched event.
There is a similar situation solved in this question.

Spark Driver died, but did not kill the application

I have a streaming job that fails due to a network call timeout. The application keeps retrying for some time; if I kill the driver in the meantime, the application does not die, and I have to kill it manually through the UI.
My question is:
Does this happen because the network connection is established on a different thread, which does not let the application die?

Does gRPC server spin up a new thread for each request?

I tried profiling a gRPC Java server, and I mainly see the following thread pools:
grpc-default-executor threads: one created for each incoming request.
grpc-default-worker-ELG threads: presumably these listen for incoming gRPC requests and hand them off to the "grpc-default-executor" threads.
Overall, is the gRPC Java server Netty-style or Jetty/Tomcat-style? Or can it be configured to run either way?
The gRPC Java server is exposed closer to the Jetty/Tomcat style, except that it is asynchronous. That is, in normal Servlets each request consumes a thread until it is complete. While newer Servlet versions let you detach from the dedicated thread and continue work asynchronously (freeing the thread for other use), that is more uncommon. In gRPC you are free to work in either style. Note that gRPC uses a cachedThreadPool by default to reuse threads; on the server side it's a good idea to replace the default executor with your own, generally fixed-size, pool via ServerBuilder.executor().
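For instance, a sketch of swapping in a fixed-size pool (the port, pool size, and MyServiceImpl are placeholders):

import io.grpc.ServerBuilder
import java.util.concurrent.Executors

val server = ServerBuilder
  .forPort(9090)
  .executor(Executors.newFixedThreadPool(16)) // bounded application-thread pool
  // .addService(new MyServiceImpl)           // hypothetical service implementation
  .build()
  .start()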
Internally gRPC Java uses the Netty style, which means fully non-blocking. You may use ServerBuilder.directExecutor() to run on the Netty threads, although in that case you may want to specify NettyServerBuilder.bossEventLoopGroup(), workerEventLoopGroup(), and, for compatibility, channelType().
As far as I know, you can specify directExecutor() when building the gRPC server/client, which ensures all work is done on the I/O thread so threads are shared. This is not the default, for safety reasons: you need to be very careful about what you do on the I/O thread (for example, you should never block there).
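And a sketch of the Netty-style setup from the previous answer (the NIO channel and event-loop-group choices are assumptions):

import io.grpc.netty.NettyServerBuilder
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.nio.NioServerSocketChannel

val nettyServer = NettyServerBuilder
  .forPort(9090)
  .directExecutor() // handlers run on the Netty I/O threads: never block here
  .bossEventLoopGroup(new NioEventLoopGroup(1))
  .workerEventLoopGroup(new NioEventLoopGroup())
  .channelType(classOf[NioServerSocketChannel])
  .build()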

Distributed Actors in Akka

I'm fairly new to Akka and new to distributed programming in general. Using Akka's Mist component, I've created supervised actors to handle HTTP requests asynchronously. Everything is currently running on one physical machine with local actors. What I don't understand is how to build a truly fault-tolerant system with more than one box. As stated in the Akka docs:
Also, you (usually) need to know if one box is down and/or the service you are talking to on the other box is down. Here actor supervision/linking is a critical tool for not only monitoring the health of remote services, but to actually manage the service, do something about the problem if the actor or node is down. Such as restarting actors on the same node or on another node.
How do I do this? I'm looking for an example or pointers on how to begin making my application distributed. Other services in our group use Apache gateways in front of multiple Tomcat instances, so the event of a Tomcat server going down is transparent to the user. I'm deploying my service to the Akka microkernel and need to achieve a similar level of high availability across more than one physical box.
I'm using Akka 1.1.3.
Remote supervision works only with client-managed remote actors for the Akka 1.x series.
Akka 2.0, which is currently under development, will support transparent clustering, cluster-wide supervision, and cluster-wide lifecycle monitoring.
You might consider putting an HTTP load balancer in front of Akka Microkernel instances running Mist; this would match what your group does with 'Apache gateways'.
Another approach would be to expose remote actors on a number of instances and then use Akka's LoadBalancer or Actor Pool to send messages around; see here.
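The 1.x LoadBalancer API is gone in later releases; as a plainly swapped-in modern analogue, here is what the same idea looks like with the Akka 2.x classic RoundRobinGroup router (the paths and system names are assumptions):

import akka.actor.ActorSystem
import akka.routing.RoundRobinGroup

val system = ActorSystem("frontend")
// Remote worker actors already started on the backend machines.
val workerPaths = List(
  "akka.tcp://backend@host1:2552/user/worker",
  "akka.tcp://backend@host2:2552/user/worker"
)
val balancer = system.actorOf(RoundRobinGroup(workerPaths).props(), "balancer")
balancer ! "job" // messages are distributed across the remote workers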
The second approach is a bit of a pain if you have a dynamic pool of machines, because the pool of machines has to be specified programmatically. Akka 2.0 addresses this with cluster support that is set up in the akka.conf file.
As for the release date of 2.0: for what it's worth, 1.2 was just released on 2011-Sept-19.