I am currently trying to get started with Akka and I am facing a weird problem. I've got the following code for my Actor:
class AkkaWorkerFT extends Actor {
def receive = {
case Work(n, c) if n < 0 => throw new Exception("Negative number")
case Work(n, c) => self reply n.isProbablePrime(c);
}
}
And this is how I start my workers:
val workers = Vector.fill(nrOfWorkers)(actorOf[AkkaWorkerFT].start());
val router = Routing.loadBalancerActor(SmallestMailboxFirstIterator(workers)).start()
And this is how I shut everything down:
futures.foreach( _.await )
router ! Broadcast(PoisonPill)
router ! PoisonPill
Now what happens is if I send the workers messages with n > 0 (no exception is thrown), everything works fine and the application shuts down properly. However, as soon as I send it a single message which results in an exception, the application does not terminate because there is still an actor running, but I can't figure out where it comes from.
In case it helps, this is the stack of the thread in question:
Thread [akka:event-driven:dispatcher:event:handler-6] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 158
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue<E>.take() line: 399
ThreadPoolExecutor.getTask() line: 947
ThreadPoolExecutor$Worker.run() line: 907
MonitorableThread(Thread).run() line: 680
MonitorableThread.run() line: 182
PS: The thread which is not terminating isn't any of the worker threads, because I've added a postStop callback, every one of them stops properly.
PPS: Actors.registry.shutdownAll workarounds the problem, but I think shutdownAll should only be used as a last resort, shouldn't it?
The proper way to handle problems inside akka actors is not to throw an exception but rather to set supervisor hierarchies
"Throwing an exception in concurrent code (let’s assume we are using
non-linked actors), will just simply blow up the thread that currently
executes the actor.
There is no way to find out that things went wrong (apart from
inspecting the stack trace).
There is nothing you can do about it."
see Fault Tolerance Through Supervisor Hierarchies (1.2)
* note * the above is true for old versions of Akka (1.2)
In newer versions (e.g. 2.2) you'd still set a supervisor hierarchy but it will trap Exceptions thrown by child processes. e.g.
class Child extends Actor {
var state = 0
def receive = {
case ex: Exception ⇒ throw ex
case x: Int ⇒ state = x
case "get" ⇒ sender ! state
}
}
and in the supervisor:
class Supervisor extends Actor {
import akka.actor.OneForOneStrategy
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException ⇒ Resume
case _: NullPointerException ⇒ Restart
case _: IllegalArgumentException ⇒ Stop
case _: Exception ⇒ Escalate
}
def receive = {
case p: Props ⇒ sender ! context.actorOf(p)
}
}
see Fault Tolerance Through Supervisor Hierarchies (2.2)
Turning off the logging to make sure things terminate, as proposed by Viktor, is a bit strange. What you can do instead is:
EventHandler.shutdown()
that cleanly shuts down all the (logger) listeners that keep the world running after the exception:
def shutdown() {
foreachListener(_.stop())
EventHandlerDispatcher.shutdown()
}
Turn of the logger in the akka.conf
Related
I am trying to implement fault tolerance within my actor system for my Scala project, to identify errors and handle them. I am using Classic actors. Each supervisor actor has 5 child actors, if one of these child actors fails, I want to restart that actor, and log the error, and as I mentioned, handle the problem that caused this actor to fail.
I implemented a One-For-One SupervisorStrategy in my Supervisor actor class as such:
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 1.minute) {
case e: ArithmeticException =>
logger.error(s"Supervisor: $e from $sender; Restarting!")
Restart
case e: NullPointerException =>
logger.error(s"Supervisor: $e from $sender; Restarting!")
Restart
case e: IllegalArgumentException =>
logger.error(s"Supervisor: $e from $sender; Restarting!")
Restart
case _: Exception =>
logger.error(s"Supervisor: Unknown exception from $sender; Escalating!")
Restart
}
I have also tried something as simple as:
override val supervisorStrategy =
OneForOneStrategy() {
case _ =>
logger.error(s"Supervisor: Unknown exception from $sender; Escalating!")
Restart
}
I added the following code into one of the supervisor methods that tells the actors to start working:
if(!errorSent){
errorSent = true
actor ! new NullPointerException
}
With a var errorSent being declared in my supervisor, and the actor throwing the exception when an exception message is received, it is implemented with this var so that it only occurs once as to not put the actor in a state of infinitely restarting. I did this because the system never/rarely fails. But, in the log file, I do not get what I expect, but the following:
13:41:19.893 [ClusterSystem-akka.actor.default-dispatcher-21] ERROR akka.actor.OneForOneStrategy - null
java.lang.NullPointerException: null
I have looked at so many examples, as well as the Akka documentation for fault tolerance for both types and classic actors, but I cannot get this to work.
I've testing the fault tolerant system of akka and so far it's been good when talking about retrying to send a msg according the maxNrOfRetries specified.
However, it does not restart the actor within the given time range, it restarts all at once, ignoring the within time range.
I tried with AllForOneStrategy and OneForOneStrategy but does not change anything.
Trying to follow this blog post: http://letitcrash.com/post/23532935686/watch-the-routees, this is the code I've been working.
class Supervisor extends Actor with ActorLogging {
var replyTo: ActorRef = _
val child = context.actorOf(
Props(new Child)
.withRouter(
RoundRobinPool(
nrOfInstances = 5,
supervisorStrategy =
AllForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 10.second) {
case _: NullPointerException => Restart
case _: Exception => Escalate
})), name = "child-router")
child ! GetRoutees
def receive = {
case RouterRoutees(routees) =>
routees foreach context.watch
case "start" =>
replyTo = sender()
child ! "error"
case Terminated(actor) =>
replyTo ! -1
context.stop(self)
}
}
class Child extends Actor with ActorLogging {
override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
log.info("***** RESTARTING *****")
message foreach{ self forward }
}
def receive = LoggingReceive {
case "error" =>
log.info("***** GOT ERROR *****")
throw new NullPointerException
}
}
object Boot extends App {
val system = ActorSystem()
val supervisor = system.actorOf(Props[Supervisor], "supervisor")
supervisor ! "start"
}
Am I doing anything wrong to accomplish that?
EDIT
Actually, I misunderstood the purpose of the withinTimeRange.
To schedule my retries in a time range, I'm doing the following:
override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
log.info("***** RESTARTING *****")
message foreach { msg =>
context.system.scheduler.scheduleOnce(30.seconds, self, msg)
}
}
It seems to work ok.
I think you have misunderstood the purpose of the withinTimeRange arg. That value is supposed to be used in conjunction with maxNrOfRetries to provide a window in which to support the limiting of the number of retries. For example, as you have specified, the implication is that the supervisor will no longer restart an individual child if that child needs to be restarted more than 3 times in 10 seconds.
From docs:
maxNrOfRetries - the number of times a child actor is allowed to be
restarted, negative value means no limit, if the limit is exceeded the
child actor is stopped
withinTimeRange - duration of the time window
for maxNrOfRetries, Duration.Inf means no window
Your code means that when any child fails with NullPointerException more than 3 times within 10 seconds it will not be restarted again. Because of AllForOneStrategy after first Routee fails all routees are restarted. And because you've overridden preRestart to resend failed message this situation repeats again until reaches 3 failures within 10 seconds(which is achieved in less than a second).
I have a supervisor below that creates child processes using the default Play! 2.2 Akka.system. When I attempt to instantaneously kill the supervisor, nothing happens and it keeps processing.
class ImportSupervisor extends Actor {
import akka.actor.AllForOneStrategy
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._
val log = Logging(context.system, this)
override val supervisorStrategy =
AllForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 hour) {
case e: Exception => Stop
}
override def preStart() {
Logger.info("supervisor starting up at "+self.toString)
}
def receive = {
case p: Props => context.actorOf(p)
}
}
Below is the code I'm using to kill the supervisor which has about 1,000 children:
Akka.system.actorSelection("akka://application/user/"+actorName) ! Kill
I can verify I am getting the ActorPath correct, but the problem is the Kill does not take effect instantaneously. I've tried the same with Stop. What am I doing wrong? Is it wrong to assume that it would shut down instantly?
Is it wrong to assume that it would shut down instantly?
Yes.
The Kill message will be enqueued in the actor's mailbox just like any other message. If there are other messages ahead of it, other the actor is busy processing another message, the Kill message will have to wait.
All of this is explained in the documentation:
http://doc.akka.io/docs/akka/2.2.4/scala/actors.html#Stopping_actors
I've a very simple example where I've an Actor (SimpleActor) that perform a periodic task by sending a message to itself. The message is scheduled in the constructor for the actor. In the normal case (i.e., without faults) everything works fine.
But what if the Actor has to deal with faults. I've another Actor (SimpleActorWithFault). This actor could have faults. In this case, I'm generating one myself by throwing an exception. When a fault happens (i.e., SimpleActorWithFault throws an exception) it is automatically restarted. However, this restart messes up the scheduler inside the Actor which no longer functions as excepted. And if the faults happens rapidly enough it generates more unexpected behavior.
My question is what's the preferred way to dealing with faults in such cases? I know I can use Try blocks to handle exceptions. But what if I'm extending another actor where I cannot put a Try in the super class or some case when I'm an excepted fault happens in the actor.
import akka.actor.{Props, ActorLogging}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import akka.actor.Actor
case object MessageA
case object MessageToSelf
class SimpleActor extends Actor with ActorLogging {
//schedule a message to self every second
context.system.scheduler.schedule(0 seconds, 1 seconds, self, MessageToSelf)
//keeps track of some internal state
var count: Int = 0
def receive: Receive = {
case MessageA => {
log.info("[SimpleActor] Got MessageA at %d".format(count))
}
case MessageToSelf => {
//update state and tell the world about its current state
count = count + 1
log.info("[SimpleActor] Got scheduled message at %d".format(count))
}
}
}
class SimpleActorWithFault extends Actor with ActorLogging {
//schedule a message to self every second
context.system.scheduler.schedule(0 seconds, 1 seconds, self, MessageToSelf)
var count: Int = 0
def receive: Receive = {
case MessageA => {
log.info("[SimpleActorWithFault] Got MessageA at %d".format(count))
}
case MessageToSelf => {
count = count + 1
log.info("[SimpleActorWithFault] Got scheduled message at %d".format(count))
//at some point generate a fault
if (count > 5) {
log.info("[SimpleActorWithFault] Going to throw an exception now %d".format(count))
throw new Exception("Excepttttttiooooooon")
}
}
}
}
object MainApp extends App {
implicit val akkaSystem = akka.actor.ActorSystem()
//Run the Actor without any faults or exceptions
akkaSystem.actorOf(Props(classOf[SimpleActor]))
//comment the above line and uncomment the following to run the actor with faults
//akkaSystem.actorOf(Props(classOf[SimpleActorWithFault]))
}
The correct way is to push down the risky behavior into it's own actor. This pattern is called the Error Kernel pattern (see Akka Concurrency, Section 8.5):
This pattern describes a very common-sense approach to supervision
that differentiates actors from one another based on any volatile
state that they may hold.
In a nutshell, it means that actors whose state is precious should not
be allowed to fail or restart. Any actor that holds precious data is
protected such that any risky operations are relegated to a slave
actor who, if restarted, only causes good things to happen.
The error kernel pattern implies pushing levels of risk further down
the tree.
See also another tutorial here.
So in your case it would be something like this:
SimpleActor
|- ActorWithFault
Here SimpleActor acts as a supervisor for ActorWithFault. The default supervising strategy of any actor is to restart a child on Exception and escalate on anything else:
http://doc.akka.io/docs/akka/snapshot/scala/fault-tolerance.html
Escalating means that the actor itself may get restarted. Since you really don't want to restart SimpleActor you could make it always restart the ActorWithFault and never escalate by overriding the supervisor strategy:
class SimpleActor {
override def preStart(){
// our faulty actor --- we will supervise it from now on
context.actorOf(Props[ActorWithFault], "FaultyActor")
...
override val supervisorStrategy = OneForOneStrategy () {
case _: ActorKilledException => Escalate
case _: ActorInitializationException => Escalate
case _ => Restart // keep restarting faulty actor
}
}
To avoid messing up the scheduler:
class SimpleActor extends Actor with ActorLogging {
private var cancellable: Option[Cancellable] = None
override def preStart() = {
//schedule a message to self every second
cancellable = Option(context.system.scheduler.schedule(0 seconds, 1 seconds, self, MessageToSelf))
}
override def postStop() = {
cancellable.foreach(_.cancel())
cancellable = None
}
...
To correctly handle exceptions (akka.actor.Status.Failure is for correct answer to an ask in case of Ask pattern usage by sender):
...
def receive: Receive = {
case MessageA => {
try {
log.info("[SimpleActor] Got MessageA at %d".format(count))
} catch {
case e: Exception =>
sender ! akka.actor.Status.Failure(e)
log.error(e.getMessage, e)
}
}
...
How should I handle an exception thrown by the DbActor here ? I'm not sure how to handle it, should pipe the Failure case ?
class RestActor extends Actor with ActorLogging {
import context.dispatcher
val dbActor = context.actorOf(Props[DbActor])
implicit val timeout = Timeout(10 seconds)
override val supervisorStrategy: SupervisorStrategy = {
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 10 seconds) {
case x: Exception => ???
}
}
def receive = {
case GetRequest(reqCtx, id) => {
// perform db ask
ask(dbActor, ReadCommand(reqCtx, id)).mapTo[SomeObject] onComplete {
case Success(obj) => { // some stuff }
case Failure(err) => err match {
case x: Exception => ???
}
}
}
}
}
Would be glad to get your thought, thanks in advance !
There are a couple of questions I can see here based on the questions in your code sample:
What types of things can I do when I override the default supervisor behavior in the definition of how to handle exceptions?
When using ask, what types of things can I do when I get a Failure result on the Future that I am waiting on?
Let's start with the first question first (usually a good idea). When you override the default supervisor strategy, you gain the ability to change how certain types of unhandled exceptions in the child actor are handled in regards to what to do with that failed child actor. The key word in that previous sentence is unhandled. For actors that are doing request/response, you may actually want to handle (catch) specific exceptions and return certain response types instead (or fail the upstream future, more on that later) as opposed to letting them go unhandled. When an unhandled exception happens, you basically lose the ability to respond to the sender with a description of the issue and the sender will probably then get a TimeoutException instead as their Future will never be completed. Once you figured out what you handle explicitly, then you can consider all the rest of exceptions when defining your custom supervisor strategy. Inside this block here:
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 10 seconds) {
case x: Exception => ???
}
You get a chance to map an exception type to a failure Directive, which defines how the failure will be handled from a supervision standpoint. The options are:
Stop - Completely stop the child actor and do not send any more messages to it
Resume - Resume the failed child, not restarting it thus keeping its current internal state
Restart - Similar to resume, but in this case, the old instance is thrown away and a new instance is constructed and internal state is reset (preStart)
Escalate - Escalate up the chain to the parent of the supervisor
So let's say that given a SQLException you wanted to resume and given all others you want to restart then your code would look like this:
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 10 seconds) {
case x: SQLException => Resume
case other => Restart
}
Now for the second question which pertains to what to do when the Future itself returns a Failure response. In this case, I guess it depends on what was supposed to happen as a result of that Future. If the rest actor itself was responsible for completing the http request (let's say that httpCtx has a complete(statusCode:Int, message:String) function on it), then you could do something like this:
ask(dbActor, ReadCommand(reqCtx, id)).mapTo[SomeObject] onComplete {
case Success(obj) => reqCtx.complete(200, "All good!")
case Failure(err:TimeoutException) => reqCtx.complete(500, "Request timed out")
case Failure(ex) => reqCtx.complete(500, ex.getMessage)
}
Now if another actor upstream was responsible for completing the http request and you needed to respond to that actor, you could do something like this:
val origin = sender
ask(dbActor, ReadCommand(reqCtx, id)).mapTo[SomeObject] onComplete {
case Success(obj) => origin ! someResponseObject
case Failure(ex) => origin ! Status.Failure(ex)
}
This approach assumes that in the success block you first want to massage the result object before responding. If you don't want to do that and you want to defer the result handling to the sender then you could just do:
val origin = sender
val fut = ask(dbActor, ReadCommand(reqCtx, id))
fut pipeTo origin
For simpler systems one may want to catch and forward all of the errors. For that I made this small function to wrap the receive method, without bothering with supervision:
import akka.actor.Actor.Receive
import akka.actor.ActorContext
/**
* Meant for wrapping the receive method with try/catch.
* A failed try will result in a reply to sender with the exception.
* #example
* def receive:Receive = honestly {
* case msg => sender ! riskyCalculation(msg)
* }
* ...
* (honestActor ? "some message") onComplete {
* case e:Throwable => ...process error
* case r:_ => ...process result
* }
* #param receive
* #return Actor.Receive
*
* #author Bijou Trouvaille
*/
def honestly(receive: =>Receive)(implicit context: ActorContext):Receive = { case msg =>
try receive(msg) catch { case error:Throwable => context.sender ! error }
}
you can then place it into a package file and import a la akka.pattern.pipe and such. Obviously, this won't deal with exceptions thrown by asynchronous code.