I am trying to implement fault tolerance in the actor system of my Scala project, so that errors are identified and handled. I am using Classic actors. Each supervisor actor has 5 child actors. If one of these child actors fails, I want to restart it, log the error, and, as mentioned, handle the problem that caused the failure.
I implemented a One-For-One SupervisorStrategy in my Supervisor actor class as such:
override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 1.minute) {
    case e: ArithmeticException =>
      logger.error(s"Supervisor: $e from $sender; Restarting!")
      Restart
    case e: NullPointerException =>
      logger.error(s"Supervisor: $e from $sender; Restarting!")
      Restart
    case e: IllegalArgumentException =>
      logger.error(s"Supervisor: $e from $sender; Restarting!")
      Restart
    case _: Exception =>
      logger.error(s"Supervisor: Unknown exception from $sender; Escalating!")
      Restart
  }
The system rarely, if ever, fails on its own, so I added a throw new NullPointerException to a method that I know is called in every actor. However, the log file does not contain the messages I expect, only the following:
13:41:19.893 [ClusterSystem-akka.actor.default-dispatcher-21] ERROR akka.actor.OneForOneStrategy - null
java.lang.NullPointerException: null
I have looked at many examples, as well as the Akka fault tolerance documentation for both typed and classic actors, but I cannot get this to work.
EDIT
After doing some more research, I saw that a potential reason for the error I am getting could be caused by an infinite cycle in the child actor, whereby it throws an error, restarts, throws an error etc.
So I took a different approach: in the supervisor actor I added a variable errorSent = false, and in one of the supervisor methods that tells the child actors to start working I added the following:
if (!errorSent) {
  errorSent = true
  actor ! new NullPointerException
}
And when a child actor receives an error, they throw it. I took this approach so that this only ever happens once, to avoid the cycle. However, unfortunately this did not change the output, and I still got the error mentioned above.
Related
I have a small Scala application that's using Akka actors. In the course of doing some refactoring, I've encountered a completely unexpected situation that I've been unable to debug.
I have nodes in a graph that send messages to each other. In fact, they all send messages to a single actor who turns around and directs them back to the appropriate target, but I doubt that's especially relevant.
My "NodeActor" has a receive method:
final def receive: PartialFunction[Any, Unit] = {
  case NInit() => ...
  case NStart() => ...
  case m: Any =>
    log.error(s"Unexpected message: $m")
}
I have started to get the unexpected message error where $m is a bare tuple. It's being sent by the central monitoring actor, but I cannot for the life of me work out how. All of the messages sent by the graph monitor look like properly formatted case class instances. I've tried, unsuccessfully, to get Akka to log every sent and received message.
Eventually, I discovered aroundReceive and implemented a primitive version of that which simply logs the received message. None of the messages that pass through that method are just a bare tuple.
I am completely baffled about how to proceed. Suggestions are most welcome.
Use sender.path. I normally do
log.error(s"Unexpected message: $m from sender: ${sender.path}")
I took out the case m: Any case, let the MatchError occur, and reviewed the stack trace. I was literally calling receive() directly from elsewhere due to a cut-and-paste failure-to-think bug. Nevermind.
We have a stream connected to Kafka that needs to have different behavior based on the exception type. For example, if it has a SQLException it should use Supervision.Stop, but if it's a RetriableException it should use Supervision.Restart.
I would also like to be able to implement an exponential backoff strategy for those error types that need to be restarted, but in some testing I did it seems like using the RestartSource with a Decider causes the Decider to be ignored.
What is the best way to implement an exponential backoff strategy for streams that throw a specific error type?
To solve the first problem, you can just use the decider pattern:
val decider: Supervision.Decider = {
  case _: ArithmeticException => Supervision.Resume
  case _ => Supervision.Stop
}
implicit val materializer = ActorMaterializer(
  ActorMaterializerSettings(system).withSupervisionStrategy(decider))
I don't think the RestartSource ignores the decider... The decider is a property of the materializer, which is required for the source to start. If it does, you should report a bug to Akka Streams.
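For the backoff part of the question, one possible shape is sketched below. It is only a sketch under assumptions: `BackoffSketch`, `withBackoff` and the local `RetriableException` class are hypothetical names introduced for illustration, and the backoff values are arbitrary. It relies on `RestartSource.onFailuresWithBackoff` restarting the wrapped source only when it fails, not when it completes, and uses `recoverWithRetries` to turn the non-retriable `SQLException` into a normal completion so that no restart happens for it. Newer Akka releases may prefer an overload taking a `RestartSettings` instead of the three separate parameters.

```scala
import java.sql.SQLException
import scala.concurrent.duration._
import akka.NotUsed
import akka.stream.scaladsl.{RestartSource, Source}

object BackoffSketch {
  // Hypothetical stand-in for the exception type that should trigger a restart.
  final class RetriableException(msg: String) extends RuntimeException(msg)

  // Wraps a source factory so that failures restart it with exponential backoff,
  // except SQLException, which ends the stream instead of restarting it.
  def withBackoff[T](newSource: () => Source[T, NotUsed]): Source[T, NotUsed] =
    RestartSource.onFailuresWithBackoff(
      minBackoff = 1.second,   // assumed values, tune for the real workload
      maxBackoff = 30.seconds,
      randomFactor = 0.2
    ) { () =>
      newSource().recoverWithRetries(1, {
        // Completing (rather than failing) means RestartSource does not restart.
        case _: SQLException => Source.empty[T]
      })
    }
}
```

Note the trade-off in this sketch: the `SQLException` is swallowed and the stream completes normally, so it would need to be logged or reported inside the `recoverWithRetries` branch if downstream code must know about it.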
Hi fellow coders and admired gurus,
I have an actor that implements FSM that is required to throw an IOException on certain messages while in a specific state (Busy) to be restarted by its Supervisor.
excerpt:
case class ExceptionResponse(errorCode: Int)

when(Busy) {
  case ExceptionResponse(errorCode) =>
    throw new IOException(s"Request failed with error code $errorCode")
}
I am trying to test that behavior by using a TestActorRef and calling receive directly on that expecting receive to throw an IOException.
case class WhenInStateBusy() extends TestKit(ActorSystem()) with After {
  val myTestFSMRef = TestFSMRef(MyFSM.props)
  ...
  def prepare: Result = {
    // prepares tested actor by going through an initialization sequence
    // including 'expectMsgPfs' for several messages sent from the tested FSM
    // most of my test cases depend on the correctness of that initialization sequence
    // finishing with state busy
    myTestFSMRef.setState(Busy)
    awaitCond(
      myTestFSMRef.stateName == Busy,
      maxDelay,
      interval,
      s"Actor must be in State 'Busy' to proceed, but is ${myTestFSMRef.stateName}"
    )
    success
  }

  def testCase = this {
    prepare and {
      myTestFSMRef.receive(ExceptionResponse(testedCode)) must throwAn[IOException]
    }
  }
}
Note: The initialization sequence makes sure the tested FSM is fully initialized and has set up its internal mutable state. State Busy can only be left when the actor receives a certain kind of message, which in my test setup has to be provided by the test case, so I am pretty sure the FSM is in the right state.
Now, on my Jenkins server (Ubuntu 14.10) this test case fails in about 1 out of 20 attempts (i.e. no exception is thrown). However, on my development machine (Mac OS X 10.10.4) I am not able to reproduce the bug, so a debugger does not help me.
The tests are run sequentially and after each example the test system is shut down.
Java version 1.7.0_71
Scala version 2.11.4
Akka version 2.3.6
Specs2 version 2.3.13
Can anyone explain why calling myTestFSMRef.receive(ExceptionResponse(testedCode)) sometimes does not result in an exception?
This is a tricky question indeed: my prime suspect is that the Actor is not yet initialized. Why is this? When implementing system.actorOf (which is used by TestFSMRef.apply()) it became clear that there can only be one entity that is responsible for actually starting an Actor, and that is its parent. I tried many different things and all of them were flawed in some way.
But how does that make this test fail?
The basic answer is that with the code you show it is not guaranteed that at the time you execute setState the FSM has already been initialized. Especially on (low-powered) Jenkins boxes it may be that the guardian actor does not get scheduled to run for a measurable amount of time. If that is the case then the startWith statement in your FSM will override the setState because it runs afterwards.
The solution to this would be to send another message to the FSM and expect back the proper response before calling setState.
Does an exception generated by a function within the child actor have to be caught and rethrown by the child actor explicitly, or will the supervisor strategy directives (e.g. Escalate) take care of propagating the exception to the supervisor actor behind the scenes?
My supervisor strategy:
override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 5.minutes) {
    case _: ArithmeticException ⇒ Resume
    case _: NullPointerException ⇒ Restart
    case _: IllegalArgumentException ⇒ Stop
    case _: IOException ⇒ Stop
    case _: Exception ⇒ Restart
  }
Some of the operations within the child actor could throw an IOException. Should I put a try/catch block in the child actor to catch it and then rethrow it so that the supervisor can handle it? Or will Akka take care of it behind the scenes?
If you have nothing to lose, let it crash. That should be the default behaviour. Unfortunately, sometimes you need to allocate resources, or you need information about what happened when things went wrong.
Personally, when I am creating endpoints or web applications, I usually catch every exception that I want to report on, so that I can say what is going on.
Most of the time it's best to "let it crash" and choose the correct supervisor strategy for your situation. If you want to keep the internal state, just choose resume. Cleaning up resources before restarting can be done in the preRestart method.
Akka has sophisticated ways of handling exceptions, why not use them? There is no reason to catch an exception and then throw it again.
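A minimal sketch of the "let it crash" approach described above, under assumptions: `Worker` and `Supervisor` are hypothetical classes, the `"read"` message and the exception text are made up for illustration, and the strategy is a shortened variant of the one in the question. The child has no try/catch at all; the `IOException` thrown in `receive` reaches the parent's `supervisorStrategy` directly, and `preRestart` is the hook for cleanup.

```scala
import java.io.IOException
import scala.concurrent.duration._
import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}

// Hypothetical child: it throws without any try/catch of its own.
class Worker extends Actor {
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    // Release resources held by the failing instance here,
    // then keep the default behaviour (stop children, call postStop).
    super.preRestart(reason, message)
  }

  def receive: Receive = {
    case "read" => throw new IOException("disk unavailable") // handled by the parent's strategy
    case other  => sender() ! other                          // echo anything else
  }
}

// Hypothetical parent: decides per exception type what happens to the failed child.
class Supervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 5.minutes) {
      case _: IOException => Stop
      case _: Exception   => Restart
    }

  private val worker: ActorRef = context.actorOf(Props(new Worker), "worker")

  def receive: Receive = {
    case msg => worker.forward(msg)
  }
}
```

Sending `"read"` to the supervisor in this sketch stops the worker, because `IOException` maps to `Stop`; any other exception thrown by the worker would restart it and go through `preRestart` first.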
I am developing a server which is an intermediate server between 2 other endpoints.
So I implement a spray route which triggers a computation, and this computation invokes an actor which invokes a spray client.
The following pattern applies:
Client => ...=> (Spray Route => Service Actor => Spray Client) =>... => distant Server.
I am developing the exception management and I have a hard time understanding what to do.
When the distant server sends me BadRequest, or any other error code, I want to throw an exception that results in the same kind of error being returned to the client.
I guess my question is a general one about exception handling and escalation.
I naively believed that when I throw an exception in a future, the future could call the case Failure:
def receive = {
  case SendRequest => {
    val s = sender()
    val call = for {
      request <- ComputeRequest
      result <- CallSprayClient ? request
    } yield result
    call onComplete {
      case Success(succ) => s ! succ
      case Failure(e) => throw e
    }
  }
}
The thing is, when ComputeRequest or CallSprayClient throw an exception, the Failure case of my callback is not triggered.
I looked at the supervision pattern, but it seems that the exception or message causing the error is not propagated either.
In my particular case, depending on the exception, I would like to send a different http response to my client, hence the need to escalate.
What development pattern should I apply?
Thanks.
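One pattern that may fit here, sketched with plain Scala futures and no Akka or spray types: instead of rethrowing inside `onComplete` (where the exception is raised on the callback's thread, outside the actor's message processing), convert every failure into a reply value with `recover` and send that value back. Everything in the sketch is hypothetical and only stands in for the code in the question: `Reply`, `Ok`, `Error`, `UpstreamException`, `buildRequest` and `callUpstream` are illustrative names, and the status codes are examples.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FutureFailureSketch {
  // Hypothetical reply types standing in for the HTTP responses sent to the client.
  sealed trait Reply
  final case class Ok(body: String) extends Reply
  final case class Error(status: Int, reason: String) extends Reply

  // Hypothetical exception carrying the status code returned by the distant server.
  final case class UpstreamException(status: Int, reason: String)
      extends RuntimeException(reason)

  // Stand-in for request computation: may throw synchronously (NumberFormatException).
  def buildRequest(raw: String): Int = raw.toInt

  // Stand-in for the client call: fails the Future for non-positive ids.
  def callUpstream(id: Int): Future[String] =
    if (id > 0) Future.successful(s"payload-$id")
    else Future.failed(UpstreamException(400, "bad request"))

  // Turns every outcome, success or failure, into a value that can be sent as a message.
  def handle(raw: String): Future[Reply] =
    Future(buildRequest(raw))          // Future.apply captures synchronous exceptions
      .flatMap(callUpstream)
      .map(body => Ok(body): Reply)
      .recover {
        case UpstreamException(status, reason) => Error(status, reason)
        case NonFatal(e)                       => Error(500, String.valueOf(e.getMessage))
      }
}
```

In this sketch `FutureFailureSketch.handle("0")` completes with `Error(400, "bad request")` rather than a failed future, so the route can pattern match on the reply and choose the HTTP status. Inside an actor, the resulting future would typically be sent to the captured sender, for example with `pipeTo` from `akka.pattern`.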