Retry / replay of failed messages in AKKA - scala

I'm using AKKA.NET in my current .NET project.
My question is this: How are experienced AKKA-developers implementing the replay-message-on-failure pattern using the latest AKKA libraries for either Java or .NET?
Here are some more details.
I want to ensure that a failed message (i.e. a message received by an actor leading to an exception) is replayed / retried a number of times with a time interval between each. Normally the actor is restarted but the failed message is thrown away.
I have written my own small helper method like this to solve it:
public void WithRetries(int noSeconds, IUntypedActorContext context, IActorRef receiver, IActorRef sender, Object message, Action action)
{
    try
    {
        action();
    }
    catch (Exception)
    {
        // Re-send the same message to the receiver after the delay, then rethrow so supervision still applies.
        context.System.Scheduler.ScheduleTellOnce(TimeSpan.FromSeconds(noSeconds), receiver, message, sender);
        throw;
    }
}
Now my actors typically look like this:
Receive<SomeMessage>(msg =>
{
    ActorHelper.Instance.WithRetries(-1, Context, Self, Sender, msg, () => {
        // ...here comes the actual message processing
    });
});
I like the above solution because it is straightforward. However, I don't like that it adds yet another layer of indirection in my code, and the code gets a bit more messy if I use this helper method in many places. Furthermore it has some limitations. First of all, the number of retries is not governed by the helper method. It is governed by the supervision strategy of the supervisor, which I believe is messy. Furthermore, the time interval is fixed whereas I would in some cases like a time interval that increases for each retry.
I would prefer something that can be configured using HOCON, or something that can be applied as a cross-cutting concern.
I can see various suggestions for AKKA for Scala, AKKA for Java, and AKKA.NET. I have seen examples with routers, examples with Circuit Breaker (e.g. http://getakka.net/docs/CircuitBreaker#examples) and so forth. I have also seen some examples using the same idea as above. But I have a feeling that it should be even simpler. Perhaps it involves some usage of AKKA Persistence and events.
So to repeat my question: How are experienced AKKA-developers implementing the replay-message-on-failure pattern using the latest AKKA libraries for either Java or .NET?

I looked into this sometime last year. I'm away from my dev machine so I cannot check, so this is all coming from memory:
I seem to remember the solution to this was a combination of stashing and supervision strategies and lifecycle hooks :)
I think you can wrap your child actor code in a try-catch, then in the case of error, stash the message and re-throw the exception so it is handled by the supervisor and all the usual supervision strategies come into play. I think you would resume rather than restart. Then in the appropriate lifecycle message (onresume?!) unstash messages which should mean the failed message is processed again.
Now this isn't all that different from what you've already posted above, so hopefully someone has a better solution :)

This may be late, but another solution is to pass the command (or its essential params) to the actor constructor, have the actor send the command to itself when created, and use the Restart directive.
// Scala code
class ResilientActor(cmd: Command) extends Actor {
  def receive = {
    ...
  }
  // Runs on every (re)start, so a Restart replays the command.
  self ! cmd
}
...
override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 3) {
  case _: SomeRetryableException => Restart
  case t => super.supervisorStrategy.decider.applyOrElse(t, (_: Any) => Escalate)
}
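Neither answer covers the question's wish for a retry budget owned by the retry logic itself and a delay that grows with each attempt. One way this might be sketched, independent of Akka or AKKA.NET, is to carry the attempt number in an envelope around the message and derive the delay from it. The names Attempt and Backoff and all parameters below are hypothetical and not part of any Akka API; the computed delay would be handed to whatever scheduler call re-sends the message.

```scala
import scala.concurrent.duration._

// Hypothetical envelope: the original message plus how many times it has been tried.
final case class Attempt[A](payload: A, attempt: Int = 0)

object Backoff {
  // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
  def delay(attempt: Int, base: FiniteDuration, max: FiniteDuration): FiniteDuration = {
    val candidate = base.toMillis * math.pow(2.0, attempt.toDouble)
    if (candidate >= max.toMillis) max else candidate.toLong.millis
  }

  // Some((next envelope, delay)) while the budget lasts, None once it is exhausted.
  def next[A](a: Attempt[A], maxRetries: Int, base: FiniteDuration,
              max: FiniteDuration): Option[(Attempt[A], FiniteDuration)] =
    if (a.attempt >= maxRetries) None
    else Some((a.copy(attempt = a.attempt + 1), delay(a.attempt, base, max)))
}
```

With a 1-second base and a 30-second cap, attempts 0 to 3 wait 1, 2, 4 and 8 seconds, and later attempts wait 30 seconds until maxRetries is reached.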

Related

Is sending futures in Akka messages OK?

I'm working on implementing a small language to send tasks to execution and control execution flow. After sending a task to my system, the user gets a future (on which they can call a blocking get() or flatMap()). My question is: is it OK to send futures in Akka messages?
Example: actor A sends a message Response to actor B and Response contains a future among its fields. Then at some point A will fulfill the promise from which the future was created. After receiving the Response, B can call flatMap() or get() at any time.
I'm asking because Akka messages should be immutable and work even if actors are on different JVMs. I don't see how my example above can work if actors A and B are on different JVMs. Also, are there any problems with my example even if actors are on same JVM?
Something similar is done in the accepted answer in this stackoverflow question. Will this work if actors are on different JVMs?
Without remoting it's possible, but still not advisable. With remoting in play it won't work at all.
If your goal is to have an API that returns Futures, but uses actors as the plumbing underneath, one approach could be that the API creates its own actor internally that it asks, and then returns the future from that ask to the caller. The actor spawned by the API call is guaranteed to be local to the API instance and can communicate with the rest of the actor system via the regular tell/receive mechanism, so that there are no Futures sent as messages.
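import akka.actor.{ActorRefFactory, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future

// ask (?) needs an implicit Timeout in scope.
class MyTaskAPI(actorFactory: ActorRefFactory)(implicit timeout: Timeout) {
  def doSomething(...): Future[SomethingResult] = {
    val taskActor = actorFactory.actorOf(Props[MyTaskActor])
    // Parenthesize the ask so that mapTo applies to the returned Future, not to the message.
    (taskActor ? DoSomething(...)).mapTo[SomethingResult]
  }
}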
where MyTaskActor receives the DoSomething, captures the sender, sends out the request for task processing, and likely switches to a state that waits for the SomethingResult, which it finally passes to the captured sender before stopping itself. This approach creates two actors per request, one explicitly (the MyTaskActor) and one implicitly (the handler of the ask), but keeps all state inside actors.
Alternately, you could use the ActorDSL to create just one actor inline of doSomething and use a captured Promise for completion instead of using ask:
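import akka.actor.ActorSystem
import akka.actor.ActorDSL._
import scala.concurrent.{Future, Promise}

// The ActorDSL `actor` factory needs an implicit ActorRefFactory, here the system.
class MyTaskAPI(implicit system: ActorSystem) {
  def doSomething(...): Future[SomethingResult] = {
    val p = Promise[SomethingResult]()
    val tmpActor = actor(new Act {
      become {
        case msg: SomethingResult =>
          p.success(msg)
          context.stop(self)
      }
    })
    system.actorSelection("user/TaskHandler").tell(DoSomething(...), tmpActor)
    p.future
  }
}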
This approach is a bit off the top of my head and it does use a shared value between the API and the temp actor, which some might consider a smell, but should give an idea how to implement your workflow.
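The captured-Promise idea can be tried out without any actors. The following is only a sketch using the Scala standard library: submit is a hypothetical callback-based backend standing in for the actor plumbing, and PromiseBridge is an invented name. It shows the shape of the pattern: the Promise stays inside the API, the backend only ever sees a plain callback, and the caller only ever sees the Future.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object PromiseBridge {
  // Hypothetical backend: delivers its result through a callback, the way the
  // temporary actor above receives a SomethingResult message.
  def submit(input: Int)(onResult: Int => Unit): Unit = onResult(input * 2)

  // The API hands out only the Future; the Promise never leaves this method.
  def doSomething(input: Int): Future[Int] = {
    val p = Promise[Int]()
    // trySuccess ignores a second completion instead of throwing.
    submit(input) { r => p.trySuccess(r); () }
    p.future
  }
}

// Usage: blocks only for the demonstration.
val answer = Await.result(PromiseBridge.doSomething(21), 1.second)
```

Using trySuccess rather than success is a small design choice: if the backend ever delivered two results, the second would be dropped rather than raising an IllegalStateException inside the callback.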
If you're asking if it's possible, then yes, it's possible. Remote actors are basically interprocess communication. If you set everything up on both machines to a state where both can properly handle the future, then it should be good. You don't give any working example so I can't really delve deeper into it.

Why does `TestFSMRef.receive must throw[Exception]` fail intermittently

Hi fellow coders and admired gurus,
I have an actor that implements FSM that is required to throw an IOException on certain messages while in a specific state (Busy) to be restarted by its Supervisor.
excerpt:
case class ExceptionResponse(errorCode: Int)
when(Busy) {
  case ExceptionResponse(errorCode) =>
    throw new IOException(s"Request failed with error code $errorCode")
}
I am trying to test that behavior by using a TestActorRef and calling receive directly on that expecting receive to throw an IOException.
case class WhenInStateBusy() extends TestKit(ActorSystem()) with After {
  val myTestFSMRef = TestFSMRef(MyFSM.props)
  ...
  def prepare: Result = {
    // prepares tested actor by going through an initialization sequence
    // including 'expectMsgPfs' for several messages sent from the tested FSM
    // most of my test cases depend on the correctness of that initialization sequence
    // finishing with state busy
    myTestFSMRef.setState(Busy)
    awaitCond(
      myTestFSMRef.stateName == Busy,
      maxDelay,
      interval,
      s"Actor must be in State 'Busy' to proceed, but is ${myTestFSMRef.stateName}"
    )
    success
  }
  def testCase = this {
    prepare and {
      myTestFSMRef.receive(ExceptionResponse(testedCode)) must throwAn[IOException]
    }
  }
}
Note: The initialization sequence makes sure, the tested FSM is fully initialized and has setup its internal mutable state. State Busy can only be left when the actor receives a certain kind of message that in my test setup has to be provided by the test case, so I am pretty sure the FSM is in the right state.
Now, on my Jenkins server (Ubuntu 14.10) this test case fails in about 1 out of 20 attempts (i.e. no exception is thrown). However, on my development machine (Mac OS X 10.10.4) I am not able to reproduce the bug, so a debugger does not help me.
The tests are run sequentially and after each example the test system is shut down.
Java version 1.7.0_71
Scala version 2.11.4
Akka version 2.3.6
Specs2 version 2.3.13
Can anyone explain why calling myTestFSMRef.receive(ExceptionResponse(testedCode)) sometimes does not result in an exception?
This is a tricky question indeed: my prime suspect is that the Actor is not yet initialized. Why is this? When implementing system.actorOf (which is used by TestFSMRef.apply()) it became clear that there can only be one entity that is responsible for actually starting an Actor, and that is its parent. I tried many different things and all of them were flawed in some way.
But how does that make this test fail?
The basic answer is that with the code you show it is not guaranteed that at the time you execute setState the FSM has already been initialized. Especially on (low-powered) Jenkins boxes it may be that the guardian actor does not get scheduled to run for a measurable amount of time. If that is the case then the startWith statement in your FSM will override the setState because it runs afterwards.
The solution to this would be to send another message to the FSM and expect back the proper response before calling setState.

Scala how to use akka actors to handle a timing out operation efficiently

I am currently evaluating JavaScript scripts using Rhino in a RESTful service. I want evaluations to be subject to a timeout.
I have created a mock example actor (using scala 2.10 akka actors).
case class Evaluate(expression: String)
class RhinoActor extends Actor {
  override def preStart() = { println("Start context'"); super.preStart() }
  def receive = {
    case Evaluate(expression) ⇒ {
      Thread.sleep(100)
      sender ! "complete"
    }
  }
  override def postStop() = { println("Stop context'"); super.postStop() }
}
Now I use this actor as follows:
def run {
  val t = System.currentTimeMillis()
  val system = ActorSystem("MySystem")
  val actor = system.actorOf(Props[RhinoActor])
  implicit val timeout = Timeout(50 milliseconds)
  val future = (actor ? Evaluate("10 + 50")).mapTo[String]
  val result = Try(Await.result(future, Duration.Inf))
  println(System.currentTimeMillis() - t)
  println(result)
  actor ! PoisonPill
  system.shutdown()
}
Is it wise to use the ActorSystem in a closure like this which may have simultaneous requests on it?
Should I make the ActorSystem global, and will that be ok in this context?
Is there a more appropriate alternative approach?
EDIT: I think I need to use futures directly, but I will need the preStart and postStop. Currently investigating.
EDIT: Seems you don't get those hooks with futures.
I'll try and answer some of your questions for you.
First, an ActorSystem is a very heavyweight construct. You should not create one per request that needs an actor. You should create one globally and then use that single instance to spawn your actors (and you won't need system.shutdown() anymore in run). I believe this covers your first two questions.
Your approach of using an actor to execute JavaScript here seems sound to me. But instead of spinning up an actor per request, you might want to pool a bunch of the RhinoActors behind a Router, with each instance having its own Rhino engine that is set up during preStart. Doing this will eliminate per-request Rhino initialization costs, speeding up your JS evaluations. Just make sure you size your pool appropriately. Also, you won't need to send PoisonPill messages per request if you adopt this approach.
You also might want to look into the non-blocking callbacks onComplete, onSuccess and onFailure as opposed to using the blocking Await. These callbacks also respect timeouts and are preferable to blocking for higher throughput. As long as whatever is way way upstream waiting for this response can handle the asynchronicity (i.e. an async capable web request), then I suggest going this route.
The last thing to keep in mind is that even though code will return to the caller after the timeout if the actor has yet to respond, the actor still goes on processing that message (performing the evaluation). It does not stop and move onto the next message just because a caller timed out. Just wanted to make that clear in case it wasn't.
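That last point can be observed with plain Scala futures, without Akka. The sketch below is only an illustration under assumptions: TimeoutDemo and slowEvaluation are invented names, and the 200 ms sleep stands in for a script evaluation. The caller gives up after 50 ms, yet the work runs to completion and its result is still available afterwards.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object TimeoutDemo {
  // Set by the worker once it has finished, regardless of who is still waiting.
  val finished = new AtomicBoolean(false)

  // Stand-in for a slow evaluation (hypothetical, not Rhino).
  def slowEvaluation(): Future[String] = Future {
    Thread.sleep(200)
    finished.set(true)
    "complete"
  }

  def run(): (Try[String], String) = {
    val f = slowEvaluation()
    // The caller stops waiting after 50 ms: Await throws a TimeoutException.
    val early = Try(Await.result(f, 50.millis))
    // The evaluation was never cancelled, so waiting longer yields its result.
    val late = Await.result(f, 2.seconds)
    (early, late)
  }
}
```

The same holds for an actor behind an ask: the timeout only fails the caller's future, it does not interrupt the message handler.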
EDIT
In response to your comment about stopping a long execution, there are some things related to Akka to consider first. You can stop the actor, or send it a Kill or a PoisonPill, but none of these will stop it from processing the message that it is currently processing. They just prevent it from receiving new messages. In your case, with Rhino, if infinite script execution is a possibility, then I suggest handling this within Rhino itself. I would dig into the answers on this post (Stopping the Rhino Engine in middle of execution) and set up your Rhino engine in the actor in such a way that it will stop itself if it has been executing for too long. That failure will kick out to the supervisor (if pooled) and cause that pooled instance to be restarted, which will init a new Rhino in preStart. This might be the best approach for dealing with the possibility of long-running scripts.

How to safely use reply and !? on a Scala Actor

Depending on a reply from a Scala Actor seems incredibly error-prone to me. Is this truly the idiomatic Scala way to have conversations between actors? Is there an alternative, or a safer use of reply that I'm missing?
(About me: I'm familiar with synchronization in Java, but I've never designed an actor-based system before and don't yet have a full understanding of the paradigm.)
Example mistakes
For a trivial demonstration, let's look at this silly integer-parsing Actor:
import actors._, Actor._
val a = actor {
  loop {
    react {
      case s: String => reply(s.toInt)
    }
  }
}
We could intend to use this as
scala> a !? "42"
res0: Any = 42
But if the actor fails to reply (in this case because a careless programmer did not think to catch NumberFormatException in the actor), we'll be waiting forever:
scala> a !? "f"
We could also make a mistake at the call site. This next example also blocks indefinitely, because the actor does not reply to Int messages:
scala> a !? 42
Timeout
You could use !? (msec: Long, msg: Any) if the expected reply has some known reasonable time bound, but that is not the case in most circumstances I can think of.
Guaranteeing reply
One thought would be to design that actor such that it necessarily replies to every message:
import actors._, Actor._
val a = actor {
  loop {
    react {
      // Exactly one reply per message: the parsed value, None, or the exception.
      case m => reply {
        try {
          m match {
            case s: String => s.toInt
            case _ => None
          }
        } catch {
          case e: Exception => e
        }
      }
    }
  }
}
This feels better, although there is still a little fear of accidentally invoking !? on an actor that is no longer acting.
I can see your concerns, but I would actually argue that this is not any worse than the synchronization you are used to. Who guarantees that the locks will ever be released again?
Using !? is at your own risk, so no, there are no 'safer' uses that I am aware of. Threads can block or die and there is absolutely nothing we can do about it, except provide safety valves that soften the blow.
Event-based acting actually gives you alternatives to receiving replies synchronously. The timeout is one of them; another is futures, via the !! method. They are designed to handle deadlocks such as that. The method immediately returns a future that can be handled later.
For inspiration and more in-depth design decisions see:
Actors:
http://docs.scala-lang.org/overviews/core/actors.html
Futures (in scala 2.10):
http://docs.scala-lang.org/sips/pending/futures-promises.html
Don't bother with the old local actors; learn Akka. Also, it's good that you know about synchronized, but personally I almost never use that keyword, even in Java code. Imagine synchronized is deprecated; learn the Java memory model, learn CAS.
I am not familiar with the Actor system in the Scala standard library myself, but I highly recommend checking out the Akka toolkit (http://akka.io/) which has "replaced" the Scala Actors and comes with the Scala distribution as of Scala 2.10.
In terms of Actor system design in general, some of the key ideas are asynchronous (non-blocking) processing, isolated mutability, and communication via message passing. Each Actor encapsulates its own state; nobody else is allowed to touch it. You can send an Actor a message that may "ask" it to change state, but the Actor implementation is free to ignore it. Messages are sent asynchronously (you CAN make it blocking, but that is not recommended). If you want to have some sort of "response" (so that you can associate a message with a previously sent message), the Future API in Scala 2.10 and ask of Akka can help.
Regarding your number format exception and the problem in general, consider looking at the ask and Future API in Scala 2.10 and Akka 2.1. It will handle exceptions and is non-blocking.
Scala 2.10 also has a new Try that is intended as an alternative to the old-fashioned try-catch clauses. Try has an apply method that you would use like any try block (minus the catch and finally). Try has two subclasses, Success and Failure: an instance of Try[T] is either a Success[T] holding the value or a Failure[T] holding the Throwable. It is easier to explain by example:
scala> val x: Try[Int] = Try { "5".toInt } // Success[Int] with encapsulated value 5
scala> val y: Try[Int] = Try { "foo".toInt } // Failure(java.lang.NumberFormatException: For input string: "foo")
Since Try does not throw the actual exception and the subclasses are conveniently case-classes, you could easily use the result as a message to an Actor.
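As a small sketch of that last point, the receiving side can pattern match on the Try it was sent instead of waiting on a reply that never arrives. ParseReply, parse and describe are hypothetical names used only for illustration; nothing here depends on an actor library.

```scala
import scala.util.{Failure, Success, Try}

object ParseReply {
  // What the parsing side would send back: always a value, never a thrown exception.
  def parse(s: String): Try[Int] = Try(s.toInt)

  // What the asking side could do with the reply.
  def describe(reply: Try[Int]): String = reply match {
    case Success(n)                        => s"parsed $n"
    case Failure(e: NumberFormatException) => s"not a number: ${e.getMessage}"
    case Failure(e)                        => s"unexpected failure: $e"
  }
}
```

Because every input produces either a Success or a Failure, the caller gets an answer for "f" as well as for "42".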

Easiest way to do idle processing in a Scala Actor?

I have a scala actor that does some work whenever a client requests it. When, and only when no client is active, I would like the Actor to do some background processing.
What is the easiest way to do this? I can think of two approaches:
Spawn a new thread that times out and wakes up the actor periodically. A straight forward approach, but I would like to avoid creating another thread (to avoid the extra code, complexity and overhead).
The Actor class has a reactWithin method, which could be used to time out from the actor itself. But the documentation says the method doesn't return. So, I am not sure how to use it.
Edit; a clarification:
Assume that the background task can be broken down into smaller units that can be independently processed.
OK, I see I need to put in my 2 cents. Judging from the author's own answer, I guess the "priority receive" technique is exactly what is needed here. A discussion can be found in "Erlang: priority receive question" here at SO. The idea is to accept high-priority messages first and to accept other messages only in the absence of high-priority ones.
As Scala actors are very similar to Erlang, a trivial code to implement this would look like this:
def act = loop {
  reactWithin(0) {
    case msg: HighPriorityMessage => // process msg
    case TIMEOUT =>
      react {
        case msg: HighPriorityMessage => // process msg
        case msg: LowPriorityMessage => // process msg
      }
  }
}
This works as follows. An actor has a mailbox (queue) of messages. The argument to react (or reactWithin) is a partial function, and the Actor library looks in the mailbox for a message to which this partial function can be applied. In our case that would be an object of HighPriorityMessage only. So, if the Actor library finds such a message, it applies our partial function and we process a high-priority message. Otherwise, reactWithin with timeout 0 calls our partial function with the argument TIMEOUT, and we immediately try to process any message from the queue (since this react waits for a message, we cannot exclude the possibility of getting a HighPriorityMessage).
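The selection rule described above can be modelled without the actor library. This is only a sketch of the ordering, assuming all messages are already in the mailbox; a real actor would also see messages arriving while it works. Msg, High, Low and PriorityReceive are hypothetical names.

```scala
import scala.annotation.tailrec

sealed trait Msg
final case class High(id: Int) extends Msg
final case class Low(id: Int) extends Msg

object PriorityReceive {
  // Take the oldest High message if there is one, otherwise the oldest message of any kind.
  def next(mailbox: Vector[Msg]): Option[(Msg, Vector[Msg])] =
    if (mailbox.isEmpty) None
    else {
      val firstHigh = mailbox.indexWhere(_.isInstanceOf[High])
      val idx = if (firstHigh >= 0) firstHigh else 0
      Some((mailbox(idx), mailbox.patch(idx, Nil, 1)))
    }

  // Order in which a fixed mailbox would be processed under that rule.
  def order(mailbox: Vector[Msg]): Vector[Msg] = {
    @tailrec
    def loop(rest: Vector[Msg], acc: Vector[Msg]): Vector[Msg] = next(rest) match {
      case Some((m, remaining)) => loop(remaining, acc :+ m)
      case None                 => acc
    }
    loop(mailbox, Vector.empty)
  }
}
```

For the mailbox Low(1), High(2), Low(3), High(4) the processing order is High(2), High(4), Low(1), Low(3).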
It sounds like the problem you describe is not well suited to the actor sub-system. An Actor is designed to sequentially process its message queue:
What should happen if the actor is performing the background work and a new task arrives?
An actor can only find out about this if it is continuously checking its mailbox as it performs the background task. How would you implement this (i.e. how would you code the background tasks as a unit of work so that the actor could keep interrupting and checking the mailbox)?
What should happen if the actor has many background tasks in its mailbox in front of the main task?
Do these background tasks get thrown away, or sent to another actor? If the latter, how can you prevent CPU time being given to that actor to perform the tasks?
All in all, it sounds much more like you need to explore some grid-style software that can run in the background (like Data Synapse)!
Just after asking this question I tried out some completely whacky code and it seems to work fine. I am not sure though if there is a gotcha in it.
import scala.actors._
object Idling
object Processor extends Actor {
  start
  import Actor._
  def act() = {
    loop {
      // here lie dragons >>>>>
      if (mailboxSize == 0) this ! Idling
      // <<<<<<
      react {
        case msg: NormalMsg => {
          // do the normal work
          reply(answer)
        }
        case Idling => {
          // do the idle work in chunks
        }
        case msg => println("Rcvd unknown message:" + msg)
      }
    }
  }
}
Explanation
Any code inside the argument of loop but before the call to react seems to get called when the Actor is about to wait for a message. I am sending an Idling message to self here. In the handler for this message I ensure that the mailbox size is 0 before doing the processing.