How to discover that a Scala remote actor has died?

In Scala, an actor can be notified when another (remote) actor terminates by setting the trapExit flag and invoking the link() method with the second actor as a parameter. In this case, when the remote actor ends its job by calling exit(), the first one is notified by receiving an Exit message.
But what happens when the remote actor terminates in a less graceful way (e.g. the VM where it is running crashes)? In other words, how can the local actor discover that the remote one is no longer available? Of course I would prefer (if possible) that the local actor could be notified by a message similar to the Exit one, but that seems not feasible. Am I missing something? Should I continuously poll the state of the remote actor (in which case I don't know the best way to do that), or is there a smarter solution?

But what happens when the remote actor terminates in a less graceful way (e.g. the VM where it is running crashes)?
The actor proxy stays alive, accepting messages (and losing them) and waiting for you to restart the JVM with the remote actor. Watching for JVM crashes (and other failures happening at the infrastructure level) is far beyond Scala's responsibilities. A good choice for that could be monitoring through JMX.
In other words, how can the local actor discover that the remote one is no longer available?
You may define a timeout interval (say 5000 millis). If the remote actor doesn't reply within this interval, that's a sign that something unexpected is happening to it, and you may either ask it about its state or just treat it as dead.
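A minimal sketch of that timeout idea (not tested against a real remote setup): it assumes the remote actor answers a Ping message with a Pong, where both message objects and the 5000 ms interval are made up here. The !? variant that takes a timeout returns None when no reply arrives in time:

import scala.actors.AbstractActor

case object Ping
case object Pong

// `remote` would be the proxy returned by RemoteActor.select
def isAlive(remote: AbstractActor): Boolean =
  remote !? (5000, Ping) match {
    case Some(Pong) => true
    case _          => false // no reply within 5000 ms: treat it as dead
  }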
Should I continuously poll the state of the remote actor (in which case I don't know the best way to do that), or is there a smarter solution?
You may put a kind of polling load balancer/dispatcher in front of a group of actors that uses only those actors that are alive and ready to process messages (which makes sense for remote actors that may suddenly appear/disappear behind the proxy) -> Can Scala actors process multiple messages simultaneously?

The book Actors in Scala mentions (not tested personally):
Trapping termination notifications.
In some cases, it is useful to receive termination notifications as messages in the mailbox of a monitoring actor.
For example, a monitoring actor may want to rethrow an exception that is not handled by some linked actor.
Or, a monitoring actor may want to react to normal termination, which is not possible by default.
Actors can be configured to receive all termination notifications as normal messages in their mailbox using the Boolean trapExit flag. In the following example actor b links itself to actor a:
val a = actor { ... }
val b = actor {
  self.trapExit = true
  link(a)
  ...
}
Note that before actor b invokes link it sets its trapExit member to true;
this means that whenever a linked actor terminates (normally or abnormally), b receives a message of type Exit.
Therefore, actor b is going to be notified whenever actor a terminates (assuming that actor a did not terminate before b’s invocation of link).
So "what happens when the remote actor terminates in a less graceful way"?
It should receive an Exit message even in the case of an abnormal termination.
val b = actor {
  self.trapExit = true
  link(a)
  a ! 'start
  react {
    case Exit(from, reason) if from == a =>
      println("Actor 'a' terminated because of " + reason)
  }
}

Related

When to use Ask pattern in Akka

I've started to learn Akka, and in many official examples I see that request-response is implemented using the tell pattern, i.e. after a worker has done its work, it sends the result back to the sender as a new message. For example, this official Pi approximation tutorial shows how to design an application where a Master sends some work to workers and then awaits the results as another message.
Master code:
def receive = {
  case Calculate ⇒
    for (i ← 0 until nrOfMessages) workerRouter ! Work(i * nrOfElements, nrOfElements)
  case Result(value) ⇒
    pi += value
    nrOfResults += 1
    if (nrOfResults == nrOfMessages) {
      // Send the result to the listener
      listener ! PiApproximation(pi, duration = (System.currentTimeMillis - start).millis)
      // Stops this actor and all its supervised children
      context.stop(self)
    }
}
Worker code:
def receive = {
  case Work(start, nrOfElements) ⇒
    sender ! Result(calculatePiFor(start, nrOfElements)) // perform the work
}
But I'm wondering why this example didn't use the ask pattern. What is wrong with using the ask pattern here?
If it's OK to use the ask pattern here, then I have another question: how can I stop all worker actors after the work is done?
Should my worker actors send a PoisonPill message to themselves?
Or should the Master actor send Broadcast(PoisonPill)?
Or is there another, more elegant way?
ask is useful when integrating actors with Future APIs, but it does introduce some overhead. Further, Future completions do not come through an actor's mailbox and are scheduled on a thread separate from the one used by the mailbox, meaning that receiving a future completion inside an actor introduces the need for multi-thread coordination.
Meanwhile, if you use tell to send to a worker and have it tell the sender in response, the communication will always flow properly through the mailbox channel.
Additionally, it is much easier to discern the inputs an actor deals with if they all come in via receive rather than some coming in via receive and others as completions of ask messages.
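For contrast, here is a hedged sketch of what the ask variant of the tutorial's Master might look like; the 5-second timeout is an assumption, and pipeTo is used to route the completion back through the mailbox:

// inside the Master actor
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._
import context.dispatcher // execution context for the Future transformations

implicit val timeout: Timeout = Timeout(5.seconds)

// ask returns a Future[Any]; mapTo recovers the expected type
val future = (workerRouter ? Work(0, nrOfElements)).mapTo[Result]

// the completion arrives on a scheduler thread, outside the mailbox;
// pipeTo re-injects it as an ordinary message to avoid shared mutable state
future pipeTo self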
To address your PoisonPill questions, the answer likely depends. You might opt for a worker-per-message approach, in which case the worker should kill itself. If instead you use a worker pool, you could make the supervisor responsible for scaling that pool up and down, having it send the PoisonPill, or have workers time out on idleness, once again killing themselves.

Check if a Scala / Akka actor is terminated

While executing code within one actor, I need to check if the actor is still alive.
Would this be advisable or is there a better way?
if (self != akka.actor.DeadLetter) {
  // do something
}
Thanks!
EDIT:
Thanks for all your inputs. What happens is the following. I am using Play. As my actor starts, upon an incoming request, a timeout is scheduled.
Promise.timeout({
  Logger.info(s"$self, timeout expired")
  // do something
}, timeoutValue)
Sometimes the actor is stopped for other reasons before the timeout expires (for instance, the client disconnects). In that case, what I see in the logs is
Actor[akka://application/deadLetters], timeout expired.
To my understanding, this means that the default deadLetters actor is executing that code. So my question really is: what is the best way to check whether the Promise code is executed after the actor is terminated, and to prevent it from going further if that is the case?
You should familiarize yourself with the actor lifecycle: http://doc.akka.io/docs/akka/2.3.4/scala/actors.html#Actor_Lifecycle
From inside an actor you can implement the postStop() callback that will be called immediately before your actor is going to be stopped. If you want to monitor the lifecycle of the actor from another actor you should use DeathWatch: http://doc.akka.io/docs/akka/2.3.4/scala/actors.html#Lifecycle_Monitoring_aka_DeathWatch
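A minimal DeathWatch sketch (Watcher is a hypothetical monitoring actor, not from the question):

import akka.actor.{Actor, ActorRef, Terminated}

class Watcher(target: ActorRef) extends Actor {
  context.watch(target) // ask the system for a Terminated notification

  def receive = {
    case Terminated(`target`) =>
      // target is definitively stopped; no more messages will reach it
      context.stop(self)
  }

  // called just before this actor itself stops
  override def postStop(): Unit = println(s"${self.path} stopping")
}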
If your actor is dead, no code within the actor will be running.
You can check whether a specific actor is available with actorSelection, and then send a message to whatever is returned as a result (if there is nothing there, no message will be sent).
context.actorSelection(<your-actor-name>) ! someMessage
I think you can watch your actor, and if you receive a Terminated message you can be sure that your actor is not running.
context.watch(self)

Scala system process hangs

I have an actor that uses ProcessBuilder to execute an external process:
import java.io.ByteArrayInputStream
import scala.sys.process._

def act {
  while (true) {
    receive {
      case param: String => {
        val filePaths = Seq("/tmp/file1", "/tmp/file2")
        val fileList = new ByteArrayInputStream(filePaths.mkString("\n").getBytes())
        val output = (s"myExecutable.sh ${param}" #< fileList).!!<
        doSomethingWith(output)
      }
    }
  }
}
I run hundreds of these actors in parallel. Sometimes, for an unknown reason, the execution of the process (!!) never returns; it hangs forever, and that specific actor cannot handle new messages. Is there any way to set up a timeout for this process to return, and retry if it is exceeded?
What could be the reason for these executions to hang forever? These commands are not supposed to last more than a few milliseconds.
Edit 1:
Two important facts that I observed:
This problem does not occur on Mac OS X, only on Linux.
When I don't use ByteArrayInputStream as input for the execution, the program does not hang.
I have an actor that uses ProcessBuilder to execute an external process: ... I run hundreds of these actors in parallel ...
That's some very heavy processing happening in parallel just to achieve a few milliseconds of work in each case. Concurrent processing mechanisms rank as follows (from worst to best in terms of resource usage, scalability and performance):
process = heavy-weight
thread = medium-weight (dozens of threads can execute within a single process space)
actor = light-weight (dozens of actors can execute by leveraging a single shared thread or multiple shared threads)
Concurrently spawning many processes takes significant operating-system resources for process creation and termination. In extreme cases, the O/S overhead to start and end processes could consume hundreds or thousands of times more CPU and memory resources than the actual job execution. That's why the thread model was created (and the more efficient actor model). Think of your current processing as doing 'CGI-like' non-scalable O/S-stressing processing from within your extremely scalable actors - that's an anti-pattern. It doesn't take much to stress some operating systems to the point of breakage: this could be happening.
Also, if the files being read are very large in size, it would be best for scalability and reliability to limit the number of processes that concurrently read files on the same disk. It might be OK for up to 10 processes to read concurrently, I doubt it would be OK for 100.
How should an Actor invoke an external program?
Of course, if you converted your logic in myExecutable.sh into Scala, you would not need to create processes at all. Achieving scalability, performance and reliability would be more straightforward.
Assuming this is not possible/desirable, you should limit the total number of processes created and you should reuse them across different Actors / requests over time.
First solution option: (1) create a pool of processes that are reused (say size 10) (2) create actors (say 100) that communicate to/from the processes via ProcessIO (3) if all processes are busy with processing, then it is OK/appropriate that actors block until one becomes available. The issue with this option: complexity; the 100 actors must do work to interact with the process pool, and the actors themselves add little value when the processes are the bottleneck.
Better solution option: (1) create a limited number of actors (say 10) (2) have each actor create one private long-running process (i.e. no pool as such) (3) have each actor communicate to/from its process via ProcessIO, blocking if the process is busy. Issue: still not as simple as possible; actors interact poorly with blocking processes.
Best solution option: (1) no actors; a simple for-loop from your main thread will achieve the same benefits as actors (2) create a limited number of processes (10) (3) via the for-loop, sequentially interact with each process using ProcessIO (if busy, block or skip to the next iteration).
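A hedged sketch of that third option, reusing #< from the question instead of ProcessIO for brevity; params and doSomethingWith are assumptions carried over from the question:

import java.io.ByteArrayInputStream
import scala.sys.process._

val params = Seq("a", "b", "c")
val filePaths = Seq("/tmp/file1", "/tmp/file2")

for (param <- params) {
  // one bounded, sequential process invocation per iteration - no actors
  val input = new ByteArrayInputStream(filePaths.mkString("\n").getBytes)
  val output = (s"myExecutable.sh $param" #< input).!!
  doSomethingWith(output)
}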
Is there any way to set up a timeout for this process to return, and retry if it is exceeded?
Indeed there is. One of the most powerful features of actors is the ability of some actors to spawn other actors and act as their supervisor (receiving failure or timeout messages, from which they can recover/restart). With 'native Scala actors' this is done via rudimentary programming, generating your own checks and timeout messages. But I won't cover that, because the Akka approaches are more powerful and simpler. Plus, the next major Scala release (2.11) will use Akka as the supported actor model, with 'native Scala actors' deprecated.
Here's an example Akka supervising actor with programmatic timeout/restart (not compiled/tested; of course, this is not useful if you go with the third solution option):
import scala.concurrent.duration._
import scala.collection.immutable.Set
import akka.actor._
import akka.actor.SupervisorStrategy._

// `requester` avoids shadowing the actor's own sender()
case class WorkRequest(requester: ActorRef, jobReq: Any)
case class WorkResponse(req: WorkRequest, jobResp: Any)
case class WorkTimeout(req: WorkRequest)

class Supervisor extends Actor {
  import context.dispatcher // execution context for the scheduler

  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: ActorKilledException     => Restart  // thrown by the Kill we send below
      case _: ArithmeticException      => Resume   // resume (reuse) the failing child
      case _: NullPointerException     => Restart  // restart the failing child
      case _: IllegalArgumentException => Stop     // stop the failing child
      case _: Exception                => Escalate // let our own supervisor decide
    }

  val worker = context.actorOf(Props[Worker]) // creates a supervised child actor
  var pendingRequests = Set.empty[WorkRequest]

  def receive = {
    case req: WorkRequest =>
      pendingRequests += req
      worker ! req
      context.system.scheduler.scheduleOnce(10.seconds, self, WorkTimeout(req))
    case resp @ WorkResponse(req, _) =>
      pendingRequests -= req
      req.requester ! resp
    case WorkTimeout(req) =>
      if (pendingRequests contains req) {
        worker ! Kill // restart the unresponsive worker via the strategy above
        pendingRequests foreach { worker ! _ } // resend all pending requests
      }
  }
}
A word of caution: this approach to actor supervision will not overcome poor architecture & design. If you start with suitable process/thread/actor design to meet your requirements, then supervision will promote reliability. But if you start with poor design, then there's a risk that using 'brute-force' recovery from O/S-level failures could exacerbate your problems - making process reliability worse or even causing the machine to crash.
I don't have enough info to reproduce the issue, so I can't diagnose it exactly, but here's how I'd go about diagnosing it if I were in your shoes. The basic approach is a differential diagnosis - identify possible causes, and tests that would prove or rule them out.
The first thing I'd do is to validate that the myExecutable.sh process spawned by the application is actually terminating.
If the process isn't terminating, then this is part of the problem, so we need to understand why. One thing we could do is to run something other than myExecutable.sh. You suggested that ByteArrayInputStream may be part of the problem, which suggests that myExecutable.sh is getting bad input on stdin. If that's the case, then you could instead run a script that simply logs its input to a file, which would show this. If the input is invalid, then ByteArrayInputStream is providing bad data for some reason - thread safety and unicode are the obvious culprits, but looking at the actual bad data should give you a clue. If the input is valid, then it's a bug in myExecutable.sh.
If the process is terminating, then the problem is somewhere else. My first guesses would be that it's either related to actor scheduling (actor libraries typically use ForkJoin for execution, which is great, but doesn't deal well with blocking code), or a bug in the scala.sys.process library (wouldn't be unprecedented - I had to drop scala.sys.process from a project I was working on because of a memory leak).
Looking at the stack trace for a hung thread should give you some clues (VisualVM is your friend), as you should be able to see what's waiting. You can then find the relevant code in the OpenJDK or Scala standard library source code. Where you go from there depends on what you find.
Can you not fire off this process and its handling in a future and use a timed wait against it?
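A rough sketch of that suggestion, with an assumed command and timeout. Note that on timeout only the wait is bounded; the Future (and the external process) keeps running in the background:

import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.sys.process._

def runWithTimeout(cmd: String, timeout: FiniteDuration, retries: Int = 1): String =
  try Await.result(Future(cmd.!!), timeout)
  catch {
    case _: TimeoutException if retries > 0 =>
      runWithTimeout(cmd, timeout, retries - 1) // one more attempt
  }

val output = runWithTimeout("myExecutable.sh someParam", 5.seconds)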
I don't think we can figure it out without knowing myExecutable.sh or doSomethingWith.
When it hangs, try killing all the myExecutable.sh processes.
If that helps, you should inspect myExecutable.sh.
If it does not help, you should inspect the doSomethingWith function.

How can I join a started Actor?

After starting an actor
val a = actor {
  loop {
    react {
      case "end" => exit
      ...
    }
  }
}
And sending an "end" message to it
a ! "end"
I want to make sure the thread is finished.
How can it be joined?
As far as I know an actor can't be joined explicitly in the Java way. The concept is to introduce event-based programming, so it should only react to events.
You can, however, trap the exit and link your initial actor (A) to the ending actor (B) via link. That way the initial actor A receives an event when B ends. If you don't get any event, you know the actor hasn't terminated. See the thread here: https://stackoverflow.com/a/4304602/999865
If you don't care about resources and know that the thread will terminate, you can also make the actor inherit from DaemonActor. Daemon actors don't block any system resources.
It seems to me that the "actor way" of doing what you want is to actually have your actor reply with a "Done" message to the requester, so it can take action. That way it is all still asynchronous event handling.
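A minimal sketch of that "reply when done" idiom; the message values are assumptions, and the requester uses the synchronous send !? purely to illustrate waiting for the acknowledgement:

import scala.actors.Actor._

val a = actor {
  loop {
    react {
      case "end" =>
        reply("Done") // tell the requester we are finishing
        exit()
    }
  }
}

val ack = a !? "end" // returns once "Done" arrives
println("actor finished: " + ack)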

Scala Remote Actors - Pitfalls

While writing Scala RemoteActor code, I noticed some pitfalls:
RemoteActor.classLoader = getClass().getClassLoader() has to be set in order to avoid "java.lang.ClassNotFoundException"
link doesn't always work due to "a race condition for which a NetKernel (the facility responsible for forwarding messages remotely) that backs a remote actor can close before the remote actor's proxy (more specifically, proxy delegate) has had a chance to send a message remotely indicating the local exit." (Stephan Tu)
RemoteActor.select doesn't always return the same delegate (RemoteActor.select - result deterministic?)
Sending a delegate over the network prevents the application from quitting normally (RemoteActor unregister actor)
Remote actors won't terminate if RemoteActor.alive() and RemoteActor.register() are used outside act (see the answer by Magnus)
Are there any other pitfalls a programmer should be aware of?
Here's another: you need to put your RemoteActor.alive() and RemoteActor.register() calls inside your act method when you define your actor, or the actor won't terminate when you call exit(); see How do I kill a RemoteActor?
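A minimal sketch of an actor that avoids that pitfall; the port 9010 and the name 'echo are assumptions:

import scala.actors.Actor
import scala.actors.remote.RemoteActor._

class EchoActor extends Actor {
  def act() {
    alive(9010)           // start listening inside act, not in a constructor
    register('echo, this) // bind the name to this actor
    loop {
      react {
        case "end" => exit()
        case msg   => reply(msg)
      }
    }
  }
}

new EchoActor().start()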