Idiomatically scheduling background work that dies with the main thread in Scala

I have a scala program that runs for a while and then terminates. I'd like to provide a library to this program that, behind the scenes, schedules an asynchronous task to run every N seconds. I'd also like the program to terminate when the main entrypoint's work is finished without needing to explicitly tell the background work to shut down (since it's inside a library).
As best I can tell, the idiomatic way to do polling or scheduled work in Scala is with Akka's ActorSystem.scheduler.schedule, but using an ActorSystem makes the program hang after main finishes, waiting for the actors. I then tried and failed to add another actor that joins on the main thread, seemingly because "Anything that blocks a thread is not advised within Akka".
I could introduce a custom dispatcher; I could kludge something together with a polling isAlive check, or add a similar check inside each worker; or I could give up on Akka and just use raw Threads.
This seems like a not-too-unusual thing to want to do, so I'd like to use idiomatic Scala if there's a clear best way.

I don't think there is an idiomatic Scala way.
A JVM program terminates when all non-daemon threads have finished, so you can schedule your task to run on a daemon thread.
So just use Java functionality:
import java.util.concurrent._

object Main {
  def main(args: Array[String]): Unit = {
    // Make a ThreadFactory that creates daemon threads.
    val threadFactory = new ThreadFactory() {
      def newThread(r: Runnable) = {
        val t = Executors.defaultThreadFactory().newThread(r)
        t.setDaemon(true)
        t
      }
    }

    // Create a scheduled pool using this thread factory.
    val pool = Executors.newSingleThreadScheduledExecutor(threadFactory)

    // Schedule some function to run every second after an initial delay of 0 seconds.
    // This assumes Scala 2.12; in 2.11 you'd have to create a `new Runnable` manually.
    // Note that scheduling stops if the function throws an exception.
    pool.scheduleAtFixedRate(() => println("run"), 0, 1, TimeUnit.SECONDS)

    // Simulate the main program's work; once it returns, only the daemon
    // scheduler thread is left, so the JVM exits.
    Thread.sleep(5000)
  }
}
You can also use Guava to create a daemon thread factory with new ThreadFactoryBuilder().setDaemon(true).build().
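For reference, that variant might look like the following sketch (it assumes the Guava dependency is on the classpath and, as above, Scala 2.12's lambda-to-Runnable conversion):
import com.google.common.util.concurrent.ThreadFactoryBuilder
import java.util.concurrent.{Executors, TimeUnit}

// Same idea as above, with Guava building the daemon thread factory
val daemonFactory = new ThreadFactoryBuilder().setDaemon(true).build()
val pool = Executors.newSingleThreadScheduledExecutor(daemonFactory)
pool.scheduleAtFixedRate(() => println("run"), 0, 1, TimeUnit.SECONDS)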

If you use the Akka scheduler, you will be relying on a highly tuned, optimized and well-tested implementation. Bringing up an actor system is a bit heavyweight though, I agree, and it also adds a dependency on Akka. If you are OK with that, you can explicitly call system.shutdown from main when you are done, or wrap it in a function that does that for you.
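For illustration, a minimal sketch of that route (assuming a classic Akka version where system.terminate() has replaced the older system.shutdown() mentioned above):
import akka.actor.ActorSystem
import scala.concurrent.duration._

object Main {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("background")
    import system.dispatcher // execution context for the scheduled task

    // Repeat the library's task every second.
    // (Newer Akka versions prefer scheduleWithFixedDelay / scheduleAtFixedRate.)
    system.scheduler.schedule(0.seconds, 1.second) {
      println("scheduled job called")
    }

    // ... the main program's actual work ...
    Thread.sleep(5000)

    // Shut the actor system down explicitly so the JVM can exit.
    system.terminate()
  }
}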
Alternatively, you could try something along these lines:
import scala.concurrent._
import ExecutionContext.Implicits.global
import java.util.concurrent.TimeoutException

object Main extends App {

  def repeatEvery[T](timeoutMillis: Int)(f: => T): Future[T] = {
    val p = Promise[T]()
    val never = p.future // a future that never completes
    f
    def timeout = Future {
      Thread.sleep(timeoutMillis)
      throw new TimeoutException
    }
    val failure = Future.firstCompletedOf(List(never, timeout))
    failure.recoverWith { case _ => repeatEvery(timeoutMillis)(f) }
  }

  repeatEvery(1000) {
    println("scheduled job called")
  }

  println("main started doing its work")
  Thread.sleep(10000)
  println("main finished")
}
Prints:
scheduled job called
main started doing its work
scheduled job called
scheduled job called
scheduled job called
scheduled job called
scheduled job called
scheduled job called
scheduled job called
scheduled job called
scheduled job called
main finished
I don't like that it uses Thread.sleep, but that is done to avoid pulling in any third-party scheduler, and Scala's Future does not provide a timeout option. So you'll be wasting one thread on that scheduling task, but that's what the Akka scheduler seems to do anyway. The difference is that perhaps you want a single scheduler for the whole JVM so as not to waste too many threads. The code I provided, albeit simpler, wastes a thread per job.
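If you do want a single scheduler for the whole JVM, a hypothetical sketch combining this idea with the daemon-thread answer above (SharedScheduler and repeatEvery are made-up names) could look like:
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}

// One JVM-wide daemon-threaded scheduler shared by all repeating jobs,
// so each job no longer ties up its own sleeping thread.
object SharedScheduler {
  private val daemonFactory = new ThreadFactory {
    def newThread(r: Runnable) = {
      val t = Executors.defaultThreadFactory().newThread(r)
      t.setDaemon(true)
      t
    }
  }

  private val pool: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(daemonFactory)

  def repeatEvery(periodMillis: Long)(f: => Unit): Unit =
    pool.scheduleAtFixedRate(() => f, 0, periodMillis, TimeUnit.MILLISECONDS)
}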

interrupt scala parallel collection

Is there any way to interrupt a parallel collection computation in Scala?
Example:
val r = new Runnable {
  override def run(): Unit = {
    (1 to 3).par.foreach { _ => Thread.sleep(5000000) }
  }
}
val t = new Thread(r)
t.start()
Thread.sleep(300) // let them spin up
t.interrupt()
I'd expect t.interrupt to interrupt all the threads spawned by par, but this is not happening: it keeps spinning inside ForkJoinTask.externalAwaitDone. It looks like that method clears the interrupted status and keeps waiting for the spawned threads to finish.
This is Scala 2.12.
The thread that you t.start() is responsible only for starting the parallel computation and for waiting and gathering the result.
It is not connected to the threads that do the actual computing. Usually the work runs on the default ForkJoinPool, which is independent of the thread that submits the computation tasks.
If you want to interrupt the computation, you can use a custom execution back-end (such as a manually created ForkJoinPool or a thread pool) and then shut it down. You can read about that here.
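A rough sketch of that custom back-end approach, assuming Scala 2.12 (where the parallel collections' ForkJoinTaskSupport accepts a java.util.concurrent.ForkJoinPool); interruption of already-running tasks is best-effort:
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val pool = new ForkJoinPool(4)
val coll = (1 to 3).par
coll.tasksupport = new ForkJoinTaskSupport(pool) // run the computation on our own pool

val runnable: Runnable = () => coll.foreach { _ => Thread.sleep(5000000) }
val t = new Thread(runnable)
t.start()

Thread.sleep(300)  // let the workers spin up
pool.shutdownNow() // best-effort: interrupts the pool's workers and rejects new tasks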
Or you can provide a callback from the computation.
But none of those approaches is great for such a case.
If you are building a production solution, or your case is complex and critical for the app, you should probably use something that has cancellation by design, like Monix Task or a CancellableFuture.
Or at least use Future and cancel it with workarounds.
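As an illustration of "cancellation by design" (my addition, assuming Monix 3.x, where runToFuture returns a CancelableFuture), a cooperative never-ending loop can be cancelled between iterations:
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.duration._

// Each step is its own Task, so cancellation takes effect at the next
// async boundary; a Task blocked inside Thread.sleep would not be interrupted.
def loop(i: Int): Task[Unit] =
  Task(println(s"tick $i"))
    .flatMap(_ => Task.sleep(1.second))
    .flatMap(_ => loop(i + 1))

val running = loop(0).runToFuture // CancelableFuture[Unit]
Thread.sleep(3000)
running.cancel()                  // stops scheduling further iterations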

What can cause Akka's Scheduler to execute scheduled tasks before the scheduled time?

I'm experiencing a strange behaviour when using Akka's scheduler. My code looks roughly like this:
val s = ActorSystem("scheduler")

import scala.concurrent.ExecutionContext.Implicits.global

def doSomething(): Future[Unit] = {
  val now = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
  println(s"${now.get(Calendar.MINUTE)}:${now.get(Calendar.SECOND)}:${now.get(Calendar.MILLISECOND)}")
  // Do many things that include an http request using "dispatch" and
  // manipulation of the response and saving it in a file.
}

val futures: Seq[Future[Unit]] = for (i <- 1 to 500) yield {
  println(s"$i : ${i * 600}")
  // AlphaVantage recommends 100 API calls per minute
  akka.pattern.after(i * 600 milliseconds, s.scheduler) { doSomething() }
}

Future.sequence(futures).onComplete(_ => s.terminate())
When I execute my code, doSomething is initially called repeatedly with 600 milliseconds between successive calls, as expected. However, after a while, all remaining scheduled calls are suddenly executed simultaneously.
I suspect that something inside my doSomething might be interfering with the scheduling, but I don't know what. My doSomething just does an http request using dispatch and manipulates the result, and does not interact directly with akka or the scheduler in any way. So, my question is:
What can cause the Scheduler's schedule to fail and suddenly trigger the immediate execution of all remaining scheduled tasks?
(PS: I tried to simplify my doSomething to post a minimal non-working example here, but my simplifications resulted in working examples.)
OK, I figured it out. As soon as one of the futures fails, the line
Future.sequence(futures).onComplete(_ => s.terminate())
will terminate the actor system, and all remaining scheduled tasks will be called.
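A sketch of one possible fix (my addition, not from the original answer): recover each future before sequencing, so that Future.sequence only completes after every scheduled call has finished, whether it succeeded or failed.
// Recover each future so a single failed call cannot fail Future.sequence
// and terminate the actor system before the other scheduled calls have run.
val safeFutures: Seq[Future[Unit]] = futures.map(_.recover {
  case e => println(s"call failed: ${e.getMessage}")
})
Future.sequence(safeFutures).onComplete(_ => s.terminate())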

Is it correct to use `Future` to run some loop task which is never finished?

In our project, we need a task that listens to a queue and processes incoming messages in a loop that never finishes. The code looks like this:
def processQueue(): Unit = {
  while (true) {
    val message = queue.next()
    processMessage(message) match {
      case Success(_) => ...
      case _ => ...
    }
  }
}
So we want to run it in a separate thread.
I can imagine two ways to do it. One is to use a Thread, as we would in Java:
new Thread(new Runnable { override def run(): Unit = processQueue() }).start()
Another way is to use a Future (which is what we do now):
Future { processQueue }
I just wonder whether it is correct to use a Future in this case, since as far as I know (and I might be wrong), a Future is meant to run a task that finishes or returns a result at some point in the future. But our task never finishes.
I also wonder what the best solution for this is in Scala.
A Future is supposed to be a value that will eventually exist, so I don't think it makes much sense to create one that will never be fulfilled. Futures are also immutable, so passing information to them is a no-no. And using some externally referenced queue within the Future sounds like a dark road to go down.
What you're describing is basically an Akka Actor, which has its own FIFO queue and a receive method to process messages. It would look something like this:
import akka.actor._

class Processor extends Actor {
  def receive = {
    case msg: String => processMessage(msg) match {
      case Success(x) => ...
      case _ => ...
    }
    case otherMsg @ Message(_, _) => {
      // process this other type of message..
    }
  }
}
Your application could create a single instance of this Processor actor with an ActorSystem (or some other elaborate group of these actors):
val akkaSystem = ActorSystem("myActorSystem")
val processor: ActorRef = akkaSystem.actorOf(Props[Processor], "Processor")
And send it messages:
processor ! "Do some work!"
In short, it's a better idea to use a concurrency framework like Akka than to create your own for processing queues on separate threads. The Future API is most definitely not the way to go.
I suggest perusing the Akka Documentation for more information.
If you are just running one thread (aside from the main thread), it won't matter. If you do this repeatedly, and you really want lots of separate threads, you should use Thread since that is what it is for. Futures are built with the assumption that they'll terminate, so you might run out of pool threads.
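If you do go the plain-Thread route, a minimal sketch (assuming processQueue() is the loop from the question; whether to mark the thread as a daemon depends on whether dropping in-flight messages at shutdown is acceptable):
val queueThread = new Thread(new Runnable {
  override def run(): Unit = processQueue()
}, "queue-processor")
queueThread.setDaemon(true) // optional: lets the JVM exit without explicitly stopping the loop
queueThread.start()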

Scala how to use akka actors to handle a timing out operation efficiently

I am currently evaluating JavaScript scripts using Rhino in a RESTful service. I would like evaluation to time out.
I have created a mock example actor (using Scala 2.10 Akka actors).
case class Evaluate(expression: String)

class RhinoActor extends Actor {

  override def preStart() = { println("Start context"); super.preStart() }

  def receive = {
    case Evaluate(expression) ⇒ {
      Thread.sleep(100)
      sender ! "complete"
    }
  }

  override def postStop() = { println("Stop context"); super.postStop() }
}
Now I use this actor as follows:
def run {
  val t = System.currentTimeMillis()
  val system = ActorSystem("MySystem")
  val actor = system.actorOf(Props[RhinoActor])

  implicit val timeout = Timeout(50 milliseconds)
  val future = (actor ? Evaluate("10 + 50")).mapTo[String]
  val result = Try(Await.result(future, Duration.Inf))

  println(System.currentTimeMillis() - t)
  println(result)

  actor ! PoisonPill
  system.shutdown()
}
Is it wise to use the ActorSystem in a closure like this which may have simultaneous requests on it?
Should I make the ActorSystem global, and will that be ok in this context?
Is there a more appropriate alternative approach?
EDIT: I think I need to use futures directly, but I will need the preStart and postStop. Currently investigating.
EDIT: Seems you don't get those hooks with futures.
I'll try and answer some of your questions for you.
First, an ActorSystem is a very heavyweight construct. You should not create one per request that needs an actor. Create one globally and then use that single instance to spawn your actors (you also won't need system.shutdown() anymore in run). I believe this covers your first two questions.
Your approach of using an actor to execute JavaScript here seems sound to me. But instead of spinning up an actor per request, you might want to pool a bunch of RhinoActors behind a Router, with each instance having its own Rhino engine that is set up during preStart. Doing this eliminates the per-request Rhino initialization cost, speeding up your JS evaluations. Just make sure you size the pool appropriately. Also, you won't need to send PoisonPill messages per request if you adopt this approach.
You might also want to look into the non-blocking callbacks onComplete, onSuccess and onFailure as opposed to using the blocking Await. These callbacks also respect timeouts and are preferable to blocking for higher throughput. As long as whatever is upstream waiting for this response can handle the asynchronicity (i.e. an async-capable web request), I suggest going this route.
The last thing to keep in mind is that even though the code returns to the caller after the timeout if the actor has yet to respond, the actor still goes on processing that message (performing the evaluation). It does not stop and move on to the next message just because a caller timed out. Just wanted to make that clear in case it wasn't.
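Pulling those suggestions together, a rough sketch might look like this (assuming Akka 2.3+ for RoundRobinPool; the Evaluator object and its names are made up for illustration, and Evaluate/RhinoActor are from the question):
import akka.actor.{ActorSystem, Props}
import akka.pattern.ask
import akka.routing.RoundRobinPool
import akka.util.Timeout
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object Evaluator {
  // One global ActorSystem for the whole application
  val system = ActorSystem("MySystem")
  import system.dispatcher

  // A pool of RhinoActors behind a round-robin router; each actor would
  // set up its own Rhino engine in preStart
  val pool = system.actorOf(RoundRobinPool(4).props(Props[RhinoActor]), "rhinoPool")

  def evaluate(expression: String): Unit = {
    implicit val timeout = Timeout(50.milliseconds)
    // Non-blocking: react to the result (or the ask timeout) in a callback
    (pool ? Evaluate(expression)).mapTo[String].onComplete {
      case Success(result) => println(result)
      case Failure(err)    => println(s"evaluation failed or timed out: $err")
    }
  }
}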
EDIT
In response to your comment about stopping a long execution, there are some things related to Akka to consider first. You can stop the actor, or send it a Kill or a PoisonPill, but none of these will stop it from processing the message it is currently working on; they just prevent it from receiving new messages. In your case, with Rhino, if infinite script execution is a possibility, then I suggest handling this within Rhino itself. I would dig into the answers on this post (Stopping the Rhino Engine in middle of execution) and set up your Rhino engine in the actor in such a way that it stops itself if it has been executing for too long. That failure will kick out to the supervisor (if pooled) and cause that pooled instance to be restarted, which will init a new Rhino in preStart. This might be the best approach for dealing with the possibility of long-running scripts.

Executing CPU-bound tasks with Scala actors?

Suppose I have to execute several CPU-bound tasks. If I have 4 CPUs, for example, I would probably create a fixed-size thread pool of 4-5 worker threads waiting on a queue and put the tasks in the queue. In Java I can use java.util.concurrent (maybe ThreadPoolExecutor) to implement this mechanism.
How would you implement it with Scala actors?
All actors are basically threads which are executed by a scheduler under the hood. The scheduler creates a thread pool to execute actors roughly bound to your number of cores. This means that you can just create an actor per task you need to execute and leave the rest to Scala:
for (i <- 1 to 20) {
  actor {
    print(i)
    Thread.sleep(1000)
  }
}
The disadvantage here is that, depending on the number of tasks, the cost of creating a thread for each task may be quite high, since threads are not so cheap in Java.
A simple way to create a bounded pool of worker actors and then distribute the tasks to them via messaging would be something like:
import scala.actors.Actor._

val numWorkers = 4
val pool = (1 to numWorkers).map { i =>
  actor {
    loop {
      react {
        case x: String => println(x)
      }
    }
  }
}

for (i <- 1 to 20) {
  val r = (new util.Random).nextInt(numWorkers)
  pool(r) ! "task " + i
}
The reason we want to create multiple actors is that a single actor processes only one message (i.e. task) at a time, so to get parallelism across your tasks you need to create several of them.
A side note: the default scheduler becomes particularly important when it comes to I/O-bound tasks, as you will definitely want to change the size of the thread pool in that case. Two good blog posts that go into the details are Explore the Scheduling of Scala Actors and Scala actors thread pool pitfall.
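For the old scala.actors scheduler specifically, the pool size is usually tuned through system properties; to the best of my recollection these are the names its thread-pool configuration reads, so treat them as an assumption and verify against your Scala version:
// Must be set before the first actor starts, e.g. as JVM options:
//   -Dactors.corePoolSize=32 -Dactors.maxPoolSize=32
System.setProperty("actors.corePoolSize", "32")
System.setProperty("actors.maxPoolSize", "32")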
With that said, Akka is an Actor framework that provides tools for more advanced workflows with Actors, and it is what I would use in any real application. Here is a load balancing (rather than random) task executor:
import akka.actor.Actor
import Actor._
import akka.routing.{LoadBalancer, CyclicIterator}

class TaskHandler extends Actor {
  def receive = {
    case t: Task =>
      // some computationally expensive thing
      t.execute
    case _ => println("default case is required in Akka...")
  }
}

class TaskRouter(numWorkers: Int) extends Actor with LoadBalancer {
  val workerPool = Vector.fill(numWorkers)(actorOf[TaskHandler].start())
  val seq = new CyclicIterator(workerPool)
}

val router = actorOf(new TaskRouter(4)).start()

for (i <- 1 to 20) {
  router ! Task(..)
}
You can have different types of load balancing (CyclicIterator gives round-robin distribution); check the docs here for more info.
Well, you usually don't. Part of the attraction of using actors is that they handle such details for you.
If, however, you insist on managing that, you'll need to override the protected scheduler method on your Actor class to return an appropriate IScheduler. See also the scala.actors.scheduler package, and the comments on the Actor trait concerning schedulers.