I'm using some code with a blocking statement:
blocking {
Thread.sleep(10*1000)
}
Is there a way to assert that this blocking statement is present? In other words: can I write a test that fails if somebody removes the blocking statement?
Update: how can I assert blocking when it is used inside Futures?
Try playing with BlockContext.
You should get something like this:
import scala.concurrent.{BlockContext, CanAwait, blocking}
var blocked = false // flag to detect blocking
val oldContext = BlockContext.current
val myContext = new BlockContext {
override def blockOn[T](thunk: =>T)(implicit permission: CanAwait): T = {
blocked = true
oldContext.blockOn(thunk)
}
}
BlockContext.withBlockContext(myContext) {
blocking {} // block (or not) here
}
assert(blocked) // verify that blocking happened
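The same idea can be packaged as a small reusable helper so that a test can check both the positive and the negative case. This is a sketch: the names BlockingDetector and usesBlocking are hypothetical, and it only observes blocking calls made on the current thread.

```scala
import scala.concurrent.{BlockContext, CanAwait, blocking}

object BlockingDetector {
  // Runs `body` on the current thread and reports whether it entered `blocking`.
  def usesBlocking(body: => Unit): Boolean = {
    var blocked = false
    val outer = BlockContext.current // context in effect before we install ours
    val recorder = new BlockContext {
      override def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = {
        blocked = true
        outer.blockOn(thunk) // delegate so the surrounding context still handles it
      }
    }
    BlockContext.withBlockContext(recorder)(body)
    blocked
  }
}

object BlockingDetectorDemo {
  def main(args: Array[String]): Unit = {
    println(BlockingDetector.usesBlocking { blocking { Thread.sleep(10) } }) // true
    println(BlockingDetector.usesBlocking { Thread.sleep(10) })              // false
  }
}
```

A test that removes the blocking wrapper from the code under test would then see usesBlocking return false.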
Update: making it work when the code you want to test is wrapped in a Future (follow-up to a comment)
When you construct a Future, its factory method takes the block of code (a function) to execute as an explicit argument and an execution context as an implicit one (commonly scala.concurrent.ExecutionContext.Implicits.global).
The block of code is later scheduled on that execution context and run on one of its threads.
Now, if you simply wrap the blocking piece of code in a Future inside the code block passed to BlockContext.withBlockContext, as you suggest in your comment:
BlockContext.withBlockContext(myContext) {
Future {
blocking { Thread.sleep(100) }
}
}
... this will not work, because your current thread only constructs the Future, while the code passed to the Future is executed on a thread from the relevant execution context (BlockContext.withBlockContext only detects blocking on the current thread).
That said, I can suggest one of three things:
Do not wrap the code you want to test in a Future. If you want to test whether a piece of code uses blocking, test exactly that.
Write it as a function and test that function; in production you can pass the same function to a Future.
Let's assume that for some reason you can't avoid creating a Future in your test. In this case you'll have to tamper with the execution context that is used when constructing the Future.
This code sample demonstrates how one could do that (reusing blocked and myContext from my original example):
import scala.concurrent.{Await, BlockContext, ExecutionContext, Future, blocking}
// execution context that forwards everything passed to it to the global execution context;
// it also wraps any work submitted to it in the block context that records blocking
implicit val ec: ExecutionContext = new ExecutionContext {
override def execute(runnable: Runnable): Unit = {
ExecutionContext.Implicits.global execute new Runnable {
override def run(): Unit = {
BlockContext.withBlockContext(myContext) {
runnable.run()
}
}
}
}
override def reportFailure(t: Throwable): Unit = {
ExecutionContext.Implicits.global.reportFailure(t)
}
}
// this future will use execution context defined above
val f = Future {
blocking {} // block (or not) here
}
Await.ready(f, scala.concurrent.duration.Duration.Inf)
assert(blocked)
If your Future gets created indirectly, for example as the result of calling some other function that you run in your test, then you'll have to somehow (possibly using dependency injection) pass your mocked execution context to wherever the Future gets created and use it there to construct the Future.
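A sketch of the dependency-injection route, under assumptions: ReportService and RecordingExecutionContext are hypothetical names, and the production class is assumed to take its ExecutionContext as a constructor parameter. Unlike the example above, the recording context captures the worker thread's own BlockContext and delegates to it, so the global pool keeps its usual handling of blocking.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, BlockContext, CanAwait, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

// Hypothetical production code: the ExecutionContext is injected instead of
// imported from ExecutionContext.Implicits.global, so a test can substitute one.
class ReportService(ec: ExecutionContext) {
  def slowReport(): Future[String] = Future {
    blocking { Thread.sleep(50) } // the call the test wants to notice
    "report"
  }(ec)
}

// Test-only ExecutionContext: forwards work to `underlying` and records whether
// any task it runs enters `blocking`.
final class RecordingExecutionContext(underlying: ExecutionContext) extends ExecutionContext {
  val blocked = new AtomicBoolean(false)

  override def execute(runnable: Runnable): Unit =
    underlying.execute(new Runnable {
      override def run(): Unit = {
        val outer = BlockContext.current // the worker thread's own BlockContext
        val recorder = new BlockContext {
          override def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = {
            blocked.set(true)
            outer.blockOn(thunk) // keep the pool's normal blocking handling
          }
        }
        BlockContext.withBlockContext(recorder)(runnable.run())
      }
    })

  override def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
}

object RecordingDemo {
  def main(args: Array[String]): Unit = {
    val ec = new RecordingExecutionContext(ExecutionContext.global)
    val report = Await.result(new ReportService(ec).slowReport(), 5.seconds)
    println(s"$report, blocked = ${ec.blocked.get}") // report, blocked = true
  }
}
```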
I'm writing close() method for my ScheduledExecutorService-based timer:
override def close(implicit executionContext: ExecutionContext): Future[Unit] = {
val p = Promise[Unit]()
executionContext.execute(new Runnable() {
override def run() = {
blocking {
p complete Try {
executor.shutdown()
//OK for the default global execution context:
//as this code is marked as blocking, an additional thread
//will be used for it, so there is no thread-pool starvation
executor.awaitTermination(1, TimeUnit.DAYS)
}
}
}
})
p.future
}
But if I implement an ExecutionContext myself, this code will block one of the pool's threads, because I found no way to get at that blocking context.
So, the question: is it possible to create my own ExecutionContext that properly handles scala.concurrent.blocking?
Of course it's possible; it's just far from trivial. You would need to create an ExecutionContext whose threads mix in BlockContext, which requires the following method:
def blockOn[T](thunk: => T)(implicit permission: CanAwait): T
blocking(thunk) eventually leads to a call to blockOn(thunk) on the current BlockContext, and blockOn should figure out whether the ExecutionContext has reached starvation and needs to do something about it. scala.concurrent.ExecutionContext.Implicits.global does it this way, but it uses a ForkJoinPool to do the heavy lifting, and that implementation is thousands of lines of code.
Keep in mind that whether you use ExecutionContext.Implicits.global or your own ExecutionContext, a thread will still be blocked by your code. The only difference is that the former spawns another thread to compensate when too many are blocked. Creating your own is likely to introduce some dangerous bugs, though, as a lot of care has to be taken to avoid deadlocks or spawning too many threads.
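To make the idea concrete, here is a minimal sketch under stated assumptions: it swaps in a plain java.util.concurrent.ThreadPoolExecutor instead of a ForkJoinPool, and the class name CompensatingExecutionContext and its grow-by-one policy are my own illustration, not how the global context works. Worker threads extend Thread with BlockContext, so BlockContext.current resolves to them when code calls blocking, and blockOn raises the pool's core size for the duration of the blocking call.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, ThreadFactory, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{Await, BlockContext, CanAwait, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

// Sketch only: a pool of `parallelism` threads that temporarily grows by one
// thread for every task currently inside `blocking { ... }`.
final class CompensatingExecutionContext(parallelism: Int) extends ExecutionContext {
  private val lock = new Object
  private var blockedCount = 0 // guarded by lock

  // Worker threads are BlockContexts, so `blocking` calls end up in blockOn here.
  private final class Worker(r: Runnable) extends Thread(r) with BlockContext {
    setDaemon(true)
    override def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = {
      adjust(+1)
      try thunk finally adjust(-1)
    }
  }

  private val pool = new ThreadPoolExecutor(
    parallelism,
    Int.MaxValue,          // must stay >= core size; the unbounded queue means the pool never grows past core on its own
    60L, TimeUnit.SECONDS, // surplus threads retire after this keep-alive
    new LinkedBlockingQueue[Runnable](),
    new ThreadFactory { def newThread(r: Runnable): Thread = new Worker(r) }
  )

  private def adjust(delta: Int): Unit = lock.synchronized {
    blockedCount += delta
    pool.setCorePoolSize(parallelism + blockedCount)
    if (delta > 0) pool.prestartCoreThread() // make sure a spare thread is waiting for work
  }

  override def execute(runnable: Runnable): Unit = pool.execute(runnable)
  override def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
  def shutdown(): Unit = pool.shutdown()
}

object CompensatingDemo {
  def main(args: Array[String]): Unit = {
    val ec = new CompensatingExecutionContext(parallelism = 1)
    val gate = new CountDownLatch(1)
    // With one thread and no compensation, the first task would wait forever.
    val waiter = Future { blocking { gate.await() }; "released" }(ec)
    Future { gate.countDown() }(ec)
    println(Await.result(waiter, 5.seconds)) // released
    ec.shutdown()
  }
}
```

It has exactly the weaknesses warned about above: nested blocking calls are counted twice, the number of extra threads is unbounded, and surplus threads only retire after the keep-alive time.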
EDIT: clarification of intent:
I have a (5-10 second) Scala computation that aggregates some data from many AWS S3 objects at a given point in time. I want to make this information available through a REST API. I'd also like to update this information every minute or so for new objects that have been written to this bucket in the interim. The summary itself will be a large JSON blob, and I can save a bunch of AWS calls if I cache the results of my S3 API calls from the previous updates (since these objects are immutable).
I'm currently writing this Spray.io-based REST service in Scala. I'd like the REST server to continue serving 'stale' data even if a computation is currently taking place. Then, once the computation is finished, I'd like to atomically switch to serving requests from the new data snapshot.
My initial idea was to have two actors, one doing the Spray routing and serving, and the other handling the long running computation and feeding the most recent cached result to the routing actor:
class MyCompute extends Actor {
var myvar = 1.0 // will eventually be several megabytes of state
import context.dispatcher
// [ALTERNATIVE A]:
// def compute() = this.synchronized { Thread.sleep(3000); myvar += 1.0 }
// [ALTERNATIVE B]:
// def compute() = { Thread.sleep(3000); this.synchronized { myvar += 1.0 }}
def compute() = { Thread.sleep(3000); myvar += 1.0 }
def receive = {
case "compute" => {
compute() // BAD: blocks this thread!
// [FUTURE]:
Future(compute()) // BAD: Not threadsafe
}
case "retrieve" => {
sender ! myvar
// [ALTERNATIVE C]:
// sender ! this.synchronized { myvar }
}
}
}
class MyHttpService(val dataService:ActorRef) extends HttpServiceActor {
implicit val timeout = Timeout(1 seconds)
import context.dispatcher
def receive = runRoute {
path("ping") {
get {
complete {
(dataService ? "retrieve").map(_.toString).mapTo[String]
}
}
} ~
path("compute") {
post {
complete {
dataService ! "compute"
"computing.."
}
}
}
}
}
object Boot extends App {
implicit val system = ActorSystem("spray-sample-system")
implicit val timeout = Timeout(1 seconds)
val dataService = system.actorOf(Props[MyCompute], name="MyCompute")
val httpService = system.actorOf(Props(classOf[MyHttpService], dataService), name="MyRouter")
val cancellable = system.scheduler.schedule(0 milliseconds, 5000 milliseconds, dataService, "compute")
IO(Http) ? Http.Bind(httpService, system.settings.config.getString("app.interface"), system.settings.config.getInt("app.port"))
}
As things are written, everything is safe, but when passed a "compute" message, the MyCompute actor will block its thread and will not be able to serve requests from the MyHttpService actor.
Some alternatives:
akka.agent
The akka.agent.Agent looks like it is designed to handle this problem nicely (replacing the MyCompute actor with an Agent), except that it seems to be designed for simpler updates of state: in reality, MyCompute will have multiple bits of state (some of which are several-megabyte data structures), and using the sendOff functionality would seemingly rewrite all of that state every time, which would apply a lot of unnecessary GC pressure.
Synchronization
The [FUTURE] code above solves the blocking problem, but if I'm reading the Akka docs correctly, it would not be thread-safe. Would adding a synchronized block as in [ALTERNATIVE A] solve this? I would also imagine that synchronizing only the actual update to the state, as in [ALTERNATIVE B], would be enough. Would I also have to do the same for reading the state, as in [ALTERNATIVE C]?
Spray-cache
The spray-cache pattern seems to be built with a web serving use case in mind (small cached objects available with a key), so I'm not sure if it applies here.
Futures with pipeTo
I've seen examples of wrapping a long running computation in a Future and then piping that back to the same actor with pipeTo to update internal state.
The problem with this is: what if I want to update the mutable internal state of my actor during the long running computation?
Does anyone have any thoughts or suggestions for this use case?
tl;dr:
I want my actor to update internal, mutable state during a long running computation without blocking. Ideas?
So let the MyCompute actor create a Worker actor for each computation:
A "compute" comes to MyCompute
It remembers the sender and spawns the Worker actor. It stores the Worker and the Sender in Map[Worker, Sender]
Worker does the computation. On finish, Worker sends the result to MyCompute
MyCompute updates the result and retrieves the original requester from the Map[Worker, Sender], using the completed Worker as the key. Then it sends the result to the requester and terminates the Worker.
Whenever you have blocking in an Actor, you spawn a dedicated actor to handle it. Whenever you need to use another thread or Future in Actor, you spawn a dedicated actor. Whenever you need to abstract any complexity in Actor, you spawn another actor.
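A minimal sketch of those steps with classic Akka actors, assuming Akka 2.4 or later; the message classes Work and Result are hypothetical, and Thread.sleep stands in for the real S3 aggregation. For brevity the Worker runs on the default dispatcher; in practice you would likely give it a dedicated dispatcher so its blocking cannot starve other actors.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical message: the state snapshot the worker starts from
final case class Work(previous: Double)
// Hypothetical reply from the worker to MyCompute
final case class Result(value: Double)

// Does the slow, blocking computation; it shares no mutable state
class Worker extends Actor {
  def receive: Receive = {
    case Work(previous) =>
      Thread.sleep(3000) // stand-in for the real long-running computation
      sender() ! Result(previous + 1.0)
  }
}

class MyCompute extends Actor {
  private var myvar = 1.0                             // touched only inside receive
  private var pending = Map.empty[ActorRef, ActorRef] // Worker -> requester

  def receive: Receive = {
    case "compute" =>
      val worker = context.actorOf(Props[Worker]())
      pending += worker -> sender()
      worker ! Work(myvar)
    case Result(value) =>
      myvar = value // state is updated on this actor's own thread
      pending.get(sender()).foreach(_ ! value)
      pending -= sender()
      context.stop(sender())
    case "retrieve" =>
      sender() ! myvar // keeps serving the last snapshot while a Worker runs
  }
}

object WorkerDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("worker-demo")
    implicit val timeout: Timeout = Timeout(10.seconds)
    val compute = system.actorOf(Props[MyCompute](), "MyCompute")
    println(Await.result(compute ? "compute", 10.seconds))
    println(Await.result(compute ? "retrieve", 10.seconds))
    system.terminate()
  }
}
```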
I have a Java library that performs long running blocking operations. The library is designed to respond to user cancellation requests. The library integration point is the Callable interface.
I need to integrate this library into my application from within an Actor. My initial thought was to do something like this:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val callable: java.util.concurrent.Callable[Void] = ???
val fut = Future {
callable.call()
}
fut.onSuccess {
case _ => // continue on success path
}
fut.onFailure {
case throwable => // handle exceptions
}
I think this code will work properly insofar as it will not block the actor. But I don't know how I would provide a way to cancel the operation. Assume that while the callable is processing, the actor receives a message indicating that it should cancel the operation being worked on in the callable, and that the library responds to cancellation requests delivered by interrupting the processing thread.
What is the best practice to submit a Callable from within an Actor and sometime later cancel the operation?
UPDATE
To be clear, the library exposes an instance of the java.util.concurrent.Callable interface. Callable in and of itself does not provide a cancel method. But the callable object is implemented in such a way that it responds to cancellation when its thread is interrupted. In Java, this would be done by submitting the callable to an ExecutorService. This would return a java.util.concurrent.Future. It is this Future object that provides the cancel method.
In Java I would do the following:
ExecutorService executor = ...
Callable c = ...
Future f = executor.submit(c);
...
// do more stuff
...
// If I need to cancel the callable task I just do this:
f.cancel(true);
It seems there is a disconnect between a java.util.concurrent.Future and scala.concurrent.Future. The java version provides a cancel method while the scala one does not.
In Scala I would do this:
// When the Akka Actor receives a message to process a
// long running/blocking task I would schedule it to run
// on a different thread like this:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val callable: java.util.concurrent.Callable[Void] = ???
val fut = Future {
callable.call()
}
fut.onSuccess {
case _ => // continue on success path
}
fut.onFailure {
case throwable => // handle exceptions
}
// But now, if/when the actor receives a message to cancel
// the task because it is taking too long to finish (even
// though it is running in the background) there is no way
// that I know of to cancel or interrupt the
// scala.concurrent.Future.
Is there an idiomatic scala approach for cancelling a scala.concurrent.Future?
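From what I understand, your library exposes an interface that has call and some cancel method, right? I'm assuming you can just call cancel whenever you want to. An example like the one below should get you started.
import akka.actor.Actor
import scala.concurrent.{ExecutionContext, Future}
// hypothetical: the interface this answer assumes the library exposes
trait HasCancelMethod {
  def call(): Unit
  def cancel(): Unit
}
case class Interrupt(taskId: Int)
class InterruptableThingy extends Actor {
  // run the blocking calls on a dedicated dispatcher, not the actor's default one
  implicit val ctx: ExecutionContext = context.system.dispatchers.lookup("dedicated-dispatcher")
  var counter = 0
  var tasks = Map.empty[Int, HasCancelMethod]
  def receive: Receive = {
    case "doThing" =>
      counter += 1
      val id = counter
      val thing: HasCancelMethod = ??? // obtain the task from the library
      Future { thing.call() } // attach onComplete handlers here as needed
      tasks += id -> thing
      sender() ! id
    case Interrupt(id) =>
      tasks.get(id).foreach(_.cancel())
      tasks -= id
  }
}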
Please notice that we're using a dedicated dispatcher for the blocking Futures. This is a very good pattern, as you can configure that dedicated dispatcher to fit your blocking workloads (and it won't eat up resources in the default dispatcher). Dispatchers are explained in more detail in the docs here: http://doc.akka.io/docs/akka/2.3.3/scala/dispatchers.html
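If the library really only exposes a plain java.util.concurrent.Callable, as the question's update says, the Java approach from the question carries over to Scala directly: keep the java.util.concurrent.Future returned by ExecutorService.submit and call cancel(true) on it. A minimal sketch, where CancellableTasks is a hypothetical helper (inside an actor, the map would simply be a private field touched only from receive) and the executor would be one dedicated to blocking work.

```scala
import java.util.concurrent.{Callable, CountDownLatch, ExecutorService, Executors, TimeUnit, Future => JFuture}

// Hypothetical helper: remembers the Java Future of each submitted task by id.
// `synchronized` is only here so the class is safe to use outside an actor.
final class CancellableTasks(executor: ExecutorService) {
  private var counter = 0
  private var tasks = Map.empty[Int, JFuture[_]]

  def submit[T](callable: Callable[T]): Int = synchronized {
    counter += 1
    tasks += counter -> executor.submit(callable)
    counter
  }

  // cancel(true) interrupts the thread running the task; returns false if the
  // id is unknown or the task has already completed.
  def cancel(id: Int): Boolean = synchronized {
    val cancelled = tasks.get(id).exists(_.cancel(true))
    tasks -= id
    cancelled
  }
}

object CancelDemo {
  def main(args: Array[String]): Unit = {
    val executor = Executors.newSingleThreadExecutor()
    val tasks = new CancellableTasks(executor)
    val started = new CountDownLatch(1)
    val id = tasks.submit(new Callable[String] {
      // stand-in for the library's Callable: it reacts to interruption
      def call(): String =
        try { started.countDown(); Thread.sleep(60000); "finished" }
        catch { case _: InterruptedException => "interrupted" }
    })
    started.await()
    println(tasks.cancel(id)) // true: the running task was interrupted
    executor.shutdown()
    executor.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```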
I'm using the Netty library (version 4 from GitHub). It works great in Scala, but I am hoping for my library to be able to use continuation passing style for the asynchronous waiting.
Traditionally with Netty you would do something like this (an example asynchronous connect operation):
//client is a ClientBootstrap
val future:ChannelFuture = client.connect(remoteAddr);
future.addListener(new ChannelFutureListener {
def operationComplete (f:ChannelFuture) = {
//here goes the code that happens when the connection is made
}
})
If you are implementing a library (which I am) then you basically have three simple options to allow the user of the library to do stuff after the connection is made:
Just return the ChannelFuture from your connect method and let the user deal with it - this doesn't provide much abstraction from netty.
Take a ChannelFutureListener as a parameter of your connect method and add it as a listener to the ChannelFuture.
Take a callback function object as a parameter of your connect method and call that from within the ChannelFutureListener that you create (this would make for a callback-driven style somewhat like node.js)
What I am trying to do is a fourth option; I didn't include it in the count above because it is not simple.
I want to use scala delimited continuations to make the use of the library be somewhat like a blocking library, but it will be nonblocking behind the scenes:
class MyLibraryClient {
def connect(remoteAddr:SocketAddress) = {
shift { retrn: (Unit => Unit) => {
val future:ChannelFuture = client.connect(remoteAddr);
future.addListener(new ChannelFutureListener {
def operationComplete(f:ChannelFuture) = {
retrn();
}
});
}
}
}
}
Imagine other read/write operations being implemented in the same fashion. The goal of this being that the user's code can look more like this:
reset {
val conn = new MyLibraryClient();
conn.connect(new InetSocketAddress("127.0.0.1", 1337));
println("This will happen after the connection is finished");
}
In other words, the program will look like a simple blocking-style program but behind the scenes there won't be any blocking or threading.
The trouble I'm running into is that I don't fully understand how the typing of delimited continuations works. When I try to implement it in the above way, the compiler complains that my operationComplete implementation actually returns Unit @scala.util.continuations.cpsParam[Unit,Unit => Unit] instead of Unit. I get that there is a sort of "gotcha" in Scala's CPS in that you must annotate a shift method's return type with @suspendable, which gets passed up the call stack until the reset, but there doesn't seem to be any way to reconcile that with a pre-existing Java library that has no concept of delimited continuations.
I feel like there really must be a way around this - if Swarm can serialize continuations and jam them over the network to be computed elsewhere, then it must be possible to simply call a continuation from a pre-existing Java class. But I can't figure out how it can be done. Would I have to rewrite entire parts of netty in Scala in order to make this happen?
I found this explanation of Scala's continuations extremely helpful when I started out. In particular, pay attention to the parts where he explains shift[A, B, C] and reset[B, C]. Adding a dummy null as the last statement of operationComplete should help.
Btw, you need to invoke retrn() inside another reset if it may have a shift nested inside it.
Edit: Here is a working example
import scala.util.continuations._
import java.util.concurrent.Executors
object Test {
val execService = Executors.newFixedThreadPool(2)
def main(args: Array[String]): Unit = {
reset {
val conn = new MyLibraryClient();
conn.connect("127.0.0.1");
println("This will happen after the connection is finished");
}
println("Outside reset");
}
}
class ChannelFuture {
def addListener(listener: ChannelFutureListener): Unit = {
val future = this
Test.execService.submit(new Runnable {
def run(): Unit = {
listener.operationComplete(future)
}
})
}
}
trait ChannelFutureListener {
def operationComplete(f: ChannelFuture): Unit
}
class MyLibraryClient {
def connect(remoteAddr: String): Unit @cps[Unit] = {
shift {
retrn: (Unit => Unit) => {
val future: ChannelFuture = new ChannelFuture()
future.addListener(new ChannelFutureListener {
def operationComplete(f: ChannelFuture): Unit = {
println("operationComplete starts")
retrn();
null
}
});
}
}
}
}
with a possible output:
Outside reset
operationComplete starts
This will happen after the connection is finished
This program, after executing main(), does not exit.
object Main
{
def main(args: Array[String]) {
... // existing code
f()
... // existing code
}
def f() {
import scala.actors.Actor._
val a = actor {
loop {
react {
case msg: String => System.out.println(msg)
}
}
}
a ! "hello world"
}
}
Because of this unexpected side-effect, using actors can be viewed as intrusive.
Assuming the actors must continue to run until program termination, what would you do to preserve the original behavior in all cases of termination?
In 2.8 there's a DaemonActor class that allows this. In 2.7.x you could hack in a custom scheduler that doesn't prevent shutdown even if there are still live actors, or, if you want an easy way out, you could call System.exit() at the end of main.
If you think of an actor as kind of a light-weight thread, much of the time you want a live actor to prevent program termination. Otherwise if you have a program that does all of its work in actors, you'd need to have something on the main thread just to keep it alive until all the actors finish.
After the main thread in the above example completed, the program still had a non-daemon thread running the actor. It is usually a bad idea to brutally terminate running threads using Thread.destroy() or System.exit(), because the results may be very bad for your program, including, but not limited to, data corruption and deadlocks. That is why Thread.destroy() and similar methods were deprecated in Java in the first place. The right way is to explicitly implement termination logic in your threads. In the case of Scala actors, that boils down to sending a Stop message to all running actors and making them quit when they get it. With this approach, your example would look like this:
object Main
{
case object Stop
def main(args: Array[String]) {
... // existing code
val a = f()
a ! "hello world"
... // existing code
a ! Stop
}
def f() = {
import scala.actors.Actor._
actor {
loop {
react {
case msg: String => System.out.println(msg)
case Stop => exit()
}
}
}
}
}