How do I safely use ThreadLocal caches in Actors?

My app ends up doing a lot of background processing via Actors, specifically loading Mapper instances and then doing some work upon them. It's very repetitive and I'd like to cache some of these lookups across my Actor code.
I'd typically use a ThreadLocal for this. However, since the thread initialization is handled by the Actor thread pool, it seems like the only place to initialize and subsequently clear the ThreadLocal would be in the actor's PartialFunction which receives incoming messages.
What I'm doing now is to create another method in my Actor, like this:
override def aroundUpdates[T](fn: => T): T = {
  clientCache.init {
    fn
  }
}
Here the init method handles clearing the ThreadLocal in a finally block. I don't like this approach because aroundUpdates exists only to set up the cache, which feels like a code smell.
Is there a better way to do this?

You don't need thread-locals: during a single reaction, you are running in a single thread, so you can just use a normal var. What's more, because your reactions are sequential and the actor subsystem manages synchronization for you, you can (if you want) access the state from different reactions:
def act = {
  // Declared outside the loop so the value survives from one reaction to the next
  var state: String = null
  def foo = state = "Hello"
  def bar = { println(state + " World"); state = null }
  def baz = println(state + " Oxbow")
  loop {
    react {
      case MsgA => foo; bar
      case MsgB => baz
    }
  }
}
Hence thread locals make no sense whatsoever to use in your own reactions!

Related

Atomic compareAndSet parameters are evaluated even if it's not used

I have the following code that sets an atomic variable (both java.util.concurrent.atomic and monix.execution.atomic behave the same):
class Foo {
  val s = AtomicAny(null: String)
  def foo() = {
    println("called")
    /* Side Effects */
    "foo"
  }
  def get(): String = {
    s.compareAndSet(null, foo())
    s.get
  }
}
val f = new Foo
f.get // Foo.s set from null to "foo", prints "called"
f.get // Foo.s not updated, but still prints "called"
The second time compareAndSet is called, it does not update the value, but foo is still called. This causes a problem because foo has side effects (in my real code, it creates an Akka actor and gives me an error because it tries to create duplicate actors).
How can I make sure the second parameter is not evaluated unless it is actually used? (Preferably not using synchronized)
I need to pass an implicit parameter to foo, so a lazy val would not work. E.g.
lazy val s = get() // Error: cannot provide implicit parameter
def foo()(implicit context: Context) = {
  println("called")
  /* Side Effects */
  "foo"
}
def get()(implicit context: Context): String = {
  s.compareAndSet(null, foo())
  s.get
}
Updated answer
The quick answer is to put this code inside an actor and then you don't have to worry about synchronisation.
If you are using Akka Actors you should never need to do your own thread synchronisation using low-level primitives. The whole point of the actor model is to limit the interaction between threads to just passing asynchronous messages. This provides all the thread synchronisation that you need and guarantees that an actor processes a single message at a time in a single-threaded manner.
You should definitely not have a function that is accessed simultaneously by multiple threads that creates a singleton actor. Just create the actor when you have the information you need and pass the ActorRef to any other actors that need it using dependency injection or a message. Or create the actor at the start and initialise it when the first message arrives (using context.become to manage the actor state).
Original answer
The simplest solution is just to use a lazy val to hold your instance of foo:
class Foo {
  lazy val foo = {
    println("called")
    /* Side Effects */
    "foo"
  }
}
This will create foo the first time it is used and after that will just return the same value.
If this is not possible for some reason, use an AtomicInteger initialised to 0 and then call incrementAndGet. If this returns 1 then it is the first pass through this code and you can call foo.
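A minimal sketch of the counter approach, assuming a hypothetical class `GuardedFoo` and a `calls` counter added only to make the behaviour observable. One limitation to be aware of: a caller that loses the race may read `null` until the winning caller has finished running `foo()`.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical names; only AtomicInteger comes from the JDK.
class GuardedFoo {
  private val passes = new AtomicInteger(0) // how many times get() has been entered
  @volatile private var s: String = null     // written only by the first caller
  val calls = new AtomicInteger(0)           // demonstration only: counts foo() runs

  private def foo(): String = {
    calls.incrementAndGet()
    /* Side effects would go here */
    "foo"
  }

  def get(): String = {
    // incrementAndGet returns 1 for exactly one caller, so foo() runs at most once
    if (passes.incrementAndGet() == 1) s = foo()
    s
  }
}
```

Calling `get()` twice on the same instance runs `foo()` only on the first call.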
Explanation:
Atomic operations such as compareAndSet require support from the CPU instruction set, and modern processors have single atomic instructions for such operations. In some cases (e.g. the cache line is held exclusively by this processor) the operation can be very fast. In other cases (e.g. the cache line is also in the cache of another processor) the operation can be significantly slower and can impact other threads.
The result is that the CPU must be holding the new value before the atomic instruction is executed. So the value must be computed before it is known whether it is needed or not. At the language level this shows up as ordinary by-value argument evaluation: foo() is evaluated before compareAndSet is called.
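The eager evaluation can be observed without any atomics. Scala evaluates ordinary (by-value) arguments before the call, whereas a by-name parameter (`=> String`) is evaluated only if the method body uses it. A small sketch, with all names (`EvalDemo`, `expensive`, `strictArg`, `byNameArg`) made up for illustration:

```scala
object EvalDemo {
  var evaluations = 0 // counts how often expensive() actually runs

  def expensive(): String = { evaluations += 1; "foo" }

  // By-value parameter: the argument is computed before the method is entered
  def strictArg(use: Boolean, value: String): Option[String] =
    if (use) Some(value) else None

  // By-name parameter: the argument expression runs only where `value` is referenced
  def byNameArg(use: Boolean, value: => String): Option[String] =
    if (use) Some(value) else None
}
```

Note that a by-name parameter only defers the evaluation; on its own it does not make a check-then-initialise sequence atomic.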

Asynchronous message handling with Akka's Actors

In my project I'm using Akka's Actors. By definition Actors are thread-safe, which means that in the Actor's receive method
def receive = {
  case msg =>
    // some logic here
}
only one thread at a time processes the commented piece of code. However, things are starting to get more complicated when this code is asynchronous:
def receive = {
  case msg =>
    Future {
      // some logic here
    }
}
If I understand this correctly, in this case only the Future construct will be synchronized, so to speak, and not the logic inside the Future.
Of course I may block the Future:
def receive = {
  case msg =>
    val future = Future {
      // some logic here
    }
    Await.result(future, 10.seconds)
}
which solves the problem, but I think we all should agree that this is hardly an acceptable solution.
So this is my question: how can I retain the thread-safe nature of actors in case of asynchronous computing without blocking Scala's Futures?
"How can I retain the thread-safe nature of actors in case of asynchronous computing without blocking Scala's Futures?"
This assumption is only true if you modify the internal state of the actor inside the Future, which seems to be a design smell in the first place. Use the future for computation only, working on a copy of the data, and pipe the result of the computation to the actor using pipeTo. Once the actor receives the result of the computation you can safely operate on it:
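import akka.pattern.pipe
import scala.concurrent.Future

case class ComputationResult(s: String)

// Inside the actor: Future and pipeTo need an implicit ExecutionContext
import context.dispatcher

def receive = {
  case ComputationResult(s) => // modify internal state here
  case msg =>
    Future {
      // Compute here, don't modify state
      ComputationResult("finished computing")
    }.pipeTo(self)
}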
I think you need to "resolve" the db query first and then use the result to return a new Future. If the db query returns a Future[A], then you can use flatMap to operate over A and return a new Future. Something along the lines of:
def receive = {
  case msg =>
    val futureResult: Future[Result] = ...
    futureResult.flatMap { result: Result =>
      // ....
      // return a new Future
    }
}
The simplest solution here is to turn the actor into a state machine (use Akka FSM) and do the following:
Dispatch a future for the MongoDB request.
Use the reference to your own actor to communicate with it.
Tell the result back to the actor as a message from within the future.
Depending on context, you might have to do some more to get a proper response.
This has the advantage that you process the message with the actor state, and you can mutate the actor state as you please because you own the thread.

Should methods and members of Actor be defined as private

What is the best practice when defining an actor?
Actor state: is it better to define a "var" with a collection, as in the code below, or a "val" with a mutable collection? Should we define it as private?
Should we define the methods of an Actor as private?
class FooActor(out: ActorRef) extends Actor {
  private var words: List[String] = Nil
  override def receive: Receive = ???
  def foo() = ???
}
On the first point, generally, I would go with neither. Instead, set the receive method to a method taking the collection as a parameter, and update the actor's state when the collection changes using context.become(...). E.g.:
class FooActor(out: ActorRef) extends Actor {
  override def receive: Receive = active(Nil)
  def active(words: List[String]): Receive = {
    case word_to_add: String => context.become(active(word_to_add :: words))
    // case ...
  }
  private def foo() = ???
}
On the second point, any helper methods are probably only for the actor's own use, so make them private.
To the first point it really depends on how large the collection of items is going to be that you're mutating. Are you going to be adding 100k items to a Map over the course of 100k messages? If this is the case perhaps you should be using a mutable collection so as to avoid the overhead of copying the entire collection to add each item. Make a smart decision based on the use case.
Here's a reference to the performance of mutable vs. immutable collections: http://www.scala-lang.org/docu/files/collections-api/collections.html
To the second point the visibility of the methods doesn't matter in terms of the interface with the Actor. The only way that you should be interacting with an Actor is through asking and telling messages so the visibility of any member methods is of little consequence outside of inferring purpose to the reader.

How to run futures on the current actor's dispatcher in Akka

Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of threads handling actor messages then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor messages dispatcher, but what if it were? i.e.
class MyActorWithSafeFutures extends Actor {
  // dispatcherOnMessageThread is hypothetical; it does not exist in Akka
  implicit def safeDispatcher = context.dispatcherOnMessageThread
  var successCount = 0
  var failureCount = 0
  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount += 1
        case Failure(_) => failureCount += 1
      }
      response pipeTo sender()
  }
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures extends Actor {
  implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
    override def execute(runnable: Runnable): Unit = {
      self ! runnable
    }
    override def reportFailure(cause: Throwable): Unit = {
      throw new Error("Unhandled throwable", cause)
    }
  }
  override def receive: Receive = {
    case runnable: Runnable => runnable.run()
    // ... other cases here
  }
}
Would that work? Why doesn't Akka offer that - is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors - it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor - if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount += 1, the value can end up as 1 rather than 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems).
You could use traditional multithreading techniques - mutexes and the like - to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely - some kind of... Actor?
TL;DR: the Future and the Actor either 1) may not run concurrently, in which case you may deadlock; 2) may run concurrently, in which case you will corrupt data; or 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class which would create a thread pool with exactly one thread for each instance of the given class, and use context.dispatcher as execution context for your Future.
To do this you have to put something like this in your application.conf:
akka {
  ...
}
my-pinned-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
  Props(
    classOf[MyActorWithSafeFutures]
  ).withDispatcher("my-pinned-dispatcher"),
  "myActorWithSafeFutures"
)
That said, what you are trying to achieve completely defeats the purpose of the actor model. The actor state should be encapsulated, and actor state changes should be driven by incoming messages.
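The one-thread idea can be illustrated outside Akka. The sketch below uses only scala.concurrent and a JDK single-thread executor; it is not PinnedDispatcher itself, and the names `SingleThreadCounter` and `run` are made up. Because every Future body runs on the same thread, the unsynchronized counter is never updated concurrently.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SingleThreadCounter {
  def run(n: Int): Int = {
    val executor = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)
    var successCount = 0 // only ever touched on the single executor thread
    try {
      // Each increment is scheduled as a Future, but all of them run one after another
      val updates = (1 to n).map(_ => Future { successCount += 1 })
      Await.result(Future.sequence(updates), 10.seconds)
      // Read the counter on the executor thread as well
      Await.result(Future(successCount), 10.seconds)
    } finally executor.shutdown()
  }
}
```

The same caveat as above applies: whatever owns that thread and the futures scheduled on it are coupled, so blocking inside one of them stalls the other.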
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
class MyActorWithSafeFutures extends Actor {
  import context.dispatcher // implicit ExecutionContext for the Agents and Future callbacks
  val successCount = Agent(0)
  val failureCount = Agent(0)
  def doSomethingWithPossiblyStaleCounts() = {
    val (s, f) = (successCount.get(), failureCount.get())
    statisticsCollector ! Ratio(f / (s + f)) // assumed intent: failures over total
  }
  def doSomethingWithCurrentCounts() = {
    val (successF, failureF) = (successCount.future(), failureCount.future())
    val ratio: Future[Ratio] = for {
      s <- successF
      f <- failureF
    } yield Ratio(f / (s + f)) // assumed intent: failures over total
    ratio pipeTo statisticsCollector
  }
  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount.send(_ + 1)
        case Failure(_) => failureCount.send(_ + 1)
      }
      response pipeTo sender()
  }
}
The catch is that if you want to operate on the counts that would result if you were using @volatile, then you need to operate inside a Future; see doSomethingWithCurrentCounts().
If you are fine with having values which are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSomethingWithPossiblyStaleCounts() is fine.
@rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.

Appropriate to use Futures and Promises for delayed initialization?

Is it appropriate to use Futures and Promises for delayed initialization, rather than using an Option var or some mutable variable?
You could create a factory class that encapsulates the promise:
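import scala.concurrent.{Future, Promise}

class IntFactory {
  val intPromise = Promise[Int]()
  def create(): Future[Int] = intPromise.future
  // success throws IllegalStateException if the promise is already completed
  def init(data: String): Unit = intPromise success data.length
}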
An actor or some other class could then use it like this:
class MyActor(factory: IntFactory) extends Actor {
  val future_int = factory.create()
  def receive = {
    case (msg: String) => factory.init(msg) // Now the promise is fulfilled
  }
}
Is there anything wrong with doing something like this? It may not have been ideal to use an actor as an example, as I think there are better alternatives for actors (become or FSM). I am currently considering using this with a non-actor class. Some of the instance variables are nothing until certain events occur. I was considering doing this instead of using a var Option and setting it to None. If this is bad, what are some other alternatives?
EDIT:
I thought of situations where this might be more useful. If I had multiple things that needed to be initialized, and I had some async action that I wanted to perform when it was all done:
class MyActor(factory1: IntFactory, factory2: IntFactory) extends Actor {
  import context.dispatcher // implicit ExecutionContext for the Future callbacks
  val future_int1 = factory1.create()
  val future_int2 = factory2.create()
  for {
    x <- future_int1
    y <- future_int2
  } {
    // Do some stuff when both are complete
  }
  def receive = {
    case "first" => factory1.init("first")
    case "second" => factory2.init("second")
  }
}
Then I would not have to check which ones are None every time I get another piece.
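A self-contained sketch of the same idea without actors, assuming hypothetical names (`LengthHolder`, `DelayedInitDemo`) and the global ExecutionContext. It uses `trySuccess` instead of `success`, so a second initialization attempt is ignored rather than throwing.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Holds a value that becomes available once init is called
class LengthHolder {
  private val promise = Promise[Int]()
  val value: Future[Int] = promise.future
  // Returns false (and changes nothing) if the value was already initialised
  def init(data: String): Boolean = promise.trySuccess(data.length)
}

object DelayedInitDemo {
  def run(): Int = {
    val a = new LengthHolder
    val b = new LengthHolder
    // Completes only after both holders have been initialised
    val sum = for { x <- a.value; y <- b.value } yield x + y
    a.init("first")
    b.init("second")
    b.init("ignored") // no effect: b already holds the length of "second"
    Await.result(sum, 5.seconds)
  }
}
```

Here the combined future yields 5 + 6 = 11, and no code has to check which pieces are still missing.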
MORE EDITS:
Some additional information that I failed to specify in my original question:
The data needed to initialize the objects will come in asynchronously.
The data passed to the init function is required for initialization. I edited my example code so that this is now the case.
I am not using Akka. I thought Akka would be helpful for throwing together a quick example and thought that experienced Akka people could provide useful feedback.
Yes, this is certainly a better approach than using mutable variables (whether Option or not). Using lazy val, as @PatrykĆwiek suggested, is even better if you can initialize the state at any time instead of waiting for external events and don't need to do it asynchronously.
Judging from your IntFactory, you don't really need the data string (it's not used anywhere), so I think the base case could be rewritten like this:
class Foo {
  lazy val first = {
    Thread.sleep(2000) // Some computation, initialization etc.
    25
  }
  lazy val second = {
    Thread.sleep(1000) // Some computation, initialization etc.
    11
  }
  def receive(s: String) = s match {
    case "first" => first
    case "second" => second
    case _ => -1
  }
}
Now, let's say you do this:
val foo = new Foo()
println(foo.receive("first")) // waiting for 2 seconds, initializing
println(foo.receive("first")) // returns immediately
println(foo.receive("second")) // waiting for 1 second, initializing
Now both first and second can be initialized at most once.
You can't pass parameters to lazy vals, so if the data string is somehow important to the initialization, then you'd probably be better off using a factory method with memoization (IMO).
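One possible shape for such a memoizing factory, as a sketch only: `MemoizedIntFactory` and its `computations` counter are hypothetical, and `data.length` stands in for the real initialization. With concurrent callers the initializer may still run more than once, although only one result is kept in the cache.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap

// Hypothetical memoizing factory: the result for each input string is cached
object MemoizedIntFactory {
  private val cache = TrieMap.empty[String, Int]
  val computations = new AtomicInteger(0) // demonstration only: counts initializer runs

  def create(data: String): Int =
    cache.getOrElseUpdate(data, {
      computations.incrementAndGet()
      data.length // stand-in for the real initialization
    })
}
```

A repeated call with the same string returns the cached value without running the initializer again.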