Should my Scala actors' properties be marked @volatile?

In Scala, if I have a simple class as follows:
val calc = actor {
  var sum = 0
  loop {
    react {
      case Add(n) =>
        sum += n
      case RequestSum =>
        sender ! sum
    }
  }
}
Should my field sum be marked @volatile? Whilst the actor is logically single-threaded (i.e. the messages are processed sequentially), the individual reactions may happen on separate threads, and hence the state variable may be altered on one thread and then read from another.

You don't need to mark them as volatile. The execution of your code isn't inside a synchronized block, but the actor will always pass through one before your code is invoked, thus forcing memory into a consistent state across threads.
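For intuition, here is a minimal hand-rolled sketch (not the actual scala.actors scheduler) of why that is enough: every message is enqueued and dequeued inside a synchronized block, and that pair of lock operations establishes a happens-before edge between the thread that handled the previous message and the thread that handles the next one, so writes to sum are visible without @volatile.
// Illustrative sketch only; the real scheduler is more involved, but the
// memory-visibility argument is the same: put and take both synchronize on
// the mailbox, so writes made while handling message N happen-before reads
// made while handling message N+1, even on another thread.
class Mailbox[A] {
  private val queue = scala.collection.mutable.Queue.empty[A]

  def put(msg: A): Unit = synchronized {
    queue.enqueue(msg)
    notifyAll() // wake a worker thread waiting in take()
  }

  def take(): A = synchronized {
    while (queue.isEmpty) wait()
    queue.dequeue()
  }
}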

Related

Atomic compareAndSet parameters are evaluated even when they're not used

I have the following code that sets an Atomic variable (both java.util.concurrent.atomic and monix.execution.atomic behave the same way):
class Foo {
  val s = AtomicAny(null: String)

  def foo() = {
    println("called")
    /* Side Effects */
    "foo"
  }

  def get(): String = {
    s.compareAndSet(null, foo())
    s.get
  }
}

val f = new Foo
f.get // Foo.s set from null to foo, print called
f.get // Foo.s not updated, but still print called
The second time compareAndSet is called it does not update the value, but foo is still called. This is causing a problem because foo has side effects (in my real code it creates an Akka actor, which fails because it tries to create a duplicate actor).
How can I make sure the second parameter is not evaluated unless it is actually used? (Preferably not using synchronized)
I need to pass an implicit parameter to foo, so a lazy val would not work. E.g.
lazy val s = get() // Error: cannot provide implicit parameter

def foo()(implicit context: Context) = {
  println("called")
  /* Side Effects */
  "foo"
}

def get()(implicit context: Context): String = {
  s.compareAndSet(null, foo())
  s.get
}
Updated answer
The quick answer is to put this code inside an actor and then you don't have to worry about synchronisation.
If you are using Akka Actors you should never need to do your own thread synchronisation using low-level primitives. The whole point of the actor model is to limit the interaction between threads to just passing asynchronous messages. This provides all the thread synchronisation that you need and guarantees that an actor processes a single message at a time in a single-threaded manner.
You should definitely not have a function that is accessed simultaneously by multiple threads that creates a singleton actor. Just create the actor when you have the information you need and pass the ActorRef to any other actors that need it using dependency injection or a message. Or create the actor at the start and initialise it when the first message arrives (using context.become to manage the actor state).
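A minimal sketch of that second suggestion follows; Configure and Work are made-up message types for illustration. The actor starts in an uninitialised behaviour and switches with context.become once the first message carrying the needed information arrives, so the side-effecting setup runs exactly once and only on the actor's own thread.
import akka.actor.Actor

// Hypothetical messages, not from the original question.
case class Configure(settings: String)
case class Work(payload: String)

class LazyInitActor extends Actor {
  def receive: Receive = uninitialised

  def uninitialised: Receive = {
    case Configure(settings) =>
      // one-off, side-effecting setup goes here (e.g. creating a child actor)
      context.become(initialised(settings))
    case _: Work =>
      // arrived too early; drop it, or stash it if ordering matters
  }

  def initialised(settings: String): Receive = {
    case Work(payload) =>
      // normal processing, safe to use the state built during initialisation
  }
}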
Original answer
The simplest solution is just to use a lazy val to hold your instance of foo:
class Foo {
  lazy val foo = {
    println("called")
    /* Side Effects */
    "foo"
  }
}
This will create foo the first time it is used and after that will just return the same value.
If this is not possible for some reason, use an AtomicInteger initialised to 0 and then call incrementAndGet. If this returns 1 then it is the first pass through this code and you can call foo.
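A sketch of that AtomicInteger guard is below (field names are illustrative): incrementAndGet returns 1 for exactly one caller, so foo() is evaluated at most once. Note that other callers racing past the guard may briefly see a null result, so they would need to wait or retry if they require the value immediately.
import java.util.concurrent.atomic.AtomicInteger

class Foo {
  private val entries = new AtomicInteger(0)
  @volatile private var cached: String = null

  def foo(): String = {
    println("called")
    /* Side Effects */
    "foo"
  }

  def get(): String = {
    if (entries.incrementAndGet() == 1) {
      cached = foo() // only the very first caller ever evaluates foo()
    }
    cached // may still be null if another caller wins the race before foo() finishes
  }
}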
Explanation:
Atomic operations such as compareAndSet require support from the CPU instruction set, and modern processors have single atomic instructions for such operations. In some cases (e.g. when the cache line is held exclusively by this processor) the operation can be very fast. In other cases (e.g. when the cache line is also in the cache of another processor) the operation can be significantly slower and can impact other threads.
The result is that the CPU must be holding the new value before the atomic instruction is executed. So the value must be computed before it is known whether it is needed or not.

Why can't Actors complete all work although I created 10000 of them?

I create 10000 actors and send a message to each, but it seems that the Akka system can't complete all the work.
When I check the thread states, they are all in TIMED_WAITING.
My code:
class Reciver extends Actor {
  val log = Logging(context.system, this)
  var num = 0

  def receive = {
    case a: Int =>
      log.info(s"[${self.path}] receive $a, num is $num")
      Thread.sleep(2000)
      log.info(s"[${self.path}] processing $a, num is $num")
      num = a
  }
}

object ActorSyncOrAsync extends App {
  val system = ActorSystem("mysys")

  for (i <- 0 to 10000) {
    val actor = system.actorOf(Props[Reciver])
    actor ! i
  }

  println("main thread send request complete")
}
You should remove Thread.sleep or (if you're using the default thread pool) surround it with:
scala.concurrent.blocking {
  Thread.sleep(2000)
}
scala.concurrent.blocking marks the computation as managed blocking, which tells the pool that the computation is not using CPU resources but is just waiting for some result or timeout. You should be careful with this, however. Basically, this advice applies if you're using Thread.sleep for debugging purposes or just to emulate some activity - no Thread.sleep (even surrounded by blocking) should take place in production code.
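In production code, blocking is meant to wrap a call that genuinely parks the thread. A sketch of that use is below; fetchSynchronously is a made-up stand-in for a synchronous database or HTTP client.
import scala.concurrent.{Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-in for a call that parks the thread (JDBC query, latch await, ...).
def fetchSynchronously(id: Long): String = s"user-$id"

def loadUser(id: Long): Future[String] = Future {
  blocking {
    // tells the fork-join pool this thread is waiting, not computing,
    // so it can spawn a replacement instead of starving other tasks
    fetchSynchronously(id)
  }
}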
Explanation:
When a fixed pool is used (including fork-join, as it doesn't steal work from threads blocked by Thread.sleep), only POOL_SIZE threads (equal to the number of cores in your system by default) are used for computation. Everything else is going to be queued.
So, let's say 4 cores, 2 seconds per task, 10000 tasks - it's going to take 2*10000/4 = 5000 seconds.
The general advice is not to block (including Thread.sleep) inside your actors: blocking needs careful management. If you need to delay some action it's better to use the Scheduler (as @Lukasz mentioned): http://doc.akka.io/docs/akka/2.4.4/scala/scheduler.html
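A sketch of that Scheduler-based approach is below (Delayed is a made-up follow-up message): the actor schedules a message to itself instead of sleeping, so the thread stays free for other actors.
import akka.actor.Actor
import scala.concurrent.duration._

case object Delayed // hypothetical follow-up message for the sketch

class DelayedReceiver extends Actor {
  import context.dispatcher // ExecutionContext required by scheduleOnce

  def receive = {
    case a: Int =>
      // instead of Thread.sleep(2000): ask the scheduler to deliver a
      // message in 2 seconds and return immediately
      context.system.scheduler.scheduleOnce(2.seconds, self, Delayed)
    case Delayed =>
      // the work that previously ran after the sleep goes here
  }
}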

How to run futures on the current actor's dispatcher in Akka

Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of threads handling actor messages then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor messages dispatcher, but what if it were? i.e.
class MyActorWithSafeFutures extends Actor {
  implicit def safeDispatcher = context.dispatcherOnMessageThread

  var successCount = 0
  var failureCount = 0

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount += 1
        case Failure(_) => failureCount += 1
      }
      response pipeTo sender()
  }
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures extends Actor {
  implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
    override def execute(runnable: Runnable): Unit = {
      self ! runnable
    }
    override def reportFailure(cause: Throwable): Unit = {
      throw new Error("Unhandled throwable", cause)
    }
  }

  override def receive: Receive = {
    case runnable: Runnable => runnable.run()
    // ... other cases here
  }
}
Would that work? Why doesn't Akka offer that - is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors - it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor - if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount += 1, the value can end up as 1 rather than 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems.)
You could use traditional multithreading techniques - mutexes and the like - to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely - some kind of... Actor?
TL;DR: the Future and the Actor either: 1) may not run concurrently, in which case you may deadlock; 2) may run concurrently, in which case you will corrupt data; or 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class, which would create a thread pool with exactly one thread for each instance of the given class, and then use context.dispatcher as the execution context for your Future.
To do this you have to put something like this in your application.conf:
akka {
  ...
}

my-pinned-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
  Props(
    classOf[MyActorWithSafeFutures]
  ).withDispatcher("my-pinned-dispatcher"),
  "myActorWithSafeFutures"
)
That said, what you are trying to achieve completely defeats the purpose of the actor model. Actor state should be encapsulated, and actor state changes should be driven by incoming messages.
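For contrast, here is a sketch of the message-driven version the question alludes to (self ! IncrementSuccess): the Future callback only sends messages, and all mutation of the counters happens inside receive. MakeExternalRequest, Response and someClient are carried over from the question and are not defined here.
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future
import scala.util.{Failure, Success}

case object IncrementSuccess
case object IncrementFailure

class CountingActor extends Actor {
  import context.dispatcher

  var successCount = 0
  var failureCount = 0

  def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => self ! IncrementSuccess // the callback only sends messages
        case Failure(_) => self ! IncrementFailure
      }
      response pipeTo sender()
    case IncrementSuccess => successCount += 1 // mutated only on the actor's own thread
    case IncrementFailure => failureCount += 1
  }
}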
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
class MyActorWithSafeFutures extends Actor {
  var successCount = Agent(0)
  var failureCount = Agent(0)

  def doSomethingWithPossiblyStaleCounts() = {
    val (s, f) = (successCount.get(), failureCount.get())
    statisticsCollector ! Ratio(f / (s + f))
  }

  def doSomethingWithCurrentCounts() = {
    val (successF, failureF) = (successCount.future(), failureCount.future())
    val ratio: Future[Ratio] = for {
      s <- successF
      f <- failureF
    } yield Ratio(f / (s + f))
    ratio pipeTo statisticsCollector
  }

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount.send(_ + 1)
        case Failure(_) => failureCount.send(_ + 1)
      }
      response pipeTo sender()
  }
}
The catch is that if you want to operate on the counts exactly as they would be if you were using @volatile fields, then you need to operate inside a Future; see doSomethingWithCurrentCounts().
If you are fine with values that are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSomethingWithPossiblyStaleCounts() is fine.
@rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.

Akka: how to launch a master task and block on it finishing?

I'm trying to get started with akka in scala. In the main scala thread I'd like to start an akka actor, send one message to it, and block until that actor terminates. What is the best way to do this?
For example I have a test actor that just repeatedly sends messages to itself:
class Incrementer() extends Actor {
  val maxMessages = 5
  var counter = 0

  def receive() = {
    case DoIncr() => {
      if (counter < maxMessages) {
        counter += 1
        self ! DoIncr()
      } else {
        self.stop()
      }
    }
  }
}
and it is invoked via:
val inc = actorOf(new Incrementer()).start()
val result = inc !! DoIncr()
println(result) // this should block this thread, but it doesn't seem to.
// do other stuff
That block takes just over 5,000 ms to execute instead of what I expect to be a few ms, so it seems to have to do with a default future timeout - and the program does not actually terminate. All I'm really trying to do is time the performance of sending x number of messages. What's going on here?
As Viktor mentioned, in order for !! to terminate successfully, you have to reply to the message. The 5 second delay you are seeing is the actor's default timeout, which is configurable. More info can be found on the Akka site.
If you use forward to send the message instead of !, then self.reply will respond to the original sender.
The first message you send to an Akka actor performs some setup that doesn't happen when processing other messages. Be sure to take that into account for your timings.
Corrected code would be:
import akka.actor._

object DoIncr

class Incrementer extends Actor {
  val maxMessages = 5
  var counter = 0

  def receive = {
    case DoIncr =>
      if (counter < maxMessages) {
        counter += 1
        self forward DoIncr
      } else {
        self.reply(()) // replying with () since we have nothing better to say
        self.stop()
      }
  }
}
Aside: I made a few other changes to get your code in line with idiomatic Scala. Your code works without these changes, but it now looks like more typical Scala code.
Case classes without parameter lists have been deprecated. Use objects instead.
If you have a class without a parameter list, you can omit the parentheses.
Actor's receive method does not have parens; your implementing class shouldn't have them either.
It's purely a matter of style, but the body of a case statement does not require braces.

Processing concurrently in Scala

As in my own answer to my own question, I have a situation where I am processing a large number of events which arrive on a queue. Each event is handled in exactly the same manner and each event can be handled independently of all other events.
My program takes advantage of the Scala concurrency framework and many of the processes involved are modelled as Actors. As Actors process their messages sequentially, they are not well-suited to this particular problem (even though my other actors are performing actions which are sequential). As I want Scala to "control" all thread creation (which I assume is the point of it having a concurrency system in the first place) it seems I have 2 choices:
1. Send the events to a pool of event processors, which I control
2. Get my Actor to process them concurrently by some other mechanism
I would have thought that #1 negates the point of using the actor subsystem: one obvious question being, how many processor actors should I create? These things are supposedly hidden from me and solved by the subsystem.
My answer was to do the following:
val eventProcessor = actor {
  loop {
    react {
      case MyEvent(x) =>
        // I want to be able to handle multiple events at the same time:
        // create a new actor to handle it
        actor {
          // processing code here
          process(x)
        }
    }
  }
}
Is there a better approach? Is this incorrect?
edit: A possibly better approach is:
val eventProcessor = actor {
  loop {
    react {
      case MyEvent(x) =>
        // Pass processing to the underlying ForkJoin framework
        Scheduler.execute(process(x))
    }
  }
}
This seems like a duplicate of another question, so I'll duplicate my answer.
Actors process one message at a time. The classic pattern to process multiple messages is to have one coordinator actor front for a pool of consumer actors. If you use react then the consumer pool can be large but will still only use a small number of JVM threads. Here's an example where I create a pool of 10 consumers and one coordinator to front for them.
import scala.actors.Actor
import scala.actors.Actor._

case class Request(sender: Actor, payload: String)
case class Ready(sender: Actor)
case class Result(result: String)
case object Stop

def consumer(n: Int) = actor {
  loop {
    react {
      case Ready(sender) =>
        sender ! Ready(self)
      case Request(sender, payload) =>
        println("request to consumer " + n + " with " + payload)
        // some silly computation so the process takes a while
        val result = ((payload + payload + payload) map { case '0' => 'X'; case '1' => "-"; case c => c }).mkString
        sender ! Result(result)
        println("consumer " + n + " is done processing " + result)
      case Stop => exit
    }
  }
}

// a pool of 10 consumers
val consumers = for (n <- 0 until 10) yield consumer(n)

val coordinator = actor {
  loop {
    react {
      case msg @ Request(sender, payload) =>
        consumers foreach { _ ! Ready(self) }
        react {
          // send the request to the first available consumer
          case Ready(consumer) => consumer ! msg
        }
      case Stop =>
        consumers foreach { _ ! Stop }
        exit
    }
  }
}

// a little test loop - note that it's not doing anything with the results
// or telling the coordinator to stop
for (i <- 0 to 1000) coordinator ! Request(self, i.toString)
This code tests to see which consumer is available and sends a request to that consumer. Alternatives are to just randomly assign to consumers or to use a round robin scheduler.
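For instance, a round-robin coordinator could be sketched like this, reusing consumers, Request and Stop from the code above:
// Round-robin coordinator: no Ready handshake, just cycle through the pool.
val roundRobinCoordinator = actor {
  var next = 0
  loop {
    react {
      case msg @ Request(_, _) =>
        consumers(next % consumers.size) ! msg
        next += 1
      case Stop =>
        consumers foreach { _ ! Stop }
        exit
    }
  }
}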
Depending on what you are doing, you might be better served with Scala's Futures. For instance, if you don't really need actors then all of the above machinery could be written as
import scala.actors.Futures._

def transform(payload: String) = {
  val result = ((payload + payload + payload) map { case '0' => 'X'; case '1' => "-"; case c => c }).mkString
  println("transformed " + payload + " to " + result)
  result
}

val results = for (i <- 0 to 1000) yield future(transform(i.toString))
If the events can all be handled independently, why are they on a queue? Knowing nothing else about your design, this seems like an unnecessary step. If you could compose the process function with whatever is firing those events, you could potentially obviate the queue.
An actor is essentially a concurrent effect equipped with a queue. If you want to process multiple messages simultaneously, you don't really want an actor. You just want a function (Any => Unit) to be scheduled for execution at some convenient time.
Having said that, your approach is reasonable if you want to stay within the actors library and if the event queue is not within your control.
Scalaz makes a distinction between Actors and concurrent Effects. While its Actor is very light-weight, scalaz.concurrent.Effect is lighter still. Here's your code roughly translated to the Scalaz library:
val eventProcessor = effect(x => process(x))
This is with the latest trunk head, not yet released.
This sounds like a simple consumer/producer problem. I'd use a queue with a pool of consumers. You could probably write this with a few lines of code using java.util.concurrent.
The purpose of an actor (well, one of them) is to ensure that the state within the actor can only be accessed by a single thread at a time. If the processing of a message doesn't depend on any mutable state within the actor, then it would probably be more appropriate to just submit a task to a scheduler or a thread pool to process. The extra abstraction that the actor provides is actually getting in your way.
There are convenient methods in scala.actors.Scheduler for this, or you could use an Executor from java.util.concurrent.
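A sketch of the plain thread-pool route is below; MyEvent and process are the types from the question, not defined here.
import java.util.concurrent.Executors

// A fixed pool sized to the machine; each event is handed straight to it.
val pool = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())

def handle(event: MyEvent): Unit = event match {
  case MyEvent(x) =>
    pool.submit(new Runnable {
      def run(): Unit = process(x) // the same handler used in the question
    })
}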
Actors are much more lightweight than threads, so another option is to use actors the way you are used to using Runnable objects submitted to a thread pool. The main difference is that you do not need to worry about the thread pool itself - it is managed for you by the actor framework and is mostly a configuration concern.
def submit(e: MyEvent) = actor {
  // no loop - the actor exits immediately after processing the first message
  react {
    case MyEvent(x) =>
      process(x)
  }
} ! e // immediately send the new actor a message
Then to submit a message, say this:
submit(new MyEvent(x))
which corresponds to
eventProcessor ! new MyEvent(x)
from your question.
Tested this pattern successfully with 1 million messages sent and received in about 10 seconds on a quad-core i7 laptop.
Hope this helps.