More parallel actors in scala - scala

(sort of followup to How to make a code thread safe in scala? )
I have a scala class that can inherently be called only from one thread (let's call it class ThreadUnsafeProducer); it is, however, safe to have more threads to each access exactly one object. However, the ThreadUnsafeProducer is quite memory heavy, so I don't want each thread to have one ThreadUnsafeProducer.
I want to have a given number N of ThreadUnsafeProducer objects (ideally one for each CPU).
I have lots of threads Consumer that all share the same object SharedObject.
I want to somehow use Actors model to give messages to either SharedObject or ThreadUnsafeProducer (I am not sure which) to have a given number of concurrent ThreadUnsafeProducer running. And I am quite lost in all the Akka/Actors classes.

I recently found Akka Routing classes
http://doc.akka.io/docs/akka/2.0/scala/routing.html
It looks really nice and exactly what I need. If it works it would be beautiful.

Related

Scala objects and thread safety

I am new to Scala.
I am trying to figure out how to ensure thread safety with functions in a Scala object (aka singleton)
From what I have read so far, it seems that I should keep visibility to function scope (or below) and use immutable variables wherever possible. However, I have not seen examples of where thread safety is violated, so I am not sure what other precautions should be taken.
Can someone point me to a good discussion of this issue, preferably with examples of where thread safety is violated?
Oh man. This is a huge topic. Here's a Scala-based intro to concurrency and Oracle's Java lessons actually have a pretty good intro as well. Here's a brief intro that motivates why concurrent reading and writing of shared state (of which Scala objects are particular specific case) is a problem and provides a quick overview of common solutions.
There's two (fundamentally related) classes of problems when it comes to thread safety and state mutation:
Clobbering (missing) writes
Inaccurate (changing out from under you) reads
Let's look at each of these in turn.
First clobbering writes:
object WritesExample {
var myList: List[Int] = List.empty
}
Imagine we had two threads concurrently accessing WritesExample, each of executes the following updateList
def updateList(x: WritesExample.type): Unit =
WritesExample.myList = 1 :: WritesExample.myList
You'd probably hope when both threads are done that WritesExample.myList has a length of 2. Unfortunately, that might not be the case if both threads read WritesExample.myList before the other thread has finished a write. If when both threads read WritesExample.myList it is empty, then both will write back a list of length 1, with one write overwriting the other, so that in the end WritesExample.myList only has a length of one. Hence we've effectively lost a write we were supposed to execute. Not good.
Now let's look at inaccurate reads.
object ReadsExample {
val myMutableList: collection.mutable.MutableList[Int]
}
Once again, let's say we had two threads concurrently accessing ReadsExample. This time each of them executes updateList2 repeatedly.
def updateList2(x: ReadsExample.type): Unit =
ReadsExample.myMutableList += ReadsExample.myMutableList.length
In a single-threaded context, you would expect updateList2, when repeatedly called, to simply generate an ordered list of incrementing numbers, e.g. 0, 1, 2, 3, 4,.... Unfortunately, when multiple threads are accessing ReadsExample.myMutableList with updateList2 at the same time, it's possible that between when ReadsExample.myMutableList.length is read and when the write is finally persisted, ReadsExample.myMutableList has already been modified by another thread. So in theory you could see something like 0, 0, 1, 1 or potentially if one thread takes longer to write than another 0, 1, 2, 1 (where the slower thread finally writes to the list after the other thread has already accessed and written to the list three times).
What happened is that the read was inaccurate/out-of-date; the actual data structure that was updated was different from the one that was read, i.e. was changed out from under you in the middle of things. This is also a huge source of bugs because many invariants you might expect to hold (e.g. every number in the list corresponds exactly to its index or every number appears only once) hold in a single-threaded context, but fail in a concurrent context.
Now that we've motivated some of the problems, let's dive into some of the solutions. You mentioned immutability so let's talk about that first. You might notice that in my example of clobbering writes I use an immutable data structure whereas in my inconsistent reads example I use a mutable data structure. That is intentional. They are in a sense dual to one another.
With immutable data structures you cannot have an "inaccurate" read in the sense I laid out above because you never mutate data structures, but rather place a new copy of a data structure in the same location. The data structure cannot change out from under you because it cannot change! However you can lose a write in the process by placing a version of a data structure back to its original location that does not incorporate a change made previously by another process.
With mutable data structures on the other hand, you cannot lose a write because all writes are in-place mutations of the data structure, but you can end up executing a write to a data structure whose state differs from when you analyzed it to formulate the write.
If it's a "pick your poison" kind of scenario, why do you often hear advice to go with immutable data structures to help with concurrency? Well immutable data structures make it easier to ensure invariants about the state being modified hold even if writes are lost. For example, if I rewrote the ReadsList example to use an immutable List (and a var instead), then I could confidently say that the integer elements of the list will always correspond to the indices of the list. This means that your program is much less likely to enter an inconsistent state (e.g. it's not hard to imagine that a naive mutable set implementation could end up with non-unique elements when mutated concurrently). And it turns out that modern techniques for dealing with concurrency usually are pretty good at dealing with missing writes.
Let's look at some of those approaches that deal with shared state concurrency. At their hearts they can all be summed up as various ways of serializing read/write pairs.
Locks (a.k.a. directly try to serialize read/write pairs): This is usually the one you'll hear first as a fundamental way of dealing with concurrency. Every process that wants to access state first places a lock on it. Any other process is now excluded from accessing that state. The process then writes to that state and on completion releases the lock. Other processes are now free to repeat the process. In our WritesExample, updateList would first acquire the lock before executing and releasing the lock; this would prevent other processes from reading WritesExample.myList until the write was completed, thereby preventing them from seeing old versions of myList that would lead to clobbering writes (note that are more sophisticated locking procedures that allow for simultaneous reads, but let's stick with the basics for now).
Locks often do not scale well to multiple pieces of state. With multiple locks, often you need to acquire and release locks in a certain order otherwise you can end up deadlocking or livelocking.
The Oracle and Twitter docs linked a the beginning have good overviews of this approach.
Describe Your Action, Don't Execute It (a.k.a. build up a serial representation of your actions and have someone else process it): Instead of accessing and modifying state directly, you describe an action of how to do this and then give it to someone else to actually execute the action. For example, you might pass messages to an object (e.g. actors in Scala) that queues up these requests and then executes them one-by-one on some internal state that it never directly exposes to anyone else. In the particular case of actors, this improves the situation over locks by removing the need to explicitly acquire and release locks. As long as you encapsulate all the state you need to access at once in a single object, message passing works great. Actors break down when you distribute state across multiple objects (and as such this is heavily discouraged in this paradigm).
Akka actors are one good example of this in Scala.
Transactions (a.k.a. temporarily isolate some reads and writes from others and let the isolation system serialize things for you): Wrap all your read/writes in transactions that ensure during the course of your reads and writes your view of the world is isolated from any other changes. There's usually two ways of achieving this. Either you go for an approach similar to locks where you prevent other people from accessing the data while a transaction is running or you restart a transaction from the very beginning whenever you detect that a change has occurred to the shared state and throw away any progress you've made (usually the latter for performance reasons). On the one hand, transactions, unlike locks and actors, scale to disparate pieces of state very well. Just wrap all your accesses in transactions and you're good to go. On the other hand, your reads and writes have to be side-effect-free because they might be thrown away and retried many times and you can't really undo most side effects.
And if you're really unlucky, although you usually can't truly deadlock with a good implementation of transactions, a long-lived transaction can constantly be interrupted by other short-lived transactions such that it keeps getting thrown away and retried and never actually succeeds (which amounts to something like livelocking). In effect you're giving up direct control of serialization order and hoping your transaction system orders things sensibly.
Scala's STM library is a good example of this approach.
Remove Shared State: The final "solution" is to rethink the problem altogether and try to think about whether you truly need global, shared state that is writable. If you don't need writable shared state, then concurrency problems go away altogether!
Everything in life is about trade-offs and concurrency is no exception. When thinking about concurrency first understand what state you have and what invariants you want to preserve about that state. Then use that to guide your decision as to what kind of tools you want to use to tackle the problem.
The Thread Safety Problem section within this Scala concurrency article might be of interest to you. In essence, it illustrates the thread safety problem using a simple example and outlines 3 different approaches to tackle the problem, namely synchronization, volatile and AtomicReference:
When you enter synchronized points, access volatile references, or
deference AtomicReferences, Java forces the processor to flush their
cache lines and provide a consistent view of data.
There is also a brief overview comparing the cost of the 3 approaches:
AtomicReference is the most costly of these two choices since you
have to go through method dispatch to access values. volatile and
synchronized are built on top of Java’s built-in monitors. Monitors
cost very little if there’s no contention. Since synchronized allows
you more fine-grained control over when you synchronize, there will be
less contention so synchronized tends to be the cheapest option.
This is not specific to Scala, if your object contains a state that can be modified concurrently thread safety can be violated depending on the implementation. For example:
object BankAccount {
private var balance: Long = 0L
def deposit(amount: Long): Unit = balance += amount
}
In this case the the object is not thread safe, there are a lot of approachs to make it thread safe, for example using Akka, or synchronized blocks. For simplicity I will write it using synchronized blocks
object BankAccount {
private var balance: Long = 0L
def deposit(amount: Long): Unit =
this.synchronized {
balance += amount
}
}

Akka and singleton actors

I've recently started messing around with akka's actors and http modules. However I've stumbled upon a rather annoying little quirk, namely, creating singelton actors.
Here are two examples:
1)
I have an in-memory cache, my service is quite small (its an app rather) so I really like this in memory model. I can hold most information relevant to the user in a Map (well, a map of lists, but still, quite an easy to reason about structure) and I don't get the overhead and complexity of a redis, geode or aerospike.
The only problem is that this in-memory chache can be modified, by multiple sources and said modifications must be synchronous. Instead of synchornizing all 3 acess methods for this structure (e.g. by building a message queue or implementing locks) I thought I'd just wrap the structure and its access methods into an actor, build in message queue, easy receive->send logic and if things scale up it will be very easy to replace with a DA actors over a dedicated in memory db.
2) I have a "Service" layer that should be used to dispatch actors for various jobs (access the database, access the in-memory cache, do this computation with data and deliver the result to the user... etc).
It makes sense of this Service layer to be a "singleton" of sorts, a closure over some functions, since it does nothing that's blocking or cpu/memory intensive in any way, it simply assigns tasks further down the line (e.g. decides how many actors/thread/w.e should be created and where a request should go)
However, this thing would require either:
a) Making both object singleton actors or
b) Making both objects actual "objects"(as in the scala object notation that designates a single named singleton with functions that have closures over its scope)
There are plenty of problems with b), namely that the service layer will either have to get an actors system "passed" to it (and I'm not sure that's a best practice) in order o create actors, rather than creating its own "childrens" it will create children's using the global actors system and the messaging and monitoring logic will be a lot more awkward and unintuitive. Also, that the in-memory cache will not have the advantage of the built in message que (I'm not saying its hard to implement one, but this seems like one of those situation where one goes "Oh, jolly, its good that I have actors and I don't have to spend time implementing and testing this code")
a) seems to have the problem of being generally speaking poorly documented and unadvised in the akka documentation. I mean:
http://doc.akka.io/docs/akka/2.4/scala/cluster-singleton.html
Look at this shit, half of the docs are warning against using it, it was its own dependency and quite frankly its very hard to read for a poor sod like me which hasn't set foot in the functional&concurrent programming ivory tower.
So, ahm. Could any of you guys explain to me why its bad to use singleton actors ? How do you design singletons if they can't be actors ? Is there any way to design singleton actors that won't cause a lot of damage down the line ? Is the whole "service" model of having "global" services that are called rather than instantiated "un akka like" ?
Just to clarify the documentation, they're not warning against using it. They're warning that there are circumstances in which using a singleton will cause problems, which are expected given the circumstances. They mention the following situations:
If the singleton is a performance bottleneck. This makes sense. If everything relies on a single object that does work slowly, everything will be slow.
If the actor needs to be non-stop available, you'll run into problems if the singleton ever goes down, because those messages can't just be handled by another instance. It will take some amount of time to re-start the singleton before its work can be resumed.
The biggest problem happens if you have auto-downing turned on. Auto-downing is a policy by which an unreachable node is assumed to be down, and removed from the network. If you do this, but the node is not actually down but just unreachable due to a network partition, both sides of the partition will decide that they're the surviving nodes and create their own singletons. So now you have two singletons. Which is, of course, not what you want from a singleton. But you should never use auto-downing outside of testing anyway. It's a terrible recovery strategy that was included for completeness and convenience in testing.
So I don't read that as recommending against using it. Just being clear about the expected pitfalls if you do use it, based on the nature of the structure.

How to use Scala Futures the right way?

I'm wondering if Futures are better to be used in conjunction with Actors only, rather than in a program that does not use Actor. Said differently, is performing asynchronous computation with future something that should better be done within an Actors system?
Here why i'm saying that:
1 -
You perform a computation for which the result, would trigger some action that you may do in another thread.
For instance, i have a long operation to determine the price of something, from my main thread, i decide to launch an asynchronous process for it. In the mean time i could be doing other thing, then when the response is ready/availble or communicated back to me, i go on on that path.
I can see that with actor this is handy, because you can pipe a result to an actor. But with a typical threading model, you can either block or .... ?
2 -
Another issue, let say i need to update the age of a list of participant, by getting some information online. Let assume i just have one future for that task. Isn't closing over the participant list something wrong to do. Multiple thread maybe accessing that participant list at the same time. So making the update within the future would simply be wrong and in that case, we would need java concurrent collection isn't it ?
Maybe i see it the wrong way, future are not meant to do side effect
at all
But in that case, fair enough, no side effect, but we still have the problem of getting a value back from the calling thread, which can only be blocking. I mean let's imagine that, the result, would help the calling thread, to update some data structure. How to do that update asynchronously without closing over that data structure somehow.
I believe the call backs such as OnComplete can be use for
side-effecting (Am it right here?)
still, the call back would have to close over the data structure anyway. Hence i don't see how not using Actor.
PS: I like actors, i'm just trying to understand better the usage of future without actors. I read everywhere, that one should use actor only when necessary that is when state need to be manage. It seems to me that overall, using future, without actor, always involve blocking somewhere down the line, if the result need to be communicated back at some point to the thread that initiated the asynchronous task.
Actors are good when you are dealing with mutable state because they encapsulate the mutable state. and allow only message-based interaction.
You can use Future to execute in a different thread. You don't have to block on a Future because Scala's Future compose. So if you have multiple Futures in your code, you don't have to wait/block for all of them to compete. For example, if your pipeline is completely non-block or asyn (e.g., Play and Spray) you can return a Future back to the client.
Futures are lightweight compared to actors because you don't need a complete actorsystem.
Here is a quote from Martin Odersky that I really like.
There is no silver bullet for all concurrency issues; the right
solution depends on what one needs to achieve. Do you want to define
asynchronous computations that react to events or streams of values?
Or have autonomous, isolated entities communicating via messages? Or
define transactions over a mutable store? Or, maybe the primary
purpose of parallel execution is to increase the performance? For each
of these tasks, there is an abstraction that does the job: futures,
reactive streams, actors, transactional memory, or parallel
collections.
So choose your abstraction based on your use case and needs.

Typed messages in akka

Akka framework recommends using typed actor only for interacting with external code. However, standard actors from akka are untyped. Is there any better way to create type safe actors? Are there some other actor frameworks or type safe wrappers around akka?
If you really want actors with static typing, then you might as well go ahead and use typed actors throughout your code. This is strongly discouraged for a couple of reasons.
1.) You run the risk of your system degenerating into a bunch of RPCs. An actor's receive method makes it pretty obvious that the whole thing is about message passing, much less so if you're just calling methods on a typed actor.
2.) An actor just really doesn't have a type. While it's running, the messages an actor is able to process may change depending on what state is in, as may what it does with those messages. This is an excellent way of modeling a lot of protocols, and Akka actors have first class support for it with FSMs.
So if you really want to do it, you're free to used typed actors everywhere and it'll work, but you should really think hard about the problem you're trying to solve before doing so.
For compile time checking see SynapseGrid framework. It defines a SystemBuilder that constructs the DataFlow topology. While constructing it is guaranteed that types that pass by are checked. Then the resulting system is converted to RuntimeSystem with nested and properly interconnected actors.
Why is this a problem for you? akka.actor.Actor has the receive method of type PartialFunction that will only be called for messages that it can handle. Why do you need compile time checks? But to answer your question: one way would be - for an external api - to build a wrapper around your ActorRef that then sends the messages to the actor.
Things are going quite fast, I thought about giving an update
1. Typed actors are deprecated
2. Instead a new concept of Akka Typed is being devloped at the momemnt
As I understood this should be the definitive solution to an typed actor system. But since this is at least the third try and planned earliest for Akka 2.4, this claim remains to be proven.
I personally do look forward to have both systems available: the existing one for more dynamic use cases, the new one for more robust ones

Akka for simulations

I'm new to akka and the actor-pattern, therefore I'm not sure if it fit my needs.
I want to create a simulation with akka and millions of entities (think as domain objects - later actors) that can influence each other. So thinking as simulation with a more-or-less "fuzzy" result, we have an array with entities, where each of these entities has a speed, but is thwarted by the entities in front of the actual entity. When the simulation starts, each entity should move n-fields, or, if thwarted by others, less fields. We have multiple iterations, and in the end we have a new order. This is repeated for some rounds until we want to see a "snapshot" of the leading entities (which are then possibly removed before the next round starts).
So I don't understand if I can create this with akka, because:
Is it possible to have global list with the position of each actor, so they know at which position they are and which are in front of them?
As far as I understand, this violates the encapsulation of the actors. I can put the position of the actor in the actor itself, but how can I see/notify the actors around this actor?
Beside of this, the global list will create synchronization problems and impacts the performance, which is the exactly opposite of the desired behaviour (and is complementary to akka/the actor-pattern)
What did I missed? Do I have to search for another design approach?
Thanks for suggestions.
Update: working with the eventbus and classifiers doesn't seem an option, too. Refering to the documentation:
"hence it is not well-suited to use cases in which subscriptions change with very high frequency"
The actor model is a very good fit for your scenario. Actors communicate by sending messages, so each actor can send messages to his neighbors containing his position. Of course, each actor cannot know about every other actor in the system (not efficiently anyway) so you will have to also devise a scheme though which each actor knows which are his neighbors.
As for getting a snapshot of the system, simply have a central actor that is known by everybody and knows everybody.
It seems like you're just getting started with actors. Read a bit more - the akka site is a good resource - and come back and refine your question, if needed.
Your problem sounds like an n-body simulation sort of thing, so looking into that might help also.