Accumulating and Ordering Child Actor Responses?

I'm new to Scala and Akka, and I've been reading a book on Akka. I can't find a reasonable solution for what I would think is a common use case with Actors.
Let's say you have a parent actor that receives a request for a large chunk of work (say you need to go to the network to download 100 files), so the parent actor splits up the work and routes it to 10 children, so that we are downloading more than one file at once.
Somehow I need to get all of the files back to the parent actor in order. What would be a good design pattern for doing this?
I found a link in my searching where they seem to have come up with a good way of accomplishing this, but because the blog doesn't actually show how to use the example (it just provides a code snippet), and I'm a Scala noob, I don't understand how to put it into practice:
http://www.ccri.com/2014/01/22/accumulating-responses-from-child-actors-and-transitive-message-ordering/

It can be done with futures. From your parent actor:
1. Create a collection of 100 futures, each of which fetches one file.
2. Use Future.sequence (or Future.traverse) to turn the collection of futures into a single future of a collection of files. The results keep the order of the input collection, no matter which download finishes first.
3. Use pipeTo (via import akka.pattern.pipe) to send the completed collection back to the parent actor as a message.
Each future could be the result of an ask to a pool of child actors, if fetching each file is better suited to an actor than to a plain future.
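A minimal sketch of steps 1 and 2 using only the Scala standard library. `fetchFile` is a hypothetical stand-in for a real download; it sleeps a varying amount so the futures finish out of order, yet the combined result still follows the order of the input ids. In the actor version you would not block with `Await`; you would `pipeTo` the combined future to the parent instead.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OrderedFetch {
  // Hypothetical download: the sleep makes completion order differ from request order.
  def fetchFile(id: Int): Future[String] = Future {
    Thread.sleep((100 - id) % 7 * 10L)
    s"file-$id"
  }

  // Future.traverse builds one future whose result is ordered like `ids`,
  // not like the order in which the individual downloads complete.
  def fetchAll(ids: Seq[Int]): Future[Seq[String]] =
    Future.traverse(ids)(fetchFile)

  def main(args: Array[String]): Unit = {
    // Blocking here only so the demo program can print before exiting.
    val files = Await.result(fetchAll(1 to 100), 30.seconds)
    println(files.take(3).mkString(", "))
  }
}
```

If each download should instead go through a child actor, `fetchFile` would become an `ask` to the pool; the ordering guarantee comes from `traverse`/`sequence`, not from the actors.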

Related

How to use Scala Futures the right way?

I'm wondering whether Futures are best used in conjunction with Actors only, rather than in a program that does not use Actors. Said differently, is asynchronous computation with futures something that is better done within an actor system?
Here is why I'm asking:
1 -
You perform a computation whose result triggers some action that you may want to do on another thread.
For instance, I have a long operation to determine the price of something, so from my main thread I decide to launch it asynchronously. In the meantime I can do other things, and when the response is ready or communicated back to me, I continue down that path.
I can see that this is handy with actors, because you can pipe a result to an actor. But with a typical threading model, you can either block or ...?
2 -
Another issue: let's say I need to update the age of each participant in a list by getting some information online, and let's assume I use just one future for that task. Isn't closing over the participant list the wrong thing to do? Multiple threads may be accessing that participant list at the same time, so making the update within the future would simply be wrong, and in that case we would need a Java concurrent collection, wouldn't we?
Maybe I see it the wrong way, and futures are not meant to have side effects at all.
But in that case, fair enough, no side effects, but we still have the problem of getting a value back to the calling thread, which seems possible only by blocking. Imagine that the result would help the calling thread update some data structure. How can that update be done asynchronously without closing over the data structure somehow?
I believe callbacks such as onComplete can be used for side effects (am I right here?).
Still, the callback would have to close over the data structure anyway. Hence I don't see how to avoid using actors.
PS: I like actors; I'm just trying to better understand how to use futures without actors. I read everywhere that one should use actors only when necessary, that is, when state needs to be managed. It seems to me that, overall, using futures without actors always involves blocking somewhere down the line, if the result needs to be communicated back at some point to the thread that initiated the asynchronous task.
Actors are good when you are dealing with mutable state because they encapsulate that state and allow only message-based interaction.
You can use a Future to run a computation on a different thread. You don't have to block on a Future, because Scala's Futures compose. So if you have multiple Futures in your code, you don't have to wait/block for all of them to complete. For example, if your pipeline is completely non-blocking or asynchronous (e.g., Play and Spray), you can return a Future back to the client.
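A small sketch of what composing without blocking looks like for the participant example in the question. `Participant` and `fetchAge` are hypothetical. Instead of mutating a shared list from inside the future, the future produces a new immutable list, and further steps are chained with `map`/`flatMap` (here a for-comprehension), so no thread waits until the very edge of the program.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical domain type for illustration.
final case class Participant(name: String, age: Int)

object FutureComposition {
  // Stand-in for an online lookup of a participant's age.
  def fetchAge(name: String): Future[Int] = Future(name.length * 10)

  // Returns an updated copy of the list; nothing shared is mutated.
  def refreshAges(ps: List[Participant]): Future[List[Participant]] =
    Future.traverse(ps)(p => fetchAge(p.name).map(a => p.copy(age = a)))

  // Composition: the next step is attached to the future, not awaited.
  def averageAge(ps: List[Participant]): Future[Double] =
    for {
      updated <- refreshAges(ps)
    } yield updated.map(_.age).sum.toDouble / updated.size

  def main(args: Array[String]): Unit = {
    val avg = averageAge(List(Participant("Ann", 0), Participant("Bob", 0)))
    // Block only at the program's edge so main does not exit before the result.
    println(Await.result(avg, 5.seconds))
  }
}
```

The caller that needs the updated list simply receives it as the future's value (or chains more work onto it), which sidesteps the question of concurrent access to the original list.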
Futures are lightweight compared to actors because you don't need a complete ActorSystem.
Here is a quote from Martin Odersky that I really like.
There is no silver bullet for all concurrency issues; the right solution depends on what one needs to achieve. Do you want to define asynchronous computations that react to events or streams of values? Or have autonomous, isolated entities communicating via messages? Or define transactions over a mutable store? Or, maybe the primary purpose of parallel execution is to increase the performance? For each of these tasks, there is an abstraction that does the job: futures, reactive streams, actors, transactional memory, or parallel collections.
So choose your abstraction based on your use case and needs.

Akka Actor Setup: In Main method or in 'Manager' class?

Does anyone have any advice about how the creation of a large number of akka actors should be managed?
My new middleware project currently contains about 10 actors, but over time this will inevitably grow to a high number. I'm creating all my actors in my main function, but this could potentially get out of control as the system grows, with the function spanning an entire screen.
I could of course move all the actor creation into a function in a separate class, though this doesn't really solve the problem as such.
Are there any patterns available to help manage this setup procedure?
Normally one should have only a few top-level Actors (i.e. ones that are created by using system.actorOf). This is because you get very poor fault tolerance if every Actor is just as likely to ruin things for the others. So what you should do is think about how you want failure to be contained in your application and then create actors as children of other actors using context.actorOf.
It really depends on the relationship between the actors. If they have no parent/child relationship to each other, it doesn't really matter where you start them. If they have such a relationship, you should start your actors inside their parents, because you have to use the context of the parent actor to create another actor as its child.
It is difficult to answer your question without knowing more about the nature of the actors you are creating. For example, if you can logically group your actors, I'd do something like this:
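import akka.actor.{ActorSystem, Props}

def initialize(system: ActorSystem): Unit = {
  // Initialize misc top-level actors (FooActor and BarActor are placeholders)
  val foo = system.actorOf(Props[FooActor](), "foo")
  val bar = system.actorOf(Props[BarActor](), "bar")
  // Each helper creates one logical group of actors
  initializeActorsThatDoStuff(system)
  initializeActorsThatDoOtherStuff(system)
}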
If they have a parent/child relationship, you should do as @drexin suggests and create the children inside the parent with context.actorOf.
Edit: Almost forgot: if you are creating multiple actors of the same type with different parameters, I'd of course use a loop rather than copy and paste, e.g.
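def initializeLotsOfActors(system: ActorSystem, num: Int) =
  // Actors cannot be created with plain `new`; wrap the constructor in Props
  for (i <- 0 until num) yield system.actorOf(Props(new ActorThatTakesAnInt(i)), s"worker-$i")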
(because nobody likes copy and paste :D)

More parallel actors in scala

(A follow-up to "How to make code thread-safe in Scala?")
I have a Scala class (let's call it ThreadUnsafeProducer) whose instances can only be used from one thread at a time; it is, however, safe for several threads to each use their own instance. But ThreadUnsafeProducer is quite memory-heavy, so I don't want each thread to have its own ThreadUnsafeProducer.
I want to have a given number N of ThreadUnsafeProducer objects (ideally one for each CPU).
I have many Consumer threads that all share the same SharedObject.
I want to use the actor model somehow to send messages to either SharedObject or ThreadUnsafeProducer (I am not sure which) so that a given number of ThreadUnsafeProducer instances run concurrently. And I am quite lost among all the Akka/Actors classes.
I recently found the Akka routing classes:
http://doc.akka.io/docs/akka/2.0/scala/routing.html
It looks really nice and exactly what I need. If it works it would be beautiful.
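Akka's routers (a round-robin pool of N routees, each owning one producer) are the natural fit here. As a library-free sketch of the same idea, under the assumption that N is the CPU count: each of N hypothetical `ThreadUnsafeProducer` instances is confined to its own single-threaded executor, so no instance is ever touched by two threads, and requests are spread round-robin. This is a swapped-in stdlib technique, not Akka's routing API.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for the memory-heavy, non-thread-safe class.
final class ThreadUnsafeProducer(val id: Int) {
  private var counter = 0 // safe only because one thread ever touches this instance
  def produce(input: String): String = { counter += 1; s"$input@producer$id#$counter" }
}

// N producers, each confined to its own single-threaded executor (actor-like
// isolation); requests are dispatched round-robin, like a router pool.
final class ProducerPool(n: Int) {
  private val executors = Vector.fill(n)(Executors.newSingleThreadExecutor())
  private val slots = Vector.tabulate(n)(i =>
    (new ThreadUnsafeProducer(i), ExecutionContext.fromExecutor(executors(i))))
  private val next = new AtomicInteger(0)

  def produce(input: String): Future[String] = {
    val (producer, ec) = slots(Math.floorMod(next.getAndIncrement(), n))
    Future(producer.produce(input))(ec) // runs on that producer's own thread
  }

  // Executor threads are non-daemon, so the pool must be shut down explicitly.
  def shutdown(): Unit = executors.foreach(_.shutdown())
}

object ProducerPoolDemo {
  def main(args: Array[String]): Unit = {
    val pool = new ProducerPool(Runtime.getRuntime.availableProcessors())
    val results = Future.traverse((1 to 20).toList)(i => pool.produce(s"req$i"))
    println(Await.result(results, 10.seconds).mkString("\n"))
    pool.shutdown()
  }
}
```

Consumers share the pool (the SharedObject role) and never hold a producer directly, so memory use is bounded by N producers regardless of how many consumer threads exist.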

Akka for simulations

I'm new to Akka and the actor pattern, so I'm not sure whether it fits my needs.
I want to create a simulation with Akka and millions of entities (think of them as domain objects, later actors) that can influence each other. Think of it as a simulation with a more-or-less "fuzzy" result: we have an array of entities, where each entity has a speed but is thwarted by the entities in front of it. When the simulation starts, each entity should move n fields or, if thwarted by others, fewer fields. We have multiple iterations, and at the end we have a new order. This is repeated for some rounds until we want to see a "snapshot" of the leading entities (which are then possibly removed before the next round starts).
So I'm not sure whether I can build this with Akka, because:
Is it possible to have a global list with the position of each actor, so that each actor knows its position and which actors are in front of it?
As far as I understand, this violates the encapsulation of the actors. I can put the position of the actor in the actor itself, but how can I see/notify the actors around this actor?
Besides this, the global list would create synchronization problems and hurt performance, which is exactly the opposite of the desired behavior (and runs counter to Akka and the actor pattern).
What am I missing? Do I have to look for another design approach?
Thanks for suggestions.
Update: working with the event bus and classifiers doesn't seem to be an option either. Referring to the documentation:
"hence it is not well-suited to use cases in which subscriptions change with very high frequency"
The actor model is a very good fit for your scenario. Actors communicate by sending messages, so each actor can send messages containing its position to its neighbors. Of course, each actor cannot know about every other actor in the system (not efficiently, anyway), so you will also have to devise a scheme through which each actor knows its neighbors.
As for getting a snapshot of the system, simply have a central actor that is known by everybody and knows everybody.
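To make the neighbor scheme concrete, here is a plain-Scala sketch (not actor code) of one round on a hypothetical single-lane track. Each entity decides its move using only the updated position of the entity directly in front of it, which is exactly the information a neighboring actor would send in a position message; `leaders` plays the role of the central snapshot actor. The track layout and the "stop one field behind" rule are assumptions for illustration.

```scala
// Hypothetical model: a single-lane track; entities sorted leader-first with
// strictly decreasing positions.
final case class Entity(id: Int, pos: Int, speed: Int)

object TrackSim {
  // One round. An entity moves `speed` fields unless that would reach the
  // entity in front, in which case it stops one field behind it. The only
  // input it needs is its front neighbor's new position (its "message").
  def step(track: Vector[Entity]): Vector[Entity] =
    track.foldLeft(Vector.empty[Entity]) { (moved, e) =>
      val limit = moved.lastOption.fold(Int.MaxValue)(front => front.pos - 1)
      moved :+ e.copy(pos = math.min(e.pos + e.speed, limit))
    }

  // The central "snapshot" role: report the k leading entities.
  def leaders(track: Vector[Entity], k: Int): Vector[Entity] = track.take(k)

  def main(args: Array[String]): Unit = {
    val start = Vector(Entity(1, 10, 1), Entity(2, 9, 5), Entity(3, 0, 3))
    val afterThreeRounds = Iterator.iterate(start)(step).drop(3).next()
    println(leaders(afterThreeRounds, 2))
  }
}
```

In an actor version, each entity actor would hold its own `pos`, wait for its front neighbor's position message for the round, move, and then notify the entity behind it; no shared global list is needed.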
It seems like you're just getting started with actors. Read a bit more (the Akka site is a good resource) and come back and refine your question, if needed.
Your problem sounds like an n-body simulation sort of thing, so looking into that might help also.

Akka framework support for finding duplicate messages

I'm trying to build a high-performance distributed system with Akka and Scala.
If a message requesting an expensive (and side-effect-free) computation arrives, and the exact same computation has already been requested before, I want to avoid computing the result again. If the computation requested previously has already completed and the result is available, I can cache it and re-use it.
However, the time window in which duplicate computation can be requested may be arbitrarily small. e.g. I could get a thousand or a million messages requesting the same expensive computation at the same instant for all practical purposes.
There is a commercial product called Gigaspaces that supposedly handles this situation.
However there seems to be no framework support for dealing with duplicate work requests in Akka at the moment. Given that the Akka framework already has access to all the messages being routed through the framework, it seems that a framework solution could make a lot of sense here.
Here is what I am proposing for the Akka framework to do:
1. Create a trait to indicate a type of messages (say, "ExpensiveComputation" or something similar) that are to be subject to the following caching approach.
2. Smartly (hashing etc.) identify identical messages received by (the same or different) actors within a user-configurable time window. Other options: select a maximum buffer size of memory to be used for this purpose, subject to (say LRU) replacement etc. Akka can also choose to cache only the results of messages that were expensive to process; the messages that took very little time to process can be re-processed again if needed; no need to waste precious buffer space caching them and their results.
3. When identical messages (received within that time window, possibly "at the same time instant") are identified, avoid unnecessary duplicate computations. The framework would do this automatically, and essentially, the duplicate messages would never get received by a new actor for processing; they would silently vanish and the result from processing it once (whether that computation was already done in the past, or ongoing right then) would get sent to all appropriate recipients (immediately if already available, and upon completion of the computation if not). Note that messages should be considered identical even if the "reply" fields are different, as long as the semantics/computations they represent are identical in every other respect. Also note that the computation should be purely functional, i.e. free from side-effects, for the caching optimization suggested to work and not change the program semantics at all.
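The behavior described in step 3 (compute once, hand the same result to every duplicate requester, immediately if done or on completion if still running) can be approximated in application code. This is a minimal plain-Scala sketch, not an Akka feature: a hypothetical `Deduplicator` keeps one `Future` per request key in a concurrent `TrieMap`, and `putIfAbsent` guarantees that only the first requester starts the computation. It assumes keys have proper equals/hashCode and the computation is side-effect free; failures are cached too, and there is no eviction.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Deduplicates in-flight and completed computations per key.
final class Deduplicator[K, V](compute: K => V) {
  private val results = TrieMap.empty[K, Future[V]]

  def request(key: K): Future[V] = {
    val promise = Promise[V]()
    // Atomic: exactly one caller per key wins and starts the work.
    results.putIfAbsent(key, promise.future) match {
      case Some(existing) => existing // duplicate: share the (possibly pending) result
      case None =>
        promise.completeWith(Future(compute(key))) // first request: compute once
        promise.future
    }
  }
}

object DedupDemo {
  def main(args: Array[String]): Unit = {
    val calls = new java.util.concurrent.atomic.AtomicInteger(0)
    val dedup = new Deduplicator[Int, BigInt](n => { calls.incrementAndGet(); BigInt(n).pow(1000) })
    val all = Future.sequence((1 to 1000).map(_ => dedup.request(7)))
    Await.result(all, 10.seconds)
    println(calls.get) // 1
  }
}
```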
If what I am suggesting is not compatible with the Akka way of doing things, and/or if you see some strong reasons why this is a very bad idea, please let me know.
What you are asking is not dependent on the Akka framework; rather, it's about how you architect your actors and messages. First, ensure that your messages are immutable and have appropriately defined identity via the equals/hashCode methods. Case classes give you both for free; however, if you have ActorRefs embedded in the message for reply purposes, you will have to override the identity methods. The case class parameters should also have the same properties recursively (immutable and with proper identity).
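One alternative to overriding equals/hashCode by hand (a suggestion, not part of the answer above): a Scala case class only uses its first parameter list for equals, hashCode and pattern matching, so a reply address can go in a second parameter list. In this sketch `ExpensiveRequest` is hypothetical and `replyTo` is a `String` standing in for an ActorRef.

```scala
// Identity = first parameter list only; replyTo (second list) is ignored by
// equals/hashCode, so two requests for the same work compare equal.
final case class ExpensiveRequest(input: BigInt, precision: Int)(val replyTo: String)

object RequestIdentityDemo {
  def main(args: Array[String]): Unit = {
    val a = ExpensiveRequest(BigInt(42), 10)(replyTo = "actor-a")
    val b = ExpensiveRequest(BigInt(42), 10)(replyTo = "actor-b")
    println(a == b)                   // true
    println(a.hashCode == b.hashCode) // true
    println(Set(a, b).size)           // 1
    println(b.replyTo)                // the reply address is still available
  }
}
```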
Second, you need to figure out how the actors will handle storing and identifying current/past computations. The easiest approach is to uniquely map requests to actors. This way that actor, and only that actor, will ever process that specific request. This can be done easily given a fixed set of actors and the hashCode of the request. Bonus points if the actor set is supervised, with the supervisor managing the load balancing/mapping and replacing failed actors (Akka makes this part easy).
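The "fixed set of actors plus the request's hashCode" mapping boils down to a function like the hypothetical `workerFor` below (Akka's consistent-hashing routers offer a library version of this idea). Equal requests have equal hashCodes, so they always reach the same worker, which can then deduplicate and cache on its own without coordinating with other workers.

```scala
object RequestSharding {
  // Hypothetical helper: pick one of `workers` slots from the request's hashCode.
  def workerFor(request: Any, workers: Int): Int = {
    require(workers > 0, "need at least one worker")
    Math.floorMod(request.hashCode, workers) // floorMod: never negative, unlike %
  }

  def main(args: Array[String]): Unit = {
    val requests = Seq(("pi-digit", 1000), ("tsp", 42), ("pi-digit", 1000))
    // Both ("pi-digit", 1000) requests map to the same worker index.
    requests.foreach(r => println(s"$r -> worker ${workerFor(r, 4)}"))
  }
}
```

A plain modulo mapping reshuffles most requests if the number of workers changes; that is the usual reason to prefer consistent hashing when the pool can grow or shrink.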
Finally, the actor itself can maintain a response-caching behavior based on the criteria you described. Everything is thread-safe within an actor, so an LRU cache keyed by the request itself (remember: good identity properties) is easy to build with any behavior you want.
As Neil says, this is not really framework functionality; it's rather trivial to implement and even abstract into its own trait.
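A minimal sketch of such an LRU cache, assuming it is only used from inside a single actor (it is deliberately not thread-safe). `LruCache` is a hypothetical helper built on `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook evicts the least recently used entry once the bound is exceeded.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Bounded LRU cache; single-threaded use only (e.g. inside one actor).
final class LruCache[K, V](maxEntries: Int) {
  // accessOrder = true: get() moves an entry to the most-recently-used end
  private val underlying = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      this.size() > maxEntries
  }

  // Return the cached value for `key`, computing and storing it on a miss.
  def getOrCompute(key: K)(compute: => V): V =
    if (underlying.containsKey(key)) underlying.get(key)
    else {
      val value = compute
      underlying.put(key, value)
      value
    }

  def entryCount: Int = underlying.size()
}

object LruDemo {
  def main(args: Array[String]): Unit = {
    val cache = new LruCache[String, BigInt](maxEntries = 2)
    var computations = 0
    def expensive(n: Int): BigInt = { computations += 1; BigInt(2).pow(n) }
    cache.getOrCompute("2^100")(expensive(100))
    cache.getOrCompute("2^100")(expensive(100)) // served from the cache
    println(computations) // 1
  }
}
```

`containsKey` is used instead of a null check so that the cache also works for primitive value types such as `Int`, where a missing entry would otherwise unbox to `0`.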
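import akka.actor.Actor
import scala.collection.mutable

// Marker trait for messages whose results may be cached
trait ExpensiveThing

trait CachingExpensiveThings { this: Actor =>
  // No locking needed: an actor processes one message at a time.
  // Unbounded for brevity; swap in an LRU structure to cap memory.
  private val cache = mutable.Map.empty[ExpensiveThing, Any]

  def receive: Actor.Receive = {
    case s: ExpensiveThing => sender() ! cachedOrCompute(s)
  }

  private def cachedOrCompute(s: ExpensiveThing): Any =
    cache.getOrElseUpdate(s, compute(s)) // compute only on a cache miss

  // Supplied by the concrete actor
  def compute(s: ExpensiveThing): Any
}

// LastDigitOfPi and TravellingSalesman are messages extending ExpensiveThing
class MyExpensiveThingCalculator extends Actor with CachingExpensiveThings {
  def compute(s: ExpensiveThing): Any = s match {
    case l: LastDigitOfPi => ...
    case ts: TravellingSalesman => ...
  }
}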
I do not know if all of these responsibilities should be handled by Akka alone. As usual, it all depends on the scale, and in particular on the number of attributes that define the uniqueness of a message.
For the caching mechanism, the already-mentioned approach of uniquely mapping requests to actors is the way to go, especially since it could be backed by persistence.
For identity, instead of checking simple equality (which may become a bottleneck), I would rather use a graph-based algorithm like signal-collect.