Akka framework (Scala) - Agents to store large complex state - scala

I recently discovered the akka framework and felt it was a good match for one of my projects. I must say I'm very impressed with it so far.
In my project, I need to have 1M+ entities receive state updates at a very fast rate. Naturally, Akka actors seem to be the first choice. I do however wonder whether I'm better off using agents to store the state updates (so far, my actors only have two messages, one for updating the state and the other for reading it, and I don't believe that will ever change).
Looking at the few examples for agents, I get the feeling that they are not meant to store large complex state. Am I wrong?
In short, I would like to store something like:
case class AgentState(val list1 : List[Int], val list2 : List[Int], val peers : List[Agent])
Obviously, updating the state becomes less pretty than in toy examples where you use integers ;)
Does it make sense then to have an Agent? How would you go about doing this?
Thanks for your answers!
-LP

Akka Agents are backed by Actors, so it only makes sense if you want to have concurrent readers and serial writers.
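To make the "less pretty" update concern concrete: as far as I recall, an Akka Agent is updated by sending it a function from the old state to the new one, so a large immutable state can still be handled with small named update functions built on copy. The following is a minimal sketch in plain Scala with no Akka dependency. The field names follow the question, but peers is simplified to a list of IDs, and the helpers addToList1 and addPeer are assumptions made for illustration.

```scala
// Immutable state as in the question; `peers` holds IDs here instead of
// Agent references so the sketch stays dependency-free (an assumption).
final case class AgentState(list1: List[Int], list2: List[Int], peers: List[String])

object AgentStateUpdates {
  // Each update is a pure function AgentState => AgentState.
  def addToList1(x: Int): AgentState => AgentState =
    s => s.copy(list1 = x :: s.list1)

  def addPeer(id: String): AgentState => AgentState =
    s => s.copy(peers = id :: s.peers)

  def main(args: Array[String]): Unit = {
    val initial = AgentState(Nil, List(1, 2), Nil)
    // Compose several updates and apply them in order.
    val update = addToList1(42).andThen(addPeer("peer-1"))
    println(update(initial))
  }
}
```

Whether the state lives in an agent or in an actor, the update logic itself can stay this small; the choice between the two is mostly about who reads and who writes.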

Related

Akka and singleton actors

I've recently started messing around with Akka's actors and HTTP modules. However, I've stumbled upon a rather annoying little quirk, namely, creating singleton actors.
Here are two examples:
1)
I have an in-memory cache. My service is quite small (it's more of an app, really), so I really like this in-memory model. I can hold most information relevant to the user in a Map (well, a map of lists, but still a structure that is quite easy to reason about) and I don't get the overhead and complexity of a redis, geode or aerospike.
The only problem is that this in-memory cache can be modified by multiple sources, and those modifications must be synchronized. Instead of synchronizing all 3 access methods for this structure (e.g. by building a message queue or implementing locks) I thought I'd just wrap the structure and its access methods in an actor: built-in message queue, easy receive->send logic, and if things scale up it will be very easy to replace with DA actors over a dedicated in-memory db.
2) I have a "Service" layer that should be used to dispatch actors for various jobs (access the database, access the in-memory cache, do this computation with data and deliver the result to the user... etc).
It makes sense for this Service layer to be a "singleton" of sorts, a closure over some functions, since it does nothing that's blocking or CPU/memory intensive in any way. It simply assigns tasks further down the line (e.g. decides how many actors/threads/whatever should be created and where a request should go).
However, this thing would require either:
a) Making both objects singleton actors or
b) Making both objects actual "objects"(as in the scala object notation that designates a single named singleton with functions that have closures over its scope)
There are plenty of problems with b), namely that the service layer will have to get an actor system "passed" to it (and I'm not sure that's a best practice) in order to create actors. Rather than creating its own children, it will create children using the global actor system, and the messaging and monitoring logic will be a lot more awkward and unintuitive. Also, the in-memory cache will not have the advantage of the built-in message queue (I'm not saying it's hard to implement one, but this seems like one of those situations where one goes "Oh, jolly, it's good that I have actors and I don't have to spend time implementing and testing this code").
a) seems to have the problem of being, generally speaking, poorly documented and advised against in the Akka documentation. I mean:
http://doc.akka.io/docs/akka/2.4/scala/cluster-singleton.html
Look at this shit, half of the docs are warning against using it, it is its own dependency, and quite frankly it's very hard to read for a poor sod like me who hasn't set foot in the functional & concurrent programming ivory tower.
So, ahm. Could any of you guys explain to me why it's bad to use singleton actors? How do you design singletons if they can't be actors? Is there any way to design singleton actors that won't cause a lot of damage down the line? Is the whole "service" model of having "global" services that are called rather than instantiated "un-Akka-like"?
Just to clarify the documentation, they're not warning against using it. They're warning that there are circumstances in which using a singleton will cause problems, which are expected given the circumstances. They mention the following situations:
If the singleton is a performance bottleneck. This makes sense. If everything relies on a single object that does work slowly, everything will be slow.
If the actor needs to be non-stop available, you'll run into problems if the singleton ever goes down, because those messages can't just be handled by another instance. It will take some amount of time to re-start the singleton before its work can be resumed.
The biggest problem happens if you have auto-downing turned on. Auto-downing is a policy by which an unreachable node is assumed to be down, and removed from the network. If you do this, but the node is not actually down but just unreachable due to a network partition, both sides of the partition will decide that they're the surviving nodes and create their own singletons. So now you have two singletons. Which is, of course, not what you want from a singleton. But you should never use auto-downing outside of testing anyway. It's a terrible recovery strategy that was included for completeness and convenience in testing.
So I don't read that as recommending against using it. Just being clear about the expected pitfalls if you do use it, based on the nature of the structure.

Typed messages in akka

The Akka framework recommends using typed actors only for interacting with external code. However, standard Akka actors are untyped. Is there any better way to create type-safe actors? Are there other actor frameworks or type-safe wrappers around Akka?
If you really want actors with static typing, then you might as well go ahead and use typed actors throughout your code. This is strongly discouraged for a couple of reasons.
1.) You run the risk of your system degenerating into a bunch of RPCs. An actor's receive method makes it pretty obvious that the whole thing is about message passing, much less so if you're just calling methods on a typed actor.
2.) An actor just really doesn't have a type. While it's running, the messages an actor is able to process may change depending on what state it is in, as may what it does with those messages. This is an excellent way of modeling a lot of protocols, and Akka actors have first class support for it with FSMs.
So if you really want to do it, you're free to use typed actors everywhere and it'll work, but you should really think hard about the problem you're trying to solve before doing so.
For compile time checking see SynapseGrid framework. It defines a SystemBuilder that constructs the DataFlow topology. While constructing it is guaranteed that types that pass by are checked. Then the resulting system is converted to RuntimeSystem with nested and properly interconnected actors.
Why is this a problem for you? akka.actor.Actor has the receive method of type PartialFunction that will only be called for messages that it can handle. Why do you need compile time checks? But to answer your question: one way would be - for an external api - to build a wrapper around your ActorRef that then sends the messages to the actor.
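The wrapper idea from the last sentence can be sketched without pulling in Akka at all. In the sketch below, UntypedRef is a hypothetical stand-in for an untyped actor reference (the role ActorRef plays in Akka), and CounterMsg, Increment, Add and CounterApi are made-up names. The point is only that callers of the facade cannot send a message outside the declared protocol.

```scala
// Hypothetical stand-in for an untyped actor reference; in Akka this
// role would be played by ActorRef.
trait UntypedRef {
  def tell(msg: Any): Unit
}

// The closed set of messages the wrapped actor is meant to understand.
sealed trait CounterMsg
case object Increment extends CounterMsg
final case class Add(n: Int) extends CounterMsg

// Typed facade: callers can only send CounterMsg values, so a wrong
// message type is rejected by the compiler rather than ignored at runtime.
final class CounterApi(ref: UntypedRef) {
  def send(msg: CounterMsg): Unit = ref.tell(msg)
}

object TypedWrapperDemo {
  def main(args: Array[String]): Unit = {
    val received = scala.collection.mutable.ListBuffer.empty[Any]
    val ref = new UntypedRef {
      def tell(msg: Any): Unit = { received += msg; () }
    }
    val api = new CounterApi(ref)
    api.send(Increment)
    api.send(Add(3))
    // api.send("oops")  // would not compile: String is not a CounterMsg
    println(received.toList)
  }
}
```

This only types the sending side. The actor behind the reference still receives Any, so it does not address the point above that an actor's accepted messages can change with its state.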
Things are going quite fast, I thought about giving an update
1. Typed actors are deprecated
2. Instead, a new concept called Akka Typed is being developed at the moment
As I understand it, this should be the definitive solution for a typed actor system. But since this is at least the third try and is planned for Akka 2.4 at the earliest, this claim remains to be proven.
I personally look forward to having both systems available: the existing one for more dynamic use cases, the new one for more robust ones.

Akka for simulations

I'm new to Akka and the actor pattern, so I'm not sure whether it fits my needs.
I want to create a simulation with Akka and millions of entities (think of them as domain objects, later actors) that can influence each other. Think of it as a simulation with a more-or-less "fuzzy" result: we have an array of entities, where each entity has a speed but is thwarted by the entities in front of it. When the simulation starts, each entity should move n fields, or fewer fields if thwarted by others. We have multiple iterations, and in the end we have a new order. This is repeated for some rounds until we want to see a "snapshot" of the leading entities (which are then possibly removed before the next round starts).
So I don't understand if I can create this with akka, because:
Is it possible to have global list with the position of each actor, so they know at which position they are and which are in front of them?
As far as I understand, this violates the encapsulation of the actors. I can put the position of the actor in the actor itself, but how can I see/notify the actors around this actor?
Besides this, the global list will create synchronization problems and hurt performance, which is the exact opposite of the desired behaviour (and runs contrary to Akka and the actor pattern).
What did I miss? Do I have to search for another design approach?
Thanks for suggestions.
Update: working with the event bus and classifiers doesn't seem to be an option either. Referring to the documentation:
"hence it is not well-suited to use cases in which subscriptions change with very high frequency"
The actor model is a very good fit for your scenario. Actors communicate by sending messages, so each actor can send messages containing its position to its neighbors. Of course, each actor cannot know about every other actor in the system (not efficiently anyway), so you will also have to devise a scheme through which each actor knows who its neighbors are.
As for getting a snapshot of the system, simply have a central actor that is known by everybody and knows everybody.
It seems like you're just getting started with actors. Read a bit more - the akka site is a good resource - and come back and refine your question, if needed.
Your problem sounds like an n-body simulation sort of thing, so looking into that might help also.

Akka framework support for finding duplicate messages

I'm trying to build a high-performance distributed system with Akka and Scala.
If a message requesting an expensive (and side-effect-free) computation arrives, and the exact same computation has already been requested before, I want to avoid computing the result again. If the computation requested previously has already completed and the result is available, I can cache it and re-use it.
However, the time window in which duplicate computation can be requested may be arbitrarily small. e.g. I could get a thousand or a million messages requesting the same expensive computation at the same instant for all practical purposes.
There is a commercial product called Gigaspaces that supposedly handles this situation.
However there seems to be no framework support for dealing with duplicate work requests in Akka at the moment. Given that the Akka framework already has access to all the messages being routed through the framework, it seems that a framework solution could make a lot of sense here.
Here is what I am proposing for the Akka framework to do:
1. Create a trait to indicate a type of messages (say, "ExpensiveComputation" or something similar) that are to be subject to the following caching approach.
2. Smartly (hashing etc.) identify identical messages received by (the same or different) actors within a user-configurable time window. Other options: select a maximum buffer size of memory to be used for this purpose, subject to (say LRU) replacement etc. Akka can also choose to cache only the results of messages that were expensive to process; the messages that took very little time to process can be re-processed again if needed; no need to waste precious buffer space caching them and their results.
3. When identical messages (received within that time window, possibly "at the same time instant") are identified, avoid unnecessary duplicate computations. The framework would do this automatically, and essentially, the duplicate messages would never get received by a new actor for processing; they would silently vanish and the result from processing it once (whether that computation was already done in the past, or ongoing right then) would get sent to all appropriate recipients (immediately if already available, and upon completion of the computation if not). Note that messages should be considered identical even if the "reply" fields are different, as long as the semantics/computations they represent are identical in every other respect. Also note that the computation should be purely functional, i.e. free from side-effects, for the caching optimization suggested to work and not change the program semantics at all.
If what I am suggesting is not compatible with the Akka way of doing things, and/or if you see some strong reasons why this is a very bad idea, please let me know.
Thanks,
Is Awesome, Scala
What you are asking is not dependent on the Akka framework; rather, it's a matter of how you architect your actors and messages. First, ensure that your messages are immutable and have appropriately defined identities via the equals/hashCode methods. Case classes give you both for free; however, if you have actorRefs embedded in the message for reply purposes, you will have to override the identity methods. The case class parameters should also have the same properties recursively (immutable and proper identity).
Secondly you need to figure out how the actors will handle storing and identifying current/past computations. The easiest is to uniquely map requests to actors. This way that actor and only that actor will ever process that specific request. This can be done easily given a fixed set of actors and the hashCode of the request. Bonus points if the actor set is supervised where the supervisor is managing the load balancing/mapping and replacing failed actors ( Akka makes this part easy ).
Finally the actor itself can maintain a response caching behavior based on the criteria you described. Everything is thread safe in the context of the actor so a LRU cache keyed by the request itself ( good identity properties remember ) is easy with any type of behavior you want.
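The first two points can be sketched in a few lines of plain Scala. Instead of overriding equals/hashCode by hand, this sketch uses the fact that Scala derives a case class's equality from its first parameter list only, so a reply address placed in a second list stays out of the request's identity. The Compute type, its fields, and the String reply address are assumptions for illustration, and workerIndex is one possible way to map a request onto a fixed set of workers.

```scala
// Hypothetical request type. equals/hashCode are generated from the
// FIRST parameter list only, so `replyTo` does not affect identity.
final case class Compute(problem: String, size: Int)(val replyTo: String)

object RequestRouting {
  // Map a request to one of `workers` slots; equal requests always land
  // on the same slot. floorMod keeps the index non-negative.
  def workerIndex(req: Compute, workers: Int): Int =
    Math.floorMod(req.hashCode, workers)

  def main(args: Array[String]): Unit = {
    val a = Compute("tsp", 20)("client-1")
    val b = Compute("tsp", 20)("client-2")
    println(a == b)                                 // true: replyTo is ignored
    println(workerIndex(a, 8) == workerIndex(b, 8)) // true: same worker slot
  }
}
```

With routing like this, all duplicates of a request reach the same worker, which can then decide locally whether to compute or answer from its cache.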
As Neil says, this is not really framework functionality. It's rather trivial to implement this and even abstract it into its own trait.
import akka.actor.Actor

// Akka 1.x style sketch. The self-type is written `this: Actor =>` so that
// `self` below still refers to the actor's own reference.
trait CachingExpensiveThings { this: Actor =>
  val cache = ...

  def receive: Actor.Receive = {
    case s: ExpensiveThing => cachedOrCache(s)
  }

  // Reply from the cache if possible; otherwise compute, store, and reply.
  def cachedOrCache(s: ExpensiveThing) = cache.get(s) match {
    case null =>
      val result = compute(s)
      cache.put(s, result)
      self.reply_?(result)
    case cached => self.reply_?(cached)
  }

  def compute(s: ExpensiveThing): Any
}

class MyExpensiveThingCalculator extends Actor with CachingExpensiveThings {
  def compute(s: ExpensiveThing) = s match {
    case l: LastDigitOfPi => ...
    case ts: TravellingSalesman => ...
  }
}
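The question also asks about duplicates that arrive while the first computation is still running. One way to handle that inside a single actor is to keep a list of waiting callers per request alongside the finished results. The sketch below shows only that bookkeeping in plain Scala, under the assumption that it is owned by one actor so no locking is needed. InFlightDedup, start, request and completed are all hypothetical names, and bounding the result cache (LRU, time window) is left out.

```scala
import scala.collection.mutable

// Bookkeeping an actor could hold privately: finished results plus the
// callers waiting on computations that are still running.
// `start` is an assumed hook that kicks off the real work for a request.
final class InFlightDedup[Req, Res](start: Req => Unit) {
  private val results = mutable.Map.empty[Req, Res]
  private val waiting = mutable.Map.empty[Req, Vector[Res => Unit]]

  // Called for every incoming request.
  def request(req: Req, reply: Res => Unit): Unit =
    results.get(req) match {
      case Some(res) => reply(res) // already computed: answer immediately
      case None =>
        val firstRequest = !waiting.contains(req)
        waiting(req) = waiting.getOrElse(req, Vector.empty) :+ reply
        if (firstRequest) start(req) // compute only once per request
    }

  // Called when the computation for `req` finishes.
  def completed(req: Req, res: Res): Unit = {
    results(req) = res
    waiting.remove(req).getOrElse(Vector.empty).foreach(_(res))
  }
}

object InFlightDedupDemo {
  def main(args: Array[String]): Unit = {
    var started = 0
    val dedup = new InFlightDedup[String, Int](_ => started += 1)
    dedup.request("pi", r => println(s"first got $r"))
    dedup.request("pi", r => println(s"second got $r"))
    dedup.completed("pi", 3)
    println(s"computations started: $started")
  }
}
```

In an actor, request would be driven by the incoming message and completed by the message that carries the finished result back, with the reply callbacks replaced by sends to the original requesters.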
I do not know if all of these responsibilities should be handled by Akka alone. As usual, it all depends on the scale, and in particular on the number of attributes that define the uniqueness of a message.
For the cache mechanism, the approach already mentioned, uniquely mapping requests to actors, is the way to go, especially since it could be backed by persistence.
For identity, instead of checking simple equality (which may be a bottleneck) I would rather use a graph-based algorithm like signal-collect.

Using Scala, does a functional paradigm make sense for analyzing live data?

For example, when analyzing live stockmarket data I expose a method to my clients
def onTrade(trade: Trade): Unit = {
}
The clients may choose to do anything from counting the number of trades, calculating averages, storing high lows, price comparisons and so on. The method I expose returns nothing and the clients often use vars and mutable structures for their computation. For example when calculating the total trades they may do something like
var numTrades = 0
def onTrade(trade: Trade): Unit = {
  numTrades += 1
}
A single onTrade call may have to do six or seven different things. Is there any way to reconcile this type of flexibility with a functional paradigm? In other words, a return type, vals and immutable data structures.
You might want to look into Functional Reactive Programming. Using FRP, you would express your trades as a stream of events, and manipulate this stream as a whole, rather than focusing on a single trade at a time.
You would then use various combinators to construct new streams, for example one that would return the number of trades or highest price seen so far.
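As a rough illustration of the stream-of-events idea, the sketch below uses scanLeft on an ordinary List as a stand-in for an FRP combinator. It is not an FRP library, only the same shape of computation on a finite list. The Trade fields and the function names are assumptions.

```scala
// Assumed shape of a trade; the original question does not define it.
final case class Trade(symbol: String, price: Double)

object TradeStreams {
  // Running count of trades: one output element per trade seen so far.
  def runningCount(trades: List[Trade]): List[Int] =
    trades.scanLeft(0)((n, _) => n + 1).tail

  // Highest price seen so far after each trade.
  def runningHigh(trades: List[Trade]): List[Double] =
    trades match {
      case Nil => Nil
      case first :: rest =>
        rest.scanLeft(first.price)((hi, t) => math.max(hi, t.price))
    }

  def main(args: Array[String]): Unit = {
    val trades = List(Trade("ABC", 10.0), Trade("ABC", 12.0), Trade("ABC", 11.0))
    println(runningCount(trades)) // List(1, 2, 3)
    println(runningHigh(trades))  // List(10.0, 12.0, 12.0)
  }
}
```

A real FRP library would offer the same kind of combinator over a live, unbounded event stream instead of a List.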
The link above contains links to several Haskell implementations, but there are probably several Scala FRP implementations available as well.
One possibility is using monads to encapsulate state within a purely functional program. You might check out the Scalaz library.
Also, according to reports, the Scala team is developing a compiler plug-in for an effect system. Then you might consider providing an interface like this to your clients,
def callbackOnTrade[A, B](f: (A, Trade) => B)
The clients define their input and output types A and B, and define a pure function f that processes the trade. All "state" gets encapsulated in A and B and threaded through f.
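A small sketch of what threading state through f could look like. To keep it short it uses a single state type A for both input and output (a simplification of the A and B in the signature above), and the Trade fields, Stats, and run are assumptions. The client's "six or seven different things" become fields of one immutable value updated with copy.

```scala
// Assumed shape of a trade; the original question does not define it.
final case class Trade(symbol: String, price: Double)

// Everything a client wants to track, as one immutable value.
final case class Stats(count: Int, high: Double, low: Double)

object TradeFold {
  val empty: Stats = Stats(0, Double.MinValue, Double.MaxValue)

  // Pure handler: old state and a trade in, new state out. No vars.
  def onTrade(s: Stats, t: Trade): Stats =
    s.copy(
      count = s.count + 1,
      high = math.max(s.high, t.price),
      low = math.min(s.low, t.price)
    )

  // The feed owns the loop and threads the state through `f`.
  def run[A](initial: A, trades: Iterable[Trade])(f: (A, Trade) => A): A =
    trades.foldLeft(initial)(f)

  def main(args: Array[String]): Unit = {
    val trades = List(Trade("ABC", 10.0), Trade("ABC", 12.5), Trade("ABC", 9.0))
    println(run(empty, trades)(onTrade)) // Stats(3,12.5,9.0)
  }
}
```

For a live feed, the library would hold the current A between callbacks and replace it with the value returned by f, so the only mutation lives in one place outside client code.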
Callbacks may not be the best approach, but there are certainly functional designs that can solve such a problem. You might want to consider FRP or a state-monad solution as already suggested, actors are another possibility, as is some form of dataflow concurrency, and you can also take advantage of the copy method that's automatically generated for case classes.
A different approach is to use STM (software transactional memory) and stick with the imperative paradigm whilst still retaining some safety.
The best approach depends on exactly how you're persisting the data and what you're actually doing in these state changes. As always, let a profiler be your guide if performance is critical.