Broadcast message to actors watching a particular actor? - scala

How can I broadcast a message to all actors that are watching a particular actor?
For context, suppose I have an AuctionActor (which is potentially a remote actor) that is being watched by a large number of AuctionParticipantActor types. I would like the AuctionActor to broadcast various messages to the AuctionParticipantActor types.
One possibility would be for the AuctionActor to keep a collection of all participant ActorRef instances and then loop over this collection whenever a message needs to be sent to all participants. This seems inefficient and I am hoping for a better solution...

If you don't want to go with PubSub as mentioned by Diego Martinoia, I would suggest using a Router with BroadcastRoutingLogic. This goes in the direction you mentioned with the collection of ActorRefs, but uses Akka functionality that is more efficient than iterating over a collection yourself in your AuctionActor.
From Akka Docs
Routers are designed to be extremely efficient at receiving messages and passing them quickly on to routees.
A normal actor can be used for routing messages, but an actor’s single-threaded processing can become a bottleneck. Routers can achieve much higher throughput with an optimization to the usual message-processing pipeline that allows concurrent routing.
In your case it could look like this:
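import akka.actor.Actor
import akka.routing.{ActorRefRoutee, BroadcastRoutingLogic, Router}

class AuctionActor extends Actor {
  // Router is immutable: addRoutee/removeRoutee return a new instance.
  var router = Router(BroadcastRoutingLogic(), Vector[ActorRefRoutee]())

  def receive = {
    case AddParticipant(ref) =>
      router = router.addRoutee(ref)
    case RemoveParticipant(ref) =>
      router = router.removeRoutee(ref)
    case update: ImportantUpdate =>
      // Sends the update to every registered participant.
      router.route(update, self)
  }
}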

Related

Migrating from standard Akka actors to Akka Streams with back pressure and throttling

We have the following logic implemented to manage jobs targeting different backends:
A Manager Actor is started. This actor:
Loads the configuration required to target each backend (mutable map: backend name -> backend connector configuration);
Loads a pool of Actors (RoundRobinPool) to handle the jobs for each backend (mutable map: backend name -> RoundRobinPool ActorRef).
When a request is received by the Manager actor, it retrieves the backend name from the message and forwards the message to the corresponding pool of Actors to handle the job (assuming a configuration for this backend was registered). The result of the job request is then returned from the actor to the original sender (which is why we use forward).
This logic works very well, but because the backends are slow to handle jobs, we have a typical fast-publisher, slow-consumer situation, which causes problems as the load increases.
After some research, Akka Streams seems the way to go, as it supports back pressure and throttling, which would be perfect for our use case (for example, limiting to 5 requests per second).
The idea is to keep the Manager Actor with the same routing logic but replace the pools of Actors with a Source.queue.
Each Source.queue would be registered like this:
val queue = Source
  .queue[RunBackendRequest](0, OverflowStrategy.backpressure)
  .throttle(5, 1.second)
  .map(r => runBackendRequest(r))
  .toMat(Sink.ignore)(Keep.left)
  .run()
Where the definition of RunBackendRequest is:
case class RunBackendRequest(originalSender: ActorRef, backendConnector: BackendConnector, request: BackendRequest)
And the function runBackendRequest is defined as such:
private def runBackendRequest(runRequest: RunBackendRequest): Unit = {
  val connector = BackendConnectorFactory.getBackendConnector(configuration.underlying, runRequest.backendConnector.toConfig(), materializer, environment.asJava)
  // Run the job asynchronously and reply to the original sender with the outcome.
  Future { connector.doSomeWork(runRequest.request) } map { result =>
    runRequest.originalSender ! Success(result)
  } recover {
    case e: Exception => runRequest.originalSender ! Failure(e)
  }
}
When the Manager Actor receives a message, it will 'offer' it to the correct queue based on the name of the target backend contained in the message.
Therefore, I have a few questions:
Is this the correct way to use Akka Streams in this particular use case, or could it be written differently and more efficiently?
Is it ok to provide the ActorRef of the original sender in the RunBackendRequest object so that the request is answered from within the Flow?
Is there a way to retrieve the result of the Flow as a Future instead, so that the Manager actor could return the result of the request itself?
Akka Streams seems to be very powerful, but there is clearly a learning curve!
It feels to me that having the Manager Actor creates a single point of failure. Maybe worth a try:
The original senders send their requests to an Akka Streams graph instead of to the Manager actor. Make sure you pass the ActorRef downstream so that the reply can be sent back.
Inside the graph, use either partition-then-merge or substreams to process requests that target different backend connectors.
Either as the last step of the graph or after the backend connectors have finished, answer the original sender.
Overall, Colin's article is a great introduction on how to use Akka Streams with Partition and Merge to achieve your goal.
Let me know if you need more clarification and I can update my answer accordingly.
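On the third question (getting a Future back instead of replying from inside the flow), one common alternative to carrying the sender's ActorRef is to carry a Promise in the request and complete it in the stage that runs the job. The following is only a minimal sketch in plain Scala, without the stream itself: BackendRequest, RunBackendRequest and doSomeWork here are simplified, hypothetical stand-ins for the types in the question, and submit calls the stage function directly where a real implementation would offer the request to the queue.

```scala
import scala.concurrent.{ExecutionContext, Future, Promise}

object PromiseReply {
  // Hypothetical, simplified stand-ins for the question's types.
  final case class BackendRequest(backend: String, payload: String)
  final case class RunBackendRequest(request: BackendRequest, reply: Promise[String])

  // Stand-in for connector.doSomeWork; a real connector would do I/O here.
  def doSomeWork(request: BackendRequest): String =
    s"${request.backend}:${request.payload.toUpperCase}"

  // What the stream stage would do for each element: run the job and
  // complete the promise with either the result or the failure.
  def runBackendRequest(run: RunBackendRequest)(implicit ec: ExecutionContext): Unit =
    Future(doSomeWork(run.request)).onComplete(result => run.reply.complete(result))

  // What the caller would do: create the promise, hand the request over
  // (here by calling the stage function directly) and keep the Future.
  def submit(request: BackendRequest)(implicit ec: ExecutionContext): Future[String] = {
    val reply = Promise[String]()
    runBackendRequest(RunBackendRequest(request, reply))
    reply.future
  }
}
```

Inside the Manager actor, the returned Future could then be sent to the original sender with Akka's pipe pattern (akka.pattern.pipe), so the reply comes from the Manager itself.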

Find actor by persistence id

I have a system that has one actor per user. Users send messages rarely, but when they do, they usually send several rather than just one.
Currently, I have a map, where I store persistenceId -> ActorRef. When I'm receiving a new message for an actor, I look into the map, if there is an ActorRef, I use it. If it is missing, I create it and put it into the map. For sure I don't want to have 2 instances of same persistence actor at the same time. Also, I don't want to create and destroy the actor for each message, as recovery could take some time.
I feel there should be some cleaner way of "locating or creating" an actor. Something like actorSystem.getOrCreate(persistenceId, props). I thought that sharding might help me with that, but I couldn't find an exact example of this. Also, I know there is actorSelection, which has downsides:
using it in too many places, with hardcoded paths that are tricky to maintain
using it to send too many messages, as it has a performance cost
So basically the question is: what is the best way of locating a persistent actor within one service if the actor's persistenceId is the userId? If I decide to use sharding, then it will be 1 shard per actor. Is this ok?
Actor sharding is pretty much what you need - you can think about it as a distributed map of actors and there is no need of having additional solutions. The sharding takes care of summoning the actor behind the scenes and there is no need for you to manage actors yourself.
val sharding = ClusterSharding(system).start(
  typeName = CustomerActor.shardName,
  entityProps = CustomerActor.props,
  settings = ClusterShardingSettings(system),
  extractEntityId = CustomerActor.extractEntityId,
  extractShardId = CustomerActor.extractShardId)
where extractEntityId is a function which routes messages to appropriate actors
val extractEntityId: ShardRegion.ExtractEntityId = {
  case gc: GetCustomer => (gc.customerId, gc)
}
And final example:
case class GetCustomer(customerId: String)
sharding ! GetCustomer("customer-id")
More details here https://doc.akka.io/docs/akka/2.5/cluster-sharding.html
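The snippet above references extractShardId without showing it. Shards and entities are different things: an entity is one actor, while a shard is a group of entities that is managed and moved together, so many users normally share a shard rather than getting one shard each. A minimal sketch of the usual hashing approach, in plain Scala; shardIdFor and the value 100 are assumptions for illustration, not part of the answer:

```scala
object ShardIds {
  // Assumed value. The Akka docs suggest, as a rule of thumb, about ten
  // times the planned maximum number of cluster nodes.
  val numberOfShards = 100

  // Maps an entity id (for example the userId) to a stable shard id.
  // floorMod keeps the result non-negative even for negative hash codes.
  def shardIdFor(entityId: String, shards: Int = numberOfShards): String =
    Math.floorMod(entityId.hashCode, shards).toString
}
```

An extractShardId for the example above could then pattern match on GetCustomer and return ShardIds.shardIdFor(gc.customerId).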

Akka synchronizing timestamped messages from several actors

Imagine the following architecture. There is an Akka actor that receives push messages via a websocket. Each message has a timestamp, and the interval between timestamps is 1 minute, although messages with the same timestamp can arrive multiple times. These messages are then broadcast to, for example, three further actors (ma). They calculate metrics and push the messages on to a single actor (c).
For ma I defined a TimeSeriesBuffer that allows writing to the buffer only if entities have consecutive timestamps. After a successful push to the buffer, the ma actors emit metrics that go to c. c can only change its state when it has all three metrics. Therefore I defined a trait Synchronizable and then a SynchronizableTimeSeriesBuffer with a "master-slave" architecture.
On each push to any buffer, a check is triggered to see whether the buffers of all three SynchronizableTimeSeriesBuffer instances contain new elements with the same timestamp that can be emitted to c as a single message.
So here are the questions:
1) Is it too complicated of a solution?
2) Is there a better way to do it in terms of scala and akka?
3) Why is it not so fast and not so parallel when messages, instead of being received one by one, are loaded from the db in a big batch and fed to the system in order to backtest the metrics? (One of the buffers fills much faster than the others, while another stays at length 0.) My assumption is that it has something to do with Akka's dispatcher/mailbox settings.
I created a gist with the relevant code:
https://gist.github.com/ifif14/18b5f85cd638af7023462227cd595a2f
I would much appreciate the community's help in solving this nontrivial case.
Thanks in advance
Igor
Simplification
It seems like much of your architecture is designed to ensure that your messages are sequentially ordered in time. Why not just add a simple Actor at the beginning that filters out duplicated messages? Then the rest of your system could be relatively simple.
As an example, given a message with a timestamp:
type Payload = ???
case class Message(timestamp : Long, payload : Payload)
You can write the filter Actor:
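import akka.actor.{Actor, ActorRef}

class FilterActor(ma : Iterable[ActorRef]) extends Actor {
  var currentMaxTime = 0L

  override def receive = {
    case m : Message if m.timestamp > currentMaxTime =>
      // Remember the newest timestamp so later duplicates are dropped.
      currentMaxTime = m.timestamp
      ma foreach (_ ! m)
    case _ => // duplicate or out-of-order message: ignore it
  }
}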
Now you can eliminate all of the "TimeSeriesBuffer" and "Synchronizable" logic since you know that ma, and c, will only receive time-ordered messages.
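Even with the filter in place, c still has to wait until all three metrics for a given timestamp have arrived. A minimal sketch of that join as plain immutable state that c could keep; Metric, Joiner and the source names are hypothetical and not taken from the gist:

```scala
object MetricJoiner {
  // Hypothetical message emitted by each ma actor.
  final case class Metric(source: String, timestamp: Long, value: Double)

  // Metrics received so far, grouped by timestamp and then by source.
  final case class Joiner(
      expected: Set[String],
      pending: Map[Long, Map[String, Double]] = Map.empty) {

    // Returns the new state, plus the full set of metrics for a timestamp
    // once every expected source has reported for it.
    def add(m: Metric): (Joiner, Option[(Long, Map[String, Double])]) = {
      val forTs = pending.getOrElse(m.timestamp, Map.empty[String, Double]) + (m.source -> m.value)
      if (expected.subsetOf(forTs.keySet))
        (copy(pending = pending - m.timestamp), Some(m.timestamp -> forTs))
      else
        (copy(pending = pending.updated(m.timestamp, forTs)), None)
    }
  }
}
```

c would call add for every incoming metric and only change its own state when the second element of the result is defined. A repeated metric from the same source simply overwrites the earlier value for that timestamp.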
Batch Processing
The likely reason batch processing is not very concurrent is that the mailbox of your ma Actor is being filled by the database query, and whatever processing ma does is slower than the processing in c. Therefore ma's mailbox keeps accumulating messages while c's mailbox remains relatively empty.
Thanks so much for your answer. The part with cutting off is what I also implemented in Synchronizable Trait.
// Clean up slaves if their queue is behind the master's latest element.
master_last_timestamp match {
  case Some(ts) =>
    slaves.foreach { s =>
      while (s.queue.nonEmpty && s.getElementTimestamp(s.queue.front) < ts) {
        s.dequeue()
      }
      // val els = s.dequeueAll { queue_el => s.getElementTimestamp(queue_el) < ts }
    }
  case _ => ()
}
The reason I started to implement the buffer is that I feel I will be using it a lot in the system, and I don't want to rewrite this part for each actor that needs it. It seems easier to have a blueprint that does it.
But a more important reason is that one buffer is being filled much more slowly than the other two, or not at all, even though all three are filled by the same kind of actor (just different instances, and the computation time should be pretty much the same). Only after the two other actors have emitted all the messages that were passed in from the database does the third one start receiving them. It feels to me that this one actor is just not getting processor time, so I think a dispatcher setting could be affecting this. Are you familiar with this?
I would also expect the dispatcher to work more like round-robin, giving each actor a little execution time, but it ends up serving only a limited number of actors and then jumping to the next ones, even though they should all receive their initial messages at about the same time since there is a broadcaster.
I read the Akka documentation on dispatchers and mailboxes, but I still don't understand how to configure this.
Thank you
Igor

How to limit actors per user

I am trying to understand how I could model this with akka.
My system processes messages for users, and the messages have to be processed serially for each user i.e. I cannot have 2+ actors processing a message at the same time for the same user.
So if I get 100 messages, and they are all for different users, I can spawn as many actors as I need to handle them.
But if I get 100 messages and they are for only 10 users, I have to ensure there is only 1 actor processing a given user at the same time.
How could I model this in Akka? How could I filter messages, or otherwise ensure that there is only 1 actor per user?
note: Each message will have a UserId with it.
Based on the information given, it sounds like you should consider using Cluster Sharding, and shard based on the user id. There are other solutions too, but this is probably a good fit and very simple to add based on what you've described.
One approach would be to keep a map from user id to its actor in the actor that accepts requests. Every actor has a mailbox, so it will queue incoming messages to a certain extent (it's up to you to set the mailbox size; it can even be unbounded, as far as memory allows).
In other words,
class RequestAcceptor extends Actor {
  // One processing actor per user; its mailbox serializes that user's messages.
  var users: Map[UserId, ActorRef] = Map.empty

  def receive = {
    case r @ UserRequest(userId) =>
      users.get(userId) match {
        case Some(actor) => actor ! r
        case None =>
          // Props only describes an actor; context.actorOf actually creates it.
          val actor = context.actorOf(Props(classOf[ProcessingActor]))
          users += userId -> actor
          actor ! r
      }
  }
}
Of course, you have to take care of mailbox overflow in this case, but that would be a separate question, one that is most probably better solved with Akka Streams.

An Actor "queue"?

In Java, to write a library that makes requests to a server, I usually implement some sort of dispatcher (not unlike the one found here in the Twitter4J library: http://github.com/yusuke/twitter4j/blob/master/twitter4j-core/src/main/java/twitter4j/internal/async/DispatcherImpl.java) to limit the number of connections, to perform asynchronous tasks, etc.
The idea is that N number of threads are created. A "Task" is queued and all threads are notified, and one of the threads, when it's ready, will pop an item from the queue, do the work, and then return to a waiting state. If all the threads are busy working on a Task, then the Task is just queued, and the next available thread will take it.
This keeps the max number of connections to N, and allows at most N Tasks to be operating at the same time.
I'm wondering what kind of system I can create with Actors that will accomplish the same thing? Is there a way to have N number of Actors, and when a new message is ready, pass it off to an Actor to handle it - and if all Actors are busy, just queue the message?
The Akka framework is designed to solve this kind of problem, and is exactly what you're looking for.
Look through the documentation: there are lots of highly configurable dispatchers (event-based, thread-based, load-balanced, work-stealing, etc.) that manage actors' mailboxes and allow them to work in conjunction. You may also find this blog post interesting.
E.g. this code instantiates new Work Stealing Dispatcher based on the fixed thread pool, that fulfils load balancing among the actors it supervises:
val workStealingDispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher("pooled-dispatcher")
workStealingDispatcher
.withNewThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity
.setCorePoolSize(16)
.buildThreadPool
Actor that uses the dispatcher:
class MyActor extends Actor {
messageDispatcher = workStealingDispatcher
def receive = {
case _ =>
}
}
Now, if you start 2+ instances of the actor, the dispatcher will balance the load between the mailboxes (queues) of the actors (an actor that has too many messages in its mailbox will "donate" some to the actors that have nothing to do).
Well, you have to see about the actors scheduler, as actors are not usually 1-to-1 with threads. The idea behind actors is that you may have many of them, but the actual number of threads will be limited to something reasonable. They are not supposed to be long running either, but rather quickly answering to messages they receive. In short, the architecture of that code seems to be wholly at odds with how one would design an actor system.
Still, each working actor may send a message to a Queue actor asking for the next task, and then loop back to react. This Queue actor would receive either queueing messages, or dequeuing messages. It could be designed like this:
val q: Queue[AnyRef] = new Queue[AnyRef]
loop {
  react {
    case Enqueue(d) => q enqueue d
    case Dequeue(a) if q.nonEmpty => a ! (q dequeue)
  }
}
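The react version above relies on scala.actors leaving an unmatched Dequeue in the mailbox until a task shows up. In Akka an unmatched message is not retried, so a queue actor there would need to remember idle workers itself. A minimal sketch of that bookkeeping as plain immutable state, with hypothetical names (State, enqueue, dequeue) and no actor code:

```scala
import scala.collection.immutable.Queue

object WorkQueue {
  // Tasks waiting for a worker, and workers waiting for a task.
  // At most one of the two queues is non-empty at any time.
  final case class State[T, W](
      tasks: Queue[T] = Queue.empty[T],
      idle: Queue[W] = Queue.empty[W]) {

    // A new task goes straight to an idle worker if there is one,
    // otherwise it is queued.
    def enqueue(task: T): (State[T, W], Option[(W, T)]) =
      idle.dequeueOption match {
        case Some((worker, rest)) => (copy(idle = rest), Some(worker -> task))
        case None                 => (copy(tasks = tasks.enqueue(task)), None)
      }

    // A worker asking for work gets the oldest task if there is one,
    // otherwise it is remembered as idle.
    def dequeue(worker: W): (State[T, W], Option[(W, T)]) =
      tasks.dequeueOption match {
        case Some((task, rest)) => (copy(tasks = rest), Some(worker -> task))
        case None               => (copy(idle = idle.enqueue(worker)), None)
      }
  }
}
```

Whenever either method returns an assignment, the queue actor would send that task to that worker. With N workers, at most N tasks run at once and the rest wait in tasks, which is the behaviour the question asks for.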