Suppose I have an application that uses actors to process users, with one UserActor per user. Each UserActor is mapped to its user by id, so to process actions for a particular user you obtain the actor like this:
ActorSelection actor = actorSystem.actorSelection("/user/1");
where 1 is user id.
So the problem is: how do I generate a unique id inside a cluster efficiently? First, I need to check that a new id does not duplicate an existing one. I could create a single actor for generating ids, living on one node, and ask this generator for an id before creating any new UserActor, but that means an additional request inside the cluster whenever a user is created. Is there a more efficient way to do this? Are there built-in Akka techniques for it?
P.S. Maybe this way of using actors is not effective; any suggestions or best practices are welcome.
I won't say whether or not your approach is a good idea. That's going to be up to you to decide. If I do understand your problem correctly though, then I can suggest a high level approach to making it work for you. If I understand correctly, you have a cluster, and for any given userId, there should be an actor in the system that handles requests for it, and it should only be on one node and consistently reachable based on the user id of the user. If that's correct, then consider the following approach.
Let's start with a simple actor, which we'll call UserRequestForwarder. This actor's job is to find the actor instance for a request for a particular user id and forward the request to it. If that actor instance does not yet exist, this actor creates it before forwarding. A very rough sketch could look like this:
import akka.actor.{Actor, Props}

class UserRequestForwarder extends Actor {
  def receive = {
    case req @ DoSomethingForUser(userId) =>
      val childName = s"user-request-handler-$userId"
      // Look up the per-user child by name; create it under that same name if it is missing
      val child = context.child(childName)
        .getOrElse(context.actorOf(Props[UserRequestHandler], childName))
      child forward req
  }
}
Now this actor would be deployed onto every node in the cluster via a ConsistentHashingPool router, configured so that there is one instance per node. You just need to make sure that every request travelling through this router carries something that allows it to be consistently hashed to the node that handles requests for that user (ideally the user id).
So if you pass all requests through this router, they will always land on the node that is responsible for that user, ending up in the UserRequestForwarder which will then find the correct user actor on that node and pass the request on to it.
I have not tried this approach myself, but it might work for what you are trying to do provided I understood your problem correctly.
Not an akka expert, so I can't offer code, but shouldn't the following approach work:
Have a single actor be responsible for creating the actors, and have it keep a HashSet of the names of the actors it has created that have not died yet.
If you have to spread the load between multiple actors you can dispatch the task based on the first n digits of the hashcode of the actor name that has to be created.
It seems like you have your answer on how to generate the unique ID. In terms of your larger question, this is what Akka cluster sharding is designed to solve. It will handle distributing shards among your cluster, finding or starting your actors within the cluster and even rebalancing.
http://doc.akka.io/docs/akka/2.3.5/contrib/cluster-sharding.html
There's also an Activator template with a really nice example.
http://typesafe.com/activator/template/akka-cluster-sharding-scala
I have a system that has one actor per user. Users send messages rarely, but when they do, they usually send several rather than just one.
Currently, I have a map where I store persistenceId -> ActorRef. When I receive a new message for an actor, I look into the map: if there is an ActorRef, I use it; if it is missing, I create the actor and put it into the map. I definitely don't want two instances of the same persistent actor at the same time. I also don't want to create and destroy the actor for each message, as recovery could take some time.
I feel there should be some cleaner way of "locating or creating" an actor. Something like actorSystem.getOrCreate(persistenceId, props). I thought that sharding might help me with that, but I couldn't find an exact example of this. Also, I know there is actorSelection, which has downsides:
using it in too many places, with hardcoded paths that are tricky to maintain
using it to send too many messages, as it has a performance cost
So basically the question is: what is the best way of locating a persistent actor within one service if the actor's persistenceId is the userId? If I decide to use sharding, it will be 1 shard per actor. Is this OK?
Actor sharding is pretty much what you need - you can think about it as a distributed map of actors and there is no need of having additional solutions. The sharding takes care of summoning the actor behind the scenes and there is no need for you to manage actors yourself.
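import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings}

// Starts the shard region; the returned ActorRef is the entry point for all entity messages
val sharding = ClusterSharding(system).start(
  typeName = CustomerActor.shardName,
  entityProps = CustomerActor.props,
  settings = ClusterShardingSettings(system),
  extractEntityId = CustomerActor.extractEntityId,
  extractShardId = CustomerActor.extractShardId)
where extractEntityId is a function that pulls the entity id (and the message to deliver) out of each incoming message, which is how messages get routed to the appropriate actor: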
val extractEntityId: ShardRegion.ExtractEntityId = {
case gc: GetCustomer => (gc.customerId, gc)
}
And a final example:
case class GetCustomer(customerId: String)
sharding ! GetCustomer("customer-id")
More details here: https://doc.akka.io/docs/akka/2.5/cluster-sharding.html
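The snippet above passes an extractShardId function without showing it. A common way to derive a shard id is to hash the entity id and take the result modulo a fixed number of shards, so that many entities share one shard rather than each getting its own. The following is only a sketch of that calculation in plain Java, with no Akka dependency; the class name, method name and the shard count of 100 are assumptions made for illustration.

```java
// Sketch only: plain Java, no Akka dependency. All names here are hypothetical.
final class ShardIdSketch {

    // Derives a shard id from an entity id: hash the id, then take it modulo the shard count.
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static String shardIdFor(String entityId, int numberOfShards) {
        return String.valueOf(Math.floorMod(entityId.hashCode(), numberOfShards));
    }

    public static void main(String[] args) {
        int numberOfShards = 100; // assumed value, fixed for the lifetime of the cluster
        for (String id : new String[] {"customer-1", "customer-2", "customer-3"}) {
            System.out.println(id + " -> shard " + shardIdFor(id, numberOfShards));
        }
    }
}
```

The same entity id always maps to the same shard, which is the property sharding relies on; the shard count should stay the same on every node.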
Suppose we'd have a large number of persistent Person actors, each constructed with an identity and a name argument. What would be the best way to distribute these actors in a cluster, in such a manner that:
new actors are appointed a node by strategy X (round robin, consistent hash, etc.)
a "coordinator" actor contains a mapping from identity to ActorRef
one or more nodes can fail and the affected actors are recovered on other nodes
there is no single point of failure (SPOF)
I've considered the following, which doesn't seem to solve the problem:
Cluster sharding; all actors are initialised equally and created by coordinator
Cluster aware routing; groups or pools are fixed size and can't be modified dynamically
Sounds like you pretty much exactly are describing Akka cluster sharding and there isn't enough information to see why it would not fit.
The common solution to deal with such a design problem is to have an uninitialized state of the sharded entity where it only accepts an initialize command containing the needed values (so something like CreateUser(id, name)) and when it gets that it toggles to its "normal" behavior.
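The "uninitialized until told otherwise" idea can be sketched without any Akka code. The class below is a hypothetical plain-Java model, not an actor: it knows only its id at start-up, ignores normal commands until an initialize command supplies the name, and rejects a second initialization. In a real actor the same switch would typically be done by changing the receive behavior.

```java
import java.util.Optional;

// Sketch only: models the "accept CreateUser first, then behave normally" idea in plain Java.
// PersonEntitySketch, create and greet are hypothetical names.
final class PersonEntitySketch {
    private final String id; // known up front, e.g. from the entity id
    private String name;     // unknown until the initialize command arrives

    PersonEntitySketch(String id) {
        this.id = id;
    }

    // The initialize command. Returns false if the entity was already initialized.
    boolean create(String name) {
        if (this.name != null) {
            return false;
        }
        this.name = name;
        return true;
    }

    // A normal command. Yields nothing while the entity is still uninitialized.
    Optional<String> greet() {
        if (name == null) {
            return Optional.empty();
        }
        return Optional.of("Hello, " + name + " (" + id + ")");
    }
}
```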
Another option could be to introduce an intermediate actor that doesn't start the actual actor until it has extracted the name value if you have no means to change the design of your Person actor.
Of course, you could also drop down to the Akka cluster APIs directly and build something that exactly matches your use case, but handling redistribution on cluster topology changes (adding nodes, removing nodes, etc.) is far from trivial to get right.
I think you would also come to the realisation that building such a tool so that it is entirely non-invasive for your entities, without the sharding solution being tightly coupled to your business logic, is very hard.
Cluster aware router:
val router = system.actorOf(ClusterRouterPool(
RoundRobinPool(0),
ClusterRouterPoolSettings(
totalInstances = 20,
maxInstancesPerNode = 1,
allowLocalRoutees = false,
useRole = None
)
).props(Props[Worker]), name = "router")
Here, we can send a message to the router, and the message will be sent on to one of a series of remote routee actors.
Cluster sharding (not considering persistence):
class NewShoppers extends Actor {
ClusterSharding(context.system).start(
"shardshoppers",
Props(new Shopper),
ClusterShardingSettings(context.system),
Shopper.extractEntityId,
Shopper.extractShardId
)
def proxy = {
ClusterSharding(context.system).shardRegion("shardshoppers")
}
override def receive: Receive = {
case msg => proxy forward msg
}
}
Here, we can send a message to the proxy, and the message will be sent on to one of a series of sharded actors (a.k.a. entities).
So, my question is: both methods seem to distribute tasks across a lot of actors. What is the design choice between the two? Which situation calls for which?
The pool router is for when you just want to send some work to whatever node and have some processing happen; two messages sent in sequence will likely not end up in the same actor for processing.
Cluster sharding is for when you have a unique id on each actor of some kind, and you have too many of them to fit in one node, but you want every message with that id to always end up in the actor for that id. For example modelling a User as an entity, you want all commands about that user to end up with the user but you want the actor to be moved if the cluster topology changes (remove or add nodes) and you want them reasonably balanced across the existing nodes.
Credit to johanandren and the linked article as basis for the following answer:
Both a router and sharding distribute work. Sharding is required if, in addition to load balancing, the recipient actors have to reliably manage state that is directly associated with the entity identifier.
To recap, the entity identifier is a key, derived from the message being sent, that determines the message's recipient actor in the cluster.
First of all, can you manage state associated with an identifier across different nodes using a consistently hashing router? A Consistent Hash router will always send messages with an equal identifier to the same target actor. The answer is: No, as explained below.
The hash-based method stops working when nodes in the cluster go Down or come Up, because this changes the associated actor for some identifiers. If a node goes down, messages that were associated with it are now sent to a different actor in the network, but that actor is not informed about the former state of the actor it is now replacing. Likewise, if a new node comes up, it will take care of messages (identifiers) that were previously associated with a different actor, and neither the new node nor the old node is informed about this.
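The remapping described above can be seen in a toy hash ring. This is a hypothetical plain-Java sketch, not Akka's router implementation: each node occupies one point on the ring, a key belongs to the first node at or after its hash, and removing a node silently hands that node's keys to a neighbour. The node names, key names and the hash-mixing step are assumptions chosen for the demo.

```java
import java.util.TreeMap;

// Sketch only: a minimal consistent-hash ring with one point per node. Not Akka's implementation.
final class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Arbitrary bit-mixing on top of hashCode() so that similar strings spread over the ring.
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    void addNode(String node) {
        ring.put(hash(node), node);
    }

    void removeNode(String node) {
        ring.remove(hash(node));
    }

    // Owner = first node at or after the key's hash, wrapping around to the start of the ring.
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Integer point = ring.ceilingKey(hash(key));
        return ring.get(point != null ? point : ring.firstKey());
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        String[] keys = {"user-1", "user-2", "user-3", "user-4", "user-5", "user-6"};
        String[] before = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            before[i] = ring.ownerOf(keys[i]);
        }
        ring.removeNode("node-b"); // simulate the node going Down
        for (int i = 0; i < keys.length; i++) {
            System.out.println(keys[i] + ": " + before[i] + " -> " + ring.ownerOf(keys[i]));
        }
    }
}
```

Keys owned by the surviving nodes stay where they were, while every key that belonged to the removed node now resolves to a different node, which knows nothing about the state the old owner held.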
With sharding, on the other hand, the actors that are created are aware of the entity identifier that they manage. Sharding will make sure that there is exactly one actor managing the entity in the cluster. And it will re-create sharded actors on a different node if their parent node goes down. So using persistence they will retain their (persisted) state across nodes when the number of nodes changes. You also don't have to worry about concurrency issues if an actor is re-created on a different node thanks to Sharding. Furthermore, if a message with a new entity identifier is encountered, for which an actor does not exist yet, a new actor is created.
A consistently hashing router may still be of use for caching, because messages with the same key generally do go to the same actor. To manage a stateful entity that exists only once in the cluster, Sharding is required.
Use routers for load balancing, use Sharding for managing stateful entities in a distributed manner.
I'm currently implementing a system that receives inbound messages from an external monitoring system. I'm translating these messages into more concise 'events', and I'm using these to alter the state of 'Managed System' objects. Akka Actors seemed like a good fit for encapsulating mutable state in concurrent applications.
The managed systems are identified by a name (99% of the time this is a hostname). Whenever a proper event is received, the system routes the message to the correct actor based on the name property. At first I used actorSelection and the complete paths of said actors, but that was very ugly, and I saw several people advise against relying on the fully qualified name of an actor to deliver messages.
So I've set up a simple EventBus, which is great as I can now simply do:
eventBus.subscribe(subscriber1, "/managedSystem01")
eventBus.subscribe(subscriber2, "/managedSystem02")
eventBus.publish(MonitoringEvent("/managedSystem01", MonitoringMessage("managedSystem01", "N", "CPU_LOAD_HIGH", true)))
eventBus.publish(MonitoringEvent("/managedSystem02", MonitoringMessage("managedSystem02", "Y", "DISK_USAGE_HIGH", true)))
Of course, I now have the issue that, should I receive an event that concerns a managed system for which I've not spawned an actor yet (this is entirely possible; unfortunately I cannot get an absolute list of managed systems), the message will be routed to the dead-letter mailbox.
Ideally I don't want this to happen. When it is unable to address a specific actor, I want to spawn a new one dynamically.
I suppose that, theoretically, I could subscribe to DeadLetter messages but:
That sounds a little 'hacky', since those messages are essentially reserved for the system
Is it even possible to recover the original message (in my case, the MonitoringMessage) that is sent to the DeadLetter mailbox?
Alternatively is there a way to check if there are ZERO subscribers to a certain "topic"?
What you describe ("send to Actor by some identifier, if it does not exist buffer until it gets created and then deliver to that newly on-demand created Actor") is implemented in Akka as Cluster Sharding.
While it is designed primarily for sharding load (work) across a cluster, you could use it locally as well, since your requirement is essentially a scaled-down (to one node) version of the problem that it solves. It takes care of starting new Actors if they don't exist for a given identifier, etc., so you'd simply subscribe the shard region to the events and it'll take care of creating the actors for you.
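To make the "deliver by identifier, create on demand" behaviour concrete, here is a rough single-process sketch in plain Java. It is not how Cluster Sharding is implemented; GetOrCreateRegistry and its factory parameter are hypothetical names, and the handler type stands in for whatever would process the events for one managed system.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch only: an in-memory "find the handler for this id, or create it" registry.
final class GetOrCreateRegistry<H> {
    private final ConcurrentMap<String, H> handlers = new ConcurrentHashMap<>();
    private final Function<String, H> factory; // builds a handler for an id seen for the first time

    GetOrCreateRegistry(Function<String, H> factory) {
        this.factory = factory;
    }

    // computeIfAbsent invokes the factory at most once per id, even with concurrent callers,
    // so two handlers for the same id never coexist in this registry.
    H getOrCreate(String id) {
        return handlers.computeIfAbsent(id, factory);
    }

    int size() {
        return handlers.size();
    }
}
```

A call such as `registry.getOrCreate("managedSystem01")` returns the existing handler or builds one first, so no event needs to fall through to dead letters just because its target has not been created yet.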
I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
Create a new actor for every request (by this I mean for every request should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
Create one instance of each actor on server start up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottle neck?
Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literally millions of actors on a pretty ordinary JVM heap. Also, they will only consume CPU/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-based ones.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) and 2) both have their drawbacks. So let's use option 3): Routing (Akka 2.0+).
A router is an element which acts as a load balancer, routing requests to other actors which perform the required task.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool).
Every Router may have several children and its task is to supervise their Mailbox to further decide where to route the received message.
// This creates 5 instances of the actor ExampleActor,
// managed and supervised by a round-robin pool router
ActorRef roundRobinRouter = getContext().actorOf(
    Props.create(ExampleActor.class).withRouter(new RoundRobinPool(5)), "router");
This procedure is well explained in this blog.
It's quite a reasonable option, but whether it's suitable depends on specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.
For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).
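The round-robin delegation mentioned above boils down to cycling through a fixed list of workers. The sketch below shows only that selection logic in plain Java; it is not Akka's router, and RoundRobinSketch and pick are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: picks workers in strict rotation, the way a round-robin master would delegate.
final class RoundRobinSketch<W> {
    private final List<W> workers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<W> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("need at least one worker");
        }
        this.workers = new ArrayList<>(workers); // defensive copy
    }

    // Returns the next worker in rotation; floorMod keeps the index valid after int overflow.
    W pick() {
        int index = Math.floorMod(next.getAndIncrement(), workers.size());
        return workers.get(index);
    }
}
```

Because consecutive calls land on different workers, this suits stateless work; requests that must reach the same stateful actor every time need id-based routing instead.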