Is there a way to randomly assign routes or roles for a defined number of actors in Akka? - scala

Suppose I want to implement a cluster system where some actors will be request dispatchers and others will be standard nodes. How can I randomly assign a specific route or even role to a predefined number of actors (regardless of hostname and port)?
Explaining better:
Suppose I have these nodes:
1 - akka.tcp://ClusterSystem#192.168.0.1:2551/user/clusterListener
2 - akka.tcp://ClusterSystem#192.168.0.2:2552/user/clusterListener
3 - akka.tcp://ClusterSystem#192.168.0.3:2553/user/clusterListener
4 - akka.tcp://ClusterSystem#192.168.0.4:2554/user/clusterListener
Now I want 2 of them to have the sub route "dispatcher" (akka.tcp://ClusterSystem#xxx.xxx.xxx.xxx:xxxx/user/clusterListener/dispatcher)

You can use a cluster singleton (http://doc.akka.io/docs/akka/2.3.0/contrib/cluster-singleton.html) for coordination.
Every actor without a role can send a "GetRole" message to the singleton, which will pick a role at random (using some internal RoleMap). Note that the singleton should also listen for member-removed events so it can free a role when the node that obtained it leaves the cluster.
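For illustration, a minimal sketch of such a singleton coordinator, assuming hypothetical GetRole/AssignedRole messages and a simple list of free roles (these names are illustrative, not an Akka API):

import akka.actor.{Actor, Address}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.MemberRemoved
import scala.util.Random

case object GetRole
final case class AssignedRole(role: String)

// Intended to run as a cluster singleton (e.g. started via ClusterSingletonManager).
class RoleCoordinator extends Actor {
  val cluster = Cluster(context.system)

  // Roles still free to hand out: two "dispatcher" slots plus plain nodes.
  var freeRoles: List[String] = List("dispatcher", "dispatcher", "standard", "standard")
  // Remember which node address got which role so it can be freed on MemberRemoved.
  var assigned: Map[Address, String] = Map.empty

  override def preStart(): Unit = cluster.subscribe(self, classOf[MemberRemoved])
  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case GetRole if freeRoles.isEmpty =>
      sender() ! AssignedRole("standard")          // all special slots are taken

    case GetRole =>
      val i = Random.nextInt(freeRoles.size)       // pick one of the free roles at random
      val role = freeRoles(i)
      freeRoles = freeRoles.patch(i, Nil, 1)
      assigned += sender().path.address -> role
      sender() ! AssignedRole(role)

    case MemberRemoved(member, _) =>
      assigned.get(member.address).foreach { role =>
        freeRoles = role :: freeRoles              // free the role for a future node
        assigned -= member.address
      }
  }
}

A clusterListener that receives AssignedRole("dispatcher") can then create the corresponding child actor, e.g. context.actorOf(Props[Dispatcher], "dispatcher"), which yields the .../user/clusterListener/dispatcher path from the question.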

Related

Changing number of partitions for a reliable actor service

When I create a new Service Fabric actor, the underlying (auto-generated) actor service is configured to use 10 partitions.
I'm wondering how much I need to care about this value.
In particular, I wonder whether the Actor Runtime has support for changing the number of partitions of an actor service on a running cluster.
The Partition Service Fabric reliable services topic says:
In rare cases, you may end up needing more partitions than you have initially chosen. As you cannot change the partition count after the fact, you would need to apply some advanced partition approaches, such as creating a new service instance of the same service type. You would also need to implement some client-side logic that routes the requests to the correct service instance, based on client-side knowledge that your client code must maintain.
However, given the nature of Actors and the fact that they are managed by the Actor Runtime, I'm tempted to believe it would indeed be possible, and that the Actor Runtime could take care of all the heavy lifting required to re-partition actor instances.
Is that at all possible?
The number of partitions in a running service cannot be changed. This is true of Actors as well as Reliable Services. Typically, you would want to pick a large number of partitions (more than the number of nodes) up front and then scale out the number of nodes in the cluster instead of trying to repartition your data on the fly. Take a look at Abhishek and Matthew's comments in the discussion here for some ideas on how to estimate how many partitions you might need.

How can I reach a specific replica of a stateless service

I've created a stateless service within Service Fabric. It has a SingletonPartition, but multiple instances (InstanceCount is -1 in my case).
I want to communicate with a specific replica of this service. To find all replicas I use:
var fabricClient = new FabricClient();
var serviceUri = new Uri(SERVICENAME);
Partition partition = (await fabricClient.QueryManager.GetPartitionListAsync(serviceUri)).First();
foreach (Replica replica in await fabricClient.QueryManager.GetReplicaListAsync(partition.PartitionInformation.Id))
{
    // communicate with this replica, but how to construct the proxy?
    //var eventHandlerServiceClient = ServiceProxy.Create<IService>(new Uri(replica.ReplicaAddress));
}
The problem is that there is no overload of the ServiceProxy to create one to the replica. Is there another way to communicate with a specific replica?
Edit
The scenario we are building is the following. We have different moving parts with counter information: 1 named partitioned stateful service (with a couple of hundred partitions), 1 int64 partitioned stateful service, and 1 actor with state. To aggregate the counter information, we need to reach out to all service-partitions and actor-instances.
We could of course reverse it and let everyone send their counts to a single (partitioned) service, but that would add a network call to the normal flow (and thus overhead).
Instead, we came up with the following. The mentioned services & actors are combined into one executable and one service manifest, so they run in the same process. Alongside them we add a stateless service with InstanceCount -1. All counter information is stored in a static variable, which the stateless service can read.
Now we only need to reach out to the stateless service (whose instance count is bounded by the number of nodes).
Just to get some terminology out of the way first, "replica" only applies to stateful services where you have a unique replica set for each partition of a service and replicate state between them for HA. Stateless services just have instances, all of which are equal and identical.
Now to answer your actual question: ServiceProxy doesn't have an option to connect to a specific instance of a deployed stateless service. You have the following options:
Primary replica: connect to the primary replica of a stateful service partition.
Random instance: connect to a random instance of a stateless service.
Random replica: connect to a random replica - regardless of its role - of a stateful service partition.
Random secondary replica: connect to a random secondary replica of a stateful service partition.
E.g.:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
So why no option to connect to a specific stateless service instance?
Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance? By definition, each stateless instance should be identical. If you are keeping some state in there - like user sessions - then now you're stateful and should use stateful services.
You might think of intelligently deciding which instance to connect to for load balancing, but again since it's stateless, no instance should be doing more work than any other as long as requests are distributed evenly. And for that, Service Proxy has the random distribution option.
With that in mind, if you still have some reason to seek out specific stateless service instances, you can always use a different communication stack - like HTTP - and do whatever you want.
"Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance?"
One example: you have multiple (3x) stateless service instances, each holding WebSocket connections to different clients, say 500 each, and you want to notify all 1500 (500x3) users with the same message. If it were possible to connect directly to a specific instance (which I would have expected, since I can query for those instances using the FabricClient), I could send the message to each instance, which would then relay it to all of its connected clients.
Instead we have to come up with any of multiple workarounds:
Have all instances connect to some evented system that allows them to trigger on incoming message, e.g. Azure Event Hubs, Azure Service Bus, RedisCache.
Host an additional endpoint, as mentioned here, which makes it 3 endpoints per service instance: WCF, WebSocket, HTTP.
Change to a stateful partitioned service which doesn't hold any state or any replicas, but simply lets us address individual partitions.
We're currently having some serious issues with RedisCache, so we're migrating away from it, and we'd like to avoid external dependencies such as Event Hubs and Service Bus just for this scenario.
We also send many messages each second, which would add overhead if we had to go through HTTP first and then hand the request over to the WebSocket context.
In order to target a specific instance of a stateless service you can use named partitions. You can have a single instance per partition and use multiple named partitions. For example, you can have 5 named partitions [0,1,2,3,4], each of which will have only one instance of the "service". Then you can call it like this:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
where the partitionKey parameter takes one of the values [0,1,2,3,4].
A real example would be:
_proxyFactory.CreateServiceProxy<IMyService>(
    _myServiceUri,
    new ServicePartitionKey("0"), // one of "0", "1", "2", "3", "4"
    TargetReplicaSelector.Default,
    MyServiceEndpoints.ServiceV1);
This way you can choose one of the 5 instances. But all 5 instances may not always be available, for example during startup, or when the service dies and Service Fabric is recreating it, or while it is in the InBuild stage... So for this reason you should run partition discovery.

How to generate a unique id for an Actor?

Suppose I have an application that uses actors for processing users, so there is one UserActor per user. Every user actor is mapped to a user via an id, e.g. to process actions for a concrete user you obtain the actor like this:
ActorSelection actor = actorSystem.actorSelection("/user/1");
where 1 is the user id.
So the problem is: how do I generate unique ids inside the cluster efficiently? First, it needs to be checked that a new id does not duplicate an existing one. I could create a single id-generating actor that lives on one node and is asked for an id before any new UserActor is created, but this adds an extra request inside the cluster whenever a user is created. Is there a more efficient way to do this? Are there built-in Akka techniques for that?
P.S. Maybe this actor-based architecture is not effective at all; any suggestion/best practice is welcome.
I won't say whether or not your approach is a good idea. That's going to be up to you to decide. If I do understand your problem correctly though, then I can suggest a high level approach to making it work for you. If I understand correctly, you have a cluster, and for any given userId, there should be an actor in the system that handles requests for it, and it should only be on one node and consistently reachable based on the user id of the user. If that's correct, then consider the following approach.
Let's start first with a simple actor, let's call it UserRequestForwarder. This actor's job is to find the actor instance that handles requests for a particular user id and forward the request on to it. If that actor instance does not yet exist, this actor will create it before forwarding. A very rough sketch could look like this:
class UserRequestForwarder extends Actor {
  def receive = {
    case req @ DoSomethingForUser(userId) =>
      val childName = s"user-request-handler-$userId"
      // create the per-user handler on first use, reuse it afterwards
      val child = context.child(childName)
        .getOrElse(context.actorOf(Props[UserRequestHandler], childName))
      child forward req
  }
}
Now this actor would be deployed onto every node in the cluster via a ConsistentHashingPool router configured in such a way that there would be one instance per node. You just need to make sure that there is something in every request that needs to travel through this router that allows it to be consistently hashed to the node that handles requests for that user (hopefully using the user id)
So if you pass all requests through this router, they will always land on the node that is responsible for that user, ending up in the UserRequestForwarder which will then find the correct user actor on that node and pass the request on to it.
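A rough sketch of that deployment, assuming the DoSomethingForUser message and the UserRequestForwarder above (a cluster-aware pool capped at one routee per node, hashing on the user id):

import akka.actor.{ActorSystem, Props}
import akka.cluster.routing.{ClusterRouterPool, ClusterRouterPoolSettings}
import akka.routing.ConsistentHashingPool
import akka.routing.ConsistentHashingRouter.ConsistentHashMapping

object UserRouting {
  // Hash every request by its userId so the same user always lands on the same node.
  val hashMapping: ConsistentHashMapping = {
    case DoSomethingForUser(userId) => userId
  }

  def createRouter(system: ActorSystem) =
    system.actorOf(
      ClusterRouterPool(
        ConsistentHashingPool(nrOfInstances = 0, hashMapping = hashMapping),
        ClusterRouterPoolSettings(
          totalInstances = 100,      // upper bound across the whole cluster
          maxInstancesPerNode = 1,   // exactly one UserRequestForwarder per node
          allowLocalRoutees = true,
          useRole = None)
      ).props(Props[UserRequestForwarder]),
      name = "userRequestRouter")
}

All requests then go through "userRequestRouter" instead of directly to the forwarders.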
I have not tried this approach myself, but it might work for what you are trying to do provided I understood your problem correctly.
Not an akka expert, so I can't offer code, but shouldn't the following approach work:
Have a single actor be responsible for creating the actors, and have it keep a HashSet of the names of the actors it created that haven't died yet.
If you have to spread the load between multiple actors you can dispatch the task based on the first n digits of the hashcode of the actor name that has to be created.
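A minimal sketch of that approach, assuming a hypothetical UserActor class and an illustrative message protocol:

import akka.actor.{Actor, ActorRef, Props, Terminated}

final case class CreateUser(name: String)
final case class UserCreated(ref: ActorRef)
final case class AlreadyExists(name: String)

// Single actor responsible for creating user actors and tracking their names.
class UserCreator extends Actor {
  var existing = Set.empty[String] // names of actors created and still alive

  def receive = {
    case CreateUser(name) if existing(name) =>
      sender() ! AlreadyExists(name)

    case CreateUser(name) =>
      val ref = context.actorOf(Props[UserActor], name)
      context.watch(ref)             // so dead actors can be dropped from the set
      existing += name
      sender() ! UserCreated(ref)

    case Terminated(ref) =>
      existing -= ref.path.name
  }
}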
It seems like you have your answer on how to generate the unique ID. In terms of your larger question, this is what Akka cluster sharding is designed to solve. It will handle distributing shards among your cluster, finding or starting your actors within the cluster and even rebalancing.
http://doc.akka.io/docs/akka/2.3.5/contrib/cluster-sharding.html
There's also an activator with a really nice example.
http://typesafe.com/activator/template/akka-cluster-sharding-scala
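For illustration, wiring the user actors into the contrib cluster sharding module from the linked docs might look roughly like this (UserCommand and UserActor are placeholder names):

import akka.actor.{ActorSystem, Props}
import akka.contrib.pattern.{ClusterSharding, ShardRegion}

final case class UserCommand(userId: Long, payload: String)

object UserShardingExample extends App {
  val system = ActorSystem("ClusterSystem")

  // Extract the entry (actor) id and the shard id from incoming messages.
  val idExtractor: ShardRegion.IdExtractor = {
    case cmd @ UserCommand(userId, _) => (userId.toString, cmd)
  }
  val shardResolver: ShardRegion.ShardResolver = {
    case UserCommand(userId, _) => (userId % 100).toString // 100 shards
  }

  val userRegion = ClusterSharding(system).start(
    typeName = "UserActor",
    entryProps = Some(Props[UserActor]),
    idExtractor = idExtractor,
    shardResolver = shardResolver)

  // The region finds (or starts) the actor for user 42, wherever it lives in the cluster.
  userRegion ! UserCommand(42L, "do something")
}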

How to manage Squid based on per-user bandwidth

I want to manage bandwidth and traffic based on user activity on a Squid proxy server.
I did some research but couldn't find the solution I want.
For example, users who generate more than 256K of traffic should be restricted by the server.
Can you help me?
Thanks
I'm assuming Squid 3.x.
Delay pools provide a way to limit the bandwidth of certain requests based on any list of criteria.
class:
the class of a delay pool determines how the delay is applied, ie, whether the different client IPs are treated separately or as a group (or both)
class 1:
a class 1 delay pool contains a single unified bucket which is used for all requests from hosts subject to the pool
class 2:
a class 2 delay pool contains one unified bucket and 255 buckets, one for each host on an 8-bit network (IPv4 class C)
class 3:
contains 255 buckets for the subnets in a 16-bit network, and individual buckets for every host on these networks (IPv4 class B)
class 4:
as class 3 but in addition have per authenticated user buckets, one per user.
class 5:
custom class based on tag values returned by external_acl_type helpers in http_access. One bucket per used tag value.
Delay pools allow you to limit traffic for clients or client groups, with various features:
Can specify peer hosts which aren't affected by delay pools, i.e. local peering or other 'free' traffic (with the no-delay peer option).
Delay behavior is selected by ACLs (low and high priority traffic, staff vs students or student vs authenticated student and so on).
Each group of users has a number of buckets; a bucket has an amount coming into it each second and a maximum amount it can grow to; when it reaches zero, object reads are deferred until one of the object's clients has some traffic allowance.
Any number of pools can be configured with a given class, and any set of limits within the pools can be disabled; for example you might only want to use the aggregate and per-host bucket groups of class 3, not the per-network one.
In your case you can use a class 4 delay pool:
delay_class pool 4
delay_parameters pool aggregate network individual user
This delay pool can be configured in your Squid proxy server so that, for example, each user is limited to 128Kbit/s no matter how many workstations they are logged into:
delay_pools 1
delay_class 1 4
delay_access 1 allow all
delay_parameters 1 32000/32000 8000/8000 600/64000 16000/16000
Please read more:
http://wiki.squid-cache.org/Features/DelayPools
http://www.squid-cache.org/Doc/config/delay_parameters/

Select remote actor on random port

In Scala, if I register a remote actor using alive(0), the actor is registered at a random port.
I can do the registration like this: register('fooParActor, self) in the act method.
Now, on the master/server side, I can select an actor by supplying the port. Do I need to manually scan the ports in order to use random ports?
The problem I am trying to solve is to create n actors on a node and then select them all from a master/server program, e.g. start 10 slaves on node x and get a list of 10 remote actors at node y.
How is this done?
There's no need to register various ports for the actors. Instead you need one port for the whole actor system - more precisely the Akka kernel (which the server needs to know too). See this page of the documentation for how all of this works in detail.
In order to select a remote actor you can then look it up via its path in the remote actor system, similarly to something like this:
context.actorFor("akka://actorSystemName#10.0.0.1:2552/user/someActorName/1")
In that case, you would have created the n actors as children of the someActorName actor and given them the names 1 to n (so you could get the others via .../someActorName/2, .../someActorName/3 and so on).
There's no need to randomize anything at all here and given how you described the problem, there is also no need for randomization within that. You simply start the 10 actors up and number them from 1 to 10. Any random numbers would just unnecessarily complicate things.
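For illustration, creating and naming those children (SlaveActor is a placeholder class) could be as simple as:

import akka.actor.{Actor, Props}

class ParentActor extends Actor { // registered under the name someActorName
  // create ten children named "1" to "10"
  val slaves = (1 to 10).map(i => context.actorOf(Props[SlaveActor], name = i.toString))

  def receive = {
    case msg => // ... parent logic here
  }
}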
As for really random ports, I can only agree with sourcedelica: you need a fixed port to communicate the random ones, or some other means of communication. If a client doesn't know where to connect because of the random port, it simply won't work.
You need to have at least one ActorSystem with a well known port. Then the other ActorSystems can use port 0 to have Akka assign a random port. The slave ActorSystems will have actors register with an actor on the Master so it knows all of the remote systems.
If you absolutely need to have your master use a random port it will need to communicate its port out of band (using a shared filesystem or database).
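A minimal sketch of that registration pattern (the MasterSystem address, the "master" actor name, and the messages below are illustrative assumptions):

import akka.actor.{Actor, ActorRef}

case object RegisterWorker
final case class Workers(refs: Set[ActorRef])

// Runs in the master ActorSystem, which binds to a well-known port (e.g. 2552).
class Master extends Actor {
  var workers = Set.empty[ActorRef]
  def receive = {
    case RegisterWorker => workers += sender()    // remember the remote worker
    case "list-workers" => sender() ! Workers(workers)
  }
}

// Runs in each slave ActorSystem, configured with akka.remote.netty.tcp.port = 0 (random port).
class Worker extends Actor {
  val master = context.actorSelection(
    "akka.tcp://MasterSystem@10.0.0.1:2552/user/master") // fixed, well-known address

  override def preStart(): Unit = master ! RegisterWorker

  def receive = {
    case work => // ... handle work sent by the master
  }
}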