Akka and ReactiveMongo - mongodb

I am trying to find the best approach to sharing the same pool of connections between actors within the cluster workers. I have the following structure:
Master Actor -> Worker Actors(can be up to 100 or more) -> MongoDB
Between the workers and MongoDB I want to put ReactiveMongo, but I am not sure how exactly to share a connection pool between all the actors.
According to reactivemongo documentation:
A MongoDriver instance manages an actor system; a connection manages a pool of connections. In general, MongoDriver or MongoConnection are never instantiated more than once. You can provide a list of one or more servers; the driver will guess if it's a standalone server or a replica set configuration. Even with one replica node, the driver will probe for other nodes and add them automatically.
Should I just create it in the Master actor and then bundle it with each message?
So, this would be in the Master actor:
val driver = new MongoDriver
val connection = driver.connection(List("localhost"))
And then I pass the connection to the actors in a message. Or should I obtain a connection in each Worker actor and pass just the driver in a message?
Any help is greatly appreciated.
Thanks.

I would create the driver and connection in the master actor. I would then set up the worker actors to take an instance of MongoConnection as a constructor argument, so that each worker has a reference to the connection (which is really a proxy to a pool of connections). Then, in something like preStart, have the master actor create the workers (which I am assuming are routed) and supply the connection as an arg. A very simplified example could look like this:
import akka.actor.{Actor, Props}
import akka.routing.FromConfig
import reactivemongo.api.{MongoConnection, MongoDriver}

class MongoMaster extends Actor {
  val driver = new MongoDriver
  val connection = driver.connection(List("localhost"))

  override def preStart() = {
    // Hand the shared connection to the routed workers as a constructor arg
    context.actorOf(Props(classOf[MongoWorker], connection).withRouter(FromConfig()), "workerRouter")
  }

  def receive = {
    // do whatever you need here
    case _ => ()
  }
}

class MongoWorker(conn: MongoConnection) extends Actor {
  def receive = {
    case _ => ()
  }
}
This code is not exact, but at least it shows the high level concepts I described.
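One thing worth noting: FromConfig() means the router itself is defined in application.conf, so a deployment entry must exist for the worker path. A minimal sketch, assuming the master is started as /user/mongoMaster (the router type and pool size here are illustrative, not prescribed):
akka.actor.deployment {
  /mongoMaster/workerRouter {
    router = round-robin-pool
    nr-of-instances = 100
  }
}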

The answer by cmbaxter works as long as you don't need to instantiate the worker actors remotely. MongoConnection is not serializable.
I found this article https://github.com/eigengo/akka-patterns/wiki/Configuration very helpful. The basic idea is to implement a trait called Configured, which is populated by the main application. The actors can then use that trait to gain access to local, non-serializable objects such as MongoConnection.
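As a rough sketch of that idea (my own simplified reconstruction, not the article's actual code): a small global registry is populated once per node by the main application, and a Configured trait lets any local actor look up non-serializable objects instead of receiving them in messages.
import scala.reflect.ClassTag
import akka.actor.Actor
import reactivemongo.api.MongoConnection

// Simplified take on the Configured pattern; the article's version is more general.
object GlobalConfiguration {
  @volatile private var entries = Map.empty[Class[_], Any]

  def register[A: ClassTag](value: A): Unit =
    entries += (implicitly[ClassTag[A]].runtimeClass -> value)

  def lookup[A: ClassTag]: A =
    entries(implicitly[ClassTag[A]].runtimeClass).asInstanceOf[A]
}

trait Configured {
  def configured[A: ClassTag]: A = GlobalConfiguration.lookup[A]
}

// In the main application, on every node, before actors start:
//   GlobalConfiguration.register(connection)

// Any local actor can then obtain the connection without it ever being serialized:
class MongoWorker extends Actor with Configured {
  val conn = configured[MongoConnection]
  def receive = { case _ => () }
}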

Related

when we initialize actor system and create an actor using actorOf method, how many actors are getting created?

I have 2 questions:
How many actors does the below code create?
How do I create 1000 actors at the same time?
val system = ActorSystem("DonutStoreActorSystem")
val donutInfoActor = system.actorOf(Props[DonutInfoActor], name = "DonutInfoActor")
When you start the classic actor system and use actorOf like that, it will create one of your DonutInfoActor actors plus a few internal Akka system actors related to the event bus, logging, and clustering if you are using that.
Just as texasbruce said in a comment, a loop lets you create any number of actors from a single spot. Startup is async, so you will get back an ActorRef that is ready to use, but the actor it references may still be starting up.
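For the second question, a minimal sketch of that loop, reusing the DonutInfoActor from your sample (each actor just needs a unique name):
val donutActors: Seq[ActorRef] = (1 to 1000).map { i =>
  system.actorOf(Props[DonutInfoActor], name = s"DonutInfoActor-$i")
}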
Note that if you are building something new, we recommend the new "typed" actor APIs, completed in Akka 2.6, over the classic API used in your sample.

Scala Play 2.5 with Slick 3 and Specs2

I have a Play application using Slick that I want to test with Specs2, but I keep getting the error org.postgresql.util.PSQLException: FATAL: sorry, too many clients already. I have tried to shut down the database connection by using
val mockApp = new GuiceApplicationBuilder()
val db = mockApp.injector.instanceOf[DBApi].database("default")
...
override def afterAll = {
  db.getConnection().close()
  db.shutdown()
}
But the error persists. The Slick configuration is
slick.dbs.default.driver="slick.driver.PostgresDriver$"
slick.dbs.default.db.driver="org.postgresql.Driver"
slick.dbs.default.db.url="jdbc:postgresql://db:5432/hygge_db"
slick.dbs.default.db.user="*****"
slick.dbs.default.db.password="*****"
getConnection on DBApi either takes a connection from the underlying data source's pool (a JdbcDataSource, I presume) or creates a new one. I see no pool specified in your configuration, so I think it always creates a new one for you. So if you didn't close the connection inside the test, getConnection won't help: it will just create a new connection or take a random one from the pool (if pooling is enabled).
So the solution is to either configure connection pooling:
When using a connection pool (which is always recommended in
production environments) the minimum size of the connection pool
should also be set to at least the same size. The maximum size of the
connection pool can be set much higher than in a blocking application.
Any connections beyond the size of the thread pool will only be used
when other connections are required to keep a database session open
(e.g. while waiting for the result from an asynchronous computation in
the middle of a transaction) but are not actively doing any work on
the database.
so you can just cap the pool size in your config. Note that in Slick 3 the connectionPool key selects the pool implementation (e.g. HikariCP), while the size is controlled by keys such as:
slick.dbs.default.db.numThreads = 5
slick.dbs.default.db.maxConnections = 5
Or you can share the same connection between all tests (you'll probably have to ensure they run sequentially then):
object SharedConnectionForAllTests {
  val connection = db.getConnection()
  def close() = connection.close()
}
It's better to inject it with Spring/Guice of course, so you can conveniently manage the connection's lifecycle.

Akka Actor internal state during shard migration in a cluster

We are using Akka sharding to distribute our running actors across several nodes. Those actors are persistent and we keep their internal state in the database.
Now we need to give each sharded actor an ActorRef to a "metrics actor" running on each node. Each actor in a shard is supposed to send telemetry data to a metrics actor, and it must choose the right one: the metrics actor running locally on its own node. The reason is that each metrics actor gathers data per node.
Now, I was just thinking of creating the metrics actor in the Main method (which runs initially on each node):
val mvMetrics : ActorRef = system.actorOf(MetricsActor("mv"), "mvMetrics")
and then passing that reference to the ClusterSharding initialisation as part of the actor's Props object:
ClusterSharding(system).start(
  typeName = shardName,
  entityProps = MyShardActor.props(mvMetrics),
  settings = ClusterShardingSettings(system),
  extractEntityId = idExtractor,
  extractShardId = shardResolver)
My question is: what happens if actors created this way migrate between nodes, e.g. from node A to node B? I would imagine that the migrated Props object on node B remains the same as on node A, so the ActorRef stays the same and the newly created actor will keep sending metrics data to the original node A?
Thanks
How about taking advantage of ActorRef.path? Imagine that each node has its metrics actor named in a certain way; an entity can then dynamically find the relevant local metrics actor using that path, as sketched below.
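A minimal sketch of that idea (actor names are illustrative): every node registers its metrics actor under the same local path, and the sharded entity resolves it against its own node's actor system, so after a migration it automatically finds the metrics actor of the new node.
import akka.actor.Actor

// In Main, on every node: start the metrics actor at a well-known local path
// val mvMetrics = system.actorOf(MetricsActor("mv"), "mvMetrics")

class MyShardActor extends Actor {
  // Resolved against the local ActorSystem, not baked into Props at startup,
  // so it always points at the metrics actor of the node the entity runs on.
  val metrics = context.actorSelection("/user/mvMetrics")

  def receive = {
    case msg => metrics ! msg // forward telemetry to the local metrics actor
  }
}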

Actor lookup in an Akka Cluster

I have a Scala application where I have several nodes. Each node has an ActorSystem with a main actor and each actor must have some ActorRef to certain actors (for example "Node 1" has "Actor3" and "Actor3" needs the ActorRef for "Actor7" and "Actor8" to do its work). My problem is that I don't know if another node ("Node2") has the "Actor1" or the "Actor7" I'm looking for.
My idea was to loop on every MemberUp event, using ActorSelection several times to ask each new member whether it has the actors I'm looking for. Is this the only way to do it? Is there a way to do this more efficiently?
An alternative approach to ActorSelection can be a lookup table. If you need to make lots of actor selections and actor creation is not very dynamic, it can be the better solution.
On each node you can keep a data structure like Map[String, List[String]], where the key is the node name and the value is the list of actor names living on that node.
The trick is that whenever the actors on a node change (creation, stopping), that node should notify all the other nodes about the change, so that every node holds a synchronised, up-to-date map; a sketch of such a protocol follows the lookup snippet below.
If you guarantee that, then each node can check an actor's existence locally:
map.get(nodeName) match {
  case Some(actors) => actors.contains(actorName)
  case None => false
}
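A rough sketch of what that notification protocol could look like (message and actor names are my own invention, not a standard API): each node runs one registry actor that applies local changes and fans them out to its peers, keeping the map eventually consistent.
import akka.actor.{Actor, ActorRef}

// Hypothetical messages for keeping the lookup table in sync
case class LocalChange(created: Boolean, actorName: String)
case class RemoteChange(nodeName: String, created: Boolean, actorName: String)

class RegistryActor(nodeName: String, peers: () => Set[ActorRef]) extends Actor {
  var map = Map.empty[String, List[String]].withDefaultValue(Nil)

  private def update(node: String, created: Boolean, name: String): Unit =
    map += node -> (if (created) name :: map(node) else map(node).filterNot(_ == name))

  def receive = {
    case LocalChange(created, name) =>
      update(nodeName, created, name)
      // Fan the change out so the other nodes stay synchronised
      peers().foreach(_ ! RemoteChange(nodeName, created, name))
    case RemoteChange(node, created, name) =>
      update(node, created, name)
  }
}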
I've solved a very similar problem in our cluster by having a DiscoveryActor at a known path on every node. The protocol of the DiscoveryActor has
Register(name, actorRef)
Subscribe(name)
Up(name, actorRef)
Down(name, actorRef)
Each named actor sends a Register to its local DiscoveryActor, which in turn broadcasts the Up to all local subscribers and to all other DiscoveryActors on other nodes, which in turn broadcast it to their subscribers.
The DiscoveryActor watches MemberUp/MemberDown to determine when to look for a new peer DiscoveryActor and broadcast its local registrations to it, or to broadcast Down for the registrations of downed peers.
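A reconstruction of that protocol as code (my sketch of the description above, not the author's actual implementation; peer management via MemberUp/MemberDown is elided):
import akka.actor.{Actor, ActorRef}

case class Register(name: String, ref: ActorRef)
case class Subscribe(name: String)
case class Up(name: String, ref: ActorRef)
case class Down(name: String, ref: ActorRef)

class DiscoveryActor extends Actor {
  var registrations = Map.empty[String, ActorRef]
  var subscribers = Map.empty[String, Set[ActorRef]].withDefaultValue(Set.empty)
  var peers = Set.empty[ActorRef] // peer DiscoveryActors, maintained from MemberUp/MemberDown

  def receive = {
    case Register(name, ref) =>
      registrations += name -> ref
      val up = Up(name, ref)
      subscribers(name).foreach(_ ! up) // notify local subscribers
      peers.foreach(_ ! up)             // fan out to peer DiscoveryActors
    case Subscribe(name) =>
      subscribers += name -> (subscribers(name) + sender())
      registrations.get(name).foreach(ref => sender() ! Up(name, ref))
    case up @ Up(name, ref) =>          // forwarded from a peer
      registrations += name -> ref
      subscribers(name).foreach(_ ! up)
    case down @ Down(name, _) =>
      registrations -= name
      subscribers(name).foreach(_ ! down)
  }
}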

Akka: "Trying to deserialize a serialized ActorRef without an ActorSystem in scope" error

I am integrating Akka actors and Spark in the following way: when a task is distributed among the Spark nodes, each node, while processing its task, also periodically sends metrics data to a separate collector process that sits somewhere else on the network, through an Akka actor (connected to the remote process via akka-remote).
The actor-based metrics sending/receiving functionality works just fine when used standalone, but when integrated into a Spark task the following error is thrown:
java.lang.IllegalStateException: Trying to deserialize a serialized ActorRef without an ActorSystem in scope. Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'
at akka.actor.SerializedActorRef.readResolve(ActorRef.scala:407) ~[akka-actor_2.10-2.3.11.jar:na]
If I understood it correctly, the source of the problem is that the Spark node is unable to deserialize the ActorRef because it does not have the full information required to do so. I understand that putting an ActorSystem in scope would fix it, but I am not sure how to use the suggested akka.serialization.Serialization.currentSystem.withValue(system) { ... }
The official Akka docs are very good on pretty much all the topics they cover. Unfortunately, the chapter devoted to serialization could be improved, IMHO.
Note: there is a similar SO question here but the accepted solution is too specific and thus not really useful in the general case
An ActorSystem is responsible for all of the functionality involved with ActorRef objects.
When you program something like
actorRef ! message
You're actually invoking a bunch of work within the ActorSystem, not the ActorRef, to put the message in the right mailbox, tee up the Actor to run the receive method within the thread pool, etc. From the documentation:
An actor system manages the resources it is configured to use in order
to run the actors which it contains. There may be millions of actors
within one such system, after all the mantra is to view them as
abundant and they weigh in at an overhead of only roughly 300 bytes
per instance. Naturally, the exact order in which messages are
processed in large systems is not controllable by the application
author
That is why your code works fine standalone but not in Spark. Each of your Spark nodes is missing the ActorSystem machinery; therefore, even if you could deserialize the ActorRef on a node, there would be no ActorSystem to process the ! in your node function.
You can establish an ActorSystem within each node and either (i) use remoting to send messages to your ActorRef in the "master" ActorSystem via actorSelection, or (ii) use the serialization method you mentioned, where each node's ActorSystem would be the system in the example you quoted.
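A rough sketch of option (i), assuming an Akka 2.3-era setup to match your stack trace; the system names, host, port, actor path, and the rdd value are all placeholders for whatever your job actually uses:
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Inside the Spark job, per partition: build a local ActorSystem and look the
// remote collector up by address instead of shipping an ActorRef with the task.
rdd.foreachPartition { records =>
  val system = ActorSystem("WorkerSystem", ConfigFactory.load())
  // The address must match the collector system's actual remote binding
  val collector = system.actorSelection(
    "akka.tcp://CollectorSystem@collector-host:2552/user/metricsCollector")
  records.foreach { record =>
    collector ! record.toString // or whatever metrics message type you use
  }
  system.shutdown() // Akka 2.3 API; newer versions use terminate()
}
In practice you would cache one ActorSystem per executor JVM rather than creating one per partition, since starting a system is expensive.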