How could I pre-start entities at cluster startup? I have found a way to do so, but I don't think it is the right approach. It consists of sending a StartEntity(entityId) message to the shard region on every node. Suppose I have 1000 entities to initialize: that seems very inefficient (an explosion of messages in the cluster, since every node tries to initialize the remote entity)!
val shardRegion: ActorRef[ShardingEnvelope[Command]] =
  sharding.init(Entity(HelloServiceEntity)(createBehavior = ctx => HelloWorldService()))

Seq("S0", "S1").foreach { id =>
  shardRegion ! StartEntity(id)
}
Is there any efficient way to achieve what I want? I could not find an official post or documentation about it. Am I doing this wrong?
I had an idea! I could use a Cluster Singleton whose job would be to initialize the entities. That's the most efficient way I came up with without getting into internals and having to propose a pull request :joy:
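For what it's worth, here is a minimal sketch of that singleton idea, assuming the Akka Typed APIs from the snippet above, with the ActorSystem in scope as system; EntityWarmup is a hypothetical name and Command is the entity's command type. Since the singleton runs on exactly one node, each StartEntity message is sent only once cluster-wide:

import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.typed.{ClusterSingleton, SingletonActor}
import akka.cluster.sharding.typed.ShardingEnvelope
import akka.cluster.sharding.typed.scaladsl.StartEntity

// Hypothetical warm-up behavior: sends StartEntity once per id, then stays idle.
object EntityWarmup {
  def apply(region: ActorRef[ShardingEnvelope[Command]], ids: Seq[String]): Behavior[Nothing] =
    Behaviors.setup[Nothing] { _ =>
      ids.foreach(id => region ! StartEntity[Command](id))
      Behaviors.empty
    }
}

// Registered as a cluster singleton, so only one node performs the warm-up.
ClusterSingleton(system).init(
  SingletonActor(EntityWarmup(shardRegion, Seq("S0", "S1")), "EntityWarmup"))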
Related
I'm working on a Java/Vert.x project where the backend is MongoDB (I worked with Elixir/Erlang for some time, and I'm quite new to Vert.x, but I believe it's the best fit). Basically, I have an HTTP API handled by some HttpServerVerticles which need to store data to (or retrieve data from) the Mongo DB and send the appropriate reply to the API caller. I'm looking for the right pattern to implement the queries and the handling of the replies.
From the official guide and some tutorials, I see that for a relational JDBC database it is necessary to define a dedicated verticle that handles queries asynchronously. That was my first try with the Mongo client, but it introduces a lot of boilerplate.
On the other hand, the Mongo client documentation says that it is "completely non-blocking" and that it has its own connection pool. Does that mean we can safely (from the Vert.x event-loop point of view) define and use the Mongo client directly in the HTTP verticle?
Is there any alternative pattern?
Versions: vertx 3.5.4 / mongodb 4.0.3
It works like this: the Mongo connection pool is, exactly like an SQL connection pool, synchronous and blocking in its nature, but it is wrapped with a non-blocking Vert.x API.
So, instead of the normal blocking way of
JsonObject obj = mongo.get(someQuery);
you get a non-blocking call out of the box (shown here in the Java API; the null argument means no field projection):
mongo.findOne("collectionName", someQuery, null, res -> {
    JsonObject obj = res.result();
    doStuff(obj);
});
That means you can safely use it directly on the event loop, in any type of verticle, without reinventing the asynchronous wheel over and over again.
At our client we use mongodb-driver-rx. Vert.x has support for Rx (vertx-rx-java), and it fits mongodb-driver-rx pretty well.
For more information see:
https://mongodb.github.io/mongo-java-driver-rx/
https://vertx.io/docs/vertx-rx/java/
https://github.com/vert-x3/vertx-examples/blob/master/rxjava-2-examples/src/main/java/io/vertx/example/reactivex/database/mongo/Client.java
I currently have a Reliable Actor for every user in the system. This actor is appropriately named User and, for the sake of this question, has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using Service Fabric. If I did that, I would assume something similar to the above, but using a stateless service?
Thank you very much.
I'm personally exploring the use of Actors as the main datastore (i.e. the source of truth) for my entities. As Actors are added, updated, or deleted, I use MassTransit to publish events. I then have Reliable Stateful Services subscribed to these events. The services receive the events and update their internal IReliableDictionary instances. The services can then be queried to find the entities required by the client. Each service only keeps the entity data it requires to perform its queries.
I'm also exploring the use of EventStore to publish the events as well. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.
While the standard recommendation is definitely Vaclav's response, if querying is the exception then Actors could still be appropriate. For me, whether they're suitable is defined by the normal way of accessing them: if it's by key (as it presumably would be for a user record), then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so as I said it's only appropriate in the exceptional case. The following code builds up the set of actor IDs page by page; you then iterate over this set to fetch the actors, and can use LINQ or similar on the collection you've built up.
var actorIds = new List<ActorId>();
ContinuationToken continuationToken = null;
var actorServiceProxy = ActorServiceProxy.Create(new Uri("fabric:/MyActorApp/MyActorService"), partitionKey);
do
{
    // fetch one page of actor information at a time
    var queryResult = actorServiceProxy.GetActorsAsync(continuationToken, cancellationToken).GetAwaiter().GetResult();
    actorIds.AddRange(queryResult.Items.Select(x => x.ActorId));
    continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TL;DR: It's not always advisable to query over actors, but it can be achieved if required. The code above will get you started.
If you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried this way.
In your case, a user could simply be a row in a Reliable Dictionary.
I'm using Play Framework with Scala, along with the rediscala driver (https://github.com/etaty/rediscala) to communicate with Redis. If Redis doesn't contain the data, my app looks for it in MongoDB.
When Redis fails or is unavailable for some reason, the application waits too long for a response. How can I implement a failover strategy in this case? I would like to stop sending requests to Redis when they take too long, and resume once it is back online.
To clarify the question, my code currently looks like this:
private def getUserInfo(userName: String): Future[Option[UserInfo]] = {
  CacheRepository.getBaseUserInfo(userName) flatMap {
    case Some(userInfo) =>
      Logger.trace(s"AuthenticatedAction.getUserInfo($userName). User has been found in cache")
      Future.successful(Some(userInfo))
    case None =>
      getUserFromMongo(userName)
  }
}
I think you need to distinguish between the following cases (in order of their likelihood of occurrence):
1. No data in the cache (Redis) - I guess in this case Redis will return very quickly and you have to get the data from Mongo. In your code above, you need to set the data in Redis after you get it from Mongo, so that it is in the cache for subsequent calls. More generally, you need to wrap your RedisClient so that your application code is aware of disconnects/reconnects. Essentially you have two states: one where Redis is working properly, and one where Redis is down or slow (see the circuit-breaker sketch after this list).
2. Redis is slow - this could happen for one of the following reasons:
2.1. The network is slow: again, you cannot do much about this except return a message to your client. Going to Mongo is unlikely to resolve it if your network itself is slow.
2.2. The operation is slow: this happens if you are trying to get a lot of data or you are running a range query on a sorted set, for example. In this case you need to revisit the Redis data structure you are using and the amount of data you are storing in Redis. However, it looks like this is not going to be an issue in your example: single Redis GET operations are generally low latency on a LAN.
3. The Redis node is not reachable - I'm not sure how often this is going to happen unless your network is down, in which case you will have trouble connecting to MongoDB as well. I believe it can also happen when the node running Redis is down or its disk is full, so you should handle it in your design. That said, the rediscala client will automatically detect disconnects and reconnect; I have personally stopped Redis, upgraded it, and restarted it without touching my running client (JVM).
Finally, you can use a Future with a timeout (see Scala Futures - built in timeout?) in your program above. If the Future is not completed by the timeout, you can take your other action(s) (go to Mongo or return an error message to the user). Given that #1 and #2 are likely to happen much more frequently than #3, your timeout value should reflect these two cases, and since #1 and #2 are fast on a LAN you can start with a timeout value of 100ms.
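To make that two-state wrapper concrete, here is a rough sketch using Akka's built-in CircuitBreaker, adapted to the code from the question. It assumes an implicit ExecutionContext and the actor system's scheduler are in scope, and CacheRepository.setBaseUserInfo is a hypothetical write-back method for the cache-miss case:

import scala.concurrent.Future
import scala.concurrent.duration._
import scala.util.Success
import akka.pattern.CircuitBreaker

// Opens after 5 consecutive failures or 100ms call timeouts; while open,
// calls fail fast and we fall back to Mongo. After 10s a trial call is allowed.
val breaker = new CircuitBreaker(
  system.scheduler,
  maxFailures = 5,
  callTimeout = 100.millis,
  resetTimeout = 10.seconds)

private def getUserInfo(userName: String): Future[Option[UserInfo]] =
  breaker
    .withCircuitBreaker(CacheRepository.getBaseUserInfo(userName))
    .flatMap {
      case Some(userInfo) =>
        Future.successful(Some(userInfo))
      case None =>
        // cache miss: fetch from Mongo and write it back to Redis (point 1)
        getUserFromMongo(userName).andThen {
          case Success(Some(info)) => CacheRepository.setBaseUserInfo(userName, info)
        }
    }
    .recoverWith { case _ => getUserFromMongo(userName) } // breaker open or call failed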
Soumya Simanta provided a detailed answer, and I would just like to post the code I used for the timeout. The code requires the Play Framework, which is used in my project:
private def get[B](key: String, valueExtractor: Map[String, ByteString] => Option[B], logErrorMessage: String): Future[Option[B]] = {
  val timeoutFuture = Promise.timeout(None, Duration(Settings.redisTimeout))
  val mayBeHaveData = redisClient.hgetall(key) map { value =>
    valueExtractor(value)
  } recover {
    case e =>
      Logger.info(logErrorMessage + e)
      None
  }
  // if the timeout completes first, None is the result of the method
  Future.firstCompletedOf(List(mayBeHaveData, timeoutFuture))
}
I need to regularly check the database for updated records. I currently use a TimerTask, which works, but I've found its efficiency is poor and it consumes a lot of server resources. Is there a solution that fulfills my requirement more efficiently?
def checknewmessages() = Action { request =>
  TimerTask(5000) {
    // code to check database
  }
}
I can think of two solutions:
You can use the ReactiveMongo driver for Play, which is completely non-blocking and async, together with a capped collection in MongoDB (a rough sketch follows the links below).
Please see these for examples:
https://github.com/sgodbillon/reactivemongo-tailablecursor-demo
How to listen for changes to a MongoDB collection?
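As a rough illustration of the tailable-cursor approach: ReactiveMongo's API has changed a lot between versions, so this follows the older iteratee style used in the linked demo, and messagesCollection and the handler body are placeholders:

import play.api.libs.iteratee.Iteratee
import reactivemongo.api.QueryOpts
import reactivemongo.bson.BSONDocument

// Tail the capped collection: the cursor stays open, and new documents
// are pushed to the iteratee as they are inserted.
val cursor = messagesCollection
  .find(BSONDocument())
  .options(QueryOpts().tailable.awaitData)
  .cursor[BSONDocument]

cursor.enumerate().apply(Iteratee.foreach { doc =>
  // handle the new record here
  println(s"new message: $doc")
})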
If you are using a database that doesn't support a push mechanism, you can implement one with an Actor that schedules messages to itself at regular intervals.
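A minimal sketch of that, assuming Akka 2.6's typed API; DbPoller and the 5-second interval are placeholders matching the question's TimerTask:

import scala.concurrent.duration._
import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors

// Hypothetical polling actor: sends itself a Tick every 5 seconds
// and checks the database on each tick.
object DbPoller {
  sealed trait Command
  private case object Tick extends Command

  def apply(): Behavior[Command] =
    Behaviors.withTimers { timers =>
      timers.startTimerWithFixedDelay(Tick, 5.seconds)
      Behaviors.receiveMessage { case Tick =>
        // run the database check here, ideally with a non-blocking driver
        Behaviors.same
      }
    }
}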
If your logic is in your database (stored procedures etc) you could simply create a cron job.
You could also create a command line script that encapsulates the logic and schedule (cron again).
If you have your logic in your web application, you could again create a cron job that simply makes an API call to your app.
I have a question regarding MSMQ...
I designed an async architecture like this:
Client -> WCF Service (hosted in a Windows Service) -> MSMQ
So basically the WCF service takes the requests, processes them, adds them to an INPUT queue, and returns a GUID. The same WCF service (through a listener) takes the first message from the queue, does some work, and then puts it into another queue (OUTPUT).
The problem is how to retrieve the result from the OUTPUT queue when a client requests it, because MSMQ does not allow random access to its messages, and the only solution would be to iterate through all the messages and push them back in until I find the exact one I need. I do not want to use a DB for this OUTPUT queue, because of some limitations imposed by the client...
You can look in your output queue for your message by peeking by its ID:
var mq = new MessageQueue(outputQueueName);
Message peeked = mq.PeekById(yourId); // inspects the message without removing it
Or by receiving (removing) it by ID:
Message received = mq.ReceiveById(yourId);
A queue is inherently a "first-in-first-out" kind of data structure, while what you want is a "random access" data structure. It's just not designed for what you're trying to achieve here, so there isn't any "clean" way of doing this. Even if there were a way, it would be a hack.
If you elaborate on the limitations imposed by the client perhaps there might be other alternatives. Why don't you want to use a DB? Can you use a local SQLite DB, perhaps, or even an in-memory one?
Edit: If you have a client dictating implementation details to their own detriment then there are really only three ways you can go:
Work around them. In this case, that could involve using a SQLite DB - it's just a file and the client probably wouldn't even think of it as a "database".
Probe deeper and find out just what the underlying issue is, ie. why don't they want to use a DB? What are their real concerns and underlying assumptions?
Accept a poor solution and explain to the client that this is due to their own restriction. This is never nice and never easy, so it's really a last resort.
You could use the CorrelationId property and set it when you send the message. Then, to receive that same message, pick out the specific message with ReceiveByCorrelationId as follows:
message = queue.ReceiveByCorrelationId(correlationId);
Moreover, CorrelationId is a string with the following format:
Guid()\\Number