Querying a list of Actors in Azure Service Fabric - azure-service-fabric

I currently have a ReliableActor for every user in the system. This actor is appropriately named User, and for the sake of this question has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using the Service Fabric. If I did, I would assume something similar to the above but using a Stateless service?
Thank you very much.

I'm personally exploring the use of Actors as the main datastore (ie: source of truth) for my entities. As Actors are added, updated or deleted, I use MassTransit to publish events. I then have Reliable Statefull Services subscribed to these events. The services receive the events and update their internal IReliableDictionary's. The services can then be queried to find the entities required by the client. Each service only keeps the entity data that it requires to perform it's queries.
I'm also exploring the use of EventStore to publish the events as well. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.

While the standard recommendation is definitely as Vaclav's response, if querying is the exception then Actors could still be appropriate. For me whether they're suitable or not is defined by the normal way of accessing them, if it's by key (presumably for a user record it would be) then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so like I say is only appropriate if it's the exceptional case. The following code will build up a set of Actor references, you then iterate over this set to fetch the actors and then can use Linq or similar on the collection that you've built up.
ContinuationToken continuationToken = null;
var actorServiceProxy = ActorServiceProxy.Create("fabric:/MyActorApp/MyActorService", partitionKey);
var queriedActorCount = 0;
do
{
var queryResult = actorServiceProxy.GetActorsAsync(continuationToken, cancellationToken).GetAwaiter().GetResult();
queriedActorCount += queryResult.Items.Count();
continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TLDR: It's not always advisable to query over actors, but it can be achieved if required. Code above will get you started.

if you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried over this way.
In your case, a user could simply be a row in a Reliable Dictionary.

Related

REST new ID with DDD Aggregate

This question seemed fool at first sight for me, but then I realized that I don't have a proper answer yet, and interestingly also didn't find good explanation about it in my searches.
I'm new to Domain Driven Design concepts, so, even if the question is basic, feel free to add any considerations to it.
I'm designing in Rest API to configure Server Instances, and I came up with a Aggregate called Instance that contains a List of Configurations, only one specific Configuration will be active at a given time.
To add a Configuration, one would call an endpoint POST /instances/{id}/configurations with the body on the desired configuration. In response, if all okay, it would receive a HTTP 204 with a Header Location containing the new Configuration ID.
I'm planning to have only one Controller, InstanceController, that would call InstanceService that would manipulate the Instance Aggregate and then store to the Repo.
Since the ID's are generated by the repository, If I call Instance.addConfiguration and then InstanceRepository.store, how would I get the ID of the newly created configuration? I mean, it's a List, so It's not trivial as calling Instance.configuration.identity
A option would implement a method in Instance like, getLastAddedConfiguration, but this seems really brittle.
What is the general approach in this situation?
the ID's are generated by the repository
You could remove this extra complexity. Since Configuration is an entity of the Instance aggregate, its Id only needs to be unique inside the aggregate, not across the whole application. Therefore, the easiest is that the Aggregate assigns the ConfigurationId in the Instance.addConfiguration method (as the aggregate can easily ensure the uniqueness of the new Id). This method can return the new ConfigurationId (or the whole object with the Id if necessary).
What is the general approach in this situation?
I'm not sure about the general approach, but in my opinion, the sooner you create the Ids the better. For Aggregates, you'd create the Id before storing it (maybe a GUID), for entities, the Aggregate can create it the moment of creating/adding the entity. This allows you to perform other actions (eg publishing an event) using these Ids without having to store and retrieve the Ids from the DB, which will necessarily have an impact on how you implement and use your repositories and this is not ideal.

hazelcast spring-data write-through

I am using Spring-Boot, Spring-Data/JPA with Hazelcast client/server topology. In parts of my test application, I am calculating time when performing CRUD operations on the client side (the server is the one interacting with a relational db). I configured the map(Store) to be write-behind by setting write-delay-seconds to 10.
Spring-Data's save() returns the persisted entity. In the client app, therefore, the application flow will be blocked until the (server) returns the persisted entity.
Would like to know is there is an alternative in which case the client does NOT have to wait for the entity to persist. Was under the impression that once new data is stored in the Map, persisting to the backed happens asynchronously -> the client app would NOT have to wait.
Map config in hazelast.xml:
<map name="com.foo.MyMap">
<map-store enabled="true" initial-mode="EAGER">
<class-name>com.foo.MyMapStore</class-name>
<write-delay-seconds>10</write-delay-seconds>
</map-store>
</map>
#NeilStevenson I don't find your response particularly helpful. I asked on an earlier post about where and how to generate the Map keys. You pointed me to the documentation which fails to shed any light on this topic. Same goes for the hazelcast (and other) examples.
The point of having the cache in the 1st place, is to avoid hitting the database. When we add data (via save()), we need to also generate an unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, its the hazelcast client that generates these Ids, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions NOT because of the ID.
That unfortunately is how it is meant to work, see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this, there isn't a variant on the save defined.
So the answer seems to be this is not currently available in the general purpose Spring repository definition. Why not raise a feature request for the Spring Data team ?

Service Fabric Actors - save state to database

I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item request for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way will be to have a separate entity responsible for storing data to DB. And actor will just send an event (not implying SF events) with some data about performed operation, and another entity will catch it and perform the rest of the work.
But of course you can implement this thing in actor itself, but it will bring two possible issues:
Actor will be not able to process other requests if there will be some issues with DB or connectivity between actor and DB or if there will be high loading of DB itself and it will process requests slowly. The actor would have to wait till transferring to DB successfully completes.
Possible overloading of DB with many single connections from many actors instead of one or several connection from another entity and batch insertion.
So, your final solution will depend on workload of your system. But definitely you will need a reliable queue to safely store data in DB if value of such data is too high to afford a loss.
Also, I think you could use default state manager to store logs and information about transactions before it will be transferred to DB and remove from service's state after transaction completes. There is no need to have permanent storage of such data in services.
And another things to take into consideration — reading from DB. Probably, if you have relationship database and will update with new records only one table + if there will be huge amount of actors that will query such data on activation, you will have performance degradation as this table will be locked for reading or writing if you will not configure it to behave differently. So, probably, you will need caching system to read data for actors activation — depends on your workload.
And about implementing your custom State Manager: take a look at this example. Basically, all you need to do is to implement IReliableStateManagerReplica interface and pass it to StatefullService constructor.

Dependency between data store

TL;DR
What's the best way to handle dependency between types of data that is loaded asynchronously from different backend endpoints?
Problem
My app fetches data from a backend, for each entity I have an endpoint to fetch all instances.
For example api.myserver.com/v1/users for User model and api.myserver.com/v1/things for Thing model.
This data is parsed and placed into data store objects (e.g. UserDataStore and ThingDataStore) that serve these models to the rest of the app.
Question
What should I do if the data that comes from /things depends on data that comes from /users and the fetch operations are async. In my case /things returns the id of a user that created them. This means that if /things returns before /users, then I won't have enough data to create the Thing model.
Options
Have /things return also relevant /users data nested.
This is bad because:
I'll then have multiple model instances User for the same actual user - one that came from /users and one that came nested in /things.
Increases the total payload size transferred.
In a system with some permission policy, data that is returned for /users can be different to /things, and then it'll allow partially populated models to be in the app.
Create an operational dependency between the two data stores, so that ThingsDataStore will have to wait for UserDataStore to be populated before it attempts to load its own data.
This is also bad because:
Design-wise this dependency is not welcome.
Operational-wise, it will very quickly become complicated once you throw in another data stores (e.g. dependency cycles, etc).
What is the best solution for my problem and in general?
This is obviously not platform / language dependent.
I see two possible solutions:
Late initialization of UserDataStore in ThingDataStore. You will have to allow for creation an object that is not fully valid. And you will also need to add method that will give you an information whether UserDataStore is initialized or not. Not perfect, because for some time there will exists an invalid instance.
Create some kind of proxy or maybe a buider object for ThingDataStore that will hold all information about particular thing and will create ThingDataStore object as soon as UserDataStore related with this instance will be received.
Maybe it will help you. Good luck!

MSMQ querying for a specific message

I have a questing regarding MSMQ...
I designed an async arhitecture like this:
CLient - > WCF Service (hosted in WinService) -> MSMQ
so basically the WCF service takes the requests, processes them, adds them to an INPUT queue and returns a GUID. The same WCF service (through a listener) takes first message from queue (does some stuff...) and then it puts it back into another queue (OUTPUT).
The problem is how can I retrieve the result from the OUTPUT queue when a client requests it... because MSMQ does not allow random access to it's messages and the only solution would be to iterate through all messages and push them back in until I find the exact one I need. I do not want to use DB for this OUTPUT queue, because of some limitations imposed by the client...
You can look in your Output-Queue for your message by using
var mq = new MessageQueue(outputQueueName);
mq.PeekById(yourId);
Receiving by Id:
mq.ReceiveById(yourId);
A queue is inherently a "first-in-first-out" kind of data structure, while what you want is a "random access" data structure. It's just not designed for what you're trying to achieve here, so there isn't any "clean" way of doing this. Even if there was a way, it would be a hack.
If you elaborate on the limitations imposed by the client perhaps there might be other alternatives. Why don't you want to use a DB? Can you use a local SQLite DB, perhaps, or even an in-memory one?
Edit: If you have a client dictating implementation details to their own detriment then there are really only three ways you can go:
Work around them. In this case, that could involve using a SQLite DB - it's just a file and the client probably wouldn't even think of it as a "database".
Probe deeper and find out just what the underlying issue is, ie. why don't they want to use a DB? What are their real concerns and underlying assumptions?
Accept a poor solution and explain to the client that this is due to their own restriction. This is never nice and never easy, so it's really a last resort.
You may could use CorrelationId and set it when you send the message. Then, to receive the same message you can pick the specific message with ReceiveByCorrelationId as follow:
message = queue.ReceiveByCorrelationId(correlationId);
Moreover, CorrelationId is a string with the following format:
Guid()\\Number