Azure Service Fabric - External State Manager

As part of a proof of concept, we're trying to store the state of an ASF Reliable Service or Actor in a durable data store (like MongoDB or DocumentDB).
The idea is to provide a custom state manager that stores data in a database instead of in memory (or on disk, or wherever ASF keeps it).
So far, we're unable to find documentation or guides showing how to provide our own custom state manager to ASF when creating a Reliable Service or Actor.
Any help is highly appreciated.

When you register your Actor type, you can specify your own custom ActorService for it. Into this ActorService, you can inject your own IActorStateManager and IActorStateProvider. Modify Program.cs:
ActorRuntime.RegisterActorAsync<CustomActor>((context, actorType) =>
    new CustomActorService(context, actorType,
        /* Inject a factory that creates state managers here */
        stateManagerFactory: (actorBase, provider) => new CustomActorStateManager(actorBase, provider),
        /* Inject a custom state provider here */
        stateProvider: new CustomStateProvider()
    )
).GetAwaiter().GetResult();
A stateful service is very similar.
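For a Reliable Service, the analogous hook is the protected StatefulService constructor overload that accepts an IReliableStateManagerReplica. A minimal sketch, assuming a hypothetical CustomReliableStateManager class that implements that interface:

internal sealed class MyStatefulService : StatefulService
{
    // Passing a custom IReliableStateManagerReplica here replaces the default ReliableStateManager.
    public MyStatefulService(StatefulServiceContext context)
        : base(context, new CustomReliableStateManager())
    {
    }
}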

The built-in state managers are highly specialized (disk, memory, quorum replication); I'd recommend using the target store's regular API when you want to keep data in a different store.
So for DocumentDB, you'd simply use the DocumentDB client, as you would in any other program.
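As an illustration, a rough sketch of that approach inside an actor method, using the DocumentDB client (Microsoft.Azure.Documents.Client); the endpoint, key, database/collection names and the MyItem type are placeholders:

public async Task SetItemAsync(MyItem item)
{
    // Keep the actor state as usual...
    await this.StateManager.SetStateAsync("item", item);

    // ...and persist a copy through the regular DocumentDB client API.
    // (In a real service you would cache the client instead of creating it per call.)
    using (var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<auth-key>"))
    {
        var collectionUri = UriFactory.CreateDocumentCollectionUri("<database>", "<collection>");
        await client.UpsertDocumentAsync(collectionUri, item);
    }
}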

Related

Axon Framework - Initiate saga from a Non-Aggregate (eventGateway)

I am very new to Axon and have the following query. I am running two microservices, Payment and Order, using Cassandra as an event store and Kafka. From the Payment microservice, I am dispatching an event through the EventGateway:
@Component
@ProcessingGroup("OrderProcessor")
public class OrderEventHandler {

    @Inject
    private EventGateway eventGateway;

    public void createOrder() {
        OrderCreatedEvent orderCreatedEvent = new OrderCreatedEvent("orderId", 100, "CREATED");
        eventGateway.publish(orderCreatedEvent);
    }
}
Also, I have configured the saga store and repository components.
SagaViewRepository
@Repository
public interface SagaViewRepository extends CassandraRepository<SagaView, String> {
}
SagaStore
@Bean
public CassandraSagaStore sagaStoreStore() {
    return new CassandraSagaStore(...);
}
How do I listen to the above event (OrderCreatedEvent) in the saga event handler of the Order microservice? Below is the implementation:
@Saga(sagaStore = "sagaStoreStore")
public class OrderManagementSaga {

    @Inject
    private transient CommandGateway commandGateway;

    @StartSaga
    @SagaEventHandler(associationProperty = "orderId")
    public void handle(OrderCreatedEvent orderCreatedEvent) {
        // Saga invoked
    }
}
Any hints are much appreciated.
Thank you.
In virtually any project, I would not immediately go for the microservices route. If you take that route, it means you are stuck doing infrastructure work, like figuring out how to send a message from one service to another, instead of providing business functionality.
That doesn't mean I would not use messaging in your application already. It is the use of commands, events and queries that allows you to stretch the distance of your message buses to whatever length you need.
Anyhow, this is more of a recommendation than an answer to your question.
To be honest, I am unsure what you are looking for. You already stated you are using Cassandra (not supported by Axon in any way, by the way) and Kafka. That makes Kafka your means to publish events between services, right? That is what Axon provides the Kafka Extension for, for example.
Do note that taking this route will require you to define different pieces of infrastructure for command, event and query dispatching, as well as for event storage. Furthermore, as already briefly pointed out, Axon doesn't view Cassandra as an optimal event store; if you want to know why, I'd recommend taking a look at this presentation.
Instead of going for "segregated infrastructure customization", I would recommend giving Axon Server a try. It is a one-stop shop for distributing commands, events and queries, as well as a fully optimized event store. One thing is for certain: you wouldn't have to really think about "how to dispatch an event from your Payment service to your Order service". It would simply work, as long as your Axon application is connected to Axon Server (which is also a breeze). If you want to learn more about Axon Server, Axon's Reference Guide has a dedicated section on it you can read here.
If you feel Kafka is the way to go, that's also fine of course. It will mean more work for you or your team, so you will have to account for those man-hours. For more info on how to set up Axon's Kafka Extension for event distribution, you can check this Reference Guide page. Note that Kafka will only bring you event distribution; you are still left with solving command distribution, query distribution and event storage.

Recommended way to handle REST parameters in Spring cloud function

I really like the way Spring cloud function decouples the business logic from the runtime target (local or cloud) and makes it easy to integrate with serverless providers.
I plan to use SCF with AWS Lambda behind an API gateway to design the backend of a system.
However, I am not completely clear on the recommended way to handle REST-related parameters such as query params, headers, path, etc. inside Spring Cloud Functions.
As per our initial analysis, we could derive two possible approaches:
When enabling “Lambda proxy integration” in API Gateway, query params and other information are available as Message headers inside the SCF.
We can use “Mapping templates” in API Gateway to map all the required information into a JSON body and deserialize it as a POJO that the SCF takes directly as input.
This way, the SCF does not need to bother about how the required data is passed to the API.
What is the recommended way to achieve this? Are we missing something that enables to do this in a better way?
I don't think you are missing anything feature-wise, except perhaps that it might also be convenient to work with composite functions, e.g. marshal|transform, where marshal is a Function<Message<?>, ?> and transform is the business logic. The marshal function could be generic (converting to some sort of canonical form) and could be provided as an auto-configuration in a shared library, for instance.

Kafka Streams - Define Custom Relational/Non_Key_Value StateStore With Fault Tolerance

I am trying to implement event sourcing using Kafka.
My vision for the stream processor application is a typical 3-layer Spring application in which:
The "presentation" layer is replaced by (implemented by?) Kafka streams API.
The business logic layer is utilized by the processor API in the topology.
Also, the DB is a relational H2, In-memory database which is accessed via Spring Data JPA Repositories. The repositories also implements necessary interfaces for them to be registered as Kafka state stores to use the benefits (restoration & fault tolerance)
But I'm wondering how I should implement the custom state store part.
I have been searching, and:
There are interfaces such as StateStore and StoreBuilder. StoreBuilder has a withLoggingEnabled() method; but if I enable it, when does the actual update and change logging happen? The examples are usually all key-value stores, even the custom ones. What if I don't want key-value? The example in the Interactive Queries section of the Kafka documentation just doesn't cut it.
I am aware of Interactive Queries, but they seem to be good for queries and not updates, as the name suggests.
In a key-value store, the records that are sent to the changelog are straightforward. But if I don't use key-value, when and how do I inform Kafka that my state has changed?
You will need to implement StateStore for the actual storage engine you want to use. This interface does not dictate anything about the store, so you can do whatever you want.
You also need to implement a StoreBuilder that acts as a factory to create instances of your custom store.
public class MyCustomStore implements StateStore {
    // define any interface you want to present to the users of the store
}

public class MyCustomStoreBuilder implements StoreBuilder<MyCustomStore> {
    @Override
    public MyCustomStore build() {
        // create a new instance of MyCustomStore and return it
        return new MyCustomStore();
    }
    // all other methods (except `name()`) are optional
    // e.g., you can do a dummy implementation that only returns `this`
}
Compare: https://docs.confluent.io/current/streams/developer-guide/processor-api.html#implementing-custom-state-stores
But if I don't use key-value, when and how do I inform Kafka that my state has changed?
If you want to implement withLoggingEnabled() (similarly for caching), you will need to implement this logging (or caching) as part of your store. Because Kafka Streams does not know how your store works, it cannot provide an implementation for this. Thus, it's your design decision whether your store supports logging into a changelog topic or not. If you want to support logging, you need to come up with a design that maps store updates to key-value pairs (you can also write multiple pairs per update) that you can write into a changelog topic, and that allows you to recreate the state when reading those records back from the changelog topic.
Getting a fault-tolerant store is not only possible via change logging. For example, you could also plug in a remote store that does replication etc. internally, and thus rely on the store's fault-tolerance capabilities instead of using change logging. Of course, using a remote store implies other challenges compared to using a local store.
For the Kafka Streams default stores, logging and caching are implemented as wrappers around the actual store, making them easily pluggable. But you can implement this in any way that fits your store best. You might want to check out the following classes for the key-value store as a comparison:
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/state/internals/ChangeLoggingKeyValueBytesStore.java
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java
For Interactive Queries, you implement a corresponding QueryableStoreType to integrate your custom store. Cf. https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html#querying-local-custom-state-stores
You are right that Interactive Queries are a read-only interface for the existing stores, because the processors should be responsible for maintaining the stores. However, nothing prevents you from opening up your custom store for writes, too. Note that this will make your application inherently non-deterministic: if you rewind an input topic and reprocess it, it might compute a different result, depending on what "external store writes" were performed. You should consider doing any writes to the store via the input topics. But it's your decision. If you do allow "external writes", you will need to make sure they get logged too, in case you want to implement logging.

Service Fabric Actors - save state to database

I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item request for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way would be to have a separate entity responsible for storing data in the DB. The actor would just send an event (not necessarily a Service Fabric event) with some data about the performed operation, and the other entity would catch it and do the rest of the work.
Of course, you can implement this in the actor itself, but that brings two possible issues:
The actor will not be able to process other requests if there are issues with the DB, with the connectivity between the actor and the DB, or if the DB itself is under heavy load and processes requests slowly. The actor would have to wait until the transfer to the DB successfully completes.
Possible overloading of the DB with many individual connections from many actors, instead of one or a few connections from a separate entity doing batch insertion.
So your final solution will depend on the workload of your system. But you will definitely need a reliable queue to safely get the data into the DB if that data is too valuable to afford losing it.
Also, you could use the default state manager to store logs and information about pending operations before they are transferred to the DB, and remove them from the actor's state once the transfer completes; there is no need to keep such data permanently in the service.
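For example, here is a rough sketch of that buffering idea using an actor reminder; the state name, the ShoppingItem type and SaveToDatabaseAsync are placeholders, and the actor must implement IRemindable:

public async Task AddItemAsync(ShoppingItem item)
{
    // Buffer the operation in the actor's own state first.
    var pending = await this.StateManager.GetOrAddStateAsync("pendingDbWrites", new List<ShoppingItem>());
    pending.Add(item);
    await this.StateManager.SetStateAsync("pendingDbWrites", pending);

    // Ensure a reminder is registered to flush the buffer to the database.
    await this.RegisterReminderAsync("FlushToDb", null, TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(30));
}

public async Task ReceiveReminderAsync(string reminderName, byte[] state, TimeSpan dueTime, TimeSpan period)
{
    if (reminderName == "FlushToDb")
    {
        var pending = await this.StateManager.GetOrAddStateAsync("pendingDbWrites", new List<ShoppingItem>());
        await SaveToDatabaseAsync(pending);   // hypothetical call to your DB layer
        await this.StateManager.SetStateAsync("pendingDbWrites", new List<ShoppingItem>());
    }
}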
Another thing to take into consideration is reading from the DB. If you have a relational database in which only one table receives the new records, and a huge number of actors query that data on activation, you may see performance degradation, because the table will be locked for reading or writing unless you configure it to behave differently. So you will probably need a caching layer for the data read during actor activation; it depends on your workload.
As for implementing your custom state manager: take a look at this example. Basically, all you need to do is implement the IReliableStateManagerReplica interface and pass it to the StatefulService constructor.

Querying a list of Actors in Azure Service Fabric

I currently have a ReliableActor for every user in the system. This actor is appropriately named User, and for the sake of this question has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using the Service Fabric. If I did, I would assume something similar to the above but using a Stateless service?
Thank you very much.
I'm personally exploring the use of Actors as the main data store (i.e. the source of truth) for my entities. As Actors are added, updated or deleted, I use MassTransit to publish events. I then have stateful Reliable Services subscribed to these events; the services receive the events and update their internal IReliableDictionary instances. The services can then be queried to find the entities required by the client. Each service only keeps the entity data it needs to perform its queries.
I'm also exploring the use of EventStore to publish the events as well. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.
While the standard recommendation is definitely as in Vaclav's response, if querying is the exception then Actors could still be appropriate. For me, whether they're suitable or not is determined by the normal way of accessing them; if it's by key (as it presumably would be for a user record), then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so as I say it's only appropriate in the exceptional case. The following code builds up a set of actor references; you then iterate over this set to fetch the actors, and can use LINQ or similar on the collection you've built up.
ContinuationToken continuationToken = null;
var actorServiceProxy = ActorServiceProxy.Create(new Uri("fabric:/MyActorApp/MyActorService"), partitionKey);
var actorIds = new List<ActorId>();
do
{
    var queryResult = actorServiceProxy.GetActorsAsync(continuationToken, cancellationToken).GetAwaiter().GetResult();
    actorIds.AddRange(queryResult.Items.Select(information => information.ActorId));
    continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TLDR: It's not always advisable to query over actors, but it can be achieved if required. Code above will get you started.
If you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried over in this way.
In your case, a user could simply be a row in a Reliable Dictionary.
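For example, a minimal sketch of such a query inside the stateful service; the User type, the dictionary name and the filtering logic are assumptions:

public async Task<IList<User>> GetUsersByLocationAsync(string location, CancellationToken cancellationToken)
{
    var users = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, User>>("users");
    var result = new List<User>();

    using (var tx = this.StateManager.CreateTransaction())
    {
        // Enumerate the dictionary and filter on the Location property.
        var enumerable = await users.CreateEnumerableAsync(tx);
        var enumerator = enumerable.GetAsyncEnumerator();
        while (await enumerator.MoveNextAsync(cancellationToken))
        {
            if (enumerator.Current.Value.Location == location)
            {
                result.Add(enumerator.Current.Value);
            }
        }
    }

    return result;
}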