Order fulfilment with Akka FSM, storing state - scala

I am trying to build an order fulfilment component with Akka FSM, and I have a few basic doubts about how the state is stored and moved forward when an event arrives from the user.
Consider the states
ORDER_CLEAN, ORDER_INIT, ORDER_PAYMENT_WAITING, ORDER_PAYMENT_SUCCESS, ORDER_DELIVERY, ORDER_COMPLETE
the events
EV_CART_CHECKOUT, EV_PROCEED_PAYMENT, EV_PAYMENT_SUCCESSFUL, EV_ITEMS_PACKED, EV_DELIVERED
and the state transitions (sketched in code after the list)
(EV_CART_CHECKOUT, ORDER_CLEAN) -> ORDER_INIT
(EV_PROCEED_PAYMENT, ORDER_INIT) -> ORDER_PAYMENT_WAITING
(EV_PAYMENT_SUCCESSFUL, ORDER_PAYMENT_WAITING) -> ORDER_PAYMENT_SUCCESS
(EV_ITEMS_PACKED, ORDER_PAYMENT_SUCCESS) -> ORDER_DELIVERY
(EV_DELIVERED, ORDER_DELIVERY) -> ORDER_COMPLETE
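Expressed with the classic Akka FSM DSL, the transitions would look roughly like this (a sketch; OrderData is a placeholder for whatever per-order data is carried):
import akka.actor.FSM

// Hypothetical per-order data carried through the machine.
final case class OrderData(orderId: String)

sealed trait OrderState
case object OrderClean          extends OrderState
case object OrderInit           extends OrderState
case object OrderPaymentWaiting extends OrderState
case object OrderPaymentSuccess extends OrderState
case object OrderDelivery       extends OrderState
case object OrderComplete       extends OrderState

sealed trait OrderEvent
case object EvCartCheckout      extends OrderEvent
case object EvProceedPayment    extends OrderEvent
case object EvPaymentSuccessful extends OrderEvent
case object EvItemsPacked       extends OrderEvent
case object EvDelivered         extends OrderEvent

class OrderFsm extends FSM[OrderState, Option[OrderData]] {
  startWith(OrderClean, None)

  when(OrderClean)          { case Event(EvCartCheckout, _)      => goto(OrderInit) }
  when(OrderInit)           { case Event(EvProceedPayment, _)    => goto(OrderPaymentWaiting) }
  when(OrderPaymentWaiting) { case Event(EvPaymentSuccessful, _) => goto(OrderPaymentSuccess) }
  when(OrderPaymentSuccess) { case Event(EvItemsPacked, _)       => goto(OrderDelivery) }
  when(OrderDelivery)       { case Event(EvDelivered, _)         => goto(OrderComplete) }
  when(OrderComplete)       { case Event(_, _)                   => stay() } // terminal; stop() here if desired

  initialize()
}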
Questions:
1. When we create an FSM actor starting at ORDER_CLEAN with the event EV_CART_CHECKOUT, will this actor stay alive until we bring it to the ORDER_COMPLETE state (assuming we stop the actor at that state)?
2. If yes, then since we store the order status in a database, how do we trigger a new event on that actor? Do we need to maintain an order_id-to-actor mapping and use it to route events? If there are 10K unique orders being processed at once, do we maintain mappings for all 10K actors? If so, what is the best data structure for maintaining these mappings for a large number of orders?
3. Continuing from point 2: if an actor goes down, how do we bring it back to the same state? Is a supervisor actor the only way to solve this? Or do we need to check the actor's status before sending an event?
4. At any state, the user might not trigger the next event for days. Is it good to keep the actor alive for that long, or is it better to create a new actor with the updated state?
5. What are the better approaches to address these problems with Akka FSM?

If we are talking about a non-persistent actor then, generally speaking, we can't assume it will stay alive between events. You might simply restart or redeploy the service, so the answer to your first question is no.
To trigger a new event on the actor, you should create the actor and initialise its state machine with the last valid state from the DB. You could either use Akka Persistence or just read the current order state from the DB and pass it to the actor.
Actors are very lightweight objects, but with 10K orders in flight I would suggest terminating each actor after every transition and recreating it, with its stored state, when the next event arrives.
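For example, restoring could be as simple as seeding the FSM with the state read from the DB when you (re)create the actor - a sketch, where loadOrderState is a hypothetical DAO call:
import akka.actor.{ActorRef, ActorSystem, FSM, Props}

// Seed the machine with the last valid state instead of ORDER_CLEAN.
class RestorableOrderFsm(initial: OrderState, data: Option[OrderData])
    extends FSM[OrderState, Option[OrderData]] {
  startWith(initial, data) // resume from the persisted state
  // ... the same when(...) transitions as in the question's sketch ...
  initialize()
}

def actorForOrder(system: ActorSystem, orderId: String): ActorRef = {
  val (state, data) = loadOrderState(orderId) // hypothetical: last valid state from the DB
  system.actorOf(Props(new RestorableOrderFsm(state, data)), s"order-$orderId")
}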

Related

Akka-streams time based grouping

I have an application which listens to a stream of events. These events tend to come in chunks: 10 to 20 of them within the same second, with minutes or even hours of silence between them. These events are processed and result in an aggregate state, and this updated state is sent further downstream.
In pseudo code, it would look something like this:
kafkaSource()
  .mapAsync(1) { case (entityId, event) => entityProcessor(entityId).process(event) } // yields entityState
  .mapAsync(1)(entityState => submitStateToExternalService(entityState))
  .runWith(kafkaCommitterSink)
The thing is that the downstream submitStateToExternalService has no use for 10-20 updated states per second - it would be far more efficient to just emit the last one and only handle that one.
With that in mind, I started looking if it wouldn't be possible to not emit the state after processing immediately, and instead wait a little while to see if more events are coming in.
In a way, it's similar to conflate, but that emits elements as soon as the downstream stops backpressuring, and my processing is actually fast enough to keep up with the events coming in, so I can't rely on backpressure.
I came across groupedWithin, but this emits elements whenever the window ends (or the max number of elements is reached). What I would ideally want, is a time window where the waiting time before emitting downstream is reset by each new element in the group.
Before I implement something to do this myself, I wanted to make sure that I didn't just overlook a way of doing this that is already present in akka-streams, because this seems like a fairly common thing to do.
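For reference, the homemade version I have in mind is a small timer-based stage that keeps only the latest element and restarts its quiet-period timer on every upstream push - a sketch (untested, and it assumes conflating a burst down to its last element is acceptable):
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler, TimerGraphStageLogic}
import scala.concurrent.duration.FiniteDuration

// Emits the most recent element once `quiet` has elapsed without a new
// upstream element; each new element restarts the timer (a debounce).
final class Debounce[T](quiet: FiniteDuration) extends GraphStage[FlowShape[T, T]] {
  val in: Inlet[T] = Inlet("Debounce.in")
  val out: Outlet[T] = Outlet("Debounce.out")
  override val shape: FlowShape[T, T] = FlowShape(in, out)

  override def createLogic(attrs: Attributes): GraphStageLogic =
    new TimerGraphStageLogic(shape) with InHandler with OutHandler {
      private var pending: Option[T] = None

      override def preStart(): Unit = pull(in) // eagerly consume upstream

      override def onPush(): Unit = {
        pending = Some(grab(in)) // keep only the latest element
        scheduleOnce("quiet", quiet) // same key, so re-scheduling resets the timer
        pull(in)
      }

      override def onPull(): Unit = () // emission is driven by the timer

      override protected def onTimer(key: Any): Unit = {
        pending.foreach(emit(out, _)) // emit handles downstream readiness
        pending = None
      }

      override def onUpstreamFinish(): Unit = pending match {
        case Some(elem) => emit(out, elem, () => completeStage()) // flush on completion
        case None       => completeStage()
      }

      setHandlers(in, out, this)
    }
}
This would sit between the processing and submit stages, e.g. .via(new Debounce[EntityState](500.millis)).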
Honestly, I would make entityProcessor into a cluster-sharded persistent actor.
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

case class ProcessEvent(entityId: String, evt: EntityEvent)

implicit val timeout: Timeout = 5.seconds // required by the ask (?) below
val entityRegion = ClusterSharding(system).shardRegion("entity")

kafkaSource()
  .mapAsync(parallelism) { case (entityId, event) =>
    entityRegion ? ProcessEvent(entityId, event)
  }
  .runWith(kafkaCommitterSink)
With this, you can safely increase the parallelism so that you can handle events for multiple entities simultaneously without fear of mis-ordering the events for any particular entity.
Your entity actors would then update their state in response to the process commands and persist the events using a suitable persistence plugin, sending a reply to complete the ask pattern. One way to get the compaction effect you're looking for is for them to schedule the update of the external service after some period of time (after cancelling any previously scheduled update).
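Sketching that scheduling idea with the classic Timers trait (EntityState, EntityEvent, state.updated and submitStateToExternalService are stand-ins from the question; the 2-second quiet period is arbitrary):
import akka.Done
import akka.actor.Timers
import akka.persistence.PersistentActor
import scala.concurrent.duration._

case object PublishState // internal timer message

class EntityActor extends PersistentActor with Timers {
  override def persistenceId: String = s"entity-${self.path.name}"
  private var state: EntityState = EntityState.empty // EntityState is a stand-in

  override def receiveRecover: Receive = {
    case evt: EntityEvent => state = state.updated(evt)
  }

  override def receiveCommand: Receive = {
    case ProcessEvent(_, evt) =>
      persist(evt) { e =>
        state = state.updated(e)
        sender() ! Done // completes the ask from the stream
        // Re-scheduling with the same key cancels any previously scheduled
        // publish, so a burst of events results in a single external update.
        timers.startSingleTimer("publish", PublishState, 2.seconds)
      }
    case PublishState =>
      submitStateToExternalService(state) // side effect, not persisted
  }
}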
There is one potential pitfall with this scheme (it's also a potential issue with a homemade Akka Stream solution to allow n > 1 events to be processed before updating the state): what happens if the service fails between updating the local view of state and updating the external service?
One way you can deal with this is to encode in the entity's state whether the entity is dirty (i.e., has state which hasn't been propagated to the external service); at startup, build a list of entities and run through them so that dirty entities update the external service.
If the entities are doing more than just tracking state for publishing to a single external datastore, it might be useful to use Akka Persistence Query to build a full-fledged read-side view to update the external service. In this case, though, since the read-side view's (State, Event) => State transition would be the same as the entity processor's, it might not make sense to go this way.
A midway alternative would be to offload the scheduling etc. to a different actor or set of actors which get told "this entity updated its state" and then schedule an ask of the entity for its current state, along with a timestamp of when the state was locally updated. When the response is received, the external service is updated if the timestamp is newer than that of the last update.

Service Fabric actors auto delete

In a Service Fabric app, I need to create thousands of stateful Actors, so I have to avoid accumulating Actors once they become useless.
I know I can't delete an Actor from the Actor itself, but I don't want to keep track of Actors and loop to delete them.
The Actors runtime uses garbage collection to remove deactivated Actor objects (but not their state), so I was thinking about removing the Actor's state inside the OnDeactivateAsync() method and letting the GC deallocate the Actor object after the usual 60 minutes.
In theory, something like this should be equivalent to deleting the Actor, shouldn't it?
protected override async Task OnDeactivateAsync()
{
    await this.StateManager.TryRemoveStateAsync("MyState");
}
Is there anything remaining that only explicit deletion can remove?
According to the docs, you shouldn't change the state from OnDeactivateAsync.
If you need your Actor to not keep persisted state, you can use attributes to change the state persistence behavior:
No persisted state: State is not replicated or written to disk. This
level is for actors that simply don't need to maintain state reliably.
[StatePersistence(StatePersistence.None)]
class MyActor : Actor, IMyActor
{
}
Finally, you can use the ActorService to query Actors, see if they are inactive, and delete them.
TL;DR There are some additional resources you can free yourself (reminders) and some that only explicit deletion can remove because they are not publicly accessible.
The Service Fabric Actor repo is available on GitHub. I am using the persistent storage model, which seems to use KvsActorStateProvider behind the scenes, so I'll base the answer on that. There is a series of calls that starts at IActorService.DeleteActorAsync and continues over to IActorManager.DeleteActorAsync. A lot of stuff happens in there, including a call to the state provider to remove the state part of the actor. The core code that handles this removes not only the state, but also reminders and some internal actor data. In addition, if you are using actor events, all event subscribers are unsubscribed for your actor.
If you really want delete-like behavior without calling the actor runtime, I guess you could register a reminder that deletes the state and unregisters itself, along with any other reminders.

In Akka's persistent actor, is creating a child actor considered to be a side effect, or the creation of state?

This question isn't as philosophical as the title might suggest. Consider the following approach to persistence:
Commands to perform Operations come in from various Clients. I represent both Operations and Clients as persistent actors. The Client's state is the lastOperationId to pass through. The Operation's state is pretty much an FSM of the Operation's progress (it's effectively a Saga, as it then needs to reach out to other systems external to the ActorSystem in order to move through its states).
A Reception actor receives the operation command, which contains the client id and operation id. The Reception actor creates or retrieves the Client actor and forwards it the command. The Client actor reads and validates the operation command, persists it, creates an OperationReceived event, and updates its own state with this operation id. Now it needs to create a new Operation actor to manage the new long-running operation. But here is where I get lost, and all the nice examples in the documentation and on the various blogs don't help. Most commentators say that a PersistentActor converts commands to events and then updates its state. It may also have side effects, as long as they are not invoked during replay. So I have two areas of confusion:
1. Is the creation of an Operation actor in this context equivalent to creating state, or to performing a side effect? It doesn't seem like a side effect, but at the same time it's not changing its own state; rather, it causes a state change in a new child.
2. Am I supposed to construct a Command to send to the new Operation actor, or do I simply forward it the OperationReceived event?
If I go with my assumption that creating a child actor is not a side effect, it means I must also create the child when replaying. This in turn would cause the state of the child to be recovered.
I hope the underlying question is clear. I feel it's a general question, but the best way I can formulate it is by giving a specific example.
Edit:
On reflection, I think that the creation of one persistent actor from another is an act of creating state, albeit outsourced. That means that the event that triggers the creation will trigger that creation on a subsequent replay (which will lead to the retrieval of the child's own persisted state). This makes me think that passing the event (rather than a wrapping command) might be the cleanest thing to do as the same event can be applied to update the state in both parent and child. There should be no need to persist the event as it comes into the child - it has already been persisted in the parent and will replay.
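Sketched out, that approach would look roughly like this in the parent (SubmitOperation, OperationReceived and OperationActor.props are illustrative names):
import akka.actor.ActorRef
import akka.persistence.PersistentActor

final case class SubmitOperation(clientId: String, operationId: String)
final case class OperationReceived(operationId: String)

class ClientActor extends PersistentActor {
  override def persistenceId: String = s"client-${self.path.name}"
  private var lastOperationId: Option[String] = None

  // Applied both at persist time and during replay, so the child Operation
  // actor is (re)created on both paths and recovers its own state itself.
  private def applyEvent(evt: OperationReceived): Unit = {
    lastOperationId = Some(evt.operationId)
    val operation: ActorRef = context
      .child(evt.operationId)
      .getOrElse(context.actorOf(OperationActor.props(evt.operationId), evt.operationId))
    operation ! evt // pass the event itself rather than a wrapping command
  }

  override def receiveRecover: Receive = {
    case evt: OperationReceived => applyEvent(evt)
  }

  override def receiveCommand: Receive = {
    case cmd: SubmitOperation =>
      persist(OperationReceived(cmd.operationId))(applyEvent)
  }
}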

Where to put calculation executed regularly that updates actor's internal state?

I am learning Scala and Akka.
In the problem I am trying to solve, I want an actor to read a real-time data stream and perform a calculation that updates its state.
Every 3 seconds I send a request through a Scheduler for the actor to return its state.
I have pretty much everything implemented, with my actor having a broadcaster and receiver and the state-update function in place, but I am not entirely sure where the calculation should run. I could keep the calculations running in a separate thread inside the actor, but I would like to know if there is a more elegant way to do this in Scala.
I would suggest dividing the work between two actors. The parent actor would manage the child worker actor and track the state; it sends a message to the child worker to trigger data processing.
The child worker actor processes the data stream - don't forget to wrap the processing in a Future so that it doesn't block the actor from processing messages. It also periodically sends messages to the master with the current result, so the child worker itself stays stateless and simply sends notifications when the computed result changes.
If you want to know the current state of the work overall, you ask the master. In principle, you could merge this into one actor which sends the status message to itself, but I wouldn't update the state directly from the calculation, to avoid concurrency issues: the data processing running in the Future may execute on a different thread than the message processing, as the sketch below illustrates.
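A minimal sketch of the two-actor split using pipeTo, so the Future's result comes back to the actor as a message (Process, GetState, Result and expensiveCalculation are all illustrative names):
import akka.actor.{Actor, ActorRef, Props}
import akka.pattern.pipe
import scala.concurrent.Future

case object Process
case object GetState
final case class Result(value: Double)

class Worker(master: ActorRef) extends Actor {
  import context.dispatcher // ExecutionContext for the Future and pipeTo

  private def expensiveCalculation(): Double =
    scala.util.Random.nextDouble() // stand-in for the real work

  def receive: Receive = {
    case Process =>
      // Run the calculation off the message-processing thread and pipe the
      // result back as a message instead of mutating any state from here.
      Future(expensiveCalculation()).map(Result.apply).pipeTo(master)
  }
}

class Master extends Actor {
  private var state: Double = 0.0
  private val worker = context.actorOf(Props(new Worker(self)), "worker")

  def receive: Receive = {
    case Process   => worker ! Process
    case Result(v) => state = v // safe: only touched on the actor's thread
    case GetState  => sender() ! state
  }
}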

Lifecyle of an actor in actor model

I'm a newbie to actor model. Could anyone please explain the lifecycle of an actor in actor model? I've been looking for the answer in the documentation, but I couldn't find anything satisfactory.
I'm interested in what an actor does after it completes the onReceive() method - is it still alive or is it dead? Can we control its lifetime to say "don't die, wait there for the next message"? For example, with a round-robin router, if I set it to have 5 actors, would it always distribute the work across the same 5 actors? Or are actors destroyed and created any time there is a message, with the maximum limit always being 5?
Thanks!
The Actor is always alive unless you explicitly "kill" it (or it crashes somehow). When it receives a message, it will "use" a thread, process the message, then go back to an "idle" state. When it receives another message, it becomes "active" again.
In the case of a round-robin router with 5 Actors, it is the same 5 Actors - the router does not create new ones each time a message is sent to the router.
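For example (a minimal sketch, assuming some no-argument Worker actor class):
import akka.actor.{ActorSystem, Props}
import akka.routing.RoundRobinPool

val system = ActorSystem("demo")
// The pool creates its 5 routees once, when the router is created; the same
// 5 Worker instances then handle all messages, in round-robin order.
val router = system.actorOf(RoundRobinPool(5).props(Props[Worker]), "workers")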
The Actor model follows an "isolated mutability" (concurrency) model - an Actor encapsulates state only to itself; other Actors are not able to touch this state directly and can only interact with it via message passing. Actors must be "alive" in order to keep their state.