In Akka's persistent actor, is creating a child actor considered to be a side effect, or the creation of state?

This question isn't as philosophical as the title might suggest. Consider the following approach to persistence:
Commands to perform Operations come in from various Clients. I represent both Operations and Clients as persistent actors. The Client's state is the lastOperationId to pass through. The Operation's state is pretty much an FSM of the Operation's progress (it's effectively a Saga, as it then needs to reach out to other systems external to the ActorSystem in order to move through its states).
A Reception actor receives the operation command, which contains the client id and operation id. The Reception actor creates or retrieves the Client actor and forwards it the command. The Client actor reads and validates the operation command, persists it, creates an OperationReceived event, and updates its own state with this operation id. Now it needs to create a new Operation actor to manage the new long-running operation.

But here is where I get lost, and all the nice examples in the documentation and on the various blogs don't help. Most commentators say that a PersistentActor converts commands to events and then updates its state. It may also have side effects, as long as they are not invoked during replay. So I have two areas of confusion:
1. Is the creation of an Operation actor in this context equivalent to creating state, or to performing a side effect? It doesn't seem like a side effect, but at the same time it isn't changing its own state; it's causing a state change in a new child.
2. Am I supposed to construct a Command to send to the new Operation actor, or should I simply forward it the OperationReceived event?
If I go with my assumption that creating a child actor is not a side effect, it means I must also create the child when replaying. This in turn would cause the state of the child to be recovered.
I hope the underlying question is clear. I feel it's a general question, but the best way I can formulate it is by giving a specific example.
Edit:
On reflection, I think that the creation of one persistent actor from another is an act of creating state, albeit outsourced. That means that the event that triggers the creation will trigger that same creation on a subsequent replay (which will lead to the retrieval of the child's own persisted state). This makes me think that passing the event (rather than a wrapping command) might be the cleanest thing to do, as the same event can be applied to update the state in both parent and child. There should be no need to persist the event as it comes into the child; it has already been persisted in the parent and will replay.
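To make that concrete, here is a minimal sketch of that approach with classic Akka persistence. SubmitOperation and the Operation stub are hypothetical names standing in for the real protocol:

import akka.actor.{ActorRef, Props}
import akka.persistence.PersistentActor

case class SubmitOperation(operationId: String) // hypothetical incoming command
case class OperationReceived(clientId: String, operationId: String)

class Operation(operationId: String) extends PersistentActor {
  override def persistenceId: String = s"operation-$operationId"
  // The event arrives already persisted by the parent, so it is not
  // persisted again here; it just drives the Operation's own FSM.
  override def receiveCommand: Receive = { case _: OperationReceived => /* advance FSM */ }
  override def receiveRecover: Receive = { case _ => }
}

class Client(clientId: String) extends PersistentActor {
  override def persistenceId: String = s"client-$clientId"
  private var lastOperationId: Option[String] = None

  // Applied both when the event is first persisted and during replay,
  // so the child is (re)created in either case and then recovers its
  // own persisted state.
  private def updateState(evt: OperationReceived): Unit = {
    lastOperationId = Some(evt.operationId)
    val child: ActorRef = context.child(evt.operationId).getOrElse(
      context.actorOf(Props(new Operation(evt.operationId)), evt.operationId))
    child ! evt // pass the event itself rather than a wrapping command
  }

  override def receiveCommand: Receive = {
    case SubmitOperation(opId) =>
      persist(OperationReceived(clientId, opId))(updateState)
  }

  override def receiveRecover: Receive = {
    case evt: OperationReceived => updateState(evt)
  }
}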

Akka-streams time based grouping

I have an application which listens to a stream of events. These events tend to come in chunks: 10 to 20 of them within the same second, with minutes or even hours of silence between them. These events are processed and result in an aggregate state, and this updated state is sent further downstream.
In pseudocode, it would look something like this:
kafkaSource()
  .mapAsync(1) { case (entityId, event) => entityProcessor(entityId).process(event) } // yields entityState
  .mapAsync(1)(entityState => submitStateToExternalService(entityState))
  .runWith(kafkaCommitterSink)
The thing is that the downstream submitStateToExternalService has no use for 10-20 updated states per second - it would be far more efficient to just emit the last one and only handle that one.
With that in mind, I started looking into whether it would be possible not to emit the state immediately after processing, and instead wait a little while to see if more events are coming in.
In a way, it's similar to conflate, but that emits elements as soon as the downstream stops backpressuring, and my processing is actually fast enough to keep up with the events coming in, so I can't rely on backpressure.
I came across groupedWithin, but this emits elements whenever the window ends (or the max number of elements is reached). What I would ideally want, is a time window where the waiting time before emitting downstream is reset by each new element in the group.
Before I implement something to do this myself, I wanted to make sure that I didn't just overlook a way of doing this that is already present in akka-streams, because this seems like a fairly common thing to do.
Honestly, I would make entityProcessor into a cluster-sharded persistent actor.
import akka.cluster.sharding.ClusterSharding
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

case class ProcessEvent(entityId: String, evt: EntityEvent)
implicit val askTimeout: Timeout = 5.seconds // required by the ask (?) below
val entityRegion = ClusterSharding(system).shardRegion("entity")

kafkaSource()
  .mapAsync(parallelism) { case (entityId, event) =>
    entityRegion ? ProcessEvent(entityId, event) // sharding serializes per entity
  }
  .runWith(kafkaCommitterSink)
With this, you can safely increase the parallelism so that you can handle events for multiple entities simultaneously without fear of mis-ordering the events for any particular entity.
Your entity actors would then update their state in response to the process commands and persist the events using a suitable persistence plugin, sending a reply to complete the ask pattern. One way to get the compaction effect you're looking for is for them to schedule the update of the external service after some period of time (after cancelling any previously scheduled update).
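For illustration, that scheduling could look roughly like this. This is a sketch with a plain classic actor and the Timers trait; the real entity would also persist its events, and Process, Flush, and EntityState are hypothetical names:

import scala.concurrent.duration._
import akka.Done
import akka.actor.{Actor, Timers}

case object Flush
final case class Process(value: Long)
final case class EntityState(total: Long)

class Entity extends Actor with Timers {
  private var state = EntityState(0)

  private def submitStateToExternalService(s: EntityState): Unit = () // stub

  def receive: Receive = {
    case Process(v) =>
      state = EntityState(state.total + v) // update (and, in reality, persist) state
      // startSingleTimer replaces any timer with the same key, so only the
      // last state within the quiet period is published downstream
      timers.startSingleTimer("flush", Flush, 2.seconds)
      sender() ! Done // reply to complete the ask from the stream
    case Flush =>
      submitStateToExternalService(state)
  }
}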
There is one potential pitfall with this scheme (it's also a potential issue with a homemade Akka Stream solution to allow n > 1 events to be processed before updating the state): what happens if the service fails between updating the local view of state and updating the external service?
One way you can deal with this is to encode whether the entity is dirty (i.e. has state which hasn't propagated to the external service) in the entity's state, and at startup build a list of entities and run through them so that dirty entities update the external state.
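A sketch of that bookkeeping with classic persistence; StatePublished, schedulePublish, and DirtyTrackingEntity are hypothetical names:

import akka.persistence.{PersistentActor, RecoveryCompleted}

case object StatePublished // hypothetical event: external service is up to date

class DirtyTrackingEntity extends PersistentActor {
  override def persistenceId: String = self.path.name
  private var dirty = false

  private def schedulePublish(): Unit = () // e.g. the timer-based debounce above

  override def receiveRecover: Receive = {
    case RecoveryCompleted => if (dirty) schedulePublish() // repair after a crash
    case StatePublished    => dirty = false
    case _                 => dirty = true // any domain event means unpublished state
  }

  override def receiveCommand: Receive = {
    // domain commands would persist their events, set dirty = true and call
    // schedulePublish(); after a successful external update:
    case StatePublished => persist(StatePublished)(_ => dirty = false)
  }
}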
If the entities are doing more than just tracking state for publishing to a single external datastore, it might be useful to use Akka Persistence Query to build a full-fledged read-side view to update the external service. In this case, though, since the read-side view's (State, Event) => State transition would be the same as the entity processor's, it might not make sense to go this way.
A midway alternative would be to offload the scheduling etc. to a different actor or set of actors which get told "this entity updated its state" and then schedule an ask of the entity for its current state, with a timestamp of when the state was locally updated. When the response is received, the external service is updated if the timestamp is newer than the last update.

Creating children actors in Akka Typed persistent actors

Let's assume an application implemented using Akka Typed has a persistent actor. This persistent actor, as part of its operations, creates transient (or non-persistent) child actors; each child has a unique ID, and these IDs are part of the persisted state. The persistent actor also needs some way of communicating with its children, but we don't want to persist the children's ActorRefs, as they aren't really part of the state. On recovery, the persistent actor should recreate its children based on the recovered state.

It doesn't sound like a very unusual use case; I'm trying to figure out the cleanest way of implementing it. I could create the child actors inside the andThen Effect in my command handler, which is meant for side effects, but then there's no way to save the child's ActorRef from there. That seems to be a more general characteristic of the typed Persistence API: it's very hard to have non-persistent state in persistent actors (which could be used for storing the transient children's ActorRefs in this case).

One solution I came up with is a sort of "proxy" actor for creating children, keeping a map of IDs and ActorRefs, and forwarding messages based on IDs. The persistent actor holds a reference to that proxy actor and contacts it every time it needs to create a new child or send something to one of the existing children. I have mixed feelings about it, though, and would appreciate it if somebody could point me to a better solution.
If you are not using snapshots, then the persistence mechanism does not store the State object; it stores the sequence of Events that led to that State object. On recovery it simply replays those Events in the order in which they happened, and your eventHandler will return a modified State object that reflects the effect of each event.
This means that the State object can contain values that are not themselves persisted but are just set by the processing of certain Events. They are, in effect, cached values derived from the persistent values in the State.
In your case the operation that causes the creation of a transient actor will be captured as an Event on the actor. So you can create the transient actor in the eventHandler and put the ActorRef in the new State object. When the actor is recovered it will replay that event and your actor will re-create the transient actor.
If you are using snapshots then I don't think there is a requirement that the snapshot object is the same type as your State object, so you can snapshot the state without the ActorRefs and re-create them when you get the SnapshotOffer message.
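Here is a minimal Akka Typed sketch of that suggestion. Worker, SpawnChild, and ChildCreated are hypothetical names; only childIds is really rebuilt from events, while children is the derived, non-persisted cache:

import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

sealed trait WorkerCommand
object Worker {
  def apply(id: String): Behavior[WorkerCommand] = Behaviors.ignore
}

object Parent {
  sealed trait Command
  final case class SpawnChild(childId: String) extends Command

  sealed trait Event
  final case class ChildCreated(childId: String) extends Event

  final case class State(childIds: Set[String],
                         children: Map[String, ActorRef[WorkerCommand]])

  def apply(id: String): Behavior[Command] =
    Behaviors.setup { ctx =>
      EventSourcedBehavior[Command, Event, State](
        PersistenceId.ofUniqueId(id),
        emptyState = State(Set.empty, Map.empty),
        commandHandler = (_, cmd) =>
          cmd match {
            case SpawnChild(childId) => Effect.persist(ChildCreated(childId))
          },
        eventHandler = (state, evt) =>
          evt match {
            case ChildCreated(childId) =>
              // Runs both on first persistence and on replay, so the
              // child is re-created during recovery as well.
              val ref = ctx.spawn(Worker(childId), s"worker-$childId")
              State(state.childIds + childId, state.children + (childId -> ref))
          }
      )
    }
}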
It's a design goal of typed persistence that the State be fully recoverable from the events (or from a snapshot and the events since that snapshot).
In general, the only way to have state that's non-persistent is to wrap the EventSourcedBehavior in a Behaviors.setup block which sets up that state. One option is some sort of mutable state in setup (e.g. a var or a mutable collection, though likely not both), which the command/event/recovery handlers manipulate.
An alternative that stays immutable is to define an immutable fixture in setup, including a child actor spawned in setup to manage the non-persistent state. You can also put into the fixture things like the entity ID, or anything else that is immutable for at least this incarnation of the entity.
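A sketch of the mutable-state option mentioned above; the Counter protocol here is a hypothetical minimal stand-in:

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

object Counter {
  sealed trait Command
  case object Increment extends Command

  sealed trait Event
  case object Incremented extends Event

  final case class State(count: Int)

  def apply(id: String): Behavior[Command] =
    Behaviors.setup { _ =>
      // Non-persistent state: lives only for this incarnation of the actor
      // and is reset on restart; the handlers close over it.
      var commandsSeen = 0

      EventSourcedBehavior[Command, Event, State](
        PersistenceId.ofUniqueId(id),
        emptyState = State(0),
        commandHandler = (_, _) => {
          commandsSeen += 1 // manipulate the transient state
          Effect.persist(Incremented)
        },
        eventHandler = (state, _) => State(state.count + 1)
      )
    }
}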

Service Fabric actors auto delete

In a Service Fabric app, I need to create thousands of stateful Actors, so I have to avoid accumulating Actors once they become useless.
I know I can't delete an Actor from the Actor itself, but I don't want to keep track of Actors and loop to delete them.
The Actors runtime uses garbage collection to remove deactivated Actor objects (but not their state), so I was thinking about removing the Actor's state inside the OnDeactivateAsync() method and letting the GC deallocate the Actor object after the usual 60 minutes.
In theory, something like this should be equivalent to deleting the Actor, shouldn't it?
protected override async Task OnDeactivateAsync()
{
    await this.StateManager.TryRemoveStateAsync("MyState");
}
Is there anything remaining that only explicit deletion can remove?
According to the docs, you shouldn't change the state from OnDeactivateAsync.
If you need your Actor to not keep persisted state, you can use attributes to change the state persistence behavior:
No persisted state: State is not replicated or written to disk. This level is for actors that simply don't need to maintain state reliably.
[StatePersistence(StatePersistence.None)]
class MyActor : Actor, IMyActor
{
}
Finally, you can use the ActorService to query Actors, see if they are inactive, and delete them.
TL;DR: There are some additional resources you can free yourself (reminders), and some that only explicit deletion can remove, because they are not publicly accessible.
The Service Fabric Actor repo is available on GitHub. I am using the persistent storage model, which seems to use KvsActorStateProvider behind the scenes, so I'll base the answer on that. There is a series of calls that starts at IActorService.DeleteActorAsync and continues over to IActorManager.DeleteActorAsync. A lot happens in there, including a call to the state provider to remove the state part of the actor. The core code that handles this is here, and it seems to remove not only the state, but also reminders and some internal actor data. In addition, if you are using actor events, all event subscribers are unsubscribed for your actor.
If you really want delete-like behavior without calling the actor runtime, I guess you could register a reminder that would delete the state and unregister itself plus other reminders.

Stateless Akka actors

I am just starting with Akka and I am trying to split some messy functions into more manageable pieces, each of which would be carried out by a different actor.
My task seems exactly what is suitable for the actor model. I have a tree of objects, which are saved in a database. Each node has some attributes; let's concentrate on just one and call it wealth. The wealth of the children depends on the wealth of the parent. When one computes the wealth on a node, this should trigger a similar computation on the children. I want to collect all the updated instances of nodes and save them at the same time in the DB.
This seems easy: an actor just performs the computation and then starts another actor for each child of the current node. When all the children send back a message with the results of the computation, the actor collects them, adds the current node, and sends a message to the actor above it.
The problem is that I do not know a way to be sure that the actor has received a message from all of its children. A trivial way would be to use a counter and increase it whenever a message from a child arrives.
But what happens if two independent parts of the system require such a computation to be performed and the same actor is reused? The actor would spawn twice as many children and the counting would no longer be reliable. What I would need is to make sure that the same actor is not reused by the external system, but that new actors are generated each time the computation is triggered.
Is this even possible? If not, is there some automatic mechanism in Akka to make sure that every child has performed its task?
Am I just using the actor model in a situation where it is not suitable? Essentially I am doing nothing more than could be done with functions: the actors themselves are stateless, but they allow me to simplify and parallelize the computation.
The way that you describe things, I think what you really want is a recursive function that returns a Future[Tree[NodeWealth]]. The function would spawn a Future every time it is called, calling itself for each child in the hierarchy, and at the end it would compose the Futures from those calls into a single Future[Tree[NodeWealth]]. When the recursive function terminates and returns the Future[Tree[NodeWealth]], you can then compose it with another Future that updates your DB. Take a look at the Akka documentation on Futures, and pay particular attention to the section "Composing Futures".
The composition of futures should allow you to avoid state and implement this easily.
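For concreteness, here is a sketch of that recursive composition, with Node, Tree, and computeWealth as hypothetical stand-ins for the question's domain types:

import scala.concurrent.{ExecutionContext, Future}

final case class Node(base: Double, children: List[Node])
final case class Tree[A](value: A, children: List[Tree[A]])

// Placeholder for the real domain logic relating a node's wealth to its parent's.
def computeWealth(node: Node, parentWealth: Double): Double =
  parentWealth + node.base

def wealthTree(node: Node, parentWealth: Double)
              (implicit ec: ExecutionContext): Future[Tree[Double]] =
  Future(computeWealth(node, parentWealth)).flatMap { w =>
    // Recurse into all children in parallel and collect their subtrees.
    Future.traverse(node.children)(child => wealthTree(child, w))
      .map(subtrees => Tree(w, subtrees))
  }

// Compose the final result with the DB update, e.g.:
// wealthTree(root, initialWealth).flatMap(saveToDb)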
Another option would be to use actors and futures, and use the ask pattern on child actors, compose the resulting futures into a single future which you return to the sender (parent actor). This is essentially the same thing, just intertwined with actors. I would probably only go this route if you are already representing your nodes as actors for some other reason.

How to handle concurrent access to a Scala collection?

I have an Actor that, in its very essence, maintains a list of objects. It has three basic operations: add, update, and remove (sometimes remove is called from the add method, but that aside). It works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls constantly interleaving each other.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway through rewriting it to use a var holding an immutable List, but removing items from Scala's default immutable List is a bit of a pain, and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that a misconception on my part, and does it process messages one at a time in a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes.
The idea is that you mix the Synchronized traits into regular mutable collections to get synchronized versions of them.
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
You don't need to synchronize the state of the actors. The aim of actors is to avoid tricky, error-prone, and hard-to-debug concurrent programming.
The actor model ensures that an actor consumes messages one by one and that you will never have two threads consuming messages for the same Actor.
Scala's immutable collections are suitable for concurrent usage.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
the actor send rule: the send of a message to an actor happens before the receive of that message by the same actor.
the actor subsequent processing rule: processing of one message happens before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
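As a small illustration of that point, here is a sketch where the actor's own message processing is the only synchronization needed; Item, Add, and Remove are hypothetical message types:

import akka.actor.Actor

final case class Item(id: Int)
final case class Add(item: Item)
final case class Remove(id: Int)

class Registry extends Actor {
  // No locking needed: the actor processes one message at a time, so this
  // plain immutable Map is never accessed concurrently. A Map also gives
  // cheap insert, update, and delete by key.
  private var items: Map[Int, Item] = Map.empty

  def receive: Receive = {
    case Add(item)  => items += (item.id -> item)
    case Remove(id) => items -= id
  }
}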
What collection type should I use in a concurrent access situation, and how is it used?
See @hbatista's answer.
Is an Actor actually a multithreaded entity, or is that a misconception on my part, and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.