How can I tell when the children of a supervisor are being restarted? I want to be able to send some initialization messages to the children that are restarted (after they've been recreated). Is it possible?
I could override postRestart in each child, but I would prefer to do this in the supervisor, since it controls the initialization.
I've tried watching the children, but Terminated doesn't seem to trigger when a child restarts.
If you define a supervision strategy in the parent, its decider block can access the state of that actor (the parent) just as a receive block can. From there you can send a message to the child (sender will point to the failing child). Because the child is already suspended at the point where the decider determines its fate, the child will process this message only after the restart.
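A minimal sketch of that idea with classic Akka actors follows. The messages Init, Fail and GetState and the Child and Parent classes are hypothetical names made up for illustration; only the supervision API itself comes from Akka.

```scala
import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart

// Hypothetical messages, not part of Akka.
case object Init
case object Fail
case object GetState

class Child extends Actor {
  private var initialized = false

  def receive: Receive = {
    case Init     => initialized = true
    case Fail     => throw new IllegalStateException("simulated failure")
    case GetState => sender() ! initialized
  }
}

class Parent extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy() {
      case _: Exception =>
        // While the decider runs, sender() is the failing child. The child is
        // suspended, so Init stays in its mailbox until the restart is done.
        sender() ! Init
        Restart
    }

  private val child: ActorRef = context.actorOf(Props[Child](), "child")

  def receive: Receive = {
    case msg => child.forward(msg)
  }
}
```

Note that Init is queued behind anything already waiting in the child's mailbox, so the restarted instance may handle older messages before it sees Init.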
I have a process that creates multiple threads and creates a socket.
Now I want to create a daemon process by calling fork() and exiting the parent process.
But the threads that are created by parent process get exited when parent is killed.
Is there a way for the child process to inherit those threads and the socket?
(The code is written in C++.)
But the threads that are created by parent process get exited when parent is killed.
Not exactly. The parent's threads are unaffected, and the child only ever gets the thread that called fork(). This is not the same thing as the child getting the other threads, and them thereafter terminating. In particular, no cancellation handlers or exit handlers that may have been registered by them are called in the child, and this may leave mutexes and other synchronization objects in an unusable and unrecoverable state. Cleaning up such a mess is the intended purpose of fork handlers, but those are tricky to use correctly, and they must be used consistently throughout the process to be effective.
Is there a way I can inherit those threads and socket to child process?
A child process inherits its parent's open file descriptors automatically, so nothing special needs to be done about the socket. But the other threads? No.
The POSIX documentation for fork is explicit about all this:
The new process (child process) shall be an exact copy of the calling
process (parent process) except as detailed below:
[...]
The child process shall have its own copy of the parent's file descriptors. Each of the child's file descriptors shall refer to the
same open file description with the corresponding file descriptor of
the parent.
[...]
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a
replica of the calling thread and its entire address space, possibly
including the states of mutexes and other resources. Consequently, to
avoid errors, the child process may only execute async-signal-safe
operations until such time as one of the exec functions is called.
When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that
is not async-signal-safe, the behavior is undefined.
If the objective of forking is to disassociate from the original session and parent in order to run in the background as a daemon, then your best bet is for the initial thread to do that immediately upon startup, before creating any additional threads or opening any sockets.
This question isn't as philosophical as the title might suggest. Consider the following approach to persistence:
Commands to perform Operations come in from various Clients. I represent both Operations and Clients as persistent actors. The Client's state is the lastOperationId to pass through. The Operation's state is pretty much an FSM of the Operation's progress (it's effectively a Saga, as it then needs to reach out to other systems external to the ActorSystem in order to move through its states).
A Reception actor receives the operation command, which contains the client id and operation id. The Reception actor creates or retrieves the Client actor and forwards it the command. The Client actor reads and validates the operation command, persists it, creates an OperationReceived event, and updates its own state with this operation id. Now it needs to create a new Operation actor to manage the new long-running operation. But here is where I get lost, and all the nice examples in the documentation and on the various blogs don't help. Most commentators say that a PersistentActor converts commands to events and then updates its state. It may also have side effects as long as they are not invoked during replay. So I have two areas of confusion:
Is the creation of an Operation actor in this context equivalent to creating state, or performing a side effect? It doesn't seem like a side effect, but at the same time it's not changing its own state, but causing a state change in a new child.
Am I supposed to construct a Command to send to the new Operation actor or will I simply forward it the OperationReceived event?
If I go with my assumption that creating a child actor is not a side effect, it means I must also create the child when replaying. This in turn would cause the state of the child to be recovered.
I hope the underlying question is clear. I feel it's a general question, but the best way I can formulate it is by giving a specific example.
Edit:
On reflection, I think that the creation of one persistent actor from another is an act of creating state, albeit outsourced. That means that the event that triggers the creation will trigger that creation on a subsequent replay (which will lead to the retrieval of the child's own persisted state). This makes me think that passing the event (rather than a wrapping command) might be the cleanest thing to do as the same event can be applied to update the state in both parent and child. There should be no need to persist the event as it comes into the child - it has already been persisted in the parent and will replay.
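The approach described in the edit could be sketched roughly as follows, assuming classic Akka Persistence. The message types, the Operation and Client class bodies, and the naming scheme for children are all hypothetical; this only illustrates routing both live events and replayed events through one handler that creates the child and passes it the event.

```scala
import akka.actor.{Actor, Props}
import akka.persistence.PersistentActor

// Hypothetical protocol for illustration.
final case class OperationCommand(clientId: String, operationId: String)
final case class OperationReceived(operationId: String)

// Placeholder child. In the design above it would itself be a persistent
// actor that recovers its own FSM state when it is created.
class Operation(operationId: String) extends Actor {
  def receive: Receive = {
    case OperationReceived(_) => () // start or resume the long-running work
  }
}

class Client(clientId: String) extends PersistentActor {
  override def persistenceId: String = s"client-$clientId"

  private var lastOperationId: Option[String] = None

  // Used for freshly persisted events and for replayed ones, so the child
  // is created in both cases.
  private def applyEvent(event: OperationReceived): Unit = {
    lastOperationId = Some(event.operationId)
    val name = s"operation-${event.operationId}"
    val child = context.child(name).getOrElse(
      context.actorOf(Props(classOf[Operation], event.operationId), name))
    child ! event
  }

  override def receiveRecover: Receive = {
    case event: OperationReceived => applyEvent(event)
  }

  override def receiveCommand: Receive = {
    case OperationCommand(_, operationId) =>
      persist(OperationReceived(operationId))(applyEvent)
  }
}
```

One consequence worth weighing: on recovery this recreates a child for every replayed OperationReceived event, including operations that finished long ago, unless some later event tells the Client that the operation is complete.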
I am learning Scala and Akka.
In the problem I am trying to solve I want an actor to be reading a real-time data stream and perform a certain calculation that would update its state.
Every 3 seconds I send a request through a Scheduler asking the actor to return its state.
I have pretty much everything implemented: my actor has a broadcaster and a receiver, and the function that updates the state is correct. What I am not sure about is how to keep the calculation running. I could run the calculations continuously in a separate thread inside the actor, but I would like to know if there is a more elegant way to do this in Scala.
I would suggest dividing the work between two actors. The parent actor would manage a child worker actor and track the state. It sends a message to the child worker actor to trigger data processing.
The child worker actor processes the data stream. Don't forget to wrap the processing in a Future so that it doesn't block the actor from processing messages. It also periodically sends messages to the parent with the current state. So the child worker is stateless: it only sends notifications when the computed state changes.
If you want to know the current state of the work overall, you ask the parent. In principle, you can merge this into one actor that sends the status message to itself. I wouldn't update the state directly from the Future, to avoid concurrency issues: the data processing work running in the Future can possibly run on a different thread than message processing.
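The single-actor variant can be sketched like this, using akka.pattern.pipe to turn the Future's result back into a message. The messages Process, Processed and GetState, the StreamActor class, and the summing "calculation" are hypothetical stand-ins.

```scala
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future

// Hypothetical messages for illustration.
final case class Process(chunk: Seq[Double])
final case class Processed(partialSum: Double)
case object GetState

class StreamActor extends Actor {
  import context.dispatcher // execution context for the Future and pipeTo

  private var total = 0.0

  def receive: Receive = {
    case Process(chunk) =>
      // The Future may run on another thread, so it must not touch `total`.
      // Its result comes back to this actor as an ordinary message.
      Future(Processed(chunk.sum)).pipeTo(self)

    case Processed(partialSum) =>
      // State is only ever changed here, inside message processing.
      total += partialSum

    case GetState =>
      sender() ! total
  }
}
```

The scheduled request every 3 seconds would then simply send GetState, and the reply reflects every Processed message handled so far.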
I am trying to use a hierarchy of Akka actors to handle per user state. There is a parent actor that owns all the children, and handles the get-or-create in the correct way (see a1, a2):
import akka.actor.{Actor, ActorRef, Props}

class UserActorRegistry extends Actor {
  override def receive: Receive = {
    case msg @ DoPerUserWork(userId, _) =>
      val perUserActor = getOrCreateUserActor(userId)
      // perUserActor is live now, but will it receive "msg"?
      perUserActor.forward(msg)
  }

  // Return the existing child for this user, or create one under a stable name.
  def getOrCreateUserActor(userId: UserId): ActorRef = {
    val childName = userId.toActorName
    context.child(childName) match {
      case Some(child) => child
      case None => context.actorOf(Props(classOf[UserActor], userId), childName)
    }
  }
}
In order to reclaim memory, the UserActors expire after a period of idleness (i.e. a timer triggers the child actor to call context.stop(self)).
My problem is that I think I have a race condition between the "getOrCreateUserActor" and the child actor receiving the forwarded message -- if the child expires in that window then the forwarded message will be lost.
Is there any way I can either detect this edge case, or refactor the UserActorRegistry to preclude it?
I can see two problems with your current design that open you up to the race condition you mention:
1) Having the termination condition (timer sending a poison pill) go directly to the child actor. By taking this approach, the child can certainly be terminated on a separate thread (within the dispatcher) while at the same time a message has been set up to be forwarded to it in the UserActorRegistry actor (on a different thread within the dispatcher).
2) Using a PoisonPill to terminate the child. A PoisonPill is for a graceful stop, allowing other messages in the mailbox to be processed first. In your case, you are terminating due to inactivity, which seems to indicate there are no other messages already in the mailbox. I see a PoisonPill as wrong here because in your case another message might be sent after the PoisonPill, and that message would surely be lost after the PoisonPill is processed.
So I'm going to suggest that you delegate the termination of the inactive children to the UserActorRegistry as opposed to doing it in the children themselves. When you detect the condition of inactivity, send a message to the instance of UserActorRegistry indicating that a particular child needs to be terminated. When you receive that message, terminate that child via stop instead of sending a PoisonPill. By using the single mailbox of the UserActorRegistry which is processed in a serial manner, you can help ensure that a child is not about to be terminated in parallel while you are about to send it a message.
Now, there is a complication here that you have to deal with. Stopping an actor is asynchronous. So if you call stop on a child, it might not be completely stopped when you are processing a DoPerUserWork message, and you might send it a message that will be lost because it's in the process of stopping. You can solve this by keeping some internal state (a List) that represents children that are in the process of being stopped. When you stop a child, add its name to that list and then set up DeathWatch (via context watch child) on it. When you receive the Terminated event for that child, remove its name from the list of children being terminated. If you receive work for a child while its name is in that list, requeue it for re-processing, maybe up to a max number of times so as to not try to reprocess forever.
This is not a perfect solution; it's just an identification of some of the issues with your approach and a push in the right direction for solving some of them. Let me know if you want to see the code for this and I'll whip something together.
Edit
In response to your second comment: I don't think you'll be able to look at a child ActorRef and see that it's currently shutting down, hence the need for that list of children that are in the process of being shut down. You could enhance the DoPerUserWork message to contain a numberOfAttempts: Int field, then increment it and send the message back to self for reprocessing if you see that the target child is currently shutting down. You could then use numberOfAttempts to prevent re-queuing forever, stopping at some max number of attempts. If you don't feel completely comfortable relying on DeathWatch, you could add a time-to-live component to the items in the list of children shutting down. You could then prune items as you encounter them if they are in the list but have been in there too long.
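A rough sketch of the registry described above, under several assumptions: user ids are plain strings that double as actor names, ChildIdle is a hypothetical message that an idle child (or a timer) sends to the registry, and retries are re-sent after a short delay rather than immediately so the child has time to finish stopping. It uses a Set instead of a List for the children being stopped.

```scala
import akka.actor.{Actor, ActorRef, Props, Terminated}
import scala.concurrent.duration._

// Hypothetical protocol; userId is also used as the child actor name.
final case class DoPerUserWork(userId: String, payload: String, numberOfAttempts: Int = 0)
final case class ChildIdle(userId: String)

class UserActor(userId: String) extends Actor {
  def receive: Receive = {
    case DoPerUserWork(_, payload, _) => sender() ! s"$userId:$payload"
  }
}

class UserActorRegistry extends Actor {
  private val maxAttempts = 5
  private val retryDelay = 50.millis
  private var stopping = Set.empty[String] // children that are shutting down

  def receive: Receive = {
    case msg @ DoPerUserWork(userId, _, attempts) if stopping.contains(userId) =>
      // Child is going away: retry later, keeping the original sender.
      if (attempts < maxAttempts) {
        context.system.scheduler.scheduleOnce(
          retryDelay, self, msg.copy(numberOfAttempts = attempts + 1))(
          context.dispatcher, sender())
      } // otherwise give up and drop the message

    case msg @ DoPerUserWork(userId, _, _) =>
      getOrCreateUserActor(userId).forward(msg)

    case ChildIdle(userId) =>
      context.child(userId).foreach { child =>
        stopping += userId
        context.watch(child)
        context.stop(child)
      }

    case Terminated(child) =>
      stopping -= child.path.name
  }

  private def getOrCreateUserActor(userId: String): ActorRef =
    context.child(userId).getOrElse(
      context.actorOf(Props(classOf[UserActor], userId), userId))
}
```

This still leaves a gap: work forwarded to a child just before ChildIdle is handled can be lost if the child has not processed it yet, so the registry would also need to check that the child really has been idle (for example by tracking the time of the last forward) before stopping it.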
I am confused by behavior I am seeing in Akka. Briefly, I have a set of actors performing scientific calculations (star formation simulation). They have some state. When an error occurs such that one or more enter an invalid state, I want to restart the whole set to start over. I also want to do this if a single calc (over the entire set) takes too long (there is no way to predict in advance how long it may run).
So, there is the set of Simulation actors at the bottom of the tree, then a Director above them (that creates them via a Router, and sends them messages via that Router as well). There is one more Director level above that to create Directors on different machines and collect results from them all.
I handle the timeout case by using the Akka Scheduler to create a one-time timeout event, in the local Director, when the simulation is started. When the Director gets this event, if all its Simulation actors have not finished, it does this:
children ! Broadcast(Kill)
where children is the Router that owns/created them - this sends a Kill to all the children (SimulActors).
What I thought would occur is that all the child actors would be restarted. However, their preRestart() hook method is never called. I see the Kill message received, but that's it.
I must be missing something fundamental here. I have read the Akka docs on this topic and I have to say I find them less than clear (especially the page on Supervisors). I would really appreciate either a thorough explanation of the Kill/restart process, or just some other references (Google wasn't very helpful).
Note
If the child of a router terminates, the router will not automatically
spawn a new child. In the event that all children of a router have
terminated the router will terminate itself.
Taken from the Akka docs.
I would consider using a supervision strategy. Akka has built-in behavior for applying a decision to all child actors (the all-for-one strategy), and you can define the specific directive, e.g. restart.
I think a more idiomatic way to do this would be to have the actors throw an exception if they're not done after a period of time, and then have the supervisor handle that via its supervision strategy.
You could throw a not done exception from the child and then define the behaviour like so:
override val supervisorStrategy =
  AllForOneStrategy(maxNrOfRetries = 0) {
    case _: NotDoneException ⇒ Stop
    case _: Exception ⇒ Restart
  }
It's important to understand that a restart means stopping the old actor instance and creating a new, separate actor object.
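One detail that may explain the missing preRestart call: Kill makes the actor throw an ActorKilledException, and Akka's default supervisor strategy stops an actor that fails with that exception rather than restarting it. The sketch below assumes the Director creates its Simulation children directly (not through a router) and uses hypothetical messages TimedOut, Restarted and GetRestarts to show an all-for-one restart triggered by a Kill.

```scala
import akka.actor.{Actor, ActorKilledException, AllForOneStrategy, Kill, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart

// Hypothetical messages, not part of Akka.
case object TimedOut
case object Restarted
case object GetRestarts

class Simulation extends Actor {
  // Runs on the fresh instance that replaces the failed one.
  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason)
    context.parent ! Restarted
  }

  def receive: Receive = {
    case _ => () // the calculation would go here
  }
}

class Director extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    AllForOneStrategy() {
      // Listed explicitly because the default strategy would stop the child.
      case _: ActorKilledException => Restart
      case _: Exception            => Restart
    }

  private val simulations =
    (1 to 3).map(i => context.actorOf(Props[Simulation](), s"sim-$i"))
  private var restarts = 0

  def receive: Receive = {
    case TimedOut    => simulations.head ! Kill // one failure restarts all children
    case Restarted   => restarts += 1
    case GetRestarts => sender() ! restarts
  }
}
```

Because the strategy is all-for-one, killing a single child is enough to restart every Simulation, so broadcasting Kill to all of them is not needed in this setup.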
References:
http://doc.akka.io/docs/akka/snapshot/scala/fault-tolerance.html
http://doc.akka.io/docs/akka/snapshot/general/supervision.html