How can I check whether a remote actor, for which I have obtained an ActorRef via actorFor, is alive? Any reference to documentation would be appreciated. I am using Akka from Scala.
I've seen references to supervisors and DeathWatch, but I don't feel my use case needs such heavy machinery. I just want my client to check whether the master is up, using a known path, and if it is, send it a message introducing itself. If the master is not up, the client should wait for a bit and then retry.
Update 2:
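Suggestions are that I just use a ping-pong ask test to see if it's alive. I understand this to be something like:
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
import java.util.concurrent.TimeoutException
implicit val timeout: Timeout = Timeout(5.seconds)
val future = actor ? AreYouAlive // AreYouAlive: a message the master is expected to answer
try {
  Await.result(future, timeout.duration)
} catch {
  // AskTimeoutException extends TimeoutException, so this covers both the ask and Await timing out
  case e: TimeoutException => println("It's not there: " + e)
}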
I think I was confused by the exceptions in the logs, which are still there. For example:
Error: java.net.ConnectException:Connection refused
Error: java.nio.channels.ClosedChannelException:null
Perhaps this is just how it works, and I must accept the errors/warnings in the logs rather than try to protect against them?
Just send it messages. Its machine could become unreachable the nanosecond after you sent your message anyway. If you don't get any reply, it is most likely dead. There's a large chapter on this in the docs: http://doc.akka.io/docs/akka/2.0.1/general/message-send-semantics.html
You should never assume that the network is available. Our architect here always says that there are two key concepts that come into play in distributed system design.
They are:
Timeout
Retry
Messages should time out if no response arrives within some period x, and then you can retry the message. With a timeout you don't have to worry about the specific error, only about the fact that the response did not arrive. For high availability you may want to consider tools such as ZooKeeper to handle clustering and availability monitoring; see leader election, for example: http://zookeeper.apache.org/doc/trunk/recipes.html
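A minimal sketch of the timeout-and-retry idea, using only the Scala standard library. The name retryWithTimeout and its parameters are hypothetical, and attempt stands in for whatever liveness check you use (for example the ask-based AreYouAlive ping from the question). Each attempt is bounded by a timeout, and a timeout is handled exactly like any other failure: all that matters is that no reply arrived.

```scala
import scala.annotation.tailrec
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object TimeoutRetry {
  /** Calls `attempt` until one call succeeds within `timeout`, trying at most
    * `maxAttempts` times (at least once) and sleeping `pause` between tries.
    * A timeout and any other failure are treated the same way: "no reply".
    * Returns the first successful result, or None if every attempt failed.
    * Blocks the calling thread, so it suits a start-up loop, not actor code.
    */
  @tailrec
  def retryWithTimeout[T](maxAttempts: Int, timeout: FiniteDuration, pause: FiniteDuration)(
      attempt: () => Future[T]): Option[T] =
    // Try catches both Await's TimeoutException and a failed future
    Try(Await.result(attempt(), timeout)) match {
      case Success(value) => Some(value)
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(pause.toMillis)
        retryWithTimeout(maxAttempts - 1, timeout, pause)(attempt)
      case Failure(_) => None
    }
}
```

For the client in the question, a call might look like retryWithTimeout(5, 5.seconds, 10.seconds)(() => master ? AreYouAlive), with an implicit akka.util.Timeout in scope for the ask, followed by sending the introduction message only when the result is defined.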
Related
I have started learning Akka and came across a challenge for which I can't find an easy solution, despite having waded through the documentation and related Stack Overflow questions:
Building on the Client-Side WebSocket Support example on the Akka website, I am using the following Scala snippet as a starting point:
val flow: Flow[Message, Message, Future[Done]] =
Flow.fromSinkAndSourceMat(printSink, Source.maybe)(Keep.left)
val (upgradeResponse, closed) =
Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)
My use case is a client (printSink) consuming a continuous stream from the WebSocket server. The communication is unidirectional only, so there is no need for a source.
My questions are as follows:
1. I need to regularly force a reconnection to the WebSocket server, and for that I need to disconnect first. But for the life of me, I can't find a way to do a simple disconnect.
2. In a somewhat opposite scenario, I need to keep the WebSocket connection alive and "swap out" the sink. Is this even possible, i.e. without creating another WebSocket connection?
For question 1 (forcing a disconnect from the client), this should work:
// Keep.both keeps the sink's Future[Done] and the Promise materialized by Source.maybe
val flow: Flow[Message, Message, (Future[Done], Promise[Option[Message]])] =
  Flow.fromSinkAndSourceMat(
    printSink,
    Source.maybe[Message]
  )(Keep.both)
val (upgradeResponse, (closed, disconnect)) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)
The disconnect promise can then be completed with None to close the connection:
disconnect.success(None)
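Building on that, a sketch of the "regularly force a reconnection" part of question 1. It assumes Akka 2.6 with Akka HTTP 10.2 (where an implicit ActorSystem is enough to materialize streams); connectOnce, reconnectForever, interval and pause are hypothetical names, and printSink is replaced by a Sink.foreach(println) stand-in. Akka HTTP does not support half-closed WebSockets, so completing the Source.maybe promise with None closes the whole connection; the closed future then completes and the loop opens a new connection after a short pause. The same path is taken if the server closes the connection or the connection fails, so an unreachable server is retried every pause rather than in a tight loop.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.ws.{Message, WebSocketRequest}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration._

object ReconnectingClient {
  /** Opens one connection and returns (closed, disconnect), as in the answer above. */
  def connectOnce(url: String)(implicit system: ActorSystem): (Future[Done], Promise[Option[Message]]) = {
    val printSink: Sink[Message, Future[Done]] = Sink.foreach[Message](println)
    val flow: Flow[Message, Message, (Future[Done], Promise[Option[Message]])] =
      Flow.fromSinkAndSourceMat(printSink, Source.maybe[Message])(Keep.both)
    val (_, materialized) = Http().singleWebSocketRequest(WebSocketRequest(url), flow)
    materialized
  }

  /** Forces a disconnect every `interval`; whenever a connection has closed
    * (forced, closed by the server, or failed) it reconnects after `pause`.
    * Runs until the ActorSystem is terminated.
    */
  def reconnectForever[M](interval: FiniteDuration, pause: FiniteDuration)(
      connect: () => (Future[Done], Promise[Option[M]]))(implicit system: ActorSystem): Unit = {
    import system.dispatcher
    val (closed, disconnect) = connect()
    // trySuccess: harmless if this connection already ended for another reason
    val timer = system.scheduler.scheduleOnce(interval)(disconnect.trySuccess(None))
    closed.onComplete { _ =>
      timer.cancel() // no-op if the forced disconnect already happened
      system.scheduler.scheduleOnce(pause)(reconnectForever(interval, pause)(connect))
    }
  }
}
```

A possible call is ReconnectingClient.reconnectForever(10.minutes, 1.second)(() => ReconnectingClient.connectOnce("ws://localhost/ws")). Keeping the loop generic over connect also makes it testable without a real server.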
For question 2, my intuition is that that sort of dynamic stream operation requires a custom stream operator (i.e. one level below the Graph DSL and two levels below the "normal" scaladsl/javadsl). To be honest, I don't have a huge amount of direct experience there.
In my project, we often use stateless actors. The reason is that we want to use these actors for fire-and-forget messages.
This gives us a quick way to perform an async task without creating and managing futures ourselves.
This works very well, but one of the issues is that testing this stuff is really hard. How can I write test cases for it?
One obvious idea is that at the end of the code execution I could do sender ! EmptySuccess, and the test cases could then use the ask pattern to check whether they got the EmptySuccess or not.
The problem is that in production all the code will use ! on the actor reference, and therefore we may end up with lots of dead letters which may pollute our logs (because the senders don't actually wait for the answer from the actor).
Edit: We don't want to switch to futures as of now. The reason is that this is legacy code and we cannot carry futures all the way down, because that would mean a lot of code changes.
The best solution for this is likely in the akka testkit.
http://doc.akka.io/docs/akka/current/scala/testing.html
If you just want to test that an actor sends messages to another actor and that they are received, send the messages to a test probe. You can then inspect that probe and do really useful things, like ensuring it received x messages within n seconds, or using should matchers on the messages in the probe.
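A minimal sketch of that idea, assuming akka-testkit is on the classpath. FileWorker, ProcessFile, Processed and FileWorkerCheck are hypothetical names. Instead of replying to sender(), the actor reports to an optionally injected listener: production code passes None, so fire-and-forget callers produce no replies and therefore no dead letters, while the test passes a TestProbe and asserts on what it receives.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.testkit.TestProbe
import scala.concurrent.duration._

// Hypothetical fire-and-forget request and completion notification
final case class ProcessFile(path: String)
final case class Processed(path: String)

// Reports to an injected listener instead of sender(): None in production, a probe in tests
class FileWorker(listener: Option[ActorRef]) extends Actor {
  def receive: Receive = {
    case ProcessFile(path) =>
      // ... do the actual work here ...
      listener.foreach(_ ! Processed(path))
  }
}

object FileWorkerCheck {
  /** Sends one fire-and-forget message and returns what the probe received. */
  def run(): Processed = {
    implicit val system: ActorSystem = ActorSystem("FileWorkerCheck")
    try {
      val probe = TestProbe()
      val worker = system.actorOf(Props(new FileWorker(Some(probe.ref))))
      worker ! ProcessFile("a.txt") // same `!` as the production code
      probe.expectMsgType[Processed](3.seconds) // fails if nothing arrives in time
    } finally system.terminate()
  }
}
```

In a real suite the same probe calls would typically live in a ScalaTest spec that extends akka.testkit.TestKit, rather than in a plain object.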
I am not sure what approach to follow for Akka supervision.
I have an Akka actor that lists files from an FTP server when a message triggers it. If the connection is broken, the actor fails with an exception (say, IOException), which triggers supervision. At this point, I see two alternatives:
I keep resuming / restarting the actor until the server comes back up, maybe with an exponential backoff
I set parameters (such as maxNrOfRetries = xy) in a way that the supervisor will give up and stop the actor after xy times
The first strategy seems wasteful, but the second one raises another issue: how would the actor eventually be restarted? I have the feeling that tuning the parameters of the Backoff supervisor is the best way to go, but maybe I'm missing something?
If you need to restart the actor eventually without knowing when the connection will be up again, exponential backoff (with an upper limit of say 60 seconds?) seems reasonable.
This way you get a fast reconnect if the connection is lost for only a few seconds, and with the backoff you're not wasting resources. The upper limit on the backoff sets the maximum time your actor stays offline even though the connection might already be back up.
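The arithmetic behind such a schedule, as a standard-library-only sketch: the delay doubles after every failure, starting at minBackoff and capped at maxBackoff (60 seconds here, as suggested above). Backoff.delayFor and its defaults are hypothetical; Akka's BackoffSupervisor computes a similar capped exponential schedule (plus an optional random jitter) for you, so in practice you would configure it rather than hand-roll this.

```scala
import scala.concurrent.duration._

object Backoff {
  /** Delay before retry number `attempt` (0-based): minBackoff * 2^attempt,
    * capped at maxBackoff. With the defaults: 1, 2, 4, 8, 16, 32, 60, 60, ... seconds.
    */
  def delayFor(attempt: Int,
               minBackoff: FiniteDuration = 1.second,
               maxBackoff: FiniteDuration = 60.seconds): FiniteDuration = {
    // Clamp the exponent so the shift cannot overflow for very large attempt counts
    val factor = 1L << math.min(attempt, 30)
    // Compare in Double to avoid Long overflow when multiplying nanoseconds
    val candidateNanos = minBackoff.toNanos.toDouble * factor
    if (candidateNanos >= maxBackoff.toNanos) maxBackoff
    else Duration.fromNanos(candidateNanos.toLong)
  }
}
```

The cap is what bounds the worst-case time the actor stays offline after the server comes back.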
I want to check whether ClusterSharding has started or not for a given region. Here is the code:
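import akka.actor.{ActorRef, ActorSystem}
import akka.contrib.pattern.ClusterSharding
import com.typesafe.config.ConfigFactory
def someMethod(): Unit = {
  val system = ActorSystem("ClusterSystem", ConfigFactory.load())
  // throws IllegalArgumentException if the region has not been started
  val region: ActorRef = ClusterSharding(system).shardRegion("someActorName")
}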
The method akka.contrib.pattern.ClusterSharding#shardRegion throws an IllegalArgumentException if it does not find the shard region. I do not like the approach of catching IllegalArgumentException just to check that ClusterSharding has not started.
Is there another approach like ClusterSharding(system).isStarted(shardRegionName = "someActorName")?
Or is it assumed that I should start all shard regions at ActorSystem startup?
You should indeed start all regions as soon as possible. According to the docs:
"When using the sharding extension you are first, typically at system startup on each node in the cluster, supposed to register the supported entry types with the ClusterSharding.start method."
Startup of a region is not immediate. Even in local cases, it takes at least the time set by the akka.contrib.cluster.sharding.retry-interval configuration parameter before your sharded actors can actually receive messages (the name is misleading: this value is both the initial registration delay and the retry interval). Messages sent during that period are not lost, but they are not delivered until the region is ready.
If you want to be 100% sure that your region has started, have one of your sharded actors respond to an identify message after you call ClusterSharding.start. Once it replies, you are guaranteed that your region is up and running. You can use the ask pattern if you want to block, and await on the ask future.
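A sketch of that readiness check with the ask pattern. EntityEnvelope, Ping, Pong and RegionReadiness.awaitReady are hypothetical: the envelope must be something your idExtractor and shardResolver (extractEntityId and extractShardId in newer Akka versions) understand, and the entity must answer Ping with Pong. Each attempt is an ask with a short timeout; because messages sent before the region is ready are buffered rather than lost, a reply to an earlier, already timed-out attempt may later show up as a dead letter, which is harmless here.

```scala
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical protocol: must match your entity-id / shard-id extraction
final case class EntityEnvelope(entityId: String, payload: Any)
case object Ping
case object Pong

object RegionReadiness {
  /** Blocks until an entity behind `region` answers Pong, or gives up after `attempts` tries. */
  def awaitReady(region: ActorRef, probeEntityId: String,
                 attempts: Int = 10, perAttempt: FiniteDuration = 2.seconds): Boolean = {
    implicit val timeout: Timeout = Timeout(perAttempt)
    // exists stops at the first attempt that gets a Pong back
    (1 to attempts).exists { _ =>
      Try(Await.result(region ? EntityEnvelope(probeEntityId, Ping), perAttempt)).toOption.contains(Pong)
    }
  }
}
```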
I am confused by behavior I am seeing in Akka. Briefly, I have a set of actors performing scientific calculations (star formation simulation). They have some state. When an error occurs such that one or more enter an invalid state, I want to restart the whole set to start over. I also want to do this if a single calc (over the entire set) takes too long (there is no way to predict in advance how long it may run).
So, there is the set of Simulation actors at the bottom of the tree, then a Director above them (that creates them via a Router, and sends them messages via that Router as well). There is one more Director level above that to create Directors on different machines and collect results from them all.
I handle the timeout case by using the Akka Scheduler to create a one-time timeout event, in the local Director, when the simulation is started. When the Director gets this event, if all its Simulation actors have not finished, it does this:
children ! Broadcast(Kill)
where children is the Router that owns/created them - this sends a Kill to all the children (SimulActors).
What I thought would occur is that all the child actors would be restarted. However, their preRestart() hook method is never called. I see the Kill message received, but that's it.
I must be missing something fundamental here. I have read the Akka docs on this topic and I have to say I find them less than clear (especially the page on Supervisors). I would really appreciate either a thorough explanation of the Kill/restart process, or just some other references (Google wasn't very helpful).
Note
If the child of a router terminates, the router will not automatically
spawn a new child. In the event that all children of a router have
terminated the router will terminate itself.
Taken from the Akka docs.
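I would consider using a supervision strategy: Akka has built-in behavior for applying the same directive to all of a supervisor's children at once (the all-for-one strategy), and you define the specific directive, e.g. Restart. Note also that Kill makes the target actor fail with an ActorKilledException, which Akka's default decider maps to Stop rather than Restart; that is consistent with preRestart() never being called.
I think a more idiomatic way to run this would be to have the actors throw an exception if they're not done after a period of time, and then have the supervisor handle that via its supervision strategy.
You could throw a NotDoneException from the child and then define the behavior like so: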
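import akka.actor.AllForOneStrategy
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._
override val supervisorStrategy =
  // example limits; maxNrOfRetries = 0 would deny every restart, so Restart would stop them
  AllForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
    case _: NotDoneException => Stop
    case _: Exception => Restart
  }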
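It's important to understand that a restart discards the old actor instance and creates a new, separate instance of the actor class (the ActorRef and the mailbox stay the same), so any state held by the old instance is lost.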
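A sketch of the child side of the "throw an exception if not done in time" suggestion: a simulation actor that fails with NotDoneException when it has not finished within a time limit, leaving the decision to its parent's supervisorStrategy. SimulActor comes from the question, but its messages (StartCalc, CalcFinished, TimeLimitReached), the limit parameter and the NotDoneException class are hypothetical, not Akka APIs. The timer is cancelled in postStop, which Akka's default preRestart calls, so a pending deadline from a failed instance is cancelled rather than delivered to the restarted one.

```scala
import akka.actor.{Actor, Cancellable}
import scala.concurrent.duration._

// Hypothetical exception and messages
class NotDoneException(msg: String) extends Exception(msg)
case object StartCalc
case object CalcFinished
case object TimeLimitReached

class SimulActor(limit: FiniteDuration) extends Actor {
  import context.dispatcher // ExecutionContext for the scheduler
  private var finished = false
  private var timer: Option[Cancellable] = None

  def receive: Receive = {
    case StartCalc =>
      finished = false
      timer.foreach(_.cancel())
      timer = Some(context.system.scheduler.scheduleOnce(limit, self, TimeLimitReached))
      // ... start the calculation; send CalcFinished to self when it completes ...
    case CalcFinished =>
      finished = true
      timer.foreach(_.cancel())
    case TimeLimitReached if !finished =>
      // Failing here hands the decision to the parent's supervisorStrategy
      throw new NotDoneException(s"calculation not finished within $limit")
    case TimeLimitReached => // finished in time; nothing to do
  }

  // Runs on stop and, via the default preRestart, on the failed instance before a restart
  override def postStop(): Unit = timer.foreach(_.cancel())
}
```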
References:
http://doc.akka.io/docs/akka/snapshot/scala/fault-tolerance.html
http://doc.akka.io/docs/akka/snapshot/general/supervision.html