I started to learn Akka and came across a challenge for which I can't find an easy solution, despite having waded through the documentation and related Stack Overflow questions:
Building on the Client-Side WebSocket Support example on the Akka website, I am using the following Scala code snippet as a basis:
val flow: Flow[Message, Message, Future[Done]] =
  Flow.fromSinkAndSourceMat(printSink, Source.maybe)(Keep.left)

val (upgradeResponse, closed) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)
The use case I have is a client (printSink) consuming a continuous stream from the websocket server. The communication is uni-directional only, so there is no need for a source.
My question is then as follows:
I need to regularly force a re-connection to the websocket server, and for that I need to disconnect first. But for the life of me, I can't find a way to do a simple disconnect.
In a somewhat opposite scenario, I need to keep the websocket connection alive and "swap out" the sink. Is this even possible, i.e. without creating another websocket connection?
For question 1 (forcing a disconnect from the client), this should work:
val flow: Flow[Message, Message, (Future[Done], Promise[Option[Message]])] =
  Flow.fromSinkAndSourceMat(
    printSink,
    Source.maybe[Message]
  )(Keep.both)

val (upgradeResponse, (closed, disconnect)) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)
disconnect can then be completed with a None to disconnect:
disconnect.success(None)
For question 2, my intuition is that this sort of dynamic stream operation would require a custom stream operator (i.e. one level below the GraphDSL and two levels below the "normal" scaladsl/javadsl). I don't have a huge amount of direct experience there, to be honest.
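That said, one way to avoid swapping the sink inside the running flow at all is to terminate the WebSocket in a BroadcastHub, whose materialized Source consumers can attach to and detach from while the connection stays up. Below is a minimal sketch of that idea, assuming the printSink and endpoint from the question and a recent Akka version; this is an alternative technique, not part of the original answer.

import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.ws.{Message, WebSocketRequest}
import akka.stream.scaladsl.{BroadcastHub, Flow, Keep, Source}
import scala.concurrent.{Future, Promise}

implicit val system: ActorSystem = ActorSystem()

// Terminate the WebSocket in a BroadcastHub instead of a fixed sink; the hub
// materializes a Source that any number of downstream consumers can attach to later.
val flow: Flow[Message, Message, (Source[Message, NotUsed], Promise[Option[Message]])] =
  Flow.fromSinkAndSourceMat(
    BroadcastHub.sink[Message](bufferSize = 256),
    Source.maybe[Message]
  )(Keep.both)

val (upgradeResponse, (fromSocket, disconnect)) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)

// Attach the current consumer; cancelling it and running fromSocket again with a
// different sink "swaps" consumers without re-establishing the connection.
val firstConsumer: Future[Done] = fromSocket.runWith(printSink)

One caveat: a BroadcastHub backpressures the WebSocket once its internal buffer fills while no consumer is attached, so the window between detaching one sink and attaching the next should be kept short.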
Related
We have the following logic implemented to manage jobs targeting different backends:
A Manager Actor is started. This actor:
Loads the configuration required to target each backend (mutable map backend name -> backend connector configuration);
Loads a pool of Actors (RoundRobinPool) to handle the jobs for each backend (mutable map backend name -> RoundRobinPool Actor Ref)
When a request is received by the Manager actor, it retrieves the backend name from the message and forwards it to the corresponding pool of Actors to handle the job (assuming a configuration for this backend was registered). The result of the job request is then returned from the actor to the original sender (which is why we use forward).
This logic works very well, but because the backends are slow to handle jobs we are in a typical fast-publisher/slow-consumer situation, and this raises issues when the load increases.
After doing some research, Akka Streams seems to be the way to go, as it allows us to implement back-pressure and throttling, which would be perfect for our usage (for example, limiting to 5 requests per second).
The idea is to keep the Manager Actor with the same routing logic but replace the pools of Actors with a Source.queue.
When registering the Source.queue, this would be performed like this:
val queue = Source
  .queue[RunBackendRequest](0, OverflowStrategy.backpressure)
  .throttle(5, 1.second)
  .map(r => runBackendRequest(r))
  .toMat(Sink.ignore)(Keep.left)
  .run()
Where the definition of RunBackendRequest is:
case class RunBackendRequest(originalSender: ActorRef, backendConnector: BackendConnector, request: BackendRequest)
And the function runBackendRequest is defined as such:
private def runBackendRequest(runRequest: RunBackendRequest): Unit = {
  val connector = BackendConnectorFactory.getBackendConnector(
    configuration.underlying, runRequest.backendConnector.toConfig(), materializer, environment.asJava)

  Future { connector.doSomeWork(runRequest.request) } map { result =>
    runRequest.originalSender ! Success(result)
  } recover {
    case e: Exception => runRequest.originalSender ! Failure(e)
  }
}
When the Manager Actor receives a message, it will 'offer' it to the correct queue based on the name of the target backend contained in the message.
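For illustration, here is a minimal sketch of what that offer step could look like inside the Manager actor; the JobRequest message type, the queues and connectors maps, and the failure replies are assumptions made for the example, not code from the question.

import akka.actor.{Actor, ActorRef}
import akka.stream.QueueOfferResult
import akka.stream.scaladsl.SourceQueueWithComplete
import scala.util.Failure

// Hypothetical incoming message: which backend to target, and the request itself.
case class JobRequest(backendName: String, request: BackendRequest)

class ManagerActor(queues: Map[String, SourceQueueWithComplete[RunBackendRequest]],
                   connectors: Map[String, BackendConnector]) extends Actor {
  import context.dispatcher // ExecutionContext for the Future returned by offer

  def receive: Receive = {
    case JobRequest(backendName, request) =>
      val replyTo = sender() // capture before any Future callback runs
      queues.get(backendName) match {
        case Some(queue) =>
          queue.offer(RunBackendRequest(replyTo, connectors(backendName), request)).foreach {
            case QueueOfferResult.Enqueued => () // accepted; the stream will reply to replyTo
            case other => replyTo ! Failure(new RuntimeException(s"offer failed: $other"))
          }
        case None =>
          replyTo ! Failure(new IllegalArgumentException(s"unknown backend: $backendName"))
      }
  }
}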
Therefore, I have a few questions:
Is this the correct way to use Akka Streams in this particular use case, or could it be written differently and more efficiently?
Is it OK to provide the ActorRef of the original sender in the RunBackendRequest object so that the request will be answered in the Flow?
Is there a way to retrieve the result of the Flow into a Future instead, so that the Manager actor could return the result of the request itself?
Akka Streams seems to be very powerful, but there is clearly a learning curve!
It feels to me that having the Manager Actor creates a single point of failure. Maybe worth a try:
The original sender keeps hammering an Akka stream graph instead of the Manager actor. Make sure you pass the ActorRef downstream such that reply can be sent back
Inside the graph, use either partition-then-merge or Substreams to process requests that target different backend connectors (a sketch follows below).
Either as the last step of the graph or after the backend connectors have finished, answer the original sender.
Overall, Colin's article is a great introduction on how to use Akka Streams with Partition and Merge to achieve your goal.
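To give a concrete feel for the partition-then-merge suggestion, here is a rough sketch; the per-backend callBackend flow, the name field on BackendConnector, and the throttle rate are illustrative assumptions:

import akka.NotUsed
import akka.stream.FlowShape
import akka.stream.scaladsl.{Flow, GraphDSL, Merge, Partition}
import scala.concurrent.duration._

// Routes each RunBackendRequest to a throttled sub-flow for its backend and merges the
// results; originalSender travels with every element so a later stage can reply to it.
def routedFlow(backends: Vector[String],
               callBackend: String => Flow[RunBackendRequest, RunBackendRequest, NotUsed])
    : Flow[RunBackendRequest, RunBackendRequest, NotUsed] =
  Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._

    val partition = b.add(Partition[RunBackendRequest](
      backends.size, r => backends.indexOf(r.backendConnector.name))) // `name` is assumed
    val merge = b.add(Merge[RunBackendRequest](backends.size))

    backends.zipWithIndex.foreach { case (backend, i) =>
      partition.out(i) ~> callBackend(backend).throttle(5, 1.second) ~> merge.in(i)
    }

    FlowShape(partition.in, merge.out)
  })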
Let me know if you need more clarification and I can update my answer accordingly.
I use akka-streams' ActorPublisher actor as a streaming per-connection Source of data being sent to an incoming WebSocket or HTTP connection.
The ActorPublisher contract is that downstream regularly requests data by signalling demand - the number of elements it can accept. I am not supposed to send more elements if the demand is 0. I observe that if I buffer elements when the consumer is slow, the buffer size fluctuates between 1 and 60, but stays mostly around 40-50.
To stream, I use akka-http's ability to set the WebSocket output and HttpResponse data to a Source of Messages (or ByteStrings).
I wonder how the back-pressure works in this case - when I'm streaming data to a client over the network. How exactly are these numbers calculated? Does it check what's happening at the network level?
The closest I could find for your question "how the back-pressure works in this case" is from the documentation:
Akka HTTP is streaming all the way through, which means that the back-pressure mechanisms enabled by Akka Streams are exposed through all layers - from the TCP layer, through the HTTP server, all the way up to the user-facing HttpRequest and HttpResponse and their HttpEntity APIs.
As to "how these numbers are calculated", I believe that is specified in the configuration settings.
My akka-streams learn-o-thon continues. I'd like to integrate my akka-streams application with akka-cluster and DistributedPubSubMediator.
Adding support for Publish is fairly straightforward, but I'm having trouble with the Subscribe part.
For reference, a subscriber is given as follows in the Typesafe sample:
class ChatClient(name: String) extends Actor {
  val mediator = DistributedPubSub(context.system).mediator
  mediator ! Subscribe("some topic", self)

  def receive = {
    case ChatClient.Message(from, text) =>
      ...process message...
  }
}
My question is, how should I integrate this actor with my flow, and how should I ensure I'm getting publish messages in the absence of stream backpressure?
I'm trying to accomplish a pubsub model where one stream may publish a message and another stream would consume it (if subscribed).
You probably want to make your Actor extend ActorPublisher. Then you can create a Source from it and integrate that into your stream.
See the docs on ActorPublisher here: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.3/scala/stream-integrations.html
The other answers are outdated: they suggest using ActorPublisher, which has been deprecated since version 2.5.0.
For those interested in a current approach, Colin Breck wrote an excellent series in his blog about integrating Akka Streams and Akka actors. Over the course of the series, Breck fleshes out a system that begins with Akka Streams and plain actors, then incorporates Akka Cluster and Akka Persistence. The first post in the series is here (the distributed stream processing piece is in part 3).
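As a rough, non-deprecated illustration of the same integration (independent of Breck's series), one can materialize a Source.actorRef and subscribe the materialized ActorRef to the topic. Since the mediator does not participate in backpressure, the buffer size and overflow strategy below are illustrative choices, and the exact Source.actorRef signature varies slightly between Akka versions:

import akka.actor.ActorSystem
import akka.cluster.pubsub.DistributedPubSub
import akka.cluster.pubsub.DistributedPubSubMediator.Subscribe
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source}

implicit val system: ActorSystem = ActorSystem("ClusterSystem")
val mediator = DistributedPubSub(system).mediator

// Messages sent to subscriberRef flow into the stream; dropHead bounds memory use
// because the mediator ignores backpressure from the stream.
val (subscriberRef, done) = Source
  .actorRef[ChatClient.Message](bufferSize = 256, overflowStrategy = OverflowStrategy.dropHead)
  .toMat(Sink.foreach(msg => println(s"received: $msg")))(Keep.both)
  .run()

mediator ! Subscribe("some topic", subscriberRef)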
I found Slick 3.0 introduced a new feature called streaming
http://slick.typesafe.com/doc/3.0.0-RC1/database.html#streaming
I'm not familiar with Akka. Streaming seems to be a lazy or async value, but it is not very clear to me why it is useful, or when it would be useful.
Does anyone have ideas about this?
So let's imagine the following use case:
A "slow" client wants to get a large dataset from the server. The client sends a request to the server which loads all the data from the database, stores it in memory and then passes it down to the client.
And here we're faced with a problem: the client doesn't handle the data as fast as we would like => we can't release the memory => this may result in an out-of-memory error.
Reactive streams solve this problem by using backpressure. We can wrap Slick's publisher around the Akka source and then "feed" it to the client via Akka HTTP.
The thing is that this backpressure is propagated through TCP via Akka HTTP down to the publisher that represents the database query.
That means that we only read from the database as fast as the client can consume the data.
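Here is a small sketch of that wiring; the Slick table, profile import, and route are illustrative (on Slick 3.0 the import was slick.driver.PostgresDriver.api._), while db.stream and Source.fromPublisher are the essential pieces:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.Source
import akka.util.ByteString
import slick.jdbc.PostgresProfile.api._

implicit val system: ActorSystem = ActorSystem()

val db = Database.forConfig("mydb")   // illustrative config key
val users = TableQuery[UsersTable]    // hypothetical Slick table definition

val route =
  path("users") {
    get {
      // db.stream returns a Reactive Streams Publisher; wrapping it in a Source means
      // rows are fetched from the database only as fast as the client reads the response.
      val rows = Source
        .fromPublisher(db.stream(users.result))
        .map(row => ByteString(row.toString + "\n"))
      complete(HttpEntity(ContentTypes.`text/plain(UTF-8)`, rows))
    }
  }

Http().newServerAt("localhost", 8080).bind(route) // Akka HTTP 10.2+ binding API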
P.S. This is just one small aspect of where reactive streams can be applied.
You can find more information here:
http://www.reactive-streams.org/
https://youtu.be/yyz7Keg1w9E
https://youtu.be/9S-4jMM1gqE
How can I check if a remote actor, for which I have obtained an actorRef via actorFor, is alive? Any reference to documentation would be appreciated. I am using Akka from Scala.
I've seen reference to supervisors and deathwatch, but don't really feel my use-case needs such heavy machinery. I just want to have my client check if the master is up using a known path, and if it is send a message introducing itself. If the master is not up, then it should wait for a bit then retry.
Update 2:
Suggestions are that I just use a ping-pong ask test to see if it's alive. I understand this to be something like
import akka.pattern.{ask, AskTimeoutException}
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val timeout: Timeout = Timeout(5.seconds)
val future = actor ? AreYouAlive
try {
  Await.result(future, timeout.duration)
} catch {
  case e: AskTimeoutException => println("It's not there: " + e)
}
I think I've been confused by the presence of exceptions in the logs, which remain there now. E.g.
Error: java.net.ConnectException:Connection refused
Error: java.nio.channels.ClosedChannelException:null
Perhaps this is just how it works and I must accept the errors/warning in the logs rather than try to protect against them?
Just send it messages. Its machine could become unreachable the nanosecond after you sent your message anyway. If you don't get any reply, it is most likely dead. There's a large chapter on this in the docs: http://doc.akka.io/docs/akka/2.0.1/general/message-send-semantics.html
You should never assume that the network is available. Our architect here always says that there are two key concepts that come into play in distributed system design.
They are:
Timeout
Retry
Messages should 'time out' if they don't make it after a certain period of time, and then you can retry the message. With a timeout you don't have to worry about the specific error - only that the message response has failed. For high levels of availability you may want to consider using tools such as ZooKeeper to handle clustering/availability monitoring. See leader election here, for example: http://zookeeper.apache.org/doc/trunk/recipes.html
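A small sketch of that timeout-and-retry idea using Akka's ask pattern; the AreYouAlive message, the delay, and the retry count are illustrative:

import akka.actor.{ActorRef, ActorSystem}
import akka.pattern.{after, ask}
import akka.util.Timeout
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

case object AreYouAlive // illustrative ping message, as in the question above

// Ask with a timeout; on any failure (ask timeout, connection refused, ...) wait and retry.
def pingWithRetry(target: ActorRef, retriesLeft: Int = 3)
                 (implicit system: ActorSystem, ec: ExecutionContext): Future[Any] = {
  implicit val timeout: Timeout = Timeout(5.seconds)
  (target ? AreYouAlive).recoverWith {
    case _ if retriesLeft > 0 =>
      after(2.seconds, system.scheduler)(pingWithRetry(target, retriesLeft - 1))
  }
}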