My akka-streams learn-o-thon continues. I'd like to integrate my akka-streams application with akka-cluster and DistributedPubSubMediator.
Adding support for Publish is fairly straightforward, but I'm having trouble with the Subscribe part.
For reference, a subscriber is given as follows in the Typesafe sample:
import akka.actor.Actor
import akka.cluster.pubsub.DistributedPubSub
import akka.cluster.pubsub.DistributedPubSubMediator.Subscribe

class ChatClient(name: String) extends Actor {
  val mediator = DistributedPubSub(context.system).mediator
  mediator ! Subscribe("some topic", self)

  def receive = {
    case ChatClient.Message(from, text) =>
      ...process message...
  }
}
My question is, how should I integrate this actor with my flow, and how should I ensure I'm getting publish messages in the absence of stream backpressure?
I'm trying to accomplish a pubsub model where one stream may publish a message and another stream would consume it (if subscribed).
You probably want to make your Actor extend ActorPublisher. Then you can create a Source from it and integrate that into your stream.
See the docs on ActorPublisher here: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.3/scala/stream-integrations.html
The other answers are outdated: they suggest using ActorPublisher, which has been deprecated since version 2.5.0.
For those interested in a current approach, Colin Breck wrote an excellent series in his blog about integrating Akka Streams and Akka actors. Over the course of the series, Breck fleshes out a system that begins with Akka Streams and plain actors, then incorporates Akka Cluster and Akka Persistence. The first post in the series is here (the distributed stream processing piece is in part 3).
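As a rough illustration of a post-ActorPublisher approach (my own sketch, not Breck's code; it assumes the Akka 2.6 Source.actorRef signature), you can materialize a Source backed by an actor and register that actor with the mediator. Since the mediator cannot be backpressured, the stream buffers and drops on overflow instead:

import akka.actor.ActorSystem
import akka.cluster.pubsub.DistributedPubSub
import akka.cluster.pubsub.DistributedPubSubMediator.Subscribe
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source}

object PubSubStreamApp extends App {
  implicit val system: ActorSystem = ActorSystem("pubsub")
  val mediator = DistributedPubSub(system).mediator

  // The materialized value of Source.actorRef is the ActorRef that
  // receives the published messages and feeds them into the stream.
  val (subscriber, done) = Source
    .actorRef[String](
      completionMatcher = PartialFunction.empty,
      failureMatcher = PartialFunction.empty,
      bufferSize = 128,
      // The mediator cannot be backpressured, so buffer and drop on overflow.
      overflowStrategy = OverflowStrategy.dropHead)
    .toMat(Sink.foreach(println))(Keep.both)
    .run()

  // Register the stream's input actor as a topic subscriber.
  mediator ! Subscribe("some topic", subscriber)
}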
I have 2 questions:
How many actors does the below code create?
How do I create 1000 actors at the same time?
val system = ActorSystem("DonutStoreActorSystem")
val donutInfoActor = system.actorOf(Props[DonutInfoActor], name = "DonutInfoActor")
When you start the classic actor system and use actorOf like that, it will create one instance of your DonutInfoActor, plus a few internal Akka system actors for the event bus, logging, and, if you are using it, clustering.
Just as texasbruce said in a comment, a loop lets you create any number of actors from a single spot, as sketched below. Startup is async, so you will get back an ActorRef that is ready to use even though the actor it references may still be starting up.
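For example, a minimal sketch building on the snippet above:

import akka.actor.{ActorRef, Props}

// Create 1000 DonutInfoActor instances in a loop; each top-level actor
// on the same system needs a unique name.
val donutActors: Seq[ActorRef] =
  (1 to 1000).map(i => system.actorOf(Props[DonutInfoActor], name = s"DonutInfoActor-$i"))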
Note that if you are building something new, we recommend the new "typed" actor APIs, completed in Akka 2.6, over the classic API used in your sample.
We have the following logic implemented to manage jobs targeting different backends:
A Manager Actor is started. This actor:
Loads the configuration required to target each backend (mutable map: backend name -> backend connector configuration);
Loads a pool of Actors (RoundRobinPool) to handle the jobs for each backend (mutable map backend name -> RoundRobinPool Actor Ref)
When a request is received by the Manager actor, it retrieves the backend name from the message and forwards it to the corresponding pool of Actors to handle the job (assuming a configuration for this backend was registered). The result of the job request is then returned from the actor to the original sender (which is why we use forward), roughly as sketched below.
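A sketch of that routing (the `pools` map and the `backendName` field on the message are simplifications of our actual code):

import akka.actor.{Actor, ActorRef}
import scala.util.Failure

class ManagerActor(pools: Map[String, ActorRef]) extends Actor {
  def receive: Receive = {
    case req: BackendRequest =>
      pools.get(req.backendName) match {
        // forward preserves the original sender, so the worker replies directly
        case Some(pool) => pool forward req
        case None       => sender() ! Failure(new NoSuchElementException(req.backendName))
      }
  }
}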
This logic works very well, but because the backends are slow to handle jobs, we are in a typical case of a fast publisher and a slow consumer, and this raises issues when the load increases.
After doing some research, Akka Streams seems to be the way to go, as it allows us to implement back pressure and throttling, which would be perfect for our usage (for example, limiting to 5 requests per second).
The idea is to keep the Manager Actor with the same routing logic but replace the pools of Actors with a Source.queue.
Registering the Source.queue would be performed like this:
val queue = Source
  .queue[RunBackendRequest](0, OverflowStrategy.backpressure)
  .throttle(5, 1.second)
  .map(r => runBackendRequest(r))
  .toMat(Sink.ignore)(Keep.left)
  .run()
Where the definition of RunBackendRequest is:
case class RunBackendRequest(originalSender: ActorRef, backendConnector: BackendConnector, request: BackendRequest)
And the function runBackendRequest is defined as such:
private def runBackendRequest(runRequest: RunBackendRequest): Unit = {
  // Requires an implicit ExecutionContext in scope for the Future combinators.
  val connector = BackendConnectorFactory.getBackendConnector(configuration.underlying, runRequest.backendConnector.toConfig(), materializer, environment.asJava)
  Future { connector.doSomeWork(runRequest.request) } map { result =>
    runRequest.originalSender ! Success(result)
  } recover {
    case e: Exception => runRequest.originalSender ! Failure(e)
  }
}
When the Manager Actor receives a message, it will 'offer' it to the correct queue based on the name of the target backend contained in the message, for example:
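Here is a sketch of the offering side (the `queues` and `connectors` maps and the ActorLogging mixin are simplifications):

import akka.actor.{Actor, ActorLogging}
import akka.stream.QueueOfferResult
import akka.stream.scaladsl.SourceQueueWithComplete

class ManagerActor(
    queues: Map[String, SourceQueueWithComplete[RunBackendRequest]],
    connectors: Map[String, BackendConnector]) extends Actor with ActorLogging {
  import context.dispatcher

  def receive: Receive = {
    case req: BackendRequest =>
      queues.get(req.backendName) match {
        case Some(queue) =>
          // offer returns a Future that completes once the element is enqueued
          queue.offer(RunBackendRequest(sender(), connectors(req.backendName), req)).foreach {
            case QueueOfferResult.Enqueued => // accepted
            case other                     => log.warning("queue rejected request: {}", other)
          }
        case None => log.warning("no backend registered for {}", req.backendName)
      }
  }
}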
Therefore, I have a few questions:
Is this the correct way to use Akka Streams in this particular use case, or could it be written differently and more efficiently?
Is it OK to provide the ActorRef of the original sender in the RunBackendRequest object so that the request can be answered in the Flow?
Is there a way to retrieve the result of the Flow as a Future instead, so that the Manager actor could return the result of the request itself?
Akka Streams seems to be very powerful, but there is clearly a learning curve!
It feels to me that having the Manager Actor creates a single point of failure. Maybe worth a try:
The original sender keeps hammering an Akka Streams graph instead of the Manager actor. Make sure you pass the ActorRef downstream so that the reply can be sent back.
Inside the graph, use either partition-then-merge or Substreams to process requests that target different backend connectors (see the sketch after this list).
Either as the last step of the graph or after the backend connectors have finished, answer the original sender.
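Here is a rough sketch of the partition-then-merge idea (the backend names, the `name` field on BackendConnector, and the reuse of `runBackendRequest` from your question are assumptions on my part):

import akka.NotUsed
import akka.stream.FlowShape
import akka.stream.scaladsl.{Flow, GraphDSL, Merge, Partition}
import scala.concurrent.duration._

val backendNames = Vector("backendA", "backendB")

// One throttled flow per backend, so a slow backend only limits itself.
def backendFlow(backend: String): Flow[RunBackendRequest, RunBackendRequest, NotUsed] =
  Flow[RunBackendRequest]
    .throttle(5, 1.second)
    .map { r => runBackendRequest(r); r }

val routed: Flow[RunBackendRequest, RunBackendRequest, NotUsed] =
  Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    // Fan out by backend name, run each branch asynchronously, fan back in.
    val partition = b.add(Partition[RunBackendRequest](
      backendNames.size, r => backendNames.indexOf(r.backendConnector.name)))
    val merge = b.add(Merge[RunBackendRequest](backendNames.size))
    backendNames.indices.foreach { i =>
      partition.out(i) ~> backendFlow(backendNames(i)).async ~> merge.in(i)
    }
    FlowShape(partition.in, merge.out)
  })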
Overall, Colin's article is a great introduction on how to use Akka Streams with Partition and Merge to achieve your goal.
Let me know if you need more clarification and I can update my answer accordingly.
I've been reading about Akka Streams for the last couple of days, and I've been working with Rx libraries in Scala for the last couple of months. To me there seems to be some overlap in what these two libraries have to offer. RxScala was a bit easier to get started with, understand, and use. For example, here is a simple use case where I'm using Scala's Rx library to connect to a Kafka topic and wrap it up in an Observable so that I can have subscribers receiving those messages.
val consumerStream = consumer.createMessageStreamsByFilter(topicFilter(topics), 1, keyDecoder, valueDecoder).head
val observableConsumer = Observable.fromIterator(consumerStream).map(_.message())
This is quite simple and neat. Any clues on how I should get started with Akka Streams? I want to use the same example above, emitting events from the Source. I will later have a Flow and a Sink. Then finally, in my main class, I will combine these three to run the application's data flow.
Any suggestions?
So here is what I came up with:
val kafkaStreamItr = consumer.createMessageStreamsByFilter(topicFilter(topics), 1, keyDecoder, valueDecoder).head
Source.fromIterator(() => kafkaStreamItr).map(_.message)
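To complete the Source -> Flow -> Sink picture from your question, a minimal sketch building on the kafkaStreamItr above (the String element type and the filter stage are my assumptions):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

implicit val system = ActorSystem("kafka-consumer")
implicit val materializer = ActorMaterializer()

// Each stage is defined separately, then wired together and run.
val source = Source.fromIterator(() => kafkaStreamItr).map(_.message)
val flow   = Flow[String].filter(_.nonEmpty)
val sink   = Sink.foreach[String](println)

source.via(flow).to(sink).run()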
I started learning Akka Streams, a framework for processing data with back-pressure functionality. The library is part of Akka, which describes itself as:
Akka is a toolkit and runtime for building highly concurrent,
distributed, and resilient message-driven applications on the JVM.
These capabilities come from the nature of Akka actors. However, from my perspective, stream processing and actors seem like concepts unrelated to each other.
Question:
Does Akka Streams take advantage of these features of Akka actors? If yes, could you explain how actors help streams?
Akka Streams is a higher level abstraction than actors. It's an implementation of Reactive Streams which builds on top of the actor model. It takes advantage of all the actor features because it uses actors.
You can even go back to using actors directly in any part of the stream. Look at ActorPublisher and ActorSubscriber.
A good starting point is the akka stream quickstart.
Yes, an Actor is used to "materialize" each {Source, Flow, Sink} of a Stream. This means that when you create a Stream nothing actually happens until the stream is materialized, typically via the .run() method call.
As an example, here is a Stream being defined:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Source, Flow, Sink}
val stream = Source.single[String]("test")
.via(Flow[String].filter(_.size > 0))
.to(Sink.foreach{println})
Even though the stream is now a val, no computation has actually happened. The Stream is just a recipe for computation. To actually kick off the work, the Stream needs to be materialized. Here is an example that does not use implicits, to clearly show how materialization occurs:
val actorSystem = ActorSystem()
val materializer = ActorMaterializer()(actorSystem)
stream.run()(materializer) //work begins
Now 3 Actors (at least) have been created: 1 for the Source.single, 1 for the Flow.filter, and 1 for the Sink.foreach. Note: you can use the same materializer to initiate other streams:
val doesNothingStream = Source.empty[String]
.to(Sink.ignore)
.run()(materializer)
I want to implement some kind of message bus in one of my Scala applications. The features would be:
ability to subscribe to 1 .. N types of messages
messages may have payloads
loose coupling (nodes only hold a reference to the bus)
lightweight (no fully blown enterprise message queue etc.)
What I plan to do is to implement all nodes and the bus itself as standard Scala actors. For example I want to define a trait Subscriber like this:
trait Subscriber[M <: Message[_]] {
  this: Actor =>
  def notify(message: M)
}
Ideally mixing in this trait should already register the subscription for the type M.
So does this idea make sense? Are there better approaches to realize a message bus?
Disclaimer: I am the PO of Akka
Hi Itti,
This has already been done for you in Akka, the Actor Kernel: www.akka.io
Docs: http://doc.akkasource.org/routing-scala
Pub/Sub: Akka Listeners
Routers: Akka Routers
Convenience: Akka Routing
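As a concrete illustration of the built-in pub/sub, here is a minimal sketch using the actor system's event stream (the names are illustrative, and this uses the current eventStream API rather than the older Listeners helpers):

import akka.actor.{Actor, ActorSystem, Props}

final case class Message(payload: String)

// Each subscriber registers itself for a message type on the system's
// built-in event stream, so publishers and subscribers stay decoupled.
class SubscriberActor extends Actor {
  override def preStart(): Unit =
    context.system.eventStream.subscribe(self, classOf[Message])

  def receive = {
    case Message(payload) => println(s"received: $payload")
  }
}

object BusDemo extends App {
  val system = ActorSystem("bus")
  system.actorOf(Props[SubscriberActor], "subscriber")
  // Publishers only know the bus (the event stream), not the subscribers.
  system.eventStream.publish(Message("hello"))
}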