It's possible to create sources and sinks from actors using Source.actorPublisher() and Sink.actorSubscriber() methods respectively. But is it possible to create a Flow from actor?
Conceptually there doesn't seem to be a good reason not to, given that it implements both ActorPublisher and ActorSubscriber traits, but unfortunately, the Flow object doesn't have any method for doing this. In this excellent blog post it's done in an earlier version of Akka Streams, so the question is if it's possible also in the latest (2.4.9) version.
I'm part of the Akka team and would like to use this question to clarify a few things about the raw Reactive Streams interfaces. I hope you'll find this useful.
Most notably, we'll be posting multiple posts on the Akka team blog about building custom stages, including Flows, soon, so keep an eye on it.
Don't use ActorPublisher / ActorSubscriber
Please don't use ActorPublisher and ActorSubscriber. They're too low level and you might end up implementing them in such a way that's violating the Reactive Streams specification. They're a relict of the past and even then were only "power-user mode only". There really is no reason to use those classes nowadays. We never provided a way to build a flow because the complexity is simply explosive if it was exposed as "raw" Actor API for you to implement and get all the rules implemented correctly.
If you really really want to implement raw ReactiveStreams interfaces, then please do use the Specification's TCK to verify your implementation is correct. You will likely be caught off guard by some of the more complex corner cases a Flow (or in RS terminology a Processor has to handle).
Most operations are possible to build without going low-level
Many flows you should be able to simply build by building from a Flow[T] and adding the needed operations onto it, just as an example:
val newFlow: Flow[String, Int, NotUsed] = Flow[String].map(_.toInt)
Which is a reusable description of the Flow.
Since you're asking about power user mode, this is the most powerful operator on the DSL itself: statefulFlatMapConcat. The vast majority of operations operating on plain stream elements is expressable using it: Flow.statefulMapConcat[T](f: () ⇒ (Out) ⇒ Iterable[T]): Repr[T].
If you need timers you could zip with a Source.timer etc.
GraphStage is the simplest and safest API to build custom stages
Instead, building Sources/Flows/Sinks has its own powerful and safe API: the GraphStage. Please read the documentation about building custom GraphStages (they can be a Sink/Source/Flow or even any arbitrary shape). It handles all of the complex Reactive Streams rules for you, while giving you full freedom and type-safety while implementing your stages (which could be a Flow).
For example, taken from the docs, is an GraphStage implementation of the filter(T => Boolean) operator:
class Filter[A](p: A => Boolean) extends GraphStage[FlowShape[A, A]] {
val in = Inlet[A]("Filter.in")
val out = Outlet[A]("Filter.out")
val shape = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
new GraphStageLogic(shape) {
setHandler(in, new InHandler {
override def onPush(): Unit = {
val elem = grab(in)
if (p(elem)) push(out, elem)
else pull(in)
}
})
setHandler(out, new OutHandler {
override def onPull(): Unit = {
pull(in)
}
})
}
}
It also handles asynchronous channels and is fusable by default.
In addition to the docs, these blog posts explain in detail why this API is the holy grail of building custom stages of any shape:
Akka team blog: Mastering GraphStages (part 1, introduction) - a high level overview
... tomorrow we'll publish one about it's API as well...
Kunicki blog: Implementing a Custom Akka Streams Graph Stage - another example implementing sources (really applies 1:1 to building Flows)
Konrad's solution demonstrates how to create a custom stage that utilizes Actors, but in most cases I think that is a bit overkill.
Usually you have some Actor that is capable of responding to questions:
val actorRef : ActorRef = ???
type Input = ???
type Output = ???
val queryActor : Input => Future[Output] =
(actorRef ? _) andThen (_.mapTo[Output])
This can be easily utilized with basic Flow functionality which takes in the maximum number of concurrent requests:
val actorQueryFlow : Int => Flow[Input, Output, _] =
(parallelism) => Flow[Input].mapAsync[Output](parallelism)(queryActor)
Now actorQueryFlow can be integrated into any stream...
Here is a solution build by using a graph stage. The actor has to acknowledge all messages in order to have back-pressure. The actor is notified when the stream fails/completes and the stream fails when the actor terminates.
This can be useful if you don't want to use ask, e.g. when not every input message has a corresponding output message.
import akka.actor.{ActorRef, Status, Terminated}
import akka.stream._
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
object ActorRefBackpressureFlowStage {
case object StreamInit
case object StreamAck
case object StreamCompleted
case class StreamFailed(ex: Throwable)
case class StreamElementIn[A](element: A)
case class StreamElementOut[A](element: A)
}
/**
* Sends the elements of the stream to the given `ActorRef` that sends back back-pressure signal.
* First element is always `StreamInit`, then stream is waiting for acknowledgement message
* `ackMessage` from the given actor which means that it is ready to process
* elements. It also requires `ackMessage` message after each stream element
* to make backpressure work. Stream elements are wrapped inside `StreamElementIn(elem)` messages.
*
* The target actor can emit elements at any time by sending a `StreamElementOut(elem)` message, which will
* be emitted downstream when there is demand.
*
* If the target actor terminates the stage will fail with a WatchedActorTerminatedException.
* When the stream is completed successfully a `StreamCompleted` message
* will be sent to the destination actor.
* When the stream is completed with failure a `StreamFailed(ex)` message will be send to the destination actor.
*/
class ActorRefBackpressureFlowStage[In, Out](private val flowActor: ActorRef) extends GraphStage[FlowShape[In, Out]] {
import ActorRefBackpressureFlowStage._
val in: Inlet[In] = Inlet("ActorFlowIn")
val out: Outlet[Out] = Outlet("ActorFlowOut")
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
private lazy val self = getStageActor {
case (_, StreamAck) =>
if(firstPullReceived) {
if (!isClosed(in) && !hasBeenPulled(in)) {
pull(in)
}
} else {
pullOnFirstPullReceived = true
}
case (_, StreamElementOut(elemOut)) =>
val elem = elemOut.asInstanceOf[Out]
emit(out, elem)
case (_, Terminated(targetRef)) =>
failStage(new WatchedActorTerminatedException("ActorRefBackpressureFlowStage", targetRef))
case (actorRef, unexpected) =>
failStage(new IllegalStateException(s"Unexpected message: `$unexpected` received from actor `$actorRef`."))
}
var firstPullReceived: Boolean = false
var pullOnFirstPullReceived: Boolean = false
override def preStart(): Unit = {
//initialize stage actor and watch flow actor.
self.watch(flowActor)
tellFlowActor(StreamInit)
}
setHandler(in, new InHandler {
override def onPush(): Unit = {
val elementIn = grab(in)
tellFlowActor(StreamElementIn(elementIn))
}
override def onUpstreamFailure(ex: Throwable): Unit = {
tellFlowActor(StreamFailed(ex))
super.onUpstreamFailure(ex)
}
override def onUpstreamFinish(): Unit = {
tellFlowActor(StreamCompleted)
super.onUpstreamFinish()
}
})
setHandler(out, new OutHandler {
override def onPull(): Unit = {
if(!firstPullReceived) {
firstPullReceived = true
if(pullOnFirstPullReceived) {
if (!isClosed(in) && !hasBeenPulled(in)) {
pull(in)
}
}
}
}
override def onDownstreamFinish(): Unit = {
tellFlowActor(StreamCompleted)
super.onDownstreamFinish()
}
})
private def tellFlowActor(message: Any): Unit = {
flowActor.tell(message, self.ref)
}
}
override def shape: FlowShape[In, Out] = FlowShape(in, out)
}
Related
I am trying to implement a simple one-to-many pub/sub pattern using a BroadcastHub. This fails silently for large numbers of subscribers, which makes me think I am hitting some limit on the number of streams I can run.
First, let's define some events:
sealed trait Event
case object EX extends Event
case object E1 extends Event
case object E2 extends Event
case object E3 extends Event
case object E4 extends Event
case object E5 extends Event
I have implemented the publisher using a BroadcastHub, adding a Sink.actorRefWithAck each time I want to add a new subscriber. Publishing the EX event ends the broadcast:
trait Publisher extends Actor with ActorLogging {
implicit val materializer = ActorMaterializer()
private val sourceQueue = Source.queue[Event](Publisher.bufferSize, Publisher.overflowStrategy)
private val (
queue: SourceQueueWithComplete[Event],
source: Source[Event, NotUsed]
) = {
val (q,s) = sourceQueue.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run()
s.runWith(Sink.ignore)
(q,s)
}
def publish(evt: Event) = {
log.debug("Publishing Event: {}", evt.getClass().toString())
queue.offer(evt)
evt match {
case EX => queue.complete()
case _ => Unit
}
}
def subscribe(actor: ActorRef, ack: ActorRef): Unit =
source.runWith(
Sink.actorRefWithAck(
actor,
onInitMessage = Publisher.StreamInit(ack),
ackMessage = Publisher.StreamAck,
onCompleteMessage = Publisher.StreamDone,
onFailureMessage = onErrorMessage))
def onErrorMessage(ex: Throwable) = Publisher.StreamFail(ex)
def publisherBehaviour: Receive = {
case Publisher.Subscribe(sub, ack) => subscribe(sub, ack.getOrElse(sender()))
case Publisher.StreamAck => Unit
}
override def receive = LoggingReceive { publisherBehaviour }
}
object Publisher {
final val bufferSize = 5
final val overflowStrategy = OverflowStrategy.backpressure
case class Subscribe(sub: ActorRef, ack: Option[ActorRef])
case object StreamAck
case class StreamInit(ack: ActorRef)
case object StreamDone
case class StreamFail(ex: Throwable)
}
Subscribers can implement the Subscriber trait to separate the logic:
trait Subscriber {
def onInit(publisher: ActorRef): Unit = ()
def onInit(publisher: ActorRef, k: KillSwitch): Unit = onInit(publisher)
def onEvent(event: Event): Unit = ()
def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = ()
def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = ()
}
The actor logic is quite simple:
class SubscriberActor(subscriber: Subscriber) extends Actor with ActorLogging {
def subscriberBehaviour: Receive = {
case Publisher.StreamInit(ack) => {
log.debug("Stream initialized.")
subscriber.onInit(sender())
sender() ! Publisher.StreamAck
ack.forward(Publisher.StreamInit(ack))
}
case Publisher.StreamDone => {
log.debug("Stream completed.")
subscriber.onDone(sender(),self)
}
case Publisher.StreamFail(ex) => {
log.error(ex, "Stream failed!")
subscriber.onFail(ex,sender(),self)
}
case e: Event => {
log.debug("Observing Event: {}",e)
subscriber.onEvent(e)
sender() ! Publisher.StreamAck
}
}
override def receive = LoggingReceive { subscriberBehaviour }
}
One of the key points is that all subscribers must receive all messages sent by the publisher, so we have to know that all streams have materialized and all actors are ready to receive before starting the broadcast. This is why the StreamInit message is forwarded to another, user-provided actor.
To test this, I define a simple MockPublisher that just broadcasts a list of events when told to do so:
class MockPublisher(events: Event*) extends Publisher {
def receiveBehaviour: Receive = {
case MockPublish => events map publish
}
override def receive = LoggingReceive { receiveBehaviour orElse publisherBehaviour }
}
case object MockPublish
I also define a MockSubscriber who merely counts how many events it has seen:
class MockSubscriber extends Subscriber {
var count = 0
val promise = Promise[Int]()
def future = promise.future
override def onInit(publisher: ActorRef): Unit = count = 0
override def onEvent(event: Event): Unit = count += 1
override def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = promise.success(count)
override def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = promise.failure(e)
}
And a small method for subscription:
object MockSubscriber {
def sub(publisher: ActorRef, ack: ActorRef)(implicit system: ActorSystem): Future[Int] = {
val s = new MockSubscriber()
implicit val tOut = Timeout(1.minute)
val a = system.actorOf(Props(new SubscriberActor(s)))
val f = publisher ! Publisher.Subscribe(a, Some(ack))
s.future
}
}
I put everything together in a unit test:
class SubscriberTests extends TestKit(ActorSystem("SubscriberTests")) with
WordSpecLike with Matchers with BeforeAndAfterAll with ImplicitSender {
override def beforeAll:Unit = {
system.eventStream.setLogLevel(Logging.DebugLevel)
}
override def afterAll:Unit = {
println("Shutting down...")
TestKit.shutdownActorSystem(system)
}
"The Subscriber" must {
"publish events to many observers" in {
val n = 9
val p = system.actorOf(Props(new MockPublisher(E1,E2,E3,E4,E5,EX)))
val q = scala.collection.mutable.Queue[Future[Int]]()
for (i <- 1 to n) {
q += MockSubscriber.sub(p,self)
}
for (i <- 1 to n) {
expectMsgType[Publisher.StreamInit](70.seconds)
}
p ! MockPublish
q.map { f => Await.result(f, 10.seconds) should be (6) }
}
}
}
This test succeeds for relatively small values of n, but fails for, say, val n = 90000. No caught or uncaught exception appears anywhere and neither does any out-of-memory complaint from Java (which does occur if I go even higher).
What am I missing?
Edit: Tried this on multiple computers with different specs. Debug info shows no messages reach any of the subscribers once n is high enough.
Akka Stream (and any other reactive stream, actually) provides you backpressure. If you hadn't messed up with how you create your consumers (e.g. allowing creation of 1GB JSON, which will you chop into smaller pieces only after you fetched it into memory) you should have a comfortable situation where you can consider your memory usage pretty much upper-bounded (because of how backpressure manage push-pull mechanics). Once you measure where your upper-bound lies, your can set up your JVM and container memory, so that you could let it run without fear of out of memory errors (provided that there is not other thing happening in your JVM which could cause memory usage spike).
So, from this we can see that there is some constraint on how much stream you can run in parallel - specifically you can run only as much of them as your memory allows you. CPU should not be a limitation (as you will have multiple threads), but if you will start too much of them on one machine, then CPU inevitably with have to switch between different streams making each of them slower. It might not be a technical blocker, but you might end up in a situation where processing is so slow that it doesn't fulfill its business purpose (though, I guess, you would have to run much more than few of streams at once).
In your tests you might run into some other issues as well. E.g. if you reuse the same thread pool for some blocking operations as you use for Actor System without informing the thread pool that they are blocking, you might end up with a dead lock (as a matter of the fact, you should run all IO blocking operations on a different thread pool than "computing" operations). Having 90000(!) concurrent things happening at the same time (and probably having the same small thread pool) almost guarantees running into issues (I guess you could run into issues even if instead of actors you would run the code directly on futures). Here you are using actor system in tests, which AFAIR use blocking logic only highlighting all the possible issues with small thread pools which keep blocking and non-blocking tasks in the same place.
I am using a third party library to provide parsing services (user agent parsing in my case) which is not a thread safe library and has to operate on a single threaded basis. I would like to write a thread safe API that can be called by multiple threads to interact with it via Futures API as the library might introduce some potential blocking (IO). I would also like to provide back pressure when necessary and return a failed future when the parser doesn't catch up with the producers.
It could actually be a generic requirement/question, how to interact with any client/library which is not thread safe (user agents/geo locations parsers, db clients like redis, loggers collectors like fluentd), with back pressure in a concurrent environments.
I came up with the following formula:
encapsulate the parser within a dedicated Actor.
create an akka stream source queue that receives ParseReuqest that contains the user agent and a Promise to complete, and using the ask pattern via mapAsync to interact with the parser actor.
create another actor to encapsulate the source queue.
Is this the way to go? Is there any other way to achieve this, maybe simpler ? maybe using graph stage? can it be done without the ask pattern and less code involved?
the actor mentioned in number 3, is because I'm not sure if the source queue is thread safe or not ?? I wish it was simply stated in the docs, but it doesn't. there are multiple versions over the web, some stating it's not and some stating it is.
Is the source queue, once materialized, is thread safe to push elements from different threads?
(the code may not compile and is prone to potential failures, and is only intended for this question in place)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
implicit val askTimeout = Timeout(5 seconds)
// API to parser - delegates the request to the back pressure actor
def parse(userAgent: String): Future[Option[UserAgentData]] = {
val p = Promise[Option[UserAgentData]]
parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
p.future
}
// Actor to provide back pressure that delegates requests to parser actor
private class ParserBackPressureProvider extends Actor {
private val parser = context.actorOf(Props[UserAgentParserActor])
val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
.mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
.to(Sink.foreach({
case (result, promise) => promise.success(result)
}))
.run()
override def receive: Receive = {
case request: UserAgentParseRequest => queue.offer(request).map {
case QueueOfferResult.Enqueued =>
case _ => request.p.failure(new RuntimeException("parser busy"))
}
}
}
// Actor parser
private class UserAgentParserActor extends Actor {
private val up = new UserAgentParser(dbFilePath, true, 50000)
override def receive: Receive = {
case userAgent: String =>
sender ! Try {
up.parseUa(userAgent)
}.toOption.map(UserAgentData(userAgent, _))
}
}
private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])
private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity, scala/java hasd all the tools you need "out of the box":
class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
private implicit val ec = ExecutionContext
.fromExecutor(
new ThreadPoolExecutor(
1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(capacity)
)
)
def parse(ua: String): Future[Option[UserAgentData]] = try {
Future(Some(UserAgentData(ua, parser.parseUa(ua)))
.recover { _ => None }
} catch {
case _: RejectedExecutionException =>
Future.failed(new RuntimeException("parser is busy"))
}
}
I have an external (that is, I cannot change it) Java API which looks like this:
public interface Sender {
void send(Event e);
}
I need to implement a Sender which accepts each event, transforms it to a JSON object, collects some number of them into a single bundle and sends over HTTP to some endpoint. This all should be done asynchronously, without send() blocking the calling thread, with some fixed-size buffer and dropping new events if the buffer is full.
With akka-streams this is quite simple: I create a graph of stages (which uses akka-http to send HTTP requests), materialize it and use the materialized ActorRef to push new events to the stream:
lazy val eventPipeline = Source.actorRef[Event](Int.MaxValue, OverflowStrategy.fail)
.via(CustomBuffer(bufferSize)) // buffer all events
.groupedWithin(batchSize, flushDuration) // group events into chunks
.map(toBundle) // convert each chunk into a JSON message
.mapAsyncUnordered(1)(sendHttpRequest) // send an HTTP request
.toMat(Sink.foreach { response =>
// print HTTP response for debugging
})(Keep.both)
lazy val (eventsActor, completeFuture) = eventPipeline.run()
override def send(e: Event): Unit = {
eventsActor ! e
}
Here CustomBuffer is a custom GraphStage which is very similar to the library-provided Buffer but tailored to our specific needs; it probably does not matter for this particular question.
As you can see, interacting with the stream from non-stream code is very simple - the ! method on the ActorRef trait is asynchronous and does not need any additional machinery to be called. Each event which is sent to the actor is then processed through the entire reactive pipeline. Moreover, because of how akka-http is implemented, I even get connection pooling for free, so no more than one connection is opened to the server.
However, I cannot find a way to do the same thing with FS2 properly. Even discarding the question of buffering (I will probably need to write a custom Pipe implementation which does additional things that we need) and HTTP connection pooling, I'm still stuck with a more basic thing - that is, how to push the data to the reactive stream "from outside".
All tutorials and documentation that I can find assume that the entire program happens inside some effect context, usually IO. This is not my case - the send() method is invoked by the Java library at unspecified times. Therefore, I just cannot keep everything inside one IO action, I necessarily have to finalize the "push" action inside the send() method, and have the reactive stream as a separate entity, because I want to aggregate events and hopefully pool HTTP connections (which I believe is naturally tied to the reactive stream).
I assume that I need some additional data structure, like Queue. fs2 does indeed have some kind of fs2.concurrent.Queue, but again, all documentation shows how to use it inside a single IO context, so I assume that doing something like
val queue: Queue[IO, Event] = Queue.unbounded[IO, Event].unsafeRunSync()
and then using queue inside the stream definition and then separately inside the send() method with further unsafeRun calls:
val eventPipeline = queue.dequeue
.through(customBuffer(bufferSize))
.groupWithin(batchSize, flushDuration)
.map(toBundle)
.mapAsyncUnordered(1)(sendRequest)
.evalTap(response => ...)
.compile
.drain
eventPipeline.unsafeRunAsync(...) // or something
override def send(e: Event) {
queue.enqueue(e).unsafeRunSync()
}
is not the correct way and most likely would not even work.
So, my question is, how do I properly use fs2 to solve my problem?
Consider the following example:
import cats.implicits._
import cats.effect._
import cats.effect.implicits._
import fs2._
import fs2.concurrent.Queue
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
object Answer {
type Event = String
trait Sender {
def send(event: Event): Unit
}
def main(args: Array[String]): Unit = {
val sender: Sender = {
val ec = ExecutionContext.global
implicit val cs: ContextShift[IO] = IO.contextShift(ec)
implicit val timer: Timer[IO] = IO.timer(ec)
fs2Sender[IO](2)
}
val events = List("a", "b", "c", "d")
events.foreach { evt => new Thread(() => sender.send(evt)).start() }
Thread sleep 3000
}
def fs2Sender[F[_]: Timer : ContextShift](maxBufferedSize: Int)(implicit F: ConcurrentEffect[F]): Sender = {
// dummy impl
// this is where the actual logic for batching
// and shipping over the network would live
val consume: Pipe[F, Event, Unit] = _.evalMap { event =>
for {
_ <- F.delay { println(s"consuming [$event]...") }
_ <- Timer[F].sleep(1.seconds)
_ <- F.delay { println(s"...[$event] consumed") }
} yield ()
}
val suspended = for {
q <- Queue.bounded[F, Event](maxBufferedSize)
_ <- q.dequeue.through(consume).compile.drain.start
sender <- F.delay[Sender] { evt =>
val enqueue = for {
wasEnqueued <- q.offer1(evt)
_ <- F.delay { println(s"[$evt] enqueued? $wasEnqueued") }
} yield ()
enqueue.toIO.unsafeRunAsyncAndForget()
}
} yield sender
suspended.toIO.unsafeRunSync()
}
}
The main idea is to use a concurrent Queue from fs2. Note, that the above code demonstrates that neither the Sender interface nor the logic in main can be changed. Only an implementation of the Sender interface can be swapped out.
I don't have much experience with exactly that library but it should look somehow like that:
import cats.effect.{ExitCode, IO, IOApp}
import fs2.concurrent.Queue
case class Event(id: Int)
class JavaProducer{
new Thread(new Runnable {
override def run(): Unit = {
var id = 0
while(true){
Thread.sleep(1000)
id += 1
send(Event(id))
}
}
}).start()
def send(event: Event): Unit ={
println(s"Original producer prints $event")
}
}
class HackedProducer(queue: Queue[IO, Event]) extends JavaProducer {
override def send(event: Event): Unit = {
println(s"Hacked producer pushes $event")
queue.enqueue1(event).unsafeRunSync()
println(s"Hacked producer pushes $event - Pushed")
}
}
object Test extends IOApp{
override def run(args: List[String]): IO[ExitCode] = {
val x: IO[Unit] = for {
queue <- Queue.unbounded[IO, Event]
_ = new HackedProducer(queue)
done <- queue.dequeue.map(ev => {
println(s"Got $ev")
}).compile.drain
} yield done
x.map(_ => ExitCode.Success)
}
}
We can create a bounded queue that will consume elements from sender and make them available to fs2 stream processing.
import cats.effect.IO
import cats.effect.std.Queue
import fs2.Stream
trait Sender[T]:
def send(e: T): Unit
object Sender:
def apply[T](bufferSize: Int): IO[(Sender[T], Stream[IO, T])] =
for
q <- Queue.bounded[IO, T](bufferSize)
yield
val sender: Sender[T] = (e: T) => q.offer(e).unsafeRunSync()
def stm: Stream[IO, T] = Stream.eval(q.take) ++ stm
(sender, stm)
Then we'll have two ends - one for Java worlds, to send new elements to Sender. Another one - for stream processing in fs2.
class TestSenderQueue:
#Test def testSenderQueue: Unit =
val (sender, stream) = Sender[Int](1)
.unsafeRunSync()// we have to run it preliminary to make `sender` available to external system
val processing =
stream
.map(i => i * i)
.evalMap{ ii => IO{ println(ii)}}
sender.send(1)
processing.compile.toList.start//NB! we start processing in a separate fiber
.unsafeRunSync() // immediately right now.
sender.send(2)
Thread.sleep(100)
(0 until 100).foreach(sender.send)
println("finished")
Note that we push data in the current thread and have to run fs2 in a separate thread (.start).
Suppose I have following simple graph.
class KafkaSource[A](kI: KafkaIterator) extends GraphStage[SourceShape[A]] {
val out = Outlet[A]("KafkaSource.out")
override val shape = SourceShape.of(out)
override def createLogic(attr: Attributes): GraphStageLogic =
new GraphStageLogic(shape) {
setHandler(out, new OutHandler {
override def onPull(): Unit = {
push(out, kI.next)
}
})
}
}
val g = GraphDSL.create(){ implicit b =>
val source = b.add(new KafkaSource[Message](itr))
val sink = b.add(Sink.foreach[Message](println))
source ~> sink
ClosedShape
}
we're running it as
RunnableGraph.fromGraph(g).run()
I wish to signal the kafkaSource to stop(or artificially complete) instead of pushing the next available element, so that connected stages downstream also stop.
how do I accomplish that ?
The scenario being, we have millions of messages in kafka & we would like to stop processing messages everyday at 9pm (for instance) and assuming we're stopping our running applications with a clean shutdown.
Although probably not relevant for phantomastray any more but helpful for others:
A KillSwitch allows the completion of graphs of FlowShape from the
outside. It consists of a flow element that can be linked to a graph
of FlowShape needing completion control. The KillSwitch trait allows
to complete or fail the graph(s). [Source: Akka Docs]
I am trying to continuously read the wikipedia IRC channel using this lib: https://github.com/implydata/wikiticker
I created a custom Akka Publisher, which will be used in my system as a Source.
Here are some of my classes:
class IrcPublisher() extends ActorPublisher[String] {
import scala.collection._
var queue: mutable.Queue[String] = mutable.Queue()
override def receive: Actor.Receive = {
case Publish(s) =>
println(s"->MSG, isActive = $isActive, totalDemand = $totalDemand")
queue.enqueue(s)
publishIfNeeded()
case Request(cnt) =>
println("Request: " + cnt)
publishIfNeeded()
case Cancel =>
println("Cancel")
context.stop(self)
case _ =>
println("Hm...")
}
def publishIfNeeded(): Unit = {
while (queue.nonEmpty && isActive && totalDemand > 0) {
println("onNext")
onNext(queue.dequeue())
}
}
}
object IrcPublisher {
case class Publish(data: String)
}
I am creating all this objects like so:
def createSource(wikipedias: Seq[String]) {
val dataPublisherRef = system.actorOf(Props[IrcPublisher])
val dataPublisher = ActorPublisher[String](dataPublisherRef)
val listener = new MessageListener {
override def process(message: Message) = {
dataPublisherRef ! Publish(Jackson.generate(message.toMap))
}
}
val ticker = new IrcTicker(
"irc.wikimedia.org",
"imply",
wikipedias map (x => s"#$x.wikipedia"),
Seq(listener)
)
ticker.start() // if I comment this...
Thread.currentThread().join() //... and this I get Request(...)
Source.fromPublisher(dataPublisher)
}
So the problem I am facing is this Source object. Although this implementation works well with other sources (for example from local file), the ActorPublisher don't receive Request() messages.
If I comment the two marked lines I can see, that my actor has received the Request(count) message from my flow. Otherwise all messages will be pushed into the queue, but not in my flow (so I can see the MSG messages printed).
I think it's something with multithreading/synchronization here.
I am not familiar enough with wikiticker to solve your problem as given. One question I would have is: why is it necessary to join to the current thread?
However, I think you have overcomplicated the usage of Source. It would be easier for you to work with the stream as a whole rather than create a custom ActorPublisher.
You can use Source.actorRef to materialize a stream into an ActorRef and work with that ActorRef. This allows you to utilize akka code to do the enqueing/dequeing onto the buffer while you can focus on the "business logic".
Say, for example, your entire stream is only to filter lines above a certain length and print them to the console. This could be accomplished with:
def dispatchIRCMessages(actorRef : ActorRef) = {
val ticker =
new IrcTicker("irc.wikimedia.org",
"imply",
wikipedias map (x => s"#$x.wikipedia"),
Seq(new MessageListener {
override def process(message: Message) =
actorRef ! Publish(Jackson.generate(message.toMap))
}))
ticker.start()
Thread.currentThread().join()
}
//these variables control the buffer behavior
val bufferSize = 1024
val overFlowStrategy = akka.stream.OverflowStrategy.dropHead
val minMessageSize = 32
//no need for a custom Publisher/Queue
val streamRef =
Source.actorRef[String](bufferSize, overFlowStrategy)
.via(Flow[String].filter(_.size > minMessageSize))
.to(Sink.foreach[String](println))
.run()
dispatchIRCMessages(streamRef)
The dispatchIRCMessages has the added benefit that it will work with any ActorRef so you aren't required to only work with streams/publishers.
Hopefully this solves your underlying problem...
I think the main problem is Thread.currentThread().join(). This line will 'hang' current thread because this thread is waiting for himself to die. Please read https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#join-long- .