How to backpressure an ActorPublisher - Scala

I'm writing a few samples to understand Akka Streams and backpressure. I'm trying to see how a slow consumer backpressures an ActorPublisher.
My code is as follows.
class DataPublisher extends ActorPublisher[Int] {
  import akka.stream.actor.ActorPublisherMessage._

  var items: List[Int] = List.empty

  def receive = {
    case s: String =>
      println(s"Producer buffer size ${items.size}")
      if (totalDemand == 0)
        items = items :+ s.toInt // no demand: buffer the element
      else
        onNext(s.toInt) // demand available: emit immediately

    case Request(demand) =>
      if (demand > items.size) {
        items foreach onNext
        items = List.empty
      } else {
        val (send, keep) = items.splitAt(demand.toInt)
        items = keep
        send foreach onNext
      }

    case other =>
      println(s"got other $other")
  }
}
and
Source.fromPublisher(ActorPublisher[Int](dataPublisherRef)).runWith(sink)
where the sink is a Subscriber with a sleep to emulate a slow consumer, and the publisher keeps producing data regardless.
--EDIT--
My question is: when the demand is 0, the publisher programmatically buffers data. How can I make use of backpressure to slow down the publisher?
Something like
throttledSource().buffer(10, OverflowStrategy.backpressure).runWith(throttledSink())
This does not affect the publisher, and its buffer keeps growing.
Thanks,
Sajith

Don't use ActorPublisher
Firstly, don't use ActorPublisher - it is a very low-level and now deprecated API. We decided to deprecate it because users should not be working at such a low level of abstraction in Akka Streams.
One of the tricky things is exactly what you're asking about: handling backpressure is entirely in the hands of the developer writing the ActorPublisher when they use this API. You have to receive the Request(n) messages and make sure that you never signal more elements than you have received demand for. This behaviour is specified in the Reactive Streams specification, which you then have to implement correctly. Basically, you're exposed to all the complexities of Reactive Streams (which is a full specification with many edge cases; disclaimer: I was/am part of developing Reactive Streams as well as Akka Streams).
Showing how back-pressure manifests in GraphStage
Secondly, to build custom stages you should be using the API designed for it: GraphStage. Please note that such a stage is also pretty low-level. Normally users of Akka Streams don't need to write custom stages; however, it is absolutely expected and fine to write your own stages if they implement some logic that the built-in stages don't provide.
Here's a simplified Filter implementation from the Akka codebase:
case class Filter[T](p: T ⇒ Boolean) extends SimpleLinearGraphStage[T] {
  override def initialAttributes: Attributes = DefaultAttributes.filter

  override def toString: String = "Filter"

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with OutHandler with InHandler {
      override def onPush(): Unit = {
        val elem = grab(in)
        if (p(elem)) push(out, elem)
        else pull(in)
      }

      // this method will NOT be called if the downstream has not signalled enough demand!
      // this method NOT being called is how back-pressure manifests in stages
      override def onPull(): Unit = pull(in)

      setHandlers(in, out, this)
    }
}
As you can see, instead of implementing the entire Reactive Streams logic and rules yourself (which is hard), you get simple callbacks like onPush and onPull. Akka Streams handles the demand management: it will automatically call onPull if the downstream has signalled demand, and it will NOT call it if there is no demand, which means the downstream is applying backpressure to this stage.
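To see this demand-driven behaviour end to end without writing a custom stage, here is a minimal, self-contained sketch (the element count and throttle rate are assumed, arbitrary values): the source is slowed to the pace of the throttled downstream purely by the demand mechanism described above (modulo small internal buffers).
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object BackpressureDemo extends App {
  implicit val system = ActorSystem("demo")
  implicit val materializer = ActorMaterializer()

  Source(1 to 100)
    .map { i => println(s"produced $i"); i }          // runs only when demand has propagated upstream
    .throttle(1, 100.millis, 1, ThrottleMode.Shaping) // emulates a slow consumer
    .runWith(Sink.foreach(i => println(s"consumed $i")))
}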

This can be accomplished with an intermediate Flow.buffer:
val flowBuffer = Flow[Int].buffer(10, OverflowStrategy.dropHead)

Source
  .fromPublisher(ActorPublisher[Int](dataPublisherRef))
  .via(flowBuffer)
  .runWith(sink)
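Since ActorPublisher is deprecated, a commonly suggested alternative worth sketching here is Source.queue (assuming Akka Streams 2.4+); with OverflowStrategy.backpressure, the materialized queue's offer result completes only when the element is accepted, so a slow consumer naturally slows the producer:
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Source}

// `sink` is the same slow sink as above (assumed).
val queue = Source
  .queue[Int](bufferSize = 10, OverflowStrategy.backpressure)
  .toMat(sink)(Keep.left)
  .run()

// queue.offer(42) returns a Future[QueueOfferResult] that completes
// only once the element fits into the buffer.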

Related

How to clean up substreams in continuous Akka streams

Say I have a very long-running stream of events flowing through something as shown below. After a long time there will be lots of substreams that are no longer needed.
Is there a way to clean up a specific substream at a given time, for example so that the substream created by id 3 is cleaned up and the state in the scan method is lost at 13:00 (the expires property of Wid)?
case class Wid(id: Int, v: String, expires: LocalDateTime)

test("Substream with scan") {
  val (pub, sub) = TestSource.probe[Wid]
    .groupBy(Int.MaxValue, _.id)
    .scan("")((a: String, b: Wid) => a + b.v)
    .mergeSubstreams
    .toMat(TestSink.probe[String])(Keep.both)
    .run()
}
TL;DR You can close a substream after some time. However, using input to dynamically set the time with built-in stages is another matter.
Closing a substream
To close a flow, you usually complete it (from upstream), but you can also cancel it (from downstream). For instance, the take(n: Int) flow will cancel once n elements have gone through.
Now, in the groupBy case, you cannot complete a substream, since the upstream is shared by all substreams, but you can cancel it. How you do so depends on what condition you want to use to end the substream.
However, be aware that groupBy drops inputs for subflows that have already been closed: if a new element with id 3 arrives from upstream after the 3-substream has been closed, it will simply be ignored and the next element will be pulled in. The reason for this is probably that elements might otherwise be lost between the closing and re-opening of a substream. Also, if your stream is supposed to run for a very long time, this will affect performance, because each element is checked against the list of closed substreams before being forwarded to the relevant (live) substream. You might want to implement your own stateful filter (say, with a Bloom filter) if you're not satisfied with that performance.
To close a substream, I usually use either take (if you want only a given number of elements, which is probably not the case on an infinite stream) or some kind of timeout: completionTimeout if you want a fixed time from materialization to closure, or idleTimeout if you want to close when no elements have come through for some time. Note that these flows do not cancel the stream but fail it, so you have to catch the exception with a recover or recoverWith stage to change the failure into a cancel (recoverWith allows you to cancel without sending any last element, by recovering with Source.empty).
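For illustration, here is a minimal sketch of that recipe (the 10-second idle window is an assumed value):
import java.util.concurrent.TimeoutException
import scala.concurrent.duration._
import akka.stream.scaladsl.{Flow, Source}

// Fails a substream after 10 idle seconds, then recovers the failure
// into a clean completion without emitting any extra element.
val closeWhenIdle: Flow[Wid, Wid, akka.NotUsed] =
  Flow[Wid]
    .idleTimeout(10.seconds)
    .recoverWith { case _: TimeoutException => Source.empty }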
Dynamically set the timeout
Now what you want is to set the closing time dynamically, according to the first passing element. This is more complicated because the materialization of streams is independent of the elements that pass through them. Indeed, in the usual (without groupBy) case, streams are materialized before any element goes through them, so it makes no sense to use the elements to materialize them.
I had similar issues in that question, and ended up using a modified version of groupBy with signature
paramGroupBy[K, OO, MM](maxSubstreams: Int, f: Out => K, paramSubflow: K => Flow[Out, OO, MM])
that allows you to define every substream using the key that defined it. This can be modified to take the first element (instead of the key) as the parameter.
Another (probably simpler, in your case) way would be to write your own stage that does exactly what you want: get the end time from the first element and cancel the stream at that time. Here is an example implementation (I used a scheduler instead of keeping state):
object CancelAfterTimer

class CancelAfter[T](getTimeout: T => FiniteDuration) extends GraphStage[FlowShape[T, T]] {
  val in = Inlet[T]("CancelAfter.in")
  val out = Outlet[T]("CancelAfter.out")
  override val shape: FlowShape[T, T] = FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new TimerGraphStageLogic(shape) with InHandler with OutHandler {
      override def onPush(): Unit = {
        val elem = grab(in)
        if (!isTimerActive(CancelAfterTimer))
          scheduleOnce(CancelAfterTimer, getTimeout(elem))
        push(out, elem)
      }

      override def onTimer(timerKey: Any): Unit =
        completeStage() // this will cancel the upstream and close the downstream

      override def onPull(): Unit = pull(in)

      setHandlers(in, out, this)
    }
}
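A hypothetical usage, computing each substream's lifetime from the expires field of its first element (src and the toTimeout helper are assumed for illustration):
import java.time.LocalDateTime
import scala.concurrent.duration._

// Assumed helper: time remaining until `expires`, floored at zero.
def toTimeout(w: Wid): FiniteDuration =
  math.max(0L, java.time.Duration.between(LocalDateTime.now, w.expires).toMillis).millis

src
  .groupBy(Int.MaxValue, _.id)
  .via(new CancelAfter[Wid](toTimeout))
  .mergeSubstreams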

Akka Stream - Splitting flow into multiple Sources

I have a TCP connection in Akka Stream that ends in a Sink. Right now all messages go into one Sink. I want to split the stream into an unknown number of Sinks given some function.
The use case is as follows: from the TCP connection I get a continuous stream of something like List[DeltaValue], and I want to create an actorSink for each DeltaValue.id so that I can continuously accumulate and implement behaviour for each DeltaValue.id. I find this to be a standard use case in stream processing, but I'm not able to find a good example with Akka Stream.
This is what I have right now:
def connect(): ActorRef = tcpConnection
  // SOMEHOW SPLIT HERE and create a ReceiverActor for each message
  .to(Sink.actorRef(system.actorOf(ReceiverActor.props(), ReceiverActor.name), akka.Done))
  .run()
Update:
I now have this. It does not feel super stable, but it should work:
private def spawnActorOrSendMessage(m: ResponseMessage): Unit = {
  implicit val timeout = Timeout(FiniteDuration(1, TimeUnit.SECONDS))
  system.actorSelection("user/" + m.id.toString).resolveOne().onComplete {
    case Success(actorRef) => actorRef ! m
    case Failure(ex)       => system.actorOf(ReceiverActor.props(), m.id.toString) ! m
  }
}

def connect(): ActorRef = tcpConnection
  .to(Sink.foreachParallel(10)(spawnActorOrSendMessage))
  .run()
The below should be a somewhat improved version of what was updated in the question. The main improvement is that your actors are kept in a data structure to avoid actorSelection resolution for every incoming message.
case class DeltaValue(id: String, value: Double)

val src: Source[DeltaValue, NotUsed] = ???

src.runFold(Map[String, ActorRef]()) {
  case (actors, elem) if actors.contains(elem.id) ⇒
    actors(elem.id) ! elem.value
    actors
  case (actors, elem) ⇒
    // actor names must be unique, so use the element id as the name
    val newActor = system.actorOf(ReceiverActor.props(), elem.id)
    newActor ! elem.value
    actors.updated(elem.id, newActor)
}
Keep in mind that when you integrate Akka Streams with bare actors, you lose backpressure support. This is one of the reasons why you should try to implement your logic within the boundaries of Akka Streams whenever possible. This is not always possible, though - e.g. when remoting is needed.
In your case, you could consider leveraging groupBy and the concept of substream. The example below is folding the elements of each substream by summing them, just to give an idea:
src.groupBy(maxSubstreams = Int.MaxValue, f = _.id)
  .fold("" → 0d) {
    // note the parentheses: `→` binds tighter than `+` in Scala
    case ((id, acc), delta) ⇒ id → (delta.value + acc)
  }
  .mergeSubstreams
  .runForeach(println) // fold emits only when a substream completes; use scan for intermediate results
EventStream
You can send messages to the ActorSystem's EventStream from within a stream sink, and separately have the Actors subscribe to the EventStream.
Split At Stream Level
You can split the stream at the stream level using Broadcast. The documentation has a good example of this.
Split At Actor Level
You could also use Sink.actorRef in combination with a BroadcastPool to broadcast the messages to multiple Actors.
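For instance, a rough sketch of the BroadcastPool approach (the pool size is an assumed value; ReceiverActor and tcpConnection come from the question):
import akka.routing.BroadcastPool
import akka.stream.scaladsl.Sink

// Every message sent to `pool` is broadcast to all 5 routees.
val pool = system.actorOf(BroadcastPool(5).props(ReceiverActor.props()))

tcpConnection
  .to(Sink.actorRef(pool, onCompleteMessage = akka.Done))
  .run()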

How to run futures on the current actor's dispatcher in Akka

Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of the threads handling actor messages, then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor's message-handling thread, but what if it were? I.e.:
class MyActorWithSafeFutures extends Actor {
  // hypothetical dispatcher from the question: runs tasks on the actor's message thread
  implicit def safeDispatcher = context.dispatcherOnMessageThread

  var successCount = 0
  var failureCount = 0

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount += 1
        case Failure(_) => failureCount += 1
      }
      response pipeTo sender()
  }
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures extends Actor {
  implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
    override def execute(runnable: Runnable): Unit =
      self ! runnable

    override def reportFailure(cause: Throwable): Unit =
      throw new Error("Unhandled throwable", cause)
  }

  override def receive: Receive = {
    case runnable: Runnable => runnable.run()
    // ... other cases here
  }
}
Would that work? Why doesn't Akka offer that - is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors - it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor - if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount += 1, the value can end up as 1 rather than 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems.)
You could use traditional multithreading techniques - mutexes and the like - to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely - some kind of... Actor?
TL;DR: the Future and the Actor either: 1) may not run concurrently, in which case you may deadlock; 2) may run concurrently, in which case you will corrupt data; or 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
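For reference, here is a sketch of the conventional safe pattern the question already mentions: mutate state only inside receive, by messaging self from the callback (MakeExternalRequest and someClient are the question's names):
import akka.actor.Actor
import scala.util.{Failure, Success}

class MyCountingActor extends Actor {
  import context.dispatcher // the callback may run anywhere: it only sends messages

  private var successCount = 0
  private var failureCount = 0

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      someClient.makeRequest(req).onComplete {
        case Success(_) => self ! "success" // no state touched here
        case Failure(_) => self ! "failure"
      }
    case "success" => successCount += 1 // state mutated only on the actor's own thread
    case "failure" => failureCount += 1
  }
}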
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class, which would create a thread pool with exactly one thread for each instance of the given class, and then use context.dispatcher as the execution context for your Futures.
To do this you have to put something like this in your application.conf:
akka {
  ...
}

my-pinned-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
  Props(classOf[MyActorWithSafeFutures]).withDispatcher("my-pinned-dispatcher"),
  "myActorWithSafeFutures"
)
That said, what you are trying to achieve completely defeats the purpose of the actor model. Actor state should be encapsulated, and actor state changes should be driven by incoming messages.
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
class MyActorWithSafeFutures extends Actor {
  import context.dispatcher // Agents need an ExecutionContext

  val successCount = Agent(0)
  val failureCount = Agent(0)

  def doSomethingWithPossiblyStaleCounts() = {
    val (s, f) = (successCount.get(), failureCount.get())
    statisticsCollector ! Ratio(f / (s + f)) // failures as a fraction of the total
  }

  def doSomethingWithCurrentCounts() = {
    val (successF, failureF) = (successCount.future(), failureCount.future())
    val ratio: Future[Ratio] = for {
      s <- successF
      f <- failureF
    } yield Ratio(f / (s + f))
    ratio pipeTo statisticsCollector
  }

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount.send(_ + 1)
        case Failure(_) => failureCount.send(_ + 1)
      }
      response pipeTo sender()
  }
}
The catch is that if you want to operate on the current counts, as you would with @volatile fields, then you need to operate inside a Future; see doSomethingWithCurrentCounts().
If you are fine with values that are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSomethingWithPossiblyStaleCounts() is fine.
@rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.

Scala concurrency on iterators as queues

I'm not really sure of the correct terminology for my problem, so feel free to provide me with the right terms.
Say I have a process A, which outputs an iterator (lazy evaluation):
This produces Iterator[A]
I then have another process B, which maps the events, returning
Iterator[B]
This continues for several more processes:
Iterator[A] -> Iterator[B] -> Iterator[C] -> ---
Now eventually I evaluate this stream into a List[Z].
This saves me the memory hit of having a List[A] -> List[B] -> List[C], etc.
Now I want to improve performance by introducing parallelisation, but I don't want to parallelise the evaluation of each element across the iterators, but rather each iterator stack. So in this case a thread for process A fills a Queue[A] for Iterator[A], a thread for process B takes from Queue[A], applies whatever mapping, and then adds to Queue[B] for Iterator[B] to read from.
Now, I have done this before in other languages by designing my own async queues, so I was wondering what Scala offers to solve this.
Here's a first-stab solution I made using an actor.
It's fully blocking, so maybe an implementation using futures could be developed:
case class AsyncIterator[T](iterator: Iterator[T]) extends Iterator[T] {
  private val queue = new scala.collection.mutable.SynchronizedQueue[T]()
  private var end = !iterator.hasNext

  def hasNext() = {
    if (end) false
    else if (!queue.isEmpty) true
    else hasNext // busy-waits until an element arrives or the iterator ends
  }

  def next() = {
    while (queue.isEmpty) {
      if (end) throw new Exception("blah")
    }
    queue.dequeue()
  }

  private val producer: Actor = actor {
    loop {
      if (!iterator.hasNext) {
        end = true
        exit
      } else {
        queue.enqueue(iterator.next)
      }
    }
  }
  producer.start()
}
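An alternative sketch of the same queue-per-stage pipeline, using plain java.util.concurrent blocking queues instead of actors (the stage function f is an assumed parameter; None marks end-of-stream):
import java.util.concurrent.ArrayBlockingQueue

// One thread per stage; the queues' capacity bound provides
// back-pressure between stages.
def stage[A, B](in: ArrayBlockingQueue[Option[A]],
                out: ArrayBlockingQueue[Option[B]])(f: A => B): Thread =
  new Thread(new Runnable {
    def run(): Unit = {
      var next = in.take() // blocks if upstream is slow
      while (next.isDefined) {
        out.put(Some(f(next.get))) // blocks if downstream is slow
        next = in.take()
      }
      out.put(None) // propagate end-of-stream downstream
    }
  })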
Since you're open to alternative languages, how about Go?
There was a discussion recently about how to construct an event-driven pipeline, which would achieve the same thing as you describe but in a completely different way.
It's arguably easier to think about and design an event pipeline than it is to reason about lazy iterators because it becomes a data flow system in which the key question at each stage is 'what does this stage do with a single entity?' rather than 'how can I iterate efficiently over many entities?'
Once an event-driven pipeline has been implemented, the question of how to make it concurrent or parallel is moot - you've already done it.

Is there a FIFO stream in Scala?

I'm looking for a FIFO stream in Scala, i.e., something that provides the functionality of
immutable.Stream (a stream that can be finite and memoizes the elements that have already been read)
mutable.Queue (which allows elements to be added to the FIFO)
The stream should be closable and should block access to the next element until the element has been added or the stream has been closed.
Actually I'm a bit surprised that the collection library does not (seem to) include such a data structure, since it is IMO a quite classical one.
My questions:
1) Did I overlook something? Is there already a class providing this functionality?
2) OK, if it's not included in the collection library, then it might be just a trivial combination of existing collection classes. However, I tried to write that trivial code, and my implementation still looks quite complex for such a simple problem. Is there a simpler solution for such a FifoStream?
import java.io.Closeable
import scala.collection.mutable.Queue

class FifoStream[T] extends Closeable {
  val queue = new Queue[Option[T]]

  lazy val stream = nextStreamElem

  private def nextStreamElem: Stream[T] = next() match {
    case Some(elem) => Stream.cons(elem, nextStreamElem)
    case None       => Stream.empty
  }

  /** Returns next element in the queue (may wait for it to be inserted). */
  private def next() = {
    queue.synchronized {
      while (queue.isEmpty) queue.wait() // loop guards against spurious wakeups
      queue.dequeue()
    }
  }

  /** Adds new elements to this stream. */
  def enqueue(elems: T*) {
    queue.synchronized {
      queue.enqueue(elems.map(Some(_)): _*)
      queue.notify()
    }
  }

  /** Closes this stream. */
  def close() {
    queue.synchronized {
      queue.enqueue(None)
      queue.notify()
    }
  }
}
Paradigmatic's solution (slightly modified)
Thanks for your suggestions. I slightly modified paradigmatic's solution so that toStream returns an immutable stream (allowing repeatable reads), which fits my needs. Just for completeness, here is the code:
import collection.JavaConversions._
import java.util.concurrent.{LinkedBlockingQueue, BlockingQueue}

class FIFOStream[A]( private val queue: BlockingQueue[Option[A]] = new LinkedBlockingQueue[Option[A]]() ) {
  lazy val toStream: Stream[A] = queue2stream

  private def queue2stream: Stream[A] = queue take match {
    case Some(a) => Stream cons ( a, queue2stream )
    case None    => Stream empty
  }

  def close()           = queue add None
  def enqueue( as: A* ) = queue addAll as.map( Some(_) )
}
In Scala, streams are "functional iterators". People expect them to be pure (no side effects) and immutable. In your case, every time you iterate over the stream you modify the queue (so it's not pure). This can create a lot of misunderstanding, because iterating over the same stream twice will give two different results.
That being said, you should use Java BlockingQueues rather than rolling your own implementation. They are considered well implemented in terms of safety and performance. Here is the cleanest code I can think of (using your approach):
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue}
import scala.collection.JavaConversions._

class FIFOStream[A]( private val queue: BlockingQueue[Option[A]] ) {
  def toStream: Stream[A] = queue take match {
    case Some(a) => Stream cons ( a, toStream )
    case None    => Stream empty
  }

  def close()           = queue add None
  def enqueue( as: A* ) = queue addAll as.map( Some(_) )
}

object FIFOStream {
  // the companion wraps a fresh queue in a FIFOStream
  def apply[A]() = new FIFOStream( new LinkedBlockingQueue[Option[A]] )
}
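A quick usage sketch of the class above:
val s = FIFOStream[Int]()
s.enqueue(1, 2, 3)
s.close()
s.toStream foreach println // prints 1, 2, 3, then terminates at the None marker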
I'm assuming you're looking for something like java.util.concurrent.BlockingQueue?
Akka has a BoundedBlockingQueue implementation of this interface. There are of course the implementations available in java.util.concurrent.
You might also consider using Akka's actors for whatever it is you are doing: have the actors be notified of, or pushed, a new event or message, instead of pulling.
1) It seems you're looking for a dataflow stream as seen in languages like Oz, which supports the producer-consumer pattern. Such a collection is not available in the collections API, but you could always create one yourself.
2) The dataflow stream relies on the concept of single-assignment variables (they don't have to be initialized at the declaration point, and reading them prior to initialization causes blocking):
// pseudocode: a `val` without an initializer is not legal Scala
val x: Int

startThread {
  println(x)
}

println("The other thread waits for x to be assigned")

x = 1
It would be straightforward to implement such a stream if single-assignment (or dataflow) variables were supported in the language (see the link). Since they are not a part of Scala, you have to use the wait-synchronized-notify pattern just like you did.
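That said, scala.concurrent.Promise behaves essentially like a single-assignment variable; here is a small sketch of the example above re-expressed with it (assuming Scala 2.10+):
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.Duration

val x = Promise[Int]()

new Thread(new Runnable {
  def run(): Unit = println(Await.result(x.future, Duration.Inf)) // blocks until assigned
}).start()

println("The other thread waits for x to be assigned")
x.success(1) // single assignment; a second call would throw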
Concurrent queues from Java can be used to achieve this as well, as the other answer suggested.