Iteratees Error/Exception Handling vs Reactive Streams/akka-stream - scala

Surprise surprise I'm having a few problems with Iteratees and Error handling.
The problem:
Read some bytes from an InputStream (from the network; it must be an InputStream), do some chunking/grouping on this stream (for work distribution), followed by a transformation that turns each chunk into a case class DataBlock(blockNum: Int, data: ByteString) for sending to actors (the Array[Byte] converted to a CompactByteString).
The flow:
InputStream.read -- bytes --> Group -- 1000 byte blocks --> Transform -- DataBlock --> Actors
The code:
class IterateeTest {
  val actor: ActorRef = myDataBlockRxActor(...)
  val is = new InputStream(fromBytes...)
  val source = Enumerator.fromStream(is)

  val chunker = Traversable.takeUpTo[Array[Byte]](1000)

  val transform: Iteratee[Array[Byte], Int] = Iteratee.fold[Array[Byte], Int](0) {
    (bNum, bytes) => actor ! DataBlock(bNum, CompactByteString(bytes)); bNum + 1
  }

  val fut = source &> chunker |>> transform
}
case class DataBlock(blockNum: Int, data: CompactByteString)
The question:
My current Iteratee code works well. However, I want to be able to handle failures on either side:
(1) When the InputStream read fails, I want to know how many bytes/blocks have been processed successfully and resume reading the stream from that point. When a read in the Enumerator throws an error, fut just completes with the exception; there is no state, so I don't know which block I am up to unless I pass it to the receiving actor (which I don't want to do).
(2) If the output side fails, or can no longer receive DataBlock messages because the Actor's buffer is full, reading from the input stream should be paused.
How should I do this?
Would I be better off trying this with reactive-streams/akka-stream (experimental) or scalaz iteratees instead of Play's iteratees, given that I need well-defined error handling?

(1) can be implemented with an Enumeratee.
val (counter, getCount) = {
  var count = 0
  (Enumeratee.map { x => count += 1; x },
   () => count)
}

val fut = source &> counter &> chunker |>> transform
You can then use getCount inside a recover or such on fut.
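For example, a minimal sketch of that recovery, assuming the pipeline is run to a Future[Int] with |>>> (run) rather than |>>:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val result: Future[Int] = source &> counter &> chunker |>>> transform

val withRecovery: Future[Int] = result.recover {
  case e: Throwable =>
    // getCount() reports how many chunks passed through before the failure,
    // which is the point to resume reading from.
    println(s"Stream failed after ${getCount()} chunks: $e")
    getCount()
}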
You get (2) for free with Play Iteratees. No further reads will happen until the Iteratee is ready for more data, and if it fails no more reads will occur. The InputStream is automatically closed when the Enumerator is done, whether from failure or normal termination.

Related

How do I call context become outside of a Future from Ask messages?

I have a parent Akka actor named buildingCoordinator which creates children named elevator_X. For now I am creating only one elevator. The buildingCoordinator sends a sequence of messages and waits for responses in order to move an elevator. The sequence is: send ? RequestElevatorState -> receive ElevatorState -> send ? MoveRequest -> receive MoveRequestSuccess -> change the state. As you can see, I am using the ask pattern. After the movement succeeds, the buildingCoordinator changes its state using context.become.
The problem I am running into is that the elevator is receiving MoveRequest(1,4) for the same floor twice, sometimes three times. I do remove the floor when I call context.become; however, I remove it inside the last map. I think this is because I am using context.become inside a future when I should use it outside, but I am having trouble implementing that.
case class BuildingCoordinator(actorName: String,
                               numberOfFloors: Int,
                               numberOfElevators: Int,
                               elevatorControlSystem: ElevatorControlSystem)
  extends Actor with ActorLogging {

  import context.dispatcher
  implicit val timeout = Timeout(4 seconds)

  val elevators = createElevators(numberOfElevators)

  override def receive: Receive = operational(Map[Int, Queue[Int]](), Map[Int, Queue[Int]]())

  def operational(stopsRequests: Map[Int, Queue[Int]], pickUpRequests: Map[Int, Queue[Int]]): Receive = {
    case msg @ MoveElevator(elevatorId) =>
      println(s"[BuildingCoordinator] received $msg")
      val elevatorActor: ActorSelection = context.actorSelection(s"/user/$actorName/elevator_$elevatorId")
      val newState = (elevatorActor ? RequestElevatorState(elevatorId))
        .mapTo[ElevatorState]
        .flatMap { state =>
          val nextStop = elevatorControlSystem.findNextStop(stopsRequests.get(elevatorId).get, state.currentFloor, state.direction)
          elevatorActor ? MoveRequest(elevatorId, nextStop)
        }
        .mapTo[MoveRequestSuccess]
        .flatMap(moveRequestSuccess => elevatorActor ? MakeMove(elevatorId, moveRequestSuccess.targetFloor))
        .mapTo[MakeMoveSuccess]
        .map { makeMoveSuccess =>
          println(s"[BuildingCoordinator] Elevator ${makeMoveSuccess.elevatorId} arrived at floor [${makeMoveSuccess.floor}]")
          // removeStopRequest
          val stopsRequestsElevator = stopsRequests.get(elevatorId).getOrElse(Queue[Int]())
          val newStopsRequestsElevator = stopsRequestsElevator.filterNot(_ == makeMoveSuccess.floor)
          val newStopsRequests = stopsRequests + (elevatorId -> newStopsRequestsElevator)
          val pickUpRequestsElevator = pickUpRequests.get(elevatorId).getOrElse(Queue[Int]())
          val newPickUpRequestsElevator = {
            if (pickUpRequestsElevator.contains(makeMoveSuccess.floor)) {
              pickUpRequestsElevator.filterNot(_ == makeMoveSuccess.floor)
            } else {
              pickUpRequestsElevator
            }
          }
          val newPickUpRequests = pickUpRequests + (elevatorId -> newPickUpRequestsElevator)
          // I THINK I SHOULD NOT CALL context.become HERE
          // context.become(operational(newStopsRequests, newPickUpRequests))
          val dropOffFloor = BuildingUtil.generateRandomFloor(numberOfFloors, makeMoveSuccess.floor, makeMoveSuccess.direction)
          context.self ! DropOffRequest(makeMoveSuccess.elevatorId, dropOffFloor)
          (newStopsRequests, newPickUpRequests)
        }
      // I MUST CALL context.become HERE, BUT I DON'T KNOW HOW
      // context.become(operational(newState.flatMap(state => (state._1, state._2))))
  }
Another thing that might be nasty here is this big chain of map and flatMap. This was my way of implementing it, but I suspect there is a better one.
You can't, and you should not, call context.become or otherwise change actor state outside the Receive method and outside the thread that invokes Receive (which is an Akka dispatcher thread), as in your example. E.g.:
def receive: Receive = {
  // This is a bug, because context is not, and is not supposed to be, thread-safe.
  case message: Message => Future(context.become(anotherReceive))
}
What you should do is send a message to self after the async operation finishes and change the receive state then. If in the meantime you don't want to handle incoming messages, you can stash them. See for more details: https://doc.akka.io/docs/akka/current/typed/stash.html
High level example, technical details omitted:
import akka.pattern.pipe

case class OperationFinished(calculations: Map[Any, Any])

class AsyncActor extends Actor with Stash {
  import context.dispatcher

  def operation: Future[Map[Any, Any]] = ... // some implementation of a heavy async operation

  def receiveStartAsync(calculations: Map[Any, Any]): Receive = {
    case StartAsyncOperation =>
      // Start the async operation and inform yourself when it has finished
      operation.map(OperationFinished.apply) pipeTo self
      context.become(receiveWaitAsyncOperation)
  }

  def receiveWaitAsyncOperation: Receive = {
    case OperationFinished(calculations) =>
      unstashAll()
      context.become(receiveStartAsync(calculations))
    case _ => stash()
  }
}
I like your response, @Ivan Kurchenko.
But, according to the Akka Stash docs:
When unstashing the buffered messages by calling unstashAll the messages will be processed sequentially in the order they were added and all are processed unless an exception is thrown. The actor is unresponsive to other new messages until unstashAll is completed. That is another reason for keeping the number of stashed messages low. Actors that hog the message processing thread for too long can result in starvation of other actors.
Meaning that under load, for example, the unstashAll operation can starve all other actors.
According to the same doc:
That can be mitigated by using the StashBuffer.unstash with numberOfMessages parameter and then send a message to context.self before continuing unstashing more. That means that other new messages may arrive in-between and those must be stashed to keep the original order of messages. It becomes more complicated, so better keep the number of stashed messages low.
Bottom line: you should keep the stashed message count low. Stashing might not be suitable for high-load operation.
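To make that concrete, here is a hedged sketch of the batched approach the docs describe, using the typed StashBuffer API; the protocol names (Job, Continue, Replayed), batch size, and capacity are hypothetical:
import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.{Behaviors, StashBuffer}

sealed trait Command
final case class Job(payload: String) extends Command
case object Continue extends Command                      // self-message between batches
final case class Replayed(inner: Command) extends Command // marks unstashed messages

object BatchedUnstash {
  def apply(): Behavior[Command] =
    Behaviors.withStash(1000)(buffer => draining(buffer))

  // Unstash at most 10 messages per turn, then yield the dispatcher
  // thread by sending Continue to self before draining the next batch.
  def draining(buffer: StashBuffer[Command]): Behavior[Command] =
    Behaviors.setup { context =>
      if (buffer.isEmpty) running
      else {
        context.self ! Continue
        buffer.unstash(drainBatch(buffer), 10, Replayed.apply)
      }
    }

  def drainBatch(buffer: StashBuffer[Command]): Behavior[Command] =
    Behaviors.receiveMessage {
      case Replayed(Job(payload)) =>
        // process the replayed job here
        Behaviors.same
      case Continue =>
        draining(buffer) // this batch is done; drain the next one
      case other =>
        buffer.stash(other) // new message mid-drain: stash it to keep order
        Behaviors.same
    }

  val running: Behavior[Command] =
    Behaviors.receiveMessage {
      case Job(payload) =>
        // normal processing once the stash is drained
        Behaviors.same
      case _ =>
        Behaviors.same
    }
}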

How to wait for file upload stream to complete in Akka actor

Recently I started using Akka, and I am using it to create a REST API with Akka HTTP to upload a file. The file can have millions of records, and for each record I need to perform some validation and business logic. The way I have modeled my actors is: the root actor receives the file stream, converts the bytes to String, and then splits the records by line separator. After doing this it sends the stream (record by record) to another actor for processing, which in turn distributes the records to other actors based on some grouping. To send the stream from the root actor to the processing actor I am using Sink.actorRefWithAck.
This works fine for a small file, but for a large file what I have observed is that I receive multiple chunks and only the first chunk gets processed. If I add Thread.sleep for a few seconds depending on the load, then the whole file is processed. I am wondering whether there is any way to know that the stream has been completely consumed by the processing actor, so that I don't have to rely on Thread.sleep. Here is the code snippet I have used:
val AckMessage = DefaultFileUploadProcessActor.Ack

val receiver = context.system.actorOf(
  Props(new DefaultFileUploadProcessActor(uuid, sourceId)(self, ackWith = AckMessage)))

// sent from stream to actor to indicate start, end or failure of stream:
val InitMessage = DefaultFileUploadProcessActor.StreamInitialized
val OnCompleteMessage = DefaultFileUploadProcessActor.StreamCompleted
val onErrorMessage = (ex: Throwable) => DefaultFileUploadProcessActor.StreamFailure(ex)

val actorSink = Sink.actorRefWithAck(
  receiver,
  onInitMessage = InitMessage,
  ackMessage = AckMessage,
  onCompleteMessage = OnCompleteMessage,
  onFailureMessage = onErrorMessage
)

val processStream =
  fileStream
    .map(byte => byte.utf8String.split(System.lineSeparator()))
    .runWith(actorSink)

Thread.sleep(9000)
log.info(s"completed distribution of data to the actors")
sender() ! ActionPerformed(uuid, "Done")
Any expert advice on the approach I have taken will be highly appreciated.
If your Source contains only one file, you can await stream completion by awaiting the Future that is returned from the runWith method.
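For example, a minimal sketch of that, assuming a sink that materializes a completion Future such as Sink.ignore (note that Sink.actorRefWithAck materializes NotUsed, so it would not work here):
import akka.Done
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Blocks the current thread until every element has flowed through the
// stream; prefer a non-blocking onComplete in production code.
val done: Future[Done] = fileStream
  .map(_.utf8String)
  .runWith(Sink.ignore)

Await.result(done, 5.minutes)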
If you have a Source of multiple files, you should write something like:
filesSource
  .mapAsync(1)(data => (receiver ? data).mapTo[ProcessingResult])
  .mapAsync(1)(processingResult => (resultListener ? processingResult).mapTo[ListenerResponse])
  .runWith(Sink.ignore)
Assuming that fileStream is a Source[ByteString, Future[IOResult]], one idea is to retain the materialized value of the source, then fire off the reply to the sender once this materialized value has completed:
val replyTo = sender() // capture sender() before using it in an async callback

val processStream: Future[IOResult] =
  fileStream
    .map(_.utf8String.split(System.lineSeparator()))
    .to(actorSink)
    .run()

processStream.onComplete {
  case Success(_) =>
    log.info("completed distribution of data to the actors")
    replyTo ! ActionPerformed(uuid, "Done")
  case Failure(t) =>
    // ...
}
The above approach ensures that the entire file is consumed before the sender is notified.
Note that Akka Streams has a Framing object that can parse lines from a ByteString stream:
val processStream: Future[IOResult] =
  fileStream
    .via(Framing.delimiter(
      ByteString(System.lineSeparator()),
      maximumFrameLength = 256,
      allowTruncation = true))
    .map(_.utf8String)
    .to(actorSink) // the actor will have to expect String, not Array[String], messages
    .run()
The receiver actor will receive the OnCompleteMessage or onErrorMessage when the stream completes successfully or fails, respectively, so you should handle those messages in the receive block of the receiving DefaultFileUploadProcessActor.
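A minimal sketch of such a receiver, reusing the message objects from the question's snippet (StreamInitialized, Ack, StreamCompleted, StreamFailure) and following the standard actorRefWithAck acknowledgement protocol:
class DefaultFileUploadProcessActor(/* ... */) extends Actor with ActorLogging {
  import DefaultFileUploadProcessActor._

  override def receive: Receive = {
    case StreamInitialized =>
      sender() ! Ack // signal readiness for the first element
    case records: Array[String] =>
      // validate and apply business logic to each record here
      sender() ! Ack // acknowledge, requesting the next element
    case StreamCompleted =>
      log.info("Stream completed")
    case StreamFailure(ex) =>
      log.error(ex, "Stream failed")
  }
}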

Akka Stream return object from Sink

I've got a SourceQueue. When I offer an element to it, I want it to pass through the stream, and when it reaches the Sink I want the output returned to the code that offered the element (similar to how Sink.head returns an element to the RunnableGraph.run() call).
How do I achieve this? A simple example of my problem would be:
val source = Source.queue[String](100, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.ReturnTheStringSomehow
val graph = source.via(flow).to(sink).run()
val x = graph.offer("foo")
println(x) // Output should be "Modified foo"
val y = graph.offer("bar")
println(y) // Output should be "Modified bar"
val z = graph.offer("baz")
println(z) // Output should be "Modified baz"
Edit: For the example I have given in this question Vladimir Matveev provided the best answer. However, it should be noted that this solution only works if the elements are going into the sink in the same order they were offered to the source. If this cannot be guaranteed the order of the elements in the sink may differ and the outcome might be different from what is expected.
I believe it is simpler to use the already existing primitive for pulling values from a stream, called Sink.queue. Here is an example:
val source = Source.queue[String](128, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(1, 1))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
def getNext: String = Await.result(sinkQueue.pull(), 1.second).get
sourceQueue.offer("foo")
println(getNext)
sourceQueue.offer("bar")
println(getNext)
sourceQueue.offer("baz")
println(getNext)
It does exactly what you want.
Note that setting the inputBuffer attribute for the queue sink may or may not be important for your use case - if you don't set it, the buffer will be zero-sized and the data won't flow through the stream until you invoke the pull() method on the sink.
sinkQueue.pull() yields a Future[Option[T]], which will be completed successfully with Some if the sink receives an element or with a failure if the stream fails. If the stream completes normally, it will be completed with None. In this particular example I'm ignoring this by using Option.get but you would probably want to add custom logic to handle this case.
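For instance, a small sketch of such handling in place of the Option.get above:
import scala.util.{Failure, Success}
import scala.concurrent.ExecutionContext.Implicits.global

// Handle all three pull() outcomes instead of calling .get blindly.
sinkQueue.pull().onComplete {
  case Success(Some(element)) => println(element)
  case Success(None)          => println("stream completed")
  case Failure(t)             => println(s"stream failed: $t")
}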
Well, you know what the offer() method returns if you take a look at its definition :) What you can do is create a Source.queue[(Promise[String], String)], create a helper function that pushes a pair to the stream via offer, make sure offer doesn't fail because the queue might be full, then complete the promise inside your stream and use the promise's future to catch the completion event in external code.
I do this to throttle the rate to an external API used from multiple places in my project.
Here is how it looked in my project before Typesafe added Hub sources to Akka:
import scala.concurrent.Promise
import scala.concurrent.Future
import java.util.concurrent.ConcurrentLinkedDeque
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import scala.util.Success

private val queue = Source.queue[(Promise[String], String)](100, OverflowStrategy.backpressure)
  .toMat(Sink.foreach({ case (p, param) =>
    p.complete(Success(param.reverse))
  }))(Keep.left)
  .run

private val futureDeque = new ConcurrentLinkedDeque[Future[String]]()

private def sendQueuedRequest(request: String): Future[String] = {
  val p = Promise[String]

  val offerFuture = queue.offer(p -> request)

  def addToQueue(future: Future[String]): Future[String] = {
    futureDeque.addLast(future)
    future.onComplete(_ => futureDeque.remove(future))
    future
  }

  offerFuture.flatMap {
    case QueueOfferResult.Enqueued =>
      addToQueue(p.future)
  }.recoverWith {
    case ex =>
      val first = futureDeque.pollFirst()
      if (first != null)
        addToQueue(first.flatMap(_ => sendQueuedRequest(request)))
      else
        sendQueuedRequest(request)
  }
}
I realize that a blocking synchronized queue may be a bottleneck and may grow indefinitely, but because the API calls in my project are made only from other Akka streams, which are backpressured, I never have more than a dozen items in futureDeque. Your situation may differ.
If you create a MergeHub.source[(Promise[String], String)]() instead, you'll get a reusable sink. Then every time you need to process an item you create a complete graph and run it, and you won't need a hacky Java container to queue requests.
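A minimal sketch of that MergeHub variant, under the same Promise-completing setup as above (assumes an implicit Materializer in scope):
import akka.NotUsed
import akka.stream.scaladsl.{MergeHub, Sink, Source}
import scala.concurrent.{Future, Promise}

// Materialize the hub once; the resulting Sink is reusable from anywhere.
val requestSink: Sink[(Promise[String], String), NotUsed] =
  MergeHub.source[(Promise[String], String)](perProducerBufferSize = 16)
    .to(Sink.foreach { case (p, param) => p.success(param.reverse) })
    .run()

def sendRequest(request: String): Future[String] = {
  val p = Promise[String]()
  // Each call runs a tiny graph feeding one element into the shared hub.
  Source.single(p -> request).runWith(requestSink)
  p.future
}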

How to use an Akka Streams SourceQueue with PlayFramework

I would like to use a SourceQueue to push elements dynamically into an Akka Stream source.
A Play controller needs a Source to be able to stream a result using the chunked method.
As Play uses its own Akka Streams Sink under the hood, I can't materialize the source queue myself using a Sink, because the source would then be consumed before it's used by the chunked method (except if I use the following hack).
I'm able to make it work if I pre-materialize the source queue using a reactive-streams publisher, but it's a kind of 'dirty hack' :
def sourceQueueAction = Action {
  val (queue, pub) = Source.queue[String](10, OverflowStrategy.fail).toMat(Sink.asPublisher(false))(Keep.both).run()

  // stupid example to push elements dynamically
  val tick = Source.tick(0 second, 1 second, "tick")
  tick.runForeach(t => queue.offer(t))

  Ok.chunked(Source.fromPublisher(pub))
}
Is there a simpler way to use an Akka Streams SourceQueue with PlayFramework?
Thanks
The solution is to use mapMaterializedValue on the source to get a Future of its queue materialization:
def sourceQueueAction = Action {
  val (queueSource, futureQueue) = peekMatValue(Source.queue[String](10, OverflowStrategy.fail))

  futureQueue.map { queue =>
    Source.tick(0.second, 1.second, "tick")
      .runForeach(t => queue.offer(t))
  }

  Ok.chunked(queueSource)
}

// T is the source type, here String
// M is the materialization type, here a SourceQueue[String]
def peekMatValue[T, M](src: Source[T, M]): (Source[T, M], Future[M]) = {
  val p = Promise[M]
  val s = src.mapMaterializedValue { m =>
    p.trySuccess(m)
    m
  }
  (s, p.future)
}
Would like to share an insight I got today, though it may not be appropriate to your case with Play.
Instead of thinking of a Source to trigger, one can often turn the problem upside down and provide a Sink to the function that does the sourcing.
In such a case, the Sink would be the "recipe" (non-materialized) stage, and we can use Source.queue and materialize it right away: we've got the queue, and we've got the flow that it runs.
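A minimal sketch of that inversion (the function name runWithSink is hypothetical; assumes an implicit Materializer in scope):
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source, SourceQueueWithComplete}

// The caller supplies the Sink (the non-materialized "recipe");
// Source.queue is materialized immediately and the queue handed back.
def runWithSink(sink: Sink[String, _]): SourceQueueWithComplete[String] =
  Source.queue[String](10, OverflowStrategy.fail)
    .to(sink)
    .run()

// Usage: push elements whenever needed.
val queue = runWithSink(Sink.foreach(println))
queue.offer("hello")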

How can Akka streams be materialized continually?

I am using Akka Streams in Scala to poll from an AWS SQS queue using the AWS Java SDK. I created an ActorPublisher which dequeues messages on a two-second interval:
class SQSSubscriber(name: String) extends ActorPublisher[Message] with ActorLogging {
  implicit val materializer = ActorMaterializer()

  val schedule = context.system.scheduler.schedule(0 seconds, 2 seconds, self, "dequeue")

  val client = new AmazonSQSClient()
  client.setRegion(RegionUtils.getRegion("us-east-1"))
  val url = client.getQueueUrl(name).getQueueUrl

  val MaxBufferSize = 100
  var buf = Vector.empty[Message]

  override def receive: Receive = {
    case "dequeue" =>
      val messages = iterableAsScalaIterable(client.receiveMessage(new ReceiveMessageRequest(url)).getMessages).toList
      messages.foreach(self ! _)
    case message: Message if buf.size == MaxBufferSize =>
      log.error("The buffer is full")
    case message: Message =>
      if (buf.isEmpty && totalDemand > 0)
        onNext(message)
      else {
        buf :+= message
        deliverBuf()
      }
    case Request(_) =>
      deliverBuf()
    case Cancel =>
      context.stop(self)
  }

  @tailrec final def deliverBuf(): Unit =
    if (totalDemand > 0) {
      if (totalDemand <= Int.MaxValue) {
        val (use, keep) = buf.splitAt(totalDemand.toInt)
        buf = keep
        use foreach onNext
      } else {
        val (use, keep) = buf.splitAt(Int.MaxValue)
        buf = keep
        use foreach onNext
        deliverBuf()
      }
    }
}
In my application, I am attempting to run the flow at a 2-second interval as well:
val system = ActorSystem("system")
val sqsSource = Source.actorPublisher[Message](SQSSubscriber.props("queue-name"))
val flow = Flow[Message]
.map { elem => system.log.debug(s"${elem.getBody} (${elem.getMessageId})"); elem }
.to(Sink.ignore)
system.scheduler.schedule(0 seconds, 2 seconds) {
flow.runWith(sqsSource)(ActorMaterializer()(system))
}
However, when I run my application I receive java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds] and subsequent dead-letter notices, which is caused by the ActorMaterializer.
Is there a recommended approach for continually materializing an Akka Stream?
I don't think you need to create a new ActorPublisher every 2 seconds. This seems redundant and wasteful of memory. Also, I don't think an ActorPublisher is necessary at all. From what I can tell of the code, your implementation will have an ever-growing number of streams all querying the same data. Each Message from the client will be processed by N different Akka streams and, even worse, N will grow over time.
Iterator For Infinite Loop Querying
You can get the same behavior from your ActorPublisher by using scala's Iterator. It is possible to create an Iterator which continuously queries the client:
//setup the client
val client = {
  val sqsClient = new AmazonSQSClient()
  sqsClient setRegion (RegionUtils getRegion "us-east-1")
  sqsClient
}

val url = client.getQueueUrl(name).getQueueUrl

//single query
def queryClientForMessages: Iterable[Message] = iterableAsScalaIterable {
  client.receiveMessage(new ReceiveMessageRequest(url)).getMessages
}

def messageListIterator: Iterator[Iterable[Message]] =
  Iterator continually queryClientForMessages

//messages one-at-a-time "on demand", no timer pushing you around
def messageIterator(): Iterator[Message] = messageListIterator flatMap identity
This implementation only queries the client when all previous Messages have been consumed and is therefore truly reactive. There is no need to keep track of a buffer with a fixed size. Your solution needs a buffer because the creation of Messages (via a timer) is decoupled from the consumption of Messages (via println). In my implementation, creation and consumption are tightly coupled via back-pressure.
Akka Stream Source
You can then use this Iterator generator-function to feed an akka stream Source:
def messageSource : Source[Message, _] = Source fromIterator messageIterator
Flow Formation
And finally you can use this Source to perform the println. (As a side note: your flow value is actually a Sink, since Flow + Sink = Sink.) Using your flow value from the question:
messageSource runWith flow
One akka Stream processing all messages.