Sending actorRefWithAck inside stream - scala

I'm using the answer from this thread because I need to treat the first element specially. The problem is, I need to send this data to another actor or persist it locally (which is not possible).
So, my stream looks like this:
val flow: Flow[Message, Message, (Future[Done], Promise[Option[Message]])] = Flow.fromSinkAndSourceMat(
  Flow[Message].mapAsync[Trade](1) {
    case TextMessage.Strict(text) =>
      Unmarshal(text).to[Trade]
    case streamed: TextMessage.Streamed =>
      streamed.textStream.runFold("")(_ ++ _).flatMap(Unmarshal(_).to[Trade])
  }.groupBy(pairs.size, _.s).prefixAndTail(1).flatMapConcat {
    case (head, tail) =>
      // sending first element here
      val result = Source(head).to(Sink.actorRefWithAck(
        ref = actor,
        onInitMessage = Init,
        ackMessage = Ack,
        onCompleteMessage = "done"
      )).run()
      // some kind of operation on the result
      Source(head).concat(tail)
  }.mergeSubstreams.toMat(sink)(Keep.right),
  Source.maybe[Message])(Keep.both)
Is this good practice? Will it have unintended consequences? Unfortunately, I cannot call persist inside the stream, so I want to send this data to an external system.

Your current approach doesn't use result in any way, so a simpler alternative would be to fire and forget the first Message to the actor:
groupBy(pairs.size, _.s).prefixAndTail(1).flatMapConcat {
  case (head, tail) =>
    // sending first element here
    actor ! head.head
    Source(head).concat(tail)
}
The actor would then not have to worry about handling Init and sending Ack messages and could be solely concerned with persisting Message instances.
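For illustration, a minimal sketch of such an actor (the FirstTradeActor name is made up here; the body is where you would persist or forward the Trade):

import akka.actor.{Actor, ActorLogging}

// Handles only the plain Trade tells coming from the stream; no Init/Ack protocol required.
class FirstTradeActor extends Actor with ActorLogging {
  override def receive: Receive = {
    case trade: Trade =>
      // persist the trade or send it to the external system here
      log.info(s"first element of substream: $trade")
  }
}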

Related

How to handle backpressure when Streaming file from s3 with actor interop

I am trying to download a large file from S3 and send its data to another actor that performs an HTTP request and then persists the response. I want to limit the number of requests sent by that actor, hence I need to handle backpressure.
I tried doing something like this:
S3.download(bckt, bcktKey).map {
  case Some((file, _)) =>
    file
      .via(CsvParsing.lineScanner())
      .map(_.map(_.utf8String)).drop(1) // drop headers
      .map(p => Foo(p.head, p(1)))
      .mapAsync(30) { p =>
        implicit val askTimeout: Timeout = Timeout(10.seconds)
        (httpClientActor ? p).mapTo[Buzz]
      }
      .mapAsync(1) {
        case b @ Buzz(_, _) =>
          (persistActor ? b).mapTo[Done]
      }.runWith(Sink.head)
}
The problem is that I see it reads only 30 lines from the file, matching the parallelism limit. I am not sure this is the correct way to achieve what I'm looking for.
As Johny notes in his comment, the Sink.head is what causes the stream to only process about 30 elements. What happens is approximately:
1. Sink.head signals demand for 1 element
2. this demand propagates up through the second mapAsync
3. when the demand reaches the first mapAsync, since that one has parallelism 30, it signals demand for 30 elements
4. the CSV parsing stages emit 30 elements
5. when the response to the ask with the first element from the client actor is received, the response propagates down to the ask of the persist actor
6. demand is signaled for one more element from the CSV parsing stages
7. when the persist actor responds, the response goes to the sink
8. since the sink is Sink.head, which cancels the stream once it receives an element, the stream gets torn down
9. any asks of the client actor which have been sent but are awaiting a response will still get processed
There's a bit of a race between the persist actor's response and the CSV parsing and sending an ask to the client actor: if the latter is faster, 31 lines might get processed by the client actor.
If you just want a Future[Done] after every element has been processed, Sink.last will work very well with this code.
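To see the difference in isolation, here is a small self-contained sketch (a toy stream, not your code, assuming Akka 2.6 where an implicit ActorSystem provides the materializer): Sink.head cancels the stream after its first element, while Sink.last keeps demanding elements until upstream completes.

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object HeadVsLast extends App {
  implicit val system: ActorSystem = ActorSystem("head-vs-last")

  val numbers = Source(1 to 100).map { i => println(s"processing $i"); i }

  // Cancels upstream after the first element, so only a small prefix gets processed.
  val first = Await.result(numbers.runWith(Sink.head), 5.seconds)
  // Keeps signalling demand until the source completes, so all 100 elements get processed.
  val last = Await.result(numbers.runWith(Sink.last), 5.seconds)

  println(s"head=$first last=$last")
  system.terminate()
}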
If the reason is not the usage of Sink.head as I mentioned in the comment, you can backpressure the stream using Sink.actorRefWithBackpressure.
Sample code:
class PersistActor extends Actor {
  override def receive: Receive = {
    case "init" =>
      println("Initialized")
    case "complete" =>
      context.stop(self)
    case message =>
      // Persist Buzz??
      sender() ! Done
  }
}

val sink = Sink
  .actorRefWithBackpressure(persistActor, "init", Done, "complete", PartialFunction.empty)

S3.download(bckt, bcktKey).map {
  case Some((file, _)) =>
    file
      .via(CsvParsing.lineScanner())
      .map(_.map(_.utf8String)).drop(1) // drop headers
      .map(p => Foo(p.head, p(1)))
      // You could backpressure here too...
      .mapAsync(30) { p =>
        implicit val askTimeout: Timeout = Timeout(10.seconds)
        (httpClientActor ? p).mapTo[Buzz]
      }
      .to(sink)
      .run()
}

How do I call context become outside of a Future from Ask messages?

I have a parent Akka actor named buildingCoordinator which creates children named elevator_X. For now I am creating only one elevator. The buildingCoordinator sends a sequence of messages and waits for responses in order to move an elevator. The sequence is: send ? RequestElevatorState -> receive ElevatorState -> send ? MoveRequest -> receive MoveRequestSuccess -> change the state. As you can see, I am using the ask pattern. After the movement succeeds, the buildingCoordinator changes its state using context.become.
The problem I am running into is that the elevator is receiving MoveRequest(1,4) for the same floor twice, sometimes three times. I do remove the floor when I call context.become; however, I remove it inside the last map. I think it is because I am using context.become inside a future and I should use it outside, but I am having trouble implementing that.
case class BuildingCoordinator(actorName: String,
                               numberOfFloors: Int,
                               numberOfElevators: Int,
                               elevatorControlSystem: ElevatorControlSystem)
  extends Actor with ActorLogging {

  import context.dispatcher
  implicit val timeout = Timeout(4 seconds)

  val elevators = createElevators(numberOfElevators)

  override def receive: Receive = operational(Map[Int, Queue[Int]](), Map[Int, Queue[Int]]())

  def operational(stopsRequests: Map[Int, Queue[Int]], pickUpRequests: Map[Int, Queue[Int]]): Receive = {
    case msg @ MoveElevator(elevatorId) =>
      println(s"[BuildingCoordinator] received $msg")
      val elevatorActor: ActorSelection = context.actorSelection(s"/user/$actorName/elevator_$elevatorId")
      val newState = (elevatorActor ? RequestElevatorState(elevatorId))
        .mapTo[ElevatorState]
        .flatMap { state =>
          val nextStop = elevatorControlSystem.findNextStop(stopsRequests.get(elevatorId).get, state.currentFloor, state.direction)
          elevatorActor ? MoveRequest(elevatorId, nextStop)
        }
        .mapTo[MoveRequestSuccess]
        .flatMap(moveRequestSuccess => elevatorActor ? MakeMove(elevatorId, moveRequestSuccess.targetFloor))
        .mapTo[MakeMoveSuccess]
        .map { makeMoveSuccess =>
          println(s"[BuildingCoordinator] Elevator ${makeMoveSuccess.elevatorId} arrived at floor [${makeMoveSuccess.floor}]")
          // removeStopRequest
          val stopsRequestsElevator = stopsRequests.get(elevatorId).getOrElse(Queue[Int]())
          val newStopsRequestsElevator = stopsRequestsElevator.filterNot(_ == makeMoveSuccess.floor)
          val newStopsRequests = stopsRequests + (elevatorId -> newStopsRequestsElevator)
          val pickUpRequestsElevator = pickUpRequests.get(elevatorId).getOrElse(Queue[Int]())
          val newPickUpRequestsElevator = {
            if (pickUpRequestsElevator.contains(makeMoveSuccess.floor)) {
              pickUpRequestsElevator.filterNot(_ == makeMoveSuccess.floor)
            } else {
              pickUpRequestsElevator
            }
          }
          val newPickUpRequests = pickUpRequests + (elevatorId -> newPickUpRequestsElevator)
          // I THINK I SHOULD NOT CALL context.become HERE
          // context.become(operational(newStopsRequests, newPickUpRequests))
          val dropOffFloor = BuildingUtil.generateRandomFloor(numberOfFloors, makeMoveSuccess.floor, makeMoveSuccess.direction)
          context.self ! DropOffRequest(makeMoveSuccess.elevatorId, dropOffFloor)
          (newStopsRequests, newPickUpRequests)
        }
      // I MUST CALL context.become HERE, BUT I DONT KNOW HOW
      // context.become(operational(newState.flatMap(state => (state._1, state._2))))
  }
Another thing that might be nasty here is the big chain of map and flatMap. This was my way of implementing it, but I think there might be a better way.
You can't, and should not, call context.become or otherwise change actor state outside the Receive method and outside the thread that invokes the Receive method (which is an Akka dispatcher thread), as in your example. E.g.:
def receive: Receive = {
  // This is a bug, because context is not, and is not supposed to be, thread safe.
  case message: Message => Future(context.become(anotherReceive))
}
What you should do instead: send a message to self once the async operation has finished, and change the Receive state after that. If in the meantime you don't want to handle incoming messages, you can stash them. See the docs for more details: https://doc.akka.io/docs/akka/current/typed/stash.html
High level example, technical details omitted:
case class OperationFinished(calculations: Map[Any, Any])

class AsyncActor extends Actor with Stash {
  import akka.pattern.pipe
  import context.dispatcher

  def operation: Future[Map[Any, Any]] = ... // some implementation of a heavy async operation

  // initial state, starting with no calculations
  override def receive: Receive = receiveStartAsync(Map.empty)

  def receiveStartAsync(calculations: Map[Any, Any]): Receive = {
    case StartAsyncOperation =>
      // Start the async operation and inform yourself when it is finished
      operation.map(OperationFinished.apply) pipeTo self
      context.become(receiveWaitAsyncOperation)
  }

  def receiveWaitAsyncOperation: Receive = {
    case OperationFinished(newCalculations) =>
      unstashAll()
      context.become(receiveStartAsync(newCalculations))
    case _ => stash()
  }
}
I like your response, @Ivan Kurchenko.
But, according to the Akka Stash docs:
When unstashing the buffered messages by calling unstashAll the messages will be processed sequentially in the order they were added and all are processed unless an exception is thrown. The actor is unresponsive to other new messages until unstashAll is completed. That is another reason for keeping the number of stashed messages low. Actors that hog the message processing thread for too long can result in starvation of other actors.
Meaning that under load, for example, the unstashAll operation will cause all other actors to starve.
According to the same doc:
That can be mitigated by using the StashBuffer.unstash with numberOfMessages parameter and then send a message to context.self before continuing unstashing more. That means that other new messages may arrive in-between and those must be stashed to keep the original order of messages. It becomes more complicated, so better keep the number of stashed messages low.
Bottom line: you should keep the stashed message count low. It might not be suitable for high-load operation.

Recover on Akka ask based on the message sent

I am sending different messages to the actor via ask. On timeout, I'd like to provide a default value which differs depending on the message that was asked.
Since the timeout exception is always the same, I cannot use it in recover to return different default values; I need the original message that was sent.
How can one achieve that?
Code example:
val storageActorProxy = Flow[ByteString]
  .via(Framing.lengthField(TCPMessage.sizeFieldLength, TCPMessage.sizeFieldIndex, Int.MaxValue))
  .map(TCPMessage.decode)
  .ask[OperationResponse](storageActor)
  // TODO: looking for this recover; non-existent AFAIK
  .customRecover {
    case Op1 => DefaultResponseA()
    case Op2 => DefaultResponseB()
  }
  .map(TCPMessage.encode(_).toByteString)
Akka's ask method is actually pretty easy to recreate - it is just a mapAsync with some extra logic for better errors when the actor dies (see the code). As such, just use mapAsync manually so you can recover the ask error.
val storageActorProxy = Flow[ByteString]
  .via(Framing.lengthField(TCPMessage.sizeFieldLength, TCPMessage.sizeFieldIndex, Int.MaxValue))
  .map(TCPMessage.decode)
  .mapAsync(parallelism = 2) { decodedMessage =>
    // Ask manually so the original message is still in scope when the ask times out.
    (storageActor ? decodedMessage).mapTo[OperationResponse].recover {
      case _: AskTimeoutException =>
        decodedMessage match {
          case Op1 => DefaultResponseA()
          case Op2 => DefaultResponseB()
        }
    }
  }
  .map(TCPMessage.encode(_).toByteString)

Why does Source.tick stop after one hundred HttpRequests?

Using Akka Streams and Akka HTTP, I have created a stream which polls an API every 3 seconds, unmarshals the result to a JsValue object, and sends this result to an actor, as can be seen in the following code:
// Source which performs an HTTP request every 3 seconds.
val source = Source.tick(0.seconds,
  3.seconds,
  HttpRequest(uri = Uri(path = Path("/posts/1"))))

// Processes the result of the HTTP request
val flow = Http().outgoingConnectionHttps("jsonplaceholder.typicode.com").mapAsync(1) {
  // Able to reach the API.
  case HttpResponse(StatusCodes.OK, _, entity, _) =>
    // Unmarshal the JSON response.
    Unmarshal(entity).to[JsValue]
  // Failed to reach the API.
  case HttpResponse(code, _, entity, _) =>
    entity.discardBytes()
    Future.successful(code.toString())
}

// Run the stream
source.via(flow).runWith(Sink.actorRef[Any](processJsonActor, akka.actor.Status.Success("Completed stream")))
This works; however, the stream closes after 100 HttpRequests (ticks).
What is the cause of this behaviour?
This definitely has something to do with outgoingConnectionHttps. It is a low-level DSL and there could be some misconfigured setting somewhere which is causing this (although I couldn't figure out which one).
Usage of this DSL is actually discouraged by the docs.
Try using a higher-level DSL like the cached host connection pool:
val flow = Http().cachedHostConnectionPoolHttps[NotUsed]("akka.io").mapAsync(1) {
  // Able to reach the API.
  case (Success(HttpResponse(StatusCodes.OK, _, entity, _)), _) =>
    // Unmarshal the JSON response.
    Unmarshal(entity).to[String]
  // Failed to reach the API.
  case (Success(HttpResponse(code, _, entity, _)), _) =>
    entity.discardBytes()
    Future.successful(code.toString())
  case (Failure(e), _) =>
    throw e
}

// Run the stream
source.map(_ -> NotUsed).via(flow).runWith(...)
A potential issue is that there is no backpressure signal with Sink.actorRef, so the actor's mailbox could be getting full. If the actor, whenever it receives a JsValue object, is doing something that could take a long time, use Sink.actorRefWithAck instead. For example:
val initMessage = "start"
val completeMessage = "done"
val ackMessage = "ack"

source
  .via(flow)
  .runWith(Sink.actorRefWithAck[Any](
    processJsonActor, initMessage, ackMessage, completeMessage))
You would need to change the actor to handle an initMessage and to reply to the stream with an ackMessage for every stream element (with sender() ! ackMessage). More information on Sink.actorRefWithAck can be found in the documentation.
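A minimal sketch of what that actor could look like (the string messages match the initMessage, ackMessage and completeMessage values above; the element handling is a placeholder):

import akka.actor.{Actor, ActorLogging}

class ProcessJsonActor extends Actor with ActorLogging {
  override def receive: Receive = {
    case "start" =>
      sender() ! "ack" // acknowledge stream initialization
    case "done" =>
      log.info("stream completed")
    case element =>
      // potentially slow handling of the JsValue (or status string) goes here
      sender() ! "ack" // ask the stream for the next element
  }
}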

akka unstashAll not working

I am implementing an actor with multiple states and using Stash so as not to lose any messages. My states are initializing (get something from the DB), running (handling requests) and updating (updating my state).
My problem is that I lose messages when I try to unstashAll() in the future callback.
def initializing: Receive = {
  case Initialize =>
    log.info("initializing")
    (for {
      items1 <- db.getItems("1")
      items2 <- db.getItems("2")
    } yield items1 ::: items2) map { items =>
      unstashAll()
      context.become(running(items))
    }
  case r =>
    log.debug(s"actor received message: $r while initializing and stashed it for further processing")
    stash()
}
I fixed it by changing my implementation to this:
def initializing: Receive = {
  case Initialize =>
    log.info("initializing")
    (for {
      items1 <- db.getItems("1")
      items2 <- db.getItems("2")
    } yield items1 ::: items2) pipeTo self
    context.become({
      case items: List[Item] =>
        unstashAll()
        context.become(running(items))
      case r =>
        log.debug(s"actor received message: $r while initializing and stashed it for further processing")
        stash()
    })
  case r =>
    log.debug(s"actor received message: $r while initializing and stashed it for further processing")
    stash()
}
Can anyone explain why the first one didn't work?
I think the unstashAll part works fine. The problem is that you were running it - together with your context.become - as part of a future callback.
This means that the code inside your map block escapes the predictability of your actor's sequential processing. In other words, this could happen:
1. you call unstashAll
2. your messages are put back in the actor's mailbox
3. the actor picks up your messages one by one. The context still hasn't changed, so they are stashed again
4. your context finally becomes running (but it's too late)
The solution is - as you found out - the pipeTo pattern, which essentially sends a Future's result to the actor as a message. This makes it all sequential and predictable.
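A distilled, self-contained sketch of the pattern (names like InitializingActor and loadItems are illustrative, not from your code): the future's result comes back to the actor as an ordinary message, so unstashAll and context.become both run on the actor's own message-processing thread.

import akka.actor.{Actor, Stash}
import akka.pattern.pipe
import scala.concurrent.Future

class InitializingActor extends Actor with Stash {
  import context.dispatcher

  // stand-in for the real DB call
  private def loadItems(): Future[List[String]] = Future(List("a", "b"))

  override def receive: Receive = initializing

  private def initializing: Receive = {
    case "initialize" =>
      loadItems() pipeTo self // the result arrives later as a plain message
    case items: List[_] =>
      unstashAll() // safe: runs inside receive, on the actor's dispatcher thread
      context.become(running(items))
    case _ =>
      stash()
  }

  private def running(items: List[_]): Receive = {
    case request => // handle requests using the loaded items
  }
}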