Throttle HTTP requests on Akka/Spray - Scala

I'm using Akka actors in Scala to download resources from an external service (HTTP GET request). The response from the external service is JSON and I have to use paging (the provider is very slow). I want to download all paged results concurrently in 10 threads. I use a URL such as this to download a chunk: http://service.com/items?limit=50&offset=1000
I have created the following pipeline:
ScatterActor => RoundRobinPool[10](LoadChunkActor) => Aggregator
ScatterActor takes the total count of items to download and divides it into chunks. I created 10 LoadChunkActors to process the tasks concurrently.
override def receive: Receive = {
  case LoadMessage(limit) =>
    val offsets: IndexedSeq[Int] = 0 until limit by chunkSize
    offsets.foreach(offset =>
      context.system.actorSelection(pipe) ! LoadMessage(chunkSize, offset))
}
LoadChunkActor uses Spray to send the request. The actor looks like this:
val pipeline = sendReceive ~> unmarshal[List[Items]]

override def receive: Receive = {
  case LoadMessage(limit, offset) =>
    val uri: String = s"http://service.com/items?limit=$limit&offset=$offset"
    val responseFuture = pipeline { Get(uri) }
    responseFuture onComplete {
      case Success(items) => aggregator ! Loaded(items)
      case Failure(t)     => // handle the failure (log, retry, or report it)
    }
}
As you can see, LoadChunkActor requests a chunk from the external service and adds a callback to be run on completion. The actor is then immediately ready to take another message and requests another chunk. Spray uses a non-blocking API to download the chunks. As a result, the external service is flooded with my requests and I get timeouts.
How can I schedule a list of tasks so that at most 10 of them are processed at the same time?

I have created the following solution (similar to the work pulling pattern, http://www.michaelpollmeier.com/akka-work-pulling-pattern/):
ScatterActor (10000x messages) =>
  ThrottleActor => LoadChunkActor => ThrottleMonitorActor => Aggregator
       ^                                     |
       |<-----------WorkDoneMessage----------|
ThrottleActor buffers incoming messages in a ListBuffer and sends at most N of them to LoadChunkActor at a time.
LoadChunkActor sends its result to the Aggregator through ThrottleMonitorActor.
ThrottleMonitorActor sends a confirmation (WorkDoneMessage) back to ThrottleActor.
ThrottleActor then sends the next buffered message to LoadChunkActor; a sketch of such a throttle actor follows below.
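Here is a minimal sketch of what such a throttle actor could look like. The message shapes (LoadMessage, WorkDoneMessage) and the maxInFlight parameter are assumptions adapted from the description above, not code from the original project:

import akka.actor.{Actor, ActorRef}
import scala.collection.mutable.ListBuffer

// Assumed message shapes, adapted from the question.
case class LoadMessage(limit: Int, offset: Int)
case object WorkDoneMessage

// Buffers work and keeps at most `maxInFlight` messages outstanding;
// a WorkDoneMessage is expected back (via the monitor) for each finished chunk.
class ThrottleActor(worker: ActorRef, maxInFlight: Int) extends Actor {
  private val queue = ListBuffer.empty[LoadMessage]
  private var inFlight = 0

  override def receive: Receive = {
    case msg: LoadMessage =>
      queue += msg
      dispatch()
    case WorkDoneMessage =>
      inFlight -= 1
      dispatch()
  }

  private def dispatch(): Unit =
    while (inFlight < maxInFlight && queue.nonEmpty) {
      worker ! queue.remove(0)
      inFlight += 1
    }
}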

From the project adhoclabs/akka-http-contrib, you now (July 2016, two years later) have the scala.co.adhoclabs.akka.http.contrib.throttle package from Yeghishe Piruzyan.
See "Akka Http Request Throttling"
implicit val throttleSettings = MetricThrottleSettings.fromConfig

Http().bindAndHandle(
  throttle.apply(routes),
  httpInterface,
  httpPort
)

Related

How to handle backpressure when streaming a file from S3 with actor interop

I am trying to download a large file from S3 and send its data to another actor that performs an HTTP request, and then to persist the response. I want to limit the number of requests sent by that actor, hence I need to handle backpressure.
I tried doing something like this:
S3.download(bckt, bcktKey).map {
  case Some((file, _)) =>
    file
      .via(CsvParsing.lineScanner())
      .map(_.map(_.utf8String)).drop(1) // drop headers
      .map(p => Foo(p.head, p(1)))
      .mapAsync(30) { p =>
        implicit val askTimeout: Timeout = Timeout(10.seconds)
        (httpClientActor ? p).mapTo[Buzz]
      }
      .mapAsync(1) {
        case b @ Buzz(_, _) =>
          (persistActor ? b).mapTo[Done]
      }
      .runWith(Sink.head)
}
The problem is that I see it reads only 30 lines from the file, which is the limit I set for parallelism. I am not sure this is the correct way to achieve what I'm looking for.
As Johny notes in his comment, the Sink.head is what causes the stream to only process about 30 elements. What happens is approximately:
Sink.head signals demand for 1 element
this demand propagates up through the second mapAsync
when the demand reaches the first mapAsync, since that one has parallelism 30, it signals demand for 30 elements
the CSV parsing stages emit 30 elements
when the response to the ask with the first element from the client actor is received, the response propagates down to the ask of the persist actor
demand is signaled for one more element from the CSV parsing stages
when the persist actor responds, the response goes to the sink
since the sink is Sink.head which cancels the stream once it receives an element, the stream gets torn down
any asks of the client actor which have been sent but are awaiting a response will still get processed
There's a bit of a race between the persist actor's response and the CSV parsing and sending an ask to the client actor: if the latter is faster, 31 lines might get processed by the client actor.
If you just want a Future[Done] after every element has been processed, Sink.last will work very well with this code.
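For example, keeping the pipeline from the question unchanged and only swapping the sink (a sketch; file, Foo, Buzz, httpClientActor and persistActor are assumed to be in scope exactly as above):

// Identical to the original stream, but terminated with Sink.last, so the
// materialized Future completes only after the last element has been persisted.
val allDone: Future[Done] =
  file
    .via(CsvParsing.lineScanner())
    .map(_.map(_.utf8String)).drop(1) // drop headers
    .map(p => Foo(p.head, p(1)))
    .mapAsync(30) { p =>
      implicit val askTimeout: Timeout = Timeout(10.seconds)
      (httpClientActor ? p).mapTo[Buzz]
    }
    .mapAsync(1) { case b @ Buzz(_, _) => (persistActor ? b).mapTo[Done] }
    .runWith(Sink.last)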
If the reason is not the usage of Sink.head as I mentioned in the comment, you can backpressure the stream using Sink.actorRefWithBackpressure.
Sample code:
class PersistActor extends Actor {
  override def receive: Receive = {
    case "init" =>
      println("Initialized")
    case "complete" =>
      context.stop(self)
    case message =>
      // persist the Buzz message here
      sender() ! Done
  }
}
val sink = Sink
  .actorRefWithBackpressure(persistActor, "init", Done, "complete", PartialFunction.empty)

S3.download(bckt, bcktKey).map {
  case Some((file, _)) =>
    file
      .via(CsvParsing.lineScanner())
      .map(_.map(_.utf8String)).drop(1) // drop headers
      .map(p => Foo(p.head, p(1)))
      // you could backpressure here too...
      .mapAsync(30) { p =>
        implicit val askTimeout: Timeout = Timeout(10.seconds)
        (httpClientActor ? p).mapTo[Buzz]
      }
      .to(sink)
      .run()
}

Forwarding (Downloading/Uploading) Large File via Akka HTTP / Akka Streams

I have a service that takes an HttpRequest from a client, gets a file from another server via REST, and then forwards the file to the client as an HttpResponse.
Don't ask me why the client doesn't ask for the file him/herself because that is a long story.
I put together a strategy that downloads the file to the file system and then sends it to the client. This uses extracts from other Stack Overflow answers from @RamonJRomeroyVigil.
def downloadFile(request: HttpRequest, fileName: String): Future[IOResult] = {
  Http().singleRequest(request).flatMap { response =>
    val source = response.entity.dataBytes
    source.runWith(FileIO.toPath(Paths.get(fileName)))
  }
}

def buildResponse(fileName: String): HttpResponse = {
  val bufferedSrc = scala.io.Source.fromFile(fileName)
  val source = Source
    .fromIterator(() => bufferedSrc.getLines())
    .map(ChunkStreamPart.apply)
  HttpResponse(entity = HttpEntity.Chunked(ContentTypes.`application/octet-stream`, source))
}
However, I would like to do this in one step, without saving the file to the file system, taking advantage of the streaming capabilities.
I would also like to limit the number of requests served at the same time to 5.
Thanks
As you are already getting the file as a stream from the second server, you can forward it directly to the client. You only need to build your HttpResponse on the fly:
def downloadFile(request: HttpRequest): Future[HttpResponse] = {
  Http().singleRequest(request).map {
    case okResponse @ HttpResponse(StatusCodes.OK, _, _, _) =>
      HttpResponse(
        entity = HttpEntity.Chunked(
          ContentTypes.`application/octet-stream`,
          okResponse
            .entity
            .dataBytes
            .map(ChunkStreamPart.apply)
        ))
    case nokResponse @ HttpResponse(_, _, _, _) =>
      nokResponse
  }
}
To change the maximum number of concurrent requests allowed for the client, you would need to set akka.http.host-connection-pool.max-connections and akka.http.host-connection-pool.max-open-requests. More details can be found here.
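For instance, a minimal sketch that caps the pool at 5 concurrent connections when creating the actor system; the values and the system name are illustrative, not from the original answer (note that max-open-requests must be a power of two):

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Illustrative settings overriding the defaults from reference.conf.
val poolConfig = ConfigFactory.parseString(
  """
    |akka.http.host-connection-pool {
    |  max-connections   = 5
    |  max-open-requests = 32
    |}
  """.stripMargin).withFallback(ConfigFactory.load())

implicit val system: ActorSystem = ActorSystem("file-proxy", poolConfig)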

How to wait for file upload stream to complete in Akka actor

Recently I started using Akka and I am using it to create a REST API with Akka HTTP to upload a file. The file can have millions of records, and for each record I need to perform some validation and business logic. The way I have modeled my actors is: the root actor receives the file stream, converts bytes to String and then splits the records by line separator. After doing this it sends the stream (record by record) to another actor for processing, which in turn distributes the records to other actors based on some grouping. To send the stream from the main root actor to the processing actor I am using Sink.actorRefWithAck.
This works fine for a small file, but for a large file what I have observed is that I get multiple chunks and only the first chunk gets processed. If I add Thread.sleep for a few seconds based on the load, then the whole file is processed. I am wondering whether there is any way to know that the stream has been completely consumed by the processing actor, so that I don't have to rely on Thread.sleep. Here is the code snippet that I have used:
val AckMessage = DefaultFileUploadProcessActor.Ack

val receiver = context.system.actorOf(
  Props(new DefaultFileUploadProcessActor(uuid, sourceId)(self, ackWith = AckMessage)))

// sent from stream to actor to indicate start, end or failure of stream:
val InitMessage = DefaultFileUploadProcessActor.StreamInitialized
val OnCompleteMessage = DefaultFileUploadProcessActor.StreamCompleted
val onErrorMessage = (ex: Throwable) => DefaultFileUploadProcessActor.StreamFailure(ex)

val actorSink = Sink.actorRefWithAck(
  receiver,
  onInitMessage = InitMessage,
  ackMessage = AckMessage,
  onCompleteMessage = OnCompleteMessage,
  onFailureMessage = onErrorMessage
)

val processStream =
  fileStream
    .map(byte => byte.utf8String.split(System.lineSeparator()))
    .runWith(actorSink)

Thread.sleep(9000)
log.info(s"completed distribution of data to the actors")
sender() ! ActionPerformed(uuid, "Done")
Any expert advice on the approach I have taken will be highly appreciated.
If you have a Source with only one file, you can await the stream completion by awaiting the Future that is returned from the runWith method.
If you have a Source of multiple files, you should write something like:
filesSource
  .mapAsync(1)(data => (receiver ? data).mapTo[ProcessingResult])
  .mapAsync(1)(processingResult => (resultListener ? processingResult).mapTo[ListenerResponse])
  .runWith(Sink.ignore)
Assuming that fileStream is a Source[ByteString, Future[IOResult]], one idea is to retain the materialized value of the source, then fire off the reply to sender() once this materialized value has completed:
val processStream: Future[IOResult] =
  fileStream
    .map(_.utf8String.split(System.lineSeparator()))
    .to(actorSink)
    .run()

processStream.onComplete {
  case Success(_) =>
    log.info("completed distribution of data to the actors")
    sender() ! ActionPerformed(uuid, "Done")
  case Failure(t) =>
    // ...
}
The above approach ensures that the entire file is consumed before the sender is notified.
Note that Akka Streams has a Framing object that can parse lines from a ByteString stream:
val processStream: Future[IOResult] =
  fileStream
    .via(Framing.delimiter(
      ByteString(System.lineSeparator()),
      maximumFrameLength = 256,
      allowTruncation = true))
    .map(_.utf8String)
    .to(actorSink) // the actor will have to expect String, not Array[String], messages
    .run()
The receiver actor will receive the OnCompleteMessage or the onErrorMessage when the stream has completed successfully or failed, so you should handle those messages in the receive block of the receiving DefaultFileUploadProcessActor.
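A minimal sketch of what such a receiving actor could look like; the message objects mirror the ones referenced in the snippet above, but the names and the processing body are illustrative placeholders, not the actual DefaultFileUploadProcessActor:

import akka.actor.{Actor, ActorLogging}

// Illustrative message protocol, mirroring Ack / StreamInitialized /
// StreamCompleted / StreamFailure from the question.
object FileUploadMessages {
  case object Ack
  case object StreamInitialized
  case object StreamCompleted
  final case class StreamFailure(ex: Throwable)
}

class FileUploadProcessActorSketch extends Actor with ActorLogging {
  import FileUploadMessages._

  override def receive: Receive = {
    case StreamInitialized =>
      sender() ! Ack                 // let the first element through
    case lines: Array[String] =>
      // validate and process the records here, then ack to get the next chunk
      sender() ! Ack
    case StreamCompleted =>
      log.info("upload stream fully consumed")
      // notify whoever is waiting for the upload result here
    case StreamFailure(ex) =>
      log.error(ex, "upload stream failed")
  }
}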

Why does Source.tick stop after one hundred HttpRequests?

Using Akka Streams and Akka HTTP, I have created a stream which polls an API every 3 seconds, unmarshals the result to a JsValue object and sends this result to an actor, as can be seen in the following code:
// Source which performs an HTTP request every 3 seconds.
val source = Source.tick(
  0.seconds,
  3.seconds,
  HttpRequest(uri = Uri(path = Path("/posts/1"))))

// Processes the result of the HTTP request.
val flow = Http().outgoingConnectionHttps("jsonplaceholder.typicode.com").mapAsync(1) {
  // Able to reach the API.
  case HttpResponse(StatusCodes.OK, _, entity, _) =>
    // Unmarshal the JSON response.
    Unmarshal(entity).to[JsValue]
  // Failed to reach the API.
  case HttpResponse(code, _, entity, _) =>
    entity.discardBytes()
    Future.successful(code.toString())
}

// Run the stream.
source.via(flow).runWith(
  Sink.actorRef[Any](processJsonActor, akka.actor.Status.Success("Completed stream")))
This works; however, the stream closes after 100 HttpRequests (ticks).
What is the cause of this behaviour?
Definitely something to do with outgoingConnectionHttps. This is a low-level DSL, and there could be some misconfigured setting somewhere that is causing this (although I couldn't figure out which one).
Usage of this DSL is actually discouraged by the docs.
Try using a higher-level DSL like the cached host connection pool:
val flow = Http().cachedHostConnectionPoolHttps[NotUsed]("akka.io").mapAsync(1) {
  // Able to reach the API.
  case (Success(HttpResponse(StatusCodes.OK, _, entity, _)), _) =>
    // Unmarshal the JSON response.
    Unmarshal(entity).to[String]
  // Failed to reach the API.
  case (Success(HttpResponse(code, _, entity, _)), _) =>
    entity.discardBytes()
    Future.successful(code.toString())
  case (Failure(e), _) =>
    throw e
}

// Run the stream.
source.map(_ -> NotUsed).via(flow).runWith(...)
A potential issue is that there is no backpressure signal with Sink.actorRef, so the actor's mailbox could be getting full. If the actor, whenever it receives a JsValue object, is doing something that could take a long time, use Sink.actorRefWithAck instead. For example:
val initMessage = "start"
val completeMessage = "done"
val ackMessage = "ack"

source
  .via(flow)
  .runWith(Sink.actorRefWithAck[Any](
    processJsonActor, initMessage, ackMessage, completeMessage))
You would need to change the actor to handle an initMessage and to reply to the stream with an ackMessage for every stream element (with sender() ! ackMessage). More information on Sink.actorRefWithAck can be found here.
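A minimal sketch of what processJsonActor could then look like; the handshake strings match the ones above, while the element handling is a placeholder and the spray-json JsValue import is an assumption (adjust it if another JSON library is used):

import akka.actor.{Actor, ActorLogging}
import spray.json.JsValue

// Acknowledges every element so the stream keeps signalling demand.
class ProcessJsonActor extends Actor with ActorLogging {
  override def receive: Receive = {
    case "start" =>                 // initMessage
      sender() ! "ack"
    case "done" =>                  // completeMessage
      log.info("stream completed")
    case json: JsValue =>           // a successfully unmarshalled response
      // process the JSON here (placeholder)
      sender() ! "ack"
    case other =>                   // e.g. a non-OK status code as a String
      log.warning(s"unexpected element: $other")
      sender() ! "ack"
  }
}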

Creating Enumerator from client-sent data

I have a REST service (Play Framework 2.0 w/Scala) that receives messages via a POST request.
I want to allow a user to see the queue of messages received in a webpage. I wanted to create an SSE channel between the browser and the server, so the server pushes new messages to the browser.
To create that SSE stream, as per documentation, I'm using a chain of Enumerator/Enumeratee/Iteratee.
My problem is: how do I inject the messages received from the POST request into the enumerator? So, given code like the following:
def receive(msg: String) = Action {
  sendToEnumerator()
  Ok
}

val enumerator = Enumerator.fromCallback( ??? )

def sseStream() = Action {
  Ok.stream(enumerator &> anotherEnumeratee ><> EventSource()).as("text/event-stream")
}
What should I put in both sendToEnumerator and enumerator (where the ??? are)? Or should I just use WebSockets and Actors instead? (I favour SSE due to broader compatibility, so I would like to use SSE if possible.)
Ok, found a way:
// The enumerator for pushing data to spread to all connected users
val hubEnum = Enumerator.imperative[String]()

// The hub used to get multiple outputs from a common input (the hubEnum)
val hub = Concurrent.hub[String](hubEnum)

// Converts messages to JSON for the web version
private val asJson: Enumeratee[String, JsValue] = Enumeratee.map[String] {
  text => JsObject(
    List(
      "eventName" -> JsString("eventName"),
      "text" -> JsString(text)
    )
  )
}

// Loads data into hubEnum
def receiveData(msg: String) = Action { implicit request =>
  hubEnum push msg
  Ok
}

// Reads from the hub and pushes back to clients
def stream = Action { implicit request =>
  Ok.stream(hub.getPatchCord &> asJson ><> EventSource()).as("text/event-stream")
}
The trick is to create an imperative Enumerator. This enumerator allows you to push data into it whenever it becomes available. With it you can then follow the standard procedure: create a Hub based on the enumerator, transform it with an Enumeratee, and send it back to browsers via SSE.
Thanks to this website for giving me the solution :)