I am currently learning and playing around with sttp using the Monix backend. I am mainly stuck on closing the backend after all my requests (each request is a task) have been processed.
I have created sample/mock code that reproduces my issue (to my understanding the problem is general rather than specific to my code):
import sttp.client.asynchttpclient.monix._
import monix.eval.Task
import monix.reactive.Observable
import sttp.client.{Response, UriContext}
import scala.concurrent.duration.DurationInt
object ObservableTest extends App {

  val activities = AsyncHttpClientMonixBackend().flatMap { implicit backend =>
    val ids: Task[List[Int]] = Task { (1 to 3).toList }
    val f: String => Task[Response[Either[String, String]]] = (i: String) => fetch(uri"$i", "")
    val data: Task[List[Task[Response[Either[String, String]]]]] =
      ids map (_ map (_ => f("https://heloooo.free.beeceptor.com/my/api/path")))
    data.guarantee(backend.close()) // If I close the backend here, I can't generate requests afterwards (when processing the actual requests in the list)
    // I have attempted to return a Task containing a tuple of (data, backend), but closing the backend from outside of that scope did not work as I expected
  }

  import monix.execution.Scheduler.Implicits.global

  val obs = Observable
    .fromTask(activities)
    .flatMap { listOfFetches =>
      Observable.fromIterable(listOfFetches)
    }
    .throttle(3.second, 1)
    .map(_.runToFuture)

  obs.subscribe()
}
And my fetch (api call maker) function looks like:
def fetch(uri: Uri, auth: String)(implicit
    backend: SttpBackend[Task, Observable[ByteBuffer], WebSocketHandler]
) = {
  println(uri)
  val task = basicRequest
    .get(uri)
    .header("accept", "application/json")
    .header("Authorization", auth)
    .response(asString)
    .send()
  task
}
As my main task contains other tasks, which I later need to process, I need to find an alternative way to close the Monix backend from the outside. Is there a clean way to close the backend after I consume the requests in the List[Task[Response[Either[String, String]]]]?
The problem comes from the fact that while the sttp backend is open, you are computing a list of tasks to be performed - the List[Task[Response[Either[String, String]]]] - but you are not running them. Hence, we need to sequence running these tasks before the backend closes.
The key thing to do here is to create a single description of a task that runs all of these requests while the backend is still open.
Once you compute data (which is itself a task - a description of a computation - which, when run, yields a list of tasks - also descriptions of computations), we need to convert this into a single, non-nested Task. This can be done in a variety of ways (e.g. using simple sequencing), but in your case this will be using the Observable:
AsyncHttpClientMonixBackend().flatMap { implicit backend =>
  val ids: Task[List[Int]] = Task { (1 to 3).toList }
  val f: String => Task[Response[Either[String, String]]] = (i: String) => fetch(uri"$i", "")
  val data: Task[List[Task[Response[Either[String, String]]]]] =
    ids map (_ map (_ => f("https://heloooo.free.beeceptor.com/my/api/path")))

  val activities = Observable
    .fromTask(data)
    .flatMap { listOfFetches =>
      Observable.fromIterable(listOfFetches)
    }
    .throttle(3.second, 1)
    .mapEval(identity)
    .completedL

  activities.guarantee(
    backend.close()
  )
}
First, note that the Observable.fromTask(...) is inside the outermost flatMap, so it is created while the backend is still open. We create the observable, throttle it etc., and then comes the crucial step: once we have the throttled stream, we evaluate each item (each item is a Task[...] - a description of how to send an HTTP request) using mapEval. We get a stream of Response[Either[String, String]] values, which are the results of the requests.
Finally, we convert the stream to a Task using .completedL (discarding the results), which waits until the whole stream completes.
This final task is then sequenced with closing the backend. The final sequence of side-effects that will happen, as described above, is:
open the backend
create the list of tasks (data)
create a stream, which throttles elements from the list computed by data
evaluate each item in the stream (send requests)
close the backend
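For completeness, the "simple sequencing" alternative mentioned above could look like the sketch below (it belongs inside the same flatMap that opens the backend). Note that Task.sequence runs the requests one after another, without any throttling:

val all: Task[List[Response[Either[String, String]]]] =
  data
    .flatMap(Task.sequence(_)) // flatten Task[List[Task[...]]] into a single task
    .guarantee(backend.close())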
Related
Disclaimer: I am new to sttp and Monix, and this is my attempt to learn more about these libraries. My goal is to fetch data (client-side) from a given API via HTTP GET requests -> parse JSON responses -> write this information to a database. My question pertains to the first part only. My objective is to run GET requests in an asynchronous (hopefully fast) way while having a way to either avoid or handle rate limits.
Below is a snippet of what I have already tried; it seems to work for a single request:
package com.github.client
import io.circe.{Decoder, HCursor}
import sttp.client._
import sttp.client.circe._
import sttp.client.asynchttpclient.monix._
import monix.eval.Task
object SO extends App {

  case class Bla(paging: Int)

  implicit val dataDecoder: Decoder[Bla] = (hCursor: HCursor) => {
    for {
      next_page <- hCursor.downField("foo").downArray.downField("bar").as[Int]
    } yield Bla(next_page)
  }

  val postTask = AsyncHttpClientMonixBackend().flatMap { implicit backend =>
    val r = basicRequest
      .get(uri"https://foo.bar.io/v1/baz")
      .header("accept", "application/json")
      .header("Authorization", "hushh!")
      .response(asJson[Bla])

    r.send() // How can I, instead of operating on a single request, operate on multiple requests?
      .flatMap { response =>
        Task(response.body)
      }
      .guarantee(backend.close())
  }

  import monix.execution.Scheduler.Implicits.global

  postTask.runSyncUnsafe() match {
    case Left(error) => println(s"Error when executing request: $error")
    case Right(data) => println(data)
  }
}
My questions:
How can I operate on several GET requests (instead of a single request) using Monix, while keeping the code asynchronous and composable?
How can I avoid or handle hitting rate limits imposed by the API server?
On a side note, I am also flexible about using another backend if that supports the rate-limiting objective.
You can use monix.reactive.Observable like this:
Observable
  .repeatEval(postTask)                  // generate an infinite observable of the task
  .throttle(1.second, 3)                 // set throttling
  .mapParallelOrderedF(2)(_.runToFuture) // set execution parallelism and execute tasks
  .subscribe()                           // start the pipeline

while (true) {} // keep the main thread alive (demo only)
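As a side note, the busy-wait at the end can be avoided by expressing the whole pipeline as a single Task and blocking on that instead. A sketch, assuming the same postTask and implicit Scheduler as in the question (with .take(10) added so the otherwise infinite stream terminates):

Observable
  .repeatEval(postTask)
  .take(10)                        // bound the infinite stream so it can complete
  .throttle(1.second, 3)
  .mapParallelOrdered(2)(identity) // run each task with a parallelism of 2
  .completedL                      // a Task that completes when the stream does
  .runSyncUnsafe()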
Given a function A => IO[B] (aka Kleisli[IO, A, B]) that is meant to be called multiple times and has side effects, like updating a DB, how can such calls be delegated to a stream (I guess a Pipe[IO, A, B]) (fs2, Monix Observable/Iterant)? The reason for this is to be able to accumulate state, batch calls together over a time window, etc.
More concretely, an http4s server requires a Request => IO[Response], so I am looking at how to operate on streams (for the above benefits) while ultimately providing such a function to http4s.
I suspect it will need some correlation ID behind the scenes and I am fine with that; I am more interested in how to do it safely and properly from an FP perspective.
Ultimately, the signature I expect is probably something like:
Pipe[IO, A, B] => (A => IO[B]), such that calls to the Kleisli are piped through the pipe.
As an afterthought, would it be at all possible to backpressure?
One idea is to model it with MPSC (Multiple Producer Single Consumer). I'll give an example with Monix since I'm more familiar with it, but the idea stays the same even if you use FS2.
import cats.effect.concurrent.Deferred
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import monix.reactive.MulticastStrategy
import monix.reactive.subjects.ConcurrentSubject
import scala.concurrent.duration._
import scala.util.Random

object MPSC extends App {

  sealed trait Event
  object Event {
    // You'll need a promise in order to send the response back to the user
    case class SaveItem(num: Int, promise: Deferred[Task, Int]) extends Event
  }
  import Event._

  // For backpressure, take a look at `PublishSubject`.
  val cs = ConcurrentSubject[Event](MulticastStrategy.Publish)

  def pushEvent(num: Int) =
    for {
      promise <- Deferred[Task, Int]
      _ <- Task.delay(cs.onNext(SaveItem(num, promise)))
    } yield promise

  // You get a list of events now since it is buffered.
  // Monix has a lot of buffer strategies; check the docs for more details.
  def processEvents(items: Seq[Event]): Task[Unit] =
    Task.delay(println(s"Items: $items")) >>
      Task.traverse(items) {
        case SaveItem(_, promise) => promise.complete(Random.nextInt(100))
      }.void

  val app = for {
    // Start the stream in the background
    _ <- cs
      .bufferTimed(3.seconds) // Buffer all events within 3 seconds
      .filter(_.nonEmpty)
      .mapEval(processEvents)
      .completedL
      .startAndForget
    _ <- Task.sleep(1.second)
    p1 <- pushEvent(10)
    p2 <- pushEvent(20)
    p3 <- pushEvent(30)
    // Wait for the promises to complete; you'll do this for each request
    x <- p1.get
    y <- p2.get
    z <- p3.get
    _ <- Task.delay(println(s"Completed promises: [$x, $y, $z]"))
  } yield ()

  app.runSyncUnsafe()
}
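For comparison, here is a rough, untested sketch of how the same MPSC idea could look with fs2 and cats-effect 2, shaped as the Pipe[IO, A, B] => (A => IO[B]) conversion the question asks for (the mkHandler helper is hypothetical; the pipe is expected to complete each Deferred it receives):

import cats.effect.{ContextShift, IO}
import cats.effect.concurrent.Deferred
import fs2.concurrent.Queue

// Pair each request with a Deferred, push it onto a queue, and run the
// user-supplied pipe over the queue in the background.
def mkHandler[A, B](
    pipe: fs2.Pipe[IO, (A, Deferred[IO, B]), Unit]
)(implicit cs: ContextShift[IO]): IO[A => IO[B]] =
  for {
    q <- Queue.unbounded[IO, (A, Deferred[IO, B])]
    _ <- q.dequeue.through(pipe).compile.drain.start // run the stream in the background
  } yield { (a: A) =>
    for {
      d <- Deferred[IO, B]
      _ <- q.enqueue1((a, d))
      b <- d.get // suspend until the pipe completes the promise
    } yield b
  }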
I am using a third-party library to provide parsing services (user agent parsing in my case). The library is not thread safe and has to operate on a single-threaded basis. I would like to write a thread-safe API that can be called by multiple threads via the Futures API, since the library might introduce some potential blocking (IO). I would also like to provide backpressure when necessary and return a failed Future when the parser can't keep up with the producers.
It is actually a generic requirement/question: how do you interact with any client/library which is not thread safe (user agent/geo location parsers, DB clients like Redis, log collectors like fluentd) with backpressure in a concurrent environment?
I came up with the following formula:
encapsulate the parser within a dedicated Actor.
create an Akka Streams source queue that receives a ParseRequest containing the user agent and a Promise to complete, and use the ask pattern via mapAsync to interact with the parser actor.
create another actor to encapsulate the source queue.
Is this the way to go? Is there any other way to achieve this, maybe simpler? Maybe using a GraphStage? Can it be done without the ask pattern and with less code involved?
The actor mentioned in point 3 exists because I'm not sure whether the source queue is thread safe or not. I wish this were simply stated in the docs, but it isn't; there are multiple claims around the web, some saying it's not and some saying it is.
Is the source queue, once materialized, thread safe for pushing elements from different threads?
(The code may not compile and is prone to potential failures; it is only intended to illustrate this question.)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
  import akka.pattern.ask
  import akka.util.Timeout
  import scala.concurrent.duration._

  implicit val askTimeout = Timeout(5 seconds)

  // API to parser - delegates the request to the back pressure actor
  def parse(userAgent: String): Future[Option[UserAgentData]] = {
    val p = Promise[Option[UserAgentData]]
    parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
    p.future
  }

  // Actor to provide back pressure that delegates requests to parser actor
  private class ParserBackPressureProvider extends Actor {

    private val parser = context.actorOf(Props[UserAgentParserActor])

    val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
      .mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
      .to(Sink.foreach({
        case (result, promise) => promise.success(result)
      }))
      .run()

    override def receive: Receive = {
      case request: UserAgentParseRequest => queue.offer(request).map {
        case QueueOfferResult.Enqueued =>
        case _ => request.p.failure(new RuntimeException("parser busy"))
      }
    }
  }

  // Actor parser
  private class UserAgentParserActor extends Actor {

    private val up = new UserAgentParser(dbFilePath, true, 50000)

    override def receive: Receive = {
      case userAgent: String =>
        sender ! Try {
          up.parseUa(userAgent)
        }.toOption.map(UserAgentData(userAgent, _))
    }
  }

  private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])

  private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity; Scala/Java has all the tools you need out of the box:
import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
  // A single-threaded executor with a bounded work queue: all parser calls
  // are serialized on one thread, and submissions beyond `capacity` are rejected.
  private implicit val ec = ExecutionContext
    .fromExecutor(
      new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity)
      )
    )

  def parse(ua: String): Future[Option[UserAgentData]] = try {
    Future(Some(UserAgentData(ua, parser.parseUa(ua))))
      .recover { case _ => None }
  } catch {
    case _: RejectedExecutionException =>
      Future.failed(new RuntimeException("parser is busy"))
  }
}
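For reference, a hypothetical usage sketch (the UserAgentParser constructor arguments are copied from the question's code and may differ in the real library):

import scala.concurrent.ExecutionContext
import scala.util.{Failure, Success}

val facade = new ParserFacade(new UserAgentParser("path/to/db", true, 50000))

facade.parse("Mozilla/5.0 (X11; Linux x86_64)").onComplete {
  case Success(data) => println(data) // Some(UserAgentData(...)), or None if parsing failed
  case Failure(err)  => println(err)  // "parser is busy" when the bounded queue is full
}(ExecutionContext.global)

The single-threaded executor serializes all access to the non-thread-safe parser, and the bounded queue provides the backpressure: once capacity submissions are pending, Future.apply fails fast with a RejectedExecutionException instead of queueing more work.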
I've got a SourceQueue. When I offer an element to it, I want it to pass through the stream, and when it reaches the Sink I want the output returned to the code that offered this element (similar to how Sink.head returns an element to the RunnableGraph.run() call).
How do I achieve this? A simple example of my problem would be:
val source = Source.queue[String](100, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.ReturnTheStringSomehow
val graph = source.via(flow).to(sink).run()
val x = graph.offer("foo")
println(x) // Output should be "Modified foo"
val y = graph.offer("bar")
println(y) // Output should be "Modified bar"
val z = graph.offer("baz")
println(z) // Output should be "Modified baz"
Edit: for the example given in this question, Vladimir Matveev provided the best answer. However, it should be noted that his solution only works if the elements reach the sink in the same order they were offered to the source. If this cannot be guaranteed, the order of the elements in the sink may differ, and the outcome might be different from what is expected.
I believe it is simpler to use the already existing primitive for pulling values from a stream, called Sink.queue. Here is an example:
val source = Source.queue[String](128, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(1, 1))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
def getNext: String = Await.result(sinkQueue.pull(), 1.second).get
sourceQueue.offer("foo")
println(getNext)
sourceQueue.offer("bar")
println(getNext)
sourceQueue.offer("baz")
println(getNext)
It does exactly what you want.
Note that setting the inputBuffer attribute for the queue sink may or may not be important for your use case - if you don't set it, the buffer will be zero-sized and the data won't flow through the stream until you invoke the pull() method on the sink.
sinkQueue.pull() yields a Future[Option[T]], which will be completed successfully with Some if the sink receives an element, or with a failure if the stream fails. If the stream completes normally, it will be completed with None. In this particular example I'm ignoring this by using Option.get, but you would probably want to add custom logic to handle this case.
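For instance, a minimal way of handling completion instead of calling .get (a sketch):

Await.result(sinkQueue.pull(), 1.second) match {
  case Some(element) => println(element)
  case None          => println("stream completed")
}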
Well, you know what the offer() method returns if you take a look at its definition :) What you can do is create a Source.queue[(Promise[String], String)], write a helper function that pushes a pair to the stream via offer (making sure offer doesn't fail because the queue might be full), then complete the promise inside your stream and use the promise's future to observe completion in external code.
I do that to throttle the rate to an external API used from multiple places in my project.
Here is how it looked in my project before Typesafe added Hub sources to Akka:
import scala.concurrent.Promise
import scala.concurrent.Future
import java.util.concurrent.ConcurrentLinkedDeque
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import scala.util.Success
private val queue = Source.queue[(Promise[String], String)](100, OverflowStrategy.backpressure)
  .toMat(Sink.foreach({ case (p, param) =>
    p.complete(Success(param.reverse))
  }))(Keep.left)
  .run

private val futureDeque = new ConcurrentLinkedDeque[Future[String]]()

private def sendQueuedRequest(request: String): Future[String] = {
  val p = Promise[String]
  val offerFuture = queue.offer(p -> request)

  def addToQueue(future: Future[String]): Future[String] = {
    futureDeque.addLast(future)
    future.onComplete(_ => futureDeque.remove(future))
    future
  }

  offerFuture.flatMap {
    case QueueOfferResult.Enqueued =>
      addToQueue(p.future)
  }.recoverWith {
    case ex =>
      val first = futureDeque.pollFirst()
      if (first != null)
        addToQueue(first.flatMap(_ => sendQueuedRequest(request)))
      else
        sendQueuedRequest(request)
  }
}
I realize that the blocking synchronized queue may be a bottleneck and may grow indefinitely, but because the API calls in my project are made only from other Akka streams, which are backpressured, I never have more than a dozen items in futureDeque. Your situation may differ.
If you create a MergeHub.source[(Promise[String], String)]() instead, you'll get a reusable sink. Then, every time you need to process an item, you'll create a complete graph and run it. In that case you won't need the hacky Java container to queue requests.
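A sketch of that MergeHub variant (it reuses the imports above and assumes an implicit Materializer in scope; sendHubRequest is a hypothetical counterpart of sendQueuedRequest):

import akka.NotUsed
import akka.stream.scaladsl.MergeHub

// Materializing a MergeHub source yields a Sink that can be attached to
// any number of independently materialized graphs.
private val reusableSink: Sink[(Promise[String], String), NotUsed] =
  MergeHub.source[(Promise[String], String)](perProducerBufferSize = 16)
    .to(Sink.foreach { case (p, param) => p.complete(Success(param.reverse)) })
    .run()

private def sendHubRequest(request: String): Future[String] = {
  val p = Promise[String]
  Source.single(p -> request).runWith(reusableSink)
  p.future
}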
So I am trying to get a small Play app communicating with another REST service.
The idea is to receive a request on the Play side, then make a request to the REST API and feed parts of the result to another local actor before displaying the response from the local actor and the REST service in the browser.
I tried to do it with streams and got it all working, but I am absolutely not happy with the part where I talk to my local actor and create a Future[(Future[String], Future[String])] tuple, so I would be happy if you could point me in the direction of how to do this in an elegant and clean way.
So here is my code. The input is a csv file.
My local actor creates an additional graphic I want to put into the response.
def upload = Action.async(parse.multipartFormData) { request =>
  request.body.file("input").map { inputCsv =>
    // csv to list of strings
    val inputList: List[String] = convertFileToList(inputCsv)
    // http request to rest service
    val responseFuture: Future[HttpResponse] = httpRequest(inputList, "/path", 4321, "0.0.0.0")
    // pattern match response and ask local actor
    val formattedResult = responseFuture.flatMap { response =>
      response.status match {
        case akka.http.scaladsl.model.StatusCodes.OK =>
          val resultTeams = Unmarshal(response.entity).to[CustomResultCaseClass]
          // the part I'd like to improve
          val tupleFuture = resultTeams.map(result =>
            (Future(result.teams.reduce(_ + "," + _)),
             plotter.ask(PlotData(result.eval)).mapTo[ChartPath].flatMap(plotAnswer => Future(plotAnswer.path))))
          tupleFuture.map(tuple => tuple._1.map(teams =>
            tuple._2.map(chartPath => Ok(views.html.upload(teams))(chartPath)))).flatMap(a => a).flatMap(b => b)
      }
    }
    formattedResult
  }.getOrElse(Future(play.api.mvc.Results.BadRequest))
}
For-comprehensions are useful for this type of use case. A basic example which demonstrates the refactoring involved:
val teamFut = Future(result.teams.reduce(_ + "," + _))

// I think the final flatMap is unnecessary - it should be a plain .map(_.path) -
// but I wanted to replicate the question code's functionality
val pathFut = plotter.ask(PlotData(result.eval))
  .mapTo[ChartPath]
  .flatMap(plotAnswer => Future(plotAnswer.path))

val okFut =
  for {
    teams <- teamFut
    chartPath <- pathFut
  } yield Ok(views.html.upload(teams))(chartPath)
Note: the initial Futures must be instantiated outside of the for-comprehension, otherwise parallel execution won't occur.
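For contrast, a sketch of the sequential version: when the Futures are declared inline, the ask is not even created until the first Future has completed, so the two computations cannot overlap:

for {
  teams     <- Future(result.teams.reduce(_ + "," + _)) // starts first
  chartPath <- plotter.ask(PlotData(result.eval))       // created only after `teams` is ready
                 .mapTo[ChartPath]
                 .map(_.path)
} yield Ok(views.html.upload(teams))(chartPath)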