Disclaimer: I am new to sttp and Monix, and that is my attempt to learn more about these libraries. My goal is to fetch data (client-side) from a given API via HTTP GET requests -> parse JSON responses -> write this information to a database. My question pertains to the first part only. My objective is to run get requests in an asynchronous (hopefully fast) way while having a way to either avoid or handle rate limits.
Below is a snippet of what I have already tried, and seems to work for a single request:
package com.github.client
import io.circe.{Decoder, HCursor}
import sttp.client._
import sttp.client.circe._
import sttp.client.asynchttpclient.monix._
import monix.eval.Task
object SO extends App {
case class Bla(paging: Int)
implicit val dataDecoder: Decoder[Bla] = (hCursor: HCursor) => {
for {
next_page <- hCursor.downField("foo").downArray.downField("bar").as[Int]
} yield Bla(next_page)
}
val postTask = AsyncHttpClientMonixBackend().flatMap { implicit backend =>
val r = basicRequest
.get(uri"https://foo.bar.io/v1/baz")
.header("accept", "application/json")
.header("Authorization", "hushh!")
.response(asJson[Bla])
r.send() // How can I instead of operating on a single request, operate on multiple
.flatMap { response =>
Task(response.body)
}
.guarantee(backend.close())
}
import monix.execution.Scheduler.Implicits.global
postTask.runSyncUnsafe() match {
case Left(error) => println(s"Error when executing request: $error")
case Right(data) => println(data)
}
}
My questions:
How can I operate on several GET Requests (instead of a single request) via using Monix, while keeping the code asynchronous and composable
How can I avoid or handle hitting rate-limits imposed by the api server
On a side note, I am also flexible in terms of using another back-end if that will support the rate limiting objective.
You can use monix.reactive.Observable like this
Observable.repeatEval(postTask) // we generate infinite observable of the task
.throttle(1.second, 3) // set throttling
.mapParallelOrderedF(2)(_.runToFuture) // set execution parallelism and execute tasks
.subscribe() // start the pipline
while (true) {}
Related
I'm accessing data from a REST endpoint :
"https://api-public.sandbox.pro.coinbase.com/products/BTC-EUR/ticker"
To access the data once per second I use an infinite loop while(true) { to invoke a message send to the actor once per second which begins the process of invoking the REST request:
The actor to access the data is:
object ProductTickerRestActor {
case class StringData(data: String)
}
class ProductTickerRestActor extends Actor {
override def receive: PartialFunction[Any, Unit] = {
case ProductTickerRestActor.StringData(data) =>
try {
println("in ProductTickerRestActor")
val rData = scala.io.Source.fromURL("https://api-public.sandbox.pro.coinbase.com/products/BTC-EUR/ticker").mkString
println("rData : "+rData)
}
catch {
case e: Exception =>
println("Exception thrown in ProductTickerRestActor: " + e.getMessage)
}
case msg => println(s"I cannot understand ${msg.toString}")
}
}
I start the application using:
object ExchangeModelDataApplication {
def main(args: Array[String]): Unit = {
val actorSystem = ActorSystemConfig.getActorSystem
val priceDataActor = actorSystem.actorOf(Props[ProductTickerRestActor], "ProductTickerRestActor")
val throttler = Throttlers.getThrottler(priceDataActor)
while(true) {
throttler ! ProductTickerRestActor.StringData("test")
Thread.sleep(1000)
}
}
Throttler:
object Throttlers {
implicit val materializer = ActorMaterializer.create(ActorSystemConfig.getActorSystem)
def getThrottler(priceDataActor: ActorRef) = Source.actorRef(bufferSize = 1000, OverflowStrategy.dropNew)
.throttle(1, 1.second)
.to(Sink.actorRef(priceDataActor, NotUsed))
.run()
}
How to run the following code asynchronously instead of blocking using an infinite loop? :
throttler ! ProductTickerRestActor.StringData("test")
Thread.sleep(1000)
Also, the throttler, in this case, maybe redundant as I'm throttling requests within the loop regardless.
I would just use Akka Streams for this along with Akka HTTP. Using Akka 2.6.x, something along these lines would be sufficient for 1 request/second
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.stream.scaladsl._
import scala.concurrent.duration._
object HTTPRepeatedly {
implicit val system = ActorSystem()
import system.dispatcher
val sourceFromHttp: Source[String, NotUsed] =
Source.repeated("test") // Not sure what "test" is actually used for here...
.throttle(1, 1.second)
.map { str =>
HttpRequest(uri = "https://api-public.sandbox.pro.coinbase.com/products/BTC-EUR/ticker")
}.mapAsync(1) { req =>
Http().singleRequest(req)
}.mapAsync(1)(_.entity.toStrict(1.minute))
.map(_.data.decodeString(java.nio.charset.StandardCharsets.UTF_8))
}
Then you could, for instance (for simplicity, put this in a main within HTTPRepeatedly so the implicits are in scope etc.)
val done: Future[Done] =
sourceFromHttp
.take(10) // stop after 10 requests
.runWith(Sink.foreach { rData => println(s"rData: $rData") })
scala.concurrent.Await.result(done, 11.minute)
system.terminate()
Sending a request every second is not a good idea. If, for some reason, the request is delayed, you are going to get a lot of requests piled up. Instead, send the next request one second after the previous requests completes.
Because this code uses a synchronous GET request, you can just send the next request one second after mkString returns.
But using synchronous requests is not a good way to consume a RESTful API in Akka. It blocks the actor receive method until the request is complete, which can eventually block the whole ActorSystem.
Instead, use Akka Http and singleRequest to perform an asynchronous request.
Http().singleRequest(HttpRequest(uri = "https://api-public.sandbox.pro.coinbase.com/products/BTC-EUR/ticker"))
This returns a Future. Make the new request one second after the request completes (e.g. using onComplete on the Future).
Not only is this safer and more asynchronous, it also provides much more control over the REST API call than fromUrl
I am currently learning and playing around with STTP using the Monix backend. I am mainly stuck with closing the backend after all my requests (each request is a task) have been processed.
I have created sample/mock code to resemble my issue (to my understanding my problem is more general rather than specific to my code):
import sttp.client.asynchttpclient.monix._
import monix.eval.Task
import monix.reactive.Observable
import sttp.client.{Response, UriContext}
import scala.concurrent.duration.DurationInt
object ObservableTest extends App {
val activities = AsyncHttpClientMonixBackend().flatMap { implicit backend =>
val ids: Task[List[Int]] = Task { (1 to 3).toList }
val f: String => Task[Response[Either[String, String]]] = (i: String) => fetch(uri"$i", "")
val data: Task[List[Task[Response[Either[String, String]]]]] = ids map (_ map (_ => f("https://heloooo.free.beeceptor.com/my/api/path")))
data.guarantee(backend.close()) // If I close the backend here, I can' generate requests after (when processing the actual requests in the list)
// I have attempted to return a Task containing a tuple of (data, backend) but closing the backend from outside of the scope did not work as I expected
}
import monix.execution.Scheduler.Implicits.global
val obs = Observable
.fromTask(activities)
.flatMap { listOfFetches =>
Observable.fromIterable(listOfFetches)
}
.throttle(3 second, 1)
.map(_.runToFuture)
obs.subscribe()
}
And my fetch (api call maker) function looks like:
def fetch(uri: Uri, auth: String)(implicit
backend: SttpBackend[Task, Observable[ByteBuffer], WebSocketHandler]
) = {
println(uri)
val task = basicRequest
.get(uri)
.header("accept", "application/json")
.header("Authorization", auth)
.response(asString)
.send()
task
}
As my main task contains other tasks, which I later need to process, I need to find an alternative way to close the Monix backend from outside. Is there a clean way to do close the backend after I consume the requests in List[Task[Response[Either[String, String]]]]?
The problems comes from the fact that with the sttp backend open, you are computing a list of tasks to be performed - the List[Task[Response[Either[String, String]]]], but you are not running them. Hence, we need to sequence running these tasks, before the backend closes.
The key thing to do here is to create a single description of a task, that runs all of these requests while the backend is still open.
Once you compute data (which itself is a task - a description of a computation - which, when run, yield a list of tasks - also descriptions of computations), we need to convert this into a single, non-nested Task. This can be done in a variety of ways (e.g. using simple sequencing), but in your case this will be using the Observable:
AsyncHttpClientMonixBackend().flatMap { implicit backend =>
val ids: Task[List[Int]] = Task { (1 to 3).toList }
val f: String => Task[Response[Either[String, String]]] = (i: String) => fetch(uri"$i", "")
val data: Task[List[Task[Response[Either[String, String]]]]] =
ids map (_ map (_ => f("https://heloooo.free.beeceptor.com/my/api/path")))
val activities = Observable
.fromTask(data)
.flatMap { listOfFetches =>
Observable.fromIterable(listOfFetches)
}
.throttle(3 second, 1)
.mapEval(identity)
.completedL
activities.guarantee(
backend.close()
)
}
First note that the Observable.fromTask(...) is inside the outermost flatMap, so is created when the backend is still open. We create the observable, throttle it etc., and then comes the crucial fact: once we have the throttled stream, we evaluate each item (each item is a Task[...] - a description of how to send na http request) using mapEval. We get a stream of Either[String, String], which are the results of the requests.
Finally, we convert the stream to a Task using .completedL (discarding the results), which waits until the whole stream completes.
This final task is then sequenced with closing the backend. The final sequence of side-effects that will happen, as described above, is:
open the backend
create the list of tasks (data)
create a stream, which throttles elements from the list computed by data
evaluate each item in the stream (send requests)
close the backend
I am using the BlazeClientBuilder[IO].resource method to get Client[IO]. Now, I want to mock the client for unit testing but cannot figure out how to do so. Is there a good way of mocking this and how would I do that?
class ExternalCall(val resource: Resource[IO, Client[IO]], externalServiceUrl: Uri) {
def retrieveData: IO[Either[Throwable, String]] = {
for {
req <- IO(Request[IO](Method.GET, uri = externalServiceUrl))
response <- resource.use(client => {
client.fetch[String](req)(httpResponse => {
if (!httpResponse.status.isSuccess)
throw new Exception(httpResponse.status.reason)
else
httpResponse.as[String]
})
})
} yield Right(response)
}
}
Caller code
new ExternalCall(BlazeClientBuilder[IO](global).resource).retrieveData
It seems you only need to do something like
val resourceMock = mock[Resource[IO, Client[IO]]]
//stub whatever is necessary
val call = new ExternalCall(resourceMock).retrieveData
//do asserts and verifications as needed
EDIT:
You can see a fully working example below, but I'd like to stress that this is a good example of why it is a good practice to avoid mocking APIs that you don't own.
A better way to test this would be to place the http4s related code witin a class you own (YourHttpClient or whatever) and write an integration test for that class that checks that the http4s client does the right thing (you can use wiremock to simulate a real http server).
Then you can pass mocks of YourHttpClient to the components that depend on it, with the advantage that you control its API so it will be simpler and if http4s ever updates its API you only have one breaking class rather than having to fix tens or hundreds of mock interactions.
BTW, the example is written using mockito-scala as using the Java version of mockito would have yielded code much harder to read.
val resourceMock = mock[Resource[IO, Client[IO]]]
val clientMock = mock[Client[IO]]
val response: Response[IO] = Response(Status.Ok,
body = Stream("Mocked!!!").through(text.utf8Encode),
headers = Headers(`Content-Type`(MediaType.text.plain, Charset.`UTF-8`)))
clientMock.fetch[String](any[Request[IO]])(*) shouldAnswer { (_: Request[IO], f: Response[IO] => IO[String]) =>
f(response)
}
resourceMock.use[String](*)(*) shouldAnswer { (f: Client[IO] => IO[String]) =>
f(clientMock)
}
val data = new ExternalCall(resourceMock, Uri.unsafeFromString("http://www.example.com")).retrieveData
data.unsafeRunSync().right.value shouldBe "Mocked!!!"
You can easly mock Client using following snippet
import fs2.Stream
import org.http4s.Response
import org.http4s.client.Client
def httpClient(body: String): Client[IO] = Client.apply[IO] { _ =>
Resource.liftF(IO(Response[IO](body = Stream.emits(body.getBytes("UTF-8")))))
}
In order to have the client as resource you need to wrap it with IO and lift to Resource
Resource.liftF(IO(httpClient("body")))
I am using a third party library to provide parsing services (user agent parsing in my case) which is not a thread safe library and has to operate on a single threaded basis. I would like to write a thread safe API that can be called by multiple threads to interact with it via Futures API as the library might introduce some potential blocking (IO). I would also like to provide back pressure when necessary and return a failed future when the parser doesn't catch up with the producers.
It could actually be a generic requirement/question, how to interact with any client/library which is not thread safe (user agents/geo locations parsers, db clients like redis, loggers collectors like fluentd), with back pressure in a concurrent environments.
I came up with the following formula:
encapsulate the parser within a dedicated Actor.
create an akka stream source queue that receives ParseReuqest that contains the user agent and a Promise to complete, and using the ask pattern via mapAsync to interact with the parser actor.
create another actor to encapsulate the source queue.
Is this the way to go? Is there any other way to achieve this, maybe simpler ? maybe using graph stage? can it be done without the ask pattern and less code involved?
the actor mentioned in number 3, is because I'm not sure if the source queue is thread safe or not ?? I wish it was simply stated in the docs, but it doesn't. there are multiple versions over the web, some stating it's not and some stating it is.
Is the source queue, once materialized, is thread safe to push elements from different threads?
(the code may not compile and is prone to potential failures, and is only intended for this question in place)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
implicit val askTimeout = Timeout(5 seconds)
// API to parser - delegates the request to the back pressure actor
def parse(userAgent: String): Future[Option[UserAgentData]] = {
val p = Promise[Option[UserAgentData]]
parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
p.future
}
// Actor to provide back pressure that delegates requests to parser actor
private class ParserBackPressureProvider extends Actor {
private val parser = context.actorOf(Props[UserAgentParserActor])
val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
.mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
.to(Sink.foreach({
case (result, promise) => promise.success(result)
}))
.run()
override def receive: Receive = {
case request: UserAgentParseRequest => queue.offer(request).map {
case QueueOfferResult.Enqueued =>
case _ => request.p.failure(new RuntimeException("parser busy"))
}
}
}
// Actor parser
private class UserAgentParserActor extends Actor {
private val up = new UserAgentParser(dbFilePath, true, 50000)
override def receive: Receive = {
case userAgent: String =>
sender ! Try {
up.parseUa(userAgent)
}.toOption.map(UserAgentData(userAgent, _))
}
}
private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])
private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity, scala/java hasd all the tools you need "out of the box":
class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
private implicit val ec = ExecutionContext
.fromExecutor(
new ThreadPoolExecutor(
1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(capacity)
)
)
def parse(ua: String): Future[Option[UserAgentData]] = try {
Future(Some(UserAgentData(ua, parser.parseUa(ua)))
.recover { _ => None }
} catch {
case _: RejectedExecutionException =>
Future.failed(new RuntimeException("parser is busy"))
}
}
I need to perform a single synchronous HTTP POST call: create an HTTP Post request with some data, connect to the server, send the request, receive the response, and close the connection. It is important to release all resources used to do this call.
Now I am doing it in Java with Apache Http Client. How can I do it with Scala dispatch library ?
Something like this should work (haven't tested it though)
import dispatch._, Defaults._
import scala.concurrent.Future
import scala.concurrent.duration._
def postSync(path: String, params: Map[String, Any] = Map.empty): Either[java.lang.Throwable, String] = {
val r = url(path).POST << params
val future = Http(r OK as.String).either
Await.result(future, 10.seconds)
}
(I'm using https://github.com/dispatch/reboot for this example)
You explicitly wait for the result of the future, which is either a String or an exception.
And use it like
postSync("http://api.example.com/a/resource", Map("param1" -> "foo") match {
case Right(res) => println(s"Success! Result was $res")
case Left(e) => println(s"Woops, something went wrong: ${e.getMessage}")
}
We all know that Scala's important feature is async. So there are lots of solutions to tell people how to do HTTP CALL ASYNC. For example, dispatch is the async http library. But here we care about how to implement sync http call by scala. I use a trick to do it.
Here is my solution:
val svc = url(#your_url)
val response = Http(svc OK dispatch.as.json4s.Json)
response onComplete {
// do_what_you_want
}
while(!response.isCompleted) {
Thread.sleep(1000)
}
// return_what_you_want
So here I use "isCompleted" to help me to wait until finish. So it is a trick to convert async to sync.