Future with RateLimiter - scala

Suppose I've got a simple blocking HTTP client like this:
def httpGet(url: URL): Future[String] = ???
Now I want to use httpGet to call a server with a request rate limit; e.g. 1000 requests/sec. Since the standard library does not provide a rate limiter I will use RateLimiter of Guava:
import com.google.common.util.concurrent.RateLimiter
import scala.concurrent.{ExecutionContext, Future, blocking}
def throttled[A](fut: => Future[A], rateLimiter: RateLimiter)
(implicit ec: ExecutionContext): Future[A] = {
Future(blocking(rateLimiter.acquire())).flatMap(_ => fut)
}
implicit val ec = ExecutionContext.global
val rateLimiter = RateLimiter.create(1000.0) // permits per second; Java methods cannot take named arguments
val throttledFuture = throttled(httpGet(url), rateLimiter)
Does it make sense?
Would you use another execution context to execute rateLimiter.acquire()?

Since you're using blocking around the acquire, it's okay, IMO.
Depending on how much work gets done in the thread which calls httpGet, if you're on Scala 2.13 it might be worth considering using the parasitic execution context.
Style nit, but it might be worth taking advantage of Scala's ability to use {'s around single argument lists:
def throttled[A](rateLimiter: RateLimiter)(fut: => Future[A])(implicit ec: ExecutionContext): Future[A]
val throttledFuture = throttled(rateLimiter) { httpGet(url) }
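On the second question (whether to run acquire on another execution context), one option is a small dedicated pool, so that waiting for permits never occupies threads of the main context. Below is a minimal sketch, not a definitive implementation. To keep it runnable without Guava, the limiter is abstracted as a hypothetical `acquire: () => Unit` parameter; with Guava you would pass `() => rateLimiter.acquire()`. The names `ThrottleSketch`, `permitEc` and the thread name are made up for illustration.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThrottleSketch {
  // Hypothetical dedicated pool, used only for waiting on permits.
  // A daemon thread is used so the pool does not keep the JVM alive.
  private val permitPool = Executors.newSingleThreadExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "permit-waiter")
      t.setDaemon(true)
      t
    }
  })
  val permitEc: ExecutionContext = ExecutionContext.fromExecutor(permitPool)

  // `acquire` stands in for the blocking rateLimiter.acquire() call (assumption).
  def throttled[A](acquire: () => Unit)(fut: => Future[A])(
      implicit ec: ExecutionContext): Future[A] =
    Future(acquire())(permitEc) // the blocking wait runs on the dedicated pool
      .flatMap(_ => fut)(ec)    // the actual work starts on the caller's context
}
```

Whether this is worth it over `blocking` on the global context probably depends on how many callers queue up for permits at once.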

Related

Convert EitherT[Future, A, B] to Option[B]

I have a method
def getInfoFromService(reqParams: Map[String, String]): Future[Either[CustomException, A]]
and I have another function
import cats.implicits._
def processInfoAndModelResponse(input: Future[Either[CustomException, A]]): Option[A] = {
for {
either <- input
resultA <- either
} yield resultA.some
}
So basically, I am trying to convert Future[Either[CustomException, A]] to Option[A]. The resultA <- either line in the code above does not compile, as map is not happy with the type.
What you try to do is basically something like
def convert[A](future: Future[Either[CustomException, A]]): Option[A] =
Try(Await.result(future, timeout)).toEither.flatten.toOption
This:
loses information about the error, which prevents any debugging (though you could log after .flatten)
blocks async operation which kills any benefit of using Future in the first place
Monads do not work like that. Basically you can handle several monadic calculations with for for the same monad, but Future and Either are different monads.
Try this:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
def processInfoAndModelResponse(input: Future[Either[CustomException, A]]): Option[A] = {
Await.result(input, 1.minute).toOption
}
@MateuszKubuszok's answer is different in that it doesn't throw.
You can't sensibly convert a Future[Either[CustomException, A]] to Option[A], as you don't know what the result will be until the Future has completed.
Future[Either[CustomException, A]] might just be kept that way, or put in IO as IO[Either[CustomException, A]]. If you don't care about CustomException, you can also capture it in Future's (or IO's) error handling mechanisms, or just discard it entirely.
Some options:
absorb the CustomException in Future's error handling:
val fa: Future[A] = fut.transform(_.flatMap(_.toTry))
ignore the CustomException and make the Future return None
val fa: Future[Option[A]] = fut.map(_.toOption)
if you really want to block a thread and wait for the result,
Await.result(input, Duration.Inf).toOption
but all the caveats of awaiting apply. See the documentation, reproduced here:
WARNING: It is strongly discouraged to supply lengthy timeouts since the progress of the calling thread will be suspended—blocked—until either the Awaitable has a result or the timeout expires.
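The two non-blocking options above can be sketched as follows. This is only an illustration: `CustomException` here is a hypothetical stand-in for the question's class (assumed to extend Exception, which `toTry` requires), and the names `EitherFutureSketch`, `absorb` and `discard` are made up.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EitherFutureSketch {
  // Hypothetical stand-in for the question's CustomException.
  final case class CustomException(msg: String) extends Exception(msg)

  // Option 1: move a Left into the Future's own failure channel.
  def absorb[A](fut: Future[Either[CustomException, A]]): Future[A] =
    fut.transform(_.flatMap(_.toTry))

  // Option 2: drop the error and keep only the success value, if any.
  def discard[A](fut: Future[Either[CustomException, A]]): Future[Option[A]] =
    fut.map(_.toOption)
}
```

Both keep the result inside a Future, so the caller decides where (or whether) to wait for it.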

Timeout a Future in Scala.js

I need to place a timeout on a Future in a cross-platform JVM / JS application. This timeout would only be used in tests, so a blocking solution wouldn't be that bad.
I implemented the following snippet to make the future timeout on JVM:
def runWithTimeout[T](timeoutMillis: Int)(f: => Future[T]) : Future[T] =
Await.ready(f, Duration.create(timeoutMillis, java.util.concurrent.TimeUnit.MILLISECONDS))
This doesn't work on Scala.js, as it has no implementation of Await. Is there any other solution to add a Timeout to a Future that works in both Scala.js and Scala JVM?
Your code doesn't really add a timeout to the existing future. That's not possible. What you're doing is setting a timeout for waiting for that future at that specific point. That, you can reproduce in a different, fully asynchronous way, by creating a future that will
resolve to f if it finishes within the given timeout
otherwise resolves to a failed TimeoutException
import scala.concurrent._
import scala.concurrent.duration.FiniteDuration
import scala.scalajs.js
def timeoutFuture[T](f: Future[T], timeout: FiniteDuration)(
implicit ec: ExecutionContext): Future[T] = {
val p = Promise[T]()
// js.timers.setTimeout takes a FiniteDuration (or a Double in milliseconds)
val timeoutHandle = js.timers.setTimeout(timeout) {
p.tryFailure(new TimeoutException)
}
f.onComplete { result =>
p.tryComplete(result)
// cancel the pending timeout once f has completed
js.timers.clearTimeout(timeoutHandle)
}
p.future
}
The above is written for Scala.js. You can write an equivalent one for the JVM, and place them in platform-dependent sources.
Alternatively, you can probably write something equivalent in terms of java.util.Timer, which is supported both on JVM and JS.
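A rough sketch of that java.util.Timer variant, under the assumption that a single shared daemon timer is acceptable for tests. The names `TimeoutSketch` and `timeoutFuture` are illustrative, and I have only reasoned about this on the JVM, so treat the Scala.js side as untested.

```scala
import java.util.{Timer, TimerTask}
import java.util.concurrent.TimeoutException
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration

object TimeoutSketch {
  // One shared timer; daemon = true so it does not keep the JVM alive.
  private val timer = new Timer(true)

  def timeoutFuture[T](f: Future[T], timeout: FiniteDuration)(
      implicit ec: ExecutionContext): Future[T] = {
    val p = Promise[T]()
    // Fails the promise if the timeout fires before f completes.
    val task = new TimerTask {
      def run(): Unit = {
        p.tryFailure(new TimeoutException(s"timed out after $timeout"))
        ()
      }
    }
    timer.schedule(task, timeout.toMillis)
    f.onComplete { result =>
      p.tryComplete(result) // no effect if the timeout already won
      task.cancel()         // drop the pending timeout task
    }
    p.future
  }
}
```

As with the Scala.js version, the original future keeps running after the timeout; only the returned future fails.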

Executing sequence of functions that return a future sequentially

I have a sequence of functions that return a future. I want to execute them sequentially i.e. after the first function future is complete, execute the next function and so on. Is there a way to do it?
ops: Seq[() => Future[Unit]]
You can combine all the futures into a single one with a foldLeft and flatMap:
def executeSequentially(ops: Seq[() => Future[Unit]])(
implicit exec: ExecutionContext
): Future[Unit] =
ops.foldLeft(Future.successful(()))((cur, next) => cur.flatMap(_ => next()))
foldLeft ensures the order from left to right and flatMap gives sequential execution. Functions are executed with the ExecutionContext, so calling executeSequentially is not blocking. And you can add callbacks or await on the resulting Future when/if you need it.
If you are using Twitter Futures, then I guess you won't need to pass ExecutionContext, but the general idea with foldLeft and flatMap should still work.
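To check the ordering claim, here is a small self-contained sketch. The `demo` method and its sleep-based ops are hypothetical: each op sleeps less than the previous one, so if they ran concurrently they would be logged in reverse order.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SequentialSketch {
  // Same foldLeft/flatMap approach as above.
  def executeSequentially(ops: Seq[() => Future[Unit]])(
      implicit ec: ExecutionContext): Future[Unit] =
    ops.foldLeft(Future.successful(()))((cur, next) => cur.flatMap(_ => next()))

  // Hypothetical demo: records the order in which the ops actually finish.
  def demo()(implicit ec: ExecutionContext): List[Int] = {
    val log = ListBuffer.empty[Int]
    val ops: Seq[() => Future[Unit]] = Seq(3, 2, 1).map { n =>
      () => Future { Thread.sleep(n * 20L); log.synchronized { log += n }; () }
    }
    // Awaiting here is only for the demo; normally you would keep the Future.
    Await.result(executeSequentially(ops), 5.seconds)
    log.synchronized(log.toList)
  }
}
```

Because each op is only invoked inside the previous flatMap, `demo()` returns List(3, 2, 1), the order of the input sequence.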
If given a Seq[Future[T]] you can convert it to a Future[Seq[T]] like so:
val a: Seq[Future[T]] = ???
val result: Future[Seq[T]] = Future.sequence(a)
a little less boilerplate than the above :)
I believe this should do it:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
def runSequentially(ops: Seq[() => Future[Unit]]): Unit = {
ops.foreach(f => Await.result(f(), Duration.Inf))
}
If you want to wait less than Duration.Inf, or stop at the first failure, that should be easy to do.

When does scala.concurrent.Future's execution start?

Of course one could collect the system time at the first line of the Future's body. But:
Is it possible to know that time without having access to the future's code? (In my case the method returning the future is to be provided by the user of the 'framework'.)
def f: Future[Int] = ...
def magicTimePeak: Long = ???
The Future itself doesn't really know this (nor was it designed to care). It's all up to the executor when the code will actually be executed. This depends on whether or not a thread is immediately available, and if not, when one becomes available.
You could wrap Future, to keep track of it, I suppose. It would involve creating an underlying Future with a closure that changes a mutable var within the wrapped class. Since you just want a Long, it would have to default to zero if the Future hasn't begun executing, though it would be trivial to change this to Option[Date] or something.
class WrappedFuture[A](thunk: => A)(implicit ec: ExecutionContext) {
@volatile var started: Long = 0L // written by the executor thread, read by callers
val underlying = Future {
started = System.nanoTime / 1000000 // milliseconds
thunk
}
}
To show that it works, create a fixed thread pool with one thread, then feed it a blocking task for, say, 5 seconds. Then create a WrappedFuture and check its started value later. Note the difference in the logged times.
import java.util.concurrent.Executors
import scala.concurrent._
val executorService = Executors.newFixedThreadPool(1)
implicit val ec = ExecutionContext.fromExecutorService(executorService)
scala> println("Before blocked: " + System.nanoTime / 1000000)
Before blocked: 13131636
scala> val blocker = Future(Thread.sleep(5000))
blocker: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@7e5d9a50
scala> val f = new WrappedFuture(1)
f: WrappedFuture[Int] = WrappedFuture@4c4748bf
scala> f.started
res13: Long = 13136779 // note the difference in time of about 5000 ms from earlier
If you don't control the creation of the Future, however, there is nothing you can do to figure out when it started.
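If you do control the creation, a variation on the wrapper avoids the "zero means not started" default by exposing the start time as its own Future. This is just a sketch of that idea; `TimedFuture` and `StartTimeSketch` are made-up names, and it uses System.currentTimeMillis (wall-clock milliseconds) rather than nanoTime.

```scala
import scala.concurrent.{ExecutionContext, Future, Promise}

object StartTimeSketch {
  final class TimedFuture[A](thunk: => A)(implicit ec: ExecutionContext) {
    private val startedPromise = Promise[Long]()

    // Completes with the wall-clock start time once the body begins running.
    val started: Future[Long] = startedPromise.future

    val underlying: Future[A] = Future {
      startedPromise.success(System.currentTimeMillis())
      thunk
    }
  }
}
```

Callers can then react to `started` with the usual combinators instead of polling a mutable field.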

Memory consumption of a parallel Scala Stream

I have written a Scala (2.9.1-1) application that needs to process several million rows from a database query. I am converting the ResultSet to a Stream using the technique shown in the answer to one of my previous questions:
class Record(...)
val resultSet = statement.executeQuery(...)
new Iterator[Record] {
def hasNext = resultSet.next()
def next = new Record(resultSet.getString(1), resultSet.getInt(2), ...)
}.toStream.foreach { record => ... }
and this has worked very well.
Since the body of the foreach closure is very CPU intensive, and as a testament to the practicality of functional programming, if I add a .par before the foreach, the closures get run in parallel with no other effort, except to make sure that the body of the closure is thread safe (it is written in a functional style with no mutable data except printing to a thread-safe log).
However, I am worried about memory consumption. Is the .par causing the entire result set to load in RAM, or does the parallel operation load only as many rows as it has active threads? I've allocated 4G to the JVM (64-bit with -Xmx4g) but in the future I will be running it on even more rows and worry that I'll eventually get an out-of-memory.
Is there a better pattern for doing this kind of parallel processing in a functional manner? I've been showing this application to my co-workers as an example of the value of functional programming and multi-core machines.
If you look at the Scaladoc of Stream, you will notice that par is defined in the Parallelizable trait. If you look at the source code of this trait, you will see that it takes each element of the original collection and puts it into a combiner, so every row gets loaded into a ParSeq:
def par: ParRepr = {
val cb = parCombiner
for (x <- seq) cb += x
cb.result
}
/** The default `par` implementation uses the combiner provided by this method
* to create a new parallel collection.
*
* @return a combiner for the parallel collection of type `ParRepr`
*/
protected[this] def parCombiner: Combiner[A, ParRepr]
A possible solution is to parallelize your computation explicitly, for example with actors. You can take a look at this example from the Akka documentation, which might be helpful in your context.
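Explicit parallelization does not have to mean actors. As a simpler alternative (not the actor approach mentioned above), here is a minimal sketch using only standard-library Futures: it pulls a bounded batch from the iterator, processes the batch in parallel, and only then pulls the next one, so at most `batchSize` rows should be held at a time. `foreachInBatches` and `BatchedParallelSketch` are hypothetical names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BatchedParallelSketch {
  // Processes `rows` in batches of `batchSize`; each batch runs in parallel
  // and must finish before the next batch is read from the iterator.
  def foreachInBatches[A](rows: Iterator[A], batchSize: Int)(work: A => Unit)(
      implicit ec: ExecutionContext): Unit =
    rows.grouped(batchSize).foreach { batch =>
      // Blocking the calling thread here is deliberate: it provides the back-pressure.
      Await.result(Future.traverse(batch)(row => Future(work(row))), Duration.Inf)
    }
}
```

With the question's code this would be called on the Iterator[Record] directly, without the .toStream step. The trade-off is that a slow row holds up its whole batch.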
The new akka stream library is the fix you're looking for:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Source, Sink}
def iterFromQuery() : Iterator[Record] = {
val resultSet = statement.executeQuery(...)
new Iterator[Record] {
def hasNext = resultSet.next()
def next = new Record(...)
}
}
def cpuIntensiveFunction(record : Record) = {
...
}
implicit val actorSystem = ActorSystem()
implicit val materializer = ActorMaterializer()
implicit val execContext = actorSystem.dispatcher
val poolSize = 10 //number of Records in memory at once
val stream =
Source.fromIterator(() => iterFromQuery()).runWith(Sink.foreachParallel(poolSize)(cpuIntensiveFunction))
stream onComplete {_ => actorSystem.terminate()}