Is there an overhead because of nesting Futures - scala

I wrote this code:
package com.abhi
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
object FutureNesting extends App {
  // `future` is by-name, so the clock starts before the Future is created
  def measure(future: => Future[Unit]): Future[Long] = {
    val start = System.currentTimeMillis()
    val t = future
    t.map { _ =>
      val end = System.currentTimeMillis()
      end - start
    }
  }
  // onSuccess is deprecated (and removed in Scala 2.13); foreach runs only on success
  measure(Future { Thread.sleep(10000) }).foreach(a => println(a))
  scala.io.StdIn.readLine()
}
So how many threads am I using in this code? The broader question is: what is the impact of nesting futures inside futures?
I ran the application above and observed it using VisualVM.
The application launched two threads, ForkJoinPool-1-worker-5 and ForkJoinPool-2-worker-3. However, it launches the same two threads even if I remove the nesting, so I am not sure what overhead nesting the futures like this adds.
Edit: Some people said it depends on the type of thread pool (ForkJoin etc.).
I don't know what type of pool Akka HTTP or Spray uses. I planned to use a code snippet similar to the one above in a Spray web service, to measure the performance of the web service using Futures.

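In your case, you are using a wrapper over a thread pool (ForkJoinPool from java.util.concurrent), which this import provides. Of course, all Futures are executed in it:
import scala.concurrent.ExecutionContext.Implicits.global
If you want to inspect that pool, instantiate it yourself as an implicit value instead of using the import, like this:
val pool = new java.util.concurrent.ForkJoinPool()
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
Then call ForkJoinPool's getActiveThreadCount() (or getPoolSize()) on it.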
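A minimal sketch of the approach above, using only the standard library. The object name PoolInspection and the parallelism of 4 are illustrative assumptions. It reuses the shape of measure() from the question to show that map only registers a callback that later runs on the same pool; it does not start a dedicated thread per nested Future.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolInspection {
  // Our own pool instead of ExecutionContext.Implicits.global, so we keep a
  // reference to it and can ask it about its threads.
  val pool = new ForkJoinPool(4)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Same shape as measure() in the question: map registers a callback that
  // runs on `ec` once `future` completes; no extra thread is started for it.
  def measure(future: => Future[Unit]): Future[Long] = {
    val start = System.nanoTime()
    future.map(_ => (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val elapsedMs = Await.result(measure(Future(Thread.sleep(200))), 5.seconds)
    println(s"elapsed: $elapsedMs ms")
    // Worker threads created so far, and how many are running tasks right now.
    println(s"pool size: ${pool.getPoolSize}, active: ${pool.getActiveThreadCount}")
    pool.shutdown()
  }
}
```

Running it with and without the map shows the same bounded set of workers; the nesting costs a callback scheduling on the pool, not extra threads.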
Second approach:
You can open a profiler (such as JProfiler from ej-technologies, or jvisualvm, which shipped with JDK 8 and earlier) and watch metadata including thread parameters such as their count, activity, memory usage, etc.

Related

What is the reason for an IO application not finishing its execution?

Why does this hang?
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import com.typesafe.config.ConfigFactory
import io.circe.config.parser
import org.typelevel.log4cats._
import org.typelevel.log4cats.slf4j._
object ItsElenApp extends App:
  private val logger: SelfAwareStructuredLogger[IO] = LoggerFactory[IO].getLogger
  val ops = for
    _ <- logger.info("aa")
    _ <- IO(println("bcc"))
    _ <- logger.error("bb")
  yield ()
  ops.unsafeRunSync()
  println("end")
It prints:
INFO 2022-06-19 11:56:25,303 ItsElenApp - aa
bcc
and keeps running. Is it the log4cats library, or am I using the App object the wrong way? Or do I have to close an execution context?
The recommended way of running cats.effect.IO-based apps is using cats.effect.IOApp (or cats.effect.ResourceApp):
object MyApp extends IOApp:
  // notice it takes List[String] rather than Array[String]
  def run(args: List[String]): IO[ExitCode] = ...
This runs the application, sets up the exit code, etc. It shuts the application down when run completes, which might be necessary if the default thread pool uses non-daemon threads. If you don't want to use IOApp, you might need to shut down the JVM manually, also taking into account that .unsafeRunSync() might throw an exception.
Extending App, on the other hand, is generally not recommended. Why? It uses the special DelayedInit mechanism, where the whole body of the object (its constructor) runs lazily. This makes it harder to reason about initialization, which is why this mechanism was deprecated/discouraged. If you are not using IOApp, it is better to implement things like:
object MyProgram:
  // Java requires this signature
  def main(args: Array[String]): Unit = ...
which in your case could look like
object ItsElenApp:
  private val logger: SelfAwareStructuredLogger[IO] = LoggerFactory[IO].getLogger
  def main(args: Array[String]): Unit =
    val ops = for
      _ <- logger.info("aa")
      _ <- IO(println("bcc"))
      _ <- logger.error("bb")
    yield ()
    // IOApp could spare you this
    try
      ops.unsafeRunSync()
    finally
      println("end")
      sys.exit(0)
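The point about non-daemon thread pools can be seen with the standard library alone. This sketch does not use cats-effect, and the names DaemonThreads, daemonFactory and runsOnDaemon are illustrative. The JVM exits after main returns only once every remaining thread is a daemon thread, so a pool of non-daemon workers that is never shut down keeps the process running.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DaemonThreads {
  // Marks every worker thread as a daemon, so these threads alone never keep
  // the JVM running after main returns.
  val daemonFactory: ThreadFactory = (r: Runnable) => {
    val t = new Thread(r)
    t.setDaemon(true)
    t
  }

  // Reports whether the thread that runs a Future body on `pool` is a daemon.
  def runsOnDaemon(pool: ExecutorService): Boolean = {
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    Await.result(Future(Thread.currentThread().isDaemon), 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    val daemonPool = Executors.newFixedThreadPool(2, daemonFactory)
    val defaultPool = Executors.newFixedThreadPool(2) // non-daemon workers
    println(s"daemon pool: ${runsOnDaemon(daemonPool)}")
    println(s"default pool: ${runsOnDaemon(defaultPool)}")
    // Without this call the default pool's idle workers would keep the JVM alive.
    defaultPool.shutdown()
  }
}
```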
This seems to be because log4cats' logging functions are asynchronous, meaning they might run on a different thread rather than in sequence with the other operations.
That means the main thread might finish while the other threads stay open and never close.
If I use ops.unsafeRunTimed(2.seconds), everything executes and the application exits, but the later log lines appear only when the timeout expires. It looks like the logging is somehow lazy and only finishes when asked to, but I'm not sure.
It is advised not to use unsafeRunTimed in production code.
Of course, if you use IOApp, everything executes normally again.
But how would you write this cleanly in production code without an IOApp? I think there should be a clean way to tell log4cats to finish its operations, return the results of its async fibers, and close everything, all manually.

Right way of handling multiple future callbacks using threadpool in Scala

I am trying to do a very simple thing and want to understand the right way of doing it. I need to periodically make some REST API calls to a separate service and then process the results asynchronously. I am using the actor system's default scheduler to schedule the HTTP requests and have created a separate thread pool to handle the Future callbacks. Since there is no dependency between requests and responses, I thought a separate thread pool for handling Future callbacks should be fine.
Is there some problem with this approach?
I read the Scala docs and they mention some issue here (though I'm not clear on what it is).
Generally, what is the recommended way of handling these scenarios?
import java.util.concurrent.Executors
import akka.actor.ActorSystem
import akka.http.scaladsl.model.HttpResponse
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}
implicit val system: ActorSystem = ActorSystem("my-actor-system") // define an actor system
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10)) // create a thread pool
// periodically do some work using the actor system's scheduler
system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds)(new Runnable {
  override def run(): Unit = {
    val urls = getUrls() // get list of urls
    // get data for each url, then combine the Seq[Future] into one Future[Seq]
    val futureResults = Future.sequence(urls.map(url => getData[MyData](url)))
    futureResults.onComplete {
      case Success(res) => // do something with the result
      case Failure(e) => // do something with the error
    }
  }
})
def getData[T](url: String): Future[T] = {
  val responseFuture: Future[HttpResponse] = execute(url)
  // run the transformation on the actor system's dispatcher
  responseFuture.map { result =>
    // transform the response and return data in format T
    ???
  }(system.dispatcher)
}
Whether or not to use a separate thread pool really depends on the use case. If the service integration is very critical and is designed to consume a lot of resources, a separate thread pool may make sense; otherwise, just using the default one should be fine. Feel free to refer to Levi's question for more in-depth discussion of this part.
Regarding "job scheduling in an actor system", I think Akka Streams is a perfect fit here. I give an example below. Feel free to refer to the blog post https://blog.colinbreck.com/rethinking-streaming-workloads-with-akka-streams-part-i/ to see how much Akka Streams can simplify for you.
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}
object Timer {
  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("Timer")
    // default thread pool
    implicit val ec: ExecutionContext = system.dispatcher
    // if a custom thread pool is needed, replace the line above with the one below,
    // and make sure you read https://doc.akka.io/docs/akka/current/dispatchers.html#setting-the-dispatcher-for-an-actor
    // to define the custom thread pool
    // implicit val ec: ExecutionContext = system.dispatchers.lookup("my-custom-dispatcher")
    Source
      .tick(5.seconds, 5.seconds, ())
      .mapConcat(_ => getUrls()) // fetch the current URL list on every tick
      .mapAsync(1)(url => fetch(url))
      .runWith(Sink.foreach[Response] { response =>
        // handle each response as it arrives
      })
      .onComplete {
        // a tick source never completes on its own, so this fires only on failure or shutdown
        case Success(_) =>
        case Failure(ex) =>
          // handle exceptions
      }
  }
  def getUrls(): Seq[String] = ???
  def fetch(url: String): Future[Response] = ???
  case class Response(body: String)
}
In addition to Yik San Chan's answer above (especially regarding using Akka Streams), I'd also point out that what exactly you're doing in the .onComplete block is quite relevant to the choice of which ExecutionContext to use for the onComplete callback.
In general, if the callback does blocking I/O, it's probably best to run it in a thread pool that is large relative to the number of cores (note that each JVM thread reserves about 1 MB for its stack by default, so it's probably not a great idea to use an ExecutionContext that spawns an unbounded number of threads; a fixed pool of about 10x your core count is probably OK).
Otherwise, it's probably OK to use an ExecutionContext with a thread pool roughly equal in size to the number of cores: the default Akka dispatcher is such an ExecutionContext. In my experience/opinion, the only real reason to consider not using the Akka dispatcher is if the callback is going to occupy the CPU for a long time. The phenomenon known as "thread starvation" can occur in that scenario, with adverse impacts on performance and cluster stability (if using, e.g., Akka Cluster or health checks). In such a scenario, I'd tend to use a dispatcher with fewer threads than cores and consider configuring the default dispatcher with fewer threads than the default (while the kernel's scheduler can and will manage more ready-to-run threads than there are cores, there are strong arguments for not letting it do so).
In an onComplete callback (in comparison to the various transformation methods on Future like map/flatMap and friends), since all you can do is side-effect, it's probably more likely than not that you're doing blocking I/O.
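A rough sketch of that split, using plain Scala Futures rather than Akka: CPU-bound work stays on a core-sized context (ExecutionContext.global here; in an Akka app system.dispatcher plays that role), while the blocking side effect runs on a bounded pool of about 10x the cores. CallbackPools, writeBlocking and written are hypothetical names, and the sleep merely stands in for real blocking I/O. It uses andThen, which behaves like onComplete but returns a Future that completes after the callback has run, which makes the effect observable.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, ExecutorService, Executors}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object CallbackPools {
  private val cores = Runtime.getRuntime.availableProcessors()

  // Bounded pool for blocking side effects: about 10x the core count,
  // never an unbounded (cached) pool.
  val blockingPool: ExecutorService = Executors.newFixedThreadPool(10 * cores)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(blockingPool)

  // CPU-bound transformations stay on a pool sized to the number of cores.
  private val computeEc: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for blocking I/O, e.g. writing a result to a database.
  val written = new ConcurrentLinkedQueue[Int]()
  private def writeBlocking(v: Int): Unit = {
    Thread.sleep(10)
    written.add(v)
  }

  // andThen is a side-effecting callback like onComplete, run on the given
  // context, but it also returns a Future that completes after the callback.
  def process(n: Int): Future[Int] =
    Future(n * n)(computeEc).andThen {
      case Success(v) => writeBlocking(v)
      case Failure(e) => System.err.println(s"failed: ${e.getMessage}")
    }(blockingEc)
}
```

Remember to shut down blockingPool when the application stops; its threads are non-daemon.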

How can I run parSequenceUnordered of Monix, and handle the results of each task?

I am currently working on implementing client-side HTTP requests to an API, and decided to explore sttp and Monix for this task. As I am new to Monix, I am still not sure how to run tasks and retrieve their results. My objective is to have a sequence of HTTP requests that I can run in parallel, then parse, then load.
Below is a snippet of what I have tried so far:
import sttp.client._
import sttp.client.asynchttpclient.monix._
import monix.eval.Task
object SO extends App {
  val postTask = AsyncHttpClientMonixBackend().flatMap { implicit backend =>
    val r1 = basicRequest.get(uri"https://hello.world.io/v1/bla")
      .header("accept", "application/json")
      .response(asString)
      .body()
      .send()
    val tasks = Seq(r1).map(i => Task(i))
    Task.parSequenceUnordered(tasks).guarantee(backend.close())
  }
  import monix.execution.Scheduler.Implicits.global
  postTask.runToFuture.foreach(println) // prints: List(Task.FlatMap$2052527361)
}
My confusion is probably a simple one. How can I run the Task.parSequenceUnordered that I have created, and handle the Tasks within the sequence (parse the HTTP results)?
Nice to have: out of curiosity, is it possible to naively introduce rate-limiting/throttling when processing the Task sequence of requests? I am not really looking for building something sophisticated. It could be as simple as spacing out batches of requests. Wondering if Monix has a helper for that already.
Thanks to Oleg Pyzhcov and the monix gitter community for helping me figure this one out.
Quoting Oleg here:
Since you're using a backend with Monix support already, the type of r1 is Task[Response[Either[String,String]]]. So when you're doing Seq(r1).map(i => Task(i)), you make it a sequence of tasks that don't do anything except give you other tasks that give you the result (the type would be Seq[Task[Task[Response[...]]]]). Your code then parallelizes the outer layer, tasks-that-give-tasks, and you get the tasks that you started with as the result. You only need to process a Seq(r1) for it to run requests in parallel.
If you're using IntelliJ, you can press Alt + = to see the type of a selection; it helps if you can't tell the type from the code alone (but it gets better with experience).
As for rate-limiting, we have parSequenceN that lets you set a limit to parallelism. Note that unordered only means that you get a slight performance advantage at the cost of results being in random order in the output; they are executed non-deterministically anyway.
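The same nesting mistake can be reproduced with plain Scala Futures, which may make it easier to see why the question's code printed a list of Tasks. This is a standard-library analogue, not Monix, and the names NestingDemo and request are illustrative: wrapping something already asynchronous in another async wrapper gives Future[Future[Int]], and flatten (or flatMap) removes the extra layer.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object NestingDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Stand-in for r1: a value that is already asynchronous.
  def request(i: Int): Future[Int] = Future(i * 10)

  // Analogue of Seq(r1).map(i => Task(i)): the outer layer only hands back the
  // inner Futures, so sequencing it yields Futures rather than results.
  def nested: Future[Seq[Future[Int]]] =
    Future.sequence(Seq(1, 2).map(i => Future(request(i))))

  // flatten (like flatMap(identity)) removes one layer of nesting.
  def unwrapped: Future[Int] = Future(request(1)).flatten

  // What was intended: sequence the requests themselves.
  def flat: Future[Seq[Int]] = Future.sequence(Seq(1, 2).map(request))

  def main(args: Array[String]): Unit = {
    println(Await.result(nested, 5.seconds)) // a Seq of Future objects
    println(Await.result(flat, 5.seconds))   // List(10, 20)
  }
}
```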
I ended up with a (simplified) implementation that looks something like this:
import sttp.client._
import sttp.client.asynchttpclient.monix._
import monix.eval.Task
object SO extends App {
  val postTask = AsyncHttpClientMonixBackend().flatMap { implicit backend =>
    val r1 = basicRequest.get(uri"https://hello.world.io/v1/bla")
      .header("accept", "application/json")
      .response(asString)
      .body()
      .send()
    val items = Seq(r1.map(x => x.body))
    Task.parSequenceN(1)(items).guarantee(backend.close())
  }
  import monix.execution.Scheduler.Implicits.global
  postTask.runToFuture.foreach(println)
}
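For the "spacing out batches of requests" idea from the question, a rough plain-Futures sketch (not Monix, and not a real rate limiter) is to split the inputs into groups and start each group only after the previous one has finished. inBatches is a hypothetical helper name; with Monix itself, parSequenceN as mentioned above is the built-in way to cap parallelism.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object Batching {
  // Hypothetical helper: runs `work` over `inputs` in batches of `batchSize`.
  // Items within a batch run concurrently; the next batch starts only after
  // every Future of the previous batch has completed. Results keep input order.
  def inBatches[A, B](inputs: Seq[A], batchSize: Int)(work: A => Future[B])
                     (implicit ec: ExecutionContext): Future[Seq[B]] = {
    require(batchSize > 0, "batchSize must be positive")
    inputs.grouped(batchSize).foldLeft(Future.successful(Vector.empty[B])) { (acc, batch) =>
      // `work` is only called inside flatMap, i.e. once the earlier batches are done.
      acc.flatMap(done => Future.sequence(batch.map(work)).map(done ++ _))
    }
  }

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val result = inBatches(1 to 7, 3) { i =>
      Future { println(s"start $i"); i * i }
    }
    println(Await.result(result, 5.seconds))
  }
}
```

Adding a delay between batches (for example with a scheduler) would turn this into simple throttling, at the cost of extra moving parts.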

Does Scala Futures/ExecutionContext have something like C#'s ConfigureAwait

C# Tasks have ConfigureAwait(false), which libraries use to avoid synchronizing back to the captured context (for example, the UI thread) when that is not necessary:
http://msdn.microsoft.com/en-us/magazine/hh456402.aspx
In .NET I believe there can only be one SynchronizationContext, so it's clear on which thread pool a Task should execute its continuation.
For a library, where you can't assume whether the user is in a web request (in .NET, HttpContext.Current.Items flows), on the command line (normal multithreaded), or in XAML/Windows Forms (single UI thread), it's almost always better to use ConfigureAwait(false), so the awaiter knows it can just execute the continuation on whatever thread is being used to call the awaiter (this is only bad if you run blocking code in the library, which could lead to thread starvation on the thread pool where the initial workload is started; let's assume we don't do that).
The point is that, from a library perspective, you don't want to use a thread from the caller's thread pool to synchronize a continuation; you just want the continuation to run on whatever thread. This saves a context switch and, for example, keeps load off the UI thread.
In Scala, each operation on Futures (namely map) needs an ExecutionContext (passed implicitly). This makes managing thread pools incredibly easy, which I like a lot more than .NET's somewhat strange TaskFactory instances (which nobody seems to use; they just use the default TaskFactory).
My question is, does Scala have the same problem as .NET in respect to context switches being sometimes unnecessary, and if so, is there a way, similar to ConfigureAwait, to fix this?
A concrete example in Scala where I wonder about this:
def trace[T](message: => String)(block: => Future[T]): Future[T] = {
  if (!logger.isTraceEnabled) block
  else {
    val startedAt = System.currentTimeMillis()
    block.map { result =>
      val timeTaken = System.currentTimeMillis() - startedAt
      logger.trace(s"$message took ${timeTaken}ms")
      result
    }
  }
}
I'm using Play and I generally import Play's default implicit ExecutionContext.
The map on block needs to run on an execution context.
If I wrote this piece of Scala in a library, I would add an implicit ExecutionContext parameter:
def trace[T](message: => String)(block: => Future[T])(implicit executionContext: ExecutionContext): Future[T] = {
instead of importing Play's default ExecutionContext in the library.
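One hedged Scala option, which the question itself does not mention: Scala 2.13 added ExecutionContext.parasitic, which runs a callback on whichever thread completes the Future (or on the caller if it is already complete), avoiding the hand-off to another pool. That is roughly what ConfigureAwait(false) buys in .NET. Its scaladoc limits it to short, non-blocking callbacks, which a timing/log line like this is. The sketch keeps the library signature with an implicit ExecutionContext so the caller decides; the logger is replaced by a hypothetical logTrace helper that collects messages, and traceEnabled is likewise illustrative.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object Tracing {
  // Hypothetical stand-ins for the question's logger.
  @volatile var traceEnabled: Boolean = true
  val messages = new ConcurrentLinkedQueue[String]()
  private def logTrace(msg: String): Unit = messages.add(msg)

  // Library signature from the question: the caller chooses the ExecutionContext.
  def trace[T](message: => String)(block: => Future[T])(implicit ec: ExecutionContext): Future[T] =
    if (!traceEnabled) block
    else {
      val startedAt = System.currentTimeMillis()
      block.map { result =>
        val timeTaken = System.currentTimeMillis() - startedAt
        logTrace(s"$message took ${timeTaken}ms")
        result
      }
    }

  def main(args: Array[String]): Unit = {
    val work = Future { Thread.sleep(50); 42 }(ExecutionContext.global)
    // Scala 2.13+: the cheap map callback runs on the thread that completes
    // `work` (or on this thread if `work` is already done), with no extra hand-off.
    val traced = trace("slow call")(work)(ExecutionContext.parasitic)
    println(Await.result(traced, 5.seconds))
    println(messages.peek())
  }
}
```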

How to run Akka

It seems there is no need for a class with a main method to run Akka (see How to run akka actors in IntelliJ IDEA). However, here is what I have:
import akka.actor.{ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
object Application extends App {
  val system = ActorSystem()
  val supervisor = system.actorOf(Props[Supervisor])
  implicit val timeout: Timeout = Timeout(100.seconds)
  import system.dispatcher
  system.scheduler.schedule(1.second, 600.seconds) {
    val future = supervisor ? Supervisor.Start
    val list = Await.result(future, timeout.duration).asInstanceOf[List[Int]]
    supervisor ! list
  }
}
I know I have to specify "akka.Main" as the main class in the run configuration. But where should I move the current code from object Application?
You can write something like
import _root_.akka.Main
object Application extends App {
  Main.main(Array("somepackage.Supervisor"))
}
and the Supervisor actor should override preStart, as @cmbaxter suggested.
Then open the sbt console in IntelliJ and type run.
I agree with @kdrakon that your code is fine the way it is, but if you wanted to leverage the akka.Main functionality, a simple refactor like the following will make things work:
package code
import akka.actor.{Actor, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
class ApplicationActor extends Actor {
  override def preStart(): Unit = {
    val supervisor = context.actorOf(Props[Supervisor])
    implicit val timeout: Timeout = Timeout(100.seconds)
    import context.dispatcher
    context.system.scheduler.schedule(1.second, 600.seconds) {
      val future = (supervisor ? Supervisor.Start).mapTo[List[Int]]
      val list = Await.result(future, timeout.duration)
      supervisor ! list
    }
  }
  def receive: Receive = {
    case _ => // Not sure what to do here
  }
}
In this case, ApplicationActor is the argument you would pass to akka.Main, and it would essentially be the root supervisor of all other actors created in your hierarchy. The only fishy thing here is that, being an Actor, it needs a receive implementation, and I don't imagine any other actors will be sending it messages, so it doesn't really do anything. But the strength of this approach is that when ApplicationActor is stopped, the stop also cascades down to all other actors that it started, simplifying a graceful shutdown. I suppose you could have ApplicationActor handle a message to shut down the actor system given some kind of input (maybe a ShutdownHookThread could initiate this) and give this actor some kind of purpose after all. Anyway, as stated earlier, your current approach seems fine, but this could also be an option if you so desire.
EDIT
So if you wanted to run this ApplicationActor via akka.Main, according to the instructions here, you would execute this from your command prompt:
java -classpath <all those JARs> akka.Main code.ApplicationActor
You will of course need to supply <all those JARs> with your dependencies, including Akka. At a minimum you will need scala-library and akka-actor on your classpath to make this run.
If you refer to http://doc.akka.io/docs/akka/snapshot/scala/hello-world.html, you'll find that akka.Main expects your root/parent Actor; in your case, Supervisor. As for your existing code, it can be copied directly into the actor's code, possibly in some initialisation calls. For example, refer to HelloWorld's preStart function.
However, in my opinion, your existing code is just fine too. akka.Main is a nice helper, as is the microkernel binary, but creating your own main executable is a viable option too.