How to throttle the execution of future? - scala

I have a list of unique ids, and for every id I call a function that returns a Future.
The problem is that the number of futures created in a single call is variable.
list.map(id => futureCall(id))
This can create too much parallelism, which can affect my system. I want to configure how many futures execute in parallel.
I want a testable design, so I can't do this
After searching a lot, I found this
I couldn't work out how to use it. I tried, but it didn't work.
After that, I simply imported it in the class where I make the call.
I used the same snippet and set the default maxConcurrent to 4.
I replaced the import of the global execution context with ThrottledExecutionContext

You have to wrap your ExecutionContext with ThrottledExecutionContext.
Here is a little sample:
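import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
object TestApp extends App {
  // ThrottledExecutionContext is the class from the snippet mentioned in the question
  implicit val ec = ThrottledExecutionContext(maxConcurrents = 10)(scala.concurrent.ExecutionContext.global)
  def futureCall(id: Int) = Future {
    println(s"executing $id")
    Thread.sleep(500)
    id
  }
  val list = 1 to 1000
  val results = list.map(futureCall)
  // At most 10 futures run at once (fewer if the underlying pool has fewer threads)
  Await.result(Future.sequence(results), 100.seconds)
}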
Alternatively you can also try a FixedThreadPool:
implicit val ec = ExecutionContext.fromExecutor(java.util.concurrent.Executors.newFixedThreadPool(10))
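To check that a fixed-size pool really caps the parallelism, here is a minimal sketch that records the highest number of futures seen running at the same time. The names BoundedPoolSketch and maxObservedConcurrency are made up for this illustration, and the 20 ms sleep only simulates work.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedPoolSketch {
  // Runs `tasks` futures on a pool of `poolSize` threads and returns the
  // highest number of futures observed running at the same time.
  def maxObservedConcurrency(poolSize: Int, tasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val running = new AtomicInteger(0)
    val peak = new AtomicInteger(0)

    def futureCall(id: Int): Future[Int] = Future {
      val now = running.incrementAndGet()
      // Raise the recorded peak with a compare-and-set loop
      var seen = peak.get()
      while (now > seen && !peak.compareAndSet(seen, now)) seen = peak.get()
      Thread.sleep(20) // simulated work
      running.decrementAndGet()
      id
    }

    try {
      Await.result(Future.sequence((1 to tasks).map(futureCall)), 30.seconds)
      peak.get()
    } finally pool.shutdown() // pool threads are non-daemon, so release them
  }
}
```

Calling BoundedPoolSketch.maxObservedConcurrency(4, 40) should never return more than 4, because the pool has only four threads to run the future bodies on.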

I am not sure what you are trying to do here. Default global ExecutionContext uses as many threads as you have CPU cores. So, that would be your parallelism. If that's still "too many" for you, you can control that number with a system property: "scala.concurrent.context.maxThreads", and set that to a lower number.
That will be the maximum number of futures that are executed in parallel at any given time. You should not need to throttle anything explicitly.
Alternatively, you can create your own executor and give it a BlockingQueue with a limited capacity. That would block on the producer side (when a work item is being submitted), like your implementation does, but I would very strongly advise against doing that: it is dangerous and prone to deadlocks, and also much less efficient than the default ForkJoinPool implementation.

Related

What is the cost of a Scala Future?

What is the cost of a scala Future? Is it bad practice to spin up, say, 1000 of them only to flatMap them away again right away?
In my case, I don't need 1000 futures - I could actually get away with about 10 or so, but it makes my code cleaner to use more futures, and I'm trying to get a sense of tradeoffs between code elegance and abusing resources. Obviously if I had blocking code, they'd be expensive, but if not, how many should I feel free to spin up to save a few lines of code?
You say you create some of them just to deal with a homogeneous list of Future[T]. In that case, if you just want to lift some T to a Future[T], you can do Future.successful(myValue). This causes no asynchronous background operations to be performed. It's a ready value, just wrapped in Future context.
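A small sketch of that idea, assuming a hypothetical lookup over an in-memory Map (LiftSketch and lookup are illustrative names, not from the question):

```scala
import scala.concurrent.Future

object LiftSketch {
  // Hypothetical lookup: cached values are lifted with Future.successful,
  // so no task is submitted to any ExecutionContext for them.
  def lookup(cache: Map[Int, String], id: Int): Future[String] =
    cache.get(id) match {
      case Some(value) => Future.successful(value)
      case None        => Future.failed(new NoSuchElementException(s"no entry for id $id"))
    }

  // True when the lifted Future is already complete the moment it is created.
  def readyImmediately(): Boolean = lookup(Map(1 -> "one"), 1).isCompleted
}
```

Note that neither branch needs an implicit ExecutionContext, which is a good hint that nothing is being scheduled.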
EDIT: After re-reading your question and comments, I believe this is enough for an answer. Continue reading for extra info.
Regarding flatMapping, be aware that if you create 1000 futures beforehand as 1000 different vals, they will start right away (well, whenever JVM execution context decides that it's a good time to start, but definitely as soon as possible). However, if you create them in-place inside the flatMap, they will be chained (whole point of M-word's flatMap is to chain stuff in sequential series, with each step possibly depending on the result of previous one).
Demo:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
// Both futures are created up front, so both start right away
val f1 = Future({ Thread.sleep(2000); println("one") })
val f2 = Future({ Thread.sleep(2000); println("two") })
val result1 = f1.flatMap(_ => f2)
Await.result(result1, 5.seconds)
// The second future is created inside flatMap, so it starts only after the first completes
val result2 = Future({ Thread.sleep(2000); println("one") })
  .flatMap(_ => Future({ Thread.sleep(2000); println("two") }))
Await.result(result2, 5.seconds)
In the case of result1, you will get "one" and "two" printed out together after two seconds (sometimes even first "two" then "one"). But in the second case, where we created the futures in-place inside the flatMap, "one" is printed after two seconds, and "two" after another two seconds. This means that if you chain 1000 Futures like this and the chain breaks at step 42, the remaining 958 futures will never be started.
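To make the chain-breaking behaviour concrete, here is a sketch that counts how many steps actually start when one of them fails. ChainSketch and stepsStarted are hypothetical names used only for this illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object ChainSketch {
  // Chains `steps` futures with flatMap; the step numbered `failAt` throws.
  // Returns how many steps were actually started.
  def stepsStarted(steps: Int, failAt: Int): Int = {
    val started = new AtomicInteger(0)

    def step(n: Int): Future[Int] = Future {
      started.incrementAndGet()
      if (n == failAt) throw new IllegalStateException(s"step $n failed")
      n
    }

    // Each later step is created inside flatMap, so it exists only if
    // the previous step succeeded.
    val chain = (2 to steps).foldLeft(step(1)) { (acc, n) => acc.flatMap(_ => step(n)) }
    Try(Await.result(chain, 10.seconds)) // the failure is expected here
    started.get()
  }
}
```

With ChainSketch.stepsStarted(1000, 42), only the first 42 steps are ever created; the function passed to flatMap is not invoked once the chain has failed.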
By combining these two facts about Futures you can avoid creating unnecessary ones.
I hope I helped at least a little, because regarding your main question - how much memory and other overhead does a Future cost - I don't have the numbers. That really depends on the settings of your JVM and the machine that's running the code. I do think however that even if your system can take anything you throw at it, you shouldn't be doing (a lot of) unnecessary background Future computations. And even though there are such things as sensible timeouts and cancelling via their respective Promises, creating an extra million of Futures you won't need sounds like a bad design IMHO.
(Note that I said "background computations". If you mainly need all these Futures to keep all types "in the future level" so that the whole code is easier to work with (e.g. with for comprehensions), in that case aforementioned Future.successful is your friend since it's not a computation, just an already computed value stored in a Future context)
I might have misunderstood your question. Correct me if I am mistaken.
What is the cost of a scala Future?
Whenever you wrap expression(s) in a future, a Java Runnable is created under the hood. A Runnable is just an interface with a run() method. Your block of code is wrapped inside the run method of that Runnable, and the Runnable is then submitted to the execution context.
In a very general sense, a future is nothing more than a Runnable with a bunch of helper methods. An instance of a future is no different from other objects. You may reference this thread to get a rough idea of the memory consumption of a single Java object.
If you are interested, you can trace the whole chain of action starting from the creation of a future
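One way to observe the hand-off without reading the library source is to wrap an ExecutionContext and count the Runnables it receives. This is only a sketch: CountingContext is a made-up class, and the exact number of submissions per Future is an implementation detail that may differ between Scala versions.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object RunnableSketch {
  // Delegates to the global context and counts every Runnable handed to it.
  final class CountingContext extends ExecutionContext {
    val submitted = new AtomicInteger(0)
    def execute(runnable: Runnable): Unit = {
      submitted.incrementAndGet()
      ExecutionContext.global.execute(runnable)
    }
    def reportFailure(cause: Throwable): Unit =
      ExecutionContext.global.reportFailure(cause)
  }

  // Creates a single Future on a fresh CountingContext and reports
  // how many Runnables that context was asked to execute.
  def submissionsForOneFuture(): Int = {
    val ec = new CountingContext
    val result = Await.result(Future(1 + 1)(ec), 5.seconds)
    require(result == 2)
    ec.submitted.get()
  }
}
```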

Play 2.5.x (Scala) -- How does one put a value obtained via wsClient into a (lazy) val

The use case is actually fairly typical. A lot of web services use authorization tokens that you retrieve at the start of a session and you need to send those back on subsequent requests.
I know I can do it like this:
lazy val myData = {
val request = ws.url("/some/url").withAuth(user, password, WSAuthScheme.BASIC).withHeaders("Accept" -> "application/json")
Await.result(request.get().map{x => x.json }, 120.seconds)
}
That just feels wrong, as all the docs say never use Await.
Is there a Future/Promise Scala style way of handling this?
I've found .onComplete which allows me to run code upon the completion of a Promise however without using a (mutable) var I see no way of getting a value in that scope into a lazy val in a different scope. Even with a var there is a possible timing issue -- hence the evils of mutable variables :)
Any other way of doing this?
Unfortunately, there is no way to make this non-blocking - lazy vals are designed to be synchronous and to block any thread accessing them until they are completed with a value (internally a lazy val is represented as a simple synchronized block).
The Future/Promise way in Scala would be to hold a Future[T] or a Promise[T] instead of a val x: T. That brings overhead, though: every use of the value then needs an ExecutionContext and a map. Better resource utilization may not be worth the loss in readability in every case, so it may be OK to leave the Await there if you use the value extensively in many parts of your application.
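For comparison, here is a rough sketch of the non-blocking variant, where the lazy val stores the Future itself. fetchToken is a hypothetical stand-in for the ws.url(...).get() call from the question, and the token value is invented.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object TokenSketch {
  final class Client {
    val fetches = new AtomicInteger(0)

    // Hypothetical stand-in for the web service request.
    private def fetchToken(): Future[String] = Future {
      fetches.incrementAndGet()
      "token-123"
    }

    // The lazy val holds the Future, so initialising it never waits for the response.
    lazy val token: Future[String] = fetchToken()

    // Each use maps over the shared Future instead of awaiting it.
    def authHeader: Future[String] = token.map(t => s"Bearer $t")
  }

  // Await appears here only so the demo can return plain values.
  def demo(): (String, String, Int) = {
    val client = new Client
    val first = Await.result(client.authHeader, 5.seconds)
    val second = Await.result(client.authHeader, 5.seconds)
    (first, second, client.fetches.get())
  }
}
```

The request is made once, on first access, and every later caller reuses the same Future. The cost is exactly the one described above: callers now work with Future[String] rather than String.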

Locking read/write operations on a data structure in Scala/akka

I have multiple actors (in the form of Futures) firing other futures off based on what they read from a single object's cache. I want to make sure that no work overlaps, and thus want to put a lock on all read/modify/write operations. How do I do this in Scala?
I tried this, but I don't want every method/function that accesses the cache to have to be synchronized, but rather have anything that tries to access the cache understand that it needs to wait until it's time for it to access.
// The cache
object certCache {
  var cache = new HashMap[Char, Future[Boolean]]
}
def someMethod = synchronized {
  if(certCache ... )
    certCache.do(...)
}
Any tips?
Agents
The akka library has a perfect solution for your question: Agents. From the documentation:
import scala.concurrent.ExecutionContext.Implicits.global
import akka.agent.Agent
val agent = Agent(42)
To read from an Agent you can dereference them or call the get method, both of which are immediately returning synchronous calls:
val agentResult = agent()
val agentResult2 = agent.get
Updates are asynchronous:
agent send (_ + 10) //updates value to 52, eventually
Similarly, you can get a Future of the Agent's value which completes after the currently queued updates have completed:
val futureValue = agent.future
Actors
Of course you can always go with a "home grown" solution and write an Actor that caches your values and responds to queries. BUT, this is a much more manual/inefficient solution than Agents.
Actors should only be considered as a last resort when other akka/scala solutions do not apply. This is because Actors are very low-level and the receive method is not composable.
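As far as I know, Agents were deprecated in Akka 2.5 and removed in 2.6, so on newer versions the same idea has to be built by hand. Below is a minimal sketch that uses only the standard library: all updates are funnelled through a single-threaded executor, so they are applied one at a time, while reads return the current value immediately. SimpleAgent is a hypothetical class written for this example, not part of Akka.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AgentSketch {
  final class SimpleAgent[A](initial: A) {
    private val executor = Executors.newSingleThreadExecutor()
    private val ec = ExecutionContext.fromExecutor(executor)
    @volatile private var current: A = initial

    // Synchronous read of whatever value is current right now.
    def get: A = current

    // Asynchronous update; updates run in submission order on one thread.
    def send(f: A => A): Future[A] = Future { current = f(current); current }(ec)

    def shutdown(): Unit = executor.shutdown()
  }

  // Sends 100 increments to an agent that starts at 42.
  def demo(): Int = {
    val agent = new SimpleAgent(42)
    try {
      val updates = (1 to 100).map(_ => agent.send(_ + 1))
      Await.result(updates.last, 5.seconds) // the last update finishes after all earlier ones
      agent.get
    } finally agent.shutdown()
  }
}
```

Because only the executor's single thread ever writes the value, no read/modify/write cycle can overlap with another, which is the property the question asks for.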

Akka actor forward message with continuation

I have an actor which takes the result from another actor and applies some check on it.
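import akka.actor.{Actor, ActorRef}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._
class Actor1(actor2: ActorRef) extends Actor {
  import context.dispatcher // ExecutionContext for map and pipeTo
  implicit val timeout: Timeout = Timeout(5.seconds) // assumed value; ask requires a timeout
  def receive = {
    case SomeMessage =>
      val r = actor2 ? NewMessage()
      r.map(someTransform).pipeTo(sender())
  }
}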
Now if I make an ask of Actor1, we have 2 futures generated, which doesn't seem overly efficient. Is there a way to provide a forward with some kind of continuation, or some other approach I could use here?
case SomeMessage => actor2.forward(NewMessage, someTransform)
Futures are executed in an ExecutionContext, which is like a thread pool. Creating a new future is not as expensive as creating a new thread, but it has its cost. The best way to work with futures is to create as many as needed and compose them so that things that can be computed in parallel are computed in parallel, if the necessary resources are available. This way you will make the best use of your machine.
You mentioned that akka documentation discourages excessive use of futures. I don't know where you read this, but what I think it means is to prefer transforming futures rather than creating your own. This is exactly what you are doing by using map. Also, it may mean that if you create a future where it is not needed you are adding unnecessary overhead.
In your case you have a call that returns a future, and you need to apply someTransform and return the result. Using map is the way to go.
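Stripped of the actor machinery, the two patterns look roughly like this. askActor2 is a hypothetical stand-in for actor2 ? NewMessage(), and the body of someTransform is invented for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ComposeSketch {
  // Hypothetical stand-in for `actor2 ? NewMessage()`.
  private def askActor2(): Future[Int] = Future(20)
  private def someTransform(n: Int): Int = n * 2 + 2

  // Transforming an existing future with map; nothing blocks here.
  def transformedFuture(): Future[Int] = askActor2().map(someTransform)

  // Two independent futures are started first and combined afterwards,
  // so they can run in parallel if threads are available.
  def combinedFuture(): Future[Int] = {
    val a = askActor2()
    val b = askActor2()
    a.zip(b).map { case (x, y) => x + y }
  }

  // Await is used only so the demo can return plain values.
  def transformed(): Int = Await.result(transformedFuture(), 5.seconds)
  def combined(): Int = Await.result(combinedFuture(), 5.seconds)
}
```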

Scala - futures and concurrency

I am trying to understand Scala futures coming from Java background: I understand you can write:
val f = Future { ... }
then I have two questions:
How is this future scheduled? Automatically?
What scheduler will it use? In Java you would use an executor that could be a thread pool etc.
Furthermore, how can I achieve a scheduledFuture, the one that executes after a specific time delay? Thanks
The Future { ... } block is syntactic sugar for a call to Future.apply (as I'm sure you know Maciej), passing in the block of code as the first argument.
Looking at the docs for this method, you can see that it takes an implicit ExecutionContext - and it is this context which determines how it will be executed. Thus to answer your second question, the future will be executed by whichever ExecutionContext is in the implicit scope (and of course if this is ambiguous, it's a compile-time error).
In many cases this will be the one from import ExecutionContext.Implicits.global, which can be tweaked by system properties but by default uses a ForkJoinPool whose parallelism equals the number of processor cores.
The scheduling however is a different matter. For some use-cases you could provide your own ExecutionContext which always applied the same delay before execution. But if you want the delay to be controllable from the call site, then of course you can't use Future.apply as there are no parameters to communicate how this should be scheduled. I would suggest submitting tasks directly to a scheduled executor in this case.
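A sketch of that suggestion, assuming a single shared scheduler thread: the task is submitted to a ScheduledExecutorService and the result is exposed as a Scala Future through a Promise. DelaySketch and after are made-up names for this illustration.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object DelaySketch {
  // Daemon thread so the scheduler does not keep the JVM alive.
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "delay-sketch")
      t.setDaemon(true)
      t
    }
  })

  // Completes the returned Future with `body` once `delay` has elapsed.
  // Note that `body` runs on the scheduler thread, so it should be quick.
  def after[A](delay: FiniteDuration)(body: => A): Future[A] = {
    val p = Promise[A]()
    scheduler.schedule(new Runnable {
      def run(): Unit = p.complete(Try(body))
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }

  // Returns the computed value and the elapsed time in milliseconds.
  def demo(): (Int, Long) = {
    val start = System.nanoTime()
    val value = Await.result(after(200.millis)(3), 5.seconds)
    (value, (System.nanoTime() - start) / 1000000)
  }
}
```

The delay is chosen at the call site, e.g. DelaySketch.after(10.seconds)(compute()), which is the part Future.apply has no parameter for.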
Andrzej's answer already covers most of the ground in your question. It is worth mentioning that Scala's "default" implicit execution context (import scala.concurrent.ExecutionContext.Implicits._) is literally a java.util.concurrent.Executor. The whole ExecutionContext concept is a very thin wrapper that is closely aligned with Java's executor framework.
For achieving something similar to scheduled futures, as Mauricio points out, you will have to use promises, and any third party scheduling mechanism.
Not having a common mechanism for this built into Scala 2.10 futures is a pity, but nothing fatal.
A promise is a handle for an asynchronous computation. You create one (assuming ExecutionContext in scope) by calling val p = Promise[Int](). We just promised an integer.
Clients can grab a future that depends on the promise being fulfilled, simply by calling p.future, which is just a Scala future.
Fulfilling a promise is simply a matter of calling p.success(3), at which point the future will complete.
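Put together, the life cycle of a promise looks like the sketch below. PromiseSketch is an illustrative name; the calls themselves are the standard scala.concurrent.Promise API, where the instance method is success and Promise.successful is the companion factory for an already fulfilled promise.

```scala
import scala.concurrent.Promise

object PromiseSketch {
  // Returns: completed before fulfilment?, completed after?, final value.
  def demo(): (Boolean, Boolean, Option[Int]) = {
    val p = Promise[Int]()     // we just promised an integer
    val f = p.future           // the read side handed to clients
    val before = f.isCompleted // nothing has fulfilled the promise yet
    p.success(3)               // fulfil the promise; f completes with 3
    (before, f.isCompleted, f.value.flatMap(_.toOption))
  }
}
```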
Play 2.x solves scheduling by using promises and a plain old Java 1.4 Timer.
Here is a linkrot-proof link to the source.
Let's also take a look at the source here:
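import java.util.concurrent.TimeUnit
import scala.concurrent.{ExecutionContext, Future}
object Promise {
  private val timer = new java.util.Timer()
  def timeout[A](message: => A, duration: Long, unit: TimeUnit = TimeUnit.MILLISECONDS)
                (implicit ec: ExecutionContext): Future[A] = {
    // Fully qualified: inside this object, a bare Promise would refer to the object itself
    val p = scala.concurrent.Promise[A]()
    timer.schedule(new java.util.TimerTask {
      def run(): Unit = {
        p.completeWith(Future(message)(ec))
      }
    }, unit.toMillis(duration))
    p.future
  }
}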
This can then be used like so:
val future3 = Promise.timeout(3, 10000) // will complete after 10 seconds
Notice this is much nicer than plugging a Thread.sleep(10000) into your code, which will block your thread and force a context switch.
Also worth noticing in this example is the val p = Promise... at the function's beginning, and the p.future at the end. This is a common pattern when working with promises. Take it to mean that this function makes some promise to the client, and kicks off an asynchronous computation in order to fulfill it.
Take a look here for more information about Scala promises. Notice they use a lowercase future method from the concurrent package object instead of Future.apply. The former simply delegates to the latter. Personally, I prefer the lowercase future.