I am new to Scala and was trying to use some parallel constructs (Future in particular).
I found there is an implicit parameter of type ExecutionContext. As far as I can tell, it is something similar to (and maybe more abstract than) a thread pool. I have tried to learn about it from the documentation, but I cannot find a clear and detailed introduction.
Could anyone please explain what exactly an execution context is in Scala? And what is the purpose of introducing execution contexts to the language?
The basic idea is pretty simple: you have a callback that's going to be executed at some point. On what thread will it be executed? The current one? A new one? One from a pool? That's for the execution context to decide. The default one (ExecutionContext.global) uses threads from a global pool (with a number of threads determined by how many CPU cores you have).
In other circumstances, you might want to use a different context. For example, Akka actors can use their dispatcher as an execution context.
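To make this concrete, here is a minimal sketch that runs the same Future body on two different contexts and reports which thread executed it. The names ExecutionContextDemo, threadUsedBy, singleThreadContext and "my-own-thread" are made up for illustration; only the scala.concurrent and java.util.concurrent calls come from the standard libraries.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object ExecutionContextDemo {
  // Hypothetical helper: runs a trivial Future on the given context and
  // returns the name of the thread that executed its body.
  def threadUsedBy(ec: ExecutionContext): String =
    Await.result(Future(Thread.currentThread().getName)(ec), 5.seconds)

  // Hypothetical helper: a context backed by one daemon thread with a chosen name.
  def singleThreadContext(name: String): ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(
      Executors.newSingleThreadExecutor(new ThreadFactory {
        def newThread(r: Runnable): Thread = {
          val t = new Thread(r, name)
          t.setDaemon(true)
          t
        }
      }))

  def main(args: Array[String]): Unit = {
    // The default context: a thread from the global pool runs the body.
    println("global: " + threadUsedBy(ExecutionContext.global))

    // A custom context: the body runs on the single named thread instead.
    val own = singleThreadContext("my-own-thread")
    try println("custom: " + threadUsedBy(own))
    finally own.shutdown()
  }
}
```

The Future code is identical in both cases; only the context passed in decides where it runs.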
Reading the scala source code for scala.concurrent.Future and scala.concurrent.impl.Future, it seems that every future composition via map dispatches a new task for an executor. I assume this commonly triggers a context switch for the current thread, and/or an assignment of a thread for the job.
Considering that functions need to pass Futures to one another to act on results without blocking (or without descending into callback spaghetti), isn't this "reactive" paradigm very pricey in practice when code is written in a modular way, where each function does something small and passes the result along?
It depends on the execution context, so you can choose the strategy.
Your executor can also simply run the task on the calling thread, which keeps the map calls on the same thread. You select the strategy either by passing the execution context explicitly or by having an implicit one in scope.
I would first test what the default fork/join pool does, by logging which thread was used. Afaik newer versions of it sometimes utilize the submitting thread. However, I don't know if that's used / applied for scala future callbacks.
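A rough way to run that test is sketched below. It compares a hand-written context that executes tasks on the calling thread with the global one, by recording which thread ran a map callback. SameThreadDemo, sameThread and mapThreadName are illustrative names, not library API. (Scala 2.13 also ships ExecutionContext.parasitic for a similar purpose; the sketch avoids it so that it does not depend on that version.)

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SameThreadDemo {
  // Hypothetical context that runs every task immediately on the thread
  // that submits it, so no hand-off to another thread takes place.
  val sameThread: ExecutionContext = new ExecutionContext {
    def execute(runnable: Runnable): Unit = runnable.run()
    def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
  }

  // Maps over an already-completed Future and returns the name of the
  // thread that ran the callback under the given context.
  def mapThreadName(ec: ExecutionContext): String = {
    val mapped = Future.successful(1).map(_ => Thread.currentThread().getName)(ec)
    Await.result(mapped, 5.seconds)
  }

  def main(args: Array[String]): Unit = {
    println("caller:      " + Thread.currentThread().getName)
    println("same-thread: " + mapThreadName(sameThread))
    println("global:      " + mapThreadName(ExecutionContext.global))
  }
}
```

One caveat with this kind of context: a long chain of callbacks runs nested on one stack, so it is probably best kept to short, cheap transformations.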
What is the best (recommended, approved, etc.) way to do non-blocking JDBC queries in a Play! application using Play's connection pool (in Scala and against PostgreSQL, if it matters)? I understand that JDBC itself is blocking, but surely there are ways to make the calls on separate threads (e.g. using futures or actors) to avoid blocking the calling thread.
Suppose I decide to wrap the calls in futures: which execution context should I use, Play's default one? Or is it better to create a separate execution context for handling DB queries?
I know that there are some libraries for this like postgresql-async, but I really want to understand the mechanics :)
Suppose I decide to wrap the calls in futures: which execution context should I use, Play's default one? Or is it better to create a separate execution context for handling DB queries?
It is better to use a separate execution context in this case. That way, the non-blocking jobs submitted to the default execution context (most of Play's own work) cannot be starved by blocking JDBC calls submitted to the same context.
I suggest reading this (especially the second part) to get a general idea of how to deal with execution contexts in different situations (including blocking database queries), and then referring to this for more details on configuring your scenario in Play.
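As a sketch of the mechanics only, a dedicated context for blocking calls could look like the following. Everything named here is an assumption for illustration: BlockingDbDemo, dbContext, the pool size of 4, and blockingQuery, which merely sleeps where a real JDBC call would block. It does not use Play's own configuration mechanism.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object BlockingDbDemo {
  // Hypothetical size; in practice it would roughly match the connection pool.
  val dbPoolSize = 4

  private val counter = new AtomicInteger(0)

  // Dedicated context: a fixed pool of named daemon threads for blocking work.
  val dbContext: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(
      Executors.newFixedThreadPool(dbPoolSize, new ThreadFactory {
        def newThread(r: Runnable): Thread = {
          val t = new Thread(r, "db-pool-" + counter.incrementAndGet())
          t.setDaemon(true)
          t
        }
      }))

  // Stand-in for a blocking JDBC call (hypothetical: sleeps instead of querying).
  def blockingQuery(id: Int): String = {
    Thread.sleep(50)
    s"row-$id via ${Thread.currentThread().getName}"
  }

  // The blocking work is submitted to dbContext, not to the default context.
  def findRow(id: Int): Future[String] = Future(blockingQuery(id))(dbContext)

  def main(args: Array[String]): Unit =
    println(Await.result(findRow(7), 5.seconds))
}
```

The caller gets a Future back immediately, and only the threads of the dedicated pool are ever tied up waiting on the database.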
Suppose I decide to wrap the calls in futures: which execution context should I use, Play's default one?
If you do that, you gain nothing; it's like not using futures at all. Wrapping blocking calls in futures only helps if you execute them in a separate execution context.
In Play, you can basically choose between the following two approaches when dealing with blocking IO:
Turn Play into a one-thread-per-request framework by drastically increasing the number of threads in the default execution context. No futures needed, just call your blocking database as always. Simple, but not the intention behind Play.
Create specific execution contexts for your blocking IO calls and gain fine-grained control over what you are doing.
See the docs: "Understanding Play thread pools"
How well do Scala parallel collection operations get along with the concurrency/parallelism used by Akka Actors (and Futures) with respect to efficient scheduling on the system?
Execution of Actors and Futures is handled by an ExecutionContext, generally provided by the Dispatcher. What I find on parallel collections indicates that they use a TaskSupport object. I found an ExecutionContextTaskSupport object that may connect the two, but I am not sure.
What is the proper way to mix the two concurrency solutions, or is it advised not to?
At present this is not supported / handled well.
Prior to Scala 2.11-M7, attempting to use the dispatcher as the ExecutionContext threw an exception.
That is, the following code in an actor's receive will throw a NotImplementedError:
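import scala.collection.parallel.ExecutionContextTaskSupport

// Inside an actor's receive, where context.dispatcher is in scope:
val par = List(1, 2, 3).par
par.tasksupport = new ExecutionContextTaskSupport(context.dispatcher)
par foreach println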
Incidentally, this has been fixed in 2.11-M7, though it was not done to correct the above issue.
In reading through the notes on the fix it sounds like the implementation provided by ExecutionContextTaskSupport in the above case could have some overhead over directly using one of the other TaskSupport implementations; however, I have done nothing to test that interpretation or evaluate the magnitude of any impact.
A Note on Parallel Collections:
By default, Parallel Collections will use the global ExecutionContext (ExecutionContext.Implicits.global), just as you might for Futures. While this is well behaved, if you want to be constrained by the dispatcher (using context.dispatcher), as you are likely to do with Futures in Akka, you need to set a different TaskSupport as shown in the code sample above.
I'd like to use reflection in combination with parallel processing in Scala, but I'm getting bitten by reflection's lack of thread safety.
So, I'm considering just running each task in its own process (not thread).
Is there any easy way to do this?
For example, is there a way to configure .par so it spawns processes, not threads? Or is there some function fork that takes a closure and runs it in a new process?
EDIT: Futures are apparently a good way to go.
However, I still need to figure out how to run them in separate processes.
EDIT 2: I'm still having concurrency issues, even when using Akka's "fork-join-executor" dispatcher, which sure sounds like it should be forking processes. However, when I run ManagementFactory.getRuntimeMXBean().getName() inside the Futures, it seems everything still lives in the same process.
Is this the right way to check for actual process-level parallelism?
Am I using the correct Akka dispatcher?
EDIT 3: I realize reflection sucks. Unfortunately it is used in a library I need.
Have you looked into Scala Actors or Akka? There may be no more compelling reason to use Scala than for parallel and asynchronous programming. It's baked into the language. Check out these facilities. I'm pretty sure you'll find what you need.
There's little information about the problem you're trying to solve here. The previous answers are pretty much on the ball: look at Actors, Akka, etc., and you may find that you don't need to do anything too complicated. Introspection/reflection in a multi-threaded environment usually points to a messy, not well thought-out strategy for decomposing the problem at hand.
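On the check from EDIT 2: the "fork-join" in that dispatcher's name refers to forking tasks across worker threads inside one JVM, not to forking operating-system processes, so seeing a single runtime name is expected. The sketch below repeats the same check against the global context; SameProcessDemo and runtimeNames are illustrative names only.

```scala
import java.lang.management.ManagementFactory
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SameProcessDemo {
  // Runs n Futures and collects (runtime name, thread name) pairs,
  // using the same RuntimeMXBean check as in the question.
  def runtimeNames(n: Int)(implicit ec: ExecutionContext): Seq[(String, String)] = {
    val futures = (1 to n).map { _ =>
      Future((ManagementFactory.getRuntimeMXBean().getName(),
              Thread.currentThread().getName))
    }
    Await.result(Future.sequence(futures), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val names = runtimeNames(8)(ExecutionContext.global)
    // Thread names may differ, but the runtime (process) name is shared.
    names.foreach { case (process, thread) => println(s"$process on $thread") }
    println("distinct processes: " + names.map(_._1).distinct.size)
  }
}
```

Getting real process-level isolation would mean starting separate JVMs and exchanging data between them, which none of these thread-based executors do for you.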
When using futures in scala the default behaviour is to use the default Implicits.global execution context. It seems this defaults to making one thread available per processor. In a more traditional threaded web application this seems like a poor default when the futures are performing a task such as waiting on a database (as opposed to some cpu bound task).
I'd expect that overriding the default context would be fairly standard in production but I can find so little documentation about doing it that it seems that it might not be that common. Am I missing something?
Instead of thinking of it as overriding the default execution context, why not ask "Should I use multiple execution contexts for different things?" If that's the question, then my answer would be yes. Where I work, we use Akka. Within our app, we use the default Akka execution context for non-blocking functionality. Then, because there is currently no good non-blocking JDBC driver, all of our blocking SQL calls use a separate execution context with a thread-per-connection approach. Keeping the main execution context (a fork-join pool) free from blocking led to a significant increase in throughput for us.
I think it's perfectly ok to use multiple different execution contexts for different types of work within your system. It's worked well for us.
The "correct" answer is that methods that need an ExecutionContext should require one in their signature, so that you can supply ExecutionContext(s) from the "outside" and control execution at a higher level.
Yes, creating and using other execution contexts in your application is definitely a good idea.
Execution contexts modularize your concurrency model and isolate the different parts of your application, so that if something goes wrong in one part of your app, the other parts are less affected. To take your example, you would have one execution context for DB-specific operations and another one for, say, processing web requests.
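A minimal sketch of that shape, with made-up names (ContextInSignature, wordCount): the method never picks a context itself, and each caller decides where the work runs.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ContextInSignature {
  // Hypothetical service method: it requires a context from the caller
  // instead of importing a global one internally.
  def wordCount(text: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(text.split("\\s+").count(_.nonEmpty))

  def main(args: Array[String]): Unit = {
    // Caller A relies on an implicit context that is in scope.
    implicit val defaultEc: ExecutionContext = ExecutionContext.global
    println(Await.result(wordCount("a b c"), 5.seconds))

    // Caller B passes a specific pool explicitly.
    val pool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
    try println(Await.result(wordCount("one two")(pool), 5.seconds))
    finally pool.shutdown()
  }
}
```

Because the context is a parameter, the same method can be pointed at a test executor, a dispatcher, or a dedicated pool without touching its body.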
In this presentation by Jonas Boner this pattern is referred to as creating "Bulkheads" in your application for greater stability & fault tolerance.
I must admit I haven't heard much about execution context usage by itself. However, I do see this principle applied in some frameworks. For example, Play will use different execution contexts for different types of jobs and they encourage you to split your tasks into different pools if necessary: Play Thread Pools
The Akka middleware also suggests splitting your app into different contexts for the different concurrency zones in your application. It uses the concept of a Dispatcher, which is an execution context with extra features.
Also, most operators in the Scala concurrency library require an execution context. This is by design, to give you the flexibility you need when modularizing your application concurrency-wise.