Mixing Parallel Collections with Akka - scala

How well do Scala parallel collection operations get along with the concurrency/parallelism used by Akka Actors (and Futures) with respect to efficient scheduling on the system?
Actors' and Futures' execution is handled by an ExecutionContext, generally provided by the Dispatcher. What I find on parallel collections indicates they use a TaskSupport object. I found an ExecutionContextTaskSupport object that may connect the two, but I am not sure.
What is the proper way to mix the two concurrency solutions, or is it advised not to?

At present this is not supported / handled well.
Prior to Scala 2.11-M7, attempting to use the dispatcher as the ExecutionContext throws an exception.
That is, the following code in an actor's receive will throw a NotImplementedError:
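import scala.collection.parallel.ExecutionContextTaskSupport
// inside an Actor, so `context` is in scope
val par = List(1,2,3).par
par.tasksupport = new ExecutionContextTaskSupport(context.dispatcher)
par foreach println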
Incidentally, this has been fixed in 2.11-M7, though it was not done to correct the above issue.
In reading through the notes on the fix it sounds like the implementation provided by ExecutionContextTaskSupport in the above case could have some overhead over directly using one of the other TaskSupport implementations; however, I have done nothing to test that interpretation or evaluate the magnitude of any impact.
A Note on Parallel Collections:
By default Parallel Collections will use the global ExecutionContext (ExecutionContext.Implicits.global) just as you might use for Futures. While this is well behaved, if you want to be constrained by the dispatcher (using context.dispatcher), as you are likely to do with Futures in Akka, you need to set a different TaskSupport as shown in the code sample above.
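Setting a TaskSupport can be tried without an actor system. The following is a minimal sketch, assuming Scala 2.12, where parallel collections ship with the standard library (on 2.13 they live in the separate scala-parallel-collections module and also need `import scala.collection.parallel.CollectionConverters._`). The object name, method name and the dedicated ForkJoinPool are illustrative stand-ins for `context.dispatcher`, not anything from Akka.

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ExecutionContextTaskSupport
import scala.concurrent.ExecutionContext

object ParTaskSupportSketch {
  // Doubles and sums the elements on a dedicated pool instead of the global one.
  // `parallelism` is a hypothetical knob standing in for dispatcher configuration.
  def doubledSum(xs: Vector[Int], parallelism: Int): Int = {
    val pool = new ForkJoinPool(parallelism)
    try {
      val par = xs.par
      // Route this collection's tasks through our own ExecutionContext.
      par.tasksupport = new ExecutionContextTaskSupport(ExecutionContext.fromExecutor(pool))
      par.map(_ * 2).sum
    } finally pool.shutdown()
  }
}
```

Inside an actor, the same assignment would take `context.dispatcher` in place of the wrapped pool, subject to the version caveat described above.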

Related

What is the use of FastFuture in akka

What is the use of FastFuture in Akka? It is not clear from the documentation:
Provides alternative implementations of the basic transformation operations defined on Future, which try to avoid scheduling to an ExecutionContext if possible, i.e. if the given future value is already present.
How is it different from Future? Can someone explain, with an example, in which cases it should be used and what benefit it provides in terms of performance or other aspects?
When an ExecutionContext is used in map calls on Scala Futures, each call involves extra scheduling cost. With Akka's FastFuture, map can be performed in the same thread, avoiding a potential context switch and the cache misses it may cause for very short tasks (like simple number crunching). So for fast map operations FastFuture should be faster.
Please note that flatMap usually requires an ExecutionContext with FastFuture too, as it is needed for scheduling the generated Futures.
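The idea of skipping the scheduler when the value is already present can be illustrated with plain Scala Futures. This is only a sketch of the technique, not Akka's FastFuture implementation; `fastMap` and `FastMapSketch` are hypothetical names.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

object FastMapSketch {
  // If the future is already completed, apply f on the calling thread;
  // otherwise fall back to the regular, scheduled map.
  def fastMap[A, B](fa: Future[A])(f: A => B)(implicit ec: ExecutionContext): Future[B] =
    fa.value match {
      case Some(Success(a)) => Future.fromTry(Try(f(a))) // no scheduling needed
      case Some(Failure(e)) => Future.failed(e)          // propagate the failure as-is
      case None             => fa.map(f)                 // not ready yet: schedule as usual
    }
}
```

For an already completed input such as `Future.successful(20)`, the result of `fastMap` is itself completed by the time the call returns, which is the saving the answer describes.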
It might be worth checking Viktor Klang's blog and the discussion related to Futures on the Scala contributors page.

Atomic function/method in scala (without introducing actor system overheads)

I currently use an Akka actor to establish a code block that is executed atomically and in a thread safe manner (Akka mailbox semantics impose atomicity by virtue of processing one message at a time).
However this introduces the need for an actor system, and additional side-effects or bloat (having to manually propagate exceptions to the caller, losing type safety on ask, and in general using message semantics rather than function calls).
Can a thread-safe atomic code block be accomplished in Scala in a simpler way? Would you apply @volatile to a function?
It depends on what kind of shared state you want to protect here:
The easiest and most universal choice is plain old synchronized. However, unlike Akka, it is completely blocking, so it may easily kill your performance and, of course, your code style, as it is hard to control messy side effects. It may also allow for deadlocks.
Java's locks are the same approach, but might be a little better for performance.
Another option is Java's AtomicReference (which implements CAS operations) and related classes. The positive thing about them is that they are non-blocking; developers actually use them to build high-performance collections. The ways of using locks and CAS are described here. Both are pretty low-level mechanisms, so I would not recommend using them much, especially for business logic (any actor implementation would be better).
If your shared state is a collection, you may want to use Java's concurrent collections (they have atomic operations like putIfAbsent). Scala has an interesting non-blocking TrieMap, for instance.
Scala STM is also an alternative.
Finally, this question is dedicated to lightweight actor model implementations.
P.S. The @volatile annotation is nothing more than an analog of Java's volatile keyword. You can put it on a method only because any annotation can be put on anything.
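The non-blocking options from the list above can be sketched as follows: a CAS retry loop over an AtomicReference holding immutable state, and an atomic putIfAbsent on a TrieMap. `NonBlockingSketch`, `EventLog` and `registerOnce` are hypothetical names chosen for illustration.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec
import scala.collection.concurrent.TrieMap

object NonBlockingSketch {
  // Shared state is an immutable Vector swapped atomically via compare-and-set.
  final class EventLog {
    private val state = new AtomicReference(Vector.empty[String])

    @tailrec
    def append(event: String): Unit = {
      val current = state.get()
      // If another thread changed the state in the meantime, retry with the new value.
      if (!state.compareAndSet(current, current :+ event)) append(event)
    }

    def snapshot: Vector[String] = state.get()
  }

  private val registry = TrieMap.empty[String, Int]

  // Atomically stores the value only if the key is new; returns the previous value, if any.
  def registerOnce(key: String, value: Int): Option[Int] =
    registry.putIfAbsent(key, value)
}
```

The CAS loop never blocks a thread, but under heavy contention it may retry many times, so it suits small and cheap state updates best.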
Depending on what you're trying to achieve, the simplest might be old synchronized:
//your mutable state
private var x = 0
//better than locking on 'this' is to have a dedicated lock
private val lock = new Object
def add(i:Int) = lock.synchronized { x += i }
This is the 'old Java' way, but it might work for you depending on what you're doing. Of course, this is the fastest way to deadlocks if your synchronize operation is more complex and/or you need high throughput.

Scala futures and threads

Reading the scala source code for scala.concurrent.Future and scala.concurrent.impl.Future, it seems that every future composition via map dispatches a new task for an executor. I assume this commonly triggers a context switch for the current thread, and/or an assignment of a thread for the job.
Considering that function flows need to pass around Futures between them to act on results of Futures without blocking (or without delving into callback spaghetti), isn't this "reactive" paradigm very pricy in practice, when code is well written in a modular way where each function only does something small and passes along to other ones?
It depends on the execution context. So you can choose the strategy.
Your executor can also just do it in the calling thread, keeping the map calls on the same thread. You can choose your own strategy by passing the execution context explicitly or by using the implicit one.
I would first test what the default fork/join pool does, by logging which thread was used. AFAIK, newer versions of it sometimes utilize the submitting thread. However, I don't know if that is applied to Scala future callbacks.
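A calling-thread strategy and the suggested thread logging can be sketched like this. `CallerRunsSketch` and `sameThread` are hypothetical names; the sketch assumes the mapped future is already completed, in which case the callback is handed straight to the supplied context.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object CallerRunsSketch {
  // An ExecutionContext that runs every task on whichever thread submits it.
  val sameThread: ExecutionContext = new ExecutionContext {
    def execute(runnable: Runnable): Unit = runnable.run()
    def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
  }

  // Returns the name of the thread that executed the map callback.
  def mapThreadName(ec: ExecutionContext): String = {
    val f = Future.successful(1).map(_ => Thread.currentThread().getName)(ec)
    Await.result(f, 2.seconds)
  }
}
```

Comparing `CallerRunsSketch.mapThreadName(CallerRunsSketch.sameThread)` with `CallerRunsSketch.mapThreadName(ExecutionContext.global)` shows which thread each strategy uses. A same-thread context is only reasonable for short, non-blocking callbacks, and long synchronous chains can grow the stack.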

Should I override the default ExecutionContext?

When using futures in scala the default behaviour is to use the default Implicits.global execution context. It seems this defaults to making one thread available per processor. In a more traditional threaded web application this seems like a poor default when the futures are performing a task such as waiting on a database (as opposed to some cpu bound task).
I'd expect that overriding the default context would be fairly standard in production but I can find so little documentation about doing it that it seems that it might not be that common. Am I missing something?
Instead of thinking of it as overriding the default execution context, why not ask instead "Should I use multiple execution contexts for different things?" If that's the question, then my answer would be yes. Where I work, we use Akka. Within our app, we use the default Akka execution context for non-blocking functionality. Then, because there is no good non-blocking JDBC driver currently, all of our blocking SQL calls use a separate execution context, where we have a thread-per-connection approach. Keeping the main execution context (a fork/join pool) free from blocking led to a significant increase in throughput for us.
I think it's perfectly ok to use multiple different execution contexts for different types of work within your system. It's worked well for us.
The "correct" answer is that methods which need an ExecutionContext should require one in their signature, so you can supply ExecutionContext(s) from the "outside" to control execution at a higher level.
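Both points, a separate context for blocking work and a context supplied through the method signature, can be sketched together. The pool size, `loadUser` and the sleep standing in for a blocking JDBC call are all assumptions made for illustration.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object BulkheadSketch {
  // Dedicated pool for blocking calls; 8 is a hypothetical stand-in for the connection count.
  val blockingEc: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

  // The method only asks for an ExecutionContext; the caller decides which one.
  def loadUser(id: Int)(implicit ec: ExecutionContext): Future[String] =
    Future {
      Thread.sleep(50) // stand-in for a blocking database call
      s"user-$id"
    }

  def demo(): String = {
    // Blocking part runs on the dedicated pool, the cheap transformation on the global one.
    val name = loadUser(42)(blockingEc).map(_.toUpperCase)(ExecutionContext.global)
    Await.result(name, 2.seconds)
  }
}
```

Call `BulkheadSketch.blockingEc.shutdown()` when the application stops, since the fixed pool's threads are not daemon threads.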
Yes, creating and using other execution contexts in your application is definitely a good idea.
Execution contexts will modularize your concurrency model and isolate the different parts of your application, so that if something goes wrong in a part of your app, the other parts will be less impacted by this. To consider your example, you would have a different execution context for DB-specific operations and another one for say, processing of web requests.
In this presentation by Jonas Boner this pattern is referred to as creating "Bulkheads" in your application for greater stability & fault tolerance.
I must admit I haven't heard much about execution context usage by itself. However, I do see this principle applied in some frameworks. For example, Play will use different execution contexts for different types of jobs and they encourage you to split your tasks into different pools if necessary: Play Thread Pools
The Akka middleware also suggests splitting your app into different contexts for the different concurrency zones in your application. They use the concept of Dispatcher which is an execution context on batteries.
Also, most operators in the Scala concurrency library require an execution context. This is by design, to give you the flexibility you need when modularizing your application concurrency-wise.

More parallel actors in scala

(sort of followup to How to make a code thread safe in scala? )
I have a Scala class whose instances can inherently be called from only one thread (let's call it class ThreadUnsafeProducer); it is, however, safe to have several threads each accessing exactly one object of their own. However, the ThreadUnsafeProducer is quite memory heavy, so I don't want each thread to have one ThreadUnsafeProducer.
I want to have a given number N of ThreadUnsafeProducer objects (ideally one for each CPU).
I have lots of threads Consumer that all share the same object SharedObject.
I want to somehow use the Actor model to send messages to either SharedObject or ThreadUnsafeProducer (I am not sure which), so that a given number of ThreadUnsafeProducer objects run concurrently. And I am quite lost in all the Akka/Actors classes.
I recently found Akka Routing classes
http://doc.akka.io/docs/akka/2.0/scala/routing.html
It looks really nice and exactly what I need. If it works it would be beautiful.
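For comparison, the underlying idea (N producers, each touched by only one thread, with requests spread round-robin) can be sketched without Akka. This is not the Akka routing API; `ProducerPool` is a hypothetical name, and the `ThreadUnsafeProducer` body is a placeholder for the real class from the question.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

// Placeholder for the memory-heavy, thread-unsafe class.
final class ThreadUnsafeProducer(val id: Int) {
  private var calls = 0 // deliberately unsynchronized
  def produce(input: String): String = { calls += 1; s"$input@$id#$calls" }
}

final class ProducerPool(n: Int) {
  // Each producer is paired with its own single-thread executor, so it is
  // only ever accessed from that one thread.
  private val slots: Vector[(ThreadUnsafeProducer, ExecutionContextExecutorService)] =
    Vector.tabulate(n) { i =>
      (new ThreadUnsafeProducer(i),
        ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor()))
    }
  private val next = new AtomicLong(0)

  // Round-robin dispatch; safe to call from any number of consumer threads.
  def produce(input: String): Future[String] = {
    val (producer, ec) = slots((next.getAndIncrement() % n).toInt)
    Future(producer.produce(input))(ec)
  }

  def shutdown(): Unit = slots.foreach(_._2.shutdown())
}
```

A round-robin router over N actors, each owning one ThreadUnsafeProducer, gives the same confinement, with the mailbox playing the role of the single-thread executor's queue.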