Scala 2.11.x concurrency: pool of workers doing something similar to map-reduce? - scala

What is the idiomatic way to implement a pool of workers in Scala, such that work units coming from some source can be allocated to the next free worker and processed asynchronously? Each worker would produce a result and eventually, all the results would need to get combined to produce the overall result.
We do not know the number of work units on which we need to run a worker in advance and we do not know in advance the optimal number of workers, because that will depend on the system we run on.
So roughly what should happen is this:
for each work unit, eventually start a worker to process it
for each finished worker, combine its result into the global result
return the global result after all the worker results have been combined
Should this be done exclusively by futures, no matter how many work units and how many workers there will be? What if the results can only be combined when they are ALL available?
Most examples of futures I have seen have a fixed number of futures and then use for comprehension to combine them, but what if the number of futures is not known and I have e.g. just a collection of an arbitrary number of futures? What if there will be billions of easier work units to get processed that way versus just a few dozen long-running ones?
Are there other, better ways to do this, e.g. with Actors instead?
How would the design ideally change when the results of each worker does not need to get combined and each worker is completely independent of the others?

Too many questions in your question to address them all.
Basically, Futures will do what you want, you can create the ExecutionContext that better fits your needs. To combine the results: Future.sequence.

Related

Concurrency, how to create an efficient actor setup?

Alright so I have never done intense concurrent operations like this before, theres three main parts to this algorithm.
This all starts with a Vector of around 1 Million items.
Each item gets processed in 3 main stages.
Task 1: Make an HTTP Request, Convert received data into a map of around 50 entries.
Task 2: Receive the map and do some computations to generate a class instance based off the info found in the map.
Task 3: Receive the class and generate/add to multiple output files.
I initially started out by concurrently running task 1 with 64K entries across 64 threads (1024 entries per thread.). Generating threads in a for loop.
This worked well and was relatively fast, but I keep hearing about actors and how they are heaps better than basic Java threads/Thread pools. I've created a few actors etc. But don't know where to go from here.
Basically:
1. Are actors the right way to achieve fast concurrency for this specific set of tasks. Or is there another way I should go about it.
2. How do you know how many threads/actors are too many, specifically in task one, how do you know what the limit is on number of simultaneous connections is (Im on mac). Is there a golden rue to follow? How many threads vs how large per thread pool? And the actor equivalents?
3. Is there any code I can look at that implements actors for a similar fashion? All the code Im seeing is either getting an actor to print hello world, or super complex stuff.
1) Actors are a good choice to design complex interactions between components since they resemble "real life" a lot. You can see them as different people sending each other requests, it is very natural to model interactions. However, they are most powerful when you want to manage changing state in your application, which does not seem to be the case for you. You can achieve fast concurrency without actors. Up to you.
2) If none of your operations is blocking the best rule is amount of threads = amount of CPUs. If you use a non blocking HTTP client, and NIO when writing your output files then you should be fully non-blocking on IOs and can just safely set the thread count for your app to the CPU count on your machine.
3) The documentation on http://akka.io is very very good and comprehensive. If you have no clue how to use the actor model I would recommend getting a book - not necessarily about Akka.
1) It sounds like most of your steps aren't stateful, in which case actors add complication for no real benefit. If you need to coordinate multiple tasks in a mutable way (e.g. for generating the output files) then actors are a good fit for that piece. But the HTTP fetches should probably just be calls to some nonblocking HTTP library (e.g. spray-client - which will in fact use actors "under the hood", but in a way that doesn't expose the statefulness to you).
2) With blocking threads you pretty much have to experiment and see how many you can run without consuming too many resources. Worry about how many simultaneous connections the remote system can handle rather than hitting any "connection limits" on your own machine (it's possible you'll hit the file descriptor limit but if so best practice is just to increase it). Once you figure that out, there's no value in having more threads than the number of simultaneous connections you want to make.
As others have said, with nonblocking everything you should probably just have a number of threads similar to the number of CPU cores (I've also heard "2x number of CPUs + 1", on the grounds that that ensures there will always be a thread available whenever a CPU is idle).
With actors I wouldn't worry about having too many. They're very lightweight.
If you have really no expierience in Akka try to start with something simple like doing a one-to-one actor-thread rewriting of your code. This will be easier to grasp how things work in akka.
Spin two actors at the begining one for receiving requests and one for writting to the output file. Then when request is received create an actor in request-receiver actor that will do the computation and send the result to the writting actor.

What is the benefit of using Futures over parallel collections in scala?

Is there a good reason for the added complexity of Futures (vs parallel collections) when processing a list of items in parallel?
List(...).par.foreach(x=>longRunningAction(x))
vs
Future.traverse(List(...)) (x=>Future(longRunningAction(x)))
I think the main advantage would be that you can access the results of each future as soon as it is computed, while you would have to wait for the whole computation to be done with a parallel collection. A disadvantage might be that you end up creating lots of futures. If you later end up calling Future.sequence, there really is no advantage.
Parallel collections will kill of some threads as we get closer to processing all items. So last few items might be processed by single thread.
Please see my question for more details on this behavior Using ThreadPoolTaskSupport as tasksupport for parallel collections in scala
Future does no such thing, and all your threads are in use until all objects are processed. Hence unless your tasks are so small that you dont care about loss of parallelism for last few tasks and you are using huge number of threads, which have to killed of as soon as possible, Futures are better.
Futures become useful as soon as you want to compose your deferred / concurrent computations. Futures (the good kind, anyway, such as Akka's) are monadic and hence allow you to build arbitrarily complex computational structures with all the concurrency and synchronization handled properly by the Futures library.

Process work in parallel with non-threadsafe function in scala

I have a lot of work (thousands of jobs) for a Scala application to process. Each piece of work is the file name of a 100 MB file. To process each file, I need to use an extractor object that is not thread safe (I can have multiple copies, but copies are expensive, and I should not make one per job). What is the best way to complete this work in parallel in Scala?
You can wrap your extractor in an Actor and send each file name to the actor as a message. Since an instance of an actor will process only one message at a time, thread safety won't be an issue. If you want to use multiple extractors, just start multiple instances of the actor and balance between them (you could write another actor to act as a load balancer).
The extractor actor(s) can then send extracted files to other actors to do the rest of the processing in parallel.
Don't make 1000 jobs, but make 4x250 jobs (targeting 4 threads) and give one extractor to each batch. Inside each batch, work sequentially. This might not be optimal parallel-wise, since one batch might finish earlier but it is very easy to implement.
Probably the correct (but more complicated) solution would be to make a pool of extractors, where jobs take extractors from and put them back after finishing.
I would make a thread pool, where each thread has an instance of the extractor class, and instantiate just as many of these threads as it takes to saturate the system (based on CPU usage, IO bandwidth, memory bandwidth, network bandwidth, contention for other shared resources, etc.). Then use a thread-safe work queue that these threads can pull tasks from, process them, and iterate until the container is empty.
Mind you, there should be one or several libraries in just about any modern language that implements exactly this. In C++, it would be Intel's Threading Building Blocks. In Objective-C, it would be Grand Central Dispatch.
It depends: what's the relative amount of CPU consumed by the extractor for each job ?
If it is very small, you have a classic single-producer/multiple-consumer problem for which you can find lots of solution in different languages. For Scala, if you are reluctant to start using actors, you can still use the Java API (Runnable, Executors and BlockingQueue, are quite good).
If it is a substantial amount (more than 10%), you app will never scale with a multithread model (see Amdhal law). You may prefer to run several process (several JVM) to obtain thread safety, and thus eliminate the non-sequential part.
First question: how quick does the work need to be completed?
Second question: would this work be isolated to a single physical box or what are your upper bounds on computational resource.
Third question: does the work that needs doing to each individual "job" require blocking and is it serialised or could be partitioned into parallel packets of work?
Maybe think about a distributed model whereby you scale through designing with a mind to pushing out across multiple nodes from the first instance, actors, remoteref all that crap first...try and keep your logic simple and easy - so serialised. Don't just think in terms of a single box.
Most answers here seem to dwell on the intricacies of spawning thread pools and executors and all that stuff - which is fine, but be sure you have a handle on the real problem first, before you start complicating your life with lots of thinking around how you manage the synchronisation logic.
If a problem can be decomposed, then decompose it. Don't overcomplicate it for the sake of doing so - it leads to better engineered code and less sleepless nights.

Using Actors to exploit cores

I'm new to Scala in general and Actors in particular and my problem is so basic, the online resources I have found don't cover it.
I have a CPU-intensive, easily parallelized algorithm that will be run on an n-core machine (I don't know n). How do I implement this in Actors so that all available cores address the problem?
The first way I thought of was to simple break the problem into m pieces (where m is some medium number like 10,000) and create m Actors, one for each piece, give each Actor its little piece and let 'em go.
Somehow, this struck me as inefficient. Zillions of Actors just hanging around, waiting for some CPU love, pointlessly switching contexts...
Then I thought, make some smaller number of Actors, and feed each one several pieces. The problem was, there's no reason to expect the pieces are the same size, so one core might get bogged down, with many of its tasks still queued, while other cores are idle.
I noodled around with a Supervisor that knew which Actors were busy, and eventually realized that this has to be a solved problem. There must be a standard pattern (maybe even a standard library) for dealing with this very generic issue. Any suggestions?
Take a look at the Akka library, which includes an implementaton of actors. The Dispatchers Module gives you more options for limiting actors to cpu threads (HawtDispatch-based event-driven) and/or balancing the workload (Work-stealing event-based).
Generally, there're 2 kinds of actors: those that are tied to threads (one thread per actor), and those that share 1+ thread, working behind a scheduler/dispatcher that allocates resources (= possibility to execute a task/handle incoming message against controlled thread-pool or a single thread).
I assume, you use second type of actors - event-driven actors, because you mention that you run 10k of them. No matter how many event-driven actors you have (thousands or millions), all of them will be fighting for the small thread pool to handle the message. Therefore, you will even have a worse performance dividing your task queue into that huge number of portions - scheduler will either try to handle messages sent to 10k actors against a fixed thread pool (which is slow), or will allocate new threads in the pool (if the pool is not bounded), which is dangerous (in the worst case, there will be started 10k threads to handle messages).
Event-driven actors are good for short-time (ideally, non-blocking) tasks. If you're dealing with CPU-intensive tasks I'd limit number of threads in the scheduler/dispatcher pool (when you use event-driven actors) or actors themselves (when you use thread-based actors) to the number of cores to achieve the best performance.
If you want this to be done automatically (adjust number of threads in dispatcher pool to the number of cores), you should use HawtDisaptch (or it's Akka implementation), as it was proposed earlier:
The 'HawtDispatcher' uses the
HawtDispatch threading library which
is a Java clone of libdispatch. All
actors with this type of dispatcher
are executed on a single system wide
fixed sized thread pool. The number of
of threads will match the number of
cores available on your system. The
dispatcher delivers messages to the
actors in the order that they were
producer at the sender.
You should look into Futures I think. In fact, you probably need a threadpool which simply queues threads when a max number of threads has been reached.
Here is a small example involving futures: http://blog.tackley.net/2010/01/scala-futures.html
I would also suggest that you don't pay too much attention to the context switching since you really can't do anything but rely on the underlying implementation. Of course a rule of thumb would be to keep the active threads around the number of physical cores, but as I noted above this could be handled by a threadpool with a fifo-queue.
NOTE that I don't know if Actors in general or futures are implemented with this kind of pool.
For thread pools, look at this: http://www.scala-lang.org/api/current/scala/concurrent/ThreadPoolRunner.html
and maybe this: http://www.scala-lang.org/api/current/scala/actors/scheduler/ResizableThreadPoolScheduler.html
Good luck
EDIT
Check out this piece of code using futures:
import scala.actors.Futures._
object FibFut {
def fib(i: Int): Int = if (i < 2) 1 else fib(i - 1) + fib(i - 2)
def main(args: Array[String]) {
val fibs = for (i <- 0 to 42) yield future { fib(i) }
for (future <- fibs) println(future())
}
}
It showcases a very good point about futures, namely that you decide in which order to receive the results (as opposed to the normal mailbox-system which employs a fifo-system i.e. the fastest actor sends his result first).
For any significant project, I generally have a supervisor actor, a collection of worker actors each of which can do any work necessary, and a large number of pieces of work to do. Even though I do this fairly often, I've never put it in a (personal) library because the operations end up being so different each time, and the overhead is pretty small compared to the whole coding project.
Be aware of actor starvation if you end up utilizing the general actor threadpool. I ended up simply using my own algorithm-task-owned threadpool to handle the parallelization of a long-running, concurrent task.
The upcoming Scala 2.9 is expected to include parallel data structures which should automatically handle this for some uses. While it does not use Actors, it may be something to consider for your problem.
While this feature was originally slated for 2.8, it has been postponed until the next major release.
A presentation from the last ScalaDays is here:
http://days2010.scala-lang.org/node/138/140

Does it make sense to use a pool of Actors?

I'm just learning, and really liking, the Actor pattern. I'm using Scala right now, but I'm interested in the architectural style in general, as it's used in Scala, Erlang, Groovy, etc.
The case I'm thinking of is where I need to do things concurrently, such as, let's say "run a job".
With threading, I would create a thread pool and a blocking queue, and have each thread poll the blocking queue, and process jobs as they came in and out of the queue.
With actors, what's the best way to handle this? Does it make sense to create a pool of actors, and somehow send messages to them containing or the jobs? Maybe with a "coordinator" actor?
Note: An aspect of the case which I forgot to mention was: what if I want to constrain the number of jobs my app will process concurrently? Maybe with a config setting? I was thinking that a pool might make it easy to do this.
Thanks!
A pool is a mechanism you use when the cost of creating and tearing down a resource is high. In Erlang this is not the case so you should not maintain a pool.
You should spawn processes as you need them and destroy them when you have finished with them.
Sometimes, it makes sense to limit how many working processes you have operating concurrently on a large task list, as the task the process is spawned to complete involve resource allocations. At the very least processes use up memory, but they could also keep open files and/or sockets which tend to be limited to only thousands and fail miserably and unpredictable once you run out.
To have a pull-driven task pool, one can spawn N linked processes that ask for a task, and one hand them a function they can spawn_monitor. As soon as the monitored process has ended, they come back for the next task. Specific needs drive the details, but that is the outline of one approach.
The reason I would let each task spawn a new process is that processes do have some state and it is nice to start off a clean slate. It's a common fine-tuning to set the min-heap size of processes adjusted to minimize the number of GCs needed during its lifetime. It is also a very efficient garbage collection to free all memory for a process and start on a new one for the next task.
Does it feel weird to use twice the number of processes like that? It's a feeling you need to overcome in Erlang programming.
There is no best way for all cases. The decision depends on the number, duration, arrival, and required completion time of the jobs.
The most obvious difference between just spawning off actors, and using pools is that in the former case your jobs will be finished nearly at the same time, while in the latter case completion times will be spread in time. The average completion time will be the same though.
The advantage of using actors is the simplicity on coding, as it requires no extra handling. The trade-off is that your actors will be competing for your CPU cores. You will not be able to have more parallel jobs than CPU cores (or HT's, whatever), no matter what programming paradigm you use.
As an example, imagine that you need to execute 100'000 jobs, each taking one minute, and the results are due next month. You have four cores. Would you spawn off 100'000 actors having each compete over the resources for a month, or would you just queue your jobs up, and have execute four at a time?
As a counterexample, imagine a web server running on the same machine. If you have five requests, would you prefer to serve four users in T time, and one in 2T, or serve all five in 1.2T time ?