Cats: can a List of State monads "fail fast" on the <...>.sequence method? - scala

Let's say we have a list of states and we want to sequence them:
import cats.data.State
import cats.instances.list._
import cats.syntax.traverse._
trait MachineState
case object ContinueRunning extends MachineState
case object StopRunning extends MachineState
case class Machine(candy: Int)
val addCandy: Int => State[Machine, MachineState] = amount =>
State[Machine, MachineState] { machine =>
val newCandyAmount = machine.candy + amount
if(newCandyAmount > 10)
(machine, StopRunning)
else
(machine.copy(newCandyAmount), ContinueRunning)
}
List(addCandy(1),
addCandy(2),
addCandy(5),
addCandy(10),
addCandy(20),
addCandy(50)).sequence.run(Machine(0)).value
The result would be
(Machine(8),List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
It's obvious that the last 3 steps are redundant. Is there a way to make this sequence stop early? Here, when StopRunning gets returned, I would like to stop. For example, a list of Eithers would fail fast and stop the sequence early if needed (because it acts like a monad).
For the record: I do know that it is possible to simply write a tail recursion that checks each state as it is run and stops the recursion once some condition is satisfied. I just want to know if there is a more elegant way of doing this. The recursive solution seems like a lot of boilerplate to me; am I wrong?
Thank you! :))

There are two things that need to be done here.
The first is understanding what is actually happening:
State takes some state value, threads it between many composed calls, and produces some output value in the process
in your case Machine is the state threaded between calls, while MachineState is the output of a single operation
sequence (usually) takes a collection (here List) of some parametric type (here State[Machine, _]) and flips the nesting (here: List[State[Machine, _]] -> State[Machine, List[_]]), where _ is the gap that you fill with your type
the result is that you thread the state (Machine(0)) through all the functions, while combining the output of each of them (MachineState) into a list of outputs
// ammonite
// to better see how many times things are being run
@ {
val addCandy: Int => State[Machine, MachineState] = amount =>
State[Machine, MachineState] { machine =>
val newCandyAmount = machine.candy + amount
println("new attempt with " + machine + " and " + amount)
if(newCandyAmount > 10)
(machine, StopRunning)
else
(machine.copy(newCandyAmount), ContinueRunning)
}
}
addCandy: Int => State[Machine, MachineState] = ammonite.$sess.cmd24$$$Lambda$2669/1733815710@25c887ca
@ List(addCandy(1),
addCandy(2),
addCandy(5),
addCandy(10),
addCandy(20),
addCandy(50)).sequence.run(Machine(0)).value
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
new attempt with Machine(8) and 20
new attempt with Machine(8) and 50
res25: (Machine, List[MachineState]) = (Machine(8), List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
In other words, if what you want is circuit breaking, then .sequence might not be the right tool.
As a matter of fact, you probably want something else: to combine a list of A => (A, B) functions into one function which skips the next computation if the result of a computation is StopRunning (in your code nothing says what the condition for breaking the circuit is or how it should be performed). I would suggest doing it explicitly with some other function, e.g.:
@ {
List(addCandy(1),
addCandy(2),
addCandy(5),
addCandy(10),
addCandy(20),
addCandy(50))
.reduce { (a, b) =>
a.flatMap {
// flatMap and map uses MachineState
// - the second parameter is the result after all!
// we are pattern matching on it to decide if we want to
// proceed with computation or stop it
case ContinueRunning => b // runs next computation
case StopRunning => State.pure(StopRunning) // returns current result without modifying it
}
}
.run(Machine(0))
.value
}
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
res23: (Machine, MachineState) = (Machine(8), StopRunning)
This avoids running the code within addCandy once a step has returned StopRunning. You cannot get rid of the code that combines the states together, though, so this reduce logic will still be applied at runtime n-1 times (where n is the size of your list), and that cannot be helped.
BTW, if you take a closer look at Either, you will find that the same holds there: the n Either values in the list are computed first, and only then does sequence combine them, so it looks like circuit breaking but in fact isn't. Sequence combines the results of "parallel" computations, but it won't interrupt them if any of them fails.
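For comparison, the explicit early-exit loop mentioned in the question can be kept fairly small. The following is only a sketch and does not use cats at all: it assumes each step is modelled as a plain function Machine => (Machine, MachineState), and the names Step, runUntilStop and demo are made up for illustration.

```scala
sealed trait MachineState
case object ContinueRunning extends MachineState
case object StopRunning extends MachineState
final case class Machine(candy: Int)

object CircuitBreak {
  // Hypothetical stand-in for State[Machine, MachineState]: a plain function.
  type Step = Machine => (Machine, MachineState)

  def addCandy(amount: Int): Step = machine => {
    val newCandyAmount = machine.candy + amount
    if (newCandyAmount > 10) (machine, StopRunning)
    else (machine.copy(candy = newCandyAmount), ContinueRunning)
  }

  // Runs the steps in order and returns as soon as one reports StopRunning.
  // The Int counts how many steps were actually executed.
  @annotation.tailrec
  def runUntilStop(steps: List[Step], machine: Machine, executed: Int = 0): (Machine, MachineState, Int) =
    steps match {
      case Nil => (machine, ContinueRunning, executed)
      case step :: rest =>
        step(machine) match {
          case (next, StopRunning)     => (next, StopRunning, executed + 1)
          case (next, ContinueRunning) => runUntilStop(rest, next, executed + 1)
        }
    }

  def demo(): (Machine, MachineState, Int) =
    runUntilStop(List(1, 2, 5, 10, 20, 50).map(addCandy), Machine(0))
}
```

With the inputs from the question, only the first four steps run; the steps for 20 and 50 are never invoked.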

Related

Unable to get println working in Grouped Integer Range map function

I am experimenting with below code:
def TestRun(n: Int): Unit = {
(1 to n)
.grouped(4)
.map(grp => { println("Group length is: " + grp.length)})
}
TestRun(100)
And I am a bit surprised that I do not see any output from println after executing the program. The code compiled and ran successfully, but without the expected output.
Kindly point out what mistake I am making.
The reason there is no output is that grouped returns an Iterator, which is lazy. This means that it won't create any data until it is asked. Likewise, map on an Iterator returns another lazy Iterator, so the result is an Iterator that will produce its values only when asked. TestRun never asks for the data, so it is never generated.
One way round this is to use foreach rather than map, because foreach is eager (the opposite of lazy) and will take each value from the Iterator in turn.
Another way would be to force the Iterator to become a concrete collection using something like toList:
def TestRun(n: Int): Unit = {
(1 to n)
.grouped(4)
.map(grp => { println("Group length is: " + grp.length)})
.toList
}
TestRun(100)
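The laziness is easy to observe with a counter. This is a small sketch (LazyGrouped and its demo method are illustrative names, not part of the question's code) showing that the function passed to map does not run until something consumes the Iterator:

```scala
object LazyGrouped {
  // Returns (calls before consuming, group lengths, calls after consuming).
  def demo(n: Int): (Int, List[Int], Int) = {
    var calls = 0
    val lengths = (1 to n).grouped(4).map { grp => calls += 1; grp.length }
    val before = calls            // still 0: map on an Iterator is lazy
    val forced = lengths.toList   // consuming the Iterator runs the function
    (before, forced, calls)
  }
}
```

For n = 10 the groups are 1-4, 5-8 and 9-10, so the function runs three times, but only once toList is called.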

handle multiple Future in Scala

I want to create a list of Futures, each of which could pass or fail, and collate the results from the successful ones. How can I do this?
val futures2:List[Future[Int]] = List(Future{1}, Future{2},Future{throw new Exception("error")})
Questions
1) I want to wait for each future to finish
2) I want to collect sum of return values from each success future and ignore the ones which failed (so I should get 3).
One thing that you need to understand is this: avoid trying to "get" values out of a Future or Futures.
You can keep on operating in the Futuristic land.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val futureList = List(
Future(1),
Future(2),
Future(throw new Exception("error"))
)
// add 1 to futures
// map will propagate errors to transformed futures
// only successful futures will result in +1, rest will stay with errors
val transformedFutureList = futureList
.map(future => future.map(i => i + 1))
// print values of futures
// similar to map... foreach will work only with successful futures
futureList
.foreach(future => future.foreach(i => println(i)))
// now lets give you sum of your "future" values
// onComplete returns Unit, so it cannot be used to build the sum;
// recover makes a failed future contribute nothing instead
val sumFuture = futureList
.foldLeft(Future(0))((facc, f) =>
facc.flatMap(acc => f.map(i => acc + i).recover { case _ => acc }))
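As a complementary sketch of the "sum only the successes" requirement, the version below first turns every Future into one that always completes with a Try, and then sums the successful values. It assumes Scala 2.12 or later (for Future.transform taking a Try => Try function); the names SumSuccessful, sumOfSuccesses and demo are illustrative, and the blocking Await in demo is there only to make the result visible.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Try}

object SumSuccessful {
  // Turns every future into one that always succeeds with a Try,
  // then keeps only the successful values and sums them.
  def sumOfSuccesses(futures: List[Future[Int]]): Future[Int] = {
    val settled: List[Future[Try[Int]]] =
      futures.map(f => f.transform[Try[Int]]((t: Try[Int]) => Success(t)))
    Future.sequence(settled).map(tries => tries.collect { case Success(i) => i }.sum)
  }

  // Blocking happens only here, at the very edge, to show the result.
  def demo(): Int = {
    val futures = List(Future(1), Future(2), Future[Int](throw new Exception("error")))
    Await.result(sumOfSuccesses(futures), 5.seconds)
  }
}
```

Because each wrapped future always succeeds, Future.sequence waits for all of them instead of failing on the first error.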
And since the OP (@Manu Chanda) asked about "getting" a value from a Promise, I am adding some bits about what Promises are in Scala.
So... first, let's talk about how to think about a Future in Scala.
If you see a Future[Int], try to think of it as an ongoing computation which is "supposed to produce" an Int. Now that computation can complete successfully and result in a Success[Int], or throw an exception and result in a Failure wrapping the Throwable. And thus you see functions such as onComplete, recoverWith and onFailure, which read as if they are talking about a computation.
val intFuture = Future {
// all this inside Future {} is going to run in some other thread
val i = 5;
val j = i + 10;
val k = j / 5;
k
}
Now... what is a Promise.
Well... as the name indicates... a Promise[Int] is a promise of an Int value... nothing more.
Just like when a parent promises a certain toy to their child. Note that in this case... the parent has not necessarily started working on getting that toy, they have just promised that they will.
To complete the promise... they will first have to start working on it... go to the market... buy it from a shop... come back home. Or... sometimes... they are busy, so... they will ask someone else to bring that toy and keep doing their own work... that other guy will try to bring the toy to the parent (he may fail to buy it), and then the parent will complete the promise with whatever result they got from him.
So... basically a Promise wraps a Future inside of it. And that "wrapped" Future's "value" can be considered as the value of the Promise.
so...
println("Well... The program wants an 'Int' toy")
// we "promised" our program that we will give it that int "toy"
val intPromise = Promise[Int]()
// now we can just move on with our life
println("Well... We just promised an 'Int' toy")
// while the program can make plans with how will it play with that "future toy"
val intFuture = intPromise.future
val plusOneIntFuture = intFuture.map(i => i + 1)
plusOneIntFuture.onComplete({
case Success(i) => println("Wow... I got the toy and modified it to - " + i)
case Failure(ex) => println("I did not get the toy")
})
// but since we at least want to try to complete our promise
println("Now... I suppose we need to get that 'Int' toy")
println("But... I am busy... I can not stop everything else for that toy")
println("ok... lets ask another thread to get that")
val getThatIntFuture = Future {
println("Well... I am thread 2... trying to get the int")
val i = 1
println("Well... I am thread 2... lets just return this i = 1 thingy")
i
}
// now lets complete our promise with whatever we will get from this other thread
getThatIntFuture.onComplete(intTry => intPromise.complete(intTry))
The above code will result in following output,
Well... The program wants an 'Int' toy
Well... We just promised an 'Int' toy
Now... I suppose we need to get that 'Int' toy
But... I am busy... I can not stop everything else for that toy
ok... lets ask another thread to get that
Well... I am thread 2... trying to get the int
Well... I am thread 2... lets just return this i = 1 thingy
Wow... I got the toy and modified it to - 2
Promises don't help you in "getting" a value from a Future. Asynchronous processes (or Futures in Scala) just run on another timeline... you can not "get" their "value" in your timeline unless you work on aligning your timeline with that of the process itself.

Scala: what is the interest in using Iterators?

I started using Iterators after working with Regexes in Scala, but I don't really understand the point of them.
I know that an Iterator has state, and that if I call the next() method on it, it will return a different result every time, but I don't see anything I can do with it that is not possible with an Iterable.
And it doesn't seem to work like Akka Streams (for example), since the following example prints all the numbers at once (without waiting one second between them, as I would expect):
lazy val a = Iterator({Thread.sleep(1000); 1}, {Thread.sleep(1000); 2}, {Thread.sleep(1000); 3})
while(a.hasNext){ println(a.next()) }
So what is the purpose of using Iterators?
Perhaps, the most useful property of iterators is that they are lazy.
Consider something like this:
(1 to 10000)
.map { x => x * x }
.map { _.toString }
.find { _ == "4" }
This snippet will square 10000 numbers, then generate 10000 strings, and then return the second one.
This on the other hand:
(1 to 10000)
.iterator
.map { x => x * x }
.map { _.toString }
.find { _ == "4" }
... only computes two squares, and generates two strings.
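The difference can be measured by counting how often the squaring function is called. A minimal sketch (LazyFind and run are hypothetical names chosen for this illustration):

```scala
object LazyFind {
  // Returns the search result together with the number of squares computed.
  def run(useIterator: Boolean): (Option[String], Int) = {
    var squared = 0
    def square(x: Int): Int = { squared += 1; x * x }
    val result =
      if (useIterator) (1 to 10000).iterator.map(square).map(_.toString).find(_ == "4")
      else (1 to 10000).map(square).map(_.toString).find(_ == "4")
    (result, squared)
  }
}
```

Calling run(useIterator = false) squares all 10000 numbers before find starts, while run(useIterator = true) stops after the second element.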
Iterators are also often useful when you need to wrap around some poorly designed (java?) objects in order to be able to handle them in functional style:
val rs: ResultSet = jdbcQuery.executeQuery()
new Iterator[ResultSet] {
def next() = rs
// note: hasNext advances the cursor, so call it exactly once per next()
def hasNext = rs.next()
}.map { rs =>
fetchData(rs)
}
Streams are similar to iterators - they are also lazy, and also useful for wrapping:
Stream.continually(rs).takeWhile { _.next }.map(fetchData)
The main difference though is that streams remember the data that gets materialized, so that you can traverse them more than once. This is convenient, but may be costly if the original amount of data is very large, especially, if it gets filtered down to much smaller size:
Source
.fromFile("huge_file.txt")
.getLines
.filter(_ == "")
.toList
This only uses, roughly (ignoring buffering, object overhead, and other implementation-specific details), the amount of memory necessary to keep one line in memory, plus however many empty lines there are in the file.
This on the other hand:
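import java.io.{BufferedReader, FileReader}
// readLine is defined on BufferedReader, not on FileReader
val reader = new BufferedReader(new FileReader("huge_file.txt"))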
Stream
.continually(reader.readLine)
.takeWhile(_ != null)
.filter(_ == "")
.toList
... will end up with the entire content of the huge_file.txt in memory.
Finally, if I understand the intent of your example correctly, here is how you could do it with iterators:
val iterator = Seq(1,2,3).iterator.map { n => Thread.sleep(1000); n }
iterator.foreach(println)
// Or while(iterator.hasNext) { println(iterator.next) } as you had it.
There is a good explanation of what an iterator is at http://www.scala-lang.org/docu/files/collections-api/collections_43.html:
An iterator is not a collection, but rather a way to access the
elements of a collection one by one. The two basic operations on an
iterator it are next and hasNext. A call to it.next() will return the
next element of the iterator and advance the state of the iterator.
Calling next again on the same iterator will then yield the element
one beyond the one returned previously. If there are no more elements
to return, a call to next will throw a NoSuchElementException.
First of all, you should understand what is wrong with your example:
lazy val a = Iterator({Thread.sleep(1); 1}, {Thread.sleep(1); 2}, {Thread.sleep(2); 3})
while(a.hasNext){ println(a.next()) }
If you look at the apply method of Iterator, you'll see that its parameters are not by-name, so all the Thread.sleep calls are executed up front, when apply is called, rather than one per element. Also, Thread.sleep takes the time to sleep in milliseconds, so if you want your thread to sleep for one second you should call Thread.sleep(1000).
The companion object has additional methods which allow you to do the following:
val a = Iterator.iterate(1)(x => {Thread.sleep(1000); x+1})
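To make the eager-versus-deferred distinction concrete, here is a small sketch that counts evaluations instead of sleeping. EagerVsLazy, tick and demo are illustrative names only:

```scala
object EagerVsLazy {
  // Returns (evaluations right after Iterator.apply,
  //          evaluations right after building the mapped iterator,
  //          evaluations after consuming the mapped iterator).
  def demo(): (Int, Int, Int) = {
    var evaluated = 0
    def tick(n: Int): Int = { evaluated += 1; n }

    val eager = Iterator(tick(1), tick(2), tick(3)) // arguments are evaluated right here
    val afterEager = evaluated

    evaluated = 0
    val deferred = Iterator(1, 2, 3).map(tick)      // nothing is evaluated yet
    val beforeUse = evaluated
    deferred.foreach(_ => ())                       // each element is evaluated on demand
    (afterEager, beforeUse, evaluated)
  }
}
```

The arguments of Iterator(...) are ordinary by-value arguments, so all three are computed before the iterator exists; the function given to map runs only as elements are requested.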
Iterator is very useful when you need to work with large data. Also you can implement your own:
val it = new Iterator[Int] {
var i = -1
def hasNext = true
def next(): Int = { i += 1; i }
}
I don't see anything I can do with it and that is not possible with an Iterable
In fact, most of what collections can do can also be done with an Array, but we don't do that because it's much less convenient.
The same reasoning applies to Iterator: if you want to model mutable state, then an iterator makes more sense.
For example, Random is implemented in a way that resembles an iterator, because its use case fits more naturally into an iterator than into an iterable.

Scala functional way of processing large-scale data with lazy collections

I am trying to figure out memory-efficient AND functional ways to process large-scale data using strings in Scala. I have read many things about lazy collections and have seen quite a few code examples. However, I run into "GC overhead exceeded" or "Java heap space" issues again and again.
Often the problem is that I try to construct a lazy collection, but evaluate each new element when I append it to the growing collection (I don't know any other way to do so incrementally). Of course, I could try something like initializing a lazy collection first and then yielding the collection holding the desired values by applying the resource-critical computations with map or so, but often I simply do not know the exact size of the final collection a priori to initialize that lazy collection.
Maybe you could help me by giving me hints or explanations on how to improve the following code as an example, which splits a FASTA (definition below) formatted file into two separate files according to the rule that odd sequence pairs belong to one file and even ones to another one ("separation of strands"). The most straightforward way to do so would be in an imperative way, by looping through the lines and printing into the corresponding files via open file streams (and this of course works excellently). However, I just don't enjoy the style of reassigning to variables holding header and sequences, so the following example code uses (tail) recursion, and I would appreciate finding a way to maintain a similar design without running into resource problems!
The example works perfectly for small files, but already with files of around 500 MB the code will fail with the standard JVM setup. I do want to process files of "arbitrary" size, say 10-20 GB or so.
val fileName = args(0)
val in = io.Source.fromFile(fileName) getLines
type itType = Iterator[String]
type sType = Stream[(String, String)]
def getFullSeqs(ite: itType) = {
//val metaChar = ">"
val HeadPatt = "(^>)(.+)" r
val SeqPatt = "([\\w\\W]+)" r
@annotation.tailrec
def rec(it: itType, out: sType = Stream[(String, String)]()): sType =
if (it hasNext) it next match {
case HeadPatt(_,header) =>
// introduce new header-sequence pair
rec(it, (header, "") #:: out)
case SeqPatt(seq) =>
val oldVal = out head
// concat subsequences
val newStream = (oldVal._1, oldVal._2 + seq) #:: out.tail
rec(it, newStream)
case _ =>
println("something went wrong my friend, oh oh oh!"); Stream[(String, String)]()
} else out
rec(ite)
}
def printStrands(seqs: sType) {
import java.io.PrintWriter
import java.io.File
def printStrand(seqse: sType, strand: Int) {
// only use sequences of one strand
val indices = List.tabulate(seqs.size/2)(_*2 + strand - 1).view
val p = new PrintWriter(new File(fileName + "." + strand))
indices foreach { i =>
p.print(">" + seqse(i)._1 + "\n" + seqse(i)._2 + "\n")
}; p.close
println("Done bro!")
}
List(1,2).par foreach (s => printStrand(seqs, s))
}
printStrands(getFullSeqs(in))
Three questions arise for me:
A) Let's assume one needs to maintain a large data structure obtained by processing the initial iterator you get from getLines like in my getFullSeqs method (note the different size of in and the output of getFullSeqs), because transformations on the whole(!) data is required repeatedly, because one does not know which part of the data one will require at any step. My example might not be the best, but how to do so? Is it possible at all??
B) What when the desired data structure is not inherently lazy, say one would like to store the (header -> sequence) pairs into a Map()? Would you wrap it in a lazy collection?
C) My implementation of constructing the stream might reverse the order of the inputted lines. When calling reverse, all elements will be evaluated (in my code, they already are, so this is the actual problem). Is there any way to post-process "from behind" in a lazy fashion? I know of reverseIterator, but is this already the solution, or will this not actually evaluate all elements first, too (as I would need to call it on a list)? One could construct the stream with newVal #:: rec(...), but I would lose tail-recursion then, wouldn't I?
So what I basically need is to add elements to a collection, which are not evaluated by the process of adding. So lazy val elem = "test"; elem :: lazyCollection is not what I am looking for.
EDIT: I have also tried using by-name parameter for the stream argument in rec .
Thank you so much for your attention and time, I really appreciate any help (again :) ).
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
FASTA is defined as a sequential set of sequences delimited by a single header line. A header is defined as a line starting with ">". Every line below the header is called part of the sequence associated with the header. A sequence ends when a new header is present. Every header is unique. Example:
>HEADER1
abcdefg
>HEADER2
hijklmn
opqrstu
>HEADER3
vwxyz
>HEADER4
zyxwv
Thus, sequence 2 is twice as big as seq 1. My program would split that file into a file A containing
>HEADER1
abcdefg
>HEADER3
vwxyz
and a second file B containing
>HEADER2
hijklmn
opqrstu
>HEADER4
zyxwv
The input file is assumed to consist of an even number of header-sequence pairs.
The key to working with really large data structures is to hold in memory only that which is critical to perform whatever operation you need. So, in your case, that's
Your input file
Your two output files
The current line of text
and that's it. In some cases you may need to store information such as how long a sequence is; in such cases, you build the data structures in a first pass and use them in a second pass. Let's suppose, for example, that you decide that you want to write three files: one for even records, one for odd, and one for entries where the total length is less than 300 nucleotides. You would do something like this (warning--it compiles but I never ran it, so it may not actually work):
final def findSizes(
data: Iterator[String], sz: Map[String,Long] = Map(),
currentName: String = "", currentSize: Long = 0
): Map[String,Long] = {
def currentMap = if (currentName != "") sz + (currentName->currentSize) else sz
if (!data.hasNext) currentMap
else {
val s = data.next
if (s(0) == '>') findSizes(data, currentMap, s, 0)
else findSizes(data, sz, currentName, currentSize + s.length)
}
}
Then, for processing, you use that map and pass through again:
import java.io._
final def writeFiles(
source: Iterator[String], targets: Array[PrintWriter],
sizes: Map[String,Long], count: Int = -1, which: Int = 0
) {
if (!source.hasNext) targets.foreach(_.close)
else {
val s = source.next
if (s(0) == '>') {
val w = if (sizes.get(s).exists(_ < 300)) 2 else (count+1)%2
targets(w).println(s)
writeFiles(source, targets, sizes, count+1, w)
}
else {
targets(which).println(s)
writeFiles(source, targets, sizes, count, which)
}
}
}
You then use Source.fromFile(f).getLines() twice to create your iterators, and you're all set. Edit: in some sense this is the key step, because this is your "lazy" collection. However, it's important not just because it doesn't read everything into memory immediately ("lazy"), but because it doesn't store any previous strings either!
More generally, Scala can't save you from thinking carefully about what information you need to have in memory and what you can fetch off disk as needed. Lazy evaluation can sometimes help, but there's no magic formula, because you can easily express the requirement to have all your data in memory in a lazy way. Scala can't interpret your commands to access memory as, secretly, instructions to fetch stuff off the disk instead. (Well, not unless you write a library to cache results from disk which does exactly that.)
One could construct the stream with newVal #:: rec(...), but I would
lose tail-recursion then, wouldn't I?
Actually, no.
So, here's the thing... with your present tail recursion, you fill ALL of the Stream with values. Yes, Stream is lazy, but you are computing all of the elements, stripping it of any laziness.
Now say you do newVal #:: rec(...). Would you lose tail recursion? No. Why? Because you are not recursing. How come? Well, Stream is lazy, so it won't evaluate rec(...).
And that's the beauty of it. Once you do it that way, getFullSeqs returns on the first iteration, and only computes the "recursion" when printStrands asks for it. Unfortunately, that won't work as is...
The problem is that you are constantly modifying the Stream -- that's not how you use a Stream. With Stream, you always append to it. Don't keep "rewriting" the Stream.
Now, there are three other problems I could readily identify with printStrands. First, it calls size on seqs, which will cause the whole Stream to be processed, losing laziness. Never call size on a Stream. Second, you call apply on seqse, accessing it by index. Never call apply on a Stream (or List) -- that's highly inefficient. It's O(n), which makes your inner loop O(n^2) -- yes, quadratic in the number of headers in the input file! Finally, printStrands keeps a reference to seqs throughout the execution of printStrand, preventing processed elements from being garbage collected.
So, here's a first approximation:
def inputStreams(fileName: String): (Stream[String], Stream[String]) = {
val in = (io.Source fromFile fileName).getLines.toStream
val SeqPatt = "^[^>]".r
def demultiplex(s: Stream[String], skip: Boolean): Stream[String] = {
if (s.isEmpty) Stream.empty
else if (skip) demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = false)
else s.head #:: (s.tail takeWhile (SeqPatt findFirstIn _ nonEmpty)) #::: demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = true)
}
(demultiplex(in, skip = false), demultiplex(in, skip = true))
}
The problem with the above (and I'm showing that code just to further illustrate the issues with laziness) is that the instant you do this:
val (a, b) = inputStreams(fileName)
You'll keep a reference to the head of both streams, which prevents garbage collecting them. You can't keep a reference to them, so you have to consume them as soon as you get them, without ever storing them in a "val" or "lazy val". A "var" might do, but it would be tricky to handle. So let's try this instead:
def inputStreams(fileName: String): Vector[Stream[String]] = {
val in = (io.Source fromFile fileName).getLines.toStream
val SeqPatt = "^[^>]".r
def demultiplex(s: Stream[String], skip: Boolean): Stream[String] = {
if (s.isEmpty) Stream.empty
else if (skip) demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = false)
else s.head #:: (s.tail takeWhile (SeqPatt findFirstIn _ nonEmpty)) #::: demultiplex(s.tail dropWhile (SeqPatt findFirstIn _ nonEmpty), skip = true)
}
Vector(demultiplex(in, skip = false), demultiplex(in, skip = true))
}
inputStreams(fileName).zipWithIndex.par.foreach {
case (stream, strand) =>
val p = new PrintWriter(new File("FASTA" + "." + strand))
stream foreach p.println
p.close
}
That still doesn't work, because stream inside inputStreams works as a reference, keeping the whole stream in memory even while they are printed.
So, having failed again, what do I recommend? Keep it simple.
def in = (scala.io.Source fromFile fileName).getLines.toStream
def inputStream(in: Stream[String], strand: Int = 1): Stream[(String, Int)] = {
if (in.isEmpty) Stream.empty
else if (in.head startsWith ">") (in.head, 1 - strand) #:: inputStream(in.tail, 1 - strand)
else (in.head, strand) #:: inputStream(in.tail, strand)
}
val printers = Array.tabulate(2)(i => new PrintWriter(new File("FASTA" + "." + i)))
inputStream(in) foreach {
case (line, strand) => printers(strand) println line
}
printers foreach (_.close)
Now this won't keep any more in memory than necessary. I still think it's too complex, however. This can be done more easily like this:
def in = (scala.io.Source fromFile fileName).getLines
val printers = Array.tabulate(2)(i => new PrintWriter(new File("FASTA" + "." + i)))
def printStrands(in: Iterator[String], strand: Int = 1) {
if (in.hasNext) {
val next = in.next
if (next startsWith ">") {
printers(1 - strand).println(next)
printStrands(in, 1 - strand)
} else {
printers(strand).println(next)
printStrands(in, strand)
}
}
}
printStrands(in)
printers foreach (_.close)
Or just use a while loop instead of recursion.
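A sketch of that while-loop variant follows. It keeps the same rule as the recursive version (each header line flips the target, and the first header goes to output 0), but takes the writers as a parameter so it can be tried against in-memory buffers. The names StrandSplit, split and demo are made up for this illustration, and the sample lines are arbitrary.

```scala
import java.io.{PrintWriter, StringWriter}

object StrandSplit {
  // Sends each header and the sequence lines below it alternately to out(0) and out(1).
  def split(lines: Iterator[String], out: Array[PrintWriter]): Unit = {
    var strand = 1 // flipped to 0 by the first header
    while (lines.hasNext) {
      val line = lines.next()
      if (line.startsWith(">")) strand = 1 - strand
      out(strand).print(line + "\n")
    }
    out.foreach(_.flush())
  }

  // Runs split on a tiny in-memory input instead of real files.
  def demo(): (String, String) = {
    val buffers = Array.fill(2)(new StringWriter)
    val writers = buffers.map(b => new PrintWriter(b))
    val input = Iterator(">H1", "abc", ">H2", "hij", "opq", ">H3", "vwx", ">H4", "zyx")
    split(input, writers)
    (buffers(0).toString, buffers(1).toString)
  }
}
```

Only the current line is held at any time, so memory use does not grow with the size of the input.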
Now, to the other questions:
B) It might make sense to do so while reading it, so that you do not have to keep two copies of the data: the Map and a Seq.
C) Don't reverse a Stream -- you'll lose all of its laziness.

Executing a simple task on another thread in scala

I was wondering if there was a way to execute very simple tasks on another thread in scala that does not have a lot of overhead?
Basically I would like to make a global 'executor' that can handle executing an arbitrary number of tasks. I can then use the executor to build up additional constructs.
Additionally it would be nice if blocking or non-blocking considerations did not have to be considered by the clients.
I know that the scala actors library is built on top of the Doug Lea FJ stuff, and also that they support to a limited degree what I am trying to accomplish. However from my understanding I will have to pre-allocate an 'Actor Pool' to accomplish.
I would like to avoid making a global thread pool for this, as from what I understand it is not all that good at fine grained parallelism.
Here is a simple example:
import concurrent.SyncVar
object SimpleExecutor {
import actors.Actor._
def exec[A](task: => A) : SyncVar[A] = {
//what goes here?
//This is what I currently have
val x = new concurrent.SyncVar[A]
//The overhead of making the actor appears to be a killer
actor {
x.set(task)
}
x
}
//Not really sure what to stick here
def execBlocker[A](task: => A) : SyncVar[A] = exec(task)
}
and now an example of using exec:
object Examples {
//Benchmarks a task
def benchmark(blk : => Unit) = {
val start = System.nanoTime
blk
System.nanoTime - start
}
//Benchmarks and compares 2 tasks
def cmp(a: => Any, b: => Any) = {
val at = benchmark(a)
val bt = benchmark(b)
println(at + " " + bt + " " +at.toDouble / bt)
}
//Simple example for simple non blocking comparison
import SimpleExecutor._
def paraAdd(hi: Int) = (0 until hi) map (i=>exec(i+5)) foreach (_.get)
def singAdd(hi: Int) = (0 until hi) foreach (i=>i+5)
//Simple example for the blocking performance
import Thread.sleep
def paraSle(hi : Int) = (0 until hi) map (i=>exec(sleep(i))) foreach (_.get)
def singSle(hi : Int) = (0 until hi) foreach (i=>sleep(i))
}
Finally to run the examples (might want to do it a few times so HotSpot can warm up):
import Examples._
cmp(paraAdd(10000), singAdd(10000))
cmp(paraSle(100), singSle(100))
That's what Futures were made for. Just import scala.actors.Futures._, use future to create new futures, methods like awaitAll to wait on the results for a while, apply or respond to block until the result is received, isSet to see if it's ready or not, etc.
You don't need to create a thread pool either. Or, at least, not normally. Why do you think you do?
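scala.actors.Futures belongs to the old actors library. On Scala versions where it is not available, a roughly equivalent sketch can be written with scala.concurrent.Future from the standard library. The object name FutureExecutor and the 10-second timeout below are arbitrary choices for illustration, not part of the original code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureExecutor {
  // Counterpart of exec from the question: starts the task on the global execution context.
  def exec[A](task: => A): Future[A] = Future(task)

  // Starts all tasks first, then waits for all of them together.
  def paraAdd(hi: Int): Seq[Int] = {
    val started = (0 until hi).map(i => exec(i + 5))
    Await.result(Future.sequence(started), 10.seconds)
  }
}
```

The global execution context manages its own thread pool, so the caller does not have to allocate one.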
EDIT
You can't gain performance parallelizing something as simple as an integer addition, because that's even faster than a function call. Concurrency will only bring performance by avoiding time lost to blocking i/o and by using multiple CPU cores to execute tasks in parallel. In the latter case, the task must be computationally expensive enough to offset the cost of dividing the workload and merging the results.
One other reason to go for concurrency is to improve the responsiveness of the application. That's not making it faster, that's making it respond faster to the user, and one way of doing that is getting even relatively fast operations offloaded to another thread so that the threads handling what the user sees or does can be faster. But I digress.
There's a serious problem with your code:
def paraAdd(hi: Int) = (0 until hi) map (i=>exec(i+5)) foreach (_.get)
def singAdd(hi: Int) = (0 until hi) foreach (i=>i+5)
Or, translating into futures,
def paraAdd(hi: Int) = (0 until hi) map (i=>future(i+5)) foreach (_.apply)
def singAdd(hi: Int) = (0 until hi) foreach (i=>i+5)
You might think paraAdd is doing the tasks in parallel, but it isn't, because Range has a non-strict implementation of map (that's up to Scala 2.7; starting with Scala 2.8.0, Range is strict). You can look it up on other Scala questions. What happens is this:
1) A range is created from 0 until hi
2) A range projection is created from each element i of the range into a function that returns future(i+5) when called.
3) For each element of the range projection (i => future(i+5)), the element is evaluated (foreach is strict) and then the function apply is called on it.
So, because future is not called in step 2, but only in step 3, you'll wait for each future to complete before doing the next one. You can fix it with:
def paraAdd(hi: Int) = (0 until hi).force map (i=>future(i+5)) foreach (_.apply)
Which will give you better performance, but never as good as a simple immediate addition. On the other hand, suppose you do this:
def repeat(n: Int, f: => Any) = (0 until n) foreach (_ => f)
def paraRepeat(n: Int, f: => Any) =
(0 until n).force map (_ => future(f)) foreach (_.apply)
And then compare:
cmp(repeat(100, singAdd(100000)), paraRepeat(100, singAdd(100000)))
You may start seeing gains (it will depend on the number of cores and processor speed).