Understanding performance of a tailrec annotated recursive method in scala - scala

Consider the following method, which has been verified to be properly tail recursive:
@tailrec
def getBoundaries(grps: Seq[(BigDecimal, Int)], groupSize: Int, curSum: Int = 0, curOffs: Seq[BigDecimal] = Seq.empty[BigDecimal]): Seq[BigDecimal] = {
if (grps.isEmpty) curOffs
else {
val (id, cnt) = grps.head
val newSum = curSum + cnt.toInt
if (newSum%50==0) { println(s"id=$id newsum=$newSum") }
if (newSum >= groupSize) {
getBoundaries(grps.tail, groupSize, 0, curOffs :+ id) // r1
} else {
getBoundaries(grps.tail, groupSize, newSum, curOffs) // r2
}
}
}
This runs very slowly: about 75 loops per second. When I dump the stack trace (a nice feature of IntelliJ), almost every time the line currently executing is the second tail-recursive call, r2. That makes me suspicious of the claim that "Scala unwraps the recursive calls into a while loop". If the unwrapping were occurring, why would so much time be spent in the invocations themselves?
Beyond having a properly structured tail-recursive method, are there other considerations for getting a recursive routine to perform comparably to direct iteration?

The performance will depend on the underlying type of the Seq.
If it is a List, the problem is appending (:+) to it: this gets very slow with long lists, because each append has to traverse the whole list to reach the end.
One solution is to prepend to the list (+:) each time and then reverse at the end. This can give very significant performance improvements, because adding to the start of a list is very quick.
Other Seq types will have different performance characteristics, but you can convert to a List before the recursive call so that you know how it is going to perform.
Here is sample code
def getBoundaries(grps: Seq[(BigDecimal, Int)], groupSize: Int): Seq[BigDecimal] = {
@tailrec
def loop(grps: List[(BigDecimal, Int)], curSum: Int, curOffs: List[BigDecimal]): List[BigDecimal] =
if (grps.isEmpty) curOffs
else {
val (id, cnt) = grps.head
val newSum = curSum + cnt.toInt
if (newSum >= groupSize) {
loop(grps.tail, 0, id +: curOffs) // r1
} else {
loop(grps.tail, newSum, curOffs) // r2
}
}
loop(grps.toList, 0, Nil).reverse
}
This version gives a 10x performance improvement over the original code, using the test data provided by the questioner in his own answer to the question.
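As a complementary sketch (not taken from the answers here), the same loop can also be phrased as a foldLeft that appends to a Vector, whose append is effectively constant time, so no final reverse is needed. The object and method names below are made up for illustration.

```scala
object BoundariesFoldSketch {
  // Hypothetical helper: same grouping rule as getBoundaries, written as a fold.
  // The accumulator is (running sum, boundaries found so far).
  def getBoundariesFold(grps: Seq[(BigDecimal, Int)], groupSize: Int): Vector[BigDecimal] =
    grps.foldLeft((0, Vector.empty[BigDecimal])) {
      case ((curSum, offs), (id, cnt)) =>
        val newSum = curSum + cnt
        if (newSum >= groupSize) (0, offs :+ id) // boundary reached: record id, reset the sum
        else (newSum, offs)                      // keep accumulating
    }._2
}
```

Note that this allocates one tuple per element, so it is worth measuring against the tail-recursive version before adopting it.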

The issue is not in the recursion but in the array manipulation. With the following test case it runs at about 200K recursions per second:
type Fgroups = Seq[(BigDecimal, Int)]
test("testGetBoundaries") {
val N = 200000
val grps: Fgroups = (N to 1 by -1).flatMap { x => Array.tabulate(x % 20){ x2 => (BigDecimal(x2 * 1e9), 1) }}
val sgrps = grps.sortWith { case (a, b) =>
a._1.longValue.compare(b._1.longValue) < 0
}
val bb = getBoundaries(sgrps, 100 )
println(bb.take(math.min(50,bb.length)).mkString(","))
assert(bb.length==1900)
}
My production data sample has a similar number of entries (an Array with 233K rows) but runs three orders of magnitude more slowly. I am looking into the tail operation and other culprits now.
Update: the following reference from Alvin Alexander indicates that the tail operation should be very fast for immutable collections, but deadly slow for long mutable ones, including Arrays!
https://alvinalexander.com/scala/understanding-performance-scala-collections-classes-methods-cookbook
Wow! I had no idea about the performance implications of using mutable collections in scala!
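A small sketch of what is going on, as far as I can tell: tail on an Array has to copy the remaining elements into a fresh array on every call, while tail on a List just returns the existing rest of the list. The cost seems to follow from the underlying data structure (array-backed versus linked) more than from mutability as such. The object and value names are illustrative only.

```scala
object TailCostSketch {
  val arr: Array[Int] = Array(1, 2, 3, 4)
  val lst: List[Int]  = arr.toList

  // Each call to Array.tail builds a new array, so two calls never return the same object.
  val arrayTailIsShared: Boolean = arr.tail eq arr.tail // false

  // List.tail hands back the already existing cons cell, so nothing is copied.
  val listTailIsShared: Boolean = lst.tail eq lst.tail  // true
}
```

Calling tail once per element of a 233K-element array therefore copies on the order of n squared elements in total, which is consistent with the slowdown described above.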
Update: by adding code to convert the Array to an (immutable) Seq, I see a three orders of magnitude performance improvement on the production data sample:
val grps = if (grpsIn.isInstanceOf[mutable.WrappedArray[_]] || grpsIn.isInstanceOf[Array[_]]) {
Seq(grpsIn: _*)
} else grpsIn
The (now fast ~200K/sec) final code is:
type Fgroups = Seq[(BigDecimal, Int)]
val cntr = new java.util.concurrent.atomic.AtomicInteger
@tailrec
def getBoundaries(grpsIn: Fgroups, groupSize: Int, curSum: Int = 0, curOffs: Seq[BigDecimal] = Seq.empty[BigDecimal]): Seq[BigDecimal] = {
val grps = if (grpsIn.isInstanceOf[mutable.WrappedArray[_]] || grpsIn.isInstanceOf[Array[_]]) {
Seq(grpsIn: _*)
} else grpsIn
if (grps.isEmpty) curOffs
else {
val (id, cnt) = grps.head
val newSum = curSum + cnt.toInt
if (cntr.getAndIncrement % 500==0) { println(s"[${cntr.get}] id=$id newsum=$newSum") }
if (newSum >= groupSize) {
getBoundaries(grps.tail, groupSize, 0, curOffs :+ id)
} else {
getBoundaries(grps.tail, groupSize, newSum, curOffs)
}
}
}

Related

Scala String Equality Question from Programming Interview

Since I liked programming in Scala, for my Google interview, I asked them to give me a Scala / functional programming style question. The Scala functional style question that I got was as follows:
You have two strings consisting of alphabetic characters as well as a special character representing the backspace symbol. Let's call this backspace character '/'. When you get to the keyboard, you type this sequence of characters, including the backspace/delete character. The solution you are to implement must check if the two sequences of characters produce the same output. For example, "abc", "aa/bc", "abb/c", "abcc/", "/abc", and "//abc" all produce the same output, "abc". Because this is a Scala / functional programming question, you must implement your solution in idiomatic Scala style.
I wrote the following code (it might not be exactly what I wrote, I'm just going off memory). Basically I just go linearly through the string, prepending characters to a list, and then I compare the lists.
def processString(string: String): List[Char] = {
string.foldLeft(List[Char]()){ case(accumulator: List[Char], char: Char) =>
accumulator match {
case head :: tail => if(char != '/') { char :: head :: tail } else { tail }
case emptyList => if(char != '/') { char :: emptyList } else { emptyList }
}
}
}
def solution(string1: String, string2: String): Boolean = {
processString(string1) == processString(string2)
}
So far so good? He then asked for the time complexity and I responded linear time (because you have to process each character once) and linear space (because you have to copy each element into a list). Then he asked me to do it in linear time, but with constant space. I couldn't think of a way to do it that was purely functional. He said to try using a function in the Scala collections library like "zip" or "map" (I explicitly remember him saying the word "zip").
Here's the thing. I think that it's physically impossible to do it in constant space without having any mutable state or side effects. Like I think that he messed up the question. What do you think?
Can you solve it in linear time, but with constant space?
This code takes O(N) time and needs only three integers of extra space:
def solution(a: String, b: String): Boolean = {
def findNext(str: String, pos: Int): Int = {
@annotation.tailrec
def rec(pos: Int, backspaces: Int): Int = {
if (pos == 0) -1
else {
val c = str(pos - 1)
if (c == '/') rec(pos - 1, backspaces + 1)
else if (backspaces > 0) rec(pos - 1, backspaces - 1)
else pos - 1
}
}
rec(pos, 0)
}
@annotation.tailrec
def rec(aPos: Int, bPos: Int): Boolean = {
val ap = findNext(a, aPos)
val bp = findNext(b, bPos)
(ap < 0 && bp < 0) ||
(ap >= 0 && bp >= 0 && (a(ap) == b(bp)) && rec(ap, bp))
}
rec(a.size, b.size)
}
The problem can be solved in linear time with constant extra space: if you scan from right to left, then you can be sure that the /-symbols to the left of the current position cannot influence the already processed symbols (to the right of the current position) in any way, so there is no need to store them.
At every point, you need to know only two things:
Where are you in the string?
How many symbols do you have to throw away because of the backspaces?
That makes two integers for storing the positions, and one additional integer for temporarily storing the number of accumulated backspaces during the findNext invocation. That's a total of three integers of space overhead.
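To make the right-to-left scan concrete, here is a small standalone sketch of the same findNext idea, plus a demo helper that lists which indices survive. The helper survivingIndices is hypothetical and builds a List purely for illustration; the actual solution never materializes it.

```scala
import scala.annotation.tailrec

object BackspaceScanSketch {
  // Index of the nearest surviving character strictly left of pos, or -1 if there is none.
  @tailrec
  def findNext(str: String, pos: Int, backspaces: Int = 0): Int =
    if (pos == 0) -1
    else {
      val c = str(pos - 1)
      if (c == '/') findNext(str, pos - 1, backspaces + 1)            // one more pending deletion
      else if (backspaces > 0) findNext(str, pos - 1, backspaces - 1) // this character is deleted
      else pos - 1                                                    // this character survives
    }

  // Demo only: collects the surviving indices from right to left (this allocates).
  def survivingIndices(str: String): List[Int] =
    Iterator.iterate(findNext(str, str.length))(p => findNext(str, p))
      .takeWhile(_ >= 0)
      .toList
}
```

For "abb/c" the surviving indices are 4, 1 and 0, which spell "cba" from right to left, that is, "abc".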
Intuition
Here is my attempt to formulate why the right-to-left scan gives you an O(1)-space algorithm:
The future cannot influence the past, therefore there is no need to remember the future.
The "natural time" in this problem flows from left to right. Therefore, if you scan from right to left, you are moving "from the future into the past", and therefore you don't need to remember the characters to the right of your current position.
Tests
Here is a randomized test, which makes me pretty sure that the solution is actually correct:
val rng = new util.Random(0)
def insertBackspaces(s: String): String = {
val n = s.size
val insPos = rng.nextInt(n)
val (pref, suff) = s.splitAt(insPos)
val c = ('a' + rng.nextInt(26)).toChar
pref + c + "/" + suff
}
def prependBackspaces(s: String): String = {
"/" * rng.nextInt(4) + s
}
def addBackspaces(s: String): String = {
var res = s
for (i <- 0 until 8)
res = insertBackspaces(res)
prependBackspaces(res)
}
for (i <- 1 until 1000) {
val s = "hello, world"
val t = "another string"
val s1 = addBackspaces(s)
val s2 = addBackspaces(s)
val t1 = addBackspaces(t)
val t2 = addBackspaces(t)
assert(solution(s1, s2))
assert(solution(t1, t2))
assert(!solution(s1, t1))
assert(!solution(s1, t2))
assert(!solution(s2, t1))
assert(!solution(s2, t2))
if (i % 100 == 0) {
println(s"Examples:\n$s1\n$s2\n$t1\n$t2")
}
}
A few examples that the test generates:
Examples:
/helly/t/oj/m/, wd/oi/g/x/rld
///e/helx/lc/rg//f/o, wosq//rld
/anotl/p/hhm//ere/t/ strih/nc/g
anotx/hb/er sw/p/tw/l/rip/j/ng
Examples:
//o/a/hellom/, i/wh/oe/q/b/rld
///hpj//est//ldb//y/lok/, world
///q/gd/h//anothi/k/eq/rk/ string
///ac/notherli// stri/ig//ina/n/g
Examples:
//hnn//ello, t/wl/oxnh///o/rld
//helfo//u/le/o, wna//ova//rld
//anolq/l//twl//her n/strinhx//g
/anol/tj/hq/er swi//trrq//d/ing
Examples:
//hy/epe//lx/lo, wr/v/t/orlc/d
f/hk/elv/jj//lz/o,wr// world
/anoto/ho/mfh///eg/r strinbm//g
///ap/b/notk/l/her sm/tq/w/rio/ng
Examples:
///hsm/y//eu/llof/n/, worlq/j/d
///gx//helf/i/lo, wt/g/orn/lq/d
///az/e/notm/hkh//er sm/tb/rio/ng
//b/aen//nother v/sthg/m//riv/ng
Seems to work just fine. So, I'd say that the Google-guy did not mess up, looks like a perfectly valid question.
You don't have to create the output to find the answer. You can iterate the two sequences at the same time and stop on the first difference. If you find no difference and both sequences terminate at the same time, they're equal, otherwise they're different.
But now consider a sequence such as aaaa/// being compared with a. You need to consume 6 elements from the left sequence and one element from the right sequence before you can assert that they're equal. That means that you would need to keep at least 5 elements in memory until you can verify that they're all deleted. But what if you iterated over the elements from the end? You would then just need to count the number of backspaces and ignore as many elements as necessary in the left sequence, without needing to keep them in memory, since you know they won't be present in the final output. You can achieve O(1) memory using these two tips.
I tried it and it seems to work:
def areEqual(s1: String, s2: String) = {
def charAt(s: String, index: Int) = if (index < 0) '#' else s(index)
@tailrec
def recSol(i1: Int, backspaces1: Int, i2: Int, backspaces2: Int): Boolean = (charAt(s1, i1), charAt(s2, i2)) match {
case ('/', _) => recSol(i1 - 1, backspaces1 + 1, i2, backspaces2)
case (_, '/') => recSol(i1, backspaces1, i2 - 1, backspaces2 + 1)
case ('#' , '#') => true
case (ch1, ch2) =>
if (backspaces1 > 0) recSol(i1 - 1, backspaces1 - 1, i2 , backspaces2 )
else if (backspaces2 > 0) recSol(i1 , backspaces1 , i2 - 1, backspaces2 - 1)
else ch1 == ch2 && recSol(i1 - 1, backspaces1 , i2 - 1, backspaces2 )
}
recSol(s1.length - 1, 0, s2.length - 1, 0)
}
Some tests (all pass, let me know if you have more edge cases in mind):
// examples from the question
val inputs = Array("abc", "aa/bc", "abb/c", "abcc/", "/abc", "//abc")
for (i <- 0 until inputs.length; j <- 0 until inputs.length) {
assert(areEqual(inputs(i), inputs(j)))
}
// more deletions than required
assert(areEqual("a///////b/c/d/e/b/b", "b"))
assert(areEqual("aa/a/a//a//a///b", "b"))
assert(areEqual("a/aa///a/b", "b"))
// not enough deletions
assert(!areEqual("aa/a/a//a//ab", "b"))
// too many deletions
assert(!areEqual("a", "a/"))
PS: just a few notes on the code itself:
Scala type inference is good enough so that you can drop types in the partial function inside your foldLeft
Nil is the idiomatic way to refer to the empty list case
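Applying those two notes to the processString from the question might look like the sketch below. It is still the linear-space version; only the style changes. The wrapping object name is illustrative.

```scala
object ProcessStringSketch {
  // Builds the typed output in reverse order; '/' removes the most recently typed character.
  def processString(string: String): List[Char] =
    string.foldLeft(List.empty[Char]) {
      case (_ :: tail, '/') => tail      // backspace with something to delete
      case (Nil, '/')       => Nil       // backspace on empty input is a no-op
      case (acc, char)      => char :: acc
    }

  def solution(string1: String, string2: String): Boolean =
    processString(string1) == processString(string2)
}
```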
Bonus:
I had something like Tim's solution in mind before implementing my idea, but I started early with pattern matching on characters only and it didn't fit well, because some cases require the number of backspaces. In the end, I think a neater way to write it is a mix of pattern matching and if conditions. Below is my longer original solution; the one I gave above was refactored later:
def areEqual(s1: String, s2: String) = {
@tailrec
def recSol(c1: Cursor, c2: Cursor): Boolean = (c1.char, c2.char) match {
case ('/', '/') => recSol(c1.next, c2.next)
case ('/' , _) => recSol(c1.next, c2 )
case (_ , '/') => recSol(c1 , c2.next)
case ('#' , '#') => true
case (a , b) if (a == b) => recSol(c1.next, c2.next)
case _ => false
}
recSol(Cursor(s1, s1.length - 1), Cursor(s2, s2.length - 1))
}
private case class Cursor(s: String, index: Int) {
val char = if (index < 0) '#' else s(index)
def next = {
@tailrec
def recSol(index: Int, backspaces: Int): Cursor = {
if (index < 0 ) Cursor(s, index)
else if (s(index) == '/') recSol(index - 1, backspaces + 1)
else if (backspaces > 1) recSol(index - 1, backspaces - 1)
else Cursor(s, index - 1)
}
recSol(index, 0)
}
}
If the goal is minimal memory footprint, it's hard to argue against iterators.
def areSame(a :String, b :String) :Boolean = {
def getNext(ci :Iterator[Char], ignore :Int = 0) : Option[Char] =
if (ci.hasNext) {
val c = ci.next()
if (c == '/') getNext(ci, ignore+1)
else if (ignore > 0) getNext(ci, ignore-1)
else Some(c)
} else None
val ari = a.reverseIterator
val bri = b.reverseIterator
1 to a.length.max(b.length) forall(_ => getNext(ari) == getNext(bri))
}
On the other hand, when arguing FP principles it's hard to defend iterators, since they're all about maintaining state.
Here is a version with a single recursive function and no additional classes or libraries. This is linear time and constant memory.
def compare(a: String, b: String): Boolean = {
@tailrec
def loop(aIndex: Int, aDeletes: Int, bIndex: Int, bDeletes: Int): Boolean = {
val aVal = if (aIndex < 0) None else Some(a(aIndex))
val bVal = if (bIndex < 0) None else Some(b(bIndex))
if (aVal.contains('/')) {
loop(aIndex - 1, aDeletes + 1, bIndex, bDeletes)
} else if (aDeletes > 0) {
loop(aIndex - 1, aDeletes - 1, bIndex, bDeletes)
} else if (bVal.contains('/')) {
loop(aIndex, 0, bIndex - 1, bDeletes + 1)
} else if (bDeletes > 0) {
loop(aIndex, 0, bIndex - 1, bDeletes - 1)
} else {
aVal == bVal && (aVal.isEmpty || loop(aIndex - 1, 0, bIndex - 1, 0))
}
}
loop(a.length - 1, 0, b.length - 1, 0)
}

Surprisingly slow of mutable.array.drop

I am a newbie in Scala, and while profiling my Scala code with YourKit I came across a surprising finding regarding the usage of array.drop.
Here is what I write:
...
val items = s.split(" +") // s is a string
...
val s1 = items.drop(2).mkString(" ")
...
In a one-minute run of my code, YourKit told me that the call to items.drop(2) takes around 11% of the total execution time:
Lexer.scala:33 scala.collection.mutable.ArrayOps$ofRef.drop(int) 1054 11%
This is really surprising to me, is there any internal memory copy that slow down the processing? If so, what is the best practice to optimize my simple code snippet? Thank you.
This is really surprising to me, is there any internal memory copy
that slow down the processing?
ArrayOps.drop internally calls IterableLike.slice, which allocates a builder that produces a new Array for each call:
override def slice(from: Int, until: Int): Repr = {
val lo = math.max(from, 0)
val hi = math.min(math.max(until, 0), length)
val elems = math.max(hi - lo, 0)
val b = newBuilder
b.sizeHint(elems)
var i = lo
while (i < hi) {
b += self(i)
i += 1
}
b.result()
}
You're seeing the cost of the iteration + allocation. You didn't specify how many times this happens and what's the size of the collection, but if it's large this could be time consuming.
One way of optimizing this is to generate a List[String] instead, for which drop simply walks the list and discards its head elements. Note that this will incur an additional traversal of the Array[T] to create the list, so make sure to benchmark this to see whether you actually gain anything:
val items = s.split(" +").toList
val afterDrop = items.drop(2).mkString(" ")
Another possibility is to enrich Array[T] to include your own version of mkString which manually populates a StringBuilder:
object RichOps {
implicit class RichArray[T](val arr: Array[T]) extends AnyVal {
def mkStringWithIndex(start: Int, end: Int, separator: String): String = {
var idx = start
val stringBuilder = new StringBuilder(end - start)
while (idx < end) {
stringBuilder.append(arr(idx))
if (idx != end - 1) {
stringBuilder.append(separator)
}
idx += 1
}
stringBuilder.toString()
}
}
}
And now we have:
object Test {
def main(args: Array[String]): Unit = {
import RichOps._
val items = "hello everyone and welcome".split(" ")
println(items.mkStringWithIndex(2, items.length, " "))
}
}
Yields:
and welcome
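A further option, not from the answer above, is to drop on an iterator instead of on the array itself: the iterator just skips elements, so no intermediate array is built. This is a minimal sketch with a made-up helper name; whether it helps in practice should be confirmed with the profiler.

```scala
object DropSketch {
  // Hypothetical helper: joins all words after the first n without copying the array.
  def dropWords(s: String, n: Int): String =
    s.split(" +").iterator.drop(n).mkString(" ")
}
```

For example, DropSketch.dropWords("hello everyone and welcome", 2) gives "and welcome".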

Efficient way to fold list in scala, while avoiding allocations and vars

I have a bunch of items in a list, and I need to analyze the content to find out how many of them are "complete". I started out with partition, but then realized that I didn't need two lists back, so I switched to a fold:
val counts = groupRows.foldLeft( (0,0) )( (pair, row) =>
if(row.time == 0) (pair._1+1,pair._2)
else (pair._1, pair._2+1)
)
but I have a lot of rows to go through for a lot of parallel users, and it is causing a lot of GC activity (assumption on my part...the GC could be from other things, but I suspect this since I understand it will allocate a new tuple on every item folded).
for the time being, I've rewritten this as
var complete = 0
var incomplete = 0
list.foreach(row => if(row.time != 0) complete += 1 else incomplete += 1)
which fixes the GC, but introduces vars.
I was wondering if there was a way of doing this without using vars while also not abusing the GC?
EDIT:
Hard call on the answers I've received. A var implementation seems to be considerably faster on large lists (like by 40%) than even a tail-recursive optimized version that is more functional but should be equivalent.
The first answer from dhg seems to be on-par with the performance of the tail-recursive one, implying that the size pass is super-efficient...in fact, when optimized it runs very slightly faster than the tail-recursive one on my hardware.
The cleanest two-pass solution is probably to just use the built-in count method:
val complete = groupRows.count(_.time == 0)
val counts = (complete, groupRows.size - complete)
But you can do it in one pass if you use partition on an iterator:
val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
val counts = (complete.size, incomplete.size)
This works because the new returned iterators are linked behind the scenes and calling next on one will cause it to move the original iterator forward until it finds a matching element, but it remembers the non-matching elements for the other iterator so that they don't need to be recomputed.
Example of the one-pass solution:
scala> val groupRows = List(Row(0), Row(1), Row(1), Row(0), Row(0)).view.map{x => println(x); x}
scala> val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
Row(0)
Row(1)
complete: Iterator[Row] = non-empty iterator
incomplete: Iterator[Row] = non-empty iterator
scala> val counts = (complete.size, incomplete.size)
Row(1)
Row(0)
Row(0)
counts: (Int, Int) = (3,2)
I see you've already accepted an answer, but you rightly mention that that solution will traverse the list twice. The way to do it efficiently is with recursion.
def counts(xs: List[...], complete: Int = 0, incomplete: Int = 0): (Int,Int) =
xs match {
case Nil => (complete, incomplete)
case row :: tail =>
if (row.time == 0) counts(tail, complete + 1, incomplete)
else counts(tail, complete, incomplete + 1)
}
This is effectively just a customized fold, except we use 2 accumulators which are just Ints (primitives) instead of tuples (reference types). It should also be just as efficient as a while-loop with vars; in fact, the bytecode should be identical.
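A self-contained variant of the recursion above, as a sketch. It assumes a Row case class with an Int time field (the element type is elided in the answer, so this is a guess based on the REPL example earlier) and adds @tailrec so that the compiler rejects the method if it cannot be turned into a loop.

```scala
import scala.annotation.tailrec

object CountsSketch {
  // Assumed shape of a row; only the time field matters here.
  final case class Row(time: Int)

  // Two Int accumulators are threaded through the recursion, so no tuple is
  // allocated until the single result at the end.
  @tailrec
  def counts(xs: List[Row], complete: Int = 0, incomplete: Int = 0): (Int, Int) =
    xs match {
      case Nil => (complete, incomplete)
      case row :: tail =>
        if (row.time == 0) counts(tail, complete + 1, incomplete)
        else counts(tail, complete, incomplete + 1)
    }
}
```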
Maybe it's just me, but I prefer using the various specialized folds (.size, .exists, .sum, .product) if they are available. I find it clearer and less error-prone than the heavy-duty power of general folds.
val complete = groupRows.view.filter(_.time==0).size
(complete, groupRows.length - complete)
How about this one? No import tax.
import scala.collection.generic.CanBuildFrom
import scala.collection.Traversable
import scala.collection.mutable.Builder
case class Count(n: Int, total: Int) {
def not = total - n
}
object Count {
implicit def cbf[A]: CanBuildFrom[Traversable[A], Boolean, Count] = new CanBuildFrom[Traversable[A], Boolean, Count] {
def apply(): Builder[Boolean, Count] = new Counter
def apply(from: Traversable[A]): Builder[Boolean, Count] = apply()
}
}
class Counter extends Builder[Boolean, Count] {
var n = 0
var ttl = 0
override def +=(b: Boolean) = { if (b) n += 1; ttl += 1; this }
override def clear() { n = 0 ; ttl = 0 }
override def result = Count(n, ttl)
}
object Counting extends App {
val vs = List(4, 17, 12, 21, 9, 24, 11)
val res: Count = vs map (_ % 2 == 0)
Console println s"${vs} have ${res.n} evens out of ${res.total}; ${res.not} were odd."
val res2: Count = vs collect { case i if i % 2 == 0 => i > 10 }
Console println s"${vs} have ${res2.n} evens over 10 out of ${res2.total}; ${res2.not} were smaller."
}
OK, inspired by the answers above, but really wanting to only pass over the list once and avoid GC, I decided that, in the face of a lack of direct API support, I would add this to my central library code:
class RichList[T](private val theList: List[T]) {
def partitionCount(f: T => Boolean): (Int, Int) = {
var matched = 0
var unmatched = 0
theList.foreach(r => { if (f(r)) matched += 1 else unmatched += 1 })
(matched, unmatched)
}
}
object RichList {
implicit def apply[T](list: List[T]): RichList[T] = new RichList(list)
}
Then in my application code (if I've imported the implicit), I can write var-free expressions:
val (complete, incomplete) = groupRows.partitionCount(_.time != 0)
and get what I want: an optimized GC-friendly routine that prevents me from polluting the rest of the program with vars.
However, I then saw Luigi's benchmark, and updated it to:
Use a longer list so that multiple passes on the list were more obvious in the numbers
Use a boolean function in all cases, so that we are comparing things fairly
http://pastebin.com/2XmrnrrB
The var implementation is definitely considerably faster, even though Luigi's routine should be identical (as one would expect with optimized tail recursion). Surprisingly, dhg's dual-pass original is just as fast (slightly faster if compiler optimization is on) as the tail-recursive one. I do not understand why.
It is slightly tidier to use a mutable accumulator pattern, like so, especially if you can re-use your accumulator:
case class Accum(var complete: Int = 0, var incomplete: Int = 0) {
def inc(compl: Boolean): this.type = {
if (compl) complete += 1 else incomplete += 1
this
}
}
val counts = groupRows.foldLeft( Accum() ){ (a, row) => a.inc( row.time == 0 ) }
If you really want to, you can hide your vars as private; if not, you still are a lot more self-contained than the pattern with vars.
You could just calculate it using the difference like so:
def counts(groupRows: List[Row]) = {
val complete = groupRows.foldLeft(0){ (count, row) =>
if(row.time == 0) count + 1 else count
}
(complete, groupRows.length - complete)
}

Quicksort using Future ends up in a deadlock

I have written a quicksort (method quicksortF()) that uses a Scala's Future to let the recursive sorting of the partitions be done concurrently. I also have implemented a regular quicksort (method quicksort()). Unfortunately, the Future version ends up in a deadlock (apparently blocks forever) when the list to sort is greater than about 1000 elements (900 would work). The source is shown below.
I am relatively new to Actors and Futures. What is going wrong here?
Thanks!
import util.Random
import actors.Futures._
/**
* Quicksort with and without using the Future pattern.
* @author Markus Gumbel
*/
object FutureQuickSortProblem {
def main(args: Array[String]) {
val n = 1000 // works for n = 900 but not for 1000 anymore.
// Create a random list of size n:
val list = (1 to n).map(c => Random.nextInt(n * 10)).toList
println(list)
// Sort it with regular quicksort:
val sortedList = quicksort(list)
println(sortedList)
// ... and with quicksort using Future (which hangs):
val sortedListF = quicksortF(list)
println(sortedListF)
}
// This one works.
def quicksort(list: List[Int]): List[Int] = {
if (list.length <= 1) list
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
val sortedLeftList = quicksort(leftList)
val sortedRightList = quicksort(rightList)
sortedLeftList ::: middleList ::: sortedRightList
}
}
// Almost the same as quicksort() except that Future is used.
// However, this one hangs.
def quicksortF(list: List[Int]): List[Int] = {
if (list.length <= 1) list
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
// Same as quicksort() but here we are using a Future
// to sort the left and right partitions independently:
val sortedLeftListFuture = future {
quicksortF(leftList)
}
val sortedRightListFuture = future {
quicksortF(rightList)
}
sortedLeftListFuture() ::: middleList ::: sortedRightListFuture()
}
}
}
class FutureQuickSortProblem // If not defined, Intellij won't find the main method.?!
Disclaimer: I've never personally used the (pre-2.10) standard library's actors or futures in any serious way, and there are a number of things I don't like (or at least don't understand) about the API there, compared for example to the implementations in Scalaz or Akka or Play 2.0.
But I can tell you that the usual approach in a case like this is to combine your futures monadically instead of claiming them immediately and combining the results. For example, you could write something like this (note the new return type):
import scala.actors.Futures._
def quicksortF(list: List[Int]): Responder[List[Int]] = {
if (list.length <= 1) future(list)
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
for {
left <- quicksortF(leftList)
right <- quicksortF(rightList)
} yield left ::: middleList ::: right
}
}
Like your vanilla implementation, this won't necessarily be very efficient, and it will also blow the stack pretty easily, but it shouldn't run out of threads.
As a side note, why does flatMap on a Future return a Responder instead of a Future? I don't know, and neither do some other folks. For reasons like this I'd suggest skipping the now-deprecated pre-2.10 standard library actor-based concurrency stuff altogether.
As I understand, calling apply on the Future (as you do when concatenating the results of the recursive calls) will block until the result is retrieved.
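For comparison, here is a sketch of the same idea using scala.concurrent.Future (Scala 2.10 and later) rather than the scala.actors API from the question. No task ever blocks on another one inside the pool, which is the likely cause of the hang; the only blocking call is the single Await at the outermost level. The object and helper names are made up for illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureQuickSortSketch {
  // Sorts both partitions concurrently and combines the results without blocking.
  def quicksortF(list: List[Int])(implicit ec: ExecutionContext): Future[List[Int]] =
    if (list.length <= 1) Future.successful(list)
    else {
      val pivot  = list.head
      val middle = list.filter(_ == pivot)
      val leftF  = Future(list.filter(_ < pivot)).flatMap(xs => quicksortF(xs))
      val rightF = Future(list.filter(_ > pivot)).flatMap(xs => quicksortF(xs))
      leftF.zip(rightF).map { case (left, right) => left ::: middle ::: right }
    }

  // Convenience wrapper for callers that need a plain List; blocks exactly once.
  def sortBlocking(list: List[Int]): List[Int] = {
    import ExecutionContext.Implicits.global
    Await.result(quicksortF(list), 30.seconds)
  }
}
```

Like the vanilla version, this is not an efficient sort; it only illustrates composing futures instead of claiming their results inside other futures.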

How to yield a single element from for loop in scala?

Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return /*???*/ Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question; I am curious whether and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but I guess I was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case; I am just learning and exploring the language. And in a more complex case with several iterators, I think finding with for can actually be useful.
Thanks, commenters. I'll start a bounty to make up for the bad posing of the question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First, let's define a "noisy" int that will show us when it is invoked:
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for(e <- l; v = e() ; if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = (for(e <- l.toIterator; v = e() ; if v % 2 == 0) yield v)
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point where it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least one solution if you're going to use .next. Alternatively, use headOption, defined for all Traversables, to get an Option[Int].
You can turn your list into a stream, so that any filters that the for-loop contains are only evaluated on demand. However, yielding from the stream will always return a stream, and what you want is, I suppose, an option. So, as a final step, you can check whether the resulting stream has at least one element and return its head as an option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, run your code with -print. Scala desugars the loop into a foreach and then uses an exception to leave the foreach prematurely.
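A minimal sketch of that early return, assuming the predicate is passed in as a parameter (the question leaves expensiveFunc unspecified). This relies on Scala 2 behavior, where a return inside the loop body is a non-local return out of the foreach closure; newer Scala 3 releases flag it as deprecated.

```scala
object EarlyReturnSketch {
  // The for loop desugars to objects.foreach(...); `return` exits findFirst itself.
  def findFirst[T](objects: List[T])(p: T => Boolean): Option[T] = {
    for (obj <- objects) {
      if (p(obj)) return Some(obj) // leaves the enclosing method, not just the closure
    }
    None
  }
}
```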
So what you are trying to do is to break out of a loop once your condition is satisfied. The answers to "How do I break out of a loop in Scala?" might be what you are looking for.
Overall, for comprehension in Scala is translated into map, flatmap and filter operations. So it will not be possible to break out of these functions unless you throw an exception.
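To illustrate the translation, here is a sketch that puts the for comprehension from the question's update next to roughly what the compiler rewrites it to (the guard becomes withFilter, the yield becomes map). The object and value names are illustrative.

```scala
object ForDesugarSketch {
  // The for comprehension as written by hand...
  val viaFor: IndexedSeq[Int] =
    for { x <- 1 to 10 if x == 7 } yield x

  // ...and approximately what it is rewritten to.
  val viaCalls: IndexedSeq[Int] =
    (1 to 10).withFilter(x => x == 7).map(x => x)
}
```

Both values hold the single element 7, and both visit every element of the range, which is why there is no way to stop early short of an exception or a lazy source.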
If you are wondering, this is how find is implemented in LinearSeqOptimized.scala, which List inherits from:
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o: T): AnyRef = ??? // placeholder: unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
The variable seven will be an Option holding Some(value) if a value satisfies the condition.
I hope this helps.
I think it can be done without 'return'. Some implementations:
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects.iterator.filter { obj => expensiveFunc(obj) != null }.next()
The trick is to get some lazily evaluated view on the collection, either an iterator or a Stream, or objects.view. The filter will only execute as far as needed.
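A small sketch that makes the laziness visible by counting how often the predicate runs. The helper name and the counter are for demonstration only; the predicate stands in for the expensiveFunc check from the question.

```scala
object LazyFilterSketch {
  // Returns the first match (if any) together with the number of predicate calls.
  def firstMatch[T](objects: List[T])(p: T => Boolean): (Option[T], Int) = {
    var calls = 0 // demo-only counter
    val it = objects.iterator.filter { obj => calls += 1; p(obj) }
    val first = if (it.hasNext) Some(it.next()) else None // avoids next() on an empty iterator
    (first, calls)
  }
}
```

For List(1, 2, 3, 4) and an "is even" predicate this yields (Some(2), 2): the elements 3 and 4 are never tested.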