Why does this function run out of memory? - scala

It's a function to find the third largest of a collection of integers. I'm calling it like this:
val lineStream = thirdLargest(Source.fromFile("10m.txt").getLines.toIterable
val intStream = lineStream map { s => Integer.parseInt(s) }
thirdLargest(intStream)
The file 10m.txt contains 10 million lines with a random integer on each one. The thirdLargest function below should not be keeping any of the integers after it has tested them, and yet it causes the JVM to run out of memory (after about 90 seconds in my case).
def thirdLargest(numbers: Iterable[Int]): Option[Int] = {
def top3of4(top3: List[Int], fourth: Int) = top3 match {
case List(a, b, c) =>
if (fourth > c) List(b, c, fourth)
else if (fourth > b) List(b, fourth, c)
else if (fourth > a) List(fourth, b, c)
else top3
}
#tailrec
def find(top3: List[Int], rest: Iterable[Int]): Int = (top3, rest) match {
case (List(a, b, c), Nil) => a
case (top3, d #:: rest) => find(top3of4(top3, d), rest)
}
numbers match {
case a #:: b #:: c #:: rest => Some(find(List[Int](a, b, c).sorted, rest))
case _ => None
}
}

The OOM error has nothing to do with the way you read the file. It is totally fine and even recommended to use Source.getLines here. The problem is elsewhere.
Many people are being confused by the nature of Scala Stream concept. In fact this is not something you would want to use just to iterate over things. It is lazy indeed however it doesn't discard previous results – they're being memoized so there's no need to recalculate them again on the next use (which never happens in your case but that's where your memory goes). See also this answer.
Consider using foldLeft. Here's a working (but intentionally simplified) example for illustration purposes:
val lines = Source.fromFile("10m.txt").getLines()
print(lines.map(_.toInt).foldLeft(-1 :: -1 :: -1 :: Nil) { (best3, next) =>
(next :: best3).sorted.reverse.take(3)
})

Related

How do you write a function to divide the input list into three sublists?

Write a function to divide the input list into three sublists.
The first sub-list is to include all the elements whose indexes satisfy the equation i mod 3 = 1.
The second sub-list is to include all the elements whose indexes satisfy the equation and mod 3 = 2.
The third sub-list is to contain the remaining elements.
The order of the elements must be maintained. Return the result as three lists.
Write a function using tail and non-tail recursion.
My attempt: I’m very confused in how to increase index so it can go through the list, any recommendation about how to make it recursive with increasing index each time?
def divide(list: List[Int]): (List[Int], List[Int], List[Int]) = {
var index:Int =0
def splitList(remaining: List[Int], firstSubList: List[Int], secondSubList: List[Int], thirdSubList: List[Int], index:Int): (List[Int], List[Int], List[Int]) = {
if(remaining.isEmpty) {
return (List[Int](), List[Int](), List[Int]())
}
val splitted = splitList(remaining.tail, firstSubList, secondSubList, thirdSubList, index)
val firstList = if (index % 3 == 1) List() ::: splitted._1 else splitted._1
val secondList = if (index % 3 == 2) List() ::: splitted._2 else splitted._2
val thirdList = if((index% 3 != 1) && (index % 3 != 2)) List() ::: splitted._3 else splitted._3
index +1
(firstSubList ::: firstList, secondSubList ::: secondList, thirdSubList ::: thirdList)
}
splitList(list, List(), List(), List(), index+1)
}
println(divide(List(0,11,22,33)))
Generalizing the requirement a little, here's one approach using a simple recursive function to compose a Map of Lists by modulo n of the original list indexes:
def splitList[T](list: List[T], n: Int): Map[Int, List[T]] = {
#scala.annotation.tailrec
def loop(zls: List[(T, Int)], lsMap: Map[Int, List[T]]): Map[Int, List[T]] =
zls match {
case Nil =>
lsMap.map{ case (i, ls) => (i, ls.reverse) }
case (x, i) :: rest =>
val j = i % n
loop(rest, lsMap + (j -> (x :: lsMap.getOrElse(j, Nil))))
}
loop(list.zipWithIndex, Map.empty[Int, List[T]])
}
splitList(List(0, 11, 22, 33, 44, 55, 66), 3)
// Map(0 -> List(0, 33, 66), 1 -> List(11, 44), 2 -> List(22, 55))
splitList(List('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'), 4)
// Map(0 -> List(a, e, i), 1 -> List(b, f), 2 -> List(c, g), 3 -> List(d, h))
To do this in real life, just label each value with its index and then group by that index modulo 3:
def divide[T](list: List[T]) = {
val g = list.zipWithIndex.groupMap(_._2 % 3)(_._1)
(g.getOrElse(1, Nil), g.getOrElse(2, Nil), g.getOrElse(0, Nil))
}
If you insist on a recursive version, it might look like this:
def divide[T](list: List[T]) = {
def loop(rem: List[T]): (List[T], List[T], List[T]) =
rem match {
case a::b::c::tail =>
val rem = loop(tail)
(b +: rem._1, c +: rem._2, a +: rem._3)
case a::b::Nil =>
(List(b), Nil, List(a))
case a::Nil =>
(Nil, Nil, List(a))
case Nil =>
(Nil, Nil, Nil)
}
loop(list)
}
Tail recursion would look like this:
def divide[T](list: List[T]) = {
#annotation.tailrec
def loop(rem: List[T], res: (List[T], List[T], List[T])): (List[T], List[T], List[T]) =
rem match {
case a::b::c::tail =>
loop(tail, (res._1 :+ b, res._2 :+ c, res._3 :+ a))
case a::b::Nil =>
(res._1 :+ b, res._2, res._3 :+ a)
case a::Nil =>
(res._1, res._2, res._3 :+ a)
case Nil =>
res
}
loop(list, (Nil, Nil, Nil))
}
And if you care about efficiency, this version would build the lists in the other order and reverse them when returning the result.
Your problem is that you put index+1 into a wrong place. Try swapping it around: put index+1 into the call where you have index, and index into the other one. Also remove the "standalone" index+1 statement in the middle, it doesn't do anything anyway.
That should make your code work ... but it is still not very good. A couple of problems with it (besides it being badly structured, non-idiomatic, and hard to read, which is kinda subjective):
It it is not tail-recursive, and effectively, creates another copy of the entire list on stack. This may be problematic when the list is long.
It concatenates (potentially long) lists. This is a bad idea. List in scala is a singly linked list, you have it's head readily available, but to get to the end, you need to spend O(N) cycles, iterating through each node. Thus things like foo:::bar in a iterative function instantly make any algorithm (at least) quadratic.
The usual "trick" to avoid the last problem is prepending elements to output one-by-one, and then reversing the result in the end. The first one can be avoided with tail-recursion. The "non-idiomatic" and "hard to read" problems are mostly addressed by using match statement in this case:
def split3(
in: List[Int],
one: List[Int],
two: List[Int],
three: List[Int],
index: Int = 0
} = (in, index % 3) match {
case (Nil, _) => (one.reverse, two.reverse, three.reverse)
case (head::tail, 1) => split3(tail, head::one, two, three, index+1)
case (head::tail, 2) => split3(tail, one, head::two, three, index+1)
case (head::tail, _) => split3(tail, one, two, head::three, index+1)
}
Now, this is a fine solution, albeit a little repetitive to my demanding eye ... But if want to be clever and really unleash the full power of scala standard library, forget recursion, you don't really need it in this case.
If you knew that number of elements in the list was always divisible by 3,
you could just do list.grouped(3).toSeq.transpose: break the list into groups of three (each group will have index%3=0 as first element, index%3=1 as second, index%3=2 as the third), and then transpose will turn a list of lists of 3 into a list of three lists where the first one contains all the first elements, the second - all the seconds etc. (I know, you wanted them in a different order, but that's trivial). If you are having trouble understanding what I am talking about, just try running it on some lists, and look at the results.
This would be a really elegant solution ... if it worked :/ The problem is, that it only does when you have 3*n elements in the original list. If not, transpose will fail on the last element if it doesn't have 3 elements like all others. Can we fix it? Well ... that's where the cleverness comes in.
val (triplets, tails) = list.grouped(3).toSeq.partition(_.size == 3)
triplets
.transpose
.padTo(3, Nil)
.zip(tails.flatten.map(Seq(_)).padTo(3, Nil))
.map { case (head, tail) => head ++ tail }
Basically, it is doing the same thing as the one-liner I described above (break into groups of 3 and transpose), but adds special handling for the case when the last group has less than three elements - it splits it out and pads with required number of empty lists, then just appends the result to transposed triplets.

Function takes two List[Int] arguments and produces a List[Int]. SCALA [duplicate]

This question already has answers here:
Scala - Combine two lists in an alternating fashion
(4 answers)
Closed 3 years ago.
The elements of the resulting list should alternate between the elements of the arguments. Assume that the two arguments have the same length.
USE RECURSION
My code as follows
val finalString = new ListBuffer[Int]
val buff2= new ListBuffer[Int]
def alternate(xs:List[Int], ys:List[Int]):List[Int] = {
while (xs.nonEmpty) {
finalString += xs.head + ys.head
alternate(xs.tail,ys.tail)
}
return finalString.toList
}
EXPECTED RESULT:
alternate ( List (1 , 3, 5) , List (2 , 4, 6)) = List (1 , 2, 3, 4, 6)
As far for the output, I don't get any output. The program is still running and cannot be executed.
Are there any Scala experts?
There are a few problems with the recursive solutions suggested so far (including yours, which would actually work, if you replace while with if): appending to end of list is a linear operation, making the whole thing quadratic (taking a .length of a list too, as well ас accessing elements by index), don't do that; also, if the lists are long, a recursion may require a lot of space on the stack, you should be using tail-recursion whenever possible.
Here is a solution that is free of both those problems: it builds the output backwards, by prepending elements to the list (constant time operation) rather than appending them, and reverses the result at the end. It is also tail-recursive: the recursive call is the last operation in the function, which allows the compiler to convert it into a loop, so that it will only use a single stack frame for execution regardless of the size of the lists.
#tailrec
def alternate(
a: List[Int],
b: List[Int],
result: List[Int] = Nil
): List[Int] = (a,b) match {
case (Nil, _) | (_, Nil) => result.reversed
case (ah :: at, bh :: bt) => alternate(at, bt, bh :: ah :: result)
}
(if the lists are of different lengths, the whole thing stops when the shortest one ends, and whatever is left in the longer one is thrown out. You may want to modify the first case (split it into two, perhaps) if you desire a different behavior).
BTW, your own solution is actually better than most suggested here: it is actually tail recursive (or rather can be made one if you add else after your if, which is now while), and appending to ListBuffer isn't actually as bad as to a List. But using mutable state is generally considered "code smell" in scala, and should be avoided (that's one of the main ideas behind using recursion instead of loops in the first place).
Condition xs.nonEmpty is true always so you have infinite while loop.
Maybe you meant if instead of while.
A more Scala-ish approach would be something like:
def alternate(xs: List[Int], ys: List[Int]): List[Int] = {
xs.zip(ys).flatMap{case (x, y) => List(x, y)}
}
alternate(List(1,3,5), List(2,4,6))
// List(1, 2, 3, 4, 5, 6)
A recursive solution using match
def alternate[T](a: List[T], b: List[T]): List[T] =
(a, b) match {
case (h1::t1, h2::t2) =>
h1 +: h2 +: alternate(t1, t2)
case _ =>
a ++ b
}
This could be more efficient at the cost of clarity.
Update
This is the more efficient solution:
def alternate[T](a: List[T], b: List[T]): List[T] = {
#annotation.tailrec
def loop(a: List[T], b: List[T], res: List[T]): List[T] =
(a, b) match {
case (h1 :: t1, h2 :: t2) =>
loop(t1, t2, h2 +: h1 +: res)
case _ =>
a ++ b ++ res.reverse
}
loop(a, b, Nil)
}
This retains the original function signature but uses an inner function that is an efficient, tail-recursive implementation of the algorithm.
You're accessing variables from outside the method, which is bad. I would suggest something like the following:
object Main extends App {
val l1 = List(1, 3, 5)
val l2 = List(2, 4, 6)
def alternate[A](l1: List[A], l2: List[A]): List[A] = {
if (l1.isEmpty || l2.isEmpty) List()
else List(l1.head,l2.head) ++ alternate(l1.tail, l2.tail)
}
println(alternate(l1, l2))
}
Still recursive but without accessing state from outside the method.
Assuming both lists are of the same length, you can use a ListBuffer to build up the alternating list. alternate is a pure function:
import scala.collection.mutable.ListBuffer
object Alternate extends App {
def alternate[T](xs: List[T], ys: List[T]): List[T] = {
val buffer = new ListBuffer[T]
for ((x, y) <- xs.zip(ys)) {
buffer += x
buffer += y
}
buffer.toList
}
alternate(List(1, 3, 5), List(2, 4, 6)).foreach(println)
}

Merge two Streams (ordered) to get a final sorted Stream

For example, how to merge two Streams of sorted Integers? I thought it's very basic, but just found it's non trivial at all. The below one is not tail-recursive and it will stack-overflow when the Streams are large.
def merge(as: Stream[Int], bs: Stream[Int]): Stream[Int] = {
(as, bs) match {
case (Stream.Empty, bss) => bss
case (ass, Stream.Empty) => ass
case (a #:: ass, b #:: bss) =>
if (a < b) a #:: merge(ass, bs)
else b #:: merge(as, bss)
}
}
We may want to turn it into a tail-recursive one by introducing a accumulator. However, if we pre-pend the accumulator, we will only get a stream of reversed order; if we append the accumulator with concatenation (#:::), it's NOT lazy (strict) any more.
What could be the solution here? Thanks
Turning a comment into an answer, there's nothing wrong with your merge.
It's not recursive at all - any one call to merge returns a new Stream without any other call to merge. a #:: merge(ass, bs) return a stream with first element a and where merge(ass, bs) will be called to evaluate the rest of the stream when required.
So
val m = merge(Stream.from(1,2), Stream.from(2, 2))
//> m : Stream[Int] = Stream(1, ?)
m.drop(10000000).take(1)
//> res0: scala.collection.immutable.Stream[Int] = Stream(10000001, ?)
works just fine. No stack overflow.

Pattern matching and infinite streams

So, I'm working to teach myself Scala, and one of the things I've been playing with is the Stream class. I tried to use a naïve translation of the classic Haskell version of Dijkstra's solution to the Hamming number problem:
object LazyHammingBad {
private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
(a, b) match {
case (x #:: xs, y #:: ys) =>
if (x < y) x #:: merge(xs, b)
else if (y < x) y #:: merge(a, ys)
else x #:: merge(xs, ys)
}
val numbers: Stream[BigInt] =
1 #:: merge(numbers map { _ * 2 },
merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}
Taking this for a spin in the interpreter led quickly to disappointment:
scala> LazyHammingBad.numbers.take(10).toList
java.lang.StackOverflowError
I decided to look to see if other people had solved the problem in Scala using the Haskell approach, and adapted this solution from Rosetta Code:
object LazyHammingGood {
private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
if (a.head < b.head) a.head #:: merge(a.tail, b)
else if (b.head < a.head) b.head #:: merge(a, b.tail)
else a.head #:: merge(a.tail, b.tail)
val numbers: Stream[BigInt] =
1 #:: merge(numbers map {_ * 2},
merge(numbers map {_ * 3}, numbers map {_ * 5}))
}
This one worked nicely, but I still wonder how I went wrong in LazyHammingBad. Does using #:: to destructure x #:: xs force the evaluation of xs for some reason? Is there any way to use pattern matching safely with infinite streams, or do you just have to use head and tail if you don't want things to blow up?
a match {case x#::xs =>... is about the same as val (x, xs) = (a.head, a.tail). So the difference between the bad version and the good one, is that in that in the bad version, you're calling a.tail and b.tail right at the start, instead of just use them to build the tail of the resulting stream. Furthermore when you use them at the right of #:: (not pattern matching, but building the result, as in #:: merge(a.b.tail) you are not actually calling merge, that will be done only later, when accessing the tail of the returned Stream. So in the good version, a call to merge does not call tail at all. In the bad version, it calls it right at start.
Now if you consider numbers, or even a simplified version, say 1 #:: merge(numbers, anotherStream), when you call you call tail on that (as take(10) will), merge has to be evaluated. You call tail on numbers, which call merge with numbers as parameters, which calls tails on numbers, which calls merge, which calls tail...
By contrast, in super lazy Haskell, when you pattern match, it does barely any work. When you do case l of x:xs, it will evaluate l just enough to know whether it is an empty list or a cons.
If it is indeed a cons, x and xs will be available as two thunks, functions that will eventually give access, later, to content. The closest equivalent in Scala would be to just test empty.
Note also that in Scala Stream, while the tail is lazy, the head is not. When you have a (non empty) Stream, the head has to be known. Which means that when you get the tail of the stream, itself a stream, its head, that is the second element of the original stream, has to be computed. This is sometimes problematic, but in your example, you fail before even getting there.
Note that you can do what you want by defining a better pattern matcher for Stream:
Here's a bit I just pulled together in a Scala Worksheet:
object HammingTest {
// A convenience object for stream pattern matching
object #:: {
class TailWrapper[+A](s: Stream[A]) {
def unwrap = s.tail
}
object TailWrapper {
implicit def unwrap[A](wrapped: TailWrapper[A]) = wrapped.unwrap
}
def unapply[A](s: Stream[A]): Option[(A, TailWrapper[A])] = {
if (s.isEmpty) None
else {
Some(s.head, new TailWrapper(s))
}
}
}
def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
(a, b) match {
case (x #:: xs, y #:: ys) =>
if (x < y) x #:: merge(xs, b)
else if (y < x) y #:: merge(a, ys)
else x #:: merge(xs, ys)
} //> merge: (a: Stream[BigInt], b: Stream[BigInt])Stream[BigInt]
lazy val numbers: Stream[BigInt] =
1 #:: merge(numbers map { _ * 2 }, merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
//> numbers : Stream[BigInt] = <lazy>
numbers.take(10).toList //> res0: List[BigInt] = List(1, 2, 3, 4, 5, 6, 8, 9, 10, 12)
}
Now you just need to make sure that Scala finds your object #:: instead of the one in Stream.class whenever it's doing pattern matching. To facilitate that, it might be best to use a different name like #>: or ##:: and then just remember to always use that name when pattern matching.
If you ever need to match the empty stream, use case Stream.Empty. Using case Stream() will attempt to evaluate your entire stream there in the pattern match, which will lead to sadness.

Selection sort in functional Scala

I'm making my way through "Programming in Scala" and wrote a quick implementation of the selection sort algorithm. However, since I'm still a bit green in functional programming, I'm having trouble translating to a more Scala-ish style. For the Scala programmers out there, how can I do this using Lists and vals rather than falling back into my imperative ways?
http://gist.github.com/225870
As starblue already said, you need a function that calculates the minimum of a list and returns the list with that element removed. Here is my tail recursive implementation of something similar (as I believe foldl is tail recursive in the standard library), and I tried to make it as functional as possible :). It returns a list that contains all the elements of the original list (but kindof reversed - see the explanation below) with the minimum as a head.
def minimum(xs: List[Int]): List[Int] =
(List(xs.head) /: xs.tail) {
(ys, x) =>
if(x < ys.head) (x :: ys)
else (ys.head :: x :: ys.tail)
}
This basically does a fold, starting with a list containing of the first element of xs If the first element of xs is smaller than the head of that list, we pre-append it to the list ys. Otherwise, we add it to the list ys as the second element. And so on recursively, we've folded our list into a new list containing the minimum element as a head and a list containing all the elements of xs (not necessarily in the same order) with the minimum removed, as a tail. Note that this function does not remove duplicates.
After creating this helper function, it's now easy to implement selection sort.
def selectionSort(xs: List[Int]): List[Int] =
if(xs.isEmpty) List()
else {
val ys = minimum(xs)
if(ys.tail.isEmpty)
ys
else
ys.head :: selectionSort(ys.tail)
}
Unfortunately this implementation is not tail recursive, so it will blow up the stack for large lists. Anyway, you shouldn't use a O(n^2) sort for large lists, but still... it would be nice if the implementation was tail recursive. I'll try to think of something... I think it will look like the implementation of a fold.
Tail Recursive!
To make it tail recursive, I use quite a common pattern in functional programming - an accumulator. It works a bit backward, as now I need a function called maximum, which basically does the same as minimum, but with the maximum element - its implementation is exact as minimum, but using > instead of <.
def selectionSort(xs: List[Int]) = {
def selectionSortHelper(xs: List[Int], accumulator: List[Int]): List[Int] =
if(xs.isEmpty) accumulator
else {
val ys = maximum(xs)
selectionSortHelper(ys.tail, ys.head :: accumulator)
}
selectionSortHelper(xs, Nil)
}
EDIT: Changed the answer to have the helper function as a subfunction of the selection sort function.
It basically accumulates the maxima to a list, which it eventually returns as the base case. You can also see that it is tail recursive by replacing accumulator by throw new NullPointerException - and then inspect the stack trace.
Here's a step by step sorting using an accumulator. The left hand side shows the list xs while the right hand side shows the accumulator. The maximum is indicated at each step by a star.
64* 25 12 22 11 ------- Nil
11 22 12 25* ------- 64
22* 12 11 ------- 25 64
11 12* ------- 22 25 64
11* ------- 12 22 25 64
Nil ------- 11 12 22 25 64
The following shows a step by step folding to calculate the maximum:
maximum(25 12 64 22 11)
25 :: Nil /: 12 64 22 11 -- 25 > 12, so it stays as head
25 :: 12 /: 64 22 11 -- same as above
64 :: 25 12 /: 22 11 -- 25 < 64, so the new head is 64
64 :: 22 25 12 /: 11 -- and stays so
64 :: 11 22 25 12 /: Nil -- until the end
64 11 22 25 12
You should have problems doing selection sort in functional style, as it is an in-place sort algorithm. In-place, by definition, isn't functional.
The main problem you'll face is that you can't swap elements. Here's why this is important. Suppose I have a list (a0 ... ax ... an), where ax is the minimum value. You need to get ax away, and then compose a list (a0 ... ax-1 ax+1 an). The problem is that you'll necessarily have to copy the elements a0 to ax-1, if you wish to remain purely functional. Other functional data structures, particularly trees, can have better performance than this, but the basic problem remains.
here is another implementation of selection sort (generic version).
def less[T <: Comparable[T]](i: T, j: T) = i.compareTo(j) < 0
def swap[T](xs: Array[T], i: Int, j: Int) { val tmp = xs(i); xs(i) = xs(j); xs(j) = tmp }
def selectiveSort[T <: Comparable[T]](xs: Array[T]) {
val n = xs.size
for (i <- 0 until n) {
val min = List.range(i + 1, n).foldLeft(i)((a, b) => if (less(xs(a), xs(b))) a else b)
swap(xs, i, min)
}
}
You need a helper function which does the selection. It should return the minimal element and the rest of the list with the element removed.
I think it's reasonably feasible to do a selection sort in a functional style, but as Daniel indicated, it has a good chance of performing horribly.
I just tried my hand at writing a functional bubble sort, as a slightly simpler and degenerate case of selection sort. Here's what I did, and this hints at what you could do:
define bubble(data)
if data is empty or just one element: return data;
otherwise, if the first element < the second,
return first element :: bubble(rest of data);
otherwise, return second element :: bubble(
first element :: (rest of data starting at 3rd element)).
Once that's finished recursing, the largest element is at the end of the list. Now,
define bubblesort [data]
apply bubble to data as often as there are elements in data.
When that's done, your data is indeed sorted. Yes, it's horrible, but my Clojure implementation of this pseudocode works.
Just concerning yourself with the first element or two and then leaving the rest of the work to a recursed activity is a lisp-y, functional-y way to do this kind of thing. But once you've gotten your mind accustomed to that kind of thinking, there are more sensible approaches to the problem.
I would recommend implementing a merge sort:
Break list into two sub-lists,
either by counting off half the elements into one sublist
and the rest in the other,
or by copying every other element from the original list
into either of the new lists.
Sort each of the two smaller lists (recursion here, obviously).
Assemble a new list by selecting the smaller from the front of either sub-list
until you've exhausted both sub-lists.
The recursion is in the middle of that, and I don't see a clever way of making the algorithm tail recursive. Still, I think it's O(log-2) in time and also doesn't place an exorbitant load on the stack.
Have fun, good luck!
Thanks for the hints above, they were very inspiring. Here's another functional approach to the selection sort algorithm. I tried to base it on the following idea: finding a max / min can be done quite easily by min(A)=if A=Nil ->Int.MaxValue else min(A.head, min(A.tail)). The first min is the min of a list, the second the min of two numbers. This is easy to understand, but unfortunately not tail recursive. Using the accumulator method the min definition can be transformed like this, now in correct Scala:
def min(x: Int,y: Int) = if (x<y) x else y
def min(xs: List[Int], accu: Int): Int = xs match {
case Nil => accu
case x :: ys => min(ys, min(accu, x))
}
(This is tail recursive)
Now a min version is needed which returns a list leaving out the min value. The following function returns a list whose head is the min value, the tail contains the rest of the original list:
def minl(xs: List[Int]): List[Int] = minl(xs, List(Int.MaxValue))
def minl(xs: List[Int],accu:List[Int]): List[Int] = xs match {
// accu always contains min as head
case Nil => accu take accu.length-1
case x :: ys => minl(ys,
if (x<accu.head) x::accu else accu.head :: x :: accu.tail )
}
Using this selection sort can be written tail recursively as:
def ssort(xs: List[Int], accu: List[Int]): List[Int] = minl(xs) match {
case Nil => accu
case min :: rest => ssort(rest, min::accu)
}
(reverses the order). In a test with 10000 list elements this algorithm is only about 4 times slower than the usual imperative algorithm.
Even though, when coding Scala, I'm used to prefer functional programming style (via combinators or recursion) over imperative style (via variables and iterations), THIS TIME, for this specific problem, old school imperative nested loops result in simpler and more performant code.
I don't think falling back to imperative style is a mistake for certain classes of problems, such as sorting algorithms which usually transform the input buffer in place rather than resulting to a new collection.
My solution is:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
// algorithm provided by subclasses
def sort(buffer : Buffer[T]) : Unit
// check if the buffer is sorted
def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 > t._1 }
// swap elements in buffer
def swap(buffer : Buffer[T], i:Int, j:Int) {
val temp = buffer(i)
buffer(i) = buffer(j)
buffer(j) = temp
}
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
def sort(buffer : Buffer[T]) : Unit = {
for (i <- 0 until buffer.length) {
var min = i
for (j <- i until buffer.length) {
if (buffer(j) < buffer(min))
min = j
}
swap(buffer, i, min)
}
}
}
As you can see, to achieve parametric polymorphism, rather than using java.lang.Comparable, I preferred scala.math.Ordered and Scala View Bounds rather than Upper Bounds. That's certainly works thanks to Scala Implicit Conversions of primitive types to Rich Wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))
A simple functional program for selection-sort in Scala
def selectionSort(list:List[Int]):List[Int] = {
#tailrec
def selectSortHelper(list:List[Int], accumList:List[Int] = List[Int]()): List[Int] = {
list match {
case Nil => accumList
case _ => {
val min = list.min
val requiredList = list.filter(_ != min)
selectSortHelper(requiredList, accumList ::: List.fill(list.length - requiredList.length)(min))
}
}
}
selectSortHelper(list)
}
You may want to try replacing your while loops with recursion, so, you have two places where you can create new recursive functions.
That would begin to get rid of some vars.
This was probably the toughest lesson for me, trying to move more toward FP.
I hesitate to show solutions here, as I think it would be better for you to try first.
But, if possible you should be using tail-recursion, to avoid problems with stack overflows (if you are sorting a very, very large list).
Here is my point of view on this problem: SelectionSort.scala
def selectionsort[A <% Ordered[A]](list: List[A]): List[A] = {
def sort(as: List[A], bs: List[A]): List[A] = as match {
case h :: t => select(h, t, Nil, bs)
case Nil => bs
}
def select(m: A, as: List[A], zs: List[A], bs: List[A]): List[A] =
as match {
case h :: t =>
if (m > h) select(m, t, h :: zs, bs)
else select(h, t, m :: zs, bs)
case Nil => sort(zs, m :: bs)
}
sort(list, Nil)
}
There are two inner functions: sort and select, which represents two loops in original algorithm. The first function sort iterates through the elements and call select for each of them. When the source list is empty it return bs list as result, which is initially Nil. The sort function tries to search for maximum (not minimum, since we build result list in reversive order) element in source list. It suppose that maximum is head by the default and then just replace it with a proper value.
This is 100% functional implementation of Selection Sort in Scala.
Here is my solution
def sort(list: List[Int]): List[Int] = {
#tailrec
def pivotCompare(p: Int, l: List[Int], accList: List[Int] = List.empty): List[Int] = {
l match {
case Nil => p +: accList
case x :: xs if p < x => pivotCompare(p, xs, accList :+ x)
case x :: xs => pivotCompare(x, xs, accList :+ p)
}
}
#tailrec
def loop(list: List[Int], accList: List[Int] = List.empty): List[Int] = {
list match {
case x :: xs =>
pivotCompare(x, xs) match {
case Nil => accList
case h :: tail => loop(tail, accList :+ h)
}
case Nil => accList
}
}
loop(list)
}