How do you write an idiomatic Scala Quicksort function?

I recently answered a question with an attempt at writing a quicksort function in Scala. I had seen something like the code below written somewhere.
def qsort(l: List[Int]): List[Int] = {
l match {
case Nil => Nil
case pivot::tail => qsort(tail.filter(_ < pivot)) ::: pivot :: qsort(tail.filter(_ >= pivot))
}
}
My answer received some constructive criticism pointing out, first, that List is a poor choice of collection for quicksort and, second, that the above is not tail recursive.
I tried to rewrite the above in a tail-recursive manner but didn't have much luck. Is it possible to write a tail-recursive quicksort? If not, how can it be done in a functional style? Also, what can be done to maximize the efficiency of the implementation?

A few years back, I spent some time trying to optimize functional quicksort as far as I could. The following is what I came up with for vanilla List[A]:
def qsort[A : Ordering](ls: List[A]) = {
import Ordered._
def sort(ls: List[A])(parent: List[A]): List[A] = {
if (ls.size <= 1) ls ::: parent else {
val pivot = ls.head
val (less, equal, greater) = ls.foldLeft((List[A](), List[A](), List[A]())) {
case ((less, equal, greater), e) => {
if (e < pivot)
(e :: less, equal, greater)
else if (e == pivot)
(less, e :: equal, greater)
else
(less, equal, e :: greater)
}
}
sort(less)(equal ::: sort(greater)(parent))
}
}
sort(ls)(Nil)
}
I was able to do even better with a custom List structure. This custom structure basically tracked the ideal (or nearly ideal) pivot point for the list. Thus, I could obtain a far better pivot point in constant time, simply by accessing this special list property. In practice, this did quite a bit better than the standard functional approach of choosing the head.
As it is, the above is still pretty snappy. It's "half" tail recursive (you can't do a tail-recursive quicksort without getting really ugly). More importantly, it rebuilds from the tail end first, so that results in substantially fewer intermediate lists than the conventional approach.
It's important to note that this is not the most elegant or most idiomatic way to do quicksort in Scala; it just happens to work very well. You will probably have more success writing merge sort, which is usually a much faster algorithm when implemented in functional languages (not to mention much cleaner).
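The merge sort suggestion above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the names mergeSort and merge are hypothetical, and the merge step uses an accumulator so that it is tail recursive and long lists do not overflow the stack.

```scala
import scala.annotation.tailrec

// Hypothetical names; a sketch of top-down merge sort on an immutable List.
def mergeSort[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = {
  // Merge two sorted lists, accumulating the result in reverse order.
  @tailrec
  def merge(left: List[A], right: List[A], acc: List[A]): List[A] =
    (left, right) match {
      case (Nil, _) => acc.reverse ::: right
      case (_, Nil) => acc.reverse ::: left
      case (l :: ls, r :: rs) =>
        if (ord.lteq(l, r)) merge(ls, right, l :: acc) // lteq keeps the sort stable
        else merge(left, rs, r :: acc)
    }

  val n = xs.length / 2
  if (n == 0) xs // zero or one element: already sorted
  else {
    val (left, right) = xs.splitAt(n)
    merge(mergeSort(left), mergeSort(right), Nil)
  }
}

println(mergeSort(List(3, 1, 2, 5, 4))) // List(1, 2, 3, 4, 5)
```

The recursion on the two halves is not tail recursive, but its depth is only logarithmic in the length of the list.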

I guess it depends on what you mean by "idiomatic". The main advantage of quicksort is that it is a very fast in-place sorting algorithm. So, if you can't sort in-place, you lose all its advantages -- but you're still stuck with its disadvantages.
So, here's some code I wrote for Rosetta Code on this very subject. It still doesn't sort in-place, but, on the other hand, it sorts any of the new collections:
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
def quicksort
[T, CC[X] <: Traversable[X] with TraversableLike[X, CC[X]]] // My type parameters -- which are expected to be inferred
(coll: CC[T]) // My explicit parameter -- the one users will actually see
(implicit ord: Ordering[T], cbf: CanBuildFrom[CC[T], T, CC[T]]) // My implicit parameters -- which will hopefully be implicitly available
: CC[T] = // My return type -- which is the very same type of the collection received
if (coll.isEmpty) {
coll
} else {
val (smaller, bigger) = coll.tail partition (ord.lt(_, coll.head))
quicksort(smaller) ++ coll.companion(coll.head) ++ quicksort(bigger)
}
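For contrast, here is a minimal sketch of what sorting in-place could look like on a mutable Array[Int]. It is not the Rosetta Code entry; the name quicksortInPlace is hypothetical, and the partition scheme (two indices moving toward each other around the middle element) is just one common choice.

```scala
// Hypothetical name; a sketch of in-place quicksort on a mutable array.
def quicksortInPlace(a: Array[Int]): Unit = {
  def swap(i: Int, j: Int): Unit = { val t = a(i); a(i) = a(j); a(j) = t }

  def sort(lo: Int, hi: Int): Unit =
    if (lo < hi) {
      val pivot = a(lo + (hi - lo) / 2) // middle element as pivot
      var i = lo
      var j = hi
      while (i <= j) {
        while (a(i) < pivot) i += 1 // skip elements already on the correct side
        while (a(j) > pivot) j -= 1
        if (i <= j) { swap(i, j); i += 1; j -= 1 }
      }
      sort(lo, j) // elements <= pivot
      sort(i, hi) // elements >= pivot
    }

  sort(0, a.length - 1)
}

val numbers = Array(3, 1, 2, 5, 4, 1)
quicksortInPlace(numbers)
println(numbers.mkString(", ")) // 1, 1, 2, 3, 4, 5
```

No intermediate collections are allocated here, which is the advantage the functional versions give up.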

As it happens I tried to solve this exact same problem recently. I wanted to have the classic algorithm (i.e. the one that does in-place sorting) converted to tail recursive form.
If you are still interested you may see my recommended solution here:
Quicksort rewritten in tail-recursive form - An example in Scala
The article also contains the steps I followed to convert the initial implementation to tail recursive form.

I did some experiments trying to write Quicksort in a purely functional style. Here is what I got (Quicksort.scala):
def quicksort[A <% Ordered[A]](list: List[A]): List[A] = {
def sort(t: (List[A], A, List[A])): List[A] = t match {
case (Nil, p, Nil) => List(p)
case (l, p, g) => partitionAndSort(l) ::: (p :: partitionAndSort(g))
}
def partition(as: List[A]): (List[A], A, List[A]) = {
def loop(p: A, as: List[A], l: List[A], g: List[A]): (List[A], A, List[A]) =
as match {
case h :: t => if (h < p) loop(p, t, h :: l, g) else loop(p, t, l, h :: g)
case Nil => (l, p, g)
}
loop(as.head, as.tail, Nil, Nil)
}
def partitionAndSort(as: List[A]): List[A] =
if (as.isEmpty) Nil
else sort(partition(as))
partitionAndSort(list)
}

My solution in Scala 3:
import scala.util.Random
val randomArray: Array[Int] = (for(_ <- 1 to 1000) yield Random.nextInt(1000)).toArray
def quickSort(inputArray: Array[Int]): Array[Int] =
  inputArray.length match
    case 0 | 1 => inputArray // already sorted
    case _ =>
      val pivot = inputArray(inputArray.length / 2) // middle element as pivot
      Array.concat(
        quickSort(inputArray.filter(_ < pivot)),
        inputArray.filter(_ == pivot),
        quickSort(inputArray.filter(_ > pivot)))
print(quickSort(randomArray).mkString("Sorted array: (", ", ", ")"))

Related

How do you run a computation, that may fail, over a list of elements so that it terminates as soon as a failure is detected?

My computation that can fail is halve() below. It returns Left(errorMessage) to indicate failure and Right(value) to indicate the successful halving of a number.
def halve(n: Int): Either[String, Int] =
if (n % 2 == 0) {
Right(n / 2)
} else {
Left("cannot halve odd number")
}
I'd like to apply the halve function to a list of Ints such that as soon as the first call to halve fails (e.g. when called with an odd number), the halveAll function immediately stops iterating over the numbers in ns and returns Left(errorMessage).
Here is one way to achieve this:
def halveAll(ns: List[Int]): Either[String, List[Int]] =
try {
Right(
for {
n <- ns
Right(halved) = halve(n)
} yield halved
)
} catch {
case ex: MatchError =>
Left("cannot match an odd number")
}
I would prefer an approach that does not use exceptions. Is there an idiomatic way of achieving this? I'd prefer the approach to use only functionality in the Scala 2.x standard library. If Cats or scalaz has an elegant solution, I'd be interested in hearing about it though.
Thank you!
Example usage of the halveAll function:
val allEven = List(2, 4, 6, 8)
val evenAndOdd = List(2, 4, 6, 7, 8)
println(halveAll(allEven))
println(halveAll(evenAndOdd))
This has been asked a dozen times but I am too lazy to search for a duplicate.
Have you ever heard the FP meme "the answer is always traverse"? Well, you are now part of that, since that is exactly the function you want.
Thus, if you have cats in scope then you only need to do this:
import cats.syntax.all._
def halveAll(ns: List[Int]): Either[String, List[Int]] =
ns.traverse(halve)
If you don't have it already in scope, and don't want to add it just for a single function, then you may use the foldLeft from Gaël J's answer, or implement the recursion yourself if you really want to stop iterating, like this:
def traverse[A, E, B](list: List[A])(f: A => Either[E, B]): Either[E, List[B]] = {
@annotation.tailrec
def loop(remaining: List[A], acc: List[B]): Either[E, List[B]] =
remaining match {
case a :: tail =>
f(a) match {
case Right(b) =>
loop(remaining = tail, b :: acc)
case Left(e) =>
Left(e)
}
case Nil =>
Right(acc.reverse)
}
loop(remaining = list, acc = List.empty)
}
Disclaimer: What follows is only my opinion.
I have heard the argument about not including cats for a single function many times. People simply don't realize that it is not just one function but probably many of them across the rest of the codebase, which ultimately means you are probably re-implementing many bits of the library in a worse way and with less testing.
The typical Scala approach (without libs) for this would be using foldLeft or a variant like this:
def halveAll(ns: List[Int]): Either[String, List[Int]] = {
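ns.foldLeft[Either[String, List[Int]]](Right(List.empty[Int])) { (acc, n) => // explicit type so the accumulator is not inferred as Right[Nothing, List[Int]]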
for { // for-comprehension on Either
accVal <- acc
x <- halve(n)
} yield accVal :+ x
}
}
As soon as a Left is produced by halve, it will continue iterating but will not call halve on the remaining items.
If you really need to not iterate anymore, you can use a recursive approach instead.
I guess it depends on the size of the list, but iterating over it should not be that costly most of the time.
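The claim that foldLeft keeps iterating but stops calling halve can be checked with a small experiment. This is a sketch under assumptions: the calls counter is hypothetical instrumentation added to halve for illustration, and the accumulator type is spelled out explicitly so the fold type-checks on Scala 2.

```scala
var calls = 0 // hypothetical counter, for illustration only

def halve(n: Int): Either[String, Int] = {
  calls += 1
  if (n % 2 == 0) Right(n / 2) else Left("cannot halve odd number")
}

def halveAll(ns: List[Int]): Either[String, List[Int]] =
  ns.foldLeft[Either[String, List[Int]]](Right(Nil)) { (acc, n) =>
    for {
      accVal <- acc      // once acc is a Left, the rest of the block is skipped
      x      <- halve(n)
    } yield accVal :+ x
  }

val result = halveAll(List(2, 4, 7, 8, 10))
println(result) // Left(cannot halve odd number)
println(calls)  // 3: halve ran for 2, 4 and 7 only
```

The fold still visits 8 and 10, but each of those steps only inspects the Left accumulator and returns it unchanged.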

Is this foldRight for this Streams implementation lazy or strict

I am reading Functional Programming in Scala and am on chapter 5, which covers strictness and laziness. I've done all the problems in the book up to 5.13.
One thing I realized as I was working through the problems was that I didn't really understand all the nuances of foldRight until I started doing the problems in this chapter. My question is about the implementation of foldRight given in this chapter.
Let's begin by looking at the implementation for Streams given in this chapter.
import Stream._
trait Stream[+A] {
def toList: List[A] = this match {
case Empty => Nil
case Cons(h, t) => h() :: t().toList
}
def foldRight[B](z: => B)(f: (A, => B) => B): B = // The arrow `=>` in front of the argument type `B` means that the function `f` takes its second argument by name and may choose not to evaluate it.
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f)) // If `f` doesn't evaluate its second argument, the recursion never occurs.
case _ => z
}
def map[B](f: A => B): Stream[B] =
foldRight(empty[B])((h,t) => cons(f(h), t))
def filter(f: A => Boolean): Stream[A] =
foldRight(empty[A])((h,t) =>
if (f(h)) cons(h, t)
else t)
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] =
if (as.isEmpty) empty
else cons(as.head, apply(as.tail: _*))
}
In particular, let's zoom in on the definition of foldRight given here:
def foldRight[B](z: => B)(f: (A, => B) => B): B = // The arrow `=>` in front of the argument type `B` means that the function `f` takes its second argument by name and may choose not to evaluate it.
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f)) // If `f` doesn't evaluate its second argument, the recursion never occurs.
case _ => z
}
From the text in the book, we know that this implementation of foldRight will not evaluate the second argument of the "chain" function unless it is called. Therefore, foldRight can terminate early before evaluating all the elements of the Stream. Consider this implementation of takeWhile in terms of foldRight as an example:
def takeWhile(f: A => Boolean): Stream[A] =
foldRight(empty[A])((h,t) =>
if (f(h)) cons(h,t)
else empty)
This call to foldRight in the definition of takeWhile can terminate early before traversing all the elements of the Stream. So we know foldRight can terminate early. However, this is not the same thing as saying foldRight is lazy. My question is if this implementation of foldRight is itself lazy or strict. That is, does it apply f to every relevant element in the Stream?
For example, let us take this example:
val someStream = Stream.apply(0, 1, 2, 3, 5)
someStream.foldRight(Empty:Stream[Int]){ (e, acc) =>
Cons(()=>e, ()=>acc)
}
What this call to foldRight would normally do for a non-lazy collection such as a list is that it will copy the list. For this call to someStream, however, will it create a copy of all the members of the Stream? Or will it only evaluate the first term?
Recall that the function in foldRight has the following signature: (A, => B) => B. My question really boils down to when acc, the call-by-name second argument of type B, is evaluated. Is it evaluated when f is first called? Or is acc evaluated when ()=>acc is evaluated? Depending on which one it is, we will either evaluate all the members of the Stream or only the first one when we perform the fold.
And just to reiterate, saying that foldRight terminates early and saying that foldRight is lazy are two different things. The discussion in the book makes it clear that the first statement can be true. This foldRight can terminate early without traversing all the elements of the Stream. But is the second statement true? Is the foldRight we have here lazy?
I bring this up because there is a program trace on page 72 that I don't think I fully understand. It is linked to whether foldRight is itself lazy or not.
Stream(1,2,3,4).map(_ + 10).filter(_ % 2 == 0).toList
cons(11, Stream(2,3,4).map(_ + 10)).filter(_ % 2 == 0).toList
Stream(2,3,4).map(_ + 10).filter(_ % 2 == 0).toList
cons(12, Stream(3,4).map(_ + 10)).filter(_ % 2 == 0).toList
12 :: Stream(3,4).map(_ + 10).filter(_ % 2 == 0).toList
12 :: cons(13, Stream(4).map(_ + 10)).filter(_ % 2 == 0).toList
12 :: Stream(4).map(_ + 10).filter(_ % 2 == 0).toList
12 :: cons(14, Stream().map(_ + 10)).filter(_ % 2 == 0).toList
12 :: 14 :: Stream().map(_ + 10).filter(_ % 2 == 0).toList
12 :: 14 :: List()
Notice how in this trace, the first term (the 1) is passed through the map and filter once before any of the subsequent terms are. Since the map and filter are defined in the Stream using foldRight, this has implications for my question.
So, let's summarize my question:
1) Is foldRight in the implementation given lazy or strict? This is a different question from whether it terminates early.
2) For the following example code,
val someStream = Stream.apply(0, 1, 2, 3, 5)
someStream.foldRight(Empty:Stream[Int]){ (e, acc) =>
Cons(()=>e, ()=>acc)
}
When is acc, the call-by-name parameter, evaluated? Is it when f is called and (e, acc) => Cons(()=>e, ()=>acc) is first created? Or is it when ()=>acc is evaluated?
3) Please explain why in the program trace given, the first term is passed through the map and filter first before moving onto the second term.
You are encouraged to use the text of Functional Programming in Scala to make your point.
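As a small aside on point 2, the general behaviour of by-name parameters in Scala can be observed in isolation. The sketch below does not use the book's Stream; the names evaluations, expensive and capture are hypothetical, and capture only mirrors the shape of (e, acc) => Cons(() => e, () => acc), where the by-name argument is wrapped in a thunk rather than used directly.

```scala
var evaluations = 0 // hypothetical counter, for illustration only

def expensive(): Int = { evaluations += 1; 42 }

// The by-name argument is captured inside a function value, not evaluated here.
def capture(acc: => Int): () => Int = () => acc

val thunk = capture(expensive())
println(evaluations) // 0: passing the by-name argument did not evaluate it
thunk()
println(evaluations) // 1: evaluated when the thunk was forced
thunk()
println(evaluations) // 2: a by-name argument is re-evaluated on each use
```

This only shows the language rule; how it plays out across the recursive calls in the book's trace is what the question is asking about.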

Scala: different foldRight implementations in list

I've just figured out that Scala (I'm on 2.12) provides completely different implementations of foldRight for immutable list and mutable list.
Immutable list (List.scala):
override def foldRight[B](z: B)(op: (A, B) => B): B =
reverse.foldLeft(z)((right, left) => op(left, right))
Mutable list (LinearSeqOptimized.scala):
def foldRight[B](z: B)(@deprecatedName('f) op: (A, B) => B): B =
if (this.isEmpty) z
else op(head, tail.foldRight(z)(op))
Now I'm just curious.
Could you please explain why it was implemented so differently?
The override in List seems to override the foldRight in LinearSeqOptimized. The implementation in LinearSeqOptimized
def foldRight[B](z: B)(@deprecatedName('f) op: (A, B) => B): B =
if (this.isEmpty) z
else op(head, tail.foldRight(z)(op))
looks exactly like the canonical definition of foldRight as a catamorphism from your average theory book. However, as was noticed in SI-2818, this implementation is not stack-safe (throws unexpected StackOverflowError for long lists). Therefore, it was replaced by a stack-safe reverse.foldLeft in this commit. The foldLeft is stack-safe, because it has been implemented by a while loop:
def foldLeft[B](z: B)(@deprecatedName('f) op: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = op(acc, these.head)
these = these.tail
}
acc
}
That hopefully explains why it was overridden in List. It doesn't explain why it was not overridden in other classes. I guess it's simply because the mutable data structures are used less often and quite differently anyway (often as buffers and accumulators during the construction of immutable ones).
Hint: there is a blame button in the top right corner over every file on Github, so you can always track down what was changed when, by whom, and why.
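To see the reverse.foldLeft idea in isolation, here is a minimal standalone sketch. The name foldRightViaReverse is hypothetical; it is not the library method, only the same technique written as a free function.

```scala
// Hypothetical standalone version of the reverse.foldLeft approach.
def foldRightViaReverse[A, B](xs: List[A], z: B)(op: (A, B) => B): B =
  xs.reverse.foldLeft(z)((right, left) => op(left, right))

// Same result as a right fold: elements are combined starting from the right.
println(foldRightViaReverse(List("a", "b", "c"), "")(_ + _)) // abc

// Handles a list of a million elements without deep recursion.
println(foldRightViaReverse(List.range(0, 1000000), 0L)(_ + _)) // 499999500000
```

The price is one extra traversal and an intermediate reversed list, in exchange for constant stack depth.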

Corecursion vs Recursion understanding in scala

Recursion is almost clear to me, for example:
def product2(ints: List[Int]): Int = {
@tailrec
def productAccumulator(ints: List[Int], accum: Int): Int = {
ints match {
case Nil => accum
case x :: tail => productAccumulator(tail, accum * x)
}
}
productAccumulator(ints, 1)
}
I am not sure about corecursion, though. According to the Wikipedia article, "corecursion allows programs to produce arbitrarily complex and potentially infinite data structures, such as streams". For example, a construction like this
list.filter(...).map(...)
makes it possible to prepare temporary streams after the filter and map operations.
After filter, the stream will contain only the filtered elements, and then map will transform those elements. Is that correct?
Do functional combinators such as map and filter execute recursively?
Does anybody have a good example in Scala comparing recursion and corecursion?
The simplest way to understand the difference is to think that recursion consumes data while corecursion produces data. Your example is recursion since it consumes the list you provide as parameter. Also, foldLeft and foldRight are recursion too, not corecursion. Now an example of corecursion. Consider the following function:
def unfold[A, S](z: S)(f: S => Option[(A, S)]): Stream[A]
Just by looking at its signature you can see that this function is intended to produce a (potentially infinite) stream of data. It takes an initial state, z of type S, and a function from S to an optional tuple containing the next value of the stream, of type A, and the next state. If the result of f is empty (None), unfold stops producing elements; otherwise it continues with the next state, and so on. Here is its implementation:
def unfold[S, A](z: S)(f: S => Option[(A, S)]): Stream[A] = f(z) match {
case Some((a, s)) => a #:: unfold(s)(f)
case None => Stream.empty[A]
}
You can use this function to implement other productive functions. E.g. the following function will produce a stream of, at most, numOfValues elements of type A:
def elements[A](element: A, numOfValues: Int): Stream[A] = unfold(numOfValues) { x =>
if (x > 0) Some((element, x - 1)) else None
}
Usage example in REPL:
scala> elements("hello", 3)
res10: Stream[String] = Stream(hello, ?)
scala> res10.toList
res11: List[String] = List(hello, hello, hello)
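As a further illustration of producing data from a state, the same unfold shape can generate an infinite sequence. This sketch assumes Scala 2.13 or later and therefore uses LazyList, which replaces the deprecated Stream there; the name fibs is hypothetical.

```scala
// Assumes Scala 2.13+, where LazyList replaces the deprecated Stream.
def unfold[S, A](z: S)(f: S => Option[(A, S)]): LazyList[A] = f(z) match {
  case Some((a, s)) => a #:: unfold(s)(f) // the tail is only built on demand
  case None         => LazyList.empty[A]
}

// The state is a pair of consecutive Fibonacci numbers. f never returns None,
// so the sequence is infinite and only the demanded prefix is computed.
val fibs: LazyList[BigInt] =
  unfold[(BigInt, BigInt), BigInt]((BigInt(0), BigInt(1))) {
    case (a, b) => Some((a, (b, a + b)))
  }

println(fibs.take(10).toList) // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
```

Nothing here consumes an input structure; each element is derived from the previous state, which is what makes it corecursive.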

Idiomatic construction to check whether a collection is ordered

With the intention of learning, and further to this question, I've remained curious about the idiomatic alternatives to explicit recursion for an algorithm that checks whether a list (or collection) is ordered. (I'm keeping things simple here by using an operator to compare and Int as the type; I'd like to look at the algorithm before delving into the generics of it.)
The basic recursive version would be (by @Luigi Plinge):
def isOrdered(l:List[Int]): Boolean = l match {
case Nil => true
case x :: Nil => true
case x :: xs => x <= xs.head && isOrdered(xs)
}
A poorly performing idiomatic way would be:
def isOrdered(l: List[Int]) = l == l.sorted
An alternative algorithm using fold:
def isOrdered(l: List[Int]) =
l.foldLeft((true, None:Option[Int]))((x,y) =>
(x._1 && x._2.map(_ <= y).getOrElse(true), Some(y)))._1
It has the drawback that it compares all n elements of the list even if it could stop after finding the first out-of-order element. Is there a way to "stop" the fold and thereby make this a better solution?
Any other (elegant) alternatives?
This will exit after the first element that is out of order. It should thus perform well, but I haven't tested that. It's also a lot more elegant in my opinion. :)
def sorted(l:List[Int]) = l.view.zip(l.tail).forall(x => x._1 <= x._2)
By "idiomatic", I assume you're talking about McBride and Paterson's "Idioms" in their paper Applicative Programming With Effects. :o)
Here's how you would use their idioms to check if a collection is ordered:
import scalaz._
import Scalaz._
case class Lte[A](v: A, b: Boolean)
implicit def lteSemigroup[A:Order] = new Semigroup[Lte[A]] {
def append(a1: Lte[A], a2: => Lte[A]) = {
lazy val b = a1.v lte a2.v
Lte(if (!a1.b || b) a1.v else a2.v, a1.b && b && a2.b)
}
}
def isOrdered[T[_]:Traverse, A:Order](ta: T[A]) =
ta.foldMapDefault(x => some(Lte(x, true))).fold(_.b, true)
Here's how this works:
Any data structure T[A] where there exists an implementation of Traverse[T], can be traversed with an Applicative functor, or "idiom", or "strong lax monoidal functor". It just so happens that every Monoid induces such an idiom for free (see section 4 of the paper).
A monoid is just an associative binary operation over some type, and an identity element for that operation. I'm defining a Semigroup[Lte[A]] (a semigroup is the same as a monoid, except without the identity element) whose associative operation tracks the lesser of two values and whether the left value is less than the right value. And of course Option[Lte[A]] is just the monoid generated freely by our semigroup.
Finally, foldMapDefault traverses the collection type T in the idiom induced by the monoid. The result b will contain true if each value was less than all the following ones (meaning the collection was ordered), or None if the T had no elements. Since an empty T is sorted by convention, we pass true as the second argument to the final fold of the Option.
As a bonus, this works for all traversable collections. A demo:
scala> val b = isOrdered(List(1,3,5,7,123))
b: Boolean = true
scala> val b = isOrdered(Seq(5,7,2,3,6))
b: Boolean = false
scala> val b = isOrdered(Map((2 -> 22, 33 -> 3)))
b: Boolean = true
scala> val b = isOrdered(some("hello"))
b: Boolean = true
A test:
import org.scalacheck._
scala> val p = forAll((xs: List[Int]) => (xs /== xs.sorted) ==> !isOrdered(xs))
p:org.scalacheck.Prop = Prop
scala> val q = forAll((xs: List[Int]) => isOrdered(xs.sorted))
q: org.scalacheck.Prop = Prop
scala> p && q check
+ OK, passed 100 tests.
And that's how you do idiomatic traversal to detect if a collection is ordered.
I'm going with this, which is pretty similar to Kim Stebel's, as a matter of fact.
def isOrdered(list: List[Int]): Boolean = (
list
sliding 2
map {
case List(a, b) => () => a <= b
case _ => () => true // a single-element window is trivially ordered
}
forall (_())
)
In case you missed missingfaktor's elegant solution in the comments above:
Scala < 2.13.0
(l, l.tail).zipped.forall(_ <= _)
Scala 2.13.x+
l.lazyZip(l.tail).forall(_ <= _)
This solution is very readable and will exit on the first out-of-order element.
The recursive version is fine, but limited to List (with limited changes, it would work well on LinearSeq).
If it was implemented in the standard library (would make sense) it would probably be done in IterableLike and have a completely imperative implementation (see for instance method find)
You can interrupt the foldLeft with a return (in which case you need only the previous element and not boolean all along)
import Ordering.Implicits._
def isOrdered[A: Ordering](seq: Seq[A]): Boolean = {
if (!seq.isEmpty)
seq.tail.foldLeft(seq.head){(previous, current) =>
if (previous > current) return false; current
}
true
}
but I don't see how it is any better or even idiomatic than an imperative implementation. I'm not sure I would not call it imperative actually.
Another solution could be
def isOrdered[A: Ordering](seq: Seq[A]): Boolean =
! seq.sliding(2).exists{s => s.length == 2 && s(0) > s(1)}
Rather concise, and maybe that could be called idiomatic, I'm not sure. But I think it is not too clear. Moreover, all of those methods would probably perform much worse than the imperative or tail recursive version, and I do not think they have any added clarity that would buy that.
Also you should have a look at this question.
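For comparison, the imperative implementation alluded to above might look roughly like the following. This is a sketch, not library code: the name isOrderedImperative is hypothetical, and it assumes non-strict ordering (equal neighbours count as ordered).

```scala
// Hypothetical name; works for any Iterable and stops at the first out-of-order pair.
def isOrderedImperative[A](xs: Iterable[A])(implicit ord: Ordering[A]): Boolean = {
  val it = xs.iterator
  if (!it.hasNext) true // an empty collection is ordered by convention
  else {
    var previous = it.next()
    var ordered = true
    while (ordered && it.hasNext) {
      val current = it.next()
      ordered = ord.lteq(previous, current)
      previous = current
    }
    ordered
  }
}

println(isOrderedImperative(List(1, 2, 2, 3))) // true
println(isOrderedImperative(Vector(3, 1, 2)))  // false
println(isOrderedImperative(List.empty[Int]))  // true
```

It traverses the collection once through its iterator and allocates no intermediate pairs or windows.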
To stop iteration, you can use Iteratee:
import scalaz._
import Scalaz._
import IterV._
import math.Ordering
import Ordering.Implicits._
implicit val ListEnumerator = new Enumerator[List] {
def apply[E, A](e: List[E], i: IterV[E, A]): IterV[E, A] = e match {
case List() => i
case x :: xs => i.fold(done = (_, _) => i,
cont = k => apply(xs, k(El(x))))
}
}
def sorted[E: Ordering] : IterV[E, Boolean] = {
def step(is: Boolean, e: E)(s: Input[E]): IterV[E, Boolean] =
s(el = e2 => if (is && e < e2)
Cont(step(is, e2))
else
Done(false, EOF[E]),
empty = Cont(step(is, e)),
eof = Done(is, EOF[E]))
def first(s: Input[E]): IterV[E, Boolean] =
s(el = e1 => Cont(step(true, e1)),
empty = Cont(first),
eof = Done(true, EOF[E]))
Cont(first)
}
scala> val s = sorted[Int]
s: scalaz.IterV[Int,Boolean] = scalaz.IterV$Cont$$anon$2@5e9132b3
scala> s(List(1,2,3)).run
res11: Boolean = true
scala> s(List(1,2,3,0)).run
res12: Boolean = false
You could split the List into two parts and check whether the last element of the first part is less than or equal to the first element of the second part. If so, you could check both parts in parallel. Here is the schematic idea, first without parallelism:
def isOrdered (l: List [Int]): Boolean = l.size/2 match {
case 0 => true
case m => {
val low = l.take (m)
val high = l.drop (m)
low.last <= high.head && isOrdered (low) && isOrdered (high)
}
}
And now with parallel, and using splitAt instead of take/drop:
def isOrdered (l: List[Int]): Boolean = l.size/2 match {
case 0 => true
case m => {
val (low, high) = l.splitAt (m)
low.last <= high.head && ! List (low, high).par.exists (x => isOrdered (x) == false)
}
}
def isSorted[A <: Ordered[A]](sequence: List[A]): Boolean = {
sequence match {
case Nil => true
case x::Nil => true
case x::y::rest => (x < y) && isSorted(y::rest)
}
}
Explain how it works.
My solution combines missingfaktor's solution with Ordering:
def isSorted[T](l: Seq[T])(implicit ord: Ordering[T]) = (l, l.tail).zipped.forall(ord.lt(_, _))
This way you can supply your own comparison method, e.g.
isSorted(dataList)(Ordering.by[Post, Date](_.lastUpdateTime))