Akka streams — filtering by the number of elements in stream - scala

I'm writing an app in Scala and I'm using Akka streams.
At one point, I need to filter out streams that have less than N elements, with N given. So, for example, with N=5:
Source(List(1,2,3)).via(myFilter) // => List()
Source(List(1,2,3,4)).via(myFilter) // => List()
will become empty streams, and
Source(List(1,2,3,4,5)).via(myFilter) // => List(1,2,3,4,5)
Source(List(1,2,3,4,5,6)).via(myFilter) // => List(1,2,3,4,5,6)
will be unchanged.
Of course, we can't know the number of elements in the stream until it's over, and waiting till the end before pushing it through might not be the best idea.
So, instead, I've thought about the following algorithm:
for the first N-1 elements, just buffer them, without passing further;
if the input stream finishes before reaching the Nth element, output an empty stream;
if the input stream reaches Nth element, output the buffered N-1 elements, then output the Nth element, then pass all the following elements that come.
However, I have no idea how to build a Flow element implementing it. Are there some built-in Akka elements I could use?
Edit:
Okay, so I played with it yesterday and I came up with something like that:
Flow[Int].
  prefixAndTail(N).
  flatMapConcat {
    case (prefix, tail) if prefix.length == N =>
      Source(prefix).concat(tail)
    case _ =>
      Source.empty[Int]
  }
Will it do what I want?
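For reference, here is a self-contained version of that approach that can be run directly; the object and atLeastN names are mine, and it assumes a pre-2.6 Akka setup (explicit materializer) like the answers below:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object AtLeastNExample extends App {
  implicit val system: ActorSystem = ActorSystem()
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // Emits nothing unless upstream produces at least n elements;
  // otherwise forwards the whole stream unchanged.
  def atLeastN(n: Int): Flow[Int, Int, _] =
    Flow[Int].prefixAndTail(n).flatMapConcat {
      case (prefix, tail) if prefix.size == n => Source(prefix).concat(tail)
      case _                                  => Source.empty[Int]
    }

  Source(List(1, 2, 3)).via(atLeastN(5)).runWith(Sink.seq).foreach(println)          // Vector()
  Source(List(1, 2, 3, 4, 5, 6)).via(atLeastN(5)).runWith(Sink.seq).foreach(println) // Vector(1, 2, 3, 4, 5, 6)
}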

Perhaps statefulMapConcat could help you:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.{ActorMaterializer, Materializer}
import scala.collection.mutable.ListBuffer
import scala.concurrent.ExecutionContext
object StatefulMapConcatExample extends App {
  implicit val system: ActorSystem = ActorSystem()
  implicit val materializer: Materializer = ActorMaterializer()
  implicit val ec: ExecutionContext = scala.concurrent.ExecutionContext.Implicits.global

  def filterLessThen(threshold: Int): Int => List[Int] = {
    var buffering = true
    val buffer: ListBuffer[Int] = ListBuffer()
    (elem: Int) =>
      if (buffering) {
        buffer += elem
        if (buffer.size < threshold) {
          Nil
        } else {
          buffering = false
          buffer.toList
        }
      } else {
        List(elem)
      }
  }

  // Nil
  Source(List(1, 2, 3)).statefulMapConcat(() => filterLessThen(5))
    .runWith(Sink.seq).map(println)

  // Nil
  Source(List(1, 2, 3, 4)).statefulMapConcat(() => filterLessThen(5))
    .runWith(Sink.seq).map(println)

  // Vector(1, 2, 3, 4, 5)
  Source(List(1, 2, 3, 4, 5)).statefulMapConcat(() => filterLessThen(5))
    .runWith(Sink.seq).map(println)

  // Vector(1, 2, 3, 4, 5, 6)
  Source(List(1, 2, 3, 4, 5, 6)).statefulMapConcat(() => filterLessThen(5))
    .runWith(Sink.seq).map(println)
}

This may be one of those instances where a little "state" can go a long way. Even though the solution is not "purely functional", the mutable state is isolated and unreachable by the rest of the system. I think this is one of the beauties of Scala: when an FP solution isn't obvious, you can always fall back to imperative code in an isolated manner...
The completed Flow will be a combination of multiple sub-parts. The first Flow will just group your elements into sequences of size N:
val group : Int => Flow[Int, Seq[Int], _] =
  (N) => Flow[Int] grouped N
Now for the non-functional part, a filter that will only allow the grouped Seq values through if the first sequence was the right size:
val minSizeRequirement : Int => Seq[Int] => Boolean =
  (minSize) => {
    var isFirst : Boolean = true
    var passedMinSize : Boolean = false
    (testSeq) => {
      if (isFirst) {
        isFirst = false
        passedMinSize = testSeq.size >= minSize
        passedMinSize
      }
      else
        passedMinSize
    }
  }
val minSizeFilter : Int => Flow[Seq[Int], Seq[Int], _] =
  (minSize) => Flow[Seq[Int]].filter(minSizeRequirement(minSize))
The last step is to convert the Seq[Int] values back into Int values:
val flatten = Flow[Seq[Int]].flatMapConcat(l => Source(l))
Finally, combine them all together:
val combinedFlow : Int => Flow[Int, Int, _] =
  (minSize) =>
    group(minSize)
      .via(minSizeFilter(minSize))
      .via(flatten)
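A quick check of the combined flow (assuming the same implicit ActorSystem, materializer and ExecutionContext as in the statefulMapConcat example above):
Source(List(1, 2, 3)).via(combinedFlow(5)).runWith(Sink.seq).map(println)          // Vector()
Source(List(1, 2, 3, 4, 5, 6)).via(combinedFlow(5)).runWith(Sink.seq).map(println) // Vector(1, 2, 3, 4, 5, 6)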

Related

Grouping a stream of elements into multiple streams

Let's say we have a case class MyCaseClass(name: String, value: Int). Given an fs2.Stream[F, MyCaseClass] I want to group elements with the same name
val sourceStream: fs2.Stream[F, MyCaseClass] = //
val groupedSameNameStream: fs2.Stream[F, fs2.Stream[F, MyCaseClass]] = //
The reason I need to do this is that I want to apply an effectful transformation
val transform: MyCaseClass => F[Unit] = //
to all elements of the stream, and if one group fails, the others should keep working.
Is something like this possible to do?
This is possible, with caveats.
It's relatively straightforward to do this if you accept having a Map with an unbounded number of keys, and an unbounded number of associated Queues for each.
We've used code based on a gist by GitHub user kiambogo in production (though ours has been tweaked), and it works fine:
import cats.effect.Concurrent
import cats.effect.concurrent.Ref
import cats.implicits._
import fs2.{Pipe, Stream}
import fs2.concurrent.Queue

def groupBy[F[_], A, K](selector: A => F[K])(implicit F: Concurrent[F]): Pipe[F, A, (K, Stream[F, A])] = {
  in =>
    Stream.eval(Ref.of[F, Map[K, Queue[F, Option[A]]]](Map.empty)).flatMap { st =>
      val cleanup = {
        import alleycats.std.all._
        st.get.flatMap(_.traverse_(_.enqueue1(None)))
      }
      (in ++ Stream.eval_(cleanup))
        .evalMap { el =>
          (selector(el), st.get).mapN { (key, queues) =>
            queues.get(key).fold {
              for {
                newQ <- Queue.unbounded[F, Option[A]] // Create a new queue
                _ <- st.modify(x => (x + (key -> newQ), x)) // Update the ref of queues
                _ <- newQ.enqueue1(el.some)
              } yield (key -> newQ.dequeue.unNoneTerminate).some
            }(_.enqueue1(el.some) as None)
          }.flatten
        }.unNone.onFinalize(cleanup)
    }
}
If we assume an overhead of 64 bytes for each Map entry (I believe this is a large overestimate), then a cardinality of 100,000 unique keys gives us approximately 6.1 MiB - well within a reasonable size for a JVM process.
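To connect this back to the original question (the other groups keep working when one fails), a hypothetical sketch on top of groupBy might look like the following; processAllGroups is a made-up name, and attempt here ends a failing group without propagating its error to its siblings:
def processAllGroups[F[_]: Concurrent](
    source: Stream[F, MyCaseClass],
    transform: MyCaseClass => F[Unit]
): Stream[F, Unit] =
  source
    .through(groupBy(c => Concurrent[F].pure(c.name)))
    .map { case (_, group) => group.evalMap(transform).attempt.void }
    .parJoinUnbounded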

Splitting a Monix Observable

I would like to write a split function for monix.reactive.Observable. It should split a source Observable[A] into a new pair (Observable[A], Observable[A]), based on the value of a predicate, evaluated against each element in the source. I would like the split to work independently of whether the source Observable is hot or cold. In the case where the source is cold, the new pair of Observables should also be cold and where the source is hot the new pair of Observables will be hot. I would like to know if such an implementation is possible and, if so, how (I have pasted a failing testcase below).
The signature, as a method on an implicit class, would look like this or similar:
/**
 * Split an observable by a predicate, placing values for which the predicate returns true
 * to the right (and values for which the predicate returns false to the left).
 * This is consistent with the convention adopted by Either.cond.
 */
def split(p: T => Boolean)(implicit scheduler: Scheduler, taskLike: TaskLike[Future]): (Observable[T], Observable[T]) = {
  splitEither[T, T](elem => Either.cond(p(elem), elem, elem))
}
Currently, I have a naive implementation that consumes the source elements and pushes them to PublishSubjects. The new pair of Observables is thus hot. My tests for a cold Observable are failing.
import monix.eval.TaskLike
import monix.execution.{Ack, Scheduler}
import monix.reactive.{Observable, Observer}
import monix.reactive.subjects.PublishSubject
import scala.concurrent.Future
object ObservableOps {
  implicit class ObservableExtensions[T](o: Observable[T]) {
    /**
     * Split an observable by a predicate, placing values for which the predicate returns true
     * to the right (and values for which the predicate returns false to the left).
     * This is consistent with the convention adopted by Either.cond.
     */
    def split(p: T => Boolean)(implicit scheduler: Scheduler, taskLike: TaskLike[Future]): (Observable[T], Observable[T]) = {
      splitEither[T, T](elem => Either.cond(p(elem), elem, elem))
    }

    /**
     * Split an observable into a pair of Observables, one left, one right, according
     * to a determinant function.
     */
    def splitEither[U, V](f: T => Either[U, V])(implicit scheduler: Scheduler, taskLike: TaskLike[Future]): (Observable[U], Observable[V]) = {
      val l = PublishSubject[U]()
      val r = PublishSubject[V]()
      o.subscribe(new Observer[T] {
        override def onNext(elem: T): Future[Ack] = {
          f(elem) match {
            case Left(u) => l.onNext(u)
            case Right(v) => r.onNext(v)
          }
        }
        override def onError(ex: Throwable): Unit = {
          l.onError(ex)
          r.onError(ex)
        }
        override def onComplete(): Unit = {
          l.onComplete()
          r.onComplete()
        }
      })
      (l, r)
    }
  }
}
//////////
import ObservableOps._
import monix.execution.Scheduler.Implicits.global
import monix.reactive.Observable
import monix.reactive.subjects.PublishSubject
import org.scalatest.FlatSpec
import org.scalatest.Matchers._
import org.scalatest.concurrent.ScalaFutures._
class ObservableOpsSpec extends FlatSpec {
  val isEven: Int => Boolean = _ % 2 == 0

  "Observable Ops" should "split a cold observable" in {
    val o = Observable(1, 2, 3, 4, 5)
    val (l, r) = o.split(isEven)
    l.toListL.runToFuture.futureValue shouldBe List(1, 3, 5)
    r.toListL.runToFuture.futureValue shouldBe List(2, 4)
  }

  "Observable Ops" should "split a hot observable" in {
    val o = PublishSubject[Int]()
    val (l, r) = o.split(isEven)
    val lbuf = l.toListL.runToFuture
    val rbuf = r.toListL.runToFuture
    Observable.fromIterable(1 to 5).mapEvalF(i => o.onNext(i)).subscribe()
    o.onComplete()
    lbuf.futureValue shouldBe List(1, 3, 5)
    rbuf.futureValue shouldBe List(2, 4)
  }
}
I expect both testcases above to pass but "Observable Ops" should "split a cold observable" is failing.
Edit: working code
An implementation that passes both test cases is as follows:
import monix.execution.Scheduler
import monix.reactive.Observable

object ObservableOps {
  implicit class ObservableExtension[T](o: Observable[T]) {
    /**
     * Split an observable by a predicate, placing values for which the predicate returns true
     * to the right (and values for which the predicate returns false to the left).
     * This is consistent with the convention adopted by Either.cond.
     */
    def split(
        p: T => Boolean
    )(implicit scheduler: Scheduler): (Observable[T], Observable[T]) = {
      splitEither[T, T](elem => Either.cond(p(elem), elem, elem))
    }

    /**
     * Split an observable into a pair of Observables, one left, one right, according
     * to a determinant function.
     */
    def splitEither[U, V](
        f: T => Either[U, V]
    )(implicit scheduler: Scheduler): (Observable[U], Observable[V]) = {
      val oo = o.map(f)
      val l = oo.collect { case Left(u) => u }
      val r = oo.collect { case Right(v) => v }
      (l, r)
    }
  }
}
class ObservableOpsSpec extends FlatSpec {
  val isEven: Int => Boolean = _ % 2 == 0

  "Observable Ops" should "split a cold observable" in {
    val o = Observable(1, 2, 3, 4, 5)
    val o2 = o.publish
    val (l, r) = o2.split(isEven)
    val x = l.toListL.runToFuture
    val y = r.toListL.runToFuture
    o2.connect()
    x.futureValue shouldBe List(1, 3, 5)
    y.futureValue shouldBe List(2, 4)
  }

  "Observable Ops" should "split a hot observable" in {
    val o = PublishSubject[Int]()
    val (l, r) = o.split(isEven)
    val lbuf = l.toListL.runToFuture
    val rbuf = r.toListL.runToFuture
    Observable.fromIterable(1 to 5).mapEvalF(i => o.onNext(i)).subscribe()
    o.onComplete()
    lbuf.futureValue shouldBe List(1, 3, 5)
    rbuf.futureValue shouldBe List(2, 4)
  }
}
A cold observable, by definition, is lazily evaluated for each subscriber. You can't split it without either evaluating everything twice or converting it into a hot one.
If you don't mind evaluating everything twice, just use .filter two times.
If you don't mind converting to hot, do it with .publish (or .publish.refCount so you don't need to connect manually).
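For illustration, hypothetical sketches of those first two options, assuming an Observable[Int] named coldOrHot, the isEven predicate from the question, and an implicit Scheduler in scope:
// Option 1: two filters; a cold source is evaluated once per subscription
val odds  = coldOrHot.filter(i => !isEven(i))
val evens = coldOrHot.filter(isEven)

// Option 2: make it hot; subscribe both sides before connecting
val connectable = coldOrHot.publish
val l = connectable.filter(i => !isEven(i)).toListL.runToFuture
val r = connectable.filter(isEven).toListL.runToFuture
connectable.connect()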
If you want to preserve the cold/hot property and process the two pieces in parallel, there's a publishSelector method that lets you treat any observable like a hot one in a limited scope:
coldOrHot.publishSelector { totallyHot =>
  val s1 = totallyHot.filter(...).flatMap(...) // any processing
  val s2 = totallyHot.filter(...).mapEval(...) // any processing 2
  Observable(s1, s2).merge
}
Its limitation, apart from the scope, is that the result of the inner lambda has to be another Observable (which is what publishSelector returns), so you can't have a helper with the signature you want. But the result will still be cold if the original was cold.

How to use takeWhile with an Iterator in Scala

I have an Iterator of elements and I want to consume them until a condition is met in the next element, like:
val it = List(1,1,1,1,2,2,2).iterator
val res1 = it.takeWhile( _ == 1).toList
val res2 = it.takeWhile(_ == 2).toList
res1 gives the expected List(1,1,1,1), but res2 returns List(2,2), because the iterator had to consume the element at position 4 to test the predicate.
I know that the list will be ordered, so there is no point in traversing the whole list as partition does. I'd like to finish as soon as the condition is not met. Is there any clever way to do this with Iterators? I cannot call toList on the iterator, because it comes from a very big file.
The simplest solution I found:
val it = List(1,1,1,1,2,2,2).iterator
val (r1, it2) = it.span( _ == 1)
println(s"group taken is: ${r1.toList}\n rest is: ${it2.toList}")
output:
group taken is: List(1, 1, 1, 1)
rest is: List(2, 2, 2)
Very short, but from then on you have to use the new iterator.
With any immutable collection it would be similar:
use takeWhile when you want only a prefix of the collection,
use span when you also need the rest (see the sketch below).
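If you need to peel off group after group from a large file, a small helper (my sketch, not from the answer) can chain span calls so each run of equal elements is produced lazily:
// Produces maximal runs of equal elements, consuming the source lazily.
def runs[A](source: Iterator[A]): Iterator[List[A]] = new Iterator[List[A]] {
  private var rest = source.buffered
  def hasNext: Boolean = rest.hasNext
  def next(): List[A] = {
    val head = rest.head
    val (run, tail) = rest.span(_ == head)
    val result = run.toList // fully consume the prefix before touching the tail
    rest = tail.buffered
    result
  }
}

runs(List(1, 1, 1, 1, 2, 2, 2).iterator).toList
// => List(List(1, 1, 1, 1), List(2, 2, 2))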
With my other answer (which I've left separate as they are largely unrelated), I think you can implement groupWhen on Iterator as follows:
def groupWhen[A](itr: Iterator[A])(p: (A, A) => Boolean): Iterator[List[A]] = {
  @annotation.tailrec
  def groupWhen0(acc: Iterator[List[A]], itr: Iterator[A])(p: (A, A) => Boolean): Iterator[List[A]] = {
    val (dup1, dup2) = itr.duplicate
    val pref = ((dup1.sliding(2) takeWhile { case Seq(a1, a2) => p(a1, a2) }).zipWithIndex collect {
      case (seq, 0) => seq
      case (Seq(_, a), _) => Seq(a)
    }).flatten.toList
    val newAcc = if (pref.isEmpty) acc else acc ++ Iterator(pref)
    if (dup2.nonEmpty)
      groupWhen0(newAcc, dup2 drop (pref.length max 1))(p)
    else newAcc
  }
  groupWhen0(Iterator.empty, itr)(p)
}
When I run it on an example:
println( groupWhen(List(1,1,1,1,3,4,3,2,2,2).iterator)(_ == _).toList )
I get List(List(1, 1, 1, 1), List(2, 2, 2))
I had a similar need, but the solution from @oxbow_lakes does not handle the case where the list has only one element, or more generally where the list contains elements that are not repeated. Also, that solution doesn't lend itself well to an infinite iterator (it wants to "see" all the elements before it gives you a result).
What I needed was the ability to group sequential elements that match a predicate, but also include the single elements (I can always filter them out if I don't need them). I needed those groups to be delivered continuously, without having to wait for the original iterator to be completely consumed before they are produced.
I came up with the following approach which works for my needs, and thought I should share:
implicit class IteratorEx[+A](itr: Iterator[A]) {
  def groupWhen(p: (A, A) => Boolean): Iterator[List[A]] = new AbstractIterator[List[A]] {
    val (it1, it2) = itr.duplicate
    val ritr = new RewindableIterator(it1, 1)

    override def hasNext = it2.hasNext

    override def next() = {
      val count = (ritr.rewind().sliding(2) takeWhile {
        case Seq(a1, a2) => p(a1, a2)
        case _ => false
      }).length
      (it2 take (count + 1)).toList
    }
  }
}
The above is using a few helper classes:
abstract class AbstractIterator[A] extends Iterator[A]

/**
 * Wraps a given iterator to add the ability to remember the last 'remember' values.
 * From any position the iterator can be rewound (can go back) at most 'remember' values,
 * such that when calling 'next()' the memoized values will be provided as if they have not
 * been iterated over before.
 */
class RewindableIterator[A](it: Iterator[A], remember: Int) extends Iterator[A] {
  private var memory = List.empty[A]
  private var memoryIndex = 0

  override def next() = {
    if (memoryIndex < memory.length) {
      val next = memory(memoryIndex)
      memoryIndex += 1
      next
    } else {
      val next = it.next()
      memory = memory :+ next
      if (memory.length > remember)
        memory = memory drop 1
      memoryIndex = memory.length
      next
    }
  }

  def canRewind(n: Int) = memoryIndex - n >= 0

  def rewind(n: Int) = {
    require(memoryIndex - n >= 0, "Attempted to rewind past 'remember' limit")
    memoryIndex -= n
    this
  }

  def rewind() = {
    memoryIndex = 0
    this
  }

  override def hasNext = it.hasNext
}
Example use:
List(1,2,2,3,3,3,4,5,5).iterator.groupWhen(_ == _).toList
gives: List(List(1), List(2, 2), List(3, 3, 3), List(4), List(5, 5))
If you want to filter out the single elements, just apply a filter or withFilter after groupWhen
Stream.continually(Random.nextInt(100)).iterator
  .groupWhen(_ + _ == 100).withFilter(_.length > 1).take(3).toList
gives: List(List(34, 66), List(87, 13), List(97, 3))
You could use the toStream method on Iterator.
Stream is a lazy equivalent of List.
As you can see from the implementation of toStream, it creates a Stream without traversing the whole Iterator.
Stream keeps all realized elements in memory. You should localize the reference to the Stream in some local scope to prevent a memory leak.
With Stream you would use span like this:
val stream = it.toStream
val (res1, rest1) = stream.span(_ == 1)
val (res2, rest2) = rest1.span(_ == 2)
I'm guessing a bit here, but by the statement "until a condition is met in the next element", it sounds like you might want to look at the groupWhen method on ListOps in scalaz:
scala> import scalaz.syntax.std.list._
import scalaz.syntax.std.list._
scala> List(1,1,1,1,2,2,2) groupWhen (_ == _)
res1: List[List[Int]] = List(List(1, 1, 1, 1), List(2, 2, 2))
Basically this "chunks" up the input sequence upon a condition (a (A, A) => Boolean) being met between an element and its successor. In the example above the condition is equality, so, as long as an element is equal to its successor, they will be in the same chunk.

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through a log file that is too big to fit into memory, collecting two types of expressions. What is a better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines: Iterator[String] = io.Source.fromFile(file).getLines()
  val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
  val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
  for (line <- lines) {
    line match {
      case errorPat(date, ip) => errors.append((ip, date))
      case loginPat(date, user, ip, id) => logins.put(ip, id)
      case _ => ""
    }
  }
  errors.toList.map(line => (logins.getOrElse(line._1, "none") + " " + line._1, line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines = Source.fromFile(file).getLines
  val (err, log) = lines.collect {
    case errorPat(inf, ip) => (Some((ip, inf)), None)
    case loginPat(_, _, ip, id) => (None, Some((ip, id)))
  }.toList.unzip
  val ip2id = log.flatten.toMap
  err.collect { case Some((ip, inf)) => (ip2id.getOrElse(ip, "none") + " " + ip, inf) }
}
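For illustration, a hypothetical call with made-up log formats (the patterns below are examples, matching the formats used in the iteratee answer further down, not something from the question):
val errorPat = """Error: (\S+) (\S+)""".r
val loginPat = """Login: (\S+) (\S+) (\S+) (\S+)""".r

streamData(new File("app.log"), errorPat, loginPat)
// e.g. List(("Joe 1.2.3.4", "2012/03"), ("none 1.2.3.5", "2012/03"), ...)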
Corrections:
1) removed unnecessary type declarations
2) tuple deconstruction instead of the ugly ._1
3) left fold instead of mutable accumulators
4) used the more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines = io.Source.fromFile(file).getLines()
  val (logins, errors) =
    ((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
      case ((loginsAcc, errorsAcc), next) =>
        next match {
          case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
          case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id), errorsAcc)
          case _ => (loginsAcc, errorsAcc)
        }
    }
  // more concise equivalent of
  // errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
  for ((ip, date) <- errors.toList)
    yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
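For instance, a hypothetical illustration of the first point:
// Instead of raw (String, String) pairs:
final case class ErrorEvent(ip: String, date: String)
final case class Login(ip: String, id: String)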
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (it includes some auxiliary code for IteratorEnumerator) and perhaps a bit of an overkill for the task, but maybe someone will find it helpful.
import java.io._
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._

object MyApp extends App {
  // A type for the result. Having names keeps things
  // clearer and shorter.
  type LogResult = List[(String, String)]

  // Represents a state of our computation. Not only does it
  // give a name to the data, we can also put here
  // functions that modify the state. This nicely
  // separates what we're computing and how.
  sealed case class State(
    logins: Map[String, String],
    errors: Seq[(String, String)]
  ) {
    def this() = {
      this(Map.empty[String, String], Seq.empty[(String, String)])
    }

    def addError(date: String, ip: String): State =
      State(logins, errors :+ (ip -> date))

    def addLogin(ip: String, id: String): State =
      State(logins + (ip -> id), errors)

    // Produce the final result from accumulated data.
    def result: LogResult =
      for ((ip, date) <- errors.toList)
        yield (logins.getOrElse(ip, "none") + " " + ip) -> date
  }

  // An iteratee that consumes lines of our input. Based
  // on the given regular expressions, it produces an
  // iteratee that parses the input and uses State to
  // compute the result.
  def logIteratee(errorPat: Regex, loginPat: Regex): IterV[String, List[(String, String)]] = {
    // Consumes a single line.
    def consume(line: String, state: State): State =
      line match {
        case errorPat(date, ip) => state.addError(date, ip)
        case loginPat(date, user, ip, id) => state.addLogin(ip, id)
        case _ => state
      }

    // The core of the iteratee. Every time we consume a
    // line, we update our state. When done, compute the
    // final result.
    def step(state: State)(s: Input[String]): IterV[String, LogResult] =
      s(el = line => Cont(step(consume(line, state))),
        empty = Cont(step(state)),
        eof = Done(state.result, EOF[String]))

    // Return the iteratee waiting for its first input.
    Cont(step(new State()))
  }

  // Converts an iterator into an enumerator. This
  // should probably be moved to Scalaz.
  // Adapted from scalaz.ExampleIteratee
  implicit val IteratorEnumerator = new Enumerator[Iterator] {
    @annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
      val next: Option[(Iterator[E], IterV[E, A])] =
        if (e.hasNext) {
          val x = e.next()
          i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
        } else
          None
      next match {
        case None => i
        case Some((es, is)) => apply(es, is)
      }
    }
  }

  // main ---------------------------------------------------
  {
    // Read a file as an iterator of lines:
    // val lines: Iterator[String] =
    //   io.Source.fromFile("test.log").getLines()
    // Create our testing iterator:
    val lines: Iterator[String] = Seq(
      "Error: 2012/03 1.2.3.4",
      "Login: 2012/03 user 1.2.3.4 Joe",
      "Error: 2012/03 1.2.3.5",
      "Error: 2012/04 1.2.3.4"
    ).iterator

    // Create an iteratee.
    val iter = logIteratee("Error: (\\S+) (\\S+)".r,
                           "Login: (\\S+) (\\S+) (\\S+) (\\S+)".r)

    // Run the iteratee against the input
    // (the enumerator is implicit)
    println(iter(lines).run)
  }
}

Selection Sort Generic type implementation

I worked my way through implementing a recursive version of selection sort and quicksort, and I am trying to modify the code so that it can sort a list of any generic type. I want to assume that the generic type supplied can be converted to a Comparable at runtime.
Does anyone have a link, code or tutorial on how to do this, please?
I am trying to modify this particular code:
def main(args: Array[String]) {
  val l = List(2, 4, 5, 6, 8)
  print(quickSort(l))
}

def quickSort(x: List[Int]): List[Int] = {
  x match {
    case xh :: xt =>
      val (first, pivot, second) = partition(x)
      quickSort(first) ::: (pivot :: quickSort(second))
    case Nil => x
  }
}

def partition(x: List[Int]) = {
  val pivot = x.head
  var first: List[Int] = List()
  var second: List[Int] = List()
  val fun = (i: Int) => {
    if (i < pivot)
      first = i :: first
    else
      second = i :: second
  }
  x.tail.foreach(fun)
  (first, pivot, second)
}
Language: SCALA
In Scala, Java's Comparator is replaced by Ordering (quite similar, but it comes with more useful methods). Instances are provided for several types (primitives, strings, BigDecimals, etc.) and you can supply your own implementations.
You can then use a Scala implicit to ask the compiler to pick the correct one for you:
def sort[A](lst: List[A])(implicit ord: Ordering[A]) = {
  ...
}
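A minimal recursive body for that signature might look like this (my sketch, not from the original post):
def sort[A](lst: List[A])(implicit ord: Ordering[A]): List[A] = lst match {
  case Nil => Nil
  case pivot :: rest =>
    val (smaller, larger) = rest.partition(ord.lt(_, pivot))
    sort(smaller) ::: pivot :: sort(larger)
}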
If you are using a predefined ordering, just call:
sort( myLst )
and the compiler will infer the second argument. If you want to declare your own ordering, use the keyword implicit in the declaration. For instance:
implicit val fooOrdering = new Ordering[Foo] {
  def compare(f1: Foo, f2: Foo) = {...}
}
and it will be implicitly used if you try to sort a List of Foo.
If you have several implementations for the same type, you can also explicitly pass the correct ordering object:
sort( myFooLst )( fooOrdering )
More info in this post.
For Quicksort, I'll modify an example from the "Scala By Example" book to make it more generic.
import scala.collection.mutable.ArraySeq

class Quicksort[A <% Ordered[A]] {
  def sort(a: ArraySeq[A]): ArraySeq[A] =
    if (a.length < 2) a
    else {
      val pivot = a(a.length / 2)
      sort(a filter (pivot >)) ++ (a filter (pivot ==)) ++
        sort(a filter (pivot <))
    }
}
Test with Int
scala> val quicksort = new Quicksort[Int]
quicksort: Quicksort[Int] = Quicksort@38ceb62f

scala> val a = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 219)
a: scala.collection.mutable.ArraySeq[Int] = ArraySeq(5, 3, 2, 2, 1, 1, 9, 39, 219)

scala> quicksort.sort(a).foreach(n => (print(n), print(" ")))
1 1 2 2 3 5 9 39 219
Test with a custom class implementing Ordered
scala> case class Meh(x: Int, y: Int) extends Ordered[Meh] {
     |   def compare(that: Meh) = (x + y).compare(that.x + that.y)
     | }
defined class Meh

scala> val q2 = new Quicksort[Meh]
q2: Quicksort[Meh] = Quicksort@7677ce29

scala> val a3 = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))
a3: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(1,1), Meh(12,1), Meh(0,1), Meh(2,2))

scala> q2.sort(a3)
res7: scala.collection.mutable.ArraySeq[Meh] = ArraySeq(Meh(0,1), Meh(1,1), Meh(2,2), Meh(12,1))
Even though, when coding Scala, I usually prefer a functional style (via combinators or recursion) over an imperative style (via variables and iteration), THIS TIME, for this specific problem, old-school imperative nested loops result in simpler code for the reader. I don't think falling back to an imperative style is a mistake for certain classes of problems, such as sorting algorithms, which usually transform the input buffer in place (like a procedure) rather than returning a new sorted one.
Here is my solution:
package bitspoke.algo

import scala.math.Ordered
import scala.collection.mutable.Buffer

abstract class Sorter[T <% Ordered[T]] {
  // algorithm provided by subclasses
  def sort(buffer: Buffer[T]): Unit

  // check if the buffer is sorted (>= so that equal neighbours count as sorted)
  def sorted(buffer: Buffer[T]) =
    buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 >= t._1 }

  // swap two elements in the buffer
  def swap(buffer: Buffer[T], i: Int, j: Int) {
    val temp = buffer(i)
    buffer(i) = buffer(j)
    buffer(j) = temp
  }
}

class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
  def sort(buffer: Buffer[T]): Unit = {
    for (i <- 0 until buffer.length) {
      var min = i
      for (j <- i until buffer.length) {
        if (buffer(j) < buffer(min))
          min = j
      }
      swap(buffer, i, min)
    }
  }
}
As you can see, rather than using java.lang.Comparable, I preferred scala.math.Ordered, and view bounds rather than upper bounds. That certainly works thanks to Scala's implicit conversions of primitive types to their rich wrappers.
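One caveat of mine, not from the original answer: view bounds (<%) have been deprecated since Scala 2.11. Since T <% Ordered[T] desugars to an implicit conversion parameter, an equivalent formulation is:
// The same class without the deprecated view-bound syntax:
abstract class Sorter[T](implicit toOrdered: T => Ordered[T]) {
  def sort(buffer: scala.collection.mutable.Buffer[T]): Unit
}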
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))