Scala: transform a collection, yielding 0..many elements on each iteration - scala

Given a collection in Scala, I'd like to traverse it and, for each element, emit (yield) anywhere from zero to many elements that should be joined together into a new collection.
For example, I expect something like this:
val input = Range(0, 15)
val output = input.somefancymapfunction((x) => {
if (x % 3 == 0)
yield(s"${x}/3")
if (x % 5 == 0)
yield(s"${x}/5")
})
to build an output collection that will contain
(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
Basically, I want a superset of what filter (1 → 0..1) and map (1 → 1) allow me to do: a mapping of (1 → 0..n).
Solutions I've tried
Imperative solutions
Obviously, it's possible to do this in a non-functional manner, like:
var output = mutable.ListBuffer[String]()
input.foreach((x) => {
if (x % 3 == 0)
output += s"${x}/3"
if (x % 5 == 0)
output += s"${x}/5"
})
Flatmap solutions
I know of flatMap, but again, it either:
1) becomes really ugly if we're talking about an arbitrary number of output elements:
val output = input.flatMap((x) => {
val v1 = if (x % 3 == 0) {
Some(s"${x}/3")
} else {
None
}
val v2 = if (x % 5 == 0) {
Some(s"${x}/5")
} else {
None
}
List(v1, v2).flatten
})
2) requires usage of mutable collections inside it:
val output = input.flatMap((x) => {
val r = ListBuffer[String]()
if (x % 3 == 0)
r += s"${x}/3"
if (x % 5 == 0)
r += s"${x}/5"
r
})
which is actually even worse than using a mutable collection from the very beginning, or
3) requires a major logic overhaul:
val output = input.flatMap((x) => {
if (x % 3 == 0) {
if (x % 5 == 0) {
List(s"${x}/3", s"${x}/5")
} else {
List(s"${x}/3")
}
} else if (x % 5 == 0) {
List(s"${x}/5")
} else {
List()
}
})
which, IMHO, also looks ugly and requires duplicating the generating code.
Roll-your-own-map-function
Last, but not least, I can roll my own function of that kind:
def myMultiOutputMap[T, R](coll: TraversableOnce[T], func: (T, ListBuffer[R]) => Unit): List[R] = {
val out = ListBuffer[R]()
coll.foreach((x) => func.apply(x, out))
out.toList
}
which can be used almost like I want:
val output = myMultiOutputMap[Int, String](input, (x, out) => {
if (x % 3 == 0)
out += s"${x}/3"
if (x % 5 == 0)
out += s"${x}/5"
})
Am I really overlooking something and there's no such functionality in standard Scala collection libraries?
Similar questions
This question bears some similarity to Can I yield or map one element into many in Scala? — but that question discusses 1 element → 3 elements mapping, and I want 1 element → arbitrary number of elements mapping.
Final note
Please note that this is not the question about division / divisors, such conditions are included purely for illustrative purposes.

Rather than having a separate case for each divisor, put them in a container and iterate over them in a for comprehension:
val output = for {
n <- input
d <- Seq(3, 5)
if n % d == 0
} yield s"$n/$d"
Or equivalently in a collect nested in a flatMap:
val output = input.flatMap { n =>
Seq(3, 5).collect {
case d if n % d == 0 => s"$n/$d"
}
}
In the more general case where the different cases may have different logic, you can put each case in a separate partial function and iterate over the partial functions:
val output = for {
n <- input
f <- Seq[PartialFunction[Int, String]](
{case x if x % 3 == 0 => s"$x/3"},
{case x if x % 5 == 0 => s"$x/5"})
if f.isDefinedAt(n)
} yield f(n)

You can also use some functional library (e.g. scalaz) to express this:
import scalaz._, Scalaz._
def divisibleBy(byWhat: Int)(what: Int): List[String] =
(what % byWhat == 0).option(s"$what/$byWhat").toList
(0 to 15) flatMap (divisibleBy(3) _ |+| divisibleBy(5))
This uses the semigroup append operation |+|. For Lists this operation means a simple list concatenation. So for functions Int => List[String], this append operation will produce a function that runs both functions and appends their results.
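For readers without scalaz, here is a minimal plain-Scala sketch of what that function-level |+| amounts to (the combine helper is purely illustrative, not from any library):
// combine runs both functions on the same input and concatenates their result lists,
// which is what the scalaz semigroup append does for Int => List[String]
def combine[A, B](f: A => List[B], g: A => List[B]): A => List[B] =
  a => f(a) ++ g(a)

def divisibleBy(byWhat: Int)(what: Int): List[String] =
  if (what % byWhat == 0) List(s"$what/$byWhat") else Nil

(0 to 15) flatMap combine(divisibleBy(3) _, divisibleBy(5) _)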

If you have a complex computation during which you occasionally need to add elements to a global accumulator, you can use the popular approach known as the Writer monad.
The setup in Scala is somewhat bulky, but the result is extremely composable thanks to the Monad interface:
import scalaz.Writer
import scalaz.syntax.writer._
import scalaz.syntax.monad._
import scalaz.std.vector._
import scalaz.syntax.traverse._
type Messages[T] = Writer[Vector[String], T]
def yieldW(a: String): Messages[Unit] = Vector(a).tell
val output = Vector.range(0, 15).traverse { n =>
yieldW(s"$n / 3").whenM(n % 3 == 0) >>
yieldW(s"$n / 5").whenM(n % 5 == 0)
}.run._1

Here is my proposal for a custom function; it might be nicer with the pimp-my-library pattern (sketched after the example below):
def fancyMap[A, B](list: TraversableOnce[A])(fs: (A => Boolean, A => B)*) = {
def valuesForElement(elem: A) = fs collect { case (predicate, mapper) if predicate(elem) => mapper(elem) }
list flatMap valuesForElement
}
fancyMap[Int, String](0 to 15)((_ % 3 == 0, _ + "/3"), (_ % 5 == 0, _ + "/5"))
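For what it's worth, a sketch of the same idea packaged as an extension method via an implicit class (the FancyMapOps name is made up for illustration, assuming Scala 2.x collections):
// wraps any TraversableOnce so fancyMap can be called with dot syntax
implicit class FancyMapOps[A](val list: TraversableOnce[A]) {
  def fancyMap[B](fs: (A => Boolean, A => B)*): TraversableOnce[B] =
    list flatMap { elem =>
      // keep only the mappers whose predicate accepts this element
      fs collect { case (predicate, mapper) if predicate(elem) => mapper(elem) }
    }
}

(0 to 15).fancyMap[String]((_ % 3 == 0, _ + "/3"), (_ % 5 == 0, _ + "/5"))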

You can try collect:
val input = Range(0,15)
val output = input.flatMap { x =>
List(3,5) collect { case n if (x%n == 0) => s"${x}/${n}" }
}
System.out.println(output)

I would use a fold:
val input = Range(0, 15)
val output = input.foldLeft(List[String]()) {
case (acc, value) =>
val acc1 = if (value % 3 == 0) s"$value/3" :: acc else acc
val acc2 = if (value % 5 == 0) s"$value/5" :: acc1 else acc1
acc2
}.reverse
output contains
List(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
A fold takes an accumulator (acc), a collection, and a function. The function is called with the initial value of the accumulator, in this case an empty List[String], and each value of the collection, and should return an updated accumulator.
On each iteration, we take the growing accumulator and, whenever the if conditions hold, prepend the computed string to it. The function finally returns the updated accumulator.
When the fold is done, it returns the final accumulator, but unfortunately, it is in reverse order. We simply reverse the accumulator with .reverse.
There is a nice paper on folds: A tutorial on the universality and expressiveness of fold, by Graham Hutton.
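For comparison, a sketch of the same idea written with foldRight, which builds the list in the original order and so avoids the final .reverse (assuming the input is small enough that foldRight's lack of tail recursion on List is not a concern):
val output = input.foldRight(List[String]()) { (value, acc) =>
  // prepend the /5 result first, then the /3 result, so each element's
  // results come out in the order 3 before 5
  val acc1 = if (value % 5 == 0) s"$value/5" :: acc else acc
  if (value % 3 == 0) s"$value/3" :: acc1 else acc1
}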

Related

Scala: Best way to filter & map in one iteration

I'm new to Scala and trying to figure out the best way to filter & map a collection. Here's a toy example to explain my problem.
Approach 1: This is pretty bad since I'm iterating through the list twice and calculating the same value in each iteration.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums filter { x: Int => (x * x) > N } map { x: Int => (x * x).toString }
Approach 2: This is slightly better but I still need to calculate (x * x) twice.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums collect { case x: Int if (x * x) > N => (x * x).toString }
So, is it possible to calculate this without iterating through the collection twice and avoid repeating the same calculations?
Could use a foldRight
nums.foldRight(List.empty[Int]) {
case (i, is) =>
val s = i * i
if (s > N) s :: is else is
}
A foldLeft would also achieve a similar goal, but the resulting list would be in reverse order (due to the associativity of foldLeft).
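A sketch of that foldLeft variant, with the extra .reverse to restore the original order:
nums.foldLeft(List.empty[Int]) { (is, i) =>
  val s = i * i
  // prepend when the square is large enough; prepending reverses the order
  if (s > N) s :: is else is
}.reverse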
Alternatively if you'd like to play with Scalaz
import scalaz.std.list._
import scalaz.syntax.foldable._
nums.foldMap { i =>
val s = i * i
if (s > N) List(s) else List()
}
The typical approach is to use an iterator (if possible) or view (if iterator won't work). This doesn't exactly avoid two traversals, but it does avoid creation of a full-sized intermediate collection. You then map first and filter afterwards and then map again if needed:
xs.iterator.map(x => x*x).filter(_ > N).map(_.toString)
The advantage of this approach is that it's really easy to read and, since there are no intermediate collections, it's reasonably efficient.
If you are asking because this is a performance bottleneck, then the answer is usually to write a tail-recursive function or use the old-style while loop method. For instance, in your case
def sumSqBigN(xs: Array[Int], N: Int): Array[String] = {
val ysb = Array.newBuilder[String]
def inner(start: Int): Array[String] = {
if (start >= xs.length) ysb.result
else {
val sq = xs(start) * xs(start)
if (sq > N) ysb += sq.toString
inner(start + 1)
}
}
inner(0)
}
You can also pass a parameter forward in inner instead of using an external builder (especially useful for sums).
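A sketch of that parameter-passing variant, accumulating into a List instead of an external builder (sumSqBigN2 is an illustrative name, not from the original answer):
def sumSqBigN2(xs: Array[Int], N: Int): List[String] = {
  @annotation.tailrec
  def inner(start: Int, acc: List[String]): List[String] =
    if (start >= xs.length) acc.reverse
    else {
      val sq = xs(start) * xs(start)
      // carry the accumulator forward instead of mutating a builder
      inner(start + 1, if (sq > N) sq.toString :: acc else acc)
    }
  inner(0, Nil)
}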
I have yet to confirm that this is truly a single pass, but:
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
You can use collect which applies a partial function to every value of the collection that it's defined at. Your example could be rewritten as follows:
val sqNumsLargerThanN = nums collect {
case (x: Int) if (x * x) > N => (x * x).toString
}
A very simple approach that only does the multiplication operation once. It's also lazy, so it executes code only when needed.
nums.view.map(x=>x*x).withFilter(x => x> N).map(_.toString)
Take a look here for differences between filter and withFilter.
Consider this for comprehension,
for (x <- 0 until 10; v = x*x if v > N) yield v.toString
which desugars to a map over the range (pairing each element with its once-computed square), a (lazy) withFilter on that square, and a final map that yields the filtered results. Note that only one iteration and one computation of the square are required (in addition to creating the range).
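Roughly, the desugared form looks like this sketch (not the compiler's literal output, but close enough to show that the square is computed only once; N comes from the question):
(0 until 10)
  .map { x => val v = x * x; (x, v) }          // compute the square once, keep it alongside x
  .withFilter { case (x, v) => v > N }          // lazy filter on the cached square
  .map { case (x, v) => v.toString }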
You can use flatMap.
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
Or with Scalaz,
import scalaz.Scalaz._
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
(square > N).option(square.toString)
}
This solves the question as asked: how to do this with one iteration. This can be useful when streaming data, like with an Iterator.
However, if you instead want the absolute fastest implementation, this is not it. In fact, I suspect you would use a mutable ArrayList and a while loop. But only after profiling would you know for sure. In any case, that's for another question.
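For the curious, a rough sketch of the mutable-buffer-plus-while-loop approach alluded to there, using Scala's ArrayBuffer as the analogue of an ArrayList (unprofiled, purely illustrative; nums and N come from the question):
import scala.collection.mutable.ArrayBuffer

val out = new ArrayBuffer[String]
var i = 0
while (i < nums.length) {
  val sq = nums(i) * nums(i)
  if (sq > N) out += sq.toString   // keep only squares larger than N
  i += 1
}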
Using a for comprehension would work:
val sqNumsLargerThanN = for {x <- nums if x*x > N } yield (x*x).toString
Also, I'm not sure, but I think the Scala compiler is smart about a filter before a map and will only do one pass if possible.
I am also a beginner and did it as follows:
for (y <- nums.map(x => x * x) if y > N) { println(y) }

Scala - can 'for-yield' clause yields nothing for some condition?

In Scala, I want to write a function that yields the odd numbers within a given range. The function prints a log message when it encounters an even number. The first version of the function is:
def getOdds(N: Int): Traversable[Int] = {
val list = new mutable.MutableList[Int]
for (n <- 0 until N) {
if (n % 2 == 1) {
list += n
} else {
println("skip even number " + n)
}
}
return list
}
If I omit printing logs, the implementation become very simple:
def getOddsWithoutPrint(N: Int) =
for (n <- 0 until N if (n % 2 == 1)) yield n
However, I don't want to miss the logging part. How do I rewrite the first version more compactly? It would be great if it can be rewritten similar to this:
def IWantToDoSomethingSimilar(N: Int) =
for (n <- 0 until N) if (n % 2 == 1) yield n else println("skip even number " + n)
def IWantToDoSomethingSimilar(N: Int) =
for {
n <- 0 until N
if n % 2 != 0 || { println("skip even number " + n); false }
} yield n
Using filter instead of a for expression would be slightly simpler though.
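The filter version hinted at there would be a sketch like this, reusing the same side-effecting guard:
def IWantToDoSomethingSimilar(N: Int) =
  // the guard keeps odds, and logs (then rejects) evens as a side effect
  (0 until N).filter(n => n % 2 != 0 || { println("skip even number " + n); false })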
If you want to keep the sequentiality of your processing (handling odds and evens in order, not separately), you can use something like this (edited):
def IWantToDoSomethingSimilar(N: Int) =
(for (n <- (0 until N)) yield {
if (n % 2 == 1) {
Option(n)
} else {
println("skip even number " + n)
None
}
// Flatten transforms the Seq[Option[Int]] into Seq[Int]
}).flatten
EDIT: following the same concept, a shorter solution:
def IWantToDoSomethingSimilar(N: Int) =
(0 until N) map {
case n if n % 2 == 0 => println("skip even number "+ n)
case n => n
} collect {case i:Int => i}
If you want to dig into a functional approach, something like the following is a good place to start.
First some common definitions:
// use scalaz 7
import scalaz._, Scalaz._
// transforms a function returning either E or B into a
// function returning an optional B and optionally writing a log of type E
def logged[A, E, B, F[_]](f: A => E \/ B)(
implicit FM: Monoid[F[E]], FP: Pointed[F]): (A => Writer[F[E], Option[B]]) =
(a: A) => f(a).fold(
e => Writer(FP.point(e), None),
b => Writer(FM.zero, Some(b)))
// helper for fixing the log storage format to List
def listLogged[A, E, B](f: A => E \/ B) = logged[A, E, B, List](f)
// shorthand for a String logger with List storage
type W[+A] = Writer[List[String], A]
Now all you have to do is write your filtering function:
def keepOdd(n: Int): String \/ Int =
if (n % 2 == 1) \/.right(n) else \/.left(n + " was even")
You can try it instantly:
scala> List(5, 6) map(keepOdd)
res0: List[scalaz.\/[String,Int]] = List(\/-(5), -\/(6 was even))
Then you can use the traverse function to apply your function to a list of inputs, and collect both the logs written and the results:
scala> val x = List(5, 6).traverse[W, Option[Int]](listLogged(keepOdd))
x: W[List[Option[Int]]] = scalaz.WriterTFunctions$$anon$26#503d0400
// unwrap the results
scala> x.run
res11: (List[String], List[Option[Int]]) = (List(6 was even),List(Some(5), None))
// we may even drop the None-s from the output
scala> val (logs, results) = x.map(_.flatten).run
logs: List[String] = List(6 was even)
results: List[Int] = List(5)
I don't think this can be done easily with a for comprehension. But you could use partition.
def getOdds(N: Int) = {
val (evens, odds) = 0 until N partition { x => x % 2 == 0 }
evens foreach { x => println("skipping " + x) }
odds
}
EDIT: To avoid printing the log messages after the partitioning is done, you can change the first line of the method like this:
val (evens, odds) = (0 until N).view.partition { x => x % 2 == 0 }

Scala List.filter with two conditions, applied only once

Don't know if this is possible, but I have some code like this:
val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
val evens = list.filter { e => e % 2 == 0 }
if(someCondition) {
val result = evens.filter { e => e % 3 == 0 }
} else {
val result = evens.filter { e => e % 5 == 0 }
}
But I don't want to iterate over all the elements twice, so is there a way I can create a generic "pick all the even numbers in this collection" step and then apply some other function, so that it only iterates once?
If you turn list into a lazy collection, such as an Iterator, then you can apply all the filter operations (or other things like map etc) in one pass:
val list = (1 to 12).toList
val doubleFiltered: List[Int] =
list.iterator
.filter(_ % 2 == 0)
.filter(_ % 3 == 0)
.toList
println(doubleFiltered)
When you convert the collection to an Iterator with .iterator, Scala will keep track of the operations to be performed (here, two filters), but will wait to perform them until the result is actually accessed (here, via the call to .toList).
So I might rewrite your code like this:
val list = (1 to 12).toList
val evens = list.iterator.filter(_ % 2 == 0)
val result =
if(someCondition)
evens.filter(_ % 3 == 0)
else
evens.filter(_ % 5 == 0)
result foreach println
Depending on exactly what you want to do, you might want an Iterator, a Stream, or a View. They are all lazily computed (so the one-pass aspect will apply), but they differ on things like whether they can be iterated over multiple times (Stream and View) or whether they keep the computed value around for later access (Stream).
To really see these different lazy behaviors, try running this bit of code and set <OPERATION> to either toList, iterator, view, or toStream:
val result =
(1 to 12).<OPERATION>
.filter { e => println("filter 1: " + e); e % 2 == 0 }
.filter { e => println("filter 2: " + e); e % 3 == 0 }
result foreach println
result foreach println
Here's the behavior you will see:
List (or any other non-lazy collection): Each filter requires a separate iteration through the collection. The resulting filtered collection is stored in memory so that each foreach can just display it.
Iterator: Both filters and the first foreach are done in a single iteration. The second foreach does nothing since the Iterator has been consumed. Results are not stored in memory.
View: Both foreach calls result in their own single-pass iteration over the collection to perform the filters. Results are not stored in memory.
Stream: Both filters and the first foreach are done in a single iteration. The resulting filtered collection is stored in memory so that each foreach can just display it.
You could use function composition. someCondition here is only called once, when deciding which function to compose with:
def modN(n: Int)(xs: List[Int]) = xs filter (_ % n == 0)
val f = modN(2) _ andThen (if (someCondition) modN(3) else modN(5))
val result = f(list)
(This doesn't do what you want - it still traverses the list twice)
Just do this:
val f: Int => Boolean = if (someCondition) { _ % 3 == 0 } else { _ % 5 == 0 }
val result = list filter (x => x % 2 == 0 && f(x))
or maybe better:
val n = if (someCondition) 3 else 5
val result = list filter (x => x % 2 == 0 && x % n == 0)
Wouldn't this work:
list.filter{e => e % 2 == 0 && (if (someCondition) e % 3 == 0 else e % 5 == 0)}
also FYI e % 2 == 0 is going to give you all the even numbers, unless you're naming the val odds for another reason.
You just write two conditions in the filter:
val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
var result = List(0)
val someCondition = true
result = if (someCondition) list.filter { e => e % 2 == 0 && e % 3 == 0 }
else list.filter { e => e % 2 == 0 && e % 5 == 0 }

Converting a sequence of map operations to a for-comprehension

I read in Programming in Scala section 23.5 that map, flatMap and filter operations can always be converted into for-comprehensions and vice-versa.
We're given the following equivalence:
def map[A, B](xs: List[A], f: A => B): List[B] =
for (x <- xs) yield f(x)
I have a value calculated from a series of map operations:
val r = (1 to 100).map{ i => (1 to 100).map{i % _ == 0} }
.map{ _.foldLeft(false)(_^_) }
.map{ case true => "open"; case _ => "closed" }
I'm wondering what this would look like as a for-comprehension. How do I translate it?
(If it's helpful, in words this is:
take integers from 1 to 100
for each, create a list of 100 boolean values
fold each list with an XOR operator, back into a boolean
yield a list of 100 Strings "open" or "closed" depending on the boolean
I imagine there is a standard way to translate map operations and the details of the actual functions in them is not important. I could be wrong though.)
Is this the kind of translation you're looking for?
for (i <- 1 to 100;
x = (1 to 100).map(i % _ == 0);
y = x.foldLeft(false)(_^_);
z = y match { case true => "open"; case _ => "closed" })
yield z
If desired, the map in the definition of x could also be translated to an "inner" for-comprehension.
In retrospect, a series of chained map calls is sort of trivial, in that you could equivalently call map once with composed functions:
s.map(f).map(g).map(h) == s.map(f andThen g andThen h)
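As a sketch, here are the chained maps from the question collapsed into a single map with andThen-composed functions:
val r = (1 to 100).map {
  // one composed function: booleans for each divisor, XOR-fold them, then label
  ((i: Int) => (1 to 100).map(i % _ == 0))
    .andThen(_.foldLeft(false)(_ ^ _))
    .andThen(b => if (b) "open" else "closed")
}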
I find for-comprehensions to be a bigger win when flatMap and filter are involved. Consider
for (i <- 1 to 3;
j <- 1 to 3 if (i + j) % 2 == 0;
k <- 1 to 3) yield i ^ j ^ k
versus
(1 to 3).flatMap { i =>
(1 to 3).filter(j => (i + j) % 2 == 0).flatMap { j =>
(1 to 3).map { k => i ^ j ^ k }
}
}

Scala performance - Sieve

Right now, I am trying to learn Scala. I've started small, writing some simple algorithms. I've encountered some problems when I wanted to implement the Sieve algorithm for finding all prime numbers lower than a certain threshold.
My implementation is:
import scala.math
object Sieve {
// Returns all prime numbers until maxNum
def getPrimes(maxNum : Int) = {
def sieve(list: List[Int], stop : Int) : List[Int] = {
list match {
case Nil => Nil
case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
case _ => list
}
}
val stop : Int = math.sqrt(maxNum).toInt
sieve((2 to maxNum).toList, stop)
}
def main(args: Array[String]) = {
val ap = printf("%d ", (_:Int));
// works
getPrimes(1000).foreach(ap(_))
// works
getPrimes(100000).foreach(ap(_))
// out of memory
getPrimes(1000000).foreach(ap(_))
}
}
Unfortunately it fails when I want to compute all the prime numbers smaller than 1000000 (1 million). I am receiving an OutOfMemoryError.
Do you have any idea how to optimize the code, or how I can implement this algorithm in a more elegant fashion?
PS: I've done something very similar in Haskell, and there I didn't encounter any issues.
I would go with an infinite Stream. Using a lazy data structure allows you to code pretty much as you would in Haskell. It also reads as more "declarative" than the code you wrote.
import Stream._
val primes = 2 #:: sieve(3)
def sieve(n: Int) : Stream[Int] =
if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
else n #:: sieve(n + 2)
def getPrimes(maxNum : Int) = primes.takeWhile(_ < maxNum)
Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For really big ranges you should consider the Sieve of Atkin.
The code in question is not tail recursive, so Scala cannot optimize the recursion away. Also, Haskell is non-strict by default, so you can hardly compare it to Scala. For instance, whereas Haskell benefits from foldRight, Scala benefits from foldLeft.
There are many Scala implementations of the Sieve of Eratosthenes, including some on Stack Overflow. For instance:
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps))
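That one-liner uses the scalaz |> (pipe-forward) operator; a plain-Scala sketch of the same fold, for readers without scalaz (primesUpTo is an illustrative name), might look like:
def primesUpTo(n: Int): Set[Int] = {
  val r = 2 to n
  // start from the full set and remove multiples of every number still marked prime
  r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps)
}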
The following answer is about 100 times faster than the "one-liner" answer using a Set (and the results don't need sorting into ascending order), and is more functional in form than the other answer using an array, although it uses a mutable BitSet as a sieving array:
object SoE {
def makeSoE_Primes(top: Int): Iterator[Int] = {
val topndx = (top - 3) / 2
val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
def cullp(i: Int) = {
import scala.annotation.tailrec; val p = i + i + 3
@tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
cull((p * p - 3) >>> 1)
}
(0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
}
}
It can be tested by the following code:
object Main extends App {
import SoE._
val top_num = 10000000
val strt = System.nanoTime()
val count = makeSoE_Primes(top_num).size
val end = System.nanoTime()
println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
println(f"Found $count primes up to $top_num" + ".")
println("Using one large mutable1 BitSet and functional code.")
}
With the results from the above as follows:
Successfully completed without errors. [total 294 ms]
Found 664579 primes up to 10000000.
Using one large mutable BitSet and functional code.
There is an overhead of about 40 milliseconds for even small sieve ranges, and there are various non-linear responses with increasing range as the size of the BitSet grows beyond the different CPU caches.
It looks like List isn't very efficient space-wise. You can get an out-of-memory exception by doing something like this:
1 to 2000000 toList
I "cheated" and used a mutable array. Didn't feel dirty at all.
def primesSmallerThan(n: Int): List[Int] = {
val nonprimes = Array.tabulate(n + 1)(i => i == 0 || i == 1)
val primes = new collection.mutable.ListBuffer[Int]
for (x <- nonprimes.indices if !nonprimes(x)) {
primes += x
for (y <- x * x until nonprimes.length by x if (x * x) > 0) {
nonprimes(y) = true
}
}
primes.toList
}