Are chained maps optimized by compiler? - scala

Scala has an amazing way of converting a collection into another collection using the map construct.
val l = List(1, 2, 3, 4)
l.map(_*_)
will return the squares of the elements in list l.
I have come across various instances where multiple maps are chained together, for example:
val l = List(1, 2, 3, 4)
val res = l.map(_ * _).map(_ + 1).filter(_ < 3)
What I believe happens underneath is equivalent to the following:
val l = List(1, 2, 3, 4)
val l1 = l.map(_*_)
val l2 = l1.map(_ + 1)
val res = l2.filter(_ < 3)
Creating l1 and l2 might cause memory issues if the collection is too big.
To tackle this problem, does the Scala compiler have any optimizations, for example rewriting the chain as follows?
val l = List(1, 2, 3, 4)
val res = l.map( _*_ + 1).filter(_ < 3)
In general, if f, g and h are functions,
val l = List(/*something*/)
val res = l.map(f(_)).map(g(_)).map(h(_))
can be converted into
val res = l.map(f _ andThen g _ andThen h _)

Scala offers Stream, which is a lazy ordered collection.
val s = Stream(1, 2, 3, 4)
// note: I've changed your sequence of transformations
// a bit, so that it compiles and yields more than one result
val res = s.map(i => i * i).map(_ + 1).filter(_ < 11)
res is now a Stream. Apart from its head, which a Stream always computes eagerly, nothing has been evaluated yet, and no blocks of memory proportional to the size of s have been allocated.
If you intend to use the elements of res one at a time, no more work is required. You can use res in a for statement or comprehension directly, for example.
for ( elem <- res ) println( s"A value is ${elem}" )
If you want res as a List, you can just call .toList at the end of the sequence of transformations. Instead of the above, use
val res = s.map(i => i * i).map(_ + 1).filter(_ < 11).toList
s will only be traversed once in creating the new List.
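The laziness can be observed with a counter. The following is a minimal sketch, not part of the original answer: the evaluated counter and the intermediate names are made up for illustration, and it assumes Scala 2.12 or 2.13, where Stream is available (2.13 deprecates it in favour of LazyList).

```scala
// Counts how often the squaring function actually runs (illustrative only).
var evaluated = 0

val s = Stream(1, 2, 3, 4)
val res = s.map { i => evaluated += 1; i * i }.map(_ + 1).filter(_ < 11)

// A Stream computes its head eagerly, so some work has already happened here,
// but the rest of s has not been squared yet.
val evaluatedBeforeForcing = evaluated

// Forcing the result walks s to the end; each element is squared exactly once.
val asList = res.toList
val evaluatedAfterForcing = evaluated
```

Comparing evaluatedBeforeForcing with evaluatedAfterForcing shows that most of the work is deferred until toList is called.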

No, because this would require the compiler to know about the semantics of map and treat the standard library classes which implement it specially (since nobody stops you from writing a class where this doesn't hold). There is a research proposal which might end up implementing this... eventually.
There is also Scala-Blitz which optimizes some collection operations, but fusion and deforestation are listed as future work in this presentation and I don't think they are implemented yet.
As Steve Waldman's answer says, using Stream (or, better yet, Iterator) can help, but it won't eliminate the intermediate collections completely.
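Since the compiler does not fuse the calls for you, the fusion has to be spelled out by hand. A small sketch of the options discussed here, with illustrative names (strict, viaIterator, fused) that do not come from the question:

```scala
val l = List(1, 2, 3, 4)

// Strict version: every step allocates a complete intermediate List.
val strict = l.map(i => i * i).map(_ + 1).filter(_ < 11)

// Iterator version: each element flows through all three steps in turn,
// and only the final toList builds a collection.
val viaIterator = l.iterator.map(i => i * i).map(_ + 1).filter(_ < 11).toList

// Manually fused version: a single map over a composed function.
val square: Int => Int = i => i * i
val inc: Int => Int = _ + 1
val fused = l.map(square andThen inc).filter(_ < 11)
```

All three produce the same elements; they differ only in how many intermediate collections are built along the way.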

Related

andThen in List scala

Does anyone have an example of how to use andThen with Lists? I notice that andThen is defined for List, but the documentation has no example showing how to use it.
My understanding is that f andThen g means: execute function f, then execute function g, where the input of g is the output of f. Is this correct?
Question 1 - I have written the following code, but I do not see why I should use andThen when I can achieve the same result with map.
scala> val l = List(1,2,3,4,5)
l: List[Int] = List(1, 2, 3, 4, 5)
//simple function that decrements the value of each element of the list
scala> def f(l:List[Int]):List[Int] = {l.map(x=>x-1)}
f: (l: List[Int])List[Int]
//function which increments the value of each element of the list
scala> def g(l:List[Int]):List[Int] = {l.map(x=>x+1)}
g: (l: List[Int])List[Int]
scala> val p = f _ andThen g _
p: List[Int] => List[Int] = <function1>
//printing original list
scala> l
res75: List[Int] = List(1, 2, 3, 4, 5)
//p works as expected.
scala> p(l)
res74: List[Int] = List(1, 2, 3, 4, 5)
//but I can achieve the same with two maps. What is the point of andThen?
scala> l.map(x=>x+1).map(x=>x-1)
res76: List[Int] = List(1, 2, 3, 4, 5)
Could someone share practical examples where andThen is more useful than methods like filter, map etc.? One use I can see above is that with andThen I can create a new function, p, which is a combination of other functions. But that shows the usefulness of andThen on functions, not of andThen on List.
andThen is inherited from PartialFunction a few parents up the inheritance tree for List. You use List as a PartialFunction when you access its elements by index. That is, you can think of a List as a function from an index (from zero) to the element that occupies that index within the list itself.
If we have a list:
val list = List(1, 2, 3, 4)
We can call list like a function (because it is one):
scala> list(0)
res5: Int = 1
andThen allows us to compose one PartialFunction with another. For example, perhaps I want to create a List where I can access its elements by index, and then multiply the element by 2.
val list2 = list.andThen(_ * 2)
scala> list2(0)
res7: Int = 2
scala> list2(1)
res8: Int = 4
This is essentially the same as using map on the list, except the computation is lazy. Of course, you could accomplish the same thing with a view, but there might be some generic case where you'd want to treat the List as just a PartialFunction, instead (I can't think of any off the top of my head).
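Treating the List as a PartialFunction also gives access to the other PartialFunction methods. A brief sketch follows; the names doubledAt, definedAt10 and safeLookup are illustrative, and the explicit (x: Int) annotation is only a precaution so that the snippet does not depend on how a particular Scala version resolves the andThen overloads.

```scala
val list = List(1, 2, 3, 4)

// A List[Int] is also a PartialFunction[Int, Int] from index to element,
// so andThen yields another partial function rather than a new List.
val doubledAt: PartialFunction[Int, Int] = list.andThen((x: Int) => x * 2)

val first = doubledAt(0)                    // element 0 is 1, doubled to 2
val definedAt10 = doubledAt.isDefinedAt(10) // false: index 10 is out of range
val safeLookup = doubledAt.lift(10)         // None instead of an exception
```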
In your code, you aren't actually using andThen on the List itself. Rather, you're using it for functions that you're passing to map, etc. There is no difference in the results between mapping a List twice over f and g and mapping once over f andThen g. However, using the composition is preferred when mapping multiple times becomes expensive. In the case of Lists, traversing multiple times can become a tad computationally expensive when the list is large.
With the solution l.map(x=>x+1).map(x=>x-1) you are traversing the list twice.
When composing 2 functions using the andThen combinator and then applying it to the list, you only traverse the list once.
val h = ((x:Int) => x+1).andThen((x:Int) => x-1)
l.map(h) //traverses it only once
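The difference in traversal order can be made visible by logging each call. This is only an illustrative sketch: the log buffer and the logging versions of f and g are assumptions added for the demonstration.

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String]
val f: Int => Int = x => { log += s"f($x)"; x + 1 }
val g: Int => Int = x => { log += s"g($x)"; x - 1 }
val l = List(1, 2)

// Two maps: f runs over the whole list first, then g runs over the result.
val twice = l.map(f).map(g)
val twoPasses = log.toList // List(f(1), f(2), g(2), g(3))

log.clear()

// One map over the composition: f and g alternate element by element.
val once = l.map(f andThen g)
val onePass = log.toList   // List(f(1), g(2), f(2), g(3))
```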

For until square root

In Scala, I want to write the equivalent of the following C++ code:
for(int i = 1 ; i * i < n ; i++)
So far I did this, but it looks ugly and I think it goes up until n:
for(i <- 1 to n
if(i * i < n))
Is there a nicer way of writing this code?
Not nicer, but a different approach:
Using a stream
(1 to n).toStream map (i => i * i) takeWhile (_ < n)
Example for n = 100
scala> val res = (1 to 100).toStream map(i => i * i) takeWhile (_ < 100)
res: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> res.toList
res16: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81)
Explanation
A Stream lets you request values on demand, i.e. it is evaluated lazily, so the mapped function is only applied when the next value is requested.
First of all, declare a function to generate a lazy stream of squares:
def squares(i: Int = 1): Stream[Int] = Stream.cons(i * i, squares(i + 1))
Then use takeWhile to take values for as long as i * i is smaller than n. For example:
scala> squares().takeWhile(_ < 50).foreach(println)
1
4
9
16
25
36
49
The solution you have might not be the nicest, but it might be the most efficient: everything else is internally more complicated and so may carry some overhead. (In most situations the overhead is not noticeable, though, and it may be optimized very well.)
I would not go for the solution using Streams suggested in another answer. While Streams are computed lazily, they do cache the computed results, which is not required in this case and might take a lot of memory if the range iterated over is large. Instead I would use an Iterator. Operations on Iterators are typically lazy as well and do not cache anything.
If you need this more often, you could add another "operator" using an implicit class like this:
implicit class UntilHelper(start: Int) {
def aslong(cond: Int => Boolean) =
Iterator.from(start).takeWhile(cond)
}
Your loop then looks like this:
for(i <- 1 aslong (Math.pow(_, 2) < 1000)) {
println(i)
}
From a quick micro-benchmark it looks like this is about 3 times faster than the stream solution and a little bit slower than a simple while loop. These things are however notoriously hard to measure without any context.
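For comparison, the "simple while loop" baseline might look like the sketch below. The helper name foreachBelowRoot and the Long-based comparison are assumptions made for this example; they are not taken from the benchmark itself.

```scala
import scala.collection.mutable.ListBuffer

// Runs body for i = 1, 2, ... while i * i < n, like the C++ loop header.
def foreachBelowRoot(n: Int)(body: Int => Unit): Unit = {
  var i = 1
  // The comparison is done in Long so that i * i cannot overflow for large n.
  while (i.toLong * i < n) {
    body(i)
    i += 1
  }
}

// Usage: record which values of i the loop visits for n = 50.
val visited = ListBuffer.empty[Int]
foreachBelowRoot(50)(i => visited += i)
```

For n = 50 the loop visits 1 through 7, since 7 * 7 = 49 is still below 50 and 8 * 8 = 64 is not.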
Remark on computing squares
A nice way of computing a sequence of squares is by adding the difference between consecutive squares. This can be done using the scanLeft method on a Stream or an Iterator.
val itr = Iterator.from(1).scanLeft(1)((a,b)=>a + 2*b+1)
println(itr.take(10).toList)

Scala - increasing prefix of a sequence

I was wondering what the most elegant way of getting the increasing prefix of a given sequence is. My idea is as follows, but it is neither purely functional nor elegant:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
var currentElement = sequence.head - 1
val increasingPrefix = sequence.takeWhile(e =>
if (e > currentElement) {
currentElement = e
true
} else
false)
The result of the above is:
List(1,2,3)
You can take your solution, @Samlik, and effectively zip in the currentElement variable, but then map it out when you're done with it.
sequence.take(1) ++ sequence.zip(sequence.drop(1)).
takeWhile({case (a, b) => a < b}).map({case (a, b) => b})
Also works with infinite sequences:
val sequence = Seq(1, 2, 3).toStream ++ Stream.from(1)
sequence is now an infinite Stream, but we can peek at the first 10 items:
scala> sequence.take(10).toList
res: List[Int] = List(1, 2, 3, 1, 2, 3, 4, 5, 6, 7)
Now, using the above snippet:
val prefix = sequence.take(1) ++ sequence.zip(sequence.drop(1)).
takeWhile({case (a, b) => a < b}).map({case (a, b) => b})
Again, prefix is a Stream, but not infinite.
scala> prefix.toList
res: List[Int] = List(1, 2, 3)
N.b.: This does not handle the cases when sequence is empty, or when the prefix is also infinite.
If by elegant you mean concise and self-explanatory, it's probably something like the following:
sequence.inits.dropWhile(xs => xs != xs.sorted).next
inits gives us an iterator that returns the prefixes longest-first. We drop all the ones that aren't sorted and take the next one.
If you don't want to do all that sorting, you can write something like this:
sequence.scanLeft(Some(Int.MinValue): Option[Int]) {
case (Some(last), i) if i > last => Some(i)
case _ => None
}.tail.flatten
If the performance of this operation is really important, though (it probably isn't), you'll want to use something more imperative, since this solution still traverses the entire collection (twice).
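A more imperative version along those lines could look like the following sketch. The helper name increasingPrefix is hypothetical; the point is that it reads the input once and stops at the first element that breaks the increasing run.

```scala
// Hypothetical helper: collects elements until the first one that is not
// strictly greater than its predecessor, then stops reading the input.
def increasingPrefix(xs: Iterable[Int]): List[Int] = {
  val it = xs.iterator
  val out = List.newBuilder[Int]
  if (it.hasNext) {
    var last = it.next()
    out += last
    var increasing = true
    while (increasing && it.hasNext) {
      val x = it.next()
      if (x > last) { out += x; last = x }
      else increasing = false
    }
  }
  out.result()
}

val prefix = increasingPrefix(Seq(1, 2, 3, 1, 2, 3, 4, 5, 6))
```

An empty input yields an empty list, and because the loop stops early it also terminates on lazy inputs whose increasing prefix is finite.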
And, another way to skin the cat:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
sequence.head :: sequence
.sliding(2)
.takeWhile{case List(a,b) => a <= b}
.map(_(1)).toList
// List[Int] = List(1, 2, 3)
I will interpret elegance as the solution that most closely resembles the way we humans think about the problem, although an extremely efficient algorithm could also be a form of elegance.
val sequence = List(1,2,3,2,3,45,5)
val increasingPrefix = takeWhile(sequence, _ < _)
I believe this code snippet captures the way most of us probably think about the solution to this problem.
This of course requires defining takeWhile:
/**
 * Takes elements from a sequence by applying a predicate over two elements at a time.
 * @param xs The list to take elements from
 * @param f The predicate that operates over two elements at a time
 * @return This function is guaranteed to return a sequence with at least one element
 *         (for a non-empty xs), as the first element is assumed to satisfy the predicate
 *         as there is no previous element to provide the predicate with.
 */
def takeWhile(xs: Seq[Int], f: (Int, Int) => Boolean): Seq[Int] = {
  // function that operates over tuples and returns true when the predicate does not hold
  val not = f.tupled.andThen(!_)
  // Maybe one day our languages will be better than this... (dependent types anyone?)
  // collect skips the single short window that sliding(2) produces for a one-element xs
  val twos = xs.sliding(2).collect { case Seq(one, two) => (one, two) }
  val indexOfBreak = twos.indexWhere(not)
  // twos has one less element than xs, we need to compensate for that.
  // An intuition is the fact that this function should always return the first element of
  // a non-empty list. indexWhere returns -1 when the predicate holds throughout,
  // in which case the whole of xs is the prefix.
  if (indexOfBreak < 0) xs else xs.take(indexOfBreak + 1)
}

General comprehensions in Scala

As far as I understand, the Scala for-comprehension notation relies on the first generator to define how elements are to be combined. Namely, for (i <- list) yield i returns a list and for (i <- set) yield i returns a set.
I was wondering if there was a way to specify how elements are combined independently of the properties of the first generator. For instance, I would like to get "the set of all elements from a given list", or "the sum of all elements from a given set". The only way I have found is to first build a list or a set as prescribed by the for-comprehension notation, then apply a transformation function to it - building a useless data structure in the process.
What I have in mind is a general "algebraic" comprehension notation as it exists for instance in Ateji PX:
`+ { i | int i : set } // the sum of all elements from a given set
set() { i | int i : list } // the set of all elements from a given list
concat(",") { s | String s : list } // string concatenation with a separator symbol
Here the first element (`+, set(), concat(",")) is a so-called "monoid" that defines how elements are combined, independently of the structure of the first generator (there can be multiple generators and filters, I just tried to keep the examples concise).
Any idea how to achieve a similar result in Scala while keeping a nice and concise notation? As far as I understand, the for-comprehension notation is hard-wired in the compiler and cannot be upgraded.
Thanks for your feedback.
About the for comprehension
The for comprehension in Scala is syntactic sugar for calls to flatMap, filter, map and foreach. Exactly as with direct calls to those methods, the type of the target collection determines the type of the returned collection. That is:
list map f //is a List
vector map f // is a Vector
This property is one of the underlying design goals of the Scala collections library and would be seen as desirable in most situations.
Answering the question
You do not need to construct any intermediate collection of course:
(list.view map (_.prop)).toSet //uses list.view
(list.iterator map (_.prop)).toSet //uses iterator
(for { l <- list.view} yield l.prop).toSet //uses view
(Set.empty[Prop] /: coll) { _ + _.prop } //uses foldLeft
These will all yield Sets without generating unnecessary collections. My personal preference is for the first. In terms of idiomatic Scala collection manipulation, each "collection" comes with these methods:
//Conversions
toSeq
toSet
toArray
toList
toIndexedSeq
iterator
toStream
//Strings
mkString
//accumulation
sum
The last is used where the element type of a collection has an implicit Numeric instance in scope; such as:
Set(1, 2, 3, 4).sum //10
Set('a, 'b).sum //does not compile
Note that the String concatenation example looks like this in Scala:
list.mkString(",")
And in the Scalaz FP library it might look something like this (which uses Monoid to sum Strings):
list.intercalate(",").asMA.sum
Your suggestions do not look anything like Scala; I'm not sure whether they are inspired by another language.
foldLeft? That's what you're describing.
The sum of all elements from a given set:
(0 /: Set(1,2,3))(_ + _)
the set of all elements from a given list
(Set[Int]() /: List(1,2,3,2,1))((acc,x) => acc + x)
String concatenation with a separator symbol:
("" /: List("a", "b"))(_ + _)
// edit: OK, concatenation with a separator is a bit more verbose:
("" /: List("a", "b"))((acc,x) => acc + (if (acc == "") "" else ",") + x)
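The /: operator is an alias for foldLeft and is deprecated as of Scala 2.13, so the same three folds can also be written with the named method. A sketch with illustrative value names:

```scala
// The sum of all elements from a given set.
val total = Set(1, 2, 3).foldLeft(0)(_ + _)

// The set of all elements from a given list.
val distinct = List(1, 2, 3, 2, 1).foldLeft(Set.empty[Int])(_ + _)

// String concatenation with a separator symbol.
val joined = List("a", "b").foldLeft("") { (acc, x) =>
  if (acc.isEmpty) x else acc + "," + x
}
```

For the last case, mkString(",") is the shorter choice in practice; the fold is shown only to match the pattern of the other two.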
You can also force the result type of the for comprehension by explicitly supplying the implicit CanBuildFrom parameter as scala.collection.breakOut and specifying the result type.
Consider this REPL session:
scala> val list = List(1, 1, 2, 2, 3, 3)
list: List[Int] = List(1, 1, 2, 2, 3, 3)
scala> val res = for(i <- list) yield i
res: List[Int] = List(1, 1, 2, 2, 3, 3)
scala> val res: Set[Int] = (for(i <- list) yield i)(collection.breakOut)
res: Set[Int] = Set(1, 2, 3)
Without the explicit CanBuildFrom, the same expression results in a type error:
scala> val res: Set[Int] = for(i <- list) yield i
<console>:8: error: type mismatch;
found : List[Int]
required: Set[Int]
val res: Set[Int] = for(i <- list) yield i
^
For a deeper understanding of this I suggest the following read:
http://www.scala-lang.org/docu/files/collections-api/collections-impl.html
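Note that breakOut belongs to the CanBuildFrom-based collections of Scala 2.8 through 2.12 and was removed in Scala 2.13. A version-independent way to get a similar effect, sketched here as an alternative rather than as what this answer proposes, is to run the comprehension over an iterator and convert once at the end:

```scala
val list = List(1, 1, 2, 2, 3, 3)

// Iterating over list.iterator keeps the for-expression lazy, so the Set
// built by toSet is the only collection that gets allocated.
val res: Set[Int] = (for (i <- list.iterator) yield i).toSet
```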
If you want to use for comprehensions and still be able to combine your values in some result value you could do the following.
case class WithCollector[B, A](init: B)(p: (B, A) => B) {
var x: B = init
val collect = { (y: A) => { x = p(x, y) } }
def apply(pr: (A => Unit) => Unit) = {
pr(collect)
x
}
}
// Some examples
object Test {
def main(args: Array[String]): Unit = {
// It's still functional
val r1 = WithCollector[Int, Int](0)(_ + _) { collect =>
for (i <- 1 to 10; if i % 2 == 0; j <- 1 to 3) collect(i + j)
}
println(r1) // 120
import collection.mutable.Set
val r2 = WithCollector[Set[Int], Int](Set[Int]())(_ += _) { collect =>
for (i <- 1 to 10; if i % 2 == 0; j <- 1 to 3) collect(i + j)
}
println(r2) // Set(9, 10, 11, 6, 13, 4, 12, 3, 7, 8, 5)
}
}

What is Scala's yield?

I understand Ruby and Python's yield. What does Scala's yield do?
I think the accepted answer is great, but it seems many people have failed to grasp some fundamental points.
First, Scala's for comprehensions are equivalent to Haskell's do notation, and they are nothing more than syntactic sugar for the composition of multiple monadic operations. As this statement will most likely not help anyone who needs help, let's try again… :-)
Scala's for comprehensions are syntactic sugar for the composition of multiple operations with map, flatMap and filter. Or foreach. Scala actually translates a for-expression into calls to those methods, so any class providing them, or a subset of them, can be used with for comprehensions.
First, let's talk about the translations. There are very simple rules:
This
for(x <- c1; y <- c2; z <- c3) {...}
is translated into
c1.foreach(x => c2.foreach(y => c3.foreach(z => {...})))
This
for(x <- c1; y <- c2; z <- c3) yield {...}
is translated into
c1.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
This
for(x <- c; if cond) yield {...}
is translated on Scala 2.7 into
c.filter(x => cond).map(x => {...})
or, on Scala 2.8, into
c.withFilter(x => cond).map(x => {...})
with a fallback into the former if method withFilter is not available but filter is. Please see the section below for more information on this.
This
for(x <- c; y = ...) yield {...}
is translated into
c.map(x => (x, ...)).map { case (x, y) => {...} }
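The last rule can be checked on a concrete collection. This is a small sketch in which the values c, viaFor and viaMap are chosen purely for illustration:

```scala
val c = List(1, 2, 3)

// for-expression with a value definition (y = ...)
val viaFor = for (x <- c; y = x * 10) yield x + y

// the same computation written out as two maps over a tuple
val viaMap = c.map(x => (x, x * 10)).map { case (x, y) => x + y }
```

Both expressions evaluate to List(11, 22, 33).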
When you look at very simple for comprehensions, the map/foreach alternatives do indeed look better. Once you start composing them, though, you can easily get lost in parentheses and nesting levels. When that happens, for comprehensions are usually much clearer.
I'll show one simple example, and intentionally omit any explanation. You can decide which syntax was easier to understand.
l.flatMap(sl => sl.filter(el => el > 0).map(el => el.toString.length))
or
for {
sl <- l
el <- sl
if el > 0
} yield el.toString.length
withFilter
Scala 2.8 introduced a method called withFilter, whose main difference is that, instead of returning a new, filtered collection, it filters on demand. The filter method has its behavior defined based on the strictness of the collection. To understand this better, let's take a look at some Scala 2.7 code with List (strict) and Stream (non-strict):
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> Stream.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
The difference arises because filter is applied immediately with List, returning a list of odds, since found is false at that point. Only then is foreach executed, but by this time changing found is meaningless, as filter has already run.
In the case of Stream, the condition is not applied immediately. Instead, as each element is requested by foreach, filter tests the condition, which enables foreach to influence it through found. Just to make it clear, here is the equivalent for-comprehension code:
for (x <- List.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
for (x <- Stream.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
This caused many problems, because people expected the if to be considered on-demand, instead of being applied to the whole collection beforehand.
Scala 2.8 introduced withFilter, which is always non-strict, no matter the strictness of the collection. The following example shows List with both methods on Scala 2.8:
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> List.range(1,10).withFilter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
This produces the result most people expect, without changing how filter behaves. As a side note, Range was changed from non-strict to strict between Scala 2.7 and Scala 2.8.
It is used in sequence comprehensions (like Python's list-comprehensions and generators, where you may use yield too).
It is applied in combination with for and writes a new element into the resulting sequence.
Simple example (from scala-lang)
/** Turn command line arguments to uppercase */
object Main {
  def main(args: Array[String]): Unit = {
    val res = for (a <- args) yield a.toUpperCase
    // mkString is used because toString on an Array does not print its elements
    println("Arguments: " + res.mkString(" "))
  }
}
The corresponding expression in F# would be
[ for a in args -> a.toUpperCase ]
or
from a in args select a.toUpperCase
in Linq.
Ruby's yield has a different effect.
Yes, as Earwicker said, it's pretty much the equivalent to LINQ's select and has very little to do with Ruby's and Python's yield. Basically, where in C# you would write
from ... select ???
in Scala you have instead
for ... yield ???
It's also important to understand that for-comprehensions don't just work with sequences, but with any type which defines certain methods, just like LINQ:
If your type defines just map, it allows for-expressions consisting of a single generator.
If it defines flatMap as well as map, it allows for-expressions consisting of several generators.
If it defines foreach, it allows for-loops without yield (both with single and multiple generators).
If it defines filter, it allows for-filter expressions starting with an if in the for expression.
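To see these rules at work on a type that is not a collection, consider the following minimal sketch. Box is a hypothetical container invented for this example; it is not a standard library class.

```scala
// Hypothetical single-value container, used only to show which methods
// the for-expression translation needs.
final case class Box[A](value: A) {
  def map[B](f: A => B): Box[B] = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
  def foreach(f: A => Unit): Unit = f(value)
}

// Two generators plus yield: translated to flatMap and map.
val sum = for {
  a <- Box(1)
  b <- Box(2)
} yield a + b

// No yield: translated to foreach.
var seen = 0
for (a <- Box(41)) seen = a + 1
```

Here sum is Box(3) and seen ends up as 42. Adding an if guard to either expression would not compile, because Box defines neither withFilter nor filter.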
Unless you get a better answer from a Scala user (which I'm not), here's my understanding.
It only appears as part of an expression beginning with for, which states how to generate a new list from an existing list.
Something like:
var doubled = for (n <- original) yield n * 2
So there's one output item for each input (although I believe there's a way of dropping duplicates).
This is quite different from the "imperative continuations" enabled by yield in other languages, where it provides a way to generate a list of any length, from some imperative code with almost any structure.
(If you're familiar with C#, it's closer to LINQ's select operator than it is to yield return).
Consider the following for-comprehension
val A = for (i <- Int.MinValue to Int.MaxValue; if i > 3) yield i
It may be helpful to read it out loud as follows
"For each integer i, if it is greater than 3, then yield (produce) i and add it to the list A."
In terms of mathematical set-builder notation, the above for-comprehension is analogous to
$A = \{\, i \in \mathbb{Z} : i > 3 \,\}$
which may be read as
"For each integer $i$, if it is greater than $3$, then it is a member of the set $A$."
or alternatively as
"$A$ is the set of all integers $i$, such that each $i$ is greater than $3$."
The keyword yield in Scala is simply syntactic sugar which can be easily replaced by a map, as Daniel Sobral already explained in detail.
On the other hand, yield is absolutely misleading if you are looking for generators (or continuations) similar to those in Python. See this SO thread for more information: What is the preferred way to implement 'yield' in Scala?
yield works like a for loop with a hidden buffer: on each iteration, the next item is added to the buffer, and when the loop finishes, the collection of all yielded values is returned. yield can be used with simple arithmetic expressions or even in combination with several collections.
Here are two simple examples for your better understanding
scala> for (i <- 1 to 5) yield i * 3
res: scala.collection.immutable.IndexedSeq[Int] = Vector(3, 6, 9, 12, 15)
scala> val nums = Seq(1,2,3)
nums: Seq[Int] = List(1, 2, 3)
scala> val letters = Seq('a', 'b', 'c')
letters: Seq[Char] = List(a, b, c)
scala> val res = for {
| n <- nums
| c <- letters
| } yield (n, c)
res: Seq[(Int, Char)] = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c), (3,a), (3,b), (3,c))
Hope this helps!!
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.filter(_ > 3).map(_ + 1)
println( res3 )
println( res4 )
These two pieces of code are equivalent.
val res3 = for (al <- aList) yield al + 1 > 3
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
These two pieces of code are also equivalent.
Map is as flexible as yield and vice-versa.
val doubledNums = for (n <- nums) yield n * 2
val ucNames = for (name <- names) yield name.capitalize
Notice that both of those for-expressions use the yield keyword:
Using yield after for is the “secret sauce” that says, “I want to yield a new collection from the existing collection that I’m iterating over in the for-expression, using the algorithm shown.”
taken from here
The Scala documentation states this clearly: "yield a new collection from the existing collection".
Another part of the Scala documentation says, "Scala offers a lightweight notation for expressing sequence comprehensions. Comprehensions have the form for (enums) yield e, where enums refers to a semicolon-separated list of enumerators. An enumerator is either a generator which introduces new variables, or it is a filter. "
yield is more flexible than map(); see the example below:
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
yield will print a result like List(5, 6), which is good,
while map() will return a result like List(false, false, true, true, true), which is probably not what you intend.
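For a like-for-like comparison, the guard in the for-expression needs a counterpart on the method side, since map alone never drops elements. A sketch of two such counterparts, using only standard collection methods (the value names are illustrative):

```scala
val aList = List(1, 2, 3, 4, 5)

// for-expression with a guard
val viaFor = for (al <- aList if al > 3) yield al + 1

// the same computation with withFilter followed by map
val viaWithFilter = aList.withFilter(_ > 3).map(_ + 1)

// or in a single step with collect and a guarded pattern
val viaCollect = aList.collect { case al if al > 3 => al + 1 }
```

All three evaluate to List(5, 6).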