I'm trying to understand for comprehensions in Scala, and I have a lot of examples that I sort of understand...
One thing I'm having a hard time figuring out is for ( ) vs for { }. I've tried both, and it seems like I can do one thing in one but it breaks in the other.
For example, this does NOT work:
def encode(number: String): Set[List[String]] =
  if (number.isEmpty) Set(List())
  else {
    for (
      split <- 1 to number.length
      word <- wordsForNum(number take split)
      rest <- encode(number drop split)
    ) yield word :: rest
  }.toSet
However, if you change it to { }, it does compile:
def encode(number: String): Set[List[String]] =
  if (number.isEmpty) Set(List())
  else {
    for {
      split <- 1 to number.length
      word <- wordsForNum(number take split)
      rest <- encode(number drop split)
    } yield word :: rest
  }.toSet
These examples are from a Coursera class I'm taking. The professor didn't mention the "why" in the video & I was wondering if anyone else knows.
Thanks!
From the syntax in the spec, it might seem that parens and braces are interchangeable:
http://www.scala-lang.org/files/archive/spec/2.11/06-expressions.html#for-comprehensions-and-for-loops
but because the generators are separated by semis, the following rules kick in:
http://www.scala-lang.org/files/archive/spec/2.11/01-lexical-syntax.html#newline-characters
I have read and understood that section in the past; the gist, as I vaguely recall, is that newlines are enabled inside braces, which is to say, a newline char is taken as an nl token, which can serve as a semi.
So you can put the generators on separate lines instead of using semicolons.
This is the usual "semicolon inference" that lets you omit semicolons as statement terminators. A newline in the middle of a generator is not taken as a semi, though, since the expression isn't complete at that point; for instance:
scala> for (c <-
| List(1,2,3)
| ) yield c+1
res0: List[Int] = List(2, 3, 4)
scala> for { c <-
| List(1,2,3)
| i = c+1
| } yield i
res1: List[Int] = List(2, 3, 4)
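To tie this back to the question: since newlines are not inferred as separators inside parentheses, the parenthesized form compiles once you separate the generators with explicit semicolons. A minimal sketch with made-up values (not the question's encode):
// Inside ( ), the generators must be separated by explicit semicolons:
val pairs = for (
  x <- List(1, 2);
  y <- List("a", "b")
) yield (x, y)
// pairs: List[(Int, String)] = List((1,a), (1,b), (2,a), (2,b))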
In Scala, ( ) is usually for when you only have one statement (here, a single generator). Something like this would have worked:
def encode(number: String): Set[Int] =
  if (number.isEmpty) Set()
  else {
    for (
      split <- 1 to number.length // Only one generator in the comprehension
    ) yield split
  }.toSet
Add one more and it would fail to compile. The same is true for map, for example.
OK
List(1,2,3).map(number =>
number.toString
)
Not OK (have to use curly braces)
List(1,2,3).map(number =>
println("Hello world")
number.toString
)
Why that is, I have no idea. :)
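For what it's worth, the parenthesized call can still be made to compile by giving the lambda a block body of its own, since the argument then remains a single expression; a small sketch:
// The argument to map is one expression: a lambda whose body is a { } block.
List(1, 2, 3).map(number => {
  println("Hello world")
  number.toString
})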
Related
In the book "Scala for the impatient", it says on page 16
In Scala, a { } block contains a sequence of expressions, and the
result is also an expression. The value of the block is the value of
the last expression.
OK, then let's create a block and let the last value of the block be assigned:
scala> val evens = for (elem <- 1 to 10 if elem%2==0) {
| elem
| }
val evens: Unit = ()
I would have expected that evens is at least the last value of the sequence (i.e. 10). But why not?
You need to yield the value, then it's a for expression:
val evens = for (elem <- 1 to 10 if elem % 2 == 0) yield elem
Without that it's just a statement (does not return anything) and is translated to foreach.
P.S.: Of course this will return a collection of all the elements that fulfill the predicate and not the last one.
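Roughly speaking, the compiler translates the two forms like this (a sketch; the guard becomes withFilter as described elsewhere on this page):
// for (elem <- 1 to 10 if elem % 2 == 0) yield elem   becomes, roughly:
val evens = (1 to 10).withFilter(_ % 2 == 0).map(elem => elem)       // Vector(2, 4, 6, 8, 10)
// for (elem <- 1 to 10 if elem % 2 == 0) { elem }     becomes, roughly:
val nothing = (1 to 10).withFilter(_ % 2 == 0).foreach(elem => elem) // (), of type Unit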
When in doubt just run it through the typechecker to peek under the hood
scala -Xprint:typer -e 'val evens = for (elem <- 1 to 10 if elem%2==0) { elem }'
reveals
val evens: Unit =
scala.Predef
.intWrapper(1)
.to(10)
.withFilter(((elem: Int) => elem.%(2).==(0)))
.foreach[Int](((elem: Int) => elem))
where we see foreach to be the last step in the chain, and its signature is
def foreach[U](f: A => U): Unit
where we see it returns Unit. You can even do this straight from within the REPL by executing the following command
scala> :settings -Xprint:typer
and now you will get real-time desugaring of Scala expressions at the same time they are interpreted. You can even take it a step further and get at the JVM bytecode itself
scala> :javap -
For-comprehensions are some of the most prevalent syntactic sugar in Scala, so I would suggest drilling them as much as possible, perhaps by writing them in both their sugared and desugared form until it clicks: https://docs.scala-lang.org/tutorials/FAQ/yield.html
Unit is the exception to the rule stated in your book. Unit basically says "ignore whatever type the block would have returned because I only intended to execute the block for the side effects." Otherwise, in order to get it to typecheck, you'd have to add a unit value to the end of any block that was supposed to return Unit:
val evens = for (elem <- 1 to 10 if elem%2==0) {
elem
()
}
This throwing away of type information is one reason people tend to avoid imperative for loops and similar in Scala.
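If the goal really was "the last even number", the expression-oriented route is to build the collection with yield and then take what you need; a small sketch:
val evens = for (elem <- 1 to 10 if elem % 2 == 0) yield elem // Vector(2, 4, 6, 8, 10)
val lastEven = evens.last                                     // 10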
I am trying to write idiomatic Scala code to loop through two lists of lists and generate a new list containing only the differences between them.
In procedural Scala I would do something like this:
val first: List[List[Int]] = List(List(1,2,3,4,5),List(1,2,3,4,5), List(1,2,3,4,5))
val second: List[List[Int]] = List(List(1,2,3,4,5),List(1,23,3,45,5),List(1,2,3,4,5))
var diff: List[String] = List[String]()
for (i <- List.range(0, first.size)) {
  for (j <- List.range(0, first(0).size)) {
    println(first(i)(j) + " " + second(i)(j))
    if (first(i)(j) != second(i)(j)) diff = diff ::: (s"${second(i)(j)}" :: Nil)
  }
}
Of course I do not like this, I have attempted to write a solution using for comprehension, but without success.
The closest thing I could get to was this:
for {(lf,ls) <- (first zip second) } yield if (lf == ls) lf else ls
but with that for-comprehension I cannot generate a List of String, which is a different type from the input one.
Any suggestion?
The idiomatic Scala would be something like this:
(
  for {
    (row1, row2) <- (first, second).zipped   // go through rows with the same index
    (value1, value2) <- (row1, row2).zipped  // go through values with the same indexes
    if value1 != value2                      // leave only different values in the list
  } yield value2.toString
).toList
It's better to use zipped than zip here, because zipped doesn't build the whole zipped List in memory.
Also, you have to call toList at the end, due to a quirk in type inference.
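Note that on Scala 2.13 and later, zipped is deprecated in favour of lazyZip. If building the intermediate zipped lists is acceptable, plain zip works too and needs no trailing toList; a sketch of that variant:
val diff: List[String] =
  for {
    (row1, row2) <- first zip second    // pair up rows with the same index
    (value1, value2) <- row1 zip row2   // pair up values with the same index
    if value1 != value2
  } yield value2.toString
// diff: List[String] = List(23, 45)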
Something like this produces the same results
val diff2 = for {
  i <- first.indices
  j <- first(i).indices
  if first(i)(j) != second(i)(j)
} yield s"${second(i)(j)}"
println(diff2 mkString ", ") // prints 23, 45
But this of course fails with an IndexOutOfBoundsException if the lists are not all of the same size.
I have a huge file (does not fit into memory) which is tab separated with two columns (key and value), and pre-sorted on the key column. I need to call a function on all values for a key and write out the result. For simplicity, one can assume that the values are numbers and the function is addition.
So, given an input:
A 1
A 2
B 1
B 3
The output would be:
A 3
B 4
For this question, I'm not so much interested in reading/writing the file, but more in the list comprehension side. It is important though that the whole content (input as well as output) doesn't fit into memory. I'm new to Scala, and coming from Java I'm interested what would be the functional/Scala way to do that.
Update:
Based on AmigoNico's comment, I came up with the below constant memory solution.
Any comments / improvements are appreciated!
val writeAggr = (kv: (String, Int)) => { println(kv._1 + " " + kv._2) }
writeAggr(
  (("", 0) /: scala.io.Source.fromFile("/tmp/xx").getLines) { (keyAggr, line) =>
    val Array(k, v) = line split ' '
    if (keyAggr._1.equals(k)) {
      (k, keyAggr._2 + v.toInt)
    } else {
      if (!keyAggr._1.equals("")) {
        writeAggr(keyAggr)
      }
      (k, v.toInt)
    }
  }
)
This can be done quite elegantly with Scalaz streams (and unlike iterator-based solutions, it's "truly" functional):
import scalaz.stream._
val process =
io.linesR("input.txt")
.map { _.split("\\s") }
.map { case Array(k, v) => k -> v.toInt }
.pipe(process1.chunkBy2(_._1 == _._1))
.map { kvs => s"${ kvs.head._1 } ${ kvs.map(_._2).sum }\n" }
.pipe(text.utf8Encode)
.to(io.fileChunkW("output.txt"))
Not only will this read from the input, aggregate the lines, and write to the output in constant memory, but you also get nice guarantees about resource management that e.g. source.getLines can't offer.
You probably want to use a fold, like so:
scala> ( ( Map[String,Int]() withDefaultValue 0 ) /: scala.io.Source.fromFile("/tmp/xx").getLines ) { (map,line) =>
val Array(k,v) = line split ' '
map + ( k -> ( map(k) + v.toInt ) )
}
res12: scala.collection.immutable.Map[String,Int] = Map(A -> 3, B -> 4)
Folds are great for accumulating results (unlike for-comprehensions). And since getLines returns an Iterator, only one line is held in memory at a time.
UPDATE: OK, there is a new requirement that we not hold the results in memory either. In that case I think I'd just write a recursive function and use it like so:
scala> val kvPairs = scala.io.Source.fromFile("/tmp/xx").getLines map { line =>
val Array(k,v) = line split ' '
( k, v.toInt )
}
kvPairs: Iterator[(String, Int)] = non-empty iterator
scala> final def loop( key:String, soFar:Int ) {
         if ( kvPairs.hasNext ) {
           val (k,v) = kvPairs.next
           if ( k == key )
             loop( k, soFar+v )
           else {
             println( s"$key $soFar" )
             loop(k,v)
           }
         } else println( s"$key $soFar" )
       }
loop: (key: String, soFar: Int)Unit
scala> val (k,v) = kvPairs.next
k: String = A
v: Int = 1
scala> loop(k,v)
A 3
B 4
But the only thing functional about that is that it uses a recursive function rather than a loop. If you are OK with holding all of the values for a particular key in memory you could write a function that iterates over the lines of the file producing an Iterator of Iterators of like-keyed pairs, which you could then just sum and print, but the code would still not be particularly functional and it would be slower.
Travis's Scalaz pipeline solution looks like an interesting one along those lines, but with the iteration hidden behind some handy constructs. If you specifically want a functional solution, I'd say his is the best answer.
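As a variant of that recursion, here is a self-contained sketch that passes the iterator in explicitly and lets the compiler verify the tail call with @tailrec (the name sumByKey and the write parameter are mine, not from the question):
import scala.annotation.tailrec

def sumByKey(lines: Iterator[String], write: (String, Int) => Unit): Unit = {
  // Parse each "key value" line into a (String, Int) pair, one line at a time.
  val kvs = lines.map { line =>
    val Array(k, v) = line split ' '
    (k, v.toInt)
  }

  // Accumulate the running sum for the current key; emit it when the key changes.
  @tailrec
  def loop(key: String, soFar: Int): Unit =
    if (kvs.hasNext) {
      val (k, v) = kvs.next()
      if (k == key) loop(key, soFar + v)
      else { write(key, soFar); loop(k, v) }
    } else write(key, soFar)

  if (kvs.hasNext) {
    val (k, v) = kvs.next()
    loop(k, v)
  }
}

// sumByKey(scala.io.Source.fromFile("/tmp/xx").getLines(), (k, sum) => println(s"$k $sum"))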
Now, it took me a while to figure out why my recursion is somehow managing to blow the stack. Here it is, the part causing this problem:
scala> for {
| i <- List(1, 2, 3)
| j = { println("why am I evaluated?"); 10 } if false
| } yield (i, j)
why am I evaluated?
why am I evaluated?
why am I evaluated?
res0: List[(Int, Int)] = List()
Isn't this, like, insane? Why at all evaluate j = ... if it ends in if false and so will never be used?
I have now learned what happens when, instead of { println ... }, you have a recursive call there (with a recursion guard in place of the if false). :<
Why?!
I'm going to go out on a limb and say the accepted answer could say more.
This is a parser bug.
Guards can immediately follow a generator, but otherwise a semi is required (actual or inferred).
Here is the syntax.
In the following, the line for res4 should not compile.
scala> for (i <- (1 to 5).toList ; j = 2 * i if j > 4) yield j
res4: List[Int] = List(6, 8, 10)
scala> for (i <- (1 to 5).toList ; j = 2 * i ; if j > 4) yield j
res5: List[Int] = List(6, 8, 10)
What happens is that the val def of j gets merged with the i generator to make a new generator of pairs (i,j). Then the guard looks like it just follows the (synthetic) generator.
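A sketch of roughly what the compiler produces here, which is why the println runs before the guard ever comes into play (approximate; the real output uses pattern-matching functions and extra plumbing):
// for { i <- List(1, 2, 3); j = { println("why am I evaluated?"); 10 }; if false } yield (i, j)
// becomes, roughly:
List(1, 2, 3)
  .map { i => val j = { println("why am I evaluated?"); 10 }; (i, j) } // definition merged into the generator
  .withFilter { case (i, j) => false }                                 // guard now filters the synthetic (i, j) pairs
  .map { case (i, j) => (i, j) }                                       // the yield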
But the syntax is still wrong. Syntax is our friend! It was our BFF long before the type system.
On the line for res5, it's pretty obvious that the guard does not guard the val def.
Update:
The implementation bug was downgraded (or upgraded, depending on your perspective) to a specification bug.
Checking for this usage, where a guard looks like a trailing if controlling the valdef that precedes it, like in Perl, falls under the purview of your favorite style checker.
If you structure your loop like this, it will solve your problem:
scala> for {
| i <- List(1, 2, 3)
| if false
| j = { println("why am I evaluated?"); 10 }
| } yield (i, j)
res0: List[(Int, Int)] = List()
Scala syntax in a for-loop treats the if statement as a sort of filter; this tutorial has some good examples.
One way to think of it is to walk through the for loop imperatively, and when you reach an if statement, if that statement evaluates to false, you continue to the next iteration of the loop.
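And a matching sketch of what the reordered comprehension turns into, which is why the block defining j is never evaluated there:
// for { i <- List(1, 2, 3); if false; j = { println("why am I evaluated?"); 10 } } yield (i, j)
// becomes, roughly:
List(1, 2, 3)
  .withFilter(i => false)                                              // the guard filters the i generator directly
  .map { i => val j = { println("why am I evaluated?"); 10 }; (i, j) } // merged definition, never invoked
  .map { case (i, j) => (i, j) }                                       // the yield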
When I have questions like that, I like to see what the decompiled code looks like (feeding the .class files to JD-GUI, for instance).
The beginning of the decompiled code for this for-comprehension looks like this:
((TraversableLike)List..MODULE$.apply(Predef..MODULE$.wrapIntArray(new int[] { 1, 2, 3 })).map(new AbstractFunction1() { public static final long serialVersionUID = 0L;
public final Tuple2<Object, BoxedUnit> apply(int i) { Predef..MODULE$.println("why am I evaluated?"); BoxedUnit j = BoxedUnit.UNIT;
return new Tuple2(BoxesRunTime.boxToInteger(i),
j);
}
}...//continues
where we can see that the list of ints is mapped with an AbstractFunction1 whose apply method first performs the println no matter what, then assigns Unit to j, and finally returns a Tuple2 of (i, j), which is piped into the subsequent filter/map operations (omitted). So the if false condition does nothing to stop the println; it only comes into play after the tuple has already been built.
I understand Ruby and Python's yield. What does Scala's yield do?
I think the accepted answer is great, but it seems many people have failed to grasp some fundamental points.
First, Scala's for comprehensions are equivalent to Haskell's do notation: they are nothing more than syntactic sugar for the composition of multiple monadic operations. As that statement will most likely not help anyone who needs help, let's try again… :-)
Scala's for comprehensions are syntactic sugar for the composition of multiple operations with map, flatMap and filter. Or foreach. Scala actually translates a for-expression into calls to those methods, so any class providing them, or a subset of them, can be used with for comprehensions.
First, let's talk about the translations. There are very simple rules:
This
for(x <- c1; y <- c2; z <-c3) {...}
is translated into
c1.foreach(x => c2.foreach(y => c3.foreach(z => {...})))
This
for(x <- c1; y <- c2; z <- c3) yield {...}
is translated into
c1.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
This
for(x <- c; if cond) yield {...}
is translated on Scala 2.7 into
c.filter(x => cond).map(x => {...})
or, on Scala 2.8, into
c.withFilter(x => cond).map(x => {...})
with a fallback into the former if method withFilter is not available but filter is. Please see the section below for more information on this.
This
for(x <- c; y = ...) yield {...}
is translated into
c.map(x => (x, ...)).map { case (x, y) => {...} }
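To make that last rule concrete, here is a small worked instance with made-up values:
// for (x <- List(1, 2, 3); y = x * 10) yield x + y
// is translated, roughly, into:
List(1, 2, 3)
  .map(x => (x, x * 10))        // the definition is tupled together with the generator variable
  .map { case (x, y) => x + y } // the yield then destructures the pair
// => List(11, 22, 33)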
When you look at very simple for comprehensions, the map/foreach alternatives look, indeed, better. Once you start composing them, though, you can easily get lost in parenthesis and nesting levels. When that happens, for comprehensions are usually much clearer.
I'll show one simple example, and intentionally omit any explanation. You can decide which syntax was easier to understand.
l.flatMap(sl => sl.filter(el => el > 0).map(el => el.toString.length))
or
for {
sl <- l
el <- sl
if el > 0
} yield el.toString.length
withFilter
Scala 2.8 introduced a method called withFilter, whose main difference is that, instead of returning a new, filtered, collection, it filters on-demand. The filter method has its behavior defined based on the strictness of the collection. To understand this better, let's take a look at some Scala 2.7 with List (strict) and Stream (non-strict):
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> Stream.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
The difference happens because filter is immediately applied with List, returning a list of odds -- since found is false. Only then foreach is executed, but, by this time, changing found is meaningless, as filter has already executed.
In the case of Stream, the condition is not immediately applied. Instead, as each element is requested by foreach, filter tests the condition, which enables foreach to influence it through found. Just to make it clear, here is the equivalent for-comprehension code:
for (x <- List.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
for (x <- Stream.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
This caused many problems, because people expected the if to be considered on-demand, instead of being applied to the whole collection beforehand.
Scala 2.8 introduced withFilter, which is always non-strict, no matter the strictness of the collection. The following example shows List with both methods on Scala 2.8:
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> List.range(1,10).withFilter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
This produces the result most people expect, without changing how filter behaves. As a side note, Range was changed from non-strict to strict between Scala 2.7 and Scala 2.8.
It is used in sequence comprehensions (like Python's list-comprehensions and generators, where you may use yield too).
It is applied in combination with for and writes a new element into the resulting sequence.
Simple example (from scala-lang)
/** Turn command line arguments to uppercase */
object Main {
def main(args: Array[String]) {
val res = for (a <- args) yield a.toUpperCase
println("Arguments: " + res.toString)
}
}
The corresponding expression in F# would be
[ for a in args -> a.ToUpper() ]
or
from a in args select a.ToUpper()
in LINQ.
Ruby's yield has a different effect.
Yes, as Earwicker said, it's pretty much the equivalent to LINQ's select and has very little to do with Ruby's and Python's yield. Basically, where in C# you would write
from ... select ???
in Scala you have instead
for ... yield ???
It's also important to understand that for-comprehensions don't just work with sequences, but with any type which defines certain methods, just like LINQ (a small sketch follows the list below):
If your type defines just map, it allows for-expressions consisting of a
single generator.
If it defines flatMap as well as map, it allows for-expressions consisting
of several generators.
If it defines foreach, it allows for-loops without yield (both with single and multiple generators).
If it defines filter, it allows for-filter expressions starting with an if
in the for expression.
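As a small illustration of that list, here is a hypothetical type that is not a collection at all but still works in a for-expression, simply because it defines map and flatMap:
// A made-up container; two generators work because it has map and flatMap.
case class Box[A](value: A) {
  def map[B](f: A => B): Box[B] = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
}

val result = for {
  a <- Box(2)
  b <- Box(21)
} yield a * b
// result: Box[Int] = Box(42)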
Unless you get a better answer from a Scala user (which I'm not), here's my understanding.
It only appears as part of an expression beginning with for, which states how to generate a new list from an existing list.
Something like:
var doubled = for (n <- original) yield n * 2
So there's one output item for each input (although I believe there's a way of dropping duplicates).
This is quite different from the "imperative continuations" enabled by yield in other languages, where it provides a way to generate a list of any length, from some imperative code with almost any structure.
(If you're familiar with C#, it's closer to LINQ's select operator than it is to yield return).
Consider the following for-comprehension
val A = for (i <- Int.MinValue to Int.MaxValue; if i > 3) yield i
It may be helpful to read it out loud as follows
"For each integer i, if it is greater than 3, then yield (produce) i and add it to the list A."
In terms of mathematical set-builder notation, the above for-comprehension is analogous to
A = { i ∈ ℤ : i > 3 }
which may be read as
"For each integer i, if it is greater than 3, then it is a member of the set A."
or alternatively as
"A is the set of all integers i, such that each i is greater than 3."
The keyword yield in Scala is simply syntactic sugar which can be easily replaced by a map, as Daniel Sobral already explained in detail.
On the other hand, yield is absolutely misleading if you are looking for generators (or continuations) similar to those in Python. See this SO thread for more information: What is the preferred way to implement 'yield' in Scala?
yield works like a for loop with an invisible buffer: on each iteration, the yielded value is appended to the buffer, and when the loop finishes, the whole collection of yielded values is returned. yield can be used with simple arithmetic expressions or in combination with collections.
Here are two simple examples for better understanding:
scala> for (i <- 1 to 5) yield i * 3
res: scala.collection.immutable.IndexedSeq[Int] = Vector(3, 6, 9, 12, 15)
scala> val nums = Seq(1,2,3)
nums: Seq[Int] = List(1, 2, 3)
scala> val letters = Seq('a', 'b', 'c')
letters: Seq[Char] = List(a, b, c)
scala> val res = for {
| n <- nums
| c <- letters
| } yield (n, c)
res: Seq[(Int, Char)] = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c), (3,a), (3,b), (3,c))
Hope this helps!!
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.filter(_ > 3).map(_ + 1)
println( res3 )
println( res4 )
These two pieces of code are equivalent.
val res3 = for (al <- aList) yield al + 1 > 3
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
These two pieces of code are also equivalent.
Map is as flexible as yield and vice-versa.
val doubledNums = for (n <- nums) yield n * 2
val ucNames = for (name <- names) yield name.capitalize
Notice that both of those for-expressions use the yield keyword:
Using yield after for is the “secret sauce” that says, “I want to yield a new collection from the existing collection that I’m iterating over in the for-expression, using the algorithm shown.”
taken from here
The Scala documentation puts it plainly: "yield a new collection from the existing collection".
Another Scala documentation says, "Scala offers a lightweight notation for expressing sequence comprehensions. Comprehensions have the form for (enums) yield e, where enums refers to a semicolon-separated list of enumerators. An enumerator is either a generator which introduces new variables, or it is a filter. "
yield is more flexible than map(); see the example below:
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
yield will print result like: List(5, 6), which is good
while map() will return result like: List(false, false, true, true, true), which probably is not what you intend.
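That said, the guard in the comprehension corresponds to a filter step rather than to map itself, so the same result is also available directly from the collection API, for instance with collect, which filters and transforms in one pass (a sketch):
val aList = List(1, 2, 3, 4, 5)

// Guard and transformation in one step; same result as the yield version above.
val collected = aList.collect { case al if al > 3 => al + 1 } // List(5, 6)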