Scala yield gives unexpected result

I tried the following two versions of code to understand how yield in Scala works. I am unable to understand why I get two different results.
In this version, I call yield and the expression simply multiplies by 2. I get a Vector of the first 10 multiples of 2. That makes sense to me.
scala> val r = for (j <- 1 to 10) yield {
| (j*2).toString //multiply by 2
| }
r: scala.collection.immutable.IndexedSeq[String] = Vector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20) // got multiples of 2. This looks ok
Interestingly, in this version all I have done is store the result of the multiplication by 2 in a val. But now I get a vector that contains only () values! Why is this?
scala> val r = for (j <- 1 to 10) yield {
| val prod = (j*2).toString //multiply by 2 but store in a val
| }
r: scala.collection.immutable.IndexedSeq[Unit] = Vector((), (), (), (), (), (), (), (), (), ()) // ten Unit values instead of the products
I thought that maybe val prod = (j*2).toString results in Unit, but when I try the following standalone expression in the Scala interpreter, I can see that prod is a String:
scala> val prod = 2.toString()
prod: String = 2

In Scala every expression returns something. A val definition, however, is a statement rather than an expression, and a block whose last statement is a definition evaluates to Unit. Assignments are likewise designed to return Unit, for performance reasons (see What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?).
In the last example of your snippet the REPL prints prod: String = 2, which only means that prod is bound to a value; the REPL shows it for your convenience. But try { val prod = 2.toString() }:
scala> { val prod = 2.toString() }
scala> println(res0)
()
() is the only possible value of Unit type.
(I'm not sure why res0 was not shown after the first line, because resN-like values in the REPL collect all results that are not explicitly assigned.)
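To make the point above concrete, here is a minimal sketch (the object and value names are made up for illustration). It binds the block to an explicitly typed val instead of relying on resN, and shows one way to get the strings back: make the val the last expression of the yield block.

```scala
object YieldValue {
  // A block that ends with a definition has type Unit,
  // so this compiles only because the declared type is Unit.
  val unitBlock: Unit = { val prod = 2.toString }

  // Ending the block with an expression (here: prod) makes the
  // for comprehension yield that expression's value.
  val r: IndexedSeq[String] = for (j <- 1 to 10) yield {
    val prod = (j * 2).toString // multiply by 2 and store in a val
    prod                        // the value of the block
  }
}
```

With the trailing prod in place, r holds the strings "2" through "20" again.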

Related

Need some clarity on for loop usage in Spark scala

I am trying to run the code below to create pairs using a Spark RDD. When I run the code for only one mapping it works fine, but when I use a for loop to iterate over all the elements I do not get the expected output.
val file = sc.textFile("filepath")
file.collect.foreach(println)
1,Abc,300
2,Def,200
3,Xyz,400
file.map(x => x.split(",")).map(x => (x(0)->x(1))).collect.foreach(println)
The output is as expected:
(1,Abc)
(2,Def)
(3,Xyz)
Using a for loop:
file.map(x => x.split(",")).map(x => {
for(i <- 0 to 2){
x(0) -> x(i)
}
}).collect.foreach(println)
The output is (which is not what I expected):
()
()
()
The expected output is:
(1,1)
(2,2)
(3,3)
(1,Abc)
(2,Def)
(3,Xyz)
(1,300)
(2,200)
(3,400)
I tried using yield in the for loop but got some syntax errors.
First, let me explain the output you obtain. A for loop simply returns an object of type Unit, regardless of what's in it. Here is a way to verify that using the REPL:
scala> val test = for(i<- 0 to 2) { i }
test: Unit = ()
NB: () is the only object of type Unit
If you want to change that, you need to use yield, as you suggested. Here is an example:
scala> val test = for(i<- 0 to 2) yield { i }
test: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 2)
That's more like it.
In your case, adding yield is not enough. It would yield collections of tuples like this:
Vector((1,1), (1,Abc), (1,300))
Vector((2,2), (2,Def), (2,200))
Vector((3,3), (3,Xyz), (3,400))
What you need is the flatMap function, which flattens the collections (i.e. it transforms an RDD of collections of elements into an RDD of elements).
file.map(x => x.split(",")).flatMap(x => {
for(i <- 0 to 2) yield {
x(0) -> x(i)
}
}).collect.foreach(println)
which gives you what you expect:
(1,1)
(1,Abc)
(1,300)
(2,2)
(2,Def)
(2,200)
(3,3)
(3,Xyz)
(3,400)
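The difference between map and flatMap can also be tried without a Spark cluster. The sketch below uses a plain Scala Seq in place of the RDD (an assumption made only to keep it self-contained; the object and value names are invented). The last value additionally shows how, on a local collection, putting the column index in the outer generator produces the column-by-column ordering listed in the question.

```scala
object FlatMapPairs {
  // Stand-in for the lines of the file read by sc.textFile.
  val lines: Seq[String] = Seq("1,Abc,300", "2,Def,200", "3,Xyz,400")
  val rows: Seq[Array[String]] = lines.map(_.split(","))

  // map + yield: one collection of pairs per input line.
  val nested: Seq[IndexedSeq[(String, String)]] =
    rows.map(x => for (i <- 0 to 2) yield x(0) -> x(i))

  // flatMap + yield: the per-line collections are flattened into pairs.
  val flat: Seq[(String, String)] =
    rows.flatMap(x => for (i <- 0 to 2) yield x(0) -> x(i))

  // Column index as the outer generator: pairs grouped by column.
  val byColumn: Seq[(String, String)] =
    for { i <- 0 to 2; x <- rows } yield x(0) -> x(i)
}
```

Here flat starts with (1,1), (1,Abc), (1,300), while byColumn starts with (1,1), (2,2), (3,3).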

Strange (?) for comprehension evaluation in Scala

Now, it took me a while to figure out why my recursion is somehow managing to blow the stack. Here it is, the part causing this problem:
scala> for {
| i <- List(1, 2, 3)
| j = { println("why am I evaluated?"); 10 } if false
| } yield (i, j)
why am I evaluated?
why am I evaluated?
why am I evaluated?
res0: List[(Int, Int)] = List()
Isn't this, like, insane? Why at all evaluate j = ... if it ends in if false and so will never be used?
What happens when instead of { println ... } you have a recursive call (and recursion guard instead of if false), I have learned. :<
Why?!
I'm going to go out on a limb and say the accepted answer could say more.
This is a parser bug.
Guards can immediately follow a generator, but otherwise a semi is required (actual or inferred).
Here is the syntax.
In the following, the line for res4 should not compile.
scala> for (i <- (1 to 5).toList ; j = 2 * i if j > 4) yield j
res4: List[Int] = List(6, 8, 10)
scala> for (i <- (1 to 5).toList ; j = 2 * i ; if j > 4) yield j
res5: List[Int] = List(6, 8, 10)
What happens is that the val def of j gets merged with the i generator to make a new generator of pairs (i,j). Then the guard looks like it just follows the (synthetic) generator.
But the syntax is still wrong. Syntax is our friend! It was our BFF long before the type system.
On the line for res5, it's pretty obvious that the guard does not guard the val def.
Update:
The implementation bug was downgraded (or upgraded, depending on your perspective) to a specification bug.
Checking for this usage, where a guard looks like a trailing if controlling the valdef that precedes it, like in Perl, falls under the purview of your favorite style checker.
If you structure your loop like this, it will solve your problem:
scala> for {
| i <- List(1, 2, 3)
| if false
| j = { println("why am I evaluated?"); 10 }
| } yield (i, j)
res0: List[(Int, Int)] = List()
Scala syntax in a for-loop treats the if statement as a sort of filter; this tutorial has some good examples.
One way to think of it is to walk through the for loop imperatively, and when you reach an if statement, if that statement evaluates to false, you continue to the next iteration of the loop.
When I have questions like that, I look at what the decompiled code looks like (feeding the .class files to JD-GUI, for instance).
The beginning of the decompiled code for this for-comprehension looks like this:
((TraversableLike)List..MODULE$.apply(Predef..MODULE$.wrapIntArray(new int[] { 1, 2, 3 })).map(new AbstractFunction1() { public static final long serialVersionUID = 0L;
public final Tuple2<Object, BoxedUnit> apply(int i) { Predef..MODULE$.println("why am I evaluated?"); BoxedUnit j = BoxedUnit.UNIT;
return new Tuple2(BoxesRunTime.boxToInteger(i),
j);
}
}...//continues
where we can see that the list of ints is mapped with an AbstractFunction1 whose apply method first performs the println no matter what, then assigns j, and finally returns a tuple (i, j) that is piped into the subsequent filter/map operations (omitted). So the if false condition cannot prevent the evaluation of j: the guard is applied only afterwards, as a filter on the (i, j) pairs.
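The same behaviour can be reproduced without a decompiler. The sketch below is a hand-written approximation of the translation, not the compiler's literal output, and all names are invented. It counts how often the right-hand side of j = ... runs when the definition comes before the guard, when the comprehension is written out as map/withFilter/map, and when the guard comes first.

```scala
object GuardOrder {
  private val xs = List(1, 2, 3)

  // Definition before the guard: the right-hand side runs for every i.
  def definitionFirst(): (List[(Int, Int)], Int) = {
    var evaluations = 0
    val result = for {
      i <- xs
      j = { evaluations += 1; 10 }
      if false
    } yield (i, j)
    (result, evaluations)
  }

  // Approximate manual translation: j is computed inside map,
  // and the guard only filters the resulting (i, j) pairs.
  def desugared(): (List[(Int, Int)], Int) = {
    var evaluations = 0
    val result = xs
      .map { i => val j = { evaluations += 1; 10 }; (i, j) }
      .withFilter { case (_, _) => false }
      .map { case (i, j) => (i, j) }
    (result, evaluations)
  }

  // Guard before the definition: nothing passes the filter,
  // so the right-hand side is never evaluated.
  def guardFirst(): (List[(Int, Int)], Int) = {
    var evaluations = 0
    val result = for {
      i <- xs
      if false
      j = { evaluations += 1; 10 }
    } yield (i, j)
    (result, evaluations)
  }
}
```

All three return an empty list; the evaluation counts are 3, 3 and 0 respectively.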

Scala for comprehension of sequence inside a Try

I am writing a Scala program in which there is an operation that creates a sequence. The operation might fail, so I enclose it inside a Try. I want to do sequence creation and enumeration inside a for comprehension, so that a successfully-created sequence yields a sequence of tuples where the first element is the sequence and the second is an element of it.
To simplify the problem, make my sequence a Range of integers and define a createRange function that fails if it is asked to create a range of an odd length. Here is a simple for comprehension that does what I want.
import scala.util.Try
def createRange(n: Int): Try[Range] = {
Try {
if (n % 2 == 1) throw new Exception
else Range(0, n)
}
}
def rangeElements(n: Int): Unit = {
for {
r <- createRange(n)
x <- r
} println(s"$r\t$x")
}
def main(args: Array[String]): Unit = {
println("Range length 3")
rangeElements(3)
println("Range length 4")
rangeElements(4)
}
If you run this it correctly prints.
Range length 3
Range length 4
Range(0, 1, 2, 3) 0
Range(0, 1, 2, 3) 1
Range(0, 1, 2, 3) 2
Range(0, 1, 2, 3) 3
Now I would like to rewrite my rangeElements function so that, instead of printing as a side effect, it returns a sequence of (range, element) tuples, where the sequence is empty if the range was not created. What I want to write is this.
def rangeElements(n: Int):Seq[(Range,Int)] = {
for {
r <- createRange(n)
x <- r
} yield (r, x)
}
// rangeElements(3) returns an empty sequence
// rangeElements(4) returns the sequence (Range(0,1,2,3), 0), (Range(0,1,2,3), 1) etc.
This gives me two type mismatch compiler errors. The r <- createRange(n) line required Seq[Int] but found scala.util.Try[Nothing]. The x <- r line required scala.util.Try[?] but found scala.collection.immutable.IndexedSeq[Int].
Presumably there is some type erasure with the Try that is messing me up, but I can't figure out what it is. I've tried various toOption and toSeq qualifiers on the lines in the for comprehension to no avail.
If I only needed to yield the range elements I could explicitly handle the Success and Failure conditions of createRange myself as suggested by the first two answers below. However, I need access to both the range and its individual elements.
I realize this is a strange-sounding example. The real problem I am trying to solve is a complicated recursive search, but I don't want to add in all its details because that would just confuse the issue here.
How do I write rangeElements to yield the desired sequences?
The problem becomes clear if you translate the for comprehension to its map/flatMap implementation (as described in the Scala Language Spec 6.19). The flatMap has the result type Try[U] but your function expects Seq[Int].
for {
r <- createRange(n)
x <- r
} yield x
createRange(n).flatMap {
case r => r.map {
case x => x
}
}
Is there any reason why you don't use the getOrElse method?
def rangeElements(n: Int):Seq[Int] =
createRange(n) getOrElse Seq.empty
The Try will be Success with a Range when n is even or a Failure with an Exception when n is odd. In rangeElements match and extract those values. Success will contain the valid Range and Failure will contain the Exception. Instead of returning the Exception return an empty Seq.
import scala.util.{Try, Success, Failure}
def createRange(n: Int): Try[Range] = {
Try {
if (n % 2 == 1) throw new Exception
else Range(0, n)
}
}
def rangeElements(n: Int):Seq[Tuple2[Range, Int]] = createRange(n) match {
case Success(s) => s.map(xs => (s, xs))
case Failure(f) => Seq()
}
scala> rangeElements(3)
res35: Seq[(Range, Int)] = List()
scala> rangeElements(4)
res36: Seq[(Range, Int)] = Vector((Range(0, 1, 2, 3),0), (Range(0, 1, 2, 3),1), (Range(0, 1, 2, 3),2), (Range(0, 1, 2, 3),3))
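If the for comprehension itself should be kept, one option is to turn the Try into a Seq first, so that both generators work on sequences. This is a sketch of that idea rather than something taken from the answers above; the enclosing object name is invented, and a Failure is silently mapped to an empty sequence.

```scala
import scala.util.Try

object RangeElements {
  def createRange(n: Int): Try[Range] =
    Try {
      if (n % 2 == 1) throw new Exception("odd length")
      else Range(0, n)
    }

  // toOption.toSeq yields a Seq with zero or one Range,
  // so flatMap/map are now called on Seq rather than on Try.
  def rangeElements(n: Int): Seq[(Range, Int)] =
    for {
      r <- createRange(n).toOption.toSeq
      x <- r
    } yield (r, x)
}
```

rangeElements(3) is empty, and rangeElements(4) pairs the range with each of 0, 1, 2 and 3.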

Scala finding more elegant way

I am new to Scala and functional programming.
I was solving a problem where you have to read a number, and then that many integers. After that you should calculate the sum of all digits in all the integers.
Here is my code
def sumDigits(line: String) =
line.foldLeft(0)(_ + _.toInt - '0'.toInt)
def main(args: Array[String]): Unit = {
val numberOfLines = Console.readInt
val lines = for (i <- 1 to numberOfLines) yield Console.readLine
println(lines.foldLeft(0)( _ + sumDigits(_)))
}
Is there a more elegant or efficient way?
sumDigits() can be implemented more easily with sum:
def sumDigits(line: String) = line.map(_.asDigit).sum
Second foldLeft() can also be replaced with sum:
lines.map(sumDigits).sum
Which brings us to the final version (notice there is no main; instead we extend App):
object Main extends App {
def sumDigits(line: String) = line.map(_.asDigit).sum
val lines = for (_ <- 1 to Console.readInt) yield Console.readLine
println(lines.map(sumDigits).sum)
}
Or if you really want to squeeze as much as possible in one line, inline sumDigits (not recommended):
lines.map(_.map(_.asDigit).sum).sum
I like compact code, so (if I were really going for brevity) I might write:
object Reads extends App {
import Console._
println( Seq.fill(readInt){readLine.map(_ - '0').sum}.sum )
}
which sets the number of lines inline and does the processing as you go. No error checking, though. You could throw in a .filter(_.isDigit) right after the readLine to at least discard non-digits. You might also def p[A](a: A) = { println(a); a } and wrap the reads in p so you can see what had been typed (by default on some platforms at least there's no echo to screen).
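As a sketch of the .filter(_.isDigit) suggestion, the version below keeps the digit summing in pure functions that can be checked without typing input. It reads through scala.io.StdIn, which replaces the Console.readInt and Console.readLine methods in newer Scala versions; the object and method names other than sumDigits are invented here, and end of input is not handled.

```scala
import scala.io.StdIn

object DigitSum {
  // Sum the decimal digits of one line, ignoring non-digit characters.
  def sumDigits(line: String): Int =
    line.filter(_.isDigit).map(_.asDigit).sum

  // Sum the digits over all lines.
  def totalDigits(lines: Seq[String]): Int =
    lines.map(sumDigits).sum

  def main(args: Array[String]): Unit = {
    val numberOfLines = StdIn.readInt()
    // Seq.fill evaluates its second argument once per element.
    val lines = Seq.fill(numberOfLines)(StdIn.readLine())
    println(totalDigits(lines))
  }
}
```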
One-liner Answer:
Iterator.continually(Console.readLine).take(Console.readInt).toList.flatten.map(_.asDigit).sum
To start with, you have to parse the line to split it into its decimal integer substrings:
val numbers = "5 1 4 9 16 25"
val ints = numbers.split("\\s+").toList.map(_.toInt)
Then you want to pull off the first one as the count and keep the rest to decode and sum:
val count :: numbers = ints
Then use the built-in sum method:
val sum = numbers.sum
Altogether in the REPL:
scala> val numbers = "5 1 4 9 16 25"
numbers: String = 5 1 4 9 16 25
scala> val ints = numbers.split("\\s+").toList.map(_.toInt)
ints: List[Int] = List(5, 1, 4, 9, 16, 25)
scala> val count :: numbers = ints
count: Int = 5
numbers: List[Int] = List(1, 4, 9, 16, 25)
scala> val sum = numbers.sum
sum: Int = 55
If you want to do something with the leading number count, you could verify that it's correct:
scala> assert(count == numbers.length)
Which produces no output, since the assertion passes.

General comprehensions in Scala

As far as I understand, the Scala for-comprehension notation relies on the first generator to define how elements are to be combined. Namely, for (i <- list) yield i returns a list and for (i <- set) yield i returns a set.
I was wondering if there was a way to specify how elements are combined independently of the properties of the first generator. For instance, I would like to get "the set of all elements from a given list", or "the sum of all elements from a given set". The only way I have found is to first build a list or a set as prescribed by the for-comprehension notation, then apply a transformation function to it - building a useless data structure in the process.
What I have in mind is a general "algebraic" comprehension notation as it exists for instance in Ateji PX:
`+ { i | int i : set } // the sum of all elements from a given set
set() { i | int i : list } // the set of all elements from a given list
concat(",") { s | String s : list } // string concatenation with a separator symbol
Here the first element (`+, set(), concat(",")) is a so-called "monoid" that defines how elements are combined, independently of the structure of the first generator (there can be multiple generators and filters, I just tried to keep the examples concise).
Any idea how to achieve a similar result in Scala while keeping a nice and concise notation? As far as I understand, the for-comprehension notation is hard-wired in the compiler and cannot be extended.
Thanks for your feedback.
About the for comprehension
The for comprehension in scala is syntactic sugar for calls to flatMap, filter, map and foreach. In exactly the same way as calls to those methods, the type of the target collection leads to the type of the returned collection. That is:
list map f //is a List
vector map f // is a Vector
This property is one of the underlying design goals of the scala collections library and would be seen as desirable in most situations.
Answering the question
You do not need to construct any intermediate collection of course:
(list.view map (_.prop)).toSet //uses list.view
(list.iterator map (_.prop)).toSet //uses iterator
(for { l <- list.view} yield l.prop).toSet //uses view
(Set.empty[Prop] /: coll) { _ + _.prop } //uses foldLeft
These all yield Sets without generating unnecessary collections. My personal preference is for the first. In terms of idiomatic Scala collection manipulation, each "collection" comes with these methods:
//Conversions
toSeq
toSet
toArray
toList
toIndexedSeq
iterator
toStream
//Strings
mkString
//accumulation
sum
The last is used where the element type of a collection has an implicit Numeric instance in scope; such as:
Set(1, 2, 3, 4).sum //10
Set('a, 'b).sum //does not compile
Note that the String concatenation example in scala looks like:
list.mkString(",")
And in the scalaz FP library might look something like (which uses Monoid to sum Strings):
list.intercalate(",").asMA.sum
Your suggestions do not look anything like Scala; I'm not sure whether they are inspired by another language.
foldLeft? That's what you're describing.
The sum of all elements from a given set:
(0 /: Set(1,2,3))(_ + _)
The set of all elements from a given list:
(Set[Int]() /: List(1,2,3,2,1))((acc,x) => acc + x)
String concatenation with a separator symbol:
("" /: List("a", "b"))(_ + _) // plain concatenation, no separator; with a separator it is a bit more verbose:
("" /: List("a", "b"))((acc,x) => acc + (if (acc == "") "" else ",") + x)
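The /: operator is an alias for foldLeft and is deprecated in recent Scala versions, so the three folds above can also be written with the named method. This is a minimal sketch with invented names; like the original, the separator logic treats an empty accumulator as "nothing joined yet".

```scala
object FoldExamples {
  // The sum of all elements from a given set.
  val total: Int = Set(1, 2, 3).foldLeft(0)(_ + _)

  // The set of all elements from a given list.
  val distinct: Set[Int] =
    List(1, 2, 3, 2, 1).foldLeft(Set.empty[Int])((acc, x) => acc + x)

  // String concatenation with a separator symbol.
  val joined: String =
    List("a", "b").foldLeft("")((acc, x) => if (acc.isEmpty) x else acc + "," + x)
}
```

For the string case, mkString(",") gives the same result on this input with less ceremony.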
You can also force the result type of the for comprehension by explicitly supplying the implicit CanBuildFrom parameter as scala.collection.breakOut and specifying the result type.
Consider this REPL session:
scala> val list = List(1, 1, 2, 2, 3, 3)
list: List[Int] = List(1, 1, 2, 2, 3, 3)
scala> val res = for(i <- list) yield i
res: List[Int] = List(1, 1, 2, 2, 3, 3)
scala> val res: Set[Int] = (for(i <- list) yield i)(collection.breakOut)
res: Set[Int] = Set(1, 2, 3)
It results in a type error when not specifying the CanBuildFrom explicitly:
scala> val res: Set[Int] = for(i <- list) yield i
<console>:8: error: type mismatch;
found : List[Int]
required: Set[Int]
val res: Set[Int] = for(i <- list) yield i
^
For a deeper understanding of this I suggest the following read:
http://www.scala-lang.org/docu/files/collections-api/collections-impl.html
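breakOut relies on CanBuildFrom, which was removed in the collections redesign of Scala 2.13. A version-independent alternative, sketched below with invented names, is to run the for comprehension over an iterator and pick the combining operation at the end. The iterator is consumed lazily, so no intermediate List is built.

```scala
object IteratorComprehension {
  val list: List[Int] = List(1, 1, 2, 2, 3, 3)

  // The set of all elements from a given list.
  val asSet: Set[Int] = (for (i <- list.iterator) yield i).toSet

  // The sum of all elements.
  val total: Int = (for (i <- list.iterator) yield i).sum

  // String concatenation with a separator symbol.
  val joined: String = (for (i <- list.iterator) yield i.toString).mkString(",")
}
```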
If you want to use for comprehensions and still be able to combine your values in some result value you could do the following.
case class WithCollector[B, A](init: B)(p: (B, A) => B) {
var x: B = init
val collect = { (y: A) => { x = p(x, y) } }
def apply(pr: (A => Unit) => Unit) = {
pr(collect)
x
}
}
// Some examples
object Test {
def main(args: Array[String]): Unit = {
// It's still functional
val r1 = WithCollector[Int, Int](0)(_ + _) { collect =>
for (i <- 1 to 10; if i % 2 == 0; j <- 1 to 3) collect(i + j)
}
println(r1) // 120
import collection.mutable.Set
val r2 = WithCollector[Set[Int], Int](Set[Int]())(_ += _) { collect =>
for (i <- 1 to 10; if i % 2 == 0; j <- 1 to 3) collect(i + j)
}
println(r2) // Set(9, 10, 11, 6, 13, 4, 12, 3, 7, 8, 5)
}
}