Scala partially applied curried functions

Why can't I rewrite
println(abc.foldRight(0)((a,b) => math.max(a.length,b)))
in
object Main {
  def main(args: Array[String]) {
    val abc = Array[String]("a", "abc", "erfgg", "r")
    println(abc.foldRight(0)((a, b) => math.max(a.length, b)))
  }
}
to
println(abc.foldRight(0)(math.max(_.length,_)))
? The Scala interpreter yields
/path/to/Main.scala:4: error: wrong number of parameters; expected = 2
println(abc.foldRight(0)(math.max(_.length,_)))
^
one error found
That message is not descriptive enough for me. Doesn't the resulting lambda take two parameters, the first of which has its .length method called, as in abc.map(_.length)?

abc.foldRight(0)(math.max(_.length, _)) expands to something like abc.foldRight(0)(y => math.max(x => x.length, y)). Placeholder syntax expands within the nearest enclosing parentheses, except when the underscore stands alone, in which case it expands outside the closest pair of parentheses.
You can use abc.foldRight(0)(_.length max _), which doesn't suffer from this drawback.
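For comparison, both working spellings in the REPL (outputs as I'd expect from a 2.x REPL):
scala> val abc = Array[String]("a", "abc", "erfgg", "r")
abc: Array[String] = Array(a, abc, erfgg, r)

scala> abc.foldRight(0)((a, b) => math.max(a.length, b))
res0: Int = 5

scala> abc.foldRight(0)(_.length max _)
res1: Int = 5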

Related

Swapping tuples of different types in Scala

I'm trying to write a simple function that will swap a (Int, String) tuple in Scala. I've tried a number of things and I keep getting compiler errors, for example:
def swap(p: (Int, String)): (String, Int) = {
  var s = p._1
  var t = p._2
  var u = (p._2, p._1)
}
[error] found : Unit
[error] required: (String, Int)
[error] }
[error] ^
[error] one error found
Why does it keep saying it finds a "Unit" type? I've tried different variations, and even using the "swap" function built into Scala, but I keep getting these kinds of errors stating that my return type isn't (String, Int). Any suggestions are appreciated. Thank you!
The return value of a method (or more generally, the value of any block) is the value of the last expression inside the block. The last expression in your block is
var u = (p._2, p._1)
The value of an assignment is (), the singleton value of the Unit type: an assignment is a side effect, it doesn't have a value, and () is the value (and Unit the type) that denotes the absence of a value (think "void" if you are familiar with C, C++, D, Objective-C, Objective-C++, Java, or C♯). Ergo, your method returns (), which is of type Unit.
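You can check this in the REPL; a block whose last statement is a variable definition has type Unit:
scala> val u = { var t = (2, "two") }
u: Unit = ()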
Here's a more Scala-ish way to write your method:
def swap[A, B](p: (A, B)) = (p._2, p._1)
All you need is this:
def swap(p: (Int,String)): (String,Int) = (p._2, p._1)
And to make it work on any tuple (of size 2):
def swap[A,B](p: (A,B)): (B,A) = (p._2, p._1)
In Scala, the last expression in a function is the returned value. Scala also supports an explicit return expression, which would look like this:
def swap(p: (Int,String)): (String,Int) = {
  return (p._2, p._1)
}
or more like what you intended:
def swap(p: (Int,String)): (String,Int) = {
  val result = (p._2, p._1)
  return result
}
Keep in mind this explicit return syntax is not recommended.
Because Scala is a functional language, everything is an expression. An expression is anything you can evaluate to get back a resulting value, which, being a value, has a type.
Even things you would think of as "statements", like println("a") or var a = 1, are expressions. When evaluated, they return a meaningless/empty value of type Unit. Your function therefore returns the value of its last statement, a variable assignment, which has type Unit.
You can also achieve it using pattern matching and a function literal:
def swap[X, Y]: ((X, Y)) => (Y, X) = { case (x, y) => (y, x) }
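All the variants behave the same at the call site; a quick REPL check of the generic version:
scala> def swap[A, B](p: (A, B)): (B, A) = (p._2, p._1)
swap: [A, B](p: (A, B))(B, A)

scala> swap((1, "one"))
res0: (String, Int) = (one,1)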

Spark Cassandra Connector: for comprehension error (type mismatch)

Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first line of the for comprehension is commented out, the code works. I ultimately want a Set[Int] instead of 1 to 2, but it serves to show the problem. The first two lines of the for should not need a type ascription, but I include it to show that I've tried the obvious.
Tools/Jars
IntelliJ 2016.1
Java 8
Scala 2.10.5
Cassandra 3.x
spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid: Int, eventhistids: Seq[Int], yosemiteid: String, initiatorname: String)
case class NotifHistSingle(intnotifhistid: Int, inteventhistid: Int, dataCenter: String, initiatorname: String)

object SparkCassandraConnectorJoins {
  def joinQueryAfterMakingExpandedRdd(sc: SparkContext, orgNodeId: Int) {
    val notifHist: RDD[NotifHistSingle] = for {
      orgNodeId: Int <- 1 to 2 // comment out this line and it works
      notifHist: NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
      eventHistId <- notifHist.eventhistids
    } yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
    ...etc...
  }
}
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
Error:(88, 21) type mismatch;
 found   : scala.collection.immutable.IndexedSeq[Nothing]
 required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
orgNodeId:Int <- 1 to 2
              ^
Later
@slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also carry state from the second generator into the NotifHistSingle constructor, so I don't see how to get the equivalent map/flatMap version to work. I therefore went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc: SparkContext, orgNodeIds: Set[Int]) {
  def notifHistForOrg(orgNodeId: Int): RDD[NotifHistSingle] = {
    for {
      notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
      eventHistId <- notifHist.eventhistids
    } yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
  }

  val emptyTable: RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
  val notifHistForAllOrgs: RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}
A for comprehension is actually syntactic sugar; what's really going on underneath is a series of chained flatMap calls, with a single map at the end in place of the yield. The Scala compiler translates every for comprehension like this. If you use if conditions in your for comprehension, they are translated into filters, and if you don't yield anything, foreach is used instead.
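To see the translation in isolation, here is a minimal plain-collections sketch (no Spark involved); both expressions produce the same result:
scala> for { x <- List(1, 2); y <- List("a", "b") } yield (x, y)
res0: List[(Int, String)] = List((1,a), (1,b), (2,a), (2,b))

scala> List(1, 2).flatMap(x => List("a", "b").map(y => (x, y)))
res1: List[(Int, String)] = List((1,a), (1,b), (2,a), (2,b))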
So, applied to your case, this:
val notifHist: RDD[NotifHistSingle] = for {
  orgNodeId: Int <- 1 to 2 // comment out this line and it works
  notifHist: NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
  eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(...)
is translated by the compiler into (roughly) this:
val notifHist: RDD[NotifHistSingle] = (1 to 2)
  .flatMap(x => sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", x))
  .flatMap(x => x.eventhistids)
  .map(x => NotifHistSingle(...))
You are getting the error when you include the 1 to 2 line because that makes your for comprehension operate on a sequence (a Vector, to be more precise). So when invoking flatMap(), the compiler expects you to follow up with a function that transforms each element of your vector into a GenTraversableOnce. If you take a closer look at the signature of flatMap (most IDEs will display it just by hovering over the call), you can see it for yourself:
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That
This is the problem. The compiler doesn't know how to flatMap the vector 1 to 2 using a function that returns a CassandraRDD. It wants a function that returns a GenTraversableOnce. If you remove the 1 to 2 line, you remove this restriction.
Bottom line: if you want to use a for comprehension and yield values out of it, you have to obey the type rules. It's impossible to flatten a sequence whose elements are not sequences and cannot be turned into sequences.
You can always map instead of flatMap, since map is less restrictive (it requires A => B instead of A => GenTraversableOnce[B]). This means that instead of getting all results in one giant sequence, you will get a sequence where each element is a group of results (one group per query). You can also play with the types, trying to get a GenTraversableOnce from your query result (e.g. invoking sc.cassandraTable().where().toArray or something; I don't really work with Cassandra, so I don't know).
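The restriction is easy to reproduce with plain collections; a minimal sketch (exact error wording may differ between Scala versions):
scala> Vector(1, 2).flatMap(x => List(x, x * 10)) // List is a GenTraversableOnce: fine
res0: scala.collection.immutable.Vector[Int] = Vector(1, 10, 2, 20)

scala> Vector(1, 2).flatMap(x => x * 10) // Int is not: same kind of type mismatch
<console>:8: error: type mismatch;
 found   : Int
 required: scala.collection.GenTraversableOnce[?]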

Understand syntactic sugar for lambda expressions

I am struggling to understand the behavior of type inference. For example, this fails to compile:
import math._

object Distance {
  def euclidean(p: Seq[Double], c: Seq[Double]) = {
    val d = (p,c)
      .zipped map (_ - _)
      .map pow(_,2.0)
      .foldLeft(0.0)(_+_)
    sqrt(d)
  }
}
with:
Error:(5, 17) missing parameter type for expanded function ((x$3) =>
scala.Tuple2(p, c).zipped.map(((x$1, x$2) =>
x$1.$minus(x$2))).map.pow(scala.Tuple2(x$3, 2.0).foldLeft(0.0)(((x$4,
x$5) => x$4.$plus(x$5)))))
.map pow(_,2.0)
I somehow don't get how the desugaring works, and I end up sprinkling type declarations and parentheses around, or dropping the infix notation in favor of explicit method calls (with the .).
For example this one works:
import math._

object Distance {
  def euclidean(p: Seq[Double], c: Seq[Double]) = {
    val d = (p,c)
      .zipped.map (_ - _)
      .map ( (x:Double) => pow(x,2.0) )
      .foldLeft(0.0)(_+_)
    sqrt(d)
  }
}
but there is no chance of a cute one-liner:
(p,c).zipped map pow(_ - _, 2.0)
I'd be interested in understanding the rules of the game, with a for-dummies explanation.
The problem is the infix notation. The rules are actually quite simple: a method taking one parameter can be written in infix notation, so you can write a b c instead of a.b(c).
It is, however, not quite that simple: once the explicit dots and parentheses are omitted, something else must decide the priority of the operators, so that the compiler can parse 1+2*3 as 1.+(2.*(3)) and not as (1.+(2)).*(3). Operator precedence is part of the language specification and is (simply put) governed by the leading symbol of the operator.
Another important detail: operators ending with a : bind their parameters from the right, so a :: b is equivalent to b.::(a).
Another tricky thing is the parentheses. In dot notation they simply wrap the parameter lists. In operator notation they may need to wrap the parameters themselves (e.g. function literals).
By the way, your one-liner can be written like this: (p,c).zipped map { (a, b) => pow(a - b, 2.0) }. Note that I've wrapped the function literal with {}; that is just for readability, () would work too.
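Putting that together, the whole method with the fixed one-liner might look like this (a sketch under the question's own imports; euclidean(Seq(0.0, 0.0), Seq(3.0, 4.0)) should give 5.0):
import math._

object Distance {
  // zipped pairs the two sequences element-wise; the explicit
  // function literal avoids the placeholder-expansion pitfalls above
  def euclidean(p: Seq[Double], c: Seq[Double]): Double =
    sqrt((p, c).zipped.map((a, b) => pow(a - b, 2.0)).sum)
}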

Scala's lazy arguments: How do they work?

In the file Parsers.scala (Scala 2.9.1) from the parser combinators library, I seem to have come across a lesser-known Scala feature called "lazy arguments". Here's an example:
def ~ [U](q: => Parser[U]): Parser[~[T, U]] = {
  lazy val p = q // lazy argument
  (for(a <- this; b <- p) yield new ~(a,b)).named("~")
}
Apparently, there's something going on here with the assignment of the call-by-name argument q to the lazy val p.
So far I have not been able to work out what this does and why it's useful. Can anyone help?
Call-by-name arguments are evaluated every time you ask for them. Lazy vals are evaluated the first time only; the value is then stored, and if you ask for it again you get the stored value.
Thus, a pattern like
def foo(x: => Expensive) = {
  lazy val cache = x
  /* do lots of stuff with cache */
}
is the ultimate put-off-work-as-long-as-possible-and-only-do-it-once pattern. If your code path never needs x at all, it will never get evaluated. If you need it multiple times, it will be evaluated only once and stored for future use. So you make the expensive call either zero times (if possible) or once (if not), guaranteed.
The Wikipedia article on Scala even answers what the lazy keyword does:
Using the keyword lazy defers the initialization of a value until this value is used.
Additionally, what you have in this code sample with q: => Parser[U] is a call-by-name parameter. A parameter declared this way remains unevaluated until you explicitly evaluate it somewhere in your method.
Here is an example from the Scala REPL of how call-by-name parameters work:
scala> def f(p: => Int, eval : Boolean) = if (eval) println(p)
f: (p: => Int, eval: Boolean)Unit
scala> f(3, true)
3
scala> f(3/0, false)
scala> f(3/0, true)
java.lang.ArithmeticException: / by zero
at $anonfun$1.apply$mcI$sp(<console>:9)
...
As you can see, 3/0 does not get evaluated at all in the second call. Combining a lazy val with a call-by-name parameter as above gives the following behaviour: the parameter q is not evaluated immediately when the method is called. Instead, it is assigned to the lazy val p, which is also not evaluated immediately. Only later, when p is used, does this trigger the evaluation of q. But since p is a val, the parameter q will only be evaluated once, and the result is stored in p for later reuse in the for comprehension.
You can easily see in the REPL that multiple evaluation can happen otherwise:
scala> def g(p: => Int) = println(p + p)
g: (p: => Int)Unit
scala> def calc = { println("evaluating") ; 10 }
calc: Int
scala> g(calc)
evaluating
evaluating
20
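Combining the two in the same session shows the at-most-once behaviour (a sketch reusing calc from above):
scala> def h(p: => Int) = { lazy val cached = p; println(cached + cached) }
h: (p: => Int)Unit

scala> h(calc)
evaluating
20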

Scala unexpectedly not being able to ascertain type for expanded function

Why, in Scala, given:
a = List(1, 2, 3, 4)
def f(x : String) = { x }
does
a.map(_.toString)
work, but
a.map(f(_.toString))
give the error
missing parameter type for expanded function ((x$1) => x$1.toString)
Well... f() takes a String as a parameter. The construct _.toString is a function of type A <: Any => String. f() expects a String, so the example above does not type-check. Scala is friendly in this case and gives the user another chance. The error message means: "By my type-inference algorithms this does not compile. Put the types in and it might, if it's something I can't infer."
You would have to write the anonymous function longhand in this case, i.e. a.map(n => f(n.toString)). This is not a limitation of type inference, but of the wildcard symbol. Basically, when you write a.map(f(_.toString)), the _.toString gets expanded into an anonymous function inside the closest parentheses it can find; otherwise this would lead to enormous ambiguity. Imagine something like f(g(_.toString)). Does this mean f(g(x => x.toString)) or f(x => g(x.toString))? Even worse ambiguities would arise for multiple nested function calls. The Scala type checker therefore takes the most sensible interpretation, as described above.
Nitpick: the first line of your code should be val a = List(1,2,3,4) :).
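For the record, with that fix the longhand version type-checks (REPL session, outputs as I'd expect):
scala> val a = List(1, 2, 3, 4)
a: List[Int] = List(1, 2, 3, 4)

scala> def f(x: String) = { x }
f: (x: String)String

scala> a.map(n => f(n.toString))
res0: List[String] = List(1, 2, 3, 4)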