Scala's <- arrow seems a bit strange. Most operators are implemented somewhere in the source as a function, defined on a data type, either directly or implicitly. <-, on the other hand, seems to be unusable outside of a for comprehension, where it acts as a syntactic element used to signal the binding of a new variable in a monadic context (via map).
This is the only instance I can think of where Scala has an operator-looking syntactical element that is only usable in a specific context, and isn't an actual function.
Am I wrong about how <- works? Is it a special case symbol used just by the compiler, or is there some way a developer could use this behavior when writing their own code?
For example, would it be possible to write a macro to transform
forRange (i <- 0 to 10) { print(i) }
into
{ var i = 0; while (i <= 10) { print(i) } }
instead of its standard map equivalent? As far as I can tell, any usage of i <- ... outside of a for context causes a compile error due to referencing an unknown value.
In short, yes <- is a reserved operator in Scala. It's a compiler thing.
Foreach
There is a clear distinction between plain for (which desugars to foreach) and for yield, but both are only syntactic sugar, transformed at compile time.
for (i <- 1 to 10) { statement } is translated to:
(1 to 10).foreach(..)
Multiple variables:
for (i <- 1 to 10; y <- 2 to 100) {..} becomes:
(1 to 10).foreach(
  i => { (2 to 100).foreach(..) });
With all the variations given by:
for (x <- someList) {..} becomes someList.foreach(..).
Put simply, they all get de-sugared to foreach statements. The specific foreach being called is given by the collection used.
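To see the equivalence concretely, here is a minimal sketch (collecting the side effects into buffers so the two forms can be compared):

```scala
import scala.collection.mutable.ListBuffer

val viaFor = ListBuffer[Int]()
for (i <- 1 to 3) viaFor += i            // the sugared form

val viaForeach = ListBuffer[Int]()
(1 to 3).foreach(i => viaForeach += i)   // roughly what the compiler emits

// both forms perform exactly the same foreach calls
```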
For yield
The for yield syntax is sugar for map and flatMap. The "stay in the monad" rule applies here.
for (x <- someList) yield {..} gets translated to someList.map(..); flatMap only enters the picture when there are multiple generators.
Chained operations become hierarchical chains of map/flatMap combos:
for {
  x <- someList; y <- someOtherList
} yield {} becomes:
someList.flatMap(x => {
  someOtherList.map(y => ..)
}); and so on.
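As a concrete, checkable sketch of that translation:

```scala
val someList = List(1, 2)
val someOtherList = List(10, 20)

// the sugared form
val viaFor = for {
  x <- someList
  y <- someOtherList
} yield x + y

// the hand-desugared form: flatMap for every generator but the last, map for the last
val viaMethods = someList.flatMap(x => someOtherList.map(y => x + y))
```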
The point
The point is that the <- operator is nothing more than syntactic sugar to make code more readable, but it always gets replaced at compile time.
To emphasize Rob's point
Rob makes excellent examples of other Scala syntactic sugar.
A context bound
package somepackage;
class Test[T : Manifest] {}
Is actually translated to:
class Test[T](implicit evidence: Manifest[T])
As proof, try to alias a type with a context bound:
type TestAlias[T : Manifest] = somepackage.Test // error, the : syntax can't be used..
It is perhaps easy to see how the : Manifest part is actually not a type parameter.
It's just easier to type class Test[T : Manifest] than class Test[T](implicit evidence: Manifest[T]).
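The same equivalence can be checked directly. This sketch uses ClassTag instead of the now-deprecated Manifest, but the desugaring is identical:

```scala
import scala.reflect.ClassTag

// context-bound form
class Test1[T : ClassTag] { def tag: ClassTag[T] = implicitly[ClassTag[T]] }

// what it desugars to
class Test2[T](implicit evidence: ClassTag[T]) { def tag: ClassTag[T] = evidence }

val a = new Test1[String].tag
val b = new Test2[String].tag
// both classes receive the same implicit evidence
```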
The <- operator is a reserved word in the language (see the Scala Language Specification, page 4), but it isn't alone. => is also a reserved word rather than a function. (As are _, :, =, <:, <%, >:, #, and @.) So you couldn't create a function with that name. I don't believe you could adapt it the way you're suggesting, either (though perhaps someone more clever will know a way). You could create a function called `<-` (with surrounding back-ticks), but that would probably be more awkward than it deserves.
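For illustration, here is such a back-ticked `<-` method; it compiles, but every call site needs the back-ticks too, which is exactly why it is rarely worth doing:

```scala
// a method literally named <- (back-ticks required at definition and call site)
class Cell(var value: Int) {
  def `<-`(v: Int): Unit = { value = v }
}

val c = new Cell(0)
c `<-` 42   // infix call, still with back-ticks
```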
Related
I am scratching my head on an example I've seen in breeze's documentation about distributions.
After creating a Rand instance, they show that you can do the following:
import breeze.stats.distributions._
val pois = new Poisson(3.0);
val doublePoi: Rand[Double] = for(x <- pois) yield x.toDouble
Now, this is very cool: I can get a Rand object whose samples method gives me Double instead of Int. Another example might be:
val abc = ('a' to 'z').map(_.toString).toArray
val letterDist: Rand[String] = for (x <- pois) yield {
  val i = if (x > 26) x % 26 else x
  abc(i)
}
val lettersSamp = letterDist.samples.take(20)
println(lettersSamp)
The question is, what is going on here? Rand[T] is not a collection, and all the for/yield examples I've seen so far work on collections. The Scala docs don't mention much; the only thing I found is "translating for-comprehensions" in here.
What is the underlying rule here? How else can this be used (doesn't have to be a breeze-related answer)?
Scala has rules for translating for and for-yield expressions to the equivalent flatMap and map calls, optionally also applying filters using withFilter and such. The actual specification for how you translate each for comprehension expression into the equivalent method calls can be found in this section of the Scala Specification.
If we take your example and compile it we'll see the underlying transformation happen to the for-yield expression. This is done using scalac -Xprint:typer command to print out the type trees:
val letterDist: breeze.stats.distributions.Rand[String] =
pois.map[String](((x: Int) => {
val i: Int = if (x.>(26))
x.%(26)
else
x;
abc.apply(i)
}));
Here you can see that for-yield turns into a single map passing in an Int and applying the if-else inside the expression. This works because Rand[T] has a map method defined:
def map[E](f: T => E): Rand[E] = MappedRand(outer, f)
For comprehensions are just syntactic sugar for flatMap, map and withFilter. The main requirement for use in a for comprehension is that those methods are implemented, so they are not limited to collections. E.g. some common non-collections used in for comprehensions are Option, Try and Future.
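A quick sketch with Option, which is not a collection but has the required methods:

```scala
// Option has map and flatMap, so it works in a for comprehension
val some = for {
  a <- Option(1)
  b <- Option(2)
} yield a + b

val none = for {
  a <- Option(1)
  b <- Option.empty[Int] // a None here short-circuits the whole comprehension
} yield a + b
```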
In your case, Poisson seems to inherit from a trait called Rand
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/distributions/Rand.scala
This trait has map, flatMap, and withFilter defined.
Tip: If you use an IDE like IntelliJ, you can press alt+enter on your for comprehension, choose "convert to desugared expression", and you will see how it expands.
Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first for comprehension line is commented out, the code works. I ultimately want a Set[Int] instead of '1 to 2', but it serves to show the problem. The first two lines of the for should not need a type specifier, but I include it to show that I've tried the obvious.
Tools/Jars
IntelliJ 2016.1
Java 8
Scala 2.10.5
Cassandra 3.x
spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid:Int, eventhistids:Seq[Int], yosemiteid:String, initiatorname:String)
case class NotifHistSingle(intnotifhistid:Int, inteventhistid:Int, dataCenter:String, initiatorname:String)
object SparkCassandraConnectorJoins {
  def joinQueryAfterMakingExpandedRdd(sc: SparkContext, orgNodeId: Int) {
    val notifHist: RDD[NotifHistSingle] = for {
      orgNodeId: Int <- 1 to 2 // comment out this line and it works
      notifHist: NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
      eventHistId <- notifHist.eventhistids
    } yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
    ...etc...
  }
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
Error:(88, 21) type mismatch;
 found   : scala.collection.immutable.IndexedSeq[Nothing]
 required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
    orgNodeId:Int <- 1 to 2
              ^
Later
@slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also keep state from the second statement to fill elements in the NotifHistSingle ctor, so I don't see how to get the equivalent map/flatMap to work. Therefore, I went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc: SparkContext, orgNodeIds: Set[Int]) {
  def notifHistForOrg(orgNodeId: Int): RDD[NotifHistSingle] = {
    for {
      notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
      eventHistId <- notifHist.eventhistids
    } yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
  }
  val emptyTable: RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
  val notifHistForAllOrgs: RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}
A for comprehension is actually syntactic sugar; what's really going on underneath is a series of nested flatMap calls, with a single map at the end that replaces the yield. The Scala compiler translates every for comprehension like this. If you use if conditions in your for comprehension, they are translated into withFilter calls, and if you don't yield anything, foreach is used. For more information, see here.
So, to explain on your case - this:
val notifHist: RDD[NotifHistSingle] = for {
  orgNodeId: Int <- 1 to 2 // comment out this line and it works
  notifHist: NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
  eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(...)
is actually translated by the compiler to this:
val notifHist: RDD[NotifHistSingle] = (1 to 2)
  .flatMap(orgNodeId => sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
    .flatMap(notifHist => notifHist.eventhistids
      .map(eventHistId => NotifHistSingle(...))))
You are getting the error if you include the 1 to 2 line because that makes your for comprehension operate on a sequence (a Range, to be more precise). So when invoking flatMap(), the compiler expects you to follow up with a function that transforms each element of your sequence to a GenTraversableOnce. If you take a closer look at the type of your for expression (most IDEs will display it just by hovering over it) you can see it for yourself:
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That
This is the problem. The compiler doesn't know how to flatMap the sequence 1 to 2 using a function that returns a CassandraRDD. It wants a function that returns a GenTraversableOnce. If you remove the 1 to 2 line then you remove this restriction.
Bottom line - if you want to use a for comprehension and yield values out of it, you have to obey the type rules. It's impossible to flatten a sequence consisting of elements which are not sequences and cannot be turned into sequences.
You can always map instead of flatMap since map is less restrictive (it requires A => B instead of A => GenTraversableOnce[B]). This means that instead of getting all results in one giant sequence, you will get a sequence where each element is a group of results (one group for each query). You can also play around the types, trying to get a GenTraversableOnce from your query result (e.g. invoking sc.cassandraTable().where().toArray or something; I don't really work with Cassandra so I don't know).
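The difference between the two is easy to see on a plain Range; this sketch mirrors the suggestion above:

```scala
// map keeps one result per element, so you get a group per "query"
val grouped = (1 to 3).map(n => List(n, n * 10))

// flatMap additionally flattens the groups into one sequence
val flat = (1 to 3).flatMap(n => List(n, n * 10))
```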
I am struggling to understand the behavior of type inference. For example this fails to compile:
import math._
object Distance {
  def euclidean(p: Seq[Double], c: Seq[Double]) = {
    val d = (p, c)
      .zipped map (_ - _)
      .map pow(_, 2.0)
      .foldLeft(0.0)(_ + _)
    sqrt(d)
  }
}
with:
Error:(5, 17) missing parameter type for expanded function ((x$3) =>
scala.Tuple2(p, c).zipped.map(((x$1, x$2) =>
x$1.$minus(x$2))).map.pow(scala.Tuple2(x$3, 2.0).foldLeft(0.0)(((x$4,
x$5) => x$4.$plus(x$5)))))
.map pow(_,2.0)
I somehow do not get how the de-sugaring works, and I end up having to sprinkle around type declarations and parentheses, or getting rid of infix notation in favor of explicit method calls (with the .).
For example this one works:
import math._
object Distance {
  def euclidean(p: Seq[Double], c: Seq[Double]) = {
    val d = (p, c)
      .zipped.map(_ - _)
      .map((x: Double) => pow(x, 2.0))
      .foldLeft(0.0)(_ + _)
    sqrt(d)
  }
}
but there is no chance to have a cute one-liner:
(p,c).zipped map pow(_ - _, 2.0)
I'd be interested in understanding the rules of the game with a for dummies explanation.
The problem is the infix notation. The rules are actually quite simple: a method taking one parameter can be written in infix notation, so you can write a b c instead of a.b(c).
It is, however, not quite that simple, because with the explicit dots and parentheses omitted, something else must decide the priority of the operators, so that the compiler can decide that 1+2*3 is 1.+(2.*(3)) and not (1.+(2)).*(3). The precedence of operators is part of the specification you have linked, and it is (simply put) governed by the leading symbol of the operator.
Another important detail: operators ending with a : bind the parameters from the right, so a :: b is equivalent to b.::(a).
Another tricky thing is the parentheses. In dot notation they simply wrap the parameter lists; in operator notation they might need to wrap the parameters themselves (e.g. the function literals).
Btw: your one-liner can be written like this: (p,c).zipped map {(a, b) => pow(a - b, 2.0)}. Note that I've wrapped the function literal with {}; that is just for readability, () would work too.
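The rules above can be checked directly:

```scala
// precedence is governed by the leading symbol: * binds tighter than +
val a = 1 + 2 * 3        // parsed as 1.+(2.*(3)), not (1.+(2)).*(3)

// operators ending in ':' bind to the right: 1 :: 2 :: Nil is Nil.::(2).::(1)
val xs = 1 :: 2 :: Nil
val ys = Nil.::(2).::(1)
```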
In the following code :
val expression : String = "1+1";
for (expressionChar <- expression) {
println(expressionChar)
}
the output is the characters 1, + and 1, each printed on its own line.
How is each character being accessed here? What is going on behind the scenes in Scala? Using Java I would need to use the .charAt method, but not in Scala. Why?
In scala, a for comprehension:
for (p <- e) dosomething
would be translated to
e.foreach { p => dosomething }
You can look into Scala Reference 6.19 for more details.
However, a String in Scala is just a Java String, which does not have a foreach method.
But there is an implicit conversion defined in Predef which converts a String to a StringOps, which does indeed have a foreach method.
So, finally, the code for (x <- "abcdef") println(x) is translated to:
Predef.augmentString("abcdef").foreach(x => println(x))
You can look into the Scala reference or Scala Views for more information.
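Both forms can be compared side by side; this sketch records the characters each one visits:

```scala
import scala.collection.mutable.ListBuffer

val viaFor = ListBuffer[Char]()
for (c <- "1+1") viaFor += c                          // the sugared form

val viaExplicit = ListBuffer[Char]()
Predef.augmentString("1+1").foreach(viaExplicit += _) // the conversion spelled out
```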
scala> for (c <- "1+1") println(c)
1
+
1
Is the same as
scala> "1+1" foreach (c => println(c))
1
+
1
Scala's for-comprehension is not a loop but a composition of higher-order functions that operate on the input. The rules on how the for-comprehension is translated by the compiler can be read here.
Furthermore, in Scala, Strings are collections. Internally they are just plain old Java Strings, for which there exist implicit conversions to give them the full power of the Scala collection library. But to users they can be treated as collections.
In your example, foreach iterates over each element of the string and executes a closure on it (the closure being the body of the for-comprehension, or the argument in the foreach call).
Even more information about for-comprehensions can be found in section 7 of the StackOverflow Scala Tutorial.
In languages like SML, Erlang and a bunch of others, we may define functions like this:
fun reverse [] = []
  | reverse (x :: xs) = reverse xs @ [x];
I know we can write analog in Scala like this (and I know, there are many flaws in the code below):
def reverse[T](lst: List[T]): List[T] = lst match {
  case Nil => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
But I wonder, if we could write former code in Scala, perhaps with desugaring to the latter.
Is there any fundamental limitations for such syntax being implemented in the future (I mean, really fundamental -- e.g. the way type inference works in scala, or something else, except parser obviously)?
UPD
Here is a snippet of how it could look like:
type T
def reverse(Nil: List[T]) = Nil
def reverse(x :: xs: List[T]): List[T] = reverse(xs) ++ List(x)
It really depends on what you mean by fundamental.
If you are really asking "is there a technical showstopper that would prevent implementing this feature", then I would say the answer is no. You are talking about desugaring, and you are on the right track here. All there is to do is basically stitch several separate cases into one single function, and this can be done as a mere preprocessing step (it only requires syntactic knowledge, no need for semantic knowledge). But for this to even make sense, I would define a few rules:
The function signature is mandatory (in Haskell, for example, the signature is optional whether you are defining the function at once or in several parts). We could try to arrange to live without the signature and attempt to extract it from the different parts, but lack of type information would quickly come back to bite us. A simpler argument is that if we are to try to infer an implicit signature, we might as well do it for all methods. But the truth is that there are very good reasons to have explicit signatures in Scala, and I can't imagine changing that.
All the parts must be defined within the same scope. To start with, they must be declared in the same file because each source file is compiled separately, and thus a simple preprocessor would not be enough to implement the feature. Second, we still end up with a single method in the end, so it's only natural to have all the parts in the same scope.
Overloading is not possible for such methods (otherwise we would need to repeat the signature for each part just so the preprocessor knows which part belongs to which overload)
Parts are added (stitched) to the generated match in the order they are declared
So here is how it could look like:
def reverse[T](lst: List[T]): List[T] // Exactly like an abstract def (provides the signature)
// .... some unrelated code here...
def reverse(Nil) = Nil
// .... another bit of unrelated code here...
def reverse(x :: xs ) = reverse(xs) ++ List(x)
Which could be trivially transformed into:
def reverse[T](lst: List[T]): List[T] = lst match {
  case Nil => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
// .... some unrelated code here...
// .... another bit of unrelated code here...
It is easy to see that the above transformation is very mechanical and can be done by just manipulating a source AST (the AST produced by a slightly modified grammar that accepts this new construct) and transforming it into the target AST (the AST produced by the standard Scala grammar).
Then we can compile the result as usual.
So there you go, with a few simple rules we are able to implement a preprocessor that does all the work to implement this new feature.
If by fundamental you are asking "is there anything that would make this feature out of place?", then it can be argued that it does not feel very Scala-like. But more to the point, it does not bring that much to the table. The Scala authors actually tend toward making the language simpler (as in fewer built-in features, trying to move some built-in features into libraries), and adding new syntax that is not really more readable goes against that goal of simplification.
In SML, your code snippet is literally just syntactic sugar (a "derived form" in the terminology of the language spec) for
val rec reverse = fn x =>
  case x of [] => []
          | x::xs => reverse xs @ [x]
which is very close to the Scala code you show. So, no, there is no "fundamental" reason that Scala couldn't provide the same kind of syntax. The main problem is Scala's need for more type annotations, which makes this shorthand syntax far less attractive in general, and probably not worth the while.
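As a sketch of how close Scala already gets, a pattern-matching anonymous function folds the match into the function body, much like SML's case-of derived form (monomorphic here, and lazy so the val can refer to itself):

```scala
lazy val reverse: List[Int] => List[Int] = {
  case Nil     => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
```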
Note also that the specific syntax you suggest would not fly well, because there is no way to distinguish one case-by-case function definition from two overloaded functions syntactically. You probably would need some alternative syntax, similar to SML using "|".
I don't know SML or Erlang, but I know Haskell. It is a language without method overloading. Method overloading combined with such pattern matching could lead to ambiguities. Imagine following code:
def f(x: String) = "String "+x
def f(x: List[_]) = "List "+x
What should it mean? It can mean method overloading, i.e. the method is determined at compile time. It can also mean pattern matching: there would be just one f(x: AnyRef) method that would do the matching.
Scala also has named parameters, which would be probably also broken.
I don't think that Scala is able to offer more simple syntax than you have shown in general. A simpler syntax may IMHO work in some special cases only.
There are at least two problems:
[ and ] are reserved characters because they are used for type arguments. The compiler allows spaces around them, so that would not be an option.
The other problem is that = returns Unit. So the expression after the | would not return any result
The closest I could come up with is this (note that is very specialized towards your example):
// Define a class to hold the values left and right of the | sign
class |[T, S](val left: T, val right: PartialFunction[T, T])

// Create a class that contains the | operator
class OrAssoc[T](left: T) {
  def |(right: PartialFunction[T, T]): T | T = new |(left, right)
}

// Add the | to any potential target
implicit def anyToOrAssoc[S](left: S): OrAssoc[S] = new OrAssoc(left)

object fun {
  // Use the magic of the update method
  def update[T, S](choice: T | S): T => T = { arg =>
    if (choice.right.isDefinedAt(arg)) choice.right(arg)
    else choice.left
  }
}

// Use the above construction to define a new method
val reverse: List[Int] => List[Int] =
  fun() = List.empty[Int] | {
    case x :: xs => reverse(xs) ++ List(x)
  }

// Call the method
reverse(List(3, 2, 1))