Explain how looping behaves in Scala compared to Java when accessing values

In the following code:
val expression : String = "1+1";
for (expressionChar <- expression) {
  println(expressionChar)
}
the output is 1, + and 1, each printed on its own line.
How is each character being accessed here? What is going on behind the scenes in Scala? In Java I would need to use the .charAt method, but not in Scala. Why?

In Scala, a for comprehension:
for (p <- e) dosomething
would be translated to
e.foreach { p => dosomething }
You can look into Scala Reference 6.19 for more details.
However, String in Scala is just a Java String, which does not have a foreach method.
But there is an implicit conversion defined in Predef which converts String to StringOps, which does have a foreach method.
So, finally, the code for(x <- "abcdef") println(x) is translated to:
Predef.augmentString("abcdef").foreach(x => println(x))
You can look into the Scala reference or Scala Views for more information.
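To make the translation concrete, here is a small sketch (assuming Scala 2.13, where Predef.augmentString wraps a String in a StringOps) that runs the same loop three ways: as a for loop, with the Predef conversion written out by hand, and Java-style with charAt. The object and method names are illustrative only.

```scala
import scala.collection.mutable.ListBuffer

object StringForDesugar {
  // The question's form: a for loop directly over a String
  def viaFor(s: String): List[Char] = {
    val out = ListBuffer.empty[Char]
    for (c <- s) out += c
    out.toList
  }

  // The same loop with the implicit conversion from Predef spelled out
  def viaAugmentString(s: String): List[Char] = {
    val out = ListBuffer.empty[Char]
    Predef.augmentString(s).foreach(c => out += c)
    out.toList
  }

  // The Java-style version: explicit index and charAt
  def viaCharAt(s: String): List[Char] = {
    val out = ListBuffer.empty[Char]
    var i = 0
    while (i < s.length) {
      out += s.charAt(i)
      i += 1
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    println(viaFor("1+1"))           // List(1, +, 1)
    println(viaAugmentString("1+1")) // List(1, +, 1)
  }
}
```

All three produce the same characters; the first two differ only in whether the compiler or the programmer inserts the conversion.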

scala> for (c <- "1+1") println(c)
1
+
1
Is the same as
scala> "1+1" foreach (c => println(c))
1
+
1
Scala's for-comprehension is not a loop but a composition of higher-order functions that operate on the input. The rules on how the for-comprehension is translated by the compiler can be read here.
Furthermore, in Scala, Strings are collections. Internally they are just plain old Java Strings for which there exist some implicit conversions to give them the full power of the Scala collection library. But for users they can be seen as collections.
In your example, foreach iterates over each element of the string and executes a closure on it (which is the body of the for-comprehension or the argument in the foreach call).
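Other collection methods work on a String through the same conversion. A short sketch for Scala 2.13 (object and value names are illustrative):

```scala
object StringAsCollection {
  val s: String = "1+1"

  // Each call below is resolved through Predef's String => StringOps conversion
  val digits: String = s.filter(_.isDigit)   // keeps only '1' and '1'
  val ones: Int = s.count(_ == '1')          // counts matching characters
  val codes: Seq[Int] = s.map(_.toInt)       // Char => Int gives a sequence, not a String
  val upper: String = "abc".map(_.toUpper)   // Char => Char gives a String back

  def main(args: Array[String]): Unit = {
    println(digits) // 11
    println(ones)   // 2
    println(upper)  // ABC
  }
}
```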
Even more information about for-comprehensions can be found in section 7 of the StackOverflow Scala Tutorial.

Related

scala for/yield on instance of class

I am scratching my head on an example I've seen in breeze's documentation about distributions.
After creating a Rand instance, they show that you can do the following:
import breeze.stats.distributions._
val pois = new Poisson(3.0);
val doublePoi: Rand[Double] = for(x <- pois) yield x.toDouble
Now, this is very cool: I get a Rand object that produces Double instead of Int when I call the samples method. Another example might be:
val abc = ('a' to 'z').map(_.toString).toArray
val letterDist: Rand[String] = for(x <- pois) yield {
  val i = if (x > 26) x % 26 else x
  abc(i)
}
val lettersSamp = letterDist.samples.take(20).toList
println(lettersSamp)
The question is, what is going on here? Rand[T] is not a collection, and all the for/yield examples I've seen so far work on collections. The Scala docs don't mention much; the only thing I found is the section on translating for-comprehensions here.
What is the underlying rule here? How else can this be used (it doesn't have to be a Breeze-related answer)?
Scala has rules for translating for and for-yield expressions to the equivalent flatMap and map calls, optionally also applying filters using withFilter and such. The actual specification for how you translate each for comprehension expression into the equivalent method calls can be found in this section of the Scala Specification.
If we take your example and compile it with scalac -Xprint:typer, which prints the typed trees, we can see the transformation applied to the for-yield expression:
val letterDist: breeze.stats.distributions.Rand[String] =
pois.map[String](((x: Int) => {
val i: Int = if (x.>(26))
x.%(26)
else
x;
abc.apply(i)
}));
Here you can see that for-yield turns into a single map passing in an Int and applying the if-else inside the expression. This works because Rand[T] has a map method defined:
def map[E](f: T => E): Rand[E] = MappedRand(outer, f)
For comprehensions are just syntactic sugar for flatMap, map and withFilter (and foreach when there is no yield). The main requirement for use in a for comprehension is that those methods are implemented. They are therefore not limited to collections; e.g. some common non-collections used in for comprehensions are Option, Try and Future.
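To show that nothing collection-specific is needed, here is a minimal sketch of a hypothetical Sampler type, loosely modelled on the idea behind Breeze's Rand but not its actual API. It defines only map, flatMap and withFilter, which is enough for for/yield with one generator, several generators, and an if guard. Assumes Scala 2.13 or 3; all names are illustrative.

```scala
import scala.util.Random

// Hypothetical sampler, NOT the Breeze API: it is not a collection,
// it only knows how to draw one value from a Random.
final case class Sampler[A](draw: Random => A) {
  // Used for `for (x <- s) yield ...`
  def map[B](f: A => B): Sampler[B] = Sampler(r => f(draw(r)))

  // Used when there is more than one generator
  def flatMap[B](f: A => Sampler[B]): Sampler[B] = Sampler(r => f(draw(r)).draw(r))

  // Used for `if` guards: redraw until the predicate holds
  def withFilter(p: A => Boolean): Sampler[A] = Sampler { r =>
    var a = draw(r)
    while (!p(a)) a = draw(r)
    a
  }

  // An endless iterator of draws, similar in spirit to Rand#samples
  def samples(r: Random): Iterator[A] = Iterator.continually(draw(r))
}

object SamplerDemo {
  // A fair six-sided die
  val die: Sampler[Int] = Sampler(r => r.nextInt(6) + 1)

  // Roughly die.map(x => x.toDouble)
  val dieAsDouble: Sampler[Double] = for (x <- die) yield x.toDouble

  // Roughly die.flatMap(a => die.map(b => a + b))
  val twoDice: Sampler[Int] = for { a <- die; b <- die } yield a + b

  // Roughly die.withFilter(x => x % 2 == 0).map(x => x)
  val evenRoll: Sampler[Int] = for (x <- die if x % 2 == 0) yield x

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    println(twoDice.samples(rng).take(5).toList)
  }
}
```

The for/yield lines compile only because the compiler finds methods with those names; deleting withFilter, for example, breaks just the evenRoll line.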
In your case, Poisson seems to inherit from a trait called Rand:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/distributions/Rand.scala
This trait has map, flatMap, and withFilter defined.
Tip: if you use an IDE like IntelliJ, you can press Alt+Enter on your for comprehension and choose "Convert to desugared expression" to see how it expands.

Spark Cassandra Connector: for comprehension error (type mismatch)

Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first for comprehension line is commented out, the code works. I ultimately want a Set[Int] instead of '1 to 2', but it serves to show the problem. The first two lines of the for should not need a type specifier, but I include them to show that I've tried the obvious.
Tools/Jars
IntelliJ 2016.1
Java 8
Scala 2.10.5
Cassandra 3.x
spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid:Int, eventhistids:Seq[Int], yosemiteid:String, initiatorname:String)
case class NotifHistSingle(intnotifhistid:Int, inteventhistid:Int, dataCenter:String, initiatorname:String)
object SparkCassandraConnectorJoins {
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeId:Int) {
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
...etc...
}
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
Error:(88, 21) type mismatch;
found : scala.collection.immutable.IndexedSeq[Nothing]
required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
orgNodeId:Int <- 1 to 2
^
Later
@slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also keep state from the second statement to fill elements in the NotifHistSingle constructor, so I don't see how to get the equivalent map/flatMap to work. Therefore, I went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeIds:Set[Int]) {
def notifHistForOrg(orgNodeId:Int): RDD[NotifHistSingle] = {
for {
notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
}
val emptyTable:RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
val notifHistForAllOrgs:RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}
A for comprehension is actually syntactic sugar: underneath, it is a series of nested flatMap calls, with a single map on the innermost generator in place of yield. The Scala compiler translates every for comprehension like this. If you use if conditions in your for comprehension, they are translated into withFilter calls, and if you don't yield anything, foreach is used instead. For more information, see here.
So, to explain your case, this:
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(...)
is translated by the compiler to roughly this:
val notifHist:RDD[NotifHistSingle] = (1 to 2)
  .flatMap(orgNodeId => sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
    .flatMap(notifHist => notifHist.eventhistids
      .map(eventHistId => NotifHistSingle(...))))
You are getting the error if you include the 1 to 2 line because that makes your for comprehension operate on a sequence (a Range; its flatMap builds the IndexedSeq you see in the error message). So when flatMap() is invoked, the compiler expects a function that transforms each element of the range into a GenTraversableOnce. If you take a closer look at the signature of that flatMap (most IDEs will display it just by hovering over it), you can see it for yourself:
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That
This is the problem. The compiler doesn't know how to flatMap the range 1 to 2 using a function that returns a CassandraRDD. It wants a function that returns a GenTraversableOnce. If you remove the 1 to 2 line, you remove this restriction.
Bottom line - if you want to use a for comprehension and yield values out of it, you have to obey the type rules. It's impossible to flatten a sequence consisting of elements which are not sequences and cannot be turned into sequences.
You can always map instead of flatMap, since map is less restrictive (it requires A => B instead of A => GenTraversableOnce[B]). This means that instead of getting all results in one giant sequence, you will get a sequence where each element is a group of results (one group for each query). You can also play around with the types, trying to get a GenTraversableOnce from your query result (e.g. invoking sc.cassandraTable().where().toArray or something; I don't really work with Cassandra so I don't know).
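The type issue can be reproduced without Spark. The sketch below uses a hypothetical Rows[A] type as a stand-in for an RDD: like RDD.flatMap, its flatMap takes a function returning an ordinary collection, but Rows itself is not one, so a Range cannot flatMap into it. It then shows two workarounds: map followed by combining the groups with ++ (the same idea as the foldLeft in the question's update), or converting each group into a collection. Assumes Scala 2.13 (IterableOnce); Rows, query and the case classes are illustrative, not Spark or connector APIs.

```scala
// Hypothetical stand-in for an RDD-like type: has map/flatMap, but is not a Scala collection.
final case class Rows[A](data: Vector[A]) {
  def map[B](f: A => B): Rows[B] = Rows(data.map(f))
  // Like RDD.flatMap, the function returns an ordinary collection
  def flatMap[B](f: A => IterableOnce[B]): Rows[B] = Rows(data.flatMap(f))
  def ++(other: Rows[A]): Rows[A] = Rows(data ++ other.data)
  def collect(): Vector[A] = data
}

object RowsDemo {
  final case class Hist(id: Int, eventIds: Seq[Int])
  final case class Single(id: Int, eventId: Int)

  // Hypothetical query: the rows for one org id
  def query(orgId: Int): Rows[Hist] =
    Rows(Vector(Hist(orgId * 10, Seq(orgId, orgId + 100))))

  // Compiles: the outer generator is Rows and the inner one is a Seq,
  // so Rows.flatMap receives a function returning a collection.
  def forOne(orgId: Int): Rows[Single] =
    for {
      h <- query(orgId)
      e <- h.eventIds
    } yield Single(h.id, e)

  // Does NOT compile: Range.flatMap needs a function returning IterableOnce,
  // but forOne returns Rows.
  // for { orgId <- 1 to 2; s <- forOne(orgId) } yield s

  // Workaround 1: map (one Rows per id), then combine the groups
  def allByMap(ids: Seq[Int]): Rows[Single] =
    ids.map(forOne).foldLeft(Rows(Vector.empty[Single]))(_ ++ _)

  // Workaround 2: turn each group into a collection so flatMap accepts it
  def allAsSeq(ids: Seq[Int]): Vector[Single] =
    ids.toVector.flatMap(id => forOne(id).collect())

  def main(args: Array[String]): Unit =
    println(allByMap(Seq(1, 2)).collect())
}
```

Workaround 1 keeps the result in the RDD-like type; workaround 2 leaves it, which for a real RDD would mean pulling data to the driver.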

Is <- only accessible by the compiler

Scala's <- arrow seems a bit strange. Most operators are implemented somewhere in the source as a function, defined on a data type, either directly or implicitly. <- on the other hand seems to be unusable outside of a for comprehension, where it acts as a syntactic element used to signal the binding of a new variable in a monadic context (via map).
This is the only instance I can think of where Scala has an operator-looking syntactical element that is only usable in a specific context, and isn't an actual function.
Am I wrong about how <- works? Is it a special case symbol used just by the compiler, or is there some way a developer could use this behavior when writing their own code?
For example, would it be possible to write a macro to transform
forRange (i <- 0 to 10) { print(i) }
into
{ var i = 0; while (i <= 10) { print(i); i += 1 } }
instead of its standard map equivalent? As far as I can tell, any usage of i <- ... outside of a for context causes a compile error due to referencing an unknown value.
In short, yes: <- is a reserved word in Scala. It's a compiler thing.
Foreach
There is a strong distinction between foreach and for yield, but the syntax is only syntactic sugar, transformed at compile time.
The expression for (i <- 1 to 10) { statement } is translated to:
(1 to 10).foreach(i => { statement })
Multiple variables:
for (i <- 1 to 10; y <- 2 to 100) {..} becomes:
(1 to 10).foreach(i =>
  (2 to 100).foreach(y => {..}))
In general:
for (x <- someList) {..} becomes someList.foreach(x => {..}).
Put simply, they all get desugared to foreach calls. Which foreach is called depends on the type of the value on the right of <-.
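A quick way to check the nested translation is to run both forms and compare the order in which they visit pairs; a sketch with illustrative names:

```scala
import scala.collection.mutable.ListBuffer

object ForeachDesugar {
  // A for loop with two generators and no yield
  def withFor(): List[(Int, Int)] = {
    val seen = ListBuffer.empty[(Int, Int)]
    for (i <- 1 to 2; y <- 3 to 4) seen += ((i, y))
    seen.toList
  }

  // The nested foreach calls it is translated to
  def withForeach(): List[(Int, Int)] = {
    val seen = ListBuffer.empty[(Int, Int)]
    (1 to 2).foreach(i => (3 to 4).foreach(y => seen += ((i, y))))
    seen.toList
  }

  def main(args: Array[String]): Unit =
    println(withFor() == withForeach()) // true
}
```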
For yield
The for yield syntax is sugar for flatMap and map. The "stay in the monad" rule applies here.
for (x <- someList) yield {..} gets translated to someList.map(x => {..}).
Multiple generators become nested flatMap calls, with map on the innermost one:
for {
  x <- someList; y <- someOtherList
} yield {..} becomes:
someList.flatMap(x =>
  someOtherList.map(y => {..})), and so on.
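The same kind of check for yield, comparing each for/yield with the map/flatMap form it corresponds to (illustrative names, Scala 2.13):

```scala
object YieldDesugar {
  val xs: List[Int] = List(1, 2)
  val ys: List[String] = List("a", "b")

  // One generator: a single map
  val single: List[Int] = for (x <- xs) yield x * 10
  val singleDesugared: List[Int] = xs.map(x => x * 10)

  // Two generators: flatMap on the outer one, map on the innermost one
  val pairs: List[(Int, String)] = for { x <- xs; y <- ys } yield (x, y)
  val pairsDesugared: List[(Int, String)] = xs.flatMap(x => ys.map(y => (x, y)))

  def main(args: Array[String]): Unit = {
    println(single) // List(10, 20)
    println(pairs)  // List((1,a), (1,b), (2,a), (2,b))
  }
}
```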
The point
The point is that the <- operator is nothing more than syntactic sugar to make code more readable, but it always gets replaced at compile time.
To emphasize Rob's point
Rob gives excellent examples of other Scala syntactic sugar.
A context bound
package somepackage;
class Test[T : Manifest] {}
Is actually translated to:
class Test[T](implicit evidence: Manifest[T])
As proof, try to alias a type with a context bound:
type TestAlias[T : Manifest] = somepackage.Test[T] // error: the : syntax can't be used here
It is perhaps easy to see that the : Manifest part is not actually a type parameter; it becomes an implicit value parameter.
It's just easier to type class Test[T : Manifest] rather than class Test[T](implicit evidence: Manifest[T]).
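The same expansion works with any type class. This sketch uses Ordering instead of Manifest purely for illustration, because its evidence value is easy to observe; the object and method names are hypothetical, and the explicit form mirrors the implicit-parameter translation described above.

```scala
object ContextBounds {
  // Context-bound form: some Ordering[T] must be available implicitly
  def largest[T: Ordering](xs: Seq[T]): T = xs.max

  // Roughly what the compiler expands the definition above into:
  // an extra implicit parameter list (the name `ord` is arbitrary)
  def largestExplicit[T](xs: Seq[T])(implicit ord: Ordering[T]): T = xs.max(ord)

  def main(args: Array[String]): Unit = {
    println(largest(Seq(3, 1, 2))) // 3
    // The evidence is an ordinary value, so it can be passed by hand
    println(largestExplicit(Seq("ccc", "a", "bb"))(Ordering.by[String, Int](_.length).reverse)) // a
  }
}
```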
The <- operator is a reserved word in the language (see the Scala Language Specification, page 4), but it isn't alone. => is also a reserved word rather than a function. (Also _, :, =, <:, <%, >:, #, and @.) So you couldn't create a function with that name. I don't believe you could adapt it the way you're suggesting, either (though perhaps someone more clever will know a way). You could create a function called `<-` (with surrounding back-ticks), but that would probably be more awkward than it deserves.
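Without a macro and without <-, an ordinary higher-order method can give a call site close to the forRange in the question while running a while loop underneath. This is a sketch using a hypothetical forRange helper, not anything from the standard library:

```scala
object ForRange {
  // Hypothetical helper: the loop body is passed as a function and run
  // inside a while loop, so no Range object or foreach call is involved.
  def forRange(from: Int, to: Int)(body: Int => Unit): Unit = {
    var i = from
    while (i <= to) {
      body(i)
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    forRange(0, 10) { i => print(i) } // prints 012345678910
    println()
  }
}
```

The body is still a closure, so this is not exactly the hand-written while loop; getting that would need a macro or inlining.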

Why is the definition of Array.map in Scala "throw new Error()"?

The source code of map for Array is:
override def map[B](f: A => B): Array[B] = throw new Error()
But the following works:
val name : Array[String]= new Array(1)
name(0)="Oscar"
val x = name.map { ( s: String ) => s.toUpperCase }
// returns: x: Array[java.lang.String] = Array(OSCAR)
Generally, when you see throw new Error() in the source code of a library class, it represents a point where the compiler is intervening and implementing the method by bridging to a facility of the platform (remember this could be Java or .NET).
The Array SID explains how arrays used to be treated in Scala 2.7.x, and how they have changed in 2.8. The compiler used to magically convert the object to a BoxedArray if you called map.
In 2.8, integration of Arrays into the Scala collections framework is largely handled with use of normal language features: implicit conversions from Array[T] to WrappedArray[T] or ArraySeq[T], depending on the context, and implicit parameters of type Manifest[T] to support creation of arrays of a generic type T. Array indexing, length and update still appear as throw new Error(). Array#map no longer exists; instead, you find it on WrappedArray and ArraySeq as a regular method.
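On Scala 2.13 the arrangement is similar, though the wrapper is now ArrayOps. A sketch, assuming Scala 2.13's Predef.refArrayOps conversion, showing that the map call from the question goes through it (object and value names are illustrative):

```scala
object ArrayMapDemo {
  val name: Array[String] = new Array(1)
  name(0) = "Oscar"

  // As written in the question: Array has no map member, an implicit conversion supplies it
  val upper: Array[String] = name.map((s: String) => s.toUpperCase)

  // The same call with the Scala 2.13 conversion spelled out
  val upperExplicit: Array[String] = Predef.refArrayOps(name).map((s: String) => s.toUpperCase)

  def main(args: Array[String]): Unit =
    println(upperExplicit.mkString(", ")) // OSCAR
}
```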
UPDATE
If you're interested in how this compiler magic is defined, take a look at the Scala 2.8 incarnation of Cleanup.scala.
It looks like just dummy code, as Scala arrays are really Java arrays.

Scala implicit conversions: two ways to invoke

@lucastex posted about the Java Elvis operator, and I tried something in Scala to get the same effect.
I've just converted everything to a new structural type with the ?: operator taking an object of the same type as its argument. So, say:
implicit def toRockStar[B](v : B) = new {
  def ?:(opt: => B) = if (v == null) opt else v
}
val name: String = "Paulo" // example
Why does name ?: "Lucas" return "Lucas" while name.?:{"Lucas"} returns "Paulo"? The new structural type is supposed to return the initial value if it is not null, that is, "Paulo" in the above code.
I'm a bit confused. Any explanation?
Your operator ends in :, which makes it right-associative in infix notation: name ?: "Lucas" is parsed as "Lucas".?:(name). For example:
scala> 1 :: Nil == Nil.::(1)
res2: Boolean = true
All methods read left to right in dot notation, though. So you are actually applying your method to Lucas in the infix notation, and to name in the dot notation.
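The effect is easy to reproduce. This sketch replaces the question's structural type with a plain wrapper class (avoiding the reflective-calls feature) and adds a hypothetical ?| operator that does not end in :, so it stays left-associative:

```scala
import scala.language.implicitConversions

object RightAssoc {
  final class Elvis[B](v: B) {
    // Ends in ':' -> right-associative in infix position: `a ?: b` means `b.?:(a)`
    def ?:(opt: => B): B = if (v == null) opt else v
    // Hypothetical left-associative variant: `a ?| b` means `a.?|(b)`
    def ?|(opt: => B): B = if (v == null) opt else v
  }

  implicit def toElvis[B](v: B): Elvis[B] = new Elvis(v)

  val name: String = "Paulo"
  val nullName: String = null

  val infix: String = name ?: "Lucas"         // "Lucas".?:(name), so "Lucas"
  val dotted: String = name.?:("Lucas")       // receiver is name, so "Paulo"
  val leftToRight: String = name ?| "Lucas"   // "Paulo"
  val fallback: String = nullName ?| "Lucas"  // "Lucas"

  def main(args: Array[String]): Unit =
    println(Seq(infix, dotted, leftToRight, fallback).mkString(" "))
}
```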
By the way, the Elvis operator was not accepted for inclusion in Java 7.
For the record (I found this thread while searching for the following article again), Daniel Spiewak (another famous Daniel in the Scala world...) posted an article about Implementing Groovy’s Elvis Operator in Scala.