spark-shell cannot parse Scala lines that start with dot / period - scala

Whenever I find some Scala / Spark code online, I want to directly paste it into spark-shell to try it out. (I am using spark-shell with Spark 1.6 on both CentOS and Mac OS.)
Generally, this approach works well, but I always have problems when lines start with a dot / period (indicating a continuing method call). If I move the dot to the previous line, it works.
Example: Here is some code I found online:
val paramMap = ParamMap(lr.maxIter -> 20)
.put(lr.maxIter, 30)
.put(lr.regParam -> 0.1, lr.threshold -> 0.55)
So when I paste this directly into spark-shell, I see this error:
scala> val paramMap = ParamMap(lr.maxIter -> 20)
paramMap: org.apache.spark.ml.param.ParamMap =
{
logreg_d63b85553548-maxIter: 20
}
scala> .put(lr.maxIter, 30)
<console>:1: error: illegal start of definition
.put(lr.maxIter, 30)
^
scala> .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
<console>:1: error: illegal start of definition
.put(lr.regParam -> 0.1, lr.threshold -> 0.55)
^
However, when I instead move the dot to the previous line, everything is ok.
scala> val paramMap = ParamMap(lr.maxIter -> 20).
| put(lr.maxIter, 30).
| put(lr.regParam -> 0.1, lr.threshold -> 0.55)
paramMap: org.apache.spark.ml.param.ParamMap =
{
logreg_d63b85553548-maxIter: 30,
logreg_d63b85553548-regParam: 0.1,
logreg_d63b85553548-threshold: 0.55
}
Is there a way to configure spark-shell so that it will accept lines that start with a dot (or equivalently, so that it will continue lines even if they don't end in a dot)?

A line that begins with a dot continues the previous result only if there is no leading whitespace before the dot.
scala> "3"
res0: String = 3
scala> .toInt
res1: Int = 3
scala> "3"
res2: String = 3
scala>   .toInt
<console>:1: error: illegal start of definition
  .toInt
  ^
PS: Maybe it should ignore whitespace when a dot is detected. A JIRA was added on that concern here.

Use :paste command:
scala> :paste
// Entering paste mode (ctrl-D to finish)
if (true)
print("that was true")
else
print("false")
// Exiting paste mode, now interpreting.
that was true

You can also wrap your expression in curly braces:
val paramMap = { ParamMap(lr.maxIter -> 20)
.put(lr.maxIter, 30)
.put(lr.regParam -> 0.1, lr.threshold -> 0.55)
}
This works because the REPL is “greedy” and consumes the first full statement you type in, so attempting to paste blocks of code into it can fail.
For more details, see: http://alvinalexander.com/scala/scala-repl-how-to-paste-load-blocks-of-source-code
There is also a useful :paste -raw mode.
See http://docs.scala-lang.org/overviews/repl/overview.html

The Spark shell accepts multi-line Spark Scala code, whether pasted or typed line by line, if you wrap it in parentheses (). You do not need to move the dots to the end of each line.
In your example, start with val paramMap = (. From there you can type each line by hand or paste in your multi-line hyperparameter code. Then add a closing parenthesis ) after the code to complete the expression. When using this method, indent with two spaces rather than tabs.
Full code example:
scala> val paramMap = (ParamMap(lr.maxIter -> 20)
| .put(lr.maxIter, 30)
| .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
| )

You can also put the periods at the end of the preceding line, and then it works fine, although this may break style conventions:
val paramMap = ParamMap(lr.maxIter -> 20).
put(lr.maxIter, 30).
put(lr.regParam -> 0.1, lr.threshold -> 0.55)
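The three forms suggested in the answers above (parentheses, curly braces, trailing dots) can be compared without Spark. The following is a minimal sketch that assumes a plain immutable Map as a stand-in for ParamMap; the object name ChainForms and the keys are hypothetical. It only shows that the three layouts build the same value once the parser sees the whole expression.

```scala
object ChainForms {
  // Stand-in for ParamMap (assumption): an immutable Map updated through a method chain.

  // Inside parentheses, line breaks do not end the expression.
  val wrappedInParens = (Map("maxIter" -> 20.0)
    .updated("maxIter", 30.0)
    .updated("regParam", 0.1)
  )

  // Inside braces, the block is read as a whole before it is evaluated.
  val wrappedInBraces = { Map("maxIter" -> 20.0)
    .updated("maxIter", 30.0)
    .updated("regParam", 0.1)
  }

  // A trailing dot tells the parser that the expression is not finished yet.
  val trailingDots = Map("maxIter" -> 20.0).
    updated("maxIter", 30.0).
    updated("regParam", 0.1)
}
```

In a compiled source file the leading-dot layout also works without any wrapping; the wrapping matters only when the REPL evaluates input line by line.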

Related

Scala Map And Pair Manipulation

I am a beginner in Scala. I read this code about Map manipulation but can't understand how it works.
val terms = Map (1 -> 1.0, 2-> 2.0)
val (exp, coeff) = (2, 4.0)
exp -> (coeff + terms(exp)) //> res: (Int, Double) = (2,6.0)
coeff + terms(exp) //> res: Double = 6.0
I think the third line probably applied a map function,
but the right hand side is apparently a number.
Why is the output a pair?
Thanks.
val (exp, coeff) = (2, 4.0)
This is a destructuring assignment (a pattern match on a tuple): it binds the values 2 and 4.0 to the names exp and coeff. exp is now 2 and coeff is now 4.0.
terms(exp)
This is a map-lookup which results in 2.0
exp -> (coeff + terms(exp))
This arrow -> is a short hand term for constructing a Pair. It now has the values 2 and 6.0 because coeff + terms(exp) is 6.0.
terms(exp) accesses the value of the "map object" terms at the key defined by exp.
In contrast to this, the usage of a "map method" would look like this:
val list = List(1, 2, 3, 4)
val double = (i: Int) => i * 2
val list2 = list.map(double)
println(list2)
list.map(double) executes the function literal double for every member of the list and gives back a new List object.
The printed output is:
List(2, 4, 6, 8)
terms(exp) is a map lookup, resolving to 2.0
(coeff + terms(exp)) is, therefore 4.0 + 2.0
In exp -> (coeff + terms(exp)) the arrow -> constructs a pair of the left and right operands.
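As a small illustration of the answers above, the sketch below builds the same pair with the arrow and with tuple syntax, then adds it to the map. The object name PairDemo and the val names viaArrow, viaTuple and updatedTerms are hypothetical and not part of the question.

```scala
object PairDemo {
  val terms = Map(1 -> 1.0, 2 -> 2.0)
  val (exp, coeff) = (2, 4.0)

  // The arrow is an ordinary method that returns a Tuple2 of its two operands.
  val viaArrow: (Int, Double) = exp -> (coeff + terms(exp))

  // The same pair written with tuple syntax.
  val viaTuple: (Int, Double) = (exp, coeff + terms(exp))

  // Adding a pair to an immutable Map returns a new Map with that key replaced.
  val updatedTerms = terms + viaArrow
}
```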

Why does this code work: "map.apply(1)+=3"

val map = scala.collection.mutable.Map(1 -> 2)
map(1) += 3
map.apply(1) += 3
(map.apply(1)).+=(3)
I don't understand why all of these lines compile.
In the first case, I think the code is expanded to map(1) = map(1) + 3, and then to map.update(1, map(1) + 3).
But in the second and third cases,
map.apply(1) = map.apply(1) + 3 causes a compilation error, of course.
What are the second and third lines expanded to?
Running :replay -Xprint:typer from the scala console:
1) map(1) += 3 expands to:
map.update(1, map.apply(1).+(3))
2) map.apply(1) += 3 expands to:
map.update(1, map.apply(1).+(3))
3) (map.apply(1)).+=(3) expands to:
map.update(1, map.apply(1).+(3))
EDIT Answer to the question in the comments
If all three expansions are the same, why do the second and third cause a compilation error?
The second and third, map.apply(1) += 3 and (map.apply(1)).+=(3), compile fine and are also equivalent.
What I tried to show with my answer is that map.apply(1) += 3 doesn't expand to map.apply(1) = map.apply(1) + 3, as explained by @som-snytt in the first part of his answer.
BTW, map(1) = map(1) + 3 does not expand to map.update(1, map(1) + 3) as stated in the question.
I hope this clarifies my answer.
The rule for update is in the spec under assignments, and expansion of assignment operators here.
The question is why is explicit m.apply not taken as m() for purposes of the update rule.
The two forms are supposed to be equivalent.
Someone just debated update syntax with examples.
scala> import reflect.runtime.universe._
import reflect.runtime.universe._
scala> val map = scala.collection.mutable.Map(1 -> 2)
map: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> reify(map(1) += 3)
res0: reflect.runtime.universe.Expr[Unit] = Expr[Unit]($read.map.update(1, $read.map.apply(1).$plus(3)))
scala> reify(map.apply(1) += 3)
res1: reflect.runtime.universe.Expr[Unit] = Expr[Unit]($read.map.update(1, $read.map.apply(1).$plus(3)))
scala> reify(map(1) = map(1) + 3)
res2: reflect.runtime.universe.Expr[Unit] = Expr[Unit]($read.map.update(1, $read.map.apply(1).$plus(3)))
scala> reify(map.apply(1) = map.apply(1) + 3)
<console>:16: error: missing argument list for method apply in trait MapLike
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
reify(map.apply(1) = map.apply(1) + 3)
^
scala> map.apply.update(1, map.apply(1) + 3)
<console>:16: error: missing argument list for method apply in trait MapLike
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
map.apply.update(1, map.apply(1) + 3)
^
Edit: FWIW, that's just how it is.
Edit:
This is the anomaly:
scala> val m = collection.mutable.Map(1->2)
m: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> m(1) = m(1) + 3
scala> m(1) += 3
scala> m.apply(1) += 3
scala> m.apply(1) = m.apply(1) + 3
<console>:13: error: missing argument list for method apply in trait MapLike
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
m.apply(1) = m.apply(1) + 3
^
Since these expressions are all equivalent, they should all compile to an invocation of update.
The last expression fails to typecheck because the compiler does a mechanical rewrite to m.apply.update(1, m.apply(1) + 3) instead of m.update.
The explanation in gitter chat is that, well, the compiler isn't required to be smart enough to recognize m.apply(1) as m(1) in this context. After all, possible ambiguities ensue. What if apply is parameterless and returns a value with an update method? Do you take m.apply(1) as m(1) only if it doesn't typecheck otherwise?
It's clear that, by the spec, m(1) += ??? is expanded to m(1) = m(1) + ??? and then converted to m.update(1, m(1) + ???).
In the code, the two transformations (converting op= to x = x op expr and x(1) = ??? to x.update(1, ???)) are compressed:
Deciding if something is mutable
On error with op=, attempt conversion to assignment
Converting to update (or to plain assignment).
It might be possible to work around the limitation in the implementation, but it's not obvious that it would spec nicely (as above, where apply might be paramless).
Then should m.apply(1) += 3 fail to compile, for symmetry? If the compiler worked harder to retain the source expression, it could at least be more consistent in this case.
FWIW, this works in 2.12.0-M3
C:\Users\erichardson>scala
Welcome to Scala 2.12.0-M3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66).
Type in expressions for evaluation. Or try :help.
scala> val map = scala.collection.mutable.Map(1 -> 2)
map: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> map(1) += 3
scala> map
res1: scala.collection.mutable.Map[Int,Int] = Map(1 -> 5)
scala> map.apply(1) += 3
scala> map
res3: scala.collection.mutable.Map[Int,Int] = Map(1 -> 8)
scala> (map.apply(1)).+=(3)
scala> map
res5: scala.collection.mutable.Map[Int,Int] = Map(1 -> 11)
scala>
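To watch the rewrite described in this thread without compiler flags, one option is a small class that records which of its methods get called. This is only a sketch: the class Cells, its calls log and the object name UpdateDemo are hypothetical, and it covers just the two forms that every answer agrees on, m(1) += 3 and m(1) = m(1) + 3.

```scala
import scala.collection.mutable

object UpdateDemo {
  // Hypothetical wrapper that logs every apply and update call made on it.
  final class Cells {
    private val data = mutable.Map(1 -> 2)
    val calls = mutable.ListBuffer.empty[String]

    def apply(k: Int): Int = {
      calls += s"apply($k)"
      data(k)
    }

    def update(k: Int, v: Int): Unit = {
      calls += s"update($k, $v)"
      data(k) = v
    }
  }

  val cells = new Cells

  // Int has no += method, so this is rewritten to an assignment and then to update.
  cells(1) += 3

  // The explicit assignment form goes through the same update call.
  cells(1) = cells(1) + 3

  // Snapshot of the log: one apply followed by one update for each statement.
  val recorded: List[String] = cells.calls.toList
}
```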

How do you chain commands in the Scala interpreter?

In a Windows batch file, you chain commands with the && operator. How do you do the same in the Scala interpreter? It seems silly that I need to run :load file and then call import mainobj._ after every load. I would like to chain them into a one-liner.
You could write multiple statements in one line of Scala with ;
scala> val a:Int = 3; val b:Int = 4; println(a+b);
7
a: Int = 3
b: Int = 4
Or first type { then write your commands line by line finishing with }
scala> {
| val x:Int = 1
| val y:Int = 2
| println(x + y)
| }
3
Or you can use :paste to turn on paste mode, and then enter your code.
scala> :paste
// Entering paste mode (ctrl-D to finish)
if (true)
print("that was true")
else
print("false")
// Exiting paste mode, now interpreting.
that was true

Not sure where my assignment is going

I ran into some issues today making assignments to a var field in a case class instance stored in a map. Here's a simple session in the repl demonstrating the problem:
scala> case class X(var x: Int)
defined class X
scala> val m = Map('x -> X(1))
m: scala.collection.immutable.Map[Symbol,X] = Map('x -> X(1))
scala> m
res0: scala.collection.immutable.Map[Symbol,X] = Map('x -> X(1))
scala> m('x).x = 7
scala> m
res1: scala.collection.immutable.Map[Symbol,X] = Map('x -> X(1))
scala> val x = m('x)
x: X = X(1)
scala> x.x = 7
x.x: Int = 7
scala> x
res2: X = X(7)
scala> m
res3: scala.collection.immutable.Map[Symbol,X] = Map('x -> X(7))
scala> m('x).x_=(8)
scala> m
res5: scala.collection.immutable.Map[Symbol,X] = Map('x -> X(8))
The first attempt at assignment does nothing. However, storing the instance in a val and then doing the assignment works, as does directly calling the assignment method for the field.
I'm using Scala 2.9.2.
If this is expected behavior, it would be nice if someone could explain it to me because I can't seem to make sense of it right now. If this is a bug then that would be good to know as well.
Either way, it would also be interesting to know where that first m('x).x = 7 assignment is going. I assume something is getting mutated somewhere—I just have no idea what that something could be.
Update: It looks like this only happens in the repl. I just tried compiling the code and the assignment happens as expected. So, what is the repl doing to my assignment?
This seems to be a bug. If one executes this with a 2.10 nightly an error message is thrown:
scala> m('x).x = 7
<console>:10: error: ')' expected but string literal found.
+ "m(scala.Symbol("x")).x: Int = " + `$ires0` + "\n"
^
I created a ticket for this.
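The question's update says the assignment behaves as expected in compiled code. A minimal sketch of that compiled behavior follows, assuming String keys in place of the Symbol literals from the session; the object name VarFieldDemo and the vals afterAssignment and afterSetterCall are hypothetical.

```scala
object VarFieldDemo {
  case class X(var x: Int)

  // The Map itself is immutable, but the X instance it holds is not.
  val m = Map("x" -> X(1))

  // Assignment syntax on the looked-up instance calls the generated setter x_=.
  m("x").x = 7
  val afterAssignment = m("x").x

  // Calling the setter directly does the same thing.
  m("x").x_=(8)
  val afterSetterCall = m("x").x
}
```

Both forms mutate the single X instance stored in the map, which is why the later lookups see the new value.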

Scala Compilation Error using Partition

scala> val set = Set("apricot", "banana", "clementine", "durian", "fig", "guava", "jackfruit", "kiwi", "lime", "mango")
set: scala.collection.immutable.Set[java.lang.String] = Set(banana, durian, fig, jackfruit, lime, mango, clementine, apricot, kiwi, guava)
scala> set.partition(_ length > 5)
<console>:1: error: ')' expected but integer literal found.
set.partition(_ length > 5)
^
scala> set.partition(_.length > 5)
res5: (scala.collection.immutable.Set[java.lang.String], scala.collection.immutable.Set[java.lang.String]) = (Set(banana, durian, jackfruit, clementine, apricot),Set(fig, lime, mango, kiwi, guava))
Can someone please explain why it complains when I execute
set.partition(_ length > 5)
and not when I execute
set.partition(_.length > 5)
I have also tried the following with little success:
scala> set.partition((_ length) > 5)
<console>:9: error: missing parameter type for expanded function ((x$1) => x$1.length)
set.partition((_ length) > 5)
^
When you drop the dot, Scala assumes you have a one-parameter method. In other words, when you say _ length > 5 it thinks that length is a method requiring one argument, that > is that argument, and then it doesn't know what to do with the 5.
Notice that this is similar to when you write 5 + 5. This statement is the same as writing 5.+(5), but you are dropping the dot and parentheses. Scala notices the missing dot and assumes (correctly) that + is a method requiring a single argument.
If you write "abc" length by itself, then there is nothing for Scala to assume is the argument, so it then realizes that length doesn't require one.
So:
"abc".length // fine
"abc" length // fine
"abc".length > 4 // fine
("abc" length) > 4 // fine
"abc" length > 4 // error!
Thus, you need either a dot or parentheses to tell the compiler that "abc" and length go together and have no additional argument.
In terms of coding style, I suggest you always use the dot for zero-arg methods. I find it more readable and it will help you to avoid compilation errors such as this one.
When you write:
set.partition(_ length > 5)
//            ^ object
//              ^ method
//                     ^ parameter
it treats length as a method that receives one parameter, >.
While when you write:
set.partition(_.length > 5)
//            ^------^ object
//                     ^ method
//                       ^ parameter
it treats _.length as the object, > as the method, and 5 as the parameter.
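The dotted form recommended above can be checked on a smaller set. The sketch below uses a hypothetical five-element subset of the fruit names and the hypothetical names PartitionDemo, longNames and shortNames; it sticks to the dot notation so that it does not depend on postfix operator syntax.

```scala
object PartitionDemo {
  val fruits = Set("apricot", "banana", "fig", "kiwi", "clementine")

  // _.length is evaluated first, and its result is then compared with 5.
  val (longNames, shortNames) = fruits.partition(_.length > 5)

  // The same predicate with an explicit parameter name.
  val (longNames2, shortNames2) = fruits.partition(name => name.length > 5)
}
```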