How to enable Partial Unification in Spark REPL with Scala 2.11.8? - scala

I have Scala code written in Scala 2.11.12 using the partial-unification compiler option, which I would like to run in a Spark 2.2.2 REPL.
With a Spark version compiled against Scala 2.11.12 (i.e. 2.3+), this is possible in the Spark REPL via :settings -Ypartial-unification, and the code executes.
I want to run this on Spark 2.2.2, which is compiled against Scala 2.11.8.
To do this, I have downloaded the jar with the partial unification compiler plugin (source from: https://github.com/milessabin/si2712fix-plugin), which backports this setting.
I've played around with a Scala 2.11.8 REPL (adding jar to the classpath - seems too rudimentary) and haven't managed to get it working there (before trying to add it to Spark), and am asking if anyone knows how to do this or if adding a compiler setting to a REPL via a JAR is not possible.
Any other advice appreciated!

Related

Could not find or load main class Error when changing scala compiler version

The default version of Scala complier of Scala IDE for eclipse is 2.12. It runs well for hello world.
However when I change the Scala complier version to 2.11 like following:
Then it shows Error: Could not find or load main class
When I change back to 2.12, it works again.
My question:
My Scala IDE for eclipse version is latest. Why this happens? And what should I do? I have to change to scala 2.11 to load the apache-spark 2.4 jar files. Otherwise it will show errors.

How to choose the scala version for my spark program?

I am building my first Spark application, developing with IDEA.
In my cluster, the version of Spark is 2.1.0, and the version of Scala is 2.11.8.
http://spark.apache.org/downloads.html tells me:"Starting version 2.0, Spark is built with Scala 2.11 by default. Scala 2.10 users should download the Spark source package and build with Scala 2.10 support".
So here is my question:What's the meaning of "Scala 2.10 users should download the Spark source package and build with Scala 2.10 support"? Why not use the version of Scala 2.1.1?
Another question:Which version of Scala can I choose?
First a word about the "why".
The reason this subject evens exists is that scala versions are not (generally speacking) binary compatible, although most of the times, source code is compatible.
So you can take Scala 2.10 source and compile it into 2.11.x or 2.10.x versions. But 2.10.x compiled binaries (JARs) can not be run in a 2.11.x environment.
You can read more on the subject.
Spark Distributions
So, the Spark package, as you mention, is built for Scala 2.11.x runtimes.
That means you can not run a Scala 2.10.x JAR of yours, on a cluster / Spark instance that runs with the spark.apache.org-built distribution of spark.
What would work is :
You compile your JAR for scala 2.11.x and keep the same spark
You recompile Spark for Scala 2.10 and keep your JAR as is
What are your options
Compiling your own JAR for Scala 2.11 instead of 2.10 is usually far easier than compiling Spark in and of itself (lots of dependencies to get right).
Usually, your scala code is built with sbt, and sbt can target a specific scala version, see for example, this thread on SO. It is a matter of specifying :
scalaVersion in ThisBuild := "2.10.0"
You can also use sbt to "cross build", that is, build different JARs for different scala versions.
crossScalaVersions := Seq("2.11.11", "2.12.2")
How to chose a scala version
Well, this is "sort of" opinion based. My recommandation would be : chose the scala version that matches your production Spark cluster.
If your production Spark is 2.3 downloaded from https://spark.apache.org/downloads.html, then as they say, it uses Scala 2.11 and that is what you should use too. Using anything else, in my view, just leaves the door open for various incompatibilities down the road.
Stick with what your production needs.

com.databricks.spark.csv version requirement

Which version of com.databricks.spark.csv is compatible for Spark 1.6.1, and scala 2.10.5?
I can see
com.databricks_spark-csv_2.10-1.5.0.jar
com.databricks_spark-csv_2.11-1.3.0.jar
already available on my machine, and as per my understanding goes, if I have scala version 2.10, then the first option is the one that I have to use. just wanted to re-confirm.
You should use the first jar as you have scala 2.10 on your machine
com.databricks_spark-csv_2.10-1.5.0.jar
As 2.10 means it is meant for scala 2.10

Using Scala 2.12 with Spark 2.x

At the Spark 2.1 docs it's mentioned that
Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
at the Scala 2.12 release news it's also mentioned that:
Although Scala 2.11 and 2.12 are mostly source compatible to facilitate cross-building, they are not binary compatible. This allows us to keep improving the Scala compiler and standard library.
But when I build an uber jar (using Scala 2.12) and run it on Spark 2.1. every thing work just fine.
and I know its not any official source but at the 47 degree blog they mentioned that Spark 2.1 does support Scala 2.12.
How can one explain those (conflicts?) pieces of information ?
Spark does not support Scala 2.12. You can follow SPARK-14220 (Build and test Spark against Scala 2.12) to get up to date status.
update:
Spark 2.4 added an experimental Scala 2.12 support.
Scala 2.12 is officially supported (and required) as of Spark 3. Summary:
Spark 2.0 - 2.3: Required Scala 2.11
Spark 2.4: Supported Scala 2.11 and Scala 2.12, but not really cause almost all runtimes only supported Scala 2.11.
Spark 3: Only Scala 2.12 is supported
Using a Spark runtime that's compiled with one Scala version and a JAR file that's compiled with another Scala version is dangerous and causes strange bugs. For example, as noted here, using a Scala 2.11 compiled JAR on a Spark 3 cluster will cause this error: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps.
Look at all the poor Spark users running into this very error.
Make sure to look into Scala cross compilation and understand the %% operator in SBT to limit your suffering. Maintaining Scala projects is hard and minimizing your dependencies is recommended.
To add to the answer, I believe it is a typo https://spark.apache.org/releases/spark-release-2-0-0.html has no mention of scala 2.12.
Also, if we look at timings Scala 2.12 was not released untill November 2016 and Spark 2.0.0 was released on July 2016.
References:
https://spark.apache.org/news/index.html
www.scala-lang.org/news/2.12.0/

Scala Runtime errors calling program on Spark Job Server

I used spark 1.6.2 and Scala 11.8 to compile my project. The generated uber jar with dependencies is placed inside Spark Job Server (that seems to use Scala 10.4 (SCALA_VERSION=2.10.4 specified in .sh file)
There is no problem in starting the server, uploading context/ app jars. But at runtime, the following errors occur
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror
Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"? talks about using Scala 10 to compile the sources. Is it true?
Any suggestions please...
Use scala 2.10.4 to compile your project. Otherwise you need to compile spark with 11 too.