I have learned that Scala is suffering from a limitation, that all Scala bytecodes needs to be generated from same compiler version. e.g. I cannot have a library built for 2.9 to work with my application which is built by 2.9.1
http://lift.la/blog/scalas-version-fragility-make-the-enterprise
I tried to search from the web for more discussion on this issue but cannot find much updates. Is this issue, as in Scala 2.11.6, fixed in any extend?
In Scala, the 'middle' number in the version string is the major version, so in 2.10.x and 2.11.x, the major version is 10 and 11 respectively.
Major versions are binary compatible. Therefore, if you have a library compiled against Scala 2.11.0, you can safely use it in a project that uses 2.11.6 without recompilation, and vice versa. If your library was compiled for Scala 2.10.5, you would have to compile it newly to use in a Scala 2.11.x project.
If your code doesn't call into deprecated API, it should be source compatible with the subsequent major version.
Most libraries are published for at least two major versions at the same time, so there is quite a bit of elasticity. Take an example, Scalaz, it has its latest artifacts cross-built for Scala 2.9.3, Scala 2.10.x, and Scala 2.11.x.
Related
I would like to cross-build some of my Bazel targets to Scala 2.12 and 2.13. As a further point of complexity, I need to be able to express cross-target dependencies (eg. some 2.13 target may have a Bazel dependency on a 2.12 target).
Note: this isn't a regular library dependency (eg. with the dependency 2.12-built JAR showing up on the class path when compiling the 2.13 JAR), as that would almost surely result in issues due to having two incompatible versions of the Scala standard library on the class path. Rather, this is just a case where I need the dependency JAR built so I can use it in some integration tests in the 2.13 target.
What I've found online so far...
This issue from rules_scala seems it doesn't support baking the Scala version into the target and instead you have to pick the Scala version globally.
This Databricks post has a cross-building section that is exactly what I think I would like (eg. one target generated per library per supported Scala version), but the snippets in that post don't seem to be backed by any runnable Bazel code.
This later post by Databricks also hints at a cross_scala_lib rule, but also doesn't have any accompanying code.
I am building my first Spark application, developing with IDEA.
In my cluster, the version of Spark is 2.1.0, and the version of Scala is 2.11.8.
http://spark.apache.org/downloads.html tells me:"Starting version 2.0, Spark is built with Scala 2.11 by default. Scala 2.10 users should download the Spark source package and build with Scala 2.10 support".
So here is my question:What's the meaning of "Scala 2.10 users should download the Spark source package and build with Scala 2.10 support"? Why not use the version of Scala 2.1.1?
Another question:Which version of Scala can I choose?
First a word about the "why".
The reason this subject evens exists is that scala versions are not (generally speacking) binary compatible, although most of the times, source code is compatible.
So you can take Scala 2.10 source and compile it into 2.11.x or 2.10.x versions. But 2.10.x compiled binaries (JARs) can not be run in a 2.11.x environment.
You can read more on the subject.
Spark Distributions
So, the Spark package, as you mention, is built for Scala 2.11.x runtimes.
That means you can not run a Scala 2.10.x JAR of yours, on a cluster / Spark instance that runs with the spark.apache.org-built distribution of spark.
What would work is :
You compile your JAR for scala 2.11.x and keep the same spark
You recompile Spark for Scala 2.10 and keep your JAR as is
What are your options
Compiling your own JAR for Scala 2.11 instead of 2.10 is usually far easier than compiling Spark in and of itself (lots of dependencies to get right).
Usually, your scala code is built with sbt, and sbt can target a specific scala version, see for example, this thread on SO. It is a matter of specifying :
scalaVersion in ThisBuild := "2.10.0"
You can also use sbt to "cross build", that is, build different JARs for different scala versions.
crossScalaVersions := Seq("2.11.11", "2.12.2")
How to chose a scala version
Well, this is "sort of" opinion based. My recommandation would be : chose the scala version that matches your production Spark cluster.
If your production Spark is 2.3 downloaded from https://spark.apache.org/downloads.html, then as they say, it uses Scala 2.11 and that is what you should use too. Using anything else, in my view, just leaves the door open for various incompatibilities down the road.
Stick with what your production needs.
Which version of com.databricks.spark.csv is compatible for Spark 1.6.1, and scala 2.10.5?
I can see
com.databricks_spark-csv_2.10-1.5.0.jar
com.databricks_spark-csv_2.11-1.3.0.jar
already available on my machine, and as per my understanding goes, if I have scala version 2.10, then the first option is the one that I have to use. just wanted to re-confirm.
You should use the first jar as you have scala 2.10 on your machine
com.databricks_spark-csv_2.10-1.5.0.jar
As 2.10 means it is meant for scala 2.10
At the Spark 2.1 docs it's mentioned that
Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
at the Scala 2.12 release news it's also mentioned that:
Although Scala 2.11 and 2.12 are mostly source compatible to facilitate cross-building, they are not binary compatible. This allows us to keep improving the Scala compiler and standard library.
But when I build an uber jar (using Scala 2.12) and run it on Spark 2.1. every thing work just fine.
and I know its not any official source but at the 47 degree blog they mentioned that Spark 2.1 does support Scala 2.12.
How can one explain those (conflicts?) pieces of information ?
Spark does not support Scala 2.12. You can follow SPARK-14220 (Build and test Spark against Scala 2.12) to get up to date status.
update:
Spark 2.4 added an experimental Scala 2.12 support.
Scala 2.12 is officially supported (and required) as of Spark 3. Summary:
Spark 2.0 - 2.3: Required Scala 2.11
Spark 2.4: Supported Scala 2.11 and Scala 2.12, but not really cause almost all runtimes only supported Scala 2.11.
Spark 3: Only Scala 2.12 is supported
Using a Spark runtime that's compiled with one Scala version and a JAR file that's compiled with another Scala version is dangerous and causes strange bugs. For example, as noted here, using a Scala 2.11 compiled JAR on a Spark 3 cluster will cause this error: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps.
Look at all the poor Spark users running into this very error.
Make sure to look into Scala cross compilation and understand the %% operator in SBT to limit your suffering. Maintaining Scala projects is hard and minimizing your dependencies is recommended.
To add to the answer, I believe it is a typo https://spark.apache.org/releases/spark-release-2-0-0.html has no mention of scala 2.12.
Also, if we look at timings Scala 2.12 was not released untill November 2016 and Spark 2.0.0 was released on July 2016.
References:
https://spark.apache.org/news/index.html
www.scala-lang.org/news/2.12.0/
Is it possible to include 2.10 packages in 2.11 programs, and vis versa?
Are there any special corner cases I should watch out for?
It's not possible. Major versions are not binary compatible with each other. So for any Scala versions 2.x.y and 2.w.z, where x != w, they will not be compatible. All libraries must be compiled against the same major version, however minor version differences are okay.