Scala/Spark version compatibility - scala

I am building my first spark application.
http://spark.apache.org/downloads.html tells me that Spark 2.x is built against Scala 2.11.
On the Scala site https://www.scala-lang.org/download/all.html I am seeing the versions from 2.11.0 - 2.11.11
So here is my question: what exactly does the 2.11 on the Spark site mean? Is it any Scala version in the 2.11.0 - 2.11.11 range?
Another question: Can I build my Spark apps using the latest Scala 2.12.2? I assume that Scala is backward compatible, so Spark libraries built with Scala say 2.11.x can be used/called in Scala 2.12.1 applications. Am I correct?

Contrary to what you assume, Scala is not backward compatible in that sense. You must use Scala 2.11 with Spark unless you rebuild Spark under Scala 2.12 (which is an option if you want to use the latest Scala version, but requires more work to get everything working).
When considering compatibility, you need to consider both source compatibility and binary compatibility. Scala does tend to be source backwards compatible, so you can rebuild your jar under a newer version, but it is not binary backward compatible, so you can't use a jar built with an old version with code from a new version.
This applies only across major versions: Scala 2.10, 2.11, 2.12, etc. are all major versions and are not binary compatible with each other (even if they are source compatible). Within a major version, though, compatibility is maintained, so a jar built for Scala 2.11 works with all versions 2.11.0 - 2.11.11 (and with any future 2.11 releases).
It is for this reason that most Scala libraries have separate releases for each major Scala version. You have to make sure that any library you use provides a jar for the version you are using, and that you use that jar and not one for a different version. If you use SBT, %% will select the correct version for you, but with Maven you need to make sure to use the correct artifact name. Artifact names are typically suffixed with _2.10, _2.11, or _2.12, referring to the Scala version the jar is built for.
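As a rough illustration of the naming convention described above, here is a minimal build.sbt sketch. It assumes a project on Scala 2.11 and uses spark-core 2.3.2 only as an example dependency; the commented-out line is the explicit form you would otherwise have to keep in sync by hand.

```scala
// build.sbt (sketch, not a complete build definition)
scalaVersion := "2.11.12"

// %% appends the Scala binary version (_2.11 here) to the artifact name,
// so this resolves the artifact spark-core_2.11.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.2"

// Equivalent explicit form with a single %, where the suffix is written by hand:
// libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.3.2"
```

With Maven there is no %% equivalent, so the artifactId has to carry the suffix explicitly, as in the commented-out line.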

For anyone who wants to get jump started, this is the versioning pair I've used.
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.3.2",
"org.apache.spark" %% "spark-sql" % "2.3.2"
)

I used these versions of Scala and Spark and it worked OK for my need:
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
Some libraries need the 2.11 version of Scala, and in that case one should use the versions mentioned by #the775.
NOTE: This is an old answer and is no longer current, as newer versions of Scala and Spark exist.

Related

How to integrate sbt-protoc and scalapb with sbt cross build?

I have a library that needs two different versions of "com.thesamet.scalapb" %% "compilerplugin" depending on the Scala version.
In my project/scalapb.sbt I have this code:
def scalapbVersion(version: String): String =
  if (version == "2.11") {
    println(s">>>>>>>> Using 0.9.7 to fix 2.11 compat. ${version}")
    "0.9.7"
  } else {
    println(s">>>>>>>> Using last version. ${version}")
    "0.10.2"
  }
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % scalapbVersion(scalaBinaryVersion.value)
Executing sbt clean "++2.11.12 compile" I get >>>>>>>> Using last version. 2.12, but in the logs I can also see that the cross-build plugin changes the version to Scala 2.11 after the previous message:
[info] Setting Scala version to 2.11.12 on 13 projects.
[info] Excluded 1 projects, run ++ 2.11.12 -v for more details.
So I suppose that the order is:
1. sbt loads the plugin configuration with the default Scala version.
2. The cross-build changes the Scala version.
How to integrate sbt-protoc with sbt cross-build?
sbt files in the project directory are evaluated before a specific scala version is picked up for cross building. This is why passing ++2.11.12 has no effect on scalaBinaryVersion in the context of project/scalapb.sbt.
Using different versions of compilerplugin in a single build is not officially supported at this point, but there are a few workarounds that you can try:
1. Download scalapbc for the version of ScalaPB you would like to use. Write a shell script that generates sources using scalapbc. Check the generated sources into your code repository. Manually add scalapb-runtime to your libraryDependencies in build.sbt:
   libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" % (if (scalaVersion.value == "2.12.10") "0.10.8" else "0.9.7")
2. Use 0.9.7 for all Scala versions.
3. If it's reasonable for your situation, consider dropping Scala 2.11 support, as Scala 2.11 reached end-of-life a few years ago.
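For the first workaround, comparing scalaVersion.value against one exact string such as "2.12.10" stops matching as soon as the patch version changes. A possible variation, shown here only as a sketch, matches on the major/minor pair via sbt's CrossVersion.partialVersion instead. The version numbers 0.9.7 and 0.10.8 are the ones from the answer above; treating everything other than 2.11 as "newer" is an assumption of this example.

```scala
// build.sbt (sketch): pick scalapb-runtime by Scala binary version
libraryDependencies += {
  // partialVersion yields e.g. Some((2, 11)) for "2.11.12"
  val runtimeVersion = CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 11)) => "0.9.7"  // last line that still supports 2.11
    case _             => "0.10.8" // assumption: any other Scala version uses the newer runtime
  }
  "com.thesamet.scalapb" %% "scalapb-runtime" % runtimeVersion
}
```

Because this lives in build.sbt rather than under project/, it is evaluated per Scala version during a ++ or + cross-build, unlike the compilerplugin setting in project/scalapb.sbt.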

sbt dependencies ignoring version

In my build.sbt files I'm stating that I want to use version 18.9 from a library:
val finagleVersion = "18.9.0"
<zip>
lazy val commonDependencies = Seq(
<zip>,
"com.twitter" %% "finagle-core" % finagleVersion,
but this seems to be ignored when I run sbt with
scalacOptions ++= (compilerOptions :+ "-Ylog-classpath"),
which outputs all the jars used at compile time. And there I see that for every finagle dependency including core the 19.3 version is used:
C:\Users\<me>\.coursier\cache\v1\https\<me>%40<company repo>\artifactory\Central-cache\com\twitter\finagle-core_2.12\19.3.0\finagle-core_2.12-19.3.0.jar
Where is this "preference" for the latest versions coming from?
After using evicted and seeing which library overrides the version you want, you can opt to use dependencyOverrides. For example:
dependencyOverrides += "com.twitter" %% "finagle-core" % "18.9.0"
You do have to be careful, though, because the library that also depends on Finagle may require the newer version and break if you force the older one. That is why you should first check which library is evicting the old version, and validate that overriding it is OK.
Also important: this is an Ivy-only feature, so the override won't be present in the published pom.xml!

How to compile scala library to latest scala version

I want to use this library https://github.com/prismicio/scala-kit.
My project's scala version is 2.13.1. When I add this library as dependency:
val prismic = "io.prismic" % "scala-kit_2.11" % "1.3.1"
I get a NoClassDefFoundError at runtime. I guess it's due to a version conflict. So how do I publish this library so it works for Scala version 2.13.1?
Scala has no binary compatibility across major versions (e.g. between 2.11.x and 2.13.x). Either downgrade your project to Scala 2.11 with scalaVersion := "2.11.1" (or to 2.12.2 if you switch to the latest version of the library), or compile the library for 2.13.1 yourself.
What you should use is
val prismic = "io.prismic" %% "scala-kit" % "1.3.1"
which adds the correct suffix automatically, so you can't forget to change it when changing the Scala version. But if you look at https://mvnrepository.com/artifact/io.prismic/scala-kit, you'll see there is no version for 2.13 at all, but there are versions for 2.12. So if you want to change your Scala version, use 2.12.10 (the latest 2.12 at the time of writing).

Cross-build scala API for various spark version

I know that there are cross-build options for generating versions of a Scala API that run with different Scala versions. Let's say I stay with the same Scala version, 2.11.12: how can I set up my build.sbt to handle multiple versions of Spark? I have seen hints about the "provided" option for dependencies, but I'm not sure it is the best way to handle this.
Bonus: what if some Spark versions use 2.11 and others 2.12?
Please let us know if you have already worked through this issue.
"org.apache.spark" %% "spark-core" % "2.3.2"
Use the double percent sign (%%) when defining dependencies; sbt will then handle the Scala version for you. You should use the same Scala version for all your dependencies.
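One way to build the same code against several Spark versions, sketched here under assumptions, is to read the Spark version from a JVM system property and fall back to a default. The property name spark.version is a made-up convention for this example, not something sbt or Spark defines, and whether your code compiles against every Spark version you pass in still has to be checked.

```scala
// build.sbt (sketch)
// "spark.version" is a hypothetical property name chosen for this example.
val sparkVersion = sys.props.getOrElse("spark.version", "2.3.2")

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // "provided": compile against Spark, but expect the cluster to supply it at runtime
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)
```

A build for another Spark version could then be started with something like sbt -Dspark.version=2.4.0 package. For the bonus case, crossScalaVersions could be added on top, as long as each Spark version you select is actually published for the Scala version being built.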

How to add native library dependencies to sbt project?

I want to add a Java library (e.g. Apache PDFBox) to an sbt project.
This is the Ivy dependency:
<dependency org="org.apache.pdfbox" name="pdfbox" rev="1.8.2" />
I first tried to do the following:
resolvers += "Sonatype releases" at "http://oss.sonatype.org/content/repositories/releases/"
libraryDependencies += "org.apache.pdfbox" %% "pdfbox" % "1.8.2"
But it gives me errors of the type
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/apache/pdfbox/pdfbox_2.10/1.8.2/pdfbox_2.10-1.8.2.pom
So I understand that with this syntax I can just manage Scala dependencies. I am sure that there is a way to manage Java dependencies, but how?
I tried to search in Google for "sbt add java dependencies" but did not find (recognize) a relevant result.
You should replace the %% (double percent) with a single one.
libraryDependencies += "org.apache.pdfbox" % "pdfbox" % "1.8.2"
The double percent is a convenience operator that appends an underscore plus the Scala version to the artifact name in the path, which gives _2.10 in your case. A single percent should fix the problem.
Short answer:
Use
libraryDependencies += "org.apache.pdfbox" % "pdfbox" % "1.8.2"
For java libraries, and
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.8"
For Scala libraries, where the difference is the double % for the scala library.
Long answer:
Scala is not binary compatible across major versions, so a library compiled for Scala 2.12.x cannot be used by a project written in Scala 2.13.x.
So when writing a Scala library, you need to compile and publish it once per Scala major version you would like to support. When using a library in a project, you then have to pick the version compiled for the same Scala major version as you are using. Doing this manually would be cumbersome, so SBT has built-in support for it.
When publishing a library, you can add the crossScalaVersions key to SBT like
crossScalaVersions := Seq( "2.10.6", "2.11.11", "2.12.3" )
And then publish with sbt +publish. This makes SBT build and publish a version of the library for each of Scala 2.10.6, 2.11.11 and 2.12.3. Note that the minor number is irrelevant when it comes to compatibility for libraries. The published libraries will have their names suffixed with _2.10, _2.11 and _2.12 to show which Scala version each is for. An alternative to SBT's built-in support is the experimental plugin sbt-projectmatrix, which gives a lot more options and often faster builds.
When using a library, sbt can also help you pick the one compiled for the correct Scala version, and that's where %% comes into play. When you specify a library without the _ suffix in the name and use %% instead, sbt fills in the suffix matching the Scala major version you use, and thereby fetches the correct version of the library.
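To make the suffix rule concrete, here is a small standalone Scala sketch that models the naming only. The object CrossNaming and its methods are hypothetical names invented for this illustration and are not part of sbt; real sbt also treats pre-release Scala versions and Scala 3 differently, which this model ignores.

```scala
// Simplified model of how a Scala 2 version maps to an artifact suffix.
// Not sbt code: CrossNaming is a made-up helper for illustration.
object CrossNaming {
  // Reduce a full version such as "2.11.12" to its binary version "2.11".
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // Build the artifact name that %% would look up for a given library name.
  def artifactName(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"

  def main(args: Array[String]): Unit = {
    // Same versions as in the crossScalaVersions example above.
    Seq("2.10.6", "2.11.11", "2.12.3").foreach { v =>
      println(artifactName("scalactic", v))
    }
  }
}
```

Running it prints scalactic_2.10, scalactic_2.11 and scalactic_2.12, one per line, which mirrors the suffixes of the published jars.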