Scala Runtime errors calling program on Spark Job Server - scala

I used Spark 1.6.2 and Scala 2.11.8 to compile my project. The generated uber jar with dependencies is placed inside Spark Job Server, which seems to use Scala 2.10.4 (SCALA_VERSION=2.10.4 is specified in the .sh file).
There is no problem starting the server or uploading the context and app jars. But at runtime, the following error occurs:
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
The question "Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"?" suggests compiling the sources with Scala 2.10. Is that correct?
Any suggestions, please?

Use Scala 2.10.4 to compile your project. Otherwise you need to build Spark against Scala 2.11 as well.
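If the project happens to be built with sbt (the question does not say which build tool is used), pinning the Scala version could look roughly like the sketch below. The project name and version are hypothetical; only the Scala and Spark versions come from the question.

```scala
// build.sbt: a minimal sketch, not the asker's actual build.
// The project name and version below are hypothetical placeholders.
name := "my-spark-job"
version := "0.1.0"

// Match the Scala version the server runs on (SCALA_VERSION=2.10.4 in the question).
scalaVersion := "2.10.4"

// %% appends the Scala binary version, so this resolves to spark-core_2.10.
// "provided" keeps Spark itself out of the uber jar, assuming the server supplies it.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
```

Any other Scala libraries bundled into the uber jar would also need to be their _2.10 builds, which %% selects automatically.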

Related

How to enable Partial Unification in Spark REPL with Scala 2.11.8?

I have Scala code written in Scala 2.11.12 using the partial-unification compiler option, which I would like to run in a Spark 2.2.2 REPL.
With a Spark version compiled against Scala 2.11.12 (i.e. 2.3+), this is possible in the Spark REPL via :settings -Ypartial-unification, and the code executes.
I want to run this on Spark 2.2.2, which is compiled against Scala 2.11.8.
To do this, I have downloaded the jar with the partial unification compiler plugin (source from: https://github.com/milessabin/si2712fix-plugin), which backports this setting.
Before trying to add it to Spark, I experimented with a plain Scala 2.11.8 REPL by adding the jar to the classpath (which seems too rudimentary), but I haven't managed to get it working there. Does anyone know how to do this, or is it not possible to add a compiler setting to a REPL via a JAR?
Any other advice appreciated!

spark-submit on standalone cluster complains that scala-2.10 jars do not exist

I'm new to Spark and downloaded the pre-compiled Spark binaries from Apache (Spark-2.1.0-bin-hadoop2.7).
When submitting my Scala (2.11.8) uber jar, the cluster throws an error:
java.lang.IllegalStateException: Library directory '/root/spark/assembly/target/scala-2.10/jars' does not exist; make sure Spark is built
I'm not running Scala 2.10, and Spark isn't compiled (as far as I know) with Scala 2.10.
Could it be that one of my dependencies is based on Scala 2.10?
Any suggestions as to what could be wrong?
Not sure what is wrong with the pre-built spark-2.1.0, but I've just downloaded Spark 2.2.0 and it is working great.
Try setting SPARK_HOME="location of your Spark installation" on your system or in your IDE.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD

Please note that I am a better data miner than programmer.
I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from "https://github.com/sryza/aas"),
and I ran into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD".
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
When I first tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product", but I solved it by setting the scope of scala-lib to compile in Maven.
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7 and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15, and different versions of Java (1.7), Scala (2.10) and Spark, but without success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book show a --master argument to spark-shell, but you will need to specify arguments appropriate for your environment. If you don't have Hadoop installed, you need to start the spark-shell locally. To run the samples, you can simply pass paths as local file references (file:///) rather than HDFS references (hdfs://).
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is meant to be used as compiled libraries rather than as a standalone application. You can make the compiled JAR available to spark-shell by passing it to the --jars option, while Maven is used for compiling and managing dependencies.
In the book, the author describes how the simplesparkproject can be executed:
Use Maven to compile and package the project:
cd simplesparkproject/
mvn package
Start the spark-shell with the jar dependencies:
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within the spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to run the sample code as a standalone application from within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with the scope provided in the pom.xml.
<!--<scope>provided</scope>-->
If you comment out the provided scope as shown above, you will be able to run the samples within IDEA. But you can no longer pass this JAR as a dependency to the spark-shell.
Note: I used Maven 3.0.5 and Java 7+. I had problems with the plugin versions under Maven 3.3.x.

How to Compile Apache Spark with Scala 2.11.1 using SBT?

I've been trying to compile Apache spark with scala-2.11.1 (the latest version at the time). However, each time I try it ends up compiling everything to scala-2.10.*. I don't understand why.
The official documentation suggests using Maven for compilation after switching to 2.11 with the script in the dev/ folder.
What if I want to use sbt instead?
You need to enable the scala-2.11 profile:
>sbt -Dscala-2.11=true
sbt> compile

Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"?

I am using Scala 2.11, Spark, and Scallop (https://github.com/scallop/scallop). I used sbt to build an application fat jar without Spark provided dependencies (this is at: analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar)
I am able to run the program fine in sbt.
I tried to run it from the command line as follows:
time ADD_JARS=analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar java -cp /Applications/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar com.dtex.analysis.transform.GenUserSummaryView -d /Users/arun/DataSets/LME -p output -s txt -o /Users/arun/tmp/LME/LME
I get the following error message:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
    at org.rogach.scallop.package$.<init>(package.scala:37)
    at org.rogach.scallop.package$.<clinit>(package.scala)
    at com.dtex.analysis.transform.GenUserSummaryView$Conf.delayedEndpoint$com$dtex$analysis$transform$GenUserSummaryView$Conf$1(GenUserSummaryView.scala:27)
    at com.dtex.analysis.transform.GenUserSummaryView$Conf$delayedInit$body.apply(GenUserSummaryView.scala:26)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at org.rogach.scallop.AfterInit$class.delayedInit(AfterInit.scala:12)
    at org.rogach.scallop.ScallopConf.delayedInit(ScallopConf.scala:26)
    at com.dtex.analysis.transform.GenUserSummaryView$Conf.<init>(GenUserSummaryView.scala:26)
    at com.dtex.analysis.transform.GenUserSummaryView$.main(GenUserSummaryView.scala:54)
    at com.dtex.analysis.transform.GenUserSummaryView.main(GenUserSummaryView.scala)
The issue is that you've used incompatible Scala versions: Spark was compiled with Scala 2.10 and you were trying to use Scala 2.11.
Move everything to Scala 2.10 and make sure you update your sbt build as well.
Alternatively, you can compile the Spark sources for Scala 2.11.7 and use that build instead.
I also ran into the same issue with spark-submit. In my case:
Spark job was compiled with: Scala 2.10.8
Scala version Spark was compiled with on the cluster: Scala 2.11.8
To check the Spark and Scala versions on the cluster, run the spark-shell command; its startup banner reports both.
After recompiling the Spark job source with Scala 2.11.8 and resubmitting the job, it worked.
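The same check can also be done from code. The sketch below prints the version of the Scala library that is actually on the classpath and compares two versions by their binary (major.minor) part, which is what has to match between the job and the cluster. ScalaVersionCheck, binaryVersion and compatible are hypothetical names made up for this illustration; scala.util.Properties.versionNumberString is part of the standard library.

```scala
import scala.util.Properties

object ScalaVersionCheck {
  // Reduce a full version such as "2.11.8" to its binary version "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when both versions share the same binary (major.minor) version.
  def compatible(jobScala: String, clusterScala: String): Boolean =
    binaryVersion(jobScala) == binaryVersion(clusterScala)

  def main(args: Array[String]): Unit = {
    // Version of the scala-library found on the classpath at runtime.
    val runtime = Properties.versionNumberString
    println(s"Scala library on classpath: $runtime (binary ${binaryVersion(runtime)})")
  }
}
```

Evaluating scala.util.Properties.versionNumberString inside spark-shell on the cluster and comparing it with the scalaVersion of the job's build should reveal a mismatch of the kind described above.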