NoSuchMethodError for Scala Seq line in Spark

I am having an error when trying to run plain Scala code in Spark, similar to these posts: this and this
Their problem was that they used the wrong Scala version to compile their Spark project. However, mine is the correct version.
I have Spark 1.6.0 installed on an AWS EMR cluster to run the program. The project is compiled on my local machine with Scala 2.11 installed and 2.11 listed in all dependencies and build files without any references to 2.10.
This is the exact line that throws the error:
var fieldsSeq: Seq[StructField] = Seq()
And this is the exact error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
at com.myproject.MyJob$.main(MyJob.scala:39)
at com.myproject.MyJob.main(MyJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Spark 1.6 on EMR is still built with Scala 2.10, so yes, you are having the same issue as in the posts you linked. To use Spark on EMR, you currently must compile your application with Scala 2.10.
Spark has upgraded its default Scala version to 2.11 as of Spark 2.0 (to be released within the next several months), so once EMR supports Spark 2.0, we will likely follow this new default and build Spark with Scala 2.11.
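For reference, a minimal build.sbt sketch of what that means in practice; the exact 2.10.x patch version and the Spark modules listed are assumptions, adjust them to your job:
// build.sbt sketch for targeting Spark 1.6 on EMR (built with Scala 2.10)
name := "my-emr-job"

scalaVersion := "2.10.6"  // must match the Scala line Spark 1.6 was built with

libraryDependencies ++= Seq(
  // "provided": the EMR cluster already ships these jars, so don't bundle them
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"
)
Because scalaVersion is 2.10.x, the %% operator resolves spark-core_2.10 rather than spark-core_2.11, so everything on the classpath agrees on one Scala binary version.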

Related

spark & zeppelin problems with integration

I want to connect my locally installed Zeppelin 0.10.0 to a locally installed Spark 3.2.0 (I tried the same procedure with Spark 2.3.0 and it worked). But it looks like Zeppelin ships with an internal Spark and falls back to that internal one every time I try. I have gone through the settings for the Spark interpreter with no luck.
I just want to know whether there is any way to change the default internal Spark that Zeppelin uses and point it at the Spark 3.2.0 installation I want to use.
I set SPARK_HOME to where it is supposed to point and spark.master to local[*], and received the following error:
org.apache.zeppelin.interpreter.InterpreterException: java.lang.NoSuchMethodError: scala.tools.nsc.Settings.usejavacp()Lscala/tools/nsc/settings/AbsSettings$AbsSetting;
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:76)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:833)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:741)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: scala.tools.nsc.Settings.usejavacp()Lscala/tools/nsc/settings/AbsSettings$AbsSetting;
at org.apache.zeppelin.spark.SparkScala212Interpreter.open(SparkScala212Interpreter.scala:66)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:121)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
... 8 more
I've run into the same issue myself: you won't run Spark 3.2.0 on Zeppelin 0.10.0. Spark 3.1.2 works without any issues, and Zeppelin ships with Spark 2.4.5 included; this is a problem with the tool itself.
According to ticket ZEPPELIN-5565, version 0.10.0 DOES NOT support Spark 3.2.0. This should be fixed in 0.10.1 and 0.11.0 (info from the mentioned ticket; I've also checked the GitHub repo).
The pull request that fixes this issue is much longer, but in Zeppelin 0.10.0 the decisive line is:
public static final SparkVersion UNSUPPORTED_FUTURE_VERSION = SPARK_3_2_0;
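In other words, Zeppelin 0.10.0 treats anything at or above 3.2.0 as an unsupported future version. Roughly, the guard works like the following Scala sketch; this is only an illustration (Zeppelin's real check lives in its Java SparkVersion class), and all names here are hypothetical:
// Illustrative sketch of a "future version" guard; not Zeppelin's actual code.
object SparkVersionGuard {
  case class SparkVersion(major: Int, minor: Int, patch: Int) extends Ordered[SparkVersion] {
    def compare(that: SparkVersion): Int =
      Ordering[(Int, Int, Int)].compare((major, minor, patch), (that.major, that.minor, that.patch))
    override def toString = s"$major.$minor.$patch"
  }

  // Zeppelin 0.10.0 draws the line at 3.2.0
  val UNSUPPORTED_FUTURE_VERSION = SparkVersion(3, 2, 0)

  def checkSupported(detected: SparkVersion): Unit =
    if (detected >= UNSUPPORTED_FUTURE_VERSION)
      sys.error(s"Spark $detected is not supported by this Zeppelin release")
}
So the practical fix is either to point Zeppelin 0.10.0 at Spark 3.1.x, or to move to a Zeppelin release that raises that constant.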

Exception while running StreamingContext.start()

Exception while running Python code on Windows 10. I am using Apache Kafka and PySpark.
Python code snippet to read data from Kafka
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaStreamConsumer")  # app name is arbitrary
ssc = StreamingContext(sc, 60)
zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: [x[0], x[1]])
lines.pprint()
lines.foreachRDD(SaveRecord)  # SaveRecord is defined elsewhere in the script
ssc.start()
ssc.awaitTermination()
Exception while running the code
Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
at org.apache.spark.streaming.kafka.KafkaReceiver.<init>(KafkaInputDStream.scala:69)
at org.apache.spark.streaming.kafka.KafkaInputDStream.getReceiver(KafkaInputDStream.scala:60)
at org.apache.spark.streaming.scheduler.ReceiverTracker.$anonfun$launchReceivers$1(ReceiverTracker.scala:441)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.streaming.scheduler.ReceiverTracker.launchReceivers(ReceiverTracker.scala:440)
at org.apache.spark.streaming.scheduler.ReceiverTracker.start(ReceiverTracker.scala:160)
at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:102)
at org.apache.spark.streaming.StreamingContext.$anonfun$start$1(StreamingContext.scala:583)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.ThreadUtils$$anon$1.run(ThreadUtils.scala:145)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
This may be due to an incompatible Scala version for your Spark version. Make sure the Scala version in your project configuration matches the version your Spark version supports.
Spark 3.x requires Scala 2.12; support for Scala 2.11 was removed in Spark 3.0.0.
It is also possible that a third-party jar (like dstream-twitter for a Twitter streaming application, or your Kafka streaming jar) was built for a Scala version your application does not support.
For instance, dstream-twitter_2.11-2.3.0-SNAPSHOT didn't work for me with Spark 3.0; it gave Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class. But when I switched to the dstream-twitter jar built for Scala 2.12, it solved the issue.
Make sure all the Scala versions line up.
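In sbt terms, the easiest way to keep those _2.x suffixes consistent is to let %% derive the suffix from scalaVersion instead of hard-coding it. A sketch; the Kafka artifact and version numbers are just examples for a Spark 3.0 build:
// build.sbt sketch -- keep every Scala suffix tied to one scalaVersion
scalaVersion := "2.12.10"  // Spark 3.0.x is built for Scala 2.12

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming"            % "3.0.0" % "provided",
  // %% expands this to spark-streaming-kafka-0-10_2.12, matching scalaVersion above
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "3.0.0"
)
A jar whose name ends in _2.11 sitting on a Scala 2.12 / Spark 3.x classpath is typically what produces the missing org.apache.spark.internal.Logging$class error shown above.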

Test runs on command line, fails in Scala-IDE

When running a test class via right-click, Eclipse failed with:
Caused by: java.lang.NoClassDefFoundError: scala/Product$class
at org.scalatest.time.Days$.<init>(Units.scala:291)
at org.scalatest.time.Days$.<clinit>(Units.scala)
at org.scalatest.time.Span$.<init>(Span.scala:585)
at org.scalatest.time.Span$.<clinit>(Span.scala)
at org.scalatest.tools.Runner$.<init>(Runner.scala:779)
at org.scalatest.tools.Runner$.<clinit>(Runner.scala)
at org.scalatest.tools.Runner.main(Runner.scala)
... 6 more
Caused by: java.lang.ClassNotFoundException: scala.Product$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
Yet it ran on the command line with sbt test. The libraries had already been updated as described in java.lang.NoClassDefFoundError: scala/Product$class.
This happened with the latest Scala IDE (4.7.0-vfinal-2017-09-29T14:34:02Z-Typesafe) with the patmat project from Coursera's scala course.
What is the cause and how can it be fixed?
Requested info: a screenshot of the project's Java Build Path, which lists _2.11 library jars alongside the 2.12.3 Scala library.
You have a combination of _2.11 libraries and the 2.12.3 Scala library; this won't work. It looks like the _2.11 dependencies come from SBT (judging from the paths).
You need to either change Scala IDE's Scala version (setting correct scala version on scala ide explains how) or set scalaVersion := "2.12.3" in the SBT project and rerun sbt eclipse.
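If you take the second route, a sketch of the relevant build.sbt lines (the dependency versions here are examples):
// build.sbt sketch -- align the project with Scala IDE's bundled 2.12 library
scalaVersion := "2.12.3"

libraryDependencies ++= Seq(
  // %% resolves these to scalatest_2.12 / scala-xml_2.12, matching the 2.12.3 Scala library
  "org.scalatest"          %% "scalatest" % "3.0.4" % Test,
  "org.scala-lang.modules" %% "scala-xml" % "1.0.6"
)
After changing it, run sbt eclipse again so the regenerated .classpath no longer mixes _2.11 jars with the 2.12.3 Scala library.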
Alternatively, use the Scala library version 2.11, since the other Scala-based dependencies such as scala-xml and scalatest are built against Scala 2.11.

Running Fat Jar with Spark 2.0 on cluster with only Spark 1.6 support

I am trying to run a Spark 2.1 application on a Cloudera cluster which does not yet support Spark 2.
I was following answers:
https://stackoverflow.com/a/44434835/1549135
https://stackoverflow.com/a/41359175/1549135
These seem to be correct; however, I get a strange error during spark-submit:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at scopt.OptionParser.parse(options.scala:370)
at com.rxcorp.cesespoke.config.WasherConfig$.parse(WasherConfig.scala:22)
at com.rxcorp.cesespoke.Process$.main(Process.scala:27)
at com.rxcorp.cesespoke.Process.main(Process.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using the hint from Denis Makarenko's answer, I added:
spark-submit \
...
--conf 'spark.executor.extraJavaOptions=-verbose:class' \
--conf 'spark.driver.extraJavaOptions=-verbose:class' \
...
just to see that, as said in the answer, we were indeed running with the wrong classpath. Checking the logs, I could clearly find:
[Loaded scala.runtime.IntRef from file:/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/jars/spark-assembly-1.6.0-cdh5.8.4-hadoop2.6.0-cdh5.8.4.jar]
This is obviously the source of the problem.
After carefully re-reading the posts linked at the beginning:
You should use spark-submit from the newer Spark installation (I'd suggest using the latest and greatest 2.1.1 as of this writing) and bundle all Spark jars as part of your Spark application.
So this is what I will do.
I also recommend reading:
http://www.mostlymaths.net/2017/05/shading-dependencies-with-sbt-assembly.html
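For completeness, a rough sketch of that "bundle all Spark jars" approach with sbt-assembly; the plugin version and merge rules below are assumptions, and shading (as in the linked post) is only needed if classes still clash:
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt sketch -- do NOT mark Spark as "provided": ship Spark 2.1 inside the fat jar
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1",
  "org.apache.spark" %% "spark-sql"  % "2.1.1"
)

// a typical merge strategy so assembly doesn't fail on duplicate META-INF entries
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
Then submit the assembled jar with the spark-submit that ships with the Spark 2.1 download, not the cluster's 1.6 one.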
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
NoSuchMethodError often indicates a jar version mismatch. Since the missing method is in the scala.runtime package, most likely the problem is caused by compiling the code with one version of Scala, say 2.11, and running it with another one (2.10).
Check the Scala version in your build.sbt (scalaVersion := ...) and run the JVM with the -verbose:class parameter to make sure the Scala versions match.

Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"?

I am using Scala 2.11, Spark, and Scallop (https://github.com/scallop/scallop). I used sbt to build an application fat jar that excludes the Spark (provided) dependencies (the jar is at: analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar).
I am able to run the program fine in sbt.
I tried to run it from the command line as follows:
time ADD_JARS=analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar java -cp /Applications/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar com.dtex.analysis.transform.GenUserSummaryView -d /Users/arun/DataSets/LME -p output -s txt -o /Users/arun/tmp/LME/LME
I get the following error message:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
at org.rogach.scallop.package$.<init>(package.scala:37)
at org.rogach.scallop.package$.<clinit>(package.scala)
at com.dtex.analysis.transform.GenUserSummaryView$Conf.delayedEndpoint$com$dtex$analysis$transform$GenUserSummaryView$Conf$1(GenUserSummaryView.scala:27)
at com.dtex.analysis.transform.GenUserSummaryView$Conf$delayedInit$body.apply(GenUserSummaryView.scala:26)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at org.rogach.scallop.AfterInit$class.delayedInit(AfterInit.scala:12)
at org.rogach.scallop.ScallopConf.delayedInit(ScallopConf.scala:26)
at com.dtex.analysis.transform.GenUserSummaryView$Conf.<init>(GenUserSummaryView.scala:26)
at com.dtex.analysis.transform.GenUserSummaryView$.main(GenUserSummaryView.scala:54)
at com.dtex.analysis.transform.GenUserSummaryView.main(GenUserSummaryView.scala)
The issue is that you've used incompatible Scala versions, i.e. Spark was compiled with Scala 2.10 and you were trying to use Scala 2.11.
Move everything to Scala 2.10 and make sure you update your sbt build as well.
You may also try to compile the Spark sources for Scala 2.11.7 and use that build instead.
I also ran into the same issue with spark-submit. In my case:
The Spark job was compiled with: Scala 2.10.8
The Scala version Spark was compiled with on the cluster: Scala 2.11.8
To check the Spark and Scala versions on the cluster, use the spark-shell command.
After recompiling the Spark job source with Scala 2.11.8, I submitted the job and it worked.
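For reference, a quick way to confirm both versions once spark-shell is up on the cluster (the startup banner also prints them):
// paste into spark-shell
println(s"Spark version: ${sc.version}")                          // e.g. 1.6.0
println(s"Scala version: ${scala.util.Properties.versionString}") // e.g. version 2.10.5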