Running Fat Jar with Spark 2.0 on cluster with only Spark 1.6 support - scala

I am trying to run a Spark 2.1 application on a Cloudera cluster which does not yet support Spark 2.
I was following these answers:
https://stackoverflow.com/a/44434835/1549135
https://stackoverflow.com/a/41359175/1549135
They seem to be correct; however, I get a strange error during spark-submit:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at scopt.OptionParser.parse(options.scala:370)
at com.rxcorp.cesespoke.config.WasherConfig$.parse(WasherConfig.scala:22)
at com.rxcorp.cesespoke.Process$.main(Process.scala:27)
at com.rxcorp.cesespoke.Process.main(Process.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Following the hint from Denis Makarenko's answer, I added:
spark-submit \
...
--conf 'spark.executor.extraJavaOptions=-verbose:class' \
--conf 'spark.driver.extraJavaOptions=-verbose:class' \
...
Just to confirm that, as the answer said, we are indeed running on the wrong classpath here! Checking the logs, I could clearly find:
[Loaded scala.runtime.IntRef from file:/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/jars/spark-assembly-1.6.0-cdh5.8.4-hadoop2.6.0-cdh5.8.4.jar]
Which is obviously the source of the problem.
After carefully re-reading the given posts from the beginning:
You should use spark-submit from the newer Spark installation (I'd
suggest using the latest and greatest 2.1.1 as of this writing) and
bundle all Spark jars as part of your Spark application.
So that is the approach I will follow!
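As a rough sketch of that approach (the versions below are my assumptions, not taken from the post), the Spark dependencies stay in the default compile scope instead of "provided", so sbt-assembly packs Spark 2.1 itself into the fat jar:
// build.sbt -- minimal sketch, assuming the sbt-assembly plugin is enabled in project/plugins.sbt
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  // not marked "provided": the Spark 2.1 classes must travel inside the fat jar,
  // because the cluster only offers a Spark 1.6 assembly
  "org.apache.spark" %% "spark-core" % "2.1.1",
  "org.apache.spark" %% "spark-sql"  % "2.1.1"
)
// bundling Spark itself produces many duplicate files; discard or merge them
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}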
I also recommend reading:
http://www.mostlymaths.net/2017/05/shading-dependencies-with-sbt-assembly.html
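The core idea of that post, as a hedged sketch (the relocated packages below are only examples, not the ones from the article), is sbt-assembly's shade rules, which rename bundled classes so they cannot clash with older copies already on the cluster classpath:
// build.sbt -- example shade rules for sbt-assembly; package names are illustrative
assemblyShadeRules in assembly := Seq(
  // rename our bundled copy so the cluster's older version cannot shadow it
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll,
  ShadeRule.rename("com.fasterxml.jackson.**" -> "shaded.jackson.@1").inAll
)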

Exception in thread "main" java.lang.NoSuchMethodError:
scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
A NoSuchMethodError often indicates a jar version mismatch. Since the missing method is in the scala.runtime package, the problem is most likely caused by compiling the code with one version of Scala (say, 2.11) and running it with another one (2.10).
Check the Scala version in your build.sbt (scalaVersion := ...) and run the JVM with the -verbose:class parameter to make sure these Scala versions match.
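As a tiny illustration of the first check (the version numbers are assumptions for this scenario):
// build.sbt
// Spark 1.6 assemblies are built against Scala 2.10, Spark 2.x against Scala 2.11;
// whichever Spark ends up on the runtime classpath must match this setting.
scalaVersion := "2.11.8"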

Related

How to resolve the Flink hadoop utils error?

After building the app jar for Flink, when I submit the job I see the error below, even though the jar is available and added to sbt as well as to the enabled plugin list:
Submitting job...
/opt/flink/bin/flink run --jobmanager flink-archiver-kinesis2iceberg-jobmanager:8081 --class io.archiver.Job --parallelism 2 --detached /opt/artifacts/sp-archive-scala-2.12.jar
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.flink.runtime.util.HadoopUtils
at io.archiver.Job$.main(Job.scala:54)
at io.archiver.Job.main(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Exception while running StreamingContext.start()

Exception while running Python code on Windows 10. I am using Apache Kafka and PySpark.
Python code snippet to read data from Kafka
import sys
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# sc (the SparkContext) and SaveRecord are defined elsewhere in the script
ssc = StreamingContext(sc, 60)
zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: [x[0], x[1]])
lines.pprint()
lines.foreachRDD(SaveRecord)
ssc.start()
ssc.awaitTermination()
Exception while running the code
Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
at org.apache.spark.streaming.kafka.KafkaReceiver.<init>(KafkaInputDStream.scala:69)
at org.apache.spark.streaming.kafka.KafkaInputDStream.getReceiver(KafkaInputDStream.scala:60)
at org.apache.spark.streaming.scheduler.ReceiverTracker.$anonfun$launchReceivers$1(ReceiverTracker.scala:441)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.streaming.scheduler.ReceiverTracker.launchReceivers(ReceiverTracker.scala:440)
at org.apache.spark.streaming.scheduler.ReceiverTracker.start(ReceiverTracker.scala:160)
at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:102)
at org.apache.spark.streaming.StreamingContext.$anonfun$start$1(StreamingContext.scala:583)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.ThreadUtils$$anon$1.run(ThreadUtils.scala:145)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
This may be due to a Scala version that is incompatible with your Spark version. Make sure the Scala version in your project configuration matches the version your Spark release supports.
Spark 3.0 requires Scala 2.12; support for Scala 2.11 was removed in Spark 3.0.0.
It is also possible that a third-party jar (like dstream-twitter for a Twitter streaming application, or your Kafka streaming jar) is built for a Scala version that your application does not support.
For instance, dstream-twitter_2.11-2.3.0-SNAPSHOT didn't work for me with Spark 3.0; it gave Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class. But when I switched to the Scala 2.12 build of the dstream-twitter jar, the issue was solved.
Make sure all the Scala versions are consistent.
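In sbt terms, one way to keep those versions aligned (a sketch with illustrative coordinates, not the exact jars from this question) is to let the %% operator append the Scala binary suffix for you:
// build.sbt -- %% resolves artifacts with the matching _2.12 suffix automatically,
// so Spark and third-party streaming jars all target the same Scala version
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                 % "3.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming"            % "3.0.0" % "provided",
  // a hard-coded _2.11 artifact here would reproduce the Logging$class error
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "3.0.0"
)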

Trying out Cloudera Spark Tutorial won't work "classnotfoundexception"

I tried the solutions suggested in similar existing posts, but none works for me :-( I'm getting really hopeless, so I decided to post this as a new question.
I tried a tutorial (link below) on building a first Scala or Java application with Spark in a Cloudera VM.
This is my spark-submit command and its output:
[cloudera@quickstart sparkwordcount]$ spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local /home/cloudera/src/main/scala/com/cloudera/sparkwordcount/target/sparkwordcount-0.0.1-SNAPSHOT.jar
java.lang.ClassNotFoundException: com.cloudera.sparkwordcount.SparkWordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:176)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I also tried updating the pom.xml file with my actual CDH, Spark, and Scala versions, but it still doesn't work.
When I extract the jar file previously generated by Maven using mvn package, I cannot find any .class file inside its hierarchy of folders.
Sorry, I am a bit new to Cloudera and Spark. I basically tried to follow this tutorial with Scala: https://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
I checked the class, folder, and Scala file names very closely, especially for lower/uppercase issues; nothing seemed wrong.
I opened my jar: there is some file hierarchy, and in the deepest folder I can find the pom.xml file again, but I cannot see any .class files anywhere inside the jar. Does that mean the compilation via "mvn package" didn't actually work, even though the console output said the build was successful?
I was having the same issue. Try rerunning after changing the class name from
--class com.cloudera.sparkwordcount.SparkWordCount
to
--class SparkWordCount
The full command I used looked like:
spark-submit --class SparkWordCount --master local --deploy-mode client --executor-memory 1g --name wordcount --conf "spark.app.id=wordcount" target/sparkwordcount-0.0.1-SNAPSHOT.jar /user/cloudera/inputfile.txt 2
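That change only helps when the object was compiled without a package declaration; whatever you pass to --class has to be the fully qualified name exactly as it appears in the source. A hypothetical sketch of what the tutorial's word-count object might look like:
// src/main/scala/com/cloudera/sparkwordcount/SparkWordCount.scala (hypothetical sketch)
// With this package line the class is com.cloudera.sparkwordcount.SparkWordCount;
// without it, the name to pass to spark-submit is just SparkWordCount.
package com.cloudera.sparkwordcount

import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wordcount"))
    val counts = sc.textFile(args(0))          // e.g. /user/cloudera/inputfile.txt
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}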

Trying to run Scala test, getting java.lang.ClassNotFoundException: org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner

I have a Gradle project in IntelliJ IDEA 2016.2. Every time I run the Scala tests in the project, I get the following exception:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:48)
Caused by: java.lang.ClassNotFoundException: org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:123)
... 5 more
I checked the versions of the dependencies and added the Scala SDK to the project module as well. I also added the Scala plugin to the Gradle file and installed the Scala plugin in IntelliJ IDEA. Also, the tests run without an error on my colleague's computer, so we have no idea what the cause could be.
Found the cause: I have an accented letter in my user directory's name, and IDEA is always trying to use a file from AppData under that directory. I have already changed the idea.properties file, but it has no effect on that file.
A possible workaround is using Gradle (or Maven/sbt/etc.). In my case I use Gradle: I just add @RunWith(classOf[JUnitRunner]) to the Scala class I want to test, then execute Gradle's test task, as sketched below.
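A hedged sketch of that workaround (the spec name and ScalaTest style are my own choices; in ScalaTest 3.1+ the JUnitRunner lives in the separate scalatestplus artifact):
import org.junit.runner.RunWith
import org.scalatest.FlatSpec
import org.scalatest.junit.JUnitRunner

// Running the spec through JUnit sidesteps IDEA's ScalaTestRunner entirely,
// so Gradle's `test` task can pick it up like any other JUnit test.
@RunWith(classOf[JUnitRunner])
class ExampleSpec extends FlatSpec {
  "the workaround" should "run via the JUnit runner" in {
    assert(1 + 1 == 2)
  }
}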
For me, the fix for the command-line length limitation was crucial. IDEA offers about three ways to work around a command line that is too long; choose a different one and check.
It's in the run configuration settings.

NoSuchMethodError for Scala Seq line in Spark

I am getting an error when trying to run plain Scala code in Spark, similar to these posts: this and this
Their problem was that they were using the wrong Scala version to compile their Spark project. However, mine is the correct version.
I have Spark 1.6.0 installed on an AWS EMR cluster to run the program. The project is compiled on my local machine with Scala 2.11 installed and 2.11 listed in all dependencies and build files without any references to 2.10.
This is the exact line that throws the error:
var fieldsSeq: Seq[StructField] = Seq()
And this is the exact error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
at com.myproject.MyJob$.main(MyJob.scala:39)
at com.myproject.MyJob.main(MyJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark 1.6 on EMR is still built with Scala 2.10, so yes, you are having the same issue as in the posts you linked. In order to use Spark on EMR, you currently must compile your application with Scala 2.10.
Spark has upgraded its default Scala version to 2.11 as of Spark 2.0 (to be released within the next several months), so once EMR supports Spark 2.0, we will likely follow this new default and compile Spark with Scala 2.11.
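Until then, a build.sbt along these lines (the versions are illustrative) keeps the compiled bytecode on Scala 2.10 so it matches the Spark 1.6 runtime that EMR provides:
// build.sbt -- sketch for targeting Spark 1.6 on EMR
scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  // "provided": the cluster already ships Spark 1.6 built against Scala 2.10
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"
)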