ClassNotFoundException when trying to run JAR file with Spark - Scala

I am following the Spark Quick Start tutorial page.
I reached the last step and compiled my file into a JAR that should be ready to go.
Running my application from the terminal:
spark-submit --class "SimpleApp" --master local[4] /usr/local/spark/target/scala-2.11
Gives the following error:
2018-10-07 20:29:17 WARN Utils:66 - Your hostname, test-ThinkPad-X230 resolves to a loopback address: 127.0.1.1; using 172.17.147.32 instead (on interface wlp3s0)
2018-10-07 20:29:17 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-10-07 20:29:17 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-10-07 20:29:18 INFO ShutdownHookManager:54 - Shutdown hook called
2018-10-07 20:29:18 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-08d94e7e-ae24-4892-a704-727a6caa1733
Why won't it find my SimpleApp class? I've tried giving it the full path. My SimpleApp.scala is in my root Spark folder, /usr/local/spark/

The best way to deploy your app to Spark is to use the sbt-assembly plugin. It creates a fat JAR that contains all your dependencies. After packaging your app, you have to point spark-submit at that JAR file directly, not at the target/scala-2.11 directory.
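As a sketch of what that looks like (plugin and dependency versions below are illustrative, not taken from the question):
project/plugins.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
build.sbt:
name := "simple-app"
version := "1.0"
scalaVersion := "2.11.12"
// Spark itself is marked provided so it is not packed into the fat JAR;
// spark-submit supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0" % "provided"
Then run sbt assembly and submit the JAR it produces (by default target/scala-2.11/simple-app-assembly-1.0.jar), not the target/scala-2.11 directory.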
Good luck.

Add the path to your application JAR to your spark-submit command. A spark-submit invocation looks as below:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  <application-jar> \
  [application-arguments]
where <application-jar> is the JAR file that you have built.
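For the question above that works out to something like the command below; the exact JAR name depends on the name and version in your build.sbt (the one shown assumes the Quick Start defaults):
spark-submit \
  --class SimpleApp \
  --master local[4] \
  /usr/local/spark/target/scala-2.11/simple-project_2.11-1.0.jar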
Hope this helps :)

Related

How to resolve the Flink hadoop utils error?

After building the app JAR for Flink, I see the error below when I submit the job, even though the JAR is available and added to sbt as well as to the enabled-plugins list:
Submitting job...
/opt/flink/bin/flink run --jobmanager flink-archiver-kinesis2iceberg-jobmanager:8081 --class io.archiver.Job --parallelism 2 --detached /opt/artifacts/sp-archive-scala-2.12.jar
java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.util.HadoopUtils
at io.archiver.Job$.main(Job.scala:54)
at io.archiver.Job.main(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

SQL Server dependency not found

I have packaged an application in a jar file using sbt for this purpose.
When I run the app from the IDE (IntelliJ) it works without issues.
However, when I try to run the jar directly, I have two different issues.
When I run it from spark-submit, I get:
[cloudera#quickstart bin]$ spark-submit --class com.my.app.main --master local[0] /home/cloudera/Projects/myapp/target/scala-2.11/myapp.jar
Exception in thread "main" java.lang.NoClassDefFoundError: com/microsoft/sqlserver/jdbc/SQLServerDataSource
When I run it from java I get:
[cloudera#quickstart scala-2.11]$ java -jar myapp.jar
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
at com.my.app.main$.main(main.scala:13)
at com.my.app.main.main(main.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
Note that the JDBC driver for SQL Server is already placed in the lib folder, where it is supposed to be picked up automatically by sbt when it generates the package.
Any help will be very much appreciated.
Thank you.
EDIT: My question is not answered on that post
Taken from https://stackoverflow.com/a/52546145/1498109,
which I would modify for your case (note that --jars must come before the application JAR, because everything after the application JAR is passed to your main method as an argument):
spark-submit --class com.my.app.main \
  --master local[0] \
  --jars path_to/sqljdbc42.jar \
  /home/cloudera/Projects/myapp/target/scala-2.11/myapp.jar
This one worked for me; download the JDBC jar from the official Microsoft site.
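If you would rather have sbt bundle the driver into a fat JAR instead of shipping it with --jars, a minimal build.sbt sketch with sbt-assembly looks like this; the Maven coordinates and versions are assumptions, check the ones that match your setup:
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  // Spark is supplied by spark-submit at runtime, so keep it out of the fat JAR
  "org.apache.spark" %% "spark-sql" % "2.3.0" % "provided",
  // Microsoft JDBC driver as a managed dependency instead of an unmanaged jar in lib/
  "com.microsoft.sqlserver" % "mssql-jdbc" % "6.4.0.jre8"
)
That also explains the scala/collection/Seq error from java -jar: a plain sbt package JAR contains only your own classes, so neither the Scala library nor the driver is on the classpath; an assembly JAR (or running through spark-submit, which supplies Scala) avoids that.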

Running Fat Jar with Spark 2.0 on cluster with only Spark 1.6 support

I am trying to run a Spark 2.1 application on a Cloudera cluster which does not yet support Spark 2.
I was following answers:
https://stackoverflow.com/a/44434835/1549135
https://stackoverflow.com/a/41359175/1549135
They seem to be correct; however, I get a strange error during spark-submit:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at scopt.OptionParser.parse(options.scala:370)
at com.rxcorp.cesespoke.config.WasherConfig$.parse(WasherConfig.scala:22)
at com.rxcorp.cesespoke.Process$.main(Process.scala:27)
at com.rxcorp.cesespoke.Process.main(Process.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Following the hint from Denis Makarenko's answer, I added:
spark-submit \
...
--conf 'spark.executor.extraJavaOptions=-verbose:class' \
--conf 'spark.driver.extraJavaOptions=-verbose:class' \
...
just to see that, as said in the answer, we are running on the wrong classpath here. Checking the logs, I could clearly find:
[Loaded scala.runtime.IntRef from file:/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/jars/spark-assembly-1.6.0-cdh5.8.4-hadoop2.6.0-cdh5.8.4.jar]
That is obviously the source of the problem.
After carefully re-reading the posts linked above:
You should use spark-submit from the newer Spark installation (I'd
suggest using the latest and greatest 2.1.1 as of this writing) and
bundle all Spark jars as part of your Spark application.
So that is what I will do!
I also recommend reading:
http://www.mostlymaths.net/2017/05/shading-dependencies-with-sbt-assembly.html
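For context, the shading described there comes down to sbt-assembly rename rules like this sketch (the package names are only illustrative):
assemblyShadeRules in assembly := Seq(
  // rewrite the named packages inside the fat JAR so they cannot clash
  // with older copies already present on the cluster classpath
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
)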
Exception in thread "main" java.lang.NoSuchMethodError:
scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
NoSuchMethodError often indicates a jar version mismatch. Since the missing method is in the scala.runtime package, most likely the problem is caused by compiling the code with one version of Scala, say 2.11, and running it with another one (2.10).
Check the Scala version in your build.sbt (scalaVersion := ...) and run the JVM with the -verbose:class parameter to make sure these Scala versions match.
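For example, the relevant build.sbt lines would look like the sketch below (versions are illustrative); using %% lets sbt pick the artifact whose _2.x suffix matches scalaVersion, so the Scala runtime and the libraries stay consistent:
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1",
  "com.github.scopt" %% "scopt" % "3.5.0"
)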

Trying out Cloudera Spark tutorial won't work: "classnotfoundexception"

I tried the solutions suggested in similar existing posts, but none of them works for me :-( I am getting really hopeless, so I decided to post this as a new question.
I tried a tutorial (link below) on building a first Scala or Java application with Spark in a Cloudera VM.
This is my spark-submit command and its output:
[cloudera#quickstart sparkwordcount]$ spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local /home/cloudera/src/main/scala/com/cloudera/sparkwordcount/target/sparkwordcount-0.0.1-SNAPSHOT.jar
java.lang.ClassNotFoundException: com.cloudera.sparkwordcount.SparkWordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:176)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I also tried updating the pom.xml file with my actual CDH, Spark and Scala versions, but it still does not work.
When I extract the jar file previously generated by Maven via mvn package, I cannot find any .class file inside its hierarchy of folders.
Sorry, I am a bit new to Cloudera and Spark. I basically tried following this tutorial with Scala: https://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
I checked the class, folder and Scala file names very closely, quite a few times, especially for lower/uppercase issues; nothing seemed wrong.
I opened my jar: there is some file hierarchy, and in the deepest folder I can find the pom.xml file again, but I cannot see any .class files anywhere inside the jar. Does that mean the compilation via "mvn package" didn't actually work, even though the console output said the build was successful?
I was having the same issue. Try rerunning after changing the class name from
--class com.cloudera.sparkwordcount.SparkWordCount
to
--class SparkWordCount
(that is what works when the source file has no package declaration, in which case the fully qualified class name is just the class name). The full command I used looked like:
spark-submit --class SparkWordCount --master local --deploy-mode client --executor-memory 1g --name wordcount --conf "spark.app.id=wordcount" target/sparkwordcount-0.0.1-SNAPSHOT.jar /user/cloudera/inputfile.txt 2

A weird exception using sbt-pack and Scala 2.10: ClassNotFoundException: scala.collection.GenTraversableOnce

I am trying to compile and assemble a Scala project with sbt-pack; this is the build.sbt file and this is the plugins.sbt.
The project has two subprojects, common and main. common has this structure:
MacBook-Pro-Retina-de-Alonso:common aironman$ ls src/main/scala/common/utils/cassandra/
CassandraConnectionUri.scala Helper.scala Pillar.scala
main has this one:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ ls main/src/main/scala/
com common
Inside the com folder, I have these folders:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ ls main/src/main/scala/com/databricks/apps/twitter_classifier/
Collect.scala ExamineAndTrain.scala Predict.scala Utils.scala
Collect, ExamineAndTrain and Predict are objects with main functions.
Inside the common folder, I have this:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ ls main/src/main/scala/common/utils/cassandra/
CassandraMain.scala
The project compiles and I can pack it, generating some folders under the target folder. This is the set of folders and the libs under target/pack/lib.
The problem happens when I try to run the generated command:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ target/pack/bin/collect /tmp/tweets 10000 10 1
Initializing Streaming Spark Context...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/15 11:02:54 INFO SparkContext: Running Spark version 1.4.0
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.apache.spark.util.TimeStampedWeakValueHashMap.<init>(TimeStampedWeakValueHashMap.scala:42)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:277)
at com.databricks.apps.twitter_classifier.Collect$.main(Collect.scala:37)
at com.databricks.apps.twitter_classifier.Collect.main(Collect.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 4 more
16/03/15 11:02:54 INFO Utils: Shutdown hook called
Spark almost manages to start, but it finally crashes.
I know that the most probable cause here is that some libraries are compiled against Scala 2.10 and others against 2.11, but in my build.sbt file I am setting scalaVersion := "2.10.4" in val commonSettings, val sparkDependencies is set NOT to use "provided", and I can see that every Spark jar file is compiled against 2.10:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ ls target/pack/lib/spark*
target/pack/lib/spark-catalyst_2.10-1.4.0.jar
target/pack/lib/spark-network-shuffle_2.10-1.4.0.jar
target/pack/lib/spark-core_2.10-1.4.0.jar
target/pack/lib/spark-sql_2.10-1.4.0.jar
target/pack/lib/spark-graphx_2.10-1.4.0.jar
target/pack/lib/spark-streaming-twitter_2.10-1.4.0.jar
target/pack/lib/spark-launcher_2.10-1.4.0.jar
target/pack/lib/spark-streaming_2.10-1.4.0.jar
target/pack/lib/spark-mllib_2.10-1.4.0.jar
target/pack/lib/spark-twitter-lang-classifier-using-cassandra_2.10-0.1-SNAPSHOT.jar
target/pack/lib/spark-network-common_2.10-1.4.0.jar
target/pack/lib/spark-unsafe_2.10-1.4.0.jar
UPDATE
I can see these Scala libraries in the lib folder:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ ls target/pack/lib/scala*
target/pack/lib/scala-async_2.11-0.9.1.jar
target/pack/lib/scala-library-2.11.1.jar
target/pack/lib/scalap-2.10.0.jar
target/pack/lib/scala-compiler-2.10.4.jar
target/pack/lib/scala-reflect-2.10.4.jar
The question now is: theoretically I am compiling with 2.10.4, so how is it possible that scala-library-2.11.1 and scala-async_2.11-0.9.1.jar end up in the lib folder?
How can I force the correct versions to be used?
UPDATE 2
The problem was related to the wrong version of "com.chrisomeara" % "pillar_2.11" % "2.0.1". The correct dependency is "com.chrisomeara" % "pillar_2.10" % "2.0.1".
This setting is the correct one, and now I can see the right Scala libraries:
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ ls target/pack/lib/scala*
target/pack/lib/scala-async_2.10-0.9.1.jar
target/pack/lib/scala-library-2.10.6.jar
target/pack/lib/scalap-2.10.0.jar
target/pack/lib/scala-compiler-2.10.4.jar
target/pack/lib/scala-reflect-2.10.4.jar
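As a side note, the cross-build suffix can be left to sbt entirely: with %% the _2.x suffix is derived from scalaVersion, so a mismatch like pillar_2.11 on a 2.10 build cannot slip in. The same dependency written that way:
libraryDependencies += "com.chrisomeara" %% "pillar" % "2.0.1"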
Now there is another exception, but I think it is a different problem, not related to this one, so thank you Yuval.
MacBook-Pro-Retina-de-Alonso:my-twitter-cassandra-app aironman$ target/pack/bin/collect /tmp/tweets 10000 10 1
Initializing Streaming Spark Context...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/15 11:39:12 INFO SparkContext: Running Spark version 1.4.0
16/03/15 11:39:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/15 11:39:12 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at com.databricks.apps.twitter_classifier.Collect$.main(Collect.scala:37)
at com.databricks.apps.twitter_classifier.Collect.main(Collect.scala)
16/03/15 11:39:12 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at com.databricks.apps.twitter_classifier.Collect$.main(Collect.scala:37)
at com.databricks.apps.twitter_classifier.Collect.main(Collect.scala)
16/03/15 11:39:12 INFO Utils: Shutdown hook called