Spark ClassNotFoundException running the master - scala

I have downloaded and built Spark 0.8.0 using sbt/sbt assembly. The build was successful. However, when running ./bin/start-master.sh the following error appears in the log file:
Spark Command: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp :/shared/spark-0.8.0-incubating-bin-hadoop1/conf:/shared/spark-0.8.0-incubating-bin-hadoop1/assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop1.0.4.jar
/shared/spark-0.8.0-incubating-bin-hadoop1/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip mellyrn.local --port 7077 --webui-port 8080
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/master/Master
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.master.Master
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Update: after doing sbt clean (per the suggestion below) it is now running (see screenshot).
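For reference, a minimal sketch of the rebuild that resolved it, using the commands already mentioned above (run from the Spark directory):

sbt/sbt clean assembly      # rebuild the assembly jar from a clean state
./bin/start-master.sh       # start the master again; the web UI listens on the --webui-port shown in the log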

There can be a number of things which cause this error, and they are not specific to Spark:
Bad build: sbt clean and compile that puppy again.
You have a cached dependency in your .ivy2 cache which conflicts with a dependency of that version of Spark. Empty your cache and try again.
Your project, which builds on Spark, has a library version which conflicts with a dependency of Spark. That is, Spark may depend on "foo-0.9.7" while your project pulls in "foo-0.8.4" (a sketch of one way to line these up follows below).
Try looking at those first.
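For the third case, one way to line the versions up in an sbt build is a dependency override. A minimal sketch, where the "foo" coordinates are hypothetical placeholders from the example above:

// build.sbt -- "org.example" % "foo" stands in for the real conflicting library
libraryDependencies += "org.example" % "foo" % "0.8.4"

// force the version Spark was built against, so only one copy ends up on the classpath
dependencyOverrides += "org.example" % "foo" % "0.9.7"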

Related

java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider trying to pull from a Kafka topic in Scala

I'm using a spark-shell instance to test the pulling of data from a client's Kafka source. To launch the instance I am using the command spark-shell --jars spark-sql-kafka-0-10_2.11-2.5.0-palantir.8.jar, kafka_2.12-2.5.0.jar, kafka-clients-2.5.0.jar (all jars are present in the working dir).
However, when I run the command val df = spark.read.format("kafka")..........., after a few seconds it crashes with the error below:
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:304)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamingWriteSupportProvider
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 79 more
HOWEVER - if I change the order of the jars in the spark-shell command to spark-shell --jars kafka_2.12-2.5.0.jar, kafka-clients-2.5.0.jar, spark-sql-kafka-0-10_2.11-2.5.0-palantir.8.jar, it instead crashes with:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateBatchOptions(KafkaSourceProvider.scala:330)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.createRelation(KafkaSourceProvider.scala:113)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 55 more
I am developing behind a very strict proxy managed by our client and am unable to use --packages instead. I am at a bit of a loss here: am I unable to load all three dependencies at the launch of the shell? Am I missing another step somewhere?
In the Structured Streaming + Kafka Integration Guide it says:
For experimenting on spark-shell, you need to add this above library and its dependencies too when invoking spark-shell.
The library you are using seems to be customized and not publicly available in the Maven Central repository, which means I cannot look into its dependencies.
However, looking at the latest stable version 2.4.5, its dependency according to Maven Central is kafka-clients version 2.0.0.
You are also mixing Scala versions: the libraries you are passing are built for both 2.11 and 2.12.
Please use libraries built for the same Scala version; see below for how to load them into spark-shell:
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5,org.apache.kafka:kafka_2.11:2.4.1,org.apache.kafka:kafka-clients:2.4.1
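If --packages is blocked by the proxy, the same jars can be fetched once by other means and passed with --jars. Note that --jars expects a single comma-separated list with no spaces; with spaces, the shell passes the later jars as separate arguments and they never make it onto the classpath. A sketch, assuming jars matching the coordinates above sit in the working directory:

spark-shell --jars spark-sql-kafka-0-10_2.11-2.4.5.jar,kafka_2.11-2.4.1.jar,kafka-clients-2.4.1.jar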
One occasionally disruptive issue is dealing with dependency conflicts in cases where a user application and Spark itself both depend on the same library. This comes up relatively rarely, but when it does, it can be vexing for users. Typically, this will manifest itself when a NoSuchMethodError, a ClassNotFoundException, or some other JVM exception related to class loading is thrown during the execution of a Spark job.
There are two solutions to this problem. The first is to modify your application to depend on the same version of the third-party library that Spark does. The second is to modify the packaging of your application using a procedure that is often called "shading." The Maven build tool supports shading through advanced configuration of the plug-in shown in Example 7-5 (in fact, the shading capability is why the plugin is named maven-shade-plugin). Shading allows you to make a second copy of the conflicting package under a different namespace and rewrites your application's code to use the renamed version. This somewhat brute-force technique is quite effective at resolving runtime dependency conflicts. For specific instructions on how to shade dependencies, see the documentation for your build tool.
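The same relocation idea is available outside Maven as well; with the sbt-assembly plugin, for example, it is expressed as shade rules. A minimal sketch, where com.example.foo is a hypothetical conflicting package:

// build.sbt (requires the sbt-assembly plugin); the package name is a placeholder
assembly / assemblyShadeRules := Seq(
  // copy com.example.foo.* into a private namespace and rewrite all references to it
  ShadeRule.rename("com.example.foo.**" -> "myapp.shaded.foo.@1").inAll
)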
I would check the Scala version of your spark-shell, because it can be a Scala version issue:
scala> util.Properties.versionString
res3: String = version 2.11.8
If not, then check which Spark version you are using and which third-party library versions you are using as dependencies, because there is likely one that is too new or too old for your Spark version.
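The Spark version can be checked from the same shell, for example:

scala> spark.version    // version string of the running SparkSession, e.g. "2.4.5"
scala> sc.version       // the same information via the SparkContext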
I hope it helps.

Solving Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

I'm using JetBrains IntelliJ IDEA with the Scala plugin and I'm trying to execute some code that uses Apache Spark. However, whenever I try to run it, the code fails with the exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:76)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:71)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:58)
at KMeans$.main(kmeans.scala:71)
at KMeans.main(kmeans.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
Running spark-shell from the terminal doesn't give me any problems; the warning unable to load native-hadoop library for your platform doesn't appear for me.
I've read some questions similar to mine, but in those cases the problems were with spark-shell or with the cluster configuration.
I was using spark-core_2.12-2.4.3.jar without its dependencies. I solved the issue by adding the spark-core library through Maven, which automatically pulled in all the dependencies.
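For anyone building with sbt rather than Maven, the equivalent fix is a single managed dependency, which pulls in the Hadoop classes (including FSDataInputStream) transitively. A sketch using the same version as the jar above:

// build.sbt
scalaVersion := "2.12.8"   // any 2.12.x matches the _2.12 artifact

// %% appends the Scala binary version, resolving to spark-core_2.12
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"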

NoSuchMethodError from dependencies when using spark-submit

I'm trying to submit a JAR to my Apache Spark 2.2.1 cluster using Scala 2.11. I included some extra dependencies in my JAR, namely Apache Commons CLI, and packaged it all into a fat JAR. However, I'm getting a NoSuchMethodError when I submit my Spark application. I'm quite sure it's not due to an inconsistency in Scala versions, but rather something weird with the dependencies.
The command is simply spark-submit myjar.jar [arguments]
This is the error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Options.addRequiredOption(Ljava/lang/String;Ljava/lang/String;ZLjava/lang/String;)Lorg/apache/commons/cli/Options;
at xyz.plenglin.aurum.spark.RunOnSpark$.main(RunOnSpark.scala:46)
at xyz.plenglin.aurum.spark.RunOnSpark.main(RunOnSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Running java -jar myjar.jar [arguments] works without any issues. Peeking inside the JAR, I see org.apache.commons.cli.Options where it should be.
It looks like I fixed it by adding the --driver-class-path argument to my spark-submit command.
So the command looks like:
$ spark-submit --driver-class-path myjar.jar myjar.jar [arguments]
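For completeness: this kind of conflict typically happens because Spark's own driver classpath already contains an older commons-cli (one without addRequiredOption), which shadows the copy inside the fat JAR; --driver-class-path prepends your jar so your version is found first. An alternative knob, marked experimental in the Spark configuration docs, is to prefer user classes outright:

spark-submit --conf spark.driver.userClassPathFirst=true myjar.jar [arguments]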

mapreduce code working on eclipse but not on cluster

I am working on code which uses OpenNLP. My code runs perfectly in Eclipse, but when I run its JAR on a cluster, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/tools/util/ObjectStream
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ClassNotFoundException: opennlp.tools.util.ObjectStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 3 more
You need to have the OpenNLP jar available on the classpath of your tasks. There are several options:
-libjars together with HADOOP_CLASSPATH, see Using the libjars option with Hadoop (a sketch follows after this list)
'fat jar': build a jar that contains all the necessary jars and submit the fat jar instead
install the 3rd-party jars on all nodes (i.e. make the cluster '3rd-party aware')
use the HDFS distributed cache and download the necessary jars in your code
For a lengthier discussion see How-to: Include Third-Party Libraries in Your MapReduce Job
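A minimal sketch of the first option (the jar path and driver class are placeholders; -libjars is parsed by ToolRunner, so the main class must implement Tool):

# make OpenNLP visible to the client JVM that launches the job
export HADOOP_CLASSPATH=/path/to/opennlp-tools.jar

# -libjars ships the jar via the distributed cache onto every task's classpath
hadoop jar myjob.jar com.example.MyDriver \
    -libjars /path/to/opennlp-tools.jar \
    /input /output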

JOGL throwing ClassNotFoundException?

I've seen this question brought up a couple of times on this website, but never really seen a clear answer, so excuse me for repeating it. While programming with JOGL and Java3D I've encountered some errors. I was trying to create a project that I might eventually put on the Android App Store. I began the project using just Java3D and JOGL, putting them in the system library on my Mac, where they worked fine. Then, to make the project portable, I moved the J3D and JOGL files inside the project so they could be compiled into a JAR file runnable without needing to install J3D and JOGL. But then every time I ran the project it threw this error:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/media/opengl/GL
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at javax.media.j3d.Pipeline$PipelineCreator.run(Pipeline.java:73)
at javax.media.j3d.Pipeline$PipelineCreator.run(Pipeline.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.media.j3d.Pipeline.createPipeline(Pipeline.java:90)
at javax.media.j3d.MasterControl.loadLibraries(MasterControl.java:832)
at javax.media.j3d.VirtualUniverse.<clinit>(VirtualUniverse.java:274)
at javax.media.j3d.GroupRetained.<init>(GroupRetained.java:155)
at javax.media.j3d.TransformGroupRetained.<init>(TransformGroupRetained.java:116)
at javax.media.j3d.TransformGroup.createRetained(TransformGroup.java:114)
at javax.media.j3d.SceneGraphObject.<init>(SceneGraphObject.java:114)
at javax.media.j3d.Node.<init>(Node.java:172)
at javax.media.j3d.Group.<init>(Group.java:549)
at javax.media.j3d.TransformGroup.<init>(TransformGroup.java:87)
at src.Project.<clinit>(Project.java:47)
at src.ProjectPanel.<clinit>(ProjectPanel.java:8)
Caused by: java.lang.ClassNotFoundException: javax.media.opengl.GL
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 17 more
I'm using Eclipse as an IDE and have the jogl-all.jar and gluegen-rt.jar files in the classpath of the project, as well as all of the required J3D jars, but it cannot find the GL class for some reason.
Thanks in advance for the help.
When you export your application as a Runnable JAR, use one of these Library handling options:
Library handling: Copy required libraries into a sub-folder next to the generated JAR
or
Library handling: Package required libraries into generated JAR
More information is available in the jogamp jogl wiki:
http://jogamp.org/wiki/index.php/Setting_up_a_JogAmp_project_in_your_favorite_IDE
http://jogamp.org/wiki/index.php/JogAmp_JAR_File_Handling
Also, you will need to run your application with java -jar yourapp.jar.
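With the first option, the export typically produces a layout like the one below (the sub-folder name is Eclipse's default; the exact jar set depends on your project), and the application is then started with java -jar as mentioned:

yourapp.jar
yourapp_lib/
    jogl-all.jar
    gluegen-rt.jar
    ...                  (the J3D jars and any platform native jars)

java -jar yourapp.jar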