I'm trying to submit a JAR to my Apache Spark 2.2.1 cluster using Scala 2.11. I included some extra dependencies in my JAR, namely Apache Commons CLI, and packaged it all into a fat JAR. However, I'm getting a NoSuchMethodError when I submit my Spark application. I'm quite sure that it's not due to inconsistency in Scala versions, but rather something weird with dependencies.
The command is simply spark-submit myjar.jar [arguments]
This is the error:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.commons.cli.Options.addRequiredOption(Ljava/lang/String;Ljava/lang/String;ZLjava/lang/
String;)Lorg/apache/commons/cli/Options;
at xyz.plenglin.aurum.spark.RunOnSpark$.main(RunOnSpark.scala:46)
at xyz.plenglin.aurum.spark.RunOnSpark.main(RunOnSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Running java -jar myjar.jar [arguments] works without any issues. Peeking inside the JAR, I see org.apache.commons.cli.Options where it should be.
It looks like I fixed it by adding the --driver-class-path argument to my spark-submit command.
So the command looks like:
$ spark-submit --driver-class-path myjar.jar myjar.jar [arguments]
Related
I'm using a spark-shell instance to test the pulling of data from a client's kafka source. To launch the instance I am using the command spark-shell --jars spark-sql-kafka-0-10_2.11-2.5.0-palantir.8.jar, kafka_2.12-2.5.0.jar, kafka-clients-2.5.0.jar (all jars are present in the woring dir).
However, when I run the command val df = spark.read.format("kafka")........... after a few seconds it crashes with the below:
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:304)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamingWriteSupportProvider
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 79 more
HOWEVER - if I change the order of the jars in the spark-shell command to spark-shell --jars kafka_2.12-2.5.0.jar, kafka-clients-2.5.0.jar, spark-sql-kafka-0-10_2.11-2.5.0-palantir.8.jar, instead crashes with:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateBatchOptions(KafkaSourceProvider.scala:330)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.createRelation(KafkaSourceProvider.scala:113)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 55 more
I am developing behind a very strict proxy managed by our client and am unable to user --packages instead, and I am at a bit of a loss here, am I unable to load all 3 dependencies at the launch of the shell? Am I missing another step somewhere?
In the Structured Streaming + Kafka Integration Guide it says:
For experimenting on spark-shell, you need to add this above library and its dependencies too when invoking spark-shell.
The library you are using seems to be customized and not publicly available in the maven central repository. That means, I can not look into its dependencies.
However, looking at the latest stable version 2.4.5 the dependencies according to maven central repository is kafka-clients version 2.0.0.
You are trying to import multiple scala versions 2.11 & 2.12 of different libraries.
Please add same version of scala libraries & check below how to import into spark-shell.
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5,org.apache.kafka:kafka_2.11:2.4.1,org.apache.kafka:kafka-clients:2.4.1
One occasionally disruptive issue is dealing with dependency conflicts in cases where a user application and Spark itself both depend on the same library. This comes up relatively rarely, but when it does, it can be vexing for users. Typically, this will manifest itself when a NoSuchMethodError, a ClassNotFoundException, or some other JVM exception related to class loading is thrown during the execution of a Spark job. There are two solutions to this problem. The first is to modify your application to depend on the same version of the third-party library that Spark does. The second is to modify the packaging of your application using a procedure that is often called “shading.” The Maven build tool supports shading through advanced configuration of the plug-in shown in Example 7-5 (in fact, the shading capability is why the plugin is named maven-shade-plugin). Shading allows you to make a second copy of the conflicting package under a different namespace and rewrites your application’s code to use the renamed version. This somewhat brute-force technique is quite effective at resolving runtime dependency conflicts. For specific instructions on how to shade dependencies, see the documentation for your build tool.
I would try to know the scala version of the spark-shell because, it can be a scala version issue
scala> util.Properties.versionString
res3: String = version 2.11.8
if not, then check what spark version you are using and third-party library versions you are using as dependencies because, I am sure there is newest or oldest that your spark version doesn't support.
I hope it helps.
I am trying to create an executable Jar from my Scala project. Therefore, I followed a tutorial. It suggested creating a Java main class within eclipse, which calls the Scala entry point. This works fine when executing within eclipse. After adding this class I was able to export an executable jar. However, it wont work with
java -jar myjar.jar
I made sure to activate "Package required library into jar" when exporting. My Java main class looks like this (where Driver is also located in the package core)
package core;
public class Main {
public static void main(String[] args) {
Driver.main(args);
}
}
And when executing the exported char the follwing error is thrown, which I can debug:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
atsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NoClassDefFoundError: scala/collection/Seq
at core.Driver$.main(Driver.scala:14)
at core.Driver.main(Driver.scala)
at core.Main.main(Main.java:5)
... 5 more
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
You could use the One-jar application to guarantee all the necessary jars are in the jarfile, I suspect the scala-lang jars are not in it or in the classpath of the command line.
Maven: https://code.google.com/p/onejar-maven-plugin/
SBT: https://github.com/sbt/sbt-onejar
If you aren't using either maven or SBT I would suggest switching to them as well as they are the main supported build systems in Scala
I am working on code which uses openNLP. My code runs on eclipse perfectly, but when I run its jar on a cluster, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/tools/util/ObjectStream
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ClassNotFoundException: opennlp.tools.util.ObjectStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 3 more
You need to have the OpenNLP jar available and in your classpath on your tasks. There are several options:
-libjars and HADOOP_CLASSPATH, see Using the libjars option with Hadoop
'fat jar': build a jar that contains all the necessary jars, submit the fat jar instead
install the 3rd party jars on all nodes (ie. make the cluster '3rd party aware')
use the HDFS distributed cache and download the necessary jars in your code
For a lengthier discussion see How-to: Include Third-Party Libraries in Your MapReduce Job
I have downloaded and built Spark 0.80 using sbt/sbt assembly. It was successful. However when running ./bin/start-master.sh the following error is seen in the log file
Spark Command: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp :/shared/spark-0.8.0-incubating-bin-hadoop1/conf:/shared/spark-0.8.0-incubating-bin-hadoop1/assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop1.0.4.jar
/shared/spark-0.8.0-incubating-bin-hadoop1/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip mellyrn.local --port 7077 --webui-port 8080
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/master/Master
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.master.Master
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Update: after doing sbt clean (per suggestion below) it is running: see screenshot.
There can be a number of things which cause this error which are not specific to Spark:
Bad build, sbt clean compile that puppy again.
You have a cached dependency in your .ivy2 cache which conflicts with a dependency of that project version of Spark. Empty your cache and try again.
Your project which is building on Spark has a library version which conflicts with a dependency of Spark. That is, Spark may dependency on "foo-0.9.7" while your project put in "foo-0.8.4".
Try looking at those first.
When I package my scala code into a jar using Eclipse, I am unable to run said jar file. I've got:
object Hello extends App{
println("Hello World!")
}
and:
public class Runner {
public static void main(String[] args) {
Hello.main(args);
}
}
I use eclipse to package this into a runnable jar called Hello.jar. java -jar Hello.jar then gives:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/App
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at Hello.main(Hello.scala)
at Runner.main(Runner.java:4)
Caused by: java.lang.ClassNotFoundException: scala.App
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 13 more
Your application has a dependency on the Scala runtime libraries. These need to be available when the app is run, and currently they are not.
There are two ways to fix this:
Specify the classpath when you launch the JAR - convenient as a one-off, but clunky. Run java -jar /path/to/yourapp.jar from the command line, and add a -cp parameter which includes the path to the Scala library JAR.
Package up the Scala library JAR inside your own JAR, and modify the Manifest to reference this JAR (by adding the Class-Path attribute - see here). This requires more work with your build process, but then means that the app can be launched by simply double-clicking on the resulting JAR.
That's because eclipse don't ship scala-library with your code. Even more, you should make sure that all dependencies are in manifest file for executable jar. Here's a similar question from stackoverflow:
running scala apps with java -jar