java.lang.NoClassDefFoundError: org/apache/spark/sq/sources/v2/StreamingWriteSupportProvider trying to pull from kafka topic in scala - scala

I'm using a spark-shell instance to test the pulling of data from a client's kafka source. To launch the instance I am using the command spark-shell --jars spark-sql-kafka-0-10_2.11-2.5.0-palantir.8.jar, kafka_2.12-2.5.0.jar, kafka-clients-2.5.0.jar (all jars are present in the woring dir).
However, when I run the command val df = spark.read.format("kafka")........... after a few seconds it crashes with the below:
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:304)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamingWriteSupportProvider
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 79 more
HOWEVER - if I change the order of the jars in the spark-shell command to spark-shell --jars kafka_2.12-2.5.0.jar, kafka-clients-2.5.0.jar, spark-sql-kafka-0-10_2.11-2.5.0-palantir.8.jar, instead crashes with:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:376)
at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateBatchOptions(KafkaSourceProvider.scala:330)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.createRelation(KafkaSourceProvider.scala:113)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 55 more
I am developing behind a very strict proxy managed by our client and am unable to user --packages instead, and I am at a bit of a loss here, am I unable to load all 3 dependencies at the launch of the shell? Am I missing another step somewhere?

In the Structured Streaming + Kafka Integration Guide it says:
For experimenting on spark-shell, you need to add this above library and its dependencies too when invoking spark-shell.
The library you are using seems to be customized and not publicly available in the maven central repository. That means, I can not look into its dependencies.
However, looking at the latest stable version 2.4.5 the dependencies according to maven central repository is kafka-clients version 2.0.0.

You are trying to import multiple scala versions 2.11 & 2.12 of different libraries.
Please add same version of scala libraries & check below how to import into spark-shell.
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5,org.apache.kafka:kafka_2.11:2.4.1,org.apache.kafka:kafka-clients:2.4.1

One occasionally disruptive issue is dealing with dependency conflicts in cases where a user application and Spark itself both depend on the same library. This comes up relatively rarely, but when it does, it can be vexing for users. Typically, this will manifest itself when a NoSuchMethodError, a ClassNotFoundException, or some other JVM exception related to class loading is thrown during the execution of a Spark job. There are two solutions to this problem. The first is to modify your application to depend on the same version of the third-party library that Spark does. The second is to modify the packaging of your application using a procedure that is often called “shading.” The Maven build tool supports shading through advanced configuration of the plug-in shown in Example 7-5 (in fact, the shading capability is why the plugin is named maven-shade-plugin). Shading allows you to make a second copy of the conflicting package under a different namespace and rewrites your application’s code to use the renamed version. This somewhat brute-force technique is quite effective at resolving runtime dependency conflicts. For specific instructions on how to shade dependencies, see the documentation for your build tool.
I would try to know the scala version of the spark-shell because, it can be a scala version issue
scala> util.Properties.versionString
res3: String = version 2.11.8
if not, then check what spark version you are using and third-party library versions you are using as dependencies because, I am sure there is newest or oldest that your spark version doesn't support.
I hope it helps.

Related

Solving Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

I'm using Jetbrains IntelliJ IDEA with the Scala plugin and I'm trying to execute some code that uses Apache Spark. However whenever I try to run it, the code doesn't execute properly because of the exception
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:76)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:71)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:58)
at KMeans$.main(kmeans.scala:71)
at KMeans.main(kmeans.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
Running spark-shell from terminal doesn't give me any problems, the warning unable to load native-hadoop library for your platform doesn't appear to me.
I've read some questions similar to mine, but in those cases they had problems with spark-shell or with cluster configuration.
I was using spark-core_2.12-2.4.3.jar without the dependencies. I solved the issue by adding spark-core library through Maven, which automatically added all the dependencies.

Kafka connect ftp

While running the connectors for kafka-connect-ftp showing this below error
Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:54)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:54)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:279)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:260)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:201)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:193)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:153)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:47)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:75)
i have referred two community website which is already given in confluent connector website
https://github.com/Eneco/kafka-connect-ftp
https://github.com/Landoop/stream-reactor
Anyone please suggest a solution for this error.
There are two main issues with this connector:
It bundles, along with its dependencies, classes from Connect's API. That is, it bundles classes in the package org.apache.kafka.connect. This is not advised and such dependencies should be marked as provided.
The actual reason that the connector fails with class loading issues is that it is depending (at least according to its master branch) on a version of Apache Kafka that probably does not match the version of the deployed Connect worker. Specifically, it depends on kafkaVersion = '0.10.2.0' which is not the latest. Kafka Connect, in its recent versions that offer class loading isolation, will ignore what it considers system classes, such as classes in org.apache.kafka.connect when they are imported by connectors' jars. Instead, it will load such classes from the Kafka Connect jars that ship with Apache Kafka.
The above issues may cause class loading failures, as the ones you observe.
Ideally, they should be addressed at the connector level.
Workarounds you may apply are:
build the connector code from source, after upgrading the Kafka version it depends upon. Also mark its Kafka dependencies as provided (including Kafka Connect and Kafka Clients dependencies). Or,
downgrade your deployed Kafka Connect version to match exactly the version that the connector currently depends upon.
A similar issue has been recorded here:
kafka mongodb sink connector not starting

How to configure titan over hbase in java eclipse?

using the following commands as given in the tutorial :
http://s3.thinkaurelius.com/docs/titan/0.5.0/hbase.html
TitanGraph graph = TitanFactory.build()
.set("storage.backend","hbase")
.open();
used the maven dependency :
<dependency>
<groupId>com.thinkaurelius.titan</groupId>
<artifactId>titan-hbase</artifactId>
<version>${titan.version}</version>
</dependency>
Following error is shown
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/MasterNotRunningException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:42)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:479)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:413)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1320)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94)
at com.thinkaurelius.titan.core.TitanFactory$Builder.open(TitanFactory.java:135)
at pluradj.titan.tinkerpop3.example.JavaExample2.main(JavaExample2.java:26)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.MasterNotRunningException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
If possible can you tell the same for cassandra also.
Look here at how to set your runtime classpath in Eclipse: How do I set the runtime classpath in Eclipse 4.2?
It appears your runtime classpath is missing jars due to the NoClassDefFoundError exception. For this specific error, locate the hbase lib directory from your hbase install and add that to your classpath.
If you use Cassandra, you'll need to set your classpath appropriately for Cassandra.
Titan's HBase client config accepts arbitrary keys from hbase-site.xml
(if it's on the CLASSPATH) and I recommend putting that in your path as well.

mapreduce code working on eclipse but not on cluster

I am working on code which uses openNLP. My code runs on eclipse perfectly, but when I run its jar on a cluster, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/tools/util/ObjectStream
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ClassNotFoundException: opennlp.tools.util.ObjectStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 3 more
You need to have the OpenNLP jar available and in your classpath on your tasks. There are several options:
-libjars and HADOOP_CLASSPATH, see Using the libjars option with Hadoop
'fat jar': build a jar that contains all the necessary jars, submit the fat jar instead
install the 3rd party jars on all nodes (ie. make the cluster '3rd party aware')
use the HDFS distributed cache and download the necessary jars in your code
For a lengthier discussion see How-to: Include Third-Party Libraries in Your MapReduce Job

JOGL throwing ClassNotFoundException?

I've seen this question brought up a couple of times on this website, but never really seen a clear answer, so excuse me from repeating it. While programming with JOGL and Java3D I've encountered some errors. I was trying to create a project that I might eventually put on the Android App Store. I began the project just using Java3D and JOGL and putting them in the system library on my mac, where they worked fine. Then to try to make the project portable I moved the J3D and JOGL files inside the project so they could be compiled into a jar file that would be runnable without needing to install j3d and JOGL. But then every time I ran the project it threw this error:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/media/opengl/GL
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at javax.media.j3d.Pipeline$PipelineCreator.run(Pipeline.java:73)
at javax.media.j3d.Pipeline$PipelineCreator.run(Pipeline.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.media.j3d.Pipeline.createPipeline(Pipeline.java:90)
at javax.media.j3d.MasterControl.loadLibraries(MasterControl.java:832)
at javax.media.j3d.VirtualUniverse.<clinit>(VirtualUniverse.java:274)
at javax.media.j3d.GroupRetained.<init>(GroupRetained.java:155)
at javax.media.j3d.TransformGroupRetained.<init>(TransformGroupRetained.java:116)
at javax.media.j3d.TransformGroup.createRetained(TransformGroup.java:114)
at javax.media.j3d.SceneGraphObject.<init>(SceneGraphObject.java:114)
at javax.media.j3d.Node.<init>(Node.java:172)
at javax.media.j3d.Group.<init>(Group.java:549)
at javax.media.j3d.TransformGroup.<init>(TransformGroup.java:87)
at src.Project.<clinit>(Project.java:47)
at src.ProjectPanel.<clinit>(ProjectPanel.java:8)
Caused by: java.lang.ClassNotFoundException: javax.media.opengl.GL
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 17 more
I'm using Eclipse as an IDE, and have the jogl-all.jar and gluegen-rt.jar files in the classpath of the project, as well as all of the require j3d jars, but it cannot find the GL.class file for some reason.
Thanks in advance for help.
When you export your application as a Runnable JAR use the
+ Library handling:
Copy required libraries into a sub-folder next to the generated JAR
or
+ Library handling:
Package required libraries into generated JAR
More information is available in the jogamp jogl wiki:
http://jogamp.org/wiki/index.php/Setting_up_a_JogAmp_project_in_your_favorite_IDE
http://jogamp.org/wiki/index.php/JogAmp_JAR_File_Handling
Also you will need to use the java -jar yourapp.jar command line option to run your application.