Maven dependency given but one class still not found on execution

I have added the Maven dependencies for CDK in pom.xml, but I still get a "class not found" error when executing the JAR file.
dyna218-128:spark4vs laeeqahmed$ java -cp target/spark4vs-1.0-SNAPSHOT.jar se.uu.farmbio.spark4vs.RunPrediction
Exception in thread "main" java.lang.NoClassDefFoundError: org/openscience/cdk/interfaces/IAtomContainer
Caused by: java.lang.ClassNotFoundException: org.openscience.cdk.interfaces.IAtomContainer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
The pom.xml is as follows:
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.16</version>
</dependency>
<dependency><!-- SVM dependency -->
<groupId>tw.edu.ntu.csie</groupId>
<artifactId>libsvm</artifactId>
<version>3.1</version>
</dependency>
<dependency>
<groupId>org.openscience.cdk</groupId>
<artifactId>cdk</artifactId>
<version>1.4.7</version>
</dependency>
<repositories>
<repository>
<id>3rdparty</id>
<url>https://maven.ch.cam.ac.uk/content/repositories/thirdparty/</url>
</repository>
</repositories>

Maven dependencies are for building the project. The Maven Jar Plugin does not include them when the JAR is packaged, so you can't run it without additional work.
However, there are many solutions for this. For example, you can use the One-JAR Maven Plugin to package all dependencies into a single JAR, but this is not always suitable:
http://onejar-maven-plugin.googlecode.com/svn/mavensite/usage.html
You can create an archive with the jar-with-dependencies assembly descriptor:
http://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html#jar-with-dependencies
Or you can merge all JARs into one with the Maven Shade Plugin:
http://maven.apache.org/plugins/maven-shade-plugin/
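As a rough sketch of the Shade approach (the plugin version and configuration here are illustrative, not taken from your project), binding the shade goal to the package phase and writing the main class into the manifest could look like this:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- write Main-Class into the manifest so the uber-jar can be run with java -jar -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>se.uu.farmbio.spark4vs.RunPrediction</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
With a setup like this, mvn package should produce an uber-jar (by default it replaces target/spark4vs-1.0-SNAPSHOT.jar) that you can start with java -jar instead of assembling the classpath by hand.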

Related

Hadoop 3 gcs-connector doesn't work properly with the latest version of Spark 3 in standalone mode

I wrote a simple Scala application which reads a parquet file from a GCS bucket. The application uses:
JDK 17
Scala 2.12.17
Spark SQL 3.3.1
gcs-connector of hadoop3-2.2.7
The connector is taken from Maven, imported via sbt (Scala build tool). I'm not using the latest, 2.2.9, version because of this issue.
The application works perfectly in local mode, so I tried to switch to the standalone mode.
These are the steps I took:
Downloaded Spark 3.3.1 from here
Started the cluster manually like here
I tried to run the application again and faced this error:
[error] Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
[error] at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
[error] at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
[error] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
[error] at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
[error] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
[error] at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
[error] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
[error] at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
[error] at org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(HadoopInputFile.java:44)
[error] at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
[error] at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:484)
[error] ... 14 more
[error] Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
[error] at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
[error] at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
[error] ... 24 more
Somehow it cannot detect the connector's file system: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
My spark configuration is pretty basic:
spark.app.name = "Example app"
spark.master = "spark://YOUR_SPARK_MASTER_HOST:7077"
spark.hadoop.fs.defaultFS = "gs://YOUR_GCP_BUCKET"
spark.hadoop.fs.gs.impl = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
spark.hadoop.fs.AbstractFileSystem.gs.impl = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
spark.hadoop.google.cloud.auth.service.account.enable = true
spark.hadoop.google.cloud.auth.service.account.json.keyfile = "src/main/resources/gcp_key.json"
I've found out that the Maven version of the GCS Hadoop connector is missing dependencies internally.
I fixed it by either:
downloading the connector from https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage and providing it to the Spark configuration on startup (but this is not recommended for production, as the site clearly states), or
providing the missing dependencies for the connector.
To go with the second option, I unpacked the gcs-connector jar file, looked for its pom.xml, copied the dependencies into a new standalone xml file, and downloaded them using the mvn dependency:copy-dependencies -DoutputDirectory=/path/to/pyspark/jars/ command.
Here is the example pom.xml that I created; please note I am using the 2.2.9 version of the connector:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<name>TMP_PACKAGE_NAME</name>
<description>
jar dependencies of gcs hadoop connector
</description>
<!--'com.google.oauth-client:google-oauth-client:jar:1.34.1'
-->
<groupId>TMP_PACKAGE_GROUP</groupId>
<artifactId>TMP_PACKAGE_NAME</artifactId>
<version>0.0.1</version>
<dependencies>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>hadoop3-2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client-jackson2</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>31.1-jre</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client</artifactId>
<version>1.34.1</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>util</artifactId>
<version>2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>util-hadoop</artifactId>
<version>hadoop3-2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcsio</artifactId>
<version>2.2.9</version>
</dependency>
<dependency>
<groupId>com.google.auto.value</groupId>
<artifactId>auto-value-annotations</artifactId>
<version>1.10.1</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.google.flogger</groupId>
<artifactId>flogger</artifactId>
<version>0.7.4</version>
</dependency>
<dependency>
<groupId>com.google.flogger</groupId>
<artifactId>google-extensions</artifactId>
<version>0.7.4</version>
</dependency>
<dependency>
<groupId>com.google.flogger</groupId>
<artifactId>flogger-system-backend</artifactId>
<version>0.7.4</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.10</version>
</dependency>
</dependencies>
</project>
Hope this helps.
This is caused by the fact that Spark uses an old Guava library version and you used a non-shaded GCS connector jar. To make it work, you just need to use the shaded GCS connector jar from Maven, for example: https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.2.9/gcs-connector-hadoop3-2.2.9-shaded.jar
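If you would rather have the build pull it in than download the jar manually, the shaded artifact from that URL corresponds to the shaded classifier; a minimal sketch as a Maven dependency (in sbt you would attach the same classifier to the library dependency):
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>hadoop3-2.2.9</version>
  <!-- the "shaded" classifier resolves to gcs-connector-hadoop3-2.2.9-shaded.jar -->
  <classifier>shaded</classifier>
</dependency>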

External JARs with Drools Workbench do not resolve on compilation

Summary
I have a drools workbench server that I want to push data model jars to from maven. I can push jars to it, but when I do, the artefacts do not resolve. If I try to upload the artefacts manually through the drools UI, I get other maven errors.
Details
I have an installation of Drools and KIE 7.24 (and 23) (https://hub.docker.com/r/jboss/jbpm-server-full/) running in a local Docker container. I want to push data models to Workbench. The pom.xml of the project containing the facts looks like
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<properties>
<maven.compiler.release>11</maven.compiler.release>
</properties>
<groupId>decisions</groupId>
<artifactId>fact</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>kjar</packaging>
<name>Facts</name>
<description>Facts</description>
<build>
<plugins>
<plugin>
<groupId>org.kie</groupId>
<artifactId>kie-maven-plugin</artifactId>
<version>7.24.0.Final</version>
<extensions>true</extensions>
</plugin>
</plugins>
</build>
<distributionManagement>
<repository>
<id>workbench</id>
<url>http://localhost:8080/business-central/maven2</url>
</repository>
</distributionManagement>
</project>
A mvn deploy on the host succeeds and builds. When I log into the drools container, I can verify that all the files are in $DROOLS_HOME/bin/repositories/kia/global and when I bring up the artifacts screen, they are visible in the list.
When I try to add them as a project dependency, I get warnings from Drools:
07:49:16,790 WARN [org.appformer.maven.integration.MavenRepository] (default task-50) Unable to resolve artifact: decisions:fact:1.0.0
07:49:16,791 ERROR [org.kie.scanner.MavenClassLoaderResolver] (default task-50) Dependency artifact not found for: decisions:fact:1.0.0
Everything in the drools workbench maven repositories looks perfectly ok, but no matter what I do, the artifact will not resolve.
If I try to upload it manually to the artifact store using the workbench admin repository tool, I get a huge stack trace from drools:
08:14:06,447 ERROR [io.undertow.request] (default task-56) UT005023: Exception handling request to /business-central/maven2: java.lang.RuntimeException: org.eclipse.aether.deployment.DeploymentException: Failed to retrieve remote metadata com.lendi.decisions:fact:1.0.0-SNAPSHOT/maven-metadata.xml: Could not transfer metadata com.lendi.decisions:fact:1.0.0-SNAPSHOT/maven-metadata.xml from/to workbench (http://localhost:8080/business-central/maven2): Unauthorized (401)
at org.guvnor.m2repo.backend.server.repositories.DistributionManagementArtifactRepository.deploy(DistributionManagementArtifactRepository.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.weld.bean.proxy.AbstractBeanInstance.invoke(AbstractBeanInstance.java:38)
at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:106)
at org.guvnor.m2repo.backend.server.repositories.ArtifactRepository$475362908$Proxy$_$$_WeldClientProxy.deploy(Unknown Source)
at org.guvnor.m2repo.backend.server.GuvnorM2Repository.lambda$deployArtifact$4(GuvnorM2Repository.java:300)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
Neither the Drools manual nor any references seem to have advice.

Maven dependency doesn't get downloaded

I have a Maven Scala project and in the pom.xml I have this dependency
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.7</version>
</dependency>
I have this project in Eclipse, but for some reason Eclipse keeps telling me that this jar file doesn't exist, and it is not shown under the Maven Dependencies section. This is not the case for the other dependencies. What is wrong with this one?

How to run a spark example program in Intellij IDEA

First, on the command line from the root of the downloaded Spark project, I ran
mvn package
It was successful.
Then an IntelliJ project was created by importing the Spark pom.xml.
In the IDE the example class appears fine: all of the libraries are found. This can be viewed in the screenshot.
However, when attempting to run main(), a ClassNotFoundException on SparkContext occurs.
Why can IntelliJ not simply load and run this Maven-based Scala program? And what can be done as a workaround?
As one can see below, SparkContext looks fine in the IDE, but then it is not found when attempting to run:
The test was run by right-clicking inside main() and selecting Run GroupByTest.
It gives
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:36)
at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Here is the run configuration:
The Spark lib isn't in your classpath.
Execute sbt/sbt assembly,
and afterwards include "/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar" in your project.
This may help: IntelliJ-Runtime-error-tt11383. Change the module dependencies from provided to compile. This works for me.
You need to add the Spark dependency. If you are using Maven, just add these lines to your pom.xml:
<dependencies>
...
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
...
</dependencies>
This way you'll have the dependency for compiling and testing purposes but not in the "jar-with-dependencies" artifact.
But if you want to execute the whole application in a standalone cluster running from your IntelliJ, you can add a Maven profile that adds the dependency with compile scope, just like this:
<properties>
<scala.binary.version>2.11</scala.binary.version>
<spark.version>1.2.1</spark.version>
<spark.scope>provided</spark.scope>
</properties>
<profiles>
<profile>
<id>local</id>
<properties>
<spark.scope>compile</spark.scope>
</properties>
<dependencies>
<!--<dependency>-->
<!--<groupId>org.apache.hadoop</groupId>-->
<!--<artifactId>hadoop-common</artifactId>-->
<!--<version>2.6.0</version>-->
<!--</dependency>-->
<!--<dependency>-->
<!--<groupId>com.hadoop.gplcompression</groupId>-->
<!--<artifactId>hadoop-gpl-compression</artifactId>-->
<!--<version>0.1.0</version>-->
<!--</dependency>-->
<dependency>
<groupId>com.hadoop.gplcompression</groupId>
<artifactId>hadoop-lzo</artifactId>
<version>0.4.19</version>
</dependency>
</dependencies>
<activation>
<activeByDefault>false</activeByDefault>
<property>
<name>env</name>
<value>local</value>
</property>
</activation>
</profile>
</profiles>
<dependencies>
<!-- SPARK DEPENDENCIES -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>${spark.scope}</scope>
</dependency>
</dependencies>
I also added an option to my application to start a local cluster if --local is passed:
private def sparkContext(appName: String, isLocal: Boolean): SparkContext = {
  val sparkConf = new SparkConf().setAppName(appName)
  if (isLocal) {
    sparkConf.setMaster("local")
  }
  new SparkContext(sparkConf)
}
Finally, you have to enable the "local" profile in IntelliJ in order to get the proper dependencies. Just go to the "Maven Projects" tab and enable the profile.

Netbeans doesn't recognize xercesimpl

I am trying to build a plugin for NetBeans using Maven, and for some reason NetBeans doesn't recognize the xercesImpl.jar packaged with the plugin. Here is the stack trace I see.
java.io.IOException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
at org.apache.batik.dom.util.SAXDocumentFactory.createDocument(SAXDocumentFactory.java:353)
at org.apache.batik.dom.util.SAXDocumentFactory.createDocument(SAXDocumentFactory.java:276)
at org.apache.batik.dom.svg.SAXSVGDocumentFactory.createDocument(SAXSVGDocumentFactory.java:207)
at org.apache.batik.dom.svg.SAXSVGDocumentFactory.createSVGDocument(SAXSVGDocumentFactory.java:105)
[catch] at org.netbeans.modules.plantumlnb.SVGImagePreviewPanel.createSVGDocument(SVGImagePreviewPanel.java:59)
at org.netbeans.modules.plantumlnb.SVGImagePreviewPanel.renderSVGFile(SVGImagePreviewPanel.java:48)
at org.netbeans.modules.plantumlnb.RenderImageThread$1.run(RenderImageThread.java:56)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:251)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:733)
at java.awt.EventQueue.access$200(EventQueue.java:103)
at java.awt.EventQueue$3.run(EventQueue.java:694)
at java.awt.EventQueue$3.run(EventQueue.java:692)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:703)
at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:159)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
Here is the image showing that xercesImpl.jar is indeed packaged with the nbm file.
Here is a list of all the jars that were downloaded during the build process.
NBM Plugin generates manifest
Adding on module's Class-Path:
net.sourceforge.plantuml:plantuml:jar:7959
batik:batik-swing:jar:1.6-1
batik:batik-bridge:jar:1.6-1
batik:batik-gvt:jar:1.6-1
batik:batik-awt-util:jar:1.6-1
batik:batik-script:jar:1.6-1
rhino:js:jar:1.5R4.1
batik:batik-svg-dom:jar:1.6-1
batik:batik-dom:jar:1.6-1
batik:batik-xml:jar:1.6-1
xerces:xercesImpl:jar:2.5.0
batik:batik-parser:jar:1.6-1
batik:batik-css:jar:1.6-1
batik:batik-rasterizer:jar:1.6-1
batik:batik-transcoder:jar:1.6-1
fop:fop:jar:0.20.5
batik:batik-1.5-fop:jar:0.20-5
xml-apis:xml-apis:jar:1.0.b2
xalan:xalan:jar:2.4.1
avalon-framework:avalon-framework:jar:4.0
batik:batik-util:jar:1.6-1
batik:batik-gui-util:jar:1.6-1
batik:batik-ext:jar:1.6-1
xml-apis:xmlParserAPIs:jar:2.0.2
org.axsl.org.w3c.dom.svg:svg-dom-java:jar:1.1
org.axsl.org.w3c.dom.smil:smil-boston-dom-java:jar:2000-02-25
org.w3c.css:sac:jar:1.3
crimson:crimson:jar:1.1.3
I am not sure what I am missing, any help is appreciated.
My dependencies weren't right, so once I fixed them, NetBeans started recognizing the classes. For anybody looking to include Batik in their Maven-based NetBeans module, I am including the dependencies below.
<dependency>
<groupId>batik</groupId>
<artifactId>batik-swing</artifactId>
<version>1.6-1</version>
<type>jar</type>
</dependency>
<dependency>
<groupId>batik</groupId>
<artifactId>batik-script</artifactId>
<version>1.6-1</version>
<!-- exclude xerces as Netbeans includes it -->
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- include xerces in test scope as unittests need it -->
<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.5.0</version>
<type>jar</type>
<scope>test</scope>
</dependency>