ClassNotFoundException for Spark job on Yarn-cluster mode - scala

So I am trying to run a Spark job in yarn-cluster mode, kicked off via an Oozie workflow, but I have been encountering the following error (relevant stack trace below):
java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:388)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:296)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:179)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1917)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1896)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1896)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
...
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:414)
at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:323)
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:294)
... 28 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 33 more
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:36)
at org.apache.hadoop.hbase.ipc.RpcControllerFactory.instantiate(RpcControllerFactory.java:58)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.createAsyncProcess(ConnectionManager.java:2317)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:688)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
... 38 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:32)
... 42 more
Some background information:
The job runs on Spark 1.4.1 (the correct spark.yarn.jar field is specified in the spark.conf file).
oozie.libpath is set to the HDFS directory in which the jar of my program resides.
org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory, the class that is not found, exists in phoenix-4.5.1-HBase-1.0-client.jar. I've specified this jar in spark.driver.extraClassPath and spark.executor.extraClassPath in my spark.conf file. I've also added the phoenix-core dependency to my pom file, so the class exists in my shaded project jar as well.
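For reference, the relevant spark.conf entries look like this (the path is a placeholder for the jar's real location):
spark.driver.extraClassPath    /path/to/phoenix-4.5.1-HBase-1.0-client.jar
spark.executor.extraClassPath  /path/to/phoenix-4.5.1-HBase-1.0-client.jar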
Observations so far:
Adding an extra field to my spark.conf file, spark.driver.userClassPathFirst, and setting it to true gets rid of the ClassNotFoundException. However, it also prevents me from initializing a Spark context (NullPointerException). From googling around, it seems that this field messes up classpaths, so it may not be the way to go, since I cannot even initialize a Spark context with it set.
I noticed that in the Oozie stdout log I do not see the classpath of the Phoenix jar. So maybe, for some reason, spark.driver.extraClassPath and spark.executor.extraClassPath aren't actually picking the jar up as an extra classpath entry? I do know that I'm specifying the correct jar file path, since other jobs have spark.conf files with the same parameters.
I found a hacky way to make the Phoenix jar show up in the classpath (in the Oozie stdout log) by copying it to the same directory where my program jar resides. This works whether or not spark.executor.extraClassPath is changed to point to the new jar location. However, the ClassNotFoundException persists, even though I clearly see the ClientRpcControllerFactory class when I unzip the jar.
Other things I've tried:
I tried using the sparkConf.setJars() and sparkContext.addJar() methods (see the sketch after this list), but still encountered the same error.
I added the jar to the spark.driver.extraClassPath field in my job properties file, but it hasn't seemed to help (the Spark docs indicate that this field is necessary when running in client mode, so it may not be relevant for my case).
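Roughly what the setJars/addJar attempt looked like (a sketch; the jar path and app name are placeholders, not my real values):
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: shipping the Phoenix client jar with the job explicitly.
val conf = new SparkConf()
  .setAppName("my-phoenix-job")
  .setJars(Seq("/path/to/phoenix-4.5.1-HBase-1.0-client.jar"))
val sc = new SparkContext(conf)
// or, after the context exists:
sc.addJar("/path/to/phoenix-4.5.1-HBase-1.0-client.jar")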
Any help/ideas/suggestions would be greatly appreciated.

I use CDH 5.5.1 + Phoenix 4.5.2 (both installed via parcels) and faced the same problem. I think the problem disappeared after I switched to client mode; I can't verify this because I am now getting a different error in cluster mode.
I tried to trace the Phoenix source code and found some interesting things. Hopefully a Java/Scala expert can identify the root cause.
The PhoenixDriver class was loaded, which shows the jar was found initially. After layers of classloader / context switching (?), the jar was lost from the classpath.
If I Class.forName() a non-existent class in my program, the stack does not include a call to sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331); it looks like this:
java.lang.ClassNotFoundException: NONEXISTINGCLASS
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
I copied Phoenix code into my program for testing. I still get the ClassNotFoundException if I call ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1896). However, a call to ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:296) returned a usable HBase connection. So it seems PhoenixContextExecutor was causing the loss of the jar, but I don't know how.
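One way to probe where the layers diverge (a sketch, not from the Phoenix code; it just reports which loader served PhoenixDriver and whether the missing class is visible from each loader):
import org.apache.phoenix.jdbc.PhoenixDriver

// Report which classloader served PhoenixDriver, then check whether the
// missing factory class is visible from it and from the context classloader.
val driverCl = classOf[PhoenixDriver].getClassLoader
println(s"PhoenixDriver loaded by: $driverCl")
val target = "org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory"
for (cl <- Seq(driverCl, Thread.currentThread.getContextClassLoader)) {
  try { Class.forName(target, false, cl); println(s"$cl CAN load $target") }
  catch { case _: ClassNotFoundException => println(s"$cl can NOT load $target") }
}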
Source code of Cloudera Phoenix 4.5.2 : https://github.com/cloudera-labs/phoenix/blob/phoenix1-4.5.2_1.2.0/phoenix-core/src/main/java/org/apache/
(Not sure whether I should post a comment... but I have no reputation anyway)

So I managed to fix my issue and get my job to run. My solution is very hacky, but I will post it here in case it helps others in the future.
Basically, the problem as I understand it was that the org.apache.hadoop.hbase.util.ReflectionUtils class, which is responsible for finding the ClientRpcControllerFactory class, was being loaded from some Cloudera directory on the cluster instead of from my own jar. Setting spark.driver.userClassPathFirst to true prioritized loading ReflectionUtils from my jar, which was then able to locate the ClientRpcControllerFactory class. But that messed up some other classpaths and kept giving me a NullPointerException when I tried to initialize a SparkContext, so I looked for another solution.
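(For anyone who wants to check this on their own cluster, a small sketch that prints where a class was physically loaded from; note the code source can be null for bootstrap classes:)
// Print the jar or directory ReflectionUtils was actually loaded from.
val cls = Class.forName("org.apache.hadoop.hbase.util.ReflectionUtils")
val location = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
println(location.getOrElse("bootstrap / unknown code source"))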
I tried to figure out whether it was possible to exclude all the default CDH jars from my classpath, but found that the value in spark.yarn.jar was pulling in all of those CDH jars, and I definitely needed to specify that jar.
So the solution was to include all classes under org.apache.hadoop.hbase from the Phoenix jar in the spark-assembly jar (the jar that spark.yarn.jar pointed to), which got rid of the original exception and did not give me an NPE when initializing a SparkContext. Now ReflectionUtils was being loaded from the spark-assembly jar, and since ClientRpcControllerFactory was also included in that jar, it was able to find it. After this, I encountered a few more ClassNotFoundExceptions for Phoenix classes, so I put those classes into the spark-assembly jar as well.
Finally, I hit a java.lang.RuntimeException: hbase-default.xml File Seems to be for and old Version of HBase problem. I found that my application jar contained such a file, but changing hbase.defaults.for.version.skip to true there didn't do anything. So I included another hbase-default.xml file in the spark-assembly jar with the skip flag set to true, and it finally worked.
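For reference, the extra hbase-default.xml I put into the spark-assembly jar boiled down to the skip flag (a minimal sketch; a real copy may need the other defaults too):
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
  </property>
</configuration>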
Some observations:
I noticed that my spark-assembly jar was completely missing an org.apache.hadoop.hbase directory. A coworker told me that I should usually expect to find an hbase directory in a spark-assembly jar, so maybe I was working with a bad one. Edit: I checked a newly downloaded spark-assembly jar (v1.5.2) and it doesn't have it either, so maybe the org.apache.hadoop.hbase package is simply not included in it.
ClassNotFoundExceptions and classloader problems are hard to debug.

Related

`NoClassDefFoundError` in Artifact built with IntelliJ

I use IntelliJ IDEA 2017.3 (Ultimate Edition) to build an artifact (an executable Jar) from a Scala/SBT project; the Scala version is 2.12.
Since I recently added a dependency on Scallop, I can no longer execute the Jar file, because the Scallop class ScallopConf is not in it:
$ java -jar executable.jar
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/rogach/scallop/ScallopConf
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
[...]
Caused by: java.lang.ClassNotFoundException: org.rogach.scallop.ScallopConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
By inspecting the Jar file manually, I can confirm that the ScallopConf class is not packaged into it. All the other dependencies are there, no matter whether they were added initially or later.
This is how I added the dependency to the build.sbt file in the project root directory:
libraryDependencies += "org.rogach" %% "scallop" % "3.1.1"
The project compiles fine both within the IDE and with sbt compile. I can also run it fine within the IDE.
I created the artifact within the IDE in a standard way. Is there anything particular I need to pay attention to, possibly related to Scallop?
As pointed out by @Andrey, the artifact settings are not automatically updated when the SBT dependencies change. To make sure everything is up to date, the workaround is hence to re-create the artifact after updating the SBT dependencies.
So this issue is not related to the specific dependency (Scallop in this case).
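A side note, not what the asker did: building the fat jar with the sbt-assembly plugin instead of an IDE artifact avoids this desync entirely, because the jar is always derived from the current build.sbt. A minimal sketch (the plugin version is an assumption):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
Running sbt assembly then produces a jar under target/ that includes Scallop and the rest of libraryDependencies.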
Conflicts can happen between the class files of different jars; in the above example, once the conflicting libraries are removed from File | Project Structure | Artifacts | Output Layout, everything runs fine.
In my case I had dependencies on other jars as well, so when I removed all the other libraries, the ClassNotFoundException was gone, but a NoClassDefFoundError appeared for the dependent libraries I had removed.
To get to the exact solution, I was forced to evaluate the jar files one by one and remove only the unwanted libraries.

Running IntelliJ scala project error

I have not been using IntelliJ 15 for long, but I have never had such an issue. When I do New Project -> Scala, everything works fine, but when I do New Project -> SBT, I can't even have a main, because it gives me this:
Exception in thread "main" java.lang.ClassNotFoundException: testing
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
I did try deleting Make from Edit Configurations, and I also tried adding a Scala script in Edit Configurations, but I still have this problem (it says the Scala script couldn't be found, even though I linked it properly). Also, I read this topic:
How to run a Scala script within IntelliJ IDEA?
but haven't found a solution there. Thank you for your suggestions.
Took me some time, but I fixed it; the problem was pretty obvious. It was enough to go to File -> Project Structure -> Modules and add the places where you create your Scala files to Source Folders, or simply create the Scala file in main -> scala.
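For example, a minimal file under src/main/scala matching the "testing" main class from the error above (a sketch):
// src/main/scala/testing.scala
object testing {
  def main(args: Array[String]): Unit =
    println("hello from testing")
}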

Why am I getting this NoClassDefFoundError from my code?

I have been trying to use a Jar file as a library in my code, and it compiles fine. However, at runtime, I keep getting the NoClassDefFoundError message. Why is this happening? I have included the Jar file in the compile path and the runtime path too.
Here is the error message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.pdfbox.cos.COSDocument.<init>(COSDocument.java:51)
at org.apache.pdfbox.pdmodel.PDDocument.<init>(PDDocument.java:136)
at processing.PDFToJPG.main(PDFToJPG.java:58)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
Here is my code:
package processing;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class PDFToJPG {
    public static void main(String[] args) {
        try {
            // Creating an empty document is enough to trigger the error above:
            // COSDocument's initialization needs commons-logging's LogFactory.
            PDDocument doc = new PDDocument();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I am using the NetBeans IDE on Windows 10.
These are my settings for the compile classpath and the runtime classpath: [screenshots not reproduced here]
EDIT: Thank you for your help, it really worked. All I needed to do was download the dependencies' Jar files, not edit the classpath as I had been trying to do.
I think you need other jars besides the one you have already included. Try adding commons-logging 1.4; apparently, there is a dependency between pdfbox-1.8.jar and this jar, as stated on their site.
EDIT: There are more dependencies, fontbox and jempbox, to take into account as well.
EDIT2: I made a zip with all the dependencies needed; you can download it here.
I agree with Aurelien's post: it looks like you are missing Apache Commons Logging, and other runtime dependencies.
You might want to consider creating your project as a Maven project (NetBeans supports Maven pretty well) and then adding pdfbox as a dependency; this should make life a lot easier for you, since Maven will fetch any other required dependencies.
You can get the Maven coordinates for the various PDFBox versions here:
http://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox
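The dependency entry in the pom.xml would look something like this (the version shown is just an example):
<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>1.8.10</version>
</dependency>
Maven should then pull in fontbox, jempbox, and commons-logging transitively.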
If you want to build your final project into a single JAR containing all the deps, or to create a separate 'lib' directory for them, you will have to make some minor changes to the Maven project file (pom.xml).
This Stack Overflow post has an example of doing that.

Running tests in IntelliJ ClassNotFoundException

I have tried many different run configurations, but whatever I do, I get this exception when running specs2 tests in IntelliJ for Scala.
It always fails to find a class that ends with a $ sign. I checked, and there really is no such class file. There's AppControllerIT.class and lots of classes like AppControllerIT$innerFunctionOrClass.class, but no AppControllerIT$.class.
Any ideas?
Thanks!
com.haha.market.api.e2e.controllers.AppControllerIT$
java.lang.ClassNotFoundException: com.haha.market.api.e2e.controllers.AppControllerIT$
STACKTRACE
java.net.URLClassLoader.findClass(URLClassLoader.java:381)
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
org.specs2.reflect.Classes$$anonfun$loadClassEither$1.apply(Classes.scala:140)
org.specs2.reflect.Classes$$anonfun$loadClassEither$1.apply(Classes.scala:140)
org.specs2.control.ActionT$$anonfun$safe$1.apply(ActionT.scala:89)
org.specs2.control.ActionT$$anonfun$reader$1$$anonfun$apply$6.apply(ActionT.scala:80)
org.specs2.control.Status$.safe(Status.scala:100)
Classes with $ signs at the end are generated from compiled Scala objects. This means you may have an object defined similar to this:
package com.haha.market.api.e2e.controllers
object AppControllerIT {
}
From your error, it seems that an older compiled artifact or a library (?) is polluting your classpath. First, try cleaning the project (mvn clean or sbt clean). Next, try cleaning up any libraries you have in your project inside IntelliJ: IntelliJ sometimes caches multiple versions of the same library, which can cause confusion at runtime. To clean those up, go to "File -> Project Structure" and manually delete any duplicated libraries you may have.

Caused by: org.dom4j.DocumentException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory

I have a Gradle project which uses Hibernate > 4. If I run my war file in Apache Tomcat, I don't get any error, but when I deploy it on WildFly 8.2, I get the following exception:
Caused by: org.hibernate.InvalidMappingException: Error while parsing file: /G:/wildfly-8.2.0.Final/bin/content/mywar-1.0.war/WEB-INF/classes/com/mysite/hbm/Role.hbm.xml
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.buildHibernateConfiguration(EntityManagerFactoryBuilderImpl.java:1182) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl$4.perform(EntityManagerFactoryBuilderImpl.java:848) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl$4.perform(EntityManagerFactoryBuilderImpl.java:845) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl.withTccl(ClassLoaderServiceImpl.java:398) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.build(EntityManagerFactoryBuilderImpl.java:844) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
at org.jboss.as.jpa.hibernate4.TwoPhaseBootstrapImpl.build(TwoPhaseBootstrapImpl.java:44) [jipijapa-hibernate4-3-1.0.1.Final.jar:]
at org.jboss.as.jpa.service.PersistenceUnitServiceImpl$1$1.run(PersistenceUnitServiceImpl.java:154) [wildfly-jpa-8.2.0.Final.jar:8.2.0.Final]
... 8 more
Caused by: org.hibernate.InvalidMappingException: Unable to read XML
at org.hibernate.internal.util.xml.MappingReader.legacyReadMappingDocument(MappingReader.java:375) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.internal.util.xml.MappingReader.readMappingDocument(MappingReader.java:304) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.cfg.Configuration.add(Configuration.java:518) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.cfg.Configuration.add(Configuration.java:514) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.cfg.Configuration.add(Configuration.java:688) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.cfg.Configuration.addInputStream(Configuration.java:726) [hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.buildHibernateConfiguration(EntityManagerFactoryBuilderImpl.java:1177) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
... 14 more
Caused by: org.dom4j.DocumentException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory Nested exception: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory
at org.dom4j.io.SAXReader.read(SAXReader.java:484) [dom4j-1.6.1.jar:1.6.1]
I just added an exclude in my Gradle file:
runtime.exclude group: "dom4j"
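(For clarity, that line goes inside the configurations block of build.gradle; a sketch:)
configurations {
    runtime.exclude group: "dom4j"
}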
Now when I run gradle build, dom4j.jar is no longer included in the war file, and I can deploy and run my project successfully on WildFly 8.2 without any error. But the real problem starts here.
One of the features of my project copies a file.xlsm to anotherfile.xlsm, using jars like Apache POI for that purpose. When Apache POI tries to access a method in dom4j.jar during file processing, it results in the following error:
18:40:13,261 ERROR [io.undertow.request] (default task-29) UT005023: Exception handling request to /app/parentPath/myAction: org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.springframework.web.servlet.DispatcherServlet.triggerAfterCompletionWithError(DispatcherServlet.java:1287) [spring-webmvc-4.1.4.RELEASE.jar:4.1.4.RELEASE]
Any ideas how I can permanently keep dom4j.jar on my classpath? I've searched many questions, and most of them suggest removing dom4j from the classpath. I can indeed deploy and run my program successfully by removing it, but then I get the above error during Excel file processing. I've wasted more than a day on this! Is it possible to include dom4j.jar in my classpath?
Update:
I've done a little trick in MANIFEST.MF file.
I've opened
mywar.war > META-INF > MANIFEST.MF
and added
Dependencies: org.dom4j export
at the end of the file and saved it. Now if I deploy my war file, it runs successfully without any error.
Can someone explain where I have to add this kind of property in my src/ tree so that it will be automatically added to MANIFEST.MF by the Gradle build?
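(A sketch of what I'm after, assuming the Gradle war plugin lets me set manifest attributes like this:)
war {
    manifest {
        attributes("Dependencies": "org.dom4j export")
    }
}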
This exception on WildFly usually occurs when you include a Hibernate lib in your war that differs from the one WildFly bundles. Since you are deploying to WildFly, which already includes Hibernate, you can mark your lib as provided in Gradle (the war plugin's providedCompile configuration) and deploy without exporting the dependency.
If you still get the same error, try declaring the Hibernate dependency in the manifest but keep the lib as provided; it should work fine.
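A minimal sketch of the Gradle side, assuming the war plugin (the version matches the stack trace above):
dependencies {
    providedCompile "org.hibernate:hibernate-entitymanager:4.3.7.Final"
}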
I had this same problem and I know how frustrating it can be. I was able to resolve it; see my answer here. Hope this helps.