How to add classpath to an Oozie Java action with sharelib and external jars

I have a Java application which needs the Hadoop, HDFS, Hive and Spark libraries, as well as some external libraries.
I've read this page but I'm still confused about the order in which the sharelib is overridden.
In the job configuration I have:
oozie.use.system.libpath=false
oozie.action.sharelib.for.java=spark,hive2,hive
I also put the external jars under the /lib directory of the workspace.
Now I've hit this problem: in my jar I use a class from json4s-native, so I put that jar in the myworkspace/lib path, but oozie/share/lib/spark also contains the json4s-jackson library. After running the Java action, it throws:
Launcher exception: java.lang.NoClassDefFoundError: org/json4s/native/JsonMethods$
How can I get Oozie to use the libraries in my /lib path first?
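One commonly suggested approach for this situation is to tell the launcher to put the user-supplied jars first. A minimal sketch of the job properties (oozie.launcher.mapreduce.job.user.classpath.first is the standard Hadoop property mapreduce.job.user.classpath.first forwarded to the launcher job; verify it against your Oozie/Hadoop versions):
# job.properties (sketch)
oozie.use.system.libpath=false
oozie.action.sharelib.for.java=spark,hive2,hive
# ask the launcher to prefer jars from the workflow's lib/ over the sharelib
oozie.launcher.mapreduce.job.user.classpath.first=true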

Related

How to add external jar files to a spark scala project

I am trying to use an LSH implementation in Scala (https://github.com/marufaytekin/lsh-spark) in my Spark project. I cloned the repository with some changes to the sbt file (added the organisation).
To use this implementation, I compiled it using sbt compile, moved the jar file to the "lib" folder of my project, and updated my project's sbt configuration file, which looks like this:
Now when I try to compile my project using sbt compile, it fails to load the external jar file, showing the error message "unresolved dependency: com.lendap.spark.lsh.LSH#lsh-scala_2.10;0.0.1-SNAPSHOT: not found".
Am I following the right steps for adding an external jar file?
How do I solve the dependency issue?
As an alternative, you can build the lsh-spark project and add the jar to your Spark application.
To add the external jars, the addJar option can be used when executing the Spark application. Refer to "Running Spark applications on YARN".
This issue isn't related to Spark but to the sbt configuration.
Make sure you followed the correct folder structure imposed by sbt and added your jar to the lib folder, as explained here - the lib folder should be at the same level as build.sbt (cf. this post).
You might also want to check out this SO post.
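For reference, a sketch of the layout sbt expects for unmanaged jars (the project name here is hypothetical; the jar name is taken from the error message above):
myproject/
    build.sbt
    lib/
        lsh-scala_2.10-0.0.1-SNAPSHOT.jar
    src/
        main/
            scala/
Note that a jar placed in lib/ is an unmanaged dependency and must not also be declared in libraryDependencies in build.sbt; if it is, sbt will try to resolve it from a repository and fail with exactly the "unresolved dependency" error quoted above.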

Eclipse: confusing add to Build Path options

I'm not a "real" developer, but I should at least be able to write some code and add some jars to the Eclipse build path without spending hours trying to figure out whether the jars are actually on the build path.
My problem (error below) was resolved in the question "NoClassDefFoundError, cannot run MapReduceColorCount (Avro 1.7.7)" by adding the correct jars.
[cloudera@localhost ~]$ hadoop jar avroColorCount.jar exos.MapReduceColorCount2 inavro01 outavro01
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/mapreduce/AvroKeyInputFormat
at exos.MapReduceColorCount2.run(MapReduceColorCount2.java:71)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at exos.MapReduceColorCount2.main(MapReduceColorCount2.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The following are the different ways I've tried to add Jars to the build path:
1. Maven: adding dependencies through the POM file; they appear afterwards under "Maven Dependencies".
2. "Configure Build Path": the jars are actually located in my local file system, so I add the (library) folders, and they appear under "Referenced Libraries".
3. Create a "lib" folder in the project folder, copy/paste the Jars (located in my local file system), do a project Refresh (the lib folder appears in the Package Explorer), select all Jars and right-click "Add to Build Path"
I confirm that my code shows no warnings/errors with any of these methods. I usually do an "Export ..." of the jar file in order to execute it.
Example: I've tried adding to the build path external jars from Cloudera's CDH5 (Hadoop 2.3.0-cdh5.1.2 and Avro 1.7.5-cdh5.1.2) which are located locally in /opt/lib.
The only method that really worked was method 3. Why doesn't it work with methods 1 or 2?
Thank you in advance for your support
I could not reproduce the success with method 3; I received a "cannot cast to namespace.customClass" error instead of the "NoClassDefFoundError".
I found a workaround for the latter error, based on exporting two variables:
export LIBJARS=avrojar1,avrojar2,jar3
export HADOOP_CLASSPATH=avrojar1:avrojar2:jar3
and then running the hadoop jar command with -libjars ${LIBJARS}.
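For example, a sketch using the jar and class from the error above (the exact Avro jar file names under /opt/lib are assumptions based on the versions mentioned; note that -libjars must come after the main class name, and only takes effect because the class runs through ToolRunner):
export LIBJARS=/opt/lib/avro-1.7.5-cdh5.1.2.jar,/opt/lib/avro-mapred-1.7.5-cdh5.1.2.jar
export HADOOP_CLASSPATH=/opt/lib/avro-1.7.5-cdh5.1.2.jar:/opt/lib/avro-mapred-1.7.5-cdh5.1.2.jar
hadoop jar avroColorCount.jar exos.MapReduceColorCount2 -libjars ${LIBJARS} inavro01 outavro01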
This was tested with method 1. and method 3. respectively.
In conclusion, my case was specific to the Avro-related jars only.
Thanks

create a new ontology with Jena

I'm trying to use Jena. To create a new ontology, my code is:
String SOURCE = "http://www.w3.org/2002/07/owl"; // no trailing "#" here, otherwise NS below would end in "##"
String NS = SOURCE + "#";
OntModel ontology = ModelFactory.createOntologyModel();
ontology.read( SOURCE, "OWL/XML" );
But it gives me this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at com.hp.hpl.jena.util.Metadata.<clinit>(Metadata.java:26)
at com.hp.hpl.jena.JenaRuntime.<clinit>(JenaRuntime.java:25)
at com.hp.hpl.jena.rdf.model.impl.RDFReaderFImpl.<clinit>(RDFReaderFImpl.java:85)
at com.hp.hpl.jena.rdf.model.impl.ModelCom.<clinit>(ModelCom.java:42)
at com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel(ModelFactory.java:122)
at com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel(ModelFactory.java:116)
at com.hp.hpl.jena.vocabulary.OWL.<clinit>(OWL.java:37)
at com.hp.hpl.jena.ontology.ProfileRegistry.<clinit>(ProfileRegistry.java:48)
at com.hp.hpl.jena.ontology.OntModelSpec.<clinit>(OntModelSpec.java:54)
What's the problem? I couldn't find any solution for it.
If you use a Jena distribution, all the jars needed are in the lib/ directory. You need them all on the classpath.
On Windows / cygwin:
javac -cp '<install dir>\lib\*;' MyClass.java
On Linux:
javac -cp '<install dir>/lib/*' MyClass.java
To run, the created .class needs to be in your path, too:
java -cp '.:<install dir>/lib/*' MyClass
If you use maven to get Jena, the dependencies are automatically pulled in.
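For example, a sketch of the dependency for a Jena 2.x setup matching the com.hp.hpl.jena packages in the stack trace (apache-jena-libs is a POM artifact that pulls in all the Jena jars and their dependencies, including slf4j; adjust the version to your setup):
<dependency>
  <groupId>org.apache.jena</groupId>
  <artifactId>apache-jena-libs</artifactId>
  <type>pom</type>
  <version>2.12.1</version>
</dependency>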
Your Java classpath is missing one of the jar files required by Jena. Looks like it's one of the slf4j jars. You need to have all the jar files that come with Jena on the classpath. How to set the classpath depends on your OS and/or IDE, but Google can help.

Jar works with standalone Hadoop, but not on the actual cluster (java.lang.ClassNotFoundException: org.jfree.data.xy.XYDataset)

I am trying to build my project using Eclipse on Windows and execute it on a Linux cluster. The project depends on some external jars, which I included using Eclipse's "Export -> Runnable JAR -> Package required libraries into jar" build option. I checked that the jar contains the classes within a folder structure and that the external jars are in the root folder.
On standalone Hadoop, on Cygwin and on Linux this works fine, but on an actual Hadoop Linux cluster it fails when it tries to access a class from the first external jar, throwing a ClassNotFoundException.
Is there a way to force Hadoop to search the jar? I thought this would work.
10/07/16 11:44:59 INFO mapred.JobClient: Task Id : attempt_201007161003_0005_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.jfree.data.xy.XYDataset
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at org.akintayo.analysis.ecg.preprocess.ReadPlotECG.plotECG(ReadPlotECG.java:27)
at org.akintayo.analysis.ecg.preprocess.BuildECGImages.writeECGImages(BuildECGImages.java:216)
at org.akintayo.analysis.ecg.preprocess.BuildECGImages.converSingleECGToImage(BuildECGImages.java:305)
at org.akintayo.analysis.ecg.preprocess.BuildECGImages.main(BuildECGImages.java:457)
at org.akintayo.hadoop.HadoopECGPreprocessByFile$MapTest.map(HadoopECGPreprocessByFile.java:208)
at org.akintayo.hadoop.HadoopECGPreprocessByFile$MapTest.map(HadoopECGPreprocessByFile.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Java cannot load jars that are nested inside another jar (the standard classloaders can't handle this).
So you either have to install those packages separately on each machine in the cluster or, if that is not possible, add the jars at run time. To do the latter, add the -libjars mylib.jar option when running your job, e.g. hadoop jar myjar.jar -libjars mylib.jar, and this should work.
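A sketch using the main class from the stack trace above (the jar and library paths are hypothetical; org.jfree.data.xy.XYDataset lives in the JFreeChart jar, and -libjars is parsed by GenericOptionsParser, so it only takes effect if the main class runs through ToolRunner):
hadoop jar myjar.jar org.akintayo.hadoop.HadoopECGPreprocessByFile \
    -libjars /path/to/jfreechart.jar,/path/to/jcommon.jar \
    input_dir output_dir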
Wojtek's answer is correct. Using -libjars will put your external jars in the distributed cache and make them available to all of your Hadoop nodes.
However, if your external jars are not changing frequently, you may find it more convenient to copy the jar files to each node's hadoop/lib directory manually. Once you restart Hadoop, your external jars will be added to the classpath of your jobs.
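For instance (a sketch; the jar name is hypothetical, $HADOOP_HOME depends on your installation, and the copy has to be repeated on every node):
cp /path/to/jfreechart.jar $HADOOP_HOME/lib/
# then restart the Hadoop daemons so they pick up the new classpath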

Eclipse VM Argument and external JAR file error

I just added "-Djava.library.path=" to the "VM Arguments" under Run Configuration in Eclipse, and everything worked fine until I tried to add an external JAR file. Now I get the following error:
java.lang.UnsatisfiedLinkError: no rxtxSerial in java.library.path thrown while loading gnu.io.RXTXCommDriver
Exception in thread "main" java.lang.UnsatisfiedLinkError: no rxtxSerial in java.library.path
Am I not setting something properly in Eclipse?
If you're interested, I forked RXTXserial a while back since their update "schedule" sucks. I have just ported it over to the Android platform too. We decided to move the native libs into the jar and use reflection to deploy them. The API is the same as RXTX, but everything just works. You can find jars and full project sources at:
http://code.google.com/p/nrjavaserial/
The exception indicates that the class gnu.io.RXTXCommDriver tries to load a native library, which would be named rxtxSerial.dll on Windows and rxtxSerial.so on Linux, and the JVM cannot find it in the directories listed in java.library.path. Have you tried to add a JAR containing the library to java.library.path? I don't think that's possible; it has to be a directory containing the extracted library file.
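For example (a sketch; the directory is hypothetical and must contain the extracted rxtxSerial.dll or rxtxSerial.so, not a jar):
-Djava.library.path=/path/to/rxtx/native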
Apparently that external library has a dependency on another class, gnu.io.RXTXCommDriver. Perhaps you will need to add that library to the classpath.