I am using Cloudera 4.2.0 and Spark.
I just want to try out some examples given by Spark.
// HdfsTest.scala
package spark.examples

import spark._

object HdfsTest {
  def main(args: Array[String]) {
    val sc = new SparkContext(args(0), "HdfsTest",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
    val file = sc.textFile("hdfs://n1.example.com/user/cloudera/data/navi_test.csv")
    val mapped = file.map(s => s.length).cache()
    for (iter <- 1 to 10) {
      val start = System.currentTimeMillis()
      for (x <- mapped) { x + 2 }
      // println("Processing: " + x)
      val end = System.currentTimeMillis()
      println("Iteration " + iter + " took " + (end - start) + " ms")
    }
    System.exit(0)
  }
}
It compiles fine, but I always run into runtime problems:
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2229)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2240)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2296)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
at spark.SparkContext.hadoopFile(SparkContext.scala:263)
at spark.SparkContext.textFile(SparkContext.scala:235)
at spark.examples.HdfsTest$.main(HdfsTest.scala:9)
at spark.examples.HdfsTest.main(HdfsTest.scala)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
at org.apache.hadoop.hdfs.HftpFileSystem.<clinit>(HftpFileSystem.java:84)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at java.lang.Class.newInstance0(Class.java:374)
at java.lang.Class.newInstance(Class.java:327)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 16 more
I have searched on Google but found nothing about this kind of exception with Spark and HDFS.
The problem occurs at this line:
val file = sc.textFile("hdfs://n1.example.com/user/cloudera/data/navi_test.csv")
13/04/04 12:20:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I also get this warning. Maybe I should add some Hadoop paths to the CLASSPATH.
Feel free to give any clue. =)
Thank you all.
REN Hao
(This question was also asked / answered on the spark-users mailing list).
You need to compile Spark against the particular version of Hadoop/HDFS running on your cluster. From the Spark documentation:
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the HDFS protocol has changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs. You can change the version by setting the HADOOP_VERSION variable at the top of project/SparkBuild.scala, then rebuilding Spark (sbt/sbt clean compile).
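As a hedged illustration (not part of the quoted documentation), the edit it describes might look roughly like the excerpt below; the exact variable name, format, and the right Cloudera artifact string vary between Spark releases, so check the comments at the top of the file in your own checkout:

// project/SparkBuild.scala (excerpt) -- hypothetical value for a CDH 4.2.0 cluster;
// match it to the Hadoop artifact version your cluster actually runs.
val HADOOP_VERSION = "2.0.0-cdh4.2.0"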
The spark-users mailing list archives contain several questions about compiling against specific Hadoop versions, so I would search there if you run into any problems when building Spark.
You can set Cloudera's Hadoop version with an environment variable when building Spark. Look up your exact artifact version in Cloudera's Maven repo; it should be this:
SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 sbt/sbt assembly publish-local
Make sure you run your application with the same Java runtime you used to build Spark. Also, there are pre-built Spark packages for different Cloudera Hadoop distributions, such as http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
This might be a problem related to the Java installed on your system. Hadoop requires (Sun) Java 1.6+.
Make sure you have:
JAVA_HOME="/usr/lib/jvm/java-6-sun"
Related
Spark version: 2.4.0
HBase version: 1.2.5
Scala version: 2.11.8
Everything is set up in a local VM.
Packages imported:
spark-shell --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11,org.apache.hadoop:hadoop-common:2.7.3,org.apache.hbase:hbase-common:1.2.5,org.apache.hbase:hbase-client:1.2.5,org.apache.hbase:hbase-protocol:1.2.5,org.apache.hbase:hbase-hadoop2-compat:1.2.5,org.apache.hbase:hbase-server:1.2.5 --repositories http://repo.hortonworks.com/content/groups/public/
In the HBase shell:
Creating the table:
create 'cardata','software','hardware','other'
Inserting data into the table:
put 'cardata','v001_H','hardware:alloy_wheels','yes'
put 'cardata','v001_H','hardware:anti_Lock_break','yes'
put 'cardata','v001_H','software:electronic_breakforce_distribution','yes'
put 'cardata','v001_H','software:terrain_mode','yes'
put 'cardata','v001_H','software:traction_control','yes'
put 'cardata','v001_H','software:stability_control','yes'
put 'cardata','v001_H','software:cruize_control','yes'
put 'cardata','v001_H','other:make','hyundai'
put 'cardata','v001_H','other:model','i10'
put 'cardata','v001_H','other:variant','sportz'
In the REPL:
import org.apache.spark.sql.execution.datasources.hbase._
import spark.implicits._

def carCatalog = s"""{
  "table":{"namespace":"default", "name":"cardata"},
  "rowkey":"key",
  "columns":{
    "alloy_wheels":{"cf":"hardware", "col":"alloy_wheels", "type":"string"},
    "anti_Lock_break":{"cf":"hardware", "col":"anti_Lock_break", "type":"string"},
    "electronic_breakforce_distribution":{"cf":"software", "col":"electronic_breakforce_distribution", "type":"string"},
    "terrain_mode":{"cf":"software", "col":"terrain_mode", "type":"string"},
    "traction_control":{"cf":"software", "col":"traction_control", "type":"string"}
  }
}""".stripMargin

val hbaseDF = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> carCatalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
Error
java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:257)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:80)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
... 53 elided
NoSuchMethodError usually means the library version found at runtime is not the one expected (i.e., the one used at compile time).
It might be introduced by your platform adding default libraries on your behalf (with the spark-submit command) that conflict with the libraries you're using while developing.
A common solution is called shading and can be defined when creating your assembly jar (in build.sbt for instance).
You can refer to this link : https://github.com/sbt/sbt-assembly#shading
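As a hedged illustration (not part of the original answer), a shade rule in build.sbt with the sbt-assembly plugin might look like the sketch below; the json4s package pattern is an assumption based on the NoSuchMethodError above, so adjust it to whichever dependency actually clashes in your build:

// build.sbt -- sketch of a shade rule (requires the sbt-assembly plugin in project/plugins.sbt)
assemblyShadeRules in assembly := Seq(
  // Rename the json4s classes bundled into the fat jar so they cannot clash
  // with the json4s version that Spark itself provides at runtime.
  ShadeRule.rename("org.json4s.**" -> "shaded.org.json4s.@1").inAll
)

One way to apply this is to build an assembly jar from a small project and submit it with spark-submit, rather than pulling everything into spark-shell through --packages.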
I am trying to create a test for some Spark code. The following code fails when getting a SparkSession object. NOTE: The test runs fine when run from the CLI: gradle my_module:build
@Test
def myTest(): Unit = {
  val spark = SparkSession.builder().master("local[2]").getOrCreate()
  ...
}
Error:
java.lang.IllegalArgumentException: Can't get Kerberos realm
...
Caused by: java.lang.reflect.InvocationTargetException
...
Caused by: KrbException: Cannot locate default realm
My set-up: IntelliJ + Gradle + Mac OS
Questions:
How do I run a Spark Test from within IntelliJ?
Why is Spark looking for Kerberos at all when running 'local'?
From your code, it looks like you need to run Spark from JUnit, not specifically from IntelliJ; you can try something like https://github.com/sleberknight/sparkjava-testing
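As a hedged sketch (my own addition, not part of the original answer), a self-contained JUnit 4 test that builds a purely local SparkSession could look like this; it assumes spark-sql and JUnit are on the test classpath and that no Kerberos-enabled Hadoop configuration is picked up from the environment (e.g. via HADOOP_CONF_DIR):

import org.apache.spark.sql.SparkSession
import org.junit.Test

class LocalSparkSessionTest {

  @Test
  def createsLocalSessionAndCounts(): Unit = {
    // A purely local session: no YARN, HDFS or Kerberos should be involved.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("local-test")
      .getOrCreate()
    try {
      assert(spark.range(100).count() == 100)
    } finally {
      spark.stop()
    }
  }
}

If the Kerberos error persists even with a local master, it suggests that a Hadoop or krb5 configuration from the machine is leaking into the test JVM, which is worth checking in the IntelliJ run configuration.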
Spark 1.6.2 (YARN master)
Package name: com.example.spark.Main
Basic SparkSQL code
val conf = new SparkConf()
conf.setAppName("SparkSQL w/ Hive")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._
// val rdd = <some RDD making>
val df = rdd.toDF()
df.write.saveAsTable("example")
And stacktrace...
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at java.awt.Frame.<init>(Frame.java:385)
at com.trend.iwss.jscan.runtime.BaseDialog.getActiveFrame(BaseDialog.java:75)
at com.trend.iwss.jscan.runtime.AllowDialog.make(AllowDialog.java:32)
at com.trend.iwss.jscan.runtime.PolicyRuntime.showAllowDialog(PolicyRuntime.java:325)
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopActionInner(PolicyRuntime.java:240)
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopAction(PolicyRuntime.java:172)
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopAction(PolicyRuntime.java:165)
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime.checkURL(NetworkPolicyRuntime.java:284)
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime._preFilter(NetworkPolicyRuntime.java:164)
at com.trend.iwss.jscan.runtime.PolicyRuntime.preFilter(PolicyRuntime.java:132)
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime.preFilter(NetworkPolicyRuntime.java:108)
at org.apache.commons.logging.LogFactory$5.run(LogFactory.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.commons.logging.LogFactory.getProperties(LogFactory.java:1376)
at org.apache.commons.logging.LogFactory.getConfigurationFile(LogFactory.java:1412)
at org.apache.commons.logging.LogFactory.getFactory(LogFactory.java:455)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.<clinit>(HadoopShimsSecure.java:60)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:141)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.spark.sql.hive.client.ClientWrapper.overrideHadoopShims(ClientWrapper.scala:116)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:69)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:345)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:233)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:236)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.example.spark.Main1$.main(Main.scala:52)
at com.example.spark.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/2.5.3.0-37/0/ivysettings.xml will be used
This same code was working a week ago on a fresh HDP cluster, and it works fine in the sandbox... the only thing I remember doing was trying to change around the JAVA_HOME variable, but I am fairly sure I undid those changes.
I'm at a loss - not sure how to start tracking down the issue.
The cluster is headless, so of course it has no X11 display, but what part of new HiveContext even needs to pop up a JFrame?
Based on the logs, I'd say it's a Java configuration issue I messed up: something within org.apache.hadoop.hive.shims.HadoopShimsSecure.<clinit>(HadoopShimsSecure.java:60) got triggered, and therefore a Java security dialog is appearing, but I don't know.
I can't do X11 forwarding, and I tried export SPARK_OPTS="-Djava.awt.headless=true" before a spark-submit, but that didn't help.
I tried these, but again, I can't forward and don't have a display:
Getting a HeadlessException: No X11 DISPLAY variable was set
"No X11 DISPLAY variable" - what does it mean?
The error seems to be reproducible on two of the Spark clients.
Only on one machine did I try changing JAVA_HOME.
Did an Ambari Hive service-check. Didn't fix it.
Can connect fine to Hive database via Hive/Beeline CLI
As far as the Spark code is concerned, this seemed to mitigate the error:
val conf = new SparkConf()
conf.set("spark.executor.extraJavaOptions", "-Djava.awt.headless=true")
Original answer
Found this post. Spring 3.0.5 - java.awt.HeadlessException - com.trend.iwss.jscan
Basically, Trend Micro is inserting a com.trend.iwss.jscan package into the JAR files that are downloaded via Maven through a company firewall, and I have no control over that.
(link not working) http://esupport.trendmicro.com/Pages/IWSx-3x-Some-files-and-folders-are-added-to-the-Jar-files-after-passin.aspx
Wayback Machine to the rescue...
If anyone else has input, I would also like to hear it.
Problem:
When downloading some .JAR files via IWSA, a directory filled with .class files, which is not related to what is being downloaded, is added to the jar file (com\trend\iwss\jscan\runtime\).
Solution:
This happens because, if a JAR file is originally unsigned, IWSA will insert some code into the applet to monitor and restrict potentially harmful actions.
For IWSS/IWSA, every "get" request looks the same, so it cannot tell whether you are downloading an archive or an applet that will be executed by your browser.
This code is added for security reasons, to monitor the behavior of the "possible" applet and make sure it does not do any harm to the machine and its environment.
To prevent this issue, please follow these steps:
Log on to the IWSS web console.
Go to HTTP > Applets and ActiveX > Policies > Java Applet Security Rules.
Under Java Applet Security, change the value of "No signature" to either "Pass" or "Block", depending on what you want to do with the unsigned .JAR files.
Click Save.
I am trying to connect from a Hive database to a collection in MongoDB using a driver (jars) provided on the wiki site. Here are the steps I took:
I created a collection in MongoDB called "Diamond" under a database called "Moe" and it has got 20 documents:
I wanted to connect from Hive via the Hadoop MongoDB Driver and view these documents via Hive.
I have both MongoDB and Hive installed and configured on the same server. However, I don't see any variable called HIVE_CLASSPATH; I wonder where that is.
So I installed these 3 driver jars on the server:
mongo-hadoop-core-1.5.2.jar;
mongo-hadoop-hive-1.5.2.jar;
mongo-java-driver-3.0.0.jar;
Now I connect to Hive and add these 3 jars to my classpath with the following commands (they get added successfully):
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-hadoop-hive-1.5.2.jar;
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-hadoop-core-1.5.2.jar;
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-java-driver-3.0.0.jar;
Now I create a table in Hive:
CREATE TABLE Diamond
(
carat DOUBLE,
cut STRING,
color STRING,
clarity STRING,
depth DOUBLE,
table DOUBLE,
price DOUBLE,
xcord DOUBLE,
ycord DOUBLE,
zcord DOUBLE
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"carat":"carat","cut":"cut",
"color":"color", "clarity":"clarity", "depth":"depth", "table":"table",
"price":"price", "xcord":"x", "ycord":"y", "zcord":"z"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/Moe.Diamond');
However, when I execute the above command in Hive, I get the error below:
java.lang.NoClassDefFoundError: com/mongodb/util/JSON
at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:110)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:210)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:268)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:261)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:587)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:573)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3784)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:256)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:155)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: com.mongodb.util.JSON
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 23 more
FAILED: Execution Error, return code -101 from
org.apache.hadoop.hive.ql.exec.DDLTask
I have tried the following:
- placing the jars in every possible directory, with no effect
- the class that is supposed to be missing is very much present in the jar file
- and yes, the MongoStorageHandler class is also in the jar
I am done breaking my head over this! If anyone can shed some light on what I could do to alleviate my anxiety, it would be great.
Thanks again.
Mario
I identified what the issue was. To connect from Hive to MongoDB, the MongoDB driver invokes a Java class in a Hive jar library:
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.PreExecutePrinter
This class is supposed to be found in the jar file hive-exec-0.11.0.1.3.2.0-111.jar. However, it is available only in more recent versions of Hive, not in older ones.
It is not present in 0.11.0.1.3.2.0-111 but is present in 0.13.0.2.1.7.0-784.
The solution here was to connect to a version of Hive that is supported by the driver. MongoDB does state that its driver supports a certain version of Hadoop, but doesn't drill down to the individual applications (Hive/Sqoop).
New to Stack Exchange and Giraph, so please overlook mistakes and ask any clarifying questions.
OS: Ubuntu 13.10
Hadoop/YARN: hadoop-2.2.0 (2-node cluster)
Giraph: 1.0.0 (EDIT: trunk)
I'm getting a NullPointerException (NPE) when I attempt to run the following example:
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/hduser/rrdata/tiny_graph.txt \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/hduser/rrdata/output/tiny_graph.out \
    -w 1
Stack Trace:
Exception in thread "main" java.lang.NullPointerException
at org.apache.giraph.yarn.GiraphYarnClient.checkJobLocalZooKeeperSupported(GiraphYarnClient.java:460)
at org.apache.giraph.yarn.GiraphYarnClient.run(GiraphYarnClient.java:116)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:96)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:126)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
It seems ZooKeeper related. I installed ZooKeeper, but since I have never used it before, it seems like the configs are wrong. I've tried -Dgiraph.zkList=hostname:port and related options but get an 'Unrecognized option' exception.
I'm looking for the correct ZooKeeper settings for this scenario. I'll post a reply if I figure it out.
This is an example of how you can specify -D flags:
hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -D giraph.zkList="zkNode.net:2081" org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/rav/giraph/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/rav/giraph/output/shortestpaths -w 1
By the way, a local ZooKeeper is not supported in Giraph yet (GiraphYarnClient):
/**
 * Check if the job's configuration is for a local run. These can all be
 * removed as we expand the functionality of the "pure YARN" Giraph profile.
 */
private void checkJobLocalZooKeeperSupported() {
  final boolean isZkExternal = giraphConf.isZookeeperExternal();
  final String checkZkList = giraphConf.getZookeeperList();
  if (!isZkExternal || checkZkList.isEmpty()) {
    throw new IllegalArgumentException("Giraph on YARN does not currently" +
      "support Giraph-managed ZK instances: use a standalone ZooKeeper.");
  }
}
Unfortunately, checkZkList is null, so you will never see this exception :)
The reason for the NPE is probably the lack of a giraphConf to check for ZK settings. I think this is due to earlier problems in the run. It looks like the examples jar was not supplied using a -yj argument. The jar you run with "hadoop jar" is typically giraph-core itself.
Good luck, please post on the Giraph user list if you have more trouble.