I am trying to access a file from HDFS in my standalone scala program using apache-spark. I get the following error upon execution.
SIMPLE authentication is not enabled. Available:[TOKEN,KERBEROS]
I found this question that explains that i need to create a keytab file and then make my standalone program use it . I have generated the keytab file . Could somebody tell me how i can use it from my program.
Any help will be greatly appreciated.
ps - I am using Hadoop 2.3.0 and spark 0.9.0
update : this is how my core-site.xml looks now :
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://USHadoop</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
</configuration>
Related
Note that i have deployed statefulsets of 2 namenodes, 2 datanodes and 3 journalnodes for Apache Hadoop 3.3.3 HA on kubernetes.
but namenode is throwing the following error.
$ hdfs --config /opt/hadoop/etc/hadoop namenode
{"name":"org.apache.hadoop.hdfs.server.namenode.NameNode","time":1659593176018,"date":"2022-08-04 06:06:16,018","level":"ERROR","thread":"Listener at 0.0.0.0/8020","message":"Error encountered requiring NN shutdown. Shutting down immediately.","exceptionclass":"java.lang.IllegalArgumentException","stack":["java.lang.IllegalArgumentException: **Does not contain a valid host:port authority: http:**","\tat org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:232)","\tat org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:189)","\tat org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:169)","\tat org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:158)","\tat org.apache.hadoop.hdfs.DFSUtil.substituteForWildcardAddress(DFSUtil.java:1046)","\tat org.apache.hadoop.hdfs.DFSUtil.getInfoServerWithDefaultHost(DFSUtil.java:1014)","\tat org.apache.hadoop.hdfs.server.namenode.ha.RemoteNameNodeInfo.getRemoteNameNodes(RemoteNameNodeInfo.java:61)","\tat org.apache.hadoop.hdfs.server.namenode.ha.RemoteNameNodeInfo.getRemoteNameNodes(RemoteNameNodeInfo.java:42)","\tat org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.<init>(EditLogTailer.java:191)","\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startStandbyServices(FSNamesystem.java:1501)","\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startStandbyServices(NameNode.java:2051)","\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.enterState(StandbyState.java:69)","\tat org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1024)","\tat org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)","\tat org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)","\tat org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)"]}
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://apache-hadoop-namenode:8020</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zk-headless.backend.svc.cluster.local:2181</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/dfs/journal</value>
</property>
hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>apache-hadoop-namenode</value>
</property>
<property>
<name>dfs.ha.namenodes.apache-hadoop-namenode</name>
<value>apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local,apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local</value>
</property>
<property>
<name>dfs.namenode.rpc-address.apache-hadoop-namenode.apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local</name>
<value>hdfs://apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.apache-hadoop-namenode.apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local</name>
<value>hdfs://apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.apache-hadoop-namenode.apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local</name>
<value>http://apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.apache-hadoop-namenode.apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local</name>
<value>http://apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://apache-hadoop-journalnode.backend.svc.cluster.local:8485/apache-hadoop-namenode</value>
</property>
The solution for this is to remove the http in the following property from hdfs-site.xml
<property>
<name>dfs.namenode.http-address.apache-hadoop-namenode.apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local</name>
<value>apache-hadoop-namenode-0.apache-hadoop-namenode.backend.svc.cluster.local:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.apache-hadoop-namenode.apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local</name>
<value>apache-hadoop-namenode-1.apache-hadoop-namenode.backend.svc.cluster.local:9870</value>
</property>
this http address property is required as metioned it the https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#:~:text=dfs.namenode.http%2Daddress.%5Bnameservice%20ID%5D.%5Bname%20node%20ID%5D%20%2D%20the%20fully%2Dqualified%20HTTP%20address%20for%20each%20NameNode%20to%20listen%20on
but for my case it worked after removing the http:// out of this property.
I have 3 nodes as master ,slave1,slave2 and trying to install HBASE in above cluster.
I have started HABSE in master node , but i see daemons are not running in slave nodes , do i need to issue start habse command in slave nodes as well ?
Could some body please help.
hbase-env.sh content:
export JAVA_HOME=/opt/jdk1.8.0_151/
regionservers content:
slave1
slave2
master
hbase-site.xml content:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://1XX.1YY.1ZZ.1WW:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zk_data</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
</configuration>
Bashrc was pointing to another version of installed HBASE . problem solved once pointing to correct version in bashrc file. Also to clarify we have to issue start-hbase command in master node only.
I'm trying to execute the spark oozie example on the oozie_spark branch against a BigInsights for Apache Hadoop basic cluster.
The workflow.xml looks like this:
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkWordCount'>
<start to='spark-node' />
<action name='spark-node'>
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>${master}</master>
<name>Spark-Wordcount</name>
<class>org.apache.spark.examples.WordCount</class>
<jar>${hdfsSparkAssyJar},${hdfsWordCountJar}</jar>
<spark-opts>--conf spark.driver.extraJavaOptions=-Diop.version=4.2.0.0</spark-opts>
<arg>${inputDir}/FILE</arg>
<arg>${outputDir}</arg>
</spark>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name='end' />
</workflow-app>
The configuration.xml:
<configuration>
<property>
<name>master</name>
<value>local</value>
</property>
<property>
<name>queueName</name>
<value>default</value>
</property>
<property>
<name>user.name</name>
<value>default</value>
</property>
<property>
<name>nameNode</name>
<value>default</value>
</property>
<property>
<name>jobTracker</name>
<value>default</value>
</property>
<property>
<name>jobDir</name>
<value>/user/snowch/test</value>
</property>
<property>
<name>inputDir</name>
<value>/user/snowch/test/input</value>
</property>
<property>
<name>outputDir</name>
<value>/user/snowch/test/output</value>
</property>
<property>
<name>hdfsWordCountJar</name>
<value>/user/snowch/test/lib/OozieWorkflowSparkGroovy.jar</value>
</property>
<property>
<name>oozie.wf.application.path</name>
<value>/user/snowch/test</value>
</property>
<property>
<name>hdfsSparkAssyJar</name>
<value>/iop/apps/4.2.0.0/spark/jars/spark-assembly.jar</value>
</property>
</configuration>
However, the error I see in the Yarn logs is:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 13 more
I've looked for SparkMain in spark-assembly:
$ hdfs dfs -get /iop/apps/4.2.0.0/spark/jars/spark-assembly.jar
$ jar tf spark-assembly.jar | grep -i SparkMain
And here:
$ jar tf /usr/iop/4.2.0.0/spark/lib/spark-examples-1.6.1_IBM_4-hadoop2.7.2-IBM-12.jar | grep SparkMain
I've seen another question similar to this one, but this question is specifically about BigInsights on cloud.
The issue was resolved with:
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
I should have RTFM properly.
When I start Hive metastore service, my command line says: "Starting Hive Metastore Server" and nothing further. It doesn't actually start server, neither does it throws any error messages
Hive : 1.2.1
Hadoop : 2.7.1
Postgres: 9.3.8
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>*****</value>
</property>
<property>
<name>org.jpox.autoCreateSchema</name>
<value>true</value>
</property>
</configuration>
[metastore is actual database created in PostgresSQL, and I can access it using: psql -U hiveuser -d metastore]
Please set the following property. especially for PostgreSQL . For more details refer here
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
The property is replaced since Hive 2.0 by property datanucleus.schema.autoCreateAll and others as explained in this apache cwiki page.
Please check other specific configurations in the same page.
I get the following error when I run my MR job in Eclipse.
2014-07-10 14:07:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
job done
2014-07-10 14:07:30 INFO JvmMetrics:76 - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-07-10 14:07:30 WARN JobSubmitter:149 - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2014-07-10 14:07:30 INFO JobSubmitter:439 - Cleaning up the staging area file:/Users/name/tmp/mapred/staging/sridhar519992773/.staging/job_local519992773_0001
2014-07-10 14:07:30 ERROR UserGroupInformation:1494 - PriviledgedActionException as:sridhar (auth:SIMPLE) cause:org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/name/tmp/mapred/staging/sridhar519992773/.staging/job_local519992773_0001: No such file or directory
2014-07-10 14:07:30 ERROR App:43 - Error running MapReduce Job
org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/name/tmp/mapred/staging/sridhar519992773/.staging/job_local519992773_0001: No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:596)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:178)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at hadoopstandalone.standalone.App.doMapReduce(App.java:40)
at hadoopstandalone.standalone.App.main(App.java:27)
Here is my core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/sridhar/Desktop/tmp</value>
</property>
</configuration>
Here is my mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>local</value>
</property>
</configuration>
Your core-site.xml should look something like `
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.temp.dir</name>
<value>/home/myname/hdfs/temp</value>
</property>
</configuration>
and you hdfs-site.xml should look something like this
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/myname/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/myname/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Let me know if it works