GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) while connecting Polybase with Kerberos - kerberos

We want to connect our SQL Server 2016 Enterprise via PolyBase to our Kerberized on-premises Hadoop cluster running Cloudera 5.14.
I followed the Microsoft PolyBase guide to configure PolyBase. After working on this for a few days, I cannot get any further because of this exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Microsoft has a built-in diagnostic tool for troubleshooting PolyBase connectivity with Kerberos. The Microsoft troubleshooting guide describes 4 checkpoints, and I'm stuck on checkpoint 4.
A short summary of the checkpoints and where I succeed:
Checkpoint 1: Successful! Authenticated against the KDC and received a TGT.
Checkpoint 2: Successful! According to the troubleshooting guide, PolyBase makes an attempt to access HDFS and fails because the request did not contain the necessary Service Ticket.
Checkpoint 3: Successful! A second hex dump indicates that SQL Server successfully used the TGT and acquired the applicable Service Ticket for the name node's SPN from the KDC.
Checkpoint 4: Not successful. At this checkpoint, Hadoop should authenticate SQL Server using the ST (Service Ticket) and grant a session to access the secured resource.
krb5.conf file
[libdefaults]
default_realm = COMPANY.REALM.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
admin_server = ipadress.kdc.host
}
[logging]
default = FILE:/var/log/krb5/kdc.log
kdc = FILE:/var/log/krb5/kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log
core-site.xml for Polybase on SQL-Server
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>2</value>
</property>
<property>
<name>ipc.client.connect.max.retries.on.timeouts</name>
<value>2</value>
</property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
<name>polybase.kerberos.realm</name>
<value>COMPANY.REALM.COM</value>
</property>
<property>
<name>polybase.kerberos.kdchost</name>
<value>ipadress.kdc.host</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>KERBEROS</value>
</property>
</configuration>
hdfs-site.xml for Polybase on SQL-Server
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
<!-- Client side file system caching is disabled below for credential refresh and
setting the below cache disabled options to true might result in
stale credentials when an alter credential or alter datasource is performed
-->
<property>
<name>fs.wasb.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.wasbs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.asv.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.asvs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@COMPANY.REALM.COM</value>
</property>
</configuration>
Polybase Exception
[2018-06-22 12:51:50,349] WARN 2872[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:53,568] WARN 6091[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:56,127] WARN 8650[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:58,998] WARN 11521[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:59,139] WARN 11662[main] - org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:676) - Couldn't setup connection for hdfs@COMPANY.REALM.COM to IPADRESS_OF_NAMENODE:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Log Entry on NameNode
Socket Reader #1 for port 8020: readAndProcess from client IP-ADRESS_SQL-SERVER threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]
Auth failed for IP-ADRESS_SQL-SERVER:60484:null (GSS initiate failed) with true cause: (GSS initiate failed)
The confusing part for me is the log entry from our NameNode, because AES128 CTS mode with HMAC SHA1-96 is already in the list of permitted enctypes, as shown in krb5.conf and in the Cloudera Manager UI.
We appreciate your help!

The problem resolved itself after we restarted the cluster.
I think the cause was that the krb5.conf file could not be distributed to all nodes of our Hadoop cluster because some services were still running. There was also a warning in Cloudera Manager about a stale Kerberos configuration.
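One way to check for this kind of drift, as a minimal sketch: collect a copy of krb5.conf from each node (how you fetch them is up to you) and count how many distinct versions exist. The function name count_distinct_versions and the file layout are hypothetical; this is not a Cloudera Manager or Kerberos feature.

```shell
#!/bin/sh
# Print how many distinct versions exist among the given copies of krb5.conf.
# 1 means every copy is identical; anything higher means at least one
# node still carries a different (possibly stale) file.
count_distinct_versions() {
    for f in "$@"; do
        # Reading from stdin makes cksum print only "checksum size",
        # without the file name, so identical contents give identical lines.
        cksum < "$f"
    done | sort -u | wc -l | tr -d ' '
}
```

For example, if each node's krb5.conf has been copied to a local directory as one file per host, calling count_distinct_versions on those files and getting anything other than 1 would point to a node that did not receive the current configuration.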
Many thanks to everyone!

Unable to dynamically scale Ignite Pods in Kubernetes

We have been experimenting with the number of Ignite server pods to see the impact on performance.
One thing we have noticed is that if the number of Ignite server pods is increased after client nodes have established communication, the new pod just fails in a loop with the error below.
If, however, the grid is destroyed (all client and server nodes are brought down) and the desired number of server nodes is then launched, there are no issues.
Also, the above procedure is not fully dependable for anything other than launching a single Ignite server.
From reading [this Stack Overflow post][1] and [this documentation][2], it looks like the issue may be that we are not launching the "Kubernetes service".
Ignite's KubernetesIPFinder requires users to configure and deploy a special Kubernetes service that maintains a list of the IP addresses of all the alive Ignite pods (nodes).
However, this is the only documentation I have found, and it says that it is no longer current.
Is this information still relevant for Ignite 2.11.1?
If not, is there more recent documentation?
If this service is indeed needed, are there more concrete examples and information on setting it up?
Error on new Server pod:
[21:37:55,793][SEVERE][main][IgniteKernal] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, addressFilter=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@78422efb], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=0, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:281)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1985)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1331)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1172)
at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:952)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:851)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:721)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
at org.apache.ignite.Ignition.start(Ignition.java:353)
at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:367)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node) [localNode=TcpDiscoveryNode [id=000e84bb-f587-43a2-a662-c7c6147d2dde, consistentId=8751ef49-db25-4cf9-a38c-26e23a96a3e4, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, fd00:85:4001:5:f831:8cc:cd3:f863%eth0], sockAddrs=HashSet [nkw-mnomni-ignite-1-1-1.nkw-mnomni-ignite-1-1.680e5bbc-21b1-5d61-8dfa-6b27be10ede7.svc.cluster.local/fd00:85:4001:5:f831:8cc:cd3:f863:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=0, lastExchangeTime=1676497065109, loc=true, ver=2.11.1#20211220-sha1:eae1147d, isClient=false], existingNode=000e84bb-f587-43a2-a662-c7c6147d2dde]
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.duplicateIdError(TcpDiscoverySpi.java:2083)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1201)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:473)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2207)
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
... 13 more
Server DiscoverySpi Config:
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
<property name="namespace" value="myNameSpace"/>
<property name="serviceName" value="myServiceName"/>
</bean>
</property>
</bean>
</property>
Client DiscoverySpi Configs:
<bean id="discoverySpi" class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder" ref="ipFinder" />
</bean>
<bean id="ipFinder" class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="shared" value="false" />
<property name="addresses">
<list>
<value>myServiceName.myNameSpace:47500</value>
</list>
</property>
</bean>
Edit:
I have experimented more with this issue. As long as I do not deploy any clients (using the static TcpDiscoveryVmIpFinder above), I am able to scale server pods up and down without any issue. However, as soon as a single client joins, I am no longer able to scale the server pods up.
I can see that the server pods have ports 47500 and 47100 open, so I am not sure what the issue is. Does the TcpDiscoveryKubernetesIpFinder still need the port to be specified in the client config?
I have tried changing my client config to use the TcpDiscoveryKubernetesIpFinder as shown below, but I am getting a discovery timeout failure (see the thread dump after the config).
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
<property name="namespace" value="680e5bbc-21b1-5d61-8dfa-6b27be10ede7"/>
<property name="serviceName" value="nkw-mnomni-ignite-1-1"/>
</bean>
</property>
</bean>
</property>
24-Feb-2023 14:15:02.450 WARNING [grid-timeout-worker-#22%igniteClientInstance%] org.apache.ignite.logger.java.JavaLogger.warning Thread dump at 2023/02/24 14:15:02 UTC
Thread [name="main", id=1, state=WAITING, blockCnt=78, waitCnt=3]
Lock [object=java.util.concurrent.CountDownLatch$Sync@45296dbd, ownerName=null, ownerId=-1]
at java.base@17.0.1/jdk.internal.misc.Unsafe.park(Native Method)
at java.base@17.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
at java.base@17.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
at java.base@17.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
at java.base@17.0.1/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
at o.a.i.spi.discovery.tcp.ClientImpl.spiStart(ClientImpl.java:324)
at o.a.i.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2207)
at o.a.i.i.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
at o.a.i.i.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
at o.a.i.i.IgniteKernal.startManager(IgniteKernal.java:1985)
at o.a.i.i.IgniteKernal.start(IgniteKernal.java:1331)
at o.a.i.i.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
at o.a.i.i.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
- locked o.a.i.i.IgnitionEx$IgniteNamedInstance@57ac9100
at o.a.i.i.IgnitionEx.start0(IgnitionEx.java:1172)
at o.a.i.i.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
at o.a.i.i.IgnitionEx.start(IgnitionEx.java:952)
at o.a.i.i.IgnitionEx.start(IgnitionEx.java:851)
at o.a.i.i.IgnitionEx.start(IgnitionEx.java:721)
at o.a.i.i.IgnitionEx.start(IgnitionEx.java:690)
at o.a.i.Ignition.start(Ignition.java:353)
Edit 2:
I also spoke with an admin about opening client side ports in case that was the issue. He indicated that should not be needed as clients should be able to open ephemeral ports to communicate with the server nodes.
[1]: Ignite not discoverable in kubernetes cluster with TcpDiscoveryKubernetesIpFinder
[2]: https://apacheignite.readme.io/docs/kubernetes-ip-finder
It's hard to say precisely what the root cause is, but in general it is something related to the network or to domain name resolution.
A public address is assigned to a node on startup and is exposed to other nodes for communication. Other nodes store that address and the nodeId in their history. Here is what is happening: a new node tries to enter the cluster and connects to a random node, which forwards the request to the coordinator. The coordinator issues a TcpDiscoveryNodeAddedMessage that must travel around the topology ring and be ACKed by all other nodes. That process did not finish within the join timeout, so the new node tries to re-enter the topology by starting the same join process again, but with a new ID. The other nodes, however, see that this address is already registered under another nodeId, which causes the original duplicate nodeId error.
Some recommendations:
If the issue is reproducible on a regular basis, I'd recommend collecting more information by enabling DEBUG logging for the following package:
org.apache.ignite.spi.discovery (discovery-related events tracing)
Take thread dumps from the affected nodes (this can be done with kill -3). Check for discovery-related issues. Search for "lookupAllHostAddr".
Check that it's not a DNS issue and that all public addresses of your node, such as nkw-mnomni-ignite-1-1-1.nkw-mnomni-ignite-1-1.680e5bbc-21b1-5d61-8dfa-6b27be10ede7.svc.cluster.local, are resolved instantly. I was asking about the provider because in OpenShift there seems to be a hard limit on DNS resolution time.
Check GC and safepoints.
To mask the underlying issue, you can experiment with the Ignite configuration: increase the network timeout and the join timeout, and reduce the failure detection timeout. But I recommend finding the real root cause instead of treating the symptoms.
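The thread-dump check from the recommendations can be scripted. A minimal sketch, assuming the output of kill -3 has already been saved to a file; the function name count_dns_frames is hypothetical:

```shell
#!/bin/sh
# Count the lines in a saved thread dump that mention lookupAllHostAddr,
# i.e. stack frames sitting in host name resolution.
count_dns_frames() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function's status clean.
    grep -c 'lookupAllHostAddr' "$1" || true
}
```

A non-zero count in several consecutive dumps of the same node would suggest that threads are spending time in DNS lookups, which fits the DNS item in the list above.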

Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster

I have installed Tez and want to run the example like this:
hadoop jar tez-examples-0.10.1.jar orderedwordcount /input /output
but it does not work, and the log is:
Log Type: stderr
Log Upload Time: Thu May 12 13:19:25 +0800 2022
Log Length: 77
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
Log Type: stdout
Log Upload Time: Thu May 12 13:19:25 +0800 2022
Log Length: 716
Heap
PSYoungGen total 17920K, used 921K [0x00000000eef00000, 0x00000000f0300000, 0x0000000100000000)
eden space 15360K, 6% used [0x00000000eef00000,0x00000000eefe67a8,0x00000000efe00000)
from space 2560K, 0% used [0x00000000f0080000,0x00000000f0080000,0x00000000f0300000)
to space 2560K, 0% used [0x00000000efe00000,0x00000000efe00000,0x00000000f0080000)
ParOldGen total 40960K, used 0K [0x00000000ccc00000, 0x00000000cf400000, 0x00000000eef00000)
object space 40960K, 0% used [0x00000000ccc00000,0x00000000ccc00000,0x00000000cf400000)
Metaspace used 2541K, capacity 4480K, committed 4480K, reserved 1056768K
class space used 283K, capacity 384K, committed 384K, reserved 1048576K
my_env.sh is
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_331
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.3.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
#HIVE_HOME
export HIVE_HOME=/opt/module/hive-3.1.3
export PATH=$PATH:$HIVE_HOME/bin
#MAVEN_HOME
export MAVEN_HOME=/opt/module/maven-3.8.5
export PATH=$PATH:$MAVEN_HOME/bin
#TEZ_HOME
export TEZ_HOME=/opt/module/tez-0.10.1
export HADOOP_CLASSPATH=${TEZ_HOME}/conf:${TEZ_HOME}/*:${TEZ_HOME}/lib/*
tez-site.xml is
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/apps/tez/tez-0.10.1.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>false</value>
</property>
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
</configuration>
I have tried the answer in https://issues.apache.org/jira/browse/TEZ-3392 but it does not work.
Please help me resolve this. Thanks in advance!
I had a similar error message when running the Tez example. The installation guide https://tez.apache.org/install.html is not very clear, but it has some valuable notes.
What helped was the application tracking page: from the logs there I found out that the path inside the container is incorrect after the Tez archive is decompressed.
The whole archive is decompressed into a directory called ./tezlib, and the relevant excerpt of the CLASSPATH looks like this:
$PWD:$PWD/*:$PWD/tezlib/*:$PWD/tezlib/lib/*
but the archive apache-tez-0.10.1-bin.tar.gz (on my HDFS at /apps/apache-tez-0.10.1-bin.tar.gz) is decompressed inside the container to ./tezlib/apache-tez-0.10.1-bin.
So, after several hours of trial and error, I resolved the issue with the following steps:
tar -xf apache-tez-0.10.1-bin.tar.gz
tar -czf apache-tez-0.10.1-bin-nodir.tar.gz -C apache-tez-0.10.1-bin .
hdfs dfs -copyFromLocal apache-tez-0.10.1-bin-nodir.tar.gz /apps/
The second line above packs the Tez jars into an archive without the parent directory.
After that, the Tez example runs without error and finishes successfully.
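To confirm that a repacked archive has the layout the container classpath expects (jars directly under the archive root rather than under a parent directory), you can list it before uploading. A minimal sketch; the function name jars_at_archive_root is hypothetical:

```shell
#!/bin/sh
# Print how many .jar files sit directly at the root of a .tar.gz archive.
# Entries look like "./tez-api.jar" or "tez-api.jar" when there is no parent
# directory, and like "apache-tez-0.10.1-bin/tez-api.jar" when there is one.
jars_at_archive_root() {
    tar -tzf "$1" | grep -E -c '^(\./)?[^/]+\.jar$' || true
}
```

With the original apache-tez-0.10.1-bin.tar.gz this should print 0, because every jar sits under the apache-tez-0.10.1-bin/ directory; with the repacked archive it should print a positive number.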
My tez-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/apps/apache-tez-0.10.1-bin-nodir.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
</configuration>
Of course, there are probably other ways to deal with this incorrect path to the jars.
I've tested this on Hadoop 3.2.2 from the Bigtop distribution with Tez 0.10.0/0.10.1.
I solved my problem by uploading an uncompressed Tez package to HDFS and changing my tez-site.xml file.
hadoop fs -put tez-0.10.1 /apps/tez
My changed tez-site.xml is below.
The main difference is in "tez.lib.uris":
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/apps/tez/tez-0.10.1,${fs.defaultFS}/apps/tez/tez-0.10.1/lib</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>false</value>
</property>
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<name>tez.am.resource.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>tez.am.resource.cpu.vcores</name>
<value>1</value>
</property>
</configuration>

oozie-hive beeline not working with kerberos

We have recently migrated from our old HDP cluster (without Kerberos) to a new HDP cluster (with Kerberos). We are facing authentication issues while running our Oozie jobs on the new cluster. Please refer to the workflow.xml below. The first action, 'hive-101', works fine; however, the second action, 'hive-102', fails.
<credentials>
<credential name="hs2-creds" type="hive2">
<property>
<name>hive2.server.principal</name>
<value>${jdbcPrincipal}</value>
</property>
<property>
<name>hive2.jdbc.url</name>
<value>${jdbcURL}</value>
</property>
</credential>
</credentials>
<start to="hive-101"/>
<action name="hive-101" cred="hs2-creds">
<hive2 xmlns="uri:oozie:hive2-action:0.2">
<jdbc-url>${jdbcURL}</jdbc-url>
<password>${hivepassword}</password>
<query>SELECT count(*) FROM table1;</query>
</hive2>
<ok to="hive-102"/>
<error to="fail"/>
</action>
<action name="hive-102" retry-max="${maxretry}" retry-interval="${retryinterval}">
<shell xmlns="uri:oozie:shell-action:0.3">
<exec>beeline</exec>
<argument>jdbc:hive2://zk01.abc.com:2181,zk02.abc.com:2181,zk03.abc.com:2181/${hivedatabase};principal=hive/_HOST@ABC.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</argument>
<argument>--outputformat=vertical</argument>
<argument>--silent=true</argument>
<argument>-e</argument>
<argument>
SELECT max(id) as mx_id FROM ${hivedatabase}.table1;
</argument>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
Below are the error details
ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_212]
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147) ~[?:1.8.0_212]
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122) ~[?:1.8.0_212]
WARN jdbc.HiveConnection: Failed to connect to nn02.abc.com:10000
WARN jdbc.HiveConnection: Could not open client transport with JDBC Uri: jdbc:hive2://nn02.abc.com:10000/db_test;principal=hive/_HOST@ABC.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: GSS initiate failed Retrying 0 of 1
ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_212]
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147) ~[?:1.8.0_212]
The shell action will run on an arbitrary data node as the Unix user who started the Oozie workflow. That user who tries to run the shell command won't be automatically authenticated with Kerberos.
I believe you will have to place a Kerberos keytab for the user on each data node. Then your Oozie shell action will need to run a script that runs a kinit using the keytab and then runs the beeline command.
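The kinit-then-beeline approach could look roughly like the wrapper below. This is a minimal sketch, not a tested Oozie setup: the function name, the keytab path, and the principal are placeholders, and it assumes the keytab is readable by the user running the action on whichever node executes it.

```shell
#!/bin/sh
# Hypothetical wrapper for an Oozie shell action.
# Arguments (all placeholders): keytab path, principal, JDBC URL, query.
run_beeline_with_keytab() {
    keytab=$1
    principal=$2
    jdbc_url=$3
    query=$4
    # Obtain a TGT from the keytab without a password prompt;
    # stop here if authentication fails.
    kinit -kt "$keytab" "$principal" || return 1
    # Run the query with the freshly obtained ticket.
    beeline -u "$jdbc_url" --outputformat=vertical --silent=true -e "$query"
}
```

The script file would be shipped with the workflow and named in the action's exec element in place of calling beeline directly; the four arguments would be passed as argument elements.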
From Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan:
On a nonsecure Hadoop cluster, the shell command will execute as the Unix user who runs the TaskTracker (Hadoop 1) or the YARN container (Hadoop 2). This is typically a system-defined user. On secure Hadoop clusters running Kerberos, the shell commands will run as the Unix user who submitted the workflow containing the action.

how to work with glusterfs-hadoop plugin?

I installed GlusterFS and it works fine. After that I installed Hadoop 1.x, which works fine with HDFS. But when I use the glusterfs-hadoop plugin to make GlusterFS the filesystem backend for Hadoop, I get an error. I took the glusterfs-hadoop plugin from its GitHub site, copied the jar file to the Hadoop library directory, and changed my core-site.xml to this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.glusterfs.impl</name>
<value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>
<property>
<name>fs.default.name</name>
<value>glusterfs://fedora1:9010</value>
</property>
<property>
<name>fs.AbstractFileSystem.glusterfs.impl</name>
<value>org.apache.hadoop.fs.local.GlusterFs</value>
</property>
<property>
<name>fs.glusterfs.volumes</name>
<value>test1</value>
</property>
<property>
<name>fs.glusterfs.volume.fuse.test1</name>
<value>/mnt/Hadoop</value>
</property>
</configuration>
When I execute start-mapred.sh, the jobtracker and tasktracker start without any problem, but when I execute the command "hadoop fs -mkdir ossl" I get this output:
15/04/14 12:52:53 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/04/14 12:52:53 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
15/04/14 12:52:53 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
15/04/14 12:52:53 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=f0fee73, git.commit.user.email=bchilds@redhat.com, git.commit.message.full=Merge pull request #122 from childsb/getfattrparse
Refactor and cleanup the BlockLocation parsing code, git.commit.id=f0fee73c336ac19461d5b5bb91a77e05cff73361, git.commit.message.short=Merge pull request #122 from childsb/getfattrparse, git.commit.user.name=bradley childs, git.build.user.name=Unknown, git.commit.id.describe=GA-12-gf0fee73, git.build.user.email=Unknown, git.branch=master, git.commit.time=31.03.2015 @ 00:36:46 IRDT, git.build.time=12.04.2015 @ 14:45:49 IRDT}
15/04/14 12:52:53 INFO glusterfs.GlusterFileSystem: GIT_TAG=GA
15/04/14 12:52:53 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
15/04/14 12:52:53 INFO glusterfs.GlusterVolume: Initializing gluster volume..
15/04/14 14:36:01 INFO glusterfs.GlusterVolume: Gluster volume: test at : /mnt/hadoop
15/04/14 14:36:01 INFO glusterfs.GlusterVolume: Working directory is : /
15/04/14 14:36:01 INFO glusterfs.GlusterVolume: Write buffer size : 131072
15/04/14 14:36:01 INFO glusterfs.GlusterVolume: Default block size : 67108864
15/04/14 14:36:01 INFO glusterfs.GlusterVolume: Directory list order : fs ordering
15/04/14 14:36:01 INFO glusterfs.GlusterVolume: File timestamp lease significant digits removed : 0
mkdir: Error undefined volume:fedora1:9010 in path: glusterfs://fedora1:9010/ossl
Please help me. Thanks for your reply.
If I'm not mistaken this should work:
<property>
<name>fs.default.name</name>
<value>glusterfs:///fedora1:9010</value>
</property>

unable to start terracota

I'm having an issue starting Terracotta. I grepped the whole system for the IP 37.139.24.150 but couldn't find any file containing it. Are there any other places to look? I also couldn't find a tc-config.xml for Terracotta. It is an old system that I did not install or configure; I'm just starting Terracotta.
2015-03-12 13:02:09,737 [main] INFO com.terracottatech.dso - Statistics store: '/root/terracotta/server-statistics'.
2015-03-12 13:02:09,750 [main] INFO com.terracottatech.console - Available Max Runtime Memory: 490MB
2015-03-12 13:02:09,958 [main] INFO com.terracottatech.dso - Standard DSO Server created
2015-03-12 13:02:09,962 [main] INFO com.terracottatech.dso - Creating server nodeID: NodeID[37.139.24.150:9510]
2015-03-12 13:02:09,973 [main] ERROR com.terracottatech.console - Unable to find local network interface for 37.139.24.150
2015-03-12 13:02:09,975 [main] ERROR com.terracottatech.dso - Unable to find local network interface for 37.139.24.150
com.tc.exception.TCRuntimeException: Unable to find local network interface for 37.139.24.150
at com.tc.objectserver.impl.DistributedObjectServer.start(DistributedObjectServer.java:502)
at com.tc.server.TCServerImpl.startDSOServer(TCServerImpl.java:531)
at com.tc.server.TCServerImpl.access$600(TCServerImpl.java:92)
at com.tc.server.TCServerImpl$StartAction.execute(TCServerImpl.java:479)
at com.tc.lang.StartupHelper.startUp(StartupHelper.java:39)
at com.tc.server.TCServerImpl.startServer(TCServerImpl.java:510)
at com.tc.server.TCServerImpl.start(TCServerImpl.java:271)
at com.tc.server.TCServerMain.main(TCServerMain.java:30)
I made it work: I created a new tc-config.xml and started the server with ./start-tc-server.sh -f /home/tomcat/terracotta/latest/terracotta/bin/tc-config.xml &
<?xml version="1.0" encoding="UTF-8"?>
<!-- All content copyright Terracotta, Inc., unless otherwise indicated. All rights reserved. -->
<tc:tc-config xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-5.xsd"
xmlns:tc="http://www.terracotta.org/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<servers>
<!-- Sets where the Terracotta server can be found. Replace the value of host with the server's IP address. -->
<server host="my-server-ip" name="localhost">
<data>/home/tomcat/terracotta/server-data</data>
<logs>/home/tomcat/terracotta/server-logs</logs>
<statistics>/home/tomcat/terracotta/server-statistics</statistics>
</server>
<!-- If using more than one server, add an <ha> section. -->
<ha>
<mode>networked-active-passive</mode>
<networked-active-passive>
<election-time>5</election-time>
</networked-active-passive>
</ha>
</servers>
<!-- Sets where the generated client logs are saved on clients. Note that the exact location of Terracotta logs on client machines may vary based on the value of user.home and the local disk layout. -->
<clients>
<logs>/opt/terracotta/client-logs</logs>
</clients>
</tc:tc-config>
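The error in the question says that the server was asked to use 37.139.24.150 but found no local network interface carrying that address. A quick way to check whether an address is assigned locally, as a minimal sketch: the function below reads an interface listing in the format printed by `ip -o -4 addr show` on Linux and reports whether the address appears in it. The function name address_in_listing is hypothetical.

```shell
#!/bin/sh
# Succeed if the given IPv4 address appears as an "inet" entry in the
# interface listing read from standard input.
address_in_listing() {
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # The address is followed by "/prefix" or a space in the listing, which
    # prevents 10.0.0.1 from matching 10.0.0.15.
    grep -q -E "inet ${pattern}[/ ]"
}
```

It would be used as `ip -o -4 addr show | address_in_listing 37.139.24.150 && echo assigned || echo "not assigned"`. If the address is not assigned, the host value in tc-config.xml has to be one that the machine actually owns.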