Kerberos issue on Spark when in cluster (YARN) mode - scala

I am using Spark with Kerberos authentication.
I can run my code using spark-shell fine, and I can also use spark-submit in local mode (e.g. --master local[16]). Both function as expected.
Local mode:
spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G target/scala-2.10/graphx_sp_2.10-1.0.jar
I am now progressing to run in cluster mode using YARN.
From here I can see that you need to specify the location of the keytab and the principal. Thus:
spark-submit --class "graphx_sp" --master yarn --keytab /path/to/keytab --principal login_node --deploy-mode cluster --executor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar
However, this returns:
Exception in thread "main" java.io.IOException: Login failure for login_node from keytab /path/to/keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:987)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:564)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:978)
... 4 more
Before I run spark-shell or spark-submit in local mode, I do the following Kerberos setup:
kinit -k -t ~/keytab -r 7d `whoami`
Clearly, this setup does not carry over to YARN. How do I fix the Kerberos issue with YARN in cluster mode? Is this something that must be in my /src/main/scala/graphx_sp.scala file?
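For reference, the in-code alternative would be a programmatic keytab login via Hadoop's UserGroupInformation API. A minimal sketch, with the principal and keytab path as placeholders (normally the spark-submit --principal/--keytab flags should make this unnecessary):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Placeholders: substitute the real principal (user@REALM) and keytab path.
val principal  = "user@REALM"
val keytabPath = "/path/to/keytab"

val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)

// Logs in from the keytab so subsequent Hadoop calls from this JVM are authenticated.
UserGroupInformation.loginUserFromKeytab(principal, keytabPath)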
Update
By running kinit -V -k -t ~/keytab -r 7d `whoami` in verbose mode I was able to see that the principal was in the form user@node.
I updated this, checked the location of the keytab, and things passed through this checkpoint successfully:
INFO security.UserGroupInformation: Login successful for user user@login_node using keytab file /path/to/keytab
However, it then fails post this with:
client token: N/A
diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Authentication required
I have checked the permissions on the keytab and the read permissions are correct. It has been suggested that the next possibility is a corrupt keytab.

We found out that the "Authentication required" error happens when the application tries to read from HDFS. Scala does lazy evaluation, so the job didn't fail until it started processing the file. The read came from this HDFS line: webhdfs://name:50070.
Since WebHDFS defines a public HTTP REST API to permit access, I thought it was using ACLs, but enabling ui.view.acls didn't fix the issue. Adding --conf spark.yarn.access.namenodes=webhdfs://name:50070 fixed the problem. This property provides a comma-separated list of secure HDFS namenodes which the Spark application is going to access. Spark acquires the security tokens for each of the namenodes so that the application can access those remote HDFS clusters. This resolved the "Authentication required" error.
Alternatively, direct access to HDFS (hdfs://file) works and authenticates using Kerberos, with the principal and keytab being passed during spark-submit.
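To make the lazy-evaluation point concrete, here is a minimal sketch (the webhdfs URL and file path are placeholders mirroring the masked values above, and sc is assumed to be the SparkContext). The login checkpoint passes at submit time, but the token problem only surfaces once an action forces the read:

// Submitted with --principal/--keytab and
// --conf spark.yarn.access.namenodes=webhdfs://name:50070
val lines = sc.textFile("webhdfs://name:50070/path/to/input") // lazy: nothing is read yet

// The AccessControlException ("Authentication required") only appeared here,
// when the action forced the file to be processed.
val count = lines.count()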

Related

Spark files not found in cluster deploy mode

I'm trying to run a Spark job in cluster deploy mode by issuing the following on the EMR cluster master node:
spark-submit --master yarn \
--deploy-mode cluster \
--files truststore.jks,kafka.properties,program.properties \
--class com.someOrg.somePackage.someClass s3://someBucket/someJar.jar kafka.properties program.properties
I'm getting the following error, which states that the file cannot be found at the Spark executor working directory:
//This is me printing the Spark executor working directory through SparkFiles.getRootDirectory()
20/07/03 17:53:40 INFO Program$: This is the path: /mnt1/yarn/usercache/hadoop/appcache/application_1593796195404_0011/spark-46b7fe4d-ba16-452a-a5a7-fbbab740bf1e/userFiles-9c6d4cae-2261-43e8-8046-e49683f9fd3e
//This is me trying to list the content for that working directory, which turns out empty.
20/07/03 17:53:40 INFO Program$: This is the content for the path:
//This is me getting the error:
20/07/03 17:53:40 ERROR ApplicationMaster: User class threw exception: java.nio.file.NoSuchFileException: /mnt1/yarn/usercache/hadoop/appcache/application_1593796195404_0011/spark-46b7fe4d-ba16-452a-a5a7-fbbab740bf1e/userFiles-9c6d4cae-2261-43e8-8046-e49683f9fd3e/program.properties
java.nio.file.NoSuchFileException: /mnt1/yarn/usercache/hadoop/appcache/application_1593796195404_0011/spark-46b7fe4d-ba16-452a-a5a7-fbbab740bf1e/userFiles-9c6d4cae-2261-43e8-8046-e49683f9fd3e/program.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at com.someOrg.somePackage.someHelpers$.loadPropertiesFromFile(Helpers.scala:142)
at com.someOrg.somePackage.someClass$.main(someClass.scala:33)
at com.someOrg.somePackage.someClass.main(someClass.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
This is the function I use to attempt to read the properties files passed as arguments:
import java.nio.file.{Files, Paths, StandardOpenOption}
import java.util.Properties

def loadPropertiesFromFile(path: String): Properties = {
  val inputStream = Files.newInputStream(Paths.get(path), StandardOpenOption.READ)
  val properties = new Properties()
  properties.load(inputStream)
  properties
}
Invoked as:
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
val kafkaProperties = loadPropertiesFromFile(SparkFiles.get(args(1)))
val programProperties = loadPropertiesFromFile(SparkFiles.get(args(2)))
//Also tried loadPropertiesFromFile(args({1,2}))
The program works as expected when issued with client deploy mode:
spark-submit --master yarn \
--deploy-mode client \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 \
--files truststore.jks program.jar com.someOrg.somePackage.someClass kafka.properties program.properties
This happens in Spark 2.4.5 / EMR 5.30.1.
Additionally, when I try to configure this job as an EMR step, it does not even work in client mode. Any clue on how the resource files passed through the --files option are managed/persisted/made available in EMR?
Option 1: Put those files in S3 and pass the S3 path.
Option 2: Copy those files to each node in a specific location (using a bootstrap action) and pass the absolute path of the files.
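Either way, the job then has to read the properties from wherever they end up. As a rough sketch (the bucket and paths are placeholders), a Properties file can be loaded from any URI scheme Hadoop knows about (s3://, hdfs://, or a node-local file:// path), assuming the corresponding filesystem connector is available (on EMR, EMRFS handles s3://):

import java.net.URI
import java.util.Properties

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Loads a .properties file through the Hadoop FileSystem API, so the same code
// works for s3://, hdfs:// and file:// URIs. The example URI below is hypothetical.
def loadPropertiesFromUri(uri: String, spark: SparkSession): Properties = {
  val fs = FileSystem.get(new URI(uri), spark.sparkContext.hadoopConfiguration)
  val in = fs.open(new Path(uri))
  val props = new Properties()
  try props.load(in) finally in.close()
  props
}

// e.g. loadPropertiesFromUri("s3://someBucket/resources/program.properties", spark)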
Solved with suggestions from the above comments:
spark-submit --master yarn \
--deploy-mode cluster \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 \
--files s3://someBucket/resources/truststore.jks,s3://someBucket/resources/kafka.properties,s3://someBucket/resources/program.properties \
--class com.someOrg.someClass.someMain \
s3://someBucket/resources/program.jar kafka.properties program.properties
I was previously assuming that in cluster deploy mode the files under --files were also shipped alongside the driver deployed to a worker node (and thereby available in the working directory), if accessible from the machine where spark-submit is issued.
Bottom line: Regardless of where you issue spark-submit from and the availability of the files in that machine, in cluster mode, you must ensure that files are accessible from every worker node.
It is now working by pointing the file locations to S3.
Thank you all!

Kafka Failed to create new KafkaAdminClient on Kerberized Cluster

I have a Kerberized cluster with Kafka on it.
I want to use the Confluent Schema Registry with Kafka on the cluster.
Launching the Schema Registry from my local PC, everything works just fine.
But when I uploaded it to a machine in the cluster and tried to run it from there, I get:
Error
ERROR Server died unexpectedly: (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain:50)
org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
...
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Message stream modified (41)
...
Caused by: javax.security.auth.login.LoginException: Message stream modified (41)
I also tried to run it from another machine on the cluster and I get the same result.
Schema-registry.properties
listeners=http://0.0.0.0:8081
kafkastore.connection.url=master01.domain.ext:2181,master02.domain.ext:2181
kafkastore.bootstrap.servers=SASL_PLAINTEXT://xxx.domain.ext:6667,SASL_PLAINTEXT://xxx.domain.ext:6667
kafkastore.topic=_schemas
debug=true
kafkastore.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/path/schema-registry/etc/schema-registry/my-keytab.keytab" \
principal="kafka/xxx.domain.ext#DOMAIN.EXT";
kafkastore.sasl.kerberos.service.name=kafka
kafkastore.security.protocol=SASL_PLAINTEXT
kafkastore.sasl.mechanism=GSSAPI
EXECUTION COMMAND:
sudo bash bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
QUESTIONS:
Why does it work on my local PC but not on a machine in the cluster?
What should I change?
P.S. I get the same result even when trying to run CMAK (Yahoo's kafka-manager), using the same jaas.config and the same keytab.

spark-submit --keytab option does not copy the file to executors

In my case I am using Spark (2.1.1), and for the processing I need to connect to Kafka (using Kerberos, hence a keytab).
When submitting the job I can pass the keytab with the --keytab and --principal options. The main drawback is that the keytab will not be sent to the distributed cache (or at least not be made available to the executors), so it will fail:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
...
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
If I try passing it also via --files it works (version 2.1.0), but in this latest version (2.1.1) it is not allowed because it fails with:
Exception in thread "main" java.lang.IllegalArgumentException: Attempt to add (file:keytab.keytab) multiple times to the distributed cache.
Any tips?
I resolved this issue by making a copy of my keytab file (e.g. the original file is osboo.keytab and its copy is osboo-copy-for-kafka.keytab) and pushing it to HDFS via the --files option.
# Call
spark2-submit --keytab osboo.keytab \
--principal osboo \
--files osboo-copy-for-kafka.keytab#osboo-copy-for-kafka.keytab,kafka.jaas#kafka.jaas
# kafka.jaas
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="osboo-copy-for-kafka.keytab"
principal="osboo#REALM.COM"
serviceName="kafka";
};
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="osboo-copy-for-kafka.keytab"
serviceName="zookeeper"
principal="osboo#REALM.COM";
};
Maybe this solution requires less effort than keeping symlinks between files in mind, so I hope it helps.
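For context, a rough sketch of how the Spark Streaming Kafka consumer side might look against this setup (brokers, topic and group id are placeholders). It assumes the kafka.jaas file shipped via --files is made visible to the driver and executor JVMs, e.g. by setting -Djava.security.auth.login.config=kafka.jaas through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf().setAppName("kafka-kerberos-sketch")
val ssc  = new StreamingContext(conf, Seconds(10))

// Placeholder brokers/topic/group. The KafkaClient section of kafka.jaas supplies the
// keytab and principal, provided java.security.auth.login.config points at that file.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"          -> "broker1:6667,broker2:6667",
  "key.deserializer"           -> classOf[StringDeserializer],
  "value.deserializer"         -> classOf[StringDeserializer],
  "group.id"                   -> "my-group",
  "security.protocol"          -> "SASL_PLAINTEXT",
  "sasl.kerberos.service.name" -> "kafka"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
)

stream.map(_.value()).print()
ssc.start()
ssc.awaitTermination()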
The spark-submit --keytab option copies the file under a different name into the local container directory when you submit the app on YARN.
You can see this in launch_container.sh.

Failed to submit local jar to spark cluster: java.nio.file.NoSuchFileException

~/spark/spark-2.1.1-bin-hadoop2.7/bin$ ./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/20 16:41:30 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: Submission successfully created as driver-20170620204130-0005. Polling submission state...
17/06/20 16:41:31 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20170620204130-0005 in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: State of driver driver-20170620204130-0005 is now ERROR.
17/06/20 16:41:31 INFO RestSubmissionClient: Driver is running on worker worker-20170620203037-172.17.0.5-45429 at 172.17.0.5:45429.
17/06/20 16:41:31 ERROR RestSubmissionClient: Exception from the cluster:
java.nio.file.NoSuchFileException: /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
java.nio.file.Files.copy(Files.java:1274)
org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:608)
org.apache.spark.util.Utils$.copyFile(Utils.scala:579)
org.apache.spark.util.Utils$.doFetchFile(Utils.scala:664)
org.apache.spark.util.Utils$.fetchFile(Utils.scala:463)
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:154)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:172)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:91)
17/06/20 16:41:31 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20170620204130-0005",
"serverSparkVersion" : "2.1.1",
"submissionId" : "driver-20170620204130-0005",
"success" : true
}
Log from spark-worker:
2017-06-20T20:41:30.807403232Z 17/06/20 20:41:30 INFO Worker: Asked to launch driver driver-20170620204130-0005
2017-06-20T20:41:30.817248508Z 17/06/20 20:41:30 INFO DriverRunner: Copying user jar file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.883645747Z 17/06/20 20:41:30 INFO Utils: Copying /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.885217508Z 17/06/20 20:41:30 INFO DriverRunner: Killing driver process!
2017-06-20T20:41:30.885694618Z 17/06/20 20:41:30 WARN Worker: Driver driver-20170620204130-0005 failed with unrecoverable exception: java.nio.file.NoSuchFileException: home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
Any idea why? Thanks
UPDATE
Is the following command right?
./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
UPDATE
I think I understand a little more about Spark now and why I had this problem (see also spark-submit error: ClassNotFoundException). The key point is that even though the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application jar will not be uploaded to the cluster after submission, which is different from my understanding. So the Spark cluster cannot find the application jar and cannot load the main class.
I will try to find out how to set up the Spark cluster and use cluster mode to submit the application. No idea whether client mode will use more resources for streaming jobs.
That's why you have to locate the jar-file in the master node OR put it into the hdfs before the spark submit.
This is how to do it:
1.) Transferring the file to the master node with the scp command:
$ scp <file> <username>@<IP address or hostname>:<Destination>
For example:
$ scp mytext.txt tom@128.140.133.124:~/
2.) Transferring the file to HDFS:
$ hdfs dfs -put mytext.txt
Hope I could help you.
You are submitting the application in cluster mode. This means a Spark driver application will be created somewhere in the cluster, and the file must exist there.
That is why, with Spark, it is recommended to use a distributed file system like HDFS or S3.
The standalone cluster mode wants the jar file to be on HDFS, because the driver can run on any node in the cluster.
hdfs dfs -put xxx.jar /user/
spark-submit --master spark://xxx:7077 \
--deploy-mode cluster \
--supervise \
--driver-memory 512m \
--total-executor-cores 1 \
--executor-memory 512m \
--executor-cores 1 \
--class com.xiyou.bi.streaming.game.common.DmMoGameviewOnlineLogic \
hdfs://xxx:8020/user/hutao/xxx.jar

How to deploy scala files used in spark-shell on cluster?

I'm using the spark-shell for learning purposes, and for that I created several Scala files containing frequently used code, like class definitions. I use the files by calling the ":load" command within the shell.
Now I would like to use the spark-shell in yarn-cluster mode. I start it using spark-shell --master yarn --deploy-mode client.
The shell starts without any issues, but when I try to run the code loaded by ":load", I get execution errors.
17/05/04 07:59:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e68_1493271022021_0168_01_000002 on host: xxxw03.mine.de. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e68_1493271022021_0168_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I think I will have to share the code loaded in the shell with the workers. But how do I do that?
The spark-shell is useful for quick testing, but once you have an idea of what you want to do and put together a complete program, its usefulness plummets.
You probably want to move on to using the spark-submit command.
See the docs on submitting an application https://spark.apache.org/docs/latest/submitting-applications.html
Using this command you provide a JAR file instead of individual class files.
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
<main-class> is the Java-style path to your class, e.g. com.example.MyMainClass
<application-jar> is the path to the JAR file containing the classes in your project. The other params are as documented at the link I included above, but these two are the key differences in terms of how you supply your code to the cluster.
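For illustration, a minimal sketch (package, class and record names are hypothetical) of wrapping code you previously :load-ed into a main object, so it can be built into a JAR and passed to spark-submit:

package com.example // hypothetical package

import org.apache.spark.sql.SparkSession

// Class definitions you used to :load in the shell can live in the same project.
case class Record(id: Long, value: String)

object MyMainClass {
  def main(args: Array[String]): Unit = {
    // In cluster mode the master and deploy mode come from spark-submit, not from code.
    val spark = SparkSession.builder().appName("my-app").getOrCreate()
    import spark.implicits._

    val ds = spark.createDataset(Seq(Record(1L, "a"), Record(2L, "b")))
    ds.show()

    spark.stop()
  }
}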