When I submit the jars in local mode it works fine, but in yarn-cluster mode it fails with the error below.
spark-submit --class com.ffx.events.logevents.LogParser --master yarn-cluster \
--jars hdfs://EC22.internal:8020/user/hadoop/dir/src/guava-18.0.jar \
--driver-class-path logevents-1.0.0-SNAPSHOT-jar-with-dependencies.jar,hdfs://EC22.internal:8020/user/hadoop/dir/src/netty-3.6.2.Final.jar \
hdfs://ec2ipaddress.internal:8020/user/hadoop/dir/src/logevents-1.0.0-SNAPSHOT-jar-with-dependencies.jar.filepart \
-input /from-s3-1year/ffx-data/20151001/* -output hdfs://EC22.internal:8020/user/hadoop/dir/output/20151120
Part of the error trace:
15/11/20 14:00:00 INFO Client: Application report for application_1447788091680_0045 (state: ACCEPTED)
15/11/20 14:00:01 INFO Client: Application report for application_1447788091680_0045 (state: ACCEPTED)
15/11/20 14:00:02 INFO Client: Application report for application_1447788091680_0045 (state: ACCEPTED)
15/11/20 14:00:03 INFO Client: Application report for application_1447788091680_0045 (state: ACCEPTED)
15/11/20 14:00:04 INFO Client: Application report for application_1447788091680_0045 (state: FAILED)
15/11/20 14:00:04 INFO Client:
client token: N/A
diagnostics: Application application_1447788091680_0045 failed 2 times due to AM Container for appattempt_1447788091680_0045_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://EC22.internal:20888/proxy/application_1447788091680_0045/Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://EC22.internal:8020/user/hadoop/.sparkStaging/application_1447788091680_0045/__spark_conf__7360399142592952913.zip
java.io.FileNotFoundException: File does not exist: hdfs://EC22.internal:8020/user/hadoop/.sparkStaging/application_1447788091680_0045/__spark_conf__7360399142592952913.zip
I'm not familiar with Spark. I would appreciate some help.
Related
I am trying to run my program in YARN cluster mode. I'm 100% sure the class exists in the fat jar built by sbt.
I have no clue why Spark always throws the error Stack trace: ExitCodeException exitCode=13.
I followed the tracking page and saw java.lang.ClassNotFoundException: org.air.ebds.organize.geotrellisETLtoa.test.
Then I ran the Spark Pi example in YARN cluster mode, and it succeeded.
In YARN client/local mode, my program still fails with the same error: java.lang.ClassNotFoundException: org.air.ebds.organize.geotrellisETLtoa.test
P.S. The Spark conf in the program looks like this:
var sparkConf = new SparkConf()
  .setAppName("TiffDN2TOA")
  // .setIfMissing("spark.master", masterUrl)
  .set("spark.executor.memory", "10g")
  .set("spark.kryoserializer.buffer.max", "1024")
implicit val sc = new SparkContext(sparkConf)
object test {
  val masterUrl = "local[*]"
  var sparkConf = new SparkConf()
    .setAppName("TiffDN2TOA")
    // .setIfMissing("spark.master", masterUrl)
    .set("spark.executor.memory", "10g")
    .set("spark.kryoserializer.buffer.max", "1024")
  implicit val sc = new SparkContext(sparkConf)
  def main(args: Array[String]): Unit = {
    HadoopLandsatDN2ToaMethods.scenesDn2Toa(args(0),args(1))
  }
}
spark2-submit \
> --master yarn \
> --deploy-mode cluster \
> --class org.air.ebds.organize.geotrellisETLtoa.LandsatDN2Toa \
> --num-executors 4 \
> --executor-cores 4 \
> --executor-memory 10G \
> --driver-memory 12g \
> --conf "spark.kryoserializer.buffer.max=1024m spark.kryoserializer.buffer=1024m" \
> /root/Desktop/toa.jar \
> /root/Desktop/ebds_landsat8/LC08/122/031/LC08_L1TP_122031_20140727,/root/Desktop/ebds_landsat8/LC08/122/031/LC08_L1TP_122031_20140913,LC08_L1TP_122031_20141116 \
> file:///
19/07/10 16:40:21 INFO client.RMProxy: Connecting to ResourceManager at bigdataone/192.168.1.151:8032
19/07/10 16:40:21 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
19/07/10 16:40:21 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (65536 MB per container)
19/07/10 16:40:21 INFO yarn.Client: Will allocate AM container, with 13516 MB memory including 1228 MB overhead
19/07/10 16:40:21 INFO yarn.Client: Setting up container launch context for our AM
19/07/10 16:40:21 INFO yarn.Client: Setting up the launch environment for our AM container
19/07/10 16:40:21 INFO yarn.Client: Preparing resources for our AM container
19/07/10 16:40:21 INFO yarn.Client: Uploading resource file:/root/Desktop/toa.jar -> hdfs://bigdataone:8020/user/root/.sparkStaging/application_1561542066113_0061/toa.jar
19/07/10 16:40:33 INFO yarn.Client: Uploading resource file:/tmp/spark-0ccc5b92-4ef5-4f5e-944b-386abcbb5938/__spark_conf__3393474382108225503.zip -> hdfs://bigdataone:8020/user/root/.sparkStaging/application_1561542066113_0061/__spark_conf__.zip
19/07/10 16:40:33 INFO spark.SecurityManager: Changing view acls to: root
19/07/10 16:40:33 INFO spark.SecurityManager: Changing modify acls to: root
19/07/10 16:40:33 INFO spark.SecurityManager: Changing view acls groups to:
19/07/10 16:40:33 INFO spark.SecurityManager: Changing modify acls groups to:
19/07/10 16:40:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/07/10 16:40:34 INFO yarn.Client: Submitting application application_1561542066113_0061 to ResourceManager
19/07/10 16:40:34 INFO impl.YarnClientImpl: Submitted application application_1561542066113_0061
19/07/10 16:40:35 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:35 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.root
start time: 1562748034023
final status: UNDEFINED
tracking URL: http://bigdataone:8088/proxy/application_1561542066113_0061/
user: root
19/07/10 16:40:36 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:37 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:38 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:39 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:40 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:41 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:42 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:43 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:44 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:45 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:46 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:47 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:48 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:49 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:50 INFO yarn.Client: Application report for application_1561542066113_0061 (state: ACCEPTED)
19/07/10 16:40:51 INFO yarn.Client: Application report for application_1561542066113_0061 (state: FAILED)
19/07/10 16:40:51 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1561542066113_0061 failed 2 times due to AM Container for appattempt_1561542066113_0061_000002 exited with exitCode: 13
For more detailed output, check application tracking page:http://bigdataone:8088/proxy/application_1561542066113_0061/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1561542066113_0061_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 13
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.root
start time: 1562748034023
final status: FAILED
tracking URL: http://bigdataone:8088/cluster/app/application_1561542066113_0061
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1561542066113_0061 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1153)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1568)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/07/10 16:40:51 INFO util.ShutdownHookManager: Shutdown hook called
19/07/10 16:40:51 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5e6eb641-9f2a-4351-947d-a3b4cf578f6d
19/07/10 16:40:51 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0ccc5b92-4ef5-4f5e-944b-386abcbb5938
The application tracking page (http://bigdataone:8088/proxy/application_1561542066113_0061) shows:
19/07/10 15:14:10 INFO util.SignalUtils: Registered signal handler for TERM
19/07/10 15:14:10 INFO util.SignalUtils: Registered signal handler for HUP
19/07/10 15:14:10 INFO util.SignalUtils: Registered signal handler for INT
19/07/10 15:14:10 INFO spark.SecurityManager: Changing view acls to: yarn,root
19/07/10 15:14:10 INFO spark.SecurityManager: Changing modify acls to: yarn,root
19/07/10 15:14:10 INFO spark.SecurityManager: Changing view acls groups to:
19/07/10 15:14:10 INFO spark.SecurityManager: Changing modify acls groups to:
19/07/10 15:14:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set()
19/07/10 15:14:10 INFO yarn.ApplicationMaster: Preparing Local resources
19/07/10 15:14:11 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1561542066113_0055_000002
19/07/10 15:14:11 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
19/07/10 15:14:11 ERROR yarn.ApplicationMaster: Uncaught exception:
java.lang.ClassNotFoundException: org.air.ebds.organize.geotrellisETLtoa.test
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:682)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:448)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:301)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:241)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:241)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:241)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:782)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:781)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:240)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:806)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/07/10 15:14:11 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.air.ebds.organize.geotrellisETLtoa.test)
19/07/10 15:14:11 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://bigdataone:8020/user/root/.sparkStaging/application_1561542066113_0055
19/07/10 15:14:11 INFO util.ShutdownHookManager: Shutdown hook called
I am reading data from an S3 bucket, doing the computation in Spark, and writing the output to an S3 bucket. This process completes successfully, but at the EMR step level the job is shown as failed. The log says that a file does not exist.
Please see the log below.
19/01/09 08:40:37 INFO RMProxy: Connecting to ResourceManager at ip-172-30-0-84.ap-northeast-1.compute.internal/172.30.0.84:8032
19/01/09 08:40:37 INFO Client: Requesting a new application from cluster with 2 NodeManagers
19/01/09 08:40:37 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (106496 MB per container)
19/01/09 08:40:37 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/01/09 08:40:37 INFO Client: Setting up container launch context for our AM
19/01/09 08:40:37 INFO Client: Setting up the launch environment for our AM container
19/01/09 08:40:37 INFO Client: Preparing resources for our AM container
19/01/09 08:40:39 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/01/09 08:40:43 INFO Client: Uploading resource file:/mnt/tmp/spark-e0c6fbd3-14b0-4fcd-bbd2-c78658fdefd0/__spark_libs__8470659354947187213.zip -> hdfs://ip-172-30-0-84.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1547023042733_0001/__spark_libs__8470659354947187213.zip
19/01/09 08:40:47 INFO Client: Uploading resource s3://dev-system/SparkApps/jar/rxsicheck.jar -> hdfs://ip-172-30-0-84.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1547023042733_0001/rxsicheck.jar
19/01/09 08:40:47 INFO S3NativeFileSystem: Opening 's3://dev-system/SparkApps/jar/rxsicheck.jar' for reading
19/01/09 08:40:47 INFO Client: Uploading resource file:/mnt/tmp/spark-e0c6fbd3-14b0-4fcd-bbd2-c78658fdefd0/__spark_conf__4575598882972227909.zip -> hdfs://ip-172-30-0-84.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1547023042733_0001/__spark_conf__.zip
19/01/09 08:40:47 INFO SecurityManager: Changing view acls to: hadoop
19/01/09 08:40:47 INFO SecurityManager: Changing modify acls to: hadoop
19/01/09 08:40:47 INFO SecurityManager: Changing view acls groups to:
19/01/09 08:40:47 INFO SecurityManager: Changing modify acls groups to:
19/01/09 08:40:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
19/01/09 08:40:47 INFO Client: Submitting application application_1547023042733_0001 to ResourceManager
19/01/09 08:40:48 INFO YarnClientImpl: Submitted application application_1547023042733_0001
19/01/09 08:40:49 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:49 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1547023248110
final status: UNDEFINED
tracking URL: http://ip-172-30-0-84.ap-northeast-1.compute.internal:20888/proxy/application_1547023042733_0001/
user: hadoop
19/01/09 08:40:50 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:51 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:52 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:53 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:54 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:55 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:56 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:57 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:58 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:40:59 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:00 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:01 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:02 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:03 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:04 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:05 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:06 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:07 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:08 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:09 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:10 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:11 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:12 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:13 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:14 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:15 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:16 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:17 INFO Client: Application report for application_1547023042733_0001 (state: ACCEPTED)
19/01/09 08:41:18 INFO Client: Application report for application_1547023042733_0001 (state: FAILED)
19/01/09 08:41:18 INFO Client:
client token: N/A
diagnostics: Application application_1547023042733_0001 failed 2 times due to AM Container for appattempt_1547023042733_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://ip-172-30-0-84.ap-northeast-1.compute.internal:8088/cluster/app/application_1547023042733_0001Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://ip-172-30-0-84.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1547023042733_0001/__spark_libs__8470659354947187213.zip
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-30-0-84.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1547023042733_0001/__spark_libs__8470659354947187213.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1547023248110
final status: FAILED
tracking URL: http://ip-172-30-0-84.ap-northeast-1.compute.internal:8088/cluster/app/application_1547023042733_0001
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1547023042733_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/01/09 08:41:18 INFO ShutdownHookManager: Shutdown hook called
19/01/09 08:41:18 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-e0c6fbd3-14b0-4fcd-bbd2-c78658fdefd0
Command exiting with ret '1'
I can see my expected output, but the job is shown as failed. Am I missing anything?
Here is my code:
package Spark_package
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
object SampleFile {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local[*]").appName("SampleFile").getOrCreate()
    val sc = spark.sparkContext
    val conf = new SparkConf().setAppName("SampleFile")
    val sqlContext = spark.sqlContext
    val df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("s3a://test-system/Checktool/Zipdata/*.gz")
    df.createOrReplaceTempView("data")
    val res = spark.sql("select count(*) from data")
    res.coalesce(1).write.format("csv").option("header","true").mode("Overwrite").save("s3a://dev-system/Checktool/bkup/")
    spark.stop()
  }
}
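Could you please help me solve this issue?
Remove .master("local[*]") from the code and run it on the cluster; then it works. A master set in the code takes precedence over the --master option passed to spark-submit.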
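As a minimal sketch of that suggestion, assuming the same S3 paths as in the question, the session could be built without a hardcoded master so that whatever is passed to spark-submit applies. The helper name buildSession is hypothetical and not part of the original code.

```scala
import org.apache.spark.sql.SparkSession

object SampleFile {
  // Hypothetical helper: no .master(...) call here, so the master is taken
  // from the submit command (or from the spark.master property) instead.
  def buildSession(appName: String): SparkSession =
    SparkSession.builder().appName(appName).getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession("SampleFile")
    try {
      // Same read/count/write steps as in the question.
      val df = spark.read
        .format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("s3a://test-system/Checktool/Zipdata/*.gz")
      df.createOrReplaceTempView("data")
      val res = spark.sql("select count(*) from data")
      res.coalesce(1).write
        .format("csv")
        .option("header", "true")
        .mode("overwrite")
        .save("s3a://dev-system/Checktool/bkup/")
    } finally {
      // Stop the session even if the job throws.
      spark.stop()
    }
  }
}
```

With this version the master comes from the submit command, for example spark-submit --master yarn --deploy-mode cluster, while a local test run could pass --master "local[*]" instead.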
I am trying to run the following piece of Spark code written in Scala on Amazon EMR:
import org.apache.spark.{SparkConf, SparkContext}
object TestRunner {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Hello World")
    val sc = new SparkContext(conf)
    val words = sc.parallelize(Seq("a", "b", "c", "d", "e"))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    println(wordCounts)
  }
}
This is the script I am using to deploy the above code into EMR:
#!/usr/bin/env bash
set -euxo pipefail
cluster_id='j-XXXXXXXXXX'
app_name="HelloWorld"
main_class="TestRunner"
jar_name="HelloWorld-assembly-0.0.1-SNAPSHOT.jar"
jar_path="target/scala-2.11/${jar_name}"
s3_jar_dir="s3://jars/"
s3_jar_path="${s3_jar_dir}${jar_name}"
###################################################
sbt assembly
aws s3 cp ${jar_path} ${s3_jar_dir}
aws emr add-steps --cluster-id ${cluster_id} --steps Type=spark,Name=${app_name},Args=[--deploy-mode,cluster,--master,yarn-cluster,--class,${main_class},${s3_jar_path}],ActionOnFailure=CONTINUE
But this exits after a few minutes, producing no output at all in AWS!
Here's my controller's output:
2016-10-20T21:03:17.043Z INFO Ensure step 3 jar file command-runner.jar
2016-10-20T21:03:17.043Z INFO StepRunner: Created Runner for step 3
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --class TestRunner s3://jars/mscheiber/HelloWorld-assembly-0.0.1-SNAPSHOT.jar'
INFO Environment:
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
LESS_TERMCAP_md=[01;38;5;208m
LESS_TERMCAP_me=[0m
HISTCONTROL=ignoredups
LESS_TERMCAP_mb=[01;31m
AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
UPSTART_JOB=rc
LESS_TERMCAP_se=[0m
HISTSIZE=1000
HADOOP_ROOT_LOGGER=INFO,DRFA
JAVA_HOME=/etc/alternatives/jre
AWS_DEFAULT_REGION=us-east-1
AWS_ELB_HOME=/opt/aws/apitools/elb
LESS_TERMCAP_us=[04;38;5;111m
EC2_HOME=/opt/aws/apitools/ec2
TERM=linux
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
runlevel=3
LANG=en_US.UTF-8
AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
MAIL=/var/spool/mail/hadoop
LESS_TERMCAP_ue=[0m
LOGNAME=hadoop
PWD=/
LANGSH_SOURCED=1
HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-3UAS8JQ0KEOV3/tmp
_=/etc/alternatives/jre/bin/java
CONSOLETYPE=serial
RUNLEVEL=3
LESSOPEN=||/usr/bin/lesspipe.sh %s
previous=N
UPSTART_EVENTS=runlevel
AWS_PATH=/opt/aws
USER=hadoop
UPSTART_INSTANCE=
PREVLEVEL=N
HADOOP_LOGFILE=syslog
HOSTNAME=ip-10-17-186-102
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-3UAS8JQ0KEOV3
EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
SHLVL=5
HOME=/home/hadoop
HADOOP_IDENT_STRING=hadoop
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-3UAS8JQ0KEOV3/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-3UAS8JQ0KEOV3/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-3UAS8JQ0KEOV3
INFO ProcessRunner started child process 24549 :
hadoop 24549 4780 0 21:03 ? 00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --class TestRunner s3://jars/TestRunner-assembly-0.0.1-SNAPSHOT.jar
2016-10-20T21:03:21.050Z INFO HadoopJarStepRunner.Runner: startRun() called for s-3UAS8JQ0KEOV3 Child Pid: 24549
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 0 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 44 seconds
2016-10-20T21:04:03.102Z INFO Step created jobs:
2016-10-20T21:04:03.103Z INFO Step succeeded with exitCode 0 and took 44 seconds
The syslog and stdout are empty, and this is in my stderr:
16/10/20 21:03:20 INFO RMProxy: Connecting to ResourceManager at ip-10-17-186-102.ec2.internal/10.17.186.102:8032
16/10/20 21:03:21 INFO Client: Requesting a new application from cluster with 2 NodeManagers
16/10/20 21:03:21 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (53248 MB per container)
16/10/20 21:03:21 INFO Client: Will allocate AM container, with 53247 MB memory including 4840 MB overhead
16/10/20 21:03:21 INFO Client: Setting up container launch context for our AM
16/10/20 21:03:21 INFO Client: Setting up the launch environment for our AM container
16/10/20 21:03:21 INFO Client: Preparing resources for our AM container
16/10/20 21:03:21 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/10/20 21:03:22 INFO Client: Uploading resource file:/mnt/tmp/spark-6fceeedf-0ad5-4df1-a63e-c1d7eb1b95b4/__spark_libs__5484581201997889110.zip -> hdfs://ip-10-17-186-102.ec2.internal:8020/user/hadoop/.sparkStaging/application_1476995377469_0002/__spark_libs__5484581201997889110.zip
16/10/20 21:03:24 INFO Client: Uploading resource s3://jars/HelloWorld-assembly-0.0.1-SNAPSHOT.jar -> hdfs://ip-10-17-186-102.ec2.internal:8020/user/hadoop/.sparkStaging/application_1476995377469_0002/DataScience-assembly-0.0.1-SNAPSHOT.jar
16/10/20 21:03:24 INFO S3NativeFileSystem: Opening 's3://jars/HelloWorld-assembly-0.0.1-SNAPSHOT.jar' for reading
16/10/20 21:03:26 INFO Client: Uploading resource file:/mnt/tmp/spark-6fceeedf-0ad5-4df1-a63e-c1d7eb1b95b4/__spark_conf__5724047842379101980.zip -> hdfs://ip-10-17-186-102.ec2.internal:8020/user/hadoop/.sparkStaging/application_1476995377469_0002/__spark_conf__.zip
16/10/20 21:03:26 INFO SecurityManager: Changing view acls to: hadoop
16/10/20 21:03:26 INFO SecurityManager: Changing modify acls to: hadoop
16/10/20 21:03:26 INFO SecurityManager: Changing view acls groups to:
16/10/20 21:03:26 INFO SecurityManager: Changing modify acls groups to:
16/10/20 21:03:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
16/10/20 21:03:26 INFO Client: Submitting application application_1476995377469_0002 to ResourceManager
16/10/20 21:03:26 INFO YarnClientImpl: Submitted application application_1476995377469_0002
16/10/20 21:03:27 INFO Client: Application report for application_1476995377469_0002 (state: ACCEPTED)
16/10/20 21:03:27 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1476997406896
final status: UNDEFINED
tracking URL: http://ip-10-17-186-102.ec2.internal:20888/proxy/application_1476995377469_0002/
user: hadoop
16/10/20 21:03:28 INFO Client: Application report for application_1476995377469_0002 (state: ACCEPTED)
16/10/20 21:03:29 INFO Client: Application report for application_1476995377469_0002 (state: ACCEPTED)
16/10/20 21:03:30 INFO Client: Application report for application_1476995377469_0002 (state: ACCEPTED)
16/10/20 21:03:31 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:31 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.17.181.184
ApplicationMaster RPC port: 0
queue: default
start time: 1476997406896
final status: UNDEFINED
tracking URL: http://ip-10-17-186-102.ec2.internal:20888/proxy/application_1476995377469_0002/
user: hadoop
16/10/20 21:03:32 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:33 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:34 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:35 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:36 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:37 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:38 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:39 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:40 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:41 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:42 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:43 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:44 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:45 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:46 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:47 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:48 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:49 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:50 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:51 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:52 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:53 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:54 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:55 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:56 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:57 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:58 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:03:59 INFO Client: Application report for application_1476995377469_0002 (state: RUNNING)
16/10/20 21:04:00 INFO Client: Application report for application_1476995377469_0002 (state: FINISHED)
16/10/20 21:04:00 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.17.181.184
ApplicationMaster RPC port: 0
queue: default
start time: 1476997406896
final status: SUCCEEDED
tracking URL: http://ip-10-17-186-102.ec2.internal:20888/proxy/application_1476995377469_0002/
user: hadoop
16/10/20 21:04:00 INFO Client: Deleting staging directory hdfs://ip-10-17-186-102.ec2.internal:8020/user/hadoop/.sparkStaging/application_1476995377469_0002
16/10/20 21:04:00 INFO ShutdownHookManager: Shutdown hook called
16/10/20 21:04:00 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-6fceeedf-0ad5-4df1-a63e-c1d7eb1b95b4
Command exiting with ret '0'
What am I missing?
Looks like your application succeeded just fine. However, there are two reasons why you don't see any output in the step's stdout logs.
1) You ran the application in yarn-cluster mode, which means that the driver runs on an arbitrary cluster node rather than on the master node. If you specified an S3 log URI when creating the cluster, you should see the logs for this application in the containers directory of your S3 bucket. The logs for the driver will be in container #0's logs.
2) You did not call anything like "collect()" to bring data from the Spark executors back to the driver, so your println() at the end is not printing the data anyway but rather a toString() representation of the RDD. You probably want to do something like .collect().foreach(println) instead.
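To illustrate point 2, here is a minimal sketch of the difference between printing an RDD and printing its contents. The object name `PrintRddContents`, the word-splitting step, and the use of `args(0)` as the input path are assumptions for illustration only; they are not taken from the application in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example app; names and input handling are assumptions.
object PrintRddContents {
  def main(args: Array[String]): Unit = {
    // The master is expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("PrintRddContents"))

    // args(0) is assumed to be the path of a small text file.
    val words = sc.textFile(args(0)).flatMap(_.split("\\s+"))

    // Prints only the RDD's toString() description, not the data.
    println(words)

    // Brings the elements back to the driver and prints each one.
    // Only safe when the result fits in driver memory.
    words.collect().foreach(println)

    sc.stop()
  }
}
```

In yarn-cluster mode this output still ends up in the driver container's stdout log, not in the step's stdout. For large results, `take(n)` is a bounded alternative to `collect()`.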
I have a very simple app that I'm trying to run on AWS EMR. The jar has been built using assembly, with Spark as a provided dependency. It resides on S3 along with a text file that I wanted to use as test input.
In the EMR UI I choose to add a step and fill in the details, giving it the location of the jar and the location of the argument file.
It runs but always fails with an error. I then set up a new cluster (as a sanity check) and ran it again, only to get the same result. Any help is appreciated.
Thank you
The error from the log:
16/03/18 11:40:56 INFO client.RMProxy: Connecting to ResourceManager at ip-10-1-1-234.ec2.internal/10.1.1.234:8032
16/03/18 11:40:56 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/03/18 11:40:56 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
16/03/18 11:40:56 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/03/18 11:40:56 INFO yarn.Client: Setting up container launch context for our AM
16/03/18 11:40:56 INFO yarn.Client: Setting up the launch environment for our AM container
16/03/18 11:40:56 INFO yarn.Client: Preparing resources for our AM container
16/03/18 11:40:57 INFO yarn.Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.7.1-amzn-1.jar -> hdfs://ip-10-1-1-234.ec2.internal:8020/user/hadoop/.sparkStaging/application_1458297951763_0003/spark-assembly-1.6.0-hadoop2.7.1-amzn-1.jar
16/03/18 11:40:57 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1458297958626
16/03/18 11:40:57 INFO metrics.MetricsSaver: Created MetricsSaver j-DKMA93DFZ456:i-91bff215:SparkSubmit:20036 period:60 /mnt/var/em/raw/i-91bff215_20160318_SparkSubmit_20036_raw.bin
16/03/18 11:40:58 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 590 raw values into 1 aggregated values, total 1
16/03/18 11:40:59 INFO fs.EmrFileSystem: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
16/03/18 11:41:00 INFO metrics.MetricsSaver: Thread 1 created MetricsLockFreeSaver 1
16/03/18 11:41:00 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-030f9d29-f7ca-42fa-9caf-64ea103a2bb1/__spark_conf__7615049662154628286.zip -> hdfs://ip-10-1-1-234.ec2.internal:8020/user/hadoop/.sparkStaging/application_1458297951763_0003/__spark_conf__7615049662154628286.zip
16/03/18 11:41:00 INFO spark.SecurityManager: Changing view acls to: hadoop
16/03/18 11:41:00 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/03/18 11:41:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/18 11:41:01 INFO yarn.Client: Submitting application 3 to ResourceManager
16/03/18 11:41:01 INFO impl.YarnClientImpl: Submitted application application_1458297951763_0003
16/03/18 11:41:02 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:02 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458301261052
final status: UNDEFINED
tracking URL: http://ip-10-1-1-234.ec2.internal:20888/proxy/application_1458297951763_0003/
user: hadoop
16/03/18 11:41:03 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:04 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:05 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:06 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:07 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:08 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:09 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:10 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:11 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:12 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:13 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:14 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:15 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:16 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:17 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:18 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:19 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:20 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:21 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:22 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:23 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:24 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:25 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:26 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:27 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:28 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:29 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:30 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:31 INFO yarn.Client: Application report for application_1458297951763_0003 (state: ACCEPTED)
16/03/18 11:41:32 INFO yarn.Client: Application report for application_1458297951763_0003 (state: FAILED)
16/03/18 11:41:32 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1458297951763_0003 failed 2 times due to AM Container for appattempt_1458297951763_0003_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://ip-10-1-1-234.ec2.internal:8088/cluster/app/application_1458297951763_0003Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1458297951763_0003_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458301261052
final status: FAILED
tracking URL: http://ip-10-1-1-234.ec2.internal:8088/cluster/app/application_1458297951763_0003
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1458297951763_0003 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1029)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/18 11:41:32 INFO util.ShutdownHookManager: Shutdown hook called
16/03/18 11:41:32 INFO util.ShutdownHookManager: Deleting directory /mnt/tmp/spark-030f9d29-f7ca-42fa-9caf-64ea103a2bb1
Command exiting with ret '1'
Referring to issue Running Spark Job on Yarn Cluster
It can mean a lot of things. For us, we got a similar error message because of an unsupported Java class version, and we fixed the problem by deleting the referenced Java class in our project.
Use this command to see the detailed error message:
yarn logs -applicationId application_1458297951763_0003
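If the container logs do point to an unsupported class version, one way to check which Java version a class was compiled for is to read the version field from the class file header. The following is a rough sketch, not part of the original answer; the object name `ClassVersion` is hypothetical, and the paths passed in are assumed to be `.class` files extracted from the jar.

```scala
import java.io.{DataInputStream, FileInputStream}

// Hypothetical helper: reads the major version from a .class file header.
object ClassVersion {
  // A class file starts with the magic number 0xCAFEBABE, followed by
  // a 2-byte minor version and a 2-byte major version.
  // Major version 51 corresponds to Java 7, 52 to Java 8.
  def majorVersion(path: String): Int = {
    val in = new DataInputStream(new FileInputStream(path))
    try {
      val magic = in.readInt()
      require(magic == 0xCAFEBABE, s"$path is not a class file")
      in.readUnsignedShort() // minor version, ignored here
      in.readUnsignedShort() // major version, returned
    } finally in.close()
  }

  def main(args: Array[String]): Unit =
    args.foreach(p => println(s"$p -> major version ${majorVersion(p)}"))
}
```

A class whose major version is higher than what the cluster's JVM supports would fail to load there, which is one possible cause of an AM container exiting with a non-zero code.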
Similar to "Bad substitution" when submitting spark job to yarn-cluster
I get the following when submitting a job to a YARN cluster:
2016-02-25 19:49:11,029 INFO [Remote-akka.actor.default-dispatcher-4] (org.apache.spark.deploy.yarn.Client) - Application report for application_1456408114938_0007 (state: ACCEPTED)
2016-02-25 19:49:12,034 INFO [Remote-akka.actor.default-dispatcher-4] (org.apache.spark.deploy.yarn.Client) - Application report for application_1456408114938_0007 (state: ACCEPTED)
2016-02-25 19:49:13,039 INFO [Remote-akka.actor.default-dispatcher-4] (org.apache.spark.deploy.yarn.Client) - Application report for application_1456408114938_0007 (state: FAILED)
2016-02-25 19:49:13,040 INFO [Remote-akka.actor.default-dispatcher-4] (org.apache.spark.deploy.yarn.Client) -
client token: N/A
diagnostics: Application application_1456408114938_0007 failed 2 times due to AM Container for appattempt_1456408114938_0007_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://m:8088/cluster/app/application_1456408114938_0007Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e03_1456408114938_0007_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/spark-notebook/appcache/application_1456408114938_0007/container_e03_1456408114938_0007_02_000001/launch_container.sh: line 24: $PWD:$PWD/__spark_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/spark-notebook/appcache/application_1456408114938_0007/container_e03_1456408114938_0007_02_000001/launch_container.sh: line 24: $PWD:$PWD/__spark_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
The following works:
- Zeppelin works
- the SparkPi example works:
set MASTER=yarn-client
. /etc/spark/conf/spark-env.sh
./run-example SparkPi
16/02/25 19:54:38 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 1.458580 s
Pi is roughly 3.14232