I want to use spark-submit to submit my Spark application. The Spark version is 2.4.3. I can run the application with java -jar scala.jar, but an error occurs when I run spark-submit master local --class HelloWorld scala.jar.
I have tried changing the master (local, spark://ip:port), with no result. The error below is always thrown, no matter how I modify the path of the jar.
Here is the code of my application:
import org.apache.spark.{SparkConf, SparkContext}

object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("begin~!")
    // use val rather than def: a def would build a fresh SparkConf/SparkContext
    // on every reference, and starting a second SparkContext fails
    val conf = new SparkConf().setAppName("first").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(Array(1, 2, 3))
    println(rdd.count())
    println("Hello World")
    sc.stop()
  }
}
When I use spark-submit, the error below occurs:
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/root/master
at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:221)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:911)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:911)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am very sorry; the error happened because I forgot to add '--' before master. I ran the application with spark-submit --master local --class HelloWorld scala.jar instead, and it works fine.
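For reference, spark-submit treats the first argument that is not an option flag as the application JAR, which is why the unprefixed master was resolved as the JAR path file:/root/master. A minimal sketch of the corrected invocation, with the jar path as a placeholder:
spark-submit \
  --master local \
  --class HelloWorld \
  /path/to/scala.jar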
I am trying to create a simple Hello World program in Scala to run on Apache Spark 3.1.1. However, I am getting the following error on spark-submit.
I am using the command below:
spark-submit --class SparkDemo --master local --executor-memory 800m /Users/souravhazra/Downloads/SparkDemo.jar
This is my Scala code:
object SparkDemo {
  def main(args: Array[String]): Unit = {
    println("Hello World")
  }
}
This is my build.sbt
I have been stuck on this issue for a couple of days. Please help. Thanks in advance.
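For reference, a minimal build.sbt sketch for Spark 3.1.1 (the file from the post is not shown, so every name and version here is an assumption): it pins Scala 2.12, the series Spark 3.1.1 is built against, and marks Spark as provided so that spark-submit supplies it at runtime.
// sketch only; adjust names and versions to your project
name := "SparkDemo"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.1" % "provided"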
I would like to run Scala code on Zeppelin from a Spark cluster.
For example:
This is the code in HDFS, "HelloWorldScala.scala":
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object HelloWorldScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("myApp_Enrico")
    val spark = SparkSession.builder.config(conf).getOrCreate()
    val aList = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    val aRdd = spark.sparkContext.parallelize(aList)
    println("********* HELLO WORLD AND HELLO SPARK!! ******")
    println("Print even numbers")
    aRdd.filter(x => x % 2 == 0).map(x => x * 2).collect().foreach(println)
  }
}
I would like to import the HelloWorldScala file into Zeppelin and run main, but I see this error:
Error code Zeppelin
Unfortunately, you can't import a single file in Zeppelin. You can pack your Scala files into a .jar library and put it in the spark.jars directory (set as a property in Spark); after that you will be able to import your library with import your.library.packages.YourClass and use its non-private functions. If you are not familiar with jar packaging and the spark.jars property, read a bit more about them first.
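As a rough sketch of that route (spark.jars is a real Spark property, but the path and package below are placeholders): point spark.jars at the packed jar in the Spark interpreter settings, then import the class in a notebook paragraph.
In the Spark interpreter settings:
  spark.jars = /path/to/your-library.jar
Then in a paragraph:
  %spark
  import your.library.packages.YourClass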
UPDATE:
%dep
z.load("your_package_group:artifact:version")
%spark
import com.yourpackage.HelloWorldScala
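Once the import resolves, the object's main can be invoked from a %spark paragraph, for example:
%spark
HelloWorldScala.main(Array.empty[String])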
I'm using Sublime to write my first Scala program, and I'm using the terminal to run it.
First I use the scalac assignment2.scala command to compile it, but it shows the error message: "error: object apache is not a member of package org"
What can I do to fix it?
This is my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object assignment2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("assignment2")
    val sc = new SparkContext(conf)
    val input = sc.parallelize(List(1, 2, 3, 4))
    val result = input.map(x => x * x)
    println(result.collect().mkString(","))
  }
}
Where are you trying to submit the job from? To run any Spark application you need to submit it with bin/spark-submit from your Spark installation directory, or have SPARK_HOME set in your environment so you can refer to it while submitting.
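The scalac error itself means the Spark jars are not on the compile classpath, so the class has to be compiled and packaged with Spark as a dependency (for example with sbt) before it can be submitted. A sketch of the submission, where the jar name is a placeholder:
$SPARK_HOME/bin/spark-submit --class assignment2 --master local assignment2.jar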
Actually, you can't run a Spark Scala file directly, because compiling your Scala class requires the Spark library. So to execute the Scala file you need spark-shell. To execute your Spark Scala file inside spark-shell, follow the steps below:
Open your spark-shell using the following command:
'spark-shell --master yarn-client'
Load your file with its exact location:
':load File_Name_With_Absolute_path'
Run your main method using the class name: 'ClassName.main(null)'
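Put together, a session might look like the sketch below. The file path is a placeholder, --master yarn-client is the older spelling of --master yarn with client deploy mode, and note that assignment2 builds its own SparkContext, which can conflict with the sc that spark-shell already provides:
spark-shell --master yarn-client
scala> :load /absolute/path/to/assignment2.scala
scala> assignment2.main(Array.empty[String])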
I'm very new to both Scala and Spark. I added the Scala IDE to Eclipse Luna and created a Maven project in Eclipse. I was able to run the program within Eclipse using a Run Configuration and get the output successfully. But when I created the jar for the following program and tried to run it from the Spark shell, I got the following error.
error: ';' expected but 'class' found.
Command used to run the jar:
spark-submit --class com.kirthi.spark.proj.sparkexamples.WordsCount --master local /home/cloudera/workspace/sparkwc1.jar hdfs://localhost:8020/kirthi3/dataset.txt hdfs://localhost:8020/kirthi3/sparkwco
The word count program which I tried:
package com.kirthi.spark.proj.sparkexamples

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordsCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Word Count")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(args(0))
    val words = textFile.flatMap(line => line.split(","))
    val counts = words.map(word => (word, 1))
    val wordcount = counts.reduceByKey(_ + _)
    val wordcount_sorted = wordcount.sortByKey()
    wordcount_sorted.foreach(println)
    wordcount_sorted.saveAsTextFile(args(1))
  }
}
Kindly help me out with this, as I am stuck on my initial Spark program.
I am using Cloudera Quickstart CDH 5.5.
As shown in the comments, you ran the above command in the Scala REPL, whereas you should run it from a regular Linux shell.
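In other words, exit the scala> prompt (for example with :quit) and run the same command directly from bash:
spark-submit --class com.kirthi.spark.proj.sparkexamples.WordsCount --master local \
  /home/cloudera/workspace/sparkwc1.jar \
  hdfs://localhost:8020/kirthi3/dataset.txt \
  hdfs://localhost:8020/kirthi3/sparkwco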
I am new to Spark and Scala, and I am having a hard time submitting a Spark job as a YARN client. Doing this via the spark shell (spark-submit) is no problem; the same goes for first creating a Spark job in Eclipse, compiling it into a jar, and using spark-submit from the shell, like:
spark-submit --class ebicus.WordCount /u01/stage/mvn_test-0.0.1.jar
However, using Eclipse to directly compile and submit it to YARN seems difficult.
My project setup is the following: my cluster is running Cloudera CDH 5.6. I have a Maven project using Scala, and my classpath is in sync with my pom.xml.
My code is as follows:
package test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.hadoop.conf.Configuration
import org.apache.spark.deploy.yarn.Client
import org.apache.spark.TaskContext
import akka.actor
import org.apache.spark.deploy.yarn.ClientArguments
import org.apache.spark.deploy.ClientArguments

object WordCount {
  def main(args: Array[String]): Unit = {
    // val workaround = new File(".");
    System.getProperties().put("hadoop.home.dir", "c:\\winutil\\")
    System.setProperty("SPARK_YARN_MODE", "true")

    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("yarn-client")
      .set("spark.hadoop.fs.defaultFS", "hdfs://namecluster.com:8020/user/username")
      .set("spark.hadoop.dfs.nameservices", "namecluster.com:8020")
      .set("spark.hadoop.yarn.resourcemanager.hostname", "namecluster.com")
      .set("spark.hadoop.yarn.resourcemanager.address", "namecluster:8032")
      .set("spark.hadoop.yarn.application.classpath",
        "/etc/hadoop/conf,"
          + "/usr/lib/hadoop/*,"
          + "/usr/lib/hadoop/lib/*,"
          + "/usr/lib/hadoop-hdfs/*,"
          + "/usr/lib/hadoop-hdfs/lib/*,"
          + "/usr/lib/hadoop-mapreduce/*,"
          + "/usr/lib/hadoop-mapreduce/lib/*,"
          + "/usr/lib/hadoop-yarn/*,"
          + "/usr/lib/hadoop-yarn/lib/*,"
          + "/usr/lib/spark/*,"
          + "/usr/lib/spark/lib/*,"
          + "/usr/lib/spark/lib/*")
      .set("spark.driver.host", "localhost")

    val sc = new SparkContext(conf)

    val file = sc.textFile("hdfs://namecluster.com:8020/user/root/testdir/test.csv")
    // Count number of words from a hive table (split is based on char 001)
    val counts = file.flatMap(line => line.split(1.toChar)).map(word => (word, 1)).reduceByKey(_ + _)
    // swap key and value with count value and sort from high to low
    val test = counts.map(_.swap).sortBy(word => (word, 1), false, 5)
    test.saveAsTextFile("hdfs://namecluster.com:8020/user/root/test1")
  }
}
I'm receiving the following error message in the log files of the Hadoop resource manager:
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>/etc/hadoop/conf<CPS>/usr/lib/hadoop/*<CPS>/usr/lib/hadoop/lib/*<CPS>/usr/lib/hadoop-hdfs/*<CPS>/usr/lib/hadoop-hdfs/lib/*<CPS>/usr/lib/hadoop-mapreduce/*<CPS>/usr/lib/hadoop-mapreduce/lib/*<CPS>/usr/lib/hadoop-yarn/*<CPS>/usr/lib/hadoop-yarn/lib/*<CPS>/usr/lib/spark/*<CPS>/usr/lib/spark/lib/*<CPS>/usr/lib/spark/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH
SPARK_LOG_URL_STDERR -> http://cloudera-002.fusion.ebicus.com:8042/node/containerlogs/container_1461679867178_0026_01_000005/hadriaans/stderr?start=-4096
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1461679867178_0026
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 520473
SPARK_USER -> hadriaans
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1462288779267
SPARK_LOG_URL_STDOUT -> http://cloudera-002.fusion.ebicus.com:8042/node/containerlogs/container_1461679867178_0026_01_000005/hadriaans/stdout?start=-4096
SPARK_YARN_CACHE_FILES -> hdfs://cloudera-003.fusion.ebicus.com:8020/user/hadriaans/.sparkStaging/application_1461679867178_0026/spark-yarn_2.10-1.5.0.jar#__spark__.jar
command:
{{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=49961' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver#10.29.51.113:49961/user/CoarseGrainedScheduler --executor-id 4 --hostname cloudera-002.fusion.ebicus.com --cores 1 --app-id application_1461679867178_0026 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
16/05/03 17:19:58 INFO impl.ContainerManagementProtocolProxy: Opening proxy : cloudera-002.fusion.ebicus.com:8041
16/05/03 17:20:01 INFO yarn.YarnAllocator: Completed container container_1461679867178_0026_01_000005 (state: COMPLETE, exit status: 1)
16/05/03 17:20:01 INFO yarn.YarnAllocator: Container marked as failed: container_1461679867178_0026_01_000005. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1461679867178_0026_01_000005
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any tips or advice are welcome.
From my previous experience, there are two possible scenarios that might be causing this not-very-descriptive error (I am submitting jobs from Eclipse, but using Java).
First, I noticed that you are not passing the JAR to the configuration of the SparkContext. If I remove the line that passes the JAR when submitting from Eclipse, my code fails with exactly the same error. So basically you set the path to the not-yet-existing JAR in your code, then export your project as a Runnable JAR (which packages all the transitive dependencies into it) to the path you previously set in your code. This is how it looks in Java (a Scala equivalent is sketched after the second scenario below):
SparkConf sparkConfiguration = new SparkConf();
sparkConfiguration.setJars(new String[] { "path to your jar" });
Second, check whether your cluster is healthy; your tmp directories might be full. Check all the Hadoop log files; some of them (sorry, I cannot remember which ones) give more details (some warnings) when this happens.
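For completeness, since the question's code is in Scala: the equivalent of the Java snippet above is SparkConf.setJars, which takes a Seq of jar paths. A minimal sketch, with a placeholder path for the exported runnable JAR:
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("yarn-client")
  // placeholder: absolute path to the runnable JAR exported from Eclipse
  .setJars(Seq("/path/to/exported-wordcount.jar"))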